# HISTORY
26 Mar 2016: Updated by: TOUCHUP-v1.15
17 Mar 2016: Updated by: TOUCHUP-v1.14
17 Mar 2016: Updated by: TOUCHUP-v1.14

# molecular_function

# cellular_component
20140915: Eukaryota_PTN000523215 is found in nucleolus (GO:0005730)
20140915: Eukaryota_PTN000523215 is found in small-subunit processome (GO:0032040)
20140915: Eukaryota_PTN000523215 is found in Cul4-RING E3 ubiquitin ligase complex (GO:0080008)

# biological_process
20140915: Eukaryota_PTN000523215 participates in maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA) (GO:0000462)

# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED

# NOTES
This family comprises the SOF1 subunits of the ribosomal Small Subunit Processome, also called the SSU Processome, a large complex which is involved in the initial cleavages of the primary rRNA transcript to separate the small ribosomal subunit (SSU) rRNA from the remainder of the transcript and the biogenesis of the small ribosomal subunit.

The SSU processome was originally identified and characterized from S. cerevisiae (Dragon et al. 2002, PMID:12068309; Gallagher et al. 2004, PMID:15489292; Bernstein et al. 2004, PMID:15590835; and reviewed in Phipps et al. 2011, PMID:21318072). As of September 2014, it has begun to be characterized experimentally from other species such as human (Turner et al. 2012, PMID:22418842; Sato et al. 2013, PMID:24219289; and Hu et al. 2011, PMID:21078665), zebrafish (Wilkins et al. 2013, PMID:24147052), and mouse (Gallenberger et al. 2011, PMID:21051332).

The SOF1 subunit is a confirmed subunit of the SSU processome, but is not classified as being part of a specific subcomplex (Phipps et al. 2011, PMID:21318072).

Feng et al.
2013 (PMID:24214024) performed an extensive computational analysis of 77 completely sequenced eukaryotic genomes, including representatives of the five eukaryotic supergroups (Opisthokonts, Amoebozoa, Plantae, Excavates, and Chromalveolates), and compared these to sequences from both prokaryotic and archaeal species for all 51 confirmed and 26 likely SSU processome subunits in S. cerevisiae, as indicated in Phipps et al. 2011 (PMID:21318072). In addition, Srivastava et al. 2014 (PMID:24631428) identified SSU processome subunits in the parasitic protist Entamoeba histolytica.

SOF1 is one of the 51 confirmed proteins of the S. cerevisiae SSU processome (Phipps et al. 2011, PMID:21318072) and is highly conserved across the 77 eukaryotic species, as listed in Table 1 of Feng et al. 2013 (PMID:24214024). It is also found in the parasitic protist Entamoeba histolytica (Srivastava et al. 2014, PMID:24631428).

Annotation comments:
---------------------
- No MF annotations were propagated in this tree.
- There were experimental MF annotations for human DCAF13 to "poly(A) RNA binding" (GO:0044822) from two high-throughput studies: PMID:22681889 and PMID:22658674. Because these were high-throughput experiments, because this protein is normally part of a large complex, and because it is not clear that poly(A) RNA binding is biologically relevant, I have chosen not to propagate this MF annotation.
- The SOF1 homologs in human (DCAF13) and Arabidopsis (At4g28450) have also been characterized as substrate receptors for the Cul4 ubiquitin ligase. There is no MF term that represents this, but there is a CC term for the complex, "Cul4-RING E3 ubiquitin ligase complex" (GO:0080008), so I have propagated the CC term.
- There are a number of BP annotations in C. elegans and D. melanogaster that I have not propagated, as they are most likely downstream effects of disrupting ribosome biogenesis.

Sequence comments
-----------------
The Phytophthora infestans sequence PHYIT_PITG_01283 has a suspiciously long N-terminal extension of about 1850 aa, on a protein that is otherwise less than 1000 aa long. Only a few other sequences have N-terminal extensions at all, and those that exist are much shorter (50-125 aa).

The duplication of Branchiostoma floridae sequences appears to be a sequence issue rather than a true duplication. The N-terminal portion of the BRAFL_BRAFLDRAFT_63324 sequence does not align with the conserved sequence, but the C-terminus does. The BRAFL-BRAFLDRAFT_116978 sequence contains only the amino terminus.

The duplication of Ornithorhynchus anatinus sequences appears to be a sequence issue rather than a true duplication. Both sequences, ORNAN_LOC_100087775 and ORNAN_LOC_100078987, appear to be partial.

The duplication node containing four Entamoeba histolytica sequences appears to contain at least one partial sequence (ENTHI_EHI_141220). The other three look similar or identical, except for minor differences at the extreme C-terminus.

# REFERENCE
Annotation inferences using phylogenetic trees

The goal of the GO Reference Genome Project, described in PMID:19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
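
The inference scheme described above (experimental annotations support an annotation on an ancestral gene, and unannotated descendants inherit it) can be illustrated with a minimal sketch. This is not PAINT's implementation or data model: the `Node` class, the `propagate` function, and the leaf names are hypothetical and exist only for illustration. The only value taken from this file is the GO ID GO:0032040 (small-subunit processome).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Node:
    """A gene-tree node (hypothetical structure, not PAINT's data model)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    experimental: Set[str] = field(default_factory=set)  # GO IDs backed by experiments
    inferred: Set[str] = field(default_factory=set)      # GO IDs inherited by descent


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def propagate(ancestor: Node, go_id: str) -> List[str]:
    """Annotate `ancestor` with `go_id` and let its descendants inherit it.

    The annotation is only allowed if at least one node in the subtree has
    experimental evidence for the term. Nodes that already carry that
    evidence are left unchanged. Returns the names of the leaves that
    received the term by inference.
    """
    subtree = list(walk(ancestor))
    if not any(go_id in n.experimental for n in subtree):
        raise ValueError(f"no experimental support for {go_id} under {ancestor.name}")

    inherited_leaves = []
    for n in subtree:
        if go_id in n.experimental:
            continue  # already annotated directly; nothing to infer
        n.inferred.add(go_id)
        if not n.children:
            inherited_leaves.append(n.name)
    return inherited_leaves


if __name__ == "__main__":
    # Placeholder tree: one leaf with experimental evidence, two without.
    tree = Node("ancestor", children=[
        Node("leaf_A", experimental={"GO:0032040"}),
        Node("leaf_B"),
        Node("inner", children=[Node("leaf_C")]),
    ])
    print(propagate(tree, "GO:0032040"))  # ['leaf_B', 'leaf_C']
```

In practice a curator also decides where propagation should stop, as with the C. elegans and D. melanogaster BP annotations that were not propagated in this tree. The sketch omits that step.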