# HISTORY
08 May 2016: Updated by: TOUCHUP-v1.19
02 May 2016: Saved by krc using Paint 2.0-beta17

# molecular_function
20150220: root_PTN000559990 has function RNA binding (GO:0003723)
20150220: Eukaryota_PTN000560394 has function box H/ACA snoRNA binding (GO:0034513)

# cellular_component
20150218: root_PTN000559990 is found in cytosolic large ribosomal subunit (GO:0022625)
20160502: Eukaryota_PTN000559994 is found in 90S preribosome (GO:0030686)
20150220: Eukaryota_PTN000560394 is found in box H/ACA snoRNP complex (GO:0031429)
20150220: Eukaryota_PTN000560317 is found in U4/U6 x U5 tri-snRNP complex (GO:0046540)
20150220: Eukaryota_PTN000560317 is found in precatalytic spliceosome (GO:0071011)
20150220: Eukaryota_PTN000560317 is found in small-subunit processome (GO:0032040)
20150220: Eukaryota_PTN000560317 is found in box C/D snoRNP complex (GO:0031428)
20160502: Eukaryota_PTN001134767 is found in mitochondrion (GO:0005739)
20150218: Eukaryota_PTN000559994 is NOT found in cytosolic large ribosomal subunit (GO:0022625)
20150218: Eukaryota_PTN001134767 is NOT found in cytosolic large ribosomal subunit (GO:0022625)
20150218: Eukaryota_PTN000560394 is NOT found in cytosolic large ribosomal subunit (GO:0022625)
20150218: Eukaryota_PTN000560317 is NOT found in cytosolic large ribosomal subunit (GO:0022625)

# biological_process
20150220: root_PTN000559990 participates in translation (GO:0006412)
20150220: root_PTN000559990 participates in maturation of LSU-rRNA (GO:0000470)
20150220: Eukaryota_PTN000560394 participates in cleavage involved in rRNA processing (GO:0000469)
20150220: Eukaryota_PTN000560394 participates in rRNA pseudouridine synthesis (GO:0031118)
20150220: Eukaryota_PTN000560394 participates in snRNA pseudouridine synthesis (GO:0031120)
20150220: Eukaryota_PTN000560317 participates in mRNA splicing, via spliceosome (GO:0000398)

# PRUNED
20150220: Eukaryota_PTN000560317 participates in maturation of SSU-rRNA (GO:0030490)
20150220: Eukaryota_PTN000559994 does NOT participate in translation (GO:0006412)
20150220: Eukaryota_PTN000560317 does NOT participate in translation (GO:0006412)
08 May 2016: Schizosaccharomyces pombe_SPAP14E8.02 has been pruned from tree
08 May 2016: Saccharomyces cerevisiae S288c_S000004173 has been pruned from tree
08 May 2016: Saccharomyces cerevisiae S288c_S000002909 has been pruned from tree
08 May 2016: Saccharomycetaceae_PTN000560039 has been pruned from tree

# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED

# NOTES
29 Apr 2016: Schizosaccharomyces pombe_SPAP14E8.02 has been pruned from tree
29 Apr 2016: Saccharomycetaceae_PTN000560039 has been pruned from tree
29 Apr 2016: Saccharomyces cerevisiae S288c_S000004173 has been pruned from tree
29 Apr 2016: Saccharomyces cerevisiae S288c_S000002909 has been pruned from tree

This family contains the uL1 large ribosomal subunit sequences conserved across eubacteria (including chloroplast ribosomes), Archaea, and eukaryotes (including mitochondrial ribosomes), and also the eL8 large ribosomal subunit sequences conserved across Archaea and eukaryotes (aka yeast L8, mammalian L7A, Archaeal rpl7ae). Additional duplications in the eukaryotes have produced proteins that are involved in ribosome biogenesis but are not present in the mature ribosome.

- In Eubacteria, this tree contains:
-- uL1 (encoded by rplA sequences)
-- Note: bacterial clade includes RPL1 genes encoding "50S ribosomal protein L1, chloroplastic"
- In Archaea, this tree contains sequences for two ribosomal subunits:
-- uL1 (encoded by Archaeal rpl1p sequences; placement of rpl1p sequences is not consistent)
-- eL8 (encoded by Archaeal rpl7ae sequences)
- In Eukarya, this family includes:
-- two cytosolic ribosomal subunits:
--- uL1 protein (RPL1A & RPL1B in S. cerevisiae; L10A in vertebrates, e.g. human RPL10A; rpl1p in Archaea)
--- eL8 (aka yeast L8, mammalian L7A, Archaeal rpl7ae)
-- mitochondrial ribosomal subunits (encoded by MRPL1 genes)
-- RP-related proteins involved in ribosome biogenesis, but not found in the mature ribosome
--- UTP30 (aka RSL1D1 in vertebrates) in subclade with most of the Archaeal rpl1p sequences
--- two proteins in subclade with eL8 sequences
---- NHP2
---- SNU13 (aka NHP2L1 in vertebrates)

==============================
Comments on the tree
-------------------
This tree has a bacterial node (which includes plant chloroplast ribosomal subunits) and then a duplication node with four Eukaryotic/Archaeal branches. However, there is probably an issue with the placement of some of the Archaeal sequences, as the rpl1p sequences are not all placed in the same branch.

Archaea
- There are two sets of Archaeal sequences present in this tree
-- The rpl7ae sequences are present within a major branch that contains the eukaryotic ribosomal protein RPL7A sequences as well as two additional eukaryotic duplications that are not RP proteins: NHP2 and NHP2L1 (aka SNU13).
-- Issues with placement of rpl1p sequences (not clear which of three locations is correct):
--- Most of the rpl1p sequences are present in the clade with the eukaryotic RSL1D1 (aka UTP30) sequences, which are involved in ribosomal large subunit biogenesis but are not present within the mature ribosome. In the MSA, these 5 Archaeal sequences do not seem to be as conserved with the eukaryotic RPL1/RPL10A sequences or the bacterial rplA sequences as the Archaeal rpl7ae sequences are.
--- The SULSO_rpl1p sequence is present in the RPL1/RPL10A clade containing the eukaryotic ribosomal protein L1 (aka L10A) sequences.
--- The HALSA_rpl1p sequence is present in the clade that contains the mitochondrial MRPL1 sequences encoding the L1 subunit of the mitochondrial ribosome.
--- Note that Archaea do contain snoRNPs that are involved in rRNA modification and ribosome biogenesis, though I don't know much about the conservation of the "nonRP" proteins involved in ribosome assembly, some of which are thought to be derived from RP proteins, or how early they split from RP proteins.

RSL1D1/UTP30 clade
- In addition to S. cerevisiae UTP30 and its E. gossypii homolog (Q756L8_ASHGO), this clade contains S. cerevisiae CIC1 and its E. gossypii homolog (Q75BL9_ASHGO), under a duplication node with just these two pairs of sequences. There aren't any other similar duplication nodes in this branch of the tree, and the branch length to the CIC1 pair is greater than to the UTP30 pair. Looking at the full alignment, the Sc CIC1 and ASHGO_AGOS_AER236C (aka Q756L8_ASHGO) sequences look like they are out of place, while the UTP30 and ASHGO_ACR252C pair looks pretty good. Therefore, I have pruned the Sc CIC1 and ASHGO_AGOS_AER236C (aka Q756L8_ASHGO) sequences.

NHP2 clade
- S. pombe tos4 is located in a duplication node with S. pombe nhp2, but with a significantly longer branch length. Note that the Sc TOS4 gene was placed in the MRPL1 clade.

RPL7A clade
- The sequence for HUMAN_RPL7A (UniProtKB:P62424) is present in two different places within the RPL7A clade.
- The tree for the RPL7A clade has a very large number of unreviewed TrEMBL sequences in the mammals (mouse, rat, macaque).

MRPL1 clade
- Yeast PLM2 and TOS4 sequences are paralogous transcription factors and do not belong in this tree. Note that the S. pombe tos4 gene was placed in the NHP2 clade.
- The G4VLH1_SCHMA sequence is placed closest to the yeast TOS4 sequence, which does not belong in this tree, but it is annotated as "Putative 50s ribosomal protein L1", so I have left it here.

==============================
Note on Nomenclature
-------------------
- This family contains both the S. cerevisiae RPL8A and RPL8B sequences, but neither ScRPL7A nor ScRPL7B. This is a known nomenclature inconsistency. The yeast L8 proteins are equivalent to the human L7A proteins, a eukaryote-specific ribosomal protein. Yeast L7 and human L7 are equivalent to each other and to bacterial L30 (PMID:24524803).

==============================
Annotation comments
-------------------
- Several human genes (RPL10A, NHP2, RPL7A) have experimental MF annotations to "poly(A) RNA binding" (GO:0044822) from two high-throughput studies: PMID:22681889 and PMID:22658674. Because these were high-throughput experiments, because these proteins are normally part of a large complex, and because it is not clear that poly(A) RNA binding is biologically relevant, I have chosen not to propagate this MF annotation.

Watkins et al. A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell. 2000 Oct 27;103(3):457-66. PubMed PMID: 11081632
- instead of just "box C/D snoRNP complex" annotations, this paper supports annotations to both a U3 snoRNP specific complex term and a "methylation guide box C/D snoRNP" term for all 5 genes currently annotated

# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID:19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
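
The inheritance-by-descent idea described above, together with the "NOT" entries in the history (where a descendant clade loses a term asserted at the root), can be illustrated with a minimal Python sketch. This is not PAINT's implementation. The tree topology and the leaf names prefixed with "hypothetical_" are made up for illustration; only the PTN node IDs and GO IDs are taken from this file.

```python
# Toy tree, parent -> children. The topology and the "hypothetical_" leaves
# are assumptions for illustration; they are not read from the real tree.
TREE = {
    "root_PTN000559990": ["Eukaryota_PTN000560317", "hypothetical_leaf_uL1"],
    "Eukaryota_PTN000560317": ["hypothetical_leaf_SNU13"],
}

# (node, GO ID) pairs asserted at a node.
GAINS = {
    ("root_PTN000559990", "GO:0006412"),       # translation
    ("Eukaryota_PTN000560317", "GO:0000398"),  # mRNA splicing, via spliceosome
}

# (node, GO ID) pairs negated at a node ("does NOT participate in ...").
LOSSES = {
    ("Eukaryota_PTN000560317", "GO:0006412"),  # NOT translation
}


def propagate(tree, gains, losses, node, inherited=frozenset(), out=None):
    """Return {node: set of GO IDs} after inheritance by descent.

    Each node starts from its parent's terms, drops any term negated at
    the node itself, then adds any term asserted at the node itself.
    """
    if out is None:
        out = {}
    lost = {go for n, go in losses if n == node}
    gained = {go for n, go in gains if n == node}
    current = (set(inherited) - lost) | gained
    out[node] = current
    for child in tree.get(node, []):
        propagate(tree, gains, losses, child, frozenset(current), out)
    return out


result = propagate(TREE, GAINS, LOSSES, "root_PTN000559990")
for name, terms in result.items():
    print(name, sorted(terms))
```

In this toy tree the hypothetical uL1 leaf inherits translation (GO:0006412) from the root, while the hypothetical SNU13 leaf does not, because the term is negated at Eukaryota_PTN000560317; it inherits mRNA splicing (GO:0000398) instead.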