# HISTORY
26 Mar 2016: Updated by: TOUCHUP-v1.15
17 Mar 2016: Updated by: TOUCHUP-v1.14
# molecular_function
20150917: root_PTN000566842 has function RNA binding (GO:0003723)
20150929: Eukaryota_PTN000566979 has function pre-mRNA 3'-splice site binding (GO:0030628)
20150929: Eukaryota_PTN000566979 has function poly-pyrimidine tract binding (GO:0008187)
20150917: Eukaryota_PTN000567094 contributes to function first spliceosomal transesterification activity (GO:0000384)
20150930: Eukaryota_PTN000566947 has function mRNA binding (GO:0003729)
20150929: Euteleostomi_PTN000566987 has function protein serine/threonine kinase activity (GO:0004674)
20150929: Euteleostomi_PTN000566987 has function ribonucleoprotein complex binding (GO:0043021)
20150930: Eukaryota_PTN000566843 has function mRNA binding (GO:0003729)
# cellular_component
20150929: Eukaryota_PTN000566979 is found in nuclear speck (GO:0016607)
20150929: Eukaryota_PTN000566979 is found in commitment complex (GO:0000243)
20150929: Eukaryota_PTN000566979 is found in U2-type prespliceosome (GO:0071004)
20150929: Eukaryota_PTN000566979 is found in U2AF (GO:0089701)
20150917: Eukaryota_PTN000567094 is found in precatalytic spliceosome (GO:0071011)
20150917: Eukaryota_PTN000567094 is found in U2 snRNP (GO:0005686)
20150917: Eukaryota_PTN000567094 is found in catalytic step 2 spliceosome (GO:0071013)
20150917: Eukaryota_PTN000567094 is found in RES complex (GO:0070274)
20150930: Eukaryota_PTN000566947 is found in exon-exon junction complex (GO:0035145)
20150929: Euteleostomi_PTN000566987 is found in ribonucleoprotein granule (GO:0035770)
20150930: Eukaryota_PTN000566843 is found in mRNA cleavage and polyadenylation specificity factor complex (GO:0005847)
20150930: Eukaryota_PTN000755691 is found in U12-type spliceosomal complex (GO:0005689)
# biological_process
20150917: Eukaryota_PTN000567094 participates in mRNA export from nucleus (GO:0006406)
# PRUNED
20150917: Eukaryota_PTN000567094 participates in mRNA splicing, via spliceosome (GO:0000398)
20150930: Eukaryota_PTN000566947 participates in regulation of alternative mRNA splicing, via spliceosome (GO:0000381)
20150930: Eukaryota_PTN000566947 participates in nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)
20150929: Euteleostomi_PTN000566987 participates in regulation of protein export from nucleus (GO:0046825)
20150929: Euteleostomi_PTN000566987 participates in peptidyl-serine phosphorylation (GO:0018105)
20150930: Eukaryota_PTN000566843 participates in pre-mRNA cleavage required for polyadenylation (GO:0098789)
20150930: Eukaryota_PTN000755691 participates in mRNA splicing, via spliceosome (GO:0000398)
26 Mar 2016: Deuterostomia_PTN000395674 has been pruned from tree
26 Mar 2016: Saccharomycetaceae_PTN000567059 has been pruned from tree
26 Mar 2016: Eukaryota_PTN001899219 has been pruned from tree
26 Mar 2016: Pentapetalae_PTN001899168 has been pruned from tree
26 Mar 2016: Ascomycota_PTN000395705 has been pruned from tree
26 Mar 2016: Saccharomyces cerevisiae S288c_S000003482 has been pruned from tree
26 Mar 2016: Saccharomyces cerevisiae S288c_S000001557 has been pruned from tree
26 Mar 2016: Eukaryota_PTN001502923 has been pruned from tree
# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED
# NOTES
17 Mar 2016: Saccharomyces cerevisiae S288c_S000003482 has been pruned from tree
17 Mar 2016: Deuterostomia_PTN000395674 has been pruned from tree
17 Mar 2016: Saccharomyces cerevisiae S288c_S000001557 has been pruned from tree
17 Mar 2016: Saccharomycetaceae_PTN000567059 has been pruned from tree
17 Mar 2016: Ascomycota_PTN000395705 has been pruned from tree
17 Mar 2016: Eukaryota_PTN001899219 has been pruned from tree
17 Mar 2016: Pentapetalae_PTN001899168 has been pruned from tree
17 Mar 2016: Eukaryota_PTN001502923 has been pruned from tree
This is a complicated tree of RNA binding proteins with a duplication node at the top.
These are the sequences and nodes that are directly below the duplication node. There was a core pattern of conservation that ran through the entire family. There were also a number of places where it appeared to me that things, either individual sequences or nodes containing varying numbers of sequences, were misplaced into this family.
This is the summary of the nodes & sequences directly below the root duplication node:
- Eukaryota_PTN001502923 - 2 Opisthokonts + a C reinhardtii sequence [pruned]
- ASHGO_AER042W
- Eukaryota_PTN000566979 - Sp prp2, Sc YGR250C [pruned], vertebrate U2AF2
- Eukaryota_PTN000567094 - Sc IST3, vertebrate RBMX2, Dmel tay
- Eukaryota_PTN000566947 - Sp mis3 [pruned], Sc MUD2 [pruned], vertebrate RBM8A, Dmel tsu, At Y14
- Eukaryota_PTN000566843 - Sp rna15, Sc RNA15, vertebrate CSTF2, At CSTF64
- Eukaryota_PTN001899219 - a bunch of Phytophthora infestans & Physcomitrella patens sequences [pruned]
- ARATH_AT2G33435 [pruned]
- ANOGA_AgaP_AGAP013055 [pruned]
- Eukaryota_PTN000755691 - Sc CTF3 [pruned], vertebrate CENPI [pruned], vertebrate ZCRB1
Here are more detailed comments about various subclades:
* Eukaryota_PTN000566979 - Sp prp2, Sc YGR250C, vertebrate U2AF2
  - S. cerevisiae YGR250C was pruned because it looked quite poor in the MSA, both in the "Entire alignment" and in the compressed view. In addition, this protein has never been identified as involved in the well-characterized spliceosomal splicing machinery in S. cerevisiae, and what little characterization of this protein is available suggests that it is found in cytoplasmic stress granules. Furthermore, Huang et al. 2002 (PMID:12374752) indicate in the intro that the S. cerevisiae spliceosome lacks the U2AF complex (present in both S. pombe and in mammals) that interacts with the U2 snRNP.
  - The annotation of human U2AF2 to "colocalizes_with Prp19 complex" from David et al. 2011 (PMID:21536736) is OK, in that they show that this gene product interacts with the Prp19 complex.
    It is not inconsistent with the characterized role of this gene product as a subunit of the U2AF complex.
* Eukaryota_PTN000567094 - Sc IST3, vertebrate RBMX2
* Eukaryota_PTN000566947 - Sp mis3, Sc MUD2, vertebrate RBM8A, Dmel tsu, At Y14
  - The node Euteleostomi_PTN000566987 contains sequences of serine/threonine protein kinase Kist proteins. In the full alignment view, the alignment looks fairly bad, but in the condensed view, these sequences look like they have some of the key sequences. In addition, Maucuer et al. 1997 (PMID:9287318) characterize rat KIS as a "protein kinase with an RNA recognition motif", so I have left this within the tree.
  - The node Ascomycota_PTN000395705 was pruned because the sequences within it look out of place in the MSA (full alignment view) and because the experimental characterization of S. pombe mis6 is inconsistent with its placement within this tree.
  - Within the node Ascomycota_PTN000566972, the sequences YARLI_YALI0_E09152g and YEAST_MUD2 were pruned because they look out of place in the MSA (full alignment view) and because the experimental characterization of S. cerevisiae MUD2 is inconsistent with being placed within a subclade composed of exon-exon junction binding proteins.
  - The node Pentapetalae_PTN001899168 was pruned because the sequences within it look out of place in the MSA (full alignment view). There is no experimental characterization shown within PAINT at this time.
  - There are annotations to "catalytic step 2 spliceosome" from two purification papers of native C spliceosome complexes: Jurica et al. 2002 (PMID:11991638) for human and Herold et al. 2009 (PMID:18981222) for Drosophila. However, when I look at these papers, I think that it would be accurate to say that this protein, which is part of the exon junction complex (EJC), colocalizes with the spliceosome C complex, but not that it is part of the "catalytic step 2 spliceosome" itself.
    The language of the Drosophila paper is particularly misleading in referring to all the proteins that came down as part of the C complex.
* Eukaryota_PTN000566843 - Sp rna15, Sc RNA15, vertebrate CSTF2, At CSTF64
* Eukaryota_PTN000755691 - Sc CTF3, vertebrate CENPI, vertebrate ZCRB1
  - The node Saccharomycetaceae_PTN000567059 was pruned because it looked out of place in the MSA (both with and without the full alignment option checked) and because the annotations based on experimental characterization of S. cerevisiae CTF3 seemed inconsistent with location in this family.
  - The node Deuterostomia_PTN000395674 was pruned because it looked out of place in the MSA (both with and without the full alignment option checked) and because the annotations based on experimental characterization of human CENPI seemed inconsistent with location in this family.
* These sequences and nodes positioned directly off the root duplication node were pruned because they did not look good in the MSA:
  - Eukaryota_PTN001502923 - 2 Opisthokonts + a C reinhardtii sequence
  - ARATH_AT2G33435
  - ANOGA_AgaP_AGAP013055
  - Eukaryota_PTN001899219 - a bunch of Phytophthora infestans & Physcomitrella patens sequences
# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
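
The inheritance-by-descent idea described above can be illustrated with a toy sketch. This is not PAINT's implementation: the `Node` class, the `inherit` function, and the tree shape are all hypothetical, the leaf labels are informal names borrowed from the notes rather than real sequence identifiers, and real curation also handles cases (such as loss of function along a branch) that are ignored here. It shows only that a term asserted on an ancestral node reaches every descendant leaf except those under a pruned node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy gene tree (hypothetical structure, not PAINT's)."""
    name: str
    children: list[Node] = field(default_factory=list)
    pruned: bool = False
    # GO terms a curator asserted directly on this node
    direct: set[str] = field(default_factory=set)


def inherit(node: Node, inherited: frozenset[str] = frozenset()) -> dict[str, set[str]]:
    """Return {leaf name: GO terms} for every leaf not under a pruned node."""
    if node.pruned:
        # A pruned node and everything below it receive no annotations.
        return {}
    terms = inherited | node.direct
    if not node.children:
        return {node.name: set(terms)}
    result: dict[str, set[str]] = {}
    for child in node.children:
        result.update(inherit(child, frozenset(terms)))
    return result


# Assumed topology, for illustration only.
tree = Node(
    "Eukaryota_PTN000566947",
    direct={"GO:0003729"},  # mRNA binding
    children=[
        Node("vertebrate RBM8A"),
        Node("Dmel tsu"),
        Node("Ascomycota_PTN000566972", children=[Node("YEAST_MUD2", pruned=True)]),
    ],
)
leaf_terms = inherit(tree)
```

In this sketch the two unpruned leaves inherit GO:0003729 from the ancestral node, while YEAST_MUD2 is skipped because it is marked as pruned.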