# HISTORY
24 Mar 2016: Updated by: TOUCHUP-v1.15
14 Mar 2016: Updated by: TOUCHUP-v1.12

# molecular_function
20150915: root_PTN000827569 has function snRNA stem-loop binding (GO:0035614)
20150915: Eukaryota_PTN000052389 has function U1 snRNA binding (GO:0030619)
20150915: Magnoliophyta_PTN000052494 has LOST/MODIFIED function U1 snRNA binding (GO:0030619)

# cellular_component
20150915: Eukaryota_PTN000052389 is found in U1 snRNP (GO:0005685)
20150915: Eukaryota_PTN001629532 is found in catalytic step 2 spliceosome (GO:0071013)
20150915: Eukaryota_PTN001629532 is found in U2 snRNP (GO:0005686)
20150915: Magnoliophyta_PTN000052494 is NOT found in U1 snRNP (GO:0005685)

# biological_process
20150915: root_PTN000827569 participates in mRNA splicing, via spliceosome (GO:0000398)

# PRUNED
24 Mar 2016: Saccharomyces cerevisiae S288c_S000001448 has been pruned from tree

# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED

# NOTES
14 Mar 2016: Saccharomyces cerevisiae S288c_S000001448 has been pruned from tree
This tree contains spliceosomal proteins. Note that this tree contains a duplication node at the top with three child nodes.
- The first subclade (Eukaryota_PTN000052389) contains a U1 protein (MUD1 in S. cerevisiae, SNRPA in vertebrates, snf in D. melanogaster) and looks like a good representation of the taxa that one would expect for a spliceosomal protein conserved across eukaryotes.
-- Except there is a duplication within the plants where one branch contains U2 sequences instead of U1 sequences; possibly the U2 sequences belong in the third subclade under the root duplication node.
-- There is a duplication node for S. cerevisiae sequences containing the U1 protein MUD1 and also the U2 protein MSL1. The MSL1 sequence does not look like it belongs in this position in the tree.
While it is possible that it does belong in this tree in the U2 subclade, I have pruned it because its current position makes it hard to give it meaningful annotations and because it has a major sequence difference: it is missing most of the C-terminal portion.
- Though the first and third of these three subclades align quite well with each other throughout, the second/middle subclade (Eukaryota_PTN001629469) does not align well with the other two in the N-terminus, while the C-terminus looks OK. Nothing within this subclade has direct evidence, so it is possible that it has diverged too far and should be pruned.
- The third/last subclade contains U2 sequences, but there are only a small number of sequences, far fewer than one would expect for a conserved U2 snRNP protein.

# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
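
The inheritance-by-descent rule described above, together with the LOST/NOT entries recorded for the Magnoliophyta node, can be illustrated with a minimal Python sketch. This is not PAINT's implementation. The `Node` class, the `propagate` function, the two leaf names, and the parent/child arrangement of the nodes are all assumptions made for illustration; only the node identifiers and GO terms come from the annotations listed in this file.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; not a PAINT data structure."""
    name: str
    children: list = field(default_factory=list)
    gained: set = field(default_factory=set)  # terms asserted at this node
    lost: set = field(default_factory=set)    # terms marked LOST/NOT at this node


def propagate(node, inherited=frozenset()):
    """Return {node name: GO terms} for the node and all of its descendants.

    A node holds everything inherited from its ancestors, plus the terms
    asserted at the node itself, minus the terms marked as lost there.
    """
    current = (set(inherited) | node.gained) - node.lost
    result = {node.name: current}
    for child in node.children:
        result.update(propagate(child, frozenset(current)))
    return result


# Assumed topology: the Magnoliophyta node is placed under the U1 subclade,
# and each clade gets one made-up leaf.
plant_leaf = Node("plant_leaf_hypothetical")
plants = Node("Magnoliophyta_PTN000052494", children=[plant_leaf],
              lost={"GO:0030619"})
animal_leaf = Node("animal_leaf_hypothetical")
u1_clade = Node("Eukaryota_PTN000052389", children=[plants, animal_leaf],
                gained={"GO:0030619"})
root = Node("root_PTN000827569", children=[u1_clade], gained={"GO:0035614"})

terms = propagate(root)
for name, go_terms in terms.items():
    print(name, sorted(go_terms))
```

In this sketch the hypothetical animal leaf ends up with both snRNA stem-loop binding (GO:0035614) and U1 snRNA binding (GO:0030619), while the hypothetical plant leaf keeps only GO:0035614 because the loss recorded at the Magnoliophyta node blocks further inheritance of GO:0030619.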