# HISTORY
11 Aug 2016: Saved by krc using Paint 2.21
09 Jun 2016: Saved by krc using Paint 2.0-beta19

# molecular_function

# cellular_component
20160811: Eukaryota_PTN001114632 is found in nucleus (GO:0005634)
20160811: Eukaryota_PTN000529705 is found in preribosome, large subunit precursor (GO:0030687)

# PRUNED

# biological_process
20160811: Eukaryota_PTN001114632 participates in regulation of DNA-templated transcription, elongation (GO:0032784)
20160811: Eukaryota_PTN001114632 participates in regulation of mRNA export from nucleus (GO:0010793)
20160811: Eukaryota_PTN001114632 participates in regulation of mRNA processing (GO:0050684)
20160811: Eukaryota_PTN001114632 participates in regulation of histone H3-K36 trimethylation (GO:2001253)
20160811: Eukaryota_PTN000529705 participates in rRNA processing (GO:0006364)
20160811: Eukaryota_PTN000529705 participates in ribosomal large subunit assembly (GO:0000027)
09 Aug 2016: Mammalia_PTN001114667 has been pruned from tree
09 Aug 2016: Boreoeutheria_PTN000529769 has been pruned from tree
10 Aug 2016: saccharomyceta_PTN000786132 has been pruned from tree
10 Aug 2016: Schizosaccharomyces pombe_SPBC19G7.16 has been pruned from tree
10 Aug 2016: node_PTN001877149 has been pruned from tree
10 Aug 2016: Saccharomycetaceae_PTN001877135 has been pruned from tree
11 Aug 2016: Chromadorea_PTN001114648 has been pruned from tree
11 Aug 2016: Drosophila melanogaster_FBgn0000283 has been pruned from tree
11 Aug 2016: node_PTN001877153 has been pruned from tree

# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED

# NOTES
The tree structure seems a bit muddled up. There is a duplication node at the top with two branches below it. The first branch seems to be mostly IWS1 sequences, while the second branch is mostly MDN1 (midasin) sequences. However, there are a bunch of IWS1 sequences mixed into the MDN1 clade. There are also lots of long branch lengths.

Tree structure
====================

- 9ZZZZ:PTN001486914
-- PTN001114630
--- KORCO:Kcr_1472
--- 9ZZZZ:PTN001486937 (duplication node)
---- 9EUKA:PTN001114632
-------- 9BILA:PTN001114636 - IWS1 (vertebrates & insects)
----- 9MAGN:PTN001486951 - IWS1 (plants)
---- 9EUKA:PTN001877103 - mostly not named, but ixodid IscW (named as putative midasin)
-- 9BACT:PTN001877116 - Ecoli yehL, Ypestis cobS
- 9ZZZZ:PTN000529704
-- 9ZZZZ:PTN001877123 (diamond node)
--- 9ZZZZ:PTN001877124 (2 Bacillus sequences)
--- 9ZZZZ:PTN001877126 (duplication node)
---- 9EUKA:PTN001877127 (Tetrahymena, Plasmodium, ...)
---- 9EUKA:PTN000529705
----- 9EUKA:PTN000529706 mix of IWS1 & MDN1 sequences
-- 9ZZZZ:PTN001114778 - bacterial sequences


Pruned Sequences and nodes
====================

SAMD15 sequences
----------------------------------
- 9MAMM:PTN001114667 - This node of SAMD15 sequences is under a duplication node with vertebrate IWS1 sequences, but with a long branch length, and the SAMD15 sequences look terrible in the MSA. In addition, while there is no experimental annotation for either human or mouse SAMD15, the one InterPro domain (which references S cerevisiae BOI2; UniProtKB:P39969) is inconsistent with a function similar to IWS1.
- 9ZZZZ:PTN001877149 - This duplication node with a long branch length within the MDN1 clade has two rat sequences, one of which is a Samd15 sequence.

TCHHL1 sequences
----------------------------------
- PTN000529769 - This node of vertebrate TCHHL1 sequences has a long branch length from the MDN1 genes and also looks terrible in the MSA.

IWS1 sequences misplaced in MDN1 clade
----------------------------------
- PTN000786132 - node with several fungal IWS1 sequences which don't look good in the MSA
- S pombe iws1 is in a duplication node with S pombe mdn1, but with a significantly longer branch length for the iws1 sequence
- 9SACH:PTN001877135 - This node contains two sequences (S.cerevisiae SPN1 (aka IWS1) and an Ashbya IWS1 sequence), with a very long branch length.

Sequences within PTN000781460
----------------------------------
- D melanogaster Cp190 & 9ZZZZ:PTN001486940 (2 Dmel seqs) - These have long branch lengths and don't look good in the MSA. In addition, one of these sequences is named and is called Cp190, rather than midasin.
- PTN001114648 - 4 sequences including two C elegans sequences named as mel-28 transcription factors. The MSA looks poor, even in the regions conserved between the sequences above and below them.

Other possible tree issues
====================
I did NOT prune any of these nodes since they didn't get in the way of making annotations. I just didn't make any annotations that would propagate to any of these nodes.
- 9EUKA:PTN001877103 - This branch is within the IWS1 half of the tree, but the branch lengths are huge, the MSA isn't very good, and the two IXOSC sequences within this branch (both unreviewed TrEMBL records) are named as putative midasin.
- These three nodes are within the MDN1 portion of the tree, but from the MSA, I don't feel comfortable making annotations to them.
-- 9EUKA:PTN001877127 (Tetrahymena, Plasmodium, ...)
-- 9ZZZZ:PTN001877124 (2 Bacillus sequences)
-- 9ZZZZ:PTN001114778 - bacterial sequences

9EUKA:

# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent.
The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
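
The inheritance-by-descent idea described above can be illustrated with a small sketch. This is not PAINT's implementation; it is a toy model under stated assumptions. The node PTN001114632, its GO term, and the pruned node PTN001114667 come from this log, but the exact placement of PTN001114667 under PTN001114636 and all leaf names are hypothetical, chosen only to show how a pruned node blocks propagation.

```python
from collections import defaultdict

# Toy tree as parent -> children. The topology below PTN001114636 and the
# leaf names are hypothetical, for illustration only.
TREE = {
    "PTN001114632": ["PTN001114636", "PTN001486951"],
    "PTN001114636": ["PTN001114667", "leaf_vertebrate_IWS1"],
    "PTN001114667": ["leaf_SAMD15"],
    "PTN001486951": ["leaf_plant_IWS1"],
}

# Annotation made at an ancestral node (nucleus, as recorded in this log).
ANCESTRAL = {"PTN001114632": {"GO:0005634"}}

# Node recorded in this log as pruned from the tree.
PRUNED = {"PTN001114667"}


def propagate(tree, ancestral, pruned):
    """Return {node: GO terms} inherited by descent, skipping pruned subtrees."""
    inherited = defaultdict(set)

    def visit(node, terms):
        if node in pruned:
            return  # nothing propagates into a pruned subtree
        terms = terms | ancestral.get(node, set())
        if terms:
            inherited[node] |= terms
        for child in tree.get(node, []):
            visit(child, terms)

    # Roots are nodes that never appear as anyone's child.
    all_children = {c for kids in tree.values() for c in kids}
    for root in (n for n in tree if n not in all_children):
        visit(root, set())
    return dict(inherited)


result = propagate(TREE, ANCESTRAL, PRUNED)
```

In this toy tree the plant and vertebrate IWS1 leaves inherit GO:0005634 from the annotated ancestor, while the pruned node and the hypothetical SAMD15 leaf below it receive nothing. The sketch ignores everything else a real curation tool would need to handle, such as evidence codes or annotations that are explicitly blocked further down the tree.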