# HISTORY
26 Mar 2016: Updated by: TOUCHUP-v1.15
17 Mar 2016: Updated by: TOUCHUP-v1.14

# molecular_function
20100415: cellular organisms_PTN000501645 has function phosphoacetylglucosamine mutase activity (GO:0004610)
20120414: Bacteria _PTN000501735 has function phosphoglucosamine mutase activity (GO:0008966)
20140505: Euteleostomi_PTN000501434 has function glucose-1,6-bisphosphate synthase activity (GO:0047933)
20140505: node_PTN001096974 has function phosphomannomutase activity (GO:0004615)
20120414: node_PTN000501426 has function phosphoglucomutase activity (GO:0004614)
20100415: node_PTN000501530 has function magnesium ion binding (GO:0000287)
20100415: node_PTN000501530 has function hypoxanthine phosphoribosyltransferase activity (GO:0004422)
20100415: Tetrapoda_PTN000501539 has LOST/MODIFIED function magnesium ion binding (GO:0000287)
20100415: Tetrapoda_PTN000501539 has LOST/MODIFIED function hypoxanthine phosphoribosyltransferase activity (GO:0004422)
20120414: Euteleostomi_PTN000501335 has LOST/MODIFIED function phosphoglucomutase activity (GO:0004614)

# cellular_component
20100415: root_PTN000501326 is found in cytosol (GO:0005829)
20100416: Euteleostomi_PTN000501335 is found in sarcolemma (GO:0042383)
20100416: Euteleostomi_PTN000501335 is found in dystrophin-associated glycoprotein complex (GO:0016010)
20100416: Euteleostomi_PTN000501335 is found in stress fiber (GO:0001725)
20100416: Euteleostomi_PTN000501335 is found in cytoplasmic side of plasma membrane (GO:0009898)
20100416: Euteleostomi_PTN000501335 is found in intercalated disc (GO:0014704)
20100416: Euteleostomi_PTN000501335 is found in costamere (GO:0043034)
20100416: Euteleostomi_PTN000501335 is found in Z disc (GO:0030018)
20100416: Euteleostomi_PTN000501335 is found in focal adhesion (GO:0005925)
20100416: Euteleostomi_PTN000501335 is found in spot adherens junction (GO:0005914)
20100419: Viridiplantae_PTN000501414 is found in chloroplast stroma (GO:0009570)
20100419: Viridiplantae_PTN000501414 is found in stromule (GO:0010319)
20100415: Euteleostomi_PTN000501335 is NOT found in cytosol (GO:0005829)

# biological_process
20120414: Eukaryota_PTN000501647 participates in UDP-N-acetylglucosamine biosynthetic process (GO:0006048)
20101220: Ascomycota_PTN000501682 participates in cell wall chitin biosynthetic process (GO:0006038)
20101220: saccharomyceta_PTN000501683 participates in fungal-type cell wall chitin biosynthetic process (GO:0034221)
20100426: Euteleostomi_PTN000501651 participates in hemopoiesis (GO:0030097)
20120414: Bacteria _PTN000501735 participates in UDP-N-acetylglucosamine biosynthetic process (GO:0006048)
20140505: node_PTN000501426 participates in glucose metabolic process (GO:0006006)
20140505: node_PTN000501426 participates in glycogen biosynthetic process (GO:0005978)
20140505: node_PTN000501426 participates in galactose catabolic process (GO:0019388)
20100423: Viridiplantae_PTN000501414 participates in detection of gravity (GO:0009590)
20100420: Viridiplantae_PTN000501414 participates in starch biosynthetic process (GO:0019252)
20100420: node_PTN000501530 participates in GMP salvage (GO:0032263)
20100420: node_PTN000501530 participates in adenine salvage (GO:0006168)
20100419: node_PTN000501530 participates in hypoxanthine metabolic process (GO:0046100)
20100420: node_PTN000501530 participates in IMP salvage (GO:0032264)
20100420: node_PTN000501530 participates in guanine salvage (GO:0006178)
20100420: Tetrapoda_PTN000501539 does NOT participate in GMP salvage (GO:0032263)
20100420: Tetrapoda_PTN000501539 does NOT participate in adenine salvage (GO:0006168)
20100419: Tetrapoda_PTN000501539 does NOT participate in hypoxanthine metabolic process (GO:0046100)
20100420: Tetrapoda_PTN000501539 does NOT participate in IMP salvage (GO:0032264)
20100420: Tetrapoda_PTN000501539 does NOT participate in guanine salvage (GO:0006178)
20140505: Euteleostomi_PTN000501335 does NOT participate in glucose metabolic process (GO:0006006)
20140505: Euteleostomi_PTN000501335 does NOT participate in glycogen biosynthetic process (GO:0005978)
20140505: Euteleostomi_PTN000501335 does NOT participate in galactose catabolic process (GO:0019388)

# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED

# NOTES
Description of phylogeny:
PTHR22573 has 4 major clades defined by the following ancestral nodes: AN1, AN100, AN204, and AN318. Notable members of each clade include:
AN1: 1 bacterial protein (non-coli), 3 Arabidopsis proteins, S. cerevisiae PGM1 and PGM2 (which appear to be paralogs), fly Pgm, and 2 paralogous clades of vertebrate proteins which can be classified as PGM1 and PGM5.
AN100: E. coli pgm, S. cerevisiae PGM3, fly Pmm45A, and paralogous vertebrate clades PGM2 and PGM2L1.
AN204: E. coli hpt, 1 Arabidopsis protein, no yeast proteins, and paralogous vertebrate clades HPRT1 and PRTDFC1. ***Note: the branch length to AN204 is longer than the branch length to the other 3 major clades.
AN318: E. coli glmM and manB, 2 Arabidopsis proteins, S. cerevisiae PCM1, and vertebrate PGM3.
P-POD OrthoMCL places E. coli pgm (P36938) in the AN1 clade and subdivides the AN318 clade differently, but there are no major disagreements with PANTHER.

Annotation notes and reasoning
Do not propagate anything (other than cytoplasmic location) to root, as each clade is present in LUCA and extremely divergent.

Molecular Function:
-Phosphoglucomutase (PGM) activity is found in the AN1, AN100, and AN318 clades, but not in the AN204 clade. Propagate separately to each of these.
-AN204 gets an HPRT activity annotation AND magnesium ion binding.
-In this alignment, the PGM5 clade has 2 mutations that lie in conserved motifs near the active site as described in PMID 8631316: N667C in an ashNpgg motif and G1106A in an fdGdgdr motif. Unable to find Y-->F in rYdye also mentioned in paper. Therefore add a NOT/descendant_sequences modifier to AN9.
Would prefer to be able to use missing_residues also, but have chosen the stronger code since only 1 can be used.
-The HPRT (AN204) clade is also missing these 2 domains. N667 lies in a large deletion, and a novel domain spans G1106. This reinforces the idea that AN204 has lost PGM activity, but do not change qualifier to a NOT. Really, the best interpretation is that HPRT is related to the PGMs but has mutated and switched activities to something new.
-Also missing N667: the AN384 clade (N667L). Use IMR.
-Do not propagate protein binding annotations.
-Checked out mouse Pgm3 PGM activity annotation because it is the only one in its clade, where there are several annotations to phosphoacetylglucosamine mutase activity. Actually, there are 3 annotations to PGM activity from 3 different papers.
-Note that the abstract to PMID 17548465 alludes to pre-existing knowledge that mouse Pgm3 has phosphoacetylglucosamine mutase activity. Possible earlier citation?
-It appears that members of the AN318 clade have both phosphoglucomutase activity and phosphoacetylglucosamine mutase activity (GO:0004610), so it is appropriate for AN318 to inherit PGM activity and to propagate 4610 within some subset of the clade. E. coli glmM is annotated to have phosphoglucosamine mutase activity (GO:0008966), and no descendant of AN397 is documented to have 4610, which is attributed to a different gene (glmU) in the intro of PMID 10231382. So, it appears that the function of the protein diverged at AN318 to be 4610 in the AN319 branch and 8966 in the AN397 branch. So, propagate 4610 to AN319 instead of AN318. This may still be too broad, but there is no evidence to contradict it. Also, propagate phosphoglucosamine mutase activity (GO:0008966) to AN397.
-Do not propagate the "purine binding" annotation to rat Hprt1 pending resolution of the annotation/ontology question for RGD.

Cellular Component:
-There are many annotations to both "cytosol" and "cytoplasm" spanning all clades, and a number of annotations to "nucleus." However, the "nucleus" annotations come exclusively from high-throughput papers that also show a cytosolic distribution of the protein. All annotations to "cytosol" also come from HTP papers that also show a nuclear distribution, except for 1 possibly mistaken annotation to human PGM1 and another to human HPRT1 based on a cytoplasmic extract (both sent to MODs for review). There are several annotations to "cytoplasm" that are clearly low-throughput and do not also show a nuclear distribution: yeast PGM1 and PGM2, E. coli pgm and hpt, and human HPRT1. Therefore, an annotation to "cytoplasm" is clearly appropriate. However, in the context of several annotations to "cytosol," even though they are from HTP papers, and based on the characteristics of these proteins as PGM's and HPRT's, it is reasonable to make the more specific "cytosol" annotation. The observations of the proteins in nuclei, mostly in nuclear extracts, are probably due to contamination, as they never occur without the protein also being observed in the cytosol. Furthermore, to make an annotation to "nucleus," I would have 2 choices: (1) make the annotation to AN0 to reflect the ubiquity of the observation; this would mean that the ancestral protein was found in the nucleus of an ancestral organism that gave rise to bacteria, which is hard to justify, and I would also have to block propagation to multiple bacterial clades. OR (2) make the annotation several times, within each of the major clades, thereby implying that the protein gained this function multiple times, which is hard to justify based on only the HTP papers mentioned above. Therefore, do not make a "nucleus" annotation.
-The rat Hprt1 annotation to "soluble fraction" is consistent with the protein being in the cytosol.
-Mouse Pgm5 and human PGM5 have, between them, 11 annotations to components that cannot be part of the cytosol and no annotations to "cytosol" or its children. Therefore, block propagation of "cytosol" to AN9. Propagate the 11 other annotations within the AN9 clade. (They all have something to do with muscles.)
-Propagate "stroma" and "chloroplast stroma" within plant/algae PGM1 clade.
-There is 1 annotation to "mitochondrion" from an HTP TAIR paper. Not enough to propagate.

Biological Process:
-All proteins with an annotation to PGM activity get an annotation to GO:0019255, "glucose 1-phosphate metabolic process." Nodes that are blocked or received NOT qualifiers under MF will receive the same type of qualifiers under BP. Exception: STOP propagation to AN9 since NOT/IDS is not available.
-Similarly, AN204 gets "hypoxanthine metabolic process."
-Both phosphoglucosamine mutase activity and phosphoacetylglucosamine mutase activity are involved in the synthesis of UDP-N-acetylglucosamine, so propagate GO:0006048 to AN318.
-AN318 proteins are involved in the biosynthesis of UDP-GlcNAc, which is the basic building block of chitin. However, the evolution of chitin is not clear (PMID:7994124) and may have occurred multiple times, so do not propagate chitin biosynthesis.
-Propagate 6 terms for various purine and purine nt biosynthetic processes within the HPRT clade: 6166, 6168, 6178, 46100, 32263, 32264.
-Propagate "glycogen biosynthetic process" among animal and fungal proteins in PGM1 clade (excluding PGM5 clade) and "starch biosynthetic process" among plant PGM1's.
-Propagate "galactose catabolic process" to PGM1 clade (excluding PGM5). Propagate "trehalose biosynthetic process" to PGM1 clade excluding vertebrate PGM1 and PGM5 clades, because vertebrates don't make trehalose.
-Yeast PGM2 has an annotation to "cellular calcium ion homeostasis." This seems to act through the vacuolar membrane protein PMC1 (PMID 15252028).
Since it is not clear how direct this is, do not propagate.
-Do not propagate rat Pgm1 "response to radiation" IEP, Arabidopsis 2165351 "response to cold" IEP.
-Do not propagate worm WBGene00019890 "embryonic development ending in birth or egg hatching" or any BP annotation to WBGene00012803 or WBGene00013690 (RNAi phenotype-to-GO mapping).
-Detection of gravity in Arabidopsis 2165351 depends on amyloplasts, so propagate within plant clade.
-HPRT affects dopamine via the receptor, so propagate "positive regulation of dopamine metabolic process" to all animals with nervous systems.
-Do not propagate "protein homotetramerization."
-HPRT: Self-injurious behavior in humans and injurious overgrooming in mice are analogous. Propagate "grooming behavior" to all animals with central nervous systems. Also propagate other neurological annotations to animals with CNS's and immunological annotations (including "cytolysis") to animals with immune systems.
-Do not propagate "protein amino acid autophosphorylation."

Questions for MODs:
-Fly: Consider renaming fly Pmm45A (FBgn0033377), which is named as a phosphomannomutase but lies in a clade of phosphoglucomutases and has no EXP annotations.
-Fly: Is the fly Pgm (Q9VUY9) IMP annotation to "phosphoglycerate mutase" correct? The only phenotypes examined in this paper (PMID 17159148) have to do with flight, and phosphoglycerate mutase is not mentioned. Also, the name and EC #'s are very similar to "phosphoglucomutase" (5.4.2.1 vs. 5.4.2.2).
-SGD: Does PCM1 have phosphoglucomutase activity? See PMID's 8174553 and 8119301, cited in gene summary paragraph.
-E. coli: Does glmM have phosphoglucomutase activity? See PMID's 10231382 and 8550580.
-RGD: Should the "purine binding" annotation from PMID 6206848 be assumed as part of HPRT activity? I.e., does HPRT activity include the assumption of hypoxanthine binding? There is no "hypoxanthine binding" term in the GO, but it would be a child of "purine binding."
Perhaps this is an ontology question instead. Alternative: perhaps the "purine binding" annotation is not valid at all, since the competitors used in PMID 6206848 were purine nucleotides, not straight purines.
-Human: There may be a body of literature for PGM5 under the name "aciculin."
  EBI replies, July 29, 2010 7:53:51 AM EDT: Yes, these papers were checked at the time of annotation, there is nothing to add.
-Human: The PGM1 cytosol EXP annotation is not shown in the cited paper (PMID 7902568).
  EBI replies, July 29, 2010 7:53:51 AM EDT: This annotation is from Reactome, I will contact the relevant person to have it removed from PGM1 and PGM2.
  Reactome replies, 29/07/2010 16:40: A better reference is PMID: 1840235 (an earlier publication by the authors of PMID:7902568 and cited by them) that demonstrates presence of active enzyme in erythrocytes and as these have no other internal organelles the enzyme is cytosolic.
-Human: The HPRT1 cytoplasm and cytosol annotations in PMID 6300847 are based on activity found in cytoplasmic extracts. Is this sufficient justification for a "cytosol" annotation, or just "cytoplasm"? Why are there 2 different evidence codes (EXP vs IDA)?
  EBI replies, July 29, 2010 7:53:51 AM EDT: The IDA annotations were made by GOA whereas the EXP annotations are from Reactome. The IDA ones were made after the EXP ones to give more specificity to the annotations. I could contact Reactome to remove the redundant EXP annotations.
-E. coli: Does PMID 8550580 justify annotation of glmM with the terms GO:0009252 "peptidoglycan biosynthetic process" and GO:0009103 "lipopolysaccharide biosynthetic process"? Refer to the section entitled "Biochemical Effects of the glmM Mutation." Is there an appropriate cell wall or cell envelope annotation that can be inferred, too?
-SGD: Should the PCM1 annotation to GO:0006038 "cell wall chitin biosynthetic process" instead be to GO:0034221 "fungal-type cell wall chitin biosynthetic process"?
-SGD: Are there cell wall processes that can be annotated from PMID 9252577? Calcofluor white phenotypes have been captured.
-TAIR: Is At1g23190 really involved in "response to cadmium ion" based on changes in expression level? Is IDA appropriate (vs. IEP)? PMID 20005002 shows that its expression changes in response to cadmium, but does not show an effect.

Ontology questions:
-Should "contractile fiber" (GO:0043292) be an "organelle" (GO:0043226)? This would clarify that it should be excluded from "cytosol."
-UDP-GlcNAc is the basic building block of chitin. Can we infer that UDP-GlcNAc biosynthesis is related to chitin biosynthesis?
-Is "protein homotetramerization" a valid process?
-Should "cerebral cortex neuron differentiation" (GO:0021895) be a child of "central nervous system neuron development" (GO:0021954)?
-Why does GO:0034221 "fungal-type cell wall chitin biosynthetic process" specify that the cells are vegetative, while GO:0009277 "fungal-type cell wall" does not? (SF# 3140940)

MSL, updated 2010 Dec 20 PDT, 2011 June 14, (chitin biosynthesis, calcium ion homeostasis)
HM review and updated (2014 May 5)

# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
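
Illustration (not part of PAINT): the inheritance-by-descent rule described above, together with the LOST/MODIFIED and NOT entries in the HISTORY section, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how PAINT itself is implemented. The Node class, the propagate function and the two leaf names are hypothetical. The node IDs and GO terms are taken from the HISTORY entries, but the toy topology (PTN000501335 placed below PTN000501426, which sits directly below the root) is assumed for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy gene tree (hypothetical structure, not PAINT's)."""
    name: str
    children: list[Node] = field(default_factory=list)
    gained: set[str] = field(default_factory=set)  # terms asserted at this node
    lost: set[str] = field(default_factory=set)    # terms marked NOT / LOST here


def propagate(node: Node, inherited: frozenset[str] = frozenset()) -> dict[str, set[str]]:
    """Return {node name: GO terms} after inheritance by descent.

    A node keeps what its parent had, adds its own gains, and drops
    anything marked lost at that node; descendants then inherit the result.
    """
    terms = (set(inherited) | node.gained) - node.lost
    result = {node.name: terms}
    for child in node.children:
        result.update(propagate(child, frozenset(terms)))
    return result


# Assumed toy topology; leaf names are hypothetical placeholders.
pgm5_clade = Node(
    "Euteleostomi_PTN000501335",
    children=[Node("hypothetical_PGM5_leaf")],
    gained={"GO:0030018"},                 # Z disc
    lost={"GO:0004614", "GO:0005829"},     # PGM activity, cytosol
)
pgm_clade = Node(
    "node_PTN000501426",
    children=[Node("hypothetical_PGM1_leaf"), pgm5_clade],
    gained={"GO:0004614"},                 # phosphoglucomutase activity
)
root = Node("root_PTN000501326", children=[pgm_clade], gained={"GO:0005829"})

annotations = propagate(root)
for name, terms in annotations.items():
    print(name, sorted(terms))
```

In this toy tree the PGM1-like leaf inherits both cytosol and phosphoglucomutase activity, while everything below PTN000501335 inherits only the Z disc term, mirroring the block on "cytosol" and the NOT on PGM activity recorded for the PGM5 clade.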