# HISTORY
07 May 2016: Updated by: TOUCHUP-v1.18
24 Mar 2016: Updated by: TOUCHUP-v1.15
14 Mar 2016: Updated by: TOUCHUP-v1.14
07 May 2016: Updated by: TOUCHUP-v1.18
# molecular_function
20100715: Eukaryota_PTN000145761 has function guanine/thymine mispair binding (GO:0032137)
20100715: Eukaryota_PTN000145761 has function dinucleotide insertion or deletion binding (GO:0032139)
20100715: node_PTN000145505 has function damaged DNA binding (GO:0003684)
20100715: node_PTN000145505 has function mismatched DNA binding (GO:0030983)
20100715: Eukaryota_PTN000145640 has function single base insertion or deletion binding (GO:0032138)
20100715: Eukaryota_PTN000145640 has function guanine/thymine mispair binding (GO:0032137)
20100715: Eukaryota_PTN000145640 has function heteroduplex DNA loop binding (GO:0000404)
20100715: Eukaryota_PTN000145640 has function Y-form DNA binding (GO:0000403)
20100715: Eukaryota_PTN000145640 has function double-strand/single-strand DNA junction binding (GO:0000406)
20100715: Eukaryota_PTN000145640 has function four-way junction DNA binding (GO:0000400)
20100715: Eukaryota_PTN000145560 has function single base insertion or deletion binding (GO:0032138)
20100715: Eukaryota_PTN000145560 has function four-way junction DNA binding (GO:0000400)
20100715: Eukaryota_PTN000145560 has function guanine/thymine mispair binding (GO:0032137)
20100715: Eukaryota_PTN000145706 has LOST/MODIFIED function damaged DNA binding (GO:0003684)
20100715: Dictyostelium discoideum_DDB_G0283957 has LOST/MODIFIED function damaged DNA binding (GO:0003684)
20100715: Caenorhabditis elegans_WBGene00001872 has LOST/MODIFIED function damaged DNA binding (GO:0003684)
20100715: Eukaryota_PTN000145508 has LOST/MODIFIED function damaged DNA binding (GO:0003684)
20100715: Eukaryota_PTN000145759 has LOST/MODIFIED function mismatched DNA binding (GO:0030983)
# cellular_component
20100715: Eukaryota_PTN000145761 is found in mitochondrion (GO:0005739)
20100715: node_PTN000145505 is found in mismatch repair complex (GO:0032300)
20100715: node_PTN000145507 is found in nuclear chromosome (GO:0000228)
20100715: Eukaryota_PTN000145640 is found in MutSbeta complex (GO:0032302)
20100715: Eukaryota_PTN000145640 is found in MutSalpha complex (GO:0032301)
20100715: Eukaryota_PTN000145508 is found in synaptonemal complex (GO:0000795)
20100715: Eukaryota_PTN000145560 is found in MutSalpha complex (GO:0032301)
20100715: Eukaryota_PTN000145706 is NOT found in mismatch repair complex (GO:0032300)
20100715: Eukaryota_PTN000145706 is NOT found in nuclear chromosome (GO:0000228)
20100715: Eukaryota_PTN000145772 is NOT found in mismatch repair complex (GO:0032300)
20100715: Eukaryota_PTN000145508 is NOT found in mismatch repair complex (GO:0032300)
# biological_process
20100719: Eukaryota_PTN000145761 participates in mitochondrial DNA repair (GO:0043504)
20100719: node_PTN000145505 participates in mismatch repair (GO:0006298)
20100719: node_PTN000145507 participates in reciprocal meiotic recombination (GO:0007131)
20100719: node_PTN000145507 participates in meiotic mismatch repair (GO:0000710)
20100719: Eukaryota_PTN000145640 participates in negative regulation of reciprocal meiotic recombination (GO:0045128)
20100719: Eukaryota_PTN000145640 participates in meiotic gene conversion (GO:0006311)
20100719: Eukaryota_PTN000145640 participates in maintenance of DNA repeat elements (GO:0043570)
20100719: Eukaryota_PTN000145640 participates in postreplication repair (GO:0006301)
20100719: Ascomycota_PTN000145678 participates in removal of nonhomologous ends (GO:0000735)
20101216: Bilateria_PTN000145643 participates in intra-S DNA damage checkpoint (GO:0031573)
20101216: Bilateria_PTN000145643 participates in response to UV-B (GO:0010224)
20101216: Bilateria_PTN000145643 participates in double-strand break repair (GO:0006302)
20101216: Bilateria_PTN000145643 participates in response to X-ray (GO:0010165)
20101216: Bilateria_PTN000145643 participates in intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator (GO:0042771)
20100719: Euteleostomi_PTN000145645 participates in isotype switching (GO:0045190)
20100719: Euteleostomi_PTN000145645 participates in somatic hypermutation of immunoglobulin genes (GO:0016446)
20100719: Eukaryota_PTN000145508 participates in chiasma assembly (GO:0051026)
20100719: Eukaryota_PTN000145560 participates in negative regulation of DNA recombination (GO:0045910)
20100719: Eukaryota_PTN000145560 participates in response to UV (GO:0009411)
20100719: Eukaryota_PTN000145560 participates in maintenance of DNA repeat elements (GO:0043570)
20100719: Eukaryota_PTN000145706 does NOT participate in meiotic mismatch repair (GO:0000710)
20100719: Dictyostelium discoideum_DDB_G0283957 does NOT participate in meiotic mismatch repair (GO:0000710)
20100719: Caenorhabditis elegans_WBGene00001872 does NOT participate in meiotic mismatch repair (GO:0000710)
20100719: Eukaryota_PTN000145508 does NOT participate in meiotic mismatch repair (GO:0000710)
# WARNINGS - THE FOLLOWING HAVE BEEN REMOVED FOR THE REASONS NOTED
# NOTES
Phylogeny
The core of this family consists of five paralogous clades of eukaryotic mismatch repair proteins descended from nodes AN27, AN79, AN159, AN225, and AN278 and corresponding to MSH5, MSH6, MSH2, MSH4, and MSH3. The MSH3 clade also contains fungal MSH1. Outgroups (moving back from the core) are: Methanosarcina (i.e., archaeal) mutS, a bacterial mutS clade, and a mostly unannotated clade (AN1) that appears to contain bacterial mutSB proteins and some plant homologs. This analysis will ignore AN1 and treat AN24 as the root. Of the five major clades, the branch lengths are longest to MSH5 and MSH4, which are believed to be meiosis-specific (review: PMID 16464007). Note that MSH2 is known to form complexes with MSH6 and MSH3; these are called MutSalpha and MutSbeta, respectively. E.
coli mutS forms dimers on DNA (PMID 11048711); interestingly, although both subunits are mutS, they adopt different configurations, and the authors refer to the dimer as a "heterodimer." Yeast Msh2p forms heterodimers with Msh3p or Msh6p, all of which account for most of the length of mutS. BLAST alignments of E. coli mutS (full length: 853 aa) with yeast proteins show that the yeast proteins align as follows:

yeast protein    mutS residues
Msh2p            285-802
Msh3p            11-803
Msh6p            7-162 and 246-819
Msh1p            1-413 and 542-793
Msh4p            120-776
Msh5p            566-791

SGD headlines indicate that Msh1p is mitochondrial-specific and that Msh4p and Msh5p form a heterodimer that acts during meiosis. So, it looks like a single bacterial protein with heterodimer and homodimer qualities evolved into a set of specialized heterodimers that act in different organelles/times.

MF
-DNA binding: There are 14 DNA-binding terms used throughout the family (including "chromatin binding"). Six are different types of mismatched DNA binding. Propagate GO:0030983 "mismatched DNA binding" to AN24 and propagate the more specific terms within clades. "Four-way junction" and "Y-form DNA binding" are terms used to describe mismatch recognition in in vitro assays, so propagate them within clades, too. Propagate 30983 last to reduce redundancy. There are also many annotations to GO:0003684 "damaged DNA binding," so propagate this to AN24 as well. Do not propagate "chromatin" or "centromeric DNA binding," as these sequences are highly diverse between species.
-ATPase activity: Annotations to "ATPase activity," including its child "DNA-dependent ATPase activity," are found in the MSH1, 2, and 6 clades, plus E. coli mutS. UniProt shows the ATP binding domain at aa 614-621; this region is present in the BLAST hits for all 6 yeast proteins. Propagate the child term to AN24.

CC
-By definition, MSH2 and MSH6 should be annotated to the MutSalpha complex, and MSH2 and MSH3 should be annotated to the MutSbeta complex.
Propagate "mismatch repair complex" to AN24 to cover the rest.
-Propagate "mitochondrion" to MSH1 clade. Note that MSH1 is not nuclear.
-Chromosomal annotations: MSH4 and 5 have annotations to "synaptonemal complex," which is part_of "condensed nuclear chromosome." There are "nuclear chromosome" annotations (but not "condensed") for MSH2 and 6; since 3 binds 2, it is reasonable to believe that 3 is also found there. Propagate "synaptonemal complex" to the MSH4 and 5 clades; propagate "nuclear chromosome" to AN26 (to exclude bacteria and archaea). MSH1 is found in the mitochondrion, not the nucleus, so block propagation of "nuclear chromosome" to MSH1 clade.

BP
The vast majority of the annotations are IMPs (or IGIs), with only a handful of IDAs. Try to focus on the molecular events and meiotic/mitotic events, but avoid multicellular processes and developmental terms.
-"Chiasma assembly" (51026) is found on worm him-14, but the parent term "synapsis" (7129) is found on mouse and Arabidopsis MSH4. 51026 is the better term, and actually refers to the links between chromosomes, so propagate to the MSH4 clade. Also propagate to the MSH5 clade since it's found in 2 proteins there.
-S. cerevisiae and S. pombe MSH2 and -3 have annotations to GO:0000735 "removal of nonhomologous ends," GO:0007534 "gene conversion at mating-type locus," and GO:0006312 "mitotic recombination," which aren't found anywhere else. Propagate 7534 to entire fungal MSH2 and -3 clades.
-There are annotations to GO:0000710 "meiotic mismatch repair" in the MSH2, 3, and 6 clades, plus annotations to "meiosis" or "meiosis I" (parent terms) in clades 4 and 5. Since these are MMR proteins, propagate 710 to the eukaryotic ancestor (AN26). Also propagate GO:0006298 "mismatch repair" to AN24 to cover bacteria.
-Do not propagate GO:0007128 "meiotic prophase I" as it is only found once (in AtMSH4) and is the only (IMP) annotation here to describe a time period in meiosis instead of a molecular event.
-Propagate GO:0045143 "homologous chromosome segregation" within the MSH4 and -5 clades.
-Propagate GO:0007131 "reciprocal meiotic recombination" to AN26; this covers GO:0006310 "DNA recombination." Propagate the corresponding regulatory terms GO:0045128 "negative regulation of reciprocal meiotic recombination" and GO:0045910 "negative regulation of DNA recombination" to the MSH2 and MSH6 clades, respectively. Also propagate the parent term GO:0006310 "DNA recombination" to AN24 to cover bacteria. Block 7131 from propagating to MSH1 clade by IDS since this clade acts in mitochondria and there is an existing yeast NOT annotation.
-Propagate GO:0008630 "DNA damage response, signal transduction resulting in induction of apoptosis" to MSH6 metazoa (AN82) and GO:0042771 "DNA damage response, signal transduction by p53 class mediator resulting in induction of apoptosis" to MSH2 metazoa (AN162).
-Propagate GO:0009411 "response to UV" to the MSH6 clade.
-Propagate GO:0010224 "response to UV-B" and GO:0010165 "response to X-ray" to the MSH2 clade, but restrict both to metazoa (AN162).
-Propagate GO:0006301 "postreplication repair" and GO:0006302 "double-strand break repair" to the MSH2 clade.
-Propagate GO:0043504 "mitochondrial DNA repair" to the MSH1 clade.
-Do not propagate GO:0030466 "chromatin silencing at silent mating-type cassette," a stray annotation from an HTP paper.
-Propagate GO:0043570 "maintenance of DNA repeat elements" to the MSH2 and MSH6 clades.
-Also propagate GO:0043570 to MSH3 clade (AN278), but STOP propagation to MSH1 clade per S. pombe curator request (SF item 3073335). Specifically, this annotation was based on 2 annotations in the nearby MSH3 clade. One (PMID 16388310) refers to CAG repeats in an HPRT construct integrated into a human cell line. The other (PMID 11333219) refers to GT repeats in pombe ade6. Since the pombe mitochondrial genome has negligible repeats, these phenomena do not seem relevant here.
-Propagate GO:0045190 "isotype switching" to the vertebrate MSH2 and MSH6 clades, GO:0016447 "somatic recombination of immunoglobulin gene segments" to vertebrate MSH3s, and GO:0016446 "somatic hypermutation of immunoglobulin genes" to vertebrate MSH2s and MSH6s (but not to vertebrate MSH3s, since the only annotation there is a NOT).
-Propagate GO:0031573 "intra-S DNA damage checkpoint" and GO:0007050 "cell cycle arrest" to the MSH2 clade.

Questions for MOD curators:
Phylogeny
-Arabidopsis: Is locus:2087193 (AT3G24320) placed correctly? Currently, it is one of three Arabidopsis paralogs in the MSH6 clade, but it's the most divergent of the three. Also, as is unique to the MSH1s, it is involved in mitochondrial processes (1 CC and 2 BP EXP annotations). Furthermore, it's actually named MSH1.
MF
-All organisms with "protein binding" annotations: Is there a more specific term you can use?
Human (EBI) replies, July 29, 2010 7:53:51 AM EDT: I have changed a few protein binding annotations to either 'enzyme binding' or 'protein kinase binding'
BP
-All organisms with annotations to GO:0043570 "maintenance of DNA repeat elements": Do we need to request additional child terms to specify the class of DNA repeats?
-Mouse: Is the Msh2 "ATP catabolic process" IMP valid, or is it addressed by the "ATPase activity" MF annotation from the same paper?
-Mouse: Please verify the IMPs to GO:0043066 "negative regulation of apoptosis," GO:0043524 "negative regulation of neuron apoptosis," and GO:0006119 "oxidative phosphorylation." Are these valid or just downstream effects?

Questions for ontology curators
-Does "DNA-dependent ATPase activity" has_part "DNA binding activity"?
-Does "ATPase activity" has_part "magnesium ion binding"?

Added NOTs to reflect the loss of DNA repair functionality for MSH4/5 complex. Also added NOTs to reflect problematic tree location of MSH1 clade.
MSL, updated 2010 Dec 16 PDT, 2011 June 13
# REFERENCE
Annotation inferences using phylogenetic trees
The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
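
The inheritance-by-descent rule described above, combined with the NOT/block decisions recorded in the notes, can be illustrated with a minimal Python sketch. It is not PAINT's implementation. The tree below is a hypothetical toy topology (the labels "bacterial_mutS", "MSH2_clade", "MSH1_leaf", and so on are invented for illustration and do not reproduce the real PANTHER tree), and the function name `propagate` is likewise an assumption. The three propagation decisions and the one block are taken from the CC notes.

```python
from collections import defaultdict

# Hypothetical toy tree (parent -> children); NOT the real PANTHER topology.
TREE = {
    "AN24": ["bacterial_mutS", "AN26"],
    "AN26": ["MSH2_clade", "MSH6_clade", "MSH3_clade"],
    "MSH3_clade": ["MSH3_leaf", "MSH1_clade"],
    "MSH1_clade": ["MSH1_leaf"],
}

def propagate(tree, annotated, blocked):
    """Return {node: set of GO terms} after top-down inheritance.

    annotated: terms a curator attached directly to a node.
    blocked:   terms stopped at a node (a NOT); the block also applies
               to everything below that node.
    """
    inherited = defaultdict(set)

    def visit(node, terms_from_parent):
        # Inherit from the parent, add direct annotations, then apply blocks.
        terms = (terms_from_parent | annotated.get(node, set())) - blocked.get(node, set())
        inherited[node] = terms
        for child in tree.get(node, []):
            visit(child, terms)

    # Roots are nodes that never appear as a child.
    children = {c for kids in tree.values() for c in kids}
    for root in set(tree) - children:
        visit(root, set())
    return dict(inherited)

annotated = {
    "AN24": {"GO:0032300"},        # mismatch repair complex
    "AN26": {"GO:0000228"},        # nuclear chromosome
    "MSH1_clade": {"GO:0005739"},  # mitochondrion
}
# Block "nuclear chromosome" at the MSH1 clade, as decided in the CC notes.
blocked = {"MSH1_clade": {"GO:0000228"}}

result = propagate(TREE, annotated, blocked)
for node in sorted(result):
    print(node, sorted(result[node]))
```

In this sketch a block wins over a direct annotation on the same node, and a blocked term is never re-inherited further down; whether PAINT resolves such conflicts the same way is not stated in these notes.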