****************************************
****************************************
** DEPRECATED!!!!!
**
** This has been replaced by:
** http://www.geneontology.org/doc/GO.style_guide
**
****************************************
****************************************
!version: $Revision: 1.9 $
!date: $Date: 2010/03/09 16:58:33 $
!
!Gene Ontology
!style_guide
!
!editors: Michael Ashburner (FlyBase), Midori Harris (GO), Judith Blake (MGD)
!Leonore Reiser (TAIR), Karen Christie (SGD) and colleagues
!with software by Suzanna Lewis (FlyBase Berkeley).
!

GO STYLE GUIDE

This document is written for curators of GO. Others may find it of
some interest.

I. Ontology term conventions.
II. Annotation conventions.
III. Ontology-building conventions.

Ontology term conventions cover the syntax we would like to follow
when adding GO terms and definitions. Annotation conventions cover how
species databases annotate their gene products to GO. The
Ontology-building conventions are an attempt at a guide to changing
relationships in GO, as well as to adding and deleting ontology terms.
Miscellaneous observations are simply things that don't fit elsewhere,
and the Appendix: Relevant past email is just that.

This document should be read with go/doc/GO.doc. Other useful
documents are go/doc/GO.sgd_annotation_guide and go/doc/GO.evidence.

I. ONTOLOGY TERM CONVENTIONS.

Ontology terms describe a molecular function, biological process or
cellular component. They are not names of gene products. If you create
a new term, or refine a term, define it in go/doc/GO.defs.

I.1. All lower case except where demanded by context, e.g. DNA, not
dna.

I.2.1. Use the singular, except where a term is _only_ used in the
plural (e.g. caveolae).

I.2.2. Where the accepted spelling differs between English and US
usage, use the US form, e.g. polymerizing, signaling, rather than
polymerising, signalling.
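Conventions I.1 and I.2.2 lend themselves to a mechanical check. The
following is a minimal sketch only, not part of any GO tooling: the
function name check_term_name, the allow-list CASE_EXCEPTIONS and the
table UK_TO_US are hypothetical, and their contents are limited to the
examples given above (DNA, polymerising, signalling).

```python
import re

# Hypothetical allow-list: tokens that keep their capitals by context (I.1).
CASE_EXCEPTIONS = {"DNA"}

# English -> US spellings, taken only from the examples in I.2.2.
UK_TO_US = {"polymerising": "polymerizing", "signalling": "signaling"}


def check_term_name(name):
    """Return a list of style problems found in a GO term name."""
    problems = []
    # Split the name into alphabetic tokens; hyphens and digits are ignored.
    for token in re.findall(r"[A-Za-z]+", name):
        if token != token.lower() and token not in CASE_EXCEPTIONS:
            problems.append(f"'{token}' should be lower case (I.1)")
        us_form = UK_TO_US.get(token.lower())
        if us_form is not None:
            problems.append(f"use US spelling '{us_form}' for '{token}' (I.2.2)")
    return problems


# A name that follows both conventions yields no problems; the second
# name breaks both I.1 and I.2.2.
print(check_term_name("DNA-directed DNA polymerase"))
print(check_term_name("DNA polymerising Activity"))
```

A real check would need a much fuller exception list; the dictionary
file go/doc/GO.word_dictionary would be the natural source for one.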
There is a dictionary of 'words' used in GO terms in the file
go/doc/GO.word_dictionary. This will be updated periodically until it
can be generated automatically by the database. If in doubt, check
this list or consult Michael or Midori.

I.3. Database cross references take the form database:ID. The
abbreviations to be used for external databases are in
go/doc/GO.xref_abbs.

I.4. If an authority for a term or group of terms exists, use it. Add
that authority to go/doc/GO.bib. One minor exception is that the EC
names (that is, those in the DE line of the "ENZYME nomenclature
database") begin with a capital letter; GO replaces this with a lower
case letter (except when the initial capital has obvious semantic
meaning, as in 'L-xylulose reductase').

I.5. Aim to be reasonably descriptive, even at the risk of some verbal
redundancy. Remember that the finest level term may be all that
appears in a genome database (FB, MGD, SGD), and something like
"type V" may be very uninformative; if the parent is "collagen", then
the child should be "collagen type V".

I.6. Use full element names, not symbols (copper and zinc rather than
Cu and Zn). Use "hydrogen" for H+. Use copper(II), copper(III), etc.,
rather than cuprous, cupric, etc.

I.7. EC/TC database cross references.

I.7.1. The authoritative source of EC numbers is:
http://www.chem.qmw.ac.uk/iubmb/enzyme/
(and not the SIB ENZYME database).

I.7.2. EC numbers are not used in cellular_component.

I.8. By convention, the catalytic and regulatory subunits of an enzyme
are described in molecular_function as parts_of the holoenzyme, e.g.:

  %enzyme ; GO:0003824
  %protein kinase ; GO:0004672
  %protein serine/threonine kinase ; GO:0004674 ; EC:2.7.1.37
  %casein kinase ; GO:0004680
  %casein kinase II ; GO:0004682

' character is necessary use '\>'.

Greek letters are spelled out, e.g. alpha, beta, etc.

I.11. As a general rule, anatomical qualifiers of terms are not to be
used.
For example, GO has the term "DNA-directed DNA polymerase" but neither
"nuclear DNA polymerase" nor "mitochondrial DNA polymerase". Such
cases are handled by annotators making independent attributions using
both (in this case) the molecular_function and cellular_component
ontologies.

I.12. Use of 'sensu' in terms. 'sensu' is used to restrict the meaning
of a term to a particular taxonomic group; use Latin for describing
taxonomic groups. Use as: term (sensu Group), e.g.:

  %gastrulation (sensu Insecta)

If more than one taxonomic group is included, separate them by commas,
e.g.:

  %female meiotic spindle assembly (sensu Drosophila, sensu Mus) ; GO:0007056

II. ANNOTATION CONVENTIONS.

II.1. Annotate gene products in each species database to the most
detailed level in the ontology that correctly describes the biology of
the gene product.

II.2. A gene product can be annotated to zero or more nodes of each
ontology.

II.3. Uncertain knowledge of where a gene product operates should be
denoted by annotating it to two nodes, one of which can be a parent of
the other. For instance, a yeast gene product known to be in the
nucleolus, but also experimentally observed in the nucleus generally,
can be annotated to both nucleolus and nucleus in the
cellular_component ontology. Even though annotation to nucleolus alone
implies that a gene product is also in the nucleus, annotate to both
so as to indicate explicitly that it has been reported in the two
locations. Similar reports of general and specific molecular function
or biological process for a gene product could be handled the same way
(come up with some good examples). You can also annotate to multiple
nodes that conflict with each other if there are conflicting claims in
the literature.

II.4. All annotations in genome databases must:
  - be attributed
  - include the evidence for the attribution.

III. ONTOLOGY-BUILDING CONVENTIONS.

III.1. What is included.

III.1.1. Do not include processes that occur only in mutant organisms.

III.1.2.
cellular_component includes multisubunit enzymes and other proteins,
but not individual proteins or nucleic acids.

III.2. All three ontologies (molecular_function, biological_process,
and cellular_component) are directed acyclic graphs. That means that
any term can have more than one parent term.

III.3. Synonyms.

III.3.1. It is useful to add synonyms to terms. GO numbers are not
attached to synonyms. Abbreviations can be regarded as a special class
of synonym. The number of synonyms for a term is not limited. The
syntax is:

  term ; synonym:synonym ; synonym:synonym

III.3.2. One synonym can be used for more than one GO term. This is
necessary to deal with the way biologists use their vocabulary.
Viewers that allow queries by synonym will need to make it clear when
they are returning items from more than one GO node.

III.4. Dependent ontology terms. Some base GO terms imply the presence
of others in the ontology. Examples are:

Process Ontology:
  If either "X biosynthesis" or "X catabolism" exists, then the parent
  "X metabolism" must also exist.
  If "X regulation" exists, then the process "X" must also exist.
  Potentially any process in the ontology can be regulated.

III.5. Naming different pathways leading to the same product. Where
there are several biosynthetic pathways leading to the same product,
we list each of them as an instance of a general pathway. For
instance, we have:

  %phosphatidylethanolamine biosynthesis ; GO:0006646
  %phosphatidyl-N-monomethylethanolamine (PMME) biosynthesis ; GO:0006647
  %dihydrosphingosine-1-P pathway ; GO:0006648

Naming is easy for well-known pathways (glycolysis and the
pentose-phosphate pathway are two ways to accomplish glucose
catabolism), but harder for nameless minor pathways, such as the
dihydrosphingosine-1-P pathway above. So that we do not mistakenly
give two names to the same minor pathway, use the name of the first
intermediate, as a synonym if not as the primary GO name.
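
The dependency rule in III.4 can be checked mechanically over a list
of term names. The following is a minimal sketch only, not part of any
GO tooling: the function name missing_dependent_terms is hypothetical,
and it assumes that term names end literally in " biosynthesis",
" catabolism" or " regulation", as in the patterns quoted in III.4.
The term names in the usage lines are illustrative.

```python
def missing_dependent_terms(term_names):
    """Return, sorted, the term names that III.4 implies but that are absent."""
    names = set(term_names)
    missing = set()
    for name in names:
        # "X biosynthesis" or "X catabolism" requires the parent "X metabolism".
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in names:
                    missing.add(parent)
        # "X regulation" requires the process "X" itself.
        suffix = " regulation"
        if name.endswith(suffix):
            process = name[: -len(suffix)]
            if process not in names:
                missing.add(process)
    return sorted(missing)


# Illustrative term names only; "glucose metabolism" is present, so only
# the parent of the biosynthesis term is reported as missing.
terms = [
    "glucose catabolism",
    "glucose metabolism",
    "phosphatidylethanolamine biosynthesis",
]
print(missing_dependent_terms(terms))
```

The check works on names alone and so would miss terms whose names do
not follow the "X biosynthesis" pattern, such as the
dihydrosphingosine-1-P pathway in III.5.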