****************************************
****************************************
** DEPRECATED!!!!!
**
** This has been replaced by:
**   http://www.geneontology.org/doc/GO.style_guide
**
****************************************
****************************************

!version: $Revision: 1.9 $
!date: $Date: 2010/03/09 16:58:33 $
!
!Gene Ontology
!style_guide
!
!editors: Michael Ashburner (FlyBase), Midori Harris (GO), Judith Blake (MGD)
!Leonore Reiser (TAIR), Karen Christie (SGD) and colleagues
!with software by Suzanna Lewis (FlyBase Berkeley).
!
GO STYLE GUIDE

This document is written for curators of GO.  Others may find it of some 
interest.

I.    Ontology term conventions.
II.   Annotation conventions.
III.  Ontology-building conventions.

Ontology term conventions deal with the syntactical conventions we
would like to follow when adding GO terms and definitions, and
Annotation conventions deal with how species databases annotate their
gene products to GO.  The Ontology-building conventions are an
attempt at a guide to changing relationships in GO, as well as adding
and deleting ontology terms.  Miscellaneous observations are simply
things that don't fit elsewhere, and the Appendix: Relevant past email
is just that.

This document should be read with go/doc/GO.doc.  Other useful
documents are go/doc/GO.sgd_annotation_guide and go/doc/GO.evidence.


I.  ONTOLOGY TERM CONVENTIONS.

Ontology terms describe a molecular function, biological process or
cellular component.  They are not names of gene products.

If you create a new term, or refine a term, define it in go/doc/GO.defs.

I.1.  Use lower case except where context demands otherwise, e.g. DNA,
not dna.

I.2.1.  Use the singular, except where a term is _only_ used in the plural
    (e.g. caveolae).

I.2.2.  Where the accepted spelling differs between British and US usage,
use the US form, e.g. polymerizing, signaling, rather than polymerising,
signalling.

There is a dictionary of 'words' used in GO terms in the file
go/doc/GO.word_dictionary.  This will be updated periodically until
it can be generated automatically from the database.  If in doubt,
check this list or consult Michael or Midori.

I.3.  Write database cross references as database:ID.
  
The abbreviations to be used for external databases are to be found in
go/doc/GO.xref_abbs.
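
The database:ID form can be checked mechanically.  The following is a
minimal sketch in Python, not part of the GO software: the function name
split_xref is hypothetical, and splitting on the first colon only is an
assumption.  Checking the abbreviation against go/doc/GO.xref_abbs is left
out because that file is not reproduced here.

```python
def split_xref(xref):
    """Split 'database:ID' into (database, ID); raise ValueError if malformed."""
    # Assumption: only the first colon separates database from ID,
    # so an ID that itself contains a colon is kept whole.
    database, sep, identifier = xref.partition(":")
    database = database.strip()
    identifier = identifier.strip()
    if not sep or not database or not identifier:
        raise ValueError("expected database:ID, got %r" % xref)
    return database, identifier
```

For instance, split_xref("EC:2.7.1.37") separates the abbreviation "EC"
from the identifier "2.7.1.37".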

I.4.  If an authority for a term or group of terms exists then use it. Add
that authority to go/doc/GO.bib. One minor exception is that the EC
names (that is, those in the DE line of the "ENZYME nomenclature database")
begin with a capital letter; GO replaces this with a lower case letter (except
when the initial capital has obvious semantic meaning, as in
'L-xylulose reductase').

I.5.  Aim to be reasonably descriptive, even at the risk of some
verbal redundancy.  Remember that the finest-level term may be all that
appears in a genome database (FB, MGD, SGD), and a term such as "type V"
on its own is very uninformative; if the parent is "collagen", then the
child should be "collagen type V".

I.6.  Use full element names, not symbols (copper and zinc rather than Cu
and Zn). Use "hydrogen" for H+. Use copper(I), copper(II), etc., rather
than cuprous, cupric, etc.


I.7.  EC/TC database cross references.

I.7.1. The authoritative source of EC numbers is:

	http://www.chem.qmw.ac.uk/iubmb/enzyme/

(and not the SIB ENZYME database).

I.7.2. EC numbers are not used in cellular_component.

I.8.  As a convention the catalytic and regulatory subunits of an enzyme
are described in molecular_function as parts_of the holoenzyme.

e.g.:

  %enzyme ; GO:0003824
   %protein kinase ; GO:0004672
    %protein serine/threonine kinase ; GO:0004674 ; EC:2.7.1.37
     %casein kinase ; GO:0004680
      %casein kinase II ; GO:0004682
       <casein kinase II catalyst ; GO:0008604
       <casein kinase II regulator ; GO:0008605
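
The indented listing above can be read into explicit relationships.  The
following Python sketch is illustrative only and is not the GO software.
It assumes, as in the GO flat files, that '%' marks an is-a child and '<'
a part-of child, that deeper indentation means a child of the nearest
shallower line, and that fields are separated by ';'.  The names RELATION,
EXAMPLE and parse_lines are hypothetical.

```python
RELATION = {"%": "is_a", "<": "part_of"}

EXAMPLE = [
    "  %enzyme ; GO:0003824",
    "   %protein kinase ; GO:0004672",
    "    %protein serine/threonine kinase ; GO:0004674 ; EC:2.7.1.37",
    "     %casein kinase ; GO:0004680",
    "      %casein kinase II ; GO:0004682",
    "       <casein kinase II catalyst ; GO:0008604",
    "       <casein kinase II regulator ; GO:0008605",
]

def parse_lines(lines):
    """Return (child_id, relation, parent_id) triples from indented lines."""
    stack = []    # (indent, go_id) for the current chain of ancestors
    triples = []
    for line in lines:
        body = line.strip()
        if not body:
            continue
        indent = len(line) - len(line.lstrip(" "))
        relation = RELATION[body[0]]
        fields = [field.strip() for field in body[1:].split(";")]
        # The first field is the term name; the GO id is among the rest.
        go_id = next(f for f in fields[1:] if f.startswith("GO:"))
        # Discard stack entries that are not shallower than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            triples.append((go_id, relation, stack[-1][1]))
        stack.append((indent, go_id))
    return triples
```

Applied to EXAMPLE, the two subunit lines each yield a part_of triple whose
parent is GO:0004682, and every other non-root line yields an is_a triple.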

I.9.  Subscripts/superscripts: superscripted and subscripted characters
are not indicated.

I.10. Special characters and Greek.  Special characters may be escaped with
a backslash, e.g. if the '>' character is necessary use '\>'.  Greek
letters are spelled out, e.g. alpha, beta, etc.

I.11. As a general rule anatomical qualifiers of terms are not to be used.

For example, GO has the term "DNA-directed DNA polymerase" but neither
"nuclear DNA polymerase" nor "mitochondrial DNA polymerase".

Annotators handle these cases by making independent attributions using
both (in this case) the molecular_function and cellular_component
ontologies.

I.12. Use of 'sensu' in terms.

'sensu' restricts the meaning of a term to a particular taxonomic group;
use Latin names for taxonomic groups.

Use as:

term (sensu Group), e.g.:

	%gastrulation (sensu Insecta)

If more than one taxonomic group is included, separate them with commas, e.g.:
	%female meiotic spindle assembly (sensu Drosophila, sensu Mus) ; GO:0007056
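
A short Python sketch of how a tool might separate a term name from its
'sensu' qualifier, following the two examples above.  It is not taken from
the GO software; the names SENSU and parse_sensu are hypothetical, and the
input is assumed to be the term name only, without the leading '%' or the
trailing GO id.

```python
import re

# Assumption: the qualifier is the final parenthesised group and
# starts with the word 'sensu'.
SENSU = re.compile(r"^(?P<term>.+?)\s*\((?P<groups>sensu [^)]*)\)$")

def parse_sensu(name):
    """Return (base term, list of taxonomic groups) for a term name."""
    name = name.strip()
    match = SENSU.match(name)
    if match is None:
        return name, []
    groups = []
    for part in match.group("groups").split(","):
        part = part.strip()
        # Each group repeats the word 'sensu', as in the example above.
        if part.startswith("sensu "):
            part = part[len("sensu "):]
        groups.append(part)
    return match.group("term"), groups
```

A name with no qualifier, such as "DNA-directed DNA polymerase", comes back
unchanged with an empty list of groups.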


II.  ANNOTATION CONVENTIONS.

II.1.  Annotate gene products in each species database to the most
detailed level in the ontology which correctly describes the biology of
the gene product. 

II.2.  A gene product can be annotated to zero or more nodes of each
ontology.

II.3.  Uncertain knowledge of where a gene product operates should be denoted
by annotating it to two nodes, one of which can be a parent of the
other.  For instance, a yeast gene product known to be in the nucleolus,
but also experimentally observed in the nucleus generally, can be
annotated to both nucleolus and nucleus in the cellular_component
ontology. Even though annotation to nucleolus alone implies that a gene
product is also in the nucleus, annotate to both so as to explicitly
indicate that it has been reported in the two locations.  Similar reports
of general and specific molecular function or biological process for a
gene product could be handled the same way (good examples are still
needed).  You can also annotate to multiple nodes that conflict with each
other if there are conflicting claims in the literature.

II.4. Every annotation in a genome database must:

   - be attributed
   - include the evidence for the attribution.

III.  ONTOLOGY-BUILDING CONVENTIONS.

III.1.  What is included.

III.1.1.  Do not include processes that occur only in mutant organisms.

III.1.2.  cellular_component includes multisubunit enzymes and other proteins,
but not individual proteins or nucleic acids.

III.2.  All three ontologies (molecular_function, biological_process, and
cellular_component) are directed acyclic graphs.  That means that any
term can have more than one parent term, but no term can be its own
ancestor.
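
The two properties of a directed acyclic graph, multiple parents and no
cycles, can be sketched in Python as follows.  This is an illustration
only: the term names in PARENTS are placeholders rather than real GO terms,
and the names ancestors and is_acyclic are hypothetical.

```python
# Placeholder graph: each term maps to the list of its parent terms.
PARENTS = {
    "child": ["parent one", "parent two"],
    "parent one": ["root"],
    "parent two": ["root"],
}

def ancestors(term, parents):
    """All terms reachable by following parent links upward from term."""
    seen = set()
    todo = list(parents.get(term, ()))
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, ()))
    return seen

def is_acyclic(parents):
    """True if no term can be reached from itself through its parents."""
    return all(term not in ancestors(term, parents) for term in parents)
```

Here "child" has two parents and both lead to "root", which a strict tree
would not allow but a directed acyclic graph does.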

III.3.  Synonyms.

III.3.1.  It is useful to add synonyms to terms. GO numbers are not attached to 
synonyms.  Abbreviations can be regarded as a special class of synonym.
The number of synonyms for a term is not limited.

The syntax is:

    term ; synonym:synonym ; synonym:synonym
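
A minimal Python sketch of reading this syntax.  It assumes that fields
are separated by ' ; ' and that each synonym field starts with the literal
prefix 'synonym:'.  The function name split_synonyms and the example line
in the note below are hypothetical.

```python
def split_synonyms(line):
    """Return (term, list of synonyms) from 'term ; synonym:x ; synonym:y'."""
    fields = [field.strip() for field in line.split(" ; ")]
    term = fields[0]
    prefix = "synonym:"
    # Fields without the prefix (for example a GO id) are ignored here.
    synonyms = [f[len(prefix):].strip() for f in fields[1:] if f.startswith(prefix)]
    return term, synonyms
```

A line such as "some term ; synonym:first name ; synonym:abbr" would give
the term "some term" with the two synonyms "first name" and "abbr".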

III.3.2.  One synonym can be used for more than one GO term.  This is necessary
to deal with the way biologists use their vocabulary.  Viewers that allow
queries by synonyms will need to make it clear that they are returning
items from more than one GO node if that happens.

III.4.  Dependent ontology terms.

Some base GO terms imply the presence of others in the ontology.  Examples
are:

Process Ontology:

If either "X biosynthesis" or "X catabolism" exists, then the parent "X
metabolism" must also exist.

If "X regulation" exists, then the process "X" must also exist.  Potentially
any process in the ontology can be regulated.
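
These two rules lend themselves to an automatic check.  The Python sketch
below is an illustration, not an existing GO tool; it assumes that term
names literally end in " biosynthesis", " catabolism" or " regulation", and
the function name missing_dependencies is hypothetical.

```python
def missing_dependencies(terms):
    """Return, sorted, the term names the rules above require but that are absent."""
    terms = set(terms)
    missing = set()
    for name in terms:
        # "X biosynthesis" or "X catabolism" requires "X metabolism".
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in terms:
                    missing.add(parent)
        # "X regulation" requires the process "X" itself.
        if name.endswith(" regulation"):
            process = name[: -len(" regulation")]
            if process not in terms:
                missing.add(process)
    return sorted(missing)
```

Given only "phosphatidylethanolamine biosynthesis", the check reports that
"phosphatidylethanolamine metabolism" is required.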

III.5.  Naming different pathways leading to the same product.

Where there are several biosynthetic pathways leading to the same product,
we list each of them as an instance of a general pathway.  For instance,
we have:

        %phosphatidylethanolamine biosynthesis ; GO:0006646
         %phosphatidyl-N-monomethylethanolamine (PMME) biosynthesis ; GO:0006647
         %dihydrosphingosine-1-P pathway ; GO:0006648

Naming is easy for well-known pathways (glycolysis and the
pentose-phosphate pathway are two ways to accomplish glucose catabolism),
but harder for nameless minor pathways, such as the dihydrosphingosine-1-P
pathway above.  So that we do not mistakenly give two names to the same minor
pathway, use the name of the first intermediate, as a synonym if not as
the primary GO name.