Annotation Conventions

This page contains guidelines which apply to all annotation methods and are particularly useful for manual literature-based annotation. More information on annotation can be found in the introduction to GO annotation and the GO annotation standard operating procedures.

See also the Annotation Camp minutes for additional information, including examples, on annotation practices and recommendations.

General recommendations

  • A gene product can be annotated to zero or more nodes of each ontology.
  • Annotation of a gene product to one ontology is independent of its annotation to other ontologies.
  • Annotate gene products in each species database to the most detailed level in the ontology that correctly describes the biology of the gene product.
  • Keep in mind that annotating to a term implies annotation to all parents via any path, so it is a good idea to check the parentage of a term before annotating (and request new terms or path corrections if necessary).
  • Uncertain knowledge of where a gene product operates should be denoted by annotating it to two nodes, one of which can be a parent of the other. For instance, a yeast gene product known to be in the nucleolus, but also experimentally observed in the nucleus generally, can be annotated to both nucleolus and nucleus in the cellular component ontology. Even though annotation to nucleolus alone implies that a gene product is also in the nucleus, annotate to both so as to explicitly indicate that it has been reported in the two locations. The two annotations may have the same or different supporting evidence. Similar reports of general and specific molecular function or biological process for a gene product could be handled the same way; for example, you may have direct experimental evidence (IDA) for DNA binding, but only a mutant phenotype (IMP) for the more specific function term transcription factor activity and the process transcription. You can also annotate to multiple nodes that conflict with each other if there are conflicting claims in the literature.
  • An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex', and is a way to capture information about what a complex does in the absence of database objects and identifiers representing complexes. For molecular function annotations, also see Using the Qualifier column below.
  • A gene product should be annotated with terms reflecting its normal activity and location. A function, process, or localization (component) observed only in a mutant or disease state is therefore not usually included. In some circumstances, however, what is "normal" is a matter of perspective, depending on the organism being annotated and on the point of view of the annotator. For example, many viruses use host proteins to carry out viral processes. The host protein is then doing something abnormal from the perspective of the host, but completely normal from the perspective of the virus. GO annotators handle these cases by including two taxon IDs in the Taxon column of the gene association file; see annotating gene products that interact with other organisms for how to handle these cases.
  • The evidence code No Data (ND) should be used as an indicator of curation status to denote gene products for which no relevant information could be found. It distinguishes gene products with no data available from those that have not yet been annotated. For more details on the code and its usage, please consult the ND evidence code documentation.
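The point above about annotation to a term implying annotation to all of its parents via any path can be sketched in code. The following is a minimal illustration only: the parent map is a hypothetical toy graph, not the real GO structure, and the function name implied_terms is invented for this example.

```python
# Toy parent map (hypothetical, for illustration only; not the real GO graph).
# A term may have more than one parent, so ancestors can be reached by several paths.
PARENTS = {
    "nucleolus": {"nucleus", "non-membrane-bounded organelle"},
    "nucleus": {"intracellular organelle"},
    "non-membrane-bounded organelle": {"intracellular organelle"},
    "intracellular organelle": {"cellular_component"},
    "cellular_component": set(),
}


def implied_terms(term, parents=PARENTS):
    """Return every ancestor of `term`, following all parent paths."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Listing the implied terms before annotating makes it easier to spot a
# parent that does not fit the biology of the gene product.
print(sorted(implied_terms("nucleolus")))
```

If any term in the printed list does not describe the gene product correctly, that is a signal to request a new term or a path correction rather than to annotate.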

Database Objects

Because a single gene may encode very different products with very different attributes, GO recommends associating GO terms with database objects representing gene products rather than genes. At present, however, many participating databases are unable to associate GO terms to gene products, and therefore use genes instead. If the database object is a gene, it is associated with all GO terms applicable to any of its products. See the annotation file format guide for more information.

References and Evidence

Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis.

The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term. A simple controlled vocabulary of evidence codes is used to capture this; please see the GO evidence code documentation for more information on the meaning and use of the evidence codes.

Using the Qualifier column

The Qualifier column is used for flags that modify the interpretation of an annotation. Allowable values are NOT, contributes_to, and colocalizes_with.

NOT

NOT may be used with terms from any of the three ontologies.

NOT is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as NOT GO:nnnnnnn. (Note: in an email exchange from Sept. 2003 this phenomenon was referred to as "sequence dissimilarity.")

NOT can also be used when a cited reference makes the negative statement explicitly (e.g. "our favorite protein is not found in the nucleus"). Prefixing a GO ID with the string NOT allows annotators to state that a particular gene product is NOT associated with a particular GO term. This usage of NOT was introduced to allow curators to document conflicting claims in the literature.

Note that NOT is used when a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. proves otherwise. (It is not generally used for negative or inconclusive experimental results.)

colocalizes_with

colocalizes_with may be used only with cellular component terms.

Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier. This qualifier may also be used in cases where the resolution of an assay is not accurate enough to say that the gene product is a bona fide component member.

Example (from Schizosaccharomyces pombe):

Clp1p relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the spindle pole body and the contractile ring (evidence from GFP fusion). Clp1p is annotated to spindle pole body ; GO:0005816 and contractile ring ; GO:0005826, using the colocalizes_with qualifier in both cases.

contributes_to

contributes_to may be used only with molecular function terms.

As noted above, an individual gene product that is part of a complex can be annotated to terms that describe the function of the complex. Many such function annotations should use the qualifier contributes_to:

Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of complex subunits that are not known to possess the activity of the complex must include the entry contributes_to in the Qualifier column. The contributes_to qualifier should not be used in biological process annotations. All gene products annotated using contributes_to must also be annotated to a cellular component term representing the complex that possesses the activity.

Annotations using contributes_to will often use the evidence code IC, but other codes may be used as well.

Note that contributes_to is not needed to annotate a catalytic subunit. Furthermore, contributes_to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not.

Examples

  • Subunits of nuclear RNA polymerases: none of the individual subunits have RNA polymerase activity, yet all of these subunits are annotated to DNA-dependent RNA polymerase activity (with the contributes_to note), to capture the activity of the complex.
  • ATP citrate lyase (ACL) in Arabidopsis: it is a heterooctamer, composed of two types of subunits, ACLA and ACLB, in an A(4)B(4) stoichiometry. Neither of the subunits expressed alone gives ACL activity, but co-expression results in ACL activity. Both subunits can be annotated to ATP citrate lyase activity.
  • eIF2: has three subunits (alpha, beta, gamma); one binds GTP; one binds RNA; the whole complex binds the ribosome (all three subunits are required for ribosome binding). So one subunit is annotated to GTP binding and one to RNA binding without qualifiers, and all three are annotated to ribosome binding with the contributes_to qualifier. All three are also annotated to the component term for eIF2 complex.
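The restrictions on which ontology each qualifier may be used with lend themselves to a simple automated check. The sketch below is one possible way to express them; the function name, the aspect labels, and the message wording are assumptions made for this illustration and are not part of any GO tool.

```python
# Ontologies each qualifier may be used with, as stated in this section.
# The aspect labels used as strings here are an assumption of this sketch.
ALLOWED_ASPECTS = {
    "NOT": {"molecular_function", "biological_process", "cellular_component"},
    "colocalizes_with": {"cellular_component"},
    "contributes_to": {"molecular_function"},
}


def qualifier_problem(qualifier, aspect):
    """Return None if the qualifier fits the ontology, else a short message."""
    if qualifier not in ALLOWED_ASPECTS:
        return f"unknown qualifier: {qualifier}"
    if aspect not in ALLOWED_ASPECTS[qualifier]:
        return f"{qualifier} may not be used with {aspect} terms"
    return None


print(qualifier_problem("contributes_to", "molecular_function"))
print(qualifier_problem("contributes_to", "biological_process"))
```

A fuller check could also confirm that every gene product annotated with contributes_to has a cellular component annotation to the relevant complex, as required above.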

Annotating gene products that interact with other organisms

The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.

For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the biological process documentation on multi-organism processes and in the cellular component guidelines on host cell.

The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format guide.

An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
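As a small illustration of the taxon column rules above, the following sketch builds the column value from one or two taxon IDs. The helper name taxon_column is hypothetical; the output format follows the worked examples in this section (for instance taxon:382|taxon:3880).

```python
def taxon_column(encoding_taxon, other_taxon=None):
    """Build the Taxon column value.

    encoding_taxon: taxon ID of the species encoding the gene product.
    other_taxon: taxon ID of the other organism in the interaction, if any.
    For an interaction within one species, pass the same ID twice.
    """
    ids = [encoding_taxon] if other_taxon is None else [encoding_taxon, other_taxon]
    return "|".join(f"taxon:{taxon_id}" for taxon_id in ids)


print(taxon_column(382, 3880))  # S. meliloti gene product acting on M. truncatula
print(taxon_column(382, 382))   # interaction between organisms of the same species
print(taxon_column(382))        # no second taxon, e.g. a similarity-based annotation
```

The order matters: the encoding species always comes first, so the same pair of organisms gives different column values depending on which gene product is being annotated.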

Nomenclature Conventions

The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology, they are used solely to differentiate between organisms on the basis of their size. The word symbiont is used to refer to the smaller organism in a symbiotic interaction; the larger organism is called the host. If the two organisms are the same size, the term name will contain other organism. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.

Requesting new terms in the multi-organism process node

Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the GO curator requests tracker in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:

  • A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation.
  • If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host.
  • Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should not be annotated to a 'symbiont' term like transcription of symbiont DNA. This is because this would be considered a pathological process, i.e. not 'normal' for the host.

Example: Performing a process with another organism

Nod factor export proteins transfer nod factors out of the purple bacterium Sinorhizobium meliloti into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in Medicago truncatula roots and initiate the process of nodulation.

Annotation of Nod factor export ATP-binding protein I from S. meliloti

suggest a new term induction of nodule morphogenesis in host

nodulation ; GO:0009877
[p] induction of nodule morphogenesis in host ; GO:00new01

Sinorhizobium meliloti taxonomy ID: 382
Medicago truncatula taxonomy ID: 3880

protein name: Nod factor export ATP-binding protein I
GO term: induction of nodule morphogenesis in host ; GO:00new01
taxon column: taxon:382|taxon:3880

Annotation of LysM receptor kinase LYK3 precursor from M. truncatula

suggest a new term induction of nodule morphogenesis by symbiont

nodulation ; GO:0009877
[p] induction of nodule morphogenesis by symbiont ; GO:00new02

Medicago truncatula taxonomy ID: 3880
Sinorhizobium meliloti taxonomy ID: 382

protein name: LysM receptor kinase LYK3 precursor
GO term: induction of nodule morphogenesis by symbiont ; GO:00new02
taxon column: taxon:3880|taxon:382

Example: Performing a process in more than one species

The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm.

Annotation of cardiotoxin precursor, from N. sputatrix

use the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430

Naja sputatrix taxonomy ID: 33626
Mammalia taxonomy ID: 40674

protein name: cardiotoxin precursor
GO term: cytolysis of cells of another organism ; GO:0051715
taxon column: taxon:33626|taxon:40674

protein name: cardiotoxin precursor
GO term: host cell cytoplasm ; GO:0030430
taxon column: taxon:33626|taxon:40674

Example: Regulating a process in another organism

Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans.

Annotation of D7 protein long form, from A. gambiae

suggest a new term negative regulation of hemostasis in host

evasion of host defense response ; GO:0030682
[i] negative regulation of hemostasis in host ; GO:00new03

Anopheles gambiae taxonomy ID: 7165
Homo sapiens taxonomy ID: 9606

protein name: D7 protein long form
GO term: negative regulation of hemostasis in host ; GO:00new03
taxon column: taxon:7165|taxon:9606

Downstream Process guidelines

Where there is limited knowledge regarding the processes that a gene product is directly involved in, curators may often have annotated to terms that describe the processes that are downstream of the direct activity of the gene product. Where more knowledge regarding a gene product's functional activity exists, curators need to make a judgement as to how to represent its direct activities and whether to continue to include downstream processes in the annotation set. Curators are encouraged to request more specific terms to describe how the gene product is involved in a downstream process and also evaluate the annotation set as more functional information becomes available. More detailed curator guidance is provided below.

Requesting more specific terms for downstream processes

Where a specific, descriptive GO term does not exist (for instance to describe the involvement of a process in another process), curators are encouraged to request these terms to provide more specificity to their annotation.

For example, the "intent" of growth factor BMP2 to change the "state" of the cell is instrumental in cardiac cell differentiation. Requesting the new GO term BMP signaling involved in cardiac cell differentiation would therefore make it possible to qualify how the gene product is involved in the downstream process of cardiac cell differentiation, rather than annotating to the separate terms BMP signaling and cardiac cell differentiation.

Annotating downstream processes for gene products involved in core or specific processes

Curators should annotate to the experimental evidence in the paper. However, curator judgement should be used, taking into account what the curator knows about:

  1. The background of the gene product; is it widely known to have a central role causing it to affect multiple processes, or does it have few specific targets?
  2. The quality of the experimental assays performed in the paper; are they fully explained and is the evidence supplied convincing? (See separate guidelines for annotation of high-throughput experiments.)

Example 1. Gene product involved in core process.

Yeast RNA polymerase II subunit RPB2

RNA polymerase II subunit RPB2 has a core function of RNA polymerase activity, which has downstream effects on a large number of processes. However, curators should only annotate to the gene product's transcription activity, rather than the multiple downstream processes altered as a consequence of its activity.

Yeast spliceosome

In S. cerevisiae, the mutation of several genes that are components of the spliceosome results in translation defects. However, later work supplied evidence for the genes' involvement in mRNA splicing, not translation. Downstream effects on translation are to be expected as many ribosomal transcripts are spliced in yeast. The curation decision was to remove annotations to the term translation for spliceosome component genes once data were available to describe the direct activity to which the genes contributed.

Example 2. Gene product involved in core and specific process(es).

S. pombe gene Sre1

The S. pombe gene Sre1 is a transcriptional regulator of genes that are involved in heme and phospholipid biosynthesis. From reading PMID:16537923, the curator decided this information should be captured in the annotation. Therefore annotations were made to:

  • RNA polymerase II core promoter proximal region sequence-specific DNA binding
  • regulation of transcription, DNA-dependent or regulation of transcription from RNA polymerase II promoter
  • positive regulation of heme biosynthesis
  • positive regulation of phospholipid biosynthesis

In addition, in accordance with these guidelines for annotating downstream processes, we would recommend that new terms are requested for:

  • regulation of transcription involved in heme biosynthesis
  • regulation of transcription involved in phospholipid biosynthesis

Annotating downstream processes to gene products in a ligand-receptor signaling pathway

Curators should annotate ligand-receptor signaling pathways as shown in the following diagrams.

For a signaling pathway, the ligand is considered part of the pathway. Therefore a factor which limits or increases the availability of a ligand to a receptor should be annotated as regulating the ligand/receptor pathway.

N.b. Ongoing work to clarify the start/end of a signaling pathway in the definition of GO terms will allow us to refine these guidelines.

General ligand-receptor pathway

Ligand-receptor pathway diagram: each pathway component is listed with the terms to which it can be annotated.

  • Stimulus
    • regulation of signaling pathway
  • Ligand
    • signaling pathway
    • regulation of other cellular processes
  • Receptor
    • signaling pathway
    • regulation of other cellular processes
  • Signaling molecules
    • signaling pathway
    • regulation of other process(es)
    • regulation of gene-specific transcription
    • regulation of translation
    • (regulation of) transcription in response to stimulus ligand
    • (regulation of) transcription involved in other process(es)
    • (regulation of) other cellular process(es)
  • Transcription factors*
    • signaling pathway
    • regulation of transcription involved in other process(es)
  • Target
    • cellular response to stimulus
    • other process(es)
    • regulation of other processes

* We would not consider annotating the core transcription machinery to the downstream (other) processes that the target is involved in unless the transcription factor is gene-specific, in which case we would annotate to regulation of transcription involved in other process(es)

Regulation of glucose transport

Ligand-receptor pathway diagram for insulin signaling: each pathway component is listed with the terms to which it can be annotated.

  • Insulin (ligand)
    • insulin receptor signaling pathway
    • regulation of glucose transport/homeostasis
  • Insulin receptor (receptor)
    • insulin receptor signaling pathway
    • regulation of glucose transport/homeostasis
  • IRS1, PI3K, PDK1, PKC (signaling molecules)
    • insulin receptor signaling pathway
    • regulation of glucose transport/homeostasis
    • protein localization at cell surface (NTR: involved in response to insulin)
  • GLUT4 (target)
    • cellular response to insulin
    • glucose transport/homeostasis

General note on current status of revision of annotation sets

If a gene product has limited experimental literature, such as a newly characterised protein, it is understandable that curators need to annotate to more general 'downstream' process terms that may, in fact, represent a phenotype.

However, as more functional information is published about a gene product, curators may decide to revise these annotations to downstream processes. Currently, different curation groups take different actions, based on considerations of user requirements and curation capacity:

  1. Annotations to indirect/downstream processes may be removed, or updated to 'regulation' terms. This 'deleted' information is usually stored in the annotating group's phenotype database.
  2. Annotations to indirect/downstream processes are not removed because
    • downstream annotations are supported by good evidence, or the group wants to keep them as a history of annotation or to give a complete overview of knowledge about the gene product.
    • the curation group does not have the resources to revise annotation sets or does not have an alternative place to store the data.

Curation groups need to be aware that keeping annotations to downstream processes will make them a source of such data for other groups who may have a different annotation philosophy.

Binding guidelines

Using terms that imply binding of substrates

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.

Choosing more descriptive terms than 'protein binding'

Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16), see GO Annotation File Format 2.0 Guide.

Identifying binding partners using columns 8 and 16

When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code (see the evidence code documentation).
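The restriction on which evidence codes may accompany an entry in the 'with' column (8) can be checked mechanically. The following is a minimal sketch under the rule stated above; the function name and message wording are hypothetical and do not come from any GO validation tool.

```python
# Evidence codes that may be used with the 'with' column (8), as listed above.
WITH_COLUMN_EVIDENCE_CODES = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


def with_column_problem(evidence_code, with_value):
    """Return None if column 8 is used acceptably, else a short message.

    An empty `with_value` means column 8 is not filled in for this annotation.
    """
    if with_value and evidence_code not in WITH_COLUMN_EVIDENCE_CODES:
        return f"column 8 cannot be used with evidence code {evidence_code}"
    return None


print(with_column_problem("IPI", "UniProtKB:P68135"))
print(with_column_problem("IDA", "UniProtKB:P68135"))
```

This sketch only checks the combination of evidence code and column 8; it does not check whether a code that needs a 'with' entry is missing one.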

Examples of using the 'with' column (8)

The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.

  1. Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin.
  2. Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.

Examples of using the annotation extension column (16)

The annotation of Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.

3) The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in PMID:19668196. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity adding has_input UniProtKB:O93236 in annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.

4) The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the paper demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), thereby post-composing the GO term 7-hydroxycholesterol-transporting ATPase activity.

The 'with' column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See column 16 documentation for relationship types to use when adding IDs in the annotation extension column (16).

Ontology development for protein binding

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. The GOC is developing 'has_part' relationships that will provide links to implied substrate binding. The existing GO will follow this new format, e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.

'Response to' guidelines

The definition of the top-level 'response to' terms has been updated to indicate where the response begins and ends:

Any process that results in a change in state or activity of a cell or organism as the result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.

This change was made and released in ontology version 1.1960.

Examples:

response to stimulus ; GO:0050896

Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.

GO:0051716 cellular response to stimulus

Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity of the cell.

Advisory quality control check: High level 'response to' terms should not directly be used for annotation, unless additional information is supplied in column 16.
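The advisory check above could be approximated in code along the following lines. This is only a sketch: the set of "high level" terms is assumed here to be the two terms named in this section, and the function name is hypothetical. A real check would need an agreed list of terms.

```python
# Assumption for this sketch: the "high level" terms are the two named above.
HIGH_LEVEL_RESPONSE_TERMS = {
    "GO:0050896",  # response to stimulus
    "GO:0051716",  # cellular response to stimulus
}


def flag_high_level_response(go_id, annotation_extension=""):
    """Return True when a high level 'response to' term is used directly,
    i.e. with nothing supplied in the annotation extension column (16)."""
    return go_id in HIGH_LEVEL_RESPONSE_TERMS and not annotation_extension.strip()


print(flag_high_level_response("GO:0050896"))
print(flag_high_level_response("GO:0050896", "placeholder extension value"))
```

Because the check is advisory, a flagged annotation is a prompt for the curator to look for a more granular term, not an automatic rejection.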

Update guidelines: Encourage the use of granular terms for 'responses'

Update guidelines: Be careful to use IEP when the experiment is observing expression level. Example: PMID:8888624 and the annotation for A. thaliana BIP1, which should use IEP rather than IDA.

Use of Regulation Terms

Background

The GO Consortium recognized quite early on in the development of the Biological Process ontology that there were gene products that participated directly in a process and gene products that regulated a process, positively and/or negatively. But how do curators know to which of these terms they should be annotating, and is it possible, for a given process, to annotate the same gene product to both a parent term and one of its associated regulation terms?

To begin to address these questions here are some guidelines for annotating, or not, to regulation terms:

Guideline 1: Use existing biological knowledge to define the process.

In order to determine whether a gene product participates in a process or regulates that process (or both) curators need to consider the nature of the process. Processes can be considered as ordered assemblies of molecular functions and every process has a beginning, middle, and end.

Use existing biological knowledge and the paper being curated as guides. Is there a defined pathway, i.e. distinct molecular functions, and have the gene products that perform those functions been identified? Does the gene product being annotated perform one of those functions or a function outside of the process that might start, stop, or change the rate at which the process proceeds?

In reality, the beginning, middle, and end of some processes will be easier to define than others. For example, signaling pathways, such as MAPK signaling, will be easier to define than broader, organismal-level processes such as embryonic development. Curators should use their judgement, based on the published literature, to guide their annotation.

Example: Atg1

Saccharomyces cerevisiae Atg1 encodes a protein kinase that is involved in autophagy: "The process by which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation."

Atg1 activity is critical for the induction of autophagy, specifically for formation of autophagic vacuoles. Should Atg1 be annotated to autophagic vacuole formation or regulation of autophagic vacuole formation? Authors have used language that could lead curators to make annotations to either term.

In this case, annotators need to consider the sum of what is known about the autophagic pathway and Atg1's role in that pathway.

Using that knowledge, SGD has annotated Atg1 to the parent process term, autophagic vacuole formation, because once Atg1 is active, the 'go' or 'no go' decision for autophagy has already been made. More upstream genes appear to actually be regulating the autophagic pathway.

http://wiki.geneontology.org/index.php/2010_GO_camp_Use_of_Regulation_issues#Example_2

Guideline 2: If you aren't sure, consider annotating to the parent process term.

If the gene product performs one of the functions that make up the process, annotate directly to the process. If the gene product regulates the process, annotate it to regulation of that process.

If you aren't sure what term to use, annotate to the parent process term. As more information about the process becomes available, you may be able to refine your annotations (see Guideline #4 below).
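
The decision described in Guidelines 1 and 2 can be summarised as a small function. This is a minimal sketch for illustration only: the function name, its parameters, and the use of None for "unknown" are assumptions, and real annotation decisions depend on curator judgement and the literature.

```python
def choose_terms(process, performs_step=None, regulates=None):
    """Suggest candidate GO terms for a gene product relative to a process.

    performs_step: True/False/None (None = unknown) - does the gene product
        carry out one of the molecular functions within the process?
    regulates: True/False/None (None = unknown) - does it act from outside
        the process to start, stop, or change the rate of the process?
    """
    # Evidence that the gene product plays neither role: no annotation.
    if performs_step is False and regulates is False:
        return []
    terms = []
    if performs_step:
        terms.append(process)
    if regulates:
        terms.append(f"regulation of {process}")
    # Role unclear: fall back to the parent process term (Guideline 2).
    if not terms:
        terms.append(process)
    return terms
```

For example, choose_terms("autophagic vacuole formation", performs_step=True, regulates=False) suggests only the parent process term, which matches the Atg1 annotation described under Guideline 1.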

Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process.

Wherever possible, include the beginning, middle, and end of a process in the corresponding term definition. This will help annotators choose the appropriate term for their annotations.

Guideline 4: Revisit annotations when new knowledge becomes available.

GO annotations should reflect the present state of biological knowledge. Therefore, as the understanding of a biological process improves, it may be necessary to revisit and refine existing annotations.

Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.

Mutant phenotypes are often used to make annotations to regulation terms because they fit the criteria of the term definition, i.e. authors report a change in the frequency, rate, or extent of a process.

However, to make regulation annotations correctly using IMP, it is important to consider several factors, including: 1) the assay type, 2) the nature of the alleles (null vs. reduction of function), and 3) the molecular identity of the gene product.

Again, if it isn't clear that a gene product is involved in regulation, it is better to annotate to the parent process term.

Example: muscle contraction and C. elegans mutants

In C. elegans, a number of genes can mutate to paralysis or slowed locomotion due to defects in muscle contraction. These include genes that encode everything from myosin heavy chain to calcium channels to transcription factors. Depending upon the nature of the allele, the mutant phenotypes for the same gene can sometimes suggest both process and regulation terms. In this case, consideration of the process, the nature of the allele (complete or partial loss of function), and the molecular identity of the gene product can guide curators in making the appropriate annotation.

http://wiki.geneontology.org/images/4/47/Regulation_example.pdf

Guideline 6: Some gene products may be annotated to both a process and regulation of that process.

Positive and negative feedback loops are an essential part of many signaling pathways.

If one member of a pathway regulates the activity of a different member of the pathway, it could be annotated to both the process and regulation of that process.

When annotating gene products involved in a signaling pathway, however, curators should not annotate gene products that directly activate the next gene product in the pathway to regulation of that pathway.

For example, MAPKK would not be annotated to positive regulation of MAPKKK cascade just because it phosphorylates and activates MAPK.

However, gene products that, for example, feed back onto earlier steps in the pathway may be annotated to both the parent process term and a regulation term.
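
The distinction between acting on the next step of a pathway and feeding back onto an earlier step can be sketched as follows. This is an illustrative sketch only: the function name and the representation of a pathway as an ordered list of member names are assumptions, and the output is a list of candidate terms for a curator to evaluate, not a definitive annotation.

```python
def pathway_candidate_terms(pathway, actor, target, process):
    """Suggest candidate terms for `actor` acting on `target` in a pathway.

    pathway: ordered list of member names, most upstream first (assumed model).
    """
    actor_pos = pathway.index(actor)
    target_pos = pathway.index(target)
    # Every pathway member is a candidate for the parent process term.
    terms = [process]
    # Acting on an earlier step is feedback, so a regulation term may apply too.
    if target_pos < actor_pos:
        terms.append(f"regulation of {process}")
    # Acting on a downstream member is treated as part of the process itself,
    # so no regulation term is suggested. Cases that skip steps are not
    # addressed by the guideline and are left to curator judgement.
    return terms
```

With pathway = ["MAPKKK", "MAPKK", "MAPK"], the call pathway_candidate_terms(pathway, "MAPKK", "MAPK", "MAPKKK cascade") suggests only the parent process term, whereas a hypothetical feedback from MAPK onto MAPKKK would also suggest a regulation term.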

Example: ERK1/2

ERK1/2 activation requires activity of FRS2alpha which, in turn, is negatively regulated by activated ERK1/2.

Could ERK1/2 be annotated to both MAPKKK cascade and negative regulation of MAPKKK cascade?

Phosphoprotein Enriched in Astrocytes 15 kDa (PEA-15) Reprograms Growth Factor Signaling by Inhibiting Threonine Phosphorylation of Fibroblast Receptor Substrate 2alpha

A pathway member whose presence or absence is merely limiting should not be annotated to regulation. For example, if the amount of a receptor on the surface of a cell determines how much of the process occurs, the receptor should not be annotated to the regulation term.

Use of Transcription related terms

The transcription branch of the ontology was overhauled in 2011 to remove any overlap between Function and Process terms and to represent Function terms accurately, so that they describe molecular activities (how something occurs). These changes affect annotations. For example, if the experiments indicate that a gene product is involved in regulating transcription but give no indication of how it acts, it would be appropriate to annotate that gene product only to a Process term. The Transcription Annotation Guide is available to facilitate the process of annotating gene products using this new ontology structure.