Last updated December 5, 2005

dictyBase README
----------------

Contents
--------

1.  Introduction
2.  Gene Association File
3.  Columns in the Gene Association File
4.  dictyBase Nomenclature
5.  Annotation Priorities
6.  Methods of Annotation
7.  Contact Information

1.  Introduction
----------------

dictyBase (http://dictybase.org) provides a single access point for information about
the social amoeba Dictyostelium discoideum, including genome and functional
annotations, curated Dictyostelium literature, tools for discovering homology and
functional relationships using BLAST and Gene Ontology annotations, investigators
involved in Dictyostelium research, overviews of Dictyostelium biology and the Dicty
Stock Center.

2.  Gene Association File
-------------------------

gene_association.ddb.gz

This file contains all GO annotations for Dictyostelium discoideum gene products
(protein and RNA) in dictyBase:

  http://dictybase.org

This file is available for download at:

  http://www.geneontology.org/GO.current.annotations.shtml

This file uses the standard file format for gene_association files of the Gene
Ontology (GO) Consortium:

  http://www.geneontology.org/index.shtml


A description of the file format is found here:

  http://www.geneontology.org/doc/GO.annotation.html#file

This file is updated weekly to reflect changes to the GO annotations within the
dictyBase database.
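
As an illustration of working with the downloaded file, the following is a minimal
Python sketch, not part of dictyBase itself. It assumes the file has been saved
locally, that it is UTF-8 text once decompressed, and that lines beginning with "!"
are comments, as in the GO gene_association format. The function name
iter_annotation_lines is hypothetical.

```python
import gzip


def iter_annotation_lines(path):
    """Yield the data lines of a gzip-compressed gene association file."""
    # "rt" opens the compressed file in text mode; UTF-8 is an assumption.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip comment lines (leading "!") and blank lines.
            if line.startswith("!") or not line.strip():
                continue
            yield line.rstrip("\n")
```

For example, sum(1 for _ in iter_annotation_lines("gene_association.ddb.gz")) would
count the annotation lines in a local copy of the file.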

3.  Columns in the Gene Association File
----------------------------------------

The columns in the gene_association.ddb.gz file are:

1.  DB 
    Database contributing the file.
    ["DDB" for dictyBase]
    
2.  DB_Object_ID
    A unique identifier in the DB for the gene product being annotated.
    ["DDB" followed by seven digits, also referred to as the dictyBaseID]
    Example: DDB0229847
    
3.  DB_Object_Symbol
    A (unique and valid) symbol to which DB_Object_ID is matched.
    [The name of the gene product, see below for nomenclature]
    Example: rnrB
    
4.  Qualifier
    This column is used for flags that modify the interpretation of an annotation.
    [This field may be equal to: NOT, colocalizes_with, contributes_to]
    Example: NOT calmodulin binding
    
5.  GOid
    The GO identifier for the term attributed to the DB_Object_ID.
    Example: GO:0030150
    
6.  DB:Reference
    Reference cited to support the annotation.
    [This may be a published paper or a dictyBase unpublished reference]
    Examples: PMID:8486739, (Reference_No)DDB:6182, (Reference_No)DDB:9851 
    
7.  Evidence
    The evidence supporting the annotation.
    [One of TAS, IDA, IMP, IGI, IPI, ISS, IEP, NAS, IC, or ND]
    Example: IMP
    
8.  With/From
    The DB_Object_ID from which the annotation derives.
    [This column is used for evidence codes IEA, ISS, and IPI]
    Examples: UniProt:Q15208, SGD:S000005105, InterPro:IPR000719
    
9.  Aspect
    One of the three ontologies: P (biological Process), F (molecular Function) 
    or C (cellular Component).
    Example: F
    
10. DB_Object_Name
    Name of the gene or gene product.
    [Gene product names generated both manually and automatically]
    Example: calcium-calmodulin dependent protein kinase
    
11. Synonym
    Other names by which the gene is known in the DB.
    [Multiple synonyms may exist; when this occurs, synonyms are separated by pipes
    (|)]
    Example: SKP1|FP21|fpa2
    
12. DB_Object_Type
    What kind of gene product is being annotated.
    [Can be gene, transcript, protein, protein_structure, or complex; currently all GO
    annotations in dictyBase are attributed to genes; the GO terms describe the
    attributes of the products (both RNA and protein) encoded by genes]
    Example: gene
    
13. Taxon_ID
    Identifier for the species being annotated.
    [Always taxonomy id:44689 for the species Dictyostelium discoideum; taxonomy
    id:5782 for the genus Dictyostelium]

14. Date
    The date of the last annotation update, in the format 'YYYYMMDD'.
    Example: 20051013

15. Assigned_By
    Describes the source of the annotation.
    [Always "DDB" for dictyBase]
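
The column layout above can be sketched in Python as follows. This is an
illustration under stated assumptions rather than a dictyBase tool: it assumes the
columns are tab-delimited, as in the GO gene_association format, and the names
COLUMNS, parse_annotation_line, and SAMPLE_LINE are hypothetical. The sample line is
stitched together from the separate examples listed above and is not a real
annotation record; the "taxon:" prefix on the Taxon_ID value is likewise an
assumption.

```python
import re
from datetime import datetime

# Column names follow the list above; "DB:Reference" and "With/From" are
# respelled so that they can serve as plain dictionary keys.
COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GOid",
    "DB_Reference", "Evidence", "With_From", "Aspect", "DB_Object_Name",
    "Synonym", "DB_Object_Type", "Taxon_ID", "Date", "Assigned_By",
]


def parse_annotation_line(line):
    """Split one tab-delimited annotation line into a dict keyed by column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    # Column 2: "DDB" followed by seven digits.
    if not re.fullmatch(r"DDB\d{7}", record["DB_Object_ID"]):
        raise ValueError(f"unexpected DB_Object_ID: {record['DB_Object_ID']}")
    # Column 11: multiple synonyms are separated by pipes (|).
    record["Synonym"] = [s for s in record["Synonym"].split("|") if s]
    # Column 14: date in the format YYYYMMDD.
    record["Date"] = datetime.strptime(record["Date"], "%Y%m%d").date()
    return record


# Illustrative values only, combined from the per-column examples above.
SAMPLE_LINE = "\t".join([
    "DDB", "DDB0229847", "rnrB", "", "GO:0030150", "PMID:8486739", "IMP", "",
    "F", "calcium-calmodulin dependent protein kinase", "SKP1|FP21|fpa2",
    "gene", "taxon:44689", "20051013", "DDB",
])

record = parse_annotation_line(SAMPLE_LINE)
```

Raising ValueError on a malformed line is one possible design choice; a caller that
prefers to skip bad lines could catch the exception instead.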

4.  dictyBase Nomenclature
--------------------------

The following notes on dictyBase nomenclature are relevant to columns 3
(DB_Object_Symbol) and 11 (Synonym):

  a)  In general, Dictyostelium discoideum names conform to the Demerec system:
  
  Demerec, M. et al. (1966). A proposal for a uniform nomenclature in bacterial
  genetics. Genetics 54:61-76.

  In the Demerec system, a gene name consists of three lowercase, italicized letters
  followed by a capital italicized letter that distinguishes genes with the same
  descriptor that are related in a significant way.  A protein may be named after the
  gene encoding it by capitalizing the first letter and omitting the italics.
  The first three letters should, ideally, describe either the mutant phenotype or
  the molecular function.
  
  [Examples: rdeA, rdeB and rdeC; or tagA, tagB and tagC]

  b)  Dictyostelium gene names may also depart from the Demerec system, particularly
  if the gene name is consistent across species.
  
  [Examples: abcA1, abcA2, abcB1, atg1, atg4]

  c)  Other, less descriptive gene names exist in dictyBase.  Genes that have not
  been manually inspected by a curator retain the identifier given by the
  Dictyostelium Sequencing Consortium.
  
  [Examples: JC2V2_0_01058, BC5V2_0_01489, BEC6V2_0_01292]

  d)  Non-descriptive names are also given to manually curated genes that have not
  been previously identified or described in the literature.  In these cases, the
  DB_Object_Symbol is the DB_Object_ID (dictyBaseID) preceded by the word "gene".
  
  [Examples: geneDDB0216241, geneDDB0220099, geneDDB0229347]
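
The four naming categories above can be told apart with a few regular expressions.
The Python sketch below is only an illustration: the function name classify_symbol
and the category labels are hypothetical, and the pattern for Sequencing Consortium
identifiers is inferred from the three examples in (c) alone, so it may not cover
every identifier of that kind.

```python
import re

# (a) Demerec: three lowercase letters followed by one capital letter.
DEMEREC = re.compile(r"[a-z]{3}[A-Z]")
# (c) Sequencing Consortium identifier; pattern guessed from the examples.
CONSORTIUM = re.compile(r"[A-Z]+\d+V\d+_\d+_\d+")
# (d) The word "gene" followed by a dictyBaseID ("DDB" plus seven digits).
PLACEHOLDER = re.compile(r"geneDDB\d{7}")


def classify_symbol(symbol):
    """Return a rough nomenclature category for a DB_Object_Symbol."""
    if DEMEREC.fullmatch(symbol):
        return "demerec"
    if PLACEHOLDER.fullmatch(symbol):
        return "placeholder"
    if CONSORTIUM.fullmatch(symbol):
        return "consortium"
    # (b) Anything else, including non-Demerec names such as abcA1 or atg1.
    return "other"
```

With these patterns, classify_symbol("rdeA") returns "demerec" and
classify_symbol("abcA1") returns "other".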

For more information on nomenclature, please see the dictyBase Gene Nomenclature
Guidelines at:

  http://dictybase.org/NomenclatureGuidelines.htm#genes

5.  Annotation Priorities
-------------------------

dictyBase curators annotate gene products using the Gene Ontology with the following
priorities:

  a)  Gene products in the primary literature
  
  b)  Genes of interest to researchers studying Dictyostelium
  
  c)  Genes with similarity to genes of known function
  
  d)  Genes that have no similarity to genes of known function but do have ESTs

6.  Methods of Annotation
-------------------------

dictyBase curators annotate gene products using the following:

  a)  Information extracted from the published literature, including experimental and
  similarity-based statements of biological process, molecular function, and cellular
  component
  
  b)  Tools to predict biological process, molecular function, and cellular component
  based on sequence similarity, orthology, and predictions of cellular location
  
  c)  Monthly consistency checks to ensure all curators annotate using the same
  standards

For more information on GO annotation at dictyBase, please visit:

  http://dictybase.org/SOPs/GOCuration.html

7.  Contact Information
-----------------------

The dictyBase database is held and maintained at Northwestern University, Chicago,
IL, USA.

dictyBase is supported by NIH (NIGMS and NHGRI).

Questions about this file should be sent to: dictybase@northwestern.edu