Last updated December 5, 2005

dictyBase README
----------------

Contents
--------

1. Introduction
2. Gene Association File
3. Columns in the Gene Association File
4. dictyBase Nomenclature
5. Annotation Priorities
6. Methods of Annotation
7. Contact Information


1. Introduction
----------------

dictyBase (http://dictybase.org) provides a single access point for
information about the social amoeba Dictyostelium discoideum, including
genome and functional annotations, curated Dictyostelium literature,
tools for discovering homology and functional relationships using BLAST
and Gene Ontology annotations, investigators involved in Dictyostelium
research, overviews of Dictyostelium biology, and the Dicty Stock Center.


2. Gene Association File
-------------------------

gene_association.ddb.gz

This file contains all GO annotations for Dictyostelium discoideum gene
products (protein and RNA) in dictyBase:
http://dictybase.org

This file is available for download at:
http://www.geneontology.org/GO.current.annotations.shtml

This file uses the standard file format for gene_association files of
the Gene Ontology (GO) Consortium:
http://www.geneontology.org/index.shtml

A description of the file format is found here:
http://www.geneontology.org/doc/GO.annotation.html#file

This file is updated weekly to reflect changes to the GO annotations
within the dictyBase database.


3. Columns in the Gene Association File
----------------------------------------

The columns in the gene_association.ddb.gz file are:

1.  DB
    Database contributing the file.
    ["DDB" for dictyBase]

2.  DB_Object_ID
    A unique identifier in the DB for the gene product being annotated.
    ["DDB" followed by seven digits, also referred to as the dictyBaseID]
    Example: DDB0229847

3.  DB_Object_Symbol
    A (unique and valid) symbol to which DB_Object_ID is matched.
    [The name of the gene product, see below for nomenclature]
    Example: rnrB

4.  Qualifier
    This column is used for flags that modify the interpretation of an
    annotation.
    [This field may be equal to: NOT, colocalizes_with, contributes_to]
    Example: NOT calmodulin binding

5.  GOid
    The GO identifier for the term attributed to the DB_Object_ID.
    Example: GO:0030150

6.  DB:Reference
    Reference cited to support the annotation.
    [This may be a published paper or a dictyBase unpublished reference]
    Examples: PMID:8486739, (Reference_No)DDB:6182, (Reference_No)DDB:9851

7.  Evidence
    The evidence supporting the annotation.
    [One of TAS, IDA, IMP, IGI, IPI, ISS, IEP, NAS, IC, or ND]
    Example: IMP

8.  With/From
    The DB_Object_ID from which the annotation derives.
    [This column is used for evidence codes IEA, ISS, and IPI]
    Examples: UniProt:Q15208, SGD:S000005105, InterPro:IPR000719

9.  Aspect
    One of the three ontologies: P (biological Process), F (molecular
    Function) or C (cellular Component).
    Example: F

10. DB_Object_Name
    Name of the gene or gene product.
    [Gene product names are generated both manually and automatically]
    Example: calcium-calmodulin dependent protein kinase

11. Synonym
    Other names by which the gene is known in the DB.
    [Multiple synonyms may exist; when this occurs, synonyms are
    separated by pipes (|)]
    Example: SKP1|FP21|fpa2

12. DB_Object_Type
    The kind of gene product being annotated.
    [Can be gene, transcript, protein, protein_structure, or complex;
    currently all GO annotations in dictyBase are attributed to genes;
    the GO terms describe the attributes of the products (both RNA and
    protein) encoded by genes]
    Example: gene

13. Taxon_ID
    Identifier for the species being annotated.
    [Always taxonomy id:44689 for the species Dictyostelium discoideum;
    taxonomy id:5782 for the genus Dictyostelium]

14. Date
    The date of the last annotation update, in the format 'YYYYMMDD'.
    Example: 20051013

15. Assigned_By
    Describes the source of the annotation.
    [Always "DDB" for dictyBase]


4. dictyBase Nomenclature
--------------------------

Notes on dictyBase nomenclature; relevant to columns 3 (DB_Object_Symbol)
and 11 (Synonym):

a) In general, Dictyostelium discoideum names conform to the Demerec
   system:

   Demerec, M. et al. (1966). A proposal for a uniform nomenclature in
   bacterial genetics. Genetics 54:61-76.

   In the Demerec system, a gene name consists of three lowercase,
   italicized letters, followed by a capital italicized letter to
   distinguish genes with the same descriptor that are related in a
   significant way. A protein may be named after the gene encoding it by
   capitalizing the first letter and dropping the italics. The first
   three letters should, ideally, describe either the mutant phenotype
   or the molecular function.
   [Examples: rdeA, rdeB and rdeC; or tagA, tagB and tagC]

b) Dictyostelium genes may be non-Demerec as well, particularly if the
   gene name is consistent across species.
   [Examples: abcA1, abcA2, abcB1, atg1, atg4]

c) Other, less descriptive gene names exist in dictyBase. Genes that
   have not been manually inspected by a curator retain the identifier
   given by the Dictyostelium Sequencing Consortium.
   [Examples: JC2V2_0_01058, BC5V2_0_01489, BEC6V2_0_01292]

d) Non-descriptive names are also given to manually curated genes that
   have not been previously identified or described in the literature.
   In these cases, the DB_Object_Symbol is identical to the DB_Object_ID
   (dictyBaseID) preceded by the word "gene".
   [Examples: geneDDB0216241, geneDDB0220099, geneDDB0229347]

For more information on nomenclature, please see the dictyBase Gene
Nomenclature Guidelines at:
http://dictybase.org/NomenclatureGuidelines.htm#genes


5. Annotation Priorities
-------------------------

dictyBase curators annotate gene products using the Gene Ontology with
the following priorities:

a) Gene products in the primary literature
b) Genes of interest to researchers studying Dictyostelium
c) Genes with similarity to genes of known function
d) Genes that have no similarity to genes of known function and that
   have ESTs


6. Methods of Annotation
-------------------------

dictyBase curators annotate gene products using the following:

a) Information extracted from the published literature, including
   experimental and similarity-based statements of biological process,
   molecular function, and cellular component
b) Tools to predict biological process, molecular function, and cellular
   component based on sequence similarity, orthology, and predictions of
   cellular location
c) Monthly consistency checks to ensure all curators annotate using the
   same standards

For more information on GO annotation at dictyBase, please visit:
http://dictybase.org/SOPs/GOCuration.html


7. Contact Information
-----------------------

The dictyBase database is held and maintained at Northwestern
University, Chicago, IL, USA. dictyBase is supported by NIH (NIGMS and
NHGRI).

Questions about this file should be sent to:
dictybase@northwestern.edu
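

As an illustration of the 15-column layout described in section 3, the
following Python sketch reads one record into named fields. It is not
part of any dictyBase tooling. It assumes the file is tab-delimited and
that comment lines start with "!", as in the GO Consortium
gene_association format. The names Annotation, parse_line and
read_annotations are hypothetical, and EXAMPLE_LINE is a synthetic
record assembled from the per-column examples in section 3, not a real
annotation.

```python
import datetime as dt
import gzip
from dataclasses import dataclass
from typing import Iterator, List

EXPECTED_COLUMNS = 15  # columns 1-15 as listed in section 3


@dataclass
class Annotation:
    db: str                  # 1.  DB
    db_object_id: str        # 2.  DB_Object_ID
    db_object_symbol: str    # 3.  DB_Object_Symbol
    qualifier: str           # 4.  Qualifier (often empty)
    go_id: str               # 5.  GOid
    reference: str           # 6.  DB:Reference
    evidence: str            # 7.  Evidence
    with_from: str           # 8.  With/From (often empty)
    aspect: str              # 9.  Aspect: P, F or C
    db_object_name: str      # 10. DB_Object_Name
    synonyms: List[str]      # 11. Synonym, split on "|"
    db_object_type: str      # 12. DB_Object_Type
    taxon: str               # 13. Taxon_ID
    last_updated: dt.date    # 14. Date, parsed from YYYYMMDD
    assigned_by: str         # 15. Assigned_By


def parse_line(line: str) -> Annotation:
    """Parse one tab-delimited record (tab delimiter is an assumption)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != EXPECTED_COLUMNS:
        raise ValueError(
            f"expected {EXPECTED_COLUMNS} columns, got {len(fields)}"
        )
    (db, object_id, symbol, qualifier, go_id, reference, evidence,
     with_from, aspect, name, synonym, object_type, taxon, date_str,
     assigned_by) = fields
    return Annotation(
        db=db,
        db_object_id=object_id,
        db_object_symbol=symbol,
        qualifier=qualifier,
        go_id=go_id,
        reference=reference,
        evidence=evidence,
        with_from=with_from,
        aspect=aspect,
        db_object_name=name,
        # Synonyms are pipe-separated; an empty column yields an empty list.
        synonyms=[s for s in synonym.split("|") if s],
        db_object_type=object_type,
        taxon=taxon,
        last_updated=dt.datetime.strptime(date_str, "%Y%m%d").date(),
        assigned_by=assigned_by,
    )


def read_annotations(path: str) -> Iterator[Annotation]:
    """Yield records from a gzipped file, skipping "!" comments and blanks."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("!") or not line.strip():
                continue
            yield parse_line(line)


# Synthetic record built from the examples in section 3 (not real data).
# The "taxon:" prefix in column 13 is an assumption about the file syntax.
EXAMPLE_LINE = "\t".join([
    "DDB", "DDB0229847", "rnrB", "", "GO:0030150", "PMID:8486739", "IMP",
    "", "F", "calcium-calmodulin dependent protein kinase",
    "SKP1|FP21|fpa2", "gene", "taxon:44689", "20051013", "DDB",
])

if __name__ == "__main__":
    record = parse_line(EXAMPLE_LINE)
    print(record.db_object_symbol, record.go_id, record.synonyms)
```

To read the downloaded file, read_annotations("gene_association.ddb.gz")
could be iterated in the same way. Only the Synonym column is split on
pipes here, because that is the only multi-value separator this README
describes.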