!ZFIN README file for gene_association.zfin
!version: $Revision: 1.4 $
!date: $Date: 2011/06/07 23:02:13 $
!from: ZFIN
!saved-by: Doug Howe
!send comments to: curators@zfin.org

I. TABLE OF CONTENTS
================================================================================
I......TABLE OF CONTENTS
II.....INTRODUCTION
III....GENE_ASSOCIATION.ZFIN FILE FORMAT
IV.....METHODS OF GO ANNOTATION
  IV.1....DATABASE OBJECTS
  IV.2....REDUNDANCY IN GENE_ASSOCIATION.ZFIN
  IV.3....ELECTRONIC (IEA) GO ANNOTATION IN ZFIN
  IV.4....USE OF THE ND EVIDENCE CODE IN ZFIN
V......CONTACT INFORMATION
================================================================================

II. INTRODUCTION
================================================================================
This file provides a brief description of how GO data is captured in ZFIN and
how it is displayed in the gene_association.zfin file. The
gene_association.zfin file is updated weekly in the GO CVS, generally on
Tuesday or Wednesday.

ZFIN is a database of genetic and genomic data for the zebrafish (Danio rerio),
produced by a team of professional curators and computer scientists. The data
found in ZFIN is a combination of information curated from the scientific
literature and information provided through reciprocal collaborations with
UniProt, NCBI, and specific labs. Funding for ZFIN is provided by the NIH
(P41 HG002659).

For additional information, please visit the ZFIN website at ZFIN.org or
contact curators@zfin.org.

III. GENE_ASSOCIATION.ZFIN FILE FORMAT
================================================================================
The gene_association.zfin file contains GO annotations for zebrafish gene
products. The gene_association.zfin file uses the standard file format for
gene_association files of the Gene Ontology (GO) Consortium.
A more complete description of the file format is found here:
http://www.geneontology.org/GO.format.gaf-2_0.shtml

The following provides a brief description of the columns in the
gene_association files. Lines beginning 'ZFIN File:' refer specifically to the
format and contents found in the gene_association.zfin file.

1: DB (cardinality = 1)
-----------------------
The database contributing the gene_association file.
ZFIN File: always "ZFIN" for gene_association.zfin.

2: DB_Object_ID (cardinality = 1)
---------------------------------
A unique identifier for the object being annotated.
ZFIN File: This is always the unique ZFIN identifier for a zebrafish gene.
Example: ZDB-GENE-990415-72 ; format is ZDB-GENE-######-#[###] where the last
three digits are optional.

3: DB_Object_Symbol (cardinality = 1)
-------------------------------------
A gene symbol for the gene having the identifier found in DB_Object_ID.
ZFIN File: This is always the primary gene symbol for a zebrafish gene.
Example: fgf8

4: Qualifier (cardinality = 0,1)
--------------------------------
This field is empty the vast majority of the time. When data is present, it is
one of the annotation qualifiers 'NOT', 'contributes_to', or
'colocalizes_with'.

5: GO ID (cardinality = 1)
--------------------------
The unique GO identifier for the GO term being attributed to the DB_Object_ID.
Example: GO:0005160 ; always has format GO:#######

6: DB:Reference (cardinality = 1)
---------------------------------
The unique identifier for the reference to which the GO annotation is
attributed.
ZFIN File: Each ZFIN reference, including published literature and automated
annotation methods, has a unique identifier in the form ZDB-PUB-######-#[###],
where the last three digits are optional. This field contains a ZFIN PUB ID
and, when possible, the PubMed ID is also listed, separated from the ZDB-PUB ID
with a pipe (|).
Example: ZFIN:ZDB-PUB-030716-20|PMID:12798293

7: Evidence (cardinality = 1)
-----------------------------
The evidence code for the GO annotation; one of
IMP, IGI, IPI, ISS, IDA, IEP, IEA, TAS, NAS, ND, IC
For evidence code details: http://www.geneontology.org/GO.evidence.html

8: With (or) From (cardinality = 0,1,>1)
----------------------------------------
ZFIN File: This column contains the identifier for a related object being used
in an inference in annotations where the evidence code is IGI, IPI, IMP, ISS
or IC. In every case, the abbreviation for the source database is listed,
followed by a colon and the ID from that database for the object being
referred to. If not blank, this field can contain one or more of the following:

ISS : UniProt ID, InterPro ID, RefSeq, RefPept, or EMBL (nucl. or prot.) ID
IMP : ZDB-FISH ID, ZDB-GENO ID, ZDB-GENE ID, or ZDB-MRPHLNO ID
IPI : UniProt ID, EMBL protein ID, RefPept ID
IGI : ZDB-FISH ID, ZDB-GENO ID, ZDB-GENE ID, EMBL nucl. ID, or RefSeq ID
IC  : GO ID

Examples:
IGI : ZFIN:ZDB-FISH-000821-1
ISS : UniProt:P35569
ISS : EMBL:AF064523
ISS : InterPro:IPR001092
IC  : GO:0045298

9: Aspect (cardinality = 1)
---------------------------
Which ontology the GO term belongs to: Function (F), Process (P) or
Component (C).

10: DB_Object_Name (cardinality = 1)
------------------------------------
ZFIN File: The full name of the gene in ZFIN.
Example: atonal homolog 7

11: DB_Object_Synonym (cardinality = 0,1)
-----------------------------------------
Alternative names by which the database object is known.
ZFIN File: ZFIN does not currently populate this field, so it is always blank.
This is subject to change in the future.

12: DB_Object_Type (cardinality = 1)
------------------------------------
The type of object being annotated.
ZFIN File: currently always "gene_product" for gene_association.zfin.

13: Taxon (cardinality = 1)
---------------------------
The taxonomic identifier of the species encoding the gene product.
ZFIN File: gene_association.zfin only contains data for Danio rerio, so this
field will always be taxon:7955

14: Date (cardinality = 1)
--------------------------
The date the annotation was last edited, in the format 'YYYYMMDD'.
Example: 20040821

15: Assigned_by (cardinality = 1)
---------------------------------
The source of the GO annotation.
ZFIN File: One of ZFIN, UniProt, HGNC, BHF, or several others. "ZFIN"
indicates that a ZFIN curator made the annotation, or that translation tables
were applied to data locally to generate IEA GO annotations.

16: Currently empty
-------------------

17: Currently empty
-------------------

IV. METHODS OF GO ANNOTATION
================================================================================

IV.1 Database Objects
---------------------
Currently all GO annotations in ZFIN are attributed to genes. The GO terms
describe the attributes of the products of the gene, which is a protein in the
case of many genes, but can also be RNA when the gene product is a
non-translated RNA.

IV.2 Redundancy in gene_association.zfin
----------------------------------------
NOTE: In this section, "GO annotation" is meant to include the gene, GO term,
evidence code, "with" data, and qualifiers associated with an annotation.
Aside from the publication, these items define a unique annotation.

Redundant GO annotations from the literature are curated at ZFIN with the
following qualifications:

A) If a single paper has multiple experiments supporting a single GO
annotation, the annotation is only made one time.
For example, if a single paper shows confocal images of a protein in the cell
nucleus, and a Western blot of a nuclear extract also showing that this protein
is in the nucleus, we will have one GO annotation to capture this information
from this publication ("nucleus" by IDA). However, we will capture multiple
annotations to the same GO term from a single publication when the annotations
are not identical (different evidence or "with" column contents, for example).

B) If a paper has data supporting a GO annotation that has already been made
from independent publications, ZFIN curators will still add the GO annotation
to our database until the annotation is represented 5 independent times.
Making additional instances of an annotation is optional once 5 identical
annotations (including GO term, evidence code, qualifiers and "with" data)
from 5 independent publications have been made.

C) If two or more papers show conflicting GO data, all sets of GO data are
recorded. If in subsequent references a conclusion is reached and agreed upon
by all parties involved, then GO terms that are no longer correct will be
removed or updated as these situations are identified.

IV.3. ELECTRONIC (IEA) GO ANNOTATION IN ZFIN
--------------------------------------------
IEA-supported GO annotations in ZFIN are generated electronically through
local application of the interpro2go, ec2go and spkw2go translation tables.
More information about the translation tables can be found here:
http://www.geneontology.org/GO.indices.shtml

IV.4. USE OF THE ND EVIDENCE CODE IN ZFIN
-----------------------------------------
The Gene Ontology (GO) Consortium created the evidence code "ND" to indicate
"no biological data available". This code is used for annotations to the three
root terms `molecular_function ; GO:0005554', `biological_process ; GO:0000004'
or `cellular_component ; GO:0008372'.
In ZFIN, the use of any of these three GO terms is attributed to the reference
ZDB-PUB-031118-1 and supported by the ND evidence code. These annotations
signify that a curator has examined the literature associated with that gene
in ZFIN, and on the date of that analysis, there was no information supporting
an annotation to any GO term in that ontology. As soon as more meaningful GO
terms are located for a gene, the annotations to "unknown" are replaced.

V. CONTACT INFORMATION
================================================================================
Questions or comments about this file should be sent to: curators@zfin.org
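
As an illustration of the column layout described in section III and the
annotation identity described in section IV.2, the following is a minimal
Python sketch, not an official ZFIN or GO tool. It assumes records are
tab-delimited with 17 columns and that comment lines begin with '!', as in the
GAF 2.0 format. The names Annotation, split_pipe, parse_gaf_line and
annotation_key are hypothetical, and the sample record is synthetic (assembled
from the per-column examples above), not a real ZFIN annotation.

```python
from typing import NamedTuple, Optional, Tuple

GAF_COLUMN_COUNT = 17  # assumed GAF 2.0 width; columns 16-17 are empty here


class Annotation(NamedTuple):
    """Hypothetical container for the columns described in section III."""
    db: str
    object_id: str
    symbol: str
    qualifiers: Tuple[str, ...]
    go_id: str
    references: Tuple[str, ...]
    evidence: str
    with_from: Tuple[str, ...]
    aspect: str
    name: str
    object_type: str
    taxon: str
    date: str
    assigned_by: str


def split_pipe(field: str) -> Tuple[str, ...]:
    """Split a pipe-separated field; an empty field gives an empty tuple."""
    return tuple(part.strip() for part in field.split("|") if part.strip())


def parse_gaf_line(line: str) -> Optional[Annotation]:
    """Return an Annotation, or None for blank lines and '!' comment lines."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    cols += [""] * (GAF_COLUMN_COUNT - len(cols))  # pad short records
    return Annotation(
        db=cols[0], object_id=cols[1], symbol=cols[2],
        qualifiers=split_pipe(cols[3]), go_id=cols[4],
        references=split_pipe(cols[5]), evidence=cols[6],
        with_from=split_pipe(cols[7]), aspect=cols[8], name=cols[9],
        object_type=cols[11], taxon=cols[12], date=cols[13],
        assigned_by=cols[14],
    )


def annotation_key(a: Annotation) -> tuple:
    """Items that define a unique annotation in section IV.2 (no publication)."""
    return (a.object_id, a.go_id, a.evidence, a.with_from, a.qualifiers)


# Synthetic record built from the per-column examples in this README;
# it is NOT a real ZFIN annotation.
sample = "\t".join([
    "ZFIN", "ZDB-GENE-990415-72", "fgf8", "", "GO:0005160",
    "ZFIN:ZDB-PUB-030716-20|PMID:12798293", "IDA", "", "F",
    "hypothetical gene name", "", "gene_product", "taxon:7955",
    "20040821", "ZFIN", "", "",
])

record = parse_gaf_line(sample)
print(record.references)
print(annotation_key(record))
```

Two records from different publications that produce the same annotation_key
value would count as the same annotation for the purposes of section IV.2.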