!ZFIN README file for gene_association.zfin
!version: $Revision: 1.4 $
!date: $Date: 2011/06/07 23:02:13 $
!from: ZFIN 
!saved-by: Doug Howe
!send comments to: curators@zfin.org


I.  TABLE OF CONTENTS
================================================================================

I......TABLE OF CONTENTS
II.....INTRODUCTION
III....GENE_ASSOCIATION.ZFIN FILE FORMAT
IV.....METHODS OF GO ANNOTATION
 IV.1....DATABASE OBJECTS
 IV.2....REDUNDANCY IN GENE_ASSOCIATION.ZFIN 
 IV.3....ELECTRONIC (IEA) GO ANNOTATION IN ZFIN 
 IV.4....USE OF THE ND EVIDENCE CODE IN ZFIN
V......CONTACT INFORMATION

================================================================================


II.  INTRODUCTION
================================================================================

This file provides a brief description of how GO data is captured in ZFIN and how
it is displayed in the gene_association.zfin file.  The gene_association.zfin file
is updated weekly in the GO CVS, generally on Tuesday or Wednesday.

ZFIN is a database of genetic and genomic data for the zebrafish (Danio rerio),
produced by a team of professional curators and computer scientists.  The data found
in ZFIN is a combination of information curated from the scientific literature and
information provided through reciprocal collaborations with UniProt, NCBI, and specific labs.

Funding for ZFIN is provided by the NIH (P41 HG002659).
For additional information, please visit the ZFIN website at ZFIN.org or contact curators@zfin.org.


III.  GENE_ASSOCIATION.ZFIN FILE FORMAT
================================================================================

The gene_association.zfin file contains GO annotations for zebrafish gene products.  The 
gene_association.zfin file uses the standard file format for gene_association files of 
the Gene Ontology (GO) Consortium.  A more complete description of the file format is found here:

http://www.geneontology.org/GO.format.gaf-2_0.shtml

The following provides a brief description of the columns in the gene_association files.  
Lines beginning 'ZFIN File:' refer specifically to the format and contents found in the gene_association.zfin file.
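
As a rough illustration of the tab-delimited layout described in this section, the following
Python sketch reads annotation lines into dictionaries keyed by column.  The key names
(db_object_id, with_from, column_16, and so on) and the function read_gaf are hypothetical
and chosen only for this example.  The sketch assumes that comment lines start with '!'
and that columns are separated by tabs, as in the standard GO gene_association format.

```python
import io

# Hypothetical key names based on the column headings in this README.
# Columns 16 and 17 get placeholder names because they are listed as empty.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "column_16", "column_17",
]


def read_gaf(handle):
    """Yield one dict per annotation line, skipping blank and '!' comment lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short lines so that every column name has a value.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))


# Illustrative line only: the values are taken from unrelated examples in this
# README and do not describe a real annotation.
sample = "\t".join([
    "ZFIN", "ZDB-GENE-990415-72", "fgf8", "", "GO:0005160",
    "ZFIN:ZDB-PUB-030716-20|PMID:12798293", "IDA", "", "F",
    "atonal homolog 7", "", "gene_product", "taxon:7955", "20040821", "ZFIN",
    "", "",
])

rows = list(read_gaf(io.StringIO("!example comment line\n" + sample + "\n")))
```

Each element of rows is then a dictionary, so rows[0]["go_id"] gives the value of column 5.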


1: DB (cardinality = 1)
-----------------------
The database contributing the gene_association file.
ZFIN File: always "ZFIN" for gene_association.zfin.

2: DB_Object_ID (cardinality = 1)
---------------------------------
A unique identifier for the object being annotated.
ZFIN File: This is always the unique ZFIN identifier for a zebrafish gene.
Example: ZDB-GENE-990415-72 ; format is ZDB-GENE-######-#[###] where the last three digits are optional.
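
The identifier format can be checked with a regular expression.  The Python sketch below
assumes that "######" means exactly six digits and that "#[###]" means one to four digits.
That reading, and the name is_zdb_gene_id, are assumptions made for this example.

```python
import re

# Assumption: six digits, a hyphen, then one required digit plus up to three more.
ZDB_GENE_RE = re.compile(r"ZDB-GENE-\d{6}-\d{1,4}")


def is_zdb_gene_id(value):
    """Return True if value looks like a ZDB-GENE identifier."""
    return ZDB_GENE_RE.fullmatch(value) is not None
```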
 
3: DB_Object_Symbol (cardinality = 1)
-------------------------------------
A gene symbol for the gene having the identifier found in DB_Object_ID.
ZFIN File: This is always the primary gene symbol for a zebrafish gene.
Example: fgf8

4: Qualifier (cardinality = 0,1)
--------------------------------
This field will be empty the vast majority of the time.  When data is present, it will be one of the 
annotation qualifiers 'NOT', 'contributes_to', or 'colocalizes_with'.

  
5: GO ID (cardinality = 1)
--------------------------
The unique GO identifier for the GO term being attributed to the DB_Object_ID.
Example: GO:0005160 ; always has format GO:#######

6: DB:Reference (cardinality = 1)
---------------------------------
The unique identifier for the reference to which the GO annotation is attributed.
ZFIN File: Each ZFIN reference including published literature and automated annotation methods 
has a unique identifier in the form ZDB-PUB-######-#[###], where the last 3 digits are optional.  
This field contains a ZFIN PUB ID, and when possible, the PubMed ID is also listed, separated from the ZDB-PUB ID with a pipe (|).
Example: ZFIN:ZDB-PUB-030716-20|PMID:12798293
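
A minimal Python sketch of separating the two identifiers in this column is shown below.
The function name split_reference is hypothetical.  The sketch assumes each
pipe-separated part has the form prefix:identifier, as in the example above.

```python
def split_reference(field):
    """Map each database prefix in a DB:Reference value to its identifier."""
    refs = {}
    for part in field.split("|"):
        # partition() splits on the first colon only.
        prefix, _, identifier = part.partition(":")
        refs[prefix] = identifier
    return refs


refs = split_reference("ZFIN:ZDB-PUB-030716-20|PMID:12798293")
```

Here refs["ZFIN"] holds the ZDB-PUB ID, and refs.get("PMID") returns None when no PubMed ID is listed.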

7: Evidence (cardinality = 1)
-----------------------------
The evidence code for the GO annotation; one of IMP, IGI, IPI, ISS, IDA, IEP, IEA, TAS, NAS, ND, IC.

For evidence code details: http://www.geneontology.org/GO.evidence.html

8: With (or) From (cardinality = 0,1,>1)
----------------------------------------
ZFIN File: This column contains the identifier for a related object used in the inference for annotations where the
evidence code is IGI, IPI, IMP, ISS or IC.  In every case, the abbreviation for the source database 
is listed, followed by a colon and the ID from that database for the object being referred to.

If not blank, this field can contain one or more of the following: 
ISS : UniProt ID, InterPro ID, RefSeq, RefPept, or EMBL (nucl. or prot.) ID
IMP : ZDB-FISH ID, ZDB-GENO ID, ZDB-GENE ID, or ZDB-MRPHLNO ID
IPI : UniProt ID, EMBL protein ID, or RefPept ID
IGI : ZDB-FISH ID, ZDB-GENO ID, ZDB-GENE ID, EMBL nucl. ID, or RefSeq ID
IC  : GO ID

Examples:
IGI : ZFIN:ZDB-FISH-000821-1
ISS : UniProt:P35569
ISS : EMBL:AF064523
ISS : InterPro:IPR001092
IC  : GO:0045298
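
The Python sketch below shows one way to split a With (or) From value into database and
identifier pairs, and to flag values that appear alongside an evidence code other than the
five listed above.  The function names are hypothetical.  The use of a pipe (|) between
multiple values is an assumption based on the GO Consortium file format rather than on
this README.  Because the real file may populate this column in cases not described here,
a False result from with_is_expected is best treated as a prompt to inspect the line, not
as an error.

```python
# Evidence codes that this README lists as using the With (or) From column.
WITH_EVIDENCE_CODES = {"IGI", "IPI", "IMP", "ISS", "IC"}


def parse_with(field):
    """Split a With (or) From value into (database, identifier) tuples."""
    if not field:
        return []
    # Assumption: multiple values are separated by a pipe character.
    return [tuple(item.split(":", 1)) for item in field.split("|")]


def with_is_expected(evidence, field):
    """True when the column is blank or the evidence code is one listed above."""
    return not field or evidence in WITH_EVIDENCE_CODES


pairs = parse_with("UniProt:P35569|EMBL:AF064523")
```

Note that a GO ID such as GO:0045298 is split into the pair ("GO", "0045298").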

9: Aspect (cardinality = 1)
---------------------------
Which ontology the GO term belongs to: Function (F), Process (P) or Component (C).

10: DB_Object_Name (cardinality = 1)
------------------------------------
ZFIN File: The full name of the gene in ZFIN.
Example: atonal homolog 7
  
11: DB_Object_Synonym (cardinality = 0,1)
-----------------------------------------
Alternative names by which the database object is known.  
ZFIN File: ZFIN does not currently populate this field, so it is always blank.  This is subject to change in the future.

12: DB_Object_Type (cardinality = 1)
------------------------------------
The type of object being annotated.
ZFIN File: Currently always "gene_product" for gene_association.zfin.

13: Taxon (cardinality = 1)
---------------------------
The taxonomic identifier of the species encoding the gene product.
ZFIN File: gene_association.zfin only contains data for Danio rerio, so this field will always be taxon:7955.

14: Date (cardinality = 1)
--------------------------
The date the annotation was last edited in the format 'YYYYMMDD'.
Example: 20040821
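
For illustration, a 'YYYYMMDD' value can be converted to a date object with the Python
standard library.  The function name parse_gaf_date is hypothetical.

```python
from datetime import datetime


def parse_gaf_date(value):
    """Convert a 'YYYYMMDD' string into a datetime.date; raises ValueError if malformed."""
    return datetime.strptime(value, "%Y%m%d").date()


edited = parse_gaf_date("20040821")
```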

15: Assigned_by (cardinality = 1)
---------------------------------
The source of the GO annotation.
ZFIN File: One of ZFIN, UniProt, HGNC, BHF, or several others.  "ZFIN" indicates that a ZFIN curator made the annotation, 
or that translation tables were applied to data locally to generate IEA GO annotations.

16: Currently empty
-------------------

17: Currently empty
-------------------


IV.  METHODS OF GO ANNOTATION
================================================================================

IV.1 Database Objects
---------------------
Currently all GO annotations in ZFIN are attributed to genes.  The GO terms describe the attributes 
of the products of the gene, which could be a protein in the case of many genes, but also 
could be RNA when the gene product is a non-translated RNA.

IV.2 Redundancy in gene_association.zfin
----------------------------------------
NOTE: In this section, "GO annotation" is meant to include the gene, GO Term, evidence code, 
"with" data, and qualifiers associated with an annotation.  Aside from the publication, these 
items define a unique annotation.  Redundant GO annotations from the literature are curated 
at ZFIN with the following qualifications:

    A) If a single paper has multiple experiments supporting a single GO annotation, the annotation 
is only made one time.  For example, if a single paper shows confocal images of a protein in the 
cell nucleus, and a Western blot of a nuclear extract also showing this protein is in the nucleus, 
we will have one GO annotation to capture this information from this publication ("nucleus" by IDA).  
However, we will capture multiple annotations to the same GO term from a single publication when the 
annotations are not identical (different evidence or "with" column contents for example).  

    B) If a paper has data supporting a GO annotation that has already been made from independent 
publications, ZFIN curators will still add the GO annotation to our database until the annotation 
is represented 5 independent times.  Making additional instances of an annotation is optional once 
5 identical annotations (including GO term, evidence code, qualifiers and "with" data) from 5 
independent publications have been made.  

    C) If two or more papers show conflicting GO data, all sets of GO data are recorded.  If in 
subsequent references a conclusion is reached and agreed upon by all parties involved, then GO terms 
which are no longer correct will be removed or updated as these situations are identified.
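
The definition of a unique annotation given in the NOTE above can be sketched in Python
as follows.  This is not ZFIN's curation tooling.  It only illustrates how a reader of the
file might count the independent publications that support the same annotation.  The
dictionary keys (db_object_id, with_from, and so on) and the function names are
hypothetical.

```python
from collections import defaultdict


def annotation_key(row):
    """Gene, GO term, evidence code, "with" data and qualifier define one annotation."""
    return (
        row["db_object_id"],
        row["go_id"],
        row["evidence"],
        row["with_from"],
        row["qualifier"],
    )


def publications_per_annotation(rows):
    """Count the distinct references that support each unique annotation."""
    pubs = defaultdict(set)
    for row in rows:
        pubs[annotation_key(row)].add(row["db_reference"])
    return {key: len(refs) for key, refs in pubs.items()}
```

Under rule B above, a count of 5 or more for a key would mean that further identical
annotations from new publications are optional.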


IV.3 Electronic (IEA) GO Annotation in ZFIN
-------------------------------------------
IEA-supported GO annotations in ZFIN are generated electronically through local application of the 
interpro2go, ec2go and spkw2go translation tables.  More information about the translation tables can be found here:
http://www.geneontology.org/GO.indices.shtml


IV.4 Use of the ND Evidence Code in ZFIN
----------------------------------------

The Gene Ontology (GO) Consortium created the evidence code "ND" to indicate "no biological data available".  
This code is used for annotations to the three root terms `molecular_function ; GO:0005554', 
`biological_process ; GO:0000004' or `cellular_component ; GO:0008372'.  In ZFIN, the use 
of any of these three GO terms is attributed to the reference ZDB-PUB-031118-1 and supported by the ND evidence 
code.  These annotations signify that a curator has examined the literature associated with that gene in ZFIN, 
and on the date of that analysis, there was no information supporting an annotation to any GO term in that ontology.  
As soon as more meaningful GO terms are located for a gene, the annotations to "unknown" are replaced.
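
As an illustration, annotations of this kind could be picked out of the file by checking
the evidence code and the reference described above.  The function name is_nd_annotation
is hypothetical.  The sketch assumes the reference column holds pipe-separated
prefix:identifier values, as described for column 6.

```python
ND_REFERENCE = "ZDB-PUB-031118-1"


def is_nd_annotation(evidence, reference):
    """True for ND annotations attributed to the ZFIN reference used for ND."""
    # Compare whole identifiers so that a longer ID sharing the same prefix
    # is not matched by accident.
    identifiers = [part.partition(":")[2] for part in reference.split("|")]
    return evidence == "ND" and ND_REFERENCE in identifiers
```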


V.  CONTACT INFORMATION
================================================================================

Questions or comments about this file should be sent to: curators@zfin.org