GOA README
----------

1. Contents
------------

1. Contents
2. Introduction
3. Differences in the UniProt gene association file from GO and GOA ftp sites
4. List of files and file formats
5. Contacts
6. Copyright Notice

2. Introduction
----------------

For full information on the GOA project, please go to:
http://www.ebi.ac.uk/GOA

GOA (GO Annotation) is a project run by the European Bioinformatics
Institute that aims to provide assignments of proteins to the Gene
Ontology (GO) resource. The goal of the Gene Ontology Consortium is to
produce a dynamic controlled vocabulary that can be applied to all
organisms, even while the knowledge of the gene product roles in cells
is still accumulating and changing. In the GOA project, this vocabulary
is applied to all proteins described in the UniProt (Swiss-Prot and
TrEMBL) Knowledgebase, and to RNAs and macromolecular complexes,
identified by RNAcentral and Complex Portal identifiers, respectively,
to create "annotations", which are provided in the UniProt annotation
files.

GOA also provides species-specific annotation sets using the UniProtKB
Reference Proteome sets. The Reference Proteome for a species comprises
the protein sequences annotated in Swiss-Prot or the longest TrEMBL
transcript if there is no Swiss-Prot record. The species-specific
annotation sets have been filtered in order to reduce redundancy. This
filtering process consists of removing electronic annotations to
less-specific GO terms where an annotation (either manual or
electronic) to a more specific term exists. Additionally, if there are
multiple manual annotations to the same GO term, then preference is
given to those with experimental evidence codes. The current set of
species that we provide these files for is listed on our project
website. All files can be downloaded from
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa.

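The redundancy filtering described above can be illustrated with a
minimal sketch. This is not the GOA filtering script: the Annotation
record, the filter_annotations function, the PROT1 accession and the
hand-written ancestor map are all hypothetical, and "preference is
given to" is interpreted here, as an assumption, as dropping
non-experimental manual annotations when an experimental one exists for
the same term.

```python
from collections import defaultdict
from typing import NamedTuple

# Evidence code groupings used by this sketch (a subset of GO evidence codes).
ELECTRONIC = {"IEA"}
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


class Annotation(NamedTuple):
    """Hypothetical, simplified annotation record (a real GAF line has more columns)."""
    protein: str
    go_id: str
    evidence: str


def filter_annotations(annotations, ancestors):
    """Apply the two filtering rules to a list of annotations.

    ancestors maps a GO id to the set of its less-specific (ancestor) GO ids.
    It is an assumed input; a real pipeline would derive it from the ontology.
    """
    by_protein = defaultdict(list)
    for ann in annotations:
        by_protein[ann.protein].append(ann)

    kept = []
    for anns in by_protein.values():
        # Terms that are less specific than some other term annotated
        # to the same protein.
        shadowed = set()
        for ann in anns:
            shadowed |= ancestors.get(ann.go_id, set())

        # Rule 1: drop electronic annotations to less-specific terms.
        anns = [a for a in anns
                if not (a.evidence in ELECTRONIC and a.go_id in shadowed)]

        # Rule 2: among multiple manual annotations to one term,
        # prefer those with experimental evidence codes.
        by_term = defaultdict(list)
        for ann in anns:
            by_term[ann.go_id].append(ann)
        for term_anns in by_term.values():
            manual = [a for a in term_anns if a.evidence not in ELECTRONIC]
            has_experimental = any(a.evidence in EXPERIMENTAL for a in manual)
            if len(manual) > 1 and has_experimental:
                kept.extend(a for a in term_anns
                            if a.evidence in ELECTRONIC
                            or a.evidence in EXPERIMENTAL)
            else:
                kept.extend(term_anns)
    return kept


# Toy ancestor map written by hand for illustration only.
toy_ancestors = {
    "GO:0004672": {"GO:0016301", "GO:0003824"},
    "GO:0016301": {"GO:0003824"},
}

toy_annotations = [
    Annotation("PROT1", "GO:0016301", "IEA"),  # electronic, less specific
    Annotation("PROT1", "GO:0004672", "IDA"),  # manual, experimental
    Annotation("PROT1", "GO:0004672", "ISS"),  # manual, not experimental
    Annotation("PROT1", "GO:0003824", "TAS"),  # manual, less specific
]

for ann in filter_annotations(toy_annotations, toy_ancestors):
    print(ann)
```

In this toy input the IEA annotation is removed by the first rule and
the ISS annotation by the second. The TAS annotation to the
less-specific term is kept, because the first rule only removes
electronic annotations.
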
Additional non-filtered species-specific sets are available from the
proteomes sets, which include separate annotation files for all species
whose genome has been fully sequenced, where the sequence is publicly
available, and where the proteome contains >25% GO annotation.

UniProt manual GO annotations are created by UniProt curators from the
EBI and the Swiss Institute of Bioinformatics. The dataset is
supplemented with manual GO annotation from external model organism
databases and specialist groups (the full list is on our project
website).

For manual annotation, curators aim to capture the most recent data
from curated papers that provide experimental evidence for the unique
features of a given protein. Our approach is protein-centric rather
than paper-centric, as we don't read all papers that might be used to
assign the same GO term. However, when curators read experimental
evidence that further verifies a function, redundant annotations to a
term using different references are created, as this can provide
greater confidence in a GO annotation.

3. Differences in the UniProt gene association file from GO and GOA ftp sites.
------------------------------------------------------------------------------

Please note that both the filtered and unfiltered versions of the GOA
UniProt gene association file are available from the GO Consortium ftp
site (ftp.geneontology.org). The filtered version does not contain
annotations for those species where a different Consortium group is
primarily responsible for providing GO annotations.

If you would like to download an unfiltered GOA UniProt gene
association file, please use either the GOA ftp site:

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz

or the submissions folder in the GO Consortium ftp site:

ftp://ftp.geneontology.org/pub/go/gene-associations/submission/goa_uniprot_all.gaf.gz

Further information on this filtering script can be found at:
http://geneontology.org/page/annotation-quality-control-checks

4. List of files and file formats
----------------------------------

All annotation sets are provided in both GAF2.1
(http://geneontology.org/page/go-annotation-file-gaf-format-21) and
GPAD1.1
(http://geneontology.org/page/gene-product-association-data-gpad-format)
formats, the format being indicated by the file suffix (.gaf for GAF2.1
and .gpa for GPAD1.1). All metadata files are provided in GPI1.2
(http://geneontology.org/page/gene-product-information-gpi-format)
format; the file suffix is .gpi.

In the file names below, <species> stands for the name of the species.

The GOA project provides the following sets of GO annotations and gene
product metadata to the GO Consortium:

i) goa_uniprot_all.gaf
   This file contains all GO annotations and information for proteins
   in the UniProt KnowledgeBase (UniProtKB) and for entities other than
   proteins, e.g., macromolecular complexes (Complex Portal
   identifiers) and RNAs (RNAcentral identifiers).

ii) goa_<species>.[gaf|gpa]
   This set contains all GO annotations for canonical accessions from
   the UniProt reference proteome for the species.

iii) goa_<species>_isoform.[gaf|gpa]
   This set contains all GO annotations for isoforms of canonical
   accessions from the UniProt reference proteome for the species.

iv) goa_<species>_complex.[gaf|gpa]
   This set contains all GO annotations for macromolecular complexes
   (identified by Complex Portal identifiers) for the species.

v) goa_<species>_rna.[gaf|gpa]
   This set contains all GO annotations and information for RNAs
   (identified by RNAcentral identifiers) for the species.

vi) goa_<species>.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   canonical accessions from the UniProt reference proteome for the
   species, whether they have GO annotations or not.

vii) goa_<species>_isoform.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   isoforms of canonical accessions from the UniProt reference proteome
   for the species, whether they have GO annotations or not.

viii) goa_<species>_complex.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   macromolecular complexes (identified by Complex Portal identifiers)
   for the species, whether they have GO annotations or not.

ix) goa_<species>_rna.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   RNAs (identified by RNAcentral identifiers) for the species, whether
   they have GO annotations or not.

Other files we provide are available from
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa and include:

x) goa_uniprot_all.gpa
   This file contains all GO annotations for proteins in the UniProt
   KnowledgeBase (UniProtKB) and for entities other than proteins,
   e.g., macromolecular complexes (Complex Portal identifiers) and RNAs
   (RNAcentral identifiers).

xi) goa_uniprot_all.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   canonical entries in the UniProt KnowledgeBase (UniProtKB),
   regardless of whether they have GO annotations, and for isoforms of
   canonical entries for which we have GO annotations.

xii) goa_uniprot_gcrp.[gaf|gpa]
   This set contains all GO annotations for canonical accessions from
   the UniProt reference proteomes for all species, which provide one
   protein per gene.

xiii) goa_uniprot_gcrp.gpi
   This file contains metadata (name, symbol, synonyms, etc.) for all
   canonical entries from the UniProt reference proteomes for all
   species, which provide one protein per gene, regardless of whether
   they have GO annotations.

All files we provide are gzipped to reduce their size.

5. Contacts
-----------

Please direct any questions to goa@ebi.ac.uk
We welcome any feedback.

6. Copyright Notice
--------------------

GOA - GO Annotation
Copyright 2018 (C) The European Bioinformatics Institute.
This README and the accompanying databases may be copied and
redistributed freely, without advance permission, provided that this
copyright statement is reproduced with each copy.

$Date: 2018/04/11 11:18:19 $