RELEASE.notes

Compugen Gene Ontology Gene Association Data
July 10, 2002
Compugen Flat File Release 0.5.1

Distribution Release Notes

This document describes the format and content of the flat files that
comprise public releases of the Compugen Gene Ontology Gene Association
data.

If you have any questions or comments about Compugen or this document,
please contact Compugen Inc. via email at GO@cgen.com or:

    Compugen Inc.
    7 Center Drive, suite 7
    Jamesburg, NJ 08831
    Phone: (609) 655-5105
    Fax:   (609) 655-5114

IMPORTANT NOTICE: Because the files are large, both the GenBank and
Swiss-Prot gene association files are gzipped. Please use "save link
as" in your web browser to download the files rather than trying to
open them in the browser.

==========================================================================
TABLE OF CONTENTS
==========================================================================

1. INTRODUCTION
   1.1 Compugen Inc.
   1.2 Release version 0.5.1
   1.3 Statistics
2. DATA AND METHODOLOGIES
   2.1 Input Data
   2.2 Brief Introduction of Methodologies
3. FILES
   3.1 File Descriptions
   3.2 File Format
   3.3 Sample Gene Ontology Gene Association
4. GENERAL INFORMATION
   4.1 Citing Compugen
   4.2 Other Methods of Accessing Compugen's GO Gene Association Data
   4.3 Request for Corrections and Comments
   4.4 Disclaimer

==========================================================================

1. INTRODUCTION

1.1 Compugen Inc.

Compugen develops and markets platforms, tools and products to
accelerate post-genomic research, advance the study of proteins and
protein pathways, and support drug target discovery. These products
include: LEADS, Gencarta, DNA Chip design, Z3, LabOnWeb.com and
Bioccelerators. The Company's products and methodologies are developed
through its leadership position in the convergence of the life sciences
and computational technologies.
Utilizing its in-house molecular biology laboratories, Compugen both
validates its methodologies and is engaged in original genomic and
proteomic research. Genes, proteins and other intellectual property
discovered by the Company in its original research activities are
patented by the Company and will be commercialized, primarily by
licensing to third parties, through its Novel Genomics Division.

For additional information, please visit Compugen's Corporate Web Site
at www.cgen.com and the Company's Internet research engine for
molecular biologists, www.LabOnWeb.com.

1.2 Release version 0.5.1

Compugen Inc. (a subsidiary of Compugen LTD.) is distributing the Gene
Ontology Gene Association Data files, as of July 3, 2002. This release
includes three files as detailed below.

1.3 Statistics

The current release includes Gene Ontology Gene Associations to 436,476
SwissProt version 40 and TrEMBL protein entries with unique protein
sequences, with 1,102,590 novel GO associations, and to 641,179 unique
GenBank version 129 GenPept proteins, with a total of 3,116,673 GO
associations. GO annotations in the EBI annotation file
(gene_association.goa_sptr as of 2002/06/21) are not included. Please
note that some proteins are listed in both gene association files, and
some proteins may have more than one record because they are present in
different species.

2. DATA AND METHODOLOGIES

2.1 Input Data

Compugen Inc. Gene Ontology Association Data Release version 0.3.1 was
built from data collected and extracted from the following public
databases and files.
    GenBank release 129.0
    SwissProt release 40.0
    Enzyme database Release 27.0
    Medline databases as of June 6, 2001

And the following files from Gene Ontology Consortium:

    gene_association.fb             version 1.35,  2002/04/01
    gene_association.goa_sptr                      2002/06/21
    gene_association.gramene_oryza  version 1.1,   2002/03/17
    gene_association.mgi            version 1.71,  2002/05/17
    gene_association.sgd            version 1.395, 2002/05/09
    gene_association.tair                          2002/05/29
    gene_association.wb             version 1.10,  2002/05/01
    ec2go                           version 1.3,   2002/05/10
    spkw2go                         version 1.23,  2000/05/02

2.2 Brief Introduction of Methodologies

Compugen Inc. has developed a proprietary method to automatically
annotate protein sequences using the controlled vocabularies of Gene
Ontology. The annotation was centered on homology comparison, including
protein profiles, and on Proloc(TM), Compugen LTD.'s proprietary
software for protein subcellular localization. In addition, information
from the definition lines of the sequence databases and from the
Medline database was extracted to increase the accuracy of annotation
and to provide novel annotations. Our publication listed below (see
section 4.1) provides a detailed description of the methodologies.

3. FILES

3.1 File Descriptions

This release consists of three files. The following list briefly
describes each of the files included in the distribution, along with
their sizes.

1. RELEASE.notes - Release notes (this document).
2. Gene_association.GenBank - gene association file for genes from
   GenBank release 129.
3. Gene_association.SP - gene association file for proteins from
   SwissProt release 40.

3.2 File Format

The file formats conform to the guidelines provided by the Gene
Ontology Consortium (www.geneontology.com, see also section 3.3). All
gene associations have the evidence code 'IEA' to indicate that the
annotations released here were obtained through computational methods.

3.3 Sample Gene Ontology Gene Association

An example of a complete gene association is provided here.
----------------------------------------------------------------------
CGEN PrID131022 O00116 GO:0008609 CGEN:ProdVersion0.5.1 IEA F protein TaxonID:9606
----------------------------------------------------------------------

Legend:

CGEN                    Compugen database name
PrID131022              Unique ID assigned by Compugen
O00116                  SwissProt locus number (for the GenBank
                        association file, the GenBank identifier)
GO:0008609              GO number
CGEN:ProdVersion0.5.1   Reference to Compugen internal production
                        version
IEA                     Evidence code
F                       Ontology category
protein                 Indicates that a protein is annotated
TaxonID:9606            Taxonomy ID number (see NCBI for more details)

4. GENERAL INFORMATION

4.1 Citing Compugen

When you use Compugen data in your research, please reference Compugen,
Inc. as Compugen, Inc. (http://www.cgen.com, http://www.labonweb.com)
and cite the following article as a reference.

    Hanqing Xie, Alon Wasserman, Zurit Levine, Amit Novik, Vladimir
    Grebinskiy, Avi Shoshan, and Liat Mintz.
    Large-Scale Protein Annotation through Gene Ontology.
    Genome Research, Vol. 12, Issue 5, 785-794, May 2002

4.2 Other Methods of Accessing Compugen's GO Gene Association Data

The data provided here, or updates if any, can also be obtained through
e-mail inquiry to GO@cgen.com.

4.3 Request for Corrections and Comments

We welcome your suggestions for improvements. Compugen's GO gene
association data is a work in progress. Therefore, we are especially
interested in learning about errors or inconsistencies in the data.
Suggestions and corrections can be sent by e-mail to: GO@cgen.com.

4.4 Disclaimer

The Compugen public GO Gene Association Data (the "Information") is
provided on an "as is" basis. Compugen expressly disclaims all implied
warranties, including without limitation any warranty of
non-infringement, and any warranty in respect of quality or fitness for
any particular use of the Compugen public GO Annotation. Compugen Inc.
does not warrant or assume any legal liability or responsibility for
the accuracy, completeness, or usefulness of any of the Information, or
process disclosed. Compugen expressly disclaims any warranty that the
Information will not be subject to patents which already have been
issued or to patents which may be issued in the future, owned by any
other entity including Compugen itself or entities related to Compugen.
The Information is experimental in nature, and is not approved by the
U.S. Food and Drug Administration or any other regulatory body.
Compugen will not be liable for any indirect, special, incidental,
consequential or punitive damages or any other damages based on
economic harm, injury to property or lost profits, regardless of
whether Compugen has been advised. Compugen shall not be responsible
for any damage caused, directly or indirectly, in connection with your
use of the Information.

Disclaimer of Endorsement Information

It is not the intention of Compugen Inc. to provide definitive
functional annotation for the genes and proteins described in this data
file, but rather to provide users with information to better understand
the functions of these genes and proteins and their involvement in
biological processes.

Copyright Status

Unless stated otherwise, the Information may be freely downloaded and
reproduced. However, any publication or commercial use of the
Information must include recognition of Compugen according to standard
practice for assigning scientific credit, either through authorship or
acknowledgment as may be appropriate. Please reference Compugen as:

    Compugen Inc.: http://www.cgen.com/, http://www.labonweb.com

Compugen Inc. public GO Annotation Availability

The Compugen Inc. public GO Annotation is designed to provide and
encourage access within the scientific community to an up-to-date and
comprehensive sequence annotation. Therefore, Compugen Inc. places no
restrictions on the use or distribution of the Compugen Inc.
public GO Annotation data.

July 10, 2002
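
Illustrative note on reading the sample record (section 3.3)

The sample gene association in section 3.3 can be read programmatically.
The following is a minimal Python sketch and is not part of the Compugen
distribution. It assumes each record is a single line whose non-empty
fields are separated by whitespace and appear in the nine-field order
given in the legend. The names Association and parse_association are
hypothetical, chosen only for this illustration. If the distributed
files use tab-delimited columns with additional empty fields, or fields
that contain spaces, the splitting logic would need to be adapted.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """One gene association, with fields in the order of the section 3.3 legend."""
    database: str      # Compugen database name, e.g. "CGEN"
    compugen_id: str   # unique ID assigned by Compugen, e.g. "PrID131022"
    accession: str     # SwissProt locus number or GenBank identifier
    go_id: str         # GO number, e.g. "GO:0008609"
    reference: str     # e.g. "CGEN:ProdVersion0.5.1"
    evidence: str      # evidence code ("IEA" throughout this release)
    ontology: str      # ontology category, e.g. "F"
    object_type: str   # e.g. "protein"
    taxon_id: int      # numeric part of "TaxonID:9606"


def parse_association(line: str) -> Association:
    """Parse one record; assumes nine whitespace-separated fields (an assumption)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    prefix, _, number = fields[8].partition(":")
    if prefix != "TaxonID" or not number.isdigit():
        raise ValueError(f"unexpected taxonomy field: {fields[8]!r}")
    return Association(*fields[:8], int(number))


# The sample record from section 3.3.
sample = ("CGEN PrID131022 O00116 GO:0008609 "
          "CGEN:ProdVersion0.5.1 IEA F protein TaxonID:9606")
record = parse_association(sample)
print(record.go_id, record.evidence, record.taxon_id)
```

Because the association files are distributed gzipped, one option is to
open them in text mode with gzip.open(path, "rt") from the Python
standard library and pass each non-empty line to parse_association.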