RELEASE.notes

Compugen Gene Ontology Gene Association Data
January 14, 2004

Compugen Flat File Release 0.6.1
Distribution Release Notes

This document describes the format and content of the flat files that
comprise public releases of the Compugen Gene Ontology Gene Association
data.

If you have any questions or comments about Compugen or this document,
please contact Compugen USA, Inc. via email at GO@cgen.com or write to:

    Compugen USA, Inc.
    7 Center Drive, Suite 9
    Jamesburg, NJ 08831
    Phone: (609) 655-5105
    Fax:   (609) 655-5114

==========================================================================
TABLE OF CONTENTS
==========================================================================

1. INTRODUCTION
   1.1 Compugen
   1.2 Release version 0.6.1
   1.3 Statistics

2. DATA AND METHODOLOGIES
   2.1 Data
   2.2 Brief Introduction of Methodologies

3. FILES
   3.1 File Descriptions
   3.2 File Format
   3.3 Sample Gene Ontology Association Data

4. GENERAL INFORMATION
   4.1 Citing Compugen USA, Inc.
   4.2 Other Methods of Accessing Compugen's GO Gene Association Data
   4.3 Request for Corrections and Comments
   4.4 Disclaimer

==========================================================================

1. INTRODUCTION

1.1 Compugen

Compugen is a genomics-based drug and diagnostic discovery company whose
mission is to increase the probability of successful development of novel
drug and diagnostic products by incorporating ideas and methods from
mathematics, computer science, and physics into the disciplines of
biology, organic chemistry, and medicine. This unique capability results
in powerful predictive models, which are both advancing the understanding
of important biological phenomena and enabling the discovery of numerous
potential therapeutic and diagnostic products. Compugen has established
collaborations with leading pharmaceutical and diagnostic companies, and
has begun in-house development of selected putative therapeutic proteins
that it has discovered.
Compugen is publicly traded on Nasdaq (NASDAQ: CGEN) and on the Tel Aviv
Stock Exchange. Compugen has corporate offices in Israel, a wholly owned
subsidiary headquartered in New Jersey (Compugen USA, Inc.), and a
marketing and customer support presence in California and Maryland. For
additional information, please visit Compugen's corporate web site at
www.cgen.com.

1.2 Release version 0.6.1

Compugen USA, Inc. is distributing the Gene Ontology Gene Association
Data files as of January 14, 2004. This release includes three files, as
detailed below.

1.3 Statistics

The current release includes Gene Ontology Gene Associations to 231373
UniProt Swiss-Prot and TrEMBL (Dec. 15, 2003 release) protein entries and
540103 GenBank version 139.0 protein entries, corresponding to 488851
unique proteins with a total of 2387153 GO associations. Please note that
some of the proteins are listed in both gene association files, and that
some proteins and their annotations may have more than one record because
they are present in different species.

2. DATA AND METHODOLOGIES

2.1 Data

Compugen USA, Inc. Gene Ontology Association Data Release version 0.6.1
was built from data collected and extracted from the following public
databases and files:

    GenBank release 139.0
    UniProt release of Dec. 15, 2003
    Medline database as of April 6, 2001

and from the following files from the Gene Ontology Consortium,
downloaded on Oct. 22, 2003:

    gene_association.fb
    gene_association.mgi
    gene_association.sgd
    gene_association.wb
    gene_association.goa_sptr

2.2 Brief Introduction of Methodologies

Compugen USA, Inc. has developed a proprietary method to automatically
annotate protein sequences using the controlled vocabularies of Gene
Ontology. The annotation was centered on homology comparison, including
protein profiles, and on Proloc(TM), Compugen Ltd.'s proprietary software
for protein subcellular localization.
In addition, information from the definition lines of the sequence
databases and from the Medline database was extracted to increase the
accuracy of annotation and to provide novel annotations. A detailed
description of our methodologies can be found in Genome Research,
volume 12, pages 785-794.

3. FILES

3.1 File Descriptions

This release consists of three files. The following list briefly
describes each of the files included in the distribution, along with
their sizes.

1. RELEASE.notes
   Release notes (this document).

2. gene_association.Compugen_GenBank
   Gene association file for genes from GenBank release 139.0,
   161249720 bytes.

3. gene_association.Compugen_UniProt
   Gene association file for proteins from the UniProt release of
   Dec. 15, 2003, 71301849 bytes.

Both the gene_association.Compugen_UniProt and
gene_association.Compugen_GenBank files exclude any Gene Ontology
association used as input data from the files listed in Section 2.1.
Please contact GO@cgen.com to request a combined gene association file.

3.2 File Format

The file formats conform to the guidelines provided by the Gene Ontology
Consortium (www.geneontology.org, see also Section 3.3). All gene
associations have the evidence code 'IEA' to indicate that the
annotations released here were obtained through computational methods.

3.3 Sample Gene Ontology Association Data

An example of a complete gene association is provided here.
(tab spacing is condensed here)

----------------------------------------------------------------------
CGEN PrID69417 GI10 GO:0016538 CGEN:ProdVersion0.6.1 IEA F protein taxon:9913 20040107
----------------------------------------------------------------------

Legend:

CGEN                   Compugen database name
PrID69417              Unique ID assigned by Compugen
GI10                   GenBank GI number (for UniProt proteins, the
                       UniProt accession number)
GO:0016538             GO number
CGEN:ProdVersion0.6.1  Reference to Compugen internal production version
IEA                    Evidence code
F                      Ontology category
protein                Indicates that a protein is annotated
taxon:9913             Taxonomy ID number (see NCBI for more details)
20040107               Date (here Jan. 7, 2004)

4. GENERAL INFORMATION

4.1 Citing Compugen USA, Inc.

When you use Compugen data in your research, please reference Compugen
USA, Inc. as:

    Compugen USA, Inc. (http://www.cgen.com)

4.2 Other Methods of Accessing Compugen's GO Gene Association Data

The data provided here, and any updates, can also be obtained by e-mail
inquiry to GO@cgen.com.

4.3 Request for Corrections and Comments

We welcome your suggestions for improvements. Compugen's GO gene
association data is a work in progress, so we are especially interested
in learning about errors or inconsistencies in the data. Suggestions and
corrections can be sent by e-mail to GO@cgen.com.

4.4 Disclaimer

The Compugen public GO Gene Association Data (the "Information") is
provided on an "as is" basis. Compugen expressly disclaims all implied
warranties, including without limitation any warranty of
non-infringement, and any warranty in respect of quality or fitness for
any particular use of the Compugen public GO Annotation. Compugen Inc.
does not warrant or assume any legal liability or responsibility for the
accuracy, completeness, or usefulness of any of the Information, or
process disclosed.
Compugen expressly disclaims any warranty that the Information will not
be subject to patents which already have been issued or to patents which
may be issued in the future, owned by any other entity including Compugen
itself or entities related to Compugen. The Information is experimental
in nature, and is not approved by the U.S. Food and Drug Administration
or any other regulatory body. Compugen will not be liable for any
indirect, special, incidental, consequential or punitive damages or any
other damages based on economic harm, injury to property or lost profits,
regardless of whether Compugen has been advised. Compugen shall not be
responsible for any damage caused, directly or indirectly, in connection
with your use of the Information.

Disclaimer of Endorsement Information

It is not the intention of Compugen Inc. to provide definitive functional
annotation for the genes and proteins described in this data file, but
rather to provide users with information to better understand the
functions of these genes and proteins and their involvement in biological
processes.

Copyright Status

Unless stated otherwise, the Information may be freely downloaded and
reproduced. However, any publication or commercial use of the Information
must include acknowledgement of Compugen as the data source. Please
reference Compugen as:

    Compugen USA, Inc.: http://www.cgen.com/

Compugen USA, Inc. public GO Annotation Availability

The Compugen USA, Inc. public GO Annotation is designed to provide and
encourage access within the scientific community to an up-to-date and
comprehensive sequence annotation. Therefore, Compugen USA, Inc. places
no restrictions on the use or distribution of the Compugen USA, Inc.
public GO Annotation data.

January 14, 2004
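
Illustration: reading the sample record from Section 3.3

The legend in Section 3.3 names ten fields. As a rough illustration only,
the Python sketch below splits the condensed sample record into those
fields. It is not Compugen software. The field names are hypothetical
labels chosen for this example, and the database name is written as in
the legend (CGEN). The sketch assumes the condensed, whitespace-separated
form shown in Section 3.3; the distributed files are tab-delimited and
may contain empty columns, so a real reader would need to split on tabs
and follow the Gene Ontology Consortium column layout instead.

```python
from collections import namedtuple
from datetime import date, datetime

# Hypothetical field names, one per legend entry in Section 3.3.
Association = namedtuple(
    "Association",
    "database compugen_id accession go_id reference evidence "
    "ontology object_type taxon_id date",
)


def parse_condensed_record(line):
    """Parse one record written in the condensed sample form.

    Assumes exactly ten whitespace-separated fields, as in the sample.
    """
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 fields, got %d" % len(fields))
    (database, compugen_id, accession, go_id, reference,
     evidence, ontology, object_type, taxon, raw_date) = fields
    if not go_id.startswith("GO:"):
        raise ValueError("not a GO number: %r" % go_id)
    if not taxon.lower().startswith("taxon:"):
        raise ValueError("not a taxon field: %r" % taxon)
    return Association(
        database, compugen_id, accession, go_id, reference,
        evidence, ontology, object_type,
        int(taxon.split(":", 1)[1]),                   # numeric taxonomy ID
        datetime.strptime(raw_date, "%Y%m%d").date(),  # YYYYMMDD date
    )


SAMPLE = ("CGEN PrID69417 GI10 GO:0016538 CGEN:ProdVersion0.6.1 "
          "IEA F protein taxon:9913 20040107")

record = parse_condensed_record(SAMPLE)
print(record.go_id, record.taxon_id, record.date.isoformat())
# prints: GO:0016538 9913 2004-01-07
```

Converting the taxonomy ID to an integer and the date to a date object
makes the record easier to filter, for example by species or by
annotation date.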