FlyBase Progress Report.
Gene Ontology Consortium Meeting, September 26th - 27th 2003.

1. CURRENT GO ANNOTATIONS IN FLYBASE

                                                    May 22nd 2003   Sept. 19th 2003
Total genes annotated with at least 1 GO term:               7745             10384
non-melanogaster genes with GO annotations:                                      85
Total number of process terms:                              11527             19904
Number of unique process terms:                              8730             16649
Number of 'biological_process unknown' annotations:                             191
Total number of function terms:                             12354             16357
Number of unique function terms:                             8824             12670
Number of 'molecular_function unknown' annotations:                             237
Total number of component terms:                             7243              7740
Number of unique component terms:                            5302              5705
Number of 'cellular_component unknown' annotations:                             255
Total lines of GO annotation in FlyBase:                    31124             44001
GO terms supported by IEA:                                    129              9995

In summary:
21 species of Drosophila (including melanogaster) have at least 1 gene in FlyBase with one or more GO annotations.
The large increase in IEA-supported annotations is the result of adding GO annotations from the PANTHER analysis into FlyBase (see part 2D of this report).
The large number of process-, function- and component-unknown annotations comes principally from a gene-by-gene approach to both literature curation and Release 3 sequence curation (see parts 2B and 2C of this report).

2. ANNOTATION

A. Swiss-Prot Curation

Eleanor Whitfield at Swiss-Prot continues to send GO annotations of new and updated Swiss-Prot records, for melanogaster and non-melanogaster Drosophila species, for incorporation into FlyBase.

B. Literature Curation

Literature curation remains the predominant source of FlyBase GO annotations. FlyBase curators add GO terms by curating primary papers and personal communications. Curation of reviews is also ongoing. In addition to the standard paper-by-paper approach employed by FlyBase, curation on a gene-by-gene basis has led to an increase in the number of GO-annotated genes.

C. 
Sequence Curation

New gene models that arose from the Release 3 re-annotation of the Drosophila genome are being examined on a gene-by-gene basis. GO terms are being added where possible, based on sequence similarity. A large number of process-, function- and component-unknown terms are being added by this approach. Chris Mungall has provided a list of all gene models whose coding sequence has changed since Release 3. These will also be examined for GO data. In addition, Michael continues to curate GO data as part of his curation of GenBank records.

D. PANTHER Analysis

The 'PANTHER group' has provided FlyBase with a list of their GO term predictions, regardless of their score. Suzi removed all low-scoring lines (score lower than -85) as well as all process/function/component-unknown lines. Michael fixed any parser blips to create a final file.

- For any FlyBase genes with no current GO annotations: the PANTHER predictions were added automatically by Aubrey de Grey, attributed to a personal communication and supported by the IEA evidence code.
- For any FlyBase genes with existing GO annotations: Aubrey removed any PANTHER prediction that was a parent of an existing GO annotation. The remainder are being examined by Michael and Becky to check for redundant predictions and for conflicts with existing GO terms. Useful predictions will then be added into the FlyBase files, supported by IEA.

Michael Ashburner and Rebecca Foulger.
e-mail: m.ashburner@gen.cam.ac.uk
        r.foulger@gen.cam.ac.uk
http://flybase.bio.indiana.edu/
http://fly.ebi.ac.uk (UK mirror)
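
As an illustration of the PANTHER triage described in part 2D, the filtering steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the scripts actually used: the names `Prediction`, `ancestors` and `triage`, the in-memory data layout, and the reading of "parent" as "any ancestor in the GO graph" are all hypothetical. The -85 cutoff and the treatment of the three 'unknown' terms are taken from the report.

```python
from dataclasses import dataclass

# From the report: lines scoring lower than -85 were removed.
SCORE_CUTOFF = -85.0

# From the report: process/function/component-unknown lines were removed.
UNKNOWN_TERMS = {
    "biological_process unknown",
    "molecular_function unknown",
    "cellular_component unknown",
}


@dataclass(frozen=True)
class Prediction:
    """One predicted GO term for one gene (hypothetical record layout)."""
    gene: str
    term: str
    score: float


def ancestors(term, parents):
    """Return every ancestor of `term`, where `parents` maps a term
    to the set of its direct parents (assumed representation)."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def triage(predictions, existing, parents):
    """Split predictions into those added automatically (gene has no
    GO annotation yet) and those left for manual review."""
    auto_add, to_review = [], []
    for p in predictions:
        # Drop low-scoring and 'unknown' lines first.
        if p.score < SCORE_CUTOFF or p.term in UNKNOWN_TERMS:
            continue
        current = existing.get(p.gene, set())
        if not current:
            auto_add.append(p)  # would be recorded with evidence code IEA
            continue
        # Assumption: "parent" is read as any ancestor of an existing term.
        if any(p.term in ancestors(t, parents) for t in current):
            continue
        to_review.append(p)  # curators check redundancy and conflicts
    return auto_add, to_review
```

A prediction identical to an existing annotation is deliberately passed to the review list here, since the report leaves redundant predictions to the curators.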