
GO Curation Consistency Study Instructions

July 10-14, 2006, Stanford University, Stanford, CA

Part 1: Annotation Standards - July 10-11, 2006
(Consortium members only)
 
Part 2: Training for Manual Curation of Research Literature - July 12-14, 2006
(all welcome)

Motivation for the Study

In the GO Consortium grant renewal, which was under review during the first week of June, we stated the need to develop metrics for monitoring annotation progress and for analyzing the quality and consistency of these annotations. As our initial foray into developing such metrics, we will be working with John MacMullen, a Ph.D. student in Library and Information Sciences at the University of North Carolina. John studies how biological annotations, specifically GO annotations, are made, and has been working with SGD on this topic. A GO curation consistency study began the week of Monday, June 12th and will be completed at the Annotation Camp in July.


Participation in the GO Curation Consistency Study

Groups Representing the Five Selected Organisms

Participation by the five database groups that represent the five organisms selected for the study is essential. The two participants from each group will provide expert individual and consensus annotations for the two papers selected from the body of literature about that organism.

The following five organisms were chosen to give reasonable phylogenetic coverage. As the new grant focuses on human disease genes, the GOA human group is included specifically to help address annotation of human genes. We need to limit the number of organisms to five so that we can limit the total number of papers to ten; this is part of John's study design to keep the workload manageable for participants.

  • S. cerevisiae (SGD)
  • C. elegans (WormBase)
  • A. thaliana (TAIR)
  • M. musculus (MGI)
  • H. sapiens (GOA)

The second criterion for selecting an organism was the availability of two annotators from the appropriate MOD/group to serve as the expert annotators.

Other Reference Genome Groups or GO Associates

We strongly encourage the other reference genome and GO associate groups, especially those attending the Annotation Camp, to also participate in the study. While we will not add any more organisms or papers to the set of 10 that we are using for the study, additional individual and paired consensus annotations for this set of papers will give John useful data for analyzing curation consistency across the GO Consortium member and associate groups.

If possible, each group should select two curators to read the set of 10 papers and record their individual annotations. The two curators should then discuss the similarities and differences between their annotations and agree on a consensus set, which is recorded separately from the individual annotations. If only one person from a group can participate, we will still welcome an individual annotation workbook from that group. More details about participating in the study are in the Instructions section.

Attendees of Part 2 of the Annotation Camp

We strongly encourage all attendees of Part 2 to read all 10 papers prior to arrival at the Annotation Camp. We also recommend that you fill out the Excel workbook to record your own individual annotations, along with any notes or questions that arise as you read the papers. This will help you identify where you have questions about the procedure for making GO annotations and will help maximize your learning experience at the GO Annotation Camp.

As we will be using the same set of 10 papers for both the Consistency Study and for the small working group sessions of Part 2 of the Annotation Camp, we would be interested in receiving an anonymous copy of your Excel workbook, if you are comfortable sharing it with us. More details about participating in the study are in the Instructions section.


Instructions for Participation in the GO Study

List of Papers

We have selected a set of 10 papers representing 5 organisms. We will read these over the course of the month prior to the Annotation Camp. These are the same papers we are using for the small group working sessions for the Annotation Camp.

  1. PMID:16642040 (S. cerevisiae) Noma A. et al. (2006) Biosynthesis of wybutosine, a hyper-modified nucleoside in eukaryotic phenylalanine tRNA. EMBO J. 25(10):2142-54.
  2. PMID:16598690 (S. cerevisiae) Warringer J and Blomberg A. (2006) Involvement of yeast YOL151W/GRE2 in ergosterol metabolism. Yeast. 23(5):389-98.
  3. PMID:14697201 (C. elegans) Malone CJ et al. (2003) The C. elegans hook protein, ZYG-12, mediates the essential attachment between the centrosome and nucleus. Cell. 115(7):825-36.
  4. PMID:11703940 (C. elegans) Wu YC et al. (2001) C. elegans CED-12 acts in the conserved crkII/DOCK180/Rac pathway to control cell migration and cell corpse engulfment. Dev Cell. 1(4):491-502.
  5. PMID:16682356 (A. thaliana) Fahlgren N et al. (2006) Regulation of AUXIN RESPONSE FACTOR3 by TAS3 ta-siRNA affects developmental timing and patterning in Arabidopsis. Curr Biol. 16(9):939-44.
  6. PMID:16507088 (A. thaliana) Auldridge ME et al. (2006) Characterization of three members of the Arabidopsis carotenoid cleavage dioxygenase family demonstrates the divergent roles of this multifunctional enzyme family. Plant J. 45(6):982-93.
  7. PMID:16449762 (M. musculus) Beigneux AP et al. (2006) Agpat6--a novel lipid biosynthetic gene required for triacylglycerol production in mammary epithelium. J Lipid Res. 47(4):734-44.
  8. PMID:14500813 (M. musculus) Kim YS et al. (2003) GLIS3, a novel member of the GLIS subfamily of Kruppel-like zinc finger proteins with repressor and activation functions. Nucleic Acids Res. 31(19):5513-25.
  9. PMID:15693750 (H. sapiens) Jagadish N et al. (2005) Characterization of a novel human sperm-associated antigen 9 (SPAG9) having structural homology with c-Jun N-terminal kinase-interacting protein. Biochem J. 389(Pt 1):73-82.
  10. PMID:11027586 (H. sapiens) Abdul KM et al. (2000) Functional analysis of human metaxin in mitochondrial protein import in cultured cells and its relationship with the Tom complex. Biochem Biophys Res Commun. 276(3):1028-34.


Recording Your Annotations

Other than reading the papers themselves, we are asking very little additional work of study participants. This section outlines the basic procedure. Please send any questions to .

Individual Annotations: To collect the data for the study, each participant will read each paper independently, without discussing it with anyone else, and record their annotations in an Excel workbook.

Consensus Annotations: Each pair of curators from the same GOC member or associate group will then discuss their annotations for each paper and agree on a consensus set of annotations, which will be recorded in a separate copy of the Excel workbook. Once you have taken part in a consensus discussion on a paper, do not edit your individual annotation spreadsheet for that paper.

Scope of Annotations: For this exercise, please attempt to make every annotation supported by the results reported in each paper, for all genes discussed in that paper.

Gene Nomenclature tips: While the main focus of the Annotation Camp is on the selection of GO terms, it is of course critical to select the correct gene name or gene identifier with which to associate those GO annotations. The Annotation Camp Reading List page will provide brief introductions to the gene nomenclature of each of the five species represented in the reading list.

Excel workbook: We will be recording our annotations in an Excel workbook. There is one sheet for each of the 10 papers we will be reading. These sheets contain everything we need to record in order to discuss our annotations. Note that the sheets do not contain all the required fields of the gene_association file format. Although they are required in gene_association files, the columns I have left out contain information that is not relevant to a discussion of manual curation of the literature. For ease of viewing, I have also added a column for the actual GO term name and have asked you to record the Latin name of the organism rather than the NCBI taxon ID. A more complete discussion of the required format for gene association files is found in the GO Annotation Guide.

The Excel workbook template file is named GOstudy_template.xls. For your individual annotations file, please change the template portion of the file name to your name. For example, my file would be called GOstudy_KChristie.xls. For paired consensus annotations, please change the template portion of the file name to the name of your group. For example, for SGD, the consensus file would be called GOstudy_SGD.xls. There should be three files for each group, one for each curator, and a third file for the paired consensus annotations. Both individual and consensus annotation workbooks need to be completed prior to the start of the Annotation Camp.
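As an illustration only, the naming rule above can be expressed as a short Python sketch that derives the three file names a group is expected to submit and reports any that are still missing. The function names `workbook_name`, `expected_files`, and `missing_files`, and the second curator name `ACurator`, are hypothetical; the only details taken from these instructions are the template name GOstudy_template.xls and the examples GOstudy_KChristie.xls and GOstudy_SGD.xls.

```python
from typing import Iterable, List

TEMPLATE = "GOstudy_template.xls"  # template file name given in the instructions


def workbook_name(label: str, template: str = TEMPLATE) -> str:
    """Replace the 'template' portion of the file name with a curator or group label."""
    return template.replace("template", label, 1)


def expected_files(group: str, curators: Iterable[str]) -> List[str]:
    """One individual workbook per curator, plus one consensus workbook named after the group."""
    files = [workbook_name(curator) for curator in curators]
    files.append(workbook_name(group))
    return files


def missing_files(group: str, curators: Iterable[str], submitted: Iterable[str]) -> List[str]:
    """Hypothetical helper: list expected workbooks that have not been received yet."""
    return sorted(set(expected_files(group, curators)) - set(submitted))


if __name__ == "__main__":
    # "ACurator" is a made-up second curator used only for illustration.
    print(expected_files("SGD", ["KChristie", "ACurator"]))
    print(missing_files("SGD", ["KChristie", "ACurator"], ["GOstudy_SGD.xls"]))
```

For a two-curator group such as SGD this yields GOstudy_KChristie.xls, GOstudy_ACurator.xls, and GOstudy_SGD.xls, matching the "three files for each group" described above.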

Once you have completed your individual annotation or consensus annotation files, please send the files to . Note: I am also happy to receive any intermediate copies of your workbooks, either individual or consensus, to serve as a backup.

Download the Excel workbook

Consent Form

Because this study involves human participants (you), John has a consent form that must be signed before your work can be included in the study. Study participants representing the five selected organisms have received a consent form via email. All other participants should use this consent form. Completed consent forms should be sent to John MacMullen at:
          John MacMullen
          School of Information & Library Science
          University of North Carolina
          CB# 3360, 100 Manning Hall
          Chapel Hill NC 27599-3360

It is NOT necessary to participate in the study or sign a consent form in order to fully participate in Part 2 of the Annotation Camp. However, if you are willing to provide copies of your individual and, if relevant, your consensus annotations, we would appreciate it. These data will be very useful to John MacMullen and to the GO Consortium in developing metrics to assess the effectiveness and accuracy of GO curation.
