Reading List for the 2006 GO Annotation Camp

July 12-14, Stanford University, Stanford, CA

Part 2: Training for Manual Curation of Research Literature - July 12-14, 2006
(all welcome)

About the Reading List
Reading List
Background on GO
Two papers for large group working session
Papers for small group working sessions
Gene Nomenclature Tips

About the Reading List

This page contains the reading list for Part 2 of the GO Consortium's 2006 Annotation Camp. We strongly encourage you to take a look at these papers in advance of arriving at the Annotation Camp to maximize your learning experience.

We are also using this same reading list for a comparative study of GO curation consistency between five of the nine designated reference genome groups of the GO Consortium. Participants in Part 2 of the Annotation Camp are invited to contribute to the study, though participation in the study is NOT required and lack of participation in the study will NOT have any impact on your participation in the Annotation Camp.

Reading List

Background on GO

If you are completely new to GO, we recommend that you read a little about it before arriving at the Annotation Camp. Here are some reviews:

The Gene Ontology Consortium. 2000. Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29. [ABSTRACT] [PDF]

The Gene Ontology Consortium. 2001. Creating the gene ontology resource: design and implementation. Genome Res 11: 1425-1433. [ABSTRACT] [FULL TEXT] [PDF]

The Gene Ontology Consortium. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258-D261. [ABSTRACT] [FULL TEXT] [PDF]

A more extensive reading list of publications about GO or using GO is available from the Gene Ontology Bibliography page.

Two papers for large group working session

During the large group working session on Wednesday July 12th, we will read two papers and discuss the GO annotations that may, or may not, reasonably be made from each paper. We suggest that you read these papers at least briefly before arriving at the camp.

Chang M et al. (2005) RMI1/NCE4, a suppressor of genome instability, encodes a member of the RecQ helicase/Topo III complex. EMBO J [PDF]
supplementary material for Chang et al.
- the before ontology file
- the after ontology file
Loyola A et al. (2003) Functional analysis of the subunits of the chromatin assembly factor RSF. Mol Cell Biol. 23(19):6759-68. [PDF]

Papers for small group working sessions

These are the 10 papers we will be reading for Part 2 of the Annotation Camp. You may get more out of the Annotation Camp if you read these papers in advance and make notes about any questions you may have. During the camp, we will be using this Excel workbook to record our annotations. Feel free to use it in advance to record any notes or questions about each paper.

PMID:16642040 (S. cerevisiae) Noma A. et al. (2006) Biosynthesis of wybutosine, a hyper-modified nucleoside in eukaryotic phenylalanine tRNA. EMBO J. 25(10):2142-54. PubMed PDF

PMID:16598690 (S. cerevisiae) Warringer J and Blomberg A. (2006) Involvement of yeast YOL151W/GRE2 in ergosterol metabolism. Yeast. 23(5):389-98. PubMed PDF

PMID:14697201 (C. elegans) Malone CJ et al. (2003) The C. elegans hook protein, ZYG-12, mediates the essential attachment between the centrosome and nucleus. Cell. 115(7):825-36. PubMed PDF

PMID:11703940 (C. elegans) Wu YC et al. (2001) C. elegans CED-12 acts in the conserved crkII/DOCK180/Rac pathway to control cell migration and cell corpse engulfment. Dev Cell. 1(4):491-502. PubMed PDF

PMID:16682356 (A. thaliana) Fahlgren N et al. (2006) Regulation of AUXIN RESPONSE FACTOR3 by TAS3 ta-siRNA affects developmental timing and patterning in Arabidopsis. Curr Biol. 16(9):939-44. PubMed PDF

PMID:16507088 (A. thaliana) Auldridge ME et al. (2006) Characterization of three members of the Arabidopsis carotenoid cleavage dioxygenase family demonstrates the divergent roles of this multifunctional enzyme family. Plant J. 45(6):982-93. PubMed PDF

PMID:16449762 (M. musculus) Beigneux AP et al. (2006) Agpat6--a novel lipid biosynthetic gene required for triacylglycerol production in mammary epithelium. J Lipid Res. 47(4):734-44. PubMed PDF

PMID:14500813 (M. musculus) Kim YS et al. (2003) GLIS3, a novel member of the GLIS subfamily of Kruppel-like zinc finger proteins with repressor and activation functions. Nucleic Acids Res. 31(19):5513-25. PubMed PDF

PMID:15693750 (H. sapiens) Jagadish N et al. (2005) Characterization of a novel human sperm-associated antigen 9 (SPAG9) having structural homology with c-Jun N-terminal kinase-interacting protein. Biochem J. 389(Pt 1):73-82. PubMed PDF

PMID:11027586 (H. sapiens) Abdul KM et al. (2000) Functional analysis of human metaxin in mitochondrial protein import in cultured cells and its relationship with the Tom complex. Biochem Biophys Res Commun. 276(3):1028-34. PubMed PDF

GO Study: This same group of papers will also be used for the GO curation consistency study. If you fill out an Excel workbook with your annotations in advance of the camp, we would be interested in receiving an anonymous copy of your Excel workbook, if you are comfortable sharing it with us. More details about participating in the study are here.

Small Group Working Sessions: During Part 2 of the Annotation Camp, we will break into small groups to examine each of these papers carefully and discuss the GO annotations that can reasonably be made from the papers. Each group will contain at least one experienced GO annotator from one of the GO Consortium groups. These sessions will be a good opportunity for people to ask questions about the process of making GO annotations.

Group Discussions: Once the small groups have had a chance to discuss the papers and arrive at a group consensus, we will discuss the papers in the full group and can compare small group results to the consensus derived by the pair of experts for the relevant organism.

Gene Nomenclature Tips

Issues related to gene names

The main focus of the Annotation Camp is on determining appropriate GO terms to associate with a gene. However, sometimes determining the correct gene to which the GO annotation should be attached is itself a tricky issue. To facilitate your reading of the papers we have selected for the Annotation Camp, here are some of the basics of gene naming conventions in the organisms we will encounter in the reading list.

Budding yeast (S. cerevisiae)

To get a feel for gene naming conventions in S. cerevisiae, let's start with a specific gene. SGD has a page for every gene in the database, for example the one for the gene DST1.

The Standard Name, e.g. "DST1", is our main name, when it exists.

However, we do have genes that have not yet been "named"; these only have a systematic name, e.g. "YGL043W".

When the Standard Name exists, use that for the main gene name; when it does not exist, use the Systematic Name.
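
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the way an unnamed gene is represented (None or an empty string) are assumptions, not part of any SGD tool or API.

```python
from typing import Optional


def display_name(standard_name: Optional[str], systematic_name: str) -> str:
    """Return the Standard Name when one exists, else the Systematic Name."""
    # Assumption: a gene without a Standard Name is passed as None or "".
    return standard_name if standard_name else systematic_name


# The name strings come from the examples in the text; pairing them here
# is purely for illustration.
print(display_name("DST1", "YGL043W"))  # DST1
print(display_name(None, "YGL043W"))    # YGL043W
```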

Our database identifier, the Primary SGDID, is at the bottom of the blue column on the left-hand side of the page.

While the Primary SGDID and the Systematic Names are unambiguous, the Standard Names are a little trickier. While a given name is used as a Standard Name for only one gene in the database, the same name may be associated with other genes as an Alias name. For an example, search for "TAF1" in the Quick Search box on any SGD page.

The SGD Gene Nomenclature Conventions page contains a more complete explanation of Standard Gene names and Systematic names for S. cerevisiae.

Nematode worm (C. elegans)

In WormBase, genes have a three- or four-letter gene name that is followed by a number. For example, zyg-12 or ced-12.
Each gene also has a *stable* WBGene ID. For example, zyg-12 is associated with the WBGene ID WBGene00006997.
WormBase also lists, on each gene page, any possible synonyms for gene names as well as a sequence name if the gene is cloned. For zyg-12, the sequence name is ZK546.1 and two synonyms exist, ber-1 and 2F243.
When users type any of these names into the Any Gene search box on the home page, they should be redirected to the zyg-12/WBGene00006997 gene page.
If there is more than one transcript for a gene, then lower case letters are attached to the end of the sequence name, for example ZK546.1a.
C. elegans proteins have the same name as the gene, distinguished by capitalization. So, in the literature, the zyg-12 protein product would be referred to as: ZYG-12. Proteins also have WormBase IDs which have the format: WP:CE32088.
In the literature, genes and alleles are represented in the format of gene name (allele name), e.g. zyg-12(ct350). The letter(s) of the allele name correspond to the laboratory from which the allele was isolated and the number is assigned by the lab. This page contains an explanation of the three-letter gene names and this page provides a key to the allele designation.
This page provides a comprehensive discussion of C. elegans nomenclature, which includes details about genes, alleles, transposons, etc.
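
The conventions above can be turned into a rough Python sketch for pulling apart names as they appear in the papers. The regular expression and function names are hypothetical and cover only the formats described here (a three- or four-letter name, a number, and an optional allele in parentheses); real names may deviate, so treat this as a reading aid rather than a validator.

```python
import re
from typing import Optional, Tuple

# Gene name: three or four lowercase letters, a hyphen, then a number (zyg-12).
# Optional allele in parentheses: lab letter(s) followed by a number (ct350).
GENE_ALLELE = re.compile(r"^([a-z]{3,4}-\d+)(?:\(([a-z]+)(\d+)\))?$")


def parse_gene(text: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Split 'zyg-12(ct350)' into (gene, lab letters, allele number)."""
    match = GENE_ALLELE.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def protein_name(gene: str) -> str:
    """The protein shares the gene name, written in capitals."""
    return gene.upper()


print(parse_gene("zyg-12(ct350)"))  # ('zyg-12', 'ct', '350')
print(parse_gene("ced-12"))         # ('ced-12', None, None)
print(protein_name("zyg-12"))       # ZYG-12
```

Sequence names such as ZK546.1a follow a different pattern and are deliberately not matched by this sketch.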

Mustardweed (A. thaliana)

A sequenced Arabidopsis gene will generally have an AGI name (#1 below), and may have one or both of the others.

1. an AGI (Arabidopsis Genome Initiative) name. This will look something like ATxGnnnnn.n, where x = chromosome number and n = a digit
2. a symbol-based name that may look like ABC1
3. a number of aliases, because TAIR will record all published names

Uncloned Arabidopsis genes will have #2 and/or #3 from above.
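
As a rough aid for spotting AGI names in a paper, the ATxGnnnnn.n pattern described above can be written as a regular expression. This Python sketch makes two assumptions that go beyond the text: the chromosome number is a single digit, and the ".n" suffix is treated as optional. The function name and the test string AT1G01010.1 are illustrative only.

```python
import re

# ATxGnnnnn.n: 'AT', a chromosome digit, 'G', five digits, then a '.n' suffix.
# Assumptions: single-digit chromosome number; the suffix may be absent.
AGI_NAME = re.compile(r"^AT(\d)G(\d{5})(?:\.(\d))?$", re.IGNORECASE)


def parse_agi(name: str):
    """Return (chromosome, locus digits, suffix or None), or None if no match."""
    match = AGI_NAME.match(name.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


print(parse_agi("AT1G01010.1"))  # (1, '01010', '1')
print(parse_agi("ABC1"))         # None
```

A symbol-based name such as ABC1 does not match, which is a quick way to tell the two kinds of name apart.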

All Arabidopsis genes have a TAIR accession number, which you can find on the gene's detail page (right under Gene Model Type). For example, on the page for the gene called SQN, the TAIR accession is Gene:1945377.

The TAIR Nomenclature page provides more complete information about naming genes in A. thaliana.

Mouse (M. musculus)

Mouse gene names should be brief and specific and should convey the character or function of the gene. Mouse genes are often referred to by their gene symbols, which are typically 3-5 characters long and should not exceed 10 characters, e.g. Ash2l. The MGI Quick Guide to Nomenclature for Genes provides more detailed information.

Human (H. sapiens)

Human genes tend to have both a descriptive gene name, e.g. iduronate 2-sulfatase, and a gene symbol that is a short form, or abbreviation, derived from the gene name, e.g. IDS. The HGNC Guidelines on Human Gene Nomenclature provide more detailed information about the naming of human genes.

The UniProt GOA group uses UniProtKB accession IDs for annotation. For example, if you go to the UniProt website and are looking for Angiomotin, type in "angiomotin human". In your results, you will get (mostly) human proteins which are angiomotin or have angiomotin somewhere in their description. In the angiomotin example there are several accessions claiming to be human angiomotin, but some of these are fragments - the accession we use is the full length one which has been given a protein name - i.e. AMOT_HUMAN as opposed to Q8TEN8_HUMAN which is only a fragment.

HGNC also gives the UniProtKB accessions.