This tutorial will run through the process of building and editing an ontology in DAG-Edit, the ontology editing tool written and used by the GO Consortium.
Go to SourceForge and download the latest version of the installer for DAG-Edit. You will also need the latest DAG-Edit plugins, which add GO-specific functions to the basic DAG-Edit program. Run the DAG-Edit installer by double-clicking on it, and then install the plugins by dragging them into the DAG-Edit "extensions" folder.
Start up the DAG-Edit application. You should see a screen that looks like this:
Before starting the exercises, it will be useful to familiarize yourselves with how DAG-Edit works.
Download the sample ontology (you may need to right-click on the link and choose Save as...) and load it into DAG-Edit.
To load a file into DAG-Edit:
You can browse through an ontology by clicking on the icons next to a term name. One icon expands a node with children, and another collapses it. A further icon next to the term shows the relationship to the parent: one icon means is-a and another means part-of. Clicking on a term selects it and the term information is shown in the panels to the right.
There are two special nodes that cannot be selected, the Types node and the Obsolete node. Types lists the relationship types used in the ontology; GO uses two (is-a and part-of) but other ontologies may use more (for example, develops-from). The Obsolete node is the home of terms that are no longer valid or concepts that are outside the scope of GO.
The term editor panel describes the selected term in detail.
ID: a unique numerical identifier for the term, allowing it to be identified even if the term name changes.
Namespace: the name of the ontology. The Gene Ontology is split into three ontologies: biological process, molecular function and cellular component.
Term Name: the primary string by which the term is known.
The definition, comment, synonym, dbxref and category fields may or may not be present. An asterisk (*) indicates that data is present, whilst fields without an asterisk are blank. Click on the tabs to view each field.
Definition: definitions are in free text, and the dbxrefs [database references] field shows where the definition came from. A definition cannot be committed without a reference; to add a dbxref, click on the Edit button, press Add and alter the fields appropriately. Definition dbxrefs can be databases, books, webpages, people or any other source of knowledge.
Comment: used to store additional information about a GO term which isn't appropriate for adding to the definition. Advice for annotators is often held in the comments.
Synonyms: GO synonyms are useful both as alternate term names and as search aids. The relationship of the synonym to the term name is set using the pull-down menu.
DbXrefs [database cross-references]: objects in other databases which are exactly equivalent to the GO term. Most dbxrefs in GO link to enzyme activities or metabolic pathways in other databases.
Categories: subsets of terms created by users of GO. At present, there are five categories in GO: four GO slims and the prokaryotic subset. GO slims are small sets of high-level GO terms, useful for broad analysis of data sets or for giving an overview of the ontologies. The prokaryotic subset is a pruned version of GO with eukaryote-specific terms (e.g. terms related to the nucleus or to organelles, or to processes that occur in multicellular organisms) removed.
Commit: To save any changes you have made to the term, click Commit. DAG-Edit will not alter the term otherwise.
For the following exercises, you can uncheck the Advanced Search option so the simpler interface is displayed. You can set both which field you search (e.g. GO IDs, term names, database references) and how you search (e.g. for an exact match or using a regular expression).
Search results are displayed in a new window; clicking on a term or terms and choosing Select terms brings them up in the DAG Browser window.
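The two search choices described above (which field to search, and whether to match exactly or by regular expression) can be illustrated with a short Python sketch. This is not how DAG-Edit implements searching; the `search` function, the `TERMS` list and the `EX:` identifiers are hypothetical and exist only for this example.

```python
import re

# Hypothetical in-memory terms; the IDs are invented for illustration.
TERMS = [
    {"id": "EX:0000001", "name": "cell wall"},
    {"id": "EX:0000002", "name": "chromosome"},
    {"id": "EX:0000003", "name": "cytoplasmic chromosome"},
    {"id": "EX:0000004", "name": "protein transport"},
]


def search(terms, field, query, mode="exact"):
    """Return the terms whose chosen field matches the query.

    mode="exact" compares the whole field; mode="regex" treats the
    query as a regular expression and matches anywhere in the field.
    """
    if mode == "exact":
        return [t for t in terms if t[field] == query]
    if mode == "regex":
        pattern = re.compile(query)
        return [t for t in terms if pattern.search(t[field])]
    raise ValueError(f"unknown search mode: {mode!r}")


exact_hits = search(TERMS, "name", "chromosome")
regex_hits = search(TERMS, "name", r"chromosome$", mode="regex")
print([t["name"] for t in exact_hits])
print([t["name"] for t in regex_hits])
```

The exact search returns only the term named chromosome, while the regular expression also picks up cytoplasmic chromosome because its name ends with the same word.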
This shows all the occurrences of the selected term in the ontologies. Note that a term can have more than one parent and that it can have different relationships to different parents. In this case, cytoplasmic chromosome is a type of chromosome and a part of the cytoplasm.
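One way to picture a term with several parents is as a graph in which every child-to-parent link carries its own relationship type. The Python sketch below is only an illustration under that assumption; the `Ontology` class and its methods are hypothetical and do not reflect DAG-Edit's internals or file format.

```python
from collections import defaultdict


class Ontology:
    """Toy DAG: each term maps to a list of (parent, relationship) links."""

    def __init__(self):
        self.parents = defaultdict(list)

    def add_link(self, child, parent, relationship):
        """Record that child has the given relationship to parent."""
        self.parents[child].append((parent, relationship))

    def occurrences(self, term):
        """Return every (parent, relationship) pair under which the term appears."""
        return list(self.parents.get(term, []))


onto = Ontology()
onto.add_link("cytoplasmic chromosome", "chromosome", "is-a")
onto.add_link("cytoplasmic chromosome", "cytoplasm", "part-of")

for parent, relationship in onto.occurrences("cytoplasmic chromosome"):
    print(f"cytoplasmic chromosome {relationship} {parent}")
```

Each entry returned by `occurrences` corresponds to one place where the term would show up when browsing the tree.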
To create a new term, click on a parent term, choose Add term from the Edit menu and enter a term name into the appropriate box. To create a definition for the term, type some text into the appropriate box and click Edit to add a dbxref for the definition. Click Commit to add the term to the ontology. Note that you can add a term without definition but that you cannot add a definition without specifying a dbxref.
Search for or navigate through the DAG to find the term "cell wall". Add a new child, "cell wall (sensu Fungi)", with the definition "A rigid yet dynamic structure surrounding the plasma membrane that affords protection from stresses and contributes to cell morphogenesis. Major components are glycoproteins and peptidoglycans including mannoproteins, glucans and sometimes chitin." This definition comes from a paper with PubMed ID 3319422, so add PMID 3319422 as the definition reference.
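The commit rule (a term may lack a definition, but a definition must have at least one dbxref) can be written as a small check. The Python sketch below is illustrative only: the `Term` class, the `can_commit` function and the `PMID:3319422` string form of the reference are assumptions made for this example, not DAG-Edit's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical record holding the fields relevant to the commit rule."""
    name: str
    definition: str = ""
    definition_dbxrefs: list = field(default_factory=list)


def can_commit(term):
    """Apply the rule: a name is required; a definition needs a dbxref."""
    if not term.name.strip():
        return False
    if term.definition.strip() and not term.definition_dbxrefs:
        return False
    return True


undefined_term = Term(name="cell wall (sensu Fungi)")
unreferenced = Term(
    name="cell wall (sensu Fungi)",
    definition="A rigid yet dynamic structure surrounding the plasma membrane...",
)
referenced = Term(
    name="cell wall (sensu Fungi)",
    definition="A rigid yet dynamic structure surrounding the plasma membrane...",
    definition_dbxrefs=["PMID:3319422"],  # assumed string form of the reference
)

for label, term in [("no definition", undefined_term),
                    ("definition, no dbxref", unreferenced),
                    ("definition with dbxref", referenced)]:
    print(label, "->", can_commit(term))
```

Only the middle case is rejected, which matches the behaviour described in the text: an undefined term is acceptable, an unreferenced definition is not.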
If you want to change the parentage of a term, simply drag it on to the term you want to be the new parent.
To give a term an additional parent, hold down the Shift key and then drag and drop the term on to the new parent.
To change the relationship between the term and its parent, right-click on the term. This should bring up a menu; select Change relationship type to and then the appropriate relationship. This can also be achieved by selecting the same option from the Edit menu.
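The three editing actions described above (moving a term, giving it an extra parent, and changing a relationship type) amount to three different updates on the links between terms. The Python sketch below models them under stated assumptions: the `TermLinks` class is hypothetical, the starting parent "transport" is invented for the example, and the choice to keep the existing relationship type when a term is moved is an assumption rather than documented DAG-Edit behaviour. The sketch also does not check for cycles.

```python
class TermLinks:
    """Toy store mapping each child term to {parent: relationship type}."""

    def __init__(self):
        self.links = {}

    def add_parent(self, child, parent, relationship="is-a"):
        """Give the child an additional parent (like Shift + drag)."""
        self.links.setdefault(child, {})[parent] = relationship

    def move(self, child, old_parent, new_parent):
        """Replace one parent with another (like a plain drag).

        Assumption: the relationship type is carried over unchanged.
        """
        relationship = self.links[child].pop(old_parent)
        self.links[child][new_parent] = relationship

    def change_relationship(self, child, parent, relationship):
        """Change the relationship type on an existing link."""
        if parent not in self.links.get(child, {}):
            raise KeyError(f"{child!r} is not linked to {parent!r}")
        self.links[child][parent] = relationship


dag = TermLinks()
dag.add_parent("protein transport", "transport")  # hypothetical starting parent
dag.add_parent("protein transport", "death")
dag.change_relationship("protein transport", "death", "part-of")
print(dag.links["protein transport"])
```

Keeping the relationship type on the link rather than on the term is what allows one term to be is-a under one parent and part-of under another.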
Give the term protein transport additional parentage under the term death and change the relationship to part of.
You may find it helpful to be able to differentiate between defined and undefined terms. Choose DAG-Edit configuration manager from the Plugins menu; in the first pane, General GUI options, under Runtime display options, you can choose to gray out undefined terms.
If you are having problems using DAG-Edit, there is a user guide under the Help menu.