New Relationships in the Gene Ontology: Modeling Biological Regulation

[note: define abbreviations MF, BP ... or spell out every time -mah]

Abstract

The Gene Ontology (GO) Consortium is continually improving the representation of molecular functions, biological processes, and cellular components in the ontologies developed and maintained as part of the GO project. To better represent biological regulation we have introduced three new relationship types: regulates, negatively_regulates, and positively_regulates. Previously, regulatory processes were represented as part_of the processes they regulate. The new relationship types better reflect that regulation is not necessarily a step in the process that is being regulated. Accompanying the new relationship types, new high-level regulation terms have been added to represent processes that regulate the activity of gene products and processes that regulate measurable biological attributes. These new terms and links allow automated analysis of the organization of regulation terms, enabling curators to maintain logical consistency with terms for the corresponding regulated processes.

I. Introduction

The Gene Ontology (GO) project is a collaborative effort to address the need for standardized descriptions of genes and gene products across databases. The GO ontology itself covers three domains: biological processes (BP), cellular components and molecular functions (MF). It is structured as a directed graph in which nodes (terms) are linked by edges (relations). Each term in the graph can have one or more ancestors and zero or more descendants.

Since its inception, terms in GO have been linked by two types of relations: is_a and part_of. The is_a relationship is a simple class-subclass relationship, where A is_a B means that A is a subclass of B; for example, chloroplast is_a organelle.
The part_of relationship is slightly more complex: A part_of B means that *every* A is part_of *some* B; for example, GO has nucleus part_of cell, meaning that every nucleus is part of some cell, and embryo implantation part_of female pregnancy, meaning that every occurrence of embryo implantation takes place in the context of some female pregnancy.

With only two relationship types, however, regulatory functions and processes were difficult to represent accurately in GO due to lack of expressivity. Although GO content developers were aware that regulatory processes are not necessarily integral to the processes they regulate, regulatory processes were of necessity represented as 'part_of' the processes they regulate. To improve the representation of regulation, these part_of relationships have been replaced with new relationship types, 'regulates' and two sub-types, 'negatively regulates' and 'positively regulates'. Beyond replacing part_of relationships with regulates relationships *within* the BP ontology, regulates relationships were also added within MF and between BP and MF.

II. Methods

A. Defining the regulates relationships

[text from http://wiki.geneontology.org/index.php/Image:Regulates_documentation.doc; probably want to clean up letters used for abstract processes, as it's kind of alphabet soup at the moment]

To represent biological regulation accurately in an ontology such as GO, the relationships between terms must be precisely defined. The regulates, positively_regulates and negatively_regulates relationships describe interactions between biological processes and other biological processes, molecular functions or biological qualities. When a biological process E regulates a function or a process F, it modulates the occurrence of F. If F is a biological quality, then E modulates the value of F.
An example of the regulation of a biological process is "regulation of transcription": when regulation of transcription occurs, it always alters the rate, extent or frequency at which a gene is transcribed.

The regulates relationships are transitive over both the part_of and is_a relationships.

A. part_of transitivity: If process Y exists in the GO biological process ontology and it is a part_of child of process X then any process that regulates process Y also regulates process X. In the example above, 'regulation of transcription' regulates 'transcription', which is a part of gene expression. Therefore, 'regulation of transcription' must also regulate gene expression.

B. is_a transitivity: If process B exists in the GO biological process ontology and it is an is_a child of process A then any process that regulates process B also regulates process A.

A set of draft definitions has been proposed for addition to the OBO Relation ontology:

id: OBO_REL:regulates
name: regulates
def: "A relation between a process and a process or quality. A regulates B if the unfolding of A affects the frequency, rate or extent of B. A is called the regulating process, B the regulated process" []
transitive_over: OBO_REL:part_of

id: OBO_REL:positively_regulates
name: positively_regulates
def: "A regulation relation in which the unfolding of the regulating process *increases* the frequency, rate or extent of the regulated process" []
is_a: OBO_REL:regulates
transitive_over: OBO_REL:part_of

id: OBO_REL:negatively_regulates
name: negatively_regulates
def: "A regulation relation in which the unfolding of the regulating process *decreases* the frequency, rate or extent of the regulated process" []
is_a: OBO_REL:regulates
transitive_over: OBO_REL:part_of

These draft formal definitions are undergoing review and revision at present.

[Currently in ro_proposed. Waiting for new defs from Chris.
Will probably need some more explanation in this section when we have them including the transitivity of the relations, could use Amelia's new documentation -jl].

B. Organization of the biological regulation branch of biological process

To facilitate the addition of the new relationship types, the organization of the regulation terms in the biological process ontology was overhauled, and new high-level regulation classes were added. The terms were organized into three classes according to the object of the regulation:

1. Regulation of biological process
2. Regulation of molecular function
3. Regulation of biological quality

The first describes the straightforward case where a biological process is being regulated, as in 'regulation of glucose metabolism'. The second class describes the regulation of a molecular function, for example, 'regulation of kinase activity'. The final class is regulation of a biological quality, where a biological quality is "a measurable attribute of an organism or part of an organism, such as size, mass, shape, color, etc." For example, regulation of heart rate is regulation of a biological quality. All regulation terms were placed under one of these categories, and new terms were added where required. In addition, term names and definitions were standardized and improved.

[Include examples, generic regulation defs here? -jl] [examples probably good; don't think we need to repeat reg defs since they're in the very preceding section -mah]

We required that the targets named in 'regulation of biological process' and 'regulation of molecular function' terms exist in the BP and MF ontologies respectively; missing targets were added, and regulation terms with no legitimate target were made obsolete [egs needed here? Is this redundant with section D?]. For 'regulation of biological quality' terms, we required that the biological quality be in the PATO [ref] ontology.
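The three-way classification and the target requirement described above could be checked mechanically. The following Python sketch is illustrative only: the function `classify_regulation_term`, the namespace labels and the toy lookup table are hypothetical and do not correspond to any existing GO tool or file.

```python
# Parent class for each kind of regulation target (class names from the text above).
PARENT_BY_NAMESPACE = {
    "biological_process": "regulation of biological process",
    "molecular_function": "regulation of molecular function",
    "biological_quality": "regulation of biological quality",
}

PREFIX = "regulation of "


def classify_regulation_term(name, namespace_of):
    """Return the high-level parent class for a 'regulation of X' term.

    namespace_of maps a target name to the ontology it lives in.
    Returns None when the target X is absent, i.e. the target must be
    added or the regulation term reconsidered.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"not a basic regulation term: {name!r}")
    target = name[len(PREFIX):]
    namespace = namespace_of.get(target)
    if namespace is None:
        return None
    return PARENT_BY_NAMESPACE[namespace]


# Toy lookup table, invented for illustration; real data would come from
# the BP and MF ontologies and from PATO.
namespace_of = {
    "glucose metabolism": "biological_process",
    "kinase activity": "molecular_function",
    "heart rate": "biological_quality",
}

for term in ("regulation of glucose metabolism",
             "regulation of kinase activity",
             "regulation of heart rate",
             "regulation of unlisted process"):
    print(term, "->", classify_regulation_term(term, namespace_of))
```

A term whose target is not found is reported as None rather than silently assigned to a class, which mirrors the choice described above between adding the missing target and obsoleting the regulation term.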
[I'm inclined to put this in the QC section, with examples -mah]

C. Usage of regulates relationships

C1. Regulatory relations within BP

Within the biological process ontology, regulates relationships were added for 'regulation of biological process' terms and the corresponding biological processes; regulation terms and relationships follow a consistent general structure, as shown in Figure [1 (or 1a?)]. For example, regulation of cell growth [regulates] cell growth. Positively regulates and negatively regulates relationships were also added between the terms 'positive regulation of X' and 'negative regulation of X' and X, where X is the corresponding BP term.

process
[r] regulation of process
---[i] negative regulation of process
---[i] positive regulation of process
[r-] negative regulation of process
[r+] positive regulation of process

[probably all better as graph figs -jl] [I agree; can put some details about the relations in fig legends -mah]

In addition, positive and negative regulation processes are permitted several optional subclasses, as shown in Figure [2, or 1b]

process
[r] regulation of process
---[i] negative regulation of process
------[i] downregulation of process
------[i] inhibition of process
------[i] termination of process
---[i] positive regulation of process
------[i] activation of process
------[i] maintenance of process
------[i] upregulation of process

These subclasses describe specific types of positive and negative regulation processes; for example, activation of process A is defined as 'Any process that starts the inactive process A'. No specific relationship types, e.g. inhibits, were added for these subclasses.

C2.
Regulatory molecular functions / regulates relationships within molecular function

For processes that regulate molecular functions, the regulates relationship forms a link between the process and function ontologies:

biological process
[i] biological regulation
---[i] regulation of molecular function
------[i] regulation of protein binding
...
molecular function
[i] protein binding
---[r] regulation of protein binding

Regulates relationships were added between BP and MF for 'regulation of molecular function' terms and the corresponding molecular functions. For example, regulation of kinase activity (BP) regulates kinase activity (MF). Positively regulates and negatively regulates relationships were also added between the terms 'positive regulation of X' and 'negative regulation of X' and X, where X is the corresponding MF term.

These relationships were the first inter-ontology links in the GO vocabularies. Prior to the addition of the new regulatory relationships, these relationships could be thought of as existing implicitly in the ontologies via the term names, but were opaque to reasoning. For example, a human can clearly see that 'regulation of kinase activity' in BP has a relationship to 'kinase activity' in MF. One aim of the 'regulates' project was to make these implicit relationships explicit in the ontology.

To represent regulatory functions, as opposed to regulatory processes, regulates relationships have also been added within MF; for example, calcium channel regulator activity (MF) regulates calcium channel activity (MF).

[we haven't added these yet have we? [snip] -jl] [yep, they're in _write and _ext -mah]

D. Ontology quality control methods

[reading this back, I think I may have slipped too much into talking about xp methodology - we may need to cut some of this stuff -jl]

[It would probably help to recast this section to shift the emphasis from mechanics to what the parsing, reasoning, etc. were meant to find.
I also think we need input from Chris, David and Tanya to go any further with this section. I'm not sure things are presented in the best order, and I think they'll have better suggestions. -mah]

1.

1a. Parsing Basic Regulation Terms

We first used the OBO-Edit 2.0 semantic parser [ref] to derive explicit logical definitions (necessary and sufficient conditions) for a subset of regulation terms that followed a very regular form. The OE2 semantic parser uses the following grammar:

negative regulation of P -> biological_regulation THAT negatively_regulates P
regulation of P -> biological_regulation THAT regulates P
positive regulation of P -> biological_regulation THAT positively_regulates P

[Q for Chris/David/Tanya - did you parse the activation etc terms here too? -jl]

Only the basic regulation terms that could be parsed by the rules above were included in this set of definitions. The multi-organism process regulation terms posed a more difficult problem and were parsed using the more complex Obol grammar [ref] [see section X]. The necessary and sufficient conditions were transcribed in OBO format [ref], for example:

[Term]
id: GO:0000414
name: regulation of histone H3-K36 methylation
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of the covalent addition of a methyl group to the lysine at position 36 of histone H3." [GOC:krc]
intersection_of: GO:0065007 ! biological regulation
intersection_of: regulates GO:0010452 ! histone H3-K36 methylation

1b. Reasoning

Using the files containing the logical definitions for the regulation terms, the OBO-Edit 2.0 reasoner (?) was used to deduce which links were missing from the ontologies. These were checked by ontology editors and added to the live GO. This editorial process also involved improving the structure of the regulation graph such that the hierarchy of the standard regulation terms paralleled that of the main graph.
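The grammar in 1a and the kind of missing link sought in 1b can be illustrated with a short Python sketch. It is not the OBO-Edit parser or reasoner: the functions `parse_regulation_name` and `missing_is_a_links` are hypothetical, only directly asserted is_a links between targets are considered, and the parent term 'histone methylation' is included as toy data.

```python
# Grammar from 1a. The more specific prefixes are listed first so that
# each name is matched by exactly one rule.
GRAMMAR = [
    ("negative regulation of ", "negatively_regulates"),
    ("positive regulation of ", "positively_regulates"),
    ("regulation of ", "regulates"),
]


def parse_regulation_name(name):
    """Return (genus, relation, target) for a basic regulation term, else None."""
    for prefix, relation in GRAMMAR:
        if name.startswith(prefix):
            return ("biological_regulation", relation, name[len(prefix):])
    return None


def missing_is_a_links(term_names, asserted_is_a):
    """Find 'R of X is_a R of Y' links implied by an asserted 'X is_a Y'.

    asserted_is_a is a set of (child, parent) name pairs. Only links that
    are implied but not already asserted are returned.
    """
    parsed = {}
    for name in term_names:
        result = parse_regulation_name(name)
        if result is not None:
            parsed[name] = result

    missing = []
    for child, (_, child_rel, child_target) in parsed.items():
        for parent, (_, parent_rel, parent_target) in parsed.items():
            if child == parent or child_rel != parent_rel:
                continue
            implied = (child_target, parent_target) in asserted_is_a
            if implied and (child, parent) not in asserted_is_a:
                missing.append((child, parent))
    return sorted(missing)


# Toy ontology fragment, for illustration only.
terms = [
    "histone methylation",
    "histone H3-K36 methylation",
    "regulation of histone methylation",
    "regulation of histone H3-K36 methylation",
]
is_a = {("histone H3-K36 methylation", "histone methylation")}

missing = missing_is_a_links(terms, is_a)
print(missing)
```

In this toy case the sketch reports that 'regulation of histone H3-K36 methylation' lacks an is_a link to 'regulation of histone methylation', which is the sort of gap an editor would then review before changing the live ontology.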
This was an iterative process, so once the editors had made a round of fixes to the graph, the terms were parsed again and the whole process repeated until no missing links were found by the reasoner. The reasoner was also used to find abduced links - these are cases where we have 'regulation of X is_a regulation of Y', yet there is no declared or inferred relation between X and Y [example here?].

[not sure about the process for the checks below - Midori, where was that list from? -mah] [reply: I got it from Tanya's slides; I don't know whether we've got the timeline correct but D&T can fix it! -mah]

a. for each 'regulation of [biological process]' there is a corresponding [biological process] term
b. for each 'regulation of [molecular function]' there is a corresponding [molecular function] term
c. for each 'regulation of [biological quality]' there is a corresponding [biological quality] term in PATO (or request pato term)
d. every 'regulation of [target]' has an is_a path to the correct one of the three children of biological regulation

1c. Parsing RoMOP (regulation of multi-organism process) terms

Not all of the terms in GO that describe regulatory processes conform to a simple grammar. The terms within the 'multi-organism process' node of the GO have a highly complex composition, and parsing the regulation terms within this node therefore required the more elaborate Obol grammar. For example:

[Term]
id: GO:0052539 ! positive regulation by symbiont of defense-related host cell wall thickening
intersection_of: GO:0065007 ! biological regulation
intersection_of: OBO_REL:regulated_by obol:symbiont ! symbiont
intersection_of: positively_regulates GO:0052386 ! cell wall thickening
intersection_of: regulates_process_in obol:host ! host
intersection_of: positively_regulates GO:0006952 !
defense response

The complexity of the regulation of multi-organism process terms meant that the grammar was not able to parse all of these terms: for example, it does not catch the regulation of qualities, such as levels of RNA in host. There are other terms it cannot parse because the regulation terms do not match the wording of the process being regulated, for example: 'modification of morphology or physiology of other organism via protein secreted by type III secretion system during symbiotic interaction' [regulates] protein secretion by the type II secretion system. This generally occurs because of the constraints of language - the terms have to be written with a certain word order to make sense to the reader. We continue to revise and improve both the grammar and the wording of the RoMOP terms themselves to increase the proportion of these terms that are transparent to the reasoner. [too vague? better in the conclusion? -jl]

[add more here? Roles ontology? Relations used? limitations of grammar? -jl] [depends on the target audience, methinks ... might be too much if we're aiming for biologists, but geekier types would eat it up -mah]

2. manual review of QC reports; types of problems & solutions

[Text from email & wiki; may need polishing -mah]

Ontology developers have thoroughly reviewed the relationships involving regulatory processes and their targets to ensure internal consistency. If a term 'regulation of process X' exists in the ontology, it must be a valid subtype of 'regulation of biological process', and must have a relationship (originally part_of, then transformed into one of the 'regulates' relationships) with 'process X' or be a valid subtype of another regulatory process.

III. Results and Discussion

The regulates relationships within BP are included in all OBO format versions of GO [download page URL here?].
The regulates relationships within MF, and those between MF and BP, are released in an "extended GO" OBO file, and are displayed in the AmiGO browser. [anything more to add on availability/release? -mah]

A. Metrics

[from http://wiki.geneontology.org/index.php/Regulation_metrics] [convert to a table, then use this text]

Table [1] summarizes the changes made to the ontologies during the initial phase of the 'regulates' project, in which regulates relationships were added to BP.

Terms:
  1137 terms reviewed.
  53 new terms were made.
  5 term merges took place.
  360 names were changed.
  490 general dbxrefs were added.

Relationships:
  64 new part_of relationships were made.
  1305 new is_a relationships were made. (969 by reasoner)
  32 part_of relationships were deleted.
  546 is_a relationships were deleted. (302 by reasoner)

Definitions:
  58 definitions were added.
  84 definitions were changed.
  587 definition dbxrefs were added.
  4 definition dbxrefs were deleted.

Synonyms:
  311 new synonyms were added.
  328 synonyms were assigned a scope.
  42 synonyms were deleted.

[anything available for subsequent phases? -mah]

B. Benefits and implications

1. Support for clever queries
2. relation composition, transitive_over part_of, etc.
3. implications for displays, tools, annotation counts, etc.

[text straight from email & wiki; could probably be polished -mah]

Adding these relationships improves the ability of the ontology to represent biology completely and accurately. The average GO user will benefit from these new links because they will be able to ask and answer more complex questions than they could previously. Users must understand what the different relationships mean and how the various GO tools utilize them. The addition of these links also has important implications for tools that ignore relationship types when summarizing annotations.
For example, it is important to understand whether a query will return all children of a term regardless of its relationship to the parent, or can discriminate between relationship types. If your tool of choice lumps annotations to 'calcium channel regulator activity' together with the regulates parent 'calcium channel activity', a query for calcium channels will also retrieve gene products that function as calcium channel regulators (and not necessarily as channels!). More sophisticated tools will allow users to customize queries to return results that better reflect their interests. For example, tools that are upgraded to take relationships into consideration will allow users to look for processes or functions, and specify whether to include or exclude their regulates children.

[include anything about relation composition? http://wiki.geneontology.org/index.php/Relation_composition -mah]

C. Future directions

1. Cross-product definitions for regulation terms

[http://wiki.geneontology.org/index.php/Regulation_cross-products, http://wiki.geneontology.org/index.php/XP:biological_process_xp_regulation]

A near-term goal for the representation of regulation in GO is to provide explicit computable logical definitions, also known as genus-differentia definitions or cross-products, for regulation terms.

2. tool development? poss for mf-bp and/or tools, queries, from email & http://wiki.geneontology.org/index.php/Overview_of_Function-Process_regulates_links
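The calcium channel query discussed under Benefits and implications can be made concrete with a small Python sketch. It assumes a toy graph and invented gene product names (geneA, geneB, geneC); the functions `descendants` and `query` are hypothetical and do not reflect the behaviour of AmiGO or any other GO tool. The traversal simply follows whichever relationship types the caller allows, so it imitates a tool that lumps relationship types together rather than applying relation composition rules.

```python
from collections import defaultdict


def descendants(term, edges, allowed_relations):
    """Collect terms reachable downward from `term` via the allowed relations.

    edges is an iterable of (child, relation, parent) triples.
    """
    children = defaultdict(list)
    for child, relation, parent in edges:
        if relation in allowed_relations:
            children[parent].append(child)

    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(term, edges, annotations, allowed_relations):
    """Return gene products annotated to `term` or to any allowed descendant."""
    terms = {term} | descendants(term, edges, allowed_relations)
    return sorted(product for product, annotated in annotations if annotated in terms)


# Toy graph and annotations, invented for illustration.
edges = [
    ("voltage-gated calcium channel activity", "is_a", "calcium channel activity"),
    ("calcium channel regulator activity", "regulates", "calcium channel activity"),
]
annotations = [
    ("geneA", "calcium channel activity"),
    ("geneB", "voltage-gated calcium channel activity"),
    ("geneC", "calcium channel regulator activity"),
]

channels_only = query("calcium channel activity", edges, annotations,
                      {"is_a", "part_of"})
with_regulators = query("calcium channel activity", edges, annotations,
                        {"is_a", "part_of", "regulates"})
print(channels_only)
print(with_regulators)
```

Restricting the traversal to is_a and part_of returns only the channel gene products, whereas adding regulates also returns the regulator, which is the distinction a relationship-aware tool would expose to its users.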