The OBO Flat File Format Guide, version 1.4
DRAFT
OBO format is the text file format used by OBO-Edit, the open source, platform-independent application for viewing and editing ontologies.
This document is intended as a companion guide to the OBO Format, in the style of the OBO 1.2 file format guide. It is not intended as a formal specification of syntax or semantics; that is available as a separate document, the OBO Format Syntax and Semantics, at http://purl.obolibrary.org/obo/oboformat/spec.html. The formal specification is normative; this document is only a guide.
The official website for OBO Format documentation and software is http://purl.obolibrary.org/obo/oboformat/
This document supersedes the obsolete 1.3 specification. Syntactic changes are minimal.
Any changes between OBO-Format 1.2 and 1.4 are either in highlighted text, or with a coloured bar at the left side of the page. See further on in this document for a summary of these changes.
Abstract
The OBO flat file format is for representing ontologies and controlled vocabularies.
The format itself attempts to achieve the following goals:
- Human readability
- Ease of parsing
- Extensibility
- Minimal redundancy
OBO Format Syntactic Structure
The format is similar to the tag-value format of the GO definitions file, with a few modifications. One important difference is that unrecognized tags in any context do not necessarily generate fatal errors (although some parsers may decide to do so; see Parser Requirements below). This allows parsers to read files that contain information not used by a particular tool.
OBO Document Structure
An OBO document is structured as follows:
<header>
<stanza>
<stanza>
...
Blank lines are ignored.
The header is an unlabeled section at the beginning of the document containing tag-value pairs. The header ends when the first stanza is encountered.
A stanza is a labeled section of the document, indicating that an object of a particular type is being described. Stanzas consist of a stanza name in square brackets, and a series of tag-value pairs, structured as follows:
[<Stanza name>]
<tag-value pair>
<tag-value pair>
<tag-value pair>
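The document structure above can be sketched in Python as follows. This is a minimal illustration, not a conforming parser: the function name `split_obo` is hypothetical, and comments, escapes and trailing modifiers are deliberately left unhandled.

```python
def split_obo(text):
    """Split OBO text into (header_lines, stanzas).

    Each stanza is a (stanza_name, lines) tuple. Blank lines are skipped.
    Comment stripping and escape handling are out of scope for this sketch.
    """
    header, stanzas = [], []
    current = header  # lines go to the header until the first stanza appears
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith("[") and line.endswith("]"):
            current = []
            stanzas.append((line[1:-1], current))
        else:
            current.append(line)
    return header, stanzas


doc = """format-version: 1.4

[Term]
id: GO:0005623
name: cell

[Typedef]
id: part_of
"""
header, stanzas = split_obo(doc)
```

Keeping each stanza as a list of raw lines lets later steps (comment removal, tag splitting) be applied line by line.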
Comments
An OBO file may contain any number of lines beginning with !, at any point in the file. These lines are ignored by parsers.
Further, any line may end with a ! comment. Parsers that encounter an unescaped ! will ignore the ! and all data until the end of the line. \<newline> sequences are not allowed in ! comments (see escape characters).
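A minimal sketch of the comment rule, assuming that a backslash always escapes the character that follows it. The helper name `strip_comment` is hypothetical.

```python
def strip_comment(line):
    """Return the line with any trailing ! comment removed."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "!":
            return line[:i].rstrip()  # drop the ! and everything after it
        i += 1
    return line.rstrip()
```

A line that begins with ! reduces to an empty string, which a caller can then skip in the same way as a blank line.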
Tag-Value Pairs
Tag-value pairs consist of a tag name, an unescaped colon, the tag value, and a newline:
<tag>: <value> {<trailing modifiers>} ! <comment>
The tag name is always a string. The value is always a string, but the value string may require special parsing depending on the tag with which it is associated.
In general, tag-value pairs occur on a single line. Multi-line values are possible using escape characters (see escape characters).
In general, each stanza type expects a particular set of pre-defined tags. However, a stanza may contain any tag. If a parser does not recognize a tag name for a particular stanza, no error will be generated. This allows new experimental tags to be added without breaking existing parsers. See handling unrecognized tags for specifics.
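Splitting a line into tag and value can be sketched as below. It assumes comments have already been removed and leaves the value (including any trailing modifier) as an unparsed string; `split_tag_value` is a hypothetical name.

```python
def split_tag_value(line):
    """Split a tag-value line at the first unescaped colon."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # an escaped character can never be the separator
            continue
        if line[i] == ":":
            # Unescaped spaces between the colon and the value are discarded.
            return line[:i].strip(), line[i + 1:].strip()
        i += 1
    raise ValueError(f"no unescaped colon in line: {line!r}")
```

Only the first unescaped colon is treated as the separator, so values such as `GO:0005623` keep their own colons.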
Trailing Modifiers
Any tag-value pair may be followed by a trailing modifier. Trailing modifiers have been introduced into the OBO 1.2 Specification to allow the graceful addition of new features to existing tags.
A trailing modifier has the following structure:
{<name>=<value>, <name>=<value>, <name>=<value>}
That is, trailing modifiers are lists of name-value pairs.
Parser implementations may choose to decode and/or round-trip these trailing modifiers. However, this is not required. A parser may choose to ignore or strip away trailing modifiers.
For this reason, trailing modifiers should only include information that is optional or experimental.
Trailing modifiers may also occur within dbxref definitions (see dbxref formatting).
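As an illustration, a trailing modifier can be decoded into name-value pairs roughly as follows. The sketch assumes values contain no escaped commas, quotes or nested braces; `parse_trailing_modifier` is a hypothetical name.

```python
def parse_trailing_modifier(text):
    """Parse '{a=b, c=d}' into a list of (name, value) pairs."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("trailing modifier must be enclosed in braces")
    body = text[1:-1].strip()
    if not body:
        return []
    pairs = []
    for item in body.split(","):
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs
```

A list of pairs is returned rather than a dict so that order and repeated names survive a round trip.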
Escape characters
Tag names and values may contain the following escape characters:
- \n
- newline
- \W
- single space
- \t
- tab
- \:
- colon
- \,
- comma
- \"
- double quote
- \\
- backslash
- \(
- open parenthesis
- \)
- close parenthesis
- \[
- open bracket
- \]
- close bracket
- \{
- open brace
- \}
- close brace
- @
- at (language tag)
- \<newline>
- <no value>
Escaped characters should only be used when a literal character is needed (that is, a character that the parser should not interpret as having a special meaning when parsing). Some tag values may contain unescaped colons, brackets, quotes, etc., that have meaning in decoding the tag value. Unescaped spaces between the separator colon and the start of the value tag are discarded.
OBO parser implementations may support only these escape characters, or they may assume that any character following a backslash is an escaped character. Parsers that choose the latter approach will translate \a and \? to "a" and "?" respectively.
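The permissive approach (any character after a backslash is treated as escaped) might look like this in Python. The name `unescape` and the lookup table are illustrative choices, not part of the format.

```python
# Escapes whose replacement differs from the escaped character itself.
ESCAPES = {"n": "\n", "W": " ", "t": "\t", "\n": ""}


def unescape(value):
    """Decode backslash escapes in a tag name or value."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # Unlisted characters stand for themselves (\a -> a, \? -> ?).
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Unescaping should run only after the structural parsing that depends on unescaped colons, commas, quotes and brackets has finished.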
Identifier Syntax
Identifiers (IDs) in OBO should be strings consisting of an IDSpace concatenated to a LocalID via a : (colon) character. The ID should not contain any whitespace. The IDSpace should not itself contain any colon characters, and should ideally be registered on the GO xrefs page or with OBO.
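A small sketch of how an identifier could be split into its IDSpace and LocalID. Because the rule above is a recommendation ("should"), the sketch returns `None` as the IDSpace for unprefixed ids such as `part_of` rather than rejecting them; that choice, and the name `split_id`, are assumptions.

```python
def split_id(identifier):
    """Return (idspace, local_id); idspace is None for unprefixed ids."""
    if any(ch.isspace() for ch in identifier):
        raise ValueError(f"id contains whitespace: {identifier!r}")
    # The IDSpace should not contain a colon, so split at the first one.
    idspace, sep, local_id = identifier.partition(":")
    if not sep:
        return None, identifier
    return idspace, local_id
```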
Built-in OBO Semantics
Document Header Tags
Required tags
- format-version
- Gives the OBO specification version that this file uses. This is useful if tag semantics change from one OBO specification version to the next. Cardinality: zero or one.
Optional tags
- data-version
- Gives the version of the current ontology. This gets translated to versionIRI in OWL. Cardinality: zero or one.
- version
- Deprecated. Use data-version instead.
- ontology
- The ID space of this ontology. This should correspond to the ID prefix of the terms that belong to that ontology, translated to lowercase. For GO, the value of this field will be "go". For the cell ontology (i.e. the ontology that contains CL:0000001), the value will be "cl". If the obo document contains some alternative cut or extension of the ontology (for example, a GO slim, or an ontology merged with another), then the ontology should be of form "X/Y", where X is the basic ontology name, and Y identifies the cut. For example go/gosubset_prok. A URI is also permitted here. In the translation to OWL, the usual default prefix rules will apply, with the ".owl" suffix. E.g. "go" will be treated as "http://purl.obolibrary.org/obo/go.owl"
- date
- The current date in dd:MM:yyyy HH:mm format (note: for historic reasons, this is NOT an ISO 8601 date, as is the case for the creation_date field). Cardinality: zero or one.
- saved-by
- The username of the person to last save this file. The meaning of "username" is entirely up to the application that generated the file. Cardinality: zero or one.
- auto-generated-by
- The program that generated the file. Cardinality: zero or one.
- subsetdef
- A description of a term subset. The value for this tag should contain a subset name, a space, and a quote enclosed subset description, as follows:
subsetdef: GO_SLIM "GO Slim"
- import
- A URL or ontology ID referencing another OBO document. Previously this was interpreted as a directive to download and merge the referenced file. This is no longer the case. The semantics are now identical to OntologyImports in OWL. Parsers must translate simple ontology IDs (e.g. "go") to full URIs using the same expansion as for the ontology header tag (e.g. http://purl.obolibrary.org/obo/go.owl). Parsers may use different mechanisms for resolving the URL. For example, they may use a catalog.xml file. The imported document MUST be in either obo format or an OWL concrete syntax. Cardinality: any.
- synonymtypedef
- A description of a user-defined synonym type. The value for this tag should contain a synonym type name, a space, a quote enclosed description, and an optional scope specifier, as follows:
synonymtypedef: UK_SPELLING "British spelling" EXACT
The scope specifier indicates the default scope for any synonym that has this type. See the synonym section of tags in a term stanza for more information on the scope specifier. Cardinality: any.
- idspace
- A mapping between a "local" ID space and a "global" ID space. The value for this tag should be a local idspace, a space, a URI, optionally followed by a quote-enclosed description, like this:
idspace: GO urn:lsid:bioontology.org:GO: "gene ontology terms"
- default-relationship-id-prefix
- Any relationship lacking an ID space will be prefixed with the value of this tag. For example:
default-relationship-id-prefix: OBO_REL
The above will make sure that all relations referred to in the current file come from the OBO relations ontology, unless otherwise specified.
The scope of this tag is within the current file only. See also id-mapping, below. Cardinality: zero or one.
- id-mapping
- Maps a Term or Typedef ID to another Term or Typedef ID. The main reason for this tag is to increase interoperability between different OBO ontologies.
id-mapping: part_of OBO_REL:part_of
This maps all cases of the unqualified relationship part_of to the ID OBO_REL:part_of defined in the OBO relations ontology.
The scope of this tag is within the current file only. Note that the default-relationship-id-prefix tag takes precedence over this tag. Cardinality: any.
- remark
- General comments for this file. This tag is differentiated from a ! comment in that the contents of a remark tag are guaranteed to be preserved by a parser. Cardinality: any.
The following tags are new in OBO 1.4:
- treat-xrefs-as-equivalent
- The value for this tag should contain an ID Space. Ideally one declared here
Macro. Treats all xrefs coming from a particular ID-Space as being statements of exact equivalence. Normally, xrefs have no special meaning beyond "This xref is of relevance to the current entity".
Example:
treat-xrefs-as-equivalent: CL
.
.
.
[Term]
id: GO:0005623
name: cell
xref: CL:0000000
This declares CL:0000000 and GO:0005623 to be equivalent in what they reference.
Cardinality: any.
- treat-xrefs-as-genus-differentia
- The value for this tag should contain an ID Space followed by a relation and then a class filler. Cardinality: any.
Macro. Treats all xrefs coming from a particular ID-Space as being genus-differentia definitions (cross products, logical definitions, intersection definitions). Normally, xrefs have no special meaning beyond "This xref is of relevance to the current entity".
Example:
treat-xrefs-as-genus-differentia: CL part_of NCBITaxon:7955
.
.
[Term]
id: ZFA:0000134
name: neuron
xref: ZFIN:ZDB-ANAT-010921-563
xref: CL:0000540
is treated as if it states:
[Term]
id: ZFA:0000134
name: neuron
xref: ZFIN:ZDB-ANAT-010921-563
intersection_of: CL:0000540
intersection_of: part_of NCBITaxon:7955
- treat-xrefs-as-relationship
- The value for this tag should contain an ID Space followed by a relation ID. Cardinality: any.
Macro. Treats all xrefs coming from a particular ID-Space as being relationships. Normally, xrefs have no special meaning beyond "This xref is of relevance to the current entity".
Example:
treat-xrefs-as-relationship: MA homologous_to
This declares all xrefs to MA to be homology relationships.
- treat-xrefs-as-is_a
- The value for this tag should contain an ID Space. Cardinality: any.
Macro. Treats all xrefs coming from a particular ID-Space as being is_a relationships. Normally, xrefs have no special meaning beyond "This xref is of relevance to the current entity".
Example:
treat-xrefs-as-is_a: CL
This declares all xrefs to CL to be is_a relations.
- relax-unique-identifier-assumption-for-namespace
- The value for this tag should be a namespace.
By default, an obo namespace (note: not ID-space) partitions all the entities such that no two entities belonging to the same namespace may be equivalent. This header tag relaxes this assumption.
Note that this assumption does not hold by default *between* namespaces (it is OK for cellular_component and cell to use different identifiers to denote the type "cell").
It is unlikely that this tag will be used frequently. One scenario in which it may be useful is if a single ontology is created from multiple sources, with redundancy:
relax-unique-identifier-assumption-for-namespace: my_combined_ontology
- relax-unique-label-assumption-for-namespace
- The value for this tag should be a namespace.
By default, an obo namespace (note: not ID-space) partitions all the entities such that no two entities belonging to the same namespace may have the same name tag (obsolete entities omitted). This header tag relaxes this assumption.
Note that this assumption does not hold by default *between* namespaces (it is OK for mouse_anatomy and fma to use the same names).
It is recommended that the unique label assumption should be maintained at all times. However, there may be times at an early stage of ontology development where this is relaxed.
relax-unique-label-assumption-for-namespace: my_combined_ontology
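To make the macro behaviour of the treat-xrefs-as-* header tags concrete, here is a hedged sketch of the treat-xrefs-as-genus-differentia expansion. It assumes a stanza is held as a list of (tag, value) pairs; the function name `expand_genus_differentia` is hypothetical, and a real tool would likely record the expansion without rewriting the source stanza.

```python
def expand_genus_differentia(stanza, idspace, relation, filler):
    """Rewrite xrefs from `idspace` as a genus-differentia definition.

    `stanza` is a list of (tag, value) pairs; a new list is returned.
    """
    prefix = idspace + ":"
    out = []
    for tag, value in stanza:
        if tag == "xref" and value.startswith(prefix):
            out.append(("intersection_of", value))  # the genus
            out.append(("intersection_of", f"{relation} {filler}"))  # the differentia
        else:
            out.append((tag, value))
    return out


zfa_neuron = [
    ("id", "ZFA:0000134"),
    ("name", "neuron"),
    ("xref", "ZFIN:ZDB-ANAT-010921-563"),
    ("xref", "CL:0000540"),
]
expanded = expand_genus_differentia(zfa_neuron, "CL", "part_of", "NCBITaxon:7955")
```

Applied to the ZFA neuron stanza from the example above, the CL xref becomes the two intersection_of lines and the ZFIN xref is left untouched.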
Stanzas
At present, all Term, Typedef and Instance stanzas always begin with an id tag. The value of the id tag announces the object to which the rest of the tags in the stanza refer. Normal, non-anonymous ids have global scope. An object has the same id in every file, and in every namespace. See ID Syntax.
The id tag may be optionally followed by an is_anonymous tag. If the value of is_anonymous is true, the object is anonymous. The id of an anonymous object is not fixed; if the ontology is parsed and then reserialized, the id may change. Anonymous ids have local scope; they are only valid in the file from which they were loaded. The same anonymous id in two different files refers to a different object in each file.
Any given stanza does not have to contain all the required tags. A file (or collection of files) may contain multiple stanzas that describe different aspects of an object. A required tag must be specified at least once for each object in a given set of files. This makes it possible for optional information to be stored in a separate file, and only loaded when necessary.
This means that parsers must wait until the end of the parse batch to check whether required information is missing. Multiple descriptions may produce parse errors if:
- A stanza contains tags that contradict a previous stanza (i.e. one term description gives a different term name than another description)
- A parser has processed all the files in a batch, but an object is still missing some required value (such as a term name).
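The two error conditions above can be sketched as a merge step that runs over the whole parse batch. This is an illustration under simplifying assumptions: each stanza is a dict with one value per tag, and which tags count as single-valued or required is passed in by the caller (the defaults follow the term-name example above). The name `merge_batch` is hypothetical.

```python
def merge_batch(stanzas, single_valued=("name",), required=("name",)):
    """Merge stanzas that share an id and report missing required tags.

    Returns (merged, missing): merged maps id -> combined tags, missing
    maps id -> required tags still absent once the batch is finished.
    """
    merged = {}
    for stanza in stanzas:
        obj = merged.setdefault(stanza["id"], {})
        for tag, value in stanza.items():
            if tag in single_valued and tag in obj and obj[tag] != value:
                raise ValueError(f"{stanza['id']}: conflicting {tag!r} values")
            obj[tag] = value
    # Required tags can only be checked after the whole batch is read.
    missing = {}
    for obj_id, obj in merged.items():
        absent = [tag for tag in required if tag not in obj]
        if absent:
            missing[obj_id] = absent
    return merged, missing


batch = [
    {"id": "GO:0005623", "name": "cell"},
    {"id": "GO:0005623", "comment": "stored in a second file"},
    {"id": "GO:0000001"},
]
merged, missing = merge_batch(batch)
```

Contradictions are raised as soon as they are seen, while missing tags are reported only at the end, matching the requirement that parsers wait until the batch is complete.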
There are currently three supported stanza types: [Term], [Typedef], [Instance]. Parsers/serializers will round-trip (successfully load and save) unrecognized stanzas.
Stanza Types
- Term
- Term stanzas constitute the nodes in an ontology graph. Formally a Term stanza is equivalent to a Class declaration in OWL.
- Typedef
- Typedef stanzas constitute the edge labels that may be used in an ontology graph. Also known as relations, relationship types, properties or predicates. The name "Typedef" is somewhat confusing but is retained for forwards compatibility. Formally equivalent to a property in OWL.
- Instance
- Instance stanzas are used to represent the spatiotemporal particulars that instantiate types. Note that instances are typically not represented in ontologies. OBO allows them for completeness, to allow generalized data exchange and for compatibility with other languages.
Tags in a [Term] Stanza
Required tags
- id
- The unique id of the current term. Cardinality: exactly one.
Optional tags
- name
- The term name. Cardinality: zero or one. If multiple term names are defined, it is a parse error. In 1.2 name was required. This has been relaxed in 1.4, which helps with OWL interoperability, as labels are optional in OWL.
- is_anonymous
- Whether or not the current object has an anonymous id. Cardinality: zero or one. The semantics are the same as B-Nodes in RDF.
- alt_id
- Defines an alternate id for this term. Cardinality: any. A term may have any number of alternate ids.
- def
- The definition of the current term. Cardinality: zero or one. More than one definition for a term generates a parse error. The value of this tag should be the quote enclosed definition text, followed by a dbxref list containing dbxrefs that describe the origin of this definition (see dbxref formatting for information on how dbxref lists are encoded). An example of this tag would look like this:
def: "The breakdown into simpler components of (+)-camphor, a bicyclic monoterpene ketone." [UM-BBD:pathway "", http://umbbd.ahc.umn.edu/cam/cam_map.html ""]
- comment
- A comment for this term. Cardinality: zero or one. There must be zero or one instances of this tag per term description. More than one comment for a term generates a parse error.
- subset
- This tag indicates a term subset to which this term belongs. The value of this tag must be a subset name as defined in a subsetdef tag in the file header. If the value of this tag is not mentioned in a subsetdef tag, a parse error will be generated. Cardinality: any. A term may belong to any number of subsets.
- synonym
- This tag gives a synonym for this term, some xrefs to describe the origins of the synonym, and may indicate a synonym category or scope information. Cardinality: any.
The value consists of a quote enclosed synonym text, a scope identifier, an optional synonym type name, and an optional dbxref list, like this:
synonym: "The other white meat" EXACT MARKETING_SLOGAN [MEAT:00324, BACONBASE:03021]
The synonym scope may be one of four values: EXACT, BROAD, NARROW, RELATED. If the first form is used to specify a synonym, the scope is assumed to be RELATED.
The synonym type must be the id of a synonym type defined by a synonymtypedef line in the header. If the synonym type has a default scope, that scope is used regardless of any scope declaration given by a synonym tag.
The dbxref list is formatted as specified in dbxref formatting. A term may have any number of synonyms.
- exact_synonym
- Deprecated. An alias for the synonym tag with the scope modifier set to EXACT.
- narrow_synonym
- Deprecated. An alias for the synonym tag with the scope modifier set to NARROW.
- broad_synonym
- Deprecated. An alias for the synonym tag with the scope modifier set to BROAD.
- xref
- A dbxref that describes an analogous term in another vocabulary (see dbxref formatting for information about how the value of this tag must be formatted). Cardinality: any. A term may have any number of xrefs.
- xref_analog
- Deprecated. An alias for the xref tag.
- xref_unk
- Deprecated. An alias for the xref tag.
- is_a
- This tag describes a subclassing relationship between one term and another. The value is the id of the term of which this term is a subclass. A term may have any number of is_a relationships. This is equivalent to a SubClassOf axiom in OWL. Cardinality: any.
Parsers which support trailing modifiers may optionally parse the following trailing modifier tags for is_a:
namespace <any namespace id>
derived true OR false
The namespace modifier allows the is_a relationship to be assigned its own namespace (independent of the namespace of the superclass or subclass of this is_a relationship).
The derived modifier indicates that the is_a relationship was not explicitly defined by a human ontology designer, but was created automatically by a reasoner, and could be re-derived using the non-derived relationships in the ontology.
This tag previously supported the completes trailing modifier. This modifier is now deprecated. Use the intersection_of tag instead.
- intersection_of
- Cardinality: EITHER zero OR two or more.
This tag indicates that this term is equivalent to the intersection of several other terms. The value is either a term id, or a relationship type id, a space, and a term id. For example:
id: GO:0000085
name: G2 phase of mitotic cell cycle
intersection_of: GO:0051319 ! G2 phase
intersection_of: part_of GO:0000278 ! mitotic cell cycle
This means that GO:0000085 is equivalent to any term that is both a subtype of 'G2 phase' and has a part_of relationship to 'mitotic cell cycle' (i.e. the G2 phase of the mitotic cell cycle). Note that whilst relationship tags specify necessary conditions, intersection_of tags specify necessary and sufficient conditions.
A collection of intersection_of tags appearing in a term is also known as a cross-product definition (this is the same as what OWL users know as a defined class, employing intersectionOf constructs).
It is strongly recommended that all intersection_of tags follow a genus-differentia pattern. In this pattern, one of the tags refers directly to a term id (the genus) and the other tags are relation-term pairs. For example:
[Term]
id: GO:0045495
name: pole plasm
intersection_of: GO:0005737 ! cytoplasm
intersection_of: part_of CL:0000023 ! oocyte
These definitions can be read as sentences, such as: a pole plasm is equivalent to a cytoplasm that is part_of an oocyte.
If any intersection_of tags are specified for a term, at least two intersection_of tags need to be present or it is a parse error. The full intersection for the term is the set of all ids specified by all intersection_of tags for that term.
As of OBO 1.4, this tag may be applied in Typedef stanzas.
- union_of
- Cardinality: EITHER zero OR two or more.
This tag indicates that this term represents the union of several other terms. The value is the id of one of the other terms of which this term is a union.
If any union_of tags are specified for a term, at least 2 union_of tags need to be present or it is a parse error. The full union for the term is the set of all ids specified by all union_of tags for that term.
This tag may not be applied to relationship types.
Parsers which support trailing modifiers may optionally parse the following trailing modifier tag for disjoint_from:
namespace <any namespace id>
- disjoint_from
- Cardinality: any.
This tag indicates that a term is disjoint from another, meaning that the two terms have no instances or subclasses in common. The value is the id of the term from which the current term is disjoint. This tag may not be applied to relationship types.
Parsers which support trailing modifiers may optionally parse the following trailing modifier tags for disjoint_from:
namespace <any namespace id>
derived true OR false
The namespace modifier allows the disjoint_from relationship to be assigned its own namespace.
The derived modifier indicates that the disjoint_from relationship was not explicitly defined by a human ontology designer, but was created automatically by a reasoner, and could be re-derived using the non-derived relationships in the ontology.
- relationship
- Cardinality: any.
This tag describes a typed relationship between this term and another term or terms. The value of this tag should be the relationship type id, and then the id of the target term, plus, optionally, other target terms. The relationship type name must be a relationship type name as defined in a [Typedef] stanza. The [Typedef] must either occur in a document in the current parse batch, or in a file imported via an import header tag. If the relationship type name is undefined, a parse error will be generated. If the id of the target term cannot be resolved by the end of parsing the current batch of files, this tag describes a "dangling reference"; see the parser requirements section for information about how a parser may handle dangling references. If a relationship is specified for a term with an is_obsolete value of true, a parse error will be generated.
Parsers which support trailing modifiers may optionally parse the following trailing modifier tags for relationships:
namespace <any namespace id>
inferred true OR false
cardinality any non-negative integer
maxCardinality any non-negative integer
minCardinality any non-negative integer
The namespace modifier allows the relationship to be assigned its own namespace (independent of the namespace of the parent, child, or type of the relationship).
The inferred modifier indicates that the relationship was not explicitly defined by a human ontology designer, but was created automatically by a reasoner, and could be re-derived using the non-derived relationships in the ontology.
Cardinality qualifiers can be used to specify constraints on the number of relations of the specified type any given instance can have. For example, in the stanza declaring id: SO:0000634 ! polycistronic mRNA, we can say:
relationship: has_part SO:0000316 {minCardinality=2} ! CDS
which means that every instance of a transcript of this type has two or more CDS features such that they stand in a has_part relationship from the transcript.
The semantics of a relationship tag is by default "all-some". Formally, in OWL this corresponds to an existential restriction - see the OWL section.
- is_obsolete
- Cardinality: zero or one.
Whether or not this term is obsolete. Allowable values are "true" and "false" (false is assumed if this tag is not present). Obsolete terms must have no relationships, and no defined is_a, inverse_of, disjoint_from, union_of, or intersection_of tags.
- replaced_by
- Cardinality: any.
Gives a term which replaces an obsolete term. The value is the id of the replacement term. The value of this tag can safely be used to automatically reassign instances whose instance_of property points to an obsolete term.
The replaced_by tag may only be specified for obsolete terms. A single obsolete term may have more than one replaced_by tag. This tag can be used in conjunction with the consider tag.
- consider
- Cardinality: any. Gives a term which may be an appropriate substitute for an obsolete term, but needs to be looked at carefully by a human expert before the replacement is done.
This tag may only be specified for obsolete terms. A single obsolete term may have many consider tags. This tag can be used in conjunction with replaced_by.
- use_term
- Deprecated. Equivalent to consider.
- builtin
- Cardinality: zero or one. Whether or not this term or relation is built in to the OBO format. Allowable values are "true" and "false" (false assumed as default). Rarely used. One example of where this is used is the OBO relations ontology, which provides a stanza for the is_a relation, even though this relation is axiomatic to the language.
Additional tags in 1.4:
- created_by
- Cardinality: zero or one.
Name of the creator of the term. May be a short username, initials or ID. Example: dph
Note that although this tag is defined in obof1.4, it can be used in obof1.2 harmlessly.
- creation_date
- Cardinality: zero or one.
Date of creation of the term specified in ISO 8601 format. Example: 2009-04-13T01:32:36Z
Note that although this tag is defined in obof1.4, it can be used in obof1.2 harmlessly.
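The cardinalities stated for the [Term] tags above lend themselves to a simple check. The sketch below covers only the "zero or one" and "zero or two or more" cases and assumes the tags of one term (merged across all of its stanzas) are given as (tag, value) pairs; the set contents mirror the cardinalities listed in this guide, and the name `cardinality_errors` is hypothetical.

```python
from collections import Counter

# Tags this guide lists with cardinality "zero or one".
ZERO_OR_ONE = {
    "name", "is_anonymous", "def", "comment", "is_obsolete",
    "builtin", "created_by", "creation_date",
}
# Tags this guide lists with cardinality "zero or two or more".
ZERO_OR_TWO_PLUS = {"intersection_of", "union_of"}


def cardinality_errors(pairs):
    """Return a list of cardinality violations for one term's tags."""
    counts = Counter(tag for tag, _ in pairs)
    errors = []
    for tag in sorted(ZERO_OR_ONE):
        if counts[tag] > 1:
            errors.append(f"{tag}: expected zero or one, found {counts[tag]}")
    for tag in sorted(ZERO_OR_TWO_PLUS):
        if counts[tag] == 1:
            errors.append(f"{tag}: expected zero or at least two, found 1")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a parser decide which violations are fatal and which are warnings.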
Dbxref Formatting
Dbxref definitions take the following form:
<dbxref name> {optional-trailing-modifier}
or
<dbxref name> "<dbxref description>" {optional-trailing-modifier}
The dbxref is a colon separated key-value pair. The key should be taken from GO.xrf_abbs but this is not a requirement. If provided, the dbxref description is a string of zero or more characters describing the dbxref. DBXref descriptions are rarely used and as of obof1.4 are discouraged.
Dbxref lists are used when a tag value must contain several dbxrefs. Dbxref lists take the following form:
[<dbxref definition>, <dbxref definition>, ...]
The brackets may contain zero or more comma separated dbxref definitions. An example of a dbxref list can be seen in the GO def for "ribonuclease MRP complex":
def: "A ribonucleoprotein complex that contains an RNA molecule of the snoRNA family, and cleaves the rRNA precursor as part of rRNA transcript processing. It also has other roles: In S. cerevisiae it is involved in cell cycle-regulated degradation of daughter cell-specific mRNAs, while in mammalian cells it also enters the mitochondria and processes RNAs to create RNA primers for DNA replication." [GOC:sgd_curators, PMID:10690410, PMID:14729943, PMID:7510714]
Note that the trailing modifiers (like all trailing modifiers) do not need to be decoded or round-tripped by parsers; trailing modifiers can always be optionally ignored. However, all parsers must be able to gracefully ignore trailing modifiers. It is important to recognize that lines which accept a dbxref list may have a trailing modifier for each dbxref in the list, and another trailing modifier for the line itself.
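Splitting a dbxref list into its individual dbxref definitions can be sketched as follows. The sketch respects quoted descriptions and backslash escapes when looking for separating commas, but it does not separate per-dbxref trailing modifiers (a comma inside braces would be mis-split), and it leaves escapes undecoded. The name `parse_dbxref_list` is hypothetical.

```python
def parse_dbxref_list(text):
    """Split '[A:1, B:2 "desc"]' into a list of dbxref definition strings."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("dbxref list must be enclosed in square brackets")
    body = text[1:-1]
    items, current, in_quotes = [], [], False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            current.append(body[i:i + 2])  # keep the escape for later decoding
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append("".join(current).strip())  # separator found
            current = []
        else:
            current.append(ch)
        i += 1
    last = "".join(current).strip()
    if last:
        items.append(last)
    return items
```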
Tags in [Typedef] Stanza
[Typedef] stanzas support almost all the same tags as a [Term] stanza.
In OBO Format 1.2, the following tags were not allowed in a [Typedef] stanza. In 1.4 they are allowed.
- union_of
- intersection_of
- disjoint_from
The following additional tags are only allowed in a [Typedef] stanza:
- domain
- The id of a term, or a special reserved identifier, which indicates the domain for this relationship type. If a property P has domain D, then any term T that has a relationship of type P to another term is a subclass of D. Note that this does not mean that the domain restricts which classes of terms can have a relationship of type P to another term. Rather, it means that any term that has a relationship of type P to another term is by definition a subclass of D. Cardinality: zero or one. If the intent is to declare a disjunctive domain, then a new class must be declared and defined using the union_of construct.
- range
- The id of a term, or a special reserved identifier, which indicates acceptable range for this relationship type. If a property P has range R, then any term T that is the target of a relationship of type P is a subclass of R. Note that this does not mean that the range restricts which classes of terms can be the target of relationships of type P. Rather, it means that any term that is the target of a relationship of type P is by definition a subclass of R. Cardinality: zero or one. If the intent is to declare a disjunctive range, then a new class must be declared and defined using the union_of construct.
- inverse_of
- The id of another relationship type that is the inverse of this relationship type. If relation A is the inverse_of type B, and X has relationship A to Y, then it is implied that Y has relation B to X. In obof1.2 the semantics of inverse_of were unclear, as obof1.2 unofficially allowed type-level relations. In obof1.4, the semantics are identical to OWL. Cardinality: any.
- transitive_over
- The id of another relationship type that this relationship type is transitive over. If P is transitive over Q, and the ontology has X P Y and Y Q Z then it follows that X P Z (term/type level). Equivalent to property chains in OWL2. Cardinality: any.
- is_cyclic
- Whether or not a cycle can be made from this relationship type. If a relationship type is non-cyclic, it is illegal for an ontology to contain a cycle made from user-defined or implied relationships of this type. Allowed values: true or false. Cardinality: zero or one.
- is_reflexive
- Whether this relationship is reflexive. All reflexive relationships are also cyclic. Allowed values: true or false. Term/type level. Cardinality: zero or one.
- is_symmetric
- Whether this relationship is symmetric. All symmetric relationships are also cyclic. Allowed values: true or false. Term/type level. Cardinality: zero or one.
- is_anti_symmetric
- Whether this relationship is anti-symmetric. Allowed values: true or false. Term/type level. Cardinality: zero or one.
- is_transitive
- Whether this relationship is transitive. Allowed values: true or false. Term/type level. Cardinality: zero or one.
- is_metadata_tag
- Whether this relationship is a metadata tag. Properties that are marked as metadata tags are used to record object metadata. Object metadata is additional information about an object that is useful to track, but does not impact the definition of the object or how it should be treated by a reasoner. Metadata tags might be used to record special term synonyms or structured notes about a term, for example. Cardinality: zero or one.
- is_class_level
- Whether this relation is a class-level relation. In OBO-Format, all relationship tags are taken by default to mean an all-some relationship over an instance level relation. This tag is used for other cases, e.g. lacks_part. In OWL this is translated to a hasValue restriction. Cardinality: zero or one.
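The transitive_over rule described above (X P Y and Y Q Z implies X P Z) can be sketched as a small fixed-point computation. This is only an illustration, not a reasoner: the function name, the triple representation, and the sample ids are assumptions made for the example.

```python
def apply_transitive_over(facts, transitive_over):
    """Return facts plus everything implied by transitive_over declarations.

    facts: set of (subject, relation, object) triples (hypothetical layout).
    transitive_over: dict mapping a relation P to the set of relations Q
    that P is declared transitive over.
    """
    inferred = set(facts)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        snapshot = list(inferred)
        for x, p, y in snapshot:
            for y2, q, z in snapshot:
                if (y2 == y and q in transitive_over.get(p, ())
                        and (x, p, z) not in inferred):
                    inferred.add((x, p, z))
                    changed = True
    return inferred


# Hypothetical data: regulates is declared transitive_over part_of.
facts = {("A", "regulates", "B"), ("B", "part_of", "C"), ("C", "part_of", "D")}
closure = apply_transitive_over(facts, {"regulates": {"part_of"}})
```

Note that only the declared rule is applied: the sketch does not infer B part_of D, because part_of itself was not marked is_transitive in this example.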
Tags in an [Instance] Stanza
Required tags
- id
- The unique id of the current instance. Cardinality: exactly one.
Optional tags
- name
- The instance name. Cardinality: zero or one. Any instance may have only one name defined.
- instance_of
- The term id that gives the class of which this is an instance. Cardinality: zero or one. If an instance belongs to multiple classes then a class intersection must be declared.
- property_value
- This tag binds a property to a value in this instance. The value of this tag is a relationship type id, a space, and a value specifier. The value specifier may have one of two forms. In the first form, it is just the id of some other instance, relationship type or term. In the second form, the value is given by a quoted string, a space, and a datatype identifier. See IDs for more information on legal datatype identifiers.
[Instance]
id: john
name: John Day-Richter
instance_of: boy
property_value: married_to heather
property_value: shoe_size "8" xsd:positiveInteger
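The two forms of the property_value specifier could be told apart as in the following sketch. It is a simplified illustration under stated assumptions: the function name and the returned dictionary keys are hypothetical, and trailing "!" comments, trailing modifiers and escape decoding are not handled.

```python
import re

# Second form: property, quoted string, datatype identifier.
_LITERAL = re.compile(r'^(\S+)\s+"((?:[^"\\]|\\.)*)"\s+(\S+)$')
# First form: property followed by the id of another object.
_REFERENCE = re.compile(r'^(\S+)\s+(\S+)$')


def parse_property_value(value):
    """Split the value of a property_value tag into its parts."""
    value = value.strip()
    match = _LITERAL.match(value)
    if match:
        return {"property": match.group(1),
                "literal": match.group(2),
                "datatype": match.group(3)}
    match = _REFERENCE.match(value)
    if match:
        return {"property": match.group(1), "target": match.group(2)}
    raise ValueError(f"not a recognizable property_value: {value!r}")
```

Applied to the stanza above, the first property_value line yields a reference to the instance heather, and the second yields the literal "8" with datatype xsd:positiveInteger.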
The following optional tags are also allowable for instances. They have exactly the same syntax and semantics as defined in tags in a term stanza:
- is_anonymous
- namespace
- alt_id
- comment
- xref
- synonym
- created_by
- creation_date
- is_obsolete
- replaced_by
- consider
The replaced_by and consider tags are also allowable for obsolete instances, but they must refer to another instance, rather than another term, to use as a replacement.
Parsers and Serializers
General Behavior
All parsers should be capable of failing gracefully and generating errors explaining the failure. Parsers may optionally be capable of generating warnings, if the file being read contains non-fatal errors.
Handling Unrecognized Tags
A parser may do one of several things when an unrecognized tag is found:
- FAIL: report a fatal error and terminate parsing
- WARN: report a warning, but continue parsing and ignore the unrecognized tag
- WARN_AND_RECORD: report a warning, but record the unrecognized tag for later serialization
- IGNORE: silently ignore the unrecognized tag
- RECORD: record the unrecognized tag for later serialization (recommended)
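One way a parser might expose these five behaviours is as a configurable policy. The sketch below is an assumption about how such a parser could be organised; the enum, the function name, and the tiny KNOWN_TAGS set are hypothetical and not part of any existing OBO library.

```python
import warnings
from enum import Enum


class UnknownTagPolicy(Enum):
    FAIL = "fail"
    WARN = "warn"
    WARN_AND_RECORD = "warn_and_record"
    IGNORE = "ignore"
    RECORD = "record"


# Illustrative subset only; a real parser would know every tag in this guide.
KNOWN_TAGS = {"id", "name", "is_a"}


def handle_tag(tag, value, policy, recorded):
    """Return True for a recognized tag; otherwise apply the policy.

    recorded is a list that collects unrecognized (tag, value) pairs
    so that a serializer can write them back out later.
    """
    if tag in KNOWN_TAGS:
        return True
    if policy is UnknownTagPolicy.FAIL:
        raise ValueError(f"unrecognized tag: {tag}")
    if policy in (UnknownTagPolicy.WARN, UnknownTagPolicy.WARN_AND_RECORD):
        warnings.warn(f"unrecognized tag: {tag}")
    if policy in (UnknownTagPolicy.WARN_AND_RECORD, UnknownTagPolicy.RECORD):
        recorded.append((tag, value))
    return False
```

Keeping the recorded pairs separate from the parsed model is one possible design; it lets a tool round-trip information it does not understand without having to interpret it.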
Non-Roundtripping Header Tags
The following optional header tags need not survive round-tripping:
- format-version
- version
- date
- saved-by
- auto-generated-by
They do not need to be round tripped, because the correct values will change when the file is saved.
Dangling References
There are several options when a dangling reference is encountered:
- FAIL: report a fatal error and terminate parsing
- WARN_AND_IGNORE: report a warning and ignore the dangling reference
- WARN_AND_READ: report a warning and read in the dangling reference, storing it in a form suitable for round-tripping
- READ: silently read and store the dangling reference (recommended)
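Before any of these options can be applied, the parser has to detect which references are dangling, i.e. point at ids that no stanza in the loaded files declares. A minimal sketch follows, assuming stanzas have already been parsed into dictionaries; that layout and the function name are hypothetical, and only is_a references are checked for brevity.

```python
def find_dangling(stanzas):
    """Return (source id, missing id) pairs for unresolved is_a targets.

    stanzas: list of dicts with an "id" key and an optional "is_a" list.
    """
    declared = {stanza["id"] for stanza in stanzas}
    dangling = []
    for stanza in stanzas:
        for target in stanza.get("is_a", []):
            if target not in declared:
                dangling.append((stanza["id"], target))
    return dangling


# Hypothetical input: GO:0000003 is referenced but never declared.
stanzas = [
    {"id": "GO:0000001"},
    {"id": "GO:0000002", "is_a": ["GO:0000001", "GO:0000003"]},
]
missing = find_dangling(stanzas)
```

What happens to each reported pair (abort, warn, drop, or keep for round-tripping) is then a matter of the chosen option.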
Serializer Conventions
Any parser should be able to read correctly formatted files in any layout. However, it is suggested that serializers obey the following conventions to ensure consistency, and to facilitate file comparison (for example in CVS).
General Conventions
- Within a single file, all tags relating to a single entity should appear in the same stanza (thereby minimizing the total number of stanzas and keeping all tags regarding a single entity in the same place)
- Any time an identifier is referenced (i.e. anywhere other than an id: tag), it should be accompanied by the corresponding name value in the comments. See this guide for examples.
- In any case where the correct ordering of tags is ambiguous (for example, if there are two tags with the same name, or the ordering is not given in this document), tags should be ordered alphabetically, first on the tag name, then on the tag value.
Stanza Conventions
All new stanza declarations should be preceded by a blank line. [Typedef] stanzas should appear before [Term] stanzas, and [Instance] stanzas should appear after [Term] stanzas. All other stanza types should appear after [Instance] stanzas, in alphabetical order on the stanza name.
Header Tags
Header tags should appear in the following order:
- format-version
- data-version
- date
- saved-by
- auto-generated-by
- import
- subsetdef
- synonymtypedef
- default-namespace
- namespace-id-rule
- idspace
- treat-xrefs-as-equivalent
- treat-xrefs-as-genus-differentia
- treat-xrefs-as-relationship
- treat-xrefs-as-is_a
- remark
- ontology
Ordering Term and Typedef stanzas
[Term], [Typedef], and [Instance] stanzas should be serialized in alphabetical order on the value of their id tag.
Ordering Term and Typedef tags
Term tags should appear in the following order:
- id
- is_anonymous
- name
- namespace
- alt_id
- def
- comment
- subset
- synonym
- xref
- builtin
- property_value
- is_a
- intersection_of
- union_of
- equivalent_to
- disjoint_from
- relationship
- created_by
- creation_date
- is_obsolete
- replaced_by
- consider
Typedef tags should appear in the following order:
- id
- is_anonymous
- name
- namespace
- alt_id
- def
- comment
- subset
- synonym
- xref
- property_value
- domain
- range
- builtin
- holds_over_chain
- is_anti_symmetric
- is_cyclic
- is_reflexive
- is_symmetric
- is_transitive
- is_functional
- is_inverse_functional
- is_a
- intersection_of
- union_of
- equivalent_to
- disjoint_from
- inverse_of
- transitive_over
- equivalent_to_chain
- disjoint_over
- relationship
- is_obsolete
- created_by
- creation_date
- replaced_by
- consider
- expand_assertion_to
- expand_expression_to
- is_metadata_tag
- is_class_level
Instance tags should appear in the following order:
- id
- is_anonymous
- name
- namespace
- alt_id
- def
- comment
- subset
- synonym
- xref
- instance_of
- property_value
- relationship
- created_by
- creation_date
- is_obsolete
- replaced_by
- consider
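A serializer could implement these orderings with a rank table and a sort key. The sketch below uses the Term tag order listed above; the function name is hypothetical, and placing unlisted tags after all listed ones is an assumption based on the general convention that ambiguous cases are ordered alphabetically on tag name and then tag value.

```python
TERM_TAG_ORDER = [
    "id", "is_anonymous", "name", "namespace", "alt_id", "def", "comment",
    "subset", "synonym", "xref", "builtin", "property_value", "is_a",
    "intersection_of", "union_of", "equivalent_to", "disjoint_from",
    "relationship", "created_by", "creation_date", "is_obsolete",
    "replaced_by", "consider",
]
_RANK = {tag: index for index, tag in enumerate(TERM_TAG_ORDER)}


def sort_term_tags(pairs):
    """Sort (tag, value) pairs into the suggested Term tag order."""
    def key(pair):
        tag, value = pair
        # Unlisted tags share the last rank and fall back to alphabetical order.
        return (_RANK.get(tag, len(TERM_TAG_ORDER)), tag, value)
    return sorted(pairs, key=key)


ordered = sort_term_tags([
    ("is_a", "GO:2"), ("name", "x"), ("is_a", "GO:1"),
    ("id", "GO:0"), ("zzz_custom", "v"),
])
```

The same approach would apply to Typedef and Instance stanzas by swapping in the corresponding list.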
Dbxref lists
Values in dbxref lists should be ordered alphabetically on the dbxref name.
Clause order
Clauses with the same tag should be sorted in the following way:
- If the tag is intersection_of, first sort the intersection_of clauses into groups by value count, with shorter clauses first.
- The sort keys are the clause values and their string representation. The sort order is alphabetical and case-insensitive.
- If two values are equal under the case-insensitive comparison, they are ordered alphabetically, case-sensitively.
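The clause ordering rules can be expressed as a single composite sort key. This is a sketch of one reading of the rules: the function names are hypothetical, each clause is assumed to be a list of value strings, and the case-sensitive tie-break is assumed to mean plain code point order.

```python
def clause_sort_key(tag, values):
    """Build a sort key for one clause of the given tag."""
    values = tuple(values)
    # Only intersection_of clauses are grouped by their number of values.
    count = len(values) if tag == "intersection_of" else 0
    case_insensitive = tuple(value.lower() for value in values)
    return (count, case_insensitive, values)


def sort_clauses(tag, clauses):
    """Sort all clauses that share one tag."""
    return sorted(clauses, key=lambda values: clause_sort_key(tag, values))


genus_first = sort_clauses(
    "intersection_of",
    [["part_of", "GO:0000278"], ["GO:0051319"]],
)
```

With this key, the single-value genus clause of an intersection_of definition sorts before the two-value relationship clauses.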
Changes in 1.4
Changes that break forwards compatibility
- All string values are treated as instances of rdf:text. String values that lack an '@' followed by a valid language tag are treated as if they have a trailing '@'. Strings that should genuinely contain the '@' character must escape it. This is technically a forwards-incompatible change in that it changes the semantics of strings. However, it is syntactically forwards compatible.
- The semantics of inverse_of were not clear in 1.2. In 1.4, relations are instance-level and this tag has the same meaning as the InverseProperties in OWL2.
- Use of the backslash character "\" to split a tag over multiple lines was never used in practice and has been deprecated.
- Unicode support. TODO.
New Header Tags
Header Macros
- treat-xrefs-as-equivalent
- treat-xrefs-as-genus-differentia
- treat-xrefs-as-relationship
- treat-xrefs-as-is_a
- relax-unique-identifier-assumption-for-namespace
- relax-unique-label-assumption-for-namespace
Definitional Expressions
ID Definitional Expressions added. E.g. GO:0005737^part_of(CL:0000023) can be used wherever one wants to say "cytoplasm of oocyte". This is treated as if it has the following definition:
[Term]
id: GO:0005737^part_of(CL:0000023)
intersection_of: GO:0005737 ! cytoplasm
intersection_of: part_of CL:0000023 ! oocyte
This is known as post-composition. We can refer to an unnamed entity (i.e. one with no ID in any ontology) by describing it via a logical expression. See the Obolog document for the formal semantics of these expressions.
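The expansion of a post-composed ID into its implied stanza can be sketched as follows. This is an illustration based only on the single example above: the function name is hypothetical, nested expressions are not supported, and allowing several "^" differentia in a row is an assumption rather than something this guide states.

```python
import re

# One differentium of the form relation(filler), without nesting.
_DIFFERENTIUM = re.compile(r'^(\w+)\(([^()]+)\)$')


def expand_post_composed_id(expression):
    """Return the lines of the [Term] stanza implied by the expression."""
    genus, *differentia = expression.split("^")
    lines = ["[Term]", f"id: {expression}", f"intersection_of: {genus}"]
    for part in differentia:
        match = _DIFFERENTIUM.match(part)
        if match is None:
            raise ValueError(f"unsupported differentium: {part!r}")
        lines.append(f"intersection_of: {match.group(1)} {match.group(2)}")
    return lines


stanza = expand_post_composed_id("GO:0005737^part_of(CL:0000023)")
```

The generated lines omit the trailing "!" name comments, since those require looking up the names of the referenced terms.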
Relation Tags
Many of these are advanced features that can safely be ignored by parsers.
- holds_over_chain
- See Relation Composition. This is an extension of the transitive_over tag, introduced in 1.2. The equivalent construct in OWL is a property chain.
holds_over_chain(R R1 R2) ==> R1(x y) R2(y z) -> R(x z)
- equivalent_to_chain
- See Relation Composition. This is an extension of the transitive_over tag, introduced in 1.2.
equivalent_to_chain(R R1 R2) ==> R1(x y) R2(y z) -> R(x z)
equivalent_to_chain(R R1 R2) ==> R(x z) -> exists y: R1(x y) R2(y z)
- disjoint_over
- For example: spatially_disconnected_from is disjoint_over part_of, in that two disconnected entities have no parts in common. This can be translated to OWL as:
disjoint_over(R S), R(A B) ==> (S some A) disjointFrom (S some B)
- intersection_of
- Previously, this tag could only be used in [Term] stanzas, to define types/classes/universals/patterns. Now it can be used to define relations. For example, we can define a temporal relation coincides_with as being true if both start and end boundaries are shared.
[Typedef]
id: coincides_with
intersection_of: has_same_start_as
intersection_of: has_same_end_as
- union_of
- Previously, this tag could only be used in [Term] stanzas, to define types/classes/universals/patterns. Now it can be used to define relations.
- is_functional
- The relation acts like a function, i.e. any entity relates to at most one other entity by this relation. Identical to OWL FunctionalProperty.
- is_inverse_functional
- Like is_functional, but in the opposite "direction". Identical to OWL InverseFunctionalProperty.
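As an illustration of what is_functional asserts, the following lint-style sketch reports subjects that relate to more than one object through a functional relation. The function name, the triple layout and the sample data are hypothetical. Note that under OWL semantics such a situation is not necessarily an error: a reasoner would instead infer that the objects denote the same individual unless they are declared different.

```python
from collections import defaultdict


def functional_violations(facts, functional_relations):
    """Map (subject, relation) to its objects when there is more than one.

    facts: iterable of (subject, relation, object) triples.
    functional_relations: set of relation ids marked is_functional: true.
    """
    targets = defaultdict(set)
    for subject, relation, obj in facts:
        if relation in functional_relations:
            targets[(subject, relation)].add(obj)
    return {key: objs for key, objs in targets.items() if len(objs) > 1}


facts = [
    ("john", "married_to", "heather"),
    ("john", "married_to", "alice"),
    ("john", "knows", "bob"),
    ("john", "knows", "carol"),
]
suspects = functional_violations(facts, {"married_to"})
```

A check for is_inverse_functional would be the same sketch with subject and object swapped.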
Tags for either relations or types/classes
- equivalent_to
- Used to specify exact equivalence between two instances, types or relations. Identical to OWL EquivalentClasses.
OBO and OWL
The formal specification of the mapping between OBO format syntax and OWL is provided in the OBO Syntax and Semantics document. The intention is that OBO-Format can be regarded as a syntax expressing a subset of OWL2.
Translating basic relationships to OWL
An is_a tag is translated to a subclass axiom, and all relationships are treated by default as all-some (existential). See later for advanced features for representing other kinds of quantification.
[Term]
id: GO:0000085
name: G2 phase of mitotic cell cycle
is_a: GO:0051319 ! G2 phase
translates to
Class: GO_0000085
Annotations: label 'G2 phase of mitotic cell cycle'@en
SubClassOf: GO_0051319
[Term]
id: GO:0000085
name: G2 phase of mitotic cell cycle
relationship: part_of GO:0051329 ! interphase of mitotic cell cycle
translates to
Class: GO_0000085
Annotations: label 'G2 phase of mitotic cell cycle'@en
SubClassOf: part_of some GO_0051329
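The two translations above can be mimicked with a few lines of string formatting. This is only a sketch that reproduces the shape of the examples in this guide; it is not the normative mapping. The function names and the dictionary layout of a parsed term are hypothetical, and quoting of names that contain apostrophes is not handled.

```python
def obo_id_to_owl(obo_id):
    """Rewrite an OBO id such as GO:0000085 as GO_0000085."""
    return obo_id.replace(":", "_", 1)


def term_to_manchester(term):
    """Render is_a and relationship tags as SubClassOf lines."""
    lines = [f"Class: {obo_id_to_owl(term['id'])}"]
    if "name" in term:
        lines.append(f"Annotations: label '{term['name']}'@en")
    for parent in term.get("is_a", []):
        lines.append(f"SubClassOf: {obo_id_to_owl(parent)}")
    for relation, target in term.get("relationship", []):
        # Relationships default to all-some (existential) restrictions.
        lines.append(f"SubClassOf: {relation} some {obo_id_to_owl(target)}")
    return "\n".join(lines)


owl_text = term_to_manchester({
    "id": "GO:0000085",
    "name": "G2 phase of mitotic cell cycle",
    "is_a": ["GO:0051319"],
    "relationship": [("part_of", "GO:0051329")],
})
```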
Translating intersection_of tags to OWL
A collection of OBO intersection_of tags is translated to an OWL intersection construct. There is an implicit equivalence axiom.
[Term]
id: GO:0000085
name: G2 phase of mitotic cell cycle
intersection_of: GO:0051319 ! G2 phase
intersection_of: part_of GO:0000278 ! mitotic cell cycle
translates to
Class: GO_0000085
Annotations: label 'G2 phase of mitotic cell cycle'@en
EquivalentTo: GO_0051319 and part_of some GO_0000278
Translating OBO metadata
The IAO ontology-metadata ontology is used for metadata.
[Term]
id: GO:0000087
name: M phase of mitotic cell cycle
namespace: biological_process
def: "M phase occurring as part of ...." [GOC:dph, GOC:mah, ISBN:0815316194]
synonym: "M-phase of mitotic cell cycle" EXACT []
xref: Reactome:1006743 "M Phase"
is_a: GO:0000279 ! M phase
intersection_of: GO:0000279 ! M phase
intersection_of: part_of GO:0000278 ! mitotic cell cycle
relationship: part_of GO:0000278 ! mitotic cell cycle
translates to
Class: GO_0000087
Annotations:
label 'M phase of mitotic cell cycle'@en
IAO_TODO ??? 'M-phase of mitotic cell cycle'@en
IAO_TODO Reactome_1006743
IAO_TODO (IAO_nnnn GOC_dph) TODO "M phase occurring as part of ...."@en
Quantification of restrictions
- All property-class (P,C) pairs will by default be treated as SomeValuesFrom(P,C)
- An exception is made if cardinality is specified; in that case the corresponding OWL2 cardinality construct(s) are used
- If a property is declared class-level in the Typedef stanza via an is_class_level true tag, then the pair is treated as HasValue(P,C). The rationale for this is explained in the macros document.
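The three rules above amount to a small decision procedure. The sketch below follows the notation used in this section (SomeValuesFrom, HasValue). The function and parameter names are hypothetical, the cardinality construct names are an assumption about what "the corresponding constructs" are, and giving the class-level rule precedence over cardinality is also an assumption, since the guide does not say how the two interact.

```python
def quantify(prop, cls, class_level=False, cardinality=None,
             min_cardinality=None, max_cardinality=None):
    """Return the restriction(s) for one property-class pair as strings."""
    if class_level:
        return [f"HasValue({prop},{cls})"]
    if cardinality is not None:
        return [f"ExactCardinality({cardinality},{prop},{cls})"]
    restrictions = []
    if min_cardinality is not None:
        restrictions.append(f"MinCardinality({min_cardinality},{prop},{cls})")
    if max_cardinality is not None:
        restrictions.append(f"MaxCardinality({max_cardinality},{prop},{cls})")
    # With no cardinality at all, fall back to the all-some default.
    return restrictions or [f"SomeValuesFrom({prop},{cls})"]
```

For instance, a plain part_of relationship falls through to the SomeValuesFrom default, while a relation declared with is_class_level: true, such as lacks_part, yields HasValue.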
OWL Macros
The following two tags can be used within Typedef stanzas:
- expand_expression_to maps to an AnnotationAssertion(IAO_0000424 P V)
- expand_assertion_to maps to an AnnotationAssertion(IAO_0000425 P V)
Extensions to OWL2
- OWL2 prohibits cardinality constraints on transitive object properties, but obof1.4 allows this.
- obof1.4 allows property chains in both directions via the equivalent_to_chain tag.
- obof1.4 allows property definitions via intersection and union constructs.