# About

* Status: INCOMPLETE, DRAFT

This is a proposed new exchange format for GO annotations. It is designed so that "classic" GO annotations (with or without annotation extensions) can be exchanged, as well as complex interlinked "lego"-style annotations.

The exchange format is JSON-LD, which is simply JSON accompanied by a mapping (or "context" in JSON-LD terminology) that describes how the JSON translates to RDF. The JSON can thus be consumed directly by many OWL-aware tools and used for reasoning without an intermediate piece of custom translation code. Even if no translation is performed, JSON-LD can be used as a standard way to document conventions used in JSON formats.

The model used is not "association-based"; it is instead based around the concept of the *activity* of a gene product (see the "lego" white paper for more details). (This does not preclude the use of a simpler, more "association-based" JSON-LD/RDF exchange format, e.g. for linked data.) By modeling the biology we can naturally extend the model to encompass more expressivity, and reasoning will yield useful biological inferences.

This representation is identical to the one used for "GO instance graphs" (aka LEGO); examples can be seen in the owl/ directory.

## Conventions used in this document

### Reasoning and OPTIONAL fields

Many of the fields marked OPTIONAL will be filled in by a reasoner. For example, the part_of field is marked optional:

    OPTIONAL part_of: *

If we have an asserted set of triples:

    {
      type: "GO:0001065",   ## mitochondrial single subunit type RNA polymerase activity
      enabled_by: { type: "PR:123" }
    }

Then a reasoner can fill in:

    {
      type: "GO:0001065",   ## mitochondrial single subunit type RNA polymerase activity
      part_of: { type: "GO:0007005" },   ## mitochondrion organization
      occurs_in: { type: "GO:0005739" },   ## mitochondrion
      enabled_by: { type: "PR:123" }
    }

based on axioms in the core GO.
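
The fill-in step described above can be illustrated with a minimal Python sketch. It does not perform any OWL reasoning. The `TOY_AXIOMS` table and the `fill_optional_fields` function are hypothetical names invented for this illustration, and the table simply hard-codes the GO:0001065 example rather than deriving anything from the core GO.

```python
# Toy stand-in for a reasoner. TOY_AXIOMS is HYPOTHETICAL: it hard-codes the
# GO:0001065 example from the text instead of reading axioms from the core GO.
TOY_AXIOMS = {
    "GO:0001065": {
        "part_of": {"type": "GO:0007005"},    # mitochondrion organization
        "occurs_in": {"type": "GO:0005739"},  # mitochondrion
    },
}


def fill_optional_fields(instance, axioms=TOY_AXIOMS):
    """Return a copy of `instance` with missing OPTIONAL fields filled in.

    Asserted fields are never overwritten; only absent ones are added.
    """
    filled = dict(instance)
    type_id = instance.get("type")
    # This sketch only handles a plain class ID string in the type field.
    if isinstance(type_id, str):
        for field, value in axioms.get(type_id, {}).items():
            filled.setdefault(field, value)
    return filled


asserted = {"type": "GO:0001065", "enabled_by": {"type": "PR:123"}}
inferred = fill_optional_fields(asserted)
```

Here `inferred` carries the asserted `enabled_by` field plus the `part_of` and `occurs_in` fields from the toy table, while `asserted` itself is left unchanged.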
## JSON-LD Contexts

These are the two contexts that should be used:

* http://geneontology.org/contexts/core.jsonld
* http://geneontology.org/contexts/lego.jsonld <-- IN PROGRESS

Standard conventions are used:

* id is shorthand for the IRI
* type is mapped to rdf:type
* label is mapped to rdfs:label
* ...

## Blank nodes, anonymous individuals and class expressions

A consequence of the existing representation is that numerous bNodes (anonymous individuals) are generated. These can be arbitrarily named. Alternatively, they can be replaced by class expressions.

E.g. the JSON-LD:

    { id: FOO:1, part_of: { type: "GO:1234567" } }

is the same as the triples:

    FOO:1 part_of: _:1
    _:1 type GO:1234567

which, if no other statements are provided about this bNode, is equivalent to the OWL:

    Individual: FOO:1
      Types: part_of some GO:1234567

This latter representation is how things are currently conventionally encoded in the OWL examples in the lego directory.

## Examples

TODO

## Translators

OWLTools can be used to translate a classic GAF (1 or 2) to RDF that is semantically equivalent to the JSON-LD specified here. Standard JSON-LD converters can be used to translate and then "compact" these using the contexts above.

# Specification

This specification is divided into (1) annotations and (2) "in-vivo biology". The annotation points to the biological assertion and contains metadata about that assertion.

The idea is that the core structures below can be communicated embedded in other structures, e.g. a list of annotations could be top level.

    AnnotationDocument ::= {
      objects : [ (Annotation|FunctionInstanceObject|ProcessInstanceObject)* ]
    }

## Annotations

An annotation is metadata about a biological assertion.

    Annotation ::= {
      OPTIONAL id: ,
      type: "Annotation",
      evidence : ,
      source : ,
      reference : ,
      date_of_last_update : ,
      describes :
    }

The string "Annotation" is mapped to an OWL class in the LD-context.

TBD: is this from SIO, IAO? SIO has "association" which is not the same concept.
We could have subclasses for "GOAnnotation" which may allow additional document checks.

Every GO annotation is about what a molecular entity does - this may be indirect - e.g. a CC annotation is ultimately about the location in which the gene product is activated.

### Evidence

    Evidence ::= {
      OPTIONAL id: ,   ## unique ID for this particular evidence usage; this is unlikely to be used
      type : ,   ## from ECO
      source :   ## alt ids?
    }

A typical evidence instance may look like this:

    {
      type: "Annotation",
      reference: ...
      evidence: {
        type: "ECO:12345",
      },
      describes : ..
    }

## Biology

### Function Instance

Every GO Annotation is either explicitly or implicitly about the execution of a molecular function. A basic GO annotation may be explicitly about a location or process, but biologically it is about the location of the execution or the process which the execution is a part of.

Because basic GO annotations are always about one aspect of the GO, many of the fields are optional. However, in future a single GO annotation may be simultaneously about all 3 aspects.


    FunctionInstanceObject ::= {
      OPTIONAL id: ,
      OPTIONAL type: { type: , OPTIONAL * }
      OPTIONAL part_of: *
      OPTIONAL occurs_in: *
      OPTIONAL :
      OPTIONAL described_by: *
      enabled_by:
    }

Notes on basic GO annotations:

* If the GO annotation is a BASIC annotation, with an aspect A of
  * F: type field is MANDATORY, part_of and occurs_in are EXCLUDED
  * P: part_of field is MANDATORY, type and occurs_in are EXCLUDED
  * C: occurs_in field is MANDATORY, type and part_of are EXCLUDED
* If the GO annotation is "LEGO", then any combination can be filled in

This can be described more precisely:

    FunctionInstanceObject ::= BasicFunctionInstance | FreeFormFunctionInstance
    BasicFunctionInstance ::= MFBasicFunctionInstance | BPBasicFunctionInstance | CCBasicFunctionInstance
    MFBasicFunctionInstance ::= { type: { type: , OPTIONAL * } }
    BPBasicFunctionInstance ::= { part_of: { type: , OPTIONAL * } }
    CCBasicFunctionInstance ::= { occurs_in: { type: , OPTIONAL * } }

Note that a basic GO annotation can be transformed through reasoning such that other fields are filled in, based on axioms in the core GO.

#### Example of basic MF annotation object

    {
      type: "Annotation",
      reference: "PMID:1234",
      evidence: {
        type: "ECO:12345",
      },
      describes : {
        type: "GO:nnnn" ,   ## MF
        enabled_by: { type: "UniProtKB:nnn"}
      }
    }

#### Example of basic BP annotation

    {
      type: "Annotation",
      ...
      describes : {
        part_of: { type: "GO:nnnn" },   ## BP
        enabled_by: { type: "UniProtKB:nnn"}
      }
    }

#### Example of basic CC

    {
      type: "Annotation",
      ...
      describes : {
        occurs_in: { type: "GO:nnnn" },   ## CC
        enabled_by: { type: "UniProtKB:nnn"}
      }
    }

#### Example of non-basic combined MF/BP/CC

Here we have an experiment that shows an activity in a processual context in a cellular location:

    {
      type: "Annotation",
      reference: "PMID:1234",
      evidence: {
        type: "ECO:12345",
      },
      describes : {
        type: "GO:nnnn" ,   ## MF
        part_of: { type: "GO:nnnn" },   ## BP
        occurs_in: { type: "GO:nnnn" },   ## CC
        enabled_by: { type: "UniProtKB:nnn"}
      }
    }

TODO - decide whether to invert the nesting between annotation and the function instance. E.g.

    {
      type: "GO:nnnn" ,   ## MF
      part_of: { type: "GO:nnnn" },   ## BP
      occurs_in: { type: "GO:nnnn" },   ## CC
      enabled_by: { type: "UniProtKB:nnn"},   ## i.e. gene product
      describedBy: {
        reference: "PMID:123456",
        evidence: { type: "ECO:0000001", with: "XXXX" }
      }
    }

If we wish to connect modular annotations then we should give them IDs. These could be blank nodes (i.e. IDs that are just "local pointers" within the document), but this could be confusing.

    {
      id: "GOINST:678",
      type: "GO:nnnn" ,   ## MF
      part_of: { type: "GO:nnnn" },   ## BP
      occurs_in: { type: "GO:nnnn" },   ## CC
      enabled_by: { type: "UniProtKB:nnn"},
      activates : "MAGO:543",
      describedBy: {
        reference: "PMID:123456",
        evidence: { type: "ECO:0000001", with: "XXXX" }
      }
    }

### Qualifiers

TODO - qualifiers can make exceptions to some of the rules above

### Biological Processes

    BiologicalProcessInstanceReference ::= | { type: }

An instance ID is provided if we wish to connect multiple sub-processes or function instances to the same individual.

### Locations

    LocationInstance ::= {
      OPTIONAL id: ,
      OPTIONAL type: { type: , OPTIONAL * }
    }

Here LocationClassId is any ClassId (see Generic section below) - this should be an Id for a non-obsolete class in GO-CC.

    LocationExtension ::= { : }

## Generic

    ClassId ::= + ":" +

The IDSpace MUST be provided either in the core JSON-LD context or in a local one.

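
The requirement that every IDSpace be declared can be checked with a small Python sketch like the one below. It assumes a ClassId is an ID space and a local part separated by the first colon. The `PREFIXES` map and the `expand_class_id` function are hypothetical names for illustration; in practice the prefix map would be read from the JSON-LD context, and the IRI prefixes shown here (the OBO PURL form) are an assumption about what that context declares.

```python
# HYPOTHETICAL prefix map standing in for the IDSpace declarations of a
# JSON-LD context. The OBO PURL prefixes below are an assumption, not taken
# from the GO context files.
PREFIXES = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "ECO": "http://purl.obolibrary.org/obo/ECO_",
}


def expand_class_id(class_id, prefixes=PREFIXES):
    """Expand a ClassId such as "GO:0005739" into a full IRI.

    Raises ValueError if the ID is not of the form IDSpace:local, and
    KeyError if the IDSpace is not declared in the prefix map.
    """
    id_space, sep, local_id = class_id.partition(":")
    if not sep or not id_space or not local_id:
        raise ValueError("not a ClassId: %r" % class_id)
    if id_space not in prefixes:
        raise KeyError("IDSpace %r is not declared in the context" % id_space)
    return prefixes[id_space] + local_id
```

For example, `expand_class_id("GO:0005739")` returns `http://purl.obolibrary.org/obo/GO_0005739` under this map, while an undeclared ID space such as `FOO` raises `KeyError`.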

    InstanceGraph ::= {
      OPTIONAL id:
      type:
      OPTIONAL
    }

# Formal notes

## Annotation Extensions

Note that GAF2 extension expressions are treated as semantically equivalent graph structures using bNodes (existential variables) rather than OWL class-level existential restrictions.
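
To make the bNode reading concrete, here is a minimal Python sketch that flattens a nested object into triples, minting a blank node label for every nested object that has no id. It is not a JSON-LD processor: it ignores the context entirely and keeps keys and values as plain strings. The `to_triples` function and the `_:bN` labelling scheme are assumptions made for this illustration.

```python
import itertools


def to_triples(obj):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Any dict without an "id" key becomes a blank node labelled _:b1, _:b2, ...
    This is a simplification that does not consult a JSON-LD context.
    """
    triples = []
    counter = itertools.count(1)

    def visit(node):
        subject = node.get("id") or "_:b%d" % next(counter)
        for key, value in node.items():
            if key == "id":
                continue
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    # Nested object: recurse, then link to its subject.
                    triples.append((subject, key, visit(item)))
                else:
                    triples.append((subject, key, item))
        return subject

    visit(obj)
    return triples


triples = to_triples({"id": "FOO:1", "part_of": {"type": "GO:1234567"}})
```

For the `FOO:1` object above this yields two triples: one linking `FOO:1` to the blank node `_:b1` via `part_of`, and one giving `_:b1` the type `GO:1234567`.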