This is the GO Database Schema documentation, in Autodoc format. Everything on this page has been automatically generated from the schema metadata. For more background on the schema, please read GO Database documentation.
note on modules: The tables are partitioned into modules for the purposes of Autodoc. These modules are invisible in the actual instantiations of the database, so the table "go_graph.term" is just simply "term"
This documentation was generated on 2010-02-08
Annotation model: An association is a link between a gene product record and an ontology term, with one or more pieces of evidence *** IMPORTANT: NOT all associations are positive: some posit negative links. THESE SHOULD TYPICALLY BE FILTERED OUT FOR MOST ANALYSIS PURPOSES. See the is_not column ***
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_graph.term.id | term_id | integer |
NOT NULL
the (GO) term to which the gene_product is associated |
go_associations.gene_product.id | gene_product_id | integer |
NOT NULL
the gene or gene_product to which the term is associated |
is_not | integer |
** IMPORTANT ** when this field is non-zero, the meaning of the annotation is that the gene_product does NOT have the role defined by the GO term (column 4 = NOT in the gene-association file) See also: association_qualifier table |
|
assocdate | integer |
a date in YYYYMMDD format. This is the date the association was last checked the source db providers (column 14 in the gene-association file) |
|
go_general.db.id | source_db_id | integer |
the source of the association; for instance, the association file may come from SwissProt, but the source of the association (Example: SGD) (Example: MGI) (column 15 = NOT in the gene-association file) (docs: http://www.geneontology.org/cgi-bin/xrefs.cgi) |
Tables referencing this one via Foreign Key Constraints:
a1 term_id a2 gene_product_id a3 term_id, gene_product_id a4 id, term_id, gene_product_id a5 id, gene_product_id a6 is_not, term_id, gene_product_id a7 assocdateIndex - Schema go_associations
(column 16 in the gene-association file) (see http://wiki.geneontology.org/index.php/Annotation_Cross_Products)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_associations.association.id | association_id | integer | NOT NULL |
go_graph.term.id | relationship_type_id | integer |
NOT NULL
e.g. the term.id that has acc='part_of' |
go_graph.term.id | term_id | integer |
NOT NULL
e.g. the term.id that has acc='CL:0000017' |
Index - Schema go_associations
associations can have a number of qualifiers. These include, but are not limited to the NOT qualifier (which technically is not a qualifier at all as it fundamentally changes the semantics of an association) note that this table IS redundant with association.is_not this is intentional - negation is important enough to go directly in the association table. It also goes in this table for consistency with the gene association file (column 4 in the gene-association file) (docs: http://www.geneontology.org/GO.format.annotation.shtml)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_associations.association.id | association_id | integer | NOT NULL |
go_graph.term.id | term_id | integer |
NOT NULL
qualifiers come from their own terminology |
value | character varying(255) |
qualifiers can potentialy be tag=value pairs. however, all qualifiers are currently boolean tags, so this column is always null |
Index - Schema go_associations
(see http://www.geneontology.org/GO.annotation.shtml#manySpp) (column 13 in gene_association file, ONLY WHEN card>1, this is the next entry) to be used only in conjunction with terms that have the term 'interaction between organisms' as an ancestor. gene_product.species_id is for the organism type encoding the gene or gene product, association_species_qualifier.species_id should be that of the other organism in the interaction. aka "dual taxon"
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_associations.association.id | association_id | integer | NOT NULL |
go_associations.species.id | species_id | integer |
The species of the interacting organism (eg host) Example: in cytolysis of cells of another organism (GO:0051715) this would be the species playing the 'other organism' role |
Index - Schema go_associations
each association can have one or more pieces of evidence attached to it (the schema actually allows zero or more, but with GO all annotation have at least one piece of evidence) (doc: http://www.geneontology.org/GO.evidence.shtml)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
code | character varying(8) |
UNIQUE#1
NOT NULL
a string code (typically 3-letter) corresponding to a GO evidence code. (column 7 in the gene-association file) (Example: IEA - inferred from electronic annotation) (Example: IMP - inferred from mutant phenotype) evidence codes may eventually become "ontologized", allowing us to take full advantage of the OBO evidence ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?evidence_code |
|
go_associations.association.id | association_id | integer | UNIQUE#1 NOT NULL |
go_general.dbxref.id | dbxref_id | integer |
UNIQUE#1
NOT NULL
A reference for the annotation. Typically a pubmed ID (column 6 in the gene-association file) |
seq_acc | character varying(255) |
a denormalised field containing a "|" separated list of accession supporting the call. for the normalised data, use evidence_dbxref (column 8 in the gene-association file, copied identically) |
Tables referencing this one via Foreign Key Constraints:
ev1 association_id ev2 code ev3 dbxref_id ev4 association_id, codeIndex - Schema go_associations
each piece of evidence can have multiple dbxrefs associated with it; this is the *normalised* version of the "With" or "From" field of the evidence (column 8 in the gene-association file, normalized)
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.evidence.id | evidence_id | integer | NOT NULL |
go_general.dbxref.id | dbxref_id | integer |
NOT NULL
globally unique identifier for the evidence (Example: GO:0000346 - will be stored as DB=GO, Acc=0000346) |
Index - Schema go_associations
Represents a gene or gene_product, typically at the species level. GO allows for annotation of genes OR gene products. Annotation of a gene is understood to be "proxy" for annotation of the corresponding gene products. (docs: http://www.geneontology.org/GO.annotation.fields.shtml) (docs: http://www.geneontology.org/GO.annotation.shtml#file) (Example: human p53-gene) (Example: human p53-protein)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
symbol | character varying(128) |
NOT NULL
concise label for this gene product (Example: p53) (Example: BRCA) (Example: PHO3) typically unique within an originating database authority, but not guaranteed; (an example of such as authority is FlyBase or UniProt) |
|
go_general.dbxref.id | dbxref_id | integer |
UNIQUE
NOT NULL
A globally unique identifier for this gene or gene product. All (non-GO) unique identifiers are stored as dbxrefs - they must consist of both a DB and a DB_Object_ID (Example: SGD:S000000296) (column 1 and 2 in gene_association file) |
go_associations.species.id | species_id | integer |
The species or taxon to which this gene product belongs (Note: in future we reserve the option to use gene_product to represent families at higher levels in the taxonomic tree above species) (column 13 in gene_association file; if card>1, this is the first entry) |
go_graph.term.id | type_id | integer |
gene_product type (eg gene, transcript, protein, complex) (column 13 in the gene-association file) the type term may correspond to a SO ID, but typically SO is not loaded an an ontology into the GO database |
full_name | text |
symbol is typically a concise label, full_name may be more textual (column 10 in the gene-association file) |
Tables referencing this one via Foreign Key Constraints:
g1 symbol g2 dbxref_id g3 species_id g4 id, species_id g5 dbxref_id, species_id g6 id, dbxref_id g7 id, species_id g8 id, dbxref_id, species_idIndex - Schema go_associations
(aka goslims). Each subsetdef (slim) is stored as a gene_product in the database (with term_type = 'subset') The subset_id links to this term (OBO-Format: *subset* tag. term_id references a term housing the *subsetdef*)
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.gene_product.id | gene_product_id | integer | NOT NULL |
go_graph.term.id | subset_id | integer | NOT NULL |
Index - Schema go_associations
alternate label for the gene or gene product (column 11 in the gene-association file)
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.gene_product.id | gene_product_id | integer | UNIQUE#1 NOT NULL |
product_synonym | character varying(255) |
UNIQUE#1
NOT NULL
alternate label. Typically NOT redundant with gene_product.symbol or gene_product.full_name, but this is not guaranteed (column 11 in the gene-association file) |
Index - Schema go_associations
Linnaean taxonomic information for an organism type. Modeled after NCBI Taxonomy (Note: the name of the table is misleading, as it can model ANY node in Linnaen taxonomy) (The table should be better called "taxon")
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
ncbi_taxa_id | integer |
UNIQUE
identifier within the NCBI Taxonomy database. (Example: Dmel=7227) (Example: S Cerevisae=4932) |
|
common_name | character varying(255) |
Non-scientific name (Example: fruitfly) |
|
lineage_string | text |
denormalized list of taxon names as text. (Note: not currently populated) |
|
genus | character varying(55) |
If the row in the table is genuinely a species, this column is for storing the "genus" in the Linnaean system. If the row is a higher taxon, then this column is for the scientific name of that taxon. (Example: Drosophila -- for leaf node taxon) (Example: Homo -- for leaf node taxon) (Example: Metazoa -- for non-leaf node taxon) unfortunately the name of this column is misleading. However, it will be retained for backwards compatibility |
|
species | character varying(255) |
If the row in the table is genuinely a species, this column is for storing the "species" name in the Linnaean system. If the row is a higher taxon, this column is null (Example: sapiens) (Example: pombe) unfortunately the name of this column is misleading. However, it will be retained for backwards compatibility note that (genus,species) is not declared unique |
|
parent_id | integer |
parent taxon in hierarchy (direct parent - for indirect parents see left_value and right_value) |
|
left_value | integer |
left_value, right_value implement a nested set model see http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html or Joe Celko's "SQL for smarties" for more information. |
|
right_value | integer |
see left_value |
|
taxonomic_rank | character varying(255) |
eg species, family, phylum, ... |
Tables referencing this one via Foreign Key Constraints:
sp1 ncbi_taxa_id sp10 right_value sp11 left_value, right_value sp12 id, left_value sp13 genus, left_value, right_value sp2 common_name sp3 genus sp4 species sp5 genus, species sp6 id, ncbi_taxa_id sp7 id, ncbi_taxa_id, genus, species sp8 parent_id sp9 left_valueIndex - Schema go_associations
metadata on this particular instance/build of the GO database
F-Key | Name | Type | Description |
---|---|---|---|
release_name | character varying(255) |
UNIQUE
Typically named by date/version in YYYY-MM-DD format |
|
release_type | character varying(255) |
One of: termdb (ontology only) assocdb (termdb + associations) seqdb (assocdb + sequences) seqdblite (seqdb - IEAs) |
|
release_notes | text |
notes specific to this release. Will typically be null unless this release is unusual in some way |
|
ontology_data_version | character varying(255) |
data-version tag from the header of the obo file (not yet implemented) |
time of last modification of data source (usually type:file) source_path is a file path or name
F-Key | Name | Type | Description |
---|---|---|---|
source_id | character varying(255) | ||
source_fullpath | character varying(255) | ||
source_path | character varying(255) | ||
source_type | character varying(255) | ||
source_md5 | character(32) | ||
source_parsetime | integer | ||
source_mtime | integer |
not in use
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer | UNIQUE NOT NULL |
term_loadtime | integer |
metadata on the different database / accession granting authorities. the data should come from the GO.xref_abbs file. (docs: http://www.geneontology.org/cgi-bin/xrefs.cgi) the dbname is the abbreviation, and should match dbxref.xref_dbname however we have no foreign key so not every dbxref.xref_dbname will have an entry here most columns will not be populated in the short term - the other fields are for future expansion
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
name | character varying(55) | UNIQUE | |
fullname | character varying(255) | ||
datatype | character varying(255) | ||
generic_url | character varying(255) | ||
url_syntax | character varying(255) | ||
url_example | character varying(255) | ||
uri_prefix | character varying(255) |
Tables referencing this one via Foreign Key Constraints:
db1 name db2 fullname db3 datatypea unique bipartite identifier for a record typically housed in an external database. dbxrefs are of the form DB_ID:Local_ID. the combination of DB and local identifier is unique: it is up to the DB_ID authority to ensure Local_ID is unique within their ID space This table is referenced from a variety of other table: (1) as an xref for a sequence (*seq_dbxref* table) (2) as a primary identifier for a gene product (gene_product != seq except in the case of GOA) (*gene_product.dbxref_id* column) (3) as an xref for a term or term definition (eg swissprot keywords) (*term_dbxref* table) (4) as evidence for an association based on sequence evidence (*evidence* or *evidence_dbxref* table) (docs: http://www.geneontology.org/cgi-bin/xrefs.cgi)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
xref_dbname | character varying(55) |
UNIQUE#1
NOT NULL
The name of the database or ID-granting authority from which information concerning this dbxref can be retrieved. It is recommended that this comes from the set here: http://www.geneontology.org/cgi-bin/xrefs.cgi; however, this is not enforced at the schema level. There *may* be a corresponding entry in the *db* table, housing metadata on this dbname but this is not enforced as a foreign key reference (Example: FB) (Example: SGD) (Example: UniProt) (Example: PubMed) |
|
xref_key | character varying(255) |
UNIQUE#1
NOT NULL
The local identifier that is unique within xref_dbname (Example: FBgn0000001) |
|
xref_keytype | character varying(32) |
DEPRECATED. Was in principle used for what "type" the xref was - eg symbol vs ID |
|
xref_desc | character varying(255) |
DEPRECATED optional description of dbxref |
Tables referencing this one via Foreign Key Constraints:
dx1 xref_dbname dx2 xref_key dx3 id, xref_dbname dx4 id, xref_key, xref_dbname dx5 id, xref_key(See http://wiki.geneontology.org/index.php/Relation_composition) Stores rules of the form: r1 . r2 -> r i.e. IF [ X r1 Y ] AND [ Y r2 Z ] THEN [ X r Z ] Corresponds to "transitive_over" and "holds_over_chain" tags in obo-format.
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_graph.term.id | relation1_id | integer |
UNIQUE#1
NOT NULL
The first relation in the pairwise composition. References an entry in the term table. (recall that the term table housed both the terms themselves, and the relations) |
go_graph.term.id | relation2_id | integer |
UNIQUE#1
NOT NULL
The second relation in the pairwise composition. References an entry in the term table. (recall that the term table housed both the terms themselves, and the relations) |
go_graph.term.id | inferred_relation_id | integer |
UNIQUE#1
NOT NULL
The inferred relation in the pairwise composition. References an entry in the term table. (recall that the term table housed both the terms themselves, and the relations) |
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | relationship_type_id | integer |
UNIQUE
NOT NULL
The first relation in the pairwise composition. References an entry in the term table. (recall that the term table housed both the terms themselves, and the relations) |
is_transitive | integer |
equals 1 if this relation is transitive. IF [X R Y] AND [Y R Z] AND [R is_transitive] THEN [X R Z] (OBO-Format: *is_transitive* tag) valid values: 0 or 1 |
|
is_symmetric | integer |
equals 1 if this relation is symmetric. IF [X R Y] AND [R is_symmetric] THEN [Y R X] (OBO-Format: *is_symmetric* tag) valid values: 0 or 1 |
|
is_anti_symmetric | integer |
equals 1 if this relation is anti_symmetric. IF [X R Y] AND [Y R X] AND [R is_anti_symmetric] THEN [X=Y] (OBO-Format: *is_anti_symmetric* tag) valid values: 0 or 1 |
|
is_cyclic | integer |
equals 1 if this relation is cyclic. (OBO-Format: *is_cyclic* tag) valid values: 0 or 1 |
|
is_reflexive | integer |
equals 1 if this relation is reflexive. IF [R is_reflexive] THEN [X R X] (OBO-Format: *is_reflexive* tag) valid values: 0 or 1 |
|
is_metadata_tag | integer |
equals 1 if this relation is metadata_tag. IF [X R Y] AND [R is_metadata_tag] THEN [Y R X] (OBO-Format: *is_metadata_tag* tag) valid values: 0 or 1 |
Fundamental representational unit in a controlled vocabulary or ontology. *Terms* form the nodes in the ontology graph (structured using the *term2term* table). An example of a term in GO is GO:0009333 "cysteine synthase complex". Each entry in the term table corresponds to a distinct type (kind, class) of entity in reality. Note that the term table is also used for storing *relations*. The two fundamental relations in GO are "is_a" and "part_of". Relations comprise the "labels" of the edges of the ontology graph. Note the term "term" is misleading in that this table is *not* used for storing synonyms and alternate labels; only the *preferred* terms, corresponding to nodes in the ontology graph each representing a distinct type of entity. In OBO-Format, both Term and Typedef (relation) stanzas are housed in the term table. (doc: http://www.geneontology.org/GO.format.obo-1_2.shtml)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
name | character varying(255) |
NOT NULL
DEFAULT ''::character varying
A textual label for the term. Each term has a single such label (see *term_synonym* for alternate labels). The name should be unique within an ontology (in fact uniqueness is encourage across ontologies - the principle of univocity. However, this is not enforced at the database schema level). the uniqueness recommendation is relaxed in the case of obsolete terms, which are also housed in this table: there can be many "ex-terms" with the same name. In some alternate systems, *term.name* is also known as the "preferred term" (OBO-Format: *name* tag) (Example: "cysteine biosnthetic process") |
|
term_type | character varying(55) |
NOT NULL
The ontology or namespace to which this term belongs (OBO-Format: *namespace* tag) (Example: biological_process) (Note: the column name is somewhat misleading, but is retained for historical reasons. It would be better named "namespace" or "ontology") The namespace for GO terms will always be molecular_function, biological_process or cellular_component. The relations defined in the main GO obo file (from which this table is populated) go into the gene_ontology namespace, with the exception of is_a, which has namespace "relationship" (taken from the obo relation ontology). is_a is a builtin relation as far as obo is concerned, so it does not go in the gene_ontology namespace |
|
acc | character varying(255) |
UNIQUE
NOT NULL
The unique identifier for this term. This should be in OBO bipartite ID format, and should be unique within OBO, but this is not enforced at the schema level. (Example: GO:0019344) (OBO-Format: *id* tag) |
|
is_obsolete | integer |
NOT NULL
equals 1 if this row corresponds to an obsoleted "ex-term". Note that obsoletes are not terms in the true sense, but we house them in the same table as this is the most expedient for the kinds of queries people wish to perform. (OBO-Format: *is_obsolete* tag) valid values: 0 or 1 |
|
is_root | integer |
NOT NULL
equals 1 if this term is the root term in the ontology graph. Note that in some instantiations of the GO database "fake" root nodes are added (OBO-Format: No correspoding tag) |
|
is_relation | integer |
NOT NULL
equals 1 if this term is a relation (relationship type) (OBO-Format: Typedef stanzas) |
Tables referencing this one via Foreign Key Constraints:
t1 name t2 term_type t3 acc t4 id, acc t5 id, name t6 id, term_type t7 id, acc, name, term_typeEach entry in this table corresponds to an arc/edge in the ontology graph, which represents a relationship that holds between two entities in reality. Graphs are often thought of in terms of parent-child links; with this conception term1_id is the parent and term2_id is the child. However, it may be better to think of each edge as a *statement*, each statement being about a subject and it's relationship to some other entity. For example, a part_of edge between terms "nucleus" and "cell" is a statement about cell nuclei in general, namely that all nuclei are part_of some cell. (the statement is *not* a general statement about cells: not all cells have a nucleus) Here the "subject" of the statement corresponds to term2_id. EXAMPLE: if term1_id points to nucleic acid binding AND term2_id points to DNA binding AND relationship_type points to "is_a" THEN we have a statement "DNA binding is_a nucleic acid binding" (OBO-Format: *is_a* tag or *relationship* tag)
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_graph.term.id | relationship_type_id | integer |
UNIQUE#1
NOT NULL
References an entry in the term table corresponding to the relation that holds between term2 and term1 (recall that the term table housed both the terms themselves, and the relations) (Example: a reference to a row "part_of" in the term table) |
go_graph.term.id | term1_id | integer |
UNIQUE#1
NOT NULL
the "parent" node of the edge. For example, in the edge corresponding to "nucleus part_of cell", (all nuclei are part_of some cell) term1_id is "cell" |
go_graph.term.id | term2_id | integer |
UNIQUE#1
NOT NULL
the "child" node of the edge. For example, in the edge corresponding to "nucleus part_of cell", (all nuclei are part_of some cell) term2_id is "nucleus" |
complete | integer |
UNIQUE#1
NOT NULL
equals 1 if this edges comprises an element of the *complete definition*, a set of necessary and sufficient conditions. Note that this field is always =0 for current publically deployed instantiations of the GO database, but is currently used in experimental instantiations for housing so called "cross-products". Example: the term "cysteine metabolism" is completely defined by the edges "is_a metabolism" and "has_participant cysteine" - which is to say anything that satisfies these two conditions is by definition an instance of cysteine metabolism. "complete" links/edges may be provided in addition to normal links/edges, even though these may be partially redundant. Formally: if complete=0, the edge states a NECESSARY CONDITION for term2_id. The set of all edges with complete=1 for any term2_id states the NECESSARY AND SUFFICIENT CONDITIONS for that term. Although currently unused in publically deployed GO DB instances, software using the schema may wish to explicitly query for complete=0 to ensure "normal" graph links are returned as results. |
connects two nodes in a phylogenetic tree
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.gene_product.id | gene_product_id | integer | UNIQUE#1 NOT NULL |
go_associations.gene_product.id | ancestor_id | integer | UNIQUE#1 NOT NULL |
go_homology.phylotree.id | phylotree_id | integer | UNIQUE#1 NOT NULL |
branch_length | double precision | ||
is_transitive | integer | NOT NULL |
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.gene_product.id | gene_product1_id | integer | NOT NULL |
go_associations.gene_product.id | gene_product2_id | integer | NOT NULL |
go_graph.term.id | relationship_type_id | integer |
NOT NULL
References an entry in the term table corresponding to the relation that holds between 1 and 2 |
a set-member relation between a gene product and the homolset to which it belongs. the relation should, where possible, be supported by individual homology-based relations [TODO] REQUIRED FOR REFERENCE GENOMES PROJECT
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_associations.gene_product.id | gene_product_id | integer | NOT NULL |
go_homology.homolset.id | homolset_id | integer | NOT NULL |
a set-member relation between a gene product and the phylogenetic tree to which it belongs. This is a new table - this will be populated from PANTHER datasets
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_associations.gene_product.id | gene_product_id | integer | NOT NULL |
go_homology.phylotree.id | phylotree_id | integer | NOT NULL |
A collection of genes or gene products from a common evolutionary lineage. The purpose of this table is to group homologous sets of gene products to query for shared and divergent biological function. The table is neutral with respect to the method used to determine evolutionary relations; instead it represents the derived results of some kind of some analysis. A homolset may also be derived from a collection of analyses. The table is also neutral w.r.t questions of homology or orthology REQUIRED FOR REFERENCE GENOMES PROJECT
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
symbol | character varying(128) |
A convenient human-assigned label for the homology set; this may be arbitarily chosen from one of the set members (frequently the target_gene_product); or it may be the name of the gene family. no guarantee is given to its usefulness |
|
go_general.dbxref.id | dbxref_id | integer |
UNIQUE
A globally unique identifier for this set, or some proxy for the set. For example, if the purpose of building homolsets is to examine disease genes, then an OMIM ID may be appropriate to use here, even though OMIM is not concerned with homology or orthology. If the homolset is derived |
go_associations.gene_product.id | target_gene_product_id | integer |
A homolset may be constructed from a collection of pairwise homology assignments between individual gene products and a (possibly arbitrary) "target" gene_product. this field is optional: for example, a target is not required for sets that are derived from a tree-based analysis |
go_associations.species.id | taxon_id | integer |
The least common ancestor of all members of the homolset. (may not be populated) |
go_graph.term.id | type_id | integer |
homolsets may fall into different categories - this field identifies the category. May not be populated |
description | text |
for example: if the purpose of the homolset is to examine disease genes and model organism orthologs, the description could be a summary of the disease in human |
Tables referencing this one via Foreign Key Constraints:
A phylogenetic tree. This is a new table - this will be populated from PANTHER datasets. Additional rows or property tables may be added at a future date; e.g. to add bootstrap values and so on.
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
name | character varying(255) | NOT NULL DEFAULT ''::character varying | |
go_general.dbxref.id | dbxref_id | integer |
NOT NULL
A globally unique identifier for this tree. e.g. PantherDB:PTHR11361 make sure dbname is registered in GO.xrf_abbs |
Tables referencing this one via Foreign Key Constraints:
a metadata link between two terms this is primarily to support the "consider" and "replaced_by" tags in OBO Format 1.2. It could also be used for other metadata links we may want to include in the future. The main difference between term2term and this table is that term2term is for encoding the relationships that hold between types of biological entity, whereas this table is for relationships between the units in the ontology. Different rules apply to both. Eg consider/replaced_by would never propagate over the is_a relation open question: do we also want to include disjointness axioms here?
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_graph.term.id | relationship_type_id | integer |
NOT NULL
"consider" or "replaced_by" may in future be extended to other tags |
go_graph.term.id | term1_id | integer |
UNIQUE#1
NOT NULL
the "parent" node of the edge. For example, in the edge corresponding to for "GO:0005696 telomere" consider "GO:0000781 chromosome, telomeric region" term1_id is "GO:0005696" |
go_graph.term.id | term2_id | integer |
UNIQUE#1
NOT NULL
the "child" node of the edge. For example, in the edge corresponding to for "GO:0005696 telomere" consider "GO:0000781 chromosome, telomeric region" term2_id is "GO:0000781" |
linking table between term and dbxref used where there is some other information entity of relevance to the term in question; it may be a dbxref for a publication defining the term, or it may be a reference to an entity in another database, terminology or ontology-like system with similar or identical semantics to the term.
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer | UNIQUE#1 NOT NULL |
go_general.dbxref.id | dbxref_id | integer | UNIQUE#1 NOT NULL |
is_for_definition | integer |
UNIQUE#1
NOT NULL
equals 1 if this dbxref references the source of the definition of the term in question. This includes but is not limited to PubMed IDs. It may also be so-called "GO Curator" references, of the form GOC:<curator-id> |
In OBO, a term should where possible be defined. A term can have only one definition. Note: due to this cardinality constraint, it would be possible to merge term_definition into the *term* table. The decision was made not to do this early on, and the schema has not been changed since. (doc: http://www.geneontology.org/GO.format.obo-1_2.shtml) (OBO-Format: *def* tag; and also the *comment* tag - see column documentation)
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer | UNIQUE NOT NULL |
term_definition | text |
NOT NULL
A textual definition for the term referenced in term_definition.term_id. The attentive user will have two criticisms; the first is that it is poor practice for a column to share a name with a table. The second is for more egregious: although this column is declared non-null, in fact some entries are in fact empty strings. This is due to the fact that term comments are housed in this table, rather than in the term table or their own term_comment table! We accept that this was a poor design decision. However, we have no plans to rectify this modeling error in the short term as this would break many pieces of 3rd party software not under our control. No fixes are likely until after MySQL has good efficient support for SQL Views. |
|
go_general.dbxref.id | dbxref_id | integer |
SEVERELY DEPRECATED: see term_dbxref.is_for_definition |
term_comment | text |
A free-text comment with non-definitional information that may be useful for end-users or curators. (OBO-Format: *comment* tag - each term has max 1 comment) Please see notes for term_definition column |
|
reference | character varying(255) |
SEVERELY DEPRECATED |
(aka goslims). Each subsetdef (slim) is stored as a term in the database (with term_type = 'subset') The subset_id links to this term (OBO-Format: *subset* tag. term_id references a term housing the *subsetdef*)
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer | NOT NULL |
go_graph.term.id | subset_id | integer | NOT NULL |
In OBO, each term can have 0 or more synonyms or alternate identifiers. A synonym is an alternate label for a preferred term, intended for humans. An alternate identifer is an OBO ID intended for unique identification of the term in information systems. Note: OBO Format has the concept of broad and narrow synonyms - these might better be called broad and narrow aliases or alternate labels, since technically the definition of synonym is such that synonyms can replace words without changing the meaning.
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer | UNIQUE#1 NOT NULL |
term_synonym | character varying(996) |
UNIQUE#1
A textual label typically intended for humans. (Example: "cysteine biosynthesis" is a synonym for the term named "cysteine biosynthetic process") One wrinkle is that alternate identifiers are redundantly stored in this column, as well as in the acc_synonym column. This design decision can be rightfully criticised, but we retain this use for software compatibility reasons. alt_ids can ve discriminated from synonyms using synonym_type_id - this will be "alt_id" for alternate identifiers. Note: although we accept it is bad practice to name columns the same as tables, we retain this for software compatibility reasons (OBO-Format: *synonym* tag) |
|
acc_synonym | character varying(255) |
An alternate identifier. (OBO-Format: *alt_id* tag) See also docs for term_synonym column |
|
go_graph.term.id | synonym_type_id | integer |
NOT NULL
actually corresponds to a synonym "scope" in OBO-Format - one of exact, broad, narrow, related for alternate identifiers we use the term "alt_id" - see notes above |
go_graph.term.id | synonym_category_id | integer |
category/class to which this synonym belongs Correspinds to type OBO-Format 1.2 Not currently used in GO Note: the synonym "scope" (eg exact, narrow) goes in synonym_type_id |
this table for use in "data warehouse mode" (will typically be populated in publically available instantiations of the GO database). caches recursive gene product counts the number of DISTINCT gene product records at OR BELOW a term filtered by evidence code refers to the evidence code used to filter this particular count; if preceded by a ! it means exclude this evidence code; typically this will be "!IEA" (ie exclude IEA) speciesdbname corresponds to the dbname from gene_product.dbxref_id and represents the authority that contributed the annotated gene product being counted the product count is partitioned by the speciesdbname product_count is the number of DISTINCT gene product IDs owned by speciesdbname at or below term_id in the DAG note that the product_count is additive across speciesdbnames (because no two speciesdbnames may contribute the same ID), but is NOT additive across evidence codes (the same gene product can be associated more than once beneath a term with different evidence codes) this makes filtering by evidence code hard - we must include all combinations which is a factorial!! to get round this we typically only include counts for non-IEA associations Equivalent to the query: SELECT path.term1_id AS term_id, count(DISTINCT a.gene_product_id) AS total FROM association AS a INNER JOIN evidence AS e ON (e.association_id=a.id) INNER JOIN graph_path AS path ON (path.term2_id=ae.term_id) WHERE <<evidence constraint here>>
F-Key | Name | Type | Description |
---|---|---|---|
go_graph.term.id | term_id | integer |
NOT NULL
the term for which gene products are counted; also includes terms below this term in the graph |
code | character varying(8) |
evidence code over which this count hold. can include negation; eg "!IEA" note that not every combination will be pre-computed. typically just !IEA is stored. Also note that counts over evidence codes are NON-additive; this is because the same gene_product can be double-counted if it is anntated with >1 evidence code |
|
speciesdbname | character varying(55) |
this should match the gene_product.dbxref (Examples: FlyBase, SGD, MGI, UniProt) (docs: http://www.geneontology.org/cgi-bin/xrefs.cgi) counts *ARE* additive across speciesdbnames - this is because each gene_product record belongs to a single speciesdbname, so double counting is not possible (although the same gene product in reality may be double annotated, for example by UniProt and by a Model Organism Database - these should be filtered out, but even in cases where this filtering fails we do not worry about double-counting when summing across speciesdbnames as we are technically counting distinct gene product *records*) (Note: there is no foreign key reference to the *db* table, but their could in principle be a nullable link here) |
|
go_associations.species.id | species_id | integer |
The species or taxon the count pertains to (Note: in future we reserve the option to use gene_product to represent families at higher levels in the taxonomic tree above species) |
product_count | integer |
NOT NULL
total number of DISTINCT genes/gene product records annotated directly to or via transitivity to term_id |
Index - Schema go_optimisations
A transitive or implied link between two terms Example: if nuclear chromosome is_a chromosome, AND chromosome is_a organelle, then nuclear chromosome is_a organelle this table states whether there exists a path between a parent and a child, and the distance between them. multiple paths mean multiple entries in this table [this table only used in "data warehouse mode"] an entry also exists linking every term with itself of distance 0. This is known as "reflexive transitive closure". [See http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=transitive+closure] [See also http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=reflexive+transitive+closure] note: use of this table is optional. but as most relational dbs don't implement recursive queries, you will have to incrementally calculate the transitive closure via multiple SQL calls if you do not use it for graph based queries At this time, this table holds the general transitive closure across *all* relations. In the future this table may house transitive closure on a per-relation basis, taking into account the various rules that follow from the definitions of the relation in question. For example, if X is_a Y and Y part_of Z then X part_of Z.
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_graph.term.id | term1_id | integer |
NOT NULL
the object node See docs for *term.term1_id* |
go_graph.term.id | term2_id | integer |
NOT NULL
the subject node See docs for *term.term2_id* |
go_graph.term.id | relationship_type_id | integer |
References an entry in the term table corresponding to the INFERRED relation that holds between term2 and term1. For future extension. See: http://wiki.geneontology.org/index.php/Transitive_closure |
distance | integer |
The distance in terms of the number of "hops" between nodes in the asserted graph (term2term). The relationship_type_id is ignored here. Example: if A part_of B is_a C part_of D, then distance=3 for A part_of D |
|
relation_distance | integer |
(added 2008-10-27) The distance in terms of the number of "hops" over relationship_type_id in the asserted graph (term2term). Example: if A part_of B is_a C part_of D, then relation_distance=2 for A part_of D |
Index - Schema go_optimisations
relationship between gene_product and seq is potentially many to many, although in practice each gene product may only have one seq, depending on data population method. If this link is for the representatibe sequence for a gene product, is_primary_seq should be set to true (=1) (a gene_product should only have one seq that is marked is_primary_seq, although this is not- enforced. the idea is that there may be different seqs for a product - allelic variations, variant spliceforms, but only one is the "representative" seq) This table is typically sourced from the gp2protein contributed data file: http://www.geneontology.org/gp2protein
F-Key | Name | Type | Description |
---|---|---|---|
go_associations.gene_product.id | gene_product_id | integer | NOT NULL |
go_sequence.seq.id | seq_id | integer | NOT NULL |
is_primary_seq | integer |
If this link is for the representatibe sequence for a gene product, is_primary_seq should be set to true (=1) |
A representative DNA or amino acid sequence for a gene_product; will typically be amino acid. This table is modeled after the BioPerl Bio::PrimarySeq model
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
display_id | character varying(64) |
UNIQUE#1
the primary label used for identifying the sequence for humans. Not guaranteed to be globally unique. typically corresponds to the first part of a FASTA header |
|
description | character varying(255) |
textual information for humans concerning this sequence. typically corresponds to the part after the ID in the FASTA header |
|
seq | text |
residue sequence: standard IUPAC alphabetic codes are used |
|
seq_len | integer |
number of residues in sequence. should always correspond to length(seq), where seq is populated |
|
md5checksum | character varying(32) |
UNIQUE#1
result of md5(seq), where md5 is the standard MD5 checksum algorithm. see GO::Model::Seq for calculation almost 100% guaranteed to be unique for any sequence of symbols representing the biopolymer |
|
moltype | character varying(25) |
DNA or AA |
|
timestamp | integer |
Tables referencing this one via Foreign Key Constraints:
seq1 display_id seq2 md5checksumlinking table for external identifiers for a sequence seq_dbxref is derived from the dbxrefs in the source sequence file. For example, if the source is a UniProt file, this table will reflect the DR lines if the source is a FASTA file from UniProt, the DB:ACC parts of the fasta header will be used to populate this table typically a seq entry will be something like a protein/ polypeptide - ie something that can be annotated with GO; the dbxref could be for entities in other databases that may only be tangentially related to the protein for example: mRNA records, genomic records, OMIM IDs, etc
F-Key | Name | Type | Description |
---|---|---|---|
go_sequence.seq.id | seq_id | integer | UNIQUE#1 NOT NULL |
go_general.dbxref.id | dbxref_id | integer | UNIQUE#1 NOT NULL |
seq can have various properties attached to it. not currently used
F-Key | Name | Type | Description |
---|---|---|---|
id | serial | PRIMARY KEY | |
go_sequence.seq.id | seq_id | integer | UNIQUE#1 NOT NULL |
property_key | character varying(64) | UNIQUE#1 NOT NULL | |
property_val | character varying(255) | UNIQUE#1 NOT NULL |
standard public schema
Generated by PostgreSQL Autodoc