!
!Gene Ontology
!Specification of abbreviations for cross-referenced databases.
!
! NOTE: this file documents the Legacy GO.xrf_abbs format.
! We recommend use of the newer YAML format, which is documented here:
! https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.md
INTRODUCTION

The file GO.xrf_abbs contains metadata about the organizations that
contribute to the GO.  There is a one-to-one relationship between
abbreviations and URLs where data can be retrieved.  From here on, a
single URL that can be queried using database IDs is referred to as a
datasource.  Each organization may have multiple datasources.
Each abbreviation identifies one section of the file.  That section
provides the abbreviation and full name of the datasource, the object
type that is retrieved, an example database ID, the generic URL that
identifies the datasource uniquely and globally, and the syntax of an
actual query request with parameters filled in.  The URL syntax may be
repeated in the case of mirrors.


PAGE LAYOUT

The page begins with comments denoted by an exclamation point (!) at 
the beginning of the line.  The comments contain the revision and date 
of the file.  The comments are followed by a blank line and then the 
first data source.  

Datasources are separated by a blank line (two consecutive newlines).

COMMENTS

Comments may be added anywhere between sections.  A comment that
describes a section should be placed above that section.  A block of
comments should be separated by a blank line on either side, just as a
data section is separated by blank lines.

Correct use of comments:

data_section

! I am commenting on the following datasource

data_section

Incorrect use of comments:

data_section

! I am commenting on the following datasource
data_section


SECTION SYNTAX

Each section is composed of a series of lines with the syntax:

<label><colon><whitespace><value>

e.g.:

abbreviation: CGEN

The allowed labels are:

abbreviation
shorthand_name
database
object
synonym
example_id
local_id_syntax
generic_url
url_syntax
url_example
is_obsolete
consider
replaced_by
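
The layout and line syntax described above can be read with a short
parser.  The following is a minimal sketch in Python, not an official
GO tool.  The function name parse_xrf_abbs is hypothetical, and the
sketch assumes that every value fits on the same line as its label.
Values are collected into lists because url_syntax and url_example may
be repeated.

```python
from collections import defaultdict

# Labels listed in this specification.
ALLOWED_LABELS = {
    "abbreviation", "shorthand_name", "database", "object", "synonym",
    "example_id", "local_id_syntax", "generic_url", "url_syntax",
    "url_example", "is_obsolete", "consider", "replaced_by",
}


def parse_xrf_abbs(text):
    """Return a list of sections; each maps a label to a list of values."""
    sections = []
    current = defaultdict(list)
    for raw in text.splitlines():
        if raw.startswith("!"):
            continue  # comment line
        line = raw.strip()
        if not line:
            # A blank line closes the section being read, if any.
            if current:
                sections.append(dict(current))
                current = defaultdict(list)
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        label, sep, value = line.partition(":")
        label = label.strip()
        if not sep or label not in ALLOWED_LABELS:
            raise ValueError(f"unrecognized line: {raw!r}")
        current[label].append(value.strip())
    if current:
        sections.append(dict(current))
    return sections
```

The sketch is deliberately strict: a line that does not start with one
of the allowed labels raises an error instead of being skipped.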

DISCUSSION OF VALUES

abbreviation

An abbreviated name for the datasource, e.g. SGD.  Each abbreviation
must be unique within GO.xrf_abbs.  Each abbreviation must
correspond to exactly one datasource.  The convention is that if
multiple datasources are controlled by one organization, they should
all share one prefix, followed by an underscore and a postfix,
e.g. SGD and SGD_REF.  This allows datasources to be grouped by
organization.
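
The prefix convention makes grouping straightforward.  As a rough
illustration in Python (the function name group_by_organization is
hypothetical, and the sketch assumes that everything before the first
underscore is the organization prefix):

```python
from collections import defaultdict


def group_by_organization(abbreviations):
    """Group abbreviations by the text before their first underscore."""
    groups = defaultdict(list)
    for abbreviation in abbreviations:
        prefix = abbreviation.split("_", 1)[0]
        groups[prefix].append(abbreviation)
    return dict(groups)


groups = group_by_organization(["SGD", "SGD_REF", "FB"])
```

Here SGD and SGD_REF land in the same group, and FB forms a group of
its own.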

shorthand_name

OPTIONAL.  A name of fewer than 10 characters that processing
applications should display to the public when display space is
limited.  If this line is not included, the abbreviation
will be used.  Capitalization should be maintained by any
processing applications.
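
A processing application might implement this fallback as follows.
This is only a sketch: the function name display_name is hypothetical,
and a section is assumed to be a plain dict that maps each label to a
single string.

```python
def display_name(section):
    """Return shorthand_name if present, otherwise the abbreviation."""
    # The value is returned unchanged so that capitalization is kept.
    return section.get("shorthand_name") or section["abbreviation"]
```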

database

The full name of the data source,  e.g. Saccharomyces Genome Database.

object

The type of data returned from this data source, e.g. gene product.
Currently the values for this are ad hoc, but this should be
standardized in the future.

synonym

An alternate name or abbreviation for the data source.

example_id

An example database ID, e.g.:   SGD:S0006169

This is the full (global) ID, formed by joining the abbreviation and
the local ID with a colon.

local_id_syntax

A regular expression that all local IDs for the data source will match.
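
A local ID can be checked against such an expression as sketched
below in Python.  The function name is_valid_local_id and the pattern
S[0-9]{7} are illustrative assumptions, not values taken from
GO.xrf_abbs.  The sketch also assumes that the expression must match
the whole local ID, which this specification does not state.

```python
import re


def is_valid_local_id(local_id, local_id_syntax):
    """Return True if the whole local ID matches the regular expression."""
    return re.fullmatch(local_id_syntax, local_id) is not None


# Hypothetical pattern: the letter S followed by seven digits.
HYPOTHETICAL_SGD_SYNTAX = r"S[0-9]{7}"
ok = is_valid_local_id("S0006169", HYPOTHETICAL_SGD_SYNTAX)
bad = is_valid_local_id("FBgn0000024", HYPOTHETICAL_SGD_SYNTAX)
```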

generic_url

The root or representative URL for this datasource.  ALL GENERIC_URLS 
SHOULD BE UNIQUE.  The generic_url may be used as a global identifier 
for this datasource in much the same way the abbreviation is used as 
an identifier within this file.
e.g.: http://www.yeastgenome.org/

NOTE: A trailing hash (#) on a generic_url is a semi-standard syntax
for using URLs as identifiers.  It does not affect linking.

url_syntax

A string from which one can build a valid URL query for the object
referenced by a database ID.  If the string contains the placeholder
[example_id], the placeholder should be replaced with the local ID;
otherwise the local ID is appended to the string.
e.g. the abbreviation SGD has the url_syntax of http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
so a global ID of SGD:S0006169 would be translated to a URL of
http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S0006169

(Note that the placeholder name [example_id] is inconsistent with the
example_id tag: the tag holds the full global ID, whereas the
placeholder is replaced with the local ID only.)


NOTE:  In the case of mirrors, the url_syntax field may be repeated.
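
The translation from a global ID to a URL can be sketched in Python
as follows.  The function names split_global_id and build_url are
hypothetical.  The sketch assumes that the global ID has the form
abbreviation:local_id and that a url_syntax without the [example_id]
placeholder expects the local ID to be appended.

```python
PLACEHOLDER = "[example_id]"


def split_global_id(global_id):
    """Split 'SGD:S0006169' into ('SGD', 'S0006169')."""
    abbreviation, sep, local_id = global_id.partition(":")
    if not sep or not abbreviation or not local_id:
        raise ValueError(f"not a global ID: {global_id!r}")
    return abbreviation, local_id


def build_url(url_syntax, global_id):
    """Fill the local ID of a global ID into a url_syntax string."""
    _, local_id = split_global_id(global_id)
    if PLACEHOLDER in url_syntax:
        return url_syntax.replace(PLACEHOLDER, local_id)
    # No placeholder: treat url_syntax as a prefix (an assumption).
    return url_syntax + local_id


url = build_url(
    "http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]",
    "SGD:S0006169",
)
```

When url_syntax is repeated for mirrors, the same function can be
applied to each value in turn.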

url_example

An example of a complete, working URL including an example ID.

NOTE:  In the case of mirrors, the url_example field may be repeated.

is_obsolete

[new tag, not currently in use]

Entries should not be deleted from GO.xrf_abbs; instead they should be
marked as obsolete, like this:

        is_obsolete: true

"false" is also a valid value; however, any entry without an
is_obsolete tag is considered non-obsolete by default.

Obsolete entries may optionally be accompanied by consider or replaced_by tags.

consider

[new tag, not currently in use]

Should only be used with obsolete entries (above). If the abbreviation
has been retired and there is another suggested abbreviation to use,
use this tag. The meaning of this tag is the same as in obo-format.

        consider: OTHER_ID

replaced_by

[new tag, not currently in use]

Should only be used with obsolete entries (above). If the abbreviation
has been retired and there is another replacement abbreviation, use
this tag. The meaning of this tag is the same as in obo-format.

        replaced_by: NEW_ID
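
An application could resolve a retired abbreviation roughly as shown
below.  This is a sketch under stated assumptions, not a defined
algorithm.  The function name resolve_abbreviation is hypothetical,
and entries is assumed to be a dict that maps each abbreviation to a
dict of its tags with single string values.  Only replaced_by is
followed automatically; consider is treated as a suggestion that
needs human review.

```python
def resolve_abbreviation(abbreviation, entries):
    """Follow replaced_by links until a non-obsolete entry is found.

    Returns the current abbreviation, or None if the entry is obsolete
    and has no replaced_by tag.
    """
    seen = set()
    while True:
        if abbreviation in seen:
            raise ValueError(f"replaced_by cycle at {abbreviation!r}")
        seen.add(abbreviation)
        entry = entries[abbreviation]
        # Entries without an is_obsolete tag are non-obsolete by default.
        if entry.get("is_obsolete", "false") != "true":
            return abbreviation
        replacement = entry.get("replaced_by")
        if replacement is None:
            return None
        abbreviation = replacement


# Hypothetical entries, for illustration only.
entries = {
    "OLD_ID": {"is_obsolete": "true", "replaced_by": "NEW_ID"},
    "NEW_ID": {},
    "GONE_ID": {"is_obsolete": "true", "consider": "OTHER_ID"},
}
current = resolve_abbreviation("OLD_ID", entries)
```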

################################################################################
EXAMPLE FILE - comments are preceded by !
################################################################################
!version: $Revision: 1.11 $
!date: $Date: 2009/10/06 00:14:54 $
!
!Gene Ontology
!Abbreviations for cross-referenced databases.
!
!Note that URLs are not necessarily stable entities and that some
!databases may have many other access routes or mirror sites.
!
! NOTE:  a newline follows before the first entry

! Here's the root SGD database where we get gene products.
! Note that there is no shorthand name, since we want any
! applications to use 'SGD' as the shorthand name.
! Also note the trailing # on the generic_url.  This is a 
! somewhat standard way of referencing a url when being used
! as an identifier.  It doesn't interfere with linking.

abbreviation: SGD
database: Saccharomyces Genome Database.
object: gene product
example_id: S0004660
generic_url: http://www.yeastgenome.org/
url_syntax: http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=

! Here are the SGD references.  Note that References has been added to 
! the generic_url to make it unique.  This is a good convention to follow.

abbreviation: SGD_REF
shorthand_name: SGD
database: Saccharomyces Genome Database.
object: Reference Citation.
example_id: 12031
generic_url: http://www.yeastgenome.org/
url_syntax: http://db.yeastgenome.org/cgi-bin/SGD/reference/reference.pl?refNo=

! Here's one from FlyBase.  Note that the url_syntax field has been
! repeated since there is a mirror.  Also, the shorthand_name has been
! filled in as FlyBase, since this is fewer than 10 characters.

abbreviation: FB
shorthand_name: FlyBase
database: FlyBase.
object: gene product
example_id: FBgn0000024
generic_url: http://flybase.bio.indiana.edu#
url_syntax: http://fly.ebi.ac.uk:7081/.bin/fbidq.html?
url_syntax: http://flybase.bio.indiana.edu/.bin/fbidq.html?