BLAST Web Service Documentation
===============================

A. Overview
-----------

The BLAST Web Service WSDL file can be accessed at
http://www.ncbi.nlm.nih.gov/blast/netblast/blastws.cgi?WSDL

The data structures used in this WSDL file are defined in
http://www.ncbi.nlm.nih.gov/blast/data_specs/NCBI_Blast4.xsd, which is a
translation to XML Schema of several ASN.1 specifications available in the
NCBI C++ Toolkit.

The functionality exposed by this web service encompasses:

1) Search submission
   1.1 SubmitSearchLite
   1.2 SubmitSearch
2) Search management
   2.1 CheckSearchStatus
   2.2 GetSearchResults
   2.3 GetSearchStrategy
3) Information retrieval
   3.1 GetDatabases
   3.2 GetSupportedMatrices
   3.3 GetSupportedOptions
   3.4 GetSupportedTasks
   3.5 GetSupportedPrograms
   3.6 GetSequences

The goal of this service is to provide developers with a platform-independent
means of accessing BLAST programmatically from the programming language or
environment of their choice.

B. Description of services
--------------------------

1) Search submission

1.1 SubmitSearchLite

This is the simplest and most limited way of submitting a BLAST search. The
input to this function is:

    -- Simplified search submission structure
    Blast4-queue-search-request-lite ::= SEQUENCE {
        -- query sequence: provide a FASTA sequence, a gi number, or an accession
        query           VisibleString,
        -- Name of BLAST database to search
        database-name   VisibleString,
        -- BLAST options
        options         Blast4-options-lite
    }

Note that this type of request does not support multiple queries, PSI-BLAST,
PHI-BLAST, or Blast2Sequences. The options field is a structure that contains
the most commonly used BLAST algorithm options.

This function returns a structure whose only field is a string containing the
RID (request ID), or a SOAP fault in case of error.

1.2 SubmitSearch

This function takes as input a Blast4-queue-search-request structure, which
allows more complex BLAST searches to be specified.
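
Because SubmitSearchLite accepts a single query and does not support
PSI-BLAST, PHI-BLAST, or Blast2Sequences, a client can decide up front which
of the two submission functions a search needs. The following is a minimal
sketch under stated assumptions: `LiteSearchRequest` and
`choose_submit_function` are hypothetical helpers, not part of the service.
The class only mirrors the fields of Blast4-queue-search-request-lite, and its
`options` dictionary is a placeholder for Blast4-options-lite, whose fields
are not listed in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical client-side model of Blast4-queue-search-request-lite.
# Field names mirror the ASN.1 definition; `options` is a placeholder for
# Blast4-options-lite, whose contents are not listed in this document.
@dataclass
class LiteSearchRequest:
    query: str            # FASTA sequence, gi number, or accession
    database_name: str    # name of the BLAST database to search
    options: Dict[str, object] = field(default_factory=dict)


# Search features that SubmitSearchLite does not support (section 1.1).
# These lowercase tags are illustrative, not service-defined values.
UNSUPPORTED_BY_LITE = {"psi-blast", "phi-blast", "blast2sequences"}


def choose_submit_function(queries: List[str], features: List[str]) -> str:
    """Return the name of the submission function suited to a search.

    `queries` holds one entry per query; `features` holds hypothetical
    feature tags such as "PSI-BLAST" (compared case-insensitively).
    """
    if not queries:
        raise ValueError("at least one query is required")
    if len(queries) > 1:
        return "SubmitSearch"  # the lite request holds exactly one query
    if UNSUPPORTED_BY_LITE.intersection(f.lower() for f in features):
        return "SubmitSearch"
    return "SubmitSearchLite"
```

For example, `choose_submit_function(["q1", "q2"], [])` returns
`"SubmitSearch"`, while a single query with no special features maps to
`"SubmitSearchLite"`.
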

The program and service fields should be populated with values returned by
the GetSupportedPrograms function.

The Blast4-queries type allows the specification of sequence data (in its
bioseq-set field), sequence identifiers (in its seq-loc-list field), or a
PSSM (in its PSSM field).

The Blast4-subject type allows the specification of sequence data (in its
sequences field) or a BLAST database name (in its database field). The
database name must be one of those returned by the GetDatabases function;
specifically, the value of the name field in the Blast4-database structure is
expected.

The paramset field, if present, should be populated with one of the values
returned by the GetSupportedTasks function. The algorithm-options and
program-options fields are documented in sections C and D.

2) Search management

2.1 CheckSearchStatus

This function takes as input a Blast4-get-search-status-request, whose only
field is an RID, and returns a Blast4-get-search-status-reply containing a
string that indicates whether the search corresponding to that RID is
'ready' or 'pending', or whether an 'error' occurred.

2.2 GetSearchResults

This function takes as input a Blast4-get-search-results-request, whose only
field is an RID, and returns a Blast4-get-search-results-reply containing the
results of the search originally submitted.

(FIXME: error structure should be added or fix conversion to soap fault?)

2.3 GetSearchStrategy

This function takes as input a Blast4-get-search-strategy-request, whose only
field is an RID, and returns a Blast4-get-search-strategy-reply, which is an
alias for the Blast4-queue-search-request type. This is the canonical way of
storing search strategies, and it can be used on the BLAST web pages as well
as with the command-line binaries in the NCBI C++ Toolkit.

3) Information retrieval

3.1 GetDatabases

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a get-databases element.
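
GetDatabases and the other information-retrieval functions in this section
take the same Blast4-request envelope (an ident field identifying the client
and a body field) and differ mainly in which Blast4-request-body element is
set. The sketch below records that correspondence for sections 3.1 to 3.5,
using the element names given in this document. `Blast4RequestSketch` and
`make_info_request` are hypothetical names, and the structure is an in-memory
stand-in rather than code generated from the WSDL. GetSequences is left out
because its body element also carries arguments.

```python
from dataclasses import dataclass

# Blast4-request-body element names for the argument-free information
# retrieval functions, as given in sections 3.1-3.5.
BODY_ELEMENT_FOR = {
    "GetDatabases": "get-databases",
    "GetSupportedMatrices": "get-matrices",
    "GetSupportedOptions": "get-parameters",
    "GetSupportedTasks": "get-paramsets",
    "GetSupportedPrograms": "get-programs",
}


@dataclass
class Blast4RequestSketch:
    """Hypothetical stand-in for a Blast4-request (not generated SOAP code).

    ident: identifier for the client.
    body_element: which Blast4-request-body element is set.
    """
    ident: str
    body_element: str


def make_info_request(function_name: str, client_ident: str) -> Blast4RequestSketch:
    """Build the request envelope for one information-retrieval function."""
    try:
        element = BODY_ELEMENT_FOR[function_name]
    except KeyError:
        raise ValueError(
            f"not an argument-free information-retrieval function: {function_name}"
        ) from None
    return Blast4RequestSketch(ident=client_ident, body_element=element)
```

For example, `make_info_request("GetSupportedMatrices", "my-client")` yields
a request whose `body_element` is `"get-matrices"`; the ident value
`"my-client"` is purely illustrative.
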
The ident field of the Blast4-request is an identifier for the client. This
function returns a Blast4-get-databases-reply, which is a list of
Blast4-database-info structures containing database-specific information.

3.2 GetSupportedMatrices

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a get-matrices element. The ident field of the
Blast4-request is an identifier for the client. This function returns a
Blast4-get-matrices-reply, which is a list of Blast4-matrix-id structures.

3.3 GetSupportedOptions

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a get-parameters element. The ident field of the
Blast4-request is an identifier for the client. This function returns a
Blast4-get-parameters-reply, which is a list of Blast4-parameter-info
structures containing the names and data types of the BLAST algorithm
options.

3.4 GetSupportedTasks

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a get-paramsets element. The ident field of the
Blast4-request is an identifier for the client. This function returns a
Blast4-get-paramsets-reply, which is a list of Blast4-task-info structures
containing the names and documentation of the tasks supported by the BLAST
service. Note that the terms 'task' and 'paramset' are synonymous.

3.5 GetSupportedPrograms

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a get-programs element. The ident field of the
Blast4-request is an identifier for the client. This function returns a
Blast4-get-programs-reply, which is a list of Blast4-program-info structures,
each naming a program and the services available for it.

3.6 GetSequences

This takes a Blast4-request structure whose body field contains a
Blast4-request-body with a Blast4-get-sequences-request element.
This specifies the source BLAST database and the identifiers of the
sequences requested. The ident field of the Blast4-request is an identifier
for the client. This function returns a Blast4-get-sequences-reply, which is
a list of Bioseqs.

C. Specifying BLAST algorithm options
-------------------------------------

BLAST options are specified as key/value pairs in the Blast4-parameter
structure. The names of the supported options and their data types can be
obtained via the GetSupportedOptions function. These options are documented
in
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCBlastOptions.html

(FIXME: this needs to be documented!)

D. Specifying BLAST program options
-----------------------------------

There are two BLAST program options: EntrezQuery and GiList (FIXME: any
others?). EntrezQuery takes a string value that follows the syntax described
in the Entrez documentation [1]. GiList is a list of gi numbers (integers).
Both options restrict the database being searched to the subset of sequences
that match the specified Entrez query or gi list.

E. Notes
--------

[1] http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.Limits

F. Sample code
--------------

Please see the provided sample programs:

  c++_soap_client.tgz      C++ client using the NCBI C++ Toolkit
  java_soap_client.tgz     Java client using the Apache Axis2 Web Services project
  soap_client.pl           Perl client using SOAP::Lite
  csharp_soap_client.zip   C# client using the .NET Framework version 2.0
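
Independently of the sample programs, the call sequence described in
sections 1 and 2 can be summarized in a short sketch: build a request,
submit it to obtain an RID, poll CheckSearchStatus until it reports 'ready'
or 'error', then call GetSearchResults. This is a minimal sketch under
stated assumptions: `BlastServiceClient` is a hypothetical wrapper whose
method names are not taken from the WSDL, and the request is modeled as a
plain dictionary whose layout is an assumption. Only the status strings,
the RID flow, and the EntrezQuery option name come from this document.

```python
import time
from typing import Callable, Dict, List, Protocol, Tuple

# A Blast4-parameter is modeled as a (name, value) pair. Valid names and
# value types should be taken from the GetSupportedOptions reply.
Parameter = Tuple[str, object]


class BlastServiceClient(Protocol):
    """Hypothetical wrapper around the SOAP operations. The method names
    are illustrative and are not generated from the WSDL."""

    def submit_search(self, request: Dict[str, object]) -> str: ...
    def check_search_status(self, rid: str) -> str: ...
    def get_search_results(self, rid: str) -> object: ...


def make_request(program: str, service: str, database: str,
                 entrez_query: str = "") -> Dict[str, object]:
    """Build a dictionary loosely shaped like a Blast4-queue-search-request.

    program/service should come from GetSupportedPrograms and database from
    GetDatabases; the dictionary layout itself is an assumption.
    """
    program_options: List[Parameter] = []
    if entrez_query:
        # Restricts the searched database to sequences matching the query.
        program_options.append(("EntrezQuery", entrez_query))
    return {
        "program": program,
        "service": service,
        "subject": {"database": database},
        "program-options": program_options,
    }


def run_search(client: BlastServiceClient,
               request: Dict[str, object],
               poll_interval: float = 60.0,
               max_polls: int = 120,
               sleep: Callable[[float], None] = time.sleep) -> object:
    """Submit a search, poll its status until done, then fetch the results."""
    rid = client.submit_search(request)            # SubmitSearch -> RID
    for _ in range(max_polls):
        status = client.check_search_status(rid)   # CheckSearchStatus
        if status == "ready":
            return client.get_search_results(rid)  # GetSearchResults
        if status == "error":
            raise RuntimeError(f"search {rid} reported an error")
        # Any other status (normally 'pending'): wait before polling again.
        sleep(poll_interval)
    raise TimeoutError(f"search {rid} not ready after {max_polls} polls")
```

The `sleep` parameter lets the loop be exercised without real waiting; the
60-second default interval is only a conservative placeholder, not a value
specified by the service.
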