# examples here use Dispatcher::Tool::InterProScan;
$Id: InterProScan.pm.html,v 1.1.1.1 2005/08/18 13:18:25 hunter Exp $
Copyright (c) European Bioinformatics Institute 2002
Ville Silventoinen <vsi@ebi.ac.uk> Emmanuel Quevillon <tuco@ebi.ac.uk>
Description: Constructor allocates an anonymous hash, which is tied to the class.
Arguments:
  $name      InterProScan tool name (optional). If given, the Dispatcher::Config object is set automatically inside the object. If not given, the Dispatcher::Config must be set later with setConfig.
  $defaults  Default values for configuration, hash reference (optional).
Returns: $self object
Description: Checks whether the search is interactive.
Arguments: -
Returns:
  1  interactive search
  0  otherwise

Description: Checks InterProScan input parameters, sets the defaults and writes the input sequence(s) to file(s). The input sequence may also be given in the input parameters with key 'sequence'.
Arguments: -
Returns:
  1, ''   on success
  0, msg  on failure
Description: Formats input sequences to fasta format.
Arguments:
  $out      Path to the output file (fasta formatted, translated).
  $orig     Path to original output (non-formatted, non-translated).
  $seqs     Input sequences passed in a string (optional).
  $seqf     Input sequences passed in a file or file handle (optional).
  $seqtype  P (protein) or N (nucleotide) (optional).
  $frame    Frame: 1, 2, 3 forward, -1, -2, -3 reverse (optional).
  $trtable  Translation table code for nucleotide seqs (optional).
  $trlen    Translation threshold length (optional).
  $mode     Numeric mode for the directory if it needs to be created.
Returns:
  1, hash reference  on success
  0, msg             on failure
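Most methods in this document return a status flag followed by either a value or a message. The sketch below illustrates that convention for the $frame values listed above. The check_frame helper is hypothetical and is not part of Dispatcher::Tool::InterProScan.

```perl
use strict;
use warnings;

# Hypothetical validator for the $frame argument described above.
# Returns (1, 'forward' or 'reverse') on success, (0, msg) on failure.
sub check_frame {
    my ($frame) = @_;
    return (0, 'frame is not defined') unless defined $frame;
    return (0, "invalid frame '$frame'") unless $frame =~ /\A-?[123]\z/;
    return (1, $frame > 0 ? 'forward' : 'reverse');
}

# Callers test the first value before using the second.
for my $frame (1, -3, 4) {
    my ($ok, $res) = check_frame($frame);
    my $line = $ok ? "frame $frame: $res\n" : "error: $res\n";
    print $line;
}
```

Checking the flag first keeps the second value unambiguous: it is a result when the flag is 1 and an error message when it is 0.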
Description: Checks the fasta formatted input sequences. This method calculates the CRC64 and length for each sequence and also writes the non-matching sequences to a file (optional). The returned hash contains the following fields:
  'WAP_RAT' => {
    'seqn'     => '1',
    'len'      => '137',
    'crc64'    => '1C2E8ADA9FD97949',
    'rawentry' => '...'
  },
  'WAP_PIG' => { ... }
  ...
TODO: the hash may take a lot of memory depending on how many sequences are checked. If one entry takes 1-2KB, 50000 entries would take 50-100MB. Optionally we could write the raw entries to files, but that would generate 50000 files (worse?).
Arguments:
  $in         Input sequence file or file handle.
  $checkcrc   1: check CRC64 for each sequence, 0: do not
  $out        Output file or file handle for sequences that have unknown CRC64 checksum ($checkcrc 1) or all sequences ($checkcrc 0).
  $apps       Reference to an array containing requested applications (affects the raw entries).
  $iprfields  1: includes InterPro fields with raw entries, 0: no
  $goterms    1: includes GO terms with raw entries, 0: no
Returns:
  1, reference to a hash
  0, msg
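As an illustration of the returned structure, the sketch below walks a hand-built hash of the same shape and reproduces the arithmetic from the TODO. The 'EXAMPLE_2' entry and both helper names are hypothetical, and 'rawentry' is left as a placeholder.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash described above. 'EXAMPLE_2' and its
# values are hypothetical; 'rawentry' is a placeholder.
my %seqs = (
    'WAP_RAT'   => { seqn => '1', len => '137', crc64 => '1C2E8ADA9FD97949', rawentry => '...' },
    'EXAMPLE_2' => { seqn => '2', len => '42',  crc64 => '0000000000000000', rawentry => '...' },
);

# Return the sequence IDs ordered by their 'seqn' field (call in list context).
sub ids_by_seqn {
    my ($h) = @_;
    return sort { $h->{$a}{seqn} <=> $h->{$b}{seqn} } keys %$h;
}

# Rough memory estimate from the TODO: entries * KB per entry, expressed
# in MB (1 MB taken as 1000 KB, as the TODO does).
sub estimate_mb {
    my ($entries, $kb_per_entry) = @_;
    return $entries * $kb_per_entry / 1000;
}

for my $id (ids_by_seqn(\%seqs)) {
    printf "%s\tlen=%s\tcrc64=%s\n", $id, $seqs{$id}{len}, $seqs{$id}{crc64};
}
printf "%d-%d MB\n", estimate_mb(50000, 1), estimate_mb(50000, 2);
```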
Description: Indexes the input sequences by description field.
Example:
  >wap_rat blah balh
  rthgensvcdawwq....
  ......
  >another sequence
  ...
This will index the position of the word 'wap_rat'.
Arguments:
  $file  Path of the file to index.
Returns:
  1       on success
  0, msg  on failure
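A minimal sketch of this kind of index, assuming that "position" means the byte offset of the header line and that the key is the first word after '>'. The index_fasta helper is hypothetical, and unlike the method above it also returns the index as a second value so it can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: maps the first word after '>' to the byte offset
# of that header line. Returns (1, \%index) or (0, msg).
sub index_fasta {
    my ($file) = @_;
    open(my $fh, '<', $file) or return (0, "Cannot open $file: $!");
    my %index;
    my $pos = tell($fh);                # offset of the line about to be read
    while (my $line = <$fh>) {
        $index{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell($fh);
    }
    close($fh);
    return (1, \%index);
}
```

With such an index, a caller can seek() directly to an entry instead of rescanning the file.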
Description: Submits InterProScan application jobs. Returns a hash with job names as keys and paths to output and error files:
  jobname1 => { output => 'path/to/output', errors => 'path/to/errors' },
  jobname2 => { ... },
Arguments:
  $in  Path to the input sequences file.
Returns:
  1, jobs in array reference, ''       on success
  0, successfully submitted jobs, msg  on failure
Description: Terminates jobs with given names. Job names must begin with the tool name, so Dispatcher::Queue::kill can read the tool configuration (defines the queue).
Arguments:
  $jobids  Reference to a hash table
Returns: -

Description: Checks jobs with given names. Job names must begin with the tool name, so Dispatcher::Queue::check can read the tool configuration (defines the queue).
Arguments:
  $h_chunk  Reference to a hash table containing info on each chunk
  $chunk    Chunk number to check (optional)
Returns:
  1, $href  on success
  0, $msg   on failure

Description: Inserts the chunk number in the path.
Arguments:
  $num   Number of the chunk directory
  $sref  An array of scalar references
Returns:
  1       on success
  0, msg  on failure
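The source does not say where in the path the chunk number goes. The sketch below assumes it becomes a directory directly above the file name, and shows why scalar references are passed: the caller's variables are rewritten in place. The insert_chunk helper is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: rewrites each referenced path in place so that
# dir/file becomes dir/<num>/file. Returns (1) or (0, msg).
sub insert_chunk {
    my ($num, $sref) = @_;
    return (0, 'chunk number must be a non-negative integer')
        unless defined $num && $num =~ /\A\d+\z/;
    return (0, 'expected a reference to an array of scalar references')
        unless ref($sref) eq 'ARRAY';
    # Validate everything first so a failure leaves all paths untouched.
    return (0, 'every element must be a scalar reference')
        if grep { ref($_) ne 'SCALAR' } @$sref;
    for my $ref (@$sref) {
        my ($name, $dir) = fileparse($$ref);
        $$ref = File::Spec->catfile($dir, $num, $name);
    }
    return (1);
}

my ($out, $err) = ('/tmp/run/out.xml', '/tmp/run/err.txt');
my ($ok, $msg) = insert_chunk(3, [\$out, \$err]);
print "$out\n$err\n" if $ok;
```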
Description: Removes temporary files created during job submissions. Calls Queue::cleanup for each job.
Arguments:
  $h_chunk  Reference to a hash table containing info on each chunk
  $chunk    Chunk number to check (optional)
Returns: -

Description: Updates the status of a job in a file.
Arguments:
  $h_chunk  Hash reference containing status for chunk jobs
  $chunk    Number of the chunk (optional)
Returns:
  1       on success
  0, msg  on failure

Description: Creates Dispatcher tool objects for InterProScan.
Arguments:
  $tname     Tool name.
  $defaults  Defaults for tool configuration (optional).
Returns:
  1, tool  on success
  0, msg   on failure
Description: Maps InterProScan job identifier to application name. This is a convenience method for mapping the application job IDs to the real tool names.
Arguments:
  $jobid  Job identifier.
Returns:
  tool name     on success
  empty string  on failure

Description: Maps InterProScan job identifier to tool name. This is a convenience method for mapping the application job IDs to the real tool names.
Arguments:
  $jobid  Job identifier.
Returns:
  tool name     on success
  empty string  on failure

Description: Maps InterProScan job identifier to input sequence number. This is not intended for the chunk submissions, which don't have the sequence number in the job identifier.
Arguments:
  $jobid  Job identifier.
Returns:
  sequence number  on success
  empty string     on failure

Description: Maps InterProScan application to tool name. This is a convenience method.
Arguments:
  $appl  Application name, e.g. 'hmmsmart'.
Returns:
  tool name     on success
  empty string  on failure

Description: Maps status code to InterProScan job status text.
Arguments:
  $status  Status code.
Returns: status text
Description: Returns the home page for InterProScan.
Arguments: -
Returns: reference to a variable

Description: Returns an HTML page saying that no jobs have been started yet in the current chunk.
Arguments:
  $out  Path where the page is written; if omitted, the page is written to STDOUT.
Returns: a string
Description: Returns top part of an HTML page. Arguments: - Returns: string
Description: Returns bottom part of an HTML page. Arguments: - Returns: string
Description: Returns the top part of the HTML result page.
Arguments:
  $cnk  Chunk number to point to the right files
Returns: top part string
Description: Returns the bottom part of the HTML result page. Arguments: - Returns: bottom part string
Description: Creates an HTML status page.
NOTE 1: HTTP headers must be added by the caller, because it is impossible to know when they should be added (caller may have started the page and this call finishes it).
NOTE 2: If you use stdout and have created a poll page for the URL, beware that the poll page will be overwritten and the user will get an empty page as soon as this sub starts writing the result. In such cases it is better to pass $out as a path, because this method creates a temporary file and renames it when the file has been written.
Arguments:
  $in      Job result.
  $out     Output (default: stdout).
  $top     Page top (default: getHtmlTop).
  $bottom  Page bottom (default: getHtmlBottom).
Returns:
  1, ''   on success
  0, msg  on failure
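The temporary-file-and-rename approach mentioned in NOTE 2 can be sketched as follows. This illustrates the general pattern only and is not the module's code: the write_page_atomically helper is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: writes $content to a temporary file next to $path,
# then renames it over $path. Returns (1, '') or (0, msg).
sub write_page_atomically {
    my ($path, $content) = @_;
    # Create the temporary file in the target directory so that rename()
    # stays within one filesystem. tempfile() dies on failure, hence eval.
    my ($fh, $tmp) = eval { tempfile('pageXXXXXX', DIR => dirname($path)) };
    return (0, "Cannot create temporary file: $@") unless $fh;
    print {$fh} $content or return (0, "Cannot write $tmp: $!");
    close($fh)           or return (0, "Cannot close $tmp: $!");
    rename($tmp, $path)  or return (0, "Cannot rename $tmp to $path: $!");
    return (1, '');
}
```

A reader polling $path sees either the old page or the complete new one, never a partially written file. File::Temp creates files readable only by their owner, so a page served by a web server would likely also need a chmod before the rename.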
Description: Creates an HTML result page.
NOTE 1: HTTP headers must be added by the caller, because it is impossible to know when they should be added (caller may have started the page and this call finishes it).
NOTE 2: If you use stdout and have created a poll page for the URL, beware that the poll page will be overwritten and the user will get an empty page as soon as this sub starts writing the result. In such cases it is better to pass $out as a path, because this method creates a temporary file and renames it when the file has been written.
Arguments:
  $in      InterProScan XML result file.
  $out     Output (default: stdout).
  $top     Page top (default: getResultPageTop).
  $bottom  Page bottom (default: getResultPageBottom).
Returns:
  1, ''   on success
  0, msg  on failure
Description: Creates an HTML result page.
NOTE 1: HTTP headers must be added by the caller, because it is impossible to know when they should be added (caller may have started the page and this call finishes it).
NOTE 2: If you use stdout and have created a poll page for the URL, beware that the poll page will be overwritten and the user will get an empty page as soon as this sub starts writing the result. In such cases it is better to pass $out as a path, because this method creates a temporary file and renames it when the file has been written.
Arguments:
  $in      InterProScan XML result file.
  $out     Output (default: stdout).
  $top     Page top (default: getResultPageTop).
  $bottom  Page bottom (default: getResultPageBottom).
Returns:
  1, ''   on success
  0, msg  on failure
Description: Filters the results for each ID found in a run. 'and', 'or' and 'and/or' searches are supported.
Arguments:
  $id  The ID to look for.
Returns:
  1, 1/0   on success (the second value says whether to accept the ID)
  0, $msg  on error

Description: Compares two taxonomy lists.
Arguments:
  $iprtaxos  Reference to an array containing the InterPro taxonomy list.
  $ustaxos   Reference to an array containing the user taxonomy list.
Returns:
  1, $retvalue  on success
  0, $msg       on error
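The source does not define what $retvalue holds. One plausible reading is an overlap test between the two lists. The sketch below assumes that reading, returning the number of user taxonomy names also present in the InterPro list, compared case-insensitively. Both the compare_taxonomy helper and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts user taxonomy names that also occur in the
# InterPro list (case-insensitive). Returns (1, count) or (0, msg).
sub compare_taxonomy {
    my ($iprtaxos, $ustaxos) = @_;
    return (0, 'both arguments must be array references')
        unless ref($iprtaxos) eq 'ARRAY' && ref($ustaxos) eq 'ARRAY';
    my %seen   = map { lc($_) => 1 } @$iprtaxos;
    my @common = grep { $seen{ lc($_) } } @$ustaxos;
    return (1, scalar @common);
}

# Sample names are illustrative only.
my ($ok, $n) = compare_taxonomy(['Bacteria', 'Eukaryota'], ['eukaryota', 'Archaea']);
print "shared taxonomy names: $n\n" if $ok;
```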
Description: Creates raw result file. All jobs must be finished.
Arguments:
  $seqs      Sequence hash as returned by checkSequences.
  $out       Output handle or file path (default: stdout).
  $jobnames  Job names in an array (optional).
Returns:
  1, ''   on success
  0, msg  on failure

Description: Creates a report for failed jobs. It contains the command launched, the error status, and how to relaunch the failed jobs and concatenate the whole results.
Arguments:
  $jobid    Job identifier
  $h_chunk  Hash table for chunks
  $file     Report file for the run
  $erfile   Error file from application
  $mode     Overwrite the file if it already exists instead of appending to it.
Returns:
  1, ''   on success
  0, msg  on failure

Description: Queries a raw entry from IPRMATCHES. This method uses the sequence CRC64 checksum to query the entry.
Arguments:
  $seqid   Sequence ID.
  $seqcrc  CRC64 checksum.
  $seqlen  Sequence length.
  $appl    Applications (reference to an array). By default all applications in configuration are used.
  $ipr     1: add IPR lookup info, 0: no (default)
  $go      1: add GO terms, 0: no (default)
Returns:
  1, raw entry  on success
  0, msg        on failure
Description: Queries accession number and name fields from InterPro.
Arguments:
  $hitac  Hit accession number, e.g.
            'PD019552'     (BlastProDom)
            'PR00003'      (FingerPrintScan)
            'PS00317'      (ScanRegExp)
            'PS50311'      (ProfileScan)
            'PF00095'      (HMMPfam)
            'SM00217'      (HMMSmart)
            'TIGR00010'    (HMMTigr)
            'PIRSF000196'  (HMMPIRSF)
            'SSF46561'     (HMMSSF)
Returns:
  1, ac, name, goterms  on success
  0, msg, undef, undef  on failure
Description: Calculates CRC64 checksum from a string.
This implementation is from the Swissknife-1.42 package SWISS::CRC64, implemented by Alexandre Gattiker (gattiker@isb-sib.ch).
http://swissknife.sourceforge.net/
Arguments:
  $in  Input string.
Returns: checksum
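For orientation, here is a sketch of a table-driven CRC64 that keeps the 64-bit value as two 32-bit halves, in the style commonly used for SWISS-PROT checksums. It is written from memory and has not been verified against SWISS::CRC64 output, so treat the polynomial constant and any checksum it produces as assumptions. The input is expected to be a byte string.

```perl
use strict;
use warnings;

# High 32 bits of the polynomial (assumed value; the low 32 bits are zero).
use constant POLY_HIGH => 0xD8000000;

# Precompute one table entry per byte value, as (high, low) 32-bit halves.
my (@crc_high, @crc_low);
for my $i (0 .. 255) {
    my ($low, $high) = ($i, 0);
    for (1 .. 8) {
        my $carry = $low & 1;
        $low >>= 1;
        $low |= 0x80000000 if $high & 1;   # bit shifted out of the high half
        $high >>= 1;
        $high ^= POLY_HIGH if $carry;
    }
    ($crc_high[$i], $crc_low[$i]) = ($high, $low);
}

# Returns the checksum as 16 upper-case hex digits.
sub crc64 {
    my ($in) = @_;
    my ($high, $low) = (0, 0);
    for my $byte (unpack 'C*', $in) {
        my $index = ($low ^ $byte) & 0xFF;
        # Shift the 64-bit value right by 8 bits, then fold in the table entry.
        $low  = (($low >> 8) | (($high & 0xFF) << 24)) ^ $crc_low[$index];
        $high = ($high >> 8) ^ $crc_high[$index];
    }
    return sprintf '%08X%08X', $high, $low;
}
```

Splitting the value into halves avoids depending on 64-bit integer support in the Perl build.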
Description: Sorts locations by start positions.
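The source does not show how a location is represented. The sketch below assumes each one is a [start, end] array reference and breaks ties on the end position. The sort_locations helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts [start, end] pairs by start, then by end.
# Call in list context.
sub sort_locations {
    my @locs = @_;
    return sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } @locs;
}

my @sorted = sort_locations([50, 80], [5, 20], [5, 10]);
print join(', ', map { join('-', @$_) } @sorted), "\n";
```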