NAME InterProScan - InterProScan utilities.


SYNOPSIS

  # examples here
  use Dispatcher::Tool::InterProScan;


DESCRIPTION


VERSIONS

$Id: InterProScan.pm.html,v 1.1.1.1 2005/08/18 13:18:25 hunter Exp $

Copyright (c) European Bioinformatics Institute 2002


AUTHORS / ACKNOWLEDGEMENTS

Ville Silventoinen <vsi@ebi.ac.uk>
Emmanuel Quevillon <tuco@ebi.ac.uk>

new

 Description: Constructor. Allocates an anonymous hash, which
              is blessed into the class.
 Arguments:   $name        InterProScan tool name (optional). If given, sets the
                           Dispatcher::Config object automatically inside
                           the object. If not given, the Dispatcher::Config
                           must be set later with setConfig.
              $defaults    Default values for configuration, hash reference
                           (optional).
 Returns:     $self object

isInteractive

 Description: Checks whether search is interactive or not.
 Arguments:   -
 Returns:     1 interactive search
              0 otherwise

checkParams

 Description: Checks InterProScan input parameters, sets the defaults and writes
              the input sequence(s) to file(s). The input sequence may also
              be given in the input parameters with key 'sequence'.
 Arguments:   -
 Returns:     1, ''  on success
              0, msg on failure
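
 Most methods in this module return a status flag followed by either a
 result or an error message. A minimal sketch of how a caller might handle
 that convention is shown below; check_params_stub is a hypothetical
 stand-in written for this example and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method such as checkParams. The real method
# takes no arguments and validates the tool's own input parameters.
sub check_params_stub {
    my ($params) = @_;
    return (0, "missing 'sequence' parameter")
        unless defined $params->{sequence} && length $params->{sequence};
    return (1, '');
}

# Callers unpack the (status, message) pair in list context.
my ($ok, $msg) = check_params_stub({ sequence => 'MKVLAAGIVG' });
print $ok ? "ok\n" : "failed: $msg\n";

($ok, $msg) = check_params_stub({});
print $ok ? "ok\n" : "failed: $msg\n";
```

 Always call such methods in list context; in scalar context only the last
 element of the returned list would be seen.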

formatSequences

 Description:  Formats input sequences to fasta format.
 Arguments:    $out        Path to the output file (fasta formatted, translated).
               $orig       Path to original output (non-formatted, non-translated).
               $seqs       Input sequences passed in a string (optional).
               $seqf       Input sequences passed in a file or file handle (optional).
               $seqtype    P (protein) or N (nucleotide) (optional).
               $frame      Frame: 1, 2, 3 forward, -1, -2, -3 reverse (optional).
               $trtable    Translation table code for nucleotide seqs (optional).
               $trlen      Translation threshold length (optional).
               $mode       Numeric mode for the directory if it needs to be created.
 Returns:      1, hash reference on success.
               0, msg on failure

checkSequences

 Description: Checks the fasta formatted input sequences. This method calculates
              the CRC64 and length for each sequence and also writes the non-matching
              sequences to a file (optional). The returned hash contains the
              following fields:
               'WAP_RAT' => { 'seqn'     => '1',
                              'len'      => '137',
                              'crc64'    => '1C2E8ADA9FD97949',
                              'rawentry' => '...'
                            },
               'WAP_PIG' => { ... }
               ...
              TODO: the hash may take a lot of memory, depending on how many
              sequences are checked. If one entry takes 1-2KB, 50000 entries
              would take 50-100MB. Alternatively, the raw entries could be
              written to files, but that would generate 50000 files (worse?).
 Arguments:   $in           Input sequence file or file handle.
              $checkcrc     1: check CRC64 for each sequence, 0: do not
              $out          Output file or file handle for sequences that have unknown
                            CRC64 checksum ($checkcrc 1) or all sequences ($checkcrc 0).
              $apps         Reference to an array containing requested applications
                            (affects the raw entries).
              $iprfields    1: include InterPro fields with raw entries, 0: do not
              $goterms      1: include GO terms with raw entries, 0: do not
 Returns:     1, reference to a hash
              0, msg
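
 The shape of the returned hash can be illustrated with a small sketch.
 The helper below, summarize_fasta, is hypothetical and only fills in the
 'seqn' and 'len' fields from a fasta string; the 'crc64' and 'rawentry'
 fields, file handling, and CRC lookups of the real method are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a per-sequence summary keyed by sequence ID.
# Only 'seqn' (input order) and 'len' (residue count) are computed here.
sub summarize_fasta {
    my ($fasta) = @_;
    my (%seqs, $id);
    my $n = 0;
    for my $line (split /\r?\n/, $fasta) {
        if ($line =~ /^>(\S+)/) {
            # Header line: the first word after '>' is used as the key.
            $id = $1;
            $seqs{$id} = { seqn => ++$n, len => 0 };
        }
        elsif (defined $id) {
            # Sequence line: count residues, ignoring whitespace.
            (my $residues = $line) =~ s/\s+//g;
            $seqs{$id}{len} += length $residues;
        }
    }
    return (1, \%seqs);
}

my ($ok, $seqs) = summarize_fasta(">WAP_RAT demo\nMKVL\nAAG\n>WAP_PIG demo\nMRC\n");
printf "%s: seqn=%d len=%d\n", $_, @{ $seqs->{$_} }{qw(seqn len)}
    for sort { $seqs->{$a}{seqn} <=> $seqs->{$b}{seqn} } keys %$seqs;
```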

indexInput

 Description:  Indexes the input sequences by description field.
               Ex : >wap_rat blah blah
                    rthgensvcdawwq....
                    ......
                    >another sequence
                    ...
               Will index the position of the 'wap_rat' word.
 Arguments:    $file   Path of the file to index.
 Returns:      1      on success
               0, msg on failure
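
 One way to index a fasta file like the example above is to record where
 each header line starts. The sketch below assumes that "position" means
 the byte offset of the header line; index_fasta is a hypothetical helper,
 and the real method may store its index differently.

```perl
use strict;
use warnings;

# Hypothetical helper: maps the first word of each fasta header line to the
# byte offset at which that header line starts.
sub index_fasta {
    my ($fh) = @_;
    my %offset;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^>\s*(\S+)/;
        $pos = tell $fh;    # offset of the next line
    }
    return \%offset;
}

# An in-memory file keeps the example self-contained.
my $fasta = ">wap_rat blah blah\nrthgensvcdawwq\n>another sequence\nmkvl\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my $index = index_fasta($fh);

# Jump straight to a record using its stored offset.
seek $fh, $index->{another}, 0;
print scalar <$fh>;    # the '>another sequence' header line
close $fh;
```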

submitJobs

 Description:  Submits InterProScan application jobs. Returns a hash with
               jobnames as keys and paths to output and error files:
               jobname1 => { output => 'path/to/output',
                             errors => 'path/to/errors' },
               jobname2 => { ... },
 Arguments:    $in    Path to the input sequences file.
 Returns:      1, jobs in array reference,     ''  on success
               0, successfully submitted jobs, msg on failure

terminateJobs

 Description:  Terminates jobs with given names.  Job names must begin with the
               tool name, so Dispatcher::Queue::kill can read the tool
               configuration (defines the queue).
 Arguments:    $jobids  Reference to a hash table
 Returns:      -

checkJobs

 Description:  Check jobs with given names.  Job names must begin with the tool
               name, so Dispatcher::Queue::check can read the tool configuration
               (defines the queue).
 Arguments:    $h_chunk Reference to a hash table containing info on each chunk
               $chunk   Chunk number to check (optional)
 Returns:      1, $href on success
               0, $msg  on failure

addChunk

 Description:  Inserts the chunk number in the path.
 Arguments:    $num   number of the chunk directory
               $sref  an array of scalar references
 Returns:      1       on success
               0, msg  on failure

cleanup

 Description:  Removes temporary files created during job submissions.
               Calls Queue::cleanup for each job.
 Arguments:    $h_chunk Reference to a hash table containing info on each chunk
               $chunk   Chunk number to check (optional)
 Returns:      -

updateStatus

 Description:  Updates the status of a job in a file.
 Arguments:    $h_chunk  hash reference containing the status of chunk jobs
               $chunk    number of the chunk (optional)
 Returns:      1       on success
               0, msg  on failure

createTool

 Description:  Creates Dispatcher tool objects for InterProScan.
 Arguments:    $tname       Tool name.
               $defaults    Defaults for tool configuration (optional).
 Returns:      1, tool on success
               0, msg  on failure

mapJobIDToApplName

 Description:  Maps InterProScan job identifier to application name. This is a
               convenience method for mapping the application job IDs to the
               real tool names.
 Arguments:    $jobid    Job identifier.
 Returns:      application name on success
               empty string on failure

mapJobIDToToolName

 Description:  Maps InterProScan job identifier to tool name. This is a
               convenience method for mapping the application job IDs
               to the real tool names.
 Arguments:    $jobid    Job identifier.
 Returns:      tool name    on success
               empty string on failure

mapJobIDToSeqNo

 Description:  Maps InterProScan job identifier to input sequence number.
               This is not intended for the chunk submissions, which don't
               have the sequence number in the job identifier.
 Arguments:    $jobid    Job identifier.
 Returns:      sequence number on success
               empty string    on failure

mapApplNameToToolName

 Description:  Maps InterProScan application to tool name. This is a
               convenience method.
 Arguments:    $appl    Application name, e.g., 'hmmsmart'.
 Returns:      tool name    on success
               empty string on failure

mapStatus

 Description:  Maps status code to InterProScan job status text.
 Arguments:    $status    Status code.
 Returns:      status text

getHomePage

 Description: Returns the home page for InterProScan.
 Arguments:   -
 Returns:     ref to a variable

getNoJobsStarted

 Description: Returns an HTML page saying that no jobs have been started yet in the current chunk.
 Arguments:   $out    Path where the page is written (default: STDOUT).
 Returns:     a string

getHtmlTop

 Description: Returns top part of an HTML page.
 Arguments:   -
 Returns:     string

getHtmlBottom

 Description: Returns bottom part of an HTML page.
 Arguments:   -
 Returns:     string

getResultPageTop

 Description: Returns the top part of the HTML result page.
 Arguments:   $cnk Chunk number to point to the right files
 Returns:     top part string

getResultPageBottom

 Description: Returns the bottom part of the HTML result page.
 Arguments:   -
 Returns:     bottom part string

createStatusPage

 Description: Creates an HTML status page.
              NOTE 1: HTTP headers must be added by the caller, because it is
              impossible to know when they should be added (caller may have
              started the page and this call finishes it).
              NOTE 2: If you use stdout and have created a poll page for the URL,
              beware that the poll page will be overwritten and the user will get
              an empty page as soon as this sub starts writing the result. In such
              cases it is better to pass $out as a path, because this method creates
              a temporary file and renames it when the file has been written.
 Arguments:   $in        Job result.
              $out       Output (default: stdout).
              $top       Page top (default: getHtmlTop).
              $bottom    Page bottom (default: getHtmlBottom).
 Returns:     1, ''  on success
              0, msg on failure
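
 NOTE 2 describes writing to a temporary file and renaming it once it is
 complete, so a reader never sees a half-written page. A minimal sketch of
 that pattern follows; write_page_atomically is a hypothetical helper and
 the file names are made up for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: writes $content to $path via a temporary file in the
# same directory, then renames it into place in one step.
sub write_page_atomically {
    my ($path, $content) = @_;
    my ($fh, $tmp) = tempfile('pageXXXXXX', DIR => dirname($path), UNLINK => 0);
    print {$fh} $content or return (0, "write to $tmp failed: $!");
    close $fh            or return (0, "close of $tmp failed: $!");
    rename $tmp, $path   or return (0, "rename to $path failed: $!");
    return (1, '');
}

my $dir = tempdir(CLEANUP => 1);
my ($ok, $msg) = write_page_atomically("$dir/status.html", "<html>done</html>\n");
print $ok ? "page written\n" : "failed: $msg\n";
```

 The temporary file is created in the target directory because rename is
 only atomic within a single filesystem. Files created by File::Temp are
 readable by their owner only, so a page served by a web server may also
 need a chmod before the rename.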

createResultPicture

 Description: Creates an HTML result page.
              NOTE 1: HTTP headers must be added by the caller, because it is
              impossible to know when they should be added (caller may have
              started the page and this call finishes it).
              NOTE 2: If you use stdout and have created a poll page for the URL,
              beware that the poll page will be overwritten and the user will get
              an empty page as soon as this sub starts writing the result. In such
              cases it is better to pass $out as a path, because this method creates
              a temporary file and renames it when the file has been written.
 Arguments:   $in        InterProScan XML result file.
              $out       Output (default: stdout).
              $top       Page top (default: getResultPageTop).
              $bottom    Page bottom (default: getResultPageBottom).
 Returns:     1, ''  on success
              0, msg on failure

createResultTable

 Description: Creates an HTML result page.
              NOTE 1: HTTP headers must be added by the caller, because it is
              impossible to know when they should be added (caller may have
              started the page and this call finishes it).
              NOTE 2: If you use stdout and have created a poll page for the URL,
              beware that the poll page will be overwritten and the user will get
              an empty page as soon as this sub starts writing the result. In such
              cases it is better to pass $out as a path, because this method creates
              a temporary file and renames it when the file has been written.
 Arguments:   $in        InterProScan XML result file.
              $out       Output (default: stdout).
              $top       Page top (default: getResultPageTop).
              $bottom    Page bottom (default: getResultPageBottom).
 Returns:     1, ''  on success
              0, msg on failure

filterTaxoResults

    Description: Filters the results for each ID found in a run.
                 'and', 'or', and 'and/or' searches are supported.
    Arguments:   $id the ID to look for.
    Returns:     1, 1 or 0 (accept or reject the ID) on success
                 0, $msg on error

compareTaxos

    Description: Compares two taxonomy lists.
    Arguments:   $iprtaxos reference to an array containing InterPro taxonomy list.
                 $ustaxos  reference to an array containing user taxonomy list.
    Returns:     1, $retvalue on success
                 0, $msg on error

createRawResult

 Description: Creates raw result file. All jobs must be finished.
 Arguments:   $seqs        Sequence hash as returned by checkSequences.
              $out         Output handle or file path (default: stdout).
              $jobnames    Job names in an array (optional).
 Returns:     1, ''  on success
              0, msg on failure

createFailedReport

 Description: Creates a report for failed jobs. The report contains the command
              launched, the error status, and instructions for relaunching the
              failed jobs and concatenating the results.
 Arguments:   $jobid     Job identifier
              $h_chunk   Hash table for chunks
              $file      Report file for the run
              $erfile    Error File from application
              $mode      If set, overwrites the file if it already exists instead of appending to it.
 Returns:     1, '' on success
              0, msg   on failure

getRawEntryFromIprMatches

 Description: Queries a raw entry from IPRMATCHES. This method uses the sequence
              CRC64 checksum to query the entry.
 Arguments:   $seqid     Sequence ID.
              $seqcrc    CRC64 checksum.
              $seqlen    Sequence length.
              $appl      Applications (reference to an array). By default all
                         applications in configuration are used.
              $ipr       1: add IPR lookup info 
                         0: no (default)
              $go        1: add GO terms
                         0: no (default)
 Returns:     1, raw entry on success
              0, msg   on failure

getInterProFields

 Description: Queries accession number and name fields from InterPro.
 Arguments:   $hitac    Hit accession number, e.g.
                        'PD019552'    (BlastProDom)
                        'PR00003'     (FingerPrintScan)
                        'PS00317'     (ScanRegExp)
                        'PS50311'     (ProfileScan)
                        'PF00095'     (HMMPfam)
                        'SM00217'     (HMMSmart)
                        'TIGR00010'   (HMMTigr)
                        'PIRSF000196' (HMMPIRSF)
                        'SSF46561'    (HMMSSF)
 Returns:     1, ac, name, goterms on success
              0, msg, undef, undef on failure

crc64

 Description: Calculates CRC64 checksum from a string.
              This implementation is from the Swissknife-1.42 package SWISS::CRC64,
              implemented by Alexandre Gattiker (gattiker@isb-sib.ch).
              http://swissknife.sourceforge.net/
 Arguments:   $in    Input string.
 Returns:     checksum
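
 For illustration, a table-driven CRC64 can be computed with two 32-bit
 halves, which avoids relying on 64-bit integer support. The sketch below
 assumes the reflected ISO 3309 polynomial (high word 0xD8000000, low word
 0), a zero initial value, and upper-case hexadecimal output, as in the
 checksum shown for checkSequences. The name crc64_hex is hypothetical and
 this is not a copy of the module's code.

```perl
use strict;
use warnings;

my (@table_hi, @table_lo);

# Builds the 256-entry lookup table, kept as separate high and low words.
sub _init_crc64_table {
    for my $i (0 .. 255) {
        my ($lo, $hi) = ($i, 0);
        for (1 .. 8) {
            my $carry = $lo & 1;
            $lo >>= 1;
            $lo |= (1 << 31) if $hi & 1;    # shift a bit from high into low
            $hi >>= 1;
            $hi ^= 0xD8000000 if $carry;    # assumed polynomial, high word
        }
        ($table_hi[$i], $table_lo[$i]) = ($hi, $lo);
    }
}

# Hypothetical function: returns the checksum as 16 upper-case hex digits.
sub crc64_hex {
    my ($in) = @_;
    _init_crc64_table() unless @table_hi;
    my ($hi, $lo) = (0, 0);
    for my $byte (unpack 'C*', $in) {
        my $idx = ($lo ^ $byte) & 0xFF;
        # Shift the 64-bit register right by 8 bits, then fold in the table.
        $lo = (($lo >> 8) | (($hi & 0xFF) << 24)) ^ $table_lo[$idx];
        $hi = ($hi >> 8) ^ $table_hi[$idx];
    }
    return sprintf '%08X%08X', $hi, $lo;
}

print crc64_hex('MKVLAAGIVGLLLAQPSFA'), "\n";
```

 The input sequence in the last line is arbitrary. To compare against
 stored checksums, the same normalisation (for example case and whitespace
 handling) must be applied to the sequence before it is passed in.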

byStartPosition

 Description: Sorts locations by start positions.
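
 A comparator of this kind is typically passed to Perl's sort. The sketch
 below assumes each location is a hash reference with numeric 'start' and
 'end' keys; the module's actual location structure is not documented here,
 and by_start_position is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical comparator: orders locations by ascending start position.
sub by_start_position { $a->{start} <=> $b->{start} }

my @locations = (
    { start => 120, end => 180 },
    { start => 5,   end => 60  },
    { start => 47,  end => 99  },
);

my @sorted = sort by_start_position @locations;
print join(', ', map { "$_->{start}-$_->{end}" } @sorted), "\n";
# prints: 5-60, 47-99, 120-180
```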