- SEQUENCE INPUT WINDOW
You can cut and paste or type a nucleotide or protein sequence into the large text
window. A free text (raw) sequence is simply a block of
characters representing a DNA/RNA or protein sequence.
You may also paste a sequence in Fasta, EMBL, Swiss-Prot or GenBank format.
Partially formatted sequences will not be accepted.
Copying and pasting directly from word processors may yield
unpredictable results, as hidden/control characters may be present.
Adding a return to the end of the sequence may help certain
applications understand the input. Some examples of common sequence
formats may be seen here.
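The advice above about hidden characters and a trailing return can be sketched in Python. This is only an illustration of the kind of clean-up you might do before pasting a sequence; the function name `clean_pasted_sequence` is hypothetical and is not part of InterProScan.

```python
def clean_pasted_sequence(text: str) -> str:
    """Drop hidden/control characters and end the text with one return."""
    # Normalise Windows (CRLF) and old Mac (CR) line endings to Unix (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep newlines and printable characters; drop everything else
    # (for example NUL, tab, form feed or zero-width characters).
    kept = "".join(ch for ch in text if ch == "\n" or ch.isprintable())
    # Remove trailing white space, then add the single final return.
    return kept.rstrip() + "\n"
```

Note that this sketch also drops tab characters, which is an assumption that suits raw and Fasta input; adjust it if your format relies on tabs.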
- UPLOAD A FILE
You may upload a file from your computer containing a valid sequence in any supported format (Raw, Fasta, EMBL, Swiss-Prot or GenBank)
using this option. Please note that this option only works with
Netscape browsers or Internet Explorer version 5 or later. Files saved by some word
processors may yield unpredictable results, as hidden/control characters
may be present in the files. It is best to save files with the Unix
format option to avoid hidden Windows characters. Some examples of
common sequence formats may be seen here.
- APPLICATIONS TO RUN
A number of different protein sequence applications are launched. These
applications search against specific databases and have preconfigured
cut-off thresholds.
- BlastProDom
Scans the families in the ProDom database.
ProDom is a comprehensive set of protein domain families automatically
generated from the Swiss-Prot and TrEMBL sequence databases using
psi-blast. In InterProScan the blastpgp program is used to scan the
database. Blastpgp performs gapped blastp searches and can be used to
perform iterative searches in psi-blast and phi-blast mode.
- FPrintScan
Scans against the fingerprints in the PRINTS database.
These fingerprints are groups of motifs that together are more potent
than single motifs, because they make use of the biological context inherent in
a multiple-motif method.
- HMMPIR
Scans the hidden Markov models (HMMs) that are present in PIR-PSD, the PIR Protein Sequence Database of functionally annotated protein sequences.
- HMMPfam
Scans the hidden Markov models (HMMs) that are present in the Pfam protein families database.
- HMMSmart
Scans the hidden Markov models (HMMs) that are present in the SMART domain/domain families database.
- HMMTigr
Scans the hidden Markov models (HMMs) that are present in the TIGRFAMs protein families database.
- ProfileScan
Scans against PROSITE profiles. These profiles are based on weight matrices and are more sensitive for the detection of divergent protein families.
- ScanRegExp
Scans against the regular expressions in the PROSITE protein families and domains database.
- SuperFamily
SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure.
- TRANSLATION & READING FRAMES
N.B. Nucleotide input sequences need to be converted into
hypothetical protein sequences. This occurs in 6 reading frames, i.e. it results in
6 possible protein sequences. Every 3 bases in the DNA sequence code for 1 amino
acid. As you may not be sure which position to start at when predicting
the protein sequence produced by this code, you could start
at any one of 3 positions from either end of the DNA sequence. Thus there
are 6 possible predicted protein sequences resulting from such a piece
of code. These are known as the 6 possible reading frames: 3
forward frames and 3 reverse-sense frames.
e.g.
gcagccgggcggccgcagaagcgcccaggcccgcgcgccacccct DNA
Forward frames
gca gcc ggg cgg ccg cag aag cgc cca ggc ccg cgc gcc acc cct DNA
A A G R P Q K R P G P R A T P amino acids
g cag ccg ggc ggc cgc aga agc gcc cag gcc cgc gcg cca ccc ct DNA
Q P G G R R S A Q A R A P P amino acids
gc agc cgg gcg gcc gca gaa gcg ccc agg ccc gcg cgc cac ccc t DNA
S R A A A E A P R P A R H P amino acids
Reverse frames
(the DNA is shown reversed; the amino acids are read from the complementary strand, e.g. tcc pairs with agg, which codes for R)
tcc cca ccg cgc gcc cgg acc cgc gaa gac gcc ggc ggg ccg acg DNA
R G G A R A W A L L R P P G C amino acids
tc ccc acc gcg cgc ccg gac ccg cga aga cgc cgg cgg gcc gac g DNA
G W R A G L G A S A A A R L X amino acids
t ccc cac cgc gcg ccc gga ccc gcg aag acg ccg gcg ggc cga cg DNA
G V A R G P G R F C G R P A A amino acids
Example of hypothetical proteins produced from a translation:
Genetic Code table used: [0] -> Standard Genetic Code
Frames: All Six Frames
>_1
AAGRPQKRPGPRATP
>_2
QPGGRRSAQARAPP
>_3
SRAAAEAPRPARHP
>_4
RGGARAWALLRPPGC
>_5
GWRAGLGASAAARLX
>_6
GVARGPGRFCGRPAA
Also, translation into protein does not apply uniformly to all organisms:
the same nucleotide sequence can code for a different set of amino
acids in different organisms. You can therefore
translate using the standard ('Universal') genetic code or with one of a
selection of non-standard codes that may predict the hypothetical
protein sequence more accurately. Please select the most appropriate
genetic code for the species from which the sequence was obtained. More about genetic codes.
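The six reading frames described in this section can be sketched in Python as follows. This is a minimal illustration, not the translation program used by the service. It assumes the standard genetic code only, translates complete codons only (so a trailing partial codon is dropped rather than reported as an extra residue or X), and lists the reverse frames by their offset into the reverse complement, which need not match the _4 to _6 numbering of the example output above. The names `translate` and `six_frames` are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> list:
    """Return 3 forward translations followed by 3 reverse translations."""
    dna = dna.upper().replace("U", "T")
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


frames = six_frames("gcagccgggcggccgcagaagcgcccaggcccgcgcgccacccct")
for number, protein in enumerate(frames, start=1):
    print(number, protein)
```

To use a non-standard genetic code with this sketch you would substitute a different amino acid string for `AMINO_ACIDS`, taken from the appropriate genetic code table.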
- MIN. OPEN READING FRAME SIZE
For example, if you set this option to 100, then when a nucleotide sequence is translated
to a hypothetical protein sequence and a stop codon is hit before a
hundred nucleotide bases have been translated, the hypothetical protein
sequence will be discarded, and the application will commence
translating the next piece of nucleotide sequence. This means that any
pieces of nucleotide sequence that are less than 100 bases long before
hitting a stop codon (100 bases code for 33 amino acids) will be excluded
from the experiment.
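The effect of this option can be sketched in Python. This is an illustration under stated assumptions rather than the service's actual implementation: it assumes that a translated frame marks stop codons with `*`, and that a piece is kept when 3 times its number of amino acids is at least the chosen minimum. The function name `filter_by_min_orf` is hypothetical.

```python
from typing import List


def filter_by_min_orf(translated_frame: str, min_nucleotides: int) -> List[str]:
    """Keep the pieces between stop codons that are long enough.

    Each amino acid corresponds to 3 nucleotide bases, so a piece of
    n amino acids is treated as 3 * n bases (assumption of this sketch).
    """
    pieces = translated_frame.split("*")  # '*' marks a stop codon
    return [piece for piece in pieces if len(piece) * 3 >= min_nucleotides]
```

With a minimum of 100, a piece of 33 amino acids (99 bases) is discarded under this assumption, while a piece of 34 amino acids (102 bases) is kept.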
- CRC (Internal use only)
Every sequence has a CRC (cyclic redundancy check) checksum. When a sequence is submitted to InterProScan, its
CRC is checked against a precomputed list of matches of protein
sequences to InterPro entries (contained in the IPRMATCHES
database). If the CRC of the query sequence matches one in the
precomputed results, this result is returned to the user and
InterProScan is not executed. If the CRC does not match anything,
InterProScan is launched on the query sequence.
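The lookup logic described above can be sketched in Python. Everything named here is an assumption made for illustration: the text does not say which CRC algorithm is used, so `zlib.crc32` stands in for it, a plain dictionary stands in for the IPRMATCHES database, and `lookup_or_run` and `run_scan` are hypothetical names.

```python
import zlib
from typing import Callable, Dict, List


def sequence_crc(sequence: str) -> int:
    """Checksum of a sequence (zlib.crc32 is a stand-in, not the real CRC)."""
    return zlib.crc32(sequence.upper().encode("ascii"))


def lookup_or_run(
    sequence: str,
    precomputed: Dict[int, List[str]],
    run_scan: Callable[[str], List[str]],
) -> List[str]:
    """Return precomputed matches if the CRC is known, else run the scan."""
    crc = sequence_crc(sequence)
    if crc in precomputed:
        # Known sequence: return the stored result without running anything.
        return precomputed[crc]
    # Unknown sequence: launch the (much slower) scan.
    return run_scan(sequence)
```

The point of the design is that a checksum comparison is far cheaper than running every application, so sequences that have already been analysed are answered immediately.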
- REFERENCES
1. The InterPro Consortium (*R.Apweiler, T.K.Attwood, A.Bairoch,
A.Bateman, E.Birney, M.Biswas, P.Bucher, L.Cerutti, F.Corpet,
M.D.R.Croning, R.Durbin, L.Falquet, W.Fleischmann, J.Gouzy,
H.Hermjakob, N.Hulo, I.Jonassen, D.Kahn, A.Kanapin, Y.Karavidopoulou,
R.Lopez, B.Marx, N.J.Mulder, T.M.Oinn, M.Pagni, F.Servant,
C.J.A.Sigrist, E.M.Zdobnov), "The InterPro database, an integrated documentation resource for protein families, domains and functional sites".
Nucleic Acids Res, 2001. 29(1): p. 37-40.
2. Hofmann K., Bucher P., Falquet L., and Bairoch A., "The Prosite Database, Its Status in 1999".
Nucleic Acids Res, 1999. 27(1): p. 215-9.
3. Attwood T.K., Croning M.D., Flower D.R., Lewis A.P., Mabey J.E., Scordis P., Selley J.N., and Wright W., "Prints-S: The Database Formerly Known as Prints".
Nucleic Acids Res, 2000. 28(1): p. 225-7.
4. Bateman A., Birney E., Durbin R., Eddy S.R., Howe K.L., and
Sonnhammer E.L., "The Pfam Protein Families Database". Nucleic Acids
Res, 2000. 28(1): p. 263-6.
5. Corpet F., Gouzy J., and Kahn D., "Recent Improvements of the Prodom Database of Protein Domain Families".
Nucleic Acids Res, 1999. 27(1): p. 263-7.
6. Schultz J., Copley R.R., Doerks T., Ponting C.P., and Bork P., "Smart: A Web-Based Tool for the Study of Genetically Mobile Domains".
Nucleic Acids Res, 2000. 28(1): p. 231-4.
7. Bucher P., Karplus K., Moeri N., and Hofmann K., "A Flexible Motif Search Technique Based on Generalised Profiles".
Comput Chem, 1996. 20(1): p. 3-23.
8. Scordis P., Flower D.R., and Attwood T.K.,
"Fingerprintscan: Intelligent Searching of the Prints Motif Database".
Bioinformatics, 1999. 15(10): p. 799-806.
9. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J.,
"Gapped Blast and Psi-Blast: A New Generation of Protein Database Search Programs".
Nucleic Acids Res, 1997. 25(17): p. 3389-402.
11. Haft D.H., Loftus B.J., Richardson D.L., Yang F., Eisen J.A., Paulsen I.T., and White O., "TIGRFAMs: a protein family resource for the functional identification of proteins".
Nucleic Acids Res, 2001. 29(1): p. 41-3.
12. Eddy S.R., "HMMER:
Profile hidden Markov models for biological sequence analysis".
WWW, 2001. http://hmmer.wustl.edu/
- OTHER SERVICES:
This service is also available as an application from the EBI's
SRS server: http://srs.ebi.ac.uk/