The PROSITE database of protein families and domains
Release Notes

Release 18, July 2003


Table of contents

 1   Introduction
 2   Description of the changes made to PROSITE since release 17.0
 3   Forthcoming changes
 4   Status of the PROSITE files
 5   FTP access to PROSITE
 6   Acknowledgments

(1)   Introduction

This release of PROSITE contains 1,200 documentation entries describing 1,639 different patterns, rules or profiles/matrices. Since release 17.0, 96 entries have been updated, and 92 documentation entries and 138 signatures have been added.

The following table shows the growth of the database since its creation in 1989.

Rel.   Date    Doc   Entries  Note
 1.0   03/89     58      60   Only released in PC/Gene (Version 5.16)
 2.0   03/89    129     132   Only released in PC/Gene (Version 6.00)
 3.0   05/89      ?     160
 4.0   10/89      ?     202   Printed release (EMBL Biocomputing document)
 5.0   04/90    296     338
 6.0   11/90    375     433
 7.0   05/91    441     508
 8.0   11/91    530     605
 9.0   06/91    580     689
10.0   12/92    635     803
11.0   10/93    715     927
12.0   06/94    785    1029   First release to include profiles
13.0   11/95    889    1167
14.0   12/97    997    1335
15.0   06/98   1014    1352
16.0   07/99   1034    1374
17.0   12/01   1108    1501
18.0   07/03   1200    1639


(2)   Description of the changes made to PROSITE since release 17.0


2.1   New version of the profile scan tools pfscan and pfsearch
For more details on the new implementation see:

2.2   Distribution of a reference tool to scan PROSITE
Since release 17 we have distributed a program (ps_scan) that scans a sequence against all PROSITE patterns, profiles and rules. New output formats are now available, see:

2.3   Cross-reference to PDB chain in the documentation (prosite.doc)
When a signature identifies a specific chain in a PDB entry this chain is indicated in the documentation as follows:
 (see  <PDB:1J5E; M>)
where M is the PDB chain identifier.
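
As an illustration, such cross-references could be extracted from documentation text with a short script. This is only a sketch: the regular expression is inferred from the single example above, the helper name find_pdb_xrefs is hypothetical, and the treatment of a reference without a chain (chain reported as None) is an assumption.

```python
import re

# Pattern inferred from the example "<PDB:1J5E; M>" (an assumption, not an
# official grammar): a 4-character PDB code starting with a digit, optionally
# followed by "; " and a one-character chain identifier.
PDB_XREF = re.compile(
    r"<PDB:\s*([0-9][A-Za-z0-9]{3})\s*(?:;\s*([A-Za-z0-9]))?\s*>"
)


def find_pdb_xrefs(text):
    """Return a list of (pdb_id, chain) tuples found in text.

    chain is None when the cross-reference names no specific chain.
    """
    return [(m.group(1).upper(), m.group(2)) for m in PDB_XREF.finditer(text)]


if __name__ == "__main__":
    print(find_pdb_xrefs("(see  <PDB:1J5E; M>)"))  # [('1J5E', 'M')]
```
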
2.4   Update of cross-references to Swiss-Prot
Cross-references to Swiss-Prot entries in DR lines were previously updated at each release. From release 18.0 onward these lines will be updated at each weekly update.

2.5   Deleted accession numbers

A file listing all deleted accession numbers (psdelac.txt) has been added to the PROSITE package:



(3)   Forthcoming changes

3.1   Introduction of a new method to identify repeats

Repeats generally have high amino acid substitution rates, which makes their identification highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current profile searches. We have implemented a context-dependent threshold that allows the detection of strongly divergent repeats when well-characterized ones have already been identified.

This method will be implemented in ps_scan.pl, the reference tool to scan PROSITE, and the following minor changes will be made to the profile format:

Tags 'R' and 'RR' will be introduced in the TEXT field of MA   /CUT_OFF lines. This minor change is compatible with older versions of pfsearch and pfscan. Example:

MA   /CUT_OFF: LEVEL=0; SCORE=246; N_SCORE=8.5; MODE=1; TEXT='R';
MA   /CUT_OFF: LEVEL=-1; SCORE=158; N_SCORE=5.8; MODE=1; TEXT='RR'; 
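
A tool that reads profiles could pick up the new tags with a small parser along the following lines. This is a minimal sketch, not the ps_scan.pl implementation: the function names parse_cut_off and has_repeat_tag are hypothetical, the field grammar (KEY=value; pairs, with optional single quotes) is inferred from the two example lines, and all values are kept as strings.

```python
import re


def parse_cut_off(line):
    """Parse one 'MA   /CUT_OFF: ...' line into a dict of field name -> string."""
    prefix, _, body = line.partition("/CUT_OFF:")
    if not prefix.startswith("MA") or not body.strip():
        raise ValueError("not a MA /CUT_OFF line: %r" % line)
    fields = {}
    # Each field looks like KEY=value; where value may be single-quoted.
    for key, value in re.findall(r"(\w+)\s*=\s*('[^']*'|[^;]+);", body):
        fields[key] = value.strip().strip("'")
    return fields


def has_repeat_tag(fields):
    """True when the TEXT field carries one of the new tags 'R' or 'RR'."""
    return fields.get("TEXT") in ("R", "RR")


if __name__ == "__main__":
    example = "MA   /CUT_OFF: LEVEL=0; SCORE=246; N_SCORE=8.5; MODE=1; TEXT='R';"
    parsed = parse_cut_off(example)
    print(parsed["LEVEL"], parsed["SCORE"], parsed["TEXT"], has_repeat_tag(parsed))
```

Because the tags live inside the existing TEXT field, a parser that ignores TEXT is unaffected by them, which is consistent with the compatibility statement above.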

3.2   Extension of the DR line length to 76 characters
Swiss-Prot plans to extend the mnemonic code for the protein name from a maximum of 4 characters to a maximum of 5 characters. For example, the mnemonic code for the meiotic recombination protein rec10 is currently 'RE10'. After the introduction of extended entry names it could be changed to the 5-letter code 'REC10'.

This Swiss-Prot modification will introduce a change in the size of PROSITE DR lines. As soon as Swiss-Prot introduces the 5-letter code in ID lines, we will extend PROSITE DR lines to 76 characters.


3.3   Cross-references to external databases

3.4   Modification of the taxonomic range description
The TAXO-RANGE qualifier is used to indicate the taxonomic range of a pattern or a matrix. The syntax of this qualifier is currently as follows:
CC   /TAXO-RANGE=ABEPV;

Some domains are clearly known to be absent from certain taxonomic ranges. For example, no ubiquitination is observed in prokaryotes, so we can exclude the presence of domains linked to ubiquitination in this kingdom. Currently there is no way to distinguish between a range in which we know that a given domain never occurs and a range in which a domain has not been found but where we cannot exclude its presence. To distinguish these two cases, we will change the TAXO-RANGE format as follows:

CC   /TAXO-RANGE=A?EP-;

where '?' indicates that no matches were observed for a given signature in this range but we cannot exclude its presence, and '-' indicates that we can exclude its presence.
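
The proposed format could be decoded as sketched below. The sketch assumes that the qualifier always has five positions in the fixed order A, B, E, P, V (taken from the current-syntax example) and that a position holds either its own letter, '?' or '-'. The function name parse_taxo_range and the status labels are hypothetical.

```python
TAXA = "ABEPV"  # assumed fixed column order, taken from the example above


def parse_taxo_range(line):
    """Map each taxonomic range letter to 'present', 'unknown' or 'excluded'."""
    _, found, value = line.partition("/TAXO-RANGE=")
    value = value.strip().rstrip(";")
    if not found or len(value) != len(TAXA):
        raise ValueError("unexpected TAXO-RANGE line: %r" % line)
    status = {}
    for taxon, char in zip(TAXA, value):
        if char == taxon:
            status[taxon] = "present"    # signature matches in this range
        elif char == "?":
            status[taxon] = "unknown"    # no match seen, presence not excluded
        elif char == "-":
            status[taxon] = "excluded"   # presence can be excluded
        else:
            raise ValueError("unexpected character %r for %s" % (char, taxon))
    return status


if __name__ == "__main__":
    for taxon, state in parse_taxo_range("CC   /TAXO-RANGE=A?EP-;").items():
        print(taxon, state)
```
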

3.5   Version numbers for signatures

The format will be:

CC   /VERSION=n;
where n is an integer.

All signatures predating release 18.0 will have version number 1. The version number will be incremented only when the PA or MA lines of a signature are modified.
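
The versioning rule can be sketched as a comparison of the PA and MA lines of two revisions of an entry. This is an illustration under stated assumptions, not the procedure used to build PROSITE: the function names are hypothetical, trailing whitespace is ignored when comparing lines, and the sample entry lines are made up for the example.

```python
def signature_lines(entry_lines):
    """Keep only the PA and MA lines, ignoring trailing whitespace."""
    return [line.rstrip() for line in entry_lines if line.startswith(("PA ", "MA "))]


def next_version(old_lines, new_lines, current_version):
    """Increment the version only when the PA or MA lines differ."""
    if signature_lines(old_lines) != signature_lines(new_lines):
        return current_version + 1
    return current_version


if __name__ == "__main__":
    # Made-up entry lines, for illustration only.
    old = ["DE   First description.", "PA   C-x(2)-C."]
    doc_only = ["DE   Reworded description.", "PA   C-x(2)-C."]
    new_pattern = ["DE   First description.", "PA   C-x(2,3)-C."]
    print(next_version(old, doc_only, 1))     # version unchanged
    print(next_version(old, new_pattern, 1))  # version incremented
```
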



(4)   Status of the PROSITE files

PROSITE is distributed with different data and documentation files. The following table lists the files that are currently available.

prosuser.txt    User manual
profile.txt     Description of the profile syntax
psrelnot.txt    Release notes for the current release
prosite.dat     Patterns, profiles and rules databases (updated weekly)
prosite.doc     Documentation database for each pattern and profile (updated weekly)
prosite.lis     List of documentation entries (updated weekly)
pautindex.txt   Authors index (updated weekly)
psdelac.txt     Deleted accession number index (updated weekly)
experts.txt     List of on-line experts for PROSITE and Swiss-Prot (updated weekly)
jourlist.txt    List of cited journals in PROSITE (updated weekly)
ps_98.txt       Announcement concerning PROSITE

We have continued to include, in some PROSITE documentation entries, references to Web sites relevant to the subject under consideration. There are now 69 documents that include such links.

(5)   FTP access to PROSITE

PROSITE is available for download from the following anonymous FTP server:

Organization   Swiss Institute of Bioinformatics (SIB)
Address        ftp.expasy.org
Directory      /databases/prosite/


(6)   Acknowledgments

This release of PROSITE has been prepared by:

Nicolas Hulo (1) and Christian Sigrist (1),
with the help of: Edouard De Castro (1), Elisabeth Gasteiger (1), Alexandre Gattiker (1), Virginie Le Saux (1), Petra Langendijk-Genevaux (1), Amos Bairoch (1) and Philipp Bucher (2).

(1) Swiss-Prot group, Swiss Institute of Bioinformatics.
(2) ISREC bioinformatics group, Swiss Institute of Bioinformatics.