* SPD Dataset Release, Aug. 11, 2004 *

spd.all.gz
    description: All SPD protein sequences in FASTA format.
    format: Header line (protein_id, name if available, description, species, data source, length), sequence line.

spd.nr90.gz
    description: A non-redundant version of "spd.all.gz". Sequences were compared by pairwise blastp; those with overall identity >= 90% and overall coverage >= 90% were excluded. Overall coverage is the ratio of the tiled HSP length to the length of the shorter of the query and subject sequences.
    format: Identical to "spd.all.gz".

spd.nr90.species.gz
    description: Like "spd.nr90.gz", but redundancy is removed only within the same species, i.e., similar sequences from different organisms are retained.
    format: Identical to "spd.all.gz".

spd.mrna.gz
    description: Corresponding mRNA sequences in FASTA format, for all entries that have one.
    format: Header line (mRNA accession number, protein_id, length), sequence line.

spd.name.gz
    description: A list of SPD protein names, where available (tab-delimited).
    format: Protein_id, official name, rank, species (ordered by species and rank).

spd.name.nr90.gz
    description: The version of "spd.name.gz" corresponding to "spd.nr90.gz".
    format: Identical to "spd.name.gz".

spd.name.nr90.species.gz
    description: The version of "spd.name.gz" corresponding to "spd.nr90.species.gz".
    format: Identical to "spd.name.gz".

spd.ref.gz
    description: Reference (paper) number for each protein in SPD (tab-delimited).
    format: Protein_id, number.

spd.classes.domain.gz
    description: Representative domains used to classify SPD core proteins (tab-delimited).
    format: Domain accession number, domain name, functional category (ordered by category and accession number).

spd.classes.pro.gz
    description: Classification of SPD core proteins (known SwissProt secreted proteins excluded) via domain analysis (tab-delimited).
    format: Protein_id, functional category (ordered by category and protein_id).

spd_core_ref.mcl.gz
    description: Clusters of SPD core proteins together with nine reference datasets, generated by tribe-mcl.
    format: Cluster_id, protein_id, protein source, number of members in the cluster.
    notes: Some proteins may have multiple data sources. For example, both the GO-derived secreted proteins and the SwissProt vertebrate secreted proteins contain "MM20_BOVIN", "WN3A_XENLA", etc.

For further information, please refer to http://spd.cbi.pku.edu.cn/help/spd_help.php

Center of Bioinformatics, Peking University
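
The redundancy rule for "spd.nr90.gz" and the tab-delimited layout of "spd.name.gz" can be illustrated with a minimal Python sketch. It is not the pipeline used to build the release. All function names are hypothetical, and several points are assumptions: HSP coordinates are taken as 1-based inclusive positions on the shorter sequence, identity and coverage are percentages, and the name files are assumed to have no header row.

```python
import csv
import gzip


def tiled_length(intervals):
    """Length of the union of 1-based inclusive (start, end) intervals."""
    covered = 0
    block_start = block_end = None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end + 1:
            # Disjoint from the running block: close it, start a new one.
            if block_end is not None:
                covered += block_end - block_start + 1
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start + 1
    return covered


def overall_coverage(hsps_on_shorter, query_len, subject_len):
    """Percent of the shorter sequence covered by the tiled HSPs.

    Assumption: the HSP coordinates refer to the shorter sequence.
    """
    return 100.0 * tiled_length(hsps_on_shorter) / min(query_len, subject_len)


def is_redundant(identity, coverage, threshold=90.0):
    """Apply the nr90 rule: both values must reach the threshold."""
    return identity >= threshold and coverage >= threshold


def read_name_table(path):
    """Read a tab-delimited spd.name*.gz file into a list of dicts.

    Assumption: four columns per row and no header row.
    """
    fields = ("protein_id", "official_name", "rank", "species")
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [dict(zip(fields, row)) for row in reader]
```

For example, two overlapping HSPs at positions 1-50 and 40-95 on a 100-residue sequence tile to 95 residues, so the pair would count as redundant only if its overall identity were also at least 90%.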