Universal Protein Resource (UniProt)
====================================

The Universal Protein Resource (UniProt), a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and
the Protein Information Resource (PIR), comprises three databases, each
optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the
central access point for extensively curated protein information, including
function, classification and cross-references. The UniProt Reference Clusters
(UniRef) combine closely related sequences into a single record to speed up
sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive
repository of all protein sequences, consisting only of unique identifiers and
sequences.

UniParc
=======

The UniProt Archive (UniParc) is a non-redundant protein sequence archive,
containing all new and revised protein sequences from all publicly available
sources (http://www.uniprot.org/help/uniparc) to ensure that complete sequence
coverage is available at a single site.

To avoid redundancy, all sequences 100% identical over the entire length are
merged, regardless of the source organism. New and updated sequences are
cross-referenced to the source database accession number, and provided with a
sequence version that increments upon changes to the underlying sequence.

The basic information stored within each UniParc entry is the identifier, the
sequence, a cyclic redundancy check number, the source database(s) with
accession and version numbers, and a time stamp. If a UniParc entry lacks a
cross-reference to a UniProtKB entry, the reason for its exclusion from
UniProtKB is provided (e.g. pseudogene). In addition, each source database
accession number is tagged with its status in that database, indicating
whether the sequence still exists or has been deleted there, and with
cross-references to NCBI GI and TaxId if appropriate.

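The merging rule described above can be illustrated with a short sketch. This
is not the UniParc implementation: the record layout, the database names
(DB_A, DB_B), the accessions and the SKETCH identifiers are all hypothetical,
and the sketch assumes that "identical" means exact string equality of the
full sequence. It only shows how several source records collapse into one
entry that keeps a cross-reference to each source.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    """One non-redundant record: a unique sequence plus all of its sources."""
    identifier: str  # hypothetical identifier, not a real UniParc ID
    sequence: str
    cross_references: list = field(default_factory=list)


def merge_identical(records):
    """Merge records whose sequences are identical over their entire length.

    `records` is an iterable of (source_db, accession, sequence) tuples.
    The source organism is deliberately not part of the lookup key.
    """
    entries = {}
    for source_db, accession, sequence in records:
        entry = entries.get(sequence)
        if entry is None:
            # First time this exact sequence is seen: create a new entry.
            entry = ArchiveEntry(f"SKETCH{len(entries) + 1:04d}", sequence)
            entries[sequence] = entry
        # Every source record is kept as a cross-reference on the entry.
        entry.cross_references.append((source_db, accession))
    return list(entries.values())


# Made-up example records for illustration only.
records = [
    ("DB_A", "A0001", "MKTAYIAKQR"),
    ("DB_B", "B0042", "MKTAYIAKQR"),   # identical sequence, different source
    ("DB_A", "A0002", "MKTAYIAKQRQ"),  # one residue longer: separate entry
]

archive = merge_identical(records)
for entry in archive:
    print(entry.identifier, entry.sequence, entry.cross_references)
```

In this sketch the first two records share one entry with two
cross-references, while the third record gets an entry of its own because it
differs in length.
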
This directory contains the following files:

uniparc_active.fasta.gz
    UniParc sequences with active cross-references to the source database.

uniparc_all.xml.gz
    All UniParc sequences, including those that have been deleted from the
    source database. It also includes:
    - cross-references to the source databases
    - status of the sequence in the source database (e.g. if the sequence
      still exists, the status will be "active")
    - source database accessions and version numbers
    - if the sequence is not in UniProtKB, the reason for its exclusion
    - cross-references to NCBI GI and TaxID if appropriate

uniparc.xsd
    Schema definition for the UniParc XML

These files are updated with each UniProt release.

--------------------------------------------------------------------------------
LICENSE
--------------------------------------------------------------------------------
We have chosen to apply the Creative Commons Attribution 4.0 International
(CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/) to all
copyrightable parts of our databases.

(c) 2002-2021 UniProt Consortium

--------------------------------------------------------------------------------
DISCLAIMER
--------------------------------------------------------------------------------
We make no warranties regarding the correctness of the data, and disclaim
liability for damages resulting from its use. We cannot provide unrestricted
permission regarding the use of the data, as some data may be covered by
patents or other rights.

Any medical or genetic information is provided for research, educational and
informational purposes only. It is not in any way intended to be used as a
substitute for professional medical advice, diagnosis, treatment or care.