Universal Protein Resource (UniProt)
====================================


The Universal Protein Resource (UniProt), a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and
the Protein Information Resource (PIR), comprises three databases, each
optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the
central access point for extensively curated protein information, including
function, classification and cross-references. The UniProt Reference Clusters
(UniRef) combine closely related sequences into a single record to speed up
sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive
repository of all protein sequences, consisting only of unique identifiers and
sequences.


IDMAPPING
=========

This directory, databases/uniprot/current_release/knowledgebase/idmapping/,
contains the ID mapping data files, which are updated in conjunction with the
UniProt Knowledgebase (UniProtKB). Whenever available, the mappings are
extracted from the UniProtKB records.

All files listed below contain the complete data sets corresponding to the
most recent release.

1) idmapping.dat
This file has three tab-delimited columns:
1. UniProtKB-AC
2. ID_type
3. ID
where ID_type is the database name as it appears in UniProtKB cross-references
and as supported by the ID mapping tool on the UniProt web site
(http://www.uniprot.org/mapping), and ID is the identifier in that
cross-referenced database.
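
As an illustration of the three-column layout described above, the following
is a minimal Python sketch that groups rows by accession and ID_type. The
function name read_idmapping, the wanted_types parameter, and the sample
values (ACC0001, DB_A, X1, ...) are made up for this example and are not
taken from a real release; the sketch only assumes one record per line with
exactly three tab-separated fields.

```python
import io
from collections import defaultdict


def read_idmapping(handle, wanted_types=None):
    """Group three-column rows as {accession: {id_type: [ids]}}.

    handle       -- any iterable of text lines (an open file, for instance)
    wanted_types -- optional set of ID_type names to keep (hypothetical helper
                    parameter, not part of the file format)
    """
    mapping = defaultdict(lambda: defaultdict(list))
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        # Columns: UniProtKB-AC, ID_type, ID
        accession, id_type, ext_id = line.split("\t")
        if wanted_types is not None and id_type not in wanted_types:
            continue
        mapping[accession][id_type].append(ext_id)
    return mapping


# Made-up sample rows, used only to show the shape of the data.
sample = io.StringIO(
    "ACC0001\tDB_A\tX1\n"
    "ACC0001\tDB_A\tX2\n"
    "ACC0001\tDB_B\tY1\n"
    "ACC0002\tDB_A\tX3\n"
)
result = read_idmapping(sample, wanted_types={"DB_A"})
```

For the full file, passing an open file object instead of the in-memory
sample and restricting wanted_types to the databases of interest should keep
memory use down, since the complete file is large.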


2) idmapping_selected.tab
We also provide this tab-delimited table, which has one column for each of
the following mappings:

1. UniProtKB-AC
2. UniProtKB-ID
3. GeneID (EntrezGene)
4. RefSeq
5. GI
6. PDB
7. GO
8. UniRef100
9. UniRef90
10. UniRef50
11. UniParc
12. PIR
13. NCBI-taxon
14. MIM
15. UniGene
16. PubMed
17. EMBL
18. EMBL-CDS
19. Ensembl
20. Ensembl_TRS
21. Ensembl_PRO
22. Additional PubMed
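
The column order listed above can be turned into named fields when reading
the table. The following is a minimal Python sketch under the assumptions
that the file has no header line and that every row has exactly 22
tab-separated fields; the function name read_selected and the sample values
are made up for this example. Field contents are returned as raw strings,
without any further splitting.

```python
import csv
import io

# Column names in the order given in the list above.
COLUMNS = [
    "UniProtKB-AC", "UniProtKB-ID", "GeneID (EntrezGene)", "RefSeq", "GI",
    "PDB", "GO", "UniRef100", "UniRef90", "UniRef50", "UniParc", "PIR",
    "NCBI-taxon", "MIM", "UniGene", "PubMed", "EMBL", "EMBL-CDS", "Ensembl",
    "Ensembl_TRS", "Ensembl_PRO", "Additional PubMed",
]


def read_selected(handle):
    """Yield one {column name: raw string} dict per row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # csv yields an empty list for blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, found {len(row)}"
            )
        yield dict(zip(COLUMNS, row))


# Made-up sample row: only three of the 22 fields are filled in.
fields = ["ACC0001", "NAME_X"] + [""] * 20
fields[COLUMNS.index("NCBI-taxon")] = "0000"
records = list(read_selected(io.StringIO("\t".join(fields) + "\n")))
```

Because read_selected is a generator, rows can be filtered one at a time (for
example on the NCBI-taxon field) without loading the whole table.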


3) example files
idmapping_selected.tab.example contains the first 1,000 lines of idmapping_selected.tab
idmapping.dat.example contains the first 10,000 lines of idmapping.dat


4) We provide separate ID mapping tables for selected model organisms in the
by_organism subdirectory.


5) idmapping.dat.2015_03 and idmapping_selected.tab.2015_03
These are archived versions of idmapping.dat and idmapping_selected.tab,
respectively, for release 2015_03. This was the last release before proteome
redundancy reduction (http://www.uniprot.org/help/proteome_redundancy), which
caused the size of UniProtKB/TrEMBL to drop from 92 million to 47 million
entries. Users trying to map obsolete identifiers to external databases, and
vice versa, may find these files useful.


6) README
This file.


The /complete/docs subdirectory contains various UniProt documents.


--------------------------------------------------------------------------------
  LICENSE
--------------------------------------------------------------------------------
We have chosen to apply the Creative Commons Attribution 4.0 International
(CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/) to all
copyrightable parts of our databases.

(c) 2002-2022 UniProt Consortium

--------------------------------------------------------------------------------
  DISCLAIMER
--------------------------------------------------------------------------------
We make no warranties regarding the correctness of the data, and disclaim
liability for damages resulting from its use. We cannot provide unrestricted
permission regarding the use of the data, as some data may be covered by patents
or other rights.

Any medical or genetic information is provided for research, educational and
informational purposes only. It is not in any way intended to be used as a
substitute for professional medical advice, diagnosis, treatment or care.