Universal Protein Resource (UniProt)
====================================

The Universal Protein Resource (UniProt), a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and
the Protein Information Resource (PIR), comprises three databases, each
optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the
central access point for extensively curated protein information, including
function, classification and cross-references. The UniProt Reference Clusters
(UniRef) combine closely related sequences into a single record to speed up
sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive
repository of all protein sequences, consisting only of unique identifiers and
sequences.

This directory contains files of amino acid altering variants imported from
Ensembl Variation databases. Mapped sequence variants are supplied per species
in a tab-delimited text file. Variants that are manually annotated in
UniProtKB/Swiss-Prot, for Homo sapiens only, are available in the humsavar.txt
document.

This directory contains the following files:

humsavar.txt:
Index of manually curated human polymorphisms and disease mutations from
UniProtKB/Swiss-Prot.

aedes_aegypti_variation.txt.gz
The UniProtKB Aedes aegypti reference proteome strain is LVPib12; this is the
same strain used by Ensembl Genomes. Ensembl Genomes variation data is derived
from two sets of variation data, both imported via VectorBase.

bos_taurus_variation.txt.gz
The UniProtKB Bos taurus (Cow) reference proteome breed is Hereford; this is
the same breed used by Ensembl. Variants are sourced from dbSNP, Online
Mendelian Inheritance in Animals (OMIA), the Animal Quantitative Trait Loci
(QTL) database (Animal QTLdb) and Database of Genomic Variants Archive (DGVa).

brachypodium_distachyon_variation.txt.gz
The UniProtKB Brachypodium distachyon (Purple false brome) reference proteome
strain is cv.
Bd21; this is the same strain used by the Genome Reference Consortium for
their primary assembly. Ensembl Genomes variation data comes from variations
identified by the alignment of transcriptome assemblies from three slender
false brome (Brachypodium sylvaticum) populations.

canis_familiaris_variation.txt.gz
The UniProtKB Canis lupus familiaris (Dog) reference proteome breed is Boxer;
this is the same breed used by Ensembl. Variants are sourced from dbSNP, Online
Mendelian Inheritance in Animals (OMIA) and Database of Genomic Variants
Archive (DGVa).

danio_rerio_variation.txt.gz
The UniProtKB Danio rerio (Zebrafish) reference proteome strain is Tuebingen;
this is the same strain used by the Genome Reference Consortium for their
primary assembly. Ensembl Variation sources variants from multiple strains and
maps them to the primary assembly; therefore the zebrafish variants defined in
this file may have been discovered in another strain of zebrafish.

equus_caballus_variation.txt.gz
The UniProtKB Equus caballus (Horse) reference proteome breed is Thoroughbred;
this is the same breed used by Ensembl. Variants are sourced from dbSNP, Online
Mendelian Inheritance in Animals (OMIA), the Animal Quantitative Trait Loci
(QTL) database (Animal QTLdb) and Database of Genomic Variants Archive (DGVa).

fusarium_oxysporum_variation.txt.gz
The UniProtKB Fusarium oxysporum reference proteome strain is
4287 / CBS 123668 / FGSC 9935 / NRRL 34936; this is the same strain used by
Ensembl Genomes. Ensembl Genomes variation data is derived from comparing 27
different strains of this species.

gallus_gallus_variation.txt.gz
The UniProtKB Gallus gallus (Chicken) reference proteome breed is Red Jungle
fowl, inbred line UCD001; this is the same breed used by Ensembl. Variants are
sourced from dbSNP, Online Mendelian Inheritance in Animals (OMIA), the Animal
Quantitative Trait Loci (QTL) database (Animal QTLdb) and Database of Genomic
Variants Archive (DGVa).

homo_sapiens_variation.txt.gz:
The variants listed are the protein altering variants (SO:0001583) from the
Ensembl Variation databases' set of 1000 Genomes Project
(http://www.1000genomes.org/) variants and from the Catalogue of Somatic
Mutations In Cancer (COSMIC) v71, imported directly from COSMIC and via Ensembl
Variation. COSMIC v71 variants are the last freely available somatic variants
from COSMIC before their licence change; therefore the accuracy of the
information provided for a COSMIC variant should be verified with COSMIC.

hordeum_vulgare_variation.txt.gz
The UniProtKB Hordeum vulgare reference proteome strain is cv. Morex; this is
the same strain used by Ensembl Genomes. Ensembl Genomes variation data is
derived from WGS survey sequencing of four cultivars (Barke, Bowman, Igri and
Haruna Nijo) and a wild barley (H. spontaneum); SNPs discovered from RNA-Seq
performed on the embryo tissues of 9 spring barley varieties (Barke, Betzes,
Bowman, Derkado, Intro, Optic, Quench, Sergeant and Tocada) and Morex;
population sequencing of 90 Morex x Barke individuals; population sequencing of
84 Oregon Wolfe barley individuals; and SNPs from the Illumina iSelect 9k
barley SNP chip.

ixodes_scapularis_variation.txt.gz
The UniProtKB Ixodes scapularis reference proteome strain is Wikel; this is the
same strain used by Ensembl Genomes. Ensembl Genomes variation data is derived
from ten populations of Ixodes scapularis, imported from VectorBase.

macaca_mulatta_variation.txt.gz
The UniProtKB Macaca mulatta (Macaque) reference proteome strain is 17573; this
is the same strain used by Ensembl. Variants are sourced from dbSNP, Online
Mendelian Inheritance in Animals (OMIA) and Database of Genomic Variants
Archive (DGVa).

meleagris_gallopavo_variation.txt.gz
The UniProtKB Meleagris gallopavo (Turkey) variants are sourced from dbSNP and
Online Mendelian Inheritance in Animals (OMIA).

monodelphis_domestica_variation.txt.gz
The UniProtKB Monodelphis domestica (Opossum) variants are sourced from dbSNP.

mus_musculus_variation.txt.gz
The UniProtKB Mus musculus (Mouse) reference proteome strain is C57BL/6J; this
is the same strain used by the Genome Reference Consortium for their primary
assembly. Ensembl Variation sources variants from multiple strains and maps
them to the primary assembly; therefore the mouse variants defined in this file
may have been discovered in another strain of mouse.

nomascus_leucogenys_variation.txt.gz
The UniProtKB Nomascus leucogenys (Gibbon) variants are sourced from Ensembl.

ornithorhynchus_anatinus_variation.txt.gz
The UniProtKB Ornithorhynchus anatinus (Platypus) reference proteome is from an
individual female called Glennie; this is the same individual used by Ensembl.
Variants are sourced from dbSNP and Ensembl.

oryza_glaberrima_variation.txt.gz
The UniProtKB Oryza glaberrima (African rice) reference proteome strain is
IRGC 96717; this is the same strain used by the Genome Reference Consortium for
their primary assembly. Ensembl Genomes variation data comes from two
(unpublished) sources: 20 diverse accessions of Oryza glaberrima and 19
accessions of its wild progenitor, Oryza barthii, collected from geographically
distributed regions of Africa.

oryza_indica_variation.txt.gz
The UniProtKB Oryza sativa (indica) reference proteome strain is cv. 93-11;
this is the same strain used by the Genome Reference Consortium for their
primary assembly. Ensembl Genomes variation data comes from two NCBI dbSNP
sources: SNPs called from the comparison of Oryza sativa Indica and Oryza
sativa Japonica, and SNPs resulting from OMAP project alignments between
O. glaberrima, O. punctata, O. nivara, and O. rufipogon against O. sativa
Japonica, mapped to O. sativa indica.

oryza_sativa_variation.txt.gz
The UniProtKB Oryza sativa Japonica reference proteome strain is cv.
Nipponbare; this is the same strain used by the Genome Reference Consortium for
their primary assembly.
Ensembl Genomes variation data comes from a collection of SNPs produced by the
BGI based on comparison of the Japonica and Indica genomes, SNPs derived from
the OMAP project, a SNP variation study involving 1311 SNPs across 395
accessions, and OryzaSNP, a large scale SNP variation study involving ~160K
SNPs in 20 diverse rice accessions.

ovis_aries_variation.txt.gz
The UniProtKB Ovis aries (Sheep) variants are sourced from dbSNP, Online
Mendelian Inheritance in Animals (OMIA), and the Animal Quantitative Trait Loci
(QTL) database (Animal QTLdb).

phytophthora_infestans_variation.txt.gz
The UniProtKB Phytophthora infestans reference proteome strain is T30-4; this
is the same strain used by Ensembl Genomes. Ensembl Genomes variation data
derives from resequencing of 3 different strains: PIC99189 (ERP000341), 90128
(ERP000343) and T30-4 (ERP000344).

plasmodium_falciparum_variation.txt.gz
The UniProtKB Plasmodium falciparum reference proteome strain is Isolate 3D7;
this is the same strain used by Ensembl Genomes. Ensembl Genomes variation data
is a direct import from dbSNP.

pongo_abelii_variation.txt.gz
The UniProtKB Pongo abelii (Orangutan) variants are sourced from dbSNP.

solanum_lycopersicum_variation.txt.gz
The UniProtKB Solanum lycopersicum reference proteome strain is cv. Heinz 1706;
this is the same strain used by Ensembl Genomes. Ensembl Genomes variation data
comprises genetic variation from sequencing of a selection of 84 tomato
accessions and related wild species representative of the Lycopersicon,
Arcanum, Eriopersicon and Neolycopersicon groups. The variation data has been
submitted to the ENA with accession ERP004618.

sorghum_bicolor_variation.txt.gz
The UniProtKB Sorghum bicolor reference proteome strain is cv. BTx623; this is
the same strain used by Ensembl Genomes. Ensembl Genomes variation data is
derived from two studies: Morris et al. 2013. Proc. Natl. Acad. Sci. U.S.A.
110:453-458 and Mace et al. 2013. Nat Commun. 4:2320.

sus_scrofa_variation.txt.gz
The UniProtKB Sus scrofa (Pig) variants are sourced from dbSNP, the Animal
Quantitative Trait Loci (QTL) database (Animal QTLdb), Database of Genomic
Variants Archive (DGVa) and the Pig SNP Consortium.

taeniopygia_guttata_variation.txt.gz
The UniProtKB Taeniopygia guttata (Zebra finch) variants are sourced from
dbSNP.

triticum_aestivum_variation.txt.gz
The UniProtKB Triticum aestivum reference proteome strain is cv. Chinese
Spring; this is the same strain used by Ensembl Genomes. Ensembl Genomes
variation data is derived from SNP markers provided by CerealsDB, from the
University of Bristol.

vitis_vinifera_variation.txt.gz
The UniProtKB Vitis vinifera reference proteome strain is cv. Pinot noir
PN40024; this is the same strain used by Ensembl Genomes. Ensembl Genomes
variation data derives from a collection of grape cultivars and wild Vitis
species from the USDA germplasm collection.

--------------------------------------------------------------------------------
  LICENSE
--------------------------------------------------------------------------------
We have chosen to apply the Creative Commons Attribution 4.0 International
(CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/) to all
copyrightable parts of our databases.

(c) 2002-2022 UniProt Consortium

--------------------------------------------------------------------------------
  DISCLAIMER
--------------------------------------------------------------------------------
We make no warranties regarding the correctness of the data, and disclaim
liability for damages resulting from its use. We cannot provide unrestricted
permission regarding the use of the data, as some data may be covered by
patents or other rights.

Any medical or genetic information is provided for research, educational and
informational purposes only. It is not in any way intended to be used as a
substitute for professional medical advice, diagnosis, treatment or care.