#### README ####

IMPORTANT: Please note you can download subsets of data via the
BioMart data mining tool.
See https://www.ensembl.org/info/data/biomart/ for more information.

####################
FASTA Peptide Dumps
####################
These files hold the protein translations of Ensembl genes.

-----------
FILE NAMES
-----------
The files are consistently named following this pattern:
   <species>.<assembly>.<sequence type>.<status>.fa.gz

<species>:       The systematic name of the species.
<assembly>:      The assembly build name.
<sequence type>: 'pep' for peptide sequences.
<status>:
  * 'all' (as in 'pep.all') - all translations resulting from Ensembl
     genes.
  * 'abinitio' (as in 'pep.abinitio') - translations resulting from
     'ab initio' gene prediction algorithms such as SNAP and GENSCAN.
     In general, all 'ab initio' predictions are based solely on the
     genomic sequence and not on any other experimental evidence.
     Therefore, not all GENSCAN or SNAP predictions represent
     biologically real proteins.
fa:  All files in these directories are FASTA-format sequence files.
gz:  All files are compressed with GNU Zip (gzip) for storage efficiency.

EXAMPLES (Note: Not all species have 'pep.abinitio' data)
 for Human:
    Homo_sapiens.NCBI36.pep.all.fa.gz
      contains all annotated peptides
    Homo_sapiens.NCBI36.pep.abinitio.fa.gz
      contains all ab initio predicted peptides
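
As an illustration of the naming pattern above, the following Python
sketch splits a file name into its parts. The function name
parse_dump_filename and the regular expression are assumptions made for
this example and are not part of any Ensembl tool. The expression allows
dots inside the assembly name, which is also an assumption.

```python
import re

# Hypothetical pattern derived from:
#   <species>.<assembly>.<sequence type>.<status>.fa.gz
# Assumption: the species name contains no dots, the assembly name may.
FILENAME_RE = re.compile(
    r"^(?P<species>[^.]+)"
    r"\.(?P<assembly>.+)"
    r"\.(?P<seqtype>pep)"
    r"\.(?P<status>all|abinitio)"
    r"\.fa\.gz$"
)


def parse_dump_filename(name):
    """Return the components of a peptide dump file name as a dict."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised peptide dump name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_dump_filename("Homo_sapiens.NCBI36.pep.all.fa.gz"))
```

A name that does not fit the pattern raises ValueError rather than
returning a partial result.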

----------------------------
FASTA Sequence Header Lines
----------------------------
The FASTA sequence header lines are designed to be consistent across
all types of Ensembl FASTA sequences.

Stable IDs for genes, transcripts, and translations are suffixed with
a version if they have been generated by Ensembl (this is typical for
vertebrate species, but not for non-vertebrates).
All ab initio data is unversioned.
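
A minimal sketch of separating the version suffix from a stable ID is
given below, assuming the version is a trailing ".<digits>" part. The
helper name split_stable_id is hypothetical. Because some unversioned
identifiers may themselves end in a dot followed by digits, treat the
result as a heuristic rather than a rule.

```python
def split_stable_id(stable_id):
    """Split 'ID.version' into (ID, version); version is None if absent.

    Assumption: a version is a purely numeric suffix after the last dot.
    """
    base, sep, version = stable_id.rpartition(".")
    if sep and base and version.isdigit():
        return base, int(version)
    return stable_id, None


if __name__ == "__main__":
    print(split_stable_id("ENSP00000328693.1"))
    # An unversioned identifier (made up for illustration) is returned whole.
    print(split_stable_id("EXAMPLE_ABINITIO_0001"))
```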

General format:

>TRANSLATION_ID SEQTYPE LOCATION GENE_ID TRANSCRIPT_ID GENE_BIOTYPE TRANSCRIPT_BIOTYPE

Example of Ensembl Peptide header:

>ENSP00000328693.1 pep chromosome:NCBI35:1:904515:910768:1 gene:ENSG00000158815.1 transcript:ENST00000328693.1 gene_biotype:protein_coding transcript_biotype:protein_coding
 ^                 ^   ^                                   ^                      ^                            ^                           ^
 TRANSLATION_ID    |   LOCATION                            GENE_ID                TRANSCRIPT_ID                GENE_BIOTYPE                TRANSCRIPT_BIOTYPE
                SEQTYPE
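
The header layout above can be read programmatically. The Python sketch
below assumes the first three space-separated fields are
TRANSLATION_ID, SEQTYPE and LOCATION, and that every later field has the
form key:value, as in the example header. The function name parse_header
is hypothetical. Fields without a colon are skipped, so headers carrying
extra free-text fields would need additional handling.

```python
def parse_header(line):
    """Parse one FASTA header line into a dict of its fields."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines must start with '>'")
    fields = line[1:].split()
    if len(fields) < 3:
        raise ValueError("expected at least TRANSLATION_ID SEQTYPE LOCATION")

    record = {
        "translation_id": fields[0],
        "seqtype": fields[1],
        # Kept as one string; its sub-fields are not interpreted here.
        "location": fields[2],
    }
    # Remaining fields are 'key:value' pairs such as gene:ENSG...
    for field in fields[3:]:
        key, sep, value = field.partition(":")
        if sep:
            record[key] = value
    return record


if __name__ == "__main__":
    example = (
        ">ENSP00000328693.1 pep chromosome:NCBI35:1:904515:910768:1 "
        "gene:ENSG00000158815.1 transcript:ENST00000328693.1 "
        "gene_biotype:protein_coding transcript_biotype:protein_coding"
    )
    for key, value in parse_header(example).items():
        print(f"{key}\t{value}")
```

To read headers straight from a compressed dump, the file can be opened
with Python's standard gzip module in text mode, for example
gzip.open(path, "rt"), passing each line that starts with '>' to
parse_header.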