#### README ####

IMPORTANT: Please note you can download subsets of data via the
BioMart data mining tool.
See https://www.ensembl.org/info/data/biomart/ for more information.

##################
FASTA CDS dumps
##################

These files hold the coding sequences corresponding to Ensembl genes.
CDS does not contain UTR or intronic sequence.
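
As a minimal sketch (not an official Ensembl tool), the files can be read
with the Python standard library alone. The function name read_fasta is
hypothetical. The sketch assumes a gzip-compressed FASTA file in which each
record is a '>' header line followed by one or more sequence lines.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    The header is returned without its leading '>'; sequence lines that
    belong to one record are joined into a single string.
    """
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    # Emit the final record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, a large dump can be scanned record by
record without loading the whole file into memory.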

------------
FILE NAMES
------------
The files are consistently named following this pattern:
<species>.<assembly>.<sequence type>.<status>.fa.gz

<species>: The systematic name of the species.
<assembly>: The assembly build name.
<sequence type>: 'cds' for coding sequences (CDS).
<status>:
  * 'all' - all transcript coding sequences resulting from Ensembl genes
            (giving 'cds.all' in the file name).

EXAMPLES
  for Human:
    Homo_sapiens.NCBI37.cds.all.fa.gz
      cds sequences for all protein-coding transcripts
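
The naming pattern above can be taken apart programmatically. The following
is a sketch only: the function name parse_cds_filename and the dictionary
keys are hypothetical. It assumes that the species name contains no dots,
and it treats the last two dot-separated fields before '.fa.gz' as
<sequence type> and <status> (so 'cds' and 'all' for a 'cds.all' file).

```python
def parse_cds_filename(filename):
    """Split a CDS dump file name into the parts of the naming pattern."""
    suffix = ".fa.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix!r}: {filename!r}")
    stem = filename[: -len(suffix)]

    # Split from the right first: the last two fields are assumed to be
    # <sequence type> and <status>. A ValueError is raised if fields are missing.
    head, sequence_type, status = stem.rsplit(".", 2)

    # Split the rest once from the left, so an assembly name that itself
    # contains dots stays in one piece (assumption: the species name has none).
    species, assembly = head.split(".", 1)

    return {
        "species": species,
        "assembly": assembly,
        "sequence_type": sequence_type,
        "status": status,
    }


parts = parse_cds_filename("Homo_sapiens.NCBI37.cds.all.fa.gz")
print(parts["species"], parts["assembly"], parts["sequence_type"], parts["status"])
```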

-------------------------------
FASTA Sequence Header Lines
-------------------------------
The FASTA sequence header lines are designed to be consistent across
all types of Ensembl FASTA sequences.

Stable IDs for genes and transcripts are suffixed with a version number
(for example, the '.1' in ENST00000525148.1) if they have been generated
by Ensembl. This is typical for vertebrate species, but not for
non-vertebrates.
All ab initio data is unversioned.
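
One possible way to separate a version suffix from a stable ID is sketched
below. The function name split_stable_id is hypothetical, and the rule it
applies (a version is a trailing '.' followed only by digits) is an
assumption based on the example IDs in this README. The heuristic cannot
tell a version from an unversioned ID that happens to end in '.<digits>',
so it is best applied only to IDs known to be generated by Ensembl.

```python
def split_stable_id(stable_id):
    """Return (base_id, version); version is None when no suffix is found."""
    base, dot, version = stable_id.rpartition(".")
    # Assumption: a version suffix is a final dot followed by digits only.
    if dot and base and version.isdigit():
        return base, int(version)
    return stable_id, None


print(split_stable_id("ENST00000525148.1"))
print(split_stable_id("ENST00000525148"))
```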

General format:

>TRANSCRIPT_ID SEQTYPE LOCATION GENE_ID GENE_BIOTYPE TRANSCRIPT_BIOTYPE

Example of an Ensembl CDS header:

>ENST00000525148.1 cds chromosome:GRCh37:11:66188562:66193526:1 gene:ENSG00000174576.1 gene_biotype:protein_coding transcript_biotype:nonsense_mediated_decay
 ^                 ^   ^                                        ^                      ^                           ^
 TRANSCRIPT_ID     |   LOCATION                                 GENE_ID                GENE_BIOTYPE                TRANSCRIPT_BIOTYPE
                SEQTYPE
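
The header layout shown above can be parsed with a few string operations.
This is a sketch under stated assumptions, not an official parser: the
function names parse_cds_header and split_location are hypothetical, only
the six fields documented in this README are handled, and the names given
to the six colon-separated parts of LOCATION (coordinate system, assembly,
sequence name, start, end, strand) are an interpretation of the example.

```python
def split_location(location):
    """Split a LOCATION field such as 'chromosome:GRCh37:11:100:200:1'."""
    parts = location.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated parts: {location!r}")
    # Assumed meaning of the parts, inferred from the example header.
    coord_system, assembly, name, start, end, strand = parts
    return {
        "coord_system": coord_system,
        "assembly": assembly,
        "name": name,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }


def parse_cds_header(line):
    """Parse one CDS header line into a dictionary of its fields."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    fields = line[1:].split()
    if len(fields) < 3:
        raise ValueError(f"too few fields in header: {line!r}")

    # The first three fields are positional: TRANSCRIPT_ID SEQTYPE LOCATION.
    record = {
        "transcript_id": fields[0],
        "seqtype": fields[1],
        "location": split_location(fields[2]),
    }
    # The remaining fields are 'key:value' pairs (gene, gene_biotype, ...).
    # Tokens without a colon are skipped rather than guessed at.
    for field in fields[3:]:
        key, sep, value = field.partition(":")
        if sep:
            record[key] = value
    return record


example = (
    ">ENST00000525148.1 cds chromosome:GRCh37:11:66188562:66193526:1 "
    "gene:ENSG00000174576.1 gene_biotype:protein_coding "
    "transcript_biotype:nonsense_mediated_decay"
)
record = parse_cds_header(example)
print(record["transcript_id"], record["gene"], record["transcript_biotype"])
```

Keeping the key:value fields in a dictionary means that their order in the
header does not matter to the caller.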