#### README ####

IMPORTANT: Please note you can download subsets of data via the
BioMart data mining tool.
See https://www.ensembl.org/info/data/biomart/ for more information.

##################
FASTA cDNA dumps
##################

These files hold the cDNA sequences corresponding to Ensembl genes,
excluding ncRNA genes, which are in a separate 'ncrna' FASTA file.
The cDNA set consists of transcript sequences for actual and possible
genes, including pseudogenes and transcripts subject to
nonsense-mediated decay (NMD). See the file names explanation below
for the different subsets of known and predicted transcripts.

------------
FILE NAMES
------------
The files are consistently named following this pattern:
<species>.<assembly>.<sequence type>.<status>.fa.gz

<species>: The systematic name of the species.
<assembly>: The assembly build name.
<sequence type>: 'cdna' for cDNA sequences.
<status>: The subset of transcripts in the file, shown here together
with the sequence type:
  * 'cdna.all' - all transcripts of Ensembl genes, excluding ncRNA.
  * 'cdna.abinitio' - transcripts resulting from 'ab initio' gene prediction
     algorithms such as SNAP and GENSCAN. In general, 'ab initio'
     predictions are based solely on the genomic sequence and do not
     use other experimental evidence. Therefore, not all GENSCAN or SNAP
     cDNA predictions represent biologically real cDNAs, and these
     predictions should be used with care.

EXAMPLES  (Note: Not all species have 'cdna.abinitio' data)
  for Human:
    Homo_sapiens.NCBI36.cdna.all.fa.gz
      cDNA sequences for all transcripts
    Homo_sapiens.NCBI36.cdna.abinitio.fa.gz
      cDNA sequences for 'ab initio' prediction transcripts.
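
The naming pattern above can be taken apart programmatically. The
following is a minimal Python sketch, not an official Ensembl tool; the
function name parse_cdna_filename is hypothetical. It assumes the species
name contains no '.' and splits from the right, so an assembly name that
itself contains a '.' would still be kept whole.

```python
def parse_cdna_filename(name):
    """Split '<species>.<assembly>.<sequence type>.<status>.fa.gz' into parts."""
    suffix = ".fa.gz"
    if not name.endswith(suffix):
        raise ValueError("expected a name ending in %s: %r" % (suffix, name))
    stem = name[: -len(suffix)]
    # The last two dot-separated parts are the sequence type and the status.
    head, sequence_type, status = stem.rsplit(".", 2)
    # Assumption: the species name has no '.', so the first '.' ends it.
    species, assembly = head.split(".", 1)
    return {
        "species": species,
        "assembly": assembly,
        "sequence_type": sequence_type,
        "status": status,
    }


parts = parse_cdna_filename("Homo_sapiens.NCBI36.cdna.abinitio.fa.gz")
print(parts["species"], parts["assembly"], parts["sequence_type"], parts["status"])
```

A name that does not have enough dot-separated parts raises ValueError
rather than returning a partial result.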

------------------------------
FASTA Sequence Header Lines
------------------------------
The FASTA sequence header lines are designed to be consistent across
all types of Ensembl FASTA sequences.

Stable IDs for genes and transcripts carry a version suffix (for
example, the '.1' in ENST00000289823.1) if they have been generated by
Ensembl. This is typical for vertebrate species, but not for
non-vertebrates. All ab initio data is unversioned.
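
As an illustration of the version suffix, the sketch below separates a
stable ID from its version. The function name split_version and the
regular expression are assumptions made for this example: it only treats
an ID as versioned when it starts with 'ENS' and ends in '.<digits>', so
that IDs from other sources which happen to contain a '.' are returned
unchanged.

```python
import re

# Assumption: Ensembl-generated IDs look like 'ENS' + letters + digits,
# optionally followed by '.<version>'.
_VERSIONED_ID = re.compile(r"(ENS[A-Z]+\d+)\.(\d+)")


def split_version(stable_id):
    """Return (id_without_version, version) or (stable_id, None)."""
    match = _VERSIONED_ID.fullmatch(stable_id)
    if match is None:
        return stable_id, None
    return match.group(1), int(match.group(2))


print(split_version("ENST00000289823.1"))
print(split_version("GENSCAN00000000001"))
```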

General format:

>TRANSCRIPT_ID SEQTYPE LOCATION GENE_ID GENE_BIOTYPE TRANSCRIPT_BIOTYPE

Example of an Ensembl cDNA header:

>ENST00000289823.1 cdna chromosome:NCBI35:8:21922367:21927699:1 gene:ENSG00000158815.1 gene_biotype:protein_coding transcript_biotype:protein_coding
 ^                 ^    ^                                       ^                      ^                           ^
 TRANSCRIPT_ID     |    LOCATION                                GENE_ID                GENE_BIOTYPE                TRANSCRIPT_BIOTYPE
                SEQTYPE
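
The header layout above can be read with a few lines of Python. This is
a minimal sketch under stated assumptions, not an official parser: the
function name parse_cdna_header is hypothetical, and it assumes the
header contains only the whitespace-separated fields shown in the
general format, with every field after LOCATION written as 'key:value'.

```python
def parse_cdna_header(line):
    """Split one FASTA header line into a dict of named fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % line)
    tokens = line[1:].split()
    if len(tokens) < 3:
        raise ValueError("expected at least ID, SEQTYPE and LOCATION: %r" % line)
    record = {
        "transcript_id": tokens[0],
        "seqtype": tokens[1],
        "location": tokens[2],
    }
    # Remaining tokens are 'key:value' pairs, e.g. 'gene:ENSG00000158815.1'.
    for token in tokens[3:]:
        key, sep, value = token.partition(":")
        if sep:
            record[key] = value
    return record


header = (
    ">ENST00000289823.1 cdna chromosome:NCBI35:8:21922367:21927699:1 "
    "gene:ENSG00000158815.1 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding"
)
fields = parse_cdna_header(header)
print(fields["transcript_id"], fields["gene"], fields["gene_biotype"])
```

Tokens after LOCATION that contain no ':' are skipped, so a header with
extra free-text fields would need a more careful parser than this one.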