#### README ####

IMPORTANT: Please note that you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart data mining tool.
See http://fungi.ensembl.org/biomart/martview
or http://www.ebi.ac.uk/biomart/ for more information. BioMart is not
available for Ensembl Bacteria.


#######################
Fasta DNA dumps
#######################

----------
FILE NAMES
----------
The files are consistently named using the following name elements:

<species>:   The systematic name of the species. 
<assembly>:  The assembly build name.
<eg_version>: The version of Ensembl Genomes from which the data was exported.
<sequence type>:
  * 'dna' - unmasked genomic DNA sequences.
  * 'dna_rm' - hard-masked genomic DNA.  Interspersed repeats and low
     complexity regions are detected with the RepeatMasker tool and masked
     by replacing them with 'N's.
  * 'dna_sm' - soft-masked genomic DNA.  Interspersed repeats and low
     complexity regions are detected with the RepeatMasker tool and masked
     by converting them to lower case.
<sequence id>: Identifies which sequences the file contains. For genomes
  with a chromosome-level assembly, separate files are also provided using
  the following patterns:
  * 'chromosome.<name>' - assembled chromosome sequence
  * 'plasmid.<name>'    - plasmid sequence
  * 'nonchromosomal'    - DNA that has not been assigned to a chromosome
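
The masking conventions described for 'dna_rm' and 'dna_sm' are related:
replacing the lower-case bases of a soft-masked sequence with 'N' gives a
hard-masked sequence, and upper-casing them gives an unmasked one. The Python
sketch below illustrates this on a plain string. The function names are
hypothetical and not part of any Ensembl tool, and it is an assumption (not
stated in this README) that the result matches the 'dna_rm' file exactly.

```python
def hard_mask(soft_masked: str) -> str:
    """Replace every lower-case base with 'N' (dna_sm -> dna_rm style)."""
    return "".join("N" if base.islower() else base for base in soft_masked)


def unmask(soft_masked: str) -> str:
    """Upper-case every base (dna_sm -> dna style)."""
    return soft_masked.upper()


def masked_fraction(soft_masked: str) -> float:
    """Fraction of bases that are lower-case, i.e. flagged as repeats."""
    if not soft_masked:
        return 0.0
    return sum(base.islower() for base in soft_masked) / len(soft_masked)


# Toy sequence: bases 5-8 are soft-masked.
example = "ACGTacgtAC"
print(hard_mask(example))        # ACGTNNNNAC
print(unmask(example))           # ACGTACGTAC
print(masked_fraction(example))  # 0.4
```

Because the soft-masked file keeps the underlying bases, it is the only one
of the three from which the other two representations can be derived.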

For all genomes, the complete assembly is available as a set of three files
(one per sequence type) with the pattern:
   <species>.<assembly>.<eg_version>.<sequence type>.genome.fa.gz

Note: The "toplevel" sequences are the set of non-overlapping assembled
sequences that together include all assembled sequence, with each sequence
region included in the largest possible assembly. This file contains that
set. Depending on the data, it may consist of chromosome/plasmid sequences
only (for a completely assembled genome), a mixture of molecule-level
assemblies and other shorter fragments, or entirely sub-molecule assemblies.
For compatibility with legacy applications, links to these files are provided
using the name "toplevel" in place of "genome":
   <species>.<assembly>.<eg_version>.<sequence type>.toplevel.fa.gz
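
A minimal Python sketch of listing the toplevel sequences in one of these
files, to see which of the cases in the note applies to a given genome. It
uses only the standard library. The helper names are hypothetical, and
treating the first whitespace-delimited token of each header line as the
sequence name is an assumption about the header layout, which this README
does not describe.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Map each sequence name (assumed: first header token) to its length."""
    lengths = {}
    for header, sequence in read_fasta(path):
        name = header.split()[0] if header.strip() else header
        lengths[name] = len(sequence)
    return lengths
```

For example, sequence_lengths("Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz")
would return one entry per toplevel sequence. Each sequence is held in memory
in full while it is read, so large chromosomes need correspondingly more
memory.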

For genomes with a chromosome-level assembly, three files (one per sequence
type) are provided for each chromosome or plasmid, and three more for the
sequences that are not assigned to any chromosome:
   <species>.<assembly>.<eg_version>.<sequence type>.<sequence id>.fa.gz
   <species>.<assembly>.<eg_version>.<sequence type>.nonchromosomal.fa.gz
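
The naming patterns above can be sketched in Python as a pair of helpers that
build and parse file names. This is an illustration only: the names
build_name, parse_name and DumpName are hypothetical, and the parser assumes
that the species name contains no dots and that the version is a plain
integer, neither of which this README guarantees.

```python
import re
from typing import NamedTuple

SEQUENCE_TYPES = ("dna", "dna_rm", "dna_sm")


class DumpName(NamedTuple):
    species: str
    assembly: str
    eg_version: str
    sequence_type: str
    sequence_id: str


def build_name(species, assembly, eg_version, sequence_type,
               sequence_id="genome"):
    """Assemble a dump file name from its name elements."""
    if sequence_type not in SEQUENCE_TYPES:
        raise ValueError(f"unknown sequence type: {sequence_type!r}")
    return (f"{species}.{assembly}.{eg_version}."
            f"{sequence_type}.{sequence_id}.fa.gz")


# The sequence type anchors the match, so the assembly and the
# sequence id may themselves contain dots (e.g. 'chromosome.2L').
_NAME_RE = re.compile(
    r"^(?P<species>[^.]+)\.(?P<assembly>.+)\.(?P<eg_version>\d+)"
    r"\.(?P<sequence_type>dna_rm|dna_sm|dna)\.(?P<sequence_id>.+)\.fa\.gz$"
)


def parse_name(filename):
    """Split a dump file name into its name elements."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a FASTA DNA dump name: {filename!r}")
    return DumpName(**match.groupdict())


print(build_name("Drosophila_melanogaster", "BDGP5", 21, "dna_sm",
                 "chromosome.2L"))
print(parse_name("Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz"))
```

Matching on the sequence type rather than splitting on every dot is a design
choice made here so that dotted assembly names and sequence ids survive
parsing.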
   

Examples:

Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz - complete unmasked assembled 
sequences from D. melanogaster. 

Drosophila_melanogaster.BDGP5.21.dna_sm.genome.fa.gz - complete soft-masked 
assembled sequences from D. melanogaster. 

Drosophila_melanogaster.BDGP5.21.dna_rm.genome.fa.gz - complete hard-masked 
assembled sequences from D. melanogaster. 

Drosophila_melanogaster.BDGP5.21.dna.toplevel.fa.gz - link to 
Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz 

Drosophila_melanogaster.BDGP5.21.dna.chromosome.2L.fa.gz - unmasked chromosome
2L sequence

Drosophila_melanogaster.BDGP5.21.dna_sm.chromosome.2L.fa.gz - soft-masked chromosome
2L sequence

Drosophila_melanogaster.BDGP5.21.dna_rm.chromosome.2L.fa.gz - hard-masked chromosome
2L sequence
