#### README ####

IMPORTANT: Please note you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart data mining
tool. See http://protists.ensembl.org/biomart/martview or
http://www.ebi.ac.uk/biomart/ for more information. Not available for
Ensembl Bacteria.

#######################
Fasta DNA dumps
#######################

----------
FILE NAMES
----------

The files are consistently named using the following name elements:

<species>:       The systematic name of the species.
<assembly>:      The assembly build name.
<version>:       The version of Ensembl Genomes from which the data was
                 exported.
<sequence type>:
  * 'dna'    - unmasked genomic DNA sequences.
  * 'dna_rm' - hard-masked genomic DNA. Interspersed repeats and low
               complexity regions are detected with the RepeatMasker
               tool and masked by replacing repeats with 'N's.
  * 'dna_sm' - soft-masked genomic DNA. Interspersed repeats and low
               complexity regions are detected with the RepeatMasker
               tool and masked by replacing repeats with a lower-cased
               version of the sequence.

For genomes with a chromosome-level assembly, separate files are also
provided using the following patterns for <id type>:
  * 'chromosome.<id>' - Assembled chromosome sequences
  * 'plasmid.<id>'    - Plasmid sequences
  * 'nonchromosomal'  - DNA that has not been assigned to a chromosome

For all genomes, the complete assembly is available as a set of three
files (one per sequence type) with the pattern:

<species>.<assembly>.<version>.<sequence type>.genome.fa.gz

Note: In this file, the set of "toplevel" sequences means the collection
of non-overlapping assembled sequences chosen so that all assembled
sequence is included and each sequence region is included in the largest
possible assembly. Depending on the data, this may consist of
chromosome/plasmid sequences only (for a completely assembled genome),
a mixture of molecule-level assemblies and other shorter fragments, or
entirely sub-molecule assemblies.
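
The relationship between the 'dna', 'dna_sm' and 'dna_rm' sequence types
described above can be illustrated with a short Python sketch. It is not
part of any Ensembl tool: the function names are hypothetical, and it
assumes that lower-case letters in a soft-masked sequence mark exactly
the regions that the hard-masked file replaces with 'N'.

```python
def hard_mask(soft_masked):
    """Turn a soft-masked sequence into a hard-masked one.

    Assumption: every lower-case base is a masked repeat, so it is
    replaced with 'N'; upper-case bases are kept unchanged.
    """
    return "".join("N" if base.islower() else base for base in soft_masked)


def unmask(soft_masked):
    """Recover the unmasked sequence by upper-casing every base."""
    return soft_masked.upper()


def masked_fraction(soft_masked):
    """Fraction of bases that are soft-masked (lower-case)."""
    if not soft_masked:
        return 0.0
    return sum(base.islower() for base in soft_masked) / len(soft_masked)


if __name__ == "__main__":
    example = "ACGTacgtACGT"  # made-up sequence, for illustration only
    print(hard_mask(example))
    print(unmask(example))
    print(masked_fraction(example))
```

Because soft-masking keeps the original bases, a 'dna_sm' file carries
the information of both other sequence types, whereas hard-masking
discards the masked bases.
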
For compatibility with legacy applications, links to the complete
assembly files are provided using the name "toplevel" in place of
"genome":

<species>.<assembly>.<version>.<sequence type>.toplevel.fa.gz

For genomes with a chromosome-level assembly, three sets of files are
provided for each chromosome or plasmid, and for all other
non-chromosomal sequences not found in chromosomes:

<species>.<assembly>.<version>.<sequence type>.<id type>.fa.gz
<species>.<assembly>.<version>.<sequence type>.nonchromosomal.fa.gz

Examples:

Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz
  - complete unmasked assembled sequences from D. melanogaster.
Drosophila_melanogaster.BDGP5.21.dna_sm.genome.fa.gz
  - complete soft-masked assembled sequences from D. melanogaster.
Drosophila_melanogaster.BDGP5.21.dna_rm.genome.fa.gz
  - complete hard-masked assembled sequences from D. melanogaster.
Drosophila_melanogaster.BDGP5.21.dna.toplevel.fa.gz
  - link to Drosophila_melanogaster.BDGP5.21.dna.genome.fa.gz
Drosophila_melanogaster.BDGP5.21.dna.chromosome.2L.fa.gz
  - unmasked chromosome 2L sequence
Drosophila_melanogaster.BDGP5.21.dna_sm.chromosome.2L.fa.gz
  - soft-masked chromosome 2L sequence
Drosophila_melanogaster.BDGP5.21.dna_rm.chromosome.2L.fa.gz
  - hard-masked chromosome 2L sequence
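
The naming scheme shown in the examples above can be split into its
elements with a small Python sketch. The regular expression and the
function name are assumptions inferred from this README, not an official
parser. In particular, it assumes that the species name contains no
dots, that the version is purely numeric, and that the last element is
one of 'genome', 'toplevel', 'nonchromosomal', or 'chromosome'/'plasmid'
followed by an identifier.

```python
import re

# Hypothetical pattern, inferred from the name elements and examples in
# this README. The assembly name is allowed to contain dots.
FASTA_NAME = re.compile(
    r"^(?P<species>[^.]+)\."
    r"(?P<assembly>.+)\."
    r"(?P<version>\d+)\."
    r"(?P<sequence_type>dna_rm|dna_sm|dna)\."
    r"(?P<region>genome|toplevel|nonchromosomal|(?:chromosome|plasmid)\..+)"
    r"\.fa\.gz$"
)


def parse_fasta_name(filename):
    """Split a FASTA dump file name into its name elements.

    Returns a dict with the keys 'species', 'assembly', 'version',
    'sequence_type' and 'region' (all values are strings).
    Raises ValueError if the name does not follow the assumed pattern.
    """
    match = FASTA_NAME.match(filename)
    if match is None:
        raise ValueError("unrecognised FASTA dump name: " + filename)
    return match.groupdict()


if __name__ == "__main__":
    name = "Drosophila_melanogaster.BDGP5.21.dna_sm.chromosome.2L.fa.gz"
    for key, value in parse_fasta_name(name).items():
        print(key, "=", value)
```

A script that only needs the whole assembly could keep the names whose
'region' is 'genome' and choose the masking level through
'sequence_type'.
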