#### README ####

IMPORTANT: Please note you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart and
EnsMart data mining tools. See http://bacteria.ensembl.org/biomart or
http://www.ebi.ac.uk/biomart/ for more information.

Please send comments or questions to dev@ensembl.org.

---------------------------------
PhyloXML GeneTree Flat File Dumps
---------------------------------

PhyloXML (http://www.phyloxml.org/, PubMed ID 19860910) is an XML format backed by an XML Schema for validation. Parsers for PhyloXML are available in numerous toolkits, including BioPerl, BioRuby, Forester (Java), Biopython and many more. The format also allows richer dumps, which lets us provide more information about a gene tree in a single format.

Structure
=========

The structure conforms to the standard PhyloXML structure, apart from the following rules and extensions:

* A property called "Compara:dubious_duplication" is provided on clades to flag nodes that have this confidence rating in our database
* A property called "Compara:genome_db_name" is provided on every leaf node to indicate the source of the peptide. In some cases the taxonomy value is redundant
* All stable identifiers are given the source EnsemblGenomes, even though the true source may be a third party
* All sequences are cDNA alignments
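
The two Compara properties listed above can be read with any XML parser. The following is a minimal sketch using only the Python standard library. The sample document is hand-written for illustration and is an assumption, not an excerpt from a real dump: the gene names, species names and the property value "1" are hypothetical, and real files contain many more elements. The `ref`, `datatype` and `applies_to` attributes follow the standard PhyloXML `property` element.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the phyloXML schema.
PHYLOXML_NS = "http://www.phyloxml.org"
NS = {"phy": PHYLOXML_NS}

# Hypothetical sample; names and values are illustrative assumptions.
SAMPLE = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <property datatype="xsd:string" ref="Compara:dubious_duplication" applies_to="clade">1</property>
      <clade>
        <name>GENE_A</name>
        <property datatype="xsd:string" ref="Compara:genome_db_name" applies_to="clade">species_one</property>
      </clade>
      <clade>
        <name>GENE_B</name>
        <property datatype="xsd:string" ref="Compara:genome_db_name" applies_to="clade">species_two</property>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def clade_properties(clade):
    """Return {ref: text} for the properties attached directly to one clade."""
    return {
        prop.get("ref"): (prop.text or "").strip()
        for prop in clade.findall("phy:property", NS)
    }


def summarise(xml_text):
    """Return (leaf name -> genome_db_name, number of flagged clades)."""
    root = ET.fromstring(xml_text)
    leaves = {}
    dubious = 0
    for clade in root.iter("{%s}clade" % PHYLOXML_NS):
        props = clade_properties(clade)
        # Only the presence of the flag is checked, not its value.
        if "Compara:dubious_duplication" in props:
            dubious += 1
        # A clade without child clades is a leaf.
        if clade.find("phy:clade", NS) is None:
            name = clade.findtext("phy:name", default="", namespaces=NS)
            leaves[name] = props.get("Compara:genome_db_name")
    return leaves, dubious


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

Matching on the `ref` attribute rather than on element position keeps the lookup independent of the order in which properties appear on a clade.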

Files
=====
* Every gene tree is written to its own file. The file name is the stable ID of the tree or, if none exists, the database ID of the gene tree.
* The number of trees for one division can be large. They are therefore not all placed in one directory but split across several directories, each holding at most 1000 files. The directories are named with a running number, which has no meaning other than to avoid creating directories with very many files.
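
Because the directory numbers carry no meaning, a tree cannot be located from its identifier alone, and the subdirectories have to be searched. The following is a minimal sketch of such a lookup in Python. The function name `find_tree_file` and the `.xml` file suffix are assumptions made for illustration; adjust the suffix to match the files actually present in the dump.

```python
from pathlib import Path
from typing import Optional


def find_tree_file(dump_root: Path, tree_id: str, suffix: str = ".xml") -> Optional[Path]:
    """Search each subdirectory of dump_root for the file of one gene tree.

    tree_id is the stable ID (or database ID) used as the file name.
    suffix is an assumed file extension, not confirmed by this README.
    """
    target = tree_id + suffix
    for subdir in sorted(p for p in dump_root.iterdir() if p.is_dir()):
        candidate = subdir / target
        if candidate.is_file():
            return candidate
    return None  # no subdirectory contains this tree
```

Usage, with a hypothetical path and identifier: `find_tree_file(Path("dumps/phyloxml"), "SOME_TREE_ID")` returns the path of the matching file, or `None` if the tree is not in the dump.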