This directory contains all the 44-way Enredo-Pecan-Ortheus (EPO) multiple alignments corresponding to Release 114 of Ensembl (see http://www.ensembl.org for further details and credits about the Ensembl project).

The set of species is:
- Elephant (loxAfr3)
- Mouse Lemur (Mmur_3.0)
- White-tufted-ear marmoset (mCalJac1.pat.X)
- Vervet-AGM (ChlSab1.1)
- Olive baboon (Panubis1.0)
- Macaque (Mmul_10)
- Crab-eating macaque (Macaca_fascicularis_6.0)
- Gibbon (Nleu_3.0)
- Sumatran orangutan (Susie_PABv2)
- Human (GRCh38)
- Bonobo (panpan1.1)
- Chimpanzee (Pan_tro_3.0)
- Gorilla (gorGor4)
- Rabbit (OryCun2.0)
- Alpine marmot (marMar2.1)
- Eurasian red squirrel (mSciVul1.1)
- Guinea Pig (Cavpor3.0)
- Shrew mouse (PAHARI_EIJ_v1.1)
- Ryukyu mouse (CAROLI_EIJ_v1.1)
- Mouse (GRCm39)
- Norway rat - BN/NHsdMcwi (GRCr8)
- Chinese hamster CHOK1GS (CHOK1GS_HDv1)
- Prairie vole (MicOch1.0)
- Northern American deer mouse (HU_Pman_2.1)
- Arabian camel (CamDro2)
- Goat (ARS1)
- Sheep (ARS-UI_Ramb_v2.0)
- Domestic yak (LU_Bosgru_v3.0)
- Cattle (ARS-UCD2.0)
- Hybrid - Bos Indicus (UOA_Brahman_1)
- Yarkand deer (CEY_v1)
- Narwhal (NGI_Narwhal_1)
- Beluga whale (ASM228892v3)
- Vaquita (mPhoSin1.pri)
- Sperm whale (ASM283717v2)
- Blue whale (mBalMus1.v2)
- Chacoan peccary (CatWag_v2_BIUU_UCD)
- Pig (Sscrofa11.1)
- Greater horseshoe bat (mRhiFer1_v1.p)
- Horse (EquCab3.0)
- Dingo (ASM325472v1)
- Dog (ROS_Cfam_1.0)
- Lion (PanLeo1.0)
- Leopard (PanPar1.0)

The species tree was:
( ( ( ( ( ( ( ( ( Cricetulus griseus CHOK1GS_HDv1:0.02331, Microtus ochrogaster MicOch1.0:0.02605 ):0.00176, Peromyscus maniculatus bairdii HU_Pman_2.1:0.02125 ):0.00606, ( ( ( Mus musculus reference CL57BL6 strain:0.00814, Mus caroli strain CAROLI_EIJ:0.00867 ):0.00509, Mus pahari strain PAHARI_EIJ:0.01121 ):0.00907, Rattus norvegicus reference strain:0.02018 ):0.01501 ):0.03858, Cavia porcellus Cavpor3.0:0.06846 ):0.00199, ( Marmota marmota marmota marMar2.1:0.02069, Sciurus vulgaris mSciVul1.1:0.02098 ):0.01752
):0.0053, Oryctolagus cuniculus OryCun2.0:0.05566 ):0.00284, ( ( ( ( ( ( ( ( Pan troglodytes Pan_tro_3.0:0.00168, Pan paniscus panpan1.1:0.00306 ):0.00278, Homo sapiens GRCh38:0.00274 ):0.00104, Gorilla gorilla gorilla gorGor4:0.00701 ):0.00342, Pongo abelii Susie_PABv2:0.00671 ):0.00117, Nomascus leucogenys Nleu_3.0:0.01685 ):0.00357, ( ( ( Macaca mulatta Mmul_10:0.00115, Macaca fascicularis Macaca_fascicularis_6.0:0.00533 ):0.0028, Papio anubis Panubis1.0:0.00472 ):0.00117, Chlorocebus sabaeus ChlSab1.1:0.00518 ):0.00859 ):0.00562, Callithrix jacchus mCalJac1.pat.X:0.02358 ):0.01713, Microcebus murinus Mmur_3.0:0.03684 ):0.00432 ):0.00439, ( ( ( ( ( ( ( ( ( Monodon monoceros NGI_Narwhal_1:0.00197, Delphinapterus leucas ASM228892v3:0.00236 ):0.00247, Phocoena sinus mPhoSin1.pri:0.0061 ):0.00841, Physeter catodon ASM283717v2:0.01179 ):0.00166, Balaenoptera musculus mBalMus1.v2:0.01245 ):0.01498, ( ( ( Ovis aries reference breed:0.00323, Capra hircus reference breed:0.00334 ):0.00759, ( ( Bos taurus reference breed:0.00078, Bos indicus x Bos taurus UOA_Brahman_1:0.00105 ):0.00146, Bos grunniens LU_Bosgru_v3.0:0.01194 ):0.01023 ):0.00393, Cervus hanglu yarkandensis CEY_v1:0.0123 ):0.02052 ):0.00458, ( Catagonus wagneri CatWag_v2_BIUU_UCD:0.01739, Sus scrofa reference breed:0.02226 ):0.01581 ):0.00238, Camelus dromedarius CamDro2:0.03147 ):0.00795, Rhinolophus ferrumequinum mRhiFer1_v1.p:0.0438 ):0.00124, ( ( ( Panthera pardus PanPar1.0:0.00079, Panthera leo PanLeo1.0:0.00164 ):0.02393, ( Canis lupus familiaris reference breed:0.00154, Canis lupus dingo ASM325472v1:0.00155 ):0.0263 ):0.01067, Equus caballus breed thoroughbred:0.02949 ):0.00143 ):0.00618 ):0.005, Loxodonta africana loxAfr3:0.05129 );

First, Enredo is used to build a set of co-linear regions between the genomes. Then Pecan aligns each set of co-linear sequences. Finally, Ortheus uses the Pecan alignments to infer the ancestral sequences.

Enredo is a graph-based method.
The initial graph is built from a mapping of a set of anchors on every genome. Note that each anchor can map to several locations in a single genome. Enredo uses this information to define co-linear regions.
Read more about Enredo: https://github.com/jherrero/enredo

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for large numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user-configurable. It is written entirely in Java and also requires the installation of Exonerate.
Read more about Pecan: https://github.com/benedictpaten/pecan

Ortheus is a probabilistic method for the inference of ancestral (a.k.a. tree) alignments. The main contribution of Ortheus is the use of a phylogenetic model incorporating gaps to infer insertion and deletion events.
Read more about Ortheus: https://github.com/benedictpaten/ortheus

Alignments are grouped by human chromosome, and then by coordinate system. Alignments containing duplications in human are dumped once per duplicated segment. The files named *.other*.emf contain alignments that do not include any human region. Each file contains up to 200 alignments.

An emf2maf parser is available with the Ensembl Compara API, in the scripts/dumps directory. Alternatively, you can download it from GitHub:
https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl
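The species tree given earlier in this file is written in Newick notation, but its leaf labels contain spaces (for example "Homo sapiens GRCh38"), which strict Newick readers may reject or silently convert to underscores. As a minimal sketch, assuming only that each leaf is written as label:branch_length directly after "(" or ",", the Python below extracts every leaf label together with its terminal branch length. The helper name leaf_branch_lengths is hypothetical and is not part of any Ensembl tool.

```python
import re

# A leaf label directly follows "(" or "," and is followed by ":<branch length>".
# Internal nodes follow ")" instead, so their branch lengths are not matched.
# Labels in this tree contain spaces, so this tolerant pattern is used instead
# of a strict Newick parser.
LEAF = re.compile(r"[(,]\s*([^(),:;]+?)\s*:\s*([0-9.eE+-]+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Map each leaf label to the length of its terminal branch."""
    return {m.group(1): float(m.group(2)) for m in LEAF.finditer(newick)}


if __name__ == "__main__":
    # Small excerpt of the species tree (the Pan/Homo clade only).
    excerpt = ("( ( Pan troglodytes Pan_tro_3.0:0.00168, "
               "Pan paniscus panpan1.1:0.00306 ):0.00278, "
               "Homo sapiens GRCh38:0.00274 );")
    for label, length in leaf_branch_lengths(excerpt).items():
        print(f"{label}\t{length}")
```

Applied to the full species tree, it should return 44 entries, one per species in the list above. Line breaks inside the tree are harmless because the pattern treats them as ordinary whitespace.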
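The file layout described above (human-anchored dumps versus *.other*.emf dumps, at most 200 alignments per file) can be checked with a short script. This is a sketch under stated assumptions: that each alignment block in an EMF dump ends with a line containing only "//", and that the dumps may be distributed gzip-compressed (.emf.gz). The helpers is_other_file, count_blocks and count_blocks_in_file are hypothetical names; for actual conversion to MAF, use emf2maf.pl rather than this code.

```python
import fnmatch
import gzip
from pathlib import Path
from typing import Iterable

# Pattern from this README for dumps that contain no human region.
OTHER_PATTERN = "*.other*.emf"


def is_other_file(path: Path) -> bool:
    """True if the file name marks a dump without any human region."""
    name = path.name
    if name.endswith(".gz"):  # assumption: dumps may be gzip-compressed
        name = name[: -len(".gz")]
    return fnmatch.fnmatch(name, OTHER_PATTERN)


def count_blocks(lines: Iterable[str]) -> int:
    """Count alignment blocks, assuming each one ends with a line that is exactly '//'."""
    return sum(1 for line in lines if line.strip() == "//")


def count_blocks_in_file(path: Path) -> int:
    """Open a plain or gzip-compressed EMF file in text mode and count its blocks."""
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_blocks(handle)


if __name__ == "__main__":
    # Report every EMF dump in the current directory.
    for path in sorted(Path(".").glob("*.emf*")):
        kind = "other" if is_other_file(path) else "human"
        print(path.name, kind, count_blocks_in_file(path))
```

A count above 200 for a single file would contradict the description above and may mean the block-terminator assumption does not hold for your copy of the data.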