This directory contains all the 91-way Enredo-Pecan-Ortheus (EPO) multiple alignments corresponding to Release 114 of Ensembl (see http://www.ensembl.org for further details and credits about the Ensembl project).

The core set of species used for the 44-way EPO alignment:
- Greater horseshoe bat (mRhiFer1_v1.p)
- Arabian camel (CamDro2)
- Sperm whale (ASM283717v2)
- Vaquita (mPhoSin1.pri)
- Beluga whale (ASM228892v3)
- Narwhal (NGI_Narwhal_1)
- Blue whale (mBalMus1.v2)
- Yarkand deer (CEY_v1)
- Sheep (ARS-UI_Ramb_v2.0)
- Goat (ARS1)
- Cattle (ARS-UCD2.0)
- Hybrid - Bos Indicus (UOA_Brahman_1)
- Domestic yak (LU_Bosgru_v3.0)
- Pig (Sscrofa11.1)
- Chacoan peccary (CatWag_v2_BIUU_UCD)
- Dingo (ASM325472v1)
- Dog (ROS_Cfam_1.0)
- Leopard (PanPar1.0)
- Lion (PanLeo1.0)
- Horse (EquCab3.0)
- White-tufted-ear marmoset (mCalJac1.pat.X)
- Olive baboon (Panubis1.0)
- Macaque (Mmul_10)
- Crab-eating macaque (Macaca_fascicularis_6.0)
- Vervet-AGM (ChlSab1.1)
- Gibbon (Nleu_3.0)
- Bonobo (panpan1.1)
- Chimpanzee (Pan_tro_3.0)
- Human (GRCh38)
- Gorilla (gorGor4)
- Sumatran orangutan (Susie_PABv2)
- Mouse Lemur (Mmur_3.0)
- Rabbit (OryCun2.0)
- Mouse (GRCm39)
- Ryukyu mouse (CAROLI_EIJ_v1.1)
- Shrew mouse (PAHARI_EIJ_v1.1)
- Norway rat - BN/NHsdMcwi (GRCr8)
- Northern American deer mouse (HU_Pman_2.1)
- Chinese hamster CHOK1GS (CHOK1GS_HDv1)
- Prairie vole (MicOch1.0)
- Guinea Pig (Cavpor3.0)
- Eurasian red squirrel (mSciVul1.1)
- Alpine marmot (marMar2.1)
- Elephant (loxAfr3)

And the extra 2X genomes are:
- Microbat (Myoluc2.0)
- Megabat (pteVam1)
- Alpaca (vicPac1)
- Dolphin (turTru1)
- Siberian musk deer (MosMos_v2_BIUU_UCD)
- Wild yak (BosGru_v2.0)
- American bison (Bison_UMD1.0)
- Red fox (VulVul2.2)
- Polar bear (UrsMar_1.0)
- American black bear (ASM334442v1)
- Giant panda (ASM200744v2)
- American mink (NNQGG.v01)
- Ferret (MusPutFur1.0)
- Tiger (PanTig1.0)
- Domestic cat (F.catus_Fca126_mat1.0)
- Donkey (ASM1607732v2)
- Hedgehog (HEDGEHOG)
- Shrew (COMMON_SHREW1)
- 
Tarsier (Tarsius_syrichta-2.0.1)
- Bolivian squirrel monkey (SaiBol1.0)
- Panamanian white-faced capuchin (Cebus_imitator-1.0)
- Ma's night monkey (Anan_2.0)
- Drill (Mleu.le_1.0)
- Sooty mangabey (Caty_1.0)
- Pig-tailed macaque (Mnem_1.0)
- Black snub-nosed monkey (ASM169854v1)
- Golden snub-nosed monkey (Rrox_v1)
- Bushbaby (OtoGar3)
- Coquerel's sifaka (Pcoq_1.0)
- Greater bamboo lemur (Prosim_1.0)
- Tree Shrew (TREESHREW)
- Pika (OchPri2.0-Ens)
- Kangaroo rat (Dord_2.0)
- Lesser Egyptian jerboa (JacJac1.0)
- Steppe mouse (MUSP714)
- Western wild mouse (SPRET_EiJ_v3)
- Golden Hamster (MesAur1.0)
- Upper Galilee mountains blind mole rat (S.galili_v1.0)
- Degu (OctDeg1.0)
- Long-tailed chinchilla (ChiLan1.0)
- Naked mole-rat female (Naked_mole-rat_maternal)
- Squirrel (SpeTri2.0)
- Arctic ground squirrel (ASM342692v1)
- Armadillo (Dasnov3.0)
- Sloth (choHof1)
- Hyrax (proCap1)
- Lesser hedgehog tenrec (TENREC)

The species tree was:
( ( ( ( Loxodonta africana loxAfr3:0.02728, Procavia capensis proCap1:0.11856 ):0.00704, Echinops telfairi:0.1665 ):0.01314, ( Dasypus novemcinctus Dasnov3.0:0.03877, Choloepus hoffmanni:0.15408 ):0.01369 ):0.00383, ( ( ( ( ( ( Rhinolophus ferrumequinum:0.03242, Pteropus vampyrus pteVam1:0.0769 ):0.00379, Myotis lucifugus Myoluc2.0:0.05004 ):0.00759, ( ( ( ( ( ( ( ( Monodon monoceros:0.00197, Delphinapterus leucas:0.00236 ):0.00247, Phocoena sinus:0.0061 ):0.00175, Tursiops truncatus turTru1:0.03779 ):0.00666, Physeter catodon:0.01179 ):0.00166, Balaenoptera musculus:0.01245 ):0.01498, ( ( ( ( Ovis aries reference breed:0.00323, Capra hircus reference breed:0.00334 ):0.00759, ( ( ( Bos mutus:0.00232, Bos grunniens:0.00817 ):0.00261, Bison bison bison:0.00799 ):0.00116, ( Bos taurus reference breed:0.00078, Bos indicus x Bos taurus UOA_Brahman_1:0.00105 ):0.00146 ):0.01023 ):0.00256, Moschus moschiferus:0.01336 ):0.00137, Cervus hanglu yarkandensis:0.0123 ):0.02052 ):0.00458, ( Catagonus wagneri:0.01739, Sus scrofa reference 
breed:0.02226 ):0.01581 ):0.00238, ( Camelus dromedarius:0.0064, Vicugna pacos vicPac1:0.07923 ):0.02507 ):0.00795 ):0.00124, ( ( ( ( ( ( Ursus americanus:0.00276, Ursus maritimus:0.00902 ):0.00591, Ailuropoda melanoleuca reference isolate:0.01034 ):0.01052, ( Mustela putorius furo:0.00597, Neogale vison:0.00707 ):0.01674 ):0.00333, ( ( Canis lupus familiaris reference breed:0.00154, Canis lupus dingo:0.00155 ):0.0029, Vulpes vulpes:0.00492 ):0.01883 ):0.00457, ( ( ( Panthera pardus:0.00079, Panthera leo:0.00164 ):0.00057, Panthera tigris altaica:0.00607 ):0.00214, Felis catus reference strain:0.00393 ):0.02122 ):0.01067, ( Equus caballus breed thoroughbred:0.00199, Equus asinus reference breed:0.00425 ):0.0275 ):0.00143 ):0.00226, ( Sorex araneus COMMON_SHREW1:0.1506, Erinaceus europaeus:0.15173 ):0.02049 ):0.00392, ( ( ( ( ( ( ( ( ( ( ( Cricetulus griseus CHOK1GS_HDv1:0.01528, Mesocricetus auratus:0.02368 ):0.00803, Microtus ochrogaster:0.02605 ):0.00176, Peromyscus maniculatus bairdii HU_Pman_2.1:0.02125 ):0.00606, ( ( ( ( ( Mus spretus reference strain:0.00345, Mus musculus reference CL57BL6 strain:0.00438 ):0.0006, Mus spicilegus:0.00363 ):0.00316, Mus caroli strain CAROLI_EIJ:0.00867 ):0.00509, Mus pahari strain PAHARI_EIJ:0.01121 ):0.00907, Rattus norvegicus reference strain:0.02018 ):0.01501 ):0.01794, Nannospalax galili:0.03459 ):0.00802, Jaculus jaculus:0.05535 ):0.00853, Dipodomys ordii Dord_2.0:0.06002 ):0.00409, ( ( ( Chinchilla lanigera:0.02015, Octodon degus:0.03211 ):0.00436, Cavia porcellus Cavpor3.0:0.03759 ):0.0065, Heterocephalus glaber reference strain:0.04216 ):0.02437 ):0.00199, ( ( ( Ictidomys tridecemlineatus SpeTri2.0:0.00535, Urocitellus parryii:0.00601 ):0.00213, Marmota marmota marmota:0.00716 ):0.01353, Sciurus vulgaris:0.02098 ):0.01752 ):0.0053, ( ( Oryctolagus cuniculus OryCun2.0:0.03171, Ochotona princeps OchPri2.0-Ens:0.12141 ):0.01834, Tupaia belangeri:0.16118 ):0.00561 ):0.00284, ( ( ( ( ( ( ( ( ( Pan troglodytes 
Pan_tro_3.0:0.00168, Pan paniscus:0.00306 ):0.00278, Homo sapiens GRCh38:0.00274 ):0.00104, Gorilla gorilla gorilla gorGor4:0.00701 ):0.00342, Pongo abelii Susie_PABv2:0.00671 ):0.00117, Nomascus leucogenys Nleu_3.0:0.01685 ):0.00357, ( ( ( ( ( Mandrillus leucophaeus:0.00372, Cercocebus atys:0.00574 ):0.0008, Papio anubis Panubis1.0:0.00399 ):0.00073, ( ( Macaca mulatta Mmul_10:0.00115, Macaca fascicularis Macaca_fascicularis_6.0:0.00533 ):0.00059, Macaca nemestrina:0.00387 ):0.00221 ):0.00117, Chlorocebus sabaeus:0.00518 ):0.00275, ( Rhinopithecus roxellana:0.00139, Rhinopithecus bieti:0.00972 ):0.00783 ):0.00584 ):0.00562, ( ( ( Cebus imitator Cebus_imitator-1.0:0.00911, Saimiri boliviensis boliviensis:0.01126 ):0.00152, Aotus nancymaae:0.01227 ):0.00058, Callithrix jacchus mCalJac1.pat.X:0.01117 ):0.01241 ):0.01413, Carlito syrichta Tarsius_syrichta-2.0.1:0.03919 ):0.003, ( ( ( Microcebus murinus Mmur_3.0:0.01715, Propithecus coquereli:0.02417 ):0.00152, Prolemur simus:0.01322 ):0.00945, Otolemur garnettii OtoGar3:0.03356 ):0.00872 ):0.00432 ):0.00439 ):0.005 );

To build the 44-way alignment, Enredo is first used to build a set of co-linear regions between the genomes, and Pecan then aligns these regions. Next, Ortheus uses the Pecan alignments to infer the ancestral sequences. Finally, the 2X genomes are mapped to the human sequence using their pairwise BlastZ-net alignments. Any insertions in the 2X genomes are removed (i.e. no gaps are introduced into the human sequence).

Enredo is a graph-based method. The initial graph is built from a mapping of a set of anchors on every genome. Note that each anchor can map several times on a single genome. Enredo uses this information to define co-linear regions.
Read more about Enredo: https://github.com/jherrero/enredo

Pecan is a global multiple sequence alignment program that makes practical the probabilistic consistency methodology for significant numbers of sequences of practically arbitrary length.
As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user-configurable. It is written entirely in Java and also requires the installation of Exonerate.
Read more about Pecan: https://github.com/benedictpaten/pecan

Ortheus is a probabilistic method for the inference of ancestor (a.k.a. tree) alignments. The main contribution of Ortheus is the use of a phylogenetic model incorporating gaps to infer insertion and deletion events.
Read more about Ortheus: https://github.com/benedictpaten/ortheus

GERP scores the conservation of each position in the alignment and defines constrained elements based on these conservation scores.
Read more about GERP: http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

Alignments are grouped by human chromosome, and then by coordinate system. Alignments containing duplications in human are dumped once per duplicated segment. The files named *.other*.emf contain alignments that do not include any human region. Each file contains up to 200 alignments.

An emf2maf parser is available with the Ensembl Compara API, in the scripts/dumps directory. Alternatively, you can download it using the GitHub frontend: https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl
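
As an illustration of the 2X mapping step described above (insertions in the 2X genome are removed so that no gaps are introduced into the human sequence), here is a minimal Python sketch. It is not the Ensembl pipeline code: the function name project_onto_reference and the toy sequences are hypothetical, and it assumes a pairwise alignment given as two equal-length gapped strings.

```python
def project_onto_reference(ref_aln: str, query_aln: str, gap: str = "-") -> str:
    """Keep only the query characters that sit on reference positions.

    A column where the reference (e.g. human) has a gap is an insertion
    in the query (e.g. a 2X genome). Dropping those columns means the
    reference never gains a gap. Deletions in the query stay as gaps.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    return "".join(q for r, q in zip(ref_aln, query_aln) if r != gap)


# Toy pairwise alignment (made-up sequences, not real data).
human = "ACG--TTA-C"
lowcov = "AC-GGTT-TC"

projected = project_onto_reference(human, lowcov)
print(projected)
```

The projected string has exactly one character per ungapped human base, which is what allows the 2X sequence to be slotted into an existing alignment column by column without changing the human row.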