This directory contains all the 21-way Enredo-Pecan-Ortheus (EPO) multiple
alignments corresponding to Release 114 of Ensembl (see http://www.ensembl.org
for further details and credits about the Ensembl project).

The set of species is:
 - Norway rat - BN/NHsdMcwi (GRCr8)
 - Ryukyu mouse (CAROLI_EIJ_v1.1)
 - Western wild mouse (SPRET_EiJ_v3)
 - Mouse PWK/PhJ (PWK_PhJ_v3)
 - Mouse LP/J (LP_J_v3)
 - Mouse 129S1/SvImJ (129S1_SvImJ_v3)
 - Mouse WSB/EiJ (WSB_EiJ_v3)
 - Mouse (GRCm39)
 - Mouse C57BL/6NJ (C57BL_6NJ_v3)
 - Mouse NZO/HlLtJ (NZO_HlLtJ_v3)
 - Mouse BALB/cJ (BALB_cJ_v3)
 - Mouse NOD/ShiLtJ (NOD_ShiLtJ_v3)
 - Mouse A/J (A_J_v3)
 - Mouse CBA/J (CBA_J_v3)
 - Mouse C3H/HeJ (C3H_HeJ_v3)
 - Mouse DBA/2J (DBA_2J_v3)
 - Mouse AKR/J (AKR_J_v3)
 - Mouse FVB/NJ (FVB_NJ_v3)
 - Mouse CAST/EiJ (CAST_EiJ_v3)
 - Steppe mouse (MUSP714)
 - Shrew mouse (PAHARI_EIJ_v1.1)

The species tree was:
( ( ( ( ( ( ( ( ( ( ( ( ( ( ( Mus musculus strain C3H/HeJ:0.00023, Mus musculus strain CBA/J:0.0004 ):0.00014, Mus musculus strain DBA/2J:0.00026 ):0.0001, Mus musculus strain A/J:0.00036 ):2e-05, ( Mus musculus strain BALB/cJ:0.00017, Mus musculus strain NOD/ShiLtJ:0.00048 ):0.00015 ):6e-05, Mus musculus strain AKR/J:0.0003 ):0.00011, Mus musculus strain FVB/NJ:0.00148 ):8e-05, ( ( ( Mus musculus reference CL57BL6 strain:5e-05, Mus musculus strain C57BL/6NJ:0.00035 ):0.00041, Mus musculus strain NZO/HlLtJ:0.00036 ):0.00019, Mus musculus domesticus strain WSB/EiJ:0.00097 ):0.00021 ):0.00012, ( Mus musculus strain 129S1/SvImJ:0.00014, Mus musculus strain LP/J:0.00029 ):0.00019 ):0.00152, Mus musculus castaneus strain CAST/EiJ:0.00115 ):0.00029, Mus musculus musculus strain PWK/PhJ:0.0019 ):0.00159, Mus spretus reference strain:0.00345 ):0.0006, Mus spicilegus:0.00363 ):0.00316, Mus caroli strain CAROLI_EIJ:0.00867 ):0.00509, Mus pahari strain PAHARI_EIJ:0.01121 ):0.00907, Rattus norvegicus reference strain:0.02018 );

First, Enredo is used to build a set of co-linear regions between the genomes.
Then Pecan aligns each of these sets of sequences. Last, Ortheus uses the
Pecan alignments to infer the ancestral sequences.

Enredo is a graph-based method. The initial graph is built from a mapping of
a set of anchors on every genome. Note that each anchor can map several times
on a single genome. Enredo uses this information to define co-linear regions.
Read more about Enredo: https://github.com/jherrero/enredo

Pecan is a global multiple sequence alignment program that makes practical
the probabilistic consistency methodology for significant numbers of
sequences of practically arbitrary length. As input it takes a set of
sequences and a phylogenetic tree. The parameters and heuristics it employs
are highly user-configurable. It is written entirely in Java and also
requires the installation of Exonerate.
Read more about Pecan: https://github.com/benedictpaten/pecan

Ortheus is a probabilistic method for the inference of ancestor (a.k.a. tree)
alignments. The main contribution of Ortheus is the use of a phylogenetic
model incorporating gaps to infer insertion and deletion events.
Read more about Ortheus: https://github.com/benedictpaten/ortheus

Alignments are grouped by mouse chromosome, and then by coordinate system.
Alignments containing duplications in mouse are dumped once per duplicated
segment. The files named *.other*.emf contain alignments that do not include
any mouse region. Each file contains up to 200 alignments.

An emf2maf parser is available with the Ensembl Compara API, in the
scripts/dumps directory. Alternatively, you can download it using the GitHub
frontend:
https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl
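
The species tree given earlier in this file is written in a Newick-style
notation (nested parentheses, "label:branch_length" pairs, terminated by a
semicolon). The Python sketch below is one possible way to read such a string
and sum branch lengths from the root to each leaf. It is not part of the
Ensembl tooling: the function names parse_newick and leaf_distances are
hypothetical, and the parser assumes unquoted labels that may contain spaces
and slashes, as in the tree above. The example input is the innermost
three-strain subtree copied from that tree, with its own outgoing branch
length omitted.

```python
def parse_newick(text):
    """Parse a Newick-style string into nested (label, length, children) tuples.

    Assumption: labels are unquoted and contain none of the characters ":,()".
    """
    text = text.strip().rstrip(";")
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_until(stops):
        # Consume characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos].strip()

    def parse_node():
        nonlocal pos
        children = []
        skip_spaces()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                skip_spaces()
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                separator = text[pos]
                pos += 1
                if separator == ")":
                    break
                if separator != ",":
                    raise ValueError(f"unexpected character {separator!r}")
        label = read_until(":,()")
        length = 0.0  # nodes without an explicit length (e.g. the root)
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",()"))
        return (label, length, children)

    root = parse_node()
    skip_spaces()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return root


def leaf_distances(node, dist_above=0.0):
    """Return {leaf label: summed branch length from the root to that leaf}."""
    label, length, children = node
    total = dist_above + length
    if not children:
        return {label: total}
    result = {}
    for child in children:
        result.update(leaf_distances(child, total))
    return result


# Innermost subtree of the species tree above (root branch length omitted).
EXAMPLE_SUBTREE = (
    "( ( Mus musculus strain C3H/HeJ:0.00023, Mus musculus strain CBA/J:0.0004 )"
    ":0.00014, Mus musculus strain DBA/2J:0.00026 );"
)

example_tree = parse_newick(EXAMPLE_SUBTREE)
example_distances = leaf_distances(example_tree)
for name, distance in sorted(example_distances.items()):
    print(f"{name}\t{distance:.5f}")
```

The same two functions should also accept the full 21-species tree if it is
pasted in as a single string. For anything beyond a quick inspection, an
established phylogenetics library is likely a safer choice than this sketch.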