This directory contains all the 44-way Enredo-Pecan-Ortheus (EPO) multiple alignments corresponding
to Release 114 of Ensembl (see http://www.ensembl.org for further details and credits about the
Ensembl project).

The set of species is:
 - Elephant (loxAfr3)
 - Mouse Lemur (Mmur_3.0)
 - White-tufted-ear marmoset (mCalJac1.pat.X)
 - Vervet-AGM (ChlSab1.1)
 - Olive baboon (Panubis1.0)
 - Macaque (Mmul_10)
 - Crab-eating macaque (Macaca_fascicularis_6.0)
 - Gibbon (Nleu_3.0)
 - Sumatran orangutan (Susie_PABv2)
 - Human (GRCh38)
 - Bonobo (panpan1.1)
 - Chimpanzee (Pan_tro_3.0)
 - Gorilla (gorGor4)
 - Rabbit (OryCun2.0)
 - Alpine marmot (marMar2.1)
 - Eurasian red squirrel (mSciVul1.1)
 - Guinea Pig (Cavpor3.0)
 - Shrew mouse (PAHARI_EIJ_v1.1)
 - Ryukyu mouse (CAROLI_EIJ_v1.1)
 - Mouse (GRCm39)
 - Norway rat - BN/NHsdMcwi (GRCr8)
 - Chinese hamster CHOK1GS (CHOK1GS_HDv1)
 - Prairie vole (MicOch1.0)
 - Northern American deer mouse (HU_Pman_2.1)
 - Arabian camel (CamDro2)
 - Goat (ARS1)
 - Sheep (ARS-UI_Ramb_v2.0)
 - Domestic yak (LU_Bosgru_v3.0)
 - Cattle (ARS-UCD2.0)
 - Hybrid - Bos Indicus (UOA_Brahman_1)
 - Yarkand deer (CEY_v1)
 - Narwhal (NGI_Narwhal_1)
 - Beluga whale (ASM228892v3)
 - Vaquita (mPhoSin1.pri)
 - Sperm whale (ASM283717v2)
 - Blue whale (mBalMus1.v2)
 - Chacoan peccary (CatWag_v2_BIUU_UCD)
 - Pig (Sscrofa11.1)
 - Greater horseshoe bat (mRhiFer1_v1.p)
 - Horse (EquCab3.0)
 - Dingo (ASM325472v1)
 - Dog (ROS_Cfam_1.0)
 - Lion (PanLeo1.0)
 - Leopard (PanPar1.0)

The species tree was:
(
(
  (
    (
      (
        (
          (
            (
              (
                Cricetulus griseus CHOK1GS_HDv1:0.02331,
                Microtus ochrogaster MicOch1.0:0.02605
              ):0.00176,
              Peromyscus maniculatus bairdii HU_Pman_2.1:0.02125
            ):0.00606,
            (
              (
                (
                  Mus musculus reference CL57BL6 strain:0.00814,
                  Mus caroli strain CAROLI_EIJ:0.00867
                ):0.00509,
                Mus pahari strain PAHARI_EIJ:0.01121
              ):0.00907,
              Rattus norvegicus reference strain:0.02018
            ):0.01501
          ):0.03858,
          Cavia porcellus Cavpor3.0:0.06846
        ):0.00199,
        (
          Marmota marmota marmota marMar2.1:0.02069,
          Sciurus vulgaris mSciVul1.1:0.02098
        ):0.01752
      ):0.0053,
      Oryctolagus cuniculus OryCun2.0:0.05566
    ):0.00284,
    (
      (
        (
          (
            (
              (
                (
                  (
                    Pan troglodytes Pan_tro_3.0:0.00168,
                    Pan paniscus panpan1.1:0.00306
                  ):0.00278,
                  Homo sapiens GRCh38:0.00274
                ):0.00104,
                Gorilla gorilla gorilla gorGor4:0.00701
              ):0.00342,
              Pongo abelii Susie_PABv2:0.00671
            ):0.00117,
            Nomascus leucogenys Nleu_3.0:0.01685
          ):0.00357,
          (
            (
              (
                Macaca mulatta Mmul_10:0.00115,
                Macaca fascicularis Macaca_fascicularis_6.0:0.00533
              ):0.0028,
              Papio anubis Panubis1.0:0.00472
            ):0.00117,
            Chlorocebus sabaeus ChlSab1.1:0.00518
          ):0.00859
        ):0.00562,
        Callithrix jacchus mCalJac1.pat.X:0.02358
      ):0.01713,
      Microcebus murinus Mmur_3.0:0.03684
    ):0.00432
  ):0.00439,
  (
    (
      (
        (
          (
            (
              (
                (
                  (
                    Monodon monoceros NGI_Narwhal_1:0.00197,
                    Delphinapterus leucas ASM228892v3:0.00236
                  ):0.00247,
                  Phocoena sinus mPhoSin1.pri:0.0061
                ):0.00841,
                Physeter catodon ASM283717v2:0.01179
              ):0.00166,
              Balaenoptera musculus mBalMus1.v2:0.01245
            ):0.01498,
            (
              (
                (
                  Ovis aries reference breed:0.00323,
                  Capra hircus reference breed:0.00334
                ):0.00759,
                (
                  (
                    Bos taurus reference breed:0.00078,
                    Bos indicus x Bos taurus UOA_Brahman_1:0.00105
                  ):0.00146,
                  Bos grunniens LU_Bosgru_v3.0:0.01194
                ):0.01023
              ):0.00393,
              Cervus hanglu yarkandensis CEY_v1:0.0123
            ):0.02052
          ):0.00458,
          (
            Catagonus wagneri CatWag_v2_BIUU_UCD:0.01739,
            Sus scrofa reference breed:0.02226
          ):0.01581
        ):0.00238,
        Camelus dromedarius CamDro2:0.03147
      ):0.00795,
      Rhinolophus ferrumequinum mRhiFer1_v1.p:0.0438
    ):0.00124,
    (
      (
        (
          Panthera pardus PanPar1.0:0.00079,
          Panthera leo PanLeo1.0:0.00164
        ):0.02393,
        (
          Canis lupus familiaris reference breed:0.00154,
          Canis lupus dingo ASM325472v1:0.00155
        ):0.0263
      ):0.01067,
      Equus caballus breed thoroughbred:0.02949
    ):0.00143
  ):0.00618
):0.005,
Loxodonta africana loxAfr3:0.05129
);
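
The tree above is written in Newick format, with unquoted leaf labels that contain spaces. Some
Newick parsers reject or truncate such labels, so the following is a minimal Python sketch, not
part of the Ensembl tooling, that pulls out leaf labels and branch lengths with a regular
expression. The function name newick_leaves and the three-leaf example string are illustrative
assumptions; the example reuses three leaves copied from the tree above.

```python
import re

# A leaf looks like "<label>:<branch length>" and follows "(" or ",".
# Internal nodes follow ")" and carry no label, so they are skipped.
LEAF_RE = re.compile(r"[(,]\s*([^():,;]+?)\s*:\s*([0-9.eE+-]+)")


def newick_leaves(newick):
    """Return {leaf label: branch length} for a Newick string with unquoted labels."""
    return {label: float(length) for label, length in LEAF_RE.findall(newick)}


# Small excerpt in the same layout as the tree above (illustration only).
example = """
(
  (
    Pan troglodytes Pan_tro_3.0:0.00168,
    Pan paniscus panpan1.1:0.00306
  ):0.00278,
  Homo sapiens GRCh38:0.00274
);
"""

leaves = newick_leaves(example)
for label, length in leaves.items():
    print(f"{label}\t{length}")
```

Applied to the full tree, the same function should return one entry per species, which gives a
quick check that all 44 leaves are present.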


First, Enredo is used to build a set of co-linear regions between the genomes. Then Pecan aligns
the whole set of sequences in these regions. Last, Ortheus uses the Pecan alignments to infer the
ancestral sequences.

Enredo is a graph-based method. The initial graph is built from a mapping of a set of anchors on
every genome. Note that each anchor can map several times on a single genome. Enredo uses this
information to define co-linear regions. Read more about Enredo: https://github.com/jherrero/enredo

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency
methodology practical for significant numbers of sequences of practically arbitrary length. As
input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs
are highly user-configurable. It is written entirely in Java and also requires the installation of
Exonerate. Read more about Pecan: https://github.com/benedictpaten/pecan

Ortheus is a probabilistic method for the inference of ancestor (a.k.a. tree) alignments. The main
contribution of Ortheus is the use of a phylogenetic model incorporating gaps to infer insertion
and deletion events. Read more about Ortheus: https://github.com/benedictpaten/ortheus

Alignments are grouped by human chromosome, and then by coordinate system. Alignments containing
duplications in human are dumped once per duplicated segment. The files named *.other*.emf contain
alignments that do not include any human region. Each file contains up to 200 alignments.
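
As a rough way to inspect a file, the Python sketch below counts the alignment blocks in an EMF
file and lists the species in each block, which makes it easy to spot blocks without a human
region. It rests on two assumptions about the EMF layout that are not stated in this README: each
sequence is declared on a line starting with "SEQ" followed by the species name, and each
alignment block ends with a line containing only "//". The function name emf_blocks and the toy
input are hypothetical and are not real alignment data.

```python
def emf_blocks(lines):
    """Yield, for each alignment block, the list of species named on its SEQ lines.

    Assumptions: sequences are declared as "SEQ <species> ..." and every
    block is terminated by a line consisting only of "//".
    """
    species = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "SEQ" and len(fields) > 1:
            species.append(fields[1])
        elif fields[0] == "//":
            yield species
            species = []


# Toy input in the assumed layout (made up for illustration).
toy = """\
SEQ homo_sapiens 1 101 103 1
SEQ mus_musculus 4 501 503 1
DATA
AA
C-
GG
//
SEQ mus_musculus 7 11 12 -1
SEQ rattus_norvegicus 2 31 32 1
DATA
TT
AC
//
"""

blocks = list(emf_blocks(toy.splitlines()))
without_human = [b for b in blocks if "homo_sapiens" not in b]
print(len(blocks), "blocks,", len(without_human), "without a human region")
```

For a real file, pass an open file handle instead of toy.splitlines(); the generator reads one
line at a time, so it does not need to hold the whole file in memory.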

An emf2maf parser is available with the Ensembl Compara API, in the scripts/dumps directory.
Alternatively, you can download it using the GitHub frontend:
https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl
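
If a scripted download is more convenient, the script can be fetched with the Python standard
library, as in the minimal sketch below. The function name download_emf2maf and the default
destination file name are assumptions made for illustration; only the URL comes from this README.

```python
import urllib.request

# URL given above for the emf2maf script.
EMF2MAF_URL = (
    "https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl"
)


def download_emf2maf(dest="emf2maf.pl"):
    """Download the script to dest (hypothetical default name) and return the local path."""
    path, _headers = urllib.request.urlretrieve(EMF2MAF_URL, dest)
    return path
```

Calling download_emf2maf() requires network access and writes the file to the current directory.
Running the script itself requires Perl; see the script's own documentation for its options.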