This directory contains all the 65-way Enredo-Pecan-Ortheus (EPO) multiple alignments corresponding to Release 112 of Ensembl (see http://www.ensembl.org for further details and credits about the Ensembl project).

The core set of species used for the 32-way EPO alignment is:
- Reedfish (fErpCal1.1)
- Mexican tetra (Astyanax_mexicanus-2.0)
- Zebrafish (GRCz11)
- Common carp (Cypcar_WagV4.0)
- Goldfish (ASM336829v1)
- Rainbow trout (USDA_OmykA_1.1)
- Coho salmon (Okis_V2)
- Atlantic salmon (Ssal_v3.1)
- Brown trout (fSalTru1.1)
- Pinecone soldierfish (fMyrMur1.1)
- Tongue sole (Cse_v1.0)
- Siamese fighting fish (fBetSpl5.2)
- Greater amberjack (Sdu_1.0)
- Turbot (ASM1334776v1)
- Gilthead seabream (fSpaAur1.1)
- European seabass (dlabrax2021)
- Large yellow croaker (L_crocea_2.0)
- Channel bull blenny (fCotGob3.1)
- Lumpfish (fCycLum1.pri)
- Guppy (Guppy_female_1.0_MT)
- Platyfish (X_maculatus-5.0-male)
- Turquoise killifish (Nfu_20140520)
- Japanese medaka HdrR (ASM223467v1)
- Indian medaka (Om_v0.7.RACA)
- Javanese ricefish (OJAV_1.1)
- Orange clownfish (Nemo_v1)
- Zebra mbuna (M_zebra_UMD2a)
- Nile tilapia (O_niloticus_UMD_NMBU)
- Tetraodon (TETRAODON8)
- Fugu (fTakRub1.2)
- Asian bonytongue (fSclFor1.1)
- Spotted gar (LepOcu1)

The extra 2X genomes are:
- Electric eel (fEleEle1.pri)
- Channel catfish (ASM400665v3)
- Red-bellied piranha (fPygNat1.pri)
- Golden-line barbel (SAMN03320097.WGS_v1.1)
- Denticle herring (fDenClu1.2)
- Atlantic herring (Ch_v2.0.2v2)
- Huchen (ASM331708v1)
- Chinook salmon (Otsh_v2.0)
- Northern pike (fEsoLuc1.pri)
- Zig-zag eel (fMasArm1.2)
- Climbing perch (fAnaTes1.3)
- Yellowtail amberjack (Sedor1)
- Barramundi perch (ASB_HGAPassembly_v1)
- Ballan wrasse (BallGen_V1)
- Pike-perch (SLUC_FBN_1)
- Stickleback (GAculeatus_UGA_version5)
- Mummichog (Fundulus_heteroclitus-3.0.2)
- Sheepshead minnow (C_variegatus-1.0)
- Sailfin molly (P_latipinna-1.0)
- Amazon molly (PoeFor_5.1.2)
- Mangrove rivulus (ASM164957v1)
- Chinese medaka (ASM858656v1)
- Spiny chromis (ASM210954v1)
- Clown anemonefish (ASM2253959v1)
- Bicolor damselfish (Stegastes_partitus-1.0.2)
- Midas cichlid (Midas_v5)
- Lyretail cichlid (NeoBri1.0)
- Eastern happy (fAstCal1.3)
- Makobe Island cichlid (PunNye1.0)
- Burton's mouthbrooder (AstBur1.0)
- Tiger tail seahorse (H_comes_QL1_v1)
- Atlantic cod (gadMor3.0)
- Paramormyrops kingsleyae (PKINGS_0.1)

The species tree was:
( ( ( ( ( ( Denticeps clupeoides reference strain:0.75323, Clupea harengus reference strain:0.78091 ):0.15906, ( ( ( Electrophorus electricus reference strain:0.58425, Ictalurus punctatus strain USDA103-YYmale1:0.64666 ):0.05371, ( Pygocentrus nattereri reference strain:0.33795, Astyanax mexicanus:0.43537 ):0.13273 ):0.19974, ( ( ( Cyprinus carpio carpio:0.15419, Carassius auratus:0.20312 ):0.03673, Sinocyclocheilus grahami:0.2193 ):0.13317, Danio rerio:0.41057 ):0.4007 ):0.20046 ):0.13331, ( ( ( ( ( ( ( ( ( ( ( Anabas testudineus reference strain:0.28662, Betta splendens:0.47626 ):0.09832, Mastacembelus armatus:0.39061 ):0.0438, ( ( ( Seriola dumerili:0.0629, Seriola lalandi dorsalis:0.07968 ):0.19449, Lates calcarifer:0.2475 ):0.02589, Scophthalmus maximus:0.43711 ):0.04712 ):0.02713, ( ( ( ( Dicentrarchus labrax:0.25921, Larimichthys crocea:0.31315 ):0.02616, Sparus aurata:0.35136 ):0.02855, Labrus bergylta:0.48716 ):0.02481, ( ( Sander lucioperca:0.27455, Cottoperca gobio:0.37557 ):0.02822, ( Cyclopterus lumpus:0.32096, Gasterosteus aculeatus aculeatus reference strain:0.42779 ):0.11683 ):0.06945 ):0.04663 ):0.02177, Cynoglossus semilaevis:0.78272 ):0.01598, ( ( ( ( ( Amphiprion ocellaris ecotype Okinawa:0.04286, Amphiprion percula:0.04759 ):0.0977, Acanthochromis polyacanthus:0.1712 ):0.11863, Stegastes partitus:0.25354 ):0.14691, ( ( ( ( ( ( Maylandia zebra:0.02492, Astatotilapia calliptera reference strain:0.03223 ):0.02461, Pundamilia nyererei:0.04826 ):0.01246, Haplochromis burtoni:0.05057 ):0.03852, Neolamprologus brichardi:0.11518 ):0.04949, Oreochromis niloticus:0.09781 ):0.1765, Amphilophus citrinellus:0.24422 ):0.20757 ):0.03031, ( ( ( Kryptolebias marmoratus:0.4172, Nothobranchius furzeri:0.526 ):0.08462, ( ( Fundulus heteroclitus:0.39044, Cyprinodon variegatus:0.41691 ):0.04558, ( ( ( Poecilia formosa:0.03607, Poecilia latipinna:0.04169 ):0.08606, Poecilia reticulata:0.12567 ):0.05593, Xiphophorus maculatus:0.14646 ):0.23393 ):0.21836 ):0.0809, ( ( Oryzias melastigma:0.15829, Oryzias javanicus:0.185 ):0.10854, ( Oryzias latipes:0.09465, Oryzias sinensis:0.14735 ):0.1882 ):0.47712 ):0.10514 ):0.05653 ):0.04394, ( Takifugu rubripes:0.34799, Tetraodon nigroviridis:0.45786 ):0.4561 ):0.04204, Hippocampus comes:0.90438 ):0.13392, Myripristis murdjan:0.4309 ):0.10898, Gadus morhua:0.99063 ):0.23245, ( ( ( ( Salmo salar:0.08069, Salmo trutta:0.08624 ):0.06362, ( ( Oncorhynchus kisutch:0.0776, Oncorhynchus tshawytscha reference strain:0.08536 ):0.0284, Oncorhynchus mykiss:0.09091 ):0.07056 ):0.03446, Hucho hucho:0.1995 ):0.22067, Esox lucius reference strain:0.43949 ):0.33094 ):0.19612 ):0.18782, ( Scleropages formosus:0.64486, Paramormyrops kingsleyae:0.72514 ):0.25088 ):0.29227, Lepisosteus oculatus:0.74728 ):0.24864, Erpetoichthys calabaricus:1.07869 );

To build the 32-way alignment, Enredo is first used to build a set of co-linear regions between the genomes, and Pecan then aligns these regions. Next, Ortheus uses the Pecan alignments to infer the ancestral sequences. Finally, the 2X genomes are mapped to the Japanese medaka HdrR sequence using their pairwise BlastZ-net alignments. Any insertions in the 2X genomes are removed (i.e. no gaps are introduced into the Japanese medaka HdrR sequence).

Enredo is a graph-based method. The initial graph is built from a mapping of a set of anchors on every genome. Note that each anchor can map several times on a single genome. Enredo uses this information to define co-linear regions.
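
As a toy illustration of what "co-linear" means here, the Python sketch below finds runs of anchors that occur consecutively and in the same order on two genomes. It is not Enredo's algorithm: the function name, the anchor IDs and the min_len parameter are hypothetical, strand is ignored, and each anchor is assumed to occur at most once per genome, whereas real anchors can map several times.

```python
def colinear_runs(genome_a, genome_b, min_len=2):
    """Return maximal runs of anchors adjacent and in the same order in both genomes.

    genome_a, genome_b: lists of anchor IDs sorted by genomic position.
    Toy assumptions: one hit per anchor per genome, strand ignored.
    """
    index_in_b = {anchor: i for i, anchor in enumerate(genome_b)}
    runs, current = [], []

    def close_run():
        # Keep the current run only if it is long enough.
        if len(current) >= min_len:
            runs.append(list(current))

    for anchor in genome_a:
        if anchor not in index_in_b:
            # An anchor absent from genome B ends any ongoing run.
            close_run()
            current = []
        elif current and index_in_b[anchor] == index_in_b[current[-1]] + 1:
            # Same order and directly adjacent in genome B: extend the run.
            current.append(anchor)
        else:
            close_run()
            current = [anchor]
    close_run()
    return runs


# Hypothetical anchor orders on two genomes (a rearrangement between them).
genome_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
genome_b = ["a4", "a5", "a6", "a9", "a1", "a2", "a3"]
print(colinear_runs(genome_a, genome_b))
# [['a1', 'a2', 'a3'], ['a4', 'a5', 'a6']]
```

The two runs correspond to two candidate co-linear regions separated by a rearrangement. Handling repeated anchors and inversions is where a graph representation becomes necessary.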
Read more about Enredo: https://github.com/jherrero/enredo

Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for significant numbers of sequences of practically arbitrary length. As input it takes a set of sequences and a phylogenetic tree. The parameters and heuristics it employs are highly user-configurable. It is written entirely in Java and also requires the installation of Exonerate.
Read more about Pecan: https://github.com/benedictpaten/pecan

Ortheus is a probabilistic method for the inference of ancestor (a.k.a. tree) alignments. The main contribution of Ortheus is the use of a phylogenetic model incorporating gaps to infer insertion and deletion events.
Read more about Ortheus: https://github.com/benedictpaten/ortheus

GERP scores the conservation of each position in the alignment and defines constrained elements based on these conservation scores.
Read more about GERP: http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

Alignments are grouped by Japanese medaka HdrR chromosome, and then by coordinate system. Alignments containing duplications in Japanese medaka HdrR are dumped once per duplicated segment. The files named *.other*.emf contain alignments that do not include any Japanese medaka HdrR region. Each file contains up to 200 alignments.

An emf2maf parser is available with the Ensembl Compara API, in the scripts/dumps directory. Alternatively, you can download it directly from GitHub: https://github.com/Ensembl/ensembl-compara/raw/master/scripts/dumps/emf2maf.pl
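
The rule described above for the 2X genomes (insertions relative to Japanese medaka HdrR are removed, so no gaps are introduced into the reference) can be sketched in Python as follows. This is a minimal illustration on a single pairwise alignment given as two gapped strings. The function name and the example sequences are hypothetical, and the actual pipeline code may work quite differently.

```python
def project_onto_reference(ref_aligned, query_aligned):
    """Drop every alignment column where the reference has a gap.

    ref_aligned, query_aligned: equal-length gapped strings from one
    pairwise alignment ('-' marks a gap). The result has exactly one
    character per reference base, so it can be placed on the reference
    without introducing gaps into it.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    kept = [q for r, q in zip(ref_aligned, query_aligned) if r != "-"]
    return "".join(kept)


# Hypothetical pairwise alignment: the query has one insertion (G) and one deletion.
ref = "ACG-TTA"
qry = "AC-GTCA"
projected = project_onto_reference(ref, qry)
print(projected)  # AC-TCA
```

Deletions in the 2X genome survive as gaps in the projected sequence, while inserted bases are lost. This is why the reference coordinates stay unchanged.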