187 faq assembly, NCBI36, archive, older, previous, version {"category": "archives", "question": "
Where are older or archive sites?
", "answer": "Click on the 'View in archive site' link at the bottom of any page. Or, go to www.ensembl.org/info/website/archives/.
", "division": ["vertebrates"]} \N \N 120522 2021-11-15 10:15:36 live 0 0 186 faq FAQ, comparative genomics, species, homology, homologue, paralogue, paralogy, gene tree, protein tree, whole genome alignment, synteny, family, compara, alignment {"question": "How do I see multi-species comparisons?
", "answer": "A number of comparative genomics views provide access to multi-species comparisons. Picture menus are provided for the gene comparative views or the location comparative views.
\\r\\nClick on the genomic alignments link in any gene page to view whole genome alignments for that region. Links at the left of the gene tab also provide access to gene trees, orthologues, paralogues and protein families.
\\r\\nGraphical views showing multi-species comparisons are available in the location tab. The alignments (image) view displays genomic alignments, including genes present in the alignments. The region comparison view shows genomes from multiple species side by side. A dedicated synteny view is also available in the location tab.
", "category": "comparative"} \N \N 5655 2019-09-19 13:26:02 live 0 0 185 faq FAQ, Sequence, exons, introns, nucleotide, DNA, transcript, cDNA {"question": "Can I view exons, introns, and flanking sequence to a transcript?", "answer": "For a colour-coded sequence showing exons including untranslated regions (UTRs) and introns, click on any transcript. From the transcript tab, click on the Exons link at the left. The Exons page allows you to view the transcript sequence, along with flanking and intronic regions.
A short video describes this view.
Click on the configure this page link at the left and customise your view. Or, try BioMart for sequence export.
", "category": "genes"} \N \N \N 2019-09-19 13:26:02 live 0 0 125 faq FAQ, BioMart, HGNC, convert, ID, probes, EntrezGene, data mining {"question": "How do I convert IDs? I have ENSG... IDs and I would like HGNC symbols and EntrezGene IDs along with matching Affymetrix platform HC G110 probes.
", "answer": "This can be done using BioMart. We outline the protocol using Ensembl genes ENSG00000162367 and ENSG00000187048. We will enter in the list of genes and export IDs from multiple databases.
Database: Ensembl genes Dataset: Homo sapiens genes Filters: GENE: ID list limit box: select as the header Ensembl Gene ID(s) and enter the gene IDs. Attributes: EXTERNAL: External References, select HGNC symbol and EntrezGene ID. Scroll down to EXTERNAL: Microarray Attributes to select Affy HC G110.
Click Results at the top.
For BioMart tutorials, see our video on YouTube or YouKu.
", "category": "data"} 2 2008-07-28 11:24:09 104467 2019-09-19 13:26:02 live 7 2 126 faq FAQ, archive, ID, Ensembl ID, old, version, ID history converter, previous, release, current, update {"question": "I have a list of old Ensembl IDs from a previous release. How can I find their IDs in the current version?
", "answer": "The gene IDs might be the same in the current version. Search for the gene ID in the browser, or in BioMart. A gene ID can change if the gene structure changes dramatically, for example if a gene is split into two, or alternatively, two genes are merged into one.
\\r\\nIf you have a list of IDs, submit them to our ID History converter. Click on the 'Tools' link at the top of most Ensembl pages, and follow the link to the converter.
\\r\\nOr, view our older, archive sites.
", "category": "archives"} 2 2008-07-28 15:41:12 106005 2019-09-19 13:26:02 live 0 0 142 view regulation, reg feat, reg-feat, functional genomics, promoter, enhancer, ENCODE, ChIP-Chip, ChIP-seq, gene regulation, regulatory regions, transcription factor binding, DNAse I, Pol II, CTCF, RNA-Seq, ENCODE, expression, RNASeq, ChIPSeq, protein, histone, {"ensembl_object": "Gene", "content": "Gene regulation information from the following sources are shown in this view:
\\r\\nThe graphical display shows sequences from the sources listed above that are potentially involved in gene regulation. Click on any block for information about that sequence. Ensembl transcripts are shown at the top of the view. Configure this page allows track selection.
\\r\\nFor human, this includes Regulatory Features from the Ensembl Regulatory Build (Zerbino et al. 2015). These sequences are assigned a stable ID beginning with ENSR, and reflect information from various cell types. The Reg. Feats track combines evidence from all cell types into predicted regulatory elements, while the regulatory features for individual cell types are shown in individual tracks in the diagram (e.g. GM06990). Regulatory features are also shown in the table below.
\\r\\nClick on any block in the 'Reg. Feats' track or any ENSR ID to jump to more detailed information about the regulatory feature.
\\r\\nIn human, eQTLs from GTEx are displayed. You can add different tissues by clicking on Configure this page. eQTL variants are shown as a Manhattan plot of the p-value, against a log scale with lower (i.e. more significant) p-values higher up. Click on a variant to see its p-value and to click through for more information about it.
\\r\\nData sets can also be displayed in a genomic context by using the Region in detail view, or mined from the Ensembl Regulation (funcgen) database using our API. BioMart allows export of these features (start with the functional genomics database). Find an introduction to BioMart on our tutorials page.
", "ensembl_action": "Regulation"} 5655 2008-08-14 18:23:40 106005 2019-09-19 13:26:02 live 0 0 127 faq {"question": "I am looking for a clone that contains my gene or region of interest", "answer": "The \\"Region in Detail\\" view displays clones along the chromosome, along with genes. Turn on clones from various clonesets in the Control Panel. Look for more information by clicking on a clone in the display. Note: Ensembl does not sell clones, only displays positional information.", "category": "z_data"} 2 2008-07-28 15:46:09 \N 2019-09-19 13:26:02 dead 0 0 128 faq FAQ, functional genomics, regulation, gene regulation, regulatory regions, promoter, enhancer, transcription factor binding, DNAse I, Pol II, CTCF, ChIp-chip, ChIp-Seq, RNA-Seq, ENCODE {"question": "Does Ensembl have promoters or regulatory regions?", "answer": "Known promoters are not available for most genes in Ensembl. However, for mouse and human you can find possible promoters and enhancers based on experiments from the ENCODE project. See the Ensembl Regulatory Build documentation for more. Data from genome-wide studies (ChIP-Seq) are used to generate Regulatory Features, which amongst other classifications, can be annotated as \\"Promoter Associated\\". Regulatory features are accessible as follows:
1. Turn on one of the Regulatory Features tracks in the Location view. See the \\"Regulation\\" menu in the configuration panel. Clicking on a regulatory feature in this track will show a stable ID with a link to the Regulation view. Other menu choices show the specific transcription factor binding sites (TFBS) or histone modifications, for example, in the region.
2. The Gene tab also has a Regulation view (available in the left hand menu). This page displays regulatory features based on ENCODE data as described above. Where available, there are additional features for human, mouse, and fly (e.g. cisRED, miRNA targets, VISTA enhancers, and REDfly).
Other access to these data uses the Perl API to query the functional genomics database. Alternatively, use BioMart.
", "category": "regulation"} 2 2008-07-28 15:47:48 106005 2019-09-19 13:26:02 dead 21 6 129 faq FAQ, genes, genebuild, transcript, ENSG, Ensembl identifiers, annotation {"question": "I think my gene is wrongly annotated, or missing transcripts.", "answer": "Ensembl determines genes using automatic annotation, involving both computer and biological expertise to determine an entire gene set. This is the Ensembl genebuild. Initial alignment of proteins/mRNAs lead to our transcript set, so all genes in Ensembl link back to protein/mRNA evidence, termed the supporting evidence. For an example see this page. Transcript information must be present in public biological databases such as EMBL-Bank, UniProt and NCBI RefSeq in order to be used to determine Ensembl genes. Click on External References from a gene page or General identifiers from a transcript page to see matching sequences across databases. Consider submitting any sequences to EMBL-Bank. For more transcripts, turn on Vega/Havana genes in the Region in Detail page. Please report any confusing gene annotation to our helpdesk.", "category": "web"} 2 2008-07-28 15:50:47 \N 2019-09-19 13:26:02 dead 3 1 130 faq FAQ, sequence, export, data, BioMart, APIs {"question": "How can I export sequence?
", "answer": "Export individual sequences or whole genome alignments using the Export data button at the left of the gene, transcript, or location page. A video describes this. Alternatively, export in batch using BioMart. Perl programmers can use our Perl API to access all Ensembl data. Sequences can be exported via our REST API as well. See here for more.
", "category": "data"} 2 2008-07-28 15:52:38 120522 2019-09-19 13:26:02 live 1 0 135 view ortholog, orthologue, homolog, homologue, species, orthologs, orthologues, homologs, homologues, homology, orthology {"content": "Orthologues inferred from gene trees are determined using all species in that particular database, i.e. all the (mostly) chordates in Ensembl, all the fungi in Ensembl Fungi, all the plants in Ensembl plants, all the metazoa in Ensembl Metazoa, all the protists in Ensembl Protists, or all the species in the Pan-Compara set for Pan-Compara orthologues in Ensembl Genomes. A detailed description of the method is provided here.
Unaligned sequences (nucleotide and/or amino acids) of orthologous genes can be exported in FASTA format by clicking on Sequence export. The Compara API and BioMart can also be used to export orthologues.
Species are grouped by clades in the top table, such as Primate, Rodents, and Fish. By default, the full list of orthologues is shown below the table. Click on Show details to display only the orthologues for species in one clade.
The number of species for each orthologue type is shown in the top table. Orthologue types are assigned by comparing two species, and are as follows:
Orthologues are defined in Ensembl as genes for which the most common ancestor node is a speciation event. These ancestral speciation events are represented by blue nodes in the gene trees.
Possible orthologues are homologues between species where the common ancestor is a weakly supported duplication event. Although they should be called paralogues according to the Compara rules, the low confidence on the duplication node might suggest an error in the phylogenetic reconstruction. We list these cases here as they might be real orthologues, especially in cases where no better orthologue is found.
The list of orthologues underneath the top table shows the species, the orthologue type, the Ensembl gene ID and name, links to other views, the Target %ID and the Query %ID, the Gene Order Conservation (GOC) score, the Whole Genome Alignment (WGA) coverage, and an indication of confidence of orthology.
If you are searching for a gene in human, for example, and looking for its homologue in another species such as mouse, the Query %ID refers to the percentage of the query sequence (human) that matches to the homologue (the mouse protein). Target %ID refers to the percentage of the target sequence (mouse) that matches to the query sequence (human).
IDs, orthology types, and dN/dS values can also be obtained using the Compara API or with BioMart.
", "ensembl_action": "Compara_Ortholog", "ensembl_object": "Gene"} 5655 2008-08-13 10:23:09 120522 2023-01-18 09:49:27 live 0 0 376 view variant, variation, SNP, polymorphism, 1000 Genomes, mutation, HapMap, consequence, dbSNP {"content": "Short sequence variations are shown by consequence type in the Variation Table.
\\r\\nThis view shows all the variant consequences in a gene. If the same variant falls in several transcripts within the same gene, a new row will be displayed for each transcript.
\\r\\nThe table columns are as follows:
\\r\\nYou can order the table by the columns by clicking on the up/down arrows by the column titles. Filters above the table allow you to filter the data by various parameters.
\\r\\n[[IMAGE::Variation_table2.png height=\\"294\\" width=\\"504\\"]]
"} 5655 2012-07-18 11:03:10 120522 2019-09-19 13:26:02 live 0 0 136 view paralog, paralogue, homolog, homologue, paralogs, paralogues, homologs, homologues, paralogy, homology, duplicate, copy {"ensembl_object": "Gene", "content": "Homologues are inferred from the gene trees, which are determined using all species in Ensembl. A detailed description of the method is provided here.
\\r\\nParalogues are defined in Ensembl as genes for which the most common ancestor node is a duplication event. These ancestral duplications are represented by red nodes in the gene trees.
\\r\\nThe table shows the taxonomic level of the ancestor duplication node, the Ensembl gene ID and name, the location of the paralogue, and the percent of identical amino acids in the paralogue compared with the gene of interest (Target %ID). The identity of the gene of interest when compared with the paralogue is the query %ID.
\\r\\nFor alignments of the paralogues, use the links in the Compare column.
\\r\\nUnaligned sequences (nucleotide and/or amino acids) of paralogues can be exported in FASTA format by clicking on Sequence export. The Compara API and BioMart can also be used to export paralogues.
", "ensembl_action": "Compara_Paralog"} 5655 2008-08-13 10:26:33 99923 2019-09-19 13:26:02 live 0 0 137 view comparative genomics, homology, homolog, paralog, ortholog, species, phylogenetic, protein, paralogy, orthology, homologue, orthologue, paralogue, gene, phylogeny, taxonomy, tree {"ensembl_object": "Gene", "content": "Ensembl gene trees are generated by the Gene Orthology/Paralogy prediction method pipeline. All homologues in Ensembl are determined from gene trees.
\\r\\nGene trees are constructed using one representative protein for every gene in every species in Ensembl. The longest translation annotated by the CCDS project is used, if any are available, or the longest protein-coding translation otherwise. (The trees can also be considered as protein trees).
\\r\\nThe display shows the maximum likelihood phylogenetic tree representing the evolutionary history of genes. These trees are reconciled with a species tree, generated by TreeBeST. Internal nodes are then annotated for duplication (red boxes) or speciation (blue boxes) events.
\\r\\n[[IMAGE::genetree.png]]
\\r\\nRed squares represent duplication nodes and blue squares represent speciation nodes, giving rise to paralogues and orthologues respectively. Another class of node, ambiguous, is shown as a lighter blue square.
\\r\\nThe gene of interest is highlighted in red and within-species paralogues are shown in blue, if the option to view paralogues is selected (below the tree diagram).
\\r\\nTaxonomy IDs refer to the NCBI Taxonomy Browser. The number at the top of pop-up menus (upon clicking on a node) corresponds to the node_id from the protein_tree_node table in the compara database.
\\r\\nMultiple alignment of the peptides (green bars) was made using MUSCLE. Green bars show areas of amino acid alignment; white areas are gaps in the alignment. Dark green bars indicate consensus alignments.
\\r\\nClick on a node to expand a collapsed set of branches into a full tree. The consensus amino acid alignment corresponds to the consensus residues in the collapsed node, and will be expanded when the tree is expanded.
\\r\\nYou can also view a detailed sequence alignment in Wasabi by clicking on a node.
\\r\\nYou can also use the collapse and expand links at the bottom as follows:
\\r\\nView current gene only: Shows the default view of the gene tree, where the selected gene and the node it is within are expanded fully, while all other nodes are collapsed.
\\r\\nView paralogs of current gene: Shows the current gene and all its paralogues with their nodes expanded fully, while all other nodes are collapsed.
\\r\\nView all duplication nodes: Expands all the red duplication nodes and all of the nodes they fall within, while speciation nodes remain collapsed.
\\r\\nView fully expanded tree: Expands all nodes.
\\r\\nCollapse all the nodes at the taxonomic rank: Choose your preferred taxonomic rank from the drop down to see all nodes expanded above that rank, and collapsed below it.
\\r\\nConfigure this page to customise the tree. Colouring by clade can be removed.
\\r\\nGene trees can be exported as EMF (Ensembl Multi Format) files from the Ensembl ftp site.
", "ensembl_action": "Compara_Tree"} 5655 2008-08-13 10:32:27 122937 2019-09-19 13:26:02 live 0 0 215 faq FAQ, download, sequence, genome, DNA, ftp, BioMart, protein, FASTA, cDNA, CDS, transcriptome {"question": "How can I download all the gene/transcript/protein sequences for a species? Do I use BioMart?
", "answer": "While BioMart can export sequences, the entire set for any species can be downloaded directly from our ftp site. Please use this method of download, as BioMart cannot handle the very large query of a whole genome/transcriptome/proteome.
\\r\\nIf you do need a customised sequence header, consider splitting the BioMart query into chromosomes. Make sure the compressed webfile/notify by email option is selected.
\\r\\nFor BioMart tutorials, see our video on YouTube or YouKu.
", "category": "data"} 5655 2009-07-16 11:08:34 120522 2019-09-19 13:26:02 live 0 0 138 view comparative genomics, alignment, species, whole genome alignment, WGA, conserved, conservation, genome, Pecan, EPO, nucleotide, sequence {"ensembl_object": "Gene", "content": "Whole genome aligments include \\"pairwise\\" sequence alignments between two species, and multi-species alignments using genomes of more than two species.
\\r\\nExport alignments with the \\"export data\\" link in the left hand navigation column of the \\"Genomic alignments\\" page. Note that the exported sequence will come from the forward strand, even if the alignment shown is on the reverse strand, and flanking regions shown in the alignments view will not be included in the exported sequence.
\\r\\nThe sequence is centered on one gene. To change the sequence shown, use a different gene or click on \\"genomic alignments\\" from the location tab.
\\r\\nOnly one species is shown by default. Click on select an alignment at the top of the sequence to choose which species' alignment to view; this will open the Species Selector box. You can either use the search box to select your species or click on the species divisions (in green) to navigate and select (by checking the boxes) from any of the available species in Ensembl. The selected species will appear on the right side of the Species Selector box. To remove a species, click the (-) button to the right of its name. Click 'apply' to close the Species Selector and view the alignment.
\\r\\nChromosomes and scaffolds in the alignment are listed for each species. Sequence is shown under these coordinates. Red, highlighted nucleotides are located in exons.
\\r\\nClick on configure this page at the left of the view to add or change the display. Customisable options are as follows:
\\r\\nIf the sequence is not known, nucleotides are replaced by dots.
\\r\\n[[IMAGE::text_alignments.png width=\\"400\\" height=\\"300\\"]]
\\r\\nThe image above shows the multiple alignments for five catarrhini primates. Conserved nucleotides have been turned on, and are shown by blue highlighting. Variations and line numbering (relative to the coordinate system) are also selected by clicking on the configure this page button.
\\r\\nNote: Ancestral sequences may be turned off using configure this page.
\\r\\n[[MOVIE::222]]
", "ensembl_action": "Compara_Alignments"} 5655 2008-08-13 11:07:05 120463 2019-09-19 13:26:02 live 0 0 139 view chromosome, statistics, custom data, gene, upload, number, count {"ensembl_object": "Location", "content": "A schematic representation of a chromosome is shown and it includes graphical displays of different biotypes of genes and genomic features:
\\r\\nProtein Coding Genes gives the number of genes annotated on this chromosome and that contain an ORF (open reading frame).
\\r\\nShort Non Coding Genes gives the number of genes annotated on this chromosome that are less than 200 nt long and do not contain an ORF (open reading frame), e.g. miRNAs and snoRNAs.
\\r\\nLong Non Coding Genes gives the number of genes annotated on this chromosome that are longer than 200 nt and do not contain an ORF (open reading frame), e.g. lincRNAs and antisense.
\\r\\nPseudogenes share an evolutionary history with functional protein-coding genes but have been mutated through evolution to contain frameshift(s) and/or stop codon(s) that disrupt the open reading frame.
\\r\\n% GC / repeats shows the percentage of Gs and Cs (in red in the graph) and of repetitive (e.g. satellite DNA) regions (in black)
\\r\\nVariations lists the number of variations that Ensembl has mapped on this chromosome.
\\r\\nFor more on Ensembl gene annotation, please see this article.
\\r\\nYou can switch to another chromosome using the drop-down menu at the bottom of the image.
\\r\\nA summary is provided per chromosome and includes length (in base pairs), number of coding and non coding genes, pseudogenes and short variants (e.g. SNPs and indels). Length is pre-calculated in order to speed up page display, and stored in the seq_region table of the core database. The number is based on the assembled end position of the last seq_region in each chromosome (from the AGP file), or if there is a terminal gap it is set to the assembled end location of that terminal gap.
\\r\\nSummary statistics are also available for the entire genome rather than per chromosome and can be found on species-specific pages. See statistics for human assembly and annotation of GRCh38.p2.
\\r\\nPlease note: Gene counts presented per chromosome on an Ensembl chromosome.
\\r\\nYou can add your own data to this display by clicking on the Custom Tracks button at the left of the page. More details on how to use your data in Ensembl.
\\r\\nIf you have already uploaded data to another view, you can turn this track on by clicking on the Configure this page link and selecting a track in Your data.
\\r\\nNote: The display is highly customisable. All tracks may be turned off or their track style can be changed by clicking on the Configure this page link. The track styles available for this display are bar chart - outline, bar chart - filled and line graph. For the % GC / repeats track, the options to customise are on and off only.
", "ensembl_action": "Chromosome"} 5655 2008-08-13 14:37:49 125866 2019-09-19 13:26:02 live 0 0 422 faq share, email, e-mail, collaborator, colleague, lab, custom, track, view, archive, data, user, upload {"question": "I've created a customised view in Ensembl. Can I share this with my colleagues and collaborators by email?
", "answer": "[[IMAGE::large_share_button.png]]
\\r\\n[[IMAGE::small_share_button.png]]
\\r\\nThe share button creates a permanent web-link that you can email to a colleague or collaborator. This link will include what you see on the location graphic: the location, whatever Ensembl tracks you have switched on and any custom tracks that you have uploaded. The link specifies the version of Ensembl you're using, so will point to the archive sites in the case of updates, even changes in assembly.
\\r\\nFor example you can find the sharing icon on the Region in Detail page.
\\r\\nYou could even save share links in your bookmarks, so that you can return to a view at a later date.
", "category": "data"} 106005 2013-01-04 09:56:56 120463 2019-09-19 13:26:02 live 0 0 140 view genome, region, location, position, zoom, browse, scrollable, scroll, user data, custom data, upload track, attach data {"content": "Region in detail allows you to browse genes, variants, sequence conservation, and other annotation along the genome. There are three main images (or panels): Chromosome, Overview and Region (Figure 1).
[[IMAGE::Region_in_detail_Fig_1.png]]
The first panel shows the chromosome of interest, marking any haplotypes or patches in red or green, respectively, and a cytogenetic banding pattern when available.
[[IMAGE::Region_in_detail_Fig_2.png]]
The next panel is called the Overview Image (Figure 2). It shows an overview of genes along the chromosome (depicted as a blue bar). The overview size depends on the species. For example, it is a 1 Mb region in human, a 0.5 Mb region in zebrafish and a 0.1 Mb region in yeast. The individual contigs that make up the genomic assembly are coloured in light or dark blue.
Gene colours are as follows:
The overview image is scrollable in up-to-date versions of Chrome, Firefox, and Internet Explorer 9 and later. There is also support for Safari 5.1, but scrolling may not work on certain older MacBooks.
To scroll along the genome, click the scroll arrows (A). Hold the arrows to keep scrolling to farther up or downstream regions.
Use the track height button (B) to switch between fixed track height (arrows facing in icon) and auto-adjust track height (arrows facing out icon), and reset to default using the arrow wheel button. In the fixed track height mode, adjust the height by dragging up and down from the paired horizontal lines between the tracks. As you scroll across the chromosome in automatic track height, the track height automatically adjusts to fit in all features. In fixed track height, you may find that not all features within a track are displayed, and the height needs to be adjusted to fit them in.
Jump to a position in your display using the drag/select icons (C and D). Click the double ended arrow (C) to scroll to another region by dragging the mouse cursor along the image up or downstream. Click (D) to select a region. The cursor will change to a vertical dotted line. Drag a box in the image and select Jump to region or Mark region.
In the overview image display, icons are available to change data tracks, share or resize the view, export the image, reset the configuration, reset data track order and change the image from scroll to static.
[[IMAGE::Region_in_detail_Fig_3.png]]
The third image (Figure 3) allows a more zoomed-in view of Ensembl genes and annotation.
Data tracks (genes, cDNA alignments, etc) are shown above and below the blue bar. Tracks above the blue bar are on the forward strand of the chromosome, and tracks under the blue bar are on the reverse strand. Non-stranded data (such as variants or regulatory features) are shown at the very top or bottom of the image.
Reorder the tracks by using the vertical blue/peach bar at the left. To zoom in or change the display, use our zoom slider, or enter in basepairs manually. Alternatively, click any gene or transcript and select the location link in the pop-up window to zoom in on the feature.
The Drag/Select option in the top right of the image allows you to choose your action, scroll to a region or select a region. Select Scroll to move along the genome with a click and drag, and let go to allow the page to reload with your new location. Click Select to drag out a box around a region or feature of interest to view a pop-up window. You can then Mark a region in the view or Jump to a region.
The Region Image supports custom data through user tracks, which you can display using the Custom Tracks button in the left hand menu or by clicking on icon (E). Note that this is also where you can view or edit data that you have already added to Ensembl.
[[IMAGE::Region_in_detail_Fig_4.png]]
[[IMAGE::Region_in_detail_Fig_5.png]]
Drag and select a region in this image and use the pop-up window to mark it. You can also click on the gene itself to mark its location. The region/gene will be highlighted in grey. The marked region will remain highlighted if you zoom in and out. To remove the marking, click on the (x) (Figure 4).
[[IMAGE::Region_in_detail_Fig_6.png]]
Add or change data tracks using the configure this page tool button at the left of the page, the cog wheel icons in the images, or click on the track name itself. A list of tracks can be viewed in the configuration panel (Figure 5).
If you have added custom tracks or attached public track hubs, you can view, edit or add more in the Personal Data tab of the configuration panel.
Some tracks can be displayed in different styles; however, there is a limit on how much data can be displayed in certain styles. This FAQ explains the styles available and the amount of data that can be displayed.
[[MOVIE::600]]
", "ensembl_action": "View", "ensembl_object": "Location"} 5655 2008-08-13 14:45:47 254453 2022-01-27 12:57:27 live 0 0 600 movie genome, region, location, position, zoom, browse, scrollable, scroll, user data, custom data, upload track, attach data {"title": "The region in Detail View", "youtube_id": "sXZrtcZ-u7A", "youku_id": "", "list_position": "", "length": ""} 254453 2022-01-13 17:36:28 254453 2022-01-13 17:37:20 live \N \N 141 view comparative genomics, alignment, species, whole genome alignment, WGA, conserved, conservation, genome, Pecan, EPO, diagram, nucleotide, sequence, align, compara {"content": "The top panel is similar to the gene map shown in the Region in Detail view in the Location tab.
Whole genome alignments are shown graphically in the lower panel. Alignments themselves are drawn. Note: The Region Comparison view is a similar page that shows the chromosomes, scaffolds and contigs as they are. For example, gaps in the multi species comparison page are gaps in the genome assembly. Gaps in this align slice view may be gaps in the alignment.
Select the alignment at the top of this panel by using the "alignment" drop-down menu. Choose multi-species alignments across more than two species, or a pairwise alignment between two species.
Horizontal blue bars represent genomic sequence as in other Ensembl views. The filled or hollow horizontal brown bars are clickable, and represent the alignments. If a bar is filled, the forward strand of the chromosome was aligned. Hollow bars represent alignments using the reverse strand of the chromosome.
The brown background is there for contrast, and white vertical stripes are gaps in the alignment. Each panel shows this unique shading for a specific species. Different colours of vertical shading represent different chromosomes in the alignment.
Arrows (triangles) indicate breaks in the alignment. Click on an arrow for more information about the break.
Ancestral sequences are inferred from the multiple alignments. If present, the species used to determine the ancestral sequence are listed on the blue bar. For example, a blue bar labelled Hsap, Ptro, Mmul shows an ancestor of the Homo sapiens, Pan troglodytes, and Macaca mulatta genomes.
Customise the view by adding variations, gene sets, and other features using the configure this page link at the left.
To export alignments, follow the genomic alignments link and click on export data on that page.
Default track visibility will be switched off for gene tracks at larger scales, and for sequence tracks at larger scales still. The thresholds determining these default track visibility changes are set differently for each alignment, and will be displayed in an information box above Cactus image alignments for which tracks have been hidden.
Tracks will be hidden by default with the following limitations:
Tracks hidden in this way can be revealed by zooming in, or by enabling them directly via “Configure this page” or “Add/remove tracks”, by clicking “Genes and transcripts” in the sidebar menu of the “Configure Alignments Image” tab, selecting the “Species to configure” from the dropdown list, and clicking the checkbox of the tracks you would like to display.
Please note that enabling the tracks at very large scales may cause the image generation to fail due to a timeout error.
[[IMAGE::align_view.png width="400" height="300"]]
The image above shows a pairwise alignment between human and mouse. The mouse alignment was clicked on to show a pop-up box with information about that aligned region (on mouse chromosome 5).
", "ensembl_action": "Align", "ensembl_object": "Location"} 5655 2008-08-13 15:28:31 254482 2025-03-11 16:18:49 live 0 0 143 view gene, transcript, splice variant, diagram, structure, status, biotype, merged {"ensembl_action": "Summary", "ensembl_object": "Gene", "content": "This page gives an overview of the information available at the gene level and it's composed of three sections.
At the top, the page shows the gene name and Ensembl gene ID, the full description of the gene, its synonyms, its genomic location and strand, INSDC coordinates, and its number of transcripts.
The following sections show the Transcript Table and the Summary with links to external databases, and a Gene Diagram.
It shows each splice variant of a gene, i.e. protein-coding and non-coding transcripts. In addition to transcript and translation length, the transcript table displays information about biotype, mapped CCDS and RefSeq IDs, as well as MANE, APPRIS and TSL flags. This table is hidden by default. Each transcript is given an Ensembl Transcript ID, which is unique and stable.
It provides additional information and links to external databases:
It depicts the gene and all its transcripts in the context of the genome. The image can be configured to add or remove data tracks.
Transcripts are drawn as boxes for exons and connecting lines for introns. Filled boxes show coding sequence, and empty boxes show UTRs (untranslated regions). Transcripts drawn above the blue bar (i.e. the contig) are on the forward strand, whereas transcripts below are on the reverse strand.
Transcripts are represented by different colours:
Blue, pink or grey transcripts are noncoding. Go to the transcript summary help page for more information.
It is an indicator of biological significance for genes.
If a gene has been manually annotated (i.e. in human, mouse, zebrafish, pig, and rat), we use the biotypes assigned by the HAVANA team.
Biotypes can be grouped into protein coding, pseudogene, long noncoding and short noncoding.
"} 5655 2008-08-14 19:11:59 254419 2021-03-03 13:55:25 live 0 0 587 view LD, linkage, tag SNP, SNP, linkage disequilibrium, r2, D', variant, variation {"content": "Linkage Disequilibrium (LD) Calculator is a tool for calculating LD between variants using genotypes from a selected population. We only support LD calculation for variants for which we have genotypes from at least 40 samples in the selected population. At the moment we only have sufficient amounts of genotype data from the 1000 Genomes project for human.
\\r\\nThe LD Calculator supports three different types of calculations:
\\r\\n\\r\\n
Compute all pairwise LD values for all variants in a given region. The region is defined by the chromosome name, start and end.
\\r\\n[[IMAGE::LD1.png]]
\\r\\nCompute all pairwise LD values for a list of variants.
\\r\\n[[IMAGE::LD2.png]]
\\r\\nCompute all pairwise LD values for a given variant and all variants that are not further away from the given variant than the selected window size which centres around the input variant.
\\r\\n[[IMAGE::LD3.png]]
\\r\\nSelect the analysis you want to run.
\\r\\nYou can choose a name for your analysis, which will help to identify it in the list of all analyses that have been run.
\\r\\nChoose the species for which you want to calculate LD values. We only show species for which we store sufficient amounts of genotype data in VCF files.
\\r\\nSelect the population(s) for which you want to calculate LD values. We only show the populations for which we have sufficient amounts of sample genotype data in VCF files.
\\r\\nPaste data: simply copy and paste the contents of your file into the large text box
\\r\\nUpload data: click the \\"Choose file\\" button and locate the file on your system
\\r\\nProvide file URL: point to a file hosted on a publicly accessible address. This can be either an http:// or ftp:// address.
\\r\\nThe different LD calculations require different types of data input:
\\r\\nPairwise LD computation in a region: A newline-separated list of regions where a region is defined by a chromosome name, start and end, for example 6 47204841 48204841. You can paste no more than 20 regions. Each region needs to be smaller than 500000 bp.
\\r\\nPairwise LD computation for a list of variants: A newline-separated list of no more than 20 variant identifiers.
\\r\\nLD computation for a given variant and all variants that are not further away than a given window size: A newline-separated list of variant identifiers.
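The region input rules above can be checked before submitting a job. The following is a minimal sketch, not the code the LD Calculator itself uses: the function and constant names are hypothetical, and it assumes that region length is counted inclusively (end - start + 1). Note that under this reading the example region in the text (6 47204841 48204841) spans about 1 Mb and would be rejected by the 500000 bp limit.

```python
# Limits taken from the help text; all names here are hypothetical.
MAX_REGIONS = 20
MAX_REGION_LENGTH = 500000  # each region must be smaller than this, in bp


def parse_regions(text):
    '''Parse newline-separated regions of the form: chromosome start end.'''
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError('expected chromosome, start and end: ' + line)
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError('start is after end: ' + line)
        # Assumption: coordinates are inclusive, so length is end - start + 1.
        if end - start + 1 >= MAX_REGION_LENGTH:
            raise ValueError('region must be smaller than 500000 bp: ' + line)
        regions.append((chrom, start, end))
    if len(regions) > MAX_REGIONS:
        raise ValueError('no more than 20 regions are allowed')
    return regions
```

Calling parse_regions on the text pasted into the form returns a list of (chromosome, start, end) tuples, or raises ValueError naming the first line that breaks a rule.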
\\r\\nChoose a threshold value for r2. The value needs to be between zero and one. We only return results with an r2 value greater than or equal to the threshold value.
\\r\\nChoose a threshold value for D’. The value needs to be between zero and one. We only return results with a D’ value greater than or equal to the threshold value.
\\r\\nSet the window size if you want to compute all pairwise LD values for a given variant and all variants that are not further away from the given variant than the defined maximum distance. The maximum allowed window size is 500000 bp and centres around the input variant.
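To make the r2 and D’ thresholds more concrete, the sketch below computes both measures for one pair of variants from allele and haplotype frequencies, using the standard textbook definitions, and then applies the greater-than-or-equal rule described above. It is an illustration only: the function names are hypothetical, and the LD Calculator works from genotypes in VCF files, so its actual computation may differ.

```python
def ld_measures(p_a, p_b, p_ab):
    '''Return (r2, d_prime) for two biallelic variants.

    p_a and p_b are the frequencies of one allele at each variant;
    p_ab is the frequency of the haplotype carrying both alleles.
    '''
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError('both variants must be polymorphic')
    r2 = d * d / denom
    # D' rescales |D| by the largest value it could take for these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return r2, d_prime


def passes_thresholds(r2, d_prime, min_r2=0.0, min_d_prime=0.0):
    '''Keep a pair only if both values are at or above the thresholds.'''
    return r2 >= min_r2 and d_prime >= min_d_prime
```

Both measures fall between zero and one, which is why the thresholds must also lie in that range.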
\\r\\nOnce you run the job you'll be redirected to a table that lists jobs that are currently running or recently completed. A ticket ID is assigned to each job and additional information is provided, i.e. Analysis, Jobs and Submitted at (date and time). You can customise the table by showing/hiding columns. The status of the job is automatically refreshed every 10 seconds until it is complete.
\\r\\nWe generate a result table for each combination of selected population and, depending on the selected analysis, one of the following:
\\r\\nRegion
\\r\\nList of variants
\\r\\nVariant and all variants that are not further away than a selected window size
\\r\\nWe show the first ten rows of the result table as a preview. The complete table can be downloaded as a tab-delimited file by clicking on the download button below the preview table.
\\r\\nEach row in the export contains variant information and LD measures:
\\r\\n\\r\\n
Variant 1: Click the link to show further information about the variant
\\r\\nVariant 1 location: The location format is chromosome_name:start or chromosome_name:start-end if start and end position are not the same
\\r\\nVariant 1 consequence: Most severe consequence for the variant across all the features that overlap the variant.
\\r\\nVariant 1 evidence: Summary of the evidence supporting a variant.
\\r\\nVariant 2
\\r\\nVariant 2 location
\\r\\nVariant 2 consequence
\\r\\nVariant 2 evidence
\\r\\nr2
\\r\\nD’
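When post-processing the downloaded table, the location columns can be split into their parts. This is a minimal sketch that assumes only the two formats given above (chromosome_name:start and chromosome_name:start-end); the function name is hypothetical.

```python
def parse_location(location):
    '''Split a location string into (chromosome_name, start, end).

    A single-position location is returned with end equal to start.
    '''
    # Split on the last colon so that sequence names containing colons still work.
    name, _, position = location.rpartition(':')
    if not name or not position:
        raise ValueError('expected chromosome_name:start or chromosome_name:start-end')
    start_text, dash, end_text = position.partition('-')
    start = int(start_text)
    end = int(end_text) if dash else start
    return name, start, end
```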
\\r\\nThe Allele Frequency Calculator is a tool that allows you to retrieve frequency data for variants identified in the 1000 Genomes Project for a genomic region of interest.
\\r\\nWhen you reach the Allele Frequency Calculator web interface, you will be presented with a form to define the allele frequency data you want to retrieve.
\\r\\nName for this job (optional): naming each of your data requests with a unique name allows you to track and search the list of your submitted jobs.
\\r\\nSpecies: The Allele Frequency Calculator is based on population frequency data generated by the 1000 Genomes project, and is therefore only available for the human GRCh37 assembly, which is selected by default.
\\r\\nRegion Lookup: Define your genomic region of interest in the format chromosome#:Start_coordinate-End_coordinate e.g. 4:122868000-122946000.
\\r\\nChoose data collections or provide your own file URLs: Select the phase of the 1000 Genomes project for which you wish to retrieve frequency data.
\\r\\nSelect Phase 3 / Phase 1 populations: If you have selected either 'Phase 3' or 'Phase 1' from the 'Choose data collections or provide your own file URLs' section (above), you are now able to select the populations of the 1000 Genomes project for which you wish to retrieve frequency data. By default, 'ALL' is selected, which will return the frequency data for all the 1000 Genomes populations combined. You are also able to select one or more of the individual populations from the 1000 Genomes Project, to retrieve the frequency data for particular populations of interest.
\\r\\nNote: You can select 'ALL' and multiple populations in one query, which will return the frequency data for all 1000 Genomes populations combined, followed by the frequency data for each selected individual population.
\\r\\n[[IMAGE::AlleleFrequencyConverter1.png height=\\"252\\" width=\\"1104\\"]]
\\r\\nIf you have selected 'Provide file URLs' from the 'Choose data collections or provide your own file URLs' section (above), you are now able to define URLs pointing to files that contain the variation and frequency data you want the Allele Frequency Calculator to use in its calculation.
\\r\\nGenotype file URL: Define a URL pointing to a VCF file that contains the population genotypes.
\\r\\nSample-population mapping file URL: Define a URL that contains a file which lists all the individuals and the populations from which they come.
\\r\\n[[IMAGE::AlleleFrequencyConverter2.png height=\\"192\\" width=\\"1135\\"]]
\\r\\nOutput: The output of the calculator can be previewed on the web page and an output file can be downloaded. The header is:
\\r\\nCHR: Chromosome
POS: Start position of the variant
ID: Identification of the variant
REF: Reference allele
ALT: Alternative allele
TOTAL_CNT: Total number of alleles in samples of the chosen population(s)
ALT_CNT: Number of alternative alleles observed in samples of the chosen population(s)
FRQ: Ratio of ALT_CNT to TOTAL_CNT
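The relationship between the last three columns can be checked when reading the downloaded file. The sketch below is illustrative only: it assumes that the columns are separated by whitespace and appear in the order listed above, the function names are hypothetical, and the example row contains made-up values rather than real 1000 Genomes data.

```python
COLUMNS = ['CHR', 'POS', 'ID', 'REF', 'ALT', 'TOTAL_CNT', 'ALT_CNT', 'FRQ']


def parse_row(line):
    '''Turn one output row into a dictionary keyed by the header names.'''
    values = line.split()  # assumption: whitespace-separated columns
    if len(values) != len(COLUMNS):
        raise ValueError('expected 8 columns')
    row = dict(zip(COLUMNS, values))
    for key in ('POS', 'TOTAL_CNT', 'ALT_CNT'):
        row[key] = int(row[key])
    row['FRQ'] = float(row['FRQ'])
    return row


def alt_frequency(alt_cnt, total_cnt):
    '''FRQ is the ratio of ALT_CNT to TOTAL_CNT.'''
    return alt_cnt / total_cnt


# Made-up example row: 50 alternative alleles out of 200 observed alleles.
row = parse_row('4 122868100 example_variant A G 200 50 0.25')
```

For this row, alt_frequency(row['ALT_CNT'], row['TOTAL_CNT']) gives 50 / 200, which matches the FRQ value in the row.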
The Transcript Support Level (TSL) is a method to highlight the well-supported and poorly-supported transcript models for users. The method relies on the primary data that can support full-length transcript structure: mRNA and EST alignments supplied by UCSC and Ensembl.
"} \N \N 125915 2019-09-19 13:26:02 live 0 0 144 view transcript, splice variant, isoform, alternative splicing event, splice, transcription, splicing, alternatively spliced, exon, intron {"ensembl_object": "Gene", "content": "This view shows all spliced transcripts for a gene, including EST transcripts and ncRNAs (non-coding RNAs).
\\r\\nTranscript structures and colours are described in the gene help page. Ensembl transcripts are based on mRNA/cDNA and protein information from underlying databases such as UniProtKB and NCBI RefSeq. See the gene annotation documentation for more.
\\r\\nTranscripts are drawn as boxes (exons) and lines connecting the boxes (introns). Filled boxes represent coding sequence and unfilled boxes (or portions of boxes) represent UnTranslated Regions (UTR).
\\r\\nFor coding transcripts (gold or red transcripts), protein motifs and domains are shown in purple. Click on a domain (purple block) to see more information such as amino acid positions and links to individual records. These motifs and domains come from various databases listed at the left of the view, for example ProSite and Superfamily.
\\r\\nVertical brown highlights show exon positions so that all exons across transcripts may be compared.
", "ensembl_action": "Splice"} 5655 2008-08-18 16:16:30 106005 2019-09-19 13:26:02 live 0 0 145 view genebuild, annotation, supporting evidence, uniprot, refseq, gene, evidence, data, experimental evidence, experimental data {"ensembl_object": "Gene", "content": "Ensembl genes and transcripts result from alignment of protein and mRNA sequences to the genome (the Ensembl genebuild). Protein and mRNA sequences used to determine the Ensembl transcripts can be viewed in this page.
\\r\\nCoding sequence (CDS) evidence is shown for each transcript in the gene. Strong evidence includes the Consensus Coding Sequence (CCDS) set, UniProtKB/Swiss-Prot entries, and the NCBI RefSeq set (IDs beginning with NP). If UnTranslated Sequence (UTR) is annotated, the evidence is shown in the table. More aligned sequences supporting a transcript can be found by clicking 'view evidence' next to a specific transcript.
", "ensembl_action": "Evidence"} 5655 2008-08-18 16:24:18 106005 2019-09-19 13:26:02 live 0 0 146 view {"ensembl_object": "Transcript", "content": "Supporting Evidence for a Transcript
Ensembl genes and transcripts result from alignment of protein and mRNA sequences to the genome (the Ensembl genebuild). Protein and mRNA sequences used to determine an Ensembl transcript can be viewed in this page.
Exons at the top represent the Ensembl transcript. Filled boxes indicate coding sequence, and unfilled, empty boxes at the ends of the transcript are UnTranslated Regions (UTR).
The mRNA and proteins that were used in the initial alignment phase of the Ensembl genebuild are shown, if they support one or more exons. Proteins are in yellow, mRNA is in green, and EST alignments are in purple. Clicking on any exon box shows links to the protein, mRNA or EST record, and to the alignment with the Ensembl transcript.
Note the supporting evidence shows the mRNA and proteins used at the time of the genebuild, which can be found on the species homepage. To find a more current set of matching mRNA and protein sequences to the Ensembl transcript, visit the General identifiers view.
", "ensembl_action": "Evidence"} 5655 2008-08-18 16:30:46 5655 2019-09-19 13:26:02 dead 0 0 147 view Family, protein, JalView, Markov, comparative genomics, homology, homolog, paralogs, paralogy, homologue, orthologue, paralogue, gene, {"ensembl_object": "Gene", "content": "Protein Families are groups of proteins with high sequence similarity. They result from classifying all Ensembl proteins and all the metazoan proteins from UniProtKB (SwissProt and TrEMBL) against the TreeFam HMM library. Clusters are then aligned with Mafft.
\\r\\nThe consensus annotation column shows the family name, which is assigned if over 40% of the members have the same name. For more on this naming, see the article.
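The 40% naming rule can be illustrated as follows. This is only a sketch of the rule as stated above, not the Ensembl pipeline itself; the function name, and the choice to return None when no name qualifies, are assumptions.

```python
from collections import Counter


def consensus_name(member_names, min_fraction=0.4):
    '''Return the most common member name if over 40% of members share it.'''
    if not member_names:
        return None
    name, count = Counter(member_names).most_common(1)[0]
    # The rule says over 40%, so exactly 40% does not qualify.
    if count / len(member_names) > min_fraction:
        return name
    return None
```

For example, a family in which three of five members share a name (60%) takes that name, while a family in which each member has a different name gets no consensus name.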
\\r\\nTo view protein alignments in the JalView alignment editor, click on the JalView links at the right of the page.
\\r\\n", "ensembl_action": "Family"} 5655 2008-08-18 17:02:59 120463 2019-09-19 13:26:02 live 0 0 148 faq FAQ, clones, 129/AB2, BAC, configuration, Clones, 129/AB2, BAC, configuration, DAS, external data, MICER {"question": "
How do I view clone sets, such as BACs?
", "answer": "About clones
\\r\\n\\r\\n
Turn on clone tracks using the configure this page menu at the left of the region in detail or region overview pages under the location tab.
\\r\\nClones are found in the menu: Misc. regions & clones. Select one or more sets of clones, then click SAVE and close. Clicking on a clone drawn in the region in detail or region overview view displays the accession number from the EMBL database.
\\r\\nExport clones using the export data link at the left.
\\r\\nThe international BAC clone nomenclature is described here.
\\r\\n\\r\\n
Ordering clones
\\r\\nEnsembl does not have clones for sale; however, there are several sources for ordering clones on the web. Try the clone registry. Individual libraries can be found here. Clones can also be ordered from imaGenes, C.H.O.R.I., and Geneservice. Clones from the Sanger Institute can be ordered here.
", "category": "z_data"} 5655 2008-08-19 11:45:14 125915 2019-09-19 13:26:02 live 1 0 149 view Variation, polymorphism, allele, SNP, insertion-deletion, indel, dbSNP, synonymous, non-synonymous, mutation, locus, loci, single nucleotide polymorphism {"ensembl_object": "Gene", "content": "Short sequence variants are displayed for each transcript in a gene in the Variation Image.
\\r\\nShort sequence variants for a gene are shown graphically. These are displayed as vertical lines, colour-coded according to the position of the variation in a transcript. These colours are described in the legend at the bottom of the graphic.
\\r\\nAfter the row showing all SNPs in the region, all transcripts in a gene are drawn. Red and gold transcripts are protein-coding, while blue transcripts are non-coding.
\\r\\nEach transcript is expanded to fill the display. See Transcript 1 in the diagram. Underneath each transcript are variations. If the variation is in the coding region, a coloured box will show any possible amino acids, such as the variation coding for alanine, in the diagram. Click any box for more information.
\\r\\nUnderneath the variations for a transcript, protein domains from various databases are drawn. The domains are also shown in the transcript tab, protein summary link at the left of transcript pages. Variations are traced through the protein domains using a line in the appropriate colour (see legend at the bottom of the diagram).
\\r\\nScrolling down past variations and domains for each transcript, all possible variations in the view are shown as boxes. If space is available, the nucleotide alleles are displayed. Click any empty box for the alleles, and a link to more information about the variation (variation properties).
\\r\\nTo simplify the image, choose only one or two variation consequences by configuring the display using the configure this page link at the left. This allows the consequence and variation source to be changed, and the intron context to be altered (i.e. if intronic variations are drawn, and the distance from an exon they must be to be shown). Note, this will also affect the variation table.
\\r\\nIf you would like to zoom in, we suggest you turn on the Sequence variants track using configure this page in Region in Detail.
\\r\\n[[IMAGE::variation_image_help.png width=\\"582\\" height=\\"575\\"]]
", "ensembl_action": "Variation_Gene"} 5655 2008-08-19 12:57:56 106005 2019-09-19 13:26:02 live 0 0 393 movie tutorial, video, RNASeq, genebuild, annotation {"list_position": 6, "youtube_id": "7hfHta6HF4w", "title": "RNASeq in the Genebuild", "youku_id": "XNDgxMzA0ODY4", "length": "6.48"} 106005 2012-11-22 16:46:29 5655 2019-09-19 13:26:02 live 0 0 379 view tree, comparative genomics, compara, orthologue, ortholog, paralogue, paralog, homologue, homolog, orthologues, orthologs, paralogues, paralogs, homologues, homologs, paralogy, orthology, homology, species, conserved, evolution, duplication, deletion {"content": "The Gene Gain/Loss tree summarises the phylogenetic history of a Ensembl gene-family by showing gene gain events (expansions) and gene loss events (contractions) over time.
\\r\\nA red branch on the tree indicates a significant expansion of the gene at that point in its history, a green branch denotes a contraction and a grey branch indicates that there was no significant change.
\\r\\nThe numbers at each node refer to the number of different genes in the ancestral species (as predicted with the CAFE1 tool). The colour of each node reflects the number of members (or genes) coloured according to the legend below.
\\r\\nThe species names are coloured; red (species for the current gene of interest), black (species with current genes in Ensembl in this tree) and grey (species with no current genes in this tree).
\\r\\nFor ncRNA genes, where available, only a selected tree consisting of human, chimp, marmoset, mouse, zebrafinch and zebrafish species is displayed. Note, in these trees, the species of interest may not be present.
\\r\\nClick on any node for a pop-up menu of information, as shown in the image below.
\\r\\n[[IMAGE::gainlosstreenode.png width=\\"463\\" height=\\"166\\"]]
\\r\\nA branch will only be marked as an expansion or contraction if the p-value as determined by CAFE is <0.01. The p-value is shown in the case of an expansion or contraction (compared with the previous node). If no significant change has occurred, the p-value shown is 1-(p-value).
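The branch colouring rule can be summarised in a few lines. This is a hedged sketch of the rule as described on this page, not the code used to draw the tree: the function name is hypothetical, and it assumes that the direction of change is read from the gene counts at a node and at its previous (parent) node.

```python
SIGNIFICANCE_THRESHOLD = 0.01  # p-value cut-off quoted in the text


def branch_colour(parent_count, node_count, p_value):
    '''Return red for an expansion, green for a contraction, grey otherwise.'''
    if p_value < SIGNIFICANCE_THRESHOLD:
        if node_count > parent_count:
            return 'red'    # significant expansion
        if node_count < parent_count:
            return 'green'  # significant contraction
    return 'grey'           # no significant change
```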
\\r\\nClick on the icons at the top left to change what you can see.
\\r\\n[[IMAGE::genegainlossicons.png height=\\"305\\" width=\\"479\\"]]
\\r\\nA minimal tree only displays species which have the gene, whereas a full tree displays all species.
\\r\\n1. CAFE links:
\\r\\nArticle: CAFE: a computational tool for the study of gene family evolution (Hahn et al.)
\\r\\nLab page: Hahn lab
"} 5655 2012-09-21 09:20:43 120522 2019-09-19 13:26:02 live 0 0 150 view gene, old, archive, ID, change, history, retire, past, previous {"ensembl_object": "Gene", "content": "Ensembl stable gene, transcript, and protein identifiers are kept the same throughout Ensembl releases unless the gene or transcript model changes dramatically. In this case, the old stable identifier may be retired and a new one assigned (or two identifiers may be merged into one). The Ensembl Archive tracks all stable identifiers and should provide mappings to the current gene, transcript, and protein set.
\\r\\nThe Ensembl ID is listed, and the status is current if the ID can be found in the current release. The latest version in which the gene, transcript, or protein was found is listed in case the ID has been retired.
\\r\\nFind old IDs in the current Ensembl version by clicking on the \\"Tools\\" link in the header at the top of the page. Follow the link to the ID History Converter.
\\r\\nAn ID History Map shows Ensembl release numbers, genomic assemblies, and versions for Ensembl IDs in a horizontal comparison. Small squares or nodes correspond to the ID shown on the left, and represent an update in the version of the ID. Versions are updated if there has been a change in the gene, transcript, or protein model. Nodes (squares) are connected by a line if the versions are related. This line reflects the score of how well the versions match, for recent releases. If a score is not calculated, the line will be grey (unknown score).
", "ensembl_action": "Idhistory"} 5655 2008-08-19 13:32:25 120463 2019-09-19 13:26:02 live 0 0 151 view gene, transcript, splice variant, diagram, structure, cDNA, spliced, splicing, intron, exon, alternative splicing, alternatively spliced {"ensembl_object": "Transcript", "ensembl_action": "Summary", "content": "Annotation views are separated into gene-based views and transcript-based views according to which level the information is more appropriately associated with. This view is a transcript level view. To flip between the two sets of views you can click on the Gene and Transcript tabs in the menu bar at the top of the page.
The table shows all splice variants for a gene and includes noncoding transcripts. Each transcript ID includes a unique, stable 11-digit number. Transcripts beginning with ENST are human transcripts (for example, ENST00000369985). A three-letter code is inserted for other species (for example, ENSMUST defines a mouse transcript).
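The ID layout described above can be expressed as a simple pattern. The sketch below assumes exactly that layout (ENS, an optional three-letter species code, T, then 11 digits) and does not handle version suffixes or any other ID styles; the function name is hypothetical and the mouse ID in the usage note is a made-up number used for illustration.

```python
import re

# ENS + optional three-letter species code + T + 11 digits.
TRANSCRIPT_ID = re.compile('^ENS([A-Z]{3})?T([0-9]{11})$')


def split_transcript_id(stable_id):
    '''Return (species_code, number); species_code is None for human IDs.'''
    match = TRANSCRIPT_ID.match(stable_id)
    if match is None:
        raise ValueError('not a transcript ID of the expected form: ' + stable_id)
    return match.group(1), match.group(2)
```

With this pattern, ENST00000369985 has no species code (human), while an ID such as ENSMUST00000000001 yields the code MUS.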
If the transcript is a member of the Consensus CoDing Sequence set, the CCDS ID is listed in the transcript table.
The transcript which you are viewing is highlighted in blue in the table.
Immediately below the transcript table, you will find additional information about the transcript you are viewing. This includes:
Boxes are exons. Lines connecting the boxes are introns. Filled boxes are coding sequence, and empty, unfilled boxes are UTR (UnTranslated Region).
Gold (merged) transcripts and those with a CCDS are both reviewed, high quality transcripts in human and mouse.
Depending on factors such as cell type or tissue type, you may need to use one or more of the transcripts not in these 'reviewed' sets (i.e. not with a CCDS ID, nor in the merged set). The general identifiers link at the left of the transcript tab shows matching IDs in other databases, and may help you decide on transcripts. ESTs and expression data from various projects can be turned on in the Location tab, Region in Detail view. This may be of use when determining which transcript set to use.
Most of our non-coding transcripts (e.g. nonsense mediated decay, processed transcript) are annotated by the VEGA/Havana project, and are blue, pink, or grey. Descriptions can be found in the VEGA/Havana website or in the Ensembl glossary.
For more detail on Ensembl annotation, see articles listed here.
If the transcript contains a variation whose alternative allele has a population frequency of at least 10% and is causing the loss or gain of a stop codon in a HapMap population, the variation and affected populations are listed.
"} 5655 2008-08-19 14:33:56 254419 2020-06-05 11:35:12 live 0 0 597 faq metadata {"category": "data", "question": "Is there a list of all species and corresponding metadata available in Ensembl Genomes?
", "answer": "We provide a summary of the metadata on the FTP site. The metadata can be downloaded as txt, json and xml.
", "division": ["bacteria", "fungi", "metazoa", "plants", "protists"]} 120522 2020-08-21 09:47:54 128249 2023-05-20 21:08:12 live \N \N 586 view reactome, pathway, interaction, gene, protein, {"content": "Gene Pathway
\\r\\n\\r\\n
This page displays the biochemical pathways that this gene is involved in, from Reactome. The data and interface come in directly from Reactome and are not interpreted or processed by Ensembl.
\\r\\nFor a legend of the reactome diagram, click on the compass icon.
\\r\\n[[IMAGE::pathway_navigation.png]]
\\r\\nFor more information about the pathway analysis and how to use this interface, click on the i icon at the left of the diagram.
\\r\\n\\r\\n
[[IMAGE::pathway_info.png]]
"} 106005 2017-07-26 13:41:40 106005 2019-09-19 13:26:02 live 0 0 152 faq genes, genebuild, transcript, ENSG, Ensembl identifiers, annotation, Havana {"question": "What is the difference between Ensembl, Havana and Ensembl/Havana merged transcripts? And what does known and novel mean?
", "answer": "For human, mouse, zebrafish, rat and pig, Ensembl not only shows transcripts that are annotated automatically using the Ensembl genebuild pipeline, but also transcripts that are manually annotated by the HAVANA team. If the Ensembl and Havana annotation agree with each other the transcripts are combined into an Ensembl/Havana merged transcript. When a transcript is only annotated by Ensembl or Havana it is named an Ensembl or Havana transcript, respectively. Transcripts that do match a species-specific entry in the UniProtKB/Swiss-Prot or RefSeq databases are categorised as known, those that do not as categorised as novel. For more detailed information, please have a look at our genebuild documentation.
", "category": "genes"} 1849 2008-08-20 15:21:07 \N 2019-09-19 13:26:02 live 0 0 196 view BLAST, BLAT, align, sequence, search {"ensembl_object": "Location", "content": "BLAST is an alignment program that determines sequence identity between a query sequence and a large set of target sequences. The Ensembl installations of BLAST and BLAT allows you to align a protein or nucelotide sequence to any genome in Ensembl.
\\r\\nWhat is BLAT? BLAT, the BLAST-like Alignment Tool, quickly finds alignments to DNA sequences. It is not as flexible as BLAST, in that you need an exact or nearly-exact match to see a hit. As it is fast, it is the default alignment program in the Ensembl page when the query and target sequences are both nucleotide.
\\r\\nBLAST, the Basic Local Alignment Search Tool, allows searching more distantly related sequences. Ensembl uses the Washington University School of Medicine WU-BLAST 2.0 implementation for its sequence similarity search options.
\\r\\nUsing Ensembl BLAT/BLAST. Once you enter your query sequence (as FASTA, or as an ID in the sequence ID or accession box), and parameters, click RUN. BLAT will immediately give results. In the case of BLAST, you will see an intermediate page. Click Retrieve periodically until the View Results button appears.
\\r\\nClick View Results to show the final, formatted results.
\\r\\nPaste in a sequence in FASTA format, making sure no line numbers are included. Alternatively, upload a sequence file from a public sequence database such as UniProt, EMBL or NCBI RefSeq simply by typing in the sequence accession number.
\\r\\nYou may also enter a ticket or job identifier from a previous BLAST search. However, these are only saved for one week, or one month if you have logged in to the Ensembl website.
\\r\\nClick the appropriate button to specify whether the query sequence is protein or nucleotide.
\\r\\nFor any genomic sequence in Ensembl, select latest_GP. Masked genomes have been run through the RepeatMasker program. More than one organism may be selected with the Ctrl key. You may also select a cDNA library, all or ab initio. The all option accesses Ensembl transcripts, which are based on protein and mRNA information. Ab initio will show possible cDNAs based on the sequence alone; these are predictions.
\\r\\nSimilarly, all peptides refers to the Ensembl peptides. The ab initio peptides are merely predictions.
\\r\\nBLAT can be chosen for nucleotide queries against nucleotide databases. The following BLAST options appear:
\\r\\nOptions are described in reference 2 at the bottom of this article:
\\r\\nOnce everything is set, click RUN to start the search, or customise parameters first.
\\r\\nUse the configure button to alter the default parameters.
\\r\\nW - Word size for seeding alignments\\r\\nwink - Step-size for sliding-window used to seed alignments\\r\\nT - Neighborhood word threshold score - not blastn\\r\\nhitdist - Max distance between words for two-hit seeding\\r\\n One-hit seeding by default\\r\\nM - Match score - blastn only\\r\\nN - Mismatch score - blastn only\\r\\nmatrix - BLOSUM scoring matrix - not blastn\\r\\nQ - Cost of first gap character\\r\\nR - Cost of second and remaining gap characters\\r\\nnogap - Turn off gapped alignments\\r\\nX - Alignment extension cutoff\\r\\n\\r\\n+---------+-------------------+----------------------------------+\\r\\n| | W |wink| T |hit-| M | N |mat-| Q | R |no- | X |\\r\\n| | | | |dist| | |rix | | |gap | |\\r\\n|---------+----+----+----+----+----+----+----+----+----+----+----+\\r\\n| BLASTN |\\r\\n| exact | 15 | 15 | . | 0 | 1 | -3 | . | 10 | 10 | 1 | 5 |\\r\\n| low | 15 | 1 | . | 0 | 1 | -3 | . | 3 | 3 | 0 | ? |\\r\\n| oligo | 11 | 1 | . | 0 | 1 | -3 | . | 3 | 3 | 0 | ? |\\r\\n| medium | 11 | 1 | . | 0 | 1 | -1 | . | 2 | 1 | 0 | ? |\\r\\n| high | 9 | 1 | . | 0 | 1 | -1 | . | 2 | 1 | 0 | ? |\\r\\n| default | 11 | 1 | . | 0 | 5 | -4 | . | 10 | 10 | 0 | ? |\\r\\n+---------+----+----+----+----+----+----+----+----+----+----+----+\\r\\n| BLASTP |\\r\\n| TBLASTN |\\r\\n| exact | 6 | 1 |999 | 0 | . | . | 80 | 9 | 2 | 0 | ? |\\r\\n| low | 4 | 1 | 16 | 40 | . | . | 80 | 9 | 2 | 0 | ? |\\r\\n| oligo | 4 | 1 | 16 | 0 | . | . | 80 | 9 | 2 | 0 | ? |\\r\\n| medium | 3 | 1 | 15 | 40 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n| high | 3 | 1 | 15 | 0 | . | . | 45 | 9 | 2 | 0 | ? |\\r\\n| default | 3 | 1 | 11 | 0 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n+---------+----+----+----+----+----+----+----+----+----+----+----+\\r\\n| BLASTX |\\r\\n| TBLASTX |\\r\\n| exact | 6 | 1 |999 | 0 | . | . | 80 | 9 | 2 | 1 | 10 |\\r\\n| low | 4 | 1 | 20 | 40 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n| medium | 4 | 1 | 20 | 0 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n| high | 3 | 1 | 15 | 40 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n| default | 3 | 1 | 12 | 0 | . | . | 62 | 9 | 2 | 0 | ? |\\r\\n+---------+----+----+----+----+----+----+----+----+----+----+----+\\r\\n\\r\\n
Initially, the RESULTS page shows a ticket ID for the current query. Results are stored on our server for one week, so that they can be accessed later with this ID or a bookmark to the results page. Click the Retrieve button to see the status of the current query.
\\r\\nWhen a search is complete, its status will change from Job Queued to Parsing Results. After minutes to an hour, clicking the Retrieve button will cause the Raw Results link to appear, and eventually the View Results button. Click View Results for the formatted BLAST hits.
\\r\\nBoth BLAT and BLAST show results distributed on a Karyotype, if it is available for the species, showing hit or match locations of HSPs, high scoring pairs. Hits are shown as arrows, and the best hit is boxed.
\\r\\nThe diagram in the centre of the results page shows the query sequence as a chain of black and white boxes, and hits as red filled boxes. A Summary Table at the bottom of the page will list all hits in order of low to high score, but this can be customised. Links in front of each row, showing one BLAST/BLAT match, show:
\\r\\n1) BLAT - The BLAST-Like Alignment Tool
\\r\\nW. James Kent
\\r\\nGenome Res. 2002 Apr;12,4:656-664.
\\r\\n\\r\\n2) BLAST
\\r\\nJoseph Bedell, Ian Korf and Mark Yandell
\\r\\nO'Reilly & Associates, 2003
\\r\\n", "ensembl_action": "BLAST"} 5655 2009-01-26 14:11:59 106005 2019-09-19 13:26:02 draft 0 0 182 view user, upload, DAS, custom, data, external, own, personal, view, annotation, display {"ensembl_object": "Gene", "content": "Databases and projects external to Ensembl can be shown here. Click Configure this page to choose what information to view, related to your gene of interest. This page uses DAS or the Distributed Annotation System to show biological annotation from other sources. You can upload your own data (click on Configure this page, use the Custom Data tab).
", "ensembl_action": "ExternalData"} \N \N 106005 2019-09-19 13:26:02 live 0 0 202 view comparative genomics, alignment, species, whole genome alignment, WGA, conserved, conservation, genome, Pecan, EPO, diagram, nucleotide, sequence, align, compara {"ensembl_object": "Location", "content": "Whole genome aligments include \\"pairwise\\" sequence alignments between two species, and multi-species alignments using genomes of more than two species.
\\r\\nExport alignments with the \\"export data\\" link in the left hand navigation column of the \\"Genomic alignments\\" page.
\\r\\nOnly one species is shown by default. Click on select an alignment at the top of the sequence in order to choose an alignment to view.
\\r\\nChromosomes and scaffolds in the alignment are listed for each species. Sequence is shown under these coordinates. Red, highlighted nucleotides are located in exons.
\\r\\nClick on configure this page at the left of the view to add or change the display. Customisable options are as follows:
\\r\\nNote: Ancestral sequences may be turned off using configure this page.
\\r\\nIf the sequence is not known, nucleotides are replaced by dots.
\\r\\n[[IMAGE::text_alignments.png width=\\"400\\" height=\\"300\\"]]
\\r\\nThe image above shows the multiple alignments for five catarrhini primates. Conserved nucleotides have been turned on, and are shown by blue highlighting. Variants and line numbering (relative to the coordinate system) are also selected by clicking on the configure this page button.
\\r\\n[[MOVIE::222]]
", "ensembl_action": "Compara_Alignments"} 5655 2009-04-20 13:47:46 120463 2019-09-19 13:26:02 live 0 0 155 view sequence, exon, intron, transcript, DNA, nucleotide, base, bp {"content": "Exons, introns and flanking sequence are shown for one transcript (ENST...) in the 5' to 3' direction, regardless of whether it is a forward or reverse-stranded gene.
Change flanking sequence, view all intronic sequence, and/or turn on variations by clicking on the Configure this page tool button at the left of the view. Turn off columns using the Show/hide columns button at the top of the table. Export for use in Microsoft Word using the Download view as RTF button at the left of the view.
Exons - Uppercase letters
Flanking sequence and introns - lower case letters
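The case convention above makes it possible to separate exons from surrounding sequence after copying or exporting the text. The following is a minimal Python sketch, assuming the sequence has been saved as a plain string with the case preserved; the function name split_by_case and the example sequence are hypothetical and not part of Ensembl.

```python
from itertools import groupby

def split_by_case(seq):
    # Group consecutive characters by case: uppercase runs are treated as
    # exons, lowercase runs as introns or flanking sequence.
    return [
        ('exon' if is_upper else 'intron_or_flank', ''.join(chars))
        for is_upper, chars in groupby(seq, key=str.isupper)
    ]

# Hypothetical sequence, for illustration only.
for kind, segment in split_by_case('acgtACGTaaGG'):
    print(kind, segment)
```

Lowercase runs cannot be told apart as intron or flanking sequence by case alone, so the sketch reports them under one label.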
To BLAST sequence, select your sequence of interest with your mouse then click on the pop-up button.
[[IMAGE::BLAST_sequence_button.png height="30" width="190"]]
[[MOVIE::213]]
", "ensembl_action": "Exons", "ensembl_object": "Transcript"} 5655 2008-09-12 14:47:01 254453 2023-09-07 12:45:24 live 0 0 156 view \N {"ensembl_object": "Transcript", "content": "One protein belonging to a gene is shown below. For more splice variants, go to the Gene Summary view by clicking on the Gene tab.
\\r\\nHuman protein IDs in Ensembl begin with ENSP. A three-letter code is inserted for other species (for example, ENSMUSP... is a mouse protein). For information about colours and sources of transcripts, see the Gene Summary help. For more detail, see articles listed here.
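The ID convention described above can be checked programmatically. This is a minimal Python sketch, assuming an ID consists of ENS, an optional three-letter species code, one feature letter and a run of digits; the pattern, the feature-letter table and the function name parse_stable_id are illustrative assumptions, not an official Ensembl parser.

```python
import re

# Assumed pattern: 'ENS', optional three-letter species code, one feature
# letter, then digits. Version suffixes are not handled in this sketch.
STABLE_ID = re.compile('^ENS([A-Z]{3})?([GTPE])([0-9]+)$')
FEATURES = {'G': 'gene', 'T': 'transcript', 'P': 'protein', 'E': 'exon'}

def parse_stable_id(stable_id):
    match = STABLE_ID.match(stable_id)
    if match is None:
        return None
    species_code, feature, number = match.groups()
    # species_code is None when no code is present (human IDs).
    return {
        'species_code': species_code,
        'feature': FEATURES[feature],
        'number': number,
    }

print(parse_stable_id('ENSMUSP00000000001'))
print(parse_stable_id('ENSP00000369497'))
```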
\\r\\nThe protein is displayed graphically as a long purple bar. Domains and variations (synonymous and non-synonymous single nucleotide polymorphisms (SNPs)) are mapped along the protein. Click on a coloured line corresponding to a protein domain or motif to see its position in the primary sequence.
\\r\\nMapped domains are taken from the following sources:
\\r\\nOther motifs are annotated using:
\\r\\nCoiled Coil Regions - The Ensembl analysis and annotation pipeline uses the ncoils program implemented by R.B. Russell and A.N. Lupas for coiled-coil domain characterisation and annotation. Rob Russell's group at EMBL Heidelberg provides a public service.
\\r\\nLupas A, Van Dyke M and Stock J.
Predicting coiled coils from protein sequences.
Science. 1991 May 24;252(5010):1162-1164.
[PubMed]
Low-Complexity Regions - Low complexity regions are annotated with the SEG program.
\\r\\nWootton, J. C. and S. Federhen
Statistics of local complexity in amino acid sequences and sequence databases.
Computers in Chemistry 1993; 17:149-163.
doi:10.1016/0097-8485(93)85006-X
Wootton, J. C. and S. Federhen.
Analysis of compositionally biased regions in sequence databases.
Methods in Enzymology 1996; 266: 554-571.
doi:10.1016/S0076-6879(96)66035-2
Signal Sequence Regions - are characterised with SignalP.
\\r\\nNielsen H, Engelbrecht J, Brunak S, von Heijne G.
Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
Protein Eng. 1997 Jan;10(1):1-6.
[Abstract] [Full Text PDF]
Nielsen H, Krogh A.
Prediction of signal peptides and signal anchors by a hidden Markov model.
In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors
Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 122-130, Menlo Park, CA, 1998. AAAI Press.
[PubMed]
Bendtsen JD, Nielsen H, von Heijne G, Brunak S.
Improved prediction of signal peptides: SignalP 3.0.
J Mol Biol. 2004 Jul 16;340(4):783-795.
doi:10.1016/j.jmb.2004.05.028
Transmembrane Regions - Ensembl uses TMHMM for the annotation of transmembrane helices.
\\r\\nA. Krogh, B. Larsson, G. von Heijne, and E. L. L. Sonnhammer.
Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.
Journal of Molecular Biology, 305(3):567-580, January 2001.
doi:10.1006/jmbi.2000.4315
E. L.L. Sonnhammer, G. von Heijne, and A. Krogh.
A hidden Markov model for predicting transmembrane helices in protein sequences.
In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors
Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press.
[PubMed]
A karyotype, displayed in this page, is available for some species in Ensembl. Images are imported from various sources, depending on the species. Dark and light bands reflect heterochromatin and euchromatin staining.
\\r\\nClick on a location within the karyotype to zoom in to one specific chromosome, or a genomic region.
\\r\\nStatistics are shown below the karyotype, as follows:
\\r\\nFor the remaining statistics, see this help page.
\\r\\nAdd your own data to this display. Click on the Custom Tracks button at the left of the view. Click the Features on Karyotype link at the left of the window that opens. Type in IDs (such as Gene IDs like ENSG00000012048, ENSG00000139618, ENSG00000198668) or names (such as Gene names like BRCA1, BRCA2, CALM) into the box. Click the Show features button to view them along the karyotype. Click on the features to get more information from your input file.
\\r\\nYou can also view custom tracks, such as BED and BigWig files on the karyotype. This page provides more information about uploading your own data to Ensembl. Add your data then Configure this page in the karyotype view to turn on your tracks.
", "ensembl_action": "Genome"} 5655 2008-09-03 14:53:14 120463 2019-09-19 13:26:02 live 0 0 164 view resequencing, alignment, individual, personal genomes, strain, sequence, variation, variants, allele, loci, differences {"ensembl_object": "Location", "content": "This view compares sequences across strains of mouse and rat.
\\r\\nThe first line of sequence is the reference genome. Subsequent lines of sequence are genomes for other samples or strains.
\\r\\nA vertical bar signifies that the sample or strain has the same nucleotide as the reference sequence, and that genotype or read coverage data is available to confirm this.
\\r\\nA dot means that the sample's sequence is inferred to be the same as the reference at non-variant sites (i.e. the sequence between the variants with genotype calls). There is no explicit data (either genotype or read coverage) to confirm this at the non-variant sites; genotype calls exist only at known variant sites (e.g. from the 1000 Genomes Project).
\\r\\nA tilde, or squiggle, signifies a lack of resequencing coverage at that position.
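The three symbols can be resolved against the reference line to recover a full sequence for a sample or strain. This is a minimal Python sketch, assuming both lines have been copied as equal-length strings; the function name expand_strain_line, the example sequences, and the choice to write N where coverage is missing are all assumptions made for illustration.

```python
def expand_strain_line(reference, strain_line):
    # Resolve display symbols against the reference, position by position.
    bases = []
    for ref_base, symbol in zip(reference, strain_line):
        if symbol in '|.':
            # Same as reference: confirmed ('|') or inferred ('.').
            bases.append(ref_base)
        elif symbol == '~':
            # No resequencing coverage; N is an arbitrary placeholder here.
            bases.append('N')
        else:
            # Any other character is taken as the nucleotide in the sample.
            bases.append(symbol)
    return ''.join(bases)

# Hypothetical reference and strain lines.
print(expand_strain_line('ACGTAC', '|.T~.|'))
```

The sketch discards the distinction between confirmed and inferred matches, so it is unsuitable where that difference matters.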
\\r\\nThe alignment can be customised using the Configure this page option to show the actual nucleotide rather than a vertical bar or a dot. In the configuration window, select the matching basepairs option and show all. This will reveal the nucleotide sequence, instead of dots/vertical bars in positions that have the same sequence as the reference genome.
\\r\\nThe sequence can also be marked-up with positions of exons, coding start and stop positions, and variations using Configure this page.
\\r\\nOptions to mark-up the sequence include:
\\r\\nExample in mouse.
\\r\\nEnsembl genes, transcripts, and proteins are matched to sequences and information in other biological databases. The matches are referred to as external references, or Xrefs, in the Ensembl API.
\\r\\nSequence matches to databases such as UniProtKB and NCBI RefSeq are shown in this view. Clicking on the matching ID will bring you to the record in the external database.
\\r\\nTarget %id indicates the percentage of the target (external sequence) that matches to the Ensembl transcript or protein.
\\r\\nQuery %id indicates the percentage of the query (Ensembl transcript or protein) that matches to the external sequence.
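The two percentages differ only in which sequence length is used as the denominator. The following Python sketch shows the arithmetic under the assumption that each value is the number of matching positions divided by the length of the respective sequence; the exact calculation used by Ensembl is not stated here, and the function name and numbers are hypothetical.

```python
def percent_ids(matched, query_length, target_length):
    # Query %id: share of the Ensembl sequence covered by matches.
    query_pct = 100 * matched / query_length
    # Target %id: share of the external sequence covered by matches.
    target_pct = 100 * matched / target_length
    return query_pct, target_pct

# Hypothetical example: 450 matching positions between a 500-residue
# Ensembl protein and a 600-residue external protein.
print(percent_ids(450, 500, 600))
```

A high Query %id with a low Target %id would suggest, under this assumption, that the Ensembl sequence matches only part of a longer external sequence.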
\\r\\nThe align link shows the sequence alignment between the Ensembl transcript and the sequence match in the external database.
\\r\\nIDs in other databases may also be extracted for one or more genes at a time using BioMart. See a tutorial video here.
", "ensembl_action": "Similarity"} 5655 2008-09-15 16:47:39 106005 2019-09-19 13:26:02 live 0 0 159 view gene ontology, GO {"ensembl_object": "Transcript", "content": "The Gene Ontology (GO) consortium assigns biological processes to genes. GO develops a controlled vocabulary of terms split into three ontologies: cellular component, biological process and molecular function.
\\r\\nEnsembl associates GO terms to genes via UniProt mappings. Evidence codes are also shown and refer to the evidence used for the initial assignment of GO terms to UniProt records.
\\r\\n\\r\\n
Guide to evidence codes from GO:
\\r\\nProbesets from microarray platforms are matched to genes according to the Ensembl 2-step mapping procedure.
\\r\\nNote: probesets can only be mapped if the sequences are provided to Ensembl by the manufacturer.
\\r\\nYou can export oligo probe mappings for several genes or transcripts using the BioMart tool. Watch a tutorial video to learn how to use BioMart.
", "ensembl_action": "Oligos"} 5655 2008-09-17 16:21:53 2 2019-09-19 13:26:02 live 0 0 162 view genome, region, location, position, zoom {"ensembl_object": "Location", "content": "This image shows a large region within a chromosome, contig or scaffold. The 'Genes' data track is always displayed by default.
\\r\\nGene colours are as follows:
\\r\\nIn addition to the 'Genes', extra tracks will be on by default depending on the species e.g. 'Chromosome bands' and 'Tilepath' for human. Data tracks can be added using the Configure this page button at the left, or the cog wheel icon in the top left of the image. Other icons in the Region overview image allow you to add custom tracks, share or resize the view, export the image, reset the configuration and reset data track order.
\\r\\n[[IMAGE::help_page_162.png]]
\\r\\nThis Region overview image is zoomable (zoom slider) and is scrollable in up-to-date versions of Chrome, Firefox, and Internet Explorer 9 and later.
\\r\\nThe Drag/Select option in the top right of the image allows you to scroll to a region or select a region. Select Scroll to move along the genome with a click and drag, and let go to allow the page to reload with your new location. Click Select to drag out a box around a region or feature of interest to view a pop-up window. You can then Mark a region in the view or Jump to a region (Figure 2).
\\r\\n", "ensembl_action": "Overview"} 5655 2008-09-17 16:39:42 120463 2019-09-19 13:26:02 live 0 0 163 view synteny, comparative genomics, syntenic, compara, species {"ensembl_object": "Location", "content": "
Syntenic regions are calculated where possible from pairwise (two-species) whole genome alignments.
\\r\\nThis view presents synteny between two species only. The centre chromosome represents the species of interest, and the smaller chromosomes show syntenic regions with a second species. Blocks are coloured according to the chromosome number on the second species. These blocks are connected by lines. Black lines connect syntenic blocks with the same orientation. Brown lines indicate regions with opposite orientation. The small red boxes mark the gene of interest and its homologue.
\\r\\n[[IMAGE::synteny.png width=\\"400\\" height=\\"300\\"]]
\\r\\nUnder this chromosome diagram, find the gene list for the species of interest, along with homologues in the second species.
\\r\\nSynteny can also be displayed in region in detail (top panel) and region overview pages. Configure the page to show syntenic blocks.
", "ensembl_action": "Synteny"} 5655 2008-09-17 16:53:37 106005 2019-09-19 13:26:02 live 0 0 165 view sequence, gene, transcript, genome, DNA, nucleotide, protein, bases {"ensembl_object": "Gene", "content": "This page shows the genome sequence for a gene of interest. Flanking sequence and introns are displayed. Exons belonging to the gene of interest are shown in red letters with peach highlighting. Other exons within this region that do not belong to the gene of interest are shown in black letters with peach highlights. Note that coding and non-coding transcripts contribute to this view.
\\r\\nChange the display and highlight features in the gene structure using the Configure this page options as follows:
\\r\\nTo BLAST sequence, select your sequence of interest with your mouse then click on the pop-up button.
\\r\\n[[IMAGE::BLAST_sequence_button.png height=\\"30\\" width=\\"190\\"]]
\\r\\n[[MOVIE::214]]
", "ensembl_action": "Sequence"} 13 2008-10-31 12:19:55 120522 2019-09-19 13:26:02 live 0 0 166 view ID, accession, match, xref, external reference, uniprot, refseq, database, ucsc, omim, mim, entrez gene, entrezgene, ncbi, external {"ensembl_object": "Gene", "content": "Ensembl genes, transcripts, and proteins are matched to sequences and information in other biological databases. The matches are referred to as external references, or Xrefs.
\\r\\nXref sources for Ensembl genes include HGNC, UCSC, the Database of Aberrant 3' Splice Sites (DBASS3), and OMIM.
\\r\\nXref sources for Ensembl transcripts (i.e. matches to Ensembl transcript and protein sequences) include UniProtKB, CCDS, EntrezGene, and NCBI RefSeq.
\\r\\nPlease see the General identifiers view in the transcript tab for more IDs associated with a specific Ensembl transcript and/or protein.
", "ensembl_action": "Matches"} 13 2008-10-31 12:38:53 106005 2019-09-19 13:26:02 live 0 0 167 view download, sequence, get, export {"ensembl_object": "Gene", "content": "Download sequence for a gene using the Export data button at the left of a gene page. To export genomic sequence in FASTA format, choose only the genomic (unmasked) option under Options for FASTA sequence. Note, you should deselect the other options. Alternatively, export cDNA, coding sequence, protein (peptide) sequence, UnTranslated Region (UTR), Exons or Introns in FASTA format.
\\r\\ncDNA Exon sequences (spliced mRNA).
\\r\\nCoding sequence As above, but excluding UTRs.
\\r\\nPeptide sequence Translation of the coding sequence.
\\r\\n5prime and 3prime UTRs Both Untranslated regions (UTRs) can be exported.
\\r\\nOther export formats are available, such as CSV, GTF, EMBL and GenBank flat files. Change the format in the Output field after clicking the Export data button, to see the options for the export (for example, repeats, variations, etc.). To export multiple features for many genes, consider the BioMart tool.
", "ensembl_action": "Export"} 13 2008-10-31 12:50:35 2 2019-09-19 13:26:02 live 0 0 168 view variant, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel {"ensembl_object": "Variation", "content": "This page offers a top panel of information specific to the variant. Graphical icons are presented that lead you to more specific variant data, also accessible from the links at the left. The links in the left hand menu have a corresponding icon. It's your choice how to navigate through the variation displays.
From the top of the view, the following information can be found:
Please see the Ensembl variation documentation for more information such as source of variants, and consequence types (effect on genes and transcripts).
IUPAC Ambiguity Codes
[[IMAGE::iupac_table.png width="394" height="302"]]
", "ensembl_action": "Summary"} 13 2008-10-31 13:05:56 106005 2019-09-19 13:26:02 dead 0 0 169 view variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, variant, location, region, position, genome, chromosome {"ensembl_object": "Variation", "content": "For the top panel describing variation details such as source and class, see this help page.
\\r\\nThe picture shows the genomic assembly as a blue bar (composed of individual contigs). The selected variation (outlined by a black box) is shown in a 5 kb region along with surrounding variations. They are staggered in multiple rows for ease of viewing. A legend at the bottom indicates which colours are used for the different variation consequence types. In addition, transcripts and regulatory features annotated in this region are displayed in a similar way to the region in detail view.
\\r\\nIn Configure this page you can modify the width of the region and select or deselect variation and regulatory feature types shown.
\\r\\nThe tables you can find on this page contain information for the region shown in the picture. All tables can be sorted by position or name.
\\r\\nSpecific tables include:
\\r\\nFor background information on variation types and sources, see this article.
", "ensembl_action": "Context"} 13 2008-10-31 13:18:00 106005 2019-09-19 13:26:02 live 0 0 170 view download, sequence, get, export {"ensembl_object": "Transcript", "content": "Download sequence for a transcript using the Export data button at the left of a transcript page. To export cDNA sequence in FASTA format, choose only the cDNA option under Options for FASTA sequence. Note, you should deselect the other options, and change Genomic to none. Alternatively, export genome sequence in this region, coding sequence, protein (peptide) sequence, UnTranslated Region (UTR), Exons or Introns in FASTA format.
\\r\\ncDNA Exon sequences (spliced mRNA).
\\r\\nCoding sequence As above, but excluding UTRs.
\\r\\nPeptide sequence Translation of the coding sequence.
\\r\\n5prime and 3prime UTRs Both Untranslated regions (UTRs) can be exported.
\\r\\nOther export formats are available, such as CSV, GTF, EMBL and GenBank flat files. Change the format in the Output field after clicking the Export data button, to see the options for the export (for example, repeats, variations, etc.). To export multiple features for many genes, consider the BioMart tool.
", "ensembl_action": "Export"} 13 2008-10-31 13:28:05 106005 2019-09-19 13:26:02 live 0 0 171 view variant, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, frequency, 1000 genomes, HapMap, minor, major, MAF, ESP {"content": "For the top panel describing variation details such as source and class, see this help page.
Populations: Who was studied? Data from multiple projects is available for a range of species and for human these are divided into subpopulations where possible.
1000 Genomes Project samples are separated into five super-populations: AFR, AMR, EAS, EUR, and SAS and 26 more specific populations. See mouse-over help or this FAQ for a description of what they mean.
Pie charts
[[IMAGE::pie_graphs.png]]
Pie charts can be displayed for 1000 Genomes allele frequencies. If a pie chart is shown on the view for the 1000 Genomes Project, it represents the distribution of the alleles in a 1000 genomes population for a specific variation. Hover over the population three-letter codes to get their names. The super populations are shown on the top-row. Click on a plus alongside Sub-populations to open up the pie charts for the sub-populations, which are then shown on the row(s) below.
In the example above, 73% of the alleles found in the African population studied (AFR) are A (frequency of 0.73), and 27% are C. The sub-populations for the AFR population have been opened up and are displayed on the row below.
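The relationship between genotype counts and the allele frequencies shown in the pie charts can be sketched in Python. The genotype counts below are hypothetical values chosen so that the result matches the 0.73 and 0.27 frequencies quoted above; they are not taken from the 1000 Genomes data, and the function name and the use of a vertical bar to separate alleles are assumptions.

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    # Each diploid genotype contributes two alleles per sample.
    alleles = Counter()
    for genotype, samples in genotype_counts.items():
        for allele in genotype.split('|'):
            alleles[allele] += samples
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}

# Hypothetical counts for 100 samples.
counts = {'A|A': 55, 'A|C': 36, 'C|C': 9}
print(allele_frequencies(counts))
```

With these counts there are 146 A alleles and 54 C alleles out of 200, giving frequencies of 0.73 and 0.27.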
Frequency tables
Frequencies are grouped by project for studies with multiple subpopulations (including 1000 Genomes, TOPMed and gnomAD for human, NextGen for goat).
The populations studied are shown in the first column. Allele frequencies and counts are followed by the genotype frequencies and counts. The final column, Genotype detail allows you to jump to the individual genotypes for that population.
The first row shows a summary of all the individuals in that study and is highlighted in yellow. The populations are then grouped, with the super population (in blue) followed by its sub-populations (white and grey).
[[IMAGE::variation_population.png]]
This example is taken from the same variant as the pie charts above.
"} 13 2008-10-31 13:52:32 1 2019-09-19 13:26:02 live 0 0 172 view variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, variant, sample, samples, individual, individuals, populations {"ensembl_object": "Variation", "content": "For the top panel describing variation details such as source and class, see this help page.
\\r\\nTables of genotype information from the 1000 Genomes project, HapMap, and other sources (other data) are shown. Number of genotypes (samples), populations, and descriptions appear by default, in the table. To see the list of all samples in a population, click the Show link. Samples and specific genotypes will be revealed.
", "ensembl_action": "Individual"} 13 2008-10-31 14:06:20 106005 2019-09-19 13:26:02 live 0 0 173 view variation, polymorphism, allele, SNP, insertion, deletion, indel, dbSNP, population, breed, strain, genome, alignment, resequence, mutation {"ensembl_object": "Transcript", "content": "This help page is for both the population comparison table and comparison image. Variation sources are described in this document.
\\r\\nFor the Ensembl transcript chosen, the table lists the allele at each variant position in different breeds or strains.
\\r\\nTo configure the breeds or strains displayed, or types of variation (such as intronic, non-synonymous, etc.), click on configure this page at the left of the page. By default, not all intronic variants are shown. This can be changed with the \\"Context\\" roll-down menu in the configure page panel. Any selections in the configure page menu will affect the comparison image as well as the table.
\\r\\nThis graphical display shows one Ensembl transcript and variation data in a genomic context across breeds or strains.
\\r\\nThis track displays all variations in this genomic region. These are colour-coded as shown in the variation legend at the bottom of the page.
\\r\\nThe transcript is drawn for each breed or strain selected in the configure this page menu. Under this, a grey bar shows sequence (if available) for that breed or strain. A dark grey bar indicates high coverage in that genomic region. A light grey bar shows low coverage. (For example, in a strain, high coverage would signify that both strands of the genome were sequenced, and low coverage would mean that only one strand was sequenced). Clicking on a grey bar produces a pop-up with detailed information on the coverage depth and positioning.
\\r\\nBoxes underneath each breed or strain indicate the presence of an alternative allele in a variant position. Click on any box for more details on that variation. Amino acids are written in single-letter code format within the boxes, if the variations map to coding regions. Changes for non-synonymous single nucleotide polymorphisms (SNPs) are indicated with a forward slash (e.g. S/C denotes a change from Serine to Cysteine). A SNP affecting a STOP codon either by introducing a new one (e.g. A/*) or removing an existing one (e.g. */Q) is indicated by a red box, and the STOP codon by an asterisk (*).
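The slash notation in these boxes can be interpreted with a few lines of Python. This is a minimal sketch based only on the three cases described above; the function name classify_change and the returned labels are illustrative and do not correspond to official Ensembl consequence terms.

```python
def classify_change(label):
    # Split a label such as 'S/C' into reference and alternative residues.
    reference, alternative = label.split('/')
    if alternative == '*':
        return 'new STOP codon introduced'
    if reference == '*':
        return 'existing STOP codon removed'
    return 'amino acid change'

for label in ['S/C', 'A/*', '*/Q']:
    print(label, '->', classify_change(label))
```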
\\r\\nBelow the transcripts, each variation locus that falls within the transcript is shown by a hollow box outlined with a colour that corresponds to the variant's effect or position on the transcript. The colours are described in the variation legend. The possible alleles (nucleotides) for each variation are written in the corresponding box. The first allele is the one in the reference sequence assembly, and the second, the other possible allele reported. For example, a box with A/G indicates that at that position, A is the reference allele, but G is a potential allele for that breed or strain. Click on any hollow box for more information.
\\r\\nEach allele position box described above (i.e. hollow box showing nucleotides) corresponds to a green block in the reference sequence track below. The aim of this display is to compare the haplotype block structure in the reference sequence and the breed or strain selected. Where the allele matches the reference nucleotide sequence, the boxes are filled in green. (The reference allele is the first nucleotide in the corresponding allele position box). Where the alleles differ from the reference, the boxes are filled with purple. If both alleles are known in a breed or strain at a heterozygous position, the box is striped green and purple.
", "ensembl_action": "Population"} 5655 2008-10-31 14:47:13 120522 2019-09-19 13:26:02 live 0 0 585 view \N {"content": "This page displays related phenotypes, diseases and traits with counts of the numbers of reported associations broken down by loci type.
\\r\\nRelationships are established by mapping the names used in association reports to ontology terms and reporting conditions with terms in common. Conditions are reported if they match the same ontology term as the query name, or match child terms. For example, querying for 'Hearing loss' also returns associations to the more specific term 'noise-induced hearing loss'.
\\r\\nThe table can be filtered by the ontology term used to link the conditions.
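The matching rule described above (same term or any child term) can be sketched as a traversal of the ontology. The Python below is an illustration under stated assumptions: the term identifiers, the parent-to-child table and the condition names are invented for the example, and the function names are hypothetical rather than part of any Ensembl interface.

```python
def descendants(term, children):
    # Collect every term reachable from 'term' by following child links.
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def matching_conditions(query_term, annotations, children):
    # A condition matches if it maps to the query term or to a child term.
    wanted = {query_term} | descendants(query_term, children)
    return sorted(name for name, term in annotations.items() if term in wanted)

# Invented mini-ontology and condition-to-term mapping.
children = {'T:hearing_loss': ['T:noise_induced_hearing_loss']}
annotations = {
    'Hearing loss': 'T:hearing_loss',
    'noise-induced hearing loss': 'T:noise_induced_hearing_loss',
    'Asthma': 'T:asthma',
}
print(matching_conditions('T:hearing_loss', annotations, children))
```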
\\r\\n"} 120522 2017-02-27 15:21:21 120522 2019-09-19 13:26:02 live 0 0 174 view genebuild, annotation, supporting evidence, evidence, supporting data, experimental evidence, experimental data {"ensembl_object": "Transcript", "content": "
Ensembl/Havana transcripts result from either the alignment of protein and cDNA sequences to the genome (the Ensembl genebuild) or (for human, mouse, and zebrafish) from manual curation by VEGA/Havana. Protein, cDNA, and EST sequences used to determine the Ensembl/Havana transcripts can be viewed in this page.
\\r\\nExons in the Ensembl or Havana transcript are drawn as boxes at the top of the image. Filled red or gold boxes are protein-coding exons, while empty boxes with red or gold outline are UTR (UnTranslated Region) within a protein-coding transcript. Evidence (sequences) that the Ensembl/Havana transcript was based on during the genebuild is drawn underneath. This evidence can be cDNA, mRNA, protein, or EST in nature (see colour key at the bottom of the image).
\\r\\nClick on any alignment to find a link to the sequence record, or to view the alignment of the supporting evidence with the Ensembl/Havana transcript. Exon evidence aligned to the Ensembl/Havana transcript may support one or a few exons.
\\r\\nNon-canonical splice sites are highlighted as indicated in the key.
\\r\\nNote: The genebuild is carried out once a year or less frequently. For an up-to-date list of matches to current sequences in scientific databases, try the General identifiers page.
", "ensembl_action": "SupportingEvidence"} 13 2008-10-31 15:00:13 106005 2019-09-19 13:26:02 live 0 0 175 view sequence, exon, transcript, DNA, nucleotide, codon, cDNA, mRNA, amino acid, residue, coding, protein, gene, spliced, splicing, intron, exon, alternative splicing, alternatively spliced, peptide, translation {"ensembl_object": "Transcript", "content": "This highly customisable sequence shows, by default, the transcript sequence (cDNA), the coding sequence underneath it, and the protein sequence in the third line. Line numbering is different for all three sequences.
\\r\\nLine numbering
\\r\\nVariations are drawn along the sequence, and an IUPAC ambiguity code represents the variation at the top of the relevant nucleotide. Click any ambiguity code for more information. A table of possible codes is described here.
\\r\\nHighlighted nucleotides are coloured according to the key at the top of the view. Red amino acids indicate that, due to sequence variation, one or more other amino acids are possible at that position.
\\r\\nSequence colouring
\\r\\nTo turn off variations, coding sequence, UTR, and other markup, use the Configure this page tool button at the left. Note the view can be downloaded to open in Microsoft Word, using the Download view as RTF to do so.
\\r\\nTo BLAST sequence, select your sequence of interest with your mouse then click on the pop-up button.
\\r\\n[[IMAGE::BLAST_sequence_button.png height=\\"30\\" width=\\"190\\"]]
", "ensembl_action": "Sequence_cDNA"} 5655 2008-10-31 15:26:17 106005 2019-09-19 13:26:02 live 0 0 395 movie video, tutorial, API, database, install, Perl {"list_position": 20, "youtube_id": "l6a9Exe6Bsc", "title": "How to install the Ensembl Perl APIs in 4 minutes", "youku_id": "XNDQyNDUwNzE2", "length": "4.09"} 106005 2012-11-22 16:51:00 5655 2019-09-19 13:26:02 live 0 0 176 view sequence, exon, transcript, amino acid, residue, coding, protein, gene, spliced, splicing, intron, exon, alternative splicing, alternatively spliced, peptide, translation {"ensembl_object": "Transcript", "content": "A protein sequence corresponding to one Ensembl transcript is shown. For more isoforms and splice variants, go to the Gene Summary view by clicking on the Gene tab.
\\r\\nThe sequence is shown in alternating black and blue to distinguish the exons that encode each part of the amino acid sequence. Amino acids at splice junctions are in red.
\\r\\nTo turn on sequence variations and/or line numbering, use the Configure this page tool button at the left of the view. To export the sequence as FASTA, use the Export data link. Alternatively, the view may be downloaded as RTF using the button at the left, and opened in Microsoft Word.
\\r\\nTo BLAST sequence, select your sequence of interest with your mouse then click on the pop-up button.
\\r\\n[[IMAGE::BLAST_sequence_button.png height=\\"30\\" width=\\"190\\"]]
", "ensembl_action": "Sequence_Protein"} 5655 2008-10-31 15:30:46 120522 2019-09-19 13:26:02 live 0 0 177 view domain, protein, peptide, profile, motif, transmembrane, coiled-coil, Pfam, Superfamily, Prosite, PIRSF, Panther, Prints, InterPro {"ensembl_object": "Transcript", "content": "One Ensembl protein is shown as a purple bar, with alternating shades of light and dark purple reflecting alternating exons. Below that, domains and motifs from multiple projects are mapped along the protein. Click on any motif to find the corresponding amino acid positions on the Ensembl protein, along with domain IDs in the respective databases. Sequence variations are displayed if they are found in the coding sequence.
\\r\\nFor a table of protein motifs and domains, go to the domains and features page.
\\r\\nMotifs and domains in Ensembl proteins are predicted using InterProScan, which includes annotation from:
\\r\\nOther motifs come from the following analyses and sources:
\\r\\nCoiled-coils - The Ensembl analysis and annotation pipeline uses the ncoils program implemented by R.B. Russell and A.N. Lupas for coiled-coil domain characterisation and annotation. Rob Russell's group at the EMBL Heidelberg provides a public service.
\\r\\nPredicting coiled coils from protein sequences.
\\r\\nScience. 1991 May 24;252(5010):1162-1164.\\r\\n\\r\\nLow-complexity regions - Low complexity regions are annotated with the SEG program.
\\r\\nStatistics of local complexity in amino acid sequences and sequence databases.
\\r\\nComputers in Chemistry 1993; 17:149-163.
\\r\\n\\r\\nAnalysis of compositionally biased regions in sequence databases.
\\r\\nMethods in Enzymology 1996; 266: 554-571.
\\r\\n\\r\\nSignal sequences - These regions are characterised with SignalP.
\\r\\nProtein Eng. 1997 Jan;10(1):1-6.
\\r\\n\\r\\nIn J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 122-130, Menlo Park, CA, 1998.
\\r\\n\\r\\nJ Mol Biol. 2004 Jul 16;340(4):783-795.
\\r\\n\\r\\nTransmembrane regions - Ensembl uses TMHMM for the annotation of transmembrane helices.
\\r\\nPredicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.
\\r\\nJournal of Molecular Biology, 305(3):567-580, January 2001.
\\r\\n\\r\\nA hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors
\\r\\nProceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology
\\r\\n175-182, Menlo Park, CA, 1998.
\\r\\n\\r\\nMotifs and domains in the Ensembl protein of interest are listed. These are computationally predicted by InterProScan and references listed below. The domains table shows the original project that identified the domain, the amino acid start and end of the domain on the Ensembl peptide, the domain name, and IDs in the original database and InterPro consortium.
\\r\\nCoiled-coils - The Ensembl analysis and annotation pipeline uses the ncoils program implemented by R.B. Russell and A.N. Lupas for coiled-coil domain characterisation and annotation. Rob Russell's group at the EMBL Heidelberg provides a public service.
\\r\\nPredicting coiled coils from protein sequences.
\\r\\nScience. 1991 May 24;252(5010):1162-1164.\\r\\n\\r\\nLow-complexity regions - Low complexity regions are annotated with the SEG program.
\\r\\nStatistics of local complexity in amino acid sequences and sequence databases.
\\r\\nComputers in Chemistry 1993; 17:149-163.
\\r\\n\\r\\nAnalysis of compositionally biased regions in sequence databases.
\\r\\nMethods in Enzymology 1996; 266: 554-571.
\\r\\n\\r\\nSignal sequences - These regions are characterised with SignalP.
\\r\\nProtein Eng. 1997 Jan;10(1):1-6.
\\r\\n\\r\\nIn J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 122-130, Menlo Park, CA, 1998.
\\r\\n\\r\\nJ Mol Biol. 2004 Jul 16;340(4):783-795.
\\r\\n\\r\\nTransmembrane regions - Ensembl uses TMHMM for the annotation of transmembrane helices.
\\r\\nPredicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.
\\r\\nJournal of Molecular Biology, 305(3):567-580, January 2001.
\\r\\n\\r\\nA hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors
\\r\\nProceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology
\\r\\n175-182, Menlo Park, CA, 1998.
\\r\\n\\r\\nThis view shows all small sequence variations for one isoform (ENSP). The following columns are presented:
\\r\\n
MeDIP-chip: http://dx.doi.org/10.1101/gr.077479.108
MeDIP-seq: http://dx.doi.org/10.1038/nbt1414
Download these data from the EBI ftp site.
The methylation profiles are viewable as tracks in the region in detail page of the Ensembl browser. To find the region in detail, click on the Location tab. Click configure this page and go to the Regulation, DNA Methylation menu.
The regulation tracks display promoter and enhancer predictions based on data from DNase I, ChIP-chip and ChIP-seq experiments. Read more here.
These are the data in the regulatory regions track in the region in detail page. Data are accessible using the Perl API. Alternatively, download these data from the Ensembl ftp site.BioMart allows data mining of the functional genomics data.", "category": "regulation"} 5655 2009-06-18 11:29:01 106005 2019-09-19 13:26:02 dead 0 0 180 view variation, polymorphism, allele, SNP, insertion, deletion, Indel, dbSNP, synonymous, non-synonymous, codon {"ensembl_object": "Transcript", "content": "All variations for one transcript are shown. If it is a coding SNP, the amino acid position is shown (i.e. Residue column). The SNP ID links to more information about the variations. Non-synonymous SNPs show a change in the protein sequence, synonymous SNPs are silent mutations that do not alter the protein sequence. The Ambiguity code reflects the IUPAC code.", "ensembl_action": "Variations"} \N \N 5655 2019-09-19 13:26:02 dead 0 0 181 view transcript, old, archive, ID, previous, history, retired {"ensembl_object": "Transcript", "content": "
Ensembl stable gene, transcript, and protein identifiers are kept the same throughout Ensembl releases unless the gene or transcript model changes dramatically. In this case, the old stable identifier may be retired and a new one assigned (or two identifiers may be merged into one). The Ensembl Archive tracks all stable identifiers and should provide mappings to the current gene, transcript, and protein set.
\\r\\nThe Ensembl ID is listed, and the status is current if the ID can be found in the current release. The latest version in which the gene, transcript, or protein was found is listed in case the ID has been retired.
\\r\\nFind old IDs in the current Ensembl version by clicking on the \\"Tools\\" link in the header at the top of the page. Follow the link to the ID History Converter.
\\r\\nAn ID History Map shows Ensembl release numbers, genomic assemblies, and versions for Ensembl IDs in a horizontal comparison. Small squares or nodes correspond to the ID shown on the left, and represent an update in the version of the ID. Versions are updated if there has been a change in the gene, transcript, or protein model. Nodes (squares) are connected by a line if the versions are related. This line reflects the score of how well the versions match, for recent releases. If a score is not calculated, the line will be grey (unknown score).
", "ensembl_action": "Idhistory"} \N \N 120463 2019-09-19 13:26:02 live 0 0 183 view download, sequence, get, export {"ensembl_object": "Location", "content": "Download sequence for a chromosomal location using the Export data button at the left of a location page. To export genomic sequence in FASTA format, choose the type of genomic sequence under Options for FASTA sequence (i.e. unmasked, masked, flanking sequence ...)
\\r\\nOther export formats are available, such as BED, CSV, GTF, EMBL and GenBank flat files. Change the format in the Output field after clicking the Export data button.
", "ensembl_action": "Export"} \N \N 106005 2019-09-19 13:26:02 live 0 0 184 view marker, uniSTS, STS {"ensembl_object": "Location", "content": "This view displays information about a particular chromosome map marker or sequence tagged site (STS). Markers in Ensembl are mapped from UniSTS. In the case more than one marker is found in the region being viewed, click on a marker name to open a view with specific information for that marker.
\\r\\nThe following information is found on the view for one specific Marker:
\\r\\nThe table shows alternate genetic maps, the name of the marker used in the map, the chromosome, and the position in centimorgans.
", "ensembl_action": "Marker"} 13 2008-11-10 15:21:56 106005 2019-09-19 13:26:02 live 0 0 188 movie tutorial, video, genome, browse, gene, DNA, introduction, overview, variation {"list_position": 2, "youtube_id": "C2g37X_uMok", "title": "The Ensembl Genome Browser", "youku_id": "XMzQxNTA1MDkzMg", "length": "10:00"} 2 2008-12-04 14:41:15 120522 2019-09-19 13:26:02 live 0 0 189 movie tutorial, video, BioMart, data mining, export, gene, information, sequence, variation {"list_position": 18, "youtube_id": "QvGT2G0-hYA", "title": "Introduction to BioMart", "youku_id": "XMTUwNDIyNTMwNA", "length": "4.27"} 2 2008-12-04 14:52:55 120463 2019-09-19 13:26:02 live 0 0 190 faq FAQ, comparative genomics, species, homology, homologue, paralogue, paralogy, orthologue, orthology, gene tree, protein tree {"question": "How does Ensembl determine homology relationships?
", "answer": "For detailed documentation about the homology prediction pipeline, have a look at this article. Orthologues and paralogues are listed in the gene tab or viewable in the gene trees. Click on any node in the tree to export an alignment.
\\r\\n\\r\\n
Trees are downloadable from our ftp site.
\\r\\nPlease see the following reference for more: Vilella et al. 2008.
", "category": "comparative"} \N \N \N 2019-09-19 13:26:02 live 0 0 191 faq FAQ, comparative genomics, species, homology, homologue, alignments, CDS, sequence, API, programmatic {"question": "How do I get alignments of homologous proteins? Can I get the CDS (coding sequence) alignments as well? I'm using the API.
", "answer": "Yes, both can be obtained using the Compara API: see this example script.
\\r\\n\\r\\n
Protein alignments of homologues are also available using the orthologues or paralogues links from the Gene tab in the browser. The link at the bottom of the page allows a customised view of the protein alignments.
", "category": "compara_api"} \N \N 106005 2019-09-19 13:26:02 dead 0 0 192 faq FAQ, comparative genomics, conserved, conservation, API, programmatic, whole genome alignment, GERP {"question": "How can I obtain the constrained elements, i.e the conserved sequences, for multiple species?
", "answer": "You can use the Compara API. See this example script.
", "category": "compara_api"} \N \N 106005 2019-09-19 13:26:02 live 0 0 193 faq FAQ, comparative genomics, whole genome alignment, synteny, compara, syntenic {"question": "Can I view syntenic regions in Ensembl?
", "answer": "Click on the 'Synteny' link in the Location tab to view conserved blocks of sequences. Syntenic regions are calculated from the pairwise alignments.
", "category": "comparative"} \N \N 120522 2019-09-19 13:26:02 live 0 0 194 faq FAQ, comparative genomics, species, homology, homologue, paralogue, paralogy, orthologue, orthology, gene tree, protein tree, family, compara, homolog, paralog, ortholog {"question": "I would like a list of homologues to my gene. Should I look at the gene trees or the families?
", "answer": "Although there is overlap, the Ensembl Families and Gene Trees are two different complementary data sets.
To construct the Gene Trees, only the longest translation of each gene is included, and only species represented in Ensembl are used.
The families include all Ensembl transcripts plus the Uniprot (Swiss-Prot and TrEMBL) peptides for all the metazoans, which increases the total number of peptides represented in the gene trees.
You can view both using the gene tree, orthologues or paralogues, or protein family links from the Gene tab in the browser, or access both using the Compara API.
BioMart can be used to export homologues calculated from the gene trees.
", "category": "comparative"} \N \N 106005 2019-09-19 13:26:02 live 0 0 195 faq comparative genomics, compara, family, protein, parameters, markov {"question": "What are the BLAST and MCL options used to determine the Ensembl Compara MCL Families?
", "answer": "The families are calculated with the following parameters.
For version v50 (and future versions), the blastall options are:
blastall -d $fastadb -i $qy_file -p blastp -e 0.00001 -v 250 -b 0
For the MCL clustering, the parameters are:
-I 2.1 -tf 'gq(50)' -scheme 6
For version v48 and previous versions, the parameters were:
-I 2.1 -P 10000 -S 1000 -R 1260 -pct 90
", "category": "comparative"} \N \N 106005 2019-09-19 13:26:02 dead 0 0 197 view LD, linkage disequilibrium, allele, position, D', r2, d prime, D prime, haploview {"ensembl_object": "Location", "content": "This view displays detailed information on linkage disequilibrium (LD), a measure of the non-random association of alleles at two or more loci that descend from a single and ancestral chromosome.
\\r\\nThe commonly used summaries D' and r2 have been calculated.
\\r\\nD' is a normalised form of D, the difference between the observed and the expected frequency of a given haplotype: D is divided by the maximum value it could take given the allele frequencies. If two loci are independent (i.e. in linkage equilibrium and therefore not coinherited at all), the D' value will be 0.
\\r\\nr2 is the squared correlation coefficient between a pair of loci. It varies from 0 (loci are in complete linkage equilibrium) to 1 (loci are in complete linkage disequilibrium and coinherited). Note that only LDs with r2 values larger than 0.05 are available in Ensembl.
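\\r\\nTo make the two summaries concrete, the sketch below computes D, D' and r2 for a pair of biallelic loci from one haplotype frequency and the two allele frequencies, using the standard textbook definitions. The function name ld_stats and its argument names are assumptions made for this example; it is not the pipeline Ensembl uses to calculate the values shown in this view.

```python
def ld_stats(p_ab, p_a, p_b):
    # Hypothetical helper returning (D, D', r2) for two biallelic loci.
    # p_ab: observed frequency of the haplotype carrying allele A and allele B
    # p_a:  frequency of allele A at the first locus
    # p_b:  frequency of allele B at the second locus

    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest magnitude D could reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    # D' is reported here as an absolute value between 0 and 1.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With p_ab = 0.25 and p_a = p_b = 0.5 the loci are independent and all three values are 0; with p_ab = 0.5 and the same allele frequencies, D' and r2 are both 1.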
\\r\\n\\r\\nThe page shows the Ensembl genes and other genomic features annotated in the region, such as SNPs and structural variants. This can be customised by clicking on the 'Configure this page' button or on the cog wheel in the image. The LD plot(s) are also displayed. In the centre of these plots, we give the genomic position of a SNP and show all LD values that are contained in a 20 kb region both upstream and downstream of this SNP.
\\r\\n[[IMAGE::LD_plot.png height=\\"540\\" width=\\"720\\"]]
\\r\\nLD values between any two variants in these plots are graphically displayed using inverted coloured triangles varying from white (low LD) to red (high LD). Hover over and click on the inverted triangle(s) to get the LD value between any two SNPs.
\\r\\nYou can select to view LD data for different populations by clicking on the 'Select populations' button on the left-hand side. To export this data as HTML, Text, Excel or as a format for upload into Haploview, simply click on the 'Export data' button on the left-hand side.
", "ensembl_action": "LD"} 13 2009-01-28 15:18:58 120522 2019-09-19 13:26:02 live 0 0 198 faq {"question": "I am looking for MeDIP data. These are the human, tissue-specific DNA methylation profiles discussed in Genome Research [Rakyan et. al, Sept 2008]. ", "answer": "From any human location tab, a region in detail view such as this example can display this information. To turn on the MeDIP tracks, select Configure this page from the left-hand menu. Select Functional genomics and switch on one or more MeDIP tracks. Click save and close. The region in detail view should reload with the new tracks added.", "category": "web"} 5655 2009-02-04 10:51:56 \N 2019-09-19 13:26:02 dead 0 0 199 faq Clones, mouse, MICER, knock-out {"question": "Where is the MICER resource for mouse?
", "answer": "The MICER clone set is available from any mouse location tab, in the region in detail. Turn on the MICER track as follows: click on Configure this page at the left of the region in detail view. Click on Misc. regions & clones in the left menu of the panel. Turn on the DAS track named MICER clones. Close the menu by clicking the check mark at the top right of the panel. The region in detail view should now reload with the new track displayed.
", "category": "z_data"} 5655 2009-02-04 10:58:38 106005 2019-09-19 13:26:02 dead 0 0 203 view whole genome alignment, conservation, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, variant {"ensembl_object": "Variation", "content": "For the top panel describing variant details such as source and class, see this help page.
\\r\\nThis view shows a window of sequence taken from a whole genome sequence alignment across multiple species. Read more about alignments here. Click the \\"Select an alignment\\" roll-down menu at the top of the sequence in order to choose an alignment to view. The variant with 10bp flanking sequence will be shown for multiple species, if an alignment exists.
\\r\\nCoordinates for chromosomes and scaffolds in the alignment are listed for each species. Next, the sequence is shown for a small region of the alignment. Pink highlighted nucleotides show differences in the other species with respect to the primary species. Green highlighted nucleotides show positions of known variants in all species. The ambiguity or IUPAC code is shown above the variant.
\\r\\nVariants are available for multiple species in Ensembl. The majority of the variants are downloaded along with their flanking sequence from NCBI dbSNP and mapped to the genomic assembly, for each species. Ancestral alleles are determined by the Ensembl comparative genomics team using Ortheus. Other variants are obtained by resequencing efforts on mouse, rat, and human, and comparison of these newly sequenced genomes to our reference sequences. Sources of variants are listed above, in this view, or see the GENE tab, Variant Table link.
", "ensembl_action": "Compara_Alignments"} 5655 2009-04-23 10:45:42 120522 2019-09-19 13:26:02 live 0 0 593 view protein structure, PDB {"content": "\\r\\n Action \\r\\n | \\r\\n\\r\\n Mouse \\r\\n | \\r\\n\\r\\n Touchscreen \\r\\n | \\r\\n
\\r\\n Rotate \\r\\n | \\r\\n\\r\\n Left click and drag \\r\\n | \\r\\n\\r\\n One finger touch \\r\\n | \\r\\n
\\r\\n Zoom \\r\\n | \\r\\n\\r\\n Right click and drag \\r\\n | \\r\\n\\r\\n Two finger touch \\r\\n | \\r\\n
\\r\\n Move \\r\\n | \\r\\n\\r\\n Mouse wheel click and drag \\r\\n | \\r\\n\\r\\n Pinch \\r\\n | \\r\\n
\\r\\n Slab (move forward/backward through the structure) \\r\\n | \\r\\n\\r\\n Mouse wheel roller \\r\\n | \\r\\n\\r\\n Three finger touch \\r\\n | \\r\\n
What is a genome assembly?
", "answer": "The genome assembly is simply the genome sequence produced after chromosomes have been fragmented, those fragments have been sequenced, and the resulting sequences have been put back together. For more information, see the glossary.
\\r\\nEach species in Ensembl has a reference genome assembly that is produced by an international genome consortium. (Ensembl does not produce genome assemblies.) The reference assembly can be compiled from the DNA of one individual, a collection of individuals, a breed or a strain. This depends on the species. Find the DNA source of each genome sequence in the More information and statistics link on each species home page.
\\r\\nAssembly model
\\r\\nMost assemblies provided to Ensembl are 'haploid assemblies' and represent a single non-redundant path through the genome. Some assemblies, such as human and mouse, come with additional alternate sequences that represent additional paths through the genome. Examples of alternate sequences are:
\\r\\nThese alternate sequences can be viewed in the Ensembl browser where available.
\\r\\nUpdating a genome assembly
\\r\\nA genome assembly is updated when DNA has been sequenced that allows gaps to be filled. It may also be updated when a new assembling algorithm is released. This work is done by external groups, who submit the updated assembly to the INSDC.
\\r\\nA new genebuild may be performed by Ensembl when
\\r\\nAssemblies are updated in Ensembl on the order of once every two years, or less often, depending on the species.
\\r\\nOlder versions of genomic assemblies can be found in the archive sites.
\\r\\nGenome coverage
\\r\\nEnsembl does not generate genome assemblies, but rather we download genome assemblies from the INSDC and annotate them. If you have any questions regarding the sequencing coverage of a genome assembly in Ensembl, please contact the original submitter. This information can be found by querying the assembly accession (e.g. GCA_000208655.2) or WGS record (e.g. AAGV00000000.3).
", "category": "assemblies"} 5655 2009-08-05 13:37:54 5132 2019-09-19 13:26:02 live \N \N 212 movie tutorial, video, sequence, export, cDNA, genome, DNA, nucleotide, protein, FASTA, GenBank, EMBL {"list_position": 9, "youtube_id": "5moVkJrgvGM", "title": "Clip: Export Sequence", "youku_id": "XNDgxMjk2NjAw", "length": "1:08"} 2 2009-07-07 10:22:14 5655 2019-09-19 13:26:02 live 0 0 213 movie tutorial, video, sequence, exon, intron, DNA, nucleotide {"list_position": 8, "youtube_id": "_pZ8lYxc5KM", "title": "Clip: Exons and Introns", "youku_id": "XNDgxMjk2MDg0", "length": "1:12"} 2 2009-07-07 10:23:35 5655 2019-09-19 13:26:02 live 0 0 214 movie tutorial, video, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, CNV, copy number variation, population, structural variation, mutation, non-synonymous, synonymous {"list_position": 10, "youtube_id": "wQbsLkjQqq4", "title": "Clip: Genome Variation", "youku_id": "XNDgxMjk3MDI0", "length": "0:37"} 2 2009-07-07 10:24:16 5655 2019-09-19 13:26:02 live 0 0 221 faq FAQ, comparative genomics, conserved, conservation, API, programmatic, whole genome alignment, GERP {"question": "Can I get the conservation scores (GERP scores) for nucleotides in whole genome alignments?
", "answer": "Yes, obtain the conservation scores, described here, by downloading the emf files for multiple species alignments from our ftp site.
Alternatively, use the Compara Perl API to obtain the scores. An example script is provided here.
", "category": "compara_api"} 5655 2009-08-10 15:55:38 120522 2019-09-19 13:26:02 live \N \N 222 movie tutorial, video, comparative genomics, species, whole genome alignment, synteny, conserved, homologous, compara {"list_position": 12, "youtube_id": "Re2voSoujeA", "title": "Clip: View Conserved Sequence", "youku_id": "XNDgxMjk4MDU2", "length": "1:26"} 5655 2009-08-18 16:33:26 5655 2019-09-19 13:26:02 live \N \N 223 movie tutorial, video, Array Express, transcript, functional genomics, transcriptome, Atlas, mRNA, tissue, DAS {"list_position": 11, "youtube_id": "7wYoWloZ8qA", "title": "Clip: Transcriptomics (ArrayExpress)", "youku_id": "XNDgxMjk3NTU2", "length": "1:12"} 5655 2009-08-18 16:41:45 5655 2019-09-19 13:26:02 live \N \N 224 movie tutorial, video, External data, DAS {"list_position": 13, "youtube_id": "dJv871q6Skc", "title": "Clip: View External Data (DAS)", "youku_id": "XNDgxMjk4NTIw", "length": "1:53"} 5655 2009-08-18 16:43:17 5655 2019-09-19 13:26:02 live \N \N 225 view comparative genomics, alignment, species, whole genome alignment, WGA, conserved, conservation, genome, Pecan, EPO, diagram, synteny, syntenic block, conservation, whole genome alignment, WGA, species, synteny view, patches, haplotypes, align {"ensembl_object": "Location", "content": "The top panels are similar to the chromosome diagram and gene map at the top of the Region in Detail view in the Location tab. More than one species can be shown in the panels. Patches and haplotypes can be viewed against the reference sequence, and paraloges can be compared.
\\r\\nGenomes for multiple species can be displayed graphically in the lower panel. This page shows chromosomes, scaffolds and contigs as they are. Whole genome alignments may be displayed. Note: The align slice page diagrams the alignments themselves.
\\r\\nSelect or remove a species by using the Select species button at the left of the page. Multiple species may be added to the view.
\\r\\nThe species you are coming from (for example if you were in the gene or transcript tab, or another view in the location tab) is shown in the first panel. Genes are drawn by default.
\\r\\nTurn on the BLASTz or tBLAT pairwise alignments using the Select species button at the left.
\\r\\n[[IMAGE::multi.png width=\\"400\\" height=\\"300\\"]]
\\r\\nThe image above shows human chromosome 13, base pairs 32,889,611 to 32,973,347 and the corresponding region in mouse (chromosome 5). BRCA2 is indicated in both human (gold gene) and mouse (red gene). The pink bar shows the pairwise alignment between human and mouse genomes. Click on the pink bar to see the chromosome and coordinates (in base pairs) of the alignment. Green shading connects the alignments.
\\r\\nCustomise the view using the Configure this page toolbar. You can select the species in the Select Species tab at the top of the configuration dialog. See below:
\\r\\n[[IMAGE::customised_view.PNG]]
\\r\\nTo see a blue line connecting homologous genes, click on the Configure multi-species image tab and under Comparative features select join genes. See below:
\\r\\n[[IMAGE::pageview.PNG width=\\"400\\" height=\\"300\\"]]
\\r\\nThe main panel of the configuration menu also allows display of different tracks, such as variations and ESTs (Expressed Sequence Tags) aligned to the genome.
\\r\\nZoom in or out by using the zoom slide, or the plus and minus buttons at the bottom of each panel. The panel may also be flipped in orientation or realigned using the buttons below the image. Click and drag a box with your mouse around any region to zoom in to that region.
", "ensembl_action": "Multi"} 5655 2009-09-03 11:22:47 106005 2019-09-19 13:26:02 live \N \N 226 view functional genomics, regulation, gene regulation, regulatory regions, promoter, enhancer, transcription factor binding, DNAse I, Pol II, CTCF, ChIp-chip, ChIp-Seq, RNA-Seq, ENCODE, core feature {"ensembl_object": "Regulation", "content": "The regulatory features (ENSR... IDs) are the output of the Ensembl Regulatory Build (Zerbino et al. 2015). They define predicted promoters, enhancers and other sequences involved in gene regulation. This view shows a summary across all the cell lines in the context of genes in the surrounding genomic region. Tarbase, Fantom5 and CRISPR SpCas9 data is also shown by default.
\\r\\nInformation about the activity of the regulatory feature in specific cell types is indicated in the 'Summary of Regulatory Activity' table below.
", "ensembl_action": "Summary"} 5655 2009-09-11 14:53:25 120522 2019-09-19 13:26:02 live \N \N 497 lookup \N {"expanded": "", "word": "TSL:4", "meaning": "Transcript Support Level 4, for transcripts supported by an EST flagged as suspect.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 227 view promotor, ENCODE, regulatory feature, reg feat, functional genomics, regulation, reg-feat, enhancer, insulator, cell, ChIP-seq, regulatory region, transcription factor binding, DNAse I, Pol II, CTCF, epigentics, epigenomics, histone, histone modification {"ensembl_object": "Regulation", "content": "Supporting evidence for a regulatory feature is shown in tabular format. For more about regulatory features, see Ensembl Regulation Build (Zerbino et al. 2015).
", "ensembl_action": "Evidence"} 5655 2009-09-11 15:00:48 104467 2019-09-19 13:26:02 live \N \N 228 view promotor, ENCODE, regulatory feature, reg feat, functional genomics, regulation, reg-feat, enhancer, insulator, cell, ChIP-seq, regulatory region, transcription factor binding, DNAse I, Pol II, CTCF, epigentics, epigenomics, histone, histone modification {"ensembl_object": "Regulation", "content": "In this view, the selected regulatory feature is highlighted in a local neighbourhood, along sides genes and the 'Regulatory Segmentation'. To the bottom of the image, other relevant externally curated data sets are shown.
\\r\\nClick on any feature for more information. Add or remove tracks with the configure this page link at the left of the view.
", "ensembl_action": "Context"} 5655 2009-09-11 15:05:52 106005 2019-09-19 13:26:02 live \N \N 276 faq ENST, transcript, cDNA, isoform, CCDS {"question": "Which transcript should I use?
", "answer": "Different splice variants are found in different tissue types, developmental stages, etc. The sequence may be known to a high level of confidence, or it may be a transcript only sequenced once. It is often confusing to be presented with a list of transcripts. However, there are ways to identify the best choice for you.
\\r\\nIf a human gene has a MANE Select transcript, this is the one you should use. This is an agreement between NCBI and Ensembl as to the clinically most relevant transcript, and the transcript structure matches perfectly between the two databases.
\\r\\nIf a gene does not have a MANE Select transcript, APPRIS provides a similar evaluation of the most biologically relevant transcript, which they call the Principal Isoform. The most likely candidate will be the P1, but they can score down to P5.
\\r\\nIf those are not available, you may wish to consider the quality of annotation. A CCDS identifier indicates that the coding region of the transcript is matched between NCBI and Ensembl, while a gold transcript indicates matching annotation by the two different methods of annotation in Ensembl. The TSL gives a score for the amount of data that supports the existence of a transcript.
\\r\\nIn the region or gene pages, you can plot RNA-seq data against the genome. Turn on the RNA-seq data for your tissue of interest and see which transcripts occur in that tissue.
", "category": "genes"} 5655 2010-02-24 14:09:11 106005 2019-09-19 13:26:02 live \N \N 277 view promotor, ENCODE, regulatory feature, reg feat, functional genomics, regulation, reg-feat, enhancer, insulator, cell, ChIP-seq, regulatory region, transcription factor binding, DNAse I, Pol II, CTCF, epigentics, epigenomics, histone, histone modification {"ensembl_object": "Regulation", "content": "In this view, regulatory features are divided according to cell line. The summary infomation at the top shows location and classification information along with details of which cell lines have evidence at this location, in the 'Active in' row.
\\r\\nThe image is split into cell type specific panels, the first describing the merged 'MultiCell' regulatory feature and the underlying regulatory segmentation analysis. Additional cell line and supporting evidence can be configured using the buttons above the image, or the usual config icons in the image header.
\\r\\nEvidence for each cell line is grouped into tracks describing 'Transcription Factor and DNAse' and 'Histone & Polymerase' evidence. Peak calls are displayed as blocks, with black triangles above and below the features indicating the position of the peak summit, and the vertical black bars indicate a transcription factor binding matrix (PWM) alignment or 'motif features'. Clicking on the motif features will highlight the relevant rows of the 'Motif Information' table in the pop-up menu, along with other information relevant to that peak. The underlying signal tracks describe a summary of the alignments for a given experiment, based on optimised window sizes.
\\r\\nClicking on a regulatory feature will also show its activity status and any associated 'attributes', i.e. any underlying evidence and motif features.
\\r\\nFor more about the underlying data, see this article.
\\r\\n", "ensembl_action": "Cell_line"} 5655 2010-05-21 13:09:49 106005 2019-09-19 13:26:02 live \N \N 279 view linkage disequilibrium, LD, linked, variant, mapping, polymorphism, sequence variant, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele {"ensembl_object": "Variation", "content": "
The top panel gives details on a given variant such as source, class, clinical significance and others. See 'Explore this variant' for more details.
\\r\\nA table is shown with links to find out about the linkage disequilibrium in the different 1000 genomes sub-populations, grouped by super-population. Click on the populations to find out more about them. Click on the links in the table to view a Manhattan plot, a list of variants in LD, an LD plot and a table of local variants indicating level of linkage.
\\r\\nClick on \\"Configure this page\\" at the left to select populations to be displayed, or to change the distance over which linked variants are shown.
\\r\\nYou can see variants in high LD by clicking on Show in the Variants in high LD column for a population. This will open up a table below the original one, listing all the variants. Note the values are calculated for the comparison of the variant of interest with nearby variants, from the 1000 Genomes individuals.
\\r\\nThe linked variant tables show the distance between the linked variant and the variant on which the view is focused, any overlapping genes, phenotypes associated with the linked variant and overlapping genes and the D' and r2 values.
\\r\\nD is the difference between the observed and the expected frequency of a given haplotype, and D' is D scaled by the maximum value it could take given the allele frequencies. If two loci are independent (i.e. in linkage equilibrium, so that their alleles are not coinherited more often than expected by chance), the D' value will be 0. r2 is the squared correlation between a pair of loci. It varies from 0 (loci are in complete linkage equilibrium) to 1 (loci are in complete linkage disequilibrium and coinherited). Note that only LDs with r2 values larger than 0.05 are shown in Ensembl.
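To make the relationship between these measures concrete, here is a minimal sketch that computes D, D' and r2 from one haplotype frequency and two allele frequencies. The function name and the example frequencies are invented for illustration, and D' is reported as an absolute value, which is a common convention but an assumption here. This is not the code Ensembl uses to calculate LD.

```python
def ld_stats(p_ab, p_a, p_b):
    # p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    # p_a, p_b: frequencies of allele A and allele B in the same sample
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency

    # Largest magnitude D can reach for these allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denominator if denominator > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele B only ever occurs together with allele A
d, d_prime, r2 = ld_stats(0.2, 0.5, 0.2)
print(d_prime, r2)
```

In this made-up example D' is close to 1 while r2 is only about 0.25, which shows why the two values can disagree when allele frequencies differ between the loci.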
", "ensembl_action": "HighLD"} 5655 2010-07-07 11:00:20 106005 2019-09-19 13:26:02 live \N \N 280 view variant, consequence, type, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, gene, transcript, SIFT, PolyPhen, protein, GTEx, eQTL, expression {"ensembl_object": "Variation", "content": "For the top panel describing variation details such as source and class, see this help page.
\\r\\nThree tables are shown.
\\r\\nThe Gene and Transcript consequences table shows the position and effect of the variation on specific genes and transcripts.
\\r\\nColumns in the table are as follows:
\\r\\nThe Gene expression correlations table shows GTEx eQTL data, which identifies the influence of variants on tissue-specific gene expression.
\\r\\nColumns in the table are as follows:
\\r\\nThe Regulatory consequences table shows Ensembl regulatory features and motifs at the variant position that may be involved in gene regulation from the Regulatory Build.
\\r\\nColumns in the table are as follows:
\\r\\nNote: Use the Show/hide columns button to turn unwanted columns off (or on). Export the table using the 'CSV' button at the top right of each table. The Filter box provides a search.
", "ensembl_action": "Mappings"} 5655 2010-07-07 11:03:36 120522 2019-09-19 13:26:02 live \N \N 281 faq FAQ, Local installation, database {"question": "Can I install a local copy of the Ensembl database(s)?
", "answer": "Yes you can. In fact, if you wish to run a script that queries a large amount of information, it is best to install a local copy. Instructions are here.
", "category": "data"} 5655 2010-07-07 12:04:54 \N 2019-09-19 13:26:02 live \N \N 284 movie tutorial, video, BioMart, variation, gene symbol, gene name, ID, rs, dbSNP, convert {"list_position": 19, "youtube_id": "paC3sOANSJA", "title": "BioMart: Variation IDs to HGNC Symbols", "youku_id": "XMjQ2MzAxOTgw", "length": "2:58"} \N 2010-11-05 11:07:27 5655 2019-09-19 13:26:02 live \N \N 282 faq haplotype, assembly, COX, MHC, alternate, sequence {"question": "My human gene is on HSCHR6_COX. What is that?
", "answer": "The MHC region on human chromosome 6 is highly variable. The reference assembly only reflects one possible sequence at this position. Nine haplotypes are included in the human genome assembly hosted by the GRC to describe alternate sequence (with a different allele combination), and 7 of these are in the MHC region. HSCHR6_MHC_COX is one haplotype. Genes are annotated on both the reference sequence, and the haplotypes.
", "category": "assemblies"} 5655 2010-09-22 13:00:26 \N 2019-09-19 13:26:02 dead \N \N 285 view GO, gene ontology, function, protein, role {"ensembl_object": "Transcript", "content": "The Gene Ontology (GO) consortium has developed a controlled vocabulary of terms split into three categories, or ontologies: cellular component, biological process and molecular function.
\\r\\nThe Gene Ontology tables show the GO terms associated with the Ensembl transcripts of this gene. There are three separate GO tables for each gene, corresponding to GO: Biological process, GO: Molecular function and GO: Cellular component.
\\r\\nEnsembl associates GO terms to genes via UniProt mappings. Three-letter evidence codes refer to the evidence used for the initial assignment of GO terms to UniProt records. A summary is below. For more information, see the GO evidence codes page.
\\r\\nGuide to evidence codes from GO:
\\r\\nWhat human genome assembly and coordinate system is Ensembl using?
", "answer": "Ensembl uses a one-based coordinate system, whereas UCSC uses a zero-based coordinate system.
\\r\\nEnsembl uses the most recently updated human genome housed at the GRC. This current major assembly release is called GRCh38. NCBI and UCSC use the same genome. UCSC refers to the recent human genome as GRCh38/hg38.
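The one-based versus zero-based distinction noted above matters whenever coordinates are moved between resources. The sketch below converts between a one-based, fully closed interval (the Ensembl convention) and a zero-based, half-open interval. It assumes the zero-based form follows the half-open convention used in BED files; the function names are invented for illustration.

```python
def one_based_to_zero_based(start, end):
    # One-based, closed [start, end] -> zero-based, half-open [start - 1, end)
    return start - 1, end


def zero_based_to_one_based(start, end):
    # Zero-based, half-open [start, end) -> one-based, closed [start + 1, end]
    return start + 1, end


# Sample numbers only: a one-based region converted for a BED-style file
bed_start, bed_end = one_based_to_zero_based(32889611, 32973805)
print(bed_start, bed_end)
```

Only the start changes in either direction, so the region covers the same bases in both representations.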
\\r\\nWe maintain a long-term archive of the previous assembly of the human genome, GRCh37, with BLAST/BLAT, VEP and BioMart. The data in this archive is based on the Ensembl 75 data.
\\r\\nLinking back to a previous Ensembl release:
\\r\\nIf you are looking for older assemblies, try the Ensembl archive sites. In addition to the previous GRCh37, Ensembl release 54 contains the older human assembly NCBI36, which is referred to as NCBI36/hg18 by UCSC.
\\r\\nTo link to a specific Ensembl archive site, remember to use the Permanent Link (found at the bottom left of each page), e.g. http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index.
\\r\\nFinding the new coordinates for your region of interest
\\r\\nYou may have stored a genomic location on NCBI36 or GRCh37, and want to know where the equivalent region is on GRCh38. You can convert old coordinates to new ones via our REST API, or you can find the new coordinates by simply adding the old assembly name into the address bar on the Ensembl browser website. See an example below:
\\r\\nwww.ensembl.org/Homo_sapiens/Location/View?db=core;r=13:32889611-32973805;a=ncbi36.
\\r\\nThis will redirect you to the new region in the latest assembly.
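For the REST API route, a minimal sketch of a coordinate conversion request is given here. It assumes the assembly mapping endpoint of the Ensembl REST server at rest.ensembl.org, and it uses GRCh37 as the source assembly; whether NCBI36 is accepted by that endpoint is not confirmed here. The helper names are invented, and the region simply reuses the numbers from the browser example as sample input.

```python
import json
import urllib.request

REST_SERVER = 'https://rest.ensembl.org'


def build_map_url(species, from_assembly, chromosome, start, end, to_assembly):
    # Region format assumed by this sketch: chromosome:start..end:strand
    region = '{}:{}..{}:1'.format(chromosome, start, end)
    return '{}/map/{}/{}/{}/{}?content-type=application/json'.format(
        REST_SERVER, species, from_assembly, region, to_assembly)


def map_region(species, from_assembly, chromosome, start, end, to_assembly):
    # Performs the HTTP request; needs network access to the REST server
    url = build_map_url(species, from_assembly, chromosome, start, end, to_assembly)
    with urllib.request.urlopen(url) as response:
        return json.load(response)['mappings']


print(build_map_url('human', 'GRCh37', '13', 32889611, 32973805, 'GRCh38'))
```

Calling map_region with the same arguments should return a list of mappings, each pairing the original region with its mapped counterpart. A region that spans an assembly change may come back as several pieces.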
\\r\\nAssembly updates
\\r\\nThe GRC will produce minor assembly releases to GRCh38 on a regular basis. We incorporate these assembly updates, known as \\"patches\\" into GRCh38, as we did for GRCh37. By default, the chromosome coordinates do not change when we update the human assembly to the most recent minor release. Instead, we're adding additional alternate sequences to the assembly that can be swapped in on request.
\\r\\nWhen we update GRCh38 to the latest patch release, the assembly name is appended with the patch number. For example, the first minor assembly release to become available for GRCh38 will be called GRCh38.p1.
", "category": "assemblies"} 5655 2011-01-04 14:07:47 5655 2019-09-19 13:26:02 live \N \N 288 faq variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, mutation, non-synonymous, synonymous {"question": "What is the Variant Effect Predictor?
", "answer": "This tool allows you to input coordinates of any alleles you have identified, and determine the effect on relevant Ensembl transcripts and proteins. An example input file is available. For example, if a variant that you enter as input causes a change in the protein sequence, Ensembl will calculate the possible amino acids at that position. The variant would be given a consequence type of \\"non-synonymous\\". Have a look at other possible consequence types in the variation documentation.
\\r\\nYou can find the variant effect predictor in Ensembl tools. Use the online interface, or a Perl script with the API.
", "category": "variation"} \N 2011-03-14 09:21:54 106005 2019-09-19 13:26:02 dead \N \N 289 faq FAQ, data upload, custom data, user data, BED, BedGraph, BAM, GFF, GTF, BIG, PSL, VEP, custom track, custom annotation, display data {"question": "I want to add my data and view it in Ensembl
", "answer": "If you have genomic coordinates, you can use the Add your data tool button [[IMAGE::CropperCapture554].png]] to display your features in Ensembl. Popular pages for this are Region in detail, Region overview, and the Karyotype. Alternatively, if you have sequence variants you may want to analyse them with our Variant Effect Predictor (VEP) tool.
\\r\\nEither upload a file or attach your data using a URL. If you have uploaded or attached data to Ensembl already, the Add your data button will change to Manage your data.
\\r\\nThere are many supported file formats you can use, namely BED, BedGraph, GFF/GTF, PSL, WIG, BAM for sequencing reads and many more.
\\r\\nDid you know you can share your customised view with collaborators? Free registration allows you to log in and save your data for access from any computer.
\\r\\nHave a look at these upload exercises to go through some examples of viewing your own data in Ensembl.
", "category": "data"} 5655 2011-03-15 15:42:12 106005 2019-09-19 13:26:02 dead \N \N 290 view variant, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, flank, flanking, surrounding, sequence, nucleotide, base, bp {"ensembl_object": "Variation", "content": "For the top panel describing variation details such as source and class, see this help page.
\\r\\nThe sequence used to place a variation (indicated in red as an ambiguity code within the sequence) is displayed. The sequence can be either on the forward or the reverse strand of the genome (please be aware of this when placing a variation within the context of a gene).
\\r\\nNote, the submitted sequence around the variation (to dbSNP) may differ from the Ensembl reference sequence. In this case, an alignment between the reference and submitted sequence is shown. This can be deselected using the configure this page tool button at the left of the view.
\\r\\nIUPAC Ambiguity Codes
\\r\\n[[IMAGE::iupac_table.png width=\\"394\\" height=\\"302\\"]]
", "ensembl_action": "Sequence"} 5655 2011-04-01 11:39:41 106005 2019-09-19 13:26:02 live \N \N 375 view variant, variation, SNP, polymorphism, 1000 Genomes, mutation, HapMap, consequence, dbSNP {"content": "Short sequence variations are displayed for a transcript (splice variant) in the Variation table.
This view provides a summary of all short sequence variants for a specific transcript (splice variant). The variants are summarised by "consequence type" (the location within the transcript or effect on the coding/splicing sequence). We use SO terms from the Sequence Ontology project, described here, and the ontology of the consequence type may be changed using the Configure this page tool button at the left of the view.
Click on "Show" next to the variation consequence for a table of a specific variation type. Export the table using the CSV icon at the top right of the table.
The variation sources shown in the table and diagram can be changed using the configure this page tool button at the left. By default, not all intronic variations are shown. To see a full display of intronic variations, select "Intron Context" and change it from 100bp to "Full Introns" in the menu.
If you show a table for a variation consequence type, the columns will be as follows:
Where do haplotypes and patches come from?
The GRC provides haplotypic regions along with the Primary assembly for human. In addition to haplotypes, which are regions with known variations to the Primary assembly, the GRC provides assembly patches on a regular basis.
[[MOVIE::372]]
There are two types of assembly patches: Novel patches and Fix patches. Novel patches are additional sequence for alternate alleles that are not represented on the primary assembly. Fix patches are additional sequence that will replace the known regions of misassembly in GRCh38 when the next major assembly update is released.
How is this shown on the browser?
You might notice some strange-looking chromosome names, or regions of the genome that have red or green highlighting. Red regions represent regions of the genome where there is a haplotype or novel patch. Green regions represent regions of the genome where there is a fix patch. For an example, see the top image in Region in Detail. You can jump to haplotypes and patches by clicking on those highlighted regions. Other access points include BioMart (see the Region: Chromosome filter), and the Perl API.
What haplotypes are available with GRCh38?
A list of haplotypes and patches can be found on the GRC human overview page. There are over 200 alternate haplotypes available for human in GRCh38.
How can I download the DNA sequence for haplotypes and patches?
The DNA sequence for the primary assembly plus haplotypes and patches can be downloaded from our FTP site.
In addition to the primary assembly chromosomes, we have constructed 'patched chromosomes' by applying the individual haplotypes and patches to their relevant chromosome at the position indicated by the GRC as being the 'equivalent' region. Each patched chromosome has only one patch or one haplotype applied. The patched chromosome will have a length that is similar to the primary assembly chromosome; if the length of the assembly patch is shorter than the region it replaces, then the patch chromosome will also be shorter than the primary assembly chromosome.
Outside the region of the assembly patch, the entire length of a 'patch chromosome' is padded with Ns. This padding ensures that alignment programs report coordinates with respect to the whole chromosome. For example, a patch with a start position of 1,000,001 will have 1,000,000 Ns added at its start.
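The effect of the padding can be sketched in a few lines. This is an illustration only: the function names and the toy patch sequence are invented, and only the leading Ns are modelled (padding after the end of the patch is left out).

```python
def pad_patch(patch_sequence, patch_start):
    # patch_start is the one-based chromosome position of the first patch base.
    # Leading Ns shift the patch so that offsets line up with the chromosome.
    return 'N' * (patch_start - 1) + patch_sequence


def to_chromosome_position(zero_based_offset):
    # A zero-based offset into the padded sequence maps to a one-based position
    return zero_based_offset + 1


padded = pad_patch('ACGTACGT', 1000001)
hit_offset = padded.find('ACGT')
print(to_chromosome_position(hit_offset))  # prints 1000001
```

Because of the leading Ns, a hit found in the padded sequence needs no further offset arithmetic to be expressed in chromosome coordinates.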
How do you (Ensembl) know where to apply an assembly patch?
When the GRC submit the assembly patches, they specify the genomic location (i.e. chromosome, plus start and end coordinates) on the primary assembly that the assembly patch relates to. We download this information from the GRC FTP site.
How do I know which parts of the patch are different compared to the chromosome on the primary assembly?
If you are interested in alignments, we show these in two ways:
We download GRC's alignments between the primary assembly chromosome and assembly patch, and display these alignments on Region In Detail in the "GRC alignment import" track. This track can only be seen when you're viewing a primary assembly chromosome in a region where there is an overlapping assembly patch. Here is an example for the ABO gene. You can click on the green triangles for more information.
We also produce our own alignments between the primary assembly chromosome and assembly patch, using LASTZ. You can see these alignments as a pink track when you're comparing the primary assembly and the patch in 'Region Comparison' view.
How do I access the haplotypes and patches programmatically?
The haplotypes and assembly patches can be fetched using our API. For historical reasons, when using the API the primary assembly is known as the 'reference' sequence and the alternate sequences (haplotypes and patches) are known as 'non-reference' sequences. For example:
$slices = $slice_adaptor->fetch_all( 'toplevel', undef, 1 );
or
$assembly_exception_features = $assembly_exception_feature_adaptor->fetch_all_by_Slice($slice);
How do I access the haplotypes and patches using MySQL queries?
We store information about alternate sequences in the assembly_exception table.
mysql -uanonymous -hensembldb.ensembl.org -P3306 -Dhomo_sapiens_core_77_38 -e "select sr2.name as chr_name, exc_seq_region_start,exc_seq_region_end,exc_type,sr1.name as alternate_seq_name,seq_region_start, seq_region_end from assembly_exception ae, seq_region sr1, seq_region sr2 where sr1.seq_region_id=ae.seq_region_id and sr2.seq_region_id=ae.exc_seq_region_id order by chr_name,exc_seq_region_start"
For more about patches and haplotypes, see our blog post.
", "question": "What haplotypes and assembly patches can I see for human?
", "category": "assemblies"} \N 2011-05-25 10:18:30 125866 2020-02-19 16:24:20 live \N \N 292 faq 1000 genomes, variation, API {"question": "How do I show and/or retrieve variation data from the 1000 Genomes project?", "answer": "Ensembl contains variation data from the 1000 Genomes project.", "category": "variation"} \N 2011-05-25 16:35:17 \N 2019-09-19 13:26:02 dead \N \N 293 faq 1000 genomes, variation, API {"question": "How do I show and/or retrieve variation data from the 1000 Genomes project?
", "answer": "Ensembl contains variation data from the 1000 Genomes project. Currently these are limited to the data from the pilot phase of the project.
\\r\\n\\r\\n
To show these data in the browser:
\\r\\n\\r\\n
\\r\\n
To retrieve these data genome-wide we recommend using the Perl Variation API. The variations from the 1000 Genomes project are grouped in various variation sets, e.g. 1000 genomes - High coverage - Trios or 1000 genomes - High coverage exons CEU. All variations within a particular set can be retrieved using the VariationAdaptor method fetch_all_by_VariationSet
", "category": "variation"} \N 2011-05-25 16:46:25 \N 2019-09-19 13:26:02 dead \N \N 294 faq RefSeq, Ensembl gene, Xref {"answer": "Ensembl gene sets are comprehensive sets, based on supporting evidence from sequence databases including UniProt and RefSeq. Where a transcript in the Ensembl set has a close match to a RefSeq transcript, the two transcripts are linked. RefSeq IDs linked to Ensembl transcripts are available in the browser under the Transcript tab, General identifiers view, and also from BioMart and from the API as Xrefs. Nearly 100% of NCBI RefSeq proteins have a corresponding protein in the Ensembl annotation.
In addition to linking the Ensembl annotation to the corresponding RefSeq annotation, the complete set of RefSeq models is imported into Ensembl for human and mouse. These are visible as a separate track in the Location tab. To switch on the track, click 'Configure this page', open the 'Genes' list, and select 'Human RefSeq import' or 'Mouse RefSeq import'. The image below shows there are imported RefSeq models reflecting one protein coding transcript and three noncoding RNAs (snoRNAs). The Ensembl/Havana gene track shows one protein coding transcript agreed on by both the Ensembl annotation pipeline and Havana manual curation. A second transcript has a retained intron and is untranslated.
We load these models directly into the otherfeatures database and do not change any coordinates. Click here for an example script to access RefSeq gene models for human using the API.
Why do they differ?
While Ensembl gene models are annotated directly on the reference genome, RefSeq annotates on mRNA sequences. Due to sequence differences between the reference genomes and individual mRNAs, some of the RefSeq mRNAs may not map perfectly to the reference genome. For example, translations may contain stop codons when they are translated from the reference genome's DNA. Ensembl transcripts will reflect the reference genome in these cases, not the mRNA, and therefore there can be small differences between RefSeq mRNA/proteins and Ensembl transcripts/proteins.
See this article for more on Ensembl gene annotation.
", "question": "How do I access RefSeq annotation in Ensembl?
", "category": "genes"} \N 2011-06-03 10:19:29 254419 2020-06-05 11:12:48 live \N \N 298 faq OMIM, MIM, phenotype, disease, mendelidan, human {"question": "Where are disease and phenotype associations from OMIM (for human)?
", "answer": "Gene views
\\r\\nAssociations between diseases and phenotypes and human genes can be found on the Gene tab, phenotype view.
\\r\\nDirect associations are imported from the Online Mendelian Inheritance in Man (OMIM). They are considered as external references or Xrefs and may also be found in the Gene tab by clicking on the External references link in the left hand menu.
\\r\\nOther associations between a disease/phenotype and a gene are based on studies involving sequence variants. These are shown in the gene tab, phenotype view, and come from these sources.
\\r\\n\\r\\n
Location views
\\r\\nThe OMIM and other selected variation sets can be viewed for a region in the Location tab. Use Configure this page to turn on the OMIM phenotype - short variants (SNPs and indels) track, for example. The Variation tab, Phenotype Data view also shows phenotypes for variations.
\\r\\n\\r\\n
Phenotype views
\\r\\nPhenotypes and disease information can also be viewed on the phenotype tab.
\\r\\n\\r\\n
The Perl API and BioMart also allow access to OMIM associations for both genes and variants.
\\r\\n", "category": "variation"} \N 2011-06-16 10:10:23 120522 2019-09-19 13:26:02 live \N \N 299 view variant, OMIM, MIM, phenotype, disease, polymorphism, COSMIC, NHGRI, variation, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, mutation, non-synonymous, synonymous, indel {"content": "
Diseases and traits that are associated with the variant of interest are shown on this page.
The different sources of this data can be found in the second column of the table and some examples are listed below:
Ontology mappings and accession terms related to phenotypes associated with the variant of interest are also included in the table. Ontology annotations of human phenotypes are imported from EFO, Human Phenotype Ontology and Orphanet.
The reported gene comes directly from the specific study (linked in the second column), as do the associated allele, p value, odds ratio and beta coefficient (if available). These are genes that were reported in the paper as being associated with this GWAS variant, and may not correspond to the genes reported on the Genes and Regulation page for the variant. The associated allele is also that reported in the paper, and may be the positive or negative strand allele (the alleles shown in Ensembl are always the positive strand alleles).
The statistics that may be displayed are:
Odds ratio: The odds of having the phenotype if you have the associated allele or genotype, compared to the odds of having the phenotype in the general population.
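P-value: The probability of observing an association between the locus and the phenotype at least as strong as the one reported if there were no true association. It is often calculated through a chi-squared test.
Beta coefficient: The estimated effect size of the associated allele on the trait (a regression coefficient), given in the units used by the study.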
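As a worked illustration of the odds ratio, the sketch below uses the textbook 2x2 table of carriers and non-carriers among cases and controls. The function name and the counts are invented, and individual studies may calculate their reported values differently (for example, from a regression model).

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    # Odds of carrying the allele among cases, divided by the same odds among controls
    odds_in_cases = cases_with / cases_without
    odds_in_controls = controls_with / controls_without
    return odds_in_cases / odds_in_controls


# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the allele
print(round(odds_ratio(30, 70, 10, 90), 2))
```

A value above 1 indicates that the allele is more common in cases than in controls, and a value below 1 indicates the opposite.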
It is recommended that you read the publications to determine exactly how individual statistics are calculated.
Click on the associated allele to see the frequency of that allele in 1000 Genomes populations.
The list of variants associated with these phenotypes can be retrieved in BioMart, and their locations can be viewed on the karyotype.
Check the Variation - Data sources for a full list of sources of variation data currently available in Ensembl.
", "ensembl_action": "Phenotype", "ensembl_object": "Variation"} \N 2011-06-16 16:46:37 254453 2024-03-11 17:04:36 live \N \N 300 faq LRG, LSDB, assembly {"question": "What is an LRG?", "answer": "LRG stands for Locus Reference Genomic. An LRG is a fixed sequence, independent of the genome, specifically created for the diagnostic community to record DNA sequence variation on a fixed framework.
Sequence variants in LSDBs (Locus Specific Databases) are reported using LRG sequences. For more information, please see the LRG project page.
", "category": "variation"} \N 2011-08-09 11:02:06 \N 2019-09-19 13:26:02 live \N \N 301 faq gene, assembly, update, transcript {"question": "My gene has changed, and I don't know why.
", "answer": "The Ensembl gene set is updated when a new genebuild is performed. New evidence such as recently submitted cDNA and protein sequences, and updated transcripts from the VEGA/Havana project (for human, mouse, rat, and zebrafish) are included in updated gene sets. A gene may look different in a more current Ensembl version if new evidence has been included, and if the underlying genome sequence has changed.
\\r\\nGenome sequences, or assemblies, are changed when new evidence allows gaps to be filled, or errors to be corrected. A genome assembly is usually updated once a year or less often. The stop and start coordinates of a gene may change if there has been an update in the underlying sequence assembly.
\\r\\nThe news for every Ensembl release (or update) explains the changes within the new version (such as new genebuilds and new assemblies). To find out what assembly you are working with in Ensembl, go to any species home page.
", "category": "genes"} \N 2011-08-09 11:17:16 \N 2019-09-19 13:26:02 live \N \N 302 view disease, phenotype, OMIM, variation, COSMIC, mutation {"ensembl_object": "Gene", "content": "Three sections appear on this page.
\\r\\nThe first section lists diseases and phenotypes directly associated with a gene by the Online Mendelian Inheritance in Man (OMIM) compendium, Orphanet, DDG2P and other sources in human. In other species these phenotypes come from International Mouse Phenotyping Consortium (IMPC; mouse), Europhenome (mouse), ZFIN (zebrafish), Rat Genome Database (RGD; rat), Animal_QTLdb (various) and Online Mendelian Inheritance in Animals (OMIA; various). You can see which databases are used for phenotypes for each species on our Variation sources page.
\\r\\nThe second section shows a table of diseases and phenotypes associated with a gene based on a sequence variant. See this help page for the sources of these associations.
\\r\\nClick on Show next to any phenotype in the table in order to reveal more information about the sequence variant.
\\r\\nThe third section lists any orthologues of the gene that are associated with phenotypes in that species. Note that because of biological differences between species, some phenotypes will not be easy to translate to another species.
", "ensembl_action": "Phenotype"} \N 2011-09-07 10:49:59 120522 2019-09-19 13:26:02 live \N \N 303 faq GENCODE, CCDS, annotation, gene {"question": "What is GENCODE?
", "answer": "GENCODE is a sub-project of the ENCylcopedia Of DNA Elements (ENCODE) project. The aim of GENCODE is to annotate all evidence-based gene features (genes, transcripts, coding sequences, etc) in the entire human and mouse genomes at a high accuracy. The result will be a set of annotations including all protein-coding loci with alternatively transcribed variants, non-coding loci with transcript evidence and pseudogenes. The process to create this annotation involves manual curation, different computational analysis and targeted experimental approaches. Putative loci can be verified by wet-lab experiments and computational predictions are analysed manually. The Ensembl human and mouse gene sets are a merge of Havana's manual annotation with Ensembl's automatic annotation. It is this merged gene set that we provide to GENCODE. The default human and mouse gene sets in the Ensembl browser is therefore also the current version of GENCODE. More information can be found at the GENCODE website.
", "category": "genes"} \N 2011-09-08 12:53:52 \N 2019-09-19 13:26:02 live \N \N 314 movie variation, overview, www.ensembl.org, genome, genes {"list_position": 1, "youtube_id": "bGryvTCOMGA", "title": "Ensembl Presentation", "youku_id": "XNDgxMzAyNDQ4", "length": "10:43"} 5655 2011-10-28 12:50:52 106005 2019-09-19 13:26:02 live 0 0 315 movie variation, polymorphism, SNP, CNV, Structural variation, Ensembl, www.ensembl.org, genome browser, gene {"list_position": 14, "youtube_id": "mf6QZrfmaE4", "title": "Demo: Sequence Variation for a Gene", "youku_id": "XNDgxMzAwNjYw", "length": "5:41"} 5655 2011-10-28 12:54:21 5655 2019-09-19 13:26:02 live 0 0 316 movie variation, location, region, chromosome, CNV, Structural variation, Ensembl, www.ensembl.org, genome browser, gene {"list_position": 15, "youtube_id": "spTrW9I0vpQ", "title": "Demo: Structural variation for a region", "youku_id": "XNDgxMzAxMDYw", "length": "3:52"} 5655 2011-10-28 12:55:58 5655 2019-09-19 13:26:02 live 0 0 317 movie hg18, hg19, NCBI36, GRCh37, assembly, genome, coordinates, assembly converter, archive {"list_position": 17, "youtube_id": "UUUTAV9orgw", "title": "Demo: Old genome coordinates to new", "youku_id": "XNDgxMzAxMzQ0", "length": "2:44"} 5655 2011-10-28 12:57:33 5655 2019-09-19 13:26:02 live 0 0 319 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation, gene {"ensembl_object": "Gene", "content": "Structural variants are imported from DGVa (which is synchronised with dbVar). They are drawn along the genome (blue bar) according to the style described in the dbVar documentation.
\\r\\nBelow the graphical view, the structural variants in the gene region are provided as a table.
\\r\\nFor more on Ensembl variation, see the variation documentation.
", "ensembl_action": "StructuralVariation_Gene"} 5655 2011-10-28 13:46:03 122937 2019-09-19 13:26:02 live 0 0 321 view polymorphism, sequence variant, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, variant, validation, evidence {"ensembl_object": "Variation", "content": "This page offers a top panel of information specific to the variant. Graphical icons are presented that lead you to more specific variant data, also accessible from the links at the left. The links in the left hand menu have a corresponding icon. It's your choice how to navigate through the variation displays.
From the top of the view, the following information can be found:
Please see the Ensembl variation documentation for more information such as source of variants, and consequence types (effect on genes and transcripts).
", "ensembl_action": "Explore"} 5655 2011-11-18 14:31:20 120522 2019-09-19 13:26:02 live 0 0 445 view \N {"content": "What do the population codes for human allele frequencies mean?
", "answer": "In views like Population genetics in the variation tab, you may find three letter codes for populations.
\\r\\nThese come from the HapMap project, and/or the 1000 Genomes project.
\\r\\nThe following table describes the population codes, and shows which populations are grouped into super populations.
\\r\\nPopulation Code | \\r\\nDescription | \\r\\nSuper Population Code | \\r\\n
CHB | \\r\\nHan Chinese in Bejing, China | \\r\\nEAS | \\r\\n
JPT | \\r\\nJapanese in Tokyo, Japan | \\r\\nEAS | \\r\\n
CHS | \\r\\nSouthern Han Chinese | \\r\\nEAS | \\r\\n
CDX | \\r\\nChinese Dai in Xishuanagbanna, China | \\r\\nEAS | \\r\\n
KHV | \\r\\nKinh in Ho Chi Minh City, Vietnam | \\r\\nEAS | \\r\\n
CEU | \\r\\n\\r\\n Utah Residents (CEPH) with Northern and Western European ancestry \\r\\n | \\r\\nEUR | \\r\\n
TSI | \\r\\nToscani in Italia | \\r\\nEUR | \\r\\n
FIN | \\r\\nFinnish in Finland | \\r\\nEUR | \\r\\n
GBR | \\r\\nBritish in England and Scotland | \\r\\nEUR | \\r\\n
IBS | \\r\\nIberian population in Spain | \\r\\nEUR | \\r\\n
YRI | \\r\\nYoruba in Ibadan, Nigeria | \\r\\nAFR | \\r\\n
LWK | \\r\\nLuhya in Webuye, Kenya | \\r\\nAFR | \\r\\n
MAG | \\r\\nMandinka in The Gambia | \\r\\nAFR | \\r\\n
MSL | \\r\\nMende in Sierra Leone | \\r\\nAFR | \\r\\n
ESN | \\r\\nEsan in Nigeria | \\r\\nAFR | \\r\\n
ASW | \\r\\nAmericans of African Ancestry in SW USA | \\r\\nAFR | \\r\\n
ACB | \\r\\nAfrican Caribbean in Barbados | \\r\\nAFR | \\r\\n
MXL | \\r\\nMexican Ancestry from Los Angeles USA | \\r\\nAMR | \\r\\n
PUR | \\r\\nPuerto Rican from Puerto Rico | \\r\\nAMR | \\r\\n
CLM | \\r\\nColombian from Medellin, Colombia | \\r\\nAMR | \\r\\n
PEL | \\r\\nPeruvian from Lima, Peru | \\r\\nAMR | \\r\\n
GIH | \\r\\nGujarati Indian from Houston, Texas | \\r\\nSAS | \\r\\n
PJL | \\r\\nPunjabi from Lahore, Pakistan | \\r\\nSAS | \\r\\n
BEB | \\r\\nBengali from Bangladesh | \\r\\nSAS | \\r\\n
STU | \\r\\nSri Lankan Tamil from the UK | \\r\\nSAS | \\r\\n
ITU | \\r\\nIndian Telugu from the UK | \\r\\nSAS | \\r\\n
\\r\\n
These populations have been divided into 5 super populations
\\r\\nHow do I install the API?
", "answer": "You can either clone it from our GitHub using Git or download the files from our FTP site or GitHub website. See the instructions. It is important to use the same version of the API as the database you would like to interrogate. The API should be updated with each new release.
\\r\\n\\r\\n
There are extra Perl modules that you may need to install and configure to work with the API. There are further instructions on how to do this on our blog for Mac and for Windows.
", "category": "core_api"} 5655 2012-02-01 16:47:29 106005 2019-09-19 13:26:02 dead 0 0 333 faq BLAST, BLAT, UCSC, Ensembl, NCBI {"question": "Why do I see different hits when I use BLAT or BLAST from Ensembl, UCSC, and NCBI?", "answer": "The parameters that Ensembl uses for BLAT are the same as UCSC. However, differences in the underlying genome assembly, and in our repeat marking, can cause different results. Similarly, BLAST differences will reflect different parameters and the underlying genome sequence, depending on where the BLAST tool is hosted.", "category": "z_data"} 5655 2012-02-10 14:05:58 5655 2019-09-19 13:26:02 dead 0 0 334 faq \N {"question": "Why do I see different hits when I use BLAT or BLAST from Ensembl, UCSC, and NCBI?", "answer": "The parameters that Ensembl uses for BLAT are the same as those used by UCSC. BLAT results can differ across these resources if there are discrepancies in the underlying genome assembly, or if sequence repeats are masked using different methods. BLAST can give various results when comparing resources depending on parameters used, and underlying sequence and repeat differences.
", "category": "z_data"} 5655 2012-02-14 11:56:33 \N 2019-09-19 13:26:02 live 0 0 335 faq region, track, configure, style, image, view, location {"question": "What styles are available for the data tracks in views like Region in detail?
", "answer": "To view data along a genomic region, you can choose from different styles using the 'Configure this page' menu in the Region in detail view. You can also hover over the track names and then the tool icon to view the style options for data which is already displayed.
\\r\\nSome tracks will only give you the option to turn them on or off (such as %GC, Contigs, and Structural variants). However, other data types give you more options. Below is a list of the track types available for different data types. Try them out to see if they suit your needs.
\\r\\nGenscan predictions
\\r\\nIn all styles, variants are shown with consequences indicated by a colour code. Insertions are shown as a small arrow underneath.
\\r\\nHow do you define the cut-off between an indel and a structural variant?
", "answer": "There is no official cut-off between indels and SVs. Our variants are imported from our source databases and their classification is assigned by the original databases.
\\r\\n1000 Genomes indels have a cut-off of 50 bp, but this does not apply to all variants. Some of the older archived indels in dbSNP are longer than this, and there is no clear boundary between what is classified as an indel or an SV in dbSNP.
", "category": "variation"} 106005 2013-03-21 16:16:21 \N 2019-09-19 13:26:02 live 0 0 367 faq Correction for dbSNP strand info from Build 152 {"category": "variation", "question": "
Does Ensembl report variation alleles on the forward strand or reverse strand?
", "answer": "
In views like the variation tab, the majority of alleles are reported on the forward strand. For example A/T in this view signifies that A is the forward stranded allele on the reference genome, and T is the alternate allele (also on the forward strand). Most of our SNPs and short insertion-deletions are from NCBI dbSNP. From dbSNP 2.0 (Build 152) onwards, variants are reported only on the forward strand, but variants from earlier releases can be on either strand. Ensembl determines the forward-stranded allele and reports it. On displays where variants are shown as part of a gene sequence, the alleles are shown as they are in the cDNA or gene. For example, if it is a reverse-stranded gene, you will see the allele on the reverse strand in the gene and transcript sequences.
There are a few minor exceptions. When a variant has multiple mappings to the genome, and at least one of those is to the reverse strand, then we will report the alleles on their original strand. Also, for species where the data was imported a long time ago, the alleles will still match the original submitted strand (which can be the reverse strand) as received from dbSNP.
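As a rough illustration of the strand convention described above, the following Python sketch converts a forward-strand allele string into the alleles as they would read on a reverse-stranded gene. The function names and the strand encoding (1 for forward, -1 for reverse) are assumptions made for this example and are not part of any Ensembl tool.

```python
# Hypothetical helper, for illustration only.
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Characters without a complement (such as '-' for a deletion) are kept as they are.
    return ''.join(COMPLEMENT.get(base, base) for base in reversed(seq.upper()))

def alleles_on_gene_strand(allele_string, gene_strand):
    # allele_string: forward-strand alleles separated by '/', e.g. 'A/T'
    # gene_strand: 1 for a forward-stranded gene, -1 for a reverse-stranded gene (assumed encoding)
    alleles = allele_string.split('/')
    if gene_strand == -1:
        alleles = [reverse_complement(allele) for allele in alleles]
    return '/'.join(alleles)

print(alleles_on_gene_strand('A/T', 1))
print(alleles_on_gene_strand('A/T', -1))
```

For a forward-stranded gene the allele string is returned unchanged; for a reverse-stranded gene each allele is reverse-complemented.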
", "division": []} 104467 2012-05-30 16:12:07 254462 2022-09-08 10:56:47 live 0 0 371 view statistics, gene, genes, number, novel, assembly, annotation, variant, variation, CNV, known, pseudogene, count, length, size {"content": "
The Assembly and Genebuild page is divided into two sections, the first giving information about the Assembly and the second about the Genebuild.
\\r\\nAssembly: This field indicates the current assembly that is available, and the date that the primary assembly was released.
\\r\\nDatabase version: This field has the format \\"release number\\".\\"assembly\\" and allows the user to build the MySQL database name. For example, in release 67 the human database was assembly GRCh37.p7. The database version field would have shown \\"67.37\\" meaning that the human core Ensembl release 67 database on our public MySQL server was named \\"homo_sapiens_core_67_37\\". The pig showed a database version of \\"67.102\\", so it follows that the pig rnaseq database for Ensembl release 67 was named \\"sus_scrofa_rnaseq_67_102\\".
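The naming pattern described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name and its arguments are hypothetical, and it assumes the Database version field always has the form release.assembly.

```python
def mysql_db_name(species, group, db_version):
    # species: lower-case species name with underscores, e.g. 'homo_sapiens' (assumed input format)
    # group: database type, e.g. 'core' or 'rnaseq'
    # db_version: the Database version field, e.g. '67.37'
    release, assembly = db_version.split('.')
    return '_'.join([species, group, release, assembly])

print(mysql_db_name('homo_sapiens', 'core', '67.37'))
print(mysql_db_name('sus_scrofa', 'rnaseq', '67.102'))
```

The two calls reproduce the release 67 human core and pig rnaseq database names given in the example above.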
\\r\\nBase Pairs:
\\r\\nGolden Path Length:
\\r\\nGenebuild by: Who produced the gene annotation - Ensembl.
\\r\\nGenebuild method: For many species, this value will be \\"Full genebuild\\", meaning that the annotation used the standard pipeline. Some species, for example Sloth, have the value set to \\"Projection build\\", which is a different annotation method whereby we projected annotation from human onto the new species.
\\r\\nGenebuild started: This is the date that we started the genebuild and gives an indication as to when the UniProt, UniGene and vertebrate RNA databases were accessed.
\\r\\nGenebuild released: This is the date when the genebuild was first released.
\\r\\nGenebuild last patched/updated: This is the date when the most recent updates to the gene set were made.
\\r\\nKnown genes: The number of protein-coding genes that were mapped to a species-specific protein sequence using our cross-referencing (xref) system.
\\r\\nNovel genes: The number of protein-coding genes that were not mapped to species-specific proteins but instead were mapped to proteins from other species using our xref system.
\\r\\nProjected genes: The number of novel protein-coding genes that were subsequently assigned a gene name from their human orthologue.
\\r\\nPseudogenes: Number of pseudogenes.
\\r\\nRNA genes: Number of long and short RNA genes (non protein coding).
\\r\\nGene exons:
\\r\\nGene transcripts:
\\r\\nGenscan gene predictions: Number of genes predicted by Genscan, an ab initio gene finder. The Genscan results are not included in the Ensembl gene set. All of our gene models are supported by aligned sequence data.
\\r\\nShort variants:
\\r\\nStructural variants:
"} 5132 2012-06-08 15:36:12 106005 2019-09-19 13:26:02 draft 0 0 372 movie GRCh37, hg19, patch, haplotype, primary sequence, assembly, reference, genome sequence {"list_position": 21, "youtube_id": "sPE9j_Hw9HU", "title": "Patches and Haplotypes in the Human Genome", "youku_id": "XNDE3NTgyNzQ4", "length": "6:10"} 5655 2012-06-21 10:20:23 5655 2019-09-19 13:26:02 live 0 0 373 faq search, gene name, help {"question": "How can I search Ensembl?
", "answer": "Use the search box on the Ensembl homepage, or at the top right of all Ensembl views.
\\r\\nYou can search with a:
\\r\\nDid you know, you can use wild cards? Taking the example of the gene RHO:
\\r\\nHelp and Documentation can be searched from the homepage! Just type in a term you want to know more about, like non-synonymous SNP.
\\r\\nThe Gene Gain/Loss tree summarises the phylogenetic history of an Ensembl gene family by showing gene gain events (expansions) and gene loss events (contractions) over time. A thick red branch on the tree indicates a significant expansion of the gene at that point in its history. A thick green branch denotes a contraction and a thin blue branch indicates that there was no significant change. The numbers at each node refer to the number of different genes in the ancestral species (as predicted with the CAFE tool). The colour of each node reflects the number of 'members' or 'genes' (coloured according to the legend below).
The species at the rightmost nodes are labelled in red (species for the current gene of interest), black (species with current genes in Ensembl in this tree) and grey (species with no current genes in this tree).
Click on any node for a pop-up menu of information.
(Image)
For example, in the image below, we find the following information:
Taxonomic group
P-value of expansion
Gene number before and after the expansion
CAFE links:
The publication:
http://bioinformatics.oxfordjournals.org/content/22/10/1269.full
The lab page:
http://sites.bio.indiana.edu/~hahnlab/Software.html
Short sequence variants are displayed for one transcript (splice variant) in a gene in the Variation Image.
\\r\\nShort sequence variants for a gene are shown graphically. These are displayed as vertical lines, colour-coded according to the position of the variation in a transcript. These colours are described at the legend on the bottom of the graphic.
\\r\\nAfter the row showing all SNPs in the region, all transcripts in a gene are drawn. Red and gold transcripts are protein-coding, while blue transcripts are non-coding.
\\r\\nOne transcript is expanded to fill the display. Underneath each transcript are variations. If the variation is in the coding region, a coloured box will show any possible amino acids, such as the variation coding for alanine, in the diagram. Click any box for more information.
\\r\\nUnderneath the variations for a transcript, protein domains from various databases are drawn. The domains are also shown in the transcript tab, protein summary link at the left of transcript pages. Variations are traced through the protein domains using a line in the appropriate colour (see legend at the bottom of the diagram).
\\r\\nScrolling down past variations and domains for a transcript, all variations in the view are shown as boxes. If space is available, the nucleotide alleles are displayed. Click any empty box for the alleles, and a link to more information about the variation (variation properties).
\\r\\nTo simplify the image, choose only one or two variation consequences by configuring the display using the configure this page link at the left. This allows the consequence and variation source to be changed, and the intron context to be altered (i.e. if intronic variations are drawn, and the distance from an exon they must be to be shown). Note, this will also affect the variation table.
\\r\\nIf you would like to zoom in, we suggest you turn on the Sequence variants track using configure this page in Region in detail.
"} 5655 2012-07-18 11:20:41 106005 2019-09-19 13:26:02 live 0 0 385 faq FAQ, methylation, bisulphite, bisulfite, regulation, functional genomics, 5mC {"question": "Where can I find DNA Methylation data?
", "answer": "Ensembl hosts various DNA methylation data sets generated by different techniques, including reduced representation bisulphite sequencing (RRBS) profiles from the ENCODE project alongside whole genome data sets (WGBS).
\\r\\nUsing the browser
\\r\\nThe methylation profiles can be viewed as tracks in the 'Region in detail' view of the Location tab as well as other views such as ‘Gene Summary’. To configure these tracks, click 'Configure this page' to open the configuration panel and scroll down to the 'Regulation' section which contains 'DNA Methylation'. The methylation profile tracks use a colour gradient (or heat bar) display style, with dark blue indicating highly methylated areas, through green to yellow, which denotes low methylation. Clicking on the track will show a pop-up window with detailed information about that position e.g. read coverage, sequence context etc. For more information about individual tracks and their analyses, click the 'info' icon in the configuration panel or hover your mouse over the track name in the Location view.
\\r\\nOther access
\\r\\nData can be accessed using the Perl API and is available for download from the Ensembl ftp site.
", "category": "regulation"} 106005 2012-09-25 08:18:53 106005 2019-09-19 13:26:02 live 0 0 496 lookup \N {"expanded": "", "word": "TSL:3", "meaning": "Transcript Support Level 3, when transcripts are supported by a single EST only.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 386 faq functional genomics, regulation, gene regulation, regulatory regions, promoter, enhancer, transcription factor binding, DNAse I, Pol II, CTCF, ChIP-chip, ChIP-Seq, ENCODE {"question": "How can I view and download gene regulation data... and where did it come from?
", "answer": "Sequences potentially involved in gene regulation such as promoters and enhancers can be found in Ensembl. These are based on experiments from the ENCODE project. The Ensembl Regulatory Build integrates data from various sources (e.g. ENCODE, Roadmap Epigenomics etc.), over different cell types for both human and mouse. You can also find the original data supporting the Regulatory Build.
\\r\\nWhere available, there are additional features for human, mouse, and fly (e.g. cisRED, miRNA targets, VISTA enhancers, and REDfly).
\\r\\nThe browser
\\r\\nRegulation tracks are available in Region in detail and other views such as the Regulation view in the gene tab. Turn tracks on by clicking the Configure this page link in the left hand side menu. Scroll down to the Regulation section to see the data available for a given species.
\\r\\nHuman and mouse
\\r\\nThe Ensembl Regulatory Feature tracks display candidate regions likely to be involved with gene regulation such as potential promoters, enhancers, and insulators. These are the output of the Ensembl Regulatory Build. The original data supporting the regulatory build are grouped under Open chromatin & Transcription factor binding sites or Histones & Polymerases. The data can be shown as signal profiles or peak calls.
\\r\\nOnce these data tracks are drawn, peak summits are indicated by a pair of small triangle pointers when available. Position weight matrices (PWMs) for TFBS peaks and Regulatory Features are mapped as part of the Regulatory Build and displayed as vertical black bars. Clicking on a vertical black bar will highlight the corresponding PWM in the pop up menu.
\\r\\nEnsembl also hosts the complementary genome segmentation state analysis, which provides a single track summary of the functional architecture of the human genome.
\\r\\nOther access
\\r\\nData are accessible using the Perl API.
\\r\\nAlternatively, download these data from the Ensembl ftp site.
\\r\\nBioMart provides another tool for regulation data retrieval.
", "category": "regulation"} 106005 2012-09-25 08:22:56 5655 2019-09-19 13:26:02 live 0 0 387 faq \N {"question": "Why does a stable regulatory ID such as ENSR00001348195 have multiple features associated with it?
", "answer": "The stable ID for 'Regulatory Features' denotes one specific genomic region in which there is data that supports a regulatory feature at that region. A stable ID is assigned to Regulatory Features across all cell lines which share the same distinct locus and classification. As with other stable Ensembl IDs, it is mapped between releases and should remain constant. One of these features will be a special MultiCell feature, which acts as a single point of access to the regulatory build, and is the track most commonly turned on by default in various views.
\\r\\nThere is more detailed information on how we produce our regulatory build here:
\\r\\nhttp://www.ensembl.org/info/docs/funcgen/regulatory_build.html
\\r\\nFor information on how to view or download regulatory features, have a look at this FAQ:
\\r\\nhttp://www.ensembl.org/Help/Faq?id=386
", "category": "regulation"} 106005 2012-09-27 15:28:19 106005 2019-09-19 13:26:02 dead 0 0 494 lookup \N {"expanded": "", "word": "TSL:1", "meaning": "Transcript Support Level 1, when transcripts are supported by at least one non-suspect mRNA.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 495 lookup \N {"expanded": "", "word": "TSL:2", "meaning": "Transcript Support Level 2, when transcripts are supported by multiple ESTs or by an mRNA flagged as suspect.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 388 view genome, region, location, position, zoom, browse, scroll {"content": "Note: This display is created using javascript and the HTML5 canvas element, and as such is only supported in modern browsers - up to date versions of Chrome, Firefox, and Internet Explorer 9 and later. There is also support for Safari 5.1, this doesn't always work on certain old Macbooks. If you have any problems, please change to one of the other browsers.
\\r\\nThe scrollable view allows you to browse genes and other annotation by scrolling across chromosomes. There are two main panels (or images) shown in figure 1. The first panel is a fixed image of the chromosome of interest, marking any haplotypes or patches in red or green, respectively. A red box illustrates the region of the chromosome you are currently viewing. You can jump to another region by dragging a box over the chromosome, then selecting Jump to region.
\\r\\n[[IMAGE::scrolly1.png width=\\"866\\" height=\\"674\\"]]
\\r\\nThe next panel is the Scrollable Region. This is a dynamic display that allows you to scroll across the chromosome and zoom in and out of regions of interest.
\\r\\nAs with the Region in detail page, the individual contigs that make up the genomic assembly are coloured in light or dark blue. Underneath, you will see the data tracks you have switched on, such as the Ensembl and Havana genes.
\\r\\nData tracks can be added or changed using the Configure this page tool button at the left of the page, or by clicking on the track name itself followed by the cog wheel icon. Displays may alter as you zoom in and out of the scrollable region to limit the amount of data on the page. For most features this will mean removing labels and switching to stacked view as you zoom out, but sequence variants will not be visible at all above 10kb and structural variants will not appear above 5Mb. We have launched the scrollable view with a subset of the tracks available on the Region in detail page, and there are plans to expand this in the future, including the ability to add custom tracks.
\\r\\nThe buttons used to browse the scrollable region are shown in detail in figure 2.
\\r\\n[[IMAGE::scrolly2.png width=\\"938\\" height=\\"268\\"]]
\\r\\nTo scroll along the genome, click (and hold) the scroll arrows (A), or click and drag within the image.
\\r\\nZoom in and out using the zoom magnifying glass buttons (B), or using your mouse wheel within the image. The zoom buttons will keep the same centre point, whereas the mouse wheel will zoom towards your cursor position. You can turn off the wheel zoom by clicking on the wheel button (E), which will change the icon from a magnifying glass to an up/down arrow, allowing you to move up and down the page with the wheel as normal.
\\r\\nUse the track height button (C) to switch between automatic track height (arrows facing in icon) and fixed track height (arrows facing out icon), and reset to default using the arrow wheel button. In the fixed track height mode, adjust the height by dragging up and down from the paired horizontal lines between the tracks. As you scroll across the chromosome in automatic track height, the track height automatically adjusts to fit in all features. In fixed track height, you may find that not all features within a track are displayed, and the height needs to be adjusted to fit them in.
\\r\\nJump to a position in your display using the drag/select tool (D). Click the double-ended arrow to select drag/select; the icon will change to a vertical dotted line. Drag a box in the image and select Jump to region. You can also drag and select a region by holding down the shift key as you select a box in the image.
\\r\\n[[MOVIE::390]]
"} 106005 2012-10-03 12:49:33 106005 2019-09-19 13:26:02 live 0 0 389 movie region, location, detail, region in detail, genome {"list_position": 4, "youtube_id": "tTKEvgPUq94", "title": "Looking at the Region of the LCT Gene - The Region in Detail View", "youku_id": "XNDYxMDU1MDM2", "length": "24:33"} 5655 2012-10-12 13:06:02 5655 2019-09-19 13:26:02 live 0 0 390 movie genoverse, region in detail, scroll, help, video {"list_position": "", "youtube_id": "WtDAno4bky0", "title": "Scrolling a Genome", "youku_id": "XNDY1NDU4MTQ0", "length": "5:25"} 5655 2012-10-22 15:58:31 5655 2019-09-19 13:26:02 dead 0 0 392 faq cite, citation, reference, publication, paper {"question": "How do I cite Ensembl?
", "answer": "To reference Ensembl, cite our most recent review overview article. A list of our publications can be found at our publications page.
\\r\\nIn your work you should include the Ensembl release (e.g. version 69) you extracted data from, as this allows your future readers to find the data you used.
", "category": "z_data"} 106005 2012-11-08 11:43:27 106005 2019-09-19 13:26:02 live 0 0 417 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation {"content": "Navigate to more data, such as associated phenotype or genes in this region, using the left hand menu or graphical icons. Links in the left hand menu have a corresponding icon. It's your choice how to navigate through the variation displays.
\\r\\nThe following information is in the top panel:
\\r\\nDGVa (the Database of Genomic Variation Archive) is a database of variant data based at the EBI. DGVa shares data with dbVar (database of Variation) at NCBI, so these data will be included in the structural variants Ensembl imports from DGVa. Learn about data in DGVa in the DGVa Quicktour.
"} 106005 2012-12-18 11:05:13 122937 2019-09-19 13:26:02 live 0 0 418 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation, genome, position, location {"content": "A top panel shows information specific to the variant. Read more about the top panel in this help page.
The genomic context display shows the region of the structural variant with associated annotation along the genome. The selected variant is shown in a 20 kb region along with surrounding variations, transcripts and regulatory features in a similar way to the region in detail view. The variants are staggered in multiple rows where they overlap. The blue bar shows how the genomic assembly is composed of individual contigs.
In Configure this page you can modify the width of the region and select or deselect variation and regulatory feature types shown.
Below the image are tables listing the features in the image that overlap the region displayed. All tables can be sorted by position or name.
Specific tables include:
For background information on variation types and sources, see this article.
"} 106005 2012-12-18 11:08:27 120522 2019-11-11 13:51:57 live 0 0 596 view postgap GWAS {"content": "The Post-GWAS Analysis Pipeline allows you to upload a tab-delimited file with GWAS summary statistics. The variant p-values and effect sizes are then fine-mapped and colocalised with GTEx eQTL summary statistics, to highlight likely causal gene candidates and the tissue where this effect takes place. This is currently a beta version of the tool and it may change at short notice as we fix bugs or add features.
The input for the Post-GWAS Analysis Pipeline is a tab-delimited summary statistics file from GWAS. The format for this is defined by the NHGRI-EBI GWAS Catalog. This can be a zipped file. The maximum file size is 20 MB.
An example file is shown below:
chromosome base_pair_location variant_id effect_allele other_allele beta standard_error p-value
1 2035379 rs10910029 a g -0.159 0.1035 8.41E-05
1 2035684 rs10910030 t c 0.1788 0.1033 5.51E-05
1 2035799 rs10752741 a g 0.184 0.1034 5.02E-05
1 2035977 rs10752742 t g 0.1558 0.1033 8.99E-05
Only the variant_id, p_value and beta are actually required for the Post-GWAS Analysis Pipeline to work. These are identified by reading the headers in your input file, so please ensure that these are specified correctly; the column order does not matter.
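Before uploading, it can be useful to check that the required headers are present. The Python sketch below does this for a header line; the function name is hypothetical, and it assumes that column names must match exactly, which may not reflect how the pipeline itself reads headers. Note that the example file above labels the column p-value, while the text names it p_value, so an exact-match check like this one would flag that header.

```python
# Column names as given in the text above; exact matching is an assumption.
REQUIRED = ('variant_id', 'p_value', 'beta')
TAB = chr(9)  # tab character, since the input file is tab-delimited

def missing_required_columns(header_line):
    # Column order does not matter, so the header is treated as a set.
    columns = set(header_line.strip().split(TAB))
    return [name for name in REQUIRED if name not in columns]

header = TAB.join(['chromosome', 'base_pair_location', 'variant_id',
                   'effect_allele', 'other_allele', 'beta',
                   'standard_error', 'p_value'])
print(missing_required_columns(header))
```

An empty list means that all three required columns were found.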
To run a job, select your file from your computer. You can choose the representative 1000 Genomes super-population from which to calculate linkage disequilibrium (LD) from the drop-down.
Click Run.
The Post-GWAS Analysis Pipeline will filter your data based on the p-value of the association. This means that the time it takes depends on the number of above-threshold hits, which is usually correlated with the number of individuals in the study.
Your job will appear in a jobs table and indicate Done when it is finished. Click on [Download Results] to get your data.
Your output will be a zipped folder, which you can expand to give you three files:
1. A short HTML report.
2. A summary report in tsv, called output2.tsv.
3. A detailed output file in tsv, called postgap_output.tsv.
The short HTML report consists of three tables: Genes, SNPs and Pathways.
The SNPs table lists either above-threshold variants from the GWAS itself, or variants in LD with GWAS variants, based on 1000 Genomes populations. These are shown linked to genes, which are genes shown by the Genotype-Tissue Expression (GTEx) project to change expression based on the linked variant in a particular tissue, which is also shown in the table. The score combines the p-value of the variant/phenotype association from the GWAS with the p-value of the variant/gene expression association. The table shows the top ten, ranked by this score.
SNP | Gene | Tissue | Posterior |
A variant either from the GWAS, or in LD with a variant from the GWAS, from 1000 Genomes. The ID is a link to the variant in Ensembl. | A gene shown to have its expression affected by the variant, from GTEx. | The tissue in which the variant/gene expression association has been identified. | Score combining the GWAS p-value with the GTEx p-value. |
The Genes table combines data from the SNPs table, linking together all instances of the gene. The cluster shows the location of all the variants which were identified as linked to the gene. The score in this case is the combination of all the SNP scores. The top ten are shown, ranked by this score.
Gene | Cluster | Tissue | Posterior |
A gene whose expression is affected by variants in the GWAS. The ID is a link to the gene in Ensembl. | The location of the variants that affect the expression of this gene. | The tissue in which the variant/gene expression association has been identified. | Score combining the SNP scores of all SNPs in the cluster. |
The Pathways table links together the associated genes by pathways from Reactome. The scores combine the gene scores for all the identified genes in that pathway. The top ten are shown, ranked by this score.
stId | Name | Score |
Reactome pathway which involves genes linked to the phenotype. The ID is a link to the pathway in Reactome. | The name of the Reactome pathway. | Score combining the gene scores. |
The postgap report in tsv contains information about all the SNPs, grouped into clusters. Every cluster is shown.
Gene ID | Cluster description | SNP ID | SNP posterior probability | Tissue | Cluster posterior probability |
A gene whose expression is affected by variants in the GWAS. The ID is a link to the gene in Ensembl. | The location of the variants that affect the expression of this gene. | A variant either from the GWAS, or in LD with a variant from the GWAS, from 1000 Genomes, which is part of the cluster. The ID is a link to the variant in Ensembl. | Score combining the GWAS p-value with the GTEx p-value. | The tissue in which the variant/gene expression association has been identified. | Score combining the SNP scores of all SNPs in the cluster. |
The TSV contains the full output of the Post GWAS Analysis Pipeline. Its full format is described in the Post GWAS Analysis Pipeline wiki.
"} 106005 2019-08-16 13:24:43 120522 2021-09-10 10:58:52 live \N \N 419 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation, phenotype, disease, mutation {"content": "A top panel shows information specific to the variant. Read more about the top panel in this help page.
\\r\\nBelow this are the phenotype and gene tables.
\\r\\nThe phenotype table shows diseases and phenotypes that have been associated with the structural variant, and the number of individuals with the variant that present the phenotype.
\\r\\nThe gene table lists genes which overlap the structural variation, and also have associated phenotypes such as diseases. Click on the gene ID (ENSG#) to see the phenotype associated with the gene.
"} 106005 2012-12-18 11:12:53 106005 2019-09-19 13:26:02 live 0 0 420 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation, gene, transcript, protein, regulation, regulatory feature, reg-feat, regfeat, reg feat, motif {"content": "A top panel shows information specific to the variant. Read more about the top panel in this help page.
Genes and regulatory features which overlap the structural variation are listed.
Two tables are shown.
The Gene and Transcript consequences table shows the position and effect of the variation on specific genes and transcripts.
Columns in the table are as follows:
The Regulatory consequences table shows Ensembl regulatory features and motifs at the variant position that may be involved in gene regulation from the Regulatory Build.
Columns in the table are as follows:
Note: Use the Show/hide columns button to turn unwanted columns off (or on). Export the table using the 'CSV' button at the top right of each table. The Filter box provides a search.
"} 106005 2012-12-19 09:39:03 120522 2019-11-12 14:45:03 live 0 0 421 view variation, CNV, SV, structural variant, copy number variant, structural, variant, DGVa, dbVar, deletion, duplication, translocation, evidence, data, experimental data, experimental evidence, supporting evidence {"content": "A top panel shows information specific to the variant. Read more about the top panel in this help page.
\\r\\nThe table shows the sources of data supporting this variant.
\\r\\nStructural variants are given an esv###### or nsv###### ID. These are regions of the genome where structural variation has been demonstrated.
\\r\\nSupporting structural variants are given essv###### or nssv###### IDs. They are studies where structural variation has been identified. One structural variant can have multiple supporting structural variants associated with it, if the same region has been identified in many samples. The table shows all the supporting structural variants that have been associated with the selected structural variants.
\\r\\nIDs are prefixed with e or n (ie essv###### or nssv######), which indicate the source of the data. IDs prefixed with e mean that the study is curated by DGVa, whilst n indicates studies curated by dbVar.
\\r\\nDGVa (the Database of Genomic Variation Archive) is a database of variant data based at the EBI. DGVa shares data with dbVar (database of Variation) at NCBI, so this data will be included in the structural variants Ensembl imports from DGVa. Learn about data in DGVa in the DGVa Quicktour.
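The ID scheme described above can be sketched as a small parser. This is a minimal illustration only: the pattern is inferred from the formats quoted in this help text (esv, nsv, essv, nssv followed by digits), and the function and dictionary names are hypothetical, not part of any Ensembl API.

```python
import re

# Pattern assumed from the help text: 'e' or 'n' prefix, then 'ssv'
# (supporting structural variant) or 'sv' (structural variant), then digits.
SV_ID = re.compile(r'^(?P<source>[en])(?P<kind>ssv|sv)(?P<number>[0-9]+)$')

SOURCES = {'e': 'DGVa', 'n': 'dbVar'}
KINDS = {'sv': 'structural variant', 'ssv': 'supporting structural variant'}


def classify_sv_id(identifier):
    '''Return (curating archive, record type), or None if the ID does not match.'''
    match = SV_ID.match(identifier)
    if match is None:
        return None
    return SOURCES[match.group('source')], KINDS[match.group('kind')]
```

For example, classify_sv_id('nssv987') reports a supporting structural variant curated by dbVar, while an unrelated ID such as 'rs123' returns None.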
"} 106005 2012-12-19 09:40:09 106005 2019-09-19 13:26:02 live 0 0 423 movie tutorial, video, comparative genomics, species, homology, homologue, orthologue, orthology, paralogue, paralogy, gene tree, protein tree, whole genome alignment, synteny, family, compara, conserved region, conservation {"list_position": 5, "youtube_id": "bTBLg0bIi98", "title": "Comparing genes and species in Ensembl", "youku_id": "XNTA5MDExNjEy", "length": "20.48"} 106005 2013-01-30 12:42:08 5655 2019-09-19 13:26:02 live 0 0 428 view gene, transcript, splice, splicing, alternative splicing, exon, intron, sequence, compare, spliced, alternatively spliced {"content": "This view shows the sequences of spliced transcripts for a gene, including EST transcripts and ncRNAs (non-coding RNAs).
The gene sequence is shown above in brown, labelled with the gene name. Below, the transcript sequences are colour-coded to indicate the spliced sequences. Exon sequences are coloured according to their coding status: coding sequences are coloured blue, non-coding sequences in black and UTRs are coloured orange. Introns are shown in grey.
[[IMAGE::transcript_comparison_view.png width="705" height="458"]]
You can choose which transcripts you want to see by clicking on Select transcripts, which will open a menu listing the transcripts available for the gene, including their biotypes. Selected transcripts will appear in the left-hand column. Click on transcripts in the right-hand column to select them.
[[IMAGE::transcript_comparison_menu.png width="841" height="282"]]
You can alter the appearance of this page by clicking on Configure this page. For example, you can choose to show the intron sequence or only dashes for the introns, and you can display variants on the sequence, allowing you to compare the effects of different sequence variants on different transcripts.
"} 106005 2013-03-06 14:50:17 254453 2023-09-07 15:32:11 live 0 0 429 faq BLAST, BLAT, sequence, search {"category": "z_data", "question": "What are the differences between BLAST and BLAT?
", "answer": "BLAST (Basic Local Alignment Search Tool) finds regions of local similarity among nucleotide or amino acid sequences. It compares a query sequence (DNA or protein) to a large set of sequences (the target) and calculates the statistical significance of matches. From Ensembl release 71 onwards, Ensembl uses the NCBI Blast implementation for its search options. Use Ensembl BLAST or read our tool help. More information can be found at NCBI.
BLAT (BLAST-like Alignment Tool) is a sequence alignment tool similar to BLAST but structured differently. BLAT quickly finds similarity in DNA and protein but it needs an exact or nearly-exact match to find a hit. BLAT is therefore not as flexible as BLAST. Since BLAST can find much more remote matches than BLAT, it is the recommended tool when searching more distantly related sequences.
Due to its faster speed, BLAT is the default similarity search program in the Ensembl page for both nucleotide query and nucleotide target sequences. For more information about BLAT, click here. BLAT is free for non-commercial usage. For commercial licensing information, please contact Kent Informatics directly.
Patches and haplotypes
If you are looking for sequence that may be on a patch or haplotype, use BLAST, not BLAT. BLAT indices are built using the primary_assembly.fa file, whereas BLAST indices are built from the toplevel.fa files and therefore will provide hits to both the primary assembly and the patches. You can use a patch/haplotype sequence as a query sequence to find where it maps on the genome, but you need to choose BLAST instead of BLAT to allow for mismatches.
", "division": []} 104467 2013-03-06 17:26:31 120522 2021-10-18 09:39:09 live 0 0 430 view RNASeq, RNA-Seq, expression, tissue, gene expression, gene, active, transcription {"content": "\\r\\nThis page shows RNASeq gene models and intron spanning reads for different tissue types based on RNASeq evidence. Click the View in location link to view them along the genome, enabling visual comparison to Ensembl genes, transcripts and other annotation in the Region in detail view.
\\r\\nThe source of the RNASeq data is shown in the RNASeq gene models column. A Y in the remaining columns indicates data is available for the tissue type.
\\r\\nHow the gene models are made:
\\r\\n[[MOVIE::393]]
\\r\\n"} 5655 2013-03-18 14:13:51 106005 2019-09-19 13:26:02 live 0 0 431 faq \N {"question": "
What is an Ensembl release?
", "answer": "Ensembl produces a new version (release) of the website and underlying databases every 2-3 months. This allows us to make new data and analyses available after rigorous quality checking. A new release may include new and/or updated data, such as new species, new genome assemblies, updated gene sets, new variation data, construction of new gene trees, alignments and homologies and annotation of regulatory features. There are also improvements and additions to our web-interface and database structure, including the Perl APIs. The production team at Ensembl coordinates the release cycle, which you can find more detail about, including quality checks, in this article.
\\r\\nYou can keep track of our release cycle, including announcements about what's coming up, when a new release is scheduled or live, and our plans for the future on our blog or by following us on Facebook or Twitter.
\\r\\nThe release number and a 'permanent link' for any view is shown at the bottom left of most pages in Ensembl. For example, this FAQ was published in April 2013 which corresponds with Ensembl release 71. We encourage users to note this down whenever extracting data from Ensembl, and include this in any publication which uses Ensembl, along with the 'permanent link', if you refer to a web view. One benefit of our release cycle is that you can keep track of when you got data from Ensembl, and go back to old versions, via our archive sites.
", "category": "z_data"} 106005 2013-03-21 09:24:58 104467 2019-09-19 13:26:02 live 0 0 432 faq FAQ, release, archive, update, new data {"question": "When is new data (eg new dbSNP build, a genome assembly, or updated ENCODE data) released on Ensembl ?
", "answer": "At Ensembl, maintaining high standards is a priority; we want people to be able to trust the data on our website and underlying databases. For this reason, the process of annotation takes some time, varying according to the type and quality of the data. We carry out stringent quality control both on the raw input data that we receive and on the data that we output, with continual further checks throughout the process in our release cycle. Because of this, our annotation takes longer than if we had fewer checks, and there may be a time delay in the release of data. We believe it is more important to produce high quality data slowly, than lower quality data quickly.
\\r\\n\\r\\nWhen the Ensembl Genebuild team receives a new genome assembly (which must be submitted to the INSDC and have passed their QC process), it takes two to three months to produce a set of gene models using the Ensembl gene annotation system. For all of our 65+ represented species, the Ensembl genebuild uses proteins from UniProt, and in recent genebuilds we have restricted this to those that have direct evidence (PE 1 and 2; http://www.uniprot.org/manual/protein_existence). Where annotated protein-cDNA pairs are available, we use Exonerate's cdna2genome module to produce protein coding models with UTR. For all species, same-species proteins are prioritised as an evidence source for gene annotation. For species where there are few proteins or cDNAs available in the public databases, RNAseq data are useful for gene annotation.
\\r\\nFor our most used species (human and mouse), Havana manual curation and CCDS consensus sequences are merged into the Ensembl gene set, and given Ensembl (ENST or ENSMUST) IDs, resulting in our highest confidence transcripts (Havana/Ensembl merged transcripts are gold in our browser). Havana manual curation is also available in the Ensembl gene set for zebrafish, rat and pig. There is more information on our genebuild process on our genome annotation page, as well as species-specific genebuild information, accessible from some of the species homepages, such as human.
\\r\\nVariation
\\r\\nImports from dbSNP are assessed through rigorous quality checking by the Ensembl Variation team. Suspect data is flagged, and available for users as failed variants. Variants are then classified according to their position on the genome and their consequences on the associated genes (consequence types). For human variants, linkage disequilibrium is calculated by population. This process can take a couple of weeks. Read more on our variation data description page.
\\r\\nEnsembl Regulation and ENCODE
\\r\\nWhen the Ensembl Regulation team receives data (ie from ENCODE or the Roadmap Epigenomics), it is integrated and summarised in the regulatory build. This adds value for each regulatory element by providing an aggregate MultiCell summary, transcription factor binding motifs and more specific classifications across each of the supported cell types. This is presented alongside the underlying data and other complementary datasets, such as ENCODE segmentation, providing an easily accessible interface that summarises the regulatory status across the genome.
\\r\\nComparative Genomics
\\r\\nThe majority of comparative genomics data is created in house, and is updated with every release. This unique data is not dependent on import from other databases, and is subject to internal quality checks. Respected alignment algorithms like Pecan, for whole genome alignments, or TreeBeST, for phylogenetic inference, are used in our pipelines - read more here.
\\r\\nNew Ensembl releases come out every three months and can include updated datasets, such as dbSNP imports and new genebuilds. For more about the release cycle, coordinated by the Ensembl Production Team, read this article. Despite all our efforts, some erroneous data does slip through the net, in part reflecting issues with the underlying data. We are grateful to any users who spot errors and report them to helpdesk@ensembl.org.
", "category": "archives"} 106005 2013-03-21 09:27:53 \N 2019-09-19 13:26:02 live 0 0 433 faq evidence status {"question": "What does the evidence status mean for a variant?
", "answer": "Data supporting the variant. Reported evidence types include:
\\r\\nThe cDNA or protein alignment between two orthologous sequences is shown in CLUSTALW format. For both cDNAs and proteins, this consists of one sequence on top of the other, each labelled with the Ensembl protein IDs, and a track underneath indicating conservation between the sequences.
\\r\\nFor cDNA alignments, the conservation codes are:
\\r\\n* when nucleotides are identical
\\r\\nspace when nucleotides are different
\\r\\nFor protein alignments, the conservation codes are:
\\r\\n* when amino acids are identical
\\r\\n: when amino acids are different but the function is conserved
\\r\\n. when amino acids are different but the function is semi-conserved
\\r\\nspace when amino acids are different and there is no conservation of function
\\r\\nDashes in the sequence (for both nucleotides and amino acids) indicate gaps in the alignment.
\\r\\nIn this view, we also provide the Ensembl stable IDs for the orthologous pair of genes and proteins, alongside the protein length, gene location, the % identity and % coverage.
\\r\\n% identity is the number of identical sites (amino acids or nucleotides) between two sequences in the alignment.
\\r\\n% coverage is the number of sites (amino acids or nucleotides) covered by the alignment (insertions and deletions are not included in the calculation).
\\r\\nFor each pair of orthologs with different protein lengths, there will be two numbers for % identity, and two numbers for % coverage, as both values depend on the protein length.
\\r\\nSee the BRCA2 homologues in human and anole lizard for an example.
\\r\\n[[IMAGE::help439.png width=\\"617\\" height=\\"102\\"]]
\\r\\nWhen the human protein ENSP00000439902 (the longer protein, in pink) is aligned to the lizard protein ENSACAP00000004459 (the shorter fragment, in blue), only 18% of the amino acids in the human protein are identical to the lizard protein, because the human protein is much longer and extends far beyond the lizard protein. In the same alignment, 51% of the amino acids in the lizard protein are identical to the human sequence.
\\r\\nIn the same way, the % coverage of the human protein by the lizard protein is 35%. However, 97% of the shorter, lizard protein is covered by the human protein.
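As a rough illustration of why each measure yields two numbers, the sketch below computes % identity and % coverage relative to each sequence's own length from a gapped pairwise alignment. The function name, the toy sequences and the use of the ungapped sequence length as the denominator are assumptions made for illustration; this is not the Ensembl Compara implementation.

```python
def pairwise_stats(aligned_a, aligned_b):
    '''Percent identity and coverage for two gapped, equal-length alignment rows.'''
    if len(aligned_a) != len(aligned_b):
        raise ValueError('aligned rows must have the same length')
    # Ungapped lengths of each sequence (dashes are alignment gaps).
    len_a = sum(1 for residue in aligned_a if residue != '-')
    len_b = sum(1 for residue in aligned_b if residue != '-')
    if len_a == 0 or len_b == 0:
        raise ValueError('each row must contain at least one residue')
    pairs = list(zip(aligned_a, aligned_b))
    # Columns where both rows hold a residue, and columns where they agree.
    covered = sum(1 for x, y in pairs if x != '-' and y != '-')
    identical = sum(1 for x, y in pairs if x == y and x != '-')
    return {
        'identity_a': 100.0 * identical / len_a,
        'identity_b': 100.0 * identical / len_b,
        'coverage_a': 100.0 * covered / len_a,
        'coverage_b': 100.0 * covered / len_b,
    }


# Toy alignment: a 10-residue protein against a 6-residue fragment.
stats = pairwise_stats('MKTAYIAKQR', '--TAYLAK--')
```

In this toy case the longer sequence has 50% identity and 60% coverage, while the shorter fragment has a higher identity and 100% coverage, mirroring the human and lizard example above.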
\\r\\nI used configure this page to make some changes. How do I save my selections?
", "answer": "In the configuration menu, there is a Save as ... button. [[IMAGE::CropperCapture551].png]] Click it, and you will be asked for a name and description. The track or data choices you made will be saved. You can reload them using the Load configuration button. It's that simple!
\\r\\nOnce you save configuration for more than one page, you might want to create a configuration set. This allows you to save your selections for several views at once. Click on the Manage configuration tab in the configuration menu to do this. [[IMAGE::CropperCapture552].png]]
\\r\\nRemember, if you log in to Ensembl (registration is free) you can access your saved configurations from any computer.
\\r\\nHave a look at our FAQs about adding your own data to Ensembl, and sharing views with collaborators.
", "category": "z_data"} 5655 2013-03-25 15:14:54 106005 2019-09-19 13:26:02 dead 0 0 441 faq \N {"question": "How can I add a data track to Ensembl, or change a view?
", "answer": "Click configure this page [[IMAGE::config_page.png]] from most views or the cog wheel icon from images, and you can choose from a selection of options for the view you are on. For example, if you want to add variants to the gene sequence, click on configure this page from the sequence page. Or, if you want to change what is displayed on our genomic region view (region in detail), configure this page will give you multiple data track options. See this FAQ if you think you would want to share your configuration with a colleague or collaborator.
\\r\\nRead our FAQ about track style options.
\\r\\nDid you want to add your own data?
", "category": "z_data"} 5655 2013-03-25 16:07:54 106005 2019-09-19 13:26:02 dead 0 0 451 view BLAST, BLAT, sequence, search {"content": "BLAST [1] and BLAT [2] are sequence similarity search tools that can be used for both DNA and proteins. BLAT is the default tool in Ensembl due to its faster speed. See other differences between BLAT and BLAST on our FAQ page.
Paste in a sequence (the suggested format is FASTA) or upload a sequence as a file. Up to 30 sequences can be added. If inputting multiple sequences, make sure a header (for example a FASTA header) separates each one.
[[IMAGE::blast_image1.png]]
You can perform multiple similarity searches at once by choosing different genomes and adding them to your list of species.
Click on 'Add/Remove Species' to open the Species Selector box. If you start typing the name of the species you wish to add, the search box will auto-fill with matches. Selecting these species will add them to your BLAST search. Alternatively, you can click on the species divisions (in green) to browse and select (by checking the boxes) any of the available species in Ensembl. The selected species will appear on the right side of the species selector box. To remove a species, click the (-) button on the right of its name. Once you have selected all the species you wish to run through BLAST/BLAT, click apply to return to the query page.
The databases available for similarity searches are DNA and protein target databases.
Repetitive and/or low complexity regions are not masked
Genomic sequences have been run through the RepeatMasker program and repetitive and/or low complexity regions have been masked as Ns
Genomic sequences have been run through the RepeatMasker program and repetitive and/or low complexity regions have been masked as lower case letters
Predictions based on the sequence alone, therefore not supported by experimental evidence
The following options are available:
Pre-Configured Sets
For BLAST searches, you can change the 'Search Sensitivity' from normal to the following:
Near match (to find closer matches- more stringent settings than 'normal')
Short sequences (for short sequences like primers: BLASTN only.)
Distant homologies (to allow lower-scoring pairs to pass through)
Specific parameters for these configurations can be found by expanding the Configuration options. Alternatively, change the configuration options to customise your own BLAST search.
Running the job
You can give a name or description to this BLAST or BLAT search in the Description (optional) field.
Once your parameters are set, click RUN to start the search.
4) Recent BLAST tickets
The table lists jobs that are currently running or recently completed. A ticket ID is assigned to each job and additional information is provided i.e. Analysis, Jobs and Submitted at (date and time). You can customise the table by showing/hiding columns.
The progression of the job gets automatically refreshed every 10 seconds until the job is fully completed.
You can view the results by clicking on the ticket number or on the link View results.
[[IMAGE::blast_image2.png]]
5) Results
Details of the job include job name, species, search type (e.g. BLAT), sequence, query and database types and configuration settings.
Click on the title or (-) to collapse the Job details section.
It can be viewed on the page or downloaded as a file.
[[IMAGE::results_table_blast.png]]
This table lists all hits in order of high to low score (and E-value) but it can be customised to show/hide columns. The results can be sorted by any parameters available in the table.
Hover over the links provided in the results table and click on them to get:
[[IMAGE::results_blast_genomic.png]]
Click on the title or (-) to collapse the Results table.
High-scoring segment pair (HSP) is a local alignment with no gaps that achieves one of the highest alignment scores in a given search. It corresponds to the matching region between the query and the database hit sequence.
The HSP distribution can be visualised on the karyotype (if the karyotype is available for a given species) and the hits are represented as arrows (the best hit is represented in a box).
Click on the arrows for a pop-up window with a summary of BLAST/BLAT hits such as Genomic location (bp), Score, E-value, etc for all target features available. Links to the Alignment (A), Query sequence (S) and Genomic sequence (G) are also available.
[[IMAGE::blast_karyo.png]]
Click on the title or (-) to collapse the karyotype image.
The HSP distribution can be visualised on the query, which is shown as a chain of black and white boxes. Fragments of the query sequence that hit other places in the genome are shown as red boxes (click on those for more information). Usually these fragments are small (they vary between 100-200 nt) and map to various locations. These sequences are of low complexity, such as repetitive sequences.
[[IMAGE::blast_seq.png]]
Click on the title or (-) to collapse the query sequence image.
A) Maximum number of hits to report:
Number of database hits that are displayed. The actual number of alignments may be greater than this. It varies from 10 to 5000 to 100000 and the default is 100.
B) Maximum E-value for reported alignments:
Only hits with an E-value lower than the selected value are reported. It varies from 1e-200 to 1000 and the default value is 1e-1.
This option is available for BLAST searches only, not BLAT. It is the length of the seed that initiates an alignment between the query and the target sequences. It varies from 2 to 15 and the default is 11 (nucleotides) for DNA and 3 (residues) for protein.
1) BLAST. Joseph Bedell, Ian Korf and Mark Yandell [OReilly & Associates, 2003]
2) Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. BLAT is free for non-commercial usage. For commercial licensing information, please contact Kent Informatics directly.
"} 104467 2013-04-03 13:12:26 254453 2023-09-07 15:02:37 live 0 0 599 faq regulation, promoter, sequence, export, download {"category": "regulation", "question": "How do I download the promoter sequence for my gene of interest?
", "answer": "If you’re working with human or mouse, Ensembl has generated predicted promoter regions through analysing datasets from the ENCODE, Roadmap Epigenomics and Blueprint projects. This is called the Regulatory Build.
The Ensembl Regulatory Build does not directly associate annotated genes with regulatory features, so you will need to search for your gene of interest and look for promoters proximal to the 5’ UTR. You should validate this yourself, either experimentally or by cross-referencing with cell-type activity levels and tissue-specific expression data.
These annotated promoters have unique stable IDs. You can navigate to the Regulation tab for your promoter of interest by searching for the stable ID itself or clicking on a promoter when searching in the context of a gene.
The easiest way to retrieve the sequence is by clicking on ‘Location’ within the Regulation tab, then the blue ‘Export data’ button once in the Location tab. This will export the genomic sequence for the region where the promoter exists.
If you’re not working with human or mouse, you’ll need to define the promoter as X number of basepairs upstream of the Transcription Start Site (TSS). Many people use 500bp to define the ‘promoter region’. Whatever length of upstream sequence you use for your definition, you can download the sequence either:
• through the gene tab by clicking on 'Sequence' in the left hand menu and then downloading the sequence using the blue 'Download Sequence' button and specifying the upstream sequence length in the download options window.
• by searching for the genomic coordinates upstream of your gene of interest in the location tab then clicking on the blue 'Export Data' button on the left hand side of the page.
If you have a large number of genes, you can use the REST API to retrieve the promoter sequences programmatically using the POST sequence endpoints for either genomic regions or stable IDs, depending on your input.
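A minimal sketch of the programmatic route, assuming the promoter is defined as a fixed number of basepairs upstream of the TSS as described above. The helper names, the 500bp default and the example coordinates are hypothetical; the request targets the REST API's POST sequence/region/:species endpoint, whose input format and per-request limits should be checked against the current REST documentation before use.

```python
import json
import urllib.request

REST_SERVER = 'https://rest.ensembl.org'


def upstream_region(chromosome, tss, strand, upstream=500):
    '''Build a region string covering the bases upstream of a TSS.'''
    if strand == 1:
        # Forward strand: upstream means smaller coordinates.
        start, end = tss - upstream, tss - 1
    elif strand == -1:
        # Reverse strand: upstream means larger coordinates.
        start, end = tss + 1, tss + upstream
    else:
        raise ValueError('strand must be 1 or -1')
    start = max(start, 1)  # do not run off the start of the sequence
    return '{}:{}..{}:{}'.format(chromosome, start, end, strand)


def fetch_sequences(regions, species='human'):
    '''POST a list of region strings and return the decoded JSON response.'''
    body = json.dumps({'regions': regions}).encode('utf-8')
    request = urllib.request.Request(
        '{}/sequence/region/{}'.format(REST_SERVER, species),
        data=body,  # supplying data makes this a POST request
        headers={'Content-Type': 'application/json',
                 'Accept': 'application/json'},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical TSS positions, one gene per strand.
regions = [upstream_region('1', 1000, 1), upstream_region('1', 1000, -1)]
```

Calling fetch_sequences(regions) would then submit both regions in a single request. Requesting the reverse strand in the region string is intended to return the sequence in the orientation of the gene.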
", "division": ["bacteria", "fungi", "metazoa", "plants", "protists", "vertebrates", "viruses"]} 120522 2022-01-05 11:38:08 \N 2022-01-05 11:38:08 live \N \N 562 view LD, linkage disequilibrium, allele, position, D', r2, d prime, D prime, haploview {"content": "This view displays detailed information on linkage disequilibrium (LD), a measure of the non-random association of alleles at two or more loci that descend from a single and ancestral chromosome.
\\r\\nThe commonly used summaries D' and r2 have been calculated.
\\r\\nD' is based on the difference between the observed and the expected frequency of a given haplotype, normalised by the maximum value that difference can take for the allele frequencies involved. If two loci are independent (i.e. in linkage equilibrium and therefore not coinherited at all), the D' value will be 0.
\\r\\nr2 is the correlation between a pair of loci. It varies from 0 (loci are in complete linkage equilibrium) to 1 (loci are in complete linkage disequilibrium and coinherited). Note that only LDs with r2 values larger than 0.05 are available in Ensembl.
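As a worked illustration of these two summaries, the sketch below computes D' and r2 for a pair of biallelic loci from one haplotype frequency and the two allele frequencies. It uses the standard textbook definitions, which is an assumption beyond what this page states; the function name is hypothetical and this is not the Ensembl Variation pipeline.

```python
def ld_summaries(p_ab, p_a, p_b):
    '''Return (D prime, r squared) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A and allele B;
    p_a and p_b are the frequencies of allele A and allele B.
    '''
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError('both loci must be polymorphic')
    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b
    # Largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


independent = ld_summaries(0.25, 0.5, 0.5)
complete = ld_summaries(0.5, 0.5, 0.5)
partial = ld_summaries(0.25, 0.5, 0.25)
```

The third case shows why both summaries are reported: D' reaches 1 whenever one of the four possible haplotypes is absent, while r2 stays below 1 unless the allele frequencies at the two loci also match.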
\\r\\n\\r\\nThe page shows the Ensembl genes and other genomic features annotated in the region such as SNPs, structural variants, etc. This can be customised by clicking on the 'Configure this page' button or on the cog wheel in the image. We also display the LD plot(s). In the centre of these plots, we give the genomic position of a SNP and show all LD values that are contained in a 20 kb region both upstream and downstream of this SNP.
\\r\\n[[IMAGE::LD_plot.png height=\\"540\\" width=\\"720\\"]]
\\r\\nLD values between any two variants in these plots are graphically displayed using inverted coloured triangles varying from white (low LD) to red (high LD). Hover over and click on the inverted triangle(s) to get the LD value between any two SNPs.
\\r\\nYou can select to view LD data for different populations by clicking on the 'Select populations' button in the left hand side. To export this data as HTML, Text, Excel or as a format for upload into Haploview, simply click on the 'Export data' button in the left hand side.
"} 120522 2016-02-15 12:35:41 \N 2019-09-19 13:26:02 live 0 0 577 view \N {"content": "Ensembl gene trees are generated by the Gene Orthology/Paralogy prediction method pipeline. All homologues in Ensembl are determined from gene trees.
\\r\\nMouse-strain gene trees are constructed using one representative protein for every gene in every mouse strain in Ensembl. The longest translation annotated by the CCDS project is used, if any are available, or the longest protein-coding translation otherwise. (The trees can also be considered as protein trees).
\\r\\nThe display shows the maximum likelihood phylogenetic tree representing the evolutionary history of genes; mouse consensus gene sequence, as well as sequences for individual mouse strains and for rat, are presented. These trees are reconciled with a species tree, generated by TreeBeST. Internal nodes are then annotated for duplication (red boxes) or speciation (blue boxes) events.
\\r\\n[[IMAGE::gainlosstreenode.png width=\\"463\\" height=\\"166\\"]]
\\r\\nRed squares represent duplication nodes, blue squares represent speciation nodes, giving rise to paralogues and orthologues. Another class of node, ambiguous, is shown as a lighter blue square.
\\r\\nThe gene of interest is highlighted in red and within-species paralogues are shown in blue, if the option to view paralogues is selected (below the tree diagram).
\\r\\nTaxonomy IDs refer to the NCBI Taxonomy Browser. The number at the top of pop-up menus (upon clicking on a node) corresponds to the node_id from the protein_tree_node table in the compara database.
\\r\\nMultiple alignment of the peptides (green bars) was made using MUSCLE. Green bars show areas of amino acid alignment, white areas are gaps in the alignment. Dark green bars indicate consensus alignments.
\\r\\nClick on a node to expand a collapsed set of branches into a full tree. The consensus amino acid alignment corresponds to the consensus residues in the collapsed node, and will be expanded when the tree is expanded.
\\r\\nYou can also view a detailed sequence alignment in Wasabi by clicking on a node.
\\r\\nYou can also use the collapse and expand links at the bottom as follows:
\\r\\nView current gene only: Shows the default view of the gene tree, where the selected gene and the node it is within is expanded fully, while all other nodes are collapsed.
\\r\\nView paralogs of current gene: Shows the current gene and all its paralogues with their nodes expanded fully, while all other nodes are collapsed.
\\r\\nView all duplication nodes: Expands all the red duplication nodes and all of the nodes they fall within, while speciation nodes remain collapsed.
\\r\\nView fully expanded tree: Expands all nodes.
\\r\\nConfigure this page to customise the tree. Colouring by clade can be removed.
\\r\\nGene trees can be exported as EMF (Ensembl Multi Format) files from the Ensembl ftp site.
"} 122937 2016-09-21 16:11:49 120522 2019-09-19 13:26:02 live 0 0 493 lookup \N {"expanded": "APPRIS - A system for annotating alternative splice isoforms", "word": "APPRIS", "meaning": "APPRIS is a system to annotate alternatively spliced transcripts based on a range of computational methods.
"} \N \N 2 2019-09-19 13:26:02 live 0 0 452 faq karyotype, cytogenetic banding pattern, chromosome band, cytogenetic band {"question": "Does Ensembl provide the karyotypes and cytogenetic banding patterns for its annotated genomes?
", "answer": "Karyotypes and cytogenetic banding patterns are provided only for the species where this information is known and available to us.
\\r\\nNot all genomes have been assembled into chromosomes and therefore they are provided as scaffolds only.
\\r\\nSee an example of a karyotype view in the Platypus genome.
\\r\\nThe Ensembl species with cytogenetic banding patterns available are fruitfly, human, mouse and rat. See an example of a cytogenetic banding pattern in the rat genome.
\\r\\n", "category": "assemblies"} 104467 2013-04-08 10:52:15 106005 2019-09-19 13:26:02 dead 0 0 503 view \N {"content": "\\r\\n
Some Ensembl genes are found on the primary assembly as well as an alternative assembly section, such as a patch or a haplotype.
\\r\\nThis page lists the equivalent genes on alternate versions of the assembly, including links to the genes.
\\r\\n[[MOVIE::372]]
"} \N \N \N 2019-09-19 13:26:02 live 0 0 453 movie Regulation, promoter, ENCODE, Regulatory Build, Matrix, Tutorial {"list_position": 6, "youtube_id": "P5_KqDztz4Y", "title": "Viewing Ensembl Regulation using the matrix", "youku_id": "XNDEzNzk3MzYw", "length": "3:12"} 5655 2013-04-12 13:36:58 106005 2019-09-19 13:26:02 live 0 0 461 faq \N {"question": "What is the meaning of the red, black, and peach colours in the gene sequence?
", "answer": "The genomic sequence of any gene in Ensembl is shown in the Gene tab, under Sequence in the left hand menu. The default display shows the exons and introns in addition to the flanking sequence. All exons in the region are highlighted in peach colour. Exons that belong to the gene of interest are shown in red letters.
\\r\\n[[IMAGE::gene_sequence.png]]
\\r\\nOther exons, which can belong to neighbouring genes or to genes on the opposite strand of your favourite gene, for example, are shown in black letters.
", "category": "genes"} 104467 2013-06-05 16:25:11 106005 2019-09-19 13:26:02 dead 0 0 460 view Regulation, promoter, ENCODE, Regulatory Build, Matrix, Tutorial {"content": "[[MOVIE::478]]
"} 97803 2013-05-28 13:03:56 \N 2019-09-19 13:26:02 live 0 0 457 movie Custom Data User Upload URL Attachment BAM GFF VEP {"list_position": 5, "youtube_id": "2TMXY-F2zcs", "title": "View Your Data in Ensembl", "youku_id": "XNTQ0NzkyMjAw", "length": "12:02"} 5655 2013-04-18 14:15:29 \N 2019-09-19 13:26:02 live 0 0 472 faq RNASeq models, RNA-seq models, RNA seq models, BAM files, matrix, intron {"question": "How can I view RNASeq data in Ensembl?
\\r\\n", "answer": "
RNASeq data are available for more than 20 species in Ensembl. These data have been processed using our in-house Ensembl RNASeq pipeline. For most species, we include the following data:
\\r\\nThese three types of RNASeq data can be viewed alongside the genome in the Location or Gene tabs by clicking on 'Configure this page' . In the configuration window, under the 'Genes and transcripts' menu on the left hand side, click on 'RNASeq models'.
\\r\\nYou can filter by tissue type, e.g. blood, and in human, you can filter by classes of data, e.g. Human BodyMap 2.0 and Beta cell transcriptome. You can choose to view the BAM files, the Gene models and/or the Intron-spanning reads for specific tissues using the configuration matrix. Watch this video on the Ensembl Regulation data to see how to turn tracks on and off using any configuration matrix. See the images below for an example in sheep.
\\r\\n[[IMAGE::RNAseq_matrix.png height=\\"457\\" width=\\"481\\"]] [[IMAGE::RNAseq_view.png height=\\"400\\" width=\\"466\\"]]
", "category": "genes"} \N \N 125915 2019-09-19 13:26:02 live 0 0 462 faq \N {"question": "Can I export a nice picture of my data in Ensembl?
", "answer": "Once you have custom data in Ensembl (see this FAQ or video), you can customise images and export them.
\\r\\nCustomise the image by adding or removing tracks. In addition, you can change the track style.
\\r\\nThe image below is an example of custom variation data displayed in Ensembl where a single block for each gene in the region of human chromosome 18 (for both forward and reverse stranded genes) is displayed.
\\r\\n[[IMAGE::collapsed_gene.png]]
\\r\\nTo show a single block for each gene rather than individual splice variants, hover over the name of the gene track, click on the cog wheel, and change the setting from 'normal' to 'collapsed'.
\\r\\n[[IMAGE::collapsed.png]]
\\r\\nThe image can be easily exported from Ensembl and used in publications or presentations, using the image icon.
\\r\\n[[IMAGE::export.png]]
", "category": "data"} 104467 2013-06-05 16:48:57 120522 2019-09-19 13:26:02 live 0 0 464 faq \N {"question": "How can I remove or turn off tracks in Ensembl?
", "answer": "Click on the 'Configure this page' button or the 'cog wheel' in Ensembl views to turn tracks off. You can also delete a track by hovering over a track name in the Ensembl image and clicking on the 'x'.
\\r\\n[[IMAGE::config2delete.png width=\\"563\\" height=\\"509\\"]]
\\r\\n[[IMAGE::hover2delete.png width=\\"578\\" height=\\"214\\"]]
\\r\\nView an example of a region in Ensembl where you can remove or turn off tracks.
", "category": "z_data"} 104467 2013-06-07 14:48:17 104467 2019-09-19 13:26:02 dead 0 0 465 faq \N {"question": "I've identified some new genes/transripts/RNASeq reads and would like them to be added to the Ensembl gene set. How do I submit them?
", "answer": "Ensembl genes are annotated from sequences found in various public databases. These are manually curated databases, such as UniProtKB/Swiss-Prot or RefSeq, and other public sequence repositories, such as ENA, GenBank or DDBJ. We do not accept direct submissions of gene models.
\\r\\nIf you have new sequences of genes or transcripts, you can submit them to either ENA, GenBank or DDBJ (data is synchronised between these databases, therefore a sequence submitted to one should appear in the others). Once your data has been accepted into one of these databases it will be included in the sequences aligned to the genome in the Ensembl genebuild.
\\r\\nNote that there may be a delay between submitting your sequences and their appearance in Ensembl, which depends on the time it takes to do an Ensembl genebuild and on the Ensembl release cycle. Furthermore, a sequence that does not align to the genome will not appear in the Ensembl gene set.
\\r\\nRNASeq (RNA-seq) data can be submitted to EBI's ArrayExpress.
", "category": "genes"} \N \N \N 2019-09-19 13:26:02 live \N \N 468 faq biotype, noncoding, coding {"question": "What do the different biotypes in Ensembl mean?
", "answer": "The Ensembl automatic annotation system classifies genes and transcripts into biotypes including: protein coding, pseudogene, processed pseudogene, miRNA, rRNA, scRNA, snoRNA, snRNA.
\\r\\nFor human, mouse, rat and pig, we incorporate manual annotation from Havana. For genes and transcripts that include manual annotation, we display the manually assigned biotype. The full list of Havana biotypes can be found here.
\\r\\nThe biotypes can be grouped into protein coding, pseudogene, long noncoding and short noncoding. Examples of biotypes in each group are as follows:
\\r\\nI would like to export an image of a particular view in Ensembl. Which image export option should I use?
", "answer": "Our image export menu provides several preset options, which are designed for different image use-cases.
\\r\\nThe Journal image option exports a PNG file with the following preset parameters:
\\r\\nFor larger formats we also offer a Poster option, which is larger (5000px) and has a greater resolution than the Journal option. Note that this size should not be needed for normal print purposes, and some of the lines may come out very fine.
\\r\\nThe Presentation image option exports a PDF file preset with the following options:
\\r\\nWeb is a standard image export in the PNG file format. PDF is a standard image export in the PDF file format.
\\r\\nAlternatively you can select a range of image sizes and resolutions in the Custom Image menu. Note that we recommend PNG as an export option in most cases, as it is better supported and will more accurately reflect the image on-screen.
\\r\\nDon't forget to cite us!
\\r\\n", "category": "data"} \N \N 120522 2019-09-19 13:26:02 live 0 0 469 faq biotype, noncoding, coding {"question": "
What do the different biotypes in Ensembl mean?
", "answer": "The Ensembl automatic annotation system classifies genes and transcripts into biotypes including: protein_coding, pseudogene, processed_pseudogene, miRNA, rRNA, scRNA, snoRNA, snRNA.
\\r\\nFor human, mouse and selected other species, we incorporate manual annotation from Havana. For genes and transcripts that include manual annotation, we display the manually assigned biotype. The full list of Havana biotypes can be found here.
\\r\\nThe biotypes can be grouped into protein coding, pseudogene, long noncoding and short noncoding. Examples of biotypes in each group are as follows:
\\r\\nYou can use most of Ensembl’s features without an account. However, the additional benefits of registering are listed below:
User accounts allow you to save data, views in Ensembl and jobs submitted to Ensembl tools
You can save bookmarks and configurations
You can save your own data to Ensembl
You can share all of the above with colleagues
You can set up an account by clicking on Login/Register at the top left of an Ensembl page, then select Register. You will then be prompted to enter your name, email, the organisation where you work or study and your country. You can choose whether you want to be added to Ensembl mailing lists.
[[IMAGE::Login_register.png height="59" width="333"]]
Once you have registered, you can log into your account by clicking on Login/Register. When you are logged in, this will display your email address. Click on it to access your account.
When you run a job through an Ensembl tool, such as the Variant Effect Predictor or BLAST, you can save that job to your account. Just click on the floppy disk icon [[IMAGE::floppy_disk.png height="23" width="22"]] in the jobs table. Your saved jobs will be stored in a list when you go to these tools - make sure you give them a memorable name so they're easy to spot later.
If you've found a page you're interested in, you can bookmark it to your Ensembl account. On the left hand side of a page, you should see a blue button saying Bookmark this page. This will give you options to add the page to your account.
[[IMAGE::bookmark_page.png height="30" width="223"]]
Click on your email address in the top left to be given a list of your bookmarks.
[[IMAGE::bookmark_menu.png height="230" width="258"]]
Many Ensembl pages are configurable to display the datasets you're interested in. When you've configured a page a certain way, you might want to change it, but go back to that view later. An easy way to do this is to save configurations.
In the Configure this page menu there is a blue button saying Save configuration as....
[[IMAGE::save_config.png height="31" width="262"]]
Click on this to go to a dialogue box which allows you to name and describe your configuration. It also allows you to save to either your Account, your Session or to a group.
[[IMAGE::save_menu.png height="454" width="431"]]
To load a saved configuration, click on Load configuration under the configure menu.
[[IMAGE::load_config.png height="30" width="262"]]
This will give you a list of your saved configurations to select from. Click on the down arrow at the left to see more information about the configuration. Click on the tick [[IMAGE::tick.png height="22" width="23"]] at the right to load a configuration.
Once you've saved lots of configurations, you might want to group them together into sets. A set is a group of configurations for different pages in Ensembl, for example you might want to save a set where all pages (e.g. Region in detail, Gene sequence, Exons view) are configured to show variation. You can save these as a set, and when you load the set, all pages you have a configuration for will be configured according to the set.
To make a set, click on Configure this page and go to the Personal Data tab at the top of the menu,
[[IMAGE::config_tabs.png height="22" width="690"]]
then click on Configuration sets at the left. This gives you a list of all the sets you have saved. You can create a new one by clicking on Create a new configuration set.
Select configurations from the list by clicking on the plus sign [[IMAGE::add.png height="22" width="23"]] at the right. Selected configurations will turn green. Configurations for the same page as a selected configuration will be greyed out, as you cannot add multiple configurations of a single page to a set.
[[IMAGE::new_set.png height="682" width="637"]]
Once you have selected your configurations, you can choose to save to either your account or session and give your set a name and a description. You can load a set by going into Configuration sets under Personal data in the Configure this page menu.
If you upload your own data to Ensembl, you can save it to your account. Uploaded data will automatically be added to your session. You can save it to your account by clicking on the blue Custom Tracks button.
[[IMAGE::manage_data.png height="30" width="223"]]
You will get to a list of your uploaded data. Save it by clicking on the floppy disk icon [[IMAGE::floppy_disk.png height="23" width="22"]].
You can log into your user account from any computer. Your session is linked to the computer you are on and is accessible to anyone using that computer. You should save things to your user account if you would like to go back to them at a later date, if you use multiple computers or if you use a shared computer. Only save things to your session if you don't need to go back to them another day.
Configurations, sets and custom data can all be saved to either your user account or your session. If you have something saved to your session and you'd like to save it to your user account, click on the floppy disk icon [[IMAGE::floppy_disk.png height="23" width="22"]].
Items in your user account are stored in a separate location from items in your session. This means that you cannot save configurations from your session into a set in your user account, or vice versa. When you save to your user account, items will be moved out of your session and into your user account. To ensure that your sets and configurations remain intact, Ensembl will also save everything linked to the item you are saving: all sets that your configuration is in, and all configurations in your set. This continues until all linked configurations and sets are saved. If you have many combinations of sets and configurations, this may mean that they are all saved.
Saved data can be shared with other users. You can do this by forming groups. Click on your email address at the top left then select My Account from the drop-down. This will take you to a summary of your account. In the menu on the left there are options to create or join groups. For example, you could create a group for your lab or collaborators.
If you are an administrator for a group, you can share data with the group. You can share custom data, bookmarks, configurations and sets. In each of their menus you will see the Share icon [[IMAGE::share.png height="22" width="22"]]. Click on it to get the option of sharing with each of your different groups or sharing via URL, which you can put in an email.
You will see data that have been shared with your groups when you go into the different menus. They are listed under *Data* from your groups.
"} \N \N 254453 2022-11-16 14:33:42 live 0 0 474 faq \N {"question": "Can I display my own SNP data on the Ensembl browser?
", "answer": "Yes, you can view your SNP data as well as indels and CNVs.
\\r\\nYou can do this through the 'Add Your Data/Manage Your Data' button on the 'Region in detail' page in the Location tab.
\\r\\nAnother option is to upload variants to Ensembl via the Variant Effect Predictor (VEP).
\\r\\nVCF files are variation files fully supported for custom data attachment in Ensembl if they are available at a URL, e.g. http://, https:// and ftp://.
\\r\\nYou will also need an index file with the extension .vcf.gz.tbi in the same directory as your VCF file and with the same name.
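\\r\\nThe naming rule above can be checked before attaching a file. The following is a minimal sketch, not part of Ensembl: the function name expected_index_url and the example URL are hypothetical, and the only rules encoded are the ones stated above (an http, https or ftp URL, a .vcf.gz file, and an index with the same name plus .tbi in the same directory).

```python
from urllib.parse import urlparse

# URL schemes listed in the text above as supported for attachment.
SUPPORTED_SCHEMES = ('http', 'https', 'ftp')


def expected_index_url(vcf_url):
    # Hypothetical helper: return the URL where the tabix index is expected.
    parsed = urlparse(vcf_url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError('unsupported URL scheme: ' + repr(parsed.scheme))
    if not parsed.path.endswith('.vcf.gz'):
        raise ValueError('expected a compressed VCF ending in .vcf.gz')
    # Same directory and same name, with .tbi appended to the file name.
    return parsed._replace(path=parsed.path + '.tbi').geturl()


if __name__ == '__main__':
    print(expected_index_url('https://example.org/data/sample.vcf.gz'))
```

\\r\\nThe sketch only derives the expected index location. It does not check that either file actually exists at that address.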
\\r\\nTo attach a VCF file to any 'Region in detail' view, you can follow these steps:
\\r\\n1) Click on 'Add Your Data/Manage Your Data'
\\r\\n2) Click 'Add Your Data' at the left (in case you are not already there):
\\r\\n[[IMAGE::attach_data.png height=\\"204\\" width=\\"629\\"]]
\\r\\n3) You may want to give a name for your custom data track
\\r\\n4) Choose 'VCF'
\\r\\n5) Provide the file URL (Please note you need both .vcf.gz and .vcf.gz.tbi (index) files in the same directory and under the same name)
\\r\\n6) Click 'Attach'
\\r\\n7) Close the configuration panel
\\r\\nThe variants should be displayed along the genome as shown below:
\\r\\n[[IMAGE::VCF_example.png height=\\"302\\" width=\\"613\\"]]
\\r\\nYou will be able to visualise the different consequence terms your variants cause on the set of Ensembl genes and transcripts. Zoom in to see the individual variants and their consequence terms.
\\r\\nFor other file formats you can use, see the Custom Data FAQ.
\\r\\nMore information about uploading and attaching your own data can be found in our video.
", "category": "variation"} \N \N 106005 2019-09-19 13:26:02 dead 0 0 475 faq \N {"question": "\\r\\n
Can I display my own SNP data on the Ensembl browser?
", "answer": "Yes, you can view your SNP data as well as indels and CNVs. You can do this through the Add Your Data/Manage Your Data button on the Region in detail page. Another option is to upload variants to Ensembl via the VEP (Variant Effect Predictor) (http://www.ensembl.org/Help/Faq?id=288).
\\r\\nVCF files are variation files fully supported for custom data attachment in Ensembl and will show the consequence terms the variants cause on the set of Ensembl genes and transcripts. Please note you need both .vcf.gz and .vcf.gz.tbi (index) files in the same directory.
\\r\\nTo attach a VCF file to Region in detail (link):
\\r\\nStep 1: Click on Add Your Data/Manage Your Data
\\r\\nStep 2: Click Add Your Data at the left (if you are not already there):
\\r\\n(Picture1-attached)
\\r\\nStep 3: Enter a name
\\r\\nStep 4: Choose VCF
\\r\\nStep 5: Enter in the URL (Please note you need both .vcf.gz and .vcf.gz.tbi (index) files in the same directory.)
\\r\\nStep 6: Click Attach
\\r\\nStep 7: Close the control panel
\\r\\nThe variants should be displayed along the genome as in this figure (Denise's image).
\\r\\n\\r\\n
You may need to zoom in to see the individual variants and their consequence terms.
\\r\\nFor other file formats you can use, see the Custom Data FAQ.
\\r\\nMore information about uploading and attaching your own data can be found in our video.
", "category": "variation"} \N \N \N 2019-09-19 13:26:02 dead 0 0 476 faq BioMart, export, attribute {"question": "Ensembl BioMart shows results for protein-coding genes when protein-associated attributes are chosen. Non-coding genes that pass filters will not be shown in the results if certain protein-associated attributes are chosen. Why does this occur?
", "answer": "In the Ensembl BioMart, the main dataset is made up of three main tables, each with a number of associated dimension (dm) tables.
\\r\\nThe first main table is built from the gene table in the Ensembl core schema. With it goes all the information directly associated with the gene table, such as cross-references assigned directly to genes (as opposed to transcripts or translations).
\\r\\nThe second main table, the transcript main table, inherits its fields and data from the gene main table and adds the transcript-related data, cross-references specifically on transcripts, etc. More specifically, the transcript main table contains the data from the gene table for all genes that have transcripts, i.e. all genes.
\\r\\nDepending on what filters and attributes you select, the gene or the transcript main table will be used. Selecting HGNC symbols and transcript stable IDs, for example, will use the transcript main table as the transcript stable IDs are not available in the gene main table.
\\r\\nThe third main table inherits its structure from the second main table, which means that it contains all the fields from the gene main table and the transcript main table, and then it adds the specific fields for the translations (e.g. cross-references specifically on translations). It contains all the data from the transcript main table, but only for the transcripts that have translations, i.e. *not* all transcripts.
\\r\\nWhen the SwissProt-KB ID attribute (or any other external reference mapped to translations, e.g. EMBL ID or HPA ID in human) is applied, the main table involved is the translation main table since the attribute is a cross-reference associated with translations. The translation main table only contains data for the transcripts (and genes) that have translations. A consequence of this is that non-coding genes that pass filters will not be shown in the results if certain protein-associated attributes are chosen. We believe that it may be possible to change this behaviour and have requested such a change from the BioMart developers.
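\\r\\nThe effect of the three main tables can be illustrated with a toy model. This is a sketch under stated assumptions and not the BioMart implementation: all record names, field names and the functions build_main_tables and query are hypothetical, and the rule for choosing a table (the first main table that carries every requested attribute) is a simplification of the behaviour described above.

```python
# Hypothetical toy records; real BioMart tables hold many more columns.
GENES = [
    {'gene_id': 'gene_coding', 'biotype': 'protein_coding'},
    {'gene_id': 'gene_noncoding', 'biotype': 'lncRNA'},
]
TRANSCRIPTS = [
    {'transcript_id': 'tr_coding', 'gene_id': 'gene_coding'},
    {'transcript_id': 'tr_noncoding', 'gene_id': 'gene_noncoding'},
]
TRANSLATIONS = [
    {'translation_id': 'pep_coding', 'transcript_id': 'tr_coding',
     'protein_xref': 'xref_1'},
]


def build_main_tables(genes, transcripts, translations):
    # Each main table inherits the columns of the previous one.
    gene_by_id = {g['gene_id']: g for g in genes}
    gene_main = [dict(g) for g in genes]
    transcript_main = [{**gene_by_id[t['gene_id']], **t} for t in transcripts]
    transcript_by_id = {row['transcript_id']: row for row in transcript_main}
    # Only transcripts with a translation reach the third table.
    translation_main = [{**transcript_by_id[p['transcript_id']], **p}
                        for p in translations]
    return gene_main, transcript_main, translation_main


def query(attributes, tables):
    # Use the first main table that carries every requested attribute.
    for table in tables:
        if table and all(attr in table[0] for attr in attributes):
            return [tuple(row[attr] for attr in attributes) for row in table]
    raise KeyError('unknown attribute requested')


if __name__ == '__main__':
    tables = build_main_tables(GENES, TRANSCRIPTS, TRANSLATIONS)
    print(query(['gene_id', 'biotype'], tables))
    print(query(['gene_id', 'protein_xref'], tables))
```

\\r\\nIn this toy model the first query returns both genes, while the second returns only the protein-coding gene, because the non-coding gene has no row in the translation main table.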
", "category": "data"} \N \N 122937 2019-09-19 13:26:02 live 0 0 477 view ncRNA, secondary structure, structure {"content": "This view shows the secondary structure of a selected ncRNA gene. Base pairing, stem loops and non-paired bases are shown in 2D plots. Depending on the gene, one or two identical structures may be displayed. One is available for every gene, and shows the actual sequence of the ncRNA gene without any annotations. The other structure shows the consensus and sequence conservation of the gene-family to which this ncRNA belongs, and is only available for the genes that are included in the Comparative Genomics pipeline.
\\r\\nThe consensus view is coloured and highlighted to indicate the following:
\\r\\n[[IMAGE::secondarystructure.png]]
\\r\\n\\r\\n
The structural information comes primarily from the covariance models provided by RFAM[1] and used for each ncRNA gene family in Ensembl. These covariance models are used for aligning all the Ensembl genes that belong to the same gene family[2] using the Infernal software[3] and a new covariance model is created with the actual sequences of the alignment. The secondary structure is then annotated using the conservation found in the alignment. See the documentation of the ncRNA trees[2] for more information about how the gene-family is constructed and aligned.
\\r\\nIf the genes could not be found in the RFAM database, we use RNAFold[7] to predict their structure.
\\r\\nThe 2D plots are generated using the r2r[4, 5] package. For more information on r2r, see also below.
\\r\\nThese plots show sequence conservation and base pair covariation. To establish the extent of conservation, the sequences were weighted following the GSC algorithm[6] implemented in Infernal and used by r2r. Weighted nucleotide frequencies were calculated at each position in the multiple alignment. To classify base pairs as covarying, the weighted frequency of Watson-Crick or G-U pairs was calculated. Covariation was called if two sequences had pairs that differ at both positions. If only one position differed, the occurrence was classified as a compatible mutation. See [6] for more information on this process.
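\\r\\nThe classification rule described above can be sketched in a few lines. This is a simplified, unweighted illustration and not the r2r code: it ignores the GSC sequence weights and frequency thresholds, and the function name classify_pair_column and its labels are hypothetical.

```python
from itertools import combinations

# Watson-Crick and G-U pairs, written as (left base, right base).
CANONICAL = {('A', 'U'), ('U', 'A'), ('G', 'C'),
             ('C', 'G'), ('G', 'U'), ('U', 'G')}


def classify_pair_column(pairs):
    # pairs: one (left, right) tuple per sequence for a single paired position.
    valid = sorted(set(p for p in pairs if p in CANONICAL))
    label = 'no variation'
    for (left1, right1), (left2, right2) in combinations(valid, 2):
        differences = (left1 != left2) + (right1 != right2)
        if differences == 2:
            # Two sequences differ at both positions of the pair.
            return 'covarying'
        if differences == 1:
            # Only one position differs between two valid pairs.
            label = 'compatible mutation'
    return label


if __name__ == '__main__':
    print(classify_pair_column([('G', 'C'), ('A', 'U')]))
    print(classify_pair_column([('G', 'C'), ('G', 'U')]))
```

\\r\\nAs the authors' warning for R2R makes clear, markings of this kind are a drawing aid and do not measure the confidence of covariation.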
\\r\\nAUTHORS' WARNING: R2R is not intended to evaluate evidence for covariation or RNA structure where this is in question. It is not appropriate to use R2R's covariation markings to declare that there is evidence of structural conservation within an alignment. R2R is a drawing program. As the original paper, Weinberg and Breaker, 2011, wrote: \\"This automated R2R annotation[of covariation]does not reflect the extent or confidence of covariation. While such information can be useful, we believe that thorough evaluation of covariation evidence ultimately requires analysis of the full sequence alignment. For example, misleading covariation can result from an incorrect alignment of sequences, or from alignments of sequences that do not function as structured RNAs. Unfortunately, there is no accepted method to assign confidence that entirely eliminates the need to analyze the full alignment.\\"
\\r\\n\\r\\n\\r\\n
[2] http://www.ensembl.org/info/genome/compara/ncRNA_methods.html
\\r\\n[3] http://infernal.janelia.org/
\\r\\n[4] http://breaker.research.yale.edu/R2R/
\\r\\n[5] Z. Weinberg and R.R. Breaker. R2R: software to speed the depiction of aesthetic consensus RNA secondary structures. BMC Bioinformatics, 12:3, 2011.
\\r\\n[6] M. Gerstein, E. L. L. Sonnhammer, and C. Chothia. Volume changes in protein evolution. Journal of Molecular Biology, 236(4):1067–78, 1994.
\\r\\n[7] http://www.tbi.univie.ac.at/RNA/
"} \N \N 122937 2019-09-19 13:26:02 live 0 0 478 movie RNA-Seq, RNASeq, Matrix, ENCODE, Regulation, Configure this page {"list_position": "", "youtube_id": "7UAKF8mj5Fo", "title": "Using the Matrix to View RNASeq Models, ENCODE Data and More", "youku_id": "XNjQwNjYyMjQ4", "length": "9:36"} \N \N \N 2019-09-19 13:26:02 live 0 0 485 movie bioinformatics, VEP, Variant Effect Predictor, Variation, SNP, CNV, tool {"list_position": "", "youtube_id": "rSIG_OVzyLU", "title": "Analyse your Sequence Variants with the VEP (Web Interface)", "youku_id": "", "length": "14:40"} \N \N \N 2019-09-19 13:26:02 live 0 0 481 movie bioinformatics, genomics, genome, Ensembl, APIs, perl {"list_position": "", "youtube_id": "nxTFcKi1nDw", "title": "How to Install the APIs (GIT and ftp)", "youku_id": "XMTUwNDEzMjkz", "length": "9:48"} \N \N 106005 2019-09-19 13:26:02 live 0 0 482 movie Ensembl, browsing, basic, overview, genome sequencing, assembly {"list_position": "", "youtube_id": "42qZyXSH0Cc", "title": "Introduction to genome browsers using Ensembl", "youku_id": "XMTUwNDEyMjA3", "length": "6:36"} \N \N 106005 2019-09-19 13:26:02 live 0 0 483 view VEP, Variant Effect Predictor, Sequence variants, User data, Custom data, Tool, SNP, CNV {"content": "[[HTML::/info/docs/tools/vep/online/results.html]]
\\r\\n[[HTML::/info/docs/tools/vep/online/input.html]]
\\r\\nRead more about the VEP in our documentation.
\\r\\n[[MOVIE::485]]
"} \N \N \N 2019-09-19 13:26:02 live 0 0 486 view VEP, custom data, user data, format {"content": "[[HTML::/info/docs/tools/vep/vep_formats.html]]
\\r\\n"} \N \N \N 2019-09-19 13:26:02 live 0 0 488 faq ID, stable, gene, transcript, ENSG, ENST, ENSP, protein, name, identifier {"question": "
I have an Ensembl ID, what can I tell about it from the ID?
", "answer": "An Ensembl stable ID consists of five parts: ENS(species)(object type)(identifier).(version).
\\r\\nUsing this information we can make assertions about an Ensembl ID. For example, from ENSMUSG00000017167.6 we can see that it is an Ensembl ID (ENS), that it is from mouse (MUS), that it is a gene (G) with the identifier 00000017167, and that it is on its sixth version (.6).
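\\r\\nThe same breakdown can be done programmatically. The following is a minimal sketch, not an Ensembl API: the names parse_stable_id and StableId are hypothetical, the pattern assumes an 11-digit identifier, an empty species prefix for human, and only the gene (G), transcript (T), protein (P) and exon (E) type codes, so other object types will be rejected.

```python
import re
from typing import NamedTuple, Optional

# ENS, optional species prefix, type code, 11 digits, optional version.
STABLE_ID = re.compile('^ENS([A-Z]*?)(G|T|P|E)([0-9]{11})(?:[.]([0-9]+))?$')

# Assumed subset of object type codes.
OBJECT_TYPES = {'G': 'gene', 'T': 'transcript', 'P': 'protein', 'E': 'exon'}


class StableId(NamedTuple):
    species_prefix: str
    object_type: str
    identifier: str
    version: Optional[int]


def parse_stable_id(stable_id):
    # Hypothetical helper: split a stable ID into its parts.
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError('not a recognised Ensembl stable ID: ' + stable_id)
    species, type_code, digits, version = match.groups()
    return StableId(species, OBJECT_TYPES[type_code], digits,
                    int(version) if version else None)


if __name__ == '__main__':
    print(parse_stable_id('ENSMUSG00000017167.6'))
```

\\r\\nFor ENSMUSG00000017167.6 this yields the species prefix MUS, the object type gene, the identifier 00000017167 and the version 6. An ID without a version suffix is accepted and reported with no version.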
", "category": "z_data"} \N \N 120522 2019-09-19 13:26:02 live 0 0 489 view \N {"content": "Ensembl gene trees are generated by the Gene Orthology/Paralogy prediction method pipeline. All homologues in Ensembl are determined from gene trees.
\\r\\nGene trees are constructed using one representative protein for every gene in every species in Ensembl. The longest translation annotated by the CCDS project is used, if any are available, or the longest protein-coding translation otherwise. (The trees can also be considered as protein trees).
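\\r\\nThe selection rule for the representative protein can be written as a short sketch. This is an illustration only and not the Compara pipeline code: the function name pick_representative and the record fields id, length and in_ccds are hypothetical.

```python
def pick_representative(translations):
    # translations: list of dicts with hypothetical keys
    # 'id', 'length' (amino acids) and 'in_ccds' (bool).
    if not translations:
        raise ValueError('gene has no protein-coding translations')
    ccds = [t for t in translations if t['in_ccds']]
    # Prefer CCDS translations when any exist, otherwise use all of them.
    candidates = ccds if ccds else translations
    return max(candidates, key=lambda t: t['length'])


if __name__ == '__main__':
    example = [
        {'id': 'pep_a', 'length': 900, 'in_ccds': False},
        {'id': 'pep_b', 'length': 750, 'in_ccds': True},
        {'id': 'pep_c', 'length': 600, 'in_ccds': True},
    ]
    print(pick_representative(example)['id'])
```

\\r\\nIn the example, pep_b is chosen even though pep_a is longer, because a CCDS translation takes priority over length.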
\\r\\nThe display shows the maximum likelihood phylogenetic tree representing the evolutionary history of genes. These trees are reconciled with a species tree, generated by TreeBeST. Internal nodes are then annotated for duplication (red boxes) or speciation (blue boxes) events.
\\r\\n[[IMAGE::genetree.png]]
\\r\\nRed squares represent duplication nodes and blue squares represent speciation nodes, which give rise to paralogues and orthologues respectively. Another class of node, ambiguous, is shown as a lighter blue square. Click on the nodes for more information about the ancestral event.
\\r\\nThe gene of interest is highlighted in red and within-species paralogues are shown in blue, if the option to view paralogues is selected (below the tree diagram).
\\r\\nTaxonomy IDs refer to the NCBI Taxonomy Browser. The number at the top of pop-up menus (upon clicking on a node) corresponds to the node_id from the protein_tree_node table in the compara database.
\\r\\nMultiple alignment of the peptides (green bars) was made using MUSCLE. Green bars show areas of amino acid alignment; white areas are gaps in the alignment. Dark green bars indicate consensus alignments.
\\r\\nThe consensus amino acid alignment corresponds to the consensus residues in the collapsed node, and will be expanded when the tree is expanded.
\\r\\nYou can also open up a node in Jalview by clicking on the node.
\\r\\nConfigure this page to customise the tree. Colouring by clade can be removed.
\\r\\nClick on the download button in the blue header of the tree to download the whole tree in a variety of formats, such as CLUSTAL, Newick and OrthoXML. Click on a node to have the option to download just that node in one of these formats.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 490 view \N {"content": "Protein Families are groups of proteins with high sequence similarity. They result from classifying all Ensembl proteins and all the metazoan proteins from UniProtKB (SwissProt and TrEMBL) against the TreeFam HMM library. Clusters are then aligned with Mafft.
\\r\\nEnsembl Families are assigned stable IDs, so you can track the family across releases. The family ID is stable, and should not change in further releases. Please note that the ID mapping exclusively relates to the content, and not to the actual conservation of the alignment.
"} \N \N 120463 2019-09-19 13:26:02 live 0 0 498 lookup \N {"expanded": "", "word": "TSL:NA", "meaning": "Transcript Support Level Not Analysed. Pseudogenes, single exon transcripts, HLA, T-cell receptor and Ig transcripts are not analysed and therefore not given any of the TSL categories.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 531 view GXA, Gene Expression Atlas, expression, baseline, gene expression, transcription, RNASeq, RNA-Seq, RNA seq, atlas {"content": "This page shows the baseline expression from EBI Gene Expression Atlas (GXA). The data and interface come in directly from GXA and are not interpreted or processed by Ensembl. A description of the data on this page can be found on the Expression Atlas help.
The message stating "No baseline expression in tissues found for ####" means that GXA has no data for this gene.
"} \N \N 120522 2022-01-06 10:44:22 live 0 0 499 lookup \N {"expanded": "", "word": "TSL:5", "meaning": "Transcript Support Level 5, for transcripts that are not supported at all by either an mRNA or an EST.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 501 faq \N {"question": "Can I use Ensembl images/screenshots in my publication?
", "answer": "Yes. All pictures from the Ensembl browser can be exported and are free to use. Export your image in several formats (e.g. pdf, png, SVG and PostScript)
\\r\\n[[IMAGE::image_export_formats.png height=\\"303\\" width=\\"308\\"]]
\\r\\nPlease mention which version of the Ensembl browser you used, including the permanent link of the Ensembl page (found at the bottom left hand corner of our webpages).
\\r\\n[[IMAGE::permanent_link.png height=\\"70\\" width=\\"275\\"]]
\\r\\nand cite our latest publication.
", "category": "z_data"} \N \N 106005 2019-09-19 13:26:02 dead 0 0 522 lookup \N {"expanded": "", "word": "APPRIS: principal1", "meaning": "PRINCIPAL1 - APPRIS candidate principal isoform.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 523 lookup \N {"expanded": "", "word": "APPRIS: principal2", "meaning": "PRINCIPAL2 - APPRIS candidate principal isoform (CCDS).
"} \N \N \N 2019-09-19 13:26:02 live 0 0 526 lookup \N {"expanded": "", "word": "APPRIS: principal5", "meaning": "PRINCIPAL5 - APPRIS candidate principal isoform (longest coding sequence).
"} \N \N \N 2019-09-19 13:26:02 live 0 0 524 lookup \N {"expanded": "", "word": "APPRIS: principal3", "meaning": "PRINCIPAL3 - APPRIS candidate principal isoform (earliest CCDS).
"} \N \N \N 2019-09-19 13:26:02 live 0 0 525 lookup \N {"expanded": "", "word": "APPRIS: principal4", "meaning": "PRINCIPAL4 - APPRIS candidate principal isoform (longest CCDS).
"} \N \N \N 2019-09-19 13:26:02 live 0 0 527 lookup \N {"expanded": "", "word": "APPRIS: alternative1", "meaning": "ALTERNATIVE1 - APPRIS candidate principal isoform that is conserved in at least three tested non-primate species.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 528 lookup \N {"expanded": "", "word": "APPRIS: alternative2", "meaning": "ALTERNATIVE2 - APPRIS candidate principal isoform that appears to be conserved in fewer than three tested non-primate species.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 529 view \N {"content": "This view shows an graphical display and a table.
\\r\\nThe graphical display shows the loci of all the genes which have this domain on the karyotype. Your gene of interest is shown as a red arrow; other genes are shown as blue arrows.
\\r\\nThe table lists all the genes which also have this domain. Click on the links to go to the gene.
"} \N \N \N 2019-09-19 13:26:02 live 0 0 536 lookup \N {"expanded": "", "word": "IMPACT: HIGH", "meaning": "The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay."} \N \N \N 2019-09-19 13:26:02 live 0 0 537 lookup \N {"expanded": "", "word": "IMPACT: MODERATE", "meaning": "A non-disruptive variant that might change protein effectiveness."} \N \N \N 2019-09-19 13:26:02 live 0 0 538 lookup \N {"expanded": "", "word": "IMPACT: LOW", "meaning": "Assumed to be mostly harmless or unlikely to change protein behaviour."} \N \N \N 2019-09-19 13:26:02 live 0 0 539 lookup \N {"expanded": "", "word": "IMPACT: MODIFIER", "meaning": "Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact."} \N \N \N 2019-09-19 13:26:02 live 0 0 540 view gene, transcript, splice variant, diagram, structure, status, biotype, merged {"content": "This page gives an overview of the information available at the gene level.
At the top, the page shows the gene name and Ensembl gene ID, the full description of the gene, its synonyms, its genomic location and strand, INSDC coordinates, and its number of transcripts.
The following sections show the Transcript Table and the Summary with links to external databases.
It shows each splice variant of a gene, i.e. protein-coding and non-coding transcripts. Each transcript is given an Ensembl Transcript ID, which is unique and stable.
It provides additional information and links to external databases:
Ensembl genes, transcripts, and proteins are matched to sequences and information in other biological databases. The matches are referred to as external references, or Xrefs.
\\r\\nXref sources for Ensembl genes include HGNC, UCSC, the Database of Aberrant 3' Splice Sites (DBASS3), and OMIM.
\\r\\nXref sources for Ensembl transcripts (i.e. matches to Ensembl transcript and protein sequences) include UniProtKB, CCDS, EntrezGene, and NCBI RefSeq.
\\r\\nPlease see the General identifiers view in the transcript tab for more IDs associated with a specific Ensembl transcript and/or protein.
"} 106005 2015-08-25 13:04:55 \N 2019-09-19 13:26:02 live 0 0 542 view ortholog, orthologue, homolog, homologue, species, orthologs, orthologues, homologs, homologues, homology, orthology {"content": "Orthologues inferred from gene trees are determined using all species in that particular database, i.e. all the (mostly) chordates in Ensembl, all the fungi in Ensembl Fungi, all the plants in Ensembl plants, all the metazoa in Ensembl Metazoa, all the protists in Ensembl Protists, or all the species in the Pan-Compara set for Pan-Compara orthologues in Ensembl Genomes. A detailed description of the method is provided here.
\\r\\nUnaligned sequences (nucleotide and/or amino acids) of orthologous genes can be exported in FASTA format by clicking on Sequence export. The Compara API and BioMart can also be used to export orthologues.
\\r\\nSpecies are grouped by clades in the top table, such as Primates, Rodents, and Fish. By default, the full list of orthologues is shown below the table. Click on Show details to display only the orthologues for species in one clade.
\\r\\nThe number of species for each orthologue type is shown in the top table. Orthologue types are assigned by comparing two species, and are as follows:
\\r\\nOrthologues are defined in Ensembl as genes for which the most recent common ancestor node is a speciation event. These ancestral speciation events are represented by blue nodes in the gene trees.
\\r\\nPossible orthologues are homologues between species where the common ancestor is a weakly supported duplication event. Although they should be called paralogues according to the Compara rules, the low confidence on the duplication node might suggest an error in the phylogenetic reconstruction. We list these cases here as they might be real orthologues, especially in cases where no better orthologue is found.
\\r\\nThe list of orthologues underneath the top table shows the species, the orthologue type, the dN/dS value (if calculated), the Ensembl gene ID and name, the Target %ID and the Query %ID. If you are searching for a gene in human, for example, and looking for its homologue in another species such as mouse, the Query %ID refers to the percentage of the query sequence (human) that matches to the homologue (the mouse protein). Target %ID refers to the percentage of the target sequence (mouse) that matches to the query sequence (human).
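\\r\\nThe difference between Query %ID and Target %ID can be illustrated with a small calculation. The Python sketch below is only an illustration: it assumes each percentage is the number of identical aligned positions divided by the ungapped length of the sequence in question, and percent_identity is a hypothetical helper, not part of any Ensembl API. The two toy sequences are invented.

```python
def percent_identity(aligned_query, aligned_target):
    '''Return (query_pct_id, target_pct_id) for two gapped, equal-length sequences.'''
    if len(aligned_query) != len(aligned_target):
        raise ValueError('aligned sequences must have the same length')
    # Count alignment columns where both sequences carry the same residue.
    # A gap character never counts as a match.
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_target)
        if q == t and q != '-'
    )
    # Each percentage is relative to that sequence's own ungapped length,
    # which is why the two values can differ for the same alignment.
    query_len = len(aligned_query.replace('-', ''))
    target_len = len(aligned_target.replace('-', ''))
    return 100.0 * identical / query_len, 100.0 * identical / target_len


# Invented toy alignment: the target lacks two residues present in the query.
query_pct, target_pct = percent_identity('MKTAYIAK', 'MKTA--AK')
print(round(query_pct, 1), round(target_pct, 1))
```

In this toy case the shorter target is fully covered by the query, so its percentage is higher than the query's.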
\\r\\nIDs, orthology types, and dN/dS values can also be obtained using the Compara API or with BioMart, accessible via the main desktop site.
"} 106005 2015-08-25 13:07:46 120522 2019-09-19 13:26:02 live 0 0 543 view disease, phenotype, OMIM, variation, COSMIC, mutation {"content": "This page lists diseases and phenotypes directly associated with a gene by the Online Mendelian Inheritance in Man (OMIM) compendium, Orphanet, DDG2P and other sources in human. In other species these phenotypes come from International Mouse Phenotyping Consortium (IMPC; mouse), Europhenome (mouse), ZFIN (zebrafish), Rat Genome Database (RGD; rat), Animal_QTLdb (various) and Online Mendelian Inhertiance in Animals (OMIA; various). You can see which databases are used for phenotypes for each species on our Variation sources page.
"} 106005 2015-08-25 13:08:40 \N 2019-09-19 13:26:02 live 0 0 544 view variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, variant, validation, evidence {"content": "This page offers the following information:
\\r\\nPlease see the Ensembl variation documentation for more information such as source of variants, and consequence types (effect on genes and transcripts).
\\r\\nIUPAC Ambiguity Codes
\\r\\n[[IMAGE::iupac_table.png width=\\"394\\" height=\\"302\\"]]
"} 106005 2015-08-25 13:12:08 \N 2019-09-19 13:26:02 live 0 0 545 view variant, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele, frequency, 1000 genomes, HapMap, minor, major, MAF, ESP {"content": "For the top panel describing variation details such as source and class, see this help page.
\\r\\nPopulations: Who was studied? Populations are described by three-letter codes such as CEU (Utah residents). 1000 Genomes data are separated into five super-populations: AFR, AMR, EAS, EUR, and SAS. See this FAQ for a description of what they mean. ALL stands for all 1000 Genomes data, not separated by population. Most of these data are imported through dbSNP.
\\r\\nPie charts
\\r\\n[[IMAGE::pie_graphs.png]]
\\r\\nPie charts can be displayed for 1000 Genomes allele frequencies. If a pie chart is shown on the view for the 1000 Genomes Project, it represents the distribution of the alleles in a 1000 Genomes population for a specific variation. The super populations are shown on the top row. Click on a plus alongside Sub-populations to open up the pie charts for the sub-populations, which are then shown on the row(s) below.
\\r\\nIn the example above, 73% of the alleles found in the African population studied (AFR) are A (frequency of 0.73), and 27% are C. The sub-populations for the AFR population have been opened up and are displayed on the row below.
\\r\\nFrequency tables
\\r\\nThe populations are grouped by project when possible (e.g. 1000 Genomes, HapMap and ESP for human, Mouse Genomes Project for Mouse).
\\r\\nThe populations studied are shown in the first column. Allele frequencies and counts are followed by the genotype frequencies and counts. The final column, Genotype detail, allows you to jump to the individual genotypes for that population.
\\r\\nThe first row shows a summary of all the individuals in that study and is highlighted in yellow. The populations are then grouped, with the super population (in blue) followed by its sub-populations (white and grey).
\\r\\n[[IMAGE::variation_population.png]]
\\r\\nThis example is taken from the same variant as the pie charts above.
"} 106005 2015-08-25 13:13:48 \N 2019-09-19 13:26:02 live 0 0 546 view variant, consequence, type, variation, polymorphism, sequence variation, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, gene, transcript, SIFT, PolyPhen, protein {"content": "The Gene and Transcript consequences table shows the position and effect of the variation on specific genes and transcripts.
\\r\\nColumns in the table are as follows:
\\r\\nDiseases and traits that are associated with the variant of interest are shown on this page.
\\r\\nThe reported gene comes directly from the specific study. These are genes that were reported in the paper as being associated with this GWAS variant, and may not correspond to the genes reported on the Genes and Regulation page for the variant. The associated allele is also that reported in the paper, and may be the positive or negative strand allele (the alleles shown in Ensembl are always the positive strand alleles).
\\r\\nCheck the Variation - Source Documentation for a full list of sources of variation data currently available in Ensembl.
"} 106005 2015-08-25 13:26:06 \N 2019-09-19 13:26:02 live 0 0 548 view karyotype, chromosome, genome, assembly {"content": "A karyotype, displayed in this page, is available for some species in Ensembl. Images are imported from various sources, depending on the species. Dark and light bands reflect heterochromatin and euchromatin staining.
\\r\\nStatistics are shown below the karyotype, as follows:
\\r\\nFor the remaining statistics, see this help page.
"} 106005 2015-08-25 13:27:36 \N 2019-09-19 13:26:02 live 0 0 549 view \N {"content": "Loci associated with a phenotype, disease or trait are listed on this page. These loci may be genes, variants, structural variants or QTLs. Loci names, types, locations and the source of the annotation are reported. Where the report of an association to a variant also implicates a gene, the gene name is also listed. information from external references, such as publications, are reported where available. The submitter column is specific to data imported from ClinVar, and contains information about the group who produced and submitted the data initially. The supporting evidence column contains PubMed ID(s) that have cited this association, this is also only available for ClinVar data.
\\r\\nThe table can be filtered by loci type or annotation source.
"} 106005 2015-08-27 12:18:27 125915 2019-09-19 13:26:02 live 0 0 550 view \N {"content": "Loci associated with a phenotype are annotated onto the karyotypic view on this page. These loci may be either genes or variants.
\\r\\nArrows on the karyotype indicate loci associated with the phenotype. If the association was determined via a GWAS study and has a p-value indicating the strength of the association, it will be shown in shades of blue, red and purple, according to the colour scale below the karyotype. If the association does not have a p-value, the arrows are shown in black. Black arrows do not necessarily indicate straightforward Mendelian inheritance, just that no p-value is available.
"} 106005 2015-08-27 12:19:40 120522 2019-09-19 13:26:02 live 0 0 565 view \N {"content": "[[IMAGE::Manhattan.png height=\\"540\\" width=\\"693\\"]]
\\r\\nThis page displays a Manhattan plot of the variant of interest and surrounding area, indicating linkage. Linkage is calculated for a single population, which is displayed above the plot.
\\r\\nr2 is displayed by default, but if you wish to view D', you can do so by clicking on Configure this page, then going to Display options. There you can also adjust the display width and the position of the horizontal cutoff (set by default to 0.8).
"} 106005 2016-02-17 09:53:09 106005 2019-09-19 13:26:02 live 0 0 557 view LD, r2, r-squared, linkage, disequilibrium {"content": "\\r\\nThe LD export page allows you to view variants in LD across a locus. The table shows variants along the top and down the side. r2 values between the pairs are shown in the table, with shades of red and pink indicating the strength of the association: r2 of 1 is shown in red, whereas 0 is white, with shading of pink in between.
\\r\\nThe population that LD is calculated for is shown at the top. The population codes for 1000 Genomes are listed below:
\\r\\nPopulation Code | Description | Super Population Code
\\r\\nCHB | Han Chinese in Beijing, China | EAS
\\r\\nJPT | Japanese in Tokyo, Japan | EAS
\\r\\nCHS | Southern Han Chinese | EAS
\\r\\nCDX | Chinese Dai in Xishuangbanna, China | EAS
\\r\\nKHV | Kinh in Ho Chi Minh City, Vietnam | EAS
\\r\\nCEU | Utah Residents (CEPH) with Northern and Western European ancestry | EUR
\\r\\nTSI | Toscani in Italia | EUR
\\r\\nFIN | Finnish in Finland | EUR
\\r\\nGBR | British in England and Scotland | EUR
\\r\\nIBS | Iberian population in Spain | EUR
\\r\\nYRI | Yoruba in Ibadan, Nigeria | AFR
\\r\\nLWK | Luhya in Webuye, Kenya | AFR
\\r\\nMAG | Mandinka in The Gambia | AFR
\\r\\nMSL | Mende in Sierra Leone | AFR
\\r\\nESN | Esan in Nigeria | AFR
\\r\\nASW | Americans of African Ancestry in SW USA | AFR
\\r\\nACB | African Caribbean in Barbados | AFR
\\r\\nMXL | Mexican Ancestry from Los Angeles USA | AMR
\\r\\nPUR | Puerto Rican from Puerto Rico | AMR
\\r\\nCLM | Colombian from Medellin, Colombia | AMR
\\r\\nPEL | Peruvian from Lima, Peru | AMR
\\r\\nGIH | Gujarati Indian from Houston, Texas | SAS
\\r\\nPJL | Punjabi from Lahore, Pakistan | SAS
\\r\\nBEB | Bengali from Bangladesh | SAS
\\r\\nSTU | Sri Lankan Tamil from the UK | SAS
\\r\\nITU | Indian Telugu from the UK | SAS
\\r\\nThese populations have been divided into 5 super populations:
\\r\\nAFR, African
\\r\\nAMR, Admixed American
\\r\\nEAS, East Asian
\\r\\nEUR, European
\\r\\nSAS, South Asian
\\r\\nShort sequence variations are shown by consequence type in the Variation Table.
\\r\\nThis view shows all the variant consequences in a gene. If the same variant falls in several transcripts within the same gene, a new row will be displayed for each transcript. Therefore, this number reflects the number of variant consequence types across the transcripts.
\\r\\nIf you show a table for a variation consequence type, the columns will be as follows:
\\r\\nYou can order the table by the columns by clicking on the up/down arrows by the column titles. Filters in the top grey bar allow you to filter the data by SNP type.
\\r\\nGo to the Variation Image for a graphical view.
\\r\\n[[IMAGE::Variation_table.png height=\\"918\\" width=\\"1500\\"]]
"} 120522 2015-11-23 15:18:05 \N 2019-09-19 13:26:02 live 0 0 559 view assembly, convert, coordinate, file, bed, gff, gtf, wig, vcf {"content": "The Assembly Converter can convert coordinates on one genome assembly to another. Input coordinates in one of the specified formats, and receive a file in the same format with coordinates on the different assembly
\\r\\nThe Assembly Converter can convert the following file types:
\\r\\n\\r\\nThe data input form is shown below:
\\r\\n[[IMAGE::Assembly_convert_input.png height=\\"462\\" width=\\"650\\"]]
\\r\\nChoose your species and the conversion you wish to do from the two drop-downs. The assembly mapping drop-down shows all available mapping events for your chosen species, including conversions from old assemblies to newer ones and conversions back to old from new.
\\r\\nYou can input your data by pasting it into a box, uploading a file or attaching a web file. Choose the format that matches your data type.
\\r\\nTo run the conversion, click on the blue Run button.
\\r\\nThe results are shown in a table as below:
\\r\\n[[IMAGE::Assembly_converter_results.png height=\\"260\\" width=\\"684\\"]]
\\r\\nJobs will show as Queued, Running, Done or Failed. The table refreshes every 10 seconds; however, you can hit the refresh button to force a refresh.
\\r\\nWhen your job is listed as Done, click on the download icon ([[IMAGE::download_icon.png height=\\"24\\" width=\\"23\\"]]) to download the file.
"} 106005 2015-11-23 15:44:15 \N 2019-09-19 13:26:02 live 0 0 560 view ID, convert, missing, change, update, lost, name {"content": "The ID history converter allows you to input a list of Ensembl IDs from a previous Ensembl release, and find what IDs they map to in the current release.
\\r\\nThe ID History converter can only be used to convert Ensembl IDs, which begin with ENS.
\\r\\nChoose your species and the conversion, and optionally give the job a name. You can then specify the set of stable IDs as a newline-separated list in the text area, or you can upload a file or specify a URL containing the IDs.
\\r\\nClick Run to run the job.
\\r\\n[[IMAGE::ID_history_input.png height=\\"414\\" width=\\"595\\"]]
\\r\\nJobs will show in the Jobs table as Queued, Running, Done or Failed. The table refreshes every 10 seconds; however, you can hit the refresh button to force a refresh.
\\r\\nWhen your job is listed as Done, click on the download icon to download the file. You can look at the results directly on the browser if you click on the link beside (View results), which opens a page like the one shown below:
\\r\\n[[IMAGE::ID_history_results.png height=\\"546\\" width=\\"862\\"]]
\\r\\nYour input ID(s) are shown in the first column. All IDs that match the ID are shown in the second column. The third column shows which Ensembl release each version first appeared in, including the version number. Where that archive is available, the IDs are links to that archive.
"} 106005 2015-11-23 15:54:34 106005 2019-09-19 13:26:02 live 0 0 561 view ID, convert, missing, change, update, lost, name {"content": "The ID history converter allows you to input a list of Ensembl IDs from a previous Ensembl release, and find what IDs they map to in the current release.
\\r\\n\\r\\nYour input ID(s) are shown in the first column. All IDs that match the ID are shown in the second column. The third column shows which Ensembl release each version first appeared in, including the version number. Where that archive is available, the IDs are links to that archive.
\\r\\n[[IMAGE::ID_history_results.png height=\\"546\\" width=\\"862\\"]]
"} 106005 2015-11-23 16:04:40 106005 2019-09-19 13:26:02 live 0 0 563 view linkage disequilibrium, LD, linked, variant, mapping, polymorphism, sequence variant, single nucleotide polymorphism, SNP, insertion, deletion, population, mutation, non-synonymous, synonymous, indel, genotype, allele {"content": "The top panel gives details on a given variant such as source, class, clinical significance and others. See 'Explore this variant' for more details.
\\r\\nA table is shown to summarise linkage data available for different populations. Click on any link in the population column to view a description of the population.
\\r\\nColumns can be sorted by clicking on the column header. Click on \\"Configure this page\\" at the left to select populations to be displayed, or to change the distance over which linked variants are shown.
\\r\\nTables showing Linked variants can be selected within the linkage column. These values are calculated by Ensembl and presented in tables for each population. Note the values are calculated for the comparison of the variant of interest with nearby variants. These values are based on population frequencies submitted to dbSNP, for example from the HapMap project.
\\r\\nThe linked variant tables show the distance between the linked variant and the variant on which the view is focused, any overlapping genes and/or phenotypes associated with the linked variant, and the D' and r2 values. D' is based on D, the difference between the observed and the expected frequency of a given haplotype, scaled by the largest value D could take given the allele frequencies. If two loci are independent (i.e. in linkage equilibrium and therefore not coinherited at all), the D' value will be 0. r2 is the squared correlation between a pair of loci. It varies from 0 (loci are in complete linkage equilibrium) to 1 (loci are in complete linkage disequilibrium and coinherited). Note that only LDs with r2 values larger than 0.05 are available in Ensembl.
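\\r\\nAs a worked illustration of the two measures, the Python sketch below applies the standard textbook definitions: D is the observed haplotype frequency minus the product of the allele frequencies, D' is the absolute value of D divided by its maximum attainable value, and r2 is D squared divided by the product of all four allele frequencies. It is a sketch only; ld_stats is a hypothetical helper, the input frequencies are invented, and it is an assumption that Ensembl's own calculation matches these formulas in every detail.

```python
def ld_stats(p_ab, p_a, p_b):
    '''Return (D, D_prime, r2).

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    '''
    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b
    # The largest magnitude D can reach depends on its sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2: squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Independent loci: the haplotype is as frequent as expected by chance.
print(ld_stats(0.25, 0.5, 0.5))
# Complete LD: allele A is always found together with allele B.
print(ld_stats(0.5, 0.5, 0.5))
```

The first call gives 0 for all three values, and the second gives D' and r2 of 1, matching the two extremes described above.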
\\r\\nFor each region in the genome that is in high LD, a variant can be chosen to represent all linked variants in the area. This is called a tag SNP and is calculated by Ensembl per population where there is sufficient genotype information. See Chen, Y. et al. 2010. These associations may be viewed in an LD plot or LD table.
\\r\\nNote: the difference between a linked variant table and an LD table is that the first is determined by comparing the variant of interest to nearby variants. The LD plot and tables are constructed using associations of all tag variants in the regions. LD is calculated from two variants within the region, in addition to a variant in the region against the variant of interest.
"} 120522 2016-02-15 12:39:26 \N 2019-09-19 13:26:02 live 0 0 564 view haplotype, protein, transcript, variation, variant, population, individual, sample, 1000 genomes {"content": "For any given transcript, this page displays actual haplotypes of variants found in that transcript or protein in 1000 Genomes individuals.
\\r\\nThese are shown as a table listing the haplotype as a series of alterations from the reference sequence. Protein haplotypes are shown like 523R>Q, indicating the amino acid position, the reference amino acid and the alternative amino acid. CDS haplotypes are shown like 1568G>A, indicating the base position in the CDS, the reference base and the alternative base. Where a haplotype has multiple variants, these are shown separated by commas.
\\r\\nThe table also shows the overall frequency of that haplotype in the 1000 Genomes group, as well as in the super-populations studied by the 1000 Genomes Project. You can view this table for either protein or CDS haplotypes.
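\\r\\nThe haplotype notation described above (position, reference, then alternative, with multiple changes separated by commas) is simple to split programmatically. The Python sketch below is illustrative only: parse_haplotype is a hypothetical helper, it handles only simple substitutions of the form shown in this help text, and the second change in the last example (1601C>T) is invented.

```python
import re

# One change looks like 523R>Q or 1568G>A: digits, reference, '>', alternative.
# Only simple substitutions are handled; this is an assumption of the sketch.
CHANGE = re.compile('^([0-9]+)([A-Za-z*]+)>([A-Za-z*]+)$')


def parse_haplotype(text):
    '''Split a comma-separated haplotype string into (position, ref, alt) tuples.'''
    changes = []
    for item in text.split(','):
        match = CHANGE.match(item.strip())
        if match is None:
            raise ValueError('unrecognised change: ' + item)
        changes.append((int(match.group(1)), match.group(2), match.group(3)))
    return changes


print(parse_haplotype('523R>Q'))
print(parse_haplotype('1568G>A,1601C>T'))
```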
\\r\\n[[IMAGE::haplotypes_table.png height=\\"649\\" width=\\"1338\\"]]
\\r\\nTo get more details on a haplotype, click on it to jump to a section that opens up at the bottom of the page. If you scroll back up and choose another haplotype, this will jump back down with the new haplotype.
\\r\\nThe details section includes information on Population frequencies, Aligned sequence, Sequence, Corresponding protein or CDS haplotypes (depending on which you are in to start with) and Sample data.
\\r\\nThe Population frequency section lists all the 1000 Genomes sub-populations where the haplotype was observed. If the haplotype is observed in a population, its frequency is shown as a bar graph, a frequency and a count. The populations are sorted by super-population; hover over the population codes to get the full name.
\\r\\n[[IMAGE::haplotype_pop_freq.png height=\\"269\\" width=\\"508\\"]]
\\r\\nThe Aligned sequence section displays an alignment between the reference protein sequence, matches or mismatches with the haplotype protein sequence, the reference CDS sequence and matches or mismatches with the haplotype CDS sequence. Codons in the CDS are shown with yellow highlighting. Amino acid changes are highlighted with a colour indicating the likely effect on the protein function; refer to the legend at the top to see what the colours mean. Click on the variants to get more information, such as dbSNP ID, and go to the variant tab.
\\r\\nSince multiple CDS haplotypes can give one protein haplotype, due to synonymous changes, an alignment on a protein haplotype may have multiple lines for each alternative CDS haplotype. A CDS haplotype will only have one alternative protein haplotype.
\\r\\n[[IMAGE::aligned_haplotype.png height=\\"652\\" width=\\"1003\\"]]
\\r\\nThe Sequence section shows the complete protein or CDS sequence of the haplotype.
\\r\\nThe Corresponding CDS haplotype section is shown for protein haplotypes and lists all the possible CDS haplotypes for that protein haplotype. The Corresponding protein haplotype section is shown for a CDS haplotype and shows the one protein haplotype produced by that CDS haplotype.
\\r\\nThe Sample data table lists all the 1000 Genomes individuals who have that particular haplotype, using their unique identifiers. Their population codes are shown. Copies indicate whether the individual is homozygous (2) or heterozygous (1) for that haplotype.
\\r\\nFor more information on transcript haplotypes, please see this article.
\\r\\nWilliam Spooner, William McLaren, Timothy Slidel, Donna K. Finch, Robin Butler, Jamie Campbell, Laura Eghobamien, David Rider,
\\r\\nChristine Mione Kiefer, Matthew J. Robinson, Colin Hardman, Fiona Cunningham, Tristan Vaughan, Paul Flicek & Catherine Chaillan Huntington.
\\r\\nHaplosaurus computes protein haplotypes for use in precision drug design.
\\r\\nNature Communications volume 9, Article number: 4128 (2018)
\\r\\nhttps://www.nature.com/articles/s41467-018-06542-1
\\r\\n\\r\\n
"} 106005 2016-02-15 17:01:31 125866 2019-09-19 13:26:02 live 0 0 567 faq API, connection, Perl, failure, error, script, connect, valid, {"question": "
I have an error message when I try to connect to the Perl API.
", "answer": "Usually this is caused by using the wrong Ensembl API version to access a server. The API can only find species from the same version as it.
\\r\\nThe master branch of the Ensembl git repository is typically one release ahead of the public servers, and will always fail to find a species by default. The master branch is in development, and is not guaranteed to work. To access an older release, the Registry option DB_VERSION can be set, but it is preferable to use the correct API version to avoid unintended consequences.
\\r\\nYou can also try the systematic name of the species.
\\r\\nTry running perl ensembl/misc-scripts/ping_ensembl.pl and check the output.
\\r\\nEnsembl Genomes provides these species, and releases roughly two weeks later than Ensembl. If you have just updated your API and Ensembl recently announced a release, your software may be too new for the Ensembl Genomes servers. You can wait until they release, or roll back your API version. This is easy if you installed from GitHub.
\\r\\nVERSION=`perl -e 'use Bio::EnsEMBL::ApiVersion qw/software_version/; print software_version'`\\r\\ngit checkout release/`expr ${VERSION} - 1`
\\r\\nIf you installed a downloaded package, then you will need to download an older Ensembl API release.
\\r\\nIn a long-running process, it is possible to hit database server time limits for connections. Typically after 8 hours the server will close the connection, and your Perl code will die.
\\r\\n1. Use the nearest database server to improve efficiency. Ensembl has mirrors in Asia and the USA as well as the main servers hosted in the UK. See the Mirrors page for specifics.
\\r\\n2. For intense database access and high frequency querying, choose a good time to disconnect manually
\\r\\n...\\r\\n# discrete work completed that takes an hour or two\\r\\n$gene_adaptor->dbc->disconnect_if_idle;\\r\\n\\r\\n# API re-opens connection automatically\\r\\n$gene_adaptor->fetch_by_stable_id($stable_id);
\\r\\n3. For scripts which occasionally consult Ensembl while working on a big problem for several hours
\\r\\n...\\r\\n# For all code using Ensembl\\r\\nBio::EnsEMBL::Registry->set_disconnect_when_inactive(1);\\r\\n# For just one adaptor\\r\\n$adaptor->dbc->disconnect_when_inactive(1);\\r\\n# For just one occasion\\r\\n$adaptor->dbc->disconnect_if_idle;\\r\\n# This causes the connection to close whenever it is not being used. This is costly if there are very frequent database requests\\r\\n\\r\\n# In combination with disconnecting after every request, you can hold the connection open for the duration of a code block\\r\\nmy @gene_ids = ('ENSG0000001',...);\\r\\nmy %external_refs;\\r\\n$gene_adaptor->dbc->prevent_disconnect(sub {\\r\\n while (my $id = shift @gene_ids ) {\\r\\n my $gene = $gene_adaptor->fetch_by_stable_id($id);\\r\\n my $xrefs = $gene->get_all_DBEntries;\\r\\n foreach my $xref (@$xrefs) {\\r\\n $external_refs{$id} = $xref->display_id;\\r\\n }\\r\\n }\\r\\n});\\r\\n# This will finish faster than if it continues to disconnect and reconnect all the time
\\r\\n4. For scripts which access Ensembl a lot and have no easy opportunity to behave as in option 2 above.
\\r\\n# For all code using Ensembl\\r\\nBio::EnsEMBL::Registry->set_reconnect_when_lost(1);\\r\\n# For one adaptor\\r\\n$adaptor->dbc->reconnect_when_lost(1);\\r\\n# This option adds an additional message to every call to the database, checking that the connection is still up\\r\\n# It increases network traffic and latency of each request, but can restore a broken connection
\\r\\nOption 2 is both fastest and makes best use of Ensembl servers. Option 4 is next quickest, and option 3 is slow for heavy access, but suitable for occasional requests.
\\r\\nThe server can't host any more connections. Users are connecting too many times in a short time period. Contact Ensembl Helpdesk (helpdesk@ensembl.org) to let us know there is a problem, and try again later.
\\r\\nThe Ensembl API requires both DBI and DBD::mysql packages, typically via cpan or cpanm. If you have installed these libraries but still have this problem, you will need to add them to your PERL5LIB environment variable.
\\r\\necho $PERL5LIB\\r\\n# Can I see where my libs are installed?\\r\\nexport PERL5LIB=$PERL5LIB:/path/to/perl/lib
\\r\\nFirst, check your connection parameters. Run perl ensembl/misc-scripts/ping_ensembl.pl and see what it says. If both ping_ensembl and your script cannot connect, the most likely cause is that your local network prohibits this kind of traffic. Ask your sysadmins if they allow outbound database traffic on port 3306/5306.
", "category": "core_api"} 106005 2016-04-29 12:09:46 \N 2019-09-19 13:26:02 live 0 0 569 view VCF, Ped, 1000 genomes, converter, PLINK, Haploview {"content": "The VCF to PED Converter tool converts VCF file to create a linkage pedigree file (PED) and a marker information file, which may be loaded into other variation data analysis tools, such as PLINK and Haploview. You can choose to convert a VCF file of data taken from the 1000 Genomes project, or you can supply the VCF to PED Converter tool with your own files.
When you reach the VCF to PED Converter web interface, you will be presented with a form to define the allele frequency data you want to retrieve.
Name for this job (optional): naming each of your data requests with a unique name allows you to track and search the list of your submitted jobs.
Species: The VCF to PED Converter tool is based on population frequency data generated by the 1000 Genomes project, and is therefore only available for the human GRCh37 assembly, which is selected by default.
Region Lookup: Define your genomic region of interest in the format chromosome:start_coordinate-end_coordinate, e.g. 4:122868000-122946000.
Genotype file URL: Define a URL that contains a VCF file that contains the population genotypes.
Sample-population mapping file URL: Define a URL that contains a file which lists all the individuals and the populations from which they come.
Base format: Choose how to express the genotypes. You can either select 'Bases' (i.e. ATGC) or 'Numbers' (i.e. 1234).
Biallelic: Exclude sites with more than two alleles from output.
Output: The output of the VCF to PED Converter is a PED file and a Marker Information file, which can be individually downloaded and used in downstream applications.
[[IMAGE::VCFtoPEDupdated.png height=\\"201\\" width=\\"1148\\"]]
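The Region Lookup format described above (chromosome, a colon, then start and end coordinates separated by a hyphen) can be checked before a job is submitted. The following Python sketch is illustrative only; parse_region is a hypothetical helper and is not part of the VCF to PED Converter.

```python
import re

# chromosome name, a colon, start coordinate, a hyphen, end coordinate
REGION = re.compile('^([^: ]+):([0-9]+)-([0-9]+)$')


def parse_region(text):
    '''Return (chromosome, start, end) for a string such as 4:122868000-122946000.'''
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError('expected chromosome:start-end, got: ' + text)
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # Reject reversed intervals early rather than letting a job fail later.
    if start > end:
        raise ValueError('start coordinate is larger than end coordinate')
    return chrom, start, end


print(parse_region('4:122868000-122946000'))
```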
"} 120522 2016-06-30 12:28:32 254453 2022-11-16 13:40:38 live 0 0 570 view polyploid, homologous, align, {"content": "
The top panels are similar to the chromosome diagram and gene map at the top of the Region in Detail view in the Location tab. Each homologous chromosome is shown in the panel.
\\r\\nGenomes for each chromosome are displayed graphically in the lower panel. This page shows chromosomes, scaffolds and contigs as they are.
\\r\\n\\r\\n
The chromosome you are coming from (for example if you were in the gene or transcript tab, or another view in the location tab) is shown in the first panel. Genes are drawn by default.
\\r\\n\\r\\n
\\r\\n
[[IMAGE::polyploid.png width=\\"650\\" height=\\"500\\"]]
\\r\\nThe image above shows wheat chromosome 5D, base pairs 17,389,805 to 17,402,607 and the corresponding region in 5A, 5B and scaffold IWGSC_CSS_5BS_2295667. The pink bar shows the pairwise alignment of this region of chromosome 5D to each of the homologous chromosomes. Click on the pink bar to see the chromosome and coordinates (in base pairs) of the alignment. Green shading connects the alignments.
\\r\\nCustomise the view using the Configure this page toolbar; this allows display of different tracks, such as variants and ESTs (Expressed Sequence Tags) aligned to the genome.
\\r\\nZoom in or out by using the zoom slide, or the plus and minus buttons at the bottom of each panel. The panel may also be flipped in orientation or realigned using the buttons below the image. Click and drag a box with your mouse around any region to zoom in to that region.
"} 120463 2016-07-05 07:42:21 120463 2019-09-19 13:26:02 dead 0 0 571 view polyploid, homologous, align, {"content": "The top panels are similar to the chromosome diagram and gene map at the top of the Region in Detail view in the Location tab. Each homologous chromosome is shown in the panel.
\\r\\nGenomes for each chromosome are displayed graphically in the lower panel. This page shows chromosomes, scaffolds and contigs as they are.
\\r\\n\\r\\n
The chromosome you are coming from (for example if you were in the gene or transcript tab, or another view in the location tab) is shown in the first panel. Genes are drawn by default.
\\r\\n\\r\\n
\\r\\n
[[IMAGE::polyploid.png height=\\"500\\" width=\\"650\\"]]
\\r\\nThe image above shows wheat chromosome 5D, base pairs 17,389,805 to 17,402,607, and the corresponding region in 5A, 5B and scaffold IWGSC_CSS_5BS_2295667. The pink bar shows the pairwise alignment of this region of chromosome 5D to each of the homologous chromosomes. Click on the pink bar to see the chromosome and coordinates (in base pairs) of the alignment. Green shading connects the alignments.
\\r\\nCustomise the view using the Configure this page toolbar. This allows display of different tracks, such as variants and ESTs (Expressed Sequence Tags) aligned to the genome.
\\r\\nZoom in or out by using the zoom slide, or the plus and minus buttons at the bottom of each panel. The panel may also be flipped in orientation or realigned using the buttons below the image. Click and drag a box with your mouse around any region to zoom in to that region.
"} 120463 2016-07-05 07:45:05 120463 2019-09-19 13:26:02 live 0 0 572 view \N {"content": "File Chameleon is a tool to assist in reformatting existing genomic flat files. It doesn't convert formats or merge files, only modifies existing files already available on the Ensembl FTP site.
\\r\\n\\r\\n
Entering a job name can help you track your File Chameleon jobs and better enable the Ensembl team to assist in any errors you might encounter.
\\r\\n\\r\\n
[[IMAGE::FC_select_species.png height=\\"69\\" width=\\"941\\"]]
\\r\\nSelect the species you want to convert a file for. The available species are those on the Ensembl FTP site, and not all filters are available for all species.
\\r\\n\\r\\n
File Chameleon currently supports GFF3, GTF and FASTA formats. Select the file format you want to retrieve.
\\r\\n\\r\\n
Depending on which file format you’ve selected, there are different reformatting options available.
\\r\\n[[IMAGE::FC_select_options.png height=\\"271\\" width=\\"950\\"]]
\\r\\n\\r\\n
For some species, File Chameleon can convert chromosome names from Ensembl style (1, 2, MT ...) to UCSC style (chr1, chr2, chrMT ...).
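As a rough illustration of the renaming described above, the following Python sketch prefixes Ensembl-style names with "chr". The function name `ensembl_to_ucsc` is hypothetical and this is not File Chameleon's own code. It simply follows the examples given here (1 to chr1, MT to chrMT); some UCSC assemblies name the mitochondrial sequence chrM, so check the naming expected by your downstream tool.

```python
def ensembl_to_ucsc(name: str) -> str:
    """Prefix an Ensembl-style sequence name with 'chr' (hypothetical helper)."""
    # Names that already carry the prefix are returned unchanged.
    if name.startswith("chr"):
        return name
    return "chr" + name


if __name__ == "__main__":
    for ensembl_name in ["1", "2", "MT"]:
        print(ensembl_name, "->", ensembl_to_ucsc(ensembl_name))
```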
\\r\\n\\r\\n
For GTF and GFF3 formats you can filter out genes over a specific size, currently 2, 4, 6 or 8 Mbp.
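The size filter can be pictured with the short Python sketch below. It is an illustration only: the function name `keep_gene` is hypothetical, and it assumes that a gene exactly at the threshold is kept and that coordinates are 1-based and inclusive, as in GTF and GFF3 files.

```python
MBP = 1_000_000  # base pairs per megabase pair


def keep_gene(start: int, end: int, max_mbp: int) -> bool:
    """Return True if the gene is no longer than max_mbp megabase pairs."""
    # GTF/GFF3 coordinates are 1-based and inclusive, hence the +1.
    length = end - start + 1
    return length <= max_mbp * MBP


if __name__ == "__main__":
    # A 2,500,000 bp gene is removed by the 2 Mbp filter but kept by the 4 Mbp one.
    print(keep_gene(1, 2_500_000, 2), keep_gene(1, 2_500_000, 4))
```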
\\r\\n\\r\\n
Some tools require transcript_id values in the attributes column of all records in GTF/GFF3 files, including gene records. Typically this is achieved by copying the gene_id value as the transcript_id attribute, which File Chameleon can do automatically.
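For GTF records, the copy described above might look like the following Python sketch. The function name `add_transcript_id` and the example gene ID are made up for illustration, and the code assumes the usual GTF attribute layout of `key "value";` pairs. It is not the implementation used by File Chameleon.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def add_transcript_id(gtf_line: str) -> str:
    """Return the GTF line with transcript_id copied from gene_id if missing."""
    line = gtf_line.rstrip("\n")
    # Comment lines are passed through unchanged.
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    # A GTF record has exactly nine tab-separated columns.
    if len(fields) != 9:
        return line
    attributes = fields[8].strip()
    match = GENE_ID.search(attributes)
    # Nothing to do if transcript_id is present or gene_id is absent.
    if "transcript_id" in attributes or match is None:
        return line
    if not attributes.endswith(";"):
        attributes += ";"
    fields[8] = f'{attributes} transcript_id "{match.group(1)}";'
    return "\t".join(fields)


if __name__ == "__main__":
    record = "\t".join(
        ["1", "example", "gene", "100", "900", ".", "+", ".", 'gene_id "GENE1";']
    )
    print(add_transcript_id(record))
```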
\\r\\n\\r\\n
In Ensembl GFF3 files, the coordinates of features in patches are given relative to the reference sequence (e.g. the chromosome). The remap patches filter remaps the coordinates of features in patches to be relative to the start of that particular patch.
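The arithmetic behind such a remapping can be sketched in Python as below. This is an assumption about how the shift works rather than File Chameleon's actual code: the function name `remap_to_patch` and the `patch_start` parameter are hypothetical, and coordinates are taken to be 1-based and inclusive, as in GFF3.

```python
def remap_to_patch(start: int, end: int, patch_start: int) -> tuple[int, int]:
    """Shift chromosome coordinates so that the patch start becomes position 1."""
    # With 1-based coordinates, the first base of the patch maps to 1.
    offset = patch_start - 1
    return start - offset, end - offset


if __name__ == "__main__":
    # A feature at 1,500-1,800 on the chromosome, in a patch starting at 1,000.
    print(remap_to_patch(1_500, 1_800, 1_000))
```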
\\r\\n\\r\\n
[[IMAGE::FC_source_file.png height=\\"40\\" width=\\"943\\"]]
\\r\\nFile Chameleon tries to select the file for that species that a user is most likely to be interested in. However, for each species there are usually multiple possible files available, containing different combinations of patches. If the default isn't the file you're interested in, you can select a different file to apply the filters to.
\\r\\n\\r\\n
Once you run the job you'll be redirected to a table that lists jobs that are currently running or recently completed. A ticket ID is assigned to each job and additional information is provided, i.e. Analysis, Jobs and Submitted at (date and time). You can customise the table by showing or hiding columns.
\\r\\n[[IMAGE::FC_jobs.png height=\\"101\\" width=\\"961\\"]]
\\r\\nThe progress of the job is refreshed automatically every 10 seconds until the job has completed.
\\r\\nYou can download the results by clicking on the download icon when the job is complete. The file is available for download for 10 days, after which it is automatically deleted.
"} 106005 2016-07-11 08:48:57 106005 2019-09-19 13:26:02 live 0 0 573 faq variant, SNP, allele, reference genome assembly, reference genome, low frequency allele, minor frequency allele, MAF, ancestral allele, CNV, copy number variant, single nucleotide polymorphism {"question": "What genomic sequence is available in Ensembl?
", "answer": "Ensembl contains the reference genome assembly for, at last count, 87 vertebrate species (genomic data for additional species can be found at our sister site, Ensembl Genomes).
\\r\\nReference assemblies can be compiled from the DNA of one individual, a collection of individuals, a breed or a strain. This depends on the species. Find the DNA source of each genome sequence in the More information and statistics link on each species home page.
\\r\\nFor species where the reference genome is a single individual's DNA, it should be kept in mind that, as genome builds are updated, the reference sequence might be found to be unusual, or to differ from the consensus sequence in a specific population, strain or species, in various ways. For example, it might vary in copy number for certain genomic regions or contain low-frequency alleles for particular genes.
\\r\\nIn the latter case, the reference allele will occasionally be the minor allele (the less common allele, whose frequency is reported as the minor allele frequency, or MAF), or may not be the ancestral allele.
\\r\\n", "category": "assemblies"} 122937 2016-08-03 15:45:37 122937 2019-09-19 13:26:02 dead 0 0 574 faq variant, SNP, allele, reference genome assembly, reference genome, low frequency allele, minor allele frequency, MAF, ancestral allele, CNV, copy number variant, single nucleotide polymorphism {"question": "
The reference allele of my SNP of interest is low-frequency / is not the ancestral allele. Why?
", "answer": "Reference genome assemblies can be compiled from the DNA of one individual, a collection of individuals, a breed or a strain, depending on the species.
\\r\\nIf DNA from one or a few individuals was sequenced to create the reference genome, some regions of this reference assembly may include low-frequency nucleotides rather than the nucleotide most prevalent in the broader population.
\\r\\nEnsembl variant pages will report the reference allele, the ancestral allele, and the minor allele frequency (MAF; the rate at which the second-most common nucleotide occurs) for variant regions. Occasionally the reference allele will be a minor allele instead of the ancestral allele simply because of the origin of the reference genome. For example, in the reference genome assembly for human, each contig is derived from a single individual.
\\r\\nThe DNA source of each genome sequence, and whether it is based on an individual or a population, can be found in the More information and statistics link on each species home page.
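As a worked illustration of the minor allele frequency defined above (the rate at which the second-most common nucleotide occurs), the Python sketch below computes it from a list of observed alleles. The function name `minor_allele_frequency` and the example data are made up; this is not how Ensembl calculates the values shown on variant pages.

```python
from collections import Counter


def minor_allele_frequency(alleles: list[str]) -> float:
    """Return the frequency of the second-most common allele."""
    counts = Counter(alleles)
    # With a single observed allele there is no minor allele.
    if len(counts) < 2:
        return 0.0
    second_most_common_count = counts.most_common(2)[1][1]
    return second_most_common_count / sum(counts.values())


if __name__ == "__main__":
    # Six A, three G and one T observed: G is the second-most common allele.
    observed = ["A"] * 6 + ["G"] * 3 + ["T"]
    print(minor_allele_frequency(observed))
```

In this example a reference genome built from an individual carrying G would have a reference allele that is the minor allele.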
\\r\\n\\r\\n
", "category": "variation"} 122937 2016-08-05 11:09:48 122937 2019-09-19 13:26:02 dead 0 0 575 view VCF, BAM, DataSlicer, Slicer, 1000 Genomes, {"content": "
The Data Slicer provides an interface which allows users to get subsections of either VCF (VCFtools) or BAM (SAMtools) files based on genomic coordinates.
\\r\\nThe Data Slicer is currently available for GRCh37 only. You can access it from the tools link in the menu bar at the top of every page.
\\r\\nFirstly, you have the option of naming your job.
\\r\\nThen you will need to select the file format, VCF or BAM.
\\r\\nTo select the region you want to slice from, type the chromosome and coordinates; e.g. 1:1-50000.
\\r\\nIf you chose VCF as your format, you will be given the option to select 1000 Genomes Phase 3, 1000 Genomes Phase 1 or to provide a URL. For BAM you only have the option to provide a publicly visible URL. Each file URL must be accompanied by an index of the same name: a tabix index (.tbi) for VCF or a BAM index (.bai) for BAM. All 1000 Genomes VCF and BAM files on the FTP site have these indices with them. Please note that, for other BAM files, this service will only work over http.
\\r\\nIf you are slicing a VCF file you can also subset the data by individual or population. You can either put a comma separated list of individuals or populations in the box or select individuals or populations from the dropdown list. If you wish to select multiple individuals or populations, hold the ctrl key (on Windows/Linux) or the cmd key (Macs).
\\r\\nAfter clicking next the system produces your final file.
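If you prepare region strings in a script before pasting them into the form, a check such as the Python sketch below may help. It only illustrates the chromosome:start-end layout of the example given above (1:1-50000); the function name `parse_region` is hypothetical and the tool's own validation may accept other forms.

```python
import re

# chromosome name, a colon, then start-end in base pairs
REGION = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a region such as '1:1-50000' into chromosome, start and end."""
    match = REGION.match(region.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start-end region: {region!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end in region: {region!r}")
    return chromosome, start, end


if __name__ == "__main__":
    print(parse_region("1:1-50000"))
```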
\\r\\n"} 120463 2016-09-12 23:36:52 \N 2019-09-19 13:26:02 live 0 0 576 view \N {"content": "The Variation Pattern Finder lets you look for patterns of shared variation between individuals in the same VCF file. In any specified chromosomal region, different samples will have different combinations of variations. The finder looks for distinct variation combinations within the region, as well as the individuals associated with each variation combination pattern. The finder only focuses on variations that change protein coding sequences, such as missense variants and splice site changes.
\\r\\nThe VCF format is a tab-delimited format for presenting variation sites and genotype data, and is described at http://vcftools.sourceforge.net/specs.html. This tool accepts both VCF 4.0 and VCF 4.1 files.
\\r\\nThe Variation Pattern Finder is currently available for GRCh37 only. You can access it from the tools link in the menu bar at the top of every page.
\\r\\nFirstly this form gives you the option of naming your job.
\\r\\nTo select the region you want to slice from, type the chromosome and coordinates; e.g. 1:1-50000.
\\r\\nYou now have the option to select 1000 Genomes Phase 3, 1000 Genomes Phase 1 or to provide the URL of any publicly visible VCF file (over http or ftp). These URLs must be accompanied by a tabix index (.tbi) of the same name. For more information about creating tabix indexes please look at Tabix: fast retrieval of sequence features from generic TAB-delimited files. All 1000 Genomes VCF files on the FTP site have these indices with them.
\\r\\nAfter clicking next the system produces your final file.
\\r\\n\\r\\n
The results file will have the following sections:
\\r\\n1. Variation Header:
\\r\\n
2. Freq column: it gives the frequency of the given variant genotype combination in the file
3. Sample panel: it displays the first 2 samples for a particular population who have this pattern of variation and the heading shows which population that sample group is from
4. Genotype Panel: this shows the individual genotypes as given by the VCF file. Please note that if the delimiter symbol is | the genotype is phased; otherwise it is unphased. “./.” in the expanded view represents sites with no genotype data. “-” in the collapsed view represents genotypes that are either homozygous reference or no data.
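The genotype conventions in the list above can be read programmatically, as in this Python sketch. The function names are hypothetical and the code only covers the cases described here: | for phased genotypes, any other delimiter for unphased ones, and "./." for no genotype data.

```python
def is_phased(genotype: str) -> bool:
    """Return True if the genotype uses the phased delimiter '|'."""
    return "|" in genotype


def has_genotype_data(genotype: str) -> bool:
    """Return False for './.', which marks a site with no genotype data."""
    return genotype != "./."


if __name__ == "__main__":
    for genotype in ["0|1", "0/1", "./."]:
        print(genotype, is_phased(genotype), has_genotype_data(genotype))
```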
Orthologues inferred from gene trees are determined using all species in that particular database, i.e. all the (mostly) chordates in Ensembl, all the fungi in Ensembl Fungi, all the plants in Ensembl Plants, all the metazoa in Ensembl Metazoa, all the protists in Ensembl Protists, or all the species in the Pan-Compara set for Pan-Compara orthologues in Ensembl Genomes. A detailed description of the method is provided here.
\\r\\nUnaligned sequences (nucleotide and/or amino acids) of orthologous genes can be exported in FASTA format by clicking on Sequence export. The Compara API and BioMart can also be used to export orthologues.
\\r\\nThe list of orthologues underneath the top table shows the species, the orthologue type, the dN/dS value (if calculated), the Ensembl gene ID and name, links to other views, the Target %ID and the Query %ID, the Gene Order Conservation (GOC) score, the Whole Genome Alignment (WGA) coverage, and an indication of confidence of orthology.
\\r\\nThe Query %ID refers to the percentage of the query sequence (human) that matches to the homologue (the mouse protein). Target %ID refers to the percentage of the target sequence (mouse) that matches to the query sequence (human).
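To make the difference between the two percentages concrete, the Python sketch below computes both from a pairwise alignment. It assumes that "matches" means identical residues at aligned positions and that gaps are written as "-"; the function name `percent_ids` and the sequences are made up, and the calculation used for the values shown in Ensembl may differ in detail.

```python
def percent_ids(query_aln: str, target_aln: str) -> tuple[float, float]:
    """Return (Query %ID, Target %ID) for two aligned sequences of equal length."""
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned positions where both sequences carry the same residue.
    matches = sum(
        1 for q, t in zip(query_aln, target_aln) if q == t and q != "-"
    )
    # Each percentage is relative to the ungapped length of its own sequence.
    query_length = len(query_aln.replace("-", ""))
    target_length = len(target_aln.replace("-", ""))
    if query_length == 0 or target_length == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100 * matches / query_length, 100 * matches / target_length


if __name__ == "__main__":
    query_id, target_id = percent_ids("MKT-LV", "MKSALV")
    print(f"Query %ID: {query_id:.1f}, Target %ID: {target_id:.1f}")
```

Because the two sequences can differ in length, the same number of matching residues gives a different percentage for the query than for the target.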
\\r\\nIDs, orthology types and dN/dS values can also be obtained using the Compara API or with BioMart.
"} 120522 2016-09-21 16:13:29 \N 2019-09-19 13:26:02 live 0 0 579 view variant, variation, strain, mouse, mouse genome project, {"content": "\\r\\nShort sequence variations between the 18 mouse strains represented in Ensembl are displayed in the Strain Table. The data comes from the Mouse Genome Project.
\\r\\nThis view shows all the observed alleles for the different mouse strains in a given genomic location.
\\r\\nThe table columns are as follows:
\\r\\nYou can order the table by the columns by clicking on the up/down arrows by the column titles. Filters above the table allow you to filter the data by variant class or variant consequence.
\\r\\n"} 120522 2016-09-23 08:24:28 125866 2019-09-19 13:26:02 live 0 0 584 movie search {"list_position": "", "youtube_id": "jcvF9HJeaZk", "title": "Using search", "youku_id": "XMjUxMjAxODQzNg", "length": "4.41"} 106005 2017-02-14 11:34:06 106005 2019-09-19 13:26:02 live 0 0 592 movie \N {"list_position": "", "youtube_id": "LHzUWKolxOI", "title": "LiteMol", "youku_id": "XMzk0MDU1MzY0MA", "length": "2.03"} 106005 2018-11-28 09:56:00 106005 2019-09-19 13:26:02 live 0 0 594 view protein structure, 3D, PDBe, variant {"content": "
\\r\\nAction | Mouse | Touchscreen
\\r\\nRotate | Left click and drag | One finger touch
\\r\\nZoom | Right click and drag | Two finger touch
\\r\\nMove | Mouse wheel click and drag | Pinch
\\r\\nSlab (move forward/backward through the structure) | Mouse wheel roller | Three finger touch
Can modENCODE data be added for Drosophila species?
", "answer": "modENCODE information can be uploaded through the BAM plugin, which allows Ensembl users to view the content of BAM files in the context of a reference genome simply by placing the relevant files on a local HTTP server and performing a simple configuration step in Ensembl. Any data represented in BAM files (or other common format files) can be uploaded to the browser in a similar way. Information on uploading data can be found here.
", "division": ["metazoa"]} 120522 2020-08-21 09:51:08 128249 2023-05-20 21:08:18 live \N \N