File migration
==============

GRCh38 files have moved to the GRCh38 folder. Further details are in CHANGELOG.md.

The GFF file of regulatory features in the top-level folder is kept for backward compatibility. It will be removed in release 114.

Files in the GRCh38 folder
==========================

Annotation
----------

Annotation files are in the `GRCh38/annotation` folder.

### Regulatory features

The file name follows this convention: `..regulatory_features.v.gff3.gz`

The file follows the standard GFF3 format and is sorted by coordinates.

- Source: Ensembl
- Start & End: these define the core region, which is typically used for downstream analysis
- Type: feature types are from the Sequence Ontology. For Regulation, these values are:
  - enhancer
  - promoter
  - open_chromatin_region
  - CTCF_binding_site
  - TF_binding_site
- Attributes: the ID and the extended region are listed here. For promoters, the attributes also include the associated gene(s).

### EMARs

EMARs are regions that are accessible and epigenetically modified in at least one epigenome.

The file name follows this convention: `..EMARs.v.gff.gz`

### Regulatory activity

The file name follows this convention: `..regulatory_activity.v.tsv.gz`

The first two columns are `feature_id` and `feature_type`. Each remaining column is named after an epigenome.

### Motif features

For each transcription factor (TF) that has a ChIP-seq data set in the Ensembl Regulation resources and an available position weight matrix (PWM), we annotate the positions of putative TF binding sites within the ChIP-seq peaks.

The file name follows this convention: `..motif_features.v.gff3.gz`
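
As a rough illustration of how the files described above might be read, here is a minimal Python sketch using only the standard library. It assumes the GFF3 files use the usual nine tab-separated columns with `key=value` attributes separated by semicolons, and that the regulatory activity table has a single header row. The function names (`parse_gff3_line`, `read_gff3`, `read_activity`) are hypothetical helpers, not part of any Ensembl tool. Attribute keys other than `ID` and the cell values of the activity table are not specified in this README, so the sketch returns them unchanged.

```python
import csv
import gzip
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters inside attribute values.
        attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # core region start (1-based, inclusive)
        "end": int(end),      # core region end
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def read_gff3(path):
    """Yield parsed features from a gzipped GFF3 file, skipping '#' lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_gff3_line(line)


def read_activity(path):
    """Yield (feature_id, feature_type, {epigenome: value}) per table row."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        epigenomes = header[2:]  # columns after feature_id and feature_type
        for row in reader:
            yield row[0], row[1], dict(zip(epigenomes, row[2:]))
```

For example, `[f for f in read_gff3(path) if f["type"] == "promoter"]` would collect the promoter records from a regulatory features file, where `path` is whatever local file name you downloaded. The same reader should also work for the motif features file, provided it follows the same GFF3 layout.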