Directory layout
----------------

Each species directory contains one or more assembly directories for which we have data.

The assembly directories contain the following directories:

- `peaks`: bigBed files
- `signal`: bigWig files
- `annotation`: regulatory annotation (only available for reference assemblies)

Annotation
----------

Annotation files are in the `//annotation` folder.

### Regulatory features

The file name follows this convention: `..regulatory_features.v.gff3.gz`

Standard GFF3 is followed, sorted by coordinates.

- Source: Ensembl
- Start & End: These define the core region, which is typically used for downstream analysis
- Type: feature types are from the Sequence Ontology. For Regulation, these values are:
  - enhancer
  - promoter
  - open_chromatin_region
  - CTCF_binding_site
  - TF_binding_site
- Attributes: ID and the extended region are listed here. For promoters, the attributes include the associated gene(s).

### EMARs

EMARs are regions that are accessible and epigenetically modified in at least one epigenome.

The file name follows the convention: `..EMARs.v.gff.gz`

### Regulatory activity

The file name follows the convention `..regulatory_activity.v.tsv.gz`.

The first two columns are: feature_id and feature_type. The remaining columns are names of the epigenomes.

### Motif features

Motif features are only available for species which have TF ChIP-seq data and corresponding position weight matrices (PWMs) available. For these, we annotate the position of putative TF binding sites within the ChIP-seq peaks.

The file name follows the convention: `..motif_features.v.gff3.gz`.
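
The gzipped GFF3 files described above (regulatory features and motif features) can be read with the Python standard library alone. The following is a minimal sketch, not an official reader: it assumes only the generic GFF3 layout (nine tab-separated columns, with `key=value` attribute pairs separated by `;` and percent-encoded values). The function names `parse_gff3_line` and `read_gff3` are hypothetical, and no attribute key other than the standard `ID` is assumed.

```python
import gzip
from urllib.parse import unquote

# Column names defined by the GFF3 specification, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines and for comment/directive lines
    (those starting with '#').
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None

    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    record = dict(zip(GFF3_COLUMNS, fields))
    # Start and end are 1-based, inclusive coordinates in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Split the attributes column into a dict, decoding percent-escapes.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key.strip())] = unquote(value.strip())
    record["attributes"] = attributes
    return record


def read_gff3(path):
    """Yield one parsed record per feature line of a gzipped GFF3 file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            record = parse_gff3_line(line)
            if record is not None:
                yield record
```

With a local copy of a regulatory features file at some `path`, an expression such as `[r for r in read_gff3(path) if r["type"] == "promoter"]` would collect the promoter records. Because the files are sorted by coordinates, streaming them line by line as above avoids loading a whole file into memory.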