README file for the path
ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/beta/

Last updated July 19, 2017

This directory contains a beta release of XML for Variation ClinVar
accessions, or VCVs. File names follow the format
ClinVarVariationRelease_YYYY-MMDD.xml.gz, where YYYY, MM, and DD are the
year, month, and day the file was created.

Data in ClinVarVariationRelease is aggregated by the VariationID, which
represents the variant or set of variants that were interpreted for clinical
or functional significance, or that were components of such interpretations.
This aggregation of data is assigned an accession number, with the prefix VCV
(Variation in ClinVar) followed by nine digits. The digits correspond to the
same Variation ID reported on ClinVar's web site and in ClinVarFullRelease
files, padded with leading zeros. The accessions are versioned, with versions
incremented when new or updated submissions are processed for the same
VariationID. In the beta release, all VCV accessions will have version 1; the
version number will not increment until the production release.

This file has been constructed to make it easier for users who want to access
all data for a variant or set of variants, not separated by the diseases for
which they have been interpreted. The content is expected to be equivalent to
data in ClinVarFullRelease, just organized differently. As with
ClinVarFullRelease, some content in ClinVarVariationRelease is aggregated
across all information in ClinVar for the same VariationID, while other
elements, namely those under the path
/ClinVarVariationRelease/VariationArchive/InterpretedRecord/ClinicalAssertionList,
represent contributions from each submission.

Please note: consistency checks between ClinVarFullRelease and
ClinVarVariationRelease are still being polished, so in the beta phase there
may be some differences in content.

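As an illustration of the naming rules above, here is a minimal Python sketch
that builds a VCV accession from a VariationID, recovers the VariationID from
an accession, and constructs a release file name from a date. The function
names are hypothetical and are not part of any NCBI tool. Version suffixes
are left out because this README does not say how a version is written.

```python
import re
from datetime import date

# "VCV" followed by exactly nine digits, as described in this README.
_VCV_PATTERN = re.compile(r"VCV(\d{9})")


def vcv_accession(variation_id: int) -> str:
    """Return the VCV accession for a VariationID (zero-padded to 9 digits)."""
    if not 0 < variation_id < 10**9:
        raise ValueError(f"VariationID out of range: {variation_id}")
    return f"VCV{variation_id:09d}"


def variation_id_from_accession(accession: str) -> int:
    """Return the VariationID encoded in an unversioned VCV accession."""
    match = _VCV_PATTERN.fullmatch(accession)
    if match is None:
        raise ValueError(f"Not a VCV accession: {accession!r}")
    return int(match.group(1))


def release_file_name(created: date) -> str:
    """Return the file name for a release created on the given date."""
    return f"ClinVarVariationRelease_{created.strftime('%Y-%m%d')}.xml.gz"


print(vcv_accession(12345))
print(variation_id_from_accession("VCV000012345"))
print(release_file_name(date(2017, 7, 19)))
```

The VariationID 12345 is an arbitrary example value, not a reference to a
particular ClinVar record.
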
We will retain ClinVarFullRelease, the archive of ClinVar data aggregated by
accessions beginning with RCV, corresponding to a variant-disease pair.

Updates to ClinVarVariationRelease in the beta phase will be made irregularly
and as needed, in response to development updates and bug reports. Updates to
ClinVarVariationRelease will use the same snapshot of data as the weekly
update for ClinVarFullRelease.

Features in ClinVarVariationRelease include:
- explicit elements to distinguish records for simple alleles vs. haplotypes
  vs. genotypes
- explicit elements to distinguish between variants that were directly
  interpreted vs. variants that were interpreted only as part of a haplotype
  or genotype (i.e. "included" variants). The clinical significance for
  included variants is indicated as "no interpretation for the single
  variant".

Some features are not yet included in ClinVarVariationRelease but will be
added before the production release:
- a history indicating accessions that were merged into the current accession
  (Replaces element)
- a section to map the submitted name or identifier for the interpreted
  condition to the corresponding name used in ClinVar and MedGen CUI
- a complementary file of deleted VCV accessions
- certain types of variant sets are not yet included in the release:
  diplotypes, phase unknown, different chromosomes

We anticipate that the beta release will last six weeks; after that, we will
move into production mode. During the beta release, we ask our XML users to
review the file and send feedback and error reports to
clinvar@ncbi.nlm.nih.gov.

See also:
* the XSD for ClinVarVariationRelease:
  ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/beta/variation_archive.xsd
  When in production mode, variation_archive.xsd will be versioned and
  provided from ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xsd_public
* the updated ClinVar Data Dictionary:
  https://www.ncbi.nlm.nih.gov/projects/clinvar/ClinVarDataDictionary.pdf
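
For users reviewing the file during the beta, the following is a minimal
Python sketch of one way to stream the release and count the entries under
the per-submission path
/ClinVarVariationRelease/VariationArchive/InterpretedRecord/ClinicalAssertionList
for each record. It rests on several assumptions: the elements carry no XML
namespace, the function name is hypothetical, and the embedded sample document
(including the child element name ClinicalAssertion) is invented for
illustration. Consult variation_archive.xsd for the real structure.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Invented sample; only the element names on the path above come from the README.
SAMPLE = b"""<ClinVarVariationRelease>
  <VariationArchive>
    <InterpretedRecord>
      <ClinicalAssertionList>
        <ClinicalAssertion/>
        <ClinicalAssertion/>
      </ClinicalAssertionList>
    </InterpretedRecord>
  </VariationArchive>
  <VariationArchive>
    <InterpretedRecord>
      <ClinicalAssertionList>
        <ClinicalAssertion/>
      </ClinicalAssertionList>
    </InterpretedRecord>
  </VariationArchive>
</ClinVarVariationRelease>"""


def submissions_per_record(stream):
    """Return, for each VariationArchive, the number of child elements
    under InterpretedRecord/ClinicalAssertionList (0 if the list is absent)."""
    counts = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "VariationArchive":
            continue
        assertion_list = elem.find("InterpretedRecord/ClinicalAssertionList")
        counts.append(0 if assertion_list is None else len(assertion_list))
        # Discard the finished record's children to limit memory use.
        elem.clear()
    return counts


# Round-trip the sample through gzip to mimic reading a .xml.gz release file.
with gzip.open(io.BytesIO(gzip.compress(SAMPLE)), "rb") as handle:
    print(submissions_per_record(handle))
```

To run this against a downloaded release, pass the file path to gzip.open
instead of the in-memory sample. Records are processed one at a time because
the full release may be too large to load into memory at once.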