README for ftp.ncbi.nlm.nih.gov/pub/medgen
Last updated: September 11, 2014

There are multiple files maintained by NCBI that are related to medical
genetics. This README summarizes the files available by ftp, both in the
medgen path and on other paths.

================================================================================
MedGen
================================================================================
Files in this directory are updated weekly, on Wednesdays.

Each .RRF file is structured according to the following conventions:

    A vertical bar (|) is used as the delimiter.
    The first line in each file begins with a hash (#) and provides the
    column names.

When appropriate, the names of the columns are consistent with those used by
UMLS. Many of the values come from UMLS as well (fields with names in lower
case have no counterpart in UMLS). The UMLS documentation provides more
information about the abbreviations used by UMLS.

============================================================
MERGED.RRF
============================================================
Pairs of concept identifiers (CUI) that have been merged.

CUI       concept unique identifier that has been replaced
to CUI    current concept identifier

============================================================
MGCONSO.RRF
============================================================
Summary data for each concept identifier.

CUI       concept unique identifier
TS        term status:
              P: preferred LUI (unique identifier for a term, i.e.
                 lexically similar strings)
              S: non-preferred LUI
STT       string type:
              PF:  preferred form of term
              VCW: case and word-order variant of the preferred form
              VC:  case variant of the preferred form
              VO:  variant of the preferred form
              VW:  word-order variant of the preferred form
ISPREF    Is this term preferred in the set of terms from the source? (Y/N)
AUI       atom unique identifier, where an atom is one term from a source
SAUI      source-asserted atom unique identifier, i.e. the source's
          identifier for one term. Often null.
SCUI      source-asserted concept unique identifier, i.e. the source's
          identifier for a concept that may include multiple terms
SDUI      source-asserted descriptor unique identifier
SAB       abbreviation for the source of the term
TTY       type of term as defined by the source
CODE      unique identifier or code for the term provided by the source
STR       string, i.e. the term value
SUPPRESS  suppressed by UMLS curators

============================================================
MGDEF.RRF
============================================================
Summary data for definitions and sources of concepts.

CUI       concept unique identifier
DEF       concept definition. Please note that some values in the DEF
          column contain internal line breaks. The line separator for
          RRF files is '|\n'. The line separator within the DEF column
          of MGDEF.RRF is a carriage return (CR, '\r', 0x0D, 13 in
          decimal). Unix/Linux and Windows tools sometimes handle these
          characters differently. If this format is problematic for you,
          consider using the comma-separated value (csv) files in the
          csv subdirectory.
source    sources that contribute strings or relationships to the UMLS
          Metathesaurus
SUPPRESS  suppressed by UMLS curators

============================================================
MGREL.RRF
============================================================
Summary data for relationships between concepts.

CUI1      first concept unique identifier
AUI1      first atom unique identifier, where an atom is one term from a
          source
STYPE1    the name of the column in MRCONSO.RRF that contains the first
          identifier to which the relationship is attached
REL       relationship label
CUI2      second concept unique identifier
AUI2      second atom unique identifier, where an atom is one term from a
          source
RELA      additional relationship label
RUI       relationship unique identifier
SAB       abbreviation for the source of the term
SL        source of relationship label
SUPPRESS  suppressed by UMLS curators

============================================================
MGSAT.RRF
============================================================
Summary data for concepts' attributes.

CUI       concept unique identifier
METAUI    UMLS Metathesaurus asserted unique identifier
STYPE     the name of the column in MRCONSO.RRF that contains the
          identifier to which the attribute is attached
CODE      unique identifier or code for the term provided by the source
ATUI      attribute unique identifier
ATN       attribute name
SAB       abbreviation for the source of the term
ATV       attribute value
SUPPRESS  suppressed by UMLS curators

============================================================
MGSTY.RRF
============================================================
Summary data for semantic types.

CUI       concept unique identifier
TUI       semantic type unique identifier
STN       semantic type tree number
STY       semantic type
ATUI      attribute unique identifier

============================================================
NAMES.RRF
============================================================
Summary data for concept names and sources.

CUI       concept unique identifier
name      concept name
source    sources that contribute strings or relationships to the UMLS
          Metathesaurus
SUPPRESS  suppressed by UMLS curators

============================================================
medgen_pubmed
============================================================
Summary data for MedGen and PubMed links.

UID       MedGen unique identifier
CUI       concept unique identifier
NAME      concept name
PMID      PubMed unique identifier

============================================================
MedGen_HPO_Mapping.txt
============================================================
Report of MedGen's processing of terms from Human Phenotype Ontology (HPO).

CUI             concept unique identifier
SDUI            identifier from HPO
HpoStr          term from HPO
MedGenStr       preferred term in MedGen
MedGenStr_SAB   source of the term in MedGen
STY             semantic type

============================================================
MedGen_HPO_OMIM_Mapping.txt
============================================================
Report of MedGen's processing of terms from Human Phenotype Ontology (HPO)
and their relationships to diagnostic terms from OMIM.

OMIM_CUI        concept unique identifier assigned to a record from OMIM
MIM_number      MIM number defining the record from OMIM
OMIM_name       preferred term from OMIM
relationship    relationship of the term from HPO to the record from OMIM.
                Constructions like 'not_manifestation_of' are used to
                represent the 'not' qualifier for a relationship.
HPO_CUI         concept unique identifier (CUI) assigned to the term from HPO
HPO_name        preferred term from HPO
HPO_CUI         concept unique identifier (CUI) assigned to the term from HPO
MedGen_name     preferred term used in MedGen
MedGen_source   source of the term used preferentially by MedGen
STY             semantic type

============================================================
MedGen_CUI_history.txt
============================================================
Tab-delimited report of changes in CUI in MedGen and the dates the changes
were made.

Previous_CUI    the CUI that was deprecated
Current_CUI     the CUI that is now current
Date_Of_Action  the month and year the change was made

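As an illustration of the RRF conventions described in this README ('|' as the
delimiter, a first line beginning with '#' that names the columns, '|\n' as
the line separator, and '\r' inside the DEF column of MGDEF.RRF), the
following is a minimal Python sketch, not a tool provided by NCBI. The
function name read_rrf, the CUIs, and the source name EXAMPLE are made up for
illustration. The sketch assumes that every record ends with a trailing '|'
and that no field contains a literal '|'.

```python
import io


def read_rrf(stream):
    """Yield one dict per record from an RRF-style text stream.

    Assumptions (taken from the conventions in this README, not verified
    against every file): fields are separated by '|', the first line
    starts with '#' and names the columns, and each record ends in '|\n'.
    The stream must split lines on '\n' only, so that a bare '\r' inside
    a field (as in the DEF column) stays part of the record.
    """
    header = None
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines, e.g. at the end of the file
        if line.endswith("|"):
            line = line[:-1]  # drop the '|' that belongs to the '|\n' separator
        fields = line.split("|")
        if header is None:
            fields[0] = fields[0].lstrip("#")  # '#CUI' -> 'CUI'
            header = fields
        else:
            yield dict(zip(header, fields))


# Made-up sample in the shape of MGDEF.RRF; these are not real MedGen records.
sample = (
    "#CUI|DEF|source|SUPPRESS|\n"
    "C0000001|first line\rsecond line|EXAMPLE|N|\n"
    "C0000002|single-line definition|EXAMPLE|N|\n"
)

# io.StringIO splits lines on '\n' only, which keeps the '\r' inside DEF.
records = list(read_rrf(io.StringIO(sample)))
for record in records:
    # Replace internal carriage returns with spaces for display only.
    print(record["CUI"], record["DEF"].replace("\r", " "))
```

When reading a real file instead of the in-memory sample, opening it with
open(path, encoding="utf-8", newline="\n") should keep Python from treating
the '\r' characters in DEF as line endings; the utf-8 encoding is an
assumption.
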
================================================================================
================================================================================
Subdirectories
================================================================================
================================================================================

------------------------------------------------------------
ftp.ncbi.nlm.nih.gov/pub/medgen/csv/
------------------------------------------------------------
The csv subdirectory contains a set of comma-separated value (csv) files
corresponding to the RRF files in the main path. Some of the files are split
to allow loading into spreadsheet software (maximum 1,000,000 lines per
file). The csv files also facilitate processing of the MGDEF.RRF data,
because the DEF column may contain internal line breaks.

------------------------------------------------------------
ftp.ncbi.nlm.nih.gov/pub/medgen/presentations
------------------------------------------------------------
The presentations subdirectory contains presentations related to MedGen.
There is one file at present, Conditions_Phenotypes.pptx, which describes
how data in MedGen and other resources can be used to identify terms and
identifiers to include in submissions to ClinVar and GTR.

================================================================================
================================================================================
Other sites
================================================================================
================================================================================

------------------------------------------------------------
ftp.ncbi.nlm.nih.gov/pub/clinvar/
------------------------------------------------------------
The files discussed in the NAMES OF PHENOTYPES section of that directory's
README contain information about condition names used by ClinVar and GTR.
These files are:

    disease_names
    gene_condition_source_id
    ConceptID_history.txt

------------------------------------------------------------
Gene's ftp site
------------------------------------------------------------
ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen

A report of identifiers from OMIM, whether they are genes or conditions, and
corresponding data in Gene and MedGen. Described in
ftp://ftp.ncbi.nih.gov/gene/README.