MedGen FAQ
- Searching MedGen
- PubMed results in MedGen
- Professional practice guidelines
- Data and scope of concepts in MedGen
- Why aren't all terms from SNOMED CT in MedGen?
- How are outdated data processed?
- Which UMLS codes are excluded from or included in MedGen?
- How can I extract a report of MedGen identifiers and their relationships to MIM numbers and HPO identifiers?
- How can I extract a report of MedGen identifiers and their relationships to other concepts (such as hierarchies)?
If you have a question that is not answered by this document, please contact the MedGen team.
Searching MedGen
How can I find the names of disorders that are caused by a particular gene?
By submitting a query based on gene symbol.
There are three ways to query MedGen by gene symbol:
- Option 1: Enter a gene symbol in MedGen's query box. The results will include all records with that term anywhere in the text. To limit the results to disorders thought to be affected by altered function of that gene, click the link at the top of the Search Results page that reads "See GENE SYMBOL in MedGen". The number in parentheses at the end of that phrase is the number of records in MedGen reported as being caused by altered function of GENE SYMBOL (e.g. CFTR).
- Option 2: Enter the gene symbol followed by [gene] in the search box (e.g. "CFTR[gene]").
- Option 3: Click the Advanced link below the MedGen query box. In the Search Builder, select “Gene Name” from the menu, enter the gene symbol in the search box, and click the Search button.
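For scripted access, the fielded query from Option 2 can in principle be sent to NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URL and does not contact NCBI. The function name medgen_gene_search_url and the retmax default are illustrative, and the sketch assumes that the [gene] field tag behaves in E-utilities as it does in the web search box.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def medgen_gene_search_url(gene_symbol, retmax=20):
    """Build an ESearch URL for MedGen records linked to a gene symbol.

    Hypothetical helper: it mirrors Option 2 ("CFTR[gene]") and assumes the
    same field tag is accepted when the query is sent through E-utilities.
    """
    symbol = gene_symbol.strip()
    if not symbol:
        raise ValueError("gene_symbol must not be empty")
    params = {
        "db": "medgen",            # search the MedGen database
        "term": f"{symbol}[gene]",  # fielded query, as in Option 2
        "retmax": retmax,          # maximum number of UIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = medgen_gene_search_url("CFTR")
print(url)
```

The returned URL can be fetched with any HTTP client. NCBI's usage guidelines for E-utilities (request rate, API keys) apply to any real requests.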
By starting with the Gene database
Within the ‘Phenotypes’ section of a Gene page, there is a list of names of disorders with links to MedGen. Alternatively, within a Gene record, follow the MedGen link in the Related Information section at the right.
When I search by a MIM number, why do I sometimes get multiple records?
There are two major data flows that manage relationships between MIM numbers and records in MedGen. One is the daily update provided by GTR- and ClinVar-related data flows from OMIM. The second is the semi-annual update from UMLS to MedGen. In the former data flow, the relationship of MedGen record to MIM number is 1:1. In the latter data flow, the MIM number may be reported for more than one concept unique identifier (CUI), so a search by that number can return more than one record.
PubMed results in MedGen
How are references chosen for the Recent clinical studies section in MedGen?
The citations listed in the Recent clinical studies section are not curated; they are retrieved computationally with the Clinical Queries tool maintained by PubMed. The query used is the preferred name of the record.
To find additional literature related to the diseases in MedGen, you can use the "PubMed" link in the right-hand discovery panel under the "Related information" section. This link will open a new PubMed search based on the disease record in MedGen and cover a wider range of topics related to the disease. From there, users can apply filters and additional search criteria as needed.
How are relationships between MedGen and PubMed computed?
The links between records in MedGen and PubMed are generated by a combination of curation and computation. For those that are computed, the preferred term in MedGen is used to query PubMed, with results limited either to matches in the title and abstract of the paper or, once articles have been indexed with MeSH terms, to articles indexed as having a genetic component. When non-informative terms are identified, they are added to a 'stop list' to prevent future false positives.
Professional practice guidelines
What are the search criteria that identify Practice Guidelines from PubMed?
MedGen offers a detailed PubMed-based search pre-built for clinicians to find disease-specific practice guidelines. On each condition page, you will find the top results from our curated query of PubMed articles that has been specifically tailored to capture a wide range of practice guidelines for that condition. Additionally, there is a link to see the full search results in PubMed. If the PubMed search does not return any articles, we provide a search for a related broader concept to assist you in finding the most relevant information.
The query itself is given in full below, where "[condition name]" is MedGen's preferred name for the specific record in MedGen. It uses PubMed's Proximity Search feature to allow for minor deviations in word order for the condition name within the Title or Abstract of the publication. The search filters are selected to return articles indexed as practice guidelines, to return only articles available in English, and to exclude articles that are case reports, clinical studies, or randomized controlled trials. The additional terms in the search query include commonly used phrases from published practice guidelines, for example "Genetic screening" or "Evidence-based guideline". Although this complex query has been designed to be broad enough to cover a variety of phrases and concepts used in practice guideline publications, it may capture articles that do not fully conform to the expectation of a practice guideline, and it will not identify all published practice guidelines.
("[condition name]"[tiab:~0]) AND ("english and humans"[Filter]) AND ( ("practice guideline"[Filter]) OR (practice*[titl] AND (guideline[titl] OR parameter[titl]
OR resource[titl] OR bulletin[titl] OR best[titl])) OR (genetic*[titl] AND (evaluation[titl] OR counseling[titl] OR screening[titl] OR test*[titl])) OR (clinical[titl]
AND ((expert[titl] AND consensus[titl]) OR utility[titl] OR guideline*[titl])) OR (management[titl] AND (clinical[titl] OR diagnos*[titl] OR recommendation[titl]
OR pain[titl] OR surveillance[titl] OR emergency[titl] OR guideline*[titl] OR therap*)) OR (treatment[titl] AND ((evaluation[titl] AND diagnosis[titl])
OR (assessment[titl] AND prevention[titl]) OR therap*)) OR (Diagnos*[titl] AND (prenatal[titl] OR treatment[titl] OR follow-up[titl] OR statement[titl]
OR criteria[titl] OR newborn[titl] OR differential[titl] OR neonatal[titl] OR neonate[titl])) OR (guideline*[titl] AND (pharmacogenetic*[titl]
OR recommendation[titl] OR therap*[titl] OR evidence-based[titl] OR consensus[titl] OR (technical[titl] AND standard*[titl]) OR (molecular[titl] AND testing[titl])))
OR (risk[titl] AND assessment[titl]) OR (recommendation*[titl] AND (statement[titl] OR Evidence-based[titl] OR Consensus[titl]))
OR (care AND ((Patient[titl] AND standard*[titl]) OR primary[titl] OR psychosocial[titl])) OR (Health[titl] AND supervision[titl])
OR (statement[titl] AND (policy[titl] OR position[titl] OR Consensus[titl])) OR (pharmacogenetics[titl] AND (Dosing[titl] OR therap*[titl] OR genotype*[titl] OR drug*[titl]))
OR (Chemotherapy[titl] AND decision*[titl]) OR (screening[titl] AND (newborn[titl] OR neonat*[titl] OR detection[titl] OR diagnos*[titl]))
OR (criteria[titl] OR genotype*[titl]) ) NOT ("Case reports"[Publication type] OR "clinical study"[Publication Type] OR "randomized controlled trial"[Publication Type])
What are the search criteria that identify practice guidelines on Bookshelf?
The query for titles and chapters on NCBI Bookshelf that are reported as Practice Guidelines uses the Publication Type and Resource Type fields. If there is a match on the preferred condition name from MedGen in any Bookshelf publication that is a "clinical guideline" or "practice guideline", it will be returned by this search. The publication type is assigned by NLM Cataloguers, and the resource type is selected by the provider of the resource. The search may return resources that are broader in scope, and it may not capture all publications on the Bookshelf that could be considered practice guidelines.
(("clinical guidelines"[Resource Type]) OR "practice guideline"[Publication Type]) AND "[condition name]"
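Both queries above contain a "[condition name]" placeholder that stands for the record's preferred name. As a minimal sketch of how such a template could be filled in by a script, the example below substitutes a name into the Bookshelf query and into the opening clause of the PubMed query. The function name fill_condition_name and the quote-stripping rule are illustrative assumptions, not a description of MedGen's own implementation.

```python
# Templates copied from the queries above; PUBMED_PREFIX is only the first
# two clauses of the much longer PubMed query.
BOOKSHELF_TEMPLATE = (
    '(("clinical guidelines"[Resource Type]) '
    'OR "practice guideline"[Publication Type]) AND "[condition name]"'
)
PUBMED_PREFIX = '("[condition name]"[tiab:~0]) AND ("english and humans"[Filter])'

PLACEHOLDER = "[condition name]"


def fill_condition_name(template, condition_name):
    """Insert a condition name into a query template.

    Assumption: double quotes inside the name are replaced by spaces so they
    cannot close the quoted phrase early; runs of whitespace are collapsed.
    """
    cleaned = " ".join(condition_name.replace('"', " ").split())
    if not cleaned:
        raise ValueError("condition_name must contain visible characters")
    return template.replace(PLACEHOLDER, cleaned)


bookshelf_query = fill_condition_name(BOOKSHELF_TEMPLATE, "Cystic fibrosis")
pubmed_query_start = fill_condition_name(PUBMED_PREFIX, "Cystic  fibrosis")
print(bookshelf_query)
print(pubmed_query_start)
```

The filled-in strings can be pasted into the Bookshelf or PubMed search box, or passed as the search term to a programmatic client.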
Are the queries for PubMed and Bookshelf Practice Guidelines comprehensive of all possible publications?
No. These queries rely on criteria such as how articles are indexed and the disease names used. Although we cannot guarantee that these searches will capture all potentially relevant practice guidelines, we rigorously tested various search criteria to ensure that they return highly relevant publications. If there are specific practice guidelines that are not returned by these results, you can help us improve MedGen by reporting them to us at medgen_help@ncbi.nlm.nih.gov.
What are the Curated Practice Guidelines?
The team of Medical Genetics Curators at NCBI has identified professional and medical societies that issue practice guidelines that are not included in PubMed or Bookshelf. We add these manually to display under the section "Curated Practice Guidelines". To ensure current guidelines are available to the MedGen community, we review these organizations' websites regularly to identify new, updated, or retired guidelines and update them in this section of MedGen. If there are specific practice guidelines that are not listed in this section, you can help us improve MedGen by reporting them to us at medgen_help@ncbi.nlm.nih.gov.
Data and scope of concepts in MedGen
Why aren't all terms from SNOMED CT in MedGen?
MedGen includes terms and their identifiers from SNOMED CT based only on the semi-annual releases from UMLS. Thus, MedGen may be up to 6 months out of date with SNOMED CT. MedGen also limits its scope to concepts of interest to Medical Genetics and Infectious Diseases. So some SNOMED CT terms are not included, no matter how long they have been established, because they are out of scope, e.g. immunologic factors.
How are outdated data processed?
Following data releases from our authoritative sources, MedGen reconciles and remaps disease concepts to updated identifiers. Retired data are handled as follows:
- If a source retires its identifier, MedGen no longer displays the identifier, but the identifier is stored in the record history.
- If UMLS retires a concept unique identifier (CUI), MedGen will generate a CN-formatted CUI replacement to support the disease concept as represented by authoritative sources.
- When a source retires a disease name or attribute, and there is another source for the same disease name or attribute, the retired source is removed but the attribute remains active with the current source. Reports on MedGen’s FTP site may include these retired sources for disease names, but the data will not display on MedGen webpages or in the XML version of the record.
- When a source retires a disease name or attribute, and there is no other source for the same disease name or attribute, the disease name or attribute and the source are removed from the website, XML, and FTP reports.
- When there are no authoritative sources to support a concept, but the concept is needed to support conditions tested in the NIH Genetic Testing Registry (GTR) or variant interpretations in ClinVar, MedGen will retain the concept with a CN identifier.
- Disease concepts that have no authoritative source attributes and are not needed by GTR or ClinVar submissions are removed from public display and from reports on the FTP site.
MedGen’s internal database maintains records of the retired or “deleted” identifiers, names, or obsoleted records; they are flagged as “deleted” data but are physically retained within the database. The records can be restored if or when they are needed in MedGen to support submissions or authoritative terminologies. Tracking the history and evolution of records in MedGen requires that we maintain these historic data points.
Which UMLS codes are excluded from or included in MedGen?
HPO, MeSH, OMIM, Orphanet, NCI, and SNOMED CT are all authoritative sources for MedGen. The identifiers from these sources are provided in MedGen for the relevant disease concepts and are aligned to UMLS concept unique identifiers (CUIs) as much as possible while respecting distinctions from the original source. MedGen does not provide separate listings for genes, drugs, medical procedures, or tests that may be within the scope of these vocabularies in UMLS.
The "Semantic Type" (STY) categories from UMLS that are in scope for MedGen are congenital abnormality, finding, molecular function, pathologic function, disease or syndrome, mental or behavioral dysfunction, cell or molecular dysfunction, sign or symptom, anatomical abnormality, and neoplastic process. Occasionally, additional STYs are included in MedGen as needed to represent concepts from the Human Phenotype Ontology (HPO) or to facilitate CUI matching to Mondo records. (Note: Mondo is not part of the UMLS.)
The scope of records from HPO that are included in MedGen is limited to terms under the "Phenotypic abnormality" (HP:0000118) branch. Concepts from the "Mode of Inheritance" (HP:0000005) branch are brought in as a controlled value list for data submissions to ClinVar and the NIH Genetic Testing Registry (GTR). HPO concepts from "Medical history", "clinical modifiers", and other branches of HPO are not in scope for MedGen at this time.
How can I extract a report of MedGen identifiers and their relationships to MIM numbers and HPO identifiers?
There are multiple ways to access these data.
OMIM. If the starting point is the OMIM number, the following file on Gene's ftp site reports the MedGen concept identifiers that match the phenotype records: ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen
Not all OMIM numbers have a corresponding record in MedGen; genes are out of scope, as are some named protein variants.
The scope of OMIM records that will be represented in MedGen is under review, and we welcome your feedback.
HPO. If the focus is data from HPO, two files are available on MedGen's ftp site:
- ftp://ftp.ncbi.nih.gov/pub/medgen/MedGen_HPO_Mapping.txt.gz
- ftp://ftp.ncbi.nih.gov/pub/medgen/MedGen_HPO_OMIM_Mapping.txt.gz
The README files at both sites provide all the details.
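As a rough illustration of working with the OMIM-based file, the sketch below maps MIM numbers to MedGen CUIs from tab-delimited text. The column layout (MIM number, GeneID, type, Source, MedGenCUI, Comment), the "-" marker for missing values, and the "phenotype" type value are assumptions to check against the README; the sample rows and identifiers are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the ASSUMED layout of mim2gene_medgen.
SAMPLE = (
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment\n"
    "100001\t-\tphenotype\t-\tC0000001\t-\n"
    "100001\t-\tphenotype\t-\tC0000002\t-\n"
    "100002\t9999\tgene\t-\t-\t-\n"
    "100003\t-\tphenotype\t-\t-\t-\n"
)


def mim_to_medgen(handle):
    """Return {MIM number: set of MedGen CUIs} for phenotype rows.

    A set is used because one MIM number may be reported for more than
    one CUI. Rows without a CUI (assumed to be marked "-") are skipped.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5 or row[0].startswith("#"):
            continue  # header, blank, or malformed line
        mim, _gene_id, record_type, _source, cui = row[:5]
        if record_type == "phenotype" and cui != "-":
            mapping[mim].add(cui)
    return dict(mapping)


mapping = mim_to_medgen(io.StringIO(SAMPLE))
print(mapping)
```

For the real file, the same function can be given an open text file handle in place of the io.StringIO sample.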
How can I extract a report of MedGen identifiers and their relationships to other concepts (such as hierarchies)?
MedGen's ftp site provides a compressed file named MGREL.RRF.gz or, in the csv folder, a series of files (split to make them manageable) named MGREL_(number).csv. The fourth column, REL, includes the values PAR for parent, CHD for child, and SIB for sibling. These can be used in conjunction with the CUI1 and CUI2 values to construct hierarchies. The usage in MedGen for the REL column is consistent with that of UMLS; see https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/abbreviations.html
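A minimal sketch of constructing a hierarchy from PAR and CHD rows follows. It assumes pipe-delimited RRF lines with CUI1 in the first column, REL in the fourth, and CUI2 in the fifth, and it assumes the UMLS reading of REL as the relationship of the second concept to the first (PAR means CUI2 is the parent of CUI1). Both assumptions should be confirmed against the README; the sample rows and identifiers are invented.

```python
from collections import defaultdict

# Invented rows in an ASSUMED RRF layout: CUI1|AUI1|STYPE1|REL|CUI2|...
SAMPLE_ROWS = [
    "#CUI1|AUI1|STYPE1|REL|CUI2|AUI2|RELA|RUI|SAB|SL|SUPPRESS|",
    "C0000002|A2|CUI|PAR|C0000001|A1||R1|DEMO|DEMO|N|",
    "C0000003|A3|CUI|PAR|C0000002|A2||R2|DEMO|DEMO|N|",
    "C0000001|A1|CUI|CHD|C0000004|A4||R3|DEMO|DEMO|N|",
    "C0000002|A2|CUI|SIB|C0000004|A4||R4|DEMO|DEMO|N|",
]


def build_children(lines):
    """Return {parent CUI: set of child CUIs} from PAR and CHD rows."""
    children = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) < 5:
            continue  # malformed row
        cui1, rel, cui2 = fields[0], fields[3], fields[4]
        if rel == "PAR":    # assumed: CUI2 is the parent of CUI1
            children[cui2].add(cui1)
        elif rel == "CHD":  # assumed: CUI2 is a child of CUI1
            children[cui1].add(cui2)
        # SIB and other REL values do not define parent/child edges.
    return dict(children)


def descendants(children, root):
    """Collect every CUI below root, tolerating cycles in the data."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


tree = build_children(SAMPLE_ROWS)
print(sorted(descendants(tree, "C0000001")))
```

Storing edges as parent-to-children sets makes duplicate PAR/CHD rows for the same pair harmless, and the iterative traversal avoids recursion limits on deep hierarchies.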
Last update: March 2025