JAMA Dermatol. 2018 Dec 5. doi: 10.1001/jamadermatol.2018.4473. [Epub ahead of print]

Use of Big Data to Estimate Prevalence of Defective DNA Repair Variants in the US Population.

Author information

Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland.
Academy Enrichment Program Scholar, Office of Intramural Training & Education, Office of the Director, National Institutes of Health, Bethesda, Maryland.
Human Genetics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland.

Importance:

Wide use of genomic sequencing to diagnose disease has raised concern about the extent of genotype-phenotype correlations.

Objective:

To correlate disease-associated allele frequencies with expected and reported prevalence of clinical disease.

Design, Setting, and Participants:

Xeroderma pigmentosum (XP), a recessive, cancer-prone, neurocutaneous disorder, was used as a model for this study. From January 1, 2017, to May 4, 2018, the Human Gene Mutation Database and a cohort of patients at the National Institutes of Health were searched and screened to identify reported mutations associated with XP. The clinical phenotype of these patients was confirmed from reports in the literature and National Institutes of Health medical records. The genetically predicted prevalence of disease based on frequency of known pathogenic mutations was compared with the prevalence of patients clinically diagnosed with phenotypic XP. Exome sequencing of more than 200 000 alleles from the Genome Aggregation Database, the National Cancer Institute Division of Cancer Epidemiology and Genetics database of healthy controls, and an Inova Hospital Study database was used to investigate the frequencies of these mutations in the general population.
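Looking up a variant's frequency in an exome database comes down to dividing the number of alleles that carry the variant by the number of alleles genotyped at that site (gnomAD reports these as allele count and allele number). Below is a minimal sketch of pooling such counts across several databases. The `CohortCounts` structure and `pooled_allele_frequency` helper are hypothetical, and the abstract does not say whether or how the 3 databases were pooled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortCounts:
    """Counts for one variant in one database (hypothetical structure)."""
    name: str
    allele_count: int   # alleles carrying the variant (AC)
    allele_number: int  # total alleles genotyped at the site (AN)


def pooled_allele_frequency(cohorts: list[CohortCounts]) -> float:
    """Pool raw counts across databases: sum(AC) / sum(AN).

    Assumption: the databases sample overlapping-free individuals,
    so their counts can simply be added.
    """
    total_ac = sum(c.allele_count for c in cohorts)
    total_an = sum(c.allele_number for c in cohorts)
    if total_an == 0:
        raise ValueError("no genotyped alleles to pool")
    return total_ac / total_an
```

Summing counts, instead of averaging per-database frequencies, weights each database by the number of alleles it actually genotyped.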

Main Outcomes and Measures:

Listing of all reported mutations associated with XP, their frequencies in 3 large exome sequence databases, determination of the number of patients in the United States with XP using modeling equations, and comparison of the observed and reported numbers of patients with XP with specific mutations.

Results:

A total of 156 pathogenic missense and nonsense mutations associated with XP were identified in the National Institutes of Health cohort and the Human Gene Mutation Database. The Genome Aggregation Database provided frequency data for 65 of these mutations, with a total allele frequency of 1.13%. The XPF (ERCC4) mutation, p.P379S, had an allele frequency of 0.4%, and the XPC mutation, p.P334H, had an allele frequency of 0.3%. With the Hardy-Weinberg equation, it was estimated that there should be more than 8000 patients in the United States who are homozygous for these 2 mutations. In contrast, only 3 patients with XP were reported as having the XPF mutation, and 1 patient was reported as having the XPC mutation.
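The "more than 8000" figure appears to follow from the Hardy-Weinberg relation. Under that relation, a variant with allele frequency q is expected in the homozygous state in a fraction q² of the population. Below is a minimal sketch that applies this to the 2 variants named above. The US population of 325 million is an assumption, since the abstract does not state the figure used. The calculation also assumes Hardy-Weinberg equilibrium and ignores compound heterozygotes.

```python
def expected_homozygotes(allele_freq: float, population: int) -> float:
    """Hardy-Weinberg: expected number of q/q homozygotes is q**2 * N."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    return allele_freq ** 2 * population


# Assumption: approximate US population; not stated in the abstract.
US_POPULATION = 325_000_000

# Allele frequencies reported in the abstract (0.4% and 0.3%).
freqs = {"XPF p.P379S": 0.004, "XPC p.P334H": 0.003}

per_variant = {name: expected_homozygotes(q, US_POPULATION)
               for name, q in freqs.items()}
total = sum(per_variant.values())

for name, n in per_variant.items():
    print(f"{name}: ~{n:,.0f} expected homozygotes")
print(f"Total: ~{total:,.0f}")
```

With these inputs the 2 variants contribute roughly 5200 and 2925 expected homozygotes, about 8125 in total. Set against the 3 and 1 patients actually reported, this is the gap the study highlights.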

Conclusions and Relevance:

The findings from this study suggest that clinicians should approach large genomic databases with caution when trying to correlate the clinical implications of genetic variants with disease prevalence. Unsuspected mutations in known skin cancer predisposition genes may be responsible for some of the high frequency of skin cancers in the general population.