Am J Hum Genet. 2019 Nov 7;105(5):974-986. doi: 10.1016/j.ajhg.2019.09.027. Epub 2019 Oct 24.

A Genocentric Approach to Discovery of Mendelian Disorders.

Author information

1. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
2. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
3. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
4. Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC 27710, USA; Department of Medicine, Duke University Medical Center, Durham, NC 27710, USA.
5. Pediatric Genetic and translational Medicine Center (P-GeM), Stanley Manne Children's Research Institute, Chicago, IL 60611, USA; Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
6. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA.
7. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA.
8. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; School of Public Health, UTHealth, Houston, TX 77030, USA.
9. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA. Electronic address: agibbs@bcm.edu.

Abstract

The advent of inexpensive, clinical exome sequencing (ES) has led to the accumulation of genetic data from thousands of samples from individuals affected with a wide range of diseases, but for whom the underlying genetic and molecular etiology of their clinical phenotype remains unknown. In many cases, detailed phenotypes are unavailable or poorly recorded and there is little family history to guide study. To accelerate discovery, we integrated ES data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, in an Apache Hadoop data lake (Hadoop Architecture Lake of Exomes [HARLEE]) and implemented a genocentric analysis that rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. The approach did not rely on case-specific phenotypic classifications but was driven by optimization of gene- and variant-level filter parameters utilizing historical Mendelian disease-gene association discovery data. Variants in 19 of the 154 candidate genes were subsequently reported as causative of a Mendelian trait and additional data support the association of all other candidate genes with disease endpoints.

KEYWORDS: HARLEE; Hadoop; Mendelian disease; big data; clan genomics; data lake; developmental disorder; genotype-first; ultra-rare; whole-exome sequencing

PMID: 31668702
PMCID: PMC6849092 [Available on 2020-05-07]
DOI: 10.1016/j.ajhg.2019.09.027
