Hum Mol Genet. 2015 Apr 1;24(7):1908-17. doi: 10.1093/hmg/ddu607. Epub 2014 Dec 8.

Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity.

Author information

1 Department of Biomedical Engineering and Institute for Computational Medicine, The Johns Hopkins University, Baltimore, MD, USA.
2 McKusick-Nathans Institute of Genetic Medicine.
3 Department of Biomedical Engineering and Institute for Computational Medicine, The Johns Hopkins University, Baltimore, MD, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. karchin@jhu.edu

Abstract

Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g. positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities, rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error-prone, which is problematic for training supervised learning algorithms (the so-called garbage in, garbage out phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements, rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Employing cross-validation, classifier prediction and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants by three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.
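
The evaluation described in the abstract (predict a continuous trait under cross-validation, then correlate predictions with measured values) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the single made-up feature, the synthetic trait values, the ordinary least-squares model, and the leave-one-out scheme are stand-ins, not the authors' features, data, classifier, or cross-validation design.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def leave_one_out_predictions(xs, ys):
    """Predict each held-out point from a model fit on all other points."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Synthetic toy values, NOT data from the study: one hypothetical
# per-variant feature score and one hypothetical quantitative trait.
feature_scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
measured_trait = [22.0, 41.0, 48.0, 70.0, 79.0, 101.0]

predicted = leave_one_out_predictions(feature_scores, measured_trait)
r = pearson_r(predicted, measured_trait)
print(f"Pearson r between held-out predictions and measurements: {r:.3f}")
```

Because each prediction comes from a model that never saw the held-out variant, the correlation reflects generalization to unseen variants instead of fit to the training data.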

PMID: 25489051
PMCID: PMC4366609
DOI: 10.1093/hmg/ddu607
[Indexed for MEDLINE]
Free PMC Article
