Hum Mutat. 2019 Sep;40(9):1314-1320. doi: 10.1002/humu.23825. Epub 2019 Jun 24.

Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges.

Author information

1. Biomedical Informatics Training Program, Stanford University, Stanford, California.
2. Department of Dermatology, Stanford School of Medicine, Stanford, California.
3. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas.
4. Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas.
5. Department of Pharmacology, Baylor College of Medicine, Houston, Texas.
6. Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas.
7. Innovations Labs, Tata Consultancy Services, Hyderabad, India.
8. Khoury College of Computer and Information Sciences, Northeastern University, Boston, Massachusetts.
9. Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington.
10. Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana.
11. BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy.
12. Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey.
13. Department of Pharmacy and Biotechnology, Bologna Biocomputing Group, University of Bologna, Italy.
14. Institute of Biomembrane and Bioenergetics, Consiglio Nazionale delle Ricerche, Bari, Italy.
15. Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland.
16. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland.
17. Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California.
18. Departments of Bioengineering, Biomedical Data Science, Genetics, and Medicine, Stanford University, Stanford, California.

Abstract

Genetics plays a key role in venous thromboembolism (VTE) risk; however, risk factors established in European populations do not translate to individuals of African descent because allele frequencies differ between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status from exome data of African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for either VTE or non-VTE causes and were asked to predict which condition each subject had been treated for. Given the lack of training data, many participants opted for unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best-performing method that used only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used to predict VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
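
As a rough illustration of the evaluation described above, the following Python sketch computes a weighted genetic risk score per subject and then scores the predictions by area under the ROC curve. It is a minimal sketch under stated assumptions, not the method of any challenge participant: the variant names, weights, genotypes, and labels are hypothetical toy values, and the helper names `risk_score` and `roc_auc` are introduced here only for illustration.

```python
def risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2) over the scored variants."""
    return sum(weights[v] * genotype.get(v, 0) for v in weights)


def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (case, control) pairs in
    which the case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: placeholder variants and weights, not real VTE loci.
weights = {"variant_a": 1.0, "variant_b": 0.5}
genotypes = [
    {"variant_a": 1, "variant_b": 0},
    {"variant_a": 0, "variant_b": 1},
    {"variant_a": 0, "variant_b": 0},
    {"variant_a": 2, "variant_b": 1},
]
labels = [0, 1, 0, 1]  # 1 = treated for VTE, 0 = treated for a non-VTE cause

scores = [risk_score(g, weights) for g in genotypes]
auc = roc_auc(labels, scores)
print(scores, auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the reported 0.65 should be read.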

KEYWORDS: exomes; machine learning; phenotype prediction; prediction challenge; venous thromboembolism

PMID: 31140652
DOI: 10.1002/humu.23825
