Defining Phenotypes from Clinical Data to Drive Genomic Research

Annu Rev Biomed Data Sci. 2018 Jul:1:69-92. doi: 10.1146/annurev-biodatasci-080917-013335. Epub 2018 Apr 25.

Abstract

The rise in available longitudinal patient information in electronic health records (EHRs) and the coupling of EHRs to DNA biobanks have resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs provide a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenome available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients; current efforts typically apply rule-based and machine learning algorithms to sometimes huge corpora for both genome-wide and phenome-wide approaches. We highlight here the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomics discovery. Use of EHR data has proven to be a powerful method for elucidating genomic influences on diseases, traits, and drug-response phenotypes and will continue to find increasing application in large cohort studies.

Keywords: GWAS; PheWAS; biobank; electronic health record; genomics; phenotyping.