Addict Biol. 2019 Sep;24(5):1056-1065. doi: 10.1111/adb.12670. Epub 2018 Oct 4.

Using DNA methylation to validate an electronic medical record phenotype for smoking.

Author information

Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA.
Yale School of Medicine, New Haven, CT, USA.
Yale School of Public Health, New Haven, CT, USA.
VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA.
University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Vanderbilt University Medical Center, Nashville, TN, USA.
Geriatric Research Education and Clinical Centers (GRECC), Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN, USA.
University of Washington, Seattle, WA, USA.


A validated, scalable approach to characterizing (phenotyping) smoking status is needed to facilitate genetic discovery. Using established DNA methylation sites from blood samples as a criterion standard for smoking behavior, we compare three candidate electronic medical record (EMR) smoking metrics based on longitudinal EMR text notes. With data from the Veterans Aging Cohort Study (VACS), we employed a validated algorithm to translate each smoking-related text note into current, past or never categories. We compared three alternative summary characterizations of smoking: most recent, modal and trajectories using descriptive statistics and Spearman's correlation coefficients. Logistic regression and area under the curve analyses were used to compare the associations of these phenotypes with the DNA methylation sites, cg05575921 and cg03636183, which are known to have strong associations with current smoking. DNA methylation data were available from the VACS Biomarker Cohort (VACS-BC), a sub-study of VACS. We also considered whether the associations differed by the certainty of trajectory group assignment (<0.80/≥0.80). Among 140 152 VACS participants, EMR summary smoking phenotypes varied in frequency by the metric chosen: current from 33 to 53 percent; past from 16 to 24 percent and never from 24 to 33 percent. The association between the EMR smoking pairs was highest for modal and trajectories (rho = 0.89). Among 728 individuals in the VACS-BC, both DNA methylation sites were associated with all three EMR summary metrics (p < 0.001), but the strongest association with both methylation sites was observed for trajectories (p < 0.001). Longitudinal EMR smoking data support using a summary phenotype, the validity of which is enhanced when data are integrated into statistical trajectories.
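As an illustration of the two simpler summary phenotypes described above, the sketch below derives a "most recent" and a "modal" smoking status from one participant's dated note categories. It is not the study's algorithm: the function name `summarize_smoking`, the input layout, and the tie-breaking rule (ties go to the tied status seen most recently) are all assumptions made for this example. The trajectory phenotype requires statistical modeling and is not covered.

```python
from collections import Counter
from datetime import date


def summarize_smoking(notes):
    """Summarize one participant's smoking notes (hypothetical helper).

    notes: list of (note_date, status) pairs, where status is one of
    "current", "past" or "never", as produced by some upstream
    text-classification step.
    """
    if not notes:
        raise ValueError("no smoking-related notes for this participant")

    # Order notes chronologically so the last entry is the most recent.
    ordered = sorted(notes, key=lambda n: n[0])
    most_recent = ordered[-1][1]

    # Modal status: the category recorded most often.
    counts = Counter(status for _, status in ordered)
    top = max(counts.values())
    tied = {s for s, c in counts.items() if c == top}

    # Assumed tie-break: pick the tied status that appears latest in time.
    modal = next(s for _, s in reversed(ordered) if s in tied)

    return {"most_recent": most_recent, "modal": modal}


example_notes = [
    (date(2010, 1, 5), "current"),
    (date(2011, 3, 9), "current"),
    (date(2012, 7, 1), "current"),
    (date(2013, 2, 14), "past"),
]
summary = summarize_smoking(example_notes)
print(summary)  # {'most_recent': 'past', 'modal': 'current'}
```

In this made-up example the two metrics disagree, which shows how the frequency of each smoking category can shift with the metric chosen.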


Keywords: DNA methylation; EMR smoking; smoking methylation

[Available on 2020-09-01]
