Mol Med. 2017 Nov;23:285-294. doi: 10.2119/molmed.2017.00100. Epub 2017 Aug 31.

Efficient genome-wide association in biobanks using topic modeling identifies multiple novel disease loci.

Author information

Center for Quantitative Health, Division of Clinical Research and Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114.
Partners Research Information Systems and Computing, Partners HealthCare System, One Constitution Center, Boston, MA 02129.


Biobanks and national registries are powerful tools for genomic discovery, but they rely on diagnostic codes that may be unreliable and that fail to capture the relationships among related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records (EHR) for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as with those obtained when only a subset of diagnostic codes was included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p<1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than for single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of phenome-wide association in biobanks with coded clinical data.
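
The two-step idea in the abstract (fit disease topics from diagnostic codes, then test each topic against genotype) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the collapsed Gibbs sampler, the hyperparameters alpha and beta, the toy code lists, the hypothetical allele dosages, and the plain least-squares slope used as the association statistic. The study itself fit 50 topics; the sketch uses 2, and a real analysis would rely on an established LDA implementation and a GWAS tool with covariate adjustment.

```python
import random


def fit_lda_gibbs(docs, n_codes, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for latent Dirichlet allocation.

    docs: one list of integer diagnostic-code indices per participant.
    Returns (theta, phi): per-participant topic proportions and
    per-topic code probabilities.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]             # participant x topic counts
    topic_code = [[0] * n_codes for _ in range(n_topics)]  # topic x code counts
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every code occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_code[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_code[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_code[k][w] + beta) / (topic_total[k] + n_codes * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_code[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_code[k][w] + beta) / (topic_total[k] + n_codes * beta) for w in range(n_codes)]
        for k in range(n_topics)
    ]
    return theta, phi


def ols_slope(x, y):
    """Least-squares slope of y regressed on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy data (hypothetical): six participants, six diagnostic codes indexed 0-5.
docs = [[0, 1, 2, 0, 1], [0, 2, 1, 1], [3, 4, 5, 3],
        [4, 5, 5, 3, 4], [0, 1, 2, 2], [3, 5, 4, 4]]
dosage = [2, 1, 0, 0, 2, 1]  # hypothetical allele counts at one variant

theta, phi = fit_lda_gibbs(docs, n_codes=6, n_topics=2)
# One association statistic per topic: topic loading regressed on dosage.
betas = [ols_slope(dosage, [row[k] for row in theta]) for k in range(2)]
```

Each participant's topic loading is a continuous value between 0 and 1, so the phenotype draws on every diagnostic code that loads on the topic rather than on the presence or absence of a single code. Repeating the regression over every variant and every topic gives the per-topic genome-wide scan described above.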


ICD9; biobank; cluster analysis; coded clinical data; genetic association; genome-wide association; latent Dirichlet allocation; registry; topic modeling

[Indexed for MEDLINE]
Free PMC Article
