The Electronic Medical Records and Genomics (eMERGE) Network is an NIH-funded consortium of five institutions with DNA data linked to electronic medical records (EMRs). (All of the institutions agreed to contribute their genomic association results to dbGaP at the National Library of Medicine.) The goal of the consortium is to assess the utility of EMRs as resources for genomic science. The project includes an ethics component, community engagement, and the use of natural language processing to interpret EMRs. Each institution individually proposed a genome-wide association study (GWAS) of about 3,000 subjects with a particular phenotype of interest (e.g., type 2 diabetes, cataracts, dementia, heart disease, or peripheral vascular disease) and an associated comparison group.

Several important lessons have been learned from the consortium's experience. First, patient data obtained during the normal course of clinical care have proven to be a valid source for replicating genome-phenome associations that previously had been reported only in carefully qualified research cohorts. Second, although the individual institutions initially thought that the effect sizes (odds ratios) they expected were large enough for their studies to be adequately powered, in most cases the entire network was needed to detect genome-wide associations. Third, high-quality EMR-derived phenotypes require four elements: codes (including ICD codes, although a code has to recur multiple times in a record to gain validity), laboratory-medicine results, medication histories, and natural language processing of physician comments. Extracting phenotypes from narrative text, in combination with the other three elements, is essential for achieving high predictive values. Fourth, although the five electronic medical record systems have widely varying structures, coding systems, user interfaces, and users, a phenotype definition that had been validated at one site could be transported across the network with almost no degradation of its specificity and precision.
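The third lesson can be illustrated with a minimal sketch of a rule-based phenotype check that combines the four elements. It is not an eMERGE algorithm. The record layout, the function names, the choice of type 2 diabetes as the example, and every threshold below (two coded encounters, an HbA1c cutoff of 6.5, two supporting evidence sources) are assumptions made only for illustration, and the natural language processing step is assumed to have already produced a set of extracted concept strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    diagnosis_codes: List[str] = field(default_factory=list)   # one entry per coded encounter
    lab_results: Dict[str, float] = field(default_factory=dict)
    medications: Set[str] = field(default_factory=set)
    nlp_concepts: Set[str] = field(default_factory=set)        # output of an upstream NLP step


def is_candidate_case(rec: PatientRecord, min_code_count: int = 2) -> bool:
    """Flag a candidate type 2 diabetes case (illustrative rules only)."""
    # A single code is treated as weak evidence; require it to recur.
    code_hits = sum(1 for code in rec.diagnosis_codes if code.startswith("250"))
    has_repeated_codes = code_hits >= min_code_count

    # Supporting evidence from the other three elements.
    has_lab = rec.lab_results.get("hba1c", 0.0) >= 6.5          # assumed cutoff
    has_medication = "metformin" in rec.medications             # assumed drug list of one
    has_nlp_mention = "type 2 diabetes" in rec.nlp_concepts

    supporting = sum([has_lab, has_medication, has_nlp_mention])
    return has_repeated_codes and supporting >= 2


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged patients confirmed as true cases on chart review."""
    return true_positives / (true_positives + false_positives)
```

In a sketch like this, the predictive value would be estimated by having clinicians review a sample of flagged charts and passing the resulting counts to positive_predictive_value; a patient with only one coded encounter is not flagged, whatever the other evidence shows.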

Another lesson of critical importance is that the major impediments the eMERGE Consortium has had to address are policy related rather than technical. For instance, a particular challenge has been to achieve meaningful data sharing while respecting patient privacy concerns and adhering to applicable regulations and laws (Kho et al. 2011; Masys 2011; McGuire et al. 2011). eMERGE has addressed this issue, in part, by developing a simplified Data Use Agreement (see Appendix D).

