
Study Description

The electronic Medical Records and Genomics (eMERGE) Network is a consortium of five participating sites (Group Health Seattle, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University) funded by the NHGRI to investigate the use of electronic medical record (EMR) systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 19,000 individuals using EMR-derived phenotypes and DNA from linked biorepositories.

Using electronic phenotyping methods, the consortium used DNA samples from all participating sites to explore the genetic determinants of red cell indices, white blood cell (WBC) differential, diabetic retinopathy, height, serum lipid levels (specifically total cholesterol, high-density lipoprotein [HDL], low-density lipoprotein [LDL], and triglycerides), and autoimmune hypothyroidism. The consortium also used the phenome-wide association study (PheWAS) paradigm to replicate and discover relationships between targeted genotypes and multiple phenotypes.

eMERGE-led studies for which original genotyping was performed and which are included in this merged set:

  • Genome-Wide Association Study on Cataract and HDL in the Personalized Medicine Research Project Cohort, phs000170
  • Development and Use of Network Infrastructure for High-Throughput GWA Studies, phs000234
  • Vanderbilt Genome-Electronic Records (VGER) Project: QRS Duration, phs000188
  • Northwestern NUgene Project: Type 2 Diabetes, phs000237
  • A Genome-Wide Association Study of Peripheral Arterial Disease, phs000203

Sites and participants include:

Vanderbilt University: BioVU, Vanderbilt's DNA databank, is an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses, and represents a key first step in moving the emerging sciences of genomics and pharmacogenomics from research tools to clinical practice. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations.

Marshfield Clinic: The Marshfield Clinic Personalized Medicine Research Project is a population-based biobank in central Wisconsin with more than 20,000 adult subjects who provided written, informed consent to access their medical records and provided a blood sample from which DNA was extracted and plasma and serum stored. In addition to an average of 30 years of medical history data, a questionnaire about environmental exposures, including a detailed food frequency questionnaire, is available to facilitate gene/environment studies.

Northwestern University: The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a self-reported questionnaire and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system covering over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators.

Group Health (GH)/University of Washington (UW): The Aging and Dementia eMERGE study biorepository leverages rich population-based longitudinal data from both electronic medical records and in-depth research data to explore genome-wide associations. Participants include Seattle-area members of GH (a large integrated health care system in Washington State) consented and enrolled in 1) the UW Alzheimer's Disease Patient Registry (ADPR) and 2) the Adult Changes in Thought (ACT) study. The ADPR (PI: Eric B. Larson; NIH/NIA U01 AG 006781) is a population-based registry of incident dementia cases designed to identify all new Alzheimer's Disease cases within GH from 1987 to 1996. Medical history, physical, laboratory testing, and neuropsychological testing were performed on all consenting potential cases for determination of dementia status by a consensus conference. The study base of the ADPR population was stable, with an attrition rate of less than 1%/year. The ACT study (PI: Eric B. Larson; NIH/NIA U01 AG 006781) is an ongoing community-based cohort study of aging and dementia. The original cohort of 2,581 randomly selected dementia-free members aged 65 and older was enrolled in 1994-1996 and expanded by 811 in 2000-2002. Continuous enrollment to maintain a cohort of 2,000 dementia-free persons began in 2005. Participants receive biennial assessment including cognitive status determination. The ACT sub-sample is stable; for the original cohort, median enrollment in GH was 19 years prior to joining the ACT study, and 85% of the cohort has ≥10 years of GH enrollment. DNA for the ADPR participants was obtained through a companion study, Genetic Differences in Cases and Controls (PI: Walter Kukull; NIH/NIA R01 AG007584). DNA obtained through both studies was extracted from blood using Gentra Systems Puregene methods. DNA concentration is determined by UV optical density. All samples are checked for quality by 260/280 ratio. For long-term storage, samples are aliquoted and stored at -70°C.
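
The 260/280 quality check mentioned above can be sketched as a simple absorbance ratio. This is an illustrative example only: the function names and the acceptance window of 1.7 to 2.0 are assumptions (a ratio near 1.8 is commonly cited for pure DNA), and the study description does not state the threshold the site actually applied.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the ratio of UV absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity_check(a260: float, a280: float,
                        low: float = 1.7, high: float = 2.0) -> bool:
    """Check the ratio against a hypothetical acceptance window.

    The default window is an assumption for illustration, not a value
    taken from the study description.
    """
    return low <= a260_a280_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Hypothetical absorbance readings for a single sample.
    print(a260_a280_ratio(0.90, 0.50))
    print(passes_purity_check(0.90, 0.50))
```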

Mayo Clinic: The Mayo biobank is a disease-specific biobank for vascular diseases including peripheral arterial disease (PAD). PAD patients were identified from individuals referred to the non-invasive vascular laboratory for lower extremity arterial evaluation. Since 1997, laboratory findings have been recorded in an electronic database employing an in-house software package for data archiving and retrieval; these data become part of the Mayo EMR. Patients referred to the center with suspected PAD undergo a comprehensive non-invasive evaluation including the ankle-brachial index (ABI), the ratio of blood pressure measured at the ankles to blood pressure measured in the upper arms. Control subjects are identified from patients referred to the Cardiovascular Health Clinic for stress ECG. The prevalence of PAD in patients with normal exercise capacity who do not have inducible ischemia on the stress ECG was <1%. Data regarding risk factors for atherosclerosis such as diabetes, dyslipidemia, hypertension, and smoking are ascertained from the EMR.
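
As a minimal sketch of the ABI calculation, using the standard definition (ankle systolic pressure divided by brachial systolic pressure): the function and parameter names below are hypothetical, and the 0.9 cutoff is a commonly cited clinical threshold included only as an assumption. It is not the case definition used by the study, which is not given in this description.

```python
def ankle_brachial_index(ankle_systolic_mmhg: float,
                         brachial_systolic_mmhg: float) -> float:
    """Return ankle systolic pressure divided by brachial (arm) systolic pressure."""
    if brachial_systolic_mmhg <= 0:
        raise ValueError("Brachial pressure must be positive")
    return ankle_systolic_mmhg / brachial_systolic_mmhg


def is_low_abi(abi: float, cutoff: float = 0.9) -> bool:
    """Flag an ABI at or below a cutoff.

    The default cutoff is an assumption for illustration only.
    """
    return abi <= cutoff


if __name__ == "__main__":
    # Hypothetical measurements in mmHg.
    abi = ankle_brachial_index(ankle_systolic_mmhg=96, brachial_systolic_mmhg=128)
    print(abi, is_low_abi(abi))
```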

  • Study Type:
    • Case-Control
Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. The site also contains data dictionaries, variable summaries, documents, and truncated analyses, whenever available.

Study Inclusion/Exclusion Criteria

Subjects were identified using electronic algorithms deployed against an existing EMR.

For the GWAS of a quantitative trait (e.g., RBC indices), the algorithm developed by Dr. Kullo was used at all sites to exclude RBC values affected by various medical conditions. (Please see additional document RBC_ESR_Algorithm.doc.)

(Please see additional document WBC_Algorithm.doc)

(Please see additional document Height_Algorithm.doc)

(Please see additional document Lipids_Algorithm.doc)

(Please see additional document DiabeticRetinopathy_definition.doc)

(Please see additional document Hypothyroidism_definition.doc)

(Please see additional document Non-melanoma skin cancer.doc)

(Please see additional document Actinic_keratosis_definition.doc)

(Please see additional document Seborrheic keratosis.doc)

Molecular Data
Type                     Source    Platform             Number of Oligos/SNPs  SNP Batch Id  Comment
Whole Genome Genotyping  Illumina  Human660W-Quad_v1_A  592839                 1048965
Whole Genome Genotyping  Illumina  Human1M-Duov3_B      1185051                1049348