Study Description

As part of the Population Architecture using Genomics and Epidemiology (PAGE) study (Phase I), the Epidemiologic Architecture using Genomics and Epidemiology (EAGLE I) project accessed both epidemiologic- and clinic-based collections. The epidemiologic-based collection of EAGLE I included the National Health and Nutrition Examination Surveys (NHANES), ascertained in 1991-1994 (NHANES III), 1999-2002, and 2007-2008. NHANES is a population-based cross-sectional survey, now conducted every year in the United States, that assesses the health status of Americans at the time of ascertainment and tracks trends across survey years. Genetic NHANES consists of 19,613 DNA samples linked to thousands of variables, including demographics, health and lifestyle variables, physical examination variables, laboratory variables, and exposures. NHANES is diverse, with almost one-half of the samples (46.4%) coming from self-reported Mexican Americans and non-Hispanic blacks. In contrast to NHANES, BioVU is a clinic-based collection of >150,000 DNA samples from Vanderbilt University Medical Center linked to de-identified electronic medical records (EMRs). Approximately 12% of BioVU's overall DNA sample collection is from African American, Hispanic, and Asian patients.

The overall goals of PAGE I and EAGLE I were broad and several-fold:

  1. Replicate genome-wide association study (GWAS)-identified variants in European Americans;
  2. Identify population-specific and trans-population genotype-phenotype associations;
  3. Identify genetic and environmental modifiers of these associations.

NHANES is an excellent resource for the study of quantitative traits associated with common human diseases. However, given that the age range of NHANES spans childhood to late adulthood and not all diseases are surveyed, NHANES is less useful for the study of adult-onset diseases such as major cancers. Therefore, under American Recovery and Reinvestment Act (ARRA) funding, EAGLE as part of PAGE I defined eight major cancer sites for genetic analysis in BioVU, Vanderbilt's biorepository linked to de-identified EMRs. The eight major cancers defined for this study were melanoma, breast, ovarian, prostate, colorectal, lung, endometrial, and non-Hodgkin's lymphoma (NHL). Cancer cases were defined using a combination of ICD-9 codes and tumor registry entries. Controls were BioVU participants without cancer, selected to encompass the age and gender distributions of the cancer cases. Targeted genotyping of GWAS-identified variants for these diseases (124 SNPs) and ancestry informative markers (128 AIMs) was performed by the Center for Human Genetics Research Vanderbilt DNA Resources Core. After quality control, a total of 116 cancer-associated SNPs and 122 AIMs were available for downstream analyses.

  • Study Weblinks: EAGLE; PAGE
  • Study Type: Case-Control
  • Number of study subjects that have individual level data available through Authorized Access: 15256

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

Cancer cases and controls were identified using a combination of in-patient and out-patient data as well as tumor registry entries. These data include primary site designations and histology information collected for clinical reporting purposes for the North American Association of Central Cancer Registries. A combination of the tumor registry data, along with ICD-9 billing codes, procedure codes, vital signs, and free text clinical notes, was used to identify cases for eight cancers among all patients aged 18 or older in the Synthetic Derivative (SD) with DNA samples, using the following algorithms:

  • Breast cancer: Three or more mentions of ICD-9 primary code for malignant neoplasm of the female breast and all sub-codes on separate clinic visits OR a tumor registry entry for breast cancer AND female;
  • Colorectal cancer: Tumor registry entry for colorectal cancer;
  • Endometrial cancer: Tumor registry entry for endometrial cancer AND histology AND female;
  • Lung cancer: Tumor registry entry for lung cancer, any location and any type;
  • Melanoma: Three or more mentions of ICD-9 codes for malignant melanoma of skin OR tumor registry entry for melanoma;
  • Non-Hodgkin's lymphoma: Tumor registry entry for non-Hodgkin's lymphoma with histology;
  • Ovarian cancer: Tumor registry entry for ovarian cancer AND female;
  • Prostate cancer: Three or more mentions of ICD-9 codes for malignant neoplasm of prostate OR tumor registry entry for prostate cancer.
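
The case definitions above can be expressed as a small rule-based classifier. The following is a minimal sketch only, not the study's implementation: the `Record` class, its field names, and the cancer labels are hypothetical, the ICD-9 count is assumed to already reflect separate clinic visits where the text requires them, and "AND female" is read as applying to both branches of the breast cancer rule.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified view of one de-identified patient record."""
    sex: str                                             # "F" or "M"
    age: int
    icd9_mentions: dict = field(default_factory=dict)    # cancer -> qualifying ICD-9 mentions
    registry: set = field(default_factory=set)           # cancers with a tumor registry entry
    registry_with_histology: set = field(default_factory=set)  # entries that include histology


# Grouping of the eight cancers by the kind of rule listed in the text.
ICD9_OR_REGISTRY = {"breast", "melanoma", "prostate"}
REGISTRY_ONLY = {"colorectal", "lung", "ovarian"}
REGISTRY_WITH_HISTOLOGY = {"endometrial", "nhl"}
FEMALE_ONLY = {"breast", "endometrial", "ovarian"}


def is_case(rec: Record, cancer: str) -> bool:
    """Return True if the record meets the listed case definition for `cancer`."""
    if rec.age < 18:                       # cases were drawn from patients aged 18 or older
        return False
    if cancer in FEMALE_ONLY and rec.sex != "F":
        return False
    in_registry = cancer in rec.registry
    if cancer in ICD9_OR_REGISTRY:
        # Three or more ICD-9 mentions OR a tumor registry entry.
        return rec.icd9_mentions.get(cancer, 0) >= 3 or in_registry
    if cancer in REGISTRY_WITH_HISTOLOGY:
        return in_registry and cancer in rec.registry_with_histology
    if cancer in REGISTRY_ONLY:
        return in_registry
    raise ValueError(f"no case definition for {cancer!r}")
```

For example, `is_case(Record("F", 55, {"breast": 3}), "breast")` evaluates to `True`, while the same record with `sex="M"` does not qualify.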

Approximately two control samples were identified per case. Controls were matched by sex, race/ethnicity (administratively assigned), and age (within five years of the cases). Controls were required to have at least two clinical narratives, with preference given to records with at least one fully documented history and physical. Exclusion criteria included records with one or more codes for neoplasms, records with a tumor registry entry, and records that had one or more cancer-related keywords in the problem list.
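
The general matching and exclusion rules in the preceding paragraph can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure actually used: the `Person` class, its field names, and the simple "first eligible candidates" selection are all hypothetical, and the stated preference for a fully documented history and physical is modeled as a sort key.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical summary of one record; field names are illustrative."""
    pid: str
    sex: str
    race: str                     # administratively assigned race/ethnicity
    age: int
    n_narratives: int = 0         # number of clinical narratives
    has_full_hp: bool = False     # at least one fully documented history and physical
    neoplasm_codes: int = 0       # count of codes for neoplasms
    in_tumor_registry: bool = False
    cancer_keywords: int = 0      # cancer-related keywords in the problem list


def is_eligible_control(p: Person) -> bool:
    """Apply the general inclusion and exclusion criteria for controls."""
    return (
        p.n_narratives >= 2
        and p.neoplasm_codes == 0
        and not p.in_tumor_registry
        and p.cancer_keywords == 0
    )


def matches(case: Person, ctrl: Person, max_age_gap: int = 5) -> bool:
    """Match on sex, race/ethnicity, and age within five years."""
    return (
        ctrl.sex == case.sex
        and ctrl.race == case.race
        and abs(ctrl.age - case.age) <= max_age_gap
    )


def pick_controls(case: Person, pool: list, per_case: int = 2) -> list:
    """Pick up to `per_case` matched controls, preferring records with a full H&P."""
    candidates = [p for p in pool if is_eligible_control(p) and matches(case, p)]
    candidates.sort(key=lambda p: not p.has_full_hp)   # stable sort: full H&P first
    return candidates[:per_case]
```

A real matching procedure would also need to prevent one control from being assigned to several cases; that bookkeeping is omitted here.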

Additional control criteria are as follows:

  • Breast cancer controls are female only. For women over 40 years of age, we required that records contain at least one mammography BI-RADS score of 1 (negative) or 2 (benign);
  • Endometrial cancer controls are female only;
  • Ovarian cancer controls are female only;
  • For colorectal cancer controls over 50 years of age, we required the keyword "colonoscopy" in the problem list OR a procedure code for colonoscopy;
  • Prostate cancer controls are male only. Male controls aged 40 years or older were required to have at least one prostate specific antigen (PSA) level <4 and a most recent PSA level within the normal range.
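
The cancer-specific control criteria listed above could be checked with a function along these lines. It is a sketch under assumptions: the `Control` class and its fields are hypothetical, and because the text does not define the PSA "normal range", the same cutoff of 4 is assumed for it here.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    """Hypothetical control record; field names are illustrative."""
    sex: str
    age: int
    birads_scores: list = field(default_factory=list)   # mammography BI-RADS scores
    colonoscopy_keyword: bool = False                    # "colonoscopy" in the problem list
    colonoscopy_procedure: bool = False                  # procedure code for colonoscopy
    psa_levels: list = field(default_factory=list)       # chronological, most recent last


PSA_CUTOFF = 4.0  # ASSUMPTION: also used as the upper bound of the "normal range"


def passes_extra_criteria(ctrl: Control, cancer: str) -> bool:
    """Check the additional, cancer-specific requirements for a control."""
    if cancer == "breast":
        if ctrl.sex != "F":
            return False
        if ctrl.age > 40:
            # At least one BI-RADS score of 1 (negative) or 2 (benign).
            return any(score in (1, 2) for score in ctrl.birads_scores)
        return True
    if cancer in ("endometrial", "ovarian"):
        return ctrl.sex == "F"
    if cancer == "colorectal":
        if ctrl.age > 50:
            return ctrl.colonoscopy_keyword or ctrl.colonoscopy_procedure
        return True
    if cancer == "prostate":
        if ctrl.sex != "M":
            return False
        if ctrl.age >= 40:
            # Both conditions from the text are kept explicit, although under the
            # assumed cutoff the second one implies the first.
            return (
                any(level < PSA_CUTOFF for level in ctrl.psa_levels)
                and bool(ctrl.psa_levels)
                and ctrl.psa_levels[-1] < PSA_CUTOFF
            )
        return True
    return True  # no additional criteria are listed for the other cancers
```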

A total of 7,348 cases of cancer were identified in BioVU for targeted genotyping in EAGLE (Table).

Table. Case counts by cancer and race/ethnicity. Cases of specific cancers were determined in the de-identified electronic medical records within BioVU using algorithms implemented in late 2010/early 2011 as described in the text. Race/ethnicity was administratively assigned.

Cancer                      EA    AA    H    A  AI/NA    O    U  Total
Breast                   1,052   163    7   17      2   10   66  1,317
Colorectal                 797    75    6    5      1    5   23    912
Endometrial                203    19    1    1      0    1    8    233
Lung                       782    66    2    3      1    4   43    901
Melanoma                 1,225    23    2    0      0    3   95  1,348
Non-Hodgkin's lymphoma     276    17    1    0      0    2   46    342
Ovarian                    161     7    3    2      0    0   10    183
Prostate                 1,895   172    4    2      0    7   32  2,112
Total                    6,391   542   26   30      4   32  323  7,348

Abbreviations: European American (EA), African American (AA), Hispanic (H), Asian (A), American Indian/Native Alaskan (AI/NA), Other (O), Unknown (U).
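
As a quick consistency check, the row and column totals in the table can be recomputed from the individual cell counts. The snippet below is only an illustrative sketch: the counts are transcribed from the table, while the variable names are hypothetical.

```python
# Column order: EA, AA, H, A, AI/NA, O, U (counts transcribed from the table).
CASES = {
    "Breast":                 [1052, 163, 7, 17, 2, 10, 66],
    "Colorectal":             [797, 75, 6, 5, 1, 5, 23],
    "Endometrial":            [203, 19, 1, 1, 0, 1, 8],
    "Lung":                   [782, 66, 2, 3, 1, 4, 43],
    "Melanoma":               [1225, 23, 2, 0, 0, 3, 95],
    "Non-Hodgkin's lymphoma": [276, 17, 1, 0, 0, 2, 46],
    "Ovarian":                [161, 7, 3, 2, 0, 0, 10],
    "Prostate":               [1895, 172, 4, 2, 0, 7, 32],
}

# Totals as reported in the table.
REPORTED_ROW_TOTALS = {
    "Breast": 1317, "Colorectal": 912, "Endometrial": 233, "Lung": 901,
    "Melanoma": 1348, "Non-Hodgkin's lymphoma": 342, "Ovarian": 183, "Prostate": 2112,
}
REPORTED_COLUMN_TOTALS = [6391, 542, 26, 30, 4, 32, 323]
REPORTED_GRAND_TOTAL = 7348

# Recompute totals from the cells.
row_totals = {cancer: sum(counts) for cancer, counts in CASES.items()}
column_totals = [sum(column) for column in zip(*CASES.values())]
grand_total = sum(row_totals.values())

print(row_totals == REPORTED_ROW_TOTALS)
print(column_totals == REPORTED_COLUMN_TOTALS)
print(grand_total == REPORTED_GRAND_TOTAL)
```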

For the first five cancers defined in BioVU (breast, colorectal, melanoma, ovarian, and prostate cancers), we identified approximately two controls per case for genotyping as defined in the inclusion/exclusion criteria. A total of 8,996 controls were targeted for genotyping. Two controls per case of endometrial cancer, lung cancer, and non-Hodgkin's lymphoma were defined from among the genotyped control samples.

Molecular Data
Type                 Source              Platform                     Number of Oligos/SNPs  SNP Batch Id  Comment
Targeted Genotyping  Applied Biosystems  TaqMan SNP Genotyping Assay  N/A                    N/A           Please refer to submitter ID to link to dbSNP
Targeted Genotyping  Sequenom            Custom Array                 N/A                    N/A