Substudies
phs000674.v3.p3 : Resource for Genetic Epidemiology Research on Aging (GERA)
phs000786.v2.p3 : RPGEH WGS Pilot Study

Study Description

The Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) is a resource developed to facilitate research on the influence of genetic and environmental factors on common diseases and healthy aging. The RPGEH resource links biospecimens, health surveys, and comprehensive electronic medical records on broadly consented adult members of Kaiser Permanente Medical Care Plan, Northern California Region (KPNC). KPNC is an integrated health care delivery system with a membership of approximately 3.3 million people in northern California. The membership of KPNC is representative of the general population in the 14-county area in which facilities are located, although extremes of income are underrepresented. At the end of 2013, the RPGEH resource included: (1) demographic and behavioral surveys from over 430,000 participants; (2) biospecimens (DNA, serum, plasma, and/or saliva) from over 204,000 participants, including over 13,000 pregnant women; (3) genome-wide genotype data (70 billion SNP genotypes) on over 100,000 participants, including the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort; and (4) the longitudinal electronic medical records of the participants.

The RPGEH was developed beginning in 2005 at the Division of Research of Kaiser Permanente Northern California by Catherine Schaefer (Director), Neil Risch (Co-Director), Lisa Croen, Eric Jorgenson, Lawrence Kushi, Charles Quesenberry, Sarah Rowell, Carol Somkin, Stephen Van den Eeden, Larry Walter, and Rachel Whitmer. Funding of the RPGEH was provided to C. Schaefer (PI) and N. Risch (co-PI) by the Wayne and Gladys Valley Foundation, The Ellison Medical Foundation, the Robert Wood Johnson Foundation, Kaiser Permanente Northern California, and the Kaiser Permanente National and Regional Community Benefit Programs. The GERA cohort was funded by a grant from NIH to RPGEH and UCSF (RC2 AG036607; C. Schaefer and N. Risch, PIs). At the time of the award of the RC2 project in late 2009, the RPGEH had established a cohort of about 140,000 individuals who had answered a detailed survey, provided saliva samples for extraction of DNA, and given broad consent for the use of their data in studies of health and disease.

Survey and Cohort Recruitment. Initially, the RPGEH developed electronic disease registries to enable identification of phenotypes, using algorithms applied to EMR data. In 2007, the RPGEH mailed a four-page survey to 1.9 million adult (≥ 18 years old) members of KPNC who had been members for two years or more, to obtain data on demographic and behavioral factors complementary to the clinical data in the EMR. The survey materials included a cover letter introducing the RPGEH, a two-page list of Frequently Asked Questions, and the survey, which included questions on demographic factors such as education, race-ethnicity, income and marital status, dietary factors, physical activity, smoking, and alcohol consumption, as well as reproductive history and reproductive health. Members whose electronic medical records indicated a preference for written communications in Chinese or Spanish received survey materials both in English and in a Chinese or Spanish translation. Approximately 400,000 completed surveys were returned.

Saliva Sample Collection. Beginning in July 2008, respondents to the survey were asked to sign and return a consent form and authorization for use and disclosure of protected health information. The consent form authorized broad use of biospecimens, survey data, and data from participants' electronic health records for use in studies of genetic and environmental influences on health and disease. Respondents who returned completed consent forms were mailed (Oragene) saliva collection kits; more than 132,000 saliva samples were collected in two years. Completed saliva kits were scanned and archived in a temporary biorepository at the KPNC Division of Research.

In late 2009, the RPGEH began collection of saliva samples from the California Men's Health Study (CMHS), a cohort that had been previously assembled in 2002-2003 and had been excluded from the RPGEH survey mailing with the intent of later adding CMHS participants to the assembled RPGEH cohort. The CMHS was developed to facilitate research on prostate cancer and other conditions in older men; the study protocol is described in Enger, et al., 2006. It enrolled and surveyed more than 40,000 men in KPNC, ages 45-69 years, who were members of KPNC during 2002-2003. CMHS men completed two mailed surveys with demographic and behavioral data similar to that of the RPGEH. The data on analogous variables were reconciled and integrated with the data derived from the RPGEH cohort for use in the RPGEH resource. By 2011, RPGEH collected approximately 15,400 saliva samples from men participating in the CMHS.

RPGEH Access and Collaborations Website and Procedures. The RPGEH maintains a web portal for inquiries and applications for collaboration and access to data. The URL is https://rpgehportal.kaiser.org/. The RPGEH has an application process and an Access Review Committee that reviews applications for collaboration and use. For more information, please contact the RPGEH through the website.

Study Inclusion/Exclusion Criteria

Inclusion criteria for the RPGEH Cohort data deposited in dbGaP include all of the following:

  1. Eligible for RPGEH survey
    1. ≥ 18 years of age at time of survey mailing (2007)
    2. KP Northern California Region enrollee for at least 2 years prior to survey
  2. Consented explicitly to have data deposited in NIH-maintained database

Exclusion criteria for the RPGEH Cohort data deposited in dbGaP include any of the following:

  1. Subject requested withdrawal from study after DNA extraction and genotyping
  2. Validity of link between biospecimen and study participant questionable because of genotype-phenotype discordance, e.g. gender
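The criteria above combine as "all inclusion criteria met and no exclusion criterion met". The following is a minimal sketch of that logic only; the `Participant` record and its field names are hypothetical and do not reflect the actual RPGEH or dbGaP data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical fields, chosen only to mirror the listed criteria.
    age_at_survey_mailing: int            # age in years at the 2007 mailing
    years_enrolled_before_survey: float   # KPNC membership before the survey
    consented_to_nih_deposit: bool        # explicit consent for NIH database
    withdrew_after_genotyping: bool = False
    sample_link_questionable: bool = False  # e.g. genotype-phenotype discordance


def eligible_for_dbgap(p: Participant) -> bool:
    """Return True when all inclusion criteria hold and no exclusion applies."""
    survey_eligible = (
        p.age_at_survey_mailing >= 18
        and p.years_enrolled_before_survey >= 2
    )
    included = survey_eligible and p.consented_to_nih_deposit
    excluded = p.withdrew_after_genotyping or p.sample_link_questionable
    return included and not excluded
```

For example, a 45-year-old member with 10 years of enrollment and explicit consent passes, while the same member is dropped if the sample link is flagged as questionable.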

Study History

Survey and Cohort Recruitment. The RPGEH was first funded in 2005 and has worked to build the cohort and data resources continuously since then. Initially, the RPGEH developed electronic disease registries to enable identification of phenotypes, using algorithms applied to EMR data. In 2007, the RPGEH mailed a four-page survey to 1.9 million adult (≥ 18 years old) members of KPNC who had been members for two years or more, to obtain data on demographic and behavioral factors complementary to the clinical data in the EMR. The survey materials included a cover letter introducing the RPGEH, a two-page list of Frequently Asked Questions, and the survey, which included questions on demographic factors such as education, race-ethnicity, income and marital status, indicators for self- and family history of the occurrence of about 35 conditions and diseases, dietary factors, physical activity, smoking, and alcohol consumption, as well as reproductive history and reproductive health. Members whose electronic records indicated a preference for written communications in Chinese or Spanish received survey materials both in English and in a Chinese or Spanish translation. Approximately 400,000 completed surveys were returned.

Saliva Sample Collection. Beginning in July 2008, respondents to the survey were asked to sign and return a consent form and authorization for use and disclosure of protected health information. The consent form authorized broad use of biospecimens, survey data, and data from participants' electronic health records for use in studies of genetic and environmental influences on health and disease. Respondents who returned completed consent forms were mailed (Oragene) saliva collection kits; more than 132,000 saliva samples were collected in two years. Completed saliva kits were scanned and archived in a temporary biorepository at the KPNC Division of Research.

In late 2009, the RPGEH added collection of saliva samples from the California Men's Health Study (CMHS), a cohort that had been previously assembled in 2002-2003 and had been excluded from the RPGEH survey mailing with the intent of later adding CMHS participants to the assembled RPGEH cohort. The CMHS was developed to facilitate research on prostate cancer and other conditions in older men; the study protocol is described in Enger, et al., 2006. It enrolled and surveyed more than 40,000 men in KPNC, ages 45-69 years, who were members of KPNC during 2002-2003. CMHS men completed two mailed surveys with demographic and behavioral data similar to that of the RPGEH. The data on analogous variables were reconciled and integrated with the data derived from the RPGEH cohort for use in the RPGEH resource. By 2011, the RPGEH had collected approximately 15,400 additional saliva samples from men participating in the CMHS.

GERA Genotyping project. In September 2009, the RPGEH received a Grand Opportunity grant from NIA, NIMH, and the NIH Director's Office (RC2 AG036607) that enabled RPGEH to conduct genome-wide genotyping of 100,000 participants, selected from the approximately 147,000 participants who had provided consent and saliva samples up to that time. The RC2 grant was jointly awarded to Kaiser Permanente Division of Research and the UCSF Institute for Human Genetics (Schaefer / Risch, PIs). This project formed the GERA Cohort that is the basis for the deposition of data in dbGaP. The aims of the project included extraction of DNA from 100,000 saliva samples, design of custom microarrays for genotyping (one for each major race-ethnicity group in the cohort), genotyping of 100,000 DNA samples, linkage of the resulting data with clinical data from the EMR, survey data, and environmental data sources to enable analysis of genetic and environmental influences on many diseases and conditions, development of tools for provision of tailored datasets for specific research projects, and deposit of data in dbGaP.

Four custom arrays were designed for genotyping, one for each of the four major race-ethnicity groups in the RPGEH cohort: African Americans, Asians, Latinos, and Non-Hispanic Whites. The number of SNPs and SNP content varied by array, with SNP content designed to maximize the coverage of low frequency and more common variants specific to the different race-ethnicity groups, while also maximizing coverage of the whole genome, and including new SNPs from sequencing projects, and SNPs with established associations with disease phenotypes. Description of the array designs is provided in two publications: Hoffmann et al., 2011a and Hoffmann et al., 2011b. Genotyping was performed at the Genomics Core Facility of the Institute for Human Genetics at UCSF, under the direction of Pui-Yan Kwok, MD, PhD. Description of the DNA extraction and genotyping processes and QC is provided in Kvale et al., 2015 (PMID: 26092718). Description of the analyses of population structure and development of principal components for adjustment of population structure is provided in Banda et al., 2015 (PMID: 26092716).

To maximize the diversity of the sample, the GERA cohort was formed by including all racial and ethnic minority participants with saliva samples (N = 20,925; 19%); the remaining participants were drawn sequentially and randomly from white non-Hispanic participants (N = 89,341; 81%). A total of 110,266 participant samples were included to ensure that 100,000 were successfully assayed.
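The selection rule described above (take every minority participant, then fill the remainder from white non-Hispanic participants) can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: the record layout, the `group` labels, and the function name are hypothetical, and a simple random sample without replacement stands in for the "sequential and random" draw, whose exact mechanics the text does not specify.

```python
import random


def select_cohort(participants, target_size, seed=0):
    """Include all minority participants, then randomly fill the remainder.

    `participants` is a list of dicts with a hypothetical "group" key;
    "white_non_hispanic" marks the group that is sampled rather than
    included in full.
    """
    minority = [p for p in participants if p["group"] != "white_non_hispanic"]
    majority = [p for p in participants if p["group"] == "white_non_hispanic"]

    # Number of places left after every minority participant is included.
    remaining = max(0, target_size - len(minority))

    # Assumption: simple random sampling without replacement.
    rng = random.Random(seed)
    drawn = rng.sample(majority, min(remaining, len(majority)))
    return minority + drawn
```

Over-selecting relative to the assay target, as the study did by including more samples than the 100,000 it aimed to genotype, corresponds to passing a `target_size` larger than the number of successful assays required.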

GERA Imputation project. Imputation was performed on an array-wise basis. Genotypes were pre-phased with SHAPEIT v2.5, with cryptic relatives included to improve phasing. The 1000 Genomes Project (October 2014 release with singletons removed) was used as a cosmopolitan reference panel, and over 31 million variants were imputed with IMPUTE2 v2.3.1 (Hoffmann et al., 2015).

Reconsent for GERA dbGaP data deposition. Although the original consent form signed by RPGEH participants provided for sharing of de-identified data with collaborators, it did not provide explicit consent for placement of participants' data in databases with access controlled by NIH or other third parties. To ensure all participants were appropriately consented for placement of data in dbGaP, the RPGEH mailed new consent forms that included a section explaining dbGaP to all participants. Approximately 77% of participants returned the signed, updated consent form. After excluding samples that failed genotyping and small numbers of invalid or duplicate results, the total number of appropriately consented participants with data for deposition in dbGaP is 78,419. The demographic characteristics of the final GERA cohort for dbGaP are similar to those of the broader GERA genotyped cohort.

Funding. In addition to the NIH funding of the RC2 project that supported the genotyping, the RPGEH has been supported by grants from philanthropic foundations, including the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, and the Robert Wood Johnson Foundation, as well as support from Kaiser Permanente, for work on disease registries, cohort enrollment, survey collection, and collection of biospecimens.

Study Attribution
  • Principal Investigators
    • Catherine Schaefer, PhD. Kaiser Permanente Research Program on Genes, Environment and Health, Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Neil Risch, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
  • Co-Investigators
    • Yambazi Banda, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Elizabeth Blackburn, PhD. Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA.
    • Lisa Croen, PhD. Kaiser Permanente Research Program on Genes, Environment and Health, Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Thomas Hoffmann, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Carlos Iribarren, MD, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Eric Jorgenson, PhD. Kaiser Permanente Research Program on Genes, Environment and Health, Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Lawrence Kushi, PhD. Kaiser Permanente Research Program on Genes, Environment and Health, Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Mark Kvale, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Pui-Yan Kwok, MD, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Charles Quesenberry, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Sarah Rowell, MPH. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Carol Somkin, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Stephen Van den Eeden, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Larry Walter, MA. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Rachel Whitmer, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
  • Funding Sources
    • RC2 AG036607. National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
    • Wayne and Gladys Valley Foundation.
    • Ellison Medical Foundation.
    • Robert Wood Johnson Foundation.
    • Kaiser Permanente, Oakland, CA, USA.
  • Funding Source Contacts: NIH
    • Winifred K. Rossi, MA. National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
  • Funding Source Contacts: All other sources
    • Please contact PI Catherine Schaefer, PhD for additional information.
  • Genotyping Center
    • Genomics Core Facility, Institute for Human Genetics, University of California, San Francisco, CA, USA.
  • Genotyping Quality Control
    • Mark Kvale, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.