- Study Description
This sub-study phs000668 CHARGE-S Atherosclerosis Risk in Communities (ARIC) contains whole genome, whole exome, and targeted sequencing data, and exome SNP array data, produced as part of NHLBI's CHARGE-S project for subjects available from the phs000668 study. Summary level phenotypes for the NHLBI ARIC Cohort study participants can be viewed at the top-level study page phs000280 ARIC Cohort. Individual level phenotype data and molecular data for the ARIC Cohort top-level study and sub-studies are available by requesting Authorized Access to the NHLBI ARIC Cohort phs000280 study.
Genome-wide association studies (GWAS) have successfully localized multiple loci containing common variations influencing coronary heart disease and its risk factors, but in most cases neither the gene underlying disease susceptibility nor the spectrum of candidate functional variants has been identified. The project "Building on GWAS for NHLBI-diseases: the U.S. CHARGE consortium" (the CHARGE sequencing, or CHARGE-S, consortium) is a collaborative effort to leverage existing population, laboratory and computational resources to identify susceptibility genes underlying genome-wide significant and well-replicated GWAS findings for heart, lung and blood diseases and their risk factors. The sequencing approach was funded by NHLBI with funds provided by the American Recovery and Reinvestment Act of 2009 (ARRA). The U.S. CHARGE consortium consists of multiple large population-based longitudinal cohort studies, including the Atherosclerosis Risk in Communities (ARIC) Study (N=15,792), the Cardiovascular Health Study (CHS) (N=5,888), and the Framingham Heart Study (FHS) (N=14,428).
The study has taken a two-pronged approach to following up GWAS findings. First, regional capture targeted sequencing was performed in genomic regions influencing 15 phenotypes to localize causal variants that are responsible for the GWAS signal. The phenotypes examined were atrial fibrillation, blood pressure, body mass index (BMI), bone mineral density, C-reactive protein (CRP), carotid intima-media thickness (IMT), echocardiography, electrocardiogram PR and QRS interval, fasting insulin, hematocrit, pleiotropy, pulmonary function, retinal venule diameter, and stroke. A case-cohort study design was used in which a common reference sample was selected from all three cohorts at baseline. The cohort random sample included 2,000 individuals: 1,000 participants from the ARIC study, 500 participants from FHS, and 500 participants from CHS, in a 1:1 gender ratio. The comparison groups were either selected cases for discrete phenotypes, or participants drawn from the top and/or bottom tail of the distribution for quantitative phenotypes. The size of each comparison group was 200 individuals. Approximately 2 Mb of the genome was sequenced for the targeted loci.
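The composition of the cohort random sample can be illustrated with a short sketch. Only the quotas (1,000 from ARIC, 500 from FHS, 500 from CHS, with a 1:1 gender ratio) come from the description above; the record layout (`id`, `cohort`, `sex`), the fixed seed, and the synthetic pool are hypothetical and are not the study's actual sampling procedure.

```python
import random

# Quotas from the study description; each cohort is split 1:1 by sex.
QUOTAS = {"ARIC": 1000, "FHS": 500, "CHS": 500}


def draw_reference_sample(participants, quotas=QUOTAS, seed=0):
    """Draw a sex-balanced random sample from each cohort.

    `participants` is a list of dicts with hypothetical keys
    'id', 'cohort', and 'sex' ('F' or 'M').
    """
    rng = random.Random(seed)
    sample = []
    for cohort, quota in quotas.items():
        for sex in ("F", "M"):
            pool = [p for p in participants
                    if p["cohort"] == cohort and p["sex"] == sex]
            # random.sample raises ValueError if the pool is too small.
            sample.extend(rng.sample(pool, quota // 2))
    return sample


# Synthetic pool, for illustration only (not study data).
pool = [{"id": f"{c}-{s}-{i}", "cohort": c, "sex": s}
        for c in QUOTAS for s in ("F", "M") for i in range(1500)]
reference = draw_reference_sample(pool)
```

Sampling within each cohort-by-sex stratum, rather than from the pooled population, is what guarantees both the cohort quotas and the 1:1 ratio.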
Second, whole exome capture sequencing and low-pass whole genome sequencing were completed for the cohort random sample and 7 phenotypes for which there were more than 3 GWAS signals in coding regions to detect novel rare and common variants. The phenotypes investigated by whole exome sequencing were age at menopause, electrocardiogram QT interval, fasting blood glucose, fibrinogen level, renal function, Stamler-Kannel-like extremes of risk factors, and waist-to-hip ratio.
Follow-up genotyping using the Illumina HumanExome BeadChip has also been completed. Additional information including variant annotation is available at http://web.chargeconsortium.com/main/exomechip.
All sequencing was carried out at the Human Genome Sequencing Center at the Baylor College of Medicine. This study contains the Atherosclerosis Risk in Communities (ARIC) study subset of CHARGE-S. Additional data from CHARGE-S is also available via dbGaP.
- Study Inclusion/Exclusion Criteria
The inclusion criteria for selection of participants for the CHARGE-S study were full informed consent, sufficient DNA for sequencing, self-reported ethnicity as non-Hispanic white, no first-degree relatives within a subgroup of individuals selected for an extreme trait, and availability of genotyping results from the appropriate phenotype-specific GWAS. Participants from at least one of the three cohorts represented in the CHARGE-S study were included in each phenotype group. Inclusion and exclusion criteria for the individual phenotypes are described below:
A. Targeted sequencing
- Atrial fibrillation: Two hundred (200) subjects from the Massachusetts General Hospital Atrial Fibrillation study with early-onset atrial fibrillation occurring before 66 years of age were selected for sequencing. Participants with evidence of structural heart disease as assessed by echocardiography were excluded.
- Blood pressure: One hundred (100) individuals were selected from the ARIC study, 50 from FHS, and 50 from CHS from both extremes of the standardized residuals of systolic blood pressure and diastolic blood pressure after adjustment for age, age², BMI, and study site if applicable. The regression was stratified by sex, and an equal number of individuals from both sexes were chosen for sequencing. Where data from multiple examinations were available, data from the first available clinical visit were used. For participants taking antihypertensive medication, systolic and diastolic blood pressure were adjusted by adding 10 mm Hg and 5 mm Hg, respectively. Individuals were excluded if they were taking antihypertensive medication (for selection of subjects for the lower tail of the trait); had a history of heart failure prior to measurement of blood pressure; had a systolic blood pressure < 60 mm Hg or a diastolic blood pressure < 20 mm Hg; or had a BMI more than 4 standard deviations from the mean.
- Body mass index (BMI): Two hundred (200) unrelated individuals including 100 participants from the ARIC study, 50 from CHS, and 50 from FHS were sequenced from the high tail of the distribution for BMI based on age- and sex-adjusted residuals. In FHS, subjects were greater than 25 years of age. In the ARIC study and CHS, there were no age restrictions. In all studies, individuals were excluded if BMI < 18.5 kg/m².
- Bone mineral density: One hundred (100) individuals were selected for targeted sequencing from CHS and 100 were chosen from FHS who had extremely low femoral neck bone mineral density (FN BMD) with approximately twice as many women as men. Participants were selected based on an FN BMD T-score (number of standard deviations below young normal values) < -2 and a Z-score < -1.5. After the original CHARGE-S sequencing, the musculoskeletal working group received funding to perform targeted sequencing of all CHARGE-S working group loci in an additional sample of Framingham participants. The 325 samples were selected to have low FN BMD using the following sequential criteria until all 325 samples were selected:
  1. T-score < -2 (for both FN BMD and lumbar spine BMD (LS BMD)) and Z-score < -2 (for both FN BMD and LS BMD)
  2. FN BMD T-score < -2 and FN BMD Z-score < -2
  3. FN BMD T-score < -2 and FN BMD Z-score < -1.5
  4. FN BMD T-score < -1.5 and FN BMD Z-score < -1.5
  5. FN BMD T-score < -1.0 and FN BMD Z-score < -1.5
  6. FN BMD T-score < -1.0 and FN BMD Z-score < -0.5
  For criteria #2 to #6, individuals were excluded if they had LS BMD Z-score > 0. Of the 325 samples, 300 were sequenced, and the ratio of women to men was approximately 2.5:1 because the GWAS findings that generated these candidate genes came from a sample with approximately the same ratio of women and men. Thus there were 81 men and 219 women included in the sample.
- C-reactive protein (CRP) level: One hundred (100) individuals from the ARIC study, 50 from CHS, and 50 from FHS with the highest CRP residuals were chosen from a sex-stratified sample after adjustment for age, hormone therapy, study site, BMI, and lipid therapy. Participants with residuals greater than four times the standard deviation were excluded.
- Carotid intima-media thickness (IMT): Participants were selected for sequencing from the high tail of the common carotid IMT distribution. The study sample included 100 subjects from the ARIC study, 50 subjects from CHS, and 50 subjects from FHS, with an equal number of men and women.
- Echocardiography (left ventricular diastolic dimension): Fifty (50) unrelated males and 50 unrelated females (n=100) from the highest end of the trait distribution in CHS and FHS were sampled for sequencing after adjustment for age, height, weight, and study site if applicable.
- ECG (electrocardiogram) PR interval: Two hundred (200) subjects from the upper tail of the PR trait distribution were selected for sequencing, based on residuals of a model with PR interval as the dependent variable and age, sex, study center, BMI, and height as the independent variables. The sample included 50 men with the highest residuals and 50 women with the highest residuals in the ARIC study, 50 participants from CHS, and 50 participants from FHS. Individuals with a history of atrial fibrillation at baseline; extreme PR interval (<80 ms or >320 ms); pacemaker or defibrillator; Wolff-Parkinson-White (WPW) syndrome; third degree AV block; history of heart failure or myocardial infarction; use of digoxin or class I or class III antiarrhythmic blocking medication; or who were missing covariates used for adjustment were excluded.
- ECG (electrocardiogram) QRS interval: Two hundred (200) subjects were sequenced from the upper tail of the QRS trait distribution including 50 men and 50 women in the ARIC study, 50 individuals from CHS and 50 participants in FHS after applying exclusions. Individuals with atrial fibrillation; history of myocardial infarction or congestive heart failure; a QRS interval > 120 ms; Wolff-Parkinson-White (WPW) syndrome; implantation of a pacemaker; or use of class I and class III antiarrhythmic blocking medication were excluded.
- Fasting insulin: Two hundred (200) subjects were sampled from the high tail of the distribution including 100 individuals in the ARIC study, 50 participants in CHS and 50 participants in FHS. Individuals with known diabetes; who were treated for diabetes; or those with a fasting glucose >7 mmol/L were excluded. The ARIC study and FHS applied a further exclusion of non-fasting individuals. Participants who were missing hemoglobin A1c values were also excluded in the ARIC study, and subjects with type 1 diabetes were excluded in FHS. Selection was gender-specific.
- Hematocrit: Two hundred (200) individuals were selected from the lower tail of the hematocrit distribution including 100 ARIC study participants, 50 CHS participants, and 50 FHS participants. A 50:50 gender ratio was maintained. The residuals from linear regression of hematocrit as a continuous trait with adjustment for age, sex, and study site for multicenter cohorts were calculated for each of the three cohorts. Individuals with hematocrits within 3 standard deviations of the sample mean for each cohort were included in the analysis. Individuals with known malignancies; who smoked; or who had renal failure were excluded.
- Pleiotropy: Pleiotropy, or the influence of a single gene on multiple traits, can be defined operationally as evidence that a region or locus containing one to many genes displays strong associations (p < 5 × 10⁻⁸) with 2 or more traits in multiple genome-wide association studies. Using novel bioinformatics tools that allow cross-trait queries to identify and visualize associations in regions that show a high degree of overlap across traits, 44 regions related to cardiovascular disease were selected for sequencing studies. These regions were assessed in all participants from the ARIC study, CHS, and FHS who were selected for targeted sequencing.
- Pulmonary function: Severe cases of chronic obstructive pulmonary disease (COPD) were selected based on forced expiratory volume in the first second (FEV1) that was less than 65% of the predicted value, and its ratio to forced vital capacity (FEV1/FVC) that was less than the lower limit of normal based on NHANES III prediction equations. A random sample of 200 subjects was selected for sequencing among those who met the severe COPD definition at visits 1 and 2 in the ARIC study, and who had non-missing covariate data.
- Retinal venule diameter: Individuals were selected for sequencing from the highest quartile of the trait distribution adjusted for age and sex from the ARIC study (n=166) and CHS (n=34). All participants had retinal photography and retinal arteriolar and venular caliber measured from computer software using standardized protocols.
- Stroke: Stroke was defined as a focal neurological deficit of presumed vascular cause with sudden onset and lasting for at least 24 hours or until death if the participant died less than 24 hours after the onset of symptoms. Participants with incident ischemic stroke based on clinical and imaging criteria excluding cardioembolic events were eligible for selection. This phenotype, corresponding to both large and small artery atherothrombotic strokes, yielded the largest hazard ratio in the CHARGE meta-analysis. From among individuals meeting these criteria, the individuals with the earliest strokes with onset past the age of 65, and equal numbers of men and women were selected in numbers proportional to the size of the participating cohorts: 80, 70, and 50 individuals from the ARIC study, CHS, and FHS, respectively.
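The six sequential criteria used for the additional Framingham bone mineral density sample, together with the LS BMD Z-score exclusion for criteria #2 to #6, can be sketched as follows. This is a minimal illustration, not the working group's code: the field names (`fn_t`, `fn_z`, `ls_t`, `ls_z`) are hypothetical, and the ordering of participants within a criterion is an assumption because the description does not specify it.

```python
# FN BMD (T-score, Z-score) cutoffs for criteria 2-6, in order of application.
FN_CUTOFFS = [(-2.0, -2.0), (-2.0, -1.5), (-1.5, -1.5), (-1.0, -1.5), (-1.0, -0.5)]


def criterion_met(fn_t, fn_z, ls_t, ls_z):
    """Return the number (1-6) of the first criterion met, or None."""
    # Criterion 1: both sites below -2 on both scores.
    if fn_t < -2 and ls_t < -2 and fn_z < -2 and ls_z < -2:
        return 1
    # Criteria 2-6 exclude anyone with an LS BMD Z-score above 0.
    if ls_z > 0:
        return None
    for number, (t_cut, z_cut) in enumerate(FN_CUTOFFS, start=2):
        if fn_t < t_cut and fn_z < z_cut:
            return number
    return None


def select_low_bmd(participants, target=325):
    """Fill the sample criterion by criterion until `target` is reached."""
    ranked = []
    for p in participants:
        number = criterion_met(p["fn_t"], p["fn_z"], p["ls_t"], p["ls_z"])
        if number is not None:
            ranked.append((number, p))
    # list.sort() is stable, so input order is kept within a criterion
    # (an assumption: the source does not say how ties were ordered).
    ranked.sort(key=lambda pair: pair[0])
    return [p for _, p in ranked[:target]]
```

Because each criterion is looser than the one before it, assigning every participant the first criterion they meet and then sorting by that number is equivalent to applying the criteria one after another until the target is filled.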
B. Exome sequencing
- ECG (electrocardiogram) QT interval: QT interval measures were adjusted for age and RR interval, and the highest standardized residuals were used as the basis of selection of 100 participants from the ARIC study, 50 participants from CHS, and 50 participants from FHS. Individuals who had a bundle branch block; QRS interval >120 ms; atrial fibrillation; pacemaker activity; or who were using QT-prolonging medication were excluded. Selection was gender-specific.
- Fasting glucose: Two hundred (200) subjects were sampled from the high tail of the native distribution of fasting blood glucose including 100 individuals in the ARIC study, 50 participants in CHS, and 50 participants in FHS. Individuals with known diabetes; who were treated for diabetes; or those with a fasting glucose >7 mmol/L were excluded. The ARIC study and FHS applied a further exclusion of non-fasting individuals. Participants who were missing hemoglobin A1c values were also excluded in the ARIC study, and subjects with type 1 diabetes were excluded in FHS. Selection was gender-specific.
- Menopause: One hundred (100) women in the ARIC study, 50 in CHS, and 50 in FHS were sequenced from the lower tail of the natural menopause distribution. Age at natural menopause was defined on the basis of self-reported age at the last menstrual period, excluding those reporting any menstrual cycle in the previous 12 months, and including only natural menopause cases (i.e., cases due to hysterectomy, chemotherapy, oophorectomy, or unknown causes were excluded). Early menopause was defined as natural menopause occurring between 35 and 44 years of age. Each cohort chose subjects at random from that tail.
- Plasma fibrinogen level: Participants with the highest fibrinogen levels based on residuals that were stratified by sex and adjusted for age and clinic site were sampled for sequencing. The top 25 individuals were selected for each sex from CHS and FHS, and the top 50 individuals for each sex were chosen from the ARIC study.
- Renal function: Each of the three cohorts selected their best chronic kidney disease (CKD) cases. A total of 200 subjects were chosen based on low glomerular filtration rate estimated by serum creatinine (eGFRcrea). In CHS, CKD was defined as eGFR < 60 mL/min/1.73 m² based on a single measurement at the baseline visit (n=50). ARIC (n=100) and FHS (n=50) used a cumulative definition of CKD based on measurements at several study visits. In the ARIC study, subjects were chosen based on low eGFR at visits 1, 2, and 4 if they were not included in the cohort random sample or other case groups. At each visit, individuals with eGFRcrea > 60 mL/min/1.73 m² were excluded, and then a total of 100 participants with the lowest eGFR values at each visit were selected, stratified by gender. In FHS, the cumulative prevalence of CKD was defined as low eGFRcrea at both the earlier examination cycle (15th for the Original Cohort and 2nd for the Offspring Cohort) and the later examination cycle (24th for the Original Cohort and 7th for the Offspring Cohort), or if diagnosed at the later examination cycles.
- Stamler-Kannel: The extremes of the Stamler-Kannel design were chosen by generating residuals for the following traits adjusted for age, age², sex, BMI, BMI², and study site: systolic blood pressure, diastolic blood pressure, triglycerides, total cholesterol, HDL cholesterol, glucose, and insulin. A principal components analysis was then fitted for these phenotypes and standardized across the three cohorts. Individuals with the highest and lowest values of the first principal component were selected for sequencing.
- Waist-to-hip ratio adjusted for BMI: Two hundred (200) unrelated participants including 100 individuals from the ARIC study, and 50 subjects from CHS and FHS were sequenced from each of the tails of the distribution for waist-to-hip ratio adjusted for BMI. In FHS, subjects were greater than 25 years of age and less than 65 years of age. In the ARIC study and CHS, there were no age restrictions. In all studies, individuals were excluded if BMI < 18.5 kg/m².
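Several of the phenotype groups above (for example, plasma fibrinogen level) were chosen by ranking covariate-adjusted residuals within each sex and taking the top individuals. The sketch below illustrates that pattern under simplifying assumptions: it adjusts for age only using ordinary least squares, whereas the study also adjusted for covariates such as clinic site, and the record keys (`id`, `sex`, `age`, `value`) and the data are hypothetical.

```python
from statistics import mean


def age_residuals(rows):
    """Residuals from a least-squares fit of value on age.

    Assumes the ages in `rows` are not all identical.
    """
    ages = [r["age"] for r in rows]
    values = [r["value"] for r in rows]
    a_bar, v_bar = mean(ages), mean(values)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    slope = sum((a - a_bar) * (v - v_bar) for a, v in zip(ages, values)) / sxx
    intercept = v_bar - slope * a_bar
    return [v - (intercept + slope * a) for a, v in zip(ages, values)]


def top_residuals_by_sex(rows, n_per_sex):
    """Fit the model separately in each sex and keep the largest residuals."""
    selected = []
    for sex in ("F", "M"):
        stratum = [r for r in rows if r["sex"] == sex]
        resid = age_residuals(stratum)
        ranked = sorted(zip(resid, stratum), key=lambda pair: pair[0], reverse=True)
        selected.extend(r for _, r in ranked[:n_per_sex])
    return selected


# Synthetic records, for illustration only: one elevated value per sex.
rows = [{"id": f"{sex}-{i}", "sex": sex, "age": 50 + i,
         "value": 250 + 2 * (50 + i) + (40 if i == 3 else 0)}
        for sex in ("F", "M") for i in range(10)]
chosen = top_residuals_by_sex(rows, n_per_sex=1)
```

With `n_per_sex=25` for CHS and FHS and `n_per_sex=50` for the ARIC study, the same function would mirror the fibrinogen group sizes given above.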
C. Early access exome sequencing
- Brain MRI traits: One hundred (100) African-American individuals from the ARIC Brain Magnetic Resonance Imaging (MRI) Study were selected from the extremes of the distributions (upper tenth and lower fifth percentile) of leukoaraiosis and cerebral atrophy adjusted for age, sex, head size, and hypertension status. The sample is oversampled for the upper extremes of each trait.
- Systolic blood pressure: After adjustment for gender, baseline age, and use of antihypertensive medication, the residual systolic blood pressure was used to identify African-American participants in the lower 10% and upper 10% tails of the distribution (below the 10th or above the 90th percentile) in the ARIC study. For those individuals in the extreme tails of the systolic blood pressure distribution, a score was calculated that summed, across visits, the difference between an individual's systolic blood pressure at a particular clinic visit and the 10% tail cutoff for that visit. Systolic blood pressure values from all available clinical examinations were used. Individuals with the largest summary scores are those who remain in the tails of the systolic blood pressure distribution in all available examinations and whose systolic blood pressure levels at each visit are most extreme relative to the tail cutoff values. Thirty (30) individuals were chosen from each tail.
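The summary score used for the systolic blood pressure extremes can be sketched as below. This is an illustration under assumptions: the per-visit cutoffs are passed in rather than computed (the description does not say how the percentiles were estimated), missing visits are represented as `None` and skipped, and the record keys (`id`, `sbp`) and numbers are hypothetical.

```python
def tail_score(values, cutoffs, tail):
    """Sum, over available visits, the distance beyond the tail cutoff.

    `values` and `cutoffs` are aligned by visit; a missing visit is None.
    For the upper tail the distance is value - cutoff; for the lower
    tail it is cutoff - value.
    """
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    sign = 1 if tail == "upper" else -1
    return sum(sign * (v - c) for v, c in zip(values, cutoffs) if v is not None)


def most_extreme(participants, cutoffs, tail, n=30):
    """Return the n participants with the largest summary score."""
    ranked = sorted(participants,
                    key=lambda p: tail_score(p["sbp"], cutoffs, tail),
                    reverse=True)
    return ranked[:n]


# Hypothetical residual SBP values at three visits and upper-tail cutoffs.
upper_cutoffs = [140, 142, 145]
people = [
    {"id": "a", "sbp": [150, 155, 160]},
    {"id": "b", "sbp": [141, 143, 146]},
    {"id": "c", "sbp": [170, None, 175]},
]
top_two = most_extreme(people, upper_cutoffs, "upper", n=2)
```

Because the score is a sum over visits, a participant who drops back inside the cutoff at any visit contributes a negative term there, which is how the score favors individuals who stay in the tail at every examination.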
D. Genome-Wide Association Study in Essential Hypertension; FEHGAS2 (Family Blood Pressure Program (FBPP)-ARIC Essential Hypertension Genome-Wide Association Study 2)
As part of the comprehensive FEHGAS2 study to evaluate the role of rare and common genetic variation in systolic blood pressure, diastolic blood pressure, and hypertension, exome sequencing and whole genome sequencing were performed in 2,836 and 1,901 individuals, respectively, of African ancestry from the ARIC study.
E. Disease 2020: Large-Scale Sequencing and Analysis Center Initiated Projects/Baylor College of Medicine
Disease 2020, a new strategic framework for the NHGRI Sequencing Program, was introduced with the goal of leveraging recent advances in genomic technology to systematically define the genetic basis of human disease and maximize the impact of genomic medicine. In response to this initiative, the Large-Scale Sequencing and Analysis Centers, including the Baylor College of Medicine Human Genome Sequencing Center (HGSC), were invited to propose demonstration projects known as Center Initiated Projects (CIPs). Two CIPs were designed in accordance with the emphasis of the common disease component of Disease 2020 on 1) leading causes of morbidity and mortality for which a significant proportion of the heritability is still unexplained, and 2) availability of appropriately consented DNA samples from individuals enrolled in large deeply phenotyped cohort studies; these CIPs are described below:
1. Rare and Common Variants Contribute to Age-Related Change in Brain Morphology and Cognitive Decline: ARIC
Genetic analysis of heritable endophenotypes, such as cognitive function and brain morphology that may be closer to the underlying disease pathophysiology and more directly related to gene expression, may help to uncover loci that increase susceptibility to Alzheimer's disease before diagnostic criteria are met. To evaluate the contribution of rare and low frequency coding variants to changes in cognition and brain morphology, exome sequencing was performed in 2,905 European-American participants from the ARIC Study, CHS and FHS. Selection of study subjects was carried out within each of the three cohorts.
During the third ARIC examination, 966 European American and 960 African American participants aged 55 and older were invited to undergo cranial magnetic resonance imaging (MRI). Three validated neurocognitive tests chosen to represent different domains (Delayed Word Recall Test, verbal memory; Word Fluency Test, executive function; and Digit Symbol Substitution Test, processing speed) were administered to the entire cohort at the second and fourth ARIC clinic visits. In 2004-2006, 1,025 study participants underwent a second cerebral MRI examination and additional cognitive testing. Brain images were graded both semi-quantitatively (at visit 3 and at the Brain MRI follow-up visit) and quantitatively (follow-up visit only) for hippocampal volume.
Participants from FHS were selected (n = 939) who had at least two MRI scans to assess hippocampal volume an average of six years apart. All FHS participants also had a 45-minute cognitive assessment at the same time as each MRI including a test of delayed recall (Logical Memory I adapted from the Original Wechsler Memory Scale).
Individuals with at least one MRI to evaluate hippocampal volume (n = 835) were chosen from CHS. Among these CHS participants, 252 had hippocampal volume measured using a quantitative MRI scan, and 824 had data available for 6-year change in scores for the Digit Symbol Substitution Test. An additional group of 165 persons without hippocampal volume measurements but who had taken the Digit Symbol Substitution Test twice six years apart were also included.
Meta-analysis strategies can be used to combine results for the three cohorts. This study contains the ARIC study subset of the CIP. Additional data from CHS and FHS is also available via dbGaP.
2. The Genetic Architecture of Common Chronic Disease
This study includes participants from the ARIC study with a focus on identification of rare and low frequency variants associated with cardiovascular disease and its risk factors by exome sequencing. Both African-Americans and European-Americans are included, and can be considered a random sample of ARIC study participants with available DNA who provided written informed consent for genetic studies and data sharing. Prior investment in sequencing and genotyping in the ARIC cohort will allow the data obtained from this CIP to be combined with other collaborative efforts including the NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP), and FEHGAS2 to create a well-powered study for multiple phenotypes consisting of approximately 11,000 persons.
- Molecular Data
| Type | Source | Platform | Number of Oligos/SNPs | SNP Batch Id | Comment |
| --- | --- | --- | --- | --- | --- |
| Targeted Sequencing | Applied Biosystems | SOLiD4 | N/A | N/A | |
| Exome sequencing and whole genome sequencing | Illumina | HiSeq 2000 | N/A | N/A | |
| Exome genotyping | Illumina | Infinium HumanExome BeadChip | N/A | N/A | |
- Study History
The U.S. CHARGE consortium consists of three large population-based longitudinal cohort studies, including the Atherosclerosis Risk in Communities (ARIC) Study, the Cardiovascular Health Study (CHS), and the Framingham Heart Study (FHS). The Atherosclerosis Risk in Communities (ARIC) Study is described below:
The ARIC study is a prospective longitudinal investigation of the development of atherosclerosis and its clinical sequelae in which 15,792 individuals aged 45 to 64 years were enrolled at baseline. At the inception of the study in 1986-1989, the participants were selected by probability sampling from four communities in the United States: Forsyth County, North Carolina; Jackson, Mississippi (African-Americans only); the northwestern suburbs of Minneapolis, Minnesota; and Washington County, Maryland. Six examinations have been carried out (examination 1, 1987-1989; examination 2, 1990-1992; examination 3, 1993-1995; examination 4, 1996-1998; examination 5, 2011-2013; examination 6, 2016-2017), and a seventh examination is ongoing. Subjects are contacted annually to update their medical histories between examinations.