
Study Description

The Type 2 Diabetes (T2D) Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) Consortium is a collaborative international effort to identify genes influencing susceptibility to T2D in multiple ethnic groups using next generation sequencing. T2D-GENES Project 2 is a complex pedigree-based study designed to identify low frequency or rare variants influencing susceptibility to T2D, using whole genome sequence (WGS) information from 1,043 individuals in 20 Mexican American T2D-enriched pedigrees from San Antonio, Texas. The major objectives of this study are to identify low frequency or rare variants in and around known common variant signals for T2D, as well as to find novel low frequency or rare variants influencing susceptibility to T2D.

The sampled individuals are obtained from two studies: the San Antonio Family Heart Study (SAFHS) and the San Antonio Family Diabetes/Gallbladder Study (SAFDGS), collectively referred to as the San Antonio Mexican American Family Studies (SAMAFS). The strategy is to sequence approximately 600 individuals at an average of 50x coverage across the entire genome, then impute genome wide genotypes for about 440 additional family members. The 600 sequenced individuals are specifically chosen for their value in imputing sequence information into other family members. By studying large pedigrees, we expect to find multiple individuals carrying each genetic variant, even if this variant is very rare in the population at large. Thus, a pedigree-based approach provides an excellent opportunity for identifying rare novel variants influencing risk of T2D and quantitative variation in T2D-related phenotypes. The whole genome sequencing has been done commercially by Complete Genomics, Inc. (CGI).

The final data set includes whole genome sequence data for 607 individuals. After quality control, 585 sequenced individuals provide data for family based imputation, using Merlin linkage analysis software, into approximately 440 additional family members for whom chip based genotypes are available to indicate which parental haplotype is transmitted.

Extensive phenotype data are provided for 1048 individuals. These include 5 sequenced individuals who do not belong to any of the 20 large pedigrees. Phenotype information was collected between 1991 and 2011 in the two contributing longitudinal studies. SAFHS participants may have information from up to 5 visits, and SAFDGS participants may have up to 4 visits. The clinical variables reported are coordinated with T2D-GENES Project 1 (multi-ethnic exome sequencing) and include T2D status and age at diagnosis, glycemic traits (fasting and 2-hour glucose and insulin), blood pressure, blood lipids (total cholesterol, HDL cholesterol, calculated LDL cholesterol and triglycerides), and clinical chemistry (cystatin C, glutamic acid decarboxylase antibody titer (GadAb), creatinine, adiponectin and leptin). Glycated hemoglobin (HbA1c) was not measured for these individuals, and insulin C-peptide is not included in this data set. Additional phenotype data include the medication status at each visit, classified in four categories: any current use of diabetes, hypertension or lipid-lowering medications, and, for females, current use of female hormones. Anthropometric measurements include age, sex, height, weight, hip circumference, waist circumference and derived ratios. Each phenotype variable has an initial summary column containing the most recent non-missing measurement for each individual, followed by the five potential time points for each individual, the number of non-missing measurements, and the age and year for the most recent non-missing measurement. For historical reasons, the order in which variables are presented on the dbGaP web site differs from their order in the data download file. When reading the comment fields for each variable, please note that commas are omitted to support data exchange in .csv format.
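The per-variable layout described above (a summary value, the visit time points, a count of non-missing measurements, and the age and year of the most recent measurement) can be illustrated with a minimal sketch. It assumes the visits are listed in chronological order and that missing values are represented as None; the function name, field names, and example values are hypothetical and are not taken from the dbGaP download file.

```python
from typing import NamedTuple, Optional, Sequence


class Summary(NamedTuple):
    latest: Optional[float]  # most recent non-missing measurement
    n_obs: int               # number of non-missing measurements
    age: Optional[float]     # age at the most recent non-missing measurement
    year: Optional[int]      # calendar year of that measurement


def summarize(values: Sequence[Optional[float]],
              ages: Sequence[Optional[float]],
              years: Sequence[Optional[int]]) -> Summary:
    """Derive the summary fields for one individual and one phenotype.

    Assumption: the three sequences are aligned by visit and ordered
    from the earliest visit to the latest.
    """
    if not (len(values) == len(ages) == len(years)):
        raise ValueError("values, ages and years must have the same length")

    n_obs = sum(v is not None for v in values)

    # Walk the visits from last to first and stop at the first
    # non-missing measurement, which is the most recent one.
    for v, a, y in zip(reversed(values), reversed(ages), reversed(years)):
        if v is not None:
            return Summary(v, n_obs, a, y)

    # No visit has a measurement for this phenotype.
    return Summary(None, 0, None, None)


# Hypothetical individual with measurements at visits 1 and 3 only.
example = summarize(
    values=[5.1, None, 5.6, None, None],
    ages=[34, 39, 45, None, None],
    years=[1992, 1997, 2003, None, None],
)
print(example)
```

In this sketch the summary for the hypothetical individual takes the visit 3 measurement, because visits 4 and 5 are missing. An SAFDGS participant, who has at most 4 visits, would simply have None in the fifth position.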

  • Study Types: Family, Longitudinal
  • Number of study subjects that have individual level data available through Authorized Access: 1048

Authorized Access
Publicly Available Data (Public ftp)

Connect to the public download site. The site contains release notes and manifests. If available, the site also contains data dictionaries, variable summaries, documents, and truncated analyses.

Study Inclusion/Exclusion Criteria

The T2D-GENES Project 2 family data relate to two San Antonio-based family studies: SAFHS and SAFDGS. The SAFHS began in 1991, and included 1,431 individuals in 42 extended families at baseline. Probands were 40- to 60-year-old low-income Mexican Americans selected at random without regard to presence or absence of disease, almost exclusively from Mexican American census tracts in San Antonio, Texas. All first, second, and third degree relatives of the proband and of the proband's spouse, aged 16 years or above, were eligible to participate in the study. As part of our ongoing studies, we are currently recruiting new family members from the original families. The SAFDGS also began in 1991, and originally included 579 examined individuals distributed across 32 pedigrees as part of the San Antonio Family Diabetes Study (SAFDS). The sample size was expanded to more than 900 individuals through two recalls. The second recall refers to the San Antonio Family Gallbladder Study (SAFGS), which recruited new family members from the original SAFDS families and family members from 8 newly recruited families. The SAFDS and SAFGS are collectively called the San Antonio Family Diabetes/Gallbladder Study (SAFDGS), which includes more than 900 subjects distributed across 40 extended families. The probands for the SAFDGS were individuals with T2D identified in an earlier epidemiologic survey, the San Antonio Heart Study. Only low-income Mexican Americans identified in the San Antonio Heart Study as having T2D were eligible to be probands. These individuals were approached in random order without regard to how many T2D individuals were in their families (i.e., no attempt was made to preferentially recruit multiplex families). Thus, the T2D probands in the SAFDGS constitute a population-based case series of T2D individuals. All first, second, and third degree relatives, aged 18 or above, were invited to participate in the study. As part of our ongoing studies, we have recalled some SAFDGS participants.

Molecular Data
  • Type: Whole Genome Sequencing
  • Source: Complete Genomics
  • Platform: Assembler Version 1.11; File Format Version: 1.6
  • Number of Oligos/SNPs: N/A
  • SNP Batch Id: N/A

Study History

The T2D-GENES Consortium Project 2 data are obtained from 20 Mexican American T2D-enriched pedigrees chosen from the SAMAFS (i.e., SAFHS and SAFDGS). This project is part of one of the five awards funded by NIDDK under a cooperative agreement award mechanism, which is governed by the Steering Committee of the T2D-GENES Consortium. Of the available SAMAFS families, 20 large pedigrees, consisting of 1,043 individuals, were selected for this project by focusing on large lineages in order to maximize the number of founder copies as well as to obtain an optimal balance between sequencing efficiency and a sufficient number of T2D individuals. These pedigrees average approximately 52 individuals with a maximum pedigree size of 87 individuals. The large pedigree-based approach is utilized to maximize the probability of identifying all genetic variants in the complete genetic regions of the genomes (i.e., whole genome) by sequencing, including rare variants that segregate within families. To select individuals for WGS, the program "ExomePicks" was used to suggest which family members to sequence in large pedigrees. In total, approximately 600 individuals were chosen for WGS. The full sequences for the remainder of the 1,043 individuals are obtained using family-based imputation, given that individuals in the sample were previously assessed for a high-density SNP framework. The individuals have been followed in a mixed longitudinal fashion, up to a maximum of 5 visits. Phenotype data, including T2D affection status and T2D-related quantitative traits (e.g., glucose, insulin, BMI, blood pressure, and lipids), are available for the study participants. The genome-wide genotype data, representing more than one million SNPs, are available for the study participants. These data were obtained using different versions of the Illumina Infinium Beadchips: HumanHap550v3, supplemented with HumanExon510Sv1; Human660W-Quadv1; Human1Mv1; and Human1M-Duov3. The raw genotype data obtained were processed using standard quality control procedures.
