Am J Hum Genet. May 9, 2008; 82(5): 1101–1113.
Published online May 2, 2008. doi: 10.1016/j.ajhg.2008.03.006
PMCID: PMC2651622

Genetic Architecture of Transcript-Level Variation in Humans

Abstract

We report here the results of testing the pairwise association of 12,747 transcriptional gene-expression values with more than two million single-nucleotide polymorphisms (SNPs) in samples of European (CEPH from Utah; CEU) and African (Yoruba from Ibadan; YRI) ancestry. We found 4,677 and 5,125 significant associations between expression quantitative nucleotides (eQTNs) and transcript clusters in the CEU and the YRI samples, respectively. We refer to the physical distance between an eQTN and its associated transcript cluster as the intrapair distance. An association with an intrapair distance of 4 Mb or less was defined as local; otherwise, it was defined as distant. Enrichment analysis of functional categories shows that genes harboring local eQTNs are enriched in categories related to nucleosome and chromatin assembly, whereas genes harboring distant eQTNs are enriched in categories related to transmembrane signal transduction, suggesting that these biological pathways are likely to play a significant role in the regulation of gene expression. We highlight a deleterious nonsynonymous SNP in the EPHX1 gene that is distantly associated with expression of ORMDL3, a susceptibility gene for asthma.

Introduction

Genomic variation can provide molecular markers for the variation of many phenotypes.1,2 Moreover, transcriptional-expression differences can be mapped to one or more segments of DNA close to or distant from the genomic location of the gene.3 These segments of DNA, referred to as expression quantitative trait loci (eQTLs), or as expression quantitative nucleotides (eQTNs) at the level of individual polymorphisms, can act locally on the expression of their target genes through at least four different modes: neighboring gene regulation, allele-specific regulation, autoregulation, and feedback regulation.3 The first integrated study of DNA variation and gene expression was performed in budding yeast4 to dissect the genetic architecture of transcriptional regulation, an approach later termed genetical genomics.5 The identification of eQTLs (and eQTNs) is becoming a useful tool for bridging gene-expression results and genetic findings from traditional QTL mapping studies in multiple species, such as Arabidopsis,6 maize,7 Caenorhabditis elegans,8 mice,9–12 and humans.1,2,13 Microarray technology has allowed genome-wide association studies to evaluate heritability and global gene expression.14–18 In particular, three recent studies provide comprehensive analyses of the genetic variation underlying the regulation of gene expression.19–21 Our objective was to identify eQTNs and their associated transcripts in two HapMap22 populations (CEPH from Utah [CEU] and Yoruba from Ibadan [YRI]) by integrating HapMap genotypes for more than two million common SNPs with the expression of 12,747 transcript clusters (TCs), each of which contains a set of probesets representing all known exonic regions of a gene as well as its 5′- and 3′-untranslated regions.

Material and Methods

Lymphoblastoid Cell Lines

HapMap cell lines (30 CEU trios and 30 YRI trios) were purchased from the Coriell Institute for Medical Research. Lymphoblastoid cell lines (LCLs) were maintained in RPMI 1640 containing 1% L-glutamine plus FBS (20% for the first dilution, 15% for subsequent dilutions), as previously described.2,23 Both the YRI and the CEU lines were diluted to a concentration of 3.5–4.0 × 10⁵ cells/mL every 2–3 days and were harvested from exponentially growing cultures after the fourth dilution, provided that viability was ≥ 85%. Cell suspensions were spun at 400 × g for 5 min to remove the medium. Cell pellets were washed twice with ice-cold PBS (Invitrogen) and stored at −80°C.

Gene-Expression Evaluation

Samples of CEU and YRI were run concurrently to minimize potential batch effects. Baseline gene expression was evaluated in 176 HapMap LCLs (87 CEU and 89 YRI) with the Affymetrix GeneChip Human Exon 1.0 ST array. Cell pellets were thawed, and total RNA was extracted with QIAGEN Qiashredder and RNeasy Plus kits (QIAGEN) according to the manufacturer's protocol. RNA concentration and purity were determined through measurement of A260/A280 ratios with the Spectronic Genesys 6 UV/Vis Spectrophotometer (Thermo Electron), as described.23 For each cell line, ribosomal RNA was depleted from 1 μg of total RNA with the RiboMinus Human/Mouse Transcriptome Isolation kit (Invitrogen). cDNA was generated with the GeneChip WT cDNA Synthesis and Amplification Kit (Affymetrix) per the manufacturer's instructions, then fragmented and end labeled with the GeneChip WT Terminal Labeling Kit (Affymetrix). Approximately 5.5 μg of labeled DNA target was hybridized to the Affymetrix GeneChip Human Exon 1.0 ST Array at 45°C for 16 hr per the manufacturer's recommendation. Hybridized arrays were washed and stained on a GeneChip Fluidics Station 450 and scanned on a GCS3000 Scanner (Affymetrix). We used the Robust Multichip Average (RMA) approach to summarize probe-level data and log2 transformation to normalize the expression data. Detailed descriptions can be found in our previous publications.2,23
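For readers unfamiliar with RMA, the sketch below illustrates two of its core steps on toy data: quantile normalization across arrays, and Tukey median-polish summarization of the log2 intensities of one probeset. It is a simplified illustration under our own assumptions, not the Affymetrix implementation used in this study: the standard RMA procedure also includes a background-correction step, which is omitted here, and the function names and data are hypothetical.

```python
import numpy as np


def quantile_normalize(intensities: np.ndarray) -> np.ndarray:
    """Give every array (column) the same intensity distribution.

    Rows are probes and columns are arrays. Each value is replaced by the
    mean, across arrays, of the values sharing its rank (ties are broken
    by position, a simplification).
    """
    order = np.argsort(intensities, axis=0)
    reference = np.sort(intensities, axis=0).mean(axis=1)
    normalized = np.empty(intensities.shape, dtype=float)
    for col in range(intensities.shape[1]):
        normalized[order[:, col], col] = reference
    return normalized


def median_polish_summary(log_probes: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Summarize one probeset (probes x arrays) to one log2 value per array.

    Fits log_probes = overall + probe_effect + array_effect + residual by
    Tukey's median polish and returns overall + array_effect.
    """
    residuals = np.asarray(log_probes, dtype=float).copy()
    overall = 0.0
    probe_effect = np.zeros(residuals.shape[0])
    array_effect = np.zeros(residuals.shape[1])
    for _ in range(n_iter):
        row_med = np.median(residuals, axis=1)  # sweep out probe (row) medians
        residuals -= row_med[:, None]
        probe_effect += row_med
        shift = np.median(array_effect)
        array_effect -= shift
        overall += shift
        col_med = np.median(residuals, axis=0)  # sweep out array (column) medians
        residuals -= col_med[None, :]
        array_effect += col_med
        shift = np.median(probe_effect)
        probe_effect -= shift
        overall += shift
    return overall + array_effect


# Hypothetical data: 5 probes x 2 arrays; probes 0-2 form one probeset.
raw = np.array([[120.0, 240.0], [60.0, 130.0], [300.0, 610.0],
                [40.0, 700.0], [500.0, 90.0]])
log_normalized = np.log2(quantile_normalize(raw))
probeset_expression = median_polish_summary(log_normalized[0:3])
```

Median polish is used because it is robust to a few outlying probes, which would distort a simple mean of probe intensities.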

SNP Selection Criteria

The SNP genotypes of the CEU and YRI populations were downloaded from the online HapMap database (release 22, nonredundant, rs_strand version). Altogether, 2,098,437 (CEU) and 2,286,186 (YRI) SNPs with minor allele frequencies > 5% and no Mendelian-inheritance errors in the CEU or YRI trios were used in the present study.
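The two SNP filters can be sketched as follows. This is a hypothetical illustration, not the pipeline applied to the HapMap release 22 data: genotypes are assumed to be coded as counts (0, 1, 2) of one allele, the minor allele frequency is computed from the trio parents only, and a trio with any missing call is treated as uninformative for the Mendelian check.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical coding: a genotype is the count (0, 1, 2) of one allele, None if missing.
Genotype = Optional[int]
TrioGenotypes = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child)


def transmissible(parent: int) -> set:
    """Allele counts (0 or 1) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[parent]


def mendel_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """False only when a fully called trio violates Mendelian inheritance."""
    if father is None or mother is None or child is None:
        return True  # a trio with a missing call is treated as uninformative
    return any(f + m == child
               for f in transmissible(father)
               for m in transmissible(mother))


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF over called genotypes; 0.0 if nothing was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def keep_snp(trios: Sequence[TrioGenotypes], maf_cutoff: float = 0.05) -> bool:
    """Apply both filters: MAF > cutoff among parents and no Mendelian errors."""
    parents = [g for father, mother, _ in trios for g in (father, mother)]
    return (minor_allele_frequency(parents) > maf_cutoff
            and all(mendel_consistent(*trio) for trio in trios))
```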

Genome-wide Association for TC-eQTN Pairs

Associations between the 12,747 log2-transformed TC expression levels and more than two million SNPs were tested separately in the CEU and the YRI with the QTDT software.24,25 To distinguish local from distant eQTNs and their associated TCs, we defined a TC as locally associated if its expression was associated with one or more eQTNs on the same chromosome and within 4 Mb on either side of the gene; all other associations were defined as distant. All physical positions used in the present study are based on NCBI build 36.1.
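The local/distant rule can be expressed as a small classifier. The sketch below is illustrative: it assumes that the 4-Mb window is measured from the gene's start and end coordinates, and the `Gene` and `Snp` records and function names are hypothetical. The intrapair distance is taken here as the distance to the nearest gene boundary (zero for a SNP inside the gene), which is one plausible reading of the definition given in the Results.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_WINDOW_BP = 4_000_000  # 4 Mb on either side of the gene


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int  # lower coordinate, bp
    end: int    # higher coordinate, bp


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int


def intrapair_distance(snp: Snp, gene: Gene) -> Optional[int]:
    """bp from the SNP to the nearest gene boundary; None across chromosomes."""
    if snp.chrom != gene.chrom:
        return None
    if gene.start <= snp.pos <= gene.end:
        return 0
    return min(abs(snp.pos - gene.start), abs(snp.pos - gene.end))


def classify_pair(snp: Snp, gene: Gene, window: int = LOCAL_WINDOW_BP) -> str:
    """'local' if on the same chromosome and within `window` bp of the gene."""
    distance = intrapair_distance(snp, gene)
    return "local" if distance is not None and distance <= window else "distant"
```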

Analysis of eQTLs that Might Perturb Sequence Conservation and Transcription-Factor Binding Sites

The physical locations of eQTLs were compared with the most conserved regions derived from the UCSC alignments of 17 vertebrate genomes and with the conserved vertebrate transcription-factor binding-site regions from the same browser. Enrichment in these regions was assessed with Z-score thresholds of 1.66 and 2.33, corresponding to p values of 0.05 and 0.01, respectively.
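The Z-score criterion can be illustrated with a simple binomial approximation. The null model is an assumption on our part, since it is not specified above: each eQTN is treated as falling in the annotated regions independently, with probability equal to the fraction of the genome those regions cover, which ignores LD among eQTNs. For reference, the standard normal one-sided critical values for p = 0.05 and 0.01 are approximately 1.645 and 2.326, so a cutoff of 1.66 is marginally more stringent than the exact 0.05 value. The counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def overlap_z_score(n_eqtns: int, n_in_region: int, region_fraction: float) -> float:
    """Z score for the number of eQTNs falling in annotated regions.

    Null model (assumed): each eQTN lands in the regions independently with
    probability `region_fraction` (binomial, normal approximation).
    """
    expected = n_eqtns * region_fraction
    sd = sqrt(n_eqtns * region_fraction * (1.0 - region_fraction))
    return (n_in_region - expected) / sd


# One-sided critical values of the standard normal distribution.
Z_CRIT_05 = NormalDist().inv_cdf(0.95)  # about 1.645
Z_CRIT_01 = NormalDist().inv_cdf(0.99)  # about 2.326

# Hypothetical numbers: 1,000 eQTNs, 5% of the genome annotated, 60 hits.
z = overlap_z_score(1_000, 60, 0.05)
enriched_at_05 = z >= Z_CRIT_05
```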

eQTN_BLOCKs and eQTN_hotspots

The eQTN_BLOCK (see Figure S1 available online) was defined as a genomic region containing one or more eQTNs that are associated with the same TC, with consecutive eQTNs separated by < 500 Kb. Detailed information on eQTN_BLOCKs in the CEU and YRI is available in Table S1. The eQTN_hotspot was defined as a genomic region containing one or more eQTNs associated with the expression of multiple nonredundant TCs. In the present study, we used a bin size of 500 Kb, which divides the human genome into 5,691 bins. Only distantly associated TC-eQTN pairs were included in the hotspot analysis.
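The block and bin definitions can be sketched as follows. The input format (a list of TC identifier, chromosome, and position tuples for significant associations) and the function names are hypothetical. Consecutive eQTNs for the same TC are merged into one block when they are less than 500 Kb apart, so a block containing a single eQTN has a size of 0 bp, and hotspot candidates are scored by the number of distinct TCs whose eQTNs fall into each 500-Kb bin.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BLOCK_GAP_BP = 500_000  # consecutive eQTNs closer than this share a block
BIN_SIZE_BP = 500_000   # hotspot bin size

Association = Tuple[str, str, int]      # (TC id, chromosome, eQTN position in bp)
Block = Tuple[str, str, int, int, int]  # (TC id, chromosome, start, end, n eQTNs)


def eqtn_blocks(associations: Iterable[Association]) -> List[Block]:
    """Merge eQTNs of the same TC into blocks with gaps < BLOCK_GAP_BP."""
    positions_by_tc = defaultdict(list)
    for tc, chrom, pos in associations:
        positions_by_tc[(tc, chrom)].append(pos)
    blocks = []
    for (tc, chrom), positions in positions_by_tc.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev < BLOCK_GAP_BP:
                count += 1
            else:  # gap too large: close the current block and open a new one
                blocks.append((tc, chrom, start, prev, count))
                start, count = pos, 1
            prev = pos
        blocks.append((tc, chrom, start, prev, count))
    return blocks


def distinct_tcs_per_bin(associations: Iterable[Association]) -> Dict[Tuple[str, int], int]:
    """Count distinct TCs whose eQTNs fall into each (chromosome, bin index)."""
    tcs_in_bin = defaultdict(set)
    for tc, chrom, pos in associations:
        tcs_in_bin[(chrom, pos // BIN_SIZE_BP)].add(tc)
    return {bin_key: len(tcs) for bin_key, tcs in tcs_in_bin.items()}
```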

Enrichment Analysis of Functional-Annotation Categories

We used DAVID bioinformatics tools26 to identify enriched functional-annotation categories. Genes were uploaded as NCBI gene identifiers. Genes harboring the more than two million CEU SNPs were used as the background data set for the CEU-local and CEU-distant eQTN-GENEs, and genes harboring the more than two million YRI SNPs were used as the background data set for the YRI-local and YRI-distant eQTN-GENEs; genes annotated for all 12,747 core TCs were used as the background data set for the CEU-distant, YRI_Hs_30, and YRI_Hs_36 TC-GENEs. Expression levels of enriched genes were grouped by a hierarchical clustering algorithm27 with average linkage, as implemented in MeV: MultiExperiment Viewer (TIGR).28
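DAVID reports enrichment with its EASE score, a conservative variant of the one-sided Fisher exact (hypergeometric) test. The sketch below computes the plain hypergeometric tail probability for a single annotation category, given a gene list and a background set such as those described above. It is an illustration, not DAVID's implementation, and the function names and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int, category_size: int,
                       background_size: int) -> float:
    """One-sided P(X >= hits) for X ~ Hypergeometric.

    hits: list genes annotated to the category
    list_size: genes in the uploaded list
    category_size: background genes annotated to the category
    background_size: genes in the background set
    """
    upper = min(list_size, category_size)
    tail = sum(comb(category_size, i) * comb(background_size - category_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(background_size, list_size)  # exact integer arithmetic until here


def fold_enrichment(hits: int, list_size: int, category_size: int,
                    background_size: int) -> float:
    """Ratio of the category's share in the list to its share in the background."""
    return (hits / list_size) / (category_size / background_size)
```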

Results

Significant TC-eQTN Associations in the CEU and the YRI

A total of 12,747 TCs, whose expression values were greater than the 25th percentile of the average expression level of all TCs in the 176 HapMap samples, were tested for genome-wide association with more than two million common SNPs, independently in the CEU and YRI samples (Figure 1A). A p value of 2 × 10⁻⁸, corresponding to a false discovery rate (FDR)26 of approximately 10% for both the CEU and YRI samples, was set as the threshold for statistical significance (Figure 1B). With this stringent cutoff, 4,677 and 5,125 significant TC-eQTN associations were observed in the CEU and the YRI samples, respectively (Figures 1C and 1D and Table S2). We refer to the distance between an eQTN and its associated TC as the intrapair distance. The distribution of intrapair distances for the 2,102 (CEU) and 761 (YRI) TC-eQTN pairs located on the same chromosome is shown in Figure 1E, and that for pairs in conserved regions is shown in Figure 1F. Intrapair distances tend to be larger in the CEU than in the YRI, and there are dramatically more TC-eQTN pairs with intrapair distances of 1 to 4 Mb in the CEU than in the YRI (Figure 1E). Thus, to distinguish between close- and long-range associations in both populations, we defined a TC-eQTN pair with an intrapair distance of 4 Mb or less as local and all others as distant. By this definition, the ratio of distant to local TC-eQTN pairs was greater in the YRI samples (4,537/588) than in the CEU (2,667/2,010) (Table 1).
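As a rough plausibility check, and not necessarily the procedure used to derive the reported FDR, assume that every TC was tested against every SNP and that the tests are independent (LD among SNPs and correlation among TCs violate this). The expected number of false positives at threshold t is then m·t for m tests, and the FDR at t is approximately m·t divided by the number of significant associations. The function name is hypothetical.

```python
def approximate_fdr(n_tcs: int, n_snps: int, p_threshold: float, n_significant: int) -> float:
    """Expected false positives (number of tests x threshold) divided by observed hits."""
    expected_false_positives = n_tcs * n_snps * p_threshold
    return expected_false_positives / n_significant


fdr_ceu = approximate_fdr(12_747, 2_098_437, 2e-8, 4_677)
fdr_yri = approximate_fdr(12_747, 2_286_186, 2e-8, 5_125)
```

Both estimates come out at roughly 11% (about 535 and 583 expected false positives, respectively), in line with the approximately 10% reported.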

Figure 1
Features of Significant Association in the CEU and the YRI LCLs
Table 1
Numbers of Significant Associations in the CEU and YRI Samples

Associated TCs in the CEU and the YRI

A total of 741 (CEU) and 1,701 (YRI) unique TCs significantly associated with genetic variants were identified. Of these, 67 local and 691 distant TCs were found in the CEU, and 65 local and 1,665 distant TCs were found in the YRI (Table 1); because some TCs have both local and distant associations, these counts sum to more than the numbers of unique TCs. The genomic distribution of eQTNs and their associated TCs in the CEU and YRI samples is shown in Figure 2. No chromosome was either overrepresented or underrepresented relative to the null distribution of the TCs in the analysis set (data not shown).

Figure 2
The Genomic Distribution of eQTNs and TCs in the CEU and the YRI Populations

Bioinformatic Properties of the eQTNs

There were 3,902 and 4,148 significant eQTNs in the CEU and the YRI, respectively (Table 1), 244 of which are shared by both populations. A total of 184 (CEU) and 206 (YRI) eQTNs are located within the most conserved regions (based on UCSC vertebrate genome alignments, March 2006). Only 19 of these highly conserved eQTNs are shared between the two populations, which is not significantly different from the number expected by chance. The distribution of intrapair distances of the conserved TC-eQTN pairs shows that most of the conserved local eQTNs are located within their associated genes (Figure 1F). We also examined the locations of the significant eQTNs with respect to transcription-factor binding sites. Only 20 local and 80 distant eQTNs are located in transcription-factor binding-site motifs. Of these, six (CEU) and three (YRI) binding-site eQTNs are located within or < 10 Kb from their locally associated genes. One potential explanation for distant associations of eQTNs with gene expression is that the eQTN lies near a transcription-factor gene, so that the SNP affects target-gene expression indirectly by regulating the transcription factor. Using the known or predicted transcription-factor gene sets collected by Messina et al.,29 we found that only a small proportion of distant eQTNs are located within or < 10 Kb from transcription-factor gene loci (57 of 1,283 and 121 of 3,561 distant eQTNs in the CEU and the YRI, respectively), in agreement with previous findings in Saccharomyces cerevisiae that transcription factors showed no enrichment for trans-regulatory variation.30

eQTN_BLOCKs

Because eQTNs in close physical proximity tend to be in high linkage disequilibrium (LD), we defined an eQTN_BLOCK as one or more eQTNs in close proximity that are associated with expression of the same TC. This step reduces LD-induced redundancy and identifies genetic regions with multiple independent alleles that affect expression of the same TC. An eQTN_BLOCK can harbor as many as 346 eQTNs and span from 0 bp (a single eQTN) to 3.8 Mb. The median size of local eQTN_BLOCKs harboring two or more eQTNs is 60,160 bp in the CEU, almost twice that in the YRI (32,947 bp); similarly, the median size of distant eQTN_BLOCKs is 15,058 bp in the CEU, compared with 6,007 bp in the YRI.

A total of 67 local and 1,074 distant eQTN_BLOCKs are associated with 67 and 691 TCs, respectively, in the CEU samples. In the YRI, 65 local and 2,786 distant eQTN_BLOCKs are associated with 65 and 1,665 TCs, respectively. Of these TCs, 175 are found in both the CEU and the YRI, and most of them have the same mode (local or distant) in both populations: 25 local TCs and 143 distant TCs. Strikingly, 23 of these 25 local TCs share the same local eQTN_BLOCKs across populations (see Table S3); for the remaining two TCs, the local eQTN_BLOCKs in the two populations lie within 10 Kb of each other. In contrast, none of the 143 distant TCs share the same distant eQTN_BLOCKs. One possible explanation is that a distant TC-eQTN association found in one population may exist in the other population but with lower significance. However, even when a more inclusive p value cutoff of 3 × 10⁻⁶ was applied, there were no overlapping distant TC-eQTN pairs in the CEU and the YRI samples. At the p value cutoff of 2 × 10⁻⁸, four TCs are associated with overlapping distant eQTN_BLOCKs across the populations. In addition, 67 TCs are associated with eQTN_BLOCKs lying within 100 Kb of each other across the populations, and 479 TCs with eQTN_BLOCKs lying within 1 Mb of each other (data not shown).

Hotspots of eQTNs and Gene-Ontology Analysis

Some eQTNs are associated with multiple TCs. We found 14 and 38 distant eQTN_hotspots harboring such eQTNs in the CEU and the YRI, respectively (Figure 3A). In Figure 3A, bar height shows the number of distinct TCs linked to eQTNs in each bin, and the dashed lines show the maximum number of associated TCs expected to fall into any one bin by chance at a probability of 0.001, corrected for the number of bins. Bins with bars at or above this line are eQTN_hotspots. Hotspots were numbered sequentially by genomic location, from chromosome 1 to chromosome 22. The hotspots harbor eQTNs associated with up to 42 (CEU) and 36 (YRI) distinct TCs (see Table S4).
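One plausible reading of the dashed-line threshold is sketched below; it is our assumption, not a description of the exact computation behind Figure 3A. Under a null model in which each associated TC falls into one of the 5,691 bins with equal probability, independently of the others, the per-bin count is binomial, and the threshold is the smallest count whose chance probability, multiplied by the number of bins (a Bonferroni correction), is at most 0.001. Treating each of the 691 distantly associated CEU TCs as a single placement is a simplification, and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing log-space terms to avoid overflow."""
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        total += exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log_p + (n - i) * log_q)
    return total


def hotspot_threshold(n_tcs: int, n_bins: int, alpha: float = 0.001) -> int:
    """Smallest per-bin TC count whose Bonferroni-corrected chance probability <= alpha.

    Null model (an assumption): each of n_tcs associated TCs falls into one of
    n_bins equally likely bins, independently of the others.
    """
    p = 1.0 / n_bins
    k = 1
    while k <= n_tcs and binomial_tail(k, n_tcs, p) * n_bins > alpha:
        k += 1
    return k


# Illustrative inputs: 691 distantly associated CEU TCs spread over 5,691 bins.
threshold = hotspot_threshold(691, 5_691)
```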

Figure 3
The Distantly Associated eQTN_hotspots

The enrichment analysis of functional-annotation categories was carried out for both the genes harboring eQTNs and the annotated genes of TCs (Table 2). The most highly enriched functional category for the local eQTN-harboring genes in the CEU is nucleosome assembly (14 hits, p = 1.0 × 10⁻¹⁴, Benjamini-Hochberg [B-H] corrected Pc = 3.4 × 10⁻¹¹; see ref. 26). The same category is also enriched among the local eQTN-harboring genes in the YRI, though less significantly (6 hits, p = 1.5 × 10⁻⁵, B-H corrected Pc = 5.0 × 10⁻²). Moreover, this category is again enriched in the TC-GENES associated with eQTN_hotspot YRI_Hs_30 (4 hits, p = 2.3 × 10⁻⁶, B-H corrected Pc = 3.5 × 10⁻³; Figure 3B). eQTN_hotspot YRI_Hs_36 is also associated with several histone genes that are important for nucleosome assembly (Figure 3C). As shown in Figures 3B and 3D, the histone genes associated with the eQTNs in these hotspots have very similar expression patterns.
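For reference, the Benjamini-Hochberg step-up adjustment behind corrected values such as Pc can be sketched as below. DAVID computes its Benjamini values internally, so this illustration, with a hypothetical function name and made-up p values, is not guaranteed to reproduce the numbers in Table 2.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    The sorted p value of rank i (1-based) is scaled by m / i; a running minimum
    taken from the largest rank downward keeps the adjusted values monotone,
    and starting it at 1.0 caps every value at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```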

Table 2
Enriched Functional Annotations of Genes Harboring eQTNs and TC-GENEs

As a follow-up to these findings, we evaluated the expression profiles of genes that are important in nucleosome assembly. Histones and their related genes maintain the structure of nucleosomes and chromatin and are thus critical for the transcriptional efficiency of DNA. Our microarray probes cover at least 71 histone genes and 28 histone-related genes. Because most histone genes are clustered in the genome, they tend to be expressed in a similar pattern (Figure 3D), although at different levels (Figure 3E). In total, eight (CEU) and 28 (YRI) of these genes are associated with one or more eQTN_BLOCKs (Figure 3E, Table S5). In the CEU, one and seven of these histone or histone-related genes are associated with local and distant eQTNs, respectively; in the YRI, the corresponding numbers are two and 28 (see Table S5). Two histone genes, HIST1H3B [MIM *602819] and HIST1H2AB [MIM *602795], are associated with eQTN_BLOCKs in both the CEU and the YRI samples. Specifically, expression of HIST1H3B is associated with 13 distinct eQTN_BLOCKs in the YRI, one of which, a local eQTN_BLOCK, is also associated with HIST1H3B expression in the CEU. Expression of HIST1H2AB is associated with two nonoverlapping eQTN_BLOCKs, one in each population sample (see Table S5). Using a general linear model (GLM), we found that HIST1H3B, HIST1H3G [MIM *602815], and HIST1H1C [MIM *142710] are differentially expressed between the CEU and the YRI samples (PGLM = 1.59 × 10⁻⁷, 4.46 × 10⁻⁷, and 1.32 × 10⁻⁵, respectively).23
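The population comparison can be illustrated with the simplest version of such a model, in which log2 expression is regressed on a single CEU/YRI indicator. The terms of the GLM actually used (for example, any covariates) are not specified here, so this is an assumption-laden sketch with a hypothetical function name and toy data.

```python
from math import erfc, sqrt
from statistics import fmean
from typing import Sequence, Tuple


def population_effect(expr_ceu: Sequence[float],
                      expr_yri: Sequence[float]) -> Tuple[float, float, float]:
    """Fit expression = b0 + b1 * is_YRI + error by ordinary least squares.

    With a single binary predictor, the OLS estimate b1 is the difference in
    group means and its t statistic equals the pooled-variance two-sample t.
    The two-sided p value uses a normal approximation to the t distribution,
    which is reasonable for large samples (174 residual df for 87 + 89 lines).
    """
    n1, n2 = len(expr_ceu), len(expr_yri)
    mean_ceu, mean_yri = fmean(expr_ceu), fmean(expr_yri)
    sse = (sum((y - mean_ceu) ** 2 for y in expr_ceu)
           + sum((y - mean_yri) ** 2 for y in expr_yri))
    sigma2 = sse / (n1 + n2 - 2)           # residual variance
    se = sqrt(sigma2 * (1 / n1 + 1 / n2))  # standard error of b1
    b1 = mean_yri - mean_ceu
    t = b1 / se
    p = erfc(abs(t) / sqrt(2))             # two-sided normal tail probability
    return b1, t, p
```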

In addition, in the CEU, genes harboring distant eQTNs are enriched in the cellular component of “membrane” and the biological process of synaptic transmission (Table 2), and distant TC-GENES are enriched in the molecular function of transmembrane-receptor activity (Table 2). In the YRI, genes harboring distant eQTNs are also enriched in the gene-ontology cellular-component category of membrane and in the InterPro categories of several integral membrane-protein families, including EGF; however, the YRI TC-GENES do not exhibit enrichment in any annotation (Table 2). Our data suggest that transmembrane signal transduction is the major biological process related to distant TC-eQTN associations in both populations.

Deleterious Nonsynonymous eQTNs

Three CEU eQTNs (rs724558 [T99I], rs1122326 [P2Q], and rs1051740 [H113Y]) that cause nonsynonymous amino acid changes in the SERPINB10 (MIM *602058), HSPB9 (MIM *608014), and EPHX1 (MIM +132810) genes, respectively, were predicted to be deleterious by the SIFT program.31 These eQTNs are associated with the expression of SERPINB10, LGP2 (MIM *608588), and ORMDL3 (MIM *610075), respectively, in the CEU samples. A significant association between rs7216389 and ORMDL3 expression was reported by Moffatt et al.;32 we identified a more significant association of ORMDL3 expression with the rs1051740 genotype (Figure 4A). We then tested the association between SNP rs1051740 and ORMDL3 expression in the CEU samples, using as a covariate the genotypes of SNP rs7216389, which was previously implicated in susceptibility to asthma (MIM 600807).32 SNP rs1051740 remains significantly associated with ORMDL3 expression after this adjustment (p = 4 × 10⁻⁶), suggesting that rs1051740 and rs7216389 contribute additively to the expression of ORMDL3 (Figure 4B). A protein-domain search shows that rs1051740 alters an amino acid connecting a β-sheet and an α-helix in a conserved protein domain (Epoxide hydrolase N terminus, CDD:69934) and is predicted to play an important role in maintaining the structure of EPHX1 (Figure 4C).
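The covariate-adjusted test can be sketched as an ordinary least-squares model with additive genotype coding (0, 1, 2 copies of the minor allele) for both SNPs, optionally including the product term that a test for interaction would examine. The exact model fitted in the study is not described here; the coding, the function name, and the simulated data in the test are assumptions for illustration.

```python
import numpy as np


def additive_model(expression, g_target, g_covariate, with_interaction=False):
    """OLS fit of expression ~ 1 + g_target + g_covariate [+ g_target * g_covariate].

    Genotypes are coded additively (0, 1, 2). Returns the coefficients and their
    t statistics, ordered intercept, target SNP, covariate SNP[, interaction].
    """
    y = np.asarray(expression, dtype=float)
    g1 = np.asarray(g_target, dtype=float)
    g2 = np.asarray(g_covariate, dtype=float)
    columns = [np.ones_like(y), g1, g2]
    if with_interaction:
        columns.append(g1 * g2)
    X = np.column_stack(columns)
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank deficient")
    residuals = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof                   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # coefficient standard errors
    return coef, coef / se
```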

Figure 4
ORMDL3 Gene Expression Is Associated with SNP rs1051740 in the CEU

Discussion

We report 4,677 and 5,125 significant associations between eQTNs and TCs in the CEU and the YRI samples, respectively, obtained by testing the pairwise association of 12,747 transcriptional gene-expression values with more than two million SNPs in samples of CEU and YRI ancestry. During the preparation of this paper, three other large-scale eQTL studies19–21 were reported. Goring et al.20 reported a genetical-genomics linkage study of 19,648 transcript-level expression phenotypes and 432 polymorphic microsatellites in 1,240 individuals. Dixon et al.21 carried out a global genetical-genomics association study of 15,084 transcript-level expression phenotypes and 408,273 SNPs in 400 children. Stranger et al.19 performed a local genetical-genomics association study of 14,456 transcript-level expression phenotypes and the SNPs within 1 Mb, as well as a distant association study of the same phenotypes and ~25,000 selected SNPs, in 270 HapMap samples. Although the present study has a smaller sample size (176 HapMap samples), we performed a more extensive genetical-genomics association study, with more than two million markers per expression phenotype, and thus provide more extensive information on distant eQTNs.

By using the Affymetrix GeneChip Human Exon 1.0 ST array, our study has the advantage of measuring expression with probes across the whole gene (5′-UTR, exons, and 3′-UTR), which is considered a more accurate measure of gene expression.33 Previous studies using earlier Affymetrix arrays (the Affymetrix Focus array and the U-133 series) are biased in that their oligonucleotides are designed at the 3′ end of the gene.14,21 Despite these platform differences, 54 (CEU: 30; YRI: 23) relationships between local genes and eQTNs are common to our study and that of Stranger et al.19 Similarly, 45 (CEU: 39; YRI: 18) local genes are associated with the same eQTNs in our study and in the study of Dixon et al.21 (see Table S6).

Highlighted in the present study are eQTN_hotspot regions harboring pleiotropic eQTNs associated with the expression phenotypes of multiple TCs. These hotspot regions are potentially important for the discovery of interactive gene-gene networks. The bin-based identification of eQTN_hotspots may depend on the LD pattern within each bin. Among the hotspots found in our study, most harbor pleiotropic eQTNs associated with at least six TCs, and all harbor pleiotropic eQTNs associated with at least two TCs, even in bins with low LD. This suggests that the bin-based eQTN_hotspot approach is effective for highlighting pleiotropic eQTNs among thousands of significant associations. We also identified TC-eQTN_BLOCK pairs to define independent genetic regions, each of which harbors a set of eQTNs associated with the expression of the same TC. Because eQTNs in close proximity that are associated with the same TC are likely to be in high LD, this step substantially reduces the number of TC-eQTN pairs to be considered.

Because genetically indistinguishable SNPs,34 which are in perfect LD (D′ = 1, r² = 1), may be associated with the same TC expression phenotypes, statistical association alone cannot single out the functional variant, and it is helpful to explore the biological significance of the identified eQTNs. We evaluated whether the eQTNs are located in conserved regions, transcription-factor binding-site motifs, or transcription-factor genes. Although we did not find significant enrichment of eQTNs in these biologically important DNA segments, we identified in the CEU samples three eQTNs (rs724558 [SERPINB10_T99I], rs1122326 [HSPB9_P2Q], and rs1051740 [EPHX1_H113Y]) that result in deleterious nonsynonymous amino acid changes in SERPINB10, HSPB9, and EPHX1, respectively. Moreover, predictions based on orthologs and homologs in protein alignments suggest that the minor allele of each of these SNPs may impair protein function. These three eQTNs are associated with the expression levels of SERPINB10, LGP2, and ORMDL3, respectively. SERPINB10 belongs to the superfamily of high-molecular-weight serine proteinase inhibitors (serpins), which are mainly clustered on human chromosome 18 and are key regulatory proteins in important biological processes.35 Two other missense variants in SERPINB10 (rs8097425 and rs963075) have been reported to confer significant risk for prostate cancer.36 The LGP2 gene lies 21 kb from HSPB9, in the reverse direction. LGP2, an RNA helicase with multiple functionally important domains, is involved in the regulation of interferon production and thus has potential therapeutic implications for immune regulation.37 The association between the local eQTN rs1122326 and the transcript level of LGP2 provides a new clue for exploring the function of LGP2.

EPHX1 is one of the epoxide hydrolases, which play an important role in both the activation and the detoxification of exogenous chemicals. This EPHX1 variant (H113Y, rs1051740) was reported to reduce EPHX1 activity by approximately 40%38 and has been implicated in genetic susceptibility to multiple diseases, including lymphoproliferative disorder,39 preeclampsia,40 emphysema, and chronic obstructive pulmonary disease.41 In the present study, we found that this EPHX1 variant (rs1051740) is distantly associated with the expression level of ORMDL3. The minor allele “C” of rs1051740 corresponds to lower EPHX1 activity38 and to higher expression of ORMDL3 (Figure 4B), a gene that has been shown to be associated with susceptibility to childhood asthma.32 Our study also confirmed the finding of Moffatt et al.32 that the T allele at SNP rs7216389 is associated with higher expression of ORMDL3, though at a less significant level. A protein-domain search showed that rs1051740 alters an amino acid connecting a β-sheet and an α-helix in the conserved domain, a residue predicted to be important for maintaining the structure of the EPHX1 protein. Moreover, a test for interaction between SNPs rs7216389 and rs1051740 indicated that their effects on ORMDL3 expression are additive. Another independent study has shown that higher EPHX1 activity is associated with an increased risk of lifetime asthma.42 This appears to contradict our hypothesis that the EPHX1 variant rs1051740-C lowers EPHX1 activity38 and distantly upregulates ORMDL3 expression, thereby increasing the risk of asthma.32 Our results suggest that a study stratified by rs1051740 genotype may help investigators clarify the effects of EPHX1 and other candidate genes on the complex disease of asthma.

There are 67 and 65 gene-expression phenotypes locally associated with one or more SNPs in the CEU and YRI population samples, respectively, and 691 and 1,665 gene-expression phenotypes distantly associated with one or more SNPs in the CEU and YRI population samples, respectively. Several factors may explain why more distant TCs were found in the YRI than in the CEU. First, 9% more SNPs were evaluated in the YRI than in the CEU population (2.29 million versus 2.10 million). Second, linkage disequilibrium blocks are larger in the CEU than in the YRI (16.3 Kb for CEU versus 7.3 Kb for YRI),43 implying that the YRI panel contains fewer genetically indistinguishable SNPs than the CEU panel. Thus, because the YRI SNP panel captures a broader spectrum of independent variation, SNPs in the YRI might be more likely than those in the CEU to be associated with the expression of one or more TCs.

Although similar numbers of eQTNs were identified in the CEU (3,902) and the YRI (4,148), only a fraction of them (243) are associated with the same expression phenotypes in both population samples. To determine whether this is simply an effect of the stringent p value threshold (2 × 10⁻⁸), we selected the eQTNs that also have a minor allele frequency > 5% in the other population (2,500 CEU eQTNs and 2,729 YRI eQTNs) and tested them for significance in the counterpart population samples. These 2,500 and 2,729 eQTNs are involved in 3,196 and 3,098 significant TC-eQTN associations in the YRI and CEU, respectively. Using a less stringent p value cutoff of 1.6 × 10⁻⁵, we identified 384 (see Table S7) and 318 (see Table S8) significant TC-SNP pairs in the CEU and the YRI, respectively. All of these replicated TC-eQTN associations (p < 1.6 × 10⁻⁵) are local, and the best p values for distant TC-eQTN associations are only 0.0004 and 0.001 in the CEU and YRI population samples, respectively (data not shown), suggesting that distant associations are highly population specific. In addition, because our p value cutoff is somewhat arbitrary and does not take into account the redundancy of SNPs in high LD, a nonparametric simulation provides a more appropriate way to define significance. We therefore randomly selected 400 distant TC-eQTN associations from each population and performed a simulation (n = 10,000) to determine empirical p values. Almost all (>99.5%) of the randomly selected distant TC-SNP associations were validated by this simulation.
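The simulation is not described in detail above; one common way to obtain an empirical p value is a permutation test, sketched here under our own assumptions: expression values are shuffled relative to genotypes, the absolute Pearson correlation is the test statistic, and relatedness within trios is ignored (unlike the family-based QTDT analysis). With 10,000 permutations, the smallest attainable empirical p value is about 1 × 10⁻⁴. The function names are hypothetical.

```python
import random
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(genotypes: Sequence[int], expression: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical p value for |r| between genotype and expression.

    Uses the (hits + 1) / (n_perm + 1) estimator, so the result is never 0.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```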

Given the high density of SNPs selected in this study, we expected a large degree of redundancy among the eQTNs. This was indeed observed in the topological representation of the relationships between eQTNs and their associated TCs (see Figure S1). Not surprisingly, TC-eQTN pairs in the CEU show more redundancy than those in the YRI, because the average length of LD blocks in the CEU is more than twice that in the YRI (16.3 Kb versus 7.3 Kb).43 As shown in Figure S2, the redundancy lies mainly within the local TC-eQTN pairs. Replacing TC-eQTN pairs with TC-eQTN_BLOCK pairs efficiently reduces this redundancy without sacrificing the ability to detect interactive networks (see Figure S1). Our genetical-genomics association study revealed 4,677 and 5,125 significant TC-eQTN pairs in the CEU and the YRI samples, respectively. These can be represented by 67 local TC-eQTN_BLOCK pairs, 50 distant same-chromosome TC-eQTN_BLOCK pairs, and 1,024 distant different-chromosome TC-eQTN_BLOCK pairs in the CEU, and by 65 local, 118 distant same-chromosome, and 2,668 distant different-chromosome TC-eQTN_BLOCK pairs in the YRI (Table 1). A significantly higher proportion of distant TC-eQTN_BLOCK pairs is observed in the YRI than in the CEU (χ² = 31.8, df = 1, p = 1.7 × 10⁻⁸).
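The reported test statistic can be checked from the TC-eQTN_BLOCK counts given in the text (CEU: 67 local and 1,074 distant; YRI: 65 local and 2,786 distant). The sketch below computes a Pearson χ² test with Yates' continuity correction for a 2 × 2 table; the function name is hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def yates_chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Yates-corrected chi-square and p value (df = 1) for [[a, b], [c, d]].

    Rows are populations; columns are local and distant pair counts.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return chi2, p


chi2, p = yates_chi_square_2x2(67, 1_074, 65, 2_786)
```

With these counts the corrected statistic is about 31.8 and p is about 1.7 × 10⁻⁸, matching the values reported in the text; without the continuity correction the statistic would be about 32.9. This suggests, but does not confirm, that the correction was applied.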

The enrichment analysis of functional-annotation categories suggests that the regulation of gene expression extensively involves histone genes, which package DNA into higher-order chromatin structure and are therefore fundamental for controlling gene expression. The expression patterns of histones and their related genes are similar within each subgroup, suggesting that the regulation of expression of the genes within a subgroup may be closely related. The active involvement of histones and their related genes in transcript-level regulation implies that a complex interactive network exists. The divergence in the expression levels of histones (HIST1H3B, HIST1H3G, and HIST1H1C) and in the numbers and modes of histone-related TC-eQTN pairs leads us to hypothesize that transcript-level expression may be subject to differing hierarchical regulation in the CEU and the YRI. Further experimentation is required to test this hypothesis. Distant eQTNs tend to be associated with the expression phenotypes of genes involved in transmembrane signal transduction, implying that distant regulation of transcript-level expression may act through an indirect signaling cascade.

This study presents a global view of the genetic basis of transcriptomic differences in human populations. The analytical approach used here can also be applied to other QTL studies. The eQTN data generated here can be integrated with genetic findings from QTL mapping studies to provide further insight into the regulation of gene expression. These data can serve as a rich resource to supplement findings from a wide variety of genetic studies, including candidate-gene, linkage, and genome-wide association studies.

Web Resources

The URLs for data presented herein are as follows:

Supplemental Data

Two figures and eight tables are available at http://www.ajhg.org/.

Document S1. Two figures
Table S1. The TC-eQTN_BLOCK Pairs in the CEU and the YRI Populations
Table S2. The TC-eQTN Pairs in the CEU and the YRI Populations
Table S3. The TC-eQTN_BLOCKs Associated with the Same TCs in the CEU and the YRI Populations
Table S4. The eQTN_hotspots in the CEU and the YRI Populations
Table S5. The TC-eQTN_BLOCK Pairs of Histones and Their Related Genes in the CEU and the YRI Populations
Table S6. The Consistency of eQTNs between Our Study and Other Studies
Table S7. Replicable YRI TC-eQTN Pairs in the CEU Populations
Table S8. Replicable CEU TC-eQTN Pairs in the YRI Populations

Acknowledgments

We thank the International HapMap Consortium for data availability and Jeong-Ah Kang for maintaining cell lines. This Pharmacogenetics of Anticancer Agents Research (PAAR) Group study was supported by National Institutes of Health/National Institute of General Medical Sciences grant U01GM61393. PAAR data have been deposited into PharmGKB, a knowledge base supported by U01GM61374. Gene-expression data are deposited in Gene Expression Omnibus: GSE7851. Four authors of this manuscript (T.A.C., T.X.C., A.C.S., and J.E.B.) are employees of Affymetrix Inc., Santa Clara, CA 95051. Their employment with Affymetrix could be construed as a conflict of interest because they may indirectly benefit from sales of Affymetrix GeneChip Human Exon 1.0 ST array.

References

1. Huang R.S., Duan S., Bleibel W.K., Kistner E.O., Zhang W., Clark T.A., Chen T.X., Schweitzer A.C., Blume J.E., Cox N.J., Dolan M.E. A genome-wide approach to identify genetic variants that contribute to etoposide-induced cytotoxicity. Proc. Natl. Acad. Sci. USA. 2007;104:9758–9763.
2. Huang R.S., Duan S., Shukla S.J., Kistner E.O., Clark T.A., Chen T.X., Schweitzer A.C., Blume J.E., Dolan M.E. Identification of genetic variants contributing to Cisplatin-induced cytotoxicity by use of a genomewide approach. Am. J. Hum. Genet. 2007;81:427–437.
3. Rockman M.V., Kruglyak L. Genetics of global gene expression. Nat. Rev. Genet. 2006;7:862–872.
4. Brem R.B., Yvert G., Clinton R., Kruglyak L. Genetic dissection of transcriptional regulation in budding yeast. Science. 2002;296:752–755.
5. Jansen R.C., Nap J.P. Genetical genomics: The added value from segregation. Trends Genet. 2001;17:388–391.
6. DeCook R., Lall S., Nettleton D., Howell S.H. Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics. 2006;172:1155–1164.
7. Salvi S., Sponza G., Morgante M., Tomes D., Niu X., Fengler K.A., Meeley R., Ananiev E.V., Svitashev S., Bruggemann E. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proc. Natl. Acad. Sci. USA. 2007;104:11376–11381.
8. Li Y., Alvarez O.A., Gutteling E.W., Tijsterman M., Fu J., Riksen J.A., Hazendonk E., Prins P., Plasterk R.H., Jansen R.C. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet. 2006;2:e222.
9. Hubner N., Wallace C.A., Zimdahl H., Petretto E., Schulz H., Maciver F., Mueller M., Hummel O., Monti J., Zidek V. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet. 2005;37:243–253.
10. Bystrykh L., Weersing E., Dontje B., Sutton S., Pletcher M.T., Wiltshire T., Su A.I., Vellenga E., Wang J., Manly K.F. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics.’ Nat. Genet. 2005;37:225–232.
11. Schadt E.E., Lamb J., Yang X., Zhu J., Edwards S., Guhathakurta D., Sieberts S.K., Monks S., Reitman M., Zhang C. An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet. 2005;37:710–717.
12. Chesler E.J., Lu L., Shou S., Qu Y., Gu J., Wang J., Hsu H.C., Mountz J.D., Baldwin N.E., Langston M.A. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 2005;37:233–242.
13. Wang X., Tomso D.J., Chorley B.N., Cho H.Y., Cheung V.G., Kleeberger S.R., Bell D.A. Identification of polymorphic antioxidant response elements in the human genome. Hum. Mol. Genet. 2007;16:1188–1200.
14. Cheung V.G., Spielman R.S., Ewens K.G., Weber T.M., Morley M., Burdick J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005;437:1365–1369.
15. Deutsch S., Lyle R., Dermitzakis E.T., Attar H., Subrahmanyan L., Gehrig C., Parand L., Gagnebin M., Rougemont J., Jongeneel C.V., Antonarakis S.E. Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum. Mol. Genet. 2005;14:3741–3749.
16. Monks S.A., Leonardson A., Zhu H., Cundiff P., Pietrusiak P., Edwards S., Phillips J.W., Sachs A., Schadt E.E. Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet. 2004;75:1094–1105.
17. Morley M., Molony C.M., Weber T.M., Devlin J.L., Ewens K.G., Spielman R.S., Cheung V.G. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004;430:743–747.
18. Stranger B.E., Forrest M.S., Clark A.G., Minichiello M.J., Deutsch S., Lyle R., Hunt S., Kahl B., Antonarakis S.E., Tavare S. Genome-wide associations of gene expression variation in humans. PLoS Genet. 2005;1:e78.
19. Stranger B.E., Nica A.C., Forrest M.S., Dimas A., Bird C.P., Beazley C., Ingle C.E., Dunning M., Flicek P., Koller D. Population genomics of human gene expression. Nat. Genet. 2007;39:1217–1224.
20. Goring H.H., Curran J.E., Johnson M.P., Dyer T.D., Charlesworth J., Cole S.A., Jowett J.B., Abraham L.J., Rainwater D.L., Comuzzie A.G. Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat. Genet. 2007;39:1208–1216.
21. Dixon A.L., Liang L., Moffatt M.F., Chen W., Heath S., Wong K.C., Taylor J., Burnett E., Gut I., Farrall M. A genome-wide association study of global gene expression. Nat. Genet. 2007;39:1202–1207.
22. The International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–796.
23. Zhang W., Duan S., Kistner E.O., Bleibel W.K., Huang R.S., Clark T.A., Chen T.X., Schweitzer A.C., Blume J.E., Cox N.J., Dolan M.E. Evaluation of genetic variation contributing to differences in gene expression between populations. Am. J. Hum. Genet. 2008;82:631–640.
24. Abecasis G.R., Cardon L.R., Cookson W.O. A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 2000;66:279–292.
25. Abecasis G.R., Cookson W.O., Cardon L.R. Pedigree tests of transmission disequilibrium. Eur. J. Hum. Genet. 2000;8:545–551.
26. Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B. 1995;57:289–300.
27. Eisen M.B., Spellman P.T., Brown P.O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA. 1998;95:14863–14868.
28. Saeed A.I., Sharov V., White J., Li J., Liang W., Bhagabati N., Braisted J., Klapa M., Currier T., Thiagarajan M. TM4: A free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–378.
29. Messina D.N., Glasscock J., Gish W., Lovett M. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res. 2004;14:2041–2047.
30. Yvert G., Brem R.B., Whittle J., Akey J.M., Foss E., Smith E.N., Mackelprang R., Kruglyak L. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 2003;35:57–64.
31. Ng P.C., Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11:863–874.
32. Moffatt M.F., Kabesch M., Liang L., Dixon A.L., Strachan D., Heath S., Depner M., von Berg A., Bufe A., Rietschel E. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature. 2007;448:470–473.
33. Kapur K., Xing Y., Ouyang Z., Wong W.H. Exon arrays provide accurate assessments of gene expression. Genome Biol. 2007;8:R82.
34. Lawrence R., Evans D.M., Morris A.P., Ke X., Hunt S., Paolucci M., Ragoussis J., Deloukas P., Bentley D., Cardon L.R. Genetically indistinguishable SNPs and their influence on inferring the location of disease-associated variants. Genome Res. 2005;15:1503–1510.
35. Schleef R.R., Chuang T.L. Protease inhibitor 10 inhibits tumor necrosis factor alpha-induced cell death. Evidence for the formation of intracellular high M(r) protease inhibitor 10-containing complexes. J. Biol. Chem. 2000;275:26385–26389.
36. Shioji G., Ezura Y., Nakajima T., Ohgaki K., Fujiwara H., Kubota Y., Ichikawa T., Inoue K., Shuin T., Habuchi T. Nucleotide variations in genes encoding plasminogen activator inhibitor-2 and serine proteinase inhibitor B10 associated with prostate cancer. J. Hum. Genet. 2005;50:507–515.
37. Saito T., Hirai R., Loo Y.M., Owen D., Johnson C.L., Sinha S.C., Akira S., Fujita T., Gale M. Regulation of innate antiviral defenses through a shared repressor domain in RIG-I and LGP2. Proc. Natl. Acad. Sci. USA. 2007;104:582–587.
38. Hassett C., Aicher L., Sidhu J.S., Omiecinski C.J. Human microsomal epoxide hydrolase: Genetic polymorphism and functional expression in vitro of amino acid variants. Hum. Mol. Genet. 1994;3:421–428.
39. Sarmanova J., Benesova K., Gut I., Nedelcheva-Kristensen V., Tynkova L., Soucek P. Genetic polymorphisms of biotransformation enzymes in patients with Hodgkin's and non-Hodgkin's lymphomas. Hum. Mol. Genet. 2001;10:1265–1273.
40. Zusterzeel P.L., Peters W.H., Visser W., Hermsen K.J., Roelofs H.M., Steegers E.A. A polymorphism in the gene for microsomal epoxide hydrolase is associated with pre-eclampsia. J. Med. Genet. 2001;38:234–237.
41. Smith C.A., Harrison D.J. Association between polymorphism in gene for microsomal epoxide hydrolase and susceptibility to emphysema. Lancet. 1997;350:630–633.
42. Salam M.T., Lin P.C., Avol E.L., Gauderman W.J., Gilliland F.D. Microsomal epoxide hydrolase, glutathione S-transferase P1, traffic and childhood asthma. Thorax. 2007;62:1050–1057.
43. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics