NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
Mol Psychiatry. Author manuscript; available in PMC Mar 15, 2011.
Published in final edited form as:
PMCID: PMC3057235

Whole genome association mapping of gene expression in the human prefrontal cortex

Variations in gene expression among individuals may have multiple downstream implications, including an effect on disease risk. “Genetical genomics” (or expression genetics) uses linkage and association methods to map gene expression phenotypes, identifying the genetic variants that underlie expression quantitative trait loci (eQTLs). It represents a promising approach to identifying novel expression regulatory elements in the genome. Studies of human lymphoblastoid cell lines1, liver2 and brain3 have been reported. Myers et al.3 studied 193 neuropathologically normal human brain samples from three cortical regions using the Affymetrix 500K Array for genotyping and the Illumina HumanRefseq-8 Expression Array for gene expression measurements. They assessed association between 366,140 SNPs and the expression of 14,078 transcripts, and identified 433 SNP-transcript pairs (99 transcripts) that showed significant cis-association (transcript-specific empirical P value ≤ 0.05), but only 25 of them (involving two genes, KIF1B and IPP) remained significant after correcting for all the SNPs and phenotypes (transcripts) tested (Sidak multitranscript-corrected empirical P values ≤ 0.05). We would consider only the associations involving these two genes to be truly significant cis-associations, as they were the only ones surviving correction for all the statistical tests.

There are several major limitations in the Myers et al. study, including sample heterogeneity (pooled samples from three different cortical regions: frontal, temporal, and parietal), expression data confounded by uncontrolled covariates (particularly brain pH), and microarray batch effects. In any case, a replication study is a reasonable next step. We performed a new brain eQTL mapping study using psychiatric patient and control brains, focusing on the prefrontal cortex, with a statistical procedure optimized for covariates and microarray batch effects. We used Surrogate Variable Analysis (SVA)4 to remove covariate effects and ComBat5 to remove batch effects on gene expression before the SNP-expression association tests. These procedures, we hoped, would improve the power of detecting associations by removing sources of non-genetic variation from the data.

We obtained 164 brain samples from the Stanley Medical Research Institute (SMRI). These 164 samples came from two collections.6–8 (1) The Neuropathology Consortium has 60 brains, 56 of them Caucasian. The samples are from patients with Schizophrenia, Bipolar Disorder, or Major Depression, and from healthy controls. (2) The SMRI Array Collection contains another set of 105 samples, 103 of them Caucasian, from patients with Schizophrenia or Bipolar Disorder and from healthy controls. Diagnoses were made by two senior psychiatrists, using DSM-IV criteria and based on medical records and, when possible, telephone interviews with family members. Diagnoses of unaffected controls were based on structured interviews by a senior psychiatrist with family member(s) to rule out Axis I diagnoses.

These two sets of samples have been studied for gene expression in the prefrontal cortex (Brodmann area 46, dorsolateral prefrontal cortex, possibly including Brodmann area 10, frontal pole) by six investigators using five different microarray platforms. Data are available at the SMRI Online Genomics Database (https://www.stanleygenomics.org). Altar’s group is the only one that studied both Consortium and Array samples using the same microarray platform (Affymetrix Human Genome U133A). We chose this dataset (Study 1 and 2 in the online database) as our expression data, and obtained the CEL files of the raw gene expression data. These include 87 Array and 40 Consortium Caucasian samples. All these expression data were normalized with the robust multi-array average (RMA) method using Partek software (http://www.partek.com). RMA expression values were calculated based on scaling to a target intensity of 100 and transformed by log2(x + 20). The Affymetrix U133A array uses, on average, 11 probes per probeset to assay expression at the 3′ end of one transcript. The probeset is the expression measurement unit (phenotype) in this study. A total of 22,277 probesets were assayed in U133A. We selected 6,968 probesets that were coded as “present” by the Affymetrix Microarray Suite (MAS) call algorithm in ≥ 80% of samples.
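The detection filter and log transform described above can be illustrated with a minimal Python sketch. It is not the Partek or MAS workflow itself: the function names, the probeset IDs, and the toy calls are hypothetical, and it assumes the detection calls are coded 'P' (present), 'M' (marginal), and 'A' (absent).

```python
import math

def log2_offset(intensity, offset=20.0):
    """Apply the log2(x + offset) transform quoted in the text."""
    return math.log2(intensity + offset)

def select_present_probesets(calls, min_fraction=0.80):
    """Return probeset IDs called 'P' in at least min_fraction of samples.

    `calls` maps a probeset ID to a list of detection calls, one per sample.
    Only 'P' counts as present; 'M' and 'A' do not (an assumption).
    """
    kept = []
    for probeset, sample_calls in calls.items():
        fraction = sum(c == "P" for c in sample_calls) / len(sample_calls)
        if fraction >= min_fraction:
            kept.append(probeset)
    return kept

# Toy data: five samples, two hypothetical probesets.
calls = {
    "probeset_a": ["P", "P", "P", "P", "A"],  # present in 4 of 5 samples
    "probeset_b": ["P", "A", "A", "M", "P"],  # present in 2 of 5 samples
}
kept = select_present_probesets(calls)
print(kept)
```

Whether marginal calls should count toward the 80% threshold is not stated in the text; the sketch excludes them.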

We used Surrogate Variable Analysis (SVA)4 to identify known and unknown covariates influencing the gene expression data. The residuals from SVA were then passed to ComBat5 to remove batch effects. The effects of known variables on the gene expression data were assessed using linear regression before and after SVA and ComBat. All samples have data on collection group, diagnosis, age, gender, race, postmortem interval (PMI), brain pH, smoking, alcohol use, suicide status, and psychotic features. We used these variables as covariates in the analysis. Drug and alcohol use were dichotomized into “Heavy” and “Not heavy” (as defined by SMRI). Age, PMI, pH, and lifetime antipsychotics data were analyzed as quantitative covariates. Other covariates were analyzed as binary covariates. Summary information about the sample demographic data and covariates can be found in Supplementary Table ST1.

The raw microarray expression data demonstrated strong effects of brain pH (significant in 57% of probes) and batch effects (significant in 48% of probes). After SVA and ComBat processing, and assessing significance of covariates by permuting within batches, the proportions of genes showing significant pH and batch effects (p<0.05) were reduced to 2% and 5% respectively, which are close to chance expectation (Supplementary Table ST2).

The 6,968 residuals obtained from SVA/ComBat were used as phenotypes for association analysis. All residuals were standardized to have a mean of 0 and standard deviation of 1.
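As an illustration of the standardization step, the sketch below z-scores one probeset's residuals across samples. The residual values are made up, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not say which estimator was used.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in values]

# Hypothetical residuals for one probeset across five samples.
residuals = [0.4, -1.2, 0.9, 0.1, -0.2]
z = standardize(residuals)
print([round(v, 3) for v in z])
```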

Genomic DNA from the same individuals was extracted from frozen cerebellum tissue provided by the SMRI, following a modified phenol/chloroform/isoamyl alcohol protocol.9 The DNA was resuspended in 0.1 mM EDTA TE buffer. Concentration was measured with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE), and DNA integrity was checked on a 1% agarose gel. We used the GeneChip Mapping 5.0 Array and Assay Kit (Affymetrix, Santa Clara, CA) for genotyping, following the Affymetrix protocol. Genotypes were called using the BRLMM-p algorithm (Affymetrix) with all arrays analyzed simultaneously. SNP call rates ranged from 97.3% to 99.58% (average 98.9%). In the 156 Caucasian samples, 238,389 out of 443,816 SNPs had call rates ≥ 99%, minor allele frequency ≥ 10%, and Hardy-Weinberg Equilibrium (HWE) p ≥ 0.001. These 238,389 SNPs were used to test for correlations with gene expression.
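The three SNP filters (call rate, minor allele frequency, HWE) can be sketched for a single SNP as follows. This is an illustration only: the function name and genotype counts are hypothetical, and the HWE p-value here comes from a 1-df chi-square goodness-of-fit test, which is a simplification and not necessarily the test used in the study.

```python
import math

def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_call_rate=0.99, min_maf=0.10, min_hwe_p=0.001):
    """Return (passes, call_rate, maf, hwe_p) from one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        return False, call_rate, maf, 1.0  # monomorphic SNP fails the MAF filter
    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n_called * p * p, 2 * n_called * p * q, n_called * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    passes = (call_rate >= min_call_rate and maf >= min_maf
              and hwe_p >= min_hwe_p)
    return passes, call_rate, maf, hwe_p

# Hypothetical counts in 156 samples.
good = snp_qc(100, 50, 6, 0)        # complete calls, common, near HWE
low_call = snp_qc(90, 50, 6, 10)    # 10 missing genotypes
hwe_fail = snp_qc(78, 0, 78, 0)     # no heterozygotes at all
print(good[0], low_call[0], hwe_fail[0])
```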

We used the programs STRUCTURE,10 PLINK,11 and EIGENSTRAT12 to verify sample ethnic homogeneity, and PLINK11 pairwise identity-by-state and identity-by-descent calculation to examine cryptic relatedness. The results confirmed that 127 selected samples are unrelated Caucasians, and these were used for genotype-expression association tests.

Gene expression regulation can be roughly divided into two types: cis-acting regulation by DNA elements in or adjacent to the transcripts, and trans-acting regulation by factors from genomic regions distal to the transcripts, including regions on different chromosomes. We defined the SNPs within one Mb of either end of each expression probeset as candidates for cis-analysis. All the other SNPs were analyzed for trans-acting associations with each gene. Forty expression probesets were excluded from the cis-analysis because they have multiple homologs in the genome. For these probesets, all SNPs were treated as trans-candidates and analyzed together with the other trans-analyses.
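The cis/trans assignment above can be expressed as a small Python sketch. The function name and the coordinates are hypothetical, and treating the window boundaries as inclusive is an assumption.

```python
CIS_WINDOW_BP = 1_000_000  # one Mb on either side of the probeset

def classify_snp(snp_chrom, snp_pos, probe_chrom, probe_start, probe_end,
                 multi_homolog=False, window=CIS_WINDOW_BP):
    """Label a SNP-probeset pair as 'cis' or 'trans'.

    Probesets with multiple genomic homologs are never assigned a
    cis-region, so every SNP is a trans-candidate for them.
    """
    if multi_homolog:
        return "trans"
    same_chrom = snp_chrom == probe_chrom
    in_window = probe_start - window <= snp_pos <= probe_end + window
    return "cis" if same_chrom and in_window else "trans"

# Hypothetical probeset spanning chr1:2,000,000-2,001,000.
print(classify_snp("chr1", 1_500_000, "chr1", 2_000_000, 2_001_000))  # cis
print(classify_snp("chr1", 3_500_000, "chr1", 2_000_000, 2_001_000))  # trans
print(classify_snp("chr2", 2_000_500, "chr1", 2_000_000, 2_001_000))  # trans
```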

We used PLINK11 to perform linear regression analysis to test for correlation between expression residuals and genotype (additive genetic model; the number of minor alleles at each SNP). From this analysis, an asymptotic p-value from the Wald statistic was obtained. To control for batch effects, we defined clusters of individuals within each mRNA microarray and genotyping batch, and all permutations were performed within these clusters. Permutations were performed by swapping sets of phenotypes between individuals. This preserves the relationships among genotypes (and so controls for LD) and among the grouped phenotypes (thus controlling for any correlations between expression probes). Two sets of permutations were done. First, permutations for each expression-SNP combination were calculated with the adaptive permutation option of PLINK, permuting up to 1 billion replicates (EMP_P). This corrects for possible non-normality of the phenotype distribution. Second, permutations correcting for multiple testing within a cis-region or whole genome scan were performed using the max(T) permutation option of PLINK (Regionwide_P for cis-; Genomewide_P for trans-). For each phenotype, results were permuted 1,000 times, using the same seed to maintain the correlation between phenotypes. The most significant statistic per replicate was saved using the PLINK mperm-save option. To estimate phenotype-wide significance, the most significant statistic per replicate across all phenotypes was obtained (statbest). Phenotype-wide corrected p-values were calculated as (R+1)/(N+1), where R is the number of times statbest exceeded the observed statistic and N is the number of permutations (1,000).
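The permutation logic can be sketched in plain Python under stated assumptions. The code below does not call PLINK and is not the authors' pipeline: all function names and toy data are hypothetical, the statistic is the absolute t value of the slope from a simple additive regression, and the maximum is taken over all SNPs and phenotypes in one pass rather than by combining saved per-phenotype maxima. Each permutation moves an individual's whole set of phenotypes together and only within that individual's batch cluster.

```python
import math
import random

def wald_t(genotypes, phenotype):
    """t statistic for the slope of phenotype ~ minor-allele count."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    rss = sum((y - my - beta * (x - mx)) ** 2
              for x, y in zip(genotypes, phenotype))
    return beta / math.sqrt(rss / (n - 2) / sxx)

def permute_within_clusters(clusters, rng):
    """Index permutation that shuffles individuals only inside their cluster."""
    order = list(range(len(clusters)))
    members_by_cluster = {}
    for i, c in enumerate(clusters):
        members_by_cluster.setdefault(c, []).append(i)
    for members in members_by_cluster.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for src, dst in zip(members, shuffled):
            order[src] = dst
    return order

def max_stat_permutations(genotypes, phenotypes, clusters, n_perm, seed):
    """Best |t| over all SNP-phenotype pairs for each permutation (statbest)."""
    rng = random.Random(seed)
    statbest = []
    for _ in range(n_perm):
        order = permute_within_clusters(clusters, rng)
        # Apply the same reordering to every phenotype of an individual.
        permuted = [[pheno[j] for j in order] for pheno in phenotypes]
        statbest.append(max(abs(wald_t(snp, pheno))
                            for snp in genotypes for pheno in permuted))
    return statbest

def phenotype_wide_p(observed_stat, statbest):
    """(R + 1) / (N + 1), with R = replicates whose best statistic exceeds it."""
    r = sum(s > observed_stat for s in statbest)
    return (r + 1) / (len(statbest) + 1)

# Toy data: 8 individuals in two batch clusters, 2 SNPs, 2 probesets.
genotypes = [[0, 1, 2, 1, 0, 2, 1, 0], [1, 1, 0, 2, 0, 1, 2, 0]]
phenotypes = [[-1.1, 0.2, 1.3, 0.1, -0.9, 1.0, -0.2, -1.4],
              [0.5, -0.3, 1.2, -0.8, 0.9, -1.5, 0.3, -0.1]]
clusters = ["b1"] * 4 + ["b2"] * 4
observed = abs(wald_t(genotypes[0], phenotypes[0]))
statbest = max_stat_permutations(genotypes, phenotypes, clusters,
                                 n_perm=200, seed=1)
p_corrected = phenotype_wide_p(observed, statbest)
print(round(observed, 3), p_corrected)
```

Because R counts only replicates that strictly exceed the observed statistic, the smallest attainable corrected p-value with N permutations is 1/(N+1).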

In the cis-analysis, 3,951 SNP-expression probeset pairs, consisting of 3,530 SNPs and 903 probesets (of 826 genes), were significantly correlated, with region-wide permuted p (Regionwide_P) ≤ 0.05. We found that 562 associations (involving 106 genes) are significant after correcting for the 6,928 expression phenotypes that were tested for cis-association (Phenotype-wide_P ≤ 0.05). Seventy-two expression probesets had 3 to 20 different SNPs from their cis-region showing phenotype-wide significant associations. These are eQTLs supported by multiple SNPs in the same region. The cis-associations show effect sizes (R2) ranging from 0.05 to 0.67. The top 10 signals by Wald P among these 106 genes are shown in Table 1. The complete cis-regulation list can be found in Supplementary Table ST3.

Table 1
Top Ten Cis- Associations in Human Prefrontal Cortex

In the trans-analysis, 241 SNP-transcript probeset pairs from 239 SNPs and 160 probesets (157 genes) had associations at permutation-corrected Genomewide_P ≤ 0.05 (Supplementary Table ST4). None, however, is significant after further phenotype-wide correction.

Interestingly, pathway and functional analysis of the regulated genes in both cis- and trans-associations using Ingenuity Pathway Analysis (www.ingenuity.com) shows that “protein degradation and protein synthesis” are the most enriched function groups. Protein ubiquitination is the most enriched canonical pathway (Supplementary Table ST5). This suggests that protein metabolism may be affected by genetic variants with detectable effects more than other biological systems are.

We detected one SNP (rs17733118, upstream of ZFP64, a zinc finger protein homolog) showing associations with two distinct genes, VPS8 and CTNNA1. ZFP64 is thus a potential master regulator of the expression of VPS8 and CTNNA1, though nothing is known about their interactions so far. It is again worth noting that these trans-associations do not reach phenotype-wide significance and therefore have a good chance of being false positives.

In the previous study on brain samples,3 Myers et al. identified 433 SNP-transcript pairs (99 transcripts) showing region-wide cis-association (corrected for all the SNPs tested in each cis-region) but only two genes showing phenotype-wide significant cis-association. We analyzed 366 genes, which showed region-wide significant cis-association in our study, in the 46 frontal cortex subset samples from Myers’ study using our SVA-ComBat procedure. To maximize power, missing data were imputed using nearest neighbor averaging prior to SVA analyses. Information was available for sex, age, transcripts detected rate (TDR), sample collection institution and batch dates (but not brain pH). Effects of these covariates were analyzed using regression in the pre- and post-SVA + ComBat data. In the preprocessed data, batch effects, institution, and TDR were significant in 40%, 15%, and 20% of the probes, respectively. In the postprocessed data, permuting within batch cluster, batch effects, institution, and TDR were significant in 5%, 6%, and 0% of the probes, respectively. Association analyses were performed in the same manner as with the SMRI data. Only the SNPs that were significant in our study were tested. Thus, region-wide significance here refers to significance after correction for the number of SNPs actually analyzed rather than for the whole cis-region. Of the 826 genes showing associations with Regionwide_P < 0.05 in the SMRI data, only 366 genes could be tested in the Myers data. Defining replication as association with the same SNP in the same direction (the same allele increases or decreases gene expression), 103 associations involving 45 genes are region-wide significant. Among them, 26 associations involving seven genes are phenotype-wide significant in the replicate sample (Table 2 shows the best association for each of the 45 genes).
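The replication criterion (same SNP, same direction, significant in the replicate sample) can be written as a short check. This is a sketch with hypothetical names and values; it assumes both effect estimates are coded for the same allele of the same SNP, and the 0.05 threshold with a non-strict comparison is an assumption.

```python
def replicates(discovery_beta, replication_beta, replication_p, alpha=0.05):
    """True if the replicate effect has the same sign and is significant.

    Both betas must refer to the same allele of the same SNP.
    """
    same_direction = discovery_beta * replication_beta > 0
    return same_direction and replication_p <= alpha

# Hypothetical effect sizes and corrected p-values.
print(replicates(0.8, 0.5, 0.01))    # same direction, significant
print(replicates(0.8, -0.5, 0.01))   # opposite direction
print(replicates(0.8, 0.5, 0.20))    # same direction, not significant
```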

Table 2
Cis-Associations that Are Region-wide and/or Phenotype-wide Significant in both SMRI and Myers' Samples in the Same Association Direction (Best Association for Each Gene)

The relatively low level of replication is not surprising, and lack of replication does not invalidate either set of findings. The replicate sample is very small (only 46 samples). Normally the replicate sample should be larger than the initial sample to have sufficient power to reproduce the initial findings. There are also differences in brain region (frontal cortex in Myers’ study; prefrontal cortex in our study) and in demographics (average age 81 in Myers’ samples; 45 in this study).

Myers et al. reported the association of RPS26 with SNP rs11171739 as an example of replication of the finding of Cheung et al. in lymphoblastoid cells.1 We observed this association as well. Another study, of liver, also identified the RPS26-rs2292239 correlation as one of its strongest associations.2 RPS26 seems to be one of the most strongly genetically regulated genes in the human genome.

Seven SNPs in a 125 Kb genomic region showed cis-association with two different genes, ALDH8A1 and HBS1L, at the phenotype-wide significance level (Table 3). These genes may be considered co-regulated transcripts. ALDH8A1 and HBS1L are transcribed in the same direction, with increased expression associated with the same SNP allele. They might be derived from a polycistronic transcript, though polycistronic transcription, except for microRNA clusters, has rarely been reported or studied in humans so far.13

Table 3
HBS1L and ALDH8A1 share cis- associations

Since we had psychiatric disorder patient samples in this study, we were interested in knowing whether the sample composition influenced the detected regulatory elements. In the covariate analysis of the expression data, we found that disease diagnoses contribute very little to the global variation in gene expression level, both before and after the SVA/ComBat adjustment, compared with many other factors, including PMI and brain pH (Supplementary Table ST2). After regressing out factors including affection status, we found that affection status has little effect on the eQTL mapping results in this study.

It is conceivable that genetic variants have a stronger and more direct impact on regional cis-regulation of gene expression, while distant trans-regulation involves more factors and thus shows smaller genetic effects. We identified a large number of cis-associations that withstand strict statistical correction for multiple testing. No trans-associations are significant after correction for the number of SNPs and phenotypes analyzed. Other eQTL studies have claimed detection of trans-regulation with region-wide and occasionally phenotype-wide significance, but there is little consistency between studies.14–17 The difficulty of replicating trans-eQTLs has been observed previously.15 We advocate the use of phenotype-wide significance, which might help to reduce the false positives that would be more difficult to replicate.

We note that this study focuses on genes that have observable, relatively large variation in expression, and on SNPs with common minor alleles, in order to have well-powered SNP-expression pairs for eQTL mapping. A considerable number of important neuropsychiatric disease candidate genes, including 5-HTT (SLC6A4), DRD1, DRD2, DRD3, DRD4, DRD5, GRIA1, GRIA3, GRIA4, GRIN1, GRIN2B, GRIN2C, GRIN2D, PER1, CRY1, CRY2, and others, were not assessed in this study because their expression probes were filtered out due to low detection levels.

Supplementary Material



Data and biomaterial access. The genotype and expression data files used in the paper are available at https://www.stanleygenomics.org/index.html and upon request from the authors. DNA and RNA samples are also available for application through SMRI (http://www.stanleyresearch.org/dnn/BrainResearchLaboratorybrBrainCollection/tabid/83/Default.aspx).

Reference List

1. Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. Nature. 2005;437:1365–1369.
2. Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, et al. PLoS Biol. 2008;6:e107.
3. Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, Marlowe L, et al. Nat Genet. 2007;39:1494–1499.
4. Leek JT, Storey JD. PLoS Genet. 2007;3:1724–1735.
5. Johnson WE, Li C, Rabinovic A. Biostatistics. 2007;8:118–127.
6. Knable MB, Barci BM, Webster MJ, Meador-Woodruff J, Torrey EF. Mol Psychiatry. 2004;9:609–620, 544.
7. Torrey EF, Webster M, Knable M, Johnston N, Yolken RH. Schizophr Res. 2000;44:151–155.
8. Torrey EF, Barci BM, Webster MJ, Bartko JJ, Meador-Woodruff JH, Knable MB. Biol Psychiatry. 2005;57:252–260.
9. Gross-Bellard M, Oudet P, Chambon P. Eur J Biochem. 1973;36:32–38.
10. Falush D, Stephens M, Pritchard JK. Genetics. 2003;164:1567–1587.
11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. Am J Hum Genet. 2007;81:559–575.
12. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Nat Genet. 2006;38:904–909.
13. Blumenthal T. Bioessays. 1998;20:480–487.
14. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, et al. Nature. 2004;430:743–747.
15. Goring HH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, et al. Nat Genet. 2007;39:1208–1216.
16. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, et al. Nature. 2008;452:423–428.
17. Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, et al. Nat Genet. 2007;39:1202–1207.