NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
Schizophr Res. Author manuscript; available in PMC Sep 1, 2011.
Published in final edited form as:
PMCID: PMC2933424

Common variants conferring risk of schizophrenia: a pathway analysis of GWAS data


Unlike the typical single-marker analysis in genome-wide association studies (GWAS), we performed a pathway-based analysis to detect the combined effects of genes on schizophrenia. We applied Gene Set Enrichment Analysis (GSEA) and the hypergeometric test, and combined their results using Fisher's combined method. A few pathways were consistently top ranked and likely associated with schizophrenia by these methods; they are related to glutamate metabolism, apoptosis, inflammation, and the immune system (e.g., the glutamate metabolism pathway, the TGF-beta signaling pathway, and the TNFR1 pathway). The genes involved in these pathways had not been detected by single marker analysis, suggesting that this approach may complement the original analysis of the GWAS dataset.

Keywords: Schizophrenia, gene set enrichment analysis, GWAS, pathway, candidate gene

1. Introduction

Genome-wide association studies (GWAS) have become a powerful approach to searching for common genetic variants that increase susceptibility to complex diseases or traits. So far, the search for common susceptibility variants has been less successful in schizophrenia than in many other complex diseases/traits (O'Donovan et al. 2009). Among several recent schizophrenia GWA studies, essentially no marker or gene has achieved genome-wide statistical significance in any single study (Purcell et al. 2009; Shi et al. 2009; Stefansson et al. 2009; Sullivan et al. 2008), although combining data from several studies suggested that the MHC region on chromosome 6p and a few other genes (e.g., NRGN and TCF4) might be promising for future validation (Purcell et al. 2009; Shi et al. 2009; Stefansson et al. 2009). It is commonly accepted that schizophrenia may result from many genes or genetic variants, each of which makes a small risk contribution and which interact with each other or with environmental factors to cause the disorder. Nevertheless, the genetic signal has always been examined at the single marker level in schizophrenia GWA studies.

Here we examined the association signal of GWAS markers in a set of genes categorized by biological pathways, assuming a complex disease such as schizophrenia may result from a number of genes which disrupt one or more pathways. To reduce bias, we applied two statistical methods to identify overrepresented pathways in a single GWAS dataset. The first method is Gene Set Enrichment Analysis (GSEA), which was initially developed for microarray gene expression analysis (Subramanian et al. 2005) but was recently adapted to GWA studies. The second method is the hypergeometric test which identifies pathways overrepresented with significant genes. We identified 4 pathways that had P value <0.05 by both methods. We further combined the P values using Fisher’s method (Fisher 1932) to assess the consistency of evidence. Importantly, these pathways are related to glutamate metabolism, the process of apoptosis, inflammation, and the immune system, implicating their involvement in the underlying pathology of schizophrenia.

2. Methods and Materials

2.1 GWAS Data Preparation

We used the GAIN (Genetic Association Information Network) GWAS dataset for schizophrenia, since most other schizophrenia GWAS datasets (e.g., ISC GWAS) have not been publicly available to general investigators (Manolio et al. 2007). The data access was approved by the GAIN DAC through the National Human Genome Research Institute, and the data were recently used in our candidate gene selection for schizophrenia (Sun et al. 2010; Sun et al. 2009). The data were extracted from the NCBI dbGaP (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gap). Unrelated European ancestry samples (1158 schizophrenia cases and 1378 controls) were used in this analysis. We excluded SNPs whose missing genotype rate was >0.1, whose minor allele frequency (MAF) was <0.01, or whose Hardy-Weinberg equilibrium (HWE) test P value was ≤0.001. This resulted in a total of ~725,000 SNPs. According to a previous analysis, there was no significant stratification in the GAIN samples of European ancestry (Shi et al. 2009); thus, we used the basic allelic test (chi-square, 1 df) to compute the association of each SNP with schizophrenia. Supplementary Figure 1 provides the corrected quantile-quantile (Q-Q) plot of all the SNPs we used. All P values were corrected for λ. The red line indicates the expectation if the observed distribution did not deviate from the expected distribution.
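The quality-control filter and the allelic test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the allele counts in the usage note are made up, and the genomic-control step (dividing each statistic by λ, with λ estimated from the median statistic and floored at 1) is a common convention that the text does not spell out.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def passes_qc(missing_rate, maf, hwe_p):
    """Keep a SNP unless missing rate > 0.1, MAF < 0.01, or HWE P <= 0.001."""
    return missing_rate <= 0.1 and maf >= 0.01 and hwe_p > 0.001

def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """1-df chi-square statistic on a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    num = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    den = ((case_a + case_b) * (ctrl_a + ctrl_b)
           * (case_a + ctrl_a) * (case_b + ctrl_b))
    return num / den if den else 0.0

def chi2_sf_1df(x):
    """Upper-tail P value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def genomic_control(chi2_values):
    """Assumed convention: lambda = median(chi2) / 0.4549, floored at 1."""
    lam = max(1.0, statistics.median(chi2_values) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_values]
```

For example, `allelic_chi2(30, 70, 20, 80)` evaluates the test on hypothetical counts of 30/70 alleles in cases and 20/80 in controls, and `chi2_sf_1df` converts the statistic to a P value.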

We mapped a SNP to a gene if it was located within the gene or within 20 kb immediately upstream or downstream of the gene. The most significant SNP of each gene was chosen to represent the association of the gene in the follow-up analysis. Canonical pathways were downloaded from MSigDB (Subramanian et al. 2005), which included major pathways from several public resources such as the KEGG (http://www.genome.jp/kegg/) and BioCarta (http://www.biocarta.com/genes/index.asp) databases. To avoid stochastic bias and to avoid testing overly general biological processes, we discarded pathways that contained fewer than 10 or more than 250 genes. After this SNP-gene and gene-pathway mapping process, we had 369,808 SNPs mapped to 19,896 protein-coding genes, which were involved in 511 biological pathways.
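A minimal sketch of the SNP-to-gene and gene-to-pathway mapping follows. The data layout (tuples of identifiers, chromosome, and coordinates) and the function names are hypothetical. The sketch also assumes that pathway size is counted on the pathway's annotated gene list; the text does not say whether size was counted before or after restricting to genes covered by SNPs.

```python
def map_snps_to_genes(snps, genes, flank=20_000):
    """Return {gene: largest chi-square among SNPs within the gene +/- flank}.

    snps:  iterable of (snp_id, chrom, pos, chi2)
    genes: iterable of (gene, chrom, start, end)
    """
    best = {}
    for gene, g_chrom, start, end in genes:
        for _snp_id, s_chrom, pos, chi2 in snps:
            in_window = start - flank <= pos <= end + flank
            if s_chrom == g_chrom and in_window:
                if chi2 > best.get(gene, float("-inf")):
                    best[gene] = chi2
    return best

def filter_pathways(pathways, min_size=10, max_size=250):
    """Drop pathways with fewer than min_size or more than max_size genes."""
    return {name: set(members) for name, members in pathways.items()
            if min_size <= len(set(members)) <= max_size}
```

The nested loop is quadratic and is meant only to make the mapping rule explicit; a real analysis over hundreds of thousands of SNPs would use an interval index or sorted coordinates.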

2.2 Gene Set Enrichment Analysis (GSEA)

The original GSEA algorithm was introduced in Subramanian et al. (2005). Briefly, it is a weighted Kolmogorov-Smirnov-like test that examines whether two distributions differ significantly. There are three main steps, as described in Wang et al. (2007).

1) For each SNP, we first calculated its χ2 statistic by a basic case-control allelic association test and then selected the SNP that had the largest χ2 value in a gene region (denoted as r) to represent the extent of association of the gene with the disease (i.e., schizophrenia). We next sorted all the genes by their χ2 values so that genes with stronger association were ranked at the top of the list.

2) For each pathway (i.e., gene set S), a running sum statistic (enrichment score, ES) was computed according to the following formula:

$$ES(S) = \max_{1 \le i \le N} \left\{ \sum_{g_j \in S,\ j \le i} \frac{|r_j|^m}{N_R} - \sum_{g_j \notin S,\ j \le i} \frac{1}{N - N_S} \right\}$$

where N is the total number of genes included in a GWA study, i is a position in the ranked gene list, j is a position at or before i in the gene list, r_j is the χ2 statistic value of gene j, g_j denotes the gene at position j, $N_R = \sum_{g_j \in S} |r_j|^m$, and N_S is the number of genes in the pathway of interest. Of note, when m equals 0, ES(S) reduces to the Kolmogorov-Smirnov statistic. We set m = 1, as used in the original GSEA application, to weight the genes by their association level (r_j). ES measures the maximum deviation of the pathway from a random walk (Subramanian et al. 2005; Wang et al. 2007).

3) Permutation was performed on the original GWAS data by swapping the labels of cases and controls while maintaining the same case/control ratio. In this way, the structure between SNPs and genes is maintained while phenotype status is randomized. This step tests whether an enriched pathway is also significantly associated with the disease and makes the ES(S) values of different pathways comparable. We performed 10,000 permutations. For each permutation (π), we calculated the enrichment score, denoted as ES(S, π). Then, for each pathway, the original ES(S) was normalized against the 10,000 ES(S, π) values, which generated a normalized enrichment score NES(S) by

$$NES(S) = \frac{ES(S) - \mathrm{mean}\left(ES(S, \pi)\right)}{\mathrm{SD}\left(ES(S, \pi)\right)}$$
In this way, for each pathway, ES(S) and ES(S, π) are compared against the same background distribution in terms of pathway size, gene length, SNP density, etc. Specifically, this approach avoids the gene length bias arising from brain- or neuro-related genes, which tend to be large. Because the comparison of ES(S) and ES(S, π) in the normalization process is based on the same gene set, there is no bias towards gene length or SNP density. The resultant NES(S) values were normally distributed and comparable to each other, including for pathways with long genes and high SNP density. A nominal P value was computed for each pathway as the number of permutations in which ES(S, π) was greater than or equal to the observed ES(S), divided by the total number of permutations.
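The three GSEA steps can be illustrated with a short Python sketch. It is a simplified illustration, not the implementation used in this study: the function names are hypothetical, the permuted enrichment scores are assumed to have been computed elsewhere (by permuting case/control labels, recomputing the SNP statistics, and re-ranking the genes), and the sample standard deviation is assumed for the normalization.

```python
import statistics

def enrichment_score(ranked, gene_set, m=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum statistic.

    ranked: list of (gene, chi2) sorted by chi2 in descending order.
    gene_set: set of gene names in the pathway S.
    """
    n = len(ranked)
    n_s = sum(1 for gene, _ in ranked if gene in gene_set)        # pathway size
    n_r = sum(abs(r) ** m for gene, r in ranked if gene in gene_set)
    if n_s == 0 or n_s == n or n_r == 0:
        return 0.0  # degenerate gene set: no walk to evaluate
    miss_step = 1.0 / (n - n_s)
    running, best = 0.0, float("-inf")
    for gene, r in ranked:
        if gene in gene_set:
            running += abs(r) ** m / n_r   # step up, weighted by association
        else:
            running -= miss_step           # step down by a constant
        best = max(best, running)
    return best

def normalize_and_test(es_obs, es_perm):
    """Return (NES, nominal P) from the observed ES and permuted ES values."""
    mean = statistics.mean(es_perm)
    sd = statistics.stdev(es_perm)  # assumption: sample SD
    nes = (es_obs - mean) / sd
    p = sum(1 for es in es_perm if es >= es_obs) / len(es_perm)
    return nes, p
```

With m = 0 every pathway gene contributes the same step size, which corresponds to the unweighted Kolmogorov-Smirnov case noted in step 2.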

2.3 Hypergeometric Test

To test whether a gene set is overrepresented in the GWAS dataset using the hypergeometric distribution, we first defined “interesting genes”. A gene was selected to be of interest if any GAIN marker mapped to the gene had P < 0.01. This P value cutoff is arbitrary but has appeared to be useful as a first step. Assuming that 1) L is the total number of genes considered in a genome (i.e., represented by GWAS data and having pathway annotations) and M is the number of interesting genes out of L, and 2) for a gene set (i.e., a pathway), S is the number of its genes within L and x is the number of its genes within M, the P value based on the hypergeometric distribution can be computed as:

$$P = 1 - \sum_{i=0}^{x-1} \frac{\binom{S}{i}\binom{L-S}{M-i}}{\binom{L}{M}}$$

This P value indicates the probability of observing at least x interesting genes in the current gene set. As with GSEA, we performed permutation, estimated nominal P values, and performed multiple testing correction using the Benjamini-Hochberg method (Benjamini and Hochberg 1995).
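The hypergeometric tail probability can be computed directly with exact integer arithmetic, as in the sketch below. The function name is hypothetical and the numbers in the usage note are a toy example rather than study data; the sketch computes the analytic P value only and does not include the permutation step.

```python
from math import comb

def hypergeom_upper_tail(L, M, S, x):
    """P(X >= x) when M interesting genes are drawn from L genes,
    of which S belong to the pathway."""
    upper = min(S, M)  # the pathway cannot contain more hits than this
    favorable = sum(comb(S, i) * comb(L - S, M - i) for i in range(x, upper + 1))
    return favorable / comb(L, M)
```

For example, with L = 10 genes, M = 3 interesting genes, and a pathway of S = 4 genes, `hypergeom_upper_tail(10, 3, 4, 2)` gives the probability that at least 2 of the interesting genes fall in the pathway, which is 40/120 = 1/3.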

2.4 Fisher’s Method

Fisher’s method for combining multiple P values from different tests is

$$X^2 = -2 \sum_{i=1}^{k} \ln(P_i)$$

where P_i is the P value for the ith test and k is the total number of tests (Fisher 1932). X^2 follows a chi-square distribution with 2k degrees of freedom. We used Fisher’s method to combine the nominal P values computed for each pathway by the two methods in order to identify pathways that show consistent significance by both.
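As a worked illustration, the sketch below evaluates Fisher's statistic and its P value using the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so no statistics library is needed. The function name is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Return (X2, combined P) for k independent P values.

    For 2k degrees of freedom the chi-square upper tail is
    exp(-X2/2) * sum_{i=0}^{k-1} (X2/2)^i / i!.
    """
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x2, math.exp(-half) * total
```

Applied to the nominal P values reported for the TOB1 pathway in the Results (0.070 by GSEA and 0.036 by the hypergeometric test), `fisher_combined_p([0.070, 0.036])` returns a combined P value of about 0.018, in agreement with the value given there.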

3. Results and Discussion

We found 6 pathways with significant nominal P values (P <0.05) by the GSEA method, and 10 by the hypergeometric test. The following four pathways had nominal P values <0.05 by both methods: CARM_ER pathway (BioCarta), glutamate metabolism (BioCarta), TNFR1 pathway (BioCarta), and TGF beta signaling pathway (KEGG). Table 1 lists these overrepresented pathways ordered by GSEA NES value. An additional 7 pathways had a nominal P value <0.05 by either method (Supplementary Table 1). When we used Fisher's method to combine the nominal P values of GSEA and the hypergeometric test, we found that 9 out of these 11 pathways had a combined P value <0.05 and one (glutamate metabolism) passed Benjamini-Hochberg multiple testing correction (Supplementary Table 1). Overall, the results based on these methods were consistent.
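For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, a minimal sketch of the adjusted P value computation is given here. The function name and the input values in the usage note are hypothetical; the text does not state how many tests entered the correction, so this is not an attempt to reproduce the reported adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` scales each P value by n divided by its rank and then takes a running minimum from the largest rank downward, so that the adjusted values never decrease with rank.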

Table 1
Pathways overrepresented in the GAIN GWAS dataset (both nominal P from GSEA and hypergeometric test < 0.05)

Specifically, the glutamate metabolism pathway had a nominal P value of 0.004 by GSEA, a nominal P value of 0.004 by the hypergeometric test, a Fisher’s combined P value of 1.75 × 10^−4, and a Benjamini-Hochberg corrected P value of 0.018 (Supplementary Table 1). Glutamate metabolism has been linked to schizophrenia based upon the ability of NMDA receptor antagonists such as phencyclidine, ketamine, and MK-801 to mimic the cognitive impairment and some symptoms of schizophrenia. Glutamate is the primary excitatory neurotransmitter in the central nervous system (CNS). Glutamate can be synthesized from glutamine by glutaminase (GLS) and can be metabolized to GABA by glutamate decarboxylase 1 (GAD1). GABA, the main inhibitory neurotransmitter, has also been identified as a susceptibility factor for schizophrenia; it can be further metabolized by 4-aminobutyrate aminotransferase (ABAT) and aldehyde dehydrogenase 5 family, member A1 (ALDH5A1). Additionally, glutamate can be converted to glutathione (GSH) by glutamate-cysteine ligase, catalytic subunit (GCLC). Both genetic and functional studies have revealed that an impairment in glutathione synthesis might be associated with schizophrenia (Gysin et al. 2007). Among the 24 genes in this pathway examined in the GAIN GWAS, ten had a P value <0.05 based on the original association analysis (Table 2). The gene-wise P values, measured by the most significant SNP in each gene region, were within a range of 0.002–0.037. Similar P value ranges were observed in the other three top ranked pathways (Supplementary Table 2), suggesting that multiple moderate-risk genes may interact with each other to increase the risk of complex disease. Interestingly, the informative genes, as defined by a gene-wise P value <0.01, in this overrepresented pathway included GLS, GCLC, CPS1, ALDH5A1, GMPS, and GAD1.

Table 2
Genes having SNPs reaching P < 0.05 in glutamate metabolism pathway (BioCarta) in the GAIN GWAS dataset

Three pathways related to apoptosis, inflammation, and the immune system were identified as overrepresented by the GSEA and hypergeometric methods: the TGF-beta pathway (nominal P_GSEA = 0.034 and nominal P_hypergeometric = 0.009), the TNFR1 pathway (nominal P_GSEA = 0.042 and nominal P_hypergeometric = 0.030), and the TOB1 pathway (nominal P_GSEA = 0.070 and nominal P_hypergeometric = 0.036). For the TOB1 pathway, although its nominal GSEA P value was slightly larger than 0.05, its nominal P_hypergeometric was 0.036 and its Fisher’s combined P value was 0.018. Therefore, we cite it here together with the TGF-beta and TNFR1 pathways. The TGF-beta signaling pathway is involved in many cellular processes, including neuronal protection against both apoptosis and excitotoxicity (Vivien and Ali 2006). The TNFR1 signaling pathway controls the binding of TNF-alpha to the TNF receptor 1 and triggers cell apoptosis and, thus, neuronal cell death. TNF-alpha, a proinflammatory cytokine, is involved in several CNS functions, e.g., synaptic scaling (Stellwagen and Malenka 2006) and glutamatergic synaptic transmission (Beattie et al. 2002). Importantly, this result supports the recent finding, from combined GWA studies, that the immune system is involved in schizophrenia (Purcell et al. 2009; Shi et al. 2009; Stefansson et al. 2009). Informative genes included MYC, SMAD5, BMP7, TGFB1, CREBBP, IFNG, THBS2, PPP2R2B, ZFYVE16, ACVR1B, E2F4, SMAD9, BMP5, CDKN2B, TGFBR2, and SMAD6. Supplementary Figure 2 depicts the TGF-beta signaling pathway, highlighting the informative genes.

It is also worth noting the androgen and estrogen metabolism pathway, which had the smallest nominal P value (0.003) in GSEA and the smallest P value (0.003) in Fisher’s method, and which nearly passed Benjamini-Hochberg multiple testing correction (P_BH = 0.088) (Supplementary Table 1). Estrogen may be protective in schizophrenia, as men develop schizophrenia at an earlier age and with greater severity than women (Palha and Goodman 2006; Rao and Kolsch 2003). Interestingly, β-estradiol links to TNF and insulin (Guo et al. 2009), providing further support for the hypothesis that the immune system and apoptosis are important in schizophrenia pathophysiology.

There have been a few recent reports of gene set based analysis in psychiatric GWA studies. Using the SNP ratio test (SRT) with the ISC GWAS as the discovery dataset and the GAIN GWAS as the validation dataset, O’Dushlaine et al. (2010) found that five pathways were significantly associated with schizophrenia: glycan structures biosynthesis 1, cell cycle, SNARE, cell adhesion molecules (CAMs), and tight junction. One of these pathways (CAMs) passed multiple testing correction based on the validation GWAS dataset. These pathways were not found to be significant in our GSEA or hypergeometric test. O’Dushlaine et al. also found that the CAM pathway was significantly associated with bipolar disorder (P = 0.026) using the Wellcome Trust Case Control Consortium (WTCCC) bipolar disorder dataset. In another study, Holmans et al. (2009) performed a Gene Ontology (GO) analysis of a bipolar disorder meta-analysis dataset (including the WTCCC data) and identified a list of significant GO terms. Almost all of those GO terms (e.g., hormone activity, transcription factor activity) were general and not specifically related to neurodevelopment, as commonly hypothesized for psychiatric disorders. The overall inconsistent findings might be due to the complex genetic structure of the diseases, different datasets, or different statistical methods. Although these results should be interpreted with caution, gene set analysis, especially pathway-based analysis, is potentially effective for detecting genetic signals beyond the typical single marker analysis in the original GWA studies.

In this study, we primarily used GSEA and the hypergeometric test to analyze the GWAS dataset. These two methods have been used in the analysis of both microarray gene expression and GWAS datasets. Some other available methods, such as SUMSQ (Dinu et al. 2007) and MAXMEAN (Efron and Tibshirani 2007), have been reported to perform better (Tintle et al. 2009); however, they are not convenient to link to PLINK for permutation analysis, which is computationally intensive. Such methods can be applied in future work.

In summary, we examined GWAS data from the GAIN study to identify genetic associations with schizophrenia at the pathway level rather than the SNP level. The genes involved in these pathways had not been detected by single marker analysis. Confirmation of these genes in replication studies would warrant more extensive applications of pathway-based approaches in the studies of complex disorders.

Supplementary Material





  • Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC. Control of synaptic strength by glial TNFalpha. Science. 2002;295:2282–2285.
  • Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 1995;57:289–300.
  • Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, Jhangri GS, Einecke G, Famulski KS, Halloran P, Yasui Y. Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics. 2007;8:242.
  • Efron B, Tibshirani R. On testing the significance of sets of genes. Ann. Appl. Stat. 2007;1:107–129.
  • Fisher RA. Statistical methods for research workers. London: Oliver and Boyd; 1932.
  • Guo AY, Sun J, Riley BP, Thiselton DL, Kendler KS, Zhao Z. The dystrobrevin-binding protein 1 gene: features and networks. Mol. Psychiatry. 2009;14:18–29.
  • Gysin R, Kraftsik R, Sandell J, Bovet P, Chappuis C, Conus P, Deppen P, Preisig M, Ruiz V, Steullet P, Tosic M, Werge T, Cuenod M, Do KQ. Impaired glutathione synthesis in schizophrenia: convergent genetic and functional evidence. Proc. Natl. Acad. Sci. USA. 2007;104:16621–16626.
  • Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P, Owen MJ, O'Donovan MC, Craddock N. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am. J. Hum. Genet. 2009;85:13–24.
  • Manolio TA, Rodriguez LL, Brooks L, Abecasis G, Ballinger D, Daly M, Donnelly P, Faraone SV, Frazer K, Gabriel S, Gejman P, Guttmacher A, Harris EL, Insel T, Kelsoe JR, Lander E, McCowin N, Mailman MD, Nabel E, Ostell J, Pugh E, Sherry S, Sullivan PF, Thompson JF, Warram J, Wholley D, Milos PM, Collins FS. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat. Genet. 2007;39:1045–1051.
  • O'Donovan MC, Craddock NJ, Owen MJ. Genetics of psychosis; insights from views across the genome. Hum. Genet. 2009;126:3–12.
  • O'Dushlaine C, Kenny E, Heron E, Donohoe G, Gill M, Morris D, Corvin A. Molecular pathways involved in neuronal cell adhesion and membrane scaffolding contribute to schizophrenia and bipolar disorder susceptibility. Mol. Psychiatry. Advance online publication, 2010 February 16.
  • Palha JA, Goodman AB. Thyroid hormones and retinoids: a possible link between genes and environment in schizophrenia. Brain Res. Rev. 2006;51:61–71.
  • Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752.
  • Rao ML, Kolsch H. Effects of estrogen on brain development and neuroprotection: implications for negative symptoms in schizophrenia. Psychoneuroendocrinology. 2003;28 Suppl 2:83–96.
  • Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R, Gejman PV. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009;460:753–757.
  • Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietilainen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Borglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Bottcher Y, Olesen J, Breuer R, Moller HJ, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Rethelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jonsson EG, Terenius L, Agartz I, Petursson H, Nothen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K, Collier DA. Common variants conferring risk of schizophrenia. Nature. 2009;460:744–747.
  • Stellwagen D, Malenka RC. Synaptic scaling mediated by glial TNF-α. Nature. 2006;440:1054–1059.
  • Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550.
  • Sullivan PF, Lin D, Tzeng JY, van den Oord E, Perkins D, Stroup TS, Wagner M, Lee S, Wright FA, Zou F, Liu W, Downing AM, Lieberman J, Close SL. Genomewide association for schizophrenia in the CATIE study: results of stage 1. Mol. Psychiatry. 2008;13:570–584.
  • Sun J, Jia P, Fanous AH, van den Oord EJCG, Chen X, Riley BP, Amdur RL, Kendler KS, Zhao Z. Schizophrenia gene networks and pathways and their applications for novel candidate gene selection. PLoS ONE. 2010;5:e11351.
  • Sun J, Jia P, Fanous AH, Webb BT, van den Oord EJ, Chen X, Bukszar J, Kendler KS, Zhao Z. A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases-schizophrenia as a case. Bioinformatics. 2009;25:2595–2602.
  • Tintle NL, Borchers B, Brown M, Bekmetjev A. Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16. BMC Proceedings. 2009;3 Suppl 7:S96.
  • Vivien D, Ali C. Transforming growth factor-beta signalling in brain disorders. Cytokine Growth Factor Rev. 2006;17:121–128.
  • Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. 2007;81:1278–1283.