BMC Bioinformatics. 2006; 7: 247.
Published online May 5, 2006. doi:  10.1186/1471-2105-7-247
PMCID: PMC1534062

Bayesian models for pooling microarray studies with multiple sources of replications

Abstract

Background

Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently.

Results

We introduce a Bayesian hierarchical model to pool cDNA microarray data across multiple independent studies to identify differentially expressed genes. Each study has multiple sources of variation, i.e. replicate slides within repeated identical experiments. Our model produces the gene-specific posterior probability of differential expression, which provides a direct method for ranking genes, and provides Bayesian estimates of false discovery rates (FDR). In simulations combining two and five independent studies, with fixed FDR levels, we observed large increases in the number of discovered genes in pooled versus individual analyses. When the number of output genes is fixed (e.g., top 100), the pooled model found appreciably more truly differentially expressed genes than the individual studies. We were also able to identify more differentially expressed genes from pooling two independent studies in Bacillus subtilis than from each individual data set. Finally, in our simulation studies, our Bayesian FDR estimates tracked the true FDRs very well.

Conclusion

Our method provides a cohesive framework for combining multiple but not identical microarray studies with several sources of replication, with data produced on the same platform. We assume that each study contains only two conditions: an experimental and a control sample. We demonstrated our model's suitability for a small number of studies that either have been pre-scaled or include no outlying studies.

Background

cDNA microarrays monitor gene expression for thousands of genes simultaneously. Two experimental conditions are compared by examining the ratio of expression between two samples, e.g. treatment versus control, wildtype versus mutant, or disease versus healthy. The primary goal of these experiments is to identify genes that are differentially expressed between the two conditions. The up-regulated and down-regulated genes shed light on biological mechanisms of the cell, such as functional pathways, response to treatments, and gene regulation.

Bayesian models have been used extensively in single microarray studies to identify differentially expressed genes (e.g. [1-3]); the discrete mixture model approach in particular has had considerable use [4-13]. Additional Bayesian methods for identifying differentially expressed genes include Bayesian ANOVA for microarrays (BAM) of Ishwaran and Rao [14,15]. This approach redefines the search for differentially expressed genes as a Bayesian variable selection process and uses a hierarchical model that is tailored to adaptive shrinkage. Through the use of model averaging, BAM essentially shrinks only the effects of the non-differentially expressed genes relative to the least squares estimates. For a review of Bayesian approaches to microarray data analysis and their advantages over frequentist methods, see Yang et al. [16].

In addition to performing single microarray studies, biologists often conduct multiple but not identical studies to understand the same biological system. Pooling the results of these studies can help identify truly differentially expressed genes. Meta-analyses of microarray studies have recently been used by many researchers in a non-Bayesian context [17-24]. Rhodes et al. [17] combined the results of four prostate cancer studies using a p-value approach. Genes were assigned a p-value in each study separately, and the results were combined to estimate a gene-specific p-value across all studies. This method avoids the need to integrate gene expression measures and thus can be used for data across multiple platforms. Choi et al. [18] presented a meta-analysis that integrated gene effect sizes, rather than p-values, into one mean effect. The effect size for each study was the mean difference between affected and control groups, standardized by a pooled standard deviation; this standardization allowed data to be integrated across platforms. A common parameter for inter-study variability was incorporated into the model, and statistical significance was determined by permutation tests. Parmigiani et al. [20] introduced an integrative correlation approach to combining data from multiple platforms. This procedure evaluated the consistency of gene expression across platforms rather than pooling gene expression values. Using lung cancer data, this method identified genes with reproducible expression patterns across studies and improved correlation across studies. Additional theoretical approaches for combining data from different platforms include adding covariates to models to account for the differences among data types [24,25], although this has not been applied in a microarray setting.

While the studies of Rhodes et al. [17], Choi et al. [18], Parmigiani et al. [20] and others provide methods for integrating data across platforms, other authors have shown the difficulties of such an approach ([26,27]; discussions in [25,28]). Working with cell lines, Kuo et al. [26] and Jarvinen et al. [27] both conclude that combining data across platforms is unreliable. Due to these difficulties, other meta-analysis methods focus on incorporating data from one platform only [23,24]. Here, we focus on combining microarray data from a single platform, cDNA microarrays, and assume that the data have either been pre-normalized across studies or contain no outlying studies.

Choi et al. [18] also provided an alternative Bayesian meta-analysis method to their random effects approach. In this Bayesian model, uninformative prior distributions were assigned to the overall mean effects and the inter-study variation parameter. Within-study gene effects were modelled as t-distributions, and posterior estimates of the overall mean effect for each gene were produced by smoothing effects across studies. The authors demonstrated that Bayesian meta-analysis is more robust and flexible than traditional methods, confirming the findings of DuMouchel and Harris [29]. Bayesian models are also well-suited to data with many levels of replication, including replicate slides within repeated identical experiments. Due to these advantages, we introduce a Bayesian hierarchical model that provides a principled framework for incorporating data from multiple independent cDNA microarray studies with several sources of replication. Unlike the approach of Choi et al. [18], which smoothes the gene effects into one average, our method produces the posterior probability of differential expression based on gene expression levels across studies. Thus, inter-study variability does not need to be estimated by our model. The probability of differential expression provides a direct method for ranking genes, and also for estimating both integration-driven discovery rates and false discovery rates. In simulations, we illustrate that pooling studies increases the number of discovered genes for given thresholds of probabilities of differential expression and false discovery rates, compared to individual studies. In addition, for a fixed top number of genes, the pooled model identifies considerably more differentially expressed genes than separate studies. We also illustrate our method using experimental data from two independent studies in Bacillus (B.) subtilis.

Background on cDNA microarray experiments

cDNA microarrays measure the amount of messenger RNA (mRNA) contained in an experimental sample. They are produced by robotic arrayers, which place entire gene sequences complementary to mRNA onto glass slides. In an experiment, the mRNA in each of two samples, e.g. treated and control, is fluorescently labeled with a different dye, typically red and green (Cy5 and Cy3), and the two samples are mixed together. The combined sample is hybridized to the array, and complementary sequences bind to each other. The relative amounts of mRNA present in the two samples are measured by scanning the slide at two different wavelengths. The resulting fluorescent intensity values for the red- and green-labeled mRNA are then compared using the ratio of intensities. For further details, see [30-33]. We use log-ratios of intensities in each study since they even out highly skewed distributions and give a more realistic sense of variation [34].
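As a minimal sketch (not the authors' preprocessing), the hypothetical helper `log_ratios` below converts per-spot red (Cy5) and green (Cy3) intensities into log-ratios with NumPy. Base 2, the red-over-green orientation and the absence of background correction and normalization are assumptions of this example; it only illustrates why the log scale is convenient: a twofold increase and a twofold decrease become the symmetric values +1 and -1 instead of 2 and 0.5.

```python
import numpy as np

def log_ratios(red, green):
    """Per-spot log2(red / green) ratios (hypothetical helper, no normalization)."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    # The logarithm is only defined for strictly positive intensities.
    if np.any(red <= 0) or np.any(green <= 0):
        raise ValueError("intensities must be positive")
    return np.log2(red / green)

# Twofold up, fourfold down, unchanged
print(log_ratios([1000, 250, 4000], [500, 1000, 4000]))  # [ 1. -2.  0.]
```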

Results and discussion

Bayesian model for pooling multiple studies

Biologists often conduct multiple but different studies that all target the same biological system or pathway. For example, when studying the effect of a key transcription factor σE in B. subtilis, Eichenberger et al. [35] conducted both the σE knockout and the σE over-expression experiments (i.e. the mutant and induction experiments, respectively; see Methods). Thus, genes that are up-regulated in one experiment should be down-regulated in the other. Pooling both experiments can help more accurately identify true target genes. More generally, we may imagine having available multiple independent studies of one specific biological system. We assume that each study contains only two conditions: an experimental and a control. It is desirable to combine information from these studies in a principled way. Our model to achieve this goal is as follows:

$$
\begin{aligned}
y_{jgse} \mid \mu_{jge} &\sim N(\mu_{jge}, \tau_{jg}^2), && j = 1, \ldots, J;\ g = 1, \ldots, G;\ e = 1, \ldots, E;\ s = 1, \ldots, S_e \\
\mu_{jge} \mid \theta_{jg} &\sim N(\theta_{jg}, \sigma_{jg}^2), && j = 1, \ldots, J;\ g = 1, \ldots, G;\ e = 1, \ldots, E \\
\theta_{jg} \mid I_g = 0 &\sim N(0, \eta_{jg0}^2) \\
\theta_{jg} \mid I_g = 1 &\sim N(0, c_j \times \eta_{jg0}^2) \\
I_g &\sim \mathrm{Bernoulli}(p) \\
p &\sim \mathrm{Uniform}(0, 1),
\end{aligned}
\tag{1}
$$

for $j = 1, \ldots, J$ independent studies. Here, $y_{jgse}$ is the microarray data, i.e. the normalized log-expression ratio for gene $g$, experiment $e$ and slide $s$ of study $j$; $\mu_{jge}$ is the mean over all $S_e$ slides within experiment $e$ of study $j$; and $\theta_{jg}$ is the log-expression ratio of gene $g$ in study $j$. Conjugate inverse chi-squared prior distributions are assigned to $\tau_{jg}^2$ and $\sigma_{jg}^2$, for which we use the notation $\tau_{jg}^2 \sim k\,\tilde{\tau}_j^2 / \chi_k^2$ and $\sigma_{jg}^2 \sim h\,\tilde{\sigma}_j^2 / \chi_h^2$. Here, $(\chi_k^2)^{-1}$ denotes the standard inverse $\chi^2$ distribution with $k$ degrees of freedom, and $\tilde{\tau}_j^2$ and $\tilde{\sigma}_j^2$ are scale parameters of the inverse chi-squared distribution that are derived from the data. The parameter $\tilde{\tau}_j^2$ represents the slide variation and $\tilde{\sigma}_j^2$ the experiment variation for study $j$; the degrees of freedom $h$ and $k$ are assumed known. We define $I_g \sim \mathrm{Bernoulli}(p)$ as the indicator variable for differential expression of gene $g$, i.e. $\theta_{jg} \neq 0$, $j = 1, \ldots, J$, where $p$ is the proportion of differentially expressed genes. Thus, $\mathrm{Prob}(I_g = 1) = p$, where

$$
I_g =
\begin{cases}
0 & \text{if } \theta_{jg} = 0,\ j = 1, \ldots, J \\
1 & \text{if } \theta_{jg} \neq 0,\ j = 1, \ldots, J
\end{cases}
$$

Here, genes are divided into two groups, not differentially expressed ($I_g = 0$) and differentially expressed ($I_g = 1$), with respective probabilities $1 - p$ and $p$. The model produces the posterior probability $D_g = \mathrm{Prob}(I_g = 1 \mid \text{data})$, which is the basis for inference. For the prior distributions, when $I_g = 0$, we assume the $\theta_{jg}$ are normally distributed with mean zero and small variance $\eta_{jg0}^2$; when $I_g = 1$, we assume the $\theta_{jg}$ are normally distributed with mean zero and large variance $c_j \times \eta_{jg0}^2$. A Markov chain Monte Carlo (MCMC) implementation of the model [36] draws samples from the posterior distribution of each parameter. See Methods for more details on the prior distributions and the MCMC implementation. For each gene, we calculate the posterior probability $D_g$ of differential expression over all studies, and rank the genes by $D_g$.
Our priors for the variance parameters $\tau_{jg}^2$ and $\sigma_{jg}^2$ are similar to those of Tseng et al. [2]. Our prior structure for a single experiment is similar to that of Gottardo et al. [8], except that we place a Uniform prior distribution on $p$ rather than estimating $p$ through an iterative algorithm. Our model also has one more level of variation than the model of Gottardo et al. [8], namely variation over slides within experiments. Our underlying hierarchical Gaussian model is also similar to the BAM models of Ishwaran and Rao [14,15], except that the BAM models are designed for a two-sample problem, while our model assumes that the data are ratios of treatment and control intensities. We evaluate our model using false discovery rates and integration-driven discovery rates, defined below.
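To make the structure of Model (1) concrete, here is one possible Gibbs sampler in Python with NumPy. It is a sketch under stated assumptions, not the authors' MCMC implementation: every full conditional is conjugate, so $\mu_{jge}$ and $\theta_{jg}$ are drawn from normals, $\tau_{jg}^2$ and $\sigma_{jg}^2$ from scaled inverse chi-squared distributions, $I_g$ from a Bernoulli whose log-odds add up the spike-versus-slab evidence from all $J$ studies, and $p$ from a Beta. The function name `gibbs_pooled`, the default degrees of freedom $k = h = 4$, the balanced layout of $S_j$ slides per experiment, and the pooled-variance rule used to derive $\tilde{\tau}_j^2$ and $\tilde{\sigma}_j^2$ from the data are all hypothetical choices. $D_g$ is estimated as the fraction of post-burn-in draws with $I_g = 1$.

```python
import numpy as np

def gibbs_pooled(ys, eta2, c, k=4, h=4, n_iter=2000, burn=500, seed=0):
    """Sketch of a Gibbs sampler for Model (1); returns D_g for every gene.

    ys   : list of J arrays of normalized log-ratios, shape (G, E_j, S_j)
           (balanced layout: S_j slides in each experiment, an assumption)
    eta2 : spike variances eta_{j0}^2, one per study (constant over genes)
    c    : slab multipliers c_j, one per study
    k, h : prior degrees of freedom for tau^2 and sigma^2 (hypothetical defaults)
    """
    rng = np.random.default_rng(seed)
    G = ys[0].shape[0]
    # Hypothetical data-derived prior scales: pooled slide-level and
    # experiment-level sample variances for each study.
    tau_s = [y.var(axis=2, ddof=1).mean() for y in ys]
    sig_s = [y.mean(axis=2).var(axis=1, ddof=1).mean() for y in ys]
    mu = [y.mean(axis=2) for y in ys]
    theta = [m.mean(axis=1) for m in mu]
    tau2 = [np.full(G, t) for t in tau_s]
    sig2 = [np.full(G, s) for s in sig_s]
    ind, p = np.zeros(G, dtype=bool), 0.5
    hits = np.zeros(G)
    for it in range(n_iter):
        log_odds = np.full(G, np.log(p) - np.log1p(-p))
        for j, y in enumerate(ys):
            _, E, S = y.shape
            t2, s2 = tau2[j][:, None], sig2[j][:, None]
            # mu_jge | rest: combine the S_j slides with the N(theta_jg, sigma_jg^2) prior
            prec = S / t2 + 1.0 / s2
            mean = (y.sum(axis=2) / t2 + theta[j][:, None] / s2) / prec
            mu[j] = rng.normal(mean, 1.0 / np.sqrt(prec))
            # theta_jg | rest: spike or slab prior variance, depending on current I_g
            v = np.where(ind, c[j] * eta2[j], eta2[j])
            prec = E / sig2[j] + 1.0 / v
            theta[j] = rng.normal(mu[j].sum(axis=1) / sig2[j] / prec, 1.0 / np.sqrt(prec))
            # tau^2 and sigma^2 | rest: scaled inverse chi-squared updates
            ss_y = ((y - mu[j][:, :, None]) ** 2).sum(axis=(1, 2))
            tau2[j] = (k * tau_s[j] + ss_y) / rng.chisquare(k + E * S, size=G)
            ss_mu = ((mu[j] - theta[j][:, None]) ** 2).sum(axis=1)
            sig2[j] = (h * sig_s[j] + ss_mu) / rng.chisquare(h + E, size=G)
            # Add this study's log Bayes factor (slab vs spike) for theta_jg
            v0, v1 = eta2[j], c[j] * eta2[j]
            log_odds += 0.5 * (np.log(v0 / v1) + theta[j] ** 2 * (1.0 / v0 - 1.0 / v1))
        # I_g | rest: one indicator shared by all studies; logistic written via tanh
        ind = rng.random(G) < 0.5 * (1.0 + np.tanh(0.5 * log_odds))
        # p | rest: Beta posterior under the Uniform(0, 1) prior
        p = rng.beta(1 + ind.sum(), 1 + G - ind.sum())
        if it >= burn:
            hits += ind
    return hits / (n_iter - burn)
```

Because $I_g$ is shared, a gene with moderate evidence in each study can accumulate a large pooled log-odds, which is how pooling can produce discoveries that no single study makes.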

Integration-driven discovery

Choi et al. [18] define the integration-driven discovery rate (IDR) as the number of genes discovered in a meta-analysis that were not discovered in any of the individual studies alone, divided by the total number of discoveries in the meta-analysis. IDR represents the gain in information from combining studies versus analyzing them individually. For our model, we fix a threshold value $\gamma$ and label a gene differentially expressed if $D_g \ge \gamma$. The IDR is then the number of genes that are labelled differentially expressed in the pooled analysis but in none of the individual studies, divided by the number of genes labelled differentially expressed in the pooled analysis:

$$
\mathrm{IDR}(\gamma) = \frac{\#\,\text{genes with } [D_g \ge \gamma \text{ in pooled analysis}] \text{ and } [D_g < \gamma \text{ in all individual studies}]}{\#\,\text{genes with } [D_g \ge \gamma \text{ in pooled analysis}]}.
$$
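The IDR definition translates directly into a few lines of NumPy. In this sketch the function name `idr` is hypothetical; it takes the pooled $D_g$ values and one array of $D_g$ values per individual study. Returning NaN when the pooled analysis makes no discoveries is our own convention, since the ratio is undefined in that case.

```python
import numpy as np

def idr(d_pooled, d_individual, gamma):
    """Integration-driven discovery rate at threshold gamma (hypothetical helper).

    d_pooled     : D_g from the pooled analysis, shape (G,)
    d_individual : D_g from each individual study, shape (J, G)
    """
    found_pooled = np.asarray(d_pooled, dtype=float) >= gamma
    # Genes below the threshold in every individual study
    missed_everywhere = np.all(np.asarray(d_individual, dtype=float) < gamma, axis=0)
    n_pooled = found_pooled.sum()
    if n_pooled == 0:
        return float("nan")  # no pooled discoveries: IDR undefined
    return float((found_pooled & missed_everywhere).sum() / n_pooled)

d_pool = [0.99, 0.97, 0.60, 0.30, 0.96]
d_ind = [[0.99, 0.80, 0.20, 0.10, 0.50],
         [0.70, 0.85, 0.40, 0.20, 0.96]]
print(idr(d_pool, d_ind, 0.95))  # 1 of 3 pooled discoveries is new
```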

False discovery rate

Benjamini and Hochberg [37] introduced the false discovery rate (FDR), the expected proportion of false discoveries among all discoveries. We refer to the realized proportion, the number of false discoveries divided by the number of discoveries, as the true false discovery rate (tFDR); it can be computed exactly in our simulation studies since we know which genes are truly differentially expressed. Further applications of FDR to microarrays include [38-40]. Genovese and Wasserman [41] define the posterior expected FDR (peFDR) as:

$$
\mathrm{peFDR} = E(\mathrm{FDR} \mid \mathbf{Y}) = \frac{\sum_g (1 - D_g)\,\delta_g}{\sum_g \delta_g},
$$

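with $\delta_g = 1$ if gene $g$ is labelled differentially expressed (e.g. $D_g \ge \gamma$) and $\delta_g = 0$ otherwise (see also Do et al. [13]). For the simulated data, we compute tFDR and compare it with peFDR; for the experimental data, the truly differentially expressed genes are unknown, so only peFDR is available.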
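Both error rates follow from the posterior probabilities once genes have been labelled. In this minimal sketch the function names `pe_fdr` and `t_fdr` are hypothetical, and $\delta_g$ is set by the threshold rule $D_g \ge \gamma$; `t_fdr` needs the true labels and is therefore only computable for simulated data.

```python
import numpy as np

def pe_fdr(d, gamma):
    """Posterior expected FDR: mean of (1 - D_g) over genes with D_g >= gamma."""
    d = np.asarray(d, dtype=float)
    delta = d >= gamma  # delta_g = 1 for genes labelled differentially expressed
    if not delta.any():
        return float("nan")
    return float(((1.0 - d) * delta).sum() / delta.sum())

def t_fdr(d, truth, gamma):
    """Realized FDR: share of labelled genes that are not truly differentially expressed."""
    delta = np.asarray(d, dtype=float) >= gamma
    truth = np.asarray(truth, dtype=bool)
    if not delta.any():
        return float("nan")
    return float((delta & ~truth).sum() / delta.sum())

d = [0.99, 0.95, 0.90, 0.40, 0.10]
truth = [True, True, False, False, False]
print(pe_fdr(d, 0.9), t_fdr(d, truth, 0.9))
```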

Simulation results for two studies

We simulated data for two studies, Study 1 and Study 2, in the same format as the B. subtilis mutant and induction studies, using three different values for the percent of truly differentially expressed genes: p = 5%, 10% and 25% (see Methods). We fitted Model (1) to each of the two simulated studies separately and in a pooled analysis. For γ ≥ 50%, the IDR ranged from 1.9% to 42.9% across all values of p, with a maximum tFDR of 5% in the pooled analysis (Table 1). Note that tFDR is low for large values of γ due to the simulation procedure. The IDR increases as γ increases. IDR is also smaller for larger values of p (Figure 1a); this is due to the larger variability between studies, which leaves fewer genes with $D_g$ below γ in both individual studies and therefore reduces IDR. In addition to identifying differentially expressed genes by choosing a γ threshold, researchers often choose a maximum tFDR and examine the lists of differentially expressed genes with the corresponding tFDR. In Figure 1b, we display all tFDR levels below 20% for p = 10% and show the number of discovered genes for the two individual studies and the pooled analysis. This plot shows the considerable increase in the number of differentially expressed genes found in the pooled analysis versus the separate analyses at the same level of tFDR.

Figure 1
IDR and discovered genes versus tFDR for the two-study simulation data. a) Integration-driven discovery rate (IDR) versus threshold values of posterior probabilities of differential expression, γ, for the two-study simulated data and percent of ...
Table 1
Results for two-study simulations. Integration-driven discovery rate (IDR) and the number of discovered genes for various threshold values of the posterior probability of differential expression, γ, and three simulated levels of the percent of ...

In addition to choosing a threshold value of $D_g$ or FDR, researchers are often interested only in a top set of genes, e.g. the top 300 genes. For this reason, we rank the genes by $D_g$ in both the pooled and individual analyses and compare the numbers of truly differentially expressed genes included among the top-ranked genes. For each of the three simulation studies, p = 5%, 10% and 25%, we choose a cutoff equal to the top p% of genes. We find that the pooled model always identifies more differentially expressed genes than the individual studies (Table 2).
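The top-N comparison amounts to sorting genes by $D_g$ and counting true positives among the first N. The helper `true_hits_in_top` below is hypothetical and, like tFDR, requires the simulated truth; breaking ties in $D_g$ by gene order (a stable sort) is an arbitrary choice of this sketch.

```python
import numpy as np

def true_hits_in_top(d, truth, n_top):
    """Number of truly differentially expressed genes among the n_top highest D_g."""
    d = np.asarray(d, dtype=float)
    order = np.argsort(-d, kind="stable")  # highest D_g first
    return int(np.asarray(truth, dtype=bool)[order[:n_top]].sum())

d = [0.2, 0.9, 0.8, 0.95]
truth = [True, True, False, True]
print(true_hits_in_top(d, truth, 2))  # 2
```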

Table 2
Number of differentially expressed (D.E.) genes for fixed top numbers of genes. The number of differentially expressed genes discovered by the pooled model and individual models for fixed threshold numbers of top genes, including the two-study simulation ...

We also compared peFDR to tFDR for the simulated data to check that peFDR is a reasonable approximation to the true values. As Figure 2 shows, peFDR was always larger than tFDR, so peFDR is a conservative estimate of tFDR. The average difference between peFDR and tFDR was at most 2.3% for all pooled simulation results. The maximum difference between peFDR and tFDR decreased as the simulated percent of truly differentially expressed genes p increased. Specifically, for p = 5%, the average difference between peFDR and tFDR was 2.3%, with a maximum difference of 12.6% at tFDR = 19.9%. For p = 10%, the average difference was 2.2%, with a maximum difference of 7.8% at tFDR = 3.1%. For p = 25%, the average difference was 2.3%, with a maximum difference of 5.1% at tFDR = 23.8%.
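A comparison of this kind can be sketched by declaring the top n genes (by $D_g$) differentially expressed for every n and computing both error rates cumulatively. The function `fdr_curves` is a hypothetical illustration, not the code behind Figure 2; with `pe, t = fdr_curves(d, truth)`, the quantities `(pe - t).mean()` and `(pe - t).max()` correspond to the average and maximum differences reported above.

```python
import numpy as np

def fdr_curves(d, truth):
    """peFDR and tFDR when the top n genes are declared DE, for n = 1..G."""
    d = np.asarray(d, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    order = np.argsort(-d, kind="stable")   # rank genes by decreasing D_g
    n = np.arange(1, d.size + 1)            # number of discovered genes
    pe = np.cumsum(1.0 - d[order]) / n      # running mean of (1 - D_g)
    t = np.cumsum(~truth[order]) / n        # running share of false discoveries
    return pe, t

pe, t = fdr_curves([0.99, 0.9, 0.6, 0.2], [True, True, False, False])
diff = pe - t
print(diff.mean(), diff.max())
```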

Figure 2
True false discovery rate versus posterior expected false discovery rate for the simulation data. True false discovery rate (tFDR) (solid lines) and posterior expected false discovery rate (peFDR) (dashed lines) versus the number of discovered genes for: ...

Simulation results for five studies

We also assessed our model in a meta-analysis of five studies. For this, we used the same Study 1 and Study 2 as in the previous section, with p = 10%. We then simulated three further studies similar to Study 1, but with different values of the model parameters $\eta_{jg0}^2$ and $c_j$ and of the variation over slides and experiments (see Methods). The IDR was 7.1% for γ = 95% and 12.8% for γ = 99%, with a tFDR of 0% in the pooled analysis at these levels of γ. For the same levels of γ, the IDR was lower for the five-study than for the two-study pooled analysis. This was due to the larger variation among the five studies, which resulted in fewer genes with $D_g$ below γ in all studies and thus reduced IDR. We plot IDR versus γ in Figure 3a. Figure 3b displays the number of discovered genes for the pooled analysis versus the five separate analyses for tFDR < 20%, which again shows a considerable increase for the pooled analysis.

Figure 3
IDR and discovered genes versus tFDR for the five-study simulation data. a) Integration-driven discovery rate (IDR) versus threshold values of posterior probabilities of differential expression, γ, for the five-study simulated data and percent ...

Table 2 also shows the number of differentially expressed genes identified by the pooled versus individual analyses for a fixed number of top-ranked genes. For the top 300 genes, the pooled model again identifies more differentially expressed genes than the individual studies. We also compared peFDR to tFDR in Figure 2d. The average difference was 0.54%, with a maximum difference of 2.7% at tFDR = 0.36%. These values are smaller than those for the two-study simulations, suggesting that peFDR becomes more accurate when more data are pooled.

Experimental data results

We fitted Model (1) to the pooled mutant and induction B. subtilis studies, using the 2,515 genes that had expression values in both studies. We also fitted Model (1) to each study individually. For values of γ ≥ 50%, with a maximum peFDR of 11.5% for the pooled analysis, the IDR ranged from 8.2% to 53.3% (Table 3). We plot IDR versus γ in Figure 4a. Figure 4b presents the number of discovered genes for peFDR < 20% for both the separate and pooled analyses. The induction study had much lower log-ratios of expression than the mutant study; the average 97.5th percentile was 0.65 for the induction experiments versus 1.59 for the mutant experiments. As a result, the maximum $D_g$ value for the induction study was 0.84, with a corresponding minimum peFDR of 16%. In contrast, the mutant study had 33 genes with $D_g$ = 1.0. Even though the values of $D_g$ were lower for the induction study than for the mutant study, combining the two data sets resulted in more discoveries of differentially expressed genes than either study alone at fixed levels of peFDR.

Figure 4
IDR and discovered genes versus peFDR for the experimental data. a) Integration-driven discovery rate (IDR) versus threshold values of posterior probabilities of differential expression, γ, for the B. subtilis mutant and induction experimental ...
Table 3
Results for Bacillus subtilis experimental data. Integration-driven discovery rate (IDR), posterior expected false discovery rate (peFDR) and the number of discovered genes for various threshold values of the posterior probability of differential expression, ...

Conclusion

We demonstrated here the usefulness of a Bayesian hierarchical model for pooling data across independent microarray studies with several sources of variation. The pooled method provides a systematic analysis framework, producing probability estimates of differential expression for each gene. These estimates are used to rank genes, calculate IDR, and produce posterior expected FDR values.

In the simulations of two and five studies, we found appreciable IDRs for various thresholds of the probability of differential expression, with correspondingly low levels of tFDR. When fixing tFDR, we found more genes discovered in the pooled analysis than in the separate analyses. When setting a threshold for the top genes of interest, the pooled model identified more truly differentially expressed genes than the individual analyses. In the simulation of five studies, the IDR was somewhat lower than for two studies, but was still considerable. When comparing peFDR to tFDR in simulations, we found reasonable agreement, with peFDR overestimating tFDR on average by less than 3%. The difference between peFDR and tFDR decreased in the simulation of five studies, indicating that pooling more data improves the posterior estimation of FDR. In our analysis of experimental data, the IDR was also large. Although one study had lower probabilities of differential expression, pooling the two data sets still resulted in more discoveries than either study alone. We conclude that, compared with individual study analyses, combining information across studies strengthens the probabilities of differential expression, improves IDR, and increases the number of discovered genes at fixed tFDR, fixed peFDR and a fixed top percent of genes.

Our model is designed for studies from the same platform. In the B. subtilis experimental data, a common control sample was used for the mutant and induction studies. However, our model does not require a common reference sample across studies; it assumes the studies are independent. In addition, the studies need not share the same array-design layout, because they are linked only through the common parameter of differential expression, p; no other parameters are shared between studies. For example, one study could have only replicate slides, while another could have both replicate slides and replicate experiments. We also assume either that there are no outlying studies or that the data have been scaled across studies before analysis. Future work will address the issues of pooling studies from different platforms and sets of studies that may contain outliers.

Methods

Simulation data for two studies

We simulated data for two studies with the same format as the B. subtilis mutant and induction experimental data (see Methods: Experimental data), with the percentage of differentially expressed genes set to p = 5%, 10% and 25%. Each study had 3,000 genes; the first study had 5 replicate slides within 3 replicate experiments, and the second study had 4 replicate slides within 3 replicate experiments. We simulated data from Model (1), with parameters similar to those found in the experimental data. For Study 1, we used $\eta_{jg0}^2 = 0.015$ and $c = 66.67$; the variance across slides was set to 0.074, and the variance across experiments to 0.029. For Study 2, we used $\eta_{jg0}^2 = 0.02$ and $c = 40$; the variance across slides was set to 0.02, and the variance across experiments to 0.026.
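A minimal sketch of drawing one study from Model (1), assuming that "variance across slides" is $\tau_{jg}^2$ and "variance across experiments" is $\sigma_{jg}^2$, that these and $\eta_{jg0}^2$ are held fixed for all genes, and that the indicator $I_g$ is shared by all studies, as in the pooled model. The text does not say how the replicate slides are split across the three experiments, so the per-experiment slide counts are an input. The usage lines borrow the layouts of the mutant (3, 1, 1) and induction (1, 1, 2) studies purely for illustration. `simulate_study` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def simulate_study(rng, G, slides_per_exp, p, eta2, c, tau2, sigma2, de=None):
    """Draw one study's log-ratios y_jgse from the hierarchical model (1).

    Returns (y, de): y is a list with one (G, S_e) array per experiment e, and
    de is the boolean indicator I_g. Pass `de` to reuse I_g across studies.
    """
    if de is None:
        de = rng.random(G) < p                      # I_g ~ Bernoulli(p)
    theta_var = np.where(de, c * eta2, eta2)        # N(0, eta2) or N(0, c * eta2)
    theta = rng.normal(0.0, np.sqrt(theta_var))     # study-level effect theta_jg
    y = []
    for S_e in slides_per_exp:
        mu = rng.normal(theta, np.sqrt(sigma2))     # experiment level mu_jge
        # slide level: y_jgse ~ N(mu_jge, tau2), broadcast over S_e slides
        y.append(rng.normal(mu[:, None], np.sqrt(tau2), size=(G, S_e)))
    return y, de

rng = np.random.default_rng(1)
# Study 1 and Study 2 parameters from the text; slide allocations are assumed
y1, de = simulate_study(rng, 3000, [3, 1, 1], p=0.10, eta2=0.015, c=66.67,
                        tau2=0.074, sigma2=0.029)
y2, _ = simulate_study(rng, 3000, [1, 1, 2], p=0.10, eta2=0.02, c=40.0,
                       tau2=0.02, sigma2=0.026, de=de)
```

Passing the first study's `de` to the second call keeps the set of truly differentially expressed genes identical across studies, which is the situation the pooled model is built to exploit.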

Between-study variance for the experimental data was 0.067 for all 2,515 genes and 0.296 for the top 10% of genes. Compared with the experimental data, the simulated data had similar between-study variance for all genes and higher variability for the top genes. The between-study variance was 0.053 for p = 5%, 0.105 for p = 10% and 0.23 for p = 25% for all 3,000 genes, with between-study variance for the top genes of 0.714 for p = 5%, 0.887 for p = 10% and 0.86 for p = 25%. For each gene, log-expression ratios are simulated from normal distributions, independently of other genes. Although expression levels are expected to be correlated among genes, such correlation is difficult to model, so we assume independence for simulation purposes. The independence assumption was also used in simulation studies by other authors (see, for example, [8,9]).

Simulation data for five studies

For the simulation of five studies, we used the Study 1 and Study 2 data from the previous section and simulated data for 3 additional studies, with p = 10% for all studies. For Studies 3, 4 and 5, we simulated 5 replicate slides within 3 replicate experiments. For the parameters $\eta_{jg0}^2$, $c$, and the variances across slides and experiments, we used values that either lay between those of Study 1 and Study 2 or were somewhat larger or smaller. For Study 3, we used $\eta_{jg0}^2 = 0.017$ and $c = 70.6$; the variance across slides was set to 0.05, and across experiments to 0.02. For Study 4, we used $\eta_{jg0}^2 = 0.02$ and $c = 55$; the variance across slides was set to 0.04, and across experiments to 0.022. For Study 5, we used $\eta_{jg0}^2 = 0.015$ and $c = 60$; the variance across slides was set to 0.06, and across experiments to 0.03. Between-study variance across the 5 studies ranged from 0.098 to 0.153 for all 3,000 genes and from 0.827 to 1.35 for the top 10% of genes, both higher than the corresponding values for the two-study experimental data (0.067 and 0.296).
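Under Model (1), a single log-ratio is the sum of independent study, experiment and slide components. Its marginal variance is therefore $\tau_{jg}^2 + \sigma_{jg}^2 + \eta_{jg0}^2$ for a gene with $I_g = 0$ and $\tau_{jg}^2 + \sigma_{jg}^2 + c_j\eta_{jg0}^2$ for a gene with $I_g = 1$. The sketch below evaluates both for the five simulated studies. It again assumes that the slide and experiment variances map to $\tau_{jg}^2$ and $\sigma_{jg}^2$; `STUDIES` and `marginal_variance` are hypothetical names. Across the five settings, the product $c_j\eta_{jg0}^2$, which sets the spread of true effects, lies between about 0.8 and 1.2.

```python
# Simulation settings listed in the text for the five studies.
# Assumed mapping: tau2 = variance across slides, sigma2 = across experiments.
STUDIES = {
    1: dict(eta2=0.015, c=66.67, tau2=0.074, sigma2=0.029),
    2: dict(eta2=0.020, c=40.0, tau2=0.020, sigma2=0.026),
    3: dict(eta2=0.017, c=70.6, tau2=0.050, sigma2=0.020),
    4: dict(eta2=0.020, c=55.0, tau2=0.040, sigma2=0.022),
    5: dict(eta2=0.015, c=60.0, tau2=0.060, sigma2=0.030),
}

def marginal_variance(eta2, c, tau2, sigma2, de):
    """Var(y_jgse) under model (1) for a null (de=False) or DE (de=True) gene."""
    theta_var = c * eta2 if de else eta2   # prior variance of theta_jg
    return theta_var + sigma2 + tau2       # independent components add

for j, pars in STUDIES.items():
    null_v = marginal_variance(**pars, de=False)
    de_v = marginal_variance(**pars, de=True)
    print(f"Study {j}: null {null_v:.3f}, DE {de_v:.3f}, ratio {de_v / null_v:.1f}")
```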

Experimental data

The B. subtilis experiments were designed to identify sporulation genes under the control of sigma factor E (σE). Two complementary experimental setups were used: the first was a deletion of σE (mutant study), and the second an overexpression of σE (induction study). Both are described below (for additional details, see [35,42]).

Mutant study

In the mutant study, the treated sample contained sporulating cells with a null mutation in the gene for σE (i.e. the mutant sample), and the control sample contained sporulating cells that were wild type for σE. The wild-type/mutant ratios were examined; up-regulated genes were identified as belonging to the σE regulon. In total, five microarrays were produced from three independent identical experiments; the first experiment had three replicate arrays, and the second and third experiments each had one array. The number of genes spotted on the five arrays ranged from 4,268 to 4,751; these values exceed the B. subtilis genome size of 4,106 genes because selected genes were spotted multiple times on various arrays. The percentage of low-quality spots removed from analysis ranged from 18.6% to 64.5% across the five arrays. In total, there were 3,713 genes with measurable expression ratios in at least one microarray. Here, we analyze values after normalization using a rank-invariant method [2,43].

Induction study

In the induction study, σE was overexpressed in response to an inducer, i.e. cells that had been treated with an inducer were compared to control cells. The induction/wild-type ratios were examined; up-regulated genes were identified as belonging to the σE regulon. In total, four microarrays were produced from three independent identical experiments. The first two experiments each had one array, and the third experiment had two replicate arrays. The number of genes spotted on the four arrays ranged from 4,608 to 4,751; the percentage of genes detected on the arrays ranged from 33.0% to 47.4%. In total, there were 2,552 genes with measurable expression ratios in at least one microarray. Here, we again analyze the normalized values.

Markov chain Monte Carlo implementation

In the Markov chain Monte Carlo analysis, each parameter is drawn from its full conditional distribution as follows.

Joint posterior distribution

For the hierarchical model of (1), the joint distribution of the data and parameters is:

$$p(y_{jgse}, \mu_{jge}, \tau_{jg}^2, \sigma_{jg}^2, \theta_{jg}, I_g, p, \eta_{jg0}^2, c_j) = \prod_{j=1}^{J} \prod_{g=1}^{G} \left[ \prod_{e=1}^{E} \left\{ \prod_{s=1}^{S_e} p(y_{jgse} \mid \mu_{jge}, \tau_{jg}^2) \right\} p(\mu_{jge} \mid \theta_{jg}, \sigma_{jg}^2) \right] p(\theta_{jg}, I_g \mid p, \Omega_j)\, p(\tau_{jg}^2)\, p(\sigma_{jg}^2)\, p(p)\, p(\Omega_j),$$

where $\Omega_j = (\eta_{jg0}^2, c_j)$, and the indices denote study ($j$), gene ($g$), experiment ($e$) and slide ($s$).

Prior distributions

The prior distributions are specified as follows.

$$\tau_{jg}^2 \sim k\tilde{\tau}_j^2 / \chi_k^2, \qquad \sigma_{jg}^2 \sim h\tilde{\sigma}_j^2 / \chi_h^2$$

Here, $(\chi_k^2)^{-1}$ denotes the standard inverse $\chi^2$ distribution with $k$ degrees of freedom, and $\tilde{\tau}_j^2$ and $\tilde{\sigma}_j^2$ are scale parameters of the inverse chi-squared distribution derived from the data. $\tilde{\tau}_j^2$ is computed as follows:

$$\tilde{\tau}_j^2 = \frac{1}{G \sum_{e=1}^{E} (S_e - 1)} \sum_{g=1}^{G} \sum_{e=1}^{E} \sum_{s=1}^{S_e} \left( y_{jgse} - y_{jg\cdot e} \right)^2,$$

where $y_{jg\cdot e}$ is the average log-ratio of expression over the slides within an experiment:

$$y_{jg\cdot e} = \frac{1}{S_e} \sum_{s=1}^{S_e} y_{jgse}.$$
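The data-derived scale parameter $\tilde{\tau}_j^2$ is a pooled within-experiment variance, and $\tilde{\sigma}_j^2$, defined next, is the analogous between-experiment quantity. The sketch below computes both for one study from a list of per-experiment arrays. It assumes that $y_{jg\cdot\cdot}$ is the mean over all slides of a gene; for balanced designs this equals the mean of the experiment averages. `scale_parameters` is a hypothetical name.

```python
import numpy as np

def scale_parameters(experiments):
    """Data-derived scale parameters (tau_tilde_j^2, sigma_tilde_j^2) for one study j.

    experiments: list of E arrays; the e-th has shape (G, S_e) and holds the
    log-ratios y_jgse for every gene g and slide s of experiment e.
    """
    G = experiments[0].shape[0]
    E = len(experiments)
    # y_{jg.e}: per-gene average over the slides of each experiment, shape (G, E)
    exp_means = np.column_stack([x.mean(axis=1) for x in experiments])
    # Slide-level sum of squares around each experiment mean, pooled over genes
    ss_within = sum(((x - x.mean(axis=1, keepdims=True)) ** 2).sum() for x in experiments)
    df_within = sum(x.shape[1] - 1 for x in experiments)  # sum_e (S_e - 1)
    tau_tilde2 = ss_within / (G * df_within)
    # y_{jg..}: assumed here to be the mean over all slides of all experiments
    grand_means = np.concatenate(experiments, axis=1).mean(axis=1)
    ss_between = ((exp_means - grand_means[:, None]) ** 2).sum()
    sigma_tilde2 = ss_between / (G * (E - 1))
    return tau_tilde2, sigma_tilde2
```

An experiment with a single slide adds nothing to either the within-experiment sum of squares or its degrees of freedom, so unbalanced layouts such as (3, 1, 1) are handled without special cases.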

Similarly, the scale parameter for $\sigma_{jg}^2$ is calculated as follows:

$$\tilde{\sigma}_j^2 = \frac{1}{G(E-1)} \sum_{g=1}^{G} \sum_{e=1}^{E} \left( y_{jg\cdot e} - y_{jg\cdot\cdot} \right)^2$$

where $y_{jg\cdot\cdot}$ is the average log-ratio of expression over both slides and experiments. We use 3 degrees of freedom in each study for both $\tau_{jg}^2$ and $\sigma_{jg}^2$, i.e. $h = k = 3$. The prior distributions for the remaining parameters are as follows.

$$\begin{aligned}
\theta_{jg} \mid I_g = 0 &\sim \mathrm{N}(0, \eta_{jg0}^2) \\
\theta_{jg} \mid I_g = 1 &\sim \mathrm{N}(0, c_j \times \eta_{jg0}^2) \\
I_g &\sim \mathrm{Bernoulli}(p) \\
p &\sim \mathrm{Uniform}(0, 1) \\
\eta_{jg0}^2 &\sim a s_1^2 / \chi_a^2 \\
c_j &\sim b s_2^2 / \chi_b^2
\end{aligned}$$

We choose $a$ and $s_1^2$ so that the prior mean of $\eta_{jg0}^2$ is 1 with variance 0.1. We choose $b$ and $s_2^2$ so that the prior mean of $c_j$ is 100 with variance 10,000.
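The resulting hyperparameter values are not listed here. The targets can be inverted directly using the standard moments of the scaled inverse $\chi^2$ distribution $X = \nu s^2/\chi_\nu^2$, namely $E[X] = \nu s^2/(\nu - 2)$ and $\mathrm{Var}[X] = 2E[X]^2/(\nu - 4)$ for $\nu > 4$. This sketch assumes that the prior means and variances quoted above refer to this parameterization. It reproduces the arithmetic only and is not the authors' code; `scaled_inv_chi2_hyper` is a hypothetical name.

```python
def scaled_inv_chi2_hyper(mean, var):
    """Solve for (nu, s2) so that nu * s2 / chi2_nu has the given mean and variance.

    Uses E[X] = nu*s2/(nu-2) and Var[X] = 2*E[X]**2/(nu-4), valid for nu > 4.
    """
    nu = 4.0 + 2.0 * mean ** 2 / var   # from Var = 2 * mean^2 / (nu - 4)
    s2 = mean * (nu - 2.0) / nu        # from mean = nu * s2 / (nu - 2)
    return nu, s2

a, s1_sq = scaled_inv_chi2_hyper(1.0, 0.1)        # prior on eta_jg0^2
b, s2_sq = scaled_inv_chi2_hyper(100.0, 10000.0)  # prior on c_j
```

Under these assumptions, a mean of 1 with variance 0.1 gives $a = 24$ and $s_1^2 = 22/24 \approx 0.917$, while a mean of 100 with variance 10,000 gives $b = 6$ and $s_2^2 \approx 66.7$. The small $b$ makes the prior on $c_j$ heavy-tailed and only weakly informative.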

Full conditional posterior distributions

Each parameter is sampled from its full conditional posterior distribution as follows.

$$\begin{aligned}
\mu_{jge} \mid \text{rest} &\sim \mathrm{N}\left( \frac{S_e\, y_{jg\cdot e}\, \sigma_{jg}^2 + \tau_{jg}^2\, \theta_{jg}}{S_e\, \sigma_{jg}^2 + \tau_{jg}^2}, \; \frac{\tau_{jg}^2\, \sigma_{jg}^2}{S_e\, \sigma_{jg}^2 + \tau_{jg}^2} \right), \\
\tau_{jg}^2 \mid \text{rest} &\sim \left\{ \sum_{e=1}^{E} \sum_{s=1}^{S_e} (y_{jgse} - \mu_{jge})^2 + k\tilde{\tau}_j^2 \right\} \Big/ \chi_{S_1 + \cdots + S_E + k}^2, \\
\sigma_{jg}^2 \mid \text{rest} &\sim \left[ \sum_{e=1}^{E} (\mu_{jge} - \mu_{jg\cdot})^2 + h\tilde{\sigma}_j^2 \right] \Big/ \chi_{E + h - 1}^2.
\end{aligned}$$

Here, $\chi_{S_1 + \cdots + S_E + k}^2$ denotes the standard inverse $\chi^2$ distribution with $(S_1 + \cdots + S_E + k)$ degrees of freedom, with scale parameter:

$$\left\{ \sum_{e=1}^{E} \sum_{s=1}^{S_e} (y_{jgse} - \mu_{jge})^2 + k\tilde{\tau}_j^2 \right\} \Big/ (S_1 + \cdots + S_E + k).$$
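The first three updates can be written for a single gene $g$ in study $j$ as below. This sketch assumes NumPy. A draw written as $A/\chi_\nu^2$ is generated by dividing $A$ by a $\chi_\nu^2$ variate, which matches the scale-parameter description above. The $\sigma_{jg}^2$ step follows the printed formula as written (deviations from $\mu_{jg\cdot}$, $E + h - 1$ degrees of freedom), with $\mu_{jg\cdot}$ taken as the mean of the $\mu_{jge}$ over experiments. All function names are hypothetical.

```python
import numpy as np

def scaled_inv_chi2(rng, df, scale):
    """Scaled inverse chi-squared draw with `df` degrees of freedom: df * scale / chi2_df."""
    return df * scale / rng.chisquare(df)

def mu_conditional(y_bar_e, S_e, theta, sigma2, tau2):
    """Mean and variance of mu_jge | rest (normal-normal update)."""
    denom = S_e * sigma2 + tau2
    return (S_e * y_bar_e * sigma2 + tau2 * theta) / denom, tau2 * sigma2 / denom

def update_gene(rng, y, theta, sigma2, tau2, k, tau_tilde2, h, sigma_tilde2):
    """One sweep over mu_jge, tau_jg^2 and sigma_jg^2 for a single gene.

    y: list of 1-D arrays, one per experiment e, holding y_jgse over slides s.
    """
    S = np.array([len(ye) for ye in y])
    y_bar = np.array([ye.mean() for ye in y])           # y_{jg.e}
    m, v = mu_conditional(y_bar, S, theta, sigma2, tau2)
    mu = rng.normal(m, np.sqrt(v))
    # tau^2: slide-level residuals around mu_jge, S_1 + ... + S_E + k df
    ss = sum(((ye - mu_e) ** 2).sum() for ye, mu_e in zip(y, mu))
    df_t = S.sum() + k
    tau2 = scaled_inv_chi2(rng, df_t, (ss + k * tau_tilde2) / df_t)
    # sigma^2: as printed, deviations of mu_jge from their mean, E + h - 1 df
    ss_mu = ((mu - mu.mean()) ** 2).sum()
    df_s = len(y) + h - 1
    sigma2 = scaled_inv_chi2(rng, df_s, (ss_mu + h * sigma_tilde2) / df_s)
    return mu, tau2, sigma2
```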

For $I_g = 0$, the following full conditionals are sampled:

$$\begin{aligned}
\theta_{jg} \mid I_g = 0, \text{rest} &\sim \mathrm{N}\left( \frac{E\,\eta_{jg0}^2\,\mu_{jg\cdot}}{E\,\eta_{jg0}^2 + \sigma_{jg}^2}, \; \frac{\sigma_{jg}^2\,\eta_{jg0}^2}{E\,\eta_{jg0}^2 + \sigma_{jg}^2} \right), \\
\eta_{jg0}^2 \mid I_g = 0, \text{rest} &\sim \left[ \frac{a s_1^2 + \theta_{jg}^2}{a + 1} \right] \Big/ \chi_{a+1}^2.
\end{aligned}$$

For $I_g = 1$, the following full conditionals are sampled:

$$\begin{aligned}
\theta_{jg} \mid I_g = 1, \text{rest} &\sim \mathrm{N}\left( \frac{E\,c_j\,\eta_{jg0}^2\,\mu_{jg\cdot}}{E\,c_j\,\eta_{jg0}^2 + \sigma_{jg}^2}, \; \frac{\sigma_{jg}^2\,c_j\,\eta_{jg0}^2}{E\,c_j\,\eta_{jg0}^2 + \sigma_{jg}^2} \right), \\
\eta_{jg0}^2 \mid I_g = 1, \text{rest} &\sim \left[ \frac{c_j a s_1^2 + \theta_{jg}^2}{c_j (a + 1)} \right] \Big/ \chi_{a+1}^2.
\end{aligned}$$
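The two indicator-specific updates differ only in whether the prior variance of $\theta_{jg}$ is $\eta_{jg0}^2$ or $c_j\eta_{jg0}^2$, so one vectorized sketch can cover all genes of a study. It draws $\theta_{jg}$ first and then $\eta_{jg0}^2$ given the new value. Each bracketed expression over $\chi_{a+1}^2$ is read as a scale parameter, so a draw is $(a + 1)$ times the bracket divided by a $\chi_{a+1}^2$ variate, which is the standard conjugate update for this prior. NumPy is assumed, the argument `mu_bar` holds $\mu_{jg\cdot}$ for each gene, and the function names are hypothetical.

```python
import numpy as np

def theta_conditional(mu_bar, E, eta2, sigma2, c, I):
    """Mean and variance of theta_jg | I_g, rest for all genes of study j.

    The prior variance of theta_jg is eta2 when I_g = 0 and c * eta2 when I_g = 1.
    """
    prior_var = np.where(I == 1, c * eta2, eta2)
    denom = E * prior_var + sigma2
    return E * prior_var * mu_bar / denom, sigma2 * prior_var / denom

def update_theta_eta2(rng, mu_bar, E, eta2, sigma2, c, I, a, s1_sq):
    """Draw theta_jg, then eta_jg0^2 given the new theta_jg."""
    m, v = theta_conditional(mu_bar, E, eta2, sigma2, c, I)
    theta = rng.normal(m, np.sqrt(v))
    f = np.where(I == 1, c, 1.0)  # 1 for I_g = 0, c_j for I_g = 1
    scale = (f * a * s1_sq + theta ** 2) / (f * (a + 1))
    # scaled inverse chi-squared with a + 1 degrees of freedom and scale `scale`
    eta2_new = (a + 1) * scale / rng.chisquare(a + 1, size=theta.shape)
    return theta, eta2_new
```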

For all iterations, the following are sampled:

$$\begin{aligned}
c_j \mid \text{rest} &\sim \left[ \frac{G' b s_2^2 + \sum_{g=1}^{G'} \theta_{jg}^2 / \eta_{jg0}^2}{G'(b + 1)} \right] \Big/ \chi_{G'(b+1)}^2, \\
p \mid \text{rest} &\sim \mathrm{Beta}\left( \sum_{g=1}^{G} I_g + 1, \; G - \sum_{g=1}^{G} I_g + 1 \right), \\
I_g &\sim \mathrm{Bernoulli}(d_g), \\
d_g = p(I_g = 1 \mid \text{rest}) &= \frac{p\,\mathrm{Prob}(\mathbf{Y}_g \mid I_g = 1)}{p\,\mathrm{Prob}(\mathbf{Y}_g \mid I_g = 1) + (1 - p)\,\mathrm{Prob}(\mathbf{Y}_g \mid I_g = 0)}.
\end{aligned}$$

Here $G$ is the total number of genes, $G'$ is the set of genes with $I_g = 1$ in an iteration, and $\mathbf{Y}_g$ is the data for gene $g$ from all studies. Because the full conditional posterior distributions are all available in closed form when conditioned on the values of $I_g$, the Gibbs sampler [36] is used to generate samples from them. We used 5,000 iterations for all analyses except the five-study simulation, which required 8,000 iterations; this was more than adequate. The calculations are implemented using the WinBUGS software [44].
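The indicator and mixing-proportion steps can be sketched as follows. This assumes NumPy and that the per-gene values $\log \mathrm{Prob}(\mathbf{Y}_g \mid I_g = 1)$ and $\log \mathrm{Prob}(\mathbf{Y}_g \mid I_g = 0)$ have already been evaluated; how they are evaluated is not reproduced here, so they are inputs. Working on the log scale with `np.logaddexp` avoids underflow when the likelihoods are tiny. The $c_j$ update is omitted, and the function names are hypothetical.

```python
import numpy as np

def prob_de(p, loglik1, loglik0):
    """d_g = p*L1 / (p*L1 + (1-p)*L0), evaluated on the log scale for stability."""
    log_num = np.log(p) + np.asarray(loglik1, dtype=float)
    log_alt = np.log1p(-p) + np.asarray(loglik0, dtype=float)
    return np.exp(log_num - np.logaddexp(log_num, log_alt))

def update_indicators_and_p(rng, p, loglik1, loglik0):
    """Draw I_g ~ Bernoulli(d_g) for every gene, then p | rest ~ Beta."""
    d = prob_de(p, loglik1, loglik0)
    I = (rng.random(d.shape) < d).astype(int)          # Bernoulli(d_g) draws
    n_de, G = I.sum(), I.size
    p_new = rng.beta(n_de + 1, G - n_de + 1)           # Beta(sum I_g + 1, G - sum I_g + 1)
    return I, p_new, d
```

Averaging the indicator draws over the retained iterations gives each gene's posterior probability of differential expression, the quantity used to rank genes.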

Availability and requirements

The WinBUGS code for executing the models is freely available.

Project name: BayesPoolMicro.

Project home page: http://www.math.umass.edu/~conlon/research/BayesPoolMicro/

Operating system: Windows 98 or later.

Other requirements: WinBUGS software version 1.4 or later [44].

License: free.

Authors' contributions

EMC and JJS contributed to writing the computer code. All authors contributed to the development of the methodology and to writing the manuscript.

Acknowledgements

We thank George Tseng, Jeffrey Townsend and John Staudenmayer for helpful discussion, and Patrick Eichenberger and the laboratory of Richard Losick for the B. subtilis microarray data and helpful advice. We also thank three anonymous referees for input that enhanced the manuscript. EMC was partially supported by a University of Massachusetts Healey Endowment Grant, and JSL was partially supported by the NIH Grant R01-HG02518-01, and the NSFChina Grant 10228102.

References

  • Baldi P, Long AD. Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001;17:509–519. doi: 10.1093/bioinformatics/17.6.509.
  • Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001;29:2549–2557. doi: 10.1093/nar/29.12.2549.
  • Townsend JP, Hartl DL. Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple treatments or samples. Genome Biology. 2002;3:research0071.1–71.16. doi: 10.1186/gb-2002-3-12-research0071.
  • Efron B, Tibshirani R, Storey JD, Tusher VG. Empirical Bayes Analysis of a Microarray Experiment. Journal of the American Statistical Association. 2001;96:1151–1160. doi: 10.1198/016214501753382129.
  • Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW. On Differential Variability of Expression Ratios: Improving Statistical Inference About Gene Expression Changes From Microarray Data. Journal of Computational Biology. 2001;8:37–52. doi: 10.1089/106652701300099074.
  • Ibrahim JG, Chen M-H, Gray RJ. Bayesian Models for Gene Expression With DNA Microarray Data. Journal of the American Statistical Association. 2002;97:88–99. doi: 10.1198/016214502753479257.
  • Broët P, Richardson S, Radvanyi F. Bayesian hierarchical model for identifying changes in gene expression from microarray experiments. Journal of Computational Biology. 2002;9:671–683. doi: 10.1089/106652702760277381.
  • Gottardo R, Pannucci JA, Kuske CR, Brettin T. Statistical analysis of microarray data: a Bayesian approach. Biostatistics. 2003;4:597–620. doi: 10.1093/biostatistics/4.4.597.
  • Lönnstedt I, Speed TP. Replicated microarray data. Statistica Sinica. 2002;12:31–46.
  • Pan W. A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics. 2002;18:546–554. doi: 10.1093/bioinformatics/18.4.546.
  • Kendziorski CM, Newton MA, Lan H, Gould MN. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine. 2003;22:3899–3914. doi: 10.1002/sim.1548.
  • Newton MA, Noueiry A, Sarkar D, Ahlquist P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics. 2004;5:155–176. doi: 10.1093/biostatistics/5.2.155.
  • Do KA, Müller P, Tang F. Bayesian mixture model for differential gene expression. Journal of the Royal Statistical Society C. 2005;54:627–644. doi: 10.1111/j.1467-9876.2005.05593.x.
  • Ishwaran H, Rao JS. Detecting Differentially Expressed Genes in Microarrays Using Bayesian Model Selection. Journal of the American Statistical Association. 2003;98:438–455. doi: 10.1198/016214503000224.
  • Ishwaran H, Rao JS. Spike and Slab Gene Selection for Multigroup Microarray Data. Journal of the American Statistical Association. 2005;100:764–780. doi: 10.1198/016214505000000051.
  • Yang D, Zakharkin SO, Page GP, Brand JP, Edwards JW, Bartolucci AA, Allison DB. Applications of Bayesian statistical methods in microarray data analysis. Am J Pharmacogenomics. 2004;4:53–62. doi: 10.2165/00129785-200404010-00006.
  • Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-analysis of microarrays: inter-study validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Research. 2002;62:4427–4433.
  • Choi JK, Yu U, Kim S, Yoo OJ. Combining multiple microarray studies and modeling inter-study variation. Bioinformatics. 2003:i84–i90. doi: 10.1093/bioinformatics/btg1010.
  • Ghosh D, Barrette TR, Rhodes D, Chinnaiyan AM. Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer. Functional & Integrative Genomics. 2003;3:180–188. doi: 10.1007/s10142-003-0087-5.
  • Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E. A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clinical Cancer Research. 2004;10:2922–2927. doi: 10.1158/1078-0432.CCR-03-0490.
  • Jiang H, Deng Y, Chen H, Tao L, Sha Q, Chen J, Tsai C, Zhang S. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004;5:81. doi: 10.1186/1471-2105-5-81.
  • Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004;101:9309–9314. doi: 10.1073/pnas.0401994101.
  • Hu P, Greenwood CMT, Beyene J. Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models. BMC Bioinformatics. 2005;6:128. doi: 10.1186/1471-2105-6-128.
  • Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. San Diego, CA, Academic Press; 1985.
  • Stevens JR, Doerge RW. Combining Affymetrix microarray results. BMC Bioinformatics. 2005;6:57. doi: 10.1186/1471-2105-6-57.
  • Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS. Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics. 2002;18:405–412. doi: 10.1093/bioinformatics/18.3.405.
  • Jarvinen AK, Hautaniemi S, Edgren H, Auvinen P, Saarela J, Kallioniemi OP, Monni O. Are data from different gene expression microarray platforms comparable? Genomics. 2004;83:1164–1168. doi: 10.1016/j.ygeno.2004.01.004.
  • Hardiman G. Microarray platforms – comparisons and contrasts. Pharmacogenomics. 2004;5:487–502. doi: 10.1517/14622416.5.5.487.
  • DuMouchel WH, Harris JE. Bayes methods for combining the results of cancer studies in humans and other species. Journal of the American Statistical Association. 1983;78:293–315. doi: 10.2307/2288631.
  • Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature. 2000;405:827–836. doi: 10.1038/35015701.
  • Wu TD. Analyzing gene expression data from DNA microarrays to identify candidate genes. Journal of Pathology. 2001;195:53–65. doi: 10.1002/1096-9896(200109)195:1<53::AID-PATH891>3.0.CO;2-H.
  • Hardiman G. Microarray technologies – an overview. Pharmacogenomics. 2002;3:293–297. doi: 10.1517/14622416.3.3.293.
  • Southern EM. DNA microarrays. History and overview. Methods Mol Biol. 2000;170:1–15.
  • Dudoit S, Yang YH, Callow MJ, Speed TP. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica. 2002;12:111–139.
  • Eichenberger P, Jensen ST, Conlon EM, van Ooij C, Silvaggi J, Gonzalez-Pastor JE, Fujita M, Ben-Yehuda S, Stragier P, Liu JS, Losick R. The sigmaE regulon and the identification of additional sporulation genes in Bacillus subtilis. Journal of Molecular Biology. 2003;327:945–972. doi: 10.1016/S0022-2836(03)00205-5.
  • Liu JS. Monte Carlo Strategies in Scientific Computing. New York, Springer-Verlag; 2001.
  • Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B. 1995;57:289–300.
  • Tusher VG, Tibshirani R, Chu G. Significance Analysis of Microarrays Applied to the Ionizing Radiation Response. Proceedings of the National Academy of Sciences USA. 2001;98:5116–5121. doi: 10.1073/pnas.091062498.
  • Storey JD. A Direct Approach to False Discovery Rates. Journal of the Royal Statistical Society B. 2002;64:479–498. doi: 10.1111/1467-9868.00346.
  • Storey JD, Tibshirani R. SAM Thresholding and False Discovery Rates for Detecting Differential Gene Expression in DNA Microarrays. In: Parmigiani G, Garrett ES, Irizarry RA, Zeger SL, editors. The Analysis of Gene Expression Data: Methods and Software. Springer, NY; 2003.
  • Genovese C, Wasserman L. Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society B. 2002;64:499–518. doi: 10.1111/1467-9868.00347.
  • Conlon EM, Eichenberger P, Liu JS. Determining and analyzing differentially expressed genes from cDNA microarray experiments with complementary designs. Journal of Multivariate Analysis. 2004;90:1–18. doi: 10.1016/j.jmva.2004.02.007.
  • Schadt EE, Li C, Ellis B, Wong WH. Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J Cell Biochem Suppl. 2001;37:120–125.
  • The BUGS Project http://www.mrc-bsu.cam.ac.uk/bugs

Articles from BMC Bioinformatics are provided here courtesy of BioMed Central
