Ranking Analysis of Microarray Data: A Powerful Method for Identifying Differentially Expressed Genes

a Institute of Molecular Medicine, The University of Texas-Houston, Houston, Texas 77030
b Laboratory for Conservation and Utilization of Bioresources, Yunnan University, Kunming Province, Yunnan, China 650
c Human Genetics Center, School of Public Health, The University of Texas-Houston, Houston, Texas, 77030

Corresponding author: Yun-Xin Fu, Ph.D., Human Genetics Center, University of Texas at Houston, 1200 Herman Pressler, Houston Texas 77030, Email: Yunxin.fu/at/uth.tmc.edu, Telephone: 713-500-9813; Fax: 713-500-0900

Abstract

Microarray technology provides a powerful tool for profiling the expression of thousands of genes simultaneously, which makes it possible to explore the molecular and metabolic etiology of the development of a complex disease under study. However, classical statistical methods are often not directly applicable to microarray data. It is therefore necessary to develop powerful methods for large-scale statistical analysis. In this paper, we describe a novel method called Ranking Analysis of Microarray data (RAM). RAM is a large-scale two-sample t-test method based on comparisons between a set of ranked T-statistics and a set of ranked Z-values (a set of ranked estimated null scores). The Z-values are produced by a “randomly splitting” approach instead of a “permutation” approach, and a two-simulation strategy is used to estimate the proportion of genes identified by chance, i.e., the false discovery rate (FDR). The results obtained from simulated and observed microarray data show that RAM is more efficient than Significance Analysis of Microarrays (SAM) in identifying differentially expressed genes and in estimating FDR under undesirable conditions such as a large fudge factor, a small sample size, or a mixture distribution of noise.
Keywords: Microarray, t-test, ranking analysis, false discovery rate

Introduction

Microarray technology provides a powerful tool for measuring the expression levels of large numbers of genes simultaneously, and creates unparalleled opportunities to study complex physiological or pathological processes, including the development of disease, that are mediated by the coordinated action of multiple genes [1]. Detection of genes differentially expressed across experimental, biological and/or clinical conditions is a major objective of microarray experiments. Methods for finding significantly differentially expressed genes in microarray data analysis can be classified into three major groups [2,3]: marginal filters, wrappers [4], and embedded approaches [5,6]. The wrapper and embedded methods are search algorithms in which candidate gene subsets that are useful for building a good predictor are constructed, selected, and then evaluated with a classification algorithm [3,7]. The filter approaches are simple and fast methods, including t-tests, nonparametric scoring [8,9], and analysis of variance (ANOVA) [1,10], for searching for the features (genes) or feature (gene) subsets that are irrelevant and independent of each other [3,7].

For microarray data, the filter approaches encounter a challenging simultaneous inference problem, as the probability of committing a type I error increases with the number of tests performed [11]. To resolve the statistical problem of testing a large family of null hypotheses, several multiple-testing procedures have been developed. The Bonferroni procedure, the Holm procedure [12], the Hochberg procedure [13], and the Westfall and Young procedure [14] address the multiple-testing problem by controlling the family-wise error rate (FWER), which is the probability that at least one false positive occurs over the collective tests [15].
However, these methods are based on the assumption that different tests are independent of each other. They are thus not well suited to microarray data: they are often too stringent, may yield few or no positive genes [16], and may result in an unnecessary loss of power. Benjamini and Hochberg [17] proposed an alternative measure, the false discovery rate (FDR), to control the erroneous rejection of true null hypotheses. FDR is the expected proportion of false positives among all the positives detected. FDR-based multiple testing approaches, such as the Benjamini and Hochberg (BH) procedure [17,18] and the Benjamini and Liu procedure [19], have been developed for testing a large family of hypotheses. These procedures are generally suited to larger sample sizes because small sample sizes cause the FDR to be too “granular” [16]. Most recently, Storey [20] and Storey and Tibshirani [21] developed a new measure, the positive FDR (pFDR), which is arguably a more appropriate variation. It multiplies the FDR by a factor of π0, which is the estimated proportion of non-differentially expressed genes among all genes on the arrays [22]. The estimate of pFDR is smaller than the estimate of FDR [22]. Tsai et al. [23] suggested the use of the conditional FDR (cFDR) on the most significant findings. Pounds and Cheng [15, 24] proposed the spacing LOESS histogram (SPLOSH) approach to estimate cFDR.

Tusher et al. [16] developed a new FDR-based method called Significance Analysis of Microarrays (SAM). SAM is very popular because it can identify genes with significant expression changes and can estimate FDR based on permutations. However, the conventional permutation approach is not the most appropriate method for estimating the null distribution for most microarray data, because sample sizes in such experiments are commonly small, which yields a relatively small number of permutations and leads to inaccurate ranking of scores.
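As a point of reference for the FDR-controlling procedures discussed above, the Benjamini and Hochberg step-up rule can be sketched in a few lines. This is a minimal illustration, not part of RAM or SAM; the function name `benjamini_hochberg` and the target level `q` are illustrative choices, and the p-values are assumed to come from any per-gene test.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule.

    Hypothetical helper for illustration: sort the p-values, find the largest
    rank k with p_(k) <= k * q / n, and reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    ranked = p[order]
    thresholds = q * np.arange(1, n + 1) / n   # k * q / n for k = 1..n
    below = np.nonzero(ranked <= thresholds)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size > 0:
        k = below[-1]                          # largest rank meeting the bound
        reject[order[:k + 1]] = True           # reject everything up to rank k
    return reject
```

With thousands of genes and few replicates, the p-values entering such a rule take only a limited set of values, which is one way to read the “granular” behaviour noted above.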
Although SAM has the advantage of being distribution-free, its use of a fudge factor (S0) makes it mostly applicable to normal distributions because S0 is in general smaller than or equal to 1 in normal distributions. Non-normal distributions or small sample sizes can produce a larger S0, which often causes SAM to lose power or become inapplicable. These problems in SAM led us to develop a new statistical method called ranking analysis of microarray data (RAM). The overall approach of RAM is somewhat similar to that of SAM, namely to identify genes with significant expression changes through gene-specific t-tests, but RAM evaluates significance based on an improved empirical distribution generated by a “randomly splitting” approach instead of the permutation approach, and it implements a simulation-based interval method for estimating FDR. As a result, RAM has all the major advantages of SAM and also performs very well for small sample sizes, which are typical in microarray experiments.

Methods

T-statistic

For simplicity, we will focus our discussion on the analysis of expression data from experiments with two different classes (designated as 1 and 2), which is very common in practice. The two classes may correspond to two different genotypes of individuals, treatments, cell types, tissues, etc. Let N be the number of genes examined and mik be the number of replicate observations for the expression of gene k (k = 1, …, N) in class i (i = 1, 2). We will refer to the collection of all the observations for a given gene in class i as sample i. Therefore, mik is the size of sample i for gene k. Typically m11 = m12 = … = m1N = m1 and m21 = m22 = … = m2N = m2; otherwise the experiment is said to have some missing observations. Let $\bar{x}_{ik}$ and $s_{ik}^2$ be the sample mean and sample variance of the expression of gene k in sample i.

The traditional t-test statistic for testing whether there is a significant difference between two sample means is

$$t_k = \frac{d_k}{\sigma_k},$$

where in the current context $d_k = \bar{x}_{1k} - \bar{x}_{2k}$ and

$$\sigma_k = \sqrt{\frac{s_{1k}^2}{m_{1k}} + \frac{s_{2k}^2}{m_{2k}}}$$

for unequal variances for the two class experiments or

$$\sigma_k = \sqrt{\frac{(m_{1k}-1)s_{1k}^2 + (m_{2k}-1)s_{2k}^2}{m_{1k}+m_{2k}-2}\left(\frac{1}{m_{1k}} + \frac{1}{m_{2k}}\right)}$$

for equal variances. Although the traditional t-statistic is a reasonable choice for some expression data sets, its applicability is often questionable because a small sampling variance (smaller than 1), which can often arise by chance when a large number of genes is measured with a small sample size, combined with a relatively large value of dk may lead to an erroneous conclusion. This effect is generally known as the fudging effect. To reduce the fudging effect, Tusher et al. [16] proposed a modified t-statistic defined as

$$t_k = \frac{d_k}{\sigma_k + S_0},$$

where S0 is a constant representing the minimal coefficient of variation of tk computed as a function of σk in the moving windows across the data. However, in our own studies, we noted that the fudging effect using the modified t-statistic is still quite strong when the sample size is small. In particular, a small sample size often leads to an unreasonably large value of S0 that dominates the test statistic and consequently reduces the power of the analysis. To circumvent the problem, we propose a simple alternative correction δk for the variance of expression for gene k as
for the case of unequal variances and
for the case of equal variances where
Thus, the t-statistic for the difference of expression levels of gene k is redefined as
Since Tk = tk unless dk > σk < 1, the new test statistic is a simpler extension of the traditional t-statistic than that proposed by Tusher et al. [16].

Ranking Analysis

To identify genes whose expression levels are significantly different in two experimental conditions, a common practice is to rank the genes according to their values of the chosen statistic, which in our situation is T. Suppose Tk* is the k*-th largest T value; then its corresponding gene k is said to have significantly different expression between the two experimental conditions for a given threshold value Δ if
where Zk* = E(Tk*) is the expectation of Tk* under the null hypothesis that no gene has a significant difference in expression. This type of test is known as the Ranking Test. To enable the ranking test, it is critical to obtain a good estimate of Zk*. Tusher et al. [16] proposed a permutation approach for this purpose, which uses a standard permutation procedure for each gene. This process works well if the sample size is large. When the sample size is small, however, the number of permuted samples for each gene is rather small, which leads to a biased ranking test and can even render the test inapplicable. This appears to be caused by the randomness introduced by permutations, which leads to biased tail distributions for ranked values. The observations from analyzing both real and simulated data led us to develop a Randomly Splitting (RS) approach to estimate Z as follows. First, each sample is randomly split into two subsamples with a size difference not larger than a given value C. We found that it is best to set C = 4. For the J-th split, let
The splitting process is carried out for every gene, and define
The set of
Fig. 1
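The ranked comparison described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the significance criterion is assumed to be |Tk* − Zk*| > Δ, the null scores are assumed to be supplied as a matrix with one row per resampling round (however they were generated), and Zk* is approximated by averaging the sorted null scores at each rank. The function names `welch_t`, `expected_null_order_stats` and `ranking_test` are hypothetical, and `welch_t` implements the traditional unequal-variance statistic with an optional SAM-style constant `s0`, not the RAM correction δk.

```python
import numpy as np

def welch_t(x, y, s0=0.0):
    """Per-gene two-sample t-like score; rows are genes, columns are replicates.

    s0 = 0 gives the traditional unequal-variance t-statistic; s0 > 0 adds a
    SAM-style constant to the denominator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1]
                 + y.var(axis=1, ddof=1) / y.shape[1])
    return d / (se + s0)

def expected_null_order_stats(null_scores):
    """Estimate Z at every rank from a (rounds, genes) matrix of null scores.

    Each round is sorted on its own, then the sorted values are averaged
    rank by rank across rounds.
    """
    return np.sort(np.asarray(null_scores, dtype=float), axis=1).mean(axis=0)

def ranking_test(t_obs, z_expected, delta):
    """Flag genes whose ranked score differs from its null expectation by > delta."""
    t_obs = np.asarray(t_obs, dtype=float)
    order = np.argsort(t_obs)                  # gene indices, ascending by T
    exceeds = np.abs(t_obs[order] - np.asarray(z_expected)) > delta
    significant = np.zeros(t_obs.size, dtype=bool)
    significant[order] = exceeds               # map ranks back to gene indices
    return significant
```

Both the observed scores and the null scores are sorted in the same direction, so the comparison is made rank by rank regardless of whether ranks are counted from the largest or the smallest value.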
Estimate of FDR

Consider a series of threshold values Δi (i = 1, …, L). Let N(i) be the number of genes that are significant at the threshold Δi by the ranking analysis. N(i) is then comprised of two parts: the number of true positives Nt(i) and the number of false positives Nf(i). Therefore N(i) = Nt(i) + Nf(i). The false discovery rate (FDR) at the threshold Δi can be written as RFD(i) = Nf(i)/N(i), which must be estimated since Nf(i) is unknown. To improve the accuracy of estimating FDR, we propose a new strategy that obtains FDR as an average of two estimates, each derived from simulation under a specific condition. The first estimate is carried out as follows. For each gene, two samples of m replicates are simulated from a normal distribution, one with mean randomly set to be
The process will produce M sets of simulated data each is subjected to the ranking analysis described in the previous section. For each simulated data set, every ranked position has thus a corresponding T value that is denoted by
k* for every ranking position will allow one to identify genes that become significant. The number of such genes in the J-th set of simulation data at the threshold Δi is denoted by N(1,J,i). Let
as the first estimate of FDR where
The second estimate of FDR is obtained also from simulation. The simulation of the two samples for each gene is done in the same way as the first simulation, except that the two means are set to be equal, i.e.,
k*2, and the significances across all the ranking positions at threshold Δi are counted as N(2,J,i). Let
as the second estimate of FDR. Equation (8) shows that f(2,i) = 1 when N(i) = 0 and N(2,i) ≥ 1, f(2,i) = 0.5 when N(i) = N(2,i), f(2,i) < 0.5 when N(i) > N(2,i), and f(2,i) = 0 when N(i) ≥ 1 and N(2,i) = 0. Although we intended to find lower and upper bounds for FDR, it can be seen from Fig. 2
where ai = f(1,i)/[f(1,i) + f(2,i)] and bi = 1 − ai. We found that at threshold level Δi, a better estimate of FDR is obtained by
To further smooth the estimates of FDR, consider the difference between the numbers of genes found to be significant at adjacent thresholds Δi and Δi+1, define a recursive formula modifying the probability fi as
where pi = [N(i) − N(i + 1)]/[1 + N(i) − N(i + 1)] and qi = 1 − pi. Equation (11) suggests that fi+1 = fi if N(i) = N(i + 1). Thus, the number of false discoveries among the genes found to be significant at threshold Δi in the observed data is estimated by
and an estimate of the FDR at threshold Δi is given by
It can be seen from Fig. 2 FD(i), indicating that FD(i) is a good estimate of FDR. We also found that, if no gene in the simulation was found to be significant, FD(i) would be more than 0.5 at threshold Δi of f(1,i) < f(2,i) (the result is not shown).

Simulation Results

Estimate of the Null Distribution

To determine whether the empirical distributions obtained by the permutation approach and the RS approach are appropriate for the analysis of expression data, we simulated three microarray data sets, each consisting of 3000 genes and two samples of 12 replicates each. The means and variances for each gene were set to the observed means and variances from real microarray data obtained in our laboratory. In our real microarray data sets, the expression levels of 3000 genes were measured for two different strains [the spontaneously hypertensive rat (SHR) and the stroke-prone spontaneously hypertensive rat (SHRSP)], each consisting of 12 individual rats. In the first simulated data set, all 3000 genes were set to have no treatment effect. In the second and third simulated data sets, treatment effects of G = 10R and G = 30R, respectively, were randomly assigned to 30% of the genes, where R is a random variable uniformly distributed on (0,1]. In the ranking analysis, a set of Zk* values for each simulated data set was computed from 100 permutations or 100 random splits. As Zk* is an estimate of Tk* under the null hypothesis, a desirable property is that Zk* has a linear relationship with Tk*. This property can be seen by plotting Zk* versus Tk*. Fig. 3 It can be seen from Fig. 3

Estimate of FDR

Since it is generally unknown whether a given gene is expressed differently in two different conditions, real gene expression data are not necessarily the best choice for evaluating an FDR estimator. Therefore, we also conducted a computer simulation in which the expression status (significant or not significant) assigned to a gene by a method can be compared with its real status.
In this simulation study, we generated two data sets of 3000 genes in which treatment effect values of 10R were randomly assigned to 10% and 30% of the genes, respectively, and the sample size was set to 6 replicates. This simulation procedure was iterated 20 times. Four criteria, i.e., the absolute average, maximum, minimum, and variance of the differences between the estimated and true numbers of false discoveries across all FD(i)% ≤ λ obtained from these 20 two-sample simulated data sets, were used to assess an estimator. We set λ = 40, 30, 20, 10, and 5%. Table 1 summarizes the results obtained by applying RAM and SAM (the software comes from http://www-stat.stanford.edu/~tibs/SAM/) to these simulated data sets when 10% and 30% of the genes, respectively, were given effect values of 10R. The results in Table 1 clearly indicate that the RAM estimator is much more accurate in estimating FDR than the SAM estimator. In particular, for an FDR of 5%, which is an important threshold value in practice, the RAM estimate is, on average, 0.65 false discoveries with a variance < 1 and a variation interval of 1~3 false discoveries, whereas the SAM estimate is, on average, about 2 false discoveries with a variance larger than 6 and a variation interval of 7 false discoveries. Fig. 2
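A simulation of this kind, in which the true status of every gene is known, can be sketched as below. This is a minimal illustration under assumptions that are not taken from the study: the baseline mean is 0 and the noise standard deviation `sigma` is 1 for every gene (the study used means and variances observed in real data), and the function names `simulate_two_class` and `false_discovery_counts` are hypothetical.

```python
import numpy as np

def simulate_two_class(n_genes=3000, n_rep=6, frac_de=0.10, scale=10.0,
                       sigma=1.0, seed=0):
    """Simulate two classes of n_rep replicates with known differential status.

    A fraction frac_de of genes receives a treatment effect scale * R in the
    second class, with R uniform on (0, 1]. Baseline mean 0 and a common
    noise standard deviation sigma are simplifying assumptions.
    """
    rng = np.random.default_rng(seed)
    truly_de = np.zeros(n_genes, dtype=bool)
    n_de = int(round(frac_de * n_genes))
    truly_de[rng.choice(n_genes, size=n_de, replace=False)] = True
    r = 1.0 - rng.random(n_genes)              # maps [0, 1) onto (0, 1]
    effect = np.where(truly_de, scale * r, 0.0)
    x = rng.normal(0.0, sigma, size=(n_genes, n_rep))
    y = rng.normal(effect[:, None], sigma, size=(n_genes, n_rep))
    return x, y, truly_de

def false_discovery_counts(called, truly_de):
    """Return (number called, number of false discoveries, true proportion)."""
    called = np.asarray(called, dtype=bool)
    truly_de = np.asarray(truly_de, dtype=bool)
    n_called = int(called.sum())
    n_false = int((called & ~truly_de).sum())
    proportion = n_false / n_called if n_called else 0.0
    return n_called, n_false, proportion
```

Because `truly_de` is known, the true number of false discoveries for any list of called genes can be compared directly with the number an estimator reports at the same threshold.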
Identification of Differentially Expressed Genes

The exact distribution of the expression level of a gene is unknown in microarray experiments. For some genes, a normal distribution may be appropriate; for others, a gamma distribution may be more accurate; and for still others, none of the standard distributions may be adequate. When many thousands of genes are examined simultaneously, a variety of distributions is likely to be present. Therefore, it is appropriate to evaluate a method using data generated from a mixture of distributions. For simplicity, we limited the simulation to gamma and normal distributions to yield data sets consisting of 3000 genes in two samples, each having 6 replicates. We then mixed them together at random at a given proportion (for example, 30% gamma distribution and 70% normal distribution) to construct a new set of microarray data. We applied SAM and RAM to the simulated data set. The results are summarized in Fig. 1
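A mixed data set of this kind can be generated along the following lines. This is a minimal sketch with several assumptions that are not stated in the text: mixing is done at the gene level (each gene is entirely gamma or entirely normal), the gamma `shape` and `scale` and the standard normal parameters are arbitrary illustrative values, and no treatment effect is added. The function name `simulate_mixture` is hypothetical.

```python
import numpy as np

def simulate_mixture(n_genes=3000, n_rep=6, frac_gamma=0.30,
                     shape=2.0, scale=1.0, seed=0):
    """Build two samples in which a fraction of genes follows a gamma distribution.

    Returns the two samples (genes x replicates) and a mask marking which
    genes were drawn from the gamma distribution.
    """
    rng = np.random.default_rng(seed)
    is_gamma = np.zeros(n_genes, dtype=bool)
    n_gamma = int(round(frac_gamma * n_genes))
    is_gamma[rng.choice(n_genes, size=n_gamma, replace=False)] = True
    # Draw both candidate data sets, then pick one per gene.
    normal_part = rng.normal(0.0, 1.0, size=(n_genes, 2 * n_rep))
    gamma_part = rng.gamma(shape, scale, size=(n_genes, 2 * n_rep))
    data = np.where(is_gamma[:, None], gamma_part, normal_part)
    return data[:, :n_rep], data[:, n_rep:], is_gamma
```

Changing `frac_gamma` gives other mixing proportions, for example 0.0 for the idealized all-normal setting.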
Application to the Real Microarray Data

Both SAM and RAM were applied to the two-sample real microarray data of 7129 genes obtained from two small samples (4 replicates for each sample) provided in the SAM software package. The results shown in Table 3 are helpful for explaining the observation in Table 1 of Tusher et al. [16]. A larger S0 (S0 = 3.3) is the primary cause of SAM's poor performance: 12% FDR in the 48 genes identified as significant at threshold Δ = 1.2. It can be seen from Table 3 that RAM found 61 genes with significant expression changes at an acceptable FDR level of 3.3%, whereas SAM identified only 21 genes at an acceptable FDR level of 4.7%. The difference of 40 genes between the two methods is due to an unnecessarily large fudge factor (S0 = 3.4) used in SAM. Indeed, these 40 genes all have d > σ < 1, suggesting that a large value of S0 led SAM to miss some truly differentially expressed genes.
Discussion

In conventional statistical resampling, permutation is a popular approach for estimating a null distribution. However, as seen from our analysis and as indicated in Appendix A, the distribution-free method based on permutations is generally biased for microarray data analysis, because small sample sizes limit the number of distinct permutation samples and because ranking the T-statistics at each permutation does not completely remove the treatment effect contributing to gene-expression variation. The RS approach is developed in this paper to circumvent these problems of SAM. The resulting RAM has the advantage of being insensitive to the treatment effect often present in real data and of giving a better estimate of FDR. Another important advantage of RAM is that it works well for small sample sizes, which is particularly useful for microarray data, where sample sizes are often small. In addition, the RS approach can be easily extended to paired data (see Appendix B).

FDR is often used to control the error rate in the BH procedure [18] and in SAM [16,22]. In practice, for a multiple-test method based on the t-statistic, it is important to obtain an accurate estimate of FDR. In SAM, the FDR estimate is obtained through the permutation approach, in which fluctuations around the expectation occur among permuted samples. These fluctuations are affected by the data themselves, i.e., by sample size, treatment effect, and data noise. The RAM estimator of FDR is based on a two-simulation strategy, so that these effects on the estimate of FDR are avoided. Our simulation results indicate that the RAM estimator of FDR is generally accurate at a given threshold of interest. In an idealized setting where all expression levels are normally distributed, both SAM and RAM work well for identifying differentially expressed genes.
However, when most of the expression levels follow a normal distribution and a small fraction, for example 30 percent of the genes, possibly follow a gamma distribution, SAM performs poorly or even fails to work because of a larger fudge factor S0, whereas RAM continues to perform well. In addition, a small sample size makes it possible to produce sample variances far smaller than 1 in a large-scale gene-expression profile. This situation, as seen in Tusher et al. [16], also produces a larger fudge factor for SAM, but in RAM this fudging impact can be effectively excluded.

Acknowledgments

This research was supported by grants from the U.S. National Institutes of Health R01 NS41466 (MF) and R01 HL69126 (MF), R01 GM50428 (YF) and funds from Yunnan University. We thank the High Performance Computer Center of Yunnan University for computational support and Sara Barton for editorial assistance.

Appendix A

Suppose we have two classes Xk = {xk1,…, xkm} and Yk = {yk1,…, ykn} of m replicates for gene k. A permutation produces two resampling classes Xk′ = {xk1,…,xkm−r, yk1,…,ykr} and Yk′ = {xk1,…,xkr, yk1,…,ykn−r}. From these resampling two-class data, we have two resampling means
Let xkj = μk + τxk + exkj and ykj = μk + τyk + eykj where μk is overall mean (expectation) for expression levels of gene k, τxk and τyk are assumed to be treatment effects contributing to expression variation of gene k, exkj and eykj are expression noises. Thus, these two means can also be expressed as
where r is number of exchanged members between two classes. It is clear that with difference between
if r = m/2, otherwise, d(τk) ≠ 0. In addition, rank of Z-values across all position at each permutation changes the Z-values in position k* in the rank space so that the component dealing with d(τk) in the Z-value in position k* in the rank space, that is,
For no treatment effect, i.e., τxk = τyk = 0 and for small sample size for gene k, ∑ek ≥ 0 or ∑ek ≤ 0, and hence, Equations A1a and A1b are changed to
In the difference between
where d(ek) = exk − eyk and d[ek(r)] = ēyk(r) − ēxk(r). It is clear from equation (A4) that d(εk) ≠ d(ek) if d[ek(r)] ≠ 0. On the other hand, due to ēxk(r) ēxk and ēyk(r) ēyk, d[ek(r)] = ēyk(r) − ēxk(r) is negatively related to d(ek) = ēxk − ēyk, that is, if d(ek) > 0, then d[ek(r)] ≤ 0, or if d(ek) < 0, then d[ek(r)] ≥ 0. Again, ranking the Z-values across all positions leads to
Appendix B For paired data, since two samples of mk observed values (x1k,…, xmkk) and (y1k,…, ymkk) become a sample of mk distant values (d1k,…, dmk k), k =1,…, N, the sample of mk replicates for distances can be also at random cut into two subsamples. Let dik = xik − yik = dk + exik − eyik = dk + eik, i = 1,…, mk where dk is difference between treatment effects on the expression of gene k. We then have
say, ēk in the paired data is equivalent to that in the unpaired data. The null score of the T-statistic is estimated by the Z-value: where σ 2 (dk) is the sample variance of distances between two paired data for gene k. Footnotes Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. References 1. Kerr MK, Martin M, Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol. 2000;7:819–839. [PubMed] 2. Li L, Jiang JW, Li X, Moser KL, Guo Z, Du L, Wang Q, Topol EJ, Wang Q, Rao S. A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset. Genomics. 2005;85:16–23. [PubMed] 3. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;4(4):1157–1182. 4. Xing EP, Jordan MI, Karp RM. Feature selection for high-dimensional genomic microarray data. Machine Learning: Proceedings of the Eighteenth International Conference; San Francisco, Morgan Kaufmann, San Mateo, CA. [2001]. 5. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell. 1997;97:273–324. 6. Tsamardinos I, Aliferis CF. Ninth International Workshop on Artificial Intelligence and Statistics. Key West, FL: 2003. Towards principled feature selection: relevance, filters and wrappers. 7. Wolf L, Shashua A, Geman D. Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. J Mach Learn Res. 2005;6(11):1855–1887. 8. Park PJ, Pagano M, Bonetti M. 
A nonparametric scoring algorithm for identifying informative genes from microarray data. Pac Symp Biocomput. 2001:52–63. [PubMed] 9. Li L, Li X, Guo Z. Efficiency of two filters for feature gene selection. Life Sci Res. 2003;7:372–396. (Chinese). 10. Cui X, Hwang JTG, Qiu J, Blades NJ, Churchill GA. Improved statistical tests for differential gene expression by shrinking variance components. Biostatistics. 2005;6(1):59–75. [PubMed] 11. Pan W, Lin J, Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol. 2002;3(5):Research0022. [PubMed] 12. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:54–70. 13. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800–803. 14. Westfall P, Young S. Resampling-Based Multiple Testing. Wiley; New York: 1993. 15. Turkheimer FE, Smith CB, Schmidt K. Estimation of the number of “true” null hypotheses in multivariate analysis of neuroimaging data. Neuroimage. 2001;3:920–930. [PubMed] 16. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001;98:5116–5121. [PubMed] 17. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. JR Stat Soc Ser B. 1995;57:289–300. 18. Benjamini Y, Drai D, Elmer G, Kafkfi N, Golani I. Controlling the false discovery rate in behavior genetics research. Behav Brain Res. 2001;125:279–284. [PubMed] 19. Benjamini Y, Liu W. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Inference. 1999;82:163–170. 20. Storey JD. A direct approach to false discovery rates. JR Stat Soc Ser B. 2002;64:479–498. 21. Storey JD, Tibshirani R. Statistical significance for genome wide studies. Proc Natl Acad Sci USA. 2003;100:9440–9445. 
[PubMed] 22. Cui X, Churchill GA. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 2003;4:210–219. [PubMed] 23. Tsai C-A, Hsueh H-M, Chen JJ. Estimation of false discovery rates in multiple testing: application to gene microarray data. Biometrics. 2003;59:1071–1081. [PubMed] 24. Pounds S, Cheng C. Improving false discovery rate estimation. Bioinformatics. 2004;20:1737–1745. [PubMed]