![]() | ![]() |
Formats:
|
||||||||||||||||||
Copyright © 2009 Mosley and Keri; licensee BioMed Central Ltd. Intrinsic bias in breast cancer gene expression data sets 1Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, USA 2Division of General Medical Sciences–Oncology, Case Western Reserve University School of Medicine, Cleveland, USA Corresponding author.Jonathan D Mosley: jdm20/at/case.edu; Ruth A Keri: keri/at/case.edu Received February 27, 2009; Accepted June 29, 2009. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background While global breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations that they represent and the data collection methods utilized can be quite disparate. We sought to assess the extent and consequence of these systematic differences with respect to identifying clinically significant prognostic groups. Methods We ascertained how effectively unsupervised clustering employing randomly generated sets of genes could segregate tumors into prognostic groups using four well-characterized breast cancer data sets. Results Using a common set of 5,000 randomly generated lists (70 genes/list), the percentages of clusters with significant differences in metastasis latencies (HR p-value < 0.01) was 62%, 15%, 21% and 0% in the NKI2 (Netherlands Cancer Institute), Wang, TRANSBIG and KJX64/KJ125 data sets, respectively. Among ER positive tumors, the percentages were 38%, 11%, 4% and 0%, respectively. Few random lists were predictive among ER negative tumors in any data set. Clustering was associated with ER status and, after globally adjusting for the effects of ER-α gene expression, the percentages were 25%, 33%, 1% and 0%, respectively. The impact of adjusting for ER status depended on the extent of confounding between ER-α gene expression and markers of proliferation. Conclusion It is highly probable to identify a statistically significant association between a given gene list and prognosis in the NKI2 dataset due to its large sample size and the interrelationship between ER-α expression and markers of proliferation. In most respects, the TRANSBIG data set generated similar outcomes as the NKI2 data set, although its smaller sample size led to fewer statistically significant results. Background Over the past decade, a large number of global gene expression data sets of human breast cancers have become publicly available [1-6]. These data sets have provided a wealth of information for the generation and testing of biological and clinical hypotheses [7]. Clinical and pathological factors with relevance to breast cancer are extensively characterized, and the prognostic significance of these factors is reflected in these publicly available data sets. These factors include tumor grade, Her2 and estrogen receptor (ER) expression [8]. Whether gene expression data contributes additional prognostic information beyond what is offered by these clinical factors is debated [9,10]. Gene expression profiles associated with individual clinical hallmarks have also been described. For instance, a large set of genes is associated with ER gene expression [5,11]. In addition, the molecular basis of grade has also been examined with results showing a strong relationship between histological grade and tumor proliferation [6,12]. Indeed, the consistent prognostic efficacy of a proliferation signature is now well established [13,14]. The strong inter-relationships between clinical features, gene expression patterns and prognosis has led to the postulate that, depending upon the underlying relationship between the clinical and prognostic factors in a given data set, prognostic gene expression signatures may simply function as a proxy measure for these established clinical variables. For example, Gruvberger et. al. showed that a gene expression signature derived from the van't Veer data set [15], for which ER status was a strong predictor of the incidence of metastasis, was not predictive of metastasis in a data set for which this relationship did not exist [16]. This observation led the authors to propose that derivation of future prognostic gene signatures stratify analyses by ER status in order to adjust for the known association between gene expression and ER status. This suggestion has been variably implemented, but is often ignored. Another consequence of the association between prognosis and large sets of correlated genes is that a large number of predictive gene lists can be derived by selecting different members of predictive clusters of correlated genes. This phenomenon can occur even when gene selection adheres to a standardized protocol due to variations such as the specific tumors used in training sets (subsets of tumors used to derive prognostic lists) [10,17]. While breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations that they represent and the data collection and analysis methods can be quite disparate. The advantage of this heterogeneity is that it provides an opportunity to test the robustness of a hypothesis across multiple populations represented in these data sets [18]. However, a potential disadvantage of this heterogeneity is that systematic differences between data sets, unrelated to analytical approaches, may create sources of bias that impact their intrinsic likelihood of confirming a given hypothesis. To gain insight into the extent of potential systematic differences in obtaining statistically significant results across breast cancer gene expression data sets, we ascertained whether there were intrinsic differences in the likelihood of observing an association between the expression of a selected set of genes and metastasis using four well-characterized data sets. To minimize bias in our approach, an unsupervised clustering algorithm was used to segregate tumors into one of two clusters based on the expression levels of randomly selected sets of genes (30–400 genes/set). These clusters were then compared for differences in metastasis latencies. We found that one data set, the Netherlands Cancer Institute (NKI2) set of young women with early stage disease, was considerably more likely to give a significant finding when examining either all tumors or only ER positive tumors, as compared to the other data sets examined. Factors that contributed to an increased likelihood of a random gene list being predictive of metastasis within a given data set were 1) the number of tumors analyzed and 2) the inter-relationships between ER expression, proliferation and metastases. We suggest that these intrinsic differences between the data sets should be considered in the design and analysis of future studies incorporating gene expression data. Methods Previously published microarray data sets Global gene expression and clinical data (including estrogen receptor status and metastasis recurrence latencies) were analyzed in four independent, publicly available breast cancer gene expression data sets. The Netherlands Clinical Institute (NKI2) data set contains data on 295 women with early stage breast cancer (downloaded from http://www.rii.com/publications/default.htm) [4,19]. The Wang data set contains gene expression data on 296 women with lymph node negative disease [1] (GEO series GSE2034). The KJX64 and KJ125 data sets contain data on 189 women, 64 of which were treated with tamoxifen, with primary operable invasive breast cancer (GEO series GSE2990) [6]. The TRANSBIG data set contains data for 183 untreated women from the Bordet Institute (GEO series GSE7390) [20,21]. All probes from each data set were used in the analyses except in the NKI2 data set where only the probes where there was complete data on more than 291 of the 295 subjects were used (n = 24,023 probes). Missing values for genes in the NKI2 set were imputed using the "impute" option in the FastClus procedure. Gene expression data for the NKI2 data set were given as log10 expression ratios, while data from the TRANSBIG and KJX64/KJ125 sets were log2 expression values. Expression values were log2 transformed in the Wang gene expression data set. Each gene probe in each data set was mapped to a unigene cluster ID using the SOURCE database (source.stanford.edu). For each gene expression data set, a new data set containing a single set of expression data for each unique unigene cluster ID was created. In the Wang, Miller and KJX64/KJ125 data sets, the expression values for each probe was set to have a median value of 0 and standard deviation of 1. In instances where there were multiple gene probes with a common unigene cluster ID, the median expression value of all the common probes was used. There were 11,318 genes with unique unigene identifiers that were common to all four data sets. Clustering using random gene lists Random lists of genes of various sizes (30 to 400 genes per list) were generated by simple random sampling (Surveyselect procedure). For each randomly-generated gene list, tumors were separated into two groups using an unsupervised hierarchical clustering procedure that was based on a correlation matrix derived from standardized centroid components for cluster assignment (Varclus procedure). The two groups represent the first bifurcation of the clustering hierarchy. Random gene lists were also generated using subsets of genes that were identified as being associated with overall metastases latencies. The subsets of genes utilized in this analysis were those that had a relatively modest proportional hazards p-value of less than 0.1 in a Cox regression analysis. Survival analyses All survival analyses were based on 5-year metastatic recurrence latencies. All subjects not experiencing metastasis within 5 years were censored at that time point. Cox proportional hazards regression models were used to ascertain differences in latencies between groups assigned by hierarchical clustering (PHReg procedure). Univariate proportional hazard ratios and p-values are reported for all analyses and represent the differences in risk between the relatively "poor" prognosis group as compared to a relatively "good" prognosis group. In univariate analyses examining the association between ER status and latencies, hazard ratios represent the change in risk for ER positive tumors, as compared to ER negative tumors. Multivariable models adjusted for ER-α expression used the ESR1 probe in the NKI2 data set and the "205225_at" gene probe in other data sets. Determining cluster assignment association with ER status For each pair of clusters created by the hierarchical clustering procedure, the percentage of all ER negative and ER positive tumors contained in each cluster was computed. The maximum value of the ratio of the ER negative percentage to the ER positive percentage among the two clusters was then computed. If ER positive and negative tumors were assigned to clusters in equal proportions, this ratio would be approximately 1. Otherwise, this ratio would be greater than 1, indicating that the clusters contained a relatively disproportionate number of ER negative tumors, as compared to ER positive tumors. Global adjustment of gene expression data To eliminate correlations between all genes in the NKI2 data set and either ER-α or proliferation genes, the data were globally adjusted by fitting a least-squares regression line to each probe in the dataset and then computing the residuals (GLM procedure) [14]. Each probe was the dependent variable and either ER-α or a proliferation gene was the continuous independent variable. The residuals (adjusted values) for each probe represent the new expression values for that probe. This new adjusted value is no longer linearly correlated with the independent variable. The gene selected to adjust each data set for proliferation was based on a previously published analysis and represents the gene probe that was the most predictive member of a set of correlated proliferation-associated genes linked with an increased risk of developing metastases [14]. The specific genes used in each data set were UBE2C (NKI2 and TRANSBIG), HMMR (KJX64/KJ125) and RACGAP1 (Wang). Determining the total variance explained by a gene cluster The total variance for a data set was computed by summing up the variances for each gene in the data set. To determine the variance explained by globally adjusting a data set for a gene, the total variance of the adjusted data set was computed and the percent change of this value relative to the unadjusted value was determined. The proportion of the proliferation-associated variance explained by adjusting for ER-α was calculated using the equation:
where Var(EP) is the percent change in variance after sequentially adjusting a data set for ER-α and proliferation and Var(E) and Var(P) are the percent changes in variance after individually adjusting for either the ER-α or proliferation genes, respectively. Statistical packages All calculations were performed using SAS version 9.1 (SAS Institute, Cary, NC). All statistical tests were two-sided, and a p-value less than 0.01 was considered statistically significant. Results Intrinsic bias in gene expression data sets We sought to ascertain whether there may be intrinsic bias among publicly available breast cancer gene expression data sets that would influence the likelihood of observing a significant difference in metastatic tumor recurrence latencies based on gene expression patterns of primary tumors. To address this question, we employed an empirical approach whereby we determined the probability of identifying groups of tumors with statistically significant differences in recurrence latencies using hierarchical clustering that was driven by randomly generated lists of genes. Four previously published data sets were examined. Marked differences in the ability to segregate good and poor prognosis tumors were observed between the data sets using randomly generated gene lists of various sizes (figure (figure1A1A
We next assessed whether the differences among the data sets was the result of differences in the content of the genes represented on the microarrays used in each study. We performed clustering analysis using a set of 5,000 random lists (70 genes/list) that were comprised of genes that were common to all four data sets (figure (figure1B).1B In light of the known association between estrogen receptor (ER) status and disease outcome [22] and the fact that expression of numerous genes is associated with ER status in breast cancer expression data sets [11,16], these findings could be due to cluster assignment dictated by the ERα-correlated gene expression network. Among the four data sets we evaluated, ER status was associated with significant differences in latencies in the NKI2 and TRANSBIG data sets (table 1). In multivariable Cox regression models adjusting for ER-α gene expression, there was a significant attenuation in the number of random gene lists that were predictive in the NKI2 and TRANSBIG data sets (table 1B). In the NKI2 set, there was a 60% reduction in the number of predictive gene lists. However, approximately 25% percent of the random gene lists were still predictive. In contrast, none of the gene lists in the TRANSBIG gene set were predictive after ER-α adjustment, suggesting that none of the significant gene clusters were measuring significantly more information than ER-α gene expression. The Wang data set was relatively unaffected by ER-α adjustment.
Gruvberger et. al. has suggested that prognostic gene lists be tested independently on ER positive and negative tumors in order to control for the effects of ER-α gene expression on tumor assignment [16]. We re-examined our prognostic lists separately in ER positive and negative tumors. While none of the lists were predictive amongst ER negative tumors, the NKI2 data set continued to show a significantly higher likelihood of giving a positive finding, as compared to the other data sets (figure (figure1C).1C In summary, these results indicate that there is a large disparity in the likelihood of observing significant differences in metastasis risk among breast cancer data sets using arbitrary gene expression levels as a classifier. In particular, it was 4–5 times more likely to obtain a significantly predictive gene list in the NKI2 data set than the other data sets, even in stratified analyses examining ER positive tumors. Cluster assignment is associated with ER status The fact that multivariable adjustment for ER-α decreased the number of predictive gene lists in the NKI2 and TRANSBIG data sets suggests that the hierarchical clustering algorithm is functioning in large part by stratifying tumors into groups based on ER status. To confirm this, we examined the association between ER status and cluster assignment across the data sets. Specifically, we determined if there were disproportionate proportions of ER negative, as compared to ER positive, tumors in the different prognostic groups that were produced by the 5,000 random gene lists (70 genes/list) (figure (figure2A).2A
While clustering is associated with ER status, it is not clear whether this association is due to the specific contribution of ER-α correlated genes, as has been suggested [16], or whether other gene clusters associated with ER status may be driving this association. To directly ascertain the contribution of ER-α correlated genes to cluster assignment, the gene expression data were globally adjusted to eliminate the correlations between all genes in a data set and the ER-α gene [14,23]. To accomplish this, we fit a least-squares regression line to each probe in the dataset, using ER-α as the independent variable, and then computed the residual. This residual represented the new expression value for that probe. After adjustment, the tumors were reclustered using the previously generated 5,000 lists of 70-gene lists and reanalyzed. Global adjustment eliminated the propensity for clusters to be associated with ER status in all data sets except the KJX64/KJ125 data set, in which there was only a modest attenuation (figure (figure2B).2B In summary, these data show that ER-α correlated genes can serve to either increase or decrease the likelihood that a random set of genes will cluster tumors into groups with significant differences in latencies. The differences in the direction of the effect are related to the underlying association between ER status and the risk of metastasis in each data set. Proliferation-correlated genes eliminate the prognostic contributions of ER-α correlated genes It has been previously shown that genes associated with cellular proliferation are strong predictors of metastases [13,14,24-26]. To ascertain the impact of adjusting for proliferation-correlated genes on cluster assignment and outcome, each data set was globally adjusted for proliferation. Adjustment almost completely eliminated the prognostic abilities of all of the random gene lists in the NKI2, Wang and TRANSBIG data sets (figure (figure2C),2C The relationship between ER-α and proliferation varies across data sets While global adjustment for proliferation eliminated the predictive capacity of the random gene lists in the NKI2, Wang and TRANSBIG data sets, adjustment for ER-α had differential effects across these data sets. To better characterize the relationship between genes that are correlated with ER-α or proliferation and cluster assignment, we characterized the inter-relationships between these gene clusters. We first examined the proportion of the total variance in each data set accounted for by adjustment for either ER-α or proliferation correlated genes (figure (figure3a).3a
It is important to note that the variation explained by the ER-α and proliferation-correlated genes is not independent. Thus, in contrast to the above similarities between the data sets, the proportion of the proliferation-associated variance that is explained by ER-α differs substantially (figure (figure3B).3B Proliferation-correlated genes do not explain disparities in class prediction among the data sets for ER positive tumors To gain more insight into possible differences between the NKI2, Wang and TRANSBIG data sets, we next examined the ER positive tumors. Amongst ER positive tumors, a random gene list was substantially more likely to give a significant result in the NKI2 data set, as compared to the other data sets. Among ER positive tumors, global adjustment for proliferation also eliminated the prognostic abilities of all of the random gene lists in the NKI2, Wang and TRANSBIG data sets (figure (figure4A).4A
A larger sample size may account for differences between the NKI2 and TRANSBIG data sets Another explanation for the disparities in the probability of identifying significantly different classes of tumors with random gene classifiers between the data sets, especially the NKI2 and TRANSBIG data set which share many commonalities, is differences in the sample sizes (table 1). The NKI2 and Wang data sets are considerably larger than the TRANSBIG data set. A larger sample size typically gives smaller standard errors of measurement and, consequently, smaller p-values. Thus, rather than using a p-value as an outcome, we examine hazard ratio (HR) coefficients, which show the magnitude of the differences in risk between cluster pairs, independent of sample size. When looking at all tumors in the NKI2, Wang and TRANSBIG data sets the median size and range of the HRs for the NKI2 and TRANSBIG data sets were approximately comparable (median of 2.0 vs. 1.9, respectively) (figure (figure5A).5A
Discussion In the present study, we show that there are marked differences in the likelihood of observing a positive association between the expression patterns of a set of genes and metastasis across four breast cancer global gene expression data sets. Most striking was the observation that over 60% of randomly selected lists of 70 genes each could segregate tumors into two groups with significant differences in outcome in the NKI2 data set. This finding is particularly salient in light of the fact that this data set has been frequently used to both generate and verify prognostic lists of genes [4,6,19,27-29]. The ability of the majority of random gene lists to predict outcome calls into question the biological validity of obtaining a positive finding in this data set, as a vast number of combinations of likely biologically unrelated genes are predictive of metastasis. Gruvberger et. al., noting a strong association between ER status and outcome in the van't Veer data set [15], suggested that analyses of prognostic gene lists be stratified by ER status to ascertain whether the lists were simply functioning as proxies of ER-α gene expression [16]. ER status was associated with the risk of metastases in two of the data sets that we analyzed, the NKI2 (a partial superset of the van't Veer data set) and TRANSBIG data sets. In three of the four data sets, we found that tumors tended to cluster by ER status. This was directly attributable to genes correlated with ER-α expression, as elimination of the variance in gene expression associated with ER-α expression completely attenuated disproportionate clustering by ER status. Adjusting for the expression of ER-α decreased the proportion of predictive gene lists in the NKI2 and TRANSBIG data sets, as would be expected. It also substantially increased the proportion of predictive gene lists from approximately 15% to 35% in the Wang data set where ER status is not an independent predictor of metastasis. Thus, a gene list which gives disparate results in the Wang versus NKI2 or TRANSBIG data sets may, indeed, be functioning as a proxy for ER expression. While ER-α expression may be an important contributor to cluster assignment, it is not the primary determinant for obtaining a significant prognostic gene list. When each of the data sets was globally adjusted for proliferation, there were almost no significantly predictive gene lists, even though tumors tended to cluster by ER status. Genes associated with proliferation have been shown to be the essential contributors to an effective prognostic gene list in gene expression data sets of combined ER positive and negative tumors or just ER positive tumors [13,14,26]. We show that the extent to which adjusting for ER expression impacts a prognostic gene list is related to the extent to which the expression the ER-α gene and proliferation genes covary. Hence, ER status is predictive of prognosis to the extent to which it functions as a confounder to the relationship between the expression of proliferation genes and the risk of metastasis. Stratifying our analyses by ER status had differential effects on the data sets. In the NKI2 data set, almost 40% of gene lists were predictive in ER positive tumors, while less than 5% were predictive in the TRANSBIG data set. This result is interesting since both data sets had a similar proportion of events occurring among ER positive and negative tumors and ER status was associated with outcome in both data sets. When we examined the distribution of the hazard ratio estimates, which show the magnitude of the differences in rates of metastases of the tumors in the cluster pairs, the hazard ratio estimates tended to be larger in the TRANSBIG data set. This analysis suggests that clusters generated with the random gene lists tend to have larger differences in the metastasis latencies in the TRANSBIG data. Thus, the most likely explanation for the difference in the proportion of significant findings between these data sets is the larger sample size of the NKI2 data set. Other than sample size variations, these data sets were similar in many regards. The similarities may reflect commonalities in their patient populations, both of which were derived from young women in northern European countries. In contrast to the ER positive tumors, virtually none of the random gene lists was prognostic in ER negative tumors in stratified analyses. This observation may suggest that, among these data sets, there is not a large, dominant cluster of correlated genes associated with prognosis among these tumors. Another possible explanation is that there is more heterogeneity among ER negative tumors, which can be comprised on Her2 amplified tumors, basal as well as familial forms of breast cancer [30-32]. We would note, however, that gene-expression based prognostic signatures for ER negative tumors have been described [33,34], suggesting that robust common prognostic features likely exist amongst these tumors. We found that, in general, random gene lists were less predictive of recurrence in the Wang data set. In analyses of all tumors, this effect could be partially attributed to the lack of an association between ER status and outcome in this data set, as described above. The random gene lists were also far less predictive among ER positive tumors in this data set, as compared to the NKI2 data set. This difference cannot be attributed to sample size differences as these data sets are approximately the same size. Like the TRANSBIG data set, the Wang data set is comprised of lymph node negative patients. While this difference could account for disparities between the Wang and NKI2 data sets (which utilized tumors from patients with node positive and negative disease), it would not account for differences between it and the TRANSBIG data set. Thus, the relevance of the difference in tumor composition with respect to this clinical variable is not clear. Another notable difference between the data sets is the older age representation among participants in the Wang study which could contribute additional heterogeneity amongst these tumors. It is also interesting to note that random gene lists containing genes that were independently predictive of metastases were only predictive approximately 55% percent of the time in this data set, versus 90% or more of the time in the other data sets (see Additional File 1). This might suggest that there are more independent clusters of genes predictive of prognosis in this data set. Consistent with this hypothesis, we have previously found a higher degree of correlation among independently predictive genes in NKI2 data set versus the Wang data set [14]. Michiels et. al. showed in the van't Veer data set that the set of genes selected for a prognostic classifier is highly variable depending upon the training set used to identify prognostic genes [35]. Based on our findings, we would anticipate an even greater degree of variability in the genes identified if their analysis was repeated using the Wang data set. The KJX64/KJ125 data set produced considerably different results than the other data sets. Virtually none of the random gene lists were prognostic in this data set. It is important to note that it is possible to generate a large set of prognostic genes lists in this data set, as we found that 97% of random gene list derived from genes independently associated with metastasis were indeed predictive (see Additional File 1). In addition, this data set has also been used to generate other validated gene lists [6]. We found anomalies in this data set that may account for differences in its behavior, as compared to the other data sets, such as the fact that the ER-α gene probe levels were poorly correlated with ER status. This might suggest that there are issues with scaling of the raw data or that there are a large proportion of gene probes which did not perform optimally. However, regardless of how we rescaled the data, we obtained nearly identical results. Furthermore, this data set was derived using the same gene expression technology (Affymetrix) as the TRANSBIG and Wang data sets. Hence, differences in behavior cannot be attributed to differences in technology. Conclusion In summary, these analyses demonstrate that there are systematic differences among the four breast cancer gene expression data sets examined herein. Most notable is the fact that the NKI2 data set provides a very high likelihood of obtaining a significant association between expression and metastases for a given set of genes. This disparity persists when examining all tumors in the data set or when looking only at ER positive tumors. The TRANSBIG data set behaved similarly to the NKI2 data set and the lower likelihood of obtaining a positive result in this data set is likely attributable to its smaller sample size. Fewer random gene lists are predictive in the Wang data set due to the lack of an association between ER status and the risk of metastasis. These differences in data sets should be considered when using them for developing new prognostic classifiers or assessing the robustness of classifiers that were generated with other data sets. Competing interests The authors declare that they have no competing interests. Authors' contributions JDM and RAK contributed equally to the design and implementation of this study. All statistical analyses were performed by JDM. All authors read and approved the final manuscript. Pre-publication history The pre-publication history for this paper can be accessed here: Acknowledgements This work was supported by grants from the National Institutes of Health [RO1-CA090398 (to RAK) and training grant T32-GM07250 (MSTP/CASE) (to JDM)] and an Ohio Innovation Incentive Fellowship (to JDM). References
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||
Lancet. 2005 Feb 19-25; 365(9460):671-9.
[Lancet. 2005]J Natl Cancer Inst. 2006 Feb 15; 98(4):262-72.
[J Natl Cancer Inst. 2006]Ann Oncol. 2006 Sep; 17 Suppl 10():x259-62.
[Ann Oncol. 2006]J Clin Oncol. 2007 Nov 20; 25(33):5287-312.
[J Clin Oncol. 2007]Eur J Cancer. 2004 Aug; 40(12):1837-41.
[Eur J Cancer. 2004]Nature. 2002 Jan 31; 415(6871):530-6.
[Nature. 2002]Breast Cancer Res. 2003; 5(1):23-6.
[Breast Cancer Res. 2003]Eur J Cancer. 2007 Mar; 43(4):745-51.
[Eur J Cancer. 2007]Bioinformatics. 2005 Jan 15; 21(2):171-8.
[Bioinformatics. 2005]BMC Genomics. 2007 Sep 20; 8():331.
[BMC Genomics. 2007]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]N Engl J Med. 2006 Aug 10; 355(6):560-9.
[N Engl J Med. 2006]Lancet. 2005 Feb 19-25; 365(9460):671-9.
[Lancet. 2005]J Natl Cancer Inst. 2006 Feb 15; 98(4):262-72.
[J Natl Cancer Inst. 2006]Clin Cancer Res. 2007 Jun 1; 13(11):3207-14.
[Clin Cancer Res. 2007]BMC Med Genomics. 2008 Apr 25; 1():11.
[BMC Med Genomics. 2008]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]J Natl Cancer Inst. 2006 Feb 15; 98(4):262-72.
[J Natl Cancer Inst. 2006]Lancet. 2005 Feb 19-25; 365(9460):671-9.
[Lancet. 2005]Clin Cancer Res. 2007 Jun 1; 13(11):3207-14.
[Clin Cancer Res. 2007]Cancer. 1977 Jun; 39(6 Suppl):2934-47.
[Cancer. 1977]Cancer Res. 2001 Aug 15; 61(16):5979-84.
[Cancer Res. 2001]Breast Cancer Res. 2003; 5(1):23-6.
[Breast Cancer Res. 2003]Breast Cancer Res. 2003; 5(1):23-6.
[Breast Cancer Res. 2003]Breast Cancer Res. 2003; 5(1):23-6.
[Breast Cancer Res. 2003]BMC Med Genomics. 2008 Apr 25; 1():11.
[BMC Med Genomics. 2008]Bioinformatics. 2007 Aug 15; 23(16):2080-7.
[Bioinformatics. 2007]Cell Cycle. 2006 Oct; 5(19):2198-202.
[Cell Cycle. 2006]BMC Med Genomics. 2008 Apr 25; 1():11.
[BMC Med Genomics. 2008]BMC Cancer. 2007 Sep 25; 7():182.
[BMC Cancer. 2007]Cancer Res. 2005 May 15; 65(10):4059-66.
[Cancer Res. 2005]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]J Natl Cancer Inst. 2006 Feb 15; 98(4):262-72.
[J Natl Cancer Inst. 2006]N Engl J Med. 2006 Aug 10; 355(6):560-9.
[N Engl J Med. 2006]Proc Natl Acad Sci U S A. 2005 Mar 8; 102(10):3738-43.
[Proc Natl Acad Sci U S A. 2005]Genome Biol. 2006; 7(10):R101.
[Genome Biol. 2006]Nature. 2002 Jan 31; 415(6871):530-6.
[Nature. 2002]Breast Cancer Res. 2003; 5(1):23-6.
[Breast Cancer Res. 2003]Cell Cycle. 2006 Oct; 5(19):2198-202.
[Cell Cycle. 2006]BMC Med Genomics. 2008 Apr 25; 1():11.
[BMC Med Genomics. 2008]Cancer Res. 2005 May 15; 65(10):4059-66.
[Cancer Res. 2005]Cancer Causes Control. 1999 Oct; 10(5):431-7.
[Cancer Causes Control. 1999]Science. 1987 Jan 9; 235(4785):177-82.
[Science. 1987]Breast Cancer Res. 2008; 10(4):R73.
[Breast Cancer Res. 2008]Genome Biol. 2007; 8(8):R157.
[Genome Biol. 2007]BMC Med Genomics. 2008 Apr 25; 1():11.
[BMC Med Genomics. 2008]Lancet. 2005 Feb 5-11; 365(9458):488-92.
[Lancet. 2005]J Natl Cancer Inst. 2006 Feb 15; 98(4):262-72.
[J Natl Cancer Inst. 2006]