![]() | ![]() |
Formats:
|
||||||||||||||||||||||||
Copyright van Vliet et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Module-Based Outcome Prediction Using Breast Cancer Compendia 1Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands 2Bioinformatics and Statistics Group, Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands 3Mouse Models for Breast Cancer, Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands Xiaolin Wu, Academic Editor National Cancer Institute at Frederick, United States of America * To whom correspondence should be addressed. E-mail: M.H.vanVliet/at/TUDelft.nl Conceived and designed the experiments: LW CK Mv MR. Performed the experiments: Mv. Analyzed the data: Mv. Wrote the paper: LW Mv MR. Received July 14, 2007; Accepted September 28, 2007. This article has been cited by other articles in PMC.Abstract Background The availability of large collections of microarray datasets (compendia), or knowledge about grouping of genes into pathways (gene sets), is typically not exploited when training predictors of disease outcome. These can be useful since a compendium increases the number of samples, while gene sets reduce the size of the feature space. This should be favorable from a machine learning perspective and result in more robust predictors. Methodology We extracted modules of regulated genes from gene sets, and compendia. Through supervised analysis, we constructed predictors which employ modules predictive of breast cancer outcome. To validate these predictors we applied them to independent data, from the same institution (intra-dataset), and other institutions (inter-dataset). Conclusions We show that modules derived from single breast cancer datasets achieve better performance on the validation data compared to gene-based predictors. We also show that there is a trend in compendium specificity and predictive performance: modules derived from a single breast cancer dataset, and a breast cancer specific compendium perform better compared to those derived from a human cancer compendium. Additionally, the module-based predictor provides a much richer insight into the underlying biology. Frequently selected gene sets are associated with processes such as cell cycle, E2F regulation, DNA damage response, proteasome and glycolysis. We analyzed two modules related to cell cycle, and the OCT1 transcription factor, respectively. On an individual basis, these modules provide a significant separation in survival subgroups on the training and independent validation data. Introduction Unraveling the structure of complex biological processes from genomic data sources has been a focal point in bioinformatics research. Thus far, supervised analysis of microarray data has been performed in a data-driven fashion [1]–[4]. These studies have reported and tested prognostic markers, sets of genes, which are predictive of treatment response and outcome. One of the main issues in data-driven approaches is the small ratio of samples relative to the number of genes for a particular study, causing small sample size related problems. This problem can be addressed by reducing the number of features (input variables) or increasing the number of samples. The latter approach was pursued by combining two or even more datasets and then deriving prognostic markers from the resulting dataset [5]–[8]. Employing more samples results in, for instance, better estimates of gene variances and improves estimates of the t-statistic [9]. This approach was also followed by Segal et al. [10] and Tanay et al. [11] who constructed microarray gene expression compendia (collections of microarray data sets spanning a diversity of phenotypes). The supervised analyses performed on compendia are data-driven and currently still employing single genes as input features. As an alternative, knowledge of functional groupings of genes into, for example pathways, can be employed to define meta-features, called modules. Such meta-features have two important advantages. Firstly, a relevant module can be directly linked to the biological processes that underly the observed outcome. Secondly, moving from a gene-based to a module-based representation reduces the number of input variables, which alleviates the small sample size problem. Segal et al. [10] proposed a framework for the unsupervised knowledge-driven analysis of expression data. Within this framework, modules are extracted based on relevant gene sets from a compendium of microarray data. We follow that approach, and extend the framework to include a supervised classification analysis based on the extracted modules and the available clinical data. In addition, we introduce cancer-specific compendia, as an intermediate step between a single dataset and a complete human cancer compendium. Employing the supervised framework, we evaluate the predictive performance of classifiers derived from cancer-specific datasets, a cancer specific compendium, and a human cancer compendium. In addition, we wanted to investigate the capacity of these classifiers to generalize beyond the dataset on which they were trained. Therefore, we set up an experiment in which we validated our classifier on independent data from the same institution (intra-dataset validation), a combination of institutions (cross-dataset validation), and by validating on data from different institutions (inter-dataset validation). Finally, since we adopted the module extraction of Segal et al. [10], the optimized set of modules that is selected by the supervised analysis allows for a more transparent analysis of the obtained results. That is, the modules can be related to the original gene sets, and thus, to cellular processes, giving more insight into the mechanisms causing the outcome differences. Methods Our method extends the unsupervised knowledge-driven framework proposed by Segal et al. [10] to the supervised classification domain. This extension allows the identification of module-based prognostic markers, rather than gene-based markers. The entire outline of our methodology is presented in Figure 1
Input: Compendium The usual approach is to analyse a single microarray dataset in isolation. To find cancer related modules, Segal et al. [10] proposed to take multiple datasets into account that are all related to human cancer types (Figure 1
In our analyses we have also used the HCC that was published by Segal et al. [10]. This compendium contains data from various cancer types and has a total of 1973 arrays, for which 14143 genes are present. The compendium already contained data from three previous breast cancer studies, in total 152 arrays: 26 arrays from the first study by Perou et al. [12], 41 arrays from the second study by Perou et al. [13], and 85 arrays from Sorlie et al. [14]. The additional breast cancer microarray datasets that we have used, originate from previously published research [2]–[4]. The Vijver dataset consists of 295 breast cancer patients, the Wang dataset consists of 285 records, and the Sorlie data consists of 167 records. To be able to use these datasets in conjunction with the HCC, we mapped all the available probes to the same set of Entrez ids in the HCC. Furthermore, after mean-normalisation, we discretized each dataset separately into three levels: induced (1), basal (0), and repressed (−1), taking into account the skewing and variance in each of the datasets (Discretization was applied, because the module extraction procedure that Segal et al. [10] proposed, requires a discrete input.). Outcome data (time to metastasis) was available for all patients in the Vijver and Wang datasets. For classification, the poor outcome group was defined as all patients with time to metastasis less than five years, and the good outcome group as those with time to metastasis greater than or equal to five years. Censored patients in the poor group were not taken into account when training and assessing a classifier. On the other hand, censored patients in the good group were included in both the training and validation [15]. Input: Gene Set Collection We collected 2682 gene sets from several biological databases and resources (Figure 1
Unsupervised Analysis To extract modules from compendia of microarrays, we largely followed the knowledge-driven approach proposed by Segal et al. [10] (Figure 1 Following the extraction of the modules, a module activity matrix is constructed for the training data as well as the validation data (Figure 1 The conversion from gene expression to module activity is done per array, per module. For the induced and repressed genes separately, we test whether the overlap of induced or repressed genes on the array with the module is significant, compared to a random draw. To this end, we use the hypergeometric distribution to calculate a p-value for the significance of the overlap. Following FDR correction [16] (significance threshold = 0.05) one of the following four situations occur:
Figure 4
Supervised Analysis Supervised classification provides a means to identify modules with activities that are significantly associated with some relevant outcome variable, such as, metastasis-free survival in breast cancer (Figure 1 In our experiment, we focused on differences between the sets of modules and we omitted an extensive evaluation of a range of different classifier types. Since all features are discrete, we used forward filtering as module selector, the mutual information as criterion to evaluate the individual modules (using maximally 200 modules), and a simple Bayes classifier [18]. For the discretized gene-expression data we used the same setup as for the module-based approach. Following the Train/Test procedure, we trained a final classifier (Figure 1 To assess the performance of the classifiers on the independent validation dataset, we constructed an ROC curve (Figure 1 Finally, a feedback step relates the modules selected in the classifier to the original gene sets (Figure 1 Experimental Setup We wanted to investigate the capacity of our classifiers to generalize beyond the dataset that they were trained on. Therefore, we designed our experiments such that three different validation schemes were possible. In all cases the training and validation sets are non-overlapping (independent), i.e. no samples that were used during module extraction/training are employed in the validation. The following three scenarios were considered: training and validation on data from 1) the same institution (denoted as intra-lab validation); 2) a combination of the same and other institutions (cross-lab validation); and 3) separate institutions (inter-lab validation). Since we had equivalent outcome data for the Vijver and Wang datasets, we mirrored the role of both so that we ended up with a total of six experiments, as shown in Table 1. Results and Discussion Extracting modules from the compendia For each of the compendia we derived a set of modules using the discretized gene expression data as well as the 2682 gene sets as input. The number of modules that were found are listed in Table 1. The number of modules found ranged from 576 to 1163, which is a significant reduction in the number of features from the original 14143 genes. Additionally, we included the previously published 456 modules [10] in the current investigation (S456). These differ from the HCC modules, as they are constructed based on gene sets derived by hierarchical clustering. Classification performances The classification performances of the experiments listed in Table 1, are compared based on the AUCs obtained on the validation data. For each of the six experiments (Intra1, Cross1, Inter1, Intra2, Cross2, Inter2) the results obtained with each feature type (BC, BCC, HCC, S456, Genes) were ranked based on the AUC. Figure 5
To assess the statistical significance of the observed differences between the median ranks, we applied a one-sided Wilcoxon rank sum test to all pair-wise combinations of feature types. The obtained p-values are depicted in the left panel in Figure 6
From the left panel in Figure 6 From the right panel in Figure 6 = 0.006). This indicates that a human cancer compendium lacks specificity with respect to a breast cancer compendium. We can therefore conclude that for breast cancer specific prediction, a cancer specific compendium is more suitable compared to a more global human cancer compendium. As shown by Segal et al.[10] the HCC and S456 modules may still be relevant for identifying differences between cancer types.The pairwise comparisons (left panel in Figure 6 For a given classifier to serve as a prognostic index in clinical practice, a suitable operating point on its associated ROC curve needs to be selected. For outcome prediction in breast cancer, the True Positive Rate (TPR) is typically set at a desired threshold, and based on the ROC curve, the corresponding False Positive Rate (FPR) possible is determined. Therefore, we have re-evaluated the AUC scores by integrating over the TPR interval ranging from 0.5 to 1. This interval was chosen since it reflects a clinically more relevant range than the complete TPR range ([0,1]). All results are reported in the supplementary information (Table S2, Figure S2 and Figure S3). Consistent with earlier results, the BC modules perform significantly better compared to all other feature types. In addition, the BCC modules now have a significantly lower median rank compared to the HCC and S456 modules (p = 0.05, and p = 0.01). This strengthens our conclusion that the BC modules perform better than the other feature types, and that a breast cancer specific compendium performs better compared to a human cancer compendium, especially when considering a clinically relevant setting.Interpretability of Gene and Module-based signatures All classifiers output a signature of relevant features, that is predictive for survival. To better understand the biological processes associated with disease outcome in breast cancer, the signatures are further investigated. For gene-based features, the overlap with known pathways is employed to attach biological meaning to an obtained signature. Module-based classifiers, on the other hand, return a set of predictive modules rather than single genes. Each of these individual modules may provide links to relevant biological processes. Moreover, since modules are extracted from the data by combining (parts of) gene sets, these may link additional genes to known pathways, which are relevant to disease progression in breast cancer. To explore the association of the signatures to biological processes, we analyzed a gene-based and BC module-based signature. We chose to compare the gene-based signature and BC module-based signature from Inter1 (Table 1), since they were derived from the same dataset. These signatures consist of 21 genes, and 55 modules (Dataset S1), respectively. For every module in the module-based signature, as well as the 21 single-gene signature, we computed the enrichment for each of the 2682 gene sets employing the hypergeometric distribution. For the gene-based signature no gene sets were significantly enriched (p<0.05 after Bonferroni correction), whereas 319 gene sets were significantly enriched in at least 1 of the 55 modules in the module-based signature (p<0.05 after Bonferroni correction). The complete matrices of raw p-values are depicted in supplementary information (Text S1, and Figure S4). Many of the 319 gene sets are associated with similar biological processes within the context of the 55 module signature (i.e. they have similar enrichment profiles across the 55 modules). Therefore, we clustered the gene sets based on enrichment scores into seven distinct clusters employing complete linkage, hierarchical clustering with Euclidean distance as dissimilarity measure. The common biological themes associated with the gene sets in each of the resulting seven clusters are listed on the left in Figure 7
Based on the complete table of enrichment p-values (Supplementary Figure S4), it is evident that there are clusters of modules that also show a similar enrichment pattern across the gene sets. Therefore, we clustered the modules into seven distinct groups, as depicted by the dendrogram at the top in supplementary Figure S4. These clusters were labeled Module groups 1 to 7, as indicated at the bottom in Figure 7 The main table in Figure 7 Figure 7 A detailed analysis of two modules Based on the module activity representation, the arrays in a dataset can be separated into 3 groups: arrays where the module is activated, repressed or showing basal activity. Using this separation, we present the Kaplan-Meier curves for two modules from the module-based signature, on the training data [4], as well as on the independent validation data [3], see Figures 8
Figure 8 = 0.0127). Deregulation of the cell cycle has been identified as one of the hallmarks of cancer [19]. More importantly, an increased activity of the cell cycle has been linked to more aggressive tumors. This is in accordance with our observation for this module, which shows that an induced module activity is linked to the subgroup with the worst outcome. Conversely, a repressed module activity shows the best outcome.Figure 9 = 0.0098). In breast cancer, the OCT1 transcription factor is known to be often overexpressed [21] relative to normal breast tissue, but its exact role in the tumorigenic process has remained unclear. Additionally, OCT1 has been identified as a transcriptional repressor [22]. We show that the concerted repression of downstream targets of the OCT1 transcription factor relates to a poor outcome group. On the other hand, an induced module activity relates to a subgroup with significantly better outcome. Thus, this module can be identified as a potential tumor suppressor module.Conclusion By extending an existing unsupervised knowledge-driven framework to the supervised classification domain, we were able to investigate the effects of including knowledge from previous gene expression studies (through compendia) as well as known cellular processes (through gene sets) on the accuracy of outcome prediction in breast cancer. Our analysis included a validation of the classifiers on independent data, which allowed for an objective evaluation of the actual generalization behavior of the gene-based and module-based classifiers in a clinically relevant setting. Classifiers based on genes had a very large variance, compared to the BC module-based classifier. We hypothesize that the conversion of gene expression data to module activity functions as a regularization step, where the extent of the regularization is controlled by the specificity of the modules, resulting in more stable classifiers. Overall, a trend emerges in the performance versus compendium specificity. Modules from the most specific single dataset showed the best performance-significantly outperforming all other classifiers. These were closely followed by modules extracted from a breast cancer specific compendium, which performed significantly better than modules from the human cancer compendium when evaluated across a clinically relevant TPR range. Finally, modules from the human cancer compendium showed the weakest performance. This indicates that it is preferable to employ a compendium specific to the cancer type under study. Moreover, the heterogeneity between different institutions tends to be more detrimental than the gain in sample size when a breast cancer specific compendium is constructed. A module-based approach to classification provides a signature of predictive modules, as opposed to a gene-based signature. Interpretation of a gene-based signature is usually limited to a mapping of the genes in the signature to functional categories. However, for the approach outlined here, it holds that the modules were constructed from biologically meaningful gene sets, and therefore these can be linked directly to the underlying biological processes. We illustrated this advantage by providing a meta-representation of the modules in one of the module-based classifiers, which reveals molecular processes, such as cell cycle, DNA damage, glycolysis, and proteasome, known to be involved in breast cancer. The gene-based signature provided no significant links at all. This gain in biological insight greatly favors the use of a module-based classifier. Our research includes an in-depth analysis of two modules that were part of the module-based signature, which were related to cell cycle, and the OCT1 transciption factor. By themselves, these modules provide a significant separation in subgroups on the training and independent validation data. The cell cycle related module indicated that an induced module activity is linked to the worst outcome. This confirms the well known relationship between the cell cycle process and cancer in general. On the other hand, the OCT1 related module revealed a novel relationship to breast cancer outcome. Based on its module activity, this module could be designated as a tumor repressor module. Neither of these factors could be revealed from the gene-based signature. Therefore, we conclude that module-based signatures provide a much richer insight to the underlying biology compared to gene-based signatures. Research on outcome prediction not only contributes to the development of reliable diagnostic tests, but also by improving our understanding of the processes involved in carcinogenesis, and specifically how these influence disease progression and therapy response. From a practical perspective, diagnostic tests based on small gene sets are preferred, and are also designed with this objective in mind. However, such sets often fail to provide significant biological insight into the disease. Our module-based classifiers were not designed to employ a minimal number of genes, and the large number of genes employed could be a limitation to the direct application of these classifiers in a clinical setting. However, in our study, the module-based classifier had a significantly lower variance in performance than the gene based classifier, a property which is clearly preferable in the clinical setting. We clearly demonstrate that the module-based gene sets provide a much richer feedback by revealing functional categories associated with disease outcome. These insights could speed up the development of anti-cancer drugs, since the identified processes will help focus the search for viable drug targets. In conclusion, while module-based classifiers are perhaps less practical for clinical use due to the large gene sets being employed, their robustness and the biological insights they provide will most likely result in both short and long term clinical benefit. Text S1 Module-based outcome prediction using breast cancer compendia. (0.11 MB DOC) Click here for additional data file.(107K, doc) Figure S1 Methodology overview. Overview of the unsupervised module extraction procedure, followed by a supervised investigation of the relation between module expression and conditions. In this example no FDR correction was done, so as to retain a fair amount of significantly expressed gene sets/modules. (1.16 MB EPS) Click here for additional data file.(1.1M, eps) Figure S2 Boxplot showing ranked AAC results. In each of the six experiments the features were ranked based on the AAC (TPR range from 0.5 to 1) obtained on the independent validation set (1 best, 5 worst). This boxplot shows the median rank along with the quartile ranges for each of the five features. (0.59 MB EPS) Click here for additional data file.(574K, eps) Figure S3 Comparison of ranked AAC results. Two tables showing a pairwise comparison of the five feature types. Each cell (row = i, column = j) depicts the the p-value obtained by performing a one-sided Wilcoxon rank sum test with as null hypothesis that the median rank of type i is lower than type j, based on the AACs (TPR range from 0.5 to 1) achieved for each of the six experiments. The plot on the left shows individual comparisons, the plot on the right includes comparisons of groups of features. Cell-shading reflects the p-values.(0.81 MB EPS) Click here for additional data file.(787K, eps) Figure S4 Comparison of a module-based signature (A) and a gene-based signature (B). The module-based signature from the Inter1 experiment contains 55 modules, and the gene-based signature contains 21 genes (Table 1). For both signatures an enrichment score for their overlap with the collection of 2682 gene sets was calculated based on the hypergeometric distribution. This resulted in a total of 319 gene sets that were enriched in at least one module or in the gene-based signature (P<0.05 after Bonferroni correction). Several modules turned out to have a similar pattern of enrichment across the gene sets. Additionally, gene sets that relate to a common theme turned out to have a similar enrichment pattern across the modules. Therefore, we clustered the matrix of p-values in both dimensions (2-dimensional, hierarchical clustering, complete linkage, Euclidean distance). The dendrograms at the top, and to the left indicate the clustering, where we chose to group either dimension into seven distinct groups. The labels on the right indicate the individual gene set labels, and the label on the bottom indicates the groups of modules formed along with the number of modules in each group in brackets. The main table shows the median p-value for the enrichment of each of the seven clusters of modules, across these seven groups of gene sets. Similarly, the table on the right shows the median p-values for the gene signature. Shading of the cells reflects the p-values. (4.12 MB EPS) Click here for additional data file.(3.9M, eps) Dataset S1 (5.69 MB XLS) Click here for additional data file.(5.4M, xls) Table S1 (0.03 MB DOC) Click here for additional data file.(31K, doc) Table S2 (0.03 MB DOC) Click here for additional data file.(31K, doc) Acknowledgments We thank L.J. van 't Veer, and D.S.A Nuyten for their valuable discussions on the work presented. Footnotes Competing Interests: The authors have declared that no competing interests exist. Funding: This work was part of the BioRange programme of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI). References 1. Van 't Veer L, Dai H, van de Vijver M, He Y, Hart A, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–6. [PubMed] 2. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. PNAS. 2003;14:8418–23. [PubMed] 3. Wang Y, Klein J, Zhang Y, Sieuwerts A, Look M, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. The Lancet. 2005;365:671–9. 4. Van de Vijver M, He Y, van 't Veer L, Dai H, Hart A, et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 2002;25:1999–2009. [PubMed] 5. Rhodes D, Yu J, Shanker K, Deshpande N, Varambally R, et al. Oncomine: a cancer microarray database and integrated data-mining platform. Neoplasia. 2004;1:1–6. [PubMed] 6. Shen R, Ghosh D, Chinnaiyan A. Prognostic meta-signature of breast cancer developed by two-stage mixture modelling of microarray data. BMC Genomics. 2004;5:94. [PubMed] 7. Jiang H, Deng Y, Chen HS, Tao L, Sha Q, et al. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics. 2004;5:81. [PubMed] 8. Teschendorff AE, Naderi A, Barbosa-Morais NL, Pinder SE, Ellis IO, et al. A consensus prognostic gene expression classifier for er positive breast cancer. Genome Biology. 2006;7:R101. [PubMed] 9. Kim R, Park P. Improving identification of differentially expressed genes in microarray studies using information from public databases. Genome Biology. 2004;5:R70. [PubMed] 10. Segal E, Friedman N, Koller D, Regev D. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 2004;36:1090–8. [PubMed] 11. Tanay A, Steinfeld I, Kupiec M, Shamir R. Integrative analysis of genome-wide experiments in the context of a large high-throughput data compendium. Molecular Systems Biology. 2005;1 12. Perou C, Jeffrey S, van de Rijn M, Rees C, Eisen M, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. PNAS. 1999;96:9212–7. [PubMed] 13. Perou C, Sorlie T, Eisen M, van de Rijn M, Jeffrey S, et al. Molecular portraits of human breast tumours. Nature. 2000;(406):747–52. [PubMed] 14. Sorlie T, Perou C, Tibshirani R, Aas T, Geisler S, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS. 2001;98:10869–74. [PubMed] 15. Bair E, Tibshirani R. Semi-supervised methods to predict patient survival from gene expression data. Plos Biology. 2004;2:e108. [PubMed] 16. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300. 17. Wessels L, Reinders M, Hart A, Veenman C, Dai H, et al. A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics. 2005;21:3755–62. [PubMed] 18. Domingos P, Pazzani M. ICML. 1996. Beyond independence: conditions for the optimality of the simple bayesian classifier. 19. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;(1):57–70. [PubMed] 20. Bonnet S, Archer S, AllalunisTurner J, Haromy A, Beaulieu C, et al. A mitochondria k channel axis is suppressed in cancer and its normalization promotes apoptosis and inhibits cancer growth. Cancer Cell. 2007;100:37–51. [PubMed] 21. Jin T, Branch D, Zhang X, Qi S, Youngson B, et al. Examination of POU homeobox gene expression in human breast cancer cells. International Journal of Cancer. 1999;81:104–112. 22. Malin S, Linderson Y, Almqvist J, Ernberg I, Tallone T, et al. DNA dependent conversion of Oct1 and Oct2 into transcriptional repressors by Groucho/TLE. Nucl. Acids Res. 2005;33:4618–4625. [PubMed] 23. 24. |
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||||||||
Nature. 2002 Jan 31; 415(6871):530-6.
[Nature. 2002]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]Neoplasia. 2004 Jan-Feb; 6(1):1-6.
[Neoplasia. 2004]Genome Biol. 2006; 7(10):R101.
[Genome Biol. 2006]Genome Biol. 2004; 5(9):R70.
[Genome Biol. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Bioinformatics. 2005 Oct 1; 21(19):3755-62.
[Bioinformatics. 2005]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]Proc Natl Acad Sci U S A. 2003 Jul 8; 100(14):8418-23.
[Proc Natl Acad Sci U S A. 2003]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Proc Natl Acad Sci U S A. 1999 Aug 3; 96(16):9212-7.
[Proc Natl Acad Sci U S A. 1999]Nature. 2000 Aug 17; 406(6797):747-52.
[Nature. 2000]Proc Natl Acad Sci U S A. 2001 Sep 11; 98(19):10869-74.
[Proc Natl Acad Sci U S A. 2001]Proc Natl Acad Sci U S A. 2003 Jul 8; 100(14):8418-23.
[Proc Natl Acad Sci U S A. 2003]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]PLoS Biol. 2004 Apr; 2(4):E108.
[PLoS Biol. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Bioinformatics. 2005 Oct 1; 21(19):3755-62.
[Bioinformatics. 2005]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Nat Genet. 2004 Oct; 36(10):1090-8.
[Nat Genet. 2004]Cell. 2000 Jan 7; 100(1):57-70.
[Cell. 2000]Cancer Cell. 2007 Jan; 11(1):37-51.
[Cancer Cell. 2007]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]Cell. 2000 Jan 7; 100(1):57-70.
[Cell. 2000]N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
[N Engl J Med. 2002]Nucleic Acids Res. 2005; 33(14):4618-25.
[Nucleic Acids Res. 2005]