Nat Biotechnol. 2014 Dec;32(12):1202-12. doi: 10.1038/nbt.2877. Epub 2014 Jun 1.
A community effort to assess and improve drug sensitivity prediction algorithms.
Costello JC1,
Heiser LM2,
Georgii E3,
Gönen M4,
Menden MP5,
Wang NJ6,
Bansal M7,
Ammad-ud-din M4,
Hintsanen P8,
Khan SA4,
Mpindi JP8,
Kallioniemi O8,
Honkela A9,
Aittokallio T8,
Wennerberg K8;
NCI DREAM Community,
Collins JJ10,
Gallahan D11,
Singer D11,
Saez-Rodriguez J5,
Kaski S12,
Gray JW6,
Stolovitzky G13.
Abbuehl JP14, Aittokallio T8, Allen J15, Altman RB16, Ammad-ud-din M4, Balcome S17, Bansal M7, Battle A18, Bender A19, Berger B20, Bernard J14, Bhattacharjee M21, Bhuvaneshwar K22, Bieberich AA23, Boehm F24, Califano A7, Chan C25, Chen B15, Chen TH26, Choi J27, Coelho LP28, Cokelaer T5, Collins JC10, Costello JC29, Creighton CJ30, Cui J31, Dampier W32, Davisson VJ23, De Baets B33, Deshpande R17, DiCamillo B34, Dundar M35, Duren Z36, Ertel A37, Fan H24, Fang H38, Gallahan D11, Gauba R22, Georgii E4, Gönen M4, Gottlieb A16, Grau M39, Gray JW6, Gusev Y22, Ha MJ26, Han L40, Harris M22, Heiser LM6, Henderson N24, Hejase HA41, Hintsanen P8, Homicsko K14, Honkela A9, Hou JP42, Hwang W27, IJzerman AP43, Kallioniemi O8, Karacali B44, Kaski S12, Keles S24, Kendziorski C24, Khan SA4, Kim J27, Kim M15, Kim Y45, Knowles DA18, Koller D18, Lee J46, Lee JK45, Lenselink EB43, Li B47, Li B31, Li J48, Liang H49, Ma J42, Madhavan S50, Menden MP5, Mooney S47, Mpindi JP8, Myers CL17, Newton MA24, Overington JP51, Pal R52, Peng J20, Pestell R32, Prill RJ53, Qiu P54, Rajwa B55, Sadanandam A14, Saez-Rodriguez J5, Sambo F34, Shin H31, Singer D11, Song J56, Song L22, Sridhar A57, Stock M33, Stolovitzky G13, Sun W26, Ta T24, Tadesse M58, Tan M38, Tang H15, Theodorescu D59, Toffolo GM34, Tozeren A32, Trepicchio W31, Varoquaux N60, Vert JP60, Waegeman W33, Walter T60, Wan Q52, Wang D50, Wang NJ6, Wang W17, Wang Y36, Wang Z24, Wegner JK61, Wennerberg K8, Wu T62, Xia T17, Xiao G15, Xie Y15, Xu Y63, Yang J15, Yuan Y49, Zhang S36, Zhang XS36, Zhao J36, Zuo C24, van Vlijmen HW61, van Westen GJ51.
- 1
- [1] Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA. [2] Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA. [3] [4].
- 2
- [1] Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA. [2].
- 3
- [1] Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland. [2].
- 4
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland.
- 5
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
- 6
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA.
- 7
- Department of Systems Biology, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA.
- 8
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland.
- 9
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.
- 10
- [1] Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA. [2] Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA. [3] Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA.
- 11
- National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.
- 12
- [1] Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland. [2] Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.
- 13
- IBM T.J. Watson Research Center, IBM, Yorktown Heights, New York, USA.
- 14
- Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland.
- 15
- Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
- 16
- Departments of Genetics and Bioengineering, Stanford University, Stanford, California, USA.
- 17
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA.
- 18
- Department of Computer Science, Stanford University, Palo Alto, California, USA.
- 19
- Unilever Centre, Cambridge University, Cambridge, UK.
- 20
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, USA.
- 21
- [1] Department of Statistics, University of Pune, Pune, India. [2] School of Mathematics and Statistics, University of Hyderabad, Hyderabad, India.
- 22
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
- 23
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, W. Lafayette, Indiana, USA.
- 24
- [1] Department of Statistics, University of Wisconsin, Madison, Wisconsin, USA. [2] Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, USA.
- 25
- [1] Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA. [2] Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan, USA. [3] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA.
- 26
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA.
- 27
- Korea Advanced Institute of Science and Technology, Daejeon, Korea.
- 28
- Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisbon, Portugal.
- 29
- [1] Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA. [2] Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA. [3].
- 30
- Department of Medicine, Dan L. Duncan Center Division of Biostatistics, Baylor College of Medicine, Houston, Texas, USA.
- 31
- Translational Medicine, Millennium Pharmaceuticals, Cambridge, Massachusetts, USA.
- 32
- Center for Integrated Bioinformatics, Drexel University, Philadelphia, Pennsylvania, USA.
- 33
- Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium.
- 34
- Department of Information Engineering, University of Padova, Padova, Italy.
- 35
- Computer and Information Science Department, IUPUI, Indianapolis, Indiana, USA.
- 36
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
- 37
- Jefferson Kimmel Cancer Center, Drexel University, Philadelphia, Pennsylvania, USA.
- 38
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, USA.
- 39
- Department of Physics, University of Marburg, Marburg, Germany.
- 40
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
- 41
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA.
- 42
- Department of Bioengineering and Institute for Genomic Biology, University of Illinois, Champaign-Urbana, Illinois, USA.
- 43
- Leiden Academic Center for Drug Research, University of Leiden, Leiden, Netherlands.
- 44
- Izmir Institute of Technology, Izmir, Turkey.
- 45
- Division of Biostatistics, University of Virginia School of Medicine, Charlottesville, Virginia, USA.
- 46
- [1] Korea Advanced Institute of Science and Technology, Daejeon, Korea. [2] Korea Institute of Science and Technology Information, Daejeon, Korea.
- 47
- Buck Institute, Novato, California, USA.
- 48
- [1] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA. [2] CAS-MPG Partner Institute for Computational Biology, Key Laboratory of Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R. China.
- 49
- [1] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA. [2] Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, USA.
- 50
- [1] Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA. [2] Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
- 51
- ChEMBL Group, The EMBL-European Bioinformatics Institute, Cambridge, UK.
- 52
- Electrical and Computer Engineering, Texas Tech University, Lubbock, Texas, USA.
- 53
- IBM Almaden Research Center, San Jose, California, USA.
- 54
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
- 55
- Bindley Bioscience Center, Purdue University, W. Lafayette, Indiana, USA.
- 56
- Department of Animal and Avian Science, University of Maryland, College Park, Maryland, USA.
- 57
- Embedded Systems Laboratory (ESL), Institute of Electrical Engineering, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland.
- 58
- Department of Mathematics and Statistics, Georgetown University, Washington, DC, USA.
- 59
- The University of Colorado Cancer Center, University of Colorado School of Medicine, Aurora, Colorado, USA.
- 60
- [1] Centre for Computational Biology, Mines ParisTech, Fontainebleau, France. [2] Institut Curie, Paris, France. [3] INSERM U900, Paris, France.
- 61
- Janssen Pharmaceutica, Beerse, Belgium.
- 62
- Department of Biostatistics and Computational Biology, Rochester University Medical Center, Rochester, New York, USA.
- 63
- [1] Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA. [2] Department of Statistics, Rice University, Houston, Texas, USA.
Abstract
Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.
Figure 1
The NCI-DREAM drug sensitivity challenge. (a) Six genomic, epigenomic, and proteomic profiling data sets were generated for 53 breast cancer cell lines, which were previously described. Drug responses as measured by growth inhibition were assessed after treating the 53 cell lines with 28 drugs. Participants were supplied with all six profiling data sets and dose-response data for 35 cell lines and all 28 compounds (training set). Cell line names were released, but drug names were anonymized. The challenge was to predict the response (ranking from most sensitive to most resistant) for the 18 held-out cell lines (test set). The training and test cell lines were balanced for cancer subtype, dynamic range and missing values. Submissions were scored on their weighted average performance on ranking the 18 cell lines for 28 compounds. (b) Dose-response values for the training and test cell lines displayed as heatmaps.
Figure 2
Evaluation of individual drug sensitivity prediction algorithms. Prediction algorithms (n = 44) are indexed according to . (a) Team performance was evaluated using the weighted, probabilistic concordance index (wpc-index), which accounts for the experimental variation measured across cell lines and between compounds. Overall team ranks are listed on top of each bar. The gray line represents the mean random prediction score. (b,c) Robustness analysis was performed by randomly masking 10% of the test data set for 10,000 iterations. Performing this procedure repeatedly generates a distribution of wpc-index scores for each team (b). Additionally, after each iteration, teams were re-ranked to create a distribution of rank orders (c). The top two teams were reliably ranked the best and second-best performers (one-sided, Wilcoxon signed-rank test for b and c, FDR ≪ 10^−10).
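The masking-and-re-ranking procedure in panels b and c can be sketched roughly as follows. This is a minimal illustration, not the challenge's scoring code: it uses a plain concordance index as a stand-in for the wpc-index (whose weighting by experimental variation is not reproduced here), it masks entries of a single response vector rather than the full cell line by drug test matrix, and the function names, the default iteration count, and the seed are all assumptions made for the example.

```python
import random


def concordance_index(truth, pred):
    """Plain concordance index over all pairs with distinct true values.

    Stand-in for the wpc-index: no probabilistic weighting is applied.
    Prediction ties count as half-concordant.
    """
    concordant = 0.0
    pairs = 0
    n = len(truth)
    for i in range(n):
        for j in range(i + 1, n):
            if truth[i] == truth[j]:
                continue  # ties in the gold standard are not comparable
            pairs += 1
            true_diff = truth[i] - truth[j]
            pred_diff = pred[i] - pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / pairs if pairs else 0.5


def masked_rank_distribution(truth, team_preds, n_iter=1000, mask_frac=0.1, seed=0):
    """Repeatedly hide a random fraction of the test entries, rescore, re-rank.

    Returns {team: [rank in iteration 1, rank in iteration 2, ...]}.
    """
    rng = random.Random(seed)
    n = len(truth)
    n_keep = n - max(1, round(mask_frac * n))
    ranks = {team: [] for team in team_preds}
    for _ in range(n_iter):
        keep = rng.sample(range(n), n_keep)
        kept_truth = [truth[i] for i in keep]
        scores = {
            team: concordance_index(kept_truth, [pred[i] for i in keep])
            for team, pred in team_preds.items()
        }
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, team in enumerate(ordered, start=1):
            ranks[team].append(rank)
    return ranks
```

The resulting per-team rank lists play the role of the rank-order distributions in panel c; a paired test across iterations could then compare two teams.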
Figure 3
The method implemented by the best performing team. (a) In addition to the six profiling data sets, three different categories of data views were compiled using prior biological knowledge, yielding in total 22 genomic views of each cell line. (b) Bayesian multitask MKL combines nonlinear regression, multiview learning, multitask learning and Bayesian inference. Nonlinear regression: response values were computed not directly from the input features but from kernels, which define similarity measures between cell lines. Each of the K data views was converted into an N×N kernel matrix K_k (k = 1,…,K), where N is the number of training cell lines. Specifically, the Gaussian kernel was used for real-valued data, and the Jaccard similarity coefficient for binary-valued data. Multiview learning: a combined kernel matrix K* was constructed as a weighted sum of the view-specific kernel matrices K_k, k = 1,…,K. The kernel weights were obtained by multiple kernel learning. Multitask learning: training was performed for all drugs simultaneously, sharing the kernel weights across drugs but allowing for drug-specific regression parameters, which for each drug consisted of a weight vector for the training cell lines and an intercept term. Bayesian inference: the model parameters were assumed to be random variables that follow specific probability distributions. Instead of learning point estimates for model parameters, the parameters of these distributions were learned using a variational approximation scheme.
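To make the kernel construction in panel b concrete, here is a small sketch of the per-view kernels and their weighted sum. It is an illustration under stated assumptions only: the kernel weights are fixed by hand here, whereas the method learns them by multiple kernel learning with variational inference; the Gaussian width `sigma` and its parametrization, the toy data, and all function names are hypothetical.

```python
import math


def gaussian_kernel(X, sigma=1.0):
    """N x N Gaussian kernel for a real-valued view (rows of X are cell lines).

    The width parametrization exp(-d^2 / (2 sigma^2)) is an assumption.
    """
    def k(u, v):
        sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return [[k(u, v) for v in X] for u in X]


def jaccard_kernel(B):
    """N x N Jaccard similarity for a binary view (rows of B are cell lines)."""
    def k(u, v):
        both = sum(1 for a, b in zip(u, v) if a and b)
        either = sum(1 for a, b in zip(u, v) if a or b)
        # Two all-zero profiles are treated as identical (an assumption).
        return both / either if either else 1.0
    return [[k(u, v) for v in B] for u in B]


def combine_kernels(kernels, weights):
    """Combined kernel K* as the weighted sum of view-specific kernels."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: three cell lines, one real-valued view and one binary view.
expression = [[0.1, 1.2], [0.3, 0.9], [2.0, -0.5]]
mutations = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
K_star = combine_kernels(
    [gaussian_kernel(expression), jaccard_kernel(mutations)],
    [0.7, 0.3],  # hand-picked weights, not learned
)
```

Because the weights are shared across drugs in the multitask setting, a single `K_star` of this form would feed every drug-specific regression.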
Figure 4
Performance comparison of data set views. The top-performing method, Bayesian multitask MKL, and an elastic net predictor were trained on (a) the original profiling data sets, (b) computed views, (c) groups of data views, and (d) the fully integrated set of all data views. Boxplots represent the distribution of 50 random simulations matching the NCI-DREAM challenge parameters, where whiskers indicate the upper and lower range limit, and the black line, the median. (b) The computed views were derived from gene sets, from combined data sets (calculated as the product of values between data sets), and by discretizing continuous measures into binary values. (c) Data view groups were defined as all views derived from one profiling data set. (d) For Bayesian multitask MKL, the integration of all data views achieves the best performance. Gene expression is the most predictive profiling data set, slightly outperformed by gene set views of expression data and the integration of original and gene set expression data.
Figure 5
Prediction performance on individual drugs. Prediction algorithms are indexed and colored according to . (a) The heatplot illustrates participant performance on individual drugs, grouped by drug class (values can be found in ). Drug weights, which take into account the number of missing values and the noise in the −log10(GI50) measurements, are displayed at the top of the heatplot. Team submissions are ordered according to their overall performance from best performer at the top of the list. (b) The dynamic range of drugs across all cell lines was compared to the median team score. The node size reflects the number of distinct −log10(GI50) values for each drug across all 53 cell lines. The node colors reflect mode-of-action classes. The gray horizontal line is the mean score of random predictions and the vertical gray line separates low dynamic range (<2) from high dynamic range (>2), where dynamic range for a drug is the maximum −log10(GI50) − minimum −log10(GI50). (c) The distribution of team scores (n = 44) for individual drugs was compared to the null model of random predictions (gray line where pc-index = 0.5). The red points correspond to the maximum possible pc index (pc index of gold standard in the test data). On average, 21/28 drugs performed better than the null model; using the Kolmogorov-Smirnov test, 16/28 drugs were significantly better than the null model (*FDR < 0.05; **FDR < 0.01; ***FDR < 0.001).
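The dynamic range definition in panel b (maximum −log10(GI50) minus minimum −log10(GI50), split at 2) amounts to a short calculation, sketched below. Representing missing measurements as `None`, assigning a range of exactly 2 to the low group, and the function names are assumptions of this example; the caption only specifies <2 and >2.

```python
def dynamic_range(neg_log_gi50):
    """Max minus min of the observed -log10(GI50) values for one drug.

    Missing measurements are passed as None and ignored (an assumption).
    Returns None when no value was observed.
    """
    observed = [v for v in neg_log_gi50 if v is not None]
    if not observed:
        return None
    return max(observed) - min(observed)


def range_class(neg_log_gi50, threshold=2.0):
    """Label a drug 'high' (> threshold) or 'low' dynamic range.

    A range of exactly `threshold` is labeled 'low' here; the caption
    leaves that boundary case unspecified.
    """
    dr = dynamic_range(neg_log_gi50)
    if dr is None:
        return None
    return "high" if dr > threshold else "low"
```

For example, a drug with observed values 5.0, 6.5 and 8.2 across cell lines has a dynamic range of about 3.2 and falls in the high group.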