NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data

Bioinformatics. 2015 Oct 15;31(20):3330-8. doi: 10.1093/bioinformatics/btv374. Epub 2015 Jun 18.

Abstract

Motivation: In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by genomic profiles or clinical data that are censored, small in sample size, and high-dimensional. Sophisticated models and algorithms are therefore urgently needed.

Results: In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification, named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score used to evaluate classification accuracy and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem into a binary classification problem. An optimization model is then formulated to directly maximize AUC while minimizing the number of selected features, constructing a predictor within the nearest centroid classifier framework. NCC-AUC performs well in validation on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). On the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meaning. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1-penalized Cox model). These performance gains are robust across 5 subtypes of breast cancer. Furthermore, on an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and L1-Cox model in grouping patients into high- and low-risk categories.
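
The connection between AUC and Harrell's concordance index mentioned above can be illustrated with a small sketch. It is not the NCC-AUC implementation: the function names, the toy labels, and the risk scores are hypothetical, and the example only assumes that survival has been dichotomized into a poor-outcome class (label 1, shorter time) and a good-outcome class (label 0, longer time). Under that assumption, every comparable pair in the concordance index is a (poor, good) pair, so the two quantities coincide.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Harrell's concordance index; a higher score means higher predicted risk.

    A pair is comparable only if the two times differ and the subject with
    the shorter time had an observed event (events[i] == 1, not censored).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are not comparable
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a has the shorter time
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = poor outcome, label 0 = good outcome.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]

# Encode the binary outcome as survival data with two distinct time points.
times = [1 if y == 1 else 2 for y in labels]
events = [1] * len(labels)

print(auc(labels, scores), harrell_c_index(times, events, scores))
```

With real survival data the times are not binary and some are censored, so the two measures generally differ; the sketch only shows why a binary reformulation lets an AUC objective stand in for concordance.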

Conclusion: In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis.

Availability and implementation: NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC.

Contact: ywang@amss.ac.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Area Under Curve*
  • Biomarkers / analysis*
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / mortality
  • Carcinoma, Non-Small-Cell Lung / diagnosis*
  • Carcinoma, Non-Small-Cell Lung / genetics
  • Carcinoma, Non-Small-Cell Lung / mortality
  • Data Interpretation, Statistical*
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic
  • Genomics / methods*
  • Humans
  • Lung Neoplasms / diagnosis*
  • Lung Neoplasms / genetics
  • Lung Neoplasms / mortality
  • Models, Biological
  • Pattern Recognition, Automated
  • Prognosis
  • Proportional Hazards Models
  • Support Vector Machine
  • Survival Rate
  • Systems Biology
  • Systems Integration

Substances

  • Biomarkers