Bioinformatics. 2007 Aug 15;23(16):2080-7. Epub 2007 Jun 6.

Predicting survival from microarray data--a comparative study.

Author information

  • Department of Mathematics, University of Oslo, Norway.



Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some dimension reduction or parameter shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.
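To make the shrinkage idea concrete, here is a minimal sketch of one of the listed approaches, ridge regression under Cox's model. It minimizes the negative log partial likelihood plus a quadratic penalty on the coefficients, using plain gradient descent with backtracking. It is an illustration only and not the authors' Matlab or R code: the function names, the Breslow-style handling of tied times, the optimizer and the toy data are all assumptions made for this example.

```python
import math

def linear_predictor(beta, X):
    """Return x_i . beta for every subject (row) in X."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def objective(beta, X, time, event, lam):
    """Negative Cox log partial likelihood plus (lam / 2) * ||beta||^2."""
    eta = linear_predictor(beta, X)
    value = 0.5 * lam * sum(b * b for b in beta)
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        risk = [eta[j] for j, t_j in enumerate(time) if t_j >= t_i]
        m = max(risk)  # subtract the maximum for a stable log-sum-exp
        value -= eta[i] - (m + math.log(sum(math.exp(r - m) for r in risk)))
    return value

def gradient(beta, X, time, event, lam):
    """Gradient of `objective` with respect to beta."""
    eta = linear_predictor(beta, X)
    grad = [lam * b for b in beta]
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue
        idx = [j for j, t_j in enumerate(time) if t_j >= t_i]
        m = max(eta[j] for j in idx)
        w = [math.exp(eta[j] - m) for j in idx]
        total = sum(w)
        for k in range(len(beta)):
            # Risk-set weighted mean of covariate k at the event time t_i.
            mean_k = sum(wj * X[j][k] for wj, j in zip(w, idx)) / total
            grad[k] -= X[i][k] - mean_k
    return grad

def fit_ridge_cox(X, time, event, lam, n_iter=200):
    """Gradient descent with backtracking line search, starting from beta = 0."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(beta, X, time, event, lam)
        g2 = sum(v * v for v in g)
        if g2 < 1e-16:
            break  # gradient is numerically zero
        f0 = objective(beta, X, time, event, lam)
        step = 1.0
        while step > 1e-12:
            cand = [b - step * v for b, v in zip(beta, g)]
            if objective(cand, X, time, event, lam) <= f0 - 0.5 * step * g2:
                beta = cand
                break
            step *= 0.5
        else:
            break  # no acceptable step found
    return beta

# Toy data (hypothetical): one covariate, larger values go with earlier events.
X = [[2.0], [1.0], [0.0], [-1.0]]
time = [1.0, 2.0, 3.0, 4.0]
event = [1, 1, 1, 0]  # 1 = event observed, 0 = censored

beta_weak = fit_ridge_cox(X, time, event, lam=0.1)
beta_strong = fit_ridge_cox(X, time, event, lam=10.0)
```

In this toy example the covariate perfectly orders the event times, so an unpenalized fit would have no finite maximizer. The penalty keeps the estimate finite, and a larger `lam` pulls it closer to zero. That is the same mechanism that makes estimation possible when there are far more genes than individuals.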


Statistical learning from subsets should be repeated several times in order to get a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values have much better performance than the simple variable selection methods. For our data sets, ridge regression has the overall best performance.
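The recommendation to repeat learning on several subsets can be sketched as a small evaluation loop: draw many random training/test splits, score every method on the same splits, and summarize the scores per method. The abstract does not state the split proportions or the performance criteria used in the paper, so the function names, the callable interface and the dummy scoring functions below are assumptions made for this example.

```python
import random

def repeated_splits(n_samples, n_train, n_repeats, seed=0):
    """Return n_repeats random (train_indices, test_indices) pairs."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        splits.append((sorted(order[:n_train]), sorted(order[n_train:])))
    return splits

def compare_methods(methods, n_samples, n_train, n_repeats, seed=0):
    """Score each method on the same splits and return its mean score.

    `methods` maps a name to a callable taking (train_idx, test_idx) and
    returning a number. Each callable is expected to fit on the training
    indices and evaluate on the test indices.
    """
    scores = {name: [] for name in methods}
    for train_idx, test_idx in repeated_splits(n_samples, n_train, n_repeats, seed):
        for name, evaluate in methods.items():
            scores[name].append(evaluate(train_idx, test_idx))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Hypothetical stand-ins for real fit-and-evaluate functions.
dummy_methods = {
    "method_a": lambda train_idx, test_idx: len(test_idx) / 10.0,
    "method_b": lambda train_idx, test_idx: 1.0,
}
splits = repeated_splits(n_samples=10, n_train=7, n_repeats=5, seed=1)
summary = compare_methods(dummy_methods, n_samples=10, n_train=7, n_repeats=5, seed=1)
```

Scoring all methods on identical splits means that differences between them are not driven by one method drawing an easier test set, and averaging over many splits reduces the influence of any single partition.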


Matlab and R code for the prediction methods are available at

[PubMed - indexed for MEDLINE]