Methods Inf Med. 2005;44(3):438-43.

Molecular diagnosis. Classification, model selection and performance evaluation.

Author information

Max Planck Institute for Molecular Genetics, Computational Diagnostics Group, Ihnestrasse 63-73, 14195 Berlin, Germany.



We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces.


We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation.
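The difference between in-loop and out-of-loop feature selection can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes pure-noise data (no gene carries any class information), a simple mean-difference gene filter, and a nearest-centroid classifier. All function names and parameter values are hypothetical choices made for illustration.

```python
import random

def make_noise_data(n_per_class, n_genes, rng):
    """Pure-noise expression matrix: no gene is truly related to the label."""
    labels = [0] * n_per_class + [1] * n_per_class
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in labels]
    return data, labels

def class_means(data, labels, rows, genes):
    """Per-class mean of each listed gene, computed over the listed rows only."""
    means = {}
    for c in (0, 1):
        members = [i for i in rows if labels[i] == c]
        means[c] = {g: sum(data[i][g] for i in members) / len(members) for g in genes}
    return means

def filter_genes(data, labels, rows, k):
    """Feature filter: keep the k genes with the largest class-mean difference."""
    genes = range(len(data[0]))
    m = class_means(data, labels, rows, genes)
    return sorted(genes, key=lambda g: abs(m[0][g] - m[1][g]), reverse=True)[:k]

def predict(data, labels, train_rows, genes, i):
    """Nearest-centroid prediction for sample i, using only the given genes."""
    m = class_means(data, labels, train_rows, genes)
    dist = {c: sum((data[i][g] - m[c][g]) ** 2 for g in genes) for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def cv_accuracy(data, labels, k, in_loop, n_folds=5):
    """Cross-validated accuracy with the gene filter inside or outside the loop."""
    all_rows = list(range(len(labels)))
    # Out-of-loop selection uses every sample, including future test samples.
    global_genes = filter_genes(data, labels, all_rows, k)
    correct = 0
    for f in range(n_folds):
        test_rows = [i for i in all_rows if i % n_folds == f]
        train_rows = [i for i in all_rows if i % n_folds != f]
        if in_loop:
            # In-loop selection repeats the filter on the training rows only.
            genes = filter_genes(data, labels, train_rows, k)
        else:
            genes = global_genes
        correct += sum(predict(data, labels, train_rows, genes, i) == labels[i]
                       for i in test_rows)
    return correct / len(labels)

def compare(n_repeats=5, seed=0):
    """Average both estimates over several independent noise data sets."""
    rng = random.Random(seed)
    out_of_loop, in_loop = 0.0, 0.0
    for _ in range(n_repeats):
        data, labels = make_noise_data(10, 1000, rng)
        out_of_loop += cv_accuracy(data, labels, k=10, in_loop=False)
        in_loop += cv_accuracy(data, labels, k=10, in_loop=True)
    return out_of_loop / n_repeats, in_loop / n_repeats

out_of_loop_acc, in_loop_acc = compare()
print("out-of-loop selection:", out_of_loop_acc)
print("in-loop selection:    ", in_loop_acc)
```

Because the labels are unrelated to the data, an honest estimate should sit near chance level. The out-of-loop estimate is expected to look much better only because the filter has already seen the test samples.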


Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.
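One way to combine model selection and performance evaluation in a nested loop is sketched below. This is an illustrative sketch under stated assumptions rather than the procedure from the paper: the tuning parameter is taken to be the number of filtered genes `k`, the classifier is a nearest-centroid rule, and the toy data, function names, and grid values are all hypothetical.

```python
import random

def make_data(n_per_class, n_genes, n_informative, shift, rng):
    """Toy data: the first n_informative genes are shifted upward in class 1."""
    labels = [0] * n_per_class + [1] * n_per_class
    data = [[rng.gauss(shift if (y == 1 and g < n_informative) else 0.0, 1.0)
             for g in range(n_genes)] for y in labels]
    return data, labels

def centroids(data, labels, rows, genes):
    """Per-class mean of each gene over the given rows."""
    out = {}
    for c in (0, 1):
        members = [i for i in rows if labels[i] == c]
        out[c] = {g: sum(data[i][g] for i in members) / len(members) for g in genes}
    return out

def fit(data, labels, rows, k):
    """Filter the top-k genes on `rows` only, then keep their class centroids."""
    all_genes = range(len(data[0]))
    m = centroids(data, labels, rows, all_genes)
    genes = sorted(all_genes, key=lambda g: abs(m[0][g] - m[1][g]), reverse=True)[:k]
    return genes, {c: {g: m[c][g] for g in genes} for c in (0, 1)}

def predict(model, sample):
    """Assign the sample to the nearer class centroid."""
    genes, m = model
    dist = {c: sum((sample[g] - m[c][g]) ** 2 for g in genes) for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def folds(rows, labels, n_folds):
    """Stratified folds: deal the rows of each class round-robin."""
    out = [[] for _ in range(n_folds)]
    for c in (0, 1):
        for j, i in enumerate([r for r in rows if labels[r] == c]):
            out[j % n_folds].append(i)
    return out

def cv_accuracy(data, labels, rows, k, n_folds):
    """Plain cross-validation restricted to `rows`, for one value of k."""
    correct = 0
    for test in folds(rows, labels, n_folds):
        train = [i for i in rows if i not in test]
        model = fit(data, labels, train, k)
        correct += sum(predict(model, data[i]) == labels[i] for i in test)
    return correct / len(rows)

def nested_cv(data, labels, k_grid, outer_folds=5, inner_folds=4):
    """Outer loop estimates performance; inner loop selects k."""
    rows = list(range(len(labels)))
    correct, chosen = 0, []
    for test in folds(rows, labels, outer_folds):
        train = [i for i in rows if i not in test]
        # Inner loop: tune k using the outer training rows only.
        best_k = max(k_grid,
                     key=lambda k: cv_accuracy(data, labels, train, k, inner_folds))
        chosen.append(best_k)
        # Refit with the chosen k, then score on the untouched outer test rows.
        model = fit(data, labels, train, best_k)
        correct += sum(predict(model, data[i]) == labels[i] for i in test)
    return correct / len(rows), chosen

rng = random.Random(1)
data, labels = make_data(10, 200, 10, 2.0, rng)
k_grid = (2, 5, 10, 20, 50)
accuracy, chosen_k = nested_cv(data, labels, k_grid)
print("nested-CV accuracy:", accuracy)
print("k chosen in each outer fold:", chosen_k)
```

The outer test samples never influence gene filtering or the choice of `k`, so the reported accuracy covers the whole selection procedure rather than a single tuned model. The chosen `k` may differ between outer folds, which is expected.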


Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is essential.

[Indexed for MEDLINE]
