Anal Bioanal Chem. 2008 Mar;390(5):1261-71. doi: 10.1007/s00216-007-1818-6. Epub 2008 Jan 29.

Assessing and improving the stability of chemometric models in small sample size situations.

Author information

Institute for Analytical Chemistry, Dresden University of Technology, Bergstrasse 66, 01062, Dresden, Germany.


Small sample sizes are very common in multivariate analysis. Data sets of 10-100 statistically independent objects (rejects from processes or loading dock analysis, or patients with a rare disease), each with hundreds of data points, lead to unstable models with poor predictive quality. Model stability is assessed by comparing models built from slightly varying training data; iterated k-fold cross-validation is used for this purpose. Aggregating such models stabilizes them, and the quality of the aggregated model can be assessed without calculating further models. The validation and aggregation methods investigated in this study apply to regression as well as to classification. They are useful for analyzing data with large numbers of variates, e.g., spectral data such as FT-IR, Raman, UV/VIS, fluorescence, AAS, and MS. FT-IR images of tumor tissue were used in this study; some tissue types occur frequently, while others are very rare. The tissue types were classified using LDA. The initial models were severely unstable. Aggregation stabilized the predictions, and the hit rate increased from 67% to 82%.
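
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the data are synthetic (two classes, 15 objects each, 50 variates), a nearest-centroid classifier stands in for LDA to keep the example free of dependencies, and all function names (make_data, fit_centroids, predict, iterated_kfold, summarize) are hypothetical. Iterated k-fold cross-validation yields one out-of-fold prediction per object and iteration. The spread of those predictions serves as a rough stability measure, and their majority vote is one plausible way to estimate how an aggregated model would perform without fitting further models.

```python
import random
from collections import Counter


def make_data(n_per_class=15, dims=50, shift=0.4, seed=0):
    """Synthetic stand-in data: few objects, many variates (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dims)])
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid model (a simple stand-in for LDA, not LDA itself)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sqdist(model[label]))


def iterated_kfold(X, y, k=5, iterations=21, seed=1):
    """Repeat k-fold CV with fresh random splits; collect out-of-fold votes."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[] for _ in range(n)]
    for _ in range(iterations):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            for i in test:
                votes[i].append(predict(model, X[i]))
    return votes


def summarize(votes, y):
    """Hit rate of single models, hit rate of the majority vote, and stability."""
    n_iter = len(votes[0])
    total = len(y) * n_iter
    single = sum(v.count(t) for v, t in zip(votes, y)) / total
    majority = [Counter(v).most_common(1)[0][0] for v in votes]
    aggregated = sum(m == t for m, t in zip(majority, y)) / len(y)
    # Stability: average share of iterations that agree with the majority vote.
    stability = sum(v.count(m) for v, m in zip(votes, majority)) / total
    return single, aggregated, stability


X, y = make_data()
votes = iterated_kfold(X, y, k=5, iterations=21)
single, aggregated, stability = summarize(votes, y)
print(f"single-model hit rate: {single:.2f}")
print(f"aggregated hit rate:   {aggregated:.2f}")
print(f"prediction stability:  {stability:.2f}")
```

A stability value close to 1 means that the surrogate models agree on almost every object regardless of the split; values near 0.5 in this two-class setting indicate that the prediction depends heavily on which objects happened to be in the training set. An odd number of iterations avoids ties in the majority vote.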

[Indexed for MEDLINE]