    Anal Chim Acta. 2007 Jun 5;592(2):210-7. Epub 2007 Apr 27.

    Assessing the statistical validity of proteomics based biomarkers.

    Smit S, van Breemen MJ, Hoefsloot HC, Smilde AK, Aerts JM, de Koster CG.

    Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands.

    A strategy is presented for the statistical validation of discrimination models in proteomics studies. Several existing tools are combined to form a solid statistical basis for biomarker discovery that should precede the biochemical validation of any biomarker. These tools are permutation tests and single and double cross-validation. The cross-validation steps can easily be combined with a new variable selection method, called rank products. The strategy is especially suited to the low samples-to-variables ratio (undersampling) case that is often encountered in proteomics and metabolomics studies. Principal component discriminant analysis is used as the classification method; however, the methodology can be used with any classifier. A dataset containing serum samples from Gaucher patients and healthy controls serves as a test case. Double cross-validation shows that the sensitivity of the model is 89% and the specificity is 90%. Putative biomarkers are identified using the novel variable selection method. Results from permutation tests support the choice of double cross-validation as the tool for determining error rates when the modelling procedure involves a tuneable parameter. The permutation results also show that even cross-validation does not guarantee unbiased results. Validating discrimination models with a combination of permutation tests and double cross-validation helps to avoid erroneous results that may arise from undersampling.
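The combination of double cross-validation and a permutation test described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the classifier is a simple nearest-centroid rule rather than principal component discriminant analysis, the variable filter is a plain mean-difference ranking rather than rank products, and the function names (`fit_predict`, `double_cv_error`, `permutation_p_value`) are hypothetical. The tuneable parameter is the number of retained variables `k`, chosen in an inner loop so that the outer test samples never influence it.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test, k):
    """Nearest-centroid rule on the k variables with the largest class-mean gap.
    (Stand-in classifier and filter; not the paper's PCDA or rank products.)"""
    mu0 = X_train[y_train == 0].mean(axis=0)
    mu1 = X_train[y_train == 1].mean(axis=0)
    keep = np.argsort(-np.abs(mu1 - mu0))[:k]  # selection uses training data only
    d0 = ((X_test[:, keep] - mu0[keep]) ** 2).sum(axis=1)
    d1 = ((X_test[:, keep] - mu1[keep]) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def folds(n, n_folds, rng):
    """Random, unstratified split of sample indices into n_folds test sets."""
    return np.array_split(rng.permutation(n), n_folds)

def cv_error(X, y, k, n_folds, rng):
    """Single cross-validation error rate for a fixed k."""
    errors = 0
    for test in folds(len(y), n_folds, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        errors += (fit_predict(X[train], y[train], X[test], k) != y[test]).sum()
    return errors / len(y)

def double_cv_error(X, y, k_grid, rng, outer=5, inner=4):
    """Outer loop estimates the error; inner loop tunes k on the outer training part."""
    errors = 0
    for test in folds(len(y), outer, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        inner_err = [cv_error(X[train], y[train], k, inner, rng) for k in k_grid]
        best_k = k_grid[int(np.argmin(inner_err))]
        errors += (fit_predict(X[train], y[train], X[test], best_k) != y[test]).sum()
    return errors / len(y)

def permutation_p_value(X, y, k_grid, rng, n_perm=50):
    """Compare the observed double-CV error with errors obtained on shuffled labels."""
    observed = double_cv_error(X, y, k_grid, rng)
    null = [double_cv_error(X, rng.permutation(y), k_grid, rng) for _ in range(n_perm)]
    p = (1 + sum(e <= observed for e in null)) / (1 + n_perm)
    return observed, p

# Synthetic undersampled data: 40 samples, 200 variables, 5 of them informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 2.0

observed, p_value = permutation_p_value(X, y, k_grid=[2, 5, 20], rng=rng)
print(f"double-CV error: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

The key design point is that both the variable filter and the choice of `k` are repeated inside every training split. Selecting variables or tuning `k` on the full dataset before cross-validating would leak information from the test samples and give an optimistic error estimate, which is the kind of bias a permutation test can expose.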

    PMID: 17512828 [PubMed - indexed for MEDLINE]
