Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling

Hyungwon Choi; Debashis Ghosh; Alexey I Nesvizhskii

doi:10.1021/pr7006818

J Proteome Res. 2008 Jan;7(1):286-92. doi: 10.1021/pr7006818. Epub 2007 Dec 14.

Authors

Hyungwon Choi¹, Debashis Ghosh, Alexey I Nesvizhskii

Affiliation

¹ Department of Pathology and Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

PMID: 18078310
DOI: 10.1021/pr7006818

Abstract

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry-based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
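
To illustrate why decoy identifications are useful, the following is a minimal sketch of the simple counting estimate of the false discovery rate that the target-decoy strategy makes possible. It is not the variable component or semiparametric mixture model described in the abstract. It assumes target and decoy databases of equal size, so that the number of decoy matches above a score threshold approximates the number of incorrect target matches above it. The function name `estimate_fdr` and the scores are hypothetical and made up for illustration.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Assumption (not from the source): equal-sized target and decoy
    databases, so decoy matches above the threshold stand in for
    incorrect target matches above the threshold.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        # No accepted target matches, so there are no discoveries to be false.
        return 0.0
    # Cap at 1.0 because a rate cannot exceed 100%.
    return min(1.0, n_decoy / n_target)


# Hypothetical search scores, invented for this example only.
target_scores = [4.1, 3.8, 3.5, 2.9, 2.2, 1.7, 1.5, 1.1]
decoy_scores = [2.4, 1.9, 1.6, 1.2, 1.0, 0.8, 0.7, 0.5]

fdr = estimate_fdr(target_scores, decoy_scores, threshold=2.0)
print(f"Estimated FDR at threshold 2.0: {fdr:.2f}")
```

A counting estimate like this gives one error rate per threshold. The mixture modeling approaches named in the abstract instead aim at a probability for each individual identification, which is why they need a sufficient number of decoy matches to characterize the score distribution of incorrect identifications.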

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Complex Mixtures*
Databases, Protein
Information Storage and Retrieval*
Models, Statistical
Peptides / analysis*
Proteomics / methods*
Software
Tandem Mass Spectrometry*

Substances

Complex Mixtures
Peptides
