J Proteome Res. 2018 Jan 5;17(1):290-295. doi: 10.1021/acs.jproteome.7b00563. Epub 2017 Nov 2.

PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search.

Author information

1. University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria.
2. Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria.
3. Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria.

Abstract

Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. Proteins carrying post-translational modifications such as phosphorylation are typically identified by allowing variable modifications in the searched sequences. Accounting for these variations exponentially increases the combinatorial search space, which leads to longer processing times and more false-positive identifications. PhoStar, the tool presented here, uses a supervised machine learning approach to identify spectra that originate from phosphorylated peptides before database search. The phosphorylation prediction model was trained and validated, with an accuracy of 97.6%, on a large set of high-confidence spectra collected from publicly available experimental data. Its performance was further validated by predicting phosphorylation in the complete NIST human and mouse higher-energy collisional dissociation (HCD) spectral libraries, achieving accuracies of 98.2% and 97.9%, respectively. We demonstrate the application of PhoStar by using it to filter spectra before database search. In database searches of HeLa samples, the peptide search space was reduced by 27-66% while at least 97% of the total peptide identifications (at 1% FDR) of a standard workflow were still found.
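
The growth of the search space with variable modifications can be illustrated with a small counting sketch. It is not taken from the paper: the residue set (S, T, Y), the function names, and the example peptides are assumptions chosen for illustration, and the count ignores everything a real search engine does beyond enumerating phosphorylated forms.

```python
from math import comb
from typing import Iterable, Optional

# Assumption: phosphorylation is searched as a variable modification on S, T and Y.
PHOSPHO_RESIDUES = frozenset("STY")


def count_phospho_forms(peptide: str, max_mods: Optional[int] = None) -> int:
    """Count candidate forms of a peptide, including the unmodified one.

    Each candidate site is either phosphorylated or not, so a peptide with
    n sites has 2**n forms. max_mods optionally caps the number of
    phosphate groups allowed per peptide.
    """
    n_sites = sum(residue in PHOSPHO_RESIDUES for residue in peptide)
    limit = n_sites if max_mods is None else min(max_mods, n_sites)
    return sum(comb(n_sites, k) for k in range(limit + 1))


def search_space(peptides: Iterable[str], allow_phospho: bool,
                 max_mods: Optional[int] = None) -> int:
    """Total number of candidate forms a search would have to consider."""
    if not allow_phospho:
        # Without the variable modification, each peptide has one form.
        return sum(1 for _ in peptides)
    return sum(count_phospho_forms(p, max_mods) for p in peptides)


# Hypothetical peptides, for illustration only.
peptides = ["TESTPEPTIDEK", "ALGVIK", "SYSTK"]

print(search_space(peptides, allow_phospho=False))             # 3
print(search_space(peptides, allow_phospho=True))              # 33
print(search_space(peptides, allow_phospho=True, max_mods=2))  # 23
```

Under these assumptions, a spectrum classified as non-phosphorylated would only need to be matched against the smaller unmodified space, while a spectrum classified as phosphorylated would be searched with the variable modification enabled.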

KEYWORDS: machine learning; mass spectrometry; phosphorylation; post-translational modification; proteomics; random forest classification; search space reduction

PMID: 29057658
DOI: 10.1021/acs.jproteome.7b00563
[Indexed for MEDLINE]
