Send to

Choose Destination
J Proteome Res. 2018 May 4;17(5):1879-1886. doi: 10.1021/acs.jproteome.7b00899. Epub 2018 Apr 16.

A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.

Author information

Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health , KTH - Royal Institute of Technology , Box 1031 , 17121 Solna , Sweden.
European Molecular Biology Laboratory , European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus , Hinxton, Cambridge CB10 1SD , United Kingdom.
Biological Sciences Division , Pacific Northwest National Laboratory , Richland , Washington 99352 , United States.
Institute for Systems Biology , Seattle , Washington 98109 , United States.
Center for Proteomics and Metabolomics , Leiden University Medical Center , 2300 RC Leiden , The Netherlands.


A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.


benchmark; homology; mass spectrometry; peptide; protein inference; protein standard; proteofom; proteomics; sample of known content

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for American Chemical Society Icon for PubMed Central
Loading ...
Support Center