J Proteome Res. 2018 Dec 7;17(12):4160-4170. doi: 10.1021/acs.jproteome.8b00392. Epub 2018 Sep 17.

Large-Scale Reanalysis of Publicly Available HeLa Cell Proteomics Data in the Context of the Human Proteome Project.

Robin T(1,2,3,4), Bairoch A(1,4), Müller M(5), Lisacek F(2,3,6), Lane L(1,4).

Author information

1. CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland.
2. Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland.
3. Computer Science Department, University of Geneva, CH-1211 Geneva, Switzerland.
4. Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva, Switzerland.
5. Vital-IT Group, SIB Swiss Institute of Bioinformatics, Genopode Building, Quartier Sorge, CH-1015 Lausanne, Switzerland.
6. Section of Biology, University of Geneva, CH-1211 Geneva, Switzerland.


The practice of data sharing in the proteomics field took off and quickly spread in recent years as a result of collective effort. Nowadays, most journal editors mandate the submission of the original raw mass spectra to one of the databases of the ProteomeXchange consortium. With the exception of large institutional initiatives such as PeptideAtlas or the GPMDB, however, few new studies are based on the reanalysis of mass spectrometry data. A wealth of information is thus left unexploited in public databases and repositories. Here, we present the large-scale reanalysis of 41 publicly available data sets corresponding to experiments carried out on the HeLa cancer cell line using a custom workflow. In addition to the search for new post-translational modification sites and "missing proteins", our main goal is to identify single amino acid variants and evaluate their impact on protein expression and stability through the spectral counting quantification approach. The X!Tandem software was selected to search a total of 56 363 701 tandem mass spectra against a customized variant protein database, compiled by applying the in-house MzVar tool to HeLa-specific somatic and genomic variants retrieved from the COSMIC cell line project. After filtering the resulting identifications with a 1% FDR threshold computed at the protein level, 49 466 unique peptides were identified in 7266 protein entries, allowing the validation of 5576 protein entries in accordance with the HPP guidelines version 2.1. A new "missing protein" was observed (FRAT2, NX_O75474, chromosome 10), and 189 new phosphorylation and 392 new protein N-terminal acetylation sites could be identified. Twenty-four variant peptides were also identified, corresponding to 21 variants in 21 proteins. For three of the nine heterozygous cases where both the variant peptide and its wild-type counterpart were detected, the application of a two-tailed sign test showed a significant difference in the abundance of the two peptide versions.
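
The two-tailed sign test mentioned above can be illustrated with a minimal sketch. It assumes the comparison is made on paired spectral counts, one pair per data set in which the peptide position was observed; the abstract does not state the pairing unit, so that choice, the function name, and the example counts are all hypothetical and are not taken from the study.

```python
from math import comb


def sign_test_two_tailed(variant_counts, wildtype_counts):
    """Two-tailed sign test on paired counts (illustrative sketch only).

    Each pair is assumed to hold the spectral counts of the variant
    peptide and of its wild-type counterpart in the same data set.
    Ties carry no sign information and are discarded.
    """
    if len(variant_counts) != len(wildtype_counts):
        raise ValueError("count lists must be paired")

    diffs = [v - w for v, w in zip(variant_counts, wildtype_counts)]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    n = n_pos + n_neg
    if n == 0:
        return 1.0  # only ties: no evidence of a difference

    # Under the null hypothesis each sign is + or - with probability 0.5,
    # so the smaller sign count follows Binomial(n, 0.5).
    k = min(n_pos, n_neg)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Hypothetical spectral counts across ten data sets (not real data).
variant = [1, 2, 0, 3, 1, 2, 0, 1, 2, 1]
wildtype = [5, 6, 4, 7, 3, 6, 2, 4, 5, 6]
p_value = sign_test_two_tailed(variant, wildtype)
print(f"two-tailed sign test p-value: {p_value:.6f}")
```

The sign test uses only the direction of each paired difference, not its size, which makes it a conservative choice for low, noisy spectral counts.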


Keywords: HeLa cell line; N-acetylation; bioinformatics; data reanalysis; identification; mass spectrometry; phosphorylation; proteomics; spectral counting; variants
