Genome Res. 2019 Mar;29(3):485-493. doi: 10.1101/gr.235028.118. Epub 2019 Jan 4.

PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations.

Wen B(1,2), Wang X(1,2), Zhang B(1,2).

Author information

1. Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030, USA.
2. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

Abstract

Massively parallel or second-generation sequencing-based genomic studies continuously identify new genomic alterations that may lead to novel protein sequences, which are attractive candidates for disease biomarkers and therapeutic targets after proteomic validation. Integrative proteogenomic methods have been developed to use mass spectrometry (MS)-based proteomics data for such validation. These methods replace the reference sequence database in proteomic database searching with a customized protein database that incorporates sample- or disease-specific sequences derived from DNA or RNA sequencing, thus enabling the identification of novel protein sequences. Although useful, this spectrum-centric approach requires a full evaluation of all possible spectrum-peptide pairs, which is time-consuming, error-prone, and difficult to apply. Here, we present PepQuery, a peptide-centric approach that focuses on only novel DNA or protein sequences of interest. PepQuery allows quick and easy proteomic validation of genomic alterations without customized database construction. We demonstrated the sensitivity and specificity of the approach in validating completely novel proteins, novel splice junctions, and single amino acid variants using simulations and experimental data. Notably, enabling unrestricted modification searching in PepQuery reduced false positives by up to 95%. We implemented PepQuery as both web-based and stand-alone applications. The web version provides direct access to more than half a billion MS/MS spectra from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteomic studies. The stand-alone version supports batch analysis and user-provided MS/MS data. PepQuery will increase the usage of proteogenomics beyond the proteomics community and will broaden the application of proteogenomics in personalized medicine.

PMID: 30610011
DOI: 10.1101/gr.235028.118
