J Proteome Res. 2015 Nov 6;14(11):4450-62. doi: 10.1021/pr501244v. Epub 2015 Oct 13.

De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra.

Author information

Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia.
Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., Saint Petersburg 199034, Russia.
Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, United States.
Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands.
Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States.
Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States.


De novo sequencing of proteins and peptides is one of the most important problems in mass spectrometry-driven proteomics. A variety of methods have been developed to accomplish this task from a set of bottom-up tandem (MS/MS) mass spectra. However, the more recently emerged top-down technology, which is steadily gaining popularity, opens new perspectives for protein analysis and characterization and calls for efficient algorithms to process this kind of MS/MS data. Here, we describe a method that retrieves long and accurate sequence fragments of the proteins contained in a sample from a set of top-down MS/MS spectra. To this end, we outline a strategy for generating high-quality sequence tags from top-down spectra and introduce the concept of a T-Bruijn graph, which adapts the notion of an A-Bruijn graph, widely used in genomics, to the case of tags. The output of the proposed approach is the set of amino acid strings spelled out by optimal paths in the connected components of a T-Bruijn graph. We illustrate its performance on top-down data sets acquired from carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
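
To make the notion of a sequence tag concrete, the following is a minimal Python sketch of generic tag extraction from a single deconvoluted spectrum. It is not the tag-generation strategy or the T-Bruijn graph construction of the paper, whose details the abstract does not give. It assumes a plain list of monoisotopic fragment masses, links two peaks when their mass difference matches a standard amino acid residue mass, and reports the longest chain of such links. The function name `longest_tag`, the tolerance `tol`, and the example peak list are hypothetical choices made for illustration.

```python
# Standard monoisotopic residue masses in Da.
# I is omitted because it is isobaric with L and cannot be told apart by mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_tag(masses, tol=0.01):
    """Return the longest residue string spelled by a chain of peaks.

    Assumption: `masses` holds fragment masses of one ion series from a
    single spectrum; `tol` is an absolute mass tolerance in Da.
    """
    peaks = sorted(masses)
    # best[j] is the longest tag found so far that ends at peak j.
    best = [""] * len(peaks)
    for j in range(len(peaks)):
        for i in range(j):
            diff = peaks[j] - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                # Extend the tag ending at peak i if the gap fits one residue.
                if abs(diff - mass) <= tol and len(best[i]) + 1 > len(best[j]):
                    best[j] = best[i] + aa
    return max(best, key=len, default="")


if __name__ == "__main__":
    # Hypothetical peak list: a ladder spelling PEPT plus one noise peak.
    example = [500.0, 597.05276, 650.0, 726.09535, 823.14811, 924.19579]
    print(longest_tag(example))  # prints "PEPT"
```

The sketch handles one spectrum in isolation. The method described above goes further by combining tags from many spectra in a T-Bruijn graph and reading sequences off optimal paths in its connected components.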


T-Bruijn graph; de novo sequencing; top-down mass spectrometry

[Indexed for MEDLINE]
