Mol Cell Proteomics. 2012 Oct;11(10):933-44. Epub 2012 Jul 5.

A proteogenomic survey of the Medicago truncatula genome.

Author information

1. Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.

Abstract

Peptide sequencing by computational assignment of tandem mass spectra to a database of putative protein sequences provides an independent approach to confirming or refuting protein predictions based on large-scale DNA and RNA sequencing efforts. This use of mass spectrometrically-derived sequence data for testing and refining predicted gene models has been termed proteogenomics. We report herein the application of proteogenomic methodology to a database of 10.9 million tandem mass spectra collected over a period of two years from proteolytically generated peptides isolated from the model legume Medicago truncatula. These spectra were searched against a database of predicted M. truncatula protein sequences generated from public databases, in silico gene model predictions, and a whole-genome six-frame translation. This search identified 78,647 distinct peptide sequences, and a comparison with the publicly available proteome from the recently published M. truncatula genome supported translation of 9,843 existing gene models and identified 1,568 novel peptides suggesting corrections or additions to the current annotations. Each supporting and novel peptide was independently validated using mRNA-derived deep sequencing coverage and an overall correlation of 93% between the two data types was observed. We have additionally highlighted examples of several aspects of structural annotation for which tandem MS provides unique evidence not easily obtainable through typical DNA or RNA sequencing. Proteogenomic analysis is a valuable and unique source of information for the structural annotation of genomes and should be included in such efforts to ensure that the genome models used by biologists mirror as accurately as possible what is present in the cell.
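The six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`six_frame_translation`, `peptide_frames`) and the toy DNA sequence are hypothetical, the standard genetic code is assumed, and a plain substring lookup stands in for a real tandem MS database search, which scores spectra against candidate peptides rather than matching strings.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands at offsets 0, 1 and 2 (hypothetical helper)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def peptide_frames(peptide, dna):
    """List the frames whose translation contains the peptide (exact match)."""
    return [
        name
        for name, protein in six_frame_translation(dna).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    toy_dna = "ATGGCCAAGTAA"  # made-up sequence, for illustration only
    for name, protein in six_frame_translation(toy_dna).items():
        print(name, protein)
    print(peptide_frames("MAK", toy_dna))
```

A peptide that matches only a frame or region lacking an annotated gene model is the kind of evidence the abstract calls a novel peptide. Real searches must also handle splice junctions, which a six-frame translation of genomic DNA cannot represent.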

PMID: 22774004
PMCID: PMC3494139
DOI: 10.1074/mcp.M112.019471
[Indexed for MEDLINE]
Free PMC Article
