Matching cross-linked peptide spectra: only as good as the worse identification

Mol Cell Proteomics. 2014 Feb;13(2):420-34. doi: 10.1074/mcp.M113.034009. Epub 2013 Dec 12.

Abstract

Chemical cross-linking mass spectrometry identifies interacting surfaces within a protein assembly by labeling with bifunctional reagents and then identifying the covalently modified peptides. These yield distance constraints that provide a powerful means to model the three-dimensional structure of the assembly. Bioinformatic analysis of cross-linked data resulting from large protein assemblies is challenging because each cross-linked product contains two covalently linked peptides, each of which must be correctly identified from a complex matrix of potential confounders. Protein Prospector addresses these issues through a complementary mass modification strategy in which each peptide is searched and identified separately. We demonstrate this strategy with an analysis of RNA polymerase II. False discovery rates (FDRs) are assessed via comparison of cross-linking data to crystal structure, as well as by using a decoy database strategy. Parameters that are most useful for positive identification of cross-linked spectra are explored. We find that fragmentation spectra generally contain more product ions from one of the two peptides constituting the cross-link. Hence, metrics reflecting the quality of the spectral match to the less confident peptide provide the most discriminatory power between correct and incorrect matches. A support vector machine model was built to further improve classification of cross-linked peptide hits. Furthermore, the frequency with which peptides cross-linked via common acylating reagents fragment to produce diagnostic, cross-linker-specific ions is assessed. The threshold for successful identification of the cross-linked peptide product depends upon the complexity of the sample under investigation. Protein Prospector, by focusing the reliability assessment on the less confident peptide, is better able to control the FDR as larger complexes and databases are analyzed.
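To make the "less confident peptide" idea concrete, here is a minimal Python sketch. It assumes each candidate match already carries one score per linked peptide and a decoy flag; the names `CrossLinkMatch`, `crosslink_score`, `estimated_fdr`, and `score_threshold` are hypothetical, and the FDR estimate is a plain decoys-over-targets count. It is not Protein Prospector's actual scoring or its support vector machine model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrossLinkMatch:
    # Hypothetical per-spectrum record: one match score per linked peptide.
    score_a: float
    score_b: float
    is_decoy: bool  # assumed True if either peptide came from the decoy database


def crosslink_score(m: CrossLinkMatch) -> float:
    # Judge the whole cross-link by its weaker half: the less confident peptide.
    return min(m.score_a, m.score_b)


def estimated_fdr(matches: List[CrossLinkMatch], threshold: float) -> float:
    # Simplified target-decoy estimate: decoys / targets among matches at or above threshold.
    passing = [m for m in matches if crosslink_score(m) >= threshold]
    decoys = sum(1 for m in passing if m.is_decoy)
    targets = len(passing) - decoys
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_threshold(matches: List[CrossLinkMatch], max_fdr: float = 0.05) -> Optional[float]:
    # Scan candidate cutoffs from low to high; return the first with estimated FDR <= max_fdr.
    for t in sorted({crosslink_score(m) for m in matches}):
        if estimated_fdr(matches, t) <= max_fdr:
            return t
    return None


# Toy values for illustration only (not data from the paper).
matches = [
    CrossLinkMatch(30.0, 8.0, False),
    CrossLinkMatch(25.0, 22.0, False),
    CrossLinkMatch(20.0, 18.0, False),
    CrossLinkMatch(28.0, 6.0, True),
    CrossLinkMatch(12.0, 10.0, True),
]
print(score_threshold(matches, max_fdr=0.05))  # 18.0
```

In the toy data, ranking by the better peptide would place the decoy whose best peptide scores 28 above two genuine targets; ranking by the weaker peptide drops it to the bottom of the list.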
In addition, when FDR thresholds are calculated separately for intraprotein and interprotein results, a further improvement in the number of unique cross-links confidently identified is achieved. These improvements are demonstrated on two previously published cross-linking datasets.
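The per-class thresholding can be sketched in the same hedged spirit. The Python below assumes each match has already been reduced to the score of its less confident peptide and labeled intraprotein or interprotein; `Match`, `fdr_threshold`, `per_class_thresholds`, and `accepted_targets` are hypothetical names, and the decoys-over-targets estimate is a simplification. One common motivation for splitting is that interprotein candidates come from a far larger search space than intraprotein ones, so a single pooled cutoff can be too strict for one class and too lenient for the other.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Match(NamedTuple):
    # Hypothetical record: score of the less confident peptide, link class, decoy flag.
    score: float
    interprotein: bool
    is_decoy: bool


def fdr_threshold(matches: List[Match], max_fdr: float = 0.05) -> Optional[float]:
    # Lowest cutoff whose simplified decoys/targets estimate is <= max_fdr.
    for t in sorted({m.score for m in matches}):
        passing = [m for m in matches if m.score >= t]
        decoys = sum(m.is_decoy for m in passing)
        targets = len(passing) - decoys
        if targets and decoys / targets <= max_fdr:
            return t
    return None


def per_class_thresholds(matches: List[Match], max_fdr: float = 0.05) -> Dict[str, Optional[float]]:
    # Split by link class, then choose a separate cutoff for each class.
    groups: Dict[str, List[Match]] = defaultdict(list)
    for m in matches:
        groups["inter" if m.interprotein else "intra"].append(m)
    return {cls: fdr_threshold(ms, max_fdr) for cls, ms in groups.items()}


def accepted_targets(matches: List[Match], cutoffs: Dict[str, Optional[float]]) -> int:
    # Count target matches that pass the cutoff for their own class.
    count = 0
    for m in matches:
        cut = cutoffs["inter" if m.interprotein else "intra"]
        if cut is not None and not m.is_decoy and m.score >= cut:
            count += 1
    return count


# Toy values for illustration only (not data from the paper).
matches = [
    Match(15.0, False, False), Match(14.0, False, False),
    Match(12.0, False, False), Match(5.0, False, True),
    Match(20.0, True, False), Match(16.0, True, False), Match(9.0, True, False),
    Match(13.0, True, True), Match(8.0, True, True),
]

pooled = fdr_threshold(matches)
separate = per_class_thresholds(matches)
print(pooled, accepted_targets(matches, {"intra": pooled, "inter": pooled}))  # 14.0 4
print(separate, accepted_targets(matches, separate))  # {'intra': 12.0, 'inter': 16.0} 5
```

With these toy numbers the pooled cutoff admits four targets while the per-class cutoffs admit five, the same direction of effect the abstract reports; the size of any real gain depends on the dataset.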

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Combinatorial Chemistry Techniques / methods
  • Computational Biology
  • Cross-Linking Reagents / pharmacology*
  • Databases, Protein / standards
  • Humans
  • Mass Spectrometry / methods*
  • Models, Molecular
  • Peptide Fragments / chemistry
  • Peptide Fragments / metabolism
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Proteins / analysis*
  • Proteins / metabolism
  • Reproducibility of Results
  • Research Design

Substances

  • Cross-Linking Reagents
  • Peptide Fragments
  • Proteins