Anal Chem. 2000 Mar 1;72(5):999-1005.

A statistical basis for testing the significance of mass spectrometric protein identification results.

Author information

The Rockefeller University, New York, New York 10021, USA.

Abstract

A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect matches between the proteolytic peptide masses in an MS peptide map and the theoretical proteolytic peptide masses of the proteins in a genome database. The number of matching masses is used to compute a score, S, for each protein, and the protein that yields the best score is taken as the identification result. There is a risk of obtaining a false result because masses determined by MS are not unique; i.e., each mass in a peptide map can randomly match one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be distinguished from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject the null hypothesis, H0: "the result is false". The significance is tested by comparing an experimental score, S(E), with a critical score, S(C), required for a significant result at the level α. If S(E) ≥ S(C), H0 is rejected. f(S) and S(C) were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, S(C), was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With S(C) known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
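
The testing procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the simplest possible score (the raw count of matching masses), a database represented as one sorted list of theoretical peptide masses per protein, and random peptide maps drawn from the pooled masses of the whole database. All function names, the tolerance parameter `tol`, and the sampling scheme are hypothetical choices made for illustration.

```python
import bisect
import random


def count_matches(map_masses, protein_masses, tol):
    """Assumed score S: number of map masses lying within +/- tol of a
    theoretical mass. protein_masses must be sorted in ascending order."""
    hits = 0
    for m in map_masses:
        i = bisect.bisect_left(protein_masses, m - tol)
        if i < len(protein_masses) and protein_masses[i] <= m + tol:
            hits += 1
    return hits


def best_score(map_masses, database, tol):
    """Best score over all proteins in the database."""
    return max(count_matches(map_masses, p, tol) for p in database)


def random_map(database, n_masses, rng):
    """Assumed null model: a peptide map whose masses are sampled from the
    pooled masses of the whole database rather than from one protein."""
    pool = [m for protein in database for m in protein]
    return rng.sample(pool, n_masses)


def null_scores(database, n_masses, tol, n_trials, rng):
    """Simulated sample of best scores from random maps; its histogram is an
    empirical stand-in for the frequency function f(S)."""
    return [
        best_score(random_map(database, n_masses, rng), database, tol)
        for _ in range(n_trials)
    ]


def critical_score(scores, alpha):
    """Smallest integer score s such that the fraction of simulated random
    scores >= s does not exceed alpha."""
    n = len(scores)
    s = min(scores)
    while sum(1 for x in scores if x >= s) / n > alpha:
        s += 1
    return s


def is_significant(s_e, s_c):
    """Reject H0 ("the result is false") when S(E) >= S(C)."""
    return s_e >= s_c
```

In use, one would call `null_scores` with the number of masses and the mass tolerance of the actual experiment, pass the result to `critical_score` with the chosen α, and compare the experimental score with `is_significant`. Because the critical score depends on those experimental constraints, the simulation has to be repeated, or tabulated in advance, for each combination of them.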

PMID: 10739204
DOI: 10.1021/ac990792j
[Indexed for MEDLINE]
