Proteins. 2000 Oct 1;41(1):98-107.

Practical limits of function prediction.

Author information

Protein Design Group, CNB-CSIC, Madrid, Spain.


The widening gap between known protein sequences and their functions has led to the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated. We present here a critical view of the theoretical and practical bases for this approach. The results obtained by analyzing a significant number of true sequence similarities, derived directly from structural alignments, point to the complexity of function prediction. Different aspects of protein function, including (i) enzymatic function classification, (ii) functional annotations in the form of key words, (iii) classes of cellular function, and (iv) conservation of binding sites, can only be reliably transferred between similar sequences to a modest degree. The reason for this difficulty is a combination of the unavoidable database inaccuracies and the plasticity of protein function. In addition, analysis of the relationship between sequence and functional descriptions defines an empirical limit for pairwise-based functional annotations: the first three of the six numbers used as descriptors of protein folds in the FSSP database can be predicted at an average level as low as 7.5% sequence identity; two of the four EC digits at 15% identity; half of the SWISS-PROT key words related to protein function at 20% identity; and half of the residues in the binding site at the 30% sequence identity level.
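
The four empirical limits quoted in the abstract can be read as a simple lookup from pairwise sequence identity to the descriptors that might be transferred. The following is a minimal sketch only, not the authors' method: the names `THRESHOLDS` and `transferable_descriptors` are hypothetical, and treating the reported average levels as hard cutoffs is a simplifying assumption.

```python
from typing import List

# Average identity levels (percent) quoted in the abstract, paired with the
# descriptor each one refers to. Using them as hard cutoffs is an assumption
# made for illustration; the abstract reports them as average levels.
THRESHOLDS = [
    (7.5, "first three of six FSSP fold numbers"),
    (15.0, "two of four EC digits"),
    (20.0, "half of SWISS-PROT function key words"),
    (30.0, "half of binding-site residues"),
]


def transferable_descriptors(identity_pct: float) -> List[str]:
    """Return the descriptors whose quoted identity level is met or exceeded."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity_pct must be between 0 and 100")
    return [label for cutoff, label in THRESHOLDS if identity_pct >= cutoff]


if __name__ == "__main__":
    for pct in (5.0, 18.0, 35.0):
        print(pct, transferable_descriptors(pct))
```

Under these assumptions, a pair of sequences at 18% identity would meet the fold and EC levels but not the key-word or binding-site levels.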

[Indexed for MEDLINE]
