A measure of success in fold recognition

Aron Marchler-Bauer, Stephen H. Bryant

Trends in Biochemical Sciences 1997 22: 236-240

(reproduced with permission)

Prediction of protein structure by fold recognition, or threading, was recently put to the test in a "blind" structure prediction experiment, CASP2. Thirty-two teams from around the world participated, preparing predictions for 22 different "target" proteins whose structures were soon to be determined. As experimental structures became available we, as organizers of the threading competition, computed objective measures of fold recognition specificity and model accuracy, to identify and characterize successful predictions. Here we present a brief summary of these prediction evaluations, a tally of "correct" predictions and a discussion of factors associated with correct predictions. We find that threading produced specific recognition and accurate models whenever the structural database contained a template spanning a large fraction of target sequence. Presence of conserved sequence motifs was helpful, but not required, and it would appear that threading can succeed whenever similarity to a known structure is sufficiently extensive.

The past few years have been a time of active research in computational methods for protein threading, or fold recognition. This work is motivated by biologists' need for the greatest possible sensitivity in sequence database searches, which increasingly serve as a means to identify molecular function by inference from that of homologs. Threading replaces a database of protein sequences with a database of known 3-dimensional structures. Since 3-dimensional structure is highly conserved in protein evolution, threading may identify a homolog even when sequence similarity per se has dropped into the "twilight zone" of detectability. Indeed, in a few cases threading searches have already identified remote homologs when other methods failed[1,2]. Computational biologists are also interested in threading as a means to make progress on the long-standing problem of protein structure prediction. Homology modeling, or prediction based on sequence alignment with a known structure, is known to produce accurate models when sequences are similar enough to align correctly[3]. If threading can generate accurate sequence-structure alignments when sequence similarity is low or undetectable, it may extend prediction by homology to a greater proportion of new protein sequences.

Structure prediction workshops

Threading algorithms are a complex matter, of course, and evaluation of their performance even more complex. Furthermore, in their efforts to characterize threading methods authors have generally employed different test sets and accuracy criteria, and it is difficult to make comparisons (for reviews see [4,5,6,7,8]). To provide objective measures of success and to identify the best prediction methods John Moult and colleagues have therefore organized two workshops on "Critical Assessment of Structure Prediction", CASP1[9,10,11] and CASP2[12,13,14,15]. In advance of the workshops groups working on threading and other methods were asked to make predictions for sequences whose structures were as yet unknown, but were expected to be determined soon. At the workshops these "blind" structure predictions were compared to experimental results, using the best evaluation criteria that the meeting organizers and independent assessors could devise. Needless to say, these "community experiments" have generated a certain competitive spirit among prediction groups and, one might hope, some contribution to progress in fold recognition!

CASP1, held in Asilomar, California in December 1994, showed several examples of successful fold recognition, though it also raised questions about the accuracy of sequence-structure alignments[16]. For CASP2, held in Asilomar in December 1996, we as organizers of the threading "competition" have attempted to address this latter issue by providing quantitative evaluation of both fold recognition specificity and threading model accuracy. To make this possible we asked prediction teams to submit predictions (for all targets showing no obvious sequence similarity to known structures) in a pre-defined, machine-readable format. When the experimental structures became available we then automatically computed several measures of recognition specificity and model accuracy, and presented these results for analysis and discussion at the workshop. A detailed assessment of CASP2 results will appear this fall in a special issue of Proteins: Structure, Function and Genetics, including a comparative assessment of different groups' results by Michael Levitt, the "judge" of the threading competition. Here we provide a brief summary, and with all due respect to the prediction teams we avoid entirely any naming of "winners" or "losers". Instead we address a question of perhaps more general interest: How successful were fold recognition techniques as a whole, in blind prediction, and what progress has been made in the two years since CASP1?

The short answer, as the reader will see, is that threading methods have produced accurate models for all CASP2 targets showing sufficient similarity to a known structure in the database. This is clear evidence that threading can succeed in the "twilight zone" of sequence similarity, and furthermore that these methods have progressed with respect to alignment accuracy since CASP1. It would seem that the CASP2 experiment has indeed provided a measure of success in threading prediction.

A Large-Scale Community Experiment

The most important ingredient for CASP2, naturally, was a set of target sequences whose structures were soon to be determined. Here we cannot thank enough the crystallographers and NMR spectroscopists who advised us of proteins under investigation in their laboratories and later, for the evaluation process, provided atomic coordinates, often in advance of publication. With these groups' generous assistance we were able to assemble 22 suitable threading targets, each showing no significant similarity, by BLAST[17] and other pairwise sequence comparison methods, to structures then in the Protein Data Bank[18]. Fifteen of these structures were determined in time for the workshop and formed the sample for which blind prediction results could be evaluated.

An equally important ingredient for CASP2 was a "standard of truth", a way to identify, for the evaluation process, those targets which indeed possess a fold similar to that of structures in the database. Here we must thank groups who provided in a timely and machine-readable form the results of 3 different structure-structure comparison methods, DALI[19], SSAP[20] and VAST[21], which collectively served as the "jury" for determination of structural similarities. Though these methods differ, this jury was nonetheless essentially unanimous in identifying 7 of the 15 targets as having strong similarity to one or more known structures. As shown in figure 1, these 7 structures also provided a fortunate test set for titration of fold recognition performance. The targets ranged from "easy" to "medium" to "hard", based on the nature and extent of their similarity to structures in the database.

Figure 1

MOLSCRIPT[24] diagrams of the 7 threading targets with structural similarity to other proteins in the database. These are labeled according to identification codes used by CASP2 predictors, T04, T31, etc. The portion of the target structures that can be superimposed on the best-matching database protein is colored red. For "easy" prediction targets this common substructure comprises 60% or more of the target and characteristic sequence motifs are present, as indicated in green. For T04, Polyribonucleotide Nucleotidyltransferase (84 residues)[25], this motif includes residues Phe22, Val23, His34, Ser36, and Ile38, parts of which are conserved in the RNP-1 motif of cold-shock proteins[26]. For T31, Exfoliative Toxin A (242 residues)[27], this motif includes His72, Ser195 and Asp90, the catalytic triad of this serine protease. For "medium" prediction targets the common substructure comprises 60% or more of the target, but characteristic sequence motifs are absent. These targets include T02, Threonine Deaminase (74 residues) (Gallagher T., unpublished results), T14, 3-Dehydroquinase (252 residues) (Shrive A.K., Polikarpov I. and Sawyer L., unpublished results), and T38, Endoglucanase C (152 residues)[28]. For T02 an N-terminal domain homologous to tryptophan synthase is omitted from the diagram, as this region was not a threading target. The region shown includes a 2-fold repeated substructure recognized by sequence analysis, 86% of which may be superimposed on the database structure. For "hard" targets only a small domain of the target, not recognized by sequence analysis, may be superimposed on any database structure. These targets include T20, Ferrochelatase (320 residues)[29], and T22, L-Fucose-Isomerase (591 residues) (Seemann J. E. and Schulz G. E., unpublished results).

The most essential ingredient needed for a large scale fold-recognition experiment was of course predictions! Here we organizers had plenty of assistance: Thirty-two threading prediction teams submitted a total of 338 predictions for one or more of the 15 solved targets, in total more than 7000 models, since each prediction generally provided many alternatives. These prediction teams deserve the credit for anything learned from CASP2. By providing this large sample the predictors have made possible a quantitative evaluation of the current "state of the art" in fold recognition.

Measures of Specificity and Accuracy

For the threading competition CASP2 prediction teams were asked to provide a "hit list" identifying PDB structures predicted to be similar to the target, and the team's relative confidence in each alternative. Each team, in other words, was allowed a "bet" of 100 pennies which they could distribute as they chose among structures in PDB. Bets on "none of the above" were also allowed, and were in fact the correct answer for the 8 targets found not to be similar to any database structure. Fold Recognition Specificity, a primary evaluation quantity, is defined as the fraction of the bet placed on structures similar to the target, as identified by any of the jury of structure comparison methods.
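As an illustration only (this is not the evaluation software used for CASP2), the "bet" scoring described above can be sketched in a few lines of Python. The function name, the dictionary layout, the placeholder identifiers "hypo1" and "hypo2", and the choice to normalize by the total bet rather than assume exactly 100 pennies are all assumptions made for this sketch.

```python
def fold_recognition_specificity(hit_list, similar_structures):
    """Fraction of a team's bet placed on structures the jury judged similar.

    hit_list: dict mapping a structure identifier (or "none") to the bet on it.
    similar_structures: set of identifiers accepted by the structure-comparison
        jury; for a target with no similar structure this would be {"none"}.
    Normalizing by the total bet is an assumption of this sketch.
    """
    total = sum(hit_list.values())
    if total <= 0:
        raise ValueError("hit list carries no bet")
    on_similar = sum(bet for ident, bet in hit_list.items()
                     if ident in similar_structures)
    return on_similar / total


# Hypothetical hit list for a target whose jury-approved template is 1CSP.
# "hypo1" and "hypo2" are placeholders, not real PDB entries.
example_bets = {"1CSP": 40, "hypo1": 35, "hypo2": 15, "none": 10}
print(fold_recognition_specificity(example_bets, {"1CSP"}))
```

Under this sketch a team that spreads its bet evenly across five structures, one of them similar to the target, scores 20%.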

For each PDB structure in their hit list CASP2 prediction teams were also asked to supply a residue-by-residue alignment of the target sequence and database structure. This sequence-structure alignment is equivalent to a molecular model encompassing backbone atoms of the aligned region, and a primary measure of model accuracy is simply the alpha-carbon RMS (root mean square) Superposition Residual of that model as compared to the true structure. As a measure of global similarity, however, RMS can fail to detect threading models that are correct for only a portion of the target. For this reason we have also employed two measures that reflect, roughly speaking, the fraction of the predicted structure that was accurate. Threading Alignment Specificity is defined as the fraction of the aligned residue pairs in the threading alignment that agree (to within 4 residues) with the structure-structure alignment produced by any of the structure-comparison jury. Threading Contact Specificity is defined as the fraction of residue contacts (alpha-carbons less than 8 Angstroms apart) in the predicted threading model that are also present in the true structure of the target. For CASP2 predictions alignment and contact specificity were found to be highly correlated, but the latter is independent of the alignments produced by structure comparison and allowed us to verify that no similar "folds" (used in any threading models) were missed by the jury of structure comparison methods.
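The two specificity measures defined above can be illustrated with a minimal Python sketch. It is not the code used for the CASP2 evaluation, and it rests on several assumptions: alignments are represented as dictionaries from target residue number to template residue number, "agree to within 4 residues" is read as a shift of at most 4 template positions for the same target residue, the best-scoring jury member is used, and all residue pairs (including sequence neighbors) are counted as potential contacts.

```python
import math


def alignment_specificity(threading, jury_alignments, tolerance=4):
    """Fraction of threading-aligned pairs that agree with a jury alignment.

    threading and each jury alignment map target residue -> template residue.
    The score is taken from the jury member giving the highest value.
    """
    if not threading:
        return 0.0
    best = 0.0
    for structural in jury_alignments:
        agreeing = sum(
            1 for target_res, template_res in threading.items()
            if target_res in structural
            and abs(structural[target_res] - template_res) <= tolerance
        )
        best = max(best, agreeing / len(threading))
    return best


def contacts(ca_coords, cutoff=8.0):
    """Set of residue pairs whose alpha-carbons lie closer than the cutoff."""
    residues = sorted(ca_coords)
    pairs = set()
    for index, i in enumerate(residues):
        for j in residues[index + 1:]:
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                pairs.add((i, j))
    return pairs


def contact_specificity(model_ca, true_ca, cutoff=8.0):
    """Fraction of contacts in the model that also occur in the true structure.

    Both arguments map target residue number -> (x, y, z) in Angstroms.
    """
    model_contacts = contacts(model_ca, cutoff)
    if not model_contacts:
        return 0.0
    return len(model_contacts & contacts(true_ca, cutoff)) / len(model_contacts)


# Toy data, invented purely for illustration.
threading = {1: 10, 2: 11, 3: 12, 4: 20}
structural = {1: 12, 2: 13, 3: 14, 4: 30}
print(alignment_specificity(threading, [structural]))

model = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0)}
truth = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (3.8, 10.0, 0.0)}
print(contact_specificity(model, truth))
```

Note that contact specificity needs only the model and the experimental coordinates, which is why it can serve as a check that is independent of the structure-comparison jury.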

Table 1. CASP2 threading prediction results
Target   Relative    Number of  Template   Structural alignment            Best model
protein  difficulty  correct    structure  Length  Identity  RMS           Alignment  Contact      RMS
                     models     from PDB                     (Angstroms)   length     specificity  (Angstroms)
T04      easy        16         1CSP       61      24.5%     2.05          65         61.6%        2.97
T31      easy        14         1PPF E     188     16.0%     2.38          202        70.7%        4.16
T14      medium      7          1NAL 1     204     13.0%     2.72          132        33.0%        5.45
T38      medium      2          1BGL A     94      9.0%      3.50          98         53.9%        5.72
T02      medium      1          1PSD A     70      6.2%      1.87          64         65.0%        2.83
T20      hard        0          1TLF A     90      7.8%      2.71          203        28.8%        7.63
T22      hard        0          1LPB B     65      13.8%     2.18          93         9.3%         9.27

Target Protein: CASP2 target identification codes for the 7 targets with strong similarity to database structures. Relative Difficulty: A classification of the 7 target structures, as described in Figure 1. Number of Correct Models: The number of predictions which crossed thresholds of fold-recognition specificity and model accuracy as described in the text. Template Structure from PDB: The PDB identification code[18] of the database structure used as a template for the "best" threading model. These are: 1CSP, Major Cold Shock Protein[30], 1PPF E, Human Leukocyte Elastase[31], 1NAL 1, N-Acetylneuraminate Lyase[32], 1BGL A, Beta-Galactosidase[33], 1PSD A, Phosphoglycerate Dehydrogenase[34], 1TLF A, tryptic core fragment of the Lactose Repressor (Friedman A.M., Fischmann T.O. and Steitz T.A., data submission to PDB[18]) and 1LPB B, Pancreatic Lipase[35]. Note that all targets showed similarity to groups of related structures in the database, and many threading models counted here as "correct" were based on other members of these similar-structure groups. Structural Alignment Length: The number of target protein residues that may be structurally aligned with the database structure. Here and in Figure 2 structural alignments are taken from the member of the structure comparison "jury" that awarded the greatest alignment specificity to each prediction. Structural Alignment Identity: The percentage of aligned residue pairs, from this structural alignment, where residue types are identical. Structural Alignment RMS: The alpha-carbon RMS residual of the target and database structures, according to this structural alignment. Best Model Alignment Length: The length of the threading alignment. Best Model Contact Specificity: For the "correct" model with the lowest RMS residual, the percentage of residue-residue contacts in the model that are also present in the true structure of that target. Note that the best model for T20 exceeds the contact specificity threshold for a "correct" model (25%), but it is not counted as such because fold-recognition specificity was below the required threshold (20%). Best Model RMS: The alpha-carbon RMS residual of the "best" threading model as compared to the true structure of the target.

A Tally of Correct Predictions

To count the number of "correct" predictions one must define thresholds with respect to fold recognition specificity and model accuracy. This is a contentious subject, particularly for predictors who happen to fall on just the wrong side of whatever boundary is chosen! Below, in counting "correct" predictions, we employ a fold recognition specificity threshold of 20% or greater. This corresponds to "within the top 5" on a ranked hit list, since a bet spread evenly over five structures places 20% on each, and is intermediate between the "top 2" and "top 10" thresholds suggested by the CASP1 assessors[16]. Of the predictions for the 7 recognizable targets, 58% achieved a specificity this high. By this standard over half of the predictions passed the CASP2 "exam"!

For model accuracy we employ a contact specificity threshold of 25% and an alignment specificity threshold of 50%; a model is considered accurate if it surpasses either value. Both thresholds correspond closely to the average achieved across all threading models (based on a structure indeed similar to the target), and may be understood as "above average" model accuracy. 50% alignment specificity also has an intuitive interpretation: At least half of the residues positioned by threading alignment with a template structure in the database were indeed positioned correctly. Only about 13% of predictions crossed both the fold recognition specificity and model accuracy thresholds. This was a tougher grade to make: Only 13% of predictions got an "A" on the CASP2 "exam"!
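The combined criterion for a "correct" prediction can be written out as a short Python sketch. This is an illustration, not the scoring program used for CASP2: the function and constant names are invented, and treating a value exactly at a threshold as passing is an assumption, since the text says "20% or greater" for recognition specificity but "surpasses" for the accuracy thresholds.

```python
FOLD_SPECIFICITY_MIN = 0.20       # fold recognition specificity threshold
CONTACT_SPECIFICITY_MIN = 0.25    # contact specificity threshold
ALIGNMENT_SPECIFICITY_MIN = 0.50  # alignment specificity threshold


def is_accurate(contact_spec, alignment_spec):
    """A model counts as accurate if it reaches either accuracy threshold."""
    return (contact_spec >= CONTACT_SPECIFICITY_MIN
            or alignment_spec >= ALIGNMENT_SPECIFICITY_MIN)


def is_correct(fold_spec, contact_spec, alignment_spec):
    """A prediction is 'correct' only if recognition AND accuracy both pass."""
    return (fold_spec >= FOLD_SPECIFICITY_MIN
            and is_accurate(contact_spec, alignment_spec))


# The contact specificity 0.288 is the T20 value from Table 1; the fold
# recognition and alignment specificities below are hypothetical stand-ins.
print(is_correct(fold_spec=0.10, contact_spec=0.288, alignment_spec=0.30))
print(is_correct(fold_spec=0.40, contact_spec=0.288, alignment_spec=0.30))
```

The first call mirrors the situation noted for T20 in the caption to Table 1: an accurate enough model that is still not counted as "correct" because too little of the bet was placed on it.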

A tally of predictions crossing both recognition specificity and accuracy thresholds is listed by target in Table 1. While only 13% of predictions were this successful, there are still very many "correct" predictions: 40 models from 21 different prediction teams. Collectively these teams have achieved this level of recognition specificity and model accuracy for 5 of the 7 targets amenable to threading prediction. Among these models the "best" can be chosen as that with the lowest RMS versus the true structure of the target, to illustrate the global accuracy achieved. By this standard the predictions are indeed remarkably good for targets with no obvious sequence similarity to known structures: RMS values range from 3 to roughly 6 Angstroms. Few if any models from CASP1[16] were as accurate, and threading methods as a whole have clearly improved with respect to sequence-structure alignment accuracy.

A Recipe for Success?

Some trends are apparent from the data in Table 1. The most obvious is that the number of "correct" predictions declines with the difficulty of the target. There are 30 correct predictions for "easy" targets, 10 for "medium", and none at all for "hard". As described in the caption for Figure 1, "easy" targets are distinguished by the presence of characteristic sequence motifs. One can imagine that this factor contributed to the higher "bets" on the correct folds, as would consideration of these targets' conserved biological function. The "hard" targets are distinguished from "medium" and "easy" by lack of an extensive or global similarity with any database structure. It was suggested previously that specific fold recognition requires roughly 60% of target residues to be superimposable onto a database structure[22], and it would appear that blind predictions at CASP2 were successful for only (and all of) these cases.

Some of these evaluation results are plotted in Figure 2 in a way that shows how closely the threading predictions approached the best possible accuracy. The "RMS-gap" between the best threading model and the structure comparison result is large for the "hard" targets, where the template structure from the database corresponds to only a small domain of the target. But for the "easy" or "medium" targets the RMS-gap is between 1 and 2 Angstroms, values comparable to what is seen in homology modeling when sequence similarity is detectable but low[23]. There would also seem to be an association of lower RMS-gap with lower RMS of the true target structure when superimposed onto the database template. When RMS is low the structural environments of many residues are well conserved, and threading alignments can be more accurate, a trend also noted previously in control threading experiments [22]. It is also quite interesting that model accuracy does not appear to be limited by sequence similarity of the target and database protein. Target T02 has the lowest model RMS, for example, but a sequence identity in structural alignment of only 6%, well below the "twilight zone" of sequence similarity.

Figure 2

Accuracy of CASP2 model structures. RMS residuals of the "best" threading models are plotted versus the fraction of the target that may be structurally aligned with the template structure from the database. The extent of the superimposable substructure is defined as the ratio of structural alignment length (Table 1) to target sequence length (Figure 1). Red plotting symbols indicate the alpha-carbon RMS residual (in Angstrom units) of the predicted model as compared to the experimental structure of each target, a measure of model accuracy. Yellow symbols indicate the RMS residual from structural alignment of the target and database structures, the best accuracy that could be achieved by a threading model. The RMS-gap, or difference between these values, is indicated by the lengths of the dotted-line segments. The percentage of identical residues in each structural alignment is shown in parentheses. Values above 15% typically indicate the presence of conserved sequence motifs and are suggestive of evolutionary relationship.
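The two quantities plotted in Figure 2 can be recomputed from numbers quoted elsewhere in this article, as in the following Python sketch. The data layout and function names are assumptions made for illustration; target lengths are those given in the Figure 1 caption and the remaining values are copied from Table 1. The lengths used for the published plot may differ for some targets (for T02 the Figure 1 caption quotes 86% for the region shown), so the printed ratios should be read as approximate.

```python
# target: (target length from Figure 1, structural alignment length,
#          structural alignment RMS, best model RMS), RMS in Angstroms.
TARGETS = {
    "T04": (84, 61, 2.05, 2.97),
    "T31": (242, 188, 2.38, 4.16),
    "T14": (252, 204, 2.72, 5.45),
    "T38": (152, 94, 3.50, 5.72),
    "T02": (74, 70, 1.87, 2.83),
    "T20": (320, 90, 2.71, 7.63),
    "T22": (591, 65, 2.18, 9.27),
}


def superimposable_fraction(target):
    """Structural alignment length divided by target sequence length."""
    length, aligned, _, _ = TARGETS[target]
    return aligned / length


def rms_gap(target):
    """Best-model RMS minus the RMS of the structural alignment itself."""
    _, _, structural_rms, model_rms = TARGETS[target]
    return model_rms - structural_rms


for name in TARGETS:
    print(f"{name}: fraction {superimposable_fraction(name):.2f}, "
          f"RMS-gap {rms_gap(name):.2f} Angstroms")
```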

Perhaps the most surprising trend, however, is one that was widely expected but not observed. CASP2 results might have clearly identified the "best" threading method, but instead one finds that a variety of methods were almost equally successful, particularly with the "easy" targets. Threading success, it would seem, has more to do with targets than methods: Folds were specifically recognized and accurately modeled when the target was globally similar to a domain in the structure database, with 60% or more of its residues superimposable. Threaders might do well to focus their attention on ways to identify likely domain boundaries in target sequences, so that only domain subsequences need be considered. This was the recipe followed by one prediction team to produce the only "correct" (and superbly accurate) model for target T02.

Click on "Threading Now?"

While the success of the best predictions is impressive, a potential user of threading methods should bear in mind that this is still new territory for computational biologists. There remain problems with false positives, for example: Among predictions for the 8 targets not similar to any known structure fully 70% placed more than 50% confidence in incorrect models, rather than in "none of the above". Furthermore, most predictions were not accomplished by fully automatic procedures, if only because the human experts who produced them had not yet had time to incorporate their latest ideas into their computer programs. Lastly, a potential user of threading methods must bear in mind that threading models can at best approximate experimentally determined structures, since they are limited by the similarity of the structural template available in the database. At CASP2 threading models approached this goal, but RMS superposition residuals with respect to experimental structures were still in the range of 3-6 Angstroms. But if this is bad news with respect to easy availability of proven and accurate threading methods, there is also good news to go with it: Since threading remains an active research area one may expect further progress in recognition specificity, model accuracy and automation. If CASP2 is any guide, one might even "bet" on further measurable progress by CASP3, two years hence!


  1. Madej T., Boguski M.S. and Bryant S.H. (1995) FEBS Lett. 373, 13-18
  2. Matsuo Y. and Nishikawa K. (1994) FEBS Lett. 345, 23-26
  3. Johnson M.S., Srinivasan N., Sowdhamini R. and Blundell T.L. (1994) Crit. Rev. Biochem. Mol. Biol. 29, 1-68
  4. Fetrow J.S. and Bryant S.H. (1993) Bio/Technology 11, 479-484
  5. Bryant S.H. and Altschul S.F. (1995) Curr. Opin. Struct. Biol. 5, 236-244
  6. Torda A.E. (1997) Curr. Opin. Struct. Biol. 7, 200-205
  7. Jones D.T. and Thornton J.M. (1996) Curr. Opin. Struct. Biol. 6, 210-216
  8. Finkelstein A.V. (1997) Curr. Opin. Struct. Biol. 7, 60-71
  9. Moult J., Pedersen J.T., Judson R., and Fidelis K. (1995) Proteins 23, II-IV
  10. Shortle D. (1995) Nat. Struct. Biol. 2, 91-93
  11. Moult J. (1996) Curr. Opin. Biotechnol. 7, 422-427
  12. Pennisi E. (1996) Science 273, 426-428
  13. Eisenberg D. (1997) Nat. Struct. Biol. 4, 95-97
  14. Shortle D. (1997) Curr. Biol. 7, R151-R154
  15. Dunbrack R.L. Jr. et al. (1997) Fold. Design 2, R27-R42
  16. Lemer C. M.-R., Rooman M.J. and Wodak S.J. (1995) Proteins 23, 337-355
  17. Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J. (1990) J. Mol. Biol. 215, 403-410
  18. Abola E.E. et al. (1987) in Crystallographic Databases - Information Content, Software Systems, Scientific Applications (Allen, F.H., Bergerhoff, G. and Sievers, R., eds.), pp 107-132, International Union of Crystallography
  19. Holm L. and Sander C. (1996) Science 273, 595-602
  20. Orengo C.A. and Taylor W.R. (1996) Methods Enzymol. 266, 617-635
  21. Gibrat J.-F., Madej T. and Bryant S.H. (1996) Curr. Opin. Struct. Biol. 6, 377-385
  22. Bryant S.H. (1996) Proteins 26, 172-185
  23. Mosimann S., Meleshko R. and James N.G. (1995) Proteins 23, 301-317
  24. Kraulis P.J. (1991) J. Appl. Crystallogr. 24, 946-950
  25. Bycroft M. et al. (1997) Cell 88, 235-242
  26. Landsman D. (1992) Nucleic Acids Res. 20, 2861-2864
  27. Vath G.M. et al. (1997) Biochemistry 36, 1559-1566
  28. Johnson P.E. et al. (1996) Biochemistry 35, 14381-14394
  29. Al-Karadaghi S. et al. (1997) EMBO J. in press
  30. Schindelin H. et al. (1992) Proteins 14, 120-124
  31. Bode W. et al. (1986) EMBO J. 5, 2453-2458
  32. Izard T. et al. (1994) Structure 2, 361-369
  33. Jacobson R.H., Zhang X.J., DuBose R.F. and Matthews B.W. (1994) Nature 369, 761-766
  34. Schuller D.J., Grant G.A. and Banaszak L.J. (1995) Nat. Struct. Biol. 2, 69-76
  35. van Tilbeurgh H., Sarda L., Verger R. and Cambillau C. (1992) Nature 359, 159-162

Aron Marchler-Bauer, Stephen H. Bryant
Computational Biology Branch
National Center for Biotechnology Information
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894

Created. 13 Aug 1997