A measure of success in fold recognition
Aron Marchler-Bauer, Stephen H. Bryant
Trends in Biochemical Sciences 1997 22: 236-240
(reproduced with permission)
Prediction of protein structure by fold recognition, or threading, was
recently put to the test in a "blind" structure prediction experiment,
CASP2.
Thirty-two teams from around the world participated, preparing predictions for
22 different "target" proteins whose structures were soon to be determined. As
experimental structures became available we, as organizers of the threading
competition, computed objective measures of fold recognition specificity and
model accuracy, to identify and characterize successful predictions. Here we
present a brief summary of these prediction evaluations, a tally of "correct"
predictions and a discussion of factors associated with correct predictions.
We find that threading produced specific recognition and accurate models
whenever the structural database contained a template spanning a large fraction
of target sequence. Presence of conserved sequence motifs was helpful, but not
required, and it would appear that threading can succeed whenever similarity to
a known structure is sufficiently extensive.
The past few years have been a time of active research in computational methods
for protein threading, or fold recognition. This work is motivated by
biologists' need for the greatest possible sensitivity in sequence database
searches, which increasingly serve as a means to identify molecular function
by inference from that of homologs. Threading replaces a database of protein
sequences with a database of known 3-dimensional structures. Since
3-dimensional structure is highly conserved in protein evolution, threading may
identify a homolog even when sequence similarity per se has dropped into the
"twilight zone" of detectability. Indeed, in a few cases threading searches have
already identified remote homologs when other methods failed[1,2].
Computational biologists are also interested in
threading as a means to make progress on the long standing problem of protein
structure prediction. Homology modeling, or prediction based on sequence
alignment with a known structure, is known to produce accurate models when
sequences are similar enough to correctly align[3]. If
threading can generate accurate sequence-structure alignments when sequence
similarity is low or undetectable, it may extend prediction by homology to a
greater proportion of new protein sequences.
Structure prediction workshops
Threading algorithms are a complex matter, of course, and evaluation of their
performance even more complex. Furthermore, in their efforts to characterize
threading methods authors have generally employed different test sets and
accuracy criteria, and it is difficult to make comparisons (for reviews see
[4,5,6,7,8]). To provide objective measures
of success and to identify the best prediction methods, John Moult and
colleagues have therefore organized two workshops on "Critical Assessment of
Structure Prediction", CASP1[9,10,11] and CASP2[12,13,14,15].
In advance of the workshops groups working on threading and other methods were
asked to make predictions for sequences whose structures were as yet unknown,
but were expected to be determined soon. At the workshops these "blind"
structure predictions were compared to experimental results, using the best
evaluation criteria that the meeting organizers and independent assessors could
devise. Needless to say, these "community experiments" have generated a certain
competitive spirit among prediction groups and, one might hope, some
contribution to progress in fold recognition!
CASP1, held in Asilomar, California in December 1994, showed several examples
of successful fold recognition, though it also raised questions about the
accuracy of sequence-structure alignments[16]. For CASP2,
held in Asilomar in December 1996, we as organizers of the threading
"competition" have attempted to address this latter issue by providing
quantitative evaluation of both fold recognition specificity and threading
model accuracy. To make this possible we asked prediction teams to submit
predictions (for all targets showing no obvious sequence similarity to known
structures) in a pre-defined, machine-readable format. When the experimental
structures became available we then computed automatically several measures of
recognition specificity and model accuracy, and presented these results for
analysis and discussion at the workshop. A detailed assessment of CASP2 results
will appear this fall in a special issue of Proteins: Structure, Function and
Genetics, including a comparative assessment of different groups' results by
Michael Levitt, the "judge" of the threading competition. Here we provide a
brief summary, and with all due respect to the prediction teams we avoid
entirely any naming of "winners" or "losers". Instead we address a question of
perhaps more general interest: How successful were fold recognition techniques
as a whole, in blind prediction, and what progress has been made in the two
years since CASP1?
The short answer, as the reader will see, is that threading methods have
produced accurate models for all CASP2 targets showing sufficient similarity
to a known structure in the database. This is clear evidence that threading can
succeed in the "twilight zone" of sequence similarity, and furthermore that
these methods have progressed with respect to alignment accuracy since CASP1.
It would seem that the CASP2 experiment has indeed provided a measure of
success in threading prediction.
A Large-Scale Community Experiment
The most important ingredient for CASP2, naturally, was a set of target
sequences whose structures were soon to be determined. Here we cannot thank
enough the crystallographers and NMR spectroscopists who advised us of
proteins under investigation in their laboratories and later, for the
evaluation process, provided atomic coordinates, often in advance of
publication. With these groups' generous assistance we were able to assemble
22 suitable threading targets, each showing no significant similarity by
BLAST[17] and other pairwise sequence comparison
methods to structures then in the Protein Data Bank[18].
Fifteen of these structures were determined in time for the workshop and
formed the sample for which blind prediction results could be evaluated.
An equally important ingredient for CASP2 was a "standard of truth", a way to
identify, for the evaluation process, those targets which indeed possess a fold
similar to that of structures in the database. Here we must thank groups who
provided in a timely and machine-readable form the results of 3 different
structure-structure comparison methods, DALI[19],
SSAP[20] and VAST[21], which
collectively served as the "jury" for determination of structural similarities.
Though these methods differ, this jury was nonetheless essentially unanimous
in identifying 7 of the 15 targets as having strong similarity to one or more
known structures. As shown in Figure 1, these 7 structures also provided a
fortunate test set for titration of fold recognition performance. The targets
ranged from "easy" to "medium" to "hard", based on the nature and extent of
their similarity to structures in the database.
Figure 1
MolScript[24] diagrams of the 7 threading targets with
structural similarity to other proteins in the database. These are labeled
according to identification codes used by CASP2 predictors, T04, T31, etc.
The portion of the target structures that can be superimposed on the
best-matching database protein is colored red. For "easy" prediction targets
this common substructure comprises 60% or more of the target and characteristic
sequence motifs are present, as indicated in green. For T04, Polyribonucleotide
Nucleotidyltransferase (84 residues)[25], this motif
includes residues Phe22, Val23, His34, Ser36, and Ile38, parts of which are
conserved in the RNP-1 motif of cold-shock proteins[26].
For T31, Exfoliative Toxin A (242 residues)[27], this
motif includes His72, Ser195 and Asp90, the catalytic triad of this serine
protease. For "medium" prediction targets the common substructure comprises
60% or more of the target, but characteristic sequence motifs are absent.
These targets include T02, Threonine Deaminase (74 residues) (Gallagher T.,
unpublished results), T14, 3-Dehydroquinase (252 residues) (Shrive A.K.,
Polikarpov I. and Sawyer L., unpublished results), and T38, Endoglucanase C
(152 residues)[28]. For T02 an N-terminal domain
homologous to tryptophan synthase is omitted from the diagram, as this region
was not a threading target. The region shown includes a 2-fold repeated
substructure recognized by sequence analysis, 86% of which may be superimposed
on the database structure. For "hard" targets only a small domain of the target,
not recognized by sequence analysis, may be superimposed on any database structure. These
targets include T20, Ferrochelatase (320 residues)[29],
and T22, L-Fucose-Isomerase (591 residues) (Seemann J. E. and Schulz G. E.,
unpublished results).
The most essential ingredient needed for a large scale fold-recognition
experiment was of course predictions! Here we organizers had plenty of
assistance: Thirty-two threading prediction teams submitted a total of 338
predictions for one or more of the 15 solved targets, in total more than 7000
models, since each prediction generally provided many alternatives. These
prediction teams deserve the credit for anything learned from CASP2. By
providing this large sample the predictors have made possible a quantitative
evaluation of the current "state of the art" in fold recognition.
Measures of Specificity and Accuracy
For the threading competition CASP2 prediction teams were asked to provide a
"hit list" identifying PDB structures predicted to be similar to the target,
and the team's relative confidence in each alternative. Each team, in other
words, was allowed a "bet" of 100 pennies which they could distribute as they
chose among structures in PDB. Bets on "none of the above" were also allowed,
and were in fact the correct answer for the 8 targets found not to be similar
to any database structure. Fold Recognition Specificity, a primary
evaluation quantity, is defined as the fraction of the bet placed on structures
similar to the target, as identified by any of the jury of structure comparison
methods.
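The bet-based definition above can be sketched in a few lines of Python. This is a minimal illustration, not the assessors' actual evaluation software: the function name, the "NONE" label for a "none of the above" bet, and the example hit list are all assumptions made here for illustration.

```python
from typing import Dict, Set

# Hypothetical label for a bet on "none of the above" (not from the source).
NONE_OF_THE_ABOVE = "NONE"


def fold_recognition_specificity(bets: Dict[str, float], correct: Set[str]) -> float:
    """Return the fraction of the total bet placed on entries in `correct`.

    `bets` maps a PDB code (or NONE_OF_THE_ABOVE) to the pennies wagered on it.
    `correct` holds the entries judged similar to the target by the jury, or
    {NONE_OF_THE_ABOVE} for a target with no similar database structure.
    """
    total = sum(bets.values())
    if total <= 0:
        raise ValueError("the total bet must be positive")
    on_target = sum(amount for entry, amount in bets.items() if entry in correct)
    return on_target / total


# Made-up hit list: 100 pennies spread over two structures and "none".
example_bets = {"1CSP": 60, "1PPF": 25, NONE_OF_THE_ABOVE: 15}
example_specificity = fold_recognition_specificity(example_bets, {"1CSP"})
```

In this made-up case 60 of 100 pennies fall on the one structure taken as correct, so the specificity is 0.6.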
For each PDB structure in their hit list CASP2 prediction teams were also asked
to supply a residue-by-residue alignment of the target sequence and database
structure. This sequence-structure alignment is equivalent to a molecular model
encompassing backbone atoms of the aligned region, and a primary measure of
model accuracy is simply the alpha-carbon RMS (root mean square)
Superposition Residual of that model as compared to the true structure.
As a measure of global similarity, however, RMS can fail to detect threading
models that are correct for only a portion of the target. For this reason we
have also employed two measures that reflect, roughly speaking, the fraction of
the predicted structure that was accurate. Threading Alignment
Specificity is defined as the fraction of the aligned residue pairs in the
threading alignment that agree (to within 4 residues) with the
structure-structure alignment produced by any of the structure-comparison jury.
Threading Contact Specificity is defined as the fraction of residue
contacts (alpha-carbons less than 8 Angstroms apart) in the predicted threading
model that are also present in the true structure of the target. For CASP2
predictions alignment and contact specificity were found to be highly correlated, but the latter is independent of the alignments produced
by structure comparison and allowed us to verify that no similar "folds" (used
in any threading models) were missed by the jury of structure comparison
methods.
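The two fractional measures can be sketched as follows. This is an illustrative reading of the definitions above, not the code used for CASP2. It assumes that an alignment is stored as a mapping from target residue number to template residue number, and that "agree to within 4 residues" means the template position differs by at most 4 from the one in the structural alignment. The `min_separation` parameter is hypothetical, since the text does not say whether sequence neighbors count as contacts.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Coord = Tuple[float, float, float]


def alignment_specificity(threading: Dict[int, int], structural: Dict[int, int],
                          tolerance: int = 4) -> float:
    """Fraction of threading pairs within `tolerance` residues of the structural alignment."""
    if not threading:
        return 0.0
    agree = sum(1 for target_res, template_res in threading.items()
                if target_res in structural
                and abs(structural[target_res] - template_res) <= tolerance)
    return agree / len(threading)


def best_alignment_specificity(threading: Dict[int, int],
                               jury: Iterable[Dict[int, int]]) -> float:
    """Score against each jury alignment and keep the most favorable result."""
    return max(alignment_specificity(threading, s) for s in jury)


def contacts(ca: Dict[int, Coord], cutoff: float = 8.0,
             min_separation: int = 1) -> Set[Tuple[int, int]]:
    """Residue pairs whose alpha-carbons lie less than `cutoff` Angstroms apart."""
    residues = sorted(ca)
    found = set()
    for a, i in enumerate(residues):
        for j in residues[a + 1:]:
            if j - i >= min_separation and math.dist(ca[i], ca[j]) < cutoff:
                found.add((i, j))
    return found


def contact_specificity(model_ca: Dict[int, Coord], true_ca: Dict[int, Coord],
                        cutoff: float = 8.0, min_separation: int = 1) -> float:
    """Fraction of contacts in the model that are also present in the true structure."""
    predicted = contacts(model_ca, cutoff, min_separation)
    if not predicted:
        return 0.0
    observed = contacts(true_ca, cutoff, min_separation)
    return len(predicted & observed) / len(predicted)
```

Contact specificity needs only the model and the experimental coordinates, which is why it can serve as a check that does not depend on the structure-comparison alignments.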
Table 1. CASP2 threading prediction results
| Target protein | Relative difficulty | Number of correct models | Template structure from PDB | Structural alignment length | Structural alignment identity | Structural alignment RMS (Angstroms) | Best model alignment length | Best model contact specificity | Best model RMS (Angstroms) |
|---|---|---|---|---|---|---|---|---|---|
| T04 | easy | 16 | 1CSP | 61 | 24.5% | 2.05 | 65 | 61.6% | 2.97 |
| T31 | easy | 14 | 1PPF E | 188 | 16.0% | 2.38 | 202 | 70.7% | 4.16 |
| T14 | medium | 7 | 1NAL 1 | 204 | 13.0% | 2.72 | 132 | 33.0% | 5.45 |
| T38 | medium | 2 | 1BGL A | 94 | 9.0% | 3.50 | 98 | 53.9% | 5.72 |
| T02 | medium | 1 | 1PSD A | 70 | 6.2% | 1.87 | 64 | 65.0% | 2.83 |
| T20 | hard | 0 | 1TLF A | 90 | 7.8% | 2.71 | 203 | 28.8% | 7.63 |
| T22 | hard | 0 | 1LPB B | 65 | 13.8% | 2.18 | 93 | 9.3% | 9.27 |
Target Protein: CASP2 target identification codes for the 7 targets with
strong similarity to database structures. Relative Difficulty: A
classification of the 7 target structures, as described in
Figure 1. Number of Correct Models: The number of predictions which
crossed thresholds of fold-recognition specificity and model accuracy as
described in the text. Template Structure from PDB: The PDB
identification code[18] of the database structure used
as a template for the "best" threading model. These are: 1CSP, Major Cold
Shock Protein[30], 1PPF E, Human Leukocyte Elastase
[31], 1NAL 1, N-Acetylneuraminate Lyase
[32], 1BGL A, Beta-Galactosidase
[33], 1PSD A, Phosphoglycerate Dehydrogenase
[34], 1TLF A, tryptic core fragment of the Lactose
Repressor (Friedman, A.M., Fischmann, T.O. and Steitz, T.A., data submission
to PDB[18]) and 1LPB B, Pancreatic Lipase[35]. Note that
all targets showed similarity to groups of related structures in the database,
and many threading models counted here as "correct" were based on other members
of these similar-structure groups. Structural Alignment Length: The
number of target protein residues that may be structurally aligned with the
database structure. Here and in Figure 2 structural
alignments are taken from the member of the structure comparison "jury" that
awarded the greatest alignment specificity to each prediction. Structural
Alignment RMS: The alpha-carbon RMS residual of the target and database
structures, according to this structural alignment. Structural Alignment
Identity: The percentage of aligned residue pairs, from this structural
alignment, where residue types are identical. Best Model Contact
Specificity: For the "correct" model with the lowest RMS residual, the
percentage of residue-residue contacts in the model that are also present in
the true structure of that target. Note that the best model for T20 exceeds the
contact specificity threshold for a "correct" model (25%), but it is not
counted as such because fold-recognition specificity was below the required
threshold (20%). Best Model Alignment Length: The length of the
threading alignment. Best Model RMS: The alpha-carbon RMS residual of
the "best" threading model as compared to the true structure of the target.
A Tally of Correct Predictions
To count the number of "correct" predictions one must define thresholds with
respect to fold recognition specificity and model accuracy. This is a
contentious subject, particularly for predictors who happen to fall on just
the wrong side of whatever boundary is chosen! Below, in counting "correct"
predictions, we employ a fold recognition specificity threshold of 20% or greater.
This corresponds to "within the top 5" on a ranked hit list, and is intermediate
between the "top 2" and "top 10" thresholds suggested by the CASP1
assessors[16]. 58% of predictions (for the 7 recognizable
targets) achieved a fold recognition specificity this high. By this standard over half
of the predictions passed the CASP2 "exam"!
For model accuracy we employ a contact specificity threshold of 25% and an
alignment specificity threshold of 50%; a model is considered accurate if it
surpasses either value. Both thresholds correspond closely to the average
achieved across all threading models (based on a structure indeed similar to
the target), and may be understood as "above average" model accuracy. 50%
alignment specificity also has an intuitive interpretation: At least half of
the residues positioned by threading alignment with a template structure in
the database were indeed positioned correctly. Only about 13% of predictions
crossed both the fold recognition specificity and model accuracy thresholds.
This was a tougher grade to make: Only 13% of predictions got an "A" on the
CASP2 "exam"!
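The two-part test for a "correct" prediction can be written out as a small predicate. This is a sketch of the thresholds as stated in the text. The function and constant names are made up here, and the treatment of values exactly on a boundary (inclusive for recognition, strict for accuracy) is an assumption drawn from the wording "20% or greater" and "surpasses".

```python
# Thresholds as stated in the text, expressed as fractions.
FOLD_SPECIFICITY_MIN = 0.20       # "20% or greater"
CONTACT_SPECIFICITY_MIN = 0.25    # must be surpassed
ALIGNMENT_SPECIFICITY_MIN = 0.50  # must be surpassed


def is_correct_prediction(fold_specificity: float,
                          contact_specificity: float,
                          alignment_specificity: float) -> bool:
    """True if a prediction passes both the recognition and the accuracy test."""
    recognized = fold_specificity >= FOLD_SPECIFICITY_MIN
    # A model is accurate if it surpasses EITHER accuracy threshold.
    accurate = (contact_specificity > CONTACT_SPECIFICITY_MIN
                or alignment_specificity > ALIGNMENT_SPECIFICITY_MIN)
    return recognized and accurate
```

A model with 28.8% contact specificity but a bet of only 10% on the right fold (hypothetical numbers, in the spirit of the T20 case noted under Table 1) passes the accuracy test yet is not counted as correct, because recognition fails.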

A tally of predictions crossing both recognition specificity and accuracy
thresholds is listed by target in Table 1. While only 13% of predictions were
this successful, there are still very many "correct" predictions: 40 models
from 21 different prediction teams. Collectively these teams have achieved
this level of recognition specificity and model accuracy for 5 of the 7 targets
amenable to threading prediction. Among these models the "best" can be chosen
as that with the lowest RMS versus the true structure of the target, to
illustrate the global accuracy achieved. By this standard the predictions are
indeed remarkably good for targets with no obvious sequence similarity to known
structures: RMS values range from 3 to roughly 6 Angstroms. Few if any models
from CASP1[16] were as accurate, and threading methods
as a whole have clearly improved with respect to sequence-structure alignment
accuracy.
A Recipe for Success?
Some trends are apparent from the data in Table 1. The most obvious is that
the number of "correct" predictions declines with the difficulty of the target.
There are 30 correct predictions for "easy" targets, 10 for "medium", and none
at all for "hard". As described in the caption for Figure 1,
"easy" targets are distinguished by the presence of characteristic sequence
motifs. One can imagine that this factor contributed to the higher "bets" on
the correct folds, as would consideration of these targets' conserved biological
function. The "hard" targets are distinguished from "medium" and "easy" by
lack of an extensive or global similarity with any database structure. It was
suggested previously that specific fold recognition requires roughly 60%
of target residues to be superimposable onto a database structure
[22], and it would appear that blind predictions at
CASP2 were successful for only (and all of) these cases.
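The arithmetic behind these statements can be checked directly from the published numbers. The short script below is only a bookkeeping sketch: the correct-model counts and structural alignment lengths are transcribed from Table 1, the target lengths from the Figure 1 caption, and the variable names are chosen here for illustration.

```python
from collections import defaultdict

# (target, difficulty, correct models, target length, structural alignment length)
# Counts and alignment lengths from Table 1; target lengths from the Figure 1 caption.
TARGETS = [
    ("T04", "easy", 16, 84, 61),
    ("T31", "easy", 14, 242, 188),
    ("T14", "medium", 7, 252, 204),
    ("T38", "medium", 2, 152, 94),
    ("T02", "medium", 1, 74, 70),
    ("T20", "hard", 0, 320, 90),
    ("T22", "hard", 0, 591, 65),
]

# Tally of "correct" models per difficulty class.
correct_by_difficulty = defaultdict(int)
for _, difficulty, n_correct, _, _ in TARGETS:
    correct_by_difficulty[difficulty] += n_correct

# Fraction of each target that can be superimposed on its template.
coverage = {name: aligned / length for name, _, _, length, aligned in TARGETS}

for name, difficulty, n_correct, _, _ in TARGETS:
    print(f"{name} {difficulty:6s} correct={n_correct:2d} coverage={coverage[name]:.0%}")
```

With these inputs every "easy" and "medium" target has coverage above 60%, while both "hard" targets fall well below it.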
Some of these evaluation results are plotted in Figure 2 in a way that shows
how closely the threading predictions approached the best possible accuracy.
The "RMS-gap" between the best threading model and the structure comparison
result is large for the "hard" targets, where the template structure from the
database corresponds to only a small domain of the target. But for the "easy"
or "medium" targets the RMS-gap is roughly 1 to 3 Angstroms (Table 1), values comparable
to what is seen in homology modeling when sequence similarity is detectable but
low[23]. There would also seem to be an association
of lower RMS-gap with lower RMS of the true target structure when superimposed
onto the database template. When RMS is low the structural environments of many
residues are well conserved, and threading alignments can be more accurate,
a trend also noted previously in control threading experiments
[22]. It is also quite interesting that model accuracy
does not appear to be limited by sequence similarity of the target and database
protein. Target T02 has the lowest model RMS, for example, but a sequence
identity in structural alignment of only 6%, well below the "twilight zone"
of sequence similarity.
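The RMS-gap is simply the best-model RMS minus the structural alignment RMS, so it can be recomputed from Table 1. The snippet below is a sketch of that subtraction, with values transcribed from the table and names chosen here for illustration.

```python
# (target, structural alignment RMS, best model RMS, identity %), from Table 1.
RMS_DATA = [
    ("T04", 2.05, 2.97, 24.5),
    ("T31", 2.38, 4.16, 16.0),
    ("T14", 2.72, 5.45, 13.0),
    ("T38", 3.50, 5.72, 9.0),
    ("T02", 1.87, 2.83, 6.2),
    ("T20", 2.71, 7.63, 7.8),
    ("T22", 2.18, 9.27, 13.8),
]

# RMS-gap in Angstroms: how far the best threading model falls short of the
# accuracy that the structural alignment shows to be attainable.
rms_gap = {name: round(model - structural, 2)
           for name, structural, model, _ in RMS_DATA}

# Target whose best model has the lowest RMS, with its sequence identity.
best_name, _, best_rms, best_identity = min(RMS_DATA, key=lambda row: row[2])

for name, gap in rms_gap.items():
    print(f"{name}: RMS-gap = {gap:.2f} Angstroms")
```

The lowest model RMS belongs to T02, the target with the lowest sequence identity in the table, which is the point made in the text.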
Figure 2
Accuracy of CASP2 model structures. RMS residuals of the "best" threading
models are plotted versus the fraction of the target that may be structurally
aligned with the template structure from the database. The extent of the
superimposable substructure is defined as the ratio of structural alignment
length (Table 1) to target sequence length
(Figure 1). Red plotting symbols indicate the
alpha-carbon RMS residual (in Angstrom units) of the predicted model as
compared to the experimental structure of each target, a measure of model
accuracy. Yellow symbols indicate the RMS residual from structural alignment
of the target and database structures, the best accuracy that could be achieved
by a threading model. The RMS-gap, or difference between these values, is
indicated by the lengths of the dotted-line segments. The percentages of
identical residues in the structural alignments are shown in parentheses.
Values above 15% typically indicate the presence of conserved sequence motifs
and are suggestive of evolutionary relationship.
Perhaps the most surprising trend, however, is one that was widely expected but
not observed. CASP2 results might have clearly identified the "best" threading
method, but instead one finds that a variety of methods were almost equally
successful, particularly with the "easy" targets. Threading success, it would
seem, has more to do with targets than methods: Folds were specifically
recognized and accurately modeled when the target was globally similar to a
domain in the structure database, with 60% or more of its residues
superimposable. Threaders might do well to focus their attention on ways to
identify likely domain boundaries in target sequences, so that only domain
subsequences need be considered. This was the recipe followed by one prediction
team to produce the only "correct" (and superbly accurate) model for target
T02.
Click on "Threading Now?"
While the success of the best predictions is impressive, a potential user of
threading methods should bear in mind that this is still new territory for
computational biologists. There remain problems with false positives, for
example: Among predictions for the 8 targets not similar to any known structure
fully 70% placed more than 50% confidence in incorrect models, rather than in
"none of the above". Furthermore, most predictions were not accomplished by
fully automatic procedures, if only because the human experts who produced
them had not yet had time to incorporate their latest ideas into their computer
programs. Lastly, a potential user of threading methods must bear in mind that
threading models can at best approximate experimentally determined structures,
since they are limited by the similarity of the structural template available
in the database. At CASP2 threading models approached this goal, but RMS
superposition residuals with respect to experimental structures were still in
the range of 3-6 Angstroms. But if this is bad news with respect to easy
availability of proven and accurate threading methods, there is also good news
to go with it: Since threading remains an active research area one may expect
further progress in recognition specificity, model accuracy and automation.
If CASP2 is any guide, one might even "bet" on further measurable progress by
CASP3, two years hence!
References
1. Madej T., Boguski M.S. and Bryant S.H. (1995) FEBS Lett. 373, 13-18
2. Matsuo Y. and Nishikawa K. (1994) FEBS Lett. 345, 23-26
3. Johnson M.S., Srinivasan N., Sowdhamini R. and Blundell T.L. (1994) Crit. Rev. Biochem. Mol. Biol. 29, 1-68
4. Fetrow J.S. and Bryant S.H. (1993) Bio/Technology 11, 479-484
5. Bryant S.H. and Altschul S.F. (1995) Curr. Opin. Struct. Biol. 5, 236-244
6. Torda A.E. (1997) Curr. Opin. Struct. Biol. 7, 200-205
7. Jones D.T. and Thornton J.M. (1996) Curr. Opin. Struct. Biol. 6, 210-216
8. Finkelstein A.V. (1997) Curr. Opin. Struct. Biol. 7, 60-71
9. Moult J., Pedersen J.T., Judson R. and Fidelis K. (1995) Proteins 23, II-IV
10. Shortle D. (1995) Nat. Struct. Biol. 2, 91-93
11. Moult J. (1996) Curr. Opin. Biotechnol. 7, 422-427
12. Pennisi E. (1996) Science 273, 426-428
13. Eisenberg D. (1997) Nat. Struct. Biol. 4, 95-97
14. Shortle D. (1997) Curr. Biol. 7, R151-R154
15. Dunbrack R.L. Jr. et al. (1997) Fold. Design 2, R27-R42
16. Lemer C. M.-R., Rooman M.J. and Wodak S.J. (1995) Proteins 23, 337-355
17. Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J. (1990) J. Mol. Biol. 215, 403-410
18. Abola E.E. et al. (1987) in Crystallographic Databases - Information Content, Software Systems, Scientific Applications (Allen, F.H., Bergerhoff, G. and Sievers, R., eds.), pp 107-132, International Union of Crystallography
19. Holm L. and Sander C. (1996) Science 273, 595-602
20. Orengo C.A. and Taylor W.R. (1996) Methods Enzymol. 266, 617-635
21. Gibrat J.-F., Madej T. and Bryant S.H. (1996) Curr. Opin. Struct. Biol. 6, 377-385
22. Bryant S.H. (1996) Proteins 26, 172-185
23. Mosimann S., Meleshko R. and James N.G. (1995) Proteins 23, 301-317
24. Kraulis P.J. (1991) J. Appl. Crystallogr. 24, 946-950
25. Bycroft M. et al. (1997) Cell 88, 235-242
26. Landsman D. (1992) Nucleic Acids Res. 20, 2861-2864
27. Vath G.M. et al. (1997) Biochemistry 36, 1559-1566
28. Johnson P.E. et al. (1996) Biochemistry 35, 14381-14394
29. Al-Karadaghi S. et al. (1997) EMBO J. in press
30. Schindelin H. et al. (1992) Proteins 14, 120-124
31. Bode W. et al. (1986) EMBO J. 5, 2453-2458
32. Izard T. et al. (1994) Structure 2, 361-369
33. Jacobson R.H., Zhang X.J., DuBose R.F. and Matthews B.W. (1994) Nature 369, 761-766
34. Schuller D.J., Grant G.A. and Banaszak L.J. (1995) Nat. Struct. Biol. 2, 69-76
35. van Tilbeurgh H., Sarda L., Verger R. and Cambillau C. (1992) Nature 359, 159-162
Aron Marchler-Bauer, Stephen H. Bryant
Computational Biology Branch
National Center for Biotechnology Information
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894
Email: bryant@ncbi.nlm.nih.gov
Created: 13 Aug 1997