Second Meeting on the Critical Assessment of
Techniques for Protein Structure Prediction
Aron Marchler-Bauer, Steve Bryant, for CASP2 Organizers
The predictors used a standardized format for their submissions, and careful verification of the format during the submission process made the data likely to be free of syntactical errors. The same format was used for reporting the results of structure-structure comparison methods, so that they could be verified in the same way. The evaluation software compared one threading/fold-recognition submission at a time against one structure comparison result, which serves as the 'standard of truth'.
The format is described in detail at sites in the UK or USA. The results from a structure comparison search look the same, except for the format descriptor reading:
PFRMAT SCV1

and some additional records describing the quality of the similarity:

RMSIDE T0021 0 1ALA _ 1 57 7.79 7.02

The three numbers are: the length of the structural alignment in residues, the RMS deviation between the aligned structures in Ångström (Å), and the percentage of identical residues in the (structural) alignment.
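As an illustration, here is a minimal Python sketch that reads an RMSIDE record into its fields. The meaning of the fields before the three numbers (target, target subset, database structure, chain, domain) is inferred from identifiers such as "T00xx 0" and "1ABC X 1" used later in this description. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RmsideRecord:
    """One RMSIDE record from a structure comparison result (hypothetical field names)."""
    target: str          # target identifier, e.g. "T0021"
    target_subset: int   # 0 = whole target chain (assumed from the "T00xx 0" convention)
    structure: str       # database structure, e.g. "1ALA"
    chain: str           # chain identifier; "_" assumed to mean "no chain label"
    domain: int          # database structure subset (domain) number
    align_length: int    # length of the structural alignment in residues
    rmsd: float          # RMS deviation between aligned structures, in Angstrom
    pct_identity: float  # percentage of identical residues in the structural alignment


def parse_rmside(line: str) -> RmsideRecord:
    """Split a whitespace-separated RMSIDE record into typed fields."""
    fields = line.split()
    if len(fields) != 9 or fields[0] != "RMSIDE":
        raise ValueError(f"not an RMSIDE record: {line!r}")
    _, target, subset, structure, chain, domain, length, rmsd, ident = fields
    return RmsideRecord(target, int(subset), structure, chain, int(domain),
                        int(length), float(rmsd), float(ident))


rec = parse_rmside("RMSIDE T0021 0 1ALA _ 1 57 7.79 7.02")
print(rec.align_length, rec.rmsd, rec.pct_identity)  # 57 7.79 7.02
```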
A detailed description and discussion of the criteria used for the evaluation can be found at sites in the UK or USA.
The evaluation software automatically calculates and reports the following numbers:
The predictors were asked to specify the overall confidence in their prediction. They could do so by assigning a non-zero score to a dummy structure "NONE". Assigning 100% of the score to "NONE" would mean that the author(s) of the prediction think that none of the structures listed in the prediction are suitable three-dimensional models for the target sequence. Assigning 50% of the overall score to "NONE" would mean that the predictor(s) thought that the odds of having found the right answer would be 50%.
Along with the hits having a non-zero score and their corresponding alignments, the predictors were asked to submit hits to all other database proteins used in their search procedure, with a score of 0. The number of database structures listed in the submission is reported.
This is the actual number of hits in the prediction that have been assigned a non-zero score, even if this score is very small and represents a negligible fraction of the prediction's total bet.
Of all the prediction's hits with a non-zero score ("TNt0"), some might be "correct hits", i.e. hits to database chains for which the structure-structure comparison results (used as the standard of truth) also report a hit with a significant score.
This is the number of structures in the prediction's search database that have a significant score in the structure-structure comparison results. "TCmx" represents the maximal possible value for "TCrct", given the search database used for the prediction and the results of the structure-structure comparison method.
Threading specificity is the percentage of the "bet" that was placed on structures similar to the target protein. Predictors could bet on no structures (apart from the dummy "NONE"; in this case the specificity is 0 by default), on one structure, or on several structures. For the evaluation, the amount of the bet placed on actual structures is normalized to sum to 1.0. The confidence listed in the column labeled "Conf" is therefore not used for calculating the threading- and alignment-specific quantities described below.
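The following minimal sketch shows one way to derive the overall confidence and the normalized bet from a submission's scores. It assumes that "Conf" is 100% minus the share of the total score assigned to "NONE". That assumption matches the 100% and 50% examples given for "NONE", but the text does not state it explicitly. The function names, the dictionary input and the structure labels are all hypothetical.

```python
def confidence(scores: dict[str, float]) -> float:
    """Share of the total score (in %) NOT placed on the dummy structure "NONE".

    Assumed definition of "Conf"; it matches the 100% and 50% examples in the text.
    """
    total = sum(scores.values())
    if total == 0:
        return 0.0
    return 100.0 * (1.0 - scores.get("NONE", 0.0) / total)


def normalized_bet(scores: dict[str, float]) -> dict[str, float]:
    """Scores of actual database structures ("NONE" excluded), rescaled to sum to 1.0."""
    real = {name: s for name, s in scores.items() if name != "NONE"}
    total = sum(real.values())
    if total == 0:
        # No bet on any actual structure: specificity is 0 by default.
        return {name: 0.0 for name in real}
    return {name: s / total for name, s in real.items()}


# Made-up submission: half of the bet on "NONE", the rest on two structures.
submission = {"NONE": 50.0, "1AAA": 30.0, "2BBB": 20.0, "3CCC": 0.0}
print(confidence(submission))      # 50.0
print(normalized_bet(submission))  # {'1AAA': 0.6, '2BBB': 0.4, '3CCC': 0.0}
```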
Structure comparison methods assign scores to individual matches of the target structure to a database structure, or simply report whether a database structure is similar to the target or not. For the calculation of threading specificity, scores from the submission are normalized to sum to 1.0, while structure comparison scores are scaled down, if necessary, so that their maximum is 1.0. These scaled scores are interpreted as the probability, as assessed by the structure-structure comparison method, that the two structures are actually similar. Structure similarity scores of less than 0.5 were considered insignificant in the evaluation. Threading specificity is calculated by multiplying the normalized scores of each matching pair of TSCORE records (prediction and structure comparison) and summing the products:
$$
\text{Thr.Spec.} = 100 \cdot \sum_i \left( \frac{\text{score assigned to correct hit } i}{\text{sum of scores for all hits}} \cdot \frac{\text{Str.Comp. score for hit } i}{\max(1, \text{Str.Comp. scores})} \right)
$$
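The formula can be sketched in Python as follows. This is a hypothetical helper, not the CASP2 evaluation software. It assumes that scores are keyed by database structure and that "NONE" has already been removed. It also assumes that the 0.5 significance threshold applies to the structure comparison scores after they have been scaled.

```python
def threading_specificity(pred: dict[str, float], strcomp: dict[str, float],
                          threshold: float = 0.5) -> float:
    """Thr.Spec. in percent.

    pred:    prediction scores per database structure ("NONE" already removed)
    strcomp: structure comparison scores per database structure
    """
    pred_total = sum(pred.values())
    if pred_total == 0:
        return 0.0  # no bet on any actual structure
    # Scale structure comparison scores down, if necessary, so their maximum is 1.0.
    scale = max([1.0, *strcomp.values()])
    spec = 0.0
    for hit, score in pred.items():
        sc = strcomp.get(hit, 0.0) / scale
        if sc >= threshold:  # assumed: only significant (>= 0.5) hits count as correct
            spec += (score / pred_total) * sc
    return 100.0 * spec


pred = {"1AAA": 6.0, "2BBB": 3.0, "3CCC": 1.0, "4DDD": 0.0}
strcomp = {"1AAA": 2.0, "3CCC": 0.8}  # scaled to 1.0 and 0.4
print(threading_specificity(pred, strcomp))  # 60.0
```

Here 60% of the normalized bet sits on 1AAA, which is the only hit whose scaled structure comparison score reaches 0.5.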
Threading sensitivity is the percentage of similar structures present in the prediction's search set that were detected as similar by the prediction method. It is calculated as the percentage of the structure comparison method's "bet" that is correct when compared against the threading/fold-recognition submission. To calculate this number, the same procedure is used as for threading specificity, but the two score normalization schemes are swapped.
$$
\text{Thr.Sens.} = 100 \cdot \sum_i \left( \frac{\text{score assigned to correct hit } i}{\max(\text{prediction scores})} \cdot \frac{\text{Str.Comp. score for hit } i}{\sum(\text{Str.Comp. scores})} \right)
$$
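The following sketch mirrors the specificity calculation with the normalizations swapped. It is a hypothetical helper built on two assumptions. First, only structure comparison hits to structures in the prediction's search set are counted, which follows the definition above. Second, the 0.5 threshold is applied to the scaled structure comparison scores.

```python
def threading_sensitivity(pred: dict[str, float], strcomp: dict[str, float],
                          threshold: float = 0.5) -> float:
    """Thr.Sens. in percent, with the normalization schemes swapped."""
    # Only similar structures present in the prediction's search set can be detected.
    in_set = {hit: sc for hit, sc in strcomp.items() if hit in pred}
    sc_total = sum(in_set.values())
    pred_max = max(pred.values(), default=0.0)
    if sc_total == 0 or pred_max == 0:
        return 0.0
    sc_scale = max([1.0, *in_set.values()])
    sens = 0.0
    for hit, sc in in_set.items():
        if sc / sc_scale < threshold:
            continue  # insignificant structure comparison hit
        # Prediction scores scaled by their maximum, Str.Comp. scores normalized to sum to 1.0.
        sens += (pred[hit] / pred_max) * (sc / sc_total)
    return 100.0 * sens


pred = {"1AAA": 6.0, "2BBB": 3.0, "3CCC": 1.0, "4DDD": 0.0}
strcomp = {"1AAA": 1.0, "4DDD": 1.0, "5EEE": 1.0}  # 5EEE is not in the search set
print(threading_sensitivity(pred, strcomp))  # 50.0
```

Two structures in the search set are similar to the target, and the prediction put its top score on only one of them, which gives 50%.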
The threading specificity, as described above, does not take into account the ranking in the prediction's list of matches; only the amount of the bet placed on correct hits is used. The best-case specificity reported here is calculated from the rank (position in a list sorted by score) of the best-scoring (first) correct hit. If the best-scoring correct hit is listed in first place (has the highest rank), the best-case specificity is 100%. It is calculated as the number of structures listed in the prediction, minus the rank of the first correct hit, plus one, divided by the number of structures and multiplied by 100.
For example, if a search database contains 100 structures and the best-scoring correct hit is found in position 2 after sorting, the best-case specificity is 99%. If there are 5 hits with equal score at the top of the list and only one of them is correct, the best-case specificity is only 96% (as if the first correct hit had been found in position 5). If 3 out of 5 equally scoring hits at the top of the list are correct, the best-case specificity is 98% (as if the first correct hit had been found in position 3). A threshold is set on the structure-structure comparison scores, below which a "hit" is not considered correct; the threshold used for the evaluation was 0.5.
$$
\text{Best-Case Specificity} = 100 \cdot \frac{\text{size of search set} - \text{rank of first correct hit} + 1}{\text{size of search set}}
$$
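Below is a sketch of the best-case specificity that includes the tie handling from the examples above. Within a group of equally scoring hits, incorrect hits are assumed to be ranked before correct ones, and this reproduces the 96% and 98% figures. The function name and input format are hypothetical, and returning 0 when the search set contains no correct hit is an assumption.

```python
def best_case_specificity(pred: dict[str, float], correct: set[str]) -> float:
    """Best-case specificity (percent) from the rank of the first correct hit.

    pred:    scores for every structure in the prediction's search set (zeros included)
    correct: structures judged similar by structure comparison (score >= 0.5)
    """
    n = len(pred)
    rank_offset = 0  # number of hits ranked strictly above the current tie group
    for score in sorted(set(pred.values()), reverse=True):
        group = [hit for hit, s in pred.items() if s == score]
        n_correct = sum(hit in correct for hit in group)
        if n_correct:
            # Pessimistic tie-breaking: incorrect hits of the group come first.
            rank = rank_offset + (len(group) - n_correct) + 1
            return 100.0 * (n - rank + 1) / n
        rank_offset += len(group)
    return 0.0  # assumption: no correct hit in the search set


# 100 made-up structures; the first five share the top score.
db = {f"S{i:03d}": 0.0 for i in range(100)}
for i in range(5):
    db[f"S{i:03d}"] = 1.0
print(best_case_specificity(db, {"S004"}))                  # 96.0
print(best_case_specificity(db, {"S000", "S001", "S002"}))  # 98.0
```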
Chance specificity is the best-case specificity expected by chance. This number depends on the size of the search database ("TDbs") and on the number of correct structures in it ("TCmx"); it does not depend on the scores given to structures by the predictors. In a search database of 100 structures containing 10 correct hits, one of them is expected to fall into the top 10% purely by chance, so the chance best-case specificity is roughly 90% (91% by the formula, with the first correct hit at rank 10).
$$
\text{Chance Specificity} = 100 \cdot \frac{\text{size of search set} - \dfrac{\text{size of search set}}{\text{no. of correct str. in search set}} + 1}{\text{size of search set}}
$$
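The chance specificity is a direct evaluation of the formula. Here is a minimal sketch with a hypothetical function name; the behaviour for a search set without correct structures is an assumption.

```python
def chance_specificity(db_size: int, n_correct: int) -> float:
    """Chance best-case specificity in percent.

    db_size:   size of the search set ("TDbs")
    n_correct: number of correct structures in the search set ("TCmx")
    """
    if n_correct == 0:
        return 0.0  # assumption: treated as 0 when there is nothing to find
    chance_rank = db_size / n_correct  # rank of the first correct hit expected by chance
    return 100.0 * (db_size - chance_rank + 1) / db_size


print(chance_specificity(100, 10))  # 91.0
```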
A virtual "model" of the target structure was constructed from the C-alpha coordinates of a hit's database structure and the corresponding alignment, as found in the prediction. The root mean square deviation of the C-alpha atoms was calculated after superposing this model onto the actual target structure in coordinate space, based on the C-alpha coordinates. The RMSD values are given in Ångström.
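The description does not say which superposition algorithm was used. A common choice for least-squares rigid-body superposition is the Kabsch algorithm, and the minimal NumPy sketch below uses it as a stand-in. NumPy, the function name and the made-up coordinates are assumptions. The two arrays must hold matching C-alpha positions: the template coordinates copied through the alignment, and the corresponding target residues.

```python
import numpy as np


def superposed_rmsd(model: np.ndarray, target: np.ndarray) -> float:
    """C-alpha RMSD after optimal rigid-body superposition (Kabsch algorithm).

    model, target: (n, 3) arrays of corresponding C-alpha coordinates.
    """
    if model.shape != target.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (n, 3) arrays of equal shape")
    p = model - model.mean(axis=0)   # remove translation
    q = target - target.mean(axis=0)
    h = p.T @ q                      # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ r.T - q               # rotated model minus target
    return float(np.sqrt((diff ** 2).sum() / len(p)))


rng = np.random.default_rng(0)
target = rng.normal(size=(20, 3)) * 5.0  # made-up C-alpha coordinates
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
model = target @ rot.T + np.array([3.0, -1.0, 2.0])  # rotated and translated copy
print(round(superposed_rmsd(model, target), 6))  # 0.0
```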
"ACrct" is obtained by multplying the length of the prediction's alignment "ALen" by the alignment specificity "ASpc" - i.e. the fraction of correctly aligned residues. It simply gives the number of residues which have been aligned correctly by the prediction.
$$
\mathrm{ASpc} = \frac{\sum_i \mathrm{ASpc}(i) \cdot \mathrm{AWgt}(i) \cdot \mathrm{SCWgt}(i)}{\sum_i \mathrm{AWgt}(i) \cdot \mathrm{SCWgt}(i)}
$$
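A minimal sketch of the weighted average above, together with ACrct. It assumes ASpc is given in percent (as in the alignment specificity formula), so ACrct divides by 100. It also assumes that hits with SCWgt below 0.5 have already been removed. The function names and example values are hypothetical.

```python
def summary_aspc(hits: list[tuple[float, float, float]]) -> float:
    """Weighted average of per-hit alignment specificities.

    hits: (ASpc, AWgt, SCWgt) for each evaluated hit.
    """
    num = sum(aspc * awgt * scwgt for aspc, awgt, scwgt in hits)
    den = sum(awgt * scwgt for _, awgt, scwgt in hits)
    return num / den if den else 0.0


def acrct(alen: int, aspc_percent: float) -> float:
    """Number of correctly aligned residues: ALen times ASpc (ASpc given in percent)."""
    return alen * aspc_percent / 100.0


hits = [(90.0, 0.6, 1.0), (30.0, 0.4, 0.5)]
print(round(summary_aspc(hits), 2))  # 75.0
print(acrct(120, 75.0))              # 90.0
```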
If two or more sequence subsets are given in the prediction submission, the evaluation software reports two or more separate lines. Each line gives its own confidence (if applicable) and evaluation quantities calculated separately for that sequence subset, which is therefore treated like a separate prediction.
Structure-structure comparison searches might have used target structures parsed into separate domains, giving results for the whole chain and for the separate domains. These results can differ considerably depending on the metric used and the assessment of significance. Furthermore, structure-structure comparison methods may also use structure subsets (domains) of the database proteins in their search set.
In cases of redundancy in the structure-structure comparison results, it had to be decided which of the respective (sub)results should be used for evaluation. For example, a prediction submission might list a hit of T00xx, sequence subset 2, to a database structure 1ABC X 1. In the structure-structure comparison results one might then find a hit of the whole protein T00xx 0 to 1ABC X 0 (the whole chain of 1ABC X), a hit of the domain T00xx 1 to the domain 1ABC X 1, and a hit of the domain T00xx 2 to 1ABC X 2. The parsing of a target sequence/structure into subdomains does not necessarily match between prediction and structure-structure comparison, and neither do the domain definitions used for database structures.
To select the structure-structure comparison match against which a prediction match was evaluated, an algorithm was used that essentially tried to maximize alignment coverage, i.e. the fraction of target sequence residues that have been aligned in both the prediction and the structure-structure comparison.
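The exact selection algorithm is not given. The following is only a simplified sketch of the stated idea: among the candidate structure comparison matches, pick the one whose alignment shares the most aligned target residues with the prediction. For a fixed prediction this is the same as maximizing alignment coverage. The alignments are represented as target-residue-to-structure-residue dictionaries, and the labels, residue ranges and function name are all hypothetical.

```python
from typing import Optional

Alignment = dict[int, int]  # target residue number -> database structure residue number


def select_reference(pred_aln: Alignment, candidates: dict[str, Alignment]) -> Optional[str]:
    """Label of the candidate match sharing the most aligned target residues with the
    prediction (ties go to the first candidate listed)."""
    if not candidates:
        return None
    return max(candidates, key=lambda label: len(pred_aln.keys() & candidates[label].keys()))


pred_aln = {t: t + 5 for t in range(60, 121)}  # prediction covering target residues 60-120
candidates = {
    "T00xx 0 / 1ABC X 0": {t: t + 5 for t in range(1, 91)},    # whole chains
    "T00xx 1 / 1ABC X 1": {t: t + 5 for t in range(1, 71)},    # first domains
    "T00xx 2 / 1ABC X 2": {t: t + 5 for t in range(71, 151)},  # second domains
}
print(select_reference(pred_aln, candidates))  # T00xx 2 / 1ABC X 2
```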
For each hit to a database structure with a non-zero score, predictors were required to submit a sequence-to-structure alignment, which would be sufficient to construct a three-dimensional model of the backbone atoms for the target range which is covered by the alignment. In general, methods for threading/fold-recognition use such assignments of residues from a target sequence to coordinates (residues) from a database structure, to assess the quality of the respective hit.
The results from structure-structure comparison searches of the final target structures against PDB were summarized in the same format. For the evaluation of alignment-specific quantities, alignments could therefore be compared to alignments (or alignment-derived data to alignment-derived data). A single prediction might, of course, have hits to several structures in each search set that are i) listed with a non-zero score and ii) found to be similar to the target by the structure comparison method, as expressed by a significant score in the structure comparison result. For all of these, the alignment-related quantities are calculated separately and then summarized, by taking their weighted average, in the first table the evaluation system reports.
Alignment Specificity is the percentage of aligned target sequence residues, as found in the submission, which are aligned correctly, compared with the structure-comparison alignment.
$$
\text{Alignment specificity} = 100 \cdot \frac{\text{number of correctly aligned residues}}{\text{number of residues aligned by the prediction}}
$$
Alignment sensitivity is the percentage of aligned target sequence residues as found in the structure-comparison alignment, that have also been correctly aligned in the submission.
$$
\text{Alignment sensitivity} = 100 \cdot \frac{\text{number of correctly aligned residues}}{\text{number of residues aligned by the structure comparison}}
$$
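Both quantities can be sketched in Python if each alignment is represented as a dictionary that maps target residue numbers to database structure residue numbers. That representation, the function names and the example alignments are assumptions. A residue counts as correctly aligned when both alignments assign it to the same structure residue.

```python
Alignment = dict[int, int]  # target residue number -> database structure residue number


def alignment_specificity(pred: Alignment, ref: Alignment) -> float:
    """Percentage of residues aligned by the prediction that match the structure comparison."""
    if not pred:
        return 0.0
    correct = sum(1 for t, s in pred.items() if ref.get(t) == s)
    return 100.0 * correct / len(pred)


def alignment_sensitivity(pred: Alignment, ref: Alignment) -> float:
    """Percentage of residues aligned by the structure comparison that the prediction reproduces."""
    if not ref:
        return 0.0
    correct = sum(1 for t, s in ref.items() if pred.get(t) == s)
    return 100.0 * correct / len(ref)


ref = {t: t + 10 for t in range(1, 41)}  # structure comparison aligns residues 1-40
pred = {t: t + 10 for t in range(1, 21)} | {t: t + 12 for t in range(21, 31)}  # 21-30 shifted by 2
print(round(alignment_specificity(pred, ref), 1))  # 66.7
print(round(alignment_sensitivity(pred, ref), 1))  # 50.0
```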
In a threading/fold-recognition alignment, one or several target sequence segments might be placed on the correct structural segments, but with a small shift error relative to the structural alignment. To detect alignment models that are "globally similar", though not perfect as measured by Alignment Specificity, ASp2 counts the fraction of correctly aligned residues while allowing a shift error of +/- 2 residues for each segment.
ASp4 is defined in the same way as ASp2, but allows a shift error of +/- 4 residues for each segment.
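Here is a simplified sketch of the shift-tolerant measures. The text applies the tolerance to each segment. This sketch instead applies it to each residue and counts a predicted assignment as correct when it lies within +/- `max_shift` structure positions of the structure comparison assignment, so it only approximates ASp2 and ASp4. The function name and the example alignments are hypothetical.

```python
Alignment = dict[int, int]  # target residue number -> database structure residue number


def shifted_alignment_specificity(pred: Alignment, ref: Alignment, max_shift: int) -> float:
    """Percentage of predicted assignments lying within +/- max_shift positions of the
    structure comparison assignment (per-residue simplification of ASp2/ASp4)."""
    if not pred:
        return 0.0
    ok = sum(1 for t, s in pred.items() if t in ref and abs(s - ref[t]) <= max_shift)
    return 100.0 * ok / len(pred)


ref = {t: t + 10 for t in range(1, 41)}
pred = {t: t + 10 for t in range(1, 21)} | {t: t + 12 for t in range(21, 31)}  # 21-30 shifted by 2
for shift in (0, 2, 4):
    print(shift, round(shifted_alignment_specificity(pred, ref, shift), 1))
# 0 66.7
# 2 100.0
# 4 100.0
```

With `max_shift=0` the measure reduces to the plain alignment specificity.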
The mean shift error is the average distance (in residues) between the database-structure residues to which the same target sequence residue has been aligned in the submission and in the structure comparison result. It is calculated only over the residues that are aligned in both the submission and the structure comparison; this fraction of the submission's aligned residues (the alignment coverage) is described below.
$$
\text{Mean Shift Error} = \frac{\sum_i \left| \text{position residue } i \text{ is aligned to in prediction} - \text{position residue } i \text{ is aligned to in structure comp.} \right|}{\text{no. of residues aligned by prediction AND structure comparison}}
$$
$$
\text{Alignment coverage} = 100 \cdot \frac{\text{no. of residues aligned by prediction AND structure comparison}}{\text{no. of residues aligned by the prediction}}
$$
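A minimal sketch of both quantities, again using target-to-structure residue dictionaries (an assumed representation; the function names are hypothetical). The mean shift error follows the verbal definition above, an average distance in residue units, so no percentage scaling is applied to it.

```python
Alignment = dict[int, int]  # target residue number -> database structure residue number


def mean_shift_error(pred: Alignment, ref: Alignment) -> float:
    """Average |position in prediction - position in structure comparison| over the
    target residues aligned by both (NaN if there are none)."""
    common = pred.keys() & ref.keys()
    if not common:
        return float("nan")
    return sum(abs(pred[t] - ref[t]) for t in common) / len(common)


def alignment_coverage(pred: Alignment, ref: Alignment) -> float:
    """Percentage of the prediction's aligned residues also aligned by the structure comparison."""
    if not pred:
        return 0.0
    return 100.0 * len(pred.keys() & ref.keys()) / len(pred)


ref = {t: t + 10 for t in range(1, 41)}
pred = ({t: t + 10 for t in range(1, 21)}
        | {t: t + 12 for t in range(21, 31)}    # shifted by 2 residues
        | {t: t + 12 for t in range(41, 46)})   # not aligned by the structure comparison
print(round(mean_shift_error(pred, ref), 3))    # 0.667
print(round(alignment_coverage(pred, ref), 1))  # 85.7
```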
Each correct prediction gives one (or several) alignments between the target sequence and database structures. In principle, a three-dimensional model of the target protein chain (at least for the backbone atoms) could be constructed by simply copying the coordinates of the database structure(s) according to the alignment. The alignment specificity, as described above, is very sensitive to even small shift errors. An alignment with a low mean shift error can have an alignment specificity of 0 when none of its residues is aligned exactly, yet the model can still predict a high fraction of contacts correctly. Furthermore, some structures contain repeated substructures. An alignment with a high mean shift error and an alignment specificity of 0 can then still predict a rather large fraction of contacts correctly, for example when the alignment is shifted by one or more repeat units.
For the calculation of Contact Specificity and Contact Sensitivity, contacts as predicted by the threading alignment model are compared against contacts found in the actual target structure. Contact Specificity is the fraction of contacts predicted by the alignment model that are also observed in the target structure, while Contact Sensitivity is the number of correctly predicted contacts divided by the number of actual contacts in the target structure.
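The sketch below shows how the contacts predicted by an alignment model can be obtained and scored. A template contact between structure residues p and q becomes a predicted contact between the target residues aligned to p and q. The description does not state how a contact was defined (for example, which distance cutoff was used), so contacts are taken here as given sets of residue pairs, stored with the lower residue number first. The function names, the one-to-one alignment and the example data are assumptions.

```python
Alignment = dict[int, int]      # target residue number -> database structure residue number
Contacts = set[tuple[int, int]]  # residue pairs (lower number first)


def predicted_contacts(alignment: Alignment, template_contacts: Contacts) -> Contacts:
    """Target residue pairs predicted to be in contact by copying template contacts
    through a one-to-one alignment."""
    inverse = {s: t for t, s in alignment.items()}  # structure residue -> target residue
    pred = set()
    for p, q in template_contacts:
        if p in inverse and q in inverse:
            i, j = sorted((inverse[p], inverse[q]))
            pred.add((i, j))
    return pred


def contact_scores(pred: Contacts, actual: Contacts) -> tuple[float, float]:
    """(Contact Specificity, Contact Sensitivity) as fractions."""
    hits = len(pred & actual)
    spec = hits / len(pred) if pred else 0.0
    sens = hits / len(actual) if actual else 0.0
    return spec, sens


alignment = {10: 1, 11: 2, 14: 5, 15: 6}
template_contacts = {(1, 5), (2, 6), (1, 2)}
actual = {(10, 14), (12, 16)}
pred = predicted_contacts(alignment, template_contacts)
print(sorted(pred))                   # [(10, 11), (10, 14), (11, 15)]
print(contact_scores(pred, actual))   # (0.3333333333333333, 0.5)
```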
The score assigned to each hit in the structure comparison results is treated as the probability, as determined by the respective method, that the target structure and the respective database structure are similar. This quantity, taken from the structure-structure comparison results for the respective hit, is listed in the alignment tables. Note that structure comparison weights below 0.5 are ignored, so models based on these structures are not evaluated.
Gives the number of target sequence residues aligned to database structure residues in the structure-structure comparison results.
Where root mean square deviations were listed in the structure-structure comparison results, they are copied to the alignment evaluation tables. Where they were missing, RMSD values were calculated from the C-alpha coordinates of a virtual "model", constructed from the database structure of a hit and the corresponding alignment, after superposing this model onto the actual target structure in coordinate space. These RMSD values are given in Ångström.
This number gives the fraction of residues from the target sequence, which have been aligned to identical residue types by the structure comparison method. The fraction is expressed as a percentage.
A threading/fold-recognition prediction has assigned fractions of the total bet to distinct hits from its list of database structures. The number reported as "AWgt" is the fraction of this bet, placed on the hit corresponding to the alignment evaluated. This fraction is calculated without taking into account the bet placed on the dummy structure "NONE".
Gives the number of target sequence residues aligned to database structure residues in the prediction's alignment model.
As for the structure-structure comparison alignment, this number gives the fraction of residues from the target sequence, which have been aligned to identical residue types by the prediction; the fraction is again expressed as a percentage.
The submission format permitted probabilistic alignments: instead of specifying just one way to align the target sequence to a database structure, several different alignments could be given, along with probabilities that were required to sum to 1. In the evaluation, probabilistic alignments are converted into matrices (target-sequence residues x database-structure residues) that hold the alignment information as "weights" associated with the assignment of particular target-sequence residues to particular database-structure residues. If a target sequence residue is aligned to some position in all of the probabilistic alignments given, the corresponding row of the alignment matrix sums to 1.0. However, different probabilistic alignments can vary in length and in the regions they cover, so in general they do not give a summed weight of 1.0 for every aligned residue.
For the calculation of alignment specificity and sensitivity, the elements of the alignment matrix are treated as individual weights assigned to the respective sequence-residue/structure-position matches. Alignment specificity is then calculated as the sum of all weights assigned to correctly aligned positions (as compared with the structure comparison), divided by the sum of all weights assigned. Similarly, alignment sensitivity is calculated as the weight assigned to correctly aligned positions divided by the sum of all weights in the structure-comparison alignment matrix.
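A minimal sketch of this weighting scheme follows. The alignment matrix is stored sparsely as a dictionary from (target residue, structure residue) cells to accumulated weights. The structure comparison alignment is treated as a matrix with weight 1.0 on each aligned cell, and a predicted cell counts as correct when the reference has a weight for it. The representation, the function names and the example alignments are assumptions.

```python
from collections import defaultdict

Cell = tuple[int, int]  # (target residue, database structure residue)


def alignment_matrix(prob_alignments: list[tuple[float, dict[int, int]]]) -> dict[Cell, float]:
    """Accumulate probabilistic alignments (probability, {target: structure}) into weights."""
    weights: dict[Cell, float] = defaultdict(float)
    for prob, aln in prob_alignments:
        for t, s in aln.items():
            weights[(t, s)] += prob
    return dict(weights)


def weighted_spec_sens(pred_w: dict[Cell, float], ref_w: dict[Cell, float]) -> tuple[float, float]:
    """Weighted alignment specificity and sensitivity, in percent."""
    correct = sum(w for cell, w in pred_w.items() if cell in ref_w)
    pred_total, ref_total = sum(pred_w.values()), sum(ref_w.values())
    spec = 100.0 * correct / pred_total if pred_total else 0.0
    sens = 100.0 * correct / ref_total if ref_total else 0.0
    return spec, sens


# Two alternative alignments; residue 4 appears in only one, so its row sums to 0.25.
pred_w = alignment_matrix([
    (0.75, {1: 11, 2: 12, 3: 13}),
    (0.25, {1: 12, 2: 13, 3: 14, 4: 15}),
])
ref_w = {(t, s): 1.0 for t, s in {1: 11, 2: 12, 3: 13, 4: 14}.items()}
spec, sens = weighted_spec_sens(pred_w, ref_w)
print(round(spec, 2), sens)  # 69.23 56.25
```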
html-ized by C. Hogue, A. Marchler-Bauer
