# New benchmark metrics for protein-protein docking methods

## Abstract

With the development of many computational methods that predict the structural models of protein-protein complexes, there is a pressing need to benchmark their performance. As was the case for protein monomers, assessing the quality of models of protein complexes is not straightforward. An effective scoring scheme should be able to detect substructure similarity and estimate its statistical significance. Here, we focus on characterizing the similarity of the interfaces of the complex and introduce two scoring functions. The first, the interfacial Template Modeling score (*i*TM-score), measures the geometric distance between the interfaces, while the second, the Interface Similarity score (IS-score), evaluates their side chain contact similarity in addition to their geometric similarity. We first demonstrate that the IS-score is more suitable for assessing docking models than the *i*TM-score. The IS-score is then validated in a large-scale benchmark test on 1,526 dimeric complexes. Finally, the scoring function is applied to evaluate docking models submitted to the Critical Assessment of PRediction of Interactions (CAPRI) experiments. While the results according to the new scoring scheme are generally consistent with the original CAPRI assessment, the IS-score identifies models whose significance was previously underestimated.

**Keywords:** docking, protein-protein interaction, protein-protein interface, structure prediction, TM-score, IS-score, CAPRI

## Introduction

In the quest to determine all protein-protein interactions in a given proteome, recent high-throughput technologies have enabled substantial progress^{1}^{–}^{3}. Drafts for different model systems are emerging, though many details are still missing^{4}^{–}^{7}. The mapping of protein-protein interactions, however, is just a starting point towards revealing their functional roles in living biosystems. In order to understand protein-protein interactions, it is necessary to structurally characterize all representative protein complexes at high resolution^{8}.

Despite rapid growth in the number of structurally solved protein complexes^{9}, the pace of structure determination lags far behind the pace of the detection of protein-protein interactions. To fill this gap, many computational approaches have been proposed for predicting the structures of protein complexes. They can be roughly categorized into two types: Template-Based (TB) and Template-Free (TF). In TB approaches^{10}^{–}^{15}, one first builds a homology model based on a solved template structure, and then refines the model. In TF approaches^{16}^{–}^{23}, also known as protein-protein docking methods, one docks unbound components that form the target complex. Both methods have advantages and disadvantages. TB approaches generally have higher accuracy, but suffer from low coverage because of their dependence on the availability of template structures. Although the issue of low coverage might be overcome by the recognition that the structural space of protein-protein interfaces is highly degenerate^{24}, in practice identifying which protein pairs actually interact is very challenging. On the other hand, TF approaches can deal with a novel target whose quaternary structure does not match any solved template structure, but there is no guarantee of high quality docking models, particularly when bound structures undergo significant conformational changes from the unbound structures^{25}. Furthermore, TF approaches require the information that two input proteins interact; that is, they are not reliable in predicting whether two proteins interact or not, largely due to the limitations of force fields used for evaluating interaction energy^{26}. By comparison, TB methods usually contain (explicitly or implicitly) an evolutionary component, which prefers templates sharing conserved biological interactions with target proteins. Thus, in addition to predicting the structure of protein interactions, TB methods may be used to predict whether two proteins interact.

To benchmark the performance of docking methods, a community-wide experiment, known as CAPRI, has been carried out^{27}^{–}^{29}. One central task is to measure the quality of a predicted docking model, using its target structure, usually a solved crystal structure, as the gold standard. Furthermore, in the case of template-based modeling, it is also critical to measure the quality of both the template and the final model. Thus, any improvement or deterioration resulting from the “refinement” procedure, designed to improve over the template alignment, can be evaluated. For these purposes, one needs to derive effective structure comparison metrics. The CAPRI assessors employed complex criteria based on the Root Mean Square Deviation (RMSD) and the fraction of conserved native contacts *f*_{nat}^{29}. While these criteria are convenient, they have three limitations: The first is that RMSD is often dominated by the largest deviations, and hence, may overlook substructure similarity. The second is that the statistical significance of a given RMSD value is length dependent^{30}. The third is that the thresholds employed for model quality classification are often subjective, in the sense that an assessment of the statistical significance of the given structural comparison metric is lacking.

The problem of model quality assessment is not unique to protein docking experiments. An analogous situation was encountered in the evaluation of structural models predicted for monomeric proteins. In the recent Critical Assessment of Protein Structure Prediction (CASP), commonly used scoring functions include the Global Distance Test (GDT) score^{31}, the MaxSub score^{32}, and the TM-score^{33}. The statistical significance relative to random of both the GDT and the MaxSub scores is sensitive to the size of the target protein^{33}. As a result, one often cannot tell whether a raw score indicates a significant prediction. By contrast, the TM-score corrects for length effects. Based on the statistics obtained from comparing random protein structures at various lengths, a TM-score of 0.4 or higher indicates a significant prediction^{33}. Other statistically rigorous treatments also have been undertaken to calculate the significance (i.e., *P*-values) of protein models^{34}^{,}^{35}.

Previously, we introduced the *i*TM-score and the IS-score in iAlign^{36}, a program for the structural comparison of protein-protein interfaces based on interface structure alignments, where the equivalence of target and template residues is not *a priori* specified. It has been shown that the IS-score is an effective metric for evaluating structural alignments of protein-protein interfaces^{24}^{,}^{36}. In this study, we examine both scoring functions for measuring the quality of docking models. The key difference between the previous study and the current one is that iAlign does not require any previously specified sequence correspondence, whereas in the current scenario, the mapping of equivalent target-template residues is specified in advance. As a result, one needs to adjust the random background and re-calibrate the statistical models, as detailed below. Furthermore, we performed large-scale benchmark tests to compare and validate our scoring schemes and applied the IS-score to docking models submitted to the CAPRI experiments.

## Methods

A heavy-atom distance cutoff of 4.5 Å is employed to define an interfacial contact. A protein-protein interface is the collection of all residues with at least one interfacial contact between pairs of proteins.
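The interface definition above can be illustrated with a minimal sketch. The input layout (a dictionary from residue id to a list of heavy-atom coordinates) and the function names are hypothetical, and treating a distance exactly equal to the cutoff as a contact is an assumption, since the text does not say whether the comparison is strict.

```python
import math

CONTACT_CUTOFF = 4.5  # heavy-atom distance cutoff in angstroms, from the text


def interface_contacts(chain_a, chain_b, cutoff=CONTACT_CUTOFF):
    """Return the set of residue pairs (res_a, res_b) in interfacial contact.

    Each chain maps a residue id to a list of heavy-atom (x, y, z) tuples.
    This layout is a hypothetical choice made for the illustration.
    """
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            # One heavy-atom pair within the cutoff is enough for a contact.
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


def interface_residues(contacts):
    """Split contact pairs into the interfacial residues of each chain."""
    side_a = {a for a, _ in contacts}
    side_b = {b for _, b in contacts}
    return side_a, side_b
```

The all-against-all loop is quadratic in the number of atoms; a spatial grid or k-d tree would be the usual choice for large complexes.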

### Scoring function and search algorithm

Assuming that a native (target) structure has *L* interfacial residues, the *i*TM-score of a corresponding docking model is defined by comparing the geometric distances of the native interfacial residues of the model and the native structure^{33}^{,}^{36},

$$i\text{TM-score}=\frac{1}{L}\max\left[\sum_{i=1}^{N_a}\frac{1}{1+d_i^2/d_0^2}\right],\qquad(1)$$

where *N*_{a} is the number of superimposed native interfacial residues, *d*_{i} is the Euclidean distance between the C_{α} atoms from the *i*th superimposed residue pair, and the empirical scaling factor ${d}_{0}\equiv 1.24\sqrt[3]{{L}_{Q}-15}-1.8$ (with *L*_{Q} denoting the interface length *L*) is introduced to correct for length effects. Note that the definition of the *i*TM-score is exactly the same as used for assessing the model quality of the global structure alignment of monomeric proteins^{33}. However, the TM-scores of interfaces and of individual proteins have a different level of statistical significance at the same numerical value (see below). To avoid confusion, we use the term *i*TM-score to denote the TM-score of interfaces and reserve the notation TM-score for the global comparison of a pair of structures.
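As a rough illustration of the *i*TM-score definition, the sketch below evaluates the TM-score sum for one fixed superposition, given the C_{α} distances of the superimposed residue pairs. It does not perform the maximization over superpositions. The function names are hypothetical, and rejecting interfaces shorter than 19 residues (where the stated *d*_{0} formula is not positive) is an assumption, because the text does not describe how very short interfaces are handled.

```python
def d0(n_interface):
    """Length-correction factor d0 in angstroms: 1.24 * cbrt(L - 15) - 1.8."""
    if n_interface < 19:
        # Assumption: the formula gives a non-positive d0 below 19 residues,
        # and the source does not say how such interfaces are treated.
        raise ValueError("d0 is not positive for fewer than 19 residues")
    return 1.24 * (n_interface - 15) ** (1.0 / 3.0) - 1.8


def tm_like_score(distances, n_interface):
    """Score ONE fixed superposition; the maximum over superpositions is not searched.

    distances: C-alpha distances (angstroms) of the superimposed residue pairs.
    n_interface: number of native interfacial residues, L.
    """
    scale = d0(n_interface)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / n_interface
```

Each residue pair contributes at most 1, so a single badly placed residue lowers the score by at most 1/*L* instead of dominating it, which is the behavior that distinguishes this form from RMSD.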

In order to calculate the distance *d*_{i}, a subset of corresponding residues is superimposed using the Kabsch algorithm^{37}, which minimizes their pairwise root-mean-square deviation, RMSD. Since there are many ways to select the subset, the notation max in Eq. 1 indicates that the *i*TM-score is the maximum out of all possible superimpositions. A heuristic iterative extension algorithm is employed to calculate the *i*TM-score^{33}, similar to the one used for calculating the GDT-score^{31} and MaxSub^{32}. Briefly, we select fragments of size *L*_{sub} = *L*, *L*/2, *L*/4, …, 4, respectively. When *L*_{sub} is less than *L*, initial fragments are selected by sliding continuously along the native interface. Starting from an initial fragment of size *L*_{sub}, the corresponding residues within *L*_{sub} in the model and native interfaces are superimposed. Then, all model/interface residue pairs within a distance less than *d*_{0} are collected and superimposed again. The process is iterated until the rigid-body transformation converges.
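The seeding step of the search can be sketched as follows. The code only enumerates the initial fragments (as index windows over the ordered list of native interfacial residues); the superposition and iterative extension are not shown. Integer halving of the fragment size and the function names are assumptions made for the illustration.

```python
def fragment_sizes(n_interface, min_size=4):
    """Fragment sizes L, L/2, L/4, ..., 4 (integer halving is an assumption)."""
    sizes = []
    size = n_interface
    while size > min_size:
        sizes.append(size)
        size //= 2
    # Always finish with the smallest fragment size (or L itself if L < 4).
    sizes.append(min(min_size, n_interface))
    return sizes


def seed_windows(n_interface, min_size=4):
    """Enumerate (start, end) index windows used as initial fragments.

    For each fragment size, a window slides one residue at a time along the
    native interface; `end` is exclusive.
    """
    windows = []
    for size in fragment_sizes(n_interface, min_size):
        for start in range(n_interface - size + 1):
            windows.append((start, start + size))
    return windows
```

Each window would then seed one superposition that is refined by repeatedly collecting residue pairs closer than *d*_{0} and superimposing again, with the best score over all seeds retained.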

The second scoring function is the Interface Similarity score (IS-score), which measures not only geometric distances but also the conservation of interfacial contacts^{36}. The IS-score is derived from the *i*TM-score as follows,

$$\text{IS-score}=\frac{S+s_0}{1+s_0},\qquad(2)$$

$$S=\frac{1}{L}\max\left[\sum_{i=1}^{N_a}\frac{f_i}{1+d_i^2/d_0^2}\right].\qquad(3)$$

Here, the contact overlap factor *f*_{i} ≡ (*c*_{i}/*a*_{i} + *c*_{i}/*b*_{i}) / 2, where *a*_{i} is the number of interfacial contacts observed at the *i*th position of the native interface, *b*_{i} is the number of interfacial contacts observed at the corresponding position in the model, and *c*_{i} is the number of interfacial contacts conserved in both interfaces. If *c*_{i} = 0, *f*_{i} is 0, regardless of the value of *b*_{i}. The scaling factor ${s}_{0}\equiv 0.14-0.2/{L}_{Q}^{0.3}$ is introduced to make the means of the IS-scores length-independent among randomly selected interfaces (see below). Note that the scaling factor is slightly different from what was derived previously in iAlign^{36}. The adjustment is introduced to correct for a small shift in the means of the IS-scores among random interfaces. The search algorithm for calculating the IS-score is essentially the same as described above for the *i*TM-score.

Both the *i*TM-score and the IS-score give a maximum score of one for a perfect model.
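The contact overlap factor and the length correction can be illustrated with a short sketch. The rescaling (*S* + *s*_{0}) / (1 + *s*_{0}) used here is an assumption: it is the form used in iAlign and it is consistent with a perfect model scoring one, but it should be checked against the published equations. The function names are hypothetical, and the raw score is evaluated for one fixed superposition only.

```python
def contact_overlap(a_i, b_i, c_i):
    """f_i = (c_i/a_i + c_i/b_i) / 2, defined as 0 when no contact is conserved.

    a_i: contacts at position i of the native interface
    b_i: contacts at the corresponding position of the model
    c_i: contacts conserved in both
    """
    if c_i == 0:
        return 0.0  # also covers b_i == 0, avoiding a division by zero
    return (c_i / a_i + c_i / b_i) / 2.0


def s0(n_interface):
    """Scaling factor s0 = 0.14 - 0.2 / L**0.3 from the text."""
    return 0.14 - 0.2 / n_interface ** 0.3


def raw_is_score(distances, overlaps, n_interface, d0):
    """Contact-weighted TM-like sum for ONE fixed superposition."""
    total = sum(f / (1.0 + (d / d0) ** 2) for d, f in zip(distances, overlaps))
    return total / n_interface


def is_score(raw, n_interface):
    """Assumed rescaling (S + s0) / (1 + s0); a raw score of 1 maps to 1."""
    shift = s0(n_interface)
    return (raw + shift) / (1.0 + shift)
```

Because *f*_{i} vanishes whenever no native contact is conserved at a position, residues that superimpose well geometrically but sit in a wrong interaction pose contribute nothing to the score.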

### Statistical significance

The statistical significance of the IS-score is estimated by comparing 24,120 randomly selected interface pairs of the same length (see Data sets). In each pair, interfacial residues at the same positions in the respective sequences are arbitrarily assigned as equivalent. Figure 1A shows the means of the IS-scores and *i*RMSD values of unrelated interfaces. Without applying the scaling factor, the raw IS-score calculated using Eq. 3 decreases exponentially as the length of the interface increases. Likewise, the mean random *i*RMSD value increases exponentially as the interface size increases. By comparison, the re-scaled IS-scores are approximately length-independent at a mean value of 0.10. It should be noted that the mean of random IS-scores calculated here is smaller than the mean of random IS-scores calculated previously with the program iAlign^{36}. The reason is that iAlign does not *a priori* impose a one-to-one sequence correspondence. Therefore, iAlign usually finds a better correspondence (or alignment), which gives a higher IS-score even for randomly related interfaces.

**Figure 1.** (**A**) Mean of IS-scores and interfacial RMSD values versus the size of protein interfaces. Horizontal dashed lines are located at 0.1. (**B**) Distribution of the IS-score among random interfaces. **...**

Since the IS-scores are maxima, the extreme value distribution is a suitable statistical model for describing their distribution. As shown in Figure 1B, the probability density function of the IS-scores calculated from the random background follows the extreme value distribution,

$$p(z)=\exp\left(-z-e^{-z}\right),\qquad(4)$$

where *z* denotes the Z-score given by *z* = (*s* − μ) / σ. The variable *s* denotes the IS-score; μ is the location parameter, and σ is the scale parameter. The corresponding *P*-value of the score can be calculated according to the formula

$$P=1-\exp\left(-e^{-z}\right).\qquad(5)$$

The scores from random interfaces were fit to Eq. 4. The resulting *P*-values and their corresponding IS-scores are given in Table I. The calculated *P*-values according to the statistical model agree with the empirical values obtained by ranking the IS-scores of 24,120 random interface pairs. One may use these scores to quickly estimate statistical significance.
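For a Gumbel (extreme value) distribution of maxima, the upper-tail probability of a Z-score *z* is 1 − exp(−e^{−*z*}). A minimal sketch of the *P*-value calculation is given below; the function name is hypothetical, and the location and scale parameters must come from a fit to the random background (the fitted values are not reproduced here).

```python
import math


def gumbel_pvalue(score, mu, sigma):
    """Upper-tail P-value of a score under a Gumbel distribution of maxima.

    mu is the location parameter and sigma the scale parameter. Both are
    inputs here; the fitted values for the IS-score are not included.
    """
    z = (score - mu) / sigma
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for very small x,
    # which matters for the tiny P-values of highly significant scores.
    return -math.expm1(-math.exp(-z))
```

Using `math.expm1` instead of `1 - math.exp(...)` avoids rounding the result to zero when *z* is large.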

An improved estimation of statistical significance is obtained by modeling the distributions of scores at specific lengths. Fig. 2 shows the observed and modeled distributions at various lengths. Each distribution is modeled by the Gumbel distribution described in Eq. 4. The location and scale parameters can be estimated through linear regression fits,


The parameters *a* to *d*, given in Table II, were obtained by linear fitting to the location and scale parameters, which were obtained through maximum likelihood estimates with the EVD package in the statistical platform R (http://www.r-project.org/).
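The study estimated the location and scale parameters by maximum likelihood in R. As a simpler stand-in, not the method used in the study, the sketch below estimates both parameters by the method of moments, using the Gumbel identities mean = μ + γσ and variance = π²σ²/6, where γ is the Euler-Mascheroni constant. The function name is hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_moment_fit(scores):
    """Method-of-moments estimate of the Gumbel location and scale.

    This is a swapped-in estimator for illustration; the source reports
    maximum likelihood estimates obtained with an R package.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    sigma = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * sigma
    return mu, sigma
```

Moment estimates are less efficient than maximum likelihood estimates, but they are adequate as starting values or as a quick consistency check on a fitted background.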

### Analysis measures

In addition to the *i*TM/IS-score, we also define common metrics adopted for evaluating docking models^{29}. The smaller/larger of the two monomers in a binary complex are termed as the ligand/receptor of the complex. Let *N*_{c} denote the number of interfacial contacts observed in the native complex structure, and *n* the number of native interfacial contacts preserved in the docking model. The fraction of native contacts is *f*_{nat} ≡ *n*/*N*_{c}. The interfacial RMSD, *i*RMSD, is the RMSD of the C_{α} atoms of interfacial residues observed in a native structure with respect to their positions in a docking model, and the ligand RMSD, *l*RMSD, is the global RMSD of the C_{α} atoms of all ligand residues. The *i*RMSD is calculated after superimposing these native interfacial residues, whereas the *l*RMSD is calculated after superimposing the receptors.
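The two basic quantities above can be sketched as follows. Contacts are represented as sets of residue pairs, and the RMSD helper assumes that the two coordinate lists are already paired and superimposed (on the interfacial residues for *i*RMSD, on the receptors for *l*RMSD); the superposition itself is not shown. The function names and input layout are hypothetical.

```python
import math


def fraction_native_contacts(native_contacts, model_contacts):
    """f_nat = n / N_c: share of native residue-residue contacts kept in the model."""
    native = set(native_contacts)
    if not native:
        raise ValueError("the native complex has no interfacial contacts")
    preserved = native & set(model_contacts)
    return len(preserved) / len(native)


def rmsd(coords_a, coords_b):
    """RMSD of paired (x, y, z) coordinates that are ALREADY superimposed."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```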

### Data sets

#### (i) Random Background

The random background for statistical significance analysis was derived from the M-TASSER template library^{11}. We first obtained all-against-all pairs of all dimeric complexes. A pair of dimers was then selected, if any two monomers, one from each dimer, have a global sequence identity < 30% and a global TM-score < 0.4. This selection led to a set of globally unrelated dimer pairs. Since IS-score requires that the two interfaces have the same length, we randomly removed interfacial residues of the longer interface, if the two interfaces are of different size. The removal was carefully done by requiring that all remaining interfacial residues maintain at least one interfacial contact. To prevent possible over-representation of any given dimer, we further required that no dimer appears more than 20 times in the final selections. The procedure yielded 24,120 pairs of interfaces, which were used for estimating the statistical significance of the IS-score. In each pair, two interfacial residues were assigned as equivalent if they appear at the same positions in respective sequences after removing all non-interfacial residues.
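The cap on how often a dimer may appear can be illustrated with a small sketch. Accepting candidate pairs greedily in input order is an assumption; the text does not say in which order pairs were considered once a dimer reached the limit. The function name is hypothetical.

```python
from collections import Counter


def cap_dimer_usage(candidate_pairs, max_uses=20):
    """Keep candidate dimer pairs so that no dimer appears more than max_uses times.

    Pairs are accepted greedily in input order (an assumption made here);
    shuffling the input first would give a random selection.
    """
    usage = Counter()
    selected = []
    for dimer_a, dimer_b in candidate_pairs:
        if usage[dimer_a] < max_uses and usage[dimer_b] < max_uses:
            selected.append((dimer_a, dimer_b))
            usage[dimer_a] += 1
            usage[dimer_b] += 1
    return selected
```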

#### (ii) Decoy Set

For the comparison between the *i*TM-score and the IS-score, we used a decoy set from Dockground^{38}. The decoy set was curated from docking models generated with unbound protein structures for 61 target complexes. We define a docking model as near native if it has *l*RMSD ≤ 5 Å and *f*_{nat} > 30%, and as incorrect if it has *l*RMSD > 5 Å and *f*_{nat} = 0%. The procedure produced 425 near native models and 5,232 incorrect models.

#### (iii) Docking Set

From the M-TASSER template library^{11}, we selected 1,526 complexes whose individual proteins are less than 500 amino acids in length. Rigid-body docking using the bound structures from the complexes was subsequently carried out with the program FT-Dock^{23} using default parameters. The top 100 docking models, ranked by shape complementarity, were retained for validating the statistical significance of the IS-scores. In total, we collected 152,600 models by pooling together the top 100 docking models from all complexes.

#### (iv) CAPRI Models

The docking models for recent CAPRI targets were downloaded from the official web site (http://www.ebi.ac.uk/msd-srv/capri/). We selected ten recent protein-protein targets (T24 – T36, except for cancelled T31, and RNA/protein targets T33 and T34), for which the docking models were available to the public. The criteria adopted by the CAPRI assessors for model quality evaluation are the following^{29}:

- High – *f*_{nat} ≥ 0.5 & (*l*RMSD ≤ 1 Å ‖ *i*RMSD ≤ 1 Å),
- Medium – (*f*_{nat} ≥ 0.5 & *l*RMSD > 1 Å & *i*RMSD > 1 Å) ‖ (*f*_{nat} ≥ 0.3 & *f*_{nat} < 0.5 & *l*RMSD ≤ 5 Å & *i*RMSD ≤ 2 Å),
- Acceptable – (*f*_{nat} ≥ 0.3 & *l*RMSD > 5 Å & *i*RMSD > 2 Å) ‖ (*f*_{nat} ≥ 0.1 & *f*_{nat} < 0.3 & *l*RMSD ≤ 10 Å & *i*RMSD ≤ 4 Å),
- Incorrect – *f*_{nat} < 0.1 ‖ (*l*RMSD > 10 Å & *i*RMSD > 4 Å).

The symbols & and ‖ denote logical conjunction and disjunction, respectively. It should be noted that the CAPRI assessors employed distance cutoffs of 5 and 10 Å to define interfacial residues separately for calculating *f*_{nat} and *i*RMSD. In this study, we only used the final assessments (i.e., High, Medium, Acceptable, and Incorrect) provided by the CAPRI assessors.
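For illustration only, the four criteria listed above can be transcribed literally into a small function. This is not the official CAPRI implementation, and the function name is hypothetical. As written, the listed rules overlap for some inputs and leave others uncovered, so the sketch checks the classes from best to worst and returns `None` when no rule matches; both choices are assumptions.

```python
def capri_class(f_nat, l_rmsd, i_rmsd):
    """Literal transcription of the listed criteria, checked from best to worst.

    f_nat is a fraction in [0, 1]; l_rmsd and i_rmsd are in angstroms.
    Returns None when none of the listed rules applies.
    """
    if f_nat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "High"
    if (f_nat >= 0.5 and l_rmsd > 1.0 and i_rmsd > 1.0) or (
        0.3 <= f_nat < 0.5 and l_rmsd <= 5.0 and i_rmsd <= 2.0
    ):
        return "Medium"
    if (f_nat >= 0.3 and l_rmsd > 5.0 and i_rmsd > 2.0) or (
        0.1 <= f_nat < 0.3 and l_rmsd <= 10.0 and i_rmsd <= 4.0
    ):
        return "Acceptable"
    if f_nat < 0.1 or (l_rmsd > 10.0 and i_rmsd > 4.0):
        return "Incorrect"
    return None
```

For example, a model with *f*_{nat} = 0.27, *l*RMSD = 12 Å, and *i*RMSD = 4.3 Å falls into the Incorrect class under these rules, because both RMSD values exceed the Acceptable limits.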

### Availability

The data sets and IS-score software package including the source code are freely available at http://cssb.biology.gatech.edu/isscore.

## Results

### IS-score versus *i*TM-score

We first compare the performance of the IS-score and *i*TM-score on evaluating the quality of docking models. For this comparison, we selected 425 near native and 5,232 incorrect docking models from a Dockground decoy set generated with unbound protein structures (see Methods). As shown in Fig. 3A, the distributions of the IS-scores for near native and incorrect docking models are well separated. Near native docking models all have an IS-score above 0.17, and 97% of them have an IS-score > 0.25, whereas incorrect models all have scores < 0.12. By comparison, an overlapping regime in the *i*TM-scores is observed between near native and incorrect models. Incorrect docking models have *i*TM-scores ranging from 0.33 to 0.68, with a peak at 0.5. The peak is due to the superimposition of one side of the protein interface. Most unbound protein structures used for docking are structurally very close to their bound structural forms. In these cases, at least half of a native interface can be superimposed to its counterpart in a docking model, despite the fact that the other side of the interface is far away from its native position in an incorrect model. Such superimposition gives a significant *i*TM-score > 0.4, overlapping the score regime of the near native models from 0.4 to 0.9.

**Figure 3.** *i*TM/IS-scores for assessing the quality of protein docking models. (**A**) Score distributions of incorrect docking models and of near native docking models. *i*TMS and ISS denote *i*TM-score and IS-score, respectively. (**B**) ROC curves of sensitivity **...**

The performance of the IS-score and *i*TM-score is further displayed in the Receiver Operating Characteristic (ROC) curves (Fig. 3B), where the sensitivity is the fraction of near-native models scoring above a given threshold, and the false positive rate is the fraction of incorrect models scoring above the same threshold. The ROC curves were obtained by varying the thresholds of the *i*TM/IS-score. The IS-score has a perfect ROC curve with the value of AUC_{0.2} (Area Under Curve up to a 20% false positive rate) of 1, whereas the *i*TM-score has an AUC_{0.2} value of 0.76. Overall, the analysis demonstrates that a similarity metric based purely on geometric distances has an intrinsic flaw for evaluating docking models and that the IS-score yields a much more accurate assessment by taking interfacial contacts explicitly into account.
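A partial area under the ROC curve can be computed as in the sketch below. Dividing the area by the false positive rate limit, so that a perfect classifier scores 1, is an assumption inferred from the reported value of 1 for the IS-score. The function name is hypothetical, and the simple threshold sweep is quadratic in the number of models.

```python
def partial_auc(positive_scores, negative_scores, max_fpr=0.2):
    """Area under the ROC curve for false positive rates up to max_fpr.

    The area is divided by max_fpr (an assumed normalization) so that a
    perfect separation of positives and negatives gives 1.0.
    """
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= t for s in negative_scores) / len(negative_scores)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area / max_fpr
```

Here the positives would be the scores of near native models and the negatives the scores of incorrect models.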

### Discriminating docking models

To examine whether the IS-score returns a reasonable estimate of statistical significance, we further performed large-scale tests on a total of 152,600 docking models for the 1,526 target complexes. Each model was assessed according to the IS-score with respect to the native structure. As expected, the vast majority (96%) of these models have an insignificant IS-score with *P* > 0.01, while a small fraction (3.2%) of docking models resemble the native structure at a high level of similarity with *P* < 1×10^{−10} (Fig. 4A).

**Figure 4.** (**A**) Number of docking models according to the IS-score *P*-values. Box plots of docking models according to (**B**) fraction of native contacts preserved in models, (**C**) interfacial **...**

As shown in Fig. 4B and C, all docking models within 2.5 Å *i*RMSD from native structures or with an *f*_{nat} value > 30% have a significant *P* better than 1×10^{−6}, mostly better than 1×10^{−10}. Conversely, almost all interfaces with *P* < 1×10^{−6} have an *i*RMSD of less than 2.5 Å and an *f*_{nat} of more than 30%. In rare exceptions, a docking model has a significant *P* < 1×10^{−6}, while exhibiting a relatively high *i*RMSD/*l*RMSD > 3/8 Å and low native contacts < 30%. These cases are from docking very large complexes with usually more than 150 interfacial amino acids. Two cases are shown in Fig. 5. Despite a high *l*RMSD of 9.5 Å, visual inspection suggests that the docking model shown in Fig. 5A resembles the native structure very well, validating the highly significant *P*-value of 6.5×10^{−23}. In Fig. 5B, the docking model has a *P* of 5.4×10^{−7}, due to the maintenance of 22 native contacts, despite a different orientation from the native docking pose.

**Figure 5.** (**A**) a putative citrate lyase (PDB code: 1xr4, chain A and B) and (**B**) an aminotransferase (PDB code: 1dty, chain A and B). In each snapshot, the two chains from docking model are colored in cyan/orange, and the corresponding chains **...**

Virtually all insignificant models at *P* > 0.01 have an *i*RMSD > 3 Å and *f*_{nat} < 10%. About 1% of docking models exhibit an interface that bears a significant similarity to the native interface with a *P* between 0.01 and 1×10^{−6}. These model interfaces typically have an *i*RMSD between 5 and 10 Å and preserve 10 to 30% of native contacts. They usually overlap a part of the native interface.

### Assessing CAPRI models

Finally, we applied the IS-score to assess the quality of docking models submitted by various research groups for ten recent CAPRI targets. The results of the IS-score evaluations are compared to the official assessments provided by the CAPRI organizers, who categorized each model into one of four groups: Incorrect, Acceptable, Medium, and High, according to *i*RMSD, *l*RMSD, and *f*_{nat} (see Methods). A total of 2,874 Incorrect, 117 Acceptable, 59 Medium, and 16 High quality models for these ten targets were evaluated. Consistent with the CAPRI assessments, the overall distributions of the four groups of docking models are clearly separated according to either the IS-scores or their *P*-values (Fig. 6). The means of the IS-scores/log_{10}*P* are 0.08/−0.21 (I), 0.26/−5.7 (A), 0.48/−14.0 (M), and 0.69/−21.2 (H), respectively.

**Figure 6.** (**A**) the IS-score *P*-values and (**B**) the IS-score. Legends indicate model quality provided by the CAPRI assessors. (**C**) One example (model ID: T26_P41.M02) of CAPRI docking models for target T26. The model was categorized **...**

Out of 192 models with Acceptable or better quality, 174 (91%) and 185 (96%) have a significant *P* < 0.01 and 0.05, respectively. Only seven Acceptable models have a *P* > 0.05. These models, from targets T24, T25, T27, and T29, have about 10% to 15% native contacts correctly modeled. However, the numbers of preserved native contacts are small, considering that their native interfaces consist of about 40 native contacts or less. On the other hand, a total of 263 models have a significant similarity to their target interface at a *P* < 0.01 or better. Among these significant models, 89 were assigned as Incorrect. Most of these significant Incorrect models are from targets T26 and T32, which have relatively large interfaces with more than 60 native contacts. The difference between the CAPRI and IS-score assessments can be attributed to two main reasons. First, the CAPRI assessment uses *f*_{nat} and RMSDs, with a size-dependence issue, whereas the IS-score takes the length effect into account. Second, the IS-score only considers interface similarity but ignores global orientation. A slight rotation could lead to a large *l*RMSD, despite the fact that the *i*RMSD is relatively small. An example of an Incorrect model with significant interface similarity is shown in Fig. 6C. Visual inspection suggests that the model has good interface similarity at *i*RMSD of 4.3 Å and 27% *f*_{nat}. However, a slight tilt around the interface leads to a large *l*RMSD of 12 Å.

Fig. 7 shows the quality of individual docking models for each target. For all targets with the exception of T25, unbound or homology structures were provided as the starting structures for docking experiments. Overall, it is clear that higher ranked models have better quality, consistent with the official assessments. In particular, for targets T35 and T36, where only one Acceptable model was found, these two Acceptable models were the best as assessed by the *P*-value of the IS-score. Additionally, corresponding to the assessment that no Acceptable model was found for T28, the top ranked docking model of the same target has a marginal *P* value of 0.047.

## Discussion and Conclusion

Currently, *i*RMSD and *l*RMSD are metrics commonly employed in docking studies. The major advantages of the RMSD metrics are two-fold: First, the overall quality of a docking model is guaranteed if one uses a very conservative RMSD criterion; second, the calculation of RMSD is very straightforward. However, RMSD metrics also have two significant disadvantages. First, it is well-known that the statistical significance of a given RMSD value is length dependent (e.g., Fig. 1A). As a result, there is no straightforward relationship between RMSD values and the statistical significance of docking models. This is reflected in a simple fact that, at the same *i*RMSD value (e.g., 3 Å), to build a docking model for a 100 residue interface is more difficult than for a 30 residue interface. In addition, RMSD metrics are global metrics, meaning that local similarity may not be properly characterized by RMSDs. One extreme example is shown in Fig. 8, where the docking model has a highly significant IS-score of 0.43 (P = 2×10^{−18}), despite a large *i*RMSD value of 13.2 Å, caused by the rotations of two helical segments. Other than the two helical segments, the remainder (60%) of the interface superimposes with an RMSD of less than 2 Å between the model and the native structure. Obviously, the model in this example is not a random prediction. For the purpose of assessing a docking method, it is important to differentiate such a case from a random model prediction. Overall, one should be cautious in using RMSD metrics to assess the quality of a docking model.

**Figure 8.** **...** *i*RMSD. The model structure is superimposed onto the native structure in (**A**) top view and (**B**) side view. The model structure overlaps the native structure (PDB **...**

We have introduced and examined the performance of two scoring schemes, the *i*TM-score and the IS-score, for use in assessing the quality of protein-protein docking models. Both scores are able to detect significant substructure similarity if it exists. While the *i*TM-score is based on geometric distances, the IS-score combines both interfacial contacts and geometric distances. In benchmark tests of 425 near native models and 5,232 randomly related, incorrect models, generated from rigid-body docking of unbound protein structures, the IS-score achieves a perfect classification at an AUC_{0.2} value of 1, whereas the *i*TM-score gives an inferior performance at an AUC_{0.2} value of 0.76. The main issue with the *i*TM-score is that the interaction pose is not explicitly taken into account. As a result, an artificially high *i*TM-score may be obtained through the superimposition of one side of the interface, while the other side of the interface may be far away from its native position. The issue is intrinsic to all scoring functions based solely on geometric distances. By comparison, the introduction of the contact overlap factor in the IS-score scheme eliminates this issue.

For a proper model quality assessment, it is important to assess the statistical significance of predicted models. Using random interfaces as the background, we have derived statistical models for estimating the significance of the IS-score. The estimation is validated on 152,600 randomly selected docking models. Virtually all highly significant interfaces with *P* < 10^{−6} are native-like, and conversely, all native-like docking models display a highly significant *P* < 10^{−6}, mostly < 10^{−10}. By contrast, insignificant models with *P* > 0.01 have an *i*RMSD > 3 Å and an *f*_{nat} < 10%. Models with *P* between 0.01 and 1×10^{−6} have some interfacial similarity, but may exhibit a rotation that gives a relatively large *l*RMSD.

The IS-score is further applied to evaluate the docking models for ten recent CAPRI targets. Overall, the evaluation of the IS-score is consistent with the official CAPRI assessment. On average, the means of the IS-scores are 0.26, 0.48, and 0.69, for Acceptable, Medium, and High quality models, respectively. However, it appears that the official assessment is somewhat conservative. According to the *P*-values of the IS-scores, we identified 89 models assigned as Incorrect whose interface similarity is nevertheless significant at *P* < 0.01. The IS-score scheme is conceptually simple and statistically sound. One further application of the scheme is to use it as the objective function for method optimization.

## Acknowledgments

This work is supported by the National Institutes of Health Grant No. GM-48835.


New benchmark metrics for protein-protein docking methods. NIHPA Author Manuscripts. May 2011; 79(5)1623
