- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- BMC Bioinformatics
- v.10; 2009
- PMC2800122

# Protein-protein docking using region-based 3D Zernike descriptors

^{1}Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA

^{2}Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, USA

^{3}Markey Center for Structural Biology, Purdue University, West Lafayette, Indiana 47907, USA

^{}Corresponding author.

## Abstract

### Background

Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.

### Results

We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-*α*RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.

### Conclusion

We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.

## Background

Protein-protein interactions are a pivotal component of many biological processes and mediate a diverse variety of functions that include signal transduction, antibody-antigen complex, transport, and gene expression regulation, to name a few [1-4]. Although detailed structural information of a protein complex is necessary for the understanding of the underlying mechanisms, such details are often difficult to obtain through traditional experimental means (X-ray crystallography, NMR). To this end, a number of computational protein docking methods have been developed [5,6], which employ various schemes for describing both geometric and energetic complementarity at the docking interface and explore the docking conformational space [7-11].

Although significant efforts have been made to develop protein docking prediction methods, recent Critical Assessment of PRediction of Interactions (CAPRI) [12] experiments have shown that plenty of room for improvement still remains. It is often discussed that one of the weaknesses of current docking methods is that they are not fully capable of handling the inherent flexibility of protein chains, which induces varying degrees of conformational change upon docking [13,14]. Indeed, accounting for full flexibility of proteins is still a very challenging task for any computational method, especially for cases where the main-chain of a protein undergoes substantial conformational change (a root mean square deviation (RMSD) of a few Angstroms) upon docking. Such large fluctuations of loop regions or domain motions associated with several proteins is difficult to predict even using current molecular dynamics simulation [15] or *ab initio *protein structure prediction approaches [16-18]. Therefore, some of the earliest works on protein docking treat interacting proteins as rigid-bodies using geometrical complementarity at the docking interface to guide the search for putative binding modes [9,19,20].

The importance of effective methods of capturing geometrical information has been recently emphasized because shape complementarity forms the basis of most docking schemes, including those that accommodate a reasonable amount of conformational change. This can be attributed to the following reasons: (1) There are many protein complexes which undergo only subtle conformational change upon docking, which are practically well approximated by rigid-body docking methods [21]. Enzyme-inhibitors, an important class of protein complexes, fall into this category. (2) Flexibility of the protein can be handled to some extent by introducing a degree of softness in the surface representation [22-24] or by applying energy optimization [15] or side chain conformation search. (3) To account for chain flexibility in the docking prediction, pair-wise docking of simulated conformational ensembles of the interacting proteins can be performed, while treating each pre-determined conformation as a rigid body (cross-docking) [15]. (4) More broadly, effective protein surface shape representation has a wide variety of other applications, including structure-based function prediction [25-28], binding-ligand prediction [29,30], and fast protein database search [31,32]. Accordingly, there has been a renewed interest in shape driven docking [33-37], with more advanced descriptors being introduced. A brief overview of previous studies relating to protein shape representation and their application to docking is given below.

Connolly published pioneering works on detecting shape complementarity, the solid angle approach [38] and its improved variant called shape distributions [39]. These works attempted to distinguish regions with similar concavity but with different shapes. Proceeding along the same lines, a recently published method, Context Shapes (CS) [33] uses Boolean values to represent local shape features by applying a ray tracing scheme. Each context shape captures the local shape within a sphere centered on a surface point. The description is in the form of a binary vector with each bit assigned 1/0 depending on its location, *i.e*. inside or outside the surface. A rotational search is then performed to identify docking orientations that are ranked in terms of the buried surface area. Some approaches have also used surface normals for defining docking orientations and shape complementarity [40,41]. For example, the *S*_{C }statistic uses the product of the normals of the points at the interface weighted by the distance between them to quantify the geometrical match [42]. In the approach proposed by Bordner and Gorin [43], surface normal vectors are anti-aligned and rotated along the common axis provided by the contacting normal. PatchDock combines geometric hashing and pose clustering to identify matching patterns across geometric patches (concave, convex, flat) in the participating surfaces [44]. The algorithm relies on the use of transformation invariant shape signatures (local reference frames) that are defined using pairs of points and their associated normals. The patch based comparison is also used in another docking program 3D-GARDEN [45] that compares the triangular facets of the ligand and receptor surfaces to define initial docking orientations. 3D Grid-based representations, on the other hand, provide a discretized form of the protein surface and volume and have been a popular choice in several docking approaches [19,23,46,47]. While minor conformational changes can be handled implicitly, the surface complementarity of the docked structures can be assessed in terms of the grid overlap, which can be evaluated by applying Fast Fourier transforms (FFT).

More recently, several publications have featured the use of spherical harmonics and its extension, the 3D Zernike descriptors (3DZDs), which have been successfully applied to comparing shapes of proteins and ligands [31,32,48,49]. HEX [50], for example, uses spherical polar basis functions to model surface shapes. It also avoids the use of expensive grid-based calculations employed in FFT based methods and instead uses the expansion coefficients of spherical harmonics to calculate correlations of the ligand and receptor surface overlaps. Spherical harmonics, however, are not rotationally invariant and make use of Wigner matrices [51] to identify rotationally invariant regions. In contrast, 3DZD corrects this drawback while providing a more compact shape definition. Our previous studies [25,31,32,52,53] have shown that the rotation invariant descriptor effectively captures protein surface shape *similarity *on both global and local levels. To our knowledge, no known application of the 3DZD to docking has been reported in literature.

In this paper, we introduce, for the first time, the application of the 3DZD [25,54] for docking. The present work uses 3DZD for capturing protein shape *complementarity *and applies it to protein-protein docking by matching local regions around points across arbitrary rotations. The docking algorithm named LZerD (Local 3D Zernike descriptor-based Docking program) LZerD uses a geometric hashing scheme [10] and incorporates a novel geometric scoring function that serves as an efficient first stage filter. The method does not use any binding site information and is moderately tolerant to conformational changes. The efficacy of the method is demonstrated using the ZDOCK benchmark datasets [21,55] and results are compared with other approaches [33,33,47,50,56]. We first show that 3DZD is able to capture shape complementary of docking interfaces effectively and by applying 3DZD to a docking scoring function it improves docking prediction results. We further show that LZerD has a better performance than others in unbound docking predictions while staying comparable for bound-bound docking cases.

## Results

### Capturing local shape complementarity using 3D Zernike descriptors

To begin with, we demonstrate that the 3DZD sufficiently captures the shape complementarity of protein docking interfaces and can also quantifies the complementarity. As explained in the Methods section, the 3DZD is a series expansion of a three dimensional (3D) function (*i.e*. in this work protein surface is represented by a 3D function). Hence, two surfaces are compared by the series of coefficients assigned to each term in the 3DZD. Here, we compute the correlation coefficient of the 3DZDs of two surfaces to quantify the similarity. A strong advantage of 3DZD is that it is rotation and translation invariant, *i.e*. the same 3DZD is obtained for an object in any pose. While our previous studies used the 3DZD for capturing *similarity *of global and local shapes [25,31,32,57], this study focuses on their use in capturing the *complementarity *across local surface regions and for protein docking prediction.

The 3DZD of two objects with perfect shape complementarity are identical with a corresponding correlation coefficient of 1.0, as illustrated for a hypothetical case in the Additional file 1. Table Table11 lists fifteen protein complexes, for which the 3DZDs of the docking interfaces of bound and unbound proteins are compared. Five complexes are taken from each of the three complex categories in ZDOCK benchmark 2.0, *i.e*. the enzyme/inhibitor, the antigen/antibody, and the others. Docking interfaces of both bound and unbound proteins are compared. The interface is defined as the set of surface grid points within 4.5 Å of any atom of the other protein. Also shown is the change in accessible surface areas upon the complex formation as computed by the NACCESS program [58]. Overall, a high correlation is observed among the 3DZD of the docking interface regions for both bound and unbound complexes; all fifteen cases show a correlation coefficient higher than 0.9 for the bound cases and for all but two of the unbound docking interfaces. SA-barstar complex (PDB: 1AY7) is shown in Figure Figure1,1, which has a significant 3DZD correlation of 0.93. There are five cases in Table Table11 where the unbound interface shows a higher correlation coefficient than the bound interface with the difference of 0.01 or 0.02 (1ATN). To further examine this effect, we rotated the docking interface of two proteins, 1AHW and 2PCC and computed the differences in the correlation coefficients across the different orientations (taken at 30 degree intervals along the x, y and z axis). We found that these are within the error range (within 0.02) and can be attributed to the discretization of the molecular surface i.e. caused by placing proteins in different orientations on the 3D grid (the size of the grid is set to 0.6 Å). Note that the 3DZD is rotation invariant from mathematical point of view, but in practice causes errors due to the voxelization of protein surface. The effect of the variance of the 3DZD upon rotation of the protein has been examined in our previous paper (See Figure Figure22 in the paper by Sael *et al*. [31]).

**The interface region for the protein complex**1AY7

**(ribonuclease SA complex with barstar)**. Magnitudes of 3DZDs of docking interface regions of ribonuclease SA (1AY7-A, interface shape and corresponding 3DZD are shown in black) and barstar (1AY7-B, the interface

**...**

**The correlation coefficient of 3DZD and the angle between normal vectors sampled at surface points of (a), the bound and (b), unbound interface of Ras-RasGAP complex**. The PDB ID is 1WQ1 and 6A21/1WER for the bound and the unbound proteins, respectively.

**...**

As the interface is not known in advance in blind docking predictions, the 3DZDs are computed for local spherical regions centered at selected surface points, then combinations of compatible spheres across the ligand and receptor proteins are sought. The surface normals associated with the points are also combined with the 3DZD to be able to capture both local direction and shape of surface. Figure Figure22 shows examples of the 3DZD correlation and the angle between the normals calculated for points less than 2.5 Å apart in the bound and unbound interfaces of the Ras-RasGAP complex (PDB: 1WQ1). As is shown later, the docking prediction for this complex turned out to be successful by using these two variables to describe the local shape features of the ligand and the receptor.

### Docking prediction results

In what follows, we report results of docking predictions by our algorithm, LZerD. To rank the predictions, LZerD uses a combination of four scoring terms, a reward and a penalty term derived from the 3DZD and normal vectors, the buried surface area, and a penalty term for atomic clashes measured in terms of the excluded volume. Docking conformation search is performed using an efficient geometric hashing scheme that incorporates a kd-tree nearest neighbor algorithm. Weighting factors for combining the four scoring terms are optimized on a training dataset which consists of 29 bound-bound protein complexes taken from ZDOCK benchmark dataset 0.0 and 1.0. The resulting scoring function is then tested on ZDOCK benchmark dataset ver. 2.0. Please refer to the Methods section for details.

First, the results for the training dataset, on which the weighting factors are optimized, are shown (Table (Table2).2). In the second part, results of the unbound docking using LZerD are reported for 84 Benchmark 2.0 complexes are compared (Table (Table3).3). Predictions by LZerD are further analyzed with respect to previously published results of other geometry based docking approaches:

1) ZDOCK(PSC) [47] - The FFT based algorithm incorporates a pairwise shape complementarity (PSC) term that takes into consideration close atomic contacts (within a distance cutoff) between the ligand and the receptor with an added penalty term to account for clashes.

2) PatchDock [56] - Geometric hashing based scheme with a grid based scoring function.

3) Context Shapes [33] - Local shape representation in terms of binary values and a scoring function based on the buried surface area.

4) HEX [50] - Uses spherical polar Fourier correlations to perform a six degree search with a corresponding steric complementarity score (Table (Table44).

### Prediction results on the training set

Table Table22 shows the performance of LZerD for the bound-bound cases used in the training set. A near-native structure or a hit is defined as one with an interface RMSD (iRMSD) of 2.5 Å or lower (better) to the native complex. For the 29 cases considered, a hit for 18 complexes are ranked within the top 100. Table Table22 also details the effectiveness of adding the scoring terms. The first column shows the results the combined weighted score (the LZerD score) as defined in Eqn. 11 with weights as shown in Table Table55 in Methods, the second uses both 3DZD and normal vectors (Eqn. 6), and the last uses only the angle between the normals. In addition, we also compare results by using the buried surface area (Eqn. 7), the excluded volume term (Eqn. 8), and the combination of the two (the last column in Table Table2).2). The weights for combining the buried surface area and the excluded volume terms are optimized in the same way as the weights for the full LZerD score on the same dataset (Table (Table5).5). Comparison of the performance shows that the full LZerD score obtains better (*i.e*. lower) ranks in more cases (20 cases) than the combination of the buried surface area and the excluded volume terms (6 cases), indicating the usefulness of the 3DZD term. The combination of the four terms to yield the LZerD score is justified by the result that it produces the top ranked hits in 20 cases.

### Comparison with ZDOCK, Context Shapes, and PatchDock

To assess the performance of the algorithm, LZerD results were compared with those of ZDOCK(PSC) [47], Context Shapes (CS) [33], and PatchDock [56]. Comparison was performed on the bound-bound docking test set (Additional file 2) and also on the unbound test set of ZDOCK Benchmark 2.0 (Table (Table3).3). Results on the bound test set for CS, ZDOCK (PSC), and PatchDock are taken from the original article of CS [33] (Table (Table22 in their article). As in their study, only the top ranking 3600 predictions are considered. As for the unbound test set, ZDOCK decoys (produced by running ZDOCK in the dense mode with a 6° sampling) for the Benchmark 2.0 dataset (84 complexes) was downloaded from the ZDOCK website http://zlab.bu.edu/zdock/benchmark.shtml. For LZerD and ZDOCK, the top ranking 54000 conformations are considered. The two algorithms are compared in two ways (Table (Table3).3). First, the re-ranking performance of the LZerD scoring function, when applied to the ZDOCK decoys, is examined. Next, the docking predictions by LZerD are compared with the ZDOCK results to assess the conformation searching capability and scoring performance. Results for CS and PatchDock are taken from the original article of CS [33] (Table (Table44 in their article).

For the docking predictions of 76 bound-bound cases (Additional file 3), CS has a better performance than others at each rank cutoff of 100, 500, 1000, and 2000. LZerD comes a close second followed by ZDOCK and PatchDock. This order is also gauged by the number of "wins" and the MLR values (Eqn. 16), which yields 20, 56, 96 and 115 for CS, LZerD, ZDOCK, and PatchDock, respectively.

For the unbound test set (Table (Table3),3), ZDOCK and LZerD show significantly better performance than CS and PatchDock. It would seem that CS is better tuned for bound-bound docking but is not as well suited for handling unbound cases. In comparing the re-ranked ZDOCK decoys the LZerD score (the second block from the right in Table Table3)3) with the original ranking produced by ZDOCK (the block on the left), similar success rates are seen for ranks below 500. They also have the same number of wins, *i.e*. cases when either the LZerD score or ZDOCK (PSC) has a better rank than the other. However, the LZerD score performs better when ranks above 500 are considered with as many as 32 and 38 cases in the top 1000 and 2000, respectively, as compared with 29 and 33 for ZDOCK. Comparison between the docking predictions by LZerD (the right block) and ZDOCK shows that the former has a better performance. LZerD obtains a better rank in nine more cases (24 vs. 33) and has more successful cases in the top 100 and 2000 and a tied performance for ranks below 1000. The two last rows of Table Table33 show head-to-head comparison between the LZerD Rerank or LZerD with the other methods. These comparisons clearly indicate the superior performance of LZerD Reranking and LZerD Docking over the others. The head-to-head comparison between LZerD and LZerD Rerank shows that the former is better than the latter, implying LZerD's better conformation sampling scheme.

Figure Figure33 shows examples of complex predictions by LZerD for four cases. 1AHW is a relatively easy case, where both LZerD and ZDOCK (PSC) have more than 20 hits in the top 2000 decoys (Figure (Figure3A).3A). LZerD obtains a hit ranked in the top five and significantly improves the ranking of corresponding ZDOCK decoys from 268 to 15. 1UDI (Figure (Figure3B)3B) and 1WQ1 (Figure (Figure3C)3C) are relatively difficult cases, where only a few hits are obtained within 2000 ranks by the two docking algorithms. The last case, 1SBB, is a difficult target, where both LZerD and ZDOCK failed to obtain any hits within the top 54000 ranks. LZerD generates a correct hit with an iRMSD of 2.5 Å at the rank 81598 (Figure 3D2), but the best prediction within top 54000 has an iRMSD of 10.24 Å (Figure 3D1). Therefore, the problem of LZerD for this case is the scoring function rather than the conformation sampling.

### Coparison with HEX

Table Table44 shows the comparison between the results for LZerD with another shape-based docking algorithm, HEX [50] on unbound cases of Benchmark 2.0. The docking results for HEX are taken from Table Table33 of the paper by Ritchie *et al*. [50]. In keeping with these results, we computed the ligand RMSD (lRMSD) and the MLR value, and excluded complexes for which no hit (lRMSD of less than 10 Å) was found within the top 2000 predictions. The results for ZDOCK are the same as in Table Table33 (decoys downloaded from ZDOCK Benchmark website were analyzed).

Out of the 84 benchmark cases, all three methods failed in 25. In the remaining 59 cases, LZerD obtains a better rank in 22 while HEX and ZDOCK have corresponding values of 18 and 20, respectively. LZerD also manages to obtain the lowest MLR score of 164 as compared to ZDOCK and HEX. LZerD also outperforms the other two methods in terms of the number of cases with hits ranked within the top 100, 500, and 2000 decoys.

## Discussion

In this paper, we have proposed a novel protein docking algorithm, LZerD that employs the 3DZD for the first time for capturing protein surface shape complementarity of docking interfaces. Previously we have shown that the 3DZD is a powerful tool for identifying both global [31,32] and local [25,32] surface *similarity *among proteins. In this work, we have further shown that 3DZD is also useful for capturing protein local surface *complementarity*, which is readily applicable for protein docking. LZerD uses geometric hashing as the core of the search algorithm and takes advantage of the rotation-invariant characteristics of the 3DZD. Our experiments show that the descriptor is effective in improving docking predictions when considered as one of the terms in the scoring function.

The results on the ZDOCK benchmark datasets have been compared with the other available shape-based docking algorithms, ZDOCK, Context Shapes, PatchDock, and HEX. On the bound benchmark dataset, LZerD showed a better performance than the two methods (ZDOCK and PatchDock) but slightly worse than Context Shapes (Additional file 2). However, on the unbound benchmark dataset (Tables (Tables33 &4), LZerD clearly outperforms all the other approaches. Hits obtained by Context Shapes, which performed well for the bound docking set, decreased sharply for the unbound benchmark dataset. In contrast, LZerD improves its performance relative to the others for the unbound cases indicating that the 3DZD is effective in handling a certain range of the flexibility of the protein surface that needs to be considered in protein docking. Shape-based approaches which can effectively handle a certain range of flexibility (an RMSD of 1-2 Å) is very important in the computational docking problem because such method is an indispensable component in a flexible docking prediction in combination with explicit conformational sampling scheme, such as "*ab initio*" structure modeling methods (*e.g*. ROSETTA [59], TASSER [17], CABS [60], TOUCHSTONE [16],) and molecular dynamics approaches [61,62], that are aimed at handling alternative conformation explicitly. Since even the state-of-the-art *ab initio *modeling methods are still not capable of precise modeling of protein structures, it is docking methods' task to be able to handle the small range of flexibility.

While the current study only uses geometric terms for scoring, the inclusion of more interaction energy based terms such as desolvation and electrostatics are expected to make further improvement. As a future work, we are following a cross docking approach using LZerD based on an ensemble of simulated structures of individual proteins to further consider flexibility of protein chains.

## Conclusion

We showed that the 3D Zernike descriptors are effective in capturing shape complementarity at the protein-protein docking interface. Employing the 3D Zernike descriptors, we developed a novel shape-based protein docking algorithm, LZerD. Comparison with existing shape-based docking algorithms showed that LZerD has a better performance than the other existing methods in unbound docking while remaining competitive for bound docking cases.

## Methods

### Protein surface representation

The protein surface is defined implicitly as the sum of atom-centered Gaussians [43] and takes the form of *G*(*x*) = *σ*:

where *x *is a point on the surface in the three dimensional space and *x*_{i }the center of an atom in the protein. The degree of smoothness of the resulting isosurface is altered by varying the value of the parameter *σ*, the effect of which can be seen in Figure Figure4.4. The molecular surface definition of the form of Eqn. 1 has been commonly used for representing drug molecules [63], protein-protein interaction [64], and docking [65]. The smoothing parameter has the advantage of allowing a certain amount of flexibility to be incorporated into the representation which for this study is set to 0.35. 0.35 provides sufficient amount of surface detail and being simpler is faster to calculate as compared to the other forms proposed [64]. We used the Marching Cubes algorithm as implemented in the GNU Triangulated Surface (GTS) library http://gts.sourceforge.net to compute the protein surface, which is then discretized by placing it inside a cubic grid. Each grid cell (voxel) is then assigned 1 if it is on the surface and 0, otherwise.

### Shape representation using 3D Zernike descriptors

The protein surface defined in Cartesian coordinates is represented by the 3D Zernike descriptors (3DZD). The 3DZD are a series expansion of a 3D function, *f(x)*, (*i.e*. protein surface represented by Eqn. 1 in our case), which allows a compact representation of a 3D function. Desirable properties of 3DZD such as transformation invariance and minimum information redundancy (orthonormality) make them well suited for shape matching. Here we provide a brief description of the 3DZD. For more mathematical details, please refer to the paper by Canterakis [66] and also refer to our previous papers for technical procedure for computing the 3DZD for proteins [31,32].

A protein surface is represented as a 3D function, *f(x)*, by first placing the protein structure onto a 3D grid, and voxels overlapping with the protein surface are marked with a value of 1 and others are with 0. The size of the grid is set to 0.6 Å. The 3DZD are derived from the Zernike-Canterakis polynomials[66]. For the order *n *and repetition *m*, they are given by

where (, ) are the spherical harmonics[67] (*(*, ) are the polar coordinates) of the *l*^{th }degree with *l *≤ *n*, *m * [-*l*, *l*], and *(n-l) *is even and non-negative. The radial polynomial *R*_{nl}(*r*) where *r *is the radius is defined so that are orthonormal polynomials when written in Cartesian coordinates. For a 3D function *f*(*x*) where *x * ^{3 }the 3D Zernike moments are given by

The above equation is expressed as a linear combination of geometric moments of order *n *where *M*_{rst }denotes the geometrical moment of the object normalized to fit in the unit sphere and is a set of complex coefficients. The moments are however not rotationally invariant. They are therefore collected into (*2l+1)-*dimensional vectors and the norms (||·||) of the vectors define the rotationally invariant 3D Zernike descriptors (Eqn. 4).

We use n = 10, which yields a total of 36 invariants for the 3DZD. Matching local shapes is thus reduced to comparing the vector of numbers associated with each local surface. The strength of complementarity between two such vectors (*p *= 36 values in vectors *x *and *y*) is given by the correlation coefficient:

During the docking process, 3DZD are computed for spherical patches of 6 Å radius centered on a set of evenly distributed points on the protein surface.

### Overview of docking algorithm

A general outline of the docking procedure, LZerD, is shown in Figure Figure5.5. Starting with the atomic coordinates of a ligand protein and a receptor protein, evenly distributed points at a minimum separation of 1.8 Å are extracted from the corresponding protein surfaces. For each point, a corresponding normal vector is calculated. In all, each point is labeled by its coordinates in the 3D space, the surface normal, and a numeric vector of 3DZD that captures shape features in the region bounded by a sphere of set radius. Geometric hashing is then used to find partial matches between the two surfaces based on the signatures associated with the points. The alignment transformations calculated for the matching point sets are then applied to the ligand protein, following which, the scoring function examines the amount of overlap (with an added penalty for atom collisions). Another term that measures the local geometric complementarity as reflected by the angle between the normals and the shape correlation of the 3DZD vector is also calculated. The terms are then combined with appropriate weighting factors (determined using a genetic algorithm) to give a final score for each orientation. The individual components of the docking algorithm are described in detail in the following sections.

### Geometric hashing based point matching

Given two protein surfaces (a ligand protein and a receptor protein) represented by a set of points with corresponding labels, the first task is to quickly identify matches between the two sets of features. We have used geometric hashing [10], a computer vision based approach, for identifying candidate matches. The algorithm proceeds in two stages. In the preprocessing (first) stage, points from the ligand protein are stored in a transformation invariant form in a hash table. For this, an orthonormal coordinate frame is defined using three non-collinear points and the coordinates of all the other points are expressed in this local frame. The calculated invariant vectors are then discretized and stored in a two-dimensional hash table. In the recognition stage, points from the receptor protein are used to calculate candidate frames. For each such coordinate basis and the receptor points encoded in it, corresponding ligand points are identified (those points must satisfy geometrical constraints) in the hash table constructed earlier.

A naïve implementation of the hash table based indexing is however found to be inefficient due to the non-uniform distribution of the data in the hash space. This results in longer search times as hash entry lists have different lengths. Efforts to tackle this problem have attempted the use of rehashing techniques [68] and self-organizing maps [69]. Here, a commonly used data structure for nearest neighbor search, the *k*d-tree (*k *is the dimension), which partitions point sets recursively into a small number of cells with no cell containing too many objects, is employed. Unlike hashing, the method is adaptive in that, the partitions are dependent on the data points to be stored. We have used the approximate nearest neighbor (ANN) library, which, for a given query point, retrieves all neighbors within a specified radius and allows more efficient search than conventional kd-tree [70] approaches. These points are further checked for compatibility criteria before they are added to the pool of matches.

Given *N*_{L }and *N*_{R }points on the ligand and receptor proteins, respectively, the geometric hashing scheme is as follows:

Define a reference frame using two points, *A *and *B*, and a direction where and 0 are the associated normals. For building the Cartesian frame, point *A *is taken as the origin, as the *x*-axis, as the *y*-axis and the *z*-axis as the cross-product of the other two axes. The orthonormal basis (transformation matrix) so formed includes a rotation and a translation to move a given point to the origin as shown in Figure Figure6.6. In choosing a frame, certain constraints are applied. Most values were taken from a previously published study [9], except for two values, d_{min }and d_{max}, which are described below. Values ranging from 2.5 to 5 Å for d_{min }and 7.5 to 12 Å for d_{max }were tested on the training set proteins. Considering the number of hits and the computational efficiency, we chose 4 Å for d_{min }and 9 Å for d_{max}.

**A coordinate frame is calculated using points A and B and the average of their normals as shown in (a)**. The change of coordinates from (X, Y, Z)-space to (U, V, N)-space is given by the rotation matrix (U, V, N) (the 3 × 3 matrix represented

**...**

i. *d*_{min }<*d*_{Ab }<*d*_{max }where *d*_{min }and *d*_{max }are the minimum and maximum allowable distances between two point are set to *d*_{min }= 4 Å and *d*_{max }= 9 Å

ii. *TorsionAng le*(*a*_{n}, *A*, *B*, *b*_{n}) < 1.4 *radians*

iii. (*a*_{n}, *b*_{n})< 1.2 *radians *where denotes the angle

iv. 0.87 *radians *< (*a*_{n}, *B*), (*b*_{n}, *B*) < 1.2 *radians *and |(*a*_{n}, *B*) - (*b*_{n}, *B*)|< 1.2 *radians *(1.4, 1.2, and 0.87 radians are approximately equivalent to 80.2, 68.8, and 49.8 degrees)

1) The coordinates of all other points in the ligand are then recorded in this reference frame created in the previous step. Thus, for a ligand defined by *P *points, the other *P-2 *points (two points used for the construction of the frame) are transformed into this reference frame. Only those points *C *for which *d*_{AC}, *d*_{BC }< 15 Å are considered. The triangle (*A, B, C*) thus formed is weighted by the edge lengths along with the other labels associated with the vertices.

2) Enter all the transformed points of the ligand protein in each ligand reference frame into the *k*d- tree where *k *= 3 *i.e. x, y, z *coordinates of the ligand point. Each such point has a reference to the coordinate frame in which it was transformed.

3) Compute the coordinate frames and new positions of the receptor points as shown in Steps 1 and 2.

4) For every coordinate frame for the receptor protein:

i. For every transformed point in the current receptor frame

1. Perform a range search to retrieve geometrically similar points of the ligand (points recorded in different reference frames are stored in the kd-tree). This search (using methods in the ANN library) locates all nearest neighbors of the current receptor point within a radius bound of 2.5 Å.

2. For each ligand point located within this bound, compare the labels (3DZD, normals, torsion angles, and point distances) of the points (receptor vs ligand) and those of the corresponding reference frames. If the labels are compatible, then increment a vote counter for the current ligand-receptor basis pair. Criteria for comparison are as follows:

a) The lengths of the triangle edges should be compatible *i.e*.

where *r*_{i}, *r*_{j}, *l*_{i}, and *l*_{j }are vertices of the triangles and *i, j *= 1, 2, 3.

b) A ligand point *i *is compatible with a receptor point *j *if the correlation between their 3DZD exceed a threshold value *i.e. Correlations(Z*_{i}, *Z*_{j}) ≥ 0.65

c) Compatibility of pairwise normal angles formed between any two of the ligand or the receptor < 0.8 *radian *for 1 ≤ *i*, *j *≤ 3

d) Torsion angles formed by any two points and their corresponding normals should be compatible across the ligand and the receptor

< 0.8 *radian *for 1 ≤ *i*, *j *≤ 3

(0.8 radian is approximately equivalent to 45.8 degrees)

e) Compatibility of the angle between the normals and reference frame axes for the ligand and receptor. < 0.8 radian for *i *= 1, 2 where is the unit vector in the direction of *l*_{1}*l*_{2 }for the ligand and *r*_{1}*r*_{2 }for the receptor.

3. For all the receptor-ligand reference frames that exceed a threshold limit of votes (point pairs with matching labels). The threshold is set to 10 votes.

a) For each matching list of point pairs that exceed the minimum list of votes, calculate the alignment transformation that overlays these points [71].

b) Apply the transformation to the ligand protein. This transformation is actually a composition of three transforms as shown in Figure Figure77.

**Given a receptor R and a ligand L in their original orientation, a random orientation**. Selection of a three-point basis (two points + the average of the normals of the two points) for both the ligand (

*T*_{0 }is first applied to the ligand*L*" →

*T*

_{IL}

**...**

c) Calculate the number of clashes as indicated by the excluded volume and the extent of overlap given by the contact surface area. If the excluded volume exceeds the threshold value of 500 Å^{3}, discard the transformation.

d) Output transformation and corresponding scores.

ii. Clear the voting list and proceed with the next reference frame.

5) For all candidate orientations, evaluate the weighted score based on Equations 6-8 described in the subsequent sections. Output the ranked list.

### Shape-based scoring function

The scoring function is based on geometric criteria and is composed of the following terms:

1) Term based on the orientation of the surface normals and the local shape correlation defined by the 3DZD. The term is further split into a reward and a penalty.

2) The buried surface area

3) The excluded volume

These individual terms are then combined with appropriate weighting factors (weight optimization performed using a genetic algorithm) to produce a single score with a higher value indicating a more favorable solution.

#### Term based on normals and 3DZD local shape correlation

This measure of the geometric surface complementarity is based on the orientation of the normals at the interface and the shape correlation between the 3DZD vectors. The score for a given ligand orientation is calculated using the following equation:

In the above equation, -1 ≤ *η*_{ij }≤ 1 is the dot product of the normals while-1 ≤ *C*_{ij }≤ 1 is the correlation between the 3DZD vectors as calculated by Eqn. 5. The score *ZN*(*η*_{ij}, *C*_{ij}) is computed for every point on the ligand protein *i *and its closest counterpart *j *on the receptor and summed up for a given docking conformation. The first part of the equation represents the reward score given to cases where a given pair *i *and *j *have normals with opposing angles (π radians, *i.e*. the scalar product is -1) and the 3DZD correlation close to 1 (*i.e*. perfectly complementary to each other). The maximum reward value of 1.0 is assigned to the term *A *when *C*_{ij }= 1 and *η*_{ij }= -1. The reward value is suitably reduced for cases where the correlation and the angle values are poor. The second part of the equation deals with penalization that occurs when the scalar product of the normals yields a positive value or if the correlation is negative (indicating a poor match between the shapes). The term *1-A *has a maximum value of 0.98 when *C*_{ij }= -1 and *η*_{ij }= 1. Both scores are weighted by an exponential distance term. The reward term and the penalty term is weighted separately with different weights (Table (Table5).5). Although these scores reflect the local shape complementarity, they do not adequately explain the extent of overlap of the surfaces and the resultant clashes that may occur due to the ligand orientation. Two additional terms have therefore been added to account for such cases.

#### Area of overlap

For a given orientation of the ligand, the amount of overlap is estimated by the buried surface area. The accessible area buried between the interacting components is given by:

Where *SASA *are the solvent accessible surface areas of the receptor R, the ligand_{L}, and the complex_{RL}, respectively. The areas were estimated using the boolean look-up tables approach by LeGrand and Merz [72]. Although larger values generally indicate a more stable association, they can often be misleading owing to the large interpenetration of the partners. A penalty term is also included to account for the steric clashes that may occur.

#### Excluded volume

Atom pairs at the interface that are closer than a cutoff distance (3 Å according to the CAPRI criteria [73]) are said to clash. Owing to the proximity, these clashing atoms have a repulsive effect which can be measured by the excluded volume term, for which an analytical expression has been proposed [74]. For two atoms, *a *and *b*, with van der Waals radii *R*_{a }and *R*_{b}, the distance between the atoms, *d*_{ab}, the corresponding overlap volume *V*_{ab }can be calculated as

where

The overall excluded volume is given by the sum of the volume overlaps over the receptor, *a * *R*, and the ligand atoms, *b * *L*:

#### Optimization of weights using Genetic Algorithm

Given the four scoring terms, the goal is to find a set of weighting factors *w*, *c*, such that the fitness function (Eqn. 12) is maximized [75]:

Here *s*_{i }is individual scoring terms, *w*_{i }is the set of weights for each term, and *c*_{i }is the corresponding offset value. Eqn. 11 defines a ranking order on the set of docking predictions to be assessed. The form of Eqn. 11 was found to be superior to the linear weighting scheme [34] and has been duplicated in this study. A genetic algorithm (GA) based approach was used to determine an optimal set of weights and offset values for the terms. Starting with a population of randomly generated solutions, the GA iteratively attempts to improve the quality of the solutions using evolutionary schemes such as mutation and crossover with a suitably defined fitness function to guide the search.

The current implementation of the GA uses a(*μ*, *λ*) evolutionary strategy with self-adaptation [76] where the best *μ *individuals from a population of *λ *offspring are chosen as parents for the next generation. The fitness function is based on the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) [77]. The BEDROC metric is motivated by the fact that there are more false positives among the predictions and places more emphasis on hits located near the top of an ordered list as compared to those at the end. The metric has been shown to provide better discrimination of hits and non-hits in comparison to other measures such as average rank and the area under the ROC curve [78]. For a set of *N *predictions, with *n *hits, the BEDROC metric applies a continuously decreasing exponential weighting for the ranks of the hits and is calculated as:

Here *r*_{i }is the rank of the hit and *β *= 160.9 is an exponent pre-factor defined such that 80% of the corresponding BEDROC score was based on the top-ranked 1% percent of the predictions. Given the discrepancies in ratio of hits to non-hits across different proteins, the average of the BEDROC values over the training set cases *i *was chosen as the final fitness.

Following is the algorithm flow for finding the weights based on GA:

1) Generate an initial population *P *= 30. Each individual in *P *is represented by 16 values, corresponding to the 4 feature weights, 4 offset values, and 8 mutation probabilities.

2) For a predefined number of generations *K *= 250 do

1. Create a new population using crossover and mutation. For each pair of parent individuals, two new offspring are created. Thus, for *n *= 30 individuals in *P*, *n*^{2 }= 900 offspring are created. The blend crossover (Eqn. 14) operator as implemented in OpenBeagle [79] was used. Given two parents with mutation parameters , random values , and parameter *α *= 0.5 the children are recombined using the blend crossover operator as follows:

As different mutation strengths were used for each parameter, a Gaussian mutation operator [76] is applied (Eqn. 15). The update rule for mutation parameters *g*_{i }and offspring *x*_{i }for *β *= 0.01 and random values drawn from a Gaussian distribution (zero mean and unit variance) N(0,1) is given by

i. First the mutation probabilities are recombined (crossover) and then mutated.

ii. The weights and offset values are then recombined using the blend crossover operator, followed by mutation using the newly created set of mutation probabilities.

2. Assess the fitness of the newly created population

i. Use each individual set of weights (*w*_{i}, *c*_{i}) P to score and rank the sets of predictions for the roteins in the training set.

ii. Evaluate the BEDROC value (Eqn. 12) for each protein based on the ranks of the hits.

iii. Calculate the fitness for each individual as given by the average BEDROC value.

3. Select the *n *best set of parameters from the *n*^{2 }offspring. These will be the parents for the next generation.

3) At the end of *K *such cycles, record the best performing set of parameters for the current training set.

All the reported ranks for the LZerD predictions are based on the set of weights (Eqn. 11; Table Table5)5) identified by the genetic algorithm (GA) trained on the data set listed in Table Table2.2. The signs of the weights computed for the 3DZD-normal penalty, the buried surface area, and the excluded volume may seem counterintuitive *i.e*. they have positive, negative, and positive values when they are supposed to contribute as a penalty, a reward, and a penalty, respectively. However, they do function as desired, as for a vast majority of the cases, the computed terms are well below the offset values, thus effectively reversing the signs.

### Datasets and experimental setup

The training and testing procedure of the docking prediction is illustrated in Figure Figure8.8. The weighting factors (Eqn. 11) are determined based on a training set of 29 bound-bound protein complexes taken from ZDOCK benchmark datasets 0.0[80] and 1.0 [55]. For each of the 29 protein complexes in the training set, LZerD generated around 50,000-250,000 predictions, which were then classified as hits (predicted complex conformations with an interface RMSD ≤ 2.5 Å) and non-hits. Since the number of predictions was significantly high, the training set was split into 20 subsets each containing 2000 randomly chosen predictions for each protein complex. The subsets were chosen such that 75% of the hits for each complex were included for training. For each of the 20 sets the GA was run 10 times and the best set of weighting factors over each run was recorded. The 200 sets of values were then applied to the whole training dataset covering all predictions and the best performing set of weighting factors was chosen based on the one that yielded the largest average BEDROC score (Eqn. 12). The BEDROC score of the 200 training GA runs is provided as the Additional file 3.

**Implementation of the training phase using the GA**. Twenty splits of the training data set are considered where each split is a collection of 2000 randomly selected predictions that include 75% of the hits from the generated data for each protein complex.

**...**

The weighting factors obtained were then applied to a test set of (84 bound and unbound complexes) taken from the ZDOCK Benchmark 2.0 [21]. Protein complexes in ZDOCK Benchmark 2.0 that are common to the ZDOCK Benchmark 0.0 and 1.0 were removed from the training set thus ensuring an independent test set.

### Evaluation Criteria

All docking predictions were evaluated in terms of the interface (any atom within 10 Å of the other protein is considered to be part of the interface) RMSD between the bound and unbound interface C-*α *with values under 2.5 Å considered a hit[73]. In addition to the interface RMSDs, the ligand RMSDs (LRMSD) *i.e*. RMSD of the backbone atoms of the bound and unbound ligand proteins have also been calculated. According to the CAPRI criteria [73], LRMSD values less than 10 Å fall in the acceptable range. Another criterion, based on the mean of the logarithm of rank of the first hit[50] has also been used in the assessment

Here *N *is the number of complexes and *χ *refers to maximum value of the rank to be considered and is set to 1000. The value of *MLR *ranges between 1 (all rank 1 hits) to *χ *(no hits within the threshold).

## Availability and requirements

The docking program *LZerD *is written in C++. LZerD is freely available to academic institutions and Linux executables can be downloaded from our website, http://kiharalab.org/proteindocking/. The program requires a computer with at least 1.5 GB RAM operated by Linux OS. As the program requires some preprocessing of the PDB files, a shell script has been provided to automate the procedure. Thus, the user is only required to provide the ligand and receptor protein (PDB) files to be docked as input to the script. Output is in the form of PDB files of top ranking ligand orientations. The average times combining both docking and scoring range are about 1-2 hours for small proteins (about 400 points on the receptor and ligand) and it may take longer for larger proteins. This timing is reported on a computer with a dual-core 2.1 GHz processor with 8 GB RAM.

## Abbreviations

3D: three dimensional; 3DZD: 3D Zernike descriptors; LZerD: Local 3D Zernike descriptor-based Docking program.

## Authors' contributions

VV implemented the algorithms, designed and conducted the experiments, and drafted the paper. YDY participated in the development of geometric hashing. LS developed programs for computing the 3DZD and analyzed protein docking interfaces. DK conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

## Acknowledgements

The authors gratefully acknowledge useful discussions with David La during the course of this work. We thank Paul Tuck for proofreading the manuscript. This work has been supported by grants from the National Institutes of Health (R01GM075004). DK also acknowledges grants from NIH (U24 {"type":"entrez-nucleotide","attrs":{"text":"GM077905","term_id":"221375274","term_text":"GM077905"}}GM077905) and National Science Foundation (DMS0604776, DMS0800568, EF0850009, IIS0915801).

## References

- Russell RB, Alber F, Aloy P, Davis FP, Korkin D, Pichaud M, Topf M, Sali A. A structural perspective on protein-protein interactions. Curr Opin Struct Biol. 2004;14:313–324. doi: 10.1016/j.sbi.2004.04.006. [PubMed] [Cross Ref]
- Aloy P, Russell RB. Ten thousand interactions for the molecular biologist. Nat Biotechnol. 2004;22:1317–1321. doi: 10.1038/nbt1018. [PubMed] [Cross Ref]
- Salwinski L, Eisenberg D. Computational methods of analysis of protein-protein interactions. Curr Opin Struct Biol. 2003;13:377–382. doi: 10.1016/S0959-440X(03)00070-8. [PubMed] [Cross Ref]
- Szilagyi A, Grimm V, Arakaki AK, Skolnick J. Prediction of physical protein-protein interactions. Phys Biol. 2005;2:S1–16. doi: 10.1088/1478-3975/2/2/S01. [PubMed] [Cross Ref]
- Ritchie DW. Recent progress and future directions in protein-protein docking. Curr Protein Pept Sci. 2008;9:1–15. doi: 10.2174/138920308783565741. [PubMed] [Cross Ref]
- Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins. 2002;47:409–443. doi: 10.1002/prot.10115. [PubMed] [Cross Ref]
- Tovchigrechko A, Wells CA, Vakser IA. Docking of protein models. Protein Sci. 2002;11:1888–1896. doi: 10.1110/ps.4730102. [PMC free article] [PubMed] [Cross Ref]
- Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein-docking algorithm. Proteins. 2003;52:80–87. doi: 10.1002/prot.10389. [PubMed] [Cross Ref]
- Fischer D, Lin SL, Wolfson HL, Nussinov R. A geometry-based suite of molecular docking processes. J Mol Biol. 1995;248:459–477. [PubMed]
- Wolfson H, Rigoutsos I. Geometric hashing: an overview. IEEE Computational Science Engineering. 1997;4:10–21. doi: 10.1109/99.641604. [Cross Ref]
- Ritchie DW, Kemp GJ. Protein docking using spherical polar Fourier correlations. Proteins. 2000;39:178–194. doi: 10.1002/(SICI)1097-0134(20000501)39:2<178::AID-PROT8>3.0.CO;2-6. [PubMed] [Cross Ref]
- Lensink MF, Mendez R, Wodak SJ. Proteins. 3. Vol. 69. 2007. Docking and scoring protein complexes: CAPRI; pp. 704–718. [PubMed] [Cross Ref]
- Bonvin AM. Flexible protein-protein docking. Curr Opin Struct Biol. 2006;16:194–200. doi: 10.1016/j.sbi.2006.02.002. [PubMed] [Cross Ref]
- Andrusier N, Mashiach E, Nussinov R, Wolfson HJ. Principles of flexible protein-protein docking. Proteins. 2008;73:271–289. doi: 10.1002/prot.22170. [PMC free article] [PubMed] [Cross Ref]
- Krol M, Chaleil RA, Tournier AL, Bates PA. Implicit flexibility in protein docking: cross-docking and local refinement. Proteins. 2007;69:750–757. doi: 10.1002/prot.21698. [PubMed] [Cross Ref]
- Kihara D, Lu H, Kolinski A, Skolnick J. TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc Natl Acad Sci USA. 2001;98:10125–10130. doi: 10.1073/pnas.181328398. [PMC free article] [PubMed] [Cross Ref]
- Lee SY, Skolnick J. Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints. Biophys J. 2008;95:1956–1964. doi: 10.1529/biophysj.108.129759. [PMC free article] [PubMed] [Cross Ref]
- Misura KM, Baker D. Progress and challenges in high-resolution refinement of protein structure models. Proteins. 2005;59:15–29. doi: 10.1002/prot.20376. [PubMed] [Cross Ref]
- Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA. 1992;89:2195–2199. doi: 10.1073/pnas.89.6.2195. [PMC free article] [PubMed] [Cross Ref]
- Gabb HA, Jackson RM, Sternberg MJ. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol. 1997;272:106–120. doi: 10.1006/jmbi.1997.1203. [PubMed] [Cross Ref]
- Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z. Protein-Protein Docking Benchmark 2.0: an update. Proteins. 2005;60:214–216. doi: 10.1002/prot.20560. [PubMed] [Cross Ref]
- Jiang F, Kim SH. "Soft docking": matching of molecular surface cubes. J Mol Biol. 1991;219:79–102. doi: 10.1016/0022-2836(91)90859-5. [PubMed] [Cross Ref]
- Palma PN, Krippahl L, Wampler JE, Moura JJ. BiGGER: a new (soft) docking algorithm for predicting protein interactions. Proteins. 2000;39:372–384. doi: 10.1002/(SICI)1097-0134(20000601)39:4<372::AID-PROT100>3.0.CO;2-Q. [PubMed] [Cross Ref]
- Zacharias M. ATTRACT: protein-protein docking in CAPRI using a reduced protein model. Proteins. 2005;60:252–256. doi: 10.1002/prot.20566. [PubMed] [Cross Ref]
- Sael L, Kihara D. In: Biological Data Mining. Chen J, Lonardi S, editor. Boca Raton, Florida, USA: Chapman & Hall/CRC Press; 2009. Protein surface representation and comparison: New approaches in structural proteomics; pp. 89–109.
- Li B, Turuvekere S, Agrawal M, La D, Ramani K, Kihara D. Characterization of local geometry of protein surfaces with the visibility criterion. Proteins. 2007;71:670–683. doi: 10.1002/prot.21732. [PubMed] [Cross Ref]
- Tseng YY, Dundas J, Liang J. Predicting Protein Function and Binding Profile via Matching of Local Evolutionary and Geometric Surface Patterns. J Mol Biol. 2009;387:451–464. doi: 10.1016/j.jmb.2008.12.072. [PMC free article] [PubMed] [Cross Ref]
- Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005;33:W89–W93. doi: 10.1093/nar/gki414. [PMC free article] [PubMed] [Cross Ref]
- Kahraman A, Morris RJ, Laskowski RA, Thornton JM. Shape variation in protein binding pockets and their ligands. J Mol Biol. 2007;368:283–301. doi: 10.1016/j.jmb.2007.01.086. [PubMed] [Cross Ref]
- Kihara D, Sael L, Chikhi R. Local surface shape-based protein function prediction using Zernike descriptors. Biophys J. 2009;96:650a. doi: 10.1016/j.bpj.2008.12.3435. [Cross Ref]
- Sael L, Li B, La D, Fang Y, Ramani K, Rustamov R, Kihara D. Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins. 2008;72:1259–1273. doi: 10.1002/prot.22030. [PubMed] [Cross Ref]
- Sael L, La D, Li B, Rustamov R, Kihara D. Rapid comparison of properties on protein surface. Proteins. 2008;73:1–10. doi: 10.1002/prot.22141. [PMC free article] [PubMed] [Cross Ref]
- Shentu Z, Al HM, Bystroff C, Zaki MJ. Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins. 2008;70:1056–1073. doi: 10.1002/prot.21600. [PubMed] [Cross Ref]
- Bernauer J, Aze J, Janin J, Poupon A. A new protein-protein docking scoring function based on interface residue properties. Bioinformatics. 2007;23:555–562. doi: 10.1093/bioinformatics/btl654. [PubMed] [Cross Ref]
- Grunberg R, Leckner J, Nilges M. Complementarity of structure ensembles in protein-protein binding. Structure. 2004;12:2125–2136. doi: 10.1016/j.str.2004.09.014. [PubMed] [Cross Ref]
- Segal D, Eisenstein M. The effect of resolution-dependent global shape modifications on rigid-body protein-protein docking. Proteins. 2005;59:580–591. doi: 10.1002/prot.20432. [PubMed] [Cross Ref]
- Zhang Q, Sanner M, Olson AJ. Shape complementarity of protein-protein complexes at multiple resolutions. Proteins. 2009;75:453–467. doi: 10.1002/prot.22256. [PMC free article] [PubMed] [Cross Ref]
- Connolly ML. Shape complementarity at the hemoglobin alpha 1 beta 1 subunit interface. Biopolymers. 1986;25:1229–1247. doi: 10.1002/bip.360250705. [PubMed] [Cross Ref]
- Connolly ML. Shape distributions of protein topography. Biopolymers. 1992;32:1215–1236. doi: 10.1002/bip.360320911. [PubMed] [Cross Ref]
- Hou T, Wang J, Chen L, Xu X. Automated docking of peptides and proteins by using a genetic algorithm combined with a tabu search. Protein Eng. 1999;12:639–648. doi: 10.1093/protein/12.8.639. [PubMed] [Cross Ref]
- Gardiner EJ, Willett P, Artymiuk PJ. Protein docking using a genetic algorithm. Proteins. 2001;44:44–56. doi: 10.1002/prot.1070. [PubMed] [Cross Ref]
- Lawrence MC, Colman PM. Shape complementarity at protein/protein interfaces. J Mol Biol. 1993;234:946–950. doi: 10.1006/jmbi.1993.1648. [PubMed] [Cross Ref]
- Bordner AJ, Gorin AA. Protein docking using surface matching and supervised machine learning. Proteins. 2007;68:488–502. doi: 10.1002/prot.21406. [PubMed] [Cross Ref]
- Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res. 2005;33:W363–W367. doi: 10.1093/nar/gki481. [PMC free article] [PubMed] [Cross Ref]
- Lesk VI, Sternberg MJ. 3D-Garden: a system for modelling protein-protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. Bioinformatics. 2008;24:1137–1144. doi: 10.1093/bioinformatics/btn093. [PubMed] [Cross Ref]
- Jiang F, Kim SH. "Soft docking": matching of molecular surface cubes. J Mol Biol. 1991;219:79–102. doi: 10.1016/0022-2836(91)90859-5. [PubMed] [Cross Ref]
- Chen R, Weng Z. A novel shape complementarity scoring function for protein-protein docking. Proteins. 2003;51:397–408. doi: 10.1002/prot.10334. [PubMed] [Cross Ref]
- Mak L, Grandison S, Morris RJ. An extension of spherical harmonics to region-based rotationally invariant descriptors for molecular shape description and comparison. J Mol Graph Model. 2007;26:1035–1045. doi: 10.1016/j.jmgm.2007.08.009. [PubMed] [Cross Ref]
- Gramada A, Bourne PE. Multipolar representation of protein structure. BMC Bioinformatics. 2006;7:242. doi: 10.1186/1471-2105-7-242. [PMC free article] [PubMed] [Cross Ref]
- Ritchie DW, Kozakov D, Vajda S. Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions. Bioinformatics. 2008;24:1865–1873. doi: 10.1093/bioinformatics/btn334. [PMC free article] [PubMed] [Cross Ref]
- Edmonds AR. Angular momentum in quantum mechanics. Princeton: Princeton University Press; 1957.
- La D, Esquivel-Rodriguez J, Venkatraman V, Li B, Sael L, Ueng S, Ahrendt S, Kihara D. 3D-SURFER: software for high-throughput protein surface comparison and analysis. Bioinformatics. 2009;25:2843–2844. doi: 10.1093/bioinformatics/btp542. [PMC free article] [PubMed] [Cross Ref]
- Sael L, Kihara D. Characterization and classification of local protein surfaces using self-organizing map. International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 2010. in press .
- Novotni M, Klein R. 3D Zernike descriptors for content based shape retrieval. ACM Symposium on Solid and Physical Modeling, Proceedings of the eighth ACM symposium on Solid modeling and applications. 2003. pp. 216–225. full_text.
- Chen R, Mintseris J, Janin J, Weng Z. A protein-protein docking benchmark. Proteins. 2003;52:88–91. doi: 10.1002/prot.10390. [PubMed] [Cross Ref]
- Schneidman-Duhovny D, Inbar Y, Polak V, Shatsky M, Halperin I, Benyamini H, Barzilai A, Dror O, Haspel N, Nussinov R, Wolfson HJ. Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking. Proteins. 2003;52:107–112. doi: 10.1002/prot.10397. [PubMed] [Cross Ref]
- Venkatraman V, Sael L, Kihara D. Potential for protein surface shape analysis using spherical harmonics and 3D Zernike descriptors. Cell Biochem Biophys. 2009;54:23–32. doi: 10.1007/s12013-009-9051-x. [PubMed] [Cross Ref]
- Hubbard SJ, Thornton JM. NACCESS. London: University College London, Department of Biochemistry and Molecular Biology; 1993.
- Das R, Baker D. Macromolecular modeling with rosetta. Annu Rev Biochem. 2008;77:363–382. doi: 10.1146/annurev.biochem.77.062906.171838. [PubMed] [Cross Ref]
- Kolinski A. Protein modeling and structure prediction with a reduced representation. Acta Biochim Pol. 2004;51:349–371. [PubMed]
- Schlick T. Molecular modeling and simulation. New York: Springer-Verlag; 2002.
- Brooks BR, Brooks CL III, Mackerell AD Jr, Nilsson L, Petrella RJ, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caflisch A, Caves L, Cui Q, Dinner AR, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor RW, Post CB, Pu JZ, Schaefer M, Tidor B, Venable RM, Woodcock HL, Wu X, Yang W, York DM, Karplus M. CHARMM: the biomolecular simulation program. J Comput Chem. 2009;30:1545–1614. doi: 10.1002/jcc.21287. [PMC free article] [PubMed] [Cross Ref]
- Grant JA, Pickup BT. A Gaussian Description of Molecular Shape. Journal of Physical Chemistry. 1995;99:3503–3510. doi: 10.1021/j100011a016. [Cross Ref]
- Gabdoulline RR, Wade RC. Effective charges for macromolecules in solvent. Journal of Physical Chemistry. 1996;100:3868–3878. doi: 10.1021/jp953109f. [Cross Ref]
- Lesk VI, Sternberg MJ. 3D-Garden: a system for modelling protein-protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. Bioinformatics. 2008;24:1137–1144. doi: 10.1093/bioinformatics/btn093. [PubMed] [Cross Ref]
- Canterakis N. 3D Zernike moments and Zernike affine invariants for 3D image analysis and recognition. Proc 11th Scandinavian Conference on Image Analysis. 1999. pp. 85–93.
- Dym H, McKean H. Fourier series and integrals. San Diego: Academic Press; 1972.
- Lifshits M, Blayvas I, Goldenberg R, Rivlin E, Rudzsky M. Rehashing for Baysian geometric hasing. Proceedings of the 17th International Conference on ICPR'04. 2004;3:99–102.
- Bebis G, Georgiopoulos M, Lobo ND. Using self-organizing maps to learn geometric hash functions for model-based object recognition. Ieee Transactions on Neural Networks. 1998;9:560–570. doi: 10.1109/72.668897. [PubMed] [Cross Ref]
- Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the Acm. 1998;45:891–923. doi: 10.1145/293347.293348. [Cross Ref]
- Umeyama S. Least-Squares Estimation of Transformation Parameters Between 2 Point Patterns. Ieee Transactions on Pattern Analysis and Machine Intelligence. 1991;13:376–380. doi: 10.1109/34.88573. [Cross Ref]
- Legrand S, Merz K. Rapid approximation to molecular surface area via the use of Boolean logic and look-up tables. J Comp Chem. 1993;14:349–352. doi: 10.1002/jcc.540140309. [Cross Ref]
- Janin J. Assessing predictions of protein-protein interaction: the CAPRI experiment. Protein Sci. 2005;14:278–283. doi: 10.1110/ps.041081905. [PMC free article] [PubMed] [Cross Ref]
- Eyal E, Najmanovich R, McConkey BJ, Edelman M, Sobolev V. Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. Journal of Computational Chemistry. 2004;25:712–724. doi: 10.1002/jcc.10420. [PubMed] [Cross Ref]
- Sebag M, Aze J, Lucas N. ROC-based evolutionary learning: Application to medical data mining. Artificial Evolution. 2004;2936:384–396.
- Meyer-Nieberg S, Hans-Geor B. Parameter Setting in Evolutionary Algorithms. Berlin/Heidelberg: Springer; 2007. Self-adaptation in evolutionary algorithms; pp. 47–75. full_text.
- Truchon JF, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model. 2007;47:488–508. doi: 10.1021/ci600426e. [PubMed] [Cross Ref]
- Truchon JF, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model. 2007;47:488–508. doi: 10.1021/ci600426e. [PubMed] [Cross Ref]
- Gagne C, Parizeau M. Genericity in evolutionary computation software tools: principles and cas-study. Int J Artif Intell Tools. 2006;15:173–194. doi: 10.1142/S021821300600262X. [Cross Ref]
- Chen R, Weng Z. Docking unbound proteins using shape complementarity, desolvation, and electrostatics. Proteins. 2002;47:281–294. doi: 10.1002/prot.10092. [PubMed] [Cross Ref]

**BioMed Central**