Protein-protein docking with reduced potentials by exploiting multi-dimensional energy funnels

†Center for Information & Systems Eng., and Dept. of Manufacturing Eng., Boston University, 15 St. Mary’s St., Brookline, MA 02446, e-mail: yannisp/at/bu.edu, url: http://ionia.bu.edu/.
‡Center for Information & Systems Engineering, and Dept. of Manufacturing Eng., Boston University, e-mail: yangshen/at/bu.edu.
¶Center for Information & Systems Engineering, and Dept. of Manufacturing Eng., Boston University, e-mail: vakili/at/bu.edu.
§Department of Biomedical Engineering, Boston University, e-mail: vajda/at/bu.edu.

The publisher's final edited version of this article is available at Conf Proc IEEE Eng Med Biol Soc.

Abstract

We propose a new computational approach for protein docking that exploits energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. Our approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using a global optimization method we have developed, the Semi-Definite Underestimation (SDU) method, which can exploit a funnel-like energy function. We compared our approach with Monte Carlo on a set of 10 protein complexes using two residue-level potentials. To achieve the same level of performance (produce a near-native ≤ 3 Å RMSD complex) our approach reduces energy evaluations by more than a factor of two, on average.

Keywords: Computational biology, Global optimization, Semi-definite programming, Molecular docking

I. INTRODUCTION

Genome-wide proteomics studies foster the need to understand protein-protein interactions. Although X-ray crystallography has determined the structure of many protein complexes, the number of such complexes is low compared to the total number of structures in the Protein Data Bank (PDB).
Consequently, determining the atomic coordinates of the complex computationally, a problem known as protein docking, is critical. In protein docking, one starts from the structures of the component proteins (the receptor and the ligand), which are assumed known. Because the native complex adopts the lowest Gibbs free energy, the problem can be cast as global optimization of the binding free energy.

According to recent results of the CAPRI protein-protein docking experiment, all successful methods are based on a multistage approach. Such an approach starts with a systematic coarse-grained search, where Fast Fourier Transform (FFT) correlation techniques [1] are widely used. Promising conformational subsets are then explored further, at higher resolution, by more intelligent optimization algorithms. Such “refinement”-stage algorithms include Monte Carlo minimization with simultaneous rigid-body and side-chain optimization [2], and pseudo-Brownian rigid-body docking followed by Biased Probability Monte Carlo minimization of the ligand's interacting side chains [3].

As in protein folding, the protein binding energy landscape is believed to contain extremely rugged funnels [4]. This concept has been validated computationally, at least approximately, in the one-dimensional dissimilarity space of RMSD (Root Mean Square Deviation). Further, it has a physical explanation. Generally, the energy includes forces that act on different spatial scales, resulting in multi-frequency behavior and leading to a huge number of local minima caused by high-frequency terms (such as van der Waals). This suggests that even in a relatively small region, a local search algorithm can be inefficient and easily trapped in kinetic moves [5]. But at local minima where the solute-solute and solute-solvent interfaces are equally well packed without overlaps, the intermolecular van der Waals interactions in the bound state are largely balanced by solute-solvent interactions in the free state.
Therefore, at such local minima the free energy surface is essentially determined by the “smooth” free energy component, which exhibits a funnel-like shape [4]. However, the funnel-like shape has not been sufficiently explored in the higher-dimensional conformational space. This is exactly what our work sets out to achieve.

In this paper we present a stochastic global optimization algorithm for protein docking. It explores the energy funnels in the space of rigid motions and, by exploiting such funnels, guides a rigid-motion pathway toward the global minimum.

II. MATERIALS AND METHODS

A. Reduced Potentials

To better understand the structure of funnels in the orientational subspace, and to remove complicating factors present in the full version of the problem, we work with two models. These are Gō-type models [6], which have been used extensively for understanding protein folding. They classify all possible contacts as “native” (i.e., present in the native structure) or “non-native”. A set of potentials is developed so that native contacts are favorable while non-native ones are not. The major advantage of such reduced models is that they yield a smoother energy landscape.

In the simplified system, both proteins (receptor and ligand) are represented at the residue level. The center of a residue is chosen as the position of the Cβ atom, except for Glycine, where we use the Cα atom since its side chain consists of only a hydrogen atom. The receptor is held fixed and the ligand moves freely as a rigid body. The conformation of the ligand, or more specifically the positions of the ligand’s residues, then becomes a function of conformational variables, which we denote by the vector $x$. Let $\mathcal{R} = \{r_i;\ i = 1, \ldots, I\}$ denote the set of positions of the receptor residues, where $r_i$ is the center of receptor residue $i$, and let $\mathcal{L} = \{r_j(x);\ j = 1, \ldots, J\}$ denote the set of positions of the ligand residues, where similarly $r_j(x)$ is the center of ligand residue $j$.
Let $x^*$ correspond to the native complex and define as “native contacts” all pairs $(i, j)$, where $i$ indexes the $i$th receptor residue and $j$ the $j$th ligand residue, such that $d_l \le \|r_i - r_j(x^*)\| \le d_u$ for some appropriate distances $d_l$, $d_u$ (e.g., $d_l = 3.8$ Å and $d_u = 6.5$ Å). Namely, we call “native” all contacts where the corresponding residues are close enough in the native complex. Denote by $\mathcal{N}$ the set of all these native contacts $(i, j)$ and set $d_{ij}(x) = \|r_i - r_j(x)\|$, so that $d_{ij}(x^*)$ denotes distances in the native complex. Measuring all distances in Å, we consider the following two potentials.

Reduced Potential 1

$F_1(x) = \sum_{(i,j) \in \mathcal{N}} \left(d_{ij}(x) - d_{ij}(x^*)\right)^2 - 100.$

Essentially, there are a number of “springs” connecting the residues of the receptor and the ligand at the interface. These springs have different equilibrium lengths $d_{ij}(x^*)$ but share the same force constant. For physical feasibility we also use a penalty term for space exclusions: $10 \sum_{i,j} 1\{d_{ij}(x) < d_l\}$, where $1\{\cdot\}$ is an indicator function equal to 1 if the corresponding condition is satisfied and 0 otherwise. This term is included in every reduced potential and is not repeated below. Notice that the potential is expected to exhibit a relatively smooth funnel-like shape but is not trivially quadratic in $x$, since $d_{ij}(x)$ is nonlinear in $x$.

Reduced Potential 2

We maintain the penalty for space clashes but replace the native interactions by the term

$F_2(x) = 10 \sum_{(i,j) \in \mathcal{N}} E_{ij}\, 1\{d_l \le d_{ij}(x) \le d_u\} + \sum_{(i,j) \in \mathcal{N}} \left(d_{ij}(x) - d_u\right) 1\{d_{ij}(x) > d_u\}.$

Namely, we consider a strong attraction force for native contacts at distances between $d_l$ and $d_u$ and a linear force for longer-range contacts. The native interactions are weighted with the Miyazawa-Jernigan residue-residue contact energies [7], denoted by $E_{ij}$ for residue pair $(i, j)$.

The purpose of local minimization for these reduced potentials is to remove space clashes at the residue level.
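The two reduced potentials can be illustrated with a minimal Python sketch. It assumes that residue centers are given as 3-D coordinate tuples, that the clash penalty is simply added to each potential, and that the contact energies are supplied as a dictionary; all function and variable names here are hypothetical and not taken from the paper.

```python
import math

D_L, D_U = 3.8, 6.5  # example contact bounds in Å quoted in the text


def pair_distances(receptor, ligand):
    """Return {(i, j): distance} for every receptor/ligand residue pair."""
    return {
        (i, j): math.dist(ri, rj)
        for i, ri in enumerate(receptor)
        for j, rj in enumerate(ligand)
    }


def native_contacts(receptor, native_ligand, d_l=D_L, d_u=D_U):
    """Pairs whose native-complex distance lies in [d_l, d_u], with that distance."""
    d = pair_distances(receptor, native_ligand)
    return {pair: dist for pair, dist in d.items() if d_l <= dist <= d_u}


def clash_penalty(d, d_l=D_L):
    """Space-exclusion term: 10 for every residue pair closer than d_l."""
    return 10.0 * sum(1 for dist in d.values() if dist < d_l)


def potential_1(receptor, ligand, native, d_l=D_L):
    """Spring-like potential over native contacts, shifted by -100, plus penalty."""
    d = pair_distances(receptor, ligand)
    springs = sum((d[pair] - d0) ** 2 for pair, d0 in native.items())
    return springs - 100.0 + clash_penalty(d, d_l)


def potential_2(receptor, ligand, native, energies, d_l=D_L, d_u=D_U):
    """Weighted contact term inside [d_l, d_u], linear term beyond d_u, plus penalty."""
    d = pair_distances(receptor, ligand)
    contact = 10.0 * sum(energies[p] for p in native if d_l <= d[p] <= d_u)
    linear = sum(d[p] - d_u for p in native if d[p] > d_u)
    return contact + linear + clash_penalty(d, d_l)


# Toy usage with one residue per protein (coordinates and energy are made up).
receptor = [(0.0, 0.0, 0.0)]
native = native_contacts(receptor, [(5.0, 0.0, 0.0)])
print(potential_1(receptor, [(7.0, 0.0, 0.0)], native))
print(potential_2(receptor, [(7.0, 0.0, 0.0)], native, {(0, 0): -1.0}))
```

In this sketch, `clash_penalty` is the only term that local minimization has to drive to zero.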
This is simply achieved by incrementally pulling the ligand away from the receptor along the line segment connecting the two centers of mass, until the penalty term becomes zero. The incremental step size we used was 0.2 Å.

B. The SDU Method

We have developed a stochastic optimization method, called the Semi-Definite Underestimator (SDU) algorithm, for funnel-like scoring functions. The method has similarities with the CGU method [5], applied in protein folding, which uses canonical quadratic underestimation to approximate the energy function. However, the rather restricted choice of underestimator in CGU results in problematic performance for challenging protein docking problems. We follow a similar strategy and work on the envelope surface spanned by the set of local minima. This surface inherits the smooth behavior of the low-frequency energy terms. More specifically, we generate a moderate number of local minima and construct a general convex quadratic function that forms the tightest underestimator of all of them. This quadratic function suggests the location of the energy minimum. We use this information to iteratively refine our search.

Constructing an underestimator

Let us denote by $f : \mathbb{R}^n \to \mathbb{R}$ the free energy function we seek to minimize and assume we have obtained a set of $K$ local minima $\phi^1, \ldots, \phi^K$ of $f(\cdot)$. We are interested in constructing a convex quadratic function $U(\phi)$ satisfying $U(\phi^i) \le f(\phi^i)$ for all $i = 1, \ldots, K$, that is, a function that underestimates $f(\cdot)$ at all local minima $\phi^i$, $i = 1, \ldots, K$. More specifically, $U(\phi) \triangleq \phi' Q \phi + b' \phi + c$, where $Q \in \mathbb{R}^{n \times n}$ is a positive semi-definite matrix, $b \in \mathbb{R}^n$, and $c$ is a scalar. The positive semi-definiteness of $Q$ guarantees the convexity of $U(\cdot)$.

In [8] we showed that the problem of finding the tightest possible such underestimator $U(\cdot)$ can be formulated as a Semi-Definite Programming (SDP) problem.
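One way to make “tightest” precise is to minimize the total gap between the energy and the underestimator at the sampled local minima. This particular objective is an assumption made here for illustration and is not quoted from [8]. Under that assumption the construction above can be written as:

```latex
\begin{aligned}
\min_{Q,\, b,\, c} \quad & \sum_{i=1}^{K} \left( f(\phi^i) - U(\phi^i) \right) \\
\text{subject to} \quad & U(\phi^i) \le f(\phi^i), \qquad i = 1, \ldots, K, \\
& U(\phi) = \phi' Q \phi + b' \phi + c, \\
& Q \succeq 0 .
\end{aligned}
```

If the optimal $Q$ happens to be positive definite, setting the gradient $2Q\phi + b$ to zero gives the minimizer of $U$ as $-\tfrac{1}{2} Q^{-1} b$. Because the objective and the $K$ inequality constraints are linear in $(Q, b, c)$ and the only remaining constraint is $Q \succeq 0$, this program is an instance of the SDP problems referred to above.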
These are nonlinear problems that possess special structure and can be solved efficiently using interior-point methods (in time polynomial in the size of the input).

Biased sampling

Suppose we are seeking the global minimum of $f(\cdot)$ in some region and have obtained an underestimator $U(\cdot)$ as described above. Depending on the samples we used, and assuming that the constructed underestimator reflects the general structure of $f(\cdot)$, the minimum of $U(\cdot)$, say $\phi^P$, is expected to lie in the vicinity of the global minimum of $f(\cdot)$. We refer to $\phi^P$ as the predictive conformation. Our sampling method generates points so that the ones close to $\phi^P$ are more likely to be selected, while points with high enough energies are assigned small probabilities (see [8]).

The SDU algorithm (outline) [8]

SDU seeks a global minimum of $f(\cdot)$ in a given search region. It maintains a set of interesting local minima obtained so far, as well as the best such local minimum, denoted by $\phi^G$. It goes through a series of iterations, each consisting of an exploration step and an underestimation step. The exploration step generates points in the current search region using the sampling approach described above. Using each such sample as a starting point, we perform local minimization of $f(\cdot)$ and update the set of local minima to include the new ones while discarding unfavorable ones. Solving the SDP problem described earlier, we derive a new underestimator that underestimates $f(\cdot)$ at all points in this set, which is then used to drive exploration in the next iteration. In [8] we established that SDU converges probabilistically (i.e., with probability converging to 1) to the global minimum of $f(\cdot)$ when applied to a general class of funnel-shaped functions, as the number of local minima used for underestimating $f(\cdot)$ grows.

C. Search Space and Our Strategy

For protein docking, we optimize the free energy over the 6-dimensional (6D) space, denoted by SE(3), of translations and rotations of the ligand with respect to the receptor.
SE(3) is the semidirect product of two distinct subspaces: $\mathbb{R}^3$ (translations) and SO(3) (the rotation group). $\mathbb{R}^3$ is a Euclidean space without any curvature. As to SO(3), Euler first showed that SO(3) is a 3-dimensional manifold, and at least 5 parameters are required to represent it in a global one-to-one manner. One can also view optimization in SO(3) as taking place in the set of 3 × 3 rotation matrices $R$ satisfying $R'R = I$ and $\det(R) = 1$, where $I$ is the identity matrix and $\det(\cdot)$ denotes the determinant. Various other parametrizations of SO(3) are available, e.g., Euler angles, exponential coordinates, and quaternions (see [9]).

The translational space $\mathbb{R}^3$ is Euclidean, so applying SDU there is more straightforward. We tested on unbound ligands for all complexes in the benchmark set of [10]. For each complex, we considered the bound orientation and pulled the ligand 7 Å away along the line segment connecting the two centers of mass. We performed translational optimization while keeping the ligand’s orientation fixed to the native one. Out of the 48 protein complexes in the benchmark set, SDU finds 19 predictions with < 1 Å RMSD and 40 with < 3 Å RMSD.

SO(3), on the other hand, is a nonlinear manifold. It can be locally approximated by a Euclidean space, but in such small regions the energy landscape is too rugged and too flat to provide useful guidance. As indicated in Sec. I, the protein-docking community has explored 1-dimensional energy funnels (in RMSD plots); yet, high-dimensional funnels in SE(3) or SO(3) have neither been validated nor explored. To overcome these difficulties, the strategy we adopt is to pursue optimization separately in each of the two subspaces, the translational subspace $\mathbb{R}^3$ and the orientational subspace SO(3), and then weave these moves into a coordinated movement of the ligand towards the receptor.

D. Parametrizations of the Rotation Group

To optimize these reduced potentials over $x \in SE(3)$, the parametrization of SO(3) is critical.
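Among the parametrizations listed above, exponential coordinates can be made concrete with a short sketch. It maps a vector $\omega \in \mathbb{R}^3$ to a rotation matrix through the matrix exponential of the associated skew-symmetric matrix, evaluated with the standard Rodrigues formula. The formula is a well-known identity rather than something stated in the paper, and the function names (`hat`, `exp_so3`, `rotation_from_params`) are hypothetical.

```python
import math


def hat(w):
    """Skew-symmetric 3x3 matrix built from the elements of the vector w."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def exp_so3(w):
    """Rodrigues formula: exp(W) = I + (sin t / t) W + ((1 - cos t) / t^2) W^2."""
    theta = math.sqrt(sum(c * c for c in w))
    big_w = hat(w)
    big_w2 = matmul(big_w, big_w)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of the two coefficients as theta -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [
        [(1.0 if i == j else 0.0) + a * big_w[i][j] + b * big_w2[i][j]
         for j in range(3)]
        for i in range(3)
    ]


def rotation_from_params(r0, w):
    """Rotation reached from the reference rotation r0 with local parameters w."""
    return matmul(r0, exp_so3(w))


# Usage: a quarter turn about the z-axis, starting from the identity.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for row in rotation_from_params(identity, (0.0, 0.0, math.pi / 2)):
    print([round(v, 6) for v in row])
```

Because the parameters live in an unconstrained Euclidean space, a box around a reference orientation can be searched without enforcing orthogonality constraints explicitly.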
The shortest curves, or minimal geodesics, on the SO(3) manifold are of the form $R(t) = R_0 e^{\Omega t}$, where $\Omega$ is a 3 × 3 skew-symmetric matrix formed by the elements of some vector $\omega \in \mathbb{R}^3$. On this curve, $R_0 \in SO(3)$ is the initial point and $R(1)$ is the final point. Hence, a parametrization of SO(3) can be obtained by mapping rotation matrices $R \in SO(3)$ to vectors $\omega \in \mathbb{R}^3$ through the relationship $R = R_0 e^{\Omega}$, where $\Omega$ is the skew-symmetric matrix formed by the elements of $\omega$. This parametrization produces sufficiently deep and wide funnels of protein binding energies.

E. Coordinating Translational and Rotational Moves

The ultimate goal in refinement-stage docking is to start from some initial position and orientation of the ligand and move within a given region of the conformational space to reach the minimum-energy complex in that region. If the region we are working with contains the native complex, then our objective is to produce a high-quality (< 1–3 Å RMSD) approximation. Let us represent the energy function to be minimized as $f(r, \omega)$, where the receptor is held fixed, $r \in \mathbb{R}^3$ denotes the position of the ligand, and $\omega \in \mathbb{R}^3$ maps to an orientation of the ligand in SO(3) using the exponential parametrization described above. We use the following series of translational/orientational adjustments:

1. Orientational Adjustment: Start with a given starting structure characterized by $(r^0, \omega^0)$ and optimize the energy over $\omega$ within some hypercube centered at $\omega^0$ while keeping the translation vector fixed at $r^0$. This optimization can be done using our SDU algorithm, as $\omega$ is unconstrained in a Euclidean space. Suppose this yields a new structure characterized by $(r^0, \omega^1)$; this is a structure that has oriented itself to minimize the energy potential.
2. Translational Adjustment: Start with $(r^0, \omega^1)$ and now optimize over $r$ in some region centered at $r^0$. That is, using the $\omega^1$ orientation, make a “step” towards the receptor in order to minimize the energy potential.
Again, this can be done using our SDU algorithm.
3. Convergence Criterion: Repeat steps 1 and 2 above until no significant movement of the ligand is observed.

We note that even though each translational and orientational adjustment takes place in some small region of $\mathbb{R}^3$ and SO(3), respectively, it is achieved by global optimization. That is, these moves are not local (in the sense of local optimization) and involve overcoming significant energy barriers.

III. RESULTS

We tested the proposed approach against a standard Monte Carlo minimization using the Metropolis acceptance criterion. The test set consists of 10 arbitrarily chosen bound protein complexes from the benchmark set of [10]: 5 enzyme-inhibitor complexes (1AVW, 1BRC, 1CSE, 2KAI and 2PTC) and 5 antigen-antibody or other complexes (1A0O, 1AHW, 1AVZ, 1MLC and 2JEL). We place the ligand about 10 Å away from the receptor and orient it by rotating the native orientation by 0.8 radian on average (about 46°) around an arbitrary axis. On average, this results in ligands 15–20 Å away (binding site of the ligand, Cα) from the native one. Our initial search region consists of a 12 Å cube in the translational subspace and a 0.3 radian cube in the orientational subspace of local exponential parameters. The native structure $x^*$ lies outside this region. Samples are generated uniformly in both of these regions until underestimators are constructed. Then biased sampling guided by the underestimators is adopted.

Although the cost of function evaluations is negligible with these reduced potentials, we should keep in mind that when docking actual proteins, function evaluations (e.g., using CHARMM [11] or other complex potentials) dominate all other tasks. Hence, we limit both algorithms to at most 1000 energy evaluations per sample path, and we collected 100 sample paths for each algorithm per protein complex. We report the success rates $\eta$ of both algorithms for each protein, where “success” is defined based either on energy ($F(x) \le 0.95 \times F(x^*)$, cf. Table I) or on similarity (RMSD ≤ 3 Å, cf. Table II). $K$ is the number of sample paths (or, equivalently, energy evaluations in the thousands) required to guarantee a success with probability $P = 95\%$: $K = \max\{\log_{1-\eta}(1 - P),\ 1\}$. The last row of each table reports average results over all 10 complexes.
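The expression for $K$ follows from requiring that at least one of $K$ independent sample paths succeeds with probability at least $P$, i.e., $1 - (1 - \eta)^K \ge P$, which gives $K \ge \ln(1 - P) / \ln(1 - \eta)$. A minimal sketch of this calculation is shown below; the function name `required_paths` and the success rates in the usage lines are hypothetical and are not values from Tables I or II.

```python
import math


def required_paths(eta, p=0.95):
    """Sample paths needed so that at least one succeeds with probability p.

    eta is the per-path success rate; each path is assumed independent.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if eta == 0.0:
        return math.inf  # no finite number of paths can guarantee a success
    if eta == 1.0:
        return 1.0  # every path succeeds, so one path suffices
    # log base (1 - eta) of (1 - p), floored at one sample path
    return max(math.log(1.0 - p) / math.log(1.0 - eta), 1.0)


# Usage with made-up success rates.
for eta in (0.1, 0.5, 0.99):
    print(eta, round(required_paths(eta), 2))
```

Since each sample path uses at most 1000 energy evaluations, multiplying the result by 1000 gives an upper bound on the corresponding number of energy evaluations.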
Based on the results above, we clearly see that SDU outperforms Monte Carlo. Using the more challenging reduced potential 2 and defining success based on energy, SDU needs on average 7,930 energy evaluations compared to 11,970 for Monte Carlo in order to reach the same level of performance, an efficiency gain of 33.8% (cf. Table I). If instead we define success based on similarity, the corresponding numbers are 5,900 for Monte Carlo and 2,830 for SDU, i.e., SDU cuts energy evaluations by more than a factor of two (cf. Table II). It can also be seen that for antigen-antibody or other complexes the gap is even wider. In docking with more complex potentials, these comparisons reflect the relative computational requirements of the two algorithms, since energy evaluations would dominate all other tasks. This significant gain in efficiency is due to SDU’s ability to exploit energy funnels and thus zoom in more quickly on interesting regions of the energy landscape. We acknowledge that standard Monte Carlo minimization can be improved by simulated annealing-type modifications. Yet, we do not expect this to drastically change the qualitative advantage of SDU, which benefits from the global structure of the energy landscape rather than relying on mostly local moves (as Monte Carlo does).

IV. CONCLUSION

We developed a new computational approach for protein docking that exploits energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. The approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using the Semi-Definite Underestimation (SDU) method, introduced in [8], which can exploit funnel-like energy functions. Using two residue-level potentials, we compared our approach with Monte Carlo on a set of 10 protein complexes, including enzyme-inhibitor, antigen-antibody, and other complexes.
To achieve the same level of performance (produce a near-native ≤ 3Å RMSD complex) our approach reduces energy evaluations by more than a factor of two, on average. If instead the requirement is to produce a conformation with energy within 5% of the native, then the efficiency gain is 33.8%, on average.

Acknowledgments

Research partially supported by the NSF under grants DMI-0330171, ECS-0426453, CNS-0435312, DMI-0300359, EEC 0088073, by the NIH under grant R01-GM61867, and by the ARO under the ODDR&E MURI2001 Grant DAAD19-01-1-0465.

REFERENCES

1. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser I. Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. USA. 1992;89:2195–2199.
2. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol. 2003;331:281–299.
3. Fernandez-Recio J, Totrov M, Abagyan R. Identification of protein-protein interaction sites from docking energy landscapes. J. Mol. Biol. 2004;335(3):843–865.
4. Camacho C, Vajda S. Protein docking along smooth association pathways. Proc. Natl. Acad. Sci. USA. 2001;98:10636–10641.
5. Phillips A, Rosen J, Dill K. Convex Global Underestimation for Molecular Structure Prediction. In: Pardalos PM, et al., editors. From Local to Global Optimization. Kluwer Academic Publishers; 2001. pp. 1–18.
6. Taketomi H, Ueda Y, Gō N. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int. J. Pept. Protein Res. 1975;7:445–459.
7. Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 1996;256(3):623–644.
8. Paschalidis IC, Shen Y, Vajda S, Vakili P. A semi-definite programming-based underestimation method for global optimization in molecular docking. Proceedings of the 44th IEEE Conference on Decision and Control; Seville, Spain. 2005. pp. 3675–3680.
9. Stuelpnagel J. On the parametrization of the three-dimensional rotation group. SIAM Review. 1964;6(4):422–430.
10. Chen R, Mintseris J, Janin J, Weng Z. A protein-protein docking benchmark. Proteins. 2003;52:88–91.
11. Brooks B, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983;4:187–217.