
# Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins

^{*}To whom correspondence should be addressed. E-mail: jmccammon@ucsd.edu

## 1. Introduction

The term molecular mechanics (MM) refers to the use of simple potential-energy functions (e.g., harmonic oscillator or Coulombic potentials) to model molecular systems. Molecular mechanics approaches are widely applied in molecular structure refinement, molecular dynamics (MD) simulations, Monte Carlo (MC) simulations, and ligand-docking simulations.

Typically, molecular mechanics models consist of spherical atoms connected by springs which represent bonds. Internal forces experienced in the model structure are described using simple mathematical functions. For example, Hooke’s law is commonly used to describe bonded interactions, and the nonbonded atoms might be treated as inelastic hard spheres or may interact according to a Lennard-Jones potential. Using these simple models, a molecular dynamics simulation numerically solves Newton’s equations of motion, thus allowing structural fluctuations to be observed with respect to time.
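The two functional forms just mentioned — a Hooke's-law spring for a bond and a Lennard-Jones potential for a nonbonded pair — are simple enough to write down directly. The Python sketch below illustrates both; the parameter values are illustrative and not drawn from any particular force field:

```python
def harmonic_bond_energy(r, r0, k):
    """Hooke's-law bond term: V = k * (r - r0)^2 (force-field convention, no 1/2 factor)."""
    return k * (r - r0) ** 2

def lennard_jones_energy(r, epsilon, r_min):
    """Lennard-Jones 12-6 term written with the well minimum located at r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

# At the separation r = r_min the pair sits at the bottom of the well, V = -epsilon.
print(lennard_jones_energy(3.4, epsilon=0.2, r_min=3.4))  # -0.2
```

At separations below *r*_{min} the *r*^{−12} repulsion dominates, mimicking the hard-sphere behavior mentioned above.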

Dynamic simulation methods are widely used to obtain information on the time evolution of conformations of proteins and other biological macromolecules^{1}^{–}^{4} and also kinetic and thermodynamic information. Simulations can provide fine detail concerning the motions of individual particles as a function of time. They can be utilized to quantify the properties of a system at a precision and on a time scale that is otherwise inaccessible, and simulation is, therefore, a valuable tool in extending our understanding of model systems. Theoretical consideration of a system additionally allows one to investigate the specific contributions to a property through “computational alchemy”,^{5} that is, modifying the simulation in a way that is nonphysical but nonetheless allows a model’s characteristics to be probed. One particular example is the artificial conversion of the energy function from that representing one system to that of another during a simulation. This is an important technique in free-energy calculations.^{6} Thus, molecular dynamics simulations, along with a range of complementary computational approaches, have become valuable tools for investigating the basis of protein structure and function.

This review offers an outline of the origin of molecular dynamics simulation for protein systems and how it has developed into a robust and trusted tool. This review then covers more recent advances in theory and an illustrative selection of practical studies in which it played a central role. The range of studies in which MD has played a considerable or pivotal role is immense, and this review cannot do justice to them; MD simulations of biomedical importance were recently reviewed.^{4} Particular emphasis will be placed on the study of dynamic aspects of protein recognition, an area where molecular dynamics has scope to provide broad and far-ranging insights. This review concludes with a brief discussion of the future potential offered to advancement of the biological and biochemical sciences and the remaining issues that must be overcome to allow the full extent of this potential to be realized.

### 1.1. Historical Background

MD methods were originally conceived within the theoretical physics community during the 1950s. In 1957, Alder and Wainwright^{7} performed the earliest MD simulation using the so-called hard-sphere model, in which the atoms interacted only through perfect collisions. Rahman^{8} subsequently applied a smooth, continuous potential to mimic real atomic interactions. During the 1970s, as computers became more widespread, MD simulations were developed for more complex systems, culminating in 1976 with the first simulation of a protein^{9}^{,}^{10} using an empirical energy function constructed using physics-based first-principles assumptions. MD simulations are now widely and routinely applied and especially popular in the fields of materials science^{11}^{,}^{12} and biophysics.

As will be discussed later in this review, a variety of experimental conditions may be simulated with modern theories and algorithms. The initial simulations only considered single molecules in vacuo. Over time, more realistic or at least biologically relevant simulations could be performed. This trend is continuing today.

The initial protein MD simulation, of the small bovine pancreatic trypsin inhibitor (BPTI), covered only 9.2 ps of simulation time. Modern simulations routinely have so-called equilibration periods much longer than that, and production simulations of tens of nanoseconds are routine, with the first microsecond MD simulation being reported in 1998.^{13} In addition, the original BPTI simulation included only about 500 atoms rather than the 10^{4}-10^{6} atoms that are common today. While much of this advancement results from an immense increase in availability of computing power, major theoretical and methodological developments also contribute significantly.

The number of publications regarding MD theory and application of MD to biological systems is growing at an extraordinary pace. A single review cannot do justice to the recent applications of MD. Using data from ISI Web of Science, the authors estimate that during 2005 at least 800 articles will be published that discuss molecular dynamics and proteins. The historical counts are shown in Figure 1.

### 1.2. Protein Dynamics

The various dynamic processes that can be characterized for proteins have time scales ranging from femtoseconds to hours. They also cover an extensive range of amplitudes and energies. Many of these motions have critical roles in biochemical functions.^{14} Rapid and localized motions may play a role in enzymatic reactions. Slower motions that occur on the scale of whole proteins include allosteric coupling^{15} and folding transitions. Subunit associations occur over even longer distances.

Simulations of the longer time scale folding events are covered elsewhere in this issue.^{16} Characteristic time scales for protein motions are shown in Table 1.

## 2. Application of Molecular Dynamics in the Study of Biomolecular Phenomena

Molecular dynamics can now be routinely applied in the investigation of a wide range of dynamic properties and processes by researchers in numerous fields, including structural biochemistry, biophysics, enzymology, molecular biology, pharmaceutical chemistry, and biotechnology. Using MD simulations, one is able to study thermodynamic properties and time-dependent (i.e., kinetic) phenomena. This enables an understanding to be developed of various dynamic aspects of biomolecular structure, recognition, and function. However, when used alone, MD is of limited utility. An MD trajectory (i.e., the progress of simulated structure with respect to time) generally provides data only at the level of atomic positions, velocities, and single-point energies. To obtain the macroscopic properties in which one is usually interested requires the application of *statistical mechanics*, which connects microscopic simulations and macroscopic observables.

Statistical mechanics provides a rigorous framework of mathematical expressions that relate the distributions and motions of atoms and molecules to macroscopic observables such as pressure, heat capacity, and free energies.^{17}^{,}^{18} Extraction of these macroscopic observables is therefore possible from the microscopic data, and one can predict, for instance, changes in the binding free energy of a particular drug candidate or the mechanisms and energetic consequences of conformational change in a particular protein.
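As a minimal illustration of this microscopic-to-macroscopic connection, the constant-volume heat capacity can be estimated from equilibrium energy fluctuations via *C*_{V} = (⟨*E*^{2}⟩ − ⟨*E*⟩^{2})/(*k*_{B}*T*^{2}). The helper below is a sketch in reduced units; the function name and inputs are illustrative rather than part of any standard package:

```python
def heat_capacity(energies, temperature, k_B=1.0):
    """Fluctuation formula: C_V = (<E^2> - <E>^2) / (k_B * T^2), in reduced units."""
    n = len(energies)
    mean_e = sum(energies) / n
    mean_e2 = sum(e * e for e in energies) / n
    return (mean_e2 - mean_e ** 2) / (k_B * temperature ** 2)
```

In practice the `energies` sequence would come from snapshots of an equilibrated trajectory, and the estimate converges only as fast as the underlying sampling.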

Specific aspects of biomolecular structure, kinetics, and thermodynamics that may be investigated via MD include, for example, macromolecular stability,^{19} conformational and allosteric properties,^{20} the role of dynamics in enzyme activity,^{21}^{,}^{22} molecular recognition and the properties of complexes,^{21}^{,}^{23} ion and small molecule transport,^{24}^{,}^{25} protein association,^{26} protein folding,^{27}^{,}^{16} and protein hydration.^{28}

MD, therefore, provides the opportunity to perform a variety of studies including molecular design (drug design^{29} and protein design^{30}) and structure determination and refinement (X-ray,^{31} NMR,^{32} and modeling^{33}).

## 3. Molecular Dynamics Methods and Theory

Given the structure of a biomolecular system, that is, the relative coordinates of the constituent atoms, there are various computational methods that can be used to investigate and study the dynamics of that system. In the present section, a number of such methods are described and discussed. The majority of important dynamics methodologies are highly dependent upon the availability of a suitable potential-energy function to describe the energy landscape of the system with respect to the aforementioned atomic coordinates. This critical aspect is, therefore, introduced first.

### 3.1. Potential Functions and the Energy Landscape

Choice of an appropriate energy function for describing the intermolecular and intramolecular interactions is critical to a successful (i.e., valid yet tractable) molecular dynamics simulation. In conventional MD simulations, the energy function for nonbonded interactions tends to be a simple pairwise additive function (for computational reasons) of nuclear coordinates only. This use of a single nuclear coordinate to represent atoms is justified in terms of the Born-Oppenheimer approximation.^{34} For bonded groups of atoms, that is those that form covalent bonds, bond angles, or dihedral angles, simple two-body, three-body, and four-body terms are used, as described below.

The energy functions usually consist of a large number of parametrized terms. These parameters are chiefly obtained from experimental and/or quantum mechanical studies of small molecules or fragments, and it is assumed that such parameters may be transferred to the larger molecule of interest. The set of functions along with the associated set of parameters is termed a force field. A variety of force fields have been developed specifically for simulation of proteins.

There are notable exceptions, but it is usual for a force field to be purely additive. For instance, bond lengths are not considered to be dependent on the bond angles, and atomic partial charges are fixed in magnitude. This is generally considered to give a reasonable, although not flawless,^{35} approximation of the potential-energy landscape. Alternative methods for probing dynamics might demand additional restrictions or properties to be satisfied by the potential functions, and these are detailed in the appropriate sections of this review.

Some force fields, often termed class II force fields, do incorporate cross terms or higher order terms.^{36}^{–}^{38} These class II force fields were typically developed to reproduce vibrational spectra accurately or treat structures with geometries far from their equilibrium values.

One fairly typical and widely applied force field is the CHARMM22 force field.^{39}^{–}^{41} Like all widely applied force fields, it consists of several discrete terms. Each of these terms possesses a simple functional form and describes an intermolecular or intramolecular force exhibited within the system given the set of relative atomic coordinates:

$$
V(\mathbf{r}) = \sum_{\text{bonds}} K_d (d - d_0)^2 + \sum_{\text{UB}} K_{UB} (S - S_0)^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} K_\chi \left[ 1 + \cos(n\chi - \delta) \right] + \sum_{\text{impropers}} K_\phi (\phi - \phi_0)^2 + \sum_{\text{nonbonded}} \left\{ \varepsilon_{ij} \left[ \left( \frac{R_{ij}^{\min}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{ij}^{\min}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{\varepsilon_l r_{ij}} \right\} \quad (1)
$$

where *K*_{d}, *K*_{UB}, *K*_{θ}, *K*_{χ}, and *K*_{ϕ} are the bond length, Urey-Bradley (1–3 bond length), bond angle, dihedral angle, and improper dihedral angle force constants, respectively. Likewise, *d*, *S*, *θ*, *χ*, and *ϕ* are the bond length, Urey-Bradley (1–3 bond length), bond angle, dihedral angle, and improper dihedral angle values exhibited in the current configuration; the zero subscript represents the reference, or equilibrium, values, while *n* and *δ* are the dihedral multiplicity and phase. These terms represent the bonded interactions. The final term in the function represents the nonbonded interactions, incorporating Coulombic and Lennard-Jones interactions. ε_{ij} relates to the Lennard-Jones well depth, *R*_{ij}^{min} is the distance at which the Lennard-Jones potential reaches its minimum, *q*_{i} is the partial atomic charge of atom *i*, ε_{l} is the effective dielectric constant, and *r*_{ij} is the distance between atoms *i* and *j*. The Lorentz-Berthelot combination rules^{42} are used to obtain the necessary Lennard-Jones parameters for each pair of different atoms; ε_{ij} values are the geometric mean of the ε_{ii} and ε_{jj} values, while *R*_{ij}^{min} values are the arithmetic mean of the *R*_{ii}^{min} and *R*_{jj}^{min} values. Values for the atomic partial charges, *q*_{i}, are determined from a template-based scheme, with charges often modified to reproduce dielectric shielding effects (i.e., to mimic some of the effects of shielding from a high dielectric constant solvent). The dielectric constant ε_{l} is usually set to unity for simulations incorporating explicit solvent representations.

Using eq 1 one may evaluate the potential energy, *V*(*r*), of the system from a single set of atomic coordinates since the relevant distances and angles are easily determined. The energy is that of a single instance, termed a snapshot, of the system.

*V*(*r*) includes contributions from every bond angle and dihedral angle in the system; however, it might be noted that many Urey-Bradley terms and improper dihedral angles are not used. Only those that are required for fitting computational results to observable vibrational spectra are utilized. In addition, the linear term in the original Urey-Bradley function is not incorporated at all. This is a reasonable approximation because it has been shown that, when Cartesian coordinates are used, only the quadratic term is required for determining vibrational frequencies.^{43} The nonbonded terms are applied to all atoms except those attached through one or two covalent bonds. In certain, specific, cases the Lennard-Jones term is adjusted for atoms connected through three covalent bonds in order to accurately reproduce experimentally observed structures. An example of such a case is the nitrogen and oxygen atoms of amides.
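A sketch of the nonbonded evaluation implied by eq 1, including the Lorentz-Berthelot mixing of per-atom Lennard-Jones parameters, follows. The Coulomb prefactor shown (≈332.06 kcal·Å/(mol·e²)) assumes distances in angstroms and charges in units of *e*; the exact constant varies slightly between simulation packages:

```python
import math

def combine_lj(eps_ii, eps_jj, rmin_ii, rmin_jj):
    """Lorentz-Berthelot mixing: geometric mean for well depths, arithmetic for R_min."""
    return math.sqrt(eps_ii * eps_jj), 0.5 * (rmin_ii + rmin_jj)

def nonbonded_energy(r_ij, q_i, q_j, eps_ij, rmin_ij, eps_l=1.0, coulomb_const=332.06):
    """One pair's Lennard-Jones plus Coulomb energy, as in the final term of eq 1."""
    ratio = rmin_ij / r_ij
    lj = eps_ij * (ratio ** 12 - 2.0 * ratio ** 6)
    coulomb = coulomb_const * q_i * q_j / (eps_l * r_ij)
    return lj + coulomb
```

In a real simulation this pair evaluation would be skipped for 1–2 and 1–3 bonded neighbors, consistent with the exclusions described above.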

For the purposes of MD, it is advantageous for the force field to have efficiently accessible first and second derivatives with respect to atomic position (which correspond to the physical characteristics of atomic force and force gradients, respectively), and this is one of the more notable reasons for the very simple mathematical forms generally chosen.

The CHARMM force fields have been separately parametrized for proteins,^{39} nucleic acids,^{44} lipids,^{45} and carbohydrates^{46}^{,}^{47} with the goal of consistency between these sets, allowing for simulation of heterogeneous systems. Different force fields exist for small organic molecules^{48}^{–}^{51} and nonbiological macromolecules such as zeolites.

### 3.2. Energy Minimization

Although not strictly a dynamics method, energy minimization is a fundamental concept upon which much of the theory discussed in this review is built.

Given a set of *N* independent variables, *r*, where *r* = (*r*_{1}, *r*_{2}, *r*_{3}, …, *r*_{N}), the task is to find the values for each of those variables, termed *r*_{min}, for which a particular function, *V*(*r*), has its global minimum. In the case of a molecular mechanics protein model, *N* is typically three times the number of atoms (resulting from three degrees of freedom per atom), *r* encodes the atomic coordinates (e.g., the Cartesian coordinates), and *V* is the potential energy as given by an equation such as eq 1. It may be seen that computationally this task is a nonlinear optimization problem.

Numerous algorithms exist for solving such nonlinear optimization problems, and a small selection of these are widely applied in molecular mechanics modeling of proteins. Relevant algorithms are reviewed elsewhere.^{52}^{,}^{53}

It is extremely difficult to locate the global minimum of a general nonlinear function with more than a dozen independent variables. Typical biomolecular systems with as few as a hundred atoms will be described with on the order of 300 variables; thus, it is usually impossible to provably locate the global minimum. Also, while energy minimization methods may be used to efficiently refine molecular structures, they are totally inadequate for sampling conformational space. Given an unrefined molecular structure with bond angles and lengths distorted from their respective minima or with steric clashes between atoms, energy minimization methods can be very useful for correcting these flaws and are therefore routinely applied to protein systems. The most popular methods include those that use derivatives of various orders, including the first-order (i.e., utilizes first-order derivatives) steepest descent and conjugate gradient methods and the second-order (i.e., utilizes second-order derivatives) Newton-Raphson method.

The steepest descent method is one of several first-order iterative descent methods. These all utilize the gradient of the potential-energy surface, which directly relates to forces in the MM description of molecular systems, to guide a search path toward the nearest energy minimum. Because this corresponds to reducing the potential energy by moving atoms in response to the force exerted on them by the remainder of the system, this method is attractive in that its behavior may be considered physically meaningful. Formally, the force vector is defined as *F*(*r*) = −∇*V*(*r*), where *r* is the vector of atomic coordinates.

In all of the iterative descent methods, a succession of atomic configurations is generated by applying, for iteration *k*, the relationship *x*(*k*) = *x*(*k* − 1) + *λ*(*k*)*F*(*k*), where the vector *x* represents the 3*N* dimensional configuration, *λ*(*k*) is a step size, and *F*(*k*) is the force vector. The step size for the first iteration is usually selected arbitrarily or else by some simple heuristic. After every iteration this step size is adjusted according to whether the overall potential energy of the system was reduced or increased by that step. If the energy increased, it is assumed that the step size was large enough to jump over the local minimum along the search direction, and accordingly the step size is reduced by some multiplicative factor, typically 0.5. In the event that the energy was indeed reduced, the step size is increased by some factor, typically around 1.2. This continual adjustment keeps the step size roughly appropriate for the particular curvature of the potential-energy function in the region of interest. While the steepest descent method is highly inefficient for multidimensional problems with irregular potential surfaces containing multiple local minima, it is robust in locating the closest local minimum. Consequently, the global motions required to locate the global energy minimum will not be observed. Nonetheless, it is very effective in removing steric conflicts and relaxing bond lengths and bond angles to their canonical values.
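The step-size adaptation just described (shrink by 0.5 after an uphill move, grow by roughly 1.2 after a downhill one) can be sketched as follows; `energy` and `grad` stand in for the force-field energy and its gradient, and the names are illustrative:

```python
def steepest_descent(energy, grad, x, step=0.1, max_iter=1000, tol=1e-10):
    """Steepest descent with the multiplicative step-size heuristic from the text."""
    e_old = energy(x)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break                                          # force is essentially zero
        trial = [xi - step * gi for xi, gi in zip(x, g)]   # move down the gradient
        e_new = energy(trial)
        if e_new < e_old:
            x, e_old = trial, e_new                        # downhill: accept, grow step
            step *= 1.2
        else:
            step *= 0.5                                    # uphill: reject, shrink step
    return x

# A well-conditioned quadratic converges to its single minimum at (1, 1).
x_min = steepest_descent(lambda x: sum((xi - 1.0) ** 2 for xi in x),
                         lambda x: [2.0 * (xi - 1.0) for xi in x],
                         [0.0, 0.0])
```

On a rough multidimensional surface this same loop would only reach the nearest local minimum, exactly as discussed above.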

The Newton-Raphson method is a popular second-derivative method, although it requires some simple modifications before it is suited to typical biomolecular MM systems. The basic method relies on the assumption that, at least in the region of the minima, the potential energy is quadratically related to the individual variables:

$$V(x_i) \approx a + b x_i + c x_i^2$$

where *a*, *b*, and *c* are constants. This leads to first and second derivatives of

$$\frac{\mathrm{d}V(x_i)}{\mathrm{d}x_i} = b + 2c x_i, \qquad \frac{\mathrm{d}^2 V(x_i)}{\mathrm{d}x_i^2} = 2c$$

At the minimum, d*V*(*x*_{min})/d*x* = 0, so *x*_{min} may be calculated using

$$x_{\min} = x_i - \frac{\mathrm{d}V(x_i)/\mathrm{d}x_i}{\mathrm{d}^2 V(x_i)/\mathrm{d}x_i^2}$$

For quadratic surfaces, no iterative searching is necessary since the exact minimum may be determined from the current configuration and the derivatives at that configuration. Unfortunately, biomolecular MM systems tend to be extremely nonquadratic and also contain many local minima. These characteristics render the basic Newton-Raphson method less useful. However, it has found widespread use as a method for efficiently completing the optimization performed via an alternative method. One modified form of the method, adopted basis set Newton-Raphson (ABNR), is very effective for large biomolecular systems.^{41}
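In one dimension the update implied by the derivatives above is *x*_{min} = *x* − *V*′(*x*)/*V*″(*x*); iterated, this is the basic Newton-Raphson scheme. A sketch (on a truly quadratic surface a single step suffices; the function names are illustrative):

```python
def newton_raphson_min(dV, d2V, x, max_iter=50, tol=1e-12):
    """Iterate x <- x - V'(x)/V''(x) until the step size falls below tol."""
    for _ in range(max_iter):
        step = dV(x) / d2V(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# V(x) = (x - 2)^2 is exactly quadratic, so the first step lands on x_min = 2.
x_min = newton_raphson_min(lambda x: 2.0 * (x - 2.0), lambda x: 2.0, 0.0)
```

On the strongly nonquadratic surfaces of biomolecular systems this bare iteration can diverge, which is why modified forms such as ABNR are preferred in practice.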

### 3.3. Adiabatic Mapping

The simplest approach to studying motion in proteins is the characterization of low-energy paths for specific motions. This approach is termed “adiabatic mapping”.^{54}

The protocol typically followed involves forcing specific atoms to move along a predetermined path to cause a structural change of interest. The remaining atoms are allowed to move freely, subject to the potential-energy landscape, to reduce (or minimize) the overall potential energy at each point along the path. It is assumed that, since shifts in atomic coordinates will roughly correspond to the structural fluctuations required to allow the motion, these energies approximate the change in energies that should be observed during the associated real, spontaneous, motion.

Adiabatic mapping is computationally inexpensive and has therefore been applied to study many structural changes of various magnitudes or scales. No direct information on the time scales of dynamic mechanisms is obtained, although some approximate results can be derived from the relaxed energies in analytical models of the dynamics (e.g., the Langevin equation, below).

The major flaw in this method is the dependence of the results on the path selected to represent, or drive, the entire motion. If the motion actually proceeds along a different path, then misleading results will be obtained. Quantitative errors are also to be expected due to any incomplete conformational relaxation, chiefly overestimation of enthalpy barriers caused by incomplete relaxation of delocalized strain. Energy minimization algorithms tend to be inefficient with respect to nonlocal strain (i.e., that which may be driving large domain motions). In addition, this approach ignores certain important thermal effects. For instance, neither the entropy nor the temperature dependence of the enthalpy is ordinarily obtained, despite these being important factors in the kinetics of structural motions.

### 3.4. Molecular Dynamics

In simple terms, molecular dynamics simulations involve the iterative numerical calculation of instantaneous forces present in a MM system and the consequential movements in that system. The MM system consists of a set of particles that move in response to their interactions according to the equations of motion defined in classical (i.e., Newtonian) mechanics. Classical MD is much more efficient than might be expected from full consideration of the physics of biomolecular systems due to the number of substantial approximations. Notably, quantum dynamical effects are usually ignored. Instead, each particle (typically a single atom, but sometimes a rigid set of atoms) is considered to be a point mass. This approximation is justified in terms of the Born-Oppenheimer approximation^{34} (i.e., only the nuclear displacements need to be considered). This section provides a brief overview of the concepts upon which molecular dynamics simulations are justified and implemented.

For an atom, *i*, with mass *m*_{i} and position indicated by the 3-dimensional vector **r**_{i}, the relationship between the atom's velocity and momentum, **p**_{i}, is

$$\mathbf{v}_i = \frac{\mathrm{d}\mathbf{r}_i}{\mathrm{d}t} = \frac{\mathbf{p}_i}{m_i}$$

The net force, **F**_{i}, exerted on the atom *i* by the remainder of the system is given by the negative gradient of the potential-energy function with respect to the position of atom *i*:

$$\mathbf{F}_i = -\frac{\partial V(\mathbf{r})}{\partial \mathbf{r}_i}$$

The Newtonian equation of motion for atom *i* is

$$m_i \frac{\mathrm{d}^2 \mathbf{r}_i}{\mathrm{d}t^2} = \mathbf{F}_i$$

Given the position with respect to a single component of vector **r**_{i} (that is, the position along a single dimension, *x*) at a specific time, *t*, the position after a short and finite interval, denoted Δ*t*, is given by a standard Taylor series:

$$x(t + \Delta t) = x(t) + \frac{\mathrm{d}x(t)}{\mathrm{d}t}\Delta t + \frac{1}{2}\frac{\mathrm{d}^2 x(t)}{\mathrm{d}t^2}\Delta t^2 + \cdots$$

The position *x*(*t*), the velocity d*x*(*t*)/d*t*, and the acceleration d^{2}*x*(*t*)/d*t*^{2} are sufficient for numerical solution to the equations of motion if some approximation to account for higher order terms in the Taylor series can be made. For this single dimension, Newton's second law describes the acceleration:

$$\frac{\mathrm{d}^2 x(t)}{\mathrm{d}t^2} = \frac{F_x}{m}$$

where *F*_{x} is the component of the net force acting on the atom parallel to the direction of *x*.

This just leaves the unspecified approximation for the infinite series of higher terms from the Taylor expansion to be devised. The simplest approach is to assume that the higher terms sum to zero, effectively truncating the Taylor expansion at the second derivative, the acceleration. In the general case, this is a very poor approximation as highlighted by consideration of Newton’s third law. The net force acting in the entire system should be zero, resulting in conservation of the total energy (i.e., kinetic plus potential energies) and conservation of the total momentum. With the simple approximation suggested, significant fluctuations and drifting over time occur in the total energy of the system as a simulation progresses. A wide range of improvements to this simple approximation are used in modern molecular dynamics software, many of which are described later in this review.

Numerous algorithms exist for integrating the equations of motion.^{55}^{–}^{58} Because the continuous potentials describing atomic interactions preclude an analytical solution, many of these are finite difference methods, in which the integration is partitioned into small steps, each separated in time by a specific period Δ*t*. The simple Verlet algorithm^{55} uses the atomic positions and accelerations at time *t* and the positions from the prior step, *x*(*t* − Δ*t*), to determine the new positions at *t* + Δ*t*:

$$x(t + \Delta t) = 2x(t) - x(t - \Delta t) + \frac{\mathrm{d}^2 x(t)}{\mathrm{d}t^2}\Delta t^2$$
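A minimal Verlet integrator for one degree of freedom is sketched below; the backward Taylor half-step used to seed *x*(*t* − Δ*t*) is one common bootstrapping choice, not the only one:

```python
def verlet(accel, x0, v0, dt, n_steps):
    """Basic Verlet: x(t+dt) = 2 x(t) - x(t-dt) + a(t) dt^2."""
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt ** 2  # seed x(t - dt) via Taylor expansion
    x, traj = x0, [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt ** 2
        x_prev, x = x, x_next
        traj.append(x)
    return traj

# Unit-mass harmonic oscillator (a = -x): after one period (~2*pi) x returns near 1.
traj = verlet(lambda x: -x, x0=1.0, v0=0.0, dt=0.01, n_steps=628)
```

Note that velocities never appear explicitly in the update, which is one motivation for the leapfrog and velocity Verlet variants discussed next.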

A slight modification of this, known as the leapfrog algorithm,^{59} is popular. The leapfrog algorithm uses the positions at time *t* and the velocities at time *t* − (Δ*t*/2) for the update of both positions and velocities via the calculated forces, *F*(*t*), acting on the atoms at time *t*:

$$v\left(t + \frac{\Delta t}{2}\right) = v\left(t - \frac{\Delta t}{2}\right) + \frac{F(t)}{m}\Delta t, \qquad x(t + \Delta t) = x(t) + v\left(t + \frac{\Delta t}{2}\right)\Delta t$$

Alternative finite difference method integrators include the velocity Verlet method^{57} and the Beeman algorithm.^{56} The velocity Verlet method synchronizes the calculation of positions, velocities, and accelerations without sacrificing precision. The Beeman algorithm exhibits improved energy conservation characteristics due to its more accurate expression for velocities.

All of these commonly used integrators are time reversible. This means the direction of simulation in time is arbitrary. If the velocities of all atoms were swapped in sign, the simulation would run in exactly the reverse direction.
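This reversibility is easy to check numerically: integrate forward, flip the sign of the velocity, integrate the same number of steps again, and the initial state is recovered to within floating-point round-off. A sketch using the velocity Verlet scheme:

```python
def velocity_verlet(accel, x, v, dt, n_steps):
    """Velocity Verlet: positions and velocities advanced synchronously."""
    a = accel(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2
        a_new = accel(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v

# Forward 1000 steps, negate the velocity, back 1000 steps: recovers x = 1, v = 0.
x1, v1 = velocity_verlet(lambda x: -x, 1.0, 0.0, 0.01, 1000)
x2, v2 = velocity_verlet(lambda x: -x, x1, -v1, 0.01, 1000)
```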

The computational expense of using any particular integration scheme is important, but for practical simulations of proteins another consideration becomes even more critical. The computational demands of the integration method are insignificant compared to the calculation of all the forces acting within the system. It is therefore advantageous to limit the number of force calculations required during the simulation. One method for doing this is to select an integrator that allows longer time steps without deviating significantly from the path of an exact, analytical trajectory.

The accuracy of each integrator depends on which terms of the Taylor series expansion it retains. The largest term in the Taylor expansion which is not considered in a given integration scheme defines the so-called *order* of that method. The Verlet algorithm, for example, is a fourth-order method, with terms of order Δ*t*^{4} and higher truncated.

A family of integration algorithms which are correct to a selected error order are known as the predictor-corrector methods.^{58} These methods initially estimate the positions, velocities, accelerations, and any desired higher order terms of the Taylor expansion. Next, forces are calculated with these estimated positions giving new accelerations at time (*t* + Δ*t*). The two sets of accelerations are compared, and a correction step adjusts the originally estimated positions, velocities, etc.

#### 3.4.1. Simulated Environment

A range of experimental conditions can be simulated by MD. The earliest protein simulations^{9}^{,}^{10}^{,}^{60} considered the molecules as isolated entities, effectively in a vacuum. Later simulations included explicit water and neighboring protein molecules as in a crystal environment. It is now conventional to duplicate the system periodically in all directions to represent an essentially infinite system. Typically, a cubic lattice is used for replication of the central cubic box. The atoms outside the central box are simply images of the atoms simulated in that box. So-called *periodic boundary conditions* ensure that all simulated atoms are surrounded by neighboring atoms, whether those neighbors are images or not. The so-called *minimum image convention* guarantees that duplicate interactions between atoms are not included by calculating only one pairwise interaction for each pair of atoms. For atoms *i* and *j*, the interaction is that between the original atom *i* and whichever copy of atom *j*, original or image, is closest to atom *i*.
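For a cubic (or, more generally, orthorhombic) box, applying the minimum image convention amounts to wrapping each component of the separation vector to the nearest periodic image before computing the distance. A sketch:

```python
def minimum_image_distance(r_i, r_j, box):
    """Distance from atom i to the nearest periodic image of atom j (orthorhombic box)."""
    d2 = 0.0
    for a, b, length in zip(r_i, r_j, box):
        delta = b - a
        delta -= length * round(delta / length)  # wrap component to the nearest image
        d2 += delta * delta
    return d2 ** 0.5

# In a 10 A box, atoms at x = 1 and x = 9 are only 2 A apart through the boundary.
d = minimum_image_distance((1.0, 1.0, 1.0), (9.0, 1.0, 1.0), (10.0, 10.0, 10.0))
```

Non-rectangular cells such as the rhombic dodecahedron use the same idea but with a more involved wrapping step.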

Periodic boundary conditions are not restricted to cubic systems. Other geometries are used including the rhombic dodecahedron^{61} and the truncated octahedron.^{62} These can significantly reduce the number of solvent atoms required in the system, leading to a corresponding reduction in the computational requirements. The range of possible geometries suitable for periodic systems is limited, but stochastic boundary conditions^{63} can be utilized, in the absence of periodicity, with any system geometry.

Stochastic boundary conditions are particularly useful when investigating only a particular region such as the binding site in a ligand-binding study. This enables much of the system that would otherwise be simulated to be excluded, thus saving considerable computational resources. The region of interest is enclosed within a shell, usually spherical. The atoms belonging to this shell region are subject to stochastic dynamics, for example, evaluated using the Langevin equation. The stochastic shell region itself is enclosed in a bath region in which the atoms are stationary. This outer region forms a barrier that maintains the overall structure of the system, while the shell region accommodates any local fluctuations in conformation, density, or energy that occur in the central region where standard MD is performed. This approach has been applied in the study of proteins,^{64} but the restrictive boundaries in the simplest models are known to introduce artificial density fluctuations and can alter the structure of solvents such as water.^{65} More recent models have improved characteristics.^{66}^{–}^{68}
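The stochastic dynamics applied in the shell region are often evaluated with the Langevin equation, *m* d*v* = [*F*(*x*) − *m*γ*v*] d*t* + √(2*m*γ*k*_{B}*T*) d*W*. The single Euler-Maruyama step below is a sketch in reduced units; this simple discretization is illustrative only, and production codes use more sophisticated Langevin integrators:

```python
import math
import random

def langevin_step(x, v, force, mass, gamma, temperature, dt, k_B=1.0, rng=random):
    """One Euler-Maruyama step of the Langevin equation (reduced units)."""
    sigma = math.sqrt(2.0 * mass * gamma * k_B * temperature * dt)  # random-force scale
    v = v + (force(x) / mass - gamma * v) * dt + sigma * rng.gauss(0.0, 1.0) / mass
    x = x + v * dt
    return x, v
```

The friction coefficient γ and the random force are linked through the fluctuation-dissipation relation, so repeated steps drive the shell atoms toward the target temperature.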

#### 3.4.2. SHAKE

For a fixed amount of computation, the length of a simulation is determined by a number of factors, including the cost of evaluating interactions, the number of interactions that need to be evaluated at each time step, the period of that time step, and the number of degrees of freedom that need to be propagated. To increase the efficiency of a computer simulation, any of those four aspects might be improved upon. Increasing the time step period is, therefore, a simple approach for extending tractable simulation time scales, but a number of factors limit the step size.^{69} The number of interactions to evaluate may be reduced via the use of implicit solvent models, discussed later in this review, or by a reduced representation of the biomolecular structure, also discussed later.

Numerous algorithmic improvements can be applied to enhance the stability and increase the efficiency of MD simulations. The use of integrators with good stability properties such as velocity Verlet^{57} and extensions such as the reversible reference system propagator algorithm method (RESPA)^{70} are typical and allow extended time steps to be utilized. Improvements in efficiency are often obtained through freezing the fastest modes of vibration by constraining the bonds to hydrogen atoms to fixed lengths using algorithms such as SHAKE,^{60}^{,}^{71} RATTLE,^{72} and LINCS.^{73} Specifically, the use of RESPA and fixing of bond lengths involving hydrogen atoms with SHAKE, RATTLE, or LINCS allow the use of larger time-step (Δ*t*) sizes without any significant amount of degradation in the quality of the trajectory (or in the accuracy of the simulation).

The SHAKE algorithm (otherwise known as the constrained Verlet method) is a straightforward modification of the Verlet algorithm to impose constraints on the internal coordinates such as bond lengths and bond angles. The length of the time step is restricted by the requirement that Δ*t* is small compared to the period of the highest frequency motions being simulated. For the biomolecular systems of interest, the highest frequency motions are the bond stretching vibrations, yet these vibrations are generally of minimal interest in the study of biomolecular structure and function. Thus, algorithms, such as SHAKE, that constrain the bonds to their equilibrium lengths are useful. In essence, they may be considered as averaging out the highest frequency vibrations.

In the SHAKE algorithm all constraints are imposed through fixed interatomic distances. For bond lengths, a single interatomic distance suffices. Bond angles are constrained by exploiting the fact that the three constituent atoms are related through three interatomic distances.

If constraint *k* is on the distance between atoms *i* and *j*, then it may be expressed as

*r*_{ij}^{2} − *d*_{ij}^{2} = 0

where *r*_{ij} is the vector from atom *i* to atom *j* (*r*_{ij} = *r*_{j} − *r*_{i}) and *d*_{ij} is the desired distance. At any given step during the practical numerical simulation, the constraint is said to be satisfied whenever the deviation is less than some threshold. In the case of SHAKE, the constraint is satisfied when |*r*_{ij}^{2} − *d*_{ij}^{2}|/*d*_{k}^{2} < *ε*, where *ε* is a constant tolerance and *d*_{k} is the equilibrium bond length.

Motions other than the highest frequency vibrations of proteins are not noticeably affected, although constraints are not recommended for valence bond angles except for those in the rigid water models.^{74} This is chiefly due to coupling between the bond-angle motions and dihedral motions. Constraints are also usually applied only to the bonds with the fastest vibrations, namely, those involving a hydrogen atom. Nonetheless, in practice the time step can typically be increased by a factor of 3 compared to simulations with the original Verlet algorithm.
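As an illustration, the iterative correction at the heart of SHAKE can be sketched in a few lines. The function below is a schematic, simplified version written for this review, not any particular package's API; the argument names, the single relative tolerance, and the list of `(i, j, d)` constraint tuples are illustrative assumptions.

```python
import numpy as np

def shake(pos, pos_prev, masses, constraints, tol=1e-8, max_iter=100):
    """Iteratively correct 'pos' so each constrained pair (i, j, d) satisfies
    |r_ij|^2 - d^2 = 0 to within a relative tolerance 'tol'."""
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            r = pos[j] - pos[i]                      # current interatomic vector
            diff = r @ r - d * d                      # constraint violation
            if abs(diff) / (d * d) > tol:
                converged = False
                # Correction is applied along the pre-correction (reference)
                # geometry, as in the constrained-Verlet scheme, and is
                # mass-weighted so momentum is conserved.
                s = pos_prev[j] - pos_prev[i]
                g = diff / (2.0 * (s @ r) * (1.0 / masses[i] + 1.0 / masses[j]))
                pos[i] += (g / masses[i]) * s
                pos[j] -= (g / masses[j]) * s
        if converged:
            return pos
    raise RuntimeError("SHAKE did not converge")
```

Because each correction disturbs the other constraints slightly, the loop sweeps over all constraints repeatedly until every one is satisfied.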

The LINCS constraint method directly resets the constraints rather than the derivatives of the constraints (i.e., resets the constrained distances rather than the velocities), therefore avoiding drift inherent in the SHAKE method. It is also reported to produce a further speed up of about four times.^{73}

Other improved variations of SHAKE have been proposed, including MSHAKE,^{75} which performs matrix inversion to solve the constraint equations. Such variants are suited to systems with a limited number of interdependent constraints (e.g., water) where the cost of inverting the matrix is favorable compared to performing a large number of iterations. This method is also useful when a high level of accuracy in the application of constraints is desired. The QSHAKE method^{76} introduces quaternion dynamics for rigid fragments. The total number of holonomic constraints is reduced, thus obtaining convergence within fewer iterations and increasing stability under larger time steps. Generalized SHAKE^{77} adds support for general nonholonomic constraints, and no numerical drift is observed for large numbers of constraints with this approach.

#### 3.4.3. Experimental Conditions

Typically, it is important to accurately simulate the experimental conditions to be replicated. Various values for physical conditions, such as pressure and temperature, may be readily considered in the simulations.

##### Ensembles

An ensemble is a collection of all possible systems that have differing microscopic states but belong to a single macroscopic or thermodynamic state.^{17} Various different formal ensembles with differing characteristics exist. The most widely simulated are as follows. (1) The canonical ensemble (NVT): This is the collection of all systems whose thermodynamic state is characterized by a fixed number of atoms, *N*, fixed volume, *V*, and fixed temperature, *T*. (2) The isobaric-isoenthalpic ensemble (NPH): An ensemble with a fixed number of atoms, *N*, fixed pressure, *P*, and fixed enthalpy, *H*. (3) The isobaric-isothermal ensemble (NPT): An ensemble with a fixed number of atoms, *N*, fixed pressure, *P*, and fixed temperature, *T*. (4) The grand canonical ensemble (*μ*VT): A thermodynamic state characterized by a fixed chemical potential, *μ*, fixed volume, *V*, and fixed temperature, *T*. (5) The microcanonical ensemble (NVE): A thermodynamic state characterized by a fixed number of atoms, *N*, fixed volume, *V*, and fixed energy, *E*. This corresponds to a closed (i.e., isolated) system since energy is conserved.

Most early simulations corresponded to the microcanonical ensemble under so-called free dynamics. However, experiments are usually performed at constant temperature and volume (i.e., the canonical ensemble) or at constant pressure and temperature (i.e., the isobaric-isothermal ensemble), so it is often desirable to simulate these conditions, or those expected under physiological conditions, instead. During a simulation at constant energy, the temperature will be observed to fluctuate due to the spontaneous interconversion of the kinetic and potential components of the total energy. The instantaneous temperature may be evaluated from the atomic velocities using

*T* = (1/(3*Nk*_{B})) ∑_{*i*=1}^{*N*} *m*_{i}|v_{i}|^{2}    (13)

where *k*_{B} is Boltzmann’s constant, *m*_{i} and v_{i} are the mass and velocity of atom *i*, respectively, and *N* is the total number of atoms. If desired, the atomic velocities can be rescaled or otherwise modified to keep the temperature constant during the course of a simulation. It is worth mentioning that eq 13 must be corrected when constraint algorithms, such as SHAKE, are used, since constraints reduce the number of degrees of freedom.^{78}
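In code, the instantaneous temperature of eq 13 and a simple velocity-rescaling correction might look as follows. This is a schematic sketch: the array shapes and the crude direct-rescaling scheme (rather than a proper thermostat) are illustrative assumptions.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def instantaneous_temperature(masses, velocities):
    """Eq 13: T = (1 / (3 N kB)) * sum_i m_i |v_i|^2.
    masses has shape (N,); velocities has shape (N, 3)."""
    n = len(masses)
    sum_mv2 = np.sum(masses[:, None] * velocities ** 2)
    return sum_mv2 / (3.0 * n * K_B)

def rescale_velocities(masses, velocities, target_t):
    """Scale all velocities so the kinetic temperature matches target_t."""
    t = instantaneous_temperature(masses, velocities)
    return velocities * np.sqrt(target_t / t)
```

Because the kinetic energy is quadratic in the velocities, a uniform scale factor of sqrt(*T*_target/*T*) maps the instantaneous temperature exactly onto the target.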

To maintain a constant pressure during a simulation, the volume must be allowed to fluctuate by adjusting the dimensions of the periodic box and rescaling the atomic positions accordingly. Numerous methods exist for properly running MD simulations at constant pressure, including the *extended system algorithm*,^{79} the *constraint algorithm*,^{80} *weak coupling to an external bath*,^{81} the *hybrid method*,^{82} and the *Langevin piston method*.^{83}

#### 3.4.4. Solvation

A reasonable representation of a protein’s environment is important for characterizing its properties through simulations. With current understanding and computational resources, it is not possible to consider the full physiological environment of a typical protein; certain specific cases can be treated more completely, but the general case is too complicated. The nonphysiological cases that involve proteins in crystal-packing arrangements or in vacuo (such as in the original BPTI simulation^{9}) are comparatively trivial. Even simulations of in vitro systems have particular issues to consider, as outlined in this section. Nonetheless, such systems are tractable, so an aqueous solvent is selected as the environment for the vast majority of simulations. Great strides have been made for simulations in more specific environments such as those for transmembrane proteins.^{24}^{,}^{25}^{,}^{84}^{,}^{85}

##### Implicit Solvation

Most proteins exist, at least partially, within an aqueous environment. Justified by this fact, it is common to assume that a protein is fully solvated in pure or ion-containing water during simulations. However, a considerable portion of the computation time could be spent evaluating the solvent-solvent interactions. It is therefore desirable to avoid using explicit water when possible. However, solvent effects are important and cannot be totally disregarded. Consequently, numerous implicit solvent models have been developed.^{86}^{–}^{89}

In addition to the dielectric screening effects, an explicit solvent contributes specific interactions that are often important for mediating protein structure or function. Thus, explicit solvents play an important role in simulations for the accurate consideration of electrostatic effects^{90} and for the valid decomposition of free energies, for example. Conversely, implicit models of solvation allow for better direct estimations of free energies of solvation than explicit solvation models.^{91} The statistical mechanical characteristics and properties of implicit solvation models have been rigorously examined.^{92}

One implicit solvent model is the generalized Born (GB) model^{93}^{,}^{88} which, especially on parallel computer systems, can be used to run significantly faster MD simulations than can explicit solvent models.^{94} Like all implicit models, GB is known to be unable to reproduce certain microscopic solvent features.^{87} Moreover, implicit solvent models are known to facilitate modified conformational dynamics of protein molecules when compared to explicit models,^{95} which is usually undesirable. A hybrid method which incorporates explicit solvent molecules in a defined region of the system, such as a binding site or a channel, is the generalized solvent potential method.^{96} In this method a static solvent-shielded field from the biomolecular solute is calculated using a finite-difference Poisson-Boltzmann method. This field is used to impose a solvent reaction field, and the specific region of interest for explicit solvent is hydrated.

Apolar solvation models, using a cavity potential plus dispersion potential decomposition, such as the analytical generalized Born and nonpolar (AGBNP) solvent model,^{97} have been shown to be very effective. The apolar component is likely to be necessary in the exploration of larger conformational changes.^{98} This overcomes the poor correlation often found between the apolar forces from explicit solvent and implicit solvent simulations.^{99}

##### Explicit Water Models

For cases when explicit consideration of the solvent is desirable, or necessary, there is a wide range of explicit water models available. The most popular of these models include TIP3P, TIP4P,^{100} TIP5P,^{101} SPC, and SPC/E.^{102}

Commonly, the parameters in water models are adjusted such that the enthalpy of vaporization and the density of water are reproduced in simulations. All of the above models have a dipole moment of about 2.3 D instead of the experimental gas-phase value of 1.85 D. The temperature dependence of the density of water is not described well by any of these models, except perhaps by the TIP5P model.

The most popular models for water are all consistent with the SHAKE approximation, discussed above, since these models for water treat the molecules as completely rigid.

MD simulations with only a thin layer of water around the protein can overcome some of the problems of a purely implicit solvent.^{103}^{–}^{105} The restrained water droplet model applies a weak harmonic restraining force to a 5 Å shell of water.^{106}

##### Electrostatics

Long-range electrostatic interactions^{107} play a dominant role in protein structural stability and are also crucial determinants in the initial encounter of many association processes.^{26} Typically, the most computationally expensive portion of a MD simulation is the evaluation of these long-range electrostatic interactions.^{108}^{,}^{109} As the number of charges in a system increases, the number of Coulombic interactions will grow as the square of that number, potentially resulting in a prohibitively large number of interactions to evaluate.

In earlier MD simulations, a cutoff was applied to the distance of electrostatic interactions, known as *spherical truncation*.^{110}^{,}^{111} The interactions beyond that cutoff distance, for instance, at 12 Å, were ignored so that the maximum number of interactions becomes bounded, assuming a finite density. An abrupt cutoff distance introduces an energetic discontinuity into the system, and this can lead to unstable simulations, so smoothing functions are often applied instead. To further reduce the computational cost, group-based neighbor lists were introduced, but it is known that with such techniques the energy is not conserved.^{111} The twin-range cutoff method overcomes some of these problems. The technique calculates the short-range electrostatic interactions at every time step, while the long-range interactions are only recalculated immediately after the nonbonded neighbor list is recalculated.
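The effect of spherical truncation is easy to state in code. The sketch below evaluates a pairwise Coulomb sum in reduced (Gaussian-style) units and simply skips pairs beyond the cutoff; it is a didactic O(*N*²) double loop written for this review, not an optimized neighbor-list implementation, and it omits the smoothing function discussed above.

```python
import numpy as np

def coulomb_energy_cutoff(positions, charges, r_cut):
    """Pairwise Coulomb energy (reduced units: E = q_i q_j / r) with a
    spherical truncation at r_cut; pairs beyond the cutoff are ignored."""
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            if r < r_cut:
                energy += charges[i] * charges[j] / r
    return energy
```

The abrupt energy discontinuity at `r_cut` is exactly what the smoothing and twin-range schemes described above are designed to mitigate.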

The Ewald summation method^{112} offers a theoretically rigorous approach to the evaluation of electrostatic interactions in infinite periodic systems. While the original method is not well suited to efficient calculations within biomolecular MD simulations, more recent work^{113}^{,}^{114} has introduced versions with improved computational complexity (*N* log *N*), and these relatively efficient variants are widely applied. Particularly for systems with large periodic boxes and high dielectric solvents, the artifacts observed in simulations with the Ewald summation methods are insubstantial.^{115}^{,}^{116}

For systems that are naturally two dimensional, special Ewald summation and particle mesh Ewald (PME) methods can be applied.^{117}^{,}^{118} Such systems often include those in the simulation of transmembrane proteins that are typically simulated with periodicity in the plane of the membrane but with a finite length perpendicular.

The fast multipole (FM) method also offers an efficient way (computational complexity O(N)) to handle long-range electrostatic interactions.^{114}

An alternative to explicitly including all interactions in an infinite system, as is done by the Ewald and FM methods, but still considering those interactions unlike the spherical truncation methods, is to use a reaction field.^{67}^{,}^{96} This seeks to represent the surroundings by mimicking the response of the dielectric medium beyond the cutoff distance or boundary. While it is still an approximation, this gives stable and accurate results.^{119}^{,}^{109}

For nonperiodic systems, provided that they are large, multipole expansion^{120} and multigrid methods^{121}^{,}^{122} are both efficient and useful.

### 3.5. Langevin Dynamics

Langevin dynamics incorporates stochastic terms to approximate the effects of degrees of freedom that are neglected in the simulation. It is based on use of the Langevin equation as an alternative to Newton’s second law. This equation incorporates two additional terms. The first term is a frictional, or damping, function that is intended to represent the frictional drag experienced by solute molecules in a solvent that is not explicitly simulated. The second additional term is a random force that is applied to mimic the random impulses that would be expected from both the solvent and any coincident solute molecules. The Langevin equation for the motion of an atom, *i*, is

*m*_{i}(d^{2}*r*_{i}/d*t*^{2}) = *F*_{i}(*r*) − *ζ*_{i}(d*r*_{i}/d*t*) + *R*_{i}(*t*)

where *F*_{i}(*r*) is the usual force term used in conventional MD, *ζ*_{i} is the friction coefficient, and *R*_{i}(*t*) represents the random forces experienced by the atom. The temperature of the simulated system is maintained by a relationship between *ζ*_{i} and *R*_{i}(*t*) (namely, the fluctuation-dissipation theorem). When *ζ*_{i} = 0, Langevin dynamics is equivalent to conventional MD. When *ζ*_{i} > 0, the random impulses felt by the system can assist in propagating barrier-crossing motions and, therefore, Langevin dynamics can offer improved conformational sampling characteristics over standard MD.
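A minimal discretization of the Langevin equation for a single degree of freedom might read as follows. This is an Euler-Maruyama sketch in reduced units, written for this review; the noise amplitude sqrt(2 *ζ* *k*_B*T* Δ*t*) is fixed by the fluctuation-dissipation theorem mentioned above.

```python
import math, random

def langevin_step(x, v, force, m, zeta, k_b_t, dt):
    """One Euler-Maruyama step of m dv/dt = F(x) - zeta*v + R(t), where the
    random impulse satisfies <R(t) R(t')> = 2 zeta kB T delta(t - t')."""
    impulse = math.sqrt(2.0 * zeta * k_b_t * dt) * random.gauss(0.0, 1.0)
    v = v + (dt / m) * (force(x) - zeta * v) + impulse / m
    x = x + dt * v
    return x, v
```

Setting `zeta = 0` (and hence zero noise amplitude) recovers a plain MD velocity update for this coordinate, consistent with the limiting case noted above.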

### 3.6. Brownian Dynamics

Brownian dynamics (BD) is a diffusional analogue of molecular dynamics^{26}^{,}^{123}^{,}^{124} carried out through the numerical integration of the Langevin equation. When the solvent surrounding a molecule has high effective viscosity, the motion of that molecule can be described in terms of a random walk since the damping effect of the solvent will overcome any inertial effects. The Brownian dynamics method seeks to simulate the random walk to produce a representative diffusional trajectory. This is achieved by using a very large friction coefficient, *ζ*_{i}, in the Langevin equation. In the case that a process of interest is diffusion controlled, Brownian dynamics is a useful and widely applied approach that is complementary to molecular dynamics. It is common, but not essential, for proteins to be treated as rigid bodies in BD simulations. Because the computational requirements of Brownian dynamics are modest compared to those of molecular dynamics, time scales in the microsecond or millisecond range are readily accessible.^{26}
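In the high-friction limit the velocity can be eliminated, leaving a position-only update of the Ermak-McCammon type. The sketch below, in reduced units for a single coordinate, is illustrative; `diff_coef` is the diffusion coefficient *D*, and the noise amplitude sqrt(2 *D* Δ*t*) follows from the diffusion limit.

```python
import math, random

def brownian_step(x, force, diff_coef, k_b_t, dt):
    """One Ermak-McCammon BD step for a single coordinate (inertia neglected):
    dx = (D / kB T) * F(x) * dt + sqrt(2 D dt) * N(0, 1)."""
    drift = (diff_coef / k_b_t) * force(x) * dt
    return x + drift + math.sqrt(2.0 * diff_coef * dt) * random.gauss(0.0, 1.0)
```

With the force set to zero, repeated steps reproduce free diffusion with mean-square displacement 2 *D* *t*, which is a convenient sanity check for BD codes.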

Examples of biological processes which are amenable to study by Brownian dynamics include diffusion-controlled reactions, diffusional encounters, and ionic diffusion under the influence of an electrostatic field.

The choice of MD versus Langevin dynamics versus BD needs to be carefully considered depending upon which contributions are thought to dominate in the motion of interest.

### 3.7. Monte Carlo

Structural and thermodynamic properties of a system can be obtained through Monte Carlo (MC) simulations, thus making these a significant alternative to molecular dynamics simulations. Monte Carlo simulations are a stochastic approach to the task of generating a set of representative configurations under given thermodynamic conditions such as temperature and volume.

One attractive aspect of conventional MC simulations is that only the potential energy is normally used in stepping through configurations; no forces need evaluation, resulting in more efficient calculations. Some biased MC approaches do utilize force data, however.

In its simplest form, the Monte Carlo algorithm is a method for numerical integration. A set of parameters are randomly selected, or randomly perturbed, and a function of these parameters is evaluated. The results of many such steps are collated, and once sufficient sampling has occurred, the probabilities of any given result occurring can be readily assessed.

Metropolis et al.^{125} introduced a technique known as Metropolis Monte Carlo simulation. In this scheme, the problem is described in terms of a thermodynamic system at potential energy *V* and temperature *T*. With *T* held constant, the initial configuration is perturbed and the change in energy, Δ*V*, is computed. If the change in energy is negative, the new configuration is accepted. If the change in energy is positive, it is accepted with a probability given by the Boltzmann factor. This process is iterated until sufficient sampling statistics for the current temperature *T* are achieved. This procedure simplifies the calculation of the Boltzmann average for any observable property, since it is now just the mean value of that property over all samples.
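A single Metropolis step reduces to a few lines of code. The sketch below is written in reduced units for this review; the user-supplied `energy` and `propose` functions are illustrative assumptions. Downhill moves are accepted outright, and uphill moves with the Boltzmann probability.

```python
import math, random

def metropolis_step(x, energy, propose, k_b_t):
    """One Metropolis MC step: perturb x, then accept the move with
    probability min(1, exp(-dV / kB T))."""
    x_new = propose(x)
    d_v = energy(x_new) - energy(x)
    if d_v <= 0.0 or random.random() < math.exp(-d_v / k_b_t):
        return x_new, True   # move accepted
    return x, False          # move rejected; the old configuration recurs
```

Note that rejected moves re-count the current configuration, which is what makes the simple average over visited states a Boltzmann average.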

There are a number of issues that seriously hamper the use of Monte Carlo simulations with large biomolecules. Importantly, efficient moves are difficult to define for macromolecules. That is, it is difficult to devise simple structural perturbations that cause changes of a sufficiently large magnitude but also avoid generating energetically infeasible configurations. Some work has eased this issue for proteins, however.^{126}^{,}^{127}

Conventional MC methods are inefficient for exploring the configurational space of large biomolecules when compared to molecular dynamics.^{128} In addition, MC methods give no information about the time evolution of structural events. Hybrid MC/MD methods might resolve both of these issues and are described in the literature.^{129}^{–}^{131} A conceptually related procedure, known as the relaxed complex method, is discussed later in this review.

### 3.8. Simulated Annealing

The *simulated annealing* algorithm^{132} is related to the MC algorithm and forms an efficient technique to find the minimum energy configuration of a system. The usual Metropolis Monte Carlo algorithm is inefficient at sampling configurations that are beyond high potential-energy barriers; thus, it is only useful when starting at a configuration that is already near the global energy minimum well. Simulated annealing overcomes this problem by initially performing Monte Carlo steps at a very high temperature. According to a periodic schedule, this simulation temperature is decreased at a logarithmic rate (or, sometimes, a linear rate) until the temperature reaches zero. This procedure is widely used in protein modeling or refinement applications. As with conventional MC methods, no information about the time evolution of structural events can be obtained.
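The scheme can be sketched as a Metropolis loop with a decaying temperature. In the sketch below, the geometric cooling schedule, the user-supplied `energy` and `propose` functions, and the toy double-well used in the test are illustrative choices for this review, not part of the original algorithm specification.

```python
import math, random

def simulated_annealing(x0, energy, propose, t_start, t_end, n_steps):
    """Metropolis sampling while the temperature decays geometrically from
    t_start to t_end; returns the lowest-energy configuration visited."""
    x, e = x0, energy(x0)
    best, e_best = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    for _ in range(n_steps):
        x_new = propose(x)
        d_e = energy(x_new) - e
        if d_e <= 0.0 or random.random() < math.exp(-d_e / t):
            x, e = x_new, e + d_e
            if e < e_best:
                best, e_best = x, e
        t *= cooling
    return best, e_best
```

Early, hot steps cross barriers freely; late, cold steps behave like a local minimizer, which is why a sufficiently slow schedule can escape the local well a plain Metropolis run would be trapped in.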

The diffusion equation method (DEM) for potential smoothing^{133}^{,}^{134} (as discussed in the Enhanced Sampling section of this review) is an analytical equivalent to simulated annealing.^{135}

Similar temperature scaling procedures utilizing MD instead of MC for generating configurations are discussed in the Enhanced Sampling section of this review, below.

### 3.9. Nondynamic Methods

#### 3.9.1. Conformational Sampling

Numerous nondynamic methods besides MC and SA exist for sampling available conformational space of proteins.^{136}

CONCOORD^{137} is a method that does not generate a time series of configurations; instead, it generates conformations that satisfy a set of distance constraints. This differs from the above algorithms in that it does not rely on a potential surface: the structures are obtained through purely geometric considerations. This allows for some conformations that might never be found in an MD simulation or other energy-based methods, and the method is therefore complementary to the dynamical simulation methods.

#### 3.9.2. Principal Component Analysis

Principal component analysis (PCA) is a method that is often used for reducing the dimensionality of a dataset. For an arbitrary dataset where there is significant correlation between the dimensions, or individual variables, the first principal component is the linear combination of these variables which gives the best-fit line through the entire dataset. In other words, it is the linear combination which describes the greatest amount of variance in the data. The second and subsequent principal components are fit to the residual variation remaining after the more significant principal components are excluded. All principal components are orthogonal.

The separation of functionally important motions from the random thermal fluctuations of a protein is one of the challenges of trajectory analysis. PCA of the covariance matrix of the atomic coordinates is termed essential dynamics (ED).^{138} This is a powerful method for extracting the significant, large-scale, correlated motions occurring in a simulation. In this sense, the principal components are the orthogonal basis set for the trajectory’s atomic coordinates. The principal components corresponding to the greatest variance can be projected onto the protein structure, either individually or in sets. All other motions, including the smaller thermal fluctuations, will be filtered out. This facilitates visualization and appreciation of the major motions that may be biologically relevant.^{139}
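The essential-dynamics computation reduces to an eigendecomposition of the coordinate covariance matrix. A minimal sketch follows; the frames-by-coordinates array layout is an assumption, and a real trajectory analysis would first remove overall translation and rotation (and possibly mass-weight the coordinates), which is omitted here.

```python
import numpy as np

def principal_components(traj):
    """PCA of a trajectory array of shape (n_frames, n_coords).
    Returns eigenvalues (variances, descending) and eigenvectors (columns)."""
    centered = traj - traj.mean(axis=0)        # remove the mean structure
    cov = np.cov(centered, rowvar=False)       # coordinate covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # symmetric eigendecomposition
    order = np.argsort(vals)[::-1]             # sort by variance, descending
    return vals[order], vecs[:, order]
```

Projecting the trajectory onto the first few columns of `vecs` filters out the small thermal fluctuations and retains the large, correlated motions described above.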

### 3.10. QM/MM

Hybrid quantum mechanical/molecular mechanical (QM/MM) methods^{140} have reached a viable state and are rapidly gaining popularity.^{21}^{,}^{141}

QM/MM methods are particularly useful since they allow the study of biomolecular reaction mechanisms. This is a task for which conventional MM is unsuitable owing to its assumption that bonds are never made or broken. Conventional QM methods are also unsuitable for this task owing to their computational expense, which makes calculations on the scale of entire solvated proteins currently intractable. QM/MM methods are beyond the scope of this review but are covered elsewhere in this issue.^{142}

## 4. Free-Energy Calculations

The purpose of a MD simulation is often to derive kinetic and thermodynamic data about the model system. Indeed, many thermodynamic properties can be readily extracted from sufficient sample configurations of a system. As an example, the entropy of a system is directly related to the number of different configurations that are thermally accessible to it.^{17}

One very important thermodynamic quantity is free energy, a measure of the stability of a system. In particular, free energy of binding is a measure of the stability of a complex, a measure that is probably fundamental to all studies of biomolecular binding processes.

Rigorous techniques, including the thermodynamic cycle-perturbation method,^{143} exist for the estimation of free energies from simulations. However, such calculations are generally only practical for small or highly constrained systems. Often the calculations are extremely expensive or the level of sampling required for reliable statistics might be beyond the feasible limits. Methods for speeding up free-energy calculations are valuable.^{144} Recent methods for free energy of binding estimation are briefly discussed below.

### 4.1. Free Energy of Binding

The interactions between proteins and other molecules are critical to many biological systems and processes. Signal transduction, metabolic regulation, enzyme cooperativity, physiological response, and other processes are all dependent upon noncovalent binding. These processes may be investigated through modeling and simulation, particularly as the range of solved protein structures grows. Through MD, MC, and the various related methods described in this review, binding modes and the corresponding binding free energies^{6} may be estimated for protein-ligand and protein-protein complexes.

Approaches available for estimating either relative or absolute binding free energies cover a broad range of accuracies and computational requirements. Free-energy perturbation (FEP) and thermodynamic integration (TI) methods are computationally expensive, but they have been successfully applied in the prediction of the binding strengths for complexes.^{5}^{,}^{145} Many methods of varying rigor^{146} have been developed to estimate such free energies more rapidly. These include the linear interaction energy (LIE) method,^{147} the molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) method,^{148}^{,}^{149} the chemical Monte Carlo/molecular dynamics (CMC/MD) method,^{150}^{,}^{151} the pictorial representation of free-energy components (PRO-FEC) method,^{150}^{,}^{152} the one-window free-energy grid (OWFEG) method,^{153}^{,}^{154} the *λ*-dynamics method,^{155}^{,}^{156} and the 4D-PMF method,^{157} among others.

#### MM/PBSA

The molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) model is a so-called endpoint free-energy method; only the initial and final states of the system are evaluated to estimate a free-energy change. This contrasts with the more accurate FEP and TI methods, which require equilibrium sampling along the entire transformation path from the initial to the final state. End-point methods are computationally efficient, and consequently, they are widely discussed and applied in the literature. Despite their simplicity, a connection between statistical thermodynamics and various end-point free-energy models has been derived.^{148}^{,}^{158}^{–}^{161} Certain limitations of MM/PBSA must be considered: it does not account for specific water interactions, and it is sensitive both to the trajectory used and to induced-fit effects.

MM/PBSA^{149}^{,}^{162}^{,}^{163} is basically a postprocessing method to evaluate the standard free energies of molecules or the binding free energies of molecular complexes in a relatively computationally efficient manner.

The MM/PBSA method partitions the free energy into molecular mechanical (MM) energies, continuum solvation energies, and solute entropy terms as follows

*G*_{mol} = *E*_{MM} + *G*_{PBSA} − *TS*_{solute}

where *G*_{mol} is the average standard free energy^{148} of the molecule of interest, which can be the ligand, the receptor, or their complex, and *G*_{PBSA} is the molecular solvation free energy. The solute’s entropy term, *S*_{solute}, may be estimated using a number of methods.

The average molecular mechanical energy, *E*_{MM}, is typically defined as

*E*_{MM} = *E*_{bond} + *E*_{angle} + *E*_{torsion} + *E*_{vdw} + *E*_{elec}

where *E*_{bond}, *E*_{angle}, *E*_{torsion}, *E*_{vdw}, and *E*_{elec} are the bond, angle, torsion, van der Waals, and electrostatics terms of intramolecular energy, respectively.

The molecular solvation free energy can be further decomposed to

*G*_{PBSA} = *G*_{PB} + *γA*

where *G*_{PB} is the average electrostatic contribution from molecular solvation, *γ* is the surface tension of the solvent, and *A* is the solvent-accessible surface area (SASA). The electrostatic solvation free energy, *G*_{PB}, can be calculated using

*G*_{PB} = (1/2) ∑_{*i*=1}^{*N*} *q*_{i}(*ϕ*_{i}^{aq} − *ϕ*_{i}^{g})

where *N* is the number of atoms in the molecule, *q*_{i} is the electrostatic charge of atom *i*, and *ϕ*_{i}^{aq} and *ϕ*_{i}^{g} are the electrostatic potentials of atom *i* in the aqueous and gas phase, respectively, which are usually obtained by solving the Poisson-Boltzmann equation.^{164}^{,}^{165}
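Given averaged component terms, assembling *G*_{mol} and an end-point binding free energy is simple arithmetic. The sketch below mirrors the MM/PBSA decomposition described above; the function names, the per-term dictionaries, and the units implied in the test (e.g., kcal/mol and Å²) are illustrative assumptions, not a particular code's interface.

```python
def g_molecule(e_mm, g_pb, gamma, sasa, temperature, s_solute):
    """G_mol = E_MM + G_PB + gamma * A - T * S_solute."""
    return e_mm + g_pb + gamma * sasa - temperature * s_solute

def delta_g_bind(complex_terms, receptor_terms, ligand_terms):
    """End-point estimate: dG_bind = G_complex - G_receptor - G_ligand,
    with each G_mol evaluated from trajectory-averaged components."""
    return (g_molecule(**complex_terms)
            - g_molecule(**receptor_terms)
            - g_molecule(**ligand_terms))
```

In practice each input is itself an average over snapshots from an MD trajectory, which is where the method's sensitivity to the trajectory, noted above, enters.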

### 4.2. Activated Molecular Dynamics

Many biological processes are intrinsically fast but, since these processes occur infrequently, appear to have long time scales. As an example, many reactions and conformational transitions exhibit long time scales because they consist of one or more activated processes. In fact, activated processes such as local conformational changes associated with ligand binding^{166} are widespread in biology. An activated process is one in which a high-energy barrier exists between the initial and final states and this barrier must be overcome. The actual barrier crossing is often relatively rapid, but the time required for the system’s random thermal fluctuations to provide the constituent atoms with suitable momentum can be long.

Conventional MD is unsuitable for investigating activated processes in biology because the tractable simulation time scales are of the order of nanoseconds while the biological process might take milliseconds or longer. However, a procedure known as *activated molecular dynamics* makes the study of such processes possible provided that the primary structural changes for the process are known beforehand.

Activated molecular dynamics is a two-stage process. First, a series of simulations is performed. Each of these simulations is constrained to a successive portion of the transition pathway. The purpose of these simulations is to locate the free-energy barrier peak. The second stage involves running conventional simulations from the region of the free-energy barrier. The resulting trajectories can be run in forward and reverse to generate a set of representative barrier crossing events. Analysis of these trajectories gives useful information regarding the mechanism of the activated process.^{167}

### 4.3. Steered Molecular Dynamics

Steered molecular dynamics (SMD) simulations introduce a time-dependent or position-dependent force. The purpose of this force is to steer systems along particular degrees of freedom. This allows one to focus on dynamic events of interest while keeping computational expense to a minimum.^{168}^{,}^{169} For example, the external force could drive a particular binding or unbinding event.

SMD offers scope for interactive steering in an immersive 3D environment. Implementations of such interactive environments include one^{170} based on SIGMA^{171} and VMD^{172} and another based on NAMD and VMD that utilizes a haptic feedback device.^{173}

In many respects, SMD simulation is the computational analogue of the experimental techniques which apply external mechanical manipulations to biomolecules. These experimental techniques include atomic force microscopy (AFM),^{174} optical tweezers,^{175} biomembrane force probes,^{176} and dynamic force spectroscopy^{177} experiments.

In the limit of weak forces that only slowly change in direction, any induced structural change will be minor and SMD will be equivalent to umbrella sampling. The results of SMD simulations are, however, often more interesting when this limit is violated significantly. Such conditions would be disastrous to many applications of standard molecular dynamics, including umbrella sampling itself and also methods relying upon free-energy perturbation theory and the weighted histogram analysis. Therefore, SMD might be useful in cases where major structural changes will be experienced and correspondingly major deviations from equilibrium would occur. Examples of such nonequilibrium cases include ligand unbinding and protein unfolding as initiated by stretching of termini. Equilibrium descriptions cannot be applied in the analysis of such simulations. Therefore, the extraction of valid potentials of mean force from SMD simulations is not straightforward, but several approaches have been proposed.^{178}^{–}^{181} Free-energy differences can be obtained from the exponential averages of irreversible work,^{182}^{,}^{183} and this leads to the most promising approach employed in extracting free-energy profiles from SMD simulations.^{184}^{,}^{185}

Jarzynski’s equation^{182}^{,}^{183} relates equilibrium free-energy differences to work done through nonequilibrium processes. Consider a system described by a parameter *λ* and a process that causes this parameter to evolve from *λ*_{0} at time zero to *λ*_{*t*} at time *t*. According to the second law of thermodynamics, the average work ⟨*W*⟩ done on that system cannot be smaller than the difference Δ*F* between the free energies of the system corresponding to *λ*_{0} and *λ*_{*t*}; equality holds only in the quasi-static limit. In other words, a nonequilibrium process provides only an upper limit to the free-energy difference. Jarzynski^{182} proposed an equality that is independent of the speed of the process:

exp(−Δ*F*/*k*_{B}*T*) = ⟨exp(−*W*/*k*_{B}*T*)⟩

where *k*_{B} is the Boltzmann constant, *T* is the temperature, and the average is over a set of trajectories of the nonequilibrium process.

This equality has been validated both experimentally^{186} and through computational simulation.^{183} It thus provides a method for calculating free energies from nonequilibrium processes, even though conventional thermodynamic integration is invalid in this regime because the work *W* of an individual trajectory does not equate to Δ*F*. The major remaining difficulty, however, is that the average of the exponential term in Jarzynski’s equality is dominated by trajectories corresponding to small values of work. These trajectories are infrequent in the simulations, leading to inadequate sampling. Currently, practical application is limited to slow processes where the fluctuation of the work is comparable to the thermal energy *k*_{B}*T*.^{184} Coupling SMD to certain enhanced sampling techniques described later in this review might extend this limit of practicality.
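The exponential work average, and the sampling problem it entails, can be sketched numerically. The Gaussian work distribution and all numerical values below are illustrative assumptions, not data from any cited study; for Gaussian work the exact answer is known analytically, which makes the behavior of the estimator easy to check.

```python
import numpy as np

def jarzynski_free_energy(work, kT=1.0):
    """Estimate dF from nonequilibrium work samples via Jarzynski's equality,
    exp(-dF/kT) = <exp(-W/kT)>, using a max-shift (log-sum-exp) for numerical
    stability of the exponential average."""
    w = np.asarray(work, dtype=float) / kT
    shift = (-w).max()
    log_avg = shift + np.log(np.mean(np.exp(-w - shift)))
    return -kT * log_avg

# For Gaussian work, W ~ N(mu, sigma^2), the exact result is
# dF = mu - sigma^2 / (2 kT), illustrating <W> >= dF (dissipation >= 0).
rng = np.random.default_rng(1)
work_samples = rng.normal(loc=5.0, scale=1.0, size=200_000)
dF_est = jarzynski_free_energy(work_samples)
mean_work = work_samples.mean()
```

As the width of the work distribution grows beyond *k*_{B}*T*, the average becomes dominated by rare low-work samples and the estimate degrades, which is exactly the practical limitation noted in the text.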

A related technique is targeted MD, in which a force is applied that depends on the difference between the current conformation and a target conformation. The aim is to drive the evolution of the simulation toward the given target conformation. Targeted MD has been applied in the prediction of pathways between particular protein conformations^{187} and in protein folding.^{188}

## 5. Recent Advances in the Theoretical Aspects of Molecular Dynamics

### 5.1. Force Fields

The accuracy of the potential-energy function is of crucial importance to the validity and stability of MD simulations of proteins^{189}^{,}^{190} and indeed all macromolecules. As indicated above, the form of the energy function must be simple in order to make such computations tractable. It is also important that derivatives are readily accessible to facilitate efficient minimization and efficient integrators of motion.

Most force fields utilized in MD simulations of proteins share a significant number of similarities. Harmonic terms describe bond lengths and angles, Fourier series describe torsions, and pairwise atomic interactions are described using a Lennard-Jones function and a Coulombic function. Usually, although not always, parameters are first obtained for protein systems, and subsequent parameters are derived for nucleic acids, lipids, and other biological molecules in such a way that they are consistent with the protein set. The main differences between the various force fields result from the diverse approaches taken to derive the individual parameters. It is not unusual for the parameters to contain significant interrelations and compensatory components such that the final results within a full simulation system reproduce desired experimental observables. Relatively innocuous-seeming differences in the way that different software packages handle technical details of the simulation, such as the treatment of long-range electrostatic effects or of interactions between atoms separated by only a few bonds, can lead to substantially divergent energies when parameters are evaluated under another force field's conventions. A consequence is that parameters for a given atom type cannot be meaningfully compared between force fields, and the direct transfer of parameters from one force field to another is, in general, not valid.
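The common functional form just described can be written out explicitly. The following generic expression is a sketch combining the harmonic, Fourier, Lennard-Jones, and Coulombic terms; the symbols follow the usual conventions and do not correspond to any one force field's parameter set:

```latex
V(\mathbf{r}) =
  \sum_{\text{bonds}} k_b (b - b_0)^2
+ \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
+ \sum_{\text{torsions}} \sum_n \frac{V_n}{2}\bigl[1 + \cos(n\phi - \gamma)\bigr]
+ \sum_{i<j} \left\{ 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
   - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
   + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right\}
```

The differences discussed below (scaling of 1–4 nonbonded terms, improper torsion treatment, explicit hydrogen-bond terms) all amount to variations on this template.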

The consistent-valence force field (CVFF)^{191} differs from most of these force fields in the sense that it has a more complex functional form. Most of the others differ only through minor points such as how improper torsions (i.e., out-of-plane dihedral angles) are treated, what scaling factors are used for nonbonded interactions, or whether hydrogen bonds are included explicitly. The van der Waals parameters of all of the force fields listed above were developed through empirical fitting to small-molecule model systems in liquid or solid phases. As a result, the densities of solvated protein systems tend to be close to experimental values. The torsional parameters tend to be fit to a mixture of QM and empirical data. The parametrization of template partial charges for the atoms in residues is more challenging. The resultant electrostatic interactions must be balanced against the particular water model used. A typical approach is to determine the gas-phase partial charges through a QM calculation of a model compound and to scale these calculated charges by an empirically chosen multiplicative factor intended to mimic condensed-phase polarization.

While it is not clear whether alanine tetrapeptide makes a reasonable model system for proteins, a study of 20 different protein force fields using *ab initio* quantum mechanical calculations indicated that there were discrepancies in all of these force fields.^{192} Furthermore, it was suggested that in order to yield accurate electrostatics, the force fields would need to incorporate non-atom-centered partial charges.

The AMBER95 force field^{193} for proteins is an example of one of the several widely used force fields that are developed alongside particular MD simulation software packages. The accuracy of partial charges assigned to various atoms in a protein structure is critical. Partial charges for the AMBER force field were determined using the restrained electrostatic potential (RESP) method.^{194}^{,}^{195} This method fits a quantum mechanically calculated electrostatic potential at molecular surfaces using an atom-centered point charge model. Subsequent studies were conducted to assess how well the method performed in calculating conformational energies, and it performed better than the other tested force fields.^{196} RESP charges have been calculated for molecules with a description of lone-pair donor sites and atom-centered polarization.^{197} In this study the partial charges were determined self-consistently from the charges and induced dipoles to reproduce a quantum mechanical electrostatic potential. In a separate study^{198} an automated systematic search and genetic algorithm-based approach was applied for parameter optimization. For both of these methods the error in the conformational energy was lower than with the older AMBER95 force field.

The general AMBER force field (GAFF)^{199} for organic molecules is designed to be compatible with existing AMBER force fields. It has parameters for most pharmaceutically relevant compounds and can be applied to a range of molecules with an automated methodology. This makes it suitable for applications such as rational drug design where protein-ligand simulations are utilized and manually assigning appropriate parameters to all ligands is not practical.

In the CHARMM22 force field^{200} the atomic charges were derived from *ab initio* quantum chemical calculations of the interactions between model compounds and water molecules.

The majority of the bond length and bond angle parameters of the OPLS-AA force field^{201} were extracted directly from the AMBER95 force field. The torsional and nonbonded parameters were instead derived using a combination of *ab initio* molecular orbital calculations and Monte Carlo simulations. A study using a similar set of parameters for amines concluded that there is no need to consider polarizability.^{202} The OPLS-AA force field was further improved by reparametrizing the torsional coefficients.^{203} The deviation in the energy compared to those from *ab initio* calculations of peptide was significantly reduced.

#### 5.1.1. Polarizable Force Fields

One physical characteristic of molecular entities that is neglected in contemporary MM force fields is the effect of fluctuations in polarization.^{204}^{,}^{205} There are two schools of thought on the matter. The first suggests that since force fields are freely parametrized, the parameters can be fixed such that the effects of polarization are implicitly incorporated, at least in an approximate or average sense. Others argue, however, that standard empirical force fields do not include any polarization terms and that only by explicitly including such terms would accurate reproduction of experimental observables be enabled.^{192} Polarizable force fields were for many years touted as the “next-generation” of force field.^{192}^{,}^{206}^{–}^{208} To date, the most common methodologies for incorporating explicit polarizability include induced dipole models^{209} and fluctuating charge models.^{210}^{,}^{211} Unfortunately, the practical issues involved, along with uncertainties in the best way to approximate the physics, have resulted in a distinct lack of usable polarizable force fields for proteins and other biomolecules. Nonetheless, the simpler case of homogeneous liquids such as water has yielded to the efforts of force field developers.^{212}^{–}^{214} For example, in one study that found increased accuracy in a polarizable water model compared to simpler condensed-phase models, charges and dipoles were calculated by fitting to *ab initio* potentials of isolated molecules, and additional polarizability parameters were fitted to a range of potentials obtained by applying electric fields to the molecules.^{215} A recently presented alternative is based on the Drude oscillator.^{216}^{–}^{218}

More recently, attempts to produce a usable polarizable force field for proteins have yielded promising results. A fluctuating charge model for proteins has been demonstrated in nanosecond time scale simulations.^{219}^{,}^{220}

A promising atomic multipole method^{221} is distributed with the Tinker MD package^{222} (http://dasher.wustl.edu/tinker/).

### 5.2. Constant pH Molecular Dynamics

The protonation state of acidic and basic residues of a protein, along with the protonation state of any substrate, will be influenced by their interactions with the environment, their internal interactions, and their mutual interactions.^{223} While it is important to assign protonation states correctly in order to run realistic simulations, it is especially vital to estimate the most favorable protonation states accurately in order to reliably estimate the free energy of the system. Numerous non-MD methods are available for estimating the protonation states of proteins.^{224}^{–}^{227}

The general problem of allowing for protonations and deprotonations of titratable residues during MD simulations could be important for accurate representation of proteins and has been examined.^{228}^{–}^{232} A number of procedures for incorporating such events have been proposed.^{233} In some of these methods, protons are added or removed in a continuous, nonintegral fashion. In other methods, the Poisson-Boltzmann equation is used to gauge the correct protonation state but not for the propagation of the MD trajectory.

Recently a method using physically realistic, integral changes in protonation and consistent potentials for both titration and propagation was presented.^{234} This approach uses a generalized Born model for the aqueous solvent. To ensure that surface groups on the solute protein exhibit proper canonical configurational sampling, Langevin dynamics is used to propagate the solute trajectories. The interdependence of titration states and solute conformation is recognized by use of periodic Monte Carlo sampling of the protonation states of the titratable residues. In the Monte Carlo step, a titratable site and protonation state are chosen at random, and the transition energy is calculated using

Δ*G* = *k*_{B}*T*(pH − p*K*_{a,ref}) ln 10 + Δ*G*_{elec} − Δ*G*_{elec,ref}

where *k*_{B} is the Boltzmann constant, *T* is temperature, pH is the specified solvent pH, p*K*_{a,ref} is the p*K*_{a} of the appropriate reference compound, Δ*G*_{elec} is the electrostatic component of the transition energy calculated for the titratable group in the protein, and Δ*G*_{elec,ref} is the electrostatic component of the transition energy for the reference compound, a solvated dipeptide amino acid. The electrostatic portion of the transition energy is determined by taking the difference between the potential calculated with the charges for the current protonation state and the potential calculated with the charges for the proposed state. Because an implicit solvent model is used, no solvent equilibration is needed, and this evaluation is done in a single step. The equation can then be used to calculate the total transition energy, as all other terms are known. The total transition energy, Δ*G*, is used as the basis for applying the Metropolis acceptance criterion to determine whether this transition will be accepted. If the transition is accepted, MD is continued with the titratable group in the new protonation state. Otherwise, MD continues without change to the protonation state. Applications of the method to hen egg white lysozyme yielded agreement with the experimental p*K*_{a} to within plus or minus one unit for most titratable sites. This is comparable to methods designed specifically for p*K*_{a} prediction.
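The Monte Carlo titration step can be sketched as follows. The sign convention (written here for a proposed protonation, flipped for deprotonation), the reference p*K*_{a}, and the energy values are illustrative assumptions, not the published implementation.

```python
import math
import random

random.seed(2)
LN10 = math.log(10.0)

def transition_energy(kT, pH, pKa_ref, dG_elec, dG_elec_ref, protonating):
    """Transition free energy for a proposed protonation-state change.
    Assumed sign convention: the pH term is written for protonation and
    its sign is flipped for deprotonation."""
    sign = 1.0 if protonating else -1.0
    return sign * kT * LN10 * (pH - pKa_ref) + dG_elec - dG_elec_ref

def metropolis_accept(dG, kT):
    """Standard Metropolis criterion: downhill moves always accepted,
    uphill moves accepted with probability exp(-dG/kT)."""
    if dG <= 0.0:
        return True
    return random.random() < math.exp(-dG / kT)

# Hypothetical numbers (kcal/mol, kT ~ 0.593 at 300 K): a His-like site at pH 7
kT = 0.593
dG = transition_energy(kT, pH=7.0, pKa_ref=6.5,
                       dG_elec=-1.0, dG_elec_ref=-0.6, protonating=True)
accepted = metropolis_accept(dG, kT)
```

If the move is accepted, the simulation continues with the new charge set; otherwise the old protonation state is retained, mirroring the procedure described in the text.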

### 5.3. Advanced Sampling Techniques

Sometimes one knows two biologically relevant but distinct conformational states yet knows little about the dynamic events or paths that convert one state into the other. Sometimes one knows only a single conformational state, often a crystallographically determined structure, but knows that a conformational change must occur to provide a particular activity of interest. In many cases, the time scales involved in these conformational changes are not accessible via conventional MD techniques. To address this issue, numerous accelerated MD variants have been proposed in the literature,^{235}^{,}^{236} and the more recent advances are covered here. These accelerated MD methods extend the conformational sampling characteristics, enabling extended time scales, or effective time scales, to be accessed, and aim to allow rare dynamic events to be observed more readily.

Accelerated MD methods can be grouped into three broad classes. The first class alters sampling of conformational space through explicit modification of the potential surface. The second class also alters the sampling but by using non-Boltzmann sampling to increase the probability of high-energy states. The third class includes those methods that enhance the sampling of certain degrees of freedom at the expense of other, typically faster, degrees of freedom. There is some overlap between these classes.

All of these accelerated MD approaches are distinct from fundamental improvements in the core MD algorithm, such as improved integrators and parallelism. These latter types of enhancements are discussed elsewhere in this review.

The most interesting or promising enhanced sampling methods are described in the following sections, but the reader should note that this area is actively researched, and further approaches are keenly anticipated. A separate group of methods are those that apply some non-MD algorithm. Although these are not actually MD methods, they are relevant to the issue at hand and are therefore also discussed here.

There are two primary goals driving development of enhanced sampling methods. Some of these approaches aim to increase the volume of conformational space that is explored during the simulation, while others aim to drive the system to a particular conformation or to the global minimum energy conformation more rapidly.

#### 5.3.1. Modified Potentials

The basic principle behind the various potential-energy modification methods is to reduce the amount of time that the simulated system remains in a local energy-minimum well, thereby speeding transitions between minima and forcing the system to sample the remainder of the available conformational space. In each of these approaches, the potential-energy function is altered to enhance sampling by reducing the propensity of energy wells to act as conformational traps. Methods that modify the potential-energy surface include the deflation method,^{237} conformational flooding,^{238}^{,}^{239} umbrella sampling,^{240} local elevation,^{241} potential smoothing,^{135} the puddle-skimming^{242}^{,}^{243} and puddle-jumping^{244} methods, hyperdynamics,^{245}^{,}^{246} and accelerated MD.^{247}

Another method is the well-studied diffusion equation method (DEM),^{133}^{–}^{135}^{,}^{248}^{–}^{251} a type of potential smoothing. In DEM, the diffusion equation is solved analytically for the potential-energy surface, thus deforming and smoothing it. A time-reversal process is used to restore the potential to its original form as the simulation proceeds.

The smoothing or flattening of the potential-energy landscape can be applied either globally to increase the overall sampling or only along a specific, predefined, reaction coordinate to enhance the conformational evolution along the direction of a desirable transformation or enhance sampling over a particular set of conformations, depending upon the method. For example, given some prior information about the desired conformations, one widely used approach is umbrella sampling.^{240} A compensating function, known as an umbrella potential, is added to the potential-energy function to bias the sampling. Obviously, construction of the umbrella function requires prior knowledge of the conformations of interest. Nonetheless, umbrella sampling is a powerful alternative to adiabatic mapping. In umbrella sampling, local strain may be relaxed more effectively and kinetic effects can be included to some extent. Importantly, some deviations from any specified conformational path can be tolerated.
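The umbrella bias can be sketched on a toy one-dimensional reaction coordinate. The double-well potential, the force constant, and the window spacing below are invented for illustration; in practice each window is simulated separately and the windows are recombined with a method such as WHAM.

```python
import numpy as np

def unbiased_energy(x):
    """Toy potential with a barrier along a 1D reaction coordinate."""
    return (x**2 - 1.0)**2

def umbrella_energy(x, center, k_umb=20.0):
    """Biased potential: original energy plus a harmonic umbrella restraint
    that confines sampling near a chosen window center."""
    return unbiased_energy(x) + 0.5 * k_umb * (x - center)**2

# A ladder of overlapping windows spanning the transition region
centers = np.linspace(-1.2, 1.2, 13)
x = np.linspace(-1.5, 1.5, 301)
biased = [umbrella_energy(x, c) for c in centers]
# Each biased surface has its minimum near its window center, so the set of
# windows tiles the barrier region that unbiased sampling would rarely visit
minima = [x[np.argmin(b)] for b in biased]
```

Because the bias is a known analytic function, its effect can be removed exactly when the window histograms are combined, which is what makes umbrella sampling statistically rigorous.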

In principle, these schemes are all applicable to methods other than MD. They can be used for enhancing sampling by MC simulations, for example.

The local elevation method^{241} enhances sampling by adding a penalty potential to any conformations previously sampled. This resembles the widely used tabu search algorithm in that a list of previous solutions is maintained and new solutions are driven away from solutions existing in this list. This approach has not been adapted for use with proteins. It is likely that the storage overhead for configurations of such large molecules would be overwhelming.

Another approach that aims to force systems out of their sampled local minima is conformational flooding.^{238}^{,}^{239} In conformational flooding the initial state of the system is destabilized by adding an extra unfavorable potential at this initial state. This method predicts the so-called essential degrees of freedom using PCA, described earlier. A Gaussian potential is added to the system to force it along these essential degrees of freedom. Unlike local elevation, this method has been demonstrated in protein simulations.

##### Hyperdynamics and Accelerated MD

The recently described *accelerated molecular dynamics*^{247} allows for more rapid sampling of the configurational space in systems with rough energy landscapes while still allowing one to calculate the correct thermodynamic properties of the system. It resembles the puddle-skimming method^{243} except that its formulation avoids the nonsmooth potential-energy surfaces that cause significant problems in MD simulations. In addition, the puddle-skimming method results in random sampling on a flat energy landscape in the regions of the minima, where sampling becomes very inefficient.

In the accelerated MD method, time becomes a statistical quantity in the simulation. The effective time scale of simulations is increased by several orders of magnitude at the expense of sampling around the energy minima. Because the potential is altered analytically, the statistics of sampled configurations can be corrected to reproduce the canonical probability distribution for the original potential surface.

The original potential function *V*(*r*) is altered via

V*(r) = V(r) for V(r) ≥ E
V*(r) = V(r) + ΔV(r) for V(r) < E

where E is a chosen threshold (“boost”) energy and ΔV(r) has a nonnegative value as given by

ΔV(r) = (E − V(r))² / (α + E − V(r))

in which α is a tuning parameter controlling how deeply the wells are flattened. The modified potential echoes the original potential, so sampling directions during the simulation are still representative of the unbiased system. Thus, the random-sampling behavior of puddle skimming in the regions of the original minima is not observed.

When applied to the dihedral-angle and 1–4 interaction terms of the AMBER potential, in a Langevin dynamics simulation using the generalized Born (GB) implicit solvent model, substantial increases in configurational sampling were obtained. Moreover, after applying a correction for the sampling bias, the method yielded the expected distribution of configurations for small peptides.^{247}

#### 5.3.2. Modified Sampling

The basic principle behind most of the second class of methods is also to reduce the amount of time that the simulated system remains in local energy minimum wells, forcing the system to sample the remainder of the conformational space available. However, these techniques use alternative methods for sampling rather than performing conventional MD on explicitly modified energy landscapes. One obvious approach is simulated annealing (SA), as described earlier, which effectively smoothes a potential-energy surface through additional entropic contributions.^{132} It can be shown that SA is the stochastic equivalent of the DEM method discussed above, and the relationships between these methods have been investigated.^{135}

Modified sampling methods include high-temperature MD,^{252} locally enhanced sampling (LES),^{253}^{–}^{256} replica exchange,^{257}^{,}^{258} parallel tempering,^{259}^{,}^{260} self-guided MD,^{261} targeted MD,^{187} milestoning,^{262}^{,}^{263} repeated annealing,^{264} additional degrees of freedom,^{265}^{,}^{266} and various non-Boltzmann sampling methods.^{267}^{,}^{268}

##### Locally Enhanced Sampling

In the LES method^{253}^{,}^{256} a fragment of the system, perhaps a side chain or a ligand molecule, is duplicated so that the simulation contains *N* noninteracting copies of that fragment. The remainder of the system experiences each fragment copy through interactions that are reduced by a factor of 1/*N* from their original magnitudes. The use of multiple copies and the reduction of the interaction potentials significantly enhance the sampling of the conformations of the fragments.

Evaluating the free energy from LES simulations requires two additional perturbation calculations: one transforming from the single-copy representation of the reference state to the multiple-copy representation and a second transforming from the multiple-copy representation of the perturbed state back into the single-copy representation. However, the benefits of adopting a multiple-copy representation outweigh the additional costs of introducing two more perturbation calculations.^{269} This method has been applied to the study of the *α* → *β* anomerization of glucose, and it was found that the free-energy calculations converged an order of magnitude more quickly than with the single-copy method.^{270}
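The 1/*N* scaling of the copy-environment interactions can be illustrated with a toy energy function. All geometries, the pair potential, and the copy count below are invented for the sketch; it only demonstrates that nearly identical copies at 1/*N* strength reproduce the single-copy energy while allowing each copy to wander independently.

```python
import numpy as np

rng = np.random.default_rng(3)

def pair_energy(r2):
    """Toy Lennard-Jones pair energy as a function of squared distance."""
    inv6 = 1.0 / r2**3
    return 4.0 * (inv6**2 - inv6)

def les_energy(environment, copies, n_copies):
    """LES-style energy: each of n_copies fragment replicas sees the
    environment at 1/N strength, and replicas do not see each other."""
    total = 0.0
    for frag in copies:
        for a in frag:
            for b in environment:
                total += pair_energy(np.sum((a - b)**2)) / n_copies
    return total

environment = rng.normal(size=(5, 3)) * 3.0    # fixed environment atoms
fragment = rng.normal(size=(2, 3)) + 8.0       # fragment, well separated
copies = [fragment + 0.05 * rng.normal(size=fragment.shape) for _ in range(4)]
E_les = les_energy(environment, copies, n_copies=4)
E_single = les_energy(environment, [fragment], n_copies=1)
# With nearly identical copies, the scaled LES energy matches the single copy
```

As the copies diverge during a simulation, the environment effectively feels their average, which is the source of the smoothed landscape that accelerates fragment sampling.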

##### High-Temperature Molecular Dynamics

Perhaps the most obvious approach for enhancing sampling of high-energy states is to raise the simulation temperature. This approach is known as high-temperature dynamics and has been evaluated by several groups.^{252}^{,}^{271}^{,}^{272} It is suggested that high-temperature molecular dynamics is a useful aid in conformational searches, but physiologically relevant low-energy structures are not generally obtained even after minimization of the generated high-energy structures. Indeed, the generated structures often have an infeasible proportion of cis-peptides.^{252} Consequently, this method is not without criticism. Another possibly major issue is that MM force fields, generally, have not been designed for, or validated with, temperatures much beyond the physiologically relevant 300–330 K that most MD simulations are run at. Whether or not the force field is physically correct, there may be debate about whether the use of such high temperatures leads to appropriate sampling characteristics since the entropic contribution to the free energy is significantly enhanced (and therefore over-sampled).^{238} Although there are several problematic issues involved with using high temperatures to accelerate MD simulations, the basic principle acts as the fundamental basis for a few more complex approaches discussed below.

##### Multiple-Copy Dynamics

Several approaches that use a series of simultaneous (or parallel) MD simulations have been demonstrated in the literature.^{255}^{,}^{256}^{,}^{273}^{–}^{276} In these, the individual molecules in the separate simulations may, or may not, interact in some way. Similar approaches may be applied in simulated annealing.^{277}

One interesting multiple-copy approach is SWARM-MD.^{256} The basic idea behind SWARM-MD was motivated by the efficient search behavior observed in swarms of social insects. Despite the absence of any higher intelligence, whole swarms of insects often appear to exhibit significant organization and planning. The cooperative rules that lead to the swarm’s overall behavior are mimicked for conformational search in a simulated swarm of molecules. In such simulations, each molecule is subject to supplementary artificial forces that introduce the cooperative behavior. The artificial force drives the trajectory of each molecule toward the mean trajectory of the entire swarm.

It would be easy to conceive a scheme in which each molecule is driven away from the mean trajectory in order to enhance the volume of conformational space that would be sampled.

Multiple-copy approaches tend to be very well suited to trivial parallelization with communication only between the individual systems. Indeed, the well-known distributed protein-folding project, Folding@Home, uses a multiple-copy approach.^{278}

A number of multiple-copy approaches specific to ligand binding or design have been introduced,^{275}^{,}^{279}^{,}^{280} and these are discussed further in the MD for Ligand Docking section, below.

##### Replica Exchange Molecular Dynamics

Like the multiple-copy approaches, replica exchange molecular dynamics (REMD)^{281} and the closely related parallel tempering method^{260} utilize a series of simultaneous and noninteracting simulations, known as replicas. With proteins, these simulations are typically MD, but earlier work applied Monte Carlo simulations. The replicas are simulated over a range of temperatures, and at particular intervals the replicas may exchange temperatures according to a Monte Carlo-like transition probability; the methods differ in exactly how the individual simulations are coupled. Such exchanges occur through a simple swapping of the simulation temperatures via velocity reassignment. The high-temperature replicas jump from basin to basin, while the low-temperature replicas explore a single valley with sampling characteristics just like conventional MD.
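The temperature-swap criterion can be sketched as follows. The Metropolis form shown, p = min(1, exp[(β_i − β_j)(E_i − E_j)]), is the standard replica exchange acceptance rule; the energies, temperatures, and unit choice are illustrative.

```python
import math
import random

random.seed(4)
K_B = 0.0019872  # kcal/(mol K); an assumed unit choice for the sketch

def exchange_probability(E_i, E_j, T_i, T_j, kB=K_B):
    """Metropolis probability of swapping the temperatures of two replicas:
    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i, beta_j = 1.0 / (kB * T_i), 1.0 / (kB * T_j)
    return min(1.0, math.exp((beta_i - beta_j) * (E_i - E_j)))

def attempt_swap(E_i, E_j, T_i, T_j):
    """Accept or reject a proposed exchange."""
    return random.random() < exchange_probability(E_i, E_j, T_i, T_j)

# If the cold replica is caught at the higher energy, the swap always succeeds
p_always = exchange_probability(E_i=-100.0, E_j=-105.0, T_i=300.0, T_j=330.0)
# Otherwise acceptance decays exponentially with the energy gap
p_rare = exchange_probability(E_i=-105.0, E_j=-100.0, T_i=300.0, T_j=330.0)
```

Acceptance rates fall as the temperature gap widens, which is why REMD on large proteins needs many closely spaced replicas and becomes expensive.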

While REMD is widely applied to smaller molecules, particularly in peptide and protein folding experiments,^{282}^{–}^{284} it is found to be extremely computationally expensive when applied to large proteins.

In a study in which REMD was applied to a 20-residue peptide, it was found that at physiologically relevant temperatures the conformational space was sampled much more efficiently than with conventional constant-temperature MD, while similar thermodynamic properties were obtained.^{284}

##### Self-Guided Molecular Dynamics

Self-guided molecular dynamics (SGMD)^{261}^{,}^{285}^{,}^{286} applies an additional guiding force to drive the simulation. The guiding force is a continuously updated time average of the force of the current simulation, leading to increased search efficiency by assisting the system over energy barriers. For efficient sampling of the available conformational space, the correlation between the guiding forces and the actual physical forces must be low;^{287} nonetheless, the algorithm produces stable dynamics.^{286}

A related method, the momentum-enhanced hybrid Monte Carlo method (MEHMC)^{288} overcomes some of the inherent problems observed with SGMD. The SGMD algorithm lacks reversibility because the effective potential-energy landscape is a function of the trajectory rather than a function of the coordinates. This irreversibility results in substantial errors in canonical averages from the trajectory. MEHMC differs by using average momentum instead of average force to bias the initial momentum within a hybrid MD/Monte Carlo procedure. This is believed to yield correct canonical averages.^{288}

#### 5.3.3. Modified Dynamics

This third class of enhanced sampling method encompasses those methods in which the dynamics along the “slow” degrees of freedom are accentuated relative to the “fast” degrees of freedom. One such method is extremely widely used, to the extent that simulations not applying it (or one of its close descendants) are exceptionally rare. This is the SHAKE algorithm,^{71} where constraints are applied to particular bond lengths to allow larger time steps to be taken without encountering excessive forces. SHAKE was described earlier in this review. Another method is the MBO-(N)D algorithm^{289} in which groups of atoms are partitioned into substructures that are considered to be rigid during the simulation. It is possible to integrate the equations of motion faster by separating the faster components of the dynamic propagator from the slower components.^{290} Alternatively, the faster (i.e., higher-frequency) motions may be reduced or eliminated completely. Several algorithms based on this principle exist including dynamic integration within a subspace of low-frequency eigenvectors,^{291} generalized moment expansion,^{292} various types of coarse-grained modeling^{293}^{,}^{294} such as network models,^{295} mode coupling theory and projection of a generalized Langevin equation onto certain degrees of freedom,^{296} digital filtering of selected velocities,^{297}^{–}^{299} large time-step dynamics using stochastic action,^{300}^{,}^{301} and leap dynamics.^{302}

##### Gaussian Network Model

A large number of coarse-grained models for proteins have been described in the literature. The Gaussian network model^{293}^{,}^{303}^{–}^{305} is one such coarse-grained method, in which the energy function evaluates the system at the level of residues rather than atoms, or in some cases even coarser fragments.^{306} The residues interact through simple harmonic and nonharmonic terms. This model has been widely applied in MD of proteins.^{307}^{,}^{308}
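A minimal sketch of the GNM construction follows, using an idealized helical trace in place of real coordinates. The 7 Å contact cutoff is a commonly used choice, but the coordinates, radius, and pitch here are synthetic.

```python
import numpy as np

def kirchhoff_matrix(coords, cutoff=7.0):
    """GNM connectivity (Kirchhoff) matrix: off-diagonal entries are -1 for
    residue pairs within the cutoff, and each diagonal entry equals that
    residue's contact count, making the matrix a graph Laplacian."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    gamma = -(d < cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)                 # no self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # diagonal = contact count
    return gamma

# Idealized helical C-alpha trace standing in for a real structure
t = np.arange(30)
coords = np.stack([3.0 * np.cos(0.6 * t), 3.0 * np.sin(0.6 * t), 1.5 * t], axis=1)

G = kirchhoff_matrix(coords)
vals, vecs = np.linalg.eigh(G)  # one zero mode (overall translation in GNM)
# Predicted mean-square fluctuations are dominated by the slow (small-lambda)
# modes: msf_i ~ sum over nonzero modes of v_ik^2 / lambda_k
msf = (vecs[:, 1:] ** 2 / vals[1:]).sum(axis=1)
```

The low-frequency eigenvectors of this matrix are the slow collective motions that coarse-grained enhanced sampling schemes amplify.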

This simplified model has been reduced further, even to the point where sequence information is not considered.^{309} In addition, sampling has been further enhanced through amplification of slower motions^{310} and coupled to conventional MD to assist sampling.^{311}

##### Leap Dynamics

In the leap-dynamics^{302} method a combination of MD and essential dynamics is applied. Conformational “leaps” applied to the system force it over energy barriers. The structures from the leaping process are refined using MD. The method was demonstrated to correctly predict the enhanced partial flexibility of a mutant structure in comparison to native bovine pancreatic trypsin inhibitor.

A potentially significant disadvantage of many of these modified dynamics methods is that the dynamics are, of course, artificial. This may, or may not, be a problem depending upon the desired application of the results. Some methods might still be anticipated to yield a Boltzmann distribution of structures if run for a sufficiently large number of steps, but there can be no such guarantees for others.

##### Digitally Filtered Molecular Dynamics

Digitally filtered molecular dynamics (DFMD) applies the theory of digital filters to MD simulations, selectively enhancing or suppressing motions on the basis of frequency.^{312} This method was applied to the Syrian hamster prion protein, and a high degree of selectivity and control was demonstrated in the enhancement of the rate of conformational changes.^{298} A time reversible version of the method has also been developed.^{299}

##### Multiple Time-Step Methods

A number of multiple time-step methods are available,^{313} with the reference system propagation algorithm (RESPA)^{314}^{–}^{316} being most widely applied to biomolecular systems.^{317}^{,}^{318} A reasonable MD trajectory may be generated at 35–50% of the usual computational expense. Coupled with the Langevin methods described above, speedups of as much as 2 orders of magnitude have been reported for protein simulations.^{319}

The basic concept in multiple time-step methods is to separate the slow motions from the fast motions and evaluate the interactions relating to the slow motions less frequently. One common assumption in such methods is that interactions involving longer distances change more slowly, and therefore, the forces due to those interactions can be reapplied over several time steps before being recalculated.
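A minimal impulse-style sketch of this idea, assuming a toy one-dimensional system with a stiff "fast" spring and a weak "slow" one, might look like the following; real RESPA implementations typically split the nonbonded forces by distance rather than by spring constant.

```python
import numpy as np

def respa_step(x, v, m, f_fast, f_slow, dt_outer, n_inner):
    """One outer step of an impulse-style multiple time-step integrator:
    the slowly varying force kicks the velocity at the outer-step
    boundaries, while the fast force is integrated with velocity Verlet
    at the small inner step dt_outer / n_inner."""
    dt = dt_outer / n_inner
    v += 0.5 * dt_outer * f_slow(x) / m      # slow half-kick
    for _ in range(n_inner):                 # fast inner loop
        v += 0.5 * dt * f_fast(x) / m
        x += dt * v
        v += 0.5 * dt * f_fast(x) / m
    v += 0.5 * dt_outer * f_slow(x) / m      # slow half-kick
    return x, v

# toy 1-D system: stiff "bond" spring (fast) plus weak background spring (slow)
f_fast = lambda x: -100.0 * x
f_slow = lambda x: -1.0 * x
x, v, m = 1.0, 0.0, 1.0
for _ in range(1000):
    x, v = respa_step(x, v, m, f_fast, f_slow, dt_outer=0.05, n_inner=10)

# the combined-oscillator energy should stay near its initial value of 50.5
energy = 0.5 * m * v**2 + 0.5 * (100.0 + 1.0) * x**2
print(energy)
```

The slow force here is evaluated only once per outer step, which is the source of the computational saving when that force is the expensive one to compute.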

##### Multiple-Body O(N) Dynamics

Multiple-body O(N) dynamics (MBO(N)D)^{289} is a molecular dynamics technique using a coarse-grained model that scales linearly with the number of bodies in the system. Speedups of 5–30 times compared to conventional MD are claimed.^{320} This method combines rigid-body dynamics with multiple time steps. The highest frequency harmonic motions are removed while retaining the low-frequency anharmonic motions. It has been demonstrated to reproduce the global essential dynamic properties of both proteins and nucleic acid systems.^{289} One notable problem with this otherwise extremely promising method is that it relies upon the user to determine the level of granularity from an empirical study of such levels for the system of interest.

## 6. Recent Advances in the Computational Aspects of Molecular Dynamics

The practical application of molecular dynamics is fuelled, in part, by the wide availability of software and the growing availability of significant computational resources.

### 6.1. Software

There is considerable diversity among the software packages available today. While many molecular dynamics packages aim to offer a broad or comprehensive range of capabilities, each widely used package has certain features or advantages that set it apart from the others. Few research groups restrict their simulations to a single software package. This encourages the development of compatibility-oriented features and uniform benchmarks, although there is still much scope for improvement in these areas. The majority of popular MD packages can utilize force field, structure, and trajectory file formats that were originally introduced in other packages. This enables a certain amount of validation and facilitates reproduction of published results, even without the original software, a necessity in some areas of computational chemistry.

Many of the general-purpose protein modeling packages contain some kind of MD facility, although in many cases this is nothing more than an interface to one of the specialized MD software packages. Such interfaces are useful in themselves as they can provide a simple mechanism for invoking the simulations without an understanding of the underlying, and often complex, software. They, therefore, allow the nonspecialist to readily perform simple simulations.

Taking the authors’ research group as an example, GROMACS, AMBER, and NAMD are all used routinely for MD studies, as warranted by the specific aims or requirements of the current project. In addition, CHARMM, NWChem,^{321}^{,}^{322} and others are used when the situation demands.

While it might be argued by some that such a broad base of actively developed software fractures the field and wastes a lot of effort through duplication, it in fact fosters a great deal of friendly competition. Such competition is a sign of a healthy worldwide research effort. Other benefits, such as the independent validation or verification of theories, should not be underestimated either. As MD approaches and methods pass from the province of specialized experts into the wider realm of scientists, particular software packages can offer important advantages or accommodate particular needs of the diverse set of potential users.

#### 6.1.1. GROMACS

GROMACS^{323}^{,}^{324} (http://www.gromacs.org/) is advertised as a versatile package which is primarily designed to perform molecular dynamics simulations of biochemical molecules such as proteins and lipids. However, it is also claimed that since GROMACS is extremely fast at calculating the nonbonded interactions that typically dominate simulations, many groups are also using it for research on nonbiological systems such as polymers. GROMACS was initially a rewrite of the GROMOS^{325} package (http://www.igc.ethz.ch/gromos/), which itself, like AMBER, was originally derived from an early version of CHARMM.

The main advantages of GROMACS are its ease of use and its exceptional performance on standard personal computers. A lot of effort was expended in optimizing the code to run efficiently on desktop computers such as those using the Intel Pentium IV and PowerPC G4 processors. The authors report that it is normally between 3 and 10 times faster than other MD programs.

One factor that many users might find attractive is its lack of a scripting engine. GROMACS is actually a suite of small command-line programs each with a simple set of options.

GROMACS file formats are somewhat interesting. All files are plain-text based, so they are, in principle, human readable. This helps them avoid the platform-dependence problems of many MD packages. These plain-text formats result in much larger file sizes than binary formats would, so GROMACS transparently utilizes standard UNIX compression tools. In addition, trajectories may be stored in a very condensed form using lossy compression. Lossy compression describes a class of data compression algorithms that achieve impressive reductions in data size by only approximating the original data. While some fidelity is lost, this generally has little practical consequence, much like the lossy compression used in the well-known JPEG image format.
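The core idea of such lossy trajectory compression is fixed-precision quantization: coordinates are rounded to a chosen precision and stored as small integers. The sketch below illustrates only this quantization step (the 0.001 nm precision mirrors a common trajectory-precision default); the actual file format involves additional integer packing on top of this.

```python
import numpy as np

def quantize(coords, precision=0.001):
    """Round coordinates to a fixed precision and store them as integers."""
    return np.round(coords / precision).astype(np.int32)

def dequantize(ints, precision=0.001):
    """Recover approximate coordinates from the stored integers."""
    return ints * precision

coords = np.random.default_rng(2).uniform(0.0, 5.0, (100, 3))   # nm
restored = dequantize(quantize(coords))

# the round-trip error is bounded by half of one quantization step
print(np.max(np.abs(restored - coords)))  # ≤ 0.0005
```

Because the integers span a much smaller range than raw floating-point values, they compress far better, at the cost of the bounded error shown above.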

Most of the standard types of data analysis can be performed using the set of accompanying tools which can also produce publication-ready plots in a straightforward manner.

From a practical point of view, one particularly attractive reason to choose GROMACS is the fact that it is distributed as free software under the terms of the GNU General Public License (http://www.gnu.org/). This provides certain freedoms and significant pedagogical benefits. Source code is generally available (at least to academic groups) for all major MD packages, but the use and reuse of such code is often highly restricted.

#### 6.1.2. NAMD

Whereas GROMACS is renowned for its spectacular performance on modest desktop computers, NAMD^{326} (http://www.ks.uiuc.edu/Research/namd/) exhibits inspiring performance on high-end parallel computing platforms with large numbers of processors.^{327} NAMD is able to comfortably handle system sizes which are well beyond the practical, if not absolute, limits of other MD packages. For instance, NAMD was used in the simulation of a 200 000 atom lac repressor system.^{328} Such huge biological systems are simulated on massively parallel supercomputers and Beowulf-style clusters. NAMD scales impressively to (i.e., runs efficiently on) large numbers of processors, even thousands.^{329} Until the recent development of PMEMD,^{330} no other MD software came close on this point.

Besides its efficiency, another advantageous property is the level of integration with the VMD molecular visualization software.^{172} This facilitates interactive molecular dynamics, for example.^{331}

One specific factor that probably contributes to its efficiency is that its feature set is relatively modest compared to the other major MD packages. However, NAMD is file compatible with packages such as AMBER and CHARMM, so system preparation and the postprocessing and analysis of results are often performed using other software.

NAMD is distributed free of charge, with full source code, but certain conditions on use apply. Its object-oriented design implemented using the C++ programming language^{332} facilitates pedagogical use and the incorporation of new algorithms.

#### 6.1.3. AMBER

Two of the more established and respected MD packages are CHARMM and AMBER^{333} (http://amber.scripps.edu/). While the respective developers can trace the programs’ heritage to a common source, the two have been independently developed since the early 1980s and have adopted slightly different philosophies. AMBER consists of a suite of separate programs, each performing a specific task. The CHARMM developers have taken a more integrated approach in which one single, monolithic application does everything from system preparation through simulation to analysis. There are inherent benefits to each approach, but the overall merit, more or less, comes down to personal preference.

AMBER was principally maintained in the research group of the late Professor Peter Kollman, and ongoing maintenance is now coordinated in the research group of Professor David Case. Code contributions came from a variety of locations, however. Along with CHARMM, it often incorporates new methodologies and algorithms before any other packages.

A recent addition to the AMBER suite^{334} is PMEMD, a stripped-down and optimized version of the general MD program known as Sander. PMEMD provides scaling on massively parallel platforms that is comparable with NAMD.

The simulations executed using PMEMD are intended to replicate AMBER’s Sander calculations within the limits of computational precision. However, the computation is performed much more quickly, in roughly one-half of the memory, and with significantly less overhead on larger numbers of processors. A number of benchmark cases are presented on the PMEMD website (http://amber.scripps.edu/pmemd-get.html).

Like CHARMM, a series of force fields are developed in conjunction with the AMBER simulation software.^{189}^{,}^{335}

#### 6.1.4. CHARMM

CHARMM (Chemistry at HARvard Molecular Mechanics) is described as a program for macromolecular simulations, including energy minimization, molecular dynamics, and Monte Carlo simulations.^{39}^{–}^{41} It is predominantly developed within the research group of Professor Karplus at Harvard University, although, as highlighted on the CHARMM website (http://yuri.harvard.edu/), a network of developers in the United States and elsewhere actively contribute to its ongoing development. The CHARMM software is developed in unison with a series of force fields, as described above.

A variant of CHARMM, named CHARMm, is widely deployed in commercial settings. While it lacks some of the more cutting-edge features, it is arguably more robust and bug free. Commercial support is provided by Accelrys (http://www.accelrys.com/support/life/charmm/).

### 6.2. Hardware

The great strides made in the molecular dynamics field are due in part to the phenomenal developments in computer hardware. Protein simulations make heavy demands on the available computing facilities. The length and accuracy of simulations are chiefly restricted by the availability of processor power, while memory and disk space are also important, especially for large model systems. The most significant developments in computational infrastructure with applicability to MD are briefly discussed here.

#### 6.2.1. Parallel Computing

The most powerful supercomputers available today consist of arrays of processors that communicate via fast interconnects. The potential power of these systems is only utilized when the algorithm to be run can be partitioned into a number of separate processes that can individually be run on a single processor. With no or minimal communication required between the processes, the bandwidth of the interconnects will not become the bottleneck in the calculation. In addition, if the processes do not need to remain synchronized, then processing power will not be wasted as one of the processes waits for others to catch up. While this might be the case for many computational tasks, it is not for typical MD algorithms. Particularly in the case of nonbonded interactions, significant communication is required and the processes will need to be synchronized during each time step.

##### Blue Gene

Blue Gene^{336} is a widely publicized project, with an ultimate goal of implementing and utilizing a so-called hugely parallel supercomputer architecture.

IBM’s Blue Gene project (http://researchweb.watson.ibm.com/bluegene/index.html) is described as representing a unique opportunity to explore novel research in a number of areas, notably including biomolecular simulation. The planned scientific program will require and foster a collaborative effort across many disciplines and the participation of the worldwide scientific community to make best use of this exciting computational resource.

##### Commodity Clusters

The availability of Beowulf-style clusters built from commodity PC components is being increasingly leveraged, as a cost-effective alternative to traditional supercomputer platforms, to facilitate large-scale MD simulations. There is one caveat, however. The technical characteristics of typical clusters do not suit them to fully efficient MD simulations as detailed in the Future Prospects and Challenges section. Clusters are suitable for running many simulations in parallel, with minimal communication between the processing nodes and therefore maintaining much of the efficiency of the serial codes. Some types of simulation, including the relaxed complex method, are highly suited to clusters.

## 7. Recent Applications of Molecular Dynamics

There have been numerous and varied simulations described in the literature. As MD approaches the point in its development at which it becomes a routine tool for nonspecialist researchers, such simulations will only increase in frequency. It is impossible to give a broad overview that does justice to all of these simulations. Consequently, a small set of recent biomolecular simulation examples is selected here for brief discussion.

### 7.1. Functional Mechanism of GroEL

In the highly concentrated milieu of the cell, chaperone molecules are essential to facilitate the correct folding of many proteins. For example, in *Escherichia coli* it is thought that around 10% of proteins located in the cytoplasm require an experimentally well-characterized protein, the chaperonin GroEL, for correct folding. This protein is a homomeric complex of 14 subunits arranged in two stacked heptameric rings.^{337} Critical to its function are large conformational changes that are regulated through cooperative binding and hydrolysis of ATP in the presence of the co-chaperonin GroES. This cooperativity is positive within a given ring but negative between the rings. The conformational changes occur in all subunits, converting them from a “closed” form to an “open” form upon binding of ATP and GroES.

The conformational pathway between the open and closed forms could not be determined experimentally. This conformational transition is believed to occur on a millisecond time scale, so conventional MD of the transition would not be tractable. However, targeted MD is applicable in such cases and was applied to determine the transition pathway between the two known conformations.^{338} The targeted MD simulation predicts a particular intermediate conformation in which ATP is bound before GroES. It also indicated that steric interactions, along with salt bridges between the individual subunits, mediate the pattern of positive and negative cooperativity of ATP binding and hydrolysis. Early in the pathway, ATP binding triggers a downward motion of a small intermediate domain, and this causes the larger motion of the apical and equatorial domains. Subsequent cryoelectron microscopy results support the results of the simulation, indicating that this intermediate domain plays a critical role in the conformational behavior.^{339}
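The essential ingredient of a targeted MD simulation such as this is a restraint that forces the RMSD to the target conformation to follow a prescribed, shrinking schedule. The sketch below applies only that restraint gradient by steepest descent, with the physical force field omitted, random toy coordinates, and arbitrary units.

```python
import numpy as np

def tmd_restraint_grad(x, target, rho, k=100.0):
    """Gradient of the targeted-MD restraint 0.5 * k * (RMSD(x, target) - rho)^2."""
    diff = x - target
    rmsd = max(np.sqrt(np.mean(np.sum(diff**2, axis=1))), 1e-12)
    return k * (rmsd - rho) * diff / (len(x) * rmsd)

rng = np.random.default_rng(3)
target = rng.uniform(0.0, 10.0, (50, 3))           # "open" conformation
x = target + rng.normal(0.0, 1.0, target.shape)    # perturbed starting structure

rmsd0 = np.sqrt(np.mean(np.sum((x - target)**2, axis=1)))
schedule = np.linspace(rmsd0, 0.0, 5000)           # shrink the allowed RMSD
for rho in schedule:
    x -= 0.001 * tmd_restraint_grad(x, target, rho)
for _ in range(2000):                              # hold rho = 0 to finish
    x -= 0.001 * tmd_restraint_grad(x, target, 0.0)

final_rmsd = np.sqrt(np.mean(np.sum((x - target)**2, axis=1)))
print(rmsd0, final_rmsd)  # the structure is driven onto the target
```

In a real targeted MD run this restraint force is simply added to the molecular mechanics forces at each dynamics step, so that the system explores physically reasonable intermediates while the schedule pulls it toward the target.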

### 7.2. Simulation of Membrane Proteins

An active area where simulations play a key role is the study of ion diffusion through pores and channels and the gating mechanisms associated with such channels, topics that are frequently reviewed.^{25}^{,}^{340}^{–}^{342} Experimentally probing the structure of transmembrane proteins is difficult, but valuable insights have been obtained through the application of MD in cases where a reasonable structure is known or can be predicted. One system that has been widely simulated is the M2 protein of influenza A. During a 4 ns MD simulation a funnel-like structure formed, but it appeared to be occluded by a particular histidine residue,^{343} while it has also been shown that the protonation of this residue can drive channel opening.^{344} Both of these predictions have been validated by NMR results. This transmembrane protein fragment has been used as the basis of a number of model ion channels for various viral proteins.^{340}

### 7.3. Molecular Dynamics for Docking and Ligand Design

The interactions between proteins and substrates are critical to many biological systems and processes. Signal transduction, metabolic regulation, enzyme cooperativity, physiological response, and other processes are all dependent upon noncovalent binding. These processes may be investigated through modeling and simulation, particularly as the range of solved protein structures grows. Through MD, MC, and the various related methods described in this review, binding modes and the corresponding binding free energies may be estimated for protein-ligand^{29} and protein-protein^{345} complexes.

Ligand docking is the prediction of protein-ligand complexes; the use of MD is widespread in such ligand-docking studies.^{346}^{,}^{347} Most ligand-docking methods are MM-based; however, the present discussion is limited to methods that actually use MD rather than just using MM-based scoring functions.

When calculating free energy of binding estimates there is a necessary balance to be found between the accuracy or reliability of these estimates and the computational cost of the calculations. It is not always essential to determine highly accurate binding constants for productive studies in drug design.

At the upper end of the accuracy-versus-speed trade-off, one factor that becomes crucial is that the ligand-binding process can lead to conformational changes in the receptor protein itself. These changes could be necessary for the receptor to accommodate the bound ligand. While it is important to explore the conformational space available to the receptor (i.e., the protein) molecule, it is often difficult to predict or represent the plasticity of the binding site.^{348} This is particularly important when there may be multiple, allosterically connected, binding sites. While many approaches are available for considering such flexibility,^{349}^{–}^{351} the use of multiple protein structures in the docking process is a wise approach to the task. One simple approach to generating such multiple structures is through the use of MD^{352}^{,}^{353} with standard “static” docking to a series of individual snapshots from the simulation trajectory. It may be tempting to use MD as the docking search method itself, but during reasonable simulation times the system is likely to become stuck in local minima, with energy barriers of more than 1–2 kT unlikely to be overcome. With some modification of the potential function to smooth the energy surface and allow further exploration of receptor conformations, such simulations can become practical.^{354}^{,}^{355} By coupling different degrees of freedom of the system to different temperatures, the system can be assisted in escaping local minima through varying temperatures that specifically mediate the flexibility of the ligand.^{356}^{,}^{357} The concept of varying temperatures to enhance sampling for docking simulations is commonly applied in the guise of simulated annealing MD.^{358}

#### 7.3.1. Advanced Molecular Dynamics-Based Methods for Drug Discovery

In a study comparing different search algorithms, an MD-based method was shown to be the most efficient approach for large search spaces, producing the lowest (i.e., best) mean energies for the docked conformations.^{359}^{,}^{360}

Further docking methods based around MD have been proposed. The dynamic pharmacophore method^{361} and the relaxed complex method^{362}^{,}^{363} are both designed to take receptor flexibility into account in the analysis of ligand-receptor binding. These are described below.

SGMD, as described earlier, is probably also suited to the investigation of protein-ligand systems. It has been applied to a host-guest system,^{364} but no study of protein-ligand systems has been reported to date.

The main impediment to the use of MD in docking studies remains the computational cost of running suitably long simulations. Approaches using only short MD simulations have been shown to improve the performance of docking procedures versus methods using static structures, in certain cases.^{365}

The filling potential method^{366} is an MD-based approach for estimating free-energy surfaces for protein-ligand docking. This is a modified umbrella potential sampling method which enables the ligand molecule to drift out of local minima through a self-avoiding (via a tabu list) random walk consisting of an iterative cycle of local-minimum searches and transition-state searches.

Another interesting flexible docking method relies on calculation of the flexible degrees of freedom using MD simulations.^{367} This approach allows relaxation of the protein conformation in precalculated soft flexible degrees of freedom. These soft flexible modes are extracted as principal components of motion from a conventional MD simulation.

MD is also widely used in the refinement of docked conformations from the results of approximate, generally rigid-body, ligand docking. For example, a three-stage method has been presented.^{368} A grid-based method is used to sample the conformations of an unbound ligand in the first stage. Next, the lowest energy ligand conformers are rigidly docked into the binding site. The docked modes are refined in the third stage by molecular mechanics minimization, conformational scanning at the binding site, and a short period of MD-based simulated annealing. This procedure was applied to ligand-protein complexes with as many as 16 rotatable bonds in the ligand with final root-mean-square deviations ranging from 0.64 to 2.01 Å compared to the crystal structures.

Taking this refinement a stage further, a combined quantum mechanical/molecular mechanical (QM/MM) docking method has been described.^{369} In this method AutoDock^{370} is used to generate initial starting points for the docked structures, and semiempirical AM1 QM/MM optimization of the complex then gives an improved description of the binding mode and of the electronic properties of the ligand within the environment of a flexible protein, simulating the limited structural changes of the enzyme upon ligand binding. This method was able to reproduce induced-fit structural changes in cases where a simple optimization was adequate for reproducing the protein’s movement.

##### Multiple-Copy Simultaneous Search

Multiple-copy simultaneous search is a method for determining locations of favorable binding for functional groups to the surface of a protein.^{275}^{,}^{371} A few thousand copies of a functional group are randomly placed at the protein surface and then subjected to simultaneous energy minimization and/or quenched molecular dynamics. The resulting locations of the functional group yield a functionality map of the protein receptor site, which included consideration of its flexibility. A set of these functionality maps can be used for the analysis of protein-ligand interactions and rational drug design.^{280}
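The multiple-copy idea can be sketched as follows: many mutually non-interacting probe copies are minimized simultaneously in a fixed "receptor" field, and the copies condense onto the favorable sites, mapping out where binding is possible. The two-Gaussian-well field below is a toy stand-in for a real molecular mechanics energy.

```python
import numpy as np

def grad_energy(copies, pockets):
    """Gradient of E = -sum_c exp(-|p - c|^2 / 8) for every copy at once;
    the copies never interact with one another, only with the field."""
    g = np.zeros_like(copies)
    for c in pockets:
        d = copies - c
        w = np.exp(-np.sum(d * d, axis=1) / 8.0)
        g += (d / 4.0) * w[:, None]
    return g

pockets = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # favorable sites
rng = np.random.default_rng(0)
copies = rng.uniform(-2.0, 7.0, size=(500, 3)) * np.array([1.0, 0.2, 0.2])

for _ in range(2000):          # simultaneous, independent minimization
    copies -= 0.05 * grad_energy(copies, pockets)

# the copies condense onto the favorable sites; count each site's occupancy
occupancy = [int(np.sum(np.linalg.norm(copies - c, axis=1) < 1.0))
             for c in pockets]
print(occupancy)
```

The relative occupancies of the final clusters give a simple functionality map of the toy field, which is the kind of output the real method provides for a protein surface.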

##### Locally Enhanced Sampling

In LES a fragment of the system exists as several copies in the same simulation, as described above. The individual copies do not interact with each other but do interact with the environment. In the case that the fragment is a ligand, LES becomes a method for sampling ligand conformations under the influence of a protein.^{372}

##### Dynamic Pharmacophore Method

The dynamic pharmacophore method^{351}^{,}^{361}^{,}^{373} requires a set of instantaneous snapshots of the fluctuating receptor molecule. These snapshots are typically extracted from MD simulations, although structures that are consistent with NMR data or other sources might be used instead. Probes corresponding to fundamental functional groups (e.g., methyl, hydroxyl, phenyl) are docked to each snapshot with the aim of detecting consensus patterns for the whole ensemble of snapshots.

Generally, receptor-based pharmacophore models are developed using a single receptor structure. These pharmacophore models based on one receptor structure could fail to identify inhibitors that bind to structures that are somewhat different from the experimental or model structure but that are still readily accessible at physiological temperatures. The dynamic pharmacophore model was developed to address this issue.

For each snapshot from an MD simulation, a pharmacophore model was constructed by identifying favorable binding sites of chemical functional groups using the multiunit search for interacting conformers (MUSIC) procedure of the BOSS program.^{374} This identifies favorable binding sites of probe molecules by simultaneously energy-minimizing a large number of probe molecules, which do not interact with each other, in the potential field of the receptor molecule. Strong binding sites tend to cluster many probe molecules in well-defined orientations and locations. Thus, strong binding sites can be selected as those which consistently appear across many snapshots rather than in only a few. These sites form the important components of a pharmacophore model. This approach also uncovers useful binding sites that might not be readily recognized as such in the initial starting structure.
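The consensus step of the procedure can be illustrated with synthetic data: probe positions from each snapshot are binned onto a grid, and only cells occupied in most snapshots are kept as pharmacophore sites. The grid spacing, threshold, and "snapshot" data below are all invented for illustration.

```python
import numpy as np

def consensus_sites(snapshots, grid=1.0, min_frac=0.8):
    """Bin probe positions onto a grid; keep cells that are occupied in at
    least `min_frac` of the snapshots."""
    counts = {}
    for probes in snapshots:
        cells = {tuple(np.round(p / grid).astype(int)) for p in probes}
        for cell in cells:
            counts[cell] = counts.get(cell, 0) + 1
    keep = min_frac * len(snapshots)
    return [np.array(cell) * grid for cell, n in counts.items() if n >= keep]

rng = np.random.default_rng(1)
site = np.array([10.0, 5.0, 2.0])                # a persistent binding site
snapshots = []
for _ in range(20):
    persistent = site + rng.normal(0.0, 0.1, 3)  # recurs in every snapshot
    transient = rng.uniform(-20.0, 20.0, (5, 3)) # unreproducible hits
    snapshots.append(np.vstack([persistent, transient]))

sites = consensus_sites(snapshots)
print(sites)  # → a single site at roughly (10, 5, 2)
```

Sites that appear in only one or two snapshots, like the transient hits above, are discarded, which is precisely how the dynamic pharmacophore model filters out conformationally unreliable features.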

While this procedure increases the number of false positives,^{375} it produces pharmacophore models that perform better than any single conformation model for potent inhibitors of HIV-1 integrase.^{373}

##### Relaxed Complex Method

The relaxed complex method^{362}^{,}^{363} resembles the dynamic pharmacophore method but involves docking of whole ligand molecules to the initial set of receptor snapshots with subsequent rescoring of the most favorable structures within a rigorous statistical mechanical framework.^{148}

It is possible that ligands may bind to conformations that occur only rarely in the dynamics of the receptor and that strong binding often reflects multivalent attachment of the ligand to the receptor. Two successful experimental approaches that recognize this fact are *SAR by NMR*^{376} and the *tether method*.^{377} The relaxed complex method was inspired by these methods and aims to reliably consider the induced fit at the binding site.

The relaxed complex method is a three-step process. The first step is to generate a series of target conformations of the receptor. This task is typically performed by selecting representative snapshots from an apo-protein MD trajectory. Alternative methods for generating the receptor conformations are possible. Such methods include the replica exchange method or even just an ensemble of short time-scale MD simulations. During the second step small libraries of candidate ligands are docked into each representative receptor structure. In the original work^{362}^{,}^{363} the Autodock^{370} software was used for this docking process, although the overall method is independent of the choice of docking software. The final stage seeks to improve the scoring of the predicted binding configurations by use of a more rigorous, but computationally expensive, method for estimating the standard free energies of binding. This rescoring process has been demonstrated with MM/PBSA^{363} as implemented in the AMBER software with the electrostatic terms calculated using the APBS software,^{164} but again, the overall method is not dependent upon this choice.
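In miniature, the three-step workflow looks like the sketch below: a cheap docking-style score selects the best receptor snapshots for each ligand, and a more expensive stand-in function then rescores them. Both scoring functions and the "snapshot" pocket widths are toy placeholders for a real docking engine and an MM/PBSA-style rescoring.

```python
import numpy as np

def quick_score(ligand_size, pocket_width):
    """Cheap docking-style score: penalize ligand/pocket size mismatch."""
    return (ligand_size - pocket_width) ** 2

def expensive_rescore(ligand_size, pocket_width):
    """Stand-in for a rigorous (e.g., MM/PBSA-like) binding estimate."""
    return (ligand_size - pocket_width) ** 2 + 0.02 * ligand_size

# step 1: receptor "snapshots" (a fluctuating pocket width stands in for
# conformations drawn from an apo-protein MD trajectory)
pocket_widths = 3.0 + 0.5 * np.sin(np.linspace(0.0, 6.0, 25))
ligands = {"small": 2.0, "medium": 3.0, "large": 4.2}

results = {}
for name, size in ligands.items():
    # step 2: dock each ligand against every snapshot with the cheap score
    scores = [quick_score(size, w) for w in pocket_widths]
    best = np.argsort(scores)[:5]          # top snapshots for this ligand
    # step 3: rescore only the best poses with the expensive function
    results[name] = min(expensive_rescore(size, pocket_widths[i]) for i in best)

ranking = sorted(results, key=results.get)
print(ranking)  # → ['medium', 'small', 'large']
```

The point of the design is that the expensive rescoring is applied only to the small set of poses that survive the cheap ensemble docking.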

A double-ligand variation of the above procedure incorporates consideration of the fact that two ligands with relatively low binding affinities might be linked to form a single high-affinity ligand. Because the binding of the first ligand could introduce unfavorable interactions for the binding of the second ligand, the combination of the best-ranked ligands for respective binding sites does not necessarily produce the most favorable composite compound. Continuing from the previous single-ligand studies, the first ligand is treated as part of the receptor molecule and the docking simulations of the second ligand are repeated but limited to a region of space consistent with the allowable lengths of linkers. As before, the binding of the second ligand is subsequently recalculated using the more accurate approach.

Initial results presented from work applying the relaxed complex approach covered the binding of two ligands to the FK506 binding protein (FKBP) with conformations generated via a 2 ns MD simulation. This demonstrated that the ligand binding is sensitive to conformational fluctuations in the protein; the binding energies covered a range of 3–4 kcal mol^{−1}, which corresponds to a 100–1000-fold difference in binding affinities.^{362} The use of MM/PBSA free-energy evaluations allowed for correct prediction of binding modes compared to the crystallographic structures.^{363}

##### Lambda Dynamics Method

The *λ* dynamics method^{155}^{,}^{156} is another technique intended to speed up free-energy calculations. In the *λ* dynamics method, another multiple-copy method, multiple ligands are simultaneously located in the receptor binding site. However, the interaction potential of each ligand is reduced from its full strength. The fraction, *λ*_{i}^{2}, of the interaction potential for each ligand is determined dynamically during the simulation as an additional degree of freedom. Specifically, *λ*_{i} is treated as a particle with a fictitious mass. Because the interaction potential of each ligand is reduced, the barriers for conformational transitions are lowered. The reduced barriers allow each ligand to explore orientational and conformational space more readily. The ranking of the ligands can emerge rapidly during the simulation because *λ*_{i}^{2} can rapidly increase for the winners at the expense of the losers. Distinguishing the strong binders from the weaker binders can thus be much faster than performing many individual free-energy perturbation calculations with a single ligand each time. This method has been demonstrated to efficiently identify strong benzamidine inhibitors of trypsin.^{378}
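A two-ligand sketch of the idea follows, using the substitution λ1 = cos θ and λ2 = sin θ so that the normalization λ1² + λ2² = 1 holds automatically, and fixed toy interaction energies in place of full ligand coordinates. Real λ dynamics evolves the ligand atoms as well; only the competition between the coupling parameters is shown here.

```python
import numpy as np

def run_lambda_dynamics(e1, e2, steps=5000, dt=0.01, gamma=1.0, kT=0.1, seed=0):
    """Langevin dynamics on theta, with lambda1 = cos(theta) and
    lambda2 = sin(theta).  The coupled energy is
    U = lambda1^2 * e1 + lambda2^2 * e2, and theta carries a fictitious
    mass just as lambda does in the full method."""
    rng = np.random.default_rng(seed)
    theta, vel, mass = np.pi / 4.0, 0.0, 1.0   # start with equal weights
    weights = []
    for step in range(steps):
        force = -(e2 - e1) * np.sin(2.0 * theta)   # -dU/dtheta
        vel += dt * (force - gamma * vel) / mass   # damped dynamics
        vel += np.sqrt(2.0 * gamma * kT * dt) / mass * rng.normal()  # thermal noise
        theta += dt * vel
        if step >= steps - 1000:
            weights.append(np.cos(theta) ** 2)     # running weight of ligand 1
    return np.mean(weights)

# ligand 1 has the lower (more favorable) interaction energy, so its
# weight lambda1^2 grows toward 1 at the expense of ligand 2
w1 = run_lambda_dynamics(e1=-5.0, e2=-2.0)
print(w1)  # close to 1
```

The winner-take-most behavior seen here is what lets a single simulation rank several candidate ligands at once.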

#### 7.3.2. Practical Applications of Molecular Dynamics in Drug Discovery

There are, undoubtedly, many examples where computer simulation and molecular dynamics have played a demonstrated role in the discovery or development of therapeutic drugs.^{4} There are particular cases where MD provided an otherwise unavailable description of the flexibility of the binding site to aid in the development of drug candidates. It is hoped that MD will help in the discovery of practical HIV-1 integrase inhibitors, for instance, as it had in the earlier discovery of widely prescribed HIV-1 protease inhibitors.^{379} Concepts from HIV-1 integrase MD simulation have proved useful in developing promising novel antiviral compounds.^{380} MD simulations indicated sizable fluctuations of the catalytic site.^{381}^{–}^{383} MD studies predicted favorable binding of compounds that utilize a hitherto unknown additional binding trench adjacent to the catalytic site,^{384} as shown in Figure 2, and this has been validated by experimental studies.^{380} This has the potential to greatly reduce the likelihood of resistant strains developing.

## 8. Future Prospects and Challenges

### 8.1. Efficiency and Stability

Standard MD methods often fail to explore configurational space adequately for the accurate evaluation of thermodynamic and kinetic properties of proteins. This is partly because such systems typically have enthalpic and entropic barriers significantly higher than the thermal energy at physiologically relevant temperatures. When high free-energy barriers trap a system in a local region of configurational space over the time scale of a simulation, the system appears nonergodic:^{385} the time averages of observable quantities do not equal the corresponding ensemble averages. The issue is aggravated by the fact that the low-frequency motions of proteins typically correspond to the larger, and often more interesting, conformational changes. Such motions sometimes do not involve crossing a very high energy barrier but instead have a slow, diffusional character; in that case the problem is simply one of sampling for an inadequate length of time. Many enhanced sampling methods have been introduced in the literature, as described above, to mitigate this problem, but no perfect solution has been devised to date. Indeed, certain approaches are better suited to specific systems or observables than others. Future progress toward resolving this issue will be of great interest.
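The quasi-nonergodic behavior described above appears even in trivial models. In the sketch below (an assumed one-dimensional double well, not a protein model), a short, low-temperature Metropolis Monte Carlo run started in the left well never crosses the barrier, so its time average of *x* stays near -1 even though the ensemble average is 0 by symmetry.

```python
import math, random

# A symmetric double well V(x) = (x^2 - 1)^2 has minima at x = +1 and x = -1
# and a barrier of height 1 at x = 0. With kT far below the barrier, a short
# Metropolis run started at x = -1 stays trapped: its time average of x is
# close to -1, while the exact ensemble average is 0 by symmetry.

def V(x):
    return (x * x - 1.0) ** 2

random.seed(0)
kT, step = 0.05, 0.1      # kT much smaller than the barrier height of 1.0
x, total = -1.0, 0.0
n = 5000
for _ in range(n):
    trial = x + random.uniform(-step, step)
    if random.random() < math.exp(-(V(trial) - V(x)) / kT):
        x = trial
    total += x

time_avg = total / n      # close to -1, far from the ensemble average of 0
```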

Besides the development of improved sampling protocols, simply enhancing the efficiency of MD routines will increase its practical scope. For example, improvements to integrators might allow larger time steps to be used. Likewise, improved methods for long-range force evaluation, particularly in terms of computational parallelization, would lead to more efficient simulations. Each of these putative improvements is ripe for exploration.
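As an example of the integrator improvements mentioned above, the multiple-time-step idea behind schemes such as r-RESPA can be sketched in a few lines: cheap, rapidly varying forces are integrated with a small inner step, while expensive, slowly varying forces are evaluated only once per larger outer step. The one-dimensional forces below are toy stand-ins chosen only to show the loop structure, not forces from any real force field.

```python
# Sketch of a reversible multiple-time-step (r-RESPA-style) integrator.
# The "slow" force is applied as half-kicks around an inner velocity-Verlet
# loop that integrates the "fast" force with a smaller step.

def fast_force(x):    # stiff, cheap, bond-like term (toy)
    return -100.0 * x

def slow_force(x):    # soft, expensive, long-range-like term (toy)
    return -0.1 * x

def respa_step(x, v, dt, n_inner=5, mass=1.0):
    v += 0.5 * dt * slow_force(x) / mass          # slow half-kick
    h = dt / n_inner
    for _ in range(n_inner):                      # inner velocity-Verlet loop
        v += 0.5 * h * fast_force(x) / mass
        x += h * v
        v += 0.5 * h * fast_force(x) / mass
    v += 0.5 * dt * slow_force(x) / mass          # slow half-kick
    return x, v
```

Because the splitting is symplectic and time-reversible, total energy stays bounded over long runs even though the slow force is evaluated only once per outer step.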

Fairly fundamental incremental improvements to the underlying MD algorithms are still being made (e.g., a fast and readily parallelized pair list construction algorithm for nonbonded interactions was described recently^{386}).

### 8.2. Electrostatics

Generalized Born (GB) treatments of electrostatics have had the important advantage of lower computational cost relative to the more rigorous Poisson-Boltzmann (PB) treatments. With the recent development of Poisson-Boltzmann solvers fast enough to support protein simulations,^{164}^{,}^{387}^{,}^{388} it becomes desirable to choose PB methods whenever the approximations of GB might affect the conclusions drawn from a simulation. Substituting PB for GB in analogous methods requires no conceptual jump, but such methods cannot yet be implemented as efficiently as GB. The theory for determining forces, and hence MD trajectories, from the Poisson-Boltzmann formalism exists^{387}^{,}^{389}^{,}^{390} but can be prohibitively expensive to evaluate at present. Analytical methods for the higher-order derivatives required in such calculations are desired.
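For reference, a minimal sketch of the GB polarization energy in its common pairwise form (the f_GB interpolation of Still and co-workers). The effective Born radii are taken here as given inputs rather than computed, and the charges, radii, and dielectric constants in the example are toy values, not parameters from any force field.

```python
import math

# Pairwise generalized Born polarization energy:
#   dG_pol = -0.5 * (1/eps_in - 1/eps_out) * sum_{i,j} q_i q_j / f_GB(r_ij)
# with f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j))).
# The i == j terms reproduce the Born self-energies; the double sum over
# ordered pairs with the 0.5 prefactor recovers the pair screening energy.

def gb_energy(charges, radii, coords, eps_in=1.0, eps_out=80.0):
    """Polarization energy in units of charge^2/length (multiply by the
    Coulomb constant to convert to kcal/mol)."""
    pref = -0.5 * (1.0 / eps_in - 1.0 / eps_out)
    e = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = radii[i] * radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            e += pref * charges[i] * charges[j] / f_gb
    return e
```

For a single charge this reduces to the Born ion formula, since f_GB collapses to the Born radius when r = 0.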

### 8.3. Solvation and Solvent Models

Improvements to the speed and accuracy of solvent calculations will be particularly beneficial. Speed increases are valuable because solvent typically constitutes a major portion of a simulated system; accuracy improvements matter because the solvent often mediates important aspects of protein structure, dynamics, and function. New implicit solvent models that better replicate explicit solvent effects will help to advance the understanding of such aspects. Currently, most implicit models ignore specific interactions, solvation shells, and long-range order, although some hybrid methods seek to address this. In particular, the poor correlation between the apolar solvation forces exhibited in explicit solvent simulations and those in implicit solvent simulations needs to be addressed.^{97}^{,}^{99}

### 8.4. MD in Ligand Docking and Molecular Design Studies

While protein flexibility undoubtedly plays a critical role in molecular recognition, most drug design and modeling efforts disregard these effects because they are computationally expensive to include. With the rapid progress in algorithms and computational resources discussed in this review, it is becoming feasible to consider these effects in a wider range of drug discovery tasks. The more demanding but rigorous free-energy calculation methods can often be used in the later stages of lead optimization. The more rapid but approximate methods, those relying on single reference states, can quickly identify favorable and unfavorable features of a putative lead compound in molecular recognition. Identifying and classifying these features can suggest modifications of the compound to improve its binding affinity, and the features also aid construction of pharmacophore models for locating alternative lead compounds in chemical libraries or databases. Pharmacophore-based constraints can even assist in the design of improved libraries. Free-energy methods intermediate between the two extremes, for example, MM/PBSA and the semiempirical linear response approach, can be used to screen out less promising compounds suggested by the single-reference-state models before more rigorous free-energy calculations are performed. Improved implicit or hybrid implicit/explicit solvent models also have a role in facilitating rapid conformational sampling, allowing protein flexibility to be fully accounted for in the early stages of a rational drug design process.
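The semiempirical linear response (linear interaction energy, LIE) approach mentioned above reduces, in its simplest form, to a weighted difference of average ligand-surroundings interaction energies taken from bound-state and free-state simulations. A minimal sketch follows; the coefficients shown are the commonly cited linear-response values, but published parameterizations vary, and the energy averages in the example are made-up placeholders rather than results of any simulation.

```python
# Linear interaction energy (LIE) estimate of binding free energy:
#   dG_bind ~ alpha * d<V_vdw> + beta * d<V_elec>
# where d<V> is the change in the average ligand-surroundings interaction
# energy between the protein-bound and free (solvated) states.

def lie_binding_energy(vdw_bound, vdw_free, elec_bound, elec_free,
                       alpha=0.18, beta=0.5):
    """Estimate dG_bind in kcal/mol from MD ensemble averages (kcal/mol)."""
    return alpha * (vdw_bound - vdw_free) + beta * (elec_bound - elec_free)

# Hypothetical averages from bound- and free-state simulations of one ligand:
dg = lie_binding_energy(-35.0, -20.0, -60.0, -48.0)   # about -8.7 kcal/mol
```

Because only two conventional MD simulations per ligand are required, such estimates sit naturally between single-reference-state scoring and full free-energy perturbation in a screening pipeline.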

### 8.5. Benchmarks

Benchmarks are inherently subjective; it is impossible to objectively compare all MD algorithms or software on equal terms when they have widely differing capabilities and aims. However, there are certain common example simulations, or analysis tasks, that are amenable to benchmarking for both numerical accuracy and computational cost. The MD community would benefit greatly from a diverse set of well-conceived and publicly available benchmark tests based on these.

There are few direct comparisons of the various accelerated sampling techniques in the literature, for example. A standard set of benchmark tests would spare researchers the need to set up and run unfamiliar methods and software in order to compare them fairly against newly developed approaches. This would be of particular value when methods are developed, or optimized, for computer systems unavailable to the researchers who would otherwise perform the comparisons.

### 8.6. Computing Facilities

MD continues to benefit immensely from improvements in computer technology. As computers have become faster, it has become possible to handle larger molecules and to explore their dynamics over longer time scales. Moreover, the advent of Beowulf-style clusters has considerably increased the number of research groups able to undertake biomolecular simulations. Currently, a typical simulation might comprise around 10^{5}-10^{6} atoms, and a multiple-nanosecond simulation will require on the order of 10^{6}-10^{7} time steps. Such a simulation could be expected to take a couple of weeks on between 8 and 32 processors (though this is highly dependent on the particular protocols employed and on the efficiency of the simulation software). During this period it could generate gigabytes of data for subsequent analysis and visualization. Computer resources capable of handling and storing such large quantities of data are now widely available, but meaningful visualization is an increasingly challenging task. The tools and techniques developed for large-scale data-mining efforts will also be increasingly valuable in the study of MD trajectories.
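A back-of-the-envelope sketch of those numbers, assuming a 2 fs time step, a coordinate frame written every 1000 steps, and single-precision coordinates (all assumed values; real protocols vary):

```python
# Estimate step count and trajectory size for a simulation at the scale
# described above.

atoms = 10**5                              # lower end of the quoted system sizes
ns = 10                                    # simulation length in nanoseconds
timestep_fs = 2                            # assumed time step
steps = ns * 1_000_000 // timestep_fs      # 1 ns = 10^6 fs -> 5 x 10^6 steps
frames = steps // 1000                     # snapshots written to the trajectory
bytes_per_frame = atoms * 3 * 4            # x, y, z as 4-byte floats
total_gb = frames * bytes_per_frame / 1e9  # ~6 GB of coordinate data
```

The resulting several gigabytes per run is consistent with the data-handling burden discussed in the following section.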

### 8.7. BioSimGrid

As just indicated, a typical large-scale MD simulation might produce several gigabytes of raw data that must be processed when complete. An additional unresolved issue facing the developers and users of modern molecular dynamics technology is the archival, indexing, and dissemination of this output data. Building upon current efforts in Grid computing, BioSimGrid (http://www.biosimgrid.org/) may provide a solution, or insight toward future solutions. BioSimGrid is a collaborative project between several of the leading U.K. research groups in the field of molecular simulation.^{391}

The BioSimGrid project seeks to build, using Grid technology, a publicly accessible database of biomolecular simulation data. The data will include the raw simulation output, information about the generic properties of that output and the corresponding software configuration data, and information derived from analysis of the raw data. One valuable outcome of such a database might be integration of the simulation data with experimental and bioinformatic data, opening a wide range of data-mining possibilities.

## 9. Summary

MD simulations of proteins have provided many insights into the internal motions of these biomolecules, and simulation of such in silico models aids in the interpretation and reconciliation of experimental data.

With ongoing advances in both methodology and computational resources, molecular dynamics simulations are being extended to larger systems and longer time scales. This enables investigation of motions and conformational changes that have functional implications and yields information that is not available through any other means. Today’s results suggest that (subject to the continuing utilization of synergies between experiment and simulation) the applications of molecular dynamics will command an increasingly critical role in our understanding of biological systems.

Computer simulation with techniques such as molecular dynamics is thus making it possible to investigate the structural and functional characteristics of intriguing biochemical systems.

## 10. Acknowledgments

The authors acknowledge support from Accelrys, the Howard Hughes Medical Institute, the NIH, the NSF, the National Biomedical Computation Resource, the San Diego Supercomputer Center, and the NSF Center for Theoretical Biological Physics through funding and provision of resources.

## 11. Abbreviations

- AGBNP
- analytic generalized Born and nonpolar
- AFM
- atomic force microscopy
- BD
- Brownian dynamics
- BPTI
- bovine pancreatic trypsin inhibitor
- DEM
- diffusion equation method
- ED
- essential dynamics
- FEP
- free-energy perturbation
- FM
- fast multipole
- GB
- generalized Born model
- IMD
- interactive molecular dynamics
- LES
- locally enhanced sampling
- LIE
- linear interaction energy
- LINCS
- linear constraint solver
- MBO(N)D
- multiple-body O(N) dynamics
- MC
- Monte Carlo
- MD
- molecular dynamics
- MEHMC
- momentum-enhanced hybrid Monte Carlo method
- MM
- molecular mechanics
- MM/GBSA
- molecular mechanics/generalized Born-surface area method
- MM/PBSA
- molecular mechanics/Poisson Boltzmann-surface area method
- PCA
- principal component analysis
- PMEMD
- particle mesh Ewald molecular dynamics
- PMF
- potential of mean force
- QM/MM
- hybrid quantum mechanics/molecular mechanics
- REMD
- replica exchange molecular dynamics
- RESPA
- reversible reference system propagator algorithms
- SA
- simulated annealing
- SASA
- solvent-accessible surface area
- SGMD
- self-guided molecular dynamics
- SMD
- steered molecular dynamics
- TI
- thermodynamic integration

## Biographies

Stewart A. Adcock was until recently a postdoctoral researcher in J. Andrew McCammon’s research group at the University of California, San Diego (UCSD). He was born in Norwich, England, in 1975. He received his MChem degree in Chemistry from the University of Sheffield and his D.Phil. degree from Oxford University. His doctoral research concerned the structural prediction and simulation of transmembrane proteins, under the guidance of W. G. Richards. Between 2001 and 2004 he was a UCSD research scholar, during which time he developed protocols and software for protein structure modeling and prediction. Currently, he is a scientific software consultant with Tessella Support Services PLC and has interests in high-performance scientific software and computing platforms.

J. Andrew McCammon is an Investigator of the Howard Hughes Medical Institute and holds the Joseph E. Mayer Chair of Theoretical Chemistry at the University of California, San Diego (UCSD). He was born in Lafayette, Indiana, in 1947. He received his B.A. degree from Pomona College and his Ph.D. degree in Chemical Physics from Harvard University, where he worked with John Deutch on biological applications of statistical mechanics and hydrodynamics. In 1976–78 he was a research fellow at Harvard, where he developed the computer simulation approach to protein dynamics in collaboration with Martin Karplus. He joined the faculty of the University of Houston as Assistant Professor of Chemistry in 1978 and was appointed to the M. D. Anderson Chair at Houston in 1981. He moved to his current positions at UCSD in 1995. His industrial consulting started in 1987 with the establishment of the computer-aided drug discovery program of Agouron Pharmaceuticals. About 50 of his former students hold tenured or tenure-track positions at leading universities or research institutes. His awards include the George Herbert Hitchings Award for Innovative Methods in Drug Design from the Burroughs Wellcome Fund and the Smithsonian Institution’s Award for Breakthrough Computational Science, sponsored by Cray Research. New directions in his research group include theoretical efforts to understand how supramolecular and cellular activity emerge from the molecular level, particularly in neural synapses.

## References

*SIGMA documentation*, http://hekto.med.unc.edu:8080/HERMANS/software/SIGMA/.
