# Protein loop modeling by using fragment assembly and analytical loop closure

Julian Lee,^{1,†,*} Dongseon Lee,^{2,*} Hahnbeom Park,^{2} Evangelos A. Coutsias,^{3} and Chaok Seok^{2,†}

^{1}Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea

^{2}Department of Chemistry, Seoul National University, Seoul 151-747, Korea

^{3}Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA

^{†}Correspondence to: Chaok Seok, Department of Chemistry, Seoul National University, Seoul 151-747, Korea. Phone: +82-2-880-9197. Email: chaok@snu.ac.kr. Julian Lee, Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea. Phone: +82-2-820-0453. Email: jul@ssu.ac.kr

^{*}These authors contributed equally to this work.

## Abstract

Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this paper, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12.

**Keywords:** Loop modeling, Protein structure prediction, Fragment assembly method, Analytical loop closure, Loop ensemble

## I. INTRODUCTION

Prediction of the native structure of a protein from its amino acid sequence is one of the most important problems in protein science. However, modeling the native structure based solely on physico-chemical energy functions remains an unsolved problem [1–3]. Therefore, bioinformatics approaches that utilize information extracted from the database of known structures are widely used in practice. When experimental structures of homologous sequences are available, these structures can be used as templates [4, 5]. However, homologous proteins still have gaps or insertions in sequences, referred to as loops, whose structures are not conserved during evolution. Since the templates give no structural information on these regions, the loops have to be modeled *ab initio*.

Although the length of a loop region is generally much shorter than that of the whole protein chain, modeling a loop poses a challenge not present in the global protein structure prediction, in that the modeled loop structure has to be geometrically consistent with the rest of the protein structure. The condition of such consistency imposes constraints on the possible values of the loop dihedral angles, called the loop closure constraint, when the bond lengths and bond angles are kept close to canonical values. In many loop modeling methods developed so far, conformations are generated without explicit loop closure constraint. The gap in the chain is reduced afterwards either by screening out conformations with large gaps or by minimizing an energy term penalizing the gap [6–13].

On the other hand, conformations satisfying the loop closure constraint can be generated by using analytical loop closure [14–24]. Among these methods, the polynomial formulation developed in Ref. [20, 21] has the combined advantage of simplicity and generality, and can be applied to closing loops by rotation of torsion angles of non-consecutive residues. Iterative loop closure methods have also been developed [25–28]. An analytical loop closure approach is natural and efficient in that minimization of an arbitrary gap penalty is unnecessary since loops are restricted to be closed in a purely geometric way, and there is no small remaining chain break that needs to be ignored or reduced afterwards. In a sampling test on thirty loop targets of lengths ranging from four to twelve residues and an optimization test on an eight-residue loop, it was shown that loop sampling can be performed much more efficiently when analytical loop closure is employed [20]. Analytical loop closure was also combined with the Rosetta energy function [24] and was shown to predict loop structures more accurately than the previous Rosetta method that employs an iterative loop closure method [29].

The loop conformational space can be further reduced by using fragment assembly. Fragment assembly methods have been applied widely and successfully to protein structure prediction when structural templates are not available [13, 30–45]. In a fragment assembly method, local structures are limited to those of short fragments collected from a structure database, and the global structure is modeled by searching for the lowest free energy state among the states with such local structures.

In this work, we combine the two approaches, analytical loop closure and fragment assembly, for efficient protein loop sampling. Since an initial loop conformation generated by fragment assembly alone does not close the loop in general, backbone torsion angles are perturbed so that the analytical loop closure equation is satisfied. A torsional energy function can be minimized at the same time to confine the angle changes that accompany loop closure within a desired range. In order to perform this task efficiently, we develop an analytic formula for the gradient of a function of backbone dihedral angles in the space of closed loops.

Prediction results on eight short protein loops using a preliminary version of the current method were reported in Ref. [30], where a Monte Carlo search was used to find conformations minimizing a deviation from the original fragment angles. In this work, by developing a general formula for the analytic gradient of a function of dihedral angles that satisfy the loop closure constraint, such minimization can be performed much more efficiently.

We demonstrate the performance of our method by loop reconstruction tests on the 30 loops proposed by Canutescu and Dunbrack [27] and the 317 loops developed by Fiser et al. [46]. We found that the sampling efficiency is significantly improved compared to four different previous methods [7, 20, 27, 47]. By combining our sampling method with the statistical potential DFIRE [48, 49], the loop prediction accuracy could also be improved.

## II. METHODS

### A. Collection of Fragments and Structure Database

For each residue of a target loop, a seven-residue window centered on the residue is considered. For each window, two hundred fragment structures of length seven with similar sequence features are collected from a non-redundant structure database, as described below. The structure database was constructed by clustering an ASTRAL SCOP (version 1.63) set so that no two proteins in the database have more than 25% sequence identity with each other [50–52]. The resulting set consists of 4362 non-redundant protein chains and a total of 905684 residues. In order to perform a fair benchmark test, we did not use fragments obtained from proteins homologous to the target proteins in this work. Specifically, we removed the proteins with an E-value less than 0.01 in a BLAST search [53] with the whole sequence containing the target loop.

The sequence features to be compared for fragment selection are the sequence profiles obtained from a PSI-BLAST search. A sequence profile is a set of position-dependent mutation probabilities of the protein residues to other amino acids, obtained from local alignment of a given sequence with related sequences in a *sequence* database. The PSI-BLAST profile contains evolutionary information that cannot be obtained directly from the raw sequence, and it has been widely used for local structure prediction [51, 52, 54] as well as for global structure prediction by fragment assembly methods [13, 30, 32–45].

Since we consider windows of size seven, the sequence features for each window form a matrix of size 7 × 20. The distance between two sets of sequence features *A* and *B* is defined as

where $P_{ij}^{(A)}$ is a component of the sequence feature set *A*, and *w*_{i} is a weight parameter. Since the end regions of a fragment are often cut off during fragment assembly, as explained in the next subsection, the structure of the central region is more frequently used. We thus place higher weight on the central region by using the formula

Two hundred fragments of seven residues that have the shortest distances from the target loop sequence for each window are then collected for fragment assembly. It must be noted that for the terminal residues of the loop, the windows contain residues in the framework region. Therefore, the sequence features used for collecting the fragments contain information on the framework region as well.
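The selection step can be illustrated with a minimal Python sketch. The exact distance and weight formulas are not reproduced here, so the sketch assumes a weighted sum of absolute differences between profile entries and a hypothetical weighting that peaks at the window center; the names `center_weights`, `profile_distance`, and `select_fragments` are illustrative only.

```python
from typing import List, Sequence, Tuple

# A profile window: 7 rows (window positions) x 20 columns (amino acid types).
Profile = Sequence[Sequence[float]]


def center_weights(window: int = 7) -> List[float]:
    """Hypothetical weights that peak at the window center (an assumption)."""
    mid = window // 2
    return [1.0 + mid - abs(i - mid) for i in range(window)]


def profile_distance(a: Profile, b: Profile, weights: Sequence[float]) -> float:
    """Weighted sum of absolute differences between two windows (assumed form)."""
    return sum(
        w * sum(abs(x - y) for x, y in zip(row_a, row_b))
        for w, row_a, row_b in zip(weights, a, b)
    )


def select_fragments(target: Profile,
                     library: Sequence[Tuple[str, Profile]],
                     n_keep: int = 200) -> List[str]:
    """Return the identifiers of the n_keep library windows closest to the target."""
    weights = center_weights(len(target))
    ranked = sorted(library, key=lambda item: profile_distance(target, item[1], weights))
    return [name for name, _ in ranked[:n_keep]]
```

With any center-peaked weighting of this kind, a mismatch in the middle of the window costs more than the same mismatch at either end, which reflects the stated preference for the central region.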

### B. Fragment Assembly for the Loop Region

The fragments obtained as above are assembled to construct loop conformations. Conformations are generated by sequentially adding randomly chosen fragments starting from the N-terminal region of the loop. A new fragment is joined to the growing loop conformation only if they share at least one residue with close dihedral angles. Two sets of dihedral angles (*ϕ*_{1}, *ψ*_{1}) and (*ϕ*_{2}, *ψ*_{2}) are considered to be close if

The comparison of dihedral angles is made between the first *w* − 1 residues of the new fragment and the last *w* − 1 residues of the current partial loop conformation, where *w* is the fragment length. As mentioned above, *w* = 7 is used in this work. If we find a residue that satisfies the condition of Eq. (3), the new fragment is added starting from the next residue position, and the length of the partial loop is increased by 1. This assembly procedure is illustrated in Fig. 1. When there is no position satisfying the condition of Eq. (3), another fragment is selected from the fragment set. If no fragment can be added at the current step, the assembly procedure goes back to the previous loop conformation with one less residue, and another fragment is chosen randomly. For a loop of length *L*, conformations of length *L* + 8 are generated to utilize information in the fragments including framework residues. The structures outside the loop region are discarded in the subsequent analysis.
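A single joining step of this procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the closeness criterion is replaced by a placeholder tolerance `tol` on each angle, the overlap is scanned from its N-terminal end, and the random fragment choice and backtracking are left out. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Angles = Tuple[float, float]  # (phi, psi) of one residue, in degrees


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two periodic angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_close(p: Angles, q: Angles, tol: float = 30.0) -> bool:
    """Assumed closeness test; `tol` is a placeholder, not the criterion of Eq. (3)."""
    return angle_diff(p[0], q[0]) <= tol and angle_diff(p[1], q[1]) <= tol


def try_join(partial: List[Angles], fragment: List[Angles],
             tol: float = 30.0) -> Optional[List[Angles]]:
    """Extend `partial` by one residue using `fragment`, or return None if no junction fits."""
    w = len(fragment)
    overlap = min(w - 1, len(partial))
    # The last fragment residue is the new residue, so fragment index f sits on
    # loop index p = len(partial) - (w - 1) + f.
    for f in range(w - 1 - overlap, w - 1):
        p = len(partial) - (w - 1) + f
        if is_close(partial[p], fragment[f], tol):
            # Keep the loop up to the junction, then continue with the fragment.
            return partial[:p + 1] + fragment[f + 1:]
    return None
```

A caller would draw fragments at random and call `try_join` until it succeeds, stepping back to the shorter partial loop when every fragment fails. The number of residues actually taken from the fragment is `w - 1 - f`, so a junction found late in the overlap inserts only one or two residues.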

Fig. 1. A new fragment of length *w* = 7 (middle) is joined to the growing loop conformation (top), resulting in a loop conformation with one more residue (bottom). The fragment is joined starting from the position next ...

Since the joining of new fragments usually occurs in the middle of the fragments, only parts of the 7-residue-long fragments are used in the assembly, as illustrated in Fig. 1. The average length of the actually inserted part of fragments by the current method is 1.9 for the conformations generated for the Fiser loop set [46], as shown in Table I. One can see that the sizes of the inserted fragments do not depend much on the target loop length.

By joining the fragments only at close values of dihedral angles, we concentrate on more realistic structures that resemble those found in the structure database even near fragment junctions. In this way, the conformational search space is reduced significantly [39–45] compared to other fragment assembly methods that do not impose such a condition. As a result, the random sampling method tested in this study performs very well for the loop sizes considered here (up to 12 residues), as presented in the Results and Discussion section. A set of 5000 conformations was generated for each loop target in the Canutescu and Dunbrack set to compare with several previous methods. For the test on the Fiser set [46], 4000 initial conformations were generated, out of which a final set of 1000 conformations was selected after a screening procedure to compare with the RAPPER method [7]. There is no difficulty in increasing the number of sampled conformations because the whole procedure is very efficient, and the method may also be combined with more extensive search methods, especially for loops longer than those considered here.

### C. Analytical Loop Closure and Analytical Gradient

Conformations for a protein loop generated by the fragment assembly method alone do not satisfy the loop closure constraint in general. Therefore, the backbone torsion angles of the loop must be rotated so that the loop structures correctly fit into the rest of the protein structure. Since the minimum number of backbone torsion angles that has to be rotated for loop closure is six, we first perform an initial loop closure by randomly selecting three residues and computing their six backbone dihedral angles (three *ϕ* and three *ψ* angles) by solving the analytical loop closure equation [20, 21]. Among *N* loop dihedral angles, the *N* − 6 unperturbed ones are from the database fragments. However, the six dihedral angles perturbed for the closure may deviate from the initial fragment angles significantly or may even fall into Ramachandran-disallowed regions [55] in some cases, depending on the initial conformation. Such a problem can be alleviated by distributing the torsion angle changes from the initial six angles to all the available torsion angles, resulting in small changes for many angles instead of large changes for a few. The angle changes can be distributed by minimizing an energy function that guides the dihedral angles into desirable regions in the space of closed loop conformations.

The loop closure procedure adopted in this work is as follows. We first perform an initial loop closure by randomly selecting three residues and computing their six backbone dihedral angles (three *ϕ* and three *ψ* angles) by solving the analytical loop closure equation [20, 21]. As an optional next step, we adjust all the torsion angles simultaneously to minimize the following measure of deviation from Ramachandran-allowed regions,

$$F_{\mathrm{Rama}} = \sum_{i=1}^{n} f_{\mathrm{Rama}}(\phi_i, \psi_i), \tag{4}$$

under the loop-closure constraint, where *f*_{Rama}(*ϕ, ψ*) is an energy function for a residue that represents a Ramachandran plot, and *n* is the number of loop residues that are neither glycine nor proline. The function *f*_{Rama}(*ϕ, ψ*) is a sum of the Lennard-Jones and Coulomb interactions among the non-side chain atoms within a dipeptide, as developed in Ref. [56] with the CHARMM22 parameters [57]. The same form of *f*_{Rama} is used for the 18 amino acids that are neither glycine nor proline. The two-dimensional energy contour of the dipeptide energy function has been shown to reproduce the dihedral angle distribution in the structural database much better than the hard-sphere repulsion potential energy of Ramachandran et al. [55]. We allowed free changes for the glycine angles because of their flexibility and fixed the proline angles at the fragment angles because of the *ϕ* angle rigidity. Separate *f*_{Rama} functions for glycine, proline, and pre-proline residues such as in Ref. [58] may also be used if desired. Minimization of the function *F*_{Rama} drives the torsion angles of each residue into the allowed regions of the Ramachandran map.

Among the *N* variable torsion angles, $\{\tau_1, \tau_2, \tau_3, \cdots, \tau_{N-1}, \tau_N\}$, only *N* − 6 of them are independent under the loop closure constraint, and the minimization is performed in the (*N* − 6)-dimensional space of closed loops. For simplicity we choose $\{\tau_7, \tau_8, \cdots, \tau_N\}$ as the independent variables used for minimization, called the driver angles, and express the remaining 6 adjuster angles in terms of the driver angles. We then derive a formula for the gradient of *F*_{Rama} in the (*N* − 6)-dimensional space using the chain rule as follows.

Let us denote the axis of the $\tau_i$-rotation by a unit vector $\boldsymbol{\Gamma}_i$, and label the atom at the N-terminal end of the rotation axis by *i*, as depicted in Fig. 2. For any atom *j* located in the C-terminal direction of the chain relative to the atom *i*, the variation of its position $d\mathbf{R}_{ij}$ due to an infinitesimal change of $\tau_i$, $d\tau_i$, is given by

$$d\mathbf{R}_{ij} = d\tau_i\,(\boldsymbol{\Gamma}_i \times \mathbf{R}_{ij}), \tag{5}$$

where $\mathbf{R}_{ij}$ is the position of the atom *j* relative to *i*.

Fig. 2. The change in the position of atom *j*, $d\mathbf{R}_j$, when the torsion angle about the axis $\boldsymbol{\Gamma}_i$ changes by a small amount $d\tau_i$ is $d\mathbf{R}_j = d\tau_i\,(\boldsymbol{\Gamma}_i \times \mathbf{R}_{ij})$.
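The first-order rotation formula can be checked numerically. The following Python sketch compares the displacement predicted by the cross product with the exact displacement produced by a finite rotation (Rodrigues' formula) for a very small angle. The axis, the relative position, and the angle are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate(v: Vec, axis: Vec, angle: float) -> Vec:
    """Rotate v about the unit vector `axis` by `angle` radians (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return tuple(v[n] * c + k_cross_v[n] * s + axis[n] * k_dot_v * (1.0 - c)
                 for n in range(3))


def first_order_displacement(gamma: Vec, r_ij: Vec, dtau: float) -> Vec:
    """Linearized displacement of atom j: dtau * (Gamma x R_ij)."""
    g = cross(gamma, r_ij)
    return (dtau * g[0], dtau * g[1], dtau * g[2])


# Illustrative values: rotation axis along z, atom j at R_ij relative to atom i.
gamma = (0.0, 0.0, 1.0)
r_ij = (1.0, 0.5, 2.0)
dtau = 1e-6

exact = tuple(a - b for a, b in zip(rotate(r_ij, gamma, dtau), r_ij))
approx = first_order_displacement(gamma, r_ij, dtau)
error = max(abs(a - b) for a, b in zip(exact, approx))  # second order in dtau
```

The discrepancy `error` shrinks quadratically as `dtau` is reduced, which is what makes the cross-product expression usable as an exact derivative in the gradient formula.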

Since the Cartesian coordinates of atoms in the framework region, the region outside the loop, are fixed under the loop closure constraint, $d\mathbf{R}_j = \sum_i d\mathbf{R}_{ij} = 0$ for any atom *j* in the framework. In the current convention, the framework region at the N-terminal side of the loop is unaffected by the change of loop dihedral angles, and the C-terminal framework moves as a rigid body in the absence of the loop closure constraint. It is therefore necessary and sufficient to impose the following constraint for three distinct atoms *A*, *B*, and *C* in the C-terminal framework region:

$$\sum_{i=1}^{N} d\tau_i\,(\boldsymbol{\Gamma}_i \times \mathbf{R}_{ij}) = 0 \qquad (j = A, B, C). \tag{6}$$

Eq. (6) is a constraint on the possible changes of the torsion angles $d\tau_i$ under the loop closure constraint. Considering *i* (= 1, ···, *N*) as the column index and *j* (= *A, B, C*) together with the space index *μ* (= *x, y, z*) as the row index *α* (= 1, ···, 9), the matrix

$$M_{i\alpha} \equiv (\boldsymbol{\Gamma}_i \times \mathbf{R}_{ij})_{\mu} \tag{7}$$

is a 9 × *N* matrix, and Eq. (6) is a system of 9 equations for *N* variables. However, it has to be noted that

$$(\mathbf{R}_{X}-\mathbf{R}_{Y})\cdot\left[\boldsymbol{\Gamma}_i\times(\mathbf{R}_{iX}-\mathbf{R}_{iY})\right]=0, \qquad (X,Y)=(A,B),\ (B,C),\ (C,A), \tag{8}$$

which amounts to 3 identities among the 9 rows of $M_{i\alpha}$. These identities show that the distances between atoms *A*, *B*, and *C* are preserved,

$$d\,|\mathbf{R}_A-\mathbf{R}_B|^2 = d\,|\mathbf{R}_B-\mathbf{R}_C|^2 = d\,|\mathbf{R}_C-\mathbf{R}_A|^2 = 0, \tag{9}$$

when the $d\mathbf{R}_j$'s are given by the rotation Eq. (5). Due to the three identities in Eq. (8), any 3 rows of $M_{i\alpha}$ can be expressed as linear combinations of the remaining 6 rows, and Eq. (6) is reduced to a system of 6 independent equations for *N* variables. Therefore, Eq. (6) can be used to express the changes of the adjuster angles $d\tau_1, \cdots, d\tau_6$ for an arbitrary perturbation of the driver angles $d\tau_7, \cdots, d\tau_N$.

Expressing Eq. (6) in terms of the driver angle perturbations, we get

$$\sum_{k=1}^{6} M_{k\alpha}\,d\tau_k = -\sum_{i=7}^{N} M_{i\alpha}\,d\tau_i. \tag{10}$$

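The derivative of the adjuster angles with respect to the driver angles, $\partial\tau_k/\partial\tau_i$, can then be obtained from the following linear equation:

$$\sum_{k=1}^{6} M_{k\alpha}\,\frac{\partial\tau_k}{\partial\tau_i} = -M_{i\alpha} \qquad (i = 7, \cdots, N). \tag{11}$$

For simplicity, we use the N, C_{α}, and C′ atoms of the first residue in the C-terminal framework region as the three atoms *A*, *B*, and *C*, and solve Eq. (11) to obtain $\partial\tau_k/\partial\tau_i$ (*k* = 1, ···, 6; *i* = 7, ···, *N*) as a function of $\tau_i$ (*i* = 7, ···, *N*). The analytic form of the gradient for the function *F*_{Rama} in the space of closed loops is then

$$\frac{dF_{\mathrm{Rama}}}{d\tau_i} = \frac{\partial F_{\mathrm{Rama}}}{\partial\tau_i} + \sum_{k=1}^{6}\frac{\partial F_{\mathrm{Rama}}}{\partial\tau_k}\,\frac{\partial\tau_k}{\partial\tau_i} \qquad (i = 7, \cdots, N). \tag{12}$$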

Using the analytic gradient formula, the minimization was carried out with a gradient-based quasi-Newton optimization method, L-BFGS-B [59]. It has to be noted that any differentiable function of the backbone torsion angles can be used in place of *F*_{Rama} for minimization. For example, empirical functions for torsion angle maps may be used by deriving analytical versions of the functions using spline methods [60]. Other empirical energy functions for multipeptides [61] may also be useful.
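The chain-rule construction of the gradient can be sketched in Python. The sketch assumes that the linearized closure constraints have already been reduced to as many independent rows as there are adjuster angles, so that the adjuster block is square and invertible. It solves the resulting linear system for the derivatives of the adjuster angles with respect to each driver angle and adds the indirect contribution to the direct partial derivative. The function names and the small Gaussian-elimination solver are illustrative, not the implementation used in this work.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular adjuster block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def closed_loop_gradient(m_adj: Sequence[Sequence[float]],
                         m_drv: Sequence[Sequence[float]],
                         g_adj: Sequence[float],
                         g_drv: Sequence[float]) -> List[float]:
    """Total derivative of F with respect to each driver angle under the constraints.

    m_adj[a][k]: coefficient of adjuster angle k in independent constraint row a.
    m_drv[a][i]: coefficient of driver angle i in the same row.
    g_adj, g_drv: unconstrained partial derivatives of F.
    """
    gradient = []
    for i in range(len(g_drv)):
        rhs = [-row[i] for row in m_drv]
        dadj = solve(m_adj, rhs)  # derivatives of adjuster angles w.r.t. driver i
        gradient.append(g_drv[i] + sum(g * d for g, d in zip(g_adj, dadj)))
    return gradient
```

As a small check, take one adjuster *x*_{1} and one driver *x*_{2} tied by *x*_{1} + 2*x*_{2} = 3 and the function *x*_{1}^{2} + *x*_{2}^{2} at (1, 1): `closed_loop_gradient([[1.0]], [[2.0]], [2.0], [2.0])` gives the same value as differentiating (3 − 2*x*_{2})^{2} + *x*_{2}^{2} directly. A gradient of this kind can be passed to any gradient-based optimizer over the driver angles.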

### D. Screening of the Sampled Loop Conformations

After the loop closure, a screening procedure is performed for the Fiser loop set to compare with the results of RAPPER [7]. In the RAPPER program, each residue is sampled in the space of a fine-grained *ϕ*/*ψ* map obtained from the Ramachandran plot, and conformations that have steric clashes or that are impossible to satisfy loop closure are discarded during the loop building process [7]. Since we have not considered possible steric clashes for the loop conformations so far, we apply a screening step for a fairer comparison.

We employ the DFIRE potential [48], which has been derived from the distribution of inter-atomic distances found in a structure database and thus takes steric clashes into account effectively. Because the screening is performed before the side chain atoms are constructed, side chain atoms beyond the C_{β} atoms are not included in the score calculation. We call this score DFIRE-*β*.

The purpose of the screening is to eliminate unphysical conformations with large steric clashes so that the overall qualities of the ensembles are improved. However, it is inevitable that some native-like conformations are eliminated as well in the process. After randomly generating 4000 conformations by fragment assembly and loop closure (and optional Ramachandran energy minimization) for each loop target, we score the resulting conformations using the DFIRE-*β* score and select the 1000 conformations with the best scores for further processing.
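The screening step amounts to keeping the best-scoring quarter of the sampled conformations. A minimal Python sketch is given below, assuming that a lower score is better (as for an energy-like potential) and that the scoring function is supplied by the caller. The name `screen` and the stand-in score used in the usage lines are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

Conformation = TypeVar("Conformation")


def screen(conformations: Iterable[Conformation],
           score: Callable[[Conformation], float],
           n_keep: int = 1000) -> List[Conformation]:
    """Return the n_keep conformations with the lowest scores, best first."""
    return heapq.nsmallest(n_keep, conformations, key=score)


# Usage with integer stand-ins for 4000 conformations and an arbitrary score.
sampled = list(range(4000))
kept = screen(sampled, score=lambda c: float((c * 7919) % 4000))
```

Using `heapq.nsmallest` avoids sorting the full ensemble, although for a few thousand conformations a plain sort would work equally well.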

It is not possible for us to simply estimate the fraction of the discarded loops during sampling by RAPPER [7], but we found that if we select 1000 out of 4000 sampled conformations, more native-like conformations than the 1000 conformations sampled by RAPPER are obtained, as presented in the Results and Discussion section. In this four-fold sampling, only three quarters of the conformations are discarded, and this fraction is expected to be much smaller than the actual fraction of the conformations discarded in RAPPER due to steric clashes and impossibility of loop closure, which disfavors us in comparison.

### E. Construction of the Side Chains and Final Selection of the Model Structure

Although the new developments in this work mainly involve loop sampling, the current method by itself can be combined with pre-existing scoring functions to provide predicted loop structures. We present a model selection procedure here to illustrate such an application.

Since the fragments are collected from proteins whose sequences are different from that of the query, only backbone dihedral angles are obtained from the fragments. With backbone fixed, the optimal side chain conformations are constructed by selecting the side chain dihedral angles from Dunbrack’s backbone-dependent rotamer library [62]. Possible side chain conformations are finite combinations of rotamers, and the exact global minimum of a free energy function can be found using an efficient optimization algorithm based on graph theory [63], where the free energy function of SCWRL 3.0 is used, consisting of a one-body term proportional to the log of the rotamer probability and steric repulsions with backbone and other side chain atoms [64].

We found that steric clashes still remain after the side chain building for some model structures and tried force-field minimization to adjust backbone structures to accommodate the clashes. However, the model accuracy became worse (data not shown) probably because optimization of backbone results in the erasure of the database information contained in the initial backbone conformations.

The final model structures are selected from the conformations generated for the Fiser loop set using the DFIRE potential [48, 49] again, now in the all-atom form. DFIRE has been shown to be as successful in scoring loop decoy conformations as the force fields such as AMBER or OPLS with generalized Born solvation free energy [65, 66].

## III. RESULTS AND DISCUSSION

### A. Loop Conformation Sampling

The loop sampling method developed here that combines fragment assembly and analytical loop closure (FALC) was applied to the 30 loop targets of lengths 4, 8, and 12 residues proposed by Canutescu and Dunbrack [27]. The loop set, chosen from a set of nonredundant X-ray crystallographic structures, was used to test the performance of several loop sampling algorithms including the Cyclic Coordinate Descent (CCD) algorithm [27] and the self-organizing algorithm (SOS) [47]. CCD is a robust iterative loop closure algorithm. It can be coupled with Ramachandran probability maps in a Monte Carlo fashion, resulting in preferential sampling in the Ramachandran maps. SOS, a more recent loop construction method, iteratively superimposes small, rigid fragments (amide and C_{α}) and adjusts distances between atoms to satisfy loop closure and to consider steric conditions simultaneously. This method was reported to outperform the CCD method [47]. We previously tested, on the same loop set, a method that samples *ϕ*/*ψ* angles from Ramachandran maps using PLOP (Protein Local Optimization Program) [8] and closes the loop with analytical loop closure. This method, called CSJD in Ref. [20], is also included in the comparison.

For each of the loops in the test set, the minimum backbone RMSDs from the crystal structure among 5000 conformations sampled by the following five methods are compared in Table II: the Ramachandran map CCD (from Table 2 of Ref. [27]), the CSJD method (from Table 1 of Ref. [20]), the SOS algorithm (from Table 1 of Ref. [47]), and the current methods (FALC and FALCm). In Table II, ‘FALC’ refers to the results of the loop closure by rotating six random torsion angles after fragment assembly, and ‘FALCm’ to the results of the gradient minimization after FALC, as described in Methods. Both FALC and FALCm perform better than CCD, CSJD, and SOS. In particular, our algorithms perform better than SOS on all ten 8-residue loop targets and on 8 of the ten 12-residue loop targets. With the FALC method, the minimum RMSD improves from 1.19 Å to 0.78 Å and from 2.25 Å to 1.84 Å on average for the 8- and 12-residue loops, respectively. The FALCm method shows further improvements over the FALC method for the 8- and 12-residue loops, from 0.78 Å to 0.72 Å and from 1.84 Å to 1.81 Å, respectively.

The current method is different from the Ramachandran map CCD method in two respects. First, the local backbone torsion angles are sampled in the fragment space here, but they are sampled from Ramachandran probability maps in CCD. Ramachandran probability maps contain information specific to the amino acid types only, but fragments obtained from the PSI-BLAST profiles provide sequence-specific information. Second, the loop closure is performed analytically here, but an iterative method is used in CCD.

The differences between the current method and the SOS method are also two-fold. First, the small fragments (amide and C_{α}) employed in SOS are chosen to satisfy local geometric constraints, but the fragments used here contain additional information on the sequence-specific conformational preferences that encompass the length of several residues as well as local geometry. Second, loop closure is accomplished by iterative distance adjustments in SOS but by a single step of analytical loop closure here.

We argue that the excellent performance of the current loop sampling method originates from both fragment assembly and analytical loop closure. The fact that the CSJD method shows better performance than the Ramachandran CCD, as presented in Table II, implies that analytical loop closure has an advantage over CCD. In addition, the fact that the current methods (FALC and FALCm) give better results than the CSJD method and SOS demonstrates the effectiveness of the current fragment assembly method.

CCD has been used with Rosetta for loop modeling [29], and analytical loop closure was also combined with Rosetta for loop reconstruction tests [24] showing substantial improvement in performance over the CCD-based Rosetta protocol. These methods involve extensive sampling guided by the Rosetta energy function, but the current method is more focused on sampling independent of energy function by reducing the search space effectively. Since our sampling method is an order of magnitude faster than these methods (data not shown), it would be promising to employ the current method for global optimization of an accurate energy function in the future.

Application of the target function minimization in analytical loop closure, referred to as FALCm here, improves the loop sampling results for the 8- and 12-residue loops, as discussed above. The improvement is not dramatic probably because it is more probable to close the loop with resulting angles in Ramachandran-allowed regions when more native-like angles are assembled from fragments in the initial stage. The analytical gradient formula still has a wide potential area of applications, for example in guiding loop sampling with target functions that favor hydrogen bonding to specific functional groups in protein-ligand binding problems or that favor interactions with known or predicted hot spot residues in protein-protein binding problems.

### B. Loop Ensemble Generation with Screening

In order to test the feasibility of the application of the current method to loop ensemble generation, we carried out a loop reconstruction test on a subset of the loop target test set developed by Fiser *et al.* [46]. We consider only the targets used for the test in Ref. [7], where some of the targets in the original Fiser set were omitted due to poor qualities in the experimental structures. We also omit the shortest (and the easiest) loops of 2 and 3 residues. The resulting set consists of 317 targets, as shown in Table III.

The results of loop ensemble generation are displayed in Table III with the results of RAPPER reported in Table 3 of Ref. [7]. The minimum main chain RMSD and the average main chain RMSD of the 1000 conformations, obtained after screening 4000 conformations sampled by FALCm, were examined for each target, and their averages over the targets, *R*_{min} and *R*_{ave}, are displayed for each loop length. The main chain RMSD was calculated using the coordinates of the N, C_{α}, C′, and O atoms, following Ref. [7].
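The two ensemble statistics can be sketched in Python as follows. The sketch assumes that every conformation is already expressed in the coordinate frame of the native structure (the framework is fixed in loop reconstruction), so no additional superposition is applied; whether the published values involve a superposition is not stated here. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p[n] - q[n]) ** 2 for p, q in zip(a, b) for n in range(3))
    return math.sqrt(squared / len(a))


def ensemble_stats(native: Sequence[Coord],
                   ensemble: Sequence[Sequence[Coord]]) -> Tuple[float, float]:
    """Minimum and average RMSD of one target's ensemble from its native loop."""
    values = [rmsd(conf, native) for conf in ensemble]
    return min(values), sum(values) / len(values)


def average_over_targets(per_target: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the per-target (minimum, average) RMSD pairs to get (R_min, R_ave)."""
    n = len(per_target)
    r_min = sum(t[0] for t in per_target) / n
    r_ave = sum(t[1] for t in per_target) / n
    return r_min, r_ave
```

In this layout, `ensemble_stats` would be called once per loop target with the main chain atoms of the loop, and `average_over_targets` once per loop length.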

In the ensemble generation test by RAPPER, 1000 conformations were generated while screening out loops with possible steric clashes or with conformations too extended for loop closure during the loop building process. Although it is not possible for us to accurately estimate the fraction of the loops that were screened out in the RAPPER program, the fraction must be much larger than 3/4, considering the probabilities of typical loop closure and steric clash.

The performance of our method in generating native-like conformations is significantly better than that of RAPPER, both in *R*_{ave} and *R*_{min}, as can be seen from Table III. The improvements are larger for longer loops, especially in the minimum RMSD. It has to be noted that only a four-fold random sampling was performed for an illustrative comparison. The success of this simple application shows the potential of the current method for loop ensemble generation enriched with native-like conformations when combined with a more extensive conformational search and more extensive use of good scoring functions [8, 67].

### C. Loop Model Selection with DFIRE

From the ensemble of 1000 conformations generated for each target in the Fiser set, the final model was selected by scoring the conformations with the DFIRE potential after side chain optimization, as presented in Methods. As compared in Table IV, the accuracy of the loop model prediction is improved significantly compared to that reported in Ref. [49], in which the RAPPER ensembles are also scored with DFIRE. This result demonstrates that the better-quality conformational ensembles obtained in this study can lead to higher modeling accuracy.

## IV. CONCLUSION

In this paper, we presented a novel method for protein loop sampling, based on fragment assembly and analytical loop closure. Efficient sampling is possible because the search space is drastically reduced by sampling in the space of closed loops and in the space of fragments obtained by utilizing sequence-specific information.

We also developed an analytic formula for the gradient of a target function that depends on a set of torsion angles satisfying the loop closure constraint. This gradient can be used for efficient sampling of closed loops satisfying an additional requirement of optimizing a target function.
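The structure of such a gradient can be sketched with the implicit function theorem. The notation here is an assumption made for illustration and is not taken from the formula derived in this paper: the torsion angles are split into free angles $u$ and six dependent angles $v$, the six closure constraints are written as $G(u, v) = 0$, and $\partial G / \partial v$ is assumed to be invertible.

```latex
% Assumed notation (not the paper's): u = free torsion angles,
% v = six dependent torsion angles, G(u, v) = 0 = six closure constraints,
% F(u, v) = target function of the torsion angles.
\begin{align}
  \frac{\partial v}{\partial u}
    &= -\left(\frac{\partial G}{\partial v}\right)^{-1}
        \frac{\partial G}{\partial u}, \\
  \frac{dF}{du_i}
    &= \frac{\partial F}{\partial u_i}
     + \sum_{a=1}^{6} \frac{\partial F}{\partial v_a}\,
       \frac{\partial v_a}{\partial u_i}.
\end{align}
```

The first relation states that a change in the free angles must be compensated by the dependent angles so that the loop stays closed; the second is the chain rule applied along that closed-loop manifold.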

The efficiency of our sampling method was demonstrated by performing loop reconstruction tests on two sets of loop targets whose lengths range from 4 to 12. We found that our method generates native-like conformations significantly better than previous methods that rely on amino acid-specific information only and on less elaborate loop closure methods. It is remarkable that such a result can be obtained when no energy information, or only a minimal level of it, is used in the loop ensemble generation.

One notable feature of our method is that the sampling and scoring procedures are separated. Given the efficiency of our method in generating native-like conformations, it would also be useful for testing the discriminatory power of various scoring functions and for developing new ones.

Although the current tests were restricted to the loop reconstruction problem, where the framework region is fixed to the experimentally determined native structure, the efficiency of the current sampling method would allow its application to the more challenging task of modeling loops in the context of comparative modeling, where the framework region is given by templates and therefore contains inherent uncertainties.

## Acknowledgments

JL was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. R01-2008-000-11299-0). EAC acknowledges partial support from NIH-NIGMS Grants No. R01-GM081710 and R01-GM090205.

## References

*α*-helix. Protein Science. 2003;12:2508–2522.
