# An assessment of three dinucleotide parameters to predict DNA curvature by quantitative comparison with experimental data

^{1}Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012, India and

^{2}Institute of Bioinformatics and Applied Biotechnology, ITPL, Bangalore-560066, India

^{*}To whom correspondence should be addressed at Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560012, India. Tel: +91 80 394 2534; Fax: +91 80 360 0535; Email: mb@mbu.iisc.ernet.in

## Abstract

Curved DNA fragments are often found near functionally important sites such as promoters and origins of replication, and hence sequence-dependent DNA curvature prediction is of great utility in genomics and bioinformatics. In light of this, an assessment of three different dinucleotide step parameters (based on gel retardation as well as crystal structure data) is carried out. These parameters (BMHT, LB and CS) are evaluated quantitatively for their ability to predict correctly the experimental results of a large set of nucleic acid sequences containing A-tracts as well as GC-rich motifs. This set contained around 40 synthetic as well as natural sequences whose solution properties have been well characterized experimentally. All three models could account reasonably well for curvature in the various DNA sequences. The CS model, where dinucleotide parameters are calculated from crystal structure data, consistently shows slightly better correlation with experimental data. Our simple analysis also indicates that presently available trinucleotide parameters fail to predict curvature in some of the well-characterized sequences. The study shows that the dinucleotide parameters with some further refinement can be used to predict sequence-dependent curvature correctly in genomic sequences.

## INTRODUCTION

The role of intrinsic curvature of DNA in biologically important functions such as transcription, replication, recombination and chromatin organization has been well documented (1–10) over the last two decades. It has also been shown that elements of curved DNA are often found near functionally important sites such as promoters and origins of replication (1–10). Due to the functional importance of DNA curvature, biologists have long sought a biophysical explanation of this property (11–22). Initial observation of curvature in the sequences having periodic A-tract repeats led to various theories which tried to explain DNA curvature based on unique properties of oligo(A) tracts (23–24). However, observation of many more curved DNA fragments with sequence elements other than A-tracts necessitated formulation of a more general model. This led to a series of models based on sequence-dependent dinucleotide and trinucleotide parameters (25–37). These parameter sets were based on various experimental techniques such as X-ray crystallography, NMR, gel electrophoresis, cyclization kinetics, etc. In the light of the potential utility of DNA structure prediction in genomics and bioinformatics, it is essential to have an evaluation of various local parameters to arrive at an accurate estimate of the macroscopic curvature of nucleic acids. Though it has been thought that trinucleotide models (35–37) are necessary to explain DNA curvature in some of the sequences, dinucleotide models remain the simplest forms of nearest neighbor descriptions. Also, Liu and Beveridge (27) recently have re-examined the possibility that a dinucleotide parameter set, based on a refined method, can explain the curvature of A-tract as well as non-A-tract sequences. In view of this, we have carried out an assessment of different dinucleotide parameter sets.

Due to lack of enough data, some of the initially proposed dinucleotide parameters were based on a low number of observations. Some of the parameters are calculated using similar methods and give similar results. Considering this, we have selected three dinucleotide parameter sets, which are based on large data sets and different methodologies, i.e. BMHT (based on gel mobility data), CS (based on crystal structure data) and LB (based on gel mobility as well as crystal structure data).

The BMHT model, proposed by Bolshoy *et al*. (25), is based on DNA gel retardation values. They estimated 16 roll and tilt wedge angles based on independent gel mobility experiments performed on a training set of 54 different sequences. In this calculation, the twist values were fixed at those proposed earlier by Kabsch *et al*. (38). When choosing the starting values, previously determined roll and tilt values of the AA step (23) were taken into consideration. The resulting set of parameters has appreciable wedge angles in the case of dinucleotide steps AG/CT, CG/CG, GA/TC, GC/GC in addition to AA/TT. The BMHT model proves satisfactory for the sequences in the training set. However, due to its bias towards the AA/TT step, it fails to account for unusual cases such as the unconventional helical phasing sequences proposed by Dlakic and Harrington (39) and the GGGCCC type of motif (36).

The CS model is based on mean values of dinucleotide step parameters obtained from B-DNA crystal structure data (26,40). Unlike the BMHT parameter set, the CS model is not based on any fitting procedure and hence can be upgraded as more data on novel structures become available. The striking feature of the model is the absence of large roll and tilt for the AA/TT dinucleotide step. Another unique feature of this parameter set is that it includes two distinctly different arrangements of CA/TG steps corresponding to the B-I and B-II conformations. The roll and tilt values corresponding to the B-II conformation are used when the CA/TG step is either preceded by a pyrimidine or followed by a purine, e.g. in CCA, TCA, CAA or CAG and their self-complementary sequences involving TG. Though the CS parameter set underestimates curvature of A-tract sequences, it gives a reasonably good agreement with the experimentally observed trends. The CS model was also shown to predict the curvature correctly in several naturally occurring sequences (26,40).
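The context rule for choosing between the B-I and B-II values of a CA/TG step can be sketched as follows. This is only an illustration of the rule as stated above, not the authors' implementation. The function name `ca_tg_conformations` is hypothetical, and two points are assumptions: TG steps are tested with the same "preceded by a pyrimidine or followed by a purine" condition (which follows from reading the listed trinucleotides on the complementary strand, e.g. CCA pairs with TGG and CAA with TTG), and a step at the very end of a sequence is treated as having no neighbor on that side.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ca_tg_conformations(seq):
    """Return (index, step, 'B-I' or 'B-II') for every CA or TG step in seq."""
    seq = seq.upper()
    calls = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in ("CA", "TG"):
            continue
        # A missing neighbor at a sequence end counts as "no" (assumption).
        preceded_by_pyrimidine = i > 0 and seq[i - 1] in PYRIMIDINES
        followed_by_purine = i + 2 < len(seq) and seq[i + 2] in PURINES
        if preceded_by_pyrimidine or followed_by_purine:
            calls.append((i, step, "B-II"))
        else:
            calls.append((i, step, "B-I"))
    return calls


if __name__ == "__main__":
    for trimer in ("CCA", "TCA", "CAA", "CAG", "GCAT"):
        print(trimer, ca_tg_conformations(trimer))
```

The four trinucleotides named in the text come out as B-II, while a CA step flanked by a purine on the 5′ side and a pyrimidine on the 3′ side (GCAT in the demo) falls back to B-I.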

Recently, Liu and Beveridge (27) calculated dinucleotide step parameters based on a refined prediction method (the LB model). They carried out an optimization procedure using initial parameter values based on oligonucleotide crystal structure data. Further parameter optimization was carried out against the same experimental values used by Bolshoy *et al*. (25), using simulated annealing. However, in contrast to the BMHT model, all the 30 parameters (three base step parameters: roll, tilt and twist for the 10 unique base steps) were taken into consideration. During the optimization, the parameter variation is restricted within one standard deviation of crystal structure values. Due to the protocol followed, the LB model does not digress much from the crystal structure AA roll values, unlike the BMHT model. The LB model was tested (27) on a number of sequences including the GGGCCC motif (41) and helical phasing sequences (39) mentioned above. It is claimed (27) that it can account satisfactorily for the bending data for a wide variety of sequences.

The three different dinucleotide step parameters have been examined here for their ability to predict DNA structure correctly. The parameter sets are tested on a data set of nucleic acid fragments for which gel mobility data are available. Our data set is much larger and more diverse (encompassing A-tract-containing as well as non A-tract sequences) than previous studies. We find that the dinucleotide model, based on careful analysis of crystal structure data, consistently gives a more satisfactory correlation with experimental results when unrelated sequences are considered. The analysis, in addition, shows that trinucleotide parameters, based on DNase I sensitivity and nucleosomal positioning preferences, can also fail to predict curvature in some of the most extensively studied sequences.

## METHODS

### DNA structure prediction

The DNA duplex structures of the sequences for which gel mobility data are available (Table 1) were studied with the help of the in-house software NUCGEN (42). For this purpose, three sets of dinucleotide parameters, i.e. BMHT (25), CS (26) and LB (27), were used to generate 96mer or longer fragments of DNA. Since it is not clear which is the best among the different measures used for quantifying varying degrees of DNA curvature, the predicted structures were analyzed for all of the following measures.

### Curvature calculation using least square circle (LSC) fit

A circle can be fitted to the atomic coordinates of a DNA molecule curved in a plane. The radius of this circle can be taken as a measure of the curvature of the DNA fragment. The smaller the radius, the more curved is the DNA fragment and vice versa. We used a least square fitting method to trace a circle through the base pair centers that define the three-dimensional path of the DNA. For all the generated DNA molecules, the calculated values of the r.m.s.d., from the best-fit plane, were found to be very small, indicating that the molecules are near planar. To avoid erroneous fitting, we also examined the r.m.s.ds from the fitted line (rmsL) as well as the circle (rmsC). The trajectory with the smaller r.m.s.d. was assigned as being the correct one. The DNA curvature is measured in terms of curvature units (43), where one curvature unit corresponds to the mean DNA curvature in the crystallized nucleosome (radius = 42.8 Å).

Curvature (LSC) = 42.8 Å/radius of the best fit circle

Koo and Crothers (24) have reported that relative gel mobility (R_{L}) is a linear function of curvature squared. Hence, for carrying out a quantitative analysis, we have plotted (LSC)^{2} against R_{L}.
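A minimal sketch of the LSC measure is given below. It assumes the base pair centers have already been projected onto their best-fit plane, so each point is a 2D coordinate in Å. The fitting routine is a standard algebraic least-squares circle fit chosen here for brevity; the text does not specify which least-squares formulation was actually used, and the names `fit_circle` and `lsc_curvature` are hypothetical.

```python
import math

NUCLEOSOME_RADIUS = 42.8  # Å; radius that defines one curvature unit (from the text)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Algebraic least-squares circle through 2D points; returns (cx, cy, radius).

    Minimizes sum((x^2 + y^2 + D*x + E*y + F)^2) over D, E and F.
    """
    n = len(points)
    # Shift to the centroid first; this keeps the linear system well conditioned.
    mx = sum(x for x, y in points) / n
    my = sum(y for x, y in points) / n
    pts = [(x - mx, y - my) for x, y in points]

    sx = sum(x for x, y in pts)
    sy = sum(y for x, y in pts)
    sxx = sum(x * x for x, y in pts)
    syy = sum(y * y for x, y in pts)
    sxy = sum(x * y for x, y in pts)
    sz = sum(x * x + y * y for x, y in pts)
    sxz = sum(x * (x * x + y * y) for x, y in pts)
    syz = sum(y * (x * x + y * y) for x, y in pts)

    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    b = [-sxz, -syz, -sz]
    det = det3(a)
    if det == 0:
        raise ValueError("singular system: a line fit is more appropriate")

    # Cramer's rule: replace column k of the matrix with the right-hand side.
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in a]
        for i in range(3):
            mk[i][k] = b[i]
        coeffs.append(det3(mk) / det)
    d, e, f = coeffs
    cx, cy = -d / 2.0, -e / 2.0
    radius = math.sqrt(cx * cx + cy * cy - f)
    return cx + mx, cy + my, radius


def lsc_curvature(points):
    """Curvature in curvature units: 42.8 Å divided by the fitted radius."""
    return NUCLEOSOME_RADIUS / fit_circle(points)[2]
```

A nearly straight path yields a very large fitted radius and hence a curvature close to zero, which is the situation in which comparing the residuals of the line and circle fits becomes important.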

### Ratio of end to end distance to the contour length (d/l_{max})

Given the atomic coordinates of a DNA fragment, the end to end distance ‘d’ is the shortest distance between centers of the first and the last base pairs. The ratio of ‘d’ to total length traced by the DNA path, ‘l_{max}’ is used as one of the measures of curvature (26,39,44–46). For an ideally straight molecule, this ratio will be unity, while for a completely closed circle the ratio will be zero. Hence, the d/l_{max} ratio varies between limits of zero and one. The more curved the DNA, the smaller this ratio is.
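As a small illustration of this measure, the sketch below computes d/l_{max} from a list of base pair centers. The function name `end_to_end_ratio` is hypothetical, and the contour length is approximated here as the sum of straight segments between successive base pair centers, which is an assumption about how the path length is traced.

```python
import math


def end_to_end_ratio(centers):
    """d / l_max for a sequence of 3D base pair centers (coordinates in Å)."""
    # Contour length: sum of distances between successive centers.
    contour = sum(math.dist(p, q) for p, q in zip(centers, centers[1:]))
    # End-to-end distance: first center to last center.
    d = math.dist(centers[0], centers[-1])
    return d / contour


if __name__ == "__main__":
    straight = [(0.0, 0.0, 3.4 * i) for i in range(10)]
    closed = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0)]
    print(end_to_end_ratio(straight))  # close to 1 for a straight path
    print(end_to_end_ratio(closed))    # 0 for a closed path
```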

### Ratio of moments of inertia (I_{max}/I_{min})

To describe the global nature of DNA structure, an ellipsoid can be fitted to the DNA construct. The magnitudes of the moments of inertia of the ellipsoid can be used to define the shape of the molecule. One of the three components of moments of inertia for a linear rod-like molecule will be very large compared with the others, while all the three components will be of similar magnitude for a molecule with a small radius of curvature and folded into a globular shape. The ratio of the maximum component (I_{max}) to the minimum component (I_{min}) can serve as a measure of relative elongation of the object (26,39,44–48). The I_{max}/I_{min} ratio will be unity for a perfect circle. For a particular structure, the I_{max}/I_{min} ratio varies inversely with respect to the LSC.

### Cumulative and successive bending angles

Vectors connecting every nth base pair center can define the DNA trajectory. The successive bending angle can be defined as the angle between consecutive vectors, while the cumulative bending angle at a particular point is the angle between the vector at that point and the first vector. The magnitude of the bending angle and the position of maximum bending can provide important information about the varying geometry along the DNA molecule (27,36), particularly for non-repetitive DNA sequences. For bending angle calculations, vectors joining every 10th base pair, i.e. the nth and (n + 10)th base pair, have been used. The cumulative bending angle at the nth position corresponds to the angle made by the vector joining the nth and (n + 10)th base pair centers with the vector joining the first base pair center to the 10th.
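The two bending angle profiles can be sketched as below. This is a minimal illustration under stated assumptions: `bending_angles` is a hypothetical name, "consecutive vectors" is taken to mean the vectors starting at base pairs n and n + 1, and each angle is the plain angle between two vectors, so it is confined to the range 0–180°. How angles are accumulated beyond 180° is not described in the text and is not attempted here.

```python
import math


def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees (0 to 180)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def bending_angles(centers, step=10):
    """Return (successive, cumulative) bending angle lists in degrees.

    Vector n joins base pair center n to center n + step.
    """
    vectors = [
        tuple(b - a for a, b in zip(centers[i], centers[i + step]))
        for i in range(len(centers) - step)
    ]
    successive = [angle_deg(vectors[i], vectors[i + 1])
                  for i in range(len(vectors) - 1)]
    cumulative = [angle_deg(vectors[0], v) for v in vectors]
    return successive, cumulative
```

For a path bent uniformly by 2° per base pair, the successive angles are all 2° and the cumulative angle grows by 2° per position, which gives a quick sanity check on any implementation.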

For each generated DNA duplex model, all the above measures were calculated and were used for the comparison with experimental data. We validated the values obtained for each measure by visual inspection of three views of the DNA structures, after orienting the molecule in the mutually perpendicular *x*–*y*, *y*–*z* and *z*–*x* planes.

### Calculation of curvature using the ‘BEND’ program

The curvature values have also been calculated using the BEND program (36). In this analysis, two trinucleotide parameters based on nucleosomal positioning preferences (36,37) and DNase I sensitivity (35) are included in addition to the three dinucleotide parameter sets. The curvature, in this algorithm, is defined as the angle between averaged normal vectors 31 bp apart.

## RESULTS AND DISCUSSION

As mentioned above, considering the need of sequence-dependent DNA structure prediction, we have carried out a comparison between three different dinucleotide parameter sets. The evaluation of parameters is based on quantitative comparison as far as possible. We have carried out the comparison on a much larger set of sequences, which cover a wider spectrum of dinucleotide types, than used in previous studies.

### Comparison of predicted curvature with the experimental data derived from gel mobility experiments

*‘A-tract’ sequences*. Polymeric sequences with A-tracts have been studied extensively and their gel mobility has been correlated with the extent of bending. A systematic study of 21mer and 31mer repeat-containing phased A6 and A5 tracts interspersed with non-A-tracts has been carried out by Haran *et al*. (49). They reported a set of gel retardation values and the corresponding curvature values, obtained using the formula mentioned earlier (24). The DNA models of these sequences generated using BMHT and CS parameter sets have already been tested (26,49) by comparing the predicted curvature with the experimental one. However, since there was no such analysis using the LB model, for the sake of completeness, we have carried out the same kind of comparison for the 96 bp long DNA fragments generated using all the three parameter sets (Table 1: sequences 1–9). While correlating with experimental data, we have included I_{max}/I_{min} and d/l_{max} values in addition to LSC. From Figure 1, it is obvious that all the three parameter sets predict the curvature well. The CS model predicts the experimental data significantly better than the other two. Both the parameters based on crystal structure data (CS and LB models) predict considerably less curvature than the BMHT model, which is itself based on similar gel mobility data but still predicts higher curvature values than the experimental ones. The scaling factors, for BMHT, CS and LB parameters, required to bring the average theoretical curvature into agreement with experimental curvature values are 0.69, 1.33 and 1.33, respectively. The slight difference in BMHT scaling factor, from the value reported (0.63) by Haran *et al*. (49), may be due to the slightly different methods followed for DNA structure generation and for calculating curvature.

**Figure 1.** Relative gel mobility (R_{L}) and the predicted curvature calculated using dinucleotide parameters, i.e. BMHT (open circles), CS (stars) and LB (open squares), for repetitive ‘A-tract’ sequences ...
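One simple reading of the scaling factors quoted for the A-tract set is the ratio of the mean experimental curvature to the mean predicted curvature. The sketch below illustrates only that reading; whether the factors were actually obtained this way (rather than, say, by a least-squares fit) is an assumption, and both the function name and the numbers in the demo are hypothetical.

```python
from statistics import mean


def curvature_scaling_factor(experimental, predicted):
    """Factor that rescales the mean predicted curvature to the experimental mean."""
    return mean(experimental) / mean(predicted)


if __name__ == "__main__":
    # Hypothetical curvature values (curvature units), for illustration only.
    experimental = [0.30, 0.45, 0.60]
    predicted = [0.40, 0.70, 0.85]
    print(round(curvature_scaling_factor(experimental, predicted), 2))
```

A factor below 1 means that the model overestimates curvature on average, and a factor above 1 means that it underestimates it.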

*Unconventional helical phasing of DNA motifs*. Dlakic and Harrington (39) analyzed polynucleotides with identical composition but different combinations of AAAAA(A), GGGCCC and GAGAG motifs. The sequences were synthesized such that one of the above motifs is out of phase, while the other two motifs are in phase with the DNA helix (see Table 1: sequences 10–12) and are referred to as AA-out, GC-out and GA-out. According to the gel mobility data, GA-out shows maximum curvature, while AA-out seems to be straight and GC-out shows intermediate curvature. It was suggested that trinucleotide models are more successful than dinucleotide models in predicting the curvature of these sequences (39). In a separate study, the dinucleotide model proposed by Liu and Beveridge (27) was shown to predict the correct trend qualitatively. We analyzed the structures of these sequences using three different geometrical measures (LSC, d/l_{max} and I_{max}/I_{min} described above). These DNA structures were generated for 200 bp long fragments using the three dinucleotide parameter sets. As in previous studies, we could also carry out only a qualitative comparison with experimental trends, since the number of sequences in this set is too small for a quantitative analysis (Fig. 2). In agreement with the earlier report (39), DNA structures generated using the BMHT model do not show experimentally observed trends for these three sequences. The BMHT parameter set predicts all the three sequences, including AA-out, to be curved, and GC-out is predicted to be more bent than GA-out, both these predictions being contrary to the observed data. On the other hand, for all three sequences, the CS and LB models do slightly better, in that AA-out is predicted to be straight and the other two to be curved, in agreement with the observed experimental trend (Fig. 2). However, the LB model predicts very low curvature for the moderately bent GC-out and GA-out sequences. The CS model predicts the gel mobility trends marginally better, though the predicted curvature of GC-out is still less than the expected value.

**Figure 2.** Relative gel mobility (R_{L}) and the predicted curvature for repetitive DNA sequences having unconventional helical phasing motifs (Table 1: sequences 10–12), using dinucleotide ...

*Sequences with an A*_{4}*T*_{4} *motif are bent while those with T*_{4}*A*_{4} *motif are straight*. Hagerman (50,51), using a set of elegant experiments, showed that DNA polymers with different polarity of oligo(dA)–oligo(dT) tracts behave very differently with respect to their gel mobilities. The four representative sequences are listed in Table 1 (sequences 13–16). It can be seen (Table 2) that all three parameter sets are able to reproduce the observed trend qualitatively. It should be noted that the training set of sequences used for optimization of the LB and BMHT parameters includes the sequences CT_{4}A_{4}G and GT_{4}A_{4}C as well as a number of sequences similar to CA_{4}T_{4}G and GA_{4}T_{4}C. Hence it is not surprising that these two models predict the trend correctly. It is also clear that the BMHT model is far superior in predicting the difference between the highly curved A_{4}T_{4} and non-curved T_{4}A_{4} sequences than the other two models. It is surprising that in spite of using the same training set, the discrimination between these sequences is far less in the case of the LB model, which predicts very low curvature for the highly curved A_{4}T_{4} sequences. It is also quite interesting that the CS model, which is not biased in a similar manner, is in agreement with experimental data for three of the sequences. The only exception is GA_{4}T_{4}C, which is correctly predicted to be bent, but the curvature is slightly less than that expected from gel mobility data. It should be noted that all three models show some context dependence in the case of the highly curved A_{4}T_{4} motif, while experimentally they show nearly identical gel retardation.

**Table 2.** Comparison between the experimental and predicted values for the 96 bp repetitive sequences with an A_{4}T_{4} motif and T_{4}A_{4} motif (Table 1: sequences 13–16)

*GGGCCC type of motif*. It has been observed experimentally that a non-A-tract sequence, containing the GGGCCC motif, in phase with helix periodicity (Table 1: sequence 17), in the presence of Mg^{2+}, also shows curvature with magnitude similar to phased A-tracts (41,52,53). In a previous analysis by Goodsell and Dickerson (36), it was noted that only trinucleotide parameters could account for curvature due to the GGGCCC motif, while the dinucleotide models fail to do the same. This study included only the BMHT model out of the three dinucleotide models examined here. It has been reported recently that the structure of a polymer with the repeat sequence, d(GAGGGCCCTA)_{n}, as predicted by the LB model, shows a bending profile indicating significant curvature, though not as pronounced as for A-tracts (27). We therefore evaluated the three parameter sets in the context of this sequence. Since there are no related sequences to compare in this case, we visually examined the bending profile for the structures generated using the three parameters (Fig. 3). The largest cumulative bending angles corresponding to the BMHT, CS and LB parameters are 26°, 25° and 12°, respectively (as shown in Fig. 3B). The corresponding bending angle values for the structure of an A-tract sequence (Table 1: sequence 1), which has a similar gel mobility, are 162° (BMHT), 141° (CS) and 83° (LB). The other measures also indicate that the magnitude of this bending is very small compared with the A-tract sequence and cannot be considered very substantial. An inspection of the bending angle profile over a longer stretch also gives a clear indication of the serpentine nature of the structures (Fig. 3B). Though the bending angle value corresponding to the LB model is similar to that reported by Liu and Beveridge (27) and smaller than the values obtained for the BMHT and CS models, we tend to conclude that the magnitude of predicted bending is insignificant over a longer fragment.

**Figure 3.** (**A**) The stereo pairs of the DNA structures of a 96mer fragment with repetitive (GAGGGCCCTA)_{n} sequence (Table 1: sequence 17) generated using (a) BMHT parameters, (b) CS parameters and (c) LB parameters. All the structures are represented ...

*Mixed group of sequences studied in a single set of experiments*. Some of the sets of sequences discussed above have only a small number of related sequences and hence are insufficient for quantitative comparison with the trends shown by experimental data. Another problem is that each of them is designed to study a particular type of sequence motif and hence each set has very similar sequences. At the same time, different experimental conditions under which the various sets are studied do not allow pooling of all the experimental results together. This confines the comparison of the prediction with experimental data to a qualitative level. However, recently, gel retardation data have become available for a diverse set of sequences which have been studied under identical experimental conditions by Ussery *et al*. (54). This set of sequences has A-tract-containing as well as GC-rich sequences (Table 1: sequences 18–28). They also re-examined the two sequences, (GA_{4}T_{4}C)_{n} and (GT_{4}A_{4}C)_{n}, studied earlier by Hagerman (50,51), and reported slightly different retardation values. We have also retained them, so that we can carry out quantitative comparison between the experimentally determined gel retardation values (R_{L}) and the predicted curvature calculated as LSC, I_{max}/I_{min} and d/l_{max} (Fig. 4). The correlation coefficients, corresponding to the three dinucleotide parameters (given in the legend to Fig. 4), clearly show that the CS model follows the experimental trend better than the other two parameter sets. However, as in other cases, the range of curvature predicted by the CS and LB parameters is smaller than for the BMHT set.

**Figure 4.** Relative gel mobility (R_{L}) and the predicted curvature for the mixed set of sequences studied by Ussery *et al*. (54) (Table 1: sequences 18–28), using dinucleotide parameters, i.e. ...

*Other natural sequences from prokaryotes and eukaryotes*. Regions of curved DNA are known to be present in many naturally occurring DNA sequences from prokaryotes and eukaryotes (10,20,55). Since most of the sequences have phased A-tracts, it is not surprising that all the models generated using any of the three parameter sets predict them to be bent.

One of the most interesting and systematic studies on curvature of a natural sequence is that of the Alu156 promoter. The Alu156 promoter (Fig. 5A) isolated from the *Bacillus subtilis* bacteriophage SP82 has two curved elements, one immediately upstream and another downstream of the –35 position (56,57). The curvature in the upstream region is attributed to the presence of phased A-tracts, while the downstream region has two pairs of adenines at positions –5 and +5. Due to its curved nature, the wild-type Alu156 promoter migrates with retarded gel mobility, a rate consistent with a molecule 22% larger than its actual size. McAllister and Achberger (57) altered the rotational orientation of the upstream curved DNA using short DNA insertions of 6, 9, 11, 13, 15, 17, 19, 21, 25 and 29 bp between this curved DNA and the promoter region downstream of the –35 position. When these mutant promoters were analyzed, it was found that changes in rotational orientation of the two curved DNA fragments correlated with changes in promoter function. The most efficient mutant promoters contained insertions of 11 and 21 bp, and insertions of 15 and 25 bp resulted in the least efficient mutant promoters. Since 11 and 21 bp insertion mutants maintain the helical phase of DNA, it is expected that the two curved elements will be in the same rotational orientation as that in the wild type, albeit further apart. For the same reason, in 15 and 25 bp insertion mutants, the direction of the upstream curve element will be opposite with respect to wild type. All three dinucleotide models are successful in predicting curvature in the A-tract-rich upstream region (Fig. 5). The superposition of the mutant structures by aligning the 3′ ends (corresponding to the base pairs between positions –35 and +30) of mutants with the wild-type structure clearly shows that the three dinucleotide models can predict correct rotational orientation in the 11, 21, 15 and 25 bp insertion mutants (Fig. 5). This difference in orientation is most pronounced in the structures predicted by the CS model and least for the LB parameters. The CS model is also superior to other models in predicting curvature in the region downstream of the –35 position. Here it is important to stress that the sequence downstream of the –35 region, in contrast to the upstream region, does not have any A-tracts except for pairs of adenines at positions –5 and +5. The quantitative comparison of the ratio of apparent to actual length, based on the gel mobilities of these sequences, with I_{max}/I_{min} and d/l_{max} values (Fig. 6), also confirms that the CS model predicts the experimental trend very well. [These structures showed large deviations from the circle as well as line fit (see Methods), and hence the curvature calculated in terms of LSC is not considered.]

**Figure 5.** (**A**) The nucleotide sequence of the wild-type Alu156 promoter spanning the region –100 to +30. The site of insertions (–35 position) is indicated by a filled triangle. The runs of adenines in the upstream region and the two pairs ...

### Comparison with structures determined using NMR

An accurate determination of nucleic acid structures in solution is now possible due to the recent developments in NMR techniques based on residual dipolar couplings. The solution structure of an oligonucleotide containing an A-tract has been reported recently and helps explain the anomalously slow gel mobility of polymers of this sequence (19). The solution structure of the dodecamer d(GGCAAAAAACGG) (hereafter referred to as A_{6}) has an overall helix bend of 19°, while the structure of the control sequence d(GGCAAGAAACGG), with an AT to GC transition in the center of the A-tract, has been reported to be bent by 9° (henceforth the control sequence will be referred to as A_{2}G_{1}A_{3}). In addition, a polymer with the A_{6} sequence repeat showed anomalously retarded gel mobility, which was considerably reduced in the case of the A_{2}G_{1}A_{3} polymer. The NMR structures provided a structural basis for this change in mobility. Based on the NMR structures, a bend of ~190° was estimated for the d(A_{6}N_{4})_{10} sequence and a bend of 90° for the control sequence d(A_{2}G_{1}A_{3}N_{4})_{10} (19). These estimated bending angles were found to be in agreement with the gel mobility data.

We generated structures of these two sequences using the three different dinucleotide parameter sets. The bending angles for the d(A_{6}N_{4})_{10} sequence are 254° (BMHT), 145° (CS) and 44° (LB), while those for the d(A_{2}G_{1}A_{3}N_{4})_{10} sequence are 65° (BMHT), 87° (CS) and 19° (LB). The values of radius of curvature (in Å), calculated from the corresponding bending angles, are 103 (NMR), 77 (BMHT), 134 (CS) and 448 (LB) for the A_{6} sequence. The bending angle values clearly show that the BMHT and CS parameters correctly predict the relative curvature of the two sequences.
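The conversion from an overall bending angle to a radius of curvature can be illustrated by treating the fragment as a circular arc, so that radius = contour length / bend angle (in radians). The sketch below assumes a contour length of 100 bp × 3.4 Å for d(A_{6}N_{4})_{10}; the 3.4 Å rise per base pair is the canonical B-DNA value and is an assumption here, not a figure given in the text, and `radius_from_bend` is a hypothetical name.

```python
import math

RISE_PER_BP = 3.4  # Å; canonical B-DNA rise per base pair (assumed, not from the text)


def radius_from_bend(bend_deg, n_bp=100, rise=RISE_PER_BP):
    """Radius (Å) of a circular arc of length n_bp * rise bent through bend_deg."""
    return n_bp * rise / math.radians(bend_deg)


if __name__ == "__main__":
    # Bending angles for the A6 polymer as quoted in the text.
    for label, bend in (("NMR", 190), ("BMHT", 254), ("CS", 145), ("LB", 44)):
        print(f"{label}: {radius_from_bend(bend):.0f} Å")
```

Under these assumptions the computed radii fall within a few Å of the values quoted above, which suggests, without confirming, that a similar arc-length relation was used.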

### Relationship to previous comparative studies

To date, there have been only a few systematic attempts to compare the various parameter sets in terms of their predictive power using well-defined geometric measures. Tung *et al*. (45) were the first to compare the Tung–Harvey model (TH model) and an AA-wedge model. The TH model is based on conformational energy calculations (34), while the AA-wedge model attributes DNA curvature to the wedge angle in an AA dinucleotide (23). The analysis was carried out on the seven sequences described by Hagerman (50,51). The comparison was qualitative and based on different geometrical measures. Tung *et al*. concluded that both parameter sets require improvements to predict the DNA structure correctly. The TH model was better than the AA model in predicting the curvature as a function of fragment length. However, the TH model was shown to be qualitatively incorrect in predicting the observed difference in A_{N}T_{N} and T_{N}A_{N} loci, which are accounted for successfully by the AA model.

In a more detailed study, Goodsell and Dickerson (36) compared six bending models including AA-wedge, TH and BMHT dinucleotide models and a trinucleotide model (24,25,30–32,36,37) on four test sequences, comprising phased and out-of-phase A-tracts, a phased GGGCCC motif in the repeat sequence (GAGGGCCCTA)_{n} (Table 1: sequence 17) and the kinetoplast sequence from *Leishmania tarentolae*. This comparison was carried out based on the successive bending angle profile over the sequence. They observed that, even though all the models could correlate the extent of local bending with the macroscopic curvature behavior for the three sequences having phased A-tracts, the five dinucleotide models, including the BMHT model, fail to predict any curvature for the sequence with a GGGCCC motif. The only trinucleotide model included in this study was that based on nucleosome positioning preferences (36,37). Based on this analysis, it was concluded that only trinucleotide models could reproduce the curvature in this sequence. However, in a recent study (27), it was shown that dinucleotide parameters such as in the LB model could also predict curvature for this sequence. Though our analysis, discussed above, indicates that all the three models predict some amount of curvature for the (GAGGGCCCTA)_{n} sequence, we support the previous view that the magnitude of the predicted curvature cannot be considered to be very significant. At the same time, it may be pertinent to note that this sequence is only observed to be curved in the presence of divalent cations (41).

The contribution of the GGGCCC motif to DNA curvature was also studied by Dlakic and Harrington (39), with different flanking sequences, containing two other motifs, i.e. AAAAA(A) and GAGAG (Table 1: sequences 10–12). They also stated that trinucleotide models are apparently required for explaining the curvature in these DNA sequence motifs. The analysis included four dinucleotide parameter sets (including the BMHT model) (25,28,31–33) along with two trinucleotide sets (35–37) and three geometrical measures, i.e. the radius of gyration, EED (same as d/l_{max}) and RMI (similar to I_{max}/I_{min}). The evaluation of models was based on qualitative trends. Our analysis, as discussed in the section on ‘Unconventional helical phasing of DNA motifs’, confirms that the BMHT model does not predict the gel mobility trend correctly for these sequences. However, the CS and LB models, which were not included in the earlier analysis, both predict the correct trends. Thus, the CS and LB models, which did not predict significant curvature for the (GAGGGCCCTA)_{n} sequence, could, however, predict the correct trends for sequences which contain the GGGCCC motif in a different sequence context.

### Analysis using trinucleotide parameters

In this context, the sequence (GAGGGCCCTA)_{n} with a GGGCCC motif (Table 1: sequence 17) was re-examined using the trinucleotide parameters. We used the ‘BEND’ program (36), which allowed us to include the two sets of trinucleotide parameters, based on DNase I sensitivity (35) and nucleosomal positioning preferences (36,37), in addition to the three dinucleotide parameters (BMHT, CS and LB). It is interesting to see that while the trinucleotide model based on nucleosomal positioning preferences is able to predict the high curvature reported for this sequence correctly, the DNase I sensitivity-based trinucleotide parameters totally fail to account for it (Fig. 7A). The structures predicted using dinucleotide parameters, i.e. BMHT, CS and LB, have small curvature values that are intermediate between those predicted by the two trinucleotide parameters.

We also carried out a comparison of predicted curvature for the A_{4}T_{4} and T_{4}A_{4} sequences (50,51) using trinucleotide as well as dinucleotide models vis-à-vis the experimental data. As discussed above, the (CA_{4}T_{4}G)_{n} and (GA_{4}T_{4}C)_{n} sequences are bent, while the (CT_{4}A_{4}G)_{n} and (GT_{4}A_{4}C)_{n} sequences are straight (Table 1: sequences 13–16). The nucleosomal positioning preference model could not predict this trend (Fig. 7E). In contrast, the DNase I sensitivity-based trinucleotide parameters (Fig. 7F) and dinucleotide parameters (Fig. 7B, C and D) can correctly account for the experimentally observed trends.
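The difference between the bent and straight repeats can be made concrete by enumerating the overlapping dinucleotide and trinucleotide windows that any nearest-neighbor model would look up. The following Python sketch does only that: it assigns no parameter values, counts steps on one strand, and uses a hypothetical helper name and a hypothetical repeat count n.

```python
from collections import Counter


def overlapping_steps(seq, k):
    """Count all overlapping windows of length k along one strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


n = 3  # hypothetical number of repeats, for illustration only
bent = "CAAAATTTTG" * n      # (CA4T4G)n
straight = "CTTTTAAAAG" * n  # (CT4A4G)n

for name, seq in (("bent", bent), ("straight", straight)):
    di = overlapping_steps(seq, 2)
    tri = overlapping_steps(seq, 3)
    print(name, "AT:", di["AT"], "TA:", di["TA"],
          "AAT:", tri["AAT"], "TTA:", tri["TTA"])
```

Both repeats have the same base composition, but the first contains an AT step and no TA step, while the second contains a TA step and no AT step. A model can therefore separate the two only through the values it assigns to steps at and around this junction.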

In the case of the A-tract sequence set (Table 1: sequences 1–9), the average curvature predicted using both trinucleotide sets showed poor correlation with experimental R_{L}. The calculated correlation coefficients between the average predicted curvature and the experimental R_{L} values for the nucleosomal positioning preference model and the DNase I sensitivity-based model are 0.32 and 0.13, respectively. On the other hand, in the case of the mixed sequence set studied by Ussery *et al*. (54) (Table 1: sequences 18–28), the trinucleotide parameter sets showed quite good correlation with experimental data (correlation coefficient: 0.80 for the nucleosomal positioning preference model and 0.68 for the DNase I model).
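For readers who wish to reproduce this kind of comparison, a correlation coefficient between predicted curvature and experimental R_{L} can be computed as in the Python sketch below. It assumes the ordinary Pearson (linear) coefficient, which the text does not state explicitly, and the numbers in the example are hypothetical placeholders rather than values from Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder data (not taken from the paper):
predicted_curvature = [0.10, 0.25, 0.40, 0.55]
experimental_rl = [1.0, 1.3, 1.2, 1.9]

print(round(pearson_r(predicted_curvature, experimental_rl), 2))
```

A coefficient near 1 indicates that sequences predicted to be more curved also show larger gel retardation, while a value near 0, as reported above for the trinucleotide sets on the A-tract sequences, indicates little linear relationship.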

It has often been thought that trinucleotide parameters necessarily represent an improvement over the dinucleotide-based descriptions (36,39,58). However, our simple yet comprehensive analysis shows that the currently available trinucleotide parameters fail to predict curvature in some of the most extensively studied sequences. This may be because the gel mobility data essentially reflect static DNA curvature, while trinucleotide parameters, especially the DNase I sensitivity model, reflect the dynamic ability of DNA to adopt a bent conformation. Correct assessment of the ability to predict the DNA flexibility/bendability requires a comparison with experimental data other than those based on gel mobility. It should also be mentioned that the present trinucleotide parameters are based on relative scales and not on actual structural information such as roll, tilt and twist angles. A trinucleotide model based on such three-dimensional structural studies requires a much larger data set than presently available, and hence a rigorous assessment of trinucleotide parameters to predict curvature is not possible.

## CONCLUSION

Various dinucleotide models (BMHT, CS and LB) can account reasonably well for intrinsic bending observed in different sets of sequences, especially those containing A-tracts. Such bending has been reported experimentally from gel mobility studies and corresponds to static curvature. Interestingly, a recent NMR study confirms the occurrence of such bending for an A-tract-containing sequence. The dinucleotide models can also predict very well the curvature for other sets of sequences, such as those studied by Dlakic and Harrington (39) and by Ussery *et al*. (54), which have GC-rich motifs located in different sequence contexts. The only exception is the sequence with a phased GGGCCC motif, where all three dinucleotide models predict only slight curvature. More experimental studies corresponding to novel sequences are needed to refine these parameters and properly account for the curvature in GC-rich sequences. Our analysis also indicates that presently available flexibility-based trinucleotide parameters fail to predict curvature in some of the well-characterized sequences. At present, the paucity of structural data for arriving at reliable tri- or tetranucleotide parameters precludes a quantitative assessment of their ability to predict DNA curvature.

It is interesting that the CS model, which is not based on any kind of fitting procedures, shows better quantitative agreement with the experimentally observed trends than the other two dinucleotide parameter sets. It has often been stated that the crystal structures of oligonucleotides are subject to crystal packing forces and hence may not represent the solution structure. In this context, the success of the CS model, which is based on crystal structure data, in predicting the observed bending of a DNA oligomer in solution is quite noteworthy and indicates that the local geometry of dinucleotide steps may be quite similar in the two environments.

Sequence-dependent bending of DNA is observed at a number of biologically important sites such as promoters, so a correct understanding of the structure of DNA in genomic sequences is of immense importance. The success of the dinucleotide parameters, especially that of the CS model, in predicting curvature in natural sequences such as the Alu156 promoter and its insertion mutants is therefore quite significant. It suggests that the dinucleotide parameters, with some further refinement, are quite capable of correctly predicting DNA curvature in genomic sequences.

## ACKNOWLEDGEMENTS

We thank Professor N. V. Joshi for valuable discussions. A.K. is a recipient of a senior research fellowship from the University Grants Commission, Government of India. This work was supported by a grant from DBT, India.

## REFERENCES

*Escherichia coli*. Mol. Microbiol., 26, 261–275. [PubMed]

*Staphylococcus aureus* plasmid pT181. Gene, 134, 93–98. [PubMed]

_{2}. Biochemistry, 27, 3423–3432. [PubMed]

*Biological Structure and Dynamics*, *Proceedings of the Ninth Convention*. Adenine Press, New York, NY, Vol. 1, pp. 121–134.

*Unusual DNA Structures*. Springer-Verlag, Berlin, Germany, pp. 173–187.

*DNA Bending and Curvature*. Adenine Press, New York, NY, Vol. 3, pp. 265–278.

*Molecular Conformation and Biological Interactions*. Indian Academy of Sciences, Bangalore, India, pp. 347–362.

*Escherichia coli* and their functional significance. Mol. Gen. Genet., 226, 367–376. [PubMed]

*Bacillus subtilis*. J. Biol. Chem., 263, 11743–11749. [PubMed]

*Bacillus subtilis*. J. Biol. Chem., 264, 10451–10456. [PubMed]

**Oxford University Press**

An assessment of three dinucleotide parameters to predict DNA curvature by quantitative comparison with experimental data. Nucleic Acids Research. May 15, 2003; 31(10): 2647.
