# A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation

^{1}Department of Chemistry, University of Rochester, Box 0216, Rochester, NY 14627, USA

^{*}To whom correspondence should be addressed. Tel: 1 585 275 1734; Fax: 1 585 275 6007; Email: david_mathews@urmc.rochester.edu

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

## Abstract

A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.

## INTRODUCTION

RNA is more than a simple single-stranded sequence carrying genetic information as in the Central Dogma of Biology. For example, it can form tertiary structures that, like proteins, can be catalytic. Natural and engineered RNA molecules are widely used as functional tools in enzymatic catalysis and genetic control (1–5). One current problem is how to predict the structures of functional RNA sequences.

Secondary structure, the sum of canonical base pairs, is stronger (6–9) and forms faster (10) than tertiary structure. Therefore, secondary structure can largely be determined without knowledge of tertiary structure. Comparative sequence analysis is a standard technique for determining the secondary structure of homologous RNA sequences (11–13). When only a few sequences or even a single sequence are available, the secondary structure at 37°C can be predicted by free energy minimization algorithms (14–17) using a set of empirical free energy parameters determined from optical melting experiments (17–21). Each parameter depends only on the sequence identity of nucleotides in the motif and in adjacent base pairs, and the total free energy is the sum of nearest neighbor terms. The average sensitivity (the percentage of known base pairs that are correctly predicted) of free energy minimization prediction has been benchmarked as high as 72.8 ± 9.4% for a diverse database of sequences having fewer than 800 nt (17). Furthermore, experimentally determined constraints can improve the accuracy of prediction to as high as 84% (17,18) for sequences with <6% pseudoknotted (non-nested) base pairs (17). Partition function prediction of base pair probabilities can be used to identify base pairs in the predicted lowest free energy structure that are much more likely than average to be in the known secondary structure (22,23). For example, 91.0% of base pairs in the lowest free energy structure with pairing probability of 0.99 or higher are contained in the known structure, on average (22). The high accuracy of thermodynamic structure prediction (17) demonstrates that many RNA secondary structures can be determined from sequences, without knowledge of any tertiary contacts or protein interactions.

The current set of free energy nearest neighbor parameters for predicting the free energy of RNA secondary structure, however, is limited to application at 37°C. Many organisms, such as thermophiles and psychrophiles, live at temperatures far from 37°C and many experiments are conducted at other temperatures. The prediction of secondary structure of RNA at arbitrary temperature would expand our knowledge of structure and evolution in the RNA world. Moreover, it would facilitate studying and designing functional RNA molecules at temperatures other than 37°C. Enthalpy nearest neighbor parameters can be used in conjunction with available free energy nearest neighbor parameters for 37°C to determine free energy nearest neighbors at other temperatures. However, the most recent enthalpy parameters were derived in 1995 using a simple model (24). At that time, no themes had emerged for the sequence-dependent stability of internal loops. Subsequently, the nearest neighbor model for free energy change at 37°C was significantly improved (17) using experimental results. Therefore, we applied the principles of the current free energy nearest neighbor model (17,18) to determine a complete set of enthalpy nearest neighbor parameters using the available optical melting data.

## MATERIALS AND METHODS

### Database of experiments

The database of experimental data for derivation of enthalpy parameters is included in Supplementary Data. It includes 130 hairpin loops (25–31), 37 bulge loops (32,33), 337 internal loops (17,18,34–49) (99 of which are 2 × 2 internal loops), 74 multibranch loops (50,51) and 43 coaxial stacking models (52–55).

### Derivation and refinement of enthalpy parameters

#### Canonical base pairs

The enthalpies of Watson–Crick and GU base pairs were derived by Xia *et al.* (21) and Mathews *et al.* (18), respectively.

#### Dangling ends and terminal mismatches

Dangling ends are unpaired nucleotides adjacent to canonical pairs and their enthalpy parameters were compiled previously (24). Dangling ends on terminal GU pairs are treated similarly to dangling ends on terminal AU pairs. Terminal mismatches are non-canonical pairs at the ends of helices. The enthalpy parameters of terminal mismatches are taken from another compilation (20), with the exception of mismatches on terminal GU pairs, which were measured recently (30).

If a terminal mismatch has the potential to pair canonically, the values of A–C and C–A mismatches are used for purine–pyrimidine and pyrimidine–purine mismatches, respectively. This is important for partition function calculations, where all possible secondary structures are considered.

#### Hairpin loops

The experimental enthalpies of hairpin loop formation are calculated from published experimental data (25–31) with the following equation:

where $\Delta {H}_{\text{stem}-\text{loop}}^{\text{o}}$ is the experimental value for unfolding the hairpin loop with stem and $\Delta {H}_{\text{stem}}^{\text{o}}$ is calculated with the INN-HB parameters (18,21), without an intermolecular initiation term.

The hairpin loop enthalpy parameters are estimated by linear regression using the same model as free energy nearest neighbor parameters (17), except that the GG first mismatch bonus observed for free energy is not applied because it was not statistically significant for enthalpy. The GG stability bonus is therefore entropic in nature, consistent with the observation that GG mismatches are dynamic (56), i.e. they sample more than one microstate on short timescales.

The enthalpies of hairpin loops are estimated by the following equation:

where *n* is the number of unpaired nucleotides in the loop. Hairpins with fewer than 3 unpaired nucleotides are not allowed by the model. When *n* = 3, only the initiation term is considered without any bonus or penalty terms, except a penalty for hairpin loops with three Cs. When *n* > 3, the special GU closure bonus applies to GU closed hairpins in which a 5′ closing G is preceded by two G residues; and $\Delta {H}_{\text{bonus}}^{o}$ (UU or GA first mismatch but not AG) is applied to loops with first mismatches of UU or GA (G on the 5′ side and A on the 3′ side of the loop). The oligo-C penalty applies only to loops composed of all C residues and, if *n* > 3, is calculated with $\Delta {H}_{\text{penalty}}^{o}$ (oligo-C loops, *n* > 3) = A*n* + B. For hairpin loops composed entirely of 3 C residues, the $\Delta {H}_{\text{penalty}}^{o}$ (oligo-C loops, *n* = 3) is applied.

The enthalpy parameters are listed in Table 1 and the database of measured loop enthalpies is available as Supplementary Data. In the absence of data, for hairpin loops longer than 9 nt, the initiation enthalpy is approximated with the initiation term for a hairpin of 9 nt. This assumes that additional instability of hairpin loops as the loop lengthens derives from the entropy (57).

The measured free energies at 37°C of some special hairpin loops of 3, 4 or 6 unpaired nucleotides (30,31,34–36) are either more or less stable by 0.9 kcal/mol than the model predicts. The enthalpies for each of these sequences are listed in a separate lookup table (Table 2), to be consistent with the free energy parameters.

#### Bulge loops

RNA secondary structure is destabilized by bulge loops, which are an interruption of helical structure in one strand only (32,37,38). The initiation terms, $\Delta {H}_{\text{bulge initiation}}^{\circ}\left(n\right)$ for bulge loops of 1–3 nt, are listed in Table 3. They are the average values of experimental data (32,33), calculated using the following equation:

where the enthalpy of the duplex without bulge is the experimental value of the sequence of the duplex without the bulge or as calculated with INN-HB parameters (21) if the experimental values were not available. $\Delta {H}_{\text{bp stack}}^{\circ}$ is the stacking enthalpy of the base pairs in the duplex without the bulge that flank the bulge loop in the duplex with the bulge. Because the difference of initiation enthalpies between 2 and 3 nt bulges is almost zero, it is assumed that the increasing instability for longer bulges (*n* ≥ 4) comes from the entropy of the loop closure (39,57). Thus, the initiation enthalpy for bulges longer than 3 nt is approximated as the 3 nt bulge enthalpy.

Assuming that helical stacking is continuous between the adjacent helices for single bulges, but is interrupted by longer bulges (39,40), the enthalpies of bulge loops are calculated with the following equation:

The calculation of enthalpies for the adjacent helices includes the terminal AU/GU penalty (21) for AU/GU pairs adjacent to bulge loops that are longer than 1 nt. $\Delta {H}_{\text{bp stack}}^{\circ}$ is the canonical helix stacking enthalpy applied for the two closing base pairs as though the helix were not interrupted by the bulge loop.
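The bulge loop rules described above can be illustrated with a minimal Python sketch. It is one reading of the text, not the authors' implementation: the function name, the argument names and the numerical values in the usage lines are hypothetical placeholders (the measured initiation terms are in Table 3, which is not reproduced here), and the terminal AU/GU penalty is passed in as a single caller-supplied sum.

```python
def bulge_loop_enthalpy(n, initiation, bp_stack, terminal_penalty=0.0):
    """Sketch of a bulge loop enthalpy (kcal/mol) for a bulge of n nucleotides.

    initiation       -- dict mapping bulge sizes 1, 2 and 3 to initiation enthalpies
    bp_stack         -- stacking enthalpy of the two closing pairs, treated as adjacent
    terminal_penalty -- summed AU/GU penalties on the closing pairs (used when n > 1)
    """
    if n < 1:
        raise ValueError("a bulge loop has at least one unpaired nucleotide")
    # Bulges longer than 3 nt reuse the 3 nt initiation enthalpy.
    init = initiation[min(n, 3)]
    if n == 1:
        # Helical stacking is assumed to continue across a single bulge.
        return init + bp_stack
    # Longer bulges interrupt stacking; closing AU/GU pairs count as helix ends.
    return init + terminal_penalty


# Placeholder numbers for illustration only; these are not the Table 3 values.
example_initiation = {1: 10.0, 2: 7.0, 3: 7.0}
single_bulge = bulge_loop_enthalpy(1, example_initiation, bp_stack=-12.0)
long_bulge = bulge_loop_enthalpy(5, example_initiation, bp_stack=-12.0, terminal_penalty=3.0)
```

Keeping the initiation table and the stacking term as arguments leaves the sketch independent of any particular parameter file.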

#### Internal loops

Internal loop enthalpies were calculated from experimental data (17,18,34–49) using the following equation:

The range of measured enthalpies differs for internal loops of different size and symmetry; therefore, different enthalpy models are used to predict different loop types. The models are similar to those used to model free energies (17).

##### 1 × 1 Internal loops (single mismatches)

For single non-canonical pairs (1 × 1 internal loops), the loop enthalpies are approximated by the following equation:

where $\Delta {H}_{\text{loop initiation}}^{\circ}\left(n=2\right)$ is the enthalpy of initiation for a single non-canonical pair; $\Delta {H}_{\text{AU}/\text{GU}}^{\circ}$ is the penalty for each AU or GU closing base pair; $\Delta {H}_{\text{GG}}^{\circ}(1\times 1)$ is a bonus for a GG pair in a 1 × 1 loop; and $\Delta {H}_{{5}^{\prime}\text{RU}/{3}^{\prime}\text{YU}}^{\text{o}}\left(1\times 1\right)$ is a bonus for a 5′RU/3′YU stack in a 1 × 1 loop, where R is a purine and Y is a pyrimidine.

##### 2 × 2 Internal loops (tandem mismatches)

The 2 × 2 internal loops, also called tandem mismatches, interrupt helical RNA with two opposing unpaired nucleotides on each strand. Many of the sequence-symmetric 2 × 2 loops have been studied experimentally (17,18,34–49) and their enthalpies are assembled in a ‘periodic table’ (Table 4). Symmetric sequences that have not been measured are approximated by averaging the nearest columns that have been measured. For asymmetric 2 × 2 loops, the enthalpies are approximated using the following equation:

where $\Delta {H}_{\text{GG}}^{\text{o}}$ (12.5 ± 2.7 kcal/mol) is applied to loops with a GG pair adjacent to an AA or any non-canonical pair with a pyrimidine and Δ_{p} (2.4 ± 3.1 kcal/mol) is applied to loops with an AG or GA pair adjacent to a UC, CU or CC pair or with a UU pair adjacent to an AA pair.

##### Other internal loops

The enthalpies of other internal loops are approximated using the following equation:

where $\Delta {H}_{\text{loop}}^{\circ}\left(n\right)$ is the enthalpy of initiation for a loop of *n* nucleotides; $\Delta {H}_{\text{asym}}^{\circ}$ is a penalty for loops with unequal numbers of nucleotides on each side, with *n*_{1} and *n*_{2} the number of nucleotides on each side; $\Delta {H}_{\text{first non-canonical pairs}}^{\circ}$ is applied for each sequence-specific first mismatch (Table 5), but it is not applied to loops of the form 1 × (*n* − 1) with *n* > 3 (*n* is the total number of unpaired bases). Special first mismatch bonuses were determined for 2 × 3 and 1 × 2 internal loops with separate linear regressions.

Moreover, the free energy parameters (Table 6) were updated for internal loops based on recent experimental measurements. The free energy parameters were obtained using the method of Mathews *et al.* (17). The recent data include the 3 × 3 loops from Chen *et al.* (41), excluding the 3 × 3 loops with a middle GA pair. The middle GA pair is shown to enhance stability and this extra stability cannot be predicted by the nearest neighbor parameter set used in this work (41).

#### Coaxial stacking

Coaxial stacking, which is a favorable interaction of two helices stacked end to end, occurs in multibranch loops and exterior loops. Stability increments for coaxial stacking were measured with a structure composed of a short oligonucleotide bound to a single-stranded end of a stem–loop structure, creating a helical interface (52–55). The enthalpy of coaxial stacking is quantified as follows:

where Δ*H*°(correction) is the enthalpy for displacing a 3′ dangling end on the stem–loop structure if one is present.

When the helices have no intervening mismatches, the enthalpy bonus is approximated by the nearest neighbor parameter (21) of a base pair in a helix. The excess enthalpy above the helical stacking nearest neighbor from Xia *et al.* (21), $\Delta {H}_{\text{coaxial}}^{\circ}-\Delta {H}_{\text{NN}}^{\circ}$, for each measured interface was calculated. With flush interfaces, i.e. with no intervening mismatch, and no strand extensions beyond the interface, the average excess enthalpy is −1.53 ± 1.45 kcal/mol. For interfaces followed by strand extensions, the excess enthalpy is 1.82 ± 1.13 kcal/mol. As the excess enthalpy changes are not statistically significant, coaxial stacking of helices with no intervening nucleotides is modeled with the enthalpy parameter in a helix.

With one intervening nucleotide from each strand, two helices can stack with an intervening mismatch between them. There are two stack increments: one is the mismatch stack at the end of one helix with continuous backbone, which is equal to the mismatch stacking parameter on a helix, and the other is the mismatch stack with discontinuous backbone, which is modeled as sequence independent. The average enthalpy of sequence independent stacks is −8.46 ± 2.75 kcal/mol. In addition, an enthalpy bonus of −0.4 or −0.2 kcal/mol is applied to intervening mismatches composed of nucleotides that could form a Watson–Crick or a GU base pair, respectively. These bonuses are identical to the corresponding free energy increments and are empirically found to improve structure prediction accuracy.
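As an illustration of how the two stack increments and the bonus combine, the following Python sketch sums them for one interface. It is a sketch under stated assumptions rather than the RNAstructure implementation: the function and constant names are hypothetical, and the continuous-backbone mismatch stacking enthalpy is supplied by the caller because the mismatch table is not reproduced here.

```python
DISCONTINUOUS_STACK = -8.46  # kcal/mol, sequence-independent stack reported in the text
WATSON_CRICK_BONUS = -0.4    # kcal/mol, mismatch nucleotides that could form a Watson-Crick pair
GU_BONUS = -0.2              # kcal/mol, mismatch nucleotides that could form a GU pair

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_PAIRS = {("G", "U"), ("U", "G")}


def coaxial_mismatch_enthalpy(continuous_stack, nuc_a, nuc_b):
    """Coaxial stacking enthalpy (kcal/mol) for two helices separated by one mismatch.

    continuous_stack -- mismatch stacking enthalpy on the helix with continuous backbone
    nuc_a, nuc_b     -- the two intervening nucleotides, one from each strand
    """
    total = continuous_stack + DISCONTINUOUS_STACK
    pair = (nuc_a.upper(), nuc_b.upper())
    if pair in WATSON_CRICK:
        total += WATSON_CRICK_BONUS
    elif pair in GU_PAIRS:
        total += GU_BONUS
    return total


# The value -5.0 kcal/mol is a placeholder for a tabulated mismatch stack.
example = coaxial_mismatch_enthalpy(-5.0, "G", "U")
```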

#### Multibranch loops

The parameters are determined by linear regression of experimental data for three- and four-way multibranch loops (50,51). In a nearest neighbor model, the bimolecular enthalpy ($\Delta {H}_{\text{bimol}}^{\circ}$) for the formation of the duplex with a multibranch loop is given by the following equation:

where helix 1 and helix 2 are the intermolecular paired helices with Δ*H*° predicted from nearest neighbor parameters for Watson–Crick pairs (without including bimolecular initiation so that $\Delta {H}_{\text{bimol init}}^{\circ}$ appears only once). The $\Delta {H}_{\text{product mm}}^{\text{o}}$ is a term that accounts for the stacking enthalpy increment of the nucleotides that can stack on the hairpin loop stems to form a modified motif after the two strands have dissociated. This is the most favorable configuration with coaxial stacking of helices (in the case of four-way multibranch loops) or of the stacking of unpaired nucleotides. $\Delta {H}_{\text{bimol}}^{\circ}$ is the experimental value which is taken from ${T}_{\text{M}}^{-1}$ versus ln(*C*_{T}/4) plots. The multibranch loop enthalpy initiation term ($\Delta {H}_{\text{MBL init}}^{\circ}$) can be calculated from the above equation. The enthalpy of multibranch loops ($\Delta {H}_{\text{MBL}}^{\circ}$) is then modeled as the sum of two terms, initiation and stacking:

The stacking term is the favorable enthalpy of coaxial stacking, terminal mismatch and/or dangling end stacking. It is determined from the stacking conformation that gives the lowest free energy, as determined by free energy nearest neighbors (50). The initiation term can be approximated by the following equation:

where *a*, *b* and *c* are parameters determined from linear regression (Table 7) and *h* is the number of branching helices. $\Delta {H}_{\text{strain}}^{\circ}$ is a strain enthalpy that only applies to three-way multibranch loops with fewer than two unpaired nucleotides. The asym term is the average asymmetry that reflects the distribution of unpaired nucleotides, which is defined by the following equation:

The average asymmetry is limited to 2.0, following the rules suggested by free energy parameters. Asymmetry cannot be applied, however, by dynamic programming algorithms for secondary structure prediction (17,22). Thus, the *b* term was excluded for secondary structure prediction and the parameters *a* and *c* were optimized by finding the parameters that lead to the highest average sensitivity of secondary structure prediction by free energy minimization. The maximum sensitivity of prediction was found with *a* = 30.0 kcal/mol and *c* = −2.2 kcal/mol.
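For structure prediction, the initiation term therefore reduces to an offset plus a per-helix increment. The short Python sketch below shows that reduced form using the optimized values quoted above. The function name is hypothetical, and the sketch assumes that the strain term is left out; the text does not state how strain is handled in the prediction algorithm.

```python
A_OFFSET = 30.0    # kcal/mol, optimized value of a quoted in the text
C_PER_HELIX = -2.2  # kcal/mol per branching helix, optimized value of c quoted in the text


def multibranch_initiation_enthalpy(num_helices):
    """Initiation enthalpy (kcal/mol) with the asymmetry (b) term excluded.

    num_helices -- number of branching helices, h; a multibranch loop has h >= 3.
    """
    if num_helices < 3:
        raise ValueError("a multibranch loop has at least three branching helices")
    return A_OFFSET + C_PER_HELIX * num_helices


three_way = multibranch_initiation_enthalpy(3)  # 30.0 + 3 * (-2.2)
four_way = multibranch_initiation_enthalpy(4)   # 30.0 + 4 * (-2.2)
```

Because the expression depends only on the number of helices, it can be accumulated one branch at a time, which is what makes it compatible with dynamic programming.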

### Database of RNA secondary structures

The revised enthalpy nearest neighbor model was tested with RNA sequences with known secondary structure from organisms with known optimal growth temperature. The structures were taken from comparative analysis databases (42–49,58,59). Small (16S) subunit rRNA sequences are divided into domains as defined by Jaeger *et al.* (39). Large (23S) subunit rRNA sequences are divided into domains of fewer than 700 nt each (18). The optimal growth temperatures of different organisms were taken from the Prokaryotic Growth Temperature Database (http://pgtdb.csie.ncu.edu.tw/) and the DSMZ German Collection of Microorganisms and Cell Cultures website (http://www.dsmz.de/). Only the RNA sequences of mesophiles (organisms living at temperatures between 10 and 60°C, but with organisms living at 37°C excluded) were chosen to test the sensitivity and positive predictive value (PPV) of secondary structure prediction. Considering that posttranscriptional modification (60) and high pressure (61) in thermophiles and hyperthermophiles (organisms living above 60°C) would change the thermodynamics of secondary structure formation, sequences from these organisms were excluded. A list of the sequences and optimal growth temperatures used is available in Supplementary Data.

### Accuracy of secondary structure prediction

The accuracy of structure prediction is determined by the sum of the canonical base pairs correctly predicted. A base pair is considered correctly predicted even if it is shifted by 1 nt on one side. For example, a base pair between nucleotides *i* and *j* is considered to be correctly predicted if any of these base pairs is predicted: *i* to *j*, *i* to *j* − 1, *i* to *j* + 1, *i* − 1 to *j* or *i* + 1 to *j*. The predicted base pair between *i* − 1 and *j* + 1, however, is not considered to be correct. This scoring scheme reflects the uncertainty of exact base pair matches in comparative sequence analysis and the possibility for dynamics in base pairing. The values of sensitivity and PPV of this scoring scheme are ~2–3% higher than when determined with exact base pairing only, where only the *i* to *j* base pair is considered to be correct. The prediction accuracies are shown in Supplementary Tables 11 and 12. Each table includes accuracies determined when pairs can be shifted and when pairs must be an exact match.
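The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code used for the benchmarks: the function names are hypothetical, base pairs are assumed to be given as (*i*, *j*) index tuples, and the same 1 nt slip rule is assumed to apply when computing both sensitivity and PPV.

```python
def is_match(pair, reference_pairs):
    """True if pair, or pair shifted by 1 nt on one side only, is in reference_pairs."""
    i, j = pair
    candidates = [(i, j), (i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return any(candidate in reference_pairs for candidate in candidates)


def sensitivity_and_ppv(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) in percent, allowing the one-sided 1 nt slip."""
    known = set(known_pairs)
    predicted = set(predicted_pairs)
    # Sensitivity: fraction of known pairs that have a matching predicted pair.
    found = sum(1 for pair in known if is_match(pair, predicted))
    # PPV: fraction of predicted pairs that have a matching known pair.
    supported = sum(1 for pair in predicted if is_match(pair, known))
    sensitivity = 100.0 * found / len(known) if known else 0.0
    ppv = 100.0 * supported / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy structures for illustration only.
known = [(1, 20), (2, 19), (4, 16)]
predicted = [(1, 20), (2, 18), (5, 15)]
sens, ppv = sensitivity_and_ppv(known, predicted)
```

In the toy input, the predicted pair (2, 18) counts as correct for the known pair (2, 19), whereas (5, 15) does not match (4, 16) because both sides are shifted.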

### Availability of parameters

Machine-readable tables of the enthalpy parameters are available on the Mathews lab website (http://rna.urmc.rochester.edu/).

## RESULTS

### Nearest neighbor model parameters

In the nearest neighbor model of free energy (17,18), the parameters for Watson–Crick base pairs are well determined at 37°C with errors <10%, or ~0.1–0.2 kcal/mol (21). For other motifs such as loops and GU base pairs, individual nearest neighbor free energy increments are often determined with an error <0.5 kcal/mol (17,18). In order to extend the current model to predict free energy at temperatures other than 37°C, enthalpy parameters consistent with the current nearest neighbor model are required. The free energy at arbitrary temperature for each parameter is then

$$\Delta {G}^{\circ}(T)=\Delta {H}^{\circ}-T\Delta {S}^{\circ}=\Delta {H}^{\circ}-T\frac{\Delta {H}^{\circ}-\Delta {G}_{37}^{\circ}}{310.15}$$

where the enthalpy (Δ*H*°) and entropy (Δ*S*°) are assumed to be temperature independent. As described in Materials and Methods, parameters for enthalpy prediction, compatible with the free energy model, were determined using available experimental data from optical melting experiments.
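As a worked illustration of this extrapolation, the Python sketch below converts a 37°C free energy increment and its enthalpy into a free energy at another temperature. The function name and the numerical inputs are hypothetical; the only physics assumed is that Δ*H*° and Δ*S*° are temperature independent, so that Δ*S*° can be recovered from the 37°C parameters.

```python
T37_KELVIN = 310.15  # 37 °C expressed in kelvin


def free_energy_at_temperature(delta_h, delta_g37, temp_celsius):
    """Extrapolate a nearest neighbor free energy increment (kcal/mol) to temp_celsius.

    delta_h   -- enthalpy increment in kcal/mol
    delta_g37 -- free energy increment at 37 °C in kcal/mol
    """
    temp_kelvin = temp_celsius + 273.15
    # Entropy recovered from the 37 °C parameters, in kcal/(mol K).
    delta_s = (delta_h - delta_g37) / T37_KELVIN
    return delta_h - temp_kelvin * delta_s


# Placeholder increment: dH = -10.0 kcal/mol and dG37 = -2.0 kcal/mol.
at_10 = free_energy_at_temperature(-10.0, -2.0, 10.0)
at_37 = free_energy_at_temperature(-10.0, -2.0, 37.0)
at_60 = free_energy_at_temperature(-10.0, -2.0, 60.0)
```

For a motif with favorable (negative) enthalpy and entropy changes, the extrapolated increment becomes less favorable as the temperature rises, and an error in Δ*H*° propagates in proportion to the distance from 37°C.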

Experimental studies consistently demonstrate that enthalpy and entropy measurements have considerably larger percent error than free energy measurements. Free energy at 37°C is determined with greater precision because of correlation between errors in enthalpy and entropy (21). The larger experimental errors in enthalpy result in larger percent errors for enthalpy nearest neighbor parameters than free energy parameters. The enthalpy of RNA secondary structure is known to be a function of temperature. A linear model for heat capacity change predicts the following:

$$\Delta {H}^{\circ}(T)=\Delta {H}^{\circ}({T}_{0})+\Delta {C}_{\text{p}}^{\circ}(T-{T}_{0})$$

$$\Delta {S}^{\circ}(T)=\Delta {S}^{\circ}({T}_{0})+\Delta {C}_{\text{p}}^{\circ}\ln (T/{T}_{0})$$

where $\Delta {C}_{\text{p}}^{\circ}$ is a constant heat capacity change and *T*_{0} is a chosen reference temperature. It is hypothesized that the heat capacity change arises from the extent of stacking increasing with decreasing temperature. Thus, $\Delta {C}_{\text{p}}^{\circ}$ is negative because single strands are more organized at low rather than high temperature (62–67). The $\Delta {C}_{\text{p}}^{\circ}$ can be estimated by linear fits of enthalpy and entropy changes as a function of melting temperature (50,51,62) or determined by isothermal titration calorimetry at multiple temperatures (68,69). However, the effects of heat capacity change on enthalpy and entropy are antagonistic in terms of free energy change:

$$\Delta {G}^{\circ}(T)=\Delta {H}^{\circ}({T}_{0})-T\Delta {S}^{\circ}({T}_{0})+\Delta {C}_{\text{p}}^{\circ}\left[(T-{T}_{0})-T\ln (T/{T}_{0})\right]$$

Therefore, for certain Δ*T* (Δ*T* = *T* − *T*_{0}), $\Delta {C}_{\text{p}}^{\circ}$ can be neglected because the effects are compensated in terms of free energy. To calculate the compensation for a set of RNA duplexes (62), the free energy, Δ*G*°, was derived directly from Equation 4 assuming that the entropy and enthalpy were independent of temperature. Then the temperature-dependent free energy, $\Delta {G}_{\text{T}}^{\circ}$, was calculated with the measured non-zero $\Delta {C}_{\text{p}}^{\circ}$ from Equations 2–4. The free energy difference, ΔΔ*G*° = $\Delta {G}_{\text{T}}^{\circ}$ − Δ*G*°, increases with the deviation of temperature from *T*_{0} (37°C) (Figure 1). The exact ΔΔ*G*° for each duplex is shown in Table 8 for different temperatures. The experimental error in individual loop free energy nearest neighbor parameters at 37°C is as large as 0.5 kcal/mol (17), which corresponds to roughly a factor of 2 in equilibrium constant. Thus, the small ΔΔ*G*° for helices suggests that the approximation of $\Delta {C}_{\text{p}}^{\circ}$ = 0 is reasonable for predictions from ~10 to 60°C. Therefore, the enthalpy parameters derived here assume $\Delta {C}_{\text{p}}^{\circ}$ = 0 and are most accurate at predicting free energy change close to 37°C.
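The size of the neglected term can be examined with a short Python sketch. It assumes the standard constant heat capacity expressions, Δ*H*°(*T*) = Δ*H*°(*T*_{0}) + ΔC_p(*T* − *T*_{0}) and Δ*S*°(*T*) = Δ*S*°(*T*_{0}) + ΔC_p ln(*T*/*T*_{0}), from which the difference between the two free energies depends only on ΔC_p, *T* and *T*_{0}. The function name and the heat capacity value in the usage lines are hypothetical placeholders, not measured values.

```python
import math


def delta_delta_g(delta_cp, temp_celsius, ref_celsius=37.0):
    """Free energy difference between the constant-dCp model and the dCp = 0 model.

    delta_cp is in kcal/(mol K); the result is in kcal/mol.
    """
    t = temp_celsius + 273.15
    t0 = ref_celsius + 273.15
    # The enthalpy term dCp*(T - T0) and the entropy term -T*dCp*ln(T/T0) largely cancel.
    return delta_cp * ((t - t0) - t * math.log(t / t0))


# Placeholder heat capacity change of -1.0 kcal/(mol K), for illustration only.
for temp in (10.0, 25.0, 37.0, 50.0, 60.0):
    print(temp, round(delta_delta_g(-1.0, temp), 3))
```

The bracketed factor is zero at *T*_{0} and grows roughly quadratically with *T* − *T*_{0}, which is why the correction stays small near 37°C and increases away from it.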

**Figure 1.** (**A**) Free energy difference of RNA duplex CCGGUp. Δ*G*° (dashed line) was derived from Equation 3, where enthalpy and entropy were averaged from the optical melting curve fits, assuming that they were independent of the temperature. ...

### Dynamic programming algorithm for RNA secondary structure prediction

RNAstructure is a program for RNA secondary structure prediction and analysis. It includes prediction of secondary structure by free energy minimization (17), prediction of base pair probabilities using a partition function (22), the efn2 function for predicting the free energy change of folding given a sequence and secondary structure (18), and the Dynalign algorithm for finding the secondary structure common to two sequences (70). RNAstructure was revised to make predictions at user-defined temperature. Because large internal loops are more likely at high temperature, the previous limitation on internal loop size (fewer than 30 unpaired nucleotides) (17,18,22) was removed by implementing the method of Lyngsø *et al.* (71). This provides an *O*(*N*^{3}) algorithm that can predict internal loops of arbitrary size. Benchmarks for calculation time and memory requirement with and without this revision are shown in Table 9.

### Sensitivities and PPVs of structure predictions

The enthalpy nearest neighbor parameters were compared with the previous parameters and model for enthalpy and free energy assembled by Serra and Turner (24) by predicting the secondary structures of RNA sequences with known secondary structures. Sensitivities, the percentage of known base pairs that are correctly predicted, using both sets of parameters are shown in Figure 2 (detailed numbers are in Supplementary Table 11A) for different types of structural RNA sequences. The known structures of these sequences were taken from comparative analysis databases (42–49,58,59). The average sensitivity is improved from 65.2 to 68.9% using the new parameters assembled here. Sensitivities are improved for most types of RNA. The exceptions are 5S rRNA and Group II introns.

To test the enthalpy parameters, the accuracy of secondary structure prediction at optimal growth temperature was compared to the accuracy of structure prediction at 37°C for several types of RNAs from organisms that do not grow optimally at 37°C (Table 10). The predictions are compared in groups of organisms divided by range of optimal growth temperature. Compared to the prediction at 37°C, structure prediction at optimal growth temperature performs better for organisms living at temperatures between 22 and 37°C, but is worse at other optimal growth temperatures. This suggests that when enthalpy parameters are assumed to be temperature independent, their utility as a tool for deriving free energy parameters for use in predicting the lowest free energy structure is limited to a narrow temperature range. Small errors in enthalpy change parameters have a larger effect on free energy change parameter determination (Equation 1), the farther the temperature is from 37°C.

Figure 3 shows the PPV for base pairs from the lowest free energy structure for base pairs with different pairing probabilities (see detailed numbers in Supplementary Table 12A). They are predicted using a partition function calculation at optimal growth temperature (22). PPV is the percentage of predicted base pairs that are found in the known structure. The average PPV of all pairs in the lowest free energy structures is only 62.0%, which is lower than the sensitivity (68.9%). This suggests that the model over-predicts base pairs and/or that the base pairs may not be annotated completely in the structures from comparative analysis (22). For example, if a base pair is completely conserved, then it is sometimes not annotated by comparative analysis (42–49,58,59). Base pair probabilities for all possible pairs are calculated with a partition function and grouped by different thresholds. The PPV is significantly higher for predicted base pairs in the lowest free energy structure with higher pairing probability. The average PPV reaches 90.4% for predicted base pairs with a pairing probability of 0.99 or above. It has been demonstrated previously that base pair probabilities predicted at 37°C can be used to find pairs with high PPV (22). The fact that this holds true at other temperatures shows that the enthalpy parameters are robust for base pair probability prediction.
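The thresholding described above can be sketched as follows in Python. This is a hedged illustration with hypothetical names and toy inputs: pairs are (*i*, *j*) tuples, the probabilities are assumed to come from a separate partition function calculation, and exact pair matches are used for brevity rather than the 1 nt slip rule.

```python
def ppv_at_threshold(predicted_pairs, pair_probabilities, known_pairs, threshold):
    """PPV (%) of predicted pairs whose pairing probability is at least threshold.

    Returns None when no predicted pair reaches the threshold.
    """
    known = set(known_pairs)
    selected = [pair for pair in predicted_pairs
                if pair_probabilities.get(pair, 0.0) >= threshold]
    if not selected:
        return None
    correct = sum(1 for pair in selected if pair in known)
    return 100.0 * correct / len(selected)


# Toy data for illustration only; these are not results from the paper.
predicted = [(1, 20), (2, 19), (5, 15), (6, 14)]
probabilities = {(1, 20): 0.995, (2, 19): 0.99, (5, 15): 0.60, (6, 14): 0.30}
known = [(1, 20), (2, 19), (6, 14)]
all_pairs = ppv_at_threshold(predicted, probabilities, known, 0.0)
confident = ppv_at_threshold(predicted, probabilities, known, 0.99)
```

Raising the threshold trades coverage for confidence: fewer pairs are reported, but a larger fraction of them is expected to be in the known structure.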

The accuracy of secondary structure prediction is sensitive to the accuracy of the nearest neighbor parameters, whereas the base pair probabilities remain a robust measure of confidence over a wide range of temperatures. This is consistent with previous work. Layton and Bundschuh (72) demonstrated that the predicted lowest free energy structure was often changed in repeated structure predictions after random adjustments of the nearest neighbor parameters within the limits of their error. Base pair probabilities, however, were less perturbed by changes in the parameters (72). With the extrapolation of nearest neighbor parameters to temperatures far from 37°C, the accuracy of the predicted lowest free energy structure is often reduced as compared to structure prediction at 37°C. The ability of the partition function predicted base pair probabilities to identify base pairs predicted with higher confidence is unchanged at temperatures far from 37°C. This is because the determination of base pair probabilities is not as perturbed by errors in the nearest neighbor parameters.

An example of secondary structure prediction at 37°C and at optimal growth temperature of 30°C is shown in Figure 4 for a tRNA sequence. The base pairs with higher predicted pairing probability (color annotated according to pairing probability in Figure 4B and C) are pairs predicted with greater confidence. For this sequence, secondary structure prediction is more accurate and the fidelity of structure prediction (as judged by the percent of high probability pairs) is improved at optimal growth temperature.

### Correlation between melting temperature and optimal growth temperature

Melting temperature, *T*_{m}, is defined as the temperature at which half of the strands are unpaired. Assuming that an RNA melts with a two-state transition, the melting temperature (in kelvin) of a single-stranded RNA structure can be predicted by *T*_{m} = Δ*H*°/Δ*S*° (73). For example, the predicted melting temperatures (°C) for all hairpins in the database of optically melted sequences (Supplementary Data) (25–31) are plotted in Figure 5 as a function of experimentally determined *T*_{m}. This shows that the parameters adequately reflect the thermal stabilities of RNA sequences with known *T*_{m}. Better correlation was found at higher temperatures. This is expected because most hairpins were measured with high melting temperatures in experiments (25–31).
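The two-state estimate can be written as a short Python sketch. The function names and the numerical inputs are hypothetical; the sketch assumes a unimolecular transition with temperature-independent Δ*H*° and Δ*S*°, with Δ*S*° recovered from the enthalpy and the 37°C free energy of the predicted structure.

```python
def melting_temperature_celsius(delta_h, delta_s):
    """Two-state unimolecular melting temperature in °C.

    delta_h is in kcal/mol and delta_s in kcal/(mol K); Tm (kelvin) = delta_h / delta_s.
    """
    if delta_s == 0.0:
        raise ValueError("entropy change must be non-zero")
    return delta_h / delta_s - 273.15


def melting_temperature_from_g37(delta_h, delta_g37):
    """Melting temperature in °C from the enthalpy and the 37 °C free energy (kcal/mol)."""
    delta_s = (delta_h - delta_g37) / 310.15
    return melting_temperature_celsius(delta_h, delta_s)


# Placeholder hairpin: dH = -40.0 kcal/mol and dG37 = -4.0 kcal/mol.
tm = melting_temperature_from_g37(-40.0, -4.0)
```

A structure with Δ*G*°_{37} = 0 has a predicted *T*_{m} of exactly 37°C, which is a convenient sanity check on the sign conventions.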

Melting temperature reflects the thermal stability of a structure. Therefore, RNA structures in organisms living at higher temperature are expected to have higher melting temperatures. Figure 6A shows a plot of predicted melting temperatures of the lowest free energy structure versus organism optimal growth temperature (10–90°C). A strong correlation (linear correlation coefficient of 0.797) is found between the melting temperature and the optimal growth temperature for different types of RNA structures. On the other hand, there appears to be less correlation between nucleotide content and optimal growth temperature (Figure 6B–D) for diverse types of RNA, although the uracil content of 16S rRNA of thermophiles and psychrophiles was found recently to correlate inversely with their optimal growth temperatures (74). Evidently, the thermal stability of RNA structure is not simply controlled by base content. Organisms that grow at high temperature have apparently evolved RNA secondary structures with a combination of motifs that provide thermal stability.

## DISCUSSION

The nearest neighbor parameters for enthalpy were derived here using rules similar to those used for the free energy nearest neighbor parameters at 37°C (17). This makes these parameters useful for determining free energy parameters at arbitrary temperature that are compatible with dynamic programming algorithms for secondary structure prediction. Some of the enthalpy parameters have large percent standard errors compared with the free energy parameters. This reflects the larger experimental errors in enthalpy than in free energy, but it also suggests that enthalpy may be more sequence dependent than free energy. This sequence dependence cannot be determined using the currently available database of optical melting experiments and suggests a need for further optical melting experiments on model RNA systems.
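A minimal sketch of how a free energy parameter at 37°C and the matching enthalpy parameter can be combined to give a free energy at another temperature is shown below. It assumes temperature-independent Δ*H*° and Δ*S*°, so that Δ*S*° = (Δ*H*° − Δ*G*°_{37})/310.15 K and Δ*G*°(*T*) = Δ*H*° − *T*Δ*S*°. The function name and the numerical values are hypothetical and are not entries from the parameter tables.

```python
T37_KELVIN = 310.15  # 37 C expressed in kelvins


def free_energy_at(delta_g37, delta_h, temperature_kelvin):
    """Extrapolate a nearest neighbor free energy (kcal/mol) to another temperature.

    Assumes enthalpy and entropy changes do not depend on temperature.
    """
    # Entropy change in kcal/(mol K), recovered from the 37 C parameters.
    delta_s = (delta_h - delta_g37) / T37_KELVIN
    return delta_h - temperature_kelvin * delta_s


# Made-up parameter pair: dG37 = -3.3 kcal/mol, dH = -13.0 kcal/mol
g_at_37 = free_energy_at(-3.3, -13.0, 310.15)
g_at_10 = free_energy_at(-3.3, -13.0, 283.15)  # more stable at lower temperature
```

Applying the same conversion to every parameter yields a complete free energy table at the target temperature, which can then be used unchanged by a dynamic programming algorithm.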

Another source of error comes from the assumption that the enthalpy and entropy changes are independent of temperature, both in the model and in the analysis of optical melting experiments. When the temperature is too far from 37°C, the sensitivity of prediction is expected to be worse than 68.9% on average because of the approximation that $\Delta C_{\text{p}}^{\circ} = 0$. For example, experiments demonstrate cold denaturation of RNA (68,69), but the nearest neighbor model does not reproduce those results. Further experiments by isothermal titration calorimetry would be needed to provide the data for a model that can include a non-zero heat capacity change.
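To illustrate why a non-zero heat capacity change matters, the sketch below uses the textbook relations Δ*H*°(*T*) = Δ*H*°(*T*_{ref}) + Δ*C*_{p}°(*T* − *T*_{ref}) and Δ*S*°(*T*) = Δ*S*°(*T*_{ref}) + Δ*C*_{p}° ln(*T*/*T*_{ref}) with a temperature-independent Δ*C*_{p}°. This is not a model from this work; the function name and all numerical values, including the magnitude of Δ*C*_{p}°, are assumptions chosen only to make the effect visible.

```python
import math


def folding_free_energy(delta_h_ref, delta_s_ref, t_ref, t, delta_cp=0.0):
    """Folding free energy (kcal/mol) at temperature t (K).

    delta_h_ref (kcal/mol) and delta_s_ref (kcal/(mol K)) are given at t_ref (K);
    delta_cp (kcal/(mol K)) is assumed constant with temperature.
    """
    delta_h_t = delta_h_ref + delta_cp * (t - t_ref)
    delta_s_t = delta_s_ref + delta_cp * math.log(t / t_ref)
    return delta_h_t - t * delta_s_t


# Made-up hairpin-like values at a reference temperature of 333.15 K.
DH, DS, T_REF = -40.0, -0.120, 333.15
DCP = -1.0  # hypothetical folding heat capacity change, kcal/(mol K)

g_const_250 = folding_free_energy(DH, DS, T_REF, 250.0)          # dCp = 0
g_cp_250 = folding_free_energy(DH, DS, T_REF, 250.0, DCP)        # dCp != 0
g_cp_300 = folding_free_energy(DH, DS, T_REF, 300.0, DCP)
```

With Δ*C*_{p}° = 0 the free energy is linear in temperature and folding only becomes more favorable on cooling. With the assumed non-zero value the curve bends, so in this example the structure is stable at 300 K but unstable at 250 K, which is the qualitative signature of cold denaturation.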

There are common error sources that should be considered for the prediction of base pairs. Free energy minimization assumes that the secondary structure is at equilibrium. The nearest neighbor model is an incomplete representation of structural free energy. The parameters average some sequence-specific effects and were derived from a limited set of experiments. Some RNA sequences, in particular mRNA, may sample multiple structures at equilibrium. The parameters are derived from experimental data at 1 M NaCl, whereas the salt concentration in different organisms may be very different.

In spite of all these limitations, the nearest neighbor model predicts secondary structures with a 72.8% average sensitivity (17). Recent experimental results on the self-folding of the 16S rRNA 5′ domain (75) support the assumption of thermodynamic control of the folding pathway. Moreover, base pair probabilities predicted with the partition function can be used to identify pairs predicted with greater confidence (22).
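The accuracy measures used here can be sketched as follows, taking sensitivity as the fraction of known pairs that are predicted and positive predictive value (PPV) as the fraction of predicted pairs that are in the known structure. The sketch requires exact matches between pairs, which is a simplifying assumption, and the function name, pairs and probabilities are hypothetical.

```python
def sensitivity_and_ppv(predicted_pairs, known_pairs):
    """Return (sensitivity, PPV) for base pairs given as (i, j) index tuples.

    Pairs must match exactly; an empty denominator gives 0.0 by convention here.
    """
    predicted = set(predicted_pairs)
    known = set(known_pairs)
    correct = len(predicted & known)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up known structure and predicted pairs with pairing probabilities.
known = {(1, 20), (2, 19), (3, 18), (4, 17)}
pair_probability = {(1, 20): 0.995, (2, 19): 0.97, (3, 18): 0.60, (6, 15): 0.40}

all_scores = sensitivity_and_ppv(pair_probability.keys(), known)

# Keep only pairs predicted with high probability (threshold is illustrative).
confident = {pair for pair, prob in pair_probability.items() if prob >= 0.9}
confident_scores = sensitivity_and_ppv(confident, known)
```

In this toy case, restricting the prediction to high-probability pairs raises PPV at the cost of sensitivity, which is the trade-off exploited when pairs are filtered by pairing probability.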

In spite of the fact that the enthalpy parameters have larger percent errors than the free energy parameters for 37°C, the enthalpy parameters are able to predict optical melting temperatures for small model sequences. Predicted melting temperatures for structural RNA sequences correlate well with optimal growth temperature, suggesting that these parameters capture many of the sequence-dependent features of RNA folding enthalpy change.

## SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

## ACKNOWLEDGMENTS

The authors thank Rahul Tyagi and Andrew V. Uzilov for helpful discussions. D.H.M. is an Alfred P. Sloan Research Fellow. This work was supported by National Institutes of Health Grants GM22939 to D.H.T. and GM076485 to D.H.M. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health.

*Conflict of interest statement.* None declared.

## REFERENCES

*Caenorhabditis elegans*. Science. 2001;294:858–862.

*Escherichia coli* formylmethionine transfer RNA. J. Mol. Biol. 1974;87:63–88.

_{2} and determinants of stability for single guanosine–guanosine base pairs. Biochemistry. 2000;39:11748–11762.

^{+} pairs. Biochemistry. 1991;30:8242–8251.

**Oxford University Press**


A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Research. Oct 2006; 34(17):4912.
