RNA. Oct 2009; 15(10): 1805–1813.
PMCID: PMC2743040

Improved RNA secondary structure prediction by maximizing expected pair accuracy

Abstract

Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of the optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.

Keywords: RNA secondary structure, free energy minimization, partition function, nearest-neighbor model

INTRODUCTION

Many classes of functional RNAs have been discovered, and the pace of discovery has accelerated over the last decade. These are termed noncoding RNAs (ncRNA) because they function directly and not via a protein product (Eddy 2002). The multiple functions of ncRNA have fundamentally changed the original formulation of the Central Dogma of Biology, where it was originally thought that proteins were the only final effectors of genetics. Both the discovery (Washietl et al. 2005; Uzilov et al. 2006; Torarinsson et al. 2008) and the analysis of function (Mathews and Turner 2006) of ncRNA depend on the determination of structure.

RNA secondary structure refers to the set of canonical base pairs (A-U, G-C, and G-U) in an RNA sequence. The most popular methods for RNA secondary structure prediction from sequence are those methods that predict the most probable structure. One such approach is based on a biophysical model that predicts the preferred equilibrium structure, the structure with lowest Gibbs free energy change of folding. In free energy minimization, a nearest-neighbor model (Xia et al. 1998; Mathews et al. 1999, 2004; Lu et al. 2006) is used to predict the conformational stability of a given structure. The set of free energy and enthalpy change nearest-neighbor parameters was derived by linear regressions from a set of optical melting experiments on model systems (Mathews et al. 1999, 2004; Lu et al. 2006). An alternative to finding the most likely structure by free energy minimization is the approach in which the most probable structure is found using a stochastic context-free grammar (Dowell and Eddy 2004). The parameters underlying the stochastic context-free grammar are trained on the set of sequences with known structures. Recently, free energy minimization and knowledge-based structure prediction were combined (Andronescu et al. 2007). The product of this work is an alternative set of free energy change parameter values. These were derived from constraint satisfaction on the set of optical melting experiment data and a large database of RNA sequences with known secondary structures (Andronescu et al. 2007).

Secondary structure prediction can be benchmarked for accuracy using sensitivity and positive predictive value (PPV) for base-pair prediction. Sensitivity is the percentage of known pairs correctly predicted, and PPV is the percentage of predicted pairs that are in the known structure. These two statistics are calculated as:

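$$\text{Sensitivity} = \frac{\text{number of correctly predicted base pairs}}{\text{number of base pairs in the known structure}} \times 100\%$$

$$\text{PPV} = \frac{\text{number of correctly predicted base pairs}}{\text{number of predicted base pairs}} \times 100\%$$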
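As a minimal sketch of how these two statistics could be computed, assume each structure is represented as a set of (i, j) index pairs. The function name and the representation are illustrative assumptions, not part of the published software, and the sketch uses exact matching rather than the slippage-tolerant scoring described in Materials and Methods.

```python
def sensitivity_ppv(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) as fractions, counting exact base-pair matches."""
    # Normalize every pair to (smaller index, larger index).
    known = {tuple(sorted(p)) for p in known_pairs}
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    correct = len(known & predicted)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical toy structures: 3 of 4 known pairs are found, 2 predictions are wrong.
known = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted = {(1, 20), (2, 19), (3, 18), (6, 12), (7, 11)}
sens, ppv = sensitivity_ppv(known, predicted)
print(f"sensitivity = {sens:.0%}, PPV = {ppv:.0%}")  # sensitivity = 75%, PPV = 60%
```

Predicting extra pairs leaves sensitivity unchanged but lowers PPV, which is the overprediction effect discussed in the text.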

The sensitivity of free energy minimization has been benchmarked as high as 73% on a diverse database of RNA sequences with known structures of fewer than 700 nucleotides (Mathews et al. 2004; Mathews and Turner 2006), but the PPV of free energy minimization is only 66% (Mathews 2004). The lower PPV has two causes. First, there is a tendency to overpredict base pairs because the formation of pairs lowers the free energy change. Second, occasionally the database of known structures does not annotate all experimentally determined base pairs, and predicting a correct pair that is not annotated lowers PPV.

As an alternative method to finding the most probable structure, the structure with maximum expected accuracy can be predicted. Pseudoknot-free structures are predicted by maximizing the sum of the base-paired and single-stranded nucleotide probabilities, called expected accuracy, where pairing probabilities can be weighted by a factor, γ (Do et al. 2006):

$$\text{Expected accuracy}(S) = \sum_{(i,j) \in BP} 2\gamma\, P_{bp}(i,j) + \sum_{k \in SS} P_{ss}(k) \tag{1}$$

where Pbp(i, j) is the base-pair (bp) probability for a nucleotide at position i and a nucleotide at position j, and Pss(k) is the single-stranded (SS) probability for the nucleotide at position k. The first sum is over BP, the set of base pairs in structure S, and the second sum is over SS, the set of single-stranded nucleotides in S. It makes intuitive sense that this is a measure of expected folding accuracy because it has previously been shown that base pairs with high base-pairing probabilities in the thermodynamic ensemble are more likely to be in the known structure (Mathews 2004). For example, in the lowest free energy structure, base pairs with pair probability >99% have a 91% PPV (Mathews 2004).
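The expected accuracy of one candidate structure could be evaluated along the following lines. This is only a sketch: the dictionary-based inputs and the function name are illustrative assumptions, and each pair is scored as 2γ·Pbp(i, j) so that both nucleotides of a pair are counted, which should be read as an assumption of this sketch.

```python
def expected_accuracy(pairs, p_bp, p_ss, gamma=1.0):
    """Score one structure S.

    pairs : iterable of (i, j) base pairs in S, with i < j
    p_bp  : dict mapping (i, j) to base-pair probability
    p_ss  : dict mapping nucleotide index k to single-stranded probability
    gamma : weight on the base-pair term
    """
    pairs = list(pairs)
    paired = {k for pair in pairs for k in pair}
    # Assumed factor of 2: each pair accounts for two nucleotides.
    pair_term = sum(2.0 * gamma * p_bp.get(pair, 0.0) for pair in pairs)
    single_term = sum(p for k, p in p_ss.items() if k not in paired)
    return pair_term + single_term


# Hypothetical four-nucleotide example with one highly probable pair.
p_bp = {(1, 4): 0.9}
p_ss = {1: 0.1, 2: 1.0, 3: 1.0, 4: 0.1}
print(expected_accuracy([(1, 4)], p_bp, p_ss))  # paired structure, about 3.8
print(expected_accuracy([], p_bp, p_ss))        # fully single-stranded, about 2.2
```

Raising γ rewards pairs more strongly, and lowering it favors leaving nucleotides single-stranded.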

Do et al. (2006) first introduced the concept of predicting the maximum expected accuracy structure to RNA secondary structure prediction with their program CONTRAfold. CONTRAfold uses probabilistic parameters learned from a set of RNA secondary structures to predict base-pair probabilities and then predicts structures using the maximum expected accuracy approach. Subsequently, maximum expected accuracy structure prediction was applied to predicting the secondary structure common to multiple, aligned RNA sequences (Kiryu et al. 2007). The underlying pair probabilities in that approach were calculated using a composite score of free energy change and covariation (Hofacker et al. 2002).

This study explores the use of maximum expected accuracy in single sequence secondary structure prediction, where thermodynamics are utilized to predict the underlying base-pair probabilities. A partition function calculation (McCaskill 1990; Mathews 2004) predicts base-pair probabilities with the current free energy change parameters at 37°C (Mathews et al. 2004). These probabilities are then utilized by a dynamic programming algorithm, implemented in the program MaxExpect, to assemble the structure with maximum expected accuracy as in CONTRAfold. The method is tested on a diverse database of different types of RNA, with the weighting factor, γ, varied to show the trade-off between sensitivity and PPV. For γ equal to 1, the average positive predictive value (PPV) of MaxExpect is 68%, with a sensitivity the same as the free energy minimization method, 73%. Therefore, the predicted maximum expected accuracy structure is of higher average accuracy than the minimum free energy structure.

In addition to the optimal structure, other possible or competing structures, called suboptimal structures, can also be predicted by MaxExpect. Suboptimal structures with expected pair accuracy lower than the optimal structure are predicted using a heuristic method similar to that developed previously for free energy minimization (Steger et al. 1984; Zuker 1989). The predicted suboptimal structures serve as alternative hypotheses for the structure. To our knowledge, this is the first implementation of maximum expected accuracy structure prediction that includes suboptimal structure prediction.

RESULTS

Maximization of expected accuracy (implemented in the program MaxExpect) provides a best estimate for an RNA secondary structure by maximizing expected accuracy of base pairs and single-stranded nucleotides (see details in Materials and Methods). The base-pair probabilities and single-strand probabilities are calculated from a partition function calculation (McCaskill 1990; Mathews 2004), which can be constrained with experimental data (Mathews 2004), such as chemical modification (Ehresmann et al. 1987) or enzymatic cleavage (Knapp 1989). Here it is shown that the optimal structure predicted from expected accuracy maximization provides better average accuracy than the structure predicted from free energy minimization, although both methods are based on the same nearest-neighbor parameters for folding free energy change (Mathews et al. 2004).

RNA secondary structure prediction accuracy using MaxExpect

MaxExpect was tested on a diverse database of RNA sequences with known secondary structures (see Materials and Methods). The optimal structure, the maximum expected accuracy structure, is predicted and compared with the known structure in the database, and the accuracy of prediction is reported as sensitivity and PPV. The sensitivity versus PPV trade-off parameter, γ from Equation 1, is varied from 10^−5 to 10^6 to find the optimal preference of double strandedness versus single strandedness. The plot of sensitivity as a function of PPV is shown in Figure 1. In addition to optimal structure prediction, a maximum of 750 suboptimal structures were predicted. The performance of the best suboptimal structure, i.e., the structure having the highest sensitivity, is also shown in Figure 1. This structure can only be determined by knowing the correct structure, but it represents the best hypothesis generated by MaxExpect.

FIGURE 1.
Performance of different prediction methods: optimal structure prediction using free energy minimization, optimal structure prediction using MaxExpect, suboptimal structure prediction using MaxExpect, CONTRAfold 1.10, CONTRAfold 2.02, and structure assembling ...

A simple method related to maximizing expected accuracy is to assemble structures composed of base pairs with base-pairing probability larger than a threshold. The predicted structure is a valid secondary structure (each nucleotide has only one pairing partner) if the threshold is 0.5 or higher. This has been previously shown to predict structures with higher PPV than free energy minimization (Mathews 2004). Furthermore, for a threshold of exactly 0.5, the approach is similar to the approach of predicting the ensemble centroid using statistical sampling by SFold (Ding and Lawrence 2003; Ding et al. 2005). In fact, in the limit of an infinite sample size, the two approaches are equivalent. As shown in Figure 1 for a variety of threshold levels, the threshold method can produce structures with similar high PPV to those generated with a large γ in maximization of expected accuracy. It cannot, however, predict structures with as high sensitivity because many true base pairs having base-pair probabilities of 0.5 and below are missed.
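The threshold method can be sketched in a few lines. Because the pairing probabilities involving any one nucleotide sum to at most 1, at most one partner can have a probability above 0.5, which is why thresholds of 0.5 or higher always yield a valid structure. The function name and the dictionary input below are illustrative assumptions.

```python
def threshold_structure(p_bp, threshold=0.5):
    """Return the pairs whose probability exceeds `threshold` (0.5 or higher)."""
    if threshold < 0.5:
        raise ValueError("thresholds below 0.5 may give a nucleotide two partners")
    pairs = sorted(pair for pair, p in p_bp.items() if p > threshold)
    # Sanity check: every nucleotide appears in at most one selected pair.
    used = [k for pair in pairs for k in pair]
    assert len(used) == len(set(used))
    return pairs


# Hypothetical probabilities; (3, 8) is a plausible pair that falls below 0.5.
p_bp = {(1, 10): 0.95, (2, 9): 0.80, (3, 8): 0.45, (2, 8): 0.15}
print(threshold_structure(p_bp, 0.5))  # [(1, 10), (2, 9)]
print(threshold_structure(p_bp, 0.9))  # [(1, 10)]
```

The pair (3, 8) in this toy input shows the limitation noted in the text: a pair with probability at or below 0.5 is never reported, whatever the threshold.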

The prediction performance of free energy minimization (Mathews et al. 2004), CONTRAfold 1.10, and CONTRAfold 2.02 (Do et al. 2006) is compared with that of MaxExpect in Figure 1. As compared to free energy minimization, MaxExpect offers an improved PPV at roughly the same sensitivity of structure prediction when γ = 1. CONTRAfold 2.02 performs similarly to MaxExpect and is a significant improvement over CONTRAfold 1.10. The folding parameters used by CONTRAfold 1 were learned from 151 representative RNA structures (Do et al. 2006) chosen from the Rfam database (Griffiths-Jones et al. 2003). For CONTRAfold 2, parameters were learned from a much larger data set, the S-Processed data set, with 3439 structures (Andronescu et al. 2007).

Table 1 summarizes the accuracies of MaxExpect, free energy minimization, and CONTRAfold 2.02 for each type of RNA in the database. The MaxExpect accuracy is reported for γ = 1, which is a point at which PPV is improved over free energy minimization. For CONTRAfold, accuracy is reported for γ = 6, which is the default value of γ (Do et al. 2006).

TABLE 1.
Sensitivity and PPV of prediction methods

There is overlap in the S-Processed data set used for training CONTRAfold 2.02 and the testing data set used for this work. To examine the effect of this overlap, a cross-validation approach was taken for additional training and testing of CONTRAfold. For this cross-validation, all instances in the S-Processed data set for a specific type of RNA were removed from the parameter training, and then those parameters were applied to predict the structures of that type. For example, all the RNase P structures were removed when training a set of parameters used to predict RNase P structures for testing performance on RNase P. Interestingly, CONTRAfold performance was marginally improved, so that sensitivity is higher at a given PPV with the cross-validation testing strategy (Supplemental Fig. 1). For example, with the non-cross-validated CONTRAfold, at γ = 6.3, the average sensitivity is 73.3% with a PPV of 65.7%. Using the cross-validated approach, at a sensitivity of 73.7%, the PPV is 67.2% using γ = 3.2. Supplemental Table 1 shows a comparison of accuracies for all RNA types at γ = 6. This improvement may be because more iterations of training were allowed during the cross-validation strategy than for the determination of the standard parameter set. The results support the hypothesis that the parameter estimation by CONTRAfold is robust and that CONTRAfold is likely to do as well predicting the structure of a novel RNA type as it is in predicting structures for known RNA types.

Time benchmarks

CONTRAfold 2.02 runs faster than MaxExpect and free energy minimization (Table 2). The calculation for maximizing expected accuracy using thermodynamic parameters has two parts: calculating the partition function, which requires most of the computation time, and then MaxExpect. The computation time and memory cost are shown in Table 2. For MaxExpect, the time complexity is O(N^3) and the memory complexity is O(N^2). A significant factor in the longer time required by free energy minimization and by the partition function used here is the explicit calculation of helical coaxial stacking, which was previously demonstrated to take about two-thirds of the total calculation time and is absent from the CONTRAfold calculation (Mathews 2004). Additionally, MaxExpect calculates suboptimal structures, and the method that facilitates this requires twice the calculation time of finding the optimal structure alone (see Materials and Methods).

TABLE 2.
Calculation size and time for different RNA sequence lengths

Differences between minimum free energy structures and maximum expected accuracy structures

The distributions of base-pair probabilities in optimal structures are shown in Table 3. As expected, the average base-pair probability in the optimal structure predicted from MaxExpect is 0.042 higher than from free energy minimization when the weighting factor, γ, for MaxExpect is 1. The portion of predicted base pairs with pairing probability larger than 0.50 is 7.8% higher using MaxExpect than using free energy minimization, although the difference is only 0.7% between the portions of base pairs with pairing probability >0.99. This shows that most highly probable pairs are already incorporated into structures predicted by free energy minimization. As has been demonstrated in a previous study on minimum free energy structures (Mathews 2004), base pairs with higher pairing probability provide more confidence of prediction accuracy than base pairs with lower pairing probability. Therefore, one reason MaxExpect outperforms free energy minimization is that maximum expected accuracy structures include a similar number of base pairs with high pairing probability but fewer base pairs with low pairing probability than minimum free energy structures. Figure 2 illustrates this with an example of improved structure prediction accuracy for a 5S rRNA. The structure predicted by maximizing expected accuracy has a sensitivity of 91.4% and a PPV of 86.5%. Free energy minimization, however, has a sensitivity of 54.3% and a PPV of 55.9%.

TABLE 3.
The distribution of base pair probabilities (Pbp) in maximum expected accuracy structures and minimum free energy structures

FIGURE 2.
Example of structures predicted from MaxExpect and from free energy minimization. (A) The known structure for the Methanococcus thermolithotrophicus 5S rRNA in the database of 5S rRNA (Szymanski et al. 1998). (B) The predicted maximum expected accuracy ...

DISCUSSION

Free energy minimization provides the single best guess of the secondary structure for an RNA sequence because it is the single most probable structure in an ensemble (Mathews et al. 2004). There are drawbacks to this analysis, however. It implicitly assumes a single conformation for the RNA at equilibrium even though multiple states of secondary structure can exist in the solution for one RNA sequence. This analysis also assumes that the underlying nearest-neighbor parameters have the required resolution to pick the correct conformation from other conformations.

The base-pair probabilities calculated from a partition function provide the information about base pairs shared by multiple secondary structures. The base pairs with higher base-pair probability are more likely to be correctly predicted than base pairs with lower base-pairing probability (Mathews 2004). Furthermore, base-pair probabilities are less prone to change due to errors in the thermodynamic parameters than is the identity of the lowest free energy structure (Layton and Bundschuh 2005). Base-pair probabilities can therefore overcome uncertainties arising from errors in the parameters. Maximization of expected accuracy (MaxExpect) utilizes this feature to predict RNA secondary structure by maximizing the weighted probability of base-pairing and single-strandedness. This is why MaxExpect can predict structures with the same sensitivity as free energy minimization and also with higher positive predictive value (PPV). The free energy parameters in the nearest-neighbor model, which are derived from experiments, have errors because of the limited number of experiments that have been conducted compared to the possible sequence space and because there are some non-nearest-neighbor effects that are neglected.

The free energy parameters used in this study are derived at 37°C even though the tested RNA sequences are derived from organisms living at different temperatures. A method for predicting free energy change at temperatures other than 37°C has been developed in a previous study (Lu et al. 2006). RNA secondary structure prediction by maximizing expected accuracy should be improved by making predictions of base-pair probabilities at the organism's optimal growth temperature from which the sequence was derived.

Maximizing the expected accuracy of structure prediction improves the accuracy of structure prediction for all types of noncoding RNAs tested with the exception of signal recognition particle RNA (Table 1). Base-pair probability independence is assumed when predicting the maximum expected accuracy structure and may be a drawback because helix formation is cooperative. Signal recognition particle RNAs have a predominantly helical region connecting the Alu and S domains (composed of the helices labeled 5) (Zwieb et al. 2005). It is possible that this feature is not being correctly predicted because the cooperativity of helix formation is not included in maximization of expected accuracy other than during the calculation of individual base-pair probabilities in the partition function. A revised measure of expected accuracy that utilizes conditional stack probabilities (Bompfunewerer et al. 2008) could perhaps further improve the accuracy of structure prediction.

It is particularly interesting that CONTRAfold and MaxExpect predict secondary structures with similar overall accuracy. The nearest-neighbor models utilized by both approaches are similar, but the parameter values were determined with different methodologies. A recent study further optimized the parameters for a similar nearest-neighbor model by fitting both the set of optical melting data and the set of RNA sequences with known structure, demonstrating the potential for further accuracy improvement (Andronescu et al. 2007). Further breakthroughs in prediction accuracy will require innovation in the underlying stability models by discovery of features that affect stability that are currently unknown. These innovations will derive from new computational analyses (Parisien and Major 2008) and continued experimental determination of motif stabilities (Chen et al. 2005; Blose et al. 2007; Clanton-Arrowood et al. 2008; Davis and Znosko 2008).

MATERIALS AND METHODS

Nearest-neighbor parameters

The nearest-neighbor parameters for Watson–Crick helices are those determined by Xia et al. (1998); the parameters for GU pairs are by Mathews et al. (1999); and those for loops are by Mathews et al. (2004), with one exception. The multibranch loop parameter for the bonus for every additional helix in the multibranch loop was previously modified from −0.6 to −0.9 (Mathews et al. 2004) to be within the range of experimental error (Mathews and Turner 2002) and provide the highest prediction sensitivity. For this study, the experimental value of a multibranch loop parameter (Diamond et al. 2001; Mathews and Turner 2002; Mathews et al. 2004) was used for both free energy minimization and the partition function calculations so that no optimization has been performed on the nearest-neighbor parameters.

Maximizing expected accuracy

The base-pair probability of an i-j pair, Pbp(i, j), is predicted from a partition function calculation (McCaskill 1990; Mathews 2004) based on a biophysical model, the nearest-neighbor model (Xia et al. 1998; Mathews et al. 1999, 2004; Lu et al. 2006). The probability of nucleotide i being single-stranded is:

$$P_{ss}(i) = 1 - \sum_{j < i} P_{bp}(j,i) - \sum_{j > i} P_{bp}(i,j)$$
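The single-stranded probabilities follow directly from the base-pair probabilities, since a nucleotide is either paired with exactly one partner or unpaired. The following sketch assumes the pair probabilities are supplied as a dictionary keyed by (i, j) with i < j and 1-based indices; the function name is illustrative.

```python
def single_strand_probabilities(p_bp, n):
    """Return {k: P_ss(k)} for nucleotides 1..n from pair probabilities."""
    p_ss = {k: 1.0 for k in range(1, n + 1)}
    for (i, j), p in p_bp.items():
        # Each pair probability reduces the unpaired probability of both partners.
        p_ss[i] -= p
        p_ss[j] -= p
    return p_ss


# Hypothetical five-nucleotide example.
p_bp = {(1, 5): 0.7, (2, 4): 0.6, (1, 4): 0.2}
for k, p in single_strand_probabilities(p_bp, 5).items():
    print(k, round(p, 3))
```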

After base-pair probabilities are calculated with the partition function calculation, MaxExpect utilizes a dynamic programming algorithm to find the structure with maximum expected accuracy using Equation 1. This dynamic programming algorithm uses Nussinov-style recursions (Nussinov and Jacobson 1980), expanded to predict suboptimal solutions using the approach of Zuker (1989) and Steger et al. (1984). For a simpler set of Nussinov-style recursions that find only optimal solutions, the reader is referred to the prior work of Do et al. (2006) or Kiryu et al. (2007). Note also that there is a connection between the work of maximizing expected accuracy and recent work in finding pseudoknot-free structures from pseudoknotted structures that retain the maximum number of pairs (Ponty 2006; Smit et al. 2008). For that work, the input to the same recursions used here is a probability of 1 for each pair that occurs in the pseudoknotted structure. The output is then the structure with the maximum number of non-pseudoknotted pairs.

For an RNA sequence having N nucleotides, the following recursions compute the maximum expected accuracy with arrays W(i, j) and V(i, j):

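$$V(i,j) = 2\gamma\, P_{bp}(i,j) + W(i+1,\, j-1)$$

$$W(i,j) = \max \begin{cases} P_{ss}(i) + W(i+1,\, j) \\ P_{ss}(j) + W(i,\, j-1) \\ V(i,j) \\ \max_{i \le k < j} \left[ W(i,k) + W(k+1,\, j) \right] \end{cases}$$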

where i < j. W(i, j) is the maximum expected accuracy for a sequence fragment from nucleotides i to j, inclusive. V(i, j) is the maximum expected accuracy for a sequence fragment from nucleotides i to j, inclusive, given that i and j are base-paired. W(i, k) + W(k + 1, j) is computed for every k between i and j (j > k ≥ i) to identify multibranch loops. The maximum expected accuracy for the whole sequence between nucleotide at position 1 and nucleotide at position N is equal to W(1, N).
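The fill and trace-back for the optimal structure can be sketched as follows. This is a simplified illustration of Nussinov-style recursions over W and V, not the MaxExpect source code: the function name, the dictionary input, and the 2γ weighting of the pair term are assumptions of the sketch, only pairs present in the input are allowed to form, and the V′ and W′ arrays needed for suboptimal structures are omitted.

```python
def max_expected_accuracy(n, p_bp, gamma=1.0):
    """Return (W(1, n), pairs) for nucleotides 1..n.

    p_bp maps (i, j), i < j, to base-pair probability. Pairs absent from
    p_bp are treated as forbidden (probability 0 in the partition function).
    """
    p_ss = [1.0] * (n + 2)
    for (i, j), p in p_bp.items():
        p_ss[i] -= p
        p_ss[j] -= p

    neg_inf = float("-inf")
    # W[i][j] stays 0.0 for empty fragments (j < i).
    W = [[0.0] * (n + 2) for _ in range(n + 2)]
    V = [[neg_inf] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        W[i][i] = p_ss[i]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            if (i, j) in p_bp:
                V[i][j] = 2.0 * gamma * p_bp[(i, j)] + W[i + 1][j - 1]
            best = max(p_ss[i] + W[i + 1][j], p_ss[j] + W[i][j - 1], V[i][j])
            for k in range(i, j):  # bifurcation into two fragments
                best = max(best, W[i][k] + W[k + 1][j])
            W[i][j] = best

    # Trace back by re-evaluating the same expressions used in the fill.
    pairs, stack = [], [(1, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        w = W[i][j]
        if w == p_ss[i] + W[i + 1][j]:
            stack.append((i + 1, j))
        elif w == p_ss[j] + W[i][j - 1]:
            stack.append((i, j - 1))
        elif w == V[i][j]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i, j):
                if w == W[i][k] + W[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return W[1][n], sorted(pairs)


# Hypothetical six-nucleotide hairpin with two probable stacked pairs.
p_bp = {(1, 6): 0.9, (2, 5): 0.8, (1, 5): 0.05}
print(max_expected_accuracy(6, p_bp, gamma=1.0))   # pairs (1, 6) and (2, 5)
print(max_expected_accuracy(6, p_bp, gamma=0.01))  # no pairs at very small gamma
```

The triple loop over i, j, and k gives the O(N^3) time and the two N × N arrays give the O(N^2) memory stated in the Results. Ties between a paired and an unpaired alternative are resolved here in favor of the unpaired one, which is an arbitrary choice of this sketch.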

To facilitate suboptimal structure prediction, V′(i, j) and W′(i, j) are additionally computed. W′(i, j) is the maximum expected accuracy for a sequence fragment including nucleotides 1 to i and j to N. V′(i, j) is the maximum expected accuracy for a sequence fragment including nucleotides 1 to i and j to N, with i paired to j. The calculations of V′(i, j) and W′(i, j) are similar to V(i, j) and W(i, j).

Structures are determined using a trace-back procedure. The maximum expected accuracy possible for a structure, S, containing a pair between i and j is calculated according to:

equation image

Therefore, base pairs that are contained in structures with relatively high expected accuracy can be identified, and structures can be determined starting from a given pair of nucleotides that are in a structure with high expected accuracy. To ensure that predicted structures are sufficiently different from each other, a window parameter is used. When structures are generated, an array tracks the positions of base pairs that have been predicted. For an i–j pair, a square region is marked from nucleotides i − window to i + window and from j − window to j + window. When a new structure is determined, it is only provided to the user if it contains at least window base pairs that have not been found in previously marked regions. In the case where a window is specified by the user as zero, at least one novel base pair must be found. In this study, at most, 750 suboptimal structures are generated for each RNA sequence with a window size of 0.
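The window filter described above can be illustrated with a short sketch. It assumes that candidate structures arrive as lists of (i, j) pairs in order of decreasing expected accuracy and that only the pairs of accepted structures are marked; both points, and the function name, are assumptions of this example rather than details confirmed by the text.

```python
def filter_by_window(structures, window=0):
    """Keep structures that contribute enough pairs outside marked regions."""
    marked = set()             # (a, b) positions covered by earlier structures
    kept = []
    needed = max(window, 1)    # a window of 0 still requires one novel pair
    for pairs in structures:
        novel = sum(1 for pair in pairs if pair not in marked)
        if novel < needed:
            continue
        kept.append(pairs)
        for i, j in pairs:
            # Mark the square region around each accepted pair.
            for a in range(i - window, i + window + 1):
                for b in range(j - window, j + window + 1):
                    marked.add((a, b))
    return kept


# Hypothetical candidates: a helix, its duplicate, a one-nucleotide shift, a new helix.
candidates = [
    [(1, 20), (2, 19), (3, 18)],
    [(1, 20), (2, 19), (3, 18)],
    [(2, 20), (3, 19), (4, 18)],
    [(5, 15), (6, 14)],
]
print(len(filter_by_window(candidates, window=0)))  # 3
print(len(filter_by_window(candidates, window=1)))  # 2
```

With a window of 0 only exact duplicates are discarded, whereas a window of 1 also discards the helix shifted by one nucleotide.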

Database of RNA secondary structures

Secondary structures determined by comparative analysis were used as known structures. The database includes various types of RNA as used in previous benchmarks (Mathews et al. 1999, 2004; Mathews 2004), including small-subunit (Gutell 1994) and large-subunit rRNA (Gutell et al. 1993; Schnare et al. 1996), 5S rRNA (Szymanski et al. 1998), Signal Recognition Particle RNA (Larsen et al. 1998), Group I introns (Waring and Davies 1984; Damberger and Gutell 1994), Group II introns (Michel et al. 1989), RNase P RNA (Brown 1998), and tRNA (Sprinzl et al. 1998). The small-subunit rRNA (16S) sequences are divided into domains as defined by Jaeger et al. (1989). The large-subunit rRNA (23S) sequences are divided into domains of fewer than 700 nucleotides each (Mathews et al. 1999). Moreover, chemically modified nucleotides in tRNA that cannot be accommodated in a canonical helix are forced single-stranded for all structure prediction methods, as has been done previously (Mathews et al. 1999).

Method of scoring prediction accuracy

Because of the uncertainty of base-pair matches in comparative analysis, a base pair is considered to be correctly predicted even if it is slipped by one nucleotide on one strand. A base pair between i and j is considered to be correct if any of the following base pairs is predicted: i and j, i and j − 1, i and j + 1, i − 1 and j, or i + 1 and j. Base pairs between i − 1 and j + 1 or between i + 1 and j − 1 are not considered to be correct. This scoring method also reflects the possibility of dynamic behavior of base-pairing. Sensitivity and PPV values calculated with this slippage scheme are generally 2%–3% higher than when calculated from the exact base-pairing scheme (Lu et al. 2006); a table of accuracies without allowing slippage is provided as Supplemental Table 2. Furthermore, although CONTRAfold also predicts noncanonical base pairs, they are not counted for scoring the prediction accuracy. When reporting average sensitivity and PPV for a type of RNA (Table 1), the mean is reported over the sensitivity and PPV of each sequence.
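The slippage rule can be expressed compactly in code. In this sketch the same one-strand slippage is applied when counting predicted pairs for PPV as when counting known pairs for sensitivity; that symmetry, like the function names, is an assumption of the example.

```python
def slipped_variants(i, j):
    """Pairs accepted as matching (i, j): exact, or slipped by one on one strand."""
    return {(i, j), (i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)}


def score_with_slippage(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) as fractions, allowing one-nucleotide slippage."""
    known, predicted = set(known_pairs), set(predicted_pairs)
    found = sum(1 for i, j in known if slipped_variants(i, j) & predicted)
    supported = sum(1 for i, j in predicted if slipped_variants(i, j) & known)
    sensitivity = found / len(known) if known else 0.0
    ppv = supported / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical example: (2, 18) matches (2, 19) by slippage on one strand,
# but (6, 14) does not match (5, 15) because both strands are shifted.
known = {(1, 20), (2, 19), (5, 15)}
predicted = {(1, 20), (2, 18), (6, 14)}
print(score_with_slippage(known, predicted))  # both values are 2/3
```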

Availability

The source code of the Linux version of the MaxExpect method is downloadable from the Mathews lab website (http://rna.urmc.rochester.edu/). The algorithm is also embedded in the RNAstructure package for Microsoft Windows, which is available for download at the Mathews lab website.

SUPPLEMENTAL MATERIAL

Supplemental material can be found at http://www.rnajournal.org.

ACKNOWLEDGMENTS

We thank Ivo L. Hofacker for helpful discussions on the prediction of maximum expected accuracy structures and two anonymous reviewers for helpful comments. This study was supported by the National Institutes of Health through grant R01GM076485 to D.H.M.

Footnotes

Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.1643609.

NOTE ADDED IN PROOF

During the review of this manuscript, a study examining the use of thermodynamics in the prediction of maximum expected accuracy structures was published by Hamada et al. (2009).

REFERENCES

  • Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics. 2007;23:i19–i28. [PubMed]
  • Blose JM, Manni ML, Klapec KA, Stranger-Jones Y, Zyra AC, Sim V, Griffith CA, Long JD, Serra MJ. Non-nearest-neighbor dependence of the stability for RNA bulge loops based on the complete set of group I single-nucleotide bulge loops. Biochemistry. 2007;46:15123–15135. [PMC free article] [PubMed]
  • Bompfunewerer AF, Backofen R, Bernhart SH, Hertel J, Hofacker IL, Stadler PF, Will S. Variations on RNA folding and alignment: Lessons from Benasque. J Math Biol. 2008;56:129–144. [PubMed]
  • Brown JW. The ribonuclease P database. Nucleic Acids Res. 1998;26:351–352. [PMC free article] [PubMed]
  • Chen G, Znosko BM, Kennedy SD, Krugh TR, Turner DH. Solution structure of an RNA internal loop with three consecutive sheared GA pairs. Biochemistry. 2005;44:2845–2856. [PubMed]
  • Clanton-Arrowood K, McGurk J, Schroeder SJ. 3′ terminal nucleotides determine thermodynamic stabilities of mismatches at the ends of RNA helices. Biochemistry. 2008;47:13418–13427. [PubMed]
  • Damberger SH, Gutell RR. A comparative database of group I intron structures. Nucleic Acids Res. 1994;22:3508–3510. [PMC free article] [PubMed]
  • Davis AR, Znosko BM. Thermodynamic characterization of naturally occurring RNA single mismatches with G-U nearest neighbors. Biochemistry. 2008;47:10178–10187. [PubMed]
  • Diamond JM, Turner DH, Mathews DH. Thermodynamics of three-way multibranch loops in RNA. Biochemistry. 2001;40:6971–6981. [PubMed]
  • Ding Y, Lawrence CE. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 2003;31:7280–7301. [PMC free article] [PubMed]
  • Ding Y, Chan CY, Lawrence CE. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA. 2005;11:1157–1166. [PMC free article] [PubMed]
  • Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006;22:e90–e98. [PubMed]
  • Dowell RD, Eddy SR. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics. 2004;5:71. doi: 10.1186/1471-2105-5-71. [PMC free article] [PubMed] [Cross Ref]
  • Eddy SR. Computational genomics of noncoding RNA genes. Cell. 2002;109:137–140. [PubMed]
  • Ehresmann C, Baudin F, Mougel M, Romby P, Ebel J, Ehresmann B. Probing the structure of RNAs in solution. Nucleic Acids Res. 1987;15:9109–9128. [PMC free article] [PubMed]
  • Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: An RNA family database. Nucleic Acids Res. 2003;31:439–441. [PMC free article] [PubMed]
  • Gutell RR. Collection of small subunit (16S- and 16S-like-) ribosomal RNA structures. Nucleic Acids Res. 1994;22:3502–3507. [PMC free article] [PubMed]
  • Gutell RR, Gray MW, Schnare MN. A compilation of large subunit (23S- and 23S-like) ribosomal RNA structures. Nucleic Acids Res. 1993;21:3055–3074. [PMC free article] [PubMed]
  • Hamada M, Kiryu H, Sato K, Mituyama T, Asai K. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009;25:465–473. [PubMed]
  • Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002;319:1059–1066. [PubMed]
  • Jaeger JA, Turner DH, Zuker M. Improved predictions of secondary structures for RNA. Proc Natl Acad Sci. 1989;86:7706–7710. [PMC free article] [PubMed]
  • Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics. 2007;23:434–441. [PubMed]
  • Knapp G. Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol. 1989;180:192–212. [PubMed]
  • Larsen N, Samuelsson T, Zwieb C. The signal recognition particle database (SRPDB). Nucleic Acids Res. 1998;26:177–178. [PMC free article] [PubMed]
  • Layton DM, Bundschuh R. A statistical analysis of RNA folding algorithms through thermodynamic parameter perturbation. Nucleic Acids Res. 2005;33:519–524. [PMC free article] [PubMed]
  • Lu ZJ, Turner DH, Mathews DH. A set of nearest-neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 2006;34:4912–4924. [PMC free article] [PubMed]
  • Mathews DH. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA. 2004;10:1178–1190. [PMC free article] [PubMed]
  • Mathews DH, Turner DH. Experimentally derived nearest-neighbor parameters for the stability of RNA three- and four-way multibranch loops. Biochemistry. 2002;41:869–880. [PubMed]
  • Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16:270–278. [PubMed]
  • Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J Mol Biol. 1999;288:911–940. [PubMed]
  • Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci. 2004;101:7287–7292. [PMC free article] [PubMed]
  • McCaskill JS. The equilibrium partition function and base pair probabilities for RNA secondary structure. Biopolymers. 1990;29:1105–1119. [PubMed]
  • Michel F, Umesono K, Ozeki H. Comparative and functional anatomy of group II catalytic introns—a review. Gene. 1989;82:5–30. [PubMed]
  • Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci. 1980;77:6309–6313. [PMC free article] [PubMed]
  • Parisien M, Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature. 2008;452:51–55. [PubMed]
  • Ponty Y. Modélisation de séquences génomiques structurées, génération aléatoire et applications. Université Paris-Sud; Paris: 2006.
  • Schnare MN, Damberger SH, Gray MW, Gutell RR. Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA. J Mol Biol. 1996;256:701–719. [PubMed]
  • Smit S, Rother K, Heringa J, Knight R. From knotted to nested RNA structures: A variety of computational methods for pseudoknot removal. RNA. 2008;14:410–416. [PMC free article] [PubMed]
  • Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998;26:148–153. [PMC free article] [PubMed]
  • Steger G, Hofmann H, Fortsch J, Gross HJ, Randles JW, Sanger HL, Riesner D. Conformational transitions in viroids and virusoids: Comparison of results from energy minimization algorithm and from experimental data. J Biomol Struct Dyn. 1984;2:543–571. [PubMed]
  • Szymanski M, Specht T, Barciszewska MZ, Barciszewski J, Erdmann VA. 5S rRNA data bank. Nucleic Acids Res. 1998;26:156–159. [PMC free article] [PubMed]
  • Torarinsson E, Yao Z, Wiklund ED, Bramsen JB, Hansen C, Kjems J, Tommerup N, Ruzzo WL, Gorodkin J. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res. 2008;18:242–251. [PMC free article] [PubMed]
  • Uzilov AV, Keegan JM, Mathews DH. Detection of noncoding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics. 2006;7:173. doi: 10.1186/1471-2105-7-173. [PMC free article] [PubMed] [Cross Ref]
  • Waring RB, Davies RW. Assessment of a model for intron RNA secondary structure relevant to RNA self-splicing—a review. Gene. 1984;28:277–291. [PubMed]
  • Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci. 2005;102:2454–2459. [PMC free article] [PubMed]
  • Xia T, SantaLucia J, Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick pairs. Biochemistry. 1998;37:14719–14735. [PubMed]
  • Zuker M. On finding all suboptimal foldings of an RNA molecule. Science. 1989;244:48–52. [PubMed]
  • Zwieb C, van Nues RW, Rosenblad MA, Brown JD, Samuelsson T. A nomenclature for all signal recognition particle RNAs. RNA. 2005;11:7–13. [PMC free article] [PubMed]

Articles from RNA are provided here courtesy of The RNA Society