RNA. May 2005; 11(5): 578–591.
PMCID: PMC1370746

Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency

Abstract

We present results of computer experiments indicating that several RNAs whose native state (minimum free energy secondary structure) is functionally important (type III hammerhead ribozymes, signal recognition particle RNAs, U2 small nuclear spliceosomal RNAs, certain riboswitches, etc.) all have lower folding energy than random RNAs of the same length and dinucleotide frequency. Additionally, we find that whole mRNA, as well as the 5′-UTR, 3′-UTR, and cds regions of mRNA, have folding energies comparable to those of random RNA, although there may be a faint, statistically insignificant signal in the 3′-UTR and cds regions. Various authors have used (approximate) nucleotide pattern matching and minimum free energy computation as filters to detect potential RNAs in ESTs and genomes. We introduce a new concept, the asymptotic Z-score, and describe a fast, whole-genome scanning algorithm to compute asymptotic minimum free energy Z-scores of moving-window contents. Asymptotic Z-score computations offer an additional filter, to be used along with nucleotide pattern matching and minimum free energy computations, to detect potential functional RNAs in ESTs and genomic regions.

Keywords: tRNA, folding energy, RNA secondary structure, structural RNA, asymptotic Z-score

INTRODUCTION

In Le et al. (1990b), it was shown that RNA stem–loop structures situated 3′ to the frameshift sites of the retroviral gag–pol and pro–pol regions of several viruses (human immunodeficiency virus HIV-1, Rous sarcoma virus RSV, etc.) are thermodynamically stable and stand out from the surrounding positions within 300 nt upstream and downstream of the frameshift site. Using Zuker’s algorithm (Zuker and Stiegler 1981; Mathews et al. 2000; Zuker 2003) to compute the minimum free energy (mfe) secondary structure of RNA, Le et al. (1990a) showed that certain RNAs have lower folding energy (i.e., minimum free energy of the predicted secondary structure) than random RNA of the same mononucleotide (or compositional) frequency. This was measured by permuting nucleotide positions (i.e., performing mononucleotide shuffles) and then computing the Z-score of the mfe of the real RNA with respect to the mfe of the random RNAs (see Materials and Methods for details).

In Seffens and Digby (1999), it was shown that the folding energy of mRNA is lower than that of random RNA of the same mononucleotide frequency, as measured by the Z-score of the mfe of mRNA with respect to mononucleotide shuffles of the mRNA. In Rivas and Eddy (2000), a moving-window, whole-genome scanning algorithm was developed to compute Z-scores of genome windows with respect to mononucleotide shuffles of the window contents. Rivas and Eddy constructed artificial data by planting samples of real RNA (RNase P RNA, T5 tRNA, soybean SSU rRNA, etc.) in the center of a background sequence of random RNA of the same compositional frequency (see their Figs. 4–11). They found that the planted RNA had a low Z-score, as expected; however, other regions of the artificial data displayed low Z-scores as well. By considering p-values under an assumed extreme value distribution, Rivas and Eddy argued that Z-scores of genomic window contents are not statistically reliable enough to serve as the basis of an RNA gene finder.

FIGURE 4.
Histograms of Z-scores of minimum free energy (mfe) of RNA riboswitch classes versus 1000 random RNAs of the same dinucleotide frequency using Algorithm 4. The curves, from left to right, correspond to lysine riboswitch, ykoK element, cobalamin ...
FIGURE 5.
Histograms of Z-scores of minimum free energy (mfe) of RNA classes versus 1000 random RNAs of the same dinucleotide frequency using Algorithm 4. The curves, from left to right, correspond to 530 tRNAs from Sprinzl’s database, whole ...
FIGURE 6.
Z-score and p-value correlation for nonstructural RNAs.
FIGURE 7.
Z-score and p-value correlation for structural RNAs.
FIGURE 8.
Z-score and p-value correlation for riboswitches.
FIGURE 9.
A plot of Z-scores and asymptotic Z-scores for 32-nt windows of artificial data obtained by planting SECIS element fruA CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG in the middle of random RNA of compositional frequency A = 0.28125, C = 0.28125, G = 0.40625, and ...
FIGURE 10.
A plot of asymptotic Z-scores for 32-nt windows of artificial data obtained by planting SECIS element fruA at position 1000 in random RNA of compositional frequency A = 0.28125, C = 0.28125, G = 0.40625, and U = 0.03125 (i.e., of the same compositional ...

Workman and Krogh (1999) noted that Zuker’s algorithm (Zuker and Stiegler 1981) computes the minimum free energy (mfe) of a secondary structure by adding negative (stabilizing) energy contributions for stacked base pairs and positive (destabilizing) contributions for hairpin loops, bulges, internal loops, and multiloops. Zuker’s algorithm uses experimentally determined stacking energies, together with loop energies for hairpins, bulges, and internal loops of various lengths, as measured by D. Turner’s lab (see Mathews et al. 1999). The energy contributed by a base pair depends on the base pair (if any) on which it is stacked; for instance, Turner’s current rules (Xia et al. 1998) at 37°C assign a stacking free energy of −2.24 kcal/mol to 5′-AC-3′/3′-UG-5′, of −3.26 kcal/mol to 5′-CC-3′/3′-GG-5′, and of −2.08 kcal/mol to 5′-AG-3′/3′-UC-5′.

For this reason, Workman and Krogh (1999) argued that, for any valid conclusions to be drawn, random RNA must be generated with the same dinucleotide frequency as the native sequence. Their experiments using mfold indicated that, in contrast to the results of Seffens and Digby (1999) mentioned above, mRNA does not have a statistically significantly lower mfe than random RNA of the same dinucleotide frequency. This is consistent with the idea that mRNA exists in an ensemble of low-energy states and lacks any functional structure. Workman and Krogh also considered a small sample of five rRNAs and five tRNAs; for the latter, they stated, “Surprisingly, the tRNAs do not show a very clear difference between the native sequence and dinucleotide shuffled, and one of the native sequences even has a higher energy than the average of the shuffled ones” (Workman and Krogh 1999).

In this paper, we use Zuker’s algorithm as implemented in version 1.5 of the Vienna RNA Package program RNAfold (http://www.tbi.univie.ac.at/~ivo/RNA/) to compute the minimum free energy of RNA sequences, and we analyze the following RNA classes: tRNA, type III hammerhead ribozymes, SECIS elements, U1 and U2 small nuclear RNA (snRNA) components of the spliceosome, signal recognition particle RNA (srpRNA), seven classes of riboswitches (namely, purine, lysine, cobalamin, THI element, S-box leader, RFN element, ykoK element), 5S ribosomal RNA, entire mRNA, and the 3′-UTR (untranslated region), 5′-UTR, and coding sequence (cds) of mRNA. Structural RNAs were chosen using information from the Rfam database (Griffiths-Jones et al. 2003) and the SCOR (Structural Classification of RNA) database (Klosterman et al. 2002). Workman and Krogh (1999) used a heuristic to perform the dinucleotide shuffle; because their heuristic is not guaranteed to sample correctly from the random RNAs having a given number of each dinucleotide, we have instead implemented the provably correct procedure of Altschul and Erickson (1985). We provide both Python source code and a Web server for our implementation of the Altschul-Erickson algorithm (see http://clavius.bc.edu/~clotelab/). The present work confirms the conclusion of Workman and Krogh (1999) concerning mRNA. Concerning their conclusion about tRNA, we used the database of 530 tRNAs of Sprinzl et al. (1998), generating 1000 random RNAs for each tRNA, and show that Z-scores for tRNA are low (~−1.5), although not as low as those of certain other classes of structural RNA (~−4), and that there is a statistically significant, although moderate, signal in the Z-scores of tRNAs from Sprinzl’s database, with p-values of ~0.12. See also the related work of Bonnet et al. (2004), who investigate Z-scores and p-values of the minimum free energy of precursor microRNAs.

Additionally, in this paper, we introduce the novel concept of the “asymptotic Z-score.” By proving that the mean and standard deviation of the minimum free energy per nucleotide of random RNA converge to asymptotic limits, we show how to perform certain precomputations that yield an enormous speed-up when computing asymptotic Z-scores in whole-genome, sliding-window scanning algorithms. This method provides a filter that may be used along with (approximate) pattern matching, minimum free energy computations, and other filters when attempting to identify putative functional RNA genes in ESTs and genomic data.

Various researchers have used a combination of filters to identify potential RNAs of interest. Kryukov et al. (1999) developed the program SECISearch, which uses PATSCAN (Dsouza et al. 1997) to filter for nucleotide sequences that approximately match SECIS elements (e.g., an AA dinucleotide is required in an internal loop of the SECIS secondary structure, along with certain other nucleotide constraints). SECISearch then uses Vienna RNA Package RNAfold to compute free energies of the SECIS secondary structure. Lescure et al. (1999) developed a filter using the tool RNAMOT (Gautheret et al. 1990; Laferriere et al. 1994) to find approximate pattern matches in human ESTs for the known SECIS stem–loop structure with certain nucleotide constraints. Following experimental validation of the SECIS elements found by Lescure et al. (1999), their secondary structure was determined by chemical probing (Fagegaltier et al. 2000).

Lim et al. (2003) found vertebrate microRNA (miRNA) genes by devising a computational procedure, MiRscan, to identify potential miRNA genes. MicroRNAs (Harborth et al. 2003; Tuschl 2003) are RNAs of approximately 21 nt, processed from precursors that form a characteristic stem–loop secondary structure; they are (approximately) the reverse complement of a portion of a transcribed mRNA and repress translation of the protein product. MiRscan (Lim et al. 2003) scans the genome with a moving window of 110 nt, uses the Vienna RNA Package (C. Burge, pers. comm.) to determine stem–loop structures, and then assigns each window a log-likelihood score measuring how well its attributes resemble those of certain experimentally verified miRNAs of Caenorhabditis elegans and their Caenorhabditis briggsae homologs.

Klein et al. (2002) scanned for GC-rich regions in the AT-rich genomes of Methanococcus jannaschii and Pyrococcus furiosus to identify noncoding RNA genes. More recently, Hofacker et al. (2004) developed a fast, whole-genome version of RNAfold that determines minimum free energy structures along entire genomes, subject to the constraint that paired positions i, j lie at most a user-specified distance apart (e.g., 100 nt). See also the relevant work of Eddy (2001, 2002), Macke et al. (2001), and Washietl and Hofacker (2004).

Although Rivas and Eddy (2000) argued that genome-scanning computations of Z-scores in which randomized window contents preserve mononucleotide frequency (Algorithm 2) are not statistically reliable enough to serve as the basis of a general ncRNA gene finder, it is nevertheless possible that Z-score computations in which randomized window contents preserve dinucleotide frequency (Algorithms 3 or 4) may serve as one of several filters for identifying RNA of interest. Such Z-score computations are enormously time-consuming, especially for large window sizes. Owing to a precomputation phase, asymptotic Z-scores, introduced in this paper, may provide a computationally efficient filter for identifying certain RNAs. In all of our computational experiments, asymptotic Z-scores have had a substantially higher signal-to-noise ratio than (classical) Z-scores, although at present we do not understand why.

RESULTS

As described in detail in Materials and Methods, we performed experiments on tRNA, SECIS elements, hammerhead type III ribozymes, and other structural RNAs, as well as on whole mRNA and on the cds, 5′-UTR, and 3′-UTR regions of mRNA. For each RNA sequence s from a given class (e.g., tRNA), we compute the minimum free energy of s, as well as that of a large number of random RNAs having the same expected (Algorithm 3) or the same exact (Algorithm 4) dinucleotide frequency as s. From these data, we compute the Z-score (the number of standard deviations to the right or left of the mean) of each RNA sequence, and produce the histograms summarized in Tables 1 and 2 and the related figures.

TABLE 1.
Z-score statistics for structural RNA compared to random RNA of the same expected dinucleotide frequency using Algorithm 3
TABLE 2.
Z-score and p-value statistics for structural RNA compared to random RNA of the same dinucleotide frequency using Algorithm 4

Tables 1 and 2 give the number of sequences and the mean, standard deviation, maximum, and minimum Z-scores for each investigated class of RNA. For Table 1, we computed Z-scores with respect to random RNA of the same expected dinucleotide frequency, using Algorithm 3, while for Table 2 we computed Z-scores with respect to random RNA of the same (exact) dinucleotide frequency, using the provably correct Altschul-Erickson algorithm (Algorithm 4). Since we correct an assertion of Workman and Krogh (1999) concerning tRNA, we implemented their method of computing p-values and list the p-values for all investigated classes of RNA in Table 2.

All classes of structurally important RNA that we investigate (with the exception of the TPP riboswitch, i.e., the THI element) show significantly lower folding energy than random RNAs of the same dinucleotide frequency, using both Algorithms 3 and 4. In contrast, for entire mRNA, as well as for the 5′-UTR, 3′-UTR, and cds of mRNA, the folding energy is approximately that of random RNA of the same (expected or exact) dinucleotide frequency. Figures 1 and 2 present histograms of Z-score data for all RNA classes, where Z-scores were computed with respect to random RNA of the same expected dinucleotide frequency, as generated by Algorithm 3. Figures 3 and 5 present analogous histograms, with Z-scores computed with respect to random RNA generated by Algorithm 4. Figure 4 presents histograms of Z-score data for the seven classes of riboswitches found in the current Rfam release, using Algorithm 4. As an additional test of our assertion that structural RNA has lower folding energy than random RNA of the same dinucleotide frequency (as generated by Algorithm 4), Figure 6 plots p-values against Z-scores for nonstructural RNA, while Figures 7 and 8 plot p-values against Z-scores for structural RNAs and for the seven classes of riboswitches in the current release of Rfam, respectively. Figure 6 is similar to Figure 2 of Workman and Krogh (1999), although we additionally compute separate Z-scores for the 5′-UTR, 3′-UTR, and cds regions of mRNA as well as for whole mRNA, and we use the Altschul-Erickson algorithm to generate random RNA. Figures 7 and 8 furnish additional evidence that tRNA and other structural RNAs have lower folding energy than random RNA of the same dinucleotide frequency.
A Web server and Python source code for our implementation of the Altschul-Erickson algorithm are available at the Clote Lab Web site given above. We are currently computing Z-scores and p-values for all of Rfam; when this is complete, the results will be summarized on this Web site.

FIGURE 1.
Histograms of Z-scores of minimum free energy (mfe) of RNA classes versus 1000 random RNAs of the same expected dinucleotide frequency using Algorithm 3. The curves, from left to right, correspond to signal recognition particle (srp) RNA, U2 small nuclear ...
FIGURE 2.
Histograms of Z-scores of minimum free energy (mfe) of RNA classes versus 1000 random RNAs of the same expected dinucleotide frequency using Algorithm 3. The curves, from left to right, correspond to 530 tRNAs from Sprinzl’s database, and ...
FIGURE 3.
Histograms of Z-scores of minimum free energy (mfe) of RNA classes versus 1000 random RNAs of the same dinucleotide frequency using Algorithm 4. The curves, from left to right, correspond to U2 small nuclear RNA, signal recognition ...

In this section (and in more detail in Materials and Methods), we introduce the new concept of the “asymptotic Z-score” and state a new theorem, whose proof is given in the Appendix. The theorem states that for every complete set of dinucleotide frequencies $\vec{q}_{xy}$, there exist values $\mu(\vec{q}_{xy})$ (the asymptotic mean minimum free energy per nucleotide) and $\sigma(\vec{q}_{xy})$ (the asymptotic standard deviation of minimum free energy per nucleotide) with the following property. If $x_0, x_1, x_2, \ldots$ is a sequence of random variables generated by a first-order Markov process from the dinucleotide frequencies $\vec{q}_{xy}$, then the limits

$$\mu(\vec{q}_{xy}) = \lim_{n \to \infty} E\left[\frac{\mathrm{mfe}(x_0, \ldots, x_{n-1})}{n}\right]$$

and

$$\sigma(\vec{q}_{xy}) = \lim_{n \to \infty} \sqrt{\mathrm{Var}\left[\frac{\mathrm{mfe}(x_0, \ldots, x_{n-1})}{n}\right]}$$

both exist and depend only on $\vec{q}_{xy}$.

We can now precompute a table of the values $\mu(\vec{q}_{xy})$ and $\sigma(\vec{q}_{xy})$ for all complete sets $\vec{q}_{xy}$ of dinucleotide frequencies, where the frequencies are specified to (say) two decimal places. Given an RNA sequence $a_1, \ldots, a_n$, we compute its dinucleotide frequencies $\vec{q}_{xy}$. The asymptotic minimum free energy Z-score, defined by

$$Z_{\mathrm{asymp}}(a_1, \ldots, a_n) = \frac{\mathrm{mfe}(a_1, \ldots, a_n)/n - \mu(\vec{q}_{xy})}{\sigma(\vec{q}_{xy})},$$

can be computed with a single application of Zuker’s algorithm to $a_1, \ldots, a_n$, together with a table look-up of the precomputed approximations of $\mu(\vec{q}_{xy})$ and $\sigma(\vec{q}_{xy})$. Figure 9 displays both Z-scores and asymptotic Z-scores for all windows of size 32 in the artificial genome constructed by planting the SECIS element fruA in the middle of random RNA of the same expected mononucleotide frequency. In this figure, Z-scores were computed using the Altschul-Erickson dinucleotide shuffle (Algorithm 4), and asymptotic Z-scores were computed by Algorithm 7. Although we are unsure why, asymptotic Z-scores give a greatly improved signal-to-noise ratio compared to Z-scores.
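The look-up procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 7: the table is a hypothetical dict mapping dinucleotide frequencies rounded to two decimal places to precomputed (μ, σ) pairs, the asymptotic Z-score is taken to be (mfe(a)/n − μ)/σ, and `energy` is a placeholder for an mfe routine such as RNAfold. Each window costs one `energy` call plus one dictionary look-up, instead of one fold per randomization.

```python
from collections import Counter

NUC = "ACGU"


def frequency_key(seq, digits=2):
    """Dinucleotide frequencies of seq, rounded to `digits` places (hypothetical table key)."""
    pairs = Counter(zip(seq, seq[1:]))
    total = len(seq) - 1
    return tuple(round(pairs[(x, y)] / total, digits) for x in NUC for y in NUC)


def asymptotic_z_score(seq, energy, table, digits=2):
    """(mfe per nucleotide - mu) / sigma, with (mu, sigma) looked up rather than resampled."""
    mu, sigma = table[frequency_key(seq, digits)]  # KeyError if this key was never precomputed
    return (energy(seq) / len(seq) - mu) / sigma


def scan(genome, window, energy, table, step=1):
    """Yield (start, asymptotic Z-score) for each window: one energy() call per window."""
    for start in range(0, len(genome) - window + 1, step):
        yield start, asymptotic_z_score(genome[start:start + window], energy, table)
```

A real table would need an entry, or a nearest-neighbour fallback, for every rounded frequency vector that occurs in the genome; that policy is left open in this sketch.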

DISCUSSION

Seffens and Digby (1999) observed that mRNA has lower folding energy than random RNA of the same mononucleotide frequency, the latter being obtained by permuting nucleotide positions. Later, Workman and Krogh (1999) made the important observation that, because of the nature of base-stacking free energies, preserving dinucleotide frequency is critical, and that with respect to folding energy, mRNA cannot be distinguished from random RNA of the same dinucleotide frequency. Workman and Krogh further suggested, on the basis of their limited data set of five tRNAs, that the same holds for tRNA.

Our computation of both Z-scores and p-values on the much larger data set of 530 tRNAs from the tRNA database of M. Sprinzl, K.S. Vassilenko, J. Emmerich, and F. Bauer (http://www.staff.uni-bayreuth.de/~btc914/search/) indicates that tRNAs from Sprinzl’s database have lower folding energy than random RNA of the same dinucleotide frequency, although the p-value is only around 0.12. More generally, by considering tRNAs, type III hammerhead ribozymes, SECIS sequences, srpRNAs, snRNAs, and so on, we show that many important classes of structural RNA have lower folding energy than random RNA of the same dinucleotide frequency. Our careful tabulation of Z-scores may prove useful in future work on moving-window, genome-scanning algorithms, where one might attempt to detect a particular class of structural RNA by looking for regions whose Z-score is close to that listed for the class in Table 2.

It is known that tRNA contains certain modified nucleotides; for example, aspartyl tRNA from Saccharomyces cerevisiae (PDB entry 1ASY) includes two dihydrouridines, three pseudouridines, one 5-methylcytidine, and one 1-methylguanosine. For this paper, we replaced all modified nucleotides annotated in Sprinzl’s database by the corresponding unmodified nucleotides (e.g., dihydrouridine by uridine) and then applied RNAfold to the resulting tRNA sequences. It seems likely that the computed energies of tRNAs differ from their experimentally determined energies, and that such a discrepancy similarly affects the predicted energies of the randomized tRNAs. This might explain the relatively high Z-scores and p-values of tRNA compared to other structural RNA classes.

While Workman and Krogh (1999) considered whole mRNA, we additionally considered the 5′-UTR, 3′-UTR, and cds of the same mRNAs. Tables 1 and 2 provide evidence that these mRNA subregions do not have lower folding energy than random RNA of the same dinucleotide frequency, although Table 2 shows negative Z-scores of −0.111613 (respectively, −0.132962) for the 3′-UTR (respectively, cds) of mRNA, suggesting a slightly discernible signal in both the 3′-UTR and the cds of mRNA. A possible explanation for the statistically insignificant signal in the 3′-UTR, which contains regulatory elements (for a recent review, see Wilkie et al. 2003), is that these structural regulatory elements are short and dispersed throughout the UTR, which in many cases may be very long.

Moreover, we present evidence that riboswitches (metabolite-binding domains found within certain messenger RNAs), like other structural RNAs, have lower folding energy than random RNA of the same dinucleotide frequency, the only exception being the THI element (TPP riboswitch). The TPP riboswitch is found in the 5′ region of mRNAs of genes involved in thiamine biosynthesis and transport (Miranda-Rios et al. 2001) and is able to bind thiamine and its pyrophosphate derivatives (Winkler et al. 2002), resulting in reduced translation. The interaction with thiamine is thought to depend on the secondary structure adopted by this riboswitch; therefore, the near-zero Z-score of this class of riboswitches is unexpected, and we currently have no explanation for this observation.

Figures 1–5 present superimposed histograms of Z-scores for the RNAs analyzed. The general trend is a shift toward negative values in the curves for structural RNAs. Z-score curves obtained using Algorithms 3 and 4 are quite similar, although the small discrepancy between the two algorithms for 3′-UTR regions of mRNA suggests that Algorithm 4 should be preferred when possible.

Together, the work of Seffens and Digby (1999) and of Workman and Krogh (1999) provides strong evidence that the mononucleotide shuffle (Algorithm 2) and the 0-order Markov chain (Algorithm 1) should never be used when computing Z-scores. The slight discrepancy between Tables 1 and 2 for 3′-UTR regions of mRNA likewise suggests that, when computing Z-scores, Algorithm 4 should be used in preference to Algorithm 3 whenever possible.

Additionally, based on new mathematical results concerning the asymptotic behavior of random RNA (see the Appendix), we define the concept of the “asymptotic Z-score” (see Definition 6 in Materials and Methods) and show how to radically reduce the computation time of moving-window, whole-genome algorithms that compute Z-scores of window contents. Rather than computing Z-scores on the fly from each window’s randomized contents, we look up precomputed values of the asymptotic mean and standard deviation in a table and call Zuker’s algorithm only once per window, rather than tens or hundreds of times. Combined with the O(NL²) genome-scanning version of Vienna RNA Package RNAfold (Hofacker et al. 2004), our approach permits O(NL²) asymptotic Z-score computations over whole genomes.

Asymptotic Z-scores are computed with respect to long random RNA sequences (in the current paper, of length 1000 nt) of the same expected dinucleotide frequency as the window contents, generated by Algorithm 3. This is unlike the Z-score computations of Le et al. (1990a), Seffens and Digby (1999), and Rivas and Eddy (2000), which used random RNA sequences of the same length as the moving window, generated by Algorithm 2. Although we have no explanation at present, in all cases we have observed a greater signal-to-noise ratio when using asymptotic Z-scores to detect RNA genes (data not shown). This is indeed the case for Figure 9, which plots Z-scores and asymptotic Z-scores for 32-nt windows of artificial data obtained by planting the SECIS element fruA (CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG) in the middle of random RNA of compositional frequency A = 0.28125, C = 0.28125, G = 0.40625, and U = 0.03125 (i.e., of the same compositional frequency as fruA). Our preliminary work on the asymptotic Z-score raises the hope of using this approach effectively, along with other heuristic filters, to detect RNA of interest.

MATERIALS AND METHODS

For expository reasons, in this section we describe the computer experiments we performed for tRNA. Additional experiments on mRNA, SECIS elements, hammerhead type III ribozymes, and so on were set up identically. Unless otherwise stated, we generated 1000 random RNAs per (real) RNA sequence in each experiment. Using the mono- and dinucleotide frequencies for tRNA from Table 1, we generated random RNAs for each of the 530 tRNAs in the database of Sprinzl et al. (1998) by two methods, which we respectively call First-order Markov (Algorithm 3) and Dinucleotide Shuffle (Algorithm 4), and computed the mfe using RNAfold. The First-order Markov method generates random RNAs from a first-order Markov chain and was considered by Workman and Krogh (1999), although it is unclear whether they generated the first nucleotide by sampling (as we do) or from the uniform distribution on A, C, G, and U.

Algorithm 1 (sampling from 0-order Markov chain)

INPUT: An RNA sequence a = a1, …, an

OUTPUT: An RNA sequence x1, …, xn of the same expected mononucleotide frequency as a1, …, an

1. Compute the mononucleotide frequency F1(a) of a = a1, …, an; thus, F1(a)[A] = qA, F1(a)[C] = qC, F1(a)[G] = qG, F1(a)[U] = qU.

2. for i = 1 to n

r = random in (0,1)

if r < qA then xi = ‘A’

else if r < qA + qC then xi = ‘C’

else if r < qA + qC + qG then xi = ‘G’

else xi = ‘U’

3. return x1, …, xn
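Algorithm 1 can be sketched in Python as follows; this is an illustration, not the authors' implementation. Instead of the explicit cumulative comparisons in step 2, it passes qA, qC, qG, qU as weights to `random.Random.choices`, which performs the same inverse-CDF sampling independently at every position.

```python
import random
from collections import Counter


def sample_order0(seq, rng=None):
    """Algorithm 1: draw each position independently from the composition of seq.

    The result has the same *expected* (not exact) mononucleotide frequency as seq.
    """
    if rng is None:
        rng = random.Random()
    n = len(seq)
    if n == 0:
        return ""
    counts = Counter(seq)
    alphabet = "ACGU"
    weights = [counts[x] / n for x in alphabet]  # qA, qC, qG, qU
    return "".join(rng.choices(alphabet, weights=weights, k=n))
```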

In their computation of Z-scores, Rivas and Eddy (2000) considered the following mononucleotide shuffle.

Algorithm 2 (Mononucleotide Shuffle)

INPUT: An RNA sequence a1, …, an

OUTPUT: An RNA sequence x1, …, xn of the same (exact) mononucleotide frequency as a1, …, an

1. Generate a random permutation σ ∈ Sn of {1, …, n}.

2. for i = 1 to n

xi = aσ(i)
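A direct Python transcription of Algorithm 2 is shown below as a sketch; `random.Random.shuffle` supplies the uniformly random permutation σ, and the output is read off as xi = aσ(i).

```python
import random


def mononucleotide_shuffle(seq, rng=None):
    """Algorithm 2: apply a uniformly random permutation to the positions of seq.

    The result has exactly the same mononucleotide counts as seq.
    """
    if rng is None:
        rng = random.Random()
    positions = list(range(len(seq)))
    rng.shuffle(positions)                      # sigma, a random element of S_n
    return "".join(seq[i] for i in positions)   # x_i = a_sigma(i)
```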

Recall that Seffens and Digby (1999) observed negative Z-scores of large absolute value when computing Z-scores of mRNA using Algorithm 2, while Workman and Krogh (1999) obtained Z-scores of approximately 0 when computing Z-scores of mRNA using Algorithm 3.

Algorithm 3 (sampling from first-order Markov chain)

INPUT: An RNA sequence a1, …, an

OUTPUT: An RNA sequence x1, …, xn of the same expected dinucleotide frequency as a1, …, an

  1. Compute the mono- and dinucleotide frequencies of a1, …, an.
  2. Generate x1 by sampling from the mononucleotide frequencies.
  3. Generate the remaining nucleotides x2, …, xn by sampling from the conditional probabilities Pr[Y | X], where Pr[Y | X] equals the frequency of the dinucleotide XY divided by the mononucleotide frequency of X.
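A minimal Python sketch of Algorithm 3 follows. Two details are our assumptions rather than the text's. First, Pr[Y | X] is normalized by the number of dinucleotides that start with X, so that the conditional probabilities sum to one; this differs from dividing by the mononucleotide frequency of X only through end effects. Second, if the chain reaches a nucleotide that occurs only at the end of the input, so that Pr[Y | X] is undefined, the sketch falls back to the mononucleotide frequencies.

```python
import random
from collections import Counter, defaultdict


def sample_order1(seq, rng=None):
    """Algorithm 3: sample from the first-order Markov chain estimated from seq."""
    if rng is None:
        rng = random.Random()
    if not seq:
        return ""
    mono = Counter(seq)
    # successors[x][y] = number of occurrences of the dinucleotide xy in seq
    successors = defaultdict(Counter)
    for x, y in zip(seq, seq[1:]):
        successors[x][y] += 1
    # Step 2: x1 is sampled from the mononucleotide frequencies
    letters = sorted(mono)
    out = [rng.choices(letters, weights=[mono[c] for c in letters])[0]]
    # Step 3: x2..xn sampled from Pr[Y | X], proportional to the count of XY
    for _ in range(len(seq) - 1):
        succ = successors[out[-1]]
        if not succ:
            # X occurs only as the final nucleotide, so Pr[Y | X] is undefined;
            # falling back to the mononucleotide frequencies is our assumption
            succ = mono
        ys = sorted(succ)
        out.append(rng.choices(ys, weights=[succ[y] for y in ys])[0])
    return "".join(out)
```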

Algorithm 4 (Dinucleotide Shuffle)

Altschul and Erickson (1985).

INPUT: An RNA sequence a1, …, an

OUTPUT: An RNA sequence x1, …, xn of the same (exact) dinucleotide frequency as a1, …, an, where x1 = a1 and xn = an; in fact, the Altschul-Erickson algorithm produces exactly the same number of dinucleotides of each type AA, AC, AG, AU, CA, CC, etc.

  1. For each nucleotide x ∈ {A, C, G, U}, create a list Lx of edges xy, one for each occurrence of the dinucleotide xy in the input RNA.
  2. For each nucleotide x ∈ {A, C, G, U} distinct from the last nucleotide an (and with Lx nonempty), randomly choose an edge from the list Lx. Let E be the set of chosen edges (note that E contains at most three elements).
  3. Let G be the graph whose edge set is E and whose vertex set consists of those nucleotides x, y such that xy is an edge in E. If some vertex of G is not connected to the last nucleotide an, then return to step 2.
  4. For each nucleotide x ∈ {A, C, G, U}, randomly permute the edges in Lx \ E. Append to the end of each Lx the edge from E (if any) that had been removed.
  5. Set x1 = a1. For i = 1 to n − 1, generate xi+1 by taking the next unused edge xixi+1 in the list Lxi.

The proof of correctness of the Altschul-Erickson dinucleotide shuffle algorithm depends on well-known criteria for the existence of an Eulerian path in a directed graph. See Altschul and Erickson (1985) for details of Algorithm 4 and its extensions.
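The five steps of Algorithm 4 can be sketched in Python as follows. This is our illustration, not the authors' published implementation: an edge xy is represented simply by the successor nucleotide y stored in the list for x, and the restart in step 3 is implemented as a rejection loop that checks whether following the chosen last edges from every vertex reaches an.

```python
import random
from collections import defaultdict


def _reaches(x, target, last_edge):
    """Follow chosen last edges from x; True if the walk arrives at target."""
    seen = set()
    while x != target:
        if x in seen:          # cycle among chosen edges: not a tree into target
            return False
        seen.add(x)
        x = last_edge[x]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Algorithm 4 (sketch): shuffle seq while preserving its exact dinucleotide counts."""
    if rng is None:
        rng = random.Random()
    if len(seq) <= 2:
        return seq
    last = seq[-1]
    # Step 1: L_x holds one successor y per occurrence of the dinucleotide xy
    lists = defaultdict(list)
    for x, y in zip(seq, seq[1:]):
        lists[x].append(y)
    # Steps 2-3: choose a last edge out of every x != a_n until they form a tree into a_n
    while True:
        last_edge = {x: rng.choice(ys) for x, ys in lists.items() if x != last}
        if all(_reaches(x, last, last_edge) for x in last_edge):
            break
    # Step 4: shuffle L_x minus its chosen edge, then put that edge at the end
    for x, ys in lists.items():
        if x in last_edge:
            ys.remove(last_edge[x])
            rng.shuffle(ys)
            ys.append(last_edge[x])
        else:
            rng.shuffle(ys)
    # Step 5: walk from a_1, always taking the next unused edge out of x_i
    out = [seq[0]]
    nxt = {x: 0 for x in lists}
    for _ in range(len(seq) - 1):
        x = out[-1]
        out.append(lists[x][nxt[x]])
        nxt[x] += 1
    return "".join(out)
```

Each returned sequence starts with a1, ends with an, and has exactly the same dinucleotide counts as the input. The rejection loop terminates with probability 1, because the last-exit edges of the input sequence itself always pass the step-3 test.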

Before describing our experiments, we recall that the Z-score of a number x with respect to a sequence s1, …, sN of numbers is defined by (x − μ)/σ, where μ and σ are, respectively, the mean and standard deviation of s1, …, sN. In Workman and Krogh (1999), the p-values associated with Z-scores are computed as the ratio N/D, where the numerator N is the number of Z-scores of random RNAs that exceed the Z-score of a fixed mRNA, and D is the number of Z-scores considered (see Workman and Krogh 1999 for details and an explicit graph of Z-scores vs. p-values for mRNA). Following the method of Workman and Krogh, we compute p-values and plot Z-scores against the associated p-values for all classes of RNA investigated, where random RNA sequences were obtained by the Altschul-Erickson method.
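The Z-score computation can be written in a few lines of Python, as sketched below. Here `energy` is a placeholder for an mfe routine (for example, a wrapper around RNAfold; with the ViennaRNA 2.x Python bindings, `RNA.fold(s)` returns a structure together with its mfe) and `randomize` stands for any of Algorithms 1–4. Using the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def z_score(x, sample):
    """Z-score of x relative to sample: (x - mean) / standard deviation."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)  # population s.d.; statistics.stdev is an alternative
    return (x - mu) / sigma


def mfe_z_score(seq, energy, randomize, n_random=1000):
    """Z-score of energy(seq) against n_random randomizations of seq.

    energy: placeholder for an mfe routine; randomize: placeholder for a shuffle.
    """
    native = energy(seq)
    background = [energy(randomize(seq)) for _ in range(n_random)]
    return z_score(native, background)
```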

We now describe our experiments. The lengths of the 530 tRNAs in Sprinzl’s collection (Sprinzl et al. 1998) range from 54 to 95 nt. For each tRNA, we generated 1000 random RNAs of the same expected dinucleotide frequency (using Algorithm 3) and 1000 random RNAs of the same dinucleotide frequency (using Algorithm 4). For each tRNA, we computed the Z-score of its minimum free energy (mfe), as computed by version 1.5 of Vienna RNA Package RNAfold, with respect to the mfe of the corresponding 1000 random RNAs, separately using Algorithm 3 and Algorithm 4 to generate the random sequences. We followed the same procedure for each class of RNA we investigated: 530 tRNAs from Sprinzl’s database, five SECIS elements from A. Böck of Ludwig-Maximilians-Universität München (pers. comm.), 114 hammerhead type III ribozymes, 53 U1 and 62 U2 small nuclear spliceosomal RNAs, 94 signal recognition particle RNAs (srpRNAs), seven classes of riboswitches (namely, 70 S-box leaders, 48 RFN elements, 141 THI elements, 37 purine riboswitches, 48 lysine riboswitches, 82 cobalamin riboswitches, 40 ykoK elements), and 100 5S rRNAs. The hammerhead ribozyme, U1, U2, srpRNA, and riboswitch sequences were taken from their respective Rfam seed alignments (Griffiths-Jones et al. 2003). The 100 5S rRNAs were sampled randomly from the very large Rfam seed alignment. Moreover, we considered the same mRNAs previously considered by Seffens and Digby (1999) and Workman and Krogh (1999); here, because of the length of mRNA sequences, we generated only 10 random RNAs per mRNA. Seffens and Digby considered 51 mRNAs.

Workman and Krogh considered a subset of 46 mRNAs, previously investigated in Seffens and Digby (1999), and explained their reasons for not including five spurious mRNAs considered by Seffens and Digby. We were not able to find five of these mRNAs in the latest GenBank release (namely, HUMIFNAB, HUMIFNAC, HUMIFNAH, SOYCHPI, XELSRBP); therefore, we included in the analysis 41 mRNAs, for which we considered the whole-length mRNA, and separately the untranslated regions (3′-UTR and 5′-UTR) and the coding sequence (cds) alone.

We now describe a new concept of “asymptotic Z-score,” motivated by a new theorem concerning an asymptotic limit result for the mean and standard deviation of minimum free energy per nucleotide for random RNA. This result, formalized in Theorem 5, is proved in detail in the Appendix.

Let $F_2 = \{q_{xy} : x, y \in \{A, C, G, U\}\}$ be any complete set of dinucleotide frequencies; that is, $0 \le q_{xy} \le 1$ for all $x, y \in \{A, C, G, U\}$ and $\sum_{x,y} q_{xy} = 1$, where the sum is taken over all $x, y \in \{A, C, G, U\}$. Define $F_1 = \{q_x : x \in \{A, C, G, U\}\}$ to be the corresponding set of mononucleotide frequencies; that is, $q_x = \sum_u q_{ux}$, where the sum ranges over $u \in \{A, C, G, U\}$. We may at times say that the mononucleotide distribution $F_1$ is induced by the complete dinucleotide distribution $F_2$; moreover, we may use the notation $\vec{q}_{xy}$ to abbreviate $F_2$, and $\vec{q}_x$ to abbreviate $F_1$.
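As a concrete illustration of these definitions, the following sketch computes $q_{xy}$ from a single sequence and the induced $q_x = \sum_u q_{ux}$. It assumes linear counting (no wrap-around pair), which the text does not specify; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Tuple

ALPHABET = "ACGU"
DinucFreqs = Dict[Tuple[str, str], float]


def dinucleotide_frequencies(seq: str) -> DinucFreqs:
    """q_xy for all 16 pairs: count of adjacent xy in seq divided by len(seq) - 1.

    Counting is linear (no wrap-around); seq must have length >= 2.
    """
    pairs = Counter(zip(seq, seq[1:]))
    total = len(seq) - 1
    return {(x, y): pairs[(x, y)] / total for x, y in product(ALPHABET, repeat=2)}


def induced_mononucleotide_frequencies(q2: DinucFreqs) -> Dict[str, float]:
    """q_x = sum over u of q_ux (frequency of x as the second letter of a pair)."""
    return {x: sum(q2[(u, x)] for u in ALPHABET) for x in ALPHABET}
```

For "ACGU", the three pairs AC, CG, and GU each get frequency 1/3, and the induced $q_A$ is 0 because A never occurs as the second letter of a pair; the induced frequencies can therefore differ slightly from plain composition at sequence ends.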

Theorem 5

Let $\vec{q}_{xy}$ be a complete set of dinucleotide frequencies, let $\vec{q}_x$ be the induced set of mononucleotide frequencies, and let X denote the infinite sequence of random variables $x_0, x_1, x_2, \ldots$ such that $x_0$ has the distribution $\vec{q}_x$, and for all i, $x_{i+1}$ has the distribution given by the conditional probabilities

$$\Pr[x_{i+1} = y \mid x_i = x] = \frac{q_{xy}}{q_x}.$$

For all $0 \le s < t$, define random variables $X_{s,t} = \mathrm{mfe}(x_s, \ldots, x_{t-1})$, where mfe denotes minimum free energy as measured by Zuker’s algorithm. Then the limits

$$\mu(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{E[X_{0,n}]}{n}$$

and

$$\sigma(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{\sqrt{\mathrm{Var}[X_{0,n}]}}{n}$$

both exist and depend only on $\vec{q}_{xy}$.
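The random sequences in Theorem 5 come from a first-order Markov chain: $x_0$ is drawn from the induced mononucleotide frequencies, and each subsequent letter is drawn according to the dinucleotide frequencies of the current letter. A minimal sampler is sketched below under stated assumptions: transition weights are the row of $q_{xy}$ for the current letter (these coincide with $q_{xy}/q_x$ when the row and column marginals of $q_{xy}$ agree), and the fallback to $q_x$ for a letter that never begins a pair is a choice made here, not part of the theorem. The function name is hypothetical.

```python
import random
from typing import Dict, Tuple

ALPHABET = "ACGU"


def sample_markov_rna(q2: Dict[Tuple[str, str], float], n: int,
                      rng: random.Random) -> str:
    """Sample x_0, ..., x_{n-1} from a first-order Markov chain built from q_xy.

    q2 must contain all 16 keys (x, y). x_0 is drawn from q_x = sum_u q_ux.
    Each next letter y is drawn with weight q_xy for the current letter x;
    random.choices normalizes the weights, giving q_xy / sum_v q_xv.
    """
    if n <= 0:
        return ""
    q1 = [sum(q2[(u, x)] for u in ALPHABET) for x in ALPHABET]
    seq = [rng.choices(ALPHABET, weights=q1)[0]]
    while len(seq) < n:
        row = [q2[(seq[-1], y)] for y in ALPHABET]
        # Assumption of this sketch: if letter x never starts a pair, fall back to q_x.
        weights = row if sum(row) > 0 else q1
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)
```

With frequencies supported only on AC and CA, for instance, every sample alternates between A and C, which is an easy sanity check that the chain follows the dinucleotide constraints.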

Although the proof gives no information on the rate of convergence, convergence appears to be fast (data not shown). Hence we can approximate the asymptotic mean per nucleotide, denoted by $\mu(\vec{q}_{xy})$, and the asymptotic standard deviation per nucleotide, denoted by $\sigma(\vec{q}_{xy})$, of the minimum free energy of random RNA generated by a first-order Markov chain from dinucleotide frequencies $\vec{q}_{xy}$, as follows.

  1. Compute minimum free energies for m random RNAs, each of length n nucleotides, as generated by Algorithm 3. In Figure 9, we used m = 50 and n = 1000.
  2. Compute the mean and (sample) standard deviation for this collection, and divide both values by n so as to normalize these values with respect to sequence length.

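Since m and n must be fixed for this computation, we denote the approximate mean by $\mu(\vec{q}_{xy}, m, n)$ and the approximate standard deviation by $\sigma(\vec{q}_{xy}, m, n)$. Thus, if $s_1, \ldots, s_m$ is a collection of m random RNA sequences, each of length n and generated by Algorithm 3 from dinucleotide frequencies $\vec{q}_{xy}$, then

$$\mu(\vec{q}_{xy}, m, n) = \frac{1}{mn} \sum_{i=1}^{m} \mathrm{mfe}(s_i)$$

$$\sigma(\vec{q}_{xy}, m, n) = \frac{1}{n} \sqrt{\frac{1}{m-1} \sum_{i=1}^{m} \left( \mathrm{mfe}(s_i) - n\,\mu(\vec{q}_{xy}, m, n) \right)^2}$$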
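The two-step approximation of $\mu(\vec{q}_{xy}, m, n)$ and $\sigma(\vec{q}_{xy}, m, n)$ can be sketched as follows, assuming a sampler (for example, one implementing Algorithm 3 for the target frequencies) and an mfe routine are supplied as callables. The names are hypothetical, and the sample standard deviation is an assumption about what "(sample) standard deviation" denotes.

```python
import random
import statistics
from typing import Callable, Tuple


def approx_asymptotic_stats(sample: Callable[[int, random.Random], str],
                            mfe: Callable[[str], float],
                            m: int = 50, n: int = 1000,
                            seed: int = 0) -> Tuple[float, float]:
    """Return (mu(q, m, n), sigma(q, m, n)).

    sample(n, rng) should return one random RNA of length n drawn for the
    target dinucleotide frequencies; mfe folds it. The mean and the sample
    standard deviation of the m energies are both divided by n.
    """
    rng = random.Random(seed)
    energies = [mfe(sample(n, rng)) for _ in range(m)]
    return statistics.mean(energies) / n, statistics.stdev(energies) / n
```

The defaults m = 50 and n = 1000 match the values used for Figure 9; any sampler and folding routine with the stated signatures can be plugged in.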

As noted above, one should respect dinucleotide frequencies when performing Z-score computations. Taking this into account, we now define the “asymptotic, normalized mfe Z-score,” with respect to random RNA of dinucleotide frequencies $\vec{q}_{xy}$, as follows.

Definition 6

Given an RNA sequence s of length $n_0$ (generally $n_0 \ll n$), compute the dinucleotide frequencies $\vec{q}_{xy}$ of s. Define

$$z(s) = \frac{\mathrm{mfe}(s)/n_0 - \mu(\vec{q}_{xy}, m, n)}{\sigma(\vec{q}_{xy}, m, n)}.$$

Notice that when $n_0 = n$, we obtain the usual definition of Z-score, where randomization is performed with Algorithm 3.
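A minimal sketch of the asymptotic Z-score, assuming it normalizes as $(\mathrm{mfe}(s)/n_0 - \mu)/\sigma$ with the per-nucleotide $\mu(\vec{q}_{xy}, m, n)$ and $\sigma(\vec{q}_{xy}, m, n)$; this is the form that reduces to the usual Z-score when $n_0 = n$. The function name is hypothetical.

```python
def asymptotic_z_score(mfe_s: float, n0: int, mu: float, sigma: float) -> float:
    """Asymptotic, normalized mfe Z-score of a sequence s of length n0.

    mfe_s is mfe(s); mu and sigma are the per-nucleotide values
    mu(q, m, n) and sigma(q, m, n) for the dinucleotide frequencies of s.
    """
    return (mfe_s / n0 - mu) / sigma
```

As a check of the $n_0 = n$ case: if background energies −300, −310, −290 kcal/mol were obtained at n = 1000, then μ = −0.3 and σ = 0.01 per nucleotide, and a native mfe of −320 kcal/mol at $n_0 = 1000$ gives (−0.32 + 0.3)/0.01 = −2, the same value as the ordinary Z-score (−320 + 300)/10.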

This concludes the description of asymptotic Z-scores. Figure 9 illustrates the approach on small artificial data involving the SECIS element fruA. In future work, we plan to make available precomputed tables of $\mu(\vec{q}_{xy}, m, n)$ and $\sigma(\vec{q}_{xy}, m, n)$ for n = 1000 and m = 50 over a range of dinucleotide frequencies found in windows of viral and bacterial genomes. Although these tables are not yet available, we can already describe an algorithm to efficiently compute asymptotic Z-scores with a moving window over a whole genome.

Algorithm 7

INPUT: An entire genome g_1, …, g_N, and window size n_0

OUTPUT: Values (i, z_i), where 1 ≤ i ≤ N − n_0 + 1 is the starting position of the i-th window, and z_i is the asymptotic Z-score of the reverse complement of the i-th window

for i = 1 to N − n_0 + 1
    s = reverse complement of g_i, …, g_{i+n_0−1}
    compute mfe(s)
    compute dinucleotide frequencies q_xy of s
    for x, y ∈ {A, C, G, U}
        q_xy = int(100 * q_xy)/100
    find μ(q_xy, m, n), σ(q_xy, m, n) by table look-up
    z_i = (mfe(s)/n_0 − μ(q_xy, m, n)) / σ(q_xy, m, n)

Note that the instruction q_xy = int(100 * q_xy)/100 truncates each dinucleotide frequency q_xy to two decimal places. By using arrays with indirect addressing, table look-up takes constant time rather than linear or logarithmic time. Since Zuker’s algorithm is applied only once for each window, the run time of Algorithm 7 is $O(N n_0^3)$. By using the genome-scan version of RNAfold (see Hofacker et al. 2004), we can reduce the run time of Algorithm 7 to $O(N n_0^2)$.
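The moving-window scan of Algorithm 7 can be sketched in Python as below. This is a sketch under assumptions, not the authors’ implementation: the lookup table is a hypothetical dictionary keyed by the 16 truncated frequencies (as integer percentages), `mfe` is any folding callable, and windows whose key is missing from the table raise `KeyError`. The truncation is done in integer arithmetic because floating-point products such as 100 * 0.29 can land just below an integer, which would make `int()` truncate one step too far.

```python
from typing import Callable, Dict, Iterator, Tuple

ALPHABET = "ACGU"
# DNA or RNA letter -> RNA complement.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}
Key = Tuple[int, ...]  # floor(100 * q_xy) for the 16 pairs, in ALPHABET order


def reverse_complement_rna(window: str) -> str:
    """Reverse complement of a DNA/RNA window, written in the RNA alphabet."""
    return "".join(COMPLEMENT[b] for b in reversed(window.upper()))


def truncated_key(s: str) -> Key:
    """Two-decimal truncation of each q_xy, i.e. int(100 * q_xy), computed
    exactly as (100 * count) // total to avoid floating-point round-off."""
    total = len(s) - 1
    counts: Dict[str, int] = {}
    for a, b in zip(s, s[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    return tuple((100 * counts.get(x + y, 0)) // total
                 for x in ALPHABET for y in ALPHABET)


def scan_genome(genome: str, n0: int, mfe: Callable[[str], float],
                table: Dict[Key, Tuple[float, float]]) -> Iterator[Tuple[int, float]]:
    """Yield (i, z_i) for every window start i = 1, ..., N - n0 + 1 (1-based)."""
    for i in range(len(genome) - n0 + 1):
        s = reverse_complement_rna(genome[i:i + n0])
        mu, sigma = table[truncated_key(s)]  # hash lookup, expected constant time
        yield i + 1, (mfe(s) / n0 - mu) / sigma
```

A hash table gives expected constant-time lookup, playing the role of the indirectly addressed arrays mentioned above; the folding call inside the loop dominates the run time.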

Acknowledgments

We thank the anonymous referees and Alice Tommasi di Vignano (Harvard Medical School) for helpful suggestions concerning this work. Research was supported in part by NSERC (Natural Sciences and Engineering Research Council of Canada) and MITACS (Mathematics of Information Technology and Complex Systems) grants.

APPENDIX

In this section, we state and prove Theorem 5, which provides the mathematical justification for our algorithm to compute (approximate) asymptotic Z-scores. The following theorem, due to Kingman (1973), provides the existence of a limit for certain types of subadditive stochastic processes.

Theorem 8 (Kingman 1973)

Let $X_{s,t}$, for nonnegative integers $0 \le s < t$, denote a family of doubly indexed random variables that satisfy the following.

  1. $X_{s,t} \le X_{s,r} + X_{r,t}$ for all $s < r < t$.
  2. The joint distribution of the $X_{s,t}$ is the same as that of the $X_{s+1,t+1}$, for all $0 \le s < t$.
  3. There exists $K < 0$ such that the expectation $E[X_{0,n}] = \mu_n$ exists and satisfies $\mu_n \ge Kn$, for all natural numbers n.

Then there exists λ for which $\lim_{n \to \infty} E[X_{0,n}]/n = \lambda$.

Kingman’s theorem has applications ranging from Ulam’s problem concerning the asymptotic expected length of the longest increasing subsequence [i.e., indices $1 \le i_1 < i_2 < \cdots < i_k \le n$ such that $\sigma(i_1) < \sigma(i_2) < \cdots < \sigma(i_k)$] of a random permutation $\sigma \in S_n$ (Kingman 1973), to problems concerning restriction enzyme coverage (Waterman 1995). While Kingman’s theorem proves the existence of an asymptotic limit λ, determining the precise value of λ in concrete cases can be a very difficult open problem.

Let $\vec{q}_{xy}$ denote any complete set $\{q_{xy} : x, y \in \{A, C, G, U\}\}$ of dinucleotide frequencies; that is, $0 \le q_{xy} \le 1$ for all $x, y \in \{A, C, G, U\}$ and $\sum_{x,y} q_{xy} = 1$, where the sum is taken over all $x, y \in \{A, C, G, U\}$. Let $\vec{q}_x$ denote the set $\{q_x : x \in \{A, C, G, U\}\}$ of induced mononucleotide frequencies; that is, $q_x = \sum_u q_{ux}$, where the sum ranges over $u \in \{A, C, G, U\}$. We say that the mononucleotide distribution $\vec{q}_x$ is induced from the complete dinucleotide distribution $\vec{q}_{xy}$.

Theorem 9

Let $\vec{q}_{xy}$ be a complete set of dinucleotide frequencies, let $\vec{q}_x$ be the induced set of mononucleotide frequencies, and let X denote the infinite sequence of random variables $x_0, x_1, x_2, \ldots$ such that $x_0$ has the distribution $\vec{q}_x$, and for all i, $x_{i+1}$ has the distribution given by the conditional probabilities

$$\Pr[x_{i+1} = y \mid x_i = x] = \frac{q_{xy}}{q_x}.$$

For all $0 \le s < t$, define random variables $X_{s,t} = \mathrm{mfe}(x_s, \ldots, x_{t-1})$, where mfe denotes minimum free energy as measured by Zuker’s algorithm. Then the limits

$$\mu(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{E[X_{0,n}]}{n}$$

and

$$\sigma(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{\sqrt{\mathrm{Var}[X_{0,n}]}}{n}$$

both exist and depend only on $\vec{q}_{xy}$.

PROOF: To prove the existence of the first limit stated in Theorem 9, we claim that the collection of doubly indexed random variables $X_{s,t}$ satisfies the three conditions of Kingman’s subadditive ergodic theorem (Theorem 8).

By analysis of the pseudocode of Zuker’s algorithm, it is clear that the minimum free energy of RNA is subadditive, and hence condition (1) holds. Indeed, in the Turner energy model (Mathews et al. 1999), stacking free energies and loop energies are additive, so placing an optimal structure of $x_s, \ldots, x_{u-1}$ next to an optimal structure of $x_u, \ldots, x_{t-1}$ yields a valid secondary structure of the concatenation $x_s, \ldots, x_{t-1}$ whose free energy is the sum of the two. Hence $\mathrm{mfe}(x_s, \ldots, x_{t-1}) \le \mathrm{mfe}(x_s, \ldots, x_{u-1}) + \mathrm{mfe}(x_u, \ldots, x_{t-1})$. Here is a concrete example:

equation M16
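The concatenation argument for condition (1) can be illustrated with a much simpler stand-in for Zuker’s algorithm: the Nussinov base-pair maximization, with "energy" equal to minus the maximum number of nested pairs. This toy model is our substitution and is not the Turner energy model, but it is subadditive for the same reason: the optimal structures of two pieces, placed side by side, form a valid structure of their concatenation.

```python
from typing import List

# Watson-Crick and GU wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq: str, min_loop: int = 3) -> int:
    """Toy 'mfe' (Nussinov): minus the maximum number of nested base pairs,
    with at least min_loop unpaired bases inside every hairpin.
    This is NOT the Turner energy model."""
    n = len(seq)
    # best[i][j] = maximum number of pairs in seq[i:j] (j exclusive)
    best: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(min_loop + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            value = best[i][j - 1]  # base j-1 unpaired
            for k in range(i, j - 1 - min_loop):  # base j-1 pairs with base k
                if (seq[k], seq[j - 1]) in PAIRS:
                    value = max(value, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = value
    return -best[0][n]
```

For example, "GGGAAAUCCC" folds into three stacked pairs around a four-nucleotide loop, so its toy energy is −3, and for any two strings a and b, `toy_energy(a + b) <= toy_energy(a) + toy_energy(b)`.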

To show that condition (2) holds, we first claim that for all nonnegative integers s, $\Pr[x_s = x] = \Pr[x_0 = x] = q_x$ for any given $x \in \{A, C, G, U\}$. This is proved by induction on s. When s = 0, this holds by definition of $x_0$. Assume that $\Pr[x_s = x] = \Pr[x_0 = x] = q_x$, and consider $x_{s+1}$. Then

$$\Pr[x_{s+1} = y] = \sum_{x} \Pr[x_{s+1} = y \mid x_s = x]\,\Pr[x_s = x] = \sum_{x} \frac{q_{xy}}{q_x}\, q_x = \sum_{x} q_{xy} = q_y,$$

where the last equality follows from the definition of the induced mononucleotide frequency $q_y = \sum_u q_{uy}$. It thus follows by induction that $\Pr[x_s = u] = q_u$ for all natural numbers s and all $u \in \{A, C, G, U\}$. Since the sequence $x_0, x_1, x_2, \ldots$ of random variables is a time-homogeneous first-order Markov chain, clearly $\Pr[x_{s+1} = y \mid x_s = x] = \Pr[x_{s'+1} = y \mid x_{s'} = x]$ holds for all natural numbers s, s′, and thus by induction on n we have

$$\Pr[x_s = a_0, x_{s+1} = a_1, \ldots, x_{s+n} = a_n] = \Pr[x_0 = a_0, x_1 = a_1, \ldots, x_n = a_n]$$

for all $a_0, \ldots, a_n \in \{A, C, G, U\}$, and hence the doubly indexed random variables $X_{s,t}$ have the same joint distribution as the $X_{s+1,t+1}$, for all natural numbers $0 \le s < t$. Thus, condition (2) of Kingman’s theorem is satisfied.
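The induction step for condition (2), namely that one step of the chain maps $q_x$ to itself, can be checked exactly with rational arithmetic. This sketch assumes the dinucleotide frequencies are counted circularly (including the wrap-around pair from the last nucleotide to the first), an assumption not made in the text; it guarantees that row and column sums of $q_{xy}$ agree, so that the ratios $q_{xy}/q_x$ are genuine transition probabilities. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Tuple

ALPHABET = "ACGU"


def circular_dinucleotide_freqs(seq: str) -> Dict[Tuple[str, str], Fraction]:
    """Exact q_xy counting the wrap-around pair (last, first), so that row
    sums and column sums of q_xy coincide."""
    n = len(seq)
    counts = Counter((seq[i], seq[(i + 1) % n]) for i in range(n))
    return {(x, y): Fraction(counts[(x, y)], n) for x in ALPHABET for y in ALPHABET}


def one_step(q2: Dict[Tuple[str, str], Fraction],
             dist: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Pr[x_{s+1} = y] = sum_x (q_xy / q_x) * Pr[x_s = x], with q_x = sum_u q_ux."""
    q1 = {x: sum(q2[(u, x)] for u in ALPHABET) for x in ALPHABET}
    return {y: sum((q2[(x, y)] / q1[x] * dist[x] for x in ALPHABET if q1[x] > 0),
                   Fraction(0))
            for y in ALPHABET}
```

Applying `one_step` to the induced frequencies of a circularly counted sequence returns exactly the same frequencies, which is the equality $\Pr[x_{s+1} = y] = q_y$ used in the induction.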

We now turn to establish condition (3) of Kingman’s theorem. For fixed n, $E[X_{0,n}] = \mu_n$ must exist, since the sample space Ω = {A, C, G, U} is finite, all probability distributions for fixed n are finite, and we consider only finitely many random variables $x_0, \ldots, x_{n-1}$. Let $K_0$ be the minimum value, −3.42 kcal/mol, over all base stacking free energies from Turner’s current rules (Xia et al. 1998); for example, see “Stacking enthalpies in kcal/mol” from M. Zuker’s Web site, http://www.bioinfo.rpi.edu/~zukerm/rna/energy/. Note that base stacking free energies are all negative; hence, we are choosing the base stacking free energy whose absolute value is largest. Except for the (negative) base stacking free energies, all other energies (hairpin, bulge, internal loop, multiloop) are positive. The nearest neighbor energy model with Turner’s experimentally measured energies (Mathews et al. 1999) is additive, and there are at most n/2 base pairs in an RNA sequence of length n (positions 0 to n − 1); hence, $K_0 n/2 \le \mu_n$ for all n. It follows that (3) holds, and hence the existence of the limit

$$\mu(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{E[X_{0,n}]}{n}$$

depending only on $\vec{q}_{xy}$ follows by application of Kingman’s theorem.
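For a sense of scale, the lower bound $K_0 n/2 \le \mu_n$ used for condition (3) can be evaluated at a concrete length, here n = 100:

$$\mu_{100} \ge \frac{K_0 \cdot 100}{2} = \frac{(-3.42)(100)}{2} = -171 \ \text{kcal/mol},$$

and, equivalently for every n, $\mu_n / n \ge K_0/2 = -1.71$ kcal/mol per nucleotide.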

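To prove the existence of the second limit stated in Theorem 9, let $K = 3.42 = -K_0$, and define random variables $Z_{s,t} = K(t - s) + X_{s,t}$ and

$$Y_{s,t} = \frac{Z_{s,t}^2}{t - s}$$

for all $0 \le s < t$. We will show that the collection $Y_{s,t}$, for all $0 \le s < t$, satisfies conditions (1), (2), and (3) of Kingman’s subadditive ergodic theorem. To prove the subadditivity condition (1), that is, that $Y_{s,t} \le Y_{s,r} + Y_{r,t}$ for all $0 \le s < r < t$, fix $0 \le s < r < t$, and temporarily let

$$B = Z_{s,r}, \quad C = Z_{r,t}, \quad m = r - s, \quad n = t - r.$$

Now

$$\frac{B^2}{m} + \frac{C^2}{n} - \frac{(B + C)^2}{m + n} = \frac{(nB - mC)^2}{mn(m + n)} \ge 0.$$

Replacing B, C, m, n by the values they denote, we have shown that

$$\frac{(Z_{s,r} + Z_{r,t})^2}{t - s} \le \frac{Z_{s,r}^2}{r - s} + \frac{Z_{r,t}^2}{t - r} = Y_{s,r} + Y_{r,t}.$$

Since we have already established that $X_{s,t} \le X_{s,r} + X_{r,t}$, it follows that $K(t - s) + X_{s,t} \le K(r - s) + X_{s,r} + K(t - r) + X_{r,t}$; hence $Z_{s,t} \le Z_{s,r} + Z_{r,t}$. Moreover, the argument given for condition (3) applies pointwise, so $X_{s,t} \ge K_0(t - s)/2$ and therefore $Z_{s,t} \ge K(t - s)/2 \ge 0$; likewise $Z_{s,r} \ge 0$ and $Z_{r,t} \ge 0$. It follows that $Z_{s,t}^2 \le (Z_{s,r} + Z_{r,t})^2$ (footnote 16). Thus

$$Y_{s,t} = \frac{Z_{s,t}^2}{t - s} \le \frac{(Z_{s,r} + Z_{r,t})^2}{t - s} \le Y_{s,r} + Y_{r,t},$$

and hence $Y_{s,t} \le Y_{s,r} + Y_{r,t}$. This establishes subadditivity condition (1).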

The proof that the joint distribution of $Y_{s,t}$ is the same as that of $Y_{s+1,t+1}$ for all $0 \le s < t$ is as in our treatment of $X_{s,t}$ and $X_{s+1,t+1}$. This establishes condition (2) of Kingman’s theorem.

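Finally, since $Y_{s,t} \ge 0$,

$$E[Y_{0,n}] \ge 0 > -n \quad \text{for all } n,$$

condition (3) of Kingman’s theorem holds; thus, by application of Kingman’s theorem, it follows that the limit

$$\zeta(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{E[Y_{0,n}]}{n}$$

exists and depends only on the complete dinucleotide frequencies $\vec{q}_{xy}$. Note that

$$\frac{E[Y_{0,n}]}{n} = \frac{E[Z_{0,n}^2]}{n^2} = \frac{E[(Kn + X_{0,n})^2]}{n^2} = K^2 + 2K\,\frac{E[X_{0,n}]}{n} + \frac{E[X_{0,n}^2]}{n^2}.$$

Define $\lambda(\vec{q}_{xy}) = \zeta(\vec{q}_{xy}) - K^2 - 2K\mu(\vec{q}_{xy})$. It follows that

$$\lim_{n \to \infty} \frac{E[X_{0,n}^2]}{n^2} = \lambda(\vec{q}_{xy}).$$

Now the variance of $X_{0,n}$ satisfies $\mathrm{Var}[X_{0,n}] = E[X_{0,n}^2] - (E[X_{0,n}])^2$; thus, dividing by $n^2$ and taking square roots of both sides of the equality, we have

$$\sigma(\vec{q}_{xy}) = \lim_{n \to \infty} \frac{\sqrt{\mathrm{Var}[X_{0,n}]}}{n} = \sqrt{\lambda(\vec{q}_{xy}) - \mu(\vec{q}_{xy})^2}.$$

This completes the proof of Theorem 9.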

Notes

Footnotes

4Zuker’s algorithm was first implemented in Zuker’s mfold, subsequently in Hofacker et al.’s Vienna RNA Package RNAfold, and most recently in Mathews and Turner’s RNAstructure.

5The Z-score of x (with respect to a histogram or probability distribution) is the number of standard deviation units to the left or right of the mean for the position where x lies, that is, (x − μ)/σ.

6Figures 12 and 13 of Rivas and Eddy (2000) are similar to some of the graphs presented in this paper; however, unlike our work, Rivas and Eddy (2000) use mononucleotide shuffles to produce random sequences. As previously observed in Workman and Krogh (1999) when computing Z-scores for minimum free energies of RNA, it is important to generate random sequences that preserve dinucleotide frequency of the given RNA. Our work presents a careful analysis of a large class of RNAs using the dinucleotide shuffling Algorithm 4.

7SECIS abbreviates “selenocysteine insertion sequence,” a small (30–45 nt) portion of the 3′-UTR that forms a stem–loop structure necessary for the UGA stop codon to be recoded, allowing selenocysteine incorporation.

8After completion of this paper, we learned of the more general Web server Shufflet (Coward 1999).

9The work of Workman and Krogh (1999) focuses on mRNA, and only at the end of their article do they consider a small collection of five tRNAs, where 100 random RNAs are generated per tRNA.

10Bonnet et al. (2004) compute p-values of minimum free energies, not p-values of Z-scores as done in this paper.

11Average Z-scores have value 0, while average asymptotic Z-scores are >0, making a greater contrast with negative scores of functional RNA in computational experiments.

12Z-score is often used as a statistical measure of deviation from the mean in units of standard deviation. See Materials and Methods for formal definition.

13By structural RNA, we mean naturally occurring classes of RNA whose functionality depends on the native state, where we identify the native state with the minimum free energy secondary structure if the structure is not experimentally determined.

14For a genome of length N, successive applications of Zuker’s algorithm to window contents of size L require time O(NL³). By reusing partial computations from previous window contents, Hofacker et al. (2004) describe an improvement to O(NL²).

15In this paper, we present a proof of concept. In work in progress, we are computing dinucleotide frequencies, to two decimal places, of viral and bacterial genomes, and are computing the tables necessary for a general application of our method, to be reported elsewhere.

16In order to obtain this last inequality, we needed $Z_{s,t} \ge 0$. This is the reason for working with $Z_{s,t}$ rather than $X_{s,t}$.

REFERENCES

  • Altschul, S.F. and Erickson, B.W. 1985. Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage. Mol. Biol. Evol. 2: 526–538.
  • Bonnet, E., Wuyts, J., Rouze, P., and Van de Peer, Y. 2004. Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics 20: 2911–2917.
  • Dsouza, M., Larsen, N., and Overbeek, R. 1997. Searching for patterns in genomic data. Trends Genet. 13: 497–498.
  • Eddy, S.R. 2001. Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2: 919–929.
  • Eddy, S.R. 2002. Computational genomics of noncoding RNA genes. Cell 109: 137–140.
  • Fagegaltier, D., Lescure, A., Walczak, R., Carbon, P., and Krol, A. 2000. Structural analysis of new local features in SECIS RNA hairpins. Nucleic Acids Res. 28: 2679–2689.
  • Gautheret, D., Major, F., and Cedergren, R. 1990. Pattern searching/alignment with RNA primary and secondary structures: An effective descriptor for tRNA. Comput. Appl. Biosci. 6: 325–331.
  • Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., and Eddy, S.R. 2003. Rfam: An RNA family database. Nucleic Acids Res. 31: 439–441.
  • Harborth, J., Elbashir, S.M., Vandenburgh, K., Manninga, H., Scaringe, S.A., Weber, K., and Tuschl, T. 2003. Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing. Antisense Nucleic Acid Drug Dev. 13: 83–105.
  • Hofacker, I.L., Priwitzer, B., and Stadler, P.F. 2004. Prediction of locally stable RNA secondary structures for genome-wide surveys. Bioinformatics 20: 186–190.
  • Kingman, J.F.C. 1973. Subadditive ergodic theory. Ann. Probability 1: 893–909.
  • Klein, R.J., Misulovin, Z., and Eddy, S.R. 2002. Noncoding RNA genes identified in AT-rich hyperthermophiles. Proc. Natl. Acad. Sci. 99: 7542–7547.
  • Klosterman, P.S., Tamura, M., Holbrook, S.R., and Brenner, S.E. 2002. SCOR: A structural classification of RNA database. Nucleic Acids Res. 30: 392–394.
  • Kryukov, G.V., Kryukov, V.M., and Gladyshev, V.N. 1999. New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements. J. Biol. Chem. 274: 33888–33897.
  • Laferriere, A., Gautheret, D., and Cedergren, R. 1994. An RNA pattern matching program with enhanced performance and portability. Comput. Appl. Biosci. 10: 211–212.
  • Le, S.Y., Chen, J.H., and Maizel, J.V.J. 1990a. Efficient searches for unusual folding regions in RNA sequences. In Structure and methods: Human Genome Initiative and DNA recombination (eds. R.H. Sarma and M.H. Sarma), pp. 127–136. Adenine Press, Schenectady, NY.
  • Le, S.Y., Malim, M.H., Cullen, B.R., and Maizel, J.V. 1990b. A highly conserved RNA folding region coincident with the Rev response element of primate immunodeficiency viruses. Nucleic Acids Res. 18: 1613–1623.
  • Lescure, A., Gautheret, D., Carbon, P., and Krol, A. 1999. Novel selenoproteins identified in silico and in vivo by using a conserved RNA structural motif. J. Biol. Chem. 274: 38147–38154.
  • Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. 2003. Vertebrate microRNA genes. Science 299: 1540.
  • Macke, T.J., Ecker, D.J., Gutell, R.R., Gautheret, D., Case, D.A., and Sampath, R. 2001. RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Res. 29: 4724–4735.
  • Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288: 911–940.
  • Mathews, D.H., Turner, D.H., and Zuker, M. 2000. Secondary structure prediction. In Current protocols in nucleic acid chemistry (eds. S. Beaucage et al.), pp. 11.2.1–11.2.10. Wiley, New York.
  • Miranda-Rios, J., Navarro, M., and Soberon, M. 2001. A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. 98: 9736–9741.
  • Rivas, E. and Eddy, S.R. 2000. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16: 583–605.
  • Seffens, W. and Digby, D. 1999. mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 27: 1578–1584.
  • Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., and Steinberg, S. 1998. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 26: 148–153.
  • Tuschl, T. 2003. Functional genomics: RNA sets the standard. Nature 421: 220–221.
  • Washietl, S. and Hofacker, I.L. 2004. Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol. 342: 19–30.
  • Waterman, M.S. 1995. Introduction to computational biology: Maps, sequences and genomes. Chapman and Hall, London, New York.
  • Wilkie, G.S., Dickson, K.S., and Gray, N.K. 2003. Regulation of mRNA translation by 5′- and 3′-UTR-binding factors. Trends Biochem. Sci. 28: 182–188.
  • Winkler, W., Nahvi, A., and Breaker, R.R. 2002. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419: 952–956.
  • Workman, C. and Krogh, A. 1999. No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 27: 4816–4822.
  • Xia, T., SantaLucia Jr., J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., and Turner, D.H. 1998. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37: 14719–14735.
  • Zuker, M. 2003. Mfold Web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31: 3406–3415.
  • Zuker, M. and Stiegler, P. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133–148.

Articles from RNA are provided here courtesy of The RNA Society
