• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of narLink to Publisher's site
Nucleic Acids Res. Dec 1, 2000; 28(23): 4698–4708.
PMCID: PMC115172

Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA


The secondary structure of V4, the largest variable area of eukaryotic small subunit ribosomal RNA, was re-examined by comparative analysis of 3253 nucleotide sequences distributed over the animal, plant and fungal kingdoms and a diverse set of protist taxa. An extensive search for compensating base pair substitutions and for base covariation revealed that in most eukaryotes the secondary structure of the area consists of 11 helices and includes two pseudoknots. In one of the pseudoknots, exchange of base pairs between the two stems seems to occur, and covariation analysis points to the presence of a base triple. The area also contains three potential insertion points where additional hairpins or branched structures are present in a number of taxa scattered throughout the eukaryotic domain.


Secondary structure models for small subunit (SSU) and large subunit (LSU) rRNA were postulated nearly as soon as the first primary structures became available for these molecules. Although experimental approaches were used initially to gain information on the secondary structure, the models presently available are based essentially on comparative sequence analysis, which derives the folding from the observation of compensating substitutions that allow different sequences to adopt similar base pairing patterns. Secondary structure models were drawn up in this way and gradually improved as more sequences became available for SSU rRNA (1,2) as well as LSU rRNA (3,4). A number of tertiary interactions in both molecules have been discovered by similar methods (5).

Detailed knowledge of rRNA higher order structure is important for several reasons. First, it contributes to the ultimate goal of a complete molecular description of the ribosome and of the way it functions. Attempts (6) to reconstruct the complete spatial structure of the ribosome combine RNA secondary structure models with results from cross-linking experiments, spatial maps of ribosomal proteins and electron-microscopic observation of ribosomal subunits. Low resolution X-ray crystallographic structures of ribosomal subunits (7,8) and complete ribosomes (9) are just beginning to confirm the existence of the most easily recognizable RNA helices predicted by the secondary structure models. Secondly, the availability of thousands of SSU rRNA and LSU rRNA sequences (10,11) has allowed a breakthrough in our insight into evolutionary relationships between bacterial divisions (12) and between the major eukaryotic kingdoms and protist taxa (13). In order to use rRNA sequences for the reconstruction of phylogenetic trees, dependable alignments are required, and to draft these a thorough knowledge of the secondary structure is often necessary.

The alternation in SSU and LSU rRNAs of conserved, slowly evolving and variable, fast evolving areas has long been recognized (e.g. 14). Recently this observation has been put on a more quantitative basis by the measurement of the relative substitution rate of individual sites in both molecules (13,15,16). Woese (12) cited this as a useful property of rRNAs as molecular chronometers because it should allow the investigation of phylogenetic problems at different depths of the evolutionary scale. However, this prediction has not materialized because the fast-evolving areas are also harder to align and their secondary structure is harder to discover. As a result, the most variable areas are often eliminated before tree construction because their alignment is not considered reliable.

Whereas the secondary structure of prokaryotic 16S rRNA can be considered as completely established, the structure of certain variable areas in the eukaryotic 18S rRNA leaves room for different interpretations. This is especially the case for area V4 (see Fig. Fig.11 for the numbering system) which forms a complex structure in most eukaryotes whereas the corresponding area in prokaryotes is considerably shorter and forms a single hairpin. A secondary structure model containing a pseudoknot has been proposed for this area (17) and has been further refined since (2). Some of the helix structures of the model are present in a limited number of taxa. Recent comparisons of sequences of area V4 in certain arthropod taxa (1820) have confirmed the existence of most of the helices and provided some detail as to the presence of exceptional helices in this area. The availability of more than 3000 carefully aligned eukaryotic SSU rRNA sequences in our database (10) prompted us to re-examine systematically the secondary structure of area V4 in the majority of eukaryotes and to survey the exceptional cases of taxa, scattered throughout the eukaryotic evolutionary tree, where some helices are absent or additional ones are present.

Figure 1
New secondary structure model illustrated with P.palmata SSU rRNA. (a) Complete model with indication of variable areas V1–V9. V6 is variable only in prokaryotic SSU rRNA where helix 37 is branched. Area V4 is in blue; base pair compensation ...


Nucleotide sequences, alignment, drawing of secondary structure models

The European small ribosomal subunit RNA database (10) contains ~13 000 SSU rRNA sequences from bacterial, plastidial, mitochondrial, archaeal and eukaryotic genomes. The sequences are regularly collected from nucleotide sequence libraries and stored in the form of an alignment based on the secondary structure model adopted for the molecule, as explained by Van de Peer et al. (10). The secondary structure was re-examined for the eukaryote-specific variable area V4, which is situated between helices 23 and 24, and contains a set of helices numbered E23-n as indicated in Figure Figure1a.1a. A partial alignment was used for this purpose, limited to area V4 and consisting of 3253 eukaryotic sequences, distributed over the animal, fungal and plant kingdoms and a number of protist taxa as listed in Table Table1.1. After examination of base covariation and base complementarity compensation (see below), slight adjustments to the alignment were carried out, where necessary, using the alignment editor DCSE (21). Secondary structure models were drawn using the program RnaViz (22). The program mfold (23) was used to look for possible local foldings of inserts with unknown secondary structure.

Table 1.
Distribution of compared SSU rRNA sequences over the eukaryotic taxaa

Estimating the strength of base pair compensation

A compensating substitution in a sequence alignment is defined as the substitution of both bases of a complementary pair, present at two positions in a sequence, by other complementary bases at the same positions in another sequence. Complementary base pairs are defined as A·U, U·A, G·C, C·G, G·U and U·G. As an example, a substitution of A·U by G·C is a compensating substitution. A substitution of A·U by G·U is not compensating because only one base changes. Base pair compensation can serve as evidence for the existence of base pairing between two alignment positions, but the evidence can be weak or strong depending on the fraction of base pairs that are complementary and their distribution over the six cases.

Consider the bases occupying two columns i and j in an alignment matrix of n rows (nucleotide sequences) and m columns (nucleotide sites). Comparison of rows k and l at sites i and j can show base pair compensation or not. If each row is compared with each other row we define the compensation index C for sites i and j as

An external file that holds a picture, illustration, etc.
Object name is gkd671eq1.gif

where pc is the number of pairs of rows showing a compensating base pair substitution and pt is the total number of pairs of rows. The square root is taken because the number of pairs of rows is essentially quadratic in the number of rows itself. Note that C always lies between 0 and 1.

A straightforward computation of C requires consideration of all pt = n(n – 1)/2 pairs of rows, which in our case, with n = 3253, means pt = 5 289 378 pairs of sequences. C has to be computed for all m(m – 1)/2 pairs of positions, which in our case, with m = 118 for the examined part of area V4, amounts to 6903 pairs. As a result, verification of the existence of base pair compensation becomes very time-consuming.

The following algorithm allows a faster computation of C. For a fixed pair of positions (i,j), each row contains a pair of bases, rows that have a gap in either or both positions being ignored. By passing through the rows once we can record the number of times that each of the 16 possible pairs AA, AC, etc. occurs. The complexity of this computation is O(n), which means that the computation time increases linearly with n, whereas the straightforward algorithm is O(n2) because it has to verify each pair of rows. We denote the fraction of rows for which AU occurs as fAU, and similarly fUA, fGC, fCG, fGU and fUG. It turns out that we do not need to record the fractions of the 10 non-complementary pairs such as AA, AC, etc. For two rows k and l, with 1< k < l < n, we say that they constitute a compensating substitution when their combination is labeled as 1 in matrix matrix11.

There are 22 ones out of the 36 entries in this matrix. If we add the 10 non-complementary pairs AA, AC, etc. to this matrix then all the additional entries are zeros. Since the number pc of compensating row pairs remains the same for any permutation of the rows, it follows that

pc = n2(fAU fUA + fAU fGC + fAU fCG + fAU fUG + fUA fGC + fUA fCG +

fUA fGU + fGC fCG + fGC fUG + fCG fGU + fGU fUG)

where the sum contains 22/2 = 11 terms because matrix matrix11 is symmetrical. Since the total number of pairs of rows is pt = n(n – 1)/2 we can compute C by the formula

An external file that holds a picture, illustration, etc.
Object name is gkd671eq2.gif


Once the six fractions fAU, fUA, fGC, fCG, fGU and fUG are known, expression 2 is only a simple computation, so the computation time of the entire algorithm for C remains linear in n.

Estimating the strength of base covariation

Covariation (24) is the phenomenon that a base change in one column of the alignment matrix is matched by a base change in another column, but the two bases do not have to be complementary. Base covariation can point to a secondary or tertiary interaction if the corresponding bases are complementary; it usually points to a tertiary interaction if they are not.

Cramer’s ϕ is an index that allows determination of the strength of a relationship between two variables (25). In order to measure the strength of covariation between two sites, Cramer’s ϕ was calculated on the 4 × 4 table listing the numbers of each of the 16 base pairs occurring in the two alignment columns considered. If there are n sequences, ϕ is given by the expression:

An external file that holds a picture, illustration, etc.
Object name is gkd671eq3.gif


An external file that holds a picture, illustration, etc.
Object name is gkd671eq4.gif

with no the observed number of each of the 16 base pairs, and ne the expected number assuming that both positions are independent. χ2 is calculated on the 16 elements of the table and k is the number of columns or rows in the table, whichever is the smallest. Usually k = 4 in the case of the base pair table, unless a column or a row is empty, in which case k can be smaller, e.g. k = 3 if the four base pairs UU, UC, UA and UG are absent. Cramer’s ϕ assumes a value between 0 (no measurable covariation) and 1 (strongest possible covariation).

The strength of covariation was also measured using the mutual information (24) M, which can assume values between 0 and ln(4) = 1.386.


Presence of two pseudoknots in area V4

Figure Figure1a1a shows the complete secondary structure model for the SSU rRNA of the red alga Palmaria palmata (nucleotide sequence accession no. X53500), which is representative of the vast majority of eukaryotic SSU rRNAs. Among 3253 examined sequences 2832 fit this model, whereas in those remaining some of the helices are missing or additional ones are present, as described in the next paragraph. Detailed changes made to the folding in area V4 with respect to the previously assumed structure (26) are shown in Figure Figure1b1b and c. The helix numbering system is different in the old and in the new model, on the one hand because of the change in structure, on the other hand in order to allow consistent numbering of newly discovered extra helices present only in the SSU rRNAs of a minority of taxa. The changes in the majority model can be summarized as follows, using the old numbering system (Fig. (Fig.1b):1b): (i) hairpin E23-6, poorly supported by compensating substitutions, is abolished; (ii) the sequence UUA at its 5′-end is paired with the sequence UAA in the 5′-strand of helix E23-8, which is therefore shortened by three base pairs; (iii) four bases from the 3′-strand of helix E23-6 are paired with bases in or near the large internal loop in helix E23-7.

As a result, the new model, shown in Figure Figure1c1c with the new numbering system, contains two pseudoknots, rather than the single one present in the old model. The 5′-stem of the new pseudoknot is formed by helices E23-9 and E23-10, which are separated by hairpin E23-12. Its 3′-stem is formed by helix E23-11. The second pseudoknot, consisting of contiguous helices E23-13 and 14, remains as in the old model except that E23-13 is shortened by three base pairs at its 5′-end. The new helix E23-8 connects the two pseudoknots. Different drawing schemes were tried to represent the new structure in two dimensions, the one finally chosen proving the least confusing. The helix numbering system is in accordance with the principles followed in the SSU rRNA database (10 and references cited therein).

The new structure was discovered by systematically computing Cramer’s ϕ for all pairs of alignment positions situated between helices E23-4 and 24 (Fig. (Fig.1c)1c) and occupied by a nucleotide in at least 90% of the 3253 eukaryotic SSU rRNA sequences. This amounts to 118 sites, hence the computations were made on 6903 pairs of sites. The values of ϕ were found to be considerably higher for the seven base pairs forming the new helices E23-8, E23-9 and E23-10 (Fig. (Fig.1c)1c) than for the 12 rejected pairs of the abolished helix E23-6 and the shortened helices E23-7 and E23-8 (Fig. (Fig.1b).1b). A high ϕ points to covariation of two sites, which may be due to a tertiary or a secondary interaction. However, the fact that the sites could form neighboring base pairs rather than isolated ones suggested that they belong to secondary structure elements. This was confirmed by computation of the compensation index C for each base pair. Figure Figure22 gives a graphic representation of the geometric average of ϕ and C calculated for the 6903 pairs of nucleotide sites. Most of the helices can be seen as diagonals of darker dots in a half matrix of dots with an intensity commensurate with

An external file that holds a picture, illustration, etc.
Object name is gkd671eq5.gif


Figure 2
Matrix of base pair compensation and covariation strength. The lower half matrix consists of dots with a darkness proportional to the value of √ϕC measured for each pair that can be formed by the 118 sites lying between helix E23-4 and ...

In Table Table22 the scores C and ϕ are listed for the 32 base pairs of the pseudoknot area comprising helices E23-8 to E23-14. The mutual information M (24) is added for comparison. All the base pairs show compensation to various extents, except for the fourth pair of pseudoknot helix E23-11, where the only complementary pairs found are A·U and G·U. There are other strongly conserved base pairs, such as base pair 1 of helix E23-8, which is U·A in 97.5% of the sequences, yet each of the other five base pairs is present in a small number of sequences. This results in a higher compensation score than that of base pair 1 of helix E23-13, which is less conserved (67% U·G) but only four of the six complementary base pairs are found and the two base pairs present in the largest proportion, U·G and C·G, are not compensating. The highest compensation scores are found for the base pairs of helix E23-14, with base pair 3 attaining score 0.78 due to the presence of G·C, U·A and C·G in nearly equal amounts. Five of the base pairs in Table Table22 contain a non-complementary pair in >10% of the sequence set. In each case, among the 10 possible non-complementary pairs one is present in large excess, and this is A·C in three cases and C·U in the two other cases. This could point to some structural peculiarity at these sites that allows the presence of these particular base pairs. Of the four R·Y pairs, A·C is the only non-complementary one but it can form a pair bound by two H-bridges with the same geometry as Watson–Crick pairs if either A or C is in the imino tautomeric form. It is noteworthy that each of the base pairs with a high occurrence of A·C occupies the end of a helix.

Table 2.
Composition and degree of compensation for the base pairs of the pseudoknot structure in area V4

The existence of base pair compensation in helices E23-8 to E23-12, the area containing the newly discovered pseudoknot, is illustrated in Figure Figure33 with a set of nine structures. In most sequences this area contains 15 base pairs. Fourteen of these are proven by compensating substitution in the examined sequence set.

Figure 3
Base pair compensation in helices E23-8 to E23-12. Helices E23-8 to E23-12 comprise 15 base pairs in most species. Fourteen of these, in red in the P.palmata structure in the centre, show compensation in other species, as can be seen in the eight structures ...

The net difference between the old secondary structure model (Fig. (Fig.1b)1b) and the new one (Fig. (Fig.1c)1c) amounts to the dissolution of 12 base pairs and the creation of seven new ones. The average compensation score of the abolished base pairs is C = 0.0329, the average score of the new base pairs, C = 0.1745. For each of the base pairs of helices E23-8 to E23-14, Table Table22 mentions the number of sequences where it was observed, which is always smaller than 3253, the total number of sequences compared. There are several reasons for this. First, the 59 Microsporidia sequences form exceptional structures described below and miss all the helices mentioned in Table Table2,2, which leaves 3194 sequences to be compared. Secondly, some helices, especially E23-12 to E23-14, show length heterogeneity. Table Table22 lists the base pairs occurring in the majority of sequences but a pair can be missing due to a symmetrical deletion in both strands, or a bulge can result from a deletion in one strand. Thirdly, some of the compared sequences contain a few ambiguity codes such as R, Y, S, W and N. If one of the bases of a pair is incompletely identified the pair is not counted in Table Table2.2. Taking as an example helix E23-9, which consists of only two base pairs, examination of the alignment shows that both pairs are present in 3190 out of 3194 sequences if ambiguous nucleotide symbols are included. Allowing for the possibility of a few sequencing errors, it seems highly probable that helix E23-9 exists in all structures except those of the Microsporidia. Similar observations for the other helices listed in Table Table22 leads us to the conclusion that they exist in all the examined eukaryotic SSU rRNA sequences, with certain exceptions for the taxa with a reduced structure described below.

Deviant structures of area V4 in certain taxa

Of the 3253 sequences examined, 2832 follow the secondary structure model for area V4 shown for P.palmata in Figure Figure1a.1a. However, the variability of area V4 is not limited to a high substitution rate but also involves a higher rate of deletion and insertion, which in extreme cases results in the deletion or insertion of entire helices. A survey of sequences with an exceptional number of helices was made (2) in 1992 when only 197 eukaryotic SSU rRNA sequences were known, but other cases of deviant secondary structures in area V4 have been discovered in the meantime (e.g. 19,27). We therefore made a systematic survey of exceptional structures in area V4, present in 421 of the 3253 sequences.

Figure Figure44 shows the three sites in the secondary structure model where additional helices are inserted. Two of them are at the potential branching points between helices E23-1 and 2 and between E23-4 and 7. The third site is at the 3′-end of the area, between pseudoknot helix E23-14 and helix 24. Examples of exceptional structures are given in Figure Figure5.5. Table Table33 summarizes which helices are present in which taxa. Helix E23-1 is listed as a long range interaction because it contains a potential branching point to helices E23-2 and 3. However, since helix E23-3 is only exceptionally present, helices E23-1 and 2 look like a long hairpin in most species. The same applies to helices E23-4 and 7.

Figure 4
Position of exceptional helices in area V4. Double stranded areas are drawn as thick parallel lines, single stranded areas as thin lines. Universal helices 23 and 24 are drawn in black, eukaryote-specific helices are in color, blue for those present ...
Figure 5
Examples of exceptional structures in area V4 in specific taxa. (a) Parabasalidea; (b) Cladocera; (c) Pterigota; (d) Neodermata; (e) Acanthamoebidae; (f) Euglenida; (g) Kinetoplastida. Cf. Table 3.
Table 3.
Helix presence in area V4 of eukaryotic SSU rRNA secondary structurea

In short, exceptional structures were derived as follows. In case a sequence contains a large deletion or insertion with respect to the majority of sequences, the first step consists of localizing its position as precisely as possible on the basis of similarity of the flanking sequences to segments of the alignment. This is usually straightforward in the case of a deletion, but it can be difficult in the case of an insertion in an already variable area, such as that enclosed by helix E23-1. In this case, the entire area containing helix E23-1 plus the insert was examined with the secondary structure prediction program mfold which for certain taxa yielded an enlarged single hairpin and for others a branched structure. The existence of the stem helix E23-1 and the hairpins E23-2 and eventually E23-3 could then usually be confirmed by the observation of compensating substitutions among the sequences of each taxon. Structures at other potential insertion points were derived similarly.

There are three protist taxa, the Microsporidia, the Parabasalidea and the Babesiidae, that miss some of the 11 helices forming the consensus model for area V4 as indicated on the bottom row of Table Table3.3. In most Microsporidia the entire area V4 is absent, the others contain a sequence that can be folded into a single hairpin, as in the bacteria. This hairpin is labeled rather arbitrarily helix E23-1 in Table Table3,3, because it is the only helix present in the area and it follows universal helix 23 in the sequence. However, there is no evidence whatsoever that this hairpin is homologous with E23-1, or any other helix E23-n, in other eukaryotes. In the Parabasalidea (Fig. (Fig.5a)5a) helix E23-1 is present in 12 out of 20 sequences of the examined set. Homology with E23-1 of other eukaryotes is probable because many other helices of the area are present in the same succession. Helix E23-4 is present in 18 out of 20 species, but its extension E23-7 found in other eukaryotes is absent. The second pseudoknot, E23-13 and 14, also seems to be absent and replaced by a single hairpin, which was labeled E23-15 because it occupies a position similar to that of helix E23-15 in other taxa treated below, but there is no evidence for homology with these helices. In the Babesiidae, among 17 examined sequences, helix E23-4 is absent in 11, helix E23-7 in 14 of them. There is one more protist taxon, the Diplomonadida, with a reduced area V4, but the small number of available sequences and high variability made alignment too uncertain to derive a structure, hence it was not included in the examined set.

Eight taxa contain all 11 helices of the consensus model (bottom row of Table Table3)3) plus some additional ones. At branching point E23-1/2, an extra helix E23-3 is found in the protist taxon Acanthamoebidae (Fig. (Fig.5e)5e) and in the crustacean taxa Cladocera and Cyclestherida (Fig. (Fig.5b).5b). At branching point E23-4/7, a single helix E23-5 is inserted in a range of taxa (see Table Table3),3), examples being given in Figure Figure5b,5b, e and f. At the same branching point, two helices, E23-5 and 6, are inserted in certain representatives of the Pterigota (winged insects; Fig. 5c) and in the Platyhelminth subphylum Neodermata (Fig. (Fig.5d).5d). There is no apparent homology between the helices inserted at this branching point in different taxa. Where a single helix was found, it was given the number E23-5, if two were found, the second one was labeled E23-6. Between helix E23-14 and universal helix 24, inserts are found in the Euglenida and in the Kinetoplastida, which together belong to the protist taxon Euglenozoa. Since only four euglenid sequences were available there were not enough compensating substitutions to derive a dependable structure for the insert. The single hairpin E23-15 shown in Figure Figure5f5f was derived by mfold and should be regarded as tentative. In the case of Kinetoplastida a branched structure E23-15-16-17 (Fig. (Fig.5g)5g) is inserted. Except for helix E23-5 in Myxogastria and E23-15 in Euglenida, all extra helices are supported by compensating substitutions, though usually not in each base pair.

The presence of extra helices in certain species at the potential branching point E23-4/7 and between helix E23-14 and universal helix 24 has been known for several years (2,28). The structure of the insert at branching point E23-4/7 in insects was examined more thoroughly recently (20). The branching of hairpin E23-3 from junction E23-1/2 was observed in branchiopod crustaceans (Cladocera and Cyclestherida, Table Table3)3) by Crease and Taylor (19). However, the latter authors assumed the presence of an extra helix, which they named E23-c, between E23-1-2-3 and E23-4. Our comparative study shows that this helix is in fact helix E23-5 and that the structure should be modified as illustrated with the Daphnia pulex model in Figure Figure55b.


Implications for phylogenetic studies

Alignments of eukaryotic SSU rRNA sequences have been used extensively for the reconstruction of phylogenetic trees, from studies encompassing nearly the entire Eukarya domain (e.g. 13) to others focusing on specific low level taxa (e.g. 18). rRNA molecular evolutionists are well aware that their alignments should take into account secondary structure patterns, while improvements in the alignment can in turn lead to discovery of new details of the secondary structure, the entire process being enhanced by the availability of an ever-increasing set of sequences. Although the accuracy of phylogenetic trees increases with the length of the sequence alignment used, entire sequence chunks are often omitted because the secondary structure of certain variable areas is too poorly known to allow a justifiable alignment. This is a waste of data that can be avoided by a better knowledge of the detailed secondary structure of variable areas. The largest such area in eukaryotic SSU rRNA, V4, has already been the focus of several studies (2,1720), but the present work achieves a new degree of detail in its description, made possible by comparison of an extensive sequence set covering the most diverse eukaryotic taxa. It also gives an inventory of the insertion sites, and which structures are inserted in which taxa (Table (Table3).3). This should constitute an aid in phylogenetic studies in the sense that it allows it to be decided which parts of the sequence can be used, depending on the problem studied. The secondary structure of helices E23-8 to 14 is sufficiently conserved to be aligned among all eukaryotes except for three primitive potist taxa, the Microsporidia, Parabasalidea and Diplomonadida, where the structure is either absent or strongly reduced in size. Helices E23-8 to 14 can therefore be used in nearly all studies on eukaryotic evolution, general as well as specific. On the other hand, helices E23-1 to 7 and E23-15 to 17 are so variable that meaningful alignments should be limited to sets of closely related species having sufficiently similar structures. Even within such taxa a search for homologous nucleotides can be futile in strongly expanded segments of helices such as E23-2 or E23-5.

Table Table33 shows that some helices such as E23-5 are found in taxa as distant as euglenids and certain insects, while being absent in the majority of other eukaryotic species. It thus seems that the SSU rRNA molecule comprises a basic structure which at certain points leaves room for insertion of sequences. These seem to happen independently at certain points in evolution, leading to structures that, though present at similar sites of the molecule, are not homologous. In other words, they constitute a homoplasy between the taxa where they occur.

Structure of the pseudoknots and evidence for a tertiary interaction

Our study shows that area V4 contains two pseudoknots in succession. The first one is formed by helices E23-9 to 12, the second one by helices E23-13 and 14. The latter pseudoknot is of the simplest known type consisting of two stem–loop structures with each loop intertwined with the stem of the other. The newly discovered pseudoknot, though containing fewer base pairs, has a more complicated structure. Its 5′-stem, consisting of helices E23-9 and 10, is interrupted in its 3′-strand by the short hairpin E23-12. Consultation of the pseudoknot database (29) showed that this arrangement is most similar to pseudoknots PKB134, PKB135, PKB168 and PKB168 found in plant viruses.

The pseudoknot consisting of helices E23-13 and 14 has a peculiar property, already described when it was first reported (17), and illustrated in Figure Figure6.6. Helix E23-13 can acquire base pairs at the expense of helix E23-14 and vice versa, in other words the boundary between the two stems of the pseudoknot seems to be able to shift in both directions. The presumption of a mobile boundary is strengthened by the fact that the shift is possible in the large variety of species now examined, in spite of the variability of the local nucleotide sequence. Although no specific function has yet been ascribed to this area of SSU rRNA it seems possible that mobility of this part of the molecule would be associated with the ribosome switching between allosteric states associated with its protein synthesizing function.

Figure 6
Possible dynamic structure and base triple in pseudoknot E23-13-14. (a) The structure of helix E23-8 and of the pseudoknot E23-13-14 is shown for two species, the red alga P.palmata and the apicomplexan Sarcocystis muris. The uppermost structure corresponds ...

There is a strong covariation between the base preceding the 5′-strand of helix E23-8 (U in P.palmata) and the base pair at the 5′-end of helix E23-13 (U·G in P.palmata). The base pair is U·G in 66.5% of the sequences and C·G in 29.3%. The combination U (U·G) is found in 64% of the sequences, the combination G (C·G) in 27.4% of them. We therefore postulate that these three sites form a base triple. A search for isoform triples with these compositions by the program ISOPAIR (30) yielded the structures shown in Figure Figure6b.6b. The existence of two base triples in more conserved areas of eukaryotic SSU rRNA has been deduced by Gutell and coworkers (http://www.rna.icmb.utexas.edu/) but to our knowledge this is the first indication of the presence of such a structure in a variable area.



Our research was supported in part by the Special Research Fund of the University of Antwerp. P.D.R. and Y.V.d.P. are research fellows of the Fund for Scientific Research Flanders.


1. Gutell R.R., Weiser,B., Woese,C.R. and Noller,H.F. (1985) Progr. Nucleic Acids Res. Mol. Biol., 32, 155–216. [PubMed]
2. De Rijk P., Neefs,J.M., Van de Peer,Y. and De Wachter,R. (1992) Nucleic Acids Res., 20, Suppl., 2075–2089. [PMC free article] [PubMed]
3. Gutell R.R., Gray,M.W. and Schnare,M.N. (1993) Nucleic Acids Res., 21, 3055–3074. [PMC free article] [PubMed]
4. De Rijk P., Van de Peer,Y., Chapelle,S. and De Wachter,R. (1994) Nucleic Acids Res., 22, 3495–3501. [PMC free article] [PubMed]
5. Gutell R.R., Larsen,N. and Woese,C.R. (1994) Microbiol. Rev., 58, 10–26. [PMC free article] [PubMed]
6. Mueller F. and Brimacombe,R. (1997) J. Mol. Biol., 271, 524–544. [PubMed]
7. Clemons W.M., May,J.L.C., Wimberly,B.T., McCutcheon,J.P., Capel,M.S. and Ramakrishnan,V. (1999) Nature, 400, 833–840. [PubMed]
8. Ban N., Nissen,P., Hansen,J., Capel,M., Moore,P.B. and Steitz,T.A. (1999) Nature, 400, 841–847. [PubMed]
9. Cate J.H., Yusupov,M.M., Yusupova,G.Z., Earnest,T.N. and Noller,H.F. (1999) Science, 285, 2095–2104. [PubMed]
10. Van de Peer Y., De Rijk,P., Wuyts,J., Winkelmans,T. and De Wachter,R. (2000) Nucleic Acids Res., 28, 175–176. [PMC free article] [PubMed]
11. De Rijk P., Wuyts,J., Van de Peer,Y., Winkelmans,T. and De Wachter,R. (2000) Nucleic Acids Res., 28, 177–178. [PMC free article] [PubMed]
12. Woese C.R. (1987) Microbiol. Rev., 51, 221–271. [PMC free article] [PubMed]
13. Van de Peer Y. and De Wachter,R. (1997) J. Mol. Evol., 45, 619–630. [PubMed]
14. Hassouna N., Michot,B. and Bachellerie,J.P. (1984) Nucleic Acids Res., 12, 3563–3583. [PMC free article] [PubMed]
15. Van de Peer Y., Chapelle,S. and De Wachter,R. (1996) Nucleic Acids Res., 24, 3381–3391. [PMC free article] [PubMed]
16. Ben Ali A., Wuyts,J., De Wachter,R., Meyer,A. and Van de Peer,Y. (1999) Nucleic Acids Res., 27, 2825–2831. [PMC free article] [PubMed]
17. Neefs J.M. and De Wachter,R. (1990) Nucleic Acids Res., 18, 5695–5704. [PMC free article] [PubMed]
18. Hancock J.M. and Vogler,A.P. (1998) Nucleic Acids Res., 26, 1689–1699. [PMC free article] [PubMed]
19. Crease T.J. and Taylor,D.J. (1998) Mol. Biol. Evol., 15, 1430–1446. [PubMed]
20. Hwang U.W., Ree,H.I. and Kim,W. (2000) Zool. Sci., 17, 111–121. [PubMed]
21. De Rijk P. and De Wachter,R. (1993) Comput. Appl. Biosci., 9, 735–740. [PubMed]
22. De Rijk P. and De Wachter,R. (1997) Nucleic Acids Res., 25, 4679–4684. [PMC free article] [PubMed]
23. Mathews D.H., Sabina,J., Zuker,M. and Turner,D.H. (1999) J. Mol. Biol., 288, 911–940. [PubMed]
24. Gutell R.R., Power,A., Hertz,G.Z., Putz,E.J. and Stormo,G.D. (1992) Nucleic Acids Res., 20, 5785–5795. [PMC free article] [PubMed]
25. Welkowitz J., Ewen,R.B. and Cohen,J. (1982) Introductory Statistics for the Behavioral Sciences. 3rd Edn. Harcourt Brace Jovanovich, p. 288.
26. Van de Peer Y., Jansen,J., De Rijk,P. and De Wachter,R. (1997) Nucleic Acids Res., 25, 111–116. [PMC free article] [PubMed]
27. Hartskeerl R.A., Schuitema,A.R.J. and De Wachter,R. (1993) Nucleic Acids Res., 21, 1489. [PMC free article] [PubMed]
28. Neefs J.M., Van de Peer,Y., De Rijk,P., Goris,A. and De Wachter,R. (1991) Nucleic Acids Res., 19, 1987–2015. [PMC free article] [PubMed]
29. van Batenburg F.H.D., Gultyaev,A.P., Pleij,C.W.A., Ng,J. and Oliehoek,J. (2000) Nucleic Acids Res., 28, 201–204. [PMC free article] [PubMed]
30. Gautheret D. and Gutell,R.R. (1997) Nucleic Acids Res., 25, 1559–1564. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
PubReader format: click here to try


Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...