Nucleic Acids Res. Feb 15, 2001; 29(4): 943–954.

Protein–RNA interactions: a structural analysis


A detailed computational analysis of 32 protein–RNA complexes is presented. A number of physical and chemical properties of the intermolecular interfaces are calculated and compared with those observed in protein–double-stranded DNA and protein–single-stranded DNA complexes. The interface properties of the protein–RNA complexes reveal the diverse nature of the binding sites. van der Waals contacts played a more prevalent role than hydrogen bond contacts, and preferential binding to guanine and uracil was observed. The positively charged residue, arginine, and the single aromatic residues, phenylalanine and tyrosine, all played key roles in the RNA binding sites. A comparison between protein–RNA and protein–DNA complexes showed that whilst base and backbone contacts (both hydrogen bonding and van der Waals) were observed with equal frequency in the protein–RNA complexes, backbone contacts were more dominant in the protein–DNA complexes. Although similar modes of secondary structure interactions have been observed in RNA and DNA binding proteins, the current analysis emphasises the differences that exist between the two types of nucleic acid binding protein at the atomic contact level.


RNA performs essential and diverse functions within the cell. It forms part of the ribosome (1,2) and the spliceosome (3) and also exhibits catalytic activity (4–6). A common thread to many of these functions is the interaction of RNA with proteins. For example, specific tRNAs are bound to aminoacyl-tRNA synthetases for the translation of the genetic code during protein synthesis (7), and ribonucleoprotein particles (RNPs) bind RNA in post-transcriptional regulation of gene expression (8). However, despite their obvious functional importance, the specific mechanisms of protein–RNA interactions are still poorly understood. This is in contrast to the much clearer picture of interactions in protein–DNA complexes (9–11). The lack of information about protein–RNA complexes reflects the smaller number of structures that have been solved by crystallography and NMR. The current work was completed prior to the structure of the ribosomal subunits being solved (12–14). When the analysis was conducted there were a total of 330 known protein–DNA complex structures compared with just 35 protein–RNA complexes [Nucleic Acid Database (NDB) (15)]. With the coordinates of the ribosomal structures now available the number of protein–RNA complexes with known structures has risen to 89, and whilst this number is still small, these new structures provide important new data for analysis.

In contrast to the regular double helical structure of B-DNA commonly found in protein–DNA complexes, RNAs display structures almost as diverse as their function. RNA structures are flexible molecules that display complex secondary and tertiary structures. RNAs are commonly single-stranded but structures also include short lengths of double helices (A-form), hairpin loops, bulges and pseudoknots. Proteins tend to interact with RNA where it forms complex secondary structure elements such as stem–loops and bulges (16). In addition non-Watson–Crick base pairing can occur in loop regions of RNA structures and such features can also be preferentially identified by proteins (17).

Work in this field has primarily centred on the identification of recurring RNA recognition motifs such as the RNP and arginine-rich motifs (16,18,19) and interactions within individual complexes (20–22). A large amount of data has also been derived from aminoacyl-tRNA synthetases for which a number of complexes have been solved (7,23).

Now that an increasing number of protein–RNA structures are known, there is a need to draw together the structural data to look for common features that might characterise the intermolecular interactions within them. A comprehensive review of protein–RNA structures has most recently been published by Draper (24). This work goes further than previous reviews on the subject, by dividing complexes into two main classes based on the mode of RNA recognition: (i) groove binding and (ii) β-sheet binding. In the former, proteins position a secondary structure element, such as an α-helix or loop, into the groove of an RNA helix. In the latter, proteins use β-sheet surfaces to create binding pockets that bind unpaired RNA bases. These two recognition themes are adopted in the current analysis (Table 1).

Table 1.
Dataset of 32 protein–RNA complexes selected from the NDB (December 7, 1999)

Here we present a comprehensive analysis of protein–RNA interactions at the residue and atom level, and compare them with interactions observed in protein–double-stranded DNA (dsDNA) complexes (N.M.Luscombe and J.M.Thornton, manuscript in preparation; 25) and protein–single-stranded DNA (ssDNA) complexes. Data for this analysis have been drawn from both the NDB (15) and the Protein Data Bank (PDB) (26). A computational analysis of chemical and physical properties of nucleic acid binding sites on proteins, including the size, polarity and packing, is described. In addition the distribution of observed atom–atom contacts in the protein–nucleic acid complexes has been calculated and compared to expected values.



For this analysis a total of 35 protein–RNA complexes were extracted from the NDB (on December 7, 1999) with resolutions of 3.0 Å or better and the full coordinates of all atoms (by December 12, 2000 there were 59 protein–RNA complexes in the NDB with a resolution of 3.0 Å or better) (15). Of these, 27 involved at least five RNA bases. To this initial dataset two protein–RNA complexes solved by NMR and a further three structures very recently solved by X-ray crystallography (22) were added, to produce a dataset of 32 protein–RNA complexes (Table 1). The proteins in each complex were classified into structural families using the structural alignment program SSAP (27). A SSAP score of ≥80 (and a sequence identity of >20%) between a pair of protein chains indicates that the two are structurally related and hence they were clustered into the same structural family. A representative complex (with the best resolution) from each family was selected to be included in a non-homologous dataset. Additional proteins were included from a family if the sequence of the bound RNA was different, hence the same protein could be included in the non-homologous dataset but only if the RNA sequences were different in each case. This resulted in a non-homologous dataset of 20 protein–RNA complexes (Table 1). The protein–RNA complexes were also divided into three subsets dependent upon their function: (A) proteins binding viral RNA (vRNA), (B) those involved in protein synthesis, binding transfer RNA (tRNA) and ribosomal RNA (rRNA) and (C) those involved in RNA modification, binding messenger RNA (mRNA) and small nuclear RNA (snRNA) (Table 1).
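The family clustering and representative selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how pairwise SSAP relationships were merged into families, so single-linkage merging is assumed here, and the function names, the dictionary layout and the "best resolution per distinct RNA sequence" reading of the selection rule are all hypothetical.

```python
from itertools import combinations


def cluster_families(chains, ssap_score, seq_identity,
                     min_ssap=80.0, min_identity=20.0):
    """Group chains into families by single linkage (an assumption).

    ssap_score and seq_identity map frozenset({a, b}) to a pairwise value.
    Two chains are linked when SSAP >= 80 and sequence identity > 20%.
    """
    parent = {c: c for c in chains}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(chains, 2):
        key = frozenset((a, b))
        related = (ssap_score.get(key, 0.0) >= min_ssap
                   and seq_identity.get(key, 0.0) > min_identity)
        if related:
            parent[find(a)] = find(b)

    families = {}
    for c in chains:
        families.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in families.values())


def pick_representatives(family, resolution, rna_sequence):
    """Keep the best-resolution chain for each distinct bound RNA sequence."""
    best = {}
    for chain in family:
        seq = rna_sequence[chain]
        # A lower value in angstroms means a better resolution.
        if seq not in best or resolution[chain] < resolution[best[seq]]:
            best[seq] = chain
    return sorted(best.values())
```

With this reading, a family contributes more than one member to the non-homologous dataset only when its members bind different RNA sequences.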

In Table 1, the RNA molecules bound to the proteins are classified into double-stranded (A-type double helix), single-stranded (elongated structures with no tertiary structure elements), single-stranded with single loop (commonly forming a hairpin loop) and single-stranded with multiple loops (commonly forming the classic cloverleaf structures observed in the tRNAs). The type of recognition used by the protein is additionally included in Table 1, using the two classes identified by Draper (21). One structure from each family is shown in a Molscript diagram in Figure 1.

Figure 1
MOLSCRIPT diagrams depicting protein–RNA complexes. One complex from each of the 14 families in Table 1 is presented. The sizes of the proteins are not comparable between diagrams and each is viewed from an angle that best depicts both the ...

A second dataset of proteins bound to ssDNA structures was also selected from the NDB (on December 7, 1999). There were 16 protein–ssDNA complexes in the NDB with full coordinates available, that bind between 3 and 16 nucleic acid bases (by December 12, 2000 there were 29 protein–ssDNA complexes in the NDB). These 16 proteins were clustered into eight structural families using SSAP (27) as described above (Table 2). One complex with the best resolution was selected as a representative from each family if it included at least five DNA bases. As before, additional proteins were included from a family if the DNA sequence bound was different. This resulted in a non-homologous dataset of just three protein–ssDNA complexes (Table 2).

Table 2.
Dataset of 16 protein–ssDNA complexes selected from the NDB (December 7, 1999)

Analysis of nucleic acid binding site properties

As described in our previous analysis (28), an amino acid was defined as an interface residue if it lost >1 Å² of accessible surface area (ASA) when passing from the uncomplexed state (protein only) to the complexed state (protein–RNA). The ASAs of the protein complexed with RNA and of the protein without the RNA present were calculated using the computer program Naccess (http://wolf.bms.umist.ac.uk/naccess). With these two ASA calculations it is possible to identify those protein residues whose ASA is reduced by >1 Å² on complex formation with RNA, termed the interface residues. The set of interface residues in a single protein defines its nucleic acid binding site.
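The interface definition can be illustrated with a short sketch. It assumes that per-residue ASA values for the free and the complexed protein have already been obtained (for example, parsed from Naccess output) and are held in dictionaries keyed by a residue identifier. The function name and data layout are hypothetical, not part of Naccess.

```python
def interface_residues(asa_free, asa_complexed, min_loss=1.0):
    """Return residues losing more than min_loss square angstroms of ASA.

    asa_free: residue id -> ASA of the protein alone.
    asa_complexed: residue id -> ASA of the same residue in the complex.
    A residue missing from asa_complexed is treated as unchanged.
    """
    return sorted(
        res for res, free in asa_free.items()
        if free - asa_complexed.get(res, free) > min_loss
    )


# Toy values only, not taken from any structure in the dataset.
free = {("A", 45, "ARG"): 120.0, ("A", 46, "GLY"): 30.0, ("A", 47, "TYR"): 80.0}
bound = {("A", 45, "ARG"): 40.0, ("A", 46, "GLY"): 29.5, ("A", 47, "TYR"): 78.5}
binding_site = interface_residues(free, bound)
```

In the toy data, only the arginine and tyrosine lose more than 1 Å² and so form the binding site. The glycine loses 0.5 Å² and is excluded.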

An algorithm was used to calculate a series of parameters summarising the characteristics of the RNA and ssDNA binding sites of the protein. This was a modified version of the algorithm used to calculate the same parameters for protein–dsDNA complexes (25). The parameters calculated for each binding site included the size, polarity, interface sequence segmentation, numbers of intermolecular hydrogen bonds, the gap volume between the protein and the nucleic acid chain, and the number of water molecules forming hydrogen bond bridges between the protein and the nucleic acid. The definitions of these parameters are given in the legend to Table 3. The means and standard deviations for these parameters are shown for the protein–RNA non-homologous dataset, the protein–ssDNA non-homologous dataset and, for comparison, a dataset of 26 non-homologous protein–dsDNA complexes taken from our previous analysis (25) (Table 3). The means and standard deviations of the same parameters have also been calculated for the three protein–RNA subsets: viral proteins, proteins involved in protein synthesis and proteins involved in RNA modification (Table 4).

Table 3.
Protein interface properties for datasets of protein–RNA, protein–ssDNA and protein–dsDNA complexes
Table 4.
Protein interface properties for three functional subsets of protein–RNA complexes

Residue interface propensities were calculated for the non-homologous dataset of protein–RNA complexes. These propensities give a measure of the relative importance of different amino acid residues in the RNA binding site of the protein. Residue interface propensities were calculated for each amino acid type (AAj) as the fraction of ASA that AAj contributed to the RNA binding site compared with the fraction of ASA contributed to the remainder of the surface of the protein (equation 1).

Interface residue propensity AAj =

$$\frac{\sum_{i=1}^{N_i} \mathrm{ASA}_{AA_j}(i) \,\Big/\, \sum_{i=1}^{N_i} \mathrm{ASA}(i)}{\sum_{s=1}^{N_s} \mathrm{ASA}_{AA_j}(s) \,\Big/\, \sum_{s=1}^{N_s} \mathrm{ASA}(s)} \qquad (1)$$

where $\sum \mathrm{ASA}_{AA_j}(i)$ is the sum of the ASA (in the protein) of the amino acid residues of type j in the interface (the ASA of each type of residue is calculated without the RNA present); $\sum \mathrm{ASA}(i)$ is the sum of the ASA in the protein of all amino acid residues of all types in the interface (the ASA of each type of residue is calculated without the RNA present); $\sum \mathrm{ASA}_{AA_j}(s)$ is the sum of the ASA (in the protein) of the amino acid residues of type j on the protein surface (the surface being defined as those residues with >5% relative ASA in isolation); $\sum \mathrm{ASA}(s)$ is the sum of the ASA in the protein of all amino acid residues of all types on the protein surface. $N_i$ is the number of residues in the interface and $N_s$ is the number of residues on the protein surface, excluding the interface residues.
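As a minimal sketch of the propensity calculation, the function below takes one record per residue, giving its amino acid type, its ASA in the isolated protein and whether it lies in the interface or on the remaining surface. The record layout and function name are assumptions made for illustration. Buried residues (relative ASA of 5% or less) are assumed to have been removed beforehand.

```python
from collections import defaultdict


def interface_propensities(residues):
    """Compute the interface propensity of each amino acid type.

    residues: iterable of (aa_type, asa, region), where region is
    "interface" or "surface" (surface excludes interface residues).
    """
    iface = defaultdict(float)
    surf = defaultdict(float)
    for aa, asa, region in residues:
        if region == "interface":
            iface[aa] += asa
        elif region == "surface":
            surf[aa] += asa
        else:
            raise ValueError(f"unknown region: {region!r}")

    iface_total = sum(iface.values())
    surf_total = sum(surf.values())
    if iface_total <= 0 or surf_total <= 0:
        raise ValueError("both regions need a positive total ASA")

    propensities = {}
    for aa, surf_asa in surf.items():
        if surf_asa > 0:  # types absent from the surface are skipped
            iface_fraction = iface.get(aa, 0.0) / iface_total
            propensities[aa] = iface_fraction / (surf_asa / surf_total)
    return propensities


# Toy data: ARG supplies 60% of interface ASA but 25% of surface ASA.
toy = [("ARG", 60.0, "interface"), ("GLY", 40.0, "interface"),
       ("ARG", 50.0, "surface"), ("GLY", 150.0, "surface")]
toy_propensities = interface_propensities(toy)
```

In the toy data arginine has a propensity above 1 (enriched in the interface) and glycine a propensity below 1.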

A propensity of >1 indicates that a residue occurs more frequently in the interface than on the protein surface. Propensities for the protein–RNA dataset are shown compared with those of the protein–dsDNA dataset (25) (Fig. 2). Propensities were not calculated for the protein–ssDNA complexes as the dataset of non-homologous structures was too small.

Figure 2
Histogram of the interface residue propensities calculated for the protein–RNA complexes and compared to a dataset of protein–dsDNA complexes (25). A propensity of more than one denotes that a residue occurs more frequently in the ...

An internet resource

The protein–RNA interface parameters calculated here can be calculated for any protein–RNA complex using the protein–nucleic acid server on the World Wide Web (http://www.biochem.ucl.ac.uk/bsm/DNA/server). This tool allows the user to upload the three-dimensional coordinates of any protein–nucleic acid complex and receive back a report of its interface parameters. This server provides a simple means of comparing new complexes with those already known.

Analysis of atom–atom contacts

The non-homologous datasets of protein–RNA and protein–ssDNA complexes each contain relatively few members (20 and 3, respectively) (Tables 1 and 2). Hence for the atom–atom contact analysis, all the structures were used to extract a dataset of non-homologous intermolecular contacts. This method also ensures that if a complex contains interactions that are unique within a family, these interactions are not lost.

Intermolecular hydrogen bonds and van der Waals contacts were calculated for each protein–nucleic acid complex using HBPLUS (29). This algorithm locates proximal donor (D) and acceptor (A) atom pairs and calculates theoretical hydrogen atom (H) positions that fit geometrical criteria. The criteria used to define a hydrogen bond were H–A distance <2.7 Å, D–A distance <3.35 Å, D–H–A angle >90°. van der Waals contacts were defined as all contacts between atoms not involved in hydrogen bonds that were <3.9 Å apart. The algorithm GROW (30) was used to extract all the intermolecular protein–nucleic acid contacts from each complex.
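The distance and angle criteria quoted above can be written as a small check. This sketch applies only the three hydrogen bond thresholds and the 3.9 Å van der Waals cut-off given in the text. It is not a reimplementation of HBPLUS, which also places the theoretical hydrogen positions, and the function names are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle in degrees at point b formed by the points a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """H-A < 2.7 A, D-A < 3.35 A and D-H-A angle > 90 degrees."""
    return (math.dist(hydrogen, acceptor) < 2.7
            and math.dist(donor, acceptor) < 3.35
            and angle_deg(donor, hydrogen, acceptor) > 90.0)


def is_vdw_contact(atom_a, atom_b, hydrogen_bonded=False):
    """Atoms closer than 3.9 A that are not involved in a hydrogen bond."""
    return not hydrogen_bonded and math.dist(atom_a, atom_b) < 3.9
```

Coordinates are plain (x, y, z) tuples in ångströms. A linear donor–hydrogen–acceptor arrangement with a donor–acceptor distance of 2.9 Å passes all three hydrogen bond tests.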

For each family in each dataset a structural alignment was generated using CORA (31). Then from each alignment a set of non-homologous contacts (hydrogen bonds and van der Waals interactions) were extracted from the total set of interactions, using a method designed by N.M.Luscombe and J.M.Thornton (manuscript in preparation). In this process, if more than two structures used the same atoms from the same residue to contact the same atoms in the same nucleic acid base or backbone, only the contact from the highest resolution structure was retained. Not every contact was included as this would mean that a specific type of contact would occur multiple times in the dataset just because it was present in proteins that are members of a large family. When a protein was the only member of a family all its protein–nucleic acid interactions were included. In addition, a second filter was used in the case of van der Waals contacts. If a residue was involved in an intermolecular hydrogen bond, all the contacts from the atoms in that single residue were excluded from the set of van der Waals contacts. However, when nucleic acid bases were involved in intermolecular hydrogen bonds, contacts from atoms within the bases were included in the set of van der Waals contacts.
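The two filters can be sketched as below. The actual method is unpublished (manuscript in preparation), so this is only one plausible reading: contacts are treated as equivalent when they share the aligned residue position, protein atom, nucleic acid component and nucleic acid atom, and the copy from the best-resolution structure is kept. The record fields and function names are hypothetical.

```python
def nonhomologous_contacts(contacts, resolution):
    """Keep one copy of each equivalent contact within a family.

    contacts: list of dicts with keys "structure", "aligned_residue",
    "protein_atom", "component" (a base or backbone group) and
    "nucleic_atom". resolution maps a structure id to angstroms.
    """
    best = {}
    for contact in contacts:
        key = (contact["aligned_residue"], contact["protein_atom"],
               contact["component"], contact["nucleic_atom"])
        kept = best.get(key)
        # A lower value in angstroms means a higher resolution.
        if kept is None or (resolution[contact["structure"]]
                            < resolution[kept["structure"]]):
            best[key] = contact
    return list(best.values())


def filter_vdw(vdw_contacts, hbond_contacts):
    """Drop van der Waals contacts from residues that also hydrogen bond."""
    bonded = {(c["structure"], c["aligned_residue"]) for c in hbond_contacts}
    return [c for c in vdw_contacts
            if (c["structure"], c["aligned_residue"]) not in bonded]
```

A protein that is the only member of its family passes through the first filter unchanged, which matches the rule that all of its contacts are included.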

These observed contact distributions cannot be used to detect possible preferential contacts without the calculation of expected contact distributions. Such expected values can be generated by assessing the availability of protein residues and nucleic acid groups to make potential contacts, by calculating the average solvent accessibility of such groups. All the solvent accessibilities were calculated using Naccess (http://wolf.bms.umist.ac.uk/naccess). For the expected distribution of hydrogen bond contacts the average contribution to the accessible surface area made by polar atoms was calculated for each of the 20 amino acids from a dataset of 119 non-homologous monomeric proteins as used by N.M.Luscombe and J.M.Thornton (manuscript in preparation). The average accessible surface area contribution made by polar atoms in the bases and backbone components of the nucleic acids were also calculated from RNA molecules in the non-homologous dataset of protein–RNA complexes, and from the ssDNA molecules in the complete dataset of protein–ssDNA complexes. For the expected distributions of the van der Waals contacts average solvent accessibilities were calculated using all the atoms in the dataset of proteins, and in the bases and backbone of the nucleic acids in the RNA and ssDNA molecules, as before. All these values are given as additional material at http://www.biochem.ucl.ac.uk/bsm/RNA. Using the percentage ASA contributions for the protein residues and the nucleic acid components, expected contact distributions were calculated.
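The text does not give the formula for the expected distributions. One simple reading, assumed here purely for illustration, is that the total number of observed contacts is shared among residue and nucleic acid component pairs in proportion to the product of their fractional ASA contributions. The function name and inputs are hypothetical.

```python
def expected_contacts(total_observed, residue_asa_pct, component_asa_pct):
    """Expected contact counts under an independence assumption.

    residue_asa_pct: amino acid type -> ASA contribution (any scale).
    component_asa_pct: nucleic acid component -> ASA contribution.
    Both are normalised to fractions before use.
    """
    residue_total = sum(residue_asa_pct.values())
    component_total = sum(component_asa_pct.values())
    return {
        (res, comp): total_observed
        * (res_asa / residue_total)
        * (comp_asa / component_total)
        for res, res_asa in residue_asa_pct.items()
        for comp, comp_asa in component_asa_pct.items()
    }


# Toy contributions only, not the values from the supplementary material.
toy_expected = expected_contacts(
    100, {"ARG": 60.0, "GLY": 40.0}, {"G": 50.0, "backbone": 50.0})
```

Under this assumption the expected counts always sum to the number of observed contacts, so observed and expected tables can be compared cell by cell.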

The numbers of observed and expected contacts made by each type of base and each type of amino acid residue in the protein–RNA complexes are shown in Table 5. The contacts for the protein–ssDNA complexes and for a dataset of 131 protein–DNA complexes (N.M.Luscombe and J.M.Thornton, manuscript in preparation) are included as additional material at http://www.biochem.ucl.ac.uk/bsm/RNA.

Table 5.
Observed frequency distributions of (A) hydrogen bond contacts and (B) van der Waals contacts between the 20 amino acid residues and the components of RNA

In addition to this general survey, the contacts made by the 2′-hydroxyl group in the ribose of RNA molecules were also considered. The contacts made by the oxygen atom in this group (that is not present in the deoxyribose of DNA) were extracted from the non-homologous set of contacts, obtained as described above. The average solvent accessibility of this oxygen was also calculated from the RNA molecules in an uncomplexed state using Naccess (http://wolf.bms.umist.ac.uk/naccess.html).


The classifications used in Table 1 emphasise the diverse nature of RNA recognition by proteins. Each class of recognition site (groove binding and β-sheet binding) is observed with more than one type of RNA structure (single-stranded, single-stranded with single loop, single-stranded with multiple loops, double-stranded). The scene is further complicated by the proteins binding tRNAs, as these commonly have a domain exhibiting groove binding and another exhibiting β-sheet binding. The diversity of interactions is also evident when considering the functional groupings of the proteins. Although the proteins involved in protein synthesis all bind RNAs with single strands folded into multiple loops, both the viral proteins and the RNA modification proteins bind a number of different RNA structures, using both the groove and β-sheet modes of binding.

Nucleic acid binding site properties

The interface properties for protein–RNA, protein–ssDNA and protein–dsDNA are summarised in Table 3. Before a detailed comparative analysis is made it should be highlighted that the protein–ssDNA dataset comprises only three structures and hence the results shown may not be representative of such complexes in general.

The RNA binding sites ranged in size from 370 to 2422 Å², comprised between 3 and 24 sequence segments and included between 5 and 26 intermolecular hydrogen bonds (0.4–2.0 hydrogen bonds per 100 Å² of interface ASA). The binding sites comprised between 32 and 60% polar atoms. These large variations for many properties further emphasise the diverse nature of the binding sites.

In general the protein sites that bind RNA are slightly smaller than the dsDNA binding sites but have more sequence segments. The RNA binding sites also appear to be less polar than the dsDNA binding sites and less well packed. They show a similar number of intermolecular hydrogen bonds but on average only half the number of bridging water molecules. This last result, however, probably reflects the lower resolution of the structures in the protein–RNA dataset (mean resolution, excluding two NMR structures, is 2.6 Å) compared to the protein–dsDNA dataset (mean resolution 2.4 Å).

The poorer packing of the protein–RNA complexes (as indicated by the larger gap volume index in Table 3) may result from the complex secondary structures that the RNA molecules often form. Many of the interactions with protein occur at features such as bulges or stem–loops (16), where the second unpaired RNA sequence may restrict the very close approach of the protein. In enzymatic proteins that bind DNA, it was observed that they used an enveloping mode of binding, using a large interaction site to surround the DNA double helix (25). Such a mode would not be likely in protein–RNA structures when the RNA forms a complex tertiary structure.

The ssDNA binding sites are the smallest of the three types of complex and show considerably fewer intermolecular hydrogen bonds than either the protein–RNA or the protein–dsDNA complexes (Table 3). This could indicate non-specific binding of the DNA. However, they are better packed than the RNA complexes, as with only one strand of bases the protein can make a close approach without being restricted by the presence of a second strand of bases. In this light it is perhaps surprising that these structures are not better packed than the dsDNA complexes.

The comparison between the three functional subsets of the protein–RNA complexes reveals some interesting differences (Table 4). Those proteins involved in protein synthesis (principally the aminoacyl-tRNA synthetases) have RNA binding sites more than 1.5 times the size of the RNA modification complexes and twice the size of the viral complexes. These synthetase structures have large binding sites as they comprise at least two structural domains, one that interacts with the acceptor stem and one with the anticodon arm of the RNA (Fig. 1). These effectively form two separate RNA recognition sites. The viral proteins have the most polar and least well packed RNA binding sites. However, it should be considered that these complexes only include a small part of the RNA actually encapsulated in the viral structure. For example, in the case of the coat protein from BMV (PDB code 1BMV) only 20% of the packaged RNA is ordered and visible in the structure of the complex (32). Hence the full structures of the protein–vRNA complexes may reveal further interaction sites on the coat proteins with many weak interactions combining to form stable multi-site complexes.

Dividing the protein–RNA complexes into three sets (viral proteins, proteins involved in protein synthesis and proteins involved in RNA modification) effectively divides the complexes into those with RNA interactions that are (i) not sequence specific (excluding the MS2 coat protein complex), (ii) partially sequence specific and (iii) highly sequence specific. Hence the sequence-specific complexes appear to achieve their specificity through tight packing and relatively non-polar interfaces. It is surprising that these specific interactions do not feature more hydrogen bonds. RNA modification proteins have the least polar interaction sites but achieve the best packing with the RNA.

The residue interface propensities for the RNA binding sites are compared with those observed for dsDNA binding sites (25) (Fig. 2). For the protein–RNA complexes the highest propensities were observed for lysine, tyrosine, phenylalanine, isoleucine and arginine (in order of decreasing propensity). Hence, aromatic and positively charged amino acids play important roles. It is likely that the aromatics stack adjacent to the unpaired bases in the RNA molecules. The single aromatic amino acids also play key roles in protein–protein interfaces (28,33). In the protein–dsDNA complexes the highest propensities were observed for threonine, arginine, serine, asparagine and glycine (in order of decreasing propensity). The charged and polar residues play important roles in these complexes as they complement the negative charge on the DNA (25). The absence of aromatics reflects the helical dsDNA structure in which the faces of the bases are buried and not accessible for binding interactions.

Atom–atom contacts

In all three types of complex (protein–RNA, protein–ssDNA and protein–dsDNA) van der Waals contacts are significantly more common than hydrogen bond contacts. The van der Waals contacts represent 76.3, 92.6 and 92.2% of the total interactions in the protein–dsDNA, protein–ssDNA and protein–RNA complexes, respectively.

In the protein–RNA and protein–ssDNA complexes ~58% of the contacts made by the protein are to the bases of the nucleic acid, with the remainder made to the backbone. In the protein–dsDNA complexes the opposite trend is observed, with only 24% of contacts made to bases and the remainder to the backbone. This was to be expected, as in the former structures many of the nucleic acids are unpaired and are available to make both hydrogen bond and van der Waals contacts with protein residues. In the dsDNA complexes, the nucleic acids are tightly paired in the regular B-DNA structures and hence the bases are not easily accessible to interacting proteins, and many interactions occur through the backbone.

The observed distributions of hydrogen bond and van der Waals contacts made between protein residues and RNA components are shown with expected distributions in Table 5. These data have been used to create a composite table (Table 6). If a row or column total in Table 5 was twice the expected value, the base and residue preferences were included in Table 6 (items c and d). Similarly, if an individual table entry was five times the expected value the contact preference was included in Table 6 (item e). The same criteria were used to extract the atom–atom contact data from the protein–ssDNA and protein–dsDNA complexes included as supplementary material at http://www.biochem.ucl.ac.uk/bsm/RNA. The composite data in Table 6 do show some apparent preferences for specific bases, residues and nucleic acid–residue contacts.
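The selection rules for the composite table can be sketched as follows, assuming the observed and expected counts are held in dictionaries keyed by (residue, nucleic acid component). The thresholds are read here as "at least twice" and "at least five times" the expected value, which is an interpretation of the text, and the function name is hypothetical.

```python
from collections import defaultdict


def contact_preferences(observed, expected,
                        total_factor=2.0, entry_factor=5.0):
    """Flag residues, components and pairs contacted more than expected."""
    row_obs, row_exp = defaultdict(float), defaultdict(float)
    col_obs, col_exp = defaultdict(float), defaultdict(float)
    for (res, comp), exp in expected.items():
        obs = observed.get((res, comp), 0.0)
        row_obs[res] += obs
        row_exp[res] += exp
        col_obs[comp] += obs
        col_exp[comp] += exp

    def flagged(obs_totals, exp_totals):
        # Keep labels whose observed total reaches the threshold.
        return sorted(k for k in exp_totals
                      if exp_totals[k] > 0
                      and obs_totals[k] >= total_factor * exp_totals[k])

    pairs = sorted(k for k, exp in expected.items()
                   if exp > 0 and observed.get(k, 0.0) >= entry_factor * exp)
    return {"residues": flagged(row_obs, row_exp),
            "components": flagged(col_obs, col_exp),
            "pairs": pairs}


# Toy counts only, not the values in Table 5.
toy_observed = {("ARG", "G"): 10, ("ARG", "U"): 2,
                ("GLY", "G"): 1, ("GLY", "U"): 1}
toy_expected = {("ARG", "G"): 2, ("ARG", "U"): 2,
                ("GLY", "G"): 3, ("GLY", "U"): 3}
toy_preferences = contact_preferences(toy_observed, toy_expected)
```

In the toy counts, arginine (12 observed against 4 expected) and guanine (11 against 5) pass the row and column test, and the arginine–guanine entry (10 against 2) passes the individual entry test.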

Table 6.
Summary of the contact preferences shown by protein–RNA, protein–ssDNA and protein–dsDNA complexes

In protein–RNA interactions, the van der Waals contacts far outnumber the hydrogen bonding contacts. The proteins in these complexes show a preference to contact the purine guanine and the pyrimidine uracil, using both van der Waals contacts and hydrogen bonds. The proteins show a preference for the residues arginine, tyrosine and phenylalanine to be present in the RNA binding site.

A preference for hydrogen bonding contacts to guanine was also observed in the protein–dsDNA complexes, as was the preference for the residue arginine to be in the binding site. In the protein–ssDNA complexes, no preference was observed for contacts to any base, but a preference was observed for the residues methionine, phenylalanine, tryptophan, cysteine and serine to be present in the DNA binding sites. In both types of protein–DNA complex, van der Waals contacts were far more prevalent than hydrogen bonding contacts, as observed in the protein–RNA complexes. However, hydrogen bonding contacts made up a far larger share of the contacts in the protein–dsDNA complexes than in either the protein–RNA or protein–ssDNA complexes (24%, compared to 8 and 7%, respectively).

The ratios of the numbers of observed contacts made to the nucleic acid bases and backbone are shown in Table 7. This shows that in the RNA complexes, hydrogen bond contacts to the bases and the backbone are present in equal numbers, as observed in the protein–ssDNA complexes. This is in contrast to the protein–dsDNA complexes in which there are half the numbers of hydrogen bonds made to the bases compared to the backbone. This is most likely a result of the high numbers of unpaired bases in the RNA structures (and in the ssDNA). In both the protein–RNA and the protein–ssDNA complexes there are more than 1.5 times the number of van der Waals contacts made to the bases compared to the backbone. In contrast, in the protein–dsDNA complexes only a third of the van der Waals contacts are made to the bases.

Table 7.
The ratios of intermolecular hydrogen bond and van der Waals contacts made between the protein and the base/backbone component of the nucleic acid for the three datasets of protein–nucleic acid complexes: protein–RNA (complete ...

Of the 23 hydrogen bonds made between protein residues and the ribose sugar of the RNA, all were made by the oxygen atom of the 2′-hydroxyl group. Of the 308 van der Waals contacts made between the protein and the sugar, 105 (34%) were made by the oxygen atom of the 2′-hydroxyl group. Of the 21 hydrogen bonds between protein and dsDNA all were made by the O4 atom in the pentose sugar ring, whilst 285 (27%) of the van der Waals contacts were made by the C5 carbon in the ribose ring. The oxygen atoms in the 2′-hydroxyl groups in the RNA molecules are highly solvent exposed (mean ASA is 22.2 Å² in the current dataset) compared with the other oxygens in the sugar (O3*, O4*, O5* have mean ASAs of 7.36, 3.44 and 1.4 Å², respectively). The 2′-hydroxyl group can be both a hydrogen bond donor and an acceptor and hence can potentially interact with many amino acids of the protein. The protruding nature of the 2′-hydroxyl groups has already been observed in a number of structures including MS2 coat protein and the tRNA synthetases. It has been observed that in such structures there are key ribose groups that, when replaced by deoxyribose, greatly reduce the affinity of the RNA for the protein (34).


The current analysis presents a similar picture to that observed in DNA binding proteins, in that there is not a single archetypal RNA binding site. In the current dataset, the largest analysed in this way, there are 32 proteins, representing 14 structural families. When the predominant secondary structure element of each binding site was analysed, the sites were equally divided between α-helix and β-strand, with only one example of an αβ interface. The RNAs bound include elongated single-stranded, looped single-stranded, single-stranded with multiple loops and double-helix structures. The size and polarity of the RNA binding sites vary widely, as do the modes of recognition used by the protein and the RNA structures recognised. Thus, the picture presented is far more complicated than that of protein–DNA complexes (25).

Similar modes of secondary structure contacts are observed in proteins binding RNA to those that bind DNA (24). However, when looking more closely at amino acid preferences and base versus backbone contacts, similarities are much harder to find. The unpaired state of many of the bases in RNA structures means that they are more readily available to make contacts with amino acid residues than those in the tightly paired double helices of dsDNA. Hydrogen bond contacts to all parts of the RNA are far less common than in the protein–dsDNA complexes. The ratios of contacts made to the nucleic acid bases and the backbone (Table 7) show the differences between protein–RNA and protein–dsDNA complexes, and the similarities between the contacts made to RNA and ssDNA.

However, some trends do emerge from the contact data. It is evident that van der Waals interactions are more numerous in protein–RNA complexes than hydrogen bonds. A preference for proteins to make contacts with guanine was observed, and arginine, asparagine, phenylalanine, threonine and tyrosine occur in RNA binding sites more often than expected.

One of the features of the current work is the comparison of the observations made for protein–RNA complexes with those for protein–ssDNA and protein–dsDNA complexes. In terms of size, the protein–RNA complexes are intermediate between the two types of protein–DNA complexes, but they are the least well packed of all three types of complex. The poor packing of the protein–RNA complexes is a result of the complex tertiary structure that the RNA chains form. The atom contact analysis showed that the purine base guanine is preferentially contacted by proteins in both RNA and dsDNA structures.

One issue that has not been addressed here is conformational changes on binding. With the recent availability of additional protein–RNA complexes from the ribosome (12–15) it has become evident that almost every complex involves conformational changes in the protein, the RNA or both (35). For the protein it is frequently a case of a transition from an unstructured to a structured state of some part of the binding interface. For example, the structure of the L11 protein has two extended loops that are disordered in the absence of RNA but are defined structures in the complex (36).

Despite the recent addition of ribosomal subunit structures to the PDB and NDB (12–15) there are still relatively few characterised protein–RNA complex structures. Purification and crystallisation difficulties have meant that their presence in the databases lags behind that of the protein–DNA complexes. Many higher resolution structures, like those from the ribosome, are required before firmer conclusions can be drawn about the most common modes of interaction.

By looking for physical and structural features that characterise RNA binding sites on proteins, it may be possible to predict the location of such sites on proteins for which complexes have not yet been solved. This has successfully been achieved for protein–protein binding sites (37,38) using combinations of interface properties, including interface propensities. Knowing the characteristics of RNA binding sites may also be helpful in designing novel RNA binding proteins.
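As an illustration of how interface propensities might feed into such a prediction, the sketch below ranks candidate surface patches by their mean residue propensity. It assumes a log-ratio definition of propensity (the natural log of a residue type's share of interface ASA divided by its share of total surface ASA); that definition, the function names and every number are assumptions made for illustration, not the data or procedure of this study.

```python
import math


def interface_propensities(interface_asa, surface_asa):
    """ln(share of interface ASA / share of surface ASA) per residue type."""
    total_interface = sum(interface_asa.values())
    total_surface = sum(surface_asa.values())
    propensities = {}
    for residue, asa in surface_asa.items():
        iface = interface_asa.get(residue, 0.0)
        if iface <= 0.0 or asa <= 0.0:
            continue  # the log ratio is undefined, so skip this residue type
        propensities[residue] = math.log(
            (iface / total_interface) / (asa / total_surface)
        )
    return propensities


def patch_score(patch_residues, propensities):
    """Mean propensity of a patch; residue types without a value count as 0."""
    if not patch_residues:
        return 0.0
    total = sum(propensities.get(res, 0.0) for res in patch_residues)
    return total / len(patch_residues)


# Hypothetical ASA totals in square angstroms, invented for this example.
interface_asa = {"ARG": 300.0, "PHE": 120.0, "ASP": 30.0, "LEU": 50.0}
surface_asa = {"ARG": 1000.0, "PHE": 400.0, "ASP": 1200.0, "LEU": 1400.0}
props = interface_propensities(interface_asa, surface_asa)

# Hypothetical surface patches, each listed by residue type.
patches = {
    "patch_A": ["ARG", "PHE", "ARG"],
    "patch_B": ["ASP", "LEU", "ASP"],
}
ranked = sorted(
    patches, key=lambda name: patch_score(patches[name], props), reverse=True
)
print(ranked)
```

A positive propensity means a residue type is over-represented in interfaces relative to the protein surface as a whole. A practical predictor would combine this score with other interface properties rather than rely on it alone.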


We would like to acknowledge the support of all those involved in the Nucleic Acid Database (NDB), and thank Gabriele Varani for helpful discussions. This work was carried out with funding from the Department of Energy (USA) (grant number DE-FG02096ER62166.A000). This is a publication from the BBSRC Bloomsbury Centre for Structural Biology.


1. Moore P.B. (1998) The three-dimensional structure of the ribosome and its components. Annu. Rev. Biophys. Biomol. Struct., 27, 35–58. [PubMed]
2. Ramakrishnan V. and White,S.W. (1998) Ribosomal protein structures: insights into the architecture, machinery and evolution of the ribosome. Trends Biochem. Sci., 23, 208–212. [PubMed]
3. Luhrmann R., Kastner,B. and Bach,M. (1990) Structure of spliceosomal snRNP’s and their role in pre-mRNA splicing. Biochim. Biophys. Acta, 1087, 265–292. [PubMed]
4. Tarasow T.M. and Eaton,B.E. (1998) Dressed for success: realising the catalytic potential of RNA. Biopolymers, 48, 29–37.
5. Scott W.G. and Klug,A. (1996) Ribozymes: structures and mechanism in RNA catalysis. Trends Biochem. Sci., 21, 220–224. [PubMed]
6. Scott W.G. (1998) RNA catalysis. Curr. Opin. Struct. Biol., 8, 720–726. [PubMed]
7. Moras D. (1992) Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol., 2, 138–142.
8. Varani G. and Nagai,K. (1998) RNA recognition by RNP proteins during RNA processing. Annu. Rev. Biophys. Biomol. Struct., 27, 407–445. [PubMed]
9. Steitz T.A. (1990) Structural studies of protein nucleic-acid interaction: the sources of sequence specific binding. Q. Rev. Biophys., 23, 205–210. [PubMed]
10. Harrison S.C. (1991) A structural taxonomy of DNA-binding domains. Nature, 353, 715–719. [PubMed]
11. Luscombe N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein–DNA complexes. Genome Biol., 1, 1–37. [PMC free article] [PubMed]
12. Wimberly B.T., Brodersen,D.E., Clemons,W.M., Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Structure of the 30S ribosomal subunit. Nature, 407, 327–339. [PubMed]
13. Agalarov S.C., Prasad,G.S., Funke,P.M., Stout,C.D. and Williamson,J.R. (2000) Structure of the S15,S18-rRNA complex: assembly of the 30S ribosome central domain. Science, 288, 107–112. [PubMed]
14. Schluenzen F., Tocilj,A., Zarivach,R., Harms,J., Gluehmann,M., Janell,D., Bashan,A., Bartels,H., Agmon,I., Franceschi,F. and Yonath,A. (2000) Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell, 102, 615–623. [PubMed]
15. Berman H.M., Olson,W.K., Beveridge,D.L., Westbrook,J., Gelbin,A., Demeny,T., Hsieh,S.H., Srinivasan,A.R. and Schneider,B. (1992) The nucleic-acid database: a comprehensive relational database of 3-dimensional structures of nucleic acids. Biophys. J., 63, 751–759. [PMC free article] [PubMed]
16. Nagai K. (1996) RNA–protein complexes. Curr. Opin. Struct. Biol., 6, 53–61. [PubMed]
17. Steitz T.A. (1999) RNA recognition by proteins. In Gesteland,R.F., Cech,T.R. and Atkins,J.F. (eds), The RNA World. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 427–450.
18. Mattaj I.W. (1993) RNA recognition: a family matter? Cell, 73, 837–840. [PubMed]
19. Nagai K. (1992) RNA–protein interactions. Curr. Opin. Struct. Biol., 2, 131–137.
20. Guzman R.N., Turner,R.B. and Summers,M.F. (1998) Protein–RNA recognition. Biopolymers, 48, 181–195. [PubMed]
21. Draper D.E. (1995) Protein–RNA recognition. Annu. Rev. Biochem., 64, 593–620. [PubMed]
22. Cusack S. (1999) RNA–protein complexes. Curr. Opin. Struct. Biol., 9, 66–73. [PubMed]
23. Arnez J.G. and Moras,D. (1997) Structural and functional considerations of the aminoacylation reaction. Trends Biochem. Sci., 22, 211–216. [PubMed]
24. Draper D.E. (1999) Themes in RNA–protein recognition. J. Mol. Biol., 293, 255–270. [PubMed]
25. Jones S., van Heyningen,P., Berman,H.M. and Thornton,J.M. (1999) Protein–DNA interactions: a structural analysis. J. Mol. Biol., 287, 877–896. [PubMed]
26. Berman H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242. [PMC free article] [PubMed]
27. Taylor W.R. and Orengo,C.A. (1989) Protein structure alignment. J. Mol. Biol., 208, 1–22. [PubMed]
28. Jones S. and Thornton,J.M. (1996) Principles of protein–protein interactions. Proc. Natl Acad. Sci. USA, 93, 13–20. [PMC free article] [PubMed]
29. McDonald I.K. and Thornton,J.M. (1994) Satisfying hydrogen-bonding potential in proteins. J. Mol. Biol., 238, 777–793. [PubMed]
30. Milburn D., Laskowski,R.A. and Thornton,J.M. (1998) Sequences annotated by structure: a tool to facilitate the use of structural information in sequence analysis. Protein Eng., 11, 855–859. [PubMed]
31. Orengo C.A. (1999) CORA—Topological fingerprints for protein structural families. Protein Sci., 8, 699–715. [PMC free article] [PubMed]
32. Chen Z.G., Stauffacher,C., Li,Y.G., Schmidt,T., Bomu,W., Kamer,G., Shanks,M., Lomonossoff,G. and Johnson,J.E. (1989) Protein–RNA interactions in an icosahedral virus at 3.0 angstroms resolution. Science, 245, 154–159. [PubMed]
33. Argos P. (1988) An investigation of protein subunit and domain interfaces. Protein Eng., 2, 101–113. [PubMed]
34. Talbot S.J., Goodman,S., Bates,S.R.E., Fishwick,C.W.G. and Stockley,P.G. (1990) Use of synthetic oligonucleotides to probe RNA–protein interactions in the MS2 translational operator complex. Nucleic Acids Res., 18, 3521–3528. [PMC free article] [PubMed]
35. Williamson J.R. (2000) Induced fit in RNA–protein recognition. Nature Struct. Biol., 7, 834–837. [PubMed]
36. Wimberly B.T., Guymon,R., McCutcheon,J.P., White,S.W. and Ramakrishnan,V. (1999) A detailed view of a ribosomal active site: the structure of the L11–RNA complex. Cell, 97, 491–502. [PubMed]
37. Jones S. and Thornton,J.M. (1997) Analysis of protein–protein interaction sites using surface patches. J. Mol. Biol., 272, 121–132. [PubMed]
38. Jones S. and Thornton,J.M. (1997) Prediction of protein–protein interaction sites using patch analysis. J. Mol. Biol., 272, 133–143. [PubMed]
39. Fauchere J. and Pliska,V. (1983) Hydrophobic parameters of amino acid side chains from the partitioning of N-acetyl amino acid amides. Eur. J. Med. Chem., 18, 369–375.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press