Nucleic Acids Res. Jan 2011; 39(Database issue): D277–D282.
Published online Nov 11, 2010. doi: 10.1093/nar/gkq1108
PMCID: PMC3013700

PRIDB: a protein–RNA interface database

Abstract

The Protein–RNA Interface Database (PRIDB) is a comprehensive database of protein–RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein–RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein–RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein–RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein–RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.

INTRODUCTION

Protein–RNA interactions play critical roles in myriad and diverse biological processes, including many recently discovered regulatory functions, in addition to well-studied roles in protein synthesis, DNA replication, regulation of gene expression and defense against pathogens (1–9). Despite their importance, structures of protein–RNA complexes have proven difficult to obtain using experimental structure determination methods; such structures constitute only ~1% of structures in the Protein Data Bank (PDB) (10). For this reason, several computational methods for predicting the interfaces in protein–RNA complexes have been developed (11–21). Virtually all such methods rely on information about structurally characterized protein–RNA complexes and their interfaces.

PRIDB is a repository of protein–RNA interface information derived from structures in the PDB. PRIDB is designed to facilitate detailed analyses of individual protein–RNA complexes of interest and rapid identification of interfacial atoms and residues in both the protein and RNA chains of a chosen complex or user-defined set of complexes. In addition, PRIDB can be used to generate data sets of protein–RNA interfaces for machine learning applications, such as the generation of classifiers for predicting interfaces in protein–RNA complexes for which high-resolution structures are not available.

Related databases/servers

To our knowledge, only one other up-to-date and comprehensive online repository of protein–RNA interfaces is currently available: Biological Interaction Database for Protein-Nucleic Acid (BIPA) (22). BIPA provides a list of protein–RNA (and protein–DNA) complexes from the PDB and displays RNA-binding residues within the linear primary sequence of a chosen protein, or within a multiple sequence alignment of related RNA-binding proteins. PRIDB complements BIPA by providing atomic- and residue-level interfacial information for both the RNA and protein chains of complexes, providing previously published reduced-redundancy data sets and allowing users to make advanced queries and compile custom data sets. Other collections of protein–RNA complexes and related resources include NDB (http://ndbserver.rutgers.edu/) (23), PRID (http://www-bioc.rice.edu/~shamoo/prid.html) (24), RsiteDB (http://bioinfo3d.cs.tau.ac.il/RsiteDB/) (25), w3DNA (http://w3dna.rutgers.edu/) (26), NPIDB (http://monkey.belozersky.msu.ru/NPIDB) (27), ProNIT (http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html) (28) and the RNP Databases (http://rnp.uthct.edu/index.html/). Several excellent databases of protein–DNA interfaces are also available, including PDIdb (http://melolab.org/pdidb/) (29) and hPDI (http://bioinfo.wilmer.jhu.edu/PDI/).

DATABASE CONTENTS

Data extraction, interface definition and motif identification

Atomic coordinate information for all 926 protein–RNA complexes in the Protein Data Bank (PDB) on 10 October 2010 was extracted using the PDB's REST API advanced search interface. To generate this comprehensive data set (rRB926), no filters based on sequence redundancy, structure resolution or other criteria were applied (see ‘Non-redundant benchmark data sets’ below). The complex structures in rRB926 were then scanned to identify interacting amino acids and ribonucleotides using two different definitions: (i) a simple distance-based definition, in which a given amino acid residue (AA) in a protein chain is defined as interacting with a ribonucleotide (rNT) in an RNA chain if any atom in the AA is within a 5-Å radius of any atom in the rNT; and (ii) a rule-based definition based on that of Allers and Shamoo (30), in which interactions involving specific AAs and rNTs are classified as van der Waals, hydrogen-bonding, hydrophobic or electrostatic. All such interacting AAs and rNTs are defined as ‘interface’ residues.
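
The distance-based definition (i) can be illustrated with a short sketch. It is not PRIDB's Perl implementation; the `Atom` record, the function names and the choice to count a distance exactly equal to the cutoff as a contact are assumptions made for illustration. The `cutoff` parameter mirrors the user-adjustable cutoff offered by the ‘Advanced Search’ function.

```python
from itertools import product
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record; residues are keyed by (chain, residue number)."""
    chain: str
    resnum: int
    name: str
    x: float
    y: float
    z: float


def distance_interface(protein_atoms, rna_atoms, cutoff=5.0):
    """Return the set of ((chain, resnum), (chain, resnum)) residue pairs in contact.

    An amino acid and a ribonucleotide are in contact if ANY pair of their atoms
    lies within `cutoff` angstroms (distance <= cutoff, an assumed convention).
    All atom pairs are compared, which is O(n*m); a grid or k-d tree scales
    better for large complexes such as ribosomes.
    """
    cutoff_sq = cutoff * cutoff
    pairs = set()
    for p, r in product(protein_atoms, rna_atoms):
        d_sq = (p.x - r.x) ** 2 + (p.y - r.y) ** 2 + (p.z - r.z) ** 2
        if d_sq <= cutoff_sq:
            pairs.add(((p.chain, p.resnum), (r.chain, r.resnum)))
    return pairs


def interface_residues(pairs):
    """Split contact pairs into the protein and RNA interface residue sets."""
    return {p for p, _ in pairs}, {r for _, r in pairs}
```

Comparing squared distances avoids a square root per atom pair; insertion codes and alternate locations are ignored in this sketch.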

ProSite patterns and profiles (31) appearing in any of the protein sequences in the database were retrieved using the ScanProsite REST service (32). RNA structural motifs were identified in RNA sequences using FR3D’s (33) pure symbolic search function; specific motif definitions used for these scans are available in the Tutorial and FAQs section of the PRIDB online server.
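
ScanProsite performs the motif scan server-side. To show what a PROSITE pattern match means, the sketch below translates common pattern syntax into a Python regular expression and reports 1-based match positions. It covers only `x`, `[..]`, `{..}`, `(n)`/`(n,m)` repeats and the `<`/`>` terminal anchors; PROSITE profiles (scored matrices) are not covered, and the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'C-x(2,4)-C' into a Python regex.

    Covers the common syntax only: x, [..], {..}, (n) and (n,m) repeats, and
    the '<' / '>' terminal anchors. Anchors inside brackets (e.g. '[G>]') are
    not handled.
    """
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any amino acid except those listed
        # Single letters and [..] classes are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        regex.append(prefix + core + suffix)
    return "".join(regex)


def find_pattern(sequence, pattern):
    """Return 1-based (start, end) positions of non-overlapping matches."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(prosite_to_regex(pattern), sequence)]
```

For example, the pattern `C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.` becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`. Because `re.finditer` returns non-overlapping matches, overlapping motif occurrences would need a different scan.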

Non-redundant benchmark data sets

Because PRIDB is intended to be a comprehensive collection of protein–RNA complexes from the PDB, the rRB926 data set was not filtered on the basis of redundancy, structure determination method, resolution or protein/RNA chain length. While it is possible to filter with such criteria using PRIDB’s advanced search function, several pre-calculated benchmark data sets, which have been filtered to limit redundancy and to exclude low-resolution structures, are also provided for the user’s convenience. These include two previously published data sets, RB109 (17,34) and RB147 (35), as well as a larger, more recently extracted data set (RB199) (B. Lewis, submitted for publication). Complete lists of the PDB IDs for protein–RNA complexes in these data sets, in addition to the pre-calculated interface residue statistics, can be readily accessed from the ‘Datasets’ section of the PRIDB homepage.

Implementation and availability

PRIDB runs on the Apache 2.2 web server, using MySQL 14.14 as a database backend with AJAX and PHP 5 for user interface functions. Functions not requiring use of the database (e.g. calculating interface residues for a user-submitted complex) are implemented using standalone Perl 5 scripts and the BioPerl module (36). All PRIDB code is available on request under the Creative Commons Attribution Non-Commercial License. All data currently in PRIDB were obtained from databases or programs that impose no restrictions on academic use.

PRIDB summary statistics

As summarized in Table 1, the current version of PRIDB contains structural information for a total of 926 protein–RNA complexes available in the PDB as of 10 October 2010. These structures contain 9689 protein chains, which comprise only 1174 unique sequences. While this would seem to indicate that most sequences in the database are repeated several times, this is not the case: 395 of the 1174 (34%) sequences appear only once, and 899 (77%) appear fewer than eight times (the ‘expected’ average redundancy). This disparity is due to the large proportion of ribosomal structures in the PDB (and, by extension, in PRIDB); 9 of the 10 most abundant sequences, each present in more than 70 structures, are ribosomal proteins. The most abundant sequence, repeated more than 100 times, is that of the TRP-responsive attenuation protein, a protein for which numerous multimeric structures have been solved.
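
The redundancy statistics above can be reproduced from a list of chain sequences with a few lines of code. The sketch below is illustrative only: the input list and the function name are hypothetical, and the default threshold (mean copies per unique sequence, rounded to the nearest integer) is an assumption chosen to match the ‘expected’ redundancy described in the text.

```python
from collections import Counter


def redundancy_summary(chain_sequences, threshold=None):
    """Summarize how often each distinct chain sequence recurs in a data set.

    By default, `threshold` is the mean number of copies per unique sequence
    rounded to the nearest integer, i.e. the 'expected' average redundancy.
    """
    counts = Counter(chain_sequences)
    n_chains, n_unique = len(chain_sequences), len(counts)
    mean_copies = n_chains / n_unique
    if threshold is None:
        threshold = round(mean_copies)
    return {
        "chains": n_chains,
        "unique": n_unique,
        "mean_copies": mean_copies,
        "singletons": sum(1 for c in counts.values() if c == 1),
        "below_threshold": sum(1 for c in counts.values() if c < threshold),
        "most_common": counts.most_common(1)[0],
    }
```

For the PRIDB counts reported above, 9689/1174 ≈ 8.25 copies per unique sequence, giving a rounded threshold of 8; the ratios 395/1174 ≈ 34% and 899/1174 ≈ 77% match the percentages in the text.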

Table 1.
PRIDB contents: complexes and chains

As shown in Table 2, PRIDB currently contains 1 475 774 amino acid residues. Based on a 5-Å distance cutoff definition for interfacial residues, 397 216 of these residues interact with RNA; of 851 853 ribonucleotide residues in PRIDB, 322 858 interact with protein. On average, 28% of the amino acids in the RNA-binding proteins directly interact with RNA, and 38% of the ribonucleotides in the bound RNAs directly interact with protein. As with the chain statistics, these averages are skewed by the prevalence of ribosome structures; ribosomal proteins account for ~90% of interacting amino acid residues and ~60% of interacting nucleotides.

Table 2.
PRIDB summary statistics

USER INTERFACE

PRIDB provides a ‘Tutorial and FAQs’ section with detailed instructions on using PRIDB’s web interface; key capabilities of PRIDB are briefly described here. Using the ‘Basic Search’ function, users can retrieve information about protein–RNA complexes by PDB ID or keyword. Using the ‘Advanced Search’ function, users can filter results by specifying:

  • the experimental method used to determine the complex structure (e.g. X-ray diffraction, nuclear magnetic resonance);
  • a resolution range or threshold (for structures determined using X-ray diffraction, electron microscopy or fiber diffraction);
  • the minimum or maximum length of protein or RNA chains within the complex;
  • an amino acid or nucleotide subsequence found within the sequence of at least one of the protein or RNA chains in the complex; and
  • a motif (as defined by ProSite for protein chains or FR3D for RNA chains) found within at least one chain in the complex.

The ‘Advanced Search’ function also allows users either to specify a different distance cutoff for the distance-based interaction definition or to choose the alternative rule-based definition.
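
The filtering criteria listed above can be sketched as an in-memory filter over per-complex metadata. This is not PRIDB's MySQL-backed query code: the `ComplexRecord` fields and the `advanced_search` function are hypothetical, the motif filter is omitted, and length and subsequence filters are assumed to be satisfied when at least one chain of the relevant type meets them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComplexRecord:
    """Hypothetical per-complex metadata; field names are illustrative only."""
    pdb_id: str
    method: str                   # e.g. "X-RAY DIFFRACTION" or "SOLUTION NMR"
    resolution: Optional[float]   # None when no resolution is reported (e.g. NMR)
    protein_seqs: List[str] = field(default_factory=list)
    rna_seqs: List[str] = field(default_factory=list)


def advanced_search(records, method=None, max_resolution=None,
                    min_protein_len=None, max_rna_len=None, rna_subsequence=None):
    """Return IDs of records passing every filter that was supplied.

    Length and subsequence filters are ASSUMED to be satisfied when at least
    one chain of the relevant type meets them.
    """
    hits = []
    for rec in records:
        if method is not None and rec.method != method:
            continue
        if max_resolution is not None and (
                rec.resolution is None or rec.resolution > max_resolution):
            continue
        if min_protein_len is not None and not any(
                len(s) >= min_protein_len for s in rec.protein_seqs):
            continue
        if max_rna_len is not None and not any(
                len(s) <= max_rna_len for s in rec.rna_seqs):
            continue
        if rna_subsequence is not None and not any(
                rna_subsequence in s for s in rec.rna_seqs):
            continue
        hits.append(rec.pdb_id)
    return hits
```

Filters left as `None` are skipped, so combining criteria narrows the result set just as adding conditions to a query would.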

As shown in Figure 1, when viewing search results, PRIDB provides:

  • a summary of each complex with basic information (name, resolution and structure determination method) and a link to its PDB entry;
  • a linear display of the amino acid and nucleotide residues in each chain of each complex, with residues in the protein–RNA interface highlighted;
  • a display of residues (in red font) that are part of a protein or RNA motif, with information about that motif (and a link back to its source) provided on mouse-over;
  • a JMol applet for 3D visualization of each complex, with interacting amino acid and nucleotide residues colored (Figure 2A); and
  • a link to a dynamically generated file containing atomic-level interface information for each result in a machine-readable format (Figure 2B).

Figure 2.
(A) PRIDB provides a JMol applet for visualizing and manipulating interfaces within 3D structures. (B) PRIDB output can be downloaded as a CSV file.

Figure 1.
Sample PRIDB output. Amino acid residues and ribonucleotides highlighted in yellow are located in the protein–RNA interface; residues in red font are part of a ProSite or FR3D motif.

In addition to providing machine-readable results files for all searches, pre-computed results files for the non-redundant RB109, RB147 and RB199 data sets described above have been made available. These files, along with the complete PRIDB database (rRB926), can be downloaded from the ‘Datasets’ section of the website. Users can also generate a machine-readable list of interface residues for an arbitrary collection of complexes by inputting a list of PDB IDs. Results files contain one line per pair of interacting atoms, identifying each atom by chain name, residue number and atom name and giving the distance between them.
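
Because each results line describes one atom pair, residue-level interfaces can be recovered by grouping lines. The sketch below assumes a hypothetical seven-column CSV layout (protein chain, residue, atom; RNA chain, residue, atom; distance); the actual column order should be checked against the header of a downloaded PRIDB file. A stricter `cutoff` can be applied to files generated with a larger one.

```python
import csv
import io

# ASSUMED column order for one atom-pair line; check it against the header of
# a real PRIDB results file before use.
FIELDS = ["prot_chain", "prot_res", "prot_atom",
          "rna_chain", "rna_res", "rna_atom", "distance"]


def residue_contacts(csv_text, cutoff=5.0):
    """Collapse atom-pair lines into residue pairs with their closest atom distance."""
    closest = {}
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS):
        try:
            d = float(row["distance"])
        except (TypeError, ValueError):
            continue  # skip a header line or a malformed row
        if d > cutoff:
            continue
        key = ((row["prot_chain"], row["prot_res"]),
               (row["rna_chain"], row["rna_res"]))
        closest[key] = min(d, closest.get(key, d))
    return closest
```

Residue numbers are kept as strings so that any insertion codes survive unchanged.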

Users can also use PRIDB to calculate interface residues for protein–RNA complexes that are not in the PDB by submitting a structure file in PDB format. A results file containing interface residues (calculated using PRIDB’s 5-Å cutoff) is returned by e-mail.
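
Preparing a user-supplied PDB-format file for this calculation begins with reading its coordinate records. The sketch below parses ATOM/HETATM lines by the fixed column positions of the PDB format and separates protein from RNA atoms. It is not PRIDB's BioPerl code; it assumes that only the 20 standard amino acids and the ribonucleotides A, C, G and U are of interest, so modified residues, water, ions and ligands are dropped, and alternate locations and insertion codes are ignored.

```python
from typing import NamedTuple

AMINO_ACIDS = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"}
RIBONUCLEOTIDES = {"A", "C", "G", "U"}


class PdbAtom(NamedTuple):
    chain: str
    resnum: int
    resname: str
    name: str
    x: float
    y: float
    z: float


def parse_pdb_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(PdbAtom(
            chain=line[21],
            resnum=int(line[22:26]),
            resname=line[17:20].strip(),
            name=line[12:16].strip(),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def split_protein_rna(atoms):
    """Keep standard amino acids and A/C/G/U ribonucleotides; drop everything else."""
    protein = [a for a in atoms if a.resname in AMINO_ACIDS]
    rna = [a for a in atoms if a.resname in RIBONUCLEOTIDES]
    return protein, rna
```

The resulting protein and RNA atom lists can then be passed to any distance-based contact routine using the 5-Å cutoff.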

CONCLUSIONS AND FUTURE DIRECTIONS

PRIDB provides researchers with atomic- and residue-level information about structures of protein–RNA complexes and their interfaces, facilitating analyses of protein–RNA interactions by pre-computing commonly used information and by providing structural information both interactively onscreen and in a machine-readable format. It allows users to rapidly identify and visualize interfaces in protein–RNA complexes on a residue-by-residue basis and displays identified ProSite or FR3D motifs along with the amino acid or ribonucleotide sequences. PRIDB can be used to generate custom data sets of protein–RNA interfaces for statistical analyses and machine learning applications. The PRIDB server also provides pre-calculated benchmark data sets of protein–RNA complexes for evaluating the performance of interface prediction methods. PRIDB will be updated regularly as new structures are released through the PDB, and is intended to be a stable resource for researchers in the field of protein–RNA interactions.

Future versions of PRIDB will include additional protein and RNA motifs from other sources, such as PRINTS (37), PIRSF (38) and other InterPro (39) member databases. In addition, the current JMol 3D visualization capabilities will be extended to user-submitted structures, allowing for more facile manipulation and examination of interfaces in complexes not currently in the PDB.

FUNDING

National Institutes of Health (GM066387 to V.H. and D.D.); the National Science Foundation [IGERT0504304 (to D.D.); GK120947929 (to B.A.L.); NIBIB-NSF0608769 (to V.H., J.F. and C.Z.)]; Iowa State University’s Center for Integrated Animal Genomics (to B.A.L. and D.D.); Center for Computational Intelligence, Learning and Discovery (to V.H.). Funding for open access charge: Center for Computational Intelligence, Learning and Discovery.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank members of our research groups for helpful discussions and especially Usha Muppirala for critical comments on the PRIDB server and manuscript.

REFERENCES

1. Fabian MR, Sonenberg N, Filipowicz W. Regulation of mRNA translation and stability by microRNAs. Annu. Rev. Biochem. 2010;79:351–379.
2. Hogan DJ, Riordan DP, Gerber AP, Herschlag D, Brown PO. Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biol. 2008;6:e255.
3. Licatalosi DD, Darnell RB. RNA processing and its regulation: global insights into biological networks. Nat. Rev. Genet. 2010;11:75–87.
4. Lorkovic ZJ. Role of plant RNA-binding proteins in development, stress response and genome organization. Trends Plant Sci. 2009;14:229–236.
5. Lukong KE, Chang KW, Khandjian EW, Richard S. RNA-binding proteins in human genetic disease. Trends Genet. 2008;24:416–425.
6. Lunde BM, Moore C, Varani G. RNA-binding proteins: modular design for efficient function. Nat. Rev. Mol. Cell Biol. 2007;8:479–490.
7. Mansfield KD, Keene JD. The ribonome: a dominant force in co-ordinating gene expression. Biol. Cell. 2009;101:169–181.
8. Mittal N, Roy N, Babu MM, Janga SC. Dissecting the expression dynamics of RNA-binding proteins in posttranscriptional regulatory networks. Proc. Natl Acad. Sci. USA. 2009;106:20300–20305.
9. Mohammad MM, Donti TR, Sebastian Yakisich J, Smith AG, Kapler GM. Tetrahymena ORC contains a ribosomal RNA fragment that participates in rDNA origin recognition. EMBO J. 2007;26:5048–5060.
10. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28:235–242.
11. Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Prediction of protein-RNA binding sites by a random forest method with combined features. Bioinformatics. 2010;26:1616–1622.
12. Murakami Y, Spriggs RV, Nakamura H, Jones S. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences. Nucleic Acids Res. 2010;38(Suppl.):W412–W416.
13. Perez-Cano L, Fernandez-Recio J. Optimal protein-RNA area, OPRA: a propensity-based method to identify RNA-binding sites on proteins. Proteins. 2010;78:25–35.
14. Maetschke SR, Yuan Z. Exploiting structural and topological information to improve prediction of RNA-protein binding sites. BMC Bioinformatics. 2009;10:341.
15. Shazman S, Mandel-Gutfreund Y. Classifying RNA-binding proteins based on electrostatic properties. PLoS Comput. Biol. 2008;4:e1000146.
16. Wang L, Huang C, Yang MQ, Yang JY. BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Syst Biol. 2010;4(Suppl. 1):S3.
17. Terribilini M, Lee JH, Yan C, Jernigan RL, Honavar V, Dobbs D. Prediction of RNA binding sites in proteins from amino acid sequence. RNA. 2006;12:1450–1462.
18. Wang L, Brown SJ. Prediction of RNA-binding residues in protein sequences using support vector machines. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2006;1:5830–5833.
19. Towfic F, Caragea C, Gemperline DC, Dobbs D, Honavar V. Struct-NB: predicting protein-RNA binding sites using structural features. Int. J. Data Min. Bioinform. 2010;4:21–43.
20. Kumar M, Gromiha MM, Raghava GP. SVM based prediction of RNA-binding proteins using binding residues and evolutionary information. J. Mol. Recognit. 2010 doi:10.1002/jmr.1061.
21. Wang CC, Fang Y, Xiao J, Li M. Identification of RNA-binding sites in proteins by integrating various sequence information. Amino Acids. 2010 doi:10.1007/s00726-010-0639-7.
22. Lee S, Blundell TL. BIPA: a database for protein-nucleic acid interaction in 3D structures. Bioinformatics. 2009;25:1559–1560.
23. Berman HM, Olson WK, Beveridge DL, Westbrook J, Gelbin A, Demeny T, Hsieh SH, Srinivasan AR, Schneider B. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 1992;63:751–759.
24. Morozova N, Allers J, Myers J, Shamoo Y. Protein-RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures. Bioinformatics. 2006;22:2746–2752.
25. Shulman-Peleg A, Nussinov R, Wolfson HJ. RsiteDB: a database of protein binding pockets that interact with RNA nucleotide bases. Nucleic Acids Res. 2009;37:D369–D373.
26. Zheng G, Lu XJ, Olson WK. Web 3DNA–a web server for the analysis, reconstruction, and visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res. 2009;37:W240–W246.
27. Spirin S, Titov M, Karyagina A, Alexeevski A. NPIDB: a database of nucleic acids-protein interactions. Bioinformatics. 2007;23:3247–3248.
28. Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006;34:D204–D206.
29. Norambuena T, Melo F. The Protein-DNA Interface database. BMC Bioinformatics. 2010;11:262.
30. Allers J, Shamoo Y. Structure-based analysis of protein-RNA interactions using the program ENTANGLE. J. Mol. Biol. 2001;311:75–86.
31. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2010;38:D161–D166.
32. de Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006;34:W362–W365.
33. Sarver M, Zirbel CL, Stombaugh J, Mokdad A, Leontis NB. FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. J. Math. Biol. 2008;56:215–252.
34. Terribilini M, Lee JH, Yan C, Jernigan RL, Carpenter S, Honavar V, Dobbs D. Identifying interaction sites in ‘recalcitrant’ proteins: predicted protein and RNA binding sites in rev proteins of HIV-1 and EIAV agree with experimental data. Pac. Symp. Biocomput. 2006:415–426.
35. Terribilini M, Sander JD, Lee JH, Zaback P, Jernigan RL, Honavar V, Dobbs D. RNABindR: a server for analyzing and predicting RNA-binding sites in proteins. Nucleic Acids Res. 2007;35:W578–W584.
36. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12:1611–1618.
37. Attwood TK, Bradley P, Flower DR, Gaulton A, Maudling N, Mitchell AL, Moulton G, Nordle A, Paine K, Taylor P, et al. PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 2003;31:400–402.
38. Wu CH, Nikolskaya A, Huang H, Yeh LS, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, et al. PIRSF: family classification system at the protein information resource. Nucleic Acids Res. 2004;32:D112–D114.
39. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 2009;37:D211–D215.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press