Copyright © 2008 Biomedical Informatics Publishing Group

Structural segments and residue propensities in protein-RNA interfaces: Comparison with protein-protein and protein-DNA complexes

Department of Biochemistry and Bioinformatics Centre, Bose Institute, Kolkata 700054, India

*Corresponding author: E-mail: pinak/at/boseinst.ernet.in; Phone: 91 33 2355 0256; Fax: 91 33 2355 3886

Received May 23, 2008; Revised June 19, 2008; Accepted July 7, 2008.

This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.

Abstract

The interface of a protein molecule that is involved in binding another protein, DNA or RNA has been characterized in terms of the number of unique secondary structural segments (SSSs),
made up of stretches of helix, strand and non-regular (NR) regions. On average 10-11 segments define the protein interface in protein-protein (PP) and protein-DNA (PD) complexes, while the
number is higher (14) for protein-RNA (PR) complexes. While the length of helical segments in PP interaction increases with the interface area, this is not the case in PD and PR complexes.
The propensities of residues to occur in the three types of secondary structural elements (SSEs) in the interface relative to the corresponding elements in the protein tertiary structures
have been calculated. Arg, Lys, Asn, Tyr, His and Gln are preferred residues in PR complexes; in addition, Ser and Thr are also favoured in PD interfaces.

Keywords: protein-protein interactions, protein-DNA interactions, protein-RNA interactions, binding interface, protein secondary structure

Abbreviations

PP - protein-protein, PD - protein-DNA, PR - protein-RNA, SSE - secondary structural element, SSS - secondary structural segment

Background

Characterization of protein-protein (PP), protein-DNA (PD) and protein-RNA (PR) interactions is essential for understanding the mechanisms of biological processes on a molecular level. Interactions
are highly specific and any distortion may be deleterious to the cellular function. Various experimental techniques have been employed to identify the interactions [1], with X-ray crystallography and NMR
spectroscopy providing the most detailed view. The atomic coordinates of the complexes stored in the Protein Data Bank (PDB) [2] have been analyzed to derive information on the physicochemical features of
the interface formed between the two components. PP interactions [3-5] have attracted the most attention. These can vary in strength: some are obligatory (permanent), as seen in the formation of
the quaternary structures, while others are non-obligatory, in which the individual protomers exist independently in a stable form [6]; the time scale of interaction can vary widely, from ~10⁻³ to 10³ s
(transient to stable complexes, exemplified by electron transfer in redox proteins and antigen-antibody complexes, respectively). Studies of PD interactions have aimed at unravelling the sequence specificity
of nucleotide recognition [7-11]. In comparison, studies of PR interactions have been fewer in number, as data were scarce until recently [12-14]. Most of the PD complexes contain double-stranded DNA, whereas
the RNA is usually single stranded, though in a few cases, depending on the sequence and length, it may fold into stem-loop structures that include double-helical segments. Akin to the non-obligatory PP complexes,
PD and PR complexes are mostly transient, forming only when the protein encounters the nucleic acid, and exhibit a wide range of stability and lifetimes. With the increase in our understanding of protein structure
and interactions, attempts are now geared towards synthetic biology for designing receptors for proteins and nucleic acids [15]. In this connection it is important to know what types of secondary structures are
used in the interface and how residue usage there compares with the rest of the protein structure. In this article these features are derived for PD and PR interfaces and compared to those observed in PP complexes [16].

Methodology

The list of 128 protein-DNA complexes used has been given in [11]. A search of the PDB [2] (August, 2007) yielded 381 hits for the query “protein-RNA complex”. The list of entries was culled using PISCES [17], such
that the maximum pairwise sequence identity was 25% and the resolution was not worse than 3.0 Å. The minimum chain length was kept at 40 residues for the protein part and at 3 bases for the RNA. For this non-redundant dataset of
50 protein-RNA complexes, the information on the biologically relevant assembly was obtained from the Nucleic Acid Database (NDB) [18] (since many PDB files have coordinates only for the crystallographic asymmetric unit,
which may contain just a part of the whole molecule).

The protein secondary structural elements (SSEs) were assigned using DSSP [19]. Only three types of SSEs were considered. All helices (with DSSP notations ‘H’ and ‘G’) were included irrespective of their type; ‘E’
and ‘B’ constituted strands; turns (‘T’ and ‘S’) and the unclassified residues (with assignment ‘ ’) together formed the non-regular (NR) region. Based on the presence of interface residues in distinct SSEs along the chain,
the interface can be split into secondary structural segments (SSSs): a segment is specified by the span between the two extreme locations of the interface residues on that SSE (with or without intervening non-interface residues)
[16]. The propensity (P_i)_SSE of a residue i to occur in a given SSE was calculated using formula (1), given in the supplementary material.

Results and discussion

Basic RNA-binding module and the interface area

Among the 50 RNA-binding proteins (Table 1, supplementary material) many are multimeric, each having distinct recognition sites which are structurally equivalent. Any one of them can be assumed to be the basic unit that gets
repeated. We define this basic unit as one RNA-binding module (akin to what we have done for protein-DNA complexes [11]). The basic RNA-binding module that has been constructed can be repeated (by the application of simple symmetry
operators) to generate the complete biological assembly. Thus for a homodimeric molecule (such as 1ooa), only one subunit interacting with the RNA was considered. In some other cases with more than two identical protein-RNA units
(as in 2gic, where five identical protein chains bind symmetrically to five individual RNA strands), only one protein chain complexed with one RNA was considered. Five of the complexes in the dataset are coat
proteins or nucleocapsids of viruses and bacteriophages. These are huge complexes (e.g., 2fz2) formed by the application of a number of symmetry operators to a simple protein-RNA unit. For such complexes too, one subunit
of the protein with one strand of the RNA was considered. Of the 50 complexes, 42 had the protein monomer binding to single-stranded RNA, and the rest to double-stranded RNA.

The interface area is given by the sum of the accessible surface areas of the two isolated components minus that of the complex. This is the area that gets buried between the two components, which usually contribute almost equally
[4,5]. The average interface area in PR complexes is comparable to that observed in PD and PP complexes (Table 2, supplementary material), though there is a larger variation around the mean. This is expected, as the length of
RNA located in the interface varies considerably (range: 3 to 37) among the different structures.

Secondary structural segments in the interface

Data presented in Table 2 (see supplementary material) indicate that there is not much distinction between the numbers of SSSs present in PP and PD interfaces, even when the value is normalized for a fixed size (1000 Å²) of the
interface. However, both these numbers are higher for PR interfaces. When the three types of SSS (helix, strand and NR) are considered individually, the numbers are also higher for PR than for PD and PP interfaces. In contrast,
the average lengths of the SSSs remain more or less the same in the three categories.

Variation of SSS length with interface size

The majority of the PP complexes have an interface with an area of 1600±400 Å², which has been termed the standard size [4]. The variation of the segment lengths as a function of the interface size has also been addressed [16].
It was found that the average helix length doubles from ~4 residues when the area increases ten-fold from 500 Å²; however, such changes were not observed for strand and NR segments. In comparison, in PD complexes (Figure 1a
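The segment definition used throughout (the span between the two extreme interface residues on one SSE, after reducing DSSP codes to helix, strand and NR) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names, the 0-based indexing, and the choice to treat each maximal run of one class as a single SSE are assumptions, as is mapping any DSSP code not named in the text (such as 'I') to NR.

```python
from itertools import groupby

HELIX, STRAND, NR = "helix", "strand", "NR"


def three_state(code):
    """Reduce a DSSP code to helix / strand / NR as described in the text."""
    if code in ("H", "G"):
        return HELIX
    if code in ("E", "B"):
        return STRAND
    # 'T', 'S', ' ' per the text; other codes (e.g. 'I') land here by assumption.
    return NR


def interface_segments(dssp, in_interface):
    """Return one (type, start, end) tuple per SSS, with inclusive 0-based indices.

    dssp         -- string of per-residue DSSP codes for one chain
    in_interface -- sequence of booleans, True where the residue is in the interface
    """
    if len(dssp) != len(in_interface):
        raise ValueError("dssp and in_interface must have the same length")
    states = [three_state(c) for c in dssp]
    segments = []
    pos = 0
    # Assumption: each maximal run of one class is treated as one SSE.
    for state, run in groupby(states):
        n = len(list(run))
        hits = [i for i in range(pos, pos + n) if in_interface[i]]
        if hits:
            # Span between the two extreme interface residues on this SSE;
            # non-interface residues in between are included in the segment.
            segments.append((state, hits[0], hits[-1]))
        pos += n
    return segments


def segment_lengths(segments):
    """Length in residues of each segment."""
    return [end - start + 1 for _, start, end in segments]


# Toy chain: 14 residues, interface residues at positions 2, 5, 8 and 13.
example_dssp = "SHHHHHHTTEEEES"
example_flags = [i in {2, 5, 8, 13} for i in range(len(example_dssp))]
example_segments = interface_segments(example_dssp, example_flags)
print(example_segments)
print(segment_lengths(example_segments))
```

In the toy chain the helix contributes one segment of length 4 (positions 2 to 5, including two non-interface residues), the strand contributes none, and two NR stretches contribute one-residue segments each. Averaging such lengths per SSE class over a dataset gives the kind of quantity compared against interface size above.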
Secondary structure preferences of interface residues

Calculation of the propensities of residues to occur in an SSE in the PP interface relative to the same element in the overall protein tertiary structure indicated that Arg and the aromatic residues are observed more often in all interface
SSEs [16]. In PD complexes (Figure 2a
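Formula (1) is given only in the supplementary material and is not reproduced here. As a rough illustration of what such a propensity measures, the sketch below uses a simple ratio of fractional compositions: the fraction of residue type i among interface residues of one SSE class, divided by its fraction among all residues of that class in the protein structures. This ratio form is an assumption and may differ from formula (1) (some studies, for instance, take the logarithm of the ratio); the function name and the input layout are hypothetical as well.

```python
from collections import Counter


def sse_propensity(interface_residues, all_residues):
    """Ratio-form propensity of each residue type within one SSE class.

    interface_residues -- residue names found in interface parts of the SSE class
    all_residues       -- residue names found in that SSE class in the whole
                          structures (assumed to include the interface ones)
    """
    iface = Counter(interface_residues)
    whole = Counter(all_residues)
    n_iface = sum(iface.values())
    n_whole = sum(whole.values())
    if n_iface == 0 or n_whole == 0:
        raise ValueError("both residue collections must be non-empty")
    propensity = {}
    for res, count_whole in whole.items():
        frac_iface = iface.get(res, 0) / n_iface   # composition in the interface
        frac_whole = count_whole / n_whole         # composition in the full SSE class
        propensity[res] = frac_iface / frac_whole
    return propensity


# Toy helix data: 10 helical residues overall, 4 of them in the interface.
helix_all = ["ARG"] * 2 + ["LEU"] * 6 + ["LYS"] * 2
helix_interface = ["ARG", "ARG", "LYS", "LEU"]
for name, value in sorted(sse_propensity(helix_interface, helix_all).items()):
    print(f"{name}: {value:.2f}")
```

With this ratio form, a value above 1 means the residue type is over-represented in the interface part of that SSE class relative to the class as a whole, and a value below 1 means it is under-represented.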
Conclusion

A non-redundant dataset of PR complexes has been created. This and a similar dataset of PD complexes [11] have been analyzed in terms of the SSSs that constitute the protein part of the interface. PP complexes can bury a larger surface area by the
use of longer helical segments [16]. However, in PR complexes the SSS length is rather invariant (Figure 1
Supplementary material: Data 1 (65K, pdf)

References

1. Shoemaker BA, Panchenko AR. Plos Comp Biol. 2007;3:337.
2. Berman HM, et al. Nucleic Acids Res. 2000;28:235.
3. Jones S, Thornton JM. Proc Natl Acad Sci USA. 1996;93:13.
4. Lo Conte L. J Mol Biol. 1999;285:2177.
5. Chakrabarti P, Janin J. Proteins. 2002;47:334.
6. Nooren M, Thornton JM. EMBO J. 2003;22:3486.
7. Nadassy K, et al. Biochemistry. 1999;38:1999.
8. Jones S, et al. J Mol Biol. 1999;287:877.
9. Luscombe NM, et al. Genome Biol. 2000;1:001.1.
10. Sarai A, Kono H. Annu Rev Biophys Biomol Struct. 2005;34:379.
11. Biswas S, et al. Proteins. 2008
12. Jones S, et al. Nucleic Acids Res. 2001;29:943.
13. Treger M, Westhof E. J Mol Recogn. 2001;14:199.
14. Ellis JJ, et al. Nuc Acids Res. 2007;66:903.
15. Endy D. Nature. 2005;438:449.
16. Guharoy M, Chakrabarti P. Bioinformatics. 2007;23:1909.
17. Wang G, Dunbrack RL. Bioinformatics. 2003;19:1589.
18. Berman HM, et al. Biophys J. 1992;63:751.
19. Kabsch W, Sander C. Biopolymers. 1983;22:2577.
20. Chakrabarti P, Pal D. Prog Biophys Mol Biol. 2001;76:1.