PLoS One. 2012; 7(6): e38805.
Published online 2012 Jun 22. doi:  10.1371/journal.pone.0038805
PMCID: PMC3382195

Local Structural Differences in Homologous Proteins: Specificities in Different SCOP Classes

Franca Fraternali, Editor

Abstract

The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable for designing and developing methods to compare, model and predict protein structures. These approaches to structure analysis apply directly to the study of protein function and to drug design. The backbone of a protein structure favours certain local conformations, including α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Blocks (PBs) form one such Structural Alphabet, which approximates local structure with an average root mean square deviation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be used to improve structure comparison tools that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in regions comprising turns and loops. Interestingly, these differences show specific preferences depending on the structural class of the proteins. Such class-specific preferences are mainly seen in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preferred indel sites also seem to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops that join regular secondary structures and mediate a reversal in chain direction.
Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.

Introduction

The three-dimensional structure of a protein provides precise details on its functional properties, such as ligand binding or catalysis [1], [2]. Protein structures can also serve as specific drug targets, and structure-based drug design has been quite successful. Functional properties can be studied by comparing related structures. Analysing similarities (or variations) in structural features among related proteins demands efficient means of comparing protein folds. Structure diverges less rapidly than sequence, so structure-based alignments remain reliable even when proteins are only distantly related [3], [4], [5], [6], [7], [8], [9].

Most structure comparison methods treat protein folds as rigid bodies and quantify structural similarity from an average of atomic distances calculated using backbone coordinates. However, certain regions of a protein structure can be prone to variations, which arise from structural flexibility or evolutionarily acquired changes. These variations can either be restricted to local regions of the backbone or involve large movements that alter the conformational state of the protein. Unlike conformational alterations caused by large flexible movements, local backbone changes are unlikely to be affected by the nature of the global fold. Hence the preferences associated with variations in backbone conformation can be extracted as a general feature.

Evolutionary information has been used to explore preferences in amino acid replacements based on empirical approaches [10], [11], [12]. The structural contexts of amino acid substitutions, involving secondary structure and solvent accessibility, have also been studied [13], [14], [15], [16], [17], [18], [19], [20]. Nevertheless, the precise local structural changes that occur remain to be understood. Apart from local conformational changes, insertions and deletions (indels) seem to play a major role in protein evolution [7], [21], [22], [23], [24]. Studies of indels in the context of secondary structure suggested that loops are more tolerant to indels than regular secondary structural regions, and that a significant percentage of indels are disordered [7], [25], [26], [27], [28], [29], [30], [31]. Inserted regions tend to be short [30], and hydrophobic amino acids were found to be less frequent in inserted regions [32]. A more detailed analysis of the effect of insertions on flanking regions has also been carried out; insertions were found to break regular secondary structures or to alter the tertiary structure [33].

To study the preferences in local conformational variations among homologous proteins, a good understanding of the frequent backbone conformations is necessary. The local backbone conformation of a protein chain is usually described in terms of α-helices and β-strands, yet more than 50% of the backbone is assigned to the coil state, which reflects irregularity in the backbone. Later, more precise and comprehensive studies led to the identification of other repeating conformations [34]. The most important of these are the β-turns, which cover about 25%–30% of residues [35], [36], [37], [38], [39], [40], [41]. Of the 9 types of β-turns categorized on the basis of φ/ψ dihedrals, types I and II are the most common of the specifically defined types, representing 31.6% and 10.4% of all turns (i.e., 10% and 4% of all residues). Type IV comprises turns that could not be assigned to any other type under the standard definitions, and it has the largest representation, about 43% [42], [43].
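The φ/ψ-based typing of β-turns described above can be sketched in Python. This is a minimal illustration that assumes the four residues have already been identified as a turn (for example by a Cα(i) to Cα(i+3) distance test, not shown). It uses textbook ideal dihedrals for types I, I', II, II' and VIII. The tolerance rule (each angle within 30° of ideal, with one angle allowed up to 40°) is an assumption here, not necessarily the exact definition used in [42], [43]. Cis-proline types VIa and VIb are omitted, and any unmatched turn is labelled IV, mirroring its role as the catch-all type.

```python
# Textbook ideal (phi, psi) values, in degrees, for residues i+1 and i+2.
# Cis-proline types (VIa1, VIa2, VIb) are deliberately left out of this sketch.
IDEAL_TURNS = {
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "I'": ((60.0, 30.0), (90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
    "II'": ((60.0, -120.0), (-80.0, 0.0)),
    "VIII": ((-60.0, -30.0), (-120.0, 120.0)),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_turn(phi1, psi1, phi2, psi2, tol=30.0, one_off=40.0):
    """Assign a beta-turn type from the dihedrals of residues i+1 and i+2.

    Assumes the four residues already qualify as a turn. Every angle must lie
    within `tol` of the ideal value, except that one angle may deviate by up
    to `one_off` (an assumed convention). Turns matching no listed type fall
    into the catch-all type IV.
    """
    observed = (phi1, psi1, phi2, psi2)
    for name, ((p1, s1), (p2, s2)) in IDEAL_TURNS.items():
        diffs = [angle_diff(o, r) for o, r in zip(observed, (p1, s1, p2, s2))]
        n_loose = sum(d > tol for d in diffs)
        if n_loose <= 1 and max(diffs) <= one_off:
            return name
    return "IV"


print(classify_turn(-65.0, -25.0, -85.0, 5.0))   # prints I
print(classify_turn(55.0, -130.0, -75.0, 10.0))  # prints II'
```

The wrap-around in `angle_diff` matters because dihedrals near ±180° are close to each other even though their numeric difference is large.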

Structural Alphabets (SAs) provide a more precise and different view of the favorable backbone conformations. An SA is a library of a limited number of local backbone conformations used to approximate the fold of a complete protein chain [44], [45], [46], [47], [48], [49], [50], [51], [52], [53]. An SA consisting of 16 prototypes called Protein Blocks (PBs) was developed in our laboratory [44], [54]. Each PB represents a pentapeptide backbone conformation described as a series of φ, ψ dihedrals, and each PB is labeled by a letter from a to p (Figure 1). This SA gives a reasonable approximation of local protein 3D structures, with a root mean square deviation (rmsd) of about 0.42 Å [54]. The PB description has been used in several bioinformatics approaches, including modeling and structure prediction [44], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71]. Figure 2 shows practical examples of the association of different PBs with regular secondary structures, and Table 1 summarizes this relationship using PROMOTIF [42] based secondary structure assignment.

Figure 1
PBs series of φ,ψ backbone dihedral angles.
Figure 2
Association examples of PBs with secondary structural elements.
Table 1
Association of PB with secondary structures.

As with the study of amino acid substitutions that occur during evolution, preferred local structural changes can be analysed with the help of PBs. This idea was extended to the comparison of protein structures. Approximating protein structures in terms of an SA transforms 3D information into 1D. Thus the 3D superposition of protein structures can be carried out through an alignment of sequences encoded in terms of the SA [67], [72]. A specialized PB substitution matrix (SM) was developed for this purpose [73]. The PB-based structure alignment approach performed better than many other available tools for structure comparison [67], [74].

In this study we analyse the preferences for the conservation of local backbone conformations with the help of the Protein Block abstraction. Initially, we analyse the pattern of PB substitutions and the effect of solvent accessibility on it. Here, we restrict our analysis to equivalent structural regions found among families of related folds. This knowledge can be used to improve structure comparison tools that work on similarities in local backbone or fragment conformations. As secondary structure content and topology vary between structural classes of proteins (as defined by SCOP [75]), we check whether changes in local pentapeptide conformations show class-specific preferences. If so, we also test whether class-specific PB substitution matrices improve the alignment of structures represented as PB sequences. We also study the preferred local backbone conformations associated with insertion sites. Throughout the study, we associate the PB description of backbone conformation with different secondary structure assignments, to present a different view of the results.

Methods

Protein Blocks

Protein Blocks (PBs) are a set of 16 prototypes of main-chain conformation, each 5 residues long. The pentapeptide backbone conformation is described in terms of φ, ψ dihedral angles. The 16 prototypes are labeled from a to p (Figure 1). They were generated using an unsupervised classifier related to Kohonen Maps [76] and hidden Markov models. Protein Blocks give a reasonable approximation of local structures in proteins [44], with an average root mean square deviation (rmsd) of 0.42 Å [54]. PBs were assigned [54] using in-house Python software similar to that used in the iPBA web server [77].
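PB assignment can be illustrated with a minimal sketch. It assumes the usual PB scheme [44], [54], in which each pentapeptide is described by 8 dihedrals and assigned to the prototype with the smallest root mean square deviation on angular values (RMSDA). The two reference vectors below are toy values for an idealized helix and strand, standing in for PBs m and d. They are not the published 16 PB references. Labelling the two residues at each chain end 'Z' is a convention chosen only for this example.

```python
import math

# TOY stand-ins, NOT the published PB reference angles: idealized alpha-helix
# (phi=-60, psi=-47) and beta-strand (phi=-120, psi=135) values repeated over
# the 8 dihedrals of a pentapeptide window, in this order:
# psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2)
TOY_REFERENCES = {
    "m": [-47.0, -60.0] * 4,
    "d": [135.0, -120.0] * 4,
}


def angle_diff(a, b):
    """Signed angular difference wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmsda(window, reference):
    """Root mean square deviation on angular values, in degrees."""
    sq = [angle_diff(a, r) ** 2 for a, r in zip(window, reference)]
    return math.sqrt(sum(sq) / len(sq))


def dihedral_window(phi, psi, i):
    """The 8 dihedrals describing the pentapeptide centred on residue i."""
    return [psi[i - 2], phi[i - 1], psi[i - 1], phi[i], psi[i],
            phi[i + 1], psi[i + 1], phi[i + 2]]


def assign_blocks(phi, psi, references=TOY_REFERENCES):
    """Label each residue with the closest reference block ('Z' at chain ends)."""
    n = len(phi)
    if n < 5:
        return "Z" * n
    labels = []
    for i in range(2, n - 2):
        window = dihedral_window(phi, psi, i)
        labels.append(min(references, key=lambda lab: rmsda(window, references[lab])))
    return "ZZ" + "".join(labels) + "ZZ"


print(assign_blocks([-60.0] * 8, [-47.0] * 8))  # prints ZZmmmmZZ
```

A side effect of the window layout is that the undefined φ of the first residue and ψ of the last residue are never read.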

Figure 2 highlights the correspondence between PBs and regular secondary structures assigned by DSSP (Dictionary of Secondary Structure of Proteins) [43]. PBs m and d are the prototypes for the central regions of the α-helix and β-strand, respectively. PBs a through c primarily represent the N-cap of β-strands, while e and f correspond to their C-caps. These N- and C-caps can also include loop regions leading to or arising from a secondary structural element. PBs p, a, f, h, g and i are often seen in transition regions between secondary structural elements. Figure 2A–C presents examples of the association of PB structures with the secondary structure definition, while Table 1 gives a detailed list of these relationships, extracted from a subset of the PALI (Phylogeny and ALIgnment of homologous protein structures) [78] dataset generated using a sequence identity cut-off of 40%. Figure 2 also highlights some of the frequently occurring PB-PB transitions. PBs g through j are largely associated with coils, PBs k and l are frequent at the N-cap of α-helices, and PBs n to p at their C-caps.

Dataset

The dataset of protein structure alignments used in this study is a recent version of the PALI dataset, V 2.8a [78], [79], [80]. It consists of 1,922 domain families comprising 231,000 domain pairs aligned using MUSTANG [81]. The domains are classified according to SCOP definitions [75]. SCOP classifies domain structures into four major classes. The all-α class consists of proteins with mainly α-helical content, while all-β proteins are composed mainly of strands. The α/β class contains both helices and strands mixed in the structure, whereas in the α+β class they are segregated.

PB Substitution Matrix

Only domain pairs in the PALI database that were solved at a resolution better than 2 Å and share less than 40% sequence identity were used to obtain the substitution frequencies. This corresponds to 5,223 domain alignment pairs from 476 families. The pairwise structural alignments were first represented as PB sequence alignments. PB pairs occurring in structurally conserved regions (within 3 Å) were counted to calculate the substitution frequencies. As in our previous work [72], the method presented by Johnson et al. [82] was adopted for calculating log-odds scores from raw frequencies:

equation image

where S(i,j) is the substitution weight, N(i,j) is the raw substitution frequency between PB i and PB j, and M is the total number of different PBs (i.e., 16).
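Converting raw counts into scores can be illustrated with a minimal sketch. It uses the standard observed-over-expected log-odds form, S(i,j) = ln[f(i,j) / (q(i) q(j))], where f(i,j) = N(i,j) / ΣN and q(i) is the marginal frequency of PB i. This form is an assumption: the exact normalization of the Johnson et al. [82] method may differ. The sketch also assumes a symmetric M x M count matrix. The optional pseudocount is added only to avoid ln 0 for pairs that are never observed.

```python
import math


def log_odds(counts, pseudocount=0.0):
    """Log-odds substitution scores from a symmetric M x M matrix of pair counts.

    S[i][j] = ln( f_ij / (q_i * q_j) ), where f_ij is the observed frequency of
    the aligned pair (i, j) and q_i is the marginal frequency of block i.
    A positive score means the pair is seen more often than expected by chance.
    """
    m = len(counts)
    n = [[counts[i][j] + pseudocount for j in range(m)] for i in range(m)]
    total = sum(sum(row) for row in n)
    q = [sum(n[i]) / total for i in range(m)]  # marginal (background) frequencies
    return [[math.log((n[i][j] / total) / (q[i] * q[j])) for j in range(m)]
            for i in range(m)]


# Toy 2 x 2 example; real input would be the 16 x 16 PB pair counts.
scores = log_odds([[8, 2], [2, 8]])
print(round(scores[0][0], 3), round(scores[0][1], 3))  # prints 0.47 -0.916
```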

Structural Superposition Based on PBs

Protein structures to be aligned were first represented as PB sequences. These sequences were aligned with the Smith-Waterman dynamic programming algorithm [83], using the PB substitution scores. A gap penalty of −5.0 was used for alignment [67]. Profit version 3.1 [84] was used to obtain a least-squares fit of the two protein structures based on the PB sequence alignment. The amino acid sequence alignment corresponding to the PB alignment was given to Profit as input to define the aligned residue pairs. The fit was performed on the aligned residue pairs, and the root mean square deviation (rmsd) was calculated.
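The alignment step can be sketched as a plain Smith-Waterman local alignment over PB strings. This is a minimal sketch that assumes a linear gap penalty of −5.0 per gap position, since the text gives a single penalty value. A hypothetical `toy_score` function stands in for the actual PB substitution matrix [73]. The returned index pairs play the role of the aligned residue pairs that are passed to the least-squares fit.

```python
def smith_waterman(a, b, score, gap=-5.0):
    """Local alignment of two PB strings with a linear gap penalty.

    Returns the best local score and the aligned (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def toy_score(x, y):
    """HYPOTHETICAL scores standing in for the PB substitution matrix."""
    return 4.0 if x == y else -1.0


print(smith_waterman("xxmmmmyy", "mmmm", toy_score))
```

A local rather than global alignment lets conserved cores be matched even when the termini or large loops of the two domains differ.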

Test Dataset for Alignments

The gain in superposition quality (quantified as the difference in rmsd) obtained with the class-specific PB substitution matrices was assessed on a smaller dataset. From each SCOP superfamily in the PALI dataset with two or more families, two families were chosen at random, and from each of these families a domain pair sharing less than 40% sequence identity was chosen. This dataset represents 1,050 domains (comprising 188,760 residues) from 263 families.

Clustering Based on Substitution Data

To compare the PB substitution patterns, pairwise correlation coefficients were calculated between the substitution scores associated with each PB. These values were subtracted from 1 to obtain a distance matrix for hierarchical clustering. The hclust function of the ‘R’ software (http://www.r-project.org/) was used to cluster the PBs based on this distance matrix.
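The 1 − r distance and the clustering step can be sketched as follows. The text does not say which linkage was used. R's hclust defaults to complete linkage, so a naive complete-linkage procedure is assumed here. The three score profiles in the usage lines are toy data, not PB substitution scores.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(profiles):
    """d(i, j) = 1 - r(i, j): 0 for identical patterns, 2 for opposite ones."""
    k = len(profiles)
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(k)] for i in range(k)]


def complete_linkage(dist, labels):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, height)."""
    members = {i: [i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        keys = list(members)
        best = None
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = keys[x], keys[y]
                # Complete linkage: cluster distance is the largest pairwise distance.
                h = max(dist[i][j] for i in members[a] for j in members[b])
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        merges.append((tuple(labels[i] for i in members[a]),
                       tuple(labels[i] for i in members[b]), h))
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


toy_profiles = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1]]
for merge in complete_linkage(correlation_distance(toy_profiles), ["a", "b", "c"]):
    print(merge)
```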

Secondary Structure Assignment

The secondary structure types associated with the PBs were identified with the help of assignments made by DSSP [43], SEGNO [85] and PROMOTIF [42].

PB Accessibility

A PB is considered solvent accessible if at least 3 of the 5 residues it covers are accessible to the solvent. NACCESS [86] was used to calculate the accessibility of each residue. Relative solvent accessibility cut-offs of 7%, 15% and 25% were used to identify buried residues.
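The accessibility rule can be written down directly. This is a minimal sketch that assumes per-residue relative accessibilities (in %) have already been parsed, for instance from NACCESS output. It also assumes that a residue counts as exposed only when its value is strictly above the cut-off; this boundary convention is an assumption. The example data show how the label of a window can change between the 7% and 15% cut-offs.

```python
def pb_accessible(rsa_window, cutoff=15.0, min_exposed=3):
    """True if at least `min_exposed` of the 5 residues exceed the RSA cutoff (%)."""
    if len(rsa_window) != 5:
        raise ValueError("a PB spans exactly 5 residues")
    # Assumed convention: strictly above the cutoff means exposed.
    return sum(rsa > cutoff for rsa in rsa_window) >= min_exposed


def classify_chain(rsa, cutoff=15.0):
    """Label every complete pentapeptide window (centred on residue i)."""
    return ["exposed" if pb_accessible(rsa[i - 2:i + 3], cutoff) else "buried"
            for i in range(2, len(rsa) - 2)]


# Relative accessibilities (%) assumed to be parsed already, e.g. from NACCESS.
rsa = [50.0, 50.0, 12.0, 10.0, 1.0, 1.0, 1.0]
for cut in (7.0, 15.0, 25.0):
    print(cut, classify_chain(rsa, cut))
```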

Locating Indels

Structural alignments of domain pairs sharing less than 80% sequence identity were extracted from PALI. If a continuous stretch of n gaps is flanked by aligned regions at least 3 residues long (each aligned residue pair within 3 Å), that position is considered a point of insertion/deletion.
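The indel-locating rule can be sketched over a pair of gapped alignment strings plus per-column Cα distances. This is an illustration under stated assumptions. A column with a gap in either sequence is treated as part of the gap run. "Within 3 Å" is read as ≤ 3.0 Å. The function `find_indels` and its inputs are hypothetical and are not part of PALI.

```python
def find_indels(aln_a, aln_b, dist, max_dist=3.0, min_flank=3):
    """Return (start_column, length) of gap runs flanked by well-aligned blocks.

    aln_a, aln_b: gapped sequences of equal length ('-' marks a gap).
    dist: per-column C-alpha distance in angstroms (never read at gap columns).
    """
    n = len(aln_a)
    gap = [aln_a[k] == "-" or aln_b[k] == "-" for k in range(n)]
    # A column is 'good' if both residues are present and lie within max_dist.
    good = [not gap[k] and dist[k] <= max_dist for k in range(n)]
    sites = []
    k = 0
    while k < n:
        if not gap[k]:
            k += 1
            continue
        start = k
        while k < n and gap[k]:
            k += 1
        end = k  # first column after the gap run
        left_ok = start >= min_flank and all(good[start - min_flank:start])
        right_ok = end + min_flank <= n and all(good[end:end + min_flank])
        if left_ok and right_ok:
            sites.append((start, end - start))
    return sites


print(find_indels("ACDEF--GHIK", "ACDEFLMGHIK", [1.0] * 11))  # prints [(5, 2)]
```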

Z Value

A likelihood score (Z value) was computed to identify significant members of a distribution and was used to identify the local conformations prone to insertions. The series of two PBs (di-PBs) flanking each insertion site were extracted to give the observed distribution. The background frequency of occurrence of di-PBs in the dataset was taken as the expected distribution. Z values were computed from the deviation of the observed distribution from the expected one, and di-PBs with Z values greater than 2 were considered preferred sites for insertions.
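The text does not spell out the Z value formula, so the sketch below assumes a binomial approximation. With n observed insertion sites and a background frequency p for a given di-PB, the expected count is np and Z = (observed − np) / √(np(1 − p)). The threshold Z > 2 comes from the text. The counts in the usage lines are invented for illustration.

```python
import math


def di_pb_z_scores(observed, background, z_cut=2.0):
    """Binomial-approximation Z values for di-PBs at insertion sites (assumed form).

    observed: counts of each di-PB flanking insertion sites.
    background: counts of each di-PB anywhere in the dataset.
    Returns (dict of Z values, sorted list of di-PBs with Z > z_cut).
    """
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    z = {}
    for di_pb, bg_count in background.items():
        p = bg_count / n_bg              # background (expected) frequency
        expected = n_obs * p
        sd = math.sqrt(n_obs * p * (1.0 - p))
        if sd > 0:
            z[di_pb] = (observed.get(di_pb, 0) - expected) / sd
    preferred = sorted(d for d, v in z.items() if v > z_cut)
    return z, preferred


# Invented counts, for illustration only.
z, preferred = di_pb_z_scores({"pa": 30, "mm": 60, "dd": 10},
                              {"pa": 100, "mm": 800, "dd": 100})
print(preferred)  # prints ['pa']
```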

Results

The extent of conservation of local backbone conformations was assessed in terms of PBs. Local structures undergoing subtle conformational differences, and those preferred as insertion sites, were examined. Pairwise structural alignments from the PALI dataset were used as a reference to study such preferences among related structures in a family.

Local Structure Substitutions

Changes in local backbone conformation were deduced from PB replacements among homologous structures. Only reliably aligned regions (residue pairs within 3 Å) were considered when calculating the replacement frequencies. The scores for substituting each PB with each of the 16 PBs were calculated from the raw substitution frequencies (see Methods).

Figure 3A shows the substitution preferences associated with each PB. Surprisingly, the PBs associated with the N- and C-caps of helices and strands do not show highly preferred substitutions with the central helix PB m and the central strand PB d, respectively. This reflects a preference for conserving the central, or most favoured, conformation of these regular structural elements. PB p, usually found at the C-cap of helices and/or the N-cap of β-strands, favours substitutions with PBs g and i. The PB pairs (p, g) and (p, i) share similar (φ, ψ) dihedrals along the 5-residue stretch (see Figure 3B, which compares the dihedral angles associated with these PBs). The substitution (p, g) is dominated by changes in the conformation of 3₁₀ helices and β-turns, with relatively fewer conversions to α-helix and coil (Table 1, Figure S1 & Table S1). These turns are mainly β-turns of types I and IV. The (p, i) substitution, on the other hand, involves variations in turns (β-turns of types I, II and IV) and substitutions between these turns and coils. These two substitutions mainly involve helix-helix, strand-strand and helix-strand transition regions (Figure S1). PB b, which is largely seen at the N-cap of β-strands, favours replacement with PB i, which is frequently seen in strand-strand transition regions (Figure 3C). This change is associated with variations in turns and bends, mainly transitions between β-turns of types I and IV and those of types II and IV.

Figure 3
PB substitutions.

The preference for a PB substitution might be expected to depend on the extent of structural similarity between the PBs. Nonetheless, the structurally closest PBs are often not those with the highest substitution preference (Figures 3D&E). For instance, the substitution between PB f and PB h is not highly preferred (Figure 3E), even though the two are very close in terms of dihedral angle distribution. The preference for replacement can depend on the local structural environment. The same holds for the substitutions (k, l) and (c, d), which are not highly favoured even though these PBs are structurally closest. PB j, which is usually seen in coils, favours replacement with h (Figures 3A and S2). PB k, associated with the N-cap of helices, also shows a preferred substitution with the loop PB h. These two changes are characterized by variations in β-turns and 3₁₀ helices (Figure S1). Replacement of h and i, which are largely seen in strand-strand transitions, with the central α-helix PB m is strongly disfavoured. As would be expected, substitutions between helix-associated and strand-associated PBs are not preferred (Figure 3A).

Hence many of the preferred variations in backbone conformation correspond to changes in β-turns. Clustering based on the substitution pattern of each PB (Figure 3E) highlights differences from the grouping based on PB conformational similarity (Figure 3D). The PBs associated with helical conformation, i.e. l (N-terminus), m (central) and n, o and p (C-terminus), have similar substitution preferences. PB k, which is also frequent at the N-cap of helices, has substitution patterns similar to those of the loop-associated PBs (j, h). On the other hand, the PBs mainly occurring at the N-terminus of strands cluster separately from the rest of the strand-associated PBs.

It should be noted that substitution preferences vary significantly both among the helix-associated PBs and among the strand-associated PBs. The PBs associated with the central region of the helix and its immediate C-terminus, i.e., PBs m and n, group closely. A similar relationship is observed for the strand-associated PBs d, e and f.

As mentioned in the Methods section, the local conformational changes discussed above were identified using a dataset of domain pairs sharing less than 40% sequence identity. To check whether the nature of backbone conformational changes differs significantly with the extent of structural relatedness, we compared the substitution patterns obtained from datasets filtered at sequence identity cut-offs of 60% and 80%, and from a dataset of all domain pairs (no filtering; Figure S3). No significant differences were observed with respect to the original dataset (filtered at 40% sequence identity); the PB substitutions had correlation scores close to 1.

PB Substitution and Accessibility

Each PB was first classified as accessible or buried (see Methods), and its occurrence frequency was calculated. Figure 4A gives the ratio of the percentage of accessible PBs to that of buried PBs. PB d, found in central strand regions, has the highest tendency to be buried (Figures 4A&B). The helix-associated PBs have a higher preference for solvent exposure than the strand-associated PBs. The PBs associated with the C-terminus of helices (n, o and p) have a greater tendency to be exposed than those at the N-cap. By contrast, the N- and C-caps of strands have similar preferences for exposure. The loop-associated PBs have variable preferences, with g and i more accessible than h and j. PB g is dominated by short helical conformations (including 3₁₀ helices) and turns, while PB i is very frequent in turns (Table 1). The relative increase in exposure as the burial threshold is raised shows a similar trend, with the strand-associated PBs showing a relatively small increase in the percentage of exposure.

Figure 4
Clustering PBs based on substitution patterns.

It is interesting to ask whether substitution patterns vary with the solvent accessibility of local structures. To examine this, a substitution matrix was generated for PBs categorized as exposed and buried (Figure S4). Apart from a few exceptions, the distribution of scores for substitutions between exposed PBs and between buried PBs was largely similar to the general distribution (Figure 3A). Substitution (k, i) is more preferred in buried regions than in exposed ones. Most substitutions that replace an exposed PB with a buried PB of another kind are not favoured; the substitutions (p, g) and (h, j) are exceptions.

Clustering exposed and buried PBs based on their substitution patterns suggests that PBs associate differently depending on their accessibility (Figures 4C and D). The exposed PBs (Figure 4C) cluster in a way similar to the general preferences (Figure 3A). In buried regions, PBs b and i cluster with the loop PBs rather than with the strand-associated PBs. Unlike in exposed regions, the substitution pattern of the central helix conformation m is not highly similar to that of its immediate C-terminus (PB n).

Class Specific PB Substitutions

The classification of domain structures into SCOP classes is based on secondary structure content and topology. As a result, the background distribution of PBs also varies between SCOP classes. For instance, the all-α class has a very low percentage of strand-associated PBs, while the all-β class has a low percentage of helix-associated PBs (Figure S5).

The PB substitution scores observed in the different SCOP classes were compared with those of the global distribution. The PB substitution patterns vary across SCOP classes, and clustering PBs based on these patterns reveals different behaviours in each structural class.

In the all-α class (Figure 5A), the PBs mainly occurring at the helix N-terminus are associated with the loop PB h, which is largely found in β-turns and at strand C-termini. In the all-β class (Figure 5B), the loop-associated PBs cluster closer to the helix PBs than to those corresponding to strands.

Figure 5
PB relationship in each SCOP class derived based on the substitution pattern.

The PBs in the α/β class (Figure 5C) associate in a way similar to the global distribution. The exceptions are PBs a and c, which mark the beginning of strands and cluster closely with the other strand PBs, and the helix N-cap PB l, which associates with the loop PBs. The clustering in the α+β class (Figure 5D) is the closest to the general distribution (Figure 3D).

Preferred substitutions in each class

Thus variations in the substitution preferences of local structure conformations are seen across SCOP classes. Comparison of these class-specific substitution scores with the global matrix (see Methods) highlights a few differences (Figure 6).

Figure 6
Comparison of class-specific PB substitution scores with the global distribution (global substitution matrix).

Substitutions involving strand-associated PBs and helix-associated PBs have higher scores in the all-α and all-β classes, respectively (Figures 6A and 6B). These PBs indeed have low background frequencies, or lack sufficient substitution information, in the respective classes. Nevertheless, the observed probabilities of changes between strand-associated PBs and the central conformation d were low in the all-α class. Similarly, in the all-β class, substitutions between the central helix conformation m and other helix-associated PBs have low probabilities of occurrence (Figure S6). More class-specific preferences for changes in local conformation were evident in the all-α and all-β classes (Figure 6). The substitution pattern of each PB was compared with the general preferences (Figure 3A), and cases where the correlation was less than 0.95 were examined.
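The screening step, which flags PBs whose class-specific substitution profile correlates with the global profile below 0.95, can be sketched as below. The sketch assumes Pearson correlation over each PB's row of scores. The function name `divergent_blocks` and the toy rows are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score rows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def divergent_blocks(class_matrix, global_matrix, labels, threshold=0.95):
    """Labels of PBs whose class-specific score row correlates below `threshold`
    with the corresponding row of the global matrix (hypothetical helper)."""
    return [label for label, row_c, row_g in zip(labels, class_matrix, global_matrix)
            if pearson(row_c, row_g) < threshold]


# Toy 2-row example; real rows would hold the 16 scores of each PB.
print(divergent_blocks([[1.1, 2, 3.1, 4], [1, 2, 3, 4]],
                       [[1, 2, 3, 4], [4, 3, 2, 1]], ["a", "b"]))  # prints ['b']
```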

In the all-α class, two substitutions, (a, e) and (g, j), were found to be more favourable than in the global preferences (Figures 7A&B). Both substitutions are usually associated with changes in β-turns of types II, II’ and IV.

Figure 7
PB substitutions highly preferred in certain SCOP classes.

The substitutions preferred in the all-β class occur in strand-strand transition regions (Figures 7C&D). They can be grouped into the following categories. (i) Those involving transitions between the central helix conformation (PB m) and PBs frequently associated with strands (PBs d and e). This change is usually characterized by changes in the short helical regions found in this class. (ii) Those usually associated with β-turns. This category includes the PB changes (b, g), (c, i), (l, n) and (o, l) in regions mainly characterized by hairpin β-turns. (iii) Those associated with transitions between the central helix PB and the C-terminal PBs. The substitutions (o, m) and (p, m) belong to this category.

Sites of Indels

Sites of insertion/deletion events were analysed using PBs. The frequencies of the two PBs (di-PBs) flanking indel sites were calculated (see Methods), and preferred insertion sites were identified using Z values. The local structural regions where indels occur show some preferences (Table 2 & Figure 8). The length of the insert also affects the preferred insert site. However, certain di-PBs, such as ‘p-a’ and ‘j-a’, are preferred sites for insertions of different lengths.

Table 2
Preferred indel sites in different SCOP classes.
Figure 8
Preferred local structure for indel events.

The preferred insertion sites vary across SCOP classes. A few class-specific preferences could be found for the all-α and all-β classes, especially for short inserts of fewer than 4 residues (Table 2). Nevertheless, many of the preferred sites for insertions/deletions appear to be class-independent. β-turns and the C-capping regions of α-helices are commonly found as indel sites. These preferred sites are associated with loops that mediate a reversal in the direction of the backbone. Across the different SCOP classes, the two major di-PBs flanking insertions are ‘h-i’ and ‘p-a’. The di-PB ‘p-a’ characterizes helix-helix and helix-strand transitions (Figures 8A and D). This local fold is characteristic of the C-cap motif of α-helices. Both short and long insertions are found at this site. In the all-β class, this site is preferred for single-residue insertions and is associated with β-turns of type I (Figure 8B). The di-PB ‘h-i’, on the other hand, mainly characterizes strand-strand transition regions (Figures 8B to 8D). Long insertions occur at this site. The local structural region involving ‘h-i’ is dominated by β-turns of type I’ (Figures 8B to 8D).

Single residue insertions are also preferred in the immediate C-terminus of the regular secondary structural elements. Though short insertions are also frequent in helices (‘mm’) and strands (‘dd’), the occurrences are not significantly higher than the background.

Discussion

The precise description of local structures in terms of PBs gives a better view of the preferred local structural differences among homologous proteins. The changes are highly constrained, with preferences that are not necessarily correlated with the structural similarity of the PBs. β-turns account for a significant majority of the conformational variations. These include both variations within a β-turn type and exchanges between types. Conformational flipping between β-turns has been studied for several years, especially inter-conversions between type I and type II turns and between type I’ and II’ turns [84], [87]. Many of these inter-conversions have been associated with functional interactions and dynamics [88], [89]. Fairly low energy barriers have been proposed for these changes, and flipping of the central peptide unit (linking the Cα atoms of residues i+1 and i+2) has been suggested as their mechanism [87], [90]. The PB substitution preferences also show preferred changes from type I or II to type IV turns. Replacements between turns and 3₁₀ helices also seem to be favoured; indeed, the 3₁₀ helix conformation resembles the type I β-turn [91]. As the substitution frequencies were calculated only from structurally similar regions, larger variations are less evident.

Patterns of local structural change vary across SCOP classes (Figure 5), and specific conformational changes are preferred in certain classes (Figure 6). This is most evident in the all-β class, where the preferred local structure substitutions are associated with short helical regions and β-turns. The preferred substitutions involving the central helix PB m are rather unexpected. Short helices dominate the helical conformations found in the all-β class (Figure S7). About 69.2% of the runs of PB m in this class have a length of 3 or less. These short helices are often seen in transition regions between β-strands, and the preferred substitutions with PBs seen at the N-cap of strands (a & c) usually occur in such regions. The other structural elements associated with preferred local structural differences in the all-β class are β-hairpins, a local fold that occurs very frequently in this class. Interestingly, type IV β-turns predominate among the class-specific conformational changes. Because type IV collects turns that cannot be assigned to other types, it encompasses a wide range of conformations.

Using Class Specific PB Substitution Matrices for Structural Alignment

Knowledge of the substitution preferences observed in different SCOP classes could be used to improve structural comparisons based on PB sequence alignment [67], [72], [73]. The PB-based structural alignment method iPBA was shown to perform better than established methods such as DALI [92], MUSTANG [81], VAST [93], CE [94] and GANGSTA+ [95]. In benchmark tests, about 82% of its alignments were of better quality than those of DALI. Its performance was comparable to that of TMALIGN [96] and FATCAT [97].

The substitution matrices generated from the class-specific datasets are adapted to the background PB composition and the changes observed in each class. As seen above, specific domain families contributed a significant portion of the PB changes favoured in a given class. To avoid the bias resulting from the non-uniform distribution of family sizes, the raw frequencies counted from each family were normalized by the family size. Because the substitution matrices are generated from frequencies in the conserved regions of the superposition, it is logical to compare the local alignments obtained with the class-specific matrices to those obtained with the global matrix. The structural alignment pairs in the test dataset were used for this assessment.
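The family-size normalization can be sketched as follows. In this sketch, "family size" is assumed to mean the number of alignment pairs a family contributes. Each conserved aligned pair is counted in both orientations so that the resulting matrix is symmetric. The input tuple layout is hypothetical.

```python
from collections import defaultdict


def normalized_pair_counts(alignments):
    """Family-size-normalized PB pair counts (sketch under stated assumptions).

    alignments: iterable of (family_id, pb_seq_a, pb_seq_b, conserved_mask), where
    the two PB strings are aligned position by position and conserved_mask marks
    positions whose residue pair lies within 3 A.
    """
    per_family = defaultdict(lambda: defaultdict(float))
    family_size = defaultdict(int)  # assumed: number of alignment pairs per family
    for family, a, b, mask in alignments:
        family_size[family] += 1
        for x, y, keep in zip(a, b, mask):
            if keep:
                # Count both orientations so the final matrix is symmetric.
                per_family[family][(x, y)] += 1.0
                per_family[family][(y, x)] += 1.0
    totals = defaultdict(float)
    for family, counts in per_family.items():
        for pair, c in counts.items():
            totals[pair] += c / family_size[family]
    return dict(totals)


data = [("f1", "mmd", "mmd", [True, True, True]),
        ("f1", "mmd", "mmc", [True, True, False]),
        ("f2", "dd", "dc", [True, True])]
print(normalized_pair_counts(data))
```

The normalized counts could then be fed to a log-odds conversion, so that large families do not dominate the class-specific scores.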

As seen in Figure 9, the use of class-specific SMs improves alignment quality in the all-α, all-β and α/β classes. When the all-α class-specific SM was used to align domains of this class, 50.1% and 30.2% of the structural alignments had better and identical rmsd values, respectively, compared with those generated using the general SM. For the all-β class, 38.1% of the alignments had a better rmsd while 26.8% had a worse one. For the α/β class, 43.3% of the alignments improved and 28.8% worsened. The α+β class did not show any improvement with its specific SM. These results suggest that class-specific substitution information can be useful for aligning structurally similar regions. The negative cases, in which alignment quality is lower than with the global SM, need to be analysed in detail.
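For each class the text reports only two of the three outcome categories. The sketch below derives the third category and the net gain (better minus worse). These derived values are not stated in the article. They hold only under the assumption that every alignment was classified as exactly one of better, same or worse, so that the three shares sum to 100%.

```python
# Percentages reported in the text; the third category of each class is not stated.
REPORTED = {
    "all-alpha": {"better": 50.1, "same": 30.2},
    "all-beta": {"better": 38.1, "worse": 26.8},
    "alpha/beta": {"better": 43.3, "worse": 28.8},
}


def complete_shares(shares):
    """Derive the missing category, ASSUMING better + same + worse = 100%."""
    out = dict(shares)
    (missing,) = {"better", "same", "worse"} - set(out)
    out[missing] = round(100.0 - sum(shares.values()), 1)
    return out


def net_gain(shares):
    """Share of improved alignments minus share of degraded ones."""
    full = complete_shares(shares)
    return round(full["better"] - full["worse"], 1)


for cls, shares in REPORTED.items():
    print(cls, complete_shares(shares), net_gain(shares))
```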

Figure 9
Percentage gain in alignments with better rmsd.

Hot-spots for Insertions

The relative frequency of insertions is similar across the SCOP classes, and the distribution of insertion lengths follows a similar pattern in each class (Figure S8). However, single-residue insertions are relatively rare in the all-β class. The preferred insertion sites are highly specific in terms of local conformation. Although some class-specific insertion sites are observed, the SCOP classes share many of them. Helix C-caps and hairpin turns are the main sites favourable for indels (Table 2).
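One way to tabulate insertion sites and lengths from a pairwise PB alignment is sketched below. The assumptions are as follows. An insertion in B relative to A is an internal run of gap characters ('-') in A. Its site is taken to be the di-PB of A that flanks the gap. Lengths of 5 or more are pooled into a single bin, as in Figure S8. This site definition is an illustrative choice and may differ from the one used in the study. The aligned strings are invented.

```python
import re
from collections import Counter


def insertion_sites(aln_a, aln_b, max_bin=5):
    """Insertions in B relative to A, i.e. internal runs of '-' in aln_a.

    Returns (flanking di-PB in A, binned length, inserted PBs from B) tuples;
    lengths >= max_bin are pooled into the max_bin bin."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    sites = []
    for gap in re.finditer(r"-+", aln_a):
        start, end = gap.span()
        if start == 0 or end == len(aln_a):
            continue                     # terminal overhang, not an internal insertion
        di_pb = aln_a[start - 1] + aln_a[end]
        sites.append((di_pb, min(end - start, max_bin), aln_b[start:end]))
    return sites


a = "mmmp--acdd" + "dfk" + "-" * 7 + "lm"
b = "mmmpnoacdd" + "dfk" + "ehiacdd" + "lm"
sites = insertion_sites(a, b)
print(sites)
print(Counter(di for di, _, _ in sites))   # tally of insertion sites by di-PB
```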

Helix capping motifs have been studied for many years, and the amino acid preferences associated with them have been a major area of interest [98], [99], [100], [101], [102]. The dihedral angle distribution of the di-PB ‘pa’ is close to that observed in the Schellman motif and the αL-type caps [98]. These motifs are stabilized by a specific pattern of backbone hydrogen bonds. Apart from helix caps, β-turns of types I’, II’ and I largely characterize the indel sites. Notably, turns of types I’ and II’ are quite rare, with an occurrence frequency of only about 3% [40]. The preferred insertion sites are therefore largely confined to a few specific conformations.
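The β-turn types named here are defined by the backbone dihedrals of the two central turn residues (i+1 and i+2). The sketch below assigns a type by comparing (φ1, ψ1, φ2, ψ2) with the canonical values used by PROMOTIF [42], [91]. It simplifies the published criteria in several ways. It applies a single ±30° tolerance to every angle and has no extra allowance for one deviating angle. It omits the cis-proline types VIa and VIb. It assumes that the four residues have already passed the β-turn geometric test, such as the Cα(i)–Cα(i+3) distance check. Anything that matches no defined type falls into type IV.

```python
# Canonical (phi_i+1, psi_i+1, phi_i+2, psi_i+2) in degrees.
TURN_TYPES = {
    "I":    (-60.0,  -30.0,  -90.0,   0.0),
    "I'":   ( 60.0,   30.0,   90.0,   0.0),
    "II":   (-60.0,  120.0,   80.0,   0.0),
    "II'":  ( 60.0, -120.0,  -80.0,   0.0),
    "VIII": (-60.0,  -30.0, -120.0, 120.0),
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, wrapping at 360."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_turn(phi1, psi1, phi2, psi2, tol=30.0):
    """Return the first turn type whose ideal angles all lie within tol."""
    angles = (phi1, psi1, phi2, psi2)
    for name, ideal in TURN_TYPES.items():
        if all(angular_diff(x, y) <= tol for x, y in zip(angles, ideal)):
            return name
    return "IV"                          # catch-all for unclassified turns


print(classify_turn(-65, -25, -85, 5), classify_turn(55, 35, 95, -5))
```

The wrap-around in `angular_diff` matters: 170° and -170° are only 20° apart, not 340°.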

Both helix caps and β-turns have been implicated in structural stability and protein folding [37], [39], [103], [104], [105], [106], [107]. The β-turn types associated with indel sites (Table 2) form short hairpin loops. Likewise, the helix C-cap conformations at indel sites are confined to short loops that form the transition to another helix or strand (Figure 8) [98]. These local folds thus restrict the flanking secondary structural elements to an antiparallel orientation. Insert regions have also been reported to preferentially adopt turn and coil conformations, and most indels are likely to be tolerated as extensions of the local conformation [30].
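The claim that these loops enforce an antiparallel arrangement can be checked geometrically by measuring the angle between the two flanking elements. The sketch below makes several simplifying assumptions. It approximates each element's axis by the vector from its first to its last Cα, which is crude for helices, where a fitted axis would be better. Coordinates are plain (x, y, z) tuples in Å. An angle near 180° indicates an antiparallel pair. The coordinates and function names are hypothetical.

```python
import math


def direction(ca_coords):
    """Unit vector from the first to the last Cα of a secondary structure element."""
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    v = (x1 - x0, y1 - y0, z1 - z0)
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def interelement_angle(ca_first, ca_second):
    """Angle in degrees between the axes of two elements (180 = antiparallel)."""
    u, w = direction(ca_first), direction(ca_second)
    cos_t = sum(a * b for a, b in zip(u, w))
    cos_t = max(-1.0, min(1.0, cos_t))   # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


# Two idealised strands of a hairpin, running in opposite directions.
strand1 = [(0.0, 0.0, 0.0), (3.3, 0.0, 0.0), (6.6, 0.0, 0.0)]
strand2 = [(6.6, 4.5, 0.0), (3.3, 4.5, 0.0), (0.0, 4.5, 0.0)]
print(interelement_angle(strand1, strand2))
```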

Dataset-specific substitution information has been shown to improve amino acid sequence alignment [108], [109], [110], [111], [112], and a similar strategy can be adopted for PB-based structural alignment [67], [72], [73]. Here, class-specific PB substitution matrices improved the quality of alignments within the corresponding class. The specific local structures that act as indel hot spots could also be used to develop specialized gap penalties for PB-based structural alignment; such a strategy has already been reported to improve alignment quality [32], [113].
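A conformation-dependent gap penalty could take the form sketched below. The gap-opening penalty between positions i and i+1 is reduced when the di-PB at that junction belongs to a set of indel hot spots. Only ‘pa’ is named in this section as such a site. Any fuller hot-spot set, and the penalty values -10, -4 and -1, are hypothetical placeholders that would need tuning on a benchmark. The helper names are also illustrative.

```python
def gap_open_penalties(pb_seq, hotspots, default=-10.0, reduced=-4.0):
    """Entry i = penalty for opening a gap between positions i and i+1.

    Gaps at a hot-spot di-PB are penalised less (values are HYPOTHETICAL)."""
    return [reduced if pb_seq[i:i + 2] in hotspots else default
            for i in range(len(pb_seq) - 1)]


def affine_gap_cost(open_penalties, i, length, extend=-1.0):
    """Cost of a gap of `length` residues opened after position i."""
    return open_penalties[i] + extend * (length - 1)


pens = gap_open_penalties("mmpacd", hotspots={"pa"})
print(pens)                           # the 'pa' junction gets the reduced penalty
print(affine_gap_cost(pens, 2, 3))
```

In a dynamic-programming aligner, `open_penalties[i]` would replace the constant gap-open term whenever a gap is opened after position i of the PB sequence.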


Our analysis sheds light on the local structural variations found among homologous proteins. β-turns are the most prone to minor backbone variations, and these changes show specificities in certain structural classes. Common differences involve the conformations of type I, II and IV β-turns and, to a lesser extent, 3₁₀-helices. Indels also prefer particular local structural regions, and these preferences vary with the length of the inserted fragment. Short loops involving hairpin β-turns and helix C-caps are the primary targets for insertions, so the inserted segments are likely to form structural extensions of these loops. Knowledge of the preferences for conformational variations and indel sites can also help improve methods for structure comparison and threading. The class-specific substitution preferences can be exploited to improve PB-based structural alignment within each class. This work also highlights the usefulness of a structural alphabet, which provides an effective description of the local structures of proteins and offers a different view of the regularities in local conformations.

Supporting Information

Figure S1

Local structural contexts of (p,g) and (p,i) substitutions. (A–E) The sites of substitutions involving PBs (p,g) and (p,i). Some of the frequently occurring penta-PB (5 PB series) changes associated with these substitutions are presented. The change of one penta-PB to another is highlighted using the same colours (orange and blue) in the PB series and in the picture.


Figure S2

Some of the frequent local conformational changes associated with the PB h. The PB that is structurally closest (angular RMSD) is indicated by black dotted lines. Other PBs that favour substitution with h are plotted in different colours.


Figure S3

Comparison of the PB substitution matrix generated from a dataset filtered at 40% sequence identity (A) with the matrices obtained at 60% (B) and 80% (C) and without any filtering (D). The substitution scores in each row (associated with each PB) are compared with the corresponding row of the other matrix, and the correlation coefficients are indicated adjacent to the matrices.
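The row-by-row comparison described in this caption can be sketched as follows. The caption does not name the correlation measure, so Pearson's coefficient is an assumption here. Matrices are plain lists of rows in the same PB order. A row with zero variance would make the coefficient undefined, and this sketch raises ZeroDivisionError in that case.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rowwise_correlation(m1, m2, labels):
    """Correlate each row of m1 with the matching row of m2, keyed by PB label."""
    return {lab: pearson(r1, r2) for lab, r1, r2 in zip(labels, m1, m2)}


# Toy 2-row matrices (hypothetical scores, not the published matrices).
print(rowwise_correlation([[1, 2, 3], [3, 2, 1]], [[2, 4, 6], [1, 2, 3]], "ab"))
```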


Figure S4

Substitution preferences of PBs classified as buried (uppercase) or exposed (lowercase). A 32 × 32 matrix was generated by segregating PBs into buried and exposed, based on a relative solvent accessibility cut-off of 25%. The colour scale and corresponding range of substitution scores are given on the right.
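The 32-letter buried/exposed alphabet can be produced by recasing each PB according to its relative solvent accessibility (RSA). This is a minimal sketch under the following assumptions. RSA values are percentages, for example relative accessibilities computed with NACCESS [86]. A residue counts as buried when its RSA is strictly below 25%, so a value of exactly 25% is treated as exposed. The caption does not specify how the boundary is handled.

```python
def encode_accessibility(pb_seq, rsa, cutoff=25.0):
    """Uppercase PB = buried (RSA < cutoff), lowercase PB = exposed."""
    if len(pb_seq) != len(rsa):
        raise ValueError("one RSA value is needed per PB")
    return "".join(pb.upper() if r < cutoff else pb.lower()
                   for pb, r in zip(pb_seq, rsa))


print(encode_accessibility("mmnop", [5.0, 24.9, 25.0, 60.0, 10.0]))
```

Substitution counts collected over these recased letters yield the 32 × 32 matrix directly, with no other change to the counting procedure.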


Figure S5

Frequency of occurrence of PBs in various SCOP classes.


Figure S6

The difference between the observed substitution probabilities in each SCOP class and those of the global matrix. Only the observed substitution probabilities were computed for the PB substitutions, and their differences from the global probabilities were calculated; this neglects the effect of background frequencies on the substitution scores. The variations in the observed probabilities are plotted for each SCOP class: all-α (A), all-β (B), α/β (C) and α+β (D).
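A sketch of the quantity plotted in this figure is given below. Class and global observed substitution probabilities are computed, and their difference is taken with no background correction. The caption does not say whether joint or conditional probabilities were used. Joint pair probabilities are assumed here, with each aligned pair counted in both orders. The input pairs are invented for illustration.

```python
from collections import Counter


def observed_probabilities(pairs):
    """Joint probability of each (pb_a, pb_b) pair, counting both orders."""
    counts = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def probability_difference(class_pairs, global_pairs):
    """Class minus global observed probability; no background correction."""
    pc = observed_probabilities(class_pairs)
    pg = observed_probabilities(global_pairs)
    return {k: pc.get(k, 0.0) - pg.get(k, 0.0) for k in pc.keys() | pg.keys()}


print(probability_difference([("m", "m"), ("m", "m")], [("m", "m"), ("d", "d")]))
```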


Figure S7

Frequency of occurrence of helical conformation (series of PB m) in the all-β class. The percentage of occurrence (y axis) is plotted against the length of PB m series (x axis).


Figure S8

Distribution of inserts of different lengths in each SCOP class. The length 5 corresponds to inserts of length greater than or equal to 5.


Table S1

Some of the preferred PB substitutions and the three most frequent secondary structure changes associated with them. The secondary structure assignments were made using DSSP, SEGNO and PROMOTIF (see Table 1 for details of the assignment abbreviations). The corresponding percentage of occurrence is also given.



Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was supported by grants from the Ministère de la Recherche, Université Paris Diderot - Paris 7, National Institute for Blood Transfusion (INTS), the Institute for Health and Medical Research (INSERM) and the Indian Department of Biotechnology. APJ holds a grant from CEFIPRA (number 3903-E). NS and AdB acknowledge support from the CEFIPRA collaborative grant (number 3903-E). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


1. Baker D, Sali A. Protein structure prediction and structural genomics. Science. 2001;294:93–96.
2. Byers DM, Gong H. Acyl carrier protein: structure-function relationships in a conserved multifunctional protein family. Biochem Cell Biol. 2007;85:649–662.
3. Choi IG, Kim SH. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci U S A. 2006;103:14056–14061.
4. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986;5:823–826.
5. Flores TP, Orengo CA, Moss DS, Thornton JM. Comparison of conformational characteristics in structurally similar protein pairs. Protein Sci. 1993;2:1811–1826.
6. Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol. 2008;18:170–177.
7. Grishin NV. Fold change in evolution of protein structures. J Struct Biol. 2001;134:167–185.
8. Salemme FR, Miller MD, Jordan SR. Structural convergence during protein evolution. Proc Natl Acad Sci U S A. 1977;74:2820–2824.
9. Thornton JM, Orengo CA, Todd AE, Pearl FM. Protein folds, functions and evolution. J Mol Biol. 1999;293:333–342.
10. Dayhoff MO, Eck RV. A model of evolutionary change in proteins. In: Atlas of protein sequence and structure. Washington, D.C.: National Biomedical Research Foundation; 1972.
11. Gonnet GH, Cohen MA, Benner SA. Exhaustive matching of the entire protein sequence database. Science. 1992;256:1443–1445.
12. Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8:275–282.
13. Goldman N, Thorne JL, Jones DT. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics. 1998;149:445–458.
14. Luthy R, McLachlan AD, Eisenberg D. Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins. 1991;10:229–239.
15. Overington J, Johnson MS, Sali A, Blundell TL. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc Biol Sci. 1990;241:132–145.
16. Thorne JL, Goldman N, Jones DT. Combining protein evolution and secondary structure. Mol Biol Evol. 1996;13:666–673.
17. Topham CM, McLeod A, Eisenmenger F, Overington JP, Johnson MS, et al. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J Mol Biol. 1993;229:194–220.
18. Wako H, Blundell TL. Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures. J Mol Biol. 1994;238:693–708.
19. Wako H, Blundell TL. Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. I. Solvent accessibility classes. J Mol Biol. 1994;238:682–692.
20. Przytycka T, Aurora R, Rose GD. A protein taxonomy based on secondary structure. Nat Struct Biol. 1999;6:672–682.
21. Panchenko AR, Wolf YI, Panchenko LA, Madej T. Evolutionary plasticity of protein families: coupling between sequence and structure variation. Proteins. 2005;61:535–544.
22. Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 2004;14:802–811.
23. Petrov DA. Mutational equilibrium model of genome size evolution. Theor Popul Biol. 2002;61:531–544.
24. Sandhya S, Rani SS, Pankaj B, Govind MK, Offmann B, et al. Length variations amongst protein domain superfamilies and consequences on structure and function. PLoS One. 2009;4:e4981.
25. Aravind L, Mazumder R, Vasudevan S, Koonin EV. Trends in protein evolution inferred from sequence and structure analysis. Curr Opin Struct Biol. 2002;12:392–399.
26. Jiang H, Blouin C. Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions. BMC Bioinformatics. 2007;8:444.
27. Shortle D, Sondek J. The emerging role of insertions and deletions in protein engineering. Curr Opin Biotechnol. 1995;6:387–393.
28. Sondek J, Shortle D. Accommodation of single amino acid insertions by the native state of staphylococcal nuclease. Proteins. 1990;7:299–305.
29. Taylor MS, Ponting CP, Copley RR. Occurrence and consequences of coding sequence insertions and deletions in Mammalian genomes. Genome Res. 2004;14:555–566.
30. Pascarella S, Argos P. Analysis of insertions/deletions in protein structures. J Mol Biol. 1992;224:461–471.
31. Kim R, Guo JT. Systematic analysis of short internal indels and their impact on protein folding. BMC Struct Biol. 2010;10:24.
32. Chang MS, Benner SA. Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J Mol Biol. 2004;341:617–631.
33. Zhang Z, Huang J, Wang Z, Wang L, Gao P. Impact of indels on the flanking regions in structural domains. Mol Biol Evol. 2011;28:291–301.
34. Offmann B, Tyagi M, de Brevern AG. Local Protein Structures. Current Bioinformatics. 2007;3:165–202.
35. Bornot A, de Brevern AG. Protein beta-turn assignments. Bioinformation. 2006;1:153–155.
36. Chou PY, Fasman GD. Beta-turns in proteins. J Mol Biol. 1977;115:135–175.
37. Lewis PN, Momany FA, Scheraga HA. Folding of polypeptide chains in proteins: a proposed mechanism for folding. Proc Natl Acad Sci U S A. 1971;68:2293–2297.
38. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem. 1981;34:167–339.
39. Yang AS, Hitz B, Honig B. Free energy determinants of secondary structure formation: III. beta-turns and their role in protein folding. J Mol Biol. 1996;259:873–882.
40. Shepherd AJ, Gorse D, Thornton JM. Prediction of the location and type of beta-turns in proteins using neural networks. Protein Sci. 1999;8:1045–1055.
41. Kountouris P, Hirst JD. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures. BMC Bioinformatics. 2010;11:407.
42. Hutchinson EG, Thornton JM. PROMOTIF–a program to identify and analyze structural motifs in proteins. Protein Sci. 1996;5:212–220.
43. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637.
44. de Brevern AG, Etchebest C, Hazout S. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins. 2000;41:271–287.
45. Jones TA, Thirup S. Using known substructures in protein model building and crystallography. EMBO J. 1986;5:819–822.
46. Kolodny R, Koehl P, Guibas L, Levitt M. Small libraries of protein fragments model native protein structures accurately. J Mol Biol. 2002;323:297–307.
47. Levitt M. Accurate modeling of protein conformation by automatic segment matching. J Mol Biol. 1992;226:507–533.
48. Micheletti C, Seno F, Maritan A. Recurrent oligomers in proteins: an optimal scheme reconciling accurate and concise backbone representations in automated folding and design studies. Proteins. 2000;40:662–674.
49. Rooman MJ, Rodriguez J, Wodak SJ. Automatic definition of recurrent local structure motifs in proteins. J Mol Biol. 1990;213:327–336.
50. Schuchhardt J, Schneider G, Reichelt J, Schomburg D, Wrede P. Local structural motifs of protein backbones are classified by self-organizing neural networks. Protein Eng. 1996;9:833–842.
51. Unger R, Harel D, Wherland S, Sussman JL. A 3D building blocks approach to analyzing and predicting structure of proteins. Proteins. 1989;5:355–373.
52. Sander O, Sommer I, Lengauer T. Local protein structure prediction using discriminative models. BMC Bioinformatics. 2006;7:14.
53. Thangudu RR, Sharma P, Srinivasan N, Offmann B. Analycys: a database for conservation and conformation of disulphide bonds in homologous protein domains. Proteins. 2007;67:255–261.
54. de Brevern AG. New assessment of a structural alphabet. In Silico Biol. 2005;5:283–289.
55. de Brevern AG, Benros C, Gautier R, Valadie H, Hazout S, et al. Local backbone structure prediction of proteins. In Silico Biol. 2004;4:381–386.
56. Etchebest C, Benros C, Hazout S, de Brevern AG. A structural alphabet for local protein structures: improved prediction methods. Proteins. 2005;59:810–827.
57. Zimmermann O, Hansmann UH. LOCUSTRA: accurate prediction of local protein structure using a two-layer support vector machine approach. J Chem Inf Model. 2008;48:1903–1908.
58. Dong Q, Wang X, Lin L, Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins. 2008;72:163–172.
59. Benros C, de Brevern AG, Hazout S. Analyzing the sequence-structure relationship of a library of local structural prototypes. J Theor Biol. 2009;256:215–226.
60. de Brevern AG, Etchebest C, Benros C, Hazout S. “Pinning strategy”: a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci. 2007;32:51–70.
61. Li Q, Zhou C, Liu H. Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins. 2009;74:820–836.
62. Tyagi M, Bornot A, Offmann B, de Brevern AG. Protein short loop prediction in terms of a structural alphabet. Comput Biol Chem. 2009;33:329–333.
63. Chen B, Johnson M. Protein local 3D structure prediction by Super Granule Support Vector Machines (Super GSVM). BMC Bioinformatics. 2009;10:S15.
64. Dudev M, Lim C. Discovering structural motifs using a structural alphabet: application to magnesium-binding sites. BMC Bioinformatics. 2007;8:106.
65. Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie. 2009;91:876–887.
66. Thomas A, Deshayes S, Decaffmeyer M, Van Eyck MH, Charloteaux B, et al. Prediction of peptide structure: how far are we? Proteins. 2006;65:889–897.
67. Tyagi M, de Brevern AG, Srinivasan N, Offmann B. Protein structure mining using a structural alphabet. Proteins. 2008;71:920–937.
68. Zuo YC, Li QZ. Using reduced amino acid composition to predict defensin family and subfamily: Integrating similarity measure and structural alphabet. Peptides. 2009;30:1788–1793.
69. Joseph AP, Agarwal G, Mahajan S, Gelly J-C, Swapna LS, et al. A short survey on Protein Blocks. Biophysical Reviews. 2010;2:137–145.
70. Joseph AP, Bornot A, de Brevern AG. Local Structure Alphabets. In: Rangwala H, Karypis G, editors. Protein Structure Prediction. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2010.
71. Wu CY, Chen YC, Lim C. A structural-alphabet-based strategy for finding structural motifs across protein families. Nucleic Acids Res. 2010;38:e150.
72. Tyagi M, Sharma P, Swamy CS, Cadet F, Srinivasan N, et al. Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet. Nucleic Acids Res. 2006;34:W119–123.
73. Tyagi M, Gowri VS, Srinivasan N, de Brevern AG, Offmann B. A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications. Proteins. 2006;65:32–39.
74. Joseph AP, Srinivasan N, de Brevern AG. Improvement of protein structure comparison using a structural alphabet. Biochimie. 2011;93:1434–1445.
75. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540.
76. Kohonen T. Self-Organizing Maps. 3rd ed. Springer; 2001. 501 p.
77. Gelly JC, Joseph AP, Srinivasan N, de Brevern AG. iPBA: a tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Res. 2011;39:W18–23.
78. Balaji S, Sujatha S, Kumar SS, Srinivasan N. PALI-a database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 2001;29:61–65.
79. Gowri VS, Pandit SB, Karthik PS, Srinivasan N, Balaji S. Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res. 2003;31:486–488.
80. Sujatha S, Balaji S, Srinivasan N. PALI: a database of alignments and phylogeny of homologous protein structures. Bioinformatics. 2001;17:375–376.
81. Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins. 2006;64:559–574.
82. Johnson MS, Overington JP. A structural basis for sequence comparisons. An evaluation of scoring methodologies. J Mol Biol. 1993;233:716–738.
83. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147:195–197.
84. Martinez JC, Pisabarro MT, Serrano L. Obligatory steps in protein folding and the conformational diversity of the transition state. Nat Struct Biol. 1998;5:721–729.
85. Cubellis MV, Cailliez F, Lovell SC. Secondary structure assignment that accurately reflects physical and evolutionary characteristics. BMC Bioinformatics. 2005;6:S8.
86. Hubbard SJ, Thornton JM. NACCESS. Computer Program. Department of Biochemistry and Molecular Biology, University College London; 1993.
87. Gunasekaran K, Gomathi L, Ramakrishnan C, Chandrasekhar J, Balaram P. Conformational interconversions in peptide beta-turns: analysis of turns in proteins and computational estimates of barriers. J Mol Biol. 1998;284:1505–1516.
88. Nicholson LK, Yamazaki T, Torchia DA, Grzesiek S, Bax A, et al. Flexibility and function in HIV-1 protease. Nat Struct Biol. 1995;2:274–280.
89. Srinivasan R, Rose GD. The T-to-R transformation in hemoglobin: a reevaluation. Proc Natl Acad Sci U S A. 1994;91:11113–11117.
90. Hayward S. Peptide-plane flipping in proteins. Protein Sci. 2001;10:2219–2227.
91. Hutchinson EG, Thornton JM. A revised set of potentials for beta-turn formation in proteins. Protein Sci. 1994;3:2207–2216.
92. Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993;233:123–138.
93. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996;6:377–385.
94. Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998;11:739–747.
95. Guerler A, Knapp EW. Novel protein folds and their nonsequential structural analogs. Protein Sci. 2008;17:1374–1382.
96. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33:2302–2309.
97. Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics. 2003;19:ii246–255.
98. Aurora R, Rose GD. Helix capping. Protein Sci. 1998;7:21–38.
99. Chakrabartty A, Doig AJ, Baldwin RL. Helix capping propensities in peptides parallel those in proteins. Proc Natl Acad Sci U S A. 1993;90:11332–11336.
100. Engel DE, DeGrado WF. Alpha-alpha linking motifs and interhelical orientations. Proteins. 2005;61:325–337.
101. Sagermann M, Martensson LG, Baase WA, Matthews BW. A test of proposed rules for helix capping: implications for protein design. Protein Sci. 2002;11:516–521.
102. Kruus E, Thumfort P, Tang C, Wingreen NS. Gibbs sampling and helix-cap motifs. Nucleic Acids Res. 2005;33:5343–5353.
103. Fu H, Grimsley GR, Razvi A, Scholtz JM, Pace CN. Increasing protein stability by improving beta-turns. Proteins. 2009;77:491–498.
104. Aurora R, Creamer TP, Srinivasan R, Rose GD. Local interactions in protein folding: lessons from the alpha-helix. J Biol Chem. 1997;272:1413–1416.
105. Kapp GT, Richardson JS, Oas TG. Kinetic role of helix caps in protein folding is context-dependent. Biochemistry. 2004;43:3814–3823.
106. Lacroix E, Viguera AR, Serrano L. Elucidating the folding problem of alpha-helices: local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J Mol Biol. 1998;284:173–191.
107. Rose GD. Lifting the lid on helix-capping. Nat Chem Biol. 2006;2:123–124.
108. Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, et al. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 2005;272:5101–5109.
109. Brick K, Pizzi E. A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins. BMC Bioinformatics. 2008;9:236.
110. Coronado JE, Attie O, Epstein SL, Qiu WG, Lipke PN. Composition-modified matrices improve identification of homologs of saccharomyces cerevisiae low-complexity glycoproteins. Eukaryot Cell. 2006;5:628–637.
111. Yu YK, Altschul SF. The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics. 2005;21:902–911.
112. Paila U, Kondam R, Ranjan A. Genome bias influences amino acid choices: analysis of amino acid substitution and re-compilation of substitution matrices exclusive to an AT-biased genome. Nucleic Acids Res. 2008;36:6664–6675.
113. Ellrott K, Guo JT, Olman V, Xu Y. Improvement in protein sequence-structure alignment using insertion/deletion frequency arrays. Comput Syst Bioinformatics Conf. 2007;6:335–342.
114. The PyMOL Molecular Graphics System. Version 1.2. Schrödinger, LLC.

Articles from PLoS ONE are provided here courtesy of Public Library of Science