NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Cell. Author manuscript; available in PMC Nov 8, 2011.
Published in final edited form as:
PMCID: PMC3210731
NIHMSID: NIHMS254639

Protein sectors: evolutionary units of three-dimensional structure

Abstract

Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors”. Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.

Introduction

How does the amino acid sequence of a protein specify its biological properties? Here, we intend the term “biological properties” to broadly encompass chemical activity, structural stability, and other features that may be under selective pressure. A standard measure of the importance of protein residues is sequence conservation – the degree to which the frequency of amino acids at a given position deviates from random expectation in a well-sampled multiple sequence alignment of the protein family (Capra and Singh, 2007; Ng and Henikoff, 2006; Zvelebil et al., 1987). The more unexpected the amino acid distribution at a position, the stronger the inference of evolutionary constraint and therefore of biological importance. However, protein structure and function also depend on the cooperative action of amino acids, indicating that amino acid distributions at positions cannot be taken as independent of one another (Gobel et al., 1994; Lichtarge et al., 1996; Lockless and Ranganathan, 1999; Neher, 1994). A more informative formulation of sequence conservation should include pairwise or even higher-order correlations between sequence positions – the statistical signature of conserved interactions between residues. Indeed, analyses of correlations have contributed to the identification of allosteric mechanisms in proteins (Ferguson et al., 2007; Hatley et al., 2003; Kass and Horovitz, 2002; Lee et al., 2008; Lee et al., 2009; Peterson et al., 2004; Shulman et al., 2004; Skerker et al., 2008) and were found to be sufficient for recapitulating native folding and function in a small protein interaction module (Russ et al., 2005; Socolich et al., 2005).

These findings motivate a deeper theoretical and experimental analysis of correlations of sequence positions with the goal of understanding how protein sequences encode the basic conserved biological properties of a protein family. Here, we carry out this analysis using a classic model system for enzyme catalysis, the S1A family of serine proteases (Hedstrom, 2002; Rawlings and Barrett, 1994; Rawlings et al., 2008). We find that the non-random correlations between sequence positions indicate a decomposition of the protein into groups of co-evolving amino acids that we term “sectors”. In the S1A proteases, the sectors are nearly statistically independent, are physically connected in the tertiary structure, are associated with different biochemical properties, and have diverged independently in the evolution of the protein family. Functionally relevant and physically contiguous sectors are evident in other protein domains as well, providing a basis for directing further experimentation using the principles outlined in the serine protease family. Overall, our data support two main findings: (1) protein domains have a heterogeneous internal organization of amino acid interactions that can comprise multiple functionally distinct subdivisions (the sectors), and (2) these sectors define a decomposition of proteins that is distinct from the hierarchy of primary, secondary, tertiary, and quaternary structure. We propose that the sectors are features of protein structures that result from the evolutionary histories of their basic biological properties.

Results

From amino acid sequence to sectors

The S1A family consists primarily of enzymes catalyzing peptide bond hydrolysis through a conserved chemical mechanism, but its members show a broad range of substrate specificities and environments within which they operate. Analysis of positional conservation (see Methods) in a multiple sequence alignment of 1470 members of the family reveals a pattern over sequence positions (Fig. 1A) that has a simple and well-known structural interpretation: more conserved positions tend to be located in the core of the protein or at functional surfaces, and less conserved positions tend to occur on the remainder of the protein surface (Fig. 1B–C) (Bowie et al., 1990; Chothia and Lesk, 1982; Lesk and Chothia, 1982).
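As a rough illustration of what a positional conservation score measures, the sketch below computes the relative entropy (Kullback-Leibler divergence) between the amino acid frequencies observed in one alignment column and a background distribution. The uniform background, the toy columns, and the function name `positional_conservation` are assumptions made for illustration only; the measure actually used is defined in the Methods.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positional_conservation(column, background=None):
    """Relative entropy (in nats) of one alignment column versus a background.

    `column` is a string holding the residue found at one position in every
    sequence. A uniform background is assumed unless one is supplied.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Ignore gaps and any symbol that has no background frequency.
    residues = [aa for aa in column if aa in background]
    counts = Counter(residues)
    total = len(residues)
    divergence = 0.0
    for aa, n in counts.items():
        freq = n / total
        divergence += freq * math.log(freq / background[aa])
    return divergence

# Toy columns (hypothetical data): one invariant, one highly variable.
conserved = "SSSSSSSSSS"
variable = "ACDEFGHIKL"
print(positional_conservation(conserved))  # equals ln(20) for a uniform background
print(positional_conservation(variable))   # equals ln(2) for a uniform background
```

The invariant column scores far higher than the variable one, which is the sense in which a more unexpected amino acid distribution implies stronger constraint.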

Figure 1
Position-specific and correlated conservation in the S1A protease family. A, The conservation of each position i in a multiple sequence alignment of 1470 members of the S1A family, computed by the relative entropy Di(ai) (position numbering according ...

To examine the contribution of correlations to conservation, we followed the statistical coupling analysis (SCA) approach (Lockless and Ranganathan, 1999) to compute a conservation-weighted covariance matrix between all sequence positions in the S1A family (Cij, Fig. 1D, see Methods). Inspection of the matrix clearly indicates that correlations are not simply dominated by proximity in primary structure; many positions show only weak correlation to neighboring positions but significant correlation to positions that are distant along the sequence (Fig. 1D).
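A minimal sketch of a conservation-weighted covariance matrix is given below, under simplifying assumptions that are not taken from the Methods: each position is reduced to a binary variable (consensus residue present or not), and the per-position conservation weights are supplied by the caller. The function name `weighted_covariance` and the toy alignment are hypothetical.

```python
import numpy as np

def weighted_covariance(alignment, weights):
    """Conservation-weighted covariance between alignment positions.

    alignment: list of equal-length sequences (strings).
    weights:   one conservation-derived weight per position (assumed given).
    """
    arr = np.array([list(seq) for seq in alignment])
    n_seq, n_pos = arr.shape
    # Binary approximation: 1 where a sequence carries the most frequent residue.
    x = np.zeros((n_seq, n_pos))
    for i in range(n_pos):
        values, counts = np.unique(arr[:, i], return_counts=True)
        x[:, i] = arr[:, i] == values[np.argmax(counts)]
    mean = x.mean(axis=0)
    cov = x.T @ x / n_seq - np.outer(mean, mean)
    w = np.asarray(weights, dtype=float)
    # Weight each pair by the product of its positional weights.
    return np.abs(np.outer(w, w) * cov)

# Toy alignment: positions 0 and 2 vary together, position 1 varies independently.
toy = ["AKC", "AKC", "ARC", "ARC", "GKD", "GKD", "GRD", "GRD"]
C = weighted_covariance(toy, weights=[1.0, 1.0, 1.0])
print(C.round(3))
```

In this toy case the entry for positions 0 and 2 is large while the entries involving position 1 vanish, even though position 1 sits between them in the sequence.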

What pattern of functional correlations within the serine protease does this matrix indicate? The essence of addressing this problem is two-fold: (1) to separate the functionally significant correlations in the Cij matrix (the “signal”) from correlations that could arise due to limited sampling of sequences (“statistical noise”) or phylogenetic relationships between sequences (“historical noise”), and then (2) to analyze the pattern of the remaining significant correlations.

Our approach for isolating signal from noise in the SCA correlation matrix derives from work more than fifty years ago on random matrix theory (Wigner, 1967). The basic idea is to model the effect of statistical noise by examining correlation matrices for randomized versions of the data; significant patterns of correlations are then deduced by comparison. This approach was used in finance to extract non-random correlations of stock performance over a finite time window (Bouchaud and Potters, 2004; Plerou et al., 2002). This analysis showed that only a small fraction of observed correlations are relevant because most could arise simply by the limited period of time over which stock prices are sampled. The remaining significant correlations are organized in a few collective modes that decompose the economy into business sectors – groups of business entities whose performance fluctuates together over time. We applied these same methods to extract the non-random correlated modes of the SCA matrix, effectively “cleaning” the matrix of statistical noise (Fig. S2A). As for the financial markets, we find that only a few top modes (5 out of 223 total) contain correlations that are clearly distinct from random expectation.
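The comparison against randomized data can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the Methods: a plain Pearson correlation matrix stands in for the SCA matrix, randomization is done by shuffling each column independently, and a mode counts as significant when its eigenvalue exceeds the largest eigenvalue seen in any randomized trial. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def count_significant_modes(data, n_trials=20, seed=0):
    """Count eigenvalues of the correlation matrix that exceed random expectation.

    data: array of shape (n_samples, n_variables) with no constant columns.
    """
    rng = np.random.default_rng(seed)
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    random_max = 0.0
    for _ in range(n_trials):
        # Shuffling each column separately keeps its distribution
        # but destroys correlations between columns.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        eig = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))
        random_max = max(random_max, eig[-1])
    return int(np.sum(observed > random_max))

# Synthetic data: 30 variables, the first 10 share a hidden common factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
data = rng.normal(size=(200, 30))
data[:, :10] += latent
print(count_significant_modes(data))
```

Only the modes carrying the planted group of co-varying columns should rise above the randomized spectrum; the remainder are indistinguishable from sampling noise.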

The work in finance also provides a clue for reducing the effect of historical noise. Global, coherent correlations in stock performance occur due to fluctuations in the overall economy and are responsible for a dominant first mode of the correlation matrix. This mode is irrelevant for identifying the non-global, heterogeneous correlations between stocks that define the different business sectors and is therefore removed (Bouchaud and Potters, 2004; Plerou et al., 2002). Similarly, global, coherent correlations between positions should occur due to phylogenetic relationships between sequences and are expected to produce the dominant first mode observed for the SCA matrix (see Methods and Supplementary Note). This mode is irrelevant for decomposing the protein sequence into functional units and is removed. Though the principles of computing correlations vary, similar approaches for partial elimination of purely phylogenetic correlations in protein sequence alignments have been previously described (Atchley et al., 2000; Buck and Atchley, 2005; Ortiz et al., 1999). The process is summarized in the Methods, and a script for reproducing this analysis is provided in the supplementary information. The final result is shown in Figure 1E, a highly simplified representation of the SCA matrix that shows the statistically relevant pattern of correlation. For the S1A family, this analysis reveals two main findings: (1) the 223 sequence positions in the multiple sequence alignment are reduced to 65 positions that show significant patterns of correlations, and (2) these 65 positions can be separated into three seemingly distinct groups (labeled red, blue, and green). By analogy with the work in finance, and to distinguish from other terminologies used in describing protein structures, we refer to these groups of correlated positions as protein sectors – units of a protein that have co-evolved within a protein family.
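Removing the dominant first mode while keeping the remaining significant modes amounts to a truncated spectral reconstruction. The sketch below assumes a symmetric matrix and a caller-supplied number of significant modes (`n_keep`, which would be 5 for the S1A family according to the text); the function name and the toy matrix are illustrative assumptions, and the script in the supplementary information remains the reference.

```python
import numpy as np

def remove_first_mode(matrix, n_keep):
    """Rebuild a symmetric matrix from modes 2..n_keep, dropping the top mode."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    order = np.argsort(eigvals)[::-1]          # sort modes by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = slice(1, n_keep)                    # skip mode 1, keep up to n_keep
    return (eigvecs[:, kept] * eigvals[kept]) @ eigvecs[:, kept].T

# Toy matrix: a global component shared by all positions plus two blocks.
n = 6
M = 0.2 * np.ones((n, n)) + np.eye(n)
M[:3, :3] += 0.5
M[3:, 3:] += 0.5
cleaned = remove_first_mode(M, n_keep=2)
print(cleaned.round(3))
```

In the toy example the first mode is the uniform, global component; once it is removed, entries within a block are positive and entries between blocks are negative, so the two groups separate cleanly.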

The concept of business sectors in the economy is clear, but what is the meaning of sectors in proteins? Using the S1A family as a model system, we identify four characteristics of sectors: (1) statistical independence, (2) physical connectivity in the tertiary structure, (3) biochemical independence in mediating protein function, and (4) independent phenotypic variation in the protein family.

Statistical independence

Figure 1E provides a qualitative picture of independence between sectors in the S1A family, but quantitatively, how independent are they? To address this, we computed a measure called the correlation entropy – the degree to which a selected group of residues are statistically coupled to each other in the multiple sequence alignment (Figs. 2 and S6, and see Supplementary Note). If two sectors are independent, then the correlation entropy of the two taken together must be the sum of their correlation entropies taken individually. Figure 2 shows that for all pairs of sectors (Fig. 2A–C) and for the three sectors combined (Fig. 2D), this condition holds to a remarkable degree. For example, the correlation entropy of the red and blue sectors taken together (Fig. 2A, black bar) is nearly that of the sum of the individual sector correlation entropies (stacked red and blue bars), and much different from random expectation (gray bar). Overall, the data in Fig. 2A–D show that the red, blue, and green sectors represent highly independent statistical units in the serine protease family. This analysis also permits a quantitative comparison of the degree of independence; the red/blue and green/blue sectors emerge as the most independent (Fig. 2A–B), while the red and green sectors show less independence (Fig. 2C). For example, G216 and V213 are jointly shared by both the red and green sectors (Fig. S3F), suggesting that they represent sites of interaction between these two otherwise independent sectors.
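The independence criterion can be stated compactly. The notation below is an assumption introduced here for illustration: S(X) denotes the correlation entropy of a group of positions X, whose definition is given in the Supplementary Note, and R, B, and G denote the red, blue, and green sectors.

```latex
% Pairwise independence of two sectors A and B (assumed notation):
S(A \cup B) = S(A) + S(B)

% The corresponding condition for all three sectors taken together:
S(R \cup B \cup G) = S(R) + S(B) + S(G)
```

Under this reading, the black bars in Figure 2 correspond to the left-hand sides and the stacked colored bars to the right-hand sides.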

Figure 2
Statistical independence of the three sectors. A–D, For each pairwise combination of sectors (A, red-blue (RB), B, green-blue (GB), and C, green-red (GR)) and the combination of all three sectors (D, red-blue-green (RBG)), the graph shows the ...

Structural connectivity

The identification of independent protein sectors in the serine protease is entirely based on statistical analysis of the sequence alignment without any consideration of the protein structure or its biochemical properties. Nevertheless, the sectors have clearly interpretable tertiary structural properties (Fig. 3). The red sector comprises a contiguous network of amino acids built around the S1 pocket, the primary determinant of substrate specificity (Hedstrom, 2002) (Fig. 3A). The color gradient in Figure 3A represents residue weights, revealing a tertiary structural organization in which the strongest contributors are centered around the S1 pocket and weaker positions make up the surrounding region. This sector includes residues in the environment of the S1 pocket that are known to contribute to its mechanical stability, providing a rationale for their cooperative action (Bush-Pelc et al., 2007; Perona et al., 1995). This sector is clearly involved in catalytic specificity; mutations of residues comprising this sector are known to influence specificity for substrates in several S1A family members (Craik et al., 1985; Hedstrom, 1996; McGrath et al., 1992; Perona et al., 1993; Wang et al., 1997), and this sector correlates well with positions mutated in transferring chymotryptic specificity into trypsin (Hedstrom et al., 1994).

Figure 3
Structural connectivity of the three sectors. A–C, Residues comprising each sector displayed in space filling representation with a van der Waals surface on the tertiary structure of rat trypsin (PDB 3TGI (Pasternak et al., 1999)). Each sector ...

The blue sector comprises another contiguous group of amino acids, but is structurally distinct from the red sector; the constituent residues run through the interior of both of the β-barrels that comprise the core structure of the protease (Fig. 3B), but also extend from both β-barrels to directly contact the catalytic triad residues (Fig. 4B). Mapping of residue weights in this sector indicates a few foci joined by intervening positions with lower weights, as if the activity of this sector is a more distributed rather than localized property of the protein structure. Unlike the red sector, prior work establishes no unified role for this sector, likely because blue sector residues are not obviously distinguishable from the general milieu of residues in the protein core that are similarly conserved and well-packed (Figs. 4 and S7).

Figure 4
Relationship of sectors to primary, secondary, and tertiary structure. A, Positions colored by sector identity on the primary and secondary structure of a member of the S1A family (rat trypsin); the bar graph shows the global conservation of each position. ...

Finally, the green sector forms another contiguous group of amino acids, located at the interface between the two β-barrels that make up the protease (Fig. 3C). Residues within this sector include the catalytic triad (H57, D102, and S195), and surrounding residues known to be important for the basic chemical mechanism of this enzyme family (Baird et al., 2006; Hedstrom, 2002), and for some forms of allosteric control over this activity (Guinto et al., 1999; Huntington and Esmon, 2003). Like the red sector, residue weights are largest around a hotspot (the catalytic residues), and fall off in surrounding positions. This sector includes one disulfide bond pair (C42–C58), substitution of which has been shown to cooperatively interact with mutation of S195 (Baird et al., 2006). Indeed, the triple mutation C42A, C58A/V, and S195T is sufficient to convert trypsin from a serine protease to a threonine protease. We conclude that the green sector represents the catalytic core of the protease family. Consistent with joint contribution to both the red and green sectors, positions 213 and 216 are found to form a major part of the packing interface between these two sectors (not shown).

More generally, the physical connectivity of each sector is striking given that no information about tertiary structure was used in their identification and that only ~10% of total sequence positions contribute strongly to each sector. Shown together, the three sectors occupy largely distinct subdivisions within the core of the tertiary structure, making contacts only at a few positions (Fig. 4B–D). The considerable prior experimental work on the serine proteases permits the partial functional interpretation of sectors provided above, but it is important to note that the sectors are otherwise not obvious. No sector corresponds to any known subdivision of proteins by primary structure segments, secondary structure elements, or subdomain architecture (Fig. 4A). In addition, the three sectors are not distinguishable by degree of solvent exposure, by the conservation of positions taken independently, or with the obvious exception of the green sector, by proximity to the active site (Fig. 4).

Biochemical independence

What is the functional meaning of independence between protein sectors? To address this question, we carried out alanine mutagenesis of residues spanning the range of correlation strengths in the red and blue sectors in rat trypsin and measured the effect on two basic properties of these enzymes: catalytic power and thermal stability. Catalytic power was measured using a standard chromophore-based assay on a model trypsin substrate peptide (NH2-AAPK-pNA) (Hedstrom et al., 1994), and stability by following the denaturation temperature (Tm) using the fluorescence of buried tryptophan residues as a probe for the native state (Fig. S8–S10 and Table S2).

Consistent with prior work (Hedstrom, 2002), we find that mutations in the red sector have a significant effect on catalytic activity (red circles, Fig. 5A). However, mutations in this sector have only a minor effect on thermal stability. The same result holds for a multiple mutant in the red sector (Hswap, Fig. 5B), in which a large number of red sector positions are exchanged for corresponding amino acids in chymotrypsin (Hedstrom et al., 1994). Strikingly, blue sector mutations have the opposite phenotype – a wide range of effects on thermal stability, but only a marginal effect on catalytic activity (Fig. 5A, blue circles). Double alanine mutants within the blue sector reinforce this result: these proteins show exclusive effects on thermal stability, with little or no effect on catalytic activity (Fig. 5B). In addition, the data suggest that mutations within the sector act cooperatively. For example, L105A and T229A destabilize trypsin by 10.4 K and 8.0 K, respectively, but the effects of these mutations are reduced or abrogated in the background of M104A, an indication of epistatic interactions within this sector. These findings are structurally non-trivial; some blue sector positions (e.g. M104, T229) are as close to catalytic residues as some red sector positions (e.g. C191, G216) and are just as buried (Figs. 4C–D), but nevertheless show distinct functional properties upon mutation.

Figure 5
Mutational analysis of the red and blue sectors. A, Single alanine mutations at a set of red and blue sector positions in rat trypsin, evaluated for effects on catalytic power and thermal stability (Tm). Residues selected for single mutation were chosen ...

One further experiment tests the independence of the red and blue sectors in affecting structural stability and catalytic activity. If these groups act independently, then combinations of mutations between these two groups should show additive effects on measured parameters. Figure 5C confirms this prediction for two pairs of inter-sector mutations; the measured effect of the double mutants (magenta circles) is nearly that predicted from the single mutant experiments (white circles). Thus, the red and blue sectors are associated with near-independent biochemical properties of the protease.
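The additivity test described above can be expressed as a small calculation: if two mutations act independently, the effect of the double mutant equals the sum of the single-mutant effects, so their difference should be near zero. The function name and the numerical values below are hypothetical and are not measurements from this study.

```python
def additivity_deviation(delta_a, delta_b, delta_ab):
    """Observed double-mutant effect minus the sum of the single-mutant effects.

    A value near zero indicates additive (independent) behavior; a large
    magnitude indicates coupling between the two mutated sites.
    """
    return delta_ab - (delta_a + delta_b)

# Hypothetical changes in melting temperature (K) relative to wild type.
d_red_single = -2.0    # assumed effect of a red sector mutation
d_blue_single = -6.0   # assumed effect of a blue sector mutation
d_double = -8.5        # assumed effect of the combined mutant

deviation = additivity_deviation(d_red_single, d_blue_single, d_double)
print(f"predicted: {d_red_single + d_blue_single} K, deviation: {deviation} K")
```

The same calculation applies to any measured parameter, for example a catalytic efficiency expressed on a logarithmic scale so that independent effects add.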

A small sampling of non-sector mutants in the core of trypsin shows little effect on either catalytic activity or thermal stability (white circles, Fig. 5A). Further work will be required to more broadly test the role of conserved but non-sector positions in contributing to protease function. In addition, one aspect of the mutational effects in the blue sector is worth noting. Blue sector mutants affect thermal stability, but it is possible that this may actually reflect changes in the local stability of regions involved in functional processes such as control over protease lifetime through autocatalytic degradation. Consistent with this notion, the C136–C201 disulfide bond and other blue sector positions are located in regions that flank known autocatalytic sites (Bodi et al., 2001; Lee et al., 2004).

Independent sequence divergence

The finding of independent sectors in the serine protease has important implications for phylogenetic analysis of this protein family. Specifically, the data suggest that no single measure of the divergence of protein sequences can correctly represent their differences in functional properties. Instead, sequence divergence should be treated as a fundamentally multidimensional problem – using separate measures for each sector. To illustrate this, we calculated sequence similarities between sequences within the multiple sequence alignment using only the positions that contribute to the red, blue, or green sectors separately. As a control, we also calculated sequence similarity conventionally, using all positions in the sequence. Principal components analysis of the corresponding similarity matrices provides a simple representation of the relationships amongst the sequences as defined by each sector (Figs. 6A–C) or by all positions taken together (Fig. 6D). Thus, sequences with a similar motif in the red sector are grouped in Figure 6A regardless of their divergence in other positions. Similarly, sequences with a similar motif in the blue sector are grouped in Figure 6B and sequences with a similar motif in the green sector are grouped in Figure 6C, regardless of divergence elsewhere. Sequences are grouped in Figure 6D only if they are globally similar.
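The sector-restricted similarity analysis can be sketched as below. Fractional identity as the similarity measure, the column-centering step, and the function names are assumptions made for illustration; the toy alignment is hypothetical, with positions 0 and 1 playing the role of one sector and positions 2 and 3 another.

```python
import numpy as np

def sector_similarity(alignment, positions):
    """Pairwise fractional identity computed over the given positions only."""
    arr = np.array([list(seq) for seq in alignment])[:, positions]
    return (arr[:, None, :] == arr[None, :, :]).mean(axis=2)

def top_component(similarity):
    """Scores of each sequence on the first principal component."""
    centered = similarity - similarity.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

toy = ["AAKK", "AARR", "GGKK", "GGRR"]
pc_first = top_component(sector_similarity(toy, [0, 1]))
pc_second = top_component(sector_similarity(toy, [2, 3]))
print(pc_first.round(3))   # sequences 0 and 1 fall together, apart from 2 and 3
print(pc_second.round(3))  # sequences 0 and 2 fall together, apart from 1 and 3
```

The same four sequences are grouped in two different ways depending on which positions are used, which is the multidimensional view of divergence described above.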

Figure 6
Multidimensional sequence divergence within the serine protease family. Each stacked histogram shows the principal component of a sequence similarity matrix between the 442 members of the S1A family for which functional annotation is available. Similarity ...

Consistent with the role of the red sector in substrate recognition, sequence divergence in this sector classifies the proteases effectively by primary catalytic specificity (Fig. 6A, left panel). The trypsins (magenta) and chymotrypsins (blue) are separated, while the trypsins, the tryptases (yellow) and kallikreins (orange), diverse proteases with similar specificity (Kam et al., 2000; Olsson et al., 2004), are found together. The granzymes (green) come in several specificity classes (A and K (tryptic), B (aspartic) and M (chymotryptic), (Bell et al., 2003; Kam et al., 2000; Ruggles et al., 2004)), and occupy regions that correlate with their specificity class. However, this sector fails to separate the sequences according to the organism type in which they occur (Fig. 6A, middle, vertebrate and invertebrate sequences are mixed) or by the existence of the catalytic mechanism (Fig. 6A, right, non-enzymatic and enzymatic members of the S1A family are mixed). In contrast, the blue sector has a completely distinct effect in classifying protease sequences. This sector fails to group sequences by their catalytic specificity (Fig. 6B, left) or by catalytic mechanism (Fig. 6B, right), but does effectively classify sequences by organism type (Fig. 6B, middle). Finally, the green sector displays a third classification; it fails to separate sequences by catalytic specificity (Fig. 6C, left) or by organism type (Fig. 6C, middle), but does separate the non-enzymatic and enzymatic members of the S1A family (Fig. 6C, right). Similarity calculated over the entire protein sequence fails to effectively classify by catalytic specificity, by organism type, or by chemical mechanism (Fig. 6D), indicating (1) that these phenotypic classifications are specific properties of the sectors and (2) that this result cannot be trivially explained by phylogenetic proximity of sequences. Thus, sectors represent independent modes of selection, a result that should provide important constraints in developing models for the evolutionary origins of the S1A family.

Sectors in other protein families

To begin to examine the generality of the sector concept, we carried out spectral analysis of the SCA matrix from four other protein families for which substantial prior experimental data permit a meaningful interpretation (Figure 7). The results show that functionally relevant sectors are found in each case and provide a non-trivial basis for experiment design. For example, two sectors are evident in the PSD95/Dlg1/ZO1 (PDZ) domain family of protein interaction modules (blue and red, Fig. 7A), each of which comprises a small fraction of total residues. Interestingly, each sector is involved in a distinct regulatory mechanism in the PDZ family. The blue sector is connected through peptide ligand (Lockless and Ranganathan, 1999) and defines an allosteric mechanism for regulating binding affinity at the α2-β2 groove through molecular interactions at a distant surface site on the α1 helix (Peterson et al., 2004), and the red sector corresponds to a redox-based conformational switch that regulates the shape of the ligand-binding pocket (Mishra et al., 2007) (Fig. 7A). These regulatory mechanisms have been experimentally demonstrated to date in only a few members of the PDZ family and could be seen as idiosyncratic features of specific PDZ domains. However, the sector hypothesis suggests that they are more general features of the protein family – variations within a sparse network of correlated positions that have the capacity to generate a diversity of regulatory phenotypes through step-wise modification of a few amino acid positions. The identification of PDZ sectors provides a basis for testing this hypothesis.

Figure 7
Functional sectors in other protein families. SCA correlation matrices for the PDZ (A), PAS (B), SH2 (C), and SH3 (D) domain families after reduction of statistical and historical noises (C̃ij, analogous to Fig. 1E). In each case, the ...

Two sectors are also evident in the Per/Arnt/Sim (PAS) domain family of allosteric signalling modules in which ligand binding (or chromophore isomerisation) at a surface pocket located on one side triggers conformational changes at N- and C-terminal structural motifs docked at the opposite surface (Halavaty and Moffat, 2007; Harper et al., 2003) (Fig. 7B). In the PAS family, one sector (blue) forms a network of amino acids within the core domain that links the ligand-binding pocket to the allosteric surface sites, and the other sector (red) comprises a cluster of amino acids at one surface site that connects the PAS core to a modular C-terminal “output” motif (Halavaty and Moffat, 2007) (Jα helix in Fig. 7B). This mapping motivated the design of a synthetic two-domain allosteric protein by connecting sectors in two different proteins across their surface sites (Lee et al., 2008). The concept of allosteric coupling through sector linkage provides a starting point for testing a more general hypothesis that surface exposed regions of sectors represent “hotspots” for the establishment of cooperative functional interactions between protein domains.

Physically contiguous sectors are also evident in the SH2 and SH3 families of interaction modules (Fig. 7C–D). A full discussion of the extensive literature regarding these domains is a matter for future work, but an initial analysis reveals consistency with known functional mechanisms. In the phosphotyrosine-binding SH2 domains, the blue sector is largely buried within the core, while the red and green sectors make direct interactions with substrate peptide. The red sector surrounds the P-Tyr and the immediately N-terminal residue (positions 0 and −1, respectively, Fig. 7C), and extends to a surface of the αA helix through a network of intervening residues. Interestingly, αA is a major aspect of the interface between the SH2 domain and the catalytic domain of the Fes tyrosine kinase and experiments show that SH2 ligands allosterically influence kinase activity through this interdomain interaction (Filippakopoulos et al., 2008). The green sector includes residues interacting with the peptide ligand at positions that are C-terminal to the P-Tyr (+1 to +5, Fig. 7C); these positions are known to contribute to determining the specificity of SH2 domains for target ligands (Kuriyan and Cowburn, 1997). In SH3, the blue sector identifies the residues that bind the canonical polyproline motif that occurs in peptide ligands for this domain family (Yu et al., 1994; Zarrinpar et al., 2003). This finding suggests that the different subsites within the SH3 binding pocket (Fig. 7D) should act cooperatively rather than separately in binding ligands. The SH3 red sector comprises a contiguous network of residues that link a region formed by a short 3₁₀ helix and a portion of the n-Src loop to the so-called distal loop via residues within β-strand c. Prior work indicates that these residues contribute to SH3 domain stability (Martinez and Serrano, 1999) and form part of a conserved “folding nucleus” that is partially ordered at the transition state for the folding reaction (Martinez and Serrano, 1999; Riddle et al., 1999). It will be interesting to test the sector-based prediction that determinants of substrate specificity and folding kinetics in the SH3 domain can be independently tuned through targeted variation of sector positions.

Discussion

Classical analyses describe proteins as a hierarchy of primary, secondary, tertiary, and quaternary structures. This description derives from the basic chemical properties of polypeptide chains and empirical observation, and is the basis for current classifications and comparative analyses of protein families (Holm and Sander, 1996; Orengo and Thornton, 2005; Thornton et al., 1999). However, biological properties of proteins arise from the cooperative action of amino acid residues, and the pattern of residue cooperativity in the three-dimensional structure is generally unknown. Here, we show that generalizing the principle of conservation to account for correlations between positions reveals a novel structural organization for proteins that is distinct from traditional hierarchical descriptions. Statistically non-random correlations are arranged into physically connected groups of co-evolving amino acids – the sectors – that involve amino acids spread out throughout primary structure, and across various secondary structure elements and tertiary structural subdomains. In the S1A family, the sectors manifest as strikingly independent features, controlling distinct biochemical properties and corresponding to orthogonal modes of sequence variation. The degree of independence of sectors in other domain families is yet to be investigated, and indeed, strict independence of sectors need not hold in every case (see below). Nevertheless, the fact that sectors correspond to important structural and functional properties in several protein families provides strong support for their biological relevance. Overall, we hypothesize that sectors represent the structural organization within proteins reflecting, at least in part, the functional interactions between amino acid residues that underlie conserved biological properties.

The finding of multiple independent sectors within a single protein domain has implications for physical properties of proteins. Atomic structures typically show a tightly packed and nearly homogeneous pattern of contacts between atoms, an observation that suggests a uniform importance of local interactions between amino acid residues. However, the finding of sparse, physically connected, and functionally quasi-independent sectors indicates that out of the uniform pattern of contacts between residues emerges a heterogeneous pattern of functional interactions. Indeed, a large body of experimental work now argues that amino acids contribute cooperatively but unequally in specifying protein structure and function (Agarwal et al., 2002; Benkovic and Hammes-Schiffer, 2003; Clackson and Wells, 1995; Datta et al., 2008; Eisenmesser et al., 2002; Fuentes et al., 2004; Ota and Agard, 2005; Sadovsky and Yifrach, 2007; Smock and Gierasch, 2009). The heterogeneity of correlations emerging from the statistics of conservation in protein families may be the representation of this feature. Support for this idea is provided by a correlation-based protein design experiment. For a small protein interaction module (the WW domain), this experiment showed that the pattern of correlations in the SCA matrix alone suffices to design artificial proteins that recapitulate the native-like atomic structure and biochemical activity of the WW family (Russ et al., 2005; Socolich et al., 2005).

These results underlie a conceptual departure between the SCA and some previous methods of analyzing residue covariation in protein families. Indeed, several reports have proposed approaches for the calculation of residue covariance, but often with the goal of identifying the pattern of contacts in the three-dimensional structure (Gobel et al., 1994; Hamilton et al., 2004; Larson et al., 2000; Neher, 1994; Olmea and Valencia, 1997; Ortiz et al., 1999; Shindyalov et al., 1994; Thomas et al., 1996). The methodological details vary, but the conclusion of these studies is consistently clear: residue covariation is a poor indicator of the overall pattern of contacts in protein structures. One possible interpretation of this result is that covariation analyses fail to capture the essential design of proteins (Fodor and Aldrich, 2004), but, consistent with other studies (Kass and Horovitz, 2002; Lapedes et al., 1999; Lichtarge et al., 1996), the sector hypothesis suggests an alternate view: the pattern of constraints underlying the biological properties of proteins fundamentally differs from the pattern of observed contacts. More specifically, the hypothesis is that many contacts have weak or idiosyncratic roles, while a fraction of contacts are organized into collective systems – the sectors – that contribute most significantly to biological properties. In such a heterogeneous organization, different sectors could operate with near-independence. The identification and validation of sectors in a few model proteins should help direct physical studies to experimentally test this proposal for the organization of amino acid interactions.

The results presented here imply the possibility of sector mapping for many protein families, but we caution that significant technical challenges remain in the development of general approaches for sector identification. The S1A, PDZ, PAS, SH2, and SH3 families represent cases in which both the extent and uniformity of sampling in the alignment permit straightforward application of the computational methods introduced in this work. In contrast, non-uniform sampling can lead to complications in sector analysis. An illustrative example of this problem is evident even in the S1A family; the presence of a small clade of snake venom proteases results in a weak “pseudo-sector” that emerges on one of the lower modes of the SCA matrix (Fig. S4, and the Supplementary Information). This pseudo-sector is easily recognized and disregarded in this case, but serves to highlight a potential challenge in the analysis of other protein families (Buck and Atchley, 2005). However, several strategies exist for correcting for biased sampling in alignments and for improving the recognition of statistically independent subgroups from correlation matrices, and these could be exploited in developing more powerful methods for sector identification. By taking the simplest approach in families suitable for forward and retrospective experimental analysis, this work provides a starting point for future studies.

Regardless of methodological issues, the validation of sectors in a few experimentally tractable model systems opens the possibility of addressing basic questions about the design of natural proteins. What is the origin of sectors in proteins and what controls their independence? Indeed, why should there be sectors at all? The answers to these questions ultimately involve the largely unknown evolutionary histories of protein families. In the case of serine proteases, it is interesting to note that enzymes with the same specificity are found in a variety of chemical environments and enzymes with different specificities are found in the same chemical environment. For example, tryptic specificity occurs in the gut, but also in the plasma and at sites of wound healing. At the same time, tryptic and chymotryptic specificities are often found together in the same environments. Thus, the capacity for independent control over enzyme activity, selectivity, and stability may provide an important adaptive advantage for the serine protease family. An implication of this line of thinking is that strict sector independence need not be guaranteed in every protein family. Instead, the emergence of independent functional sectors in proteins might be fundamentally tied to the independent variation of selective pressures acting on members of a protein family. More generally, we suggest that information about the statistics of the selective pressures is stored in the pattern of correlations in the protein sequence. The identification of sectors reported here provides a necessary first step in testing this hypothesis.

Experimental Procedures

A. Sequence alignment construction and annotation

Sequences comprising the S1A, PAS, SH2, and SH3 families were collected from the NCBI non-redundant database (release 2.2.14 May-07-2006) through iterative PSI-BLAST (Altschul et al., 1997) and aligned using Cn3D (Wang et al., 2000) and ClustalX (Thompson et al., 1997) followed by standard manual adjustment methods (Doolittle, 1996). The alignment of PDZ domains is from previous work (Lockless and Ranganathan, 1999). See supplementary methods for more information.

B. Sequence analyses

The analysis of conservation and pairwise correlation in the multiple sequence alignment uses updated versions of the SCA method (Lockless and Ranganathan, 1999; Suel et al., 2003). Due to size considerations, methodological details for this analysis (including a MATLAB script for reproduction of all of the calculations) are provided in the supplementary information. A MATLAB (Mathworks Inc.) toolbox implementing the methods described here is available by request.

C. Minimum discriminatory information (MDI) method

The minimum discriminatory information (Kullback, 1997) (MDI) method generalizes the notion of positional conservation to include correlations between positions. In the binary approximation where only the most frequent amino acid $a_i$ is considered at each position $i$, this is achieved by minimizing the relative entropy $D(P\|Q)=\sum_x P(x)\,\ln\frac{P(x)}{Q(x)}$ over the probability distributions $P(x)$ whose marginals reproduce the frequencies $f_{ij}(a_i,a_j)$. Here, $x$ represents a sequence in the binary approximation and $Q(x)$ denotes its background probability, $Q(x)=\prod_i q(a_i)^{x_i}\,(1-q(a_i))^{1-x_i}$. We performed the minimization numerically for small subsets $S$ of positions using the Generalized Iterative Scaling Algorithm (Darroch and Ratcliff). In Figure 2, the entropies $D_S$ represent the case when $S$ was composed of the top five positions contributing to each sector. The statistical dependence between two sectors $S_1$ and $S_2$ was measured by $D_{S_1\cup S_2}-D_{S_1}-D_{S_2}$.
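The MDI calculation for a small subset of positions can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: for brevity it uses iterative proportional fitting (a classical relative of Generalized Iterative Scaling that converges to the same minimum of the relative entropy when started from Q and when the target marginals are mutually consistent), and it enumerates all binary states, which is only practical for a handful of positions. The function names and all frequencies below are hypothetical and chosen only for illustration.

```python
import itertools
import math

def _rescale(P, idx, target):
    """Rescale P so that its marginal over the positions in idx equals target."""
    current = {}
    for x, p in P.items():
        key = tuple(x[k] for k in idx)
        current[key] = current.get(key, 0.0) + p
    return {x: p * target[tuple(x[k] for k in idx)] / current[tuple(x[k] for k in idx)]
            for x, p in P.items()}

def mdi_entropy(q, f1, f2, S, sweeps=200):
    """Relative entropy D_S of the MDI distribution for the positions in S.

    q[i]      : background frequency of the prevalent amino acid at position i
    f1[i]     : observed frequency of the prevalent amino acid at position i
    f2[(i,j)] : observed joint frequency at positions i < j (both prevalent)
    Assumes all frequencies lie strictly between 0 and 1.
    """
    S = list(S)
    states = list(itertools.product((0, 1), repeat=len(S)))
    # Background Q(x): independent positions in the binary approximation.
    Q = {x: math.prod(q[i] if xi else 1.0 - q[i] for i, xi in zip(S, x))
         for x in states}
    P = dict(Q)  # start from Q, then impose the marginal constraints in turn
    for _ in range(sweeps):
        for a, i in enumerate(S):
            P = _rescale(P, (a,), {(1,): f1[i], (0,): 1.0 - f1[i]})
        for a, b in itertools.combinations(range(len(S)), 2):
            i, j = S[a], S[b]
            fij = f2[(min(i, j), max(i, j))]
            target = {(1, 1): fij, (1, 0): f1[i] - fij,
                      (0, 1): f1[j] - fij, (0, 0): 1.0 - f1[i] - f1[j] + fij}
            P = _rescale(P, (a, b), target)
    return sum(p * math.log(p / Q[x]) for x, p in P.items() if p > 0)

# Hypothetical example: positions 0 and 1 covary; position 2 is independent of both.
q = [0.1, 0.1, 0.1]
f1 = [0.6, 0.6, 0.7]
f2 = {(0, 1): 0.5, (0, 2): 0.6 * 0.7, (1, 2): 0.6 * 0.7}
dependence = (mdi_entropy(q, f1, f2, [0, 1, 2])
              - mdi_entropy(q, f1, f2, [0, 1])
              - mdi_entropy(q, f1, f2, [2]))
print(dependence)
```

In this toy case the two groups are constructed to be statistically independent, so the dependence measure should vanish up to numerical error; correlated groups would give a positive value.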

D. Protein purification and kinetic assays

Purification of wild-type and mutant rat trypsins and measurement of kinetic parameters (Vmax and Km) were as previously described (Hedstrom et al., 1994) with minor modifications as detailed in the supplementary methods. The substrate used was Suc-Ala-Ala-Pro-Lys-PNA (Bachem) dissolved in dimethylformamide (DMF) to 50 mM, and enzyme activity was measured at 23 °C in 50 mM Hepes, 10 mM CaCl2, and 100 mM NaCl at pH 8.0 by spectroscopically monitoring release of p-nitroaniline (extinction coefficient of 10204 M−1 cm−1 at 410 nm). To obtain kcat (as Vmax/active site concentration), active site concentration was measured by 4-methylumbelliferyl p-guanidobenzoate (MUGB, Sigma-Aldrich) titration (see supplementary methods). Kinetic assays were verified by comparison of data for WT rat trypsin and mutants with previously reported data (Craik et al., 1985; Hedstrom, 1996; McGrath et al., 1992; Wang et al., 1997).
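The unit conversions implied by this assay can be sketched in a few lines of Python. The extinction coefficient and the definition of kcat are taken from the text; the 1 cm path length, the function names, and the example numbers are assumptions made only for illustration, and the actual fitting procedure is the one described in the supplementary methods.

```python
# Extinction coefficient of p-nitroaniline at 410 nm, as quoted in the text.
EPSILON_PNA = 10204.0  # M^-1 cm^-1

def rate_from_absorbance_slope(slope_per_s, path_cm=1.0, epsilon=EPSILON_PNA):
    """Beer-Lambert conversion of dA410/dt (s^-1) to a product formation rate (M/s).

    The default 1 cm path length is an assumption, not a value from the text.
    """
    return slope_per_s / (epsilon * path_cm)

def michaelis_menten(substrate, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)

def kcat_from_vmax(vmax_molar_per_s, active_site_molar):
    """kcat = Vmax / active site concentration (for example, from MUGB titration)."""
    return vmax_molar_per_s / active_site_molar

# Hypothetical numbers, for illustration only.
vmax = rate_from_absorbance_slope(0.010204)  # absorbance units per second
print(vmax, kcat_from_vmax(vmax, 1e-8))
```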

E. Thermal denaturation assays

The fold stability of enzymes was measured by thermal denaturation, monitored by intrinsic tryptophan fluorescence. Stability was assayed in 0.1 M formic acid to keep enzymes inactive (Bittar et al., 2003; Brumano et al., 2000). The fluorescence (excitation at 295 nm/emission at 340 nm) was measured in the range of 4 °C to 85 °C (at a rate of 4 °C/min; sampling interval 0.1 °C for most proteins) in a 3 ml quartz cuvette with stirring. The total volume was kept at 2.1 ml to ensure that the rate of temperature increase was the same across different assays. Pre- and post-transition baselines were fit by linear regression, subtracted from the raw data, and the Tm was calculated by the differential method (John and Weeks, 2000; Naganathan and Munoz, 2008). Briefly, baseline-subtracted data were smoothed by the robust Lowess method (MATLAB, Mathworks Inc.), differentiated, and the Tm measured as the extremum of the differential melt. C136A showed no observable transition in the range of the experiment (Fig. S10). All data were collected at least in triplicate; the data in Figure S9 show the mean and standard deviation of the individual trials.
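The differential method for estimating Tm can be illustrated with a short, dependency-free Python sketch. It makes several simplifying assumptions that are not from the text: the two fitted baselines are used to normalize the signal rather than being subtracted separately, a plain moving average stands in for MATLAB's robust Lowess smoother, and the baseline windows (`pre_max`, `post_min`) and smoothing window are hypothetical parameters the user must choose.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def melting_temperature(temps, signal, pre_max, post_min, window=11):
    """Estimate Tm as the extremum of the smoothed, differentiated melt."""
    pre = [(t, s) for t, s in zip(temps, signal) if t <= pre_max]
    post = [(t, s) for t, s in zip(temps, signal) if t >= post_min]
    a0, b0 = linear_fit(*zip(*pre))    # pre-transition baseline
    a1, b1 = linear_fit(*zip(*post))   # post-transition baseline
    # Normalize between the two baselines (0 = folded, 1 = unfolded).
    frac = [(s - (a0 * t + b0)) / ((a1 * t + b1) - (a0 * t + b0))
            for t, s in zip(temps, signal)]
    # Moving-average smoothing; the window is truncated at the ends of the scan.
    half, n = window // 2, len(frac)
    smooth = []
    for k in range(n):
        lo, hi = max(0, k - half), min(n, k + half + 1)
        smooth.append(sum(frac[lo:hi]) / (hi - lo))
    # Central differences; Tm is where the derivative magnitude is largest.
    best_k, best = 1, -1.0
    for k in range(1, n - 1):
        d = abs((smooth[k + 1] - smooth[k - 1]) / (temps[k + 1] - temps[k - 1]))
        if d > best:
            best_k, best = k, d
    return temps[best_k]

# Synthetic two-state melt (hypothetical values): sloping baselines, midpoint at 55 degrees.
temps = [4.0 + 0.1 * k for k in range(811)]
signal = []
for t in temps:
    unfolded = 1.0 / (1.0 + math.exp(-(t - 55.0) / 2.0))
    signal.append((100.0 - 0.2 * t) * (1.0 - unfolded) + (40.0 - 0.1 * t) * unfolded)
print(melting_temperature(temps, signal, pre_max=30.0, post_min=75.0))
```

A mutant with no observable transition in the scanned range, such as C136A in the text, would not yield a meaningful extremum, so in practice the derivative trace should be inspected before a Tm is reported.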

Acknowledgements

We thank members of the Ranganathan and Leibler laboratories for discussion and critical review of the manuscript, L. Hedstrom for materials and protocols, and A. Poole and W. Russ for the SH domain alignments. This study was supported by the Robert A. Welch Foundation (R.R.) and the Green Center for Systems Biology at UT Southwestern Medical Center. O.R. is a fellow of the Human Frontier Science Program.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Najeeb Halabi, The Green Center for Systems Biology, and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050.

Olivier Rivoire, The Center for Studies in Physics and Biology and Laboratory of Living Matter, Rockefeller University, New York, NY 10065.

Stanislas Leibler, The Center for Studies in Physics and Biology and Laboratory of Living Matter, Rockefeller University, New York, NY 10065. The Simons Center for Systems Biology and the School of Natural Sciences, The Institute for Advanced Study, Princeton, NJ 08540.

Rama Ranganathan, The Green Center for Systems Biology, and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050.

References

  • Agarwal PK, Billeter SR, Rajagopalan PT, Benkovic SJ, Hammes-Schiffer S. Network of coupled promoting motions in enzyme catalysis. Proc Natl Acad Sci U S A. 2002;99:2794–2799. [PMC free article] [PubMed]
  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
  • Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, Dress AW. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol Biol Evol. 2000;17:164–178. [PubMed]
  • Baird TT, Jr, Wright WD, Craik CS. Conversion of trypsin to a functional threonine protease. Protein Sci. 2006;15:1229–1238. [PMC free article] [PubMed]
  • Bell JK, Goetz DH, Mahrus S, Harris JL, Fletterick RJ, Craik CS. The oligomeric structure of human granzyme A is a determinant of its extended substrate specificity. Nat Struct Biol. 2003;10:527–534. [PubMed]
  • Benkovic SJ, Hammes-Schiffer S. A perspective on enzyme catalysis. Science. 2003;301:1196–1202. [PubMed]
  • Bittar ER, Caldeira FR, Santos AMC, Günther AR, Rogana E, Santoro MM. Characterization of β-trypsin at acid pH by differential scanning calorimetry. Brazilian Journal of Medical and Biological Research. 2003;36:1621–1627. [PubMed]
  • Bodi A, Kaslik G, Venekei I, Graf L. Structural determinants of the half-life and cleavage site preference in the autolytic inactivation of chymotrypsin. Eur J Biochem. 2001;268:6238–6246. [PubMed]
  • Bouchaud J-P, Potters M. Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management. 2nd edn. Cambridge University Press; 2004.
  • Bowie JU, Reidhaar-Olson JF, Lim WA, Sauer RT. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science. 1990;247:1306–1310. [PubMed]
  • Brumano MH, Rogana E, Swaisgood HE. Thermodynamics of unfolding of beta-trypsin at pH 2.8. Archives of Biochemistry and Biophysics. 2000;382:57–62. [PubMed]
  • Buck MJ, Atchley WR. Networks of coevolving sites in structural and functional domains of serpin proteins. Mol Biol Evol. 2005;22:1627–1634. [PubMed]
  • Bush-Pelc LA, Marino F, Chen Z, Pineda AO, Mathews FS, Di Cera E. Important role of the cys-191 cys-220 disulfide bond in thrombin function and allostery. J Biol Chem. 2007;282:27165–27170. [PubMed]
  • Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007;23:1875–1882. [PubMed]
  • Chothia C, Lesk AM. Evolution of proteins formed by beta-sheets. I. Plastocyanin and azurin. Journal of Molecular Biology. 1982;160:309–323. [PubMed]
  • Clackson T, Wells JA. A hot spot of binding energy in a hormone-receptor interface. Science. 1995;267:383–386. [PubMed]
  • Craik CS, Largman C, Fletcher T, Roczniak S, Barr PJ, Fletterick R, Rutter WJ. Redesigning trypsin: alteration of substrate specificity. Science. 1985;228:291–297. [PubMed]
  • Darroch JN, Ratcliff D. Generalized Iterative Scaling For Log-Linear Models. The Annals of Mathematical Statistics. 1972;43:1470–1480.
  • Datta D, Scheer JM, Romanowski MJ, Wells JA. An allosteric circuit in caspase-1. J Mol Biol. 2008;381:1157–1167. [PMC free article] [PubMed]
  • Doolittle RF. Computer Methods for Macromolecular Sequence Analysis. Vol 266. Academic Press; 1996.
  • Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R. Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell. 1996;85:1067–1076. [PubMed]
  • Eisenmesser EZ, Bosco DA, Akke M, Kern D. Enzyme dynamics during catalysis. Science. 2002;295:1520–1523. [PubMed]
  • Ferguson AD, Amezcua CA, Halabi NM, Chelliah Y, Rosen MK, Ranganathan R, Deisenhofer J. Signal transduction pathway of TonB-dependent transporters. Proc Natl Acad Sci U S A. 2007;104:513–518. [PMC free article] [PubMed]
  • Filippakopoulos P, Kofler M, Hantschel O, Gish GD, Grebien F, Salah E, Neudecker P, Kay LE, Turk BE, Superti-Furga G, et al. Structural coupling of SH2-kinase domains links Fes and Abl substrate recognition and kinase activation. Cell. 2008;134:793–803. [PMC free article] [PubMed]
  • Fodor AA, Aldrich RW. On evolutionary conservation of thermodynamic coupling in proteins. The Journal of Biological Chemistry. 2004;279:19046–19050. [PubMed]
  • Fuentes EJ, Der CJ, Lee AL. Ligand-dependent dynamics and intramolecular signaling in a PDZ domain. J Mol Biol. 2004;335:1105–1115. [PubMed]
  • Gobel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18:309–317. [PubMed]
  • Guinto ER, Caccia S, Rose T, Futterer K, Waksman G, Di Cera E. Unexpected crucial role of residue 225 in serine proteases. Proc Natl Acad Sci U S A. 1999;96:1852–1857. [PMC free article] [PubMed]
  • Halavaty AS, Moffat K. N- and C-terminal flanking regions modulate light-induced signal transduction in the LOV2 domain of the blue light sensor phototropin 1 from Avena sativa. Biochemistry. 2007;46:14001–14009. [PubMed]
  • Hamilton N, Burrage K, Ragan MA, Huber T. Protein contact prediction using patterns of correlation. Proteins. 2004;56:679–684. [PubMed]
  • Harper SM, Neil LC, Gardner KH. Structural basis of a phototropin light switch. Science. 2003;301:1541–1544. [PubMed]
  • Hatley ME, Lockless SW, Gibson SK, Gilman AG, Ranganathan R. Allosteric determinants in guanine nucleotide-binding proteins. Proc Natl Acad Sci U S A. 2003;100:14445–14450. [PMC free article] [PubMed]
  • Hedstrom L. Trypsin: a case study in the structural determinants of enzyme specificity. Biol Chem. 1996;377:465–470. [PubMed]
  • Hedstrom L. Serine protease mechanism and specificity. Chem Rev. 2002;102:4501–4524. [PubMed]
  • Hedstrom L, Perona JJ, Rutter WJ. Converting trypsin to chymotrypsin: residue 172 is a substrate specificity determinant. Biochemistry. 1994;33:8757–8763. [PubMed]
  • Holm L, Sander C. Mapping the protein universe. Science. 1996;273:595–603. [PubMed]
  • Huntington JA, Esmon CT. The molecular basis of thrombin allostery revealed by a 1.8 A structure of the "slow" form. Structure. 2003;11:469–479. [PubMed]
  • John DM, Weeks KM. van't Hoff enthalpies without baselines. Protein Sci. 2000;9:1416–1419. [PMC free article] [PubMed]
  • Kam CM, Hudig D, Powers JC. Granzymes (lymphocyte serine proteases): characterization with natural and synthetic substrates and inhibitors. Biochim Biophys Acta. 2000;1477:307–323. [PubMed]
  • Kass I, Horovitz A. Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations. Proteins. 2002;48:611–617. [PubMed]
  • Kullback S. Information Theory and Statistics. Dover Publications; 1997.
  • Kuriyan J, Cowburn D. Modular peptide recognition domains in eukaryotic signaling. Annu Rev Biophys Biomol Struct. 1997;26:259–288. [PubMed]
  • Lapedes AS, Giraud BG, Liu L, Stormo GD. Correlated mutations in models of protein sequences: phylogenetic and structural effects. Statistics in Molecular Biology. 1999:236–256.
  • Larson SM, Di Nardo AA, Davidson AR. Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J Mol Biol. 2000;303:433–446. [PubMed]
  • Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R. Surface sites for engineering allosteric control in proteins. Science. 2008;322:438–442. [PMC free article] [PubMed]
  • Lee SY, Banerjee A, MacKinnon R. Two separate interfaces between the voltage sensor and pore are required for the function of voltage-dependent K(+) channels. PLoS Biol. 2009;7:e47. [PMC free article] [PubMed]
  • Lee WS, Park CH, Byun SM. Streptomyces griseus trypsin is stabilized against autolysis by the cooperation of a salt bridge and cation-pi interaction. J Biochem. 2004;135:93–99. [PubMed]
  • Lesk AM, Chothia C. Evolution of proteins formed by beta-sheets. II. The core of the immunoglobulin domains. Journal of Molecular Biology. 1982;160:325–342. [PubMed]
  • Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996;257:342–358. [PubMed]
  • Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299. [PubMed]
  • Martinez JC, Serrano L. The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nat Struct Biol. 1999;6:1010–1016. [PubMed]
  • McGrath ME, Vasquez JR, Craik CS, Yang AS, Honig B, Fletterick RJ. Perturbing the polar environment of Asp102 in trypsin: consequences of replacing conserved Ser214. Biochemistry. 1992;31:3059–3064. [PubMed]
  • Mishra P, Socolich M, Wall MA, Graves J, Wang Z, Ranganathan R. Dynamic scaffolding in a G protein-coupled signaling system. Cell. 2007;131:80–92. [PubMed]
  • Naganathan AN, Munoz V. Determining denaturation midpoints in multiprobe equilibrium protein folding experiments. Biochemistry. 2008;47:6752–6761. [PubMed]
  • Neher E. How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci U S A. 1994;91:98–102. [PMC free article] [PubMed]
  • Ng PC, Henikoff S. Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet. 2006;7:61–80. [PubMed]
  • Olmea O, Valencia A. Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des. 1997;2:S25–S32. [PubMed]
  • Olsson AY, Lilja H, Lundwall A. Taxon-specific evolution of glandular kallikrein genes and identification of a progenitor of prostate-specific antigen. Genomics. 2004;84:147–156. [PubMed]
  • Orengo CA, Thornton JM. Protein families and their evolution-a structural perspective. Annu Rev Biochem. 2005;74:867–900. [PubMed]
  • Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J. Ab initio folding of proteins using restraints derived from evolutionary information. Proteins. 1999 Suppl 3:177–185. [PubMed]
  • Ota N, Agard DA. Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J Mol Biol. 2005;351:345–354. [PubMed]
  • Pasternak A, Ringe D, Hedstrom L. Comparison of anionic and cationic trypsinogens: the anionic activation domain is more flexible in solution and differs in its mode of BPTI binding in the crystal structure. Protein Sci. 1999;8:253–258. [PMC free article] [PubMed]
  • Perona JJ, Craik CS, Fletterick RJ. Locating the catalytic water molecule in serine proteases. Science. 1993;261:620–622. [PubMed]
  • Perona JJ, Hedstrom L, Rutter WJ, Fletterick RJ. Structural origins of substrate discrimination in trypsin and chymotrypsin. Biochemistry. 1995;34:1489–1499. [PubMed]
  • Peterson FC, Penkert RR, Volkman BF, Prehoda KE. Cdc42 regulates the Par-6 PDZ domain through an allosteric CRIB-PDZ transition. Mol Cell. 2004;13:665–676. [PubMed]
  • Plerou V, Gopikrishnan P, Rosenow B, Amaral LA, Guhr T, Stanley HE. Random matrix approach to cross correlations in financial data. Phys Rev E Stat Nonlin Soft Matter Phys. 2002;65:066126. [PubMed]
  • Rawlings ND, Barrett AJ. Families of serine peptidases. Methods Enzymol. 1994;244:19–61. [PubMed]
  • Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ. MEROPS: the peptidase database. Nucleic Acids Res. 2008;36:D320–D325. [PMC free article] [PubMed]
  • Riddle DS, Grantcharova VP, Santiago JV, Alm E, Ruczinski I, Baker D. Experiment and theory highlight role of native state topology in SH3 folding. Nat Struct Biol. 1999;6:1016–1024. [PubMed]
  • Ruggles SW, Fletterick RJ, Craik CS. Characterization of structural determinants of granzyme B reveals potent mediators of extended substrate specificity. J Biol Chem. 2004;279:30751–30759. [PubMed]
  • Russ WP, Lowery DM, Mishra P, Yaffe MB, Ranganathan R. Natural-like function in artificial WW domains. Nature. 2005;437:579–583. [PubMed]
  • Sadovsky E, Yifrach O. Principles underlying energetic coupling along an allosteric communication trajectory of a voltage-activated K+ channel. Proc Natl Acad Sci U S A. 2007;104:19813–19818. [PMC free article] [PubMed]
  • Shindyalov IN, Kolchanov NA, Sander C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng. 1994;7:349–358. [PubMed]
  • Shulman AI, Larson C, Mangelsdorf DJ, Ranganathan R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell. 2004;116:417–429. [PubMed]
  • Skerker JM, Perchuk BS, Siryaporn A, Lubin EA, Ashenberg O, Goulian M, Laub MT. Rewiring the specificity of two-component signal transduction systems. Cell. 2008;133:1043–1054. [PMC free article] [PubMed]
  • Smock RG, Gierasch LM. Sending signals dynamically. Science. 2009;324:198–203. [PMC free article] [PubMed]
  • Socolich M, Lockless SW, Russ WP, Lee H, Gardner KH, Ranganathan R. Evolutionary information for specifying a protein fold. Nature. 2005;437:512–518. [PubMed]
  • Suel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol. 2003;10:59–69. [PubMed]
  • Thomas DJ, Casari G, Sander C. The prediction of protein contacts from multiple sequence alignments. Protein Eng. 1996;9:941–948. [PubMed]
  • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. [PMC free article] [PubMed]
  • Thornton JM, Orengo CA, Todd AE, Pearl FM. Protein folds, functions and evolution. J Mol Biol. 1999;293:333–342. [PubMed]
  • Wang EC, Hung SH, Cahoon M, Hedstrom L. The role of the Cys191–Cys220 disulfide bond in trypsin: new targets for engineering substrate specificity. Protein Eng. 1997;10:405–411. [PubMed]
  • Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH. Cn3D: sequence and structure views for Entrez. Trends Biochem Sci. 2000;25:300–302. [PubMed]
  • Wigner EP. Random Matrices in Physics. SIAM Review. 1967;9:1–23.
  • Yu H, Chen JK, Feng S, Dalgarno DC, Brauer AW, Schreiber SL. Structural basis for the binding of proline-rich peptides to SH3 domains. Cell. 1994;76:933–945. [PubMed]
  • Zarrinpar A, Bhattacharyya RP, Lim WA. The structure and function of proline recognition domains. Sci STKE. 2003;2003:RE8. [PubMed]
  • Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJ. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. Journal of Molecular Biology. 1987;195:957–961. [PubMed]