Nucleic Acids Res. Jul 1, 2011; 39(Web Server issue): W215–W222.
Published online May 18, 2011. doi:  10.1093/nar/gkr363
PMCID: PMC3125769

SDM—a server for predicting effects of mutations on protein stability and malfunction

Abstract

The sheer volume of non-synonymous single nucleotide polymorphisms that have been generated in recent years from projects such as the Human Genome Project, the HapMap Project and Genome-Wide Association Studies means that it is not possible to characterize experimentally the effects of all mutations on the gene products, i.e. to elucidate their effects on protein structure and function. However, automatic methods that can predict the effects of mutations will allow a reduced set of mutations to be studied. Site Directed Mutator (SDM) is a statistical potential energy function that uses environment-specific amino-acid substitution frequencies within homologous protein families to calculate a stability score, which is analogous to the free energy difference between the wild-type and mutant protein. Here, we present a web server for SDM (http://www-cryst.bioc.cam.ac.uk/~sdm/sdm.php), which has received more than 10 000 submissions since going online in April 2008. To run SDM, users must supply a wild-type structure together with the position and amino acid type of the mutation. The results returned include information about the local structural environment of the wild-type and mutant residues, a stability score prediction and a prediction of disease association. Additionally, the wild-type and mutant structures are displayed in a Jmol applet with the relevant residues highlighted.

INTRODUCTION

The folded state of a protein is stabilized primarily by hydrophobic interactions and a network of hydrogen bonds. However, a protein that is folded correctly is only marginally more stable than when it is unfolded, and mutations that affect a stabilizing interaction within a folded protein may lead to protein instability and malfunction. Where protein malfunction does occur and cannot be remediated by an alternative molecular pathway, this may result in disease. For example, destabilizing mutations in phenylalanine hydroxylase lead to the metabolic disease phenylketonuria (1). In fact, up to 80% of Mendelian disease-associated single mutations in protein coding regions are estimated to be caused by protein destabilization effects (2). However, a huge volume of single nucleotide polymorphisms (SNPs) has been generated in recent years from projects such as the Human Genome Project (3) and the HapMap Project (4), largely due to the availability of high-throughput array-based genotyping methods (5) and next generation sequencing platforms (6,7). Automatic methods that can predict the effect of mutations accurately will allow a reduced set of mutations to be characterized experimentally, saving time and money.

Various methods of predicting protein stability changes caused by mutation have been described and can be grouped into four main categories based on the strategy used in the calculation: (i) physical effective energy functions; (ii) empirical potential energy functions; (iii) machine learning methods; and (iv) statistical potential energy functions.

Physical potential energy functions (such as molecular mechanics approaches or Monte Carlo simulations) are probably the most accurate methods for predicting the effects of mutations on protein stability; however, they are currently only useful for testing small sets of mutants because of the large amount of time required to compute ΔΔG values (8–12). The reliability of predictions is also complicated by the difficulties in sampling the folded and unfolded states (12). Empirical potential approaches are fitted to experimental data using a set of weighted terms incorporating physical and statistical energy terms and structural descriptors (13,14). Machine learning methods include neural networks and support vector machines (SVMs) and use information about mutations, protein sequence and structure to fit a non-linear function to experimental data (15–17). They are similar to empirical potential approaches in their use of experimental data to fit their function, and in both cases care must be taken that the function is not over-fitted to the training data set. Statistical potential energy approaches are derived from the statistical analysis of protein data such as substitution frequencies, distance potentials and amino acid environmental propensities (18–21). Other methods use a combination of the above strategies (22–24).

Site Directed Mutator (SDM) is a statistical potential energy function developed by Topham et al. (20) to predict the effect that SNPs will have on the stability of proteins. SDM uses environment-specific amino acid substitution frequencies within homologous protein families to calculate a stability score, which is analogous to the free energy difference between a wild-type and mutant protein. Blind testing on a set of 83 staphylococcal nuclease and 63 barnase mutants showed a correlation of 0.80 between the predicted stability changes and experimental data (20). The method performs comparably to or better than other published methods in the task of classifying mutations as stabilizing or destabilizing (25). Additionally, SDM has much improved sensitivity in predicting stabilizing mutations compared to other published methods (five of the seven methods tested incorrectly classify >68% of the stabilizing mutations). When applied to the task of predicting disease-associated mutations, SDM had an accuracy of 61% (26). Therefore, SDM is a useful tool for guiding the design of site-directed mutagenesis experiments or for predicting whether a mutation will impact protein structure and have a role in disease. Here, we present a web server for SDM (http://www-cryst.bioc.cam.ac.uk/~sdm/sdm.php), which has not previously been published.

MATERIALS AND METHODS

Environment-specific substitution tables

SDM uses a set of conformationally constrained environment-specific substitution tables (ESSTs), the general methodology of which is described in (27,28). The tables were derived from 371 protein family sequence alignments from the HOMSTRAD database (29), comprising 1357 structures, and were built using a modified version of the program Makesub, which is able to handle sidechain hydrogen bond satisfaction (C. Topham, unpublished data). When the local structural environment of amino acid residues is defined (secondary structure, solvent accessibility and formation of hydrogen bonds), distinct patterns of substitutions are observed (30,31). ESSTs store these substitution data quantitatively in the form of probabilities and therefore provide information about the occurrence of each amino acid in a particular environment and the probability of it being substituted by any other amino acid. Functional residues [as defined by Uniprot (32), the Catalytic Site Atlas (33) and Interpare (34)] were masked from substitution counts.

Definition of structural environment

The structural parameters that were used to define the local environment of amino acid residues are mainchain conformation, solvent accessibility and hydrogen-bonding class.

  1. Mainchain conformation and secondary structure: Nine classes of mainchain conformation were defined: residues were identified as belonging to either α-helix or β-sheet first and the remaining residues were classified as being a, b, p, t, l, g or e according to their mainchain φ-ψ torsion angles. The torsion angles and secondary structure assignments were calculated using the sstruc program (D. Smith, unpublished data).
  2. Relative sidechain solvent accessibility: Three classes of relative sidechain solvent accessibility were defined based on the method of Lee and Richards (35). Residues with sidechain relative accessibilities of:
    1. <17% were defined as inaccessible
    2. 17–43% were defined as partially accessible
    3. >43% were defined as accessible

These cut-offs were chosen based on an assessment of relative sidechain solvent accessibility values (36). The accessibility of each residue in a structure was calculated using the program psa (A. Sali, unpublished data).

  3. Hydrogen bonding: Two classes of hydrogen bonding were defined: residues were classed as either being satisfied in terms of their sidechain hydrogen bonding or not based on the criteria described by Worth and Blundell (37). Proteins were first protonated and the charge state of ionisable residues determined using the program, PROPKA (38). The program, hbond (J. Overington, unpublished data), was used to identify hydrogen bonds defined by the criterion that the distance between donor and acceptor was <3.5 Å except for interactions involving sulphur atoms where 4.0 Å was used. Hydrogen bonds were then further filtered using the methodology described by Worth and Blundell (37).

These structural parameters gave a total of 54 local environments (nine mainchain × three solvent accessibility × two hydrogen bonding terms).
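As an illustration only, the environment classes described above can be enumerated in a few lines of Python. This is a minimal sketch and not the code used by SDM: all function and constant names are hypothetical, the labels "helix" and "sheet" stand in for the α-helix and β-sheet classes, and the boundary values 17% and 43% are assigned to the partially accessible class following the ranges as printed. The hydrogen-bond helper covers only the initial distance criterion, not the subsequent filtering of Worth and Blundell (37).

```python
from itertools import product

# Nine mainchain classes: helix and sheet first, then the seven phi-psi classes.
MAINCHAIN_CLASSES = ("helix", "sheet", "a", "b", "p", "t", "l", "g", "e")
ACCESSIBILITY_CLASSES = ("inaccessible", "partially accessible", "accessible")
HBOND_CLASSES = ("satisfied", "unsatisfied")


def accessibility_class(relative_accessibility_percent):
    """Map a relative sidechain accessibility (in %) to one of three classes."""
    if relative_accessibility_percent < 17.0:
        return "inaccessible"
    if relative_accessibility_percent <= 43.0:
        return "partially accessible"
    return "accessible"


def within_hbond_distance(distance_angstrom, donor_element, acceptor_element):
    """Apply the distance criterion: <3.5 A, or <4.0 A if sulphur is involved."""
    involves_sulphur = "S" in (donor_element.upper(), acceptor_element.upper())
    cutoff = 4.0 if involves_sulphur else 3.5
    return distance_angstrom < cutoff


# Every combination of the three parameters is one local environment.
ALL_ENVIRONMENTS = list(
    product(MAINCHAIN_CLASSES, ACCESSIBILITY_CLASSES, HBOND_CLASSES)
)
```

Counting the entries of `ALL_ENVIRONMENTS` gives the 54 environments quoted in the text (9 × 3 × 2).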

Prediction of protein stability changes caused by mutation

The algorithm underlying SDM was first described by Topham et al. (20). In this original work, two stability difference scores were calculated using either amino acid environmental substitution data (method I) or amino acid propensities (method II). Our subsequent analysis showed that updating the substitution and propensity data using additional protein families resulted in a better performance when the environment substitution data were used (data not shown). Therefore, SDM uses only method I to calculate protein stability changes caused by mutation. In addition, SDM now uses a far more comprehensive set of substitution data (ESSTs) compared to the original publication (371 families compared to 131) and known functional sites are excluded from the substitution counts. Furthermore, the local structural environment parameter ‘sidechain hydrogen bond (yes/no)’ was modified to ‘sidechain hydrogen-bonding satisfaction (satisfied/unsatisfied)’ and this was shown to improve the stability score calculations (36).

By analogy to the folding-unfolding cycle in Figure 1, the algorithm uses ESSTs to calculate the difference in the stability scores of the folded and unfolded state for the wild-type and mutant protein structures:

[Equation (1); image not available]

The substitution data used for calculating the stability score are from families of homologous proteins, which have accepted multiple mutations during the course of their evolution. However, the effects of single substitutions are not often observed over the timescale of evolution e.g. cavity mutants. In order to compensate for this a disruption term is introduced for buried mutated residues. It is defined as the logarithmic function of the absolute value of the net change over the mutated position in the sidechain surface accessible area in an extended peptide Gly-X-Gly, relative to that for glycine. Therefore Equation (1) becomes:

[Equation (2); image not available]

ESSTs take into account the environment of only one of the two residues (wild-type or mutant), therefore it is necessary to consider not only the probability of replacement of the wild-type residue (Rj) in the wild-type environment (εwt) by a mutant residue type (rk) in an undefined environment [P(rk/Rj, εwt)] but also the probability of replacement of the mutant residue type (Rk) in the mutant environment (εmut) by the wild-type residue (rj) in an undefined environment [P(rj/Rk, εmut)].

Figure 1.
The thermodynamic cycle can be used to calculate protein stability changes between wild-type and mutant proteins.

In order to normalise the probabilities that are combined from different substitution tables, it is necessary to introduce a reference state. For the wild-type residue (Rj) in the wild-type environment a suitable reference state is the probability of it being conserved in that environment [P(rj/Rj, εwt)]. In an analogous way, for the mutant residue type (Rk) in the mutant environment, a suitable reference state is the probability of it being conserved in that environment [P(rk/Rk, εmut)].

The difference in stability scores for a mutation in the folded state is therefore calculated by:

[Equation (3); image not available]

The difference in stability scores in the unfolded state is also calculated using Equation (3), but with an environmental substitution table derived from non-hydrogen bonded, surface exposed amino acid residues falling outside regions of regular secondary structure. The stability difference scores for the folded and unfolded state for the wild-type and mutant protein structures are then combined using Equation (1).
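The equations themselves are not reproduced in this text, so the following Python fragment is only one plausible reading of the prose description, not the published formulae. It assumes that each substitution probability is divided by its reference (conservation) probability and log-transformed, that the forward (wild-type to mutant) and reverse (mutant to wild-type) terms are combined as a difference, and that the final score is the folded-state value minus the unfolded-state value. The sign convention, the function names and the omission of the disruption term for buried residues are all assumptions of this sketch.

```python
import math


def log_odds(p_substitution, p_conserved):
    """Log of a substitution probability relative to its reference state."""
    if p_substitution <= 0 or p_conserved <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_substitution / p_conserved)


def state_score_difference(p_mut_given_wt, p_wt_conserved,
                           p_wt_given_mut, p_mut_conserved):
    """Score difference for one state (folded or unfolded).

    p_mut_given_wt : P(rk | Rj, env_wt), wild-type replaced by mutant type
    p_wt_conserved : P(rj | Rj, env_wt), reference state for the wild-type
    p_wt_given_mut : P(rj | Rk, env_mut), mutant replaced by wild-type type
    p_mut_conserved: P(rk | Rk, env_mut), reference state for the mutant
    """
    forward = log_odds(p_mut_given_wt, p_wt_conserved)
    reverse = log_odds(p_wt_given_mut, p_mut_conserved)
    # ASSUMPTION: forward and reverse terms are combined as a difference.
    return forward - reverse


def stability_score(folded_difference, unfolded_difference):
    """Thermodynamic-cycle analogue: folded minus unfolded (assumed sign)."""
    return folded_difference - unfolded_difference
```

Under these assumptions the score is antisymmetric: exchanging the roles of wild-type and mutant changes its sign, which is the behaviour expected of a quantity analogous to a free energy difference.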

Prediction of disease-association

From studying missense mutations for which the phenotypes are known, it is estimated that the stability margin that can be accommodated without any immediate effect on protein fitness is 1–3 kcal mol−1 (39–41). Studies of Ig-like proteins have shown that mutations that decrease the stability of these proteins by >2 kcal mol−1 result in severe disease phenotypes (42,43).

It may appear counter-intuitive that increased protein stability can lead to protein malfunction; however, protein flexibility is essential for enzyme catalysis. For instance, the increased stability of many thermophilic proteins is accompanied by loss of protein flexibility and reduced enzymatic activity at low temperatures (44–48). Furthermore, stabilizing mutations at catalytic site residues typically decrease activity, which suggests that function often comes with a substantial penalty to stability (44,49–52). In addition, highly stable proteins are protease-resistant and therefore difficult to regulate; this is important to consider in systems such as cell signalling, where removing a signal is as important as its activation (53). A recent study showed that β-catenin accumulation is the most common aberration in parathyroid tumours of primary origin and that the S37A stabilizing mutation of CTNNB1 was found in 5.8% of the tumours (54). Another example of a stabilizing and damaging mutation is the Parkinson disease-associated A30P mutation, which stabilizes α-synuclein against proteasomal degradation triggered by haem oxygenase-1 over-expression in human neuroblastoma cells (55). There is therefore biological evidence that increased protein stability can lead to protein malfunction and hence disease.

In light of the studies mentioned in the previous two paragraphs, we have used a cut-off of 2 kcal mol−1 (stabilizing or destabilizing) for classifying mutations as leading to protein malfunction and possibly disease.
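The cut-off rule can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, the predicted stability change is taken to be expressed in kcal mol−1, and a change of exactly 2 kcal mol−1 is treated as falling below the cut-off, which the text does not specify.

```python
def predicted_to_cause_malfunction(stability_change_kcal_mol, cutoff=2.0):
    """Flag a mutation whose predicted stability change exceeds the cut-off.

    The absolute value is used because both strongly stabilizing and
    strongly destabilizing mutations are classified as potentially damaging.
    """
    return abs(stability_change_kcal_mol) > cutoff
```

Because the magnitude alone is tested, the rule does not depend on which sign is used to denote destabilization.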

Mutant thermodynamic data sets

A subset of the data set used by Capriotti et al. (16) was used for initial benchmarking. This mutant data set was taken from the ProTherm database, which stores thermodynamic data for proteins and mutants (56). Our method requires knowledge of the local structural environment of wild-type and mutant residues in order to predict the effect of mutation on the stability of a protein. If the local environment is incorrectly defined, e.g. the protein functions as a trimer but only the protomer is present in the crystallographic asymmetric unit, this may affect our calculation. To remove the effect of such errors we used the Protein Interfaces, Surfaces and Assemblies (PISA) service to predict the oligomeric state of each of the proteins in the data set (57). Only those proteins predicted to be monomers were used. This data set is hereafter referred to as the monomeric set.

The validation data set used by Dehouck et al. (22) for benchmarking their method PoPMuSiC-2.0 was used for comparison of SDM’s performance to other published stability change prediction algorithms. This data set comprises 350 mutations, none of which was included in any of the databases used to devise or test the seven methods tested by Dehouck et al. (22).

A set of 388 mutants (S388) with thermodynamic measurements conducted under physiological conditions was also used to test our method. The S388 data set has been used to test other published methods and therefore allows us to perform a direct comparison of our method to them.

WEBSERVER

Input

SDM requires the 3D co-ordinates of the wild-type protein (in PDB format), the PDB chain identifier, the mutation position and the amino acid type of the mutation in one-letter code in order to calculate a stability score for mutant proteins. Users who have not already obtained a structure of their protein of interest may use the search boxes on the home page to do so. These search boxes allow a user to query the RCSB Protein Data Bank (www.pdb.org) (58) for their protein of interest, using protein name, description or amino acid sequence.
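The inputs listed above can be checked before a job is prepared. The following Python fragment is a hypothetical client-side sketch and not part of the SDM server: the class and function names are invented for illustration, the compact notation (wild-type letter, position, mutant letter, as in "Y231N") is a common convention rather than the server's input format, and residue numbers with insertion codes are not handled.

```python
import re
from dataclasses import dataclass

STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
_MUTATION_PATTERN = re.compile(r"([A-Z])(-?\d+)([A-Z])")


@dataclass(frozen=True)
class MutationRequest:
    """Hypothetical container for the mutation-related inputs."""
    chain_id: str
    position: int
    wild_type: str
    mutant: str

    def __post_init__(self):
        if len(self.chain_id) != 1:
            raise ValueError("PDB chain identifier must be one character")
        for residue in (self.wild_type, self.mutant):
            if residue not in STANDARD_AMINO_ACIDS:
                raise ValueError(f"unknown one-letter code: {residue!r}")
        if self.wild_type == self.mutant:
            raise ValueError("mutant residue equals wild-type residue")


def parse_mutation(code, chain_id):
    """Parse a string such as 'Y231N' into a validated MutationRequest."""
    match = _MUTATION_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse mutation {code!r}")
    wild_type, position, mutant = match.groups()
    return MutationRequest(chain_id, int(position), wild_type, mutant)
```

For example, `parse_mutation("Y231N", "A")` yields a request for position 231 of chain A with wild-type Y and mutant N.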

The wild-type structure may be submitted using one of two methods: the user can either upload the PDB file or enter the four-letter PDB code. NMR structures are accepted by SDM for input; however, users should note that only the first model in the PDB file is used for subsequent analysis.
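Users who wish to see which coordinates of an NMR ensemble would be analysed can extract the first model themselves. The sketch below is an illustration, not the server's own code; it relies only on the PDB format convention that each model is closed by an ENDMDL record, and the function name is hypothetical.

```python
from typing import List


def first_model_atoms(pdb_text: str) -> List[str]:
    """Return the ATOM/HETATM lines of the first model in a PDB file.

    Files without MODEL/ENDMDL records (e.g. most crystal structures)
    are returned with all of their coordinate lines.
    """
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # the first model has ended; later models are ignored
        if record in ("ATOM", "HETATM"):
            atoms.append(line)
    return atoms
```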

SDM also requires a 3D structure of the mutant protein to perform its calculations. In this case, the user has the option of either uploading a mutant structure or using the program ANDANTE to build a model structure of the mutant (59). A requirement of SDM is that the wild-type and mutant structures span the same part of the polypeptide chain; users who upload a mutant PDB structure must therefore ensure that it fulfils this requirement.

The home page also provides a link to example output in order that users may view the type of output produced before running their job. Additionally, tutorials on usage are available for viewing using the link provided on the navigator bar.

Output

The results page is split into three sections. On the left-hand side the mutant information is displayed (wild-type and mutant amino acid types plus the position). Where ANDANTE was used to build a mutant structure, the PDB file is made available for download. The results returned include information about the local structural environment of the wild-type and mutant residues (the secondary structure, solvent accessibility and sidechain hydrogen bond satisfaction), a stability score prediction and prediction of disease association. As mentioned in the methods section, a cut-off of 2 kcal mol−1 is used to indicate whether a mutation is likely to be disease-associated or not. However, mutations that do not reach this cut-off may still lead to protein malfunction and disease if they affect binding sites. A statement indicating this issue is therefore displayed and the links page lists resources that can be used to assess whether a residue is involved in binding.

In the middle portion of the results page, the wild-type and mutant structures are displayed using the Jmol structure viewer (Jmol: an open-source Java viewer for chemical structures in 3D http://www.jmol.org/) with the relevant residues highlighted. The user may control the display of these structures using the menu buttons on the right-hand side.

An example of the type of output produced by SDM is shown in Figure 2. A particular advantage of the predictions provided by SDM over other published methods is the indication of the local structural environment of wild-type and mutant residues and the fact that the user may view the 3D structural context of the residues. This allows users to identify possible molecular mechanisms that underlie predicted stability changes, for example loss of hydrogen bonds to the protein backbone.

Figure 2.
Screenshot of SDM analysis results for the example of mutation Y231N in Dystrophin (PDB code 1DXX, chain A). On the left hand side information about the wild-type and mutant residue is displayed such as the secondary structure, solvent accessibility and ...

VALIDATION

SDM has previously been validated using a set of ~230 mutants and was shown to have an accuracy of 74% in predicting the sign of stability change and a linear correlation coefficient of 0.60 between predicted and observed ΔΔG values (25). Removal of one outlying data point increased the linear correlation coefficient to 0.66. Analysis of the performance of SDM in predicting the sign of stability change in comparison to eight other published methods demonstrated that SDM performs comparably to or better than the other methods.

Since the benchmarking detailed above was carried out, SDM has been modified so that the definition of sidechain hydrogen bonding has been changed from yes or no to satisfied or unsatisfied. Furthermore, functional residues have been masked from the substitution counts used to generate the ESSTs. We tested the improvement that these changes made to SDM’s predictions using the 855 mutants in the monomeric data set. The additional families used to generate the ESSTs, masking functional residues and incorporation of the hydrogen bond satisfaction term improved the correlation coefficient between predicted stability changes and experimental measurements from 0.51 to 0.58 (Table 1).

Table 1.
Comparison of the performance of SDM using different sets of ESSTs and the monomeric data set

The statistical potential-based method, PoPMuSiC-2.0 was recently reported and achieved a correlation of 0.63 between measured and predicted stability changes (22). The predictive power of the method was shown to be significantly higher than that of other programs described in the literature. In order to compare the predictive power of SDM to PoPMuSiC-2.0 and the other tested methods, we used the same data set of 350 mutants. After the PoPMuSiC algorithms, SDM has the highest linear correlation between predicted and measured ΔΔG values (Table 2). It also has the benefit of making predictions for the entire data set of 350 mutants. It is encouraging that the performance of SDM is improved when considering only highly stabilizing or destabilizing mutations—the correlation coefficient increases from 0.52 to 0.63 (Table 2).

Table 2.
Comparison of the performance of different prediction methods

The vast majority of published methods for predicting the effects of mutations on protein stability are based on machine learning (ML). These are first trained on a data set of mutations. Many of these ML methods report high correlations with experimental data sets [e.g. CUPSAT R = 0.87 (21) and I-Mutant2.0 R = 0.71 (60)]. However, when tested later in blind tests, these correlations drop drastically [e.g. CUPSAT R = 0.37 and I-Mutant2.0 R = 0.29 (22)]. This reduction in prediction performance may be due to over-fitting to available data sets. The problem of decreasing performance of ML methods on blind data sets was also observed by two independent assessments of the performance of protein stability predictors (61,62). SDM is not an ML method, but rather a statistical method based on observed amino acid substitutions that have occurred during divergent protein evolution. Therefore, it does not suffer from the problem of over-fitting, as demonstrated by the similar correlation coefficients obtained using the monomeric data set and the PoPMuSiC-2.0 validation data set. The problem of over-fitting is an important point to consider if methods are to be used to help successfully design mutagenesis experiments.

Table 3 shows the results of testing the S388 data set. These results show the performance of methods in predicting the sign of stability change, i.e. whether a mutation is stabilizing or destabilizing. Many of the methods have accuracies of over 80%, which is impressive. However, if we examine separately the ability of the methods to predict stabilizing and destabilizing mutations, another picture emerges: they tend to be very good at predicting destabilizing mutations but much worse at predicting stabilizing mutations. SDM, however, has a more balanced sensitivity in predicting both types of mutations, although the specificity of predicting destabilizing mutations is far better than that of predicting stabilizing mutations. Most mutations are destabilizing and this is reflected in the mutant thermodynamic data sets used for developing and testing such methods. Methods that assign all of the samples to the majority class (destabilizing mutations) will have high accuracy even though the performance is poor for the minority class (stabilizing mutations). This trend is observed for most of the methods reported in Table 3. It is possible that some of the results in Table 3 are biased by some over-fitting to the training data sets used in developing the methods.
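The effect of class imbalance on accuracy can be seen in a small worked example. The data below are invented for illustration (90 destabilizing and 10 stabilizing mutations) and are not taken from Table 3 or from the S388 set; the function names are likewise hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of all mutations whose class is predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def sensitivity(true_labels, predicted_labels, label):
    """Fraction of mutations of one class that are predicted as that class."""
    predictions_for_class = [
        p for t, p in zip(true_labels, predicted_labels) if t == label
    ]
    if not predictions_for_class:
        raise ValueError(f"no examples of class {label!r}")
    return sum(p == label for p in predictions_for_class) / len(predictions_for_class)


# Hypothetical, imbalanced data set: most mutations are destabilizing.
true_labels = ["destabilizing"] * 90 + ["stabilizing"] * 10

# A trivial predictor that assigns every mutation to the majority class.
majority_predictions = ["destabilizing"] * 100

overall = accuracy(true_labels, majority_predictions)                      # 0.9
sens_destab = sensitivity(true_labels, majority_predictions, "destabilizing")  # 1.0
sens_stab = sensitivity(true_labels, majority_predictions, "stabilizing")      # 0.0
```

The trivial predictor reaches 90% accuracy while identifying none of the stabilizing mutations, which is why per-class sensitivity needs to be reported alongside overall accuracy.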

Table 3.
Comparison of the performance of different prediction methods

When applied to the task of predicting disease-associated mutations, SDM had an accuracy of 61% (26), only 3% less than the accuracy achieved by the program Sorting Intolerant from Tolerant (SIFT) (63). Of course, it is unsurprising that SIFT obtains a higher accuracy than SDM as SDM is able to distinguish disease-associations only for those mutations that perturb protein structure and not those that directly affect catalytic residues, binding sites etc. Mutations that cause protein malfunction by affecting the functional residues of a protein (active sites or protein–protein interaction sites) or by altering post-translational modifications will not be identified as damaging by SDM. Therefore, to obtain a more accurate prediction of whether an nsSNP is associated with disease, these other effects should also be taken into account. We previously demonstrated that when SDM’s predictions were combined with predictions of functional sites using Crescendo (64) and known functional sites, this combined approach has a comparable accuracy to the other methods tested but has the benefit of a much lower false-positive rate, therefore providing a high-quality set of predictions (26).

SUMMARY

The SDM server provides users with a fast and accurate means of assessing the impact that a mutation will have on protein structure and stability. It provides a 3D view of the wild-type and mutant residues, allowing users to inspect the structural context of the sidechains. SDM is a useful tool for identifying possible disease associations and has been applied to the task of predicting deleterious nsSNPs at the genome scale (25,26,65) and also for generating new hypotheses regarding: (i) the molecular aetiology of renal cell carcinoma and pheochromocytoma in the cancer syndrome, von Hippel-Lindau disease (66); (ii) the structural effects of mutations in thyroid stimulating hormone receptor that are associated with congenital non-goitrous hypothyroidism (67); and (iii) tumour risk associated with mutations in succinate dehydrogenase D (68). It has also been used in the analysis of mutations in the autoimmune regulator protein (69), mixed lineage kinase 3 (70), the adaptor protein MyD88 adaptor-like (71) and breast cancer susceptibility gene 1 (72).

FUNDING

This work was supported by the Biotechnology and Biological Sciences Research Council (research studentship to C.L.W.) and a Wellcome Trust Programme Grant (to T.L.B.). Funding for open access charge: Wellcome Trust Programme Grant.

Conflict of interest statement. None declared.

REFERENCES

1. Bjorgo E, Knappskog PM, Martinez A, Stevens RC, Flatmark T. Partial characterization and three-dimensional-structural localization of eight mutations in exon 7 of the human phenylalanine hydroxylase gene associated with phenylketonuria. Eur. J. Biochem. 1998;257:1–10. [PubMed]
2. Wang Z, Moult J. SNPs, protein structure, and disease. Hum. Mutat. 2001;17:263–270. [PubMed]
3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The sequence of the human genome. Science. 2001;291:1304–1351. [PubMed]
4. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. [PMC free article] [PubMed]
5. Gunderson KL, Steemers FJ, Ren H, Ng P, Zhou L, Tsan C, Chang W, Bullis D, Musmacker J, King C, et al. Whole-genome genotyping. Methods Enzymol. 2006;410:359–376. [PubMed]
6. Metzker ML. Sequencing technologies - the next generation. Nat. Rev. Genet. 2010;11:31–46. [PubMed]
7. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008;452:872–876. [PubMed]
8. Bash PA, Singh UC, Langridge R, Kollman PA. Free energy calculations by computer simulation. Science. 1987;236:564–568. [PubMed]
9. Funahashi J, Sugita Y, Kitao A, Yutani K. How can free energy component analysis explain the difference in protein stability caused by amino acid substitutions? Effect of three hydrophobic mutations at the 56th residue on the stability of human lysozyme. Protein Eng. 2003;16:665–671. [PubMed]
10. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y, Wang W, et al. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res. 2000;33:889–897. [PubMed]
11. Park H, Lee S. Prediction of the mutation-induced change in thermodynamic stabilities of membrane proteins from free energy simulations. Biophys. Chem. 2005;114:191–197. [PubMed]
12. Shi YY, Mark AE, Wang CX, Huang F, Berendsen HJ, van Gunsteren WF. Can the stability of protein mutants be predicted by free energy calculations? Protein Eng. 1993;6:289–295. [PubMed]
13. Bordner AJ, Abagyan RA. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins. 2004;57:400–413. [PubMed]
14. Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 2002;320:369–387. [PubMed]
15. Capriotti E, Fariselli P, Calabrese R, Casadio R. Predicting protein stability changes from sequences using support vector machines. Bioinformatics. 2005;21(Suppl. 2):ii54–ii58. [PubMed]
16. Capriotti E, Fariselli P, Casadio R. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics. 2004;20(Suppl. 1):i63–i68. [PubMed]
17. Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins. 2006;62:1125–1132. [PubMed]
18. Gilis D, Rooman M. Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J. Mol. Biol. 1997;272:276–290. [PubMed]
19. Saraboji K, Gromiha MM, Ponnuswamy MN. Average assignment method for predicting the stability of protein mutants. Biopolymers. 2006;82:80–92. [PubMed]
20. Topham CM, Srinivasan N, Blundell TL. Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng. 1997;10:7–21. [PubMed]
21. Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res. 2006;34:W239–W242. [PMC free article] [PubMed]
22. Dehouck Y, Grosfils A, Folch B, Gilis D, Bogaerts P, Rooman M. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics. 2009;25:2537–2543. [PubMed]
23. Yin S, Ding F, Dokholyan NV. Modeling backbone flexibility improves protein stability estimation. Structure. 2007;15:1567–1576. [PubMed]
24. Masso M, Vaisman II. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis. Bioinformatics. 2008;24:2002–2009. [PubMed]
25. Worth CL, Burke DF, Blundell TL. Estimating the effects of single nucleotide polymorphisms on protein structure: how good are we at identifying likely disease associated mutations? Proceedings of “Molecular Interactions - Bringing Chemistry to Life” 2007:11–26.
26. Worth CL, Bickerton GR, Schreyer A, Forman JR, Cheng TM, Lee S, Gong S, Burke DF, Blundell TL. A structural bioinformatics approach to the analysis of nonsynonymous single nucleotide polymorphisms (nsSNPs) and their relation to disease. J. Bioinform. Comput. Biol. 2007;5:1297–1318. [PubMed]
27. Overington J, Donnelly D, Johnson MS, Sali A, Blundell TL. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. 1992;1:216–226. [PMC free article] [PubMed]
28. Topham CM, McLeod A, Eisenmenger F, Overington JP, Johnson MS, Blundell TL. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J. Mol. Biol. 1993;229:194–220. [PubMed]
29. Mizuguchi K, Deane CM, Blundell TL, Overington JP. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 1998;7:2469–2471. [PMC free article] [PubMed]
30. Blundell TL, Cooper J, Donnelly D, Driessen H, Edwards Y, Eisenmenger F, Frazao C, Johnson M, Niefind K, Newman M, et al. Patterns of sequence variation in families of homologous proteins. In: Jornvall H, Hoog JO, Gustavsson AM, editors. Methods in Protein Sequence Analysis. Basel: Birkhauser Verlag AG; 1991. pp. 373–385.
31. Overington J, Johnson MS, Sali A, Blundell TL. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. Biol. Sci. 1990;241:132–145. [PubMed]
32. The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 2011;39:D214–D219. [PMC free article] [PubMed]
33. Porter CT, Bartlett GJ, Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 2004;32:D129–D133. [PMC free article] [PubMed]
34. Gong S, Park C, Choi H, Ko J, Jang I, Lee J, Bolser DM, Oh D, Kim DS, Bhak J. A protein domain interaction interface database: InterPare. BMC Bioinformatics. 2005;6:207. [PMC free article] [PubMed]
35. Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 1971;55:379–400. [PubMed]
36. Worth CL. The role of amino acid sidechains in protein stability. Ph.D. Thesis. Cambridge: University of Cambridge; 2008.
37. Worth CL, Blundell TL. Satisfaction of hydrogen-bonding potential influences the conservation of polar sidechains. Proteins. 2009;75:413–429. [PubMed]
38. Li H, Robertson AD, Jensen JH. Very fast empirical prediction and rationalization of protein pKa values. Proteins. 2005;61:704–721. [PubMed]
39. Calloni G, Zoffoli S, Stefani M, Dobson CM, Chiti F. Investigating the effects of mutations on protein aggregation in the cell. J. Biol. Chem. 2005;280:10607–10613. [PubMed]
40. Mayer S, Rudiger S, Ang HC, Joerger AC, Fersht AR. Correlation of levels of folded recombinant p53 in Escherichia coli with thermodynamic stability in vitro. J. Mol. Biol. 2007;372:268–276. [PubMed]
41. Tokuriki N, Tawfik DS. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 2009;19:596–604. [PubMed]
42. Lindberg MJ, Bystrom R, Boknas N, Andersen PM, Oliveberg M. Systematically perturbed folding patterns of amyotrophic lateral sclerosis (ALS)-associated SOD1 mutants. Proc. Natl Acad. Sci. USA. 2005;102:9754–9759. [PMC free article] [PubMed]
43. Randles LG, Lappalainen I, Fowler SB, Moore B, Hamill SJ, Clarke J. Using model proteins to quantify the effects of pathogenic mutations in Ig-like proteins. J. Biol. Chem. 2006;281:24216–24226. [PubMed]
44. Counago R, Wilson CJ, Pena MI, Wittung-Stafshede P, Shamoo Y. An adaptive mutation in adenylate kinase that increases organismal fitness is linked to stability-activity trade-offs. Protein Eng. Des. Sel. 2008;21:19–27. [PubMed]
45. Jaenicke R. Protein stability and molecular adaptation to extreme conditions. Eur. J. Biochem. 1991;202:715–728. [PubMed]
46. Somero GN. Proteins and temperature. Annu. Rev. Physiol. 1995;57:43–68. [PubMed]
47. Wolf-Watz M, Thai V, Henzler-Wildman K, Hadjipavlou G, Eisenmesser EZ, Kern D. Linkage between dynamics and catalysis in a thermophilic-mesophilic enzyme pair. Nat. Struct. Mol. Biol. 2004;11:945–949. [PubMed]
48. Zavodszky P, Kardos J, Svingor A, Petsko GA. Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc. Natl Acad. Sci. USA. 1998;95:7406–7411. [PMC free article] [PubMed]
49. Beadle BM, Shoichet BK. Structural bases of stability-function tradeoffs in enzymes. J. Mol. Biol. 2002;321:285–296. [PubMed]
50. Meiering EM, Serrano L, Fersht AR. Effect of active site residues in barnase on activity and stability. J. Mol. Biol. 1992;225:585–589. [PubMed]
51. Mukaiyama A, Haruki M, Ota M, Koga Y, Takano K, Kanaya S. A hyperthermophilic protein acquires function at the cost of stability. Biochemistry. 2006;45:12673–12679. [PubMed]
52. Yutani K, Ogasahara K, Tsujita T, Sugino Y. Dependence of conformational stability on hydrophobicity of the amino acid residue in a series of variant proteins substituted at a unique position of tryptophan synthase alpha subunit. Proc. Natl Acad. Sci. USA. 1987;84:4441–4444. [PMC free article] [PubMed]
53. DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat. Rev. Genet. 2005;6:678–687. [PubMed]
54. Bjorklund P, Lindberg D, Akerstrom G, Westin G. Stabilizing mutation of CTNNB1/beta-catenin and protein accumulation analyzed in a large series of parathyroid tumors of Swedish patients. Mol. Cancer. 2008;7:53. [PMC free article] [PubMed]
55. Song W, Patel A, Qureshi HY, Han D, Schipper HM, Paudel HK. The Parkinson disease-associated A30P mutation stabilizes alpha-synuclein against proteasomal degradation triggered by heme oxygenase-1 over-expression in human neuroblastoma cells. J. Neurochem. 2009;110:719–733. [PubMed]
56. Gromiha MM, Sarai A. Thermodynamic database for proteins: features and applications. Methods Mol. Biol. 2010;609:97–112. [PubMed]
57. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 2007;372:774–797. [PubMed]
58. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
59. Smith RE, Lovell SC, Burke DF, Montalvao RW, Blundell TL. Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities. Bioinformatics. 2007;23:1099–1105. [PubMed]
60. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33:W306–W310. [PMC free article] [PubMed]
61. Khan S, Vihinen M. Performance of protein stability predictors. Hum. Mutat. 2010;31:675–684. [PubMed]
62. Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng. Des. Sel. 2009;22:553–560. [PubMed]
63. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. [PMC free article] [PubMed]
64. Chelliah V, Chen L, Blundell TL, Lovell SC. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J. Mol. Biol. 2004;342:1487–1504. [PubMed]
65. Burke DF, Worth CL, Priego EM, Cheng T, Smink LJ, Todd JA, Blundell TL. Genome bioinformatic analysis of nonsynonymous SNPs. BMC Bioinformatics. 2007;8:301. [PMC free article] [PubMed]
66. Forman JR, Worth CL, Bickerton GR, Eisen TG, Blundell TL. Structural bioinformatics mutation analysis reveals genotype-phenotype correlations in von Hippel-Lindau disease and suggests molecular mechanisms of tumorigenesis. Proteins. 2009;77:84–96. [PubMed]
67. Cangul H, Morgan NV, Forman JR, Saglam H, Aycan Z, Yakut T, Gulten T, Tarim O, Bober E, Cesur Y, et al. Novel TSHR mutations in consanguineous families with congenital nongoitrous hypothyroidism. Clin. Endocrinol. 2010;73:671–677. [PubMed]
68. Ricketts CJ, Forman JR, Rattenberry E, Bradshaw N, Lalloo F, Izatt L, Cole TR, Armstrong R, Kumar VK, Morrison PJ, et al. Tumor risks and genotype-phenotype-proteotype analysis in 358 patients with germline mutations in SDHB and SDHD. Hum. Mutat. 2010;31:41–51. [PubMed]
69. Ferguson BJ, Alexander C, Rossi SW, Liiv I, Rebane A, Worth CL, Wong J, Laan M, Peterson P, Jenkinson EJ, et al. AIRE's CARD revealed, a new structure for central tolerance provokes transcriptional plasticity. J. Biol. Chem. 2008;283:1723–1731. [PubMed]
70. Velho S, Oliveira C, Paredes J, Sousa S, Leite M, Matos P, Milanezi F, Ribeiro AS, Mendes N, Licastro D, et al. Mixed lineage kinase 3 gene mutations in mismatch repair deficient gastrointestinal tumours. Hum. Mol. Genet. 2010;19:697–706. [PMC free article] [PubMed]
71. Nagpal K, Plantinga TS, Wong J, Monks BG, Gay NJ, Netea MG, Fitzgerald KA, Golenbock DT. A TIR domain variant of MyD88 adapter-like (Mal)/TIRAP results in loss of MyD88 binding and reduced TLR2/TLR4 signaling. J. Biol. Chem. 2009;284:25742–25748. [PMC free article] [PubMed]
72. Rowling PJ, Cook R, Itzhaki LS. Toward classification of BRCA1 missense variants using a biophysical approach. J. Biol. Chem. 2010;285:20080–20087. [PMC free article] [PubMed]
73. Singh SM, Kongari N, Cabello-Villegas J, Mallela KM. Missense mutations in dystrophin that trigger muscular dystrophy decrease protein stability and lead to cross-beta aggregates. Proc. Natl Acad. Sci. USA. 2010;107:15069–15074. [PMC free article] [PubMed]
74. Kang S, Chen G, Xiao G. Robust prediction of mutation-induced protein stability change by property encoding of amino acids. Protein Eng. Des. Sel. 2009;22:75–83. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
