Nucleic Acids Res. Jul 2007; 35(Web Server issue): W407–W410.
Published online May 21, 2007. doi:  10.1093/nar/gkm290
PMCID: PMC1933241

ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins

Abstract

A major problem in structural biology is the recognition of errors in experimental and theoretical models of protein structures. The ProSA program (Protein Structure Analysis) is an established tool that has a large user base and is frequently employed in the refinement and validation of experimental protein structures and in structure prediction and modeling. The analysis of protein structures is generally a difficult and cumbersome exercise. The new service presented here is a straightforward, easy-to-use extension of the classic ProSA program that exploits the advantages of interactive web-based applications for displaying scores and energy plots that highlight potential problems in protein structures. In particular, the quality scores of a protein are displayed in the context of all known protein structures, and problematic parts of a structure are highlighted in a 3D molecule viewer. The service specifically addresses the needs encountered in the validation of protein structures obtained from X-ray analysis, NMR spectroscopy and theoretical calculations. ProSA-web is accessible at https://prosa.services.came.sbg.ac.at

INTRODUCTION

The availability of a structural model of a protein is one of the keys to understanding biological processes at the molecular level. Recent advances in experimental technology have led to large-scale structure determination pipelines aimed at the rapid characterization of protein structures. The resulting amount of experimental structural information is enormous, and computational methods for predicting unknown structures add a further wealth of structural models. For example, the latest NAR Web Server issue lists about 50 tools in the category ‘3D Structure Prediction’ (1). Assessing the accuracy and reliability of experimental and theoretical models of protein structures is a task that must be addressed regularly; in particular, it is essential for maintaining the integrity, consistency and reliability of public structure repositories (2).

ProSA (3) is a tool widely used to check 3D models of protein structures for potential errors. Its range of application includes error recognition in experimentally determined structures (4–6), theoretical models (7–10) and protein engineering (11,12). Here we present a web-based version of ProSA, ProSA-web, that encompasses the basic functionality of stand-alone ProSA and extends it with new features that facilitate interpretation of the results. The overall quality score calculated by ProSA for a specific input structure is displayed in a plot that shows the scores of all experimentally determined protein chains currently available in the Protein Data Bank (PDB) (13). This feature relates the score of a specific model to the scores computed from all experimental structures deposited in PDB. Problematic parts of a model are identified by a plot of local quality scores, and the same scores are mapped onto a display of the 3D structure using a color code.

A particular aim of ProSA-web is to encourage structure depositors to validate their structures before submitting them to PDB, and to use the tool in the early stages of structure determination and refinement. Because the service requires only Cα atoms, low-resolution structures and approximate models obtained early in the structure determination process can be evaluated and compared against high-resolution structures. ProSA-web returns results almost instantly: the response time is on the order of seconds, even for large molecules.

WEB SERVER USAGE

Required input

ProSA-web requires the atomic coordinates of the model to be evaluated. Users can supply coordinates either by uploading a file in PDB format or by entering the four-letter code of a protein structure available from PDB. A chain identifier and an NMR model number may be used to specify a particular model. If the entered chain identifier or model number is invalid, the user is presented with a list of valid values for these parameters. If no chain identifier or model number is supplied, the first chain of the first model found in the PDB file is used for analysis.
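The input handling described above can be sketched in a few lines of Python. This is a minimal illustration, not ProSA-web's own parser: the function `read_ca_trace` is hypothetical, and it assumes plain PDB-format text with fixed-column ATOM records. It keeps only Cα atoms (the calculations use Cα potentials), falls back to the first chain of the first model when nothing is specified, and raises an error listing the valid values when a requested chain or model does not exist.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


def read_ca_trace(pdb_text: str, chain: Optional[str] = None,
                  model: Optional[int] = None) -> List[Tuple[str, Coord]]:
    """Return (residue name, CA coordinate) pairs for one chain of one model.

    Hypothetical helper for illustration; not ProSA-web's actual code.
    """
    # model number -> chain id -> CA records; dicts keep insertion order,
    # so the first key is the first model/chain encountered in the file.
    models: Dict[int, Dict[str, List[Tuple[str, Coord]]]] = {}
    current = 1  # files without MODEL records are treated as model 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line.split()[1])
        elif record == "ATOM" and line[12:16].strip() == "CA":
            if line[16] not in (" ", "A"):  # keep only the first altLoc
                continue
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            models.setdefault(current, {}).setdefault(line[21], []).append(
                (line[17:20].strip(), xyz))

    if not models:
        raise ValueError("no CA atoms found")
    model_id = next(iter(models)) if model is None else model
    if model_id not in models:
        raise ValueError(f"invalid model {model_id}; valid: {sorted(models)}")
    chains = models[model_id]
    chain_id = next(iter(chains)) if chain is None else chain
    if chain_id not in chains:
        raise ValueError(f"invalid chain {chain_id!r}; valid: {list(chains)}")
    return chains[chain_id]
```

Relying on dictionary insertion order keeps the "first chain of the first model" default explicit without a separate bookkeeping list.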

Range of computations

The computational engine used to calculate scores and plots is standard ProSA, which uses knowledge-based potentials of mean force to evaluate model accuracy (3). All calculations are carried out with Cα potentials, so ProSA-web can also be applied to low-resolution structures and other cases where only the Cα trace is available (a set of Cβ potentials is included in the stand-alone version of ProSA, see Supplementary Data 1). After the coordinates are parsed, the energy of the structure is evaluated using a distance-based pair potential (14,15) and a potential that captures the solvent exposure of protein residues (16). From these energies, two characteristics of the input structure are derived and displayed on the web page: its z-score and a plot of its residue energies.
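To make the two energy contributions concrete, here is a toy Python sketch of how per-residue energies could be accumulated from a Cα trace. It is not ProSA's energy function: `pair_term` and `surface_term` are hypothetical caller-supplied stand-ins for knowledge-based tables, the number of Cα neighbours within a 10 Å cutoff is an assumed crude proxy for solvent exposure, and each pair energy is simply split evenly between its two residues.

```python
import math
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_energies(residues: Sequence[str], ca: Sequence[Coord],
                     pair_term: Callable[[str, str, int, float], float],
                     surface_term: Callable[[str, int], float],
                     cutoff: float = 10.0) -> List[float]:
    """Toy per-residue energies: pair contributions plus a surface term.

    pair_term(aa_i, aa_j, sequence_separation, distance) and
    surface_term(aa, neighbour_count) are hypothetical stand-ins for
    knowledge-based potentials; the neighbour-count proxy for solvent
    exposure and the 10 A cutoff are assumptions for illustration.
    """
    n = len(ca)
    energies = [0.0] * n
    neighbours = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca[i], ca[j])
            if d <= cutoff:
                neighbours[i] += 1
                neighbours[j] += 1
            e = pair_term(residues[i], residues[j], j - i, d)
            energies[i] += e / 2  # share each pair energy between both partners
            energies[j] += e / 2
    for i in range(n):
        energies[i] += surface_term(residues[i], neighbours[i])
    return energies
```

Because every pair energy is split in half, the per-residue values still sum to the total pair energy plus the total surface energy, which is what allows a residue-level plot to be read alongside a single global score.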

The z-score indicates overall model quality and measures the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations (3,15). Z-scores outside a range characteristic for native proteins indicate erroneous structures. To facilitate interpretation of the z-score of the specified protein, its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains in the current PDB (an example is shown in Figure 1A). Groups of structures from different sources (X-ray, NMR) are distinguished by different colors. This plot can be used to check whether the z-score of the protein in question is within the range of scores typically found for proteins of similar size belonging to one of these groups.
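The z-score above is the standard normalized deviation z = (E − μ) / σ, where E is the energy of the structure and μ and σ are the mean and standard deviation of the energies of the random conformations. The short Python sketch below assumes those decoy energies are already available as a list; how ProSA generates the random conformations is not shown, and the name `z_score` is hypothetical.

```python
import statistics
from typing import Sequence


def z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Deviation of the structure's energy from the decoy distribution, in SDs.

    Assumes decoy_energies were computed elsewhere for random conformations.
    """
    mean = statistics.fmean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)
    if sd == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / sd
```

A strongly negative value means the structure's energy lies far below that of the random conformations, which is what native-like structures show; a value near zero, like the −0.60 reported below for 1JSQ, means the structure is energetically hard to distinguish from random conformations.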

Figure 1.
Investigation of two ABC transporter structures using the ProSA-web service. Subfigures (A–C) show the results for a monomer of MsbA (PDB code 1JSQ, chain A (17)). The structure was determined by X-ray crystallography to 4.5 Å ...

The energy plot shows local model quality by plotting energies as a function of amino acid sequence position i (see Figure 1B and D for examples). In general, positive values correspond to problematic or erroneous parts of a model. A plot of single-residue energies usually contains large fluctuations and is of limited value for model evaluation. Hence the plot is smoothed by calculating the average energy over each 40-residue fragment $s_{i,i+39}$, which is then assigned to the ‘central’ residue of the fragment at position i + 19.
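The smoothing step amounts to a sliding-window mean, Ē(i + 19) = (1/40) · Σ e_k for k = i … i + 39, where e_k is the energy of residue k. A minimal Python sketch, assuming 0-based residue positions and per-residue energies already computed (the function name is hypothetical):

```python
from typing import Dict, Sequence


def smooth_energies(residue_energies: Sequence[float],
                    window: int = 40) -> Dict[int, float]:
    """Mean energy of each window-length fragment, keyed by its 'central' residue.

    For window = 40 the fragment starting at i covers i..i+39 and its mean is
    assigned to position i + 19 (0-based positions here).
    """
    n = len(residue_energies)
    smoothed: Dict[int, float] = {}
    for i in range(n - window + 1):  # no entries if the chain is shorter than window
        fragment = residue_energies[i:i + window]
        smoothed[i + window // 2 - 1] = sum(fragment) / window
    return smoothed
```

The smoothed curve is defined only from position 19 to n − 21, so the first and last residues of a chain have no plotted value; for chains shorter than 40 residues, no curve is produced at all.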

To further narrow down the regions in the model that contribute to a poor overall score, ProSA-web displays the 3D structure of the protein using the molecule viewer Jmol (http://www.jmol.org). Residues with unusually high energies stand out by color from the rest of the structure (Figure 1C and E). The interactive facilities provided by Jmol, such as distance measurements, are available for exploring these regions in more detail.
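One way to map residue energies onto colors, in the spirit of the red (high energy) and blue (low energy) coloring seen in Figure 1, is a diverging blue–white–red scale. The Python sketch below is an assumption for illustration: the symmetric clamp value `limit` and the linear interpolation are not ProSA-web's documented settings, and the resulting RGB triples would still need to be passed to the viewer.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def energy_colours(energies: Sequence[float], limit: float = 1.0) -> List[RGB]:
    """Map energies linearly to blue (<= -limit), white (0) and red (>= +limit).

    The clamp 'limit' is a hypothetical choice, not ProSA-web's setting.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    colours: List[RGB] = []
    for e in energies:
        t = max(-1.0, min(1.0, e / limit))  # clamp to [-1, 1]
        if t >= 0:  # white -> red as energy rises
            fade = round(255 * (1 - t))
            colours.append((255, fade, fade))
        else:       # white -> blue as energy falls
            fade = round(255 * (1 + t))
            colours.append((fade, fade, 255))
    return colours
```

Clamping keeps a few extreme residues from compressing the color range for the rest of the chain.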

Protein structure validation by example

Below we give a typical example of the application of ProSA-web in the validation of protein structures. We analyze two structures determined by X-ray analysis and deposited in PDB. The first is the structure of MsbA from Escherichia coli, a homolog of the multidrug-resistance ATP-binding cassette (ABC) transporters (PDB code 1JSQ, released 12 September 2001), determined at 4.5 Å resolution (17). The structure consists of an N-terminal transmembrane domain and a soluble nucleotide-binding domain. Doubts about the quality of 1JSQ arose when the X-ray structure of a close homolog became available and turned out to be surprisingly different. This second structure, the multidrug ABC transporter Sav1866 from Staphylococcus aureus (PDB code 2HYD, released 5 September 2006), was determined at 3.0 Å resolution (18). In light of the new structure, the published MsbA structure was recognized as incorrect, and the related publication was retracted (19).

Here, we apply the ProSA-web service to the analysis of the incorrect 1JSQ and the recently released 2HYD model. An interesting aspect is that both structures contain a transmembrane domain. Since the energy functions used in ProSA are derived mainly from soluble globular proteins of known structure, it is not clear in advance to what extent the ProSA scores reflect problems in protein structures containing membrane spanning domains.

Figure 1A–C shows the ProSA-web results obtained for 1JSQ (chain A). The z-score of this model is −0.60, a value far too high for a typical native structure. This is clearly seen when the score is compared with the scores of other experimentally determined protein structures of similar size (Figure 1A). Furthermore, large parts of the energy plot show highly positive energy values, especially in the N-terminal half of the sequence, which contains part of the membrane-spanning domain (Figure 1B). In the Cα trace of the model, residues with high energies are shown in shades of red (Figure 1C), and it is evident from this view that both the N-terminal transmembrane domain and the C-terminal globular domain contain regions of unfavorable energy.

Figure 1A also shows the location of the z-score for 2HYD (chain A). Its value, −8.29, lies within the range of native conformations. Overall, the residue energies are largely negative, with the exception of some peaks in the N-terminal part (Figure 1D). These peaks presumably correspond to membrane-spanning regions of the protein. In the Cα trace, these regions show up as clusters of residues colored in red (Figure 1E, lower left). The C-terminal domain contains many residues colored in blue, and its energy profile lies entirely below the zero baseline, as expected for a typical native protein (Figure 1D and E).

CONCLUSION

The protein structure community is, to some extent, aware that the RCSB Protein Data Bank contains erroneous structures, but these errors are difficult to spot. Grossly misfolded structures are sometimes revealed only after the results of subsequent independent structure determinations become available. Errors in regular PDB files generally remain unknown to the structural community until the corresponding revisions are made available. Hence, diagnostic tools that reveal unusual structures and problematic parts of a structure independently of the experimental data and the specific method employed are essential in many areas of protein structure research.

ProSA is a diagnostic tool that is based on the statistical analysis of all available protein structures. The potentials of mean force compiled from the data base provide a statistical average over the known structures. Structures of soluble globular proteins whose z-scores deviate strongly from the data base average are unusual and frequently such structures turn out to be erroneous. For proteins containing membrane spanning regions, the significance of deviations from the average over the data base is less clear.
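The statistical averaging behind potentials of mean force follows the inverse-Boltzmann relation E(r) = −kT ln(f_obs(r) / f_ref(r)), which turns observed frequencies of, for example, residue-pair distances into energies relative to a reference state (14–16). The Python sketch below applies that general relation to two distance histograms. It is only an illustration of the principle, not ProSA's parameterization: kT is set to 1 in arbitrary units, and the pseudocount that keeps empty bins finite is an assumption.

```python
import math
from typing import List, Sequence


def mean_force_potential(observed_counts: Sequence[int],
                         reference_counts: Sequence[int],
                         kT: float = 1.0,
                         pseudocount: float = 1.0) -> List[float]:
    """Inverse-Boltzmann energy per bin: E = -kT * ln(f_obs / f_ref).

    Counts are normalised to frequencies; the pseudocount is an assumed
    regularisation so that empty bins do not produce infinite energies.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for o, r in zip(observed_counts, reference_counts):
        f_obs = (o + pseudocount) / obs_total
        f_ref = (r + pseudocount) / ref_total
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies
```

Bins that are over-represented in known structures relative to the reference state receive negative (favorable) energies, and under-represented bins receive positive ones. This is why a structure whose contacts are rare in the database accumulates high energies and an unusual z-score.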

Here, we provide an example of a published structure (1JSQ) that is known to be incorrect, as revealed by a subsequent independent X-ray analysis of a related protein that yielded a completely different conformation. The ProSA-web result obtained for 1JSQ shows extreme deviations when compared with all structures in PDB (Figure 1A). In contrast, the score obtained for the related 2HYD structure is close to the data base average. This result demonstrates that, even for membrane proteins, large deviations from normality may indicate an erroneous structure.

SUPPLEMENTARY DATA

(1) ProSA stand-alone version: http://cms.came.sbg.ac.at/typo3/index.php?id=prosa_download
(2) List of studies that use ProSA for model validation: http://www.came.sbg.ac.at/typo3/index.php?id=prosa_literature

ACKNOWLEDGEMENTS

The authors are grateful to Christian X. Weichenberger who suggested the use of the ABC transporter structures as an example. This work was supported by FWF Austria, grant number P13710-MOB. Use of the ProSA-II program on the ProSA-web server is granted under an academic license agreement by Proceryon Science for Life GmbH (http://www.proceryon.com) which is gratefully acknowledged. Funding to pay the Open Access publication charges for this article was provided by the University of Salzburg, Austria.

Conflict of interest statement. None declared.

REFERENCES

1. Fox JA, McMillan S, Ouellette BFF. A compilation of molecular biology web servers: 2006 update on the Bioinformatics Links Directory. Nucleic Acids Res. 2006;34:W3–W5.
2. Berman HM, Burley SK, Chiu W, Sali A, Adzhubei A, Bourne PE, Bryant SH, Dunbrack RL, Fidelis K, et al. Outcome of a workshop on archiving structural models of biological macromolecules. Structure. 2006;14:1211–1217.
3. Sippl MJ. Recognition of errors in three-dimensional structures of proteins. Proteins. 1993;17:355–362.
4. Banci L, Bertini I, Cantini F, DellaMalva N, Herrmann T, Rosato A, Wüthrich K. Solution structure and intermolecular interactions of the third metal-binding domain of ATP7A, the Menkes disease protein. J. Biol. Chem. 2006;281:29141–29147.
5. Llorca O, Betti M, González JM, Valencia A, Márquez AJ, Valpuesta JM. The three-dimensional structure of an eukaryotic glutamine synthetase: functional implications of its oligomeric structure. J. Struct. Biol. 2006;156:469–479.
6. Teilum K, Hoch JC, Goffin V, Kinet S, Martial JA, Kragelund BB. Solution structure of human prolactin. J. Mol. Biol. 2005;351:810–823.
7. Petrey D, Honig B. Protein structure prediction: inroads to biology. Mol. Cell. 2005;20:811–819.
8. Ginalski K. Comparative modeling for protein structure prediction. Curr. Opin. Struct. Biol. 2006;16:172–177.
9. Panteri R, Paiardini A, Keller F. A 3D model of Reelin subrepeat regions predicts Reelin binding to carbohydrates. Brain Res. 2006;1116:222–230.
10. Mansfeld J, Gebauer S, Dathe K, Ulbrich-Hofmann R. Secretory phospholipase A2 from Arabidopsis thaliana: insights into the three-dimensional structure and the amino acids involved in catalysis. Biochemistry. 2006;45:5687–5694.
11. Beissenhirtz MK, Scheller FW, Viezzoli MS, Lisdat F. Engineered superoxide dismutase monomers for superoxide biosensor applications. Anal. Chem. 2006;78:928–935.
12. Wiederstein M, Sippl MJ. Protein sequence randomization: efficient estimation of protein stability using knowledge-based potentials. J. Mol. Biol. 2005;345:1199–1212.
13. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
14. Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 1990;213:859–883.
15. Sippl MJ. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 1995;5:229–235.
16. Sippl MJ. Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J. Comput. Aided Mol. Des. 1993;7:473–501.
17. Chang G, Roth CB. Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science. 2001;293:1793–1800.
18. Dawson RJP, Locher KP. Structure of a bacterial multidrug ABC transporter. Nature. 2006;443:180–185.
19. Chang G, Roth CB, Reyes CL, Pornillos O, Chen YJ, Chen AP. Retraction. Science. 2006;314:1875.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
