Nucleic Acids Res. 2009 Jan; 37(Database issue): D387–D392.
Published online 2008 Oct 18. doi:  10.1093/nar/gkn750
PMCID: PMC2686475

The SWISS-MODEL Repository and associated resources


SWISS-MODEL Repository (http://swissmodel.expasy.org/repository/) is a database of 3D protein structure models generated by the SWISS-MODEL homology-modelling pipeline. The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated 3D protein models generated by automated homology modelling for all sequences in Swiss-Prot and for relevant model organisms. Regular updates ensure that target coverage is complete, that models are built using the most recent sequence and template structure databases, and that improvements in the underlying modelling pipeline are fully utilised. As of September 2008, the database contains 3.4 million entries for 2.7 million different protein sequences from the UniProt database. SWISS-MODEL Repository allows users to assess the quality of the models in the database, to search for alternative template structures, and to build models interactively via SWISS-MODEL Workspace (http://swissmodel.expasy.org/workspace/). Annotation of models with functional information and cross-linking with other databases such as the Protein Model Portal (http://www.proteinmodelportal.org) of the PSI Structural Genomics Knowledge Base facilitate navigation between protein sequence and structure resources.


Three-dimensional protein structures are crucial for understanding protein function at a molecular level. In recent years, tremendous progress in experimental techniques for large-scale protein structure determination by X-ray crystallography and NMR has been achieved. Structural genomics efforts have contributed significantly to the elucidation of novel protein structures (1), and to the development of technologies that have increased the speed and success rate at which structures can be determined and lowered the cost of the experiments (2,3). However, the number of known protein sequences grows even faster, as large-scale sequencing projects, such as the Global Ocean Sampling expedition, are producing sequence data at an unprecedented rate (4). Consequently, the last release of the UniProt (5) protein knowledgebase (version 14.0) contained more than 6.5 million sequences, which is about 100 times the number of protein structures currently deposited in the Protein Data Bank (6) (∼53 000, September 2008). For the foreseeable future, stable and reliable computational approaches for protein structure modelling will therefore be required to derive structural information for the majority of proteins, and a broad variety of in silico methods for protein structure prediction has been developed in recent years.

Homology (or comparative) modelling techniques have been shown to provide the most accurate models in cases where experimental structures related to the protein of interest are available. Although the number of protein sequence families increases at a rate that is linear or almost linear with the addition of new sequences (4), the number of distinct protein folds in nature is limited (1,7) and the growth in the complexity of protein families appears to result from the combination of domains (M. Levitt, manuscript in preparation). Achieving complete structural coverage of whole proteomes (on the level of individual soluble domain structures) by combining experimental and comparative modelling techniques therefore appears to be a realistic goal, and is already being pursued, e.g. by the Joint Center for Structural Genomics (JCSG) for the small model organism Thermotoga maritima (8,9). Assessment of the accuracy of methods for protein structure prediction, e.g. during the biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments (10,11) or the automated EVA project (12), has demonstrated that comparative protein structure modelling is currently the most accurate technique for prediction of the 3D structure of proteins. During the CASP7 experiment, it became apparent that the best fully automated modelling methods have improved to a level where they challenge most human predictors in producing the most accurate models (13–15). Nowadays, comparative protein structure models are often sufficiently accurate to be employed for a wide spectrum of biomedical applications, such as structure-based drug design (16–20), functional characterization of diverse members of a protein family (21), rational protein engineering, e.g. the humanization of therapeutic antibodies, or the study of functional properties of proteins (22–26).

Here, we describe the SWISS-MODEL Repository, a database of annotated protein structure models generated by the SWISS-MODEL Pipeline, and a set of associated web-based services that facilitate protein structure modelling and assessment. We emphasize the improvements of the SWISS-MODEL Repository which have been implemented since our last report (27). These include a new pipeline for template selection, the integration with interactive tools in the SWISS-MODEL Workspace, the programmatic access via DAS (distributed annotation system) (28), the implementation of a reference frame for protein sequences based on md5 cryptographic hashes, and the integration with the Protein Model Portal (http://www.proteinmodelportal.org) of the PSI Structural Genomics Knowledge Base (29,30).


Homology modelling

The SWISS-MODEL Repository contains models that are calculated using a fully automated homology modelling pipeline. Homology modelling typically consists of the following steps: selection of a suitable template, alignment of target sequence and template structure, model building, energy minimization and/or refinement, and model quality assessment. This requires a set of specialized software tools as well as up-to-date sequence and structure databases. The SWISS-MODEL pipeline (version 8.9) integrates these steps into a fully automated workflow by combining the required programs in a Perl-based framework.

Since template search and selection is a crucial step for successful model building, we have implemented a hierarchical template search and selection protocol, which is sufficiently fast to be used for automated large-scale modelling, sensitive in detecting low homology targets, and accurate in correctly identifying close target structures. In the first step, segments of the target sequence sharing close similarity to known protein structures are identified using a conservative BLAST (31) search with restrictive parameters [E-value cut-off: 10⁻⁵, 60% minimum sequence identity to sequences of the SWISS-MODEL Template Library SMTL (32)]. This ensures that information about close sequence relationships is not dispersed by the subsequent profile-based search strategies (33). If regions of the target sequence remain uncovered, in the second step a search for suitable templates is performed against a library of Hidden Markov Models for SMTL using HHSearch (14). Templates resulting from both steps are ranked according to their E-value, sequence identity to the target, resolution and structure quality. From this ranked list, the best templates are progressively selected to maximize the length of the modelled region of the protein. New templates are added if they significantly increase the coverage of the target sequence (spanning at least 25 consecutive residues), or new information is gained (e.g. templates spanning several domains help to infer relative domain orientation). For each selected target–template alignment, 3D models are calculated using ProModII (34) and energy minimized using the Gromos force field (35). The quality of the resulting model is assessed using the ANOLEA mean force potential (36).
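
The progressive selection step can be illustrated with a small sketch. This is not the pipeline's actual code (the pipeline is described only as a Perl-based framework); it is a simplified greedy selection written under stated assumptions. The `Hit` class, the function names and the example hits are hypothetical, the ranking uses only E-value and sequence identity (resolution and structure quality are omitted), and the "new information" criterion for multi-domain templates is not modelled. Only the 25-residue threshold is taken from the text.

```python
from dataclasses import dataclass

# From the text: a new template must add at least 25 consecutive residues.
MIN_NEW_RESIDUES = 25


@dataclass
class Hit:
    """Hypothetical container for one target-template alignment."""
    template: str
    start: int        # first aligned target residue (1-based, inclusive)
    end: int          # last aligned target residue (inclusive)
    evalue: float
    identity: float   # percent sequence identity to the target


def longest_uncovered_run(hit, covered):
    """Longest stretch of consecutive target residues in `hit` not yet covered."""
    best = run = 0
    for pos in range(hit.start, hit.end + 1):
        run = 0 if pos in covered else run + 1
        best = max(best, run)
    return best


def select_templates(hits):
    """Greedy selection over a ranked hit list (simplified ranking, see lead-in)."""
    # Assumption: rank by ascending E-value, then by descending identity.
    ranked = sorted(hits, key=lambda h: (h.evalue, -h.identity))
    covered = set()
    selected = []
    for hit in ranked:
        if longest_uncovered_run(hit, covered) >= MIN_NEW_RESIDUES:
            selected.append(hit)
            covered.update(range(hit.start, hit.end + 1))
    return selected


# Invented example hits, for illustration only.
hits = [
    Hit("tmplA", 1, 120, 1e-40, 72.0),
    Hit("tmplB", 100, 130, 1e-30, 65.0),   # adds only residues 121-130
    Hit("tmplC", 110, 260, 1e-12, 31.0),   # adds residues 121-260
]
selected_names = [h.template for h in select_templates(hits)]
print(selected_names)
```

In this invented example, the second hit is skipped because it extends coverage by only 10 residues, whereas the third hit adds a long uncovered segment and is kept.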

Depending on the size of the protein and the evolutionary distance to the template, model building can be relatively time-consuming. Therefore, comprehensive databases of pre-computed models (27,37,38) have been developed in order to be able to cross-link real-time model information with other biological data resources, such as sequence databases or genome browsers.

Model database

The SWISS-MODEL Repository is a relational database of models generated by the automated SWISS-MODEL pipeline based on protein sequences from the UniProt database (5). Within the database, model target sequences are uniquely identified by the md5 cryptographic hash of their full-length raw amino acid sequence. This mechanism allows the redundancy in protein sequence database entries to be reduced, and facilitates cross-referencing with databases using different accession code systems. Mapping between UniProt and various database accession code systems to our md5-based reference system is derived from the iProClass database (39). Regular updates are performed for all protein sequences in the Swiss-Prot database (40), as well as complete proteomes of several model organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Caenorhabditis elegans and Hepacivirus). Incremental updates are performed on a regular basis in order both to include new target sequences from the UniProt database and to take advantage of newly available template structures, whereas full updates are required when major improvements to the underlying modelling algorithms have been made. The current SWISS-MODEL Repository release contains 3.45 million models for 2.72 million unique sequences, built on 26 185 different template structures (34 540 chains), covering 48.8% of the entries from UniProt (14.0), and more specifically 65.4% of the unique sequences of Swiss-Prot (56.0), the manually annotated section of the UniProt knowledgebase. The size of the models ranges from 25 up to 2059 residues (e.g. fatty acid synthase β-subunit from Thermomyces lanuginosus) with an average model length of 221 residues.
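
As a minimal sketch of the md5-based reference frame, the snippet below derives a sequence key with Python's standard `hashlib` module and shows how entries with identical sequences collapse onto one key. The normalisation step (removing whitespace and upper-casing) is an assumption made here for illustration; the text only specifies the "full length raw amino acid sequence". The accession labels and sequences are invented.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Hex md5 digest of a raw amino acid sequence."""
    # Assumption: strip whitespace and upper-case before hashing, so that
    # formatting differences do not change the key.
    raw = "".join(sequence.split()).upper()
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Invented records: two accession codes share one sequence, a third differs.
records = {
    "ACC_A": "MKTAYIAKQR",
    "ACC_B": "mktay iakqr\n",
    "ACC_C": "MKTAYIAKQL",
}

# Group accession codes by sequence hash.
by_hash = {}
for accession, seq in records.items():
    by_hash.setdefault(sequence_md5(seq), []).append(accession)

for key, accessions in by_hash.items():
    print(key, accessions)
```

Because the key depends only on the sequence, accession mappings can be refreshed without touching the models themselves.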

Graphical user web interface

The web interface at http://swissmodel.expasy.org/repository/ provides the main entry point to the SWISS-MODEL Repository. Models for specific proteins can be queried using different database accession codes (e.g. UniProt AC and ID, GenBank, IPI, RefSeq) or directly with the protein amino acid sequence (or fragments thereof, e.g. for a specific domain). For a given target protein, a graphical overview illustrating the segments for which models (or experimental structures) are available is shown (Figure 1). Functional and domain annotation for the target protein is retrieved dynamically in real time using web service protocols to ensure that the annotation information is up-to-date. UniProt annotation of the target protein is retrieved via REST queries (http://www.uniprot.org). Structural domains in the target protein are annotated by PFAM domain assignment (41), which is retrieved dynamically by querying the InterPro (42) database using the DAS protocol (28). The md5-based reference frame for target proteins allows the database accession mappings to be updated in between modelling release cycles. This ensures that cross references with functional annotation resources such as InterPro correspond to proteins of identical primary sequence, thereby avoiding commonly observed problems with incorrect cross-references as a result of unstable accession codes or asynchronous updates of different data resources. Finally, for each model, a summary page provides information on the modelling process (template selection and alignment), model quality assessment by ANOLEA (36) and Gromos (35), and in-page visualization of the structure using the Astex Viewer (43) plugin.

Figure 1.
Typical view of a SWISS-MODEL Repository entry. For the UniProt entry P53354, the α-amylase I (EC; 1,4-α-d-glucan glucanohydrolase) from ...

Integration with SWISS-MODEL Workspace

The SWISS-MODEL Repository is a large-scale database of pre-computed 3D models. Often, however, one may be interested in performing additional analyses either on the models themselves, or on the underlying protein target sequence. We have therefore implemented a tight link between the entries of the SWISS-MODEL Repository and the corresponding modules in the SWISS-MODEL Workspace, which provides an interactive, web-based, personalized working environment (32,34,44). Besides the functionality for building protein models, it provides various modules to assess protein structures and models. The estimation of the quality of a protein model is an important step to assess its usefulness for specific applications. In particular, models based on template structures sharing low sequence identity require careful evaluation. Therefore, entries from the Repository can be directly submitted to the Workspace for quality assessment using different global and local quality scores such as DFire (45), ProQRes (46) or QMEAN (47).

The default output format for models in the Repository is the project file for the program DeepView (34); this program allows the underlying alignments to be adjusted manually and the request to be resubmitted to Workspace for modelling. While new protein structures are deposited in the PDB on a daily basis, the respective modelling update cycles are less frequent, resulting in a delay in the incorporation of new templates. The Repository therefore links directly to the corresponding template search module in Workspace, which allows searches for newly released templates to be performed. The direct cross-linking between Repository and Workspace combines the advantages of the database of pre-computed models with the flexibility of an interactive modelling system.


Programmatic access

One of the major challenges of computational biology today is the integration of large amounts of diverse data in heterogeneous formats. Very often, data exchange within one domain, e.g. sequence-based data resources, is relatively straightforward, but seamless exchange between resources serving different data types, such as genome browsers and protein structure databases, is more difficult due to the lack of common and accepted standards. DAS (28) is a light-weight mechanism for web service-based annotation exchange. The DAS concept relies on an XML specification which defines the communication between server and client. Queries can be executed by sending a specific HTTP request to the DAS server. The result of the DAS server request is a human-readable and easy-to-parse XML document following the BioDAS specifications (http://www.biodas.org).

The DAS server of the SWISS-MODEL Repository is based on the DAS/1 standard and can be queried by primary UniProt accession codes or md5 hashes of the corresponding sequences. Individual models for a query sequence (‘SEGMENT’) are annotated as ‘FEATURE’, with information about the start and stop position in the target sequence, template-sequence identity and the URL to the corresponding SWISS-MODEL Repository entry. The DAS service allows the SWISS-MODEL Repository to be cross-linked with other resources using the same standards, e.g. genome browsers. The SWISS-MODEL Repository DAS service is accessible at http://swissmodel.expasy.org/service/das/swissmodel/.
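
A client-side sketch of such a query is given below. It builds a DAS/1 `features` request URL for a segment identifier and parses a features document with Python's standard library. The code does not contact the server: the XML sample is invented for illustration, including its coordinates, feature id, note text and link, and the way the real service encodes template-sequence identity is not specified in the text, so it is shown here in a `NOTE` element purely as an assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Service address as given in the text.
DAS_BASE = "http://swissmodel.expasy.org/service/das/swissmodel/"


def features_url(segment_id: str) -> str:
    """Build a DAS/1 'features' request for a UniProt accession or md5 hash."""
    return DAS_BASE + "features?" + urlencode({"segment": segment_id})


def parse_features(xml_text: str):
    """Extract one record per FEATURE from a DAS/1 features document."""
    root = ET.fromstring(xml_text)
    models = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            link = feature.find("LINK")
            models.append({
                "segment": segment.get("id"),
                "feature": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "note": feature.findtext("NOTE", default=""),
                "url": link.get("href") if link is not None else None,
            })
    return models


# Invented sample response; all values are placeholders, not real data.
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/features">
    <SEGMENT id="P53354" start="1" stop="500">
      <FEATURE id="model_1" label="model_1">
        <TYPE id="model">model</TYPE>
        <START>20</START>
        <END>480</END>
        <NOTE>sequence identity: 45%</NOTE>
        <LINK href="http://example.org/repository/model_1">entry</LINK>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

query = features_url("P53354")
models = parse_features(SAMPLE)
print(query)
print(models)
```

In practice, the document returned for `query` would be fetched over HTTP and passed to `parse_features` in place of the embedded sample.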

The protein model portal

One of the major bottlenecks in the use of protein models is that, unlike for experimental structures, modelling resources are heterogeneous and distributed over numerous servers. However, it is often beneficial for the user to directly compare the results of different modelling methods for the same protein. We have therefore developed the protein model portal (PMP) as a component of the PSI structural genomics knowledge base (29,30). This resource provides access to all structures in the PDB, functional annotations, homology models, structural genomics protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. The PMP currently provides access to several million pre-built models from four PSI centers, ModBase (38) and SWISS-MODEL Repository (27,37).


SWISS-MODEL Repository will be updated regularly to reflect the growth of the sequence and structure databases. Future releases of SWISS-MODEL Repository will include models of oligomeric assemblies, as well as models including essential co-factors, metal ions and structural ligands. Structural clustering of the SWISS-MODEL Template Library will also allow us to routinely include ensembles of models for proteins that undergo extensive domain movements.


Users of SWISS-MODEL Repository are requested to cite this article in their publications.


The PSI SGKB Protein Model Portal was supported by the National Institutes of Health as a sub-grant with Fox Chase Cancer Center (3 P20 GM076222-02S1); as a sub-grant with Rutgers University, under Prime Agreement Award Number (3U54GM074958-04S2). SWISS-MODEL Workspace and Repository have been supported by the Swiss Institute of Bioinformatics (SIB). Funding for open access charges: Swiss Institute of Bioinformatics.

Conflict of interest statement. None declared.


We are grateful to Rainer Pöhlmann, [BC]2 & Biozentrum University of Basel, for professional systems support, Pascal Benkert for fruitful discussions on model quality assessment and Jürgen Kopp for pioneering work on earlier versions of SWISS-MODEL Repository. We thank James Battey for critically reading the article. We are indebted to Dr Michael Podvinec for his enthusiastic support and excellent coordination of the Scrum process for the SWISS-MODEL team. We are grateful to Eric Jain for the swift implementation of md5 based REST queries on the UniProt server, and Wendy Tao, John Westbrook and Helen Berman (RCSB) for the great collaboration on the PSI SGKB Protein Model Portal. Computational resources for SWISS-MODEL Repository are provided by [BC]2 Basel Computational Biology Center (http://www.bc2.ch) and Vital-IT (http://www.vital-it.ch).


1. Levitt M. Growth of novel protein structural data. Proc. Natl Acad. Sci. USA. 2007;104:3183–3188.
2. Slabinski L, Jaroszewski L, Rodrigues AP, Rychlewski L, Wilson IA, Lesley SA, Godzik A. The challenge of protein structure determination—lessons from structural genomics. Protein Sci. 2007;16:2472–2482.
3. Manjasetty BA, Turnbull AP, Panjikar S, Bussow K, Chance MR. Automated technologies and novel techniques to accelerate protein crystallography for structural genomics. Proteomics. 2008;8:612–625.
4. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007;5:e16.
5. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33:D154–D159.
6. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007;35:D301–D303.
7. Chothia C. Proteins. One thousand families for the molecular biologist. Nature. 1992;357:543–544.
8. McCleverty CJ, Columbus L, Kreusch A, Lesley SA. Structure and ligand binding of the soluble domain of a Thermotoga maritima membrane protein of unknown function TM1634. Protein Sci. 2008;17:869–877.
9. Xu Q, Kozbial P, McMullan D, Krishna SS, Brittain SM, Ficarro SB, DiDonato M, Miller MD, Abdubek P, Axelrod HL, et al. Crystal structure of an ADP-ribosylated protein with a cytidine deaminase-like fold, but unknown function (TM1506), from Thermotoga maritima at 2.70 Å resolution. Proteins. 2008;71:1546–1552.
10. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins. 2007;69(Suppl 8):38–56.
11. Kryshtafovych A, Fidelis K, Moult J. Progress from CASP6 to CASP7. Proteins. 2007;69(Suppl 8):194–207.
12. Koh IY, Eyrich VA, Marti-Renom MA, Przybylski D, Madhusudhan MS, Eswar N, Grana O, Pazos F, Valencia A, Sali A, et al. EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 2003;31:3311–3315.
13. Battey JN, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T. Automated server predictions in CASP7. Proteins. 2007;69(Suppl 8):68–82.
14. Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21:951–960.
15. Zhang Y. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins. 2007;69(Suppl 8):108–117.
16. Hillisch A, Pineda LF, Hilgenfeld R. Utility of homology models in the drug discovery process. Drug Discov. Today. 2004;9:659–669.
17. Tan ES, Groban ES, Jacobson MP, Scanlan TS. Toward deciphering the code to aminergic G protein-coupled receptor drug design. Chem. Biol. 2008;15:343–353.
18. Thorsteinsdottir HB, Schwede T, Zoete V, Meuwly M. How inaccuracies in protein structure models affect estimates of protein-ligand interactions: computational analysis of HIV-I protease inhibitor binding. Proteins. 2006;65:407–423.
19. Vangrevelinghe E, Zimmermann K, Schoepfer J, Portmann R, Fabbro D, Furet P. Discovery of a potent and selective protein kinase CK2 inhibitor by high-throughput docking. J. Med. Chem. 2003;46:2656–2662.
20. Oshiro C, Bradley EK, Eksterowicz J, Evensen E, Lamb ML, Lanctot JK, Putta S, Stanton R, Grootenhuis PD. Performance of 3D-database molecular docking studies into homology models. J. Med. Chem. 2004;47:764–767.
21. Murray PS, Li Z, Wang J, Tang CL, Honig B, Murray D. Retroviral matrix domains share electrostatic homology: models for membrane binding function throughout the viral life cycle. Structure. 2005;13:1521–1531.
22. Lippow SM, Wittrup KD, Tidor B. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat. Biotechnol. 2007;25:1171–1176.
23. Junne T, Schwede T, Goder V, Spiess M. The plug domain of yeast Sec61p is important for efficient protein translocation, but is not essential for cell viability. Mol. Biol. Cell. 2006;17:4063–4068.
24. Peitsch MC. About the use of protein models. Bioinformatics. 2002;18:934–938.
25. Tramontano A. The biological applications of protein models. In: Schwede T, Peitsch MC, editors. Computational Structural Biology. World Scientific Publishing, Singapore; 2008.
26. Li Y, Drummond DA, Sawayama AM, Snow CD, Bloom JD, Arnold FH. A diverse family of thermostable cytochrome P450s created by recombination of stabilizing fragments. Nat. Biotechnol. 2007;25:1051–1056.
27. Kopp J, Schwede T. The SWISS-MODEL Repository: new features and functionalities. Nucleic Acids Res. 2006;34:D315–D318.
28. Jenkinson AM, Albrecht M, Birney E, Blankenburg H, Down T, Finn RD, Hermjakob H, Hubbard TJ, Jimenez RC, Jones P, et al. Integrating biological data – the distributed annotation system. BMC Bioinformatics. 2008;9(Suppl 8):S3.
29. Berman HM, Westbrook JD, Gabanyi MJ, Tao Y, Shah R, Kouranov A, Schwede T, Arnold K, Kiefer F, Bordoli L, et al. PSI structural genomics knowledge base. Nucleic Acids Res. 2008 in press.
30. Berman HM. Harnessing knowledge from structural genomics. Structure. 2008;16:16–18.
31. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
32. Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201.
33. Sadowski MI, Jones DT. Benchmarking template selection and model quality assessment for high-resolution comparative modeling. Proteins. 2007;69:476–485.
34. Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997;18:2714–2723.
35. van Gunsteren WF, Billeter SR, Eising A, Hünenberger PH, Krüger P, Mark AE, Scott WRP, Tironi IG. Biomolecular Simulations: The GROMOS96 Manual and User Guide. Zürich: VdF Hochschulverlag ETHZ; 1996.
36. Melo F, Feytmans E. Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 1998;277:1141–1152.
37. Kopp J, Schwede T. The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Res. 2004;32:D230–D234.
38. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, et al. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2006;34:D291–D295.
39. Huang H, Hu ZZ, Arighi CN, Wu CH. Integration of bioinformatics resources for functional analysis of gene expression and proteomic data. Front Biosci. 2007;12:5071–5088.
40. Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A. UniProtKB/Swiss-Prot: The manually annotated section of the UniProt KnowledgeBase. Methods Mol. Biol. 2007;406:89–112.
41. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36:D281–D288.
42. Mulder NJ, Apweiler R. The InterPro database and tools for protein domain analysis. Curr. Protoc. Bioinformatics. 2008 Chapter 2, Unit 2.7.
43. Hartshorn MJ. AstexViewer: a visualisation aid for structure-based drug design. J. Comput. Aided Mol. Des. 2002;16:871–881.
44. Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003;31:3381–3385.
45. Zhou H, Zhou Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002;11:2714–2726.
46. Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci. 2006;15:900–913.
47. Benkert P, Tosatto SC, Schomburg D. QMEAN: a comprehensive scoring function for model quality assessment. Proteins. 2008;71:261–277.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press