Nucleic Acids Res. Jul 1, 2003; 31(13): 3367–3369.
PMCID: PMC168987

MATRAS: a program for protein 3D structure comparison

Abstract

The recent accumulation of large amounts of 3D structural data warrants a sensitive and automatic method to compare and classify these structures. We developed a web server for comparing protein 3D structures using the program Matras (http://biunit.aist-nara.ac.jp/matras). An advantage of Matras is its structure similarity score, which is defined as the log-odds of two probabilities, similar to Dayhoff's substitution model of amino acids. This score is designed to detect evolutionarily related (homologous) structural similarities. Our web server has three main services. The first is a pairwise 3D alignment, which simply aligns two structures. A user can specify structures either by inputting PDB codes or by uploading PDB format files from the local machine. The second service is a multiple 3D alignment, which compares several protein structures. This program employs the progressive alignment algorithm, in which pairwise 3D alignments are assembled in the proper order. The third service is a 3D library search, which compares one query structure against a large number of library structures. We hope this server provides useful tools for gaining insights into protein 3D structures.

INTRODUCTION

The comparison of protein 3D structures is an important technique in structural biology. Because structural features are conserved during evolution, structural similarities provide biologically and evolutionarily interesting insights and help us to predict molecular functions from structures. Recently, the growth of the Protein Data Bank (PDB) has been accelerated by large-scale structure determination projects, called 'structural genomics' (1), and thus the automatic comparison of 3D structures has become more important for taking advantage of the huge amount of structural data.

We now report a new server for protein 3D structure comparisons (http://biunit.aist-nara.ac.jp/matras). Several automatic servers for protein structure comparisons are already available. Among them, the DALI server (http://www2.ebi.ac.uk/dali/) (2) is the most popular; structural biologists routinely use it after solving structures experimentally. Other servers, such as VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml) (3), SSAP (http://www.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.pl) (4) and CE (http://cl.sdsc.edu/ce.html) (5), are also available, each with its own unique features. We believe that our server has two advantages over the other sites. The first is its novel structural similarity score, which is defined as the log-odds of two probabilities (6), using a scheme similar to Dayhoff's amino acid substitution score (7). It is designed to detect homologous similarity sensitively. The second is that, besides the 3D library search, our server offers various other structure comparison methods, such as multiple 3D structure alignment and self 3D structure alignment.

BASIC METHOD

We briefly outline our structure comparison method, Matras (MArkov TRAnsition of protein Structure evolution) (6). Our structure similarity score is based on the following log-odds formula:

$$S(i, j) = \log \frac{P(i \to j)}{P(j)} \qquad (1)$$

where i and j are the states of a structural feature, such as the residue–residue distance, P(j) is the probability that state j appears by chance, and P(i → j) is the probability that state i changes to state j during evolution, which is obtained using the Markov transition model. The definition of our score is similar to Dayhoff's model of amino acid substitution (7). We used three kinds of scores, but the final evaluation is done with the distance score Sdis, which depends on the distances between the Cβ atoms. The alignment is performed with a hierarchical heuristic, in which the SSEs (secondary structure elements) are aligned first, and then a residue-based alignment is iteratively refined, starting from the previous alignment. The details are described in our previous paper (6).
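As a minimal illustration of the log-odds form, the following Python sketch computes S(i, j) = log[P(i → j)/P(j)] for a toy two-state feature. The transition matrix and background probabilities are invented for illustration and are not the parameters used by Matras; the function name `log_odds_scores` is likewise hypothetical, and the natural logarithm is an assumption.

```python
import math

def log_odds_scores(transition, background):
    """Return S[i][j] = log(P(i -> j) / P(j)) for every pair of states.

    transition[i][j] is the probability that state i changes to state j;
    background[j] is the probability that state j appears by chance.
    """
    n = len(background)
    return [
        [math.log(transition[i][j] / background[j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical two-state Markov model (illustrative values only).
transition = [
    [0.8, 0.2],
    [0.4, 0.6],
]
# Stationary distribution of the matrix above, used as the chance model.
background = [2.0 / 3.0, 1.0 / 3.0]

scores = log_odds_scores(transition, background)
for row in scores:
    print(["%.3f" % value for value in row])
```

Conserved states (the diagonal) receive positive scores because they occur more often than expected by chance, while state changes receive negative scores.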

PAIRWISE 3D ALIGNMENT

Comparing and aligning two structures is the most basic procedure in structural comparison. Other structural comparisons, such as multiple alignments and library searches, were developed based on the pairwise 3D alignment. On our web page, a user can specify structures either by inputting the PDB code or by uploading PDB format files from the user's computer. An alignment, superimposed structures and various measures of structural similarity, such as the raw score, RMSD and sequence identity, are shown. The Z-score is the most sensitive value for detecting homology (6), but when only two structures are aligned, statistical parameters from library searches are not available. To evaluate pairwise similarities, we introduced the following R-score (%):

$$R(A, B) = 100 \times \frac{S(A, B) - S_{\mathrm{min}}}{S_{\mathrm{max}} - S_{\mathrm{min}}} \qquad (2)$$

where A and B represent proteins, S(A, B) is the raw log-odds score between proteins A and B, and Smin and Smax are the minimum and maximum scores, respectively. This score is similar to the guide-tree score proposed by Feng and Doolittle (8). We simply assign Smin = 0, and Smax is defined as the average of the two self-similarity scores:

$$S_{\mathrm{max}} = \frac{S(A, A) + S(B, B)}{2} \qquad (3)$$

This R-score is reasonably sensitive, although slightly worse than the Z-score in terms of the coverage-reliability plots for the SCOP superfamily (data not shown). For a pairwise alignment, our web server outputs the R-score instead of the Z-score. It also shows the reliability with which a pair with a given R-score belongs to the same SCOP superfamily or fold. To view a superimposed structure, we prepared three options: an image, the Chime plug-in (http://www.mdlchime.com/chime/) and RasMol (http://www.OpenRasMol.org) as an external application.
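The R-score can be sketched in a few lines of Python, assuming it is the raw score rescaled to a percentage between Smin and Smax, with Smin = 0 and Smax the mean of the two self-similarity scores, as described in the text. The function name `r_score` and the example values are hypothetical.

```python
def r_score(s_ab, s_aa, s_bb, s_min=0.0):
    """Rescale a raw pairwise score S(A, B) to a percentage.

    s_ab: raw log-odds score between proteins A and B
    s_aa, s_bb: self-similarity scores S(A, A) and S(B, B)
    s_min: minimum score (0 by default, following the text)
    """
    s_max = (s_aa + s_bb) / 2.0  # average of the self-similarity scores
    if s_max <= s_min:
        raise ValueError("Smax must be larger than Smin")
    return 100.0 * (s_ab - s_min) / (s_max - s_min)

# Illustrative values only: S(A, B) = 120, S(A, A) = 400, S(B, B) = 200.
example = r_score(120.0, 400.0, 200.0)
print(example)  # 40.0
```

With this normalization, a structure aligned to an identical copy of itself scores 100%, which makes scores comparable between protein pairs of different sizes.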

MULTIPLE 3D ALIGNMENT

A multiple 3D alignment compares several structures belonging to the same superfamily and provides important biological insights, such as conserved sites or conserved structural features. However, multiple sequence alignment is well known to be difficult to solve exactly, and the corresponding problem for 3D structures must be much more difficult because of the multi-body properties of 3D structures. To solve the problem within a reasonable computational time, we used the progressive alignment algorithm, which is the most popular heuristic for multiple sequence alignment (8). The progressive alignment consists of the following three steps: (i) calculate pairwise 3D alignments and similarities for all of the protein pairs; (ii) construct a guide tree from the R-scores (Eq. 2) by the UPGMA method; (iii) starting from the leaf nodes of the guide tree, progressively align all of the nodes, in order of decreasing similarity. To align one group to another, all of the protein pairs between the two groups are tried, and the best pairwise alignment determines the alignment of the two groups. In other words, our multiple 3D alignment is performed by assembling the results of the pairwise 3D alignments in the proper order. Using our web server, a user can compare up to 10 structures. The server also shows the superimposed structures and a dendrogram of structural similarities (Fig. 1).
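Steps (ii) and (iii) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Matras implementation: it clusters directly on similarity (merging the pair of groups with the highest average R-score, which is the UPGMA rule applied to similarities rather than distances), and it only reports the merge order and the protein pair that would bridge two groups. The function names and the R-score values are hypothetical.

```python
from itertools import combinations

def upgma_merge_order(names, rscore):
    """Return the merges of an average-linkage (UPGMA) guide tree.

    names: list of protein identifiers
    rscore: dict mapping frozenset({a, b}) to the pairwise R-score (%)
    Each merge is (group_1, group_2, average_similarity), most similar first.
    """
    clusters = [(name,) for name in names]
    merges = []

    def avg_similarity(c1, c2):
        # Unweighted mean over all cross-group protein pairs.
        total = sum(rscore[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: avg_similarity(*pair))
        merges.append((c1, c2, avg_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

def best_bridging_pair(c1, c2, rscore):
    """Pick the cross-group protein pair with the highest R-score."""
    return max(((a, b) for a in c1 for b in c2),
               key=lambda pair: rscore[frozenset(pair)])

# Hypothetical R-scores (%) for four structures.
names = ["A", "B", "C", "D"]
rscore = {
    frozenset(("A", "B")): 80.0, frozenset(("C", "D")): 70.0,
    frozenset(("A", "C")): 30.0, frozenset(("A", "D")): 20.0,
    frozenset(("B", "C")): 40.0, frozenset(("B", "D")): 10.0,
}
merges = upgma_merge_order(names, rscore)
for group_1, group_2, similarity in merges:
    print(group_1, group_2, similarity,
          best_bridging_pair(group_1, group_2, rscore))
```

In this toy example, A and B are merged first, then C and D, and the two groups are finally joined through the pair (B, C), which has the highest cross-group score.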

Figure 1
Screenshots of the multiple 3D alignment service. (A) Title page with forms where a user can assign up to 10 structures. (B) A calculated multiple 3D alignment. (C) A Chime plug-in view of superimposed multiple structures. (D) A dendrogram of structural ...

3D LIBRARY SEARCH

This service searches a large number of library structures for those similar to a query structure. Among the services of our server, only this search returns its result by email, because it requires a long computational time (20–40 min). A user can upload a PDB file as a query structure, and the result will contain a list of similar library structures ranked by the Z-score, together with all of the pairwise alignments between the query and the similar library structures (Fig. 2). Two kinds of library sets are available: the PDB representative list (updated weekly) and the SCOP (9) domain representative list. The latter is useful when a user wants to know the domain configuration of a query structure.
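The ranking step can be illustrated with a generic Z-score, that is, the raw score minus the mean of the library scores, divided by their standard deviation. This is an assumption made for illustration: the statistical parameters actually used by Matras are described in reference (6) and are not reproduced here. The function name, library identifiers and scores below are hypothetical.

```python
import statistics

def rank_by_z_score(raw_scores):
    """Rank library structures by a textbook Z-score of their raw scores.

    raw_scores: dict mapping a library structure identifier to the raw
    similarity score against the query.
    Returns a list of (identifier, z_score), highest Z-score first.
    """
    values = list(raw_scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    z_scores = {name: (score - mean) / sd
                for name, score in raw_scores.items()}
    return sorted(z_scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw scores of one query against four library structures.
hits = {"lib1": 10.0, "lib2": 20.0, "lib3": 30.0, "lib4": 100.0}
ranked = rank_by_z_score(hits)
for name, z in ranked:
    print(name, round(z, 2))
```

A structure whose raw score stands far above the bulk of the library, such as the invented `lib4` here, receives the highest Z-score and appears at the top of the list.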

Figure 2
A screenshot and email results of the 3D library search service. (A) A screenshot of the title page. A user can assign one query structure, either by inputting the PDB code or by uploading a PDB format file. (B) A portion of the results sent by email. ...

OTHER SERVICES

The Matras server offers two other services. The first is a self 3D alignment, which finds internal similarities within one protein structure. It is useful for revealing repeated structures within proteins. The second is a standard sequence homology search against the PDB using the BLAST program (10), with a graphical representation of the aligned regions.

FUTURE PLANS

We are now preparing to distribute the Matras source code for users who wish to run it in a stand-alone environment. We also plan to develop a web database containing all of the results of the automatic structure classifications and multiple 3D alignments calculated by Matras.

ACKNOWLEDGEMENTS

We thank Dr Nozomi Nagano, Dr Keiko Matsuda, Dr Kensuke Nakamura and two reviewers for their useful critical comments about the Matras server. This work was supported by the Special Coordination Funds Promoting Science and Technology, and the Grant-in-Aid for Scientific Research on Priority Area (C), Genome Information Science, from the MEXT (Ministry of Education, Culture, Sports, Science and Technology, Japan).

REFERENCES

1. Westbrook,J., Feng,Z., Li,C., Yang,H. and Berman,H.M. (2003) The Protein Data Bank and structural genomics. Nucleic Acids Res., 31, 489–491.
2. Holm,L. and Sander,C. (1993) Protein structure comparison by alignment of distance matrices. J. Mol. Biol., 233, 123–138.
3. Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Surprising similarities in structure comparison. Curr. Opin. Struct. Biol., 6, 377–385.
4. Orengo,C.A., Brown,N.P. and Taylor,W.R. (1992) Fast structure alignment for protein databank searching. Proteins, 14, 139–167.
5. Shindyalov,I.N. and Bourne,P.E. (1998) Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng., 11, 739–747.
6. Kawabata,T. and Nishikawa,K. (2000) Protein structure comparison using the Markov transition of evolution. Proteins, 41, 108–122.
7. Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) A model of evolutionary change in proteins. In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington DC, pp. 345–352.
8. Feng,D.F. and Doolittle,R.F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., 25, 351–360.
9. Lo Conte,L., Brenner,S.E., Hubbard,T.J.P., Chothia,C. and Murzin,A.G. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.
10. Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press