Nucleic Acids Res. Jan 1, 2003; 31(1): 406–409.
PMCID: PMC165467

TMPDB: a database of experimentally-characterized transmembrane topologies

Abstract

TMPDB is a database of experimentally-characterized transmembrane (TM) topologies. TMPDB release 6.2 contains a total of 302 TM protein sequences, of which 276 are α-helical sequences, 17 are β-stranded, and 9 are α-helical sequences with short pore-forming helices buried in the membrane. The TM topologies in TMPDB were determined experimentally by means of X-ray crystallography, NMR, gene fusion technique, substituted cysteine accessibility method, N-linked glycosylation experiment and other biochemical methods. TMPDB would be useful as a test and/or training dataset for improving the proposed TM topology prediction methods or developing novel methods with higher performance, and as a guide for both bioinformaticians and biologists to better understand TM proteins. TMPDB and its subsets are freely available at the following web site: http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/.

INTRODUCTION

Transmembrane (TM) proteins serve extremely important functions in life as pumps, channels, receptors, energy transducers, etc., and have recently been reported to account for ~20–30% of the genes in a whole genome (1–4). Nevertheless, the number of high-resolution three-dimensional (3D) structures is far below one hundred at present, in contrast to more than 18 000 3D structures for soluble proteins registered in PDB (5). This is because TM protein molecules are difficult to crystallize due to their amphiphilic characteristics: hydrophobic TM segments (TMSs) and hydrophilic loops. The functions of TM proteins, however, can be inferred rather easily from their TM topology (i.e., the number of TMSs, the TMS positions and the orientation of the TMSs relative to the membrane lipid bilayer) without knowing their 3D structures, because of their rather simple structural characteristics (6).

In this context, a number of TM topology prediction methods have been developed to determine the structure and function of TM proteins from their amino acid sequences (2,7–22). However, the proposed prediction methods have not attained the desired accuracies for this purpose. Recent reports evaluating prediction performance with experimentally-characterized TM topology datasets have revealed that even the best methods predict the TM topology with accuracies of only around 60% (23–25). This could be attributed mainly to the lack of well-characterized topology data to be used for training or tuning TM topology prediction methods. Thus, more high-quality TM topology data are required to evaluate the existing prediction methods more precisely.

For this reason, we have constructed a transmembrane protein database, TMPDB (19,24,26), which is a collection of TM proteins with topologies based on definite experimental evidence such as X-ray crystallography, NMR, gene fusion technique, substituted cysteine accessibility method, Asn (N)-linked glycosylation experiment and other biochemical methods. TMPDB would serve the requirements of both bioinformaticians and biologists as a test and/or training dataset for improving the existing TM topology prediction methods and developing novel prediction methods with higher performance, as well as for gaining a better understanding of TM proteins.

CONSTRUCTION OF TMPDB

We have collected 1074 articles reporting TM topology: by a MEDLINE (27) search with the keywords ‘transmembrane’ and ‘topology’ (895 articles), by searching directly without using MEDLINE (46 articles), and by referring to the reference position (RP) line of the entries with the following annotations: ‘X-RAY CRYSTALLOGRAPHY’, ‘STRUCTURE BY NEUTRON DIFFRACTION’, ‘STRUCTURE BY ELECTRON CRYO-MICROSCOPY’, ‘STRUCTURE BY NMR’ or ‘TOPOLOGY’ in SWISS-PROT and TrEMBL (28) (133 articles). By checking the content of each collected article, we extracted 302 experimentally-characterized TM topology models. To obtain the complete sequence annotation that the articles often lack, we crosschecked the sequences in question against public databases such as DDBJ (29), SWISS-PROT, PIR (30) and PDB (5), using the protein name or the partial sequence as a clue. By combining the information contained in the articles with the information in the cross-referenced public databases, we constructed TMPDB in the SWISS-PROT format.
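Because TMPDB is distributed in the SWISS-PROT format, the TMS positions of an entry can in principle be read from its feature-table lines. The following is a minimal sketch, assuming the classic SWISS-PROT flat-file layout in which a feature line begins with `FT`, followed by a key such as `TRANSMEM` and the start and end residue numbers. Whether TMPDB entries use exactly these keys is an assumption, and the sample entry, including its identifier and coordinates, is invented for illustration.

```python
import re

# Hypothetical entry fragment in classic SWISS-PROT flat-file style.
# The identifier and all coordinates are invented for illustration.
SAMPLE_ENTRY = """\
ID   EXAMPLE_PROT   STANDARD;   PRT;   120 AA.
FT   TOPO_DOM      1     10       CYTOPLASMIC.
FT   TRANSMEM     11     31
FT   TOPO_DOM     32     45       PERIPLASMIC.
FT   TRANSMEM     46     66
SQ   SEQUENCE   120 AA;
//
"""

# Matches feature lines such as "FT   TRANSMEM     11     31".
FT_TRANSMEM = re.compile(r"^FT\s+TRANSMEM\s+(\d+)\s+(\d+)")


def parse_tm_segments(entry_text):
    """Return (start, end) residue positions of all TRANSMEM features."""
    segments = []
    for line in entry_text.splitlines():
        match = FT_TRANSMEM.match(line)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
    return segments


segments = parse_tm_segments(SAMPLE_ENTRY)
print(len(segments), segments)
```

The number of TMSs and their positions, two of the three components of a TM topology, follow directly from the returned list; the orientation would have to be read from the neighbouring topological-domain lines.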

There are 21 cases in total in TMPDB in which two or more articles report topology models for a single sequence, which are almost the same as each other with only a small TMS-position difference (at most 5 amino acids). For these cases, we selected the topology model based on the highest-quality experiment among the reported ones.
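The notion of two topology models being "almost the same" can be illustrated with a short sketch. It assumes that two models agree when they have the same number of TMSs and every segment boundary differs by at most 5 residues; the text states only the 5-residue limit, so the exact comparison rule, the function name `models_agree` and the example coordinates are assumptions made for illustration.

```python
def models_agree(model_a, model_b, max_shift=5):
    """Check whether two topology models differ only by small TMS shifts.

    Each model is a list of (start, end) residue positions, one per TMS.
    """
    if len(model_a) != len(model_b):
        return False
    return all(
        abs(start_a - start_b) <= max_shift and abs(end_a - end_b) <= max_shift
        for (start_a, end_a), (start_b, end_b) in zip(model_a, model_b)
    )


# Invented models for one sequence, as two articles might report them.
article_1 = [(11, 31), (46, 66), (80, 100)]
article_2 = [(13, 33), (45, 65), (84, 103)]
article_3 = [(11, 31), (46, 66)]  # one TMS fewer

print(models_agree(article_1, article_2))
print(models_agree(article_1, article_3))
```

Choosing among agreeing models by experiment quality would additionally need a ranking of experimental methods, which is not specified here and is therefore left out of the sketch.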

TMPDB CURRENT HOLDINGS

The latest release of TMPDB contains 302 TM protein sequences: 276 α-helical sequences (TMPDB_alpha dataset), 17 β-stranded sequences (TMPDB_beta dataset) and 9 α-helical sequences with short pore-forming α-helices buried in the membrane (e.g., aquaporin 1) (TMPDB_alpha-buried dataset). The TMPDB_alpha dataset comprises 165 prokaryotic and 111 eukaryotic sequences, while the TMPDB_beta dataset includes only prokaryotic sequences with topologies determined by X-ray diffraction. The TMPDB_alpha-buried dataset includes 6 prokaryotic and 3 eukaryotic sequences, whose topologies were determined by X-ray diffraction (7 entries), N-linked glycosylation (1 entry) and protease-protection assays (1 entry). The distributions of the number of TMSs in the TMPDB_alpha, TMPDB_alpha-buried and TMPDB_beta datasets are summarized in Table 1. We note that TMPDB covers a wide range of numbers of TMSs.

Table 1.
Distributions of the number of transmembrane segments in TMPDB_alpha (276 sequences comprising 165 prokaryotic and 111 eukaryotic), TMPDB_alpha_non-redundant (231 sequences comprising 138 prokaryotic and 93 eukaryotic), TMPDB_alpha-buried ...

Furthermore, we subjected the TMPDB_alpha, TMPDB_beta and TMPDB_alpha-buried datasets to a sequence similarity check (<30%) using CLUSTALW version 1.81 (31), and finally obtained three non-redundant datasets: TMPDB_alpha_non-redundant with 231 entries (138 prokaryotic and 93 eukaryotic), TMPDB_beta_non-redundant with 15 entries, and TMPDB_alpha-buried_non-redundant with 7 entries (4 prokaryotic and 3 eukaryotic). Among the TMPDB_alpha_non-redundant entries, 112 topology models were determined by gene fusion experiments, 47 by X-ray diffraction, 5 by NMR, 2 by the substituted cysteine accessibility method, 11 by Asn (N)-linked glycosylation and 54 by other biochemical experiments.
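The redundancy reduction can be pictured as a filter over pairwise similarity values. The sketch below assumes that pairwise percent identities have already been computed (for example from CLUSTALW pairwise alignments) and applies a simple greedy rule: a sequence is kept only if its identity to every sequence already kept is below 30%. The greedy order, the function name `select_non_redundant` and the example values are assumptions; the exact selection procedure is not described in the text.

```python
def select_non_redundant(names, identity, threshold=30.0):
    """Greedily keep sequences whose identity to all kept ones is < threshold.

    `identity` maps frozenset({name_a, name_b}) to percent identity.
    """
    kept = []
    for name in names:
        if all(identity[frozenset((name, other))] < threshold for other in kept):
            kept.append(name)
    return kept


# Invented sequence names and pairwise percent identities.
names = ["seqA", "seqB", "seqC"]
identity = {
    frozenset(("seqA", "seqB")): 85.0,  # close homologues
    frozenset(("seqA", "seqC")): 12.0,
    frozenset(("seqB", "seqC")): 15.0,
}

print(select_non_redundant(names, identity))
```

With a greedy rule the result depends on the input order, so a real pipeline would need a deliberate ordering, for instance by the quality of the experimental evidence.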

The results of comparing TMPDB with other published TM topology datasets, i.e., MEMSAT 1.5 (9), HTP (11), PHDhtm (12), DAS (13), SOSUI (16), HMMTOP 1.1 (17), TMHMM 1.0 (18), PRED-TMR (20), Moeller's (32) and MPtopo (33), are shown in Table 2. We can see, for example, that 122 sequences are common to both the TMPDB_alpha_non-redundant and Moeller's non-redundant datasets, and that 109 sequences are unique to the former while only 26 are unique to the latter.
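A dataset comparison of this kind reduces to set operations on entry identifiers. The following sketch assumes that entries are matched by a shared identifier such as an accession number, which is an assumption, and uses invented identifiers. It also checks the quoted counts for consistency: 122 common plus 109 unique entries give the 231 entries of TMPDB_alpha_non-redundant, and 122 + 26 would imply 148 entries on the other side.

```python
def compare_datasets(ours, theirs):
    """Count entries shared by two datasets and entries unique to each."""
    ours, theirs = set(ours), set(theirs)
    return {
        "common": len(ours & theirs),
        "only_ours": len(ours - theirs),
        "only_theirs": len(theirs - ours),
    }


# Invented accession-like identifiers, for illustration only.
tmpdb_ids = {"P00001", "P00002", "P00003", "P00004"}
other_ids = {"P00003", "P00004", "P00005"}
result = compare_datasets(tmpdb_ids, other_ids)
print(result)

# Consistency check on the counts quoted in the text.
common, only_tmpdb, only_moeller = 122, 109, 26
tmpdb_total = common + only_tmpdb      # size of TMPDB_alpha_non-redundant
moeller_total = common + only_moeller  # implied size of the other dataset
print(tmpdb_total, moeller_total)
```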

Table 2.
How many entries of other published TM topology datasets are included in our datasets, i.e., TMPDB_alpha, TMPDB_alpha_non-redundant, TMPDB_alpha-buried, TMPDB_alpha-buried_non-redundant, TMPDB_beta ...

Using the TMPDB_alpha_non-redundant dataset of TMPDB, we have evaluated 10 proposed TM topology prediction methods: KKD (7), TMpred (8), TopPred II (10), DAS (13), TMAP (14), MEMSAT 2 (15), SOSUI (16), PRED-TMR2+OrienTM (20,21), TMHMM 2.0 (2) and HMMTOP 2.0 (22) (see 24 for the details). The results show that even the methods with the highest performance could predict the number of TMSs, the number of TMSs+position, and the N-tail location with accuracies of only 69.6%, 66.7% and 79.7% for prokaryotic sequences, and 68.8%, 64.5% and 72.0% for eukaryotic ones, respectively. Furthermore, by combining several of the 10 methods and employing a simple majority-voting approach, we have improved the prediction accuracies to 79.7%, 76.8% and 89.1% for prokaryotic sequences, and 73.1%, 69.9% and 80.6% for eukaryotic ones, respectively (ConPred, 24). The detailed results of the prediction performance evaluation are posted on our web site: http://bioinfo.si.hirosaki-u.ac.jp/~ConPred/table_accuracy.html.
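The idea of a majority-voting consensus can be sketched in a few lines. The example below votes on the predicted number of TMSs only and scores the consensus against observed values; it is an illustration of the general technique, not the ConPred implementation. The tie rule (no consensus when the two top counts are equal), the function names and all predictions are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction, or None on a tie or empty input."""
    if not predictions:
        return None
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no unique winner
    return ranked[0][0]


def accuracy(predicted, observed):
    """Percentage of sequences whose prediction equals the observed value."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Invented numbers of TMSs predicted by five methods for three sequences.
method_predictions = {
    "seq1": [7, 7, 6, 7, 8],
    "seq2": [2, 1, 2, 2, 3],
    "seq3": [4, 5, 5, 4, 6],  # 4 and 5 tie, so no consensus
}
observed = {"seq1": 7, "seq2": 2, "seq3": 5}

consensus = {name: majority_vote(votes) for name, votes in method_predictions.items()}
names = sorted(observed)
score = accuracy([consensus[n] for n in names], [observed[n] for n in names])
print(consensus, round(score, 1))
```

Voting on the full topology (number of TMSs, positions and N-tail location) would require a rule for merging segment boundaries, which this sketch does not attempt.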

ConPred is available at http://bioinfo.si.hirosaki-u.ac.jp/~ConPred, which links to the methods involved in the consensus prediction. Users run the individual methods, then manually copy the individual results and paste them into the input field of the ConPred web page. More detailed information on how to use ConPred can be obtained from http://bioinfo.si.hirosaki-u.ac.jp/~ConPred/help.html.

SEARCHING TMPDB

On the TMPDB web page (http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/), the ‘Database Search’ accepts a gene name, a PubMed (27) identifier (PMID), accession numbers of DDBJ (29), SWISS-PROT (28) and PIR (30), a PDB (5) identifier, the number of TMSs, or any combination of these, and can be run over the whole of TMPDB or over a selected subset among the 6 TMPDB subsets. The search result is displayed as a list of the retrieved entries, each with a link to the complete database entry.
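A search over a combination of fields can be modelled as a filter in which every supplied criterion must match. The sketch below is a local illustration only and does not reflect how the TMPDB web search is implemented: the `Entry` class, its field names, the AND semantics and the records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical, simplified view of one database entry."""
    gene_name: str
    pmid: str
    n_tms: int
    subset: str


def search(entries, **criteria):
    """Return entries for which every given field equals the given value."""
    return [
        entry
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


# Invented records, for illustration only.
entries = [
    Entry("geneA", "00000001", 12, "alpha"),
    Entry("geneB", "00000002", 7, "alpha"),
    Entry("geneC", "00000003", 16, "beta"),
]

hits = search(entries, subset="alpha", n_tms=7)
print([entry.gene_name for entry in hits])
```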

FUTURE DIRECTIONS

TMPDB will be updated continuously, at least once every year, to increase the number of entries as well as to add other detailed experimental information from the referred articles, such as gene-fusion points, the accessibility for fused proteins, etc. Also, the next release of TMPDB on the web will support the display of a graphical image of the TM topology and employ an SQL-based engine for more efficient and rapid database search.

CITING AND ACCESSING TMPDB

TMPDB should be cited with the present publication as a reference. TMPDB and its subsets are available for anonymous ftp download as a plain text file from http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/. We appreciate feedback from users concerning new experimentally-characterized TM topology models for submissions, additions and corrections.

ACKNOWLEDGEMENTS

We thank Dr Kenta Nakai for taking time to begin this study, and Takumi Watanabe for his technical support. This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan (grant 14015203).

REFERENCES

1. Stevens T.J. and Arkin,I.T. (2000) Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins, 39, 417–420.
2. Krogh A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.
3. Liu J. and Rost,B. (2001) Comparing function and structure between entire proteomes. Protein Sci., 10, 1970–1979.
4. Arai M., Noto,K., Lao,D.M., Ikeda,M. and Shimizu,T. (2001) Comprehensive analysis of transmembrane protein sequences in 39 microbial genomes. In Matsuda,H., Wong,L., Miyano,S. and Takagi,T. (eds), Genome Informatics 2001. Universal Academy Press, Tokyo, pp. 338–339.
5. Westbrook J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S., Bourne,P.E. and Berman,H.M. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245–248.
6. Sugiyama Y., Arai,M. and Shimizu,T. (2001) Comprehensive functional identification of prokaryotic transmembrane proteins by binary topology pattern. In Matsuda,H., Wong,L., Miyano,S. and Takagi,T. (eds), Genome Informatics 2001. Universal Academy Press, Tokyo, pp. 334–335.
7. Klein P., Kanehisa,M. and De Lisi,C. (1985) The detection and classification of membrane-spanning proteins. Biochim. Biophys. Acta, 815, 468–476.
8. Hofmann K. and Stoffel,W. (1993) TMbase-a database of membrane spanning proteins segments. Biol. Chem. Hoppe Seyler, 347, 166.
9. Jones D.T., Taylor,W.R. and Thornton,J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038–3049.
10. Claros M.G. and von Heijne,G. (1994) TopPred II: An improved software for membrane protein structure predictions. Comput. Appl. Biosci., 10, 685–686.
11. Fariselli P. and Casadio,R. (1996) HTP: a neural network-based method for predicting the topology of helical transmembrane domains in proteins. Comput. Appl. Biosci., 12, 41–48.
12. Rost B., Casadio,R. and Fariselli,P. (1996) Refining neural network predictions for helical transmembrane proteins by dynamic programming. In States,D.T., Agarwal,P., Gaasterland,T., Hunter,L. and Smith,R.F. (eds), Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California, pp. 192–200.
13. Cserzo M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: The dense alignment surface method. Protein Eng., 10, 673–676.
14. Persson B. and Argos,P. (1997) Prediction of membrane protein topology utilizing multiple sequence alignments. J. Protein Chem., 16, 453–457.
15. McGuffin L.J., Bryson,K. and Jones,D.T. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16, 404–405.
16. Hirokawa T., Boon-Chieng,S. and Mitaku,S. (1998) SOSUI: Classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378–379.
17. Tusnady G.E. and Simon,I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol., 283, 489–506.
18. Sonnhammer E.L., von Heijne,G. and Krogh,A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Glasgow,J., Littlejohn,T., Major,F., Lathrop,R., Sankoff,D. and Sensen,C. (eds), Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California, pp. 175–182.
19. Kihara D., Shimizu,T. and Kanehisa,M. (1998) Prediction of membrane proteins based on classification of transmembrane segments. Protein Eng., 11, 961–970.
20. Pasquier C., Promponas,V.J., Palaios,G.A., Hamodrakas,J.S. and Hamodrakas,S.J. (1999) A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng., 12, 381–385.
21. Liakopoulos T.D., Pasquier,C. and Hamodrakas,S.J. (2001) A novel tool for the prediction of transmembrane protein topology based on a statistical analysis of the SwissProt database: the OrienTM algorithm. Protein Eng., 14, 387–390.
22. Tusnady G.E. and Simon,I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.
23. Moeller S., Croning,M.D. and Apweiler,R. (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646–653.
24. Ikeda M., Arai,M., Lao,D.M. and Shimizu,T. (2002) Transmembrane topology prediction methods: a re-assessment and improvement by a consensus method using dataset of experimentally-characterized transmembrane topologies. In Silico Biol., 2, 19–33.
25. Chen C.P. and Rost,B. (2002) State-of-the-art in membrane protein prediction. Appl. Bioinformatics, 1, 21–35.
26. Shimizu T. and Nakai,K. (1994) Construction of a membrane protein database and an evaluation of several prediction methods of transmembrane segments. In Miyano,S., Akutsu,T., Imai,H., Gotoh,O. and Takagi,T. (eds), Proceedings of Genome Informatics Workshop 1994. Universal Academy Press, Tokyo, pp. 148–149.
27. Wheeler D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2002) Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res., 30, 13–16.
28. Bairoch A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
29. Tateno Y., Imanishi,T., Miyazaki,S., Fukami-Kobayashi,K., Saitou,N., Sugawara,H. and Gojobori,T. (2002) DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res., 30, 27–30.
30. Wu C.H., Huang,H., Arminski,L., Castro-Alvear,J., Chen,Y., Hu,Z.Z., Ledley,R.S., Lewis,K.C., Mewes,H.W., Orcutt,B.C., Suzek,B.E., Tsugita,A., Vinayaka,C.R., Yeh,L.S., Zhang,J. and Barker,W.C. (2002) The Protein Information Resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Res., 30, 35–37.
31. Thompson J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
32. Moeller S., Kriventseva,E.V. and Apweiler,R. (2000) A collection of well characterised integral membrane proteins. Bioinformatics, 16, 1159–1160.
33. Jayasinghe S., Hristova,K. and White,S.H. (2001) MPtopo: A database of membrane protein topology. Protein Sci., 10, 455–458.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press