Nucleic Acids Res. Jul 1, 2006; 34(Web Server issue): W219–W224.
Published online Jul 14, 2006. doi:  10.1093/nar/gkl114
PMCID: PMC1538869

TarFisDock: a web server for identifying drug targets with docking approach

Abstract

TarFisDock is a web-based tool for automating the procedure of searching for small molecule–protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand–protein docking program. In contrast to conventional ligand–protein docking, reverse ligand–protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand–protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at http://www.dddc.ac.cn/tarfisdock/.

INTRODUCTION

Recent advances in the development of tools for docking small molecules to proteins, i.e. virtual screening, have demonstrated the efficiency of this approach for the discovery of potential lead compounds for drug development in the postgenomic era (1–3). Numerous docking programs (4–10) have been used to seek ligands which recognize the 3D structure of a given target obtained by X-ray crystallography, NMR spectroscopy or even by homology modeling [for a review comparing and evaluating docking tools see ref. (11)]. However, identification and validation of druggable targets from amongst thousands of candidate macromolecules is still a challenging task (12,13). A proteomic approach for identification of binding proteins for a given small molecule involves comparison of the protein expression profiles for a given cell or tissue in the presence or absence of the given molecule. This method has not proved very successful in target discovery because it is laborious and time-consuming (14). Thus an efficient computational method for identifying the targets of a small molecule that has been demonstrated experimentally to have an important biological activity would be a tool of great potential value. An alternative approach that has shown promise in recent years is to use computational methods to find putative binding proteins for a given compound from either genomic or protein databases, and subsequently use experimental procedures to validate the computational result (15–18). One such computational approach, which is the reverse of docking a set of ligands into a given target, is to dock a compound with a known biological activity into the binding sites of all the 3D structures in a given protein database. Protein ‘hits’ so identified can then serve as potential candidates for experimental validation. Accordingly, this approach is referred to as reverse docking.

Herein, we present a web-based tool Target Fishing Dock (TarFisDock) for seeking potential binding proteins for a given ligand. It makes use of a ligand–protein reverse docking strategy to search out all possible binding proteins for a small molecule from the potential drug target database (PDTD). The small molecule might be a biologically active compound detected in a cell- or animal-based bioassay screen, a natural product or an existing drug whose molecular target(s) is (are) unknown. Thus, TarFisDock may serve as a valuable tool for identifying targets for a novel synthetic compound or for a newly isolated natural product, for a compound with known biological activity, or for an existing drug whose mechanism of action is unknown.

METHODS

Construction of the potential drug target database

TarFisDock requires a sufficient number of known protein structures covering a diverse range of drug targets. The target proteins collected in PDTD were selected from the literature (19–22), and from several online databases, such as DrugBank (http://redpoll.pharmacy.ualberta.ca/drugbank/) (23), and TTD (http://bidd.nus.edu.sg/group/cjttd/) (24). Only proteins with known 3D structures were deposited in PDTD, the Protein Data Bank (PDB) (25) being the major source of their coordinates. PDTD currently consists of 698 entries covering 371 drug targets. These drug targets may be categorized into 15 types, according to their therapeutic areas (20,22), as shown in Table 1. Because TarFisDock does not take into account protein flexibility, PDTD includes redundant entries for proteins known to be flexible. Thus, for example, there are seven entries for HIV-1 (Figure 1).

Figure 1
An example of a PDTD query that finds 22 target records for ‘[HIV] DISEASE’.
Table 1
Disease categories of drug targets in PDTD

Water molecules and complexed ligands were removed from the protein structures, after which hydrogen atoms were added and Kollman charges (26) were assigned, with the protonation state of the individual residues being taken into account during charge assignment. A mol2 file was then constructed for each protein. [The mol2 file format (.mol2), developed for SYBYL (Tripos Inc., St Louis, USA; http://www.tripos.com/), is a complete, portable ASCII representation of a SYBYL molecule that contains all the information needed to reconstruct it.] The active site of each protein was defined as all residues within 6.5 Å of the bound ligand, and a sphere file for the active site was generated using the SPHGEN program (27). The PDB, mol2 and sphere files for each protein were stored in PDTD.
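The active-site definition above can be illustrated with a short sketch. It is not the PDTD preparation code: the `Atom` record and the function name are hypothetical, and it assumes that a residue counts as "within 6.5 Å" when at least one of its atoms lies within that distance of any ligand atom, which the text does not state explicitly.

```python
import math
from collections import namedtuple

# Hypothetical minimal atom record: residue identifier plus Cartesian coordinates (Å).
Atom = namedtuple("Atom", "res_id x y z")

CUTOFF = 6.5  # Å, the distance quoted in the text


def active_site_residues(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return sorted ids of residues with at least one atom within
    `cutoff` of any ligand atom (assumed any-atom criterion)."""
    site = set()
    for p in protein_atoms:
        if p.res_id in site:
            continue  # residue already selected through another atom
        for lig in ligand_atoms:
            if math.dist((p.x, p.y, p.z), (lig.x, lig.y, lig.z)) <= cutoff:
                site.add(p.res_id)
                break
    return sorted(site)


# Toy coordinates, for illustration only.
protein = [
    Atom("A1", 0.0, 0.0, 0.0),
    Atom("A2", 10.0, 0.0, 0.0),
    Atom("A2", 20.0, 0.0, 0.0),
    Atom("A3", 30.0, 0.0, 0.0),
]
ligand = [Atom("LIG", 5.0, 0.0, 0.0)]
print(active_site_residues(protein, ligand))
```

A centroid-based or heavy-atom-only criterion would select a slightly different residue set; the any-atom rule is used here only because it is the simplest reading of the sentence.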

Reverse docking procedure using TarFisDock

TarFisDock consists of two parts, a front-end web interface written in both PHP and HTML, with MySQL as database system, and a back-end tool for reverse docking. TarFisDock was developed on the basis of the widely used docking program, DOCK (version 4.0) (5,27). The reverse docking procedure is as follows: (i) TarFisDock either generates a protein target list according to the user's preference (see INPUT) or selects all the protein entries in the PDTD if the user intends to find a new target or targets for an active compound; (ii) TarFisDock docks a given small molecule into the possible binding sites of proteins in the target list, and the interaction energies between the small molecule and the proteins are calculated and recorded; (iii) TarFisDock analyzes the reverse docking result. In general, TarFisDock may output the top 2, 5 or 10% of the ranking list, from which the user may select protein candidates for further biological study. So far, TarFisDock has taken into account the flexibility of the small molecules, but has not yet taken into account protein flexibility. Putative binding proteins are selected by ranking the values of the interaction energy (Einter), which is composed of van der Waals and electrostatic interaction terms (Equation 1),

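$$E_{\mathrm{inter}} = \sum_{i=1}^{\mathrm{lig}} \sum_{j=1}^{\mathrm{rec}} \left( \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332.0\,\frac{q_i q_j}{D\,r_{ij}} \right), \qquad (1)$$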

where each term is a double sum over ligand atoms i and receptor atoms j; rij is the distance between atom i in the ligand and atom j in the putative receptor protein; Aij and Bij are the van der Waals repulsion and attraction parameters, respectively; a and b are the van der Waals repulsion and attraction exponents, respectively; qi and qj are the point charges on atoms i and j; D is the dielectric function; and 332.0 is the factor that converts the electrostatic energy into kcal/mol. The Amber force field (26) was used for the energy calculation.
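As an illustration of Equation 1, the following sketch evaluates the double sum directly over all ligand and receptor atom pairs. It is not the DOCK implementation: the function name and data layout are hypothetical, and the default exponents (a = 12, b = 6) and the distance-dependent dielectric D = 4r are assumptions chosen for the example, since the text does not give the values used by TarFisDock.

```python
import math


def interaction_energy(ligand, receptor, pair_params,
                       a=12, b=6, dielectric=lambda r: 4.0 * r):
    """Evaluate Equation 1 by brute force.

    ligand, receptor: lists of (atom_type, charge, (x, y, z)).
    pair_params: dict mapping (ligand_type, receptor_type) -> (A_ij, B_ij).
    a, b: repulsion and attraction exponents (assumed 12 and 6).
    dielectric: function D(r); the default D = 4r is an assumption.
    """
    energy = 0.0
    for type_i, q_i, xyz_i in ligand:
        for type_j, q_j, xyz_j in receptor:
            r = math.dist(xyz_i, xyz_j)
            A, B = pair_params[(type_i, type_j)]
            vdw = A / r**a - B / r**b
            elec = 332.0 * q_i * q_j / (dielectric(r) * r)
            energy += vdw + elec
    return energy


# Toy two-atom system with made-up parameters, for illustration only.
lig = [("C", 0.5, (0.0, 0.0, 0.0))]
rec = [("O", -0.5, (2.0, 0.0, 0.0))]
params = {("C", "O"): (4096.0, 64.0)}
print(interaction_energy(lig, rec, params))
```

More negative values of the energy indicate a more favourable predicted interaction, which is why candidates are ranked in ascending order of score.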

INPUT, OUTPUT AND OPTIONS

The input file consists only of the test small molecule in standard mol2 format. The 2D structure of a small molecule can either be sketched using ISIS/Draw (MDL Information Systems, Inc., San Leandro, CA 94577) or ChemDraw (CambridgeSoft Corporation, 875 Massachusetts Avenue, Cambridge, MA 02139, USA), or be taken from chemical databases such as CCD (http://www.chemnetbase.com/), ACD (http://www.mdli.com/) and SPECS (http://www.specs.net/). The user can convert the small molecule from its 2D structure to a 3D structure by using CORINA (28) (http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html) or other modeling software. The structure can be minimized by means of molecular mechanics, and Gasteiger charges (29) should be assigned to it. Finally, the 3D structure of the small molecule is saved in a mol2 file.
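Before uploading, a user may wish to confirm that the mol2 file actually carries partial charges. The sketch below is a minimal reader for the `@<TRIPOS>ATOM` section, not part of TarFisDock; the function names are hypothetical, the sample record is a toy fragment with made-up coordinates and charges, and the column layout (id, name, x, y, z, type, substructure id, substructure name, charge) follows the usual Tripos convention.

```python
def read_mol2_atoms(text):
    """Return (name, x, y, z, atom_type, charge) tuples from the ATOM section.
    charge is None when the optional charge column is absent."""
    atoms, in_atoms = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Track whether the current section is the ATOM section.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            f = stripped.split()
            charge = float(f[8]) if len(f) > 8 else None
            atoms.append((f[1], float(f[2]), float(f[3]), float(f[4]), f[5], charge))
    return atoms


def has_charges(atoms):
    """True if there is at least one atom and every atom has a charge."""
    return bool(atoms) and all(atom[5] is not None for atom in atoms)


# Toy mol2 fragment; coordinates and charges are illustrative only.
SAMPLE = """@<TRIPOS>MOLECULE
toy
2 1
SMALL
GASTEIGER

@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 C.3 1 LIG -0.068
2 H1 1.090 0.000 0.000 H   1 LIG  0.017
@<TRIPOS>BOND
1 1 2 1
"""

atoms = read_mol2_atoms(SAMPLE)
print(len(atoms), has_charges(atoms))
```

A check of this kind only confirms that a charge column is present; it does not verify that the charges were computed with the Gasteiger method.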

Users can register free of charge for using the TarFisDock server, including access to PDTD. The user must provide his/her email address and username so as to receive the result. After registration, the user can login to the server to upload the mol2 file of the test molecule, customize a target list from PDTD, and submit a job (Figure 2). A job identity number, the ‘job_id’, is assigned to each job by the web server, and the number is appended to a job queue in the back-end server. The user may use the job_id to check the status of his/her job.

Figure 2
An example of the input and output of TarFisDock.

The output is delivered in ascending order of energy score (interaction energy). The archive file contains a list of the scores, together with binding models (in mol2 format) of the small molecule tested within the binding sites of the candidate targets. The user can also browse the ‘Categories’ dropdown menu of PDTD to obtain detailed information for the potential target proteins identified by TarFisDock: the ‘PDB_ID’ field contains a hyperlink to the PDB website; the ‘TARGET NAME’ field contains a hyperlink to the DrugBank website (Figures 1 and 2); and any information linking targets to diseases is contained in the ‘RELATED DISEASE’ field taken from TTD.
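The ranking step can be sketched as follows: targets are sorted in ascending order of interaction energy and the leading fraction of the list is kept. The function name and the target identifiers are hypothetical, and rounding the cut-off up to a whole number of entries is an assumption, since the text does not say how TarFisDock rounds the top 2, 5 or 10%.

```python
import math


def top_fraction(scores, percent):
    """Return the best-scoring `percent` of targets.

    scores: dict mapping target id -> interaction energy (kcal/mol);
    lower (more negative) energies rank first.
    The number of entries kept is rounded up (assumed), with a minimum of one.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    n_keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:n_keep]


# Ten made-up targets with energies 0.0, -1.0, ..., -9.0.
scores = {f"T{i}": -float(i) for i in range(10)}
print(top_fraction(scores, 20))
```

Under this rounding rule, a full search of the 698 PDTD entries would keep 14, 35 and 70 candidates for the top 2, 5 and 10%, respectively.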

TEST CASES

To test the reliability of the TarFisDock server, we searched for the candidate binding proteins for vitamin E and for 4H-tamoxifen. The results and their comparison with the published experimental data are described below.

Potential binding proteins for vitamin E

Vitamin E is an antioxidant which is widely used as a dietary supplement (30). It has also been shown to be of therapeutic value in the treatment of a number of diseases, such as cardiovascular disease and some forms of cancer, and to enhance the immune response (31). It is thus likely that vitamin E interacts with multiple target proteins. Indeed, 12 targets for vitamin E have already been reported (16) (Supplementary Table S1). Candidate vitamin E-binding proteins identified using TarFisDock are listed in Supplementary Table S2. The top 2% of the candidates identified by TarFisDock, ranked by interaction energy, included 4 of the 12 targets identified experimentally. Three more of these experimentally identified targets were in the top 10% of the proteins ranked by interaction energy. The top 2 and 10% of the candidate vitamin E-binding proteins identified by TarFisDock thus cover 30 and 50%, respectively, of reported targets verified or implicated by experiments. Other targets, such as glutathione S-transferase, glutathione synthetase, D-amino acid oxidase, and guanylyl cyclase (which is not available in PDTD), were not identified by TarFisDock (Table 2). The main reason may be that TarFisDock does not take into account protein flexibility. It is of interest that many of the top 10% of candidate vitamin E-binding proteins are associated with cancer, cardiovascular diseases, immune function and dementia (Supplementary Table S2).

Table 2
The protein target candidates of vitamin E identified by TarFisDock

Potential binding proteins for 4H-tamoxifen

4H-tamoxifen is used as an adjuvant therapy in the treatment of breast cancer (32). Like vitamin E, it is a multiple-target drug. So far, 10 proteins have been identified as interaction targets for 4H-tamoxifen or for its parent drug, tamoxifen (16) (Supplementary Table S1). To test the reliability of our TarFisDock server, we used it to search for candidate binding proteins for 4H-tamoxifen in the PDTD. The target candidates so identified are listed in Supplementary Table S3, and those which correspond to proteins identified experimentally are shown in Table 3. Three of the candidates in the top 2% are known targets of 4H-tamoxifen, namely dihydrofolate reductase, immunoglobulin and glutathione transferase. The top 5% of the candidates include two additional targets identified experimentally, i.e. human fibroblast collagenase and 17β-hydroxysteroid dehydrogenase. Thus, 30 and 50% of the experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively, again supporting the reliability of the server.
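The recovery rates quoted for the two test cases follow from the counts given in the text, as the short calculation below shows. The function name is hypothetical; the counts (4 and 4 + 3 of 12 for vitamin E, 3 and 3 + 2 of 10 for 4H-tamoxifen) are taken directly from the text.

```python
def recovered_fraction(known_targets, hits_in_top):
    """Fraction of experimentally reported targets found in the top of the ranking."""
    return hits_in_top / known_targets


# (number of known targets, number recovered) as stated in the text.
cases = {
    "vitamin E, top 2%": (12, 4),
    "vitamin E, top 10%": (12, 4 + 3),
    "4H-tamoxifen, top 2%": (10, 3),
    "4H-tamoxifen, top 5%": (10, 3 + 2),
}

for label, (known, hits) in cases.items():
    print(f"{label}: {hits}/{known} = {recovered_fraction(known, hits):.0%}")
```

For 4H-tamoxifen the ratios are exactly 30 and 50%. For vitamin E the ratios are 4/12 and 7/12, so the quoted 30 and 50% appear to be conservative, rounded-down figures.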

Table 3
The protein target candidates of 4H-tamoxifen identified by TarFisDock

TarFisDock has been in use for about 9 months, and over 1000 small molecules, including synthetic compounds, existing drugs and natural products, have been screened. Five groups outside the authors' labs have become involved in screening. Experimental evidence has been obtained to confirm that binding proteins identified by TarFisDock for several compounds indeed display binding activity. In one case, that of a binding protein for a natural product, not only was binding verified experimentally, but a complex was obtained whose 3D crystal structure was solved by X-ray crystallography (data not shown). The computing time required depends on the flexibility of the given compound: TarFisDock can finish a PDTD search within 5–20 h using one CPU of the SGI Origin3800 superserver.

SUMMARY

By bringing together the target database PDTD and a reverse docking program, the TarFisDock server provides a convenient tool for identifying potential binding proteins for small molecules such as drugs, lead compounds and natural products. In total, this web server has already been tested on over 1000 small molecules, and the binding proteins predicted for several molecules have been verified by bioassay, including crystal structure determination (data not shown). The server can also be used to map the genomic regulation network of an existing drug or a drug candidate. In general, one drug molecule may interact with several targets, including targets associated with side effects (toxicity). As illustrated by the identification of potential binding proteins for vitamin E and 4H-tamoxifen, TarFisDock offers multiple candidates from which protein targets can be selected. These are useful clues for further experimental tests evaluating the efficacy and toxicity of a drug. In addition, the target information produced by TarFisDock is of value for functional genomic studies within the chemical biology paradigm (33). Overall, the TarFisDock web server is a convenient tool for ‘fishing’ the target proteins of small molecules: the user simply inputs the structure of the query compound and customizes a target list from PDTD (a list of all the targets is recommended).

However, TarFisDock still has certain limitations. The major one is that the protein entries are not sufficient to cover all the proteins encoded by disease-related genomes. The second is that TarFisDock does not consider the flexibility of proteins during the docking simulation. Both of these limitations will produce false negatives. Another limitation is that the scoring function for reverse docking is not accurate enough, which will produce false positives. To overcome these shortcomings, we are (i) collecting as many protein structures (both experimental and modeled) as possible to enlarge PDTD, (ii) developing a new docking program that includes protein flexibility, and (iii) establishing a more accurate scoring function. TarFisDock and PDTD are available at http://www.dddc.ac.cn/tarfisdock/.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online.

Acknowledgments

The authors thank Prof. Irwin D. Kuntz for providing the source code of DOCK4.0. The Shanghai Supercomputing Center and Computer Network Information Center are acknowledged for allocation of computing time. The authors thank Prof. Israel Silman at Weizmann Institute of Science and the reviewers for critical reading and helpful comments on the manuscript. This study was supported by the Special Fund for the Major State Basic Research Project of China (grants 2002CB512802 and 2002CB512807) from Ministry of Science and Technology of China and the National Natural Science Foundation of China (grant 10572033).

Conflict of interest statement. None declared.

REFERENCES

1. Shen J., Xu X., Cheng F., Liu H., Luo X., Shen J., Chen K., Zhao W., Shen X., Jiang H. Virtual screening on natural products for discovering active compounds and target information. Curr. Med. Chem. 2003;10:2327–2342.
2. Kitchen D.B., Decornez H., Furr J.R., Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nature Rev. Drug Discov. 2004;3:935–949.
3. Mohan V., Gibbs A.C., Cummings M.D., Jaeger E.P., DesJarlais R.L. Docking: successes and challenges. Curr. Pharm. Des. 2005;11:323–333.
4. Rarey M., Kramer B., Lengauer T., Klebe G. A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol. 1996;261:470–489.
5. Ewing T.J., Makino S., Skillman A.G., Kuntz I.D. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 2001;15:411–428.
6. Morris G.M., Goodsell D.S., Halliday R.S., Huey R., Hart W.E., Belew R.K., Olson A.J. Automated docking using a Lamarckian Genetic Algorithm and an empirical binding Free Energy Function. J. Comput. Chem. 1998;19:1639–1662.
7. Jones G., Willett P., Glen R.C., Leach A.R., Taylor R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997;267:727–748.
8. Li H., Li C., Gui C., Luo X., Chen K., Shen J., Wang X., Jiang H. GAsDock: a new approach for rapid flexible docking based on an improved multi-population genetic algorithm. Bioorg. Med. Chem. Lett. 2004;14:4671–4676.
9. Dooley A.J., Shindo N., Taggart B., Park J.G., Pang Y.P. From genome to drug lead: identification of a small-molecule inhibitor of the SARS virus. Bioorg. Med. Chem. Lett. 2006;16:830–833.
10. Choi V. YUCCA: an efficient algorithm for small-molecule docking. Chem. Biodivers. 2005;2:1517–1524.
11. Kellenberger E., Rodrigo J., Muller P., Rognan D. Comparative evaluation of eight docking tools for docking and virtual screening accuracy. Proteins. 2004;57:225–242.
12. Hajduk P.J., Huth J.R., Tse C. Predicting protein druggability. Drug Discov. Today. 2005;10:1675–1682.
13. Hopkins A.L., Groom C.R. The druggable genome. Nature Rev. Drug Discov. 2002;1:727–730.
14. Huang C.M., Elmets C.A., Tang D.C., Li F., Yusuf N. Proteomics reveals that proteins expressed during the early stage of Bacillus anthracis infection are potential targets for the development of vaccines and drugs. Genomics Proteomics Bioinformatics. 2004;2:143–151.
15. Rockey W.M., Elcock A.H. Rapid computational identification of the targets of protein kinase inhibitors. J. Med. Chem. 2005;48:4138–4152.
16. Chen Y.Z., Zhi D.G. Ligand-protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins. 2001;43:217–226.
17. Chen Y.Z., Ung C.Y. Prediction of potential toxicity and side effect protein targets of a small molecule by a ligand-protein inverse docking approach. J. Mol. Graph. Model. 2001;20:199–218.
18. Paul N., Kellenberger E., Bret G., Muller P., Rognan D. Recovering the true targets of specific ligands by virtual screening of the protein data bank. Proteins. 2004;54:671–680.
19. Bonday Z.Q., Dhanasekaran S., Rangarajan P.N., Padmanaban G. Import of host delta-aminolevulinate dehydratase into the malarial parasite: identification of a new drug target. Nature Med. 2000;6:898–903.
20. Drews J., Ryser S. Classic drug targets. Nat. Biotechnol. 1997;15:1350.
21. Gibbs J.B. Mechanism-based target identification and drug discovery in cancer research. Science. 2000;287:1969–1973.
22. Hardman J.G., Limbird L.E., Gilman A.G. Goodman and Gilman's The Pharmacological Basis of Therapeutics, 10th edn. NY: McGraw-Hill; 2001.
23. Wishart D.S., Knox C., Guo A.C., Shrivastava S., Hassanali M., Stothard P., Chang Z., Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34:D668–D672.
24. Chen X., Ji Z.L., Chen Y.Z. TTD: Therapeutic Target Database. Nucleic Acids Res. 2002;30:412–415.
25. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
26. Weiner S.J., Kollman P.A., Case D.A., Singh U.C., Ghio C., Alagona G., Profeta S., Weiner P. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 1984;106:765–784.
27. Kuntz I.D., Blaney J.M., Oatley S.J., Langridge R., Ferrin T.E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982;161:269–288.
28. Gasteiger J., Rudolph C., Sadowski J. Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput. Methodol. 1990;3:537–547.
29. Gasteiger J., Marsili M. Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges. Tetrahedron. 1980;36:3219–3228.
30. Norman H.A., Butrum R.R., Feldman E., Heber D., Nixon D., Picciano M.F., Rivlin R., Simopoulos A., Wargovich M.J., Weisburger E.K., et al. The role of dietary supplements during cancer therapy. J. Nutr. 2003;133:3794S–3799S.
31. Han S.N., Meydani S.N. Vitamin E and infectious diseases in the aged. Proc. Nutr. Soc. 1999;58:697–705.
32. Chen X., Ung C.Y., Chen Y. Can an in silico drug-target search method be used to probe potential mechanisms of medicinal plant ingredients? Nature Prod. Rep. 2003;20:432–444.
33. Stockwell B.R. Exploring biology with small organic molecules. Nature. 2004;432:846–854.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press