Virtual screening of chemical libraries

Department of Pharmaceutical Chemistry, University of California, 600 16th Street, San Francisco, California 94143-2240, USA (e-mail: shoichet/at/cgl.ucsf.edu)

Abstract

Virtual screening uses computer-based methods to discover new ligands on the basis of biological structures. Although widely heralded in the 1970s and 1980s, the technique has since struggled to meet its initial promise, and drug discovery remains dominated by empirical screening. Recent successes in predicting new ligands and their receptor-bound structures, and better rates of ligand discovery compared to empirical screening, have re-ignited interest in virtual screening, which is now widely used in drug discovery, albeit on a more limited scale than empirical screening.

The dominant technique for the identification of new lead compounds in drug discovery is the physical screening of large libraries of chemicals against a biological target (high-throughput screening). An alternative approach, known as virtual screening, is to computationally screen large libraries of chemicals for compounds that complement targets of known structure, and experimentally test those that are predicted to bind well. Such receptor-based virtual screening faces several fundamental challenges, including sampling the various conformations of flexible molecules and calculating absolute binding energies in an aqueous environment. Nevertheless, the field has recently had important successes: new ligands have been predicted along with their receptor-bound structures, in several cases with hit rates (ligands discovered per molecules tested) significantly greater than with high-throughput screening. Even with its current limitations, virtual screening accesses a large number of possible new ligands, most of which may then be simply purchased and tested.
For those who can tolerate its false-positive and false-negative predictions, virtual screening offers a practical route to discovering new reagents and leads for pharmaceutical research.

Problems with virtual screening

A founding idea in molecular biology was that biological function follows from molecular form. If you knew the molecular structure of a receptor (defined here as a biological macromolecule that converts ligand binding into an activity), you could understand and predict its function. This notion has underpinned a 70-year project to determine receptor structures to atomic resolution. From the early X-ray diffraction studies of pepsin and of haemoglobin, to those of macromolecular assemblies like the ribosome and to structural genomics, the taxonomic part of this enterprise (that is, cataloguing receptor structures) has been extraordinarily successful. But still largely unfulfilled is the promise of exploiting receptor structures to discover new ligands that modulate the activities of these molecules and macromolecular assemblies.

As early as the mid-1970s, investigators suggested that computational simulations of receptor structures and the chemical forces that govern their interactions would enable ‘structure-based’ ligand design and discovery1,2. Ligands could be designed on the basis of the receptor structure alone, which would free medicinal chemistry from the tyranny of empirical screening, substrate-based design and incremental modification. Since then, structure-based design has contributed to and even motivated the development of marketed drugs3,4, such as the human immunodeficiency virus (HIV) protease inhibitor Viracept and the anti-influenza drug Relenza, typically through cycles of modification and subsequent experimental structure determination.
Computational modelling has been used extensively in these efforts5,6 and indeed in non-receptor-based methods; for example, when searching for new ligands on the basis of their chemical similarity to a known ligand or when matching candidate molecules to a ‘pharmacophore’ that represents the chemical properties of a series of known ligands7. But until recently there have been few instances of completely new ligands (not resembling those previously known) discovered directly from receptor-based computation. Although there are now many more and much better receptor structures than there were in the 1970s and 1980s, and computer speed has grown exponentially, drug discovery and chemical biology remain dominated by empirical screening and substrate-based design. Three problems have impeded progress in receptor-guided explorations of ligand chemistry. First, chemical space is vast but most of it is biologically uninteresting: blank, lightless galaxies exist within it into which good ideas at their peril wander. Constraining the number of chemical compounds that are searched to biologically relevant and synthetically accessible molecules remains an area of active research. Second, receptor structures are complicated, resembling “tangled knot(s) of viscera”8. They consist of several thousand atoms, each of which is more or less free to move, and they frequently change shape and solvent structure upon binding to a ligand. To predict what molecules might be recognized by a given receptor, energetically accessible receptor and ligand conformations should be calculated. Unfortunately, the number of possible conformations rises exponentially with the number of rotatable bonds, of which there are thousands in a protein–ligand complex, and the full sampling of conformations involves a set of computational problems for which no general solution is known. Third, calculating ligand–receptor binding energies is difficult9. 
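To make the second problem concrete, the following minimal Python sketch counts discrete conformations under a deliberately crude assumption that is not from the text: every rotatable bond independently adopts one of three torsional states. The function names and the rate of one million evaluations per second are likewise illustrative assumptions, chosen only to show how quickly exhaustive sampling becomes impractical.

```python
"""Rough illustration of conformational explosion (all parameters are assumptions)."""

SECONDS_PER_YEAR = 365 * 24 * 3600


def count_conformations(rotatable_bonds: int, states_per_bond: int = 3) -> int:
    """Return states_per_bond ** rotatable_bonds, a crude count of conformations.

    Assumes each bond samples its torsional states independently, which
    ignores steric clashes and ring constraints.
    """
    if rotatable_bonds < 0:
        raise ValueError("rotatable_bonds must be non-negative")
    if states_per_bond < 1:
        raise ValueError("states_per_bond must be at least 1")
    return states_per_bond ** rotatable_bonds


def exhaustive_search_years(
    rotatable_bonds: int,
    states_per_bond: int = 3,
    evaluations_per_second: float = 1_000_000.0,  # hypothetical throughput
) -> float:
    """Years needed to score every conformation once at the assumed rate."""
    total = count_conformations(rotatable_bonds, states_per_bond)
    return total / evaluations_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Small bond counts only; the thousands of bonds in a full complex
    # would give numbers far too large to convert to a float.
    for bonds in (5, 10, 20, 40):
        print(
            f"{bonds:>2} rotatable bonds: "
            f"{count_conformations(bonds):.3e} conformations, "
            f"{exhaustive_search_years(bonds):.3e} years to enumerate"
        )
```

Even with these generous assumptions the count grows by a factor of three for every added bond, which is why docking programs sample conformations selectively rather than exhaustively.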
Binding affinity in an aqueous environment is determined by the solvation energies of the individual molecules (high solvation energies typically disfavour binding), and by the interaction energies between them (high interaction energies favour binding). Solvation and interaction energies are both typically much larger in magnitude than the net affinity, making calculation of the latter problematic. Although it has been possible to calculate accurately the differential affinity between two related ligands using thermodynamic integration methods, doing so is time-consuming. Calculating the absolute affinities for many thousands of unrelated molecules necessary to encode new chemical functionality remains beyond our reach. So in principle, it could be argued that structure-based computational screens for new ligands do not work at all.

Successes from virtual screening

However, genuinely novel ligands have been discovered using structure-based computation. Recently, the structures of known ligands in complex with their receptors have been correctly predicted computationally using the structures of the independent receptor and ligand molecules10–12 (Fig. 1
Even relatively simple receptor-based constraints can improve the likelihood of finding ligands from among the many possible structures in a library, if only by screening out those that are unlikely to bind the receptor17. In library design, for instance, pre-calculation of possible side chains that would complement a receptor structure resulted in structure-based libraries that were tenfold more likely to contain ligands than random18 or diverse17 libraries constructed at the same time. Similarly, virtual and high-throughput screening have been deployed simultaneously to discover new ligands from libraries of several-hundred-thousand diverse molecules. The virtual screens had ‘hit rates’ (defined as the number of compounds that bind at a particular concentration divided by the number of compounds experimentally tested) that were 100-fold to 1,000-fold higher than those achieved by empirical screens19,20 (Table 1); intriguingly, each technique discovered classes of ligands that the other technique had overlooked19, suggesting that the two screening approaches (virtual and empirical) can be complementary. In a few cases the structures of the new ligands in complex with the receptors have been subsequently determined experimentally — typically by X-ray crystallography. Although the docking-derived hits are very different from natural ligands for a given receptor, they often bind at the active site, interacting with conserved receptor groups, as predicted by the docking program21–24 (Fig. 3
How can these successes be reconciled with the field’s methodological weaknesses? Virtual screening avoids the problem of broad searches of chemical space by restricting itself to libraries of specific, accessible compounds (often those that can simply be purchased). This avoids costly syntheses and restricts the search to compounds that are interesting enough biologically to have been previously made, albeit for another reason. Filters may be applied to ensure that the library meets some standard of biological relevance or ‘drug-likeness’25,26. Progress in both the number and quality of molecules in docking libraries has contributed to the increasingly drug-like character of docking hits in recent studies19. Although the problems of sampling molecular conformations and of calculating affinities remain acute, progress has been made both algorithmically16 and in the computer resources available for these calculations. Moreover, we can define success in virtual screening as ‘finding some interesting new ligands’, and not as ‘correctly ranking all the molecules in the library’ or ‘finding all the possible ligands in a library’. Virtual screening thus adopts the same logic as high-throughput screening: as long as some interesting ligands are found, false-negatives are tolerated. Indeed, the two techniques, because of their emphasis on large libraries, share other similarities: both accept limited accuracy in return for screening on a large scale; both look to enrich a list of likely-but-not-certain candidates for further quantitative study; and both are dogged by curious false-positive hits27. Although high-throughput screening remains the dominant technique, virtual screening is now commonly used in pharmaceutical research. Finally, it must be admitted that these successes retain an episodic character. Even expert practitioners are frequently surprised and sometimes disappointed. Geometries of true ligands may be slightly (Fig. 
3e

Prospects

Notwithstanding these caveats, virtual screening will be an ever more important tool for exploring biologically relevant chemical space. Large high-throughput screens have liabilities of their own, and are inaccessible to many investigators (although this will begin to change with the advent of screening resource centres30). In contrast, virtual screening can process, at little cost, large libraries (in principle, larger than any library used in empirical screening) against any receptor for which there is a structure.

What advances might be anticipated to make virtual screening reliable and accessible enough to be widely used? Improved sampling and ‘scoring functions’ (calculations of ligand–receptor energetics) will undoubtedly help. The good news is that the fundamentals of molecular interactions are well understood, and so the field has a clear way forward. But the challenge, as always, will be to implement good physical models for hundreds of thousands of possible ligands, each one sampled in many thousands of possible receptor complexes. Indeed, accurate calculation of absolute binding affinity in screens of large, diverse libraries will remain beyond us for the foreseeable future; even predicting the rank order of affinity for disparate ligands in a hit list will be difficult. What we may anticipate are improved explorations of conformational states for ligand and receptor, and scoring functions that use more sophisticated models of solvation and a better balance of electrostatic and non-polar terms. An interesting strategy will be the use of higher-level, typically much slower methods to re-score initial hits from virtual screening, using the screening calculation as a fast first filter31. From these we can hope for better hit rates and better predictions of geometries23 (Fig. 3d

To bring virtual screening to a wide community it will be important to democratize the resources on which it depends.
Receptor structures are already available through the Protein Data Bank or PDB (for experimental structures), and through databases such as MODBASE (for a much larger number of structures from computer-based modelling32). Several groups provide docking programs without charge to the academic community, although these programs often require some effort to learn. Programs less demanding of expert knowledge, perhaps as a web-accessible resource, would bring docking to many interested non-specialists. Finally, community-accessible chemical libraries are needed. The National Cancer Institute (NCI) provides calculated structures for about 140,000 of its compounds, and will provide at least some of these for experimental testing (http://cactus.nci.nih.gov/). MDL Inc. sells the Available Chemicals Directory (ACD; http://www.mdl.com/products/experiment/available_chem_dir/index.jsp) of commercially available compounds and the ACD-SC for screening collections. To use these libraries in docking screens, molecular properties such as protonation, charge, stereochemistry, accessible conformations and solvation must be calculated. Even details such as stereochemistry, tautomerization and protonation, which we frequently take for granted, are often ambiguous, or can change on binding to a receptor. Recently, about one million commercially accessible molecules have become available through the ZINC database (http://blaster.docking.org/zinc/). ZINC is a free, web-accessible database constructed with docking, substructure searching and compound purchasing in mind. In the immediate future, virtual screening is mature enough to benefit from an aggressive programme of experimental testing. As more docking predictions are evaluated, and sometimes falsified, the methods will improve, especially if care is taken to remove the false-positives that have plagued both high-throughput and virtual screening27. 
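The bookkeeping behind such an evaluation is simple. As a minimal sketch, the Python below computes the hit rate as defined earlier (compounds that bind at a particular concentration divided by compounds experimentally tested) and the fold difference between two screens. The class and function names are hypothetical, and the counts in the usage example are invented placeholders, not results from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenResult:
    """Outcome of experimentally testing compounds selected by one screen."""

    name: str
    compounds_tested: int
    compounds_binding: int  # binders at the chosen concentration threshold

    def __post_init__(self) -> None:
        if self.compounds_tested <= 0:
            raise ValueError("compounds_tested must be positive")
        if not 0 <= self.compounds_binding <= self.compounds_tested:
            raise ValueError("compounds_binding must lie in [0, compounds_tested]")

    @property
    def hit_rate(self) -> float:
        """Binders divided by compounds experimentally tested."""
        return self.compounds_binding / self.compounds_tested


def fold_enrichment(focused: ScreenResult, baseline: ScreenResult) -> float:
    """Ratio of the two hit rates; undefined when the baseline found nothing."""
    if baseline.hit_rate == 0:
        raise ValueError("baseline hit rate is zero; fold enrichment is undefined")
    return focused.hit_rate / baseline.hit_rate


if __name__ == "__main__":
    # Placeholder counts for illustration only (not taken from the literature).
    docking = ScreenResult("docking", compounds_tested=200, compounds_binding=20)
    empirical = ScreenResult("empirical", compounds_tested=100_000, compounds_binding=100)
    for screen in (docking, empirical):
        print(f"{screen.name}: hit rate = {screen.hit_rate:.4%}")
    print(f"fold enrichment = {fold_enrichment(docking, empirical):.1f}")
```

A hit rate is only comparable between screens when both use the same concentration threshold, and it says nothing about false-negatives, so a higher rate from one technique does not imply that it found everything the other did.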
Subsequent solution of receptor–ligand complex structures will be particularly informative; so far, too few of these have been determined.

For those who can tolerate its false-positives, structure-based virtual screening is reliable enough to justify its use in active ligand discovery projects, providing an important complementary approach to empirical screening. For some projects, especially those centred in academic laboratories, virtual screening will be the best way to access a large chemical space without the commitment in time, material and infrastructure that an empirical screen demands.

Acknowledgments

I thank G. Klebe, A. Olson, and W. Jorgensen for contributing figures and comments, and I. D. Kuntz, M. Jacobson, A. Sali, K. Dill and J. Irwin for many insightful conversations. My laboratory’s research in docking is supported by NIGMS.

Footnotes

Competing interests statement: The author declares competing financial interests: details accompany the paper on www.nature.com/nature.

References

1. Beddell CR, Goodford PJ, Norrington FE, Wilkinson S, Wootton R. Compounds designed to fit a site of known structure in human haemoglobin. Br J Pharmacol. 1976;57:201–209.
2. Cohen SS. A strategy for the chemotherapy of infectious disease. Science. 1977;197:431–432.
3. von Itzstein M, et al. Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature. 1993;363:418–423.
4. Varney MD, et al. Crystal-structure-based design and synthesis of Benz[cd]indole-containing inhibitors of thymidylate synthase. J Med Chem. 1992;35:663–676.
5. Kuntz ID. Structure-based strategies for drug design and discovery. Science. 1992;257:1078–1082.
6. Jorgensen WL. The many roles of computation in drug discovery. Science. 2004;303:1813–1818.
7. Stahura FL, Bajorath J. Virtual screening methods that complement HTS. Comb Chem High Throughput Screen. 2004;7:259–269.
8. Perutz MF. The hemoglobin molecule. Sci Am. 1964;211:64–76.
9. van Gunsteren WF, Berendsen HJC. Computer simulation of molecular dynamics: methodology, applications, and perspectives in chemistry. Angew Chem Int Ed Engl. 1990;29:992–1023.
10. Rizzo R, Wang D, Tirado-Rives J, Jorgensen W. Validation of a model for the complex of HIV-1 reverse transcriptase with sustiva through computation of resistance profiles. J Am Chem Soc. 2000;122:12898–12900.
11. Rosenfeld RJ, et al. Automated docking of ligands to an artificial active site: augmenting crystallographic analysis with computer modeling. J Comput Aided Mol Des. 2003;17:525–536.
12. Brik A, et al. Rapid diversity-oriented synthesis in microtiter plates for in situ screening of HIV protease inhibitors. Chembiochem. 2003;4:1246–1248.
13. Schapira M, et al. Discovery of diverse thyroid hormone receptor antagonists by high-throughput docking. Proc Natl Acad Sci USA. 2003;100:7354–7359.
14. Evers A, Klebe G. Ligand-supported homology modeling of G-protein-coupled receptor sites: models sufficient for successful virtual screening. Angew Chem Int Ed Engl. 2004;43:248–251.
15. Shoichet BK, McGovern SL, Wei BI, Irwin JJ. Lead discovery using molecular docking. Curr Opin Chem Biol. 2002;6:439–446.
16. Schneidman-Duhovny D, Nussinov R, Wolfson HJ. Predicting molecular interactions in silico: II. Protein-protein and protein-drug docking. Curr Med Chem. 2004;11:91–107.
17. Wyss PC, et al. Novel dihydrofolate reductase inhibitors. Structure-based versus diversity-based library design and high-throughput synthesis and screening. J Med Chem. 2003;46:2304–2312.
18. Kick EK, et al. Structure-based design and combinatorial chemistry yield low nanomolar inhibitors of cathepsin D. Chem Biol. 1997;4:297–307.
19. Doman TN, et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J Med Chem. 2002;45:2213–2221.
20. Paiva AM, et al. Inhibitors of dihydrodipicolinate reductase, a key enzyme of the diaminopimelate pathway of Mycobacterium tuberculosis. Biochim Biophys Acta. 2001;1545:67–77.
21. Gradler U, et al. A new target for shigellosis: rational design and crystallographic studies of inhibitors of tRNA-guanine transglycosylase. J Mol Biol. 2001;306:455–467.
22. Powers RA, Morandi F, Shoichet BK. Structure-based discovery of a novel, noncovalent inhibitor of AmpC beta-lactamase. Structure (Camb). 2002;10:1013–1023.
23. Gruneberg S, Stubbs MT, Klebe G. Successful virtual screening for novel inhibitors of human carbonic anhydrase: strategy and experimental confirmation. J Med Chem. 2002;45:3588–3602.
24. Wei BQ, Baase WA, Weaver LH, Matthews BW, Shoichet BK. A model binding site for testing scoring functions in molecular docking. J Mol Biol. 2002;322:339–355.
25. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 1997;23:3–25.
26. Oprea TI. Current trends in lead discovery: are we looking for the appropriate properties? Mol Divers. 2002;5:199–208.
27. McGovern SL, Caselli E, Grigorieff N, Shoichet BK. A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. J Med Chem. 2002;45:1712–1722.
28. Krämer O, Hazemann I, Podjarny AD, Klebe G. Virtual screening for inhibitors of human aldose reductase. Proteins. 2004;55:814–823.
29. Horn JR, Shoichet BK. Allosteric inhibition through core disruption. J Mol Biol. 2004;336:1283–1291.
30. Kaiser J. NIH Gears up for chemical genomics. Science. 2004;304:1728.
31. Kalyanaraman C, Bernacki K, Jacobson MP. Virtual screening against highly charged active sites: identifying substrates of alpha-beta barrel enzymes. Biochemistry. In the press.
32. Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A. MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res. 2002;30:255–259.