PostDOCK: a structural, empirical approach to scoring protein ligand complexes

J Med Chem. 2005 Nov 3;48(22):6821-31. doi: 10.1021/jm0493360.

Abstract

In this work we introduce a postprocessing filter (PostDOCK) that distinguishes true binding ligand-protein complexes from docking artifacts produced by DOCK 4.0.1. PostDOCK is a pattern recognition system that relies on (1) a database of complexes, (2) biochemical descriptors of those complexes, and (3) machine learning tools. We use the Protein Data Bank (PDB) as the structural database of complexes and create diverse training and validation sets from it based on the "families of structurally similar proteins" (FSSP) hierarchy. For the biochemical descriptors, we consider terms from the DOCK score, empirical scoring, and buried solvent accessible surface area. For the machine learners, we use a random forest classifier and logistic regression. Our results were obtained on a test set of 44 structurally diverse protein targets. Our highest performing descriptor combinations obtained approximately 19-fold enrichment (39 of 44 binding complexes were correctly identified, while only 2 of 44 decoy complexes were accepted), and our best overall accuracy was 92%.
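The reported figures can be checked with a short calculation. The sketch below is illustrative only. It assumes that enrichment means the ratio of the true positive rate to the false positive rate, and that the 92% accuracy refers to the same 39-of-44 and 2-of-44 counts. The abstract states neither assumption explicitly, and the function names are hypothetical.

```python
def enrichment(true_pos, n_binders, false_pos, n_decoys):
    """Ratio of true positive rate to false positive rate (assumed definition)."""
    tpr = true_pos / n_binders   # fraction of true binders accepted
    fpr = false_pos / n_decoys   # fraction of decoys wrongly accepted
    return tpr / fpr


def accuracy(true_pos, n_binders, false_pos, n_decoys):
    """Fraction of all complexes (binders plus decoys) classified correctly."""
    true_neg = n_decoys - false_pos  # decoys correctly rejected
    return (true_pos + true_neg) / (n_binders + n_decoys)


# Counts taken from the abstract: 39 of 44 binders accepted, 2 of 44 decoys accepted.
fold = enrichment(39, 44, 2, 44)
acc = accuracy(39, 44, 2, 44)

print(f"enrichment: {fold:.1f}-fold")
print(f"accuracy:   {acc:.1%}")
```

Under these assumptions the calculation gives 19.5-fold enrichment and an accuracy of 81/88, or about 92%, both consistent with the values quoted above.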

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Ligands*
  • Logistic Models
  • Models, Molecular*
  • Protein Binding
  • Proteins / chemistry*
  • Quantitative Structure-Activity Relationship*

Substances

  • Ligands
  • Proteins