On docking, scoring and assessing protein-DNA complexes in a rigid-body framework

Marc Parisien; Karl F Freed; Tobin R Sosnick

doi:10.1371/journal.pone.0032647

PLoS One. 2012;7(2):e32647. doi: 10.1371/journal.pone.0032647. Epub 2012 Feb 29.

Authors

Marc Parisien¹, Karl F Freed, Tobin R Sosnick

Affiliation

¹ Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America.

Abstract

We consider the identification of interacting protein-nucleic acid partners using the rigid-body docking method FTdock, which is systematic and exhaustive in its exploration of docking conformations. The accuracy of rigid-body docking methods is tested using known protein-DNA complexes for which both the docked and undocked structures are available. Additional tests with large decoy sets probe the efficacy of two published, statistically derived scoring functions that contain a very large number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can greatly reduce the number of parameters required, identifying the relevant docking features with a minuscule fraction of the parameters used in the prior works. The present machine learning study considers a 300-dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of interaction between a protein amino acid and a nucleic acid base. The CCP is designed to capture the chemical complementarities of the interface and is well suited to machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), defined as the angle between the native system's CCP vector and the decoy's CCP vector, which serves as a substitute for the more commonly used root-mean-square deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions and then through direct contacts, according to specific preferences for either the major or minor groove of the DNA.
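
The abstract defines the CCD as the angle between two CCP vectors. A minimal sketch of that calculation is given here for illustration only. The function name `ccd_angle`, the use of degrees, and the optional per-dimension `weights` argument are assumptions and are not taken from the paper. In particular, the abstract does not say how the dimension weights enter the scoring function, so the weighted inner product below is just one plausible choice. The toy vectors have 4 dimensions rather than 300.

```python
import math


def ccd_angle(native_ccp, decoy_ccp, weights=None):
    """Angle in degrees between two CCP vectors.

    `weights` is a hypothetical per-dimension, non-negative weighting
    applied through a weighted inner product (an assumption, not the
    paper's stated method). With weights=None all dimensions count equally.
    """
    if len(native_ccp) != len(decoy_ccp):
        raise ValueError("CCP vectors must have the same number of dimensions")
    if weights is None:
        weights = [1.0] * len(native_ccp)
    if len(weights) != len(native_ccp):
        raise ValueError("weights must match the CCP dimensionality")

    # Weighted inner product and weighted norms of the two vectors.
    dot = sum(w * a * b for w, a, b in zip(weights, native_ccp, decoy_ccp))
    norm_native = math.sqrt(sum(w * a * a for w, a in zip(weights, native_ccp)))
    norm_decoy = math.sqrt(sum(w * b * b for w, b in zip(weights, decoy_ccp)))
    if norm_native == 0.0 or norm_decoy == 0.0:
        raise ValueError("angle is undefined for a zero-length vector")

    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_native * norm_decoy)))
    return math.degrees(math.acos(cosine))


# Toy 4-dimensional profiles (made-up numbers, not data from the paper).
native = [3.0, 0.0, 1.0, 2.0]
decoy = [1.0, 2.0, 0.0, 2.0]
print(ccd_angle(native, decoy))
print(ccd_angle(native, decoy, weights=[1.0, 0.0, 1.0, 1.0]))
```

Because the measure is an angle, it depends only on the direction of the profile and not on its overall magnitude: a decoy whose CCP is a scaled copy of the native CCP would have a discrepancy of zero under this sketch.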

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Artificial Intelligence
Computational Biology / methods
DNA / chemistry*
DNA / genetics
Databases, Protein
Genetic Vectors
Hydrogen Bonding
Models, Statistical
Nucleic Acid Conformation
Protein Binding
Proteins / chemistry*
Reproducibility of Results

Substances

Proteins
DNA
