Trends Appl Sci Res. 2008 Dec 1;3(4):285-291.

On the Accuracy of Sequence-Based Computational Inference of Protein Residues Involved in Interactions with DNA.

Author information

Gen*NY*Sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, New York 12144, USA.


Methods for computational inference of DNA-binding residues in DNA-binding proteins are usually developed using classification techniques trained to distinguish binding from non-binding residues on the basis of known examples observed in experimentally determined high-resolution structures of protein-DNA complexes. The degree of accuracy that can be expected when a computational method is applied to a particular novel protein remains largely unknown. We test the utility of classification methods using Kernel Logistic Regression (KLR) predictors of DNA-binding residues as an example. We show that predictors that utilize sequence properties of proteins can successfully predict DNA-binding residues in proteins from a novel structural class. We use Multiple Linear Regression (MLR) to establish a quantitative relationship between protein properties and the expected accuracy of KLR predictors. The present results indicate that, for novel proteins, the expected accuracy provided by an MLR model is close to the actual accuracy and can be used to assess the overall confidence of the prediction.
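The MLR step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three per-protein features, the coefficients and the data are all synthetic and hypothetical, chosen only to show how an ordinary least-squares fit maps protein-level properties to an expected predictor accuracy.

```python
import numpy as np


def fit_mlr(features, accuracy):
    """Ordinary least-squares fit of accuracy ~ intercept + features @ weights."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, accuracy, rcond=None)
    return coef


def predict_mlr(coef, features):
    """Expected accuracy for each protein (one row of features per protein)."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# Synthetic stand-in data (hypothetical): 50 proteins, 3 protein-level properties.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(50, 3))
true_coef = np.array([0.6, 0.2, -0.1, 0.05])  # assumed, for illustration only
accuracy = true_coef[0] + features @ true_coef[1:] + rng.normal(0.0, 0.01, 50)

coef = fit_mlr(features, accuracy)
expected = predict_mlr(coef, features[:5])  # expected accuracy for 5 proteins
```

In a setting like the one the abstract describes, the fitted model would be applied to the properties of a novel protein, and the resulting expected accuracy compared with the accuracy actually observed for the KLR predictor.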
