Nat Biotechnol. 2015 Dec;33(12):1242-1249. doi: 10.1038/nbt.3343. Epub 2015 Nov 16.

Affinity regression predicts the recognition code of nucleic acid-binding proteins.

Author information

Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
Tri-I Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, New York, USA.
Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
Donnelly Centre, University of Toronto, Toronto, ON, Canada.


Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a challenging problem. We present a statistical approach for learning the recognition code of a family of transcription factors or RNA-binding proteins (RBPs) from high-throughput binding data. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete data to learn an interaction model between proteins and nucleic acids using only protein domain and probe sequences as inputs. When trained on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, when trained on RNAcompete profiles for diverse RBPs, our model correctly predicts the binding affinities of held-out proteins and identifies key RNA-binding residues, despite the high level of sequence divergence across RBPs. We expect that the method will be broadly applicable to modeling and predicting paired macromolecular interactions in settings where high-throughput affinity data are available.
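
The abstract does not spell out the form of the interaction model, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a bilinear model Y ≈ D W Pᵀ, where D holds k-mer counts of probe sequences, P holds k-mer counts of protein sequences, Y holds binding intensities (probes by proteins), and W is the learned interaction matrix. The function names (`kmer_counts`, `fit_bilinear`, `predict_profile`), the choice of k, the synthetic data, and the plain least-squares fit via pseudoinverses are all assumptions made for illustration.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_counts(seq, alphabet, k):
    """Count occurrences of every k-mer over `alphabet` in `seq`."""
    kmers = ["".join(t) for t in itertools.product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing unknown symbols
            counts[j] += 1
    return counts

def fit_bilinear(D, P, Y):
    """Minimum-norm least-squares W for Y ~ D @ W @ P.T (assumed model)."""
    return np.linalg.pinv(D) @ Y @ np.linalg.pinv(P).T

def predict_profile(D, W, p_new):
    """Predicted probe intensities for one protein feature vector."""
    return D @ W @ p_new

# Synthetic toy data (hypothetical; no real PBM or RNAcompete values).
rng = np.random.default_rng(0)
probes = ["".join(rng.choice(list("ACGT"), size=20)) for _ in range(50)]
proteins = ["".join(rng.choice(list(AMINO_ACIDS), size=30)) for _ in range(8)]

D = np.array([kmer_counts(s, "ACGT", 2) for s in probes])         # 50 x 16
P = np.array([kmer_counts(s, AMINO_ACIDS, 1) for s in proteins])  # 8 x 20

# Noise-free intensities generated from a known interaction matrix.
W_true = rng.normal(size=(D.shape[1], P.shape[1]))
Y = D @ W_true @ P.T                                              # 50 x 8

W_hat = fit_bilinear(D, P, Y)
recon = D @ W_hat @ P.T

# Predict a binding profile for a protein not used in fitting.
held_out = "".join(rng.choice(list(AMINO_ACIDS), size=30))
y_pred = predict_profile(D, W_hat, kmer_counts(held_out, AMINO_ACIDS, 1))
```

With far fewer proteins than protein features, W is not uniquely determined, so a practical fit would presumably need regularization or dimensionality reduction; the sketch only shows how protein and probe sequence features can be tied together through a single interaction matrix.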
