Improving the generalizability of protein-ligand binding predictions with AI-Bind

Ayan Chatterjee; Robin Walters; Zohair Shafi; Omair Shafi Ahmed; Michael Sebek; Deisy Gysi; Rose Yu; Tina Eliassi-Rad; Albert-László Barabási; Giulia Menichetti

doi:10.1038/s41467-023-37572-z

Nat Commun. 2023 Apr 8;14(1):1989. doi: 10.1038/s41467-023-37572-z.

Authors

Ayan Chatterjee¹, Robin Walters², Zohair Shafi², Omair Shafi Ahmed², Michael Sebek¹,³, Deisy Gysi¹,³,⁴, Rose Yu⁵, Tina Eliassi-Rad¹,²,⁶,⁷, Albert-László Barabási¹,³,⁸, Giulia Menichetti⁹,¹⁰,¹¹

Affiliations

¹ Network Science Institute, Northeastern University, Boston, MA, USA.
² Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
³ Department of Physics, Northeastern University, Boston, MA, USA.
⁴ Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
⁵ Department of Computer Science and Engineering, University of California, San Diego, CA, USA.
⁶ Santa Fe Institute, Santa Fe, NM, USA.
⁷ The Institute for Experiential AI, Northeastern University, Boston, MA, USA.
⁸ Department of Network and Data Science, Central European University, Budapest, Hungary.
⁹ Network Science Institute, Northeastern University, Boston, MA, USA. giulia.menichetti@channing.harvard.edu.
¹⁰ Department of Physics, Northeastern University, Boston, MA, USA. giulia.menichetti@channing.harvard.edu.
¹¹ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA. giulia.menichetti@channing.harvard.edu.

Abstract

Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identify drug-target combinations with the potential of becoming a powerful tool in drug discovery.
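
The topological shortcut described in the abstract can be illustrated with a minimal sketch. It is not the AI-Bind pipeline or any of the models the authors evaluate. It assumes one simple form such a shortcut could take: scoring a protein-ligand pair only from the fraction of binding annotations each node has in the training network, with no use of amino acid sequence or chemical structure. The function names, the toy data, and the 0.5 default for unseen nodes are all hypothetical.

```python
from collections import defaultdict


def degree_ratios(pairs):
    """Map each node to the fraction of its annotated pairs labeled as binding."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for protein, ligand, label in pairs:
        # Prefix node names so a protein and a ligand can never collide.
        for node in (("P", protein), ("L", ligand)):
            totals[node] += 1
            positives[node] += label
    return {node: positives[node] / totals[node] for node in totals}


def shortcut_score(ratios, protein, ligand, default=0.5):
    """Score a pair from annotation counts alone, ignoring all node features."""
    p = ratios.get(("P", protein), default)  # unseen protein: uninformative default
    l = ratios.get(("L", ligand), default)   # unseen ligand: uninformative default
    return (p + l) / 2


# Hypothetical toy network: P1 only has binding annotations, P2 only non-binding.
train = [
    ("P1", "L1", 1), ("P1", "L2", 1), ("P1", "L3", 1),
    ("P2", "L1", 0), ("P2", "L2", 0), ("P2", "L4", 0),
]
ratios = degree_ratios(train)

print(shortcut_score(ratios, "P1", "L9"))  # known protein, unseen ligand
print(shortcut_score(ratios, "P2", "L9"))  # known protein, unseen ligand
print(shortcut_score(ratios, "P9", "L9"))  # both nodes unseen
```

In this toy setting the score for a known protein simply echoes that protein's annotation history, while a pair of never-before-seen nodes falls back to the default. A predictor of this kind can appear accurate on test pairs drawn from the same network yet carries no information about novel proteins or ligands, which is the failure mode the abstract describes.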

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Binding Sites
Ligands
Protein Binding
Proteins* / metabolism

Substances

Ligands
Proteins

Grants and funding

P01 HL132825/HL/NHLBI NIH HHS/United States