A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases

J Biomed Inform. 2020 Nov;111:103579. doi: 10.1016/j.jbi.2020.103579. Epub 2020 Sep 30.

Abstract

Biomedical literature contains rich but unstructured information about proteins, ligands, and diseases, as well as the biological pathways in which they are involved. Systematically analyzing such a textual corpus could lead to the discovery of new protein-protein interactions and hidden drug indications. To this end, we investigated a methodology that applies a well-established text mining tool, Word2Vec, to PubMed full-text articles to derive word embeddings, and then uses a simple semantic similarity comparison, either alone or in conjunction with the k-nearest neighbor (kNN) technique, to predict new relationships. To test this methodology, we conducted three lines of retrospective analysis on a dataset of known P53-interacting proteins. First, we demonstrated that Word2Vec semantic similarity can infer functional relatedness among all kinases known to interact with P53. Second, in a series of time-split experiments, we demonstrated that both a simple similarity comparison and kNN models built with papers published up to a certain year were able to discover P53 interactors described in later publications. Third, in a different time-split scenario, we examined the predictions of P53-interacting proteins made by kNN models built on data prior to a given split year over different time ranges past that year, and found that the cumulative number of correct predictions increased with time. We conclude that text mining of PubMed research papers with Word2Vec, followed by a simple similarity comparison or kNN modeling, affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as repurposing of existing drugs, drug-drug interaction, and elucidation of drugs' mechanisms of action.
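The three ingredients named in the abstract (a similarity comparison against the P53 term, a kNN vote, and a time-split count of later-confirmed hits) can be illustrated with a minimal sketch. It assumes the Word2Vec embeddings have already been trained and exported as plain vectors; the toy vectors, the placeholder protein names (`kinase_a` and so on), the unweighted kNN vote, and the helper names are all hypothetical illustrations, not the authors' published implementation.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "p53": [1.0, 0.0, 0.0],
    "kinase_a": [0.9, 0.1, 0.0],
    "kinase_b": [0.8, 0.3, 0.1],
    "kinase_c": [0.1, 0.9, 0.2],
    "kinase_d": [0.0, 0.2, 0.9],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, the usual semantic similarity measure for word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: str, candidates: List[str],
                       emb: Dict[str, List[float]]) -> List[str]:
    """Simple similarity comparison: order candidates by cosine similarity to the query term."""
    return sorted(candidates, key=lambda c: cosine(emb[query], emb[c]), reverse=True)


def knn_score(candidate: str, labeled: Dict[str, bool],
              emb: Dict[str, List[float]], k: int = 3) -> float:
    """Fraction of the candidate's k most similar labeled proteins that are known
    interactors (an assumed unweighted kNN vote)."""
    neighbours = sorted(
        (p for p in labeled if p != candidate),
        key=lambda p: cosine(emb[candidate], emb[p]),
        reverse=True,
    )[:k]
    if not neighbours:
        return 0.0
    return sum(labeled[p] for p in neighbours) / len(neighbours)


def cumulative_hits(predicted: Set[str], first_reported: Dict[str, int],
                    split_year: int, last_year: int) -> List[int]:
    """Time-split check: running count of predictions first reported as interactors
    in each year after split_year (hypothetical bookkeeping)."""
    counts, total = [], 0
    for year in range(split_year + 1, last_year + 1):
        total += sum(1 for p in predicted if first_reported.get(p) == year)
        counts.append(total)
    return counts


if __name__ == "__main__":
    print(rank_by_similarity("p53", ["kinase_c", "kinase_a", "kinase_d", "kinase_b"],
                             TOY_EMBEDDINGS))
    print(knn_score("kinase_b", {"kinase_a": True, "kinase_c": False, "kinase_d": False},
                    TOY_EMBEDDINGS, k=2))
    print(cumulative_hits({"kinase_a", "kinase_b"},
                          {"kinase_a": 2016, "kinase_b": 2018, "kinase_c": 2017},
                          split_year=2015, last_year=2018))
```

With these toy vectors the similarity ranking is `['kinase_a', 'kinase_b', 'kinase_c', 'kinase_d']`, and the cumulative count over 2016 to 2018 is `[1, 1, 2]`, mirroring the kind of "cumulative correct predictions increasing with time" measurement described above. Real embeddings would come from a trained model, and the kNN weighting and choice of k would need to follow the paper's actual settings.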

Keywords: Drug repurposing; Protein protein interaction; Semantic similarity; Text mining; Word2Vec.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining
  • Protein Interaction Maps*
  • PubMed
  • Retrospective Studies
  • Semantics*
  • Tumor Suppressor Protein p53*

Substances

  • Tumor Suppressor Protein p53