Conjoint Feature Representation of GO and Protein Sequence for PPI Prediction Based on an Inception RNN Attention Network

Mol Ther Nucleic Acids. 2020 Aug 25:22:198-208. doi: 10.1016/j.omtn.2020.08.025. eCollection 2020 Dec 4.

Abstract

Protein-protein interactions (PPIs) are pivotal for cellular functions and biological processes. In the past years, computational methods using amino acid sequences and gene ontology (GO) annotations of proteins for prioritizing PPIs have provided important references for biological experiments in the wet lab. Despite the current success, sequence information and ontological annotation in semantic representation have not been integrated into current methods. We propose a deep-learning-based PPI prediction methodology conjointly featuring sequence information and GO annotation. First, we adopt a word-embedding tool, the NCBI-blueBERT model pre-trained on PubMed, to map the GO terms into their semantic vectors. Then, the GO semantic vectors and protein sequence vector serve as the input of the proposed inception recurrent neural network (RNN) attention network (IRAN). The IRAN captures the spatial relationship and the potential sequential feature of the protein sequence and ontological annotation semantics. The extensive experimental results on 12 benchmarks demonstrate that our method achieves superiority over state-of-the-art baselines. In the yeast dataset of a binary PPI prediction, our method improved the performance with the Matthews correlation coefficient increasing from 94.2% to 98.2% and the accuracy from 97.1% to 98.2%. The analogous results were also obtained in other comparison evaluations.

Keywords: GOSeqPPI; NCBI-blueBERT; biomedical research; inception RNN attention network; ontology semantic representation; protein-protein interaction.