DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion

Genes (Basel). 2021 Feb 28;12(3):354. doi: 10.3390/genes12030354.

Abstract

As a prevalent post-transcriptional RNA modification, N6-methyladenosine (m6A) plays a crucial role in various biological processes. Accurate genome-wide identification of m6A sites is vital for revealing its regulatory mechanisms and for providing new insights into drug design. Because traditional experimental methods are time-consuming and costly, more efficient computational methods for detecting m6A sites are needed. In this study, we propose DNN-m6A, a novel cross-species computational method based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are then fused to construct the initial feature vector set. Second, an elastic net is used to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization on the selected feature subset. Five-fold cross-validation on the training datasets showed that DNN-m6A outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%-83.38% and an area under the curve (AUC) of 81.39%-91.04%. On the independent datasets, DNN-m6A achieved an ACC of 72.95%-83.04% and an AUC of 80.79%-91.09%, indicating the excellent generalization ability of the proposed method.
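The first step, extracting several sequence encodings and fusing them, can be sketched in Python for four of the eight encodings named above (BE, NCP, TNC and KSNPF). This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the NCP assignment A=(1,1,1), C=(0,1,0), G=(1,0,0), U=(0,0,1) is the scheme commonly used in m6A predictors and is assumed here, the gap range `k_max=3` is an arbitrary choice, and "fusion" is assumed to mean simple concatenation. ENAC, PseDNC, PSNP and PSDP are omitted.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Binary encoding (BE): each nucleotide becomes a 4-bit one-hot vector.
ONE_HOT = {n: [1 if n == m else 0 for m in NUCLEOTIDES] for n in NUCLEOTIDES}

# Nucleotide chemical property (NCP), ASSUMED assignment:
# (ring structure, hydrogen bond, functional group) for each nucleotide.
NCP = {"A": [1, 1, 1], "C": [0, 1, 0], "G": [1, 0, 0], "U": [0, 0, 1]}


def binary_encoding(seq):
    """Concatenate the one-hot vectors of every position (4 * len(seq) values)."""
    return [bit for n in seq for bit in ONE_HOT[n]]


def ncp_encoding(seq):
    """Concatenate the 3 chemical-property bits of every position (3 * len(seq) values)."""
    return [v for n in seq for v in NCP[n]]


def tnc(seq):
    """Tri-nucleotide composition: normalized counts of the 64 possible 3-mers."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - 2
    for i in range(total):
        counts[seq[i:i + 3]] += 1
    return [counts[k] / total for k in kmers]


def ksnpf(seq, k_max=3):
    """K-spaced nucleotide pair frequencies: for each gap k in 0..k_max, the
    normalized counts of the 16 pairs (seq[i], seq[i + k + 1]).
    Requires len(seq) >= k_max + 2 so every gap has at least one pair."""
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(seq) - k - 1
        for i in range(total):
            counts[seq[i] + seq[i + k + 1]] += 1
        features.extend(counts[p] / total for p in pairs)
    return features


def fuse(seq):
    """Fuse the encodings into one feature vector by concatenation (assumed fusion rule)."""
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    return binary_encoding(seq) + ncp_encoding(seq) + tnc(seq) + ksnpf(seq)
```

For a sequence of length L, `fuse` returns 4L + 3L + 64 + 16(k_max + 1) values; the position-specific encodings (BE, NCP) grow with L, while the composition encodings (TNC, KSNPF) have a fixed size.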
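The elastic-net selection step can be illustrated with a small from-scratch coordinate-descent solver. The sketch assumes the objective parameterization used by scikit-learn's `ElasticNet` (`alpha`, `l1_ratio`), treats the 0/1 site labels as a regression target, runs a fixed number of sweeps instead of checking convergence, and keeps every feature whose coefficient is nonzero. These choices, the default parameter values, and the hypothetical `elastic_net`/`select_features` helpers are illustrative assumptions, not the paper's settings.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |w|: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent minimizing
        (1 / (2n)) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2.
    Assumes the columns of X and y are already centered (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # r = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    l1 = alpha * l1_ratio
    l2 = alpha * (1 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep r = y - Xw up to date
                w[j] = new_wj
    return w


def select_features(X, y, **kwargs):
    """Return the indices of features with nonzero coefficients, plus the coefficients."""
    w = elastic_net(X, y, **kwargs)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return selected, w
```

The soft-thresholding step sets weak coefficients exactly to zero, which is what makes the fit usable as a feature filter; a larger `alpha` (or an `l1_ratio` closer to 1) removes more features. The L2 term tends to keep groups of correlated features together rather than arbitrarily picking one, the usual motivation for elastic net over a pure lasso.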
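The evaluation protocol (five-fold cross-validation reporting ACC and AUC) can be sketched as follows. The abstract does not state whether the folds are stratified or how scores are thresholded for ACC, so the class-stratified split, the 0.5 decision threshold, and the hypothetical `cross_validate` helper that takes a `train_and_score` callable are all assumptions. AUC is computed as the probability that a randomly chosen positive scores above a randomly chosen negative, which equals the area under the ROC curve.

```python
import random


def stratified_folds(y, n_folds=5, seed=0):
    """Split sample indices into n_folds folds, dealing each class out round-robin
    so every fold keeps roughly the original positive/negative ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[rank % n_folds].append(i)
    return folds


def accuracy(y_true, y_pred):
    """ACC: fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, train_and_score, n_folds=5, threshold=0.5, seed=0):
    """Mean ACC and AUC over the folds. train_and_score(X_train, y_train, X_test)
    must return one score in [0, 1] per test sample. Each class needs at least
    n_folds samples so that every test fold contains both classes."""
    folds = stratified_folds(y, n_folds, seed)
    accs, aucs = [], []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        scores = train_and_score([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        accs.append(accuracy(y_test, [1 if s >= threshold else 0 for s in scores]))
        aucs.append(auc(y_test, scores))
    return sum(accs) / n_folds, sum(aucs) / n_folds
```

Any classifier, a DNN included, can be plugged in as `train_and_score`. As a sanity check, with `y = [0, 1] * 10` and a scorer that simply returns a feature equal to the label, `cross_validate([[float(t)] for t in y], y, lambda X_tr, y_tr, X_te: [x[0] for x in X_te])` returns `(1.0, 1.0)`.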

Keywords: Bayesian hyper-parameter optimization; N6-methyladenosine sites; deep neural network; elastic net; multi-information fusion.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine / analogs & derivatives*
  • Animals
  • Humans
  • Mice
  • Neural Networks, Computer*
  • RNA / genetics*
  • Sequence Analysis, RNA*

Substances

  • RNA
  • N-methyladenosine
  • Adenosine