A generative model for constructing nucleic acid sequences binding to a protein

BMC Genomics. 2019 Dec 27;20(Suppl 13):967. doi: 10.1186/s12864-019-6299-4.

Abstract

Background: Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. The large amount of interaction data generated by high-throughput technologies has triggered the development of several computational methods, either to predict binding sites in a sequence or to determine whether a pair of sequences interacts. Most of these methods treat the interaction of nucleic acids with proteins as a classification problem rather than a generation problem.

Results: We developed a generative model for constructing single-stranded nucleic acids binding to a target protein using a long short-term memory (LSTM) neural network. Experimental results of the generative model are promising in the sense that DNA and RNA sequences generated by the model for several target proteins show high specificity and that motifs present in the generated sequences are similar to known protein-binding motifs.
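To make the idea of LSTM-based sequence generation concrete, the following is a minimal sketch of autoregressive sampling from a single LSTM cell, written in plain Python. It is not the authors' implementation: the weights are random and untrained, biases are omitted for brevity, and the names `init_params`, `lstm_step`, and `generate`, as well as the hidden size and seed base, are hypothetical choices made for illustration. A real model would learn its weights from sequences known to bind the target protein.

```python
import math
import random

ALPHABET = "ACGT"  # DNA alphabet; "ACGU" would be the RNA analogue


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Convert logits to probabilities, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(hidden_size, vocab_size, rng):
    """Random, UNTRAINED weights (assumption: stand-in for learned ones)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    input_size = vocab_size + hidden_size  # one-hot base + previous hidden state
    return {
        # One weight matrix per gate: input, forget, output, candidate.
        "gates": {name: matrix(hidden_size, input_size) for name in "ifog"},
        # Projection from hidden state to next-base logits.
        "out": matrix(vocab_size, hidden_size),
    }


def lstm_step(params, x, h, c):
    """One LSTM cell update (biases omitted for brevity)."""
    z = x + h  # concatenate input and previous hidden state
    i = [sigmoid(a) for a in matvec(params["gates"]["i"], z)]
    f = [sigmoid(a) for a in matvec(params["gates"]["f"], z)]
    o = [sigmoid(a) for a in matvec(params["gates"]["o"], z)]
    g = [math.tanh(a) for a in matvec(params["gates"]["g"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def generate(params, length, hidden_size, rng, seed_base="A"):
    """Sample a sequence one base at a time, feeding each base back in."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    seq = [seed_base]
    while len(seq) < length:
        x = [1.0 if base == seq[-1] else 0.0 for base in ALPHABET]  # one-hot
        h, c = lstm_step(params, x, h, c)
        probs = softmax(matvec(params["out"], h))
        seq.append(rng.choices(ALPHABET, weights=probs, k=1)[0])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(hidden_size=8, vocab_size=len(ALPHABET), rng=rng)
    print(generate(params, length=20, hidden_size=8, rng=rng))
```

Because the weights are untrained, the sampled sequences carry no binding information; the sketch only shows the generation loop, in which each sampled base becomes the next input to the cell.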

Conclusions: Although these are preliminary results of our ongoing research, our approach can be used to generate nucleic acid sequences binding to a target protein. In particular, it will help design efficient in vitro experiments by constructing an initial pool of potential aptamers that bind to a target protein with high affinity and specificity.

Keywords: Aptamer; Protein-nucleic acid binding; Recurrent neural network.

MeSH terms

  • Algorithms
  • Aptamers, Nucleotide / chemistry
  • Aptamers, Nucleotide / metabolism
  • Base Sequence
  • DNA / metabolism*
  • Humans
  • Neural Networks, Computer*
  • Nucleic Acid Conformation
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism*
  • Transcription Factors / metabolism

Substances

  • Aptamers, Nucleotide
  • Proteins
  • Transcription Factors
  • DNA