Bioinformatics. 2019 Nov 1;35(22):4577-4585. doi: 10.1093/bioinformatics/btz283.

DeepPASTA: deep neural network based polyadenylation site analysis.

Author information

1 Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA.
2 Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095, USA.
3 Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA.
4 Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

Abstract

MOTIVATION:

Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA create multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). The sequence elements of a 3' UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3' UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites.

RESULTS:

In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. most frequently used) polyA site of a gene in a specific tissue, as well as the relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction.
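
To make the idea of combining sequence and secondary structure inputs concrete, the following is a minimal sketch of one common way to prepare such data for a neural network: one-hot encoding the nucleotides and a dot-bracket structure string, then joining the two per position. This is an illustration under stated assumptions, not DeepPASTA's actual preprocessing. The function names, the dot-bracket alphabet and the per-position concatenation are all hypothetical choices that the abstract does not specify.

```python
# Illustrative sketch only: the encoding scheme and every name below are
# assumptions, not taken from the DeepPASTA paper or its repository.

NUCLEOTIDES = "ACGU"       # assumed RNA alphabet
STRUCTURE_SYMBOLS = "(.)"  # assumed dot-bracket alphabet for secondary structure


def one_hot(text, alphabet):
    """Encode each character as a 0/1 list; unknown characters become all zeros."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def encode_example(sequence, structure):
    """Return one feature row per position: 4 nucleotide columns + 3 structure columns."""
    # Treat DNA-style input as RNA by mapping T to U.
    sequence = sequence.upper().replace("T", "U")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq_rows = one_hot(sequence, NUCLEOTIDES)
    struct_rows = one_hot(structure, STRUCTURE_SYMBOLS)
    # Concatenate the two encodings position by position.
    return [s + t for s, t in zip(seq_rows, struct_rows)]


if __name__ == "__main__":
    # A made-up hexamer and a made-up structure string, for demonstration only.
    for row in encode_example("AATAAA", "((..))"):
        print(row)
```

A model could equally feed the two encodings into separate input branches rather than concatenating them; the abstract does not say which design DeepPASTA uses.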

AVAILABILITY AND IMPLEMENTATION:

https://github.com/arefeen/DeepPASTA.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 31081512
PMCID: PMC6853695 [Available on 2020-11-01]
DOI: 10.1093/bioinformatics/btz283
