RNA. 2017 Mar;23(3):270-283. doi: 10.1261/rna.059105.116. Epub 2016 Dec 19.

A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

Cenik C1,2, Chua HN3,4,5, Singh G2,6,7,8, Akef A9, Snyder MP1, Palazzo AF9, Moore MJ10,7,8, Roth FP11,4,12,13.

Author information

1. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
2. Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
3. Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto M5S 3E1, Ontario, Canada.
4. Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Toronto M5G 1X5, Ontario, Canada.
5. DataRobot, Inc., Boston, Massachusetts 02109, USA.
6. Department of Molecular Genetics, Center for RNA Biology, The Ohio State University, Columbus, Ohio 43210, USA.
7. Howard Hughes Medical Institute, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
8. RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
9. Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada.
10. Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA; Melissa.Moore@umassmed.edu, Fritz.Roth@utoronto.ca.
11. Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto M5S 3E1, Ontario, Canada; Melissa.Moore@umassmed.edu, Fritz.Roth@utoronto.ca.
12. Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston 02215, Massachusetts, USA.
13. The Canadian Institute for Advanced Research, Toronto M5G 1Z8, Ontario, Canada.

Abstract

Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.
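
The abstract does not say which sequence features the classifier uses, so the following is only an illustrative sketch of how early-coding-region features might be prepared for a classifier such as a random forest. The function name `early_coding_kmer_features`, the 99-nucleotide window, and the choice of 3-mer frequencies are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from itertools import product


def early_coding_kmer_features(cds, window=99, k=3):
    """Return k-mer frequencies over the first `window` nt of a coding sequence.

    Hypothetical helper: the window size and the use of k-mers are
    assumptions for illustration, not details taken from the paper.
    """
    # Normalize case and treat RNA (U) and DNA (T) input the same way.
    region = cds.upper().replace("U", "T")[:window]
    # Count every overlapping k-mer in the early coding region.
    counts = Counter(region[i:i + k] for i in range(len(region) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    # Emit a fixed-length feature vector over all 4**k unambiguous k-mers;
    # k-mers containing other symbols (e.g. N) count toward the total
    # but get no feature of their own.
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) / total for kmer in kmers}


features = early_coding_kmer_features("ATGGCCGCC", window=9, k=3)
```

Each transcript would yield one such vector, and the vectors, paired with known 5'UTR intron status labels, could then be passed to any standard classifier implementation.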

KEYWORDS:

5′-UTR introns; N1-methyladenosine; exon junction complex; random forest

PMID: 27994090
PMCID: PMC5311483
DOI: 10.1261/rna.059105.116
[Indexed for MEDLINE]
Free PMC Article
