PLoS Comput Biol. 2019 Oct 9;15(10):e1007309. doi: 10.1371/journal.pcbi.1007309. eCollection 2019 Oct.

miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.

Author information

1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, United States of America.
2 Departments of Clinical and Biomedical Sciences, College of Veterinary Medicine, Oregon State University, Corvallis, OR, United States of America.
3 Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, OR, United States of America.
4 Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, United States of America.

Abstract

MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and remain incomplete, largely because of the challenges of identifying valid miRs supported by few reads, correctly locating hairpin precursors, and balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and which has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans and balances sensitivity and specificity for greater overall prediction accuracy: it recalls an average of 10% more annotated microRNAs and correctly predicts substantially more microRNAs supported by only one read. We applied this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from cat muscle and skin, from 10 cow tissues, and from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support an expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome.
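
The abstract states that miRWoods was tuned to optimize the harmonic mean of precision and recall, a quantity commonly called the F1 score. The following is a minimal sketch of how that quantity is computed from prediction counts. It is not taken from miRWoods itself: the function names and the example counts are hypothetical and chosen purely for illustration.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not results from the paper:
# 80 annotated microRNAs predicted correctly, 20 false predictions,
# 40 annotated microRNAs missed.
p, r = precision_recall(tp=80, fp=20, fn=40)
score = f1_score(p, r)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a predictor cannot score well by maximizing recall at the expense of precision, or the reverse, which is why it is a natural target when the goal is to balance the two.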
