PLoS One. 2012;7(12):e51673. doi: 10.1371/journal.pone.0051673. Epub 2012 Dec 27.

MiRmat: mature microRNA sequence prediction.

Author information

  • The State Key Laboratory of Pharmaceutical Biotechnology and Jiangsu Engineering Research Center for MicroRNA Biology and Biotechnology, School of Life Science, Nanjing University, Nanjing, China.

Abstract

BACKGROUND:

MicroRNAs are known to be generated from primary transcripts mainly through sequential cleavage by two enzymes, Drosha and Dicer. The sequence of a mature microRNA, especially its seed sequence, largely determines its binding ability and specificity to target mRNAs. Therefore, methods that predict mature microRNA sequences with high accuracy will benefit the identification and characterization of novel microRNAs and their targets, and will help infer post-transcriptional regulatory networks at a genome scale.
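To make the link between cleavage sites and the seed concrete, the sketch below cuts a mature 5p-arm microRNA out of a precursor given two cleavage positions and then reads off the seed, taken here as nucleotides 2 to 8 of the mature sequence (a common convention). The helper names `mature_from_sites` and `seed`, the 0-based half-open coordinates, and the restriction to the 5p arm are all illustrative assumptions, not MiRmat's interface. The precursor is a toy string: the real let-7a-5p mature sequence padded with arbitrary flanks, not the genuine pri-let-7a.

```python
def mature_from_sites(pri: str, drosha_5p: int, dicer_3p: int) -> str:
    """Return the mature 5p-arm miRNA between two cleavage positions.

    Hypothetical convention: drosha_5p is the 0-based index of the first
    mature nucleotide (Drosha cut) and dicer_3p is the index just past the
    last mature nucleotide (Dicer cut), i.e. a half-open interval.
    """
    if not 0 <= drosha_5p < dicer_3p <= len(pri):
        raise ValueError("cleavage positions out of range")
    return pri[drosha_5p:dicer_3p]


def seed(mature: str, start: int = 2, end: int = 8) -> str:
    """Nucleotides start..end (1-based, inclusive) of the mature miRNA."""
    return mature[start - 1:end]


if __name__ == "__main__":
    # Toy precursor: arbitrary 6-nt flanks around the let-7a-5p mature sequence.
    pri = "GGCAAU" + "UGAGGUAGUAGGUUGUAUAGUU" + "UUAGGG"
    m = mature_from_sites(pri, 6, 28)
    print(m, seed(m))
```

Shifting either cleavage position by even one nucleotide changes the mature sequence, and a shift at the 5' end also changes the seed, which is why accurate site prediction matters for target identification.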

METHODOLOGY/PRINCIPAL FINDINGS:

We have developed a method, MiRmat, to predict mature microRNA sequences. MiRmat consists essentially of two parts: prediction of the Drosha processing site and identification of the Dicer processing site. Based on an analysis of microRNAs from 12 species, we found that the patterns of free energy profiles are conserved among vertebrate microRNA hairpins. We therefore incorporated into our method the free energy distribution pattern of the downstream part of the pri-microRNA secondary structure, together with the Random Forest algorithm, to predict the mature microRNA sequence. In an evaluation on an independent test dataset from 10 vertebrates, MiRmat identified 77.8% of the Drosha processing sites and 92.8% of the Dicer processing sites within a deviation of 2 nt. In a more stringent evaluation that excluded microRNAs belonging to families shared between the training and test sets, MiRmat still identified 71.9% of the Drosha sites and 87.2% of the Dicer sites, demonstrating its ability to handle novel microRNA families. MiRmat outperforms other state-of-the-art methods and is highly effective at predicting the mature microRNA sequences of vertebrates.
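The reported figures count a site as identified when the prediction falls within 2 nt of the annotated site. A minimal sketch of that metric follows, assuming one predicted and one annotated position per microRNA; the function name `identification_rate` and the per-microRNA pairing are assumptions, and the paper's exact counting procedure may differ.

```python
from typing import Sequence


def identification_rate(predicted: Sequence[int],
                        annotated: Sequence[int],
                        tolerance: int = 2) -> float:
    """Fraction of predicted cleavage sites within `tolerance` nt of the annotation.

    Assumes predicted[i] and annotated[i] refer to the same microRNA.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("need at least one site to evaluate")
    hits = sum(1 for p, a in zip(predicted, annotated) if abs(p - a) <= tolerance)
    return hits / len(predicted)


if __name__ == "__main__":
    # Toy positions: deviations are 0, 2, 4 and 1 nt, so 3 of 4 count as hits.
    print(identification_rate([10, 14, 20, 31], [10, 12, 24, 30]))
```

The same function serves both evaluations described above; the stricter family-excluded test changes only which microRNAs enter the test set, not how a hit is counted.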

CONCLUSION:

MiRmat was developed to identify mature microRNA sequence(s) by incorporating the free energy distribution of the RNA stem-loop structure and the Random Forest algorithm. We show that MiRmat performs better than existing tools and is applicable across vertebrates. MiRmat is freely available at http://mcube.nju.edu.cn/jwang/lab/soft/MiRmat/.
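The combination of a free energy profile with a trained classifier can be sketched as windowed feature extraction followed by ranking of candidate sites. In the sketch below, the per-nucleotide energy values are assumed to come from an external folding tool, the window layout and zero padding are hypothetical choices, and the `score` callable is a toy stand-in for the trained Random Forest. This is an illustration of the general approach, not MiRmat's actual feature set.

```python
from typing import Callable, List, Sequence


def window_features(energy: Sequence[float], site: int,
                    half_width: int = 5) -> List[float]:
    """Energy values for the half_width positions before `site` and the
    half_width positions starting at `site`.

    Positions outside the profile are padded with 0.0 (hypothetical choice).
    """
    return [energy[i] if 0 <= i < len(energy) else 0.0
            for i in range(site - half_width, site + half_width)]


def best_site(energy: Sequence[float], candidates: Sequence[int],
              score: Callable[[List[float]], float],
              half_width: int = 5) -> int:
    """Return the candidate whose feature window gets the highest score."""
    return max(candidates,
               key=lambda s: score(window_features(energy, s, half_width)))


if __name__ == "__main__":
    # Toy per-nucleotide free energies (kcal/mol); not real folding output.
    energy = [-0.5, -1.2, -2.0, -0.1, -3.0, -2.5, -0.2, -0.3]
    # Toy scorer favouring the most stable window; a trained classifier
    # would take its place in practice.
    toy_score = lambda f: -sum(f)
    print(best_site(energy, [2, 5], toy_score, half_width=2))
```

Keeping the scorer as a plain callable separates feature extraction from the model, so a Random Forest (or any classifier that maps a feature vector to a score) could be swapped in without touching the window logic.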

PMID:
23300555
PMCID:
PMC3531441
DOI:
10.1371/journal.pone.0051673
[PubMed - indexed for MEDLINE]
Free PMC Article