Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM

BMC Bioinformatics. 2013;14 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-14-S2-S1. Epub 2013 Jan 21.

Abstract

Background: Accurate secondary structure prediction provides important information to undefirstafinding the tertiary structures and thus the functions of ncRNAs. However, the accuracy of the native structure derivation of ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It is recently shown that using the abstract shapes, which retain adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction including pseudoknots.

Results: Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without psuedoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than the state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximate 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools.

Conclusions: Extensive analysis of RNA properties based on SVM allows us to identify important properties of sequences and structures related to their shapes. The combination of mass data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, Knot Shape and Knot Structure are open source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology
  • Nucleic Acid Conformation*
  • RNA, Untranslated / chemistry*
  • RNA, Untranslated / genetics
  • Software*
  • Support Vector Machine*

Substances

  • RNA, Untranslated