Thermodynamic matchers for the construction of the cuckoo RNA family

RNA Biol. 2015;12(2):197-207. doi: 10.1080/15476286.2015.1017206.

Abstract

RNA family models describe classes of functionally related, non-coding RNAs based on sequence and structure conservation. The most important method for modeling RNA families is the use of covariance models, which are stochastic models that serve in the discovery of yet unknown, homologous RNAs. However, the performance of covariance models in finding remote homologs is poor for RNA families with high sequence conservation, while for families with high structure but low sequence conservation, these models are difficult to build in the first place. A complementary approach to RNA family modeling involves the use of thermodynamic matchers. Thermodynamic matchers are RNA folding programs that are based on the established thermodynamic model but tailored to a specific structural motif. Because thermodynamic matchers focus on structure and folding energy, they show their potential in discovering homologs when high structure conservation is paired with low sequence conservation. In contrast to covariance models, the construction of thermodynamic matchers does not require an input alignment, but it does require human design decisions and experimentation, and hence model construction is more laborious. Here we report a case study on an RNA family that was constructed by means of thermodynamic matchers. It starts from a set of known but structurally different members of the same RNA family. The consensus secondary structure of this family consists of 2 to 4 adjacent hairpins. Each hairpin loop carries the same motif, CCUCCUCCC, while the stems show high variability in their nucleotide content. The present study (1) describes a novel approach for the integration of the structurally varying family into a single RNA family model by means of the thermodynamic matcher methodology, and (2) reports the results of homology searches that were conducted with this model in a wide spectrum of bacterial species.
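The structural idea stated above, a hairpin whose loop carries CCUCCUCCC while the stem sequence is free to vary, can be illustrated with a small sketch. This is not a thermodynamic matcher: it evaluates no folding energies and only counts consecutive Watson-Crick and G-U pairs around the motif. The function names, the minimum stem length `min_stem`, and the allowance of up to `max_flank` unpaired bases on each side of the motif are all assumptions made for illustration and are not taken from the study.

```python
# Toy motif-anchored hairpin scan (illustrative only, no energy model).
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_length(seq, left, right):
    """Count consecutive base pairs extending outward from (left, right)."""
    n = 0
    while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
        n += 1
        left -= 1
        right += 1
    return n


def find_motif_hairpins(seq, motif="CCUCCUCCC", min_stem=4, max_flank=3):
    """Return (start, end, stem_len) for each motif occurrence that can be
    enclosed by a stem of at least min_stem pairs; end is exclusive.

    min_stem and max_flank are hypothetical thresholds, not values from
    the paper.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    start = seq.find(motif)
    while start != -1:
        end = start + len(motif)
        best = None
        # Try every combination of unpaired loop bases flanking the motif.
        for a in range(max_flank + 1):
            for b in range(max_flank + 1):
                n = stem_length(seq, start - a - 1, end + b)
                if n >= min_stem and (best is None or n > best[2]):
                    best = (start - a - n, end + b + n, n)
        if best is not None:
            hits.append(best)
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    example = "AA" + "GGGAGA" + "CCUCCUCCC" + "UCUCCC" + "AA"
    print(find_motif_hairpins(example))
```

A real thermodynamic matcher would instead score each candidate with the nearest-neighbor energy model and would describe the full arrangement of 2 to 4 adjacent hairpins, so this sketch only conveys why a conserved loop motif with variable stems is hard to capture by sequence similarity alone.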

Keywords: CIN, conserved intergenic neighborhood; CM, covariance model; HMM, hidden Markov model; MFE, minimum free energy; OG, orthologous group of genes; RBS, ribosome binding site; RFM, RNA family model; TDM, thermodynamic matcher; aSD, anti Shine-Dalgarno; alphaproteobacteria; cuckoo RNA; dRNA-seq, differential RNA sequencing; family model; homology search; sRNA, small non-coding RNA; small RNA; structural RNA; thermodynamic matcher.

MeSH terms

  • Algorithms*
  • Gram-Negative Bacteria / genetics*
  • Gram-Positive Bacteria / genetics*
  • Models, Genetic
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • Nucleotide Motifs
  • RNA, Bacterial / chemistry*
  • RNA, Bacterial / genetics
  • RNA, Small Untranslated / chemistry*
  • RNA, Small Untranslated / genetics
  • Sequence Analysis, RNA
  • Sequence Homology, Nucleic Acid
  • Synteny
  • Thermodynamics

Substances

  • RNA, Bacterial
  • RNA, Small Untranslated