Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures

BMC Genomics. 2008 Jun 12;9:284. doi: 10.1186/1471-2164-9-284.

Abstract

Background: Genomes possess different levels of non-randomness, in particular an inhomogeneity in their nucleotide composition. This inhomogeneity is manifest from the short range, where neighboring nucleotides influence the choice of base at a site, to the long range (commonly known as isochores), where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression.

Results: We present novel data and approaches showing that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. We carried out a whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, strong local SS (< -25 kcal/mol) was two to ten times more abundant than expected under a random model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thereby eliminating short-range signals as possible contributors to any observed phenomena.
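The comparison against a random expectation can be illustrated with a minimal Python sketch. The abstract does not state how the randomized sequences were generated, so the first-order Markov model below is an assumption: it is one common way to keep nearest-neighbor (short-range) composition while discarding longer-range structure. The names `markov_randomize` and `fold_enrichment` are hypothetical, and the counts of strong local SS are assumed to come from an external folding tool that is not shown.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def markov_randomize(seq, rng):
    """Draw a random sequence of the same length from a first-order
    Markov chain fitted to seq (assumed randomization scheme, not
    necessarily the one used in the study)."""
    if not seq:
        return ""
    # Count how often each base follows each other base in the natural sequence.
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    overall = Counter(seq)

    out = [seq[0]]
    while len(out) < len(seq):
        nxt = transitions.get(out[-1])
        if not nxt:
            # The current base was never followed by anything in seq;
            # fall back to the overall base composition.
            nxt = overall
        bases, weights = zip(*nxt.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)


def fold_enrichment(observed_count, random_counts):
    """Ratio of the observed number of strong local structures to the
    mean number found across randomized sequences."""
    expected = mean(random_counts)
    if expected == 0:
        raise ValueError("no strong structures in the randomized set")
    return observed_count / expected
```

With a fixed seed, for example `markov_randomize(seq, random.Random(0))`, the randomized set is reproducible. A `fold_enrichment` value between 2 and 10 would correspond to the range reported above.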

Conclusion: We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little-explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20-1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
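As a rough illustration of what compositional inhomogeneity at a given scale means, the Python sketch below compares the variance of G+C content across fixed-size windows with the variance expected if bases were drawn independently at the sequence-wide G+C frequency. This is not the measure used in the study or in the public resource; the function name `gc_variance_ratio` and the choice of non-overlapping windows are assumptions made for illustration only.

```python
from statistics import pvariance


def gc_variance_ratio(seq, window):
    """Observed variance of per-window G+C fraction divided by the
    binomial variance expected for independent bases. Ratios well
    above 1 suggest composition is clustered at this window scale."""
    seq = seq.upper()
    n_windows = len(seq) // window
    if window <= 0 or n_windows < 1:
        raise ValueError("sequence is shorter than one window")

    # G+C fraction in each non-overlapping window (trailing remainder ignored).
    fractions = []
    for i in range(n_windows):
        chunk = seq[i * window:(i + 1) * window]
        fractions.append(sum(base in "GC" for base in chunk) / window)

    # Sequence-wide G+C frequency over the windows actually used.
    p = sum(fractions) / n_windows
    expected = p * (1 - p) / window
    if expected == 0:
        raise ValueError("sequence has no G+C variation to compare")
    return pvariance(fractions) / expected
```

Scanning `window` over values such as 30, 100, 300 and 1000 would probe the mid-range scale described above. A meaningful test would also compare the ratio with that of randomized sequences that preserve short-range composition.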

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / chemistry
  • 3' Untranslated Regions / genetics
  • 5' Untranslated Regions / chemistry
  • 5' Untranslated Regions / genetics
  • Algorithms
  • Animals
  • Base Composition
  • Base Sequence
  • Chromosomes, Human, Pair 17 / chemistry
  • Chromosomes, Human, Pair 17 / genetics
  • Computational Biology
  • DNA, Intergenic / chemistry
  • DNA, Intergenic / genetics
  • Exons / genetics
  • Genome, Human
  • Humans
  • Introns / genetics
  • Molecular Sequence Data
  • Nucleic Acid Conformation*
  • RNA Precursors / chemistry*
  • RNA Precursors / genetics*
  • Thermodynamics

Substances

  • 3' Untranslated Regions
  • 5' Untranslated Regions
  • DNA, Intergenic
  • RNA Precursors