Numericalization of the self adaptive spectral rotation method for coding region prediction

J Theor Biol. 2012 Mar 7:296:95-102. doi: 10.1016/j.jtbi.2011.12.002. Epub 2011 Dec 8.

Abstract

Recently, a Self Adaptive Spectral Rotation (SASR) method was developed to identify protein coding regions in new sequences from unknown organisms without training sets. It visualizes the Triplet Periodicity (TP) property, a simple and universal coding-related property. The SASR method visually reveals the rough locations of coding regions without any training, but it does not delimit those locations numerically. Building on the SASR method, we develop a new approach, named the T-Z-T analysis, that provides numerical coding region predictions. The approach applies a t-test segmentation to separate coding and non-coding regions in the SASR output, then uses a z-test filter to recognize region patterns. A second t-test segmentation then splits adjacent coding regions by detecting frame shifts. Because it operates on the graphic output of the SASR, the approach requires no training. It is also more stable, because it is not sensitive to errors in the input DNA sequence. These advantages make it suitable for early-stage coding region prediction, when training sets are insufficient and the input data may even be inaccurate.
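The abstract does not specify how the t-test segmentation is carried out, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a one-dimensional numeric signal standing in for the SASR output and recursively splits it wherever Welch's t statistic between the left and right parts is largest. The function names, the `threshold` of 4.0, and the `min_len` of 3 are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both samples are constant: infinite separation if means differ.
        return math.inf if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) / se


def best_split(signal, min_len=3):
    """Return (index, |t|) of the split with the largest |t|, or (None, 0.0)."""
    best_i, best_t = None, 0.0
    for i in range(min_len, len(signal) - min_len + 1):
        t = abs(welch_t(signal[:i], signal[i:]))
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t


def segment(signal, threshold=4.0, min_len=3, offset=0):
    """Recursively split while |t| reaches the (assumed) threshold.

    Returns the breakpoints as indices into the original signal, in order.
    """
    if len(signal) < 2 * min_len:
        return []
    i, t = best_split(signal, min_len)
    if i is None or t < threshold:
        return []
    left = segment(signal[:i], threshold, min_len, offset)
    right = segment(signal[i:], threshold, min_len, offset + i)
    return left + [offset + i] + right


# Toy signal: a flat stretch followed by a raised stretch.
signal = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9]
breakpoints = segment(signal)
```

On this toy signal the only breakpoint falls between the two stretches. A real pipeline would still need the z-test filter and the second, frame-shift-oriented segmentation described in the abstract, neither of which is sketched here.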

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Codon / genetics
  • Computational Biology / methods
  • DNA, Mitochondrial / genetics
  • Humans
  • Open Reading Frames / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • Codon
  • DNA, Mitochondrial