Format

Send to

Choose Destination
J Comput Biol. 2010 Jun;17(6):853-68. doi: 10.1089/cmb.2008.0023.

Discretization of time series data.

Author information

1
Department of Mathematical Sciences, Clemson University, Clemson, South Carolina 29634-0975, USA. edimit@clemson.edu

Abstract

An increasing number of algorithms for biochemical network inference from experimental data require discrete data as input. For example, dynamic Bayesian network methods and methods that use the framework of finite dynamical systems, such as Boolean networks, all take discrete input. Experimental data, however, are typically continuous and represented by computer floating point numbers. The translation from continuous to discrete data is crucial in preserving the variable dependencies and thus has a significant impact on the performance of the network inference algorithms. We compare the performance of two such algorithms that use discrete data using several different discretization algorithms. One of the inference methods uses a dynamic Bayesian network framework, the other-a time-and state-discrete dynamical system framework. The discretization algorithms are quantile, interval discretization, and a new algorithm introduced in this article, SSD. SSD is especially designed for short time series data and is capable of determining the optimal number of discretization states. The experiments show that both inference methods perform better with SSD than with the other methods. In addition, SSD is demonstrated to preserve the dynamic features of the time series, as well as to be robust to noise in the experimental data. A C++ implementation of SSD is available from the authors at http://polymath.vbi.vt.edu/discretization .

PMID:
20583929
PMCID:
PMC3203514
DOI:
10.1089/cmb.2008.0023
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for PubMed Central
Loading ...
Support Center