# Inference of Gene Regulatory Networks Using Time-Series Data: A Survey

^{*}Address correspondence to this author at the Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA; Tel: 1(602)343-8485; Fax: 1(602)343-8740; E-mail: gro.negt@amisc

## Abstract

The advent of high-throughput technology like microarrays has provided the platform for studying how different cellular components work together, thus created an enormous interest in mathematically modeling biological network, particularly gene regulatory network (GRN). Of particular interest is the modeling and inference on time-series data, which capture a more thorough picture of the system than non-temporal data do. We have given an extensive review of methodologies that have been used on time-series data. In realizing that validation is an impartible part of the inference paradigm, we have also presented a discussion on the principles and challenges in performance evaluation of different methods. This survey gives a panoramic view on these topics, with anticipation that the readers will be inspired to improve and/or expand GRN inference and validation tool repository.

## 1. INTRODUCTION

Biological system has been traditionally studied *via *reductionism approach, that is, explaining cell behaviors by studying functions of individual cellular components. Though the knowledge being insightful, it has been increasingly apparent that the understanding of the complex cellular system requires understanding of how different components work together. The advent of high-throughput technology like microarrays, where cellular activities can be measured at genome-wide scale, has provided just this platform and thus created an enormous interest in mathematically modeling biological network, particularly gene regulatory network (GRN). The goal is to mimic the biological network in some abstract level, and a better understanding of the underlying biological system could be achieved through the analysis on the resulted mathematical model. To do so, it is critical to have a reliable modeling and inference procedure.

Amid the explosion of efforts in inferring biological network models that has been witnessed by the last decade, there have been accordingly a number of reviews on these different methodologies. Some of these reviews are focused on one specific type of modeling, for example, on Bayesian Network and Dynamic Bayesian Network (graphical model) [1, 2], or topologies (random *v.s.* scale-free *v.s.* hierarchical) [3]. Others describe models in different categories: Cho *et al*. reviewed methods that incorporate prior knowledge as well as machine learning approaches (e.g., Genetic Algorithm or Neural Net) [4], in addition to the general methods we reviewed in this paper; Schlitt *et al*. ordered the modeling methods according to their complexity [5], from simple ones that are just an aggregation of different functional components, to those that are trying to model the dynamics and have a myriad of parameters to tune. van Someren *et al*. took a unique approach, listing the models in a chronological order along with pointing out the key differences between them [6]. Lastly there is a good review by de Jong [7] which provided a concise mathematical background and formulation for many widely used models.

A lot of those early efforts have been focused on non-temporal data, largely due to the paucity of time-series data. As the biological system is inherently complex and GRN is essentially an ensemble of genes which evolve over time, time-series data will clearly capture a more thorough picture of the system than non-temporal data do, which only take *snapshots* of the system at one given time point. Table **11** lists some of the time-series data sets that have been used. (Bar-Joseph has a general review of gene expression time-series data [26]). It is therefore our interest to present a survey of study on models and inference methodologies for time-series data, its promises and unique challenges. In addition to our focus of time-series data inference, there are two other distinctive aspects of our review work that are different from others: (1) we will take a practitioner's view, reviewing different implementations within one modeling framework as well as different modeling frameworks; (2) we have a special interest in performance evaluation: the validation of the methodologies and comparison studies where it is possible. It is our hope that at the end of paper, readers will have a rather complete knowledge of the methods in the literature, and be inspired to improve and/or expand this tool repository.

This paper is organized as follows: modeling and inference methods focusing on inferring the structure of the network will be reviewed in Section 2, which include the *Relevance Network*, *Bayesian Network* and *Dynamic Bayesian Network* (DBN). This is followed by Section 3 in which methods inferring both structure and dynamics will be reviewed. Among them, *Boolean Network* and *Probabilistic Boolean Network* (PBN) have their states in discrete space and *Markov Model* (MM), *State Space Model* (SSM), and *Ordinary Differential Equation* (ODE) typically have their states in continuous space. The prominent characteristics of these models are shown in Fig. (**11**). After a discussion on performance evaluation of the models in Section 4, we conclude the paper in Section 5.

We need to point out that the placement of DBN in Section 2 is due to its close kinship with *Bayesian Network* and we feel it is more natural to introduce them together. It would have well fallen into Section 3 otherwise. In fact DBN was shown to be related to a lot of models in Section 3: Lähdesmäki *et al*. showed that variables in both discrete-valued DBN and PBN can have the same joint probability distribution [27]; DBN can also be considered as a generalization of Boolean Network [28], or SSM [29].

## 2. INFERRING STRUCTURE OF THE NETWORK

Given a set of genes as nodes for a network, the structure is basically the assemble of all the interconnections among the nodes. Depending on the models, these connections usually take on different meanings, but generally specify the relationships between one gene with another gene, or another set of genes. *Relevance Network*, argued by some to be *ad hoc* network, is the simplest model in this category.

### 2.1. Relevance Network

Relevance networks can be categorized as networks in which relationship among genes can be defined using a pairwise measure of relevance. Given a set of *n* genes **G** = {*g*_{1},*g*_{2},...,*g _{n}*} and a set of observations

**D**on genomic profiling (e.g., gene expression) for

*m*time points, a relevance between

*g*and

_{i}*g*can be evaluated using their time-series profiles, [

_{j}*g*

_{i,1}

*g*

_{i,2}...

*g*] and [

_{i,m}*g*

_{j,1}

*g*

_{j,2}...

*g*]. Various relevance measures have been used to infer relationships between two genes, from simple correlation measures to biologically motivated relevance measures.

_{j,m}One example of using a simple Pearson correlation measure is the work of Remondini *et al*., where they used two sets of gene expression data from rat fibroblast cell lines to construct correlation-based networks [30, 31]. Even though such a correlation measure can be useful in many cases, it cannot provide causal information. To overcome this limitation, Gupta *et al*. used a slope metric (SR) to elucidate not only the presence of a relationship between two genes but also its directionality [32], which is used to represent causality. In their study, the structure was determined using a correlation measure with a preferred threshold. The directionality of relevance was determined using the following SR, based on the assumption that a gene *g _{i}* is linearly dependent on another gene

*g*, i.e.,

_{j}*g*=

_{i}*a*+

_{ij}*b*.

_{ij}g_{j}

If
$\mathit{SR}=\left|{b}_{\mathit{ij}}\right|/\left|{b}_{\mathit{ji}}\right|$, then the directionality is determined as *g _{i}* →

*g*and if $\mathit{SR}=\left|{b}_{\mathit{ji}}\right|/\left|{b}_{\mathit{ij}}\right|$, then it becomes

_{j}*g*→

_{j}*g*. This interpretation of the directed edge between two genes is based on the biological assumption that a small change in the source gene is associated with a large change in the target gene. This method was applied to a time-series microarray data of

_{i}*B. subtilis*[21] and showed some correspondence with already known biological information.

Besides the approaches based on relevance measures for the same time point, there are approaches to incorporate time-delay in measuring the relevance between genes. Schmitt *et al*. used a time-lagged correlation to infer gene regulatory networks [19]. For a transcription profile represented by a series of *m* measurements taken at equally spaced time points, the correlation between genes *g _{i}* and

*g*with a time lag,

_{j}**, is**

*τ***R**(

**) =**

*τ**r*(

_{ij}**), defined by**

*τ*

where *g _{i,t}* denotes the expression of a gene

*g*at time

_{i}*t*, $\stackrel{\_}{{g}_{i}}$ is the averaged expression value of a gene

*g*across all time points, and

_{i}*S*is essentially the inner product between the time-shifted profiles. This measure was used to analyze whole genome DNA microarrays of

_{ij}*Synechocystis*under various light intensity conditions [19]. Seed genes were selected first based on the time-lagged correlation between the expression profile and the light intensity profile across the time points. These seed genes were expanded using the time-lagged correlation between the average expression profile of seeds and the expression profile of each gene.

Ma *et al*. and Barker *et al*. used a biologically motivated approach to measure the dependency between two genes using temporal information in time-series profiles [33-35]. After ternarizing the expression data into highly/averagely/lowly expressed states (denoted by H/A/L respectively), Ma *et al*. looked for dependencies between *g _{i}* being ‘H’ at one time and

*g*also being ‘H’ at next time point [33, 34]. Specifically, they computed a statistics on the difference between the number of occurrences of

_{j}*g*being ‘H’ at one time AND

_{i}*g*also being ‘H’ at next time point, and the expected number of these occurrences. The latter was computed by averaging the product of the number of

_{j}*g*being ‘H’ at one time (regardless of

_{i}*g*'s state at next time point) and number of

_{j}*g*being ‘H’ at next time point (regardless of

_{j}*g*'s state at previous time point). A 95 percent confidence level was then used to determine whether the dependency is deemed significant. Using this method, the authors constructed a gene interaction diagram from a yeast data set [10]. Similarly, Barker

_{i}*et al*. proposed a method to determine whether

*g*is regulated by

_{i}*g*, based on three ratios of samples, the ratio of

_{j}*g*activating

_{j}*g*, the ratio of

_{i}*g*repressing

_{j}*g*and the ratio of

_{i}*g*doing nothing on

_{j}*g*[35]. With threshold values to those three ratios, the final structure of a GRN is constructed. These methods investigate the relationship between two time-series profiles with single epoch-delay, thus assuming Markov condition regarding the time-point, where an observation at some time point is dependent on that of the previous time-point only.

_{i}Kwon *et al*. proposed a method based on string alignment to infer transcriptional regulation relationships from time-series gene expression data, in which relevance with arbitrary time-delay can be considered [36]. They converted the time-course of each gene into a string composed of rising(R), constant(C) or falling(F). The similarity between two event strings is evaluated by string alignment algorithm using scoring matrix that describes similarity score between each pair of event characters. This method was applied to a yeast cell-cycle data [10] and showed it could find some already known transcriptional regulation relationships.

There were also efforts to enhance the quality of inferred relevance network through post processing using additional criteria. Bickel used decisive false discovery rate (dFDR) to estimate the probability of spurious connections between genes in GRNs [37]. After building a network using a time-lagged correlation measure, three kinds of probabilities were evaluated for each edge, i) the probability of being a false positive connection, ii) the probability of being a connection with wrong time order and iii) the probability of being a connection with a time-delay while there is actually no time-delay. This method was applied to yeast cell-cycle data [10] and showed it could successfully find already known genetic relationships.

### 2.2. Bayesian Network

A Bayesian network model is a graphical representation of a joint probability distribution of random variables. A Bayesian network *B* is defined as a tuple of (*G*,**Φ**), where *G* is a graph structure that represents conditional dependency relationships between random variables and **Φ** is a set of parameters describing conditional probability distribution. In modeling GRNs, *G* corresponds to the topology of a GRN, where each node represents a gene as a random variable and each edge represents dependency between genes. With Markov assumption on dependency relationships, the joint probability distribution of genes **G** = {*g*_{1},*g*_{2},...,*g _{n}*} is described as follows:

where **Pa**(*g _{i}*) is a set of parents of

*g*in

_{i}*G*and

*θ*is a statistics from

**D**. In this framework, a Bayesian network represents a static joint probability distribution of a set of random variables. For this reason, most of applications of Bayesian networks do not incorporate the temporal information in time-series data. Usual assumption in using Bayesian networks for time-series data is that the time-series is from the stationary state of the target biological system. One may use the temporal information to determine the direction of edges in Bayesian networks. However, it is important to notice that the direction of an edge in Bayesian networks does not necessarily represent causality between random variables.

Learning a Bayesian network *B* = (*G*,**Φ**) from observed data **D** implies learning the dependency structure *G* and learning the set of probabilistic parameters **Φ**. Learning **Φ** is a relatively easy problem once *G* and **D** are given, and learning *G* can be done by finding *G*^{*} with maximum *P*(*G* | **D**) However, learning *G* given **D** is a hard problem because the number of possible graph structures increase exponentially as the size of a network increases. For this reason, most of approaches using Bayesian networks focus on small problems or take heuristics to handle large problems. One popular heuristic approach is restricting the *G* to a certain category.

One of the first applications of Bayesian networks for genetic networks is the work by Friedman *et al*. with the strategy of restricting *G* [38]. In this study, learning Bayesian networks was applied to microarray gene expression data for cell-cycle of *S. cerevisiae* [10]. By using Bayesian network learning, the authors analyzed gene expression data for Markov relation between genes and the coverage of influence for each gene, where the coverage of influence was measured by counting the number of descendants in the graph structure for that gene. They used sparse candidate algorithm, which restricts the candidate parents of each node in *G* during the search. In their study, 800 genes related to cell-cycle were considered as random variables. Their result was evaluated through literature mining and statistical significance test. Even though they used time-series expression data, temporal information was not used in their study due to the previously mentioned reason.

Considering the problem of learning Bayesian networks as an optimization problem for the objective function *P*(*G* | **D**), search algorithms for large solution spaces can be also used. One of such methods is an estimation of distribution algorithm (EDA) and Dai *et al*. used the EDA for learning genetic networks with a Bayesian network model [39]. With EDA, they evolved a population of *G* s that have high Bayesian information criterion (BIC) scores. After some iterated evolution process, *k* graph structures with highest scores were chosen to build final aggregated genetic networks. Their method was applied to two sets of time-series data [10, 11], and was evaluated by Gene Ontology (GO) search and literature mining.

### 2.3. Dynamic Bayesian Network

A Dynamic Bayesian Network (DBN) is an extension of a Bayesian network model to incorporate temporal concept. Compared to conventional Bayesian networks, DBNs include random variables $\left\{{g}_{1,t-1},{g}_{2,t-1},...,{g}_{n,t-1}\right\}$
of time step *t*–1 in addition to $\left\{{g}_{1,t},{g}_{2,t},...,{g}_{n,t}\right\}$
of time step *t* . A transition network is composed of those 2*n* random variables with no edges from time step *t* to *t*–1. It is assumed that the transition probability
$P\left({\overrightarrow{g}}_{t}|{\overrightarrow{g}}_{t-1}\right)$, where
${\overrightarrow{g}}_{t}$ represents the values of *n* genes at time *t* , is homogeneous across entire observation. Learning DBNs can be done using the same idea of learning Bayesian networks. The only difference is that we need to consider additional random variables of time *t* – 1. From this perspective, Friedman *et al*. extended some scoring rules for learning structures from Bayesian networks to the case of DBNs [40].

The difficulty in learning DBNs is its heavy computational complexity. Because additional *n* random variables are considered compared to conventional Bayesian networks, learning algorithms should consider much more candidate graph structures and probabilistic parameters. For this reason, most of applications of DBN usually target smaller systems compared to the study of Bayesian networks. Further, heuristics to restrict candidate graph structures are widely used in the applications of DBNs.

Ong *et al*. used DBNs to infer regulatory network for tryptophan metabolism in *E. coli* [41]. By using prior knowledge (operon map), they restricted possible network structures into predetermined category. Then a DBN was learned using Expectation-Maximization (EM) method with gene expression data of 8 time points [16]. Missal *et al*. used mutual information between two gene expression profiles and applied *χ*^{2}-test for the significance of the mutual information to determine the structure of a DBN [42]. Zhao *et al*. took a similar approach of using mutual information [43], but they used minimum description length (MDL) principle [44] to determine the threshold of the significance, and to remove indirect or false links in a post-pruning process. However, by using different encoding schemes, this method can generate non-unique results that need *ad hoc* adjustment. Dougherty *et al*. overcome this drawback by measuring the description length based on a universal model: normalized maximum likelihood model [45]. Zou *et al*. used DBNs with various time-delay, by shifting time-series profiles with properly predicted amount of time steps [46], and applied their method to the yeast cell-cycle data of Chou *et al*. [11].

Variables in DBNs can take continuous values as well as discrete values. When random variables are continuous in a DBN, conditional probability tables, which are used in the discrete case, cannot be used. To model *P*(*g _{i,t}* |

**Pa**(

*g*)) in a continuous domain, Kim

_{i,t}*et al*. assumed a nonparametric additive regression model with Gaussian noise [47-49],

where *q* is the number of parents of *g _{i,t}*,

*p*is the

_{ij}*j*th parent of

*g*and

_{i,t}*ε*(

_{i}*t*) is a Gaussian noise of

*g*at time

_{i}*t*. A scoring measure for DBN structures was proposed based on the regression model. This method was applied to

*yeast*cell-cycle data of Spellman

*et al*. [10]. A nonlinear regression extension was proposed by Ferrazzi

*et al*. in which

*p*in Eqn. (5) is replaced by

_{ij,t–1}*tanh*(

*p*), where

_{ij,t–1}*tanh*(.) is the hyperbolic tangent function [50].

Several recent studies focused on using different types of gene expression data. Dojer *et al*. proposed a method to handle perturbed gene expression data in using DBNs [51]. In their approach, candidate regulators for each gene were inferred from only a subset of entire data, where the target gene was not perturbed. The motivation of this approach is based on the assumption that data with a specific gene perturbed may not be used in the process of inferring regulators of that gene, because knocking out a gene can disable regulations *toward* that gene. But since knocking out a gene may represent under-expression of that gene, the targets of a perturbed gene may still have regulation effect *from* the perturbed gene. The effectiveness of this method was shown using simulated data from an ordinary differential equation model. Lähdesmäki *et al*. proposed a method for learning DBNs from mixture of steady state data and time-series data [52]. If a steady state data is **D*** _{A}* and a time-series data is

**D**

*, learning a DBN structure*

_{π}*G*requires the evaluation of the marginal likelihood

*P*(

**D**

*,*

_{A}**D**

*|*

_{π}*G*). By assuming

**D**

*and*

_{A}**D**

*are independent given (*

_{π}*G*,

**Φ**), where

**Φ**is a set of probabilistic parameters, the marginal likelihood can be evaluated as follows:

Evaluation of *P*(**D*** _{A}* |

*G*,

**Φ**) can be done in the same way of static Bayesian networks and the evaluation of

*P*(

**D**

*|*

_{π}*G*,

**Φ**) is done using the steady-state distribution of the DBN.

## 3. INFERRING STRUCTURE AND DYNAMICS

Structure alone does not completely describe the network. Often we are interested in the evolution of the system from a given condition, or the response to a particular perturbation, which require a network model that is able to characterize the dynamics and describe the system transitions into future time. The states, defined as the values for the vector of genes, can be either in continuous domain or constrained to be in discrete space.

### 3.1. Discrete State Space

A discrete state space model characterizes a system using quantized data. The most popular approaches are Boolean network and probabilistic Boolean network (PBN).

Boolean network assumes that the gene expression takes just two levels: ON/1 and OFF/0, and the functional relationship between the genes is determined by logical rules. A Boolean network consists of *n* nodes **G** = {*g*_{1},*g*_{2},...,*g _{n}*} and a list of Boolean functions

**F**= {

*f*

_{1},

*f*

_{2},...,

*f*}. Each node ${g}_{i}\in \left\{0,1\right\}$ is a binary variable that represents the state (expression) of gene

_{n}*i*. The Boolean function

*g*=

_{i,t+1}*f*(

_{i}**Pa**(

*g*)) specifies how the value of node

_{i,t}*g*at next time point

_{i}*t*+ 1 is determined by the values of its input nodes

**Pa**(

*g*) at current time point

_{i}*t*.

REVEAL [53] proposed by Liang *et al*. is one of the first Boolean-network-based inference scheme. Based on information theory, if the mutual information of the input and output is equal to the entropy of the output, the input fully determines the output. Hence for each node, REVEAL searches for the minimal input node set that can fully determine the output. Rather than using mutual information, Akutsu *et al*. chose to check consistency for all logic rules [54], and later extended the inference to noisy data [55]. Furthermore, Akutsu *et al*. proved that, if the maximum number of input nodes is bounded by a constant and transition pairs are randomly picked, the sample size needed to reconstruct the original Boolean network is in *O*(log *n*) [54]. However, for limited sample size, there exist considerable number of valid networks that are consistent with the given data. Rather than pick one or a few networks, Martin *et al*. enumerate all the consistent networks and aggregate them by network's attractors [24]. Their results show that most networks share only a few common attractors, which indicate similar network dynamics.

Some view the binary-state system like Boolean network as oversimplification that has significant information loss. Laubenbacher and Stigler introduced a multi-state system where the gene-gene relationship is determined by the computational algebra of the finite field [56]. Normally the solution is not unique, and the inference scheme will pick the minimal network functions by removing all redundant terms. Like REVEAL, this approach assumes noiseless data and the performance suffers when noise presents.

The Boolean network and the multi-state extension are based on the assumption of deterministic gene-gene relationship. Probabilistic Boolean network adds a sense of randomness into the Boolean network by allowing the nodes to have more than one associated Boolean functions. So for a PBN, the nodes **G** 's associated functions are now denoted as **F** = {**F**_{1},**F**_{2},...,**F*** _{n}*}, where ${\text{F}}_{i}=\left\{{f}_{1}^{\left(i\right)},{f}_{2}^{\left(i\right)},...,{{f}_{l}}_{\left(i\right)}^{\left(i\right)}\right\}$, i.e., each node's output is now associated with

*l*possible Boolean functions. At any time point, PBN allows each node

_{(i)}*i*to take only one Boolean function from

**F**

*. Hence there are altogether ${\mathrm{\Pi}}_{i=1}^{n}{l}_{\left(i\right)}$ realizations of a PBN. Perturbation probability and selection probability were later introduced to allow the network be perturbed in current realization or switched between realizations, respectively.*

_{i}It was suggested in Shmulevich *et al*. that *Coefficient of Determination* can be used to infer the network [57]. Later Marshall *et al*. implemented an inference procedure for PBN that successfully infers all the constituent Boolean networks, at well as all the perturbation and selection probabilities associated with them [58]. Unfortunately, as they pointed out in the paper, the amount of temporal data needed for inference is huge. A more practical way to infer PBN is to approximate the network by multivariate Markov model, as shown by Ching *et al*. [59]. In this model, the state of gene *i* at time point *t* takes a binary probability distribution denoted by vector
${{\overrightarrow{p}}_{i,t}=\left[P\left({g}_{i,t}=0\right),P\left({g}_{i,t}=1\right)\right]}^{\prime}$. The model assumes

where *T _{ij}* is the probability transition matrix from gene

*j*to gene

*i*, and

*λ*the non-negative weight factor that has ${\sum}_{j=1}^{n}{\mathit{\lambda}}_{\mathit{ij}}=1$.

_{ij}### 3.2. Continuous State Space

A continuous state space model can characterize a system without discretizing the data, a step argued to result loss of information. One of the straightforward modeling strategies here is to describe the system evolution as a *Markov Process*, and this gives the *Markov Model* networks.

#### 3.2.1. Markov Model

Dewey *et al*. studied a simple linear Markov model in the form of

in which
${\overrightarrow{g}}_{t}$ denotes the gene expression levels at time *t* , and the matrix *T* is the transition for genes between two time points [60, 61]. Assuming a total of *m* time points, this model can be written in another form

where
${G}_{+1}=\left[{\overrightarrow{g}}_{2}{\overrightarrow{g}}_{3}...{\overrightarrow{g}}_{m}\right]$
and $G=\left[{\overrightarrow{g}}_{1}{\overrightarrow{g}}_{2}...{\overrightarrow{g}}_{m-1}\right],$ and the transition matrix *T* can be calculated as *G*_{+1}*G*^{+}, *G*^{+} being the pseudo-inverse of *G* obtained by using *singular value decomposition* (SVD). They further extended it to include non-linear terms that capture both between-time (contained in *GG ^{T}* ) and between-gene (contained in

*G*

^{T}*G*) correlations by considering the form

*G*

_{+1}=

*T*

_{1}

*G*+

*T*

_{2}

*G*+

^{T}G*GG*

^{T}T_{3}, as well as higher order Markov terms [60].

Holter *et al*. performed SVD [62-64] on the data
$D=\left[{\overrightarrow{g}}_{1}{\overrightarrow{g}}_{2}...{\overrightarrow{g}}_{m}\right]=U\sum {V}^{T}$ and took the first *r* rows in **Σ***V ^{T}*,

*r*being the number of non-zero eigenvalues of

*DD*, as the

^{T}*dominant patterns*, or

*modes*[65]. The temporal expression for each gene is therefore a linear combination of these

*modes*${\overrightarrow{X}}_{i}=\left[{x}_{i,1}{x}_{i,2}...{x}_{i,m}\right],i=1,...,r.$. The linear Markov model they considered is in the form of

The *modes* transition matrix *T* is then estimated by minimizing the divergence of the trajectory from the observed values.

Wiggins *et al*. took a Bayesian approach and rewrote the transition dynamics with added term of Gaussian noise
${\overrightarrow{\mathit{\epsilon}}}_{t}$ as [66]:

where *I* is the identity matrix. Given the biological prior which was modeled in *T* , they were able to derive the posterior probability for *T* and used the expectation of *T* to represent the transition dynamics. They also considered an augmented model which treated latent variables as additional “hidden degrees of freedom” and derived the expected *T* after integrating out these latent variable.

Another variation of Markov model was studied by Li *et al*. in which the transition of gene expression *g _{i,t}* is modeled as power functions:

where *α _{i}* is the rate of transcription and

*β*is the rate of degradation [67, 68]. To find the most-likely structure of the network, the authors used either guided simulated annealing method [67, 69] or Genetic Algorithm [68] to optimize a score function which is essentially the likelihood function with a penalty term that penalizes complex models (complex in the sense that number of parameters being too large). Used on a set of yeast

_{i}*Saccharomyces Cerevisiae*microarray data [10], this procedure recovered 31 out of 42 and 17 out of 22 regulation relationships which are consistent with those found experimentally [67, 68].

#### 3.2.2. State Space Models

State Space Model (SSM) can be viewed as an extension to Markov models, based on the assumption that gene expression levels are hidden states and cannot be directly observed, and are related with the observed values by some transformation. A general linear *SSM with input* takes the form as:

where
$\overrightarrow{g}$ denotes the state of the genes for the system,
$\overrightarrow{y}$
the observed data for
$\overrightarrow{g}$, *T* the state transition matrix, *C* the state to observation matrix, and *A* and *B* are the inputs influence matrices for inputs
${\overrightarrow{u}}_{t}$ and
${\overrightarrow{v}}_{t}$, respectively. Here
${\overrightarrow{\mathit{\epsilon}}}_{g,t}$ and
${\overrightarrow{\mathit{\epsilon}}}_{y,t}$ are white noise terms. If *A* = 0 and *B* = 0 , this is reduced to the *basic linear SSM*, or *standard SSM*.

Rangel *et al*. and Beal *et al*. used *SSM with input* and set the inputs as the observations from previous time point [18, 70]. Specifically, their system is described as:

Notice the above equation can be rewritten as

Therefore the matrix *T*′ = *CA* + *B* captures the transition in the *observation* domain over time, through the hidden states
${\overrightarrow{g}}_{t}$, and the authors focused their interests in *T*′. Rangel *et al*. estimated the model parameters using EM algorithm and constructed confidence interval on *T*′ by using bootstrap, while Beal *et al*. used what they called *Variational Bayesian EM Algorithm*, which can be considered as a Bayesian extension of the standard EM algorithm, to derive a posterior estimation on *T*′.

Both Hirose *et al*. and Yoshida *et al*. argued that the dimension *k* of the hidden states
${\overrightarrow{g}}_{t}$, which they called “modules”, is less than the number of dimension for observations
${\overrightarrow{y}}_{t}$ [71, 72] . Using *standard SSM* and assuming the noise term
${\overrightarrow{\mathit{\epsilon}}}_{y,t}$ has a diagonal covariance matrix *R* and some other constraints, the authors carefully designed a projection transformation matrix *H* such that the denoised observation vector
${\overrightarrow{y}}_{t}^{\mathrm{\diamond}}={R}^{-1/2}\left({\overrightarrow{y}}_{t}-{\overrightarrow{\mathit{\epsilon}}}_{y,t}\right)$ can be projected to lower dimension
$k\left({\overrightarrow{g}}_{t}=H{\overrightarrow{y}}_{t}^{\mathrm{\diamond}}\text{to be exact}\right)$ and the system follows a dynamic as described in:

where *T _{M}* and
${{\overrightarrow{\mathrm{\epsilon}}}_{{g}^{\mathrm{\diamond}}}}_{,t}$ are appropriately transformed from

*T*and ${\overrightarrow{\mathit{\epsilon}}}_{g,t}$, respectively. All module-module interactions are presented in the transition matrix

*T*. Yoshida

_{M}*et al*. further argued that the network structure may not be the same over time, and they proposed what they called “Markov switching” [71]. In essence, this is an inhomogeneous SSM, where at each time point the system is allowed to change its structure. They put the inference in a Bayesian framework, and introduced an additional vector of hidden variables which served as indicators of whether the system is switching at each time points. Posterior distribution of model parameters were estimated by using Gibbs sampling. Hirose

*et al*. on the other hand, did a comparison study with the models in the work by Rangel

*et al*. and Beal

*et al*. [18, 70] and investigated the benefits of using multiple replicates of time course data.

Kasabov *et al*. also used the *standard SSM* and proceeded to estimate *T* using Kalman filter, and constructed the network from two sets of human leukemic cell line data, which have 32 pre-selected genes and 4 time points [73]. They also showed the potential application in larger data sets with more genes by implementing Genetic Algorithm (fitness function evaluated by using Kalman Filter estimated likelihood) for gene selection. The validation of the networks is rather weak, however, as they merely showed that the observed data fall on the estimated trajectories for four of the genes they selected for the network.

#### 3.2.3. Ordinary Differential Equations

Ordinary differential equations can be used to describe the gene products, e.g., mRNA and protein, and their interactions. A general form of an *n* -node ODE system can be written as:

where
$\overrightarrow{g}\left(t\right)$ is the gene product concentrations of *n* nodes,
$f\left(\overrightarrow{g}\left(t\right),t\right)$ the regulation function, and *S(t )* the external stimulus. For genetic regulation network, the most popular ODE model is the linear time-invariant model:

where *R* is an *n* × *n* matrix denoting the direct regulation among *n* nodes, and
$\overrightarrow{s}$ a *k* × 1 constant vector whose effects on *n* nodes are intermediated by the *n* × *k* constant matrix *W* . In practice the available data are sampled at limited time points, so most methods actually solve *Difference Equations* like

In reality, noise, whether significant or slight, is always present in the observed raw data. Some approaches explicitly incorporate the noise into the system as error terms ${\overrightarrow{\mathit{\epsilon}}}_{t}$:

Note that this form closely resembles Eqn. (12) in Markov model.

For any node *i* , its regulation is defined by the *i* -th row of *R* , i.e.,
${R}_{i}=\left[{r}_{\mathit{i1}},...,{r}_{\mathit{in}}\right]$. The signs of
$\left[{r}_{\mathit{i1}},...,{r}_{\mathit{in}}\right]$ determine the network structure: for node *j* , *r _{ij}* ≠ 0 means it is an input of node

*i*, and depending on whether

*r*is positive or negative, node

_{ij}*j*activates or inhibits node

*i*, respectively. If

*r*≠ 0, then the node

_{ii}*i*is self-regulated. The actual values of [

*r*,...,

_{i1}*r*] determine the production/degradation rates of node

_{in}*i*. For most algorithms, the objective is to estimate

*R*, especially the network structure. Furthermore, they normally decouple the problem and work on the regulation of one gene at a time based on ${\mathit{dg}}_{i}\left(t\right)/\mathit{dt}={\text{}R}_{i}\overrightarrow{g}\left(t\right)+{W}_{i}\overrightarrow{s}\left(t\right)$.

Taking
$\overrightarrow{s}\equiv 0$, Chen *et al*. provided two estimation solutions of *R* [74]. One solution used Fourier transform. It assumed that several cell cycles are observed, and the system is stable enough to be approximated by
$\overrightarrow{g}\left(t\right)=Q{e}^{t\mathit{\lambda}}$, where *Q* is constant matrix and *λ* the eigenvalues of *R* . Assuming that each node's input set size is small and fixed, the other solution used Minimal Weight Solutions to Linear Equations [75] to solve the difference equations.

Instead of fixing input set size, de Hoon *et al*. allowed the network to have input sets of different sizes [76]. They assumed that the error terms are normally distributed so log-likelihood of proposed network can be calculated. The search for the network of maximal log-likelihood was regulated by Akaike's Information Criterion (AIC): AIC = -2[log-likelihood] + 2[number of parameters]. The approach limited the size of nodes it can handle to be smaller than the number of time points. Bourque and Sandkoff preferred a forward search approach [77]. For gene *i* , the fitness of its input set is measured by the sum of squared errors (SSE). To add a new input to the existing input set, the decrease of SSE must pass F-test at given significance level. The authors also extended the inference to multiple related networks, where the fitness includes not only SSE, but also evolution cost, which is the sum of pairwise symmetric differences of all networks. Chan *et al*. put connection constraints on linear model through sparse Bayesian learning [78]. In this approach, SSE is regulated by a parameter magnitude function, in which the magnitude of each connection *r _{ij}* is weighted by a hyperparameter. The optimization is conducted in a recursive way where the regulation matrix

*R*and hyperparameters are estimated alternately.

The linear assumption limits the range of networks the model can emulate. To add nonlinearity to the model, Perkins *et al*. introduced a hybrid model by coupling the concentration value of each node with its logical state [79]. The approach normalizes the concentration to a range of [0, 1], discretizes the production rate to either on or off, and fixes the degradation rate. By this means, the regulation function *f _{i}* of gene

*i*is reduced to a logical function plus the degradation term.

Another popular non-linear differential equation model is the S-system form, where the production/degradation rate is a product of power-law functions:

where *α _{i}* and

*β*are rate constants,

_{i}*u*and

_{ij}*v*are kinetic orders. The earlier work of Akutsu

_{ij}*et al*. was based on qualitative modeling and used linear programming which can only determine log

*α*- log

_{i}*β*and

_{i}*u*-

_{ij}*v*[55]. Later work like Marino and Voit used Levenberg-Marquardt method to find the parameters that minimize SSE [80]. Daisuke and Horton used distributed genetic algorithm to overcome the local minima [81]. The algorithm is seeded with networks that follow the scale-free property and results in multiple candidates that are later aggregated to determine the network structure. Novikov and Barillot converted differential equations into integral equations :

_{ij}

where the *r _{ij}* and

*s*are modeled by nonlinear time-variant kernel functions: polynomial, exponential or delta-function models [82]. Similar to Bourque and Sandkoff [77], the network is inferred by using forward search, but with

_{i}*χ*criterion value replacing F-test's p-value.

^{2}Recurrent Neural Network (RNN) is another popular model that can capture the nonlinear dynamics of various systems. Xu *et al*. used the model of the following form:

where *σ*(.) is the sigmoid function, *b _{i}* the bias term,

*λ*the self-degradation rate, and

_{i}*τ*the time constant [83]. The network is learned with particle swarm optimization [84]. Busch

_{i}*et al*. studied the gene regulation network of the migration of human skin keratinocytes with RNN [25]. The model used in their work follows the form

where Δ*τ _{j}* is the time delay associated with gene

*j*. The network is learned with Genetic Algorithm.

Cavelier and Anastassiou pursued a hybrid of linear and nonlinear functions, where linear functions are used for translation and nonlinear functions for transcription [85]. The authors considered three nonlinear functions of increasing complexity: sigmoid-type function, Hill function and thermodynamically derived function. The authors further assumed that the network structure is available through prior knowledge in the literature, so the algorithm focuses on the estimation of the production/degradation parameters through the so-called evolution strategies. Another way to add nonlinearity is through time-delay, as described by Kim *et al*. [86]:

where *ε _{ij}* is a noise term.

Rather than inferring the exact chemical kinetic equations regulating every node, Sontag *et al*. estimated the influence of node *i* on the node *j* by measuring the change of node *i* production rate relative to the change of node *j* concentration at any give network status [87]. To do so, they proposed an experiment protocol in which a series of perturbations is applied and the unperturbed and perturbed time-series data are compared and evaluated: for an *n* -gene network, the protocol needs about *n*^{2} perturbations with perfect measurement to fully infer all the influences. Bansal *et al*. also proposed an inference scheme based on perturbation [22]. The difference equation is now written as

where *I* is the identity matrix,
$G=\left[{\overrightarrow{g}}_{1},{\overrightarrow{g}}_{2},...\right]$
, ${G}_{+1}=\left[{\overrightarrow{g}}_{2},{\overrightarrow{g}}_{3},...\right]$, and
$S=\left[{\overrightarrow{s}}_{1},{\overrightarrow{s}}_{2},...\right]$ The network structure is then estimated through the dominated singular values of the PCA decomposition of
$\left[\begin{array}{c}G\\ S\end{array}\right]$. With PCA decomposition, this approach can handle a lot more genes with limited observations.

Arguing that the accuracy of single best regulation parameter set is prone to error due to limited data, Nam *et al*. aggregated the most likely regulator sets through voting [88]. Similarly, Kim *et al*. used noise injection to improve inference robustness [89]. In their model, *r _{ij}* is a time-varying function

*r*(t) =

_{ij}*α*sin (

_{ij}*ωt*+

*φ*) +

_{ij}*β*, where

_{ij}*α*,

_{ij}*ω*,

*Φ*and

_{ij}*β*are unknown parameters. The effects of gene

_{ij}*j*on gene

*i*are assessed on the 2D trajectory map of {

*g*(

_{i}*t*),

*r*(

_{ij}*t*)

*g*(

_{j}*t*)}. By injecting random noise to the original data to generate several slightly different data sets, only the connections that shows certain degree of stability across all data sets are picked.

## 4. PERFORMANCE EVALUATION

With the abundance of proposed models and inference algorithms, it is essential to have a validation protocol so that the merit of each proposed method can be assessed, and guideline can be established in aiding practitioners to choose the right modeling and inference procedure. Validation therefore should be considered as an impartible part of the complete inference scheme.

### 4.1. Distance Measure

Ideally one would compare the inferred network *N _{I}*, with ‘ground truth’ network

*N*, the one from which data used for inference are derived. The validation is given quantitively in the form of the distance measure

_{G}*D*(

*N*,

_{I}*N*), as shown in Fig. (

_{G}**2a2a**). Two critical issues arise from here, as discussed below.

#### 4.1.1. Lack of ‘Ground Truth’

Although *N _{G}* as the underlying biological network is the ultimate golden standard that the inferred network should be evaluated against, unfortunately it is almost guaranteed that we will not be blessed with this knowledge for any biological system of non-trivial size (as a matter of fact, we lack the knowledge for even trivial-sized system, for example, the three gene oscillating network of

*E. Coli*[90]). To circumvent this problem, two common approaches have been used.

The first approach is to use ‘partial ground truth’, which is most often in the form of regulation relationships gleaned from literature. This is demonstrated in Fig. (**2b2b**), in which
$D\left({N}_{I},{N}_{G}^{P}\right)$ can be thought as an approximation of *D*(*N _{I}*,

*N*). This approach has being widely utilized. For example: Hirose

_{G}*et al*. found genes in each “module” are related to the same molecular function according to Gene Ontology (GO) [72]; in addition to GO, Dai

*et al*. also searched

*Saccharomyces Genome Database*to find relationships which are consistent to their findings; both Kim

*et al*. and Novikov

*et al*. compared their yeast cell cycle pathway networks with those selected from KEGG (Kyoto Encyclopedia of Genes and Genomes) [47, 82], and Kim

*et al*. further compared with the metabolic pathway reported by DeRisi

*et al*. [9]. Although versatile and bearing immediate biological interpretation, this approach is limited by the thoroughness and accuracy of reports from literature, and subject to the bias in mining the literature.

The second approach is the use of synthetic network generator, so we know every aspect of the underlying network ${N}_{G}^{S}$. Illustrated in Fig. (**2c2c**), the synthetic network ${N}_{G}^{S}$ serves as a surrogate of the original *N _{G}*, and ideally, should mimic the real biological system. In reality, ${N}_{G}^{S}$ can only model a subset of the properties in

*N*and this leads to a inherent bias where certain class or classes of inference methods could be favored unintentionally. Setting up the synthetic network that is free of this bias is not trivial, but sometimes this obstacle can be overcome by carefully aligning synthetic network properties with those in the study objectives.

_{G}One immediate advantage of using ${N}_{G}^{S}$ is that we can easily study the properties of the proposed inference algorithms. As an example, robustness property can be analyzed by perturbing and adding noise to the network. These learned properties often lead to improvement on the inference. For example, Bansal *et al*. randomly generated 100 networks for each of the two sizes, one with 10 gene and 5 time points, and the other with 1000 gene and 10 time points, and added white Gaussian noise of different standard deviations to the generated time series data [22]. The tuning of parameters in inferring network from real data was facilitated by examining the performances on these synthetic network data. Perkins *et al*. fixed input set size for each node for their ODE model, and randomly selected the elements in input node sets and the associated Boolean functions [79]. They were able to derive the number of data points needed to fully infer the network structure and regulation functions in different scenarios. An intensive simulation study on DBN by Yu *et al*. evaluated various combinations of Bayesian scoring metrics, search heuristics, discretization levels, sampling interval, quantity of data and data interpolation using simulated data from a stochastic linear Markov model [91]. As a result, they published a series of guidelines for various factors in DBN learning. The benefits associated with using synthetic network in validating inference algorithms make it a practical and fruitful choice as surrogate of *N _{G}*.

#### 4.1.2. Choice of Criterion

When comparing two networks *N _{I}* and

*N*,

_{G}*D*(

*N*,

_{I}*N*) can take one of many possible forms. For example, the most commonly used measures $D\left({N}_{I},{N}_{G}^{P}\right)$ for structures are

_{G}*true positive*(TP),

*true negative*(TN),

*false positive*(FP),

*false negative*(FN), or some derivation from them (e.g.,

*Receiver Operating Characteristic*, or

*ROC*curves). Dougherty has proposed a list of “semi-metrics” as validation measures for goodness of the inference algorithms [92]. The list can be grown to accommodate different objectives of studies. It is important to notice that there doesn't exist a one-size-fit-all criterion that works as a universal validation measure due to two prominent reasons: (1) the goal and focus vary widely from study to study, and the criterion has to be chosen to be consistent with the objective of the study. For example, if the purpose of the experiment is to discover pairwise gene-gene interactions, then a proper measure could be to compare the difference in connections (either directed or undirected) in

*N*and

_{I}*N*. Or if the interest is in how system evolves, a trajectory-based measure may be more appropriate [92]. (2) Inference algorithms are typically designed for a certain subtype of models, which in turn are proposed for some specific aims of the study. Using a validation measure which is more in line with the same goal will inevitably bias favorably towards these methods. This makes comparing across models particularly difficult, and is part of the reason that we see very few comparative studies on inference methodology. Even when such studies are carried out, they are practically limited to one type of models [93].

_{G}### 4.2. Comparative Studies

Some efforts have been made to address the above issues. Brun *et al*. proposed a steady-state trajectory based metric between networks that is independent of nature of networks [94], hence has immediate application in comparative studies. Trajectory of each gene of the network is decomposed into a transient part and a steady-state part, the latter of which could be either constant or periodic, and assessed with an *amplitude cumulative distribution* [95]. *D*(*N _{I}*,

*N*) is therefore the average (across all genes) of distances, which are computed as some norm between

_{G}*amplitude cumulative distributions*. This metric could be useful if it is the steady-state behavior of the network that is of interest.

For comparison studies, Hartemink suggested that DBN seems to work better than Bayesian Network [2]. Note the author conceded that this is hardly a conclusion due to the “different properties” of the data used for inference. A more detailed study by Werhli *et al*. [93] compared the Relevance Network and Bayesian Network models, as well as the Graphical Gaussian Model. The last model is based on the assumption that genes are multivariate Gaussian distributed and the *partial correlation*, calculated on the estimated covariance matrix (through some stabilizing techniques), describes the correlation between two genes. In addition to a *Raf* signaling pathway protein expression data, they run the study using two synthetic data generators (one linear and the other nonlinear) so ${N}_{G}^{S}$ is known. The evaluation was carried out using 2 criteria: ROC, and comparison of TP given fixed FP value, both of which are based on directed or undirected *edges*, or connections, in the networks *N _{I}* and ${N}_{G}^{S}$. Though the edges carry different meanings in these three different network models, the validation is appropriate in a broader ‘regulation relation’ sense.

Comparing networks of different natures has been attempted by Bansal *et al*. [96]. Three inference procedures were chosen: BANJO (Bayesian network) [97], ARACHE (relevance network) [98], and NIR [99] and MNI [100] (ODE), along with hierarchical clustering and *random* inference which served as references. The choice of these methods was due partially to the availability of their software code. Data were generated from a linear ODE model, and merits of inference were evaluated on the networks using *positive predictive value* (PPV), a ratio of TP and TP+FP, and *sensitivity*, a ratio of TP and TP+FN. Note PPV and *sensitivity* are also referred to as *precision* and *recall* and used by other researchers [101, 102] . It is very interesting to notice the lackluster performances, as demonstrated by being not far from the random method, for all three inference algorithms, particularly on time-series data, using the authors' model setups.

An applaudable endeavor by the DREAM initiative team during the past couple of years allows researchers to validate their networks *N _{I}* inferred from the data, which are generated from the

*N*the team provides, in a competitive setting [102].

_{G}*N*is in the form of pairwise regulation relationships, each tagged with a confidence probability. Much like the work of Werhli

_{I}*et al*. [93], each participant's inferred network is evaluated against

*N*using the area-under-ROC-curve, but with an additional area-under-prediction-versus-recall criterion. It is worth noting that both of these scores are still structure based, and it is expected that the team will have measures that target on the dynamics of inferred networks in the future.

_{G}### 4.3. Validation by New Experiments

As a completely different paradigm, the validation can be done experimentally, and the protocol usually runs like this:

- Inference of network from experiment data;
- Prediction of certain response using the inferred model;
- Verification using new experiment.

If the confirmation from new experiment comes back as negative, the model is expected to *learn* from the error and revised. The feedback helps improving the inference procedure, but from a pure validation point of view, this protocol is particularly useful if we are interested in biological network intervention, where the exact correctness of the inferred network structure is of less importance; the ultimate criterion there is whether it can successfully predict the response given a perturbation. In practice the experimental data typically are the responses to selective or systematic perturbation (e.g., stimuli like starvation or drugs), or system behaviors after gene knockout/knock down, and are well suited for network intervention studies.

Such approach has already been taken by researchers. In the study of keratinocyte migration [25], the authors used gene *ptgs2* as migration indicator and built a *recurrent neural network* (RNN) model with nine genes to predict the migration behavior upon hepatocyte growth factor stimulus. They were able to follow it by *in vitro* experiment and the discrepancy with *in silico* predictions helped them build a second RNN model that is more consistent with experimental findings.

## 5. CONCLUSION

We have reviewed the modeling and inference of Gene Regulatory Network (GRN) from time-series data, categorized into those focused on structure, or those on both structure and dynamics, the latter of which is further bifurcated into discrete or continuous space models. The richness of proposed methods calls for comparative performance studies that can be used to establish merit of each inference procedure and appropriateness for a given application. These studies as we see are still lacking despite some modest efforts, due to the fact that ‘ground truth’ network required by various validation schemes is not readily available, and it is not always easy to find a criterion that can effectively evaluate a motley of model types which all have different design goals. On the other hand, the validation can be carried out experimentally where the requirement for ‘ground truth’ network is relaxed. This is especially useful in perturbation-response experiments, and the feedback from new experiments will aid the inferences as well. This survey gives a panoramic view on these topics, with anticipation that the readers will be inspired to improve and/or expand GRN inference and validation tool repository.

## 6. ACKNOWLEDGEMENTS

This study was funded by Translational Genomics Research Institute (TGen) institutional support (CS and JH) and partially by NIH LM009706-01 (SJ).

## REFERENCES

**Bentham Science Publishers**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (2.9M) |
- Citation

- Inference of Gene Regulatory Networks with Variable Time Delay from Time-Series Microarray Data.[IEEE/ACM Trans Comput Biol Bioinform. 2013]
*Elbakry O, Ahmad MO, Swamy MN.**IEEE/ACM Trans Comput Biol Bioinform. 2013 Jun 19; . Epub 2013 Jun 19.* - A review of modeling techniques for genetic regulatory networks.[J Med Signals Sens. 2012]
*Yaghoobi H, Haghipour S, Hamzeiy H, Asadi-Khiavi M.**J Med Signals Sens. 2012 Jan; 2(1):61-70.* - Gene regulatory network inference: data integration in dynamic models-a review.[Biosystems. 2009]
*Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R.**Biosystems. 2009 Apr; 96(1):86-103. Epub 2008 Dec 27.* - Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics.[Bioinformatics. 2009]
*Aijö T, Lähdesmäki H.**Bioinformatics. 2009 Nov 15; 25(22):2937-44. Epub 2009 Aug 25.* - Discovering functions and revealing mechanisms at molecular level from biological networks.[Proteomics. 2007]
*Zhang S, Jin G, Zhang XS, Chen L.**Proteomics. 2007 Aug; 7(16):2856-69.*

- System-wide analysis of the transcriptional network of human myelomonocytic leukemia cells predicts attractor structure and phorbol-ester-induced differentiation and dedifferentiation transitions[Scientific Reports. ]
*Sakata K, Ohyanagi H, Sato S, Nobori H, Hayashi A, Ishii H, Daub CO, Kawai J, Suzuki H, Saito T.**Scientific Reports. 58283* - Gene network inference using continuous time Bayesian networks: a comparative study and application to Th17 cell differentiation[BMC Bioinformatics. ]
*Acerbi E, Zelante T, Narang V, Stella F.**BMC Bioinformatics. 15(1)387* - Network Reconstruction Using Nonparametric Additive ODE Models[PLoS ONE. ]
*Henderson J, Michailidis G.**PLoS ONE. 9(4)e94003* - CaSPIAN: A Causal Compressive Sensing Algorithm for Discovering Directed Interactions in Gene Networks[PLoS ONE. ]
*Emad A, Milenkovic O.**PLoS ONE. 9(3)e90781* - Modeling the Drosophila Gene Cluster Regulation Network for Muscle Development[PLoS ONE. ]
*Haye A, Albert J, Rooman M.**PLoS ONE. 9(3)e90285*

- CompoundCompoundPubChem chemical compound records that cite the current articles. These references are taken from those provided on submitted PubChem chemical substance records. Multiple substance records may contribute to the PubChem compound record.
- MedGenMedGenRelated information in MedGen
- PubMedPubMedPubMed citations for these articles
- SubstanceSubstancePubChem chemical substance records that cite the current articles. These references are taken from those provided on submitted PubChem chemical substance records.

- Inference of Gene Regulatory Networks Using Time-Series Data: A SurveyInference of Gene Regulatory Networks Using Time-Series Data: A SurveyCurrent Genomics. 2009 Sep; 10(6)416

Your browsing activity is empty.

Activity recording is turned off.

See more...