# Differential dependency network analysis to identify condition-specific topological changes in biological networks

Huai Li^{2}, Rebecca B. Riggins^{3}, Ming Zhan^{2}, Jianhua Xuan^{1}, Zhen Zhang^{4}, Eric P. Hoffman^{5}, Robert Clarke^{3} and Yue Wang^{1,*}

^{1}Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203,

^{2}Bioinformatics Unit, RRB, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224,

^{3}Lombardi Comprehensive Cancer Center and Department of Oncology, Physiology and Biophysics, Georgetown University, Washington, DC 20057,

^{4}Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231 and

^{5}Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC 20010, USA

## Abstract

**Motivation:** Significant efforts have been made to acquire data under different conditions and to construct static networks that can explain various gene regulation mechanisms. However, gene regulatory networks are dynamic and condition-specific; under different conditions, networks exhibit different regulation patterns accompanied by different transcriptional network topologies. Thus, an investigation on the topological changes in transcriptional networks can facilitate the understanding of cell development or provide novel insights into the pathophysiology of certain diseases, and help identify the key genetic players that could serve as biomarkers or drug targets.

**Results:** Here, we report a differential dependency network (DDN) analysis to detect statistically significant topological changes in the transcriptional networks between two biological conditions. We propose a local dependency model to represent the local structures of a network by a set of conditional probabilities. We develop an efficient learning algorithm to learn the local dependency model using the Lasso technique. A permutation test is subsequently performed to estimate the statistical significance of each learned local structure. In testing on a simulation dataset, the proposed algorithm accurately detected all the genes with network topological changes. The method was then applied to the estrogen-dependent T-47D estrogen receptor-positive (ER+) breast cancer cell line datasets and human and mouse embryonic stem cell datasets. In both experiments using real microarray datasets, the proposed method produced biologically meaningful results. We expect DDN to emerge as an important bioinformatics tool in transcriptional network analyses. While we focus specifically on transcriptional networks, the DDN method we introduce here is generally applicable to other biological networks with similar characteristics.

**Availability:** The DDN MATLAB toolbox and experiment data are available at http://www.cbil.ece.vt.edu/software.htm.

**Contact:** yuewang@vt.edu

**Supplementary information:** Supplementary data are available at *Bioinformatics* online.

## 1 INTRODUCTION

Recent advances in high-throughput genomic technologies provide ample opportunities to study cellular activities at the individual gene expression and network levels, while also presenting new challenges for data analysis (Clarke *et al.*, 2008). Discovering the mechanisms that orchestrate the activities of genes and proteins in cells remains one of the key goals of systems biology studies (Kitano, 2002). Several approaches have been proposed to model genetic regulatory networks (Li *et al.*, 2008), such as Bayesian networks (Friedman, 2004; Friedman *et al.*, 2000; Husmeier, 2003), probabilistic Boolean networks (Shmulevich *et al.*, 2002), state–space models (Rangel *et al.*, 2004) and network component analysis (Liao *et al.*, 2003). These methods attempt to construct a static network that can explain various gene regulation programs.

However, genetic regulatory networks are context-specific and dynamic in nature (Beyer *et al.*, 2007; Clarke *et al.*, 2008). Under different conditions, different regulatory components and mechanisms are activated and the topology of the underlying gene regulatory network changes accordingly. For example, in response to diverse conditions in the yeast, transcription factors alter their interactions and rewire the signaling networks (Luscombe *et al.*, 2004). While the inference of transcriptional networks using data from composite conditions could sometimes be contradictory due to changes in the underlying topology, most network learning algorithms assume an invariant network topology (Friedman *et al.*, 2000; Rangel *et al.*, 2004; Shmulevich *et al.*, 2002). Therefore, some methods have been presented to learn condition-specific transcriptional networks in yeast (Kim *et al.*, 2006; Segal *et al.*, 2003). It is important to focus on and examine the topological changes in transcriptional networks between disease and normal conditions or under different stages of cell development. For example, a deviation from normal regulatory network topology may reveal the mechanism of pathogenesis (Hood *et al.*, 2004), and the genes that undergo the most network topological changes may serve as biomarkers or drug targets.

Several methods have been proposed to utilize network topology information to carry out various bioinformatics tasks. Liu *et al.* (2006) introduced a topology-based cancer classification method, where correlation networks were first constructed and later used to perform classification. Fuller *et al.* (2007) developed weighted gene co-expression network analysis strategies, via single network analysis and differential network analysis, to identify physiologically relevant modules. Qiu *et al.* (2005, 2007) proposed an ensemble dependence model to detect the dependence changes of gene clusters between cancer and normal conditions for cancer classification, and further extended the dependence model to dependence networks. Wei and Li (2007) introduced a Markov random field model for network-based analysis of genomic data that utilizes the known pathway structures to identify differentially expressed genes and sub-networks.

In this article, we propose a differential dependency network (DDN) analysis to model and detect the statistically significant topological changes in transcriptional networks between two conditions. We use local dependency models to characterize the dependencies of genes in the network and represent local network structures. Local dependency models decompose the whole network into a series of local networks, which serve as the basic elements of the network used for statistical testing. Unlike other dependency models that consider only pairwise relationships (Choi *et al.*, 2005; Fuller *et al.*, 2007; Kostka and Spang, 2004; Watson, 2006) or binding triples (Qiu *et al.*, 2007), the local dependency models select the number of dependent variables automatically by the Lasso method (Tibshirani, 1996), and thereby learn the local network structures. Subsequently, we perform permutation tests on the local dependency models under two conditions and assign the *P*-values to the local structures. It may seem straightforward to construct an entire network under each condition and compare the differences between the two networks (Fuller *et al.*, 2007; Qiu *et al.*, 2007). However, in realistic applications this approach runs into the difficulty that the network structure learning can be inconsistent with a limited number of data samples. The detection procedure proposed here assures the statistical significance of the detected network topological changes by performing a permutation test on individual local structures. We also pinpoint ‘hot spots’ in the network where the genes exhibit network topological changes between two conditions above a given significance level. Lastly, we extract and visualize the DDN, i.e. the sub-network showing significant topological changes. We demonstrate the usefulness of the proposed method on both simulated and real microarray data. 
Tested on a simulation dataset, the proposed algorithm accurately captured the genes with network topological changes. When applied to the estrogen-dependent T-47D estrogen receptor-positive (ER+) breast cancer cell line datasets and human and mouse embryonic stem cell (ESC) datasets, the DDN analysis obtained biologically meaningful and promising results.

## 2 METHODS

### 2.1 Local dependency models

Given a set of random variables **X**={*X*_{1},*X*_{2},…,*X*_{M}}, a dependency network for **X** is modeled by a set of local conditional probability distributions, one for each node given its parents, denoted as **Z**_{i}, which satisfies

$$P(X_i \mid \mathbf{Z}_i) = P(X_i \mid \mathbf{X}_{-i}), \tag{1}$$

where **X**_{−i}={*X*_{1},*X*_{2},…,*X*_{i−1},*X*_{i+1},…,*X*_{M}} and **Z**_{i}⊆**X**_{−i}. *P*(*X*_{i}|**Z**_{i}) also represents the local structure of node *X*_{i}, i.e. the relationship of node *X*_{i} and its parents **Z**_{i} on the graph (Heckerman *et al.*, 2000).

Inspired by this formulation, we propose a local dependency model to describe the dependencies of genes in a transcriptional network. Unlike a conventional dependency network approach, where there is only one conditional probability distribution for each node given its parents, our local dependency model allows more than one conditional probability distribution for each node. Mathematically, suppose there are *M* genes in the network of interest, and the dependencies of gene *i* on other genes are formulated by a set of conditional probabilities,

$$\mathcal{P}_i = \{P(X_i \mid \mathbf{Z}_{i,1}),\, P(X_i \mid \mathbf{Z}_{i,2}),\, \ldots,\, P(X_i \mid \mathbf{Z}_{i,s_i})\}, \tag{2}$$

where **Z**_{i,1},**Z**_{i,2},…,**Z**_{i,si} are some subsets of **X**_{−i} and *s*_{i} is the number of conditional probabilities for random variable *X*_{i}. We use *X*_{i} to refer both to the expressions of gene *i* and to its corresponding node on the graph. This modification is primarily based on the following considerations. First, our goal is not to construct the entire network that represents the full joint distribution of all variables; rather, we wish to model the local structures for further statistical testing. Second, many genes are highly correlated and the data points are very limited when extracting most biological networks. Through our experiments, we found that the conventional approach misses some meaningful dependency connections in data-sparse situations. For example, suppose regulator genes R1 and R2 have the same target gene A, and the expression patterns of R1, R2 and A are highly correlated. When the data points are few, the standard approach may only select one of the dependencies, for instance, gene A on gene R1, even though the dependency of gene A on gene R2 is only slightly less significant than the dependency of gene A on gene R1. However, the dependencies of gene A on genes R1 and R2 are both important, and we want to keep the rich structural information for the later step that assesses the topological changes. Therefore, to retain more meaningful local structure information, instead of selecting ‘the best’ local structure, we select a set of ‘sufficiently good’ local structures for further statistical testing. We achieve this goal by allowing each node to be modeled by more than one conditional probability distribution.

### 2.2 Local structure learning

The conditional probability distributions in Equation (2) can be inferred by regression methods. In our approach, we consider a linear regression model in which the variable *X*_{i} is predicted by a linear function of **Z**_{i}

$$X_i = \boldsymbol{\beta}^{T}\mathbf{Z}_i + \varepsilon_i, \tag{3}$$

where **Z**_{i}∈{**Z**_{i,1},**Z**_{i,2},…,**Z**_{i,si}} is a column vector of random variables and **β** is a column vector of unknown parameters. The random error *ε*_{i} is independent of **Z**_{i} and is assumed to have normal distribution *N*(0,σ_{i}^{2}). The local conditional probability *P*(*X*_{i}|**Z**_{i}) is given by

$$P(X_i \mid \mathbf{Z}_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\left(-\frac{(X_i - \boldsymbol{\beta}^{T}\mathbf{Z}_i)^2}{2\sigma_i^2}\right). \tag{4}$$

Learning the structure of the local dependency model requires the selection of a **Z**_{i} that shows good predictability of *X*_{i}. Given a predefined maximum size of **Z**_{i}, *K*, we examine all *C*_{M−1}^{K} combinations of the elements in **X**_{−i} with size *K*. *K* can be empirically set to a positive integer between 1 and *M*−1. When *K*=1, the proposed local dependency model only considers pairwise relationships. When *K*=*M*−1, the proposed local dependency model is equivalent to standard dependency networks as described in Equation (1) (Heckerman *et al.*, 2000).

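Suppose one *K*-combination of **X**_{−i} is {*X*_{k1},*X*_{k2},…,*X*_{kK}}, where *k*_{1},*k*_{2},…,*k*_{K}∈{1,2,…,*i*−1,*i*+1,…,*M*}, and there are *N* expression samples. Lower case letter *x*_{i}(*j*) denotes the *j*-th sample value taken by the variable *X*_{i}, *j*=1,2,…,*N*. We perform an *L*_{1} constrained regression of *X*_{i} on **Z**_{i}={*X*_{k1},*X*_{k2},…,*X*_{kK}}

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{j=1}^{N}\Big(x_i(j) - \sum_{n=1}^{K}\beta_n x_{k_n}(j)\Big)^2 \quad \text{subject to} \quad \sum_{n=1}^{K}|\beta_n| \le t. \tag{5}$$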

Equation (5) is known as the Lasso estimator (Tibshirani, 1996), which minimizes the *L*_{2} norm loss subject to a constraint on the *L*_{1} norm of **β**. The *L*_{1} constraint tends to make some coefficients in **β** exactly zero, so it automatically selects a subset of features and leads to a simpler model that avoids overfitting the data and therefore usually generalizes better. The parameter *t*≥0 controls the amount of shrinkage that is applied to the estimates. In our software implementation, parameter *t* is determined by 5-fold cross-validation. Equation (5) is a convex optimization problem that can be solved very efficiently. We adopt the least angle regression (LARS) method to solve this problem. The detailed procedure of LARS can be found in Efron *et al.* (2004).
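To make the shrinkage behavior concrete, the following is a minimal sketch of a Lasso fit in plain Python. It is not the LARS solver used in the DDN toolbox: it uses cyclic coordinate descent on the equivalent penalized form, where the penalty weight `lam` (an assumed name) plays the role of the bound *t* in Equation (5), with a larger `lam` corresponding to a smaller *t*. The function names are hypothetical, the predictors and target are assumed to be centered, and the cross-validation step for choosing the penalty is omitted.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam (the proximal step for the L1 penalty)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(Z, x, lam, n_iter=200):
    """Minimize 0.5 * sum_j (x[j] - sum_n beta[n] * Z[j][n])**2 + lam * sum_n |beta[n]|.

    Z: list of N rows, each a list of K predictor values (assumed centered).
    x: list of N target values (assumed centered).
    Returns the list of K fitted coefficients.
    """
    n_samples, n_features = len(Z), len(Z[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        for k in range(n_features):
            rho = 0.0      # correlation of predictor k with the partial residual
            norm_sq = 0.0  # squared norm of predictor k
            for j in range(n_samples):
                # Prediction that leaves out predictor k's own contribution.
                pred_without_k = sum(
                    beta[n] * Z[j][n] for n in range(n_features) if n != k
                )
                rho += Z[j][k] * (x[j] - pred_without_k)
                norm_sq += Z[j][k] ** 2
            beta[k] = soft_threshold(rho, lam) / norm_sq if norm_sq > 0 else 0.0
    return beta
```

On a toy design with two orthogonal predictors where the target equals twice the first predictor, a penalty of `lam=1.0` shrinks the first coefficient slightly below 2 and sets the second exactly to zero, which is the feature-selection effect described above.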

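We also use a prescreening strategy to relieve the computational burden. We first regress *X*_{i} on **Z**_{i}={*X*_{k1},*X*_{k2},…,*X*_{kK}}, using the ordinary least squares method

$$\hat{\boldsymbol{\beta}}_{\mathrm{OLS}} = \arg\min_{\boldsymbol{\beta}} \sum_{j=1}^{N}\Big(x_i(j) - \sum_{n=1}^{K}\beta_n x_{k_n}(j)\Big)^2. \tag{6}$$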

If the corresponding mean square error (MSE) is above a predetermined threshold *T*, which means *X*_{i} cannot be accurately predicted by the subset {*X*_{k1},*X*_{k2},…,*X*_{kK}}, the subset {*X*_{k1},*X*_{k2},…,*X*_{kK}} will be discarded. If the MSE is below *T*, we will then perform the *L*_{1} constrained regression of *X*_{i}.
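The prescreening step can be sketched as follows, assuming centered expression vectors stored one list per gene and no exactly collinear predictors within a subset. The helper names (`solve_linear_system`, `ols_mse`, `prescreen`) are hypothetical and are not taken from the DDN toolbox; the normal equations are solved with a small Gaussian elimination so that the sketch needs only the standard library.

```python
from itertools import combinations


def solve_linear_system(A, b):
    """Solve A * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta


def ols_mse(columns, target):
    """Mean square error of the least-squares fit of target on the given columns."""
    k, n = len(columns), len(target)
    gram = [[sum(columns[a][j] * columns[b][j] for j in range(n))
             for b in range(k)] for a in range(k)]
    rhs = [sum(columns[a][j] * target[j] for j in range(n)) for a in range(k)]
    beta = solve_linear_system(gram, rhs)
    residuals = [target[j] - sum(beta[a] * columns[a][j] for a in range(k))
                 for j in range(n)]
    return sum(r * r for r in residuals) / n


def prescreen(data, i, K, T):
    """Return the K-combinations of the other genes whose OLS fit of gene i has MSE below T."""
    others = [g for g in range(len(data)) if g != i]
    kept = []
    for combo in combinations(others, K):
        if ols_mse([data[g] for g in combo], data[i]) < T:
            kept.append(combo)  # only these subsets go on to the Lasso step
    return kept
```

Only the subsets returned by `prescreen` would then be passed to the *L*_{1} constrained regression, so poorly predictive subsets never reach the more expensive fit.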

We perform the above prescreening and local structure learning with the Lasso on each of the *K*-combinations of **X**_{−i}, and obtain predictor sets **Z**_{i,1},**Z**_{i,2},…,**Z**_{i,si} and the conditional probability distributions 𝒫_{i}={*P*(*X*_{i}|**Z**_{i,1}),*P*(*X*_{i}|**Z**_{i,2}),…,*P*(*X*_{i}|**Z**_{i,si})} for node *X*_{i}.

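To measure how well variables **Z**_{i} can predict *X*_{i}, or how well the local dependency model fits gene expression microarray data, we further introduce the definition of coefficient of determination (COD)

$$\mathrm{COD} = \frac{\operatorname{var}\{X_i\} - \operatorname{var}\{X_i - f_{X_i|\mathbf{Z}_i}(\mathbf{Z}_i)\}}{\operatorname{var}\{X_i\}}, \tag{7}$$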

where *var*{·} is the estimate of the variance of the random variable and *f*_{Xi|Zi} (·) is the best function in a given function class that minimizes the residual variance. COD has been successfully used in non-linear signal processing and probabilistic Boolean network inference (Shmulevich *et al.*, 2002). Here we only use linear functions, and var{*X*_{i}−*f*_{Xi|Zi}(**Z**_{i})} is an estimate of σ_{i}^{2} in Equation (4).
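As a small illustration of the COD, the sketch below computes it from a target vector and the fitted values of any predictor function. The function names are hypothetical, and the variance uses the 1/*N* estimate; because the same estimate appears in the numerator and denominator, the choice between 1/*N* and 1/(*N*−1) does not change the ratio.

```python
def variance(values):
    """Variance estimate with a 1/N normalization."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coefficient_of_determination(target, fitted):
    """Fraction of the target's variance that is removed by the fitted values."""
    total = variance(target)
    if total == 0:
        raise ValueError("target has zero variance, so the COD is undefined")
    residuals = [t - f for t, f in zip(target, fitted)]
    return (total - variance(residuals)) / total
```

A COD of 1 corresponds to a perfect fit, and a COD near 0 means the predictors explain almost none of the variance of *X*_{i}.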

### 2.3 Detection of statistically significant topological changes

To detect the statistically significant network topological changes between two experimental conditions, we assume there are *M* genes in the network of interest, and *N*_{1} samples from condition 1 and *N*_{2} samples from condition 2. We further denote the datasets from two conditions by **D**^{(m)}=[**x**^{(m)}(1),**x**^{(m)}(2),…,**x**^{(m)}(*N*_{m})], where superscript (*m*) indicates condition *m*, *m*= 1, 2. The bold face lower case letter **x**^{(m)}(*j*) denotes the column vector [*x*_{1}^{(m)}(*j*),*x*_{2}^{(m)}(*j*),…,*x*_{M}^{(m)}(*j*)]^{T}, where lower case letter *x*_{i}^{(m)}(*j*) denotes the *j*-th sample value taken by variable *X*_{i} under condition *m*.

By applying the learning procedure to datasets **D**^{(1)} and **D**^{(2)}, respectively, we obtain 𝒫_{i}^{(1)}={*P*(*X*_{i}|**Z**_{i,1}^{(1)}),*P*(*X*_{i}|**Z**_{i,2}^{(1)}),…,*P*(*X*_{i}|**Z**_{i,si(1)}^{(1)})} under condition 1 and 𝒫_{i}^{(2)}={*P*(*X*_{i}|**Z**_{i,1}^{(2)}),*P*(*X*_{i}|**Z**_{i,2}^{(2)}),…,*P*(*X*_{i}|**Z**_{i,si(2)}^{(2)})} under condition 2 for each node *i*, *i*=1,2,…,*M*. Then we take the union of the local structures learned under two conditions

$$\mathcal{P}_i = \mathcal{P}_i^{(1)} \cup \mathcal{P}_i^{(2)} \tag{8}$$

for further statistical testing.

For each conditional probability distribution in 𝒫_{i}, *i*=1,2,…,*M*, for instance, *P*(*X*_{i}|**Z**_{i})∈𝒫_{i}, we perform a permutation test to assess how significantly it is different between two conditions. Given samples {[*x*_{i}^{(1)}(*j*^{(1)}),**z**_{i}^{(1)}(*j*^{(1)})]^{T}, *j*^{(1)}=1,2,…,*N*_{1}} under the first condition and {[*x*_{i}^{(2)}(*j*^{(2)}),**z**_{i}^{(2)}(*j*^{(2)})]^{T}, *j*^{(2)}=1,2,…,*N*_{2}} under the second condition, we calculate COD^{(1)} and COD^{(2)}, using Equation (7). A test statistic is defined by the absolute difference of the coefficients of determination under two conditions

$$\hat{\theta} = \left|\mathrm{COD}^{(1)} - \mathrm{COD}^{(2)}\right|. \tag{9}$$

We want to test the null hypothesis, *H*_{0}, of no difference between *P*^{(1)}(*X*_{i}|**Z**_{i}) and *P*^{(2)}(*X*_{i}|**Z**_{i}). We first combine {[*x*_{i}^{(1)}(*j*^{(1)}),**z**_{i}^{(1)}(*j*^{(1)})]^{T}, *j*^{(1)}=1,2,…,*N*_{1}} and {[*x*_{i}^{(2)}(*j*^{(2)}),**z**_{i}^{(2)}(*j*^{(2)})]^{T}, *j*^{(2)}=1,2,…,*N*_{2}}, and then randomly permute samples from two conditions and divide the data into two sets of *N*_{1} and *N*_{2} samples, respectively. We perform the above procedure *B* times, where *B* is set to 5000 in our software implementation, and calculate $\hat{\theta}^{*}(b)$, *b*=1,2,…,*B* according to Equation (9). An estimate of the achieved significance level (ASL) of the test is

$$\widehat{\mathrm{ASL}} = \frac{1}{B}\sum_{b=1}^{B} I\big(\hat{\theta}^{*}(b) \ge \hat{\theta}\big), \tag{10}$$

where the random variable $\hat{\theta}^{*}$ is generated by permutation and $I(\cdot)$ denotes the indicator function, which takes 1 when $\hat{\theta}^{*}(b) \ge \hat{\theta}$ and 0 otherwise. The smaller the value of ASL, the stronger the evidence against *H*_{0} is. Equation (10) also is an estimate of the *P*-value. The detailed permutation procedure is described in Efron and Tibshirani (1993). This detection procedure is performed on every local structure in 𝒫_{i}, *i*=1,2,…,*M*, and each local structure is assigned a *P*-value.
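The permutation procedure can be sketched as below for the simplest case of a single predictor (*K*=1). As a simplifying assumption, the COD is computed for a least-squares line with an intercept, for which it equals the squared sample correlation. The function names and the `seed` argument are hypothetical and are not part of the DDN toolbox.

```python
import random


def cod_single_predictor(z, x):
    """COD of the least-squares line x ~ a + b*z, i.e. the squared correlation."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((v - mz) ** 2 for v in z)
    sxx = sum((v - mx) ** 2 for v in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    if szz == 0 or sxx == 0:
        return 0.0  # a constant variable explains nothing
    return szx * szx / (szz * sxx)


def permutation_asl(cond1, cond2, B=5000, seed=0):
    """Estimate the achieved significance level of |COD(1) - COD(2)|.

    cond1, cond2: lists of (z, x) sample pairs from the two conditions.
    Returns (observed statistic, estimated ASL).
    """
    def statistic(a, b):
        cod_a = cod_single_predictor([p[0] for p in a], [p[1] for p in a])
        cod_b = cod_single_predictor([p[0] for p in b], [p[1] for p in b])
        return abs(cod_a - cod_b)

    rng = random.Random(seed)
    observed = statistic(cond1, cond2)
    pooled = list(cond1) + list(cond2)
    n1 = len(cond1)
    exceed = 0
    for _ in range(B):
        rng.shuffle(pooled)  # reassign samples to the two conditions at random
        if statistic(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    return observed, exceed / B
```

When the two conditions contain identical samples the observed statistic is zero and the estimated ASL is 1, whereas a dependency that is strong in one condition and absent in the other yields an ASL close to 0.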

### 2.4 Identification of the ‘hot spots’ in the network and extraction of the DDN

Given a user-defined *P*-value cutoff, we obtain a set of statistically significant differential local structures. The nodes in these differential local structures are identified as ‘hot spots’ in the network, which are the genes undergoing topological changes defined by a specified significance level. These genes may correspond to the genes in disease- or process-related pathways.

DDN is the focused sub-network that exhibits the topological changes. We consider a connection to exist from each element in **Z**_{i} to *X*_{i} under one specific condition if the variance of *P*(*X*_{i}|**Z**_{i}) is below the user-defined threshold *T* for that condition (see Supplementary Material for discussions on the selection of *T*). We use different colors to represent connections appearing under different conditions. DDN provides a way to visualize the topological changes, and when applied to disease studies, DDN extracts and focuses on the disease-related pathways that may contribute to the understanding of the mechanism of the disease.
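The selection of hot spots and the extraction of DDN edges can be sketched as follows. The record layout (a dictionary per local structure with `target`, `predictors`, `p_value`, `var1` and `var2` keys) is an assumption made for this illustration, as is the function name; `var1` and `var2` stand for the residual variance of the local structure under conditions 1 and 2.

```python
def extract_ddn(structures, p_cutoff, T):
    """Collect hot-spot genes and condition-labelled edges from tested local structures.

    structures: list of dicts with keys 'target', 'predictors', 'p_value',
                'var1' and 'var2' (hypothetical layout).
    Returns (sorted hot-spot genes, list of (predictor, target, condition) edges).
    """
    hot_spots = set()
    edges = []
    for s in structures:
        if s["p_value"] >= p_cutoff:
            continue  # not a statistically significant differential structure
        hot_spots.add(s["target"])
        hot_spots.update(s["predictors"])
        for condition, var in ((1, s["var1"]), (2, s["var2"])):
            if var < T:  # the structure fits well under this condition
                for z in s["predictors"]:
                    edges.append((z, s["target"], condition))
    return sorted(hot_spots), edges
```

The condition label on each edge is what a plotting step would map to a color, so that connections appearing only under one condition can be distinguished visually.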

## 3 RESULTS

### 3.1 A simulation experiment

We first used the software SynTReN (Van den Bulcke *et al.*, 2006) to generate one simulation dataset of a sub-network drawn from an existing signaling network in *Saccharomyces cerevisiae*. Then we changed part of network topology and used SynTReN to generate another dataset according to this modified network. The network topology under two conditions is shown in Figure 1. The network contains 20 nodes that represent 20 genes. The black lines indicate the regulatory relationships that exist under both conditions. The red and green lines are the regulatory relationships that only exist under conditions 1 and 2, respectively. The sub-network comprised of nodes MBP1_SWI6, CLB5, CLB6, PHO2, FLO1, FLO10 and TRP4 and green and red lines is the DDN that our algorithm tries to identify from expression data.

**...**

The parameters for our algorithm are: threshold *T* is 0.25, *P*-value cutoff is 0.01 and the maximum size of **Z**_{i}, *K*, is 2. Table 1 presents the ‘hot spots’ identified by the DDN analysis. Table 1 also shows the fold-changes of individual genes (after base 2 logarithm), and *P*-values of *t*-tests of individual genes. Our algorithm picked up all genes involved in topological changes, including some genes that did not show a significant difference in fold-change or *t*-tests, such as CLB6, FLO1 and MBP1_SWI6. This indicates that our algorithm can successfully detect these interesting genes using their topological information, even though the means of their expressions did not change substantially between the two conditions. Therefore, this method is able to identify biomarkers that cannot be picked up by traditional gene ranking methods, providing a complementary approach to the biomarker identification problem.

Figure 2 shows the DDN between the two conditions extracted by the proposed algorithm. The DDN shows network topological changes and the genes involved therein. The red lines in Figure 2 represent the connections that exist only under condition 1, and the green lines represent the connections that exist only under condition 2. Compared with the known network topology shown in Figure 1, the proposed algorithm correctly identified and extracted all the nodes with topology changes and 9 of 10 differential connections, with only the connection between PHO2 and TRP4 under condition 1 falsely missed, and the connection between PHO2 and SWI4 under condition 1 and the connection between MBP1_SWI6 and SWI4 under condition 2 falsely detected.

### 3.2 Breast cancer dataset analysis

We applied our method to the dataset from an ER+ breast cancer cell study by Lin *et al.* (2004). In that dataset, the estrogen-dependent T-47D ER+ breast cancer cell line was treated with 17β-estradiol (E2) and with E2 in combination with the pure anti-estrogen ICI 182 780 (ICI, Faslodex, Fulvestrant). Samples were then harvested on an hourly basis for the first 8 h (0–8 h) and bi-hourly for the next 16 h (10–24 h) for a total of 16 time points under each condition. Experiments were performed on microarrays generated by spotting the Compugen 19 K human oligo library, made by Sigma-Genosys, on poly-L-lysine-coated glass slides.

Here we are interested in the cellular response to the drug ICI, which inhibits E2 signaling through the ER (Howell, 2006). We first selected 55 genes that are reported in the literature to be relevant to breast cancer and responsiveness to ICI (for example, Kuo, 2007; Riggins *et al.*, 2005, 2007). We then applied our DDN analysis to the data under two conditions (E2 versus E2+ICI). The parameters in our algorithm are: threshold *T* is 0.25, *P*-value cutoff is 0.01 and *K* is 2.

Table 2 lists the genes that exhibit significant topological changes in the network identified by DDN analysis. The DDN under these two conditions is shown in Figure 3. The genes identified by the proposed algorithm and their expression results (Table 2) are consistent with published data. For example, XBP1 and BCL2 show strongly decreased expression in response to E2+ICI relative to E2 alone, and both of these genes are known to be induced by E2 (Gompel *et al.*, 2000; Tozlu *et al.*, 2006; Wang *et al.*, 2004).

**...**

In Figure 3, there are 18 red connections in the DDN, which implies that these connections exist only under E2 condition and disappear after the addition of ICI. Since ICI 182 780 is an ER antagonist, which works both by downregulating and by degrading the ER-alpha, it is plausible that these connections disappear because ICI is blocking or inactivating their connections. For example, as a transcription factor, XBP1 can directly regulate gene expression through binding to its response element (Iwakoshi *et al.*, 2003), or it can act as a co-regulator of other transcription factors, most notably ER-alpha, to enhance their transcriptional activity (Ding *et al.*, 2003; Fang *et al.*, 2004). Because BCL2 contains response elements for both ER-alpha and XBP1 (Gomez *et al.*, 2007; Somai *et al.*, 2003), the connection between XBP1 and BCL2 in the DDN may either be direct or involve ER-alpha as a latent variable, or intervening gene. In direct support of this predicted edge, we have shown that constitutive overexpression of XBP1 in a different breast cancer cell line (MCF-7) led to significantly increased mRNA and protein expression of both ER-alpha and BCL2 and functionally conferred antiestrogen resistance and estrogen-independence (Gomez *et al.*, 2007).

Novel relationships between these genes identified by our DDN analysis will also serve as useful guidance for future studies. For example, BCAR3 is a well-established effector of cell motility, estrogen independence and antiestrogen resistance in ER+ breast cancer cell lines (Felekkis *et al.*, 2005; Riggins *et al.*, 2003; Schrecengost *et al.*, 2007; Van Agthoven *et al.*, 2006). Expression of NFKB2 and its activator BCL3 is also associated with estrogen independence in breast cancer cell lines (Pratt *et al.*, 2003), and these nuclear factor κ B subunits appear to be selectively activated in clinical breast cancer (Pratt *et al.*, 2003). However, there is no experimental evidence linking BCAR3 with NFKB2, so the suggestion that these two genes exhibit differential dependence under E2-treated conditions (Fig. 3) provides a starting point for biological studies of their relationship.

Additional relationships that may be completely new to breast cancer are also identified by this method. For example, MAPK8 (also known as JNK1) has been shown to be activated by BIRC1 (also known as NAIP) during its inhibition of caspase-mediated cell death (Sanna *et al.*, 2002). In chronic fatigue syndrome, growth factor receptor signaling can activate MAPK4, which via Ras and/or PI3K can subsequently increase AKT1 activity (Englebienne and Meirleir, 2002). And finally, in B cells from patients with chronic lymphocytic leukemia NFKB1 (p50) homodimers are able to stimulate transcription from the BCL2 promoter through binding to another member of the BCL family (BCL3) (Viatour *et al.*, 2003).

### 3.3 Human and mouse ESC analysis

ESCs can either maintain their pluripotency by self-renewal or undergo differentiation. The molecular mechanisms controlling ESC self-renewal and differentiation are complex and poorly understood (Sun *et al.*, 2006; Zhan, 2008). ESCs harvested from different species show common characteristics, yet significant differences exist. Thus, cross-species analysis may help to distinguish between fundamental and species-specific mechanisms regulating ESC development (Sun *et al.*, 2007; Zhan *et al.*, 2005). Network biology can provide a new avenue for exploring ESC biology (Barabasi and Oltvai, 2004). Here, we used our new algorithm to conduct a human–mouse comparative analysis of ESCs, identifying evolutionarily divergent sub-networks. We focused our analysis on the cell cycle, a critical process for controlling cell development. In this study, 58 cell-cycle genes were selected for the DDN analysis. The 58 genes are the core components of the cell cycle machinery, and are orthologous between human and mouse cells. The expression profile data for these genes were determined from 18 samples from human ESCs and their earliest differentiation counterparts, embryoid bodies (EBs) and 18 samples from mouse ESCs and EBs, so that our inferred networks were directly related to ESC differentiation. The human ESC and EB expression data were determined from BG01, BG02 and BG03 cell lines in our previous studies using Illumina's BeadArrays (Liu *et al.*, 2006), and from H1 (Sato *et al.*, 2003) and HES2 (E-MEXP-303 of the ArrayExpress database) cell lines using Affymetrix chips. The mouse ESC and EB expression data were determined from V6.5 (GSE3231 of GEO database), R1 (GSE2972) and J1 (GSE3749) cell lines, based on Affymetrix chips.
The final datasets contained 9 ESC and 9 EB (14-day differentiated) samples from human and mouse cells, respectively. In the network analysis, we set *K* to 1, and threshold *T* to 0.2 and *P*-value cutoff to 0.01.

Figure 4 shows the DDN of the cell cycle between human and mouse cells (see Supplementary Material for gene annotations). The red lines represent the gene connections in human, and the green lines represent the connections in mouse. As shown, CDC25C, DUSP1 and BUB1 exhibit high connectivity in the network of human cells. On the other hand, PLK1, CDK2AP1, CDC20, TFDP1 and CDC5L show high connectivity in the network of mouse cells. These results suggest evolutionary divergence across species during ESC development and may provide clues to the species-specific mechanisms by which the cell cycle controls ESC self-renewal and differentiation.

## 4 DISCUSSIONS

In this article, we propose a systematic approach to detect the statistically significant changes in transcriptional networks between two different experimental conditions. We tested our algorithm on simulation data, breast cancer data and ESC data. From the simulation study, we see that the proposed algorithm can capture the topological changes efficiently and accurately, even when the fold change of the expression values of each gene is not statistically significant. This approach utilizes the network structure information and provides an alternative way for biomarker identification. In addition, as knowledge of cellular networks accumulates, many biological databases will expand to contain more useful information. The proposed approach is an open framework, into which biological knowledge in specific applications can be easily incorporated as the local structure learning constraints.

The high level of correlation among genes is a common feature of microarray data. Therefore, we propose a local dependency model that allows multiple predictor sets for each node. Accordingly, a local structure learning algorithm is also presented. Lasso is used to select features for the predictor sets (Tibshirani, 1996), an approach that has been successfully applied to variable selection and graph structure learning (Meinshausen and Buhlmann, 2006). In the linear Gaussian case, under certain conditions, it is proved that the probability of estimating the correct neighborhood converges exponentially to 1, and as a consequence it is possible to obtain a consistent estimation of the full edge set (Meinshausen and Buhlmann, 2006). However, in microarray data, the so-called irrepresentable condition (Zhao and Yu, 2006) or the neighborhood stability assumption (Meinshausen and Buhlmann, 2006) can easily be violated in the presence of highly correlated genes. Some modified algorithms have been proposed to deal with the highly correlated cases, for example, elastic net (Zou and Hastie, 2005) and network-constrained regularization (Li and Li, 2008), both of which tend to group highly correlated predictors in the regression process. However, these two approaches are not suitable for our problem, because the grouping of highly correlated variables can be different under two conditions and this makes the later statistical testing problematic. The local structure learning algorithm proposed here attempts to alleviate the effects of the highly correlated gene expression data and to preserve local structure information for further statistical testing.

Some issues are worth further exploration. First, only linear relationships are considered in this article; how non-linear relationships should be modeled efficiently and correctly remains a difficult problem. Second, since many cellular reactions take place in the genome, transcriptome and proteome, it is essential to construct pathways by integrating data from heterogeneous sources.

In sum, this article presents a new approach to extract knowledge of a biological network by emphasizing the dynamic nature of cellular networks and utilizing a network's structural information. It also provides an alternative and promising approach to identify possible biomarkers and drug targets.

*Funding*: National Institutes of Health (CA109872, EB000830, CA096483, CA86323 and NS29525, partial); Department of Defense Breast Cancer Research Program BC030280. IRP/NIA/NIH (to H.L. and M.Z.).

*Conflict of Interest*: none declared.

## REFERENCES

- Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 2004;5:101–113.
- Beyer A, et al. Integrating physical and genetic maps: from genomes to interaction networks. Nat. Rev. Genet. 2007;8:699–710.
- Choi JK, et al. Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics. 2005;21:4348–4355.
- Clarke R, et al. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat. Rev. Cancer. 2008;8:37–49.
- Ding LH, et al. Ligand-independent activation of estrogen receptor alpha by XBP-1. Nucleic Acids Res. 2003;31:5266–5274.
- Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.
- Efron B, et al. Least angle regression. Ann. Stat. 2004;32:407–451.
- Englebienne P, Meirleir K. Chronic Fatigue Syndrome: A Biological Approach. Boca Raton: CRC Press; 2002.
- Fang Y, et al. XBP-1 increases ER alpha transcriptional activity through regulation of large-scale chromatin unfolding. Biochem. Biophys. Res. Commun. 2004;323:269–274.
- Felekkis KN, et al. AND-34 activates phosphatidylinositol 3-kinase and induces anti-estrogen resistance in a SH2 and GDP exchange factor-like domain-dependent manner. Mol. Cancer Res. 2005;3:32–41.
- Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303:799–805.
- Friedman N, et al. Using Bayesian networks to analyze expression data. J. Comput. Biol. 2000;7:601–620.
- Fuller TF, et al. Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm. Genome. 2007;18:463–472.
- Gomez BP, et al. Human X-Box binding protein-1 confers both estrogen independence and antiestrogen resistance in breast cancer cell lines. FASEB J. 2007;21:4013–4027.
- Gompel A, et al. Hormonal regulation of apoptosis in breast cells and tissues. Steroids. 2000;65:593–598.
- Heckerman D, et al. Dependency networks for inference, collaborative filtering, and data visualization. J. Mach. Learn. Res. 2000;1:49–75.
- Hood L, et al. Systems biology and new technologies enable predictive and preventative medicine. Science. 2004;306:640–643.
- Howell A. Pure oestrogen antagonists for the treatment of advanced breast cancer. Endocr. Relat. Cancer. 2006;13:689–706.
- Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003;19:2271–2282.
- Iwakoshi NN, et al. The X-box binding protein-1 transcription factor is required for plasma cell differentiation and the unfolded protein response. Immunol. Rev. 2003;194:29–38.
- Kim H, et al. Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae. BMC Bioinformatics. 2006;7:165.
- Kitano H. Systems biology: a brief overview. Science. 2002;295:1662–1664.
- Kostka D, Spang R. Finding disease specific alterations in the co-expression of genes. Bioinformatics. 2004;20:i194–i199.
- Kuo MT. Breast Cancer Chemosensitivity. Berlin: Springer; 2007. Roles of multidrug resistance genes in breast cancer chemoresistance; pp. 23–30.
- Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008;24:1175–1182.
- Li H, et al. Inferring regulatory networks. Front. Biosci. 2008;13:263–275.
- Liao JC, et al. Network component analysis: Reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA. 2003;100:15522–15527.
- Lin CY, et al. Discovery of estrogen receptor alpha target genes and response elements in breast tumor cells. Genome Biol. 2004;5:R66.
- Liu CC, et al. Topology-based cancer classification and related pathway mining using microarray data. Nucleic Acids Res. 2006;34:4069–4080.
- Liu Y, et al. Genome wide profiling of human embryonic stem cells (hESCs), their derivatives and embryonal carcinoma cells to develop base profiles of U.S. Federal government approved hESC lines. BMC Dev. Biol. 2006;6:20.
- Luscombe NM, et al. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004;431:308–312.
- Meinshausen N, Buhlmann P. High-dimensional graphs and variable selection with the Lasso. Ann. Stat. 2006;34:1436–1462.
- Pratt MAC, et al. Estrogen withdrawal-induced NF-kappa B activity and Bcl-3 expression in breast cancer cells: roles in growth and hormone independence. Mol. Cell. Biol. 2003;23:6887–6900.
- Qiu P, et al. Ensemble dependence model for classification and prediction of cancer and normal gene expression data. Bioinformatics. 2005;21:3114–3121.
- Qiu P, et al. Dependence network modeling for biomarker identification. Bioinformatics. 2007;23:198–206.
- Rangel C, et al. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics. 2004;20:1361–1372.
- Riggins RB, et al. Synergistic promotion of c-Src activation and cell migration by Cas and AND-34/BCAR3. J. Biol. Chem. 2003;278:28264–28273.
- Riggins RB, et al. Vitamins and Hormones - Advances in Research and Applications. Vol. 71. San Diego: Elsevier Academic Press Inc; 2005. Antiestrogens, aromatase inhibitors, and apoptosis in breast cancer; pp. 201–237.
- Riggins RB, et al. Pathways to tamoxifen resistance. Cancer Lett. 2007;256:1–24.
- Sanna MG, et al. IAP suppression of apoptosis involves distinct mechanisms: the TAK1/JNK1 signaling cascade and caspase inhibition. Mol. Cell. Biol. 2002;22:1754–1766.
- Sato N, et al. Molecular signature of human embryonic stem cells and its comparison with the mouse. Dev. Biol. 2003;260:404–413.
- Schrecengost RS, et al. Breast cancer antiestrogen resistance-3 expression regulates breast cancer cell migration through promotion of p130(Cas) membrane localization and membrane ruffling. Cancer Res. 2007;67:6174–6182.
- Segal E, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 2003;34:166–176.
- Shmulevich I, et al. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics. 2002;18:261–274.
- Somai S, et al. Antiestrogens are pro-apoptotic in normal human breast epithelial cells. Int. J. Cancer. 2003;105:607–612.
- Sun Y, et al. Mechanisms controlling embryonic stem cell self-renewal and differentiation. Crit. Rev. Eukaryot. Gene Expr. 2006;16:211–231.
- Sun Y, et al. Cross-species transcriptional profiles establish a functional portrait of embryonic stem cells. Genomics. 2007;89:22–35.
- Tibshirani R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 1996;58:267–288.
- Tozlu S, et al. Identification of novel genes that co-cluster with estrogen receptor alpha in breast tumor biopsy specimens, using a large-scale real-time reverse transcription-PCR approach. Endocr. Relat. Cancer. 2006;13:1109–1120.
- Van Agthoven T, et al. Functional identification of genes causing estrogen independence. Breast Cancer Res. Treat. 2006;100:S37–S37.
- Van den Bulcke T, et al. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics. 2006;7:43.
- Viatour P, et al. NF-kappa B2/p100 induces Bcl-2 expression. Leukemia. 2003;17:1349–1356.
- Wang DY, et al. Identification of estrogen-responsive genes by complementary deoxyribonucleic acid microarray and characterization of a novel early estrogen-induced gene: EEIG1. Mol. Endocrinol. 2004;18:402–411.
- Watson M. CoXpress: differential co-expression in gene expression data. BMC Bioinformatics. 2006;7:509.
- Wei Z, Li H. A Markov random field model for network-based analysis of genomic data. Bioinformatics. 2007;23:1537–1544.
- Zhan M. Genomic studies to explore self-renewal and differentiation properties of embryonic stem cells. Front. Biosci. 2008;13:276–283.
- Zhan M, et al. Conservation and variation of gene regulation in embryonic stem cells assessed by comparative genomics. Cell Biochem. Biophys. 2005;43:379–405.
- Zhao P, Yu B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006;7:2541–2563.
- Zou H, Hastie T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005;67:301–320.

**Oxford University Press**

Differential dependency network analysis to identify condition-specific topological changes in biological networks. Bioinformatics. Feb 15, 2009; 25(4):526.