PLoS One. 2010; 5(11): e13580.
Published online Nov 5, 2010. doi:  10.1371/journal.pone.0013580
PMCID: PMC2974640

A Scale-Free Structure Prior for Graphical Models with Applications in Functional Genomics

Joel S. Bader, Editor

Abstract

The problem of reconstructing large-scale gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade, with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called structure prior: a probability distribution on networks, encoding a priori biological knowledge either in the form of supplemental data or high-level topological features. A key topological consideration is that a wide range of cellular networks are approximately scale-free, meaning that the fraction, $P(k)$, of nodes in a network with degree $k$ is roughly described by a power-law $P(k) \propto k^{-\gamma}$ with exponent $\gamma$ between $2$ and $3$. The standard practice, however, is to utilize a random structure prior, which favors networks with binomial degree distributions. In this paper, we introduce a scale-free structure prior for graphical models based on the formula for the probability of a network under a simple scale-free network model. Unlike the random structure prior, its scale-free counterpart requires a node labeling as a parameter. In order to use this prior for large-scale network inference, we design a novel Metropolis-Hastings sampler for graphical models that includes a node labeling as a state space variable. In a simulation study, we demonstrate that the scale-free structure prior outperforms the random structure prior at recovering scale-free networks while at the same time retaining the ability to recover random networks. We then estimate a gene association network from gene expression data taken from a breast cancer tumor study, showing that the scale-free structure prior recovers hubs, including the previously unknown hub SLC39A6, which is a zinc transporter that has been implicated in the spread of breast cancer to the lymph nodes. Our analysis of the breast cancer expression data underscores the value of the scale-free structure prior as an instrument to aid in the identification of candidate hub genes with the potential to direct the hypotheses of molecular biologists, and thus drive future experiments.

Introduction

Gene Regulatory Networks and Gene Expression Data

Knowledge of the interactions among genes and gene products that occur within a cell is vital for understanding cellular behavior. These activities are largely a consequence of gene expression, the process whereby genes transcribe signature mRNA molecules that are translated into gene products of numerous kinds and functions. As it happens, genes do not express independently of one another; instead, their activities are coordinated in a complex system of control in which distinguished genes, called transcription factors, regulate the expression of other genes via their gene product proxies.

An undirected network $G$ is a mathematical object consisting of a set of nodes and a set of unordered pairs of nodes called undirected edges. It differs from a directed network, which is also denoted by $G$, in that the latter is defined in terms of ordered pairs of nodes known as directed edges. Applying these straightforward abstractions to cellular processes has gained currency throughout the biosciences, so much so that a network mind-set has become a necessary precondition for thinking about systems of gene regulatory interactions. For the purposes of this paper, a gene regulatory network is a directed network in which genes are identified with nodes and regulatory interactions with directed edges. From a purely statistical standpoint, it is best to regard a gene regulatory network as a convenient depiction of the true regulatory interactions of a system that, in reality, must be estimated from data.

Indeed, the network approach toward understanding gene regulatory systems only came to prominence in response to the advent of DNA microarray technology, which makes the profiling of mRNA expression levels for individual genes possible on a genome-wide scale. A typical experiment consists of a library of $n$ expression profiles, each one a snapshot of the expression levels for $p$ genes under a different experimental condition. The raw expression profile data is preprocessed and then arranged by row in an $n \times p$ data matrix, $X$. In practice, not only is gene expression data notoriously noisy [1], but to make matters worse the number of samples is typically at least an order of magnitude smaller than the number of genes, that is, $n \ll p$ (the “small $n$, large $p$” problem), making the inference of regulatory interactions a challenging statistical problem [2].

There is now an extensive repertoire of algorithms available for the analysis of gene expression data, the majority of which are based on Boolean networks, differential equations, and graphical models [3]. Some approaches produce estimated gene regulatory networks that are directed networks, while others do not. In this paper, we work primarily with a variety of undirected graphical model known as a gene association network (GAN), in which undirected edges, called gene associations, correspond to certain statistical dependencies that are inferred from gene expression data. Therefore, in an effort to simplify the terminology, the terms “network” and “edge” will be used hereafter to mean undirected network and undirected edge, respectively. We will nonetheless occasionally use the term “network” in a colloquial sense, such as in “network mind-set” or “network approach”; the meaning should be clear from context.

Graphical Models

Graphical models [4], [5], [6] are a suite of probabilistic models, widely used for estimating large-scale gene regulatory networks from gene expression data [7]. In this framework, genes are identified with the random variables of a multivariate distribution with covariance matrix $\Sigma$, and each row of the data matrix $X$ is taken as a random sample from that distribution. The conditional independence structure of the distribution defines a network with the random variables as nodes and the conditional dependencies latent between the random variables as directed or undirected edges; a diversity of models arises from the extent to which the dependencies are resolved [8].

Relevance networks comprise the simplest class of graphical model, with absent edges corresponding to marginal independencies between the random variables. These networks have long been used in the analysis of genetic data [9]. But in terms of identifying regulatory interactions, relevance networks are bound to be misleading because marginal independence alone cannot discriminate between direct and indirect dependencies.

GANs provide a better alternative, circumventing this drawback by appealing to conditional independence as a criterion for edge exclusion. Gaussian graphical models (GGMs) are the gold standard. In a GGM, a pair of nodes do not share an edge when their underlying random variables are conditionally independent given all of the remaining random variables. However, GGMs too are not without disadvantages, as their estimation can be computationally intensive in a “small $n$, large $p$” setting [10]. A class of GANs, bridging the gap between relevance networks and GGMs, has been advanced with this consideration in mind, in which absent edges are identified with lower order conditional independencies [11], [12], [13].

Lastly, Bayesian networks are a variety of graphical model founded on a more refined notion of conditional independence, conferring directionality to the edges; they are also well-established as a methodology for estimating gene regulatory networks [14].

The Structure Prior

Inference within the graphical modeling paradigm amounts to an often painstaking exercise in covariance estimation and model selection. We defer a discussion on the problem of covariance estimation to the Methods section. That is because our interest pertains to model selection, which in a Bayesian setting is accomplished by sampling from the posterior distribution

$$P(G \mid X) \propto P(X \mid G)\, P(G) \qquad (1)$$

over the appropriate space of networks using either heuristic searches or else Markov chain Monte Carlo (MCMC). The term $P(X \mid G)$ is the likelihood and $P(G)$ the structure prior, that is, a prior assigning a probability to each possible network.
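To make the interplay of the two factors in (1) concrete, the following minimal Python sketch computes the acceptance probability of a single Metropolis-Hastings move between two networks. It assumes a symmetric proposal and takes the log-likelihood and log-prior of each network as given numbers; the function name and its arguments are hypothetical and are not the sampler developed in this paper.

```python
import math

def acceptance_probability(log_lik_new, log_prior_new, log_lik_old, log_prior_old):
    """Metropolis-Hastings acceptance probability, assuming a symmetric proposal.

    The unnormalized log posterior of a network is its log-likelihood plus its
    log-prior, so the normalizing constant of the posterior cancels in the ratio.
    """
    log_ratio = (log_lik_new + log_prior_new) - (log_lik_old + log_prior_old)
    # Moves to a network with higher posterior probability are always accepted.
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)
```

Working in log space avoids numerical underflow, which matters because the likelihood of a large network is typically a very small number. Changing the structure prior changes only the two log-prior arguments.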

The role of the structure prior is to direct inference toward graphical models consistent with biological prior knowledge, which may come in the form of a priori topological considerations or from a posteriori sources apart from the dataset. As far as the latter is concerned, previous research has concentrated on Bayesian networks [15], [16], [17], [18]. On the other hand, biologically-motivated topological assumptions are a consistent feature of graphical models tailored for genetic data. Heuristic search strategies often include implicit assumptions concerning network sparsity [19], [20], [21], [22]. In instances in which the structure prior is given explicit specification, standard practices include using a uniform prior capped at a small number of potential regulators per gene [23], or assigning it as a sparse random network [24], [25].

Random and Scale-Free Networks

The theory of random networks was given its first systematic expression by Erdős and Rényi [26], [27]. According to the theory, a $p$-node random network is defined by an eponymous generating algorithm, the Erdős-Rényi (ER) model, that works by connecting each pair of nodes in an initially empty network independently with a fixed probability. This simple procedure gives rise to a probability distribution over the space of $p$-node networks, which is used to define the so-called random structure prior. The degree distribution $P(k)$, the probability that a given node is connected to $k$ other nodes, of a random network is binomial, where the degree, $k$, of a node denotes the number of edges incident upon it. It follows, therefore, that degree in a random network has a strong central tendency, implying that the average degree of a random network is representative of the degree of a typical node.
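The ER construction described above can be sketched in a few lines of Python. The function and argument names (`sample_er_network`, `log_random_prior`, `binomial_degree_pmf`, `num_nodes`, `edge_prob`) are illustrative assumptions rather than notation from this paper. The sketch samples a network, evaluates the log-probability that the ER model assigns to it, and gives the binomial probability that a node has a given degree.

```python
import itertools
import math
import random

def sample_er_network(num_nodes, edge_prob, rng):
    """Connect each unordered pair of nodes independently with probability edge_prob."""
    return {
        pair
        for pair in itertools.combinations(range(num_nodes), 2)
        if rng.random() < edge_prob
    }

def log_random_prior(edges, num_nodes, edge_prob):
    """Log-probability of one specific network under the ER model."""
    num_pairs = num_nodes * (num_nodes - 1) // 2
    num_edges = len(edges)
    return (num_edges * math.log(edge_prob)
            + (num_pairs - num_edges) * math.log(1.0 - edge_prob))

def binomial_degree_pmf(k, num_nodes, edge_prob):
    """Probability that a given node has degree k: Binomial(num_nodes - 1, edge_prob)."""
    return (math.comb(num_nodes - 1, k)
            * edge_prob ** k
            * (1.0 - edge_prob) ** (num_nodes - 1 - k))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
edges = sample_er_network(50, 0.1, rng)
print(len(edges), log_random_prior(edges, 50, 0.1))
```

Because every pair is treated identically, the probability of a network depends only on its number of edges, which is why no node labeling is needed here.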

Empirical studies, however, have firmly established that a wide variety of large-scale networks in nature, society, and technology exhibit heavy-tailed degree distributions that cannot be accounted for by random network theory [28], [29], [30]. This property is often approximately described by a power-law degree distribution, $P(k) \propto k^{-\gamma}$, over a large range of $k$ with exponent $\gamma$ typically between $2$ and $3$. A network that follows a power-law is called scale-free. It gets this name because the functional form of $P(k)$ is retained under a scaling of the argument $k$ by a constant factor $c$: $P(ck) \propto P(k)$. The scale-free property is thought to be a key organizational feature of cellular networks [31], and analyses suggest such an architecture for the gene regulatory networks of the model organisms S. cerevisiae [32] and C. elegans [33].
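The scale-invariance property can be verified in one line. Suppose, for illustration, that the power-law is written with an explicit normalizing constant $A$ (a symbol introduced here, not taken from the paper) and that $k$ is rescaled by a constant factor $c > 0$:

```latex
P(k) = A\,k^{-\gamma}
\quad\Longrightarrow\quad
P(ck) = A\,(ck)^{-\gamma} = c^{-\gamma}\,A\,k^{-\gamma} = c^{-\gamma}\,P(k) \;\propto\; P(k).
```

Rescaling the degree therefore changes only the multiplicative constant and leaves the exponent $\gamma$ untouched.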

Introducing a Scale-Free Structure Prior

Proposing a structure prior which incorporates the scale-free property is the topic of this paper. We define the scale-free structure prior according to the probability of a network under a simple, scale-free network model. As for the underlying network model itself, a multitude of candidates have been proposed in the literature [34]. They fall into two broad categories: 1) growing models, where a network is generated via the successive addition of nodes and edges to a small seed network, and 2) non-growing models, where the number of nodes is fixed and pairs of nodes are chosen randomly and connected by edges.

The growing model approach employs a handful of simple universal mechanisms, thought to underpin disparate natural phenomena, to drive the stochastic evolution of networks toward power-laws. Preferential attachment is, perhaps, the best known mechanism. The idea works something like this: the probability of attaching an edge from a newly added node to a node already in the system is roughly proportional to the degree of the old node. The Barabási-Albert (BA) model [35] is the latter-day progenitor of a wide variety of preferential attachment models. The BA model generates a network in exactly this growing fashion: at each step, a node is added to the system with a fixed number of emanating edges, which are subsequently preferentially attached to the existing nodes. The resulting network follows a power-law with $\gamma = 3$ on average.
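The preferential attachment mechanism can be illustrated with a simplified Python sketch. It is not the exact BA algorithm of [35]: the complete seed network, the requirement that each new node attach to distinct existing nodes, and all names (`barabasi_albert`, `edges_per_node`) are assumptions made here for brevity.

```python
import random

def barabasi_albert(num_nodes, edges_per_node, rng):
    """Grow a network by preferential attachment (a simplified BA-style sketch).

    Starts from a complete seed network on edges_per_node + 1 nodes. Each new
    node attaches to edges_per_node distinct existing nodes, each drawn with
    probability proportional to its current degree.
    """
    seed_size = edges_per_node + 1
    edges = {(i, j) for i in range(seed_size) for j in range(i + 1, seed_size)}
    # Every node appears in this list once per incident edge, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    endpoints = [node for edge in sorted(edges) for node in edge]
    for new_node in range(seed_size, num_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for old_node in sorted(targets):
            edges.add((old_node, new_node))
            endpoints.extend((old_node, new_node))
    return edges

network = barabasi_albert(100, 2, random.Random(0))
print(len(network))
```

The probability of the final network under such a model depends on the order in which nodes arrived, which is one reason the probability of a network is awkward to compute for growing models.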

Preferential attachment is not considered to be the main driving force behind genome evolution; instead, gene (node) duplication and point mutations (edge dynamics) play the dominant role in the shaping of gene regulatory networks [36]. The duplication model as formulated by [37] is such a network model, which an analysis by [38] suggests approximately follows a power-law.

By contrast, in the non-growing approach, each node is assigned a fixed weight with the probability of a particular network depending on those weights. The ER model is an example of a non-growing model with uniform weights. Another non-growing model is the static model [39], which is a generalization of the ER model that has been shown to follow a power-law with $\gamma$ tunable to any value greater than $2$, depending on the specification of the model parameters; see Methods for details. We use the static model to define the scale-free structure prior. Indeed, this model is an appealing candidate for the purpose as the probability of a network is easy to compute compared with growing models of similar complexity. Moreover, the static model actually includes the ER model as a limiting case.

A New Metropolis-Hastings Sampler for Networks

We implement an MCMC algorithm with the scale-free structure prior for GGMs adapted from [25], although it is important to point out that our methodology applies to graphical models in general. Reworking the algorithm is not simply a matter of plugging in a formula for the prior, because the prior depends furthermore on a labeling of the nodes of $G$. Confronted with this complication, we design a novel Metropolis-Hastings sampler that solves the problem by including a node labeling, which is defined in the Methods section, as a variable in the state space, thereby allowing it to be estimated.

Summary of Contributions

In this paper, we advance a scale-free structure prior for graphical models defined by the formula for the probability of a network under the static model. Our objective is to compare the performance of this prior with the commonly used random structure prior, in the arena of simulation as well as with a real data example. We choose GGMs for this purpose, modifying the MCMC algorithm of [25] to include the scale-free structure prior. As mentioned above, one challenge of implementing the scale-free structure prior is that, unlike the random structure prior, it requires a labeling of the nodes of $G$. We address this issue by introducing a Metropolis-Hastings sampler that includes the node labeling as a variable in the state space.

In a simulation study, we generate networks with given degree distributions together with Gaussian data in accordance with their implied conditional independence structures. As a case study we show that the scale-free and random structure priors are equally effective at recovering a random network, but that the random structure prior is comparatively ineffective at recovering a scale-free network. In the full simulation study, we confirm that the aforementioned result holds, illustrating our main conclusion: the scale-free structure prior recovers random networks on an equal footing with the random structure prior, yet surpasses it in recovering scale-free networks. Finally, we illustrate our methodology by analyzing a real gene expression dataset taken from a breast cancer tumor study by [40], showing that in contrast with the random structure prior, the scale-free structure prior recovers hubs, including the estrogen regulator FOXA1 and the zinc transporter SLC39A6, which was previously unrecognized as a hub.

Methods

Network Notation

The terms “network” and “graph” are used synonymously throughout this paper. An undirected network $G = (V, E)$ is a mathematical object defined by a set of nodes $V$ together with a set of undirected edges $E$ consisting of unordered pairs $\{u, v\}$ taken from $V$, provided that $u \neq v$. The set of all $p$-node, undirected networks is denoted by $\mathcal{G}_p$. A directed network is defined in an analogous manner, save that the elements of $E$ are ordered pairs $(u, v)$ called directed edges; $u$ is called the parent and $v$ the child.

It should be understood that a network refers to an undirected network, and likewise an edge is to be understood to mean an undirected edge. However, the following definitions are applicable to both undirected and directed networks. An empty network has no edges, that is $E = \emptyset$, while in a complete network $E$ is defined as the cross product $V \times V$. A subnetwork of $G$ is a network whose node set $V'$ is a subset of $V$, and whose edges are a subset of $E$ restricted to $V'$. The subnetwork of $G$ induced by a given subset of nodes $V'$ is the subnetwork containing all edges from $E$ that connect nodes in $V'$. Two nodes are said to be neighbors when they are connected by an edge, and a network is itself connected when every pair of nodes is connected by a sequence of neighbors. Finally, a node labeling is a permutation of the integers $1, \ldots, p$, applied to the nodes of $G$ so that each node is represented by a distinct integer; see Figure 1. This node labeling is used later for defining the structure prior.

Figure 1
An example of a node labeling.

Gaussian Graphical Models

In this section we sketch out the theory of GGMs essential to this paper. A detailed account of the GGM estimation procedures outlined here is given in [25], while [8] is a good starting point for understanding the niche they occupy in the larger context of graphical models.

Let $x = (x_1, \ldots, x_p)$ be a $p$-dimensional Gaussian random vector with zero mean and positive definite covariance matrix, $\Sigma$. Two random variables $x_i$ and $x_j$ are not conditionally independent given the remaining variables in $x$ if, and only if, there is a corresponding nonzero entry in the precision matrix, $\Omega = \Sigma^{-1}$ [41]. The conditional independence structure of $x$ can be represented by a network, $G$, where $x_i$ is the value at node $i$ and there is an edge between $i$ and $j$ when $x_i$ and $x_j$ are not conditionally independent. A GGM for $x$ is the family of $p$-dimensional Gaussian distributions $N(0, \Sigma)$ constrained by the structure of $G$.
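The correspondence between zeros of the precision matrix and absent edges can be illustrated with a small, exact example in Python. The three-variable chain, the numerical entries, and the helper names (`invert`, `edges_from_precision`) are assumptions chosen for illustration. The point of the sketch is that the two end variables of the chain share no edge, yet their marginal covariance is nonzero, which is exactly the situation in which a relevance network would report a spurious association.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a small square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def edges_from_precision(precision):
    """Return the pairs (i, j), i < j, whose precision-matrix entry is nonzero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if precision[i][j] != 0]

# Precision matrix of a three-variable chain 0 - 1 - 2 (no direct 0 - 2 edge).
omega = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
sigma = invert(omega)

print(edges_from_precision(omega))  # edges of the GGM
print(sigma[0][2])                  # marginal covariance between variables 0 and 2
```

Exact rational arithmetic is used only so that the zero pattern is not blurred by rounding; with real data the entries are estimates and zeros must be identified statistically.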

Fitting a GGM to a given dataset, a task known as covariance selection, amounts to identifying zeros in the estimated precision matrix. In the classical setting when $n > p$, ensuring that the estimated covariance matrix $\hat{\Sigma}$ is positive definite, this is typically accomplished by inverting the estimated covariance matrix and then applying statistical tests to identify any entries significantly different from zero [4]. With genomic data, however, “small $n$, large $p$” is the norm and, consequently, $\hat{\Sigma}$ will not generally be invertible.
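The loss of invertibility can be seen directly on toy data. The sketch below, written in Python with exact rational arithmetic, computes the usual sample covariance matrix for three variables, first from two samples and then from five. The data values and function names are invented for illustration.

```python
from fractions import Fraction

def sample_covariance(rows):
    """Unbiased sample covariance matrix of a data matrix given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [Fraction(sum(row[j] for row in rows), n) for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

few_samples = [[1, 4, 2], [3, 1, 5]]                                    # 2 samples, 3 variables
more_samples = [[1, 4, 2], [3, 1, 5], [2, 2, 9], [6, 3, 1], [4, 7, 3]]  # 5 samples, 3 variables

print(det3(sample_covariance(few_samples)))   # zero: the matrix cannot be inverted
print(det3(sample_covariance(more_samples)))  # positive for this toy data
```

After centering, the rows of the data matrix span at most one dimension fewer than the number of samples, so the sample covariance matrix is necessarily singular whenever the number of samples does not exceed the number of variables.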

This problem can be addressed in one of two ways. One way calls for restricting inference to pairwise independencies conditioned on fewer than all $p - 2$ remaining random variables. A relevance network, for example, is constructed by estimating the pairwise correlations between all random variables, connecting any pair with correlation exceeding a specified cutoff value [9]. A related approach goes one step beyond a relevance network by estimating a GAN based on not only marginal but also first-order conditional independencies [11].
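The relevance network construction just described can be sketched as follows in Python. The toy expression values, the cutoff of 0.8, and the choice to threshold the absolute value of the Pearson correlation are assumptions made here for illustration; they are not prescribed by [9].

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def relevance_network(columns, cutoff):
    """Connect every pair of variables whose absolute correlation exceeds cutoff."""
    p = len(columns)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pearson(columns[i], columns[j])) > cutoff]

# Toy expression data: one list per gene, one entry per sample.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],  # closely tracks gene 0
    [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to the others
]
print(relevance_network(genes, cutoff=0.8))
```

Only pairwise quantities are computed, so the procedure remains well defined however few samples are available, at the cost of being unable to separate direct from indirect dependencies.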

A more ambitious approach is to compute satisfactory small sample estimates for $\Sigma$ and $\Omega$ using Bayesian methods. Empirical Bayesian solutions are exemplified by shrinkage estimates [21] and sparsity encouraging lasso regression estimates [22]. Meanwhile, the full Bayesian scheme of [42] works by marginalizing over $\Sigma$ to compute the likelihood term in (1), using a prior that constrains elements of the precision matrix to zero depending on $G$:

$$P(X \mid G) = \int P(X \mid \Sigma, G)\, P(\Sigma \mid G)\, d\Sigma \qquad (2)$$

The term $P(X \mid \Sigma, G)$ is multivariate Gaussian, while the prior $P(\Sigma \mid G)$ is hyper-inverse Wishart with hyperparameters $\Phi$, a positive definite dispersion matrix, and $\delta$, a degrees of freedom parameter. Jones et al. [25] advise a small value for $\delta$ as a reflection of ignorance, and take $\Phi$ as the diagonal matrix $\tau I$, which assumes that the underlying Gaussian variables have common variance. A consequence of this assignment is that $\tau$ can be specified by making use of the marginal prior mode for each variance term, which depends only on $\tau$ and $\delta$.

GGM theory comes equipped with powerful techniques for computing the likelihood function when the underlying network is decomposable. Roughly speaking, a decomposable network can be broken down into distinguished subsets of nodes called maximal cliques. A clique is a subset of nodes whose induced subgraph is complete, and is called maximal when it is not contained within a larger complete subgraph. Computing the likelihood for a subset of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e138.jpg corresponding to a maximal clique is particularly tractable because the density is just an unrestricted multivariate Gaussian. Hence, when a network is decomposable, the evaluation of the likelihood term in (1) reduces to the computation of many likelihoods of smaller dimension [43]. We will return to these issues in the section on our MCMC implementation.
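
For reference, the reduction mentioned above rests on the standard factorization of a decomposable model over its maximal cliques and the separators between them. The notation below is assumed for this illustration: $X$ for the data, $\Sigma$ for the covariance matrix, $G$ for the network, $\mathcal{C}$ for the set of maximal cliques, and $\mathcal{S}$ for the collection of separators (counted with multiplicity) in a perfect ordering of the cliques.

```latex
p(X \mid \Sigma, G)
  \;=\;
  \frac{\prod_{C \in \mathcal{C}} p(X_C \mid \Sigma_C)}
       {\prod_{S \in \mathcal{S}} p(X_S \mid \Sigma_S)}
```

Each factor is an unrestricted multivariate Gaussian density on the variables of a single clique or separator, which is why the computation splits into many low-dimensional pieces.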

The Static Model

A network model is a stochastic algorithm for generating networks that may depend on a vector of parameters, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e139.jpg. Associated with any model is a probability distribution, assigning a probability An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e140.jpg to each An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e141.jpg, where An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e142.jpg is a node labeling of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e143.jpg.

The static network model [39] works by first assigning a weight An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e144.jpg to each node An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e145.jpg where An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e146.jpg, the Zipf exponent, is a tunable parameter in An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e147.jpg. To generate a network, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e148.jpg, the following step is repeated An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e149.jpg (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e150.jpg) times: select nodes An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e151.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e152.jpg with probabilities An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e153.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e154.jpg and connect them with an edge, unless An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e155.jpg or An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e156.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e157.jpg are already connected, in which case no edge is added to the network. The overall model parameter is An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e158.jpg.
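
The generation step can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: it assumes Zipf weights proportional to $i^{-\alpha}$ normalized to sum to one, and the names `zipf_weights`, `static_model`, and `num_steps` are hypothetical.

```python
import random

def zipf_weights(n, alpha):
    """Normalized weights proportional to i**(-alpha) for i = 1..n (assumed form)."""
    raw = [i ** (-alpha) for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def static_model(n, num_steps, alpha, rng):
    """Generate a network by repeating the pair-selection step num_steps times."""
    weights = zipf_weights(n, alpha)
    nodes = list(range(1, n + 1))
    edges = set()
    for _ in range(num_steps):
        # Draw two nodes independently, each according to its weight.
        i, j = rng.choices(nodes, weights=weights, k=2)
        if i == j:
            continue  # same node drawn twice: no edge is added
        # Storing edges in a set means an already connected pair adds nothing.
        edges.add((min(i, j), max(i, j)))
    return edges

rng = random.Random(0)
network = static_model(n=50, num_steps=100, alpha=0.75, rng=rng)
```

Because rejected draws are not repeated, the final network can contain fewer edges than the number of steps.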

In order to work out the functional form of the degree distribution, it is enough to notice that, on average, each node acquires edges in proportion to its weight. Supposing for a moment that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e159.jpg denotes the degree of node An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e160.jpg, we may write this as An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e161.jpg. The probability distribution over the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e162.jpg's is known as Zipf's law, and it has been shown to be equivalent to a power-law degree distribution with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e163.jpg [44]. It follows that the static model generates networks that follow a power-law with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e164.jpg depending on the choice of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e165.jpg. A rigorous derivation of the power-law appears in a comprehensive analysis of the static model by Lee et al. [45]. In the case when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e166.jpg, the exponent, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e167.jpg, lies between An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e168.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e169.jpg, which is the most interesting range of values from the point of view of scale-free architecture. In contrast, for values of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e170.jpg, which corresponds to An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e171.jpg, the tail of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e172.jpg is less pronounced. In the limit of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e173.jpg, or equivalently An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e174.jpg, each weight An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e175.jpg tends to An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e176.jpg, resulting in the ER model with edge inclusion probability An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e177.jpg. To be clear, the static model actually includes the ER model as a special case.
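
The correspondence between the Zipf exponent and the power-law exponent can be made concrete with a short heuristic argument. The symbols below are assumed for this illustration: $d(i)$ is the expected degree of the node of rank $i$, $\alpha$ is the Zipf exponent, $N$ is the number of nodes, and $\gamma$ is the power-law exponent. The argument treats degrees as deterministic functions of rank and is a sketch, not the rigorous derivation of [45].

```latex
% Expected degree decays with rank:
d(i) \propto i^{-\alpha}
\quad\Longrightarrow\quad
i(d) \propto d^{-1/\alpha}.

% Nodes with degree at least d are exactly those of rank at most i(d):
N \cdot P(D \ge d) \;=\; i(d) \;\propto\; d^{-1/\alpha}.

% Differentiating the tail gives the degree distribution:
P(d) \;\propto\; d^{-(1 + 1/\alpha)},
\qquad\text{so}\qquad
\gamma = 1 + \frac{1}{\alpha}.
```

Under this relation, $\alpha \in (1/2, 1)$ gives $2 < \gamma < 3$, $\alpha < 1/2$ gives $\gamma > 3$, and $\alpha \to 0$ sends $\gamma \to \infty$, in line with the regimes described above.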

A formula for the probability of a network is provided in the same analysis. The probability that nodes An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e178.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e179.jpg are connected in the final network is An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e180.jpg, which is well-approximated by An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e181.jpg when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e182.jpg is large. The probability of a network, then, is given by the overall product of the edge inclusion probabilities

equation image
(3)

assuming independence.
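
To make the product form in (3) concrete, the sketch below computes the log-probability of a network given a node labeling. It assumes that the inclusion probability of a pair is the chance that the pair is drawn at least once during the generation process, with one draw selecting the unordered pair with probability $2 p_i p_j$, and that each absent edge contributes the complementary factor. The function and parameter names are hypothetical.

```python
import math

def edge_probability(p_i, p_j, num_steps):
    """Chance that the pair is selected at least once in num_steps draws."""
    return 1.0 - (1.0 - 2.0 * p_i * p_j) ** num_steps

def log_network_probability(edges, weights, labeling, num_steps):
    """Sum of log f over edges and log(1 - f) over non-edges.

    edges: set of (u, v) node pairs with u < v, nodes numbered 0..n-1.
    weights: weights[r] is the weight attached to rank r.
    labeling: labeling[u] is the rank assigned to node u.
    """
    n = len(labeling)
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            f = edge_probability(weights[labeling[u]], weights[labeling[v]], num_steps)
            total += math.log(f) if (u, v) in edges else math.log(1.0 - f)
    return total

# Toy example: node 0 is a small hub with two neighbours.
weights = [0.4, 0.3, 0.2, 0.1]
edges = {(0, 1), (0, 2)}
log_hub_heavy = log_network_probability(edges, weights, [0, 1, 2, 3], num_steps=3)
log_hub_light = log_network_probability(edges, weights, [3, 2, 1, 0], num_steps=3)
```

In this toy example the labeling that assigns the largest weight to the hub yields the higher probability, which illustrates why the node labeling matters once the weights are unequal.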

A Scale-Free Structure Prior

The structure prior is generically defined as

equation image
(4)

where An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e185.jpg is the probability of a An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e186.jpg-node network, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e187.jpg, under a certain network model given a vector of parameters, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e188.jpg, and a node labeling, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e189.jpg; the summation is over all permutations of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e190.jpg. It is obvious from the definition that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e191.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e192.jpg are hyperparameters and must be dealt with accordingly. In the case of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e193.jpg, each one of the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e194.jpg possible node labeling assignments is allotted uniform weight. In our work, we additionally impose that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e195.jpg is a uniform prior, leaving the details to be described below within the contexts of specific network models.

The simplest means of dealing with uncertainty about graphical structures is to assign uniform weight either to each An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e196.jpg, that is, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e197.jpg, or to the subspace of decomposable networks [43]. This approach is in fact a special case of the probability of a network under the ER network model when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e198.jpg. A related approach uses a prior that distributes probability mass uniformly according to the number of edges as opposed to individual networks [42]. More recently, the ER model has been explicitly employed as a structure prior [24], [25]. The random structure prior, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e199.jpg, is formally defined by

equation image
(5)

where An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e201.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e202.jpg is the number of possible edges. A node labeling would be superfluous due to symmetry. To foster sparsity, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e203.jpg may be fixed at An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e204.jpg so that the expected number of edges comes out to be An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e205.jpg when (5) is taken over all networks, although strictly speaking the value will be somewhat lower in the decomposable case [25]. The approach taken in this paper goes one step further by simply taking An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e206.jpg as uniform over the unit interval.
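
A minimal sketch of the random structure prior follows, assuming the usual ER form in which each of the $T = N(N-1)/2$ possible edges is included independently with probability $\beta$. The symbols $N$, $T$, and $\beta$ and the function names are assumptions made for this illustration.

```python
import math

def log_random_structure_prior(num_edges, n, beta):
    """Log of beta**|E| * (1 - beta)**(T - |E|), with T = n(n-1)/2 possible edges."""
    t = n * (n - 1) // 2
    return num_edges * math.log(beta) + (t - num_edges) * math.log(1.0 - beta)

def expected_edges(n, beta):
    """Expected edge count when the prior is taken over all networks."""
    return beta * n * (n - 1) / 2

# With beta = 1/2 every network on n nodes receives the same prior mass.
flat_sparse = log_random_structure_prior(0, 5, 0.5)
flat_dense = log_random_structure_prior(7, 5, 0.5)
```

Smaller values of the inclusion probability shift prior mass toward sparse networks, since each additional edge then multiplies the prior by a factor below one.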

As explained above, the static model with parameter An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e207.jpg is a generalization of the ER model that is accommodating to scale-free topologies. We define the scale-free structure prior, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e208.jpg, according to the probability of a network under (3). The static model has two parameters, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e209.jpg, and they are not exactly independent as the domain of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e210.jpg is a function of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e211.jpg. This means that the prior An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e212.jpg must actually be treated as the product of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e213.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e214.jpg. We take each term to be uniform over its respective domain.

MCMC Implementation

MCMC algorithms are commonly used for sampling from high-dimensional probability distributions such as those encountered in modern bioinformatic applications [46], [47], [48]. In this section, we describe a Metropolis-Hastings sampling scheme for updating the state variables An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e215.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e216.jpg, and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e217.jpg. Our main interest is in inference for the posterior An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e218.jpg. We take the approach of estimating the target distribution

equation image
(6)

with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e220.jpg and then marginalize over An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e221.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e222.jpg to obtain An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e223.jpg. In the process, any An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e224.jpg (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e225.jpg) or An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e226.jpg (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e227.jpg) can be estimated from a histogram of values, constructed from an MCMC chain. While methodology for sampling from An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e228.jpg is well-established for GGMs [43], [42], [25], the concept of including An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e229.jpg as a state space variable is new to our work. In principle, it is possible to marginalize over all permutations of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e230.jpg at each step, that is, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e231.jpg. This approach, however, quickly becomes infeasible as the number of nodes becomes large. What is more, very few assignments for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e232.jpg actually capture scale-free network structure, making the marginalization difficult to estimate by random sampling. Instead, we include An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e233.jpg in the MCMC directly. We describe a Metropolis-Hastings sampler for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e234.jpg below and provide a C implementation for decomposable GGMs, built largely on the work of [25].

Metropolis-Hastings Sampler

Updating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e235.jpg

The space of decomposable graphs can be traversed by adding or deleting a single edge in the transition from a current network, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e236.jpg, to a proposed network, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e237.jpg [49]. In an arrangement of this sort, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e238.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e239.jpg will have nearly identical maximal cliques, leading to extensive cancellation in the likelihood ratio An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e240.jpg [42]. This, coupled with the closed-form expressions for (2) in the decomposable case, results in considerable computational savings in comparison with the same computations for non-decomposable models. However, in the transition from An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e241.jpg to An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e242.jpg, special care is required to preserve decomposability. To that end, a theorem of [43] provides easily verifiable necessary and sufficient conditions for determining whether or not a network is decomposable. In the implementation of [25], a transition is accomplished by first deciding whether to add an edge to, or delete an edge from, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e243.jpg by the flip of a coin. Next, the appropriate move is made at random to obtain An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e244.jpg as shown in Figure 2. If An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e245.jpg happens to be non-decomposable, then it is rejected outright.

Figure 2
An example of the Metropolis-Hastings transition step.

Updating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e263.jpg

Each hyperparameter An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e264.jpg (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e265.jpg) is updated as follows: select a value for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e266.jpg uniformly from An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e267.jpg for a given step size An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e268.jpg, rejecting when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e269.jpg falls outside its domain An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e270.jpg.
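
The hyperparameter update can be sketched as a random-walk Metropolis step. The sketch assumes that the proposal window is symmetric about the current value with half-width equal to the step size, so that the proposal densities cancel in the acceptance ratio. The function name, the argument names, and the toy target are hypothetical.

```python
import math
import random

def update_hyperparameter(theta, log_target, step, lower, upper, rng):
    """One Metropolis step with a symmetric uniform proposal window."""
    proposal = rng.uniform(theta - step, theta + step)
    if not (lower <= proposal <= upper):
        return theta  # proposal falls outside the domain: reject
    log_ratio = log_target(proposal) - log_target(theta)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal  # accept
    return theta  # reject

# Toy target on [0, 1] (hypothetical): a bump centred at 0.3.
def log_target(t):
    return -((t - 0.3) ** 2) / 0.02

rng = random.Random(1)
theta, chain = 0.5, []
for _ in range(5000):
    theta = update_hyperparameter(theta, log_target, 0.2, 0.0, 1.0, rng)
    chain.append(theta)
```

Rejecting out-of-domain proposals outright is equivalent to assigning them zero target density, so the chain never leaves the domain.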

Updating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e271.jpg

In order to obtain An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e272.jpg we select an integer An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e273.jpg at random, find nodes An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e274.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e275.jpg such that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e276.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e277.jpg, and then exchange the values of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e278.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e279.jpg; see Figure 2.
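
One plausible reading of this move is an exchange of two adjacent ranks in the labeling. The sketch below implements that reading; both the adjacency assumption and the names used are illustrative rather than taken from the paper, and the accept/reject decision that would follow the proposal is omitted.

```python
import random

def propose_label_swap(labeling, rng):
    """Return a copy of labeling with two adjacent ranks exchanged.

    labeling[u] is the rank (0..n-1) currently assigned to node u.
    """
    n = len(labeling)
    k = rng.randrange(n - 1)   # ranks k and k + 1 will be exchanged
    i = labeling.index(k)      # node currently holding rank k
    j = labeling.index(k + 1)  # node currently holding rank k + 1
    proposed = list(labeling)
    proposed[i], proposed[j] = proposed[j], proposed[i]
    return proposed

rng = random.Random(0)
labeling = [2, 0, 3, 1]
proposed = propose_label_swap(labeling, rng)
```

A swap proposal of this kind is symmetric, since the reverse swap is proposed with the same probability, so the Metropolis-Hastings acceptance ratio reduces to the ratio of target values.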

Network and Parameter Estimation

Estimating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e280.jpg

An MCMC sample of the posterior (1) becomes increasingly sparse as the number of variables grows, so much so that the frequency of a network in a chain is an inadequate approximation to its true probability, even for problems of moderate dimension. The same holds for the maximum posterior network, the single most probable network in a chain: unless its probability mass dominates a possibly multi-modal landscape comprising an astronomically large number of alternative models, its status as a representative estimator is questionable [50]. This concern is even more pressing in our implementation, as we carry the model parameters through the computation. A more representative estimator can be obtained from the marginal probabilities of edge inclusion, which do reflect posterior density. We took our estimated network to be the network of all edges in the sample with marginal probability greater than An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e281.jpg, which we denote by An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e282.jpg; the subscript An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e283.jpg denotes the structure prior.
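
The edge-inclusion estimator can be illustrated as follows. The sketch estimates each marginal probability by the fraction of sampled networks containing the edge and keeps the edges above a cutoff. The cutoff is left as a parameter, and the function name and toy chain are hypothetical.

```python
from collections import Counter

def edge_inclusion_network(sampled_networks, threshold):
    """Keep edges whose frequency across the sample exceeds threshold.

    sampled_networks: list of edge sets, one per recorded MCMC step.
    """
    counts = Counter(edge for network in sampled_networks for edge in network)
    total = len(sampled_networks)
    return {edge for edge, c in counts.items() if c / total > threshold}

# Toy chain of four sampled networks (hypothetical).
chain = [
    {(0, 1), (1, 2)},
    {(0, 1)},
    {(0, 1), (2, 3)},
    {(0, 1), (1, 2)},
]
estimate = edge_inclusion_network(chain, threshold=0.5)
```

In this toy chain no single network appears more than twice, yet the edge (0, 1) appears in every sample, which is the kind of information that network frequencies alone fail to convey.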

Estimating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e284.jpg

Let An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e285.jpg denote the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e286.jpg'th value of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e287.jpg (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e288.jpg) in an MCMC chain of length An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e289.jpg. An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e290.jpg is estimated by averaging over the values in an MCMC sample so that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e291.jpg.

Results

Simulation Design

We carried out a simulation study in order to evaluate the relative performance of the random and scale-free structure priors. In our experiments, we generated trees endowed with a variety of degree distributions that can be thought of as falling along a spectrum ranging from binomial to scale-free on through to more extreme heavy-tail forms, called crumple trees, culminating finally with a star tree. For each tree, we generated multivariate Gaussian data under the assumption that a tree represents the true underlying conditional independence structure of a GGM. We then ran our Metropolis-Hastings sampler for both structure priors in an effort to recover each true tree from the data.

Data generation

In order to simulate trees, we largely relied on the stochastic algorithm of [51]. Their approach rests on specifying a formula for the degree distribution, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e292.jpg, for a An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e293.jpg-node connected tree. Then, roughly speaking, they use MCMC to draw a tree that is maximally random under An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e294.jpg.

The reason for restricting our simulation to trees is that data satisfying their implied conditional independence structures can be generated by a simple iterative procedure. With this end in mind, it is convenient to imagine the edges as being directed according to index so that an edge from An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e295.jpg to An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e296.jpg implies that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e297.jpg. The procedure begins with simulating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e298.jpg, which is identified with node An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e299.jpg, as a standard normal random variable, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e300.jpg. Next, any An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e301.jpg corresponding to a child of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e302.jpg is simulated as An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e303.jpg. The step An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e304.jpg is then repeated from parent An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e305.jpg to child An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e306.jpg until all nodes have been reached. The scaling factor, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e307.jpg, ensures that each An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e308.jpg has unit variance.
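
The iterative procedure can be sketched as below. The recursion used here, in which a child equals the sum of its parent and independent standard normal noise scaled by $1/\sqrt{2}$, is an assumption chosen so that every variable has unit variance. The names and the example tree are hypothetical.

```python
import math
import random

def simulate_tree_data(parent, num_obs, rng):
    """Simulate Gaussian data whose dependence structure follows a rooted tree.

    parent[j] is the index of the parent of node j, with parent[j] < j;
    parent[0] is None because node 0 is the root.
    """
    p = len(parent)
    data = []
    for _ in range(num_obs):
        x = [0.0] * p
        x[0] = rng.gauss(0.0, 1.0)  # root: standard normal
        for j in range(1, p):
            noise = rng.gauss(0.0, 1.0)
            # Assumed recursion: Var(x_j) = (1 + 1) / 2 = 1.
            x[j] = (x[parent[j]] + noise) / math.sqrt(2.0)
        data.append(x)
    return data

rng = random.Random(0)
parent = [None, 0, 0, 1]  # edges 0-1, 0-2, 1-3
data = simulate_tree_data(parent, 20000, rng)
```

Processing nodes in index order works because every edge points from a lower index to a higher one, so each parent is simulated before its children.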

Performance measures

Let $TP$ (true positive) denote the number of edges correctly identified by the estimated network, with $FP$ (false positive), $FN$ (false negative), and $TN$ (true negative) defined similarly. The positive predictive value, $TP/(TP+FP)$, and the sensitivity, $TP/(TP+FN)$, are reported for each estimated network. While it is often customary to include specificity, $TN/(TN+FP)$, along with the positive predictive value and the sensitivity, its conspicuous absence here is for good reason. Since GANs are sparse, $TN$ is sure to be very large in comparison to $FP$. As a result, even a moderate change to $FP$ will have little influence on the specificity, making this an unsuitable measure of performance.
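
The argument against specificity can be checked numerically. The sketch below uses the standard definitions of the three measures. The 100-node path network and the two hypothetical estimates are constructed purely for illustration.

```python
def confusion_counts(true_edges, estimated_edges, n):
    """Return (TP, FP, FN, TN) over all n(n-1)/2 node pairs."""
    tp = len(true_edges & estimated_edges)
    fp = len(estimated_edges - true_edges)
    fn = len(true_edges - estimated_edges)
    tn = n * (n - 1) // 2 - tp - fp - fn
    return tp, fp, fn, tn

def ppv(tp, fp):
    return tp / (tp + fp)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

n = 100
true_edges = {(i, i + 1) for i in range(n - 1)}        # a path: 99 edges
correct = {(i, i + 1) for i in range(80)}              # 80 true edges found
estimate_a = correct | {(0, j) for j in range(2, 22)}  # plus 20 false edges
estimate_b = correct | {(0, j) for j in range(2, 42)}  # plus 40 false edges

tp_a, fp_a, fn_a, tn_a = confusion_counts(true_edges, estimate_a, n)
tp_b, fp_b, fn_b, tn_b = confusion_counts(true_edges, estimate_b, n)
```

Doubling the number of false positives lowers the positive predictive value from 0.8 to about 0.67, while the specificity stays above 0.99 in both cases.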

Simulated Example

This section serves as a prelude to an extensive simulation study, illustrating our methodology by means of a simple example. Specifically, we set An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e321.jpg and generated a binomial tree and a scale-free tree with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e322.jpg, and then simulated an An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e323.jpg observation dataset from each. In each case, we attempted to recover the true tree from the scaled dataset using our Metropolis-Hastings sampler implemented with 1) An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e324.jpg, and 2) An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e325.jpg.

For each chain, the Metropolis-Hastings sampler was run for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e326.jpg steps after a burn-in of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e327.jpg, starting from the empty network and identity node labeling. The value for the step size, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e328.jpg, required for updating the hyperparameters was set to An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e329.jpg with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e330.jpg. As for the hyper-inverse Wishart parameters, we chose An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e331.jpg which fixes An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e332.jpg at An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e333.jpg since the data was standardized. The values of the hyperparameters were recorded at every An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e334.jpg'th step after burn-in. The runtime for the Metropolis-Hastings sampler with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e335.jpg on a dual An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e336.jpg GHz PowerPC G5 processor was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e337.jpg hrs for the binomial tree and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e338.jpg hrs for the scale-free tree. The corresponding runtimes with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e339.jpg were An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e340.jpg hrs and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e341.jpg hrs.

The results of the case study are shown in Table 1. In this experiment, our expectation that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e342.jpg will recover the scale-free tree more accurately than An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e343.jpg is confirmed. It should also be noted that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e344.jpg was able to recover a reasonable value for the scale-free exponent as well. Moreover, it recovered the binomial tree on par with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e345.jpg, thereby allaying the concern that it would infer a heavy-tailed network even from binomial data. This can be explained by the rather large value of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e346.jpg. Recall that when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e347.jpg is large, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e348.jpg actually approximates An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e349.jpg. And although it may seem odd that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e350.jpg fared slightly better on the binomial tree, the disparity falls within the boundaries of sampling variation. More precisely, we ran the Metropolis-Hastings sampler 10 times for each structure prior, starting each run from a different random seed, and found that the standard deviation of the sensitivity was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e351.jpg in each case. Finally, we ran the uniform structure prior on both trees, but decided against including the results in Table 1 due to very poor performance.
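The run-to-run comparison described above can be illustrated with a minimal sketch. It assumes that sensitivity means the fraction of true edges present in the estimated network and that each run yields a list of undirected edges; the function names are hypothetical and are not taken from the authors' software.

```python
from statistics import mean, stdev


def edge_set(edges):
    """Normalize undirected edges so that (u, v) and (v, u) compare equal."""
    return {frozenset(e) for e in edges}


def sensitivity(true_edges, est_edges):
    """Fraction of true edges that also appear in the estimated network."""
    true_set, est_set = edge_set(true_edges), edge_set(est_edges)
    if not true_set:
        raise ValueError("true network has no edges")
    return len(true_set & est_set) / len(true_set)


def summarize_runs(true_edges, runs):
    """Mean and sample standard deviation of sensitivity over independent runs."""
    values = [sensitivity(true_edges, run) for run in runs]
    return mean(values), stdev(values)
```

A difference in mean sensitivity between two priors that is no larger than this standard deviation is consistent with sampling variation, which is the argument made in the text.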

Table 1
Case study.

Extended Simulation

Table 2 contains the results of our main simulation. In the previous section, we focused on two particular trees: one binomial, the other scale-free. This time we generated An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e367.jpg trees (An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e368.jpg) under each model listed in the table together with accompanying datasets of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e369.jpg observations. The models listed as scale-free (excluding the BA model) and the crumpled model were generated from a two-parameter family of distributions [51]. The parameter setting for generating the crumpled trees was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e370.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e371.jpg. The simulation settings used for each MCMC run are identical to those of the case study. Finally, the values of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e372.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e373.jpg, and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e374.jpg reported in the table are averaged over the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e375.jpg chains. The simulation was run on the Tsubame supercomputer [52]. The system has a total of 639 Sun Fire X4600 nodes, each with 8 dual-core AMD Opteron model 880 CPUs running at 2.4 GHz.

Table 2
Full simulation.

Just as with the simulated example, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e391.jpg recovers the binomial trees as well as An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e392.jpg. In fact, the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e393.jpg agreed to two decimal places, while the An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e394.jpg was actually a little higher under the scale-free structure prior. This slight discrepancy can be accounted for by noting that the standard deviation of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e395.jpg was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e396.jpg for both priors. Also as expected, the more heavy-tailed the underlying trees become, the more An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e397.jpg outperforms An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e398.jpg. The difference becomes huge in the extreme case of a star tree. Moreover, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e399.jpg demonstrated the ability to roughly recover the scale-free exponent of the underlying tree.

Real Data Example

We demonstrate our methodology on a subset of the gene expression data from a breast cancer study by [40] that was originally analyzed in [25]. The dataset (Dataset S1) consists of expression profiles for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e400.jpg genes related to the estrogen receptor gene ESR1 (also known as ER-alpha) derived from An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e401.jpg tumor samples. This gene is an estrogen-activated transcription factor key to the proliferation of cancerous cells that is found to be overexpressed in luminal type A and B breast cancers. The overall level of ESR1 expression is higher in type A than in type B with the former correlating with better prognosis [53].

The Metropolis-Hastings sampler was run on the standardized data with both the random structure prior and its scale-free counterpart, yielding the corresponding GANs An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e402.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e403.jpg. For comparison's sake, the edge inclusion threshold, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e404.jpg, was tuned for each run so that the resulting GAN comprised exactly An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e405.jpg edges; the value of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e406.jpg is An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e407.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e408.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e409.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e410.jpg. In both cases, the Metropolis-Hastings sampler was started from the empty network with identity node labeling and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e411.jpg iterations were run with the first An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e412.jpg discarded as burn-in. The hyperparameter assignments were identical to those of the simulated examples. The runtime on a dual An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e413.jpg GHz PowerPC G5 processor was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e414.jpg hrs with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e415.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e416.jpg hrs with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e417.jpg.
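Tuning the edge inclusion threshold to obtain a fixed number of edges can be sketched as follows. This is an illustration only: it assumes each post burn-in MCMC sample is stored as a list of undirected edges, that an edge is included when its frequency is at least the threshold, and that ties at the cutoff are reported rather than broken. The function names are hypothetical.

```python
from collections import Counter


def edge_frequencies(samples, burn_in):
    """Fraction of post burn-in samples in which each undirected edge occurs."""
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter()
    for graph in kept:
        # Deduplicate within a sample so each edge counts once per sample.
        counts.update({frozenset(e) for e in graph})
    return {edge: c / len(kept) for edge, c in counts.items()}


def threshold_for_edge_count(freq, k):
    """Return (threshold, edges) so that exactly k edges have frequency >= threshold."""
    ranked = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    if not 0 < k <= len(ranked):
        raise ValueError("k must be between 1 and the number of observed edges")
    cutoff = ranked[k - 1][1]
    if k < len(ranked) and ranked[k][1] == cutoff:
        raise ValueError("tie at the cutoff: exactly k edges is not achievable")
    return cutoff, [edge for edge, _ in ranked[:k]]
```

Because the threshold is chosen separately for each chain, two networks with the same number of edges can correspond to different threshold values, as reported above.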

At this stage, comparing the performance of the scale-free structure prior in a broader context is of key importance. To this end, we used the software packages ARACNE [54], [55] and BANJO [23] to analyze the gene expression data as well. ARACNE constructs a relevance network based on estimated mutual information between all pairs of genes, with one refinement: after a relevance network is inferred by connecting any pair of genes with mutual information greater than a certain cutoff value, some edges suspected to represent indirect interactions are eliminated using the data processing inequality principle. We chose the cutoff value to be An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e418.jpg so that the number of estimated edges was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e419.jpg, while all other program arguments were set at their default values. The code itself was run in a matter of minutes. BANJO, on the other hand, constructs a Bayesian network from discrete data using a heuristic search strategy to explore the space of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e420.jpg-node, directed networks without cycles. Each network happened upon in the search is ranked using a Bayesian Dirichlet equivalence scoring metric. We discretized the data into three categories and limited the number of parents any given node may have to 10. Once again, all other program arguments were set at their default values. The estimated network, which was found to have An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e421.jpg directed edges, was the highest scoring network after running BANJO for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e422.jpg hr.
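The two-stage procedure attributed to ARACNE above (threshold on mutual information, then pruning by the data processing inequality) can be sketched in simplified form. This is not the ARACNE implementation: mutual information values are assumed to be precomputed and supplied as a dictionary, the tolerance parameter of the real program is omitted, and ties for the weakest edge are broken arbitrarily.

```python
from itertools import combinations


def relevance_network(mi, cutoff):
    """Keep gene pairs whose mutual information exceeds the cutoff."""
    return {frozenset(pair): value for pair, value in mi.items() if value > cutoff}


def dpi_prune(edges):
    """Remove the weakest edge of every fully connected gene triplet.

    All triplets are inspected on the thresholded network first, and the
    marked edges are removed together at the end.
    """
    nodes = sorted(set().union(*edges))
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triangle):
            to_remove.add(min(triangle, key=lambda edge: edges[edge]))
    return {edge: value for edge, value in edges.items() if edge not in to_remove}
```

In a triangle A-B, B-C, A-C where A-C has the lowest mutual information, A-C is treated as an indirect interaction mediated by B and is dropped.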

Figures 3(A) and (B) show the GANs estimated with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e423.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e424.jpg. The latter exhibits clear hubs, supporting the view that a gene regulatory network consists of a small minority of hub genes with the vast majority of genes engaged in a small number of interactions. By contrast, the topology of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e425.jpg is relatively decentralized with no single gene dominating the network. Additionally, the estimated value of exponent An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e426.jpg in the static model was An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e427.jpg, in line with findings in the literature for gene regulatory networks [31]. Turning now to Figures 3(C) and (D), it is interesting to see that the topology of the relevance network echoes that of the GAN inferred using the scale-free structure prior. The same can be said for the Bayesian network and the random structure prior GAN. Of course, a more exquisite experimental technique is the only sure-fire way to validate the individual regulatory interactions suggested by these graphical models. These results, however, are telltale in one respect. In a study comparing different reconstruction methods on simulated data [56], it was reported that BANJO performs well only when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e428.jpg, while ARACNE shows good performance even when An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e429.jpg.

Figure 3
Estimated networks for the breast cancer expression dataset.

The topological dissimilarity between the two GANs is again made evident by a visual inspection of their degree distributions, plotted in Figure 4. The most abundantly connected node in An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e433.jpg has degree An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e434.jpg, whereas An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e435.jpg contains four nodes with degree exceeding this value; the largest hubs correspond to the genes FOXA1 (HNF-3A), SLC39A6 (LIV-1), and E2F3 (KIAA0075) and have degree An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e436.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e437.jpg, and An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e438.jpg, respectively. The main hub FOXA1 is a forkhead box family transcription factor that is necessary for optimum expression of roughly half of all ESR1-regulated genes [57]. In a recent study [58], it was found that FOXA1 is expressed predominantly in luminal type A carcinomas, making it a potential marker of good prognosis. Previously unrecognized as a hub, SLC39A6 functions as a zinc transporter, and was identified in [59] to be highly expressed in ESR1-positive tumors as well as showing a highly significant association with the spread of breast cancer to the lymph nodes. Meanwhile, E2F3 is a transcription factor that has been shown to regulate numerous genes involved in cell cycle progression [60].
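A degree distribution of the kind plotted in Figure 4, and the list of nodes whose degree exceeds a given value, can be computed from an edge list as in the sketch below. The function names and the toy node labels in the usage note are hypothetical and do not reproduce the networks in this paper.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node that appears in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(edges, n_nodes):
    """Fraction of the n_nodes nodes having each degree, isolated nodes included."""
    deg = degrees(edges)
    hist = Counter(deg.values())
    hist[0] = n_nodes - len(deg)
    return {k: count / n_nodes for k, count in sorted(hist.items()) if count > 0}


def hubs_above(edges, d):
    """Nodes whose degree strictly exceeds d, highest degree first."""
    deg = degrees(edges)
    return sorted((node for node in deg if deg[node] > d),
                  key=lambda node: deg[node], reverse=True)
```

For a toy star with center "h" and leaves "a", "b", "c" in a five-node network, `hubs_above(edges, 1)` returns only "h", which mirrors the comparison of maximum degrees made in the text.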

Figure 4
Degree distributions of the gene association networks.

Finally, both GANs agreed with the relevance network on some established regulatory interactions as can be seen in Figure 5. For instance, FOXA1 is connected to AR (androgen receptor), which is known to regulate estrogen receptor expression [61]. FOXA1 has also been shown to play a direct role in the transcription of the TFF1 (pS2) gene [62], and our work agrees with [24] on the role of TFF3 (ITF) as an intermediary. By contrast, the Bayesian network agreed on very few of these interactions. Part of the explanation is likely to rest in using the maximum posterior network as the estimated network. As we noted in the section Network and Parameter Estimation, a single network of high posterior probability may be a less representative estimator than a network consisting of edges that occur with high frequency in an MCMC chain. Another possible contributing factor is that the number of observations was insufficient for BANJO; it is also unclear to what extent discretizing the expression data affected the quality of the inference.

Figure 5
Gene interactions identified by all methods.

Discussion

The main purpose of this paper has been to introduce a scale-free structure prior, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e445.jpg, for graphical models with a view toward the inference of large-scale GANs from datasets consisting of few observations, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e446.jpg, for a comparatively large number of variables, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e447.jpg. It is important to point out that the true network need not follow a power-law in order for the scale-free prior to be applicable; rather, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e448.jpg is a convenient distribution that can account for heavy-tailed degree distributions, something the random structure prior cannot do. That said, we have shown in simulated examples that An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e449.jpg performs markedly better than the random structure prior at recovering networks characterized by heavy-tailed degree distributions. What is more, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e450.jpg proved versatile enough to recover random networks on par with An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e451.jpg itself. Above all, our analysis of the breast cancer expression data illustrates the practical value of the scale-free structure prior as an instrument to aid in the identification of candidate hub genes with the potential to direct the hypotheses of molecular biologists, and thus drive future experiments.

A node labeling An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e452.jpg, that is, a permutation of the integers An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e453.jpg applied to the nodes of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e454.jpg so that each An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e455.jpg is represented by the integer An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e456.jpg, is an essential prerequisite for any MCMC implementation of the scale-free structure prior. The reason is that the scale-free network model underlying An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e457.jpg, or any other scale-free network model for that matter, is so elaborate that the nodes are not interchangeable in regard to computing the probability of An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e458.jpg. And, although easily overshadowed by An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e459.jpg itself, our new Metropolis-Hastings sampler for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e460.jpg is an innovative contribution in its own right. Our sampler uses a simple pair swapping strategy for updating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e461.jpg, and one future topic of research is to investigate the comparative performance of more ingenious update schemes. More research is also required in order to assess how accurately An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e462.jpg can be estimated.
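A pair swapping update for the node labeling can be sketched as a single Metropolis-Hastings step. The sketch assumes the labeling is a list holding a permutation and that a function `log_target` returns the log of the unnormalized posterior for a labeling with everything else held fixed; both names are hypothetical, and the posterior used in this paper is not reproduced here. Since choosing two positions uniformly and swapping them is a symmetric proposal, the acceptance probability reduces to the ratio of target values.

```python
import math
import random


def pair_swap_step(labeling, log_target, rng):
    """One Metropolis-Hastings update that proposes swapping two labels.

    Returns the (possibly unchanged) labeling and whether the proposal was accepted.
    """
    i, j = rng.sample(range(len(labeling)), 2)
    proposal = list(labeling)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    log_ratio = log_target(proposal) - log_target(labeling)
    # Symmetric proposal: accept with probability min(1, target ratio).
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return list(labeling), False
```

Every proposal is itself a permutation, so the chain never leaves the space of valid node labelings. More elaborate schemes would change only how `proposal` is built and, if the proposal is asymmetric, the acceptance ratio.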

We take pains to point out that while our implementation is for GGMs, the methodology described here applies to graphical models more generally. For instance, An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e463.jpg could be applied crudely to Bayesian network inference by simply ignoring edge directionality, or else the underlying static model could be modified to have directed edges. The latter approach raises an interesting consideration: in gene regulatory networks, according to the prevailing wisdom [31], it is actually only the out-degree distribution that follows a power-law. By contrast, the in-degree of a node is usually small and its distribution is better approximated by a sort of restricted exponential function. While this distinction gets blurred when inference is conducted with undirected graphical models, Bayesian networks provide an obvious incentive for taking it into account. Indeed, Bayesian networks may prove to be a more promising area of application because they are currently able to handle much larger networks than GGMs [63].

Although the static model is not biologically motivated, it is a defensible choice as an underlying model for An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e464.jpg on the grounds that it is a simple model with the potential to describe any network topology; not to mention that it includes the ER model as a limiting case. Moreover, implementing a structure prior based on a growing network model poses some added difficulties because not only will the probability of a network depend on the choice of seed network, but evaluating An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e465.jpg will result in a greater expenditure of computational resources as the edge inclusion probabilities depend on the order in which they were added to the network.
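To make the remark about the ER limiting case concrete, the sketch below samples a network from the static model as it is usually described in the network literature: node i receives a weight proportional to i raised to the power -mu, and edges are added between pairs of nodes drawn according to these weights until the desired number of edges is reached. The parameter name `mu` and the function name are assumptions made for illustration, and the parameterization used in this paper may differ.

```python
import random


def static_model_graph(n, m, mu, rng):
    """Sample an undirected simple graph with n labeled nodes and m edges.

    Node i (1-based) is drawn with probability proportional to i ** (-mu).
    With mu = 0 all nodes are equally likely, which gives an ER-style random graph.
    """
    if not 0 <= mu < 1:
        raise ValueError("mu must lie in [0, 1)")
    if m > n * (n - 1) // 2:
        raise ValueError("too many edges for a simple graph on n nodes")
    nodes = list(range(1, n + 1))
    weights = [i ** (-mu) for i in nodes]
    edges = set()
    while len(edges) < m:
        u, v = rng.choices(nodes, weights=weights, k=2)
        if u != v:  # no self-loops; duplicate edges are absorbed by the set
            edges.add(frozenset((u, v)))
    return edges
```

Because the weights depend on the integer assigned to each node, relabeling the nodes changes the probability of a given network, which is why a node labeling has to be carried along as a parameter.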

All the same, we implemented two other scale-free structure priors based on growing models; one on the Poisson-growth, preferential attachment model [64], and another on the biologically meaningful duplication model. In the former case, we were able to get away with using a single node as the seed network, and we found that while this prior recovered heavy-tailed networks as well as An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e466.jpg, it understandably struggled to accurately recover random networks. Meanwhile, the duplication model based structure prior was highly sensitive to the choice of seed network in addition to being unstable due to the complexity of the model. One future avenue of research is to adapt these models, or the MCMC implementation, to be more applicable for use as prior distributions. The primary motivation for doing so is that the model parameters have biological meaning, and their estimation could prove of independent interest.

The estimation of network model parameters has been an incidental aspect of our work; however, it is related to the quite different problem of fitting network models to known biological networks. Likelihood and likelihood-free methods have been developed [65], [66] in order to fit a hybrid preferential attachment/duplication and divergence model to some protein-protein interaction networks, obtaining estimates of the model parameters. These methodologies assume that the ordering of the nodes in time, that is An external file that holds a picture, illustration, etc.
Object name is pone.0013580.e467.jpg, is known, but in most cases this information is unknown. In the future, our Metropolis-Hastings sampler could very well be applied to this problem.

Software is available from the corresponding author upon request.

Supporting Information

Dataset S1

This file contains the gene expression data that we analyzed in our paper.

(0.05 MB TXT)

Acknowledgments

The authors kindly thank Mariko Okada-Hatakeyama and Seiya Imoto for fruitful discussions as well as Beatrix Jones for fielding our various inquiries concerning the gene expression dataset. Also special thanks go to the editor, the two anonymous referees, and Brent Hagen for their most constructive comments and suggestions.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This research was supported in part by Computationism as a Foundation for the Sciences, a program of the Japan Society for the Promotion of Science (JSPS) Global Centers for Excellence (COE) (http://compview.titech.ac.jp/front-page-en/view?set_language=en). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic bayesian networks. Bioinformatics. 2003;19:2271–2282. [PubMed]
2. West M. Bayesian factor regression models in the “large p, small n” paradigm. Oxford University Press; 2003. pp. 723–732.
3. Schlitt T, Brazma A. Current approaches to gene regulatory network modelling. BMC Bioinformatics. 2007;8:S9+. [PMC free article] [PubMed]
4. Whittaker J. Graphical models in applied multivariate statistics. New York: John Wiley & Sons; 1990.
5. Lauritzen SL. Graphical models. Oxford Statistical Science Series. New York: Oxford University Press; 1996.
6. Edwards D. Introduction to graphical modelling. New York: Springer-Verlag; 2000.
7. Grzegorczyk M, Husmeier D, Werhli AV. Reverse engineering gene regulatory networks with various machine learning methods. In: Emmert-Streib F, Dehmer M, editors. Analysis of Microarray Data. Weinheim: Wiley-VCH; 2008. pp. 101–142.
8. Markowetz F, Spang R. Inferring cellular networks - a review. BMC Bioinformatics. 2007;8:S5+. [PMC free article] [PubMed]
9. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS. Discovering functional relationships between rna expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A. 2000;97:12182–12186. [PMC free article] [PubMed]
10. Schäfer J, Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical applications in genetics and molecular biology. 2005;4 [PubMed]
11. Wille A, Bühlmann P. Low-order conditional independence graphs for inferring genetic networks. Statistical Applications in Genetics and Molecular Biology. 2006;5:1. [PubMed]
12. Castelo R, Roverato A, Chickering M. A robust procedure for gaussian graphical model search from microarray data with p larger than n. Journal of Machine Learning Research. 2006;7:2006.
13. Magwene PM, Kim J. Estimating genomic coexpression networks using first-order conditional independence. Genome Biology. 2004;5:R100. [PMC free article] [PubMed]
14. Pe'er D, Nachman I, Linial M, Friedman N. Using bayesian networks to analyze expression data. Journal of Computational Biology. 2000;7:601–620. [PubMed]
15. Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, et al. CSB '03: Proceedings of the IEEE Computer Society conference on Bioinformatics. Washington, DC, USA: IEEE Computer Society; 2003. Combining microarrays and biological knowledge for estimating gene networks via bayesian networks. pp. 104–113. [PubMed]
16. Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, et al. Estimating gene networks from gene expression data by combining bayesian network model with promoter element detection. Bioinformatics. 2003;19(Suppl 2):ii227–ii236. [PubMed]
17. Barnard A, Hartemink AJ. PSB '05: Pacific Symposium on Biocomputing. World Scientific; 2005. Informative structure priors: joint learning of dynamic regulatory networks from multiple types of data. pp. 459–470. [PubMed]
18. Werhli AV, Husmeier D. Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Statistical applications in genetics and molecular biology. 2007;6 [PubMed]
19. Pe'er D, Regev A, Tanay A. Minreg: inferring an active regulator set. Bioinformatics. 2002;18(Suppl 1):S258–S267. [PubMed]
20. Segal E, Shapira M, Regev A, Pe'er D, Botstein D, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics. 2003;34:166–176. [PubMed]
21. Schäfer J, Strimmer K. An empirical bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005;21:754–764. [PubMed]
22. Meinshausen N, Bühlmann P. High dimensional graphs and variable selection with the lasso. Annals of Statistics. 2006;34:1436–1462.
23. Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED. Advances to bayesian network inference for generating causal networks from observational biological data. Bioinformatics (Oxford, England) 2004;20:3594–3603. [PubMed]
24. Dobra A, Hans C, Jones B, Nevins JR, West M. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis. 2004;90:196–212.
25. Jones B, Carvalho CM, Dobra A, Hans C, Carter C, et al. Experiments in stochastic computation for high-dimensional graphical models. Statistical Science. 2005;20:388–400.
26. Erdös P, Rényi A. On random graphs. Publicationes Mathematicae. 1959;6:290–297.
27. Erdös P, Rényi A. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences. 1960;5:17–61.
28. Albert R, Barabási AL. Statistical mechanics of complex networks. Reviews of Modern Physics. 2002;74
29. Dorogovtsev SN, Mendes JFF. Evolution of networks. Advances in Physics. 2002;51:1070–1187.
30. Newman MEJ. The structure and function of complex networks. Siam Review. 2003;45:167–256.
31. Albert R. Scale-free networks in cell biology. J Cell Sci. 2005;118:4947–4957. [PubMed]
32. Farkas I, Jeong H, Vicsek T, Barabási AL, Oltvai ZN. The topology of the transcription regulatory network in the yeast, saccharomyces cerevisiae. Physica A: Statistical Mechanics and its Applications. 2003;318:601–612.
33. Kim SK, Lund J, Kiraly M, Duke K, Jiang M, et al. A gene expression map for caenorhabditis elegans. Science. 2001;293:2087–2092. [PubMed]
34. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D. Complex networks: Structure and dynamics. Physics Reports. 2006;424:175–308.
35. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. [PubMed]
36. Ohno S. Evolution by gene duplication. G. Allen and Unwin; Springer; 1970.
37. Solé R, Pastor-Satorras R, Smith E, Kepler T. A model of large-scale proteome evolution. Advances in Complex Systems. 2002;5:43–54.
38. Pastor-Satorras R, Smith E, Solé RV. Evolving protein interaction networks through gene duplication. J Theor Biol. 2003;222:199–210. [PubMed]
39. Goh KI, Kahng B, Kim D. Universal behavior of load distribution in scale-free networks. Phys Rev Lett. 2001;87 [PubMed]
40. West M, Blanchette C, Dressman H, Huang E, Ishida S, et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2001;98:11462–11467. [PMC free article] [PubMed]
41. Dempster AP. Covariance selection. Biometrics. 1972;28:95–108.
42. Wong K, Carter C, Kohn R. Efficient estimation of covariance selection models. Biometrika. 2003;90:809–830.
43. Giudici P, Green PJ. Decomposable graphical model determination. Biometrika. 1999;86:785–801.
44. Adamic LA, Huberman BA. Zipf's law and the internet. Glottometrics. 2002;3:143–150.
45. Lee DS, Goh KI, Kahng B, Kim D. Scale-free random graphs and potts model. Pramana Journal of Physics. 2005;64:1149–1159.
46. Rannala B, Yang Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol. 1996;43:304–311. [PubMed]
47. Yang Z, Rannala B. Bayesian phylogenetic inference using dna sequences: a markov chain monte carlo method. Mol Biol Evol. 1997;14:717–724. [PubMed]
48. Ronquist F, Huelsenbeck JP. Mrbayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. [PubMed]
49. Frydenberg M, Lauritzen SL. Decomposition of maximum likelihood in mixed interaction models. Biometrika. 1989;76:539–555.
50. Carvalho LE, Lawrence CE. Centroid estimation in discrete high-dimensional spaces with applications in biology. Proceedings of the National Academy of Sciences. 2008;105:3209–3214. [PMC free article] [PubMed]
51. Burda Z, Correia JD, Krzywicki A. Statistical ensemble of scale-free random graphs. Phys Rev E. 2001;64:046118. [PubMed]
52. Matsuoka S. The road to TSUBAME and beyond, Boca Raton, FL: Chapman & Hall/CRC, chapter 14. 2008. pp. 289–310. Computational Science Series.
53. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences of the United States of America. 2001;98:10869–10874. [PMC free article] [PubMed]
54. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, et al. Reverse engineering of regulatory networks in human b cells. Nature Genetics. 2005;37:382–390. [PubMed]
55. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, et al. Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC bioinformatics. 2006;7(Suppl 1) [PMC free article] [PubMed]
56. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Molecular Systems Biology. 2007;3 [PMC free article] [PubMed]
57. Carroll JS, Brown M. Estrogen receptor target gene: an evolving concept. Molecular Endocrinology. 2006;20:1707–1714. [PubMed]
58. Nakshatri H, Badve S. Foxa1 as a therapeutic target for breast cancer. Expert Opinion on Therapeutic Targets. 2007;11:507–514. [PubMed]
59. Taylor KM. A distinct role in breast cancer for two liv-1 family zinc transporters. Biochemical Society Transactions. 2008;36(Pt6):1247–1251. [PubMed]
60. Giangrande P, Hallstrom T, Tunyaplin C, Calame K, Nevins J. Identification of e-box factor tfe3 as a functional partner for the e2f3 transcription factor. Molecular and Cellular Biology. 2003;23:3707–3720. [PMC free article] [PubMed]
61. Sahlin L, Norstedt G, Eriksson H. Androgen regulation of the insulin-like growth factor 1 and the estrogen receptor in rat uterus and liver. Journal of Steroid Biochemistry and Molecular Biology. 1994;51:57–66. [PubMed]
62. Beck S, Sommer P, dos Santos Silva E, Blin N, Gött P. Hepatocyte nuclear factor 3 (winged helix domain) activates trefoil factor gene tff1 through a binding motif adjacent to the tata box. DNA Cell Biology. 1999;18:157–164. [PubMed]
63. Hartemink AJ. Reverse engineering gene regulatory networks. Nature Biotechnology. 2005;23:554–555. [PubMed]
64. Sheridan P, Yagahara Y, Shimodaira H. A preferential attachment model with poisson growth for scale-free networks. Annals of the Institute of Statistical Mathematics. 2008;60:747–761.
65. Wiuf C, Brameier M, Hagberg O, Stumpf MP. A likelihood approach to analysis of network data. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:7566–7570. [PMC free article] [PubMed]
66. Ratmann O, Jørgensen O, Hinkley T, Stumpf M, Richardson S, et al. Using likelihood-free inference to compare evolutionary dynamics of the protein networks of h. pylori and p. falciparum. PLoS Computational Biology. 2007;3:2266–2278. [PMC free article] [PubMed]

Articles from PLoS ONE are provided here courtesy of Public Library of Science