
# Bayesian Approach to Network Modularity

## Abstract

We present an efficient, principled, and interpretable technique for inferring module assignments and for identifying the optimal number of modules in a given network. We show how several existing methods for finding modules can be described as variant, special, or limiting cases of our work, and how the method overcomes the resolution limit problem, accurately recovering the true number of modules. Our approach is based on Bayesian methods for model selection which have been used with success for almost a century, implemented using a variational technique developed only in the past decade. We apply the technique to synthetic and real networks and outline how the method naturally allows selection among competing models.

Large-scale networks describing complex interactions among a multitude of objects have found application in a wide array of fields, from biology to social science to information technology [1,2]. In these applications one often wishes to model networks, suppressing the complexity of the full description while retaining relevant information about the structure of the interactions [3]. One such network model groups nodes into modules, or “communities,” with different densities of intra- and interconnectivity for nodes in the same or different modules. We present here a computationally efficient Bayesian framework for inferring the number of modules, model parameters, and module assignments for such a model.

The problem of finding modules in networks (or “community detection”) has received much attention in the physics literature, wherein many approaches [4,5] focus on optimizing an energy-based cost function with fixed parameters over possible assignments of nodes into modules. The particular cost functions vary, but most compare a given node partitioning to an implicit null model, the two most popular being the configuration model and a limited version of the stochastic block model (SBM) [6,7]. While much effort has gone into how to optimize these cost functions, less attention has been paid to what is to be optimized. Recent studies emphasizing this latter question have shown that existing approaches suffer inherent problems regardless of how optimization is performed: the choice of parameter values sets a lower limit on the size of detected modules, referred to as the “resolution limit” problem [8,9]. We extend recent probabilistic treatments of modular networks [10,11] to develop a solution to this problem that relies on inferring distributions over the model parameters, as opposed to asserting parameter values *a priori*, to determine the modular structure of a given network. The developed techniques are principled, interpretable, computationally efficient, and can be shown to generalize several previous studies on module detection.

We specify an *N*-node network by its adjacency matrix **A**, where *A _{ij}* = 1 if there is an edge between nodes *i* and *j* and *A _{ij}* = 0 otherwise, and define ${\sigma}_{i}\in \{1,\dots ,K\}$ to be the unobserved module membership of the *i*th node. We use a constrained SBM, which consists of a multinomial distribution over module assignments with weights ${\pi}_{\mu}\equiv p({\sigma}_{i}=\mu \mid \overrightarrow{\pi})$ and Bernoulli distributions over edges contained within and between modules with weights ${\vartheta}_{c}\equiv p({A}_{ij}=1\mid {\sigma}_{i}={\sigma}_{j},\overrightarrow{\vartheta})$ and ${\vartheta}_{d}\equiv p({A}_{ij}=1\mid {\sigma}_{i}\ne {\sigma}_{j},\overrightarrow{\vartheta})$, respectively. In short, to generate a random undirected graph under this model we roll a *K*-sided die (biased by $\overrightarrow{\pi}$) *N* times to determine module assignments for each of the *N* nodes; we then flip one of two biased coins (for either intra- or intermodule connection, biased by ${\vartheta}_{c}$ or ${\vartheta}_{d}$, respectively) for each of the *N*(*N* − 1)/2 pairs of nodes to determine if the pair is connected. The extension to directed graphs is straightforward.
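The generative process just described can be sketched in a few lines of Python (an illustrative helper of ours, not code from this work; `generate_sbm`, `theta_c`, and `theta_d` are assumed names):

```python
import numpy as np

def generate_sbm(N, pi, theta_c, theta_d, seed=None):
    """Sample an undirected graph from the constrained SBM.

    pi: length-K module weights; theta_c / theta_d: intra- / intermodule
    edge probabilities. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    K = len(pi)
    sigma = rng.choice(K, size=N, p=pi)  # roll the K-sided die N times
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i):  # each of the N(N-1)/2 pairs, once
            p = theta_c if sigma[i] == sigma[j] else theta_d
            A[i, j] = A[j, i] = int(rng.random() < p)  # flip the biased coin
    return A, sigma
```

Setting `theta_c` high and `theta_d` low produces the assortative block structure the model is designed to capture.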

Using this model, we write the joint probability *p*(**A**, σ⃗ | π⃗, *ϑ⃗*, *K*) = *p*(**A** | σ⃗, *ϑ⃗*) *p*(σ⃗ | π⃗) (conditional dependence on *K* has been suppressed below for brevity) as

$$p(\mathbf{A},\overrightarrow{\sigma}\mid \overrightarrow{\pi},\overrightarrow{\vartheta})={\vartheta}_{c}^{{c}_{+}}{(1-{\vartheta}_{c})}^{{c}_{-}}{\vartheta}_{d}^{{d}_{+}}{(1-{\vartheta}_{d})}^{{d}_{-}}\prod _{\mu =1}^{K}{\pi}_{\mu}^{{n}_{\mu}},$$(1)

where ${c}_{+}\equiv {\sum}_{i>j}{A}_{ij}{\delta}_{{\sigma}_{i},{\sigma}_{j}}$ is the number of edges contained within communities, ${c}_{-}\equiv {\sum}_{i>j}(1-{A}_{ij}){\delta}_{{\sigma}_{i},{\sigma}_{j}}$ is the number of nonedges contained within communities, ${d}_{+}\equiv {\sum}_{i>j}{A}_{ij}(1-{\delta}_{{\sigma}_{i},{\sigma}_{j}})$ is the number of edges between different communities, ${d}_{-}\equiv {\sum}_{i>j}(1-{A}_{ij})(1-{\delta}_{{\sigma}_{i},{\sigma}_{j}})$ is the number of nonedges between different communities, and ${n}_{\mu}\equiv {\sum}_{i=1}^{N}{\delta}_{{\sigma}_{i},\mu}$ is the occupation number of the *μ*th module. Defining $H(\overrightarrow{\sigma})\equiv -\mathrm{ln}\phantom{\rule{0.2em}{0ex}}p(\mathbf{A},\overrightarrow{\sigma}\mid \overrightarrow{\pi},\overrightarrow{\vartheta})$ and regrouping terms by local and global counts, we recover (up to additive constants) a generalized version of [10]:

$$H(\overrightarrow{\sigma})=-\sum _{i>j}({J}_{L}{A}_{ij}-{J}_{G}){\delta}_{{\sigma}_{i},{\sigma}_{j}}+\sum _{\mu =1}^{K}{h}_{\mu}{n}_{\mu},$$(2)

a Potts model Hamiltonian with unknown coupling constants ${J}_{G}\equiv \mathrm{ln}[(1-{\vartheta}_{d})/(1-{\vartheta}_{c})]$ and ${J}_{L}\equiv \mathrm{ln}({\vartheta}_{c}/{\vartheta}_{d})+{J}_{G}$, and chemical potentials ${h}_{\mu}\equiv -\mathrm{ln}\phantom{\rule{0.2em}{0ex}}{\pi}_{\mu}$. (Note that many previous methods omit a chemical potential term, implicitly assuming equally sized groups.)

While previous approaches [4,10] minimize related Hamiltonians as a function of σ⃗, these methods require that the user specify values for these unknown constants, which gives rise to the resolution limit problem [8,9]. Our approach, however, uses a disorder-averaged calculation to infer distributions over these parameters, avoiding this issue. To do so, we take beta ($\mathcal{B}$) and Dirichlet ($\mathcal{D}$) distributions over *ϑ⃗* and π⃗, respectively:

$$p({\vartheta}_{c})=\mathcal{B}({\vartheta}_{c};{\stackrel{\sim}{c}}_{{+}_{0}},{\stackrel{\sim}{c}}_{{-}_{0}}),\phantom{\rule{1em}{0ex}}p({\vartheta}_{d})=\mathcal{B}({\vartheta}_{d};{\stackrel{\sim}{d}}_{{+}_{0}},{\stackrel{\sim}{d}}_{{-}_{0}}),\phantom{\rule{1em}{0ex}}p(\overrightarrow{\pi})=\mathcal{D}(\overrightarrow{\pi};{\stackrel{\sim}{\overrightarrow{n}}}_{0}).$$(3)

These *conjugate prior* distributions are defined on the full range of *ϑ⃗* and π⃗, respectively, and their functional forms are preserved when integrated against the model to obtain updated parameter distributions. Their hyperparameters {${\stackrel{\sim}{c}}_{{+}_{0}}$, ${\stackrel{\sim}{c}}_{{-}_{0}}$, ${\stackrel{\sim}{d}}_{{+}_{0}}$, ${\stackrel{\sim}{d}}_{{-}_{0}}$, ${\stackrel{\sim}{\overrightarrow{n}}}_{0}$} act as pseudocounts that augment observed edge counts and occupation numbers.
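As a toy numerical illustration of this conjugacy (all numbers invented): a beta prior over *ϑ _{c}* combined with Bernoulli edge observations yields a beta posterior whose parameters are the pseudocounts plus the observed counts.

```python
# Hypothetical numbers for illustration only.
c_plus, c_minus = 7, 3         # observed intra-module edges / nonedges
c0_plus, c0_minus = 1.0, 1.0   # beta pseudocounts (hyperparameters)

# Beta(c0_plus, c0_minus) prior x Bernoulli likelihood
# -> Beta(c0_plus + c_plus, c0_minus + c_minus) posterior
post = (c0_plus + c_plus, c0_minus + c_minus)
post_mean = post[0] / (post[0] + post[1])  # posterior mean of theta_c
```

The pseudocounts act exactly like extra observed edges and nonedges, which is why weak priors barely shift the inferred parameters on large networks.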

In this framework the problem of module detection can be stated as follows: given an adjacency matrix **A**, determine the most probable number of modules (i.e., occupied spin states) *K*^{*} = argmax* _{K} p*(*K*|**A**) and infer posterior distributions over the model parameters (i.e., coupling constants and chemical potentials) *p*(π⃗, *ϑ⃗*|**A**) and the latent module assignments (i.e., spin states) *p*(σ⃗|**A**). In the absence of *a priori* belief about the number of modules, we demand that *p*(*K*) is sufficiently weak that maximizing *p*(*K*|**A**) ∝ *p*(**A**|*K*) *p*(*K*) is equivalent to maximizing *p*(**A**|*K*), referred to as the evidence. This approach to model selection [12], proposed by the statistical physicist Jeffreys in 1935 [13], balances model fidelity and complexity to determine, in this context, the number of modules.

A more physically intuitive interpretation of the evidence is as the disorder-averaged partition function of a spin glass, calculated by marginalizing over the possible quenched values of the parameters *ϑ⃗* and π⃗ as well as the spin configurations σ⃗:

$$p(\mathbf{A}\mid K)=\sum _{\overrightarrow{\sigma}}\int d\overrightarrow{\pi}\phantom{\rule{0.2em}{0ex}}d\overrightarrow{\vartheta}\phantom{\rule{0.2em}{0ex}}p(\mathbf{A},\overrightarrow{\sigma}\mid \overrightarrow{\pi},\overrightarrow{\vartheta})p(\overrightarrow{\pi})p(\overrightarrow{\vartheta}).$$(4)

While the *ϑ⃗* and π⃗ integrals in Eq. (4) can be performed analytically, the remaining sum over module assignments scales as *K ^{N}* and becomes computationally intractable for networks of even modest sizes. To accommodate large-scale networks we use a variational approach that is well known to the statistical physics community [14] and has recently found application in the statistics and machine learning literature, commonly termed variational Bayes (VB) [15]. We proceed by taking the negative logarithm of Eq. (4) and using Gibbs’s inequality:

$$-\mathrm{ln}\phantom{\rule{0.2em}{0ex}}p(\mathbf{A}\mid K)=-\mathrm{ln}\sum _{\overrightarrow{\sigma}}\int d\overrightarrow{\pi}\phantom{\rule{0.2em}{0ex}}d\overrightarrow{\vartheta}\phantom{\rule{0.2em}{0ex}}q(\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})\frac{p(\mathbf{A},\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})}{q(\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})}$$(5)

$$\le -\sum _{\overrightarrow{\sigma}}\int d\overrightarrow{\pi}\phantom{\rule{0.2em}{0ex}}d\overrightarrow{\vartheta}\phantom{\rule{0.2em}{0ex}}q(\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})\mathrm{ln}\frac{p(\mathbf{A},\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})}{q(\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})}.$$(6)

That is, we first multiply and divide by an arbitrary approximating distribution *q*(σ⃗, π⃗, *ϑ⃗*) and then upper-bound the log of the expectation by the expectation of the log. We define the quantity to be minimized, the expression in Eq. (6), as the variational free energy *F*{*q*; **A**}, a functional of *q*(σ⃗, π⃗, *ϑ⃗*). (Note that the negative log of *q*(σ⃗, π⃗, *ϑ⃗*) plays the role of a test Hamiltonian in variational approaches in statistical mechanics.)

We next choose a factorized approximating distribution $q(\overrightarrow{\sigma},\overrightarrow{\pi},\overrightarrow{\vartheta})={q}_{\overrightarrow{\sigma}}(\overrightarrow{\sigma}){q}_{\overrightarrow{\pi}}(\overrightarrow{\pi}){q}_{\overrightarrow{\vartheta}}(\overrightarrow{\vartheta})$, with ${q}_{\overrightarrow{\pi}}(\overrightarrow{\pi})=\mathcal{D}(\overrightarrow{\pi};\stackrel{\sim}{\overrightarrow{n}})$ and ${q}_{\overrightarrow{\vartheta}}(\overrightarrow{\vartheta})={q}_{c}({\vartheta}_{c}){q}_{d}({\vartheta}_{d})=\mathcal{B}({\vartheta}_{c};{\stackrel{\sim}{c}}_{+},{\stackrel{\sim}{c}}_{-})\mathcal{B}({\vartheta}_{d};{\stackrel{\sim}{d}}_{+},{\stackrel{\sim}{d}}_{-})$. As in mean field theory, we factorize ${q}_{\overrightarrow{\sigma}}(\overrightarrow{\sigma})={\prod}_{i}{q}_{i}({\sigma}_{i})$ with ${q}_{i}({\sigma}_{i}=\mu)={Q}_{i\mu}$, an *N*-by-*K* matrix which gives the probability that the *i*th node belongs to the *μ*th module. Evaluating *F*{*q*; **A**} with this functional form for *q*(σ⃗, π⃗, *ϑ⃗*) gives a function of the variational parameters {${\stackrel{\sim}{c}}_{+}$, ${\stackrel{\sim}{c}}_{-}$, ${\stackrel{\sim}{d}}_{+}$, ${\stackrel{\sim}{d}}_{-}$, $\stackrel{\sim}{\overrightarrow{n}}$} and matrix elements *Q _{iμ}* which can subsequently be minimized by taking the appropriate derivatives.

We summarize the resulting iterative algorithm, which provably converges to a local minimum of *F*{*q*; **A**} and provides controlled approximations to the evidence *p*(**A**|*K*) as well as the posteriors *p*(π⃗, *ϑ⃗*|**A**) and *p*(σ⃗|**A**):

## Initialization

Initialize the *N*-by-*K* matrix **Q** = **Q**_{0} and set pseudocounts ${\stackrel{\sim}{c}}_{+}={\stackrel{\sim}{c}}_{{+}_{0}}$, ${\stackrel{\sim}{c}}_{-}={\stackrel{\sim}{c}}_{{-}_{0}}$, ${\stackrel{\sim}{d}}_{+}={\stackrel{\sim}{d}}_{{+}_{0}}$, ${\stackrel{\sim}{d}}_{-}={\stackrel{\sim}{d}}_{{-}_{0}}$, ${\stackrel{\sim}{n}}_{\mu}={\stackrel{\sim}{n}}_{{\mu}_{0}}$.

## Main loop

Until convergence in *F*{*q*; **A**}:

- Update the expected values of the coupling constants and chemical potentials
  $$\langle {J}_{L}\rangle =\psi ({\stackrel{\sim}{c}}_{+})-\psi ({\stackrel{\sim}{c}}_{-})-\psi ({\stackrel{\sim}{d}}_{+})+\psi ({\stackrel{\sim}{d}}_{-})$$(7)
  $$\langle {J}_{G}\rangle =\psi ({\stackrel{\sim}{d}}_{-})-\psi ({\stackrel{\sim}{d}}_{+}+{\stackrel{\sim}{d}}_{-})-\psi ({\stackrel{\sim}{c}}_{-})+\psi ({\stackrel{\sim}{c}}_{+}+{\stackrel{\sim}{c}}_{-})$$(8)
  $$\langle {h}_{\mu}\rangle =\psi \left(\sum _{\mu}{\stackrel{\sim}{n}}_{\mu}\right)-\psi ({\stackrel{\sim}{n}}_{\mu}),$$(9)
  where *ψ*(*x*) is the digamma function;
- Update the variational distribution over each spin *σ _{i}*
  $${Q}_{i\mu}\propto \mathrm{exp}\left\{\sum _{j\ne i}[\langle {J}_{L}\rangle {A}_{ij}-\langle {J}_{G}\rangle ]{Q}_{j\mu}-\langle {h}_{\mu}\rangle \right\},$$(10)
  normalized such that ${\sum}_{\mu}{Q}_{i\mu}=1$ for all *i*;
- Update the variational distribution over parameters from the expected counts and pseudocounts
  $${\stackrel{\sim}{n}}_{\mu}=\langle {n}_{\mu}\rangle +{\stackrel{\sim}{n}}_{{\mu}_{0}}=\sum _{i=1}^{N}{Q}_{i\mu}+{\stackrel{\sim}{n}}_{{\mu}_{0}}$$(11)
  $${\stackrel{\sim}{c}}_{+}=\langle {c}_{+}\rangle +{\stackrel{\sim}{c}}_{{+}_{0}}={\scriptstyle \frac{1}{2}}\text{Tr}({\mathbf{Q}}^{T}\mathbf{AQ})+{\stackrel{\sim}{c}}_{{+}_{0}}$$(12)
  $${\stackrel{\sim}{c}}_{-}=\langle {c}_{-}\rangle +{\stackrel{\sim}{c}}_{{-}_{0}}={\scriptstyle \frac{1}{2}}\text{Tr}({\mathbf{Q}}^{T}(\overrightarrow{u}{\langle \overrightarrow{n}\rangle}^{T}-\mathbf{Q}))-\langle {c}_{+}\rangle +{\stackrel{\sim}{c}}_{{-}_{0}}$$(13)
  $${\stackrel{\sim}{d}}_{+}=\langle {d}_{+}\rangle +{\stackrel{\sim}{d}}_{{+}_{0}}=M-\langle {c}_{+}\rangle +{\stackrel{\sim}{d}}_{{+}_{0}}$$(14)
  $${\stackrel{\sim}{d}}_{-}=\langle {d}_{-}\rangle +{\stackrel{\sim}{d}}_{{-}_{0}}=C-M-\langle {c}_{-}\rangle +{\stackrel{\sim}{d}}_{{-}_{0}},$$(15)
  where *C* = *N*(*N* − 1)/2, $M={\sum}_{i>j}{A}_{ij}$, and $\stackrel{\u20d7}{u}$ is an *N*-by-1 vector of 1’s;
- Calculate the updated optimized free energy
  $$F\{q;\mathbf{A}\}=-ln\frac{{\mathcal{Z}}_{c}{\mathcal{Z}}_{d}{\mathcal{Z}}_{\overrightarrow{\pi}}}{{\stackrel{\sim}{\mathcal{Z}}}_{c}{\stackrel{\sim}{\mathcal{Z}}}_{d}{\stackrel{\sim}{\mathcal{Z}}}_{\overrightarrow{\pi}}}+\sum _{\mu =1}^{K}\sum _{i=1}^{N}{Q}_{i\mu}ln{Q}_{i\mu},$$(16)

where ${\stackrel{\sim}{\mathcal{Z}}}_{\overrightarrow{\pi}}=B(\stackrel{\sim}{\overrightarrow{n}})$ is the beta function with a vector-valued argument, the partition function for the Dirichlet distribution ${q}_{\overrightarrow{\pi}}(\overrightarrow{\pi})$ [likewise for ${q}_{c}({\vartheta}_{c})$ and ${q}_{d}({\vartheta}_{d})$], and the untilded $\mathcal{Z}$’s are the analogous normalizations evaluated at the prior pseudocounts. As this provably converges only to a local optimum, VB is best implemented with multiple randomly chosen initializations of **Q**_{0} to find the global minimum of *F*{*q*; **A**}.

Convergence of the above algorithm provides the approximate posterior distributions ${q}_{\overrightarrow{\sigma}}(\overrightarrow{\sigma})$, ${q}_{\overrightarrow{\pi}}(\overrightarrow{\pi})$, and ${q}_{\overrightarrow{\vartheta}}(\overrightarrow{\vartheta})$ and simultaneously returns *K**, the number of nonempty modules that maximizes the evidence. As such, one need only specify a maximum number of allowed modules and run VB; the probability of occupation for extraneous modules converges to zero as the algorithm runs, and the most probable number of occupied modules remains.

This is significantly more accurate than other approximate methods, such as the Bayesian information criterion (BIC) [16] and the integrated classification likelihood (ICL) [17,18], and is less computationally expensive than empirical methods such as cross-validation (CV) [19,20], in which one must perform the associated procedure after fitting the model for each considered value of *K*. Specifically, BIC and ICL are suggested for single-peaked likelihood functions well approximated by Laplace integration and studied in the large-*N* limit. For a SBM the first assumption of a single-peaked function is invalidated by the underlying symmetries of the latent variables; i.e., nodes are distinguishable and modules indistinguishable. See Fig. 1 for a comparison of our method with the Girvan-Newman modularity [5] in the resolution limit test [8,9], where VB consistently identifies the correct number of modules. [Note that VB is both accurate and fast: it performs competitively in the “four groups” test [21] and scales as $\mathcal{O}(MK)$. Runtime for the main loop in MATLAB on a 2 GHz laptop is ~6 min for *N* = 10^{6} nodes with average degree 16 and *K* = 4.]

Furthermore, we note that previous methods in which parameter inference is performed by optimizing a likelihood function via expectation maximization (EM) [11,18] are also special cases of the framework presented here. EM is a limiting case of VB in which one collapses the distributions over parameters to point estimates at the mode of each distribution; however, EM is prone to overfitting and cannot be used to determine the appropriate number of modules, as the likelihood of observed data increases with the number of modules in the model. As such, VB performs at least as well as EM while simultaneously providing complexity control [22,23].

In addition to validating the method on synthetic networks, we apply VB to the 2000 NCAA American football schedule shown in Fig. 2 [24]. Each of the 115 nodes represents an individual team and each of the 613 edges represents a game played between the nodes joined. The algorithm correctly identifies the presence of the 12 conferences which comprise the schedule, where teams tend to play more games within than between conferences, making most modules assortative. Of the 115 teams, 105 teams are assigned to their corresponding conferences, with the majority of exceptions belonging to the frequently misclassified independent teams [25]—the only disassortative group in the network. We emphasize that, unlike other methods in which the number of conferences must be asserted, VB determines 12 as the most probable number of conferences automatically.


Posing module detection as inference of a latent variable within a probabilistic model has a number of advantages. It clarifies what precisely is to be optimized and suggests a principled and efficient procedure for how to perform this optimization. Inferring distributions over model parameters reveals the natural scale of a given modular network, avoiding resolution limit problems. This method allows us to view a number of approaches to the problem by physicists, applied mathematicians, social scientists, and computer scientists as related subparts of a larger problem. In short, it suggests how a number of seemingly disparate methods may be recast and united. A second advantage of this work is its generalization to other models, including those designed to reveal structural features other than modularity. Finally, use of the evidence allows model selection not only among nested models, e.g., models differing only in the number of parameters, but even among models of different parametric families. The last strikes us as a natural area for progress in the statistical study of real-world networks.

## Acknowledgments

It is a pleasure to acknowledge useful conversations on modeling with Joel Bader and Matthew Hastings, on Monte Carlo methods for Potts models with Jonathan Goodman, with David Blei on variational methods, and with Aaron Clauset for his feedback on this manuscript [26]. J. H. was supported by NIH No. 5PN2EY016586; C. W. was supported by NSF No. ECS-0425850 and NIH No. 1U54CA121852.

## Footnotes

PACS numbers: 89.75.Hc, 02.50.–r, 02.50.Tt

## Contributor Information

Jake M. Hofman, Department of Physics, Columbia University, New York, New York 10027, USA. Email: jmh2045@columbia.edu.

Chris H. Wiggins, Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York 10027, USA. Email: chris.wiggins@columbia.edu.

## References
