

# Cartography of complex networks: modules and universal roles

## Abstract

Integrative approaches to the study of complex systems demand that one knows the manner in which the parts comprising the system are connected. The structure of the complex network defining the interactions provides insight into the function and evolution of the components of the system. Unfortunately, the large size and intricacy of these networks imply that such insight is usually difficult to extract. Here, we propose a method that allows one to systematically extract and display information contained in complex networks. Specifically, we demonstrate that one can (i) find modules in complex networks and (ii) classify nodes into universal roles according to their pattern of within- and between-module connections. The method thus yields a ‘cartographic representation’ of complex networks.

**Keywords:** network dynamics

## 1. Introduction

Integrative approaches to the study of complex systems demand that one knows the manner in which the parts comprising the system are connected. Recent studies have revealed that these interaction networks often display complex features themselves, thus deviating from purely random or purely ordered topologies [1]-[8]. The Internet [9], food webs [10]-[12], and metabolic networks [13] are examples of complex networks.

To extract the relevant information from the topology of large complex networks, knowledge of the role of each node is of crucial importance. A cartographic analogy helps to illustrate this point. Consider the network formed by all cities and towns in a country—the nodes—and all the roads that connect them—the links. It is clear that a map in which each city and town is represented by a circle of fixed size and each road is represented by a line of fixed width is of little use. Rather, real maps emphasize capitals and important communication lines so that one can obtain scale-specific information at a glance. Similarly, it is difficult, if not impossible, to obtain information from a network with hundreds or thousands of nodes and links, unless the information about nodes and links is presented in a scale-specific context.

Here, we propose a methodology, which is based on the connectivity of the nodes, that yields a ‘cartographic representation’ of a complex network. The first step in our method is to identify the modules [14] in the network. In the cartographic picture, modules are analogous to countries or regions, and enable a coarse-grained, and thus simplified, description of the network. Then, we classify the nodes in the network into a small number of *system-independent* ‘universal roles’.

## 2. Modules in complex networks

It is a matter of common experience that social networks have communities of highly interconnected nodes that are less connected to nodes in other communities. Such modular structures have been reported not only in social networks [14]-[17], but also in the Internet [18], in food webs [19, 20] and in biochemical networks [21]-[24]. It is widely believed that the modular structure of complex networks plays a critical role in their functionality [21, 22, 25]. There is therefore a clear need to develop algorithms for identifying modules accurately [14, 16, 18, 26, 27].

### 2.1. Modularity

Here we propose a method that is based on the maximization of the *modularity* [16, 26, 28]. For a given partition of the nodes of a network into modules, the modularity *M* of this partition is

$$M \equiv \sum_{s=1}^{N_M} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^{2} \right], \qquad (1)$$

where *N*_{M} is the number of modules, *L* is the number of links in the network, *l*_{s} is the number of links between nodes in module *s*, and *d*_{s} is the sum of the degrees of the nodes in module *s*. The rationale for this definition of modularity is the following. A good partition of a network into modules must comprise many within-module links and as few as possible between-module links. However, if one just tries to minimize the number of between-module links (or, equivalently, maximize the number of within-module links) the optimal partition consists of a single module and no between-module links. Equation (1) addresses this difficulty by imposing that *M* = 0 if nodes are placed at random into modules *or* if all nodes are in the same cluster [16, 26, 28].
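The definition can be checked numerically. The following is a minimal sketch, assuming the modularity has the form *M* = Σ_s [*l*_s/*L* − (*d*_s/2*L*)²] and that the network is undirected and given as a list of links; the function name `modularity` and the `edges`/`module_of` data layout are illustrative choices, not part of the original method description.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Modularity M of a partition of an undirected network.

    edges: iterable of (u, v) pairs, one per link, no self-loops
    module_of: dict mapping each node to its module label
    """
    edges = list(edges)
    L = len(edges)                      # total number of links
    if L == 0:
        raise ValueError("network has no links")
    links_within = defaultdict(int)     # l_s: links with both ends in module s
    degree_sum = defaultdict(int)       # d_s: sum of degrees of nodes in module s
    for u, v in edges:
        su, sv = module_of[u], module_of[v]
        degree_sum[su] += 1
        degree_sum[sv] += 1
        if su == sv:
            links_within[su] += 1
    return sum(links_within[s] / L - (degree_sum[s] / (2 * L)) ** 2
               for s in degree_sum)

# Two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
print(modularity(edges, two_modules))   # 5/14, the natural split
print(modularity(edges, one_module))    # 0.0, all nodes in one module
```

The second call illustrates the property stated above: placing every node in a single module gives *M* = 0, so maximizing *M* does not collapse the network into one module.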

### 2.2. Simulated annealing for module identification

The objective of a module identification algorithm is to find the partition with largest modularity, and several methods have been proposed for attaining such a goal. Most of them rely on heuristic procedures and use *M*—or a similar measure—only to assess their performance. In contrast, we use simulated annealing (SA) [29, 28] to obtain the best determination of the modules of a network by direct maximization of *M*.

Simulated annealing [29] is a stochastic optimization technique that enables one to find ‘low cost’ configurations without getting trapped in ‘high cost’ local minima. This is achieved by introducing a *computational temperature T*. When *T* is high, the system can explore configurations of high cost while at low *T* the system only explores low cost regions. Starting at high *T* and slowly decreasing it, the system descends gradually toward deep minima, overcoming small cost barriers.

When identifying modules, the objective is to maximize the modularity and, thus, the cost is *C* = −*M*, where *M* is the modularity as defined in equation (1). At each temperature, we perform a number of random updates and accept them with probability

$$p = \begin{cases} 1 & \text{if } C_f \le C_i, \\ \exp\left( -\dfrac{C_f - C_i}{T} \right) & \text{if } C_f > C_i, \end{cases}$$

where *C*_{f} is the cost after the update and *C*_{i} is the cost before the update.

Specifically, at each *T* we propose *n*_{i} = *fS*^{2} individual node movements from one module to another, where *S* is the number of nodes in the network. We also propose *n*_{c} = *fS* collective movements, which involve either merging two modules or splitting a module. For *f* we typically choose *f* = 1. After the movements are evaluated at a certain *T*, the system is cooled down to *T*′ = *cT*, with *c* = 0.995.
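The update rule and cooling schedule can be sketched as follows. This is a minimal illustration, assuming the standard Metropolis acceptance rule (accept any update that does not raise the cost, otherwise accept with probability exp(−(*C*_f − *C*_i)/*T*)). The function names and the starting and stopping temperatures `T_start` and `T_stop` are hypothetical, since the text does not specify them.

```python
import math
import random

def accept(cost_before, cost_after, T, rng=random):
    """Metropolis rule: always accept updates that do not raise the cost,
    otherwise accept with probability exp(-(C_f - C_i) / T)."""
    if cost_after <= cost_before:
        return True
    return rng.random() < math.exp(-(cost_after - cost_before) / T)

def moves_per_temperature(S, f=1.0):
    """Number of individual (n_i = f S^2) and collective (n_c = f S) proposals
    made at each temperature for a network of S nodes."""
    return int(f * S * S), int(f * S)

def cooling_schedule(T_start, T_stop, c=0.995):
    """Yield T, cT, c^2 T, ... for as long as T stays above T_stop.
    T_start and T_stop are illustrative parameters, not values from the text."""
    T = T_start
    while T > T_stop:
        yield T
        T *= c

n_individual, n_collective = moves_per_temperature(128)
print(n_individual, n_collective)   # proposals per temperature for S = 128
```

With *f* = 1 and *c* = 0.995 the number of proposals per temperature grows as *S*², so the schedule trades running time for a slower, more careful descent.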

Both the individual movement of a node from a module to another, and the merging of two modules are straightforward movements. The split movement is, however, more involved. In principle, any heuristic procedure that generates non-deterministic plausible splits of a module—that is, splits that have a finite probability of being accepted—would suffice.

We have considered a number of different schemes. The one that performs best consists in isolating the module from the rest of the network, and performing a ‘nested’ SA, entirely independent of the ‘global’ one. We start by isolating the module that we intend to split. We then partition its nodes into two random groups, and set the initial temperature of the nested SA to a high value. Then, we make individual node movements according to the modularity of the isolated module and decrease the temperature of the nested SA until it reaches the value of the global SA. The result of the nested SA is a partition of the module into two new modules, which we accept or reject according to the global modularity and the global temperature.

In using *M* as a ‘fitness function’, our method is more ‘transparent’ than those relying on heuristic procedures. Furthermore, SA enables us to carry out a thorough search of the space of partitions and thus to reduce the risk of settling on a sub-optimal partition. It is noteworthy that, in our method, one does not need to specify *a priori* the number of modules; rather, the number of modules is an outcome of the algorithm.

To test the performance of the method, we build ‘random networks’ with known module structure. Each test network comprises 128 nodes divided into 4 modules of 32 nodes. Each node is connected to the other nodes in its module with probability *p*_{in}, and to nodes in other modules with probability *p*_{out} < *p*_{in}. On average, thus, each node is connected to *k*_{out} = 96*p*_{out} nodes in other modules and to *k*_{in} = 31*p*_{in} in the same module. Additionally, *p*_{in} and *p*_{out} are selected so that the average degree of the nodes is *k* = 16. Our algorithm, which significantly outperforms the best algorithm in the literature [14], is able to reliably identify modules in a network whose nodes have as many as 50% of their connections outside their own module (figure 1).
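A test network of this kind can be generated as in the sketch below. It assumes that *p*_{in} and *p*_{out} are fixed by the relations *k*_{in} = 31*p*_{in}, *k*_{out} = 96*p*_{out} and *k*_{in} + *k*_{out} = 16 given above; the function name `planted_network` and its parameters are illustrative, not taken from the original work.

```python
import random

def planted_network(k_out, k=16, n_modules=4, module_size=32, seed=None):
    """Random network with a known module structure.

    k_out: average number of links per node to nodes in other modules
    k: average total degree, so that k_in = k - k_out
    Returns (edges, module_of, p_in, p_out).
    """
    rng = random.Random(seed)
    n = n_modules * module_size
    p_in = (k - k_out) / (module_size - 1)   # from k_in = 31 p_in
    p_out = k_out / (n - module_size)        # from k_out = 96 p_out
    module_of = {i: i // module_size for i in range(n)}
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if module_of[i] == module_of[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, module_of, p_in, p_out

# k_out = 8 corresponds to nodes with 50% of their links outside their module.
edges, module_of, p_in, p_out = planted_network(k_out=8, seed=0)
print(len(module_of), p_in, p_out)
```

Because `module_of` records the planted partition, the output of any module identification procedure can be compared against it directly.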

## 3. Roles in complex modular networks

Already in 1957, Nadel argued that ‘roles’ are the central elements in the analysis of social systems [30, 31], and in the 1970s White and co-workers introduced the concepts of *structural equivalence* and *block model* to address this issue from a network perspective [32]-[34], [31]. Two nodes are structurally equivalent if they are connected to the same nodes [32, 34]. Therefore, any network can be divided into blocks of structurally equivalent nodes in such a way that the structure of the network is *summarized* in a block model by stating the relations between the blocks (figure 2).


Usually, structural equivalence is too strong a requirement for a large complex network; it is very unlikely that two nodes are connected to exactly the same set of other nodes. Regular structural equivalence [35, 34] relaxes this requirement by requiring that regularly equivalent nodes have identical links to other *equivalent* nodes. Formally, if nodes *i* and *j* are regularly equivalent and *i* has a link to/from some node *k*, then node *j* must have a link to/from some node *l*, and nodes *k* and *l* must be, also, regularly equivalent [34].

Real networks are likely to have both a modular structure and a block structure. This fact raises serious concerns about the conceptual relationship between blocks and roles. Although blocks certainly give interesting information about the overall structure of the network, simple examples, such as the one shown in figure 3, demonstrate that, in general, *blocks cannot be interpreted as roles*.


Motivated by this handicap of the block scheme, we propose a new method for determining the role of a node in a complex network. Our approach is not based on the idea of blocks but on the general idea that nodes with the same role should have similar topological properties (figure 3(b)).

### 3.1. Within-module degree and participation coefficient

Each module can be organized in very different ways, ranging from totally centralized (with one or a few nodes connected to all the others) to totally decentralized (with all nodes having similar connectivities). Nodes with similar roles are expected to have similar relative within-module connectivity. If *κ*_{i} is the number of links of node *i* to other nodes in its module *s*_{i}, $\bar{\kappa}_{s_i}$ is the average of *κ* over all the nodes in *s*_{i}, and $\sigma_{\kappa_{s_i}}$ is the standard deviation of *κ* in *s*_{i}, then

$$z_i = \frac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}}$$

is the so-called *z*-score. The within-module degree *z*-score measures how ‘well connected’ node *i* is to other nodes in the module.
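The within-module degree *z*-score can be computed as in the following sketch. It assumes *z*_i = (*κ*_i − mean of *κ* in the module) / (standard deviation of *κ* in the module). Two choices are assumptions of this sketch rather than statements of the text: the standard deviation is the population one (dividing by the module size), and *z* is set to 0 when all nodes in a module have the same *κ*.

```python
import math
from collections import defaultdict

def within_module_z(edges, module_of):
    """Within-module degree z-score of every node.

    edges: iterable of (u, v) pairs of an undirected network
    module_of: dict mapping each node to its module label
    """
    kappa = {node: 0 for node in module_of}   # links to nodes in own module
    for u, v in edges:
        if module_of[u] == module_of[v]:
            kappa[u] += 1
            kappa[v] += 1
    members = defaultdict(list)
    for node, s in module_of.items():
        members[s].append(node)
    z = {}
    for s, nodes in members.items():
        mean = sum(kappa[n] for n in nodes) / len(nodes)
        var = sum((kappa[n] - mean) ** 2 for n in nodes) / len(nodes)
        sd = math.sqrt(var)   # population standard deviation (assumption)
        for n in nodes:
            # Assumption: z = 0 if every node in the module has the same kappa.
            z[n] = (kappa[n] - mean) / sd if sd > 0 else 0.0
    return z

# A star-shaped module: node 0 is linked to the four other nodes.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(within_module_z(star, {n: "a" for n in range(5)}))
```

In the star example the centre has *κ* = 4 against a module mean of 1.6 and a standard deviation of 1.2, so its *z*-score is 2, while each leaf has *z* = −0.5.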

Different roles can also arise because of the connections of a node to modules other than its own. For example, two nodes with the same *z*-score will probably play different roles if one of them is connected to several nodes in other modules while the other is not. We define the participation coefficient *P*_{i} of node *i* as

$$P_i = 1 - \sum_{s=1}^{N_M} \left( \frac{k_{is}}{k_i} \right)^{2}$$

where *k*_{is} is the number of links of node *i* to nodes in module *s*, and *k*_{i} is the total degree of node *i*. The participation coefficient of a node is therefore close to one if its links are uniformly distributed among all the modules and zero if all its links are within its own module.
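A minimal sketch of the participation coefficient follows, assuming *P*_i = 1 − Σ_s (*k*_is/*k*_i)², where the sum runs over modules. The function name and data layout are illustrative, and assigning *P* = 0 to isolated nodes is an assumption of this sketch.

```python
from collections import defaultdict

def participation(edges, module_of):
    """Participation coefficient of every node.

    edges: iterable of (u, v) pairs of an undirected network
    module_of: dict mapping each node to its module label
    """
    links_to = defaultdict(lambda: defaultdict(int))   # k_is per node and module
    for u, v in edges:
        links_to[u][module_of[v]] += 1
        links_to[v][module_of[u]] += 1
    P = {}
    for node in module_of:
        k_i = sum(links_to[node].values())             # total degree of the node
        if k_i == 0:
            P[node] = 0.0                              # assumption for isolated nodes
        else:
            P[node] = 1.0 - sum((k_is / k_i) ** 2
                                for k_is in links_to[node].values())
    return P

# Node 0 has one link into each of four modules; node 1 only links within "a".
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
module_of = {0: "a", 1: "a", 2: "b", 3: "c", 4: "d"}
print(participation(edges, module_of))
```

Node 0 spreads its four links evenly over four modules, giving *P* = 1 − 4(1/4)² = 0.75, while node 1 has all its links within its own module and so *P* = 0.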

We hypothesize that the role of a node can be determined, to a great extent, by its *within-module degree* and its *participation coefficient*, which define how the node is positioned in its own module and with respect to other modules [36, 37]. Note that these two properties are easily computed once the modules of a network are known.

### 3.2. Arguments for the definition of a universal set of discrete roles

We surmise that the role of a node is defined mainly by its within-module degree and its participation coefficient. Our definition of the roles is firstly determined by the within-module degree. We classify nodes with *z* ≥ 2.5 as module hubs and nodes with *z* < 2.5 as non-hubs. Both hub and non-hub nodes are then more finely characterized by using the values of the participation coefficient. Simple calculations suggest that non-hub nodes can be naturally assigned into four roles:

*Ultra-peripheral nodes* (role R1).- If a node has all its links within its module (*P* ≈ 0).

*Peripheral nodes* (role R2).- If a node has at least 60% of its links within the module, then for *k* < 4 it follows that *P* < 0.625 (figure 4(a)).

*Non-hub connectors* (role R3).- If a node with *k* < 4 has half of its links (or at least two links, whichever is larger) within the module, then it follows that *P* < 0.8 (figure 4(a)). Thus, a plausible region for non-hub connectors is 0.62 < *P* < 0.8.

*Non-hub kinless nodes* (role R4).- If a node has fewer than 35% of its links within the module, it implies that *P* > 0.8. We surmise that such nodes cannot be clearly assigned to a single module. We thus classify them as kinless nodes. We will demonstrate later that non-hub kinless nodes are found in most network growth models, but not in real-world networks.

Similarly, hubs can be naturally assigned into three different roles:

*Provincial hubs* (role R5).- If a node with a large degree, *k* ≫ 1, has at least 5/6 of its links within the module, then it follows that *P* = 1 − (5/6)^{2} − (*k*/6)(1/*k*^{2}) = 0.31 − 1/(6*k*) ≈ 0.30.

*Connector hubs* (role R6).- If a node with a large degree has at least half of its links within the module, then it follows that *P* = 1 − 1/4 − (*k*/2)(1/*k*^{2}) = 0.75 − 1/(2*k*). Since *k* ≫ 1, *P* < 0.75 for such nodes.

*Kinless hubs* (role R7).- If a hub has fewer than half its links within the module, i.e., *P* > 0.75, then we surmise that it may not be clearly associated with a single module. We then classify it as a kinless hub. We will demonstrate later that hubs in most network growth models are actually kinless hubs.
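The two hub boundaries can be worked out explicitly. The derivation below is a sketch under one assumption that the formulas imply but the text does not state: each link leaving the module goes to a different module, which is the case that maximizes *P* for a given number of within-module links. The symbol $k_{\mathrm{in}}$ (number of within-module links) is introduced here for illustration only.

```latex
% General form, with k_in links inside the module and the remaining
% k - k_in links each reaching a different module:
P = 1 - \left(\frac{k_{\mathrm{in}}}{k}\right)^{2}
      - \left(k - k_{\mathrm{in}}\right)\left(\frac{1}{k}\right)^{2}

% Provincial hubs, k_in = 5k/6:
P = 1 - \frac{25}{36} - \frac{k}{6}\cdot\frac{1}{k^{2}}
  = \frac{11}{36} - \frac{1}{6k} \approx 0.31 - \frac{1}{6k}

% Connector hubs, k_in = k/2:
P = 1 - \frac{1}{4} - \frac{k}{2}\cdot\frac{1}{k^{2}}
  = \frac{3}{4} - \frac{1}{2k}
```

For *k* ≫ 1 the 1/*k* corrections vanish, which gives the limiting values of roughly 0.30 and 0.75 used as boundaries between the three hub roles.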

In total, we are left with seven roles that correspond to seven regions of the *zP* parameter space (figure 5).
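The seven regions can be written as a simple lookup. This is a minimal sketch using the boundaries quoted above (*z* = 2.5; *P* = 0.62 and 0.80 for non-hubs; *P* = 0.30 and 0.75 for hubs). The tolerance `eps` that stands in for ‘*P* ≈ 0’ and the treatment of values lying exactly on a boundary are assumptions of this sketch.

```python
def classify_role(z, P, eps=0.05):
    """Map a node's (z, P) coordinates to one of the roles R1 to R7.

    eps is a hypothetical tolerance for the 'P approximately 0' condition of
    ultra-peripheral nodes; the text does not give a numerical value.
    """
    if z < 2.5:                 # non-hubs
        if P <= eps:
            return "R1"         # ultra-peripheral node
        if P <= 0.62:
            return "R2"         # peripheral node
        if P <= 0.80:
            return "R3"         # non-hub connector
        return "R4"             # non-hub kinless node
    if P <= 0.30:               # hubs
        return "R5"             # provincial hub
    if P <= 0.75:
        return "R6"             # connector hub
    return "R7"                 # kinless hub

for z, P in [(0.0, 0.0), (1.0, 0.5), (-0.3, 0.7), (3.0, 0.1), (4.2, 0.6)]:
    print(z, P, classify_role(z, P))
```

Because the classification depends only on *z* and *P*, the same function applies unchanged to any network once its modules are known.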

## 4. Roles in real networks: validation of the role definitions

Our definition of a set of distinct roles has been, so far, based on mathematical arguments. A question that we need to address is, therefore, how this definition relates to the roles of nodes in real networks.

In order to obtain as complete as possible a picture of how the nodes in a given network might populate the *zP* parameter space, we calculate *z* and *P* values for all the nodes in a large number of networks (figure 6). Specifically, we obtain these values for (i) the metabolic networks of three organisms, (ii) the proteome of *C. elegans*, (iii) the North-American airport network, (iv) the collaboration networks of chemical engineers as defined by publications in two different journals, and (v) the Internet at the autonomous system level. Additionally, we obtain these values for nodes in model networks generated by the Barabási–Albert network growth model [3] and the Erdős–Rényi model [40]. In all, we consider in our analysis 26 771 nodes.

### 4.1. Uncertainty in the position of nodes in parameter space and the density landscape

In our analysis, we estimate the value of the within-module degree of each node and its participation coefficient. Since we have access to these networks at a single moment in time, it is plausible to assume that the values that we measure for *z*_{m} and *P*_{m} for a given node are not error free. To take this uncertainty into consideration, we assume that each node could be in a *region* of the *zP* space, which is centred in the measured (*z*_{m}, *P*_{m}) value. Specifically, we assign to each node a Gaussian distribution centred at (*z*_{m}, *P*_{m}) and with widths *σ*_{z} and *σ*_{P}, which gives the probability of finding that particular node at any point of the *zP* parameter space.

By adding the distributions of all nodes, one obtains a ‘density landscape’ that represents the probability of finding a node at a certain point of the *zP* space. In figure 7, we plot the density landscape obtained for the 26 771 nodes with *σ*_{P} = 0.035. In the density landscape, high probability regions are valleys and low probability regions are peaks. Then, at (almost) every point of the landscape, one can ‘follow’ the gradient to reach a local minimum. The region of the space that ‘flows’ toward a certain minimum is what we call a ‘basin of attraction’.
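The construction of the landscape can be sketched as a sum of Gaussians. The code below assumes an axis-aligned, uncorrelated two-dimensional Gaussian for each node and divides by the number of nodes so that the result is a normalized density; both choices, as well as the function names, are assumptions of this sketch. Taking the landscape height as the negative of the density is one way to make high probability regions appear as valleys.

```python
import math

def node_density(z, P, measured, sigma_z, sigma_P):
    """Probability density of finding a node at the point (z, P).

    measured: list of measured (z_m, P_m) pairs, one per node
    sigma_z, sigma_P: widths of the Gaussian assigned to each node
    """
    norm = 1.0 / (2.0 * math.pi * sigma_z * sigma_P)
    total = 0.0
    for z_m, P_m in measured:
        dz = (z - z_m) / sigma_z
        dP = (P - P_m) / sigma_P
        total += norm * math.exp(-0.5 * (dz * dz + dP * dP))
    return total / len(measured)

def landscape_height(z, P, measured, sigma_z, sigma_P):
    """Negative density, so that high probability regions are valleys
    (an assumed convention for this sketch)."""
    return -node_density(z, P, measured, sigma_z, sigma_P)

# Hypothetical measured values, for illustration only.
measured = [(0.1, 0.02), (-0.4, 0.05), (3.1, 0.55)]
print(landscape_height(0.0, 0.03, measured, sigma_z=0.5, sigma_P=0.035))
```

Evaluating `landscape_height` on a grid and following the downhill direction from each grid point would then assign that point to a basin of attraction.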

### 4.2. Non-hub nodes

As discussed above, we define non-hub nodes as those with *z* < 2.5. We then calculate the node density plot for different choices of the values of *σ*_{z} and *σ*_{P} and identify the basins of attraction for the different node density plots (figure 8). These plots confirm that our definition of non-hub roles with boundaries at *P* = 0.62 and *P* = 0.80 is a sensible one and that, indeed, these regions of the *zP* space correspond to distinct universal roles in real networks.

### 4.3. Hub nodes

We define hub nodes as those with *z* ≥ 2.5. We then calculate the node density plot for different choices of the values of *σ*_{z} and *σ*_{P} and identify the basins of attraction for the different node density plots (figure 9, shown for (a) *σ*_{P} = 0.03 and (b) *σ*_{P} = 0.05).

In this case, there are many more basins of attraction than for the non-hub region because of the scarcity of data points. However, the density plots are compatible with a selection of three regions corresponding to distinct roles, with boundaries at *P* = 0.30 and 0.75, as estimated before.

## 5. Conclusions

Computational and high throughput techniques are leading to an explosive and unprecedented growth in the amount of information available for some physical, biological, and socio-economic systems. These advances are creating the opportunity to revolutionize our understanding of nature, life and disease, and social organization. Interpretation of these data remains, however, a major scientific challenge.

Here, we presented a methodology for extracting relevant scale-specific information from complex networks. Our method is based on the analysis of the connectivity patterns of the nodes, and yields a ‘cartographic representation’ of a complex network. The first step in our method is to identify the modules in the network. In the cartographic picture, modules are analogous to countries or regions, and enable a coarse-grained, and thus simplified, description of the network. Then, we classify the nodes in the network into a small number of *system-independent* ‘universal roles’. A node's role is determined from its pattern of inter- and intra-module connections.

Our ‘cartographic method’ provides a way to process the information contained in the structure of complex networks, and to extract knowledge about the function carried out by the network and its constituents. This should allow us, in turn, to identify key players in the network. Some of these key nodes are likely to be already known. For example, hubs are highly visible due to their large number of connections. More interestingly, our method also enables one to identify more ‘subtle’ roles, such as non-hub connectors, which play important structural roles in spite of their small number of connections. In metabolic networks, for example, it seems that these nodes are highly conserved compared to provincial hubs [24].

## Acknowledgments

We thank L Broadbelt, A A Moreira, E T Papoutsakis, M Sales-Pardo, and D B Stouffer for stimulating discussions and helpful suggestions. RG thanks the Fulbright Program and the Spanish Ministry of Education, Culture & Sports. LANA gratefully acknowledges the support of a Searle Leadership Fund Award and of a NIH/NIGMS K-25 award.

## References

Cartography of complex networks: modules and universal roles. NIHPA Author Manuscripts. Feb 1, 2005; 2005(P02001): P02001-1.