Copyright: © 2008 Hintze and Adami. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Evolution of Complex Modular Biological Networks

Keck Graduate Institute of Applied Life Sciences, Claremont, California, United States of America

Lauren A Meyers, Editor
University of Texas at Austin, United States of America

* To whom correspondence should be addressed. E-mail: adami/at/kgi.edu

Received May 1, 2007; Accepted December 20, 2007.

This article has been cited by other articles in PMC.

Abstract Biological networks have evolved to be highly functional within uncertain environments while remaining extremely adaptable. One of the main contributors to the robustness and evolvability of biological networks is believed to be their modularity of function, with modules defined as sets of genes that are strongly interconnected but whose function is separable from those of other modules. Here, we investigate the in silico evolution of modularity and robustness in complex artificial metabolic networks that encode an increasing amount of information about their environment while acquiring ubiquitous features of biological, social, and engineering networks, such as scale-free edge distribution, small-world property, and fault-tolerance. These networks evolve in environments that differ in their predictability, and allow us to study modularity from topological, information-theoretic, and gene-epistatic points of view using new tools that do not depend on any preconceived notion of modularity.
We find that for our evolved complex networks as well as for the yeast protein–protein interaction network, synthetic lethal gene pairs consist mostly of redundant genes that lie close to each other and therefore within modules, while knockdown suppressor gene pairs are farther apart and often straddle modules, suggesting that knockdown rescue is mediated by alternative pathways or modules. The combination of network modularity tools together with genetic interaction data constitutes a powerful approach to study and dissect the role of modularity in the evolution and function of biological networks. Author Summary The modular organization of cells is not immediately obvious from the network of interacting genes, proteins, and molecules. A new window into cellular modularity is opened up by genetic data that identifies pairs of genes that interact either directly or indirectly to provide robustness to cellular function. Such pairs can map out the modular nature of a network if we understand how they relate to established mathematical clustering methods applied to networks to identify putative modules. We can test the relationship between genetically interacting pairs and modules on artificial data: large networks of interacting proteins and molecules that were evolved within an artificial chemistry and genetics, and that pass the standard tests for biological networks. Modularity evolves in these networks in order to deal with a multitude of functional goals, with a degree depending on environmental variability. Relationships between genetically interacting pairs and modules similar to those displayed by the artificial gene networks are found in the protein–protein interaction network of baker's yeast. The evolution of complex functional biological networks in silico provides an opportunity to develop and test new methods and tools to understand the complexity of biological systems at the network level. 
Introduction Biological function is an extremely complicated consequence of the action of a large number of different molecules that interact in many different ways. Elucidating the contribution of each molecule to a particular function would seem hopeless, had evolution not shaped the interaction of molecules in such a way that they participate in functional units, or building blocks, of the organism's function [1–4]. These building blocks can be called modules, whose interactions, interconnections, and fault-tolerance can be investigated from a higher-level point of view, thus allowing for a synthetic rather than analytic view of biological systems [5,6]. The recognition of modules as discrete entities whose function is separable from those of other modules [7] introduces a critical level of biological organization that enables in silico studies. Here, we evolve large metabolic networks based on an artificial chemistry of precursors and metabolites, and examine topological and information-theoretical modularity measures in the light of simulated genetic interaction experiments. Intuitively, modularity must be a consequence of the evolutionary process, because modularity implies the possibility of change with minimal disruption of function [1], a feature that is directly selected for [3,8]. Yet, if a module is essential, its independence from other modules is irrelevant unless, when disrupted, its function can be restored either by a redundant gene or by an alternative pathway or module. Furthermore, modularity must affect the evolutionary mechanisms themselves, so that both robustness and evolvability can be optimized simultaneously [1,9,10]. A thorough analysis of these concepts requires both an understanding of what constitutes a module in biological systems and tools to recognize modules among groups of genes. 
In particular, a systems view of biological function requires that we develop a vocabulary that not only classifies modules according to the role they play within a network of modules and motifs, but also how these modules and their interconnections are changed by evolution, i.e., how they constitute units of evolution targeted directly by the selection process [4]. The identification of biological modules is usually based either on functional, evolutionary, or topological criteria. For example, genes that are co-expressed and/or coregulated can be classified into modules by identifying their common transcription factors [11,12], while genes that are highly connected by edges in a network form clusters that are only weakly connected to other clusters [13]. From an evolutionary point of view, genes that are inherited together but not with others often form modules [14–16]. Yet, the concept of modularity is not at all well defined. For example, the fraction of proteins that constitutes the core of a module and that is inherited together is small [14], implying that modules are fuzzy but also flexible so that they can be rewired quickly, allowing an organism to adapt to novel circumstances [17]. Progress in our understanding of the modular nature of biological networks must come from new functional data that allow us to study different groups of genes both together and apart, and compare this data to our topological, information-theoretic, and evolutionary concepts. A promising set of data is provided by genetic interactions [18], such as synthetic lethal pairs of genes (pairs of mutations that show no phenotype on their own but that are lethal when combined), or dosage rescue pairs, in which a knockout or mutation of a gene (in general, a loss of function) is suppressed by overexpressing another gene. Such pairs are interesting because they provide a window on cellular robustness and modularity brought about by the conditional expression of genes. 
Indeed, the interaction between genes—gene epistasis [19]—has been used to successfully identify modules in yeast metabolic genes [20]. However, often interacting pairs of genes lie in alternate pathways rather than cluster in functional modules, do not interact directly, and thus are expected to straddle modules more often than lie within one [21]. In silico evolution is a powerful tool if complex networks can be generated that share the pervasive characteristics of biological networks, such as error tolerance, small-world connectivity, and scale-free degree distribution [22]. If furthermore each node in the network represents a simulated chemical or a protein catalyzing reactions involving these molecules, then it is possible to conduct a detailed functional analysis of the network by simulating knockdown or overexpression experiments. This functional datum can then be combined with evolutionary and topological information to arrive at a more sharpened concept of modularity that can be tested in vitro when more genetic data become available. Previous work on the in silico evolution of metabolic [23], signaling [24,25], biochemical [26,27], regulatory [28], as well as Boolean [29], electronic [30], and neural [30–32] networks has begun to reveal how network properties such as hubness, scaling, mutational robustness as well as short pathway length can emerge in a purely Darwinian setting. In particular, in silico experiments testing the evolution of modularity both in abstract [33] and in simulated electronic networks [30] suggest that environmental variation is key to a modular organization of function. In the experiments we describe below, we evolve large metabolic networks of many hundreds of nodes with over a thousand edges for up to 5,000 generations from simple networks with only five genes. 
These networks are complex (in the sense of being information-rich [34,35]), are topologically interesting, and function within simulated environments whose variability can be controlled arbitrarily. We analyze these networks using new tools that allow us to see genetically interacting pairs in the light of different concepts of modules, and compare our results to an application of those tools to the yeast protein–protein interaction network. Results Structure of the Model Artificial chemistry. We evolve the genomes of artificial cells that produce metabolites within a simple artificial chemistry of linear molecules constructed from three types of atoms, termed 1, 2, and 3. In a valid molecule each atom carries exactly as many bonds as the numeral representing it, and a molecule contains at most twelve atoms. For example, 1-2-2-1 is a valid molecule, as is 2=2 or 1-2-3=3-2-1, but 1-3=1 is not. In this chemistry there are thus 608 valid molecules, which can undergo chemical reactions of the form A + B → A′ + B′ through a form of cleavage that preserves the atomic content. For example, the valid molecules 1-2-2-1 and 2=3-3=2 can react by cleaving each molecule in the middle (indicated by the arrow):
Organisms. Each organism in an evolving population consists of a cell containing molecules and proteins that perform various functions, as well as a genome (on two circular chromosomes) that codes for those proteins. The cells float in a 2D chemostat in which the smallest 53 of the 608 possible molecules are produced at a constant rate at locations from which they diffuse, and all molecules produced by the cell and exported to the environment are removed every update. The 53 short molecules play the role of precursors for the synthesis of the remaining, more complex molecules. The chemostat can carry 1,000 organisms, and at each update 1 of 16 organisms is removed (see Methods). For a cell to divide, it must produce a sufficient amount of some of the remaining 555 molecules (metabolites) within the cell, by importing any of the 53 precursors using specific transporter proteins and catalyzing any of the possible reactions with enzymatic proteins specific to the reaction. The precursors also leak into the cell at a concentration of a millionth of their concentration at the cell's location. In principle, cells can move around on the two-dimensional plane if they develop proteins for cilia and flagella (for example, to follow the source of the precursor molecules), but these are turned off for the present experiments, so that the cells are anchored to the center of the chemostat. A description of enzyme and transporter affinities to molecules, as well as details of the calculation of organismal fitness as a function of the metabolites the cell produces, can be found in the Methods. Proteins are encoded in the genome using the alphabet [0,1,2,3]. Each gene starts with four consecutive zeros (start codon), followed by the expression level, the type of protein (import, export, or catalytic), the specificity to the reaction, and the affinity to the molecule transported or catalyzed (see Methods).
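The molecule counts quoted above (608 valid molecules, of which the 53 smallest act as precursors and the remaining 555 as metabolites) can be checked by brute-force enumeration. The following is a minimal sketch, not the authors' code. It assumes that a molecule and its mirror image (for example 1-3=2 and 2=3-1) are counted separately, which is the convention that reproduces the count of 608, and that the "smallest 53" are the molecules of at most seven atoms. The function name `enumerate_molecules` and the use of `#` for a triple bond are our own choices.

```python
from itertools import product

BOND_SYMBOL = {1: "-", 2: "=", 3: "#"}  # "#" for a triple bond is our notation
MAX_ATOMS = 12


def enumerate_molecules(max_atoms=MAX_ATOMS):
    """List every valid linear molecule as a string such as '1-2-3=3-2-1'."""
    molecules = []
    for n_atoms in range(2, max_atoms + 1):
        for bonds in product((1, 2, 3), repeat=n_atoms - 1):
            # An end atom has one bond; an interior atom has a bond on each side.
            # The numeral of an atom equals its total bond order.
            atoms = [bonds[0]]
            atoms += [bonds[i - 1] + bonds[i] for i in range(1, n_atoms - 1)]
            atoms.append(bonds[-1])
            if max(atoms) > 3:  # only atoms 1, 2 and 3 exist
                continue
            text = str(atoms[0])
            for bond, atom in zip(bonds, atoms[1:]):
                text += BOND_SYMBOL[bond] + str(atom)
            molecules.append(text)
    return molecules


def atom_count(molecule):
    """Number of atoms in a molecule string (atoms and bonds alternate)."""
    return (len(molecule) + 1) // 2


molecules = enumerate_molecules()
precursors = [m for m in molecules if atom_count(m) <= 7]  # assumed cutoff
print(len(molecules), len(precursors), len(molecules) - len(precursors))
```

Under these assumptions the enumeration yields 608 molecules, 53 of which have seven atoms or fewer, leaving 555 longer molecules. The examples from the text behave as stated: 1-2-2-1, 2=2, and 1-2-3=3-2-1 are in the list, while 1-3=1 is not.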
The genomes are evolved with a standard Genetic Algorithm with fitness-proportional selection (Wright-Fisher model), a Poisson-random point mutation rate μ = 1 per genome (but capping the maximum number of mutations per genome at six), and the possibility of gene duplication and deletion (see Methods). Environments. In order to simulate dynamic and unpredictable environments, we designed three environments that differ in their precursor availability. In all environments the sources of the 53 precursor molecules are randomly distributed, and constantly replenished so that they cannot be drawn down. In the static environment, the location of the precursor sources is fixed throughout the experiment, while in the quasi-static environment the location of a single random precursor is moved each update. In the dynamic environment, the sources of all precursors are moved every update, and 25% of the precursors are randomly chosen to be unavailable. The set of unavailable precursors also changes periodically. Most experiments were repeated in each of these environments. Organism and network evolution. Cells are initialized with a genome encoding five genes: two proteins catalyzing molecular reactions that produce metabolites that contribute to fitness, one that produces a metabolite that does not contribute to fitness, one import protein and one export protein (see Methods). Different metabolic pathways evolve depending on the imported molecules and their abundance, and can be represented by a network connecting molecules and proteins. For example, the pathway importing molecule 1-2-1 with protein A, molecule 1-2-2-1 with protein B, and catalyzing the reaction 1-2-1 + 1-2-2-1 → 1-1 + 1-2-2-2-1 with protein C and subsequent export of 1-1 using protein D (Figure 1
Phylogenetic depth. In asexually evolving populations, every organism has a unique line of descent that connects it to the ancestral genome, via intermediary genomes carrying heritable genetic differences between mother and daughter genome that occurred during reproduction. Often these changes are single substitutions, but can also be duplications or deletions of genomic sequences of various lengths. Because the environments present the same niche to every organism, the lines of descent coalesce quickly to a single dominating type irrespective of the depth. Since beneficial mutations are very common, the phylogenetic depth is a good proxy for the number of generations elapsed in a run up to that depth. Network Evolution Networks evolve to be highly complex, increase in size and develop complex pathways to metabolize the precursors. Typically, pathways evolve first via duplication and divergence of the existing genes, but later pathways are combined and new pathways emerge by evolving import proteins for precursors that leak into cells and for which catalytic proteins had evolved. Reaction networks are complicated, involving loops and multiple interconnections. Genetic information content about environment increases in evolution. In the example experiment depicted in Figure 2
We show in Figure 2 Evolved metabolic networks have pervasive properties. The metabolic networks generated by the evolved genomes can be analyzed using standard tools, and display some of the usual properties that distinguish biological networks from random graphs [22]. Figure 3
The probability that a substrate participates in k metabolic reactions also follows a power law, $p(k) \sim k^{-\lambda}$ with $\lambda \approx 2.23$ (Figure S1). A similar value was found empirically for this distribution in the E. coli metabolic network [22]. The paths between nodes in the network (the “average geodesic distances”, see Methods) are short (“small-world networks”), normally distributed (Figure S2), and they remain short even as the network size grows during evolution (Figure S3). This small-world character has been shown to be a universal feature of metabolic networks in 43 organisms [22,36], and is hypothesized to be an adaptation geared towards minimizing the transition time between metabolic states when reacting to changed external conditions. Similar to what was observed in yeast protein–protein interaction networks [21], the path length in our networks increases dramatically up to a break point when nodes that are characterized as hubs are removed from the network (see Figure 4
Network modularity increases in evolution. We can assign a network modularity score to every network on the evolutionary line of descent using the information bottleneck algorithm of Ziv et al. [40] as described in Methods. The modularity of the networks increases over evolutionary time in the long run, but can go up or down intermittently as new pathways are forged. Figure 6
Our finding that networks evolved in dynamic environments are less modular than those evolved in static environments appears to run opposite to the conclusion reached by Kashtan and Alon [30], who noted that dynamic environments are necessary for the evolution of modularity. However, metabolic networks are very different from the type of logical networks evolved there, as is the nature of environmental changes. Our dynamic environments change randomly, whereas Kashtan and Alon's environment changes in a modular fashion, rewarding one or the other function in turn. We further comment on this observation in the Discussion. Mutational and environmental robustness decrease. Biological networks have evolved to be robust to mutations, knockouts, and environmental noise, as compared to random networks [3]. This robustness is believed to be due to genetic redundancy [41] as well as to the interaction between unrelated genes that can compensate for loss of function [42]. We have measured the robustness of our evolved networks to node removal as well as to environmental noise, by measuring the fitness of cells as more and more nodes are removed, and as more and more precursor concentrations are set to zero. The scaled fitness of cells decreases approximately exponentially with the number of nodes or precursors removed (see Figure S4A and S4B), with a fitness decay parameter that reflects the fitness effect of accumulating mutations (see Methods). The larger this parameter, the more fragile the organism; consequently we define robustness as one minus fragility. We show the robustness parameters ρ_KO and ρ_ENV along the line of descent in Figure 7
Genetic interactions and modularity. To understand how modules interact, we studied whether genetic interactions occur predominantly between genes within modules or between modules, for the networks evolved in dynamic vs. static environments. We used two different methods to determine clusters: a topological one (betweenness-centrality clustering), and an information-theoretic one (network bottleneck method, see Methods). For both of these methods, the clustering method returns a ranked list of nodes, but the orders are different, and they reflect different properties of the nodes. Modules are often thought to communicate with each other via nodes with high betweenness centrality (BC) [43]. Such nodes are distinguished not by their connectivity, but by being major signal thoroughfares: the shortest path of many pairs of nodes runs through them ([44,45], see Methods). To test the modular structure of our networks, we remove nodes with high BC one by one in the order of their (reiterated) BC rank, and study the rate at which pairs of nodes with a given character are separated, i.e., the path between them is severed. We obtained a list of synthetic lethal pairs by finding all those pairs of genes whose knockout does not affect fitness on their own, but cause a loss of fitness when knocked out together. Such pairs (for the network shown in Figure 5
The rate at which pairs of genes are separated is explained in part by their distance distribution (Figure 8 When removing nodes with high BC, knockdown suppressor pairs (green) are separated quickly, in fact much more quickly than is suggested by their distance distribution, which peaks in between that of the random pairs and the synthetic lethal pairs (Figure 8 We also studied how the decay of genetically interacting pairs compares to global topological properties, and compared their behavior to similar experiments performed in random networks. The size of the largest connected component in the functional network (grey line in Figure 8 We can compare the behavior of these genetically interacting pairs in evolved metabolic networks to equivalent pairs in the highly curated yeast protein–protein interaction network of Reguly et al. [18] of 1,038 nodes with the genetic interactions removed. For this network, both synthetic lethal and compensatory pairs are separated later than random pairs (see Figure 9
We can also study the relationship of genetically interacting pairs with clusters determined by an information-theoretic method [40]. Clusters determined by this method are chosen so that they simplify the original network while the relevant character of the network (the fidelity) is determined by network diffusion (see Methods). This algorithm results in a list of nodes that reflects the order in which nodes are merged to generate the optimal clustered network. We can use this list to study the fraction of genetically interacting pairs that remain separate under the node merging procedure (shown in Figure 10
Discussion Evolution shapes our artificial metabolic networks into complex tightly connected pathways that are modular in nature, and that share many of the well-known properties of biological networks, such as scale-free edge distribution, small-world connectivity, and hubness. We can use these networks to study how established concepts of modularity—such as betweenness centrality clustering and information-theoretic modularity—compare to the rate at which genetically interacting pairs are disrupted by either removing nodes with high BC, or merging nodes that have been assigned to the same information-theoretical cluster. By evolving networks in different environments that are expected to yield different modularities, we can dissect the impact of genetically interacting pairs on modularity notions. When we compare the behavior of genetically interacting pairs in our evolved networks to those in the yeast protein–protein interaction network, we find commonalities and some discrepancies. One of our main findings is that synthetic lethal pairs usually lie within modules, no matter how modules are defined, and that compensatory (suppressor) pairs preferentially straddle modules. We also find that in our metabolic networks, many nodes that are assigned the same module in fact have high betweenness centrality themselves, a property that does not appear to be shared with the yeast protein–protein interaction graph, where random pairs separate faster than compensatory pairs. A number of differences between the networks can explain these findings. First, the functional graphs (Figure 1 We find no evidence that dynamic environments are required for the evolution of functional modules [30,33]. Rather, it appears that genes segregate into functional modules as long as there are a large number of different ways to achieve functionality. Indeed, on the contrary, metabolic networks evolved in dynamic environments appear to be less modular. 
We can understand this finding by noting that our dynamic environments change randomly by omitting the availability of a random fraction of precursors, as opposed to the modular changes implemented in Ref. [30]. To deal with the unpredictability of the environment, our metabolic networks first evolve reactions that produce precursors from other precursors and metabolites (see Figure S8) such that several different genes produce the same precursor from different precursors and metabolites at any point in time. In that way, the evolved redundancy ensures the presence of any particular precursor. Because this redundancy creates connections between pathways, the modularity score of such networks is lower. We also find that networks evolve more slowly in dynamic environments, but they are more robust to environmental fluctuations in return. Thus, at least for metabolic networks, robustness and modularity do not necessarily go hand-in-hand. The in silico evolution of functional networks based on artificial genetics and chemistry presents an opportunity to study how complex networks, their structure and organization, evolve over time to cope with environments with varying degrees of predictability. We believe that such networks can provide a formidable benchmark for experiments with biochemical networks, and allow predictions with hitherto unavailable accuracy. The type of functional interaction experiments that we performed on our large evolved networks anticipates high-throughput efforts currently under way using temperature-sensitive yeast deletion mutants and their multi-copy suppressors, and suggests that dosage rescue (or multi-copy suppressor) pairs of genes represent an appropriate and sensitive tool to study modularity in biological networks. Methods Genome code and organization. Molecular interactions occur through proteins that catalyze the reactions between the molecules of our artificial chemistry and transport them in and out of cells. 
These proteins are encoded by an artificial genetics using the four “nucleotides” 0, 1, 2, and 3 and determine the rate at which the reactions proceed. An open reading frame on a chromosome starts with four zeros (see Table S1), followed by a code indicating the expression level, followed by a tag designating the protein type, followed by the specificity and the affinity. The specificity is a 12-nucleotide stretch that determines the target molecule or reaction (e.g., if the tag is “import”, 123321000000 specifies that molecule 1-2-3=3-2-1 is transported into the cell). Reactions are specified by mapping the 5,020,279 legal reactions to the $4^{12}$ = 16,777,216 possible 12-mer specificities, in such a manner that any mutation in the specificity region still specifies a legal reaction. A protein's affinity is determined by an “active site” that has four domains, one each for the four molecules involved in the reaction A + B → A′ + B′. The binding affinity of a transport protein to the specified target is obtained by averaging the affinity of all four domains. Each domain has twelve entries that are matched to particular molecules (of at most twelve atoms) in the following manner. First, a molecule is translated into its binary equivalent, for example, 1-2-3=3-2-1 is 01-10-11-11-10-01-00-00-00-00-00-00 (zeros are used to pad molecules smaller than 12 atoms). The 24-bit domain of the protein P is compared with the binary equivalent of the target molecule M, resulting in an affinity score D(M,P) that is highest if the protein domain is precisely complementary to the molecule. So, for example, the perfect domain for molecule 1-2-3=3-2-1 is 10-01-00-00-01-10-11-11-11-11-11-11. Numerically, D(M,P) is obtained as 1 − S(M,P), where S(M,P) is a similarity score

$$S(M,P) = \frac{1}{108}\sum_{i=1}^{12} E_i(M,P)^2,$$

where $E_i(M,P)$ is the base-10 translation of the logical bitwise EQUAL of the molecule's and protein's ith site. The base-10 translation of the equivalent of a perfect match ('11') is 3, so that the maximal value of the sum $\sum_{i=1}^{12} E_i(M,P)^2$ is 12 × 3² = 108, ensuring that 0 ≤ A(M,P) ≤ 1. The complementarity scheme is chosen to minimize the occurrence of domains of the type 00-00-00-00, as they would be decoded as start codons. The maximal genome size in this model is 120,000 bits, or 60,000 nucleotides, on 2 circular chromosomes. Genes are allowed to overlap. Note that because of the absence of recombination, one of the two chromosomes consistently degenerates during evolution so that all of the complexity ends up contained in a single circular genome.
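The domain-matching scheme described above can be made concrete with a short sketch. This is an illustration under stated assumptions rather than the authors' implementation: the similarity score is assumed to be the sum of the squared per-site EQUAL values divided by the stated maximum of 108, a form inferred from that normalization, and all function names (`encode`, `perfect_domain`, `similarity`, `affinity`) are hypothetical. Bond orders are ignored by the encoding, as in the example in the text.

```python
N_SITES = 12      # a domain covers molecules of at most twelve atoms
MAX_SCORE = 108   # 12 sites x 3**2, the stated maximum of the sum


def encode(atoms):
    """Pad a molecule's atom list with zeros to twelve 2-bit sites."""
    return list(atoms) + [0] * (N_SITES - len(atoms))


def to_bits(sites):
    """Render twelve 2-bit sites in the 01-10-11 notation used in the text."""
    return "-".join(format(site, "02b") for site in sites)


def perfect_domain(atoms):
    """The bitwise complement of the encoded molecule."""
    return [site ^ 0b11 for site in encode(atoms)]


def similarity(atoms, domain):
    """Assumed form: sum of squared per-site EQUAL values, divided by 108."""
    # Bitwise EQUAL (XNOR) of two 2-bit sites, read as a number from 0 to 3.
    equal = [~(m ^ p) & 0b11 for m, p in zip(encode(atoms), domain)]
    return sum(e * e for e in equal) / MAX_SCORE


def affinity(atoms, domain):
    """Affinity score D(M,P) = 1 - S(M,P)."""
    return 1.0 - similarity(atoms, domain)


molecule = [1, 2, 3, 3, 2, 1]  # atoms of 1-2-3=3-2-1
print(to_bits(encode(molecule)))
print(to_bits(perfect_domain(molecule)))
print(affinity(molecule, perfect_domain(molecule)), affinity(molecule, encode(molecule)))
```

With these definitions the encoded molecule and its perfect domain match the two bit strings quoted in the text, the complementary domain has affinity 1, and a domain identical to the molecule has affinity 0.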
Chemostat physics and reaction kinetics. Cells live in a two-dimensional space where precursor molecules are produced at defined locations and diffuse out, so that the concentration of molecule M at distance d from the source, [M](d), depends on the concentration at the source via
Molecule concentrations [Mi] are updated according to a discretized version of the standard metabolic rate equations [46]
and vj is the metabolic flux
In Equation 3,
is the number of edges leaving molecule l, and we defined the reaction matrix for reaction j
j ) by
Organism fitness. The fitness of an organism is determined by the amount and complexity of the molecules it can metabolize from the precursors. The 608 possible molecules of the artificial chemistry are numbered according to their complexity (length and type of atoms):
In Equation 6, the product extends only across metabolites that have achieved non-vanishing abundance during a cell's lifetime. Because of the explicit dependence of a cell's fitness on the concentration of precursors in the cell's vicinity, fitness is context dependent, and in principle depends on the frequency of other cells in a population. Due to the multiplicative nature of the fitness function, the discovery of new pathways is always beneficial with the same percentage, and the fitness increases exponentially during evolution. We usually plot the logarithm of the fitness, which is additive. Evolution. A Genetic Algorithm [47] is used to evolve circular genomes encoding genes using the nucleotide alphabet [0,1,2,3]. Mutations are Poisson-random with a mean of one mutation per genome (and a maximum of six mutations per genome). With a probability of 1/16 per genome, a stretch of 4–512 base pairs is duplicated and inserted directly adjacent to the duplicated stretch. With the same probability, a stretch of the same size is deleted from the genome. No recombination takes place between genomes. The probability for a genome to be replicated is proportional to the fitness calculated in Equation 6 (Wright-Fisher selection). Organisms must be at least 8 updates old before they can replicate, and they are protected from death during those first 8 updates. Ancestral genome. We designed the ancestral genome to have 3 genes on the first 1,000 bp chromosome, with the second chromosome of 1,000 bp filled with poly-'3's in order to be as distant as possible from start codons. However, it turned out that the third gene has a start codon (0000) within its specificity domain as well as in the sequence specifying the expression level, which give rise to two additional proteins in overlapping reading frames (see Figure 11
Information content. The complexity of an organism can be estimated by the amount of information its genome encodes about the environment within which it thrives [34,35,48]. We can estimate the information content I of a sequence s of length L encoding the bases 0,1,2,3 by I = L − H(s), where the entropy of the sequence H(s) is approximated by the sum of the per-site entropies

$$H(s) \approx \sum_{x=1}^{L} H(x),$$

with a per-site entropy

$$H(x) = -\sum_{i=0}^{3} p_i \log_4 p_i. \qquad (7)$$
In Equation 7, the pi are the probabilities to find base i at position x, which can be obtained from an alignment of genomes in mutation-selection balance. For small populations and long genomes, this balance is not achieved, and the substitution probabilities pi must be estimated using the fitness effect of each substitution wi according to the implicit equation [49]
is the mean fitness of the possible alleles at that position and μ is the mutation rate per site. We obtain the fitness wi of each allele at each position by constructing the genotype and evaluating the fitness of the cell it gives rise to in the appropriate environment. (Mutations that appear to be beneficial are counted as wild-type fitness.) Using the four values wi, the probabilities pi can be obtained by iterating Equation 8 10,000 times or until the variance of all pi drops below $10^{-12}$.
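Once the per-site probabilities are known, the estimate I = L − H(s) is simple to compute. The sketch below is illustrative only. It assumes logarithms to base 4, so that each site contributes at most one unit of entropy, which is what makes I = L − H(s) dimensionally consistent. It takes the probabilities as given and does not implement the iteration of Equation 8. The function names are our own.

```python
import math


def per_site_entropy(probs):
    """Entropy of one site in base-4 units; terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p, 4) for p in probs if p > 0)


def information_content(site_probs):
    """I = L - H(s), with H(s) approximated by the sum of per-site entropies."""
    length = len(site_probs)
    return length - sum(per_site_entropy(probs) for probs in site_probs)


# Three hypothetical sites: fully conserved, fully random, and two-fold degenerate.
sites = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0],
]
print([per_site_entropy(probs) for probs in sites])
print(information_content(sites))
```

A fully conserved site carries one unit of information, a site where all four bases are equally likely carries none, and a site that tolerates two bases equally carries half a unit, so the three example sites together carry about 1.5 units.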
Information-theoretic clustering. To assign a modularity score to our networks, we use the information bottleneck method [50], as applied to biological networks by Ziv et al. [40]. Briefly, the method assigns clusters to the nodes of a network described by a random variable X using an assignment random variable Z and a relevance variable Y (the bottleneck) by maximizing both the simplicity of the description (minimizing the mutual entropy between the graph and its description I(X : Z)) and its relevance or fidelity (maximizing I(Y : Z)). This is achieved via a hard clustering method that starts with a description Z with one fewer node than X, then calculates the conditional probability p(z | y) from a diffusion process and selects those nodes of X to merge in the description Z that result in the highest I(Y : Z). This process iterates until all the nodes have been joined and the size of Z is one. This procedure results in a list of nodes (from highest cluster probability to lowest) that can be used to study how synthetic lethal and knockdown suppressor pairs are merged, as an alternative to the topological clustering via betweenness centrality. A modularity score for each network is obtained as the area under the information curve obtained by plotting the normalized quantities I(Z : X)/H(X) and I(Z : Y)/I(X : Y) against each other [40]. Because random graphs give rise to an information curve with area 0.5, any modularity score above 0.5 signals a modular organization of the network. To obtain the modularity score in Figure 6
Average geodesic distance. The average distance D of each node to any other defines the average geodesic distance of a graph
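A minimal sketch of how such an average path length can be computed for an unweighted graph uses a breadth-first search from every node. The function names are hypothetical, the graph is given as an adjacency dictionary, and the normalization is an assumption: the average is taken over all ordered pairs of distinct, mutually reachable nodes, which may differ from the convention used by the authors.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_geodesic_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0
```

For the three-node path `{0: [1], 1: [0, 2], 2: [1]}` the pairwise distances are 1, 1, and 2, so the average is 4/3.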
Network and environmental robustness. We measure the robustness of evolved networks with respect to node deletions and to changes in the precursor concentrations. Even though these perturbations are unrelated prima facie, there is evidence that mutational robustness and robustness to noise are correlated [28]. We measure mutational robustness by removing n random nodes and determining the (scaled) fitness of the remaining graph, where the fitness after removal is the mean of 1,000 independent fitness measurements of a network from which n random nodes have been removed. The fitness decreases exponentially as long as fewer than 30% of the nodes are removed, suggesting a ("knock-out") robustness parameter ρKO defined via the rate of this exponential decay.
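The exponential decline of fitness under node removal can be summarized by fitting a straight line to the logarithm of the scaled fitness. The sketch below is an illustration only: the function name `decay_rate` is hypothetical, the first data point is assumed to be the intact network (n = 0) and is used for scaling, and whether the robustness parameter is taken as this rate or as its inverse is an assumption left open here.

```python
import math


def decay_rate(n_removed, mean_fitness):
    """Least-squares slope of ln(w(n)/w(0)) versus n, sign-flipped.

    `mean_fitness[0]` is assumed to be the fitness of the intact network.
    A larger return value means a faster loss of fitness per removed node.
    """
    w0 = mean_fitness[0]
    xs = list(n_removed)
    ys = [math.log(w / w0) for w in mean_fitness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

In practice the fit would be restricted to the regime in which the decay is exponential, which the text places below 30% of nodes removed.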
Environmental robustness is determined by evaluating the fitness of an organism as more and more of the 53 precursor molecules are removed. Fitness declines exponentially with the number of deleted nodes or chemicals removed, and robustness can be quantified by the slope of the decrease of log fitness, defining ρENV in a similar manner.

Betweenness centrality. The betweenness centrality of a node in a network topology measures how many shortest paths go through that node. If bi is the ratio of the number of shortest paths between a pair of nodes in the network that pass through node i and the total number of shortest paths between those two nodes, then the unscaled betweenness of node i is the sum of bi over all pairs of nodes, and the (scaled) betweenness centrality is obtained by normalizing this sum [45].
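The pair-wise ratios bi can be accumulated efficiently with Brandes' algorithm, sketched below for an unweighted, undirected graph given as an adjacency dictionary. The function names are hypothetical, and the scaling in `scaled_betweenness` divides by the number of node pairs that exclude the node itself, (n − 1)(n − 2)/2; this is a common convention and an assumption here, and it may differ from the scaling used in [45].

```python
from collections import deque


def betweenness(adj):
    """Unscaled betweenness of every node (each unordered pair counted once)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


def scaled_betweenness(adj):
    """Betweenness divided by (n - 1)(n - 2) / 2 (assumed normalization)."""
    n = len(adj)
    norm = (n - 1) * (n - 2) / 2.0
    raw = betweenness(adj)
    return {v: (b / norm if norm > 0 else 0.0) for v, b in raw.items()}
```

For a star with one hub and three leaves, every shortest path between two leaves passes through the hub, so the hub has an unscaled betweenness of 3 and a scaled value of 1, while the leaves score 0.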
Software availability. The software to implement the artificial chemistry and genetics, as well as the evolution experiments described in this manuscript, is available at http://public.kgi.edu/~ahintze.

Figure S1: Distribution of Molecules in Reactions. Probability distribution p(k) that a molecule participates in k reactions, compiled from 80 runs to depth 1,000 in a dynamic environment. The distribution is fit to a power law, with λ ≈ 2.23 (r² = 0.88). Error bars are standard error. Variable bin sizes are determined by the threshold binning method [37], with a minimum of T = 100 points per bin. (135 KB PDF)

Figure S2: Evolution of Path Length Distribution. Evolution of the distribution p(d), the probability to find two nodes in the network that are a distance d apart, for every 1,000th network on the line of descent, for a network evolved in a dynamic environment. (409 KB PDF)

Figure S3: Average Path Length D on the Line of Descent. Mean path length D (see Methods) for a network with (A) metabolic, and (B) protein–protein annotation, in three different environments, for the network evolution shown in Figure 2 (347 KB PDF)

Figure S4: Robustness of Fitness under Precursor and Gene Removal. Decrease of normalized log fitness with increasing precursor removal (A), node removal (B), as a function of the position on the line of descent (colors in inset of (A)). Depth 0: ancestor. (409 KB PDF)

Figure S5: Robustness of Decay of Knockdown Suppressor Pairs. Fraction of knockdown suppressor pairs separated upon removing nodes with high BC using all (100%, weakest criterion) or fewer (only the top 10%–80%) of suppressor pairs. The top 95% of pairs were used for Figures 8 (409 KB PDF)

Figure S6: Modularity Analysis for Static and Quasi-static Environments. Analysis of the separation of pairs of genes from networks evolved in a static (A) and quasi-static (B) environment, as in Figure 8 (73 KB PDF)

Figure S7: Distance Distribution of Pairs of Genes. Distance distribution of pairs on a network evolved in static (A) and quasi-static (B) environment. Red, synthetic lethal pairs; green, knockdown suppressor pairs; black, random pairs. (409 KB PDF)

Figure S8: Fraction of Genes Producing Precursors. Fraction of genes involved in the production of one of the 53 precursor molecules for the network evolved in a dynamic environment (red) versus a static environment (green line). (409 KB PDF)

Acknowledgments

We would like to thank D. Galas, H. Sauro, A. Raval, R. Rao, and N. Chaumont for discussions and critical insight, and S. Benner for discussions on artificial chemistry.

Footnotes

A previous version of this article appeared as an Early Online Release on January 2, 2008 (doi:10.1371/journal.pcbi.0040023.eor).

Author contributions. AH and CA conceived and designed the model, simulations, and methods. AH wrote the simulation and analysis tools, performed the experiments, and analyzed the experiments. CA wrote the manuscript.

Funding. This work was supported by the National Science Foundation's Frontiers in Integrative Biological Research grant FIBR-0527023.

Competing interests. The authors have declared that no competing interests exist.

References