• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of ploscompComputational BiologyView this ArticleSubmit to PLoSGet E-mail AlertsContact UsPublic Library of Science (PLoS)
PLoS Comput Biol. Feb 2010; 6(2): e1000672.
Published online Feb 12, 2010. doi:  10.1371/journal.pcbi.1000672
PMCID: PMC2820515

Conditions for the Evolution of Gene Clusters in Bacterial Genomes

Jonathan A. Eisen, Editor

Abstract

Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.

Author Summary

Genes involved in a common pathway or function are frequently found near each other on bacterial chromosomes. A number of hypotheses have been previously presented to explain this observation. A particularly influential theory is the selfish operon model, which posits that horizontal transfer could promote gene clustering by favouring transfer of arrangements of genes that are close together. Subsequent theoretical development and analysis of genomic data have contributed to the debate about the plausibility of this model. Here, by re-examining the evolutionary dynamics of gene clusters, we provide and discuss conditions under which gene clusters can evolve. We find that first, some form of bias for clustering is required for clusters to evolve. This bias can be in the form of bias in horizontal transfer towards genes that are close together, or direct natural selection for gene proximity. Our computational work does not present a theoretical obstacle to the selfish operon model as a possible explanation for the evolution of gene clusters.

Introduction

A conspicuous feature of bacterial genomes is the grouping of genes involved in a metabolic pathway into functional units on the chromosome. Early linkage studies of Escherichia coli and Salmonella typhimurium showed that genes in the biosynthetic pathways of tryptophan and histidine occur on a contiguous region of the genome [1],[2]. Furthermore, genes are often found in their biochemical reaction order [3]. Gene clustering has since become recognized as a widespread feature of bacterial genomes. Grouped genes are sometimes transcribed together as an operon, with shared promoter and operator sequences (for example the galactose operon galETK [4],[5]). Regulatory genes have also been found close to the genes they regulate. A classic example is the lacI repressor gene, which resides near but not within the lacZYA operon in Escherichia coli. The extent of gene clustering is variable – a given set of related genes may be clustered in one species but unclustered and/or reordered in another [6],[7]. Interestingly, most clusters do not contain much intergenic DNA, and in some cases genes even overlap [8],[9].

A number of explanations for clustering have been considered over the years. The most controversial and influential hypothesis has been the selfish operon model, which offers a mechanism for the evolution of clustering without needing to invoke the action of natural selection [10],[11]. In this model, gene clusters persist because the proximity of the genes in question facilitates their collective transfer between species. It applies to genes encoding accessory functions rather than essential genes.

Another model that does not require direct selection to explain clustering is the persistence model [12]. Unlike the selfish operon model, this model has been proposed to explain the clustering of essential genes – genes that are evolutionarily persistent. The hypothesis here is that by occupying less space, clustered genes are less likely to be disrupted by the deletion or insertion of DNA. In other words, an individual with clustered genes is more “resilient” to the lethal or deleterious effects of mutation. This hypothesis is similar to the idea that genes sharing regulatory sequences by residing in a single operon present a smaller target for deleterious mutation than scattered genes with individual control elements [13].

Hypotheses involving direct selection have also been examined. Here, clustering of genes confers a direct fitness advantage to the organism. For example, a scenario in which selection directly favouring the co-regulation of genes can lead to the evolution of operons has been outlined [14]. Apart from efficiently regulated transcription, a fitness advantage may arise through shorter diffusion times for proteins finding their targets when the genes encoding them are clustered. Thermodynamic models have been developed to apply this idea to enzymes and transcription factors [15],[16]. The efficiency gained from shorter diffusion times is assumed to translate into a reproductive fitness advantage [17]. Another mechanism conferring advantage to gene clustering is gene amplification [18]. In this model, gene dosage is rapidly and reversibly increased by tandem duplication of the genes in question. The closer the genes are, the greater the probability of coamplification. The increased dosage is assumed to contribute to elevated fitness. Other models for the evolution of gene clusters based on metabolic arguments have also been studied [19],[20].

Other hypotheses have been considered but rejected [10],[17, for example]. A hypothesis now called the natal model suggests that clusters arose by gene duplication and divergence such that the newly formed genes participate in a common pathway. However, the lack of sequence homology for most genes within clusters undermines this explanation [10]. Fisher's theory of the evolution of linkage and recombination has been suggested to apply to bacteria [1],[21]. Under this theory natural selection favours increased linkage among co-adapted genes – genes whose products work well together – because recombination (chromosomal crossover during meiosis) breaks up combinations of alleles with high fitness. However, it has been pointed out that this requires high recombination rates, which are typical for eukaryotes, to work [10]. Although recombination rates are found to be high in some species [22],[23], they are not high enough relative to the cellular generation rate to support an account of clustering based on Fisher's theory.

The debate on the origins and maintenance of gene clusters continues, with recent genomic studies casting doubt on the selfish operon hypothesis. First, the prediction that non-essential genes are clustered while essential genes are not has been tested and rejected [24]. Second, if horizontal gene transfer is an important source of gene clusters, then horizontally transferred sequences should be associated with operons. Genomic data, however, do not support such an association [14]. On the other hand, they do support the possibility that genes and their regulators may have evolved close proximity via horizontal transfer [25]. Third, the selfish operon model is unable to explain the observation that genes in clusters are sometimes arranged in the order of biochemical reactions. A resolution may involve multiple mechanisms, of which horizontal transfer of selfish operons is one [12].

Here, we re-examine the theoretical basis for explaining the origins and maintenance of gene clusters. By studying a number of distinct models, we provide and discuss conditions under which clustering can evolve.

Model

We describe three kinds of models for gene clustering in this article. First, we revisit the selfish operon model [10]. We seek to explore the parameter space and understand in more detail when and why it produces gene clusters. Second, we propose a model based on the Moran process, which tracks individual bacterial cells and in which the total population size is constant. Third, we develop a further model that tracks the substitution of new arrangements, making the assumption that populations are monomorphic. By running computer simulations of these three systems we consider the factors that lead to the evolution of gene clusters.

The assumptions common to all models are as follows. Genomes are made up of circular chromosomes divided into An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e001.jpg regions; we let An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e002.jpg kilobases (kb). This genome size is constant over time. There are An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e003.jpg genes in the pathway of interest. Only a single gene can occupy any given position. The units of reproduction are either species or individual bacteria depending on the model. A genome can undergo rearrangement with probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e004.jpg per step or generation. We explore two processes: first, translocation of a random gene to a random position and second, inversion by which two breakpoints are chosen randomly uniformly and the intervening segment inverted. If the resulting arrangement moves the terminus or origin more than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e005.jpg kb the new arrangement is regarded as lethal [26],[27]. Both translocation and inversion are used within the selfish operon framework of Lawrence and Roth 1996, while only inversion is considered for the Moran model and the rearrangement substitution model.

Model of Lawrence and Roth 1996

In their influential paper, Lawrence and Roth describe a simulation model that produces gene clusters through a horizontal gene transfer process that is biased towards genes that are physically closer on a chromosome [10]. This is called the selfish operon model. In this model, species in which individuals carry all the genes needed for the function are called “positive” species. Each species is assumed to be monomorphically composed of genomes with a particular arrangement of genes on the chromosome, and fixation is assumed to occur instantaneously. That is, each species is associated with a single arrangement of genes. We are interested in the minimum arc distance along the chromosome that contains all genes, which is equivalently the genome length minus the longest interval between pairs of neighbouring genes. The simulation is initialised with 100 species, with each species given a random set of gene positions. Lawrence and Roth kept the number of species between 10 and 900 [10]. We have implemented this by switching off the horizontal transfer process when the number of species reaches 900 and re-instating it when the size goes under 900. We ran our simulations for 50,000 time steps.

Horizontal transfer leads to a species that lacks the function (a “negative” species) acquiring the function along with the arrangement of gene positions of the donor genome. The probability of horizontal transfer An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e006.jpg decreases with distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e007.jpg. Although its form is not given in [10], we will assume it is exponential with a decay parameter An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e008.jpg. That is,

equation image
(1)

The exponential distribution is a natural choice for the size distribution of transferred DNA among bacteria, and has been empirically tested for homologous recombination [28],[29]. Some support for a skewed distribution of gene transfer fragment lengths is found in Ochman and Jones 2000 [30].

At each time step, each species or individual can undergo loss of the function with probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e010.jpg. Following Lawrence and Roth, we set the loss probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e011.jpg to 0.001 per genome per time step and the maximum probability of horizontal transfer An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e012.jpg, occurring when the genes are located in the same minute of the chromosome, to 0.01 per genome per time step [10]. We set An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e013.jpg by default, under which a 50 kb fragment is 6 times more likely to transfer than one of 500 kb. Because the probability of rearrangement is likely to be very low in nature [31], we set An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e014.jpg per genome per time step by default. Lawrence and Roth 1996 used a much higher value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e015.jpg and we investigate the effect of lowering this parameter from this high value. We studied the effect of varying An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e016.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e017.jpg by varying parameters one at a time as well as using latin hypercube sampling [32],[33] to explore the parameter space. Under this methodology, each parameter is divided into equiprobable regions in the area of interest, and parameter sets are constructed by selecting values randomly from the resulting grid without replacement. A uniform distribution was used for each parameter.

The algorithm we used for the dynamic is as follows.

  1. Initialise population as described above.
  2. For each species An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e018.jpg,
    1. With probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e019.jpg, rearrange the genome by moving a gene to a random new position.
    2. With probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e020.jpg, destroy gene function (the species is lost from the pool of positive species).
    3. If the number of positive species is under 900:Horizontal transfer leads to recruitment of a species (from a limitless supply of negative species) with the same arrangement of genes as species An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e021.jpg, with probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e022.jpg, where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e023.jpg is the minimum arc distance between the genes in genome An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e024.jpg.
  3. Compute the average minimum arc distance between genes across species in the population of positive species.
  4. Repeat steps 2, 3 until the end of the simulation.

One problem we have noticed with this model is that given a rearrangement event, the genes in question are always affected. A more natural assumption would be that the genes in question are affected with probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e025.jpg, which is the proportion of the genome occupied by the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e026.jpg genes assuming that genes are 1 kilobase in length. Thus, we have also run the simulations using this corrected translocation process, replacing step 2(a) in the above algorithm with

  • 2(a) With probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e027.jpg move a gene to a random new location.

This correction effectively lowers the rearrangement probability by a few orders of magnitude.

We have also implemented a version of the model in which rearrangement occurs by inversion instead of translocation. Here, we replace step 2(a) in the algorithm with

  • 2(a) With probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e028.jpg choose two random positions An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e029.jpg randomly uniformly between 1 and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e030.jpg. To implement breakpoints occurring between genes, subtract 0.5 from each of these values.

If (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e031.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e032.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e033.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e034.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e035.jpg) then the inversion is viable. (Recall An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e036.jpg is the tolerance to imbalance between origin and terminus.) For each gene whose location An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e037.jpg is between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e038.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e039.jpg, move it to its new location given by An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e040.jpg.

A Moran model of clustering

We construct a model in which the population evolves according to a Moran process [34],[35] combined with a process of genome inversion. Here, we track a population of bacterial cells. As with the selfish operon model, we consider a pathway involving 3 or more genes. A population is initialised with all bacteria carrying the same genome with genes placed randomly uniformly on the chromosome. The population size is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e041.jpg. Let An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e042.jpg represent the relative fitness of cells with the genes at minimum arc distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e043.jpg. Genomes with the genes closer together have a reproductive or survival advantage over those with the genes further apart. We use the function An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e044.jpg to describe this relationship. Because this relative fitness function is analogous to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e045.jpg, we use the same symbol (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e046.jpg) to describe the decay in fitness with respect to distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e047.jpg. An alternative function An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e048.jpg was also used to ascertain the effect of using a steep sigmoidal relationship. Selection for clustering here can be due to any of the mechanisms discussed in the Introduction.

The algorithm is as follows.

  1. Initialise the population as described above.
  2. Choose an individual at random. Choose two positions (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e049.jpg) at random uniformly between 1 and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e050.jpg. To implement breakpoints occurring between genes, subtract 0.5 from each of these values.
  3. Inversion occurs with probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e051.jpg.
    1. If inversion occurs,
      1. if (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e052.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e053.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e054.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e055.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e056.jpg) then the inversion is viable. For each gene whose location An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e057.jpg is between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e058.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e059.jpg, move it to its new location given by An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e060.jpg.
      2. Otherwise the inversion is lethal: replace the individual with a random individual from the population in proportion to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e061.jpg where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e062.jpg is the minimum arc distance between the genes in genome An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e063.jpg.
    2. Otherwise if inversion does not occur, there is random death and replacement. Replacement birth occurs by picking a random individual from the population in proportion to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e064.jpg.
  4. Record the average minimum arc distance between genes across the population.
  5. Repeat steps 2–4 until the end of the simulation.

Following the classical definition of the Moran process, a single generation is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e065.jpg time steps.

This process is very slow with high population sizes, particularly when the rearrangement probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e066.jpg is low. The computational demands of running these simulations precluded the possibility of systematically analysing sensitivity to parameters. This motivated us to develop a further model, which tracks the mutation and fixation process without following details at the population level. This model is described in the next subsection.

Rearrangement substitution model

Here, the population is monomorphic (except during periods of substitution of new arrangements) and so only a single genome arrangement is tracked. Again, the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e067.jpg genes in the pathway in question can occupy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e068.jpg positions, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e069.jpg represents the population size and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e070.jpg is the rearrangement probability. The assumption that the population is monomorphic implies that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e071.jpg must not be too large. In each generation the probability of a rearrangement occurring in at least one individual is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e072.jpg which can be approximated with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e073.jpg since An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e074.jpg is small. The time until the next rearrangement event An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e075.jpg is distributed geometrically with parameter An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e076.jpg. We use inversion rather than translocation as the source of rearrangements.

As above we specify selection through an exponential decay in fitness as a function of the minimum arc distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e077.jpg, so that the relative fitness of a new genome with distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e078.jpg is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e079.jpg, and the selective coefficient is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e080.jpg. A new arrangement fixes in a population with probability

equation image
(2)

and the time it takes to reach fixation is

equation image
(3)

These quantities have been derived from diffusion theory in population genetics (for details see [36]). We use An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e083.jpg in place of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e084.jpg to apply the theory to haploids, where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e085.jpg is the effective population size of a diploid population.

The algorithm for the rearrangement substitution model is therefore as follows.

  1. Initialise by choosing a random arrangement of genes. Choose these positions without replacement.
     Set An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e086.jpg, the number of generations to run the simulation. Set current time to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e087.jpg.
  2. Get random time An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e088.jpg until next rearrangement event:
    equation image
  3. Inversion: choose two integers at random (uniformly between 1 and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e090.jpg inclusive). Subtract 0.5 from each value. Label these points An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e091.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e092.jpg.
    1. If (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e093.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e094.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e095.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e096.jpg) or (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e097.jpg) then the new arrangement is viable. Obtain the new arrangement as follows. Locate all genes between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e098.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e099.jpg. Call these positions An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e100.jpg. For each An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e101.jpg, place the gene into the new location given by An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e102.jpg. Go to Step 4.
    2. Otherwise, the arrangement is not viable. Set An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e103.jpg and go to Step 6.
  4. Obtain the current minimum arc distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e104.jpg and the minimum arc distance An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e105.jpg of the mutant arrangement. Compute the selection coefficient:
    equation image
    where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e107.jpg is the new distance.
  5. Compute the fixation probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e108.jpg given by Equation (2). Fixation occurs if a random uniform U(0,1) is less than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e109.jpg.
     If fixation occurs, find the expected time until fixation An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e110.jpg given by Equation (3). Set the current genome to the new arrangement.
    Otherwise there is no fixation and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e111.jpg.
  6. Update elapsed time: An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e112.jpg
  7. If elapsed time An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e113.jpg, stop the process and record the minimum arc distance. Otherwise, return to Step 2.

An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e114.jpg was set at 50,000 generations. We investigated this model by varying one parameter at a time as well as using latin hypercube sampling to explore the parameter space.

Results

Lawrence and Roth model

When three genes are placed randomly around a chromosome with a uniform distribution, the average minimum arc distance between them is around 1900 kb. When the rearrangement probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e115.jpg is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e116.jpg or An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e117.jpg, the selfish operon model [10] produces an initial wave of gene clustering down to around 600–800kb ((Figure 1A), also reflected in the rise of the proportion of genomes that are clustered under a threshold (Figure 1B). The maximum population size of 900 is reached quickly (Figure 1C) and the dynamics of clustering undergo a change as a new population dynamic regime sets in. When the rearrangement probability is high, clusters break up until the average minimum arc distance settles on high values (Figure 1A). In these cases, the selfish operon model fails to maintain tight clustering in the long term. In particular, gene clusters do not evolve under the parameter values used by Lawrence and Roth [10].

Figure 1
Gene clustering under the original selfish operon model.

To determine if there are conditions under which the selfish operon model does produce clustering, we re-examined this model by exploring its parameter space. Figure 2 reveals the effect of varying the parameters in this model on the average minimum arc distance. It shows that under the original model clustering is only produced when the rearrangement probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e125.jpg is low, the number of genes An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e126.jpg is small, and the maximum transfer probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e127.jpg is sufficiently high. Under the corrected translocation process, the effective rearrangement probability is lowered by a factor An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e128.jpg and the probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e129.jpg itself has no apparent effect on clustering. The decay in transfer probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e130.jpg (see Equation 1) must take intermediate values of around An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e131.jpg for clustering to evolve. If An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e132.jpg is too low, selection is too weak to promote clustering while if it is too high, the probability of transfer is depressed for most minimum arc distances, preventing selection from acting effectively.

Figure 2
Gene clustering under the selfish operon model.

Very similar results are observed when translocation is replaced by inversion, as shown by varying one parameter at a time as well as by latin hypercube sampling analysis (Figure 3). The major difference is that a high probability of inversion does not prevent the evolution of clusters to the same extent as observed in the uncorrected translocation process of Figure 2A.

Figure 3
A sensitivity analysis for the selfish operon model with inversion rather than translocation.

Moran model

We further explored the evolution of clustering using the Moran model with selection for gene clusters. By holding the population size constant this model also allows us to disentagle the effects of population dynamics from those of rearrangement and selection. Figure 4 shows simulation runs of the process for progressively lower values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e149.jpg: An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e150.jpg. It was not computationally feasible to run the simulation under even lower, and more realistic, values. The general pattern emerging from these sample trajectories is that the minimum arc distance is reduced through a series of selective sweeps. The time taken until the appearance of a rearranged genome that reaches fixation is long and depends on the rearrangement probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e151.jpg and the population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e152.jpg. The reduction of minimum arc distance is a slow process made even slower by lowering An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e153.jpg. Using a steep sigmoidal function for selection instead of exponential decay (Figure 4D) gave qualitatively similar results.

Figure 4
Gene clustering under a Moran model.

Rearrangement substitution model

The rearrangement substitution model, which “compresses” time by tracking fixation events, is amenable to sensitivity analysis. Figure 5 demonstrates that a low rearrangement probability of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e159.jpg is able to produce clustering in 50,000 generations. Even lower probabilities lead to weak or no clustering because successful rearrangements that reduce the distance between genes are too rare. Increasing the population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e160.jpg improves the efficiency of selection and leads to clustering. Similarly, increasing the decay in fitness An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e161.jpg improves clustering. Gene clusters are also more readily formed for pathways with a smaller number of genes An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e162.jpg.

Figure 5
Rearrangement substitution model, varying one parameter at a time.

Similar results are produced when the parameter space is explored using latin hypercube sampling (Figure 6). Minimum arc distance decreases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e171.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e172.jpg and increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e173.jpg. Distance also decreases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e174.jpg, though this effect is subtle. For An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e175.jpg (panel B) and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e176.jpg (panel C) the correlation with distance is statistically detectable using a non-parametric method (Kendall's tau), with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e177.jpg-values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e178.jpg and 0.0148 respectively. The corresponding An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e179.jpg-values for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e180.jpg (panel A) and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e181.jpg (panel D) were both less than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e182.jpg. Note that each factor on its own does not explain much of the variation in distance.

Figure 6
Rearrangement substitution model, varying the parameters of the model using latin hypercube sampling with 200 points.

Discussion

This study presents new computational models showing that direct natural selection can lead to the formation of gene clusters under appropriate conditions. We have also re-examined an existing simulation model involving indirect selection – the selfish operon model. By exploring these models under many conditions, we have identified the regions in parameter space that produce gene clustering. In the following, we will discuss parameters as rates rather than probabilities per time step.

Selfish operon model revisited and the role of rearrangement rates

The selfish operon model of Lawrence and Roth 1996 is able to produce gene clusters, but only when the right conditions hold. The overall transfer rate must be high, as reflected in the maximum rate of transfer, and the decay in transfer over distance must be in a suitable range. The rearrangement rate must be low. We note that with a slight correction – accounting for the probability that rearrangement affects the genes in question – these rates are indeed effectively low enough for clustering to evolve. When the selfish operon model is modified so that inversion is the mechanism of rearrangement, again gene clusters can evolve, and inversion rates must be low enough to prevent clusters from disintegrating too quickly once formed.

Rates of fixation of rearrangements are typically very low in nature. From comparisons of genomic data the number of rearrangements per genome per lineage varies across evolutionary lineages, but is usually on the same order of magnitude as or a little higher than the expected number of substitutions per site [37],[38]. Rocha (2006) found the substitution rate of rearrangements to be several orders of magnitude lower than the cellular per generation rate of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e187.jpg [39] because of selection against most new arrangements [40]. Overall these studies suggest the rearrangement rate may be on the order of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000672.e188.jpg per year. The application of these figures to the model of Lawrence and Roth 1996 is not straightforward because the time unit is not set in that model. However, given the wide level of variation in gene content even within bacterial species [41] and the slow process of rearrangement [40], it is likely that the rearrangement rate is far lower than the rate of horizontal gene transfer. Our analysis of the selfish operon model suggests that gene clusters can evolve under such conditions.

A low number of genes in the pathway promotes clustering

In both the selfish operon model (Figures 2,,3)3) and the rearrangement substitution model (Figures 5, ,6),6), gene clusters evolved more readily when the number of genes in a pathway was low. It accords with intuition that less time is taken for a smaller number of genes to cluster. Yet large clusters exist in nature. A possible explanation is that clustering occurs in stages rather than all at once. For this scenario to work, subsets of genes already clustered must be held together while the remaining genes move close to the cluster. Biologically, what could make gene clusters an absorbing state? Clusters of genes are sometimes but not always transcribed and regulated together (found on operons). If such genes are separated, their function may be undermined or destroyed. Another possibility is that genes overlap on a chromosome [9]. If the region of overlap is essential to both genes, again selection would act to maintain the clustered arrangement of those genes. It is unclear, however, how widely applicable this mechanism is. Future modelling efforts could include the possibility that selection acts not only on the minimum arc distance but also on the particular arrangement of the genes. For instance, in a three-gene pathway, a genome in which two genes are close together may be favoured over one in which the three genes are spaced evenly over the same minimum arc distance.

The roles of selection and transfer bias in gene cluster evolution

As intuition dictates, the evolution of clusters also requires some kind of a bias favouring clusters, which can appear in the form of biased horizontal transfer (the selfish operon model) or natural selection for gene proximity (the Moran model with inversion, and the rearrangement substitution model). In each case the parameters must be appropriate to give natural selection or transfer bias their efficacy to produce gene clusters. In the selection model the population size needs to be high and the decay in fitness over distance must be sufficiently high. Under the conditions we studied, the evolution of gene clusters is expected to occur very slowly. However, billions of years have passed since the last universal common ancestor, providing ample time for gene clusters and operons to have evolved and to have been transferred between species.

We remark that in mathematical models of the level of detail presented here, including the selfish operon model of Lawrence and Roth [10], bias in horizontal transfer is indistinguishable from direct natural selection. The persistence model of bacterial gene clusters described by Fang et al. [12] represents another model of selection. There, deletions are more likely to destroy gene function when the genes are further apart on the chromosome. This is a form of negative selection acting against non-clustered essential genes. Both the selfish operon model and the persistence model involve a form of indirect selection, and we suggest that either direct or indirect selection, or both, are needed for clusters to form and be maintained. Current models do not and cannot separate these two phenomena. For example, although we specified the Moran model for a population of individual cells under direct selection for gene clustering, it is possible to interpret the same model as one tracking a population of species (as in the selfish operon model) in which selection is indirect, and in the form of horizontal transfer biased towards low minimum arc distances.

We did not attempt to discriminate between the alternative forms of selection or bias favouring clustering. Rather, we have shown that under appropriate conditions these models can lead to gene clustering. A systematic and formal comparison of alternative models is a remaining challenge, which may require a common mathematical framework for comparing the consequences of these alternatives. Although the selfish operon model has been questioned as the sole mechanism for the evolution of gene clustering, we believe it cannot yet be rejected as a contributor on either empirical or theoretical grounds.

Footnotes

The authors have declared that no competing interests exist.

This work was supported by Discovery Grant DP0987302 from the Australian Research Council, including a QEII Fellowship to MMT. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Stahl FW, Murray NE. The evolution of gene clusters and genetic circularity in microorganisms. Genetics. 1966;53:569–576. [PMC free article] [PubMed]
2. Demerec M. Clustering of functionally related genes in Salmonella typhimurium. Proc Natl Acad Sci. 1964;51:1057–1060. [PMC free article] [PubMed]
3. Demerec M, Hartman PE. Complex loci in microorganisms. Annu Rev Microbiol. 1959;13:377–406.
4. Bentley SD, Parkhill J. Comparative genomic structure of prokaryotes. Annu Rev Genet. 2004;38:771–791. [PubMed]
5. Jackson JH, Harrison SH, Herring PA. A theoretical limit to coding space in chromosomes of bacteria. OMICS. 2002;6:115–121. [PubMed]
6. Lathe WC, Snel B, Bork P. Gene context conservation of a higher order than operons. Trends Biochem Sci. 2000;25:474–479. [PubMed]
7. Dandekar T, Snel B, Huynen M, Bork P. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998;23:324–328. [PubMed]
8. Eyre-Walker A. The distance between Escherichia coli genes is related to gene expression levels. J Bacteriol. 1995;177:5368–5369. [PMC free article] [PubMed]
9. Fukuda Y, Nakayama Y, Tomita M. On dynamics of overlapping genes in bacterial genomes. Gene. 2003;323:181–187. [PubMed]
10. Lawrence JG, Roth JR. Selfish operons: Horizontal transfer may drive the evolution of gene clusters. Genetics. 1996;143:1843–1860. [PMC free article] [PubMed]
11. Lawrence JG. Gene organization: Selection, selfishness, and serendipity. Annu Rev Microbiol. 2003;57:419–440. [PubMed]
12. Fang G, Rocha EPC, Danchin A. Persistence drives gene clustering in bacterial genomes. BMC Genomics. 2008;9:4+. [PMC free article] [PubMed]
13. Lynch M. Streamlining and simplification of microbial genome architecture. Annu Rev Microbiol. 2006;60:327–349. [PubMed]
14. Price MN, Huang KH, Arkin AP, Alm EJ. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 2005;15:809–819. [PMC free article] [PubMed]
15. Svetic RE, MacCluer CR, Buckley CO, Smythe KL, Jackson JH. A metabolic force for gene clustering. B Math Biol. 2004;66:559–581. [PubMed]
16. Kolesov G, Wunderlich Z, Laikova O, Gelfand M, Mirny L. How gene order is influenced by the biophysics of transcription regulation. Proc Natl Acad Sci. 2007;104:13948–13953. [PMC free article] [PubMed]
17. Martin FJ, McInerney JO. Recurring cluster and operon assembly for phenylacetate degradation genes. BMC Evol Biol. 2009;9:36. [PMC free article] [PubMed]
18. Reams AB, Neidle EL. Selection for gene clustering by tandem duplication. Annu Rev Microbiol. 2004;58:119–142. [PubMed]
19. Kovács K, Hurst LD, Papp B. Stochasticity in protein levels drives colinearity of gene order in metabolic operons of escherichia coli. PLoS Biol. 2009;7:e1000115. [PMC free article] [PubMed]
20. Zaslaver A, Mayo A, Ronen M, Alon U. Optimal gene partition into operons correlates with gene functional order. Phys Biol. 2006;3:183–189. [PubMed]
21. Bodmer WF, Parsons PA. Linkage and recombination in evolution. Adv Genet. 1962;11:5.
22. Feil EJ, Holmes EC, Bessen DE, Chan MS, Day NP, et al. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci. 2001;98:182–187. [PMC free article] [PubMed]
23. Spratt BG, Hanage WP, Feil EJ. The relative contributions of recombination and point mutation to the diversification of bacterial clones. Curr Opin Microbiol. 2001;4:602–606. [PubMed]
24. Pal C, Hurst LD. Evidence against the selfish operon theory. Trends Genet. 2004;20:232–234. [PubMed]
25. Price MN, Dehal PS, Arkin AP. Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli. Genome Biol. 2008;9:R4. [PMC free article] [PubMed]
26. Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1:RESEARCH0011. [PMC free article] [PubMed]
27. Darling AE, Miklós I, Ragan MA. Dynamics of genome rearrangement in bacterial populations. PLoS Genet. 2008;4:e1000128. [PMC free article] [PubMed]
28. Falush D, Kraft C, Taylor NS, Correa P, Fox JG, et al. Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age. Proc Natl Acad Sci. 2001;98:15056. [PMC free article] [PubMed]
29. Jolley KA, Wilson DJ, Kriz P, McVean G, Maiden MCJ. The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol Biol Evol. 2005;22:562. [PubMed]
30. Ochman H, Jones IB. Evolutionary dynamics of full genome content in Escherichia coli. EMBO J. 2000;19:6637–6643. [PMC free article] [PubMed]
31. Rocha EPC. Inference and analysis of the relative stability of bacterial chromosomes. Mol Biol Evol. 2006;23:513. [PubMed]
32. McKay MD, Beckman RJ, Conover WJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics. 2000;42:55–61.
33. Blower SM, Dowlatabadi H. Sensitivity and uncertainty analysis of complex-models of disease transmission: an HIV model, as an example. Int Statist Rev. 1994;62:229–243.
34. Moran PAP. Random processes in genetics. Math Proc Cambridge. 1958;54:60–71.
35. Moran PAP. The effect of selection in a haploid genetic population. Math Proc Cambridge. 1958;54:463–467.
36. Kimura M. The Neutral Theory of molecular evolution. Cambridge: Cambridge University Press; 1983.
37. Suyama M, Bork P. Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends Genet. 2001;17:10–13. [PubMed]
38. Belda E, Moya A, Silva FJ. Genome rearrangement distances and gene order phylogeny in gamma-proteobacteria. Mol Biol Evol. 2005;22:1456–1467. [PubMed]
39. Hill CW, Harnish BW. Inversions between ribosomal RNA genes of Escherichia coli. Proc Natl Acad Sci U S A. 1981;78:7069–7072. [PMC free article] [PubMed]
40. Rocha EPC. Inference and analysis of the relative stability of bacterial chromosomes. Mol Biol Evol. 2006;23:513–522. [PubMed]
41. Tettelin H, Riley D, Cattuto C, Medini D. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 2008;11:472–477. [PubMed]

Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...