NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Science. Author manuscript; available in PMC Oct 6, 2012.
Published in final edited form as:
PMCID: PMC3337212

Population Genomics of Early Events in the Ecological Differentiation of Bacteria


Genetic exchange is common among bacteria, but its effect on population diversity during ecological differentiation remains controversial. A fundamental question is whether advantageous mutations lead to selection of clonal genomes or, as in sexual eukaryotes, sweep through populations on their own. Here we show that in two recently diverged populations of ocean bacteria, ecological differentiation has occurred akin to a sexual mechanism: a few genome regions have swept through subpopulations in a habitat-specific manner, accompanied by gradual separation of gene pools as evidenced by increased habitat specificity of the most recent recombinations. These findings reconcile previous, seemingly contradictory empirical observations of the genetic structure of bacterial populations, and point to a more unified process of differentiation in bacteria and sexual eukaryotes than previously imagined.

How adaptive mutations spread through bacterial populations and trigger ecological differentiation has remained controversial. While it is agreed that the key factor is the balance between recombination and positive selection, theory and observations remain seemingly at odds. On the one hand, evidence for genes spreading through populations independently via recombination (‘gene-specific sweeps’) is found in observations of environment-specific genes (1) and alleles (2), and reduced diversity at single loci amidst high genomewide polymorphism (3, 4). On the other hand, mathematical modeling suggests that empirically observed rates of homologous recombination should not be high enough to unlink a gene under even moderate selection from the rest of the genome (5, 6). Importantly, this recombination/selection balance, expressed most saliently by the ecotype theory, leads to a prediction that is actually observed but that is at odds with gene-specific sweeps: bacterial diversity is organized into ecologically differentiated clusters (7–9). The proposed mechanism involves cycles of neutral diversification punctuated by genomewide selective sweeps (6). While the observations of environment-specific genes and locus-specific reduced diversity conflict with the ecotype model of selected clonal genomes, they do not explain why its prediction of coincident genetic and ecological clusters holds true, nor do they provide insights into the early genomic events accompanying adaptation. How to reconcile these different empirical observations, so seemingly at odds with each other, therefore remains an open question.

Here, we test whether recombination is strong enough relative to selection to allow gene-specific rather than genomewide selective sweeps in natural microbial populations and explore the effect on population-level diversity. Using whole-genome sequences from two recently diverged Vibrio populations with clearly delineated habitat associations, we show that genome regions rather than whole genomes sweep through populations, triggering gradual, genomewide differentiation. Our proposed evolutionary scenario is based on three lines of evidence. First, most of the genetic divergence between ecological populations is restricted to a few genomic loci with low diversity within one or both of the populations, suggesting recent sweeps of confined regions of the genome. Second, we show that only one of the two chromosomes comprising the genome has swept through part of one population. Third, the most recent recombination events tend to be population specific but older events are not, reinforcing the notion that these populations are on independent evolutionary trajectories, which may ultimately lead to the formation of genotypic clusters with different ecology. Although such clusters have been interpreted as evidence for the ecotype model, our results suggest that they can arise even in populations that do not experience genomewide selective sweeps.

In a previous study, we identified an instance of very recent ecological differentiation between two populations of Vibrio cyclitrophicus, evident from their divergence in fast-evolving protein-coding genes and their differential occurrence in the large (L) and small (S) size fractions of filtered seawater, which suggests association with different zoo- and phytoplankton or suspended organic particle types (8). This population structure was reproduced across independent samples taken in 2006 and 2009. We sequenced whole genomes from both populations (13 L and 7 S isolates, all obtained in 2006). As in other Vibrionaceae, these genomes consist of two chromosomes, each with a flexible and a core component: flexible blocks of DNA are absent from at least one isolate, whereas core blocks are shared by all isolates. To estimate the extent and patterns of recombination among the isolates, we subdivided the core genome into blocks of DNA on the basis of their supporting different phylogenetic relationships among the 20 isolates (10). Overall, the ecological populations described here are among the most closely related (identical 16S and >99% average amino acid identity) studied with genomewide sequence data, making them an ideal test case for observing the early events involved in ecological differentiation.
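The core/flexible distinction described above can be illustrated with a minimal sketch. It assumes each isolate is represented by the set of DNA block identifiers it carries; the function name, isolate names, and block IDs are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Set, Tuple


def split_core_flexible(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split DNA blocks into core (in every isolate) and flexible (missing from at least one)."""
    isolates = list(presence.values())
    all_blocks = set().union(*isolates)      # every block seen in any isolate
    core = set.intersection(*isolates)       # blocks shared by all isolates
    flexible = all_blocks - core             # blocks absent from one or more isolates
    return core, flexible


# Hypothetical toy data: isolate names and block IDs are invented for illustration.
presence = {
    "isolate_A": {"b1", "b2", "b3"},
    "isolate_B": {"b1", "b2", "b4"},
    "isolate_C": {"b1", "b2"},
}
core, flexible = split_core_flexible(presence)
```

In this toy input, blocks `b1` and `b2` are carried by all three isolates and so fall in the core set, while `b3` and `b4` fall in the flexible set.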

Genes not genomes sweep populations

Our first line of evidence favoring gene-specific rather than genomewide selective sweeps is that most of the differentiation between populations is restricted to a few small patches of the core genome. Ecological differentiation is supported by 725 ‘ecoSNPs’ – defined as dimorphic nucleotide positions with one variant present in all S strains and a different variant in all L strains – which cluster in a few discrete patches of the genome (11 in total, three of which contain >80% of ecoSNPs). In contrast, the rest of the genome is dominated by 28,744 SNPs supporting phylogenetic intermingling of S and L strains (e.g., nucleotide C in 3 S and 6 L strains, G in 4 S and 7 L strains), therefore rejecting the ecological partition (Fig. 1; figs. S1, S2). Any signal of clonal ancestry has been obscured by homologous recombination, which affects genes of all functions equally and is therefore likely not driven by selection (fig. S3; (10)), such that no single bifurcating tree relating the 20 strains adequately describes the evolution of more than 1% of the core genome (Fig. 1C). Such a pattern could have been produced either by an ancient genomewide selective sweep in one or both populations, followed by recombination between populations eroding the ‘clonal frame’ down to a few regions, or by recent gene-specific selective sweeps centered on these few regions. The latter explanation is favored because most major ecoSNP clusters (three out of the four peaks in Fig. 1B) have significantly lower within-habitat diversity (in one or both habitats) than the chromosome-wide average. The exception is the highly diverse RTX/RpoS locus, which may be under diversifying selection both within and between habitats. The low within-habitat diversity in the other three regions, which account for the majority of ecoSNPs, suggests they arrived recently by recombination (likely from a distantly related population; (10)) and swept through a population before accumulating much polymorphism.
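The ecoSNP definition given above (a dimorphic position with one variant fixed in all S strains and a different variant fixed in all L strains) can be expressed as a short classifier. This is only a sketch under stated assumptions: one alignment column is passed as a strain-to-nucleotide mapping, and the function name, labels, and strain names are hypothetical rather than the authors' code.

```python
from typing import Dict


def classify_snp(column: Dict[str, str], habitat: Dict[str, str]) -> str:
    """Classify one alignment column as 'invariant', 'ecoSNP', or 'other'."""
    alleles = set(column.values())
    if len(alleles) == 1:
        return "invariant"
    s_alleles = {column[s] for s in column if habitat[s] == "S"}
    l_alleles = {column[s] for s in column if habitat[s] == "L"}
    # Dimorphic site, one variant fixed in S and the other fixed in L.
    if len(alleles) == 2 and len(s_alleles) == 1 and len(l_alleles) == 1:
        return "ecoSNP"
    return "other"


# Hypothetical toy strains (names invented for illustration).
habitat = {"s1": "S", "s2": "S", "l1": "L", "l2": "L", "l3": "L"}
eco_column = {"s1": "A", "s2": "A", "l1": "G", "l2": "G", "l3": "G"}
mixed_column = {"s1": "C", "s2": "G", "l1": "C", "l2": "G", "l3": "G"}

eco_label = classify_snp(eco_column, habitat)      # variants split cleanly by habitat
mixed_label = classify_snp(mixed_column, habitat)  # S and L strains intermingle
```

The second column mimics the intermingled pattern described in the text (the same nucleotide found in both S and L strains), which rejects the ecological partition and is therefore labeled `other`.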

Fig. 1
Phylogeny follows ecology at just a few habitat-specific loci

Our second line of evidence shows that genomic fragments can sweep through populations in an ecology-specific manner without purging genomewide variation. In particular, a large fraction of chromosome II has swept through a subset of the S population, without impacting the diversity of chromosome I. As evidence for this, each chromosome has a distinct core phylogeny, with five of the seven S strains grouping together on chromosome II, but not chromosome I (Fig. 1). This ‘5-S’ clade (grouping together strains 1F97, 1F111, 1F273, FF274 and FF160; blue branch in Fig. 1A and blue points in Fig. 1B) is supported by 796 SNPs: 790 on chromosome II and six on chromosome I – an over 200-fold imbalance after normalizing for the 1.45-fold higher SNP density (SNPs/site) on chromosome II. Chromosome II also strongly supports one phylogeny within the 5-S strains; SNPs inconsistent with this phylogeny are restricted almost entirely to chromosome I (figs. S4, S5). The degree of support for the 5-S group on chromosome II suggests that a variant of this chromosome swept through these five S strains, independently of chromosome I. The sweep likely occurred recently, before the clear phylogenetic signal within the 5-S strains was disrupted by recombination. This signature of a long stretch of DNA (in this case, a chromosome) largely uninterrupted by recombination is a hallmark of recent positive selection in sexual eukaryotes (11), suggesting a selective sweep of chromosome II independently of the rest of the genome (chromosome I). The mobilization of genomic fragments on the size scale of chromosomes may also explain the hybrid genomes observed in novel pathogenic variants of Vibrio vulnificus (12).

Emergent habitat-specific recombination

Our third line of evidence shows how, despite the lack of genomewide selective sweeps, tight genotypic clusters may eventually emerge as a result of preferential recombination within, rather than between, habitats. This is evident from quantification of recent recombination in the core genome, using three very recently diverged pairs of ‘sister strains,’ 1F175-1F53, 1F111-1F273 and ZF30-ZF207, that group together at nearly all SNPs in the genome (Fig. 1A). The grouping of such young sister pairs should only be broken by the most recent recombination events identifiable in our sample, involving one of the sister strains as a donor or acceptor. We quantified such events by counting core genome blocks inconsistent with phylogenetic pairing of sister strains (10). Out of 93 such blocks (Fig. 2A), 76 resulted from one sister strain pairing with another strain from the same habitat. This is significantly more within-habitat recombination than expected under a model with random recombination across habitats (p < 1e-5; (10)). The excess within-habitat recombination was detectable in both S (p = 0.03) and L (p < 1e-5) populations considered separately, and is robust to variation in our assumptions about the relative S:L population sizes (10). In contrast, the pairing of more anciently diverged S strains, FF160-FF274, is more often broken up by recombination with L (222 blocks) than S strains (8 blocks)(p < 1e-5), perhaps owing to the higher abundance of L strains in the past (e.g., if the ancestral, undifferentiated population was L-associated). This suggests that the trend toward the habitat-specific gene flow we identified has emerged relatively recently.
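To give a feel for how an excess of within-habitat events can be tested, the sketch below computes an exact one-sided binomial tail for the 76 within-habitat blocks out of 93. It is not the authors' null model, which is described in their supporting material (10) and accounts for relative S:L population sizes. The null probability `null_p = 0.5` is a placeholder assumption chosen purely for illustration, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_blocks = 93   # blocks breaking a sister-strain pairing (from the text)
within = 76     # of those, blocks pairing a sister with a same-habitat strain (from the text)
null_p = 0.5    # ASSUMPTION: placeholder chance of a within-habitat event under random mixing

p_value = binomial_upper_tail(within, n_blocks, null_p)
```

Under this assumed null, observing 76 or more within-habitat events out of 93 is very unlikely. A realistic null probability would have to reflect how many candidate partner strains exist in each habitat, as the authors' model does.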

Fig. 2
Recent recombination is more common within than between habitats

The preference for within-habitat recombination is also apparent in the flexible genome. This component of the genome changes so rapidly that even the two most closely related genomes in our study (1F175-1F53), differing by only 66 substitutions in 3.54 Mb of core genome, each contain about 4,500 bp of unique DNA (fig. S6). The flexible genome tree also has a different topology from that of the core (Fig. 2), suggesting that the flexible genome is shaped largely by horizontal transfer (integrase-mediated and illegitimate recombination), with limited clonal descent. The separate grouping of S and L strains when clustered by the proportion of shared flexible DNA (Fig. 2B; 99.8% bootstrap support) indicates that recombination occurs preferentially within habitats. Compared with a model of random recombination among habitats, there is significantly more habitat-specific sharing of flexible blocks than expected by chance (p < 5.5 × 10^−58; (10); Table S1). Interestingly, all seven S strains – not just the 5-S strains hypothesized to have undergone a selective sweep on chromosome II – share significant amounts of flexible DNA on this chromosome (fig. S7). Therefore flexible genome turnover is sufficiently rapid that flexible DNA does not hitchhike with selective sweeps for very long. Rather, high turnover, with a clear bias toward within-habitat sharing of DNA, maintains distinct but dynamic and habitat-specific gene pools.
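One way to quantify the "proportion of shared flexible DNA" between two strains is a length-weighted overlap of their flexible blocks. The sketch below assumes a particular definition (shared base pairs divided by base pairs in the union of both strains' flexible blocks); the source does not specify the exact measure, and the function name, block IDs, and lengths are hypothetical.

```python
from typing import Dict, Set


def shared_fraction(a: Set[str], b: Set[str], length: Dict[str, int]) -> float:
    """Length-weighted fraction of flexible DNA shared by two strains (assumed definition)."""
    union = a | b
    if not union:
        return 0.0  # neither strain carries flexible DNA
    shared_bp = sum(length[block] for block in a & b)
    union_bp = sum(length[block] for block in union)
    return shared_bp / union_bp


# Hypothetical flexible blocks and their lengths in bp (invented for illustration).
length = {"f1": 1000, "f2": 500, "f3": 1500, "f4": 2000}
strain_s1 = {"f1", "f2"}
strain_s2 = {"f1", "f2", "f3"}
strain_l1 = {"f3", "f4"}

same_habitat = shared_fraction(strain_s1, strain_s2, length)   # two toy S strains
cross_habitat = shared_fraction(strain_s1, strain_l1, length)  # toy S strain vs toy L strain
```

A pairwise matrix of such fractions (or one minus the fraction, as a distance) could then be fed to a standard clustering method to ask whether strains group by habitat.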

Functions of ecologically differentiated genes

The revelation that there is a suite of habitat-specific genes and alleles has shed light on the selective pressures associated with specialization to different microhabitats in the ocean (Table S1, S2; (10)). The RTX locus and syp operon exhibit both allelic variation (core) and gene content variation (flexible). Several syp genes, present in all L but absent from S genomes, and their upstream regulator sypG, present in different allelic variants between habitats, are involved in biofilm formation and host colonization (13). RTX proteins are important virulence factors in pathogens (14) and may play a role in interactions with different hosts. The stress-response sigma factor RpoS, in the core genome near the RTX locus, has been shown to mediate a tradeoff between stress tolerance and nutritional specialization in environmental Escherichia coli isolates (15). Finally, MSHA biosynthesis genes, many of which are unique to L flexible genomes, promote adherence to chitin (16) and zooplankton exoskeletons (17). Together, this suggests that ecological specialization, possibly through differential host association, can be achieved by fine-tuning genes in a few key functional pathways.

A model for ecological differentiation in bacteria

Our observations can be generalized with a model predicting independent evolutionary trajectories for nascent populations triggered by gene-specific sweeps (Fig. 3). The mosaic genomes we observed, with different genome blocks supporting different phylogenies, suggest a frequently recombining, ecologically uniform ancestral population (Fig. 3B, early time points). The recent acquisition of habitat-specific flexible genes and core alleles likely initiated specialization to different hosts or habitats, leading to decreased gene flow between populations. The populations we studied are in a very early stage of ecological specialization, with little genetic divergence between them. However, if the trend towards greater within-population recombination can be extrapolated into the future (as might indeed be expected given that recombination drops loglinearly with sequence divergence (18–22)), they will eventually form distinct genetic clusters, potentially indistinguishable from those predicted by (and often taken as evidence for) the ecotype model (Fig. 3A). Genetic isolation by preferential recombination has been suggested previously (23), and this trend might be enhanced if homologous recombination between populations is reduced in the vicinity of acquired habitat-specific genes (24). Thus, a mechanism of gene-centered sweeps may eventually lead to a pattern characteristic of genomewide sweeps. In this way, our study of the very early stages of ecological specialization has provided a simple resolution to seemingly conflicting empirical observations.
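The log-linear relationship between recombination and sequence divergence mentioned above can be written as log10(r) = log10(r0) − β·d, where d is sequence divergence. The sketch below uses that form; the symbols `r0` and `beta` and the numeric value of `beta` are illustrative assumptions, not parameters reported in this study.

```python
def relative_recombination_rate(divergence: float, beta: float, r0: float = 1.0) -> float:
    """Log-linear decay: log10(rate) falls linearly with sequence divergence."""
    return r0 * 10 ** (-beta * divergence)


beta = 20.0  # ASSUMPTION: hypothetical slope, chosen only to make the example readable

# Rates at 0%, 5%, and 10% sequence divergence, relative to identical sequences.
rates = [relative_recombination_rate(d, beta) for d in (0.0, 0.05, 0.10)]

# Equal steps in divergence give an equal fold-reduction in recombination.
fold_drop_first = rates[0] / rates[1]
fold_drop_second = rates[1] / rates[2]
```

Because each equal increment of divergence reduces recombination by the same factor, even a modest initial bias toward within-population recombination can compound as populations diverge, which is the extrapolation the model relies on.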

Fig. 3
Ecological differentiation in recombining microbial populations


Our findings of ecological differentiation driven by gene-specific rather than genomewide selective sweeps, followed by gradual emergence of barriers to gene flow, leave open three major questions for future investigation: what mechanisms (aside from unrealistically high recombination rates) are responsible for preventing genomewide selective sweeps (e.g., negative frequency dependent selection by viruses and protozoa), how often and by what mechanism are entire chromosomes mobilized, and what are the barriers to gene flow between sympatric ecological populations (e.g., reduced encounter rates or some form of assortative mating)? No matter how marked the decline in gene flow between ecological populations, they will always remain open to uptake of DNA from other populations, thus remaining fundamentally different from biological species of sexual eukaryotes (2). Yet strikingly, the process of ecological differentiation we have inferred for these ocean bacteria is similar to models of sympatric speciation by habitat-specific allelic sweeps in sexual eukaryotes (25, 26). Despite differences in how adaptive alleles are acquired, our results suggest that how they spread within populations may follow a more uniform process in both prokaryotes and eukaryotes than previously imagined.


We thank E. DeLong, S. W. Chisholm, J. Wakeley, P. Sabeti, W. Hanage, D. Neafsey, and M. Coleman for valuable suggestions and comments, and X. Didelot and P. Marttinen for help with software. Funding for this work was provided by NSF grant DEB-0918333 (to E.J.A. and M.F.P.), the NSF-supported Woods Hole Center for Oceans and Human Health (COHH), and grants from the Gordon and Betty Moore Foundation and the Department of Energy Genomes to Life program (M.F.P.). Funding for genome sequencing was provided by the Moore Foundation and the Broad Institute’s SPARC program. Computational resources were provided by NSF grant 0821391. Support was provided by a Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada and a postdoctoral fellowship from the Harvard MIDAS Center for Communicable Disease Dynamics (B.J.S.); a Merck-MIT fellowship (J.F.); the Netherlands Organisation for Scientific Research (O.X.C.); and the Rosztoczy Foundation (G.S.). Whole-genome sequences have been deposited at the DNA Data Bank of Japan, European Molecular Biology Laboratory, and GenBank under accessions AHTI00000000, AICZ00000000, and AIDA00000000-AIDS00000000 (table S6).


Supplementary Materials:

Materials and Methods

Figures S1–S14

Tables S1–S6

References (27–59)

References and Notes

1. Coleman ML, Chisholm SW. Proc Natl Acad Sci USA. 2010;107:18634.
2. Papke RT, et al. Proc Natl Acad Sci USA. 2007;104:14092.
3. Guttman DS, Dykhuizen DE. Genetics. 1994;138:993.
4. Denef VJ, et al. Proc Natl Acad Sci USA. 2010;107:2383.
5. Shapiro BJ, David LA, Friedman J, Alm EJ. Trends Microbiol. 2009;17:196.
6. Cohan FM, Perry EB. Curr Biol. 2007;17:R373.
7. Koeppel A, et al. Proc Natl Acad Sci USA. 2008;105:2504.
8. Hunt DE, et al. Science. 2008;320:1081.
9. Preheim SP, Timberlake S, Polz MF. Appl Envir Microbiol. 2011;77:7195.
10. Materials and methods are available as supporting material on Science Online.
11. Sabeti PC, et al. Science. 2006;312:1614.
12. Bisharat N, et al. Emerging Infect Dis. 2005;11:30.
13. Visick KL. Mol Microbiol. 2009;74:782.
14. Satchell KJF. Annu Rev Microbiol. 2011;65:71.
15. King T, Ishihama A, Kori A, Ferenci T. Journal of Bacteriology. 2004;186:5614.
16. Meibom KL, et al. Proc Natl Acad Sci USA. 2004;101:2524.
17. Chiavelli DA, Marsh JW, Taylor RK. Appl Envir Microbiol. 2001;67:3220.
18. Majewski J. FEMS Microbiol Lett. 2001;199:161.
19. Falush D, et al. Philos Trans R Soc Lond, B, Biol Sci. 2006;361:2045.
20. Fraser C, Hanage WP, Spratt BG. Science. 2007;315:476.
21. Eppley JM, Tyson GW, Getz WM, Banfield JF. Genetics. 2007;177:407.
22. Denef VJ, Mueller RS, Banfield JF. ISME J. 2010;4:599.
23. Dykhuizen DE, Green L. Journal of Bacteriology. 1991;173:7257.
24. Lawrence JG. Theoretical Population Biology. 2002;61:449.
25. Turner T, Hahn M, Nuzhdin S. PLoS Biol. 2005;3:1572.
26. Neafsey DE, et al. Science. 2010;330:514.
27. Collins FS, et al. Science. 1987;235:1046.
28. Chaisson MJ, Brinza D, Pevzner PA. Genome Res. 2009;19:336.
29. Zerbino DR, McEwen GK, Margulies EH, Birney E. PLoS ONE. 2009;4:e8407.
30. Shi L, et al. Nat Biotechnol. 2006;24:1151.
31. Treangen TJ, Sommer DD, Angly FE, Koren S, Pop M. Current Protocols in Bioinformatics. 2011;11:1.
32. Angiuoli SV, Salzberg SL. Bioinformatics. 2011;27:334.
33. Kirkup BC, Chang L, Chang S, Gevers D, Polz MF. BMC Microbiol. 2010;10:137.
34. Didelot X, Falush D. Genetics. 2007;175:1251.
35. Minin VN, Dorman KS, Fang F, Suchard MA. Bioinformatics. 2005;21:3034.
36. Mau B, Glasner JD, Darling AE, Perna NT. Genome Biol. 2006;7:R44.
37. Guindon S, Gascuel O. Systematic Biology. 2003;52:696.
38. Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: probabilistic models of proteins and nucleic acids. Cambridge Univ. Press; Cambridge: 1998. pp. 54–66.
39. Rambaut A, Grassly NC. Computer applications in the biosciences: CABIOS. 1997;13:235.
40. Didelot X, Lawson D, Falush D. Bioinformatics. 2009;25:1442.
41. Didelot X, Lawson D, Darling A, Falush D. Genetics. 2010;186:1435.
42. Holt KE, et al. Nat Genet. 2008;40:987.
43. Overbeek R. Nucleic Acids Res. 2005;33:5691.
44. McDonald JH, Kreitman M. Nature. 1991;351:652.
45. Price MN, Dehal PS, Arkin AP. PLoS ONE. 2010;5:e9490.
46. Felsenstein J. PHYLIP: Phylogeny Inference Package version 3.5c. 1993.
47. Bryant D, Moulton V. Mol Biol Evol. 2004;21:255.
48. Marttinen P, et al. Nucleic Acids Res. 2012;40:e6. doi: 10.1093/nar/gkr928.
49. McVean G, Awadalla P, Fearnhead P. Genetics. 2002;160:1231.
50. Georgopoulos C. Genetics. 2006;174:1699.
51. Wildschutte H, Preheim SP, Hernandez Y, Polz MF. Environ Microbiol. 2010;12:2977.
52. Lee SJ, Gralla JD. The Journal of Biological Chemistry. 2002;277:47420.
53. Lin W, et al. Proc Natl Acad Sci USA. 1999;96:1071.
54. Myers LC, Terranova MP, Ferentz AE, Wagner G, Verdine GL. Science. 1993;261:1164.
55. Barrick JE, et al. Nature. 2009;461:1243.
56. de Visser JAGM. Microbiology. 2002;148:1247.
57. Hazen T, Kennedy K, Chen S, Yi S, Sobecky P. Environ Microbiol. 2009;11:1254.
58. Kalinowski ST, Hedrick PW. Heredity. 2001;87:698.
59. Edgar RC. Nucleic Acids Res. 2004;32:1792.