FEMS Microbiol Rev. 2009 Jan; 33(1): 109–132.
Published online 2008 Dec 1. doi: 10.1111/j.1574-6976.2008.00144.x
PMCID: PMC2704941

The dynamic genetic repertoire of microbial communities

Section Editor: Victor de Lorenzo


Community genomic data have revealed multiple levels of variation between and within microbial consortia. This variation includes large-scale differences in gene content between ecosystems as well as within-population sequence heterogeneity. In the present review, we focus specifically on how fine-scale variation within microbial and viral populations is apparent from community genomic data. A major unresolved question is how much of the observed variation is due to neutral vs. adaptive processes. Limited experimental data hint that some of this fine-scale variation may be in part functionally relevant, whereas sequence-based and modeling analyses suggest that much of it may be neutral. While methods for interpreting population genomic data are still in their infancy, we discuss current interpretations of existing datasets in the light of evolutionary processes and models. Finally, we highlight the importance of virus–host dynamics in generating and shaping within-population diversity.

Keywords: community genomics, CRISPR, genetic heterogeneity, metagenomics, population genomics, virus–host dynamics


Microbial ecology is undergoing a technology-driven renaissance that challenges our understanding of natural microbial communities. The application of molecular tools, from 16S rRNA gene sequencing to community genomic and postgenomic methods, has provided unprecedented insights into the genetic and physiological dynamics within complex microbial assemblages. We can now obtain an incredibly detailed view of genetic and phenotypic diversity. The novelty and depth of these data challenge previous conceptual paradigms of microbial community structure and function.

While considerable genetic variation between closely related microbial strains is apparent from isolate genome-sequencing studies (e.g. Alm et al., 1999; Parkhill et al., 2000; Tettelin et al., 2005), the extent of variation detected in natural populations with genomic techniques is far greater (e.g. Tyson et al., 2004; Venter et al., 2004; García-Martín et al., 2006; Rusch et al., 2007). Studies in multiple environments indicate that fine-scale genetic variation within populations is a hallmark of natural microbial assemblages, and that it is at least in part functionally relevant (Frias-Lopez et al., 2008; Wilmes et al., 2008a).

To illustrate the importance of structure, variation within populations, and fine-tuning by evolutionary forces, we use the analogy of a symphony orchestra (Fig. 1). We ground the analogy in the example of acid mine drainage (AMD) biofilms growing in subsurface sulfuric acid solutions (pH c. 1, c. 40 °C) within the Richmond Mine (Iron Mountain, Redding, CA). In these biofilms, different species are partitioned into distinct ecological niches (Wilmes et al., 2008b) analogous to the specific seating arrangement of different instruments. AMD biofilms are dominated by the chemoautotrophic Nitrospira phylum bacteria Leptospirillum groups II and III. Leptospirillum group II is the predominant member of the biofilm community and, hence, in the analogy, is associated with the violin section (Fig. 1). Its less abundant relative, Leptospirillum group III, is represented by the violas (Fig. 1). Leptospirillum group II can be broadly classified into two sequence types, 5-way CG and UBA (Tyson et al., 2004; Lo et al., 2007) and, hence, these are affiliated with the first and second violins, respectively (Fig. 1). Further fine-scale genetic variation within each Leptospirillum group II population is apparent from extensive population genomic data (Simmons et al., 2008) and corresponds to the unique sound characteristics of each individual violin in the orchestra. More phylogenetically distinct organisms are equivalent to more distantly related instruments (Fig. 1).

Fig. 1
The microbial orchestra analogy showing relatedness of individual community members in acid mine drainage biofilms with corresponding instrumental groups.

The relative abundances and arrangement of organisms within a community may vary considerably according to environmental conditions, as do instrument numbers and seating arrangement according to performance space and composition. Just as a conductor shapes the membership, arrangement, and sound of an instrumental orchestra, natural evolutionary processes shape microbial communities. Here we present an overview of the types of population-level variations detected by community genomic (metagenomic) studies, followed by a discussion of how these data can be used to test the role of specific evolutionary processes involved in structuring communities. In particular, we focus on the importance of host–viral interactions. With the advent of functional postgenomic methodologies, i.e. transcriptomics, proteomics, and metabolomics, we are now able to listen to the ‘tunes’ played by microbial orchestras. This type of ecosystem-level analysis was recently reviewed by Raes & Bork (2008). We argue that a focus on fine-scale variation is essential to achieve a more complete understanding of microbial community function.

The genetic repertoire of microbial communities

Recent studies based on PCR amplification and pyrosequencing of 16S rRNA gene fragments have revealed a vast phylotypic diversity in a wide range of microbial habitats [e.g. ocean (Sogin et al., 2006), soil (Roesch et al., 2007), and air (Tringe et al., 2008)]. Although these approaches provide estimates of species richness within a given community, they are unable to resolve the true genetic diversity contained within microbial populations. Genome plasticity causes extensive variations in gene content between closely related strains of the same species (Medini et al., 2005). Based on DNA reassociation kinetics of pooled genomic DNA, Gans et al. (2005) estimated that 1 g of pristine soil may contain 10^6 distinct genotypes. This number far exceeds phylotypic diversity estimates for soil [e.g. 52 000 phylotypes (Roesch et al., 2007)]. Consequently, due to the dynamic nature of microbial genomes, phylotypic diversity may not correlate well with genotypic and phenotypic diversity, and, hence, genotypic richness within a given sample cannot be inferred from rRNA surveys.

Community genomics (metagenomics) based on random shotgun sequencing of microbial community DNA goes far beyond marker gene surveys to provide an in-depth look at the genotypic richness within populations. The concept of sequencing genomic DNA directly from the environment was first suggested by Norman Pace (Pace et al., 1985) and first implemented in the 1990s (Schmidt et al., 1991; Stein et al., 1996; Schleper et al., 1998; Vergin et al., 1998). It initially involved sequencing large inserts of DNA derived from microbial communities. Functional genes of interest were linked to community members through phylogenetically informative marker genes. A notable discovery from this approach was the presence of bacterial rhodopsin in the surface ocean (Béjà et al., 2000). The high-throughput sequencing of environmental DNA was pioneered on viral communities (Breitbart et al., 2002), echoing the first complete isolate genome ever sequenced, bacteriophage ΦX174 (Sanger et al., 1977). Large-scale sequencing of bacterial and archaeal communities followed shortly thereafter (e.g. Schmeisser et al., 2003; Tyson et al., 2004; Venter et al., 2004). As anticipated only 3 years ago (Allen & Banfield, 2005), microbial communities are currently being sequenced en masse. At the time of writing (June 2008) >30 metagenomic studies had been published (Table 1).

Table 1
Overview of random microbial community sequencing studies in chronological order

Apart from random shotgun sequencing of microbial communities, more targeted approaches involve high-throughput sequencing of individual genomes. Single-cell genomics is based on multiple-displacement whole genome amplification (recently reviewed by Lasken, 2007; Binga et al., 2008; Ishoey et al., 2008). This method has resulted in up to 75% of the expected genome coverage as compared with the standard Sanger sequencing of isolates. The most successful application of single-cell sequencing to date relied on microfluidic cell separation and resulted in the sequencing of at least 1500 genes from a representative of the previously uncharacterized TM7 lineage (Marcy et al., 2007). Other alternatives for dissection of complex samples into simpler components include flow cytometry and cell sorting based on FISH (Raghunathan et al., 2005; Kalyuzhnaya et al., 2006), micromanipulation to isolate a cohesive population, for example Beggiatoa filaments (Mussmann et al., 2007), or microbial ‘bait’ to pull out syntrophic assemblages (Pernthaler et al., 2008). Although these approaches provide detailed information about the selected microorganisms, the manipulation removes the environmental setting of the organism and it will omit potentially important co-occurring microorganisms from the analysis.

The wealth of community genomic information allows microbial ecologists to explore the enormous genetic diversity contained within different microbial habitats. However, unless a cell selection method is used, the extent of genomic coverage of community constituents mainly depends on the microbial diversity contained within an analyzed sample. Currently, metagenomic investigations may be broadly classified according to two types: (1) gene-centric investigations where extensive genomic assemblies are unobtainable due to extensive microbial diversity within the sample (e.g. Tringe et al., 2005) and/or due to the sequencing method used (e.g. Edwards et al., 2006) and (2) genome-centric studies where extensive de novo assembly is obtainable due to limited species richness (e.g. Tyson et al., 2004), the application of complexity reduction methods (e.g. Pernthaler et al., 2008), or where previously sequenced isolate genomes allow recruitment of genomic fragments (e.g. Coleman et al., 2006).

Gene-centric metagenomics

Gene-centric approaches using automated gene calling and annotation of genomic fragments followed by the assignment of detected genes to functional categories facilitate the structural and functional comparison of distinct environmental samples. Tringe et al. (2005) demonstrated that gene complements vary distinctly between different ecosystems and reflect known characteristics of the terrestrial and marine environments that were sampled, such as photosynthesis in the Sargasso Sea and starch and sucrose catabolism in soil. Environmental gene censuses provide a coarse overview of the genetic potential within a given ecosystem and, by juxtaposition of distinct datasets, can reveal interesting taxonomic and functional aspects of particular habitats.
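The juxtaposition of gene censuses described above can be sketched as a comparison of relative category abundances between two samples. The snippet below is a minimal illustration only: the category names and counts are invented, the function names are hypothetical, and the log2 ratio with a pseudocount is an assumed, generic choice rather than the statistic used by Tringe et al. (2005).

```python
import math

def relative_abundance(counts):
    """Convert raw gene counts per functional category into fractions."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def log2_enrichment(sample_a, sample_b, pseudocount=1):
    """log2 ratio of relative abundances (sample_a over sample_b).

    A pseudocount is added to every category so that categories absent
    from one sample do not cause division by zero.
    """
    categories = set(sample_a) | set(sample_b)
    a = relative_abundance({c: sample_a.get(c, 0) + pseudocount for c in categories})
    b = relative_abundance({c: sample_b.get(c, 0) + pseudocount for c in categories})
    return {c: math.log2(a[c] / b[c]) for c in categories}

# Invented counts, for illustration only.
ocean = {"photosynthesis": 30, "starch_catabolism": 10}
soil = {"photosynthesis": 10, "starch_catabolism": 30}
scores = log2_enrichment(ocean, soil, pseudocount=0)
```

Positive scores mark categories over-represented in the first sample. Because the score is computed on fractions, it is unaffected by differences in sequencing depth between the two datasets.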

The gene-centric approach has been applied to a range of different microbial ecosystems. For example, Kurokawa et al. (2007) found that the structural and functional composition of infant gut microbiomes varies extensively between individuals and is functionally less redundant compared with adults and children. Overall, the individual gut metagenomes exhibited prominent enrichment in genes indicative of distinct nutrient acquisition strategies related to the hosts' diets. Differences in community composition and functional gene complements are also observed on a large scale, such as in microbial communities inhabiting the water column overlying four coral atolls along a c. 750-km-long ocean transect (Dinsdale et al., 2008b). Moving along the transect from a pristine atoll to increasingly human-impacted reefs, Dinsdale et al. (2008b) observed a marked shift in community composition and functional gene complement from autotrophy to heterotrophy that may be directly related to anthropogenic effects. DeLong et al. (2006) sequenced large-insert libraries derived from microbial communities sampled at different depths in the North Pacific Gyre, and noted the enrichment of particular gene categories in distinct environments, which they hypothesized to reflect distinct adaptive strategies. For example, genes involved in chemotaxis were enriched in the photic zone, suggestive of a free-swimming lifestyle, while deep-water samples were enriched in genes involved in particle attachment and biofilm formation. The broadest overview of differing genetic potentials within microbial communities was recently described across 45 different microbial habitats (Dinsdale et al., 2008a). The study focused on the microbial and corresponding viral constituents of samples from multiple environments ranging from solar salterns to mosquito guts (Dinsdale et al., 2008a). Although most of the functional diversity was redundant, the relative abundances of genes linked to particular metabolisms varied, and as previously highlighted by Tringe et al. (2005), the differences in functional gene content reflected the environments from which the samples had been taken.

The relatively new field of experimental metagenomics has so far used a gene-centric approach, but explicitly addresses differences between manipulated systems. Two of the most notable of these types of studies involved comparisons of the gut microbiota of obese and lean mice (Turnbaugh et al., 2006) and the identification of large niche breadth associated with the use of a range of different carbon compounds in the coastal ocean (Mou et al., 2008).

Gene-centric analyses are constrained due to the large fraction of genes of unknown function and the inability to place individual genes into genomic context. The sequencing method used can also significantly bias gene identification, as short reads generated with 454 pyrosequencing are less likely to match distant homologs with blast than reads generated with Sanger sequencing (Wommack et al., 2008). Hence, a subset of fine-scale genetic differences that may be ecologically significant is not considered. In the present review, we focus mainly on genome-centric community genomics because these approaches allow us to infer the effects of fine-scale evolution (recombination, mutation) on community-level ecology and, hence, facilitate a distinctly different view of community composition and function. We refer the reader to the recent review of Raes & Bork (2008) for a more involved discussion of the integration of gene-centric methods with other systems-level data.

Genome-centric metagenomics

Genome-centric approaches based on extensive genomic reconstruction of community constituents have been applied to microbial ecosystems containing low species richness (e.g. Tyson et al., 2004; Woyke et al., 2006; Robidart et al., 2008) and/or dominant organism types (e.g. García-Martín et al., 2006; Strous et al., 2006). Gene annotation of genomic fragments assigned to specific organisms facilitates comprehensive metabolic reconstructions of community members (García-Martín et al., 2006; Strous et al., 2006; Robidart et al., 2008) and, hence, provides insight into possible metabolic partitioning among community members (Tyson et al., 2004; Woyke et al., 2006; Warnecke et al., 2007). Detailed metabolic reconstructions may reveal new aspects of the metabolisms of certain community constituents and highlight previously unknown characteristics of a particular metabolic process. Tyson et al. (2004) identified nitrogen fixation genes on a genomic scaffold assigned to Leptospirillum group III, and this organism was subsequently obtained in pure culture using nitrogen fixation as an isolation strategy (Tyson et al., 2005). Strous et al. (2006) identified candidate genes involved in ladderane biosynthesis and hydrazine metabolism in the composite genome of the dominant organism ‘Candidatus Kuenenia stuttgartiensis’, an uncultured Planctomycete that carries out anaerobic ammonium oxidation (anammox). These previously unknown genes are important components of the anammox process.

Apart from enabling comprehensive metabolic reconstructions of community members, genome-centric metagenomics allows the fine-scale resolution of genetic heterogeneity within distinct populations. Community genomic studies that achieve extensive de novo genomic assemblies of community constituents reveal that the extent of within-population variation differs widely within ecosystems (Table 2). For example, the frequencies of single nucleotide polymorphisms (SNPs) in populations in the AMD system vary from around 0.08% (Leptospirillum group II) to 2.2% (Ferroplasma acidarmanus; Tyson et al., 2004). The SNP frequencies in four endosymbionts of the marine oligochaete Olavius algarvensis range from 0.01% (δ4) to 0.1% (γ1) (Woyke et al., 2006; Table 2).
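To make the notion of an SNP frequency concrete, the sketch below counts polymorphic columns in a set of reads that have already been aligned to common coordinates. It is an illustrative simplification, not the procedure of any study cited here: the function name, the gap symbol '-', and the `min_minor_count` threshold (a crude guard against sequencing errors) are all assumptions.

```python
from collections import Counter

def snp_frequency(aligned_reads, min_minor_count=2):
    """Fraction of covered alignment columns that are polymorphic.

    aligned_reads: equal-length strings on shared coordinates; '-' marks
    positions that a read does not cover.
    A column counts as an SNP when the second most common base is seen
    at least min_minor_count times.
    """
    if not aligned_reads:
        return 0.0
    covered = 0
    polymorphic = 0
    for column in range(len(aligned_reads[0])):
        counts = Counter(r[column] for r in aligned_reads if r[column] != "-")
        if not counts:
            continue  # no read covers this position
        covered += 1
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            polymorphic += 1
    return polymorphic / covered if covered else 0.0

# Toy alignment: only the last column has a minor base seen twice.
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGA", "ACGAACGT"]
frequency = snp_frequency(reads)  # 1 polymorphic column out of 8 -> 0.125
```

Lowering `min_minor_count` to 1 also counts singleton differences, which in real data would mix true rare variants with sequencing errors. This is one reason reported SNP frequencies depend on coverage and on the calling criteria.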

Table 2
Single nucleotide polymorphism (SNP) densities

Genetic heterogeneity within microbial populations

The genetic heterogeneity of microbial populations was first apparent from the comparison of multiple genome sequences from organisms considered to be strains of the same species. At first, the observation of 25% unique gene content between Escherichia coli K12 and O157:H7 despite c. 98% average nucleotide identity (ANI) between their orthologs seemed remarkable (Hayashi et al., 2001). These findings were confirmed by the comparison of 20 available strains of E. coli and Shigella sharing 98–99% ANI, which predicted that every newly sequenced genome will add c. 300 new genes to the E. coli ‘pan-genome’ (Konstantinidis et al., 2006). The pan-genome size seems to depend on the ecology of the organism. Phenotypically and ecologically more coherent species, such as obligate pathogens, tend to have smaller pan-genomes [<50 genes added for every new strain of Streptococcus agalactiae (Tettelin et al., 2005)] than organisms residing in more dynamic environments (Rocap et al., 2003; Thompson et al., 2005). Population-level heterogeneity even exists within supposedly clonal populations used for sequencing, mostly due to rapid processes including the spread of insertion sequence elements and phase inversions (Cerdeno-Tarraga et al., 2005; Chain et al., 2006). Overall, the findings indicate the dynamic nature of population-level genome content and structure.
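The idea that each newly sequenced strain adds genes to the pan-genome can be illustrated with a simple accumulation curve. The sketch below assumes that genes have already been clustered into ortholog groups and that each genome is given as a set of group identifiers. The function name and the toy identifiers are hypothetical, and a published analysis would typically average over many genome orderings.

```python
def pan_genome_curve(genomes):
    """Track pan- and core-genome sizes as genomes are added in order.

    genomes: iterable of sets of ortholog-group identifiers.
    Returns one (new_genes, pan_size, core_size) tuple per genome.
    """
    pan = set()
    core = None
    curve = []
    for genes in genomes:
        genes = set(genes)
        new_genes = len(genes - pan)      # groups not seen in any earlier genome
        pan |= genes                      # union: everything seen so far
        core = genes if core is None else core & genes  # intersection
        curve.append((new_genes, len(pan), len(core)))
    return curve

# Toy example with three tiny "genomes".
strains = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e", "f"}]
curve = pan_genome_curve(strains)
# curve == [(3, 3, 3), (1, 4, 2), (2, 6, 1)]
```

If the first element of each tuple levels off at a nonzero value as strains are added, the pan-genome is still growing with every new genome. That is the pattern described above for E. coli.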

The extent of heterogeneity within bacterial and archaeal populations calls into question whether our current species definition corresponds with distinct evolutionary units or natural groups (Doolittle & Papke, 2006; Bapteste & Boucher, 2008). Higher-level taxonomic groups based on phylogenetic markers are demonstrably coherent despite extensive strain-to-strain variation (Ochman et al., 2005), possibly because differences in gene content are localized on genomic islands (Chain et al., 2006; Coleman et al., 2006; Kettler et al., 2007; Mathee et al., 2008). These islands, which might be neutral or transient, mainly encode hypothetical proteins (Konstantinidis et al., 2006). Nevertheless, a subset may confer adaptive traits. We will revisit the question of the fitness effects of gene content variation later in this review.

Because reliance on isolate genomes alone limits the scope of observable genomic heterogeneity, recent efforts have focused on using random shotgun sequencing of microbial communities followed by assembly and documentation of various types of within-population variability. Genome-centric approaches fall into two classes: (1) de novo sequence assembly and (2) recruitment of environmental genome fragments to isolate genomes followed by some degree of assembly.

De novo assembly

De novo genome assembly from shotgun sequencing data was used to obtain comprehensive and deeply sampled genomic datasets (up to 25× coverage) for multiple organisms from AMD biofilms (Tyson et al., 2004; Allen et al., 2007; Lo et al., 2007; Simmons et al., 2008), which allowed for a direct analysis of in situ population heterogeneity. The level of within-population variability ranges from near-clonal (Lo et al., 2007) to freely recombining (Eppley et al., 2007b; Table 2). The two deepest coverage assemblies were obtained for two Leptospirillum group II populations sampled at the UBA and 5-way locations within the Richmond Mine (Lo et al., 2007; Simmons et al., 2008; Fig. 2). These two populations are c. 95% identical at the amino acid level, although they have been shown to recombine (Lo et al., 2007; Denef et al., 2008). In addition, there is recombination within the Leptospirillum group II 5-way CG population between distinct substrains <0.5% divergent (Simmons et al., 2008; Fig. 2). Strikingly, based on the extensive gene content variation within the Leptospirillum group II population, it could be inferred that the number of unique genotypes was only one order of magnitude less than the number of cells in the population (Simmons et al., 2008).

Fig. 2
Examples of genome-wide fine-scale analysis of sequence variation in Leptospirillum group II 5-way CG. (a) Part of the Leptospirillum group II 5-way CG genome assembled from population genomic data. The first inner ring shows a moving average of SNP density. ...

High levels of recombination were detected in two distinct populations (types I and II) related to F. acidarmanus. The Ferroplasma type II population within one sample consisted of individuals with mosaic genomes formed by recombination between distinct genome types (Tyson et al., 2004). A comparison of a Ferroplasma type I isolate with its corresponding populations revealed that much of the observed heterogeneity was due to transposase movement and phage insertions and deletions (Allen et al., 2007). The majority of Ferroplasma type I genes were under strong stabilizing selection as only six loci out of 1963 exhibited nonsynonymous vs. synonymous SNP ratios indicative of positive selection. Recombination was more frequent within both Ferroplasma type I and type II populations than between them, consistent with a log-linear decline in recombination frequency with sequence divergence (Eppley et al., 2007b). In summary, the AMD system studies have confirmed the isolate sequence-based hypothesis of population-level heterogeneity in gene content and the movement of mobile elements within natural populations. Additionally, these studies uncovered the prevalence of recombination within and between natural populations.
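A log-linear decline means that the logarithm of recombination frequency falls linearly with sequence divergence, i.e. log10(frequency) = intercept + slope × divergence with a negative slope. The sketch below fits such a line by ordinary least squares. It is a generic illustration: the function name is hypothetical, the example values are synthetic, and it does not reproduce the analysis of Eppley et al. (2007b).

```python
import math

def fit_log_linear(divergence, recombination_freq):
    """Least-squares fit of log10(freq) = intercept + slope * divergence.

    divergence: fractional sequence divergences (e.g. 0.05 for 5%).
    recombination_freq: matching positive frequencies.
    Returns (slope, intercept).
    """
    ys = [math.log10(f) for f in recombination_freq]
    n = len(divergence)
    mean_x = sum(divergence) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in divergence)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(divergence, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data generated from log10(freq) = -1 - 20 * divergence.
divergences = [0.0, 0.05, 0.10, 0.15]
frequencies = [10 ** (-1 - 20 * d) for d in divergences]
slope, intercept = fit_log_linear(divergences, frequencies)
```

With these synthetic inputs the fit recovers a slope near -20 and an intercept near -1. That corresponds to a tenfold drop in recombination frequency for every 5% of additional divergence.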

Community genomic approaches have also resulted in deep sequence coverage of the dominant population in two types of activated sludge enrichment cultures (García-Martín et al., 2006; Strous et al., 2006). Little fine-scale variation was apparent in the dominant population of the anammox bacterium ‘Candidatus Kuenenia stuttgartiensis’ (Strous et al., 2006). The sludge community was dominated by a single clonal type and this may be due to the long-term selection implemented by enrichment culturing. In contrast, a study using sludge from enhanced biological phosphorus removal (EBPR) reactors in the United States and Australia did retrieve population heterogeneity. These communities were dominated by similar genotypes (>95% identical at the nucleotide level) of ‘Candidatus Accumulibacter phosphatis’ (Accumulibacter phosphatis; García-Martín et al., 2006), but substantial strain diversity (up to 15% divergent at the nucleotide level) was present within both A. phosphatis populations. This heterogeneity was corroborated by extensive fine-scale variation among A. phosphatis rRNA internally transcribed spacer regions (He et al., 2006) and the polyphosphate kinase 1 gene (He et al., 2007; Wilmes et al., 2008a).

Genomic fragment recruitment

For metagenomic datasets generated from diverse environments where de novo assembly is difficult or impossible, the in silico recruitment of closely related genomic fragments and comparison with sequenced isolate genomes is an effective approach to study within-population variation. The Global Ocean Survey (GOS) sequencing data, which comprised 6.3 Gbp generated from diverse marine microbial communities along an 8000 km ocean transect, required the extensive use of this method (Rusch et al., 2007). Fragment recruitment, as first described by Coleman et al. (2006), was performed for those genera for which isolate sequences were available [Pelagibacter (Giovannoni et al., 2005), Prochlorococcus (Rocap et al., 2003), and Synechococcus (Palenik et al., 2006)]. In addition, newly assembled composite genomic fragments from the GOS data provided additional reference sequence. These analyses revealed tremendous sequence variation consisting of SNPs, gene and genomic island insertions, deletions and rearrangements, and geographic clines in sequence patterns. These results are consistent with the extensive allelic diversity and genome size variation previously observed in marine microbial populations (Thompson et al., 2005).
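In outline, fragment recruitment assigns each environmental read to the reference genome it matches best, provided the match passes identity and length filters. The sketch below assumes the alignments have already been computed by an external tool and are supplied as tuples. The function name, the tuple layout, and the thresholds (80% identity, 100 aligned bases) are illustrative assumptions, not the parameters used by Rusch et al. (2007).

```python
def recruit(hits, min_identity=0.80, min_aligned=100):
    """Assign each read to its best-matching reference genome.

    hits: iterable of (read_id, reference, identity, aligned_length),
          with identity expressed as a fraction between 0 and 1.
    Returns {reference: [(read_id, identity), ...]}.
    """
    best = {}
    for read_id, reference, identity, aligned_length in hits:
        if identity < min_identity or aligned_length < min_aligned:
            continue  # too divergent or too short to recruit
        if read_id not in best or identity > best[read_id][1]:
            best[read_id] = (reference, identity)
    recruited = {}
    for read_id, (reference, identity) in best.items():
        recruited.setdefault(reference, []).append((read_id, identity))
    return recruited

# Invented hits, for illustration only.
example_hits = [
    ("r1", "Pelagibacter", 0.92, 300),
    ("r1", "Prochlorococcus", 0.85, 300),  # weaker hit for the same read
    ("r2", "Prochlorococcus", 0.70, 400),  # below the identity threshold
    ("r3", "Prochlorococcus", 0.88, 50),   # alignment too short
    ("r4", "Prochlorococcus", 0.81, 250),
]
recruited = recruit(example_hits)
```

Plotting the recruited identities against reference coordinates gives the kind of view in which SNPs appear as a spread in identity and missing genomic islands appear as gaps in coverage.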

Mate-pair analysis of the GOS dataset suggested that gene synteny was highly conserved. A more quantitative analysis involving gene-based matching of metagenomic fragments to a Pelagibacter isolate genome found that gene synteny was highly conserved between populations, despite large geographic separation and an average 30% amino acid divergence (Wilhelm et al., 2007). It was suggested that gene order conservation is due to low functional diversity in the SAR11 population, with the caveat that large-scale genome rearrangements are less likely to be identified by the applied method. However, a SAR11 fosmid clone from the English Channel exhibited multiple differences in a hypervariable region compared with the previously available SAR11 sequences (Gilbert et al., 2008). Rusch et al. (2007) also argue that the fine-scale genetic variation among closely related organisms may reflect functional differentiation between subtypes.
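One simple way to quantify gene order conservation is to ask what fraction of neighbouring gene pairs on an environmental fragment are also neighbours in the reference genome. The sketch below does this for lists of ortholog identifiers. It is a generic measure with hypothetical names, not the method of Wilhelm et al. (2007), and it treats the reference as linear and ignores strand.

```python
def synteny_fraction(reference_order, fragment_order):
    """Fraction of adjacent gene pairs in a fragment that are also
    adjacent (in either orientation) in the reference gene order."""
    adjacent_in_reference = {
        frozenset(pair) for pair in zip(reference_order, reference_order[1:])
    }
    fragment_pairs = list(zip(fragment_order, fragment_order[1:]))
    if not fragment_pairs:
        return 0.0
    conserved = sum(
        1 for pair in fragment_pairs if frozenset(pair) in adjacent_in_reference
    )
    return conserved / len(fragment_pairs)

reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
inverted = synteny_fraction(reference, ["g3", "g2", "g1"])       # 1.0
rearranged = synteny_fraction(reference, ["g1", "g2", "g5", "g6"])  # 2 of 3 pairs
```

As with the caveat noted above for the published analysis, a measure based on short fragments detects local disruptions of gene order but cannot see rearrangements whose breakpoints fall outside the sampled fragments.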

The GOS (Rusch et al., 2007) and Sargasso Sea (Venter et al., 2004) datasets have also been used in additional recruitment studies using reference sequences from other sources, such as the picoeukaryote Ostreococcus tauri (Piganeau & Moreau, 2007) and Cenarchaeum symbiosum (Hallam et al., 2006). Using the genome of O. tauri, Piganeau & Moreau (2007) recruited genomic fragments amounting to 23% of the complete nuclear genome (14% of protein-coding genes), identified two new Ostreococcus strains from the recruited fragments and found that introns have a high proportion of conserved sites (70%). The C. symbiosum reference sequence was assembled from a limited number of fosmid clones from a sponge sample highly enriched for the target organism (Hallam et al., 2006). Fosmids binned into two subpopulations and were c. 15% divergent at the nucleotide level between populations and c. 2% divergent within each population. Again, gene order seemed to be conserved between the two subpopulations, as well as between the sponge symbionts and free-living relatives in the Sargasso Sea (based on fragment recruitment). The authors suggested that clonal diversification was the dominant evolutionary process in C. symbiosum. Population-level heterogeneity was clearly present, although the lack of sequencing depth weakens conclusions about gene content homogeneity within the symbiont populations. Genomic regions that were not present in the planktonic population were suggested to be essential for the symbiotic interactions of C. symbiosum and its sponge host.

A tandem isolate and metagenomic sequencing approach was used by Bhaya et al. (2007) on microbial mat communities of Yellowstone hot springs. Two cyanobacterial isolates (Synechococcus OS-A and OS-B') that dominate the microbial mats at different temperatures were sequenced. Both Synechococcus population representatives shared a large proportion of their gene content at high identity but exhibited extensive genome rearrangements. Differences in phosphate and nitrogen pathways indicated that both populations are distinct in their nutrient utilization. The two isolate genomes served as ‘anchor’ genomes to recruit closely related metagenomic sequences. These exhibited a high degree of variability and demonstrated that the sequenced isolates are not representative of all Synechococcus populations at the two sites. Interestingly, the low-temperature populations exhibited greater sequence diversity compared with the high-temperature populations. Furthermore, Bhaya and colleagues found evidence for functionally specialized populations and, hence, suggest that these ‘ecotypes’ occupy distinct niches within the microbial mats.

A recruitment-based comparative metagenomic approach was also applied to the halophilic square archaeon Haloquadratum walsbyi. This organism, which dominates mature saturated brine communities, has only recently been isolated and sequenced (Bolhuis et al., 2006). End-sequence analysis of a metagenomic fosmid library revealed a remarkable diversity of genes and evidence for genomic islands (Legault et al., 2006; Cuadros-Orellana et al., 2007), leading to the suggestion that the pan-genome of H. walsbyi may be at least double the size of the sequenced isolate. Some genomic islands displayed features of virus-mediated genetic exchange. Importantly, the vast majority of dissimilar gene content was related to small-molecule transport and detection, representing possible adaptations to different pools of organic nutrients (Cuadros-Orellana et al., 2007).

In summary, most observational studies, either based on comparative genomic analysis of isolates or metagenomic datasets, consistently reveal within-population gene content and sequence diversification. These findings substantiate previous work using phylogenetic marker genes and genome fingerprinting of Vibrio isolates that showed extremely high diversity between closely related strains (Acinas et al., 2004; Thompson et al., 2005). The emerging picture is of populations as clouds of genetic material separated from other related populations by levels of genetic exchange that decline with increasing sequence divergence (Eppley et al., 2007b; Rusch et al., 2007; Simmons et al., 2008; G. J. Dick et al., unpublished data). The level of genetic exchange and sequence divergence varies from little (near-clonal) to high (free recombination), as measured both by population genomics and more traditional multilocus sequence typing (MLST) of isolates (reviewed by Pérez-Losada et al., 2006). The set of variable genes and genome rearrangements may be so large in some populations that no two individuals have exactly the same genotype (Thompson et al., 2005; Rusch et al., 2007; Chantratita et al., 2008; Simmons et al., 2008). Interpretation of extensive genetic heterogeneity is generally unresolved. Possible explanations for high levels of variation include diversification on the generation timescale in response to viral predation (Andersson & Banfield, 2008; Tyson & Banfield, 2008), neutral diversification (Acinas et al., 2004) or resource partitioning between closely related strains [<1% 16S rRNA gene divergence (Hunt et al., 2008)]. We discuss below some methods that can be applied to the question of whether within-population variation is largely neutral or has adaptive significance.

Evolutionary interpretation of population heterogeneity

Interpreting the adaptive significance of sequence variation within and between populations represents a considerable challenge, which is only beginning to be addressed with the advent of community genomic data. Much of the observed variation may be neutral, and persist in microbial populations due to potentially quite large, but presently unknown, effective population sizes (Mes, 2008). Basic population genetic theory predicts that neutral variation will persist in a population for a number of generations of the same order of magnitude as the effective population size, Ne, if genetic drift is the only force acting on it (Gillespie, 2004; Mes, 2008). Ne determines the rate at which variation is lost from a population, and is highly sensitive to bottlenecks (such as periodic selection events). Given the enormous census sizes of microbial populations, however, Ne could still be large enough to ensure an extremely long fixation time for neutral variation. In fact, the Ne for E. coli is estimated to be 10^8–10^9 based on polymorphism at the third codon position (Hartl et al., 1994). One theoretical model suggests that mutation occurring in neutral gene variants is sufficient to block their fixation in large populations, leading to a large flux of transient novel sequences (Berg & Kurland, 2002). This is consistent with empirical observations of high genotype diversity derived from comparisons of isolates (Thompson et al., 2005) and population genomic assemblies (Allen et al., 2007; Simmons et al., 2008).
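The statement that neutral variation persists for roughly Ne generations can be checked with the standard drift-only expectation for an idealized haploid Wright-Fisher population, in which expected genetic diversity shrinks by a factor (1 - 1/Ne) each generation. The sketch below assumes this idealized model, with no mutation, selection, or recombination, and the function names are hypothetical. It is meant only to show the order of magnitude involved.

```python
import math

def diversity_after(h0, ne, generations):
    """Expected diversity after drift alone in a haploid Wright-Fisher
    population of effective size ne: H_t = H_0 * (1 - 1/ne) ** t."""
    return h0 * (1.0 - 1.0 / ne) ** generations

def half_life_generations(ne):
    """Generations needed for drift alone to halve the expected diversity."""
    return math.log(0.5) / math.log1p(-1.0 / ne)

# For large ne the half-life approaches ne * ln(2), i.e. about 0.69 * ne.
half_life = half_life_generations(1e8)
```

For an Ne of 10^8, the lower end of the E. coli estimate quoted above, drift alone takes roughly 7 × 10^7 generations to halve neutral diversity under this model. This illustrates why neutral variants can linger for very long periods in large populations.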

Expression and bioinformatic studies have provided indirect insight into the differential fitness of genotypic variants. Hypervariable regions, often called ‘gene islands,’ contain a significantly higher proportion of novel genes compared with the rest of the genome (Hsiao et al., 2005). While in general, a lower fraction of genes in islands are expressed as compared with genes in the core genome, some can be among the most abundant transcripts or proteins in environmental samples (Ram et al., 2005; Frias-Lopez et al., 2008; V.J. Denef et al., unpublished data; D.S.A. Goltsman et al., unpublished data; Fig. 3). The size of the expressed fraction seems to vary depending on the organism studied, with the caveat that there are very few studies of this type available.

Fig. 3
Experimental evidence of the role of the ‘flexible’ genome content. (a) Environmental transcriptomic data from Prochlorococcus MIT3901 from a Sargasso Sea sample (Frias-Lopez et al., 2008). The cDNA levels, normalized using the levels ...

Analysis of environmental transcripts extracted from a marine sample showed that a majority of the flexible gene content of Prochlorococcus genomes was both present and expressed at similar levels to core genes (Frias-Lopez et al., 2008; Fig. 3a). Laboratory experiments with isolated strains of Prochlorococcus also support the importance of hypervariable regions in environmental adaptation. In one strain, 26% of all genes in highly variable regions of the genome were differentially expressed under changed nutrient or light conditions in culture (Coleman et al., 2006). Bioinformatic analysis also supports the potential adaptive value of genomic islands in other species. For example, several of the genomic islands differentiating the soil bacterium Burkholderia xenovorans LB400 from other strains of its species contain the genes enabling it to degrade chlorinated aromatics (Chain et al., 2006). Many additional examples regarding the importance of genomic islands in environmental adaptation have been summarized elsewhere (Dobrindt et al., 2004).

Proteomics holds particular promise for the elucidation of discrete functional differences between closely related organisms and placing these into evolutionary and environmental context. Distinct protein profiles for strains of the same species are easily obtained by single-dimensional (Vauterin et al., 1991) and two-dimensional (Dopson et al., 2004) polyacrylamide gel electrophoresis. Using protein profiles from four Ferroplasma isolated strains that are >98.9% similar at the 16S rRNA gene level but exhibit phenotypic differences in culture, Dopson et al. (2004) constructed a phylogenetic tree that was congruent with a tree based on DNA–DNA similarities and, thus, demonstrated the ability of using proteomics for phylogenetic characterization of discrete populations. Morris et al. (2007) were able to deduce the contribution of distinct strains of Dehalococcoides spp. to anaerobic dehalogenation within an uncharacterized mixed culture by determining the relative abundances of strain-specific peptides obtained from reductive dehalogenases. With the advent of shotgun proteomics based on liquid chromatography coupled with high-resolution tandem mass spectrometry and its application to microbial communities, individual peptides that originate from discrete populations within a mixed microbial community are identified (Lo et al., 2007; Wilmes et al., 2008a). By assigning peptides to different populations, Lo et al. (2007) were able to infer the genome architecture of a single Leptospirillum group II population within a genomically uncharacterized sample and demonstrated that its genome is a hybrid formed by recombination of the UBA and 5-way CG genome types. In the AMD biofilm system, Leptospirillum group II genome types are tractable because distinct biofilm samples are limited in their genotypic diversity (Denef et al., 2008). 
Although the picture becomes complex when several strains of the same species co-occur, strain-specific contributions to the overall protein pool can still be resolved.

Strain-resolved proteomics has been used to differentiate the expression of co-occurring protein variants within a single sample of activated sludge cultivated for EBPR in the United Kingdom and dominated by A. phosphatis (Wilmes et al., 2008a; Fig. 3b). The study revealed that 59% of identified proteins were derived from the flanking A. phosphatis populations and not from the dominant A. phosphatis strain in the sequenced sludges. A significant subset of these was involved in core metabolism and EBPR-specific pathways. These results suggest an essential role for genetic diversity in maintaining the stable performance of microbial community-based biotechnological systems.

Somewhat different dynamics are apparent in AMD biofilm communities, where both proteomic (V.J. Denef et al., unpublished data) and genomic studies (Allen et al., 2007; Simmons et al., 2008) so far do not support large fitness effects for regions of variable gene content. The two Leptospirillum group II sequence types dominating the Richmond Mine AMD system differ by only 0.3% at the 16S rRNA gene level, and 20% of each organism's genome is unique relative to the other (Lo et al., 2007). An extensive analysis of 27 environmental proteomes derived from biofilm samples taken from a variety of environmental conditions has shown that while c. 70% of the proteins encoded by genes shared between organisms were identified, c. 75% of the unique gene complement was never identified, and only 1% of unique proteins were identified in every sample (V.J. Denef et al., unpublished data; Fig. 3c). In summary, if we take expression levels under different conditions as an indicator of fitness, some proportion of genes in variable regions may have adaptive value, but others appear to be largely neutral. Possible caveats to this include the possibility that proteins expressed at low levels could significantly affect fitness, and methodological limitations of expression measurements, such as poor sensitivity or biases such as the low identification rate of membrane proteins. Nonetheless, the significantly lower identification levels for unique genes do strongly suggest that most of them are transient and do not significantly affect organismal fitness. Additional studies are clearly required to further address this issue.

Detection limits for community proteomics suggest that each organism for which a protein is identified must be present at an abundance of at least a few percent of the total community (N.C. VerBerkmoes et al., unpublished data), and the range of detectable proteins will improve with future technical developments in proteomics (P. Wilmes et al., unpublished data). To evaluate whether the expressed variants are important for community function, it will be necessary to measure expression levels over time in conjunction with process measurements. In addition, structural studies of microbial communities (e.g. biofilms; Wilmes et al., 2008b) may show whether particular variants are localized within distinct microniches. For example, enzyme variants that may be the most suited for a particular biotechnological application may be located at a particular position along a chemical gradient. Hence, more fine-scale measurements will be necessary in future to resolve the functional significance of genetic heterogeneity within microbial communities.

Sequence clusters in population genomics

Defined sequence clusters have been identified in metagenomic assemblies by binning, assembly based on sequence homology, or identity to large fragments of known origin, as discussed above (Tyson et al., 2004; Hallam et al., 2006; Allen et al., 2007; Eppley et al., 2007b; Rusch et al., 2007; Simmons et al., 2008). Smaller, less-divergent sequence clusters within assemblies can be detected through manual analysis of shared, linked polymorphisms (Whitaker & Banfield, 2006; Eppley et al., 2007b). Recent work shows that tetranucleotide frequencies can be used to cluster reads and contigs derived from complex natural communities at the species to genus level and higher, but they do not differentiate between closely related species, despite likely ecologically distinct roles (G. J. Dick et al., unpublished data).
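The tetranucleotide signature used for this kind of clustering can be sketched as follows. This is a simplified illustration, not the procedure of the cited work: it counts overlapping 4-mers on one strand only, skips windows containing ambiguous bases, and compares two signatures by Euclidean distance. Published binning tools typically also merge reverse-complement 4-mers and use more elaborate clustering, and the function names here are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each 4-mer in `seq` (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing N or other symbols are not in TETRAMERS and are ignored.
    total = sum(counts[k] for k in TETRAMERS)
    return {k: (counts[k] / total if total else 0.0) for k in TETRAMERS}


def signature_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two tetranucleotide frequency vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in TETRAMERS))
```

Fragments from the same or closely related genomes are expected to give small distances, which is consistent with the observation that such signatures do not separate closely related species.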

The existence of these clusters, which are also apparent in isolate-based MLST studies, indicates that genetic exchange between populations is limited to varying degrees. It is unclear, however, how sequence clusters correspond to microbial ‘species’ (Achtman & Wagner, 2008) or ecologically distinct populations (Whitaker & Banfield, 2006). Possible processes leading to clusters include adaptation to particular environmental niches among coexisting populations, physical isolation or a decline in recombination frequency between coexisting populations due to neutral divergence within genomes (Whitaker et al., 2005; Fraser et al., 2007) without invoking fitness differences (Fraser et al., 2005, 2007; Falush et al., 2006). In one model of speciation (Fraser et al., 2007), the degree of clustering depends on the level of recombination relative to mutation. When recombination is low, populations have a largely clonal structure; sequence clusters continually emerge, split, and disappear over time. Distinct clusters disappear when recombination rates are one quarter to twice the mutation rate, marking the transition from a clonal to sexual population structure (Fraser et al., 2007). Because the rate of homologous recombination in bacteria is known to decline with increasing sequence divergence (Majewski, 2001), genetic drift could potentially lead to reduced rates of between-cluster relative to within-cluster recombination sufficient to cause the emergence of new species. The plausibility of this process appears to depend strongly on the dependence of the recombination rate on sequence divergence, population size, and other modeling assumptions, but under some reasonable parameter schemes, it is at least possible (Falush et al., 2006; Fraser et al., 2007).

Much of the theoretical literature on the formation of sequence clusters (e.g. Spratt et al., 2001; Fraser et al., 2005; Hanage et al., 2006; Didelot & Falush, 2007) is based on MLST data (Fig. 4), which are used to estimate rates of recombination, mutation, and migration. It is worth keeping in mind, however, that MLST allelic profiles subsume levels of variation detectable with higher resolution methods (Fig. 4). For example, strains of Vibrio splendidus differing at <1% of their 16S rRNA gene sequences showed large genome size differences (Thompson et al., 2005), and strains of Burkholderia pseudomallei found to be identical by MLST showed variable pulsed-field gel electrophoresis banding patterns (Chantratita et al., 2008). The clustered regularly interspaced short palindromic repeats (CRISPR) locus involved in phage resistance shows the most extreme level of fine-scale heterogeneity reported to date. In fact, it has been suggested that each cell within the Leptospirillum group II population has a distinct CRISPR locus (Tyson & Banfield, 2008). These levels of genome-wide variation have not been fully incorporated into evolutionary models for cluster formation and speciation.

Fig. 4
Continuum of variation with box text.

Evolutionary models

It is useful to take a brief look at how the extensive population genetic and experimental literature on microbial evolution might inform our understanding of population genomic data. The classical model of microbial evolution is the ‘periodic selection model,’ which was supported by early experimental work in E. coli (Atwood et al., 1951) and has a long history in bacterial population genetics (e.g. Levin, 1981). Briefly, this model posits that beneficial mutations with large effects on fitness arise rarely in asexual populations. The individual containing this large effect mutation rapidly rises to fixation via a selective sweep. Because recombination is essentially absent, this sweep carries an entire genotype to fixation, erasing diversity at all other loci. During the period of stasis in between the appearance of large-effect mutations, neutral diversity can again accumulate at multiple loci.

The periodic selection model is the basis for the ‘clonal ecotype’ model proposed by Cohan and others (Cohan, 2006; Ward, 2006). According to this model, in an environmental context, a single clonal type occupies a particular niche. This comes about because mutations that lead to increased fitness in the niche periodically arise in the population, leading to selective sweeps and the loss of neutral diversity. Multiple sequence clusters are inferred to represent occupants of distinct niches or, alternatively, the mixing of two physically separated populations. This model is rarely tested directly (but see Simmons et al., 2008). Typically, one or more marker gene phylogenies are constructed and the clustering of particular phylogenetic groups according to a limited set of environmental parameters is tested. A positive correlation is interpreted as a support for the ecotype model (Ward, 2006; Koeppel et al., 2008; Ward et al., 2008) because it implies that sequence clusters correspond to ecologically distinct populations.

The periodic selection model assumes that beneficial mutations are rare enough that they will not occur simultaneously in multiple individuals within a population, which may not be correct. The clonal interference model describes the dynamics of evolution when different beneficial mutations occur in multiple individuals before one of them can rise to fixation. Competition between these individuals results in the loss of some mutations and delayed fixation of others (Gerrish & Lenski, 1998). Clonal interference has been shown to occur in laboratory populations of E. coli, resulting in less-effective periodic selection (de Visser & Rozen, 2006). The amount of standing variation within a sequence cluster is probably larger under a clonal interference regime than a simple periodic selection regime, but because only one of these multiple mutations ultimately fixes in the population, marker gene phylogenies are insufficient to distinguish the two alternatives. Recent theoretical and experimental work suggests that multiple beneficial mutations co-occur in a subset of individuals within a population, and that these high fitness individuals drive the overall rate of evolution (Desai et al., 2007). Because smaller effect mutations are not lost in this regime, the amount of standing variation in the population will probably be higher than under either the clonal interference or the periodic selection models, but the form it would take in population genomic data is not known.

Theoretical and experimental work suggests that recombination provides a fitness advantage in microbial populations, which scales with the mutation rate (Cooper, 2007), suggesting that the clonal models described above may not be appropriate in all circumstances. In fact, high intraspecific recombination rates are frequently observed in environmental microbial populations using both MLST of isolates (Vergin et al., 1998; Papke et al., 2004; Whitaker et al., 2005; Vos & Didelot, 2008) and population genomic (Tyson et al., 2004; Allen et al., 2007; Eppley et al., 2007b; Simmons et al., 2008) and proteomic (Lo et al., 2007; Denef et al., 2008) data. Recombination unlinks the evolutionary fate of different parts of a genome, allowing selection to operate independently on individual loci or sets of linked loci. If selection is relatively weak, the net effect is higher levels of standing diversity within a population than we would expect from the clonal models discussed above. If recombination in a population is extensive, phylogenetic signals of vertical descent can be obscured. In fact, incongruence between phylogenies derived from different loci within a population is a widely used indicator for the occurrence of recombination (Feil & Spratt, 2001). Recombination plus weak selection can therefore result in the appearance of sequence clusters that do not correspond to ecologically unique species (Cohan, 2006; Whitaker & Banfield, 2006). We discuss below some methods that can be used to detect recombination directly in population genomic data.

Application of population genetic techniques to metagenomic data

The challenges inherent to the analysis of metagenomic data have not yet sparked the widespread development of novel theoretical methodology in the population genetic community (but see Johnson & Slatkin, 2006, 2008). Additionally, most population genomic studies do not make use of existing methods, apart from the basic calculation of polymorphism frequency in assemblies (Table 2). In general, existing population genetic tests are derived from theoretical models that predict how variation is distributed within and between individuals in a population and are based on assumptions about the evolutionary process. Through the analysis of sequence variation, these models attempt to calculate rates of mutation, selection, and recombination. Population genomic data from microbial communities present a unique challenge to such methods, in that each individual sequencing read is most likely derived from an individual cell. Genomic contigs produced through automated or manual assembly are composite sequences derived from multiple individuals and cannot be assumed to correspond to any real sequence in a population (Fig. 2). Especially in short-insert sequencing libraries, this means we cannot physically reconstruct the genome of any individual cell (a haplotype). Statistical reconstructions of individual haplotypes may be possible based on correlations in polymorphism frequency between samples, but such methods do not yet exist.

The lack of haplotype information presents particular problems for methods designed to detect recombination through comparisons of sequences from different individuals, using the coalescent theory (e.g. McVean et al., 2002; Fearnhead et al., 2004) or phylogenetic break-point methods (Minin et al., 2005). The assumptions of these methods allow recombination detection only on length scales smaller than an individual clone. This limitation makes any model-based detection of recombination over longer length scales or in less-variable genomes difficult. Hence, the only studies to tackle the problem of measuring recombination rates in large-scale population genomic datasets have done so using manual identification of breakpoints, which require a polymorphism density high enough for visual detection (Whitaker & Banfield, 2006; Eppley et al., 2007b; Simmons et al., 2008). This approach revealed a log-linear decline in recombination frequency with sequence divergence between populations of the archaeon Ferroplasma present in AMD, consistent with findings in isolate genomes (Eppley et al., 2007b). Putative recombination breakpoints between very closely related strains of the bacterium Leptospirillum group II type 5-way CG (>99.5% relatedness) were identified with the visualization program strainer (Eppley et al., 2007a), but due to the low overall polymorphism density, their exact location could not be defined (Simmons et al., 2008). Recombination breakpoints were also identified in Leptospirillum group II using strain-resolved shotgun proteomics (Lo et al., 2007). It should be noted that recombination is also identifiable in population genomic datasets through discordant phylogenies for individual genes (Whitaker & Banfield, 2006).
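Within the span of a single read or clone, linked polymorphisms are observed directly, so simple linkage-based checks remain possible on short length scales. One classical check, which is not named in the studies cited above and is offered here only as an illustration, is the four-gamete test: if two biallelic sites show all four allele combinations among the sampled sequences, then recombination (or recurrent mutation) must have occurred between them. The sketch below assumes equal-length, gap-free aligned sequences that each cover all sites considered. The function name is hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return pairs of site indices that fail the four-gamete test.

    `haplotypes` are equal-length aligned sequences (e.g. reads trimmed to
    a shared window). A pair of sites is reported when both sites are
    biallelic and all four allele combinations are observed.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    incompatible = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        alleles_i = {g[0] for g in gametes}
        alleles_j = {g[1] for g in gametes}
        if len(alleles_i) == 2 and len(alleles_j) == 2 and len(gametes) == 4:
            incompatible.append((i, j))
    return incompatible
```

Such a test only flags that at least one exchange is implied between two sites. It does not locate the breakpoint, and sequencing errors can mimic a fourth combination, so it would complement rather than replace the manual breakpoint identification described above.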

The analysis of selection in individual genes, indels, or intragenic regions pulled out from population genomic datasets is more straightforward, and has been applied in a number of population genomic studies (e.g. Zeidner et al., 2005; Allen et al., 2007; Piganeau & Moreau, 2007; Wilhelm et al., 2007). Nielsen (2005) provides an excellent nontechnical overview of methods to detect selection in sequence data. Briefly, for individual genes, these fall into two classes: frequency spectrum and neutral/nonneutral mutations. The first tests whether the frequency distribution of polymorphisms in a set of aligned sequences is consistent with positive, negative, or no selection under particular evolutionary models. The second involves a comparison of the number of synonymous substitutions (assumed to be neutral) with the number of nonsynonymous substitutions (assumed to have a fitness effect). A dN/dS ratio >1 for the whole gene is generally assumed to indicate positive selection, because nonsynonymous substitutions would not be retained in the population unless they increased individual fitness. Caveats to this method include a systematic bias in comparisons of closely related organisms (Rocha et al., 2006) and a lack of power to detect selection when it occurs only on a subset of sites within a gene. In fact, most large-scale studies of dN/dS detect negative selection (the reduction of genetic diversity due to the elimination of deleterious mutations) on nearly all genes (Allen et al., 2007; Petersen et al., 2007). More complex phylogenetically based methods need to be used to detect particular sites under selection within a gene (Yang & Swanson, 2002). This is important to note, as single nonsynonymous mutations can alter the kinetics and specificity of enzymes, providing a means for the adaptation of distinct strains to specific environmental conditions. 
For example, a single amino acid substitution can switch marine proteorhodopsins (a widely distributed light-driven proton pump) from blue light to green light absorbing (Kelemen et al., 2003; Man et al., 2003), and this point mutation allows spectral tuning according to the position along a depth-dependent light gradient (Béjà et al., 2001).
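The distinction between synonymous and nonsynonymous differences that underlies these tests can be sketched as follows. This is a deliberately simplified counting step, not a full dN/dS estimator: it assumes two in-frame, gap-free aligned coding sequences and the standard genetic code, and it skips codons that differ at more than one position because the mutational pathway is then ambiguous. A real dN/dS estimate would additionally normalize by the number of synonymous and nonsynonymous sites and correct for multiple substitutions. The function name is illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (synonymous, nonsynonymous) single-base codon differences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    synonymous = nonsynonymous = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # ambiguous bases: not classifiable
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue  # multiple hits in one codon: pathway ambiguous
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

A single nonsynonymous difference of the kind counted here is all that separates the blue- and green-absorbing proteorhodopsin variants, which is why whole-gene ratios can miss functionally important sites.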

The McDonald–Kreitman (MK) test (McDonald & Kreitman, 1991) is a more powerful use of counts of synonymous and nonsynonymous data. This test posits that under a model of neutral evolution, the ratio of nonsynonymous to synonymous substitutions within a population is the same as the ratio of nonsynonymous to synonymous fixed differences between populations. An excess of replacement fixed differences indicates positive selection on a particular locus, whereas a dearth indicates negative selection. This test is particularly well suited for testing the ‘ecotype’ hypothesis (Ward et al., 2008). This hypothesis predicts that regions differentiating coexisting organisms should encode genes responsible for their increased fitness in particular niches. If these regions are orthologous, and each coexisting organism is uniquely adapted to a particular niche, the MK test should show increased evidence of positive selection in these regions relative to the rest of the genome. Simmons et al. (2008) used the MK test to determine that distinct strains of Leptospirillum group II detected within population genomic assemblies do not appear to be positively selected for adaptive differences with the dominant population, indicating that the ecotype model was not applicable to the population. The availability of metagenomic datasets, in particular those that provide a deep sampling of one or more natural populations, is providing an opportunity to test previously proposed evolutionary models. Currently, however, both the methodology to perform population genetic analysis on these kinds of data as well as the number of appropriate datasets are limited. It is clear that a continued effort in this field will help us garner a higher-resolution understanding of the relative importance of different evolutionary forces. One particular evolutionary force we have yet to discuss is the genetic change induced by the dynamic interplay between viruses and their hosts.
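The logic of the MK test can be sketched as a 2 × 2 contingency test on counts of fixed and polymorphic differences. The sketch below assumes the four counts have already been obtained from an alignment and uses a two-sided Fisher's exact test, which is one common choice for such tables and not necessarily the statistic used in the studies cited above. It also reports the ratio (Pn/Ps)/(Dn/Ds), often called the neutrality index, which is not discussed in the text. Function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables at least as unlikely as the observed one.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """Return (neutrality index, p-value) for an MK table.

    dn, ds: nonsynonymous / synonymous fixed differences between populations.
    pn, ps: nonsynonymous / synonymous polymorphisms within a population.
    """
    index = (pn * ds) / (ps * dn) if ps * dn else float("nan")
    return index, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts chosen only to illustrate the call.
index, p_value = mk_test(dn=8, ds=12, pn=3, ps=40)
```

With these made-up counts the index is below 1, reflecting an excess of replacement fixed differences, the pattern the text associates with positive selection. An index above 1 would instead reflect the dearth associated with negative selection.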

The viral world and its role in perturbation and fine-tuning

In the majority of microbial ecosystems surveyed thus far, extracellular viral particles outnumber their archaeal, bacterial, and eukaryal hosts by at least one order of magnitude (Bergh et al., 1989; Fuhrman, 1999). Overall, the Earth is a reservoir for an estimated 10^31 viruses, most of which are bacteriophages (Breitbart & Rohwer, 2005). Viruses may be responsible for killing up to 25% of microbial cells per hour in the ocean (Heldal & Bratbak, 1991; Steward et al., 1992), contributing to nutrient recycling. Thus, viruses have tremendous impacts on the Earth's biogeochemical cycles.

Viral diversity is significantly underrepresented in public sequence databases (Edwards & Rohwer, 2005). However, this is changing rapidly with the acquisition of extensive viral metagenomic sequences from multiple environments (Dinsdale et al., 2008a). Apart from virus-focused studies that have revealed extensive viral genetic diversity (Breitbart et al., 2002, 2003, 2004a, b; Angly et al., 2006; Culley et al., 2006; Zhang et al., 2006; Schoenfeld et al., 2008), several recent metagenomic studies have reported the simultaneous sampling of microorganisms and co-occurring viruses (DeLong et al., 2006; Edwards et al., 2006; Rusch et al., 2007; Andersson & Banfield, 2008; Dinsdale et al., 2008a; Williamson et al., 2008). Such studies are providing the first glimpses into the dynamics of virus–host interactions. Furthermore, they suggest that such interactions may have a significant effect on fine-scale genetic heterogeneity within communities. In fact, viruses impact host genotypes in several ways: they mediate gene transfer between host populations, integrate into host genomes, and drive rapid diversification of host CRISPR loci involved in phage resistance.

Viruses reproduce in their host either by the lytic or by the lysogenic cycle. The lytic cycle is thought to be the dominant mode of virus proliferation, involving the destruction of the host cell through a burst event or the continuous secretion of viruses into the extracellular environment. In the lysogenic cycle, a temperate virus integrates its genome into the host's genome, becoming a provirus that can be transmitted to daughter cells until, at a later stage, it releases and the virus proliferates via the lytic cycle.

These two lifestyles allow viruses to be important mediators of genetic exchange in the environment (Ripp et al., 1994; Jiang & Paul, 1998). As agents of gene transfer, viruses may supply the host with new genetic material in the form of integrated elements (reviewed by Faruque & Mekalanos, 2003; Sherwood, 2003; Brussow et al., 2004) and replace cellular genes by viral nonorthologs (horizontal or lateral gene transfer; Filée et al., 2002, 2003). In some cases, viruses are known to increase the short- and long-term survival fitness of the host (Brussow et al., 2004). Cyanophages infecting Synechococcus and Prochlorococcus carry genes involved in photosynthesis (Mann et al., 2003; Lindell et al., 2004). The expression of cyanophage-encoded photosystem proteins (psbA/psbD) helps to support photosynthetic activity in the host during the infection cycle, providing photosynthetic gene-carrying cyanophages with a selective advantage (Lindell et al., 2004). Viral psbA and psbD have been detected in open ocean metagenomic surveys (Venter et al., 2004; Angly et al., 2006; DeLong et al., 2006; Rusch et al., 2007). Sixty percent of psbA genes along the GOS sampling transect were of viral origin, suggesting that cyanophages may have a pronounced effect on global photosynthetic productivity (Sharon et al., 2007). Moreover, phage psbA genes are evolving under levels of purifying selection that are virtually indistinguishable from those acting on host proteins (Zeidner et al., 2005). Exchange and reshuffling of psbA genes occurs between Synechococcus and Prochlorococcus via phage intermediates, as well as between phages and hosts and between phages (Sullivan et al., 2003). Consequently, cyanophages appear to play a role in both short- and long-term adaptation in host populations.

Little is known about the molecular mechanisms facilitating rapid genome evolution in microbial viruses. Comparative genomics suggests that the viral gene pool is shaped primarily by illegitimate and homologous recombination (Hendrix, 2003; Martinsohn et al., 2008). Apart from recombination, recently described diversity-generating retroelements (Liu et al., 2002) allow viruses to generate adaptive diversity through a stochastic mechanism analogous to the mammalian immune system (Medhekar & Miller, 2007).

Recent evidence suggests that the viral gene pool extends across different biomes. Identical or near-identical bacteriophage-encoded genes have been identified in different ecosystems (Breitbart et al., 2004a, b; Short & Suttle, 2005). Because of their similarity, these genes may have moved between environments within recent evolutionary history, for example within the last 1000–2000 years (Breitbart & Rohwer, 2005). Two distinct processes may explain the movement of bacteriophage-encoded genes from one biome to another:

  1. Transfer of single genetic elements. Within natural virus populations, the rate of reassortment exceeds the rate of substitution (Silander et al., 2005) and, hence, lateral gene transfer may be a mechanism for the global movement of viral genetic elements between biomes (Breitbart et al., 2004a, b; Breitbart & Rohwer, 2005; Silander et al., 2005).
  2. Immigration of phages. Virus diversity in Yellowstone National Park hotsprings was primarily maintained by high rates of foreign immigration and recombination rather than mutation (Snyder et al., 2007). Furthermore, transplanted viruses find hosts in foreign biomes (Sano et al., 2004). These findings suggest that either identical microbial hosts are found in different environments or mobile viruses have broad host ranges (Jensen et al., 1998; Sullivan et al., 2003; Beumer & Robinson, 2005).

Host defense mechanisms

Hosts and viruses are involved in a continuous evolutionary arms race. Archaeal and bacterial hosts have a number of viral defense mechanisms in their arsenal. These include restriction-modification systems (Wilson & Murray, 1991), cell-surface manipulations (Weitz et al., 2005), exopolysaccharide production (Sutherland, 2001), biofilm formation (Sutherland et al., 2004), abortive infection systems (Sturino & Klaenhammer, 2007) and the CRISPR system (recently reviewed by Sorek et al., 2008). Pronounced variation in genomic regions related to these systems (exopolysaccharide synthesis cassettes and CRISPR loci) is apparent between strains of the same species, for example Streptococcus thermophilus (Bolotin et al., 2004) and ‘Candidatus Accumulibacter phosphatis’ (Kunin et al., 2008).

The CRISPR system has recently attracted considerable attention as it represents a putative archaeal and bacterial immune system for defense against foreign DNA (Makarova et al., 2006). CRISPR genomic regions consist of a few to many tens (or even hundreds) of tandem-repeated DNA sequences, typically 21–47 bp in length, separated by nonrepetitive spacer sequences of approximately the same length and variable arrays of CRISPR-associated (cas) genes (Makarova et al., 2006). Cas proteins share functional similarity with proteins involved in eukaryotic RNA interference systems and, hence, it has been hypothesized that spacers function analogously to small interfering RNAs (Makarova et al., 2006). Although the exact functional mechanism of the CRISPR system has yet to be determined, Barrangou et al. (2007) elegantly demonstrated in cultures of S. thermophilus that the CRISPR locus provides resistance against bacteriophages and that resistance specificity is determined by spacer-phage sequence similarity.

More recently, Andersson & Banfield (2008) were able to use spacer sequences to retrieve corresponding viral sequences from community genomic datasets and assemble large viral genomic fragments. Using this targeted approach, virus–host dynamics were resolved by linkage of host-encoded spacer sequences to the corresponding viruses. CRISPRs are highly variable between closely related individuals and evolve rapidly (Tyson & Banfield, 2008). Only the most recently acquired spacers match coexisting viruses (Andersson & Banfield, 2008). This suggests that incorporation of new spacers into the CRISPR locus counteracts rapid local viral evolution and foreign immigration. Furthermore, visual analysis of viral contigs suggests that spacer evasion may occur predominantly through recombination (Fig. 5a). Consequently, viruses and hosts are locked into a continuous ‘arms race’ between the host's defenses and the virus counterdefenses, as symbolized by the Red Queen Principle (Van Valen, 1973).
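The idea of using spacers as probes for viral sequences can be sketched as a simple search of community reads for each spacer on either strand. This is a minimal illustration under strong assumptions, not the pipeline of Andersson & Banfield (2008): it requires exact matches, whereas a practical search would tolerate mismatches, and it does not exclude reads that come from the host CRISPR locus itself. All names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reads_matching_spacers(
    spacers: dict[str, str], reads: dict[str, str]
) -> dict[str, list[str]]:
    """Map each spacer ID to the IDs of reads containing it on either strand."""
    hits: dict[str, list[str]] = {}
    for spacer_id, spacer in spacers.items():
        forward = spacer.upper()
        reverse = reverse_complement(forward)
        hits[spacer_id] = [
            read_id
            for read_id, read in reads.items()
            if forward in read.upper() or reverse in read.upper()
        ]
    return hits
```

Reads recovered this way could then be pooled and assembled, and the position of each matching spacer within the CRISPR locus compared with the presence of its target among coexisting viral sequences.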

Fig. 5
The dynamic interplay between viruses and their hosts (Andersson & Banfield, 2008). (a) Population structure of the AMDV2 virus population, showing extensive recombination between closely related sequence variants. Putative genes are displayed ...

Models for viral population dynamics

Direct extrapolation from metagenomic data suggests that there may be c. 100 million distinct viral genotypes (Rohwer, 2003). This diversity is not partitioned equally across spatial scales, however, because viruses (or at least some of their genes) move between biomes. The observation that viruses can be globally distributed but have high local diversity led to the development of the ‘Bank Model’ (Breitbart & Rohwer, 2005). This model assumes that only the most abundant viruses in a given environment are active, with the remaining low-abundance fraction being analogous to an inactive seed bank. Furthermore, only abundant viruses behave according to the ‘Kill-the-Winner’ hypothesis (Thingstad & Lignell, 1997), in which the dominant host population is reduced by viral attack, allowing a new host population to rise in frequency. The model is supported by rank-abundance curves indicating that the vast majority of viral genotypes are extremely rare (Breitbart & Rohwer, 2005).

Numerous findings indicate that the Bank Model may not accurately describe viral population dynamics in all environments. Significantly lower host–virus ratios in extreme environments as well as short half-lives (48 h) in the marine environment indicate that free-living viruses generally degrade rapidly (Wommack & Colwell, 2000; Breitbart et al., 2004a, b), making a large bank of low-abundance and inactive extracellular lysing viruses unlikely. The model also does not account for the dynamics of nonlysing viruses, which are secreted from host cells without killing them [as in the hyperthermophilic archaeon Sulfolobus tengchongensis (Xiang et al., 2005)]. Viral secretion is highly advantageous if fecundity is only slightly compromised relative to lytic bursts (Bull et al., 2004). However, the abundance of nonlysing viruses in nature is not known, likely due to their inability to form plaques in plate count assays (representing a possible second ‘plate count anomaly’ in microbial ecology). These viruses would be classified as ‘inactive’ under the Breitbart–Rohwer Bank Model, but may in fact replicate slowly and continuously. The prevalence of nonlysing viruses may allow different viral genotypes to coinfect a microbial cell, resulting in extensive recombination within the cell.
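To make the half-life argument concrete, the following Python sketch computes the fraction of free viral particles that would persist over time. It assumes simple first-order (exponential) decay with no replenishment, which is an idealization introduced here for illustration; only the 48 h half-life comes from the text, and the function name fraction_remaining is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=48.0):
    """Fraction of particles left after `hours`, assuming first-order decay.

    Assumption: decay follows N(t) = N0 * 0.5 ** (t / half_life),
    with no new viral production during the interval.
    """
    return 0.5 ** (hours / half_life_hours)


after_two_days = fraction_remaining(48)    # one half-life
after_one_week = fraction_remaining(168)   # 3.5 half-lives
```

Under these assumptions, fewer than one in ten particles would remain after a week without production, which is why a large, persistent bank of inactive extracellular particles appears difficult to sustain.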

Although simple ‘Kill-the-Winner’ scenarios are common in the laboratory environment, few studies suggest that this succession pattern is prevalent in natural communities (Mühling et al., 2005; Martinez et al., 2007). Detailed analysis of CRISPR spacers and coexisting viruses in AMD biofilms (Andersson & Banfield, 2008) suggests that the prolonged coevolution of virus–host pairs leads to broad genetic diversity within the local viral gene pool (Fig. 5a). Only one virus–host pair out of five virus populations analyzed in detail in AMD exhibited a pattern suggestive of recent virus immigration and a targeted selective sweep as predicted by the Bank Model. The observed heterogeneity among incorporated CRISPR spacers within microbial populations (Tyson & Banfield, 2008) suggests that host cells likely differ in their susceptibility to certain viruses. Concomitantly, due to the extensive variability among viral genotypes, viruses likely differ in their virulence. Thus, relatively stable coexisting host and virus populations seem possible (Andersson & Banfield, 2008). Only in a limited number of cases does a potent lysing virus emerge locally or immigrate from the Bank and cause a selective sweep among a dominant group of organisms. Consequently, in at least some environments, ‘Kill-the-Winner’ scenarios may be more the exception than the norm.

The patterns of spacer diversity within CRISPR loci suggest that virus population dynamics may be quite subtle. Bioinformatic and experimental evidence both indicate that novel spacers are added to only one end of the CRISPR locus, the end nearest the cas genes, and that infection by novel viral types results in spacer addition (Barrangou et al., 2007). Analysis of deeply sampled CRISPR loci in natural populations is consistent with this observation; spacers at one end of the locus are nearly identical in all individuals sampled, while at the opposite end each individual cell has a unique spacer complement (Andersson & Banfield, 2008; Tyson & Banfield, 2008; Fig. 5b). Hence, the evolution of the CRISPR spacer complement may be explained by the following scenario: infection by a novel viral genotype results in the lysis or weakening of most individuals, except those that are able to capture and incorporate a corresponding spacer into their CRISPR locus. At present, we do not know the fraction of individuals within a population that gain resistance by spacer addition, nor the rate at which viruses can evade CRISPR-acquired resistance via mutation or recombination. Resistant individuals would rapidly gain a selective advantage, leading to the fixation of the resistant spacer and its associated spacer inventory within the CRISPR locus. Under a straightforward ‘Kill-the-Winner’ scenario, we might expect this rapid rise of a single resistant host type to result in homogenization of the entire locus in the population, which appears inconsistent with the virus population genomic data currently available. However, if we assume that cells resistant to a certain viral genotype are being continually infected by mutated variants of the same virus or other viruses during their rise in frequency, diverse new spacers could be added to one end of the CRISPR locus while it is homogenized by selection on the other.
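The polarized pattern described above, with one end of the locus shared across individuals and the other end unique to each, can be summarized by counting distinct spacers at each position. The Python sketch below is illustrative only: it assumes each locus is already represented as an ordered list of spacer identifiers running from the oldest end to the newest, cas-proximal end, and that loci can be compared position by position from the oldest end. The function name and toy loci are hypothetical.

```python
def distinct_spacers_per_position(loci):
    """Count distinct spacers at each position across sampled loci.

    Assumption: each locus is a list of spacer IDs ordered from the
    oldest end (index 0) to the newest, cas-proximal end. Loci of
    different lengths are compared from the oldest end.
    """
    longest = max(len(locus) for locus in loci)
    counts = []
    for position in range(longest):
        variants = {locus[position] for locus in loci if position < len(locus)}
        counts.append(len(variants))
    return counts


# Toy population (hypothetical): three cells share their oldest spacers
# but each carries a unique set of recently acquired spacers.
loci = [
    ["s1", "s2", "s3", "a1", "a2"],
    ["s1", "s2", "s3", "b1"],
    ["s1", "s2", "s3", "c1", "c2"],
]
counts = distinct_spacers_per_position(loci)
```

A count of 1 marks positions that are homogeneous across the sample, as expected after a selective sweep, whereas higher counts toward the cas-proximal end reflect ongoing, independent spacer acquisition.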

The observed heterogeneity in microbial hosts' spacer complements as well as the extensive viral genotypic diversity suggests that fine-scale variation is a major factor influencing host–virus dynamics. Future studies based on in-depth sampling of CRISPR spacers and corresponding viruses will determine the temporal and spatial scales important for virus–host evolution, and will result in more comprehensive models for virus–host dynamics.


Community genomics is one among a diverse set of tools that can be applied to gain a greater understanding of microbial communities. The complexities revealed by these large and detailed datasets challenge us to consider a number of important new questions. Gene-centric analyses, as discussed above, allow construction of functional scaffolds to model metabolic interactions within a community (e.g. Warnecke et al., 2007) as well as the determination of large-scale differences between the gene complements of distinct ecosystems (e.g. Tringe et al., 2005; DeLong et al., 2006; Dinsdale et al., 2008a). It is now clear, however, that genetic variation within microbial communities is extensive at multiple levels. A gene-centric approach, while informative for certain questions, leaves this variation largely untouched. Community genomic data can provide significant insights into ecological and evolutionary dynamics within communities. This level of analysis is vital to a complete understanding of the form, function, and dynamics of variation within microbial consortia.

Our current understanding of the role of within-population genetic heterogeneity is limited. Theoretical models suggest that some fraction of this variation could result from neutral evolutionary processes such as mutation, recombination, and genetic drift, while other work suggests that sequence variation reflects niche-specific adaptation. The wider application of established population genetic tools to detect signatures of selection in community genomic sequence data could shed significant light on this question. To date, experimental data on the expression of genes in hypervariable regions suggest that at least some genotypic diversity contributes to community functioning. Because of the limited number of studies that have addressed the relevance of fine-scale variation in natural populations, it is premature to draw any general conclusions regarding its fitness effects.

The importance of fine-scale genetic variation within microbial populations is an interesting question from a basic scientific perspective, but it also has important practical implications. Human society relies heavily on microorganisms. Over the millennia, humans have learnt to harness and engineer several microbial processes. These range from food preservation (Ross et al., 2002) to the treatment of waste (Daims et al., 2006) to the provision of raw materials for manufacturing (Bosecker, 1997). To return to our initial orchestra analogy, although we are attempting to take over the role of the microbial community conductor, we have limited knowledge of the score and how it is played. Metagenomics, in combination with functional approaches, offers opportunities to help improve our performance. Improvement is necessary, because our current lack of understanding often results in mediocre process performances and intermittent failures.

Particularly problematic are phage attacks that represent a major financial burden to the fermentation industry (Petty et al., 2007). In order to improve the operational stability of such microbial processes, a detailed understanding of community dynamics is essential. In particular, the elucidation of virus–host interactions in relation to the recent discovery of the CRISPR system holds great promise for future biotechnological applications. This knowledge might allow us to use the CRISPR system to engineer microbial communities. For example, the system could be used to shape community composition either by improving resistance to phage predation or by silencing specific genes within microbial populations. Moreover, in the light of current challenges imposed by antibiotic resistance (Kluytmans-VandenBergh & Kluytmans, 2006), detailed knowledge of virus–host interactions deduced from studying CRISPR spacers and their targeted viruses might lead to novel infection treatment technologies. For example, rapid CRISPR spacer typing of pathogenic bacteria may provide the foundation for synthetic phage therapy, which could be facilitated by current advances in the field of synthetic biology.

It is important that biotechnology, including the emerging field of synthetic biology, reflects on the lessons learned from failed attempts to use clonal isolates for the engineering of microbial systems, for example bioaugmentation (El Fantroussi & Agathos, 2005). Furthermore, considering the extensive population-level heterogeneity, it could be fruitful to revisit the current quest for the ideal biocatalyst (Burton et al., 2002), using strategies that exploit the diversity in natural communities. It is interesting to note that the most widely applied and one of the most successful ‘bio-catalysts’, activated sludge in wastewater treatment, harnesses natural communities with their inherent population-level heterogeneity (García-Martín et al., 2006; Kunin et al., 2008; Wilmes et al., 2008a). The question now is whether this heterogeneity confers system resilience and whether communities can be engineered to provide certain services more efficiently. To quote Leonardo da Vinci: ‘Human subtlety will never devise an invention more beautiful, more simple or more direct than does Nature, because in her inventions, nothing is lacking and nothing is superfluous.’


Funding was provided by the United States Department of Energy Genomics: GTL Program (Office of Science).


Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.


  • Achtman M, Wagner M. Microbial diversity and the genetic nature of microbial species. Nat Rev Microbiol. 2008;6:431–440. [PubMed]
  • Acinas SG, Klepac-Ceraj V, Hunt DE, Pharino C, Ceraj I, Distel DL, Polz MF. Fine-scale phylogenetic architecture of a complex bacterial community. Nature. 2004;430:551–554. [PubMed]
  • Allen EE, Banfield JF. Community genomics in microbial ecology and evolution. Nat Rev Microbiol. 2005;3:489–498. [PubMed]
  • Allen EE, Tyson GW, Whitaker RJ, Detter JC, Richardson PM, Banfield JF. Genome dynamics in a natural microbial strain population. P Natl Acad Sci USA. 2007;104:1883–1888. [PMC free article] [PubMed]
  • Alm RA, Ling L-SL, Moir DT, et al. Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature. 1999;397:176–180. [PubMed]
  • Andersson AF, Banfield JF. Virus population dynamics and acquired virus resistance in natural microbial communities. Science. 2008;320:1047–1050. [PubMed]
  • Angly FE, Felts B, Breitbart M, et al. The marine viromes of four oceanic regions. PLoS Biol. 2006;4:e368. [PMC free article] [PubMed]
  • Atwood KC, Schneider LK, Ryan FJ. Periodic selection in Escherichia coli. P Natl Acad Sci USA. 1951;37:146–155. [PMC free article] [PubMed]
  • Bapteste E, Boucher Y. Lateral gene transfer challenges principles of microbial systematics. Trends Microbiol. 2008;16:200–207. [PubMed]
  • Barrangou R, Fremaux C, Deveau H, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–1712. [PubMed]
  • Béjà O, Aravind L, Koonin EV, et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000;289:1902–1906. [PubMed]
  • Béjà O, Spudich EN, Spudich JL, Leclerc M, DeLong EF. Proteorhodopsin phototrophy in the ocean. Nature. 2001;411:786–789. [PubMed]
  • Berg OG, Kurland CG. Evolution of microbial genomes: sequence acquisition and loss. Mol Biol Evol. 2002;19:2265–2276. [PubMed]
  • Bergh O, Borsheim KY, Bratbak G, Heldal M. High abundance of viruses found in aquatic environments. Nature. 1989;340:467–468. [PubMed]
  • Beumer A, Robinson JB. A broad-host-range, generalized transducing phage (SN-T) acquires 16S rRNA genes from different genera of bacteria. Appl Environ Microb. 2005;71:8301–8304. [PMC free article] [PubMed]
  • Bhaya D, Grossman AR, Steunou A-S, et al. Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J. 2007;1:703–713. [PubMed]
  • Biddle JF, Fitz-Gibbon S, Schuster SC, Brenchley JE, House CH. Metagenomic signatures of the Peru Margin subseafloor biosphere show a genetically distinct environment. P Natl Acad Sci USA. 2008;105:10583–10588. [PMC free article] [PubMed]
  • Binga EK, Lasken RS, Neufeld JD. Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology. ISME J. 2008;2:233–241. [PubMed]
  • Bolhuis H, Palm P, Wende A, et al. The genome of the square archaeon Haloquadratum walsbyi: life at the limits of water activity. BMC Genomics. 2006;7:169. [PMC free article] [PubMed]
  • Bolotin A, Quinquis B, Renault P, et al. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat Biotechnol. 2004;22:1554–1558. [PubMed]
  • Bosecker K. Bioleaching: metal solubilization by microorganisms. FEMS Microbiol Rev. 1997;20:591–604.
  • Breitbart M, Rohwer F. Here a virus, there a virus, everywhere the same virus? Trends Microbiol. 2005;13:278–284. [PubMed]
  • Breitbart M, Salamon P, Andresen B, et al. Genomic analysis of uncultured marine viral communities. P Natl Acad Sci USA. 2002;99:14250–14255. [PMC free article] [PubMed]
  • Breitbart M, Hewson I, Felts B, Mahaffy JM, Nulton J, Salamon P, Rohwer F. Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 2003;185:6220–6223. [PMC free article] [PubMed]
  • Breitbart M, Felts B, Kelley S, Mahaffy JM, Nulton J, Salamon P, Rohwer F. Diversity and population structure of a near-shore marine-sediment viral community. P Roy Soc B: Biol Sci. 2004a;271:565–574. [PMC free article] [PubMed]
  • Breitbart M, Wegley L, Leeds S, Schoenfeld T, Rohwer F. Phage community dynamics in hot springs. Appl Environ Microb. 2004b;70:1633–1640. [PMC free article] [PubMed]
  • Brussow H, Canchaya C, Hardt W-D. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol R. 2004;68:560–602. [PMC free article] [PubMed]
  • Bull JJ, Pfennig DW, Wang I-N. Genetic details, optimization and phage life histories. Trends Ecol Evol. 2004;19:76–82. [PubMed]
  • Burton SG, Cowan DA, Woodley JM. The search for the ideal biocatalyst. Nat Biotechnol. 2002;20:37–45. [PubMed]
  • Cann JA, Fandrich ES, Heaphy S. Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes. Virus Genes. 2005;30:151–156. [PubMed]
  • Cerdeno-Tarraga AM, Patrick S, Crossman LC, et al. Extensive DNA inversions in the B. fragilis genome control variable gene expression. Science. 2005;307:1463–1465. [PubMed]
  • Chain PSG, Denef VJ, Konstantinidis KT, et al. Inaugural article: Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility. P Natl Acad Sci USA. 2006;103:15280–15287. [PMC free article] [PubMed]
  • Chantratita N, Wuthiekanun V, Limmathurotsakul D, et al. Genetic diversity and microevolution of Burkholderia pseudomallei in the environment. PLoS Negl Trop Dis. 2008;2:e182. [PMC free article] [PubMed]
  • Cohan F. Towards a conceptual and operational union of bacterial systematics, ecology, and evolution. Philos T R Soc B. 2006;361:1985–1996. [PMC free article] [PubMed]
  • Coleman ML, Sullivan MB, Martiny AC, Steglich C, Barry K, DeLong EF, Chisholm SW. Genomic Islands and the ecology and evolution of Prochlorococcus. Science. 2006;311:1768–1770. [PubMed]
  • Cooper TF. Recombination speeds adaptation by reducing competition between beneficial mutations in populations of Escherichia coli. PLoS Biol. 2007;5:e225. [PMC free article] [PubMed]
  • Cox-Foster DL, Conlan S, Holmes EC, et al. A metagenomic survey of microbes in honey bee colony collapse disorder. Science. 2007;318:283–287. [PubMed]
  • Cuadros-Orellana S, Martin-Cuadrado A-B, Legault B, D'Auria G, Zhaxybayeva O, Papke RT, Rodriguez-Valera F. Genomic plasticity in prokaryotes: the case of the square haloarchaeon. ISME J. 2007;1:235–245. [PubMed]
  • Culley AI, Lang AS, Suttle CA. Metagenomic analysis of coastal RNA virus communities. Science. 2006;312:1795–1798. [PubMed]
  • Daims H, Taylor MW, Wagner M. Wastewater treatment: a model system for microbial ecology. Trends Biotechnol. 2006;24:483–489. [PubMed]
  • DeLong EF, Preston CM, Mincer T, et al. Community genomics among stratified microbial assemblages in the Ocean's interior. Science. 2006;311:496–503. [PubMed]
  • Denef VJ, VerBerkmoes NC, Shah MB, Abraham P, Lefsrud M, Hettich RL, Banfield JF. Proteomics-inferred genome typing (PIGT) demonstrates inter-population recombination as a strategy for environmental adaptation. Environ Microbiol DOI 10.1111/j.1462-2920.2008.01769.x. [PubMed]
  • Desai MM, Fisher DS, Murray AW. The speed of evolution and maintenance of variation in asexual populations. Curr Biol. 2007;17:385–394. [PMC free article] [PubMed]
  • de Visser JAGM, Rozen DE. Clonal interference and the periodic selection of new beneficial mutations in Escherichia coli. Genetics. 2006;172:2093–2100. [PMC free article] [PubMed]
  • Didelot X, Falush D. Inference of bacterial microevolution using multilocus sequence data. Genetics. 2007;175:1251–1266. [PMC free article] [PubMed]
  • Dinsdale EA, Edwards RA, Hall D, et al. Functional metagenomic profiling of nine biomes. Nature. 2008a;452:629–632. [PubMed]
  • Dinsdale EA, Pantos O, Smriga S, et al. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE. 2008b;3:e1584. [PMC free article] [PubMed]
  • Dobrindt U, Hochhut B, Hentschel U, Hacker J. Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2004;2:414–424. [PubMed]
  • Doolittle WF, Papke RT. Genomics and the bacterial species problem. Genome Biol. 2006;7:116. [PMC free article] [PubMed]
  • Dopson M, Baker-Austin C, Bond PL. First use of two-dimensional polyacrylamide gel electrophoresis to determine phylogenetic relationships. J Microbiol Meth. 2004;58:297–302. [PubMed]
  • Edwards R, Rodriguez-Brito B, Wegley L, et al. Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics. 2006;7:57. [PMC free article] [PubMed]
  • Edwards RA, Rohwer F. Viral metagenomics. Nat Rev Microbiol. 2005;3:504–510. [PubMed]
  • El Fantroussi S, Agathos SN. Is bioaugmentation a feasible strategy for pollutant removal and site remediation? Curr Opin Microbiol. 2005;8:268–275. [PubMed]
  • Eppley JM, Tyson GW, Getz WM, Banfield JF. Strainer: software for analysis of population variation in community genomic datasets. BMC Bioinformatics. 2007a;8:398. [PMC free article] [PubMed]
  • Eppley JM, Tyson GW, Getz WM, Banfield JF. Genetic exchange across a species boundary in the archaeal genus Ferroplasma. Genetics. 2007b;177:407–416. [PMC free article] [PubMed]
  • Falush D, Torpdahl M, Didelot X, Conrad D, Wilson D, Achtman M. Mismatch induced speciation in Salmonella: model and data. Philos T R Soc B. 2006;361:2045–2053. [PMC free article] [PubMed]
  • Faruque SM, Mekalanos JJ. Pathogenicity islands and phages in Vibrio cholerae evolution. Trends Microbiol. 2003;11:505–510. [PubMed]
  • Fearnhead P, Harding RM, Schneider JA, Myers S, Donnelly P. Application of coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics. 2004;167:2067–2081. [PMC free article] [PubMed]
  • Feil EJ, Spratt BG. Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol. 2001;55:561–590. [PubMed]
  • Fierer N, Breitbart M, Nulton J, et al. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Appl Environ Microb. 2007;73:7059–7066. [PMC free article] [PubMed]
  • Filée J, Forterre P, Sen-Lin T, Laurent J. Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J Mol Evol. 2002;54:763–773. [PubMed]
  • Filée J, Forterre P, Laurent J. The role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies. Res Microbiol. 2003;154:237–243. [PubMed]
  • Fraser C, Hanage WP, Spratt BG. Neutral microepidemic evolution of bacterial pathogens. P Natl Acad Sci USA. 2005;102:1968–1973. [PMC free article] [PubMed]
  • Fraser C, Hanage WP, Spratt BG. Recombination and the nature of bacterial speciation. Science. 2007;315:476–480. [PMC free article] [PubMed]
  • Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW, DeLong EF. Microbial community gene expression in ocean surface waters. P Natl Acad Sci USA. 2008;105:3805–3810. [PMC free article] [PubMed]
  • Fuhrman JA. Marine viruses and their biogeochemical and ecological effects. Nature. 1999;399:541–548. [PubMed]
  • Gans J, Wolinsky M, Dunbar J. Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science. 2005;309:1387–1390. [PubMed]
  • García-Martín H, Ivanova N, Kunin V, et al. Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol. 2006;24:1263–1269. [PubMed]
  • Gerrish PJ, Lenski RE. The fate of competing beneficial mutations in an asexual population. Genetica. 1998;102–103:127–144. [PubMed]
  • Gilbert JA, Muhling M, Joint I. A rare SAR11 fosmid clone confirming genetic variability in the ‘Candidatus Pelagibacter ubique’ genome. ISME J. 2008;2:790–793. [PubMed]
  • Gill SR, Pop M, DeBoy RT, et al. Metagenomic analysis of the human distal gut microbiome. Science. 2006;312:1355–1359. [PMC free article] [PubMed]
  • Gillespie J. Population Genetics: A Concise Guide. Baltimore: The Johns Hopkins University Press; 2004.
  • Giovannoni SJ, Tripp HJ, Givan S, et al. Genome streamlining in a cosmopolitan oceanic bacterium. Science. 2005;309:1242–1245. [PubMed]
  • Green RE, Krause J, Ptak SE, et al. Analysis of one million base pairs of Neanderthal DNA. Nature. 2006;444:330–336. [PubMed]
  • Hallam SJ, Konstantinidis KT, Putnam N, et al. Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. P Natl Acad Sci USA. 2006;103:18296–18301. [PMC free article] [PubMed]
  • Hanage W, Spratt B, Turner K, Fraser C. Modelling bacterial speciation. Philos T R Soc B. 2006;361:2039–2044. [PMC free article] [PubMed]
  • Hartl DL, Moriyama EN, Sawyer SA. Selection intensity for codon bias. Genetics. 1994;138:227–234. [PMC free article] [PubMed]
  • Hayashi T, Makino K, Ohnishi M, et al. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 2001;8:11–22. [PubMed]
  • He S, Gu AZ, McMahon KD. Fine-scale differences between Rhodocyclus-like bacteria in enhanced biological phosphorus removal activated sludge. Water Sci Technol. 2006;54:111–117. [PubMed]
  • He S, Gall DL, McMahon KD. “Candidatus Accumulibacter” population structure in enhanced biological phosphorus removal sludges as revealed by polyphosphate kinase genes. Appl Environ Microb. 2007;73:5865–5874. [PMC free article] [PubMed]
  • Heldal M, Bratbak G. Production and decay of viruses in aquatic environments. Mar Ecol Prog Ser. 1991;72:205–212.
  • Hendrix RW. Bacteriophage genomics. Curr Opin Microbiol. 2003;6:506–511. [PubMed]
  • Hsiao WWL, Ung K, Aeschliman D, Bryan J, Finlay BB, Brinkman FSL. Evidence of a large novel gene pool associated with prokaryotic genomic islands. PLoS Genet. 2005;1:e62. [PMC free article] [PubMed]
  • Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science. 2008;320:1081–1085. [PubMed]
  • Ishoey T, Woyke T, Stepanauskas R, Novotny M, Lasken RS. Genomic sequencing of single microbial cells from environmental samples. Curr Opin Microbiol. 2008;11:198–204. [PMC free article] [PubMed]
  • Jensen EC, Schrader HS, Rieland B, Thompson TL, Lee KW, Nickerson KW, Kokjohn TA. Prevalence of broad-host-range lytic bacteriophages of Sphaerotilus natans, Escherichia coli, and Pseudomonas aeruginosa. Appl Environ Microb. 1998;64:575–580. [PMC free article] [PubMed]
  • Jiang SC, Paul JH. Gene transfer by transduction in the marine environment. Appl Environ Microb. 1998;64:2780–2787. [PMC free article] [PubMed]
  • Johnson PLF, Slatkin M. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Res. 2006;16:1320–1327. [PMC free article] [PubMed]
  • Johnson PLF, Slatkin M. Accounting for bias from sequencing error in population genetic estimates. Mol Biol Evol. 2008;25:199–206. [PubMed]
  • Kalyuzhnaya MG, Zabinsky R, Bowerman S, Baker DR, Lidstrom ME, Chistoserdova L. Fluorescence in situ hybridization-flow cytometry-cell sorting-based method for separation and enrichment of Type I and Type II methanotroph populations. Appl Environ Microb. 2006;72:4293–4301. [PMC free article] [PubMed]
  • Kelemen BR, Du M, Jensen RB. Proteorhodopsin in living color: diversity of spectral properties within living bacterial cells. BBA-Biomembranes. 2003;1618:25–32. [PubMed]
  • Kettler GC, Martiny AC, Huang K, et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 2007;3:e231. [PMC free article] [PubMed]
  • Kluytmans-VandenBergh MFQ, Kluytmans JAJW. Community-acquired methicillin-resistant Staphylococcus aureus: current perspectives. Clin Microbiol Infec. 2006;12:9–15. [PubMed]
  • Koeppel A, Perry EB, Sikorski J, et al. Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. P Natl Acad Sci USA. 2008;105:2504–2509. [PMC free article] [PubMed]
  • Konstantinidis KT, Ramette A, Tiedje JM. The bacterial species definition in the genomic era. Philos T R Soc B. 2006;361:1929–1940. [PMC free article] [PubMed]
  • Kunin V, He S, Warnecke F, et al. A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Res. 2008;18:293–297. [PMC free article] [PubMed]
  • Kurokawa K, Itoh T, Kuwahara T, et al. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 2007;14:169–181. [PMC free article] [PubMed]
  • Lasken RS. Single-cell genomic sequencing using multiple displacement amplification. Curr Opin Microbiol. 2007;10:510–516. [PubMed]
  • Legault B, Lopez-Lopez A, Alba-Casado J, Doolittle WF, Bolhuis H, Rodriguez-Valera F, Papke RT. Environmental genomics of “Haloquadratum walsbyi” in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species. BMC Genomics. 2006;7:171. [PMC free article] [PubMed]
  • Levin BR. Periodic selection, infectious gene exchange and the genetic structure of E. coli populations. Genetics. 1981;99:1–23. [PMC free article] [PubMed]
  • Lindell D, Sullivan MB, Johnson ZI, Tolonen AC, Rohwer F, Chisholm SW. Transfer of photosynthesis genes to and from Prochlorococcus viruses. P Natl Acad Sci USA. 2004;101:11013–11018. [PMC free article] [PubMed]
  • Liu M, Deora R, Doulatov SR, et al. Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science. 2002;295:2091–2094. [PubMed]
  • Lo I, Denef VJ, VerBerkmoes NC, et al. Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature. 2007;446:537–541. [PubMed]
  • Majewski J. Sexual isolation in bacteria. FEMS Microbiol Lett. 2001;199:161–169. [PubMed]
  • Makarova K, Grishin N, Shabalina S, Wolf Y, Koonin E. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biology Direct. 2006;1:7. [PMC free article] [PubMed]
  • Man D, Wang W, Sabehi G, et al. Diversification and spectral tuning in marine proteorhodopsins. EMBO J. 2003;22:1725–1731. [PMC free article] [PubMed]
  • Mann NH, Cook A, Millard A, Bailey S, Clokie M. Marine ecosystems: bacterial photosynthesis genes in a virus. Nature. 2003;424:741. [PubMed]
  • Marcy Y, Ouverney C, Bik EM, et al. Dissecting biological “dark matter” with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. P Natl Acad Sci USA. 2007;104:11889–11894. [PMC free article] [PubMed]
  • Martín-Cuadrado A-B, López-García P, Alba J-C, et al. Metagenomics of the deep Mediterranean, a warm bathypelagic habitat. PLoS ONE. 2007;2:e914. [PMC free article] [PubMed]
  • Martinez JM, Schroeder DC, Larsen A, Bratbak G, Wilson WH. Molecular dynamics of Emiliania huxleyi and cooccurring viruses during two separate mesocosm studies. Appl Environ Microb. 2007;73:554–562. [PMC free article] [PubMed]
  • Martinsohn JT, Radman M, Petit M-A. The λ red proteins promote efficient recombination between diverged sequences: implications for bacteriophage genome mosaicism. PLoS Genet. 2008;4:e1000065. [PMC free article] [PubMed]
  • Mathee K, Narasimhan G, Valdes C, et al. Dynamics of Pseudomonas aeruginosa genome evolution. P Natl Acad Sci USA. 2008;105:3100–3105. [PMC free article] [PubMed]
  • McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–654. [PubMed]
  • McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002;160:1231–1241. [PMC free article] [PubMed]
  • Medhekar B, Miller JF. Diversity-generating retroelements. Curr Opin Microbiol. 2007;10:388–395. [PMC free article] [PubMed]
  • Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15:589–594. [PubMed]
  • Mes TH. Microbial diversity – insights from population genetics. Environ Microbiol. 2008;10:251–264. [PubMed]
  • Minin VN, Dorman KS, Fang F, Suchard MA. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics. 2005;21:3034–3042. [PubMed]
  • Morris RM, Fung JM, Rahm BG, Zhang S, Freedman DL, Zinder SH, Richardson RE. Comparative proteomics of Dehalococcoides spp. reveals strain-specific peptides associated with activity. Appl Environ Microb. 2007;73:320–326. [PMC free article] [PubMed]
  • Mou X, Sun S, Edwards RA, Hodson RE, Moran MA. Bacterial carbon processing by generalist species in the coastal ocean. Nature. 2008;451:708–711. [PubMed]
  • Mühling M, Fuller NJ, Millard A, et al. Genetic diversity of marine Synechococcus and co-occurring cyanophage communities: evidence for viral control of phytoplankton. Environ Microbiol. 2005;7:499–508. [PubMed]
  • Mussmann M, Hu FZ, Richter M, et al. Insights into the genome of large sulfur bacteria revealed by analysis of single filaments. PLoS Biol. 2007;5:e230. [PMC free article] [PubMed]
  • Nielsen R. Molecular signatures of natural selection. Annu Rev Genet. 2005;39:197–218. [PubMed]
  • Noonan JP, Hofreiter M, Smith D, et al. Genomic sequencing of Pleistocene cave bears. Science. 2005;309:597–599. [PubMed]
  • Ochman H, Lerat E, Daubin V. Examining bacterial species under the specter of gene transfer and exchange. P Natl Acad Sci USA. 2005;102(suppl 1):6595–6599. [PMC free article] [PubMed]
  • Pace NR, Stahl DA, Lane DJ, Olsen GJ. Analyzing natural microbial populations by rRNA sequences. ASM News. 1985;51:4–12.
  • Palenik B, Ren Q, Dupont CL, et al. Genome sequence of Synechococcus CC9311: insights into adaptation to a coastal environment. P Natl Acad Sci USA. 2006;103:13555–13559. [PMC free article] [PubMed]
  • Papke RT, Koenig JE, Rodriguez-Valera F, Doolittle WF. Frequent recombination in a saltern population of Halorubrum. Science. 2004;306:1928–1929. [PubMed]
  • Parkhill J, Wren BW, Mungall K, et al. The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature. 2000;403:665–668. [PubMed]
  • Pelletier E, Kreimeyer A, Bocs S, et al. “Candidatus Cloacamonas acidaminovorans”: genome sequence reconstruction provides a first glimpse of a new bacterial division. J Bacteriol. 2008;190:2572–2579. [PMC free article] [PubMed]
  • Perez-Brocal V, Gil R, Ramos S, et al. A small microbial genome: the end of a long symbiotic relationship? Science. 2006;314:312–313. [PubMed]
  • Pérez-Losada M, Browne EB, Madsen A, Wirth T, Viscidi RP, Crandall KA. Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data. Infect Genet Evol. 2006;6:97–112. [PMC free article] [PubMed]
  • Pernthaler A, Dekas AE, Brown CT, Goffredi SK, Embaye T, Orphan VJ. Diverse syntrophic partnerships from deep-sea methane vents revealed by direct cell capture and metagenomics. P Natl Acad Sci USA. 2008;105:7052–7057. [PMC free article] [PubMed]
  • Petersen L, Bollback JP, Dimmic M, Hubisz M, Nielsen R. Genes under positive selection in Escherichia coli. Genome Res. 2007;17:1336–1343. [PMC free article] [PubMed]
  • Petty NK, Evans TJ, Fineran PC, Salmond GPC. Biotechnological exploitation of bacteriophage research. Trends Biotechnol. 2007;25:7–15. [PubMed]
  • Piganeau G, Moreau H. Screening the Sargasso Sea metagenome for data to investigate genome evolution in Ostreococcus (Prasinophyceae, Chlorophyta). Gene. 2007;406:184–190. [PubMed]
  • Poinar HN, Schwarz C, Qi J, et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science. 2006;311:392–394. [PubMed]
  • Raes J, Bork P. Molecular eco-systems biology: towards an understanding of community function. Nat Rev Microbiol. 2008;6:693–699. [PubMed]
  • Raghunathan A, Ferguson HR, Jr, Bornarth CJ, Song W, Driscoll M, Lasken RS. Genomic DNA amplification from a single bacterium. Appl Environ Microb. 2005;71:3342–3347. [PMC free article] [PubMed]
  • Ram RJ, VerBerkmoes NC, Thelen MP, et al. Community proteomics of a natural microbial biofilm. Science. 2005;308:1915–1920. [PubMed]
  • Ripp S, Ogunseitan OA, Miller RV. Transduction of a freshwater microbial community by a new Pseudomonas aeruginosa generalized transducing phage, UT1. Mol Ecol. 1994;3:121–126. [PubMed]
  • Robidart JC, Bench SR, Feldman RA, et al. Metabolic versatility of the Riftia pachyptila endosymbiont revealed through metagenomics. Environ Microbiol. 2008;10:727–737. [PubMed]
  • Rocap G, Larimer FW, Lamerdin J, et al. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003;424:1042–1047. [PubMed]
  • Rocha EPC, Smith JM, Hurst LD, Holden MTG, Cooper JE, Smith NH, Feil EJ. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006;239:226–235. [PubMed]
  • Roesch LFW, Fulthorpe RR, Riva A, et al. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 2007;1:283–290. [PMC free article] [PubMed]
  • Rohwer F. Global phage diversity. Cell. 2003;113:141. [PubMed]
  • Ross PR, Morgan S, Hill C. Preservation and fermentation: past, present and future. Int J Food Microbiol. 2002;79:3–16. [PubMed]
  • Rusch DB, Halpern AL, Sutton G, et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol. 2007;5:e77. [PMC free article] [PubMed]
  • Sanger F, Air GM, Barrell BG, et al. Nucleotide sequence of bacteriophage ΦX174 DNA. Nature. 1977;265:687–695. [PubMed]
  • Sano E, Carlson S, Wegley L, Rohwer F. Movement of viruses between biomes. Appl Environ Microb. 2004;70:5842–5846. [PMC free article] [PubMed]
  • Schleper C, DeLong EF, Preston CM, Feldman RA, Wu K-Y, Swanson RV. Genomic analysis reveals chromosomal variation in natural populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J Bacteriol. 1998;180:5003–5009. [PMC free article] [PubMed]
  • Schmeisser C, Stockigt C, Raasch C, et al. Metagenome survey of biofilms in drinking water. Appl Environ Microb. 2003;69:7298–7309. [PMC free article] [PubMed]
  • Schmidt TM, DeLong EF, Pace NR. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing. J Bacteriol. 1991;173:4371–4378. [PMC free article] [PubMed]
  • Schoenfeld T, Patterson M, Richardson PM, Wommack KE, Young M, Mead D. Assembly of viral metagenomes from Yellowstone hot springs. Appl Environ Microb. 2008;74:4164–4174. [PMC free article] [PubMed]
  • Sharon I, Tzahor S, Williamson S, et al. Viral photosynthetic reaction center genes and transcripts in the marine environment. ISME J. 2007;1:492–501. [PubMed]
  • Sherwood C. Prophages and bacterial genomics: what have we learned so far? Mol Microbiol. 2003;49:277–300. [PubMed]
  • Short CM, Suttle CA. Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments. Appl Environ Microb. 2005;71:480–486. [PMC free article] [PubMed]
  • Silander OK, Weinreich DM, Wright KM, O'Keefe KJ, Rang CU, Turner PE, Chao L. Widespread genetic exchange among terrestrial bacteriophages. P Natl Acad Sci USA. 2005;102:19009–19014. [PMC free article] [PubMed]
  • Simmons SL, DiBartolo G, Denef VJ, Goltsman DSA, Thelen MP, Banfield JF. Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation. PLoS Biol. 2008;6:e177. [PMC free article] [PubMed]
  • Snyder JC, Wiedenheft B, Lavin M, et al. Virus movement maintains local virus population diversity. P Natl Acad Sci USA. 2007;104:19102–19107. [PMC free article] [PubMed]
  • Sogin ML, Morrison HG, Huber JA, et al. Microbial diversity in the deep sea and the underexplored “rare biosphere”. P Natl Acad Sci USA. 2006;103:12115–12120. [PMC free article] [PubMed]
  • Sorek R, Kunin V, Hugenholtz P. CRISPR: a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat Rev Microbiol. 2008;6:181–186. [PubMed]
  • Spratt BG, Hanage WP, Feil EJ. The relative contributions of recombination and point mutation to the diversification of bacterial clones. Curr Opin Microbiol. 2001;4:602–606. [PubMed]
  • Stein J, Marsh T, Wu K, Shizuya H, DeLong E. Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J Bacteriol. 1996;178:591–599. [PMC free article] [PubMed]
  • Steward GF, Wikner J, Cochlan WP, Smith DC, Azam F. Estimation of virus predation in the sea: 2. Field results. Mar Microb Food Webs. 1992;6:79–90.
  • Strous M, Pelletier E, Mangenot S, et al. Deciphering the evolution and metabolism of an anammox bacterium from a community genome. Nature. 2006;440:790–794. [PubMed]
  • Sturino JM, Klaenhammer TR. Inhibition of bacteriophage replication in Streptococcus thermophilus by subunit poisoning of primase. Microbiology. 2007;153:3295–3302. [PubMed]
  • Sullivan MB, Waterbury JB, Chisholm SW. Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature. 2003;424:1047–1051. [PubMed]
  • Sutherland IW. Biofilm exopolysaccharides: a strong and sticky framework. Microbiology. 2001;147:3–9. [PubMed]
  • Sutherland IW, Hughes KA, Skillman LC, Tait K. The interaction of phage and biofilms. FEMS Microbiol Lett. 2004;232:1–6. [PubMed]
  • Tettelin H, Masignani V, Cieslewicz MJ, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. P Natl Acad Sci USA. 2005;102:13950–13955. [PMC free article] [PubMed]
  • Thingstad TF, Lignell R. Theoretical models for the control of bacterial growth rate, abundance, diversity and carbon demand. Aquat Microb Ecol. 1997;13:19–27.
  • Thompson JR, Pacocha S, Pharino C, et al. Genotypic diversity within a natural coastal bacterioplankton population. Science. 2005;307:1311–1313. [PubMed]
  • Tringe SG, von Mering C, Kobayashi A, et al. Comparative metagenomics of microbial communities. Science. 2005;308:554–557. [PubMed]
  • Tringe SG, Zhang T, Liu X, et al. The airborne metagenome in an indoor urban environment. PLoS ONE. 2008;3:e1862. [PMC free article] [PubMed]
  • Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444:1027–1031. [PubMed]
  • Tyson GW, Banfield JF. Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ Microbiol. 2008;10:200–207. [PubMed]
  • Tyson GW, Chapman J, Hugenholtz P, et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004;428:37–43. [PubMed]
  • Tyson GW, Lo I, Baker BJ, Allen EE, Hugenholtz P, Banfield JF. Genome-directed isolation of the key nitrogen fixer Leptospirillum ferrodiazotrophum sp. nov. from an acidophilic microbial community. Appl Environ Microb. 2005;71:6319–6324. [PMC free article] [PubMed]
  • Van Valen L. A new evolutionary law. Evol Theor. 1973;1:1–30.
  • Vauterin L, Swings J, Kersters K. Grouping of Xanthomonas campestris pathovars by SDS-PAGE of proteins. J Gen Microbiol. 1991;137:1677–1687.
  • Venter JC, Remington K, Heidelberg JF, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. [PubMed]
  • Vergin KL, Urbach E, Stein JL, DeLong EF, Lanoil BD, Giovannoni SJ. Screening of a fosmid library of marine environmental genomic DNA fragments reveals four clones related to members of the order Planctomycetales. Appl Environ Microb. 1998;64:3075–3078. [PMC free article] [PubMed]
  • Vos M, Didelot X. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 2008. doi:10.1038/ismej.2008.93. [PubMed]
  • Ward DM, Cohan FM, Bhaya D, Heidelberg JF, Kuhl M, Grossman A. Genomics, environmental genomics and the issue of microbial species. Heredity. 2008;100:207–219. [PubMed]
  • Ward N. New directions and interactions in metagenomics research. FEMS Microbiol Ecol. 2006;55:331–338. [PubMed]
  • Warnecke F, Luginbuhl P, Ivanova N, et al. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature. 2007;450:560–565. [PubMed]
  • Wegley L, Edwards R, Rodriguez-Brito B, Liu H, Rohwer F. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ Microbiol. 2007;9:2707–2719. [PubMed]
  • Weitz JS, Hartman H, Levin SA. Coevolutionary arms races between bacteria and bacteriophage. P Natl Acad Sci USA. 2005;102:9535–9540. [PMC free article] [PubMed]
  • Whitaker RJ, Banfield JF. Population genomics in natural microbial communities. Trends Ecol Evol. 2006;21:508–516. [PubMed]
  • Whitaker RJ, Grogan DW, Taylor JW. Recombination shapes the natural population structure of the hyperthermophilic archaeon Sulfolobus islandicus. Mol Biol Evol. 2005;22:2354–2361. [PubMed]
  • Wilhelm LJ, Tripp HJ, Givan SA, Smith DP, Giovannoni SJ. Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data. Biol Direct. 2007;2:27. [PMC free article] [PubMed]
  • Williamson SJ, Rusch DB, Yooseph S, et al. The Sorcerer II Global Ocean Sampling expedition: metagenomic characterization of viruses within aquatic microbial samples. PLoS ONE. 2008;3:e1456. [PMC free article] [PubMed]
  • Wilmes P, Andersson AF, Lefsrud MG, et al. Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal. ISME J. 2008a;2:853–864. [PubMed]
  • Wilmes P, Remis JP, Hwang M, Auer M, Thelen MP, Banfield JF. Natural acidophilic biofilm communities reflect distinct organismal and functional organization. ISME J. 2008b. doi:10.1038/ismej.2008.90. [PubMed]
  • Wilson GG, Murray NE. Restriction and modification systems. Annu Rev Genet. 1991;25:585–627. [PubMed]
  • Wommack KE, Colwell RR. Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol R. 2000;64:69–114. [PMC free article] [PubMed]
  • Wommack KE, Bhavsar J, Ravel J. Metagenomics: read length matters. Appl Environ Microb. 2008;74:1453–1463. [PMC free article] [PubMed]
  • Woyke T, Teeling H, Ivanova NN, et al. Symbiosis insights through metagenomic analysis of a microbial consortium. Nature. 2006;443:950–955. [PubMed]
  • Xiang X, Chen L, Huang X, Luo Y, She Q, Huang L. Sulfolobus tengchongensis spindle-shaped virus STSV1: virus–host interactions and genomic features. J Virol. 2005;79:8677–8686. [PMC free article] [PubMed]
  • Yang Z, Swanson WJ. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol. 2002;19:49–57. [PubMed]
  • Zeidner G, Bielawski JP, Shmoish M, Scanlan DJ, Sabehi G, Béjà O. Potential photosynthesis gene recombination between Prochlorococcus and Synechococcus via viral intermediates. Environ Microbiol. 2005;7:1505–1513. [PubMed]
  • Zhang T, Breitbart M, Lee WH, et al. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol. 2006;4:e3. [PMC free article] [PubMed]