Appendix A Contributed Manuscripts


,1,2 ,1,2 and 1,3.

1 Institute of Applied Genetics, University of North Texas Health Science Center, Fort Worth, TX.
2 Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX.
3 Virginia Tech, National Capital Region, Arlington, VA.

The Challenge

Eliminating the threat of terrorist or criminal attacks with microorganisms or toxin weapons is a continual challenge for biodefense and biosecurity programs. The task is difficult for several reasons: (1) the relative ease of access to a variety of effective source materials (Srivatsan et al., 2008) and options for the delivery of a bioweapon, (2) the minute quantities of materials that can be transferred and yet still be effective, (3) the difficulties in detection and analysis of microbiological evidence, and (4) the lack of well-defined approaches regarding credible inferences that can be made from microbial forensic evidence given extant data. At the onset of an event, it may be difficult to distinguish between a deliberate attack and a naturally occurring outbreak of an infectious disease (Morse and Budowle, 2006; Morse and Khan, 2005). Even if evidence strongly supports the hypothesis of a deliberate attack, it may still be very difficult to attribute the attack with certainty to those responsible (i.e., attribution). Attempts to resolve the crime will require advanced methods for characterizing microbial agents, as well as a combination of traditional investigation and intelligence gathering activities.

The Approach

In response to the need to determine the nature of the threat and the source of the weapon and to identify those who perpetrated the crime, the scientific community rose to the occasion beginning in 1996 and developed the field of microbial forensics. Microbial forensics is the scientific discipline dedicated to analyzing evidence from a bioterrorism act, biocrime, hoax, or inadvertent microorganism/toxin release for attribution purposes (Budowle et al., 2003, 2005a; Köser et al., 2012; Morse and Budowle, 2006). Another goal can be to support analysis of potential bioweapons capabilities for counter-proliferation, treaty verification, and/or interdiction. A forensics investigation initially will attempt to determine the identity of the causal agent and/or source of the bioweapon in much the same manner as in an epidemiological investigation. The epidemiological concerns are identification and characterization of specific disease-causing pathogens or their toxins, their modes of transmission, and any manipulations that may have been performed intentionally to increase their effects against human, animal, or plant targets (Morse and Budowle, 2006; Morse and Khan, 2005). A microbial forensics investigation proceeds further in that evidence is characterized to assist in determining the specific source of the sample, as individualizing as possible, and the methods, means, processes, and locations involved to determine the identity of the perpetrator(s) of the attack or to determine that an act is in preparation. A systems analysis may be able to determine the processes used to generate the weapon or how it was delivered, which also can help inform the investigation and attribution decision. The ultimate goal is attribution—to identify the perpetrator(s) or to reduce the potential perpetrator population to as few individuals as possible so investigative and intelligence methods can be effectively and efficiently applied to “build the case” (Figure A1-1).

FIGURE A1-1. The microbial forensics attribution continuum.

Forensic Targets

Microbial forensic evidence may include the microbe, toxin, nucleic acids, protein signatures, inadvertent microbial contaminants, stabilizers, additives, dispersal devices, and indications of the methods used in a preparation. In addition, traditional types of forensic evidence may be informative and should be part of the toolbox of potential analyses of evidence from an act of bioterrorism or biocrime. Traditional evidence includes fingerprints, body fluids and tissues, hair, fibers, documents, photos, digital evidence, videos, firearms, glass, metals, plastics, paint, powders, explosives, tool marks, and soil. Other types of relevant evidence must be considered to exploit avenues to better achieve attribution, including proteins and chemical signatures. These types of signatures can only be obtained from crimes where the weaponized material or delivery device is found; they have little use in covert attacks where the biological agent is derived from the victims. Many of these methods are based on sound technologies and are complementary. They can be combined to identify signatures of sample growth, processing, and chronometry (Morse and Budowle, 2006). Matching of sample properties can help to establish the relatedness of disparate incidents. Furthermore, mismatches might have exclusionary power or signify a more complex causal relationship between the events under investigation. The results of these analyses can provide information on how, when, and/or where microorganisms were grown and weaponized. While the goal of a microbial forensic analysis is to characterize a sample such that it can be traced to a unique source or at least eliminate other sources, it is unlikely that microbial forensic evidence alone is currently adequate to meet this goal.

Emerging Science and Technology

To enhance attribution capabilities with microbial evidence, considerable attention is being invested in molecular genetics, genomics, and bioinformatics. These fields are essential to microbial species/strain identification, characterization of fine genome variation, virulence determination, pathogenicity characterization, detection of possible genetic engineering, and attaining source attribution to the highest degree possible. The various tools that have been, or are being, developed in these areas will help to narrow the potential sources from which the pathogen used in an attack may have originated. Indeed, sequencing of an entire genome has been demonstrated as feasible in epidemiological investigations, such as the recent studies of outbreaks of E. coli O104:H4 in Germany and cholera in Haiti (Brzuszkiewicz et al., 2011; Chin et al., 2011; Grad et al., 2012; Hasan et al., 2012; Hendriksen et al., 2011; Mellmann et al., 2011; Rasko et al., 2011; Rohde et al., 2011). In addition, metagenomics studies may become foundational for describing diversity and endemicity. Endemicity becomes important when the relationship between microbes or their genetic residues in samples collected from a site of interest and microbes in the environmental background needs to be defined. While the inferential capacity of microbial forensic genetics has yet to reach its full power, the new generations of sequencing technology and the concomitant developments in bioinformatics capabilities to handle and interpret the explosion of data offer potential for enhancing microbial forensic investigations. Indeed, the science and technology supporting microbial forensics are advancing at a remarkable rate. For example, in response to the 2001 anthrax letter attack, whole genomes of a few isolates were sequenced using shotgun sequencing by TIGR (Budowle et al., 2005b; NRC, 2009; Ravel et al., 2009; Read et al., 2002, 2003). That seemingly nominal analysis, by today’s capabilities, cost approximately $250,000 for one genome, took several weeks, and could characterize only a few samples. Today, such enterprises cost a fraction of that amount (and costs continue to drop dramatically), are becoming more automatable, and provide gigabases and terabytes of data in a matter of days (Bentley et al., 2008; Holt et al., 2008; Loman et al., 2012; MacLean et al., 2009; Margulies et al., 2005).

Given the enhanced capabilities of nucleic acid sequencing of microbes, the microbial forensics community will embrace these molecular tools. Although developments are needed, one can envision identification of microbes at the species, strain, and isolate levels being transformed using next- (or better termed “current-”) generation sequencing (CGS). Fine genome detail could become available for routine microbial forensic use. Because CGS provides whole genome characterization capabilities with high depths of coverage (100s to 1,000s fold and beyond), the technology will serve a critical role in research, such as genetic diversity and endemicity studies via metagenomics, and will become a rapid diagnostic tool, initially when viable and culturable microbes are available. Indeed, whole genome sequencing will reduce the need for a priori design of assays directed at defined species. The technology should apply at some resolution level to any genome without knowledge of the target. In addition, whole genome sequencing offers the capability to evaluate a sample for indications of genetic engineering.

Current Realities

However, not all microbial forensic evidence will present itself in a manner where copious quantities of target are available. Some samples will be highly degraded and/or contaminated. Thus, there will be challenges to extract the most information possible from limited materials and non-viable organisms. To meet these challenges, improved sample collection and extraction methods will be needed, nucleic acid repair methods and target amplification strategies (such as whole genome amplification and selective target capture) will be sought, and sequencing chemistries will be enhanced. Because of their high throughput, CGS technologies can analyze multiple samples without even beginning to exploit the full capacity of the systems (Brzuszkiewicz et al., 2011; Cummings et al., 2010; Eisen, 2007; Hasan et al., 2012; Holt et al., 2008; Howden et al., 2011; Loman et al., 2012; MacLean et al., 2009; Relman, 2011; Rohde et al., 2011). However, the technology is still evolving and currently does not offer the sensitivity of detection to analyze low-quantity and low-quality DNA samples without some amplification approach prior to sequencing. Nonetheless, CGS is sufficiently mature to be considered useful for microbial forensic applications. Alternative technologies, such as mass spectrometry analyses of nucleic acids and real-time PCR, will continue to be used because they offer rapid detection (at species and strain levels) at substantially lower costs (Jacob et al., 2012; Kenefic et al., 2008; Sampath et al., 2005, 2009; U’Ren et al., 2005; Vogler et al., 2008).

There are a number of CGS instruments and different chemistries. They include the MiSeq® System and HiSeq™ Sequencing Systems (Illumina, Inc., San Diego, CA), the Ion Personal Genome Machine™ (PGM™) Sequencer, Ion Proton™ Sequencer, and SOLiD® Systems (Life Technologies, Foster City, CA), and the 454 Genome Sequencer FLX and GS Junior Systems (Roche Diagnostics Corporation, Indianapolis, IN) (Bentley et al., 2008; Cummings et al., 2010; Loman et al., 2012; Margulies et al., 2005). In addition, single molecule detection platforms, such as those from Pacific Biosciences (Chin et al., 2011; Eid et al., 2009) and possibly Oxford Nanopore (Branton et al., 2008), are on the horizon. Each system offers some advantages and limitations for sequencing that will need to be defined with consideration of library preparation, read length, and accuracy. The evaluations should be based on the needs of application-oriented laboratories and not necessarily those of a research laboratory. Initially, microbial forensics instruments will be maintained in controlled laboratory environments.

Library preparation is one of the critical limiting factors for transferring CGS technology from a research environment to that of an operational laboratory. Currently, only a few samples can be prepared at any given time. Thus, while the sequencing throughput of the platforms is high, a sufficient number of samples cannot be readily prepared in an appropriate amount of time to meet the full capacity of the system. Library preparation needs to be simplified. Haloplex (Agilent, Santa Clara, CA) is an example of a library preparation process that potentially can reduce the preparation work required. This library preparation approach is a single-tube target amplification methodology that enables a large number of library samples to be prepared manually. The general process is: (1) restriction digest and denature the sample; (2) hybridize probes to targeted ends of the digested fragments; (3) circularize and ligate the molecules; and (4) introduce bar codes and amplify the targets by polymerase chain reaction (PCR). Eventually, with automation, the process might accommodate the number of samples that may be encountered by high-throughput operational laboratories. As many as 96 bar codes are available, which fits well with the 96-well format, and the process reduces the preparation time from several days or even 2 weeks to 6 hours. However, Haloplex currently is not available for use with non-human nucleic acids. One constraint is that the Haloplex system employs restriction digestion of the DNA. The restriction enzymes can potentially cleave a target site of interest (either a single nucleotide polymorphism (SNP) site or a site within a repeat motif) and render the marker untypable. Unfortunately, the enzymes used in Haloplex are proprietary, and one cannot readily scan for the restriction sites that would be incompatible with the designated targets (although palindromes can be sought for potential sites that may be obliterated). Another strategy for simplifying library preparation and decreasing sample input is that of the Nextera XT DNA Sample Preparation Kit. Strategies such as the Haloplex system and the Nextera XT DNA kit hold promise for simplifying and possibly automating library preparation.
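The palindrome search mentioned in this paragraph can be sketched in a few lines of Python. Because the Haloplex enzymes are proprietary, the sketch models no actual enzyme; it only flags even-length reverse-complement palindromes (the usual form of a type II restriction recognition site) that span a marker position. The function names, the 4 to 8 base window, and the example sequence are illustrative assumptions and are not part of any vendor workflow.

```python
# Illustrative sketch only: helper names and the example sequence are hypothetical,
# and no real (proprietary) enzyme recognition sites are modeled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_sites(seq: str, min_len: int = 4, max_len: int = 8):
    """Yield (start, end, site) for each even-length reverse-complement palindrome."""
    seq = seq.upper()
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            site = seq[start:start + length]
            # Skip windows containing ambiguous bases such as N.
            if set(site) <= set("ACGT") and site == reverse_complement(site):
                yield start, start + length, site


def sites_overlapping(seq: str, marker_pos: int, **kwargs):
    """Return palindromes whose half-open span [start, end) covers a 0-based marker position."""
    return [s for s in palindromic_sites(seq, **kwargs) if s[0] <= marker_pos < s[1]]


example = "TTAGGAATTCCGTA"  # made-up sequence that contains GAATTC
hits = sites_overlapping(example, marker_pos=6)
for start, end, site in hits:
    print(f"candidate site {site} at {start}-{end} overlaps the marker")
```

A marker flagged this way is not necessarily lost; the flag only indicates that a palindromic site, and therefore a possible cut site, coincides with the target and merits closer review.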

Another factor to consider with CGS technology is sequencing read length and accuracy. Current read lengths for the most widely used CGS instruments typically do not exceed 200 bases, and when they do, the quality of base calling decreases substantially along the length of a read. Longer reads with higher accuracy are necessary. Advances in technology for some platform systems suggest that reads up to 400 bases will be feasible in 2012.

Another consideration in platform selection arises in situations where rapid responses are required (such as in military operations, some pandemics, and bioterrorism acts). Initially, platforms will be placed in laboratories with controlled environments. One can envision the technology being taken to the field for immediate response and exigent circumstances. Robustness of the instrumentation, supply lines of reagents, and service support will be part of the decision process for the instrumentation/chemistry of choice. Fortunately, the technology and supporting interpretation tools continue to evolve and likely will become more robust.

Seeking More Power and Depth

For design and selection of systems and diagnostics, different diagnostic-based strategies can be considered. They can be based on the sample type, the sample matrix, the amount of work, or the question that one is attempting to address. The latter may be the best suited for conceiving workflow systems. The different scenarios in which nucleic acid analyses may be applied should be considered, because these will help define the needs of the microbial forensic community. They likely are (1) identification of species/strain (i.e., similar to epidemiological needs), (2) attribution, (3) genetic engineering, (4) sample-to-sample comparisons, and (5) metagenomics for endemicity (or a modified metagenomics for sample characterization) (Figure A1-2).

FIGURE A1-2. A general overview of the work and information flow from sample to analysis to information developed based on use of second-generation sequencing technology.

Sample identification generally would be direct characterization to identify the agent for immediate determination of potential threat and probable cause to investigate further. The process of attribution would drill down to the finest resolution possible and make comparisons to other reference samples, databases, or repositories to reduce the possible sources from which the sample originated or to a recent common ancestor. Genetic engineering could be detected by whole genome sequencing.

Metagenomics studies have been performed on several platforms, and they will likely provide some foundational data on diversity and endemicity (Eisen, 2007; Relman, 2011; Tringe et al., 2005). The value could be searching various niches for select agents. Suppose that in every sample tested certain select agents are identified. Then there can be two consequences: one is that it may be more difficult to elucidate natural outbreaks versus intentional releases (although strain resolution may reduce the uncertainty); the second could be that such high resolution may be less informative at some threshold depth of coverage.

Most metagenomic work to date has exploited a small, single sequence target (16S rRNA) at a very high depth of coverage (Rusch et al., 2007; Venter et al., 2004). These studies often cannot provide resolution beyond family to genus levels. Clearly such broad range definition will not enable individualization or identify select agents. The anthrax investigation could have benefited from a modified metagenomics characterization. The putative common source of the material (RMR1029) was composed of a population of very similar cells. The colony morphological variants found in the evidence from the 2001 anthrax letter attacks were minority components, and because of sample preparation and stochastic effects, the minor variants potentially could be difficult to detect with the PCR-based assays that were developed for the investigation. Because of the high depth of coverage with CGS, the population of low-level variants may be more readily detected, especially if an amplification enrichment step were included that focused only on the known variant sites that defined the morphology types. Such high depth of coverage would substantially reduce the false-positive rate and improve confidence in the potential relationship of the most similar samples to focus investigative leads (Cummings et al., 2010). Indeed, the depth of coverage could be in the millions. While the approach is exquisitely sensitive, platform- and chemistry-specific errors may confound interpretation, and thus thresholds of reliability may need to be invoked.
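The interplay between depth of coverage, minority-variant detection, and reliability thresholds can be illustrated with a simple binomial model. The sketch below is not the approach used in the anthrax investigation. It assumes that reads at a site are independent draws, and the variant fraction, per-base error rate, and minimum read count are hypothetical inputs chosen only for illustration. Platform- and chemistry-specific error structure is ignored.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed from the complementary lower tail."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def detection_summary(depth: int, variant_fraction: float,
                      error_rate: float, min_reads: int) -> dict:
    """Compare the chance of calling a true minor variant with the chance of a
    call arising from sequencing error alone (all inputs are assumptions)."""
    return {
        "p_detect": prob_at_least(min_reads, depth, variant_fraction),
        "p_false_call": prob_at_least(min_reads, depth, error_rate),
    }


# Hypothetical scenario: a variant present in 1% of cells, a 0.1% per-base
# error rate, and a rule requiring at least 5 supporting reads.
low = detection_summary(depth=100, variant_fraction=0.01, error_rate=0.001, min_reads=5)
high = detection_summary(depth=1000, variant_fraction=0.01, error_rate=0.001, min_reads=5)
print(low)
print(high)
```

With a fixed read-count rule, deeper coverage raises both the chance of detecting a true minor variant and the chance of an error-only call, which is one way to see why the threshold has to be set in light of the platform's error rate. For depths in the millions, a log-space calculation or a statistics library would be a better choice than this direct summation.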

One could envision extending this population depth analysis, which in essence is a simplified metagenomic analysis, and exploiting a multi-locus sequence typing (MLST) approach to provide a species-level identification capability (Maiden et al., 1998; Spratt, 1999). A few loci (perhaps between the seven typically used in MLST and 15) could be selected as a standard (e.g., for bacteria). If there is a combination of sufficiently stable sites and evolutionarily rapid sites, the loci could indicate species- to strain-level presence in mixed and metagenomic samples. Using the core seven loci used for MLST could allow some questions to be addressed regarding time and place of isolation, host or niche, serotype, and some clinical or drug resistance profiles. This will not be a trivial process because the sites will not be physically linked. However, one could determine, if the complete set or a reasonable subset of targets is in a sample, whether there is confidence that a particular species or set of species is present. In theory this approach could be extended to strain levels. There certainly is enough throughput to consider this capability. The potential already has been established with electrospray ionization mass spectrometry of targeted genes for rapid bacterial species identification (and even for viruses such as influenza). There are sufficient bacterial genomes that have been sequenced to test our hypothesis, and work is under way.
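The idea of judging species presence from the complete set or a reasonable subset of targets can be pictured as a simple panel check. The following Python sketch is hypothetical throughout: the species labels, locus identifiers, and the 80 percent threshold are placeholders, and a real scheme would have to account for the fact that the loci are not physically linked and may derive from different organisms in a mixed sample.

```python
def species_support(panel: dict, detected_loci: set, min_fraction: float = 0.8) -> dict:
    """For each species, report which panel loci were observed in the sample and
    whether the observed fraction meets an (assumed) support threshold."""
    calls = {}
    for species, loci in panel.items():
        found = loci & detected_loci
        fraction = len(found) / len(loci)
        calls[species] = {
            "found": sorted(found),
            "fraction": fraction,
            # Meeting the threshold suggests, but does not prove, presence:
            # unlinked loci could come from several different organisms.
            "supported": fraction >= min_fraction,
        }
    return calls


# Hypothetical seven-locus panels for two placeholder species.
panel = {
    "species_A": {f"A_locus{i}" for i in range(1, 8)},
    "species_B": {f"B_locus{i}" for i in range(1, 8)},
}
detected = {"A_locus1", "A_locus2", "A_locus3", "A_locus4",
            "A_locus5", "A_locus6", "B_locus2"}

calls = species_support(panel, detected)
for species, call in calls.items():
    print(species, round(call["fraction"], 2), call["supported"])
```

The threshold would need to be set empirically, for example by testing the panel against already sequenced bacterial genomes.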

Inferences about the significance of genetic evidence may not reach the ultimate goal of attribution. The most confounding constraint on reaching the full power of attribution is scant data on diversity and endemicity. The vast diversity of the microbial world is unknown and will not be defined substantially with current approaches in the area where a biocrime or bioterrorist attack has occurred. This limitation is not the sole purview of the microbial forensics community; it plagues epidemiologists as well. Another likely limitation is that evidentiary samples will have an unknown history. Lack of knowledge about how a sample was manipulated (e.g., number of passages, exposure to mutagenic agents, length of storage) will complicate inferences about the significance or strength of sequencing results, especially because the distance between samples will be determined by the degree of similarity or dissimilarity. Indeed, even defining what is a “match” or “similar” may not be straightforward. Keim (personal communication) has stressed this uncertainty and proffered a new term for the microbial forensic lexicon, a “member,” based on phylogenetics, to describe the relationship of a sample to some reference samples. Regardless of the terminology used, some data will be needed to define the uncertainty of a “membership” or “association.” In 2006, the need for reconciliation between microbial genomics and systematics was described; microbial forensics and epidemiology were seen to offer useful, practical venues to frame the gaps and priorities (Buckley and Roberts, 2006). This challenge remains.

Some assessment of the strength or significance of an analytical result and subsequent comparison also is needed (Budowle et al., 2008; Chakraborty and Budowle, 2011). Of course, because of scant supporting data, such an endeavor will be challenging. Qualitative and/or quantitative statements of the significance of the finding will need to be developed. As an example, consider a forensic analysis of whole genome sequence data that compared two or more sequences, such as an evidence sample profile with that of a reference sample that may be considered a possible direct link or have a common ancestor. The evolutionary rates of the variants will need to be known. But perhaps as consequential, sequencing error and other factors could inflate the dissimilarity between samples and add a degree of “uncertainty” to some extent. Thus, efforts in defining and quantifying the error rates associated with each CGS platform and chemistry are critically important.
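How sequencing error can inflate the apparent dissimilarity between two samples can be shown with a back-of-the-envelope calculation. The sketch below assumes two already aligned consensus sequences of equal length, an independent and identical residual error rate per consensus base in each sequence, and illustrative numbers for genome length and error rate. None of these values comes from a specific platform.

```python
def observed_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def expected_spurious_differences(length: int, consensus_error_rate: float) -> float:
    """Approximate expected number of differences caused by error alone.

    A site appears different if at least one of the two consensus calls is wrong
    (the rare case of both erring to the same base is ignored), so the per-site
    probability is 1 - (1 - e)^2, roughly 2e for small e.
    """
    return length * (1 - (1 - consensus_error_rate) ** 2)


# Illustrative values: a 5-megabase genome and a residual consensus error rate
# of one in a million bases (both assumed).
spurious = expected_spurious_differences(5_000_000, 1e-6)
print(f"expected error-driven differences: {spurious:.1f}")
print(observed_differences("ACGTACGT", "ACGTACGA"))
```

Under these assumed values, error alone would be expected to contribute about ten differences across the genome, which could matter when samples are separated by only a handful of true variants. This is why platform- and chemistry-specific error rates need to be measured rather than assumed.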

Beyond comparison of samples for identification purposes are inferences, drawn from whole genome sequencing, about the phenotypic (i.e., functional) properties of a microbe. For example, even with a whole genome sequence, the ability to predict whether a microbe phenotypically displays antimicrobial resistance or susceptibility is still limited. Bacteria may contain multiple pathways, and how the different genes interact is far from being completely understood (Eisen, 2007; Köser et al., 2012; Relman, 2011). Substantial research will be needed before genotype can be used reliably to predict phenotype.

Making Sense of Data

The ever-increasing amount of microbial genomic sequence data presents a variety of challenges related to the handling and storage of data and the development of bioinformatics methods that can accommodate such large numbers of whole genomes. Being able to analyze the vast amounts of data in a timely fashion is a key challenge to leveraging the power of these newer sequencing platforms. Software, hardware, and IT support may be the greatest barrier to use of CGS technology. It is unlikely that dedicated bioinformaticists will reside in every microbial forensics laboratory. Data cannot be sent to web-based clouds for analysis because the results may be classified. Instead, some standardization and standard operating approaches to data analysis and interpretation will be needed. Pipeline and interpretation software will need to be evaluated for reliability and for seamless diagnostic flow without intervention by a bioinformatics expert. The output of results must be intelligible to the microbial forensic analyst as well. The ideal software would be a comprehensive tool (or set of tools) that spans the workflow from microbe detection to determination of genetic engineering.

The government should rely heavily on industry and well-established genome centers. The commercial competitive environment is driving down costs and improving informatics pipelines without the need for extensive investment. Leveraging these efforts will help meet the needs of microbial forensics more expeditiously than going it alone. The centers (to include the national laboratories) are evaluating platforms and chemistries and are generating data at unprecedented levels. They are providing solutions to massive data handling, including storage, curation of reference data, annotation, and data analysis.

Collections and databases are needed to house the microbial genomic data and, when possible, the accompanying meta-data. No standards yet exist for building databases to meet the needs of the microbial forensic community. Requirements for storage and retrieval of raw sequence data in microbial forensics cases and supporting inferential data must be developed. Given the high throughput and anticipated speed of analyses, it is conceivable that meaningful databases can be developed “on the fly” that better reflect the diversity where the crime was committed (from the preparation laboratory to the crime scene).

The power of the microbial forensics techniques, tools, software, and databases that are used needs to be understood, and their limitations even more so. To achieve this goal, methods need to be validated, and validation should be a requisite of any forensic repertoire. Indeed, the forensic sciences in general are facing well-deserved criticism for not necessarily having sound foundations and for overstating the strength of the evidence (NRC, 2009). Attempts to attribute any attack to a person(s) or group should rely on accurate and credible results. The interpretation of such results might seriously impact the course or focus of an investigation, thus affecting the liberties of individuals or even being used as a justification for a government’s military response to an attack or threat of an attack. Therefore, the methods for collection, extraction, and analysis of microbial evidence that could generate key results need to be as scientifically robust as possible, so that the methods can be high performing and the results defensible to decision makers; to the legal, international government, law enforcement, and scientific communities; and under scrutiny by the media.

Validation Is Essential

Validation is frequently used to connote confidence in a test or process, but it may be better thought of as defining the limitation of a method, process, or assay (Budowle et al., 2003, 2006, 2008). It still is common for the term validation to be used vaguely or to remain undefined when applied to process performance evaluation. The degree of validation varies from nominal to rigorous. The consequences of such varied requirements can be catastrophic if methods used in microbial forensic investigations are poorly constructed, under-developed, or generate results that are difficult to interpret. The validation process needs to be defined as to what is expected to be achieved by a validation study.

Validation determines the limits of a test. It does not mean that a test must be 100 percent accurate or have no cross-reactivity, false-positive results, or false-negative results to be considered useful. It is often thought of as a process applied to the analytical portion of a system. This concept is only partly correct. The limits that the methods can provide must be demonstrated and documented for all steps of the process to include sample collection, preservation, extraction, analytical characterization, and data interpretation. Furthermore, it is recognized that as new technologies and capabilities are developed to address the needs of the microbial forensics community, key principles and performance parameters including accuracy, precision, bias, reliability, sensitivity, and robustness will need to be determined. Robust quality assurance and data control systems are required to achieve confidence in results by diverse users of the information. It is imperative that both technical and interpretation limitations (and thus accuracy and error) be defined. Additionally, a key resource for microbial forensic research, validation, and analysis is access to well-defined and curated microbial collections and data sets that are as comprehensive as is possible to the task. This effort includes the structure, content, and quality of the data sets. While some collections have been started for use in research, or created for case-specific use, no comprehensive repository exists to support microbial forensics, and standards are not codified for meta-data and data curation.
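Several of the performance parameters named in this paragraph can be estimated from a validation study in which the true status of each sample is known. The Python sketch below shows one conventional way to summarize such counts. The class and function names and the example counts are hypothetical, and a real validation would also report uncertainty (for example, confidence intervals) and cover the collection, extraction, and interpretation steps, not only the analytical result.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Outcome counts from testing samples of known status (hypothetical study)."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def performance(c: ValidationCounts) -> dict:
    """Summarize standard performance parameters from known-status test results."""
    positives = c.true_pos + c.false_neg   # samples that truly contain the target
    negatives = c.true_neg + c.false_pos   # samples that truly lack the target
    total = positives + negatives
    return {
        "sensitivity": c.true_pos / positives,
        "specificity": c.true_neg / negatives,
        "false_positive_rate": c.false_pos / negatives,
        "false_negative_rate": c.false_neg / positives,
        "accuracy": (c.true_pos + c.true_neg) / total,
    }


# Made-up counts: 100 known positives and 100 known negatives.
counts = ValidationCounts(true_pos=95, false_pos=2, true_neg=98, false_neg=5)
summary = performance(counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A method with nonzero false-positive and false-negative rates can still be useful, provided those rates are documented and conveyed with the result.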

The implications of highly technical data, epidemiological data, traditional evidence data, and investigative or intelligence information are complex and need to be appreciated for their strengths and limitations. Because scientific data can affect the decision-making process for retaliation, preemptive actions, and/or courtroom deliberations, it is imperative that those directly involved in microbial forensics or those who may use the results for investigative lead value or more direct associations be properly educated (or at least properly apprised) of the implications of such data. To meet this necessary goal, education and training are critical to disseminate the principles, development, and applications of the evolving field of microbial forensics. Educational strategies and programs need to be constructed and training programs developed on the varied scientific foundations that support microbial forensics.

If validation processes are not defined and not followed and proper training or communication is not provided, then it is possible that a false sense of confidence may be associated with a poor method or process or from a result of limited significance. There are myriad methods, processes, targets, platforms, and applications. Yet some basic requirements transcend individual differences in methods, and these can be reinforced by contextual description (Table A1-1). Validation needs to be codified. Efforts are under way and should be applied equally across the user space.

TABLE A1-1. Validation Criteria List.




Microbial forensics should embrace and validate newly developed and emerging molecular biology technologies and phylogenetics approaches, and pursue potential forensic information and comparative sources, such as might be achieved through metagenomics. Genetic analyses of microorganisms often are a powerful tool for differentiating species, isolates, and strains. Similar to human DNA forensic identification, DNA sequences of microorganisms can be used to identify and differentiate between isolates and strains of a single microbial species; however, nucleic acid–based identification is not as resolving with respect to source attribution in microbial forensics as with human DNA forensic analysis. The basic constituents of nucleic acids essentially are the same for bacteria and humans; however, unlike humans, bacteria, viruses, and fungi multiply rapidly in a clonal fashion and can readily share or exchange genetic material between and among species. These differences and uncertainties due to scant supporting data must be taken into consideration during analysis, interpretation, and reporting related to the findings derived from microbial genetic evidence. For the foreseeable future the ability of microbial forensics to establish that a sample collected from either a crime scene or a person of interest can be attributed to a known source to a high degree of scientific certainty will be limited. Therefore, the methods must be reliable and robust, and the uncertainty associated with any interpretation should be properly conveyed.

Microbial forensics experts and those who contribute in closely related fields need to work together to advance the science, to validate methods to scientific and legal standards, and to transition interpretation of results and conclusions from such analyses into something that can be used by the criminal justice system, the policy community, and other stakeholders. It is incumbent upon the microbial forensics community to make every effort to interpret and communicate objectively and effectively the advantages and limitations of both microbial forensics and traditional forensic science analyses. Consumers of microbial forensic information who incorporate this evidence into decision making should be provided accurate, reliable, credible, and defensible results, interpretations, and context.


  • Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Cheetham RK, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Catenazzi MCE, Chang S, Cooley RN, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Furey WS, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang G.-D, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ng BL, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Pinkard DC, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, 
Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–59. [PMC free article: PMC2581791] [PubMed: 18987734]
  • Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S, Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS, Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA. The potential and challenges of nanopore sequencing. Nature Biotechnology. 2008;26:1146–1153. [PMC free article: PMC2683588] [PubMed: 18846088]
  • Brzuszkiewicz E, Thurmer A, Schuldes J, Leimbach A, Liesegang H, Meyer FD, Boelter J, Petersen H, Gottschalk G, Daniel R. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Enteroaggregative-haemorrhagic Escherichia coli (EAHEC) Archives of Microbiology. 2011;193:883–891. [PMC free article: PMC3219860] [PubMed: 21713444]
  • Buckley M, Roberts RJ. Report of a Colloquium of the American Academy of Microbiology. Washington, DC: ASM Press; 2006. Reconciling microbial systematics and genomics.
  • Budowle B, Schutzer SE, Einseln A, Kelley LC, Walsh AC, Smith JA, Marrone BL, Robertson J, Campos J. Building microbial forensics as a response to bio-terrorism. Science. 2003;301:1852–1853. [PubMed: 14512607]
  • Budowle B, Schutzer SE, Ascher MS, Atlas RM, Burans JP, Chakraborty R, Dunn JJ, Fraser CM, Franz DR, Leighton TJ, Morse SA, Murch RS, Ravel J, Rock DL, Slezak TR, Velsko SP, Walsh AC, Walters RA. Toward a system of microbial forensics: From sample collection to interpretation of evidence. Applied and Environmental Microbiology. 2005;71:2209–2213. [PMC free article: PMC1087589] [PubMed: 15870301]
  • Budowle B, Johnson MD, Fraser CM, Leighton TJ, Murch RS, Chakraborty R. Genetic analysis and attribution of microbial forensics evidence. Critical Reviews in Microbiology. 2005;31(4):233–254. [PubMed: 16417203]
  • Budowle B, Schutzer SE, Burans JP, Beecher DJ, Cebula TA, Chakraborty R, Cobb WT, Fletcher J, Hale ML, Harris RB, Heitkamp MA, Keller FP, Kuske C, LeClerc JE, Marrone BL, McKenna TS, Morse SA, Rodriguez LL, Valentine NB, Yadev J. Quality sample collection, handling, and preservation for an effective microbial forensics program. Applied and Environmental Microbiology. 2006;72(10):6431–6438. [PMC free article: PMC1610269] [PubMed: 17021190]
  • Budowle B, Schutzer SE, Morse SA, Martinez KF, Chakraborty R, Marrone BL, Messenger SL, Murch RS, Jackson PJ, Williamson P, Harmon R, Velsko SP. Criteria for validation of methods in microbial forensics. Applied and Environmental Microbiology. 2008;74:5599–5607. [PMC free article: PMC2547046] [PubMed: 18658281]
  • Chakraborty R, Budowle B. Population genetic considerations in statistical interpretation of microbial forensic data in comparison with the human DNA forensic standard. In: Budowle B, Schutzer SE, Breeze R, Keim PS, Morse SA, editors. Microbial Forensics. 2nd ed. Amsterdam: Academic Press; 2011. pp. 561–580.
  • Chin CS, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, Paxinos EE, Yamaichi Y, Calderwood SB, Mekalanos JJ, Schadt EE, Waldor MK. The origin of the Haitian cholera outbreak strain. New England Journal of Medicine. 2011;364:33–42. [PMC free article: PMC3030187] [PubMed: 21142692]
  • Cummings CA, Bormann-Chung CA, Fang R, Barker M, Brzoska P, Williamson PC, Beaudry J, Matthews M, Schupp J, Wagner DM, Birdsell D, Vogler AJ, Furtado MR, Keim P, Budowle B. Accurate, rapid, and high-throughput detection of strain-specific polymorphisms in Bacillus anthracis and Yersinia pestis by next-generation sequencing. BMC Investigative Genetics. 2010;1:5. [PMC free article: PMC2988479] [PubMed: 21092340]
  • Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, deWinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T, Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, Zhao P, Zhong F, Korlach J, Turner S. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. [PubMed: 19023044]
  • Eisen JA. Environmental shotgun sequencing: Its potential and challenges for studying the hidden world of microbes. PLoS Biology. 2007;5(3):e82. [PMC free article: PMC1821061] [PubMed: 17355177]
  • Grad YH, Lipsitch M, Feldgarden M, Arachchi HM, Cerqueira GC, Fitzgerald M, Godfrey P, Haas BJ, Murphy CI, Russ C, Sykes S, Walker BJ, Wortman JR, Young S, Zeng Q, Abouelleil A, Bochicchio J, Chauvin S, DeSmet T, Gujja S, McCowan C, Montmayeur A, Steelman S, Frimodt-Møller J, Petersen AM, Struve C, Krogfelt KA, Bingen E, Weill FX, Lander ES, Nusbaum C, Birren BW, Hung DT, Hanage WP. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proceedings of the National Academy of Sciences USA. 2012;109:3065–3070. [PMC free article: PMC3286951] [PubMed: 22315421]
  • Hasan NA, Choi SY, Eppinger M, Clark PW, Chen A, Alam M, Haley BJ, Taviani E, Hine E, Su Q, Tallon LJ, Prosper JB, Furth K, Hog MM, Li H, Fraser-Liggett CM, Cravioto A, Hug A, Ravel J, Cebula TA, Colwell RR. Genomic diversity of 2010 Haitian cholera outbreak strains. Proceedings of the National Academy of Sciences USA. 2012;109(29):E2010–E2017. [PMC free article: PMC3406840] [PubMed: 22711841]
  • Hendriksen RS, Price LB, Schupp JM, Gillece JD, Kaas RS, Engelthaler DM, Bortolaia V, Pearson T, Waters AE, Upadhyay BP, Shrestha SD, Adhikai S, Shakya G, Keim PS, Aarestrup FM. Population genetics of Vibrio cholerae from Nepal in 2010: Evidence on the origin of the Haitian outbreak. MBio. 2011;2(4):e00157-11. [PMC free article: PMC3163938] [PubMed: 21862630]
  • Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G. High-throughput sequencing provides insights into genome variation and evolution in Salmonella typhi. Nature Genetics. 2008;40:987–993. [PMC free article: PMC2652037] [PubMed: 18660809]
  • Howden BP, McEvoy CRE, Allen DL, Chua K, Gao W, Harrison PF, Bell J, Coombs G, Bennett-Wood V, Porter JL, Robins-Browne R, Davies JK, Seemann T, Stinear TP. Evolution of multidrug resistance during Staphylococcus aureus infection involves mutation of the essential two component regulator WalKR. PLoS Pathogens. 2011;7(11):e1002359. [PMC free article: PMC3213104] [PubMed: 22102812]
  • Jacob D, Sauer U, Housley R, Washington C, Sannes-Lowery K, Ecker DJ, Sampath R, Grunow R. Rapid and high-throughput detection of highly pathogenic bacteria by Ibis PLEX-ID technology. PLoS One. 2012;7(6):e39928. [PMC free article: PMC3386907] [PubMed: 22768173]
  • Kenefic LJ, Beaudry J, Trim C, Daly R, Parmar R, Zanecki S, Huynh L, Van Ert MN, Wagner DM, Graham T, Keim P. High resolution genotyping of Bacillus anthracis outbreak strains using four highly mutable single nucleotide repeat markers. Letters in Applied Microbiology. 2008;46:600–603. [PubMed: 18363651]
  • Köser CU, Ellington MJ, Cartwright EJ, Gillespie SH, Brown NM, Farrington M, Holden MTG, Dougan G, Bentley SD, Parkhill J, Peacock SJ. Routine use of microbial whole genome sequencing in diagnostic and public health microbiology. PLoS Pathogens. 2012;8(8):e1002824. [PMC free article: PMC3410874] [PubMed: 22876174]
  • Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology. 2012;30(5):434–439. [PubMed: 22522955]
  • MacLean D, Jones JD, Studholme DJ. Application of “next-generation” sequencing technologies to microbial genetics. Nature Reviews Microbiology. 2009;7(4):287–296. [PubMed: 19287448]
  • Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG. Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences USA. 1998;95:3140–3145. [PMC free article: PMC19708] [PubMed: 9501229]
  • Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. [PMC free article: PMC1464427] [PubMed: 16056220]
  • Mellmann A, Harmsen D, Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski R, Ji Y, Zhang W, McLaughlin SF, Henkhaus JK, Leopold B, Bielaszewska M, Prager R, Brzoska PM, Moore RL, Guenther S, Rothberg JM, Karch H. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next-generation sequencing technology. PLoS One. 2011;6(7):e22751. [PMC free article: PMC3140518] [PubMed: 21799941]
  • Morse SA, Budowle B. Microbial forensics: Application to bioterrorism preparedness and response. Infectious Disease Clinics of North America. 2006;20:455–473. [PubMed: 16762747]
  • Morse SA, Khan AS. Epidemiologic investigation for public health, biodefense, and forensic microbiology. In: Breeze R, Budowle B, Schutzer S, editors. Microbial Forensics. Amsterdam: Academic Press; 2005. pp. 157–171.
  • NRC (National Research Council) Strengthening forensic science in the United States: A path forward. Washington, DC: The National Academies Press; 2009.
  • Rasko DA, Worsham PL, Abshire TG, Stanley ST, Bannan JD, Wilson MR, Langham RJ, Decker RS, Jiang L, Read TD, Phillippy AM, Salzberg SL, Pop M, Van Ert MN, Kenefic LJ, Keim PS, Fraser-Liggett CM, Ravel J. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proceedings of the National Academy of Sciences USA. 2011;108(12):5027–5032. [PMC free article: PMC3064363] [PubMed: 21383169]
  • Ravel J, Jiang L, Stanley ST, Wilson MR, Decker RS, Read TD, Worsham P, Keim PS, Salzberg SL, Liggett CM, Rasko DA. The complete genome sequence of Bacillus anthracis Ames “Ancestor.” Journal of Bacteriology. 2009;191:445–446. [PMC free article: PMC2612425] [PubMed: 18952800]
  • Read TD, Salzberg SL, Pop M, Shumway M, Umayam L, Jiang L, Holtzapple E, Busch JD, Smith KL, Schupp JM, Solomon D, Keim P, Fraser CM. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science. 2002;296:2028–2033. [PubMed: 12004073]
  • Read TD, Peterson SN, Tourasse N, Baillie LW, Paulsen IT, Nelson KE, Tettelin H, Fouts DE, Eisen JA, Gill SR, Holtzapple EK, Okstad OA, Helgason E, Rilstone J, Wu M, Kolonay JF, Beanman MJ, Dodson RJ, Brinkac LM, Gwinn M, DeBoy RT, Madpu R, Daugherty SC, Durkin AS, Haft DH, Nelson WC, Peterson JD, Pop M, Khouri HM, Radune D, Benton JL, Mahamoud Y, Jiang L, Hance IR, Wiedman JF, Berry KJ, Plaut RD, Wolf AM, Watkins KL, Nierman WC, Hazen A, Cline R, Redmond C, Thwaite JE, White O, Salzberg SL, Thomason B, Friedlander AM, Koehler TM, Hanna PC, Kolstø AB, Fraser CM. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature. 2003;423:81–86. [PubMed: 12721629]
  • Relman DA. Microbial genomics and infectious diseases. New England Journal of Medicine. 2011;365:347–357. [PMC free article: PMC3412127] [PubMed: 21793746]
  • Rohde H, Qin J, Cui Y, Li D, Loman NJ, Hentschke M, Chen W, Pu F, Peng Y, Li J, Xi F, Li S, Li Y, Zhang Z, Yang X, Zhao M, Wang P, Guan Y, Cen Z, Zhao X, Christner M, Kobbe R, Loos S, Oh J, Yang L, Danchin A, Gao GF, Song Y, Li Y, Yang H, Wang J, Xu J, Pallen MJ, Wang J, Aepfelbacher M, Yang R. E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium 2011. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. New England Journal of Medicine. 2011;365(8):718–724. [PubMed: 21793736]
  • Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II global ocean sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biology. 2007;5:e77. [PMC free article: PMC1821060] [PubMed: 17355176]
  • Sampath R, Mulholland N, Blyn LB, Eshoo MW, Hall TA, Massire C, Levene HM, Hannis JC, Harrell PM, Neuman B, Buchmeier MJ, Jiang Y, Ranken R, Drader JJ, Samant V, Griffey RH, McNeil JA, Crooke ST, Ecker DJ. Rapid identification of emerging pathogens: Coronavirus. Emerging Infectious Diseases. 2005;11:373–379. [PMC free article: PMC3298233] [PubMed: 15757550]
  • Sampath R, Mulholland N, Blyn LB, Massire C, Whitehouse CA, Waybright N, Harter C, Bogan J, Miranda MS, Smith D, Baldwin C, Wolcott M, Norwood D, Kreft R, Frinder M, Lovari R, Yasuda I, Matthews H, Toleno D, Housley R, Duncan D, Li F, Warren R, Eshoo MW, Hall TA, Hofstadler SA, Ecker DJ. Comprehensive biothreat cluster identification by PCR/electrospray-ionization mass spectrometry. Nature Reviews Microbiology. 2009;7(4):287–296. [PMC free article: PMC3387173] [PubMed: 22768032]
  • Spratt BG. Multilocus sequence typing: Molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the Internet. Current Opinion in Microbiology. 1999;2:312–316. [PubMed: 10383857]
  • Srivatsan A, Han Y, Peng J, Tehranchi AK, Gibbs R, Wang JD, Chen R. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genetics. 2008;4(8):e1000139. [PMC free article: PMC2474695] [PubMed: 18670626]
  • Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM. Comparative metagenomics of microbial communities. Science. 2005;308:554–557. [PubMed: 15845853]
  • U’ren JM, Vant MN, Schupp JM, Easterday WR, Simonson TS, Okinaka RT, Pearson T, Keim P. Use of a real-time PCR TaqMan assay for rapid identification and differentiation of Burkholderia pseudomallei and Burkholderia mallei. Journal of Clinical Microbiology. 2005;43:5771–5774. [PMC free article: PMC1287822] [PubMed: 16272516]
  • Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. [PubMed: 15001713]
  • Vogler AJ, Driebe EM, Lee J, Auerbach RK, Allender CJ, Stanley M, Kubota K, Andersen GL, Radnedge L, Worsham PL, Keim P, Wagner DM. Assays for the rapid and specific identification of North American Yersinia pestis and the common laboratory strain CO92. BioTechniques. 2008;44:201–207. [PMC free article: PMC3836605] [PubMed: 18330347]


,5,* ,6 and 5.

5 Department of Microbiology & Immunology and Medicine, Albert Einstein College of Medicine, Bronx, New York, United States of America.
6 Departments of Laboratory Medicine and Microbiology, University of Washington School of Medicine, Seattle, Washington, United States of America.

Although an existential threat from the microbial world might seem like science fiction, a catastrophic decline in amphibian populations with the extinction of dozens of species has been attributed to a chytrid fungus (Daszak et al., 1999; Pound et al., 2006), and North American bats are being decimated by Geomyces destructans, a new fungal pathogen (Blehert et al., 2009). Hence, individual microbes can cause the extinction of a species. In the foregoing instances, neither fungus had a known relationship with the threatened species; there was neither selection pressure for pathogen attenuation nor effective host defense. Humans are also constantly confronted by new microbial threats, as witnessed by the appearance of HIV, SARS coronavirus, and the latest influenza pandemic. While some microbial threats seem to be frequently emerging or re-emerging, others seem to wane or attenuate with time, as exemplified by the decline of rheumatic heart disease (Quinn, 1989), the evolution of syphilis from a fulminant to a chronic disease (Tognotti, 2009), and the disappearance of “English sweating sickness” (Beeson, 1980). A defining feature of infectious diseases is changeability, with change being a function of microbial, host, environmental, and societal changes that affect the host–microbe interaction. Given that species as varied as amphibians and bats can be threatened with extinction by microbes, the development of predictive tools for identifying microbial threats is both desirable and important.

Virulence as an Emergent Property

To those familiar with the concept of emergence (Box A2-1), it probably comes as no surprise that microbial virulence is an emergent property. However, the traditional view of microbial pathogenesis has been reductionist (Fang and Casadevall, 2011), namely, assigning responsibility for virulence to either the microbe or the host. Such pathogen- and host-centric views, and in turn the scientific approaches fostered by these viewpoints, differ significantly in their historical underpinnings and philosophy (Biron and Casadevall, 2010). In fact, neither alone can account for how new infectious diseases arise. The conclusion that virulence is an emergent property is obvious when one considers that microbial virulence can only be expressed in a susceptible host (Casadevall and Pirofski, 2001). Consequently, the very same microbe can be virulent in one host but avirulent in another (Casadevall and Pirofski, 1999). Furthermore, host immunity can negate virulence, as evidenced by the effectiveness of immunization that renders a microbe as deadly as the variola virus completely avirulent in individuals inoculated with the vaccinia virus. Infection with a microbe can result in diametrically opposed outcomes, ranging from the death of a host to elimination of the microbe. Hence, virulence is inherently novel, unpredictable, and irreducible to first principles.

BOX A2-1

The Concept of Emergent Properties. Emergent properties are properties that cannot be entirely explained by their individual components (Ponge, 2005). An element of novelty is also considered to be an essential attribute of “emergent,” ...

Critical to our understanding of virulence as a property that can only be expressed in a susceptible host is that both the microbe and the host bring their own emergent properties to their interaction. Host and microbial cells receive and process information by signaling cascades that manifest emergent properties (Bhalla and Iyengar, 1999); e.g., gene expression studies reveal heterogeneous or bi-stable expression in clonal cell populations with important implications for phenotypic variability and fitness (Dubnau and Losick, 2006; Veening et al., 2008). Other emergent properties that have been identified in microbial and cellular systems could influence pathogenesis. Intracellular parasitism is associated with genome reduction, a phenomenon that could confer emergent properties, given that deliberate genome reduction in E. coli has led to unexpected emergent properties, such as ease of electroporation and increased stability of cloned DNA and plasmids (Posfai et al., 2006).

On the host side, many aspects of the immune system have the potential to spawn emergent properties. The antigenic determinants of a microbe are defined by antibodies and processing by host cells, consequently existing only in the context of an immune system (Van Regenmortel, 2004). Microbial determinants can elicit host-damaging immune responses. Such deleterious responses exemplify a detrimental emergent property of the same host defense mechanisms that mediate antimicrobial effects. The outcome of a viral infection can depend on prior infection with related or unrelated viruses that express related antigens; hence, the infection history of a host affects the outcome of subsequent infections (Welsh et al., 2010).

For those accustomed to viewing host–microbe interactions from an evolutionary perspective (Dethlefsen et al., 2007), the emergent nature of virulence is also no surprise, for the evolution of life itself can be viewed as an emergent process (Corning, 2002). Even in relatively well-circumscribed systems such as Darwin’s finches on the Galápagos Islands, evolutionary trends over time became increasingly unpredictable as a consequence of environmental fluctuations (Grant and Grant, 2002).

Consequences of the Emergent Nature of Microbial Virulence

The fact that virulence is an emergent property of host, microbe, and their interaction has profound consequences for the field of microbial pathogenesis, for it implies that the outcome of host–microbe interaction is inherently unpredictable. Even with complete knowledge of microbes and hosts, the outcome of all possible interactions cannot be predicted for all microbes and all hosts. Lack of predictability should not be unduly discouraging. Even in systems in which emergent properties reveal novel functions, such as fluid surface tension and viscosity, recognition of these properties can be useful. For example, molecular structure might not predict the hydrodynamics of a fluid, but the empirical acquisition of information can be exploited to optimize pipeline diameter and flow rates. Novelty is unpredictable but novel events can be interpreted and comprehended once they have occurred (Ablowitz, 1939). A pessimist might argue that living systems are significantly more complex than flowing liquids. However, such pessimism may be unwarranted. The appearance of new influenza virus strains every year is an emergent property resulting from high rates of viral mutation and host selection of variants (Lofgren et al., 2007). Hence, the time or place in which new pandemics will arise or the relative proportion of strains that will circulate each year cannot be predicted with certainty. Nevertheless, the likely appearance of new strains can be estimated from the history of population exposure to given strains and knowledge of recently circulating strains, and this information can be used to formulate the next year’s vaccine.

A Probabilistic Framework

Although the field of infectious diseases may never reach the predictive certainty achieved in other branches of medicine, it may be possible to develop a probabilistic framework for the identification of microbial threats. All known pathogenic host–microbe interactions have unique aspects, and it is challenging to extrapolate from experience with one microbe to another; nevertheless, a probabilistic framework can incorporate extant information and attempt to estimate risks. For example, the paucity of invasive fungal diseases in mammalian populations with intact immunity has been attributed to the combination of endothermy and adaptive immunity (Robert and Casadevall, 2009). This notion could be extrapolated to other environmental microbes, i.e., those that cannot survive at mammalian temperatures have a low probability of emerging as new human pathogens. On the other hand, the identification of known virulence determinants in new bacterial strains may raise concern. In this regard, the expression of anthrax toxin components in Bacillus cereus produces an anthrax-like disease that is not caused by Bacillus anthracis (Hoffmaster et al., 2004).

Given the experience of recent decades, we can predict with confidence that new infectious diseases are likely to continue to emerge and make some general predictions about the nature of the microbes that could constitute these threats. One possibility is that an emergent pathogen could come from elsewhere in the animal kingdom. A comprehensive survey revealed that three-fourths of emerging pathogens are zoonotic (Taylor et al., 2001). Crossing the species barrier can result in particularly severe pathology, as pathogen and host have not had the opportunity to co-evolve toward equilibrium. Another good bet is that an RNA virus could emerge as a pathogen. The high mutation rate and generally broad host range of RNA viruses may favor species jumps (Woolhouse et al., 2005), and many emergent human pathogens belong to this group, e.g., HIV, H5N1 influenza, SARS coronavirus, Nipah virus, and hemorrhagic fever viruses. On the other hand, global warming could hasten the emergence of new mammalian pathogenic fungi through thermal adaptation (Garcia-Solache and Casadevall, 2010), given that the relative resistance of mammals to fungal diseases has been attributed to a combination of higher body temperatures and adaptive immunity (Bergman and Casadevall, 2010; Robert and Casadevall, 2009).
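One way the considerations above might be combined into a single estimate is a simple odds-updating scheme, sketched below under strong assumptions. The trait names, the prior, and every likelihood ratio are hypothetical placeholders chosen for illustration; they are not estimates from the literature cited here, and the traits are treated as independent, which real data may not support.

```python
import math
from typing import Dict, Tuple

# Hypothetical likelihood ratios: (ratio if trait present, ratio if trait absent).
# A ratio above 1 raises the odds of emergence; a ratio below 1 lowers them.
LIKELIHOOD_RATIOS: Dict[str, Tuple[float, float]] = {
    "grows_at_mammalian_temperature": (5.0, 0.05),
    "known_virulence_determinant": (8.0, 0.8),
    "zoonotic_reservoir": (3.0, 0.7),
    "rna_virus": (2.0, 0.9),
}


def emergence_probability(traits: Dict[str, bool], prior: float = 0.01) -> float:
    """Update a prior probability with independent trait likelihood ratios."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for name, present in traits.items():
        ratio_present, ratio_absent = LIKELIHOOD_RATIOS[name]
        log_odds += math.log(ratio_present if present else ratio_absent)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    thermotolerant = {"grows_at_mammalian_temperature": True,
                      "known_virulence_determinant": True}
    heat_sensitive = {"grows_at_mammalian_temperature": False,
                      "known_virulence_determinant": True}
    print(emergence_probability(thermotolerant))
    print(emergence_probability(heat_sensitive))
```

Working in log-odds keeps the update additive and makes it easy to see which trait moves the estimate most. The output is only as good as the ratios supplied, so a scheme like this would serve to rank candidates for closer surveillance rather than to predict any specific outbreak.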

Despite abandoning hopes for certainty and determinism in predicting microbial pathogenic interactions, we can attempt to develop a probabilistic framework that endeavors to estimate the pathogenic potential of a microbe based on lessons from known host–microbe interactions. A variety of mathematical models based on game theory or quantitative genetics have been developed in attempts to understand the evolution of virulence (Boots et al., 2009; Day and Proulx, 2004). These have provided interesting new insights into host–pathogen interactions, including the tendency for evolutionary dynamics to produce oscillations and chaos rather than stable fitness-maximizing equilibria, the unpredictability that results when multiple games are played simultaneously, and the tendency for three-way co-evolution of virulence with host tolerance or resistance to select for greater virulence and variability (Carval and Ferriere, 2010; Hashimoto, 2006; Nowak and Sigmund, 2004).

Preparing for the Unpredictable

Emerging infections appear to be increasing in frequency, and it is not difficult to understand why. An interesting experimental system examining a viral pathogen of moth larvae demonstrated that host dispersal promotes the evolution of greater virulence (Boots and Mealor, 2007). When hosts remain local, pathogens are encouraged toward more “prudent” behavior, but host movement encourages more infections and greater disease severity (Buckling, 2007). Global travel in the modern world can rapidly spread pathogenic microbes, but what is less obvious is that travel may also enhance virulence. Other factors contributing to the emergence and re-emergence of pathogens include changes in land use, human migration, poverty, urbanization, antibiotics, modern agricultural practices, and other human behaviors (Cleaveland et al., 2007; IOM, 1992). Microbial evolution and environmental change, anthropogenic or otherwise, will continue to drive this process. Another implication of the emergent nature of virulence is recognition of the hubris and futility of thinking that we can simply target resources to the human pathogens that we already know well. The discovery of HIV as the cause of AIDS (Barre-Sinoussi et al., 1983) was greatly facilitated by research on avian and murine retroviruses that had taken place decades before (Hsiung, 1987), at a time when the significance of retroviruses as agents of human disease was unknown.

We share the view that sentinel capabilities are more important than predictive models at the present time (Barre-Sinoussi et al., 1983; Hsiung, 1987), but are optimistic that it will be possible to develop general analytical tools that can be applied to provide probabilistic assessments of threats from future unspecified agents. Comparative analysis of microbes with differing pathogenic potential and their hosts could provide insight into those interactions that are most likely to result in virulence. Hence, the best preparation for the unexpected and unpredictable nature of microbial threats will be the combination of enhanced surveillance with a broad exploration of the natural world to ascertain the range of microbial diversity from which new threats are likely to emerge.


  • Ablowitz R. The theory of emergence. Phil Sci. 1939;6:1–16.
  • Barre-Sinoussi F, Chermann JC, Rey F, Nugeyre MT, Chamaret S, et al. Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS) Science. 1983;220:868–871. [PubMed: 6189183]
  • Baylis CA. The philosophic functions of emergence. Philos Rev. 1929;38:372–384.
  • Beeson PB. Some diseases that have disappeared. Am J Med. 1980;68:806–811. [PubMed: 6992569]
  • Bergman A, Casadevall A. Mammalian endothermy optimally restricts fungi and metabolic costs. MBio. 2010;1:e00212–10. [PMC free article: PMC2975364] [PubMed: 21060737]
  • Bhalla US, Iyengar R. Emergent properties of networks of biological signaling pathways. Science. 1999;283:381–387. [PubMed: 9888852]
  • Biron CA, Casadevall A. On immunologists and microbiologists: ground zero in the battle for interdisciplinary knowledge. MBio. 2010;1:e00280–10.
  • Blehert DS, Hicks AC, Behr M, Meteyer CU, Berlowski-Zier BM, et al. Bat white-nose syndrome: an emerging fungal pathogen? Science. 2009;323:227. [PubMed: 18974316]
  • Boots M, Best A, Miller MR, White A. The role of ecological feedbacks in the evolution of host defence: what does theory tell us? Philos Trans R Soc Lond B Biol Sci. 2009;364:27–36. [PMC free article: PMC2666696] [PubMed: 18930880]
  • Boots M, Mealor M. Local interactions select for lower pathogen infectivity. Science. 2007;315:1284–1286. [PubMed: 17332415]
  • Buckling A. Epidemiology. Keep it local. Science. 2007;315:1227–1228. [PubMed: 17332398]
  • Carval D, Ferriere R. A unified model for the coevolution of resistance, tolerance, and virulence. Evolution. 2010;64:2988–3009. [PubMed: 20497218]
  • Casadevall A, Pirofski L. Host-pathogen interactions: redefining the basic concepts of virulence and pathogenicity. Infect Immun. 1999;67:3703–3713. [PMC free article: PMC96643] [PubMed: 10417127]
  • Casadevall A, Pirofski L. Host-pathogen interactions: the attributes of virulence. J Infect Dis. 2001;184:337–344. [PubMed: 11443560]
  • Cleaveland S, Haydon DT, Taylor L. Overviews of pathogen emergence: which pathogens emerge, when and why? Curr Top Microbiol Immunol. 2007;315:85–111. [PubMed: 17848062]
  • Corning PA. The re-emergence of ‘emergence’: a venerable concept in search for a theory. Complexity. 2002;7:18–30.
  • Daszak P, Berger L, Cunningham AA, Hyatt AD, Green DE, et al. Emerging infectious diseases and amphibian population declines. Emerg Infect Dis. 1999;5:735–748. [PMC free article: PMC2640803] [PubMed: 10603206]
  • Day T, Proulx SR. A general theory for the evolutionary dynamics of virulence. Am Nat. 2004;163:E40–E63. [PubMed: 15122509]
  • Dethlefsen L, McFall-Ngai M, Relman DA. An ecological and evolutionary perspective on human-microbe mutualism and disease. Nature. 2007;449:811–818. [PubMed: 17943117]
  • Dubnau D, Losick R. Bistability in bacteria. Mol Microbiol. 2006;61:564–572. [PubMed: 16879639]
  • Fang FC, Casadevall A. Reductionistic and holistic science. Infect Immun. 2011;79:1401–1414. [PMC free article: PMC3067528] [PubMed: 21321076]
  • Garcia-Solache MA, Casadevall A. Global warming will bring new fungal diseases for mammals. MBio. 2010;1:e00061–10. [PMC free article: PMC2912667] [PubMed: 20689745]
  • Grant PR, Grant BR. Unpredictable evolution in a 30-year study of Darwin’s finches. Science. 2002;296:707–711. [PubMed: 11976447]
  • Hashimoto K. Unpredictability induced by unfocused games in evolutionary game dynamics. J Theor Biol. 2006;241:669–675. [PubMed: 16490216]
  • Hempel CG, Oppenheim P. Studies in the logic of explanation. Phil Sci. 1948;15:135–175.
  • Hoffmaster AR, Ravel J, Rasko DA, Chapman GD, Chute MD, et al. Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling inhalation anthrax. Proc Natl Acad Sci U S A. 2004;101:8449–8454. [PMC free article: PMC420414] [PubMed: 15155910]
  • Taylor LH, Latham SM, Woolhouse ME. Risk factors for human disease emergence. Philos Trans R Soc Lond B Biol Sci. 2001;356:983–989. [PMC free article: PMC1088493] [PubMed: 11516376]
  • Hsiung GD. Perspectives on retroviruses and the etiologic agent of AIDS. Yale J Biol Med. 1987;60:505–514. [PMC free article: PMC2590378] [PubMed: 2829449]
  • IOM. Emerging infections: microbial threats to the United States. Washington (DC): Institute of Medicine; 1992.
  • Lofgren E, Fefferman NH, Naumov YN, Gorski J, Naumova EN. Influenza seasonality: underlying causes and modeling theories. J Virol. 2007;81:5429–5436. [PMC free article: PMC1900246] [PubMed: 17182688]
  • Mazzocchi F. Complexity in biology. Exceeding the limits of reductionism and determinism using complexity theory. EMBO Rep. 2008;9:10–14. [PMC free article: PMC2246621] [PubMed: 18174892]
  • Nowak MA, Sigmund K. Evolutionary dynamics of biological games. Science. 2004;303:793–799. [PubMed: 14764867]
  • Parrish JK, Viscido SV, Grünbaum D. Self-organized fish schools: an example of emergent properties. Biol Bull. 2002;202:296–305. [PubMed: 12087003]
  • Ponge JF. Emergent properties from organisms to ecosystems: towards a realistic approach. Biol Rev Camb Philos Soc. 2005;80:403–411. [PMC free article: PMC2675173] [PubMed: 16094806]
  • Posfai G, Plunkett G III, Feher T, Frisch D, Keil GM, et al. Emergent properties of reduced-genome Escherichia coli. Science. 2006;312:1044–1046. [PubMed: 16645050]
  • Pounds JA, Bustamante MR, Coloma LA, Consuegra JA, Fogden MP, et al. Widespread amphibian extinctions from epidemic disease driven by global warming. Nature. 2006;439:161–167. [PubMed: 16407945]
  • Quinn RW. Comprehensive review of morbidity and mortality trends for rheumatic fever, streptococcal disease, and scarlet fever: the decline of rheumatic fever. Rev Infect Dis. 1989;11:928–953. [PubMed: 2690288]
  • Robert VA, Casadevall A. Vertebrate endothermy restricts most fungi as potential pathogens. J Infect Dis. 2009;200:1623–1626. [PubMed: 19827944]
  • Tognotti E. The rise and fall of syphilis in Renaissance Europe. J Med Humanit. 2009;30:99–113. [PubMed: 19169798]
  • Van Regenmortel MH. Reductionism and complexity in molecular biology. Scientists now have the tools to unravel biological complexity and overcome the limitations of reductionism. EMBO Rep. 2004;5:1016–1020. [PMC free article: PMC1299179] [PubMed: 15520799]
  • Veening JW, Smits WK, Kuipers OP. Bistability, epigenetics, and bet-hedging in bacteria. Annu Rev Microbiol. 2008;62:193–210. [PubMed: 18537474]
  • Welsh RM, Che JW, Brehm MA, Selin LK. Heterologous immunity between viruses. Immunol Rev. 2010;235:244–266. [PMC free article: PMC2917921] [PubMed: 20536568]
  • Woolhouse ME, Haydon DT, Antia R. Emerging pathogens: the epidemiology and evolution of species jumps. Trends Ecol Evol. 2005;20:238–244. [PubMed: 16701375]



7 Senior Scientist, Molecular Epidemiology, British Columbia Centre for Disease Control.

Outbreak Investigation: A Brief Primer

In public health, we are often confronted with the task of “solving” an infectious disease outbreak—identifying all the cases, determining a source of the illness, and deploying an intervention to prevent further cases. A typical scenario unfolds as follows. A potential outbreak alert is issued when routine laboratory or population-based surveillance methods detect a statistically significant increase in case counts relative to historical norms for a particular disease, or when an astute clinician or public health official notes an unusual clustering of cases. This alert triggers an initial investigation combining descriptive epidemiology with laboratory work. Epidemiologists use interviews and questionnaires to review case data, such as travel history, food exposures, and attendance at social events, with the goal of revealing common behaviours across cases—eating the same food items, visiting the same locations, or shared contact with a particular individual.
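The surveillance trigger described above can be sketched as a simple statistical check. The following is a minimal illustration only, assuming that weekly case counts follow a Poisson distribution around the historical mean. The function names, the weekly counts, and the significance threshold are hypothetical and are not drawn from any particular surveillance system.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


def outbreak_alert(history: list[int], current: int, alpha: float = 0.01) -> bool:
    """Flag the current count if it is improbably high given the historical mean.

    The Poisson assumption and the alpha threshold are illustrative choices.
    """
    baseline = sum(history) / len(history)
    return poisson_tail(current, baseline) < alpha


# Hypothetical weekly case counts for one disease in one region.
history = [2, 3, 1, 4, 2, 3, 2, 3]

print(outbreak_alert(history, current=3))   # a typical week, no alert
print(outbreak_alert(history, current=10))  # a marked excess, alert raised
```

Real surveillance systems typically adjust for seasonality and reporting delays, which this sketch ignores.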

At the same time, microbiologists carry out their own epidemiological investigation using genotyping techniques. Similar to the genetic fingerprinting methods used in paternity testing or in forensic crime scene analysis, these “molecular epidemiology” tools, including pulsed-field gel electrophoresis (PFGE) and multi-locus sequence typing (MLST), can quickly reveal whether a collection of bacterial specimens share a common genetic fingerprint and likely represent a true outbreak, or whether they display a range of genotypes and simply reflect an unusual excess of cases of that particular illness, none of which are related to each other.

The results of the descriptive epidemiology and molecular epidemiology investigations are then compared, and a determination is made as to whether the cluster of cases is truly an outbreak meriting further investigation. If this is indeed the case, then a more robust field epidemiological investigation is typically undertaken. This includes enhanced case-finding using more detailed survey instruments as well as case-control studies in which behaviours of cases are compared to those of controls in order to quantify risk factors strongly associated with illness. Through these analyses, investigators are able to form and test a specific hypothesis regarding the source of the outbreak. Laboratory work is also critical at this stage—new cases are genotyped to determine whether they are part of the outbreak, while genotyping of isolates collected from food, water, and other non-human sources can confirm or rule out these entities as potential sources of the outbreak.
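One common way to quantify a risk factor in a case-control study is the odds ratio, sketched below with an approximate 95% confidence interval on the log scale (the Woolf method). The counts are hypothetical and the function name is illustrative; the text does not specify which measure of association a given investigation would use.

```python
import math


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int):
    """Return (odds ratio, lower 95% bound, upper 95% bound).

    Uses the log-scale normal approximation; all four cells must be non-zero.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts: did cases eat a suspect food item more often than controls?
estimate, lower, upper = odds_ratio(
    exposed_cases=30, unexposed_cases=10,
    exposed_controls=20, unexposed_controls=60,
)
print(f"OR = {estimate:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

An interval that excludes 1 suggests, under these assumptions, that the exposure is associated with illness; it does not by itself establish the source.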

Once a source of the outbreak has been confirmed, intervention measures can be put in place. For food- or water-borne outbreaks, these typically involve issuing a recall for the food item in question, eliminating access to the water source until it has been declared safe, and issuing extensive media alerts warning consumers of the risks associated with the entity in question. For outbreaks involving personal contact or attendance at a shared location, such as a specific hospital ward, active case-finding is used to find and treat all infected patients or potential carriers of an illness, while infection control approaches such as patient decolonization or enhanced cleaning are deployed to prevent further infections.

Unfortunately, not every outbreak can be neatly resolved. A number of factors greatly limit public health’s ability to investigate an outbreak from both the field epidemiology and molecular epidemiology perspectives (Figure A3-1). Field investigations are typically limited by resources (not having enough personnel, time, or money to carry out a complete investigation) and by patients’ inability to recall specific events that might be relevant to the investigation. Molecular epidemiology approaches are also limited in their utility. For some pathogens, such as Salmonella Enteritidis, unrelated isolates from multiple outbreaks may show identical genetic fingerprints. For others, such as Campylobacter jejuni, one outbreak may comprise multiple distinct genetic fingerprints due to frequent rearrangement of the pathogen’s genome. Genotyping typically requires the organism in question to be cultured, which may add several weeks to an investigation in the case of slow-growing organisms such as Mycobacterium tuberculosis, and the costs of many molecular epidemiology assays are not insignificant, meaning they are often not routinely deployed.

Figure A3-1. An example demonstrating how the limitations of field and molecular epidemiology complicate outbreak reconstructions. Panel A shows the “true” outbreak scenario: two different genotypes of pathogen are found in the hosts …

One of the biggest limitations of current molecular epidemiology methods is the low level of resolution they provide. At best, such tools are only capable of determining whether or not an isolate belongs to an outbreak cluster. Further detail, such as the order of person-to-person transmission or the underlying pattern of spread (a superspreader versus ongoing chains of transmission), is beyond the scope of current laboratory methods.

An Illustrative Example

In May 2006, a case of pleural tuberculosis (TB) was diagnosed in an adult female in a medium-sized community in British Columbia, Canada. Although pleural TB is suggestive of recently acquired disease, inquiring after the case’s contacts did not suggest a potential source for her illness. Molecular analysis using a TB-specific technique called mycobacterial interspersed repetitive unit variable number tandem repeat (MIRU-VNTR) was performed. In MIRU-VNTR, 24 variable number tandem repeat loci around the M. tuberculosis genome are amplified using polymerase chain reaction (PCR), followed by capillary electrophoresis to enumerate the number of repeats present at each locus. The patient’s MIRU-VNTR genotype indicated she harboured the same strain of TB that had been circulating in her community for several years. She was assumed to represent one of the few cases of TB that the community regularly observed each year and was treated for her illness.
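Because a MIRU-VNTR genotype is simply a list of repeat counts, one per locus, comparing two isolates amounts to comparing two 24-element profiles. The sketch below illustrates that comparison. The repeat counts are invented for illustration and do not correspond to the strain described here, and the function name is hypothetical.

```python
def differing_loci(profile_a, profile_b):
    """Return the indices of loci at which two repeat-count profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return [i for i, (x, y) in enumerate(zip(profile_a, profile_b)) if x != y]


# Hypothetical 24-locus profiles (number of repeats at each locus).
community_strain = (2, 4, 2, 3, 3, 2, 5, 1, 3, 2, 4, 2,
                    2, 3, 2, 6, 3, 1, 2, 3, 2, 4, 2, 3)
new_case = community_strain  # identical fingerprint: same circulating strain

unrelated = list(community_strain)
unrelated[3] = 4   # differs at two loci (zero-based indices 3 and 15)
unrelated[15] = 5

print(differing_loci(community_strain, new_case))   # no differing loci
print(differing_loci(community_strain, unrelated))  # loci 3 and 15 differ
```

An identical profile places an isolate in the same cluster, but, as the outbreak described here shows, it cannot by itself say who infected whom.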

Some months later, a second case of TB was reported in the community, this time in an infant female with no epidemiological link to the May case. TB in children is considered to be a marker for recent community transmission, and when MIRU-VNTR revealed the infant to be infected with the same strain of TB as the earlier case, the local public health nurses began an intensive case-finding effort. Using an approach called reverse contact tracing, they identified individuals who had been in contact with the infant and screened them using a tuberculin skin test in an attempt to find out who had been the source of the child’s infection. This investigation led to the diagnosis of nine more cases of active TB in the community, and an outbreak was declared.

Extensive investigation soon followed. Each case was interviewed using a detailed questionnaire, and the resulting data were visualized as a network: connections between individuals who reported social relationships with each other, links between people and the places where they regularly spent time, and links between people and specific behaviours associated with an increased risk of TB infection, such as smoking, alcohol use, or drug use. The network suggested a potential source for the outbreak: an individual who had been symptomatic and undiagnosed for many months prior to detection of the first case, who had a number of risk factors, and who was a high-degree node in the network, having reported contact with many of the cases.
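The notion of a high-degree node can be made concrete with a small sketch: count, for each person, the number of distinct contacts reported. The contact list and case labels below are hypothetical, and a real investigation would of course weigh degree alongside symptoms, timing, and risk factors.

```python
from collections import defaultdict


def degree_by_node(edges):
    """Return the number of distinct contacts for each node in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(contacts) for node, contacts in neighbours.items()}


# Hypothetical reported contacts between cases.
contacts = [
    ("case01", "case02"), ("case01", "case03"), ("case01", "case04"),
    ("case01", "case05"), ("case02", "case03"), ("case04", "case05"),
]

degrees = degree_by_node(contacts)
likely_hub = max(degrees, key=degrees.get)
print(likely_hub, degrees[likely_hub])
```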

Although the investigation revealed the likely source case, who was immediately put on treatment, the outbreak eventually grew to include 41 individuals over the 2006–2008 period, with a handful of subsequent diagnoses from 2009 onward.

Attempting to reconstruct the path the organism took through the community proved to be impossible. Despite the rich epidemiological and clinical data available, the social network structure in the community was too dense to interpret: each individual case had an average of six contacts with other cases, and almost everyone in the community reported spending time at the same locations, including a series of hotel pubs and crack houses, meaning there were many potential sources for each person’s infection (Figure A3-2). All cases had identical 24-locus MIRU-VNTR patterns, and the technique’s low resolution meant it could not identify smaller subclusters within the outbreak.

Figure A3-2. The dense social network in the outbreak community complicated outbreak reconstruction attempts. Circular nodes represent outbreak cases: grey nodes are individuals with smear-positive tuberculosis; white nodes are individuals with smear-negative …

While the outbreak eventually abated, our inability to reconstruct individual transmission events meant that an important learning opportunity was missed. We could not describe the underlying pattern of disease spread in the community, we could not compare it to other TB outbreaks to determine whether this organism behaves in a similar way across different outbreaks in different communities with different social network structures, and we could not use our experience to guide future TB outbreak investigations.

This uncertainty about how an outbreak unfolds is not unique to tuberculosis. For the majority of communicable diseases, our understanding of how they behave “in the wild” is limited. Unfortunately, this lack of understanding of pathogens’ natural transmission tendencies precludes developing any sort of proactive evidence-based interventions. We do not know whether there are “one-size-fits-all” interventions for a given pathogen or a family of diseases, or whether each outbreak is unique and will require a specifically tailored intervention.

The Rise of the Next Generation

We will return to the tuberculosis story in time; for now, we must climb into our microbial genomics time machine and rewind several years. . . . The complete genome has sometimes been described as “the ultimate genotype”—examining the total genetic content of an organism reveals the unique fingerprint that sets each of us apart from the other members of our species. Until recently, however, interrogating the complete genome of anything larger than a virus required a significant investment of time, money, and analytical resources. Sequencing of the first bacterial genome, Haemophilus influenzae, in 1995 took more than a year, cost nearly USD$1 million, and involved a large team of researchers running a not insignificant number of DNA sequencers. Subsequent microbial genomics efforts targeted individual bacteria selected to represent a range of common laboratory strains and interesting clinical isolates; experiments reporting the sequencing of more than one isolate were relatively rare and certainly outside the scope of most research groups’ technological abilities.

Tracking the number of bacterial genome projects recorded in the Genomes Online Database (GOLD) reveals that after approximately a decade of steady progress in microbial genomics, a sudden and dramatic upswing in the number of sequenced genomes began around 2006 (Figure A3-3). This sea change coincides with the commercial release of the so-called “next-generation” DNA-sequencing technologies. Previously, DNA sequencing was performed using the Sanger method, originally developed by Frederick Sanger in 1977, although subsequently modified to improve throughput. Next-generation sequencing methods, including the pyrosequencing platform commercialized by Roche, a reversible terminator platform marketed by Illumina, and the newer ion semiconductor-based approach available through Life Technologies, all take a fundamentally different approach to sequencing. They are based on the concept of “sequencing by synthesis,” in which DNA synthesis is essentially observed in real time: the sequencing instrument uses one of the above technologies to extend a strand complementary to the template base by base and records which base was added at each step. Although the reads produced by these technologies are much shorter than those resulting from Sanger sequencing, the massively parallel nature of these approaches means that a single next-generation sequencing run generates orders of magnitude more data than a Sanger sequencer can.

Figure A3-3. The number of bacterial genome projects recorded in the Genomes Online Database (GOLD) increased exponentially with the introduction of next-generation sequencing methods in the mid-2000s.

With these new platforms, the cost of sequencing a complete bacterial genome dropped dramatically. Now, many large genome centres operating optimized pipelines and high-volume sequencers are able to offer their clients full bacterial genome sequences for USD$50–250 per genome. Run times can be as little as a few hours for certain platforms, meaning it is now possible to sequence tens or even hundreds of bacterial genomes within a week for only a few thousand dollars.

A New Tool: Genomic Epidemiology

Soon after the commercialization and adoption of next-generation sequencing technologies, a few astute clinicians and infectious disease researchers recognized the technology’s potential for transforming molecular epidemiology. In a wonderful example of convergent evolution, several independent research groups around the world embarked upon proof-of-concept projects in the new area of “genomic epidemiology,” with the first few papers in the field appearing in 2010 and 2011.

The basic premise behind genomic epidemiology is that the microevolutionary events occurring within a pathogen’s genome over the course of an outbreak can be used as markers of transmission. For example, consider an outbreak in which the first patient is colonized with a bacterium having the genome sequence AAAAA. Any individuals infected by that person would then be colonized with bacteria having the same genome sequence. As a result of the natural process of mutation, many of these second-generation organisms will accrue a small number of nucleotide changes (the number depends on the duration of infection and the natural mutation rate of the pathogen in question). If we have three second-generation patients, one might accrue no mutations and continue to display the AAAAA genome sequence, one might show the sequence ACAAA, and one might contain the sequence AAAAG. The third generation of cases, those infected by these individuals, will then show genome sequences identical to or descended from these second-generation cases. By sequencing the genomes of all the outbreak organisms and identifying positions that vary over the course of the outbreak, one should, in theory, be able to infer the individual transmission events that gave rise to the outbreak (Figure A3-4).
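The toy example above can be turned into a short sketch. It treats an isolate as a plausible source for another if all of the source's mutations (relative to the first patient's genome) are also present in the recipient, and then keeps only the most closely related candidates. The patient labels and the fifth genome (ACAAT, a hypothetical third-generation case) are assumptions added for illustration, and the rule ignores back-mutation and within-host diversity.

```python
def mutations(reference: str, genome: str) -> frozenset:
    """Return the set of (position, base) differences from the reference."""
    return frozenset(
        (i, base) for i, (ref, base) in enumerate(zip(reference, genome)) if ref != base
    )


def closest_sources(isolates: dict, reference: str) -> dict:
    """For each isolate, list the other isolates it could most directly descend from."""
    muts = {name: mutations(reference, genome) for name, genome in isolates.items()}
    result = {}
    for name, own in muts.items():
        # A candidate source carries no mutation that the recipient lacks.
        compatible = {o: m for o, m in muts.items() if o != name and m <= own}
        if not compatible:
            result[name] = []
            continue
        best = max(len(m) for m in compatible.values())
        result[name] = sorted(o for o, m in compatible.items() if len(m) == best)
    return result


# P1 to P4 follow the example in the text; P5 is a hypothetical third-generation case.
isolates = {"P1": "AAAAA", "P2": "AAAAA", "P3": "ACAAA", "P4": "AAAAG", "P5": "ACAAT"}
sources = closest_sources(isolates, reference=isolates["P1"])
for patient, candidates in sources.items():
    print(patient, candidates)
```

In this sketch P5 is linked to P3 alone, while P3 and P4 are each compatible with both P1 and P2, whose genomes are identical. Sequence data cannot order isolates that carry the same genome, which is one reason the inference works only "in theory".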

Figure A3-4. Using microevolutionary events to track person-to-person spread of a pathogen over a social network. As a pathogen spreads over a contact network, the accrual of mutations can be used to trace person-to-person transmission. When a mutation arises in one …

The words “in theory” are very important in this case. The first two studies to use genomics to identify person-to-person transmission events both revealed that the answers are not so readily forthcoming.

In the first study (Lewis et al., 2010), genome sequences were obtained for six multidrug-resistant Acinetobacter baumannii isolates from a hospital outbreak occurring over a seven-week period: four from military patients and two from civilian patients. The hope was that the study would reveal how the bacterium was transferred from the military patients, who were presumed to have been infected in the field prior to hospital admission, to the civilian patients. Three positions across the approximately 4-megabase-pair genome were found to vary between isolates, with patients showing four genotypes at these positions: CAG, TAG, TAT, and TTG. One of the civilian patients shared a genotype with two of the military patients, suggesting that one of these military patients was the source of the civilian’s infection. Examination of this hypothesis in the context of the available epidemiological information revealed that one of the military patients was housed in the bed next to the civilian, making him or her the most likely source. The other civilian patient displayed a unique genotype, and the study authors were not able to infer the source of that individual’s infection.
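How closely the four reported genotypes are related can be checked by counting the positions at which each pair differs (the Hamming distance). The sketch below is an illustration, not the authors' analysis: it uses only the three-position genotypes quoted above and says nothing about which patient carried which genotype or about the direction of transmission.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


genotypes = ["CAG", "TAG", "TAT", "TTG"]
distances = {(a, b): hamming(a, b) for a, b in combinations(genotypes, 2)}

# Pairs separated by a single nucleotide change.
single_step = [pair for pair, d in distances.items() if d == 1]
print(single_step)
```

In this calculation every single-step pair involves TAG, so TAG sits one change away from each of the other three genotypes; epidemiological information is still needed to decide which patient infected which.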

In the second study (Gardy et al., 2011), genomics was used to reconstruct the tuberculosis outbreak described earlier in this paper. Genome sequences were obtained for 36 M. tuberculosis isolates: 32 from outbreak cases and 4 from patients diagnosed in the same community in the decade prior to the outbreak, all of which had the same 24-locus MIRU-VNTR fingerprint as the outbreak cases. More than 200 single nucleotide polymorphisms (SNPs) were found among the isolates, and the authors realized the nature of TB infection meant it would be impossible to trace the outbreak’s path SNP by SNP, as was done by Lewis et al. In TB, an individual may be infectious for a period of many months, during which time the colonizing organism continues to accrue mutations. Suppose a source case transmits the disease to one person on day 1 of illness, to a second on day 180, and to a third on day 270, and is then diagnosed and sampled on day 300. The source’s isolate might resemble the isolate from patient 3 but could be very different from the isolate from patient 1, making it difficult to ascribe patient 1’s disease to the source case. The variable periods of latency associated with TB further complicate SNP-by-SNP reconstruction of transmission.

Instead, the authors used a phylogenetic tree of the data to demonstrate that two separate lineages of M. tuberculosis—labeled A and B—could be resolved within the single MIRU-VNTR genotype. Thus the genomic data acted as a sort of enhanced genotyping method, able to break one MIRU-VNTR cluster containing all the isolates into two distinct genome-based clusters, A and B. Although the original social network describing the relationships between all the outbreak cases was too complex to resolve, when it was broken down into two networks—one showing connections between A cases and one showing connections between B cases—the data became much more interpretable and several person-to-person transmission events could be identified. This revealed that several key individuals acting as superspreaders were associated with the majority of transmission events and that factors including delays in diagnosis, clinical presentation, and risk behaviours contributed toward these individuals’ role as sources of infection. Not every transmission event could be identified, however, and the genomic data also suggested that some individuals in the network might have exhibited coinfection with both an A and a B strain.
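The idea of breaking one dense network into lineage-specific networks can be sketched as a simple filter: keep a contact only when both cases carry the same lineage. The case labels, contacts, and lineage assignments below are hypothetical, and the sketch assumes each case has exactly one lineage, so it does not handle the possible coinfections noted above.

```python
def split_by_lineage(edges, lineage):
    """Group contacts by lineage, dropping contacts that cross lineages."""
    networks = {}
    for a, b in edges:
        if lineage[a] == lineage[b]:
            networks.setdefault(lineage[a], []).append((a, b))
    return networks


# Hypothetical cases, lineage assignments, and reported contacts.
lineage = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "A"}
edges = [("c1", "c2"), ("c1", "c3"), ("c2", "c5"), ("c3", "c4"), ("c4", "c5")]

networks = split_by_lineage(edges, lineage)
print(networks)
```

Each resulting network contains fewer plausible links per case, which is what makes the remaining epidemiological connections easier to interpret.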

Best Practices and Future Directions

The earliest genomic epidemiology studies suggested that when combined with epidemiological and clinical data, whole genome sequencing has the potential to inform reconstructions of communicable disease outbreaks. Since the publication of the first few studies, several other papers have described using whole genome sequencing to solve outbreaks of other organisms, including Clostridium difficile, methicillin-resistant Staphylococcus aureus, and Klebsiella pneumoniae. Projects are becoming larger and more ambitious, sequencing hundreds of isolates collected across large regions over many years, and the number of outbreak reconstructions available for an individual pathogen is growing as well.

As this emerging field continues to find its place in the realm of public health microbiology, it is important to note several “best practices” that must be considered when doing such a study.

  1. Genomic data alone cannot reliably identify individual transmission events. The genomic data must be combined with epidemiological and clinical information if a plausible reconstruction of an outbreak is to be achieved.
  2. The bioinformatics methods for identifying positions of variation across a series of isolates are not perfect. The results of any analysis must be carefully examined to ensure that errors in alignment or inappropriate scoring thresholds are not causing variants to be erroneously called or missed.
  3. The data must be considered in terms of biological plausibility. An expected level of variation over an outbreak can be inferred from organisms’ mutation rates. If the observed variation is much less or much greater than the expected variation, then the analysis used to generate that data must be reexamined.
  4. For others to evaluate a study’s accuracy and reproduce the results, the raw sequencing data for each isolate should be made freely available in a public repository. Manuscripts describing genomic epidemiology studies should include, as appendices, the analysis commands used to generate the data and a detailed description of how the data were filtered and processed after genome assembly and SNP calling.
  5. There is a significant amount of interesting biology that can be mined from outbreak-derived genome sequences, particularly in the area of population genetics. To maximize the value of a sequencing data set, it is well worth identifying academic partners who can use a study data set for further analyses.
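The plausibility check in item 3 can be sketched as follows, assuming that mutations accrue as a Poisson process at a constant rate. The rate used in the example (0.5 SNPs per genome per year) is an illustrative assumption rather than a value taken from the studies described here, and the function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for X ~ Poisson(mean); returns 0.0 when k is negative."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def plausible_snp_count(observed: int, rate_per_year: float, years: float,
                        alpha: float = 0.05) -> bool:
    """Return True if the observed SNP count is consistent with the expected accrual."""
    mean = rate_per_year * years
    lower_tail = poisson_cdf(observed, mean)            # P(X <= observed)
    upper_tail = 1.0 - poisson_cdf(observed - 1, mean)  # P(X >= observed)
    return min(lower_tail, upper_tail) >= alpha / 2


# Hypothetical pair of isolates sampled two years apart, assumed rate 0.5 SNPs per year.
print(plausible_snp_count(observed=1, rate_per_year=0.5, years=2))   # consistent
print(plausible_snp_count(observed=12, rate_per_year=0.5, years=2))  # far too many
```

A count flagged as implausible does not show that the biology is unusual; as item 3 notes, it is first a prompt to reexamine the alignment and variant-calling steps.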

As more and more genomic epidemiology projects are undertaken, the natural behaviour of pathogens in the wild will slowly be revealed. We will know much more about their spatial and temporal patterns of spread, whether superspreading is as common as early reports are indicating, and the factors that influence an individual’s tendency to spread disease. It will then be up to public health agencies to use this valuable information to develop evidence-based interventions, for example by directing case-finding efforts around contacts of potential superspreaders or by designing prevention programs targeted to specific high-risk communities or individuals. In our own work, we are sequencing 20 years’ worth of TB in a single Canadian province to identify province-wide transmission routes, community-level transmission events, and socioeconomic and clinical risk factors for acting as a source or sink community for disease. It is our hope that the resulting data will allow us to reshape our current TB prevention and control programs, enabling us to use our limited resources for maximum effect.

As sequencing technologies improve—generating longer reads at lower costs—and as bioinformatics methods become more reliable at identifying variation, we anticipate even more accurate and detailed outbreak reconstructions. The coming decade will be an exciting time for genomic epidemiology as it moves from proof of concept to a routine component of clinical practice.


  • Gardy JL, Johnston JC, Ho Sui SJ, Cook VJ, Shah L, Brodkin E, Rempel S, Moore R, Zhao Y, Holt R, Varhol R, Birol I, Lem M, Sharma MK, Elwood K, Jones SJM, Brinkman FSL, Brunham RC, Tang P. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. New England Journal of Medicine. 2011;364:730–739. [PubMed: 21345102]
  • Lewis T, Loman NJ, Bingle L, Jumaa P, Weinstock GM, Mortiboy D, Pallen MJ. High-throughput whole-genome sequencing to dissect the epidemiology of Acinetobacter baumannii isolates from a hospital outbreak. Journal of Hospital Infection. 2010;75(1):37–41. [PubMed: 20299126]



8 Reprinted by permission of Oxford University Press. Originally published as Ghedin E, et al. (2012) Presence of oseltamivir-resistant pandemic A/H1N1 minor variants before drug therapy with subsequent selection and transmission. Journal of Infectious Diseases 206(10), 1504-1511. doi: 10.1093/infdis/jis571.
9 Department of Computational & Systems Biology, Center for Vaccine Research, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA.
10 Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA.
11 Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA.
12 Centre de Recherche du Centre Hospitalier Universitaire de Québec and Université Laval, Québec City, Québec, Canada.
13 McGill University Health Centre, Montréal, Québec, Canada.
* Corresponding authors: Dr. Elodie Ghedin, Center for Vaccine Research, University of Pittsburgh School of Medicine, 3501 5th Avenue, BST3 Room 9043b, Pittsburgh, PA 15261. Phone: (412) 383-5850. E-mail: elg21@pitt.edu. Dr. Guy Boivin, Centre de Recherche en Infectiologie, CHUL, room RC-709, 2705 Laurier, Québec City, QC, Canada. E-mail: guy.boivin@crchul.ulaval.ca.


A small proportion (1–1.5%) of 2009 pandemic A/H1N1 influenza viruses (A(H1N1)pdm09) are oseltamivir-resistant, due almost exclusively to a H275Y mutation in the neuraminidase protein. However, many individuals infected with resistant strains had not received antivirals. Whether drug-resistant viruses are initially present as minor variants in untreated subjects before they emerge as the dominant strain in a virus population is of great importance for predicting the speed at which resistance will arise. To address this issue, we employed ultra-deep sequencing of viral populations from serial nasopharyngeal specimens from an immunocompromised child and from two individuals in a household outbreak. We observed that the Y275 mutation was present as a minor variant in infected hosts prior to onset of therapy. We also found evidence for the transmission of this drug-resistant variant alongside drug-sensitive viruses. These observations provide important information on the relative fitness of the Y275 mutation in the absence of oseltamivir.

The 2009 pandemic A/H1N1 influenza virus (A(H1N1)pdm09) emerged following reassortment between two swine viruses circulating in North America and Eurasia (Garten et al., 2009). Between 1 and 1.5% of A(H1N1)pdm09 strains analyzed to date have been found to be resistant to oseltamivir, a neuraminidase (NA) inhibitor that constitutes the current standard of care (Pizzorno et al., 2011a). Virtually all oseltamivir-resistant A(H1N1)pdm09 viruses contain an H275Y amino acid substitution in the viral NA gene (Pizzorno et al., 2011b). Among the drug-resistant strains recovered from immunocompetent patients, approximately one-third have been recovered from untreated individuals (WHO, 2011). Whether drug-resistant variants are initially present as minor variants in untreated subjects due to transmission from a host harboring a minority drug-resistant population, or whether they emerge following de novo replication, is of great importance for predicting the speed at which resistance will arise: the selection of resistant mutations will occur more rapidly if they are already present within hosts as pre-existing minor variants (Bonhoeffer and Nowak, 1997). In addition, the presence (or not) of the H275Y mutation in pre-treatment samples provides important information on the relative fitness of drug resistance mutations in the absence of oseltamivir.

To determine whether the H275Y mutation is present as a minor variant within hosts infected with influenza A virus, we performed ultra-deep sequencing of viral populations from nasopharyngeal specimens of two sets of individuals infected with A(H1N1)pdm09 viruses. First, we examined longitudinal samples collected from an immunocompromised child who remained infected for more than 6 weeks, during which time a drug-resistant strain came to dominate the virus population. Second, we analyzed the emergence of oseltamivir-resistant viruses in a household outbreak of A(H1N1)pdm09 infections in which the contact case developed influenza symptoms 24 hours after starting post-exposure oseltamivir prophylaxis (Baz et al., 2009).

Materials and Methods

Study 1: Immunocompromised Child

A 31-month-old boy weighing 13.4 kg, diagnosed three months earlier with medulloblastoma, was admitted on January 5, 2011, for consolidation chemotherapy in preparation for the first of 3 consecutive autologous bone marrow transplants (ABMT). On admission, the child presented with rhinorrhea and mild cough but was afebrile. Members of his immediate family, including his older sister and his father, had cold-like symptoms 1–2 weeks prior; none of the family members, including the patient, had received the 2010–11 influenza vaccine, the monovalent A(H1N1)pdm09 vaccine, or any antiviral drug. A nasopharyngeal aspirate (NPA) collected on admission was positive for the A(H1N1)pdm09 virus by real-time RT-PCR (Semret et al., 2009) and by viral culture on A549 and Mink lung cells. Treatment with oseltamivir (30 mg, twice daily) was started on January 6. The following day, the patient developed fever (max. 39.2°C), coincident with dropping neutrophil counts. The child received his first ABMT on January 10. NPA specimens collected throughout admission remained positive for A(H1N1)pdm09 influenza virus by RT-PCR (Table A4-1). Oseltamivir therapy was continued during the hospitalisation and after discharge on January 22. The patient was readmitted from January 27 to February 14, 2011, for his second ABMT. An NPA specimen collected on January 28 was positive for A(H1N1)pdm09 by RT-PCR. Because of persistent viral excretion, oseltamivir was replaced by zanamivir (25 mg inhaled four times daily) on February 1 and continued until negative RT-PCR results on February 17. The patient received a third ABMT on February 18, and he recovered from his influenza infection without complications.

TABLE A4-1. Virological testing of nasopharyngeal aspirates sampled from a young boy undergoing autologous bone marrow transplantation and infected with A(H1N1)pdm09 influenza. n.e. = not evaluated. CM1 and CM2 = Culture passages 1 and 2. S = Primary specimen (nasopharyngeal ...)

Study 2: Transmission in Household

A detailed description of the familial cluster of infections with A(H1N1)pdm09 virus has been reported elsewhere (Baz et al., 2009). Briefly, a 13-year-old asthmatic male developed infection with A(H1N1)pdm09 confirmed by RT-PCR testing of an NPA. The child was started on oseltamivir (60 mg twice daily for 5 days) and discharged home the same day. Concurrently with treatment of the index case, post-exposure oseltamivir prophylaxis (75 mg once daily for 10 days) was prescribed to the 59-year-old father, who had chronic obstructive pulmonary disease. Approximately 24 hours after beginning oseltamivir prophylaxis, the father developed influenza-like symptoms. On day 8 of oseltamivir prophylaxis, he consulted his general practitioner for persistent cough. An NPA collected at that time was positive by RT-PCR and by culture for A(H1N1)pdm09. The father had an uneventful clinical course, and an NPA sampled at the end of his illness was negative. The son’s A(H1N1)pdm09 isolate collected before oseltamivir therapy was susceptible to oseltamivir (50% inhibitory concentration or IC50: 0.27 nM), whereas the father’s A(H1N1)pdm09 isolate was highly resistant to oseltamivir (IC50 > 400 nM). The complete (consensus sequence) virus genome of the father (GenBank accession FN434454) differed by one amino acid substitution (H275Y) in the NA protein from the virus present in the son (GenBank FN434445).

Informed Consent

Written consent was obtained for report of the case described in Study 1. Samples used in Study 2 were obtained as part of an investigation of the Public Health Department of the Ministry of Health, Quebec, Canada.

Clinical Specimens and Viral Culture

In Study 1 (immunocompromised child), 7 NPAs were collected between January 5 and February 17, 2011, for RT-PCR testing (Table A4-1 and Figure A4-1). Viral isolates were also obtained by culture from NPAs sampled on January 5 and January 20. In Study 2 (household transmission), the NPA from the index case (son) was collected prior to oseltamivir treatment whereas the NPA from his father was obtained on day 8 of oseltamivir prophylaxis (Figure A4-1).

FIGURE A4-1. Outline of studies indicating day of onset, day when oseltamivir treatment was started, and sampling timeline. A) Study 1: Immunocompromised 31-month-old boy. B) Study 2: Son–father transmission.

NA Inhibition Assay

The drug resistance phenotype to NA inhibitors was determined by NA inhibition assays (Potier et al., 1979). The IC50 values were determined from the dose response curve. A virus was considered resistant to a drug if its IC50 value was 10-fold greater than that of the wild-type (WT) virus (Mishin et al., 2005).
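The 10-fold rule above can be written out as a short calculation. The sketch below is illustrative only: it treats "10-fold greater" as a fold change of at least 10 and uses the mean IC50 values reported later in this manuscript for the two Study 1 isolates, taking the January 5 isolate (99.9% WT) as the wild-type comparator. Both choices are assumptions made for the example, and the function names are hypothetical.

```python
def fold_change(ic50_test_nm, ic50_wt_nm):
    """Ratio of a test isolate's IC50 to the wild-type IC50 (both in nM)."""
    if ic50_wt_nm <= 0:
        raise ValueError("wild-type IC50 must be positive")
    return ic50_test_nm / ic50_wt_nm


def is_resistant(ic50_test_nm, ic50_wt_nm, threshold=10.0):
    """Assumed reading of the rule: resistant if the fold change is >= threshold."""
    return fold_change(ic50_test_nm, ic50_wt_nm) >= threshold


# Mean IC50 values (nM) quoted in the Results for Study 1.
# Using the January 5 isolate as the WT reference is an assumption of this sketch.
wild_type_ic50 = {"oseltamivir": 0.77, "zanamivir": 0.15, "peramivir": 0.05}
jan20_isolate_ic50 = {"oseltamivir": 556.75, "zanamivir": 0.22, "peramivir": 34.81}

resistance_calls = {
    drug: is_resistant(jan20_isolate_ic50[drug], wild_type_ic50[drug])
    for drug in wild_type_ic50
}

for drug, resistant in resistance_calls.items():
    ratio = fold_change(jan20_isolate_ic50[drug], wild_type_ic50[drug])
    print(f"{drug}: {ratio:.1f}-fold, resistant={resistant}")
```

Under these assumptions the calls agree with the phenotype described in the text: resistance to oseltamivir and peramivir, with zanamivir susceptibility retained.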

RNA Extraction

Total RNA was extracted from 200 μL of thawed specimen or culture using the MagNA Pure instrument and the MagNA Pure LC total nucleic acid isolation kit (Roche Applied Science) according to the manufacturer’s instructions and stored at −80°C.

Discriminative Real-time PCR Assay

To discriminate between WT and H275Y oseltamivir-resistant strains of A(H1N1)pdm09, a modified version of a previously reported real-time RT-PCR method (van der Vries et al., 2010) was used to test samples. This technique requires a reverse (panN1-H275-sense 5′-cagtcgaaatgaatgcccctaa-3′) and a forward (panN1-H275-antisense 5′-tgcacacacatgtgatttcactag-3′) primer for both the WT and the H275Y viruses and two labelled allele-specific probes: panN1-275H-probe (5′-ttaTCActAtgAggaatga-6-FAM/BHQ-1) and panN1-275Y-probe (5′-ttaTTActAtgAggaatga-HEX/BHQ-1). In the aforementioned probe sequences, locked nucleic acid (LNA) nucleotides are denoted in upper case and DNA nucleotides in lower case; the single nucleotide polymorphism (SNP) is the fifth nucleotide of each probe (C in the 275H probe, T in the 275Y probe). The limits of detection for the assay are 50 copies for the H275Y target and between 10 and 50 copies for the WT target. RT-PCR conditions are available upon request. Data acquisition was performed in both FAM and HEX filters during the annealing/extension step. Standard curves were constructed using 10-fold serial dilutions of pJET1.2-NA-Y275 and pJET1.2-NA-H275 plasmids.
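As a rough illustration of how plasmid standard curves of this kind are commonly used, the sketch below fits a line of threshold cycle (Ct) against log10 copy number for each probe, converts a sample's Ct values to copy numbers, and reports the percentage of H275Y. The manuscript does not describe its quantification procedure in this detail, so the approach, all Ct values, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the template copy number."""
    return 10 ** ((ct - intercept) / slope)


def percent_mutant(mutant_copies, wt_copies):
    """Share of H275Y among all quantified NA templates, in percent."""
    return 100.0 * mutant_copies / (mutant_copies + wt_copies)


# Hypothetical 10-fold serial dilutions of the two plasmids and their Ct values.
dilution_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
ct_wt_plasmid = [33.0, 29.7, 26.4, 23.1, 19.8]      # FAM channel (275H probe)
ct_mutant_plasmid = [33.5, 30.2, 26.9, 23.6, 20.3]  # HEX channel (275Y probe)

wt_curve = fit_standard_curve(dilution_copies, ct_wt_plasmid)
mutant_curve = fit_standard_curve(dilution_copies, ct_mutant_plasmid)

# Hypothetical Ct readings for one clinical sample.
sample_wt_copies = copies_from_ct(21.0, *wt_curve)
sample_mutant_copies = copies_from_ct(31.0, *mutant_curve)
print(f"H275Y: {percent_mutant(sample_mutant_copies, sample_wt_copies):.2f}%")
```

A separate curve per probe is used here because the two fluorophores need not share the same amplification efficiency or detection threshold.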

Sequencing and Analysis

RNA isolated from two cultured isolates and seven primary specimens collected for Study 1 (Figure A4-1A), and two primary specimens for Study 2 (Figure A4-1B), was subjected to a multisegment RT-PCR (M-RT-PCR) step (Zhou et al., 2009) and random priming with barcoding using the SISPA (sequence independent single primer amplification) protocol (Djikeng et al., 2008). For each RNA sample, we performed two M-RT-PCR reactions using the One Step Superscript III RT kit (Invitrogen). Reactions were purified independently using the Qiagen MinElute kit and quantitated on a Nanodrop spectrophotometer; 100–200 ng of each purified M-RT-PCR reaction was used in two separate SISPA reactions with two different barcode tags for a total of 4 tagged reactions per original RNA sample. Products were then separated on a 1% agarose gel and fragments from 200–400 bp purified with the Qiagen MinElute kit. Pooled samples were sent for paired-end (PE) library preparation and 100-base sequencing on the Illumina HiSeq 2000 platform.

The barcoded amplification products were sequenced on one lane of the sequence run. Analyses were performed to reduce the distortion caused by SISPA amplification, account for both PCR and sequencing errors, and provide a “clean” comparison between the mapped reads of the experimental samples. The trimmed reads were mapped to A/Quebec/144147/2009(H1N1) (GenBank accession FN434457-FN434464) using the bowtie short-read aligner (Langmead et al., 2009).

The frequency of each codon observed in the set of mapped reads from each amplification replicate was tabulated across each of the 10 influenza genes. To account for sequence-specific errors (Minoche et al., 2011; Nakamura et al., 2011), the variant counts for the forward and reverse direction reads were calculated separately, and only those variants for which counts were within 50% of each other in both directions were retained. For these summaries, the unique reads from all amplification replicates were pooled and total coverage is reported for each codon site. The proportion of codons expected to differ from the consensus due to background mutation and technical error was estimated from a separate cell culture of the PR8 strain that was otherwise processed in exactly the same manner as the specimens in this study. This proportion, found to be 0.00392, lies well outside of the 95% confidence interval for any variant codon in our study that is (a) represented by more than 4 sequence reads, and (b) found in at least 2% of all sequence reads mapped to that position. The lower limit of the 95% confidence interval determined by computing the inverse of the appropriate cumulative Beta distribution is 0.00813.
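The filtering criteria in this paragraph can be sketched as follows. The code is an illustration under stated assumptions, not the authors' pipeline: "within 50% of each other" is read as the smaller strand count being at least half the larger one, and the lower confidence limit is computed as a two-sided exact binomial (Clopper-Pearson) bound, which equals the alpha/2 quantile of a Beta(k, n - k + 1) distribution. The example counts are hypothetical and are not meant to reproduce the 0.00813 figure reported above.

```python
from math import exp, lgamma, log, log1p

BACKGROUND_ERROR = 0.00392  # proportion estimated from the PR8 control (from the text)


def strand_balanced(forward, reverse, min_ratio=0.5):
    """Assumed reading of 'within 50%': smaller count >= min_ratio * larger count."""
    larger = max(forward, reverse)
    if larger == 0:
        return False
    return min(forward, reverse) / larger >= min_ratio


def retain_variant(forward, reverse, coverage, min_reads=5, min_fraction=0.02):
    """Keep a variant codon seen in more than 4 reads and at least 2% of reads."""
    total = forward + reverse
    if coverage <= 0:
        return False
    return (strand_balanced(forward, reverse)
            and total >= min_reads
            and total / coverage >= min_fraction)


def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space; requires 0 < p < 1."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log1p(-p))
    return exp(log_pmf)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return max(0.0, 1.0 - sum(binomial_pmf(i, n, p) for i in range(k)))


def exact_lower_bound(k, n, alpha=0.05):
    """Lower Clopper-Pearson limit: solve P(X >= k | p) = alpha / 2 by bisection."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(60):
        mid = (lo + hi) / 2
        if binomial_upper_tail(k, n, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical borderline variant: 5 supporting reads out of 250 (2%).
lower = exact_lower_bound(5, 250)
print(f"lower 95% limit = {lower:.5f}, above background: {lower > BACKGROUND_ERROR}")
```

The point of the comparison is the one made in the text: even a variant that only just passes both thresholds has a confidence interval that excludes the background error proportion.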

Results

Presence of Drug-Resistant Viruses Before Drug Treatment in an Immunocompromised Child (Study 1)

The results of the NA gene H275Y discriminatory real-time RT-PCR assay performed on the seven primary specimens and the two viral isolates (January 5 and 20) are presented in Table A4-1. In the first NPA collected on January 5 (day 1), prior to antiviral therapy (initiated January 6), 99.9% of the viral population was WT at NA position 275 by our discriminatory assay. Nevertheless, a very small sub-population of H275Y mutant was also detectable (0.08%). The corresponding viral isolate (05-01-2011-CM2 in Table A4-1) contained 99.9% of WT virus and was susceptible to oseltamivir (IC50=0.77 nM ± 0.02), zanamivir (IC50=0.15 nM ± 0.02), and peramivir (IC50=0.05 nM ± 0.01). Notably, the H275Y mutation could not be detected by conventional RT-PCR and Sanger sequencing in the original sample. A second NPA collected on January 10 (day 6) also demonstrated a predominance of the WT population (99.8%). However, the proportion of the H275Y mutant detected in NPAs collected on January 17, 20, and 28 increased to 96.9, 95.9, and 83.5%, respectively, during continuous oseltamivir treatment. Furthermore, the second passage on Madin Darby canine kidney (MDCK) cells of the January 20 viral isolate (20-01-2011-CM2 in Table A4-1) resulted in 100% H275Y mutant population compared with 95.4% from the primary culture recovered from A549 and Mink lung cells. This viral isolate exhibited an IC50 value of 556.75 nM ± 61.32 for oseltamivir, 0.22 nM ± 0.01 for zanamivir, and 34.81 nM ± 5.77 for peramivir, which indicates a resistance phenotype to oseltamivir and peramivir. Antiviral therapy was changed to zanamivir on February 1. The February 8 sample contained a predominance of 88.5% of H275Y mutant virus, whereas the last NPA collected on February 17 was negative for A(H1N1)pdm09 by RT-PCR.
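The time course above can be summarised with a small calculation. The sketch below uses only the H275Y percentages quoted in this paragraph for primary specimens (January 10 is omitted because only the WT share is stated for that day) and finds the first sampled date on which the mutant exceeded 50%. The 50% cutoff and the function name are choices made for the example.

```python
from datetime import date

# Percent H275Y in primary NPAs, as quoted in the paragraph above.
h275y_percent = {
    date(2011, 1, 5): 0.08,
    date(2011, 1, 17): 96.9,
    date(2011, 1, 20): 95.9,
    date(2011, 1, 28): 83.5,
    date(2011, 2, 8): 88.5,
}
oseltamivir_start = date(2011, 1, 6)


def first_dominant_sample(series, threshold=50.0):
    """Earliest sampling date on which the variant exceeds the threshold, or None."""
    for day in sorted(series):
        if series[day] > threshold:
            return day
    return None


switch_day = first_dominant_sample(h275y_percent)
days_on_drug = (switch_day - oseltamivir_start).days
print(f"H275Y first dominant in the sample of {switch_day}, "
      f"{days_on_drug} days after oseltamivir was started")
```

Because sampling was sparse, this gives an upper bound on when the switch occurred: the mutant became dominant at some point between the January 10 and January 17 samples.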

A number of the primary specimens (January 5, 10, 17, 20, 28, and February 8; corresponding to samples 1–6 in Figure A4-1A) for which M-RT-PCR product could be generated, as well as the viral isolates, were subjected to deep sequencing to better evaluate the genetic diversity of the viral population, including the presence of drug-resistant mutants. Based on the average depth of coverage across each of the virus segments, we highlighted codons represented by at least 2% of the sequence reads covering each position (Table S1). This percentage is conservative enough that, even in low-coverage areas, it excludes potential sequence and PCR errors.

The positions on the NA and NS1 proteins that display evidence for the presence of minor variants at a frequency of 2% or above in more than one of the samples are shown in Figure A4-2. Similar patterns are observed for all other proteins (Table S1). Over time, the ratios of the minor variants to the dominant codon remain relatively stable, except for NA position 275 where a shift of H to Y is apparent on 17 January 2011. The ratios are similar to the ones observed in the real-time discriminatory RT-PCR assays for each of the samples tested (Table A4-1), although values across both assays are not identical. No other position on the NA protein appears to co-vary with the 275Y variant. The same pattern is observed in the culture isolates (05-01-2011-CM2 and 20-01-2011-CM1 in Table S2). However, position 153 in NS1 displays a similar switch, although involving a synonymous mutation (from codon GAG to GAA, for E (glutamic acid)). Hence, the sample from the original infection contained a drug-associated minor variant prior to the onset of drug treatment, and this minor variant differed from the dominant strain by only two nucleotide positions. Due to drug-associated selection pressure, this minor variant eventually became dominant in the host. The variant codons observed at the other positions are also possibly representative of other minor variants in the original virus population but, as they remained minor members of the viral population, they are unlikely to have a selective advantage.

FIGURE A4-2. Longitudinal study of variant codon prevalence across multiple time-points in an infected immunocompromised child. Ratios of major and minor codons are represented at each position where the variant codons appear in more than 2% of the deep sequence data ...

Evidence for Transmission of Drug-Resistant Viruses in Household (Study 2)

In a separate study, we observed a similar phenomenon in which oseltamivir resistance emerged quickly in the household contact (father) of an index case (son). Both family members were started on oseltamivir on the same day (Figure A4-1B): twice-daily treatment for the son and once-daily prophylaxis for the father. The latter developed influenza-like symptoms 24 hours after drug treatment was begun. Such a rapid clinical presentation suggests that he was already infected at the time prophylaxis was initiated, and that drug-resistant viruses were most likely already present.

We characterized the genetic diversity of the virus populations in both individuals by deep sequencing. An example for the HA and NA genes, where most of the variants seen in the son are also observed in the father, is shown in Figure A4-3. While the dominant viruses are drug-sensitive in the son and drug-resistant in the father, as shown by the switch from H275 to 275Y, it is striking that a minor population of viruses in the son already carries the drug resistance mutation; the minor drug-resistant variant residue 275Y is present in more than 2.4% of the reads in the son (and was not detected by conventional RT-PCR and Sanger sequencing). Hence, it is likely that viruses carrying this mutation were transmitted to the father along with drug-sensitive viruses, and became dominant in that individual following selection associated with a subtherapeutic (prophylactic) dose of oseltamivir.

FIGURE A4-3. Transmission study of variant codon prevalence compared between son and father specimens. Ratios of major and minor codons are represented at each position of the neuraminidase (NA) and hemagglutinin (HA) where the variant codons appear in more than 2% ...

Also of note was that the same minor variants were found in both the father and the son at 60 residue positions across all 10 viral proteins (Table S3). We estimate that there were 8 days of replication in the father from the time he was possibly infected by the son (assuming it occurred 24 hours before any symptoms) to the time the specimen was collected. Over that time, variant representation could have fluctuated such that the set of 60 variants seen in both samples is likely to underestimate the true number. While the number of conserved variants points to possible transmission, and the probability that the same variants could appear in both the son and the father by chance alone is extremely low, we do not have other potential contacts or index cases to test in order to confirm this observation.
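One way to picture the comparison described above is as a search for sites where both hosts carry the same mixture of codons. The sketch below is a hypothetical illustration, not the authors' method: it calls a site "shared" when at least two codons are each present in 2% or more of reads in both hosts. All codon fractions are invented for the example, apart from the use of CAC (His) and TAC (Tyr) at NA position 275.

```python
def codons_present(fractions, min_fraction=0.02):
    """Codons observed at or above the reporting threshold at one site."""
    return {codon for codon, f in fractions.items() if f >= min_fraction}


def shared_polymorphic_sites(host_a, host_b, min_fraction=0.02):
    """Sites where both hosts carry at least two of the same codons."""
    shared = set()
    for site in host_a.keys() & host_b.keys():
        codons_a = codons_present(host_a[site], min_fraction)
        codons_b = codons_present(host_b[site], min_fraction)
        if len(codons_a & codons_b) >= 2:
            shared.add(site)
    return shared


# Hypothetical codon fractions keyed by (protein, codon position).
son = {
    ("NA", 275): {"CAC": 0.975, "TAC": 0.025},   # H275 dominant, Y275 minor
    ("HA", 100): {"GGA": 0.90, "GGG": 0.10},
    ("NS1", 50): {"AAA": 0.95, "AAG": 0.05},
}
father = {
    ("NA", 275): {"CAC": 0.03, "TAC": 0.97},     # Y275 dominant, H275 minor
    ("HA", 100): {"GGA": 0.85, "GGG": 0.15},
    ("NS1", 50): {"AAA": 1.0},                   # variant absent in the father
}

shared_sites = shared_polymorphic_sites(son, father)
print(sorted(shared_sites))
```

Matching on the set of codons rather than on which codon is the minority keeps NA position 275 in the shared set even though the major and minor codons swap between son and father.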

Discussion

The most striking observation from both of these studies is that the mutation most commonly associated with resistance to oseltamivir (H275Y) is present in the viral population of some individuals prior to the onset of drug treatment. In addition, this minor drug-resistant population could not be revealed by conventional methods such as phenotypic resistance tests and Sanger sequencing. This observation is important for a number of reasons. First, the prior existence of Y275 means that the selection for drug resistance will proceed much more rapidly following the onset of drug selection pressure than if only wild-type viruses are present in the population, as there is no waiting time for the correct mutation to appear (Bonhoeffer and Nowak, 1997). Further, that the Y275 mutation is present in untreated hosts indicates that this mutation is not strongly deleterious in the absence of oseltamivir, and likely does not need compensatory mutations to enable its fixation (Hamelin et al., 2010; Memoli et al., 2010; Seibert et al., 2010). Indeed, in both cases studied here, we observed no amino acid changes that were fixed concordantly with Y275, and only a single synonymous mutation (in NS1) in the case of the immunocompromised child. In these circumstances, the pre-existence of Y275 means that oseltamivir resistance will likely spread rapidly as soon as there is drug selection pressure, especially in immunocompromised individuals and when suboptimal antiviral dosage is used.

If the Y275 mutation is present in individual hosts prior to the onset of drug treatment then it is also likely to have been transmitted between individuals as a minor variant. This in turn suggests that there may not often be a severe population bottleneck during the inter-host transmission of influenza virus. Indeed, mixed infections of multiple variants of influenza virus have been observed in both natural human infections (Ghedin et al., 2009; Ghedin et al., 2011; Pajak et al., 2011) and experimental animal infections (Murcia et al., 2010; Murcia et al., 2012), and hence may be commonplace. Co-infection with major and minor variants, captured by deep sequencing, was also observed during the course of human rhinovirus infections (Cordey et al., 2010), indicating that this phenomenon is not unique to influenza. In contrast, sequencing studies of HIV suggest that a small number of viral particles initiate infection, such that most variants are produced following replication within the newly infected host (Keele et al., 2008).

Such transmission of multiple variants is most clearly documented in the son-father case, where perhaps 60 mutational variants are passed between these two individuals, one of which confers oseltamivir resistance. However, the availability of only short sequence reads makes it impossible to determine the exact number of distinct viral haplotypes these correspond to. In addition, our sampling protocol in the son-father transmission case dictates that we cannot exclude that there was rapid selection of oseltamivir resistance in the son after we sampled his virus population, such that a majority Y275 population was in fact transmitted to the father. However, this would entail extremely rapid selection for resistance and does not change the central observation that multiple variants are transmitted between hosts as both H275 and Y275 are found in the father.

That the Y275 mutation is present in the son prior to oseltamivir treatment and so soon after symptom onset suggests that this resistance mutation was also present in the viral population initially transmitted to the son. Similarly, the presence of Y275 in the immunocompromised child suggests that this mutation may have been transmitted to the child in a mixed infection containing both drug-sensitive and drug-resistant variants, although it cannot be excluded that the variant appeared de novo. If Y275 is indeed present in the founding population in both individuals, then it is possible that this mutation is present as a low-frequency variant in many individuals infected with A(H1N1)pdm09, and that its presence reflects the combined action of past selection for drug resistance in patients receiving oseltamivir, incomplete reversion to the wild-type H275 residue in patients who are not on the drug, and a lack of strongly deleterious fitness effects in the absence of drug. The large-scale ultra-deep sequencing of additional A(H1N1)pdm09 patients who have not received oseltamivir will clearly be central to answering this question.

Next generation ultra-deep sequencing of intra-host viral populations such as that undertaken here promises to transform our understanding of the evolution of drug resistance in acute viral infections, allowing the dissection of the mutational spectrum at an unprecedented level of precision. Indeed, it is striking that in the two cases conventional RT-PCR failed to detect the presence of oseltamivir resistance even though Y275 was present in the viral population. However, despite its undoubted potential, ultra-deep sequencing also comes with a number of inherent analytical difficulties. First, because the sequencing protocol leads to the generation of short sequence reads, nucleotide positions cannot be linked either within or among individual genes except if they are close enough to appear on the same sequence read, or if they have the same pattern of prevalence. More fundamentally, it is critical to ensure that minor genetic variants are not the result of PCR/sequencing artefacts. Amplification leads to the well-known problem of “PCR duplicates,” sometimes resulting in severe distortion to the observed proportions of true variant subpopulations and the possible creation of false variant sequences through PCR errors. To address these problems, each specimen from our study was amplified in four independent reactions using different barcodes, allowing us to track amplification products and their respective sequence reads. Future work will employ a simpler and more cost-effective approach using modified primers that include unique tags for each template (Jabara et al., 2011).


  • Baz M, Abed Y, Papenburg J, Bouhy X, Hamelin ME, Boivin G. Emergence of oseltamivir-resistant pandemic H1N1 virus during prophylaxis. New England Journal of Medicine. 2009;361(23):2296–2297. [PubMed: 19907034]
  • Bonhoeffer S, Nowak MA. Pre-existence and emergence of drug resistance in HIV-1 infection. Proceedings: Biological Sciences. 1997;264(1382):631–637. [PMC free article: PMC1688415] [PubMed: 9178534]
  • Cordey S, Junier T, Gerlach D, Gobbini F, Farinelli L, Zdobnov EM, Winther B, Tapparel C, Kaiser L. Rhinovirus genome evolution during experimental human infection. PLoS One. 2010;5(5):e10588. [PMC free article: PMC2868056] [PubMed: 20485673]
  • Djikeng A, Halpin R, Kuzmickas R, Depasse J, Feldblyum J, Sengamalay N, Afonso C, Zhang X, Anderson NG, Ghedin E, Spiro DJ. Viral genome sequencing by random priming methods. BMC Genomics. 2008;9:5. [PMC free article: PMC2254600] [PubMed: 18179705]
  • Garten RJ, Davis CT, Russell CA, Shu B, Lindstrom S, Balish A, Sessions WM, Xu X, Skepner E, Deyde V, Okomo-Adhiambo M, Gubareva L, Barnes J, Smith CB, Emery SL, Hillman MJ, Rivailler P, Smagala J, de Graaf M, Burke DF, Fouchier RA, Pappas C, Alpuche-Aranda CM, Lopez-Gatell H, Olivera H, Lopez I, Myers CA, Faix D, Blair PJ, Yu C, Keene KM, Dotson PD Jr, Boxrud D, Sambol AR, Abid SH, St George K, Bannerman T, Moore AL, Stringer DJ, Blevins P, Demmler-Harrison GJ, Ginsberg M, Kriner P, Waterman S, Smole S, Guevara HF, Belongia EA, Clark PA, Beatrice ST, Donis R, Katz J, Finelli L, Bridges CB, Shaw M, Jernigan DB, Uyeki TM, Smith DJ, Klimov AI, Cox NJ. Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science. 2009;325(5937):197–201. [PMC free article: PMC3250984] [PubMed: 19465683]
  • Ghedin E, Fitch A, Boyne A, Griesemer S, DePasse J, Bera J, Zhang X, Halpin RA, Smit M, Jennings L, St George K, Holmes EC, Spiro DJ. Mixed infection and the genesis of influenza virus diversity. Journal of Virology. 2009;83(17):8832–8841. [PMC free article: PMC2738154] [PubMed: 19553313]
  • Ghedin E, Laplante J, DePasse J, Wentworth DE, Santos RP, Lepow ML, Porter J, Stellrecht K, Lin X, Operario D, Griesemer S, Fitch A, Halpin RA, Stockwell TB, Spiro DJ, Holmes EC, St George K. Deep sequencing reveals mixed infection with 2009 pandemic influenza A (H1N1) virus strains and the emergence of oseltamivir resistance. Journal of Infectious Diseases. 2011;203(2):168–174. [PMC free article: PMC3071067] [PubMed: 21288815]
  • Hamelin ME, Baz M, Abed Y, Couture C, Joubert P, Beaulieu E, Bellerose N, Plante M, Mallett C, Schumer G, Kobinger GP, Boivin G. Oseltamivir-resistant pandemic A/H1N1 virus is as virulent as its wild-type counterpart in mice and ferrets. PLoS Pathogens. 2010;6(7):e1001015. [PMC free article: PMC2908621] [PubMed: 20661429]
  • Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(50):20166–20171. [PMC free article: PMC3250168] [PubMed: 22135472]
  • Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, Shaw GM. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(21):7552–7557. [PMC free article: PMC2387184] [PubMed: 18490657]
  • Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10(3):R25. [PMC free article: PMC2690996] [PubMed: 19261174]
  • Memoli MJ, Hrabal RJ, Hassantoufighi A, Jagger BW, Sheng ZM, Eichelberger MC, Taubenberger JK. Rapid selection of a transmissible multidrug-resistant influenza A/H3N2 virus in an immunocompromised host. Journal of Infectious Diseases. 2010;201(9):1397–1403. [PMC free article: PMC2851491] [PubMed: 20350163]
  • Minoche AE, Dohm JC, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biology. 2011;12(11):R112. [PMC free article: PMC3334598] [PubMed: 22067484]
  • Mishin VP, Hayden FG, Gubareva LV. Susceptibilities of antiviral-resistant influenza viruses to novel neuraminidase inhibitors. Antimicrobial Agents and Chemotherapy. 2005;49(11):4515–4520. [PMC free article: PMC1280118] [PubMed: 16251290]
  • Murcia PR, Baillie GJ, Daly J, Elton D, Jervis C, Mumford JA, Newton R, Parrish CR, Hoelzer K, Dougan G, Parkhill J, Lennard N, Ormond D, Moule S, Whitwham A, McCauley JW, McKinley TJ, Holmes EC, Grenfell BT, Wood JL. Intra- and interhost evolutionary dynamics of equine influenza virus. Journal of Virology. 2010;84(14):6943–6954. [PMC free article: PMC2898244] [PubMed: 20444896]
  • Murcia PR, Hughes J, Battista P, Lloyd L, Baillie GJ, Ramirez-Gonzalez RH, Ormond D, Oliver K, Elton D, Mumford JA, Caccamo M, Kellam P, Grenfell BT, Holmes EC, Wood JLN. Evolution of an Eurasian avian-like influenza virus in naïve and vaccinated pigs. PLoS Pathogens. 2012;8(5):e1002730. [PMC free article: PMC3364949] [PubMed: 22693449]
  • Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research. 2011;39(13):e90. [PMC free article: PMC3141275] [PubMed: 21576222]
  • Pajak B, Stefanska I, Lepek K, Donevski S, Romanowska M, Szeliga M, Brydak LB, Szewczyk B, Kucharczyk K. Rapid differentiation of mixed influenza A/H1N1 virus infections with seasonal and pandemic variants by multitemperature single-stranded conformational polymorphism analysis. Journal of Clinical Microbiology. 2011;49(6):2216–2221. [PMC free article: PMC3122755] [PubMed: 21471335]
  • Pizzorno A, Abed Y, Boivin G. Influenza drug resistance. Seminars in Respiratory and Critical Care Medicine. 2011a;32(4):409–422. [PubMed: 21858746]
  • Pizzorno A, Bouhy X, Abed Y, Boivin G. Generation and characterization of recombinant pandemic influenza A(H1N1) viruses resistant to neuraminidase inhibitors. Journal of Infectious Diseases. 2011b;203(1):25–31. [PMC free article: PMC3086433] [PubMed: 21148493]
  • Potier M, Mameli L, Belisle M, Dallaire L, Melancon SB. Fluorometric assay of neuraminidase with a sodium (4-methylumbelliferyl-alpha-D-N-acetylneuraminate) substrate. Analytical Biochemistry. 1979;94(2):287–296. [PubMed: 464297]
  • Seibert CW, Kaminski M, Philipp J, Rubbenstroth D, Albrecht RA, Schwalm F, Stertz S, Medina RA, Kochs G, Garcia-Sastre A, Staeheli P, Palese P. Oseltamivir-resistant variants of the 2009 pandemic H1N1 influenza A virus are not attenuated in the guinea pig and ferret transmission models. Journal of Virology. 2010;84(21):11219–11226. [PMC free article: PMC2953187] [PubMed: 20739532]
  • Semret M, Fenn S, Charest H, McDonald J, Frenette C, Loo V. A real-time RT-PCR assay for detection of influenza H1N1 (swine-type) and other respiratory viruses; Paper presented at 26th International Congress of Chemotherapy and Infection; Toronto, ON. Jun 18–21, 2009.
  • van der Vries E, Jonges M, Herfst S, Maaskant J, Van der Linden A, Guldemeester J, Aron GI, Bestebroer TM, Koopmans M, Meijer A, Fouchier RA, Osterhaus AD, Boucher CA, Schutten M. Evaluation of a rapid molecular algorithm for detection of pandemic influenza A (H1N1) 2009 virus and screening for a key oseltamivir resistance (H275Y) substitution in neuraminidase. Journal of Clinical Virology. 2010;47(1):34–37. [PubMed: 19857993]
  • WHO. Weekly update on oseltamivir resistance in influenza A(H1N1)2009 viruses. 2011.
  • Zhou B, Donnelly ME, Scholes DT, St George K, Hatta M, Kawaoka Y, Wentworth DE. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and swine origin human influenza A viruses. Journal of Virology. 2009;83(19):10309–10313. [PMC free article: PMC2748056] [PubMed: 19605485]


,14 ,15 ,16,17 and 14,18.

14 Argonne National Laboratory, Institute for Genomic and Systems Biology, Argonne, IL, USA.
15 Department of Surgery, The University of Chicago Medical Center, Chicago, IL, USA.
16 Department of Civil, Architectural and Environmental Engineering, The University of Texas at Austin, Austin, TX, USA.
17 Department of Civil Engineering, The University of Toronto, Toronto, ON, Canada.
18 Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA.

Milestones in Home and Hospital Microbiome Research

The populations of developed nations spend approximately 90 percent of their time indoors (Moschandreas, 1981), leading scientists and the public alike to take an interest in the microbial communities that share these spaces with us. This is especially true in healthcare environments, where hospital-acquired infections (HAIs) have long been among the leading causes of patient deaths (Anderson and Smith, 2005; Groseclose et al., 2004; Hall-Baker et al., 2010; Klevens et al., 2007). The first study of airborne pathogens in a hospital can be attributed to Bourdillon and Colebrook, who, in 1946, investigated the concentration of bacteria present in the air of a surgical changing room (Bourdillon and Colebrook, 1946). Their findings, and the findings of similar studies published over the following two decades, revealed levels of airborne bacteria that were cause for concern (Blowers and Wallace, 1960; Colebrook and Cawston, 1948; Cvjetanović, 1957; Greene et al., 1962a,b; Warner and Glassco, 1963) and prompted a rethinking of ventilation designs for hospitals.

The air-sampling techniques developed for hospitals were soon applied to other indoor spaces including subway trains (Williams and Hirch, 1950), classrooms (Williams et al., 1956), movie theaters (Cvjetanović, 1957), and apartments (Simard et al., 1983). Articles by Finch and colleagues in 1978 and Scott and colleagues in 1982 complemented these air-based studies with the first characterizations of bacteria living on bathroom and kitchen surfaces (Finch et al., 1978; Scott et al., 1982). The larger of the two studies, conducted by Scott and colleagues, examined 60 locations in 251 homes, and agreed well with the conclusions from an earlier study by Finch and colleagues of 21 homes that the dominant species on the studied surfaces were enterobacteria, Pseudomonads, micrococci, Bacillus, and Aeromonas hydrophila, with a lower incidence of Salmonella, Staphylococcus aureus, and Bacillus cereus.

Current literature regarding the relationship between the indoor environment and humans primarily explores the development of fungal contamination on damp surfaces (Hyvarinen et al., 2002; Jaffal et al., 1997; Lignell et al., 2008; Nevalainen and Seuri, 2005), the role of hygiene in removing microbial communities (Bright et al., 2009; Grice and Segre, 2011), and the length of time microbes can survive on surfaces (Kramer et al., 2006). A number of studies have explored the microbial diversity of communities associated with dust (Pitkäranta et al., 2008; Rintala et al., 2008; Sebastian and Larsson, 2003) and air (Huttunen et al., 2008; Tringe et al., 2008). One study of the temporal succession of microbial communities in indoor dust found seasonal patterns that were building specific, probably as a result of skin cells shed from inhabitants within the buildings (Rintala et al., 2008). These existing studies demonstrate fundamental principles regarding experimental design, specifically the types of environmental conditions that need to be monitored (e.g., surface material, moisture, HVAC system), and support the observation that the architectural design of an indoor space influences the potential community structure and hence human health (Guenther and Vittori, 2008). The influence of air ventilation and the number of people in a space must be explored with regard to the impact on microbial community structure (Hospodsky et al., 2012; Kembel et al., 2012; Qian and Li, 2010; Qian et al., 2012). Additionally, the variability associated with body sites (Costello et al., 2009; Fierer et al., 2010; Grice et al., 2009) will have a major impact on the interpretation of the analyses, because body sites differ in how they interact with surfaces. The most diverse skin sites are the driest areas, whose communities are hence less likely to be transferred to a surface with sebaceous exudates. This will affect the time that the microbial community maintains structural cohesion with reference to relative abundance of members on a surface (Kramer et al., 2006).

Studies of indoor microbiology are highly relevant today, because concern for protecting ourselves from microbial pathogens is ever-present. Studies in this field do much to put this threat in perspective by identifying incorrect preconceptions. For instance, using cleaning products containing the antibacterial agent triclosan over the course of a year does not result in an increase in antimicrobial drug-resistant bacteria in homes (Aiello et al., 2005; Cole et al., 2003). Several research groups have studied the beneficial effect of childhood exposure to dirty environments, which is particularly pronounced among children who live on farms, who are less likely to later develop asthma and other respiratory problems (Adler et al., 2005; Alfvén et al., 2006; Klintberg et al., 2001; Leynaert et al., 2001; Merchant et al., 2005; Remes et al., 2005; Riedler et al., 2001; Schram et al., 2005). Hospitals have found that by opening the windows in patient rooms, the percentage of potentially pathogenic microbes in the air is significantly reduced (Escombe et al., 2007; Kembel et al., 2012). These are just a few examples of a larger trend toward questioning the culture of cleanliness, or, as it has come to be called, the hygiene hypothesis (Bloomfield et al., 2006; Martinez, 2001; Rook, 2009; Rook and Stanford, 1998; Yazdanbakhsh and Matricardi, 2004). The study of pathogens in indoor environments is valuable, but perhaps even more valuable is gaining a better understanding of the competition between pathogenic and non-pathogenic microorganisms and how we might shift that balance in our favor.

A New Scientific Community

The expansion of indoor microbiome research from healthcare environments into homes and offices has been driven in large part by funding initiatives by the Alfred P. Sloan Foundation. This private agency stipulates a high degree of collaboration between its grantees and as a result has brought about the development of Internet portals designed to enable any indoor microbiome researcher to exchange raw data and results with other scientists as well as with reporters and the general public. One such nexus, the Microbiology of the Built Environment Network website,19 tracks investigators, projects, publications, computational resources, protocols, standards, press releases, social media, conferences, and workshops relating to indoor microbiology. A community data archive is hosted by the Microbiome of the Built Environment Data Analysis Core (MoBEDAC), which integrates with data analysis tools including visualization and analysis of microbial population structures (VAMPS), quantitative insights into microbial ecology (QIIME), meta-genome rapid annotation using system technology (MG-RAST), and FungiDB (Caporaso et al., 2010; Meyer et al., 2008; Stajich et al., 2011).

While the Microbiology of the Built Environment Network website provides general information applicable to a wide range of indoor microbiology projects, working groups for specialists in this field have also emerged. The Berkeley Indoor Microbial Ecology Research Consortium (BIMERC) focuses on identifying the source populations and human influences on the microbial components of indoor air. Discovering the mechanisms and rates with which microbial communities spread throughout healthcare facilities is the goal of the Hospital Microbiome Consortium.20 The Biology and the Built Environment Center at the University of Oregon21 has begun training students in the investigation of how architecture can influence the structure of indoor microbiomes.

Indoor Microbiology Without Culturing?

Early studies of the indoor microbiome relied heavily on culture-based methods: agar plates were commonly pressed directly against the surface of interest, and the resultant colonies were counted and microscopically characterized in order to establish quantitative and taxonomic abundance metrics. However, with the introduction of next-generation sequencing technologies, rapid, high-throughput characterization of taxonomic marker genes (e.g., 16S/18S rRNA) and whole genomic DNA from environmental samples is now financially viable, altering the landscape of tools available to the researcher interested in the indoor microbiome.

The Pros and Cons of High-Throughput Sequencing

Since 2007, high-throughput sequencing technologies offered by Illumina, Roche, and ABI have enabled researchers to quickly and inexpensively profile the relative abundances of taxonomic groups in samples of environmental microbial communities. This approach to microbial ecology offers three advantages over culture-based methods. First, uncultivable species can be identified, thereby providing a more complete characterization of the microbial community. Second, organisms can be systematically classified by computer-aided alignment of DNA sequences to reference genes and genomes. And lastly, this entire process is relatively easily scalable from tens of samples to tens of thousands of samples.

Classical culturing, however, still yields important information that cannot be attained through high-throughput DNA sequencing of ribosomal genes. Measuring the absolute abundance of colony-forming units is very straightforward when working with agar press-plates, but it is a difficult metric to attain from nanogram quantities of DNA extracted from environmental samples. Growth of colonies on plates also provides concrete evidence that the cell taken from the environment was viable. In addition, sequencing ribosomal genes does not provide information on whether the detected microbial species harbor genetic cassettes encoding antibiotic-resistance genes, an important factor in evaluating the pathogenicity of microorganisms. The ability to retain bacterial colonies for further studies is a third advantage of plate-based culturing, allowing one to subject any detected species to thorough examination. In light of these considerations, a study seeking to characterize both currently uncultivable microorganisms and antibiotic-resistant human pathogens would need to draw on both classical and next-generation microbial community analysis techniques.

Microarrays for Rapid Identification of Antibiotic Resistance

A new approach to rapidly screening microbial communities for multiple antibiotic resistance markers has recently been developed by Taitt et al. (2012). Their Antimicrobial Resistance Determinant Microarray chips are designed to detect more than 250 resistance genes covering 12 classes of antibiotics and have been shown to be compatible with low concentrations of DNA extracted from swabs (personal communication). Advances such as this are quickly closing the gap between the high-throughput sequencing technologies and classical culture-based phenotyping.

Sample and Metadata Collection

The selection of sampling locations and environmental parameters to monitor is fundamental to any microbiome project. Balancing the comprehensiveness of an investigation against logistical constraints has led to studies that examine highly specific aspects of the indoor microbiome (Hilton and Austin, 2000; Kelley et al., 2004; Kembel et al., 2012; Kopperud et al., 2004; Krogulski and Szczotko, 2011; Tang, 2009; Wiener-Well et al., 2011). As might be expected, the sampling locations that these studies chose are those with which humans most often come into contact on a daily basis. Lists of these locations for homes (Table A5-1) and hospitals (Table A5-2) are provided below.

TABLE A5-1. Home High-Touch Surfaces and Bacterial Reservoirs.



TABLE A5-2. Hospital High-Touch Surfaces and Bacterial Reservoirs.




The air within built environments is arguably one of the most important media to consider when investigating the interaction between humans and the indoor microbiome. Air inside buildings is biologically distinct from outdoor air, containing a greater proportion of human-associated microflora shed by its occupants (Bouillard et al., 2005; Clark, 2009; Fox et al., 2010; Hospodsky et al., 2012; Korves et al., 2012; Noble et al., 1976; Noris et al., 2011; Qian et al., 2012; Rintala et al., 2008; Täubel et al., 2009). Low air exchange rates and recycling of air for conditioning purposes can exacerbate the negative effects of aerosolized microorganisms that affect human health through pathogenic, toxic, and/or allergic properties (D’Amato et al., 2005; Monto, 2002; Peccia et al., 2008; Pope et al., 1993).

Airborne particles are commonly collected using one of four methods: settle plates, impactors, impingers, and filtration, each of which offers differing advantages and efficiencies (Fahlgren et al., 2010; Griffin et al., 2010; Morrow et al., 2012). Settle plates offer a silent and inexpensive option for enumerating colony-forming units deposited by gravity onto agar petri dishes. Settle plates will preferentially sample larger particles, because they are more likely to be deposited by gravitational settling. Impactors increase the rate and control of particle deposition by accelerating air in an arc relative to the agar surface, utilizing centrifugal forces to select for a specific range of particle masses. The same design principle is used by impingers, where the deposition medium is liquid rather than solid. However, the mechanical stresses introduced by impactors and impingers can rupture cellular membranes, thereby reducing culturing viability. Filtration of air through a porous membrane is less mechanically stressful to cells, but may result in desiccation. Although active samplers (impactors, impingers, and filtering devices) offer the added benefit of providing a quantitative accounting of the volume of air sampled, the noise generated by the unit’s pump or fan may preclude their use in occupied buildings. Filters from central HVAC systems have also been used in lieu of portable sampling units (Bonetta et al., 2009; Drudge et al., 2011; Farnsworth et al., 2006; Hospodsky et al., 2012; Korves et al., 2012; Noris et al., 2009, 2011; Stanley et al., 2008).
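Because active samplers record the volume of air drawn, plate counts can be normalized to an airborne concentration. The following is a minimal sketch of that arithmetic, not a procedure taken from the studies cited above; the function name, the flow rate, and the sampling duration are illustrative assumptions, and corrections that some impactor protocols apply to raw counts are deliberately omitted.

```python
def cfu_per_cubic_meter(colony_count, flow_l_per_min, minutes):
    """Convert a plate count from an active sampler into CFU per cubic meter.

    All arguments are hypothetical inputs: the number of colonies counted,
    the sampler flow rate in liters per minute, and the run time in minutes.
    """
    volume_m3 = flow_l_per_min * minutes / 1000.0  # 1000 liters per cubic meter
    if volume_m3 <= 0:
        raise ValueError("sampled air volume must be positive")
    return colony_count / volume_m3


# Illustrative values only: 42 colonies after 10 minutes at 28.3 L/min.
concentration = cfu_per_cubic_meter(42, flow_l_per_min=28.3, minutes=10)
print(f"{concentration:.0f} CFU per cubic meter")  # about 148
```

Settle plates cannot be normalized this way, because the volume of air that contributed to deposition is unknown.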


Municipal water supplies have long been known to contain biofilm-forming and planktonic microorganisms, including Mycobacterium and Legionella (Angenent, 2005; Boe-Hansen et al., 2002; du Moulin et al., 1988; Embil et al., 1997; Falkinham et al., 2001; Le Dantec et al., 2002; Lee et al., 1988; Leoni et al., 1999, 2001; Thomas et al., 2006; Vaerewijck et al., 2001, 2005). Tap water therefore may be an important source of microbes in built environments. Cell counts can be attained visually by microscopy using non-specific DNA stains such as SYBR Gold, or in an automated fashion with flow cytometry. Both methods can be adapted to fluorescent in situ hybridization (FISH) analysis, in which taxon-specific fluorescent probes replace or complement non-specific DNA stains. The particle size of aggregated cells can also be used to determine if cells are biofilm-originating or planktonic. Because biofilms may form on faucet heads, sampling strategies may opt to collect water samples as soon as water begins flowing from the tap (to favor collection of the tap’s biofilms) and/or wait until water has been flowing through the tap for several minutes (to measure systemic contaminants). Collection of both hot and cold tap water samples is crucial, because the different water temperatures can have an effect on the microorganisms that are able to persist in these water systems.

External Factors Influencing the Indoor Microbiome

Microbial community structure is highly habitat dependent. Therefore, the collection of metadata is fundamental to any project seeking to characterize the microbial community composition and structure. Table A5-3 lists parameters that may have an influence on microbial populations and communities. Many of these factors, such as temperature, relative humidity, and brightness, fluctuate constantly. For these, one might consider measuring continuously and recording not only the value at the time of sampling but also the recent highs, lows, and averages for the sampled location, which can be useful in generating a comprehensive site analysis.
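One way to attach recent highs, lows, and averages to a sample is to summarize the logger readings in a fixed window before the sampling time. The sketch below assumes readings arrive as (timestamp, value) pairs and uses a 24-hour window; the function name, the window length, and the example temperature log are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def summarize_window(readings, sample_time, window_hours=24):
    """Return low, high, and mean of readings in the window before sampling.

    readings: iterable of (datetime, float) pairs from a data logger.
    Returns None when no reading falls inside the window.
    """
    start = sample_time - timedelta(hours=window_hours)
    values = [value for stamp, value in readings if start <= stamp <= sample_time]
    if not values:
        return None
    return {"low": min(values), "high": max(values), "mean": mean(values)}


# Hypothetical hourly temperature log in degrees Celsius.
t0 = datetime(2012, 6, 7, 0, 0)
log = [(t0 + timedelta(hours=h), 20.0 + (h % 4)) for h in range(48)]
summary = summarize_window(log, sample_time=t0 + timedelta(hours=47))
print(summary)
```

The same summary can be stored alongside the instantaneous reading in the sample metadata, so that later analyses can test either value.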

TABLE A5-3. Environmental Parameters.



To facilitate the adoption of a consistent metadata ontology for these types of measurements, a Minimum Information about any Sequence (MIxS) standard specific to the built environment (MIxS-BE) was presented to the Genomic Standards Consortium (GSC) on March 7, 2012 (Gilbert et al., 2012). At the time of this writing, the MIxS-BE standard is available as a working draft on the Microbiology of the Built Environment website in order to solicit feedback and provide direction to early adopters. The MIxS GSC standards have been integral to generating comparable results among different research centers, with the existing GSC standards for genomics (MIGS), metagenomics (MIMS), and genetic markers (MIMARKS) (Field et al., 2008; Kottmann et al., 2008; Yilmaz et al., 2011) having enabled tens of thousands of environmental samples from dozens of laboratories and hundreds of geographic locations to be included in the Earth Microbiome Project (Gilbert et al., 2010a,b, 2011).

Automated Monitoring

Many of the parameters in Table A5-3 can be automatically measured and recorded at regular intervals by specialized data loggers designed for this purpose. Temperature and humidity monitors are relatively inexpensive, whereas devices for assessing air exchange rate, fraction of recirculated air, HVAC system flows, and occupancy and activities to a high level of precision can be a significant portion of a research budget and can require substantial post-processing. In selecting monitoring equipment, it is also important to consider the options of using battery versus electrical outlet power, and whether to store data locally on memory cards versus transmitting readings off site. In hospital settings, it is often necessary to obtain permission from technical administrators before installing wireless transmitters, because such devices may interfere with sensitive medical equipment.

Personnel- and asset-tracking infrastructure can also be of immense value to a study of microbial communities in an environment where human movement is hypothesized to be a driving factor in the introduction of new microbial species to surfaces and airborne particles. Through a combination of uniquely identifiable radio-frequency identification (RFID) tags worn by hospital occupants and RFID sensors placed throughout the building, such a system can continuously monitor the location of personnel, thereby providing time-stamped information on person-to-person and person-to-room interactions. From these data, one can examine the connection between movement of staff between rooms and the movement of bacterial populations between rooms. Furthermore, by combining the observed number of occupants per room with the air exchange rate, it is possible to estimate the CO2 and airborne microbe concentrations at each point in time. RFID systems are also commonly used to track equipment and can therefore provide information regarding which equipment, such as IVs and dialysis machines, is shared among patients. These data enable researchers to observe not only microbial communities over time, but also the influence of human interaction with those communities.
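As an illustration of how occupancy and air exchange rate can be combined, the sketch below uses a single-zone, well-mixed mass balance, which is an assumption introduced here rather than a model described in this article. The per-person CO2 generation rate, the outdoor concentration, and the room dimensions are placeholder values, and the function name is hypothetical.

```python
import math


def co2_ppm(occupants, air_changes_per_hour, room_volume_m3, hours,
            start_ppm=400.0, outdoor_ppm=400.0, gen_m3_per_hour=0.018):
    """Estimate indoor CO2 (ppm) in a well-mixed room after a period of constant occupancy.

    Assumed model: dC/dt = N*G/V - ach * (C - C_outdoor), with constant inputs.
    gen_m3_per_hour is an assumed per-person CO2 generation rate.
    """
    # Steady-state concentration reached if occupancy stays constant.
    steady = outdoor_ppm + 1e6 * occupants * gen_m3_per_hour / (
        air_changes_per_hour * room_volume_m3)
    # Exponential approach from the starting concentration to steady state.
    return steady + (start_ppm - steady) * math.exp(-air_changes_per_hour * hours)


# Hypothetical patient room: 2 occupants, 2 air changes per hour, 50 cubic meters.
for elapsed in (0.0, 0.5, 1.0, 4.0):
    print(elapsed, round(co2_ppm(2, 2.0, 50.0, elapsed), 1))
```

An airborne microbe estimate would follow the same form with a per-person emission rate in place of the CO2 generation rate, plus loss terms such as settling and filtration that this sketch does not include.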

Seasonal changes in outdoor humidity and temperature have previously been found to influence the composition of microbial communities in indoor environments (Augustowska and Dutkiewicz, 2006; Eber et al., 2011; Kaarakainen et al., 2009; Park et al., 2000; Pitkäranta et al., 2008; Rintala et al., 2008, 2012; Yamada, 2007). Therefore, the recording of meteorological conditions is an important aspect of indoor microbiome studies. This can be accomplished by retrieving publicly available National Oceanic and Atmospheric Administration records through the National Climatic Data Center website. This collection offers hourly measurements from a network of temperature, humidity, pressure, wind velocity, and precipitation sensors gathered from automated weather monitoring stations throughout the United States.

Special Considerations

Effect of Cleaning

Cleaning practices in indoor environments are inherently designed to affect the resident microbiota; therefore, it is important to take note of the strategies used to disinfect the environments under study and the time-points at which cleaning regimens were conducted. In previous studies, the effects of cleaning were found to be highly dependent upon the cleaning products used (Barker et al., 2004; Exner et al., 2004; Josephson et al., 1997; Marshall et al., 2012; Rusin et al., 1998; Rutala et al., 2000; Scott et al., 1984): antimicrobial agents, bleach, ethanol, peroxide, and Lysol were much more effective at sterilizing a surface than surfactants, detergents, vinegar, ammonia, or baking soda. Although there has been speculation that households using antimicrobial cleaning products may select for antibiotic-resistant bacteria, randomized studies investigating this hypothesis did not observe differences in bacterial population structure or antibiotic resistance in response to antimicrobial cleaning products (Aiello et al., 2005; Cole et al., 2003).

Healthcare Facility Sampling

Prior studies have identified numerous hospital-associated pathogens (HAPs) as well as routes of transmission between patients, staff, equipment, surfaces, and recycled air. HAPs that are of particular relevance are coagulase-negative staphylococci, Staphylococcus aureus, Enterococcus species, Candida species, Escherichia coli, Pseudomonas aeruginosa, Klebsiella pneumoniae, Enterobacter species, Acinetobacter baumannii, and Klebsiella oxytoca, which have been previously found to collectively account for 84 percent of HAIs over a 21-month period in 463 hospitals (Hidron et al., 2008). These bacteria have been found on physicians’ and nursing staff’s clothing (Babb et al., 1983; Biljan et al., 1993; Loh et al., 2000; Lopez et al., 2009; Perry et al., 2001; Snyder et al., 2008; Treakle et al., 2009; Wiener-Well et al., 2011; Wong et al., 1991; Zachary et al., 2001), cell phones (Akinyemi et al., 2009; Brady et al., 2006, 2009; Datta et al., 2009; Hassoun et al., 2004; Kilic et al., 2009; Ulger et al., 2009), stethoscopes (Marinella et al., 1997; Zachary et al., 2001), computer keyboards (Bures et al., 2000; Doğan et al., 2008), faucet handles (Bures et al., 2000), telemetry leads (Safdar et al., 2012), electronic thermometers (Livornese et al., 1992), blood-pressure cuffs (Myers, 1978), X-ray cassettes (Kim et al., 2012), gels for ultrasound probes (Schabrun et al., 2006), and in the air of patient rooms (Berardi and Leoni, 1993; Fleischer, 2006; Genet et al., 2011; Huang et al., 2006; Sudharsanam et al., 2008).

Patient microflora are one of the most significant drivers of microbial ecology within a hospital room (Bhalla et al., 2004; Drees et al., 2008b). Microorganisms are readily transferred from patients to hospital staff (Bhalla et al., 2004) and to the next occupant of the room after it has been cleaned (Drees et al., 2008a; Huang et al., 2006). Human traffic that enters and leaves multi-specialty medical centers includes patients with active diseases and infections as well as healthy patients undergoing invasive medical and surgical procedures. The inherent risk of cross-contamination from interactions between healthcare workers, patients, and their families presents a major obstacle to protecting patient health using only the practices of isolation and containment. Several studies have found that regular washing of patients’ skin with the bactericide chlorhexidine can reduce the likelihood of the patient acquiring a nosocomial antibiotic-resistant infection (Bleasdale, 2007; Climo et al., 2009; Kassakian et al., 2011; O’Horo et al., 2012; Paulson, 1993; Popovich et al., 2010; Vernon et al., 2006). Diagnostic testing performed by hospital laboratories in the course of patient treatment produces a detailed accounting of specific patient-associated microorganisms that, with institutional review board approval, can be included in study metadata to identify point sources of microbial populations within the larger hospital environment under observation.

Characterization of the Microbial Community

Ribosomal RNA sequencing is a common method to identify the microbial community structure in environmental samples. This approach involves PCR amplifying a variable region of the 16S (bacterial), 18S (eukaryotic), or ITS (fungal) rRNA gene using multifunctional DNA oligos that contain not only a complementary nucleotide sequence for priming the PCR, but also a multiplex barcode for marking amplified sequences with a unique sample-specific DNA sequence and a 5′ region encoding the base pairs needed by the sequencing technology. The amplified, barcoded sequences from multiple samples are pooled together in equimolar concentrations for sequencing and then demultiplexed with computer algorithms based on their barcode sequence. Several software suites are freely available for processing high-throughput sequencing data including QIIME (Caporaso et al., 2010), MG-RAST (Meyer et al., 2008), mothur (Schloss et al., 2009), Galaxy (Goecks et al., 2010), HUMAnN (Abubucker et al., 2012), and MEGAN (Huson et al., 2011).
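The demultiplexing step can be pictured with a small sketch. It assumes the barcode sits at the start of each read and requires an exact match, which is a simplification made here for illustration; the function name, the four-base barcodes, and the sample labels are hypothetical and do not reproduce how any of the listed software suites handle barcodes or sequencing errors.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact match on the leading barcode.

    Reads whose barcode is not in the mapping are collected under "unassigned".
    The barcode is trimmed from each read before it is stored.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and reads, shortened for readability.
barcodes = {"ACGT": "kitchen_counter", "TGCA": "bed_rail"}
reads = ["ACGTGGATTAGATACCC", "TGCAGGATTAGATACCC", "NNNNGGATTAGATACCC"]
groups = demultiplex(reads, barcodes, barcode_length=4)
print({sample: len(inserts) for sample, inserts in groups.items()})
```

Tracking the size of the unassigned group is a quick check on barcode quality before any downstream analysis.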

The influence of environmental parameters on microbial populations can be quantified using multivariate non-parametric algorithms (e.g., principal coordinate analysis, principal component analysis, non-metric multidimensional scaling) for community composition and univariate analysis of variance (ANOVA) tests for diversity measures. These statistical tools calculate the percentages of variation that can be explained by individual parameters such as treatment, temperature, building material, adjacent microbiome, and any other environmental characteristic measured in concert with sample collections. Multivariate-crossed analyses are particularly useful in determining if specific combinations of environmental parameters (interactions) have a synergistic effect on population structure or composition. Univariate tests of diversity indices use higher-way ANOVA and are calculated with distribution-free, permutation-based (PERMANOVA) routines (Anderson, 2001). Additionally, following taxonomic characterization of the communities using the QIIME pipeline (Caporaso et al., 2010) and production of an abundance matrix of operational taxonomic units against experimental condition, community similarity between samples can be represented by calculating a Bray-Curtis similarity matrix and UniFrac distances (Lozupone and Knight, 2005). Non-metric multidimensional scaling can be used to visualize the relationship between the experimental factors, which can then be formally tested using a combination of permutation-based PERMANOVA and fully non-parametric ANOSIM tests (Clarke, 1993). The QIIME, MoBEDAC, and VAMPS web servers calculate these metrics as well as facilitate the public release of such data sets. State-of-the-art artificial neural network software developed by Larsen and colleagues can be employed to generate models for predicting the development of microbial communities based on the bacterial abundances observed in the study (Larsen et al., 2012). Source-tracking algorithms developed by Knights and colleagues identify transference of communities from one sampling site to another (Knights et al., 2011). Taken together, these analyses provide insights into the driving factors behind microbial community development.
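To make the Bray-Curtis calculation concrete, the sketch below computes the dissimilarity between samples from an abundance table of operational taxonomic unit counts; similarity is one minus this value. The function names, the table layout, and the counts are hypothetical, and the code is a plain-Python illustration rather than the routine used by QIIME or the other servers named above.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two equal-length OTU count vectors."""
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must cover the same OTUs")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / total


def distance_matrix(otu_table):
    """Pairwise dissimilarities for a dict mapping sample name to counts."""
    names = sorted(otu_table)
    return {(p, q): bray_curtis(otu_table[p], otu_table[q])
            for p in names for q in names}


# Hypothetical abundance table: three samples, three OTUs.
table = {"sink": [10, 0, 5], "phone": [5, 5, 5], "sink_repeat": [10, 0, 5]}
distances = distance_matrix(table)
print(round(distances[("sink", "phone")], 3))  # 0.333
```

A matrix of this kind is the usual input to ordination methods such as non-metric multidimensional scaling and to permutation tests such as PERMANOVA and ANOSIM.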

In addition to examining the relationships between microbial community structure and environmental variables, it is often desirable to compare population structures directly to one another. Such metrics include the diversity of species in a sample (alpha diversity) and the closely related measures of richness and evenness that describe the quantity of species and the range of population sizes (Whittaker, 1960, 1972). Beta diversity takes into account the average alpha diversity and the combined diversity of species across all samples (gamma diversity) to evaluate the presence or absence of a core microbiome commonly shared across a significant subset of samples. This is particularly relevant in the study of disease-causing microorganisms, where it is important to differentiate between systemic populations of opportunistic pathogens that become disease-causing under specific conditions and microbes that are only associated with infections.
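The relationship among these measures can be sketched in a few lines. The example below uses richness, the Shannon index, and Pielou's evenness as alpha-diversity measures, and Whittaker's multiplicative formulation (gamma richness divided by mean alpha richness) for beta diversity; the choice of these particular indices, the function names, and the counts are assumptions made for illustration, since the text above does not prescribe specific formulas.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity index using natural logarithms."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon index scaled by its maximum for the observed richness."""
    observed = richness(counts)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0


def whittaker_beta(samples):
    """Gamma richness across all samples divided by mean alpha richness."""
    gamma = richness([sum(column) for column in zip(*samples)])
    mean_alpha = sum(richness(s) for s in samples) / len(samples)
    return gamma / mean_alpha


# Hypothetical counts for four taxa in three samples.
samples = [[10, 10, 0, 0], [0, 10, 10, 0], [0, 0, 10, 10]]
print(richness(samples[0]), pielou_evenness(samples[0]), whittaker_beta(samples))
```

A beta value near 1 indicates that most taxa are shared across samples, which is the pattern expected when a core microbiome is present.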

When describing the presence of potentially pathogenic microorganisms discovered in surveys of ribosomal sequences, care should be taken to place these results in context. Taxonomic assignment of reads is reliant on reference databases that contain a disproportionate number of sequences from disease-causing microbes, leading many novel operational taxonomic units to phylogenetically ordinate closest to a pathogen with which they may or may not share specific infectivity characteristics. The RDP Classifier (Wang et al., 2007) used in QIIME (Caporaso et al., 2010) and other taxonomic assignment software partially alleviates this issue by providing a confidence score for the assignment of a read into each taxonomic level—domain, kingdom, phylum, class, order, family, genus, and species. However, even correct species-level assignments fail to provide information on the presence or absence of genetic elements that are responsible for many pathogenic and antibiotic-resistant phenotypes.
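One cautious way to use per-rank confidence scores is to report an assignment only down to the deepest rank that still meets a chosen cutoff. The sketch below illustrates that idea; the input layout, the function name, and the 0.8 cutoff are assumptions made for this example and do not reproduce the output format or defaults of the RDP Classifier or QIIME.

```python
def trim_assignment(assignment, min_confidence=0.8):
    """Keep ranks from the broadest down until one falls below min_confidence.

    assignment: list of (rank, taxon, confidence) tuples ordered from the
    broadest rank to the finest. Everything after the first low-confidence
    rank is dropped, even if a finer rank happens to score higher.
    """
    kept = []
    for rank, taxon, confidence in assignment:
        if confidence < min_confidence:
            break
        kept.append((rank, taxon))
    return kept


# Hypothetical classifier output for a single read.
read_assignment = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Firmicutes", 0.99),
    ("class", "Bacilli", 0.97),
    ("order", "Bacillales", 0.95),
    ("family", "Staphylococcaceae", 0.90),
    ("genus", "Staphylococcus", 0.85),
    ("species", "Staphylococcus aureus", 0.42),
]
trimmed = trim_assignment(read_assignment)
print(trimmed[-1])  # ('genus', 'Staphylococcus')
```

Reporting this read at genus level avoids labeling it as a named pathogen on the strength of a weak species-level score.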


The ideas presented in this article have been shaped in large part by Hospital Microbiome Consortium discussions held at the University of Chicago on June 7, 2012. We also thank the Alfred P. Sloan Foundation for funding the Home Microbiome Project (2011-6-05) and the Hospital Microbiome Consortium Workshop grant (2012-3-25).


  • Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methé B, Schloss PD, Gevers D, Mitreva M, Huttenhower C. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Computational Biology. 2012;8(6):e1002358. [PMC free article: PMC3374609] [PubMed: 22719234] [Cross Ref]
  • Adler A, Tager I, Quintero DR. Decreased prevalence of asthma among farm-reared children compared with those who are rural but not farm-reared. Journal of Allergy and Clinical Immunology. 2005;115(1):67–73. [PubMed: 15637549] [Cross Ref]
  • Aiello AE, Marshall B, Levy SB, Della-Latta P, Lin SX, Larson E. Antibacterial cleaning products and drug resistance. Emerging Infectious Diseases. 2005;11(10):1565–1570. [PMC free article: PMC3366732] [PubMed: 16318697] [Cross Ref]
  • Akinyemi KO, Atapu AD, Adetona OO, Coker AO. The potential role of mobile phones in the spread of bacterial infections. Journal of Infection in Developing Countries. 2009;3(8):628–632. [PubMed: 19801807]
  • Alfvén T, Braun-Fahrländer C, Brunekreef B, von Mutius E, Riedler J, Scheynius A, van Hage M, Wickman M, Benz MR, Budde J, Michels KB, Schram D, Ublagger E, Wasser M, Pershagen G. PARSIFAL study group. Allergic diseases and atopic sensitization in children related to farming and anthroposophic lifestyle—the PARSIFAL Study. Allergy. 2006;61(4):414–421. [PubMed: 16512802] [Cross Ref]
  • Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46. [Cross Ref]
  • Anderson RN, Smith BL. Deaths: Leading causes for 2002. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System. 2005;53(17):1–89.
  • Angenent LT. Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proceedings of the National Academy of Sciences of the USA. 2005;102(13):4860–4865. [PMC free article: PMC555732] [PubMed: 15769858] [Cross Ref]
  • Augustowska M, Dutkiewicz J. Variability of airborne microflora in a hospital ward within a period of one year. Annals of Agricultural and Environmental Medicine: AAEM. 2006;13(1):99–106. [PubMed: 16841880]
  • Babb JR, Davies JG, Ayliffe GAJ. Contamination of protective clothing and nurses’ uniforms in an isolation ward. Journal of Hospital Infection. 1983;4(2):149–157. [PubMed: 6195223] [Cross Ref]
  • Barker J, Vipond IB, Bloomfield SF. Effects of cleaning and disinfection in reducing the spread of norovirus contamination via environmental surfaces. Journal of Hospital Infection. 2004;58(1):42–49. [PubMed: 15350713] [Cross Ref]
  • Berardi BM, Leoni E. Indoor air climate and microbiological airborne: Contamination in various hospital areas. Zentralblatt Für Hygiene Und Umweltmedizin = International Journal of Hygiene and Environmental Medicine. 1993;194(4):405–418. [PubMed: 8397689]
  • Bhalla A, Pultz NJ, Gries DM, Ray AJ, Eckstein EC, Aron DC, Donskey CJ. Acquisition of nosocomial pathogens on hands after contact with environmental surfaces near hospitalized patients. Infection Control and Hospital Epidemiology. 2004;25(2):164–167. [PubMed: 14994944] [Cross Ref]
  • Biljan MM, Hart CA, Sunderland D, Manasse PR, Kingsland CR. Multicentre randomised double blind crossover trial on contamination of conventional ties and bow ties in routine obstetric and gynaecological practice. BMJ. 1993;307(6919):1582–1584. [PMC free article: PMC1697785] [PubMed: 8292945] [Cross Ref]
  • Bleasdale SC. Effectiveness of chlorhexidine bathing to reduce catheter-associated bloodstream infections in medical intensive care unit patients. Archives of Internal Medicine. 2007;167(19):2073. [PubMed: 17954801] [Cross Ref]
  • Bloomfield SF, Stanwell-Smith R, Crevel RWR, Pickup J. Too clean, or not too clean: The hygiene hypothesis and home hygiene. Clinical & Experimental Allergy. 2006;36(4):402–425. [PMC free article: PMC1448690] [PubMed: 16630145] [Cross Ref]
  • Blowers R, Wallace KR. Environmental aspects of staphylococcal infections acquired in hospitals. III. Ventilation of operating rooms—bacteriological investigations. American Journal of Public Health and the Nation’s Health. 1960;50:484–490. [PMC free article: PMC1373302] [PubMed: 13801650]
  • Boe-Hansen R, Albrechtsen HJ, Arvin E, Jørgensen C. Bulk water phase and biofilm growth in drinking water at low nutrient conditions. Water Research. 2002;36(18):4477–4486. [PubMed: 12418650]
  • Bonetta S, Bonetta S, Mosso S, Sampò S, Carraro E. Assessment of microbiological indoor air quality in an Italian office building equipped with an HVAC system. Environmental Monitoring and Assessment. 2009;161(1–4):473–483. [PubMed: 19224384] [Cross Ref]
  • Bouillard L, Michel O, Dramaix M, Devleeschouwer M. Bacterial contamination of indoor air, surfaces, and settled dust, and related dust endotoxin concentrations in healthy office buildings. Annals of Agricultural and Environmental Medicine: AAEM. 2005;12(2):187–192. [PubMed: 16457472]
  • Bourdillon RB, Colebrook L. Air hygiene in dressing-rooms for burns or major wounds. Lancet. 1946;1(6400):601. [PubMed: 21025753]
  • Brady RRW, Wasson A, Stirling I, McAllister C, Damani NN. Is your phone bugged? The incidence of bacteria known to cause nosocomial infection on healthcare workers’ mobile phones. Journal of Hospital Infection. 2006;62(1):123–125. [PubMed: 16099536] [Cross Ref]
  • Brady RRW, Verran J, Damani NN, Gibb AP. Review of mobile communication devices as potential reservoirs of nosocomial pathogens. Journal of Hospital Infection. 2009;71(4):295–300. [PubMed: 19168261] [Cross Ref]
  • Bright KR, Boone SA, Gerba CP. Occurrence of bacteria and viruses on elementary classroom surfaces and the potential role of classroom hygiene in the spread of infectious diseases. Journal of School Nursing. 2009;26(1):33–41. [PubMed: 19903773] [Cross Ref]
  • Bures S, Fishbain JT, Uyehara CF, Parker JM, Berg BW. Computer keyboards and faucet handles as reservoirs of nosocomial pathogens in the intensive care unit. American Journal of Infection Control. 2000;28(6):465–471. [PubMed: 11114617] [Cross Ref]
  • Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. QIIME allows analysis of high-throughput community sequencing data. Nature Methods. 2010;7(5):335–336. [PMC free article: PMC3156573] [PubMed: 20383131] [Cross Ref]
  • Clark RP. Skin scales among airborne particles. Journal of Hygiene. 2009;72(01):47. [PMC free article: PMC2130262] [PubMed: 4522246] [Cross Ref]
  • Clarke KR. Non-parametric multivariate analyses of changes in community structure. Austral Ecology. 1993;18(1):117–143. [Cross Ref]
  • Climo MW, Sepkowitz KA, Zuccotti G, Fraser VJ, Warren DK, Perl TM, Speck K, Jernigan JA, Robles JR, Wong ES. The effect of daily bathing with chlorhexidine on the acquisition of methicillin-resistant Staphylococcus aureus, vancomycin-resistant Enterococcus, and healthcare-associated bloodstream infections: Results of a quasi-experimental multi-center trial. Critical Care Medicine. 2009;37(6):1858–1865. [PubMed: 19384220] [Cross Ref]
  • Cole EC, Addison RM, Rubino JR, Leese KE, Dulaney PD, Newell MS, Wilkins J, Gaber DJ, Wineinger T, Criger DA. Investigation of antibiotic and antibacterial agent cross-resistance in target bacteria from homes of antibacterial product users and nonusers. Journal of Applied Microbiology. 2003;95(4):664–676. [PubMed: 12969278] [Cross Ref]
  • Colebrook L, Cawston WC. Microbic content of air on roof of city hospital, at street level, and in wards. Medical Research Council, Special Report. 1948;262:233–241.
  • Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial community variation in human body habitats across space and time. Science. 2009;326(5960):1694–1697. [PMC free article: PMC3602444] [PubMed: 19892944] [Cross Ref]
  • Cvjetanović B. Determination of bacterial air pollution in various premises. Journal of Hygiene. 1957;56(02):163. [PMC free article: PMC2218037] [PubMed: 13563859] [Cross Ref]
  • D’Amato G, Liccardi G, D’Amato M, Holgate S. Environmental risk factors and allergic bronchial asthma. Clinical & Experimental Allergy. 2005;35(9):1113–1124. [PubMed: 16164436] [Cross Ref]
  • Datta P, Rani H, Chander J, Gupta V. Bacterial contamination of mobile phones of health care workers. Indian Journal of Medical Microbiology. 2009;27(3):279–281. [PubMed: 19584520] [Cross Ref]
  • Doğan M, Feyzioğlu B, Ozdemir M, Baysal B. Investigation of microbial colonization of computer keyboards used inside and outside hospital environments. Mikrobiyoloji Bülteni. 2008;42(2):331–336. [PubMed: 18697431]
  • Drees M, Snydman DR, Schmid CH, Barefoot L, Hansjosten K, Vue PM, Cronin M, Nasraway SA, Golan Y. Prior environmental contamination increases the risk of acquisition of vancomycin-resistant enterococci. Clinical Infectious Diseases. 2008;46(5):678–685. [PubMed: 18230044] [Cross Ref]
  • Drees M, Snydman DR, Schmid CH, Barefoot L, Hansjosten K, Vue PM, Cronin M, Nasraway SA, Golan Y. Antibiotic exposure and room contamination among patients colonized with vancomycin-resistant enterococci. Infection Control and Hospital Epidemiology. 2008;29(8):709–715. [PubMed: 18631116] [Cross Ref]
  • Drudge CN, Krajden S, Summerbell RC, Scott JA. Detection of antibiotic resistance genes associated with methicillin-resistant Staphylococcus aureus (MRSA) and coagulase-negative staphylococci in hospital air filter dust by PCR. Aerobiologia. 2011;28(2):285–289. [Cross Ref]
  • du Moulin GC, Stottmeier KD, Pelletier PA, Tsang AY, Hedley-Whyte J. Concentration of Mycobacterium avium by hospital hot water systems. Journal of the American Medical Association. 1988;260(11):1599–1601. [PubMed: 3411741]
  • Eber MR, Shardell M, Schweizer ML, Laxminarayan R, Perencevich EN. Seasonal and temperature-associated increases in gram-negative bacterial bloodstream infections among hospitalized patients. PLoS ONE. 2011;6(9):e25298. [PMC free article: PMC3180381] [PubMed: 21966489] [Cross Ref]
  • Embil J, Warren P, Yakrus M, Stark R, Corne S, Forrest D, Hershfield E. Pulmonary illness associated with exposure to mycobacterium-avium complex in hot tub water. Hypersensitivity pneumonitis or infection? Chest. 1997;111(3):813–816. [PubMed: 9118726]
  • Escombe AR, Oeser CC, Gilman RH, Navincopa M, Ticona E, Pan W, Martínez C, Chacaltana J, Rodríguez R, Moore DA, Friedland JS, Evans CA. Natural ventilation for the prevention of airborne contagion. PLoS Medicine. 2007;4(2):e68. [PMC free article: PMC1808096] [PubMed: 17326709] [Cross Ref]
  • Exner M, Vacata V, Hornei B, Dietlein E, Gebel J. Household cleaning and surface disinfection: New insights and strategies. Journal of Hospital Infection. 2004;56(Suppl 2):S70–S75. [PubMed: 15110127] [Cross Ref]
  • Fahlgren C, Bratbak G, Sandaa R-A, Thyrhaug R, Li Zweifel U. Diversity of airborne bacteria in samples collected using different devices for aerosol collection. Aerobiologia. 2010;27(2):107–120. [Cross Ref]
  • Falkinham JO, Norton CD, LeChevallier MW. Factors influencing numbers of Mycobacterium avium, Mycobacterium intracellulare, and other mycobacteria in drinking water distribution systems. Applied and Environmental Microbiology. 2001;67(3):1225–1231. [PMC free article: PMC92717] [PubMed: 11229914] [Cross Ref]
  • Farnsworth JE, Goyal SM, Kim SW, Kuehn TH, Raynor PC, Ramakrishnan MA, Anantharaman S, Tang W. Development of a method for bacteria and virus recovery from heating, ventilation, and air conditioning (HVAC) filters. Journal of Environmental Monitoring. 2006;8(10):1006. [PubMed: 17240906] [Cross Ref]
  • Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, Ashburner M, Axelrod N, Baldauf S, Ballard S, Boore J, Cochrane G, Cole J, Dawyndt P, De Vos P, dePamphilis C, Edwards R, Faruque N, Feldman R, Gilbert J, Gilna P, Glöckner FO, Goldstein P, Guralnick R, Haft D, Hancock D, Hermjakob H, Hertz-Fowler C, Hugenholtz P, Joint I, Kagan L, Kane M, Kennedy J, Kowalchuk G, Kottmann R, Kolker E, Kravitz S, Kyrpides N, Leebens-Mack J, Lewis SE, Li K, Lister AL, Lord P, Maltsev N, Markowitz V, Martiny J, Methe B, Mizrachi I, Moxon R, Nelson K, Parkhill J, Proctor L, White O, Assunta-Sansone S, Spiers A, Stevens R, Swift P, Taylor C, Tateno Y, Tett A, Turner S, Ussery D, Vaughan B, Ward N, Whetzel T, San Gil I, Wilson G, Wipat A. The Minimum Information About a Genome Sequence (MIGS) Specification. Nature Biotechnology. 2008;26(5):541–547. [PMC free article: PMC2409278] [PubMed: 18464787] [Cross Ref]
  • Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Forensic identification using skin bacterial communities. Proceedings of the National Academy of Sciences of the USA. 2010;107(14):6477–6481. [PMC free article: PMC2852011] [PubMed: 20231444] [Cross Ref]
  • Finch JE, Prince J, Hawksworth M. A bacteriological survey of the domestic environment. Journal of Applied Microbiology. 1978;45(3):357–364. [PubMed: 730629] [Cross Ref]
  • Fleischer M. Microbiological control of airborne contamination in hospitals. Indoor and Built Environment. 2006;15(1):53–56. [Cross Ref]
  • Fox K, Fox A, Elssner T, Feigley C, Salzberg D. MALDI-TOF mass spectrometry speciation of staphylococci and their discrimination from micrococci isolated from indoor air of schoolrooms. Journal of Environmental Monitoring. 2010;12(4):917–923. [PubMed: 20383373] [Cross Ref]
  • Genet C, Kibru G, Tsegaye W. Indoor air bacterial load and antibiotic susceptibility pattern of isolates in operating rooms and surgical wards at Jimma University Specialized Hospital, southwest Ethiopia. Ethiopian Journal of Health Sciences. 2011;21(1):9–17. [PMC free article: PMC3275854] [PubMed: 22434981]
  • Gilbert JA, Meyer F, Antonopoulos D, Balaji P, Brown CT, Desai N, Eisen JA, Evers D, Field D, Feng W, Huson D, Jansson J, Knight R, Knight J, Kolker E, Konstantindis K, Kostka J, Kyrpides N, Mackelprang R, McHardy A, Quince C, Raes J, Sczyrba A, Shade A, Stevens R. Meeting report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project. Standards in Genomic Sciences. 2010;3(3):243–248. [PMC free article: PMC3035311] [PubMed: 21304727] [Cross Ref]
  • Gilbert JA, Meyer F, Jansson J, Gordon J, Pace N, Tiedje J, Ley R, Fierer N, Field D, Kyrpides N, Glöckner FO, Klenk HP, Wommack KE, Glass E, Docherty K, Gallery R, Stevens R, Knight R. The Earth Microbiome Project: Meeting report of the 1st EMP Meeting on Sample Selection and Acquisition at Argonne National Laboratory October 6, 2010. Standards in Genomic Sciences. 2010;3(3):249–253. [PMC free article: PMC3035312] [PubMed: 21304728] [Cross Ref]
  • Gilbert JA, Bailey M, Field D, Fierer N, Fuhrman JA, Hu B, Jansson J, Knight R, Kowalchuk GA, Kyrpides NC, Meyer F, Stevens R. The Earth Microbiome Project: The Meeting Report for the 1st International Earth Microbiome Project Conference, Shenzhen, China, June 13–15, 2011. Standards in Genomic Sciences. 2011;5(2):243–247. [Cross Ref]
  • Gilbert JA, Bao Y, Wang H, Sansone SA, Edmunds SC, Morrison N, Meyer F, Schriml LM, Davies N, Sterk P, Wilkening J, Garrity GM, Field D, Robbins R, Smith DP, Mizrachi I, Moreau C. Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4–7, 2012. Standards in Genomic Sciences. 2012;6(2):276–286. [PMC free article: PMC3387801] [PubMed: 22768370] [Cross Ref]
  • Goecks J, Nekrutenko A, Taylor J. The Galaxy Team. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology. 2010;11(8):R86. [PMC free article: PMC2945788] [PubMed: 20738864] [Cross Ref]
  • Greene VW, Vesley D, Bond RG, Michaelsen GS. Microbiological contamination of hospital air. I. Quantitative studies. Applied Environmental Microbiology. 1962;10:561–566. [PMC free article: PMC1057915] [PubMed: 13950172]
  • Greene VW, Vesley D, Bond RG, Michaelsen GS. Microbiological contamination of hospital air. II. Qualitative studies. Applied Environmental Microbiology. 1962;10:567–571. [PMC free article: PMC1057916] [PubMed: 13950173]
  • Grice EA, Kong HH, Conlan S, Deming CB, Davis J, Young AC, NISC Comparative Sequencing Program, Bouffard GG, Blakesley RW, Murray PR, Green ED, Turner ML, Segre JA. Topographical and temporal diversity of the human skin microbiome. Science. 2009;324(5931):1190–1192. [PMC free article: PMC2805064] [PubMed: 19478181] [Cross Ref]
  • Grice EA, Segre JA. The skin microbiome. Nature Reviews Microbiology. 2011;9(4):244–253. [PMC free article: PMC3535073] [PubMed: 21407241] [Cross Ref]
  • Griffin DW, Gonzalez C, Teigell N, Petrosky T, Northup DE, Lyles M. Observations on the use of membrane filtration and liquid impingement to collect airborne microorganisms in various atmospheric environments. Aerobiologia. 2010;27(1):25–35. [Cross Ref]
  • Groseclose SL, Brathwaite WS, Hall PA, Connor FJ, Sharp P, Anderson WJ, Fagan RF, Aponte JJ, Jones GF, Nitschke DA, Worsham CA, Adekoya N, Chang MH, Doyle T, Dhara R, Jajosky RA. Summary of notifiable diseases—United States, 2002. Morbidity and Mortality Weekly Report. 2004;51(53):1–84. [PubMed: 15123988]
  • Guenther R, Vittori G. Sustainable healthcare architecture. Hoboken, NJ: John Wiley & Sons; 2008.
  • Hall-Baker PA, Nieves E, Jajosky RA, Adams DA, Sharp P, Anderson WJ, Aponte JJ, Aranas AE, Katz SB, Mayes M, Wodajo MS, Onweh DH, Baillie J, Park M. Summary of notifiable diseases—United States, 2008. Morbidity and Mortality Weekly Report. 2010;57(54):1–100.
  • Hassoun A, Vellozzi EM, Smith MA. Colonization of personal digital assistants carried by healthcare professionals. Infection Control and Hospital Epidemiology. 2004;25(11):1000–1001. [PubMed: 15566038] [Cross Ref]
  • Hidron AI, Edwards JR, Patel J, Horan TC, Sievert DM, Pollock DA, Fridkin SK. National Healthcare Safety Network Team, and Participating National Healthcare Safety Network Facilities. NHSN annual update: Antimicrobial-resistant pathogens associated with healthcare-associated infections: Annual summary of data reported to the National Healthcare Safety Network at the Centers for Disease Control and Prevention, 2006–2007. Infection Control and Hospital Epidemiology. 2008;29(11):996–1011. [PubMed: 18947320] [Cross Ref]
  • Hilton AC, Austin E. The kitchen dishcloth as a source of and vehicle for foodborne pathogens in a domestic setting. International Journal of Environmental Health Research. 2000;10(3):257–261. [Cross Ref]
  • Hospodsky D, Qian J, Nazaroff WW, Yamamoto N, Bibby K, Rismani-Yazdi H, Peccia J. Human occupancy as a source of indoor airborne bacteria. PLoS ONE. 2012;7(4):e34867. [PMC free article: PMC3329548] [PubMed: 22529946] [Cross Ref]
  • Huang LL, Mao IF, Chen ML, Huang CT. The microorganisms of indoor air in a teaching hospital. Taiwan Journal of Public Health. 2006;25(4):315–322.
  • Huang SS, Datta R, Platt R. Risk of acquiring antibiotic-resistant bacteria from prior room occupants. Archives of Internal Medicine. 2006;166(18):1945–1951. [PubMed: 17030826] [Cross Ref]
  • Huson DH, Mitra S, Ruscheweyh H-J, Weber N, Schuster SC. Integrative analysis of environmental sequences using MEGAN4. Genome Research. 2011;21(9):1552–1560. [PMC free article: PMC3166839] [PubMed: 21690186] [Cross Ref]
  • Huttunen K, Rintala H, Hirvonen MR, Vepsäläinen A, Hyvärinen A, Meklin T, Toivola M, Nevalainen A. Indoor air particles and bioaerosols before and after renovation of moisture-damaged buildings: The effect on biological activity and microbial flora. Environmental Research. 2008;107(3):291–298. [PubMed: 18462714] [Cross Ref]
  • Hyvarinen A, Meklin T, Vepsäläinen A, Nevalainen A. Fungi and actinobacteria in moisture-damaged building materials—concentrations and diversity. International Biodeterioration & Biodegradation. 2002;49(1):27–37. [Cross Ref]
  • Jaffal AA, Banat IM, El Mogheth AA, Nsanze H, Bener A, Ameen AS. Residential indoor airborne microbial populations in the United Arab Emirates. Environment International. 1997;23(4):529–533. [Cross Ref]
  • Josephson KL, Rubino JR, Pepper IL. Characterization and quantification of bacterial pathogens and indicator organisms in household kitchens with and without the use of a disinfectant cleaner. Journal of Applied Microbiology. 1997;83(6):737–750. [PubMed: 9449812]
  • Kaarakainen P, Rintala H, Vepsäläinen A, Hyvärinen A, Nevalainen A, Meklin T. Microbial content of house dust samples determined with qPCR. Science of the Total Environment. 2009;407(16):4673–4680. [PubMed: 19473690] [Cross Ref]
  • Kassakian SZ, Mermel LA, Jefferson JA, Parenteau SL, Machan JT. Impact of chlorhexidine bathing on hospital-acquired infections among general medical patients. Infection Control and Hospital Epidemiology. 2011;32(3):238–243. [PubMed: 21460508] [Cross Ref]
  • Kelley ST, Theisen U, Angenent LT, St Amand A, Pace NR. Molecular analysis of shower curtain biofilm microbes. Applied and Environmental Microbiology. 2004;70(7):4187–4192. [PMC free article: PMC444822] [PubMed: 15240300] [Cross Ref]
  • Kembel SW, Jones E, Kline J, Northcutt D, Stenson J, Womack AM, Bohannan BJM, Brown GZ, Green JL. Architectural design influences the diversity and structure of the built environment microbiome. ISME Journal. 2012. [PMC free article: PMC3400407] [PubMed: 22278670] [Cross Ref]
  • Kilic IH, Ozaslan M, Karagoz ID, Zer Y, Davutoglu V. The microbial colonisation of mobile phone used by healthcare staffs. Pakistan Journal of Biological Sciences. 2009;12(11):882–884. [PubMed: 19803124] [Cross Ref]
  • Kim J-S, Kim H-S, Park J-Y, Koo H-S, Choi C-S, Song W, Cho HC, Lee KM. Contamination of X-ray cassettes with methicillin-resistant Staphylococcus aureus and methicillin-resistant Staphylococcus haemolyticus in a radiology department. Annals of Laboratory Medicine. 2012;32(3):206. [PMC free article: PMC3339301] [PubMed: 22563556] [Cross Ref]
  • Klevens RM, Edwards JR, Richards CL Jr, Horan TC, Gaynes RP, Pollock DA, Cardo DM. Estimating health care-associated infections and deaths in US hospitals, 2002. Public Health Reports (Washington, D.C.: 1974). 2007;122(2):160–166. [PMC free article: PMC1820440] [PubMed: 17357358]
  • Klintberg B, Berglund N, Lilja G, Wickman M, van Hage-Hamsten M. Fewer allergic respiratory disorders among farmers’ children in a closed birth cohort from Sweden. European Respiratory Journal: Official Journal of the European Society for Clinical Respiratory Physiology. 2001;17(6):1151–1157. [PubMed: 11491158]
  • Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG, Bushman FD, Knight R, Kelley ST. Bayesian community-wide culture-independent microbial source tracking. Nature Methods. 2011;8(9):761–763. [PMC free article: PMC3791591] [PubMed: 21765408] [Cross Ref]
  • Kopperud RJ, Ferro AR, Hildemann LM. Outdoor versus indoor contributions to indoor particulate matter (PM) determined by mass balance methods. Journal of the Air & Waste Management Association (1995). 2004;54(9):1188–1196. [PubMed: 15468671]
  • Korves TM, Piceno YM, Tom LM, DeSantis TZ, Jones BW, Andersen GL, Hwang GM. Bacterial communities in commercial aircraft high-efficiency particulate air (HEPA) filters assessed by PhyloChip analysis. Indoor Air. 2012. http://doi​​.1111/j.1600-0668.2012.00787.x. [PubMed: 22563927] [Cross Ref]
  • Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glöckner FO. A standard MIGS/MIMS compliant XML schema: Toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS: A Journal of Integrative Biology. 2008;12(2):115–121. [PubMed: 18479204] [Cross Ref]
  • Kramer A, Schwebke I, Kampf G. How long do nosocomial pathogens persist on inanimate surfaces? A systematic review. BMC Infectious Diseases. 2006;6:130. [PMC free article: PMC1564025] [PubMed: 16914034] [Cross Ref]
  • Krogulski A, Szczotko M. Microbiological quality of hospital indoor air. Determinant factors for microbial concentration in air of operating theatres. Roczniki Państwowego Zakładu Higieny. 2011;62(1):109–113. [PubMed: 21735988]
  • Larsen PE, Field D, Gilbert JA. Predicting bacterial community assemblages using an artificial neural network approach. Nature Methods. 2012. [PubMed: 22504588] [Cross Ref]
  • Le Dantec C, Duguet J-P, Montiel A, Dumoutier N, Dubrou S, Vincent V. Occurrence of mycobacteria in water treatment lines and in water distribution systems. Applied and Environmental Microbiology. 2002;68(11):5318–5325. [PMC free article: PMC129932] [PubMed: 12406720] [Cross Ref]
  • Lee TC, Stout JE, Yu VL. Factors predisposing to Legionella pneumophila colonization in residential water systems. Archives of Environmental Health: An International Journal. 1988;43(1):59–62. [PubMed: 3355245] [Cross Ref]
  • Leoni E, Legnani P, Mucci MT, Pirani R. Prevalence of mycobacteria in a swimming pool environment. Journal of Applied Microbiology. 1999;87(5):683–688. [PubMed: 10594708] [Cross Ref]
  • Leoni E, Legnani P, Bucci Sabattini MA, Righi F. Prevalence of Legionella spp. in swimming pool environment. Water Research. 2001;35(15):3749–3753. [PubMed: 11561639] [Cross Ref]
  • Leynaert B, Neukirch C, Jarvis D, Chinn S, Burney P, Neukirch F. Does living on a farm during childhood protect against asthma, allergic rhinitis, and atopy in adulthood? American Journal of Respiratory and Critical Care Medicine. 2001;164(10 Pt 1):1829–1834. [PubMed: 11734431]
  • Lignell U, Meklin T, Rintala H, Hyvärinen A, Vepsäläinen A, Pekkanen J, Nevalainen A. Evaluation of quantitative PCR and culture methods for detection of house dust fungi and streptomycetes in relation to moisture damage of the house. Letters in Applied Microbiology. 2008;47(4):303–308. [PubMed: 19241524] [Cross Ref]
  • Livornese LL Jr, Dias S, Samel C, Romanowski B, Taylor S, May P, Pitsakis P, Woods G, Kaye D, Levison ME. Hospital-acquired infection with vancomycin-resistant Enterococcus faecium transmitted by electronic thermometers. Annals of Internal Medicine. 1992;117(2):112–116. [PubMed: 1605425]
  • Loh W, Ng VV, Holton J. Bacterial flora on the white coats of medical students. Journal of Hospital Infection. 2000;45(1):65–68. [PubMed: 10833346] [Cross Ref]
  • Lopez P-J, Ron O, Parthasarathy P, Soothill J, Spitz L. Bacterial counts from hospital doctors’ ties are higher than those from shirts. American Journal of Infection Control. 2009;37(1):79–80. [PubMed: 19171249] [Cross Ref]
  • Lozupone C, Knight R. UniFrac: A new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology. 2005;71(12):8228–8235. [PMC free article: PMC1317376] [PubMed: 16332807] [Cross Ref]
  • Marinella MA, Pierson C, Chenoweth C. The stethoscope: A potential source of nosocomial infection? Archives of Internal Medicine. 1997;157(7):786–790. [PubMed: 9125011] [Cross Ref]
  • Marshall BM, Robleto E, Dumont T, Levy SB. The frequency of antibiotic-resistant bacteria in homes differing in their use of surface antibacterial agents. Current Microbiology. 2012;65(4):407–415. [PubMed: 22752336] [Cross Ref]
  • Martinez FD. The coming-of-age of the hygiene hypothesis. Respiratory Research. 2001;2(3):129–132. [PMC free article: PMC2002071] [PubMed: 11686875] [Cross Ref]
  • Merchant JA, Naleway AL, Svendsen ER, Kelly KM, Burmeister LF, Stromquist AM, Taylor CD, Thorne PS, Reynolds SJ, Sanderson WT, Chrischilles EA. Asthma and farm exposures in a cohort of rural Iowa children. Environmental Health Perspectives. 2005;113(3):350–356. [PMC free article: PMC1253764] [PubMed: 15743727]
  • Meyer F, Paarmann D, D’Souza M, Olson R, Glass E, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. The metagenomics RAST Server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9(1):386. [PMC free article: PMC2563014] [PubMed: 18803844] [Cross Ref]
  • Monto AS. Epidemiology of viral respiratory infections. American Journal of Medicine. 2002;112(6):4–12. [PubMed: 11955454] [Cross Ref]
  • Morrow JB, Downey AS, Peccia J. Challenges in microbial sampling in the indoor environment. National Institute of Standards and Technology; 2012. http://www.cfm?pub_id=910577.
  • Moschandreas DJ. Exposure to pollutants and daily time budgets of people. Bulletin of the New York Academy of Medicine. 1981;57(10):845–859. [PMC free article: PMC1805406] [PubMed: 6947844]
  • Myers MG. Longitudinal evaluation of neonatal nosocomial infections: Association of infection with a blood pressure cuff. Pediatrics. 1978;61(1):42–45. [PubMed: 263872]
  • Nevalainen A, Seuri M. Of microbes and men. Indoor Air. 2005;15(s9):58–64. [PubMed: 15910530] [Cross Ref]
  • Noble WC, Habbema JDF, Van Furth R, Smith I, De Raay C. Quantitative studies on the dispersal of skin bacteria into the air. Journal of Medical Microbiology. 1976;9(1):53–61. [PubMed: 1263248] [Cross Ref]
  • Noris F, Siegel JA, Kinney KA. Biological and chemical contaminants in HVAC filter dust. ASHRAE Transactions. 2009;115(2):484–491.
  • Noris F, Siegel JA, Kinney KA. Evaluation of HVAC filters as a sampling mechanism for indoor microbial communities. Atmospheric Environment. 2011;45(2):338–346. [Cross Ref]
  • O’Horo JC, Silva GLM, Munoz-Price LS, Safdar N. The efficacy of daily bathing with chlorhexidine for reducing healthcare-associated bloodstream infections: A meta-analysis. Infection Control and Hospital Epidemiology. 2012;33(3):257–267. [PubMed: 22314063] [Cross Ref]
  • Park JH, Spiegelman DL, Burge HA, Gold DR, Chew GL, Milton DK. Longitudinal study of dust and airborne endotoxin in the home. Environmental Health Perspectives. 2000;108(11):1023–1028. [PMC free article: PMC1240157] [PubMed: 11102291]
  • Paulson DS. Efficacy evaluation of a 4% chlorhexidine gluconate as a full-body shower wash. American Journal of Infection Control. 1993;21(4):205–209. [PubMed: 8239051] [Cross Ref]
  • Peccia J, Milton DK, Reponen T, Hill J. A role for environmental engineering and science in preventing bioaerosol-related disease. Environmental Science & Technology. 2008;42(13):4631–4637. [PubMed: 18677984] [Cross Ref]
  • Perry C, Marshall R, Jones E. Bacterial contamination of uniforms. Journal of Hospital Infection. 2001;48(3):238–241. [PubMed: 11439013] [Cross Ref]
  • Pitkäranta M, Meklin T, Hyvärinen A, Paulin L, Auvinen P, Nevalainen A, Rintala H. Analysis of fungal flora in indoor dust by ribosomal DNA sequence analysis, quantitative PCR, and culture. Applied and Environmental Microbiology. 2008;74(1):233–244. [PMC free article: PMC2223223] [PubMed: 17981947] [Cross Ref]
  • Pope AM, Patterson R, Burge H. Institute of Medicine (U.S.). Committee on the Health Effects of Indoor Allergens. Indoor allergens: Assessing and controlling adverse health effects. Washington, DC: National Academy Press; 1993. http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=1100. [PubMed: 25144066]
  • Popovich KJ, Hota B, Hayes R, Weinstein RA, Hayden MK. Daily skin cleansing with chlorhexidine did not reduce the rate of central-line associated bloodstream infection in a surgical intensive care unit. Intensive Care Medicine. 2010;36(5):854–858. [PubMed: 20213074] [Cross Ref]
  • Qian H, Li Y. Removal of exhaled particles by ventilation and deposition in a multibed airborne infection isolation room. Indoor Air. 2010;20(4):284–297. [PubMed: 20546037] [Cross Ref]
  • Qian J, Hospodsky D, Yamamoto N, Nazaroff WW, Peccia J. Size-resolved emission rates of airborne bacteria and fungi in an occupied classroom. Indoor Air. 2012 [PMC free article: PMC3437488] [PubMed: 22257156] [Cross Ref]
  • Remes ST, Koskela HO, Iivanainen K, Pekkanen J. Allergen-specific sensitization in asthma and allergic diseases in children: The Study on Farmers’ and Non-farmers’ Children. Clinical and Experimental Allergy: Journal of the British Society for Allergy and Clinical Immunology. 2005;35(2):160–166. [PubMed: 15725186] [Cross Ref]
  • Riedler J, Braun-Fahrländer C, Eder W, Schreuer M, Waser M, Maisch S, Carr D, Schierl R, Nowak D, von Mutius E. Exposure to farming in early life and development of asthma and allergy: A cross-sectional survey. Lancet. 2001;358(9288):1129–1133. [PubMed: 11597666] [Cross Ref]
  • Rintala H, Pitkaranta M, Toivola M, Paulin L, Nevalainen A. Diversity and seasonal dynamics of bacterial community in indoor environment. BMC Microbiology. 2008;8(1):56. [PMC free article: PMC2323381] [PubMed: 18397514] [Cross Ref]
  • Rintala H, Pitkäranta M, Täubel M. Advances in Applied Microbiology. Elsevier; 2012. Microbial communities associated with house dust. pp. 75–120. http://linkinghub​.elsevier​.com/retrieve/pii​/B978012394805200004X. [PubMed: 22305094]
  • Rook GAW. Review series on helminths, immune modulation and the hygiene hypothesis: The broader implications of the hygiene hypothesis. Immunology. 2009;126(1):3–11. [PMC free article: PMC2632706] [PubMed: 19120493] [Cross Ref]
  • Rook GAW, Stanford JL. Give us this day our daily germs. Immunology Today. 1998;19(3):113–116. [PubMed: 9540269] [Cross Ref]
  • Rusin P, Orosz-Coughlin P, Gerba C. Reduction of faecal coliform, coliform and heterotrophic plate count bacteria in the household kitchen and bathroom by disinfection with hypochlorite cleaners. Journal of Applied Microbiology. 1998;85(5):819–828. [PubMed: 9830117]
  • Rutala WA, Barbee SL, Aguiar NC, Sobsey MD, Weber DJ. Antimicrobial activity of home disinfectants and natural products against potential human pathogens. Infection Control and Hospital Epidemiology. 2000;21(1):33–38. [PubMed: 10656352] [Cross Ref]
  • Safdar N, Drayton J, Dern J, Warrack S, Duster M, Schmitz M. Telemetry leads harbor nosocomial pathogens. International Journal of Infection Control. 2012;8(2) [Cross Ref]
  • Schabrun S, Chipchase L, Rickard H. Are therapeutic ultrasound units a potential vector for nosocomial infection? Physiotherapy Research International. 2006;11(2):61–71. [PubMed: 16808087] [Cross Ref]
  • Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing Mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology. 2009;75(23):7537–7541. [PMC free article: PMC2786419] [PubMed: 19801464] [Cross Ref]
  • Schram D, Doekes G, Boeve M, Douwes J, Riedler J, Ublagger E, Mutius E, Budde J, Pershagen G, Nyberg F, Alm J, Braun-Fahrländer C, Waser M, Brunekreef B. the PARSIFAL Study Group. Bacterial and fungal components in house dust of farm children, Rudolf Steiner school children and reference children—the PARSIFAL Study. Allergy. 2005;60(5):611–618. [PubMed: 15813805] [Cross Ref]
  • Scott E, Bloomfield SF, Barlow CG. An investigation of microbial contamination in the home. Journal of Hygiene. 1982;89(2):279–293. [PMC free article: PMC2134222] [PubMed: 7130703]
  • Scott E, Bloomfield SF, Barlow CG. Evaluation of disinfectants in the domestic environment under ‘in use’ conditions. Journal of Hygiene. 1984;92(2):193–203. [PMC free article: PMC2129245] [PubMed: 6323576]
  • Sebastian A, Larsson L. Characterization of the microbial community in indoor environments: A chemical-analytical approach. Applied and Environmental Microbiology. 2003;69(6):3103–3109. [PMC free article: PMC161488] [PubMed: 12788704] [Cross Ref]
  • Simard C, Trudel M, Paquette G, Payment P. Microbial investigation of the air in an apartment building. Journal of Hygiene. 1983;91(2):277–286. [PMC free article: PMC2129367] [PubMed: 6358346]
  • Snyder GM, Thom KA, Furuno JP, Perencevich EN, Roghmann M-C, Strauss SM, Netzer G, Harris AD. Detection of methicillin-resistant Staphylococcus aureus and vancomycin-resistant enterococci on the gowns and gloves of healthcare workers. Infection Control and Hospital Epidemiology. 2008;29(7):583–589. [PMC free article: PMC2577846] [PubMed: 18549314] [Cross Ref]
  • Stajich JE, Harris T, Brunk BP, Brestelli J, Fischer S, Harb OS, Kissinger JC, Li W, Nayak V, Pinney DF, Stoekert CJ Jr, Roos DS. FungiDB: An integrated functional genomics database for fungi. Nucleic Acids Research. 2011;40(D1):D675–D681. [PMC free article: PMC3245123] [PubMed: 22064857] [Cross Ref]
  • Stanley NJ, Kuehn TH, Kim SW, Raynor PC, Anantharaman S, Ramakrishnan MA, Goyal SM. Background culturable bacteria aerosol in two large public buildings using HVAC filters as long term, passive, high-volume air samplers. Journal of Environmental Monitoring. 2008;10(4):474. [PubMed: 18385868] [Cross Ref]
  • Sudharsanam S, Srikanth P, Sheela M, Steinberg R. Study of the indoor air quality in hospitals in South Chennai, India—microbial profile. Indoor and Built Environment. 2008;17(5):435–441. [Cross Ref]
  • Taitt CR, Leski T, Stenger D, Vora GJ, House B, Nicklasson M, Pimentel G, Zurawski DV, Kirkup BC, Craft D, Waterman PE, Lesho EP, Bangurae U, Ansumana R. Antimicrobial resistance determinant microarray for analysis of multi-drug resistant isolates. SPIE. 2012;8371 83710X-83710X-10.
  • Tang JW. The effect of environmental parameters on the survival of airborne infectious agents. Journal of The Royal Society Interface. 2009;6(Suppl_6):S737–S746. [PMC free article: PMC2843949] [PubMed: 19773291] [Cross Ref]
  • Täubel M, Rintala H, Pitkäranta M, Paulin L, Laitinen S, Pekkanen J, Hyvärinen A, Nevalainen A. The occupant as a source of house dust bacteria. Journal of Allergy and Clinical Immunology. 2009;124(4):834–840. e47. [PubMed: 19767077] [Cross Ref]
  • Thomas V, Herrera-Rimann K, Blanc DS, Greub G. Biodiversity of amoebae and amoeba-resisting bacteria in a hospital water network. Applied and Environmental Microbiology. 2006;72(4):2428–2438. [PMC free article: PMC1449017] [PubMed: 16597941] [Cross Ref]
  • Treakle AM, Thom KA, Furuno JP, Strauss SM, Harris AD, Perencevich EN. Bacterial contamination of health care workers’ white coats. American Journal of Infection Control. 2009;37(2):101–105. [PMC free article: PMC2892863] [PubMed: 18834751] [Cross Ref]
  • Tringe SG, Zhang T, Liu X, Yu Y, Lee WH, Yap J, Yao F, Suan ST, Ing SK, Haynes M, Rohwer F, Wei CL, Tan P, Bristow J, Rubin EM, Ruan Y. The airborne metagenome in an indoor urban environment. PLoS ONE. 2008;3(4):e1862. [PMC free article: PMC2270337] [PubMed: 18382653] [Cross Ref]
  • Ulger F, Esen S, Dilek A, Yanik K, Gunaydin M, Leblebicioglu H. Are we aware how contaminated our mobile phones with nosocomial pathogens? Annals of Clinical Microbiology and Antimicrobials. 2009;8(1):7. [PMC free article: PMC2655280] [PubMed: 19267892] [Cross Ref]
  • Vaerewijck MJM, Huys G, Carlos Palomino J, Swings J, Portaels F. Mycobacteria in drinking water distribution systems: Ecology and significance for human health. FEMS Microbiology Reviews. 2005;29(5):911–934. [PubMed: 16219512] [Cross Ref]
  • Vernon MO, Hayden MK, Trick WE, Hayes RA, Blom DW, Weinstein RA. Chlorhexidine gluconate to cleanse patients in a medical intensive care unit: The effectiveness of source control to reduce the bioburden of vancomycin-resistant Enterococci. Archives of Internal Medicine. 2006;166(3):306–312. [PubMed: 16476870] [Cross Ref]
  • Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology. 2007;73(16):5261–5267. [PMC free article: PMC1950982] [PubMed: 17586664] [Cross Ref]
  • Warner P, Glassco A. Enumeration of air-borne bacteria in hospital. Canadian Medical Association Journal. 1963;88:1280–1283. [PMC free article: PMC1921583] [PubMed: 13998956]
  • Whittaker RH. Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs. 1960;30(3):279. [Cross Ref]
  • Whittaker RH. Evolution and measurement of species diversity. Taxon. 1972;21(2/3):213. [Cross Ref]
  • Wiener-Well Y, Galuty M, Rudensky B, Schlesinger Y, Attias D, Yinnon AM. Nursing and physician attire as possible source of nosocomial infections. American Journal of Infection Control. 2011;39(7):555–559. [PubMed: 21864762] [Cross Ref]
  • Williams REO, Hirch A. Bacterial contamination of air in underground trains. Lancet. 1950;1(6595):128–131. [PubMed: 15408973] [Cross Ref]
  • Williams REO, Lidwell OM, Hirch A. The bacterial flora of the air of occupied rooms. Journal of Hygiene. 1956;54(04):512. [PMC free article: PMC2217879] [PubMed: 13385487] [Cross Ref]
  • Wong D, Nye K, Hollis P. Microbial flora on doctors’ white coats. BMJ. 1991;303(6817):1602–1604. [PMC free article: PMC1676235] [PubMed: 1773186] [Cross Ref]
  • Yamada K. A study on the behavior and control of indoor airborne microbe in a clinic. Journal of the National Institute of Public Health. 2007;56(3):300–302.
  • Yazdanbakhsh M, Matricardi PM. Parasites and the hygiene hypothesis: Regulating the immune system? Clinical Reviews in Allergy & Immunology. 2004;26(1):15–24. [PubMed: 14755072] [Cross Ref]
  • Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, Vaughan R, Hunter C, Park J, Morrison N, Rocca-Serra P, Sterk P, Arumugam M, Bailey M, Baumgartner L, Birren BW, Blaser MJ, Bonazzi V, Booth T, Bork P, Bushman FD, Buttigieg PL, Chain PSG, Charlson E, Costello EK, Huot-Creasy H, Dawyndt P, DeSantis T, Fierer N, Fuhrman JA, Gallery RE, Gevers D, Gibbs RA, San Gil I, Gonzalez A, Gordon JI, Guralnick R, Hankeln W, Highlander S, Hugenholtz P, Jansson J, Kau AL, Kelley ST, Kennedy J, Knights D, Koren O, Kuczynski J, Kyrpides N, Larsen R, Lauber CL, Legg T, Ley RE, Lozupone CA, Ludwig W, Lyons D, Maguire E, Methé BA, Meyer F, Muegge B, Nakielny S, Nelson KE, Nemergut D, Neufeld JD, Newbold LK, Oliver AE, Pace NR, Palanisamy G, Peplies J, Petrosino J, Proctor L, Pruesse E, Quast C, Raes J, Ratnasingham S, Ravel J, Relman DA, Assunta-Sansone S, Schloss PD, Schriml L, Sinha R, Smith MI, Sodergren E, Spor A, Stombaugh J, Tiedje JM, Ward DV, Weinstock GM, Wendel D, White O, Whiteley A, Wilke A, Wortman JR, Yatsunenko T, Glöckner FO. Minimum Information about a Marker Gene Sequence (MIMARKS) and Minimum Information About Any (x) Sequence (MIxS) Specifications. Nature Biotechnology. 2011;29(5):415–420. [PMC free article: PMC3367316] [PubMed: 21552244] [Cross Ref]
  • Zachary KC, Bayne PS, Morrison VJ, Ford DS, Silver LC, Hooper DC. Contamination of gowns, gloves, and stethoscopes with vancomycin-resistant Enterococci. Infection Control and Hospital Epidemiology. 2001;22(9):560–564. [PubMed: 11732785] [Cross Ref]



22 Brown University.


Our understanding of microbial communities is in a time of rapid change. The application of polymerase chain reaction (PCR), cloning, and DNA sequencing to microbial diversity research has rapidly expanded our appreciation of the extent of the microbial world. In particular, analysis of PCR amplicons from various regions of the small subunit ribosomal RNA (SSU rRNA or 16S) gene generated from culture-independent samples is now the accepted standard for cataloguing microbial communities. As sequencing technologies improved, it became feasible to assess community membership from more than 1,000 individual SSU rRNA amplicons. With the advent of next-generation sequencing (NGS) that did not require the cloning of individual amplicons, researchers transitioned from generating thousands of ~800–1,100 nt reads to hundreds of thousands of 100–500 nt sequencing reads (454 technology). Illumina technology can now produce millions of 100 nt reads from hundreds of samples in a single run, potentially providing a nearly exhaustive survey of microbes present in a sample.

While the NGS technologies provide deeper sampling, the trade-off for depth has been shorter read lengths. The ~800–1,100 nt reads produced by the late 1990s using Sanger sequencing on ABI or LICOR platforms could be used to reconstruct the entire SSU rRNA gene by sequencing the same clone multiple times. To make use of the shorter reads produced by NGS technology, Sogin et al. (2006) capitalized on the structure of the SSU rRNA gene. The gene includes a series of regions that are highly conserved across the bacterial domain, interspersed with a series of nine hypervariable regions. This structure lends itself conveniently to NGS because oligonucleotide primers that target conserved regions on either side of the hypervariable regions can amplify DNA from across the bacterial domain. The more rapidly evolving hypervariable regions, in contrast, are unique for most microbial genera and in many cases can differentiate below the genus level. Sogin et al. used primers targeting the conserved flanking regions to amplify the V6 hypervariable region, which at 60–80 nt in length could reliably be completely sequenced on a 454 GS20. As with conventionally sequenced clone libraries, each read in principle represents an SSU rRNA operon and is a proxy for a microbe from the sample. By comparing the hypervariable region sequences against databases of SSU rRNA gene sequences of known taxonomy, such as RDP (Wang et al., 2007), SILVA (Pruesse et al., 2007), or Greengenes (DeSantis et al., 2006), the reads become tags for cataloging the taxonomy of the community being studied.

As the technology for microbial community research has evolved, so has our understanding of the communities we study. In the first published study implementing NGS in environmental samples, Sogin et al. (2006) examined several marine environments and discovered a richness and diversity in microbial community structures previously unknown. Each community exhibited a relatively small number of highly abundant taxa and a large number of low abundance taxa, a pattern often described as a long-tail distribution (Figure A6-1). Because of the unevenness of this community structure, previous studies using hundreds or thousands of sequencing reads were able to identify only the most abundant members and a small fraction of the taxa in the long tail. The greater sequencing depth of NGS methods revealed the breadth of the low abundance taxa—the “rare biosphere.”

FIGURE A6-1. An example rank abundance curve with a long-tail distribution (four-panel illustration). The OTUs are ordered from most abundant on the left to the least abundant on the right. The y-axis plots the abundance of each OTU. There are a very small number of highly abundant OTUs on …

Impact of Sequencing Errors and Clustering Methods

As NGS provided a means to explore ever deeper into microbial community structures, the gap between the number of named species and the number of sequence phylotypes increased. Unfortunately, the short read lengths of NGS technology, especially when applied to hypervariable regions with few stable phylogenetically informative positions, are poorly suited for the traditional phylogenetic analyses required for registering new taxa. Researchers turned to taxonomic-independent sequence clustering methods for characterizing microbial communities. By assuming that very similar sequences represent closely related organisms, and that more divergent sequences represent more distantly related organisms, the sequences can be clustered into groups of similar organisms, each cluster or “operational taxonomic unit” (OTU) presumed to represent a phylotype (Schloss and Handelsman, 2005). The width of the clustering, meaning the percent identity threshold for sequence tags to be placed in the same OTU, represents the similarity of the microbes in each OTU.

A critical element in taxonomy-independent analyses of diversity is sequencing error. Random errors can be tolerated in an assembly project where the goal is a consensus sequence. In OTU clustering, however, each read is assumed to represent an individual organism, and if a read has sufficient errors, then it will not cluster with its template, instead forming a new, spurious OTU. Thus, if not filtered out or unaccounted for, sequencing error can lead to inflation in the number of OTUs attributed to a community.

To address the issue of sequence quality, several authors developed quality filtering (Huse et al., 2007) and data de-noising (Quince et al., 2009) techniques for processing raw 454 sequencing reads to reduce sequencing errors and thereby reduce OTU inflation. In 2009, in a paper titled “Wrinkles in the Rare Biosphere,” Kunin et al. (2010) highlighted the impact of error rates on OTU analyses by sequencing a single strain of E. coli and generating more than 600 OTUs. Reeder and Knight followed this with a “News and Views” piece, “The ‘rare biosphere’: A reality check” (Reeder and Knight, 2009). The combination of these two publications spotlighted the very real concerns about the impact of sequencing error on microbial community diversity estimates. They posed a critical question: Is the rare biosphere real or simply an artifact of sequencing errors?

Even with very high-quality sequencing and stringent quality filtering, the depth of sampling afforded by NGS technology leads to more absolute OTU inflation than the earlier Sanger sequencing. This is for two reasons. First, when processing data from sequencing hundreds to thousands of Sanger capillary reads, individual chromatograms are often read by hand and confirmed by forward and reverse reads, resulting in very high-quality sequence assemblies. While it is in principle possible to develop a similar skill reading 454 flowgrams, it is not conceivable to hand-edit hundreds of thousands of reads. In this regard it is worth noting that the generally accepted error rate in automated high-throughput capillary sequencing is 1 percent (i.e., an average Phred score of 20). The second reason is that the sheer number of reads produced by NGS will result in more spurious OTUs even if the error rate is much lower. Most OTU clustering methods are based only on the percent identity between sequences. If a read has sufficient errors that the difference between it and its template is greater than the clustering threshold, the clustering algorithm will place it in a new OTU. If a sequencing error rate leads to 1 read per thousand that fails to cluster with its template, a traditional clone library Sanger project with 1,000 reads will have, on average, one spurious OTU. With NGS, with the same error rate, a data set of 100,000 reads will have, on average, 100 spurious OTUs. In practice, quality-controlled NGS reads tend to have a lower overall error rate (Schloss et al., 2011) than automated ABI capillary sequencing. So while the relative rate of OTU inflation per read may have dropped with NGS, the absolute number of spurious OTUs has increased considerably because of dramatically increased depths of sequencing.
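The correspondence quoted above between a 1 percent error rate and a Phred score of 20 follows from the standard Phred definition, Q = -10 log10(P). The short sketch below makes the conversion explicit. The function names are illustrative and are not taken from any particular sequencing package.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score implied by a per-base error probability."""
    return -10 * math.log10(p)


def expected_errors(read_length: int, q: float) -> float:
    """Expected number of erroneous bases in a read of uniform quality q."""
    return read_length * phred_to_error_probability(q)
```

Under this definition, a 100 nt read with uniform quality 20 is expected to carry about one error, which is why an error rate that is tolerable for assembly can still matter for clustering.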

As it turns out, though, much of the OTU inflation observed in NGS projects was not due to sequencing errors. Ironing out the Wrinkles in the Rare Biosphere (Huse et al., 2010) demonstrated that the most commonly used method for OTU generation dramatically compounded the problem of sequencing error. Following on the Kunin technique, Huse et al. clustered DNA amplified from a single E. coli gene, and using only sequences with an error rate below the clustering threshold, showed that switching from a single multiple sequence (MS) alignment to multiple pairwise (PW) alignments, and from complete linkage (CL) clustering to average linkage (AL) clustering reduced the OTU inflation from 599 OTUs to 24. They introduced a single-linkage preclustering (SLP) to smooth errors prior to clustering. Using SLP-PWAL for clustering brought the OTU count to 1.

The effect of reads with more errors than the clustering threshold still needs to be taken into consideration. With established, simple sequence quality filtering and SLP clustering of reads generated on the Roche Genome Sequencer FLX platform from amplicons of the V6 SSU rRNA hypervariable region, errant reads produce spurious singleton OTUs at a rate of ~1 spurious OTU per 1,000 reads. When applied to analysis of control communities of limited diversity, this rate can sound alarming. Processing 50,000 reads of a control community with 40 known members would generate 40 + 50,000/1,000 = 90 OTUs. However, it is important to recognize that the number of spurious OTUs produced scales with the sequencing depth, not the complexity of the community. If a biological community sampled to a depth of 50,000 reads were to have 1,000 observed OTUs, then 50 of these would be due to sequencing error. As shown in Table A6-1, algorithm choice has a much greater effect on OTU inflation than sequencing error.
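The arithmetic in this paragraph can be written as a small sketch. It assumes, as the text does, a fixed rate of one spurious singleton OTU per 1,000 reads that is independent of community complexity. The function and parameter names are illustrative.

```python
def expected_spurious_otus(n_reads: int, reads_per_spurious_otu: int = 1000) -> float:
    """Expected number of spurious OTUs; scales with depth, not with diversity."""
    return n_reads / reads_per_spurious_otu


def expected_observed_otus(true_members: int, n_reads: int,
                           reads_per_spurious_otu: int = 1000) -> float:
    """True members plus the spurious OTUs expected at this sequencing depth."""
    return true_members + expected_spurious_otus(n_reads, reads_per_spurious_otu)


def spurious_fraction(observed_otus: int, n_reads: int,
                      reads_per_spurious_otu: int = 1000) -> float:
    """Share of the observed OTUs expected to be spurious."""
    return expected_spurious_otus(n_reads, reads_per_spurious_otu) / observed_otus
```

For the 40-member control community at 50,000 reads this gives 90 expected OTUs, more than half of them spurious. For a community with 1,000 observed OTUs at the same depth, the same 50 spurious OTUs amount to 5 percent of the total.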

TABLE A6-1. OTU Inflation Due to Clustering Algorithm and Sequencing Error.



As NGS technologies produce longer reads (> 500 nt with Roche/454 and 300 nt with the Illumina MiSeq at the time of writing), the same error rate results in fewer errant reads generating new OTUs, because more errors per read are required for a sequence to fail to cluster with its template. Instead, a second type of error becomes increasingly important: chimeric sequences formed from two or more templates during amplification. Chimeras are generated when a template is incompletely replicated during the elongation step of PCR. This truncated sequence can hybridize to other targets in subsequent PCR cycles and act as a primer, generating a single sequence from multiple templates. The frequency of chimeras scales with SSU rRNA amplicon length (Huber et al., 2009). The amplification of sample DNA is largely independent of sequencing platform (though the effect of platform-specific primer adapters has not been thoroughly explored), and several modifications of standard PCR result in reduced chimera formation (Acinas et al., 2004; Lahr and Katz, 2009; Qiu et al., 2001). However, chimeras remain in most heterogeneous amplicon pools. Several methods have been developed for identifying and removing chimeric reads. Haas et al. (2011) developed Chimera Slayer to remove chimeras by comparing each sequence read against a curated database of non-chimeric SSU rRNA genes. Quince et al. (2011) and Edgar et al. (2011) have developed chimera checkers (Perseus and UChime, respectively) using reference comparison, but optimized for shorter NGS reads. UChime also performs a de novo check by comparing triplets of reads in a data set to see if any reads appear to be a combination of two other reads from the same amplicon pool. Because chimeras have large sections of sequence substitutions, we can conservatively presume that each unique chimera will likely create a new OTU during clustering. Chimera checking, therefore, is just as important for improving OTU richness estimates as basic sequence quality filtering.
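The idea behind a de novo triplet check can be illustrated with a toy sketch. This is not the UChime algorithm, which aligns reads and scores candidate chimeras. The sketch instead assumes error-free reads of equal length that are already aligned, and it assumes (as a simplification introduced here) that both parents are more abundant than the chimera. All names are hypothetical.

```python
from itertools import permutations


def find_breakpoint(query, left_parent, right_parent):
    """Return k such that query == left_parent[:k] + right_parent[k:], else None."""
    if not (len(query) == len(left_parent) == len(right_parent)):
        return None  # toy model: equal-length, pre-aligned reads only
    if query in (left_parent, right_parent):
        return None  # identical to a parent, so not a chimera
    for k in range(1, len(query)):
        if query[:k] == left_parent[:k] and query[k:] == right_parent[k:]:
            return k
    return None


def flag_chimeras(abundance):
    """Flag reads that are an exact two-parent combination of more abundant reads.

    abundance maps each unique sequence to its read count.
    Returns {chimera: (left_parent, right_parent, breakpoint)}.
    """
    flagged = {}
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    for query in ranked:
        candidates = [s for s in ranked if abundance[s] > abundance[query]]
        for left, right in permutations(candidates, 2):
            k = find_breakpoint(query, left, right)
            if k is not None:
                flagged[query] = (left, right, k)
                break
    return flagged
```

A read that differs from its parent only by a point error is not flagged by this check, which is why chimera removal complements rather than replaces quality filtering.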

Sequencing and Clustering Best Practices

DNA Amplification

Given the known limitations of both NGS technologies and of clustering methods, it is particularly important to exercise best practices in both generation and use of NGS data. The first way to reduce the impact of sequencing errors on microbial ecology investigations is to reduce the rates of base incorporation errors and chimera formation in the DNA amplification step. This includes the use of a high-fidelity polymerase such as Platinum Taq, reducing contaminants that interfere with polymerase processivity or proofreading, minimizing the number of PCR cycles, and optimizing the amount of input DNA (Acinas et al., 2004; Lahr and Katz, 2009; Qiu et al., 2001). Technical replicates of the amplification step facilitate discrimination of novel sequences from high-frequency errors or chimeras.

Quality Filtering

Several authors have provided extensive analyses of quality filtering methods for both 454 and Illumina sequencing technologies (Huse et al., 2007; Meacham et al., 2011; Minoche et al., 2011; Quince et al., 2011; Schloss et al., 2011). In brief, removing reads that contain ambiguous bases (Ns), have low quality scores, are truncated, or have known mismatches in the primer region is a computationally simple means of decreasing the error rate several-fold. More computationally intensive algorithms such as AmpliconNoise (Quince et al., 2011) (run either directly or as implemented in QIIME [Caporaso et al., 2010] or mothur [Schloss et al., 2009]) can be used as the first step for quality filtering pyrosequencing data. Illumina paired-end technology allows another, more traditional approach to quality filtering: each amplicon is sequenced in both directions, so if the amplicon length is less than twice the read length, overlap of the complementary sequences can be used to assess accuracy (Bartram et al., 2011; Gloor et al., 2010). If the two reads overlap completely, meaning the amplicon length is less than or equal to the read length, requiring no mismatches could lead to data sets with little or no sequencing error, although systematic errors that are the same in both directions could still exist.
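The simple filters described above, and the complete-overlap check for paired-end reads, can be sketched as follows. The thresholds (minimum mean quality, minimum length) are placeholder assumptions rather than recommended values, and the function names are illustrative. The overlap check assumes both reads have been trimmed to the same amplicon.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def passes_basic_filters(seq, quals, primer, min_mean_q=25, min_length=50):
    """Apply the simple filters: no Ns, not truncated, exact primer, adequate quality.

    quals is a list of per-base Phred scores, one per base of seq.
    """
    seq = seq.upper()
    if len(quals) != len(seq) or len(seq) < min_length:
        return False  # truncated or malformed record
    if "N" in seq:
        return False  # ambiguous base call
    if not seq.startswith(primer.upper()):
        return False  # mismatch in the primer region
    return sum(quals) / len(quals) >= min_mean_q


def full_overlap_agrees(forward: str, reverse: str) -> bool:
    """True when completely overlapping paired reads agree at every position."""
    return forward.upper() == reverse_complement(reverse)
```

Requiring exact agreement discards any pair with a random error in either read, but it cannot detect a systematic error that occurs identically in both directions.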

Sequence quality filtering should be followed by chimera checking. Using a reference database for chimera detection is the standard method for identifying and removing chimeras and is very effective. Unfortunately, sequencing of novel environments is fast outstripping curated database growth, and novel genes that are parents of chimeras will be missed by reference comparison methods. A combination of both reference comparison and de novo chimera checking should always be used.

Smoothing Imperfect Data by Aggregation

Even with the best quality filtering and chimera checking, sequencing data will still contain reads with base incorporation and base calling errors, chimeric reads, reads from contaminating DNA, and reads from amplification of non-target areas of sample DNA. The methods chosen for downstream analysis of the data will determine the degree of impact these errors will have on research results.

Assigning reads to their closest match in a database of sequences annotated with defined taxonomy is a simple, straightforward way to minimize the impact of small sequence differences, segregate chimeras, identify contaminants, and eliminate reads from non-target amplification. For instance, many of the reference SSU rRNA databases provide taxonomy primarily to the genus level. While this may not be sufficient resolution for some analyses, the use of genus-level taxonomy for analysis will assign similar sequences to the same genus, so that sequences with even a moderate number of errors are likely to still be classified together with their template. Chimeras of sequences within the same genus will remain in that genus, and chimeras of sequences from different genera within a family will tend to not be classified at the genus level; these cross-genera chimeras will aggregate as sequences classified to the family level but no further. The presence of unexpected genera, such as Ralstonia in deep-sea sediment samples, can indicate contamination.

Routinely assigning taxonomy to all sequence reads has the additional advantage of quickly identifying non-target reads. Occasionally, the SSU rRNA primers can amplify DNA from a section of the genome other than the SSU rRNA gene. The resulting amplicons will be quite divergent from any 16S reference gene. Finally, PCR primers designed to be specific for domains or other groups often amplify the rRNA SSU gene from a subset of species outside that group, generally in a non-quantitative manner. Reads that map to taxa outside the group to which the primers were designed can easily be eliminated from downstream analyses.

The other common way to aggregate similar sequences is to cluster them into OTUs based on percent similarity, as described previously. This is often done after assigning taxonomy to reads so that reads from non-template amplicons or from taxonomic groups outside the target range of the primers can be eliminated. Clustering reads based on a similarity score of 97 percent has become a common way of approximating “species” or phylotype in the absence of taxonomic resolution. However, it is important to note that different algorithms create very different 97 percent OTUs, and some of these methods lead to OTU inflation, as discussed previously. Among those that do not inflate OTU counts are SLP-PWAL clustering, which uses a nearest-neighbor approach to link sequences likely to be derived from base incorporation error with average neighbor linkage to form OTUs, and methods known as greedy clustering algorithms. One of the more popular of these is UClust (Edgar, 2010). Briefly, UClust ranks sequences in order of abundance and seeds the first OTU with the most abundant sequence. The next sequence is compared to the first; if it is within the clustering threshold, it is added to the OTU, and if not, it becomes the seed of a second OTU. Each sequence is compared to the OTU seeds, in order, until all reads have been assigned to an existing OTU or used to create a new OTU.
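The greedy procedure described above can be sketched in a few lines. This is an illustration of the abundance-ranked, seed-based idea only, not the UClust implementation. In particular, the identity measure used here (one minus edit distance over the longer sequence length) is a stand-in assumption for a proper alignment-based identity, and the sketch assumes non-empty sequences.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Stand-in percent identity: 1 - edit distance / longer length."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(reads, threshold=0.97):
    """Assign reads to OTUs by comparing each to existing seeds in order.

    Returns a list of (seed, {sequence: count}) in order of OTU creation.
    """
    otus = []
    for seq, count in Counter(reads).most_common():  # most abundant first
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members[seq] = count  # joins the first seed within threshold
                break
        else:
            otus.append((seq, {seq: count}))  # becomes a new seed
    return otus
```

Because abundant sequences become seeds first, a low-abundance read carrying a few errors tends to be absorbed by its abundant template rather than founding a new OTU.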

Diversity in an Imperfect World

Ecologists measure the diversity of a microbial community in a variety of ways. Richness is the number of different members (OTUs, species, genera, phylotypes, etc.) in a community. Evenness describes the distribution of relative abundances of the members, whether they have similar abundances or are skewed with some highly abundant and others rare. A community’s diversity combines richness and evenness (although richness is often called diversity as well). The richness of a single community is often referred to as alpha diversity. The degree of similarity or difference between two or more communities in richness or evenness is beta diversity. One important conceptual difference between these two measures is that alpha diversity is nearly always used to designate an estimate of the true richness of the community from which a data sample was taken, while beta diversity is generally a metric describing the similarity between the observed richness or evenness of two samples (though there are methods for estimating community similarity from sample data [Chao, 2004], they are rarely used in molecular microbial ecology).
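The distinction between richness and evenness can be made concrete with a short sketch. The Shannon index is one common diversity measure that combines the two. Pielou's evenness (the Shannon index divided by the log of richness) is an assumption introduced here for illustration and is not a measure named in the text.

```python
import math


def richness(abundances) -> int:
    """Number of members (e.g., OTUs) observed at least once."""
    return sum(1 for n in abundances if n > 0)


def shannon_index(abundances) -> float:
    """Shannon diversity H = -sum(p * ln p) over observed members."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


def pielou_evenness(abundances) -> float:
    """Shannon index scaled by its maximum, ln(richness); undefined below 2 members."""
    s = richness(abundances)
    if s < 2:
        return float("nan")
    return shannon_index(abundances) / math.log(s)
```

Two samples with counts [25, 25, 25, 25] and [97, 1, 1, 1] have the same richness (4), but the first has an evenness of 1 while the second is strongly skewed toward a single member.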

Algorithms used to calculate diversity differ in the stability of their results in the presence of residual sequencing errors and chimeras (as reflected in OTU inflation) and in the depth of sampling. Estimates of sample richness (alpha diversity) are particularly susceptible to the impacts of both. Rarefaction curves, while not true estimators of alpha diversity, are often used to illustrate the relationship between the observed number of community members (i.e., OTUs) and sampling depth. For a range of subsample sizes, the average number of OTUs observed at each size is plotted against the subsample size, describing the number of new members discovered for an incremental increase in sampling effort. With very small subsamples, small increases in sampling depth lead to the discovery of many new members. As sampling depth increases, the number of new members found decreases and the curve begins to approach an asymptote as the sampling depth provides a more complete picture of the underlying community. If OTUs are created using a clustering method that inflates with sampling depth, the rarefaction curves cannot reach an asymptote and instead will increase linearly as a direct function of sampling depth. Superimposing multiple rarefaction curves from complete-linkage OTUs demonstrates the dependence of the slope of the rarefaction curve on the sample depth (Figure A6-2, Panel A). Rarefaction curves based on depth-independent OTUs (e.g., SLP-PWAL or UClust) have essentially identical slopes for sample depths ranging over two orders of magnitude (Figure A6-2, Panel B).
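A rarefaction curve of the kind described above can be computed by repeated subsampling. The sketch below assumes that each read has already been assigned an OTU label, so it rarefies a fixed clustering. It does not re-cluster each subsample as was done for Figure A6-2. The function name, the number of trials, and the random seed are illustrative choices.

```python
import random


def rarefaction_curve(otu_labels, depths, n_trials=10, seed=0):
    """Average number of distinct OTUs observed at each subsampling depth.

    otu_labels: list with one OTU label per read.
    depths: subsample sizes, each no larger than len(otu_labels).
    Returns a list of (depth, mean observed OTUs) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for depth in depths:
        observed = [len(set(rng.sample(otu_labels, depth)))  # without replacement
                    for _ in range(n_trials)]
        curve.append((depth, sum(observed) / n_trials))
    return curve
```

Plotting the second element of each pair against the first gives the curve. A curve that keeps rising in a straight line at the largest depths suggests either severe undersampling or OTU counts that inflate with depth.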

Figure A6-2. Two rarefaction curves. Panel A, Rarefaction curve for OTUs generated from Human Microbiome Project stool samples using the V3–V5 region. We subsampled the 50,000 reads into smaller samples ranging from 1,000 to the full 50,000 and then performed complete linkage clustering ... Panel B, Rarefaction curve for OTUs generated by average linkage clustering.

Some nonparametric methods of estimating alpha diversity remain sensitive to sampling depth independent of OTU inflation or other forms of error. Two of the most common estimators, ACE (Chao and Lee, 1992) and Chao1 (Chao, 1984), are well known to underestimate richness when used for populations where many of the members are “unseen” in the sample, and are more accurately considered lower bounds than true estimates. In other words, they do not perform well when a community is drastically undersampled, as is the case for most samples of microbial communities. Panel A of Figure A6-3 illustrates the impact of both sampling depth and clustering method on the estimated richness for a human gut microbiome sample. With increasing depth, the sample more adequately reflects the underlying community, and the richness estimate can stabilize. In this example, Chao1 estimates rise rapidly with sampling depth from 1,000 to 20,000 and then level off. Doubling the sample depth from 20,000 to 40,000 does not change the estimated richness beyond the error bars. Plots of richness against subsample depth may serve as a reality check on the stability (if not accuracy) of calculated richness for a given sample. Parametric estimators such as CatchAll (Bunge, 2011) are less sensitive to sample size and undersampling and also make use of a wider range of OTU sizes in extrapolating total richness.

Figure A6-3. Two graphs showing alpha diversity using ACE and Chao estimators (Panel A) and Simpson and Shannon diversity estimators (Panel B). Panel A, We calculated alpha diversity (richness) using both ACE and Chao estimators with clusters based on complete linkage and average linkage algorithms. Chao values are consistently smaller than ACE values, and SLP-PWAL are lower than complete linkage ...

Even with depth-independent OTU clustering methods, community richness estimates can be vulnerable to OTU inflation. Most algorithms for estimating richness heavily weight the number of OTUs with one or two reads (singleton and doubleton OTUs). We can assume that most spurious OTUs are singletons. This will not always be the case (for instance, early-round chimeras can be amplified in a sample), but it is a conservative assumption. By removing the estimated number of spurious OTUs (e.g., 1 per 1,000 reads) from the count of singletons in the species abundance data used to calculate richness, we can compensate for OTU inflation in estimating alpha diversity. We do not need to know which of our OTUs are spurious and which are true; we only need an estimated number. Panel A of Figure A6-3 shows the impact of several OTU inflation rates on richness estimates for a subsample of 25,000.
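The singleton adjustment described above can be illustrated with a short Python sketch. It assumes the bias-corrected form of Chao1, S_obs + F1(F1 − 1) / (2(F2 + 1)), chosen here because it remains defined when no doubletons are present; the function names, the default inflation rate, and the toy abundance data are illustrative assumptions and not the code used to produce Figure A6-3.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in counts if c == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def chao1_inflation_adjusted(otu_counts, spurious_per_read=1 / 1000):
    """Chao1 after removing the estimated number of spurious singletons.

    spurious_per_read is an assumed OTU inflation rate (spurious OTUs
    per read). We only need the number of spurious OTUs, not their
    identities, so any singletons can be dropped.
    """
    counts = [c for c in otu_counts if c > 0]
    n_spurious = round(sum(counts) * spurious_per_read)
    singletons = [c for c in counts if c == 1]
    others = [c for c in counts if c != 1]
    kept = max(len(singletons) - n_spurious, 0)
    return chao1(others + [1] * kept)

# Toy sample of 1,000 reads: 3 abundant OTUs, 2 doubletons, 46 singletons.
counts = [500, 300, 150, 2, 2] + [1] * 46
raw = chao1(counts)                          # 51 + 46*45/6 = 396.0
adjusted = chao1_inflation_adjusted(counts)  # 50 + 45*44/6 = 380.0
print(raw, adjusted)
```

Because the estimator weights singletons so heavily, removing a single spurious singleton from this toy sample lowers the estimate by 16 OTUs.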

At any sampling depth, microbial communities consistently display a long-tail distribution, and therefore evenness will be low at all sampling depths. Both Simpson’s and Shannon’s diversity estimates show very little direct susceptibility to the sampling depth (Figure A6-3, Panel B), but they are still affected by the OTU clustering method. Clearly, choosing a clustering algorithm that minimizes OTU inflation and that is stable to sample size is critical at all times.
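For readers who want to reproduce this kind of comparison, the following Python sketch computes Shannon's index, Simpson's index, and an evenness measure from OTU counts. The text does not state which variants were used, so the choices here are assumptions: Shannon's index with natural logarithms, Simpson's index in its Gini-Simpson form (1 − Σp²), and Pielou's evenness (Shannon's index divided by the log of the observed richness).

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with reads."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def simpson_index(counts):
    """Gini-Simpson diversity, 1 - sum(p^2)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def pielou_evenness(counts):
    """Shannon diversity scaled by its maximum, ln(observed richness)."""
    s_obs = sum(1 for c in counts if c > 0)
    if s_obs < 2:
        return 0.0
    return shannon_index(counts) / math.log(s_obs)

even = [25, 25, 25, 25]   # all members equally abundant
skewed = [97, 1, 1, 1]    # long-tail community with the same richness

print(pielou_evenness(even), simpson_index(even))
print(pielou_evenness(skewed), simpson_index(skewed))
```

Both toy communities have the same richness, but the skewed one scores far lower on evenness and on both diversity indices, which is the behavior expected for long-tail distributions.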

The list of beta diversity metrics that compare the degree of similarity or difference between two communities is very long. We highlight the importance of using metrics that are robust to both differences in sampling depth and undersampling using three distance metrics: Jaccard presence/absence, Bray-Curtis, and Morisita-Horn. Results using the Yue-Clayton distance were consistently similar to Morisita-Horn (results not shown). We subsampled a large data set to provide pseudo-replicates that we expect to be similar. If we compare a subsample with itself, then the distance by any metric will be zero (or, expressed as a similarity, 1), and the expectation is that a comparison of two different random subsamples should give a very similar value. In Figure A6-4, Panel A, we take multiple subsamples of 5,000 reads from a sample of 50,000 reads and calculate the beta diversity of pairs of replicates. Although we would like the distances to approach zero, we know that microbial diversity is so great that there will still be differences between the replicates due to incomplete sampling. Morisita-Horn, Bray-Curtis, and Yue-Clayton (not shown) all return distances of about 5 percent or less. The Jaccard presence/absence metric, on the other hand, returns a community distance of 50–60 percent. The test of presence or absence in a community, where fewer members have been detected than not, will always show large differences between the communities. The small number of abundant members will be consistent across replicates, but rare members detected in any given replicate will vary. It is not surprising that upwards of half the members of a replicate pair can be different.

Figure A6-4. A five-panel figure presenting results from multiple random subsamples using Bray-Curtis, Jaccard, and Morisita-Horn. Panel A, Selecting multiple random subsamples of 5,000 reads from a larger data set of 50,000 reads, we created a set of pseudo-replicate samples. Because they all represent the same larger sample, the pairwise distances should be very small ...

In the case of similar subsample size, both Bray-Curtis and Morisita-Horn returned suitably low beta diversity values. Unfortunately, NGS data sets vary greatly in the number of reads depending on the amount of the amplicon library loaded on the sequencer and the quality of the particular sequencing run. Bray-Curtis and Morisita-Horn compare not only which members are present, but also the abundance of each. To adjust for undersampling, Morisita-Horn emphasizes the abundant members, assuming that if a member is abundant in one sample and not detected in the other sample, then this reflects a true difference in the communities, while if a rare member is detected in one and not the other, this may be an artifact of undersampling. Bray-Curtis includes but does not differentially weight the abundance information.

In Panels B through E in Figure A6-4, we illustrate the effect of different sample sizes on beta diversity values. Bray-Curtis returns small diversity values for subsamples of the same size and increasingly larger values when comparing samples of different sizes; the average distance between subsamples of 1,000 and 25,000 is about 90 percent (Figure A6-4, Panel B). This disparity comes in part because the Bray-Curtis method uses absolute rather than relative abundance. Even if data are subsampled to the same depth, Bray-Curtis can still cause misinterpretations of results when combined with one of the most common visualization tools for illustrating community similarity, principal coordinate analysis (PCoA). Panel C of Figure A6-4 shows that Bray-Curtis groups subsamples by read depth, even though they are subsamples from the very same community. Even though the absolute distances were low for pairs from the same subsampling depth, a PCoA plot does not report absolute differences but scales according to the set of distances used. For subtle changes in the community, a method such as Bray-Curtis could lead to erroneous interpretations of community shifts.
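The depth dependence of Bray-Curtis on absolute counts can be seen in a small Python sketch that implements the three distances discussed above from their standard definitions. The function names and the toy count vectors are assumptions for illustration. The "deep" sample is simply the "shallow" sample scaled by 25, an idealized stand-in for comparing 1,000 and 25,000 reads drawn from the same community with no sampling noise.

```python
def jaccard_distance(x, y):
    """1 - (shared members / members present in either sample)."""
    present_x = {i for i, c in enumerate(x) if c > 0}
    present_y = {i for i, c in enumerate(y) if c > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1.0 - len(present_x & present_y) / len(union)

def bray_curtis_distance(x, y):
    """Bray-Curtis dissimilarity on absolute read counts."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

def morisita_horn_distance(x, y):
    """1 - Morisita-Horn similarity; dominated by abundant members."""
    nx, ny = sum(x), sum(y)
    dx = sum(a * a for a in x) / nx ** 2
    dy = sum(b * b for b in y) / ny ** 2
    similarity = 2 * sum(a * b for a, b in zip(x, y)) / ((dx + dy) * nx * ny)
    return 1.0 - similarity

shallow = [600, 250, 100, 40, 10]    # 1,000 reads
deep = [25 * c for c in shallow]     # 25,000 reads, identical proportions

print(jaccard_distance(shallow, deep))        # same members present
print(bray_curtis_distance(shallow, deep))    # 24/26, about 0.92
print(morisita_horn_distance(shallow, deep))  # essentially 0
```

Even though the two vectors describe identical community proportions, Bray-Curtis on raw counts reports a distance of 24/26, close to the roughly 90 percent reported for the 1,000 versus 25,000 comparison, while Morisita-Horn reports essentially zero. Real subsamples also differ in which rare members are detected, which this idealized example deliberately leaves out.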

Morisita-Horn effectively compensates for sampling depth, returning beta diversity values less than 1 percent for all subsample comparisons (Figure A6-4, Panel D). Interestingly, the Morisita-Horn distances for pairs where one subsample was at 1,000 reads were noticeably higher than the other distances. With increasing depth this divergence disappears. In the PCoA plot, the 1,000 read depth pairs do not cluster either with each other or with any of the other data (Figure A6-4, Panel E). Depth pairs with 5,000 are still divergent but much less so than 1,000. In this particular data set, it appears that a minimum sampling depth of 10,000 is necessary to adequately reflect the community in the subsample.

Continued Evidence for the Rare Biosphere

In evaluating both the extent of the rare biosphere and our ability to meaningfully sample it, it is helpful to put the word “rare” into perspective. Estimates vary, but let us assume that there are at least 1×10¹¹ bacterial cells in a single gram dry weight of human stool (Franks et al., 1998). If we sequence DNA from a single gram of stool and analyze a relatively large data set of 50,000 (5×10⁴) reads, we are sampling a tiny fraction of the census population. An OTU found as a singleton may be present at a frequency of only 1/50,000, but that is 2×10⁶ cells/g, which may not be an insignificant number.
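The arithmetic behind these figures is easy to check. The short Python sketch below uses only the two numbers given in the text; the variable names are illustrative.

```python
import math

cells_per_gram = 1e11   # assumed bacterial cells per gram dry weight of stool
reads = 5e4             # reads in the sequenced data set

# Fraction of the census population represented by the reads.
sampled_fraction = reads / cells_per_gram

# A singleton OTU has an observed frequency of 1 read in 50,000.
singleton_frequency = 1 / reads
implied_cells_per_gram = singleton_frequency * cells_per_gram

print(sampled_fraction)        # about 5e-07
print(implied_cells_per_gram)  # about 2,000,000 cells per gram
```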

But is the long-tail distribution, while consistent across bacterial communities sampled from human and other hosts, marine, freshwater, soil, sand, leaves, sewage, and any number of other environments, merely an artifact of the known phenomenon of OTU inflation caused by deep sequencing? Returning to the empirically derived estimate of 1 spurious OTU per 1,000 reads, we can remove a fraction of singleton OTUs equal to those attributed to OTU inflation (Figure A6-5, Panel A). What we see is that even if we remove singleton OTUs at a rate of 1 per 500 reads, the distribution retains its characteristic shape, because the fraction of singletons removed compared to the number observed is relatively small. Any spurious OTUs are simply extending the end of the tail incrementally; they are not fundamentally altering the shape of the species abundance curve.

Figure A6-5. A four-panel figure showing the 100 most abundant OTUs across 208 HMP subjects in rank order. Panel A, The rank abundance curve shows only minor reductions in the long tail even assuming that as many as 1 in 500 reads generates a spurious OTU. Panel B, The relative abundance of the 100 most abundant OTUs in the Human Microbiome Project stool samples ...

Everything May or May Not Be Everywhere, but Everything Is Rare Somewhere

One of the stronger pieces of evidence supporting the existence of the rare biosphere comes from comparing the sequences found in different microbial communities. The human gut microbiome, for example, varies greatly between subjects. In practice, this leads to a wide range of relative abundances of even the most common OTUs. Panels B and C in Figure A6-5 show the 100 most abundant OTUs across 208 Human Microbiome Project subjects in rank order (Huse et al., 2012). The maximum abundance in a single subject for each OTU is 1–100 percent (Figure A6-5, Panel B). The minimum abundance for each of these (except the first most abundant OTU) is within the rare biosphere for at least one subject. In other words, essentially all of the most abundant gut OTUs are highly abundant in some subjects and rare or not detected in others. Panel C of Figure A6-5 uses absolute rather than relative abundance and frequencies to portray the same data. Looking only at a single subject, we might be tempted to discount rare OTUs as noise in the data rather than a true rare biosphere signal. In the greater context of many samples, however, we realize that true rare members are prevalent across subjects, and these same members can dominate the microbiome in other subjects. This same pattern can be seen in other environments including the waters of the English Channel (Gilbert et al., 2009) (Figure A6-5, Panel D).


The use of NGS methods has revolutionized microbial ecology. But, as with any new technology, new challenges must be met. For accurate results, great care must be taken to reduce the rates of sequencing errors and to remove DNA amplification chimeras, using high-quality de-noising or paired-end overlap filtering, and chimera detection. These initial steps, however, are not enough. Researchers must also select bioinformatics tools that avoid artificially inflating the number of OTU clusters, alpha diversity estimates, and beta diversity estimates. OTU clustering methods such as SLP-PWAL and UClust reduce inflation, whereas methods employing multiple sequence alignments and complete linkage clustering overestimate the appropriate number of OTUs.

The selection of diversity metrics affects the research results. Simple richness estimates are sample size dependent. Because the most common estimators are known to be affected by undersampling, larger sample sizes (in the absence of depth-dependent OTU inflation) may provide the most accurate richness estimates. One means to enhance the interpretation of richness estimates is to plot the richness at subsample depths for a given sample to see whether the sample depth is sufficient for the estimate to be stable or whether the sample depth is still within the zone of distinct depth-dependence. Fortuitously, both Simpson’s and Shannon’s diversity estimates show independence of sample depth.

Beta diversity should only be calculated with an appropriately robust metric that can accommodate sample depth. Even in cases where multiple samples are subsampled to the same depth before calculating intercommunity distance, a depth-dependent metric such as Bray-Curtis will still be affected by depth and, where shifts in community structure are subtle, can skew PCoA plots, resulting in possible misinterpretation of results. The practice of subsampling to the minimum can introduce artifacts of undersampling as demonstrated in Figure A6-4. A much more robust method is to select a beta diversity algorithm such as Morisita-Horn or Yue-Clayton that does not require subsampling. For all alpha and beta diversity calculations there are thresholds of undersampling that no metric selection can overcome. Therefore, in the absence of other depth-dependent overestimates (such as poor selection of clustering method), it is best to use full sample sizes rather than subselecting to a minimal and therefore less representative sample size.

Even in the best of all research worlds, errors, OTU inflation, chimeras, contamination, and other inaccuracies will still exist. In this light, multiple samples are needed to determine when low abundance “errare” OTUs or taxa are errors and when they are true rare members. One straightforward criterion for trusting an OTU's validity is whether it occurs abundantly in any sample. By clustering OTUs or using taxonomy and performing bioinformatics analyses across multiple samples at once, it is easy to detect abundant members in the set of samples, validating those members in communities where they are rare. Given our current techniques, context is the best method for discerning truth from fiction in the rare biosphere.
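The cross-sample criterion described above can be sketched in Python as follows. The table layout (sample name mapped to OTU read counts), the function names, and the 1 percent abundance threshold are all assumptions chosen for illustration; the text does not prescribe a specific threshold.

```python
def abundant_somewhere(otu_table, threshold=0.01):
    """Return OTUs whose relative abundance reaches threshold in any sample.

    otu_table: {sample_name: {otu_name: read_count}} (hypothetical layout).
    """
    validated = set()
    for counts in otu_table.values():
        total = sum(counts.values())
        if total == 0:
            continue
        for otu, count in counts.items():
            if count / total >= threshold:
                validated.add(otu)
    return validated

def classify_rare_otus(otu_table, sample, threshold=0.01):
    """Label each rare OTU in one sample by its support in other samples."""
    validated = abundant_somewhere(otu_table, threshold)
    counts = otu_table[sample]
    total = sum(counts.values())
    labels = {}
    for otu, count in counts.items():
        if count == 0 or count / total >= threshold:
            continue  # absent or already abundant in this sample
        labels[otu] = "supported" if otu in validated else "unresolved"
    return labels

table = {
    "subject_1": {"otu_1": 995, "otu_2": 4, "otu_3": 1},
    "subject_2": {"otu_1": 50, "otu_2": 950, "otu_3": 0},
}
labels = classify_rare_otus(table, "subject_1")
print(labels)
```

Here `otu_2` is rare in the first subject but dominant in the second, so it is labeled as supported, whereas `otu_3` is never abundant and remains unresolved. An unresolved label does not mean the OTU is an error, only that this data set offers no independent support for it.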


  • Acinas SG, Klepac-Ceraj V, Hunt DE, Pharino C, Ceraj I, Distel DL, Polz MF. Fine-scale phylogenetic architecture of a complex bacterial community. Nature. 2004;430:551–554. [PubMed: 15282603]
  • Bartram AK, Lynch MDJ, Stearns JC, Moreno-Hagelsieb G, Neufeld JD. Generation of multimillion-sequence 16s rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads. Applied and Environmental Microbiology. 2011;77(11):3846–3852. [PMC free article: PMC3127616] [PubMed: 21460107]
  • Bunge J. Estimating the number of species with CatchAll. Proceedings of the 2011 Pacific Symposium on Biocomputing. 2011. [PubMed: 21121040]
  • Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. QIIME allows analysis of high-throughput community sequencing data. Nature Methods. 2010;7(5):335–336. [PMC free article: PMC3156573] [PubMed: 20383131]
  • Chao A. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics. 1984;11:265–270.
  • Chao A, Lee S-M. Estimating the number of classes via sample coverage. Journal of the American Statistical Association. 1992;87(417):210–217.
  • DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology. 2006;72(7):5069–5072. [PMC free article: PMC1489311] [PubMed: 16820507]
  • Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–2461. [PubMed: 20709691]
  • Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. Uchime improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27(16):2194–2200. [PMC free article: PMC3150044] [PubMed: 21700674]
  • Franks AH, Harmsen HJM, Raangs GC, Jansen GJ, Schut F, Welling GW. Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16s rRNA-targeted oligonucleotide probes. Applied and Environmental Microbiology. 1998;64(9):3336–3345. [PMC free article: PMC106730] [PubMed: 9726880]
  • Gilbert JA, Field D, Swift P, Newbold L, Oliver A, Smyth T, Somerfield PJ, Huse S, Joint I. The seasonal structure of microbial communities in the western English Channel. Environmental Microbiology. 2009;11(12):3132–3139. [PubMed: 19659500]
  • Gloor GB, Hummelen R, Macklaim JM, Dickson RJ, Fernandes AD, MacPhee R, Reid G. Microbiome profiling by illumina sequencing of combinatorial sequence-tagged PCR products. PLoS ONE. 2010;5(10):e15406. [PMC free article: PMC2964327] [PubMed: 21048977]
  • Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, Ciulla D, Tabbaa D, Highlander SK, Sodergren E, Methé B, DeSantis TZ, The Human Microbiome Consortium, Petrosino JF, Knight R, Birren BW. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research. 2011;21(3):494–504. [PMC free article: PMC3044863] [PubMed: 21212162]
  • Huber JA, Morrison HG, Huse SM, Neal PR, Sogin ML, Mark Welch DB. Effect of PCR amplicon size on assessments of clone library microbial diversity and community structure. Environmental Microbiology. 2009;11(5):1292–1302. [PMC free article: PMC2716130] [PubMed: 19220394]
  • Huse S, Huber J, Morrison H, Sogin M, Mark Welch D. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology. 2007;8(7):R143. [PMC free article: PMC2323236] [PubMed: 17659080]
  • Huse SM, Mark Welch D, Morrison HG, Sogin ML. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology. 2010;12(7):1889–1898. [PMC free article: PMC2909393] [PubMed: 20236171]
  • Huse SM, Ye Y, Zhou Y, Fodor AA. A core human microbiome as viewed through 16S rRNA sequence clusters. PLoS ONE. 2012;7(6):e34242. [PMC free article: PMC3374614] [PubMed: 22719824]
  • Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. Wrinkles in the rare biosphere: Pyrosequencing errors lead to artificial inflation of diversity estimates. Environmental Microbiology. 2010;12(1):118–123. [PubMed: 19725865]
  • Lahr DJG, Katz LA. Reducing the impact of PCR-mediated recombination in molecular evolution and environmental studies using a new-generation high-fidelity DNA polymerase. BioTechniques. 2009;47:857–866. [PubMed: 19852769]
  • Meacham F, Boffelli D, Dhahbi J, Martin D, Singer M, Pachter L. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics. 2011;12(1):451. [PMC free article: PMC3295828] [PubMed: 22099972]
  • Minoche A, Dohm J, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biology. 2011;12(11):R112. [PMC free article: PMC3334598] [PubMed: 22067484]
  • Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glockner FO. Silva: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research. 2007;35(21):7188–7196. [PMC free article: PMC2175337] [PubMed: 17947321]
  • Qiu X, Wu L, Huang H, McDonel PE, Palumbo AV, Tiedje JM, Zhou J. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16s rRNA gene-based cloning. Applied and Environmental Microbiology. 2001;67:880–887. [PMC free article: PMC92662] [PubMed: 11157258]
  • Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT. Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods. 2009;6(9):639–641. [PubMed: 19668203]
  • Quince C, Lanzen A, Davenport R, Turnbaugh P. Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 2011;12(1):38. [PMC free article: PMC3045300] [PubMed: 21276213]
  • Reeder J, Knight R. The “rare biosphere”: A reality check. Nature Methods. 2009;6(9):636–637. [PubMed: 19718016]
  • Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16s rRNA-based studies. PLoS ONE. 2011;6(12):e27310. [PMC free article: PMC3237409] [PubMed: 22194782]
  • Schloss PD, Handelsman J. Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Applied and Environmental Microbiology. 2005;71(3):1501–1506. [PMC free article: PMC1065144] [PubMed: 15746353]
  • Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing mothur: Open source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology. 2009;75(23):7537–7541. [PMC free article: PMC2786419] [PubMed: 19801464]
  • Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proceedings of the National Academy of Sciences. 2006;103(32):12115–12120. [PMC free article: PMC1524930] [PubMed: 16880384]
  • Wang Q, Garrity GM, Tiedje JM, Cole JR. A naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology. 2007;73(16):5261–5267. [PMC free article: PMC1950982] [PubMed: 17586664]


,24 ,25 ,24 ,26,¤a ,24 ,24 ,27 ,26 ,24,¤b ,24,¤c ,24,28 ,24,29 ,24,¤d and 24,27,*.

24 Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA.
25 Institut Pasteur de Madagascar, Antananarivo, Madagascar.
26 Max Planck Institut für Infektionsbiologie, Berlin, Germany.
27 Institute for Genomic Sciences (IGS), School of Medicine, University of Maryland, Baltimore, Maryland, USA.
28 Translational Genomics Research Institute, Phoenix, Arizona, United States of America.
29 Environmental Research Institute, University College Cork, Cork, Ireland.
¤a Current address: Unite Mixte de Recherche 6191, Centre National de la Recherche Scientifique-Commissariat à l’Energie Atomique-Aix-Marseille Université, Commissariat à l’Energie Atomique Cadarache, Saint Paul Lez Durance, France.
¤b Current address: Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
¤c Current address: Laboratoire de Biologie Médicale du Tampon, Le Tampon, Reunion Island, France.
¤d Current address: Institut Pasteur de Nouvelle-Calédonie, Nouméa, New Caledonia.


Background: Plague was introduced to Madagascar in 1898 and continues to be a significant human health problem. It exists mainly in the central highlands, but in the 1990s it was reintroduced to the port city of Mahajanga, where it caused extensive human outbreaks. Despite its prevalence, the phylogeography and molecular epidemiology of Y. pestis in Madagascar have been difficult to study due to the great genetic similarity among isolates. We examine island-wide geographic-genetic patterns based upon whole-genome discovery of SNPs, SNP genotyping, and hypervariable variable-number tandem repeat (VNTR) loci to gain insight into the maintenance and spread of Y. pestis in Madagascar.

Methodology and principal findings: We analyzed a set of 262 Malagasy isolates using a set of 56 SNPs and a 43-locus multi-locus VNTR analysis (MLVA) system. We then analyzed the geographic distribution of the subclades and identified patterns related to the maintenance and spread of plague in Madagascar. We find relatively high levels of VNTR diversity in addition to several SNP differences. We identify two major groups, Groups I and II, which are subsequently divided into 11 and 4 subclades, respectively. Y. pestis appears to be maintained in several geographically separate subpopulations. There is also evidence for multiple long distance transfers of Y. pestis, likely human mediated. Such transfers have resulted in the reintroduction and establishment of plague in the port city of Mahajanga, where there is evidence for multiple transfers both from and to the central highlands.

Conclusions and significance: The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process that relies on the natural cycle between the primary host, the black rat, and its flea vectors as well as human activity.

Author Summary

Plague, caused by the bacterium Yersinia pestis, has been a problem in Madagascar since it was introduced in 1898. It mainly affects the central highlands, but it has also caused several large outbreaks in the port city of Mahajanga after it was reintroduced there in the 1990s. Despite its prevalence, the genetic diversity and related geographic distribution of different genetic groups of Y. pestis in Madagascar have been difficult to study due to the great genetic similarity among isolates. We subtyped a set of Malagasy isolates and identified two major genetic groups that were subsequently divided into 11 and 4 subgroups, respectively. Y. pestis appears to be maintained in several geographically separate subpopulations. There is also evidence for multiple long distance transfers of Y. pestis, likely human mediated. Such transfers have resulted in the reintroduction and establishment of plague in the port city of Mahajanga, where there is evidence for multiple transfers both from and to the central highlands. The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process that relies on the natural cycle between the primary host, the black rat, and its flea vectors as well as human activity.


Throughout recorded history, Yersinia pestis, etiologic agent of plague, has spread multiple times from foci in central Asia in greatly widening swaths as human-mediated transport became more efficient (Morelli et al., 2010). Plague attained its current global distribution during the current “third” pandemic, which began in 1855 in the Chinese province of Yünnan, when it was introduced to many previously unaffected countries via infected rats on steam ships (Perry and Fetherston, 1997). Plague caused widespread outbreaks during this introduction period (~1900 A.D.), and though disease incidence has since largely decreased, plague remains a significant human health threat due to the severe and often fatal nature of the disease, the many natural plague foci (Perry and Fetherston, 1997), and its potential as a bioterror agent (it is currently classified as a Class A Select Agent [Rotz et al., 2002]). Plague is of particular significance in Madagascar, which has reported some of the highest human plague case numbers (18%–60% of the world total each year between 1995 and 2009) (WHO, 2010) and was the origin of a natural multi-drug resistant strain of Y. pestis (Galimand et al., 1997; Welch et al., 2007).

Plague has been a problem in Madagascar since its introduction during the current pandemic. It was first introduced to Toamasina in 1898 (Brygoo, 1966), likely via India (Morelli et al., 2010), with outbreaks in other coastal cities soon after. In 1921, plague reached the capital, Antananarivo, likely via infected rats transported on the railroad linking Toamasina and Antananarivo. Subsequent rat epizootics signaled the establishment of plague in the central highlands (Brygoo, 1966). Plague then disappeared from the coast and now exists within two large areas in the central and northern highlands above 800 m in elevation (Chanteau et al., 1998). This elevational distribution of plague is linked to the presence of the flea vectors Xenopsylla cheopis and Synopsyllus fonquerniei, which are less abundant and absent, respectively, below 800 m (Duplantier, 2001; Duplantier et al., 1999). Plague has never disappeared from this region, and although it was relatively controlled in the 1950s due to public hygiene improvements and the introduction of antibiotics and insecticides, disease incidence began increasing in 1989 (Chanteau et al., 1998, 2000; Migliani et al., 2006). Human plague cases peaked in 1997 but continue to occur at high frequencies, making Madagascar among the top three countries for human plague cases during the past 15 years (WHO, 2010).

A third, newly emerged plague focus outside the central and northern highlands is the port city of Mahajanga, located ~400 km by air from Antananarivo (Chanteau et al., 1998). Plague first appeared in Mahajanga during an outbreak in 1902. Subsequent outbreaks occurred in 1907 and between 1924 and 1928 (Brygoo, 1966). Plague then disappeared from Mahajanga for a period of 62 years before reappearing during a large outbreak in 1991 (Laventure et al., 1991). Subsequent outbreaks occurred from 1995–1999 (Boisier et al., 1997, 2000; Rasolomaharo et al., 1995). During this time, the Mahajanga focus was responsible for ~30% of the reported human plague cases in Madagascar (Boisier et al., 2002). Interestingly, this focus likely represents one of the only examples of plague being reintroduced to an area where it had gone extinct, rather than emergence from a silently cycling rodent reservoir without telltale human cases (Duplantier et al., 2005).

Molecular subtyping of Y. pestis for epidemiological tracking has been difficult due to a lack of genetic diversity (Achtman et al., 1999). SNP genotyping (Achtman et al., 2004; Eppinger et al., 2010; Morelli et al., 2010), ribotyping (Guiyoule et al., 1994), IS100 insertion element restriction fragment length polymorphism (RFLP) analysis (Achtman et al., 1999), PCR-based IS100 genotyping (Achtman et al., 2004; Motin et al., 2002) and pulsed-field gel electrophoresis (PFGE) (Lucier and Brubaker, 1992) have been used to differentiate global isolate collections; however, SNP genotyping provides the most robust phylogenetic reconstructions. SNP genotyping (Morelli et al., 2010), ribotyping (Guiyoule et al., 1997), IS100 insertion element RFLP analysis (Huang et al., 2002), different region (DFR) analysis (Li et al., 2008), clustered regularly interspaced short palindromic repeats (CRISPR) analysis (Cui et al., 2008), ERIC-PCR (Kingston et al., 2009), ERIC-BOX-PCR (Kingston et al., 2009), and PFGE (Huang et al., 2002; Zhang et al., 2009) have shown limited to moderate ability in differentiating isolates on a regional scale. Of these, ribotyping has been applied to a set of 187 Malagasy isolates, but only revealed four ribotypes, three of which were unique to Madagascar (Guiyoule et al., 1997). SNP genotyping of 82 Malagasy isolates provided greater and more phylogenetically informative resolution, revealing two major groups and an additional 10 subgroups derived from these two major groups that were mostly isolate-specific (Morelli et al., 2010). In contrast to these other molecular subtyping methods, multi-locus variable-number tandem repeat (VNTR) analysis (MLVA) has shown high discriminatory power at global (Achtman et al., 2004; Klevytska et al., 2001; Pourcel et al., 2004), regional (Girard et al., 2004; Klevytska et al., 2001; Li et al., 2009; Lowell et al., 2007; Zhang et al., 2009), and local scales (Girard et al., 2004), indicating its likely usefulness for further differentiation among Y. pestis isolates from Madagascar.

The use of SNPs and MLVA together, in a hierarchical approach, has been successfully applied to clonal, recently emerged pathogens (Keim et al., 2004; Van Ert et al., 2007; Vogler et al., 2009). Point mutations that result in SNPs occur at very low rates, making SNPs relatively rare in the genome, but discoverable through intensive sampling (i.e., whole genome sequencing). In addition, since each SNP likely occurred only once in the evolutionary history of an organism, SNPs represent highly stable phylogenetic markers that can be used for identifying key phylogenetic positions (Keim et al., 2004). However, SNPs discovered from a limited number of whole genome sequences will have limited resolving power (Keim et al., 2004) since they will only be able to identify phylogenetic groups along the evolutionary path(s) linking the sequenced genomes (Pearson et al., 2004). In contrast, VNTRs possess very high mutation rates and multiple allele states, allowing them to provide a high level of resolution among isolates. Unfortunately, these high mutation rates can lead to mutational saturation and homoplasy, which can obscure deeper phylogenetic relationships, leading to inaccurate phylogenies. Using these two marker types together, in a nested hierarchical approach, with SNPs used to identify major genetic groups followed by VNTRs to provide resolution within those groups, allows for both a deeply rooted phylogenetic hypothesis and high resolution discrimination among closely related isolates (Keim et al., 2004).
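As a rough illustration of this nested approach, the following Python sketch first partitions isolates by a SNP-defined group and only then separates them by MLVA profile. The isolate names, group labels, and repeat counts are invented for illustration and are not data from this study.

```python
from collections import defaultdict


def hierarchical_groups(isolates):
    """Nest isolates by SNP-defined group, then by MLVA profile within each group."""
    tree = defaultdict(lambda: defaultdict(list))
    for name, snp_group, mlva_profile in isolates:
        # The stable marker (SNP group) is the outer key; the fast-evolving
        # marker (VNTR repeat counts) only resolves isolates inside that group.
        tree[snp_group][tuple(mlva_profile)].append(name)
    return {group: dict(profiles) for group, profiles in tree.items()}


# Hypothetical isolates: (name, SNP-defined group, VNTR repeat counts)
isolates = [
    ("iso1", "I", [5, 7, 3]),
    ("iso2", "I", [5, 8, 3]),
    ("iso3", "II", [5, 7, 3]),  # same VNTR profile as iso1, different SNP group
    ("iso4", "II", [6, 7, 2]),
    ("iso5", "I", [5, 7, 3]),
]

groups = hierarchical_groups(isolates)
for group, profiles in sorted(groups.items()):
    for profile, names in sorted(profiles.items()):
        print(group, profile, names)
```

In this toy example, iso1 and iso3 share an identical VNTR profile but remain in separate SNP-defined groups, which is how the hierarchical approach guards against homoplasy at VNTR loci.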

We investigated the phylogeography and molecular epidemiology of Y. pestis in Madagascar through extensive genotyping and mapping of genetic groups. We genotyped 262 Malagasy isolates collected from 25 districts between 1939 and 2005 using 56 SNPs and a 43-marker MLVA system to identify island-specific subclades. We then spatially mapped the subclades to examine island-wide geographic-genetic patterns and potential transmission routes.


Ethics Statement

The DNAs analyzed in this study (Table S1) were extracted from Y. pestis cultures that were previously isolated by the Malagasy Central Laboratory for plague and Institut Pasteur de Madagascar as part of Madagascar’s national plague surveillance plan. The Malagasy Ministry of Health, as part of this national plague surveillance plan, requires declaration of all suspected human plague cases and collection of biological samples from those cases. These biological samples are analyzed by the Malagasy Central Laboratory for plague and Institut Pasteur de Madagascar, which also maintains any cultures derived from these samples. These cultures are all de-linked from the patients from whom they originated and analyzed anonymously if used in any research study. Thus, for purposes of this study, all of the DNAs derived from Y. pestis cultures from human patients were analyzed anonymously. No Malagasy review board existed during the collection period of the cultures (1939–2001) from which the DNAs used in this study were derived. In addition, the Institutional Review Board of Northern Arizona University, where the DNA genotyping was done, did not require review of the research due to the anonymous nature of the samples.


DNA was obtained from 262 isolates from 25 different districts from 1939–2005 (Figure S1, Table S1). DNAs consisted of simple heat lysis preparations or whole genome amplification (WGA) (QIAGEN, Valencia, CA) products generated from the heat lysis preparations. Most of the isolates were collected by the Malagasy Central Laboratory for plague, supervised by the Institut Pasteur de Madagascar, and were primarily isolated from human cases, with a few isolated from other mammals or fleas. A handful of other isolates were obtained from other institutions (though still originally collected by the Malagasy Central Laboratory for plague) or represent publicly available whole genome sequences (Table S1).

SNP Genotyping

A total of 56 SNPs were chosen to genotype the Malagasy isolates because they either marked the branches leading to or from the Madagascar clades in a worldwide analysis (Morelli et al., 2010) or were polymorphic among Malagasy isolates (Table S2). These SNPs were either previously identified in a worldwide SNP study on Y. pestis using a combination of denaturing high performance liquid chromatography (dHPLC) and whole genome sequence comparisons (Morelli et al., 2010) or identified here through whole genome sequence comparisons among 2 Malagasy whole genome sequences (MG05-1020 [GenBank:AAYS00000000] and IP275 [GenBank:AAOS00000000] [Morelli et al., 2010]) and 14 other Y. pestis strain sequences (CO92 [GenBank:AL590842] (Parkhill et al., 2001), FV-1 [GenBank:AAUB00000000] (Touchman et al., 2007), CA88-4125 [GenBank:ABCD00000000] (Auerbach et al., 2007), Antiqua [GenBank:CP000308], Nepal 516 [GenBank:CP000305] (Chain et al., 2006), UG05-0454 [GenBank:AAYR00000000] (Morelli et al., 2010), KIM 10 [GenBank:AE009952] (Deng et al., 2002), F1991016 [GenBank:ABAT00000000], E1979001 [GenBank:AAYV00000000], K1973002 [GenBank:AAYT00000000], B42003004 [GenBank:AAYU00000000] (Eppinger et al., 2009), Pestoides F [GenBank:CP000668] (Garcia et al., 2007), Angola [GenBank:CP000901] (Eppinger et al., 2010) and 91001 [GenBank:AE017042] [Song et al., 2004]). These whole genome sequence comparisons involved comparing the predicted gene sequences of the closed genome of Y. pestis strain CO92 (Parkhill et al., 2001) to the completed and draft genomes of all other strains using MUMmer and in-house Perl scripts (Delcher et al., 2002). For genomes with deposited underlying Sanger sequencing read information, a polymorphic site was considered of high quality when its underlying sequence in the query comprised at least three sequencing reads with an average Phred quality score >30 (Eppinger et al., 2010; Ewing et al., 1998).
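The read-support criterion can be expressed as a simple filter. The sketch below is a minimal illustration, assuming that the per-read Phred scores at the candidate site are already available as a list; the function names and the choice to average across reads are assumptions, not a description of the in-house Perl scripts. A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q = 30 corresponds to an error probability of 0.001.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def is_high_quality_site(read_phred_scores, min_reads=3, min_mean_phred=30):
    """Apply the stated rule: at least three reads with an average Phred score > 30.

    read_phred_scores: Phred scores of the reads covering the site (assumed input).
    """
    if len(read_phred_scores) < min_reads:
        return False
    mean_phred = sum(read_phred_scores) / len(read_phred_scores)
    return mean_phred > min_mean_phred


# Hypothetical candidate sites with per-read Phred scores
print(is_high_quality_site([35, 40, 32]))  # enough reads, mean above 30
print(is_high_quality_site([40, 40]))      # too few reads
print(is_high_quality_site([30, 30, 30]))  # mean is exactly 30, not above it
```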

A TaqMan-minor groove binding (MGB) assay or a melt mismatch amplification mutation assay (Melt-MAMA) was developed for each SNP for use in genotyping the Malagasy DNAs. A TaqMan-MGB assay was designed around one SNP known to divide Malagasy isolates into two major groups (Mad-43, Table S2). Melt-MAMA assays were designed around the other 55 SNPs as previously described (Vogler et al., 2009). SNP locations, primer sequences, primer concentrations and other information for these assays are presented in Table S2. Primers and probes were designed using Primer Express 3.0 software (Applied Biosystems, Foster City, CA). Each 5 μl TaqMan-MGB reaction contained primers and probes (for concentrations see Table S2), 1× Platinum Quantitative PCR SuperMix-UDG with ROX (Invitrogen, Carlsbad, CA), water and 1 μl of template. Each 5 μl Melt-MAMA reaction contained 1× SYBR Green PCR Master Mix (Applied Biosystems) or 1× EXPRESS SYBR GreenER qPCR Supermix with Premixed ROX (Invitrogen) (for assay-specific master mix see Table S2), derived and ancestral allele-specific MAMA primers, a common reverse primer (for primer concentrations see Table S2), water and 1 μl of diluted DNA template. DNA templates were diluted 1/10 for heat lysis preparations or 1/50 for WGA products. All assays were performed on an Applied Biosystems 7900HT Fast Real-Time PCR System with SDS software v2.3. Thermal cycling conditions for the TaqMan-MGB assay were as follows: 50°C for 2 min, 95°C for 2 min and 50 cycles of 95°C for 15 s and 66°C for 1 min. Thermal cycling conditions for the Melt-MAMA assays were as follows: 50°C for 2 min, 95°C for 10 min and 40 cycles of 95°C for 15 s and 55–65°C for 1 min (see Table S2 for assay-specific annealing temperatures). Melt-MAMA results were interpreted as previously described (Vogler et al., 2009).


All 262 Malagasy isolates were also genotyped using a 43-marker MLVA system as previously described (Girard et al., 2004).

Node Assignment

In general, missing SNP data (<0.5% of dataset) were not a factor in node assignment (see SNP phylogenetic analysis below) since data were usually available for an equivalent SNP, thus leading to unambiguous node assignments for most isolates. However, there were four cases where the node assignment was potentially ambiguous. For three isolates missing data for SNP Mad-21 (branch 1.ORI3.k-1.ORI3.o, Table S2), the ancestral allele state was assumed for that SNP for those isolates, since in this and in a previous worldwide analysis (Morelli et al., 2010), only a single isolate, not included among these three, belonged to node “o.” For a single isolate missing data for SNP Mad-46 (branch 1.ORI3.d-1.ORI3.h1, Table S2), the derived state was assumed, due to the placement of that isolate in MLVA subclade II.B in a neighbor-joining analysis and the observed congruence between the “h” nodes and MLVA subclade II.B (see phylogenetic analyses below, Table S1).
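The logic of falling back on an equivalent SNP can be sketched as follows. This is a hypothetical illustration only: the SNP names, the dictionary layout, and the function are assumptions, and the assignments in this study were made as described above.

```python
def branch_state(calls, branch_snps):
    """Infer the allele state of a branch from any successfully called SNP on it.

    calls: dict mapping SNP name -> "ancestral", "derived", or None (missing data).
    branch_snps: names of equivalent SNPs that mark the same branch.
    Returns the shared state, or None if every SNP on the branch is missing.
    """
    observed = {calls[s] for s in branch_snps if calls.get(s) is not None}
    if not observed:
        return None  # ambiguous: no data for this branch, needs other evidence
    if len(observed) > 1:
        raise ValueError("equivalent SNPs disagree; genotype calls need review")
    return observed.pop()


# Hypothetical isolate: one SNP failed, but an equivalent SNP was called
calls = {"snpA": None, "snpB": "derived"}
print(branch_state(calls, ["snpA", "snpB"]))  # state recovered from snpB
print(branch_state(calls, ["snpA"]))          # no data for this branch
```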

Phylogenetic Analyses

A hierarchical approach was applied to the phylogenetic analysis of the Malagasy isolates. First, a SNP phylogeny was generated using data from all 56 SNPs (Figure A7-1). Second, neighbor-joining dendrograms based upon MLVA data were constructed using MEGA 3.1 (Kumar et al., 2001) for the two main groups in the SNP phylogeny, Groups I and II (Figure A7-2A–B). These groups corresponded to the two major Malagasy groups in a previous worldwide analysis (Morelli et al., 2010) and so were separated prior to analyzing with MLVA. The remaining SNPs showing variation among the Malagasy isolates mostly defined subclades observed in the MLVA phylogenies or were specific to single isolates, and so were not used to further separate the isolates prior to applying MLVA. The locations of these additional SNPs are marked on the two MLVA phylogenies where applicable (Figure A7-2A–B). A small set of SNPs provided very fine-scale resolution of the lineage leading to the whole genome sequenced MG05-1020 strain and are not marked on the MLVA phylogeny due to disagreement between the SNP and MLVA phylogenies on this small scale. Distance matrices for the two MLVA phylogenies were based upon mean character differences. Bootstrap values were based upon 1,000 simulations and were generated using PAUP 4.0b10 (D. Swofford, Sinauer Associates, Inc., Sunderland, MA). Branches with ≥50% bootstrap support and/or supported by one or more SNPs were identified as subclades. One other cluster (II.A) was also considered a subclade despite a lack of bootstrap support because of the proximity of a SNP-defined subclade (Figure A7-2B).
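For reference, a mean character difference between two MLVA profiles is commonly computed as the proportion of compared loci at which the alleles differ. The sketch below assumes that definition, treats each VNTR locus as one unordered character, and skips loci with missing data (None). The profiles are invented, and this is not the PAUP implementation.

```python
def mean_character_difference(profile_a, profile_b):
    """Proportion of loci with differing alleles, ignoring loci missing in either profile."""
    compared = [(a, b) for a, b in zip(profile_a, profile_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no loci in common to compare")
    return sum(a != b for a, b in compared) / len(compared)


def distance_matrix(profiles):
    """Build a symmetric pairwise distance matrix from a list of profiles."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = mean_character_difference(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical 4-locus MLVA profiles (repeat counts); None marks missing data
profiles = [
    [5, 7, 3, 2],
    [5, 8, 3, 2],
    [6, 8, None, 4],
]
matrix = distance_matrix(profiles)
for row in matrix:
    print(row)
```

A matrix of this kind is the input that a neighbor-joining algorithm would then use to build the dendrogram.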

Figure A7-1. SNP phylogeny of 262 Malagasy isolates. Nodes were named as in Morelli et al. (2010) (lower case letters) and belong to the 1.ORI3 group described there (Morelli et al., 2010). Previously identified nodes (Morelli et al., 2010) that were expanded in this ...

Figure A7-2. Neighbor-joining dendrograms based upon MLVA data. Dendrograms for Group I (A) and Group II (B) are indicated. The SNP phylogeny from Figure A7-1 is also indicated (C) for comparison. Subclades within Groups I and II are collapsed in the full phylogenies ...

Geographic Distribution of Subclades

We mapped the geographic distributions of the Group I and II subclades we identified to determine their phylogeographic patterns (Figure A7-3).

Figure A7-3. Geographic distribution of MLVA subclades in Madagascar. The MLVA phylogenies for Groups I and II from Figure A7-2A–B are presented with labeled subclades. Light gray shaded districts indicate Madagascar districts where Y. pestis isolates used ...

Statistical Analyses

Analysis of similarity (ANOSIM) (Clarke, 1993) tests were performed using PRIMER software version 5 to test the hypotheses that 1) Groups I and II form distinct geographic groups and 2) the identified subclades form distinct geographic groups. These tests were performed on all subclades with ≥5 members (N = 221 isolates), thus excluding the unaffiliated isolates and subclades I.C, I.H, I.I and I.G (Table S1). The results of all 55 pairwise comparisons among the subgroups were evaluated at α = 0.000909 (global α of 0.05 divided by 55). To determine if there was a rank relationship between genetic distance and geographic distance, a Spearman correlation coefficient was generated using the RELATE function in PRIMER software with significance of the resulting statistics determined using 10,000 random permutations of the data. This analysis utilized all isolates with any geographic data (N = 256), with district centroids used as the geographic location for isolates for which only district level geographic information was available (N = 33); city/commune point geographic data were used for the remaining 223 isolates. Six isolates lacking any geographic information were excluded from both statistical analyses (Table S1).
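Two of the numerical steps in this section can be sketched in a few lines of Python. The first part reproduces the corrected significance threshold from the number of subclades tested (15 subclades minus the four excluded leaves 11). The second part is a generic matrix rank-correlation permutation test in the spirit of the RELATE analysis: it is not PRIMER's implementation, and the function names, toy distance matrices, and seed are all assumptions made for illustration.

```python
import random
from itertools import combinations
from math import comb


def bonferroni_alpha(n_groups, global_alpha=0.05):
    """Number of pairwise comparisons among groups and the per-test alpha."""
    n_tests = comb(n_groups, 2)
    return n_tests, global_alpha / n_tests


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def matrix_spearman(d1, d2, order=None):
    """Spearman's rho between matching off-diagonal entries of two distance matrices."""
    n = len(d1)
    order = list(range(n)) if order is None else order
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    y = [d2[order[i]][order[j]] for i, j in pairs]
    return pearson(average_ranks(x), average_ranks(y))


def permutation_test(d1, d2, n_perm=999, seed=0):
    """One-sided p-value from shuffling the sample labels of the second matrix."""
    rng = random.Random(seed)
    observed = matrix_spearman(d1, d2)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(len(d1)))
        rng.shuffle(perm)
        if matrix_spearman(d1, d2, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


n_tests, alpha = bonferroni_alpha(11)
print(n_tests, round(alpha, 6))  # 55 0.000909

# Toy data: four samples on a line; "genetic" distance is a monotone
# transform of "geographic" distance, so the two matrices are rank-identical.
positions = [0, 1, 2, 10]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[abs(a - b) ** 2 for b in positions] for a in positions]
rho, p = permutation_test(geo, gen)
print(rho, p)
```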


Genetic Diversity of Y. pestis in Madagascar

Our hypervariable-locus and genome-based approaches identified a relatively high level of genetic diversity among the 262 Malagasy isolates from 25 districts from 1939–2005. We confirmed the presence of two major genetic groups, Groups I and II, differentiated by a single SNP, Mad-43 (Figure A7-1, Table S2), and many VNTR mutational steps. Groups I and II were further differentiated into eleven (I.A–I.K, Figure A7-2A, Table S1) and four (II.A–II.D, Figure A7-2B, Table S1) subclades, respectively, based upon MLVA and/or SNPs. All but one of these subclades was at least weakly supported by bootstrap values ≥50% and/or one or more SNPs (Figure A7-2A–B). The high mutation rates at VNTR loci can lead to homoplasy and, consequently, to low bootstrap support for deeper phylogenetic relationships when analyzing isolates from regional or worldwide collections (Achtman et al., 2004; Johansson et al., 2004; Keim et al., 2004; Lowell et al., 2007). Nevertheless, subsequent analyses using more phylogenetically stable molecular markers (i.e., SNPs) have confirmed MLVA-determined clades with weak or even no bootstrap support (Achtman et al., 2004; Vogler et al., 2009), leading us to use even weak bootstrap support to validate subclades in this analysis. Of the two MLVA-identified subclades without bootstrap support, II.A and II.B, subclade II.B was supported by SNP Mad-46 (Table S2) and subclade II.A was designated due to its proximity to and clear separation from the SNP-identified subclade II.B (Figure A7-2B). Subclades I.B, I.F, and I.H were supported by SNPs Mad-26 to 31, Mad-42, and Mad-09 to 17 (Table S2), respectively, and bootstrap analysis (Figure A7-2A). MLVA also identified 23 and 5 isolates in Groups I and II, respectively, that did not belong to any of the identified subclades within those groups (hereafter referred to as unaffiliated isolates) (Figure A7-2A–B, I.NONE and II.NONE isolates in Table S1). Four of these unaffiliated isolates and isolates in subclades I.B, I.H and II.B were also identified by apparently isolate-specific SNPs (Figure A7-2A–B). Overall, MLVA identified 226 genotypes among the 262 isolates, constituting far better resolution than that achieved using ribotyping (Guiyoule et al., 1997).

The SNP and MLVA analyses showed remarkable congruence. Nearly all of the nodes in the SNP phylogeny either corresponded to MLVA subclades or were specific to individual isolates, allowing the combined analysis of SNP and MLVA data discussed above. Three nodes (f, m and n, Figure A7-1) did not have representatives in this study, but appeared to be specific for individual isolates in a previous analysis (Morelli et al., 2010). The only exception to this congruence was within the lineage leading to the whole genome sequenced strain, MG05-1020 (q nodes in Figure A7-1 and subclade I.B in Figure A7-2A). In this case, the SNP phylogeny (q nodes, Figure A7-1) was more accurate than and provided nearly as much resolution as the corresponding MLVA phylogeny (I.B, Figure A7-2A). This fine-scale phylogenetic resolution was due to the use of a high resolution SNP discovery method, whole genome sequence comparisons, to discover SNPs along this lineage as opposed to the lower resolution dHPLC method used to discover most of the other Malagasy SNPs (Morelli et al., 2010). Interestingly, comparable resolution was not seen in the lineage leading to the other whole genome sequenced strain, IP275 (l nodes in Figure A7-1 and subclade I.H in Figure A7-2A), likely due to the very low number of isolates (N = 2) within that lineage in this analysis.

Missing data for two SNP assays suggested a potential genomic rearrangement (e.g., deletion) in some of the Malagasy strains. Twenty-five of the 262 isolates were missing data for two SNP assays despite repeated attempts at amplification (Table S1). The two SNPs, Mad-28 and Mad-41, were located <850 bp apart at CO92 positions 2,208,345 and 2,207,531, respectively (Table S2), suggesting that there may have been a genomic rearrangement affecting this region in these strains. Intriguingly, IS100 elements were located flanking these SNPs at CO92 positions 2,135,459–2,137,412 and 2,236,265–2,238,215. IS elements are important facilitators of genomic rearrangements in Y. pestis (Auerbach et al., 2007; Chain et al., 2006) and may have played a role in this result. If so, the same or a similar genomic rearrangement must have occurred multiple times since the 25 isolates were members of six different nodes in the SNP phylogeny (Table S1). This hypothesis is supported by the fact that IS100 elements are known potential hotspots for genomic rearrangements and excisions in Y. pestis (Achtman, 2004; Auerbach et al., 2007).
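The coordinates quoted above can be checked directly. The short sketch below uses only the CO92 positions given in the text; the variable names are arbitrary.

```python
# CO92 coordinates as quoted in the text
snp_positions = {"Mad-28": 2_208_345, "Mad-41": 2_207_531}
is100_elements = [(2_135_459, 2_137_412), (2_236_265, 2_238_215)]

# Distance between the two SNPs that failed to amplify
spacing = abs(snp_positions["Mad-28"] - snp_positions["Mad-41"])

# Both SNPs lie between the end of the first IS100 copy and the start of the second
left_end = is100_elements[0][1]
right_start = is100_elements[1][0]
flanked = all(left_end < pos < right_start for pos in snp_positions.values())

print(spacing, flanked)  # 814 True
```

The 814 bp spacing is consistent with the stated <850 bp, and both SNPs fall within the interval bounded by the two IS100 elements.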

Geographic Distribution of Isolates

Significant geographic separation was observed among the identified subclades. Overall, there was a small but highly significant relationship between genetic and geographic distance (Spearman correlation coefficient ρ = 0.226, p<0.0001). In addition, the two main genetic groups, Groups I and II, formed distinct geographic groups based upon an ANOSIM (R = 0.091, p = 0.0007). Group II isolates, which possessed the derived state for SNP Mad-43 (Table S2), were essentially restricted to three of the most active plague districts in the central highlands, Betafo, Manandriana and Ambositra (Chanteau et al., 2000), and an adjacent district, Ambatofinandrahana (Figure A7-3, S1). The only exceptions to this were the five unaffiliated Group II isolates, which were scattered in districts to the east and north (+ symbols, Figure A7-3). In contrast, Group I isolates were found in all three foci: the central highlands, the northern highlands, and Mahajanga. Geographic separation among the individual Group I and II subclades was also apparent (Figure A7-3) and statistically supported in an ANOSIM (R = 0.232, p<0.0001). Post-hoc analyses of the pairwise comparisons among subclades indicated that most of the eleven tested subclades formed distinct geographic groups (data not shown). Indeed, several interesting geographic patterns were apparent for the different subclades, only some of which are described below. Separate Group I subclades were found in the northern (I.C, I.G, and I.I, Figure A7-3, Table S1) versus the central (I.A, I.B, I.D, I.E, I.F, I.H, I.J, and I.K, Figure A7-3, Table S1) highlands. Subclade I.A, the largest single subclade, was the dominant subclade found in the capital, Antananarivo, and the surrounding area (Figure A7-3, S1). With the exception of two isolates, it was also the only subclade found in Mahajanga (Figure A7-3, S1, Table S1), indicating a central highlands origin for the Y. pestis responsible for the series of Mahajanga plague outbreaks from 1991–1999 (Boisier et al., 1997, 2002; Laventure et al., 1991; Rasolomaharo et al., 1995). Subclade I.B was the only subclade found in the northeastern portion of the central highlands (Figure A7-3). Geographic analysis of the corresponding SNP phylogeny (q nodes, Figure A7-1) for this subclade revealed some additional geographic-genetic patterns. Isolates with the same SNP genotype tended to be clustered geographically, although no distinct spreading pattern could be discerned, possibly due to the limited number of isolates (Figure A7-4). Subclade I.E was predominantly found in the southern central highlands, in district Fianarantsoa, and also appears to be the subclade responsible for the reemergence of plague in the Ikongo district (Migliani et al., 2001), adjacent to Fianarantsoa on the southeast (Figure A7-3, S1).

Figure A7-4. Geographic distribution of SNP-defined nodes in the strain MG05-1020 lineage. The strain MG05-1020 lineage portion of the SNP phylogeny from Figure A7-1 is indicated as well as an enlarged cutout of the map from Figure A7-3 showing the geographic distribution ...

Three subclades, I.F, I.H and I.K, did not show distinct geographic patterns (Figure A7-3). In the cases of subclades I.F and I.H, this may be due to the limited numbers of isolates within those subclades (Figure A7-2A, Table S1). The geographically widespread nature of subclade I.K isolates, however, may be related to their older dates of isolation. All of the subclade I.K isolates were isolated between 1940 and 1955 (Figure A7-2A, Table S1), just 19–34 years after plague was introduced to the central highlands. Therefore, these isolates may represent a subclade that was formerly spread throughout much of the central highlands but that currently does not exist in nature in Madagascar. Similarly, subclade I.I, although it was not geographically widespread (Figure A7-3), only contained isolates isolated from 1971–1976 (Figure A7-2A, Table S1) and may represent a former, now extinct subclade from the northern highlands. However, the limited number of isolates makes this difficult to determine. Alternatively, these subclades may still exist, but may have decreased in frequency and/or be very rare in nature.

Interestingly, the other older isolates tended to be the unaffiliated isolates. Eighteen of the 28 unaffiliated isolates were isolated between 1939 and 1978. Another 3 had unknown dates of isolation (Table S1). Given their older dates of isolation, these unaffiliated isolates may also be representatives of older, now extinct subclades from Madagascar. The lack of comparable isolates to these unaffiliated isolates among the rest of the isolate collection could be due to the limited sampling from earlier years (Table S1). Alternatively, the unaffiliated isolates may simply be representatives of very rare subclades. A final possibility could involve the accumulation of VNTR mutations due to repeated passages associated with prolonged storage in the laboratory, which could lead to the older isolates being inaccurate representatives of the original isolates. This is unlikely, however, as the rate of VNTR evolution in the laboratory, even with passaging, should be much slower than in nature. Thus, while these isolates may not be exactly the same as when they were first isolated, they should be close. Also, multiple copies of a subset of the Malagasy isolates in this study that were stored at different temperatures showed identical MLVA genotypes (data not shown), indicating that these VNTR loci are relatively stable in these isolates under the storage conditions used. Regardless, the unaffiliated nature of many of the older isolates is consistent with and most likely related to their older dates of isolation.

Several cities and communes yielded isolates of subclades predominantly found elsewhere, suggesting importation from other locations. Antananarivo, in particular, contained isolates from five subclades in addition to the dominant subclade (Figure A7-3, S1). Commune Andina Firaisana in the Ambositra district is another example, containing representatives of four different subclades (Figure A7-3, S1). One of these, subclade I.A, was also found in the nearby surrounding area. However, this area is considerably south of the area where the majority of subclade I.A isolates were found, suggesting that this subclade may have been imported to this area from further north or vice versa (Figure A7-3). Of the other three subclades found in Andina Firaisana, subclades II.A and II.B are also found in nearby areas and so may be naturally occurring in Andina Firaisana rather than due to transfer events. Subclade II.C, in contrast, appears to have been transferred to Andina Firaisana from the Betafo district in the northwest or vice versa (Figure A7-3, S1). Another nearby commune, Ivato, contained a single subclade I.E isolate, suggesting a transfer event from district Fianarantsoa in the south (Figure A7-3, S1).

Plague in Mahajanga

Our data suggest that Y. pestis was reintroduced to Mahajanga from the central highlands. The majority of the Mahajanga isolates (39 of 44) belonged to a single subcluster within subclade I.A (hereafter referred to as the Mahajanga I.A subcluster) (Figure A7-2A), suggesting that there was an introduction to Mahajanga from the central highlands that became established in Mahajanga and then underwent local cycling. Though this Mahajanga I.A subcluster did not have either SNP or MLVA support (Figure A7-2A), close examination of the isolates within this subcluster revealed very close genetic relationships, with most differences involving only a single repeat change at a single VNTR locus (data not shown). This is consistent with an outbreak scenario originating from a single introduction and strengthens the identification of this subcluster as a genetic group. In contrast, subclade I.A isolates outside of the Mahajanga I.A subcluster exhibited much greater variation both in the number of VNTR loci displaying polymorphisms and the number of alleles observed at those loci (data not shown), consistent with an older, more geographically dispersed and more differentiated set of isolates.

Our data also suggest that there have been multiple transfers of Y. pestis between Mahajanga and the central highlands. Specifically, seven isolates within the Mahajanga I.A subcluster were isolated from central highland locations rather than from Mahajanga (Figure A7-2A), suggesting that Y. pestis was also transferred back from Mahajanga to the central highlands. Of two other Mahajanga isolates, one belonged to subclade I.F and the other was unaffiliated (Figure A7-2A), suggesting that there has been more than one introduction of Y. pestis to Mahajanga as well. The final three Mahajanga isolates, although they belonged to subclade I.A, were not part of the Mahajanga I.A subcluster and were instead more closely related to subclade I.A isolates from the central highlands (Figure A7-2A), again suggesting multiple introductions. However, it is unclear whether any of these other introductions became established in Mahajanga, because no other Mahajanga isolates resembled these five outliers. Finally, although our data suggest that there have been multiple transfers of Y. pestis between Mahajanga and the central highlands, there is no evidence in these data for an introduction to Mahajanga from the northern highlands, as was previously suggested by PFGE analyses (Boisier et al., 2002; Duplantier et al., 2005).


Madagascar is one of the most active plague regions in the world. However, few studies have investigated the molecular epidemiology of Y. pestis from Madagascar and none have done so using very high resolution genomic methodologies. Here, we investigated the phylogeography and molecular epidemiology of Y. pestis in Madagascar by using a combination of SNPs and MLVA to analyze 262 Malagasy isolates from 25 districts from 1939–2005. In contrast with previous analyses that utilized ribotyping or SNPs alone (Guiyoule et al., 1997; Morelli et al., 2010), we identified a very high level of genetic diversity with 226 MLVA genotypes among the 262 isolates. These genotypes were distributed amongst 15 subclades that displayed significant geographic separation (Figure A7-3), leading to insights into the maintenance and spread of plague in Madagascar.

The use of MLVA was particularly effective at identifying genetic groups in Madagascar. SNPs, though useful, mostly provided confidence in genetic groups that were already apparent via MLVA. This is somewhat counter to the conventional hierarchical approach wherein SNPs are used first to identify major genetic groups followed by MLVA to provide resolution within those groups, thus minimizing the problems of mutational saturation/homoplasy that can occur with highly variable markers such as VNTRs (Keim et al., 2004). In this study, only SNP Mad-43 (Table S2), which differentiated Groups I and II, was useful in this conventional sense to identify “major genetic groups” that were obscured in the MLVA phylogeny (data not shown). All of the other subclades identified by SNPs were also identified by MLVA, suggesting that at this regional scale, MLVA alone may be effective at identifying robust genetic groups. Importantly, though MLVA was excellent at identifying these genetic groups, the relationships among those groups, such as the division between Groups I and II, remained unclear using MLVA alone (data not shown) whereas they were very clearly depicted as a star phylogeny in the SNP phylogeny (Figure A7-1). Where knowledge of deeper genetic relationships or fine-scale phylogenetic analysis of specific lineages (e.g., the strain MG05-1020 lineage here) is desired, SNPs will remain the preferred methodology for clonal pathogens such as Y. pestis. However, until whole genome sequencing for entire isolate collections becomes feasible, MLVA will continue to be a useful tool for examining genetic diversity whether used in conjunction with SNPs or alone.

Our analyses suggest that plague is being maintained in Madagascar in multiple geographically separated subpopulations. We revealed significant geographic separation among the identified subclades (Figure A7-3), suggesting that these subclades are undergoing local cycling with limited gene flow from other subclades. This is consistent with the population genetics and ecology of the black rat (Rattus rattus), the primary plague host in rural Madagascar (Brygoo, 1966; Duplantier, 2001). The black rat in Madagascar exhibits limited gene flow between subpopulations (Gilabert et al., 2007) as well as limited geographic ranges (Rahelinirina et al., 2010). This limited mobility, a high reproduction rate (Duplantier and Rakotondravony, 1999), and the development of some resistance to plague (Tollenaere et al., 2010) are all likely important factors that allow the black rat to maintain plague in these genetically distinct, geographically separated subpopulations. The two flea vectors, X. cheopis and S. fonquerniei (Duplantier, 2001; Duplantier and Rakotondravony, 1999), may also play a role in maintaining genetically distinct subpopulations (i.e., Groups I and II), though more data would be needed to confirm this hypothesis.

In contrast, transport of Y. pestis across longer distances in Madagascar is likely human-mediated. Historically, there is ample evidence for the influence of human traffic on the spread of plague, including transport along trade routes such as the Silk Road in the early pandemics and transport via steam ship to numerous new locations during the “third” pandemic (Morelli et al., 2010; Perry and Fetherston, 1997). The SNP phylogeny determined by Morelli et al. (2010) suggests the progression of plague from India to Madagascar to Turkey (Figure A7-1), a series of transfer events that were almost certainly human-mediated, though the details remain unknown. In Madagascar, plague was most likely transported from its introduction point on the coast to the central highlands, where it became permanently established, via the railroad linking Toamasina and Antananarivo (Brygoo, 1966). More recently, plague was most likely reintroduced to Mahajanga via the transport of infected rats and fleas together with foodstuffs from the central highlands. Indeed, our data suggest multiple transfers between Mahajanga and the central highlands, all likely human-mediated. Additional long distance transfers of Y. pestis in Madagascar are suggested by the multiple subclades identified in cities/communes such as Antananarivo and Andina Firaisana (Figure A7-3, S1, Table S1).

Though long distance transfers of Y. pestis undoubtedly occur, it is unclear how often such transfers result in the successful establishment of the transferred genotypes in new locations. At least one transfer to Mahajanga became successfully established and underwent local cycling as evidenced by the Mahajanga I.A subcluster described here (Figure A7-2A). However, many of the other examples of long distance transfers where multiple subclades were found in a single location are not as clear regarding the establishment of the transferred subclade(s). Antananarivo, for example, is clearly dominated by subclade I.A with only 1–2 representatives of each of the other five subclades identified there (Figure A7-3, S1, Table S1), suggesting that the presence of these alternative subclades may have been only transitory.

Successful establishment of subclades in new locations following a long distance transfer may be related to adaptive advantages possessed by some genotypes (Keim and Wagner, 2009). For instance, subclade I.A appears to be particularly successful in our analysis. The earliest subclade I.A isolate in our dataset was collected in 1974 from the Ambositra district (Table S1), one of the most active plague districts in Madagascar (Chanteau et al., 2000). Subsequent isolates indicate that this subclade continued to exist in a small area of the Ambositra district but also became well established over a large geographic area including and surrounding the capital, Antananarivo. This subclade was also successfully introduced to and established in Mahajanga and appears to have been transferred to the Fianarantsoa district, though it is unclear whether or not it became established there (Figure A7-3, S1, Table S1). This widespread geographical success may indicate that this subclade possesses an adaptive advantage that enhances its ability to be transferred long distances and become established in new locations (Keim and Wagner, 2009). Alternatively, the particular success of this subclade may simply be due to chance.

The central highlands focus remains the most active plague focus in Madagascar (Chanteau et al., 2000) and is, consequently, a likely place for new genotypes to emerge. This is particularly true for those central highlands districts with the highest plague activity. For instance, the three unique ribotypes identified in a previous study belonged to isolates from two highly active districts, Ambositra and Ambohimahasoa (Guiyoule et al., 1997). Here, isolates belonging to Group II and its subclades were found in three highly active districts, Betafo, Manandriana, and Ambositra (Figure A7-3, S1). As discussed above, Ambositra may also have been the district of origin for the highly successful subclade I.A. Overall, the Ambositra district was one of the two most diverse districts in our analysis, containing representatives from six different subclades (Figure A7-3, Table S1). This diversity is consistent with the Ambositra district’s status as one of the three most important plague districts in Madagascar (Chanteau et al., 1998; 2000).

The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process, depending on the natural cycle between the black rat and its flea vectors as well as human activity. Y. pestis in Madagascar is maintained in multiple, genetically distinct, geographically separated subpopulations, likely via the black rat. The exact geographic landscape of these subpopulations is probably ever changing, with some subclades going extinct or decreasing in frequency (e.g., subclade I.K), new subclades emerging and becoming established, and some subclades being transferred to new locations, where they may become established either temporarily or more long-term. Much of the long distance spread of Y. pestis in Madagascar is likely due to human activities that allow for the transport of plague infected rats and fleas from one location to another.


We would like to thank Dr. Kimothy Smith for initially suggesting the collaboration that led to this work. Note that the use of products/names does not constitute endorsement by the DHS of the United States.

Author Contributions

Conceived and designed the experiments: AJV DMW SC PK. Performed the experiments: AJV FC JL RN. Analyzed the data: AJV DMW. Contributed reagents/materials/analysis tools: FC PR ME JR LR BWR SMB-S MA SC. Wrote the paper: AJV PK.


This work was funded by the Department of Homeland Security Science and Technology Directorate (award numbers NBCH2070001 and HSHQDC-08-C-00158), the Cowden Endowment in Microbiology at Northern Arizona University, and the National Institute of Allergy and Infectious Diseases (NIAID), US National Institutes of Health (NIH), Department of Health and Human Services (HHS) (award number AI065359). This work was also supported by the Science Foundation of Ireland (award number 05/FE1/B882) (MA), the NIAID NIH HHS (award number N01 AI-30071) (ME JR), the Malagasy Ministry of Health (contract No. 01/95 IDA 2252-MAG) (FC LR BWR SC), and the French Cooperation (FAC No. 94008 300) (FC LR BWR SC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


  • Achtman M, Morelli G, Zhu P, Wirth T, Diehl I, et al. Microevolution and history of the plague bacillus, Yersinia pestis. Proc Natl Acad Sci U S A. 2004;101:17837–17842. [PMC free article: PMC535704] [PubMed: 15598742]
  • Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, et al. Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A. 1999;96:14043–14048. [PMC free article: PMC24187] [PubMed: 10570195]
  • Auerbach RK, Tuanyok A, Probert WS, Kenefic L, Vogler AJ, et al. Yersinia pestis evolution on a small timescale: comparison of whole genome sequences from North America. PLoS One. 2007;2:e770. [PMC free article: PMC1940323] [PubMed: 17712418]
  • Boisier P, Rahalison L, Rasolomaharo M, Ratsitorahina M, Mahafaly M, et al. Epidemiologic features of four successive annual outbreaks of bubonic plague in Mahajanga, Madagascar. Emerg Infect Dis. 2002;8:311–316. [PMC free article: PMC2732468] [PubMed: 11927030]
  • Boisier P, Rasolomaharo M, Ranaivoson G, Rasoamanana B, Rakoto L, et al. Urban epidemic of bubonic plague in Majunga, Madagascar: epidemiological aspects. Trop Med Int Health. 1997;2:422–427. [PubMed: 9217697]
  • Brygoo ER. Epidémiologie de la peste à Madagascar. Arch Inst Pasteur Madagascar. 1966;35:9–147.
  • Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, et al. Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen. J Bacteriol. 2006;188:4453–4463. [PMC free article: PMC1482938] [PubMed: 16740952]
  • Chanteau S, Ratsifasoamanana L, Rasoamanana B, Rahalison L, Randriambelosoa J, et al. Plague, a reemerging disease in Madagascar. Emerg Infect Dis. 1998;4:101–104. [PMC free article: PMC2627662] [PubMed: 9452403]
  • Chanteau S, Ratsitorahina M, Rahalison L, Rasoamanana B, Chan F, et al. Current epidemiology of human plague in Madagascar. Microbes Infect. 2000;2:25–31. [PubMed: 10717537]
  • Clarke KR. Non-parametric multivariate analyses of changes in community structure. Aust J Ecol. 1993;18:117–143.
  • Cui Y, Li Y, Gorgé O, Platonov ME, Yan Y, et al. Insight into microevolution of Yersinia pestis by clustered regularly interspaced short palindromic repeats. PLoS One. 2008;3:e2652. [PMC free article: PMC2440536] [PubMed: 18612419]
  • Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002;30:2478–2483. [PMC free article: PMC117189] [PubMed: 12034836]
  • Deng W, Burland V, Plunkett G 3rd, Boutin A, Mayhew GF, et al. Genome sequence of Yersinia pestis KIM. J Bacteriol. 2002;184:4601–4611. [PMC free article: PMC135232] [PubMed: 12142430]
  • Duplantier JM. The black rat’s role in spreading human plague in Madagascar. L’Institut de recherche pour le développement Scientific Bulletin. 2001;131:1–3.
  • Duplantier JM, Duchemin JB, Chanteau S, Carniel E. From the recent lessons of the Malagasy foci towards a global understanding of the factors involved in plague reemergence. Vet Res. 2005;36:437–453. [PubMed: 15845233]
  • Duplantier JM, Rakotondravony D. The rodent problem in Madagascar: agricultural pest and threat to human health. In: Singleton G, Hinds L, Leirs H, Zhang Z, editors. Ecologically-based management of rodent pests. Canberra: Australian Centre for International Agricultural Research; 1999. pp. 441–459.
  • Eppinger M, Guo Z, Sebastian Y, Song Y, Lindler LE, et al. Draft genome sequences of Yersinia pestis isolates from natural foci of endemic plague in China. J Bacteriol. 2009;191:7628–7629. [PMC free article: PMC2786597] [PubMed: 19820101]
  • Eppinger M, Worsham PL, Nikolich MP, Riley DR, Sebastian Y, et al. Genome sequence of the deep-rooted Yersinia pestis strain Angola reveals new insights into the evolution and pangenome of the plague bacterium. J Bacteriol. 2010;192:1685–1699. [PMC free article: PMC2832528] [PubMed: 20061468]
  • Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 1998;8:175–185. [PubMed: 9521921]
  • Galimand M, Guiyoule A, Gerbaud G, Rasoamanana B, Chanteau S, et al. Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N Engl J Med. 1997;337:677–680. [PubMed: 9278464]
  • Garcia E, Worsham P, Bearden S, Malfatti S, Lang D, et al. Pestoides F, an atypical Yersinia pestis strain from the former Soviet Union. Adv Exp Med Biol. 2007;603:17–22. [PubMed: 17966401]
  • Gilabert A, Loiseau A, Duplantier JM, Rahelinirina S, Rahalison L, et al. Genetic structure of black rat populations in a rural plague focus in Madagascar. Can J Zool. 2007;85:965–972.
  • Girard JM, Wagner DM, Vogler AJ, Keys C, Allender CJ, et al. Differential plague-transmission dynamics determine Yersinia pestis population genetic structure on local, regional, and global scales. Proc Natl Acad Sci U S A. 2004;101:8408–8413. [PMC free article: PMC420407] [PubMed: 15173603]
  • Guiyoule A, Grimont F, Iteman I, Grimont PA, Lefèvre M, et al. Plague pandemics investigated by ribotyping of Yersinia pestis strains. J Clin Microbiol. 1994;32:634–641. [PMC free article: PMC263099] [PubMed: 8195371]
  • Guiyoule A, Rasoamanana B, Buchrieser C, Michel P, Chanteau S, et al. Recent emergence of new variants of Yersinia pestis in Madagascar. J Clin Microbiol. 1997;35:2826–2833. [PMC free article: PMC230070] [PubMed: 9350742]
  • Huang XZ, Chu MC, Engelthaler DM, Lindler LE. Genotyping of a homogeneous group of Yersinia pestis strains isolated in the United States. J Clin Microbiol. 2002;40:1164–1173. [PMC free article: PMC140403] [PubMed: 11923326]
  • Johansson A, Farlow J, Larsson P, Dukerich M, Chambers E, et al. Worldwide genetic relationships among Francisella tularensis isolates determined by multiple-locus variable-number tandem repeat analysis. J Bacteriol. 2004;186:5808–5818. [PMC free article: PMC516809] [PubMed: 15317786]
  • Keim P, Van Ert MN, Pearson T, Vogler AJ, Huynh LY, et al. Anthrax molecular epidemiology and forensics: using the appropriate marker for different evolutionary scales. Infect Genet Evol. 2004;4:205–213. [PubMed: 15450200]
  • Keim PS, Wagner DM. Humans and evolutionary and ecological forces shaped the phylogeography of recently emerged diseases. Nat Rev Microbiol. 2009;7:813–821. [PMC free article: PMC2794044] [PubMed: 19820723]
  • Kingston JJ, Tuteja U, Kapil M, Murali HS, Batra HV. Genotyping of Indian Yersinia pestis strains by MLVA and repetitive DNA sequence based PCRs. Antonie Van Leeuwenhoek. 2009;96:303–312. [PubMed: 19449123]
  • Klevytska AM, Price LB, Schupp JM, Worsham PL, Wong J, et al. Identification and characterization of variable-number tandem repeats in the Yersinia pestis genome. J Clin Microbiol. 2001;39:3179–3185. [PMC free article: PMC88315] [PubMed: 11526147]
  • Kumar S, Tamura K, Jakobsen IB, Nei M. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics. 2001;17:1244–1245. [PubMed: 11751241]
  • Laventure S, Andrianaja V, Rasoamanana B. Epidémie de peste à Majunga en 1991. Rapport de mission de l’Institut Pasteur de Madagascar. 1991:1–26.
  • Li Y, Cui Y, Hauck Y, Platonov ME, Dai E, et al. Genotyping and phylogenetic analysis of Yersinia pestis by MLVA: insights into the worldwide expansion of Central Asia plague foci. PLoS One. 2009;4:e6000. [PMC free article: PMC2694983] [PubMed: 19543392]
  • Li Y, Dai E, Cui Y, Li M, Zhang Y, et al. Different region analysis for genotyping Yersinia pestis isolates from China. PLoS One. 2008;3:e2166. [PMC free article: PMC2367435] [PubMed: 18478120]
  • Lowell JL, Zhansarina A, Yockey B, Meka-Mechenko T, Stybayeva G, et al. Phenotypic and molecular characterizations of Yersinia pestis isolates from Kazakhstan and adjacent regions. Microbiology. 2007;153:169–177. [PubMed: 17185545]
  • Lucier TS, Brubaker RR. Determination of genome size, macrorestriction pattern polymorphism, and nonpigmentation-specific deletion in Yersinia pestis by pulsed-field gel electrophoresis. J Bacteriol. 1992;174:2078–2086. [PMC free article: PMC205823] [PubMed: 1551830]
  • Migliani R, Chanteau S, Rahalison L, Ratsitorahina M, Boutin JP, et al. Epidemiological trends for human plague in Madagascar during the second half of the 20th century: a survey of 20,900 notified cases. Trop Med Int Health. 2006;11:1228–1237. [PubMed: 16903886]
  • Migliani R, Ratsitorahina M, Rahalison L, Rakotoarivony I, Duchemin JB, et al. [Resurgence of the plague in the Ikongo district of Madagascar in 1998. 1. Epidemiological aspects in the human population]. Bull Soc Pathol Exot. 2001;94:115–118. [PubMed: 11475028]
  • Morelli G, Song Y, Mazzoni CJ, Eppinger M, Roumagnac P, et al. Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet. 2010;42:1140–1143. [PMC free article: PMC2999892] [PubMed: 21037571]
  • Motin VL, Georgescu AM, Elliott JM, Hu P, Worsham PL, et al. Genetic variability of Yersinia pestis isolates as predicted by PCR-based IS100 genotyping and analysis of structural genes encoding glycerol-3-phosphate dehydrogenase (glpD). J Bacteriol. 2002;184:1019–1027. [PMC free article: PMC134790] [PubMed: 11807062]
  • Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, et al. Genome sequence of Yersinia pestis, the causative agent of plague. Nature. 2001;413:523–527. [PubMed: 11586360]
  • Pearson T, Busch JD, Ravel J, Read TD, Rhoton SD, et al. Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc Natl Acad Sci U S A. 2004;101:13536–13541. [PMC free article: PMC518758] [PubMed: 15347815]
  • Perry RD, Fetherston JD. Yersinia pestis-etiologic agent of plague. Clin Microbiol Rev. 1997;10:35–66. [PMC free article: PMC172914] [PubMed: 8993858]
  • Pourcel C, André-Mazeaud F, Neubauer H, Ramisse F, Vergnaud G. Tandem repeats analysis for the high resolution phylogenetic analysis of Yersinia pestis. BMC Microbiol. 2004;4:22. [PMC free article: PMC436057] [PubMed: 15186506]
  • Rahelinirina S, Duplantier JM, Ratovonjato J, Ramilijaona O, Ratsimba M, et al. Study on the movement of Rattus rattus and evaluation of the plague dispersion in Madagascar. Vector Borne Zoonotic Dis. 2010;10:77–84. [PubMed: 20158335]
  • Rasolomaharo M, Rasoamanana B, Andrianirina Z, Buchy P, Rakotoarimanana N, et al. Plague in Majunga, Madagascar. Lancet. 1995;346:1234. [PubMed: 7475693]
  • Rotz LD, Khan AS, Lillibridge SR, Ostroff SM, Hughes JM. Public health assessment of potential biological terrorism agents. Emerg Infect Dis. 2002;8:225–230. [PMC free article: PMC2732458] [PubMed: 11897082]
  • Song Y, Tong Z, Wang J, Wang L, Guo Z, et al. Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans. DNA Res. 2004;11:179–197. [PubMed: 15368893]
  • Tollenaere C, Rahalison L, Ranjalahy M, Duplantier JM, Rahelinirina S, et al. Susceptibility to Yersinia pestis experimental infection in wild Rattus rattus, reservoir of plague in Madagascar. Ecohealth. 2010;7:242–247. [PubMed: 20443044]
  • Touchman JW, Wagner DM, Hao J, Mastrian SD, Shah MK, et al. A North American Yersinia pestis draft genome sequence: SNPs and phylogenetic analysis. PLoS One. 2007;2:e220. [PMC free article: PMC1794153] [PubMed: 17311096]
  • Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, et al. Global genetic population structure of Bacillus anthracis. PLoS One. 2007;2:e461. [PMC free article: PMC1866244] [PubMed: 17520020]
  • Vogler AJ, Birdsell D, Price LB, Bowers JR, Beckstrom-Sternberg SM, et al. Phylogeography of Francisella tularensis: global expansion of a highly fit clone. J Bacteriol. 2009;191:2474–2484. [PMC free article: PMC2668398] [PubMed: 19251856]
  • Welch TJ, Fricke WF, McDermott PF, White DG, Rosso ML, et al. Multiple antimicrobial resistance in plague: an emerging public health risk. PLoS One. 2007;2:e309. [PMC free article: PMC1819562] [PubMed: 17375195]
  • World Health Organization (WHO) Human plague: review of regional morbidity and mortality, 2004–2009. Wkly Epidemiol Rec. 2010;85:40–45. [PubMed: 20151494]
  • Zhang X, Hai R, Wei J, Cui Z, Zhang E, et al. MLVA distribution characteristics of Yersinia pestis in China and the correlation analysis. BMC Microbiol. 2009;9:205. [PMC free article: PMC2761927] [PubMed: 19775435]
  • Zhang Z, Hai R, Song Z, Xia L, Liang Y, et al. Spatial variation of Yersinia pestis from Yunnan Province of China. Am J Trop Med Hyg. 2009;81:714–717. [PubMed: 19815893]



30 Argonne National Laboratory, Institute for Genomics and Systems Biology, 9700 S. Cass Ave, Lemont, IL 60439, U.S.A.


Next-generation sequencing (NGS) has opened up access to genomic data from diverse microbial communities, and studies are emerging that cover a wide variety of systems (Gilbert et al., 2010; Human Microbiome Project, 2012; Nealson and Venter, 2007; Tara Expeditions, 2012; Terragenome Consortium, 2012). A number of techniques are used to extract genome-based information, either from single reference genes (usually 16S rDNA) or by random shotgun metagenomics of entire genomes. There is an abundance of reviews of the subject (Desai et al., 2012; Thomas et al., 2012). Systems such as MG-RAST (Meyer et al., 2008) now provide access to thousands of metagenomic data sets (see Figure A8-1).

Figure A8-1. A screenshot of the MG-RAST homepage. The MG-RAST system has more than 58,000 metagenomic data sets totaling over 16.5 terabase pairs of information.

However, this newly found data richness is not without its challenges. The main problem stems from dramatic changes that converted an ecosystem that was until recently data poor into one that is now overflowing with data. Environmental biology and molecular ecology went from being overwhelmed by several hundred megabytes of data generated in 2005 by the Global Ocean Survey (Nealson and Venter, 2007) to generating many terabytes of data in 2012. Biology and medicine, however, lack the tradition and experience to handle big data; only a few areas, such as cancer diagnosis, have established stable pipelines and data formats that allow exchange.

Shotgun Metagenomics as an Example

Metagenomic shotgun sequencing using NGS technology can serve as a blueprint for how biology is and will be impacted by big data. With sequence data already significantly cheaper than the corresponding analysis (Wilkening et al., 2009), and as the cost of sequencing drops by a factor of 10 annually, it seems clear that a paradigm shift will be required to handle data analysis and storage.

There is significant value in the comparative analysis of metagenomic data sets, yet comparison requires data sets to have undergone more or less the same analytical processes. The existing paradigm of publishing the raw data and summary statistics in tabular form as auxiliary material does not allow third parties to benefit from the work already done; instead it requires future authors to re-analyze the data. Consider the Human Microbiome Project (Turnbaugh et al., 2007), which has recently published more than 5 terabases of sequence data from 172 human subjects. Any researchers attempting to compare their findings to these data will discover that they need to re-analyze all of the data.

As an interesting side note, the value and need for comparative analyses also necessitates asking questions of how the reviewing process was handled. Did the reviewers in fact take a look at any of the analysis performed, or did they take the results produced by a complex sequence analysis pipeline at face value? Often the information that was derived from the data is a product of a complex pipeline that was not described in sufficient detail. Further, the reviewer has no way to know whether the same results could be obtained from the same data. A rigorous review process cannot possibly be maintained under current mechanisms and requirements. The prohibitive cost renders such analyses effectively irreproducible. Thus, it is all the more critical to require detailed documentation of data handling in analyses.

One of the key missing concepts is the notion of rigorous analysis of data quality prior to deriving any statements about biology from the data. Many factors contribute to data quality; in DNA sequencing, noise can be added at various steps in the pipeline. While some vendor-specific schemes exist to determine sequence noise, in the past there were no generally accepted vendor-neutral ways to characterize the noise in sequence data. For 16S rDNA-based amplicon metagenomic data, this has already led to a major debate over the amount of microbial diversity (Reeder and Knight, 2010), leading to a number of approaches that de-noise sequence data prior to analysis (Quince et al., 2009; Reeder and Knight, 2010). For shotgun metagenomics, the DRISEE approach (Keegan et al., 2012) provides a vendor-neutral estimate of sequence error. Interestingly, results show that the errors found are not specific to sequencing platforms; variations in quality within the platforms are significant, as highlighted by Figure A8-2.

Figure A8-2. A graph showing the variation in DRISEE error from two samples in the same group. The DRISEE error profiles for two anonymous projects with three shotgun libraries. The predicted cumulative error per position is plotted showing dramatic variation with the green data set near perfect and the purple data exceeding 40 percent error after ...

Taking the data sets underlying Figure A8-2 as an example, the sequence analysis pipelines will be required to use different approaches for data with less than 1 percent error and with more than 45 percent error. The MG-RAST analysis pipeline was the first to include the examination of sequence quality systematically and to highlight sequence quality issues. A surprising number of data sets submitted to the system are rejected initially because they are, for example, too low in quality or contain contamination.

In addition to “low-level” sequence error, a number of other significant sources of problems exist. Tom Schmidt’s group described the existence of artificially duplicated reads in 454 data (Gomez-Alvarez et al., 2009). (These artifacts also exist in Ion Torrent and Illumina data.) If left uncorrected, such duplicates lead to significant biases in the interpretation of sequence data, as some areas of sequence are misrepresented.
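As a rough illustration of how such artifacts might be screened for, the following Python sketch groups reads by a fixed-length prefix and reports the fraction that would be discarded if only one read per group were kept. The function name, the default 50-base prefix length, and the toy reads are assumptions made for illustration; this is not the procedure of Gomez-Alvarez et al. (2009) or of any particular pipeline.

```python
from collections import Counter

def duplicate_fraction(reads, prefix_len=50):
    """Return the fraction of reads that share a prefix with an earlier read.

    Reads shorter than prefix_len are compared over their full length.
    The prefix length is an illustrative assumption, not a published setting.
    """
    if not reads:
        return 0.0
    prefix_counts = Counter(read[:prefix_len].upper() for read in reads)
    # Every read beyond the first in each prefix group counts as a duplicate.
    duplicates = sum(count - 1 for count in prefix_counts.values())
    return duplicates / len(reads)

# Toy example: the first three reads start with the same 8 bases.
toy_reads = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACGTGG", "TTTTGGGGCC"]
print(duplicate_fraction(toy_reads, prefix_len=8))  # 0.5
```

A prefix-based grouping is only one possible heuristic; genuinely abundant sequences can also share prefixes, so a high value is a prompt for closer inspection and does not prove that artifacts are present.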

Other frequently found artifacts include leftover adapter sequences or primer dimers (see Figure A8-3 for an example of both problems). Most researchers will agree that for a shotgun sample, a more or less even distribution of each base on each position of all reads is to be expected. Unfortunately, in cases that deviate from the expected distribution, as in Figure A8-3, the only information that can be derived from the sequence data is that the sequencing run has failed. Yet, the data shown in Figure A8-3 have been interpreted biologically and were accepted for publication.

Figure A8-3. A graph showing a representation of the average base abundance per base. A simple representation of average base abundance per base demonstrates that data are not distributed randomly. The four bases are represented green (A), blue (C), yellow (G), and red (T); black indicates a missing base call. In this (anonymous) data ...
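The per-position base distribution discussed above can be tabulated with very little code. The sketch below is a minimal illustration, not the code used to produce the figure: the function names and the 0.5 threshold for flagging a position are assumptions chosen for the toy example.

```python
from collections import Counter

def base_fractions_per_position(reads):
    """Return one dict per read position mapping base -> fraction of calls."""
    max_len = max((len(r) for r in reads), default=0)
    profile = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts.values())
        profile.append({base: n / total for base, n in counts.items()})
    return profile

def skewed_positions(profile, max_fraction=0.5):
    """Return positions where a single base exceeds max_fraction of calls.

    The threshold is an illustrative assumption, not an accepted standard.
    """
    return [pos for pos, fractions in enumerate(profile)
            if max(fractions.values()) > max_fraction]

# Toy example: every read starts with "A", as a leftover adapter base might.
toy_reads = ["ACGT", "ACTA", "AGCA", "ATAC"]
profile = base_fractions_per_position(toy_reads)
print(skewed_positions(profile))  # [0]
```

In a real shotgun library the expected fractions depend on the GC content of the sample, so a fixed threshold would need to be adjusted; the point of the sketch is only that the check itself is cheap.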

Once the potential obstacles with data quality have been eliminated, a number of bioinformatics tools can be used to predict genes of interest for downstream analysis. While significant progress has been made in recent years for the prediction of genes in more or less complete microbial genomes (Overbeek et al., 2007), the same cannot be said for the state of the art for predicting genes in noisy metagenomic data. Trimble et al. (2012) show that only one of the existing tools accounts for the possibility that sequences might contain sequencing error, despite the fact that, as mentioned above, the presence of noise (or imperfect data) is a reality in shotgun metagenomics, where data are frequently not or only partially assembled.

A side effect of assuming all data to be perfect is the significant impairment of tool performance when tools are used on data with realistic error properties. This mismatch of assumption with reality leads to performance reductions of 10–20 percent accuracy in the presence of 3 percent error (see Trimble et al., 2012).
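A back-of-the-envelope calculation suggests why a per-base error rate of a few percent matters so much for tools that assume perfect input. Assuming, purely for illustration, that errors occur independently at each base with probability e, the probability that a read of length L contains at least one error is 1 - (1 - e)^L. The 100-base read length below is an assumption; the 3 percent rate is the figure quoted above.

```python
def prob_read_has_error(per_base_error, read_length):
    """Probability that a read has at least one error.

    Assumes errors are independent across positions, which is a
    simplification of real sequencing error behavior.
    """
    return 1.0 - (1.0 - per_base_error) ** read_length

print(round(prob_read_has_error(0.03, 100), 3))  # 0.952
```

Under these assumptions, roughly 95 percent of 100-base reads would carry at least one error, so a gene caller that cannot tolerate errors would rarely see an input that meets its assumptions.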

Following gene prediction, functional assignments are computed by mapping the predicted features against a database of known proteins in some form. Again, different pipelines apply different approaches; however, they all rely on some flavor of sequence similarity searching (e.g., BLAST [Altschul et al., 2009], BLAT [Kent, 2002], HMMer [Eddy, 1998]). What all these approaches have in common is their failure to identify novelty. Any unknown gene will remain uncharacterized, and a non-homologous replacement for a known protein function (no matter how important for the sample) will be represented as an unknown protein in the metagenomic analysis presented.

While this might present a problem in some cases, the majority of the protein databases contain a variety of annotations for proteins of the same function. Often, multiple annotations for proteins that are 100 percent sequence identical can be found (e.g., the alcohol dehydrogenase gene from various Streptococcus strains has 20 different annotations). While some annotations would be informative for a human reader, in the majority of cases, a computer would not be able to recognize the fact that the two functions described are identical. When comparing the relative abundance of alcohol dehydrogenase genes, results would be affected by the fact that several alcohol dehydrogenase genes would have slightly different names. As a result, annotations derived from similarity searches are less than useful for quantitative studies in microbial ecology unless carried out on a higher functional level, such as KEGG (Kanehisa, 2002) pathways or the very successful SEED subsystems (Overbeek et al., 2005). These higher-level aggregations subsume a significant number of genes in categories (e.g., KEGG pathways).
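The effect of aggregating differently worded annotations into a shared category can be sketched in a few lines of Python. The mapping table, the annotation strings, and the counts below are all hypothetical; they are not taken from KEGG, SEED, or any real database, and a production system would use a curated mapping.

```python
from collections import defaultdict

# Hypothetical mapping from free-text annotations to one shared category.
ANNOTATION_TO_CATEGORY = {
    "alcohol dehydrogenase": "alcohol dehydrogenase",
    "alcohol dehydrogenase (EC 1.1.1.1)": "alcohol dehydrogenase",
    "adh, zinc-dependent": "alcohol dehydrogenase",
}

def aggregate_counts(annotation_counts, mapping):
    """Sum counts per category; unmapped annotations are pooled as 'unassigned'."""
    totals = defaultdict(int)
    for annotation, count in annotation_counts.items():
        totals[mapping.get(annotation, "unassigned")] += count
    return dict(totals)

# Toy counts of annotation strings observed in one sample.
sample = {
    "alcohol dehydrogenase": 12,
    "alcohol dehydrogenase (EC 1.1.1.1)": 7,
    "adh, zinc-dependent": 3,
    "hypothetical protein": 40,
}
print(aggregate_counts(sample, ANNOTATION_TO_CATEGORY))
# {'alcohol dehydrogenase': 22, 'unassigned': 40}
```

Without the mapping, the three alcohol dehydrogenase spellings would be counted as three separate functions with 12, 7, and 3 hits; with it, they contribute a single count of 22.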

Using those categories to represent gene function abundance, we can represent the genetic material in environmental samples as an abundance vector, allowing cross-sample comparison of gene abundance. With the newly minted BIOM (McDonald et al., 2012) format, various existing tools, such as MG-RAST (Meyer et al., 2008) and QIIME (Caporaso et al., 2010), can be used together to analyze functional abundance data.
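A minimal sketch of such a cross-sample comparison follows. It converts category counts into relative abundance vectors over a shared, ordered list of categories and compares two samples with the Bray-Curtis dissimilarity. The choice of Bray-Curtis, the category names, and the counts are assumptions made for illustration; the text above does not prescribe a particular distance measure.

```python
def relative_abundance(counts, categories):
    """Turn raw category counts into a vector of proportions over `categories`."""
    total = sum(counts.get(c, 0) for c in categories)
    if total == 0:
        return [0.0] * len(categories)
    return [counts.get(c, 0) / total for c in categories]

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom

# Hypothetical functional categories and counts for two samples.
categories = ["carbohydrates", "respiration", "stress response"]
sample_a = {"carbohydrates": 50, "respiration": 30, "stress response": 20}
sample_b = {"carbohydrates": 20, "respiration": 30, "stress response": 50}

vec_a = relative_abundance(sample_a, categories)
vec_b = relative_abundance(sample_b, categories)
print(round(bray_curtis(vec_a, vec_b), 3))  # 0.3
```

The comparison is meaningful only if both vectors were produced by the same gene prediction and annotation steps, which is the point taken up in the next section.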

An Attempt to Estimate the Scale of the Problem

“Raw” sequence data can be transformed into abundance vectors that describe the abundance of specific gene categories in environmental samples. However, even slight variations in one of the transforming steps will introduce significant variations in the outcome, as would be the case for two data sets that have used different gene prediction algorithms with different levels of tolerance for sequence error.

Because there is no culture of sharing data beyond raw sequences in the INSDC archives, consumers of data from any specific study can either rely on tables published as auxiliary material with a study, or they can re-analyze the data. As mentioned before, the computational cost makes that undesirable (Wilkening et al., 2009). Imagine researchers in the position of comparing their data to the Human Microbiome Project jump-start project (described above). Instead of simply analyzing their own data, they would find themselves re-analyzing the various other data sets they are using for comparison. While this type of approach was common in the early days of genomics and is still used for single microbial genomes, for example in SEED (Aziz et al., 2008), IMG (Markowitz et al., 2006), and GenDB (Meyer et al., 2003), using the same approach for more computationally demanding data types like metagenomes will lead to a situation in which groups are no longer limited by their ability to acquire sequence data, but by their ability to analyze it. While it is likely that metagenomic data analysis will undergo significant improvements (as compared to the improvements for individual microbial genomes discussed in Overbeek et al. [2007]), that will not suffice. With sequencing costs continuing to drop, data acquisition costs will soon be a small fraction of the data analysis cost.

Sharing of computational results (e.g., gene calling results, computed similarities, and other intermediate data types) would alleviate this problem. But this only works if the community can agree on a small number of standard formats to represent the data, and only if the majority of the tools support those standards.
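To make the idea of sharing intermediate results concrete, the sketch below bundles an abundance table with a description of the processing steps that produced it and serializes the bundle as JSON. The record layout, field names, tool names, and parameter names are all hypothetical; this is not the BIOM format or any format agreed on by the community, only an illustration of the kind of information such a format would need to carry.

```python
import json

def make_result_record(sample_id, abundances, pipeline_steps):
    """Bundle an abundance table with the processing steps that produced it.

    pipeline_steps is a list of (step name, tool name, parameters) tuples.
    """
    return {
        "sample_id": sample_id,
        "abundances": abundances,
        "provenance": [
            {"step": name, "tool": tool, "parameters": params}
            for name, tool, params in pipeline_steps
        ],
    }

record = make_result_record(
    "sample-001",  # hypothetical identifier
    {"alcohol dehydrogenase": 22, "unassigned": 40},
    [
        # Tool names and parameters are placeholders, not real software.
        ("gene_calling", "example-gene-caller", {"error_tolerant": True}),
        ("similarity_search", "example-aligner", {"max_evalue": 1e-5}),
    ],
)

# Round-trip through JSON, as a consumer of the shared file would.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)
print(restored["provenance"][0]["tool"])  # example-gene-caller
```

A consumer of such a record could decide whether two data sets are comparable by inspecting the provenance entries before reusing the abundances, instead of re-running the analysis.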

Ways Out of the Current Dilemma

While for the first time biology now has access to abundant data, the challenges in handling the data and using the data in a robust way to define new research hypotheses seem insurmountable. We have described the challenge of using big data in the context of metagenomics above.

A number of aspects to this challenge need to be addressed separately. Perhaps the most important aspect is a change of culture recognizing that, in the presence of abundant data, the standard operating procedures from before are no longer sufficient. Among the things that need to change are data archives, which can no longer attempt to capture all data; computational approaches, which need to take computational costs into account; and, last but not least, individual researchers, who need to learn that data analysis must be planned and budgeted for appropriately.

One suggestion likely acceptable to most readers is that, in the presence of big data, computational data analysis needs to take a more prominent role in the training of the next generation of bio-scientists.

There are also some technical steps that can be taken to improve the current situation. Standards for describing sequence data have been established by the Genomic Standards Consortium (GSC) (Field et al., 2011). These standards are currently enabling data exchange at a hitherto unprecedented scale, enabling data consumption by many third parties for many purposes.

Based on these positive experiences, the GSC has now initiated a long-term project to define standards for sharing processed data. The results of this M5 project will enable researchers to consume data from a published study for their own analysis without having to re-analyze everything from scratch, while still allowing them to understand and change all the fine details of these studies.

In addition, the M5 project aims to define encodings for data that will allow existing and new analysis service providers to exchange analyzed data. This ability will allow comparison of different analytical approaches and will help with the evolution of analysis approaches. We predict that this new openness will lead to more acceptance for the established analysis approaches and reduce the number of ad hoc analysis pipelines that reinvent analysis processes. By embracing the needed cultural changes, adhering to standards, and promoting openness, many third parties will be liberated to innovate like never before.


  • Altschul SF, Gertz EM, Agarwala R, Schaffer AA, Yu YK. PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Research. 2009;37(3):815–824. [PMC free article: PMC2647318] [PubMed: 19088134]
  • Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. The RAST server: Rapid annotations using subsystems technology. BMC Genomics. 2008;9:75. [PMC free article: PMC2265698] [PubMed: 18261238]
  • Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. QIIME allows analysis of high-throughput community sequencing data. Nature Methods. 2010;7(5):335–336. [PMC free article: PMC3156573] [PubMed: 20383131]
  • Desai N, Antonopoulos D, Gilbert JA, Glass EM, Meyer F. From genomics to metagenomics. Current Opinion in Biotechnology. 2012;23(1):72–76. [PubMed: 22227326]
  • Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755–763. [PubMed: 9918945]
  • Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glöckner FO, Hirschman L, Karsch-Mizrachi I. The Genomic Standards Consortium. PLoS Biology. 2011;9(6):e1001088. [PMC free article: PMC3119656] [PubMed: 21713030]
  • Gilbert JA, Meyer F, Jansson J, Gordon J, Pace N, Tiedje J, Ley R, Fierer N, Field D, Kyrpides N, Glöckner FO, Klenk HP, Wommack KE, Glass E, Docherty K, Stevens R, Knight R. The Earth Microbiome Project: Meeting report of the “1st EMP meeting on sample selection and acquisition” at Argonne National Laboratory, October 6, 2010. Standards in Genomic Sciences. 2010;3(3):249–253. [PMC free article: PMC3035312] [PubMed: 21304728]
  • Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. ISME Journal. 2009;3(11):1314–1317. [PubMed: 19587772]
  • HMP (Human Microbiome Project) Human Microbiome Project. 2012. [accessed October 2, 2012]. http://nihroadmap​
  • Kanehisa M. The KEGG database. Novartis Foundation symposium. 2002;247:91–101. discussion 101–103, 119–128, 244–252. [PubMed: 12539951]
  • Keegan KP, Trimble WL, Wilkening J, Wilke A, Harrison T, D’Souza M, Meyer F. A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE. PLoS Computational Biology. 2012;8(6):e1002541. [PMC free article: PMC3369934] [PubMed: 22685393]
  • Kent WJ. BLAT—the BLAST-like alignment tool. Genome Research. 2002;12(4):656–664. [PMC free article: PMC187518] [PubMed: 11932250]
  • Markowitz VM, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A, Zhao X, Dubchak I, Hugenholtz P, Anderson I, Lykidis A, Mavromatis K, Ivanova N, Kyrpides NC. The integrated microbial genomes (IMG) system. Nucleic Acids Research. 2006;34(Database issue):D344–D348. [PMC free article: PMC1347387] [PubMed: 16381883]
  • McDonald D, Clemente JC, Kuczynski J, Rideout J, Stombaugh J, Wendel D, Wilke A, Huse S, Hufnagle J, Meyer F, Knight R, Caporaso J. The Biological Observation Matrix (BIOM) format or: How I learned to stop worrying and love the ome-ome. Gigascience. 2012;1:7. [PMC free article: PMC3626512] [PubMed: 23587224]
  • Meyer F, Goesmann A, McHardy AC, Bartels D, Bekel T, Clausen J, Kalinowski J, Linke B, Rupp O, Giegerich R, Pühler A. GenDB—an open source genome annotation system for prokaryote genomes. Nucleic Acids Research. 2003;31(8):2187–2195. [PMC free article: PMC153740] [PubMed: 12682369]
  • Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. The metagenomics RAST server—A public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9:386. [PMC free article: PMC2563014] [PubMed: 18803844]
  • Nealson KH, Venter JC. Metagenomics and the global ocean survey: What’s in it for us, and why should we care? ISME Journal. 2007;1(3):185–187. [PubMed: 18043628]
  • Overbeek R, Begley T, Butler RM, Choudhuri JV, Diaz N, Chuang HY, Cohoon M, de Crécy-Lagard V, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Rückert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V. The subsystems approach to genome annotation and its use in the Project to Annotate 1000 Genomes. Nucleic Acids Research. 2005;33(17) [PMC free article: PMC1251668] [PubMed: 16214803]
  • Overbeek R, Bartels D, Vonstein V, Meyer F. Annotation of bacterial and archaeal genomes: Improving accuracy and consistency. Chemical Reviews. 2007;107(8):3431–3447. [PubMed: 17658903]
  • Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT. Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods. 2009;6(9):639–641. [PubMed: 19668203]
  • Reeder J, Knight R. Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nature Methods. 2010;7(9):668–669. [PMC free article: PMC2945879] [PubMed: 20805793]
  • Tara Expeditions. 2012. [accessed October 2, 2012]. http://oceans​
  • Terragenome Consortium. 2012. [accessed October 2, 2012]. http://www​
  • Thomas T, Gilbert J, Meyer F. Metagenomics—a guide from sampling to data analysis. Microbial Informatics and Experimentation. 2012;2(1):3. [PMC free article: PMC3351745] [PubMed: 22587947]
  • Trimble WL, Keegan KP, D’Souza M, Wilke A, Wilkening J, Gilbert J, Meyer F. Short-read reading-frame predictors are not created equal: Sequence error causes loss of signal. BMC Bioinformatics. 2012;13(1):183. [PMC free article: PMC3526449] [PubMed: 22839106]
  • Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449(7164):804–810. [PMC free article: PMC3709439] [PubMed: 17943116]
  • Wilkening J, Wilke A, Desai N, Meyer F. Using clouds for metagenomics: A case study. IEEE Cluster 2009. 2009


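Nicholas J. Loman,32 Chrystala Constantinidou,32 Jacqueline Z. M. Chan,32 Mihail Halachev,32 Martin Sergeant,32 Charles W. Penn,32 Esther R. Robinson,33 and Mark J. Pallen32.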

32 Nicholas J. Loman, Chrystala Constantinidou, Jacqueline Z. M. Chan, Mihail Halachev, Martin Sergeant, Charles W. Penn and Mark J. Pallen are at the Institute of Microbiology and Infection, University of Birmingham, Birmingham B15 2TT, UK.
33 Esther R. Robinson is at the Nuffield Department of Clinical Laboratory Sciences, University of Oxford, Oxford OX3 9DU, UK.


Here, we take a snapshot of the high-throughput sequencing platforms, together with the relevant analytical tools, that are available to microbiologists in 2012, and evaluate the strengths and weaknesses of these platforms in obtaining bacterial genome sequences. We also scan the horizon of future possibilities, speculating on how the availability of sequencing that is ‘too cheap to meter’ might change the face of microbiology forever.

In bacteriology, the genomic era began in 1995, when the first bacterial genome was sequenced using conventional Sanger sequencing (Fleischmann et al., 1995). Back then, sequencing projects required six-figure budgets and years of effort. A decade later, in 2005, the advent of the first high-throughput (or “next-generation”) sequencing technologies signalled a significant advance in the ease and cost of sequencing (Metzker et al., 2005), delivering bacterial genome sequences in hours or days rather than months or years. High-throughput sequencing now delivers sequence data thousands of times more cheaply than is possible with Sanger sequencing. The availability of a growing abundance of platforms and instruments presents the user with an embarrassment of choice. Better still, vigorous competition between manufacturers has resulted in sustained technical improvements on almost all platforms. This means that in recent years our sequencing capability has been doubling every 6–9 months—much faster than Moore’s law.
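To put the comparison with Moore's law into numbers, the sketch below converts a doubling time into a fold increase over a fixed period. The 24-month doubling time used for Moore's law is a commonly quoted figure and is an assumption here, not a value taken from the text; the function name and the three-year window are likewise illustrative.

```python
def growth_factor(months_elapsed, doubling_time_months):
    """Fold increase in capacity after `months_elapsed`, given a doubling time."""
    return 2 ** (months_elapsed / doubling_time_months)


PERIOD_MONTHS = 36  # illustrative three-year window

sequencing_fast = growth_factor(PERIOD_MONTHS, 6)  # doubling every 6 months
sequencing_slow = growth_factor(PERIOD_MONTHS, 9)  # doubling every 9 months
moores_law = growth_factor(PERIOD_MONTHS, 24)      # ASSUMED 24-month doubling

print(f"sequencing capability: {sequencing_slow:.0f}- to {sequencing_fast:.0f}-fold")
print(f"Moore's law (assumed doubling time): {moores_law:.1f}-fold")
```

Under these assumptions, three years of doubling every 6 to 9 months gives a 16- to 64-fold increase, against roughly 3-fold for a 24-month doubling time.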

Here, we describe the sequencing technologies themselves, examine the practicalities of producing a sequence-ready template from bacterial cultures and clinical samples, and weigh up the costs of labour and kits. We look at the types of data that are delivered by each instrument, and describe the approaches, programs and pipelines that can be used to analyse these data and thus move from draft to complete genomes.

Several high-throughput sequencing platforms are now chasing the US$1,000 human genome (Venter, 2010). Given that the average bacterial genome is less than one-thousandth the size of the human genome, a back-of-the-envelope calculation suggests that a $1 bacterial genome sequence is an imminent possibility. In closing, we assess how close to reality the $1 bacterial genome actually is and explore the ways in which high-throughput sequencing might change the way that all microbiologists work.

A Variety of Approaches

High-throughput sequencing platforms can be divided into two broad groups depending on the kind of template used for the sequencing reactions. The earliest, and currently most widely used, platforms depend on the production of libraries of clonally amplified templates. These are produced through amplification of immobilized libraries made from a single DNA molecule in the initial sample. More recently, we have seen the arrival of single-molecule sequencing platforms, which determine the sequence of single molecules without amplification. Within these broad categories, there is considerable variation in performance—including in throughput, read length and error rate—as well as in factors affecting usability, such as cost and run time.

Template amplification technologies. In general terms, all of the platforms that are currently on the market rely on a three-stage workflow of library preparation, template amplification and sequencing (Figure A9-1). Library preparation begins with the extraction and purification of genomic DNA. Depending on the protocol, the amount of DNA required can vary from a few nanograms to tens of micrograms, meaning that success in this step depends on the ability to grow sufficient biomass. For some microorganisms, obtaining suitable DNA, in terms of both quantity and quality, can prove difficult. Therefore, before using expensive reagents for library preparation and sequencing, it is advisable to confirm, by fluorometry, that DNA of sufficient quantity and quality has been obtained. However, purchasing a suitable instrument to do this adds to the costs of establishing a sequencing capability (Box A9-1).

FIGURE A9-1. High-throughput sequencing platforms. The schematic shows the main high-throughput sequencing platforms available to microbiologists today, and the associated sample preparation and template amplification procedures. For full details, see main text.


BOX A9-1

The Add-on Cost of Sequencing. The costs of sequencing instruments and reagents are not the only issues that need to be taken into account when setting up a sequencing facility for microbial applications. So, what else do you need? Well, first you have ...

For shotgun sequencing, an initial fragmentation step is required to generate random, overlapping DNA fragments. Depending on the platform and application, these fragments can range from 150 bp to 800 bp in length; size selection either involves harvesting from agarose gels or exploits paramagnetic-bead-based technology. The selected fragments must also be sufficiently abundant to provide comprehensive and even coverage of the target genome. Two types of fragmentation are widely used: mechanical and enzymatic. Early protocols relied on mechanical methods such as nebulization or ultrasonication. Nebulization is an inexpensive method that can be easily adopted by any laboratory, but it results in large losses of input material and a broad range of fragment sizes, runs the risk of cross-contamination and cannot handle parallel processing. By contrast, ultrasonication instruments such as systems from Covaris or the Bioruptor systems from Diagenode allow parallel sample processing and minimize hands-on time and sample loss but come at a price that could be prohibitive for small laboratories. Mechanically generated fragments require repair and end-polishing before platform-specific adaptors can be ligated to the ends of the target molecules. These adaptors act as primer-binding sites for the subsequent template amplification reaction.

More recently, enzymatic methods have provided an alternative approach to producing random fragments of the desired length. These require less input DNA and offer easier, faster sample processing. Fragmentase (from New England Biolabs) is a mixture of a nuclease, which randomly nicks double-stranded DNA, and a T7 endonuclease, which cleaves the DNA. Together, these enzymes generate random double-strand DNA breaks in a time-dependent manner, allowing the user to tailor protocols in order to obtain products of the required length. Adaptors can then be ligated to these fragments in the usual way. Tagmentation (Caruccio, 2011) is a promising transposase-based approach that, in a single step, fragments DNA and incorporates sequence tags, which then take the place of adaptors. Currently, the only available implementation of tagmentation is within the Nextera system, which is only available for the Illumina platform. Several companies have produced automated liquid-handling machines that greatly reduce the hands-on time required for fragmentation approaches but significantly increase costs (Box A9-1).

In addition to supporting fragment-based sequencing, all template amplification platforms support mate pair sequencing, in which the ends of DNA fragments of a certain size (typical sizes are 3 kb, 6 kb, 8 kb or 20 kb) are joined together to form circular molecules. These molecules are then fragmented a second time. Fragments flanking the joins are then selected and end adaptors added. Sequencing through the joins provides valuable information about the location of sequences dispersed across the genome, facilitating assembly.

Paired-end sequencing has similarities to mate pair sequencing, but DNA fragments are sequenced from each end without the need for additional library preparation steps. The Illumina platform has direct support for paired-end sequencing. Using fragments that are shorter than the combined length of the forward and reverse reads (for example, 180 bp fragments combined with 2 × 100 base sequencing) permits overlapping pseudo-long reads to be generated. Alternatively, fragments of up to ~800 bp can be used. Longer fragments may result in a loss of amplification efficiency. The Ion Personal Genome Machine (PGM) (using the Ion Torrent platform, from Life Technologies) also has a bidirectional sequencing protocol that requires the removal of the chip after the initial run, a digestion step and a second sequencing run using a different sequencing primer. All platforms can handle PCR products, allowing adaptor sequences to be incorporated into the 5′ ends of primers.
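The way overlapping read pairs yield a single longer read can be sketched in a few lines. The example below is a minimal illustration that assumes error-free reads and exact matching in the overlap; the function names and the `min_overlap` parameter are hypothetical and do not correspond to any particular read-merging tool, which would also have to tolerate mismatches and use base qualities.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_overlap(fragment_len, read_len):
    """Bases shared by the two reads; a non-positive value means no overlap."""
    return 2 * read_len - fragment_len


def merge_pair(forward_read, reverse_read, min_overlap=5):
    """Merge an error-free read pair into one pseudo-long read, or return None.

    The reverse read is flipped onto the forward strand, then the longest
    exact suffix/prefix match of at least `min_overlap` bases is used.
    """
    reverse_as_forward = reverse_complement(reverse_read)
    longest = min(len(forward_read), len(reverse_as_forward))
    for overlap in range(longest, min_overlap - 1, -1):
        if forward_read[-overlap:] == reverse_as_forward[:overlap]:
            return forward_read + reverse_as_forward[overlap:]
    return None


# The case from the text: 180 bp fragments with 2 x 100 base reads.
print(expected_overlap(180, 100))

# Toy pair: a 30-base fragment read 20 bases from each end.
fragment = "ACGTTGCAAGGCTTACCGATGTCAGTACGA"
forward = fragment[:20]
reverse = reverse_complement(fragment[-20:])
print(merge_pair(forward, reverse) == fragment)
```

For the 180 bp example, the two 100-base reads share 20 bases, so the merged read spans the whole fragment. This sketch does not handle fragments shorter than a single read, where each read would run into adaptor sequence.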

For all platforms, it is highly advisable to assess the quality and quantity of the sequence library before subjecting it to amplification. Different instruments for quality assessment are recommended by different manufacturers. Examples include the 2100 Bioanalyzer (from Agilent Technologies), fluorometers such as the NanoDrop 3300 (from Thermo Scientific) or the Qubit (from Life Technologies), and quantitative PCR using any of a number of available quantitative PCR machines along with either own-design or commercially available assays. Purchasing a suitable instrument for this step can add several thousand dollars to the costs of establishing a sequencing capability (Box A9-1).

In preparation for amplification, template molecules are immobilized on a solid surface, which is a flow cell for sequencing with the Illumina platform and solid beads or ion sphere particles for other approaches. Simultaneous solid-phase amplification of millions or billions of spatially separated template fragments prepares the way for massively parallel sequencing. For the Illumina platform, template amplification is automated and is performed either directly on the instrument (for the MiSeq, and the HiSeq 2500 sequencer in rapid-run mode) or using the cBot, a separate instrument that is dedicated to this task (used in conjunction with the Genome Analyzer IIx and the HiSeq 2000 machine). Clusters are generated by bridge amplification on the surface of the flow cell. For platforms that use bead-based immobilization (the SOLiD [from Life Technologies], 454 and Ion Torrent platforms), amplified template sequence libraries are prepared off-instrument, relying on an emulsion PCR, in which the beads are enclosed in aqueous-phase microreactors and are kept separated from each other in a water-in-oil emulsion.

Sequencing chemistry. Although these platforms rely on a sequencing-by-synthesis design, they differ in the details of the sequencing chemistry and the approach used to read the sequence. The Illumina sequencing platform depends on Solexa chemistry (Bentley et al., 2008), which includes reversible termination of sequencing products. In each sequencing cycle, a mixture of fluorescently labelled ‘reversible terminator’ nucleotides with protected 3′-OH groups (and a different emission wavelength for each nucleotide) is perfused across the flow cell. Wherever a complementary nucleotide is present on the template strand, the terminator is incorporated and imaged, and then the signal is quenched and the terminator nucleotide is chemically deprotected at the 3′-OH group.

The 454 and Ion Torrent sequencing platforms avoid the use of terminators. Instead, in each cycle a single kind of dNTP is flowed across the template. When there is base complementarity between the dNTP and the next available position in the template, the DNA polymerase incorporates the base onto the extending strand, liberating pyrophosphate and hydrogen ions. When there is no complementarity, DNA synthesis is halted temporarily; each type of dNTP is flowed across the template in turn according to the dispensing cycle, and DNA synthesis is thus re-initiated when the next complementary dNTP is added. The 454 platform exploits a pyrosequencing approach (Margulies et al., 2005; Ronaghi et al., 1998) whereby the presence of pyrophosphate is signalled by visible light as the result of an enzyme cascade. The order and intensity of the light peaks are recorded as “flowgrams.” The Ion Torrent platform relies on a modified silicon chip to detect hydrogen ions that are released during base incorporation; the resulting lack of reliance on imaging makes this platform the first “post-light” sequencing instrument (Rothberg et al., 2011).
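The flow-based logic described above can be illustrated with an idealized, noise-free simulation. In the sketch below, each flow of a single dNTP type extends the strand through an entire run of identical bases, so the recorded signal is the run length. The flow order "TACG", the function names and the `max_flows` cap are assumptions made for illustration; real instruments record noisy intensities, not exact integers.

```python
def simulate_flowgram(read, flow_order="TACG", max_flows=400):
    """Return the ideal signal per flow for the strand being synthesized.

    `read` is the sequence of the extending strand. Each flow offers one
    nucleotide type; the signal is the number of bases incorporated
    (0 if the next base in the read is a different nucleotide).
    """
    signals = []
    position = 0
    for flow in range(max_flows):
        if position >= len(read):
            break
        base = flow_order[flow % len(flow_order)]
        run_length = 0
        while position < len(read) and read[position] == base:
            run_length += 1
            position += 1
        signals.append(run_length)
    return signals


def decode_flowgram(signals, flow_order="TACG"):
    """Rebuild the read from ideal integer signals."""
    return "".join(
        flow_order[i % len(flow_order)] * count for i, count in enumerate(signals)
    )


signals = simulate_flowgram("TTAGGGC")
print(signals)
print(decode_flowgram(signals))
```

Because a homopolymer is read as a single signal whose intensity must be converted back into a count, any noise in that intensity translates directly into an inserted or deleted base, which is consistent with the homopolymer errors reported for these platforms.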

The SOLiD platform (Valouev et al., 2008) and the platform from Complete Genomics (Drmanac et al., 2010) depend on sequencing by ligation. In this approach, fluorescent probes undergo iterative steps of hybridization and ligation to complementary positions in the template strand at the 5′ end of the extending strand, followed by fluorescence imaging to identify the ligated probe.

Single-molecule sequencing. Single-molecule sequencing brings the promise of freedom from amplification artefacts as well as from onerous sample and library preparations. The HeliScope Single-Molecule Sequencer (from Helicos BioSciences) was the first platform for single-molecule sequencing to hit the marketplace in 2009 (Bowers et al., 2009). This technology applies one-colour reversible-terminator sequencing to unamplified single-molecule templates. However, this platform has been hampered by its high price and poor instrument sales and, following the delisting of the company from the stock market, there are significant doubts over the future of the platform.

More recently, Pacific Biosciences has delivered “real-time sequencing,” in which dye-labelled nucleotides are continuously incorporated into a growing DNA strand by a highly processive, strand-displacing φ29-derived DNA polymerase (Eid et al., 2009). Each DNA polymerase molecule is tethered within a zero-mode waveguide detector, which allows continuous imaging of the labelled nucleotides as they enter the strand (Levene et al., 2003).

Choosing a Platform

High-end instruments. The high-throughput sequencing market presents the user with a challenging choice between bulky, expensive high-end instruments and the new generation of bench-top instruments (Tables A9-1, A9-2). The high-end machines include the PacBio RS (from Pacific Biosciences), the HiSeq instruments, the Genome Analyzer IIx, the SOLiD 5500 series and the 454 GS FLX+ system. These deliver a high throughput and/or long read lengths but come with set-up costs of hundreds of thousands of dollars, placing them beyond the reach of the average research laboratory or even department. These machines are thus only suitable for large sequencing centres or core facilities. This raises the important question of where an ‘average’ microbiologist should source sequencing from.

TABLE A9-1. Comparison of Next-Generation Sequencing Platforms.



TABLE A9-2. The Applicability of the Major High-Throughput Sequencing Platforms.



These instruments can deliver dozens to thousands of bacterial genomes per run, as illustrated by several high-impact publications on bacterial genomes and metagenomes (Harris et al., 2012; Hess et al., 2011; Mutreja et al., 2011; Qin et al., 2010). However, to achieve efficiencies in time and cost, optimum sequencing of microbial samples on such instruments requires onerous and expensive bar-coding and multiplexing of samples and/or subdivision of runs (for example, through gaskets or the use of single channels on the Illumina platform), as well as a sophisticated scheduling system. Compare sequencing a single human genome with the equivalent sequencing throughput for 1,000 average-sized bacterial genomes: although the sequencing run itself may be comparable in both scenarios, >1,000 samples and libraries need to be prepared for the bacterial run, compared with just one for the human genome. The costs and effort involved in sequencing 1,000 bacterial genomes therefore vastly outweigh the requirements for sequencing a single human genome, so the hasty calculation that one human genome-sequencing project equates to 1,000 bacterial genome-sequencing projects starts to look rather optimistic.
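The gap between the hasty calculation and the real cost can be made concrete with a toy cost model. In the sketch below, only the US$1,000 run cost and the figure of 1,000 bacterial genomes come from the text; the per-sample library preparation cost is a purely hypothetical placeholder, and the function name is illustrative.

```python
def cost_per_bacterial_genome(run_cost, genomes_per_run, library_prep_cost):
    """Per-genome cost: an equal share of the run plus one library preparation."""
    return run_cost / genomes_per_run + library_prep_cost


RUN_COST = 1000.0         # the US$1,000 sequencing run discussed in the text
GENOMES_PER_RUN = 1000    # bacterial genomes multiplexed into that run
LIBRARY_PREP_COST = 50.0  # HYPOTHETICAL per-sample cost (kits, labour, bar codes)

# Hasty calculation: ignore sample and library preparation entirely.
naive = cost_per_bacterial_genome(RUN_COST, GENOMES_PER_RUN, 0.0)

# Same run, but every one of the 1,000 samples needs its own library.
realistic = cost_per_bacterial_genome(RUN_COST, GENOMES_PER_RUN, LIBRARY_PREP_COST)

print(f"naive: ${naive:.2f} per genome")
print(f"with library preparation: ${realistic:.2f} per genome")
print(f"whole bacterial project: ${realistic * GENOMES_PER_RUN:,.0f}")
```

Whatever the true per-sample figure is, it is paid 1,000 times for the bacterial run and once for the human genome, so the preparation term dominates as soon as it exceeds the per-genome share of the run.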

Bench-top instruments. Three modestly priced bench-top instruments with throughputs and workflows that are well suited to microbial applications have recently hit the market. The 454 GS Junior was released in early 2010 and is a smaller, lower-throughput version of the 454 GS FLX+ machine, exploiting similar emulsion PCR and pyrosequencing approaches but with lower set-up and running costs (Loman et al., 2012). The Ion PGM was launched in early 2011 and saw almost immediate use in the crowd-sourced analysis of the Shiga toxin-producing Escherichia coli (STEC) outbreak in Germany (Rohde et al., 2011; Mellmann et al., 2011). This platform has also shown the greatest improvement in performance in recent months: an assembly for the STEC outbreak strain was generated in May 2011 using data from five Ion Torrent 314 chips and consisted of more than 3,000 contigs, whereas comparable data from a single newer 316 chip assembled into fewer than 400 contigs. The MiSeq, which began to ship to customers in late 2011, is based on the existing Solexa chemistry but has dramatically reduced run times compared with the HiSeq (hours rather than days). This is made possible by the use of a smaller flow cell, leading to a reduced imaging time and faster microfluidics.

Each of these bench-top instruments is capable of sequencing a whole bacterial genome in days. The performance of all three instruments was recently compared by sequencing a British isolate from the German STEC outbreak of 2011 (Loman et al., 2012). In this evaluation, all three bench-top sequencing platforms generated useful draft genome sequences with assemblies that mapped to ≥95% of the reference genome, so by these criteria all could be judged fit for purpose. However, no instrument was able to generate accurate one-contig-per-replicon assemblies that might equate to a finished genome.

The MiSeq was found to have the highest throughput per run, lowest error rate and most user-friendly workflow of the three instruments: hands-on time is low because template amplification is carried out directly on the instrument without manual intervention. However, a paired-end 150-base sequencing run took more than 27 hours. The MiSeq is notable for being able to sequence fragments from both ends (paired-end mode) without changes to the library preparation stage or additional intervention during sequencing.

The 454 GS Junior produced the longest reads (mean 522 bases) and generated the least fragmented assemblies but had the lowest throughput and a cost-per-base that was at least one order of magnitude higher than the cost for the other two platforms. The Ion PGM delivered the fastest throughput per hour (80–100 Mb) and had the shortest run time (around 3 hours) but also had the shortest reads (mean 121 bases), although kits producing 200 bases have since been made available for this instrument. The Ion PGM and 454 GS Junior were both prone to making mistakes in homopolymeric tracts, and these mistakes caused assembly errors that resulted in frame-shifts in coding regions, even when data were assembled at high read coverage.

Coping with the Data

The high-end sequencing platforms make considerable demands on the local information technology infrastructure in terms of data tracking and analysis, short-term storage and long-term archiving. Bench-top instruments have more modest information technology requirements. However, each platform delivers data in a slightly different format, and saying that one has sequenced a bacterial genome means different things on different platforms and can create difficulties when comparing or combining data generated on different platforms (Table A9-2).

There are two main analytical approaches to the exploitation of high-throughput sequencing data: reads can be aligned—that is, mapped—to a known reference sequence or subjected to de novo assembly. The choice of strategy depends on the read length obtained (short reads are better mapped to a reference), the availability of a good reference sequence and the intended biological application (for example, genomic epidemiology versus pathogen biology).

To document genetic variation in the genomes of multiple highly related strains, a mapping approach is efficient and often sufficient. In this situation, sequence variants can be called by aligning reads to a reference genome using short-read-mapping tools (see Supplementary information S1 (table)). A mapping approach is problematic when dealing with reads from repetitive regions or from parts of the genome that are absent from the reference genome, or when a closely related reference genome is unavailable.
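The mapping approach, and the trouble that repeats cause for it, can be shown with a deliberately naive mapper. The sketch below slides a read along a reference and reports every position where it matches with at most a given number of mismatches; the function name, the toy reference and the mismatch threshold are all hypothetical, and real short-read mappers rely on indexed data structures, gapped alignment and base qualities instead of this brute-force scan.

```python
def map_read(read, reference, max_mismatches=1):
    """Return (start, mismatches) for every placement within the threshold.

    `mismatches` is a list of (reference_position, reference_base, read_base)
    tuples, i.e. candidate single-base variants for that placement.
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [
            (start + offset, ref_base, read_base)
            for offset, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if len(mismatches) <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy reference in which "GATTACA" occurs twice (a repeat).
reference = "GATTACAGGCCTTAGATTACA"

# A read carrying one substitution maps uniquely and exposes the variant.
print(map_read("AGGCGTT", reference))

# A read from the repeat maps equally well to two places.
print(map_read("GATTACA", reference, max_mismatches=0))
```

The first read has a single placement, so its mismatch can be reported as a candidate variant; the second has two equally good placements, which is the ambiguity that makes repetitive regions problematic for mapping.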

De novo assembly is more informative when dealing with a new pathogen or a new strain of a well-known pathogen. Sequencing errors can have a significant impact on assembly. When platforms produce random errors, the effect of these errors on assembly can be overcome by increasing the depth of coverage. However, when errors are systematic and occur in predictable contexts (for example, in homopolymers), increasing the depth of coverage is unlikely to help, and it may be necessary to sequence the troublesome regions using an alternative technology. Very high-quality, near complete references may be obtained by a hybrid approach, such as in recent studies combining Pacific Biosciences and Illumina data (Bashir et al., 2012; Koren et al., 2012).
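Why extra coverage rescues random errors but not systematic ones can be seen from a simple binomial model. The sketch below computes the probability that more than half of the reads covering a position are wrong, assuming that reads err independently and that all erroneous reads agree on the same wrong base (a pessimistic simplification). The error rates of 1% and 60% are hypothetical values chosen for illustration, not measurements from any platform.

```python
from math import comb


def majority_error_probability(depth, per_read_error):
    """P(a strict majority of `depth` independent reads is wrong at one position)."""
    return sum(
        comb(depth, wrong)
        * per_read_error ** wrong
        * (1 - per_read_error) ** (depth - wrong)
        for wrong in range(depth // 2 + 1, depth + 1)
    )


for depth in (1, 3, 11, 31):
    random_case = majority_error_probability(depth, 0.01)     # HYPOTHETICAL 1% random error
    systematic_case = majority_error_probability(depth, 0.60)  # HYPOTHETICAL 60% error in a hard context
    print(f"depth {depth:>2}: random {random_case:.2e}, systematic {systematic_case:.3f}")
```

When the per-read error rate is low, the chance of a wrong consensus falls rapidly with depth. When a particular sequence context is miscalled more often than not, adding reads only reinforces the wrong call, which is why an alternative technology may be needed for such regions.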

A variety of commonly used assemblers is now available (see Supplementary information S1 (table)), ranging from the platform specific (for example, Newbler from Roche) to the more generally applicable (for example, MIRA (Chevreux et al., 2004), Velvet (Zerbino and Birney, 2008), and the CLC Genomics Workbench from CLC Bio). De novo assemblies can be compared using Mauve (Darling et al., 2004) or Mugsy (Angiuoli and Salzberg, 2011), and the assemblies can be manually examined using the Tablet viewer (Milne et al., 2010). For annotation of assemblies, Glimmer (Delcher et al., 1999) works well for coding-sequence prediction, while tRNAscan-SE (Schattner et al., 2005) and RNAmmer (Lagesen et al., 2007) work well for stable-RNA prediction. There are numerous pipelines for automatic annotation of de novo assemblies, including RAST (Aziz et al., 2008), IMG/ER (Markowitz et al., 2009) and the IGS Annotation Engine (developed by the Institute for Genome Sciences, University of Maryland School of Medicine, USA), although care must be taken when interpreting results from such services, as the public databases used contain annotation errors that are then propagated to newly sequenced genomes (Richardson and Watson, 2012).

For microbial applications, all of the above programs run quickly (in minutes or hours) and are not particularly processor intensive. Some workflows combine a series of programs and provide an accessible interface for microbiologists who are not bioinformatics specialists. For example, xBASE-NG provides a “one-stop shop” for assembly, annotation and comparison of bacterial genome sequences (Chaudhuri et al., 2008). Sophisticated phylogenetic analyses are more demanding and may be beyond the capability of the average research group. One particular issue when constructing bacterial whole-genome phylogenies is the clouding of phylogenetic signal by recombination events and homoplasy (Marttinen et al., 2012). Algorithms such as ClonalFrame (Didelot and Falush, 2007) and ClonalOrigin (Didelot et al., 2010) take multiple whole-genome alignments as input and attempt to identify blocks of recombination. These approaches are computationally very expensive, and there is no “off the shelf” solution to comparing hundreds or thousands of bacterial genomes. There is a growing interest in alignment-free approaches for constructing bacterial phylogenies, as it is thought that these approaches may help address the computational challenges of these analyses (Köser et al., 2012).

A recurring problem with data from high-throughput sequencing is meeting the requirement, as stipulated by journals and funders, that data be lodged in the public domain. Unannotated assembled sequences can be uploaded to conventional sequence databases, such as GenBank, fairly easily. However, submission of annotated sequences can be onerous, slowing down the process of publication even further. Submission of sequence reads to short-read archives may be hampered by slow data transfer rates, and it remains uncertain how sustainable such archives will prove to be in the future. There may come a time when the easiest way to obtain such data will be to re-sequence the sample, rather than upload, archive and retrieve large data sets.

Current Applications and Future Prospects

High-throughput sequencing has already transformed microbiology. Rapid, low-cost genome sequencing has helped make genomic epidemiology a reality, allowing us to track the spread of pathogens through hospitals (Köser et al., 2012; Lewis et al., 2010), communities (Gardy et al., 2011; Mellmann et al., 2011; Rohde et al., 2011) and across the globe (Beres et al., 2010; Harris et al., 2011; Mutreja et al., 2011). High-throughput sequencing has already had a huge impact on our understanding of microbial evolution, whether within a single patient over years or decades (for example, Pseudomonas aeruginosa in a patient with cystic fibrosis [Cramer et al., 2011]) or globally across the centuries (for example, influenza virus in the 1918 influenza pandemic [Dunham et al., 2009] or mediaeval Yersinia pestis in the Black Death [Bos et al., 2011]). Genome sequences have even been obtained from single microbial cells (Woyke et al., 2009).

There are many applications beyond mere genome sequencing. High-throughput sequencing has opened up new avenues for sequence-based profiling and metagenomics of complex microbial communities, including those associated with human health and disease (Hess et al., 2011; Qin et al., 2010). Particularly exciting is the promise of culture-independent approaches to pathogen discovery and detection (Lipkin, 2010). In the research laboratory, sequencing is taking over from microarrays as the method of choice for studying gene expression (using RNA sequencing (RNA-seq)) (Passalacqua et al., 2009; Sharma et al., 2010; Sorek and Cossart, 2010), mutant libraries (using Tn-seq and transposon-directed insertion site sequencing (TraDIS)) (Langridge et al., 2009; van Opijnen et al., 2009) and protein–DNA interactions (using chromatin immunoprecipitation followed by sequencing (ChIP–Seq)) (Grainger et al., 2009).

So, what does the future hold? For current platforms, we can expect to see cheaper, easier library preparation methods and ever-higher sequencing throughputs. However, with the arrival of transformative new technologies (Branton et al., 2008) (Box A9-2), this might be seen as tinkering around the edges. The tipping point has already been reached such that the staff and infrastructure costs of handling and analysing sequence data outweigh the costs of generating that data. If the promise of portable, single-molecule, long-read-length sequencing bears fruit and these technologies show the same steady increase in functionality and cost-effectiveness that we have seen with earlier high-throughput sequencing platforms, we could be just a few years away from user-friendly, “$1-a-pop” bacterial genome sequencing.

BOX A9-2

Oxford Nanopore: The Game Changer? In February 2012, at a conference in the United States, the British company Oxford Nanopore Technologies announced a new, near-market “strand sequencing” technology that exploits protein nanopores embedded …

As we have argued elsewhere (Pallen and Loman, 2011), high-throughput sequencing may well be poised to make a decisive impact on clinical microbiology, but there are still many difficulties to be overcome—for example, in presenting complex information to clinicians, in agreeing common formats for data sharing, in integrating genomics with clinical informatics and clinical practice, in benchmarking novel technologies and in gaining regulatory approval (from the US FDA and other bodies) for clinical applications of these technologies. One thing is certain: thanks to the expected relentless progress in sequencing technology, microbiology in the next 20 years will look nothing like it does now.


The authors thank the anonymous reviewers for their help and suggestions.

Competing Interests Statement

The authors declare competing financial interests. Mark J. Pallen was a winner of an Ion Personal Genome Machine (PGM) (from Ion Torrent, part of Life Technologies) in the European Ion PGM Grant Programme. Nicholas J. Loman has received expenses to speak at an Ion Torrent meeting organized by Life Technologies and has received honoraria and expenses from Illumina to speak at Illumina meetings. Chrystala Constantinidou, Jacqueline Z. M. Chan, Mihail Halachev, Martin Sergeant, Charles W. Penn and Esther R. Robinson declare no competing financial interests.




35 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
36 International Vaccine Institute, SNU Research Park, Bongchun 7 dong, Kwanak, Seoul 151-919, Korea.
37 Department of Pharmacy, College of Pharmacy, Hanyang University, Kyeonggi-do 426-791, Korea.
38 Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Korea.
39 Centre for Microbiology Research, KEMRI at Kenyatta Hosp Compound, Off Ngong Road, PO Box 43640-00100, Kenya.
40 Department of Microbiology and Immunology and University of Gothenburg Vaccine Research Institute, The Sahlgrenska Academy at the University of Gothenburg, Box 435, 40530 Göteborg, Sweden.
41 National Institute of Cholera and Enteric Diseases, P-33, CIT Scheme XM, Beliaghata, Kolkata 700 010, India.
42 University of Cambridge, Department of Veterinary Medicine, Madingley Road, Cambridge CB3 0ES, UK.
* These authors contributed equally to this work.

Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3–5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and ongoing pandemic (Chin et al., 2011). Only isolates in serogroup O1 (consisting of two biotypes known as “classical” and “El Tor”) and the derivative O139 (Chun et al., 2009; Hochhut and Waldor, 1999) can cause epidemic cholera (Chun et al., 2009). It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic (Chin et al., 2011). Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.


Whole-genome analysis is perhaps the ultimate approach to building a robust phylogeny in recently emerged pathogens, through the identification of SNPs and other rare genetic variants (Harris et al., 2010). Therefore, we sequenced the genomes of 136 isolates of V. cholerae, the causative agent of several million cholera cases each year. These sequences, including 113 isolates from the seventh pandemic, were added to 18 previously published genomes (CDC, 2010; Chin et al., 2011; Chun et al., 2009) to produce a global genomic database from isolates collected in the course of a century. We included representative El Tor isolates collected in the past four decades and compared these to previously reported and novel genome sequences of both classical and non-O1 types (Chin et al., 2011; Chun et al., 2009).

The sequence reads were mapped to the reference sequence of El Tor N16961 (Heidelberg et al., 2000), a seventh-pandemic V. cholerae that was isolated in Bangladesh in 1975 (see footnote to Supplementary) and the resulting consensus tree identified eight distinct phyletic lineages (L1–L8, see Supplementary Fig. 1 and Supplementary Table 1 for strain and lineage information), six of which incorporated O1 clinical isolates. The classical isolates formed a distinct, highly clustered group (L1), distant from the El Tor isolates of the seventh pandemic (L2). It is clear from Supplementary Fig. 1 that the classical and El Tor clades did not originate from a recent common ancestor and instead seem to be independent derivatives with distinct phylogenetic histories, consistent with previous proposals (Chun et al., 2009). Isolates of L4 share a common ancestor with previously reported non-conventional O1 isolates (Chun et al., 2009) (Supplementary Fig. 2), and are likely to have acquired the O1 antigen genes by a recombination event onto a genetically distinct genome backbone. Isolates of L7 also have a distinct backbone, whereas L2, L3 (USA Gulf coast strains), L5, L6 and L8 share a more “El-Tor-like” genome backbone, and the L1 backbone is of the “classical” type.

Genome-wide SNP analysis showed that the 123 El Tor isolates in the L2 cluster (Supplementary Fig. 1) differed from the reference by only 50–250 SNPs. With this large sample size we were able to construct a high-resolution phylogeny that shows unequivocally that the current pandemic is monophyletic and originated from a single source, providing a framework for future epidemiological and phenotypic analysis of V. cholerae, including transmission-tracking and typing.

Predicted recombined regions were identified, and along with genomic islands and mobile genetic elements, these were initially excluded from the phylogenetic analysis of seventh-pandemic isolates, to determine the underlying phylogeny. Notably, analysis of the tree (Figure A10-1; see Supplementary Fig. 3 for a tree with strain names) provides clear evidence of a clonal expansion of the lineage, with a strong temporal signature. This is most clearly illustrated by the fact that the most divergent isolates from the N16961 reference are represented by the oldest seventh-pandemic isolate in our collection, A6, collected in 1957, together with the most recent Haitian isolates (CDC, 2010) from late 2010. We performed a linear regression analysis on all the L2 isolates to calculate the rate of SNP accumulation on the basis of the date of isolation and the root-to-tip distance. The shape of the tree and temporal signatures in Figure A10-1 show a very consistent rate of SNP accumulation, 3.3 SNPs year⁻¹ (R² = 0.73, Supplementary Fig. 4) in the core genome, emphasizing the tree’s robustness and utility for transmission studies. The only exception to this is V. cholerae A4, a repeatedly passaged laboratory strain that was originally isolated in 1973 (Supplementary Figs 3 and 4). The estimated rate of mutation for our seventh-pandemic V. cholerae collection was 8.3 × 10⁻⁷ SNPs site⁻¹ year⁻¹: between 2.5 and 5 times slower than the rate estimated for recent clonal expansions of some other human-pathogenic bacteria (Croucher et al., 2011; Harris et al., 2010).
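The root-to-tip regression described above can be sketched as an ordinary least-squares fit of distance from the root (in SNPs) against isolation date. The slope is the SNP accumulation rate and the x-intercept is a rough date for the root. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and the core-genome size used in the usage note is an assumed round figure that the text does not state.

```python
def fit_clock(dates, distances):
    """Least-squares fit of root-to-tip distance (SNPs) against isolation date.

    Returns (slope, intercept, r_squared, root_date), where slope is in
    SNPs per year and root_date is where the fitted line crosses zero.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all isolation dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If every distance is identical there is no variance to explain.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    root_date = -intercept / slope if slope != 0 else float("nan")
    return slope, intercept, r_squared, root_date


def per_site_rate(snps_per_year, core_genome_sites):
    """Convert a genome-wide rate (SNPs/year) to a per-site rate."""
    return snps_per_year / core_genome_sites
```

With an assumed core genome of about 4.0 million sites, `per_site_rate(3.3, 4_000_000)` gives roughly 8 × 10⁻⁷ SNPs per site per year, the same order as the per-site rate quoted in the text.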

Figure A10-1. A maximum-likelihood phylogenetic tree of the seventh-pandemic lineage of V. cholerae based on SNP differences across the whole core genome, excluding probable recombination events. The pre-seventh-pandemic isolate M66 was used as an outgroup to root …

The seventh-pandemic tree can be subdivided into three major groups or clades by clustering using Bayesian analysis of population structure (Corander et al., 2003, 2008) (shown as waves 1–3 in Figure A10-1); this clustering is mostly consistent with the cholera toxin (CTX) type of the three clades, which represent independent waves of transmission. Although examples of genetic determinants differentiating these three CTX types have previously been published (Safa et al., 2010), they have not been put into a phylogenetic context, undermining efforts to investigate the evolutionary aspects of their emergence. Perhaps as a result, there has been substantial uncertainty in naming new CTX types as they have been discovered. Our data show that the first CTX type is canonical CTX El Tor and we propose that it be renamed CTX-1; for the other two we propose a new expandable nomenclature and class them as CTX-2 and CTX-3 (Supplementary Table 2).

Isolates spanning A18 to PRL5 (the lower clade in Figure A10-1) represent wave 1, covering about 16 years (1977–1992). All isolates in this group lack the integrative and conjugative element (ICE) of the SXT/R391 family, encoding resistance to several antibiotics (Garriss et al., 2009; Wozniak et al., 2009). It is within this time period that seventh-pandemic cholera occurred in South America (Heidelberg et al., 2000). Our data show that the South American isolates form a discrete cluster, which also includes a single Angolan isolate collected in 1989. The position of the Angolan isolate at the base of the South American group indicates that transmission to South America may have been via Africa, as previously proposed (Lam et al., 2010). We used BEAST (Drummond et al., 2006) to translate evolutionary distance in SNPs into time (Supplementary Fig. 5) and this indicated that transmission to South America is likely to have occurred between 1981 and 1985. The branch harbouring this West African–South American (WASA) clade is distinguished from all other V. cholerae by the acquisition of novel VSP-2 genes (O’Shea et al., 2004) and a novel genomic island that we have denoted WASA1 (Supplementary Table 3). Notably, the Angolan isolate A5 and all the South American isolates are discriminated by just ten SNPs. Based on the accumulation rate of 3.3 SNPs year⁻¹ (Supplementary Fig. 4), the 3-year time period between the isolation of A5 and the oldest South American isolate included in this study, A32, is consistent with previous studies indicating that cholera spread as a single epidemic (Lam et al., 2010).
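The arithmetic behind this consistency check can be written out as a back-of-the-envelope estimate. It assumes that the ten discriminating SNPs accumulated along a single lineage at the whole-collection rate; the symbols t (elapsed time), d (SNP difference) and r (accumulation rate) are introduced here for illustration only.

```latex
\[
t \;\approx\; \frac{d}{r}
  \;=\; \frac{10\ \text{SNPs}}{3.3\ \text{SNPs year}^{-1}}
  \;\approx\; 3.0\ \text{years}
\]
```

An expected interval of about three years matches the gap between the isolation dates of A5 and A32 reported above.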

The first acquisition of an SXT/R391 ICE lies at the point of transition from the wave-1 cluster to the wave-2 cluster. Using our dated phylogeny (Supplementary Fig. 5) (Drummond et al., 2006), we were able to date this transition and the first acquisition of SXT/R391 ICE to 1978–84, ten years before its discovery in O139 strains, which also fits with the otherwise surprising discovery of SXT in a Vietnamese strain isolated before 1992 (Bani et al., 2007). This date would also correspond to the most recent common ancestor (MRCA) of the O1 and O139 serogroup isolates. Analysis of the diversity of the common regions of SXT/R391 ICEs in our seventh-pandemic collection (Supplementary Fig. 6) shows that they are discriminated by 3,161 SNPs, compared to only 1,757 SNPs used to define the core whole-genome phylogeny in Figure A10-1. This indicates either that there have been several recombination events within these ICEs, or that they have been acquired independently several times on the tree (Garriss et al., 2009). Isolates from wave 2 represent a discrete cluster that shows a complex pattern of accessory elements in the CTX locus (Figure A10-1) and a wide phylogeographical distribution. It is also notable that isolates collected in Vietnam in 1995–2004 and strain A109 are the only wave-2 isolates studied from this time period that lack an SXT/R391 ICE. We examined the genomic locus in these clones that marks the point of insertion of SXT/R391 ICE in all other V. cholerae isolates and found no remnants of this conjugative element, which may have been lost from this lineage (no “scar” in DNA sequence is expected after the precise excision of SXT/R391 ICE).

Ignoring the CTX-related genomic regions, the seventh-pandemic L2 isolates show relatively little evidence of recombination either within or from outside the tree. On the basis of the SNP distribution, 1,930 out of 2,027 SNPs (Supplementary Table 4) are congruent with the tree, leaving 97 homoplasies that could be due to selection or homologous recombination among the L2 isolates. Only 270 SNPs were predicted to be due to homologous recombination from outside the tree. The only two branches in which the SNP distribution indicated considerable recombination were those leading to the WASA cluster (Supplementary Fig. 7) and the O139 serogroup. Aside from the acquisitions of CTX and the SXT/R391 ICEs, we found evidence of gene flux affecting only 155 other genes (Supplementary Figs 8 and 9 and Supplementary Table 3).

Also represented in our collection are two isolates of serogroup O139, which are known to have arisen from a homologous replacement of their O-antigen determinant into an El Tor genomic backbone (Chun et al., 2009; Hochhut and Waldor, 1999; Lam et al., 2010). CTX types that are different from El Tor, classical, CTX-2 and CTX-3 have been reported for the O139 serogroup (Basu et al., 2000; Faruque and Mekalanos, 2003; Faruque et al., 2000; Nair et al., 1994); however, the phylogenetic position of the two strains included in this study shows that O139 was derived from O1 El Tor and therefore represents another distinct but spatially restricted wave from the common source.

We were also able to date the ancestor of the El Tor seventh-pandemic lineage, L2, as having existed in 1827–1936 (Supplementary Fig. 5), which is consistent with the predicted date of origin from the linear regression plot (1910, Supplementary Fig. 4). This also corresponds well with the date of isolation of the first El Tor biotype strain in 1905 (Cvjetanovic and Barua, 1972).

It is apparent from Figure A10-1 that V. cholerae wave 1, which spread globally, was later replaced by the more geographically restricted wave 2 and wave 3, a phenomenon supported by local clinical observations and phage analysis (Safa et al., 2010). This also reflects the fact that V. cholerae epidemics since 2003–2010 have been restricted to Africa and south Asia. Notably, the rates of SNP accumulation calculated independently for wave 1, wave 3 and wave 2 (2.3, 2.6 and 3.5 SNPs year⁻¹, respectively) are consistent with the rate calculated over the whole collection period (Supplementary Fig. 4).

The clonal clustering of L2 isolates, the constant rate of SNP accumulation and the temporal and geographical distribution support the concept that the seventh pandemic has spread by periodic radiation from a single source population located in the Bay of Bengal, followed by local evolution and ultimately local extinction in non-endemic areas. This is evidenced by the disappearance of wave-1 isolates, followed by the independent expansion of waves 2 and 3, both derived from the same original population, occurring within seven years of each other. These two waves are clearly distinguished from the first by the acquisition of SXT/R391 ICEs (Figure A10-1). Plotting the intercontinental spread of each wave onto the world map (Figure A10-2) clearly shows that the V. cholerae seventh pandemic is sourced from a single, restricted geographical location but has spread in overlapping waves. In these ancestral waves, there are at least four recent long-range transmission events (A–D in Figure A10-1), in which isolates clearly share a common ancestor with recent strains at distant locations, indicating that such events are not uncommon. The most recent example of this is the Haitian outbreak, in which strains share a very recent common ancestor with south-Asian strains at the tip of wave 3. The number of SNP differences, even at whole-genome resolution, between the Haitian and the most closely related Indian and Bangladeshi strains is very low. This demonstrates that the Haitian strains must have come from south Asia, at most within the last six years. However, the limited discrimination means that it may prove challenging to make country-specific inferences as to the origins of the Haitian strains on the basis of DNA sequence alone. For such conclusions to be robust, great care must be taken in the selection of samples for analysis.

Figure A10-2. Transmission events inferred for the seventh-pandemic phylogenetic tree, drawn on a global map. The date ranges shown for transmission events are taken from the BEAST analysis, and represent the median values for the MRCA of the transmitted strains (later …)

Despite clear evidence of sporadic long-range transmission events that are likely to be associated with direct human carriage, the overall pattern seen in our data is one of continued local evolution of V. cholerae in the Bay of Bengal, with several independent waves of global transmission resulting in short-term epidemics in non-endemic countries. Although our sample set is substantial, there are clearly areas where geographical coverage is limited. However, the structure of the tree, with deep branches between the major waves, means that increasing the number of strains and the resolution further should only identify further independent waves of transmission. Indeed, we cannot rule out the possibility of an El Tor population persisting or evolving as a new wave of the seventh pandemic; for example, in areas such as China that were not sampled in this study.

One notable factor in the ongoing evolution of pandemic cholera was the acquisition of the SXT/R391-family antibiotic resistance element. The clinical use of the antibiotics tetracycline and furazolidone for cholera treatment started in 1963 and 1968 respectively, about 15 years before our prediction of the first acquisition of an SXT/R391 ICE (1978–1984). Our analysis provides a robust framework for elucidating the evolution of the seventh pandemic further, and for studying the local evolution, particularly in the Bay of Bengal, that has such a key role in the evolution of cholera.


Genomic Library Creation and Multiplex Sequencing

Unique index-tagged libraries for each sample were created, and up to 12 separate libraries were sequenced in each of eight channels in Illumina Genome Analyser GAII cells with 54-base paired-end reads. The index-tag sequence information was used for downstream processing to assign reads to the individual samples (Harris et al., 2010).

Detection of SNPs in the Core Genome

The 54-base paired-end reads were mapped against the N16961 El Tor reference (accession numbers AE003852 and AE003853) and SNPs were identified as described in Croucher et al. (2011). The unmapped reads and the sequences that were not present in all genomes were not considered a part of the core genome, and therefore SNPs from these regions were not included in the analysis. Appropriate SNP cutoffs were chosen to minimize the number of false-positive and false-negative calls; SNPs were filtered to remove those at sites with a SNP quality score lower than 30, and SNPs at sites with heterogeneous mappings were filtered out if the SNP was present in fewer than 75% of reads at that site. From the seventh-pandemic data set, high-density SNP clusters indicating possible recombination were excluded (Croucher et al., 2011). In total, 2,027 SNPs were detected in the core genome of the El Tor lineage. Of these, 270 SNPs were predicted to be due to recombination. Removing these provided a data set characterized by 1,757 SNPs: these were used to produce the final phylogeny.
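The filtering rules above can be sketched as a pair of predicates over per-site SNP calls. This is an illustrative sketch only: the `SnpCall` record and its field names are hypothetical, and the recombination flag stands in for the high-density cluster detection of Croucher et al. (2011), which is not reimplemented here.

```python
from dataclasses import dataclass

MIN_QUALITY = 30          # SNP quality scores lower than 30 are removed
MIN_ALT_FRACTION = 0.75   # SNP must be present in at least 75% of reads


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site record; field names are assumptions."""
    position: int
    quality: float
    alt_read_fraction: float        # fraction of reads supporting the SNP
    in_core_genome: bool            # site is present in all genomes
    in_recombination_cluster: bool  # flagged as a high-density SNP cluster


def passes_core_filters(call: SnpCall) -> bool:
    """Apply the core-genome, quality and read-support cutoffs."""
    return (
        call.in_core_genome
        and call.quality >= MIN_QUALITY
        and call.alt_read_fraction >= MIN_ALT_FRACTION
    )


def select_phylogeny_snps(calls):
    """Keep filtered core SNPs that are not predicted to be recombinant."""
    return [
        c for c in calls
        if passes_core_filters(c) and not c.in_recombination_cluster
    ]
```

Applied to the counts reported above, the second step corresponds to 2,027 core SNPs minus 270 predicted recombinant SNPs, leaving the 1,757 SNPs used for the final phylogeny.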

Comparative Genomics

Raw Illumina data were split to generate paired-end reads, and assembled using a de novo genome-assembly program, Velvet v0.7.03 (Zerbino et al., 2008), to generate a multi-contig draft genome for each of 133 V. cholerae strains (Harris et al., 2010). The overlap parameters were optimized to give the highest N50 value. Because seventh-pandemic V. cholerae strains are closely related in the core, Abacas (Assefa et al., 2009) was used to order the contigs using the N16961 El Tor strain as a reference, followed by annotation transfer from the reference strain to each draft genome (Harris et al., 2010). Using the N16961 sequence as a database to perform a TBLASTX (Altschul et al., 1990) for each draft genome, a genome comparison file was generated that was subsequently used in the Artemis comparison tool (Carver et al., 2008) to compare the genomes manually and search for novel genomic islands.
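For readers unfamiliar with the statistic, N50 is the contig length at which the contigs of that length or longer together cover at least half of the assembly. The sketch below computes it and picks the parameter setting with the highest N50. The helper names and the idea of keying candidate assemblies by a parameter value are assumptions for illustration and are not part of Velvet.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover half the assembly."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def best_parameter(assemblies):
    """Return the parameter value whose assembly has the highest N50.

    `assemblies` maps a parameter value (for example an overlap or
    k-mer setting) to the list of contig lengths it produced.
    """
    return max(assemblies, key=lambda param: n50(assemblies[param]))
```

N50 rewards contiguity only, so a setting chosen this way would still need checking against the reference for misassemblies.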

Phylogenetic Analysis

A phylogeny was drawn for V. cholerae using RAxML v0.7.4 (Stamatakis, 2006) to estimate the trees for all SNPs called from the core genome. The general time-reversible model with gamma correction was used for among-site rate variation for ten initial trees (Harris et al., 2010). USA Gulf coast strains A215 and A325, which have substantially different core genomes from all other strains in our collection, were used as an outgroup to root the global phylogeny (Supplementary Fig. 1), whereas a pre-seventh-pandemic strain, M66 (accession numbers CP001233 and CP001234), and strain A6 (from our collection), were used to root the seventh-pandemic phylogenetic tree (Figure A10-1).

CTX Prophage Analysis

For each strain, the CTX structure and the sequence of rstA, rstR and ctxB was determined as in Lee et al. (2009) and Nguyen et al. (2009).

Linear Regression and Bayesian Analysis

The phylogram for the seventh pandemic was exported to Path-O-Gen v1.3, and a linear regression plot for isolation date versus root-to-tip distance was generated. The same plot was also constructed individually for the three waves, but A4, being a laboratory strain, was excluded from the latter analysis.

The presence of three waves was checked, and their makeup was determined, using a BAPS analysis performed on the SNP alignment containing the unique SNP patterns from the seventh-pandemic isolates. The program was run using the BAPS individual mixture model, and three independent iterations were performed using an upper limit for the number of populations of 20, 21 and 22 to obtain optimal partitioning of the sample. The dates for the acquisition of SXT and the ancestors of the three waves were inferred using the Bayesian Markov chain Monte Carlo framework BEAST (Drummond and Rambaut, 2007). We used the final SNP alignment with recombinant sites removed and fixed the tree topology to the phylogeny produced by RAxML, as described above. We used BEAST to estimate the rates of evolution on the branches of the tree using a relaxed molecular clock (Drummond et al., 2006), which allows rates of evolution to vary amongst the branches of the tree. BEAST produced estimates for the dates of branching events on the tree by sampling dates of divergence between isolates from their joint posterior distribution, in which the sequences are constrained by their known date of isolation. The data were analysed using a coalescent constant population size and a general time-reversible model with gamma correction. The results were produced from three independent chains of 50 million steps each, sampled every 10,000 steps to ensure good mixing. The first 5 million steps of each chain were discarded as a burn-in. The results were combined using LogCombiner, and the maximum clade credibility tree was generated using TreeAnnotator, both parts of the BEAST package. Convergence and the effective sample-size (ESS) values were checked using Tracer 1.5. ESS values in excess of 200 were obtained for all parameters.
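The sampling scheme described above fixes how many posterior samples remain after burn-in, which gives context for the ESS threshold of 200. The bookkeeping can be sketched as follows. The function name is hypothetical, and the count assumes one sample is logged per sampling interval, ignoring any initial state logged at step 0.

```python
def retained_samples(chain_length, sample_every, burn_in, chains):
    """Number of logged samples kept across all chains after burn-in."""
    if burn_in > chain_length:
        raise ValueError("burn-in cannot exceed chain length")
    logged_per_chain = chain_length // sample_every
    discarded_per_chain = burn_in // sample_every
    return (logged_per_chain - discarded_per_chain) * chains


# Settings quoted in the text: 3 chains of 50 million steps, sampled
# every 10,000 steps, with the first 5 million steps discarded.
TOTAL_RETAINED = retained_samples(50_000_000, 10_000, 5_000_000, 3)
```

Under these assumptions each chain logs 5,000 samples and keeps 4,500 after the 10% burn-in, so the combined posterior holds 13,500 samples.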


The seventh-pandemic cholera strains were clearly distinguished by three waves and we therefore propose their CTX types to be CTX-1, CTX-2 and CTX-3 under the new nomenclature scheme (see Supplementary Table 2). Our nomenclature system is expandable and would be suitable for naming any new seventh-pandemic V. cholerae strains. With CTX-1 representing canonical El Tor, we followed the rationale: (1) for CTX-1 to CTX-2, because there was a shift of rstR(El Tor) to rstR(Classical), rstA(El Tor) to rstA(Classical + El Tor) and ctxB(El Tor) to ctxB(Classical), we called it CTX-2; (2) for CTX-1 to CTX-3, because there was a shift of ctxB(El Tor) to ctxB(Classical), we called it CTX-3; (3) for CTX-3 to CTX-3b, because there was only one SNP mutation in ctxB(Classical) from CTX-2 and the rest was identical, we called it the next variant of CTX-3, which is CTX-3b.

In summary, if there is a shift of any gene from one biotype to another, the new CTX will be called CTX-n: thus the next strains fitting these criteria will be called CTX-4. However, if there is a mutation (or mutations) that does not lead to a shift of the gene to another biotype gene, the series CTX-1b, CTX-1c; CTX-2b, CTX-2c; or CTX-3b, CTX-3c and so on should be followed as appropriate.
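The naming rule summarized above can be expressed as a small function: a biotype shift in any gene takes the next unused number, whereas a mutation without a biotype shift takes the next unused letter on the parent type. This is one reading of the rule, sketched for illustration. The function name and its arguments are hypothetical and are not part of the proposed nomenclature.

```python
import re

_CTX_NAME = re.compile(r"CTX-(\d+)([a-z]?)")


def next_ctx_name(parent: str, biotype_shift: bool, existing: set[str]) -> str:
    """Propose a name for a new CTX type derived from `parent`.

    `existing` is the set of names already in use, e.g. {"CTX-1", "CTX-3b"}.
    """
    parsed = _CTX_NAME.fullmatch(parent)
    if parsed is None:
        raise ValueError(f"not a CTX type name: {parent!r}")
    if biotype_shift:
        # A gene shifted from one biotype to another: take the next number.
        numbers = [
            int(m.group(1)) for name in existing
            if (m := _CTX_NAME.fullmatch(name))
        ]
        return f"CTX-{max(numbers, default=0) + 1}"
    # Mutation without a biotype shift: next free letter on the same number.
    number = parsed.group(1)
    used = {
        m.group(2) for name in existing
        if (m := _CTX_NAME.fullmatch(name)) and m.group(1) == number
    }
    for letter in "bcdefghijklmnopqrstuvwxyz":
        if letter not in used:
            return f"CTX-{number}{letter}"
    raise ValueError(f"no free variant letter left for CTX-{number}")
```

Given the types named in this study, a new biotype shift would yield CTX-4 and a further point-mutation variant of CTX-3 would yield CTX-3c, in line with the examples in the text.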

Methods Summary

Genomic libraries were created for each sample, followed by multiplex sequencing on an Illumina GAIIx analyser. The 54-base paired-end reads obtained were mapped against N16961 El Tor as a reference and SNPs in the core genome were identified as described in Methods. The SNPs were used to draw a whole core-genome phylogeny as described in Harris et al. (2010). The final SNP alignment was used to perform BEAST (Drummond et al., 2006) analysis and to confirm the output of linear regression analysis. The three cholera waves reported in the seventh-pandemic phylogeny were confirmed using BAPS (Corander et al., 2003, 2008). The raw Illumina data were also assembled de novo (see Methods) so that pairwise genome comparisons could be made. A new and expandable nomenclature system describing the CTX trends seen in the last 40 years was proposed following the rationale described in Methods.

Full methods and any associated references are available in the online version of the paper.


This work was supported by The Wellcome Trust grant 076964. The IVI is supported by the Governments of Korea, Sweden and Kuwait. D.W.K. was partially supported by grant RTI05-01-01 from the Ministry of Knowledge and Economy (MKE), Korea and by R01-2006-000-10255-0 from the Korea Science and Engineering Foundation; and J.L.N.W. was supported by the Alborada Trust and the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security. Thanks to A. Camilli at Tufts University Medical School for providing the corrected N16961 sequence, to B.M. Nguyen at NIHE, Vietnam, and M. Ansaruzzaman at ICDDR, Bangladesh for providing strains, and to M. Fookes at WTSI for training support.

Author Contributions

A.M., D.W.K. and N.R.T. collected the data, analysed it and performed phylogenetic analyses and comparative genomics. J.H.L., S.Y.C., E.J.K. and J.C. analysed the CTX types. S.K., S.K.N. and T.R. were involved in strain collection and serogroup analysis. T.R.C. performed Bayesian analysis; N.J.C. and S.R.H. did the computational coding. J.L.N.W., J.D.C., C.C., G.B.K., J.H., N.R.T., J.P. and G.D. were involved in the study design. A.M., N.R.T., J.P., G.D., J.H., G.B.K., N.J.C., S.R.H., T.R.C., D.W.K. and M.L. contributed to the manuscript writing.


  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. [PubMed: 2231712]
  • Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009;25:1968–1969. [PMC free article: PMC2712343] [PubMed: 19497936]
  • Bani S, et al. Molecular characterization of ICEVchVie0 and its disappearance in Vibrio cholerae O1 strains isolated in 2003 in Vietnam. FEMS Microbiol Lett. 2007;266:42–48. [PubMed: 17233716]
  • Basu A, et al. Vibrio cholerae O139 in Calcutta, 1992–1998: incidence, antibiograms, and genotypes. Emerg Infect Dis. 2000;6:139–147. [PMC free article: PMC2640858] [PubMed: 10756147]
  • Carver T, et al. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008;24:2672–2676. [PMC free article: PMC2606163] [PubMed: 18845581]
  • CDC. Update: cholera outbreak—Haiti, 2010. MMWR Morb Mortal Wkly Rep. 2010;59:1473–1479. 2010. [PubMed: 21085088]
  • Chin CS, et al. The origin of the Haitian cholera outbreak strain. N Engl J Med. 2011;364:33–42. [PMC free article: PMC3030187] [PubMed: 21142692]
  • Chun J, et al. Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl Acad Sci USA. 2009;106:15442–15447. [PMC free article: PMC2741270] [PubMed: 19720995]
  • Corander J, Marttinen P, Siren J, Tang J. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008;9:539. [PMC free article: PMC2629778] [PubMed: 19087322]
  • Corander J, Waldmann P, Sillanpaa MJ. Bayesian analysis of genetic differentiation between populations. Genetics. 2003;163:367–374. [PMC free article: PMC1462429] [PubMed: 12586722]
  • Croucher NJ, et al. Rapid pneumococcal evolution in response to clinical interventions. Science. 2011;331:430–434. [PMC free article: PMC3648787] [PubMed: 21273480]
  • Cvjetanovic B, Barua D. The seventh pandemic of cholera. Nature. 1972;239:137–138. [PubMed: 4561957]
  • Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. [PMC free article: PMC2247476] [PubMed: 17996036]
  • Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88. [PMC free article: PMC1395354] [PubMed: 16683862]
  • Faruque SM, Mekalanos JJ. Pathogenicity islands and phages in Vibrio cholerae evolution. Trends Microbiol. 2003;11:505–510. [PubMed: 14607067]
  • Faruque SM, et al. The O139 serogroup of Vibrio cholerae comprises diverse clones of epidemic and nonepidemic strains derived from multiple V. cholerae O1 or non-O1 progenitors. J Infect Dis. 2000;182:1161–1168. [PubMed: 10979913]
  • Garriss G, Waldor MK, Burrus V. Mobile antibiotic resistance encoding elements promote their own diversity. PLoS Genet. 2009;5:e1000775. [PMC free article: PMC2786100] [PubMed: 20019796]
  • Harris SR, et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science. 2010;327:469–474. [PMC free article: PMC2821690] [PubMed: 20093474]
  • Heidelberg JF, et al. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature. 2000;406:477–483. [PubMed: 10952301]
  • Hochhut B, Waldor MK. Site-specific integration of the conjugal Vibrio cholerae SXT element into prfC. Mol Microbiol. 1999;32:99–110. [PubMed: 10216863]
  • Lam C, Octavia S, Reeves P, Wang L, Lan R. Evolution of seventh cholera pandemic and origin of 1991 epidemic, Latin America. Emerg Infect Dis. 2010;16:1130–1132. [PMC free article: PMC3321917] [PubMed: 20587187]
  • Lee JH, et al. Classification of hybrid and altered Vibrio cholerae strains by CTX prophage and RS1 element structure. J Microbiol. 2009;47:783–788. [PubMed: 20127474]
  • Nair GB, Bhattacharya SK, Deb BC. Vibrio cholerae O139 Bengal: the eighth pandemic strain of cholera. Indian J Public Health. 1994;38:33–36. [PubMed: 7835993]
  • Nguyen BM, et al. Cholera outbreaks caused by an altered Vibrio cholerae O1 El Tor biotype strain producing classical cholera toxin B in Vietnam in 2007 to 2008. J Clin Microbiol. 2009;47:1568–1571. [PMC free article: PMC2681878] [PubMed: 19297603]
  • O’Shea YA, et al. The Vibrio seventh pandemic island-II is a 26.9 kb genomic island present in Vibrio cholerae El Tor and O139 serogroup isolates that shows homology to a 43.4 kb genomic island in V vulnificus. Microbiology. 2004;150:4053–4063. [PubMed: 15583158]
  • Safa A, Nair GB, Kong RY. Evolution of new variants of Vibrio cholerae O1. Trends Microbiol. 2010;18:46–54. [PubMed: 19942436]
  • Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. [PubMed: 16928733]
  • Wozniak RA, et al. Comparative ICE genomics: insights into the evolution of the SXT/R391 family of ICEs. PLoS Genet. 2009;5:e1000786. [PMC free article: PMC2791158] [PubMed: 20041216]
  • Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. [PMC free article: PMC2336801] [PubMed: 18349386]


44,* and 45.

44 Eckerd College, 4200 54th Avenue South, St. Petersburg, Florida 33711.
45 Mote Marine Laboratory, 1600 Ken Thompson Parkway, Sarasota, Florida 34236.
* To whom correspondence should be addressed. E-mail: sharpkh@eckerd.edu.


Recent research has explored the possibility that increased sea-surface temperatures and decreasing pH (ocean acidification) contribute to the ongoing decline of coral reef ecosystems. Within corals, a diverse microbiome exerts significant influence on biogeochemical and ecological processes, including food webs, organismal life cycles, and chemical and nutrient cycling. Microbes on coral reefs play a critical role in regulating larval recruitment, bacterial colonization, and pathogen abundance under ambient conditions, ultimately governing the overall resilience of coral reef systems. As a result, microbial processes may be involved in reef ecosystem-level responses to climate change. The development of new molecular technologies, together with multidisciplinary collaborative research on coral reefs, has led to rapid advances in our understanding of bacterially mediated reef responses to environmental change. Here we review new discoveries regarding (1) the onset of coral-bacterial associations; (2) the functional roles that bacteria play in healthy corals; and (3) how bacteria influence coral reef response to environmental change, leading to a model describing how reef microbiota direct ecosystem-level response to a changing global climate.


The health of coral reefs is declining on a global scale and continues to be threatened by overfishing and habitat destruction. Anthropogenically induced global climate change has been identified as a significant threat to these sensitive ecosystems. As temperatures rise, bleaching and diseases are increasing, and excess atmospheric carbon dioxide is greatly altering reef ecosystems by changing seawater chemistry through decreases in pH (Anthony et al., 2011).

In a recent review, Bosch and McFall-Ngai (2011) highlight the significance of viewing animals as “metaorganisms”: multicellular organisms consisting of a macroscopic host and multiple microorganisms that interact synergistically to shape the ecology and evolution of the entire association. In this sense, the term metaorganism can be applied to a broad range of animal-microbe symbioses, ranging from humans to sponges (Bosch and McFall-Ngai, 2011). Adopting this perspective has changed the way that researchers study corals. In scleractinian (hard) corals, the term “holobiont” (Knowlton and Rohwer, 2003) was adapted to indicate that corals are dynamic, multi-domain assemblages consisting of an animal host, symbiotic dinoflagellates in the genus Symbiodinium, bacteria, archaea, fungi, and viruses (Rohwer et al., 2001, 2002; Stat et al., 2006; Wegley et al., 2007; Thurber et al., 2009). The term metaorganism is especially useful for describing corals because it reflects that a coral’s response to environmental change is driven by physiological interactions among the various microorganisms associated with the tissue, skeleton, and mucous layer. Corals harbor Symbiodinium, which provides fixed carbon to the host via photosynthesis, serving as the trophic foundation for coral reef ecosystems. It has been proposed that corals have additionally evolved to exploit specific bacterial metabolic capabilities that, in turn, directly modulate the survival of the coral holobiont in the marine environment (Zilber-Rosenberg and Rosenberg, 2008). An extensive characterization of the diverse microorganisms in corals will guide our understanding of the ecology of corals and coral reef ecosystems in response to a changing global climate.

Coral microbiology is a rapidly growing area of study. Early culture-based studies of coral-associated bacteria provided a foundation from which genomics, metagenomics, and transcriptomics approaches were established in corals, leading to exciting new advances in our current understanding of the diversity and dynamics of coral-associated bacterial communities. Evidence is accumulating that bacteria have an enormous influence on coral health and resilience, particularly with respect to changing reef environments (Azam and Worden, 2004; Rosenberg et al., 2007; Bourne et al., 2009; Ainsworth et al., 2010; Garren and Azam, 2012). The field of marine microbial ecology underwent a revolution in the 1990s, when culture-independent molecular techniques revealed that bacterial diversity from culture-based assessments was largely underestimated (Azam, 1998). Studies of persistent associations between corals and bacteria, both beneficial and pathogenic, were enhanced by new methods and approaches from this revolution. Those techniques were adopted by coral microbiologists, resulting in the discovery that particular components of bacterial communities are specific to some coral host species (Rohwer et al., 2002).

The cost and time associated with characterizing these complex bacterial assemblages initially posed a challenge to scientists attempting to identify patterns of diversity across a large scale. However, the gradually decreasing cost and increasing efficiency of high-throughput methods, including 454 pyrosequencing technology, allowed researchers to perform community 16S rRNA gene profiling and metagenome sequencing in a broad range of coral specimens. Recent applications of 16S pyrosequencing in corals have produced hundreds of thousands of 16S sequences—in contrast to hundreds of sequences from cloning methods. Results from pyrosequencing-based studies provide evidence of the presence of “coral-specific” groups of bacterial ribotypes (Reis et al., 2009; Kvennefors et al., 2010; Sunagawa et al., 2010; Ceh et al., 2011). Experiments investigating the bacterial component of coral surface mucous layers suggest that the composition of bacterial communities in coral mucus is distinct from other surface-associated biofilms and is influenced by the physical and biochemical properties of the mucus (Barott et al., 2011; Sweet et al., 2011b). Although corals maintain specific groups of bacteria, variation among individuals of a coral species may occur according to location (Guppy and Bythell, 2006; Littman et al., 2009; Kvennefors et al., 2010; Ceh et al., 2011).

Bacterial communities are maintained in microhabitats within an individual coral host, spatially structured within chemical micro-niches, or compartments, in the skeleton, tissues, and surface mucous layer of corals (Rohwer et al., 2001, 2002; Daniels et al., 2011; Sweet et al., 2011a). This spatial microheterogeneity is similar to previously described trends in the speciation of the dinoflagellate Symbiodinium in branching acroporid corals (Rowan and Knowlton, 1995). With that in mind, new collection techniques and apparatuses have recently been developed to enable collection from specific compartments of the coral, with minimized contamination by bacteria from other compartments (Sweet et al., 2011a).

Recent research surveying bacterial communities in a large number of marine sponges suggests that bacteria detected in sponges can be classified in three categories (Schmitt et al., 2011): core (groups of bacteria that are shared across many sponges), species-specific (groups of bacteria that are specific to certain sponge hosts), and variable (groups of bacteria that are transiently associated with the host, probably due to passive attachment from seawater). The recent composition analyses of bacterial assemblages in corals indicate that a similar classification scheme can be applied to coral-associated bacteria. An interesting difference between corals and sponges is that while many sponges have been documented to transmit diverse, specific bacterial communities in their gametes or larvae (Schmitt et al., 2007; Sharp et al., 2007), most corals appear to acquire specific bacteria from the seawater each generation (Apprill et al., 2009; Sharp et al., 2010). The mechanisms by which corals selectively and specifically recruit their core and specific bacterial components are largely undescribed, but they likely involve the physical properties and the chemical structure of the mucous layer, which is thought to be unique in specific coral species (Bythell and Wild, 2011). Bacteria that successfully colonize the mucus are, in turn, involved in cycling nutrients and organic compounds in corals and on the reefs, and the resident microbes have the potential to modulate the bacterial community structure in coral mucus and tissue.
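The three-category scheme described above (core, species-specific, variable) can be illustrated with a minimal sketch. It assumes presence/absence data per individual host; the function name `classify_taxa`, the thresholds `min_prevalence` and `core_fraction`, and the toy host and taxon names are all hypothetical choices made for illustration and are not taken from Schmitt et al. (2011) or any coral study. Taxa that are consistently present in several, but not most, host species are lumped with "variable" here for simplicity.

```python
def classify_taxa(samples_by_host, min_prevalence=0.8, core_fraction=0.7):
    """Label each taxon as 'core', 'species-specific', or 'variable'.

    samples_by_host maps a host species name to a list of samples, where each
    sample is the set of bacterial taxa detected in one individual.
    Both thresholds are illustrative assumptions, not published cutoffs.
    """
    all_taxa = set()
    for samples in samples_by_host.values():
        for sample in samples:
            all_taxa |= sample

    n_hosts = len(samples_by_host)
    labels = {}
    for taxon in sorted(all_taxa):
        # Host species in which the taxon is found in most individuals.
        consistent_hosts = [
            host
            for host, samples in samples_by_host.items()
            if samples
            and sum(taxon in s for s in samples) / len(samples) >= min_prevalence
        ]
        if n_hosts > 1 and len(consistent_hosts) / n_hosts >= core_fraction:
            labels[taxon] = "core"  # shared across most host species
        elif len(consistent_hosts) == 1:
            labels[taxon] = "species-specific"  # consistent in one host only
        else:
            labels[taxon] = "variable"  # sporadic, e.g. passive attachment
    return labels


# Hypothetical toy data: three host species, two individuals each.
toy_samples = {
    "host_A": [{"taxon_1", "taxon_2"}, {"taxon_1", "taxon_2"}],
    "host_B": [{"taxon_1", "taxon_3"}, {"taxon_1"}],
    "host_C": [{"taxon_1"}, {"taxon_1"}],
}
toy_labels = classify_taxa(toy_samples)
```

With this toy input, `taxon_1` is consistently present in every host species, `taxon_2` only in `host_A`, and `taxon_3` in a single individual, so the three taxa fall into the three categories respectively. Real surveys would work from sequence abundance tables, and the choice of thresholds would strongly affect the outcome.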

Here we review recent advances in the study of the coral metaorganism and specifically address (1) the onset of coral-bacterial associations; (2) the functional roles that bacteria play in healthy corals; and (3) how bacteria influence coral reef response to environmental change. These new discoveries are the basis for a model of how coral-associated and reef-inhabiting microbiota influence ecosystem-level responses to global climate change.

Onset of Coral-Bacterial Associations

The Caribbean coral Porites astreoides has been shown to transmit a bacterial component to its offspring (Sharp et al., 2012). However, this seems to be an exception to the rule in scleractinian corals. In eight other coral species that have been examined (Apprill et al., 2009; Sharp et al., 2010), corals do not appear to inherit bacteria from parents; rather, bacterial colonization occurs in planula larvae or post-settlement stages. Many bacterial phylotypes detected in planulae and post-settlement stages of P. astreoides have also been documented in the adult (Wegley et al., 2007), suggesting that corals acquire specific bacterial phylotypes.

Exploration of bacterial communities in early life stages of corals has not only provided new information about bacterial infection in corals, but it has also simplified analysis of diversity and dynamics of bacterial communities in corals across spatiotemporal scales. In contrast to their adult counterparts, swimming planula larvae of most corals have not yet accumulated a high bacterial load from the surrounding environment or by feeding (Apprill et al., 2009; Sharp et al., 2010); as a result, it is more tractable to characterize and quantify the associated bacterial component in these larvae. Similar phylogenetic clades of bacteria were detected in 16S rRNA gene sequence clone libraries from multiple larval specimens of the Caribbean coral Porites astreoides (Sharp et al., 2012) and in the Pacific coral Pocillopora meandrina (Apprill et al., 2009), suggesting that some groups of bacteria are common across different coral species. A number of bacterial types have been commonly detected in multiple species of corals, but of particular interest are those belonging to the class α-proteobacteria (Apprill et al., 2009; Raina et al., 2009; Sharp et al., 2012). The α-proteobacteria (particularly the Roseobacteriales) are abundant in the oceans, often constituting a third of the bacterioplankton (Wagner-Dobler and Biebl, 2006). This same group of bacteria is also closely associated with phytoplankton, including the dinoflagellate coral endosymbiont Symbiodinium (Webster et al., 2004). Many of these bacteria, now classified as Ruegeria spp., were originally designated Silicibacter spp. (Yi et al., 2007). It is unknown whether these bacteria play a functional role in corals, but their consistent detection in early life stages of corals and in seawater during coral spawning may be an indication that they are significant to the health of larvae, or even to adult colonies (Apprill et al., 2009; Apprill and Rappe, 2011; Sharp et al., 2012).

New research focusing on the molecular basis of bacterial colonization of the coral tissues or surface mucous layer indicates that coral mucous biofilm communities are a result of selection processes driven by the coral holobiont rather than by incidental attachment by bacteria in the seawater (Sweet et al., 2011b). This is consistent with recent findings from studies in the cnidarian Hydra, in which researchers found that the composition of the surface-associated bacterial community is driven directly by host metabolism and production of compounds in the surface layer of Hydra (Augustin et al., 2010). It is likely that there are specific molecules that influence colonization in the coral mucous layer. Lectin-mediated uptake of Symbiodinium has been demonstrated in corals (Wood-Charlson et al., 2006), but very little is known about bacterial uptake or invasion in corals.

Functional immunological molecules with bacterial binding capacity have been found in corals, suggesting a means by which the host may control the composition of its associated microbial community (Kvennefors et al., 2008; Kvennefors and Roff, 2009). Molecules that control the activities of other coral-associated microbes are thought to be derived from the coral host and in some cases from the associated bacteria (Ritchie, 2006; Teplitski and Ritchie, 2009; Vidal-Dupiol et al., 2011a,b). As previously described in a broad range of other animal-microbe systems (McFall-Ngai et al., 2012), molecules that direct bacterial infection of animal tissue may be conserved, regardless of whether the bacteria are beneficial, commensal, or pathogenic.

Role of Bacteria in Health of Coral and Coral Reefs

Recent coral microbiology research has described how bacterial communities contribute to the overall physiology and ecology of apparently healthy corals. These discoveries were made possible both by new molecular technologies and by novel fieldwork-based approaches. Bacteria within corals govern the biogeochemical cycling within coral tissues. In addition, bacteria on surfaces in the reef environment influence and facilitate settlement of coral larvae, and resident microbes in corals play a role in defining the composition of the bacterial community in corals.

Studies over the past several years indicate that coral-associated bacteria influence biogeochemical cycling within corals and on reefs. Metagenomic data from the bacterial fraction of DNA from the coral Porites astreoides indicate the presence of numerous genes capable of degrading diverse aromatic compounds (Wegley et al., 2007). Coral-associated bacteria have been shown to be involved in cycling mucous-derived particulate and dissolved organic compounds in the reef environment (Wild et al., 2004, 2009; Huettel et al., 2006). In addition, the bacterial metagenome of P. astreoides includes genes encoding enzymes involved in cycling nitrogen via nitrogen fixation, ammonification, nitrification, and denitrification (Wegley et al., 2007). The detection of bacterial nitrogen fixation genes is consistent with previous biochemical research in which cyanobacterial nitrogen fixation was detected (Lesser et al., 2007). Further research focusing on nifH gene diversity in two species of Montipora suggests that nitrogen-fixing bacteria in corals are not limited to cyanobacteria but also belong to taxa representing the α-, β-, γ-, and δ-proteobacterial classes (Olson et al., 2009). Bacteria have been shown to be significant players in transforming nitrogen (Fiore et al., 2010) as well as sulfur and carbon compounds (Ferrier-Pages et al., 2001; Raina et al., 2009; Kimes et al., 2010) in corals and on coral reefs.

Bacteria outside of the coral animal also exert influence on the behavior of corals during their early life stages. Particular species of crustose coralline algae (CCAs) have been shown to facilitate larval settlement of the threatened coral species Acropora cervicornis and A. palmata in the Florida Keys and the Caribbean (Ritson-Williams et al., 2010). The integration of microbiological and chemical ecology approaches suggests that the facilitation of larval settlement by CCAs may be regulated by bacteria growing in biofilms on the surface of CCAs (Negri et al., 2001; Webster et al., 2004; Tebben et al., 2011). To date, all of the CCA-associated bacteria implicated in inducing coral metamorphosis and settlement belong to the γ-proteobacteria. A strain of the γ-proteobacterium Pseudoalteromonas sp. isolated from the surface of the CCA species Hydrolithon onkodes induces significant levels of larval metamorphosis in the corals Acropora willisae and A. millepora in laboratory experiments (Negri et al., 2001). Researchers have recently shown that exposure to Pseudoalteromonas isolates cultured from Neogoniolithon fosliei and Hydrolithon onkodes significantly increases rates of metamorphosis in the Pacific coral Acropora millepora (Tebben et al., 2011). Bioassay-guided isolation identified the inductive molecule as tetrabromopyrrole (Tebben et al., 2011). Other strains of Pseudoalteromonas and Thalassomonas have also been shown to induce larval settlement and metamorphosis in the coral Pocillopora damicornis (Tran and Hadfield, 2011). Not all tested isolates of Pseudoalteromonas and Thalassomonas were inductive in that study, indicating that the ability to induce settlement is taxon-specific. In addition, the isolation source of the bacteria (algal surface vs. coral surface) was not linked to the strains’ inductive properties (Tran and Hadfield, 2011). Together, these studies indicate that coral recruitment and successful larval attachment and metamorphosis (which are crucial for continued repopulation of coral reef ecosystems) are strongly governed by the activity of specific bacteria in reef environments.

Recent research has focused on bacteria native to the coral surface mucous layer that control bacterial colonization within the mucus and thereby regulate resistance to disease. Corals have been shown to protect themselves against pathogen infection via the presence of allelopathic properties in the mucus (Geffen and Rosenberg, 2005; Ritchie, 2006) or the coral tissue (Koh, 1997; Kelman et al., 2006; Gochfeld and Aeby, 2008). However, antimicrobial assays with numerous Red Sea corals reveal that the capabilities of coral species for antibiotic production are highly variable (Kelman et al., 2006). Bacteria isolated from corals are able to inhibit the colonization and growth of many other types of bacteria, including potentially invasive coral pathogens (Reshef et al., 2006; Ritchie, 2006; Wegley et al., 2007; Gochfeld and Aeby, 2008; Nissimov et al., 2009; Shnit-Orland and Kushmaro, 2009; Sharon and Rosenberg, 2010; Kvennefors et al., 2012). In addition, a high number of genes involved in antibacterial compound biosynthesis has been detected in metagenomes from multiple corals (Wegley et al., 2007; Thurber et al., 2009). It is not clear to what extent these bacteria and the metabolites they produce play a role in community structure. In situ antibiotic production by bacteria is known to be a means of securing a niche by controlling microbial populations competing for the same resources (Nielsen et al., 2000; Rao et al., 2005). It is therefore likely that bacteria in and on the coral host govern the dynamics of coral microbiota.

Although the mechanisms by which mucous-associated bacteria prevent pathogenic infection are still unknown, the data indicate that a sophisticated system of bacterial cell-cell chemical signaling known as quorum sensing (QS) may be involved in microbial pathogenesis in corals. QS is modulated by small diffusible compounds called autoinducers, which are molecules that, when accumulated to a threshold concentration within a diffusion-limited environment, result in synchronized group behaviors. This density-dependent regulation allows bacterial populations to act in unison, effectively magnifying their ecological impact. Though the cell-cell communication systems differ among bacterial species, QS has been demonstrated to regulate many bacterial behaviors, including biofilm formation, antibiotic production, bioluminescence, and pathogenesis (Ng and Bassler, 2009), and it commonly drives important interactions between bacterial communities and their hosts (Rasmussen and Givskov, 2006; Dobretsov et al., 2009).
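The threshold behavior described above can be sketched with a deliberately simple toy model. It assumes that autoinducer concentration rises in proportion to cell density and is lost at a constant first-order rate (standing in for diffusion or degradation), and that group behavior switches on once a fixed threshold concentration is reached. The function names, parameter values, and units are hypothetical and are not drawn from any coral or Vibrio study.

```python
def autoinducer_trajectory(cell_density, production_rate=1.0, loss_rate=0.5,
                           dt=0.01, steps=2000):
    """Euler integration of dA/dt = production_rate * N - loss_rate * A.

    A is the autoinducer concentration and N the (constant) cell density.
    All parameter values are arbitrary illustrative assumptions.
    """
    concentration = 0.0
    trajectory = []
    for _ in range(steps):
        concentration += dt * (production_rate * cell_density
                               - loss_rate * concentration)
        trajectory.append(concentration)
    return trajectory


def quorum_reached(cell_density, threshold=10.0, **model_params):
    """Return True if the autoinducer ever reaches the threshold."""
    trajectory = autoinducer_trajectory(cell_density, **model_params)
    return max(trajectory) >= threshold


# A sparse population stays below the threshold; a dense one crosses it.
sparse_population_signals = quorum_reached(1.0)
dense_population_signals = quorum_reached(10.0)

# Slower signal loss (a more diffusion-limited setting) lowers the density
# needed to reach the same threshold.
open_water_signals = quorum_reached(3.0, loss_rate=0.5)
confined_signals = quorum_reached(3.0, loss_rate=0.2)
```

In this model the concentration approaches `production_rate * cell_density / loss_rate`, so whether the threshold is crossed depends on both cell density and how quickly the signal is lost. That is one way to read the point that autoinducers must accumulate within a diffusion-limited environment, such as a mucous layer or biofilm, before synchronized behavior is triggered.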

Quorum sensing in bacterial pathogens is the mechanism by which virulence genes are expressed relative to pathogen density in the host, thereby initiating a coordinated attack once bacterial cell numbers reach a critical mass (Dobretsov et al., 2009). Both eukaryotes and prokaryotes have evolved to recognize and counter QS in pathogens, and there is evidence that eukaryotic signal-mimics can stimulate QS responses in bacteria (Teplitski et al., 2011). Other bacteria can counter-attack by producing quorum-quenching acylases or lactonases that break down signaling molecules (Teplitski et al., 2011). In addition to the signal-degrading enzymes, eukaryotes can inhibit or activate bacterial QS by producing compounds that mimic QS signals. For example, Rajamani et al. (2008) demonstrated that lumichrome, a derivative of the vitamin riboflavin that is produced by the unicellular alga Chlamydomonas reinhardtii (as well as other prokaryotes and eukaryotes), can interact with the bacterial receptor for QS signals and elicit QS responses.

Quorum sensing may inhibit or activate pathogenesis, antibiotic production, exoenzyme production, and attachment by beneficial bacteria within coral tissues and on surfaces. Coral extracts contain compounds capable of interfering with QS activities (Skindersoe et al., 2008; Alagely et al., 2011) that may be involved in regulating the colonization of coral mucus by pathogens, commensal bacteria, or beneficial bacteria. The source of this activity is difficult to pinpoint and could originate from the coral, the dominant endosymbiont, or any associated bacteria. Alagely et al. (2011) recently showed that both coral- and Symbiodinium-associated bacteria alter swarming and biofilm formation in the coral pathogen Serratia marcescens. These phenotypes are typically controlled by QS, although inhibition of QS by these isolates remains to be demonstrated. There are few studies on the in situ roles of QS in corals, but this process is likely to be used in both pathogenesis and mutualistic interactions (Krediet et al., 2009a,b; Teplitski and Ritchie, 2009; Tait et al., 2010). While it is clear that at least some coral-associated commensals and pathogens produce QS signals under laboratory conditions (Tait et al., 2010; Alagely et al., 2011), it is not clear whether these signals accumulate to threshold concentrations in natural environments.

It is feasible that Symbiodinium spp. also produce signaling molecules that control bacterial cell-cell communication, which would influence the specific complement of bacteria that associate with corals. Perhaps bacterial species-specificity in corals is, in part, driven by Symbiodinium within the coral, but this has yet to be tested. The potential for Symbiodinium to be a source of antibacterial compounds in corals represents an aspect of bioactive compound production that is not yet described. It is likely that the source of antibacterial activity in corals is a combination of allelopathic chemicals produced by the coral, by associated bacteria, or by endosymbiotic dinoflagellates. In a study conducted by Marquis et al. (2005), eggs from 11 coral species were tested for antibacterial activity, and the only species exhibiting antibiotic activity was the one coral species in the study that incorporates Symbiodinium into the egg before the egg is released, suggesting a potential allelopathic contribution of Symbiodinium. It is also possible that coral-associated bioactive compounds are derived from bacteria whose presence or activity is influenced by Symbiodinium, but this has yet to be tested.

Role of Bacteria in Reef Ecosystem Responses to Environmental Change

The latest research on how coral-associated bacterial communities mediate responses of corals and coral reef ecosystems to environmental change addresses shifts in both the phylogenetic structure and metabolic capabilities of bacterial assemblages in corals. Multiple approaches and tools from microbiology, molecular biology, microscopy, and chemical ecology have been used to identify the role of bacterial communities in response to threats such as increased sea-surface temperature, increased organic carbon and nutrient levels in seawater, increased macroalgal and cyanobacterial cover on reefs, and decreased seawater pH.

Rising sea-surface temperatures are linked to increases in coral diseases worldwide. However, the study of microbial coral diseases has been challenging due to many factors including microbial dynamics in the marine environment, the complications of proving unequivocal disease causation, and insufficient diagnostic tools (Pollock et al., 2011; Weil and Rogers, 2011). Some bacteria identified as coral pathogens include Serratia marcescens (Sutherland et al., 2011), Aurantimonas coralicida (Denner et al., 2003), and a consortium of bacterial and cyanobacterial phylotypes that make up what is known as Black Band Disease (Sekar et al., 2008). The most common bacteria present and problematic for corals are members of the Vibrionaceae that have been implicated in coral bleaching (Kushmaro et al., 1997; Ben-Haim and Rosenberg, 2002) and a myriad of coral diseases (Patterson et al., 2002; Frias-Lopez et al., 2003, 2004; Kline et al., 2006; Cervino et al., 2008). The Vibrionaceae are a common but diverse group of heterotrophic marine bacteria, collectively referred to as vibrios. Vibrios have been shown to be present in higher abundance on coral surfaces before obvious signs of distress (Ritchie, 2006; Mao-Jones et al., 2010). This group includes human pathogens and benign planktonic and animal-associated marine bacteria. Bleaching of the scleractinian coral Oculina patagonica in the eastern Mediterranean Sea was shown to be caused by Vibrio shiloi (Kushmaro et al., 1997). Vibrio coralliilyticus was isolated from bleached corals of the species Pocillopora damicornis and shown to cause coral bleaching and tissue sloughing (Ben-Haim and Rosenberg, 2002). In these pathogens, toxin production and the ability to infect coral tissue have a strong temperature dependence (Kushmaro et al., 1997; Ben-Haim and Rosenberg, 2002). Vibrio dynamics are affected by water temperature and salinity, yet little else is known about environmental drivers of their abundance and distribution in the marine environment (Johnson et al., 2010). These organisms often grow rapidly in culture and are able to utilize a wide range of carbon sources, suggesting that the biogeochemical significance of vibrios may vary with the nutrient state of the environment (Thompson et al., 2004). Some reef organisms are thought to be vectors for coral disease agents, specifically vibrios. These include organisms that come into contact with, or feed on, corals such as fireworms, snails, and corallivorous fishes (Weil and Rogers, 2011). Several recent reviews offer a comprehensive summary of the occurrence and possible environmental determinants of coral diseases (Rosenberg et al., 2009; Pollock et al., 2011; Weil and Rogers, 2011). Research on processes governing pathogen dynamics, abundance, and pathogenesis has informed our understanding of coral defense mechanisms.

The coral surface mucous layer and its resident microbes appear to be significant in defending corals from microbial diseases. Mucus harvested from the coral Acropora palmata during a period of increased seawater temperatures does not exhibit significant antibiotic activity compared to mucus sampled at lower temperatures (Ritchie, 2006). This suggests that the protective capacity of some corals may be lost when temperatures increase, providing a mechanism to explain how increased temperatures lower coral resistance and increase susceptibility to diseases. In addition, when temperatures increase, the dominant bacterial flora in coral mucus shifts from antibiotic-producing bacteria to pathogens (Ritchie, 2006). This finding indicates that a balance of potentially beneficial microbes may be important for the overall physiological health of reef corals. Rising sea-surface temperatures can cause a breakdown of coral-Symbiodinium symbiosis. In addition, shifting seawater temperatures can simultaneously affect interactions among other microbes, particularly bacteria present in or on the coral, rendering the host susceptible to opportunistic or secondary infection by certain bacteria (Ritchie, 2006; Lesser et al., 2007). Research on the Pacific coral Acropora millepora indicates that after bleaching (the loss of Symbiodinium) there is a dramatic shift to a Vibrio-dominated community (Bourne et al., 2007), but it is unclear whether the bacterial communities are responding to the absence of the Symbiodinium, to physiological changes in the coral host, or to the increased light and sea-surface temperature. Following bleaching-induced coral mortality, nitrogen-fixing bacteria increase in abundance on coral skeletons (Holmes and Johnstone, 2010). The resulting increase in available nitrogen in the seawater has the potential to affect the growth of macroalgae and other nitrogen-limited primary producers, including benthic cyanobacteria (Holmes and Johnstone, 2010). Taken together, these results demonstrate that temperature stress and coral bleaching have the potential to alter the composition and metabolism of coral-associated bacterial assemblages, with significant impacts on the health of corals and coral reef communities.

As a result of heightened fishing pressure, decline in herbivore populations, and increased nutrient levels, reefs are undergoing a “phase shift” from coral-dominated ecosystems to algal-dominated ecosystems (Pandolfi et al., 2003). Overgrowth by turf macroalgae and benthic cyanobacteria has been documented on adult coral colonies on reefs (Ritson-Williams et al., 2005). Concern is growing about how this shift in ecosystems affects bacterial communities within coral reefs (Dinsdale et al., 2008). Recent research demonstrates that allelochemicals from macroalgae and benthic cyanobacteria have the potential to mediate shifts in abundance and community composition of microbiota associated with adult corals (Morrow et al., 2011). In that study, chemical extracts from six species of macroalgae and two species of benthic cyanobacteria were tested against a library of bacterial strains isolated from three sources: algal surfaces, mucus of the Caribbean corals Montastraea faveolata and Porites astreoides in direct contact with algal surfaces, and mucus of the same corals without direct algal contact. The extracts stimulated the growth of some strains but inhibited the growth of others (Morrow et al., 2011). While some of the algal extracts had broad-spectrum activity against the collection of test isolates from phylogenetically diverse environmental bacteria, other extracts specifically increased the growth rates of bacteria in the genus Vibrio (Morrow et al., 2011). Many of the active compounds in the study were hydrophilic, indicating that the bioactive compounds from algae or cyanobacteria may be readily solubilized and transported throughout seawater, providing a potential mechanism for algae to regulate microbial activity without direct contact, especially in low-flow benthic systems (Morrow et al., 2011).
Allelopathic interactions between algae and corals have been shown to have detrimental effects on coral larval behavior, recruitment, and survival (Kuffner and Paul, 2004; Kuffner et al., 2006; Ritson-Williams et al., 2009). It is unknown how the bioactive compounds influence the health of the early life stages, but it is feasible that the observed effects are linked to shifting bacterial communities associated with the coral planulae and recruits.

Smith et al. (2006) explored the effects of macroalgae on bacterial growth in the coral surface mucopolysaccharide layer. The results of that research, together with prior work on controlled exposure of coral fragments to seawater with increased dissolved organic carbon (DOC) levels (Kline et al., 2006), suggest that an excess of DOC exuded from macroalgae stimulates microbial activity and thereby leads to coral mortality (Smith et al., 2006). In addition, Barott et al. (2011) found that the community composition of bacteria on surfaces of multiple reef macroalgal species is distinct from that found on coral surface mucous layers.

On the basis of these studies, it is clear that macroalgae have the potential to act as reservoirs of specific bacteria (beneficial, commensal, or pathogenic) not usually native to the coral mucous layer. Macroalgae also release compounds into the surrounding seawater that can have direct inhibitory or stimulatory effects on the coral-associated microbiota and, hence, on the health of the coral host.

Ocean acidification is a major concern for marine ecosystems in general, and particularly for those dependent on calcifying organisms, as secretion of calcium carbonate skeletons depends directly on the carbonate saturation state of seawater (Caldeira et al., 2007). Recent research suggests that a decrease in seawater pH can alter marine bacterial communities, but very little is known about the large-scale impacts of those changes (Joint et al., 2011). Laboratory manipulations of seawater pH have shown that acidification can result in loss of Symbiodinium endosymbionts, decrease in calcification, depression of overall net productivity in corals (Anthony et al., 2008), and dissolution or slowed deposition of coral skeletons (Fine and Tchernov, 2007). In addition, a decline in the overall abundance of crustose coralline algae has been attributed to decreased seawater pH (Kuffner et al., 2008); some of these algae have been shown to facilitate coral recruitment on reefs (Ritson-Williams et al., 2010). Experiments demonstrate that elevated PCO2 levels in seawater have significant detrimental effects on early life-history processes of the coral Porites astreoides, including fertilization success, larval settlement rates, post-settlement growth, and post-settlement skeleton deposition (Albright et al., 2008, 2010).
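
As a brief aside on the term "carbonate saturation state" used above: it is conventionally written as the ion product of calcium and carbonate relative to the solubility product of the mineral phase (aragonite, in the case of coral skeletons). The expressions below are the standard textbook definition and the net buffering reaction, not formulas taken from the studies cited here; the symbol K'_sp is assumed to denote the stoichiometric solubility product at the temperature, salinity, and pressure in question.

```latex
\Omega_{\mathrm{arag}} = \frac{[\mathrm{Ca^{2+}}]\,[\mathrm{CO_3^{2-}}]}{K'_{sp,\mathrm{arag}}}
\qquad
\mathrm{CO_2 + H_2O + CO_3^{2-} \rightleftharpoons 2\,HCO_3^{-}}
```

Because added CO2 consumes carbonate ions through the second reaction, higher PCO2 lowers the saturation state. Values above 1 favor precipitation of the mineral, and values below 1 favor dissolution.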

Several laboratory-based studies have focused specifically on the impacts of ocean acidification on coral microbiota. Meron et al. (2011) explored shifts in microbial assemblages associated with the coral Acropora eurystoma exposed to ambient seawater and seawater with pH 7.3 over a period of 2 months using denaturing gradient gel electrophoresis profiles and 16S rRNA gene clone libraries. According to the resulting cluster analysis, a decrease in pH results in an increase in detection of Rhodobacteraceae and a decrease in detection of Bacteroidetes and Deltaproteobacteria (Meron et al., 2011). Relative to libraries from corals exposed to ambient seawater, clone libraries from A. eurystoma exposed to pH 7.3 conditions exhibited a higher percentage of clones representing bacteria closely related to those detected in stressed, injured, or diseased invertebrates (Meron et al., 2011). In another study with the Pacific coral Porites compressa, individuals exposed to an extremely low pH (6.7) exhibited shifts in bacterial community diversity (Thurber et al., 2009). Though the mechanism by which this occurs is not yet clear, it has been suggested that the altered seawater pH indirectly causes a shift in the bacterial diversity by impacting host metabolism, which results in a shift of nutrients and carbon available to the associated microbiota (Meron et al., 2011).

Metagenomic analysis of P. compressa mucus revealed potential functional shifts in the associated microbiota as a result of decreased pH and increased temperature (Thurber et al., 2009), most notably an increase in the number of detected genes for antibiotic and toxin production. Mucus from corals exposed to a decreased pH exhibits low antimicrobial activity (Meron et al., 2011), and mucus of Acropora palmata exhibits lower antibacterial activity after prolonged warm periods (Ritchie, 2006). Together, these results suggest that even slight changes in seawater pH and temperature can have ecologically significant effects on coral-associated microbiota and, hence, on coral susceptibility to bacterial pathogens. A shift in the phylogenetic profile of the coral microbiome has been proposed as a potential indicator of declining coral health before the corals exhibit more obvious signs of stress or disease (Thurber et al., 2009; Ainsworth et al., 2010; Garren and Azam, 2012).

A Model for Climate-Change-Induced Shifts in the Coral Metaorganism

The research reviewed here suggests that alterations in sea surface temperature, algal and cyanobacterial abundance on reefs, and seawater pH can have detrimental effects on corals by decreasing protective qualities of the coral mucous layer, via inhibition of growth or compound production in beneficial bacteria or by alteration of host-associated compound biosynthesis. Another aspect of coral-bacterial interactions that has garnered much attention is the ability of bacteria on reef substrates to influence successful larval recruitment. These surfaces include crustose coralline algae (CCAs), which are coated with microbial biofilms and are thought to be involved in mediating coral larval settlement (Webster et al., 2001, 2011; Ritson-Williams et al., 2009, 2010; Tebben et al., 2011).

Figure A11-1 represents the current model of the dependence of corals on their associated microbes. Both coral tissue and coral mucus contain abundant and diverse microbial communities (Figure A11-1a). When sea-surface temperatures increase, antibacterial compounds in the coral mucus disappear. Simultaneously, antibiotic-producing bacteria normally associated with healthy corals decrease while bacteria with pathogenic capabilities increase (Figure A11-1b). Mathematical modeling of this system suggests that once this shift to pathogen dominance is established, the pathogen-dominated state persists long after conditions return to those favorable for the reestablishment of beneficial microbes (Mao-Jones et al., 2010). Recent metagenomic and community-profiling data on mucus bacteria from corals exposed to decreased pH (Thurber et al., 2009; Meron et al., 2011) indicate that ocean acidification may also result in a similar shift in the protective properties of coral mucus.
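
The persistence of a pathogen-dominated state after conditions recover can be illustrated with a deliberately simple numerical sketch. The code below is not the model of Mao-Jones et al. (2010); it is a generic two-population competition model, assumed here purely for illustration, in which beneficial bacteria and pathogens each suppress the other strongly enough that either can dominate. All names and parameter values (`competition`, `immigration`, `k_beneficial`, the phase durations in `PHASES`) are hypothetical and carry no empirical meaning. A temporary drop in the carrying capacity of the beneficial population stands in for thermal stress.

```python
# Toy illustration of alternative stable states in a mucus community.
# ASSUMPTION: generic Lotka-Volterra competition, NOT the published model.

# Each phase is (duration, carrying capacity of the beneficial population).
# Hypothetical values: ambient -> thermal stress -> return to ambient.
PHASES = [(50.0, 1.0), (50.0, 0.2), (100.0, 1.0)]


def simulate(phases, beneficial=0.9, pathogen=0.05,
             competition=1.5, immigration=1e-3, dt=0.01):
    """Integrate the toy model with forward Euler steps.

    Returns a list with the (beneficial, pathogen) densities reached
    at the end of each phase.
    """
    end_states = []
    for duration, k_beneficial in phases:
        steps = int(round(duration / dt))
        for _ in range(steps):
            # competition > 1 means each group inhibits the other more
            # strongly than it inhibits itself, which makes both
            # single-group states stable (bistability).
            d_ben = beneficial * (k_beneficial - beneficial
                                  - competition * pathogen) + immigration
            d_path = pathogen * (1.0 - pathogen
                                 - competition * beneficial) + immigration
            beneficial += dt * d_ben
            pathogen += dt * d_path
        end_states.append((beneficial, pathogen))
    return end_states


if __name__ == "__main__":
    labels = ["ambient", "stress", "recovery"]
    for label, (ben, path) in zip(labels, simulate(PHASES)):
        print(f"{label:>8}: beneficial={ben:.3f} pathogen={path:.3f}")
```

In this sketch the pathogen population remains dominant after the stress phase ends, because restoring the original conditions is not by itself enough for the beneficial population to reinvade. That qualitative behavior, two alternative stable states with hysteresis, is what the paragraph above describes; the quantitative details are arbitrary.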

Figure A11-1. Four-panel schematic of coral surfaces and associated microbes. (a) Under normal conditions, the coral animal, associated endosymbiotic algae, or native bacteria may produce allelopathic compounds that regulate the abundance and activities of other microbes that ...

On the basis of this model and the data reviewed in this paper, we present a second model of coral-bacterial interactions in which environmental changes lead to shifts in bacterial communities on reef surfaces (Figure A11-1c and d). It has been shown that increased temperatures change the phylogenetic composition of CCA-associated bacterial communities and the success of larval recruitment (Webster et al., 2011). In addition, it was recently shown that decreased pH inhibits settlement of the coral Porites astreoides (Albright et al., 2008, 2010). Temperature may affect the growth, abundance, or bioactive metabolite biosynthesis of beneficial bacteria, particularly Pseudoalteromonas spp., on reef surfaces that are important for successful recruitment, which can ultimately result in a decline of new recruitment on reefs. Though the effects of decreased pH on surface biofilms have not been well described, this condition may alter the bacterial biofilm community and influence larval settlement success. Figure A11-1c and d shows a schematic model of reef surface-associated microbes before (c) and after (d) increased sea-surface temperature or ocean acidification. In ambient conditions on the reef, CCAs, or bacteria growing on CCA surfaces, produce compounds that facilitate larval settlement (Figure A11-1c). When sea-surface temperatures increase, bacterial communities on CCAs change, resulting in lower larval recruitment rates (Figure A11-1d). Similarly, as pH decreases, larval settlement decreases (Albright et al., 2008, 2010). It is hypothesized that the inductive properties of CCAs, whether they are due to compounds released by bacterial biofilms on CCAs or by the CCAs themselves, decrease (Figure A11-1d). As in the coral mucus (Figure A11-1a and b), there is a shift in the bacterial community of the reef surfaces. 
In this case, under increased sea-surface temperatures, a bacterial community dominated by inductive bacteria, such as Pseudoalteromonas and Thalassomonas, shifts to one dominated by bacteria that may not have inductive properties.

Next Questions: Microbe-Microbe Interactions in Corals

One of the next steps in increasing our understanding of coral fitness is a comprehensive characterization of coral-associated microbial interactions. For example, it is unclear if Symbiodinium plays a role in selectively recruiting bacteria to corals, if Symbiodinium affects bacterial physiology or secondary metabolite biosynthesis, or if bacterial metabolism influences Symbiodinium activity.

Little is known about the nature of free-living Symbiodinium, including what bacterial mutualisms may be present before coral acquisition of Symbiodinium in cases where the algal symbiont is not transmitted vertically. Members of the Roseobacteriales group are specifically present in association with Symbiodinium cultures and are able to increase Symbiodinium growth rates in vitro (Ritchie, 2011). This observed association between α-proteobacteria and dinoflagellates may be a true mutualism with benefits for both the bacteria and the algal host. The bacteria may benefit by having a readily available source of organic compounds such as dimethylsulfoniopropionate (DMSP), a preferred source of reduced sulfur (Miller and Belas, 2004; Raina et al., 2010). The algae may derive benefits from the bacterial production of antimicrobials such as tropodithietic acid and of bioactive compounds such as vitamin B-12 (Geng and Belas, 2010). A genomic comparison of the Roseobacter clade of α-proteobacteria indicates that some type of surface-associated lifestyle is central to the ecology of all members of the group (Slightom and Buchan, 2009).

Very little is known about how Symbiodinium affects bacterial communities in corals (or vice versa) or how these interactions impact the fitness of the coral host. Recent studies suggest that bacterial communities in juvenile corals differ significantly if they were initially colonized by different strains of Symbiodinium (Littman et al., 2009) with different photosynthetic efficiencies (Littman et al., 2010). It has been hypothesized that DMSP production by Symbiodinium plays a role in structuring bacterial communities in corals by attracting certain bacteria to the surface mucous layer of corals (Raina et al., 2009, 2010).

An important adaptive property of many α-proteobacteria is the presence of a bacterial system for diversity generation facilitated by gene transfer agents (GTAs) (Paul, 2008). GTAs are defective bacteriophages that are able to randomly package bacterial host DNA and transfer DNA to other α-proteobacteria (Paul, 2008). It has recently been shown that Symbiodinium-associated α-proteobacteria produce GTAs and are able to transfer genes to a range of bacteria in the marine environment (McDaniel et al., 2010). Furthermore, gene transfer via this mechanism is much higher in the coral reef environment than in other marine environments, suggesting an alternate mode of adaptation via swapping of potentially beneficial genes among marine bacteria (McDaniel et al., 2010) and possibly the coral holobiont.

A fundamental requirement of model systems is that they address interspecies interactions in a metaorganism. Research on host-microbe interactions can greatly benefit from a well-documented host-microbe study that spans the spectrum from pathogenicity to mutualism. Much work has been done on the basal metazoan Hydra to illustrate the value of a model systems approach (Weis et al., 2008; Bosch et al., 2009). Because Hydra is associated with a limited number of bacteria, it has provided valuable insight into the molecular basis of immunity and symbiosis in simple animals. Cnidarian and dinoflagellate models can also be used to elucidate roles of bacteria in both coral and Symbiodinium biology. Ideally, these models require cultured symbionts (bacterial and dinoflagellate) and an easily maintained cnidarian host (Weis et al., 2008). Our ability to culture many of these bacterial symbionts will aid in exploring functions that are otherwise impossible to study due to the complex nature of the coral holobiont. Generation of genome sequence data from animal hosts and their associated microorganisms will greatly enhance our basic understanding of symbiotic associations at the molecular level. This includes reconstruction of host-symbiont phylogenies, analysis of genes important in specific interactions, comparative genomics, and the application of advanced technologies. The sea anemone Aiptasia pallida has recently been proposed as a model for coral biology for a number of reasons (Weis et al., 2008). While corals are difficult to grow in captivity, this species tolerates laboratory manipulation and grows quickly in aquaria. Many protocols have been developed to manipulate Symbiodinium density in A. pallida without lethal effects on the host, and as a result, this organism has successfully been used to describe mechanisms of coral bleaching (Dunn et al., 2007) and disease (Alagely et al., 2011). Aiptasia pallida represents an opportunity to integrate a model systems approach with novel technologies from the “omics age” to learn more about multipartner interactions in corals at a time of great environmental change.


Acknowledgments

This work was funded in part by the Mote Marine Laboratory Protect Our Reefs Grants Program and by the Dart Foundation. We thank Cathleen Sullivan (MML) for assistance with EndNote formatting, and two anonymous reviewers for improvements in the manuscript.


References

  • Ainsworth TD, Thurber RV, Gates RD. The future of coral reefs: a microbial perspective. Trends Ecol Evol. 2010;25:233–240. [PubMed: 20006405]
  • Alagely A, Krediet CJ, Ritchie KB, Teplitski M. Signaling-mediated cross-talk modulates swarming and biofilm formation in a coral pathogen Serratia marcescens. ISME J. 2011;5:1609–1620. [PMC free article: PMC3176518] [PubMed: 21509042]
  • Albright R, Mason B, Langdon C. Effect of aragonite saturation state on settlement and post-settlement growth of Porites astreoides larvae. Coral Reefs. 2008;27:485–490.
  • Albright R, Mason B, Miller M, Langdon C. Ocean acidification compromises recruitment success of the threatened Caribbean coral Acropora palmata. Proc Natl Acad Sci USA. 2010;107:20400–20404. [PMC free article: PMC2996699] [PubMed: 21059900]
  • Anthony KRN, Kline DI, Diaz-Pulido G, Dove S, Hoegh-Guldberg O. Ocean acidification causes bleaching and productivity loss in coral reef builders. Proc Natl Acad Sci USA. 2008;105:17442–17446. [PMC free article: PMC2580748] [PubMed: 18988740]
  • Anthony KRN, Kleypas JA, Gattuso J-P. Coral reefs modify their seawater carbon chemistry—implications for impacts of ocean acidification. Glob Change Biol. 2011;17:3655–3666.
  • Apprill A, Rappe MS. Response of the microbial community to coral spawning in lagoon and reef flat environments of Hawaii, USA. Aquat Microb Ecol. 2011;62:251–266.
  • Apprill A, Marlow HQ, Martindale MQ, Rappe MS. The onset of microbial associations in the coral Pocillopora meandrina. ISME J. 2009;3:685–699. [PubMed: 19242535]
  • Augustin R, Fraune S, Bosch TC. How Hydra senses and destroys microbes. Semin Immunol. 2010;22:54–58. [PubMed: 20005124]
  • Azam F. Microbial control of oceanic carbon flux: the plot thickens. Science. 1998;280:694–696.
  • Azam F, Worden AZ. Microbes, molecules, and marine ecosystems. Science. 2004;303:1622–1624. [PubMed: 15016987]
  • Barott KL, Rodriguez-Brito B, Janouskovec J, Marhaver KL, Smith JE, Keeling P, Rohwer FL. Microbial diversity associated with four functional groups of benthic reef algae and the reef-building coral Montastraea annularis. Environ Microbiol. 2011;13:1192–1204. [PubMed: 21272183]
  • Ben-Haim Y, Rosenberg E. A novel Vibrio sp. pathogen of the coral Pocillopora damicornis. Mar Biol. 2002;141:47–55.
  • Bosch TCG, McFall-Ngai MJ. Metaorganisms as the new frontier. Zoology. 2011;114:185–190. [PMC free article: PMC3992624] [PubMed: 21737250]
  • Bosch TC, Augustin R, Anton-Erxleben F, Fraune S, Hemmrich G, Zill H, Rosenstiel P, Jacobs G, Schreiber S, Leippe M, et al. Uncovering the evolutionary history of innate immunity: the simple metazoan Hydra uses epithelial cells for host defence. Dev Comp Immunol. 2009;33:559–569. [PubMed: 19013190]
  • Bourne D, Iida Y, Uthicke S, Smith-Keune C. Changes in coral-associated microbial communities during a bleaching event. ISME J. 2007;2:350–363. [PubMed: 18059490]
  • Bourne DG, Garren M, Work TM, Rosenberg E, Smith GW, Harvell CD. Microbial disease and the coral holobiont. Trends Microbiol. 2009;17:554–562. [PubMed: 19822428]
  • Bythell JC, Wild C. Biology and ecology of coral mucus release. J Exp Mar Biol Ecol. 2011;408:88–93.
  • Caldeira K, Archer D, Barry JP, Bellerby RGJ, Brewer PG, Cao L, Dickson AG, Doney SC, Elderfield H, Fabry VJ, et al. Comment on “Modern-age buildup of CO2 and its effects on seawater acidity and salinity” by Hugo A. Loaiciga. Geophys Res Lett. 2007;34 [Cross Ref]
  • Ceh J, van Keulen M, Bourne DG. Coral-associated bacterial communities on Ningaloo Reef, Western Australia. FEMS Microbiol Ecol. 2011;75:134–144. [PubMed: 21044100]
  • Cervino JM, Thompson FL, Gomez-Gil B, Lorence EA, Goreau TJ, Hayes RL, Winiarski-Cervino KB, Smith GW, Hughen K, Bartels E. The Vibrio core group induces yellow band disease in Caribbean and Indo-Pacific reef-building corals. J Appl Microbiol. 2008;105:1658–1671. [PubMed: 18798767]
  • Daniels CA, Zeifman A, Heym K, Ritchie KB, Watson CA, Berzins I, Breitbart M. Spatial heterogeneity of bacterial communities in the mucus of Montastraea annularis. Mar Ecol Prog Ser. 2011;426:29–40.
  • Denner EBM, Smith GW, Busse HJ, Schumann P, Narzt T, Polson SW, Lubitz W, Richardson LL. Aurantimonas coralicida gen. nov., sp. nov., the causative agent of white plague type II on Caribbean scleractinian corals. Int J Syst Evol Microbiol. 2003;53:1115–1122. [PubMed: 12892136]
  • Dinsdale EA, Pantos O, Smriga S, Edwards RA, Angly F, Wegley L, Hatay M, Hall D, Brown E, Haynes M, et al. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS One. 2008;3:e1584. [PMC free article: PMC2253183] [PubMed: 18301735]
  • Dobretsov S, Teplitski M, Paul V. Mini-review: quorum sensing in the marine environment and its relationship to biofouling. Biofouling. 2009;25:413–427. [PubMed: 19306145]
  • Dunn SR, Schnitzler CE, Weis VM. Apoptosis and autophagy as mechanisms of dinoflagellate symbiont release during cnidarian bleaching: every which way you lose. Proc R Soc B Biol Sci. 2007;274:3079–3085. [PMC free article: PMC2293937] [PubMed: 17925275]
  • Ferrier-Pages C, Schoelzke V, Jaubert J, Muscatine L, Hoegh-Guldberg O. Response of a scleractinian coral, Stylophora pistillata, to iron and nitrate enrichment. J Exp Mar Biol Ecol. 2001;259:249–261. [PubMed: 11343715]
  • Fine M, Tchernov D. Ocean acidification and scleractinian corals—Response. Science. 2007;317:1032–1033. [PubMed: 17717167]
  • Fiore CL, Jarett JK, Olson ND, Lesser MP. Nitrogen fixation and nitrogen transformations in marine symbioses. Trends Microbiol. 2010;18:455–463. [PubMed: 20674366]
  • Frias-Lopez J, Bonheyo GT, Jin QS, Fouke BW. Cyanobacteria associated with coral black band disease in Caribbean and Indo-Pacific reefs. Appl Environ Microbiol. 2003;69:2409–2413. [PMC free article: PMC154794] [PubMed: 12676731]
  • Frias-Lopez J, Klaus JS, Bonheyo GT, Fouke BW. Bacterial community associated with black band disease in corals. Appl Environ Microbiol. 2004;70:5955–5962. [PMC free article: PMC522118] [PubMed: 15466538]
  • Garren M, Azam F. New directions in coral reef microbial ecology. Environ Microbiol. 2012;14:833–844. [PubMed: 21955796]
  • Geffen Y, Rosenberg E. Stress-induced rapid release of antibacterials by scleractinian corals. Mar Biol. 2005;146:931–935.
  • Geng HF, Belas R. Molecular mechanisms underlying roseobacter-phytoplankton symbioses. Curr Opin Biotechnol. 2010;21:332–338. [PubMed: 20399092]
  • Gochfeld D, Aeby G. Antibacterial chemical defenses in Hawaiian corals provide possible protection from disease. Mar Ecol Prog Ser. 2008;362:119–128.
  • Guppy R, Bythell JC. Environmental effects on bacterial diversity in the surface mucus layer of the reef coral Montastraea faveolata. Mar Ecol Prog Ser. 2006;328:133–142.
  • Holmes G, Johnstone RW. The role of coral mortality in nitrogen dynamics on coral reefs. J Exp Mar Biol Ecol. 2010;387:1–8.
  • Huettel M, Wild C, Gonelli S. Mucus trap in coral reefs: formation and temporal evolution of particle aggregates caused by coral mucus. Mar Ecol Prog Ser. 2006;307:69–84.
  • Johnson CN, Flowers AR, Noriea NF III, Zimmerman AM, Bowers JC, DePaola A, Grimes DJ. Relationships between environmental factors and pathogenic Vibrios in the northern Gulf of Mexico. Appl Environ Microbiol. 2010;76:7076–7084. [PMC free article: PMC2976234] [PubMed: 20817802]
  • Joint I, Doney SC, Karl DM. Will ocean acidification affect marine microbes? ISME J. 2011;5:1–7. [PMC free article: PMC3105673] [PubMed: 20535222]
  • Kelman D, Kashman Y, Rosenberg E, Kushmaro A, Loya Y. Antimicrobial activity of Red Sea corals. Mar Biol. 2006;149:357–363.
  • Kimes NE, Van Nostrand JD, Weil E, Zhou JZ, Morris PJ. Microbial functional structure of Montastraea faveolata, an important Caribbean reef-building coral, differs between healthy and yellow-band diseased colonies. Environ Microbiol. 2010;12:541–556. [PubMed: 19958382]
  • Kline DI, Kuntz NM, Breitbart M, Knowlton N, Rohwer F. Role of elevated organic carbon levels and microbial activity in coral mortality. Mar Ecol Prog Ser. 2006;314:119–125.
  • Knowlton N, Rohwer F. Multispecies microbial mutualisms on coral reefs: the host as a habitat. Am Nat. 2003;162:S51–S62. [PubMed: 14583857]
  • Koh EGL. Do scleractinian corals engage in chemical warfare against microbes? J Chem Ecol. 1997;23:379–398.
  • Krediet CJ, Ritchie KB, Cohen M, Lipp EK, Sutherland KP, Teplitski M. Utilization of mucus from the coral Acropora palmata by the pathogen Serratia marcescens and by environmental and coral commensal bacteria. Appl Environ Microbiol. 2009;75:3851–3858. [PMC free article: PMC2698349] [PubMed: 19395569]
  • Krediet CJ, Ritchie KB, Teplitski M. Catabolite regulation of enzymatic activities in a white pox pathogen and commensal bacteria during growth on mucus polymers from the coral Acropora palmata. Dis Aquat Org. 2009;87:57–66. [PubMed: 20095241]
  • Kuffner IB, Paul VJ. Effects of the benthic cyanobacterium Lyngbya majuscula on larval recruitment of the reef corals Acropora surculosa and Pocillopora damicornis. Coral Reefs. 2004;23:455–458.
  • Kuffner IB, Walters LJ, Becerro MA, Paul VJ, Ritson-Williams R, Beach KS. Inhibition of coral recruitment by macroalgae and cyanobacteria. Mar Ecol Prog Ser. 2006;323:107–117.
  • Kuffner IB, Andersson AJ, Jokiel PL, Rodgers KuS, Mackenzie FT. Decreased abundance of crustose coralline algae due to ocean acidification. Nat Geosci. 2008;1:114–117.
  • Kushmaro A, Rosenberg E, Fine M, Loya Y. Bleaching of the coral Oculina patagonica by Vibrio AK-1. Mar Ecol Prog Ser. 1997;147:159–165.
  • Kvennefors ECE, Roff G. Evidence of cyanobacteria-like endosymbionts in Acroporid corals from the Great Barrier Reef. Coral Reefs. 2009;28:547–547.
  • Kvennefors ECE, Leggat W, Hoegh-Guldberg O, Degnan BM, Barnes AC. An ancient and variable mannose-binding lectin from the coral Acropora millepora binds both pathogens and symbionts. Dev Comp Immunol. 2008;32:1582–1592. [PubMed: 18599120]
  • Kvennefors ECE, Sampayo EM, Ridgway T, Barnes AC, Hoegh-Guldberg O. Bacterial communities of two ubiquitous Great Barrier Reef corals reveals both site- and species-specificity of common bacterial associates. PLoS One. 2010;5:e10401. [PMC free article: PMC2861602] [PubMed: 20454460]
  • Kvennefors E, Sampayo E, Kerr C, Vieira G, Roff G, Barnes A. Regulation of bacterial communities through antimicrobial activity by the coral holobiont. Microb Ecol. 2012;63:605–618. [PubMed: 21984347]
  • Lesser MP, Falcon LI, Rodriguez-Roman A, Enriquez S, Hoegh-Guldberg O, Iglesias-Prieto R. Nitrogen fixation by symbiotic cyanobacteria provides a source of nitrogen for the scleractinian coral Montastraea cavernosa. Mar Ecol Prog Ser. 2007;346:143–152.
  • Littman RA, Willis BL, Bourne DG. Bacterial communities of juvenile corals infected with different Symbiodinium (dinoflagellate) clades. Mar Ecol Prog Ser. 2009;389:45–59.
  • Littman RA, Bourne DG, Willis BL. Responses of coral-associated bacterial communities to heat stress differ with Symbiodinium type on the same coral host. Mol Ecol. 2010;19:1978–1990. [PubMed: 20529072]
  • Mao-Jones J, Ritchie KB, Jones LE, Ellner SP. How microbial community composition regulates coral disease development. PLoS Biol. 2010;8:e1000345. [PMC free article: PMC2846858] [PubMed: 20361023]
  • Marquis CP, Baird AH, de Nys R, Holmstrom C, Koziumi N. An evaluation of the antimicrobial properties of the eggs of 11 species of scleractinian corals. Coral Reefs. 2005;24:248–253.
  • McDaniel LD, Young E, Delaney J, Ruhnau F, Ritchie KB, Paul JH. High frequency of horizontal gene transfer in the oceans. Science. 2010;330:50–50. [PubMed: 20929803]
  • McFall-Ngai M, Heath-Heckman EA, Gillette AA, Peyer SM, Harvie EA. The secret languages of coevolved symbioses: insights from the Euprymna scolopes-Vibrio fischeri symbiosis. Semin Immunol. 2012;24:3–8. [PMC free article: PMC3288948] [PubMed: 22154556]
  • Meron D, Atias E, Iasur Kruh L, Elifantz H, Minz D, Fine M, Banin E. The impact of reduced pH on the microbial community of the coral Acropora eurystoma. ISME J. 2011;5:51–60. [PMC free article: PMC3105665] [PubMed: 20668489]
  • Miller TR, Belas R. Dimethylsulfoniopropionate metabolism by Pfiesteria-associated Roseobacter spp. Appl Environ Microbiol. 2004;70:3383–3391. [PMC free article: PMC427730] [PubMed: 15184135]
  • Morrow KM, Paul VJ, Liles MR, Chadwick NE. Allelochemicals produced by Caribbean macroalgae and cyanobacteria have species-specific effects on reef coral microorganisms. Coral Reefs. 2011;30:309–320.
  • Negri AP, Webster N, Hill RT, Heyward AJ. Metamorphosis of broadcast spawning corals in response to bacteria isolated from crustose algae. Mar Ecol Prog Ser. 2001;223:121–131.
  • Ng W-L, Bassler BL. Bacterial quorum-sensing network architectures. Annu Rev Genet. 2009;43:197–222. [PMC free article: PMC4313539] [PubMed: 19686078]
  • Nielsen AT, Tolker-Nielsen T, Barken KB, Molin S. Role of commensal relationships on the spatial structure of a surface-attached microbial consortium. Environ Microbiol. 2000;2:59–68. [PubMed: 11243263]
  • Nissimov J, Rosenberg E, Munn CB. Antimicrobial properties of resident coral mucus bacteria of Oculina patagonica. FEMS Microbiol Lett. 2009;292:210–215. [PubMed: 19191871]
  • Olson ND, Ainsworth TD, Gates RD, Takabayashi M. Diazotrophic bacteria associated with Hawaiian Montipora corals: diversity and abundance in correlation with symbiotic dinoflagellates. J Exp Mar Biol Ecol. 2009;371:140–146.
  • Pandolfi JM, Bradbury RH, Sala E, Hughes TP, Bjorndal KA, Cooke RG, McArdle D, McClenachan L, Newman MJH, Paredes G, Warner RR, Jackson JBC. Global trajectories of the long-term decline of coral reef ecosystems. Science. 2003;301:955–958. [PubMed: 12920296]
  • Patterson KL, Porter JW, Ritchie KB, Polson SW, Mueller E, Peters EC, Santavy DL, Smith GW. The etiology of white pox, a lethal disease of the Caribbean elkhorn coral, Acropora palmata. Proc Natl Acad Sci USA. 2002;99:8725–8730. [PMC free article: PMC124366] [PubMed: 12077296]
  • Paul JH. Prophages in marine bacteria: dangerous molecular time bombs or the key to survival in the seas? ISME J. 2008;2:579–589. [PubMed: 18521076]
  • Pollock FJ, Morris PJ, Willis BL, Bourne DG. The urgent need for robust coral disease diagnostics. PLoS Pathog. 2011;7:e1002183. [PMC free article: PMC3197597] [PubMed: 22028646]
  • Raina J-B, Tapiolas D, Willis BL, Bourne DG. Coral-associated bacteria and their role in the biogeochemical cycling of sulfur. Appl Environ Microbiol. 2009;75:3492–3501. [PMC free article: PMC2687302] [PubMed: 19346350]
  • Raina JB, Dinsdale EA, Willis BL, Bourne DG. Do the organic sulfur compounds DMSP and DMS drive coral microbial associations? Trends Microbiol. 2010;18:101–108. [PubMed: 20045332]
  • Rajamani S, Bauer WD, Robinson JB, Farrow JM III, Pesci EC, Teplitski M, Gao M, Sayre RT, Phillips DA. The vitamin riboflavin and its derivative lumichrome activate the LasR bacterial quorum-sensing receptor. Mol Plant Microbe Interact. 2008;21:1184–1192. [PMC free article: PMC3856186] [PubMed: 18700823]
  • Rao D, Webb JS, Kjelleberg S. Competitive interactions in mixed-species biofilms containing the marine bacterium Pseudoalteromonas tunicata. Appl Environ Microbiol. 2005;71:1729–1736. [PMC free article: PMC1082554] [PubMed: 15811995]
  • Rasmussen TB, Givskov M. Quorum-sensing inhibitors as anti-pathogenic drugs. Int J Med Microbiol. 2006;296:149–161. [PubMed: 16503194]
  • Reis AM, Araujo SD Jr, Moura RL, Francini-Filho RB, Coelho G, Pappas AM Jr, Kruger RH, Thompson FL. Bacterial diversity associated with the Brazilian endemic reef coral Mussismilia braziliensis. J Appl Microbiol. 2009;106:1378–1387. [PubMed: 19187136]
  • Reshef L, Koren O, Loya Y, Zilber-Rosenberg I, Rosenberg E. The coral probiotic hypothesis. Environ Microbiol. 2006;8:2068–2073. [PubMed: 17107548]
  • Ritchie KB. Regulation of microbial populations by coral surface mucus and mucus-associated bacteria. Mar Ecol Prog Ser. 2006;322:1–14.
  • Ritchie KB. Bacterial symbionts of corals and Symbiodinium. In: Rosenberg E, Gophna U, editors. Beneficial Microorganisms in Multicellular Life Forms. Springer; Heidelberg: 2011. pp. 139–150.
  • Ritson-Williams R, Paul VJ, Bonito V. Marine benthic cyanobacteria overgrow coral reef organisms. Coral Reefs. 2005;24:629–629.
  • Ritson-Williams R, Arnold SN, Fogarty ND, Steneck RS, Vermeij MJA, Paul VJ. New perspectives on ecological mechanisms affecting coral recruitment on reefs. In: Lang MA, Macintyre IG, Rützler K, editors. Proceedings of the Smithsonian Marine Science Symposium: Smithsonian Contributions to the Marine Sciences. Smithsonian Institution Scholarly Press; Washington, D.C: 2009. pp. 437–457.
  • Ritson-Williams R, Paul VJ, Arnold SN, Steneck RS. Larval settlement preferences and post-settlement survival of the threatened Caribbean corals Acropora palmata and A. cervicornis. Coral Reefs. 2010;29:71–81.
  • Rohwer FR, Breitbart MB, Jara JJ, Azam FA, Knowlton NK. Diversity of bacteria associated with the Caribbean coral Montastraea franksi. Coral Reefs. 2001;20:85–91.
  • Rohwer F, Seguritan V, Azam F, Knowlton N. Diversity and distribution of coral-associated bacteria. Mar Ecol Prog Ser. 2002;243:1–10.
  • Rosenberg E, Koren O, Reshef L, Efrony R, Zilber-Rosenberg I. The role of microorganisms in coral health, disease and evolution. Nat Rev Microbiol. 2007;5:355–362. [PubMed: 17384666]
  • Rosenberg E, Kushmaro A, Kramarsky-Winter E, Banin E, Yossi L. The role of microorganisms in coral bleaching. ISME J. 2009;3:139–146. [PubMed: 19005495]
  • Rowan R, Knowlton N. Intraspecific diversity and ecological zonation in coral-algal symbiosis. Proc Natl Acad Sci USA. 1995;92:2850–2853. [PMC free article: PMC42316] [PubMed: 7708736]
  • Schmitt S, Weisz JB, Lindquist N, Hentschel U. Vertical transmission of a phylogenetically complex microbial consortium in the viviparous sponge Ircinia felix. Appl Environ Microbiol. 2007;73:2067–2078. [PMC free article: PMC1855684] [PubMed: 17277226]
  • Schmitt S, Tsai P, Bell J, Fromont J, Ilan M, Lindquist N, Perez T, Rodrigo A, Schupp PJ, Vacelet J, Webster N, Hentschel U, Taylor MW. Assessing the complex sponge microbiota: core, variable and species-specific bacterial communities in marine sponges. ISME J. 2011;6:564–576. [PMC free article: PMC3280146] [PubMed: 21993395]
  • Sekar R, Kaczmarsky LT, Richardson LL. Microbial community composition of black band disease on the coral host Siderastrea siderea from three regions of the wider Caribbean. Mar Ecol Prog Ser. 2008;362:85–98.
  • Sharon G, Rosenberg E. Healthy corals maintain Vibrio in the VBNC state. Environ Microbiol Rep. 2010;2:116–119. [PubMed: 23766005]
  • Sharp KH, Eam B, Faulkner DJ, Haygood MG. Vertical transmission of diverse microbes in the tropical sponge Corticium sp. Appl Environ Microbiol. 2007;73:622–629. [PMC free article: PMC1796987] [PubMed: 17122394]
  • Sharp KH, Ritchie KB, Schupp PJ, Ritson-Williams R, Paul VJ. Bacterial acquisition in juveniles of several broadcast spawning coral species. PLoS One. 2010;5:e10898. [PMC free article: PMC2878338] [PubMed: 20526374]
  • Sharp KH, Distel D, Paul VJ. Diversity and dynamics of bacterial communities in early life stages of the Caribbean coral Porites astreoides. ISME J. 2012;6:790–801. [PMC free article: PMC3309355] [PubMed: 22113375]
  • Shnit-Orland M, Kushmaro A. Coral mucus-associated bacteria: a possible first line of defense. FEMS Microbiol Ecol. 2009;67:371–380. [PubMed: 19161430]
  • Skindersoe ME, Ettinger-Epstein P, Rasmussen TB, Bjarnsholt T, de Nys R, Givskov M. Quorum sensing antagonism from marine organisms. Mar Biotechnol. 2008;10:56–63. [PubMed: 17952508]
  • Slightom RN, Buchan A. Surface colonization by marine roseobacters: integrating genotype and phenotype. Appl Environ Microbiol. 2009;75:6027–6037. [PMC free article: PMC2753062] [PubMed: 19666726]
  • Smith JE, Shaw M, Edwards RA, Obura D, Pantos O, Sala E, Sandin SA, Smriga S, Hatay M, Rohwer FL. Indirect effects of algae on coral: algae-mediated, microbe-induced coral mortality. Ecol Lett. 2006;9:835–845. [PubMed: 16796574]
  • Stat M, Carter D, Hoegh-Guldberg O. The evolutionary history of Symbiodinium and scleractinian hosts—symbiosis, diversity, and the effect of climate change. Perspect Plant Ecol Evol Syst. 2006;8:23–43.
  • Sunagawa S, Woodley CM, Medina M. Threatened corals provide underexplored microbial habitats. PLoS One. 2010;5:e9554. [PMC free article: PMC2832684] [PubMed: 20221265]
  • Sutherland KP, Shaban S, Joyner JL, Porter JW, Lipp EK. Human pathogen shown to cause disease in the threatened elkhorn coral Acropora palmata. PLoS One. 2011;6:e23468. [PMC free article: PMC3157384] [PubMed: 21858132]
  • Sweet MJ, Croquer A, Bythell JC. Bacterial assemblages differ between compartments within the coral holobiont. Coral Reefs. 2011;30:39–52.
  • Sweet MJ, Croquer A, Bythell JC. Development of bacterial biofilms on artificial corals in comparison to surface-associated microbes of hard corals. PLoS One. 2011;6:e21195. [PMC free article: PMC3123308] [PubMed: 21731669]
  • Tait K, Hutchison Z, Thompson FL, Munn CB. Quorum sensing signal production and inhibition by coral-associated vibrios. Environ Microbiol Rep. 2010;2:145–150. [PubMed: 23766010]
  • Tebben J, Tapiolas DM, Motti CA, Abrego D, Negri AP, Blackall LL, Steinberg PD, Harder T. Induction of larval metamorphosis of the coral Acropora millepora by tetrabromopyrrole isolated from a Pseudoalteromonas bacterium. PLoS One. 2011;6:e19082. [PMC free article: PMC3084748] [PubMed: 21559509]
  • Teplitski M, Ritchie K. How feasible is the biological control of coral diseases? Trends Ecol Evol. 2009;24:378–385. [PubMed: 19406502]
  • Teplitski M, Warriner K, Bartz J, Schneider KR. Untangling metabolic and communication networks: interactions of enterics with phytobacteria and their implications in produce safety. Trends Microbiol. 2011;19:121–127. [PubMed: 21177108]
  • Thompson JR, Randa MA, Marcelino LA, Tomita-Mitchell A, Lim E, Polz MF. Diversity and dynamics of a north Atlantic coastal Vibrio community. Appl Environ Microbiol. 2004;70:4103–4110. [PMC free article: PMC444776] [PubMed: 15240289]
  • Thurber RV, Willner-Hall D, Rodriguez-Mueller B, Desnues C, Edwards RA, Angly F, Dinsdale E, Kelly L, Rohwer F. Metagenomic analysis of stressed coral holobionts. Environ Microbiol. 2009;11:2148–2163. [PubMed: 19397678]
  • Tran C, Hadfield MG. Larvae of Pocillopora damicornis(Anthozoa) settle and metamorphose in response to surface-biofilm bacteria. Mar Ecol Prog Ser. 2011;433:85–96.
  • Vidal-Dupiol J, Ladriere O, Destoumieux-Garzon D, Sautiere PE, Meistertzheim AL, Tambutte E, Tambutte S, Duval D, Foure L, Adjeroud M, Mitta G. Innate immune responses of a scleractinian coral to vibriosis. J Biol Chem. 2011;286:22688–22698. [PMC free article: PMC3121412] [PubMed: 21536670]
  • Vidal-Dupiol J, Ladriere O, Meistertzheim AL, Foure L, Adjeroud M, Mitta G. Physiological responses of the scleractinian coral Pocillopora damicornis to bacterial stress from Vibrio coralliilyticus. J Exp Biol. 2011;214:1533–1545. [PubMed: 21490261]
  • Wagner-Dobler I, Biebl H. Environmental biology of the marine Roseobacter lineage. Annu Rev Microbiol. 2006;60:255–280. [PubMed: 16719716]
  • Webster NS, Webb RI, Ridd MJ, Hill RT, Negri AP. The effects of copper on the microbial community of a coral reef sponge. Environ Microbiol. 2001;3:19–31. [PubMed: 11225720]
  • Webster NS, Smith LD, Heyward AJ, Watts JEM, Webb RI, Blackall LL, Negri AP. Metamorphosis of a scleractinian coral in response to microbial biofilms. Appl Environ Microbiol. 2004;70:1213–1221. [PMC free article: PMC348907] [PubMed: 14766608]
  • Webster NS, Soo R, Cobb R, Negri AP. Elevated seawater temperature causes a microbial shift on crustose coralline algae with implications for the recruitment of coral larvae. ISME J. 2011;5:759–770. [PMC free article: PMC3105747] [PubMed: 20944682]
  • Wegley L, Edwards R, Rodriguez-Brito B, Liu H, Rohwer F. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ Microbiol. 2007;9:2707–2719. [PubMed: 17922755]
  • Weil E, Rogers CS. Coral reef diseases in the Atlantic-Caribbean. In: Dubinsky Z, Stambler N, editors. Coral Reefs: An Ecosystem in Transition. Springer, Science Business Media; Dordrecht, The Netherlands: 2011. pp. 465–491.
  • Weis VM, Davy SK, Hoegh-Guldberg O, Rodriguez-Lanetty M, Pringe JR. Cell biology in model systems as the key to understanding corals. Trends Ecol Evol. 2008;23:369–376. [PubMed: 18501991]
  • Wild C, Huettel M, Klueter A, Kremb SG, Rasheed MYM, Jorgensen BB. Coral mucus functions as an energy carrier and particle trap in the reef ecosystem. Nature. 2004;428:66–70. [PubMed: 14999280]
  • Wild C, Naumann MS, Haas A, Struck U, Mayer FW, Rasheed MY, Huettel M. Coral sand O2 uptake and pelagicbenthic coupling in a subtropical fringing reef, Aqaba, Red Sea. Aquat Biol. 2009;6:133–142.
  • Wood-Charlson EM, Hollingsworth LL, Krupp DA, Weis VM. Lectin/glycan interactions play a role in recognition in a coral/dinoflagellate symbiosis. Cell Microbiol. 2006;8:1985–1993. [PubMed: 16879456]
  • Yi H, Lim YW, Chun J. Taxonomic evaluation of the genera Ruegeria and Silicibacter: a proposal to transfer the genus Silicibacter Petursdottir and Kristjansson 1999 to the genus Ruegeria Uchino, et al. 1999. Int J Syst Evol Microbiol. 2007;57:815–819. [PubMed: 17392212]
  • Zilber-Rosenberg I, Rosenberg E. Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol Rev. 2008;32:723–735. [PubMed: 18549407]


,47 ,48 ,49 and 47,*.

47 Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA.
48 Department of Plant Pathology and Microbiology, University of California, Riverside, California, USA.
49 Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA.


Understanding the molecular mechanisms of pathogen emergence is central to mitigating the impacts of novel infectious disease agents. The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen of amphibians that has been implicated in amphibian declines worldwide. Bd is the only member of its clade known to attack vertebrates. However, little is known about the molecular determinants of, or evolutionary transition to, pathogenicity in Bd. Here we sequence the genome of Bd’s closest known relative, the non-pathogenic chytrid Homolaphlyctis polyrhiza (Hp). We first describe the genome of Hp, which is comparable to other chytrid genomes in size and number of predicted proteins. We then compare the genomes of Hp, Bd, and 19 additional fungal genomes to identify unique or recent evolutionary elements in the Bd genome. We identified 1,974 Bd-specific genes, a gene set that is enriched for protease, lipase, and microbial effector gene ontology terms. We describe significant lineage-specific expansions in three Bd protease families (metallo-, serine-type, and aspartyl proteases). We show that these protease gene family expansions occurred after the divergence of Bd and Hp from their common ancestor and thus are localized to the Bd branch. Finally, we demonstrate that the timing of the protease gene family expansions predates the emergence of Bd as a globally important amphibian pathogen.

Author Summary

The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen that has been implicated in decimating amphibian populations around the world. Bd is the only member of an ancient group of fungi (called the Chytridiomycota) that is known to attack vertebrates. The question of how an amphibian-killing fungus evolved from non-pathogenic ancestors is vital to protecting the world’s remaining amphibians from Bd. We sequenced the genome of Bd’s closest known relative—a non-pathogenic chytrid named Homolaphlyctis polyrhiza (Hp). We compared the genomes of Bd, Hp and 18 additional fungi to identify what makes Bd unique. We identified a large number of Bd-specific genes, a gene set that contains a number of possible pathogenicity factors. In particular, we describe a large number of protease genes in the Bd genome and show that these genes were duplicated after the divergence of Bd and Hp from their common ancestor. Studying Bd’s pathogenesis in an evolutionary context provides new evidence for the role of protease genes in Bd’s ability to kill amphibians.


Understanding the emergence of novel pathogens is a central challenge in epidemiology, disease ecology, and evolutionary biology. Emerging pathogens of humans, wildlife, and agriculturally important crops generally have a dynamic recent evolutionary past. For example, many emerging pathogens have become adapted to new environmental conditions, shifted their host range, and/or evolved more virulent forms (Hoskisson and Trevors, 2010; Smith and Guegan, 2010; Woolhouse and Gaunt, 2007). Identifying the genetic basis of these evolutionary shifts can lend insight into the mechanisms of pathogen emergence.

Studies of the amphibian-killing fungus Batrachochytrium dendrobatidis (Bd) provide an opportunity to better understand evolutionary transitions to pathogenicity. Bd is considered the leading cause of amphibian declines worldwide and is found on every continent where amphibians occur (Berger et al., 1998; Lips et al., 2006). Bd infects amphibian skin and the resulting disease, chytridiomycosis, is responsible for population declines and extirpations in hundreds of amphibian species (Lötters et al., 2004; Skerratt et al., 2007). Bd is the only documented vertebrate pathogen in a diverse, early-branching lineage of fungi called the Chytridiomycota. Some chytrids are pathogens of plants, but most chytrids are primarily known to survive on decaying organic material as saprobes (James et al., 2006). The question of how an amphibian-killing fungus evolved from an ancestor that was not a vertebrate pathogen is vital to understanding and mitigating the chytridiomycosis epidemic and will also shed light on the evolution of novel pathogens more broadly.

Investigating the transition to pathogenicity in chytrid fungi requires an explicitly evolutionary perspective. Specifically, identifying elements of the genome that have undergone recent evolution in the branch leading to Bd may help us determine how Bd attacks its amphibian hosts. Previously we identified several families of proteases that may be involved in Bd’s ability to infect amphibian skin. Specifically, we found expanded gene families of metallo- and serine proteases in the Bd genome that exhibit life-stage specific gene expression patterns (Rosenblum et al., 2008). These proteases have been hypothesized to play a role in the ability of other fungal pathogens to invade and degrade host tissue (Burmester et al., 2011; da Silva et al., 2006; Monod, 2008; Monod et al., 2002). However, previous studies could not resolve if these gene family expansions occurred along the branch leading to Bd because the fungal genomes available for comparison were only distantly related to Bd.

To determine what unique features of the Bd genome might relate to its ability to colonize amphibian skin, we compared genomes of Bd and its closest known relative, Homolaphlyctis polyrhiza (Hp) (this isolate has been described by Joyce Longcore [pers. comm.] and has been referred to as “JEL142” in previous publications [James et al., 2006]). Bd and Hp are in the same Rhizophydiales order (Letcher et al., 2006), and Bd is the only member of this clade known to be a vertebrate pathogen (James et al., 2006). We first confirmed that Hp cannot survive on amphibian skin alone. We then sequenced and characterized the genome of Hp using Roche-454 pyrosequencing. Finally, we used a comparative genomics approach to identify differences between Bd and Hp using additional fungal species as outgroups. Based on identified unique elements of the Bd genome, we develop hypotheses for the mechanisms and evolution of Bd pathogenicity.

Materials and Methods

Taxon Sampling

Our focal isolates were the JAM81 strain of Bd and the JEL142 strain of Hp. JAM81 was isolated from Rana muscosa in the Sierra Nevada Mountains in California, where Bd has caused catastrophic declines in R. muscosa populations (Rachowicz et al., 2006). Hp was collected from leaf litter in Maine and is a presumed saprobe. We also used the information from publicly available genomes of an additional Bd isolate, JEL423, and an additional chytrid, Spizellomyces punctatus, a terrestrial saprobe (Origins of Multicellularity Sequencing Project, Broad Institute of Harvard and MIT). Finally, we used the genome information from 17 additional publicly available fungal genomes (Table S1). We chose these outgroups to represent a broad phylogenetic survey of fungi that span four additional fungal phyla: Blastocladiomycota (Allomyces macrogynus), Zygomycota (Phycomyces blakesleeanus), Basidiomycota (Coprinopsis cinerea, Cryptococcus neoformans, Puccinia graminis f. sp. tritici, Ustilago maydis), and Ascomycota (Arthroderma benhamiae, Aspergillus nidulans, Blastomyces dermatitidis, Botrytis cinerea, Coccidioides immitis, Fusarium graminearum, Microsporum canis, Neurospora crassa, Pyrenophora tritici-repentis, Trichophyton rubrum, and Uncinocarpus reesii). Arthroderma benhamiae, M. canis, and T. rubrum were chosen in particular because they are dermatophytes (i.e., fungal pathogens that infect skin).

We reconstructed the phylogenetic relationships among the 19 taxa used in this study using Bayesian phylogenetic analyses of 51 single-copy genes. The orthologous sequences were aligned with T-Coffee (Notredame et al., 2000), concatenated, and trimmed with trimAl (Capella-Gutiérrez et al., 2009). The resulting alignment comprised 21,182 trimmed amino acid residues in total. The Basidiomycota phylum was constrained by members Ustilago maydis and Puccinia graminis, and the tree was rooted with the Chytridiomycota clade based on James et al. (2006). Bayesian posterior probabilities are shown below internal nodes and ML bootstrap values from 100 replicates above the nodes.

Growth of Bd and Hp on Amphibian Skin

We grew Bd (JAM81) and Hp on the standard growth medium PmTG (made from peptonized milk, tryptone and glucose) (Barr, 1986). After one week of growth, we transferred 3.8×10⁶ zoospores from each isolate to 3 mL of two liquid growth conditions: standard growth media and amphibian skin. For standard growth media we used 1% liquid PmTG, and for amphibian skin we used 10% w/v pulverized and autoclaved cane-toad skin in water. We established six technical replicates of each isolate in each condition. Liquid cultures were gently shaken in 6-well tissue culture plates. To test how long Bd and Hp survived in each growth condition, we tested an aliquot from each culture every day for 14 days. Each day we removed 15 μL from each of the technical replicates, pooled aliquots for each isolate in each treatment group, and inoculated PmTG-agar growth plates. We inspected growth plates every day using 200× magnification to visualize whether active zoospores were produced.

Hp Genome Sequence, Assembly, and Annotation

We grew Hp at room temperature (23–25°C) in liquid PmTG medium with gentle agitation for approximately 2 weeks. We extracted Hp DNA using the protocol of Zolan and Pukkila (1986), modified by the use of 2% sodium dodecyl sulphate as extraction buffer in place of CTAB. We sequenced the Hp genome using a Roche 454 Genome Sequencer FLX with Titanium chemistry and standard Roche protocol. We screened and trimmed 1,100,797 reads of vector sequences and assembled them with Roche’s GS De Novo Assembler. We improved the assembly by synteny-based alignment to the JAM81 genome sequence with Mercator (Dewey, 2007).

We annotated the Hp genome with predicted proteins using the MAKER annotation pipeline (Cantarel et al., 2008). MAKER predicts proteins based on homology with protein-coding sequences of other species, and with the consensus of the ab initio gene prediction algorithms GeneMark, AUGUSTUS, and SNAP. GeneMark is self-training, so we simply applied it to determine ab initio parameters. We trained AUGUSTUS using parameters provided in the MAKER package and previously determined Bd training parameters. We trained SNAP by iteratively running MAKER with SNAP Bd models and then retraining on the most confident gene model parameters from the initial run. All parameter files are available in […]. Because MAKER’s final set of predicted proteins (referred to hereafter as “Hp_Maker”) is a conservative estimate that relies upon the consensus of different prediction algorithms, we also used the set of ab initio predicted proteins in MAKER by GeneMark-ES (Ter-Hovhannisyan et al., 2008) as an upper limit (referred to hereafter as “Hp_GeneMark”). Hp_Maker is not a perfect subset of Hp_GeneMark, so we considered both datasets when characterizing the proteome of Hp. We annotated Hp protein models by comparison to the Pfam database of protein domains (Finn et al., 2010) using HMMER 3.0.

We used two methods that rely on different algorithms to confirm that we successfully identified the majority of Hp proteins. First, we used the eukaryotic genome annotation pipeline CEGMA to predict the number of core eukaryotic genes in the Hp alignment (Parra et al., 2007). Second we determined the number of “chytrid-specific” orthologous groups that were present in the Hp genome. We defined chytrid-specific orthologous groups as those groups shared between all available Chytridiomycota genomes: two Bd isolates (JAM81 and JEL423) and one Spizellomyces punctatus isolate (DAOM BR117) (Table S1). We identified chytrid-specific orthologous groups using BLASTP (Altschul et al., 1990) and OrthoMCL (Li et al., 2003), and determined how many of these were also found within either set of Hp predicted proteins (i.e., Hp_Maker and Hp_GeneMark).

Bd Unique Genomic Features

We also used BLASTP and OrthoMCL to determine orthologous groups for all sampled taxa. These orthologous groups were used to determine “Bd-specific” genes which we defined as those groups or genes that were present in both sequenced Bd genomes (JAM81 and JEL423) but absent from all other sampled fungi. [Note that the Bd-specific gene set is distinct from the more broadly defined chytrid-specific gene set discussed above]. We used GO::TermFinder (Boyle et al., 2004) to determine if the Pfam annotations for the set of Bd-specific genes showed enrichment for particular GO terms.
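The Bd-specific definition above amounts to a set filter over orthologous groups. The following is a minimal sketch of that filter, not the pipeline used in the study. It assumes a hypothetical dictionary that maps each orthologous group to the set of genomes contributing at least one member; the group IDs and genome labels are illustrative and are not taken from the OrthoMCL output.

```python
# Hypothetical input: orthologous group ID -> genomes with at least one member.
# Genome labels and group IDs are illustrative only.
BD_GENOMES = {"Bd_JAM81", "Bd_JEL423"}


def bd_specific_groups(group_to_genomes):
    """Return IDs of groups present in both Bd genomes and in no other genome."""
    return sorted(
        group_id
        for group_id, genomes in group_to_genomes.items()
        if set(genomes) == BD_GENOMES
    )


example_groups = {
    "OG0001": {"Bd_JAM81", "Bd_JEL423"},        # present in both Bd isolates only
    "OG0002": {"Bd_JAM81", "Bd_JEL423", "Hp"},  # shared with Hp
    "OG0003": {"Bd_JAM81"},                     # found in only one Bd isolate
    "OG0004": {"Hp", "S_punctatus"},            # absent from Bd
}

print(bd_specific_groups(example_groups))
```

Requiring presence in both Bd isolates guards against calling a gene Bd-specific on the basis of an annotation artifact in a single assembly.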

Bd Gene Family Expansions

We identified several gene family expansions in Bd through inspection of the top ten largest Bd-specific orthologous groups and inspection of enriched GO categories. We found gene family expansions in families with genes containing M36, S41, and Asp (both Asp and Asp_protease) protease Pfam signature domains (see Table S2 for sequences and their Pfam domain delimitation). We conducted an exhaustive search in the focal genomes for M36, S41, and Asp domains using HMMER3. For Hp we conducted the HMMER3 search in both the MAKER and GeneMark datasets. For S41 and Asp, the predicted proteins from MAKER were subsets of those from GeneMark, so we only report GeneMark names. For M36 there were several MAKER predicted proteins that were not included in the GeneMark set, so we report both MAKER and GeneMark names. We then aligned the sequences of the protein domains for all members in each expanded family for the three Chytridiomycota genomes (Bd, Hp, and Spizellomyces punctatus) and one Blastocladiomycota outgroup (Allomyces macrogynus). We generated these alignments using the iterative alignment program MUSCLE (Edgar, 2004). After inspecting the alignments, we found that 8 M36 and 13 Asp protein sequences were missing >50% of their domain sequences. These partial sequences were likely mis-annotations or pseudogenes, so we excluded them from further analysis (see Table S2B for identities of excluded partial sequences). After aligning the protein domain sequences of the remaining proteins (see Figure S1 for alignments), we reconstructed gene trees for each family using the Maximum Likelihood method implemented in RAxML (Stamatakis et al., 2005). We used the rapid bootstrap algorithm (400 replicates) with the Jones-Taylor-Thornton substitution matrix assuming a gamma model of rate heterogeneity. We report the Maximum Likelihood trees with the highest log likelihood score and bootstrap support values.
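The exclusion of partial sequences can be illustrated with a small filter over an aligned domain. This is a sketch under stated assumptions: the text does not say exactly how "missing >50%" was measured, so the code below assumes it means that gaps occupy more than half of the alignment columns. The function names and the toy alignment are hypothetical.

```python
def domain_coverage(aligned_seq, gap_chars="-."):
    """Fraction of alignment columns in which this sequence has a residue."""
    if not aligned_seq:
        return 0.0
    residues = sum(1 for ch in aligned_seq if ch not in gap_chars)
    return residues / len(aligned_seq)


def drop_partial_sequences(alignment, min_coverage=0.5):
    """Keep sequences covering at least min_coverage of the aligned domain.

    Assumption: a sequence "missing >50% of its domain" is one whose
    coverage falls below 0.5 of the alignment columns.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if domain_coverage(seq) >= min_coverage
    }


# Toy alignment with made-up gene names and residues.
toy_alignment = {
    "geneA": "MKTLLVAGSA",  # 10 of 10 columns covered
    "geneB": "MK--LVAG-A",  # 7 of 10 columns covered
    "geneC": "------AGSA",  # 4 of 10 columns covered, so it is dropped
}

kept = drop_partial_sequences(toy_alignment)
print(sorted(kept))
```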

We calculated synonymous and non-synonymous substitution rates (Ks and Ka, respectively) with the yn00 program implemented in the PAML package (Yang, 2007) using full length annotated coding sequences. For each expanded protease gene family (containing M36, S41, and Asp domains) we calculated Ks and Ka of putative orthologs between all focal taxa pairs [i.e., chytrids (Bd, Hp, and Spizellomyces punctatus) and between all focal taxa and the outgroup (Allomyces macrogynus)]. We identified putative orthologs based on a cross-species reciprocal best match between any species pairs (Hanada et al., 2008). In addition, we used a second, more stringent approach that required sequence distances between reciprocal best matches to follow the relationships between the four focal species. Because the rate distributions from these two approaches were similar, we only report results from the first approach. Because yn00 does not robustly correct for multiple substitutions (Yang and Bielawski, 2000), and because Ks values are large between our focal taxa, we use Ks values to make a general comparison (within versus between species) for rates of molecular evolution.
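The reciprocal best match criterion for putative orthologs can be sketched as follows. This is an illustration rather than the procedure of Hanada et al. (2008): it assumes similarity scores (for example, bit scores from an all-against-all search) are already available as dictionaries, and all gene names and scores are made up.

```python
def best_hits(scores):
    """For each query, return the subject with the highest score.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genes of two species.
bd_vs_hp = {("bd1", "hp1"): 250, ("bd1", "hp2"): 90,
            ("bd2", "hp2"): 180, ("bd3", "hp1"): 120}
hp_vs_bd = {("hp1", "bd1"): 245, ("hp1", "bd3"): 118,
            ("hp2", "bd2"): 175, ("hp2", "bd1"): 88}

print(reciprocal_best_matches(bd_vs_hp, hp_vs_bd))
```

In the toy data, bd3 is not paired because its best hit, hp1, has a better match in bd1. Members of an expanded family behave the same way: at most one of them can pair with a given gene in the other species under this criterion.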

We made rough divergence time estimates for the duplication events in the three expanded protease gene families using “node-Ks” as a proxy of time. The node Ks is defined as follows: for each node N in the mid-point rooted phylogeny, its Ks is the averaged Ks values between all operational taxonomic unit pairs across the two lineages that originated from N. There are no empirical estimates of chytrid substitution rates, so we do not propose specific dates for the duplication events. However, we do use a rough approximation for a reasonable substitution rate (following previous molecular evolution studies in fungi [Lynch and Conery, 2000]) to test whether the timing of gene duplications was likely coincident with the emergence of Bd as a deadly amphibian pathogen.
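The node-Ks definition above can be written out directly. The sketch below assumes a rooted binary gene tree encoded as nested tuples with leaf names as strings, and a dictionary of pairwise Ks values keyed by unordered leaf pairs. Both representations, and the toy values, are illustrative choices and not the data structures used in the study.

```python
from itertools import product


def leaves(node):
    """List the leaf names under a node of a nested-tuple binary tree."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def node_ks(node, pairwise_ks):
    """Average Ks over all leaf pairs that span the two child lineages of node."""
    left, right = node
    pairs = list(product(leaves(left), leaves(right)))
    return sum(pairwise_ks[frozenset(pair)] for pair in pairs) / len(pairs)


# Toy tree ((g1, g2), g3) with made-up pairwise Ks values.
tree = (("g1", "g2"), "g3")
ks = {
    frozenset(("g1", "g2")): 0.2,
    frozenset(("g1", "g3")): 1.0,
    frozenset(("g2", "g3")): 1.4,
}

print(node_ks(tree, ks))     # root: mean of the g1-g3 and g2-g3 values
print(node_ks(tree[0], ks))  # inner node: the single g1-g2 value
```

Averaging only over pairs that span the two child lineages means each duplication node is summarized by the comparisons that actually cross it, so shallow nodes yield small node-Ks values and deep nodes yield large ones.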


Results

Taxon Sampling

The phylogenetic relationship among all 19 taxa in this study can be seen in Figure A12-1. As described above, we sampled genomes from across the diversity of five fungal phyla (i.e., Chytridiomycota, Blastocladiomycota, Zygomycota, Basidiomycota, Ascomycota). Our sampling scheme allowed us to determine, in a phylogenetic context, which elements of Bd’s genome are shared with Hp and other fungal taxa.

FIGURE A12-1. Phylogenetic relationships among the 19 taxa used in comparative genomics analyses. Focal taxon, Hp, boxed in grey. We compared the Hp genome to the genome of the amphibian pathogen Bd and to a diverse group of other fungal genomes.

Growth of Bd and Hp on Amphibian Skin

Both Bd and Hp grew well in standard PmTG growth media and produced viable zoospores throughout the entire 14 day observation period. However, only Bd survived on frog skin alone. Bd produced viable zoospores in the cane-toad skin treatment throughout the entire observation period, and after 14 days of incubation the Bd—frog skin solution was cloudy with chytrid growth and degraded skin (Figure A12-2). Conversely, Hp did not survive and reproduce on cane-toad skin alone. We observed viable zoospores for Hp in the cane-toad skin treatment only for the first three days (these zoospores most likely persisted from the initial inoculation), and after 14 days of incubation the Hp—cane-toad skin solution remained clear of chytrid growth and the cane-toad skin remained intact and not further degraded (Figure A12-2). We did not observe the growth of any bacterial or fungal contaminants in any of the treatments.

FIGURE A12-2. Chytrid growth on cane-toad skin. A. Negative control (no chytrid): intact skin after 14 days. B. Hp treatment: intact skin and no Hp growth after 14 days. C. Bd treatment: degraded skin and Bd growth after 14 days.

Hp Genome Sequence, Assembly, and Annotation

We achieved a roughly 11.2× coverage of the Hp genome (total number of aligned bases divided by final genome length, assuming that most of the genome is represented in the aligned reads). We assembled 922,085 screened and trimmed sequencing reads into 16,311 contigs (N50 = 36,162). We inferred a haploid genome size for Hp of 26.7 Mb, comparable to other Chytridiomycota genomes [Bd (JAM81) = 24.3 Mb, and Spizellomyces punctatus = 24.1 Mb]. We have deposited the Hp 454 reads in GenBank through the NCBI Sequence Read Archives under the accession SRA037431.1, and we have deposited the Whole Genome Shotgun project at DDBJ/EMBL/GenBank under the accession AFSM00000000 (the version described here is the first version, AFSM01000000).
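The two assembly summary statistics quoted above, fold coverage and contig N50, can be computed as in the following sketch. The contig lengths are toy values. The aligned-base total in the usage example is back-calculated from the reported 11.2× coverage and 26.7 Mb genome size and is not a figure given in the text.

```python
def fold_coverage(total_aligned_bases, genome_length):
    """Total number of aligned bases divided by the final genome length."""
    return total_aligned_bases / genome_length


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


# Toy contig lengths: total is 300, and the two longest contigs hold 180 bases.
print(n50([100, 80, 60, 40, 20]))

# Back-calculated illustration: about 299.04 Mb of aligned bases over 26.7 Mb.
print(round(fold_coverage(299_040_000, 26_700_000), 1))
```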

We generated 5,355 high-confidence MAKER predictions and 11,857 GeneMark ab initio predictions for Hp’s protein-coding genes. The number of predicted Hp proteins falls within the range of other annotated chytrid genomes (8,732 predicted proteins in Bd (JAM81) and 8,804 in Spizellomyces punctatus). The difference in the number of predicted Hp proteins between MAKER and GeneMark is due to MAKER’s conservative approach, which relies upon homology with protein-coding sequences of other species, and with the consensus of multiple ab initio gene prediction algorithms. We did not directly validate the number of expressed genes in our predicted protein sets with EST or RNA sequencing. However, we did compare the Hp predicted protein set to gene content in other species, which provides confidence in the Hp annotation and assembly. We recovered 92% (228/249) of the core eukaryotic genes using CEGMA in the Hp_Maker dataset. Similarly, we identified 3,216 orthologous groups of “chytrid-specific” proteins shared among both Bd isolates and S. punctatus (Table S3). Of the predicted chytrid-protein set we recovered 90% (2,885/3,216) in one or both Hp predicted protein sets (2,271 in Hp_Maker and 2,817 in Hp_GeneMark). Together, these results indicate that our sequencing efforts recovered a large proportion of genes that are predicted to occur in the Hp genome.

Bd Unique Genomic Features

We identified Bd-specific genes using the genomes of Hp and 17 additional fungi. We considered genes to be Bd-specific if they were present in orthologous groups in both sequenced Bd genomes (JAM81 and JEL423) and absent from all other fungi including Hp. Using OrthoMCL clustered proteins we defined 6,556 orthologous groups in Bd (Table S4). Of the 6,556 orthologous groups in Bd, 1,700 were Bd-specific by the above definition. The Bd-specific orthologous groups comprised 1,974 protein-encoding genes, 417 (21%) of which could be functionally categorized by a Pfam domain (with an e-value <0.01) (Table S4). We did not find any orthologous groups uniquely shared between Bd and the dermatophytes to the exclusion of all other fungal outgroups (Table S4). Although we defined orthologous groups using the sequenced genomes of both Bd isolates (JAM81 and JEL423), below we report gene IDs from JAM81 for simplicity.

We conducted enrichment analyses using gene ontology (GO) terms from the set of 417 Bd-specific genes associated with a Pfam domain and found enrichment in all 3 GO structured vocabularies: Cellular Component, Biological Process, and Molecular Function. We present all significantly enriched GO terms (with a corrected P-value ≤ 0.05) for the Bd-specific gene set in Table A12-1. Briefly, in the Biological Process ontology we found enrichment for genes involved in metabolic processes and regulation of carbohydrates, proteins, and transcription. In the Cellular Component ontology we found enrichment of genes located extracellularly, in the nucleus, and in membranes. In the Molecular Function ontology we found enrichment for genes involved in zinc-ion binding, protein dimerization, DNA-binding, hydrolase activity, and protease and triglyceride lipase activity.

TABLE A12-1. The Enrichment of Cellular Component, Biological Process and Molecular Function GO Terms of 417 Bd Specific Genes Associated with a Pfam Domain.



Within the set of Bd-specific and GO-enriched genes were several functional groups of particular interest for their possible role in Bd pathogenesis. First, many Bd-specific genes were proteases and were found in expanded gene families (see below). Second, the Bd-specific gene set was enriched for genes containing the Lipase_3 Pfam domain found in triacylglyceride lipases (6 of 417 in the Bd-specific gene list vs. 20 of 8,732 in the genome, p<0.03) (BATDEDRAFT_93190, BATDEDRAFT_26490, BATDEDRAFT_86691, BATDEDRAFT_93191, BATDEDRAFT_89307, BATDEDRAFT_26489). Third, we identified 62 genes from the Bd-specific gene set that encode Crinkler or CRN-like microbial effectors (CRN), a class of genes previously reported only in oomycetes and not found in any of the other fungi considered here (Figure A12-3 and Table S5).
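Over-representation of a domain in a gene list, such as the Lipase_3 counts above, is commonly assessed with a one-sided hypergeometric test. The Python sketch below applies that test to the reported counts. It is an illustration under the assumption of a plain, uncorrected hypergeometric model. The text does not state which test or multiple-testing correction produced the reported p-value, so the number computed here need not match it.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the domain of interest."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return favourable / total


# Counts from the text: 6 of 417 Bd-specific genes carry Lipase_3,
# against 20 of 8,732 predicted proteins genome-wide.
p_value = hypergeom_upper_tail(k=6, n=417, K=20, N=8732)
expected = 417 * 20 / 8732  # expected count if there were no enrichment

print(f"expected Lipase_3 genes in list: {expected:.2f}")
print(f"uncorrected one-sided p-value:   {p_value:.2e}")
```

Exact integer binomial coefficients are used so the sketch needs only the standard library. For routine work a statistics package's hypergeometric survival function would serve the same purpose.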

FIGURE A12-3. Gene family copy numbers for metalloproteases (M36), serine-type proteases (S41), aspartyl proteases (ASP) and CRN-like proteins (CRN) in the Chytridiomycota (Bd, Hp and S. punctatus), and a Blastocladiomycota outgroup (A. macrogynus).

Bd Gene Family Expansions

We conducted more detailed analyses for three protease gene families that were identified in the Bd-specific gene set and showed GO term enrichment: metallo-, serine-type, and aspartyl proteases (M36, S41, and Asp Pfam domains, respectively). The Bd genome contained 38 metalloproteases, 32 serine-type proteases, and 99 aspartyl proteases, in all cases at least 4 times as many family members as Hp (Figure A12-3). We found that expansions of metalloproteases, serine-type proteases, and aspartyl proteases were largely Bd-specific, having occurred after the split between the Bd and Hp lineages from their most recent common ancestor (summary in Figure A12-4; gene names are available in the tree in Figure S2). In all three families, Bd had a greater number of gene copies than any of the other focal taxa, and the Bd gene copies were generally clustered together to the exclusion of homologues from other taxa. This clustering is consistent with lineage-specific gene family expansions in Bd (Figure A12-4). We observed a large number of metalloprotease genes not only in Bd but also in Allomyces macrogynus (38 and 31 gene family members, respectively) (Figure A12-3). However, the gene tree indicates that the expansions of metalloprotease genes in Bd and A. macrogynus were independent, with most duplication events occurring after the divergence of Bd and A. macrogynus from their common ancestor (Figure A12-4A).

FIGURE A12-4. Maximum likelihood phylogenies of gene families containing (A) M36, (B) S41, and (C) Asp Pfam domains. Each tip represents a single gene copy and each source species is denoted by shaded or hatched boxes (Bd: black, Hp: grey, S. punctatus: hatched, ...).

In addition to identifying many lineage-specific duplicates of proteases in Bd, we demonstrate that these Bd duplication events likely occurred significantly more recently than the divergence time between the species analyzed (Figure A12-5). To assess the timing of expansion in each protease gene family, we calculated the synonymous substitution rate, Ks, between homologs and, based on the phylogeny, a node Ks value for each lineage-specific duplication node (Figure A12-5, left panel). The median Ks values for the metallo-, serine-type, and aspartyl proteases derived from Bd-specific duplications were 0.37, 0.14, and 0.24, respectively (Figure A12-5, left panel). We note that in the metalloprotease family there were similar numbers of lineage-specific duplications in Bd (24) and A. macrogynus (28). However, the Ks values of Bd-specific duplicates were significantly lower than those of A. macrogynus duplicates (median Ks: 0.37 and 1.56, respectively; Kolmogorov-Smirnov tests, p<4.6e-5), indicating that Bd-specific M36 duplications took place much more recently than the A. macrogynus duplications.

FIGURE A12-5. Left panel (paralog rates) shows box plots of synonymous substitution rates (Ks) for Bd lineage-specific duplicates in three protease families. Right panel (ortholog rates) shows box plots of Ks values for putative orthologs between Bd and Hp, Bd and ...

We also examined Ks values of putative orthologs between Bd and Hp, Bd and S. punctatus, and Bd and A. macrogynus (Figure A12-5, right panel). As expected from the phylogenetic relationships of these four species (James et al., 2006), the median Ks for all species pairs was high, but the median Ks for Bd-Hp (3.40) was significantly lower than that of Bd-S. punctatus (3.94) and Bd-A. macrogynus (4.03) (Kolmogorov-Smirnov tests, p<2.2e-16). Importantly, the median Bd-Hp orthologous Ks value was ~9- to 24-fold higher than the median Ks values of Bd lineage-specific duplicates. Therefore, the Bd-specific duplications occurred substantially more recently than the divergence of Bd and Hp. Previous molecular evolution studies in fungi have used a synonymous nucleotide substitution rate of 8.1e-9 substitutions per site per year to estimate the timing of molecular events (Lynch and Conery, 2000). If this substitution rate is reasonable for chytrid fungi, the duplication events leading to the metallo-, serine-type, and aspartyl protease gene family expansions in Bd would be millions of years old (Table S6). Even if the true substitution rate differs by several orders of magnitude, it is important to recognize that these protease duplication events occurred long before Bd emerged as a global pathogen of amphibians.
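The fold differences and approximate ages discussed above can be reproduced from the reported medians. The Python sketch below does so under a simple molecular-clock assumption that is not spelled out in the text: the two copies of a duplicated gene each accumulate synonymous substitutions at the same constant rate, so Ks = 2 × rate × T. The function and variable names are illustrative, and the resulting ages are rough estimates that need not match those in Table S6.

```python
# Synonymous substitution rate quoted in the text (Lynch and Conery, 2000).
SUBSTITUTION_RATE = 8.1e-9  # substitutions per site per year

# Median Ks values reported in the text.
median_ks_paralogs = {"M36": 0.37, "S41": 0.14, "Asp": 0.24}
median_ks_bd_hp = 3.40


def fold_difference(ortholog_ks, paralog_ks):
    """How many times larger the ortholog Ks is than the paralog Ks."""
    return ortholog_ks / paralog_ks


def divergence_time_years(ks, rate=SUBSTITUTION_RATE):
    """Age of a duplication, assuming Ks = 2 * rate * T (both copies diverge)."""
    return ks / (2.0 * rate)


for family, ks in median_ks_paralogs.items():
    fold = fold_difference(median_ks_bd_hp, ks)
    age_my = divergence_time_years(ks) / 1e6
    print(f"{family}: Bd-Hp Ks is {fold:.1f}x the paralog Ks; "
          f"estimated age ~{age_my:.1f} million years")
```

Changing `rate` by a factor of ten in either direction is a quick way to explore how sensitive the age estimates are to the assumed clock.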


To investigate the genomic changes that accompanied the evolution of pathogenicity, we compared Bd, the deadly chytrid pathogen of amphibians, with Hp, a closely related chytrid that is not a known pathogen of vertebrates. We confirm that Bd and Hp have different nutritional modes (Figure A12-2); unlike Hp, Bd is capable of growing on amphibian skin alone. Given that most chytrids are saprobes like Hp, Bd’s ability to infect vertebrate skin likely arose after the divergence of Bd and Hp from their common ancestor. Fungal growth on vertebrate skin requires the expression of enzymes that break down host epidermal tissue (Burmester et al., 2011; da Silva et al., 2006; Monod, 2008; Monod et al., 2002). Because Bd causes chytridiomycosis by infecting frog skin (Longcore et al., 1999; Voyles et al., 2009), we were particularly interested in elements of the Bd genome whose evolution might have allowed Bd to colonize and degrade amphibian skin.

We compared the genomes of Bd and Hp in a broad taxonomic context of 18 diverse fungal genomes to identify genomic factors that make Bd unique. The Bd and Hp genomes are similar in size and number of predicted genes but show important differences in gene content. We could therefore identify Bd-specific genes (i.e., genes that were found in Bd but not in Hp or other fungal outgroups). Bd-specific genes are enriched for GO terms related to extracellular and enzymatic activity. Many Bd-specific genes are members of recently expanded gene families (i.e., gene families with significantly more members than in other fungal species). Below we discuss Bd-specific genes with particular emphasis on understanding how Bd may interact with its amphibian hosts.

Proteases are the most dramatically enriched class of Bd-specific genes. The Bd genome contains expanded gene families of metalloproteases, serine-type proteases, and aspartyl proteases. Each of these Bd gene families contains more than 30 members, 4–10 times as many as are found in Hp (Figure A12-3). Extracellular fungal proteases have been implicated in the adherence to, invasion of, and degradation of host cells by other fungal pathogens (Burmester et al., 2011; da Silva et al., 2006; Monod, 2008; Monod et al., 2002). In particular, protease gene family expansions have been suggested as a link to pathogenesis in other fungal pathogens. Several fungal pathogens of vertebrates (e.g., Arthroderma benhamiae, Coccidioides spp., and Trichophyton spp.) exhibit gene family expansions specifically for metalloproteases and serine-type proteases (Burmester et al., 2011; Jousson et al., 2004; Sharpton et al., 2009).

Here we strengthen the evidence implicating proteases in Bd pathogenesis in several ways. First and most importantly we demonstrate that protease gene families are not expanded in Hp and thus polarize the expansion events to a much shorter phylogenetic branch leading to Bd. Second, we present an additional protease gene family expansion. We previously reported the Bd gene family expansions for metallo- and serine proteases (Rosenblum et al., 2008), and we now describe a dramatic expansion of aspartyl proteases in the Bd genome (Figure A12-3). Aspartyl proteases are of particular interest because they have been implicated in the adherence to and invasion of human host tissue by fungal pathogens (Candida spp.) (Kaur et al., 2007; Monod and Borg-von Zepelin, 2002). Many genes in the expanded metallo-, serine- and aspartyl-protease gene families are highly expressed, and in some cases differentially expressed between Bd life stages (Rosenblum et al., 2008). Finally, we more rigorously document the dynamics of protease gene family expansions. Calculations using a range of reasonable substitution rates show that the three protease gene family expansions occurred substantially more recently than the divergence of Bd and Hp from their common ancestor.

It is important to caution that our comparative genomics results do not conclusively demonstrate a role for proteases as pathogenicity factors. First, we lack a specific mechanism by which proteases mediate Bd host invasion. Understanding the functional consequences of protease gene family expansions will require molecular assays to determine how specific enzymes contribute to host substrate metabolism. Second, protease gene family expansions are not always obviously correlated with fungal pathogenicity. For example we observed a large number of metalloprotease genes in Allomyces macrogynus and a variable number of aspartyl proteases in several of our outgroup taxa (Figure A12-4). These are independent expansion events relative to the Bd gene duplications and are not associated with a specific shift in substrate metabolism. Third, the estimated timing of the Bd protease gene duplications does not unambiguously link particular genes to the recent emergence of Bd as a global frog pathogen. Although the gene duplication events are relatively recent, most still likely occurred millions of years ago. More ancient duplication of protease genes may have set the stage for Bd’s ability to infect frogs, but finer scale intraspecific data will be required to determine whether particular paralogs exhibit molecular signatures of recent selection.

While proteases may play the most obvious role in pathogen invasion and metabolism of host tissue, we also observed an enrichment of genes with triglyceride lipase activity in the Bd-specific gene set (Table A12-1). These enzymes are known to play a role in fungal-plant interactions (Gaillardin, 2010), and have been hypothesized to play a role in at least one fungal-vertebrate interaction, that between Malassezia furfur and the skin of its human host (Brunke and Hube, 2006). M. furfur incorporates host lipids into its own cell wall; this is thought to assist M. furfur in adhering to the host and evading the host’s immune system. The extent to which Bd can utilize the products of triglyceride lipase activity for nutrition or adhesion remains to be tested. However, the enrichment of triglyceride lipase genes in the Bd-specific gene set suggests that lipases merit consideration as possible contributors to Bd’s invasion of host tissue.

In addition to the genes that may be involved in host tissue metabolism, we observed a large number of Bd-specific genes with similarity to microbial proteins known as Crinklers and Crinkler-like effectors (CRN). Microbial effectors in general act within the host cytoplasm to suppress host defenses and alter normal host cell metabolism (Haas et al., 2009; Kamoun, 2006). It is unusual for a fungus to contain CRN effectors, as these proteins have so far been reported only from oomycetes, a group of important plant and fish pathogens within the kingdom Chromista (Cavalier-Smith and Chao, 2006). CRN effectors are modular proteins consisting of a signal peptide, a downstream translocator domain that allows CRN proteins to gain entry into host cells, and a C-terminal domain that interacts with host proteins (Haas et al., 2009). While 62 unique Bd proteins show similarity to CRN effectors at the protein level, only one is predicted to be secreted (BATDEDRAFT_23205). Therefore, while the function of putative CRN effectors in Bd remains to be determined, the possibility that they function as microbial effectors and interact with host elements merits further investigation.

We have sequenced the genome of Bd’s closest known relative to develop hypotheses for genomic determinants of Bd’s ability to infect and kill amphibians. However, the divergence between Bd and Hp is still substantial (James et al., 2006). Recent research indicates that chytrids may be more ubiquitous than previously appreciated in both aquatic and terrestrial environments (Freeman et al., 2009), and much chytrid diversity remains to be characterized. The discovery of additional taxa more closely related to Bd than Hp would help further localize genomic changes to the Bd lineage. Interspecific comparisons such as the one presented here can be complemented by intraspecific comparisons among Bd isolates to understand the evolutionary dynamics of genes hypothesized to play a role in Bd pathogenicity. However, robust hypothesis testing will require functional characterization of genes that may be important to Bd’s ability to infect frogs. Bd currently lacks a transformation system in which to study gene function, but heterologous expression systems could potentially be used to determine specific gene functions. Additionally, understanding expression patterns of candidate genes under different nutrient conditions and during different stages of host invasion is likely to yield important insights. Ultimately, identifying the molecular mechanisms of host-pathogen interactions will provide new avenues for mitigating the devastating effects of chytridiomycosis.


We thank Joyce Longcore (University of Maine) for providing Hp strain JEL142. We thank Matt Settles (University of Idaho) for bioinformatics support. We thank Joyce Longcore, Tim James (University of Michigan), and Jamie Voyles (University of Idaho) for comments on the manuscript and input throughout the project. We acknowledge Igor Grigoriev and the Joint Genome Institute for access to the B. dendrobatidis (JAM81) genome.

Author Contributions

Conceived and designed the experiments: SJ JES EBR. Performed the experiments: SJ. Analyzed the data: SJ JES SHS. Contributed reagents/materials/analysis tools: JES SHS SJ EBR. Wrote the paper: SJ EBR.


  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. [PubMed: 2231712]
  • Barr DJS. Allochytridium expandens rediscovered—morphology, physiology and zoospore ultrastructure. Mycologia. 1986;78:439–448.
  • Berger L, Speare R, Daszak P, Green DE, Cunningham AA, et al. Chytridiomycosis causes amphibian mortality associated with population declines in the rain forests of Australia and Central America. Proc Natl Acad Sci U S A. 1998;95:9031–9036. [PMC free article: PMC21197] [PubMed: 9671799]
  • Boyle EI, Weng SA, Gollub J, Jin H, Botstein D, et al. GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004;20:3710–3715. [PMC free article: PMC3037731] [PubMed: 15297299]
  • Brunke S, Hube B. MfLIP1, a gene encoding an extracellular lipase of the lipid-dependent fungus Malassezia furfur. Microbiology. 2006;152:547–554. [PubMed: 16436442]
  • Burmester A, Shelest E, Glöckner G, Heddergott C, Schindler S, et al. Comparative and functional genomics provide insights into the pathogenicity of dermatophytic fungi. Genome Biol. 2011;12:R7. [PMC free article: PMC3091305] [PubMed: 21247460]
  • Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. [PMC free article: PMC2134774] [PubMed: 18025269]
  • Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. [PMC free article: PMC2712344] [PubMed: 19505945]
  • Cavalier-Smith T, Chao EEY. Phylogeny and megasystematics of phagotrophic heterokonts (kingdom Chromista). J Mol Evol. 2006;62:388–420. [PubMed: 16557340]
  • da Silva BA, dos Santos ALS, Barreto-Bergter E, Pinto MR. Extracellular peptidase in the fungal pathogen Pseudallescheria boydii. Curr Microbiol. 2006;53:18–22. [PubMed: 16775782]
  • Dewey C. Aligning multiple whole genomes with Mercator and MAVID. Methods Mol Biol. 2007;395:221–236. [PubMed: 17993677]
  • Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. [PMC free article: PMC390337] [PubMed: 15034147]
  • Finn RD, Mistry J, Tate J, Coggill P, Heger A, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. [PMC free article: PMC2808889] [PubMed: 19920124]
  • Freeman KR, Martin AP, Karki D, Lynch RC, Mitter MS, et al. Evidence that chytrids dominate fungal communities in high-elevation soils. Proc Natl Acad Sci U S A. 2009;106:18315–18320. [PMC free article: PMC2775327] [PubMed: 19826082]
  • Gaillardin C. Lipases as Pathogenicity Factors of Fungi. In: Timmis KN, editor. Handbook of Hydrocarbon and Lipid Microbiology. Springer; 2010. pp. 3259–3268.
  • Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE, et al. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature. 2009;461:393–398. [PubMed: 19741609]
  • Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 2008;148:993–1003. [PMC free article: PMC2556807] [PubMed: 18715958]
  • Hoskisson PA, Trevors JT. Shifting trends in pathogen dynamics on a changing planet. Antonie Van Leeuwenhoek. 2010;98:423–427. [PubMed: 20640888]
  • James TY, Letcher PM, Longcore JE, Mozley-Standridge SE, Porter D, et al. A molecular phylogeny of the flagellated fungi (Chytridiomycota) and description of a new phylum (Blastocladiomycota). Mycologia. 2006;98:860–871. [PubMed: 17486963]
  • Jousson O, Léchenne B, Bontems O, Capoccia S, Mignon B, et al. Multiplication of an ancestral gene encoding secreted fungalysin preceded species differentiation in the dermatophytes Trichophyton and Microsporum. Microbiology. 2004;150:301–310. [PubMed: 14766908]
  • Kamoun S. A catalogue of the effector secretome of plant pathogenic oomycetes. Annu Rev Phytopathol. 2006;44:41–60. [PubMed: 16448329]
  • Kaur R, Ma B, Cormack BP. A family of glycosylphosphatidylinositol-linked aspartyl proteases is required for virulence of Candida glabrata. Proc Natl Acad Sci U S A. 2007;104:7628–7633. [PMC free article: PMC1863504] [PubMed: 17456602]
  • Letcher PM, Powell MJ, Churchill PF, Chambers JG. Ultrastructural and molecular phylogenetic delineation of a new order, the Rhizophydiales (Chytridiomycota). Mycol Res. 2006;110:898–915. [PubMed: 16919432]
  • Li L, Stoeckert CJ, Roos DS. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. [PMC free article: PMC403725] [PubMed: 12952885]
  • Lips KR, Brem F, Brenes R, Reeve JD, Alford RA, et al. Emerging infectious disease and the loss of biodiversity in a Neotropical amphibian community. Proc Natl Acad Sci U S A. 2006;103:3165–3170. [PMC free article: PMC1413869] [PubMed: 16481617]
  • Longcore JE, Pessier AP, Nichols DK. Batrachochytrium dendrobatidis gen et sp nov, a chytrid pathogenic to amphibians. Mycologia. 1999;91:219–227.
  • Lötters S, La Marca E, Stuart S, Gagliardo R, Veith M. A new dimension of current biodiversity loss. Herpetotropicos. 2004;1:29–31.
  • Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. [PubMed: 11073452]
  • Monod M. Secreted proteases from dermatophytes. Mycopathologia. 2008;166:285–294. [PubMed: 18478360]
  • Monod M, Borg-von Zepelin M. Secreted aspartic proteases as virulence factors of Candida species. Biol Chem. 2002;383:1087–1093. [PubMed: 12437091]
  • Monod M, Capoccia S, Léchenne B, Zaugg C, Holdom M, et al. Secreted proteases from pathogenic fungi. Int J Med Microbiol. 2002;292:405–419. [PubMed: 12452286]
  • Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–217. [PubMed: 10964570]
  • Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. [PubMed: 17332020]
  • Rachowicz LJ, Knapp RA, Morgan JAT, Stice MJ, Vredenburg VT, et al. Emerging infectious disease as a proximate cause of amphibian mass mortality. Ecology. 2006;87:1671–1683. [PubMed: 16922318]
  • Rosenblum EB, Stajich JE, Maddox N, Eisen MB. Global gene expression profiles for life stages of the deadly amphibian pathogen Batrachochytrium dendrobatidis. Proc Natl Acad Sci U S A. 2008;105:17034–17039. [PMC free article: PMC2566996] [PubMed: 18852473]
  • Sharpton TJ, Stajich JE, Rounsley SD, Gardner MJ, Wortman JR, et al. Comparative genomic analyses of the human fungal pathogens Coccidioides and their relatives. Genome Res. 2009;19:1722–1731. [PMC free article: PMC2765278] [PubMed: 19717792]
  • Skerratt LF, Berger L, Speare R, Cashins S, McDonald KR, et al. Spread of chytridiomycosis has caused the rapid global decline and extinction of frogs. EcoHealth. 2007;4:125–134.
  • Smith KF, Guegan JF. Changing geographic distributions of human pathogens. Annu Rev Ecol Evol Syst. 2010;41:231–250.
  • Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005;21:456–463. [PubMed: 15608047]
  • Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008;18:1979–1990. [PMC free article: PMC2593577] [PubMed: 18757608]
  • Voyles J, Young S, Berger L, Campbell C, Voyles WF, et al. Pathogenesis of Chytridiomycosis, a Cause of Catastrophic Amphibian Declines. Science. 2009;326:582–585. [PubMed: 19900897]
  • Woolhouse M, Gaunt E. Ecological origins of novel human pathogens. Crit Rev Microbiol. 2007;33:231–242. [PubMed: 18033594]
  • Yang ZH. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–1591. [PubMed: 17483113]
  • Yang ZH, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000;15:496–503. [PubMed: 11114436]
  • Zolan ME, Pukkila PJ. Inheritance of DNA methylation in Coprinus cinereus. Mol Cell Biol. 1986;6:195–200. [PMC free article: PMC367498] [PubMed: 3785146]



E-mail: toredid-sirap-vinu.mji@xilef (MAF); (EAM); ude .ltsuw@gnawevad (DW)
These authors contributed equally to this work.
51 Institut Jacques Monod, CNRS-University of Paris-Diderot, Paris, France.
52 Gurdon Institute, University of Cambridge, Cambridge, United Kingdom.
53 Departments of Molecular Microbiology and Pathology & Immunology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, United States of America.


An ideal model system to study antiviral immunity and host-pathogen co-evolution would combine a genetically tractable small animal with a virus capable of naturally infecting the host organism. The use of C. elegans as a model to define host-viral interactions has been limited by the lack of viruses known to infect nematodes. From wild isolates of C. elegans and C. briggsae with unusual morphological phenotypes in intestinal cells, we identified two novel RNA viruses distantly related to known nodaviruses, one specifically infecting C. elegans (Orsay virus), the other C. briggsae (Santeuil virus). Bleaching of embryos cured infected cultures, demonstrating that the viruses are neither stably integrated in the host genome nor transmitted vertically. Filtrates (0.2 μm) of the infected cultures could infect cured animals. Infected animals continuously maintained viral infection for 6 mo (~50 generations), demonstrating that natural cycles of horizontal virus transmission were faithfully recapitulated in laboratory culture. In addition to infecting the natural C. elegans isolate, Orsay virus readily infected laboratory C. elegans mutants defective in RNAi and yielded higher levels of viral RNA and infection symptoms as compared to infection of the corresponding wild-type N2 strain. These results demonstrated a clear role for RNAi in the defense against this virus. Furthermore, different wild C. elegans isolates displayed differential susceptibility to infection by Orsay virus, thereby affording genetic approaches to defining antiviral loci. This discovery establishes a bona fide viral infection system to explore the natural ecology of nematodes, host-pathogen co-evolution, the evolution of small RNA responses, and innate antiviral mechanisms.

Author Summary

The nematode C. elegans is a robust model organism that is broadly used in biology. It also has great potential for the study of host-microbe interactions, as it is possible to systematically knock out almost every gene in high-throughput fashion to examine the potential role of each gene in infection. While C. elegans has been successfully applied to the study of bacterial infections, only limited studies of antiviral responses have been possible since no virus capable of infecting any Caenorhabditis nematode in laboratory culture has previously been described. Here we report the discovery of natural viruses infecting wild isolates of C. elegans and its relative C. briggsae. These novel viruses are most closely related to the ssRNA nodaviruses, but have larger genomes than other described nodaviruses and clearly represent a new taxon of virus. We were able to use these viruses to infect a variety of laboratory nematode strains. We show that mutant worms defective in the RNA interference pathway, an antiviral system known to operate in a number of organisms, accumulate more viral RNA than wild-type strains. The discovery of these viruses will enable further studies of host-virus interactions in C. elegans and the identification of other host mechanisms that counter viral infection.


Model organisms such as D. melanogaster (Hao et al., 2008; Sabin et al., 2009) and C. elegans (Kim et al., 2002; Powell et al., 2009) have been increasingly used in recent years to examine features of the host immune system and host-pathogen co-evolution mechanisms, due to the genetic tractability and ease of manipulation of these organisms. A prerequisite to fully exploit such models is the identification of an appropriate microbe capable of naturally infecting the host organism. Analysis in C. elegans of bacterial pathogens such as Pseudomonas, Salmonella, or Serratia has been highly fruitful, in some instances revealing the existence of innate immune pathways in C. elegans that are also conserved in vertebrates (Kim et al., 2002). The recent report of natural infections of C. elegans intestinal cells by microsporidia makes it a promising model for microsporidia biology (Troemel et al., 2008). Efforts to use C. elegans to understand anti-viral innate immunity, however, have been hampered by the lack of a natural virus competent to infect and replicate in C. elegans.

In the absence of a natural virus infection system, some efforts to define virus-host responses in C. elegans have been pursued using artificial methods of introducing viruses or partial virus genomes into animals (Liu et al., 2006; Lu et al., 2005). For example, the use of a transgenic Flock House virus RNA1 genome segment has clearly established a role for RNAi in counteracting replication of Flock House virus RNA (Lu et al., 2005) and has defined genes essential for the RNAi response (Lu et al., 2009). However, this experimental system can only examine replication of the viral RNA and is fundamentally unable to address the host response to other critical aspects of the virus life cycle such as virus entry, virion assembly, or egress. The ability of a host to target steps other than genome replication to control viral infections is highlighted by recent discoveries such as the identification of tetherin, which plays a critical role at the stage of viral egress by blocking the release of fully assembled HIV virions from infected human cells (Neil et al., 2008). Furthermore, the artificial systems used to date for analysis of virus-nematode interactions cannot be used to examine transmission dynamics of virus infection. These limitations underscore the need to establish an authentic viral infection and replication system in nematodes.

Natural populations of C. elegans have proven hard to find until recent years. The identification of C. elegans habitats and the development of simple isolation methods (MAF, unpublished) (Barrière and Félix, 2006) have now enabled extensive collection of natural isolates of C. elegans. Here we report the discovery of natural populations of C. elegans and of its close relative C. briggsae that display abnormal morphologies of intestinal cells. These abnormal phenotypes can be maintained in permanent culture for several months, without detectable microsporidial or bacterial infection. We show that these populations are infected by two distinct viruses, one specific for C. elegans (Orsay virus), one for C. briggsae (Santeuil virus). These viruses resemble viruses in the Nodaviridae family, with a small, bipartite, RNA (+sense) genome. Infection by each virus is transmitted horizontally. In both nematode species, we find intraspecific variation in sensitivity to the species-specific virus. We further show that infected worms mount a small RNA response and that RNAi mechanisms act as antiviral immunity in nematodes. Finally, we demonstrate that the C. elegans isolate from which Orsay virus was isolated is incapable of mounting an effective RNAi response in somatic cells. We thus find natural variation in host antiviral defenses. Critically, these results establish the first experimental viral infection system in C. elegans suitable for probing all facets of the host antiviral response.


Natural Viral Infections of C. briggsae and C. elegans

From surveys of wild nematodes from rotting fruit in different regions of France, multiple Caenorhabditis strains were isolated that displayed a similar unusual morphology of the intestinal cells and no visible pathogen by optical microscopy. Intestinal cell structures such as storage granules disappeared (Figures A13-1A–J, A13-2A–C) and the cytoplasm lost viscosity and became fluid (Figure A13-1B, I), moving extensively during movement of the animal. The intestinal apical border showed extensive convolutions and intermediate filament disorganization (Figures A13-1A, A13-2H, as described in some intermediate filament mutants [Hüsken et al., 2008]). Multi-membrane structures were sometimes apparent in the cytoplasm (Figure A13-1C). Elongation of nuclei and nucleoli, and nuclear degeneration, were observed using Nomarski optics, live Hoechst 33342 staining, and electron microscopy (Figures A13-1E–H, A13-2D–F). Finally, some intestinal cells fused together (Figure A13-1I). This suite of symptoms was first noticed during sampling of C. briggsae. Indeed, more individuals appeared affected in C. briggsae than in C. elegans cultures, and to a greater extent (Figure A13-1K).

Intestinal cell infection phenotypes in wild Caenorhabditis isolates. (A–H) C. briggsae JU1264 and (I, J) C. elegans JU1580 observed by Nomarski microscopy. (A–C, E–G, I) Infected adult hermaphrodites from the original cultures, …

Transmission electron micrographs of intestinal cells of C. elegans JU1580 adult hermaphrodites. (A,D,G) Bleached animals. (B–C, E–F, H) Naturally infected animals. (A–C) The infection provokes a reorganization of cytoplasmic structures, …

One representative, stably infected strain of each nematode species, C. elegans JU1580 (isolated from a rotting apple in Orsay, France) and C. briggsae JU1264 (isolated from a snail on a rotting grape in Santeuil, France), was selected for detailed analysis. Bleaching of adult animals resulted in phenotype-free progeny from both strains, demonstrating that the phenotype was not vertically transmitted (embryos are resistant to the bleaching treatment) (Figure A13-1K). Addition of dead infected animals, or homogenates from infected animals after filtration through 0.2 μm filters, to plates containing previously bleached animals recapitulated the morphological phenotype, raising the possibility that a virus might play a role in inducing the morphological phenotype (Figure A13-1K). We found that the infectious agent could be passed on horizontally through live animals by incubating GFP-labeled animals (strain JU1894, Table A13-1) with 10 infected non-GFP worms (JU1580), checking that the latter did not die before removing them 24 h later. The GFP-labeled culture displayed the intestinal symptoms after a week. One possibility is that the intestinal infectious agent is shed from the intestine through the rectum and may enter the next animal during feeding.

TABLE A13-1. Strain List.



In support of the hypothesis that these wild Caenorhabditis were infected by a virus, small virus-like particles of approximately 20 nm diameter were visible by electron microscopy of the intestinal cells (Figures A13-2H, S1). Such particles were not observed in bleached animals, nor in C. elegans animals infected by bacteria, which showed a strong reduction of intestinal cell volume (strain JU1409, unpublished data).

While a clear morphological phenotype was visible by microscopy, infection did not cause a dramatic decrease in adult longevity (unpublished data), nor a change in brood size (Figure S2A,B). However, progeny production was significantly slowed down during adulthood, most clearly in the infected C. briggsae JU1264 isolate compared to the uninfected control (Figure S2D).

Molecular Identification of Two Divergent Viruses

An unbiased high-throughput pyrosequencing approach was used to determine whether any known or novel viruses were present in the animals. From JU1264, 28 unique sequence reads were identified initially that shared 30%–48% amino acid sequence identity to known viruses in the family Nodaviridae. Nodaviruses are bipartite positive strand RNA viruses. The RNA1 segment of all previously described nodaviruses is ~3.1 kb and encodes ORF A, the viral RNA-dependent RNA polymerase. Some nodaviruses also encode ORFs B1/B2 at the 3′ end of the RNA1 segment. The B1 protein is of unknown function while the B2 protein is able to inhibit RNAi (Li et al., 2002). The RNA2 segment of all previously described nodaviruses is ~1.4 kb and possesses a single ORF encoding the viral capsid protein. Assembly of the initial JU1264 pyrosequencing reads followed by additional pyrosequencing, RT-PCR, 5′ RACE, and 3′ RACE yielded two final contigs, which were confirmed by sequencing of overlapping RT-PCR amplicons. The two contigs corresponded to the RNA1 and RNA2 segments of a novel virus. The first contig (3,628 nt) encoded a predicted open reading frame of 982 amino acids that shared 26%–27% amino acid identity to the RNA-dependent RNA polymerase of multiple known nodaviruses by BLAST alignment. All known nodavirus B2 proteins overlap with the C-terminus of the RNA-dependent RNA polymerase and are encoded in the +1 frame relative to the polymerase. No open reading frame with these properties was predicted in the 3′ end of the RNA1 segment. The second contig of 2,653 nt, which was presumed to be the near-complete RNA2 segment, encoded at its 5′ end a predicted protein with ~30% identity to known nodavirus capsid proteins (Figure A13-3A). This contig was ~1 kb larger than the RNA2 segment of all previously described nodaviruses and appeared to encode a second ORF of 332 amino acids at the 3′ end. This second predicted ORF, named ORF δ, had no significant BLAST similarity to any sequence in GenBank.
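The search for a B2-like ORF described above can be illustrated with a minimal sketch. It is not the prediction method used in this study, which is not specified here. The function names, the ATG-to-stop ORF definition, the `tail_nt` window, and the toy sequence are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=50):
    """Return (frame, start, end) for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

def plus_one_overlaps(orfs, main, tail_nt=300):
    """ORFs in the +1 frame of `main` that overlap its last `tail_nt` nt.

    Assumed convention: +1 means the ORF start is shifted one nucleotide
    downstream relative to the reading frame of `main`.
    """
    _, start, end = main
    target = (start + 1) % 3
    return [o for o in orfs
            if o[0] == target and o[1] < end and o[2] > end - tail_nt]

# Hypothetical toy sequence, not viral sequence
toy = "ATGGCCGCCGATGGCCGCCGCTAAGCCGTGACC"
orfs = find_orfs(toy, min_codons=5)
main = max(orfs, key=lambda o: o[2] - o[1])   # longest ORF
print(orfs)                                   # [(0, 0, 24), (1, 10, 31)]
print(plus_one_overlaps(orfs, main, tail_nt=9))  # [(1, 10, 31)]
```

An empty list from `plus_one_overlaps` would correspond to the situation reported for the RNA1 contig, where no ORF with B2-like placement was predicted.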

Genomic organization and phylogenetic analysis of novel viruses. (A) Schematic of genomic organization of Santeuil virus. Predicted open reading frames are displayed in gray boxes. Red bar indicates sequence used to generate double-stranded DNA probes …

Pyrosequencing of JU1580 demonstrated the presence of a second distinct virus that shared the same general genomic organization as the virus detected in JU1264. Partial genome sequences of 2,680 nucleotides of the RNA1 segment and 2,362 nucleotides of the RNA2 segment were obtained and confirmed by RT-PCR. The putative RNA-dependent RNA polymerases of the two viruses shared 44% amino acid identity by BLAST analysis. Like the virus in JU1264, the virus in JU1580 was predicted to encode a capsid protein at the 5′ end of the RNA2 segment as well as a second ORF in the 3′ half of the RNA2 segment. The ORF δ encoded proteins from the two viruses shared 37% amino acid identity when compared using BLAST. Thus, the genomic organization of these two viruses, while sharing substantial commonality with known nodaviruses, also displayed novel genomic features. Phylogenetic analysis of the predicted RNA polymerase and capsid proteins demonstrated that the virus sequences in JU1580 and JU1264 were highly divergent from all previously described nodaviruses and most closely related to each other (Figure A13-3B,C). We propose that these sequences represent two novel virus species and have tentatively named them Santeuil virus (from JU1264) and Orsay virus (from JU1580).

Viral Detection and Confirmation of Viral Infection

RT-PCR assays were used to analyze RNA extracted from JU1580, JU1264, their corresponding bleached control strains JU1580bl and JU1264bl, and the same strains following reinfection with viral filtrates. Orsay virus RNA could be detected by RT-PCR in the original JU1580 culture, disappeared in the bleached strains, and stably reappeared following re-infection with the corresponding viral filtrate (Figure A13-4A). The same pattern applied for the Santeuil virus and JU1264 animals (Figure A13-4B). JU1580 and JU1264 cultures continuously propagated for 6 mo by transferring a piece of agar (approx. 0.1 cm³) to the next plate twice a week continued to yield positive RT-PCR results (unpublished data).

Molecular evidence of viral infection. (A) RT-PCR detection of the Orsay virus in the original JU1580 wild isolate (I), after bleaching (bl) and after re-infection by a 0.2 μm filtrate after 7 d (RI1) and 3 wk (RI2) of culture at 23°C. …

Northern blotting confirmed the presence of Orsay and Santeuil virus RNA sequences in the infected animals. Hybridization with a DNA probe targeting the RNA1 segment of Santeuil virus yielded multiple bands in JU1264 animals but not in the corresponding bleached control strain. The strongest band detected migrated between 3.5 and 4 kb consistent with the 3,628 nt sequence we generated for the putative complete RNA1 segment (Figure A13-4C). Multiple higher molecular weight bands were also detected that may represent multimeric forms of the viral genomic RNAs, which have previously been described for some nodaviruses (Ball, 1994; Johnson et al., 2000). Northern blotting with a probe targeting the RNA2 segment (Figure A13-4C) yielded a major band that migrated at ~2.5 kb as well as fainter, higher molecular weight bands. Similar patterns were seen for both segments of Orsay virus (unpublished data).

To demonstrate virus replication in the infected animals, we performed Northern blotting using strand-specific riboprobes. For positive sense RNA viruses like nodaviruses, the negative sense RNA is only synthesized during active viral replication. It is not packaged in virions and typically exists in much lower quantities than the positive strand. Robust levels of the positive strand of the Santeuil virus RNA1 segment were detected (Figure A13-4D). Northern blotting with a riboprobe designed to hybridize to the negative sense strand detected a band of ~3.5 kb as well as higher molecular weight bands and a lower band of ~1.5 kb (~30-fold longer exposure than the positive sense blot; Figure A13-4D). While the precise nature of the high and low molecular weight species remains to be defined, the presence of multiple RNA species of negative sense polarity in JU1264 animals demonstrates bona fide replication of Santeuil virus in JU1264.

In order to determine the localization of Orsay viral RNA in infected animals, we performed RNA fluorescent in situ hybridization (FISH) using a probe complementary to the positive sense RNA1 segment of Orsay virus. Viral RNA was robustly detected in intestinal cells of JU1580bl animals infected 4 d previously with Orsay viral filtrate (Figure A13-4E, top panels). Interestingly, some animals also showed localization of viral RNA in the somatic gonad (Figure A13-4E, middle panels). JU1580bl animals not treated with the viral filtrate displayed no fluorescent signal (Figure A13-4E, bottom panels).

High Specificity of Infection by Orsay and Santeuil Nodaviruses

We tested whether the Orsay and Santeuil viruses could cross-infect the cured wild isolate of the other Caenorhabditis species, as well as the reference laboratory strains of C. elegans and C. briggsae. The Orsay and Santeuil viruses could only infect strains of C. elegans and C. briggsae, respectively (Figure A13-5A,B and Figure S3). Furthermore, each virus showed intraspecific specificity of infection. Indeed, we could not detect any replication of the Santeuil virus in C. briggsae AF16. The N2 laboratory C. elegans strain, while infectable by Orsay virus, appeared to be more resistant to viral infection than JU1580bl. Quantitative RT-PCR demonstrated that viral RNA accumulated in the N2 strain at levels above background but 50–100-fold lower than in JU1580bl (Figure A13-5C).

Specificity of infection by the Orsay and Santeuil viruses. (A) Specificity of infection by the Orsay virus. Each Caenorhabditis strain (name indicated below the gel) was mock-infected (−) or infected with a virus filtrate (+). RT-PCR on cultures …

Small RNA Response upon Infection

One key defense mechanism of plants and animals against RNA viruses is the small RNA response (Aliyari and Ding, 2009). We therefore determined by deep sequencing of small RNAs whether the infected animals produced small RNAs in response to viral infection. We generated small RNA libraries from mixed-stage JU1580 animals infected with the Orsay virus and from the bleached control strain and analyzed them using Illumina/Solexa high-throughput sequencing. These libraries represent small RNAs of 18–30 nucleotides in length independent of their 5′ termini. Small RNAs from infected JU1580 animals that mapped to viral RNA1 or RNA2 and had no match to the C. elegans genome are shown in Figure A13-6A and A13-6B, respectively. Of a total of 1,149,633 unambiguously mapped unique sequences, almost 2% (21,392) mapped to the two RNA segments of Orsay virus. Such RNAs were virtually absent from a library generated from bleached JU1580 animals (<0.001%) (unpublished data). Small RNAs that corresponded to the sense strand of the viral RNAs had a broad length distribution and no 5′ nucleotide preference. These sense small RNAs might represent Dicer cleavage products or other viral RNA degradation intermediates. In contrast, most antisense small RNAs were 22 nt long and showed a bias for guanosine as the first base (Figure A13-6A, B). This signature is reminiscent of a class of secondary RNAs named 22G RNAs that are thought to be downstream effectors of exogenous and endogenous small RNA pathways (Gu et al., 2009; Pak and Fire, 2007; Sijen et al., 2007). Such RNAs are not associated with transgenes expressed in the soma of C. elegans from extrachromosomal arrays (Pak and Fire, 2007) nor generally a feature associated with active transcription of endogenous genes (Gu et al., 2009; Pak and Fire, 2007; Sijen et al., 2007). These data suggest that JU1580 animals raise a small RNA response to viral infection. We also detected small RNAs of both sense and antisense polarity that mapped to the Santeuil virus genome in the JU1264 wild C. briggsae isolate but not in bleached animals (unpublished data).
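The two summaries used in this paragraph, the fraction of mapped sequences that are viral and the length and first-base profile of each strand, can be sketched as follows. The function names and the toy reads are hypothetical; only the two counts passed to `percent_viral` come from the text.

```python
from collections import Counter

def percent_viral(viral_unique, total_unique):
    """Percentage of uniquely mapped sequences that map to the virus."""
    return 100.0 * viral_unique / total_unique

def small_rna_signature(reads):
    """Tabulate length and 5' nucleotide per strand.

    `reads` is an iterable of (sequence, strand) pairs, where strand is
    "sense" or "antisense" relative to the viral genomic RNA.
    """
    lengths = {"sense": Counter(), "antisense": Counter()}
    first_base = {"sense": Counter(), "antisense": Counter()}
    for seq, strand in reads:
        lengths[strand][len(seq)] += 1
        first_base[strand][seq[0]] += 1
    return lengths, first_base

# Counts reported in the text: 21,392 of 1,149,633 unique sequences
print(round(percent_viral(21392, 1149633), 2))  # 1.86

# Hypothetical toy reads, for illustration only
toy = [
    ("G" + "A" * 21, "antisense"),
    ("G" + "C" * 21, "antisense"),
    ("ACGTACGTACGTACGTACG", "sense"),
]
lengths, first_base = small_rna_signature(toy)
print(lengths["antisense"].most_common(1))     # [(22, 2)]
print(first_base["antisense"].most_common(1))  # [('G', 2)]
```

A 22G-like signature would appear as a dominant length of 22 and a dominant first base of G among antisense reads, with no comparable peak among sense reads.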

Small RNAs produced upon viral infection. Number of unique sequences obtained by Illumina/Solexa high-throughput sequencing of a 5′-independent small RNA library from JU1580 matching a given position in the Orsay virus segment RNA1 (A) or RNA2 …

RNAi Competency of the Host Is an Antiviral Defense

As viral infection appears to invoke a small RNA response in JU1580 animals, we next tested if mutations in small RNA pathways could affect replication of the Orsay virus. Orsay virus infection of the N2 reference strain was reduced compared to JU1580, as assayed by viral RNA qRT-PCR (Figure A13-5C) and infection symptoms (Figures S3A and A13-7B). Mutation of the rde-1 gene—which encodes an Argonaute protein required for the initiation of exogenous RNAi (Tabara et al., 1999)—in the N2 background increased viral RNA abundance and morphological symptoms to levels comparable to JU1580 using both assays (Figure A13-7A,B). The infected rde-1 strain produced infectious viral particles, as reinfection of the cured JU1580 strain by filtrates of infected rde-1 animals yielded positive RT-PCR results (unpublished data). In addition, mutation of other exogenous RNAi pathway genes including rde-2, rde-4, and mut-7 (Table A13-1) also led to increased viral RNA accumulation as determined by quantitative RT-PCR (Figure A13-7A). We thus conclude that RNAi mechanisms provide antiviral immunity to C. elegans and that Orsay virus infection of mutant animals can be used to define genes important for antiviral defense.

RNAi-deficient mutants of C. elegans can be infected by the Orsay virus. (A) JU1580bl, N2, rde-1(ne219) (n = 10 independent replicates each), rde-2(ne221), rde-4(ne301), and mut-7(pk204) (n = 5 independent replicates each) were tested by qRT-PCR for infection …

Natural Variation in Somatic RNAi Efficiency in C. elegans

Since a functional RNAi pathway limits the accumulation of viral RNA in the N2 reference strain, we assessed the exogenous RNAi competency of the bleached culture of JU1580 (JU1580bl) relative to the reference N2 strain. Using external application of dsRNAs by feeding, JU1580bl was found to be highly resistant to RNAi of a somatically expressed gene (unc-22) but competent for RNA inactivation of a germline-expressed gene (pos-1) (Figure A13-8A, B). C. elegans wild isolates, such as CB4856, were previously known to be variably sensitive to germline RNAi (Tijsterman et al., 2002). Here we thus observed for the first time a large variation in sensitivity to somatic RNAi, which does not correlate with germline RNAi sensitivity and thus cannot be due to an inability to take up dsRNA from the intestinal lumen. We confirmed insensitivity to somatic RNAi of the JU1580bl isolate using a ubiquitously expressed GFP transgene (let-858::GFP), which was inactivated by GFP RNAi in the C. elegans N2 reference background, but only modestly repressed in the JU1580bl isolate (Figure A13-8C, Figure S4). We confirmed that the insensitivity to somatic RNAi also applied when unc-22 dsRNAs were directly injected into the syncytial germline (Figure A13-8D). Therefore, the robust accumulation of Orsay virus RNA observed in infected JU1580 may be rendered possible in part by the partial defect in the somatic RNAi pathway of this wild isolate. The accumulation of small RNAs in response to the virus in infected JU1580 indicates, however, that its RNAi response is at least partially active in some tissues, perhaps including the germline.

Natural variation in somatic RNAi efficacy in C. elegans. (A) Somatic RNAi was tested using bacteria expressing dsRNA specific for the unc-22 gene (acting in muscle [Fire et al., 1998]). The percentage of animals with the corresponding twitcher phenotype …

The germline RNAi competence of JU1580 together with the presence of Orsay virus RNA1 in the somatic gonad raises the possibility that vertical transmission of viral infection could occur in a strain defective for germline RNAi. To examine this possibility, JU1580bl, N2, and rde-1 were exposed to Orsay virus filtrate. A subset of adult animals from each plate was bleached and their adult offspring collected 4 d later. No evidence for vertical transmission was observed by qRT-PCR for Orsay virus RNA in any strain (Figure S5).

We further tested the efficiency of the RNAi response in six other wild C. elegans isolates representative of its worldwide diversity (Figure A13-8A, B, D). Our results suggest that the somatic RNAi response varies quantitatively in C. elegans and is not correlated with germline RNAi sensitivity. Under experimental conditions that yield efficient infection of JU1580bl by Orsay virus, none of the other strains yielded significant levels of morphological symptoms (Figure A13-8E). Only JU1580bl and JU258 were positive by RT-PCR (unpublished data). Thus, factors other than RNAi competency also contribute to the sensitivity of C. elegans to the Orsay virus.


The First Viruses Infecting Caenorhabditis

Here we report the first molecular description, to our knowledge, of viruses that naturally infect nematodes in the wild. The two novel viruses we identified, while clearly related to known nodaviruses, possess unique genomic features absent from all other previously described nodaviruses. These viruses may thus define a novel genus within the family Nodaviridae or may even represent prototype species of a new virus family (pending formal classification by the International Committee on Taxonomy of Viruses). The same range of intestinal symptoms was observed in animals that were infected by the Orsay and Santeuil viruses, further suggesting that these viral infections were causing the cellular symptoms. We observed putative viral particles of the size expected for nodaviruses, and a strong RNA FISH signal in intestinal cells and the somatic gonad of infected animals, demonstrating that the virus is present intracellularly. It is likely that further sampling of natural populations of Caenorhabditis will yield other viruses of this and other groups. In fact, these symptoms were seen repeatedly in C. briggsae animals sampled from different locations in France, and in one instance, a Santeuil virus variant has been identified (unpublished data).

A characteristic feature of these two viruses is the presence of the novel ORF δ. Conservation of sequence length and identity of the ORF δ in these two viruses, and the absence of this ORF in all other described nodaviruses, suggests that this predicted protein is likely to be important for the ability of the virus to infect or replicate in nematodes. Its function is currently unknown, but it is tempting to speculate that this protein may play a role in antagonizing an innate antiviral pathway.

A Laboratory Viral Infection of a Small Model Animal

The infection of C. elegans by the Orsay nodavirus provides an exciting prospect for studies in virology, host cell biology, and antiviral innate immunity. Genetic screens to identify antiviral factors in model organisms have been limited in large part by the lack of natural infection systems. Although Drosophila has been used with great success to examine host-virus interactions for various insect viruses (Huszar and Imler, 2008) and influenza (Hao et al., 2008), none of these studies has examined viral infection of the host organism by natural transmission routes. Here we present a novel association between C. elegans and a virus that persists in culture through horizontal transmission, causing extensive damage to intestinal cells yet having remarkably little effect on the animal, which continues moving, eating, and producing progeny, although at a lower rate.

De novo infection of naïve animals can be effected by the simple addition of either dead infected animals or homogenized lysates made from infected animals to culture dishes. This is sufficient to seed sustained complete cycles of viral replication, shedding, and infection. With this system, it is now possible to embark on whole genome genetic screens to identify host factors that block any facet of the viral life cycle. Using the current experimental conditions, infection of JU1580bl and rde-1 mutants in the N2 background was highly reproducible. The fact that the reference wild type N2 strain may only sustain a very low yet detectable viral titer makes it a particularly favorable genetic background in which to screen for genes involved in interaction with the virus.

The intestine is a tissue that is particularly exposed to microbes through ingestion, and is a main entry point for pathogens in C. elegans as in other animals. In C. elegans, the intestinal cells are large and easily amenable to observations by optical microscopy. The viral parasites affect the organization of the polarized epithelial intestinal cells and will likely provide interesting mechanisms and tools to study their cell biology. Clear reorganization occurs in the intermediate filaments that line the apical brush border, as well as in the lipid storage granules, the nuclear membrane, and other intracellular compartments.

The abnormal state of the intestinal cells may slow down progeny production by decreasing the food intake. Alternatively, the presence of viral RNA in the somatic gonad may explain the delay in progeny production, although no gonadal cellular phenotypes have been observed. The presence of viral RNA in the somatic gonad is particularly interesting given the lack of vertical transmission.

Targeted Mutant Screens with Orsay Virus Confirm a Role for RNAi in Antiviral Defense

Although prior studies have clearly demonstrated a role for C. elegans RNAi in counteracting viral infection, these studies utilized either a transgenic system of viral RNA expression (Lu et al., 2005) or primary culture cells (Schoot et al., 2005; Wilkins et al., 2005). The observed susceptibility of Orsay virus RNA to RNAi processing in JU1580 animals provides the first evidence in a completely natural setting, without any artificial manipulations, that RNAi serves an antiviral role in nematodes. Coupled to the increase in accumulation of Orsay virus RNA in RNAi pathway mutant strains as compared to wild type N2, these studies demonstrate that the RNAi pathway is an important antiviral defense against Orsay virus. Moreover, these results demonstrate the feasibility of identifying antiviral genes or pathways in this experimental infection system. The mechanism by which the animals prevent transmission to their offspring is unclear, but our initial results with rde-1 mutants suggest that perturbing germline RNAi is not sufficient to enable vertical transmission.

Evolution of Viral Sensitivity and Specificity in Natural Populations

The quantitative difference in Orsay nodavirus sensitivity between the N2 and JU1580 wild C. elegans genetic backgrounds will allow the identification of a set of host genes that modulate viral sensitivity during evolution of natural host populations. Based on the defect in exogenous RNAi of the JU1580 strain, we speculate that this set will include, but is unlikely to be limited to, genes involved in exogenous RNAi pathways. Support for the role of other genes outside the RNAi pathway comes from our data on natural isolates. Despite the fact that the magnitude of the somatic RNAi defect of the natural isolate PS2025 was comparable to that of JU1580, no evidence of viral RNA accumulation or morphological symptoms was observed following addition of Orsay virus filtrate. Whether PS2025 lacks one or more crucial receptors for viral infection or has alternative antiviral pathways that suppress viral replication is currently unknown.

In addition, the Orsay and Santeuil viruses appear to specifically infect C. elegans and C. briggsae, respectively. Moreover, the C. elegans rde-1 mutation in the N2 background confers susceptibility to the Orsay virus, but not to the Santeuil virus (Figure S3C). The two viruses thus provide a system to study host-parasite specificity and its evolution. With the isolation of additional variants of each virus (our unpublished data), viral evolution studies can also be undertaken. Host-parasite evolutionary and ecological interactions can thus be explored at two evolutionary scales, within and between species of both host and parasite. The rapid life cycle of C. elegans also allows experimental evolution in the laboratory (Azevedo et al., 2002; Schulte et al., 2010). This model system, which can include both natural and engineered variants of both virus and host, is thus favorable for combining studies of host-pathogen co-evolution in the laboratory and in natural populations.

Materials and Methods

Nematode Field Isolation

Caenorhabditis nematodes were isolated on C. elegans culture plates seeded with E. coli strain OP50 using the procedures described in Barrière and Félix (2006). JU1264 was isolated from a snail collected on rotting grapes in Santeuil (Val d’Oise, France) on 14 Oct 2007. JU1580 was isolated from a rotting apple sampled in Orsay (Essonne, France) on 6 Oct 2008. When required, cultures were cleared of natural bacterial contamination by frequent passaging of the animals and/or antibiotic treatment (LB plates with 50 μg/ml tetracycline, ampicillin, or kanamycin for 1 h). Infected cultures were kept frozen at −80°C and in liquid N2 as described in Wood (1988). Bleaching was performed as in Wood (1988).

Light Microscopy

When observed with a transillumination dissecting microscope, infected animals displayed a paler intestine than healthy worms. This lack of intestinal coloration occurred along the entire intestinal tract in C. briggsae JU1264 and preferentially in the anterior intestinal tract in C. elegans JU1580. Intestinal cells were observed with Nomarski optics with a 63× or 100× objective. The four symptoms used for scoring were (1) the disappearance of gut granules in at least part of a cell; (2) degeneration of the nucleus, including a very elongated nucleus or nucleolus (when the rest of the nucleus has degenerated) or the apparent disappearance of the nucleus; (3) the loss of cytoplasmic viscosity, visible as a very fluid flow of cytosol within the cell; and (4) the fusion of intestinal cells. Some of these traits may sometimes appear in uninfected animals. We systematically tested for a significant increase after infection of the proportion of animals with symptoms (Fisher’s exact test). Note that some of these symptoms can also be caused by microsporidial and bacterial infections. Thus, the diagnosis of a viral infection based on the cellular symptoms requires an otherwise clean culture.
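As an illustration of the scoring comparison, the following is a minimal sketch of a one-sided Fisher’s exact test for an increase in the proportion of symptomatic animals after infection. The function name and the counts are hypothetical, and the one-sided form is an assumption based on the phrase "significant increase". The software actually used for this test is not stated here.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: infected, control. Columns: with symptoms, without symptoms.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` symptomatic animals in the infected group if
    infection had no effect.
    """
    row1 = a + b              # infected animals scored
    col1 = a + c              # animals with symptoms
    col2 = b + d              # animals without symptoms
    n = row1 + col1 + col2 - a - b  # total animals (a + b + c + d)
    total = comb(n, row1)
    p = 0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(col2, row1 - k)
    return p / total

# Hypothetical counts: 3 of 4 infected and 1 of 4 control animals show symptoms
print(fisher_exact_greater(3, 1, 1, 3))
```

With these toy counts the result is 17/70, so four animals per group would not be enough to detect an effect of this size.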

Live Hoechst 33342 Staining of Nuclei

Animals were washed off a culture plate in 10 ml of ddH2O, pelleted, and incubated in 10 ml of 10 μg/ml Hoechst 33342 in ddH2O for 45 min with soft agitation, protecting the tube from light with aluminum foil. The animals were then pelleted and transferred to a new culture plate seeded with E. coli OP50. After 2 h, they were mounted and observed with a fluorescence microscope.

Electron Microscopy

A few adults were washed in 0.2 ml of M9 solution, suspended in 2% para-formaldehyde +0.1% glutaraldehyde, and cut in two on ice under a dissecting microscope for better reagent penetration (Hall, 1995). Worm pieces were then resuspended overnight in 2% OsO4 at 4°C, washed, embedded in 2% low melting point agar, dehydrated in solutions of increasing ethanol concentrations, and embedded in resin (Epon-Araldite). High-pressure freezing was performed using a Leica PACT2 high-pressure freezer (Weimer, 2006).

Progeny Counts

The time course was started by isolating single L4 larvae for C. elegans JU1580 and single L3 larvae for C. briggsae JU1264. The parent animal was then transferred every day to a new plate until the end of progeny production. The plates were incubated at 20°C for 2 d and kept at 4°C until scoring. The few cases where the parent died before the end of its laying period were not included. Some progeny died as embryos in both infected and non-infected cultures (non-significant effect of treatment; unpublished data). The timing of progeny production was analyzed in R with a Generalized Linear Model using infection status, day, individual (nested in infection status), and Infection Status×Day as explanatory variables, assuming a Poisson response variable and a log link function. Individual, day, and Infection Status×Day were the significant explanatory variables for both JU1264 and JU1580 (p<0.001).

Infectious Filtrate Preparation and Animal Infections

Nematodes were grown on 10 plates (90 mm diameter) until just starved, resuspended in 15 ml of 20 mM Tris-Cl pH 7.8, and pelleted by low-speed centrifugation (5,000 g). The supernatant was centrifuged twice at 21,000 g for 5 min (4°C) and pellets discarded. The supernatant was passed through a 0.2 μm filter. Culture plates (55 mm) were prepared with 2–5 young adults of N2, rde-1(ne219), or JU1580bl. At the same time (Figures A13-1K, A13-4A,B, A13-5, A13-7B, and S3), or the following day (Figures A13-5C, A13-7A), 30 μl of infectious filtrate was pipetted onto the bacterial lawn. The cultures were incubated at 20°C unless otherwise indicated. When both C. elegans and C. briggsae were grown in parallel, an incubation temperature of 23°C (indicated in the figure legends) was used so that both species developed at similar speeds. Maintenance over more than 4 d after re-infection was performed by transferring a piece of agar (approx. 0.1 cm³) every 2–3 d to a new plate with food.

High-Throughput Sequencing

Phenol-chloroform purified DNA and RNA from infected JU1580 and JU1264 animals were subjected to random PCR amplification as described (Wang et al., 2003). The amplicons were then pyrosequenced following standard library construction on a Roche Titanium Genome Sequencer. Raw sequence reads were filtered for quality and repetitive sequences. BLASTn and BLASTx were used to identify sequences with limited similarity to known viruses in GenBank. Contigs were assembled using the Newbler assembler. To confirm the assembly, primers for RT-PCR were designed to amplify overlapping fragments of ~1.5 kb. Amplicons were cloned and sequenced.

5′ and 3′ RACE

5′ RACE was performed according to standard protocols (Invitrogen 5′ RACE kit). 3′ RACE was performed by first adding a polyA tail using PolyA polymerase (Ambion) and then using Qiagen 1-step RT-PCR kit with gene specific primers and an oligo-dT-adapter primer. Products were cloned into pCR4 and sequenced using standard Sanger chemistry.

Small RNA Sequencing

Four to six 90 mm plates with 15–20 adults (JU1580 or bleached JU1580) were grown for 4 d at 20°C. Mixed stage animals from all plates were collected, pooled, and frozen at −80°C. Total RNA was extracted using the mirVana miRNA isolation kit (Ambion). Small RNAs were size selected to 18–30 bases by denaturing polyacrylamide gel fractionation. A cDNA library that did not depend on 5′-monophosphates was constructed by tobacco acid pyrophosphatase treatment using adapters recommended for Solexa sequencing as described previously (Das et al., 2008). Each sample was labeled with a unique four base pair barcode. cDNA was purified using the NucleoSpin Extract II kit (Macherey & Nagel). Small RNA libraries were sequenced using the Illumina/Solexa GA2 platform (Illumina, Inc., San Diego, CA). Fastq data files were processed using custom Perl scripts. Reads with missing bases or whose first four bases did not match any of the expected barcodes were excluded. Reads were trimmed by removing the first four nucleotides and any 3′ As. The obtained inserts were collapsed to unique sequences, retaining the number of reads for each sequence. Sequences in the expected size range (18–30 nucleotides) were aligned to the C. elegans genome (WS190), downloaded from the UCSC Genome Browser website (Kent et al., 2002), and to the JU1580 partial virus genome using the ELAND module within the Illumina Genome Analyzer Pipeline Software, v0.3.0. Figure A13-6 is based on unique sequences (multiple reads of the same sequence were collapsed) with perfect and unambiguous alignment to the Orsay virus genome. Small RNA sequence data were submitted to the Gene Expression Omnibus under accession number GSE21736.
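The read-processing steps described above (barcode check, trimming, collapsing, size selection) can be illustrated with a minimal Python sketch. It is not the authors' Perl code. The barcode sequences, the function names, and the convention that a missing base is written as "N" or "." are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical barcodes: the source does not list the actual four-base sequences.
BARCODES = {"ACGT", "TGCA"}


def extract_insert(read, barcodes=BARCODES):
    """Return the trimmed insert of a raw read, or None if the read is excluded."""
    # Assumption: missing bases are encoded as "N" or "." in the read.
    if "N" in read or "." in read:
        return None
    # Exclude reads whose first four bases match no expected barcode.
    if read[:4] not in barcodes:
        return None
    # Remove the four-base barcode, then any run of As at the 3' end.
    return read[4:].rstrip("A")


def collapse_reads(reads, min_len=18, max_len=30):
    """Collapse inserts to unique sequences, keeping the read count of each."""
    counts = Counter()
    for read in reads:
        insert = extract_insert(read.upper())
        # Keep only inserts in the expected size range (18-30 nt).
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts
```

The resulting mapping from unique sequence to read count is the kind of table that would then be passed to an aligner. Alignment itself is outside the scope of this sketch.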

Neighbor-Joining Phylogenetic Analysis

The predicted amino acid sequences from Orsay and Santeuil nodaviruses were aligned using ClustalW to the protein sequences of the following nodaviruses. Capsid Protein: Barfin flounder nervous necrosis virus NC_013459, Barfin flounder virus BF93Hok RNA2 NC_011064, Black beetle virus NC_002037, Boolarra virus NC_004145, Epinephelus tauvina nervous necrosis virus NC_004136, Flock house virus NC_004144, Macrobrachium rosenbergii nodavirus RNA-2 NC_005095, Nodamura virus RNA2 NC_002691, Pariacoto virus RNA2 NC_003692, Redspotted grouper nervous necrosis virus NC_008041, Striped Jack nervous necrosis virus RNA2 NC_003449, Tiger puffer nervous necrosis virus NC_013461, Wuhan nodavirus ABB71128.1, and American nodavirus ACU32796.1. 1,000 bootstrap replicates were performed.

RNA Polymerase: Barfin flounder nervous necrosis virus YP_003288756.1, Barfin flounder virus BF93Hok YP_002019751.1, Black beetle virus YP_053043, Boolarra virus NP_689439, Epinephelus tauvina nervous necrosis virus NP_689433.1, Flock house virus NP_689444.1, Nodamura virus NP_077730, Pariacoto virus NP_620109.1, Redspotted grouper nervous necrosis virus YP_611155.1, Striped Jack nervous necrosis virus NP_599247.1, Tiger puffer nervous necrosis virus YP_003288759.1, Macrobrachium rosenbergii nodavirus NP_919036.1, Wuhan nodavirus AAY27743, and American nodavirus SW-2009a ACU32794.1. 1,000 bootstrap replicates were performed.


Nematodes from two culture plates were resuspended in M9 and then washed three times in 10 ml M9. RNA was extracted using Trizol (Invitrogen) (5–10 vol:vol of pelleted worms) and resuspended in 20 μl of RNase-free ddH2O. 5 μg of RNA were reverse transcribed using SuperscriptIII (Invitrogen) in a 20 μl volume. 5 μl were used for PCR in a 20 μl volume (annealing temperature 60°C, 35 cycles). For the Orsay nodavirus, the reverse transcription used the GW195 primer (5′ GACGCTTCCAAGATTGGTATTGGT) and the PCR used oTB3 (5′ CGGATTCTCGACATAGTCG) and oTB4 (5′ GTAGGCGAGGAAGGAGATG). For the Santeuil nodavirus, reverse transcription used oTB6RT (5′ GGTTCTGGTGGTGATGGTG) and the PCR used oTB5 (5′ GCGGATGTTCTTCACGGAC) and oTB6 (5′ GTCAGTAGCGGACCAGATG).

One-Step RT-PCR

Animals from one 55 mm culture plate plus viral filtrate (see infection procedure) were washed twice in M9. RNA was extracted using 1 ml Trizol (Invitrogen) and resuspended in 10 μl DEPC-treated H2O. 0.1 μl was used for RT-PCR using the OneStep RT-PCR Kit (Qiagen). Primers annealed to viral RNA1 (GW194 and GW195).


cDNA was generated from 1 μg total RNA with random primers using Superscript III (Invitrogen). cDNA was diluted 1:100 for qRT-PCR analysis. qRT-PCR was performed using either QuantiTect SYBR Green PCR (Qiagen) or ABsolute Blue SYBR Green ROX (Thermo Scientific). The amplification was performed on a 7300 Real Time PCR System (Applied Biosystems). Each sample was normalized to ama-1, and then viral RNA1 levels (primers GW194: 5′ ACC TCA CAA CTG CCA TCT ACA and GW195: 5′ GAC GCT TCC AAG ATT GGT ATT GGT) were compared to those present in re-infected bleached JU1580 animals.
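The text does not state how the normalization and comparison were computed. One common way to express such a result is the comparative Ct (2^-ΔΔCt) calculation, sketched below in Python on the assumption that each PCR cycle doubles the product. The function and parameter names are hypothetical, and this may not be the calculation the authors used.

```python
def relative_level(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Relative target level by the comparative Ct method (assumes doubling per cycle).

    ct_target / ct_reference: Ct values of viral RNA1 and ama-1 in the sample.
    *_calibrator: the same two Ct values in the calibrator sample
    (here, re-infected bleached JU1580 animals).
    """
    # Normalize the target to the reference gene within each sample.
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the normalized sample to the normalized calibrator.
    return 2.0 ** -(delta_sample - delta_calibrator)
```

Under these assumptions, a returned value of 1.0 means the sample carries the same normalized amount of viral RNA1 as the calibrator, and values below 1.0 mean less.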

Northern Blotting

For Northern blots, 0.5 μg of total RNA extracted from JU1264 and JU1264bl animals were electrophoresed through 1.0% denaturing formaldehyde-MOPS agarose gels. RNA was transferred to Hybond nylon membranes and then subjected to UV cross-linking followed by baking at 75°C for 20 min. Double stranded DNA probes targeting the RNA1 segment of Santeuil nodavirus (nt 1141–1634) and the RNA2 segment of Santeuil nodavirus (nt 1833–2308) were generated by random priming in the presence of α-32P dATP using the Decaprime kit (Ambion). Blots were hybridized for 4 h at 65°C in Rapid hyb buffer (GE Healthcare) and washed in 2XSSC/0.1%SDS 5 min×2 at 25°C, 1XSSC/0.1%SDS 10 min×2 at 25°C, 0.1XSSC/0.1%SDS 5 min×4 at 25°C, 0.1XSSC/0.1%SDS 15 min×2 at 42°C, and 0.1XSSC/0.1%SDS 15 min×1 at 68°C. For strand specific riboprobes, 32P labeled RNA was generated by in vitro transcription with either T7 or T3 RNA polymerase (Ambion) in the presence of α-32P UTP. The target plasmid contained a cloned region of the Santeuil nodavirus RNA1 segment (nt 523–1022) and was linearized with either PmeI or NotI, respectively. For the riboprobes, blots were hybridized at 70°C and then sequentially washed as follows: 2XSSC/0.1%SDS 5 min×2 at 68°C, 1XSSC/0.1%SDS 10 min×2 at 68°C, 0.1XSSC/0.1%SDS 10 min×2 at 68°C, and 0.1XSSC/0.1%SDS 20 min×1 at 73°C. The Santeuil RNA1 segment migrates at approximately the same position as the 28S ribosomal RNA. Under the extended exposure time (72 h) needed to visualize the negative sense genome, low levels of non-specific binding to the 28S RNA become apparent (Figure A13-4D).

RNA Interference

For pos-1 and unc-22 RNAi using bacteria as the dsRNA source, bacterial clones from the Ahringer library expressing dsRNAs (Kamath et al., 2003) (available through MRC Geneservice) were used to feed C. elegans on agar plates. For the pos-1 experiment, bacteria were concentrated 10-fold by centrifugation prior to seeding the plates. A C. briggsae Cbr-lin-12 fragment (Félix, 2007) was used as a negative control as it does not match any sequence in C. elegans. Three or four L4s were deposited on an RNAi plate, singly transferred the next day to a second RNAi plate, and their progeny scored after 2 d (pos-1) or 3 d (unc-22) at 23°C.

For unc-22 dsRNA synthesis and injection, the unc-22 fragment in the Ahringer library clone was amplified by PCR using the T7 primer and in vitro transcribed with the T7 polymerase using the Ambion MEGAscript kit, according to the manufacturer’s protocol (Ahringer, 2006). Cel-unc-22 dsRNAs were injected at 50 ng/μl into both gonadal arms of young hermaphrodite adults of the relevant strain. The animals were incubated at 20°C. The adults were transferred to a new plate individually on the next day, and the proportion of twitching progeny scored 3 d later, touching each animal with a platinum-wire pick to induce movement.

For GFP RNAi, transgenic N2 and JU1580 strains were generated expressing the ubiquitously expressed let-858::GFP and the pharyngeal marker myo-2::DsRed as an extrachromosomal array. Bacteria expressing dsRNA against GFP cDNA were used to feed animals on agar plates. An empty vector was used as a negative control. Two or three L4s were deposited on a 55 mm RNAi plate, grown at 20°C for 3 d, and the GFP/DsRed expression levels in their offspring measured using flow cytometry (Union Biometrica) as described previously (Lehrbach et al., 2009). Offspring from two RNAi plates were combined for sorting. Each combination of RNAi vector and strain was repeated in at least triplicate. GFP and DsRed intensities were obtained from 14 wormsorter runs including 3–4 replicate runs for N2 and JU1580 after treatment with GFP RNAi or empty vector. A larger proportion of N2 animals showed reporter expression compared to JU1580 animals (Figure S4, top). To control for this difference between strains, animals with no reporter expression were excluded by requiring DsRed intensities to exceed a cutoff set to the median 99th percentile from three control runs of animals with no array present (Figure S4). A linear regression model was fitted to the median log2(GFP/DsRed) intensity ratios including strain, treatment, and an interaction term as explanatory variables. The interaction term was significantly different from zero at p<0.001.
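The gating and summary steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code. The function names are hypothetical, and the percentile definition (the default method of `statistics.quantiles`) is an assumption, since the source does not say how the 99th percentile was computed. The linear regression with the interaction term is not shown.

```python
import math
import statistics


def percentile_99(values):
    """99th percentile of one control run (last of the 99 cut points for n=100)."""
    return statistics.quantiles(values, n=100)[-1]


def dsred_cutoff(control_runs):
    """Median of the 99th-percentile DsRed intensities across control runs."""
    return statistics.median(percentile_99(run) for run in control_runs)


def median_log2_ratio(animals, cutoff):
    """Median log2(GFP/DsRed) over animals whose DsRed intensity exceeds the cutoff.

    animals: iterable of (gfp_intensity, dsred_intensity) pairs.
    """
    ratios = [
        math.log2(gfp / dsred)
        for gfp, dsred in animals
        # Animals at or below the cutoff are treated as lacking the array.
        # Assumption: non-positive GFP readings are skipped so the log is defined.
        if dsred > cutoff and gfp > 0
    ]
    return statistics.median(ratios)
```

In this sketch, one median ratio would be computed per wormsorter run, and those per-run values would then be the response variable in the regression on strain, treatment, and their interaction.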

RNA Fluorescent In Situ Hybridization (FISH)

A segment of Orsay virus RNA1 was generated with primers GW194 and GW195 and cloned into pGEM-T Easy (Promega). Fluorescein labeled probe was generated from linearized plasmid using the Fluorescein RNA Labeling Mix (Roche) and MEGAscript SP6 transcription (Ambion). JU1580bl animals were infected with Orsay virus filtrate and grown for 4 d at 20°C on 90 mm plates. Control animals were grown under the same conditions in the absence of virus. In situ hybridization was performed essentially as previously described (Motohashi et al., 2006). The fluorescent RNA probe was visualized directly on an Olympus FV1000 Upright microscope.

GenBank Sequences: Accession numbers for Orsay and Santeuil virus contigs: HM030970-HM030973. Small RNA sequencing data at GEO: GSE21736.


M.A.F. thanks the caretakers of the Orsay orchard for access to it and J.-L. Bessereau, F. Duveau, R. Legouis, N. Naffakh, and B. Samuel for helpful discussions.

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: MAF EAM DW. Performed the experiments: MAF AA JP GW IN TB YJ CJF MS. Analyzed the data: MAF AA JP GW IN TB GZ CJF LDG MS EAM DW. Wrote the paper: MAF AA EAM DW.


  • Ahringer J, editor. Reverse genetics. In: The C. elegans Research Community, editor. WormBook; 2006. [Accessed 29 December 2010]. Available: http://www
  • Aliyari R, Ding SW. RNA-based viral immunity initiated by the Dicer family of host immune receptors. Immunol Rev. 2009;227:176–188. [PMC free article: PMC2676720] [PubMed: 19120484]
  • Azevedo RB, Keightley PD, Lauren-Maatta C, Vassilieva LL, Lynch M, et al. Spontaneous mutational variation for body size in Caenorhabditis elegans. Genetics. 2002;162:755–765. [PMC free article: PMC1462287] [PubMed: 12399386]
  • Ball LA. Replication of the genomic RNA of a positive-strand RNA animal virus from negative-sense transcripts. Proc Natl Acad Sci U S A. 1994;91:12443–12447. [PMC free article: PMC45454] [PubMed: 7809056]
  • Barrière A, Félix M-A. Isolation of C. elegans and related nematodes. In: The C. elegans Research Community, editor. WormBook; 2006. [PMC free article: PMC4781001] [PubMed: 18050443]
  • Das PP, Bagijn MP, Goldstein LD, Woolford JR, Lehrbach NJ, et al. Piwi and piRNAs act upstream of an endogenous siRNA pathway to suppress Tc3 transposon mobility in the Caenorhabditis elegans germline. Mol Cell. 2008;31:79–90. [PMC free article: PMC3353317] [PubMed: 18571451]
  • Félix M-A. Cryptic quantitative evolution of the vulva intercellular signaling network in Caenorhabditis. Curr Biol. 2007;17:103–114. [PubMed: 17240335]
  • Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 1998;391:806–811. [PubMed: 9486653]
  • Gu W, Shirayama M, Conte D Jr, Vasale J, Batista PJ, et al. Distinct argonaute-mediated 22G-RNA pathways direct genome surveillance in the C. elegans germline. Mol Cell. 2009;36:231–244. [PMC free article: PMC2776052] [PubMed: 19800275]
  • Hall DH. Electron microscopy and three-dimensional image reconstruction. In: Epstein HF, Shakes DC, editors. Methods in cell biol, Caenorhabditis elegans modern biological analysis of an organism. San Diego: Academic Press; 1995. pp. 395–436.
  • Hao L, Sakurai A, Watanabe T, Sorensen E, Nidom CA, et al. Drosophila RNAi screen identifies host genes important for influenza virus replication. Nature. 2008;454:890–893. [PMC free article: PMC2574945] [PubMed: 18615016]
  • Hüsken K, Wiesenfahrt T, Abraham C, Windoffer R, Bossinger O, Leube RE. Maintenance of the intestinal tube in Caenorhabditis elegans: the role of the intermediate filament protein IFC-2. Differentiation. 2008;76:881–896. [PubMed: 18452552]
  • Huszar T, Imler JL. Drosophila viruses and the study of antiviral host-defense. Adv Virus Res. 2008;72:227–265. [PubMed: 19081493]
  • Johnson KN, Zeddam JL, Ball LA. Characterization and construction of functional cDNA clones of Pariacoto virus, the first Alphanodavirus isolated outside Australasia. J Virol. 2000;74:5123–5132. [PMC free article: PMC110865] [PubMed: 10799587]
  • Kamath RS, Fraser AG, Dong Y, Poulin G, Durbin R, et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature. 2003;421:231–237. [PubMed: 12529635]
  • Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. [PMC free article: PMC186604] [PubMed: 12045153]
  • Kim DH, Feinbaum R, Alloing G, Emerson FE, Garsin DA, et al. A conserved p38 MAP kinase pathway in Caenorhabditis elegans innate immunity. Science. 2002;297:623–626. [PubMed: 12142542]
  • Lehrbach NJ, Armisen J, Lightfoot HL, Murfitt KJ, Bugaut A, et al. LIN-28 and the poly(U) polymerase PUP-2 regulate let-7 microRNA processing in Caenorhabditis elegans. Nat Struct Mol Biol. 2009;16:1016–1020. [PMC free article: PMC2988485] [PubMed: 19713957]
  • Li H, Li WX, Ding SW. Induction and suppression of RNA silencing by an animal virus. Science. 2002;296:1319–1321. [PubMed: 12016316]
  • Liu WH, Lin YL, Wang JP, Liou W, Hou RF, et al. Restriction of vaccinia virus replication by a ced-3 and ced-4-dependent pathway in Caenorhabditis elegans. Proc Natl Acad Sci U S A. 2006;103:4174–4179. [PMC free article: PMC1389701] [PubMed: 16537504]
  • Lu R, Maduro M, Li F, Li HW, Broitman-Maduro G, et al. Animal virus replication and RNAi-mediated antiviral silencing in Caenorhabditis elegans. Nature. 2005;436:1040–1043. [PMC free article: PMC1388260] [PubMed: 16107851]
  • Lu R, Yigit E, Li WX, Ding SW. An RIG-I-Like RNA helicase mediates antiviral RNAi downstream of viral siRNA biogenesis in Caenorhabditis elegans. PLoS Pathog. 2009;5:e1000286. [PMC free article: PMC2629121] [PubMed: 19197349]
  • Milloz J, Duveau F, Nuez I, Félix M-A. Intraspecific evolution of the intercellular signaling network underlying a robust developmental system. Genes Dev. 2008;22:3064–3075. [PMC free article: PMC2577794] [PubMed: 18981482]
  • Motohashi T, Tabara H, Kohara Y. Protocols for large scale in situ hybridization on C. elegans larvae. WormBook. 2006. [Accessed 29 December 2010]. pp. 1–8. Available: http://wormbook.org. [PMC free article: PMC4781301] [PubMed: 18050447]
  • Neil SJ, Zang T, Bieniasz PD. Tetherin inhibits retrovirus release and is antagonized by HIV-1 Vpu. Nature. 2008;451:425–430. [PubMed: 18200009]
  • Pak J, Fire A. Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science. 2007;315:241–244. [PubMed: 17124291]
  • Powell JR, Kim DH, Ausubel FM. The G protein-coupled receptor FSHR-1 is required for the Caenorhabditis elegans innate immune response. Proc Natl Acad Sci U S A. 2009;106:2782–2787. [PMC free article: PMC2650343] [PubMed: 19196974]
  • Sabin LR, Zhou R, Gruber JJ, Lukinova N, Bambina S, et al. Ars2 regulates both miRNA- and siRNA-dependent silencing and suppresses RNA virus infection in Drosophila. Cell. 2009;138:340–351. [PMC free article: PMC2717035] [PubMed: 19632183]
  • Schott DH, Cureton DK, Whelan SP, Hunter C. An antiviral role for the RNA interference machinery in Caenorhabditis elegans. Proc Natl Acad Sci U S A. 2005;102:18420–18424. [PMC free article: PMC1317933] [PubMed: 16339901]
  • Schulte RD, Makus C, Hasert B, Michiels NK, Schulenburg H. Multiple reciprocal adaptations and rapid genetic change upon experimental coevolution of an animal host and its microbial parasite. Proc Natl Acad Sci U S A. 2010;107:7359–7364. [PMC free article: PMC2867683] [PubMed: 20368449]
  • Sijen T, Steiner FA, Thijssen KL, Plasterk RH. Secondary siRNAs result from unprimed RNA synthesis and form a distinct class. Science. 2007;315:244–247. [PubMed: 17158288]
  • Tabara H, Sarkissian M, Kelly WG, Fleenor J, Grishok A, et al. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell. 1999;99:123–132. [PubMed: 10535731]
  • Tijsterman M, Okihara KL, Thijssen K, Plasterk RHA. PPW-1, a PAZ/PIWI protein required for efficient germline RNAi, is defective in a natural isolate of C. elegans. Curr Biol. 2002;12:1535–1540. [PubMed: 12225671]
  • Troemel ER, Félix M-A, Whiteman NK, Barrière A, Ausubel FM. Microsporidia are natural intracellular parasites of the nematode Caenorhabditis elegans. PLoS Biol. 2008;6:2736–2752. [PMC free article: PMC2596862] [PubMed: 19071962]
  • Wang D, Urisman A, Liu YT, Springer M, Ksiazek TG, et al. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol. 2003;1:E2. [PMC free article: PMC261870] [PubMed: 14624234]
  • Weimer RM. Preservation of C. elegans tissue via high-pressure freezing and freeze-substitution for ultrastructural analysis and immunocytochemistry. Methods Mol Biol. 2006;351:203–221. [PubMed: 16988436]
  • Wilkins C, Dishongh R, Moore SC, Whitt MA, Chow M, et al. RNA interference is an antiviral defence mechanism in Caenorhabditis elegans. Nature. 2005;436:1044–1047. [PubMed: 16107852]
  • Wood WB. The nematode Caenorhabditis elegans. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory; 1988. p. 667.



55 The Genome Institute, Washington University, 4444 Forest Park Avenue, Campus Box 8501, St. Louis, Missouri 63108, USA.


The human body is colonized by a vast array of microbes, which form communities of bacteria, viruses and microbial eukaryotes that are specific to each anatomical environment. Every community must be studied as a whole because many organisms have never been cultured independently, and this poses formidable challenges. The advent of next-generation DNA sequencing has allowed more sophisticated analysis and sampling of these complex systems by culture-independent methods. These methods are revealing differences in community structure between anatomical sites, between individuals, and between healthy and diseased states, and are transforming our view of human biology.

The microbes that exist in the human body are collectively known as the human microbiota. This amazingly complex and poorly understood group of communities has an enormous impact on humans. An increasing number of conditions are being examined for correlative and causative associations with the microbiome, which in this Review refers to the microbiota and the habitat it colonizes (Box A14-1). Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists in. The fundamental goal of human microbiome research is to measure the structure and dynamics of microbial communities, the relationships between their members, what substances are produced and consumed, the interaction with the host, and differences between healthy hosts and those with disease. Despite an explosion in human-microbiome research, these communities are still the dark matter of the body. The microbiome has been called another organ (Backhed et al., 2005; Foxman et al., 2008; Possemiers et al., 2011; Shanahan, 2002) because of its products, its responsiveness to the environment and its integration with other systems. It is sometimes referred to as our second genome (Bruls and Weissenbach, 2011): the genes of the microbes that make up the microbiome outnumber human genes by more than 100-fold, with over 3 million bacterial genes in the gut alone (Human Microbiome Project Consortium, 2012a; Qin et al., 2010). These extensive microbial ecosystems are not limited to the human body. Microbes and their communities dominate the environment and occupy a vast range of niches. Environmental metagenomics was developed extensively before being applied to the human body (Stein et al., 1996; Vergin et al., 1998), and methods from other disciplines have had a significant effect on human-microbiome research. Defining complicated microbial ecosystems and developing tools to probe their workings is an important research enterprise of twenty-first century microbiology.

BOX A14-1

Terminology. Biodiversity is a measure of the complexity of a community. It is affected by the number of taxa (richness) and their range of abundance (evenness). High biodiversity occurs when many taxa (high richness) are present at similar abundances.
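To make richness and evenness concrete, the short Python sketch below computes them from a list of per-taxon counts. The box does not name a specific index. The Shannon index and Pielou's evenness used here are one common choice, introduced as an assumption for illustration, and the function names are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over the observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Pielou's evenness J = H / ln(richness); 1.0 means equal abundances."""
    s = richness(counts)
    if s < 2:
        return 0.0  # convention chosen for this sketch: evenness is undefined for one taxon
    return shannon_index(counts) / math.log(s)
```

Two communities with the same richness can differ sharply in evenness. Four taxa at equal abundance give an evenness of about 1, whereas four taxa with one dominating give a much lower value.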

The complexity of microbial communities makes studying them challenging. There may be hundreds of different species, and enumerating what organisms are present with standard microbiological techniques is not possible because many organisms have never been grown in culture and may require special, as yet unknown, growth conditions. In addition, the abundance of some microbes can range over orders of magnitude, so deep sampling is required to detect the less-abundant members. Culture-independent methods of taking a microbial census began about 25 years ago and were based on targeted sequencing of 5S and 16S ribosomal RNA genes (Olsen et al., 1986), which differ for each species and are a convenient identifier. As this became a tractable research area, next-generation sequencing (NGS) technologies (Table A14-1) were developed and allowed more extensive analyses, both targeted 16S rRNA gene sequencing and whole-genome shotgun sequencing of microbes in communities en masse. The number of culture-independent metagenomic investigations of the human microbiome has mushroomed, and it is one of the most studied areas of microbiology with significant potential to benefit clinical practice. This culture-independent methodology is broadly applied outside human-microbiome research and is expanding our knowledge of the environment. This Review describes how NGS approaches are transforming human-microbiome studies, and posing questions and challenges for the future.

TABLE A14-1. DNA Sequencing Platforms Used for Microbiome Analysis.


Single Organisms and Microbial Communities

In the past, research on microbial interactions with humans has focused on single pathogenic organisms. Studies of communities of non-pathogenic microbes in the body were limited because the organisms were thought to be benign, with minor effects on human health compared with pathogens. Microbiome research has led to new interest in the communities of non-pathogenic microbes that inhabit the human body, and the need to describe the genomes of these organisms to understand the human microbiome has been recognized.

Every community of the microbiome has its own characteristics (Table A14-2). For the gut community, for example, high biodiversity is associated with a healthy state and reduced biodiversity occurs in patients with conditions such as Crohn’s disease (Manichanh et al., 2006), whereas for tissues of the vagina, a lower biodiversity exists in healthy individuals and a bloom of organisms occurs in patients with vaginosis (Fredricks et al., 2005). To understand why different sites have different properties, the mechanisms that lead to the disruption of ecosystems and to disease, and exceptions to generalities about a tissue, researchers require knowledge of the structure and behaviour of microbial communities.

TABLE A14-2. Characteristics of Bacteria, Microbial Eukaryotes, and Viruses in the Human Microbiome.


Characteristics of Bacteria, Microbial Eukaryotes, a