Proc Natl Acad Sci U S A. 2012 Feb 21; 109(8): 3065–3070.
Published online 2012 Feb 6. doi:  10.1073/pnas.1121491109
PMCID: PMC3286951

Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011


The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May–July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.

Keywords: food-borne outbreak, Shiga toxin, enteroaggregative E. coli, enterohemorrhagic E. coli

In May–July 2011, two outbreaks of bloody diarrhea and hemolytic uremic syndrome (HUS) occurred in Europe: one centered in Germany (around 4,000 cases of bloody diarrhea, 850 cases of HUS and 50 deaths), and a much smaller outbreak in southwest France, near Bordeaux (15 cases of bloody diarrhea, 9 of which progressed to HUS) (1–4). Both outbreaks were caused by a strain of Shiga toxin-producing Escherichia coli of serotype O104:H4 (2, 5), which possesses a plasmid, pAA, characteristic of enteroaggregative E. coli, as well as a plasmid encoding an extended-spectrum β-lactamase (ESBL) (3). The proportion of patients infected with E. coli O104:H4 who develop complications, including HUS, is higher than seen in prior outbreaks (1, 6). The source of the outbreaks was epidemiologically linked to contaminated sprouts, and evidence indicates the outbreaks are connected to a 15,000-kg seed shipment from Egypt that arrived in Germany in December 2009. The majority of the seeds from the shipment (10,500 kg) was then sent to a German seed distributor, which supplied the implicated German sprout farm. Four hundred kilograms of the original seed shipment was sent to an English seed distributor, which then repacked seeds into 50-g packets passed on to French garden stores (7). The seeds from a packet were then germinated into sprouts at a children's community center, and the sprouts were served on June 8, 2011, leading to the French outbreak (2).

Epidemiological investigations of outbreaks aim to combine various approaches to reconstruct in detail the chain of events that led to the outbreak. In principle, genetic information, such as the patterns of genetic diversity among isolates, can aid in tracking the origins and transmission of the pathogens. Genetic diversity can indicate how long the pathogenic lineage has been diversifying and shed light on when, where, and how this E. coli originated and entered the human food chain. In practice, such inferences require extensive and highly accurate genetic information. Even small error rates, which matter little for comparing an outbreak strain to historical isolates, could obscure genuine phylogenetic signal in comparing extremely closely related genomes from within an outbreak.

Based on conventional molecular epidemiological characterization (including virulence gene content, serotyping, multilocus sequence typing, rep-PCR, pulsed-field gel electrophoresis, optical mapping, and antimicrobial susceptibility testing), the outbreak strains in Germany and France appear identical (2, 8) (see also SI Materials and Methods). However, these approaches do not assess the full diversity among strains. A comprehensive strategy requires whole-genome sequencing with accurate resolution on the single nucleotide level and can be augmented by analysis of gene and plasmid content.


We first performed whole-genome sequencing using the Illumina sequencing platform on four isolates from the outbreak centered in Germany (Table 1). Among these four isolates, we found only two SNPs relative to a published genome from the German outbreak, TY2482 (9): two of the isolates showed no differences relative to the reference, and two showed one SNP each (nucleotide positions 224851 and 1096014) (Table 2; see also SI Materials and Methods, Table S1, and Fig. S1). We independently confirmed the two SNPs by Sanger sequencing. As further validation of the sequence quality, we performed genome sequencing, assembly, and SNP calling of two of these isolates (C236-11 and C227-11), using an independent genome-sequencing technology (454 sequencing platform); this analysis found the same two SNPs and no additional ones (see SI Materials and Methods and Tables S2, S3, and S4). Our observation of limited diversity in the German outbreak isolates is consistent with a recent report that found no SNPs in two independent isolates from the German outbreak (10).

Table 1.
E. coli O104:H4 isolates sequenced and analyzed in this study
Table 2.
SNPs identified within E. coli O104:H4 outbreak isolates

We then analyzed strains from the smaller French outbreak. We performed whole-genome sequencing on 11 isolates from seven patients, including five isolated simultaneously from a single patient (Table 1). Surprisingly, the diversity of the isolates from the French outbreak was considerably greater than that from the German outbreak (Table 2). We found 19 SNPs, all of which were validated by Sanger sequencing.

The five isolates from the single host showed virtually no variation. Four isolates were identical, but the fifth lacked one SNP shared by the other four (Fig. 1A and Table 2). Technically, the low diversity within a single individual further confirms the sequencing quality. Scientifically, it suggests that infection may have involved a small inoculum [similar to the estimated low infectious dose of E. coli O157:H7 (11)], or that a small number of genotypes dominate within a host during an infection.

Fig. 1.
(A) Bootstrap consensus maximum-likelihood phylogeny using the 21 SNPs, based on 500 bootstraps and rooted on the 2004 and 2009 isolates. No branch lengths are provided for the 2004 and 2009 isolates because this phylogeny is generated only from the 21 ...

A maximum-likelihood phylogeny of the outbreak isolates (Fig. 1A), rooted on historical E. coli O104:H4 isolates from 2004 and 2009 that we had also sequenced, showed that the limited diversity seen in the samples from the large German outbreak was nested within the greater diversity of French isolates. One SNP, at location 1568661, distinguishes the historical 2004 and 2009 isolates and all but two of the French isolates from the outbreak isolates from Germany. The most parsimonious explanation is that the isolates from the outbreak in Germany represent a subset of diversity seen in the French outbreak. We additionally placed the outbreak isolates into broader phylogenetic context using C227-11 as representative of the outbreak: historical E. coli O104:H4 isolates 55989 [isolated from an HIV-positive adult from the Central African Republic in the 1990s that, like the other isolates, is enteroaggregative, but, in contrast, is not Shiga toxin-producing (12)], 01–09591 [isolated from an individual in Germany in 2001 (13)], and the 2004 and 2009 isolates from individuals in France and a commensal E. coli genome E1167 (Fig. 1B). Although the historical E. coli O104:H4 isolates from 2001, 2004, and 2009 are related to this outbreak, they do not appear to be ancestral.

To confirm that the diversity found in the French outbreak was absent in the German outbreak, we analyzed sequence data from eight additional German outbreak strains recently deposited in GenBank (GOS1, GOS2, H112180540, H112180541, H112180280, H112180282, H112180283, and LB226692). Although these genome sequences are not suitable for de novo SNP prediction using our approach (most lack quality scores), they can be evaluated for the presence of known SNPs. We found that none of these genomes contained any of the 19 SNPs seen in the French outbreak or the two identified in the German outbreak (see SI Materials and Methods for details), indicating that they share the same sequence as TY2482 at these sites.

The identity of the SNPs suggests that they reflect recent diversification without evidence for either purifying or positive selection (14). Specifically, the SNPs are not biased toward protein-altering substitutions. Of the 21 SNPs, 3 (14.3%) are intergenic (in keeping with the range of 12.3–13.8% of the genome predicted to be intergenic) (Table S5). Of the 14 SNPs within coding regions, 4 (28.6%) are synonymous.

We found that all German and French outbreak isolates contained the three plasmids, including pAA, the ESBL plasmid, and a much smaller third plasmid, all of which have been identified in other descriptions of the O104:H4 outbreak isolates (9, 10, 13, 15).

Through synteny and ortholog analysis, we computationally predicted only one region of gene difference, a deletion in Ec11-5538, one of the French outbreak isolates (see SI Materials and Methods for details). We confirmed the absence of an 836-bp region in this genome by PCR analysis and note that it is adjacent to an insertion sequence. This deleted region includes three predicted genes and the 5′ end of a fourth predicted gene (SI Materials and Methods and Figs. S2–S4). We found no other evidence of gene gain or loss.


In this study, we perform whole-genome sequencing of multiple isolates from the 2011 outbreaks of E. coli O104:H4 in France and Germany to identify differences among isolates that are indistinguishable by standard molecular epidemiological tools. We find that the isolates are all closely related, and that the German outbreak isolates have extremely limited diversity, whereas there is greater diversity among the isolates from the French outbreak.

Several lines of evidence support our finding of extremely limited diversity among at least a majority of the German outbreak isolates. First, there is minimal diversity among the four independent isolates reported here (see Table 1 and Materials and Methods for description of the background of the isolates). Second, a previous analysis of two other isolates identified no SNPs between them (10). The chance of detecting a subpopulation that comprises 40% of the overall population using six randomly selected isolates is 95% [1 − (1 − 0.4)^6 = 0.95]. Even in the absence of the two isolates from the independent analysis, the likelihood of detecting a subpopulation of 40% of the total population with four isolates is 87% [1 − (1 − 0.4)^4 = 0.87]. Thus, our sample size is sufficient to detect, with high probability, variants present as a majority or large minority of all isolates. Third, eight isolates from the German outbreak with sequence in GenBank (GOS1, GOS2, H112180540, H112180541, H112180280, H112180282, H112180283, and LB226692) share identical sequence to TY2482 at the sites of each SNP position described in this study. Although it is impossible to exclude the possibility of unsampled diversity in the German outbreak, our findings argue that a majority of the population is extremely closely related.
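The detection probabilities quoted above follow from treating each isolate as an independent random draw from the outbreak population: the chance that at least one of n isolates belongs to a subpopulation of frequency p is 1 − (1 − p)^n. The short Python sketch below reproduces the two figures; the function name is illustrative only.

```python
def detection_probability(fraction, n_isolates):
    """Probability that at least one of n independently sampled isolates
    comes from a subpopulation making up `fraction` of the population."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if n_isolates < 0:
        raise ValueError("n_isolates must be non-negative")
    # Complement of "every sampled isolate misses the subpopulation".
    return 1.0 - (1.0 - fraction) ** n_isolates


for n in (4, 6):
    print(n, round(detection_probability(0.4, n), 2))
# prints:
# 4 0.87
# 6 0.95
```

The same function shows how quickly power falls for rarer variants: under the same independence assumption, a subpopulation at 10% frequency would be detected by six isolates less than half the time.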

Using the framework of the trace-back epidemiology that links the two outbreaks to the 2009 shipment of fenugreek seeds, several hypotheses can explain the surprising findings that there is greater diversity of E. coli O104:H4 in the much smaller French outbreak than the German outbreak, and that the outbreak isolates from Germany appear to be nested within the diversity of the French outbreak (Fig. 2).

Fig. 2.
Schematic of hypotheses to explain differences in E. coli SNP diversity seen in the French and German outbreaks. (A) At minimum, the contaminating population that gave rise to both the French and German outbreaks was polymorphic at location 1568661, and ...

One hypothesis is that the limited diversity reflects a stochastic bottleneck in at least the sampled part of the E. coli pathogen population in Germany compared with France. As we found no evidence for positive or purifying selection in the SNPs, the bottleneck we propose represents a random process that purged most of the diversity. The limited diversity observed within an individual suggests the hypothesis that the bottleneck in the German outbreak could represent contamination from a single infected human at the sprout farm in Germany. Consistent with this hypothesis, three employees were confirmed as early cases of E. coli O104:H4 infection, including two asymptomatic shedders, dating to around the time of the reported start of the outbreak in early May 2011 (16). In principle, the limited diversity in Germany could also result from partially successful measures to disinfect seeds or sprouts at the German sprout farm; however, it appears that no specific disinfection procedures were applied, apart from routine hygiene and cleaning of the sprout preparation area (16). Analysis of any isolates available from the earliest stages of the outbreak, including those from infected employees or sprouts, would allow for direct testing of these hypotheses. Broader sampling from the outbreak in Germany may help determine the extent to which the outbreak in Germany reflects contamination from a single individual, and whether there is evidence for subpopulations with additional diversity.

A second hypothesis is that although substantial diversity was present in the original bacterial source population, it was unevenly distributed, with a more diverse population, perhaps reflecting heavier contamination, affecting seeds sent to France more than those sent to Germany. As a far greater amount of seeds (10,500 kg) went to the German distributor that supplied the establishment identified as the source of the German outbreak and only 400 kg went to the English distributor that supplied the 50-g seed packets believed to be the source of the French outbreak (7), this hypothesis requires the low probability event that seeds with the higher diversity E. coli population happened to be in the smaller-sized shipments. Characterization of E. coli O104:H4 populations found on other seeds from this shipment may help to assess this hypothesis. To our knowledge, no such populations have yet been described.

Finally, a third hypothesis is that the difference in diversity reflects unknown environmental or other constraints that influenced rates of accumulation of diversity once the bacteria arrived in each country. For example, it is possible that differences in sprouting conditions between the German sprout farm and the French community center could have led to differences in diversity. These differences in conditions include use of well-water at a temperature of 20 °C in the sprout farm in Germany (16), compared with tap water at ambient temperature (between 12 and 28 °C) in the French outbreak (2). Seeds in France were also germinated for about 1.5 d longer. Testing rates of accumulation of SNPs under various conditions may help to assess this possibility.

Using next-generation sequencing methods, we have been able to reveal variation at a single nucleotide level within genome sequences from a point-source outbreak, all within a set of isolates that are identical by classic typing techniques. Highly accurate sequencing and SNP identification can overcome the noise from sequencing error and discern phylogenetic signal, which may, as in this case, depend on a small number of nucleotides. As demonstrated by the multiple independent sequencing efforts related to this E. coli O104:H4 outbreak (9, 10, 13, 15), and also epidemiological investigations of other infectious diseases (17–19), genomic epidemiology is likely to become the standard strategy in molecular epidemiology as the cost of sequencing continues to decline and technology becomes more widely accessible.

The determination of genome sequence is already recognized as a vital part of investigating any new outbreak, to place the pathogen in context and gain insight into its origins and the basis of its pathogenicity. Together with other recent work (17–19), this study argues strongly for multiple genome sequences to understand patterns of transmission within an outbreak. Such analyses can already be conducted in a matter of days, and technological advances will only improve our ability to perform them in real time. The advantages of whole genome data include greater resolution than classic techniques for outbreak investigation, such as pulsed-field gel electrophoresis, and a body of data amenable to analysis with well-developed and understood phylogenetic methods. As this example demonstrates, the results of such analysis, combined with traditional epidemiology, can raise novel epidemiologic hypotheses and questions that are available only through sequencing of multiple isolates.

Materials and Methods

Strains Sequenced in This Study.

Isolates include 4 linked to the outbreak centered in Germany, 11 from the outbreak in the Bordeaux area of France (of which 5 are from a single individual), and 2 historical Shiga toxin-producing O104:H4 isolates from France, obtained in 2004 and 2009. The German outbreak isolates were linked by travel to Germany and timing of the cases. C227-11 derives from a 68-y-old woman originally from Hamburg, Germany, who was in Denmark when she fell ill; the isolate was obtained on May 18. Note that a genome sequence for this isolate was previously reported (15). To ensure consistency in our analyses, we independently sequenced this isolate and use the genome sequence we generated for the studies reported here. C236-11 was isolated from a 23-y-old man from Southern Denmark, which borders Germany, without confirmed travel to Germany; the isolate was obtained on May 21. Ec11-3677 derives from a 31-y-old German woman who had spent 2 wk in Northern Germany (May 5–21, 2011) and who was traveling in France at the time of illness on May 21. Ec11-3798 was isolated from a 55-y-old French man who traveled in Northern Germany between May 8 and 12, 2011, and had returned to France when he became ill on May 21. The French outbreak isolates (Ec11-4404, Ec11-4522, Ec11-4623, Ec11-4632_C1-C5, Ec11-5536, Ec11-5537, Ec11-5538) were collected from individuals in the same community near Bordeaux, all of whom were known to have eaten sprouts at a single event on June 8, 2011 (2). Ec04-8351 and Ec09-7901 were isolated from the stool of infected individuals in France in 2004 and 2009 and represent historical O104:H4 isolates (20) (Table 1).

Genome Sequencing.

We used a multiplatform strategy, generating an average of 146-fold sequence coverage on the Illumina platform, supplemented with data from 454 and Pacific Bioscience platforms for specific analyses. For details of the sequencing methods and genome assembly, see SI Materials and Methods.

SNP Prediction and Validation.

SNP calling was performed using our analysis pipeline [GATK v1.0.6011 (21)] based on alignments of paired-end read data (101-bp sequences from both ends of 180-bp insert fragments on the Illumina platform) to the TY2482 strain. Potential SNPs from the Illumina sequences were called by GATK Unified Genotyper (22), filtering the data according to the following parameters: >90% agreement among reads; at least five unambiguously mapped reads; no greater than 50% mapping ambiguity; insertions and deletions were ignored. Over 97% of the bases in the genome of each outbreak isolate fulfilled these criteria. Bases were identified that have the highest computational likelihood for calling a base as either agreement to the reference or a SNP. Only SNPs at locations where equally high-confidence calls could be made in all outbreak isolates were included in the analysis. At 54 sites, all outbreak and historical isolates showed the same sequence as each other but disagreed with the TY2482 reference genome; we did not identify these sites as SNPs or use them as discriminatory markers because they may represent errors in the reference sequence as opposed to true SNPs (Fig. S1 and Table S2). See SI Materials and Methods for details of 454-based genome sequencing and SNP validation and PCR-based validation.
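As an illustration of the per-site criteria listed above, the following Python sketch applies the three numeric thresholds to a summary of the reads covering one site. It is not the GATK interface and does not reproduce the pipeline used in the study; the `SiteSummary` structure, its field names, and the reading of "mapping ambiguity" as the fraction of covering reads that map ambiguously are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteSummary:
    """Hypothetical per-site read summary (not a GATK data structure)."""
    position: int
    base_counts: Dict[str, int]   # counts from unambiguously mapped reads
    ambiguous_fraction: float     # assumed: share of covering reads mapping ambiguously


def passes_filters(site, min_agreement=0.90, min_reads=5, max_ambiguity=0.50):
    """Apply the three thresholds stated in the text to one site."""
    depth = sum(site.base_counts.values())
    if depth < min_reads:                        # at least five unambiguous reads
        return False
    if site.ambiguous_fraction > max_ambiguity:  # no greater than 50% ambiguity
        return False
    top_count = max(site.base_counts.values())
    return top_count / depth > min_agreement     # >90% agreement among reads


# Invented read counts at a position named in the text.
site = SiteSummary(position=224851, base_counts={"A": 40, "G": 2}, ambiguous_fraction=0.1)
print(passes_filters(site))
```

In practice a site would be retained only if every outbreak isolate passed such a check at that position, matching the requirement that equally high-confidence calls be available in all isolates.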

Phylogenetic Analysis.

To study the phylogenetic relationship among the outbreak isolates, we created a single sequence for each isolate consisting of the genotype at the 21 SNP sites and used these data as input sequence to Mega (23). A maximum-likelihood tree was generated using the Kimura two-parameter model with 500 bootstraps and rooted on the branch leading to the 2004 and 2009 isolates. To study the relationship between the outbreak and historical isolates, we first aligned whole-genome assemblies of C227-11, 55989, 01–09591, 04–8351, 09–7901, and the commensal E. coli E1167 using progressiveMauve (24). We selected SNPs from this alignment that contain unambiguous bases for all isolates, are in regions that align, and have at least 90% agreement in a sliding 100-bp window around each SNP. These SNPs were used to generate a maximum-likelihood tree using the Kimura two-parameter model with 500 bootstraps and rooted on E1167.
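The first step described above, building one short sequence per isolate from its genotypes at the SNP sites, can be sketched in Python as follows. The three positions are ones named in the text, but the isolate labels and the bases assigned to them are invented for illustration, and the FASTA output is only one plausible input format for a tree-building program.

```python
def snp_pseudosequences(genotypes, positions):
    """Concatenate each isolate's bases at the SNP sites, in genome order.

    genotypes: {isolate_name: {position: base}}
    positions: iterable of SNP positions to include
    """
    ordered = sorted(positions)
    sequences = {}
    for isolate, calls in genotypes.items():
        missing = [p for p in ordered if p not in calls]
        if missing:
            raise ValueError(f"{isolate} lacks calls at {missing}")
        sequences[isolate] = "".join(calls[p] for p in ordered)
    return sequences


def to_fasta(sequences):
    """Render the pseudo-sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Positions appear in the text; isolate names and bases are hypothetical.
positions = [224851, 1096014, 1568661]
genotypes = {
    "isolate_A": {224851: "C", 1096014: "G", 1568661: "T"},
    "isolate_B": {224851: "T", 1096014: "G", 1568661: "T"},
    "isolate_C": {224851: "C", 1096014: "G", 1568661: "C"},
}
print(to_fasta(snp_pseudosequences(genotypes, positions)), end="")
```

Because only variable sites are retained, branch lengths from such an alignment reflect differences per SNP site rather than per genome position.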

Supplementary Material

Supporting Information:


We thank Flemming Scheutz from the World Health Organization Collaborating Centre for Reference and Research on Escherichia coli and Klebsiella, Department of Microbiological Surveillance and Research, Statens Serum Institut, Copenhagen, Denmark for the valuable contribution of strains and discussion; R. T. Chandler for his insights; all members of the Broad Biological Samples Platform and the Genome Sequencing Platform; Lynne Aftuck, Scott Anderson, Patrick Cahill, Marc Chevrette, Julio Diaz, Danielle Dionne, Sheli Dookran, Rachel Erlich, Stephanie Grandbois, Lisa Green, Andrew Hollinger, Laurie Holmes, Andrew Hoss, Edward Kelliher, Sharon Kim, Purnima Kompella, Tony LaCasse, Matthew Lee, Niall J. Lennon, Dana Robbins, Alyssa Rosenthal, Elizabeth Ryan, Brian Sogoloff, Todd Sparrow, Alvin Tam, Austin Tzou, Cole Walsh, and Emily Wheeler for sequencing and technical support; and Lucia Alvarado-Balderrama, Aaron Berlin, Gary Gearin, Sante Gnerre, Giles Hall, Alma Imavovic, David B. Jaffe, Annie Lui, Iain MacCallum, J. Pendexter Macdonald, Matthew Pearson, Margaret Priest, Dariusz Przbylski, Andrew Roberts, Filipe J. Ribeiro, Sakina Saif, Ted Sharpe, Terry Shea, Narmada Shenoy, and Shuangye Yin for computational and analysis support. This project has been funded in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN272200900018C (to B.W.B.), and the Infectious Disease Program of the Broad Institute; National Institute of General Medical Sciences Award U54GM088558 (to M.L. and W.P.H.); National Institutes of Allergy and Infectious Disease T32 Grant AI007061 (to Y.H.G.); Danish Council for Strategic Research Grant 09-063070 (to K.A.K.); and the Institut de Veille Sanitaire (F.-X.W. and E.B.).


The authors declare no conflict of interest.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. E. coli C227-11 AFRH01000000; E. coli C236-11 AFRI01000000; E. coli Ec04-8351 AFRL01000000; E. coli Ec11-3677 AFRM01000000; E. coli Ec09-7901 AFRK01000000; E. coli Ec11-4632-C1 AFVA01000000; E. coli Ec11-4632-C2 AFVB01000000; E. coli Ec11-4632-C3 AFVC01000000; E. coli Ec11-4632-C4 AFVD01000000; E. coli Ec11-4632-C5 AFVE01000000; E. coli Ec11-4404 AFUX01000000; E. coli Ec11-4522 AFUY01000000; E. coli Ec11-4623 AFUZ01000000; E. coli Ec11-5538 AFVF01000000; E. coli Ec11-5537 AFVG01000000; and E. coli Ec11-5536 AFVH01000000).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1121491109/-/DCSupplemental.


1. Frank C, et al. HUS Investigation Team Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med. 2011;365:1771–1780. [PubMed]
2. Gault G, et al. Outbreak of haemolytic uraemic syndrome and bloody diarrhoea due to Escherichia coli O104:H4, south-west France, June 2011. Euro Surveill. 2011;16 pii: 19905. [PubMed]
3. Bielaszewska M, et al. Characterisation of the Escherichia coli strain associated with an outbreak of haemolytic uraemic syndrome in Germany, 2011: A microbiological study. Lancet Infect Dis. 2011;11:671–676. [PubMed]
4. Frank C, et al. HUS investigation team Large and ongoing outbreak of haemolytic uraemic syndrome, Germany, May 2011. Euro Surveill. 2011;16 pii: 19878. [PubMed]
5. Scheutz F, et al. Characteristics of the enteroaggregative Shiga toxin/verotoxin-producing Escherichia coli O104:H4 strain causing the outbreak of haemolytic uraemic syndrome in Germany, May to June 2011. Euro Surveill. 2011;16 pii: 19889. [PubMed]
6. Jansen A, Kielstein JT. The new face of enterohaemorrhagic Escherichia coli infections. Euro Surveill. 2011;16 pii: 19898. [PubMed]
7. European Food Safety Authority 2011 Tracing seeds, in particular fenugreek (Trigonella foenum-graecum) seeds, in relation to the Shiga toxin-producing E. coli (STEC) O104:H4 2011 Outbreaks in Germany and France. Available at http://www.efsa.europa.eu/en/supporting/doc/176e.pdf. Accessed August 4, 2011.
8. Mariani-Kurkdjian P, Bingen E, Gault G, Jourdan-Da Silva N, Weill FX. Escherichia coli O104:H4 south-west France, June 2011. Lancet Infect Dis. 2011;11:732–733. [PubMed]
9. Rohde H, et al. E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med. 2011;365:718–724. [PubMed]
10. Brzuszkiewicz E, et al. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC) Arch Microbiol. 2011;193:883–891. [PMC free article] [PubMed]
11. Armstrong GL, Hollingsworth J, Morris JG., Jr Emerging foodborne pathogens: Escherichia coli O157:H7 as a model of entry of a new pathogen into the food supply of the developed world. Epidemiol Rev. 1996;18:29–51. [PubMed]
12. Touchon M, et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009;5:e1000344. [PMC free article] [PubMed]
13. Mellmann A, et al. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE. 2011;6:e22751. [PMC free article] [PubMed]
14. Rocha EP, et al. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006;239:226–235. [PubMed]
15. Rasko DA, et al. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N Engl J Med. 2011;365:709–717. [PMC free article] [PubMed]
16. Bundesinstitut für Risikobewertung Relevance of sprouts and germ buds as well as seeds for sprout production in the current EHEC O104:H4 outbreak event in May and June 2011. 2011. Updated Opinion No. 23/2011 of BfR, 5 July 2011. Available at http://www.bfr.bund.de/cm/349/relevance_of_sprouts_and_germ_buds_as_well_as_seeds_for_sprouts_production_in_the_current_ehec_o104_h4_outbreak_event_in_may_and_june_2011.pdf. Accessed December 28, 2011.
17. Gardy JL, et al. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. N Engl J Med. 2011;364:730–739. [PubMed]
18. Harris SR, et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science. 2010;327:469–474. [PMC free article] [PubMed]
19. Rasko DA, et al. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proc Natl Acad Sci USA. 2011;108:5027–5032. [PMC free article] [PubMed]
20. Monecke S, et al. Presence of Enterohemorrhagic Escherichia coli ST678/O104:H4 in France prior to 2011. Appl Environ Microbiol. 2011;77:8784–8786. [PMC free article] [PubMed]
21. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. [PMC free article] [PubMed]
22. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–498. [PMC free article] [PubMed]
23. Tamura K, et al. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28:2731–2739. [PMC free article] [PubMed]
24. Darling AE, Mau B, Perna NT. progressiveMauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE. 2010;5:e11147. [PMC free article] [PubMed]
