J Clin Microbiol. 2003 Feb; 41(2): 675–679.
PMCID: PMC149678

Multilocus Sequence Typing Reveals a Lack of Diversity among Escherichia coli O157:H7 Isolates That Are Distinct by Pulsed-Field Gel Electrophoresis


Escherichia coli O157:H7 is a major cause of food-borne illness in the United States. Pulsed-field gel electrophoresis (PFGE) is the molecular epidemiologic method most commonly used to identify food-borne outbreaks. Although PFGE is a powerful epidemiologic tool, it has disadvantages that make a DNA sequence-based approach potentially attractive. Multilocus sequence typing (MLST) analyzes the internal fragments of housekeeping genes to establish genetic relatedness between isolates. We sequenced selected portions of seven housekeeping genes and two membrane protein genes (ompA and espA) of 77 isolates that were diverse by PFGE to determine whether there was sufficient sequence variation to be useful as an epidemiologic tool. There was no DNA sequence diversity in the sequenced portions of the seven housekeeping genes and espA. For ompA, all but five isolates had sequences identical to that of the reference strains. E. coli O157:H7 has a striking lack of genetic diversity in the genes we explored, even among isolates that are clearly distinct by PFGE. Other approaches to identify improved molecular subtyping methods for E. coli O157:H7 are needed.

Escherichia coli O157:H7 was first recognized in 1982 in association with a food-borne outbreak and is now recognized as an important cause of food-borne illness in the United States, causing an estimated 74,000 infections a year (20, 26). This pathogen causes both bloody diarrhea and hemolytic uremic syndrome (HUS), a severe illness characterized by hemolytic anemia and acute renal failure (2). In recent years, there have been numerous large outbreaks of E. coli O157:H7-related bloody diarrhea and HUS (3, 4, 11). Most E. coli O157:H7 infections are caused by exposure to food or water that has bovine fecal contamination.

Pulsed-field gel electrophoresis (PFGE) is currently the most widely utilized molecular subtyping method for detecting outbreaks of E. coli O157:H7. In fact, PFGE has been found to identify outbreaks of E. coli O157:H7 that were not detected by traditional epidemiologic methods (1). However, DNA sequence-based methods offer several important potential advantages over PFGE, including shorter assay times and fully comparable and transferable data between laboratories (5, 16, 19, 21). Additionally, while PFGE is relatively simple and inexpensive, it is labor intensive, the interpretation of banding patterns can be subjective, and it does not easily handle large sample sets (13).

Multilocus sequence typing (MLST) is a DNA sequence-based molecular subtyping method that has been used successfully for other bacteria, such as Neisseria meningitidis, Streptococcus pneumoniae, and Salmonella, for both evolutionary and epidemiologic studies (8, 16, 22, 29). Briefly, MLST examines the alleles of selected housekeeping genes by sequencing a 500- to 600-bp segment of each gene. The sequence data are then analyzed to determine the genetic relatedness of the bacterial isolates. In this study, we performed MLST on a set of E. coli O157:H7 isolates that had been characterized epidemiologically and by PFGE to determine the potential utility of MLST for the molecular subtyping of this organism.
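As a rough illustration of the MLST bookkeeping described above, the following Python sketch assigns an allele number to each distinct sequence at a locus and groups isolates by their resulting allelic profile. The function names, locus labels, isolate labels, and toy sequences are hypothetical and are not taken from this study. Numbering alleles in order of first appearance is an assumption made for the sketch; curated MLST schemes fix allele numbers in a shared database.

```python
from collections import defaultdict


def allelic_profiles(isolates, loci):
    """Return each isolate's allelic profile as a tuple of allele numbers.

    isolates: dict mapping isolate name -> dict mapping locus -> sequence.
    loci: ordered list of locus names.
    Allele numbers are assigned per locus in order of first appearance
    (an assumption of this sketch, not a real MLST database scheme).
    """
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            seq = seqs[locus].upper()
            ids = allele_ids[locus]
            if seq not in ids:
                ids[seq] = len(ids) + 1  # next unused allele number
            profile.append(ids[seq])
        profiles[name] = tuple(profile)
    return profiles


def group_by_profile(profiles):
    """Group isolate names that share an identical allelic profile."""
    groups = defaultdict(list)
    for name, profile in profiles.items():
        groups[profile].append(name)
    return dict(groups)


# Hypothetical toy data: three isolates, two loci.
toy_loci = ["locusA", "locusB"]
toy_isolates = {
    "iso1": {"locusA": "ATGCCGTA", "locusB": "GGCATTAC"},
    "iso2": {"locusA": "ATGCCGTA", "locusB": "GGCATTAC"},
    "iso3": {"locusA": "ATGCCGTA", "locusB": "GGCGTTAC"},
}
toy_profiles = allelic_profiles(toy_isolates, toy_loci)
toy_groups = group_by_profile(toy_profiles)
```

In this toy set, iso1 and iso2 share a profile and iso3 differs at one locus. If every isolate carried the same sequence at every locus, all of them would collapse into a single group and the scheme would have no discriminatory power.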


E. coli O157:H7 isolates.

E. coli O157:H7 isolates were obtained from several sources for this study. Isolates were selected to include groups of strains from known outbreaks that were indistinguishable by PFGE, groups that were indistinguishable by PFGE but not associated with a known outbreak, and strains with a unique PFGE pattern that were not known to be associated with an outbreak.

The Public Health Infectious Disease Laboratory (PHIDL) obtained all E. coli O157:H7 strains isolated by the Allegheny County Health Department (ACHD) from 1999 to 2001 (n = 59). These strains were not associated with known outbreaks, with the exception of seven isolates from a single restaurant-associated outbreak in August and September 2001. A sample of isolates from the Minnesota Department of Health (MDH) was also included; these were outbreak-associated isolates (n = 14) and sporadic isolates (n = 4) from 1996 and 1997. ATCC strain EDL933 and the Sakai, Japan, strain RIMD 0509952 were used as reference strains (10, 23).


PFGE analysis was performed according to the Centers for Disease Control and Prevention PulseNet protocol with minor variations (25, 28). Briefly, pure isolates were grown overnight on blood agar. Equal amounts of bacterial suspension, represented by an optical density at 610 nm of 1.3 in 1× TE buffer (10 mM Tris, 1 mM EDTA [pH 8.0]; Sigma, St. Louis, Mo.), 1% SeaKem Gold agarose (BioWhittaker, Rockland, Maine), and 1% sodium dodecyl sulfate (Sigma) were added to 0.5 mg of proteinase K per ml (Sigma) and mixed to form plugs. The bacteria were lysed within the plugs with a cell lysis buffer (50 mM Tris, 50 mM EDTA [pH 8], 1% Sarcosine, 0.1 mg of proteinase K per ml [Sigma]) and incubated overnight at 37°C. The plugs were then washed four times with 1× TE buffer. Two-millimeter slices of plugs were incubated overnight with either XbaI or SpeI (New England Biolabs, Beverly, Mass.) at 37°C. The plugs were then loaded onto a 1% SeaKem Gold agarose gel. PFGE was performed with the CHEF III system (Bio-Rad, Hercules, Calif.) with the following run parameters: XbaI with a switch time of 3 to 40 s and a run time of 21 h and SpeI with switch time of 3 to 20 s and a run time of 21 h. All gels were run with the Centers for Disease Control and Prevention reference strain, G5244, of E. coli O157:H7. After the gel had been stained with ethidium bromide, the gel was captured with the Gel Doc 2000 and Multi-Analyst program (Bio-Rad). Dendrograms were created with Molecular Analyst (Bio-Rad) by using the Dice coefficient, unweighted pair group method with arithmetic means (UPGMA), and a position tolerance of 1.3%. Isolates were considered highly related with 0 or 1 band difference with both XbaI and SpeI.
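The Dice coefficient used for the dendrograms can be sketched as follows. This is a minimal illustration, not the Molecular Analyst implementation: it assumes band positions are expressed as a percentage of lane length and that two bands match when they differ by no more than the position tolerance. The function name and the toy lanes are hypothetical.

```python
def dice_similarity(bands_a, bands_b, tolerance=1.3):
    """Dice coefficient between two band patterns.

    bands_a, bands_b: band positions as a percentage of lane length (0-100).
    Two bands are counted as shared if their positions differ by at most
    `tolerance` percentage points (assumed meaning of "position tolerance").
    Each band is matched at most once.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("at least one pattern must contain a band")
    i = j = matches = 0
    # Walk both sorted lists, pairing bands that fall within tolerance.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 2.0 * matches / total


# Hypothetical lanes: four of five bands fall within tolerance of each other.
lane_1 = [10.0, 25.0, 40.0, 62.5, 80.0]
lane_2 = [10.5, 25.0, 47.0, 62.0, 80.0]
similarity = dice_similarity(lane_1, lane_2)
```

A matrix of such pairwise similarities is what UPGMA, which is average-linkage hierarchical clustering, would take as input to build a dendrogram.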


Genomic DNA was isolated with Prepman Ultra according to the manufacturer's instructions (Applied Biosystems, Foster City, Calif.). Seven housekeeping genes were amplified from the genomic DNA by using recombinant Taq DNA polymerase (Gibco-Invitrogen, Gaithersburg, Md.); reaction parameters varied depending on the primer set (Table 1). The following genes (coding for the proteins in parentheses) were included: arcA (aerobic respiratory control protein) and aroE (shikimate dehydrogenase) with primers as described by Reid et al. (24), dnaE (DNA polymerase III, α subunit), mdh (malate dehydrogenase), gnd (6-phosphogluconate dehydrogenase), gapA (glyceraldehyde 3-phosphate dehydrogenase), and pgm (phosphoglucomutase). Also sequenced were the membrane protein coding genes espA (E. coli secreted protein A) and ompA (outer membrane protein A).

TABLE 1.
Forward and reverse sequences of primers, based on sequences found in GenBank or those of published or unpublished primers

The oligonucleotide primers were designed based on the published sequences of the genes found in GenBank (Table 1). PCR products were purified with Multiscreen PCR plates (Millipore, Bedford, Mass.). PCR products were sequenced with the Big Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems). Initial denaturation was for 4 min at 94°C, followed by 25 cycles of denaturation at 96°C (30 s), annealing at 50°C (5 s), and extension at 60°C (4 min), with a final extension at 72°C (1 min). The sequencing products were run on an Applied Biosystems 3700 DNA sequencer. Both the forward and reverse strands were sequenced with the PCR primer set (Table 1). Raw sequences were interpreted with Phred (a base-caller program) and Phrap (an assembly program) and verified with Consed (a Unix-based graphical editor) (6, 7, 9). All sequences were aligned and compared by using ClustalX, a graphical multiple alignment program (12). Sequence results were compared to the reference strains from the National Center for Biotechnology Information (NCBI) EDL933 (AE005174) and Sakai RIMD 0509952 (BA000007) by using ClustalX.
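Once an isolate sequence has been aligned to a reference, the comparison step amounts to scanning the alignment for mismatched columns. The sketch below is illustrative only and is not part of the ClustalX workflow described above: it assumes two pre-aligned sequences of equal length with `-` marking gaps, and the function name and toy sequences are hypothetical.

```python
def find_snps(reference, query):
    """Return (position, ref_base, query_base) for each substitution.

    Both inputs must be pre-aligned sequences of equal length; '-' is a gap.
    Positions are 1-based and count reference bases only, so gap columns in
    the reference do not advance the coordinate. Columns with a gap in
    either sequence are skipped rather than reported as SNPs.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, q in zip(reference.upper(), query.upper()):
        if r != "-":
            ref_pos += 1
        if r == "-" or q == "-":
            continue
        if r != q:
            snps.append((ref_pos, r, q))
    return snps


# Hypothetical toy alignment: one substitution and one insertion in the query.
toy_reference = "ATGAAACCGT-TAG"
toy_query = "ATGAAATCGTATAG"
toy_snps = find_snps(toy_reference, toy_query)
```

Here the only substitution reported is a C to T change at reference position 7; the inserted base in the query is ignored because it is not a single-nucleotide substitution.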


A total of 77 E. coli O157:H7 isolates were studied: 59 from Pennsylvania and 18 from Minnesota. The PFGE patterns of a selected group of these isolates, chosen to demonstrate the range of diversity of isolates that were sequenced, are shown in Fig. 1. The genetic relatedness of these isolates ranged from around 75 to 100% with XbaI and around 80 to 100% with SpeI (data not shown).

FIG. 1.
PFGE analysis of selected strains from the ACHD and MDH restricted with XbaI. The reference strains include the following: G5244 (Centers for Disease Control and Prevention [CDC]), Sakai RIMD 0509952 (Japan), and EDL 933 (American Type Culture Collection). ...

Initially, housekeeping genes were targeted because they have successfully been used for other organisms (8, 22, 29). The seven selected housekeeping genes were chosen for their potential sequence diversity. Three of the genes, aroE, arcA, and mdh, have been used to determine the evolution of pathogenic E. coli (24). Two genes, dnaE and pgm, were chosen because they were found to be informative for Salmonella and Vibrio cholerae (16; Pallavi Garg, personal communication). The final two housekeeping genes, gapA and gnd, were chosen because they were transferred into the O157 genome at different evolutionary times. We hoped to find diversity as these genes reached G-C equilibrium with the new host (18). Finally, the two membrane proteins were chosen as being potential targets of the immune system and under possible pressure to mutate.

MLST analysis of the seven housekeeping genes demonstrated that the PHIDL and Minnesota strains had identical sequences at all seven loci. Similar to the housekeeping genes, there were no nucleotide differences in espA. All of the housekeeping genes and espA loci had identical sequences compared to the reference NCBI sequences EDL933 and Sakai RIMD 0509952. For ompA, all of the isolates had the same allele as the reference sequences in NCBI, except for five isolates that demonstrated two minor alleles. There was a single nucleotide polymorphism (SNP) (cytosine to thymine) that was present at base 301 in four PHIDL isolates (PHIDL no. 19, 26, 34, and 61). These isolates were clustered together by PFGE, but were not indistinguishable (Fig. 1). Additionally, PHIDL 61 did not remain in the cluster when digested with SpeI (data not shown). PHIDL 62 had an SNP further downstream in ompA at nucleotide 560 (guanine to adenine), but the other isolates in its XbaI PFGE cluster, PHIDL 12 and 35, did not have this nucleotide polymorphism and instead had the most common allele.


In this study, we found a striking lack of DNA sequence diversity for all seven housekeeping genes, with not a single difference in the approximately 311,000 nucleotides (over 4,000 nucleotides per isolate) that were sequenced. Additionally, two other genes which might have been expected to have a higher degree of diversity, ompA and espA, exhibited either minimal or no diversity, respectively. We had included these genes because we hypothesized that, with products exposed on the surface of the cell, they could be under immune pressure and therefore might exhibit a higher degree of genetic diversity than the housekeeping genes, as was seen in Neisseria meningitidis subgroup III (30).
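The per-isolate figure follows from the totals quoted above. The short calculation below only restates that arithmetic, using the rounded total given in the text and the 77 isolates studied; the variable names are illustrative.

```python
# Rounded total from the text and the number of isolates studied.
total_nucleotides = 311_000
isolate_count = 77

# Average housekeeping-gene sequence length obtained per isolate.
per_isolate = total_nucleotides / isolate_count
print(round(per_isolate))
```

The result is a little over 4,000 nucleotides per isolate, consistent with the figure given in parentheses.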

In studies conducted with other bacterial species, strain-to-strain variations in nucleotide sequence are commonly seen, even among strains within a single serotype (8, 19, 27, 29) and PFGE type (16) and/or associated with a common source. The observed sequence conservation could be interpreted as indicative of strong selection, as has been suggested for conserved genotypes of strains of N. meningitidis (19). An alternative interpretation, that the strains are clonal because the organism emerged only recently as a human pathogen, is consistent with our sequence data. The contrast between the sequence conservation and the diversity measured by PFGE that we observed could be explained if the PFGE pattern changes resulted from insertions and deletions of DNA that included a restriction site.

Three lines of evidence suggest that an important source of genetic diversity of E. coli O157:H7 is based on insertions and deletions of DNA sequences. First, octamer-based genome scanning has revealed distinct lineages of E. coli O157 strains. The polymorphic markers that distinguish the lines of descent have been shown to be the result of insertion and deletion of phages and prophages (14, 15). Second, the different banding patterns by PFGE have been shown to result from insertions and deletions containing the XbaI restriction sites, not SNPs (17). These deletion/insertion sites all were localized within O157-specific regions (O-islands) of the genome compared to the restriction sites E. coli O157:H7 has in common with E. coli K-12. Third, an analysis of the differences between the two published E. coli O157:H7 genomes indicated substantial differences attributable to insertions and deletions, because the total number of potential protein-encoding genes differs between the genomes by several dozen (10, 23). Additionally, Sakai RIMD 0509952 has 1,632 O-island genes that are not found in E. coli K-12, while EDL933 has only 1,387 of these genes.
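The argument above can be made concrete with a toy in silico digest. The sketch below is a simplification under stated assumptions: it treats the sequence as linear (the E. coli chromosome is circular), uses short made-up sequences, and models only XbaI, whose recognition sequence is TCTAGA with the cut after the first T. The function and variable names are hypothetical. It shows that a point substitution outside a restriction site leaves the fragment sizes unchanged, whereas an insertion carrying a site changes them.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence, cut as T^CTAGA
CUT_OFFSET = 1        # cut falls after the first base of the site


def fragment_lengths(sequence, site=XBAI_SITE, cut_offset=CUT_OFFSET):
    """Fragment lengths from digesting a linear sequence, largest first."""
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical 112-bp backbone with two XbaI sites.
backbone = "A" * 30 + XBAI_SITE + "C" * 50 + XBAI_SITE + "G" * 20

# Point substitution between the sites (C to T at index 50): no site affected.
snp_variant = backbone[:50] + "T" + backbone[51:]

# 26-bp insertion between the sites that carries its own XbaI site.
inserted_segment = "G" * 10 + XBAI_SITE + "G" * 10
insertion_variant = backbone[:60] + inserted_segment + backbone[60:]

backbone_fragments = fragment_lengths(backbone)
snp_fragments = fragment_lengths(snp_variant)
insertion_fragments = fragment_lengths(insertion_variant)
```

In this toy case the backbone and the substitution variant give the same three fragments, while the insertion variant gives four fragments of different sizes, which is the kind of change that would alter a PFGE banding pattern without any change in housekeeping gene sequence.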

Sequence analysis has multiple advantages over fingerprinting-based methods, including shorter assay time, less subjectivity in interpretation of results, fully transferable data that are comparable among laboratories, and greater ease of automated computer analysis. Our study indicates that the genes we selected for analysis did not have sufficient variation to be useful as an epidemiological tool in E. coli O157:H7. Clearly, other approaches to identify informative regions of the genome will be required to develop improved methods for molecular subtyping of this important pathogen.


We thank Yuansha Chen for technical guidance with sequencing; Pallavi Garg for development of the dnaE primers; and the Allegheny County Health Department for its invaluable support, with special thanks to Joan McMahon and Mary Blazina for assistance in obtaining the isolates and Sharon Silvestri for providing epidemiologic data.

This work was sponsored in part by the Centers for Disease Control and Prevention and the Public Health Service under Cooperative Agreement no. U90/CCU318753-01 and by a Research Career Award, National Institute of Allergy and Infectious Diseases (K24 AI52788 to L. H. Harrison).

Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors and do not reflect the views of the Public Health Service, Carnegie Mellon University, or the University of Pittsburgh.


1. Bender, J. B., C. W. Hedberg, J. M. Besser, D. J. Boxrud, K. L. MacDonald, and M. T. Osterholm. 1997. Surveillance for Escherichia coli O157:H7 infections in Minnesota by molecular subtyping. N. Engl. J. Med. 337:388-394.
2. Besser, R. E., P. M. Griffin, and L. Slutsker. 1999. Escherichia coli O157:H7 gastroenteritis and the hemolytic uremic syndrome: an emerging infectious disease. Annu. Rev. Med. 50:355-367.
3. Breuer, T., D. H. Benkel, R. L. Shapiro, W. N. Hall, M. M. Winnett, M. J. Linn, J. Neimann, T. J. Barrett, S. Dietrich, F. P. Downes, D. M. Toney, J. L. Pearson, H. Rolka, L. Slutsker, and P. M. Griffin. 2001. A multistate outbreak of Escherichia coli O157:H7 infections linked to alfalfa sprouts grown from contaminated seeds. Emerg. Infect. Dis. 7:977-982.
4. Centers for Disease Control and Prevention. 1993. Update: multistate outbreak of Escherichia coli O157:H7 infections from hamburgers—Western United States, 1992-1993. Morb. Mortal. Wkly. Rep. 42:258-263.
5. Enright, M. C., N. P. J. Day, C. E. Davies, S. J. Peacock, and B. G. Spratt. 2000. Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. J. Clin. Microbiol. 38:1008-1015.
6. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8:186-194.
7. Ewing, B., L. Hillier, M. C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8:175-185.
8. Feil, E. J., J. M. Smith, M. C. Enright, and B. G. Spratt. 2000. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 154:1439-1450.
9. Gordon, D., C. Abajian, and P. Green. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195-202.
10. Hayashi, T., K. Makino, M. Ohnishi, K. Kurokawa, K. Ishii, K. Yokoyama, C. G. Han, E. Ohtsubo, K. Nakeyama, T. Murata, M. Tanaka, T. Tobe, T. Iida, H. Takami, T. Honda, C. Sasakawa, N. Ogasawara, T. Yasunaga, S. Kuhara, T. Shiba, M. Hattori, and H. Shinagawa. 2001. Complete genome sequence of entero-hemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain, K-12. DNA Res. 8:11-22.
11. Izumiya, H., J. Terajima, A. Wada, Y. Inagaki, K.-I. Itoh, K. Tamura, and H. Watanabe. 1997. Molecular typing of enterohemorrhagic Escherichia coli O157:H7 isolates in Japan by using pulsed-field gel electrophoresis. J. Clin. Microbiol. 35:1675-1680.
12. Jeanmougin, F., J. D. Thompson, M. Gouy, D. G. Higgins, and T. J. Gibson. 1998. Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23:403-405.
13. Keim, P., L. B. Price, A. M. Klevytska, K. L. Smith, J. M. Schupp, R. Okinaka, P. J. Jackson, and M. E. Hugh-Jones. 2000. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol. 182:2928-2936.
14. Kim, J., J. Nietfeldt, and A. K. Benson. 1999. Octamer-based genome scanning distinguishes a unique subpopulation of Escherichia coli O157:H7 strains in cattle. Proc. Natl. Acad. Sci. USA 96:13288-13293.
15. Kim, J., J. Nietfeldt, J. Ju, J. Wise, N. Fegan, P. Desmarchelier, and A. K. Benson. 2001. Ancestral divergence, genome diversification, and phylogeographic variation in subpopulations of sorbitol-negative, β-glucuronidase-negative enterohemorrhagic Escherichia coli O157. J. Bacteriol. 183:6885-6897.
16. Kotetishvili, M., O. C. Stine, A. Kreger, J. G. Morris, Jr., and A. Sulakvelidze. 2002. Multilocus sequence typing for characterization of clinical and environmental Salmonella strains. J. Clin. Microbiol. 40:1626-1635.
17. Kudva, I. T., P. S. Evans, N. T. Perna, T. J. Barrett, F. M. Ausubel, F. R. Blattner, and S. B. Calderwood. 2002. Strains of Escherichia coli O157:H7 differ primarily by insertions or deletions, not single-nucleotide polymorphisms. J. Bacteriol. 184:1873-1879.
18. Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. USA 95:9413-9417.
19. Maiden, M. C., J. A. Bygraves, E. Feil, G. Morelli, J. E. Russell, R. Urwin, Q. Zhang, J. Zhou, K. Zurth, D. A. Caugant, I. M. Feavers, M. Achtman, and B. G. Spratt. 1998. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95:3140-3145.
20. Mead, P. S., L. Slutsker, V. Dietz, L. F. McCaig, J. S. Bresee, C. Shapiro, P. M. Griffin, and R. V. Tauxe. 1999. Food-related illness and death in the United States. Emerg. Infect. Dis. 5:607-625.
21. Nallapareddy, S. R., R. W. Duh, K. V. Singh, and B. E. Murray. 2002. Molecular typing of selected Enterococcus faecalis isolates: pilot study using multilocus sequence typing and pulsed-field gel electrophoresis. J. Clin. Microbiol. 40:868-876.
22. Nicolas, P., G. Raphenon, M. Guibourdenche, L. Decousset, R. Stor, and A. B. Gaye. 2000. The 1998 Senegal epidemic of meningitis was due to the clonal expansion of A:4:P1.9, clone III-1, sequence type 5 Neisseria meningitidis strains. J. Clin. Microbiol. 38:198-200.
23. Perna, N. T., G. Plunkett III, V. Burland, B. Mau, J. D. Glasner, D. J. Rose, G. F. Mayhew, P. S. Evans, J. Gregor, H. A. Kirkpatrick, G. Posfai, J. Hackett, S. Klink, A. Boutin, Y. Shao, L. Miller, E. J. Grotbeck, N. W. Davis, A. Lim, E. T. Dimalanta, K. D. Potamousis, J. Apodaca, T. S. Anantharaman, J. Lin, G. Yen, D. C. Schwartz, R. A. Welch, and F. R. Blattner. 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529-533.
24. Reid, S. D., C. J. Herbelin, A. C. Bumbaugh, R. K. Selander, and T. S. Whittam. 2000. Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406:64-67.
25. Ribot, E. M., C. Fitzgerald, K. Kubota, B. Swaminathan, and T. J. Barrett. 2001. Rapid pulsed-field gel electrophoresis protocol for subtyping of Campylobacter jejuni. J. Clin. Microbiol. 39:1889-1894.
26. Riley, L. W., R. S. Remis, S. D. Helgerson, H. B. McGee, J. G. Wells, B. R. Davis, R. J. Hebert, E. S. Olcott, L. M. Johnson, N. T. Hargrett, P. A. Blake, and M. L. Cohen. 1983. Hemorrhagic colitis associated with a rare Escherichia coli serotype. N. Engl. J. Med. 308:681-685.
27. Stine, O. C., S. Sozhamannan, Q. Gou, S. Zheng, J. G. Morris, and J. A. Johnson. 2000. Phylogeny of Vibrio cholerae based on recA sequence. Infect. Immun. 69:7180-7185.
28. Swaminathan, B., T. J. Barrett, S. B. Hunter, and R. V. Tauxe. 2001. PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg. Infect. Dis. 7:382-389.
29. Zhou, J., M. C. Enright, and B. G. Spratt. 2000. Identification of the major Spanish clones of penicillin-resistant pneumococci via the internet using multilocus sequence typing. J. Clin. Microbiol. 38:977-986.
30. Zhu, P., A. van der Ende, D. Falush, N. Brieske, G. Morelli, B. Linz, T. Popovic, I. G. Schuurman, R. A. Adegbola, K. Zurth, S. Gagneux, A. E. Platonov, J. Y. Riou, D. A. Caugant, P. Nicholas, and M. Achtman. 2001. Fit genotypes and escape variant of subgroup III Neisseria meningitidis during three pandemics of epidemic meningitis. Proc. Natl. Acad. Sci. USA 98:5234-5239.

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)