Science. Author manuscript; available in PMC 2013 May 9.
Published in final edited form as:
PMCID: PMC3648787

Rapid pneumococcal evolution in response to clinical interventions


Epidemiological studies of the naturally transformable bacterial pathogen Streptococcus pneumoniae have previously been confounded by high rates of recombination. Sequencing 240 isolates of the PMEN1 (Spain23F-1) multidrug-resistant lineage enabled base substitutions to be distinguished from polymorphisms arising through horizontal sequence transfer. Over 700 recombinations were detected, with genes encoding major antigens frequently affected. Among these were ten capsule switching events, one of which accompanied a population shift as vaccine-escape serotype 19A isolates emerged in the USA following the introduction of the conjugate polysaccharide vaccine. The evolution of resistance to fluoroquinolones, rifampicin and macrolides was observed to occur on multiple occasions. This study details how genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short timescales.

Streptococcus pneumoniae is a highly recombinogenic human nasopharyngeal commensal and respiratory pathogen estimated to be responsible for a global burden of almost 15 million cases of invasive disease in 2000 (1). Since the 1970s, the susceptibility of the pneumococcal population to antibiotics has decreased, largely as a consequence of the emergence and spread of a few multidrug resistant clones (2). The first recognized example was Pneumococcal Molecular Epidemiology Network clone 1 (PMEN1), an S. pneumoniae lineage typically identified as being resistant to multiple antibiotics (including penicillin), sequence type (ST) 81 and serotype 23F. The genome sequence of the first identified member of the clone, isolated in a hospital in Barcelona in 1984, revealed it had acquired a Tn5252-type integrative and conjugative element (ICE) that carries a linearised chloramphenicol resistance plasmid and a Tn916-type element with a tetM tetracycline resistance gene (3).

This lineage was subsequently found to be present in Africa, Asia and America (4-8), and by the late 1990s was estimated to be causing almost 40% of penicillin-resistant pneumococcal disease in the USA (9). Following the introduction in many countries since 2000 of a heptavalent conjugate polysaccharide vaccine (PCV7), which includes capsule type 23F as one of its seven antigens, a decrease in the frequency of serotype 23F invasive disease and carriage has been observed (10). However, this has been accompanied by a rise in disease caused by non-vaccine serotype pneumococci, such as the multidrug resistant serotype 19A strains now common in the USA (11). Based on evidence from multilocus sequence typing (MLST) from America (12) and Europe (13), some of these are thought to include capsule switch variants of the PMEN1 lineage.

To study how this lineage has evolved as it has spread, we used Illumina sequencing of multiplexed genomic DNA libraries to characterize a global collection of 240 PMEN1 strains isolated between 1984 and 2008. Strains were identified either using MLST or on the basis of serotype, drug resistance profile and targeted PCR (14). Selected isolates were distributed between Europe (7 countries, 81 strains), South Africa (37 strains), America (6 countries, 54 strains) and Asia (8 countries, 68 strains) (Table S1) and included a variety of drug-resistance profiles, as well as five serotypes distinct from the ancestral 23F: 19F (also included in PCV7), 19A, 6A, 15B and 3.

Construction of the phylogeny

Sequence reads were mapped against the complete reference chromosome of S. pneumoniae ATCC 700669 (3) and, using the criteria described in Harris et al. (15), 39,107 polymorphic sites were identified within the PMEN1 lineage. Maximum likelihood analysis produced a phylogeny with a high proportion of homoplasic sites (23%) and a weak correlation between the date of a strain’s isolation and its distance from the root of the tree (Pearson correlation, N = 222, R² = 0.05, p = 0.001; Fig. S1), suggesting variation was primarily arising through incorporation of imported DNA and not through steady accumulation of base substitutions. As these strains are closely related, sequences acquired by recombination could be identified as loci with a high density of polymorphisms. These events were reconstructed onto the phylogeny and, using an iterative algorithm (14), an alignment and tree based on vertically inherited base substitutions alone was generated.
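The idea that recombined sequence shows up as a locus with an unusually high density of polymorphisms can be illustrated with a simple sliding-window scan. This is only a minimal sketch and not the iterative algorithm described in the Materials and Methods (14): the function name, window size, step and threshold are all hypothetical choices, and the SNP positions are invented toy data.

```python
def dense_windows(snp_positions, genome_length, window=1000, step=500, min_snps=5):
    """Return merged (start, end) intervals in which a window holds >= min_snps SNPs.

    All parameters are illustrative assumptions, not values from the study.
    """
    positions = sorted(snp_positions)
    flagged = []
    start = 0
    while start < genome_length:
        end = min(start + window, genome_length)
        # Count SNPs falling inside the half-open window [start, end)
        count = sum(1 for p in positions if start <= p < end)
        if count >= min_snps:
            if flagged and start <= flagged[-1][1]:
                # Overlaps or touches the previous flagged interval: extend it
                flagged[-1] = (flagged[-1][0], end)
            else:
                flagged.append((start, end))
        start += step
    return flagged


# Toy data: a cluster of SNPs around 4.0-4.7 kb on a 10 kb sequence
toy_snps = [120, 4010, 4050, 4120, 4300, 4490, 4700, 9000]
print(dense_windows(toy_snps, genome_length=10_000))  # -> [(3500, 5000)]
```

Isolated SNPs (positions 120 and 9000 in the toy data) are left out of the flagged interval, which mirrors the distinction drawn above between sparse base substitutions and dense imported sequence.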

From this analysis (Fig. 1), a total of 57,736 single nucleotide polymorphisms (SNPs) were identified, with 50,720 (88%) being introduced by 702 recombination events. This gives a per site r/m ratio (the relative likelihood that a polymorphism was introduced through recombination rather than point mutation) of 7.2, less than the previously calculated value of ~66 from MLST data (16). By removing recombination events from the phylogeny, the number of homoplasic sites is reduced by 97% and the tree has significantly shortened branches such that root-to-tip distance more strongly correlates with date of isolation (R² = 0.46, p < 2.2 × 10⁻¹⁶; Figs. S1, S2). The rate at which base substitutions occur outside of recombinations suggests a mutation rate of 1.57 × 10⁻⁶ substitutions per site per year (95% confidence interval 1.34-1.79 × 10⁻⁶), close to the estimate of 3.3 × 10⁻⁶ substitutions per site per year from Staphylococcus aureus ST239 (15) and much higher than that found between more distantly related isolates (17). Furthermore, by excluding SNPs introduced through recombinations, the date of origin of the lineage implied by the tree moved from about 1930, which pre-dates the introduction of penicillin, chloramphenicol and tetracycline, to about 1970 (Fig. S1).
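The reported percentage and r/m ratio can be reproduced from the three counts given above. The short calculation below assumes that r/m is the number of SNPs introduced by recombination divided by the number attributed to point mutation, which is the reading consistent with the reported value; the variable names are illustrative only.

```python
total_snps = 57_736     # all SNPs reconstructed onto the phylogeny
recomb_snps = 50_720    # SNPs introduced by recombination
recomb_events = 702     # recombination events (not needed for r/m itself)

# SNPs left over are attributed to point mutation
mutation_snps = total_snps - recomb_snps

fraction_recomb = recomb_snps / total_snps
r_over_m = recomb_snps / mutation_snps

print(mutation_snps)                  # -> 7016
print(round(fraction_recomb * 100))   # -> 88
print(round(r_over_m, 1))             # -> 7.2
```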

Fig. 1
Phylogeography and sequence variation of PMEN1. (A) Global phylogeny of PMEN1. The maximum likelihood tree, constructed using substitutions outside of recombination events, is coloured according to location, as reconstructed through the phylogeny using ...

Widespread recombination and antigenic variation

Even in this sample of a single lineage, 74% of the reference genome length has undergone recombination in at least one isolate, with a mean of 74,097 bp of sequence affected by recombination in each strain. This encompasses both site-specific integrations of prophage and conjugative elements, and homologous recombinations mediated by the competence system. The 615 recombinations outside of the prophage and ICE vary in size from 3 bp to 72,038 bp, with a mean of 6.3 kb (Fig. S3). Within these homologous recombinations, there is a distinct heterogeneity in the density of polymorphisms, although it is unclear whether this represents a consequence of the mechanism by which horizontally acquired DNA is incorporated or a property of the donor sequence.

Recombination ‘hotspots’ are evident in the genome where horizontal sequence transfers are detected abnormally frequently (Fig. 1). One of the most noticeable is within Tn916, concentrated around the tetM gene. Excepting the prophage, the other loci – pspA, pspC, psrP and the capsule biosynthesis (cps) locus – are all major surface structures. PspA and PspC are potential protein vaccine targets implicated in pneumococcal pathogenesis that are targeted by antibodies produced during experimental human carriage studies (18). PsrP is a large 4,433 amino acid serine-rich protein, present in a subset of pneumococci, likely to be modified by a number of glycosyltransferases that are encoded on the same genomic island. In a mouse model of infection, PsrP-targeting antibodies can block pneumococcal infection (19). Hence it seems likely that these loci are under diversifying selection driven by the human immune system, and consequently the apparent increase in the frequency of recombination in these regions is due to the selective advantage that is offered by the divergent sequence introduced by such recombination events.

In addition to base substitutions, 1,032 small (<6 bp) insertion and deletion events can be reconstructed onto the phylogeny, of which 61% are concentrated in the 13% of the genome that does not encode protein coding sequences (CDSs), probably due to selection against the introduction of frameshift mutations. Throughout the phylogeny, 331 CDSs are predicted to be affected by either frameshift or premature stop codon mutations. Modeling these disruptive events as a Poisson distributed process occurring at a rate proportional to the length of the CDS, 12 CDSs were significantly enriched for disruptive mutations after correction for multiple testing (Table S5). These included pspA and a glycosyltransferase posited to act on psrP (SPN23F17730). This again suggests there may be a selective pressure acting to either remove (pspA) or alter (psrP) two major surface antigens. Furthermore, the longest recombination in the dataset spans, and deletes, the psrP-encoding island, showing that such non-essential antigens can be quickly removed from the chromosome. These data imply that the pneumococcal population is likely to be able to respond very rapidly to the introduction of some of the protein antigen based pneumococcal vaccines currently under development.
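A length-proportional Poisson test of this kind can be sketched as follows. The text does not state which multiple-testing correction was applied, so Bonferroni is used here purely as an assumption; the CDS names, lengths and event counts are invented toy values, and the function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the complementary CDF."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def enriched_cds(cds_table, alpha=0.05):
    """cds_table maps a CDS name to (length_bp, n_disruptive_events)."""
    total_length = sum(length for length, _ in cds_table.values())
    total_events = sum(n for _, n in cds_table.values())
    rate_per_bp = total_events / total_length  # shared background rate
    n_tests = len(cds_table)
    hits = []
    for name, (length, n) in cds_table.items():
        expected = rate_per_bp * length        # expectation scales with CDS length
        p_adj = min(1.0, poisson_sf(n, expected) * n_tests)  # Bonferroni (assumed)
        if p_adj < alpha:
            hits.append((name, p_adj))
    return hits


# Toy table: only cdsC carries far more disruptive events than its length predicts
toy_cds = {
    "cdsA": (1000, 1),
    "cdsB": (1200, 0),
    "cdsC": (900, 12),
    "cdsD": (3000, 2),
    "cdsE": (1500, 1),
}
print([name for name, _ in enriched_cds(toy_cds)])  # -> ['cdsC']
```

Note that a long CDS with a couple of events (cdsD) is not flagged, because its expected count under the background rate is already high.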

Population and serotype dynamics

The spread of PMEN1 can be tracked using the phylogeography indicated by the tree (Fig. 1). There are several European clades with their base near the root of the tree, and a parsimony-based reconstruction of location supports a European origin for the lineage. Interspersed amongst the European isolates are samples from Central and South America, which may represent an early transmission from Spain, where the clone was first isolated, to Latin America, a route previously suggested to occur by data from S. aureus (15). One clade (labeled ‘A’ in Fig. 1), containing South African isolates from 1989 to 2006, appears to have originated from a single highly successful intercontinental transmission event. There is also a cluster of isolates from Ho Chi Minh City (labeled ‘V’) representing a transmission to South East (SE) Asia. However, the predominant clade found outside of Europe (labeled ‘I’) appears to have spread quite freely throughout North America, SE Asia and Eastern Europe, implying there are few barriers to intercontinental transmission of S. pneumoniae between these regions.
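The text says only that the reconstruction of location was parsimony-based. As an illustration of how such a reconstruction can point to a single ancestral region, the sketch below applies the standard Fitch bottom-up pass to a small invented tree. The tree shape, tip names and locations are hypothetical and do not correspond to Fig. 1, and Fitch parsimony is an assumed stand-in for whatever method was actually used.

```python
def fitch(tree, tip_states):
    """Fitch bottom-up pass on a binary tree given as nested 2-tuples of tip names.

    Returns (candidate state set at this node, minimum number of state changes below it).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], tip_states)
    right_set, right_cost = fitch(tree[1], tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here
        return shared, left_cost + right_cost
    # Children disagree: keep all candidates and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy tree with European tips near the root and exports nested within
toy_tree = (("ES1", ("ES2", "MX1")), (("ES3", "ZA1"), ("ES4", ("US1", "VN1"))))
toy_locations = {
    "ES1": "Europe", "ES2": "Europe", "ES3": "Europe", "ES4": "Europe",
    "MX1": "Latin America", "ZA1": "South Africa",
    "US1": "North America", "VN1": "SE Asia",
}

root_states, changes = fitch(toy_tree, toy_locations)
print(root_states, changes)  # -> {'Europe'} 4
```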

The final non-European group consists of serotype 19A US isolates (labeled ‘U’). These all date from between 2005 and 2007, and are distinct from all other US PMEN1 isolates, which have capsular types included in PCV7 (Fig. S4). This is evidence of a shift in the PMEN1 population in the USA: rather than the resident population changing capsule type, it seems that it has been eliminated by the vaccine and replaced by a different sub-population within the lineage that has expanded to fill the vacated niche. Similarly, a pair of Spanish isolates from 2001 (labeled ‘S’), the year in which PCV7 was introduced in Spain, that have independently acquired a 19A capsule are not closely associated with any other European isolates. The estimated times of origin for clades ‘U’ (1996; 95% credible interval 1992-1999) and ‘S’ (1998; 95% credible interval 1996-1999) both pre-date the introduction of PCV7, and accordingly a third 19A switch, from Canada, was isolated in 1994. Hence it appears that these changes in serotype following vaccine introduction result from an expansion of pre-existing capsular variants, which were relatively uncommon and not part of the predominant population, and would have therefore been difficult to detect prior to the existence of the selection pressure exerted by the vaccine.

Seven further serotype switching events can be detected in the data (Fig. 2), including three switches to serotype 19F. The polyphyletic nature of these 19F isolates is supported by the variation observed between the acquired cps loci, as is also the case for the 19A isolates (Fig. S5). The previously known switches to serotypes 3, 6A and 15B are only found to occur once each in the phylogeny, and in addition, a single untypeable Korean sample was identified as being a serotype 14 variant by mapping reads to known cps loci (20). The recombination events leading to these switches ranged from 21,780 bp to 39,182 bp in size, with a mean of 28.2 kb. Only 35 homologous recombinations of an equivalent size or larger occur elsewhere in the genome, with most such events being much smaller (Fig. S3), making it surprising that serotype switching occurs with such frequency and indicating a role for balancing selection at this locus. Additionally, the span of these events appears to be limited by the flanking penicillin-binding protein genes, the sequences of which are crucial in determining beta lactam resistance in pneumococci (21). Only the recombination causing the switch to serotype 3 affects one of these, and it introduces just a single SNP into the pbpX CDS, which does not appear to compromise the strain’s penicillin resistance (Table S1). Hence the positioning of these two genes may hinder the transfer of capsule biosynthesis operons from penicillin-sensitive to penicillin-resistant pneumococci via larger recombinations, although size constraints alone could also cause such a distribution.

Fig. 2
Recombinations causing serotype switching events. (A) The annotated cps locus of the reference strain. CDSs involved in capsule biosynthesis are coloured according to their role. Genes in red are regulatory; those in blue synthesise and modify the oligosaccharide ...

Resistance to non-beta lactam antibiotics

The strong selection pressures exerted by antibiotics on the PMEN1 lineage are manifest as multiple examples of geographically disparate isolates converging on common resistance mechanisms. Single base substitutions causing reduced susceptibility to some classes of antibiotics have occurred multiple times throughout the phylogeny, as observed in S. aureus (15) and Salmonella Typhi (22) populations, including mutations in parC, parE and gyrA causing increased resistance to fluoroquinolone antibiotics (23) and changes in rpoB causing resistance to rifampicin (24). The S79F, S79Y and D83N mutations in parC are estimated to occur nine, three and five times respectively in PMEN1, while D435N in the adjacent parE gene is found to happen three times. The S81F and S81Y substitutions, in the same position of gyrA, are found four and two times respectively. None of these mutations are predicted to have been introduced by recombination, whereas changes at position H499 of rpoB causing rifampicin resistance are introduced twice by horizontal transfer and three times via base substitution.

Resistance to macrolide antibiotics tends not to derive from SNPs, but from acquisition of CDSs facilitating one of the two common resistance mechanisms: methylation of the target ribosomal RNA by erm genes, and removal of the drug from the cell by mef-type efflux pumps. Both can be found in the PMEN1 population and in all cases, the genes appear to be integrated into the Tn916 transposon (Fig. 3). They are carried by three different elements. Tn917, consisting of an ermB gene with an associated transposon and resolvase, inserts into orf9 of Tn916 (25). A second has been characterized as the Mega element (26), which carries a mef/mel efflux pump system and in PMEN1 inserts upstream of orf9. A third element (henceforth referred to as an Omega element, for Omega and Multidrug-resistance Encoding Genetic Assembly) carries both an ermB gene and an aminoglycoside phosphotransferase, with the latter flanked by direct repeats of omega transcriptional repressor genes, and is found just downstream of orf20.

Fig. 3
Acquisition of macrolide resistance cassettes. The three full-length resistance cassettes are shown in (A): the Omega element, which carries an aph3′ aminoglycoside resistance gene and an ermB macrolide resistance gene; Tn917, which just carries ...

Rather than a single acquisition of these elements occurring, and the resulting clones spreading and replacing macrolide-sensitive isolates, all three elements appear to have been acquired multiple times across the phylogeny (Fig. S10). The Mega element is predominantly shared by isolates in clade ‘I’, although the ermB-encoding Omega element appears to have been subsequently acquired on two occasions, and Tn917 has entirely superseded the Mega element in one isolate. This is congruent with the known advantages of target methylation over drug efflux as a broader-spectrum resistance mechanism (27). In most instances of the Omega element, only the ermB-encoding part remains: the aminoglycoside phosphotransferase appears to have been deleted through a recombination between the Omega-encoding genes, leaving only an Omega domain-encoding open reading frame fused to orf20 as a scar. This implies that the benefit of the aminoglycoside resistance element may not have been sufficient to maintain it on the ICE.

Components of the accessory genome

Other than the insertion of these cassettes, the ICE itself is otherwise relatively unchanged throughout the population. In two cases, the 5′ region of the element up to, and including, the lantibiotic synthesis machinery is deleted, while the self-immunity genes are retained (Fig. S6). This deletion, which also removes the integrated chloramphenicol resistance plasmid, is analogous to that observed in the Pneumococcal Pathogenicity Island-1 of the PMEN1 lineage, in which all that remains is the immunity genes from a once intact lantibiotic synthesis machinery (3). In two other cases, the ICE has been supplanted by alternative transposons, both of which are similar composites of Tn5252 and Tn916-type elements: in S. pneumoniae 11876, a wholesale replacement at the same locus entails the gain of an Omega element at the expense of losing resistance to chloramphenicol (Fig. S7), while in isolate 11930 the new ICE inserts elsewhere in the chromosome and carries two ermB genes as well as a chloramphenicol acetyltransferase (Fig. S8). The only other identified conjugative element was an ICESt1-type transposon shared by isolates 8140 and 8143 (Fig. S9), and the only extrachromosomal element present in the dataset was the plasmid pSpn1 (28), found in isolate SA8.

The accessory genome is primarily composed of prophage sequence (Fig. S11), with little evidence of much variation in the complement of metabolic genes. Viral sequences appear to be a transient feature of the pneumococcal chromosome (Fig. S12), with few persisting long enough to be detected in related isolates. Four of the new prophage that could be assembled were found to insert into the competence pilus structural gene comYC, which lies within an operon shown to be essential for competence in S. pneumoniae (29). In two cases where such phage appear to be shared through common descent by pairs of isolates, no recombination events can be detected that are unique to either member of the pair, consistent with the competence system being ablated in these isolates. Furthermore, assaying the competence of available lysogenic strains in vitro also suggested these phage insertions abrogate the ability of their host to take up exogenous DNA (Fig. S13).


The ability to distinguish vertically acquired substitutions from horizontally acquired sequences is crucial to successfully reconstructing phylogenies for recombinogenic organisms such as S. pneumoniae. Phylogenies are in turn essential for detailed studies of events such as intercontinental transmission, capsule type switching and antibiotic resistance acquisition. While current epidemiological typing methods have indicated that recombination is frequent among the pneumococcal population, they cannot sufficiently account for its impact on relationships between strains at such high resolution. Only the availability of such a sample of whole genome sequences makes it possible to adequately reconstruct the natural history of a lineage. The base substitutions used to construct the phylogeny have accumulated over about 40 years and occur, on average, once every 15 weeks. Recombinations happen at a rate approximately 10-fold slower, but introduce a mean of 72 SNPs each. The responses to the different anthropogenic selection pressures acting on this variation are quite distinct. The apparently weak selection by aminoglycosides and chloramphenicol has led to the occasional deletion of loci encoding resistance to these antibiotics. By contrast, resistance to macrolide antibiotics has been acquired frequently throughout the phylogeny, with selection strong enough to drive supplementation or replacement of the resistance afforded by the mef efflux pump with the broader-range resistance provided by ermB-mediated target modification. The response to vaccine selection is different, involving the depletion of the resident population before it can respond to the selection pressure and thereby opening the niche to isolates that already expressed non-vaccine serotypes. This is likely to reflect the high host population coverage of PCV7 in the USA, as opposed to macrolides or other antibiotics, and the relative likelihood of the recombination events that underlie these responses.
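The summary rates quoted in this paragraph follow from figures given earlier in the text, as the short calculation below shows. One input is not stated in this text: the length of the reference chromosome, taken here as roughly 2.22 Mb, which should be treated as an assumption. The variable names are illustrative.

```python
rate_per_site_per_year = 1.57e-6   # substitution rate reported in the text
genome_length_bp = 2.22e6          # ASSUMPTION: approximate reference chromosome length
weeks_per_year = 365.25 / 7

# Expected substitutions per genome per year, and the interval between them
subs_per_genome_per_year = rate_per_site_per_year * genome_length_bp
weeks_between_subs = weeks_per_year / subs_per_genome_per_year

# Counts reported in the text
recomb_snps = 50_720
recomb_events = 702
mutation_snps = 57_736 - recomb_snps

subs_per_recombination = mutation_snps / recomb_events   # how much rarer recombinations are
snps_per_recombination = recomb_snps / recomb_events     # mean SNPs introduced per event

print(round(weeks_between_subs))       # -> 15
print(round(subs_per_recombination))   # -> 10
print(round(snps_per_recombination))   # -> 72
```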

Over a few decades this single pneumococcal lineage has acquired drug resistance and the ability to evade vaccine pressure multiple times, demonstrating the remarkable adaptability of recombinogenic bacteria such as the pneumococcus. PMEN1 is, nevertheless, only one lineage of this pathogen. Our relative ignorance of the forces impacting bacterial evolution over the long term is illustrated by BM4200 (30), a multi-drug resistant serotype 23F isolate of ST1010 sequenced as the outgroup for this analysis (Fig. 1). This isolate dates to 1978, but despite its apparent similarity to PMEN1, related strains have been found very rarely since then. Hence this phenotype is not sufficient to guarantee success, an observation supported by the continued presence of successful but susceptible pneumococci in the population (31, 32). Improved understanding of the interplay between ecology and adaptation in other lineages through further focused sequencing programs may prove crucial to the future control of this, and other, diverse bacterial pathogens.




We thank the participating surveillance networks, listed in Table S1, and the core informatics, library making and sequencing teams at the Wellcome Trust Sanger Institute. Attending authors were grateful for the opportunity to discuss this project at the Permafrost conference. Sequence accession codes are given in Tables S1 and S2. This work was funded by the Wellcome Trust.


Supporting Online Material www.sciencemag.org Materials and Methods Figs. S1-13 Tables S1-S5

References and notes

1. O’Brien KL, et al. Lancet. 2009;374:893.
2. McGee L, et al. J. Clin. Microbiol. 2001;39:2565.
3. Croucher NJ, et al. J. Bacteriol. 2009;191:1480.
4. Munoz R, et al. J. Infect. Dis. 1991;164:302.
5. Parry CM, et al. Antimicrob. Agents Chemother. 2002;46:3512.
6. Klugman KP, et al. Eur. J. Clin. Microbiol. Infect. Dis. 1994;13:171.
7. McGee L, Klugman KP, Friedland D, Lee HJ. Microb. Drug Resist. 1997;3:253.
8. Tarasi A, Chong Y, Lee K, Tomasz A. Microb. Drug Resist. 1997;3:105.
9. Corso A, Severina EP, Petruk VF, Mauriz YR, Tomasz A. Microb. Drug Resist. 1998;4:325.
10. Dagan R, Klugman KP. Lancet Infect. Dis. 2008;8:785.
11. Moore MR, et al. J. Infect. Dis. 2008;197:1016.
12. Munoz-Almagro C, et al. Clin. Infect. Dis. 2008;46:174.
13. Ardanuy C, et al. J. Antimicrob. Chemother. 2009;64:507.
14. Materials and methods are available as supporting material on Science Online.
15. Harris SR, et al. Science. 2010;327:469.
16. Feil EJ, Smith JM, Enright MC, Spratt BG. Genetics. 2000;154:1439.
17. Ochman H, Elwyn S, Moran NA. Proc. Natl. Acad. Sci. U S A. 1999;96:12638.
18. McCool TL, et al. Infect. Immun. 2003;71:5724.
19. Rose L, et al. J. Infect. Dis. 2008;198:375.
20. Bentley SD, et al. PLoS Genet. 2006;2:e31.
21. Trzcinski K, Thompson CM, Lipsitch M. J. Bacteriol. 2004;186:3447.
22. Holt KE, et al. Nat. Genet. 2008;40:987.
23. Pletz MW, et al. Emerg. Infect. Dis. 2006;12:1462.
24. Ferrandiz MJ, et al. Antimicrob. Agents Chemother. 2005;49:2237.
25. Shaw JH, Clewell DB. J. Bacteriol. 1985;164:782.
26. Del Grosso M, Camilli R, Iannelli F, Pozzi G, Pantosti A. Antimicrob. Agents Chemother. 2006;50:3361.
27. Del Grosso M, Northwood JG, Farrell DJ, Pantosti A. Antimicrob. Agents Chemother. 2007;51:4184.
28. Romero P, et al. Plasmid. 2007;58:51.
29. Pestova EV, Morrison DA. J. Bacteriol. 1998;180:2701.
30. Buu-Hoi A, Horodniceanu T. J. Bacteriol. 1980;143:313.
31. Hanage WP, Fraser C, Tang J, Connor TR, Corander J. Science. 2009;324:1454.
32. Colijn C, et al. J. R. Soc. Interface. 2009;7:905.