Infect Genet Evol. Author manuscript; available in PMC Sep 19, 2007.
Published in final edited form as:
PMCID: PMC1983445

Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data


The inference of population recombination (ρ), population mutation (Θ), and adaptive selection is of great interest in microbial population genetics. These parameters can be efficiently estimated using explicit statistical frameworks (evolutionary models) that describe their effect on gene sequences. Within this framework, we estimated ρ and Θ using a coalescent approach, and adaptive (or destabilizing) selection under heterogeneous codon-based and amino acid property models in microbial sequences from MLST databases. We analyzed a total of 91 different housekeeping gene regions (loci) corresponding to one fungal and sixteen bacterial pathogens. Our results show that these three population parameters vary extensively across species and loci, but they do not seem to be correlated. For the most part, estimated recombination rates among species agree well with previous studies. Over all taxa, the ρ/Θ ratio suggests that each factor contributes similarly to the emergence of variant alleles. Comparisons of Θ estimated under finite- and infinite-site models indicate that recurrent mutation (i.e., multiple mutations at some sites) can increase Θ by up to 39%. Significant evidence of molecular adaptation was detected in 28 loci from 13 pathogens. Three of these loci showed concordant patterns of adaptive selection in two to four different species.

Keywords: Coalescent, Evolutionary models, Genetic diversity, Population structure, Recombination, Selection

1. Introduction

Maynard-Smith (1995) pointed out the need for population genetic insights when contemplating the evolutionary fate of infectious diseases. Population genetics is important in understanding the evolutionary history, epidemiology, and population dynamics of pathogens, the potential for and mode of the evolution of antibiotic resistance, and ultimately for public health control strategies. The key factors in the evolutionary response of pathogens to their environments can be measured by assessing the genetic diversity (and partitioning of that diversity within versus between populations), the impact of natural selection in shaping that diversity, and the impact of recombination in redistributing that diversity, sometimes into novel combinations. Population studies of pathogens using multilocus sequence typing (MLST) methods are generally aimed at inferring genetic diversity (usually estimated as the relative contribution of recombination and mutation per allele or per site), selection pressure, and population structure (Spratt and Maiden, 1999; Maynard-Smith et al., 2000; Dingle et al., 2001; Feil et al., 2003; Meats et al., 2003; Viscidi and Demma, 2003) to study the relative impact of genetic drift and natural selection on the evolutionary history of these pathogens.

Population parameters can be efficiently estimated using explicit statistical models of evolution, such as the coalescent approach, that describe their effect on gene sequences (Hudson, 1990; Nordborg, 2001; Felsenstein, 2004). Consider, for example, recombination and mutation rates. They can be estimated separately using a standard coalescent approach that assumes large Fisher–Wright populations, nonoverlapping generations, constant population size, and no selection or migration (or recombination when estimating mutation rates). A model-based method such as this is almost certainly a simplification of reality, but the benefits gained are significant, namely the ease of comparison between genes or species, the ability to make predictions about the question of interest, and the potential to test whether the model of evolution is an adequate characterization of the underlying process (McVean et al., 2002).

In addition, in the case of recombination, the coalescent model can be used to test the presence of the parameter by comparing the likelihood of the data with and without recombination (Brown et al., 2001). Under “model-free” methods such as the index of association (Maynard-Smith et al., 1993) and the homoplasy test (Maynard-Smith and Smith, 1998), gene or species comparisons of recombination rates are problematic and there is little or no way of statistically testing whether data sets have different levels of recombination (Maynard-Smith et al., 2000; McVean et al., 2002).

When dealing with MLST sequence data it is important to have evolutionary models that accurately describe the process of DNA substitution (e.g., Yang et al., 1994; Yang, 1997; Kelsey et al., 1999; Posada and Crandall, 2001a). Accurate models can help clarify some of the most important processes of evolution (e.g., selection pressure) by the biological interpretation of their parameters, and provide more reliable estimates of other model-based statistics (e.g., coalescent estimates of recombination and mutation) (Goldman and Yang, 1994). The effect of natural selection on molecular sequence evolution is almost always calculated as an average over all codon (amino acid) sites in the gene and over the entire evolutionary time that separates the sequences (Yang et al., 2000a). But this criterion is a very stringent one for detecting positive selection, especially in conservative proteins such as those encoded by the housekeeping genes (Crandall et al., 1999). Conservative proteins present a high proportion of invariable amino acids and appear to be under purifying selection all the time (Li, 1997). Hence adaptive evolution, if present, is most likely punctual, that is, it will affect a few amino acid residue sites (e.g., Endo et al., 1996; Li, 1997; Yang et al., 2000a). Consequently, evolutionary models that do not allow for selection heterogeneity among sites, such as the one implemented by Nei and Gojobori (1986), will certainly not detect those few sites under positive selection. Several evolutionary models exist that account for site-specific differences in adaptive selection at the protein level (Nielsen and Yang, 1998; Yang et al., 2000a; McClellan and McCracken, 2001; Yang and Swanson, 2002), and their utility has already been demonstrated (e.g., Yang et al., 2000a,b; Haydon et al., 2001; Yang and Nielsen, 2002; McClellan et al., 2005); however, MLST data are not usually examined using these approaches.

MLST was proposed in 1998 (Maiden et al., 1998) as a general approach to provide accurate, portable data that were appropriate for bacterial epidemiological investigation and which also reflected their evolutionary and population biology (Urwin and Maiden, 2003). Since then, sequence data from 17 different prokaryotic and eukaryotic microbial pathogens and almost 100 housekeeping genes have been published and are currently available via the Internet. Now, several key questions concerning microbial population genetics can be addressed using these MLST databases: how do recombination, mutation, and selection pressure vary across species and loci? Are they correlated? Which is the major force generating genetic diversity? Are MLST housekeeping genes under adaptive selection? Our goal here is to answer these questions within an evolutionary-model framework using the approaches described above.

A logical concern in this study is the adequacy of the available MLST sequences for assessing these questions. The data retrieved from the databases, although representing the reported diversity of the organisms, are unstructured and are not necessarily representative of natural populations (Urwin and Maiden, 2003). Moreover, besides the particular case of the Neisseria database and, to some extent, the Helicobacter pylori database, all the other databases contain information from a limited number of isolates that do not represent the worldwide distribution of the species and rarely include the less pathogenic samples which frequently comprise the majority of the population (Spratt and Maiden, 1999). These caveats can obviously bias the estimates of the parameters of interest (recombination, mutation, and selection rates), although we think not to the extent of completely misleading the inferences deduced from them. Nevertheless, for comparative purposes, we will also analyze published subsets of the database sequence files including strains from asymptomatic carriage and local and worldwide collections.

2. Materials and methods

2.1. Data sets

Our DNA sequence data sets consisted of 91 different loci corresponding to one yeast and fifteen bacterial pathogens (a total of 184 data sets; Table 1) downloaded from two MLST databases at http://www.mlst.net and http://pubmlst.org/ (see also acknowledgements). Seventeen additional data sets for Escherichia coli and Moraxella catarrhalis were provided by one of us (TW) and can be accessed at http://web.mpnb-berlin.mpg.de/mlst. We analyzed complete MLST allele sequences (as of January 2004) for each bacterial species in order to have a good representation of their population diversity. Additionally, for the following pathogens, we analyzed subsets of published data for comparison: Haemophilus influenzae (encapsulated and/or noncapsulated; Meats et al., 2003), H. pylori (Achtman et al., 1999), Neisseria meningitidis (Maiden et al., 1998), Staphylococcus aureus (Enright et al., 2000), and Streptococcus pneumoniae (Hanage et al., 2004). Although most of the isolates analyzed here were collected worldwide, others actually represent local populations (Neisseria gonorrhoeae, S. aureus, S. pneumoniae) and one is from asymptomatic carriage (S. pneumoniae).

Table 1
Population recombination rate (ρ) and the probability of ρ = 0 (indicated by asterisks) from the LPT test, population mutation rate using Watterson’s method under an infinite-sites model (ΘWi) and a finite-sites model (Θ ...

Sequences were aligned in Clustal X (Thompson et al., 1997) and then translated into amino acids using the universal reading frame in MacClade 4.05 (Maddison and Maddison, 2000). Haplotypes including stop codons were deleted from the analysis (e.g., the ndh locus from Burkholderia pseudomallei).

Models of nucleotide and codon substitution were assessed using the maximum likelihood approach described by Huelsenbeck and Crandall (1997) and Posada and Crandall (1998). Likelihood scores for each model were estimated in PAUP* 4.0b10 (Swofford, 2003) and then compared through a series of hierarchical likelihood ratio tests (LRT) to determine the best-fit model. When two models are nested, twice the log-likelihood difference is compared with a χ2 distribution with the degrees of freedom ν equal to the difference in the number of parameters between the two models. Recent simulation studies have shown that this approach performs very well at recovering the true underlying model of evolution (Yang et al., 2000a; Posada, 2001; Posada and Crandall, 2001a; Anisimova et al., 2001).
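The likelihood ratio test described above can be illustrated with a minimal sketch. It is not the PAUP* or Modeltest implementation; the function names and the log-likelihood values are hypothetical and serve only to show how twice the log-likelihood difference is referred to a χ2 distribution with ν equal to the difference in parameter counts.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from a closed form: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(-y) * y ** a / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_null, lnl_alt, k_null, k_alt):
    """Return (statistic, degrees of freedom, P-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = k_alt - k_null
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a simpler and a more general nested model.
stat, df, p = likelihood_ratio_test(-1234.5, -1228.1, k_null=2, k_alt=4)
print(f"2*dlnL = {stat:.2f}, df = {df}, P = {p:.4f}")
```

In a hierarchical series of such tests, the more general model would be retained only when P falls below the chosen significance level.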

2.2. Genetic analysis

Population recombination (ρ), population mutation (Θ), and molecular adaptive selection were estimated independently for each gene region and species.

2.2.1. Population recombination rate (ρ)

Within each gene region ρ was estimated using the standard likelihood coalescent approach implemented in the LDhat package (McVean et al., 2002). Within this framework, ρ can be expressed as 4Ner in diploid organisms (crossing-over model), where Ne is the inbreeding effective population size and r is the recombination rate per locus per generation, or as 8Nect in haploid organisms (gene-conversion model), where c is the per base rate of initiation of gene conversion and t is the average gene conversion tract length. This method has the desirable property of relaxing the infinite-sites assumption (typically violated by many empirical data sets (Posada et al., 2002)) and accommodates different models of molecular evolution (including, importantly, rate heterogeneity). LDhat implements a composite-likelihood estimate of ρ, which has the advantage of being more computationally efficient relative to full-likelihood methods, but without summarizing the data in a single statistic (Hudson, 2001). In addition, LDhat includes a powerful likelihood permutation test (LPT) to test the hypothesis of no recombination (ρ = 0). This method has proven to be more powerful than previous permutation-based methods for detecting recombination (McVean et al., 2002), thus we will also apply it in our analyses.

2.2.2. Population mutation rate

A coalescent estimate (no recombination) of Θ for haploids (2Neμ) and diploids (4Neμ) where μ is the mutation rate per locus per generation was calculated using the statistical method of Watterson (1975) as implemented in LDhat. This program generates an estimate of Θ based on the number of segregating sites in the sequences assuming an infinite-sites (i.e., mutations only occur once per site in a population) or a finite-sites model. By comparing both estimates, we will be able to draw inferences about the mutational process (e.g., lower estimates of Θ under the infinite-sites model will indicate occurrence of multiple mutations at some sites). Other more powerful maximum likelihood approaches to estimate Θ have been proposed (Kuhner et al., 1995, 1998), but these methods require a bifurcating phylogenetic tree, are computationally intense, and are more easily affected by the presence of recombination in the data (M.K. Kuhner, personal communication). Moreover, Fu and Li (1993) and Felsenstein (2004) have shown that Watterson’s estimator, although less efficient than maximum likelihood, is remarkably good.
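As a minimal sketch of the infinite-sites version of Watterson's estimator, Θ can be computed as the number of segregating sites S divided by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n − 1), where n is the number of sequences. The code below is an illustration only and is not the LDhat implementation; the function names and the toy alignment are hypothetical, and the finite-sites correction used in LDhat is not reproduced here.

```python
def segregating_sites(seqs):
    """Count alignment columns that show more than one state."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson_theta(seqs):
    """Watterson's (1975) estimator per locus under the infinite-sites model."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic sum over 1..n-1
    return segregating_sites(seqs) / a_n

# Toy alignment of four hypothetical alleles (gaps and ambiguity codes not handled).
alleles = ["ATGCATGC", "ATGCATGA", "ATGTATGC", "ATGCATGC"]
print(segregating_sites(alleles), round(watterson_theta(alleles), 4))
```

Because each segregating site is counted once regardless of how many states it shows, sites hit by multiple mutations are undercounted, which is the reason a finite-sites estimate can exceed this one.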

2.2.3. Adaptive selection

The effect of natural selection is usually studied by comparing the fixation rates of nonsynonymous (amino acid-altering) and synonymous (silent) mutations within a maximum likelihood phylogenetic framework (Yang et al., 2000a). A measure that has featured prominently in such studies is the nonsynonymous/synonymous substitution rate ratio (ω = dN/dS) or acceptance rate (Miyata and Yasunaga, 1980). ω measures the selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. We initially estimated ω per site for all data sets using the codon-based nested models M1 (neutral), M2 (selection) and M3 (discrete) of Yang et al. (2000a). Those genes under positive selection were then examined under models M7 (beta) and M8 (beta and ω). Model likelihood scores were compared using an LRT as described before. M2 (3 parameters) and M3 (5 parameters) are more general than model M1 (1 parameter) and can be compared with M1. Similarly, M7 (2 parameters) is a special case of model M8 (4 parameters) and can be compared the same way. When ω > 1 in M2, M3, or M8, positively selected sites are inferred from the data. We also applied the empirical Bayesian approach implemented by Nielsen and Yang (1998) to identify the potential sites under diversifying selection as indicated by a posterior probability (pP) > 0.95. Sites where pP is lower than this value will not be reported. Finally, for comparative purposes, we also estimated ω per gene using the Goldman and Yang (1994) model. All of the previous analyses were carried out in PAML 3.14b3 (Yang, 1997) and were performed under initial ω values >1 and <1, as recommended by the author. If positive selection was detected, we reran PAML several times to check convergence. Here, we report the estimates obtained under the best likelihood scores.
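The distinction between synonymous and nonsynonymous changes, and the interpretation of ω, can be illustrated with a small sketch. This is a crude codon-by-codon count under the standard genetic code, not the maximum likelihood estimation of ω performed in PAML; the function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of codons that differ."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1       # silent change: same amino acid
        else:
            nonsyn += 1    # amino acid-altering change
    return syn, nonsyn

def interpret_omega(omega, tol=1e-9):
    """Map an estimated dN/dS ratio onto the three regimes described in the text."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying (positive)"

print(classify_codon_changes("TTTGCTAAA", "TTCGATAAA"))  # TTT/TTC silent, GCT/GAT altering
print(interpret_omega(0.12), interpret_omega(2.5))
```

Raw counts such as these ignore the different numbers of synonymous and nonsynonymous sites and multiple hits, which is why model-based estimates of ω are used in practice.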

Maximum likelihood and Bayesian inferences under codon-substitution models rely on the phylogenetic relationships among the sequences and do not account for the presence of recombination. Empirical results reported by Yang et al. (2000a) and simulations by Anisimova et al. (2001, 2002) indicated that the LRTs and the inference of sites under positive selection do not seem to be sensitive to the assumed tree topology (a neighbor-joining tree in our analyses), even if a star tree is used. Hence, presumably, our results are not biased by whichever phylogenetic process (clonal, epidemic, or panmictic) drives the population structure of the studied pathogens. Nevertheless, to test this hypothesis, values of ω > 1 were re-estimated using alternative maximum parsimony trees generated in PAUP*. High levels of recombination, however, seem to affect dramatically the accuracy of the LRT, and recombination is often mistaken for evidence of positive selection (simulations by Anisimova et al., 2003; although see Urwin et al., 2002 for a different opinion). Anisimova et al. (2003) showed that LRTs of M0–M3 and M1–M2 are heavily affected, but the LRT of M7–M8 is much less so (positive selection was falsely detected in only 20% of replicates at α = 5%). Identification of sites under positive selection using the Bayesian approach appears to be less influenced by high levels of recombination. The Bayesian method predicted incorrectly ~25% of the sites for M3, ~9% for M8, and ~5% for M2. However, when data were simulated at high levels of positive selection (ω = 6), Bayesian site prediction became more accurate and powerful (concrete values are not reported).

McClellan et al. (2005) have recently shown that dN/dS ratios are less sensitive to detecting single adaptive amino acid changes than methods that evaluate positive selection in terms of the amino acid properties, which comprise protein phenotypes that selection at the molecular level may act upon. Hence, in addition to estimating adaptive selection under codon-substitution models M2, M3, and M8, we also estimated adaptive selection in terms of 31 quantitative biochemical properties using the model of McClellan and McCracken (2001) as implemented in TreeSAAP 3.2 (Woolley et al., 2003). No study has shown how tree topology and recombination affect the performance of the amino acid-property-based models implemented in TreeSAAP. For the case of recombination, intuitively one could expect TreeSAAP to be less affected than PAML since the former infers selection at the phenotype level, hence its accuracy is independent of the force generating molecular change (mutation or recombination), and what really matters is if that physicochemical change is fixed or not (D.A. McClellan, personal communication). We will test all data sets under positive selection according to PAML using the protein model implemented in TreeSAAP. Based on a phylogenetic tree, this model establishes first a chronology of observable molecular evolutionary events. The frequencies of these events are then analyzed in order to identify (1) amino acid properties that may have radically changed more often than expected by chance (presumably due to selection promoting the occurrence of radical amino acid replacements) and (2) amino acid sites associated with selection, thus establishing a correlation between the sites of positive selection and the structure and function of the protein. We followed the general procedure outlined in McClellan et al. (2005).
In this study, we are particularly interested in detecting molecular adaptation, selection that results in radical structural or functional shifts in local regions of the protein. To this end, the range of possible changes in an amino acid property was divided into eight magnitude categories, with numbers 6, 7, and 8 denoting radical changes. An amino acid property is said to be affected by adaptive selection (referred to as positive-destabilizing selection) when the frequency of changes in magnitude categories 6–8 significantly exceeds the frequency (or frequencies) expected by chance, as indicated by z-scores > 2.326 (P < 0.01). Particular amino acid residue sites affecting those properties were then also identified by z-scores > 2.326.
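The correspondence between the z-score threshold of 2.326 and P < 0.01 follows from the upper tail of the standard normal distribution (one-tailed test). The short sketch below checks this numerically; it does not reproduce how TreeSAAP derives its z-scores, and the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha):
    """One-tailed critical value z such that P(Z > z) = alpha."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)

def one_tailed_p(z):
    """Upper-tail probability P(Z > z) for an observed z-score."""
    return 1.0 - STD_NORMAL.cdf(z)

def exceeds_threshold(z, alpha=0.01):
    """True when a z-score is significant at the one-tailed level alpha."""
    return z > critical_z(alpha)

print(round(critical_z(0.01), 3))        # critical value for alpha = 0.01
print(round(one_tailed_p(2.326), 4))     # tail probability at the threshold
print(exceeds_threshold(3.1), exceeds_threshold(1.9))
```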

3. Results and discussion

3.1. Species comparisons

Evolutionary models chosen by the LRT, population recombination rates per locus (ρ) and the probability of ρ = 0 (indicated by asterisks) from the LPT, population mutation rates per locus using Watterson’s method under infinite- (ΘWi) and finite-sites models (ΘWf), and ratio of recombination to mutation (ρ/ΘWf), for every species and locus are presented in Table 1. No single available model in Modeltest best fit all the data and almost all possible models were chosen as most appropriate for one or more data sets. HKY (Hasegawa et al., 1985) and TrN (Tamura and Nei, 1993) models were chosen more often, but highly diverse data sets (large ρ and Θ) such as those of H. pylori required more complex models (TIM and GTR) to accommodate the observed variation. Most data sets presented rate heterogeneity (i.e., the evolutionary process exhibits site-to-site variation) as accounted for by the γ distribution, and a fraction of invariable sites (sites incapable of accepting substitutions). Hence, both parameters should be incorporated as part of the evolutionary model for inferring phylogenetic relationships when using model-based tree-building methods such as neighbor-joining, maximum likelihood or Bayesian inference. Violation of this assumption can have devastating consequences. Different models fit the same gene in different species; however, the same model fit multiple genes in some pathogens (e.g., Bacillus cereus, E. coli, H. influenzae, and H. pylori).

As expected, population recombination and population mutation rates and levels of adaptive selection varied greatly between and within taxa, but some general trends can be observed. In the next sections, we describe them separately.

3.1.1. Population recombination rate (ρ)

H. pylori, N. gonorrhoeae, N. meningitidis, and S. pneumoniae showed high mean levels (ρ > 50) of intragenic recombination across loci, which supports prior conclusions (e.g., Maynard-Smith et al., 1993; Suerbaum et al., 1998; Feil et al., 1999, 2000a, 2001). B. cereus, H. influenzae, Streptococcus agalactiae, and Streptococcus pyogenes showed moderate levels of recombination (15 < ρ ≤ 50). Interestingly, this second species group contained some gene regions that recombine frequently whilst others do not. This could be due to variable selective pressures on the genome and/or temporal/geographical structuring generated by random genetic drift, which would not be surprising considering the wide distribution and temporal dispersion of the isolates analyzed. These data support previous conclusions for low rates of recombination for B. cereus (Vilas-Boas et al., 2002), S. pyogenes (Enright et al., 2001; Feil et al., 2001) and S. agalactiae (Jones et al., 2003). Finally, B. pseudomallei (and closely related species), M. catarrhalis, Staphylococcus epidermidis, Vibrio vulnificus, Campylobacter jejuni, Enterococcus faecium, E. coli, and S. aureus showed consistently low mean levels of ρ (≤ 15). Little information has been published on the first four of these species, but clonal (low recombination) and epidemic (sexual but superficially clonal) population structures have been proposed for C. jejuni (Suerbaum et al., 2001) and E. faecium (Homan et al., 2002). The frequency of recombination in E. coli, S. aureus, and H. influenzae is still debated: some studies suggest low rates or clonal structures (Whittam, 1995; Feil et al., 2001, 2003), while others indicate the opposite (Feil et al., 1999, 2001; Meats et al., 2003). Our results show low mean ρ rates for E. coli and S. aureus and a moderate rate for H. influenzae. It was surprising to find a value of ρ = 0 for all gene regions in E. coli (Table 1).
LDhat estimates intragenic recombination and will estimate ρ = 0 if break points are distributed between the gene regions. Other commonly used approaches, however, aim to detect both intragenic recombination and allele replacement (Feil et al., 1999) or allele replacement alone (Maynard-Smith and Smith, 1998); hence, rate differences between our study and previous work (e.g., Feil et al., 1999) could be expected. Furthermore, all these methods differ significantly in their relative abilities to detect recombination, which may give them high false positive rates (Posada and Crandall, 2001b). A more detailed comparison among and within clonal complexes seems necessary to assess the role of recombination in E. coli.

We investigated whether our results depend on sample size by analyzing multiple subsets of data from five species including high, medium, and low recombinant taxa. These analyses yielded comparable mean ρ values within each species, indicating that LDhat estimates of this parameter are not strongly affected by sample size (see other examples by Jolley et al., 2000; Maggi-Solcà et al., 2001; Feil et al., 2003; Viscidi and Demma, 2003). However, many MLST data sets represent biased samples that are concentrated on disease isolates and confirmation of our results with more population-based samples is desirable.

Based on the observed levels of population recombination, we could tentatively categorize the population structure of the studied pathogens as follows: the first and second groups of highly and moderately recombinant taxa, respectively, would conform to a panmictic or nonclonal model. We note that for almost all loci with ρ > 5, LDhat significantly rejected the null hypothesis of no recombination. The third group of species does not recombine or recombines only rarely; these taxa conform to a clonal (or almost clonal) model. Within a phylogenetic framework, the population structure of the panmictic group might be best described by a network approach (e.g., Posada and Crandall, 2001c). In contrast, a bifurcating tree could be used for the clonal species. The structure of the panmictic species including recombinant and nonrecombinant loci could also be assessed using a tree-based approach if the recombinant loci are excluded from the analysis. Genes with low levels of recombination according to LDhat could be concatenated prior to a phylogenetic analysis under a single model of evolution. Alternatively, gene-specific substitution models (i.e., mixed models) could be used for each gene region using a Bayesian approach in order to maximize the phylogenetic signal in the data. As an example, we have compared the minimum evolution trees obtained by Meats et al. (2003) using a K80 model after concatenating seven genes from encapsulated (eca) and noncapsulated (nca) H. influenzae isolates with the results under the best fit model (HKY + Γ + I) after excluding the two genes (mdh and pgi) with the highest recombination rates (ρ > 65). Nodal support using 1000 bootstrap replicates (Felsenstein, 1985) was higher for both data sets with trees based on the five concatenated genes (Fig. 1) and different relationships were indicated.
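The strategy of excluding highly recombinant loci before concatenation can be sketched as a simple filtering step. The data structure, the function name, the ρ values, and the toy sequences below are all hypothetical and are not taken from Table 1; the threshold is a user choice.

```python
def concatenate_low_recombination(loci, rho_max):
    """Keep loci with rho <= rho_max and concatenate them per isolate.

    `loci` maps a locus name to {"rho": float, "seqs": {isolate: sequence}}.
    Only isolates typed at every retained locus are concatenated.
    """
    kept = sorted(name for name, data in loci.items() if data["rho"] <= rho_max)
    if not kept:
        raise ValueError("no locus passes the recombination threshold")
    shared = set.intersection(*(set(loci[name]["seqs"]) for name in kept))
    concatenated = {
        isolate: "".join(loci[name]["seqs"][isolate] for name in kept)
        for isolate in sorted(shared)
    }
    return kept, concatenated

# Hypothetical per-locus estimates and toy allele sequences.
example = {
    "adk":  {"rho": 5.0,  "seqs": {"iso1": "ATG", "iso2": "ATA"}},
    "atpG": {"rho": 12.0, "seqs": {"iso1": "CCC", "iso2": "CCT"}},
    "mdh":  {"rho": 70.0, "seqs": {"iso1": "GGG", "iso2": "GGA"}},
    "pgi":  {"rho": 80.0, "seqs": {"iso1": "TTT", "iso2": "TTC"}},
}
kept, concatenated = concatenate_low_recombination(example, rho_max=20.0)
print(kept, concatenated)
```

The concatenated alignment would then be passed to a tree-building program under a single substitution model, as described in the text.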

Fig. 1
Minimum evolution (ME) trees of encapsulated (eca) and noncapsulated (nca) isolates of Haemophilus influenzae using five low recombinant (ρ ≤ 20) genes. ME trees from Meats et al. (2003) using five low and two highly recombinant (ρ ...

3.1.2. Population mutation rate (Θ)

Overall, species with higher average number of alleles (na) also showed higher average Θ values (r ≈ 0.59*), but this correlation is clearly altered by the amount of recombination in the data. For example, H. pylori, N. meningitidis, and S. pneumoniae, which have high mean na (95–323) and also high mean ρ (>52), showed similar average Θ values to other species with clearly lower mean na such as E. coli (44 alleles) or S. aureus (58 alleles), but also low mean ρ (<6). The correlation between mean na and Θ increased significantly when these three species were excluded from the comparison (r ≈ 0.69**). This indicates that in the former three species punctual mutation is not the major evolutionary force generating allelic variation (see below). Subsets of isolates with a worldwide distribution showed similar Θ values to their corresponding full data sets. However, as expected, local isolates from S. aureus and S. pneumoniae showed lower Θ values presumably due to a more homogeneous environment and recent shared evolutionary history. For these same two species, ρ rates between locally and widely dispersed isolates varied less. S. aureus seems to be an almost clonal taxon, thus differences in ρ rates were not expected, but in the case of S. pneumoniae this last result can be explained based on the molecular differences existing between both evolutionary processes. Recombination reshuffles existing variation generated by mutation and can potentially create new variants without novel mutations. Thus, at the local population level where events are more recent in evolutionary history high levels of ρ can be seen even with little variation in Θ.

Encapsulated and noncapsulated H. influenzae isolates (Table 1) did not show notable variation in average Θ or ρ rates: ΘWeca = 13.22 versus ΘWnca = 11 and ρeca = 28.6 versus ρnca = 23.6. However, ME phylogenetic trees (Fig. 1) of noncapsulated isolates were more weakly supported than those of encapsulated isolates (even after removing pgi and mdh), suggesting that the impact of recombination may be greater in the former than in the latter group, as reported by Meats et al. (2003).

All Θ estimates under the finite-sites model (ΘWf) were higher than those generated under the infinite-sites model (ΘWi). Differences varied based on the amount of genetic variation, but in some loci such as gdh from N. meningitidis recurrent mutation (i.e., some sites experiencing multiple mutations in the history of the sample) increased Θ by up to 39%. This stresses the need for using evolutionary models that relax the infinite-site assumption, such as those incorporated in LDhat, because recurrent mutation can generate patterns of genetic variability that resemble the effects of recombination (McVean et al., 2002).

The ratio between recombination and mutation is indicative of the contribution of each factor to the emergence of variant alleles (Feil et al., 1999, 2000a,b). Our results, as indicated by the mean ρ/ΘWf ratio, showed that recombination generates more divergence than mutation in nine taxa (mean ρ/ΘWf > 1.0) and less in seven cases (mean ρ/ΘWf < 1.0). As expected, taxa with moderate or high levels of recombination showed greater ρ/ΘWf values, but results varied among loci ranging from 0 to ~17. Nevertheless, we note that ρ = 100 was chosen as a cutoff as it is the limit for which likelihoods were estimated. This means that ρ > 100 could be expected for those loci with ρ = 100. Consequently, the extent of the differences between the contribution of recombination and mutation to diversity may be greater than reported for those species with high levels of ρ (close to 100), but over all taxa, one factor does not seem to prevail over the other.
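The ratio and the effect of the ρ = 100 cutoff can be made explicit with a small sketch in which ratios computed from capped ρ estimates are flagged as lower bounds. The function name and the numerical values are hypothetical and are not taken from Table 1.

```python
RHO_CAP = 100.0  # upper limit of the rho grid, as stated in the text

def rho_theta_ratio(rho, theta_wf, rho_cap=RHO_CAP):
    """Return (ratio, is_lower_bound) for one locus.

    When rho sits at the cap, the true value may be larger, so the
    ratio is only a lower bound on rho/theta.
    """
    if theta_wf <= 0:
        raise ValueError("theta must be positive to form the ratio")
    return rho / theta_wf, rho >= rho_cap

def dominant_force(ratio):
    """Label which process contributes more to variant alleles."""
    if ratio > 1.0:
        return "recombination"
    if ratio < 1.0:
        return "mutation"
    return "equal"

# Hypothetical (rho, theta_Wf) pairs for three loci.
for rho, theta in [(100.0, 20.0), (3.0, 12.0), (15.0, 15.0)]:
    ratio, bound = rho_theta_ratio(rho, theta)
    flag = " (lower bound)" if bound else ""
    print(f"rho/theta = {ratio:.2f}{flag}: {dominant_force(ratio)}")
```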

We can test the hypothesis that recombination has a major impact in leading to genetic diversity across species and loci by examining the correlation of genetic diversity (as measured by the ΘWf estimator) and recombination rate. By looking at Table 1 we observe that species with similar mean ΘWf values (independently of na) such as B. cereus and H. influenzae or H. pylori and S. aureus differ in their mean ρ values. Furthermore, over all taxa or loci, ΘWf and ρ are clearly not correlated (r = ~0.16 and ~0.02, respectively) as shown in Fig. 2a. Hence, in general, we can conclude that genetic diversity and recombination are not correlated, which supports our previous conclusion that recombination does not prevail over mutation in generating diversity.
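A correlation of this kind can be computed with the Pearson product-moment coefficient. The sketch below assumes that r in the text denotes Pearson's coefficient, which the source does not state explicitly; the function name and the input values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-locus estimates of theta_Wf and rho.
theta_wf = [4.2, 11.0, 13.2, 20.5, 7.8]
rho = [0.0, 23.6, 100.0, 6.0, 52.0]
print(round(pearson_r(theta_wf, rho), 3))
```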

Fig. 2
Scattergrams of population recombination rates (a) and acceptance rates (b) and population mutation rates per locus. The locus abcZ from N. gonorrhoeae is not included in the scattergram (b).

3.1.3. Adaptive selection

Values of the dN/dS ratio per gene (ωM0) were < 1 for all species and loci except locus abcZ in N. gonorrhoeae (data not shown). Hence, on average, most loci and species seem to be under purifying selection. This has been confirmed in almost every genetic analysis of MLST sequences (e.g., Dingle et al., 2001; Feil et al., 2003; Meats et al., 2003). No apparent connection seems to exist between ωM0 and ΘWf rates, as reflected by the observed low correlation (r = ~0.29) between both parameters (Fig. 2b). Thus there seems to be minimal impact of selection on genetic diversity due to the general lack of positive selection. Most variation within genes that encode essential metabolic enzymes, such as the housekeeping genes, is likely to be selectively neutral or deleterious (Li, 1997; Feil et al., 2000a). Adaptive evolution, if present, must be punctual. Hence, the criterion that this average ωM0 > 1 is a very stringent one for detecting adaptive selection (Crandall et al., 1999). Analyses of 91 housekeeping gene regions using models that account for ω heterogeneity among sites have identified 13, 33, and 28 loci under significant positive selection as indicated by the LRTs of M1–M2, M1–M3, and M7–M8, respectively, and number of potential sites nM ≠ 0 (Table 2). Under LRTs of M7–M8 (the most conservative model), all the species but B. pseudomallei, C. jejuni, S. epidermidis, and V. vulnificus seem to experience adaptive selection for one (e.g., B. cereus) to seven (N. gonorrhoeae) loci. The number of potential sites under diversifying selection (nM), as identified by the Bayesian approach, ranged from one (e.g., pta locus from B. cereus) to nine (gpdh from N. gonorrhoeae). All these sites were also found by TreeSAAP (nTS; Table 2) using a completely different procedure.
Moreover, for most of the genes, additional sites under positive selection were found, which confirms that dN/dS ratios are not very sensitive to detecting adaptive selection in genes under low or moderate levels of diversifying selection (McClellan et al., 2005).
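The likelihood ratio tests mentioned above compare the maximized log-likelihoods of two nested codon models. The following is a minimal sketch of that comparison, not the PAML implementation: it assumes the test statistic is compared to a chi-square distribution whose degrees of freedom equal the difference in free parameters (commonly taken as 2 for M7–M8), and the function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable.

    Uses the closed form that exists when df is a positive even integer:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lrt(lnl_null: float, lnl_alt: float, df: int = 2):
    """Likelihood ratio test for nested models.

    lnl_null: log-likelihood of the simpler model (e.g., M7)
    lnl_alt:  log-likelihood of the model allowing omega > 1 (e.g., M8)
    df:       assumed difference in the number of free parameters
    Returns (test statistic, p-value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)
```

For example, `lrt(-1000.0, -995.0)` (placeholder values, not results from this study) gives a statistic of 10 on 2 degrees of freedom. A significant test alone is not sufficient under the criterion used here, which also requires at least one identified site (nM ≠ 0).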

Table 2
Acceptance rate per site ( ωM2, ωM3, and ωM8) and proportion of sites (pM2, pM3, and pM8) under models M2 (selection), M3 (discrete), and M8 (beta and ω ) with a ω > 1, and number of sites under positive ...

Acceptance rates and detected number of sites (nM) under positive selection diminished in the subsets compared to the full data sets. TreeSAAP, in contrast, still showed evidence of significant (P < 0.01) destabilizing selection (nTS) in almost all of the same gene regions, although at a lower level (Table 2). This difference again reaffirms the higher sensitivity of the evolutionary model of McClellan and McCracken (2001) for detecting adaptive selection. As reported before by Anisimova et al. (2001, 2002), both power and accuracy of the LRT and Bayes tests decrease as sample size diminishes, especially when the sequences are highly similar. Both encapsulated and noncapsulated isolates of H. influenzae showed evidence of adaptive selection, although no clear differences in selective pressure between them were observed. Interestingly, the amino acid sites and physicochemical properties under destabilizing selection (TreeSAAP) varied between both groups (Table 3).

Table 3
Amino acid (AA) sites and physicochemical properties under destabilizing selection (z-score > 2.326, P < 0.01; TreeSAAP) for encapsulated and noncapsulated isolates of Haemophilus influenzae in adk and atpG

Simulations by Anisimova et al. (2003) questioned the efficiency of dN/dS for detecting positive selection under high levels of recombination, such as those observed in some of our data sets (e.g., B. cereus), since this force may inflate ω and nM estimates. Nevertheless, in some of the MLST genes analyzed here, the observed values of ω and nM are so high that it is hard to believe that LRTs are completely misleading in their conclusions, especially for M7–M8 comparisons (e.g., N. gonorrhoeae and S. agalactiae). Moreover, it has been shown that LRTs are conservative (Anisimova et al., 2001, 2003; Yang et al., 2000a), so genes inferred by the test to undergo positive selection are most likely true cases of adaptation rather than an artifact of the method, as supported by most of the published studies (e.g., Bishop et al., 2000; Peek et al., 2001; Yang et al., 2000b; Yang and Swanson, 2002). In addition, we have adopted an even more conservative approach, since we do not consider loci under significant positive selection for which no positively selected sites were identified. Furthermore, gene regions and sites undergoing adaptive selection under the models implemented in PAML were also verified by TreeSAAP using a completely different amino acid-based approach, which is potentially less affected by recombination. Taken together, we think that this evidence indicates that microbial MLST housekeeping genes are experiencing molecular adaptation. We find this quite surprising, since these genes were essentially selected as candidates for population genetic studies because of their lack of selection as inferred by the average ω ratio. Previous studies reporting lack of diversifying selection in these genes must be interpreted cautiously. Moreover, one should be aware of their lack of neutrality when used for population or molecular evolutionary studies.
Nevertheless, we do not think that our findings invalidate the use of these molecular markers for typing purposes; we agree with Cooper and Feil (2004) that “the exclusion of genes that do not conform to classical housekeeping criteria is an ill-afforded luxury”.

The finding of selection in housekeeping loci raises important evolutionary questions such as: how do these adaptive changes affect the phenotypes (proteins)? Using TreeSAAP and PAML we have first identified the sites responsible for adaptive change, providing the initial information required to understand the changes in the form and function of proteins over evolutionary time (Anisimova et al., 2002). Specific hypotheses can then be formulated using this information, for example, to propose coevolutionary patterns between host and parasite (e.g., Bishop et al., 2000), study how pathogens escape the immune system (e.g., Haydon et al., 2001), or determine which structural and biochemical amino acid properties drive the evolution of proteins (e.g., McClellan et al., 2005). As an example of the latter, we have used TreeSAAP to detect amino acid properties under strong levels of destabilizing selection (z-scores > 2.326; P < 0.01) in adk and atpG for encapsulated and noncapsulated isolates of H. influenzae (Table 3). Using this approach we were able to identify a total of four and three different potential properties driving protein evolution of adk and atpG, respectively. Following McClellan et al. (2005), future studies using protein structure models could explore how these property changes may affect the conformation and function of adk and atpG and look into their interconnections with the epidemiology and pathogenesis of both typeable and nontypeable H. influenzae.
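The threshold quoted above (z-score > 2.326 for P < 0.01) is consistent with a one-tailed test against the standard normal distribution. The short sketch below checks that correspondence. It assumes a one-tailed standard normal reference, which is our reading of the threshold rather than a description of TreeSAAP's internals, and the function names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STANDARD_NORMAL = NormalDist()


def one_tailed_p(z: float) -> float:
    """Upper-tail probability P(Z > z) under the standard normal."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def critical_z(alpha: float) -> float:
    """z-score whose upper-tail probability equals alpha."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)


def is_destabilizing(z: float, alpha: float = 0.01) -> bool:
    """Flag a property change category as significant at level alpha."""
    return z > critical_z(alpha)
```

Under this assumption, `critical_z(0.01)` is approximately 2.326, so a z-score above that value corresponds to P < 0.01 in the upper tail.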

3.2. Locus comparisons

Tables 1 and 2 show how population recombination, population mutation, and adaptive selection rates per locus vary within and between species. As we have shown, this information can be used to identify appropriate candidate loci for phylogenetics and population genetics, study protein evolution, target potentially useful MLST gene regions in other species, examine the evolution of antibiotic resistance, and explore the population dynamics of species. Another interesting way to look at these two tables is to compare how these three parameters change among species for the same locus: is there any observable pattern of gene evolution? Our data sets consist of 91 loci of which 65 were screened for only a particular species and 27 were screened for two to five species; hence, the number of data sets per locus to compare is not very large. Nevertheless, it seems that ρ and ΘWf vary arbitrarily between taxa, so no obvious gene-based pattern could be established. This is not completely surprising considering that these population parameters are driven by the particular biological and ecological characteristics of each species, although as mentioned before, natural selection and population structuring can also act upon particular genes. Adaptive selection, while influenced by biological and ecological factors, is mostly a reflection of the selection pressure operating at the protein level. Convergent evolutionary responses to similar diversifying selective regimes could result in concordant patterns of adaptive selection between species for a particular locus. Our scarce data indicate that most loci seem to be under nonconcordant patterns of adaptive selection pressure. However, aroE and xpt in two species and adk in four species showed significant ω > 1 under M8 or M3 (under nonsignificant values of ρ) and nM and nTS ≠ 0, which may suggest a common pattern of positive selection for each of these genes.
Further analyses including more species and loci are needed to confirm this hypothesis.
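The cross-species comparison described above amounts to grouping per-locus results by gene and keeping the genes that meet the selection criteria in more than one species. The sketch below illustrates that bookkeeping only. The record layout, the field names, and the species labels and values in `example` are hypothetical placeholders, not entries from Tables 1 and 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LocusResult(NamedTuple):
    species: str
    locus: str
    omega: float           # acceptance rate under M8 (or M3)
    lrt_significant: bool  # outcome of the corresponding LRT
    n_m: int               # sites identified by the Bayesian approach
    n_ts: int              # sites identified by TreeSAAP


def shows_adaptive_selection(r: LocusResult) -> bool:
    """Apply the criteria from the text: significant omega > 1, nM and nTS != 0."""
    return r.lrt_significant and r.omega > 1.0 and r.n_m > 0 and r.n_ts > 0


def concordant_loci(results: Iterable[LocusResult],
                    min_species: int = 2) -> Dict[str, List[str]]:
    """Return loci meeting the criteria in at least min_species species."""
    by_locus = defaultdict(set)
    for r in results:
        if shows_adaptive_selection(r):
            by_locus[r.locus].add(r.species)
    return {locus: sorted(species)
            for locus, species in by_locus.items()
            if len(species) >= min_species}


# Placeholder records for illustration only.
example = [
    LocusResult("species_A", "adk", 3.2, True, 2, 3),
    LocusResult("species_B", "adk", 2.1, True, 1, 2),
    LocusResult("species_A", "pta", 4.0, True, 1, 1),
    LocusResult("species_B", "xpt", 1.8, True, 1, 0),
]
```

With these placeholder records, `concordant_loci(example)` keeps only adk, because pta passes in a single species and xpt has no TreeSAAP sites.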

4. Summary

Model-based statistical methods are of great utility for inferring and testing a wide variety of evolutionary parameters and hypotheses. Here we have provided a robust example of their utility for inferring population recombination, population mutation, and selection rates and building consistent phylogenetic hypotheses of relationships using a large database of multilocus sequence typing sequence data from infectious microbial agents. Within this framework, important evolutionary questions within microbial genetics have been assessed and new ones have been proposed. We hope that the outcomes of our work will stimulate further research in the evolution of infectious diseases using statistical methodology.


This publication made use of the following MLST websites: Bacillus cereus (http://pubmlst.org/bcereus), Burkholderia pseudomallei (http://bpseudomallei.mlst.net), Candida albicans (http://calbicans.mlst.net), Campylobacter jejuni (http://pubmlst.org/campylobacter), Enterococcus faecium (http://efaecium.mlst.net), Haemophilus influenzae (http://haemophilus.mlst.net), Helicobacter pylori (http://pubmlst.org/helicobacter), Neisseria (http://pubmlst.org/neisseria), Streptococcus agalactiae (http://sagalactiae.mlst.net), Staphylococcus aureus (http://saureus.mlst.net), Staphylococcus epidermidis (http://sepidermidis.mlst.net), Streptococcus pneumoniae (http://spneumoniae.mlst.net), Streptococcus pyogenes (http://spyogenes.mlst.net), and Vibrio vulnificus (http://pubmlst.org/vvulnificus).

We thank Mark Achtman and three anonymous referees for their suggestions to improve this manuscript. We gratefully acknowledge support from the National Institutes of Health grants R01 AI50217 (RPV, KAC) and GM66276 (KAC) and from the Brigham Young University Office of Research and Creative Activities.


  • Achtman M, Azuma T, Berg DE, Ito Y, Morelli G, Pan ZJ, Suerbaum S, Thompson SA, van der Ende A, van Doorn LJ. Recombination and clonal groupings within Helicobacter pylori from different geographical regions. Mol Microbiol. 1999;32:459–470. [PubMed]
  • Anisimova M, Bielawski JP, Yang Z. Accuracy and power of the likelihood ratio test to detect adaptive molecular evolution. Mol Biol Evol. 2001;18:1585–1592. [PubMed]
  • Anisimova M, Bielawski JP, Yang Z. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol Biol Evol. 2002;19:950–958. [PubMed]
  • Anisimova M, Nielsen R, Yang Z. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 2003;164:1229–1236. [PMC free article] [PubMed]
  • Bishop JG, Dean AM, Mitchell-Olds T. Rapid evolution in plant chitinases: molecular targets of selection in plant–pathogen coevolution. Proc Natl Acad Sci USA. 2000;97:5322–5327. [PMC free article] [PubMed]
  • Brown CJ, Garner EC, Dunker AK, Joyce P. The power to detect recombination using the coalescent. Mol Biol Evol. 2001;18:1421–1424. [PubMed]
  • Cooper JE, Feil EJ. Multilocus sequence typing—what is resolved? Trends Microbiol. 2004;12:373–377. [PubMed]
  • Crandall KA, Kelsey CR, Imamichi H, Lane HC, Salzman NP. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol Biol Evol. 1999;16:372–382. [PubMed]
  • Dingle KE, Colles FM, Wareing DRA, Ure R, Fox AJ, Bolton FE, Bootsma HJ, Willems RJL, Urwin R, Maiden MCJ. Multilocus sequence typing for Campylobacter jejuni. J Clin Microbiol. 2001;39:14–23. [PMC free article] [PubMed]
  • Endo T, Ikeo K, Gojobori T. Large-scale search for genes on which positive selection may operate. Mol Biol Evol. 1996;13:658–690. [PubMed]
  • Enright MC, Day NP, Davies CE, Peacock SJ, Spratt BG. Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. J Clin Microbiol. 2000;38:1008–1015. [PMC free article] [PubMed]
  • Enright MC, Spratt BG, Kalia A, Cross JH, Bessen DE. Multilocus sequence typing of Streptococcus pyogenes and the relationships between emm type and clone. Infect Immun. 2001;69:2416–2427. [PMC free article] [PubMed]
  • Feil EJ, Cooper JE, Grundmann H, Robinson DA, Enright MC, Berendt T, Peacock SJ, Maynard-Smith J, Murphy M, Spratt BG, Moore CE, Day NPJ. How clonal is Staphylococcus aureus? J Bacteriol. 2003;185:3307–3316. [PMC free article] [PubMed]
  • Feil EJ, Enright MC, Spratt BG. Estimating the relative contribution of mutation and recombination to clonal diversification: a comparison between Neisseria meningitidis and Streptococcus pneumoniae. Res Microbiol. 2000a;151:465–469. [PubMed]
  • Feil EJ, Holmes EC, Bessen DE, Chan MS, Day NP, Enright MC, Goldstein R, Hood DW, Kalia A, Moore CE, Zhou J, Spratt BG. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci USA. 2001;98:182–187. [PMC free article] [PubMed]
  • Feil EJ, Maiden MCJ, Achtman M, Spratt BG. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol. 1999;16:1496–1502. [PubMed]
  • Feil EJ, Maynard-Smith J, Enright MC, Spratt BG. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics. 2000b;154:1439–1450. [PMC free article] [PubMed]
  • Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17:368–376. [PubMed]
  • Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39:783–791.
  • Felsenstein J. Inferring Phylogenies. Sinauer Associates; Sunderland, MA: 2004.
  • Fu YX, Li WH. Maximum likelihood estimation of population parameters. Genetics. 1993;134:1261–1270. [PMC free article] [PubMed]
  • Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–736. [PubMed]
  • Hanage WP, Auranen K, Syrjanen R, Herva E, Makela PH, Kilpi T, Spratt BG. Ability of pneumococcal serotypes and clones to cause acute otitis media: implications for the prevention of otitis media by conjugate vaccines. Infect Immun. 2004;72:76–81. [PMC free article] [PubMed]
  • Hasegawa M, Kishino H, Yano T. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22:160–174. [PubMed]
  • Haydon DT, Bastos AD, Knowles NJ, Samuel AR. Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics. 2001;157:7–15. [PMC free article] [PubMed]
  • Homan WL, Tribe D, Poznanski S, Li M, Hogg G, Spalburg E, Van Embden JD, Willems RJ. Multilocus sequence typing scheme for Enterococcus faecium. J Clin Microbiol. 2002;40:1963–1970. [PMC free article] [PubMed]
  • Hudson RR. Gene genealogies and the coalescent process. In: Futuyma D, Antonovics J, editors. Oxford Surveys in Evolutionary Biology. Vol. 7. Oxford University Press; Oxford: 1990. pp. 23–36.
  • Hudson RR. Two-locus sampling distributions and their application. Genetics. 2001;159:1805–1817. [PMC free article] [PubMed]
  • Huelsenbeck JP, Crandall KA. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu Rev Ecol Syst. 1997;28:437–466.
  • Jolley KA, Kalmusova J, Feil EJ, Gupta S, Musilek M, Kriz P, Maiden MC. Carried meningococci in the Czech Republic: a diverse recombining population. J Clin Microbiol. 2000;38:4492–4498. [PMC free article] [PubMed]
  • Jones N, Bohnsack JF, Takahashi S, Oliver KA, Chan MS, Kunst F, Glaser P, Rusniok C, Crook DW, Harding RM, Bisharat N, Spratt BG. Multilocus sequence typing system for group B streptococcus. J Clin Microbiol. 2003;41:2530–2536. [PMC free article] [PubMed]
  • Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro HM, editor. Mammalian Protein Metabolism. Academic Press; New York, NY: 1969. pp. 21–132.
  • Kelsey CR, Crandall KA, Voevodin AF. Different models, different trees: the geographic origin of PTLV-I. Mol Phylogenet Evol. 1999;13:336–347. [PubMed]
  • Kimura M. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16:111–120. [PubMed]
  • Kimura M. Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci USA. 1981;78:454–458. [PMC free article] [PubMed]
  • Kuhner MK, Yamato J, Felsenstein J. Estimating effective population size and mutation from sequence data using Metropolis-Hastings sampling. Genetics. 1995;140:1421–1430. [PMC free article] [PubMed]
  • Kuhner MK, Yamato J, Felsenstein J. Maximum likelihood estimation of population growth rates based on the coalescent. Genetics. 1998;149:429–434. [PMC free article] [PubMed]
  • Li W-H. Molecular Evolution. Sinauer Associates; Sunderland, MA: 1997.
  • Maddison DR, Maddison WP. MacClade 4: Analysis of Phylogeny and Character Evolution. Sinauer Associates; Sunderland, MA: 2000.
  • Maggi-Solcà N, Bernasconi MV, Valsangiacomo C, Van Doorn LJ, Piffaretti JC. Population genetics of Helicobacter pylori in the southern part of Switzerland analysed by sequencing of four housekeeping genes (atpD, glnA, scoB and recA), and by vacA, cagA, iceA and IS605 genotyping. Microbiology. 2001;147:1693–1707. [PubMed]
  • Maiden MCJ, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG. Multilocus-sequencing typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA. 1998;95:3140–3145. [PMC free article] [PubMed]
  • Maynard-Smith J. Do bacteria have population genetics? In: Baumberg S, Young JPW, Saunders JR, Wellington EMH, editors. Population Genetics of Bacteria. Society for General Microbiology, Symposium 52. Cambridge University Press; London: 1995. pp. 1–12.
  • Maynard-Smith J, Feil EJ, Smith NH. Population structure and evolutionary dynamics of pathogenic bacteria. Bioessays. 2000;22:1115–1122. [PubMed]
  • Maynard-Smith J, Smith NH. Detecting recombination from gene trees. Mol Biol Evol. 1998;15:590–599. [PubMed]
  • Maynard-Smith J, Smith NH, O’Rourke M, Spratt BG. How clonal are bacteria? Proc Natl Acad Sci USA. 1993;90:4384–4388. [PMC free article] [PubMed]
  • McClellan DA, McCracken KG. Estimating the influence of selection on the variable amino acid sites of the cytochrome B protein functional domain. Mol Biol Evol. 2001;18:917–925. [PubMed]
  • McClellan DA, Palfreyman EJ, Smith MJ, Moss JL, Christensen RG, Sailsbery JK. Physicochemical evolution and molecular adaptation of the cetacean and artiodactyl cytochrome b proteins. Mol Biol Evol. 2005;22:437–455. [PubMed]
  • McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002;160:1231–1241. [PMC free article] [PubMed]
  • Meats E, Feil EJ, Stringer S, Cody AJ, Goldstein R, Kroll JC, Popovic T, Spratt BG. Characterization of encapsulated and noncapsulated Haemophilus influenzae and determination of phylogenetic relationships by multilocus sequence typing. J Clin Microbiol. 2003;41:1623–1636. [PMC free article] [PubMed]
  • Miyata T, Yasunaga T. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and nonsynonymous amino acid substitutions from homologous nucleotide sequences and its applications. J Mol Evol. 1980;16:23–36. [PubMed]
  • Nei M, Gojobori T. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–426. [PubMed]
  • Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998;148:929–936. [PMC free article] [PubMed]
  • Nordborg M. Coalescent theory. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of Statistical Genetics. John Wiley and Sons Ltd; Chichester: 2001. pp. 179–212.
  • Peek AS, Souza V, Eguiarte LE, Gaut BS. The interaction of protein structure, selection, and recombination on the evolution of the type 1 fimbrial major subunit (fimA) from Escherichia coli. J Mol Evol. 2001;52:193–204. [PubMed]
  • Posada D. The effect of branch length variation on the selection of models of molecular evolution. J Mol Evol. 2001;52:434–444. [PubMed]
  • Posada D, Crandall KA. Modeltest: testing the model of DNA substitution. Bioinformatics. 1998;14:817–818. [PubMed]
  • Posada D, Crandall KA. A comparison of different strategies for selecting models of DNA substitution. Syst Biol. 2001a;50:580–601. [PubMed]
  • Posada D, Crandall KA. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA. 2001b;98:13757–13762. [PMC free article] [PubMed]
  • Posada D, Crandall KA. Intraspecific gene genealogies: trees grafting into networks. TREE. 2001c;16:37–45. [PubMed]
  • Posada D, Crandall KA, Holmes EC. Recombination in evolutionary genomics. Annu Rev Genet. 2002;36:15–91. [PubMed]
  • Spratt BG, Maiden MCJ. Bacterial population genetics, evolution and epidemiology. Philos Trans R Soc Lond B. 1999;354:701–710. [PMC free article] [PubMed]
  • Suerbaum S, Lohrengel M, Sonnevend A, Ruberg F, Kist M. Allelic diversity and recombination in Campylobacter jejuni. J Bacteriol. 2001;183:2553–2559. [PMC free article] [PubMed]
  • Suerbaum S, Smith JM, Bapumia K, Morelli G, Smith NH, Kunstmann E, Dyrek I, Achtman M. Free recombination within Helicobacter pylori. Proc Natl Acad Sci USA. 1998;95:12619–12624. [PMC free article] [PubMed]
  • Swofford DL. Phylogenetic Analysis Using Parsimony (PAUP and other methods) Sinauer Associates; Sunderland, MA: 2003.
  • Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10:512–526. [PubMed]
  • Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura RM, editor. Some Mathematical Questions in Biology—DNA Sequence Analysis. American Mathematical Society; Providence, RI: 1986. pp. 57–86.
  • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The clustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;24:4876–4882. [PMC free article] [PubMed]
  • Urwin R, Holmes EC, Fox AJ, Derrick JP, Maiden MCJ. Phylogenetic evidence for frequent positive selection and recombination in the meningococcal surface antigen porB. Mol Biol Evol. 2002;19:1686–1694. [PubMed]
  • Urwin R, Maiden MCJ. Multi-locus sequence typing: a tool for global epidemiology. Trends Microbiol. 2003;11:479–487. [PubMed]
  • Vilas-Boas G, Sanchis V, Lereclus D, Lemos MV, Bourguet D. Genetic differentiation between sympatric populations of Bacillus cereus and Bacillus thuringiensis. Appl Environ Microbiol. 2002;68:1414–1424. [PMC free article] [PubMed]
  • Viscidi RP, Demma JC. Genetic diversity of Neisseria gonorrhoeae housekeeping genes. J Clin Microbiol. 2003;41:197–204. [PMC free article] [PubMed]
  • Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7:256–276. [PubMed]
  • Whittam TS. Genetic population structure and pathogenicity in enteric bacteria. In: Baumberg S, Young JPW, Wellington EMH, Saunders JR, editors. Population Genetics of Bacteria. Cambridge University Press; 1995. pp. 217–245.
  • Woolley S, Johnson J, Smith MJ, Crandall KA, McClellan DA. TreeSAAP: selection on amino acid properties using phylogenetic trees. Bioinformatics. 2003;19:671–672. [PubMed]
  • Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997. pp. 555–556. http://abacus.gene.ucl.ac.uk/software/paml.html. [PubMed]
  • Yang Z, Goldman N, Friday A. Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation. Mol Biol Evol. 1994;11:316–324. [PubMed]
  • Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002;19:908–917. [PubMed]
  • Yang Z, Nielsen R, Goldman N, Pedersen AMK. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000a;155:431–449. [PMC free article] [PubMed]
  • Yang Z, Swanson WJ. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among sites classes. Mol Biol Evol. 2002;19:49–57. [PubMed]
  • Yang Z, Swanson WJ, Vacquier VD. Maximum likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol Biol Evol. 2000b;17:1446–1455. [PubMed]
  • Zharkikh A. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol. 1994;39:315–329. [PubMed]