Copyright © 2008 Hao and Golding; licensee BioMed Central Ltd.

Uncovering rate variation of lateral gene transfer during bacterial genome evolution

Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1, Canada

Corresponding author. Weilong Hao: Haow/at/indiana.edu; G Brian Golding: Golding/at/McMaster.CA

Received December 19, 2007; Accepted May 20, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been corrected. See BMC Genomics. 2008 November 25; 9: 556.

Abstract

Background

Large scale genome arrangement, such as whole gene insertion/deletion, plays an important role in bacterial genome evolution. Various methods, such as parsimony methods and maximum likelihood methods, have been employed to study the dynamic process of gene insertions and deletions. Previous maximum likelihood studies have assumed that the rate of gene insertions/deletions is constant over different genes. This assumption is unrealistic. For instance, it has been shown that informational genes are less likely to be laterally transferred than non-informational genes. However, how much of the variation in gene transfer rates is due to the difference between informational genes and non-informational genes is unclear. In this study, a Γ-distribution was incorporated in the likelihood estimation by considering rate variation for gene insertions/deletions between genes. This makes it possible to address whether a difference between informational genes and non-informational genes is the main contributor to rate variation of lateral gene transfers.

Results

The results show that models incorporating rate variation fit the data better than do constant rate models in many phylogenetic groups.
Even though informational genes are less likely to be laterally transferred than non-informational genes, the degree of rate variation for insertions/deletions did not change dramatically and remained high even when informational genes were excluded from the study. This suggests that the variation in the rate of insertions/deletions is not due mainly to the simple difference between informational genes and non-informational genes. Among genes that are not classified as informational, and among the informational genes themselves, there are still large differences in the rates at which these genes are inserted and deleted.

Conclusion

While the difference between informational and non-informational genes contributes to rate variation, it accounts for only a small fraction of the variation present; a substantial amount of rate variation for insertions/deletions remains both among informational genes and among non-informational genes.

Background

Gene insertions and deletions are widely acknowledged to play an essential role in shaping bacterial genomes during evolution [1-4]. Parsimony methods have been employed to understand the process of gene insertions and deletions [5-8]. However, parsimony methods fail to distinguish parallel deletions and insertions on multiple branches [9-11]. Maximum likelihood methods can overcome the problem of parallel deletions and insertions by making use of transition probabilities [12]. Recently, a maximum likelihood method was employed to study gene insertions and deletions assuming constant rates across genes in a given genome [13]. However, the assumption of constant insertion/deletion rates among genes is unrealistic. For example, it has been shown that informational genes, such as those involved in transcription and translation, are less likely to be laterally transferred than are operational genes responsible for metabolic processes [14,15]. This observation forms the basis of the "complexity hypothesis".
Beyond the difference between informational genes and operational genes, however, the causes of rate variation for insertions/deletions remain unclear. A maximum likelihood study of rate variation for gene insertions/deletions is therefore useful for addressing these questions. Here, a Γ-distribution has been incorporated into a maximum likelihood estimation of gene insertion/deletion rates (Figure 1).
The results reveal that rate variation of gene insertions/deletions is much more complex than a simple difference between informational genes and operational genes; a high degree of rate variation for insertions/deletions remains both among informational genes and among non-informational genes.

Results

The same set of data from the Bacillus group in [13] was used initially to test the performance of the models incorporating rate variation. Following the previous study, three rate conditions (μ1 = μ2 = μ3; μ1, μ2 = μ3; and μ1, μ2, μ3) were assumed (see Figure 2).
The likelihood of the models was improved significantly by incorporating a Γ-distribution for rate variation, compared with the corresponding constant rate models (χ² = 2ΔLnL > 3.84 with d.f. = 1; see Table 1). Of the three models, the likelihood of the single constant-rate model improved the most when rate variation was incorporated. In the single-rate model with a Γ-distribution, the maximum likelihood estimate (MLE) of the insertion/deletion rate is 5.29, much greater than the rate of 0.51 in the constant rate model. The rate variation parameter α is 0.37, indicating a high degree of rate variation for gene insertions/deletions: a small subset of genes undergoes rapid gene turnover.
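The model comparison above is a likelihood ratio test between nested models that differ by one parameter (α). The following is a minimal Python sketch of that test using only the standard library; the function names are hypothetical and the log-likelihood values in the usage line are placeholders, not values from Table 1. With one degree of freedom the χ² upper-tail probability reduces to erfc(√(x/2)), so no statistics package is needed.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 d.f.

    With one degree of freedom X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(lnl_constant: float, lnl_gamma: float, critical: float = 3.84):
    """Compare a constant-rate model with its Gamma rate-variation counterpart.

    Returns (2 * delta lnL, p-value, significant at the 5% level).
    """
    stat = 2.0 * (lnl_gamma - lnl_constant)
    return stat, chi2_sf_df1(stat), stat > critical


# Hypothetical log-likelihoods, for illustration only.
stat, p, significant = likelihood_ratio_test(-1000.0, -990.0)
print(f"2*dLnL = {stat:.2f}, p = {p:.2e}, significant: {significant}")
```

Because the constant-rate model corresponds to the boundary α → ∞ of the Γ model, using the χ² (d.f. = 1) reference distribution in this setting is generally considered conservative.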
The two-rate and three-rate models both assume different rates on certain parts of the phylogeny. After incorporating rate variation, both showed significantly better fits to the data than did the single-rate model (Table 1), but the MLE rates are similar to those estimated from the constant rate models. In the two-rate model, the rates with Γ-distributed rate variation are 4.73 and 0.37, versus 4.42 and 0.35 with constant rates. Similarly, in the three-rate model, the rates with rate variation are 4.49, 0.32 and 1.67, versus 3.92, 0.28 and 1.23 with constant rates. Both models consistently support the observation that recently transferred genes tend to have high rates of gene insertions/deletions, as noted in [13]. Both models showed lower levels of rate variation (greater α values) than the single-rate model (α = 5.83 and 3.04 versus 0.37), even though incorporating rate variation also improved their likelihood significantly.

To gain a clearer picture of rate variation of lateral gene transfer across the bacterial domain, the study was expanded to 173 complete bacterial genomes in 25 phylogenetic groups (Tables 2 and 3). For each phylogenetic group, two phylogenies were constructed: one (the select-genes tree) is based on a group of selected genes, and the other (the common-genes tree) is based on genes present in all the relevant taxa, as described in the Methods section. Of the 25 phylogenetic groups, 20 showed a fairly high level of rate variation for gene insertions/deletions among genes, and the confidence interval of each α value is narrow (Table 4). It is also striking that estimates using phylogenies built from different sets of genes, or using different phylogenies, are similar (Tables 2, 3, and 4). Five groups did not show a significant level of rate variation for gene insertions/deletions (α is ∞): Candidatus, Ehrlichia, Lactobacillus, Mycoplasma, and Synechococcus. The lower bound of each infinite α value was also estimated.
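To make the α values above concrete, the sketch below computes equal-probability discrete Γ rate categories (mean 1) for α = 0.37, 3.04 and 5.83. The paper does not say how many categories it used or which discretization; four categories and the median-based variant of Yang's (1994) discrete Γ are assumptions here, and the function names are hypothetical. With small α, most of the total rate sits in the top category, which is what "a small subset of genes with rapid gene turn-over" means.

```python
import math


def lower_reg_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(p: float, alpha: float) -> float:
    """Quantile of Gamma(shape=alpha, rate=alpha), which has mean 1, found by bisection."""
    def cdf(r: float) -> float:
        return lower_reg_gamma(alpha, alpha * r)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> list:
    """Median of each of k equal-probability categories, rescaled so the rates average 1."""
    medians = [gamma_quantile((2 * i + 1) / (2 * k), alpha) for i in range(k)]
    scale = k / sum(medians)
    return [m * scale for m in medians]


for alpha in (0.37, 3.04, 5.83):
    print(alpha, [round(r, 3) for r in discrete_gamma_rates(alpha)])
```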
The Ehrlichia group shows a very broad interval around the "maximum" likelihood value, and the lower bound of its α value is 0.39 (Table 4). The lack of a distinguishable difference between the rate variation model and the constant rate model in Ehrlichia might be due to the limited size of the data (number of gene families) and/or accelerated evolution at the sequence level in this intracellular group.

The relationship between the rate variation parameter α of the Γ-distribution and the average branch length of each group was also examined (Figure 3).
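The text does not state which statistic was used to measure the association between α and average branch length. As one option, here is a standard-library Python sketch of Spearman's rank correlation that skips groups with an infinite α (no detectable rate variation); the function names are hypothetical and the numbers in the usage line are made-up placeholders, not data from Table 4.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def alpha_branch_correlation(alphas, branch_lengths):
    """Correlate alpha with average branch length, ignoring groups where alpha is infinite."""
    kept = [(a, b) for a, b in zip(alphas, branch_lengths) if math.isfinite(a)]
    xs, ys = zip(*kept)
    return spearman(list(xs), list(ys)), len(kept)


# Placeholder values for illustration only.
print(alpha_branch_correlation([0.4, 0.9, float("inf"), 1.6], [0.05, 0.12, 0.30, 0.20]))
```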
Maximum likelihood estimates of gene insertion/deletion rates were obtained separately for informational genes and for non-informational genes, in the absence of rate variation. The rates in informational genes are lower than those in non-informational genes in all groups, but none of them is zero (Additional file 1). This is consistent with previous studies showing that informational genes have slower rates of gene insertion/deletion than non-informational genes [14,15] but are not completely free of gene movement [16]. To evaluate whether the difference in informational genes contributes most to rate variation of gene transfers, maximum likelihood estimation with rate variation for gene insertions/deletions was repeated after excluding all informational genes. The α values excluding informational genes are remarkably similar to those including informational genes (Figure 4).
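The Methods compare this exclusion against two controls of the same size: removing the same number of the most conserved genes, and removing the same number of randomly chosen genes. A minimal sketch of building the three reduced data sets follows. How "most conserved" was scored is not stated, so the `conservation` score is a hypothetical input, and the function name and seed are illustrative.

```python
import random


def reduced_datasets(families, informational, conservation, seed=0):
    """Return three reduced lists of gene families, each missing the same number of families.

    families:      list of gene-family identifiers
    informational: set of identifiers classified as informational
    conservation:  dict mapping identifier to a conservation score
                   (hypothetical; higher means more conserved)
    """
    n_removed = sum(1 for f in families if f in informational)

    # 1. Exclude the informational gene families.
    without_info = [f for f in families if f not in informational]

    # 2. Exclude the same number of the most conserved families.
    ranked = sorted(families, key=lambda f: conservation[f], reverse=True)
    most_conserved = set(ranked[:n_removed])
    without_conserved = [f for f in families if f not in most_conserved]

    # 3. Exclude the same number of randomly chosen families.
    rng = random.Random(seed)
    dropped = set(rng.sample(families, n_removed))
    without_random = [f for f in families if f not in dropped]

    return without_info, without_conserved, without_random
```

Each reduced list would then be passed to the same rate-variation likelihood fit, and the resulting α values compared.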
Discussion

The accuracy of maximum likelihood estimates of gene indel rates depends on having a robust phylogeny for the genomes under study. Phylogenies obtained from single genes can sometimes be distorted by rampant lateral gene transfer (LGT) [2,6,17,18], and rRNA sequences may not be useful because they lack informative characters differentiating closely related species and have varying functional constraints across the molecule [19,20]. We used a concatenated DNA sequence obtained by joining the sequences of genes that are commonly present in many bacterial genomes. For the select-genes tree construction, a set of genes was chosen from those reported in previous studies [21,22]. If more than one phylogeny was generated for a group, all phylogenies were used, weighted by their occurrence, to overcome the uncertainty of relying on just one. To avoid the confounding effects of duplication during evolution [23,24], duplicated genes were excluded from phylogeny construction. Because of the broad spectrum of species analyzed in this study, few genes are free of both duplication and lateral gene transfer across all groups. Consequently, the genes used for phylogeny reconstruction may differ between groups (details are given as supplementary information at [25]).

To assess the robustness of each select-genes tree, common-genes trees were reconstructed using genes present in all members of each phylogenetic group. When the common-genes tree and the select-genes tree are not topologically identical, a supertree was constructed. Twelve groups have identical topologies for the select-genes tree and the common-genes tree; the remaining 13 groups do not. Note that many of these differences are due either to a lack of phylogenetic signal at the tips of a phylogeny or to the placement of the root (see supplementary information [25] for more details).
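A minimal sketch of building the concatenated alignment (supermatrix) while skipping genes that are missing from, or duplicated in, any taxon. It assumes the per-gene alignments already exist (the paper used MUSCLE) and are held in a nested dictionary; that data layout and the function name are hypothetical.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments into one concatenated sequence per taxon.

    alignments: dict mapping gene -> dict mapping taxon -> list of aligned
                sequences; more than one entry means the gene is duplicated
                in that taxon.
    taxa:       the taxa that must each carry exactly one copy of a gene.
    Returns (dict taxon -> concatenated sequence, list of genes used).
    """
    rows = {t: [] for t in taxa}
    used = []
    for gene in sorted(alignments):
        copies = alignments[gene]
        # Skip genes that are absent from, or duplicated in, any taxon.
        if any(len(copies.get(t, [])) != 1 for t in taxa):
            continue
        if len({len(copies[t][0]) for t in taxa}) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        for t in taxa:
            rows[t].append(copies[t][0])
        used.append(gene)
    return {t: "".join(parts) for t, parts in rows.items()}, used
```

Genes are visited in sorted order so that the column order of the concatenated matrix is reproducible.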
One way to achieve a more accurate phylogeny is to use a large number of genes in comprehensive phylogenetic studies, such as supermatrices (concatenated genes) and supertrees [26-30]. The more data included in a phylogenomic analysis, the more likely it is to overcome stochastic errors [27,31]. In this study, the supertree did not always favor the common-genes tree over the select-genes tree. Indeed, in two groups the supertree supports the select-genes tree (Additional file 4). Slowly evolving genes are sometimes more informative for phylogeny construction, since fast-evolving genes might cause problems such as long branch attraction [28,32]. The tree length of each gene was computed from each phylogeny and plotted in Additional file 5. The genes selected for phylogeny construction clearly have relatively slow evolutionary rates compared with all common genes. In 4 groups, the supertree supports neither the select-genes tree nor the common-genes tree. The lack of congruence in these groups is likely due to insufficient taxon sampling [27]. More accurate trees might be obtained as more complete genome sequences become available. Importantly, the maximum likelihood estimates based on the common-genes trees are remarkably similar to the estimates based on the select-genes trees (Tables 2, 3, 4 and 5, and Figure 3).

To further explore how much, if at all, different phylogenies might alter the results, maximum likelihood estimation based on plausible alternative topologies was investigated. One hundred bootstrap replicates of the common-genes alignment were generated for each group. For the selected genes, alternative topologies were obtained from the MRBAYES output. If there were more than 10 distinct topologies, the 10 with the highest likelihood were chosen for further maximum likelihood estimation. The maximum likelihood estimates are shown in Additional files 6 and 7.
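When several topologies are retained, the Methods weight them by their posterior probabilities, that is, by how often each topology occurs in the MRBAYES sample. A standard-library sketch of that weighting with an optional cap on the number of distinct topologies kept is shown below. Two caveats: it assumes the sampled trees have already been converted to canonical topology strings (branch lengths removed, taxa ordered consistently), which is not shown; and it ranks topologies by frequency, whereas the top-10 selection in the paragraph above is by likelihood, which would need per-topology likelihoods instead. The function name is hypothetical.

```python
from collections import Counter


def topology_weights(sampled_topologies, top_n=None):
    """Weight distinct topologies by their frequency in a posterior sample.

    sampled_topologies: iterable of canonical topology strings.
    top_n: optionally keep only the top_n most frequent topologies,
           renormalising their weights to sum to 1.
    """
    counts = Counter(sampled_topologies)
    kept = counts.most_common(top_n)  # most_common(None) returns every topology
    total = sum(count for _, count in kept)
    return {topology: count / total for topology, count in kept}


sample = ["((A,B),C)"] * 6 + ["((A,C),B)"] * 3 + ["((B,C),A)"]
print(topology_weights(sample, top_n=2))
```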
The α values are clearly similar among the different phylogenies. The removal of informational genes results in little change in the rate variation parameter α, and this holds for every phylogeny. Furthermore, the likelihood estimates for the Bacillus group based on a phylogeny constructed from different genes (Tables 2 and 3) are similar to those based on the phylogeny of the previous study (Table 1). The slight differences are due to the removal of short sequences in this study and to differences between phylogenies constructed from different sequences. The results therefore do not seem to be an artifact of the genes included or of the phylogeny reconstruction.

Informational genes are known to be less likely to undergo lateral gene transfer [14], which is also the core of the "complexity hypothesis" [15,33]. In this study, informational genes were found to have lower rates of gene insertions/deletions than non-informational genes (Additional file 1). However, no group has an informational-gene insertion/deletion rate of 0, suggesting that informational genes are not completely free of gene movement. In fact, several ribosomal protein coding genes have been deleted from Streptococcus mutans (Additional file 8). The change in the rate variation parameter α after excluding informational genes resembles the change after removing the same number of randomly chosen genes, rather than the change after excluding the same number of the most conserved genes (Additional file 9). Furthermore, different cutoffs used to identify informational genes changed only the number of informational genes and did not affect the degree of rate variation for insertions/deletions in non-informational genes (Additional file 10). There is a great deal of rate variation for gene insertions/deletions in non-informational genes, and there is also a significant level of rate variation in informational genes.
In other words, the different rates of informational and non-informational genes described by the "complexity hypothesis" can explain only a small part of the rate variation for gene insertions/deletions. Similarly, our simulation study showed that the high level of rate variation cannot be explained solely by the fast turnover rates of recently transferred genes (Additional file 11).

It has been suggested that different cutoff thresholds for identifying homologues might affect the identification of some gene gains [34], but different thresholds result in little change in the number of gene families [35] and in the rates of gene insertions/deletions [36]. In this study, results using different thresholds were similar (data not shown). It is important to note that gene duplication was not taken into consideration in this study, since our focus was on insertions/deletions of gene families rather than intraspecific gene family duplication. This avoids the difficulty of distinguishing some gene transfers from gene duplications [4,11]. Recently, some studies have suggested that duplicated genes, or genes with a high propensity for duplication, might be more likely to be involved in lateral gene transfer [37,38]. Methods incorporating gene duplication information are desirable for future studies.

For the 20 groups that showed a significant improvement in likelihood when rate variation was added, there is a positive association between the rate variation parameter α and the average branch length. Higher degrees of rate variation for gene insertions/deletions (smaller α) are therefore expected in closely related groups. The seven closely related Bacillus genomes in the Bc group were analyzed separately and, as expected, a high degree of rate variation was observed (data not shown). Similarly, the five groups with an infinite α value show fairly high levels of divergence within the group in terms of average branch length.
An acceleration of sequence evolution in endosymbiont genomes has been acknowledged [39]. Accelerated rates of evolution might affect the branch lengths of the phylogeny used in the analyses and might also affect the identification of homologues within each phylogenetic group. Four endosymbiont groups showed strongly accelerated rates of evolution: Candidatus, Ehrlichia, Mycoplasma, and Rickettsia. Analyses after removing these four groups also showed similar results (data not shown).

There are two possible explanations for the correlation between the average branch length and the α value. First, the observed correlation might be due to a lack of power of maximum likelihood estimation in distantly related groups; it has previously been shown that comparisons among distantly related species tend to infer lower rates of insertions/deletions [13]. Second, if maximum likelihood estimation in this study has enough power in distantly related groups, the results might suggest that rate variation for insertions/deletions has a strong local effect that becomes weaker as evolution progresses. This strong local effect might be due, at least partially, to high variability in recently transferred genes. Many recently transferred genes are known to evolve faster and might be eliminated from the genome rapidly [8,13], while some transferred genes that play roles in long-term adaptation might become fixed [40-43] and integrated into the functional network [44,45]. Maximum likelihood estimates from simulated data supported these explanations. As the number of insertions/deletions increased, a larger proportion of insertions/deletions became undetectable; at the same time, sister taxa shared fewer common genes and had more unique genes (Additional file 11). If this correlation holds true, rate variation should become undetectable after a long enough time period.
Hence, over a long time period, genes would have roughly the same chance of being transferred. This has been shown in some recent studies. By examining the Cyanobacteria group, Zhaxybayeva et al. reported that genes from all functional categories are subject to gene transfer [46]. In addition, it was suggested that among all sequenced gene families, at least two-thirds, and probably all, have been affected by LGT at some time in their evolutionary past [35]. On the other hand, few genes in a genome appear to be affected by lateral gene transfer when closely related species are compared. This might partially explain the contradictory views of lateral gene transfer at different phylogenetic scales. It has been reported that genes from closely related species tend to have clearer tree-like relationships than those from distantly related species [47,48], and studies analyzing genomes at different degrees of divergence do not show congruent results [18,49]. There may be several sources of noise in the data in Figure 3.

Conclusion

Maximum likelihood models incorporating rate variation allow us to evaluate how much the difference between informational and non-informational genes contributes to rate variation of gene insertions/deletions. Consistent with the "complexity hypothesis", informational genes are less likely to be laterally transferred than non-informational genes. However, the difference between informational and non-informational genes accounts for only a small fraction of the variation present; a substantial amount of rate variation for insertions/deletions remains both among informational genes and among non-informational genes. Furthermore, rate variation has a strong local effect and becomes blurred over evolutionary time.

Methods

A maximum likelihood model was used as described in [13].
In brief, gene presence or absence was treated as a binary character with states 0 (absent) and 1 (present) (Figure 1). Since genes absent in all of the taxa are unobservable, the results must be corrected for missing data. Hao and Golding (2006) used the same correction for missing data as was used for missing restriction sites in [58], so that the results are conditional on observing the gene in at least one species. This is

$$L = \frac{L^{+}}{1 - L^{-}}$$

Here $L^{-}$ is the likelihood of a gene being absent in all taxa, while $L^{+}$ is the likelihood of the observed data for a gene present in at least one genome. After incorporating a discrete Γ model with $K$ equally probable rate categories $r_1, \dots, r_K$, the likelihood of observing the pattern of gene family $i$ will be

$$L_i = \frac{\frac{1}{K}\sum_{k=1}^{K} L_i^{+}(r_k)}{1 - \frac{1}{K}\sum_{k=1}^{K} L^{-}(r_k)}$$

At the root of the tree we can compute the overall likelihood as

$$L_{\mathrm{total}} = \prod_{i=1}^{N} L_i$$

Here $N$ is the total number of gene families. To obtain the maximum likelihood estimates, the ins/del rates together with the rate variation parameter α of the Γ-distribution were optimized to find the values that maximize the likelihood of observing the gene patterns.

The same set of data from the Bacillus group in [13] was used to test the performance of the likelihood model with rate variation. As in the previous study, three models were examined (Figure 2).

To apply the improved maximum likelihood estimation across a broad spectrum of the bacterial domain, 173 complete bacterial genomes in 25 phylogenetic groups (including Bacillus) were examined. Genomes were placed in the same group if they shared a genus name in the NCBI taxonomy database, and a group was formed whenever at least four genomes from that genus had been completely sequenced. Following previous studies, Oceanobacillus iheyensis and Geobacillus kaustophilus were included in the Bacillus group [13], and Ureaplasma urealyticum was included in the Mycoplasma group [8]. Since some highly diverged Synechococcus species are closely related to Prochlorococcus species [59], the Synechococcus group in this study includes only Synechococcus sp. strains.
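The likelihood model described at the start of this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: (i) presence/absence evolves as a two-state continuous-time Markov chain with a hypothetical gain rate and loss rate (the exact parameterization is given in [13]); (ii) the root state is drawn from the chain's stationary distribution; and (iii) the Γ categories are supplied as equally weighted rate multipliers, for example from a discrete Γ approximation. Per-family likelihoods are computed with Felsenstein's pruning algorithm, averaged over rate categories, and conditioned on the family not being absent from every taxon. The tree layout and all function names are illustrative.

```python
import math


def transition_matrix(gain, loss, t):
    """Two-state chain (0 = absent, 1 = present): P[i][j] = Pr(j at time t | i at time 0)."""
    s = gain + loss
    pi0, pi1 = loss / s, gain / s  # stationary probabilities
    e = math.exp(-s * t)
    return [[pi0 + pi1 * e, pi1 * (1.0 - e)],
            [pi0 * (1.0 - e), pi1 + pi0 * e]]


def conditional(node, pattern, gain, loss):
    """Felsenstein pruning: [L(data below | state 0), L(data below | state 1)].

    A leaf is {"name": taxon}; an internal node is
    {"children": [(child_node, branch_length), ...]}.
    """
    if "name" in node:
        return [0.0, 1.0] if pattern[node["name"]] else [1.0, 0.0]
    result = [1.0, 1.0]
    for child, t in node["children"]:
        below = conditional(child, pattern, gain, loss)
        p = transition_matrix(gain, loss, t)
        for i in (0, 1):
            result[i] *= p[i][0] * below[0] + p[i][1] * below[1]
    return result


def pattern_likelihood(tree, pattern, gain, loss):
    """Unconditional likelihood of one pattern; root state from the stationary distribution."""
    s = gain + loss
    at_root = conditional(tree, pattern, gain, loss)
    return (loss / s) * at_root[0] + (gain / s) * at_root[1]


def log_likelihood(tree, patterns, gain, loss, rate_multipliers):
    """Sum of ln L_i over gene families: equally weighted rate categories, each family
    conditioned on being present in at least one taxon."""
    k = len(rate_multipliers)
    all_absent = {taxon: 0 for taxon in patterns[0]}
    l_minus = sum(pattern_likelihood(tree, all_absent, r * gain, r * loss)
                  for r in rate_multipliers) / k
    total = 0.0
    for pattern in patterns:
        l_plus = sum(pattern_likelihood(tree, pattern, r * gain, r * loss)
                     for r in rate_multipliers) / k
        total += math.log(l_plus / (1.0 - l_minus))
    return total


# Hypothetical three-taxon tree (A, (B, C)) and two gene-family patterns.
tree = {"children": [({"name": "A"}, 0.1),
                     ({"children": [({"name": "B"}, 0.2), ({"name": "C"}, 0.3)]}, 0.05)]}
patterns = [{"A": 1, "B": 1, "C": 0}, {"A": 0, "B": 1, "C": 1}]
print(log_likelihood(tree, patterns, gain=0.5, loss=0.5, rate_multipliers=[0.5, 1.5]))
```

Setting `rate_multipliers=[1.0]` gives a constant-rate model. Maximizing `log_likelihood` over the gain and loss rates, and over α through the rate multipliers, with a numerical optimizer would give estimates analogous to those described in the Results.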
Genome sequences were obtained from the NCBI database [60]. Sixteen non-ribosomal protein coding genes from commonly present genes [21,22] were chosen for phylogeny construction: argS, gcp, gltX, hisS, infB, ksgA, lysS, metG, nusA, nusG, pheS, proS, rpoA, secY, serS, and ychF. In each group, any of these 16 genes that were duplicated were excluded from phylogeny construction for that group. The phylogeny of each group was constructed from the concatenated DNA sequences of these genes using MRBAYES [61] (200,000 generations, sampled every 100 generations, with a Γ-distribution model and an invariant class). For convenience, this tree is called the select-genes tree. The species in each group together with outgroup information, the genes used for phylogeny construction in each group, and the best supported phylogenetic tree are given as supplementary information at [25]. If more than one possible phylogeny was generated for a group, all possible phylogenies were used for further analysis, weighted by their posterior probabilities.

The robustness of these phylogenies was further assessed by concatenating all common genes from each group (labelled the common-genes tree to distinguish it from the select-genes tree). As for the selected genes, common genes that have paralogs were excluded from the analysis to avoid the confounding effects of duplication. The number of genes (and characters) for each group is given in Additional file 4. Sequence alignment was performed individually for each gene using MUSCLE [62], and the aligned sequences were concatenated for phylogenetic analysis. Since MRBAYES [61] limits the maximum number of characters, DNAML in the PHYLIP package was used instead, and the rate variation parameter alpha was estimated using the PUZZLE program [63].

A supertree method was then employed for the groups in which the select-genes tree and the common-genes tree are not topologically identical.
Genes present in at least 4 taxa were used for phylogeny construction, and a supertree was computed with equal weight on all phylogenies using the CLANN program [64]. When the supertree supports neither the select-genes tree nor the common-genes tree, the supertree topology is additionally given as supplementary information at [25]. Note that reconstructed supertrees do not themselves carry branch length information; when needed, branch lengths were estimated from the selected genes by constraining the supertree topology. Average branch length was used as an indicator of the degree of divergence within each group.

The method used to identify members of a gene family has been described in [8]. This study focuses on the presence/absence pattern of each gene family rather than on individual genes; thus, a varying number of genes in a gene family (e.g. duplicated genes) within a group of organisms is not taken into account in the analysis. Non-annotated genes were recovered from the whole genome DNA sequences by TBLASTN searches [65] using annotated genes as queries, and predicted ORFs that are present in only one genome and have no homologues detected by BLAST in any other complete genome were removed from further analysis. In addition, genes encoding proteins shorter than 100 amino acids were removed from further analysis, since BLAST similarity searches have less power to detect homologues of short sequences [65].

Informational genes in each genome were identified using the COG classification (Clusters of Orthologous Groups of proteins) [66]. All available protein sequences with functional annotation from bacterial genomes were downloaded from the COGs database [67]. According to the COG classification, 24,797 genes from 50 complete bacterial genomes are involved in information storage and processing (COG categories J, A, K, L, and B).
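The sequence filters used in this part of the Methods can be expressed as simple predicates. The sketch below drops proteins shorter than 100 amino acids and keeps BLASTP hits with an expect value below 10⁻²⁰ that cover more than 85% of the query, the cutoff given in the following paragraph; the alternative cutoffs (10⁻¹⁰ with 70%, 10⁻⁵ with 50%) are just different arguments. The hit tuple layout is an assumption (its fields correspond to columns available in BLAST tabular output, but parsing is not shown), reading "reciprocal" as a pair being significant in both search directions is our interpretation rather than a stated detail, and all function names are hypothetical.

```python
def long_enough(protein_length, min_length=100):
    """Proteins shorter than min_length amino acids are excluded."""
    return protein_length >= min_length


def significant_pairs(hits, query_lengths, max_evalue=1e-20, min_coverage=0.85):
    """Return (query, subject) pairs passing the e-value and query-coverage cutoffs.

    hits:          iterable of (query_id, subject_id, q_start, q_end, evalue)
    query_lengths: dict mapping query_id to its length in residues
    """
    pairs = set()
    for query, subject, q_start, q_end, evalue in hits:
        coverage = (abs(q_end - q_start) + 1) / query_lengths[query]
        if evalue < max_evalue and coverage > min_coverage:
            pairs.add((query, subject))
    return pairs


def reciprocal_pairs(forward, reverse):
    """Keep forward pairs (a, b) whose mirror (b, a) also passed the reverse search."""
    return {(a, b) for a, b in forward if (b, a) in reverse}
```

A gene from a studied genome would then be labelled informational if it appears in any reciprocal pair with a COG informational protein.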
A reciprocal BLASTP search was conducted to identify homologues of informational proteins in the studied genomes. Significant hits were required to have expect values less than 10⁻²⁰ and to match over 85% of the length of the query protein (10⁻²⁰ + 85%). Different cutoffs (10⁻¹⁰ + 70%, 10⁻⁵ + 50%) were also examined to avoid relying on a single cutoff threshold (Additional file 10). Genes with significant hits to any informational gene were classified as informational genes. Rate variation for gene insertions/deletions was estimated after informational genes were excluded; for comparison, the same number of the most conserved genes, and separately the same number of randomly chosen genes, were also removed. Rate variation for insertions/deletions of informational genes was estimated in the same manner.

Authors' contributions

WH and GBG designed the study. WH carried out all analyses. WH and GBG wrote the manuscript.

Additional file 1
Different ins/del rates between informational genes and non-informational genes. A, estimation was based on the select-genes trees; B, estimation was based on the common-genes trees. Only constant rates with no rate variation are shown, together with the y = x line.

Additional file 2
Insertion/deletion rates of non-informational genes in different phylogenetic groups estimated with rate variation. Estimation was based on the select-genes trees. All informational genes were excluded from the estimation.

Additional file 3
Insertion/deletion rates of informational genes and non-informational genes in the COG classification. Estimation was based on the select-genes trees.

Additional file 4
Information on phylogeny construction using different methods.

Additional file 5
Boxplot of tree length of the select-genes tree and the common-genes tree from each group. Group names are shown as their first three letters (except MYB for Mycobacterium and MYP for Mycoplasma). For each group, the tree length of the select-genes tree is on the left and that of the common-genes tree is on the right.

Additional file 6
Alpha values based on different phylogenies. Estimates are based on plausible alternative phylogenies for the common genes, sorted from best supported to least supported.

Additional file 7
Alpha values based on different phylogenies. Estimates are based on plausible alternative phylogenies for the selected genes, sorted from best supported to least supported.

Additional file 8

Additional file 9
Small α change after excluding informational genes compared with excluding the most conserved genes. A, estimation was based on the select-genes trees; B, estimation was based on the common-genes trees. Each bar represents a group, and the groups are sorted by their ratios. The ratios are obtained from Table 5.

Additional file 10
α values after informational genes were removed using different cutoffs on e-value and match length for identifying informational genes. Estimation was based on the select-genes trees. Maximum likelihood estimation was conducted using only the best supported phylogeny of each group to reduce the computational burden.

Acknowledgements

This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) grant to GBG. The authors would like to thank Prof. R. Morton and two anonymous reviewers for their valuable comments on previous drafts.

References
Genome Res. 2000 Nov; 10(11):1719-25.
Nat Rev Microbiol. 2005 Sep; 3(9):679-87.
Genome Biol. 2003; 4(9):R57.
Mol Biol Evol. 2004 Jul; 21(7):1294-307.
Mol Biol Evol. 2005 Mar; 22(3):683-90.
Genome Res. 2006 May; 16(5):636-43.
Proc Natl Acad Sci U S A. 1998 May 26; 95(11):6239-44.
Proc Natl Acad Sci U S A. 1999 Mar 30; 96(7):3801-6.
Trends Genet. 2000 Dec; 16(12):529-33.
Mol Biol Evol. 2002 Dec; 19(12):2226-38.
Genome Res. 2003 Jul; 13(7):1589-94.
EMBO J. 2000 Dec 15; 19(24):6637-43.
BMC Evol Biol. 2005 May 24; 5(1):33.
Int J Syst Bacteriol. 1992 Jan; 42(1):166-70.
Nature. 2003 Oct 23; 425(6960):798-804.
Science. 2006 Mar 3; 311(5765):1283-7.
Nat Rev Genet. 2005 May; 6(5):361-75.
Trends Ecol Evol. 2006 Nov; 21(11):614-20.
Mol Biol Evol. 2005 May; 22(5):1246-53.
Mol Biol Evol. 2005 Feb; 22(2):200-9.
Genome Biol. 2007; 8(2):402.
Proc Natl Acad Sci U S A. 2007 Jan 16; 104(3):870-5.
Genome Biol. 2003; 4(8):R48.
Mol Biol Evol. 2004 Jun; 21(6):1110-22.
EMBO Rep. 2001 May; 2(5):376-81.
BMC Evol Biol. 2007 Feb 8; 7 Suppl 1:S8.
Nat Genet. 2005 Dec; 37(12):1372-5.
Proc Biol Sci. 2004 Dec 22; 271(1557):2551-8.
Mol Biol Evol. 2006 Nov; 23(11):2049-57.
Science. 2003 Aug 8; 301(5634):829-32.
Mol Biol Evol. 2004 Oct; 21(10):1884-94.
J Mol Evol. 1994 Sep; 39(3):306-14.
J Mol Evol. 2001 Oct-Nov; 53(4-5):447-55.
Appl Environ Microbiol. 2005 Jul; 71(7):4127-31.
Genome Res. 2003 Mar; 13(3):407-12.
Genome Res. 2004 Dec; 14(12):2469-77.
Nucleic Acids Res. 2004; 32(5):1792-7.
Bioinformatics. 2001 Aug; 17(8):754-5.
Bioinformatics. 2005 Feb 1; 21(3):390-2.
Nucleic Acids Res. 1997 Sep 1; 25(17):3389-402.
Science. 1997 Oct 24; 278(5338):631-7.