Copyright © 2004, Cold Spring Harbor Laboratory Press

Computing prokaryotic gene ubiquity: Rescuing the core from extinction

Genome Atlantic, and Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5

1Corresponding author. E-mail Ford/at/dal.ca; fax (902) 494-1355.

Received July 20, 2004; Accepted October 7, 2004.

Abstract

The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increases. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly less than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.

The concept of a genomic core plays a key role in the literature of evolutionary and comparative prokaryotic genomics (Makarova et al. 1999; Nesbø et al. 2001; Harris et al. 2003; Koonin 2003).
Operationally, a core can be defined as the set of all genes shared as orthologs by all members of an evolutionarily coherent group (a species such as Escherichia coli, a phylum such as Proteobacteria, a domain such as Bacteria, or all of Life). Biologically, cores have been used for three purposes as follows: to help deduce the composition of ancestral genomes (Mushegian and Koonin 1996; Koonin 2003), to guide in the construction of minimal cells (Zimmer 2003), and to facilitate the reconstruction of organismal phylogenetic trees (Makarova et al. 1999; Nesbø et al. 2001; Daubin et al. 2002, 2003; Lerat et al. 2003). This last use involves the assumption that core genes, universally shared by all members of a taxon, are relatively unlikely to have experienced lateral gene transfer (LGT). Thus, in all three usages some biological— rather than simply statistical—meaning is attached to the size and composition of a core. In prokaryotic species for which genomes of several different strains have been completely sequenced, orthologous genes of the species core can usually be identified easily; they are conserved in chromosomal position as well as in sequence. For deeper and more inclusive taxa, cores become progressively smaller and more elusive because of weak phylogenetic signals, genomic rearrangements, and problems in recognizing paralogy. Nevertheless, there is much interest in such deep cores, especially the universal (Bacteria + Archaea + Eukarya) core. Several investigators argue that this core's composition might reflect that of the genome of the Last Universal Common Ancestor (LUCA), while phylogenies of its genes, if congruent, could delineate the earliest branchings of the Tree of Life (Brown et al. 2001; Woese 2002; Koonin 2003). Recent attempts to define the universal core have concluded that it contains very few genes. Harris et al. 
(2003), in a study of 34 genomes, report 80 core genes; Koonin (2003), using about 100 genomes, finds about 60; and Brown et al. (2001), with 45 genomes (and greater requirements for stringency), find 23. Although the operational and largely arbitrary nature of the definition of this minimal set has been appreciated, several authors have attributed biological significance to the fact that the number of ubiquitous genes is so small. For instance, for Koonin (2003), “the important realization that comes from this type of analysis is the remarkable evolutionary plasticity of even the central, essential biological functions. Only a tiny group of genes (nearly all of them associated with translation and transcription) is truly ubiquitous among living things”. Woese (1987) asks, in the phylogenetic context, “What does it mean, then, to speak of an organismal genealogy when nearly all of the genes in the cell, genes that give it its general character, do not share a common history?”

Here, we describe a similarly motivated study, addressing a more extensive set of prokaryotic genomes (130 Bacteria and 17 Archaea) with a greater variety of methods of analysis. We, too, find a diminutive set of truly ubiquitous genes among prokaryotes. Our interest, however, is not so much in this number as in whether or not it might be a statistical illusion, what more useful algorithms for defining cores might be possible, and the difficult question of biological significance of core size and composition.

Results

Ubiquitous cores defined by reciprocal best matches

Table 1 summarizes our enumeration of genes shared by all members of selected prokaryotic taxa. Reported values are the means of all queries in which a search was launched with any one genome from a member of the taxon against all others from that taxon, requiring a reciprocal best match (RBM, see Methods) in each.
Variation around these means is small, and reflects cases in which genes are variously disconnected from one another in the BLAST method. (By disconnection, we mean a situation in which an occasional gene X will be an above-threshold best-reciprocal match between genomes A and B and A and C, but not between B and C). The average prokaryotic genomic core defined in this way (genes shared by a launching genome and all other Bacteria and Archaea) has only 14.82 such ubiquitous genes. A more generous result, and one which actually produces a unique list of gene names, is obtained by taking the union of all such sets shared by all 147 genomes. The size of this union of RBM sets is 30 genes.
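The difference between the mean per-genome core and the union of cores can be illustrated with a small sketch. The data below are invented for illustration, and the sketch assumes that orthologous genes carry the same label in every genome, which real ORF identifiers do not. It is not the NGIBWS implementation. Gene `geneX` is "disconnected" in the sense described above: it has RBMs between A and B and between A and C, but not between B and C.

```python
from statistics import mean

# Toy RBM table (hypothetical data): rbm[genome][gene] is the set of other
# genomes in which that gene has an above-threshold reciprocal best match.
rbm = {
    "A": {"geneX": {"B", "C"}, "geneY": {"B", "C"}},
    "B": {"geneX": {"A"}, "geneY": {"A", "C"}},  # geneX lacks a B-C match
    "C": {"geneX": {"A"}, "geneY": {"A", "B"}},
}


def launch_core(launch, rbm):
    """Genes of the launching genome that have an RBM in every other genome."""
    others = set(rbm) - {launch}
    return {gene for gene, hits in rbm[launch].items() if others <= hits}


# One core per launching genome, then the two summaries used in the text.
cores = {genome: launch_core(genome, rbm) for genome in rbm}
mean_core_size = mean(len(core) for core in cores.values())
union_core = set().union(*cores.values())

print(cores)
print(mean_core_size, sorted(union_core))
```

In this toy case the core launched from A contains both genes, while the cores launched from B and C contain only `geneY`, so the mean core size is smaller than the size of the union.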
It seems unlikely that this count of 30 genes is falsely large. Artifacts that could make it so would require extreme degrees of sequence convergence, and in any case, all or most of the genes identified in such an analysis are expected to be highly conserved on biological grounds (see below; Table 3). It could easily be falsely small, however, and the remaining analyses presented here variously address possible artifactual explanations for the diminutive size of the prokaryotic core.

Effects of BLAST parameters and genome size on apparent core size

One such possibility is that we have set cut-off values for BLASTP too stringently, so that legitimate orthologs have been overlooked. If this were so, the average number of genes with RBMs in all genomes should be exquisitely sensitive to that cut-off value. It is not, as Table 2 illustrates. There is remarkably little variation in average numbers of genes obtained for expectation values between 1.0e-3 and 1.0e-7. Nevertheless, a few more genes were added by combining the results of the union of shared RBMs at 1.0e-5 with the results of the consensus gene name, or CGN approach (see Methods), which might recover some cases of poorly or nonreciprocally matching orthologs. Both sets are listed in Table 3, and are largely overlapping (30 genes from the union of shared RBMs, 34 genes from CGNs, 26 in their intersection and 38 in their union).
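The four set sizes quoted above are mutually consistent, as a quick inclusion-exclusion check shows. The variable names are illustrative only; the numbers are those given in the text.

```python
# Set sizes reported in the text.
rbm_union_size = 30      # genes from the union of shared RBMs
cgn_size = 34            # genes from the CGN approach
intersection_size = 26   # genes found by both methods

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
union_size = rbm_union_size + cgn_size - intersection_size

# Genes recovered by only one of the two methods.
rbm_only = rbm_union_size - intersection_size
cgn_only = cgn_size - intersection_size

print(union_size, rbm_only, cgn_only)
```

The union comes to 38, matching the reported value, with 4 genes found only by the RBM union and 8 found only by CGN.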
Another possible artifact (or misleading bias) could result from inclusion of the highly reduced genomes of endosymbionts or parasites, which can lack many genes required by free-living cells. To examine the effect of excluding small genomes, we performed the analysis shown in Figure 1A
Figure 1B

Core composition and the problem of missing genes

Table 3 lists the 30–38 largely identical genes that are found in the universal prokaryotic cores defined by either the union of RBMs (Table 1) or CGN methods (Fig. 1

As it happens, the distribution and conservation of ribosomal proteins has recently been subjected to a careful analysis by Lecompte et al. (2002). With 66 genomes (45 Bacteria, 14 Archaea, and seven Eukarya), they found 33 universal prokaryotic ribosomal proteins, after correcting for missed annotations. Extending their analysis to 147 prokaryotes (130 Bacteria and 17 Archaea), with a thorough tBLASTN search for each missing gene in each apparently deficient genome, we concluded that the status of these investigators' 15 ubiquitous small-subunit ribosomal proteins remains secure. However, four of the 18 then-ubiquitous large-subunit proteins can now be declared missing in at least one bacterial genomic sequence. The normally adjacent rplB and rplW genes cannot be located within the Streptococcus mutans UA159 genomic sequence; instead, the pair's neighbors (rpsS and rplD) overlap by 11 bp. There is also no sign of rpmC within the Wolinella succinogenes DSMZ 1740 genome. Whether these three absences represent legitimate losses or sequencing artifacts is impossible to tell. Finally, the rplM gene is annotated within the genome of Enterococcus faecalis V583 as having a frameshift, and is presumed nonfunctional (no protein sequence is described).

Some further losses clearly have occurred among several ribosomal protein genes described by Lecompte et al. (2002) as restricted to, but ubiquitous within, Bacteria. The rplI gene cannot be located within the Mycoplasma penetrans HF-2 sequence; rpmB is annotated as a pseudogene in Mycobacterium leprae TN and appears to be absent from Pirellula sp. 1; rpmF cannot be found in Mycobacterium tuberculosis H37Rv; rpmH cannot be found in Pirellula sp.
1; rpmI cannot be found in Bdellovibrio bacteriovorus HD100; and rpmJ appears to be absent from both strains (C58 and C58 UWash) of Agrobacterium tumefaciens and from Corynebacterium glutamicum. All of the archaeal-specific ribosomal proteins that were ubiquitous in the study of Lecompte et al. (2002) are still ubiquitous with our slightly larger (17 vs. 14) archaeal genomic data set. Our case-by-case searches described above revealed that the true ubiquitous core consists of 29 prokaryotic ribosomal protein genes, but our strictest measurement of the core using an automated analysis identified only 11 ribosomal protein genes (Table 3). Many of the false negatives in the automated search, which is necessarily based on annotated genes, were due to missed annotations (especially of small proteins not designated as ORFs in genome databases) and misannotations (usually where a larger overlapping ORF was selected at the expense of a smaller gene). Some of the false negatives were, however, attributable to deficiencies in our methods, where BLASTP thresholds were too strict or where alternate annotations confounded assignment of the CGN. Ribosomal proteins might, however, be worst-case examples both for annotation artifacts and algorithmic shortcomings; many ribosomal proteins are very small, and can easily be overlooked by annotators and by BLASTP alike. Furthermore, some of the apparently genuine absences affect very few genomes; four more genes could be added to the core if the ubiquity requirement were relaxed, even very slightly. It will not be simple to extract from our analysis of ribosomal proteins any reliable estimate of how many genes in other functional categories have been excluded from Table 3 because of error, and how many are genuinely missing from at least one or a few genomes. Our focus, in any case, is not so much on the precise number or identity of core genes as on the methods best used to define them. 
In that regard, we infer that the requirement for ubiquity in defining genomic cores, as well as being completely unforgiving with respect to errors, might be standing in the way of our recognition of some biologically more significant collection of almost ubiquitous genes.

Relaxing the requirement for ubiquity

The analysis illustrated in Figure 2
A comparison of Figures 2

It is appealing to propose a model in which each gene has a different and independent characteristic probability of going missing from a genome (Krylov et al. 2003), where even critical functions are not formally exempt from analogous replacement in evolution, and no gene is exempt from sequencing or annotation error. But it would be very difficult to establish the parameters of such a model, not only the gene-specific loss propensities, but in the end, just what value of loss propensity is tolerable for inclusion in the core. In other words, although these gene-specific probabilities would have a biological significance, the core itself could be arbitrary or artifactual in two senses, depending on the number (and nature) of genomes examined and the cut-off value set for inclusion.

Toward a biologically more significant phylogenetically balanced core

We reasoned that the biological goals of defining a prokaryotic core might be better achieved by methods that do not demand ubiquity or assert some arbitrary definition of ubiquity, but do retain the requirement that genes of the core be (1) very common, and (2) distributed as broadly as possible, phylogenetically. Table 4 and Figure 3
This approach has at least three distinct advantages over ubiquity-requiring global analyses. First, for the mean values of multiphylum comparisons, genes that are missing (either truly or through error) from only a few genomes will usually have less effect. Second, and for related reasons, highly reduced genomes have less impact on the size of the core, unless they are the only representatives of their phyla. The maximum and minimum values show the extent to which large and small genomes (endosymbiotic or parasitic) within well-sampled phyla influence our comparisons. Third, as more genomes are added within existing phyla, this estimate will become more precise (its standard error will decrease). Its value is not expected to diminish and will likely even increase, as larger genomes from phyla that are so far poorly sampled appear. The addition of new phyla will diminish the phylogenetically balanced core (PBC), but the list of such completely unsampled phyla is not, like that of unsequenced genomes, limitless.

Our ultimate interest here is in such a core for all prokaryotes, defined as genes present in some reasonable fraction of genomes in each and every one of 12 bacterial phyla and two archaeal phyla (Crenarchaeota and Euryarchaeota, including Nanoarchaeum equitans). In Table 5, we list all genes that are (1) present by consensus name in at least one genome in each of these 14 groups, and (2) within these groups, present in 100%, at least 90%, or at least 80% of genomes. Of course, the 100% gene set is identical to that shown in Table 3, except for minor differences arising from Table 5's more comprehensive data set; there is no phylogenetic balancing when ubiquity is required. (For constructing this table, we have used, in addition to the 147 prokaryotic genomes available in January 2004, 23 that have since appeared.) Interestingly, rpoB and rpoC, which are necessary and ubiquitous components of RNA polymerase, are only retrieved using our relaxed definition of the core.
In some genomes, these genes are fused, and thus thwart the retrieval of rpoB and/or rpoC by RBM or CGN. This example helps to underline the need for flexibility in defining core genes, not only in compensation for sequencing and annotation artifacts, and the odd rarely lost gene, but also for some genes' tendency to form multidomain proteins.
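The per-phylum prevalence criterion described above can be sketched as follows, and contrasted with a single threshold applied across all genomes. This is a minimal illustration under stated assumptions, not the authors' code. The function names, genome labels, phylum labels, and gene labels are all hypothetical, and presence is assumed to be already resolved to a common gene name per genome.

```python
from collections import defaultdict


def balanced_core(presence, phylum_of, min_fraction):
    """Genes present in at least min_fraction of the genomes of EVERY phylum."""
    by_phylum = defaultdict(list)
    for genome, genes in presence.items():
        by_phylum[phylum_of[genome]].append(genes)
    all_genes = set().union(*presence.values())
    return {
        gene
        for gene in all_genes
        if all(
            sum(gene in genes for genes in members) >= min_fraction * len(members)
            for members in by_phylum.values()
        )
    }


def global_core(presence, min_fraction):
    """Genes present in at least min_fraction of all genomes, ignoring phyla."""
    all_genes = set().union(*presence.values())
    total = len(presence)
    return {
        gene
        for gene in all_genes
        if sum(gene in genes for genes in presence.values()) >= min_fraction * total
    }


# Hypothetical data: five genomes in a well-sampled phylum, one in another.
presence = {
    "bac1": {"geneU", "geneB", "geneL"},
    "bac2": {"geneU", "geneB", "geneL"},
    "bac3": {"geneU", "geneB", "geneL"},
    "bac4": {"geneU", "geneB", "geneL"},
    "bac5": {"geneU", "geneB"},  # geneL missing from one genome
    "arc1": {"geneU", "geneL"},  # geneB absent from the second phylum
}
phylum_of = {
    "bac1": "phylumP", "bac2": "phylumP", "bac3": "phylumP",
    "bac4": "phylumP", "bac5": "phylumP", "arc1": "phylumQ",
}

print(sorted(balanced_core(presence, phylum_of, 0.8)))
print(sorted(global_core(presence, 0.8)))
print(sorted(balanced_core(presence, phylum_of, 1.0)))
```

In this toy set, `geneB` passes the global 80% threshold although it is absent from one whole phylum, whereas the balanced criterion excludes it and still tolerates the single missing copy of `geneL`. At 100% the balanced core reduces to the strictly ubiquitous set.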
Gene loss is a known factor in genome evolution, but since ancestral genomes cannot have been much larger than present-day genomes, gene genesis (largely by duplication and divergence) is necessary in order to compensate (Snel et al. 2002; Kunin and Ouzounis 2003; Mirkin et al. 2003). Gene genesis creates paralogs, and gene loss deletes both paralogs and orthologs; the whole process thus inevitably results in a shrinking of an orthologously defined core. Genes remaining in a universal core should thus not only be critically necessary and maintained in all taxa, but should also be resistant to the usage of alternative forms.

The most inclusive core defined in this way contains 71 genes (Table 5). We suspect that its size may increase as more or larger genomes for some of the sparsely sampled phyla (especially Aquificales, Chlorobi, Planctomycetes, and Thermotogales) become available. Although the potentially distorting effect of individually aberrant genomes that are among the few representatives of their respective phyla cannot be ignored, its importance will diminish as more genome sequences appear. (Two examples are as follows: ffh appears to be absent from Nanoarchaeum equitans [acceptable to PBC], but also from both strains of Leptospira interrogans and thus, from 40% of our Spirochaetes; ftsY appears to be absent from N. equitans [acceptable], but also from Sulfolobus tokodaii and thus, from 25% of our Crenarchaeota.) The decision to choose genes present in 90%, 80%, or some other fraction of genomes in each phylum remains arbitrary. But the requirement imposed by the PBC approach for presence at such a level in all phyla guarantees a more representative biological sampling, whereas a core defined as including genes present in 80% of all genomes regardless of phyletic distribution would be a bacterial core.
Discussion

Although universal prokaryotic cores as described here and in other recent literature are often suggestively similar in size, this apparent convergence lacks biological significance. When each core gene is required to be in every genome, cores will inevitably be artifactually small. Some genes will be missing because of sequencing, assembly, and annotation errors, and some genuine orthologs will have diverged beyond detectability. Because of such errors alone, the size of ubiquity-requiring genomic cores should continue to decline slowly, as more genome sequences appear. The impact of errors like this might be reduced by relaxing the requirement for ubiquity (to <100% of all genomes). However, the set of almost ubiquitous genes rises almost continuously in number as percent representation is relaxed, and includes more and more genes whose nonubiquity is not artifactual. There is no obvious place to draw the line, other than that at which we can discount all Archaea (about 80%). To do this would be to abandon the claim for (prokaryotic) universality. Defining cores as genes present in some fraction of genomes less than 100% is thus not only arbitrary, but gives disproportionate weight to taxa favored, for whatever reason, by sequencers.

The PBC approach we describe will address the problem of errors and the problem of disproportionate weighting of popular phyla when ubiquity is not required. It also favors universality (within a phylogenetic context), but it still does not tell us where to draw the line. We do find it encouraging that the core defined in this way increases less than 20% in size, when prevalence required within taxa is dropped from 90% to 80%. If, indeed, there is a conserved set of genes that might be considered the basic heritage for all prokaryotes, but which defies precise definition because of occasional loss or orthologous replacement in scattered lineages, the PBC might provide a good approximation.
Although a nonarbitrary delimitation of core size may be impossible, core composition is not random. Cores are generally dominated by genes of the translational apparatus. Why should this be? The widely accepted explanation is that genes of this important informational process are intrinsically less exchangeable than genes of operational processes, like metabolism (Woese 1987, 2002; Jain et al. 2002). There are so many complex coevolved interactions with other cellular informational constituents, this argument holds, that any replacement of one of these genes by a distant homolog or by an analog would be disadvantageous. Although one might object that there are individual instances in which such components (not only ribosomal proteins and translation factors, but ribosomal RNAs themselves) have been transferred, this complexity hypothesis (or annealing hypothesis) remains appealing and popular. But there is an alternative explanation, which we will sketch out here, and have represented by a cartoon in Figure 4
Exchanges of this nature need not be cryptic in phylogenetic analyses. Informational genes of the universal prokaryotic core should not produce congruent phylogenetic trees if cells have exchanged them for foreign (but still orthologous) versions as often as they seem to have traded nonorthologous (indeed, nonhomologous) operational genes. Whether or not genes of the universal core do show congruent phylogenies is, however, still a matter of legitimate debate; at this (phylum) depth few individual genes have reliable phylogenetic signal. Several years ago, Teichmann and Mitchison (1999) observed that only three of 32 protein families shared between selected bacterial, archaeal, and eukaryotic genomes showed significant phylogenetic signal, and concluded that this signal was actually due to recent LGT events. Several recent studies that have used concatenated sequences of core genes to construct universal trees have argued that the robustness of these trees reflects an underlying phylogenetic coherence (Brown et al. 2001; Brochier et al. 2002; Matte-Tailliez et al. 2002). But, it was the failure of the genes individually to produce resolved trees that motivated concatenation in the first place, and in any event, all such reports conclude that there is significant divergent signal among core translational genes. Harris et al. (2003) note, from preliminary phylogenetic analyses, that 30 of their 80 universal core genes do not maintain domain monophyly, showing clades in which Bacteria, Archaea, and Eukarya are mixed. Brown et al. (2001) were reduced to a core of only 14 genes they considered congruent phylogenetically. Overall, it might be safe to say that informational genes support the Bacterial/Archaeal dichotomy more often than do operational genes— and perhaps, more often than not. But, a consistent and extensive pattern of congruence among informational genes in branching patterns at the level of bacterial and archaeal phyla has not been established. 
Thus, objections to our alternative explanation (Fig. 4

Methods

Determining the size and composition of a core of genes ubiquitous among a set of genomes requires a method with which to compute orthologous relationships in bulk. For practical reasons, we can assume that a pair of reciprocally best (or near-best) matching genes are likely to be orthologs, and can thus automate the process of comparative genomic analysis (Charlebois et al. 2003). The BLASTP bit score (Altschul et al. 1997) serves as a convenient measure of sequence similarity, although we concede that it can only represent an approximation to the true similarity between genes, and may therefore generate false positive matches and false negative failures of matching, especially near the match threshold that must be imposed (Ragan and Charlebois 2002). Using BLASTP scores, we can distinguish between ordinary matches (including many which may be paralogous) and reciprocal best matches (RBMs), which are more often orthologous.

Additionally, we can make use of information present in genomic annotations, permitting genes with standardized gene names to find their orthologs despite BLASTP matches that may fall slightly below threshold. Where annotations are reasonably consistent, this permits the union of overlapping sets of genes where BLASTP matches alone might miss members outside of the sets' intersection. Matches are still based on BLASTP, but somewhat disconnected outliers can then link through the bridge of a common consensus gene name (CGN). These are computed as follows: For each ORF in a genome, its RBM, if any, is found in each of the other genomes, and the RBM's annotated name is appended to a list. The dominant name in this list (e.g., ftsA) becomes the query ORF's CGN. Lists of prevalent names, found in most or all members of a set of genomes, are generated by the “List named ubiquitous genes” query within NGIBWS (http://www.neurogadgets.com/bws.php) (Charlebois et al. 2003).
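The CGN assignment described above (collect the annotated names of a query ORF's RBMs, then take the dominant name) can be sketched in a few lines. This is an illustration only and not the NGIBWS code. The function name, ORF identifiers, and the alternative name `divA` are hypothetical, and how NGIBWS treats unnamed ORFs or ties is not stated in the text; here unnamed hits are skipped and ties fall to the name encountered first.

```python
from collections import Counter


def consensus_gene_name(orf, rbm_hits, annotation):
    """Return the most common annotated name among an ORF's RBMs, or None."""
    names = [annotation.get(hit) for hit in rbm_hits.get(orf, [])]
    names = [name for name in names if name]  # assumption: skip unnamed ORFs
    if not names:
        return None
    return Counter(names).most_common(1)[0][0]


# Hypothetical data: one query ORF with an RBM in each of four other genomes.
rbm_hits = {"A_0001": ["B_0042", "C_0007", "D_0100", "E_0003"]}
annotation = {
    "B_0042": "ftsA",
    "C_0007": "ftsA",
    "D_0100": "divA",  # an alternative name used by one annotation
    "E_0003": None,    # unnamed ORF (e.g., "hypothetical protein")
}

print(consensus_gene_name("A_0001", rbm_hits, annotation))
```

Because the name comes from the matches rather than from the query's own annotation, a poorly annotated ORF can still acquire a CGN, which is what lets it join a named set despite weak or inconsistent labeling.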
A strict definition of a core of genes has those genes present in every member of the set of genomes under consideration. We performed such an analysis on clades of genomes (defined according to a genomic phylogeny [Gophna et al. 2004]), using the “Find ubiquitous genes” query within NGIBWS (Charlebois et al. 2003). Using each of the genomes from the clade in turn, ORFs are found that have an RBM (with allowance for near ties) better than the specified BLASTP threshold, in each of the other genomes in the clade.

Both RBM and CGN approaches present some limitations. False negatives can arise in the RBM approach when BLASTP matches fall below threshold; false positives can arise when match thresholds are set so low as to permit spurious matches, and when paralogs match for want of lost orthologs. The use of CGNs as bridges for weak matches overcomes some of the problems inherent in the pure RBM approach, but introduces new problems relating to inconsistent annotation. Where alternative names for a gene are popular, an orthologous cluster may break up into name cliques, where a gene is truly ubiquitous, but fails to appear as a ubiquitous CGN. False positives are theoretically possible with the CGN approach, but only if homonymous names are used in annotation, which should be rare among core or otherwise prevalent genes, lysyl-tRNA synthetase (lysS) notwithstanding (Ibba et al. 1999).

Both RBM and CGN methods of finding ubiquitous genes are exquisitely sensitive to missed annotation (where a gene is present in the sequence, but is not annotated as an ORF), and to sequencing artifacts (during cloning or sequence assembly). We assessed the extent of the former problem by repeating the work of Lecompte et al. (2002) on ribosomal proteins, where several expected proteins proved to be missing in our larger set (147 vs. 59) of prokaryotic genomes.
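A reciprocal best match with a score threshold and an allowance for near ties, as mentioned above, might be computed along the following lines. This is a sketch under stated assumptions and not the NGIBWS implementation. The function names and toy bit scores are hypothetical, and so is the tie rule: a hit counts as near-best if it scores at least `tie_fraction` of the top score, because the text does not say how near ties are defined.

```python
def best_hits(query, scores, min_score, tie_fraction=0.95):
    """Subjects scoring at or above min_score and within tie_fraction of the top hit."""
    hits = {s: v for (q, s), v in scores.items() if q == query and v >= min_score}
    if not hits:
        return set()
    top = max(hits.values())
    return {s for s, v in hits.items() if v >= tie_fraction * top}


def reciprocal_best_matches(orfs_a, scores_ab, scores_ba, min_score, tie_fraction=0.95):
    """Pairs (a, b) in which each ORF is a best or near-best hit of the other."""
    pairs = set()
    for a in orfs_a:
        for b in best_hits(a, scores_ab, min_score, tie_fraction):
            if a in best_hits(b, scores_ba, min_score, tie_fraction):
                pairs.add((a, b))
    return pairs


# Hypothetical bit scores between ORFs of genome A and genome B.
scores_ab = {("a1", "b1"): 200, ("a1", "b2"): 120, ("a2", "b2"): 90, ("a3", "b3"): 40}
scores_ba = {("b1", "a1"): 198, ("b2", "a1"): 125, ("b2", "a2"): 88, ("b3", "a3"): 41}

rbms = reciprocal_best_matches(["a1", "a2", "a3"], scores_ab, scores_ba, min_score=50)
print(rbms)
```

In the toy data, `a2` matches `b2` above threshold, but `b2` prefers `a1`, so the pair is an ordinary (possibly paralogous) match rather than an RBM. The `a3`/`b3` pair falls below threshold and is dropped, which is the false-negative case the text describes.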
All of our analyses, except that illustrated in Table 5, are based on all complete sequences of Bacterial (130) and Archaeal (17) genomes available in January 2004.

Acknowledgments

This work was supported by Genome Atlantic (Genome Canada), the Canadian Institutes for Health Research, and the Canada Research Chairs Program.

Notes

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3024704.