Copyright: © 2005 Lerat et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Evolutionary Origins of Genomic Repertoires in Bacteria

1Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, United States of America
2Department of Biochemistry and Molecular Biophysics, University of Arizona, Tucson, Arizona, United States of America

David Hillis, Academic Editor, University of Texas, United States of America

Corresponding author. Howard Ochman: hochman/at/email.arizona.edu

Received October 15, 2004; Accepted February 12, 2005.

See "Where Do All Those Genes Come From?", e169.

Abstract

Explaining the diversity of gene repertoires has been a major problem in modern evolutionary biology. In eukaryotes, this diversity is believed to result mainly from gene duplication and loss, but in prokaryotes, lateral gene transfer (LGT) can also contribute substantially to genome contents. To determine the histories of gene inventories, we conducted an exhaustive analysis of gene phylogenies for all gene families in a widely sampled group, the γ-Proteobacteria. We show that, although these bacterial genomes display striking differences in gene repertoires, most gene families having representatives in several species have congruent histories. Other than the few vast multigene families, gene duplication has contributed relatively little to the contents of these genomes; instead, LGT, over time, provides most of the diversity in genomic repertoires. Most such acquired genes are lost, but the majority of those that persist in genomes are transmitted strictly vertically.
Although our analyses are limited to the γ-Proteobacteria, these results resolve a long-standing paradox: the ability to make robust phylogenetic inferences in light of substantial LGT.

Introduction

The complexity and coordination of cellular functions are remarkable in view of the disparate histories of the genes that make up contemporary genomes. In eukaryotes, new genes arise primarily through the duplication of existing genes [1,2,3,4], while some ancestral genes are inactivated or eliminated over time. In contrast, prokaryotic genomes undergo substantial rates of gene acquisition from foreign sources [5], as well as duplication and loss of existing genes. Thus, if we consider the gene repertoire of a particular bacterial cell, some genes have been transmitted vertically for very long periods of time, perhaps from the time of the common ancestor of all cellular life-forms, whereas other genes were acquired or generated at various points in the history of the lineage, including some very recently. Although the roles of vertical transmission and horizontal transfer are both well documented, we have, as yet, no comprehensive, quantitative picture of the genome-wide history of gene gain and loss over time for any particular prokaryotic group.

The availability of many complete genome sequences of bacteria presents the possibility of tracing the history of individual genes within evolving lineages by identifying the points at which genes originate through acquisition or duplication, and the points at which genes are lost. The resulting picture would address several outstanding questions and paradoxes concerning bacterial genomes. For example, if there is a robust estimate of the cell (or organismal) phylogeny for a set of lineages, can we identify the events of gene acquisition, duplication, and loss that lead to the current gene repertoires of individual cells?
Is the incidence of gene acquisition ongoing or episodic, and do acquired genes come from very close relatives or from distant sources? Of acquired genes, what types and what proportion become permanently installed within descendant genomes, and which are lost? It is clear that gene duplication, gene loss, and gene transfer all impact bacterial genomes; but the relative contributions of each remain controversial [6,7,8,9,10,11,12,13]. The situation is confounded by the fact that, in bacteria, the presence of two or more homologous sequences within a single genome might reflect the acquisition of a gene copy from a foreign source rather than the duplication of a resident gene. In the absence of further analysis, such homologs cannot be confidently described as paralogs (or duplicates) [14] or as xenologs (acquired via horizontal transfer) [15], and we propose the term “synologs” as an agnostic name for homologs within a genome arising from either process. Distinguishing the origins of synologs within genomes allows both the accurate dissection of gene families and the full reconstruction of events responsible for the contents of cellular genomes. Here we investigate the full protein-coding gene repertoires within the γ-Proteobacteria, a group chosen because the large number of fully sequenced genomes, combined with their well-supported phylogenetic relationships, allows us to trace the origins of new genes in organisms that differ widely in their gene inventories (ranging from 564 protein-coding genes in Buchnera aphidicola to 5,540 in Pseudomonas aeruginosa) [16]. This group is an ancient bacterial phylum, at least several hundreds of million years old, based on the sequence divergence within the group [17] and on its containing at least one ancient subclade (Buchnera) that has cospeciated with hosts for over 100 million years. 
Available genome sequences include species displaying diversified lifestyles and subject to varying degrees of gene acquisition [5,6,18,19,20].

By assessing the history of every gene family, we find that gene acquisition is a major factor contributing to genomic diversity of these bacteria, but that, paradoxically, they rarely exchange genes. In addition, duplication appears to have played a secondary role in the evolution of gene repertoires, as multigene families are scarce and a substantial fraction of the genetic redundancy observed in genomes is better explained by gene acquisition from a distant source. These results support the view that bacterial genomes evolve mainly by incorporation of completely novel genes rather than by intragenomic duplication or by replacement of resident genes with distant homologs.

Results

Having defined all gene families of homologs present in 13 sequenced γ-proteobacterial genomes [16], we partitioned them according to their distribution among species and their incidence of synology. We examined the congruence of each of these families with the organismal phylogeny using maximum-likelihood (ML) tests, and we provide two estimates (one stringent and one permissive) of the number of lateral gene transfers (LGTs) found in these families (Figure 1
Single-Copy Genes

Among single-copy genes present in six to 12 genomes (Figure 1

Occurrence and Source of Synology in Bacterial Genomes

The sizes of gene and protein families in bacterial genomes have previously been shown to follow a power law distribution [21,22,23,24]. Within the γ-Proteobacteria, we find that, overall, very few gene families contain synologs, as evident in the low frequencies of families in which members outnumber genomes (Figure 1
For families with larger numbers of synologs, the direct comparison of gene trees with the reference topology also indicates high levels of LGT (up to 60%), although its inference is less definite due to uncertainties in reconstructing complex histories of multiple gene gains and losses. Families with three or more synologs are few in number (<2% of the 14,158 families) but include some instances of ancient gene duplication preceding the diversification of lineages.

Phylogenetic Signal and the Evidence for Vertical Inheritance

In tests for phylogenetic congruence, alignments that do not reject the reference organismal phylogeny are usually interpreted as reflecting vertical inheritance. However, such results, that is, the absence of a significant difference from the reference topology, can also be caused by phylogenetically uninformative alignments. Such problems are most likely for extremely divergent sequences, for which alignment and phylogenetic inference procedures are prone to failure, or for very short sequences, which may lack sufficient numbers of informative sites. Our gene families were constructed so as to exclude extremely divergent sequences, leaving the possibility that short genes are the most problematic ones. But, in our tests, there was no significant difference (p > 0.2) in the incidence of LGT among genes of different size categories, implying that lack of sufficient information was not a primary reason for failing to reject the reference topology. Furthermore, to explain the result whereby families with synologs display more LGT than those without, one would need to hypothesize that the lack of phylogenetic signal is restricted to families without synologs.
However, the difference in the frequency of LGT between the families with and without synologs remains evident in each of the size categories (see Figure 1

Genes with Very Limited Phylogenetic Distributions

About half of the gene families (7,655 of 14,158; Figure 1

This analysis, in which the cutoff for protein matches is based on e-values rather than on an empirically determined percentage of the maximal bit score, provided evidence that the majority of the single-member gene families within γ-proteobacterial genomes could be attributed to LGT. Only 17.5% of the proteins unique to a single genome had matches in other γ-proteobacterial genomes. These potentially represent quickly evolving genes that were originally excluded from protein families because of insufficient similarity. In contrast, 40% of the unique proteins gave hits in organisms outside of the γ-Proteobacteria, a distribution that will most likely arise by LGT between distantly related lineages. The remaining 42.5% of the single-member gene families correspond to orphan open reading frames (ORFans), that is, genes that have no homologs in the current databases. Alternatively, this last category could result from the misannotation of genome sequences. However, a recent study of ORFans in Escherichia coli demonstrated that most encode functional proteins [26]. ORFan genes tend to be short and enriched in A/T nucleotides when compared to the rest of the genome, features that suggest that they originated in parasitic elements, such as bacteriophages [26]. An analysis of the base composition of the sets of unique genes in the γ-Proteobacteria demonstrates that in all genomes (with the exceptions of Buchnera, Wigglesworthia, and Haemophilus, each possessing few, if any, unique genes), ORFans are significantly biased toward A+T at the third codon positions when compared with other genes in the genome (averaging a 5% difference in A+T content; p < 0.05).
This result is consistent with the hypothesis that these genes, which have no matches in current databases, have been recently acquired from bacteriophages, whose diversity is largely unsampled and unknown [27]. Therefore, the prevalence of gene families restricted to one or a few genomes (Figure 1

For families containing single members in four or five genomes, ML tests supported phylogenetic congruence for nearly 100% of cases (results not shown). However, this high degree of congruence could reflect, in part, the large number of gene families shared by closely related genomes (e.g., the two Yersinia or the two xanthomonads). To further evaluate those families (with and without synologs) present in two to five genomes, we enumerated the gene losses required to explain the phylogenetic distribution of the family under the assumption of no LGT following a single initial appearance in a lineage (Figure 1

Cumulatively, the phylogenetic evidence (for gene families present in six or more genomes) and the distributional evidence (for gene families present in fewer than six genomes) indicate that high levels of foreign gene acquisition have introduced the majority of genes of γ-proteobacterial genomes, but that this gene acquisition has little impact on gene phylogenies within this group. Massive gene uptake does not cause phylogenetic inconsistencies because (i) acquired genes come from sources outside of this group, (ii) they rarely have homologs within the recipient genome, and (iii) subsequent to their initial acquisition, genes tend to be vertically transmitted.

Extent of Gene Origination and Acquisition among Taxa

The incidence of LGT varies enormously among the lineages included in our tree. For instance, in addition to possessing a very large number of unique genes, the genome of P.
aeruginosa contains numerous genes from families whose phylogenetic distribution can only be explained by a very large number of gene losses in other lineages or by LGT (Figure 1

Discussion

Previous attempts to reconstruct the history of gene repertoires in bacteria have examined gene distributions on a species phylogeny [9,13,28]. But ignoring the relationships among homologs will lead to incorrect assessments of the relative contributions of gene gain, loss, and duplication to genome inventories. For example, if a gene has spread widely through LGT, an analysis based on gene occurrence would conclude that this ubiquitous distribution resulted solely from vertical inheritance. Moreover, such methods cannot distinguish between LGT and duplication as the origin of synology and therefore provide a distorted view of the extent of duplication in bacterial genomes. Only by evaluating the evidence for concordance between the gene phylogenies and the organismal phylogeny is it possible to trace the history of gain, loss, and duplication affecting each gene family.

It was previously shown that only about 200 single-copy genes are shared by these genomes [16] and that only 1% of these broadly distributed genes display statistically supported evidence of LGT. However, most of these genomes contain several thousand genes, indicating that the majority of genes in the genome were not present in the ancestor to all γ-Proteobacteria and that they originated either through LGT or by duplications as lineages diversified. By conducting an exhaustive phylogenetic analysis of all genes present in completely sequenced γ-proteobacterial genomes, we have evaluated the factors responsible for altering gene inventories and contributing to genomic innovation.
It has long been recognized that duplication and LGT contribute to the genome composition of evolving bacterial lineages and, in particular, of lineages in the γ-Proteobacteria [5,29,30,31], and we provide a quantitative assessment of the roles of these processes on a genome-wide scale. An enormous incidence of gene acquisition is suggested by the large number of genome- or clade-restricted gene families, but beyond their initial acquisitions, few gene histories conflict with the organismal tree. Our results show that most acquired genes lack homologs in the recipient genome and in other γ-Proteobacteria. Therefore, most of the genes present in contemporary genomes have arisen from distant sources. Although these genes may have been transmitted from unrelated cellular organisms, recent work revealing the previously overlooked diversity of bacteriophages [27,32] and their probable role in bacterial evolution [26,33] suggest that they have contributed significantly to the evolution of bacterial gene repertoires. Traditionally, high levels of LGT have been considered to be incompatible with a tree-like representation of bacterial evolution. However, the diversity of gene families unique to single genomes indicates that the pool of available genes is very large, allowing the rate of gene acquisition to be both high for a genome and very low for a particular gene. Interestingly, there is no evidence that genes with narrower phylogenetic distributions were more likely to undergo LGT, suggesting that the essentiality of a gene, as denoted by its universal presence among species, is not a predictor of its propensity for LGT. Hence, once acquired, most genes appear to strictly follow the organismal phylogeny. Whereas in eukaryotes, most multicopy genes arise from duplications, we find that LGT underlies a substantial proportion of the cases of synology in bacterial genomes. But, overall, synology is rare among gene families. 
Because duplicates are only rarely retained in bacterial genomes for long periods of time, hidden paralogy, that is, the differential loss of paralogs in independent lineages, is an unlikely explanation for phylogenetic incongruence. The overall paucity of families with synologs and their association with high rates of LGT indicate that duplications are not a major mechanism for diversifying functions in these bacteria. Although duplications play an important role in the short-term adaptation of bacteria [29,30], only a few duplicated genes are retained and subject to selection for diversifying functions. The fixation of duplicates requires the gradual evolution of sequence changes conferring differences in expression or function, whereas genes arriving through LGT are likely to be operationally distinct from those already present in a genome and, thus, immediately able to contribute unique functions and to be maintained in the genome by selection. The large number of genes that are confined to a single genome indicates frequent gene acquisition in this group of bacteria. In contrast, substantially fewer genes are distributed in families present in more than one proteobacterial genome. Therefore, based on the distributions of gene families and on the abundance of genes confined to a single genome, recently acquired genes are lost most readily. This implies that genes are continuously integrated into the genomes but rarely persist long enough for hosts to diversify [31,34]. Although a few such genes could be present in multiple species, but quickly evolving and unrecognizable due to loss of sequence similarity, this situation cannot apply widely given the close relationships of some of the genomes [26]. Rather, most genes confined to a single genome reflect recent acquisition from a source outside of the sampled γ-Proteobacteria. 
Cumulatively, the picture emerging from these studies is that bacterial lineages are constantly subjected to the input of new genes from a large available pool. Conversely, resident genes are continually lost. As a result, genomes contain sequences that have been resident in a particular lineage for very different durations (Figure 3
Our results, based on the distributions and phylogenies of all genes of a set of related genomes, provide a context for understanding several findings that previously seemed contradictory: extremely high levels of LGT [5], congruence among gene trees at various depths within bacteria [6,16,35,36], and general agreement of sequence-based gene trees with phylogenies based on genome contents [37,38]. We focused on the most intensively sequenced bacterial clade: as more genomic sequence data become available, similar approaches can be applied to determine if genome contents evolve in the same manner in other groups.

Materials and Methods

Defining gene families

To investigate the history of all protein-coding genes, we defined all gene families present in the following γ-Proteobacteria: E. coli K12 [39], B. aphidicola APS [40], H. influenzae Rd [41], Pasteurella multocida Pm70 [17], S. enterica serovar Typhimurium LT2 [42], Y. pestis CO-92 [19], Y. pestis KIM5 P12 [43], Vibrio cholerae (chromosomes I and II [44]), Xanthomonas axonopodis pv. citri 306 [45], X. campestris [45], Xylella fastidiosa 9a5c [46], P. aeruginosa PA01 [47], and W. glossinidia brevipalpis [48]. Protein sequences from complete genomes were retrieved from GenBank [49] and filtered to remove proteins annotated as insertion sequences or as bacteriophage sequences. Accession numbers for these genomes can be found in the Accession Numbers section of this paper.

Homologous genes (and resulting gene families) were defined using a cutoff for the degree of similarity among proteins reflected in the blastp bit scores [50]. The procedure for defining gene families was described in Lerat et al. [16] and is briefly summarized as follows: first, a bank containing all annotated protein sequences from all included species was queried with all the proteins in each of the genomes via blastp, such that all proteins were searched against both their resident genome proteins and those from the other species.
To establish the threshold for grouping genes into a family, we examined the distribution of the ratio of the bit score to the maximal bit score (i.e., protein match against itself) based on that observed for the proteins of E. coli compared against proteins of the other genomes. In each case, there is a bimodal distribution, with a first peak of low similarity values, which is constant among comparisons and represents random matches, and a second peak of higher values, which varies from one comparison to another and therefore probably represents true homologs. The height of the second peak varies according to the number of gene family constituents and can range from one, for single member families, to hundreds. The two phases of the distribution are partitioned at approximately 30% of the maximal bit score, and thus proteins having bit score values ≥ 30% of the maximal bit score were considered homologous and members of the same gene family. Genes were assigned to families by a simple link rule such that if gene A matches gene B, and gene B matches gene C, then all three are grouped into the same family. Comparisons among the families resolved after applying different thresholds (10%, 20%, 30%, or 40% of the maximal bit score) revealed that the 30% cutoff maximized the number of families containing genes from all 13 species, indicating that this criterion is optimal for the interspecific identification of homologous sequences. (Information about the distribution and constituents of gene families is available upon request from the authors.)

Gene origins and ancestries

Of the 14,158 gene families, 205 families are present as exactly one copy in each of the 13 genomes, and previous work has established that 99% (203) of these single-copy, widely distributed gene families are consistent with a single phylogeny, as expected if they share a history of vertical transmission through the replicating cell lineages [16].
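The family-definition step described above (a 30% bit-score threshold followed by single-link grouping) can be illustrated with a minimal sketch. This is not the authors' code: the function name `build_families`, the input layout, and the choice to normalize each hit by the query's self-match score are assumptions made for illustration only.

```python
from collections import defaultdict


def build_families(hits, self_scores, threshold=0.30):
    """Group proteins into families by single linkage (hypothetical helper).

    hits: iterable of (query, subject, bit_score) from an all-vs-all search.
    self_scores: dict mapping each protein to its bit score against itself.
    Assumption: the ratio is computed against the query's self-match score.
    """
    # Union-find structure: every protein starts as its own family.
    parent = {protein: protein for protein in self_scores}

    def find(x):
        # Follow parent links to the family representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, score in hits:
        if query == subject:
            continue  # self-matches only define the maximal bit score
        if score / self_scores[query] >= threshold:
            # Single-link rule: one qualifying match merges two families.
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for protein in parent:
        families[find(protein)].add(protein)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


# Toy data: A-B and B-C pass the 30% cutoff, C-D does not.
hits = [("A", "B", 60), ("B", "C", 45), ("C", "D", 10)]
self_scores = {"A": 100, "B": 100, "C": 100, "D": 100}
families = build_families(hits, self_scores)
```

In this toy input, A and C never match each other directly, yet they end up in the same family through B, which is the behavior the single-link rule implies.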
This reference phylogeny provides a scaffold upon which the ancestry of every member of every gene family could be examined.

To investigate how each gene originates within a genome and how gene families are generated, all protein-coding genes within each family were subjected to phylogenetic analysis. Although strong evidence of LGT can be gained by a phylogenetic approach, several factors, including the sensitivity of the tests employed and the varied causes of phylogenetic incongruence (such as hidden paralogy or long branch attraction, besides LGT), can confound the interpretation of such analyses. Therefore, we estimated the frequency of LGT in gene families by both stringent and permissive approaches. The stringent (conservative) estimates rely upon the analysis of four different ML tests of phylogenetic congruence and the visual inspection of the trees and alignments for each family. In this case, we require that at least three tests support phylogenetic incongruence and that this incongruence not be explicable parsimoniously by hidden paralogy or ambiguous alignments. In the permissive estimates, LGT is inferred when at least one of the four tests supports phylogenetic incongruence and when the tree needs more than two independent gene losses to be explained by hidden paralogy.

Gene families differ in their distribution among the sampled genomes and in the numbers of members per genome, and we considered the following cases:

Families without synologs

We first focused on gene families that contained no synology (i.e., the number of genes equals the number of genomes in which family members are found) and whose members are present in at least six of the genomes considered.
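The stringent and permissive criteria stated above can be written as a small decision rule. The sketch below is illustrative only: the function name and parameters are hypothetical, and the two judgments that the paper makes by visual inspection (hidden paralogy, ambiguous alignment) are simply passed in as booleans.

```python
def classify_lgt(n_tests_rejecting, explained_by_hidden_paralogy,
                 ambiguous_alignment, losses_for_hidden_paralogy):
    """Apply the two LGT criteria to one gene family (hypothetical helper).

    n_tests_rejecting: how many of the four ML tests reject the reference tree.
    explained_by_hidden_paralogy, ambiguous_alignment: manual judgments.
    losses_for_hidden_paralogy: independent losses needed if hidden paralogy
        rather than LGT were the explanation.
    """
    if not 0 <= n_tests_rejecting <= 4:
        raise ValueError("expected a count between 0 and 4")
    # Stringent: at least three tests reject, and no simpler explanation.
    stringent = (n_tests_rejecting >= 3
                 and not explained_by_hidden_paralogy
                 and not ambiguous_alignment)
    # Permissive: one rejecting test suffices if hidden paralogy would
    # require more than two independent losses.
    permissive = n_tests_rejecting >= 1 and losses_for_hidden_paralogy > 2
    return {"stringent": stringent, "permissive": permissive}


strong = classify_lgt(4, False, False, 3)
weak = classify_lgt(1, False, False, 3)
none = classify_lgt(0, False, False, 0)
```

The two rules are encoded literally as stated in the text; the sketch does not force every stringent call to also be a permissive one.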
Sequences were aligned using ClustalW version 1.83 [51], and the best ML tree was inferred using proml from the PHYLIP package version 3.6 [52] with the JTT model of amino acid change [53] and a model of heterogeneity of evolutionary rates among sites (α parameter estimated from the dataset on the best tree, using Tree-Puzzle 5.1 [54]). The likelihood of this tree was then compared to that of the reference species phylogeny [16], using the different ML tests (Shimodaira-Hasegawa test [55], the one- and two-sided Kishino-Hasegawa tests [56,57], and the expected likelihood weights [58]) implemented in Tree-Puzzle 5.1 [54] with a significance level of 5%. LGT was inferred from the results of these different tests and by visual inspection of the tree and alignment for each family.

Families with synologs

In cases where a gene family contained one or two synologs (i.e., # species < # genes ≤ # species + 2), we addressed whether synology arose from LGT or from intragenomic duplication by analyzing all possible combinations of genes from an alignment but including only one gene per species via individual ML tests (see Figure 2

Because procedures that reconstruct all possible phylogenies using individual synologs are difficult to interpret when numerous synologs are present, the ML tests were not applied to families with more than two synologs. The number of such families was small, which enabled us to infer cases of LGT by inspection of tree topologies. For each family containing multiple synologs, a tree based on the whole family was built with “Neighbor” using a distance matrix obtained from protdist (JTT model of amino acid change [53]) from the PHYLIP package version 3.6 [52]. Distances were computed under the γ-based method for correcting the heterogeneity of rates among sites with the α parameter obtained from the dataset on the best tree, using Tree-Puzzle 5.1 [54].
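For families with one or two synologs, the enumeration of "all possible combinations of genes ... including only one gene per species" amounts to a Cartesian product over each species' gene list. A minimal sketch follows, assuming a hypothetical input layout (a dict from species name to its gene identifiers); each yielded combination would then be the input to one individual ML test.

```python
from itertools import product


def one_per_species_combinations(family):
    """Yield every gene set that takes exactly one gene from each species.

    family: dict mapping species name -> list of gene identifiers
    (hypothetical layout chosen for this illustration).
    """
    species = sorted(family)
    for genes in product(*(family[s] for s in species)):
        yield dict(zip(species, genes))


# Toy family: 3 species, 5 genes, so two synologs (genes = species + 2).
family = {"Eco": ["b1", "b2"], "Sty": ["s1"], "Ype": ["y1", "y2"]}
combos = list(one_per_species_combinations(family))
```

The number of combinations is the product of the per-species gene counts, which grows quickly with the number of synologs and is consistent with the text's choice not to apply this procedure to families with more than two synologs.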
Families present in few species

For gene families present in fewer than six genomes, ML analyses either are not possible (when family members are present in fewer than four species) or might overestimate congruence (when pairs of very closely related genomes are included, such as the two Yersinia or the two xanthomonads). To further evaluate the incidence of LGT in gene families distributed in two to five genomes, we inferred an initial acquisition event in the most recent ancestor of the species containing a homolog and tallied the minimum number of independent events of loss required to explain the phylogenetic distribution. Families requiring the inference of zero, one, or two losses can most readily be interpreted as vertically transmitted following their origin in the shared ancestor. In contrast, families requiring inference of many losses would be most reasonably interpreted as having undergone multiple acquisition events from outside sources or transfer between lineages of γ-Proteobacteria.

Accession Numbers

The GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html) accession numbers for genomes discussed in this paper are Escherichia coli K12 (NC_000913), Buchnera aphidicola APS (NC_002528), Haemophilus influenzae Rd (NC_000907), Pasteurella multocida Pm70 (NC_002663), Salmonella enterica serovar Typhimurium LT2 (NC_003197), Yersinia pestis CO-92 (NC_003143), Yersinia pestis KIM5 P12 (NC_004088), Vibrio cholerae (NC_002505 [chromosome I] and NC_002506 [chromosome II]), Xanthomonas axonopodis pv. citri 306 (NC_003919), Xanthomonas campestris (NC_003902), Xylella fastidiosa 9a5c (NC_002488), Pseudomonas aeruginosa PA01 (NC_002516 [47]), and Wigglesworthia glossinidia brevipalpis (NC_004344).

Acknowledgments

Financial support was provided by Department of Energy grant DEFG0301ER63147 to HO and National Science Foundation grant 0313737 to NAM.

Competing interests. The authors have declared that no competing interests exist.
Footnotes

Author contributions. EL, VD, and NAM conceived and designed the experiments. EL performed the experiments. EL, VD, HO, and NAM analyzed the data. VD contributed reagents/materials/analysis tools. EL, VD, HO, and NAM wrote the paper.

Citation: Lerat E, Daubin V, Ochman H, Moran NA (2005) Evolutionary origins of genomic repertoires in bacteria. PLoS Biol 3(5): e130.