Copyright © 2005 Guy et al; licensee BioMed Central Ltd.

Genometrics as an essential tool for the assembly of whole genome sequences: the example of the chromosome of Bifidobacterium longum NCC2705

1Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, CH-1015 Lausanne, Switzerland

Corresponding author.
Lionel Guy: lionel.guy/at/unil.ch; Dimitri Karamata: dimitri.karamata/at/unil.ch; Philippe Moreillon: philippe.moreillon/at/unil.ch; Claude-Alain H Roten: claude-alain.roten/at/unil.ch

Received April 29, 2005; Accepted October 13, 2005.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Analysis of the first reported complete genome sequence of Bifidobacterium longum NCC2705, an actinobacterium colonizing the gastrointestinal tract, uncovered its proteomic relatedness to Streptomyces coelicolor and Mycobacterium tuberculosis. However, a rapid scrutiny by genometric methods revealed a genome organization totally different from that of all high-GC Gram-positive chromosomes sequenced so far.

Results

Generally, the cumulative GC- and ORF orientation skew curves of prokaryotic genomes consist of two linear segments of opposite slope: the minimum and the maximum of the curves correspond to the origin and the terminus of chromosome replication, respectively. However, analyses of the B. longum NCC2705 chromosome yielded six, instead of two, linear segments, while its dnaA locus, usually associated with the origin of replication, was not located at the minimum of the curves. Furthermore, the coorientation of gene transcription with replication was very low. Comparison with closely related actinobacteria strongly suggested that the chromosome of B. longum was misassembled, and the identification of two pairs of relatively long homologous DNA sequences offers the possibility of an alternative genome assembly, proposed below. By genometric criteria, this configuration displays all of the characters common to bacteria, in particular to related high-GC Gram-positives. In addition, it is compatible with the partially sequenced genome of B. longum strain DJO10A. Recently, a corrected sequence of B. longum NCC2705, with a configuration similar to the one proposed here, has been deposited in GenBank, confirming our predictions.

Conclusion

Genometric analyses, in conjunction with standard bioinformatic tools and knowledge of bacterial chromosome architecture, represent fast and straightforward methods for the evaluation of chromosome assembly.

Background

Bifidobacterium longum is an obligate anaerobe, belonging to the Actinomycetales, a branch of the high-GC Gram-positive bacteria which includes, among others, corynebacteria, mycobacteria and streptomycetes. B. longum is a natural colonizer of the gastrointestinal tract (GIT) and the vagina [1]. It is one of the very first bacteria which colonize the sterile GIT of newborns, predominating in breast-fed infants until weaning [2]. Thereafter, its numerical importance decreases, while Bacteroides and other taxa replace it [3]. B. longum, a harmless bacterium, considered to play an important role in maintaining a healthy GIT by preventing diarrhea, improving lactose intolerance, and participating in immunomodulation [1], is now widely used in health-promoting foods.

Recently, the whole genome of B. longum strain NCC2705 has been sequenced [4]. Comparison with other high-GC Gram-positives revealed high levels of protein homology with Streptomyces coelicolor A3(2) (34% of best hits), Mycobacterium tuberculosis (9.3% of best hits) and, to a lesser extent, with other actinobacteria as well as with some unrelated genera such as Clostridium and Streptococcus.
Surprisingly, it contains a very high number of genetic entities related to mobile elements such as transposons and plasmids. There are 14 integrases/recombinases, 16 intact insertion sequences (ISs), many remnants of ISs, one integrated plasmid, many possible remnants of integrated plasmids and prophages. The origin and the terminus of chromosome replication of B. longum NCC2705 could not be accurately localized along the initial whole genome sequence. Today, DJO10A, another strain of B. longum, is almost fully sequenced, but not assembled. The presence of many ISs and IS remnants in B. longum NCC2705 leaves open the possibility of major chromosomal rearrangements [4]. These internal recombination events were already advanced to explain the poor conservation of gene order during the evolution of prokaryotic genomes [5]. It appears that major chromosomal rearrangements almost always consist in inversions of a segment of the chromosome centered on the origin of replication [6,7]. Other inversions are probably counter-selected [8], since, unlike inversions around the origin of replication, they change the orientation of transcription relative to DNA replication or they change the length of chromosome arms. Such events have adverse effects on both replication speed and transcription [9,10]. Alternatively, it has been proposed that rearrangements preferentially centered on the origin of replication are favored by the bidirectional DNA replication: starting simultaneously from the origin of chromosome replication, the two replication forks are at the same distance from it and are likely to be in close contact [6]. Thus, DNA breaks produced by topoisomerases would generate structures suitable for recombinations between the two chromosomal arms, leading to origin-centered rearrangements [6]. The coorientation between gene transcription and DNA replication is apparently a fundamental feature of bacterial chromosome architecture. 
More specifically, ORFs and tRNA genes follow a similar tendency and all so far identified ribosomal RNA operons are cooriented with DNA replication [11,12]. The asymmetric bias in the nucleotide composition at the genome level is another relevant feature (for a review see [13]). The leading strand, defined by chromosome replication, is generally enriched in guanines (Gs) and depleted in cytosines (Cs). To explain this observation several proposals have been advanced: (i) a preferential usage of certain codons to avoid frameshifting during translation [14], (ii) the enrichment of coding sequences in purines so as to avoid mRNAs secondary structures [15,16], (iii) mutational biases targeting single-stranded DNA present during transcription [17] or (iv) during DNA replication [18]. Mechanisms that would lead to the observed biases in models (i) to (iii) rest on the widespread coorientation of gene transcription and chromosome replication (see above). These asymmetric biases have allowed to unambiguously determine the origin of replication in almost all bacteria [12,18-20] as well as the terminus of replication in a large majority of the species [12]. Genomic rearrangements are often highlighted by comparison of whole chromosomal sequences belonging to the same species or genus. For example, dot-plot analyses revealed two recombination events in Streptococcus pyogenes SSI-1, with respect to other S. pyogenes strains, leading to an inversion around the origin of replication [21]. Since they do not change the orientation of transcription relative to DNA replication, these symmetric rearrangement were not revealed by nucleotide bias (skew) analysis. However, several examples of asymmetric rearrangements are known, pointed out by nucleotide skew analyses. Several isolates of the original clone of Pseudomonas aeruginosa PAO1, a high-GC gamma-proteobacterium and an opportunistic pathogen, have an inversion of a third of their chromosome. 
The inversion occurs by homologous recombination between two rRNA loci: rrnA and rrnB. As a consequence, the circular chromosome is divided into two unequal arms of one and two thirds, instead of the usual two halves [22]. This asymmetry is obvious on a cumulative GC- or TA skew curve (see the P. aeruginosa PAO1 page on the Comparative Genometrics website [23]). The citrus pathogen Xylella fastidiosa 9a5c is another example of asymmetric rearrangements, highlighted by comparison with another X. fastidiosa strain, Temecula1 [24,25]. In this case, the rearrangements occur between three pairs of prophage-related elements [25], also dividing the chromosome of strain 9a5c into two arms of unequal lengths (one third and two thirds), as in P. aeruginosa PAO1 (see pages for X. fastidiosa strains on Comparative Genometrics [23]). In Yersinia pestis strains, the high number of insertion sequence (IS) copies leads to frequent recombination events, inverting segments of the chromosome and changing their orientation of transcription with respect to replication. These inversions are easily spotted on a GC skew plot (see pages for Y. pestis strains on Comparative Genometrics [23]). In all three cases above, the rearrangements occur naturally, and do not constitute an incorrect genome assembly.

In this contribution we assess the assembly of the initially deposited genome sequence of B. longum NCC2705 by genometric methods, rapid and efficient tools suitable for testing the assembly of prokaryotic chromosomes [23]. Our analysis, strongly suggesting that the chromosome of B. longum NCC2705 was initially misassembled, was confirmed by Schell et al. [26] during the review of this contribution.

Results and discussion

Analysis of the initial Bifidobacterium longum NCC2705 genome sequence

Investigation with genometric tools of the initially released nucleotide sequence of B. longum NCC2705 (Configuration I [GenBank:NC_004307.1]) revealed several atypical features.
First, cumulative GC skew on the first codon position, as well as the cumulative ORF orientation skew, yielded a curve with six significant changes of the slope sign. Furthermore, the dnaA gene was not located at the lowest minimum of the curve (Figure 1A).
The presence of dnaA at a place other than the minimum of the cumulative ORF orientation curve has never been reported in high-GC Gram-positive bacteria. For a large majority of bacterial species, it was shown that dnaA, a gene whose product binds to the origin of replication and participates in the initiation of replication, is located close to the origin [12,28]. More generally, in archaea, gene orc1/cdc6, which encodes the archaeal counterpart of DnaA, is very often also located close to the origin of chromosome replication. Finally, in sequenced genomes, half of archaea and most of bacterial genes encoding origin binding proteins are close to the minimum of the cumulative GC skew and ORF orientation curves [12]. Second, in the first published B. longum NCC2705 sequence, coorientation indexes (CI), i.e. the proportion of genes or of a given subset of genes that are transcribed in the direction of chromosome replication, revealed significant anomalies. Indeed, the CI of protein encoding genes and tRNAs was 0.48 and 0.47, respectively, while only one out of the four rRNA operons was cooriented (Table 1). These low CIs are most uncommon in bacteria where it has been shown that the majority of protein encoding genes [11] as well as of tRNA genes are cooriented with chromosome replication (i.e. CIs higher than 0.5), while, so far, a strict coorientation of rRNA operons constitutes a universal rule in prokaryotes [12]. More specifically, CIs of protein encoding genes of M. tuberculosis CDC1551 and S. coelicolor A3(2), two high G+C Gram-positives related to B. longum NCC2705, are 0.59 and 0.55, respectively. CIs of tRNAs of the same species are similar, i.e. 0.62 and 0.57, respectively, whereas all rRNA operons are cooriented (Table 1).
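The coorientation index described above can be illustrated with a minimal Python sketch. It assumes a circular chromosome, genes given as hypothetical `(start, end, strand)` tuples with strand `+1` for the Watson strand and `-1` for the Crick strand, and an origin and terminus already located by other means. The function names and the gene representation are illustrative assumptions, not the procedure of [12].

```python
def on_watson_leading_arc(pos, ori, ter, genome_len):
    """True if pos lies on the arc running from ori to ter in the
    direction of increasing coordinates (wrapping around the circle).
    On this arc the replication fork moves along the Watson strand."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def coorientation_index(genes, ori, ter, genome_len):
    """Fraction of genes transcribed in the direction of replication.

    genes: iterable of (start, end, strand), strand = +1 or -1.
    """
    cooriented = 0
    total = 0
    for start, end, strand in genes:
        center = (start + end) // 2
        # Direction of fork movement at the gene: +1 on the ori->ter arc,
        # -1 on the other replichore.
        fork = 1 if on_watson_leading_arc(center, ori, ter, genome_len) else -1
        if strand == fork:
            cooriented += 1
        total += 1
    if total == 0:
        raise ValueError("no genes supplied")
    return cooriented / total


# Toy chromosome of 1000 bp, origin at 0, terminus at 500.
toy_genes = [(10, 100, 1), (200, 300, -1), (600, 700, -1), (800, 900, -1)]
print(coorientation_index(toy_genes, ori=0, ter=500, genome_len=1000))
```

In this toy case three of the four genes follow the fork, so the index exceeds 0.5, as it does for most bacterial genomes. Values of 0.48 and 0.47, as reported for configuration I, mean that slightly fewer than half of the genes are cooriented.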
Third, between-species whole-genome alignments of B. longum NCC2705 and M. tuberculosis CDC1551 or S. coelicolor A3(2) revealed a very poor conservation of gene order (Figure 2).
In summary, genometric analyses revealed major anomalies in the organization of the B. longum NCC2705 genome: (i) several changes in the sign of the slope of the cumulative nucleotide skew curves, and location of the dnaA gene far from the minimum of the curve, (ii) low gene coorientation indexes and (iii) absence of correlation between B. longum NCC2705 and related species in between-species whole-genome alignments.

Relationship of B. longum strains NCC2705 and DJO10A

Availability of numerous contigs of the genome of DJO10A, another B. longum strain, strongly suggested that the initially reported sequence of the NCC2705 chromosome could have been incorrectly assembled, or had undergone major chromosomal rearrangements. Indeed, BLAST results reveal three long DJO10A scaffolds (numbers 1, 8 and 9), each with a large number of hits in two different regions of the B. longum NCC2705 chromosome (Figure 1A).
The presence of these insertion sequences (Table 2) at putative recombination sites offers a straightforward way to account for chromosomal rearrangements which can mediate the shift between the initial configuration of strain NCC2705 and the putative configuration of strain DJO10A, designated below configurations I and II, respectively (Figure 3).
Analysis of the configuration II of the B. longum NCC2705 chromosome

Genometric analyses of the B. longum NCC2705 chromosome in configuration II reveal a genome architecture typical of high-GC Gram-positive organisms. Indeed, the cumulative GC skew curve computed on the first codon positions and the cumulative ORF orientation skew curves are very similar to those characteristic of high-GC Gram-positive chromosomes (Figure 1C).

A genome sequence corresponding approximately to configuration II of B. longum NCC2705 [GenBank:NC_004307.2] has been recently deposited in the GenBank database by Schell et al. [26]. Whereas we hypothesized that the two pairs of IS, ISBlo2a, ISBlo2b, ISBlo5c and ISBlo5d, were the only four chromosomal rearrangement loci, these authors found experimental evidence that, moreover, the initial sequence of NCC2705 had been misassembled at the level of the three ribosomal RNA operons. The sequence in configuration II proposed in this contribution has three DNA segments (totaling 226 kb, i.e. 10% of the genome) which are assembled differently from the corrected sequence. These assembly discrepancies have only very limited consequences on the results of our analyses.

Thus, configuration II, similar to the recently deposited sequence of the B. longum NCC2705 genome [GenBank:NC_004307.2], is endowed with all chromosomal features common to high-GC Gram-positive bacteria: (i) cumulative GC and ORF orientation skew curves are typical and the dnaA gene is located at the minimum of the curves, (ii) between-species whole-genome alignments provide the expected X-shape and the coefficients of correlation are relatively close to 1 or -1 and (iii) relevant contigs of B. longum DJO10A are each homologous to a single continuous region of the proposed NCC2705 chromosome.
Conclusion Genometric analyses – nucleotide skews, coorientation indexes, BLAST comparisons and between-species whole-genome alignments – revealed a most peculiar chromosomal architecture of the initially reported sequence of the B. longum NCC2705 genome. This observation may have two explanations. First, it is highly probable that in the final stages of the sequencing process, the genome of B. longum NCC2705 was misassembled, a possibility presently favored by Schell and coworkers, in particular since independently performed long-range PCR experiments confirm the presence of configuration II, but could not detect configuration I (F. Arigoni, personal communication). This is in full agreement with genometric analyses of more than 150 genome sequences ([23] and supplementary material of Tillier and Collins [20,33]) which reveal a near universal architecture of prokaryotic chromosomes, also found in configuration II of B. longum NCC2705. Second, the genome of NCC2705 may possibly undergo major chromosomal rearrangements, yielding either of the two alternative configurations I and II. The interconversion between them, achieved by two crossovers between two pairs of homologous insertion sequences (IS), would have drastic consequences. In particular, in configuration II, the cooriented transcription of the majority of the genes, including all rRNA operons, with chromosome replication would allow higher growth rates in suitable conditions. Absence of significant coorientation in configuration I due to the inversion of large segments b and d, representing about 50% of the chromosome, would probably considerably increase the generation time because of collisions between the DNA- and the RNA polymerase [9]. However, as discussed here above, the adverse effects of inversions, other than those around the origin of replication, would render the existence of configuration I highly unlikely. Actually, the latter has apparently not been detected in long range PCR experiments (F. 
Arigoni, personal communication). However, inversions of relatively long segments, leading to the antiorientation of the majority of the genes in the segment, have been reported, for example, in Yersinia pestis [34-37], and thus may not be completely excluded in B. longum NCC2705.

For the first time, our analyses illustrate the potential of fast and straightforward genometric methods to test genome assembly. They almost immediately revealed gross anomalies of the initially published B. longum NCC2705 sequence, pointing to an incorrect assembly. In conclusion, although their results have to be supported by experimental verification, these simple and powerful tools are essential for the assembly of a chromosome sequence, and for its final validation.

Methods

Sequences

Full genome sequences and annotation files of B. longum NCC2705, S. coelicolor A3(2), M. tuberculosis CDC1551 and contigs of an unfinished sequence of B. longum DJO10A were retrieved from the NCBI database [38,39]. For B. longum NCC2705, the initial [GenBank:NC_004307.1] and the second [GenBank:NC_004307.2] versions of the chromosome (released on August 27, 2002 and on January 21, 2005, respectively) were downloaded. The sequence of configuration II as proposed in this contribution, and both the initial and corrected versions of the B. longum NCC2705 genome, are available in FASTA format [40]. As proposed by Cebrat et al. [41], we term the genome sequences available in databases and those of the complementary strands the Watson and Crick strands, respectively.

Genome analyses

Nucleotide skews

As defined by Lobry [18], a GC skew is the difference between the number of Gs and Cs normalized to the G+C content. In our contribution we used the non-normalized nucleotide skew, calculated in 1-kb windows along the genome. In the genomic GC skew, the whole genome sequence is used. For GC skew on the first codon position, only nucleotides at the first position of codons are considered for the skew calculation.
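The two skew definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequence is held in memory as a plain string of the Watson strand; the function names are hypothetical and the code is not the software used for the analyses.

```python
def gc_skew_windows(seq, window=1000):
    """Non-normalized GC skew (count of G minus count of C) for
    consecutive, non-overlapping windows along the sequence.
    The last window may be shorter than `window`."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        skews.append(chunk.count("G") - chunk.count("C"))
    return skews


def normalized_gc_skew(chunk):
    """Normalized skew, (G - C) / (G + C), for one window.
    Returns 0.0 for a window containing neither G nor C."""
    chunk = chunk.upper()
    g = chunk.count("G")
    c = chunk.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


# Toy sequence split into two 4-nt windows.
print(gc_skew_windows("GGGCCCCG", window=4))
print(normalized_gc_skew("GGGC"))
```

The non-normalized variant keeps each window's contribution proportional to its absolute excess of G over C, which is what the cumulative curves sum.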
Cumulative nucleotide skews Slightly different from the definition of Grigoriev [19], the cumulative nucleotide skew of any given window is the nucleotide skew of the latter (see above) added to the sum of skews of all preceding windows. Cumulative ORF orientation skews As in [20], in the ORF orientation skew analysis, the value attributed to each ORF corresponds to its length, considered as positive if the ORF is located on the Watson strand, and negative if encoded on the Crick strand. The cumulative ORF orientation analysis is calculated as a cumulative nucleotide skew by replacing windows and GC skews by genes and ORF orientation skews: the value corresponding to a given ORF is added to the sum of the values of all upstream-located ORFs. A cumulative ORF orientation skew is represented as a function of the position of the center of each gene. We used the number of nucleotide per gene, and not the number of ORFs to normalize the signal to the length of the gene, otherwise, in the cumulative ORF orientation skew plot, small genes would have a greater importance than long ones. Coorientation Indexes (CI) For all genomes, coorientation indexes (CI), i.e. the percentage of all or of certain categories of genes – protein encoding genes, rRNAs, tRNAs – transcribed in the direction of DNA replication, were calculated according to [12]. For that purpose, the origin and the terminus of chromosome replication are determined by cumulative GC skew. For B. longum NCC2705, where the cumulative GC skew did not reveal the origin and/or the terminus of replication, the first codon position cumulative GC skew and ORF orientation skews were used. In most so far sequenced bacterial genomes, the origin of replication is located at the minimum of the cumulative skew curves. S. coelicolor A3(2), that has an extremely high G+C content, is an exception since its origin of replication is located at the maximum of the genomic GC skew curve. 
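The cumulative curves defined in this section can be sketched as follows. This is a minimal illustration in Python, assuming ORFs are given as hypothetical `(start, end, strand)` tuples with 1-based inclusive coordinates and strand `+1` for Watson and `-1` for Crick. Reading the minimum and maximum of the curve as candidate origin and terminus follows the general rule stated above and does not hold for exceptions such as S. coelicolor A3(2).

```python
from itertools import accumulate


def cumulative_skew(window_skews):
    """Running sum of per-window skews: each value is the skew of the
    window added to the sum of the skews of all preceding windows."""
    return list(accumulate(window_skews))


def cumulative_orf_orientation_skew(orfs):
    """Cumulative ORF orientation skew.

    Each ORF contributes its length in nucleotides, counted as positive
    on the Watson strand and negative on the Crick strand. Returns a list
    of (gene_center, cumulative_value) points ordered along the genome.
    """
    points = []
    running = 0
    for start, end, strand in sorted(orfs):
        length = end - start + 1
        running += length if strand > 0 else -length
        points.append(((start + end) / 2, running))
    return points


def curve_extrema(points):
    """Positions of the minimum and maximum of a cumulative curve,
    i.e. candidate origin and terminus under the usual rule."""
    x_min = min(points, key=lambda p: p[1])[0]
    x_max = max(points, key=lambda p: p[1])[0]
    return x_min, x_max


toy_orfs = [(1, 100, -1), (101, 300, -1), (301, 400, 1), (401, 700, 1)]
curve = cumulative_orf_orientation_skew(toy_orfs)
print(curve)
print(curve_extrema(curve))
```

A chromosome with the usual architecture gives a curve with a single minimum and a single maximum. Additional changes in the sign of the slope, such as the six segments found in configuration I, point to inverted segments.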
Generally, the origin of replication was shown to be close to the dnaA gene. The terminus of replication is assumed to be at the maximum of the skew curves, except in S. coelicolor A3(2), where it is assumed to be at the minimum, corresponding to both ends of the linear chromosome. However, for the first reported sequence of B. longum NCC2705, where the skew analyses did not provide the origin or the terminus of replication, we assumed that they are respectively located close to dnaA and at the putative terminus of replication in the integrated plasmid, about 180° from the dnaA gene on the circular chromosome.

BLAST

Basic Local Alignment Search Tool (BLAST) 2.2.4 [44] analysis was performed with the software kindly provided by the NCBI [38] using as a cutoff an expected E-value of 10⁻² for alignments of the full genome sequence of B. longum NCC2705 vs. the available contigs of B. longum DJO10A. An E-value of 10⁻² indicates that, on average, 10⁻² hits with the same or a better alignment score are expected by chance when searching the same database with a random sequence; for such small values this is close to the probability of obtaining at least one such hit. BLAST results with an alignment length below 1000 nucleotides were discarded. BLAST analysis was performed with an expected E-value of 10 for alignments of S. coelicolor A3(2) and M. tuberculosis CDC1551 vs. B. longum NCC2705 in its actual as well as putative alternative chromosomal configuration. For the latter analyses only, hits with an alignment score below 100 were discarded. A hit is defined as direct if the DNA segments are in the same orientation in both genomes, and as indirect if they are in opposite orientations.

Between-species alignments of whole genomes

Also called dot-plot analyses [6,21], genome-to-genome comparisons were achieved according to [45]. The relative positions of homologous segments in pairwise comparisons of bacterial genomes were determined by BLAST (see above). Correlation coefficients of type II regression (major axis regression) were determined for both direct and indirect hit subsets.
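The correlation step can be sketched as follows, under two assumptions that are not taken from the paper: hits are supplied as hypothetical `(query_position, subject_position, same_orientation)` tuples, and the reported coefficient is the ordinary Pearson product-moment correlation, which does not depend on whether a type I or type II regression line is fitted.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant positions")
    return sxy / sqrt(sxx * syy)


def hit_correlations(hits):
    """Correlation of hit positions, computed separately for direct hits
    (same orientation in both genomes) and indirect hits (opposite)."""
    direct = [(q, s) for q, s, same in hits if same]
    indirect = [(q, s) for q, s, same in hits if not same]
    r_direct = pearson_r(*zip(*direct))
    r_indirect = pearson_r(*zip(*indirect))
    return r_direct, r_indirect


# Toy hits forming a perfect X shape on a dot plot.
toy_hits = [
    (100, 100, True), (200, 200, True), (300, 300, True),
    (100, 900, False), (200, 800, False), (300, 700, False),
]
print(hit_correlations(toy_hits))
```

The toy hits lie on the two diagonals of an X-shaped dot plot, the pattern expected when only origin-centered inversions separate two genomes.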
If a genome had undergone only exactly symmetric rearrangements around the origin of replication, the correlation coefficients of the direct and indirect BLAST hit sets would be 1 and -1, respectively. Correlation coefficients close to zero show no correlation between relative chromosomal positions of homologous segments.

Accession numbers

Bifidobacterium longum NCC2705, [GenBank:NC_004307.1] and [GenBank:NC_004307.2]; S. coelicolor A3(2), [GenBank:NC_003888]; M. tuberculosis CDC1551, [GenBank:NC_002755]; B. longum DJO10A, [GenBank:NZ_AABM00000000].

List of abbreviations

kb, kilobase; A, adenine; C, cytosine; G, guanine; T, thymine; rRNA, ribosomal RNA; tRNA, transfer RNA.

Authors' contributions

LG carried out the analyses during his MSc thesis, supervised by CAR who designed and managed the project. LG and CAR proposed the genome configuration II. LG, DK, PM and CAR participated in the interpretation of the results. LG drafted the manuscript in collaboration with CAR and DK. All authors read and approved the final manuscript.

Acknowledgements

We would like to thank Alexandre Panchaud who drew our attention to the particular genome configuration presented by the first reported sequence of B. longum NCC2705. We warmly thank Fabrizio Arigoni and Bernard Berger for sharing unpublished results in discussions initiated by our poster presentation at Genomes 2004: International Conference on the Analysis of Microbial and Other Genomes (Guy L., Karamata D., Moreillon P. and Roten CAH, The genome of Bifidobacterium longum NCC2705: an example of major chromosomal rearrangements revealed by genometric analyses. April 14–17, 2004, The Wellcome Trust Conference Centre, Cambridge, UK). We are particularly grateful to them for informing us in late 2004, after submission of the first version of our paper (July 2004), that they and their colleagues consider that their initial published sequence of B. longum NCC2705 was misassembled.