J Bacteriol. 2007 Jan; 189(2): 377–387.
Published online 2006 Nov 3. doi:  10.1128/JB.00999-06
PMCID: PMC1797399

Genomic Structure and Phylogeny of the Plant Pathogen Ralstonia solanacearum Inferred from Gene Distribution Analysis


In the present study, we investigated the gene distribution among strains of the highly polymorphic plant pathogenic β-proteobacterium Ralstonia solanacearum, paying particular attention to the status of known or candidate pathogenicity genes. Using comparative genomic hybridization on a pangenomic microarray for the GMI1000 reference strain, we defined the conditions that allow comparison of the gene repertoires of a collection of 18 strains representative of the biodiversity of the R. solanacearum species. This identified a list of 2,690 core genes present in all tested strains. As a corollary, a list of 2,338 variable genes within the R. solanacearum species has been defined. The hierarchical clustering based on the distribution of variable genes is fully consistent with the phylotype classification that was previously defined from the nucleotide sequence analysis of four genes. The presence of numerous pathogenicity-related genes in the core genome indicates that R. solanacearum is an ancestral pathogen. The results establish the long coevolution of the two replicons that constitute the bacterial genome. We also demonstrate the clustering of variable genes in genomic islands. Most genomic islands are included in regions with an alternative codon usage, suggesting that they originate from acquisition of foreign genes through lateral gene transfer. Other genomic islands correspond to genes that have the same base composition as core genes, suggesting that they might either be ancestral genes lost by deletion in certain strains or originate from horizontal gene transfer.

The analysis of complete genome sequences of bacteria has revealed considerable variability in the organization of these genomes. In some species, such as Mycobacterium tuberculosis, very few differences are seen among strains, whereas in other species, a large proportion of genes may be variable from strain to strain (28, 40). These analyses also established that several bacteria have developed a composite genome comprising one or several large replicons and a variable number of phages and plasmids in addition to the basic chromosome (5, 9, 16, 20, 21, 36). In the case of bacteria with such a composite genome, analysis of the gene distribution among the different replicons provides some clues about how these genomes may have emerged and evolved and how they have acquired particular properties such as pathogenicity. Recently, the development of comparative genomic hybridization (CGH) using microarray technology has provided a new means to explore, on a genome-wide scale, the genetic diversity within any particular group of bacteria and to shed new light on these evolutionary processes. In particular, this approach makes it possible to distinguish between “core” (conserved) and “variable” (unevenly distributed) genes within numerous species. Depending on the species, 1 to 45% of the genes have been assigned as variable genes (6, 14, 29, 38, 42). This diversity is believed to result from duplication or loss of existing genes as well as from gene gain from foreign sources by lateral gene transfer (LGT) (23). LGT is one of the main mechanisms contributing to microbial genome diversification (12, 28, 30). The analysis of many complete bacterial genome sequences reveals a mosaic structure in which regions differing significantly in nucleotide composition and codon usage alternate with regions having an average genome composition and standard codon usage (8, 26). This mosaic structure provides evidence for the acquisition of genes through LGT.

Every variable gene originates from a historical event in the evolution of the species. The absence of these genes from one or more genomes could result either from presence in the ancestor followed by loss in some lineages or from absence in the ancestor followed by acquisition from a distant source. Exploring the distribution of the variable genes between different lineages of a species may therefore contribute to a better understanding of the evolutionary development of the species.

Comparative genomic hybridization is particularly fruitful in the case of Ralstonia solanacearum for tentatively defining an evolutionary scenario based on the distribution of the variable genes. R. solanacearum is a gram-negative β-proteobacterium that is the causative agent of bacterial wilt, one of the most severe and devastating vascular plant diseases in the world. This bacterium is characterized by a high level of phenotypic and genotypic diversity. Based on nucleotide sequence analysis of four genes, four monophyletic groups of strains, termed phylotypes, have been distinguished (13). These phylotypes correlate with the geographical origin of the strains: phylotype I includes strains originating primarily from Asia, phylotype II from America, phylotype III from Africa and surrounding islands in the Indian Ocean, and phylotype IV from Indonesia (13, 34). Studies of DNA-DNA hybridization have revealed that the relatedness between R. solanacearum genomes is often below the 70% threshold commonly expected within a bacterial species (31). This high genetic variation between isolates led to the definition of R. solanacearum as a “species complex,” a term first used by Gillings and Fahy (18). Taghavi et al. (39) then expanded the concept of the R. solanacearum species complex by including two closely related organisms from Indonesia, Ralstonia syzygii (a pathogen of clove) and the agent of blood disease of banana, as both of these organisms were found to fall within phylotype IV of R. solanacearum as defined by 16S rRNA gene sequence analysis. However, the nature of the genes responsible for such divergence is still mostly unknown, as are the molecular mechanisms that generated this diversity, which is often associated with high variation of phenotypic traits.

One characteristic of R. solanacearum that could explain this high level of diversity is its ability to naturally develop a state of competence and to exchange genetic material by horizontal gene transfer during the infection process (2, 3). The acquisition of numerous genes by horizontal gene transfer was further supported by the complete genome sequence analysis of the R. solanacearum GMI1000 strain (36). This analysis revealed the mosaic structure of both the 3.7-megabase chromosome and the 2.1-megabase megaplasmid that constitute the bacterial genome. On these two replicons, genomic regions with a biased G+C composition and alternative codon usage regions (ACURs) are dispersed within regions of standard composition. These ACURs encompass 7% of the genome and are evenly distributed over the two replicons (36).

In the present study, using strain GMI1000 as the reference strain, we establish the repertoire of genes that constitute the core genome. Based on the distribution of the variable genes among a collection of strains representative of R. solanacearum biodiversity, we also propose a tentative scenario for evolution of this species, with special attention to pathogenicity determinants.


Bacterial strains and growth conditions.

The R. solanacearum strains used in this study are shown in Table 1. In addition, R. syzygii strain R28, isolated from clove (Syzygium aromaticum) in Indonesia, and R. pickettii strain LMG6871, isolated from soil from a rice field in Senegal, were investigated. All strains were grown at 28°C in complete medium B or on a rotary shaker in 100 ml liquid minimal medium supplemented with 20 mM glutamate (3).

Characteristics of the Ralstonia strains used in this study

Microarray description.

The DNA microarray used in these experiments was generated by Occhialini et al. (27). This microarray encompasses 5,074 oligonucleotides representative of the 5,120 predicted genes from the GMI1000 strain of R. solanacearum and is composed of 70- or 65-mer oligonucleotides. This microarray also includes as negative controls 10 oligonucleotides corresponding to five Corynebacterium glutamicum genes and a set of “blank” controls in which buffer without oligonucleotide was spotted. The additional 26 oligonucleotides corresponding to R. solanacearum sequences not present in strain GMI1000 were not considered for the present analysis.

DNA labeling and hybridization.

Genomic DNA was extracted from fresh bacterial cultures as described by Chen and Kuo (7) and labeled with either Cy3 or Cy5 fluorescent dye (Amersham Biosciences) by using the BioPrime DNA labeling system kit (Invitrogen) according to the manufacturer's recommendations. For a 50-μl reaction mixture, 2 μg of genomic DNA in 23 μl of sterile water was heated at 95°C for 10 min, combined with 20 μl of 2.5× random primers solution, heated again at 95°C for 5 min, and chilled on ice. Remaining components were added to the following final concentrations: 0.12 mM dATP, dGTP, and dTTP; 0.06 mM dCTP; 0.02 mM Cy3- or Cy5-dCTP (Amersham Biosciences); 1 mM Tris-HCl (pH 8.0); 0.1 mM EDTA; and 40 units of Klenow fragment (Invitrogen). The solution was incubated at 37°C for 2 h before the reaction was stopped by adding EDTA (pH 8.0) to a final concentration of 45 mM. The fluorescence-labeled DNA was purified using the CyScribe GFX purification kit (Amersham Biosciences) and dissolved in 60 μl of elution buffer.

Hybridizations were carried out using a Lucidea automated slide processor (Amersham Pharmacia Biotech). Each experiment was run as a competitive hybridization by using Cy3-labeled DNA from one of the 18 strains of interest and Cy5-labeled DNA from GMI1000. No dye swapping was performed, since preliminary experiments had demonstrated that it had no significant impact on the final results. Microarrays were prehybridized for 1 h at 42°C in Dig Easy buffer (Roche) containing 385 μg ml−1 of salmon sperm DNA. Hybridization was done for 15 h under the same conditions after 1 μg each of Cy3- and Cy5-labeled DNA was added. Following hybridization, microarrays were washed in 1× SSC (0.15 M NaCl plus 0.015 M sodium citrate)-0.1% sodium dodecyl sulfate for 5 min at 60°C and then in 0.1× SSC for 5 min at room temperature, dried at 37°C for 5 min, and then washed by immersion in isopropanol and dried again at 37°C for 5 min. Hybridizations were systematically duplicated.

Array scanning and analysis.

Hybridized microarrays were scanned using a GenePix 4000A dual-channel (635 nm and 532 nm) confocal laser scanner (Axon Instruments) at a resolution of 10 μm per pixel. The laser power was set at 100, and the photomultiplier tube voltage was adjusted to between 680 and 800 V according to the average hybridization intensity of each slide in order to optimize the dynamic range of measurements. Signals from individual arrays were quantified using ImaGene 5.6.1 software and analyzed using Genesight 3.5.2 software (BioDiscovery, Inc.). Empty spots and spots with impurities, high local background fluorescence, or weak intensity compared to the signal observed for hybridization to the negative controls were excluded from analysis. For each spot, the ratio of the hybridization signal of the tested strain to that of the reference strain GMI1000 was calculated and log2 transformed, and the values thus obtained were normalized by subtracting the mean log2 ratio calculated on a set of 1,144 genes conserved in R. solanacearum strains. These conserved genes were selected from Blastp comparisons between the amino acid sequence of each individual gene from the reference strain GMI1000 and the genome draft of the phylogenetically distant strain IPO1609: conserved genes were those having a Blastp hit covering 100% of the query with at least 90% identity. Finally, the average log2 ratio of the four spots representing each gene (two slides with two spots for each gene) was calculated and used for further interpretation. Lists of the GMI1000 genes that are conserved in each tested strain were established by selecting the genes for which this average log2 ratio was above −2 (in other words, by excluding the genes for which the hybridization signal of the tested strain was at least four times weaker than that of the reference strain GMI1000). This cutoff value was chosen based on empirical optimization (see Results).
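The per-spot normalization and presence call described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ImaGene/Genesight procedure itself: spot intensities are assumed to be background corrected and already filtered, the normalization offset is assumed to be computed separately for each slide, and the function names are hypothetical.

```python
import math
from statistics import mean

def slide_log2_ratios(slide, conserved):
    """Normalized log2(test/reference) ratios for one slide.

    slide: dict gene_id -> list of (test_signal, reference_signal), one per spot.
    conserved: ids of the normalization genes (assumed present on the slide).
    """
    raw = {g: [math.log2(t / r) for t, r in spots] for g, spots in slide.items()}
    # Offset = mean log2 ratio over all spots of the conserved genes on this slide
    offset = mean(v for g in conserved if g in raw for v in raw[g])
    return {g: [v - offset for v in vals] for g, vals in raw.items()}

def call_presence(slides, conserved, cutoff=-2.0):
    """Average each gene's normalized spots over all slides and call the
    gene present when the average log2 ratio is above the cutoff."""
    pooled = {}
    for slide in slides:
        for g, vals in slide_log2_ratios(slide, conserved).items():
            pooled.setdefault(g, []).extend(vals)
    averages = {g: mean(vals) for g, vals in pooled.items()}
    present = {g for g, v in averages.items() if v > cutoff}
    return averages, present
```

With two slides carrying two spots per gene, each average covers four values, and `cutoff=-2.0` rejects genes whose signal is at least fourfold weaker than that of GMI1000.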

Hierarchical clustering.

Hierarchical clustering was performed on the final data set, in which each gene was coded by one of three values (0, absent; 1, present; or ?, missing data), using Genesight 3.5.2 software (BioDiscovery, Inc.). The Ward technique was used for cluster linkage and the Euclidean method for the distance metric. Phylogenetic trees were built using DARwin 4.0.290 software (32). Genetic distances were calculated based on the Sokal-Michener index: D(i,j) = u/(m + u), where m is the number of genes with the same status (present or absent) in strains i and j and u is the number of genes with different status in strains i and j. The distance matrix thus generated was used to build unweighted neighbor-joining trees, and 1,000 bootstrap replicates were performed.
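A minimal Python sketch of the Sokal-Michener distance D(i,j) = u/(m + u) and of one bootstrap replicate (resampling genes with replacement) follows. How DARwin treats missing data (“?”) is not described here, so skipping genes that are missing in either strain is an assumption, as are the function names.

```python
import random

def sokal_michener_distance(p, q):
    """D(i,j) = u / (m + u) between two presence/absence profiles.

    Profiles hold 1 (present), 0 (absent) or None (missing data, '?').
    Genes missing in either strain are skipped (assumption).
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genes")
    m = u = 0
    for a, b in zip(p, q):
        if a is None or b is None:
            continue
        if a == b:
            m += 1  # same status in both strains
        else:
            u += 1  # present in one strain, absent in the other
    if m + u == 0:
        raise ValueError("no gene scored in both strains")
    return u / (m + u)

def distance_matrix(profiles):
    """Return sorted strain names and the full pairwise distance matrix."""
    names = sorted(profiles)
    matrix = [[sokal_michener_distance(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix

def bootstrap_replicate(profiles, rng):
    """One bootstrap replicate: resample gene positions with replacement."""
    n_genes = len(next(iter(profiles.values())))
    columns = [rng.randrange(n_genes) for _ in range(n_genes)]
    return {strain: [p[c] for c in columns] for strain, p in profiles.items()}
```

Each of the 1,000 replicates would be converted into its own distance matrix and tree; bootstrap support for a clade is then the fraction of replicate trees that contain it.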

In silico genome comparisons.

In the comparisons of the R. solanacearum proteome (or proteome subsets) with other proteomes, an R. solanacearum gene was considered to have an ortholog in a test genome when the two proteins were best reciprocal hits in Blastp alignments with an E-value below 10^−6 and an alignment covering at least 50% of the length of both proteins.
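The best-reciprocal-hit criterion can be sketched as follows. The sketch assumes the Blastp results have already been parsed into records holding the E-value and the fractions of both protein lengths covered; the `Hit` record and function names are hypothetical, and only hits passing both thresholds compete for “best hit.”

```python
from typing import NamedTuple

class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    query_cov: float    # fraction of the query length covered by the alignment
    subject_cov: float  # fraction of the subject length covered

def best_hits(hits, max_evalue=1e-6, min_cov=0.5):
    """For each query, keep the qualifying hit with the lowest E-value."""
    best = {}
    for h in hits:
        if h.evalue >= max_evalue or h.query_cov < min_cov or h.subject_cov < min_cov:
            continue  # fails the E-value or the 50% coverage requirement
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}

def reciprocal_best_hits(forward, reverse, **thresholds):
    """forward: reference vs. test proteome hits; reverse: test vs. reference."""
    fwd = best_hits(forward, **thresholds)
    rev = best_hits(reverse, **thresholds)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}
```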

Microarray data accession numbers.

All primary data from microarray experiments as well as experimental protocols used are available from the ArrayExpress depository (accession numbers A-MEXP-152 and E-MEXP-851 at http://www.ebi.ac.uk/arrayexpress/).


The strain GMI1000 microarray has the potential for detection of genes in distantly related R. solanacearum strains.

We first wanted to estimate the intrinsic ability of the GMI1000 microarray to identify which GMI1000 genes are actually conserved in a given strain. For that purpose, we estimated to what extent the oligonucleotides that had been designed for strain GMI1000 are representative of orthologous genes present in other strains. As a test strain, we used strain IPO1609, for which we have recently obtained a high-quality draft genome sequence (12× genome coverage) assembled in 16 supercontigs (our unpublished data). This strain is representative of the brown rot cluster of strains from phylotype II, which have long been recognized to be phenotypically and phylogenetically distant from strains of phylotype I such as GMI1000 (19, 34).

In the first step, we determined the list of genes that are conserved in the two strains. This was based on Blastp comparison of the amino acid sequence of each gene from strain GMI1000 with the genome sequence of IPO1609. A list of 2,963 conserved genes was defined, for which the Blastp hit covered at least 80% of the length of the query sequence with at least 80% identity at the amino acid level between the two strains. We also established a list of 488 genes from GMI1000 that are absent from IPO1609, for which the corresponding best Blastp hit covered less than 2% of the query sequence.
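The two Blastp-based lists above amount to a threshold rule on the best hit of each GMI1000 gene. A minimal sketch, assuming each gene's best hit has been reduced to (query coverage %, identity %) and that a gene with no hit at all counts as absent; genes falling between the two criteria stay unclassified. The function name is hypothetical.

```python
def classify_by_blastp(best_hit_stats):
    """best_hit_stats: dict gene -> (query_coverage_pct, identity_pct),
    or None when the gene has no hit. Returns (conserved, absent) sets."""
    conserved, absent = set(), set()
    for gene, stats in best_hit_stats.items():
        if stats is None:
            absent.add(gene)  # assumption: no hit at all means absent
            continue
        coverage, identity = stats
        if coverage >= 80 and identity >= 80:
            conserved.add(gene)
        elif coverage < 2:
            absent.add(gene)
        # genes between the two criteria are left unclassified
    return conserved, absent
```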

In the second step, we established a list of the oligonucleotides designed from strain GMI1000 that share identity with strain IPO1609. This was conducted based on Blastn comparison of the sequence of each oligonucleotide with the genome draft of IPO1609. The score for the best hit of each oligonucleotide was defined as the sum of a +1 value for each base match and a −1 value for each mismatch. This identified a list of 3,463 oligonucleotides having a minimal score of 45 (ranging from 84% identity over the entire oligonucleotide length to 92% identity over 53 consecutive base pairs) that were considered to be sufficiently conserved to give a positive signal when the microarray was hybridized with genomic DNA from strain IPO1609.
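The +1/−1 scoring rule explains the range quoted for a score of 45: a 65-mer with 55 matches and 10 mismatches scores 55 − 10 = 45 over its full length (about 84% identity), and a 53-bp stretch with 49 matches and 4 mismatches also scores 45 (about 92% identity). The sketch below assumes the Blastn hit has already been turned into an ungapped alignment of equal-length strings and uses a maximum-subarray (Kadane) scan to find the best-scoring contiguous segment; gap handling is omitted, and the function names are hypothetical.

```python
MIN_SCORE = 45  # minimal score used to call an oligonucleotide conserved

def oligo_score(oligo, target):
    """Best ungapped local score (+1 per match, -1 per mismatch) between an
    oligonucleotide and an aligned, equal-length target sequence."""
    if len(oligo) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    best = current = 0
    for a, b in zip(oligo.upper(), target.upper()):
        # extend the current segment, or restart it if it drops below zero
        current = max(0, current + (1 if a == b else -1))
        best = max(best, current)
    return best

def is_conserved_oligo(oligo, target, min_score=MIN_SCORE):
    return oligo_score(oligo, target) >= min_score
```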

In the last step, we determined the intersections between the different lists thus generated. The list of the 3,463 oligonucleotides sharing identity with IPO1609 overlaps with 2,828 genes out of the 2,963 highly conserved genes, indicating that the oligonucleotides present on the microarray are potentially suitable for the detection of over 95% of the orthologous genes. The list of conserved oligonucleotides also overlaps with two genes out of the 488 GMI1000 genes known to be absent from strain IPO1609, therefore providing an estimation of 0.4% for the frequency of oligonucleotides that could lead to potential false-positive detection of a gene. The remaining 633 oligonucleotides correspond to genes that are present in strain IPO1609 although they are more divergent.
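The percentages above follow from three set intersections. In this sketch the gene identifiers are placeholders chosen only to reproduce the counts reported in the text (2,963 conserved genes, 488 absent genes, 3,463 conserved oligonucleotides), and one oligonucleotide per gene is assumed.

```python
def array_performance(oligo_genes, conserved, absent):
    """oligo_genes: genes whose oligonucleotide has a sufficiently conserved
    match in the test genome. Returns (representativity, false-positive
    rate, number of oligos on divergent but present genes)."""
    representativity = len(oligo_genes & conserved) / len(conserved)
    false_positive_rate = len(oligo_genes & absent) / len(absent)
    divergent = oligo_genes - conserved - absent
    return representativity, false_positive_rate, len(divergent)

# Placeholder ids reproducing the reported counts
conserved = set(range(2963))
absent = set(range(10000, 10488))
oligo_genes = set(range(2828)) | {10000, 10001} | set(range(20000, 20633))
rep, fpr, n_divergent = array_performance(oligo_genes, conserved, absent)
```

Here `rep` is 2,828/2,963 (about 95.4%), `fpr` is 2/488 (about 0.4%), and `n_divergent` is 633.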

In conclusion, with 95% representativeness and a 0.4% rate of potential false positives, the GMI1000 microarray is well suited to investigate the distribution of GMI1000 orthologous genes in a distant R. solanacearum strain such as IPO1609.

Calibration of the CGH methodology used in this study.

Investigation of the presence of a specific gene in a test strain by using hybridization on microarrays is based on the comparison of the intensity of the hybridization signal obtained with the genomic DNA of the tested strain to that of the hybridization signal obtained with the genomic DNA of the reference strain. For this analysis it is thus essential to define a relative cutoff value under which a particular gene will be classified as absent (or sufficiently divergent). Based on genome sequence comparisons, absolute lists of conserved and absent genes can easily be established and compared to experimental lists drawn from hybridization experiments using different cutoff values in order to maximize the detection of conserved genes while maintaining an acceptable number of false positives. For that purpose, we performed comparative genomic hybridization with genomic DNAs of strains GMI1000 and IPO1609. The lists of “detected” and “nondetected” genes in strain IPO1609 were established based on three different cutoff values for the log2 ratio of hybridization signals (−1.5, −2, and −2.5). Each list of “detected” genes thus obtained was compared with the list of the 2,828 “conserved” genes previously identified based on in silico analysis and properly represented by an oligonucleotide. Results of these comparisons are shown in Table 2. Similarly, the proportion of false positives (genes that are not present in IPO1609 but that give a positive hybridization signal with IPO1609) was estimated by comparing the lists of the “detected” genes with the list of “absent” genes identified based on Blastp analysis. Results of these comparisons are also shown in Table 2. Together, these comparisons led to the choice of the cutoff value of −2 to be used in further experiments, since this value provides 95% detection for conserved genes while the proportion of “false-present” genes remains close to 5%.
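The calibration amounts to computing, for each candidate cutoff, the detection rate among in silico “conserved” genes and the “false-present” rate among in silico “absent” genes. A minimal sketch; the toy log2 values below are invented for illustration and are not data from Table 2, and the function name is hypothetical.

```python
def calibrate(avg_log2, conserved, absent, cutoffs=(-1.5, -2.0, -2.5)):
    """Return {cutoff: (detection_rate, false_present_rate)}, where a gene is
    'detected' when its averaged log2 ratio is above the cutoff."""
    table = {}
    for cutoff in cutoffs:
        detected = {g for g, v in avg_log2.items() if v > cutoff}
        detection = len(detected & conserved) / len(conserved)
        false_present = len(detected & absent) / len(absent)
        table[cutoff] = (detection, false_present)
    return table

# Invented toy values: three conserved genes and two absent genes
toy = {"a": 0.0, "b": -1.8, "c": -2.2, "x": -3.0, "y": -1.9}
result = calibrate(toy, conserved={"a", "b", "c"}, absent={"x", "y"})
```

Lowering the cutoff raises detection but also admits more false-present genes; the text reports that −2 gave the best balance for IPO1609.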

Calibration of an optimized cutoff value of the log2 ratio of hybridization signals of the tested strain IPO1609 relative to the reference strain GMI1000

Core genome definition and analysis.

In order to identify the set of genes common to all R. solanacearum strains, CGH experiments were conducted using 15 additional strains. In addition, a strain of R. syzygii, previously identified as being closely related to phylotype IV of R. solanacearum, and a strain of R. pickettii, chosen as an outgroup species, were included in these experiments. Using the previously defined cutoff value of −2.0, we found that the proportion of GMI1000 genes conserved in the other R. solanacearum strains varies from 68 to 98%, depending on the strain tested (Fig. 1). These data establish that 2,690 out of the 5,074 genes (53%) represented on the array are present in all R. solanacearum strains tested (see Table S1 in the supplemental material). Considering the genetic diversity of the strains used in these experiments, we assume that this set of genes fairly represents the R. solanacearum core genome. It should be noted that, with the exception of 185 genes, this set of core genes is shared with R. syzygii.
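Given per-strain presence calls, the core genome is the intersection of the detected-gene sets and the variable genes are the remainder. A minimal sketch with hypothetical function names; only R. solanacearum strains should be passed in, since R. syzygii and R. pickettii are not part of the core definition, and genes with missing data are not handled here.

```python
def core_and_variable(reference_genes, detected_by_strain):
    """detected_by_strain: dict strain -> set of reference genes called present."""
    core = set(reference_genes)
    for detected in detected_by_strain.values():
        core &= detected  # a core gene must be detected in every strain
    variable = set(reference_genes) - core
    return core, variable

def percent_conserved(reference_genes, detected):
    """Share of the reference genes detected in one strain, in percent."""
    ref = set(reference_genes)
    return 100.0 * len(ref & set(detected)) / len(ref)
```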

FIG. 1.
Clustering of the GMI1000 reference genes and of R. solanacearum strains based on gene distribution among the different strains, as determined from CGH experiments. For each individual strain, the presence of a particular gene is represented by a black ...

Concerning the distribution of the core genes thus identified on the two replicons of the GMI1000 genome, a large majority (2,079 genes, or 77%) are located on the chromosome, whereas 611 are on the megaplasmid. Therefore, core genes represent 61% of the genes from the chromosome and only 37% of the megaplasmid genes. Genes from the core genome were analyzed for the presence of orthologs (as defined by the existence of a best reciprocal hit) in the two pathogenic β-proteobacteria Burkholderia mallei (accession no. NC_006348 and NC_002349) and B. pseudomallei (accession no. NC_006350 and NC_006351), in the nonpathogenic β-proteobacterium Ralstonia eutropha (accession no. NC_007336, NC_007337, NC_007347, and NC_007348), in the human pathogen Pseudomonas aeruginosa (accession no. NC_002516), in the nonpathogenic γ-proteobacterium Escherichia coli K-12 (accession no. NC_000913), and in three plant pathogens representative of the major groups of plant-pathogenic gram-negative bacteria, i.e., Erwinia carotovora (accession no. NC_004547), Pseudomonas syringae (accession no. NC_004632, NC_004633, and NC_004578), and Xanthomonas campestris (accession no. NC_007086). This comparison identified a first set of 677 genes that are conserved in all eight organisms (see Table S2A in the supplemental material). The vast majority of these genes encode basic cell constituents, machineries, and metabolic pathways and therefore correspond to essential housekeeping genes. Among the remaining genes, 809 (referred to as β-specific) were found only in the β-proteobacteria (see Table S2B in the supplemental material). Close to 60% of the β-specific genes code for uncharacterized regulators, for transporters, or for proteins of unknown function and are therefore most likely involved in the adaptation of bacteria to specific ecological niches; 202 of them are conserved in all three β-proteobacteria tested.

With regard to pathogenicity, 152 genes from the R. solanacearum core genome have an ortholog in at least one of the other plant pathogens and are absent from E. coli and R. eutropha (see Table S2C in the supplemental material). These 152 genes include a set of established pathogenicity determinants such as those encoding constituents of the type III secretion machinery, a plant cell wall-degrading enzyme, and 10 genes known to be under the control of the hrpB or hrpG pathogenicity regulon (27, 41).
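The ortholog-based grouping of core genes described in the two paragraphs above can be expressed as set tests on the comparison organisms in which a gene has a best reciprocal hit. This is a sketch: the organism labels are shorthand, and the order in which the groups are tested is an assumption, since the text does not state how overlaps were resolved.

```python
BETA = {"B. mallei", "B. pseudomallei", "R. eutropha"}
GAMMA = {"P. aeruginosa", "E. coli K-12", "E. carotovora", "P. syringae", "X. campestris"}
OTHER_PLANT_PATHOGENS = {"E. carotovora", "P. syringae", "X. campestris"}
NONPATHOGENS = {"E. coli K-12", "R. eutropha"}

def ortholog_class(hits):
    """hits: comparison organisms in which the core gene has an ortholog."""
    hits = set(hits)
    if hits >= BETA | GAMMA:
        return "conserved in all eight"
    if hits & OTHER_PLANT_PATHOGENS and not hits & NONPATHOGENS:
        return "plant-pathogen associated"
    if hits and hits <= BETA:
        return "beta-specific"
    return "other"
```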

Variable genome definition and analysis.

A total of 2,338 genes, representing 46% of the GMI1000 genome, are absent (or too divergent to be detected) in at least one of the tested R. solanacearum strains (see Table S3 in the supplemental material). These genes approximate the subset of the species' variable genes that is present in GMI1000. They include 95% of the genes encoding elements of external origin (genes of class V) and 94% of the ACUR genes detected in the GMI1000 genome (Fig. 2). About 30% of the genes predicted to encode proteins that fall into functional categories I to IV were also classified in the variable genome. A large proportion (55%) of the variable genes encode hypothetical proteins (genes of class VI) (Fig. 2).

FIG. 2.
Distribution of core and variable genes within the different functional categories defined in the annotation of the genome of R. solanacearum strain GMI1000. Pathogenic regulon, set of genes that belong to the hrpB and hrpG pathogenicity regulon.

The variable genes thus identified are distributed throughout the two replicons of the GMI1000 genome: 1,313 variable genes are located on the chromosome and 1,025 on the megaplasmid. Nevertheless, variable genes are overrepresented on the megaplasmid (63% of the megaplasmid genes are variable) compared to the chromosome (39%). This is illustrated in Fig. 3, which presents the spatial distribution of variable genes along the two replicons together with the localization of ACURs, prophages, and insertion sequence elements and the local G+C composition. Genes assigned as variable are clearly not evenly distributed along the genome; instead, they tend to cluster within genomic islands. We could distinguish 48 islands of variable genes, 29 on the chromosome and 19 on the megaplasmid. The majority of these islands (75%) are included in ACURs, mobile genetic elements, and regions with a low G+C content compared to the rest of the genome (Fig. 3). Interestingly, when analyzed for base composition, core and variable genes show similar distributions, except that variable genes are overrepresented among low-G+C genes, giving their distribution a tail towards low G+C values (Fig. 4).
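The text does not state the criterion used to delimit islands of variable genes. As a purely hypothetical illustration, the sketch below treats an island as a run of at least `min_len` consecutive variable genes in GMI1000 gene order; the threshold is an assumed parameter, and replicon circularity is ignored.

```python
def variable_islands(gene_order, variable, min_len=5):
    """Return (first_index, last_index) of each run of at least min_len
    consecutive variable genes along one replicon (linear gene order).
    min_len is a hypothetical threshold, not a value from the study."""
    islands, start = [], None
    for i, gene in enumerate(list(gene_order) + [None]):  # sentinel closes a final run
        if gene is not None and gene in variable:
            if start is None:
                start = i  # a run of variable genes begins
        else:
            if start is not None and i - start >= min_len:
                islands.append((start, i - 1))
            start = None
    return islands
```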

FIG. 3.
Spatial distribution of core and variable genes on the chromosome (A) and on the megaplasmid (B) of R. solanacearum strain GMI1000. Each individual gene is represented by a vertical bar on lines 1. Core genes are represented in red, and variable genes ...
FIG. 4.
Distribution of core (gray bars) and variable (black bars) genes from GMI1000 according to the GC composition of the corresponding sequences. The inset is a magnification of the histogram for low-GC content genes.

Variations in the distribution of known pathogenicity determinants.

Many determinants of virulence/pathogenicity of R. solanacearum have been identified and studied at the molecular level during the last 20 years (17, 37). As can be expected since such determinants play important roles during the pathogen life cycle, most of these genes appear to be widely conserved within the species. This is the case, for example, for the genes driving the synthesis of the type II and type III secretion pathways, extracellular exopolysaccharide, the cell-density Phc regulatory system components, and most of the known plant cell wall-degrading enzymes.

In fact, variations in the distribution of known or suspected pathogenicity determinants were observed for two classes of genes: those encoding several hemagglutinin-related proteins (a class of surface proteins reported to be important for adhesion to plant surfaces) (35) and those encoding type III secretion system (TTSS) effectors. For example, the distribution of several hemagglutinin-encoding genes (RSc0887, RSc3188, RSp1073, and RSp1545) appears to be variable even among phylotype I strains, which are closely related to GMI1000. Our analysis also clearly reveals either substantial variation in type III effector gene content or substantial sequence divergence of these genes among taxonomically close strains belonging to the same phylotype. This is illustrated in Table 3 for the five phylotype III strains tested in this study, but it is also observed between strains grouped in the other phylotypes. Table 3 shows that at least 35 of the 80 candidate effector genes described in the species (10, 25, 27) have a variable distribution pattern in only five taxonomically related strains originating from the same geographical area. Interestingly, only 9 of these 35 effector genes appear to belong to ACURs, suggesting that a significant part of the effector set may be commonly subjected to acquisition/deletion events or may consist of fast-evolving genes with high intragenic sequence divergence.

Variations in the distribution of TTSS effector genes among five strains from phylotype III (African origin)

On the other hand, at least nine TTSS effector genes (including RSc1475, RSc3272, RSp0099, RSp0845, RSp0846, RSp0882, RSp1218, and RSp1281) were conserved among all 17 strains tested and thus constitute the ancestral effector core. Except for two of them (RSc1475 and RSp1218), these conserved effector genes were not detected in R. pickettii. Comparison of this core effector set with the proteomes of the three other plant pathogens, all γ-proteobacteria, revealed that an ortholog of one of them, an AvrE/DspA family member (RSp1281), can be found in all these species. This ubiquitous TTSS effector has been shown to be critical for pathogenicity in Erwinia amylovora and Pseudomonas syringae (4, 11).

Variable gene distribution correlates with R. solanacearum phylogeny.

Analysis of the distribution of the GMI1000 genes among the other strains included in our experiments shows that CFBP2968 is the most similar to GMI1000, with 98% of GMI1000 genes conserved (Fig. 1). CFBP2968 belongs to phylotype I, as does GMI1000. Consistently, the two other test strains from phylotype I, MAFF211266 and PSS190, are also among the most similar to GMI1000, with about 90% of GMI1000 genes conserved. Strains from phylotype II are the most distant from GMI1000, with only about 70% of GMI1000 genes conserved. Finally, 69% and 46% of GMI1000 genes are conserved in R. syzygii and R. pickettii, respectively. To further evaluate the relationships among all 19 strains, we performed a hierarchical clustering based on the genes detected in each strain (Fig. 1). Surprisingly, this clustering is fully consistent with the classification into four phylotypes previously established based on nucleotide sequence analysis of the internal transcribed spacer region and of the hrpB, mutS, and eglA genes (13, 33). Four clusters are distinguished, corresponding to the four phylotypes. R. pickettii is located outside the R. solanacearum group, and, as previously reported, R. syzygii appears within phylotype IV (Fig. 1). The same data set was further used to build neighbor-joining trees and to calculate bootstrap values. The same consistency is observed between trees based upon analysis either of partial mutS gene sequences or of the presence/absence of variable genes (Fig. 5a and b).
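For readers unfamiliar with tree building from a distance matrix, here is a compact sketch of standard (Saitou-Nei) neighbor joining that returns a Newick string. It is not the unweighted neighbor-joining variant implemented in DARwin, ties are broken by input order, and the five-taxon matrix used to exercise it is a common textbook example, not data from this study.

```python
def neighbor_joining(names, dist):
    """Standard neighbor joining on a symmetric distance matrix (list of lists).
    Taxon names must be unique and free of Newick special characters."""
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}  # row sums
        # Pick the pair minimizing Q(i,j) = (n - 2) d(i,j) - r(i) - r(j)
        f, g = min(((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined pair to their new parent node
        lf = 0.5 * d[f, g] + (r[f] - r[g]) / (2 * (n - 2))
        lg = d[f, g] - lf
        u = f"({f}:{lf:g},{g}:{lg:g})"
        for k in nodes:
            if k not in (f, g):
                d[u, k] = d[k, u] = 0.5 * (d[f, k] + d[g, k] - d[f, g])
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    a, b = nodes
    return f"({a}:{d[a, b]:g},{b}:0);"
```

Applying the same function to the distance matrix of each bootstrap replicate gives the replicate trees from which bootstrap support values are counted.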

FIG. 5.
Phylogenetic trees within the R. solanacearum species complex based upon analysis of partial mutS gene sequences and of the presence or absence of the GMI1000 genes. (a) mutS gene; (b) all GMI1000 genes; (c) chromosomal genes; (d) megaplasmid genes; ...

Phylogenetic trees were also independently built based on the distribution of variable genes on each individual replicon. The two trees thus constructed are consistent with the tree constructed based on the data from the complete genome with high bootstrap values (Fig. 5b, c, and d). Unexpectedly, a very similar clustering pattern was also found when a tree was made based on the distribution of genes encoded within ACURs (Fig. 5e). In contrast, clustering obtained based on the distribution of insertion sequences and phage genes (class V) is not correlated with the phylotypes (Fig. 5f). When the strain clustering was made exclusively on the basis of the distribution of the TTSS effectors or of the distribution of genes belonging to the hrpB and hrpG pathogenicity regulons (27, 41), the observed evolutionary trees were still congruent with the phylogeny obtained based on analysis of the complete genome (Fig. 5g and h).


In the present work, we have set up and optimized conditions for genome comparison of R. solanacearum based on CGH, using the complete genome of strain GMI1000 as the reference. This methodology was used to analyze gene diversity in a collection of 17 strains chosen as representative of the known diversity of the species, allowing us to distinguish the core genome from the set of variable genes.

High genomic variability within the species.

We found that only 2,690 (53%) of the 5,074 GMI1000 genes spotted on the array yielded a positive signal for all 17 R. solanacearum strains examined; these genes thus represent the core genetic content of the species. This percentage is very close to that obtained for Salmonella enterica, in which the core genetic content represented 54% of the 4,169 open reading frames of the reference genome when 24 strains were examined (6). However, this proportion is low compared to the 93% obtained for the opportunistic pathogen P. aeruginosa, which has a similar genome size (5,549 open reading frames) and a similar ability to thrive in a broad range of environments (42). The other 2,338 GMI1000 genes represent part of the variable genes within the R. solanacearum species. This is necessarily an underestimate of the overall repertoire of variable genes in this species, and the number of genes in this class will increase as additional strains are sequenced (15; our unpublished data).
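The core/variable split described here amounts to a set intersection over strains. The minimal Python sketch below assumes that hybridization results have already been reduced to a set of positively detected genes per strain (the signal thresholding itself is not shown); gene and strain names are hypothetical. Genes outside the intersection are only candidates for the variable set. The last line simply re-derives the reported 53% from 2,690 of 5,074 spotted genes.

```python
def partition_genes(reference_genes: list[str], calls: dict[str, set[str]]):
    """Split reference genes into core (detected in every strain) and the rest.

    `calls` maps each test strain to the set of reference genes that gave a
    positive hybridization signal for that strain.
    """
    core = [g for g in reference_genes if all(g in detected for detected in calls.values())]
    core_set = set(core)
    not_core = [g for g in reference_genes if g not in core_set]
    return core, not_core


def core_percentage(n_core: int, n_spotted: int) -> float:
    """Share of spotted reference genes that belong to the core, in percent."""
    return 100.0 * n_core / n_spotted


# Hypothetical toy calls for two strains over four reference genes.
genes = ["g1", "g2", "g3", "g4"]
calls = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g3", "g4"}}
core, not_core = partition_genes(genes, calls)
print(core, not_core)                      # ['g1', 'g3'] ['g2', 'g4']
print(round(core_percentage(2690, 5074)))  # 53
```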

Core genes are unevenly distributed between the two GMI1000 replicons, with a clear overrepresentation on the chromosome. This observation, together with the strong bias toward housekeeping genes among core genes, supports our previous hypothesis that the chromosome is the ancestral replicon (36).
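One way to quantify such an overrepresentation is a one-sided Fisher exact test on a 2 × 2 table (core vs. variable genes, chromosome vs. megaplasmid). The section does not say which test, if any, was applied, and it does not give per-replicon counts, so this Python sketch is illustrative only; the toy tables in the test are not data from the study. SciPy's `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value.

```python
from math import comb


def overrepresentation_p(core_chrom: int, var_chrom: int,
                         core_mega: int, var_mega: int) -> float:
    """One-sided Fisher exact test: are core genes enriched on the chromosome?

    Returns P(X >= core_chrom), where X is hypergeometric: the number of core
    genes that would fall on the chromosome if core status were assigned at
    random among all genes, with replicon sizes and the core total held fixed.
    """
    total = core_chrom + var_chrom + core_mega + var_mega
    n_core = core_chrom + core_mega
    n_chrom = core_chrom + var_chrom
    tail = sum(
        comb(n_core, k) * comb(total - n_core, n_chrom - k)
        for k in range(core_chrom, min(n_core, n_chrom) + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(total, n_chrom)
```

Because `math.comb` works on arbitrary-precision integers, the sketch stays exact even for tables with thousands of genes.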

A large proportion of variable genes is organized in genomic islands dispersed over the two replicons, which confirms the mosaic structure of the GMI1000 genome (17, 22, 36). We distinguished two types of genomic islands of variable genes. The first and most frequent type includes genomic islands that are often flanked by mobile genetic elements and that either (i) have a GC content of 55% or less and no counterparts in the core genome or (ii) are included in ACURs. These genomic islands could originate from the acquisition of foreign genes through lateral gene transfer. The second type corresponds to genes that do not differ significantly from core genes in terms of base composition. These blocks of variable genes could either be ancestral genes that were simultaneously lost by deletion from a particular lineage during evolution or originate from horizontal gene transfer.
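The island-detection step can be sketched as grouping runs of consecutive variable genes along a replicon and then testing the pooled GC content against the 55% threshold quoted above. This minimal Python sketch is an assumption-laden illustration: the `Gene` record, the minimum run length of three genes, and the toy sequences are hypothetical. It ignores the other criteria in the text (flanking mobile elements, ACUR membership, core counterparts), and it does not join runs that wrap around the origin of a circular replicon.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus: str     # hypothetical locus tag
    sequence: str  # coding sequence
    is_core: bool  # True if detected in every strain


def gc_percent(seq: str) -> float:
    """GC content of a nucleotide sequence, in percent."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_islands(genes: list[Gene], min_genes: int = 3) -> list[list[Gene]]:
    """Group runs of consecutive variable genes (in replicon order) into islands."""
    islands: list[list[Gene]] = []
    run: list[Gene] = []
    for gene in genes:
        if not gene.is_core:
            run.append(gene)
            continue
        if len(run) >= min_genes:
            islands.append(run)
        run = []
    if len(run) >= min_genes:  # a run that reaches the end of the gene list
        islands.append(run)
    return islands


def classify_island(island: list[Gene], gc_threshold: float = 55.0) -> str:
    """Label an island by pooled GC content (only one of the criteria in the text)."""
    pooled = "".join(g.sequence for g in island)
    return "low_gc" if gc_percent(pooled) <= gc_threshold else "core_like"


# Hypothetical toy replicon: core, 3 AT-rich variable genes, core, 3 GC-rich variable genes.
genes = (
    [Gene("c1", "GCGCGC", True)]
    + [Gene(f"v{i}", "ATATGC", False) for i in range(3)]
    + [Gene("c2", "GCGCGC", True)]
    + [Gene(f"w{i}", "GCGCGA", False) for i in range(3)]
)
islands = find_islands(genes)
print([classify_island(isl) for isl in islands])  # ['low_gc', 'core_like']
```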

Gene distribution and phylogeny.

A major conclusion of the present study is the demonstration of congruence between the pattern of distribution of variable genes and the phylogenetic position of strains previously established (33) from the nucleotide sequences of four markers, which the present study identifies as belonging to the core genome. This also holds for the R. syzygii strain included in this study, which clusters within phylotype IV, confirming the close relationship between the two species (13).

The present data also demonstrate the congruence of the two phylogenetic trees constructed independently from the distribution of variable genes on the chromosome and on the megaplasmid. Together with the presence of essential genes on the megaplasmid and its average codon usage, which is similar to that of the rest of the genome, as previously established (36), this result strongly supports our previous hypothesis of a long coevolution of these two replicons.

The same phylogeny is found when the clustering is restricted to the distribution of variable genes encoded within ACURs. This was rather surprising, considering that many of these genes were probably acquired through lateral gene transfer and might therefore be expected to blur the population structure. That no such blurring is observed indicates that these genes must have been acquired by ancestral strains and then transmitted vertically within phylotypes. A similar situation has been observed in γ-proteobacteria by tracing the history of every gene family: gene acquisition is a major contributor to the genomic diversity of these bacteria, but, paradoxically, once acquired, these genes are rarely transferred among lineages (23).

In contrast, the lack of congruence between the distribution of prophages/insertion sequences and the phylogeny indicates that these genetic elements are still active and probably still spread horizontally within populations.
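The contrast between vertically inherited gene sets and mobile elements can be made concrete by counting, for each gene, the minimum number of gain/loss events that its presence/absence pattern requires on the strain tree (Fitch parsimony). This is not the method used in the study, which compared whole trees; it is a swapped-in, minimal Python sketch of the same intuition, and the tree and strain names are hypothetical. A gene acquired once by an ancestor and inherited within one clade needs a single event, whereas a pattern scattered across clades needs more, as expected for elements that still spread horizontally.

```python
def fitch(tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch parsimony for a 0/1 presence/absence character on a rooted binary tree.

    `tree` is either a strain name (leaf) or a 2-tuple of subtrees.
    Returns the candidate state set at this node and the minimum number of
    gain/loss events needed within the subtree.
    """
    if isinstance(tree, str):  # leaf: observed state of this strain
        return {states[tree]}, 0
    left, right = tree
    s_left, c_left = fitch(left, states)
    s_right, c_right = fitch(right, states)
    shared = s_left & s_right
    if shared:
        return shared, c_left + c_right
    # No shared state: one extra change is required on one of the two branches.
    return s_left | s_right, c_left + c_right + 1


# Hypothetical tree: two phylotype-like clades of two strains each.
tree = (("I_a", "I_b"), ("II_a", "II_b"))
vertical = {"I_a": 1, "I_b": 1, "II_a": 0, "II_b": 0}   # present in one clade only
scattered = {"I_a": 1, "I_b": 0, "II_a": 1, "II_b": 0}  # patchy across clades
print(fitch(tree, vertical)[1], fitch(tree, scattered)[1])  # 1 2
```

Averaging this count over all genes of a class (for example, class V versus ACUR genes) would give a per-class measure of how well presence/absence fits the phylotype tree.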

Evolution of pathogenicity determinants.

Our results indicate that a large majority of the genes encoding pathogenicity functions belong to the core genome, a status consistent with their base composition and codon usage, which fit the general pattern of the species (36). This strongly suggests that pathogenicity is an ancestral trait in R. solanacearum. Besides this basal set of core pathogenicity genes, two sets of candidate pathogenicity genes vary from strain to strain: genes encoding hemagglutinin-related proteins and a subclass of TTSS-dependent effectors. Both groups appear to form a dynamic gene population whose members are heterogeneously distributed in the species and/or subject to diversifying selection, resulting in sequence divergence sufficient to escape detection by microarray hybridization. Interestingly, the phylogenetic analysis based on the distribution of the TTSS effector genes revealed a remarkable degree of congruence with the rest of the genome. This congruence is compatible with two origins for these genes: either (i) they are ancestral or ancestrally acquired pathogenicity determinants that follow the same evolutionary pattern as other genes or (ii) they were independently acquired in the different phylotypes during evolution and were never exchanged between phylotypes.

Our analysis suggests that members of the filamentous hemagglutinin family of adhesins and the TTSS effector pools are prime candidates for identifying determinants of host specificity, since most lineages within the R. solanacearum species can be distinguished by that trait, which ranges from relatively narrow host ranges (race 2 and 3 strains) to wide host ranges that can span several botanical families (race 1 strains). TTSS effectors identified as “avirulence” factors are known to restrict the host ranges of plant pathogens (1, 24), and it is also plausible that the collective action of certain TTSS effectors could enable the bacterium to overcome plant defense reactions on a given host or hosts. Hemagglutinin-related proteins could also account for host specificity, since bacterial attachment to host tissues by adhesins is a first step in pathogenesis, and some degree of specificity may exist between particular adhesins and different host cell surface structures. However, the identification of host specificity factors is hampered by three limitations: (i) the number of strains tested is not yet sufficient, and their host specificity is not yet well enough defined, to establish robust correlations between the presence/absence of candidate genes and host specificity; (ii) comparing in silico and microarray hybridization calls of absent/present genes in strains GMI1000 and IPO1609 showed that some oligonucleotides are not suited to detecting orthologous genes in other strains; and (iii) this approach detects only genes that are present in the reference strain GMI1000. Adding to the R. solanacearum microarray oligonucleotides representative of the novel gene sequences identified in the recently sequenced race 3 strains IPO1609 and UW551 (15; our unpublished data) will therefore improve such analyses.
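Limitation (ii) amounts to comparing two sets of presence calls for a strain whose genome sequence is known. A minimal Python sketch, assuming both kinds of call are available as booleans per GMI1000 gene; the dictionaries and gene names are hypothetical. Genes in the first returned list point to oligonucleotides that may be unsuited to detecting orthologs; the second list flags possible cross-hybridization.

```python
def discordant_calls(in_silico: dict[str, bool],
                     array: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Compare sequence-based and hybridization-based presence calls for one strain.

    missed:     ortholog found in the genome sequence but no array signal
                (the probe may be too divergent to detect it)
    unexpected: no ortholog in the sequence but a positive array signal
                (possible cross-hybridization)
    """
    missed = sorted(g for g, present in in_silico.items()
                    if present and not array.get(g, False))
    unexpected = sorted(g for g, present in in_silico.items()
                        if not present and array.get(g, False))
    return missed, unexpected


# Hypothetical calls for four reference genes in one sequenced strain.
in_silico = {"g1": True, "g2": True, "g3": False, "g4": False}
array = {"g1": True, "g2": False, "g3": True, "g4": False}
print(discordant_calls(in_silico, array))  # (['g2'], ['g3'])
```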



This work was supported by the Regional Council of Reunion Island, the European Community (FEOGA), the Toulouse Midi Pyrénées Genopole, and CIRAD under research grant 3P118.

We express our thanks to Xavier Nesme, who participated in the phylogenetic analysis of our microarray data, and to Lionel Gagnevin for phylogenetic tree construction. We also acknowledge Jerome Gouzy for providing help in the preparation of the figures and Nemo Peeters and Vincent Daubin for helpful comments.


Published ahead of print on 3 November 2006.

Supplemental material for this article may be found at http://jb.asm.org/.


1. Alfano, J. R., and A. Collmer. 2004. Type III secretion system effector proteins: double agents in bacterial disease and plant defense. Annu. Rev. Phytopathol. 42:385-414.
2. Bertolla, F., A. Frostegård, B. Brito, X. Nesme, and P. Simonet. 1999. During infection of its hosts, the plant pathogen Ralstonia solanacearum naturally develops a state of competence and exchanges genetic material. Mol. Plant-Microbe Interact. 12:467-472.
3. Boucher, C., A. Martinel, P. Barberis, G. Alloing, and C. Zischek. 1985. Virulence genes are carried by a megaplasmid of the plant pathogen Pseudomonas solanacearum. Mol. Gen. Genet. 205:270-275.
4. Boureau, T., H. El Maarouf-Bouteau, A. Garnier, M. N. Brisset, C. Perino, I. Pucheu, and M. A. Barny. 2006. DspA/E, a type III effector essential for Erwinia amylovora pathogenicity and growth in planta, induces cell death in host apple and nonhost tobacco plants. Mol. Plant-Microbe Interact. 19:16-24.
5. Burrus, V., and M. K. Waldor. 2004. Shaping bacterial genomes with integrative and conjugative elements. Res. Microbiol. 155:376-386.
6. Chan, K., S. Baker, C. C. Kim, C. S. Detweiler, G. Dougan, and S. Falkow. 2003. Genomic comparison of Salmonella enterica serovars and Salmonella bongori by use of an S. enterica serovar Typhimurium DNA microarray. J. Bacteriol. 185:553-563.
7. Chen, W. P., and T. T. Kuo. 1993. A simple and rapid method for the preparation of gram-negative genomic DNA. Nucleic Acids Res. 21:2260.
8. Coenye, T., and P. Vandamme. 2003. Simple sequence repeats and compositional bias in the bipartite Ralstonia solanacearum GMI1000 genome. BMC Genomics 4:10.
9. Coleman, M. L., M. B. Sullivan, A. C. Martiny, C. Steglich, K. Barry, E. F. Delong, and S. W. Chisholm. 2006. Genomic islands and the ecology and evolution of Prochlorococcus. Science 311:1768-1770.
10. Cunnac, S., A. Occhialini, P. Barberis, C. Boucher, and S. Genin. 2004. Inventory and functional analysis of the large Hrp regulon in Ralstonia solanacearum: identification of novel effector proteins translocated to plant host cells through the type III secretion system. Mol. Microbiol. 53:115-128.
11. Debroy, S., R. Thilmony, Y. B. Kwack, K. Nomura, and S. Y. He. 2004. A family of conserved bacterial effectors inhibits salicylic acid-mediated basal immunity and promotes disease necrosis in plants. Proc. Natl. Acad. Sci. USA 101:9927-9932.
12. Dufraigne, C., B. Fertil, S. Lespinats, A. Giron, and P. Deschavanne. 2005. Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Res. 33:e6.
13. Fegan, M., and P. Prior. 2005. How complex is the “Ralstonia solanacearum species complex,” p. 449-462. In C. Allen, P. Prior, and C. Hayward (ed.), Bacterial wilt: the disease and the Ralstonia solanacearum species complex. APS Press, St. Paul, MN.
14. Fukiya, S., H. Mizoguchi, T. Tobe, and H. Mori. 2004. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J. Bacteriol. 186:3911-3921.
15. Gabriel, D. W., C. Allen, M. Schell, T. P. Denny, J. T. Greenberg, Y. P. Duan, Z. Flores-Cruz, Q. Huang, J. M. Clifford, G. Presting, E. T. Gonzalez, J. Reddy, J. Elphinstone, J. Swanson, J. Yao, V. Mulholland, L. Liu, W. Farmerie, M. Patnaikuni, B. Balogh, D. Norman, A. Alvarez, J. A. Castillo, J. Jones, G. Saddler, T. Walunas, A. Zhukov, and N. Mikhailova. 2006. Identification of open reading frames unique to a select agent: Ralstonia solanacearum race 3 biovar 2. Mol. Plant-Microbe Interact. 19:69-79.
16. Galibert, F., T. M. Finan, S. R. Long, A. Puhler, P. Abola, F. Ampe, F. Barloy-Hubler, M. J. Barnett, A. Becker, P. Boistard, G. Bothe, M. Boutry, L. Bowser, J. Buhrmester, E. Cadieu, D. Capela, P. Chain, A. Cowie, R. W. Davis, S. Dreano, N. A. Federspiel, R. F. Fisher, S. Gloux, T. Godrie, A. Goffeau, B. Golding, J. Gouzy, M. Gurjal, I. Hernandez-Lucas, A. Hong, L. Huizar, R. W. Hyman, T. Jones, D. Kahn, M. L. Kahn, S. Kalman, D. H. Keating, E. Kiss, C. Komp, V. Lelaure, D. Masuy, C. Palm, M. C. Peck, T. M. Pohl, D. Portetelle, B. Purnelle, U. Ramsperger, R. Surzycki, P. Thebault, M. Vandenbol, F. J. Vorholter, S. Weidner, D. H. Wells, K. Wong, K. C. Yeh, and J. Batut. 2001. The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293:668-672.
17. Genin, S., and C. Boucher. 2004. Lessons learned from the genome analysis of Ralstonia solanacearum. Annu. Rev. Phytopathol. 42:107-134.
18. Gillings, M. R., and P. Fahy. 1994. Genomic fingerprinting: towards a unified view of the Pseudomonas solanacearum species complex, p. 95-112. In A. C. Hayward and G. L. Hartman (ed.), Bacterial wilt: the disease and its causative agent, Pseudomonas solanacearum. CAB International, Wallingford, United Kingdom.
19. Hayward, A. C. 1991. Biology and epidemiology of bacterial wilt caused by Pseudomonas solanacearum. Annu. Rev. Phytopathol. 29:65-87.
20. Heidelberg, J. F., J. A. Eisen, W. C. Nelson, R. A. Clayton, M. L. Gwinn, R. J. Dodson, D. H. Haft, E. K. Hickey, J. D. Peterson, and L. Umayam. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477-483.
21. Iguchi, A., S. Iyoda, J. Terajima, H. Watanabe, and R. Osawa. 2006. Spontaneous recombination between homologous prophage regions causes large-scale inversions within the Escherichia coli O157:H7 chromosome. Gene 372:199-207.
22. Lavie, M., E. Shillington, C. Eguiluz, N. Grimsley, and C. Boucher. 2002. PopP1, a new member of the YopJ/AvrRxv family of type III effector proteins, acts as a host-specificity factor and modulates aggressiveness of Ralstonia solanacearum. Mol. Plant-Microbe Interact. 15:1058-1068.
23. Lerat, E., V. Daubin, H. Ochman, and N. A. Moran. 2005. Evolutionary origins of genomic repertoires in bacteria. PLoS Biol. 3:807-814.
24. Mudgett, M. B. 2005. New insights to the function of phytopathogenic bacterial type III effectors in plants. Annu. Rev. Plant Biol. 56:509-531.
25. Mukaihara, T., N. Tamura, Y. Murata, and M. Iwabuchi. 2004. Genetic screening of Hrp type III-related pathogenicity genes controlled by the HrpB transcriptional activator in Ralstonia solanacearum. Mol. Microbiol. 54:863-875.
26. Nakamura, Y., T. Itoh, H. Matsuda, and T. Gojobori. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat. Genet. 36:760-766.
27. Occhialini, A., S. Cunnac, N. Reymond, S. Genin, and C. Boucher. 2005. Genome-wide analysis of gene expression in Ralstonia solanacearum reveals that the hrpB gene acts as a regulatory switch controlling multiple virulence pathways. Mol. Plant-Microbe Interact. 18:938-949.
28. Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.
29. Ochman, H., and S. R. Santos. 2005. Exploring microbial microevolution with microarrays. Infect. Genet. Evol. 5:103-108.
30. Ortutay, C., Z. Gaspari, G. Toth, E. Jager, G. Vida, L. Orosz, and T. Vellai. 2003. Speciation in Chlamydia: genomewide phylogenetic analyses identified a reliable set of acquired genes. J. Mol. Evol. 57:672-680.
31. Palleroni, N. J., and M. Doudoroff. 1971. Phenotypic characterization and deoxyribonucleic acid homologies of Pseudomonas solanacearum. J. Bacteriol. 107:690-696.
32. Perrier, X., A. Flori, and F. Bonnot. 2003. Data analysis methods, p. 43-76. In P. Hamon, M. Seguin, X. Perrier, and J. C. Glaszmann (ed.), Genetic diversity of cultivated tropical plants. Enfield, Science Publishers, Montpellier, France.
33. Poussier, S., P. Prior, J. Luisetti, C. Hayward, and M. Fegan. 2000. Partial sequencing of the hrpB and endoglucanase genes confirms and expands the known diversity within the Ralstonia solanacearum species complex. Syst. Appl. Microbiol. 23:479-486.
34. Prior, P., and M. Fegan. 2005. Recent development in the phylogeny and classification of Ralstonia solanacearum. Acta Hort. 695:127-136.
35. Rojas, C. M., J. H. Ham, W. L. Deng, J. J. Doyle, and A. Collmer. 2002. HecA, a member of a class of adhesins produced by diverse pathogenic bacteria, contributes to the attachment, aggregation, epidermal cell killing, and virulence phenotypes of Erwinia chrysanthemi EC16 on Nicotiana clevelandii seedlings. Proc. Natl. Acad. Sci. USA 99:13142-13147.
36. Salanoubat, M., S. Genin, F. Artiguenave, J. Gouzy, S. Mangenot, M. Arlat, A. Billault, P. Brottier, J. C. Camus, L. Cattolico, M. Chandler, N. Choisne, C. Claudel-Renard, S. Cunnac, N. Demange, C. Gaspin, M. Lavie, A. Moisan, C. Robert, W. Saurin, T. Schiex, P. Siguier, P. Thébault, M. Whalen, P. Wincker, M. Levy, J. Weissenbach, and C. A. Boucher. 2002. Genome sequence of the plant pathogen Ralstonia solanacearum. Nature 415:497-502.
37. Schell, M. A. 2000. Control of virulence and pathogenicity genes of Ralstonia solanacearum by an elaborate sensory network. Annu. Rev. Phytopathol. 38:263-292.
38. Stabler, R. A., G. L. Marsden, A. A. Witney, Y. Li, S. D. Bentley, C. M. Tang, and J. Hinds. 2005. Identification of pathogen-specific genes through microarray analysis of pathogenic and commensal Neisseria species. Microbiology 151:2907-2922.
39. Taghavi, M., C. Hayward, L. I. Sly, and M. Fegan. 1996. Analysis of the phylogenetic relationships of strains of Burkholderia solanacearum, Pseudomonas syzygii, and the blood disease bacterium of banana based on 16S rRNA gene sequences. Int. J. Syst. Bacteriol. 46:10-15.
40. Tettelin, H., V. Masignani, M. J. Cieslewicz, C. Donati, D. Medini, N. L. Ward, S. V. Angiuoli, J. Crabtree, A. L. Jones, A. S. Durkin, R. T. Deboy, T. M. Davidsen, M. Mora, M. Scarselli, I. Margarit y Ros, J. D. Peterson, C. R. Hauser, J. P. Sundaram, W. C. Nelson, R. Madupu, L. M. Brinkac, R. J. Dodson, M. J. Rosovitz, S. A. Sullivan, S. C. Daugherty, D. H. Haft, J. Selengut, M. L. Gwinn, L. Zhou, N. Zafar, H. Khouri, D. Radune, G. Dimitrov, K. Watkins, K. J. O'Connor, S. Smith, T. R. Utterback, O. White, C. E. Rubens, G. Grandi, L. C. Madoff, D. L. Kasper, J. L. Telford, M. R. Wessels, R. Rappuoli, and C. M. Fraser. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl. Acad. Sci. USA 102:13950-13955.
41. Valls, M., S. Genin, and C. Boucher. 2006. Integrated regulation of the type III secretion system and other virulence determinants in Ralstonia solanacearum. PLoS Pathog. 2:e82.
42. Wolfgang, M. C., B. R. Kulasekara, X. Liang, D. Boyd, K. Wu, Q. Yang, C. G. Miyada, and S. Lory. 2003. Conservation of genome content and virulence determinants among clinical and environmental isolates of Pseudomonas aeruginosa. Proc. Natl. Acad. Sci. USA 100:8484-8489.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)