Infect Immun. Dec 2005; 73(12): 7894–7905.
PMCID: PMC1307019

Identification of Core and Variable Components of the Salmonella enterica Subspecies I Genome by Microarray


We have performed microarray hybridization studies on 40 clinical isolates from 12 common serovars within Salmonella enterica subspecies I to identify the conserved chromosomal gene pool. We separated the core invariant portion of the genome by a novel mathematical approach that uses a decision tree based on genes ranked by increasing variance. All genes within the core component were confirmed using available sequence and microarray information for S. enterica subspecies I strains. The majority of genes within the core component had conserved homologues in Escherichia coli K-12 strain MG1655. However, many genes in the conserved set that were absent or highly divergent in K-12 had close homologues in pathogenic bacteria such as Shigella flexneri and Pseudomonas aeruginosa. Genes within previously established virulence determinants such as SPI1 to SPI5 were conserved. In addition, several genes within SPI6, all of SPI9, and three fimbrial operons (fim, bcf, and stb) were conserved within all S. enterica strains included in this study. Although many phage and insertion sequence elements were missing from the core component, approximately half the pseudogenes present in S. enterica serovar Typhi were conserved. Furthermore, approximately half the genes conserved in the core set encoded hypothetical proteins. Separating the core and variable gene sets within S. enterica subspecies I has offered fundamental biological insight into the genetic basis of phenotypic similarity and diversity across S. enterica subspecies I and has shown how the core genome of these pathogens differs from that of the closely related E. coli K-12 laboratory strain.

The genus Salmonella is divided into two species, S. enterica and S. bongori. S. enterica is further divided into six subspecies, including S. enterica subspecies enterica (subspecies I) (24, 25, 38, 45). S. enterica subspecies I contains approximately 60% of known Salmonella serovars, which inhabit the intestinal tract of humans and warm-blooded animals. Many serovars within subspecies I show host specificity, while others are promiscuous, and the outcome of infection depends on both the host and the infecting serovar. S. enterica serovar Typhimurium and S. enterica serovar Enteritidis are currently the most prevalent causes of Salmonella-induced human food poisoning (46). However, these serovars can also induce systemic infection in mice or be carried asymptomatically in chronically infected chickens (5). Host-specific serovars such as S. enterica serovar Typhi and S. enterica serovar Gallinarum cause typhoid and typhoid-like disease in humans and poultry, respectively, and rarely cause enteritis in humans (4, 29). The genetic basis of this pathogenic diversity is only beginning to be uncovered, and a major goal of comparative genomics is to identify the genetic basis for the unique virulence attributes of these closely related Salmonella pathogens.

Chromosomal DNA hybridization studies have demonstrated that S. enterica strains share between 70 and 100% genetic relatedness (15), which falls to about 55% when S. bongori strains are compared with S. enterica strains (24). Comparison of the genome sequences of two S. enterica subspecies I strains, S. enterica serovar Typhi CT18 and S. enterica serovar Typhimurium LT2, has suggested that the genome is 89% conserved, interspersed with regions of genomic variation (31, 37). These results are complemented by several studies using the Salmonella microarray chip that examined differences in the genetic repertoire of strains spanning the Salmonella genus and allowed Salmonella to be divided into subgroups (11, 39, 41). These subgroups largely agree with phylogenetic analyses based on multilocus enzyme electrophoresis (9), on sequences of both housekeeping and invasion genes (8, 26), and on rRNA sequences (12).

The genus Salmonella therefore represents a composite gene pool within which distinctive subgroups have acquired degrees of specialization through the acquisition (or loss) of specific gene subsets. The aim of this work was to perform a detailed study, using genomic microarray hybridization, to identify the genes universally present in the genomes of strains within S. enterica subspecies I. Forty field and clinical isolates from 12 common infective serovars within S. enterica subspecies I were tested using an S. enterica serovar Typhi chip. Genes representing the invariant core component of our microarray data were separated mathematically from the variable or polymorphic regions of the S. enterica subspecies I chromosome, and we searched the Escherichia coli K-12 genome for homologues of the genes in the core set. We hypothesized that identifying the common chromosomal gene pool, invariant among S. enterica subspecies I, would bring us closer to understanding which genomic components group these phenotypically diverse organisms together, while revealing the regions of the chromosome that contribute to their diversity.


Bacterial strains, growth conditions, and DNA isolation.

The bacterial strains used in this study are shown in Table 1. These include clinical and field isolates of Salmonella from diseased and healthy animals or patients, and the sequenced S. enterica serovar Typhi (CT18) and S. enterica serovar Typhimurium (LT2) laboratory strains. For preparation of genomic DNA, cells were grown overnight in LB broth at 37°C, and DNA was isolated using the QIAGEN DNeasy tissue kit (no. 69504; QIAGEN) or the CTAB (hexadecyltrimethylammonium bromide; Sigma-Aldrich Ltd.) method.

Table 1. Bacterial strains used in this study.

S. enterica serovar Typhi microarray analysis.

The design and construction of the S. enterica serovar Typhi CT18 generation 1 arrays are described in the work of Thomson et al. (51). Briefly, the array contains 4,097 screened and validated PCR products from the annotated sequence of the S. enterica serovar Typhi CT18 chromosome. For each slide hybridization, DNA from the control strain (S. enterica serovar Typhi CT18) and from a test strain was fluorescently labeled with FluoroLink Cy3 and Cy5 dyes, respectively. Dye swap experiments were performed to prevent any bias in the data from uneven labeling (www.sanger.ac.uk/Projects/Microarrays/arraylab/methods.shtml/). At least four hybridizations were performed for each test strain, and six were performed for the control CT18 and LT2 strains. The processed slides were scanned using a GenePix 4000B scanner (Axon Instruments, Inc.), and fluorescent spots were quantified using GenePix Pro software (Axon Instruments, Inc.). The array design (array accession no. A-SGRP-1) and all raw data (experiment accession no. E-SGRP-1) can be accessed through http://www.ebi.ac.uk/arrayexpress/query/entry. All spots whose median signal intensity minus background was below 50 in both the reference and the test channel were discarded. To compensate for unequal dye incorporation, intensity-dependent (Lowess) normalization was performed per spot per chip. For each gene, the average normalized ln (Cy5/Cy3) ratio over eight data points (four arrays per strain, with each gene spotted in duplicate) was used. Only coding sequences (genes) with a good reference signal in all 40 test strains were considered for analysis.
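The spot filter and per-gene averaging described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: the record layout (gene, reference signal, test signal, normalized ratio) and all function names are hypothetical, Lowess normalization is assumed to have been applied upstream and is not reimplemented, and the background rule follows one reading of the text (a spot is dropped only when both channels fall below 50).

```python
from statistics import mean

# Hypothetical spot record: (gene, ref_signal, test_signal, norm_ratio)
#   ref_signal:  median intensity minus background, reference channel (Cy3, CT18)
#   test_signal: median intensity minus background, test channel (Cy5)
#   norm_ratio:  the spot's value AFTER Lowess normalization (assumed to be
#                computed upstream; not reimplemented here)
BACKGROUND_CUTOFF = 50.0


def keep_spot(ref_signal, test_signal, cutoff=BACKGROUND_CUTOFF):
    """Drop a spot only when both channels fall below the cutoff."""
    return not (ref_signal < cutoff and test_signal < cutoff)


def average_by_gene(spots):
    """Mean normalized value per gene over the surviving spots.

    With four arrays per strain and each gene spotted in duplicate,
    a gene contributes at most eight spots.
    """
    kept = {}
    for gene, ref_signal, test_signal, norm_ratio in spots:
        if keep_spot(ref_signal, test_signal):
            kept.setdefault(gene, []).append(norm_ratio)
    return {gene: mean(values) for gene, values in kept.items()}


def genes_in_all_strains(per_strain):
    """Genes that have a value in every strain (per_strain: strain -> {gene: value})."""
    tables = list(per_strain.values())
    common = set(tables[0]) if tables else set()
    for table in tables[1:]:
        common &= set(table)
    return sorted(common)
```

Running `average_by_gene` once per strain and then passing the resulting tables to `genes_in_all_strains` mirrors the requirement that a gene have data in all 40 test strains before it enters the analysis.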

Construction of filters and validation.

The signal intensity for each gene and strain, $P_{gs}$ (gene index $g \in [1, 4048]$, strain index $s \in [1, 40]$), was used to calculate the mean normalized signal intensity $\mu_{P_g}$ of each gene across all strains, together with its standard deviation $\sigma_{P_g}$. The gene variance $(\sigma_{P_g})^2$ was plotted in order of increasing variance (Fig. 1a), and, using these values as estimators of gene variability, the 4,048 genes for which data were available from all strains were sorted with the decision tree illustrated in Fig. 1b. All genes in the data set (set G) were first separated, by variance rank, into those with low variance across all strains (set H), whose variance increased approximately linearly with rank, and those with high variance (set J), whose variance showed an additional exponential increase with rank. Set H was further divided into genes with a high mean (set A) and genes with a low mean (set B). The cutoff between the two (the maximum mean signal for set B) was selected arbitrarily as 0.6 by visual inspection and validated using the LT2 sequence. This value corresponded well to the distinction between present (above 0.67) and absent (below 0.5) used by Porwollik et al. (41). Similarly, genes in set J with a low mean were split out into set D. These genes displayed a higher variance than those in set B: the genes in set B were mostly absent or divergent in all strains apart from CT18, whereas the genes in set D were present in a minority of strains.
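The first levels of the decision tree described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: `P` is a hypothetical mapping from gene to its list of per-strain values $P_{gs}$, the population variance is used because the text does not name an estimator, `var_threshold` stands in for the variance at which the exponential rise begins (chosen by inspection in the paper), and reusing 0.6 as the "low mean" cutoff for set D is an assumption.

```python
from statistics import mean, pvariance


def gene_stats(P):
    """P: gene -> list of per-strain values P_gs. Returns gene -> (mu, variance).

    Population variance is used here; the estimator is an assumption.
    """
    return {g: (mean(values), pvariance(values)) for g, values in P.items()}


def first_split(P, var_threshold, mean_threshold=0.6):
    """G -> H/J by variance, H -> A/B by mean, and low-mean genes of J -> D.

    Genes left in J (high variance, high mean) are returned as 'J_rest' for
    the later capping step. Using mean_threshold for D is an assumption.
    """
    stats = gene_stats(P)
    ranked = sorted(stats, key=lambda g: stats[g][1])  # increasing variance
    sets = {"A": [], "B": [], "D": [], "J_rest": []}
    for g in ranked:
        mu, var = stats[g]
        if var < var_threshold:  # set H
            sets["A" if mu > mean_threshold else "B"].append(g)
        else:  # set J
            sets["J_rest" if mu > mean_threshold else "D"].append(g)
    return sets
```

A gene that is present at a similar level in every strain lands in A, one absent everywhere except the CT18 reference lands in B, and one present in only a few strains lands in D.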

Fig. 1. Distribution of variance in gene presence level (μ) as a function of gene order. The distribution of variance in gene presence level over the 4,048 genes in our data set was analyzed. The genes were ordered by increasing variance with the ...

The mean signal intensity of many genes within set J showed high variance because individual genes in individual strains had normalized signal intensity (Cy5/Cy3) ratios of up to 7.005. As any signal intensity ratio above 1 denotes the presence of the gene, individual ratios above 1 were capped at 1 to obtain the filtered values $P^*_{gs} = \min(1.0, P_{gs})$; the mean $\mu_{P^*_g}$ and standard deviation $\sigma_{P^*_g}$ of these filtered values were used in the subsequent analysis. This allowed genes with high variance due to absence or divergence (set L) to be separated from those with high variance due to a high signal intensity ratio (set K). All genes with $\mu_{P^*_g} > 0.6$ and low variance (using the same variance value that separated sets J and H) were moved to set C. Finally, genes within set L were separated into intermediate-variance (set E) and high-variance (set F) genes, using as the separation point the geometric mean of the maximum and minimum values of $(\sigma_{P^*_g})^2$ in set L. Because $(\sigma_{P^*_g})^2$ changed smoothly through the genes within sets E and F, with no obvious separation value, this division was arbitrary: there was no clear break between the two sets, and the cutoff was simply placed at the geometric midpoint. Consequently, genes at the border between these two components were treated with some caution. Genes within the high-mean and low-variance, low-mean and low-variance, and moderate- to high-variance components are listed in Appendix S1 in the supplemental data.
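The capping step and the lower branches of the tree described above can be sketched as follows, again as a hypothetical Python illustration rather than the authors' implementation. The input `high_var_genes` (the high-variance, high-mean genes remaining after set D is removed) and the `K_other` bucket (genes whose capped variance is low but whose capped mean is not above 0.6, a case the text does not describe) are assumptions; the E/F cutoff is the geometric mean $\sqrt{\max(\sigma^2)\,\min(\sigma^2)}$ of the capped variances in set L, as stated in the text.

```python
import math
from statistics import mean, pvariance


def second_split(P, high_var_genes, var_threshold, mean_threshold=0.6):
    """Cap ratios at 1, separate K from L, pull C out of K, and split L into E/F.

    P: gene -> list of per-strain values P_gs.
    high_var_genes: genes still unassigned after set D is removed (assumed input).
    var_threshold: the same variance cutoff used to separate J from H.
    """
    # Filtered values P*_gs = min(1.0, P_gs): ratios above 1 all mean "present".
    capped = {g: [min(1.0, x) for x in P[g]] for g in high_var_genes}
    stats = {g: (mean(v), pvariance(v)) for g, v in capped.items()}

    # K: variance drops below the cutoff once high ratios are capped.
    # L: variance stays high because of absence or divergence in some strains.
    K = [g for g in high_var_genes if stats[g][1] < var_threshold]
    L = [g for g in high_var_genes if stats[g][1] >= var_threshold]
    C = [g for g in K if stats[g][0] > mean_threshold]
    K_other = [g for g in K if g not in C]  # case not described in the text

    # E/F cutoff: geometric mean of the largest and smallest capped variance in L.
    cut, E, F = None, [], []
    if L:
        L_vars = [stats[g][1] for g in L]
        cut = math.sqrt(max(L_vars) * min(L_vars))
        E = [g for g in L if stats[g][1] < cut]
        F = [g for g in L if stats[g][1] >= cut]
    return {"C": C, "K_other": K_other, "E": E, "F": F, "cut": cut}
```

The geometric rather than arithmetic mean places the cutoff closer to the low end when the variances in L span several orders of magnitude, which is consistent with the smooth, exponential-looking rise described above.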

The microarray data within each component were validated by using the unique region of each gene represented by the spotted PCR product (the query sequence) in BLASTN searches against the fully or partially sequenced genomes of S. enterica serovar Typhi CT18, S. enterica serovar Typhi TY2, S. enterica serovar Typhimurium LT2, S. enterica serovar Typhimurium DT104, S. enterica serovar Typhimurium SL1344, S. enterica serovar Enteritidis PT4, and S. enterica serovar Gallinarum. The following parameters were recorded from the BLASTN results: highest-scoring segment pair (HSP) score, percent identity over the HSP alignment, alignment length/query length, and identities/query length. When the query sequence was absent from the searched genome, the HSP score was 0 (see Appendix S2 in the supplemental data for details). Hits shorter than 50 bp were ignored. A few core genes that were absent only from the unfinished genomes of S. enterica serovar Typhimurium SL1344 (11 genes) and S. enterica serovar Gallinarum (four genes), but present in the S. enterica serovar Typhimurium and S. enterica serovar Gallinarum strains included in the microarray analysis, have been retained in the core genome; they will require further validation once the respective genome sequences are complete.
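The BLASTN summary measures listed above can be computed from tabular search output, as in the Python sketch below. It assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident length nident qlen bitscore"` (these column keywords exist in BLAST+, but the paper does not state which tool version or output format was used) and treats the bit score as the "HSP score"; both are assumptions, and the function names are hypothetical.

```python
# Column order assumed from: -outfmt "6 qseqid sseqid pident length nident qlen bitscore"
FIELDS = ("qseqid", "sseqid", "pident", "length", "nident", "qlen", "bitscore")
MIN_HIT_LENGTH = 50  # hits shorter than 50 bp are ignored


def best_hsp_metrics(lines, min_length=MIN_HIT_LENGTH):
    """Keep the highest-scoring HSP per query and derive the validation ratios."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        length, nident, qlen = int(row["length"]), int(row["nident"]), int(row["qlen"])
        if length < min_length:
            continue
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query]["hsp_score"]:
            best[query] = {
                "hsp_score": score,
                "pct_identity": float(row["pident"]),
                "aln_over_query": length / qlen,
                "ident_over_query": nident / qlen,
            }
    return best


def metrics_for_queries(queries, best):
    """Queries with no retained hit are reported as absent (all values 0)."""
    absent = {"hsp_score": 0.0, "pct_identity": 0.0,
              "aln_over_query": 0.0, "ident_over_query": 0.0}
    return {q: best.get(q, dict(absent)) for q in queries}
```

Reporting missing queries explicitly as zeros matches the convention above that an absent query sequence has an HSP score of 0.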

To compare the core gene list generated from this data set with Salmonella microarray genomic hybridization studies that used the S. enterica serovar Typhimurium LT2 microarray (11, 39), a reciprocal FASTA search was used to identify genes common to the two chromosomes. This search identified 3,833 orthologous gene sets in the S. enterica serovar Typhi CT18 and S. enterica serovar Typhimurium LT2 chromosomes. All genes presumed to be within the core component in those studies (11, 39) but absent from the LT2-CT18 orthologous gene sets were discarded from the core gene list. Consequently, only core genes within the 3,833 CT18-LT2 orthologous gene sets were used to compare the studies (Table 2; see Appendix S3 in the supplemental data). For the work of Porwollik et al. (39), a core gene list was created from the presence and absence calls in their microarray data for S. enterica subspecies I isolates (supplement A on the website http://bioinformatics.skcc.org/mcclelland/salmonella/subspecies1/; see Appendix S3 in the supplemental data).
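One common way to turn a reciprocal similarity search into orthologous pairs is the reciprocal best-hit rule, sketched below in Python. The paper does not state its exact pairing criteria, so this rule, the score tables (query mapped to subject scores, e.g., FASTA scores), and the function names are assumptions for illustration.

```python
def best_hits(scores):
    """scores: query -> {subject: similarity score}. Returns query -> best subject.

    Ties are resolved by dictionary order, an arbitrary choice.
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def restrict_to_orthologs(core_genes, ortholog_pairs, side=0):
    """Keep core genes that belong to an orthologous pair.

    side=0 selects the first genome's IDs (e.g., CT18), side=1 the second (e.g., LT2).
    """
    members = {pair[side] for pair in ortholog_pairs}
    return {g for g in core_genes if g in members}
```

Filtering each study's core list with `restrict_to_orthologs` corresponds to discarding presumed core genes that fall outside the CT18-LT2 orthologous gene sets before the lists are compared.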

Table 2. Comparison of the Salmonella enterica subspecies I core genes identified from different genomic hybridization studies.


Separation of the core and variable components of S. enterica subspecies I genome.

DNA microarray-based comparative genomics allows the genome content of a relatively large number of bacterial isolates to be compared with a sequenced reference genome or group of genes, to assess the extent of genetic variability within a bacterial population relative to that reference (for reviews, see the work of G. Schoolnik [47, 48]). We have developed an approach to analyze the microarray data from 40 strains of S. enterica subspecies I of both human and animal origin, representing 12 different serovars (Table 1), using an S. enterica serovar Typhi DNA microarray. Our analysis separated the physically invariant component of the chromosome, i.e., the core genome, from the remaining genes. Such information is crucial for unraveling the relationships between pathogenic strains of Salmonella and will add to our understanding of the genetic diversity encompassed within the genomes of S. enterica subspecies I pathogens.

We hypothesized that genes with low variance and high mean were the physically invariant component of the S. enterica subspecies I chromosomal backbone, comprising the “core” gene set, while the remaining components (low variance and low mean, and moderate to high variance) comprised the variable regions of the S. enterica subspecies I chromosome. The mean normalized signal intensity served to separate genes that were largely absent or divergent from the rest of the data set, while the variance served to distinguish genes with appreciable variation across the strains from those that were largely invariant or conserved. When genes were ranked by increasing variance, the variance showed an initial nonlinear rise followed by a linear increase with an exponential rise superimposed. There was no clear separation into high- and low-variance sets (Fig. 1a). However, we were able to use the point in the ranked variance data at which noticeable exponential variation begins, together with plausible separator values between present/conserved and absent/divergent genes for the mean normalized intensity across all strains, to separate the genes into two components, core and noncore genes. The resulting decision tree is shown in Fig. 1b (see Materials and Methods for details). Figure 2a shows the microarray data from all strains, organized in the CT18 annotated gene order before mathematical separation, while Fig. 2b shows the data after separation using the decision tree outlined in Fig. 1b. This resulted in the separation of the data into genes with low variance and high mean (core genes; Fig. 2b), genes with low variance and low mean [noncore genes; Fig. 2c(i)], and genes with moderate to high variance [noncore genes; Fig. 2c(ii)].
The genes within each filter and their presence/absence detail for each strain are given in Appendix S1 in the supplemental data.
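In the paper, the start of the exponential rise in ranked variance was identified by visual inspection. The Python sketch below shows one hypothetical way such a point could be located automatically: fit a least-squares line to a window of the ranked variances assumed to lie in the linear regime, then report the first rank at which the variance exceeds that trend by more than a tolerance for several consecutive ranks. The fitting window, tolerance, and run length are free parameters introduced here for illustration, not values from the study.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def exponential_onset(ranked_vars, fit_start, fit_end, tol, run=3):
    """First rank after the fitting window at which the ranked variance exceeds
    the linear trend (fitted on ranks fit_start..fit_end-1) by more than tol
    for `run` consecutive ranks. Returns None if no such rank exists."""
    xs = list(range(fit_start, fit_end))
    slope, intercept = linear_fit(xs, ranked_vars[fit_start:fit_end])
    streak = 0
    for r in range(fit_end, len(ranked_vars)):
        excess = ranked_vars[r] - (slope * r + intercept)
        streak = streak + 1 if excess > tol else 0
        if streak == run:
            return r - run + 1
    return None
```

The variance at the returned rank could then serve as the threshold separating sets H and J; requiring several consecutive exceedances makes the choice less sensitive to a single noisy gene.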

Fig. 2. Separation of the core from the variable component of genes within the 40 S. enterica subspecies I strains studied. (a) A comparative genomic index of 40 S. enterica subspecies I field and clinical isolates, using the serovar Typhi chromosome as baseline, ...

To confirm the core and noncore gene lists generated from our microarray data, the genome sequences (or partial sequences) of S. enterica serovar Typhimurium strains LT2, DT104, and SL1344; S. enterica serovar Gallinarum; S. enterica serovar Typhi TY2; and S. enterica serovar Enteritidis PT4 (http://www.sanger.ac.uk) were compared by BLASTN with that of S. enterica serovar Typhi CT18 (see Materials and Methods). Approximately 99% of the genes that were detected by BLASTN and present in our data set were correctly classified into each component of the filter (see Appendix S2 in the supplemental data).
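An agreement rate like the one reported above can be computed by comparing, for each gene and sequenced genome, the presence call implied by the filter with the BLASTN result. The Python sketch below is illustrative only: it simplifies BLASTN presence to "HSP score greater than 0", whereas the validation described in Materials and Methods also recorded identity and coverage, and the data structures and function name are hypothetical.

```python
def classification_agreement(array_calls, blast_scores):
    """Fraction of (gene, genome) pairs where the microarray call agrees with BLASTN.

    array_calls:  (gene, genome) -> True if the filter classed the gene as present
    blast_scores: (gene, genome) -> HSP score, 0 when no hit was found
    Pairs without a BLASTN result are skipped. Treating score > 0 as "present"
    is a simplifying assumption.
    """
    compared = [key for key in array_calls if key in blast_scores]
    if not compared:
        return float("nan")
    agree = sum(array_calls[key] == (blast_scores[key] > 0) for key in compared)
    return agree / len(compared)
```

Restricting the comparison to pairs with a BLASTN result mirrors the qualification above that only genes detected by BLASTN and present in the data set were counted.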

We also compared our core gene list with those of other microarray genomic hybridization studies that included 19 or more S. enterica subspecies I strains (11, 39). Although the total number of core genes identified in each study differed, approximately 90% or more of the core genes reported by Chan et al. (11) and by Porwollik et al. (39) that fall within the CT18-LT2 orthologous gene sets were present in our core group (see Materials and Methods; Table 2; see Appendix S3 in the supplemental data). The disparity in the core gene sets probably arose from differences in the sequences spotted on each array, the number of gene duplicates per array, the number of hybridization slides per strain, the wash stringency used to process slides, and the methods used to analyze and normalize the data. Despite these differences, the S. enterica subspecies I core genomes identified in each study compared well. Future experiments using larger numbers of isolates to account for serovar and genovar variability, together with a standardized chip and data processing protocol, are required to determine the definitive number of genes invariant in the S. enterica subspecies I chromosomal backbone.
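The cross-study comparison above reduces to set arithmetic once every study's core list is expressed in common gene identifiers. The minimal Python sketch below assumes all identifiers have already been mapped to CT18 (STY) names via the orthologous gene sets; the function name and inputs are hypothetical.

```python
def percent_recovered(other_core, our_core, ortholog_genes):
    """Percentage of another study's core genes, restricted to the CT18-LT2
    orthologous set, that are also in our core set.

    All IDs are assumed to be CT18 (STY) identifiers after ortholog mapping.
    """
    comparable = set(other_core) & set(ortholog_genes)
    if not comparable:
        return 0.0
    return 100.0 * len(comparable & set(our_core)) / len(comparable)
```

Restricting to the orthologous set first keeps genes that could not appear on both arrays from counting as disagreements.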

Therefore, we have separated the core invariant portion of our microarray data from the variable component in 40 S. enterica subspecies I isolates using a novel mathematical approach. Our separation compared well with available S. enterica subspecies I genome sequences and other S. enterica subspecies I genomic microarray hybridization data. Such mathematical organization of microarray data from comparative genomic studies provides an ideal tool to examine the genome-scale information derived from microarray studies using large numbers of strains. Such processes, if automated, will provide faster and more accurate discrimination of comparative genomic microarray data than is currently available and are being developed within our group for future studies.

Overview of the core invariant component.

By ranking gene variance, we have identified both the physically invariant gene pool present in all S. enterica subspecies I strains and the variable component (Fig. 2). However, to appreciate the significance of the genes common to S. enterica subspecies I pathogens, especially with respect to virulence characteristics, genes within this set were compared with the genome of a closely related bacterium, E. coli K-12 strain MG1655. Parkhill et al. (37) previously identified genes common to both the S. enterica serovar Typhi CT18 and E. coli K-12 chromosomes, and we examined the overlap between the S. enterica subspecies I core set and these CT18-K-12 common genes. There is indeed a large overlap (approximately 83%) between genes in the S. enterica subspecies I core set and the conserved homologues within E. coli, including genes involved in metabolism, transcription, translation, cell motility, and signal transduction (see Appendix S4 in the supplemental data). The synteny of the conserved genes reflects the shared evolutionary history that groups these enteric bacteria together. However, the core genes absent from E. coli K-12, which distinguish Salmonella from it, include many genes of unknown function as well as previously established virulence determinants, such as genes within the Salmonella pathogenicity islands (SPI) and many fimbrial operons present in S. enterica serovar Typhi. In addition, several genes that have not previously been associated with Salmonella virulence were present in the core set but absent or highly divergent in K-12 (see Appendix S4 in the supplemental data). These genes also had homologues in S. bongori (data not shown) and include aroQ (STY1852), a chorismate mutase; the hpc/hpa operon (STY1134 to STY1142), involved in tyrosine metabolism; a transketolase (STY2570 to STY2572); cydAB (STY0392-STY0393), which encodes the cytochrome bd complex; a ribokinase and L-fucose permease (STY3989-STY3990); a putative fructose- and mannose-specific phosphotransferase (PTS) system (STY4013 to STY4016); and a carbamate kinase and arginine deaminase (STY4804-STY4805; see Appendix S4 in the supplemental data). Furthermore, the presence of homologues of many of these genes, with high amino acid sequence identity, in pathogenic bacteria such as Shigella flexneri (STY1134 to STY1140 show 80 to 95% amino acid identity with SF4384 to SF4379, respectively), uropathogenic E. coli CFT073 (STY3989 and STY3990 show 96% amino acid identity with c0331 and c0332, respectively, and STY4804 and STY4805 show 85 to 96% amino acid identity with c5349 and c5350, respectively), Enterococcus faecalis (STY4013 to STY4016 show 43 to 63% amino acid identity with EF2980 to EF2977, respectively), and Pseudomonas aeruginosa (STY0392-STY0393 show 65 to 70% amino acid identity with CioA and CioB, respectively) may indicate their involvement in virulence in Salmonella.

An important phenotypic characteristic of most S. enterica isolates, which distinguishes them from E. coli, is the reduction of tetrathionate and production of hydrogen sulfide (3). Almost 2% of the S. enterica genome is devoted to this process, comprising genes involved in biosynthesis and utilization of coenzyme B12 (cysG, STY4319; btuR, STY1332; cbi operon, STY2222 to STY2240; cobCD, STY0694-STY0695; cobUST, STY2219 to STY2221), 1,2-propanediol degradation (pocR, STY2241; pdu operon, STY2242 to STY2263), ethanolamine degradation (eut operon, STY2692 to STY2706), tetrathionate reduction (ttr operon, STY1733 to STY1738), and reduction of thiosulfate and sulfite (phs operon, STY2269 to STY2271; asr operon, STY2794 to STY2796) to hydrogen sulfide (3, 36, 42, 44). All genes in these processes that were present in our data set were within the filtered core genome. Most of the genes were S. enterica specific and absent from the CT18-K-12 common gene set, with the exception of the eut operon and the cysG, cobA/btuR, cobC, and cobUST genes (see Appendix S4 in the supplemental data). The eut operon is shared by the two species (32), and homologues of cysG (18), btuR (17), and cobU, cobS, cobT, and cobC (36) have been identified in E. coli. Interestingly, although these genes remain conserved in the human-adapted salmonellae, several genes within the cbi and pdu clusters are pseudogenes in S. enterica serovar Typhi (cbiM, cbiK, cbiJ, cbiC, and pduN), while cbiA and pduF are mutated in S. enterica serovar Paratyphi (30).

The KEGG pathways (http://www.genome.jp/kegg/kegg2.html) were used to categorize the genes conserved within the core set. Genes assigned to central metabolic pathways, such as glycolysis and gluconeogenesis, fatty acid biosynthesis (pathways 1 and 2), fatty acid metabolism, the pentose phosphate pathway, ATP synthesis, and pyruvate metabolism, were mainly conserved in all S. enterica subspecies I strains. However, serovar- and genovar-specific variation was seen in various pathways, including fructose and mannose metabolism, galactose metabolism, nucleotide sugar metabolism, and glycerolipid metabolism (Table 3). Genetic variation in such pathways reflects differences in the composition of capsular polysaccharides (antigenic determinants), in available carbon and energy sources, and in biochemical reactions, probably in response to host specialization or niche adaptation. While some of these differences are well understood, e.g., the rfb cluster (10, 22, 27, 28, 56), others require further investigation to appreciate the importance of the variant gene cluster in the assigned pathway(s) and to identify homologues or alternative pathways that can perform similar functions in serovars or strains lacking these genes, e.g., the absence of allABCD in S. enterica serovar Montevideo, S. enterica serovar Binza, and a subset of S. enterica serovar Typhimurium strains, or of the xap operon in S. enterica serovar Paratyphi (Table 3).
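A per-pathway summary of the kind discussed above, separating mostly conserved central pathways from pathways with variable genes, can be sketched as follows. This Python illustration assumes a precomputed mapping from pathway name to CT18 gene identifiers (how such a mapping is exported from KEGG is outside the sketch and is not taken from the paper); the function name and inputs are hypothetical.

```python
def pathway_conservation(pathway_genes, core_genes, tested_genes):
    """Per-pathway fraction of tested genes that fall in the core set, plus
    the variable (non-core) genes.

    pathway_genes: pathway name -> set of STY IDs (assumed precomputed mapping)
    core_genes:    genes classed as core by the decision tree
    tested_genes:  genes with data in all strains
    """
    core, tested = set(core_genes), set(tested_genes)
    summary = {}
    for pathway, genes in pathway_genes.items():
        present = set(genes) & tested
        if not present:
            continue  # no genes from this pathway in the data set
        conserved = present & core
        summary[pathway] = (len(conserved) / len(present), sorted(present - conserved))
    return summary
```

Pathways with a fraction near 1 correspond to the conserved central metabolism described above, while the listed non-core genes point to the serovar- or genovar-specific variation summarized in Table 3.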

Table 3. Variability in genes involved with amino acid, carbon, lipid, energy, and nucleotide metabolism within the S. enterica subspecies I serovars, determined using genes from the filtered variable genes (see Appendices S1 and S4 in the supplemental material; ...

Virulence determinants such as type III secretion systems allow salmonellae to be intracellular pathogens (13, 19, 55), distinguishing them from commensal E. coli. In S. enterica serovar Typhi, 10 SPI regions have been characterized (37). Our data showed genes within SPI1 to SPI5 to be present or conserved in all S. enterica subspecies I strains, consistent with the findings of Ochman and Groisman (34), Chan et al. (11), and Porwollik et al. (41). In addition, four genes from within SPI6 (STY0335, STY0338, STY0351, and STY0352), representing only 7% of the SPI6 region in S. enterica serovar Typhi, were present in our core set. The SPIs are believed to have been acquired by horizontal transfer and may be self-mobile (35). The four genes could therefore be remnants of an SPI6 previously integrated at this position, or they could indicate that genes within the 59-kb island were acquired at different evolutionary times. SPI9, a 16-kb pathogenicity island, was also present in all S. enterica strains included in our data set and therefore falls within the core set. Like SPI4, SPI9 encodes a type I secretion system and a large RTX-like protein. The function and contribution of these RTX proteins to virulence in Salmonella are not yet known. However, RTX proteins encoded adjacent to type I secretion systems are commonly involved in virulence; examples include the Vibrio cholerae RTX toxin VcRtxA, which covalently cross-links actin (49), and the alpha-hemolysin system, which encodes the cytolytic RTX exotoxin HlyA in pathogenic E. coli (6, 14).

Fimbriae are virulence determinants important in bacterial adherence to biotic and abiotic surfaces (2, 7, 43), and S. enterica serovars typically harbor a large number of putative fimbrial operons. For example, the S. enterica serovar Typhi genome contains 12 putative fimbrial operons (52), while that of S. enterica serovar Typhimurium contains 13 (31). Of the many fimbrial operons in Salmonella, only three (fim, bcf, and stb) were conserved in all S. enterica serovars examined. However, two of these (bcf and fim) contain pseudogenes in S. enterica serovar Typhi, and bcf contains pseudogenes in S. enterica serovar Paratyphi (30). Nevertheless, expression of all three fimbrial operons (fim, bcf, and stb) by S. enterica serovar Typhimurium has been shown in vivo in bovine ligated loops (21). Two other putative fimbrial operons, saf and std, were present in all S. enterica strains in our data set except S. enterica serovar Senftenberg and S. enterica serovar Gallinarum, respectively. The saf operon, although present in S. enterica serovar Paratyphi, again harbors pseudogenes (30). The absence of the std operon in the S. enterica serovar Gallinarum strain used for our microarray studies was consistent with the S. enterica serovar Gallinarum genome sequence but differed from the work of Porwollik et al. (39), suggesting genovar-specific variation within this serogroup. The small number of fimbrial operons conserved within the core set could indicate the role that fimbriae play in niche specialization among S. enterica subspecies I serovars. In the future, as more genomes within this group are sequenced, the variety of fimbrial operons present in S. enterica serovars, and how they differ with host specificity, will become clearer.

Many genes involved in pathogenicity, as well as phage and insertion sequence elements, were absent from both the S. enterica subspecies I core set and the E. coli K-12 genome. The majority of genes within this component were present only in serovar Typhi strains and fell within the noncore, low-mean-low-variance filter [Fig. 2c(i); see Appendices S1 and S2 in the supplemental data]. The S. enterica serovar Typhi genome probably acquired these genes through horizontal gene transfer as it became human adapted. Detailed analysis of the seven prophage-like elements identified within this component, using both sequence and microarray data, can be found in the work of Thomson et al. (51).

Another interesting feature of the S. enterica serovar Typhi genome is the large number of pseudogenes (204) that it contains (37). The majority of these genes (145) are functional in S. enterica serovar Typhimurium, whereas only 23 are present as pseudogenes (31, 37). S. enterica serovar Paratyphi, which harbors 177 pseudogenes, shares only 28 of them with S. enterica serovar Typhi (30). The S. enterica serovar Typhi pseudogenes present within the core set (106 of 204) may therefore still be functional in other S. enterica subspecies I serovars. It has been suggested that these inactivating mutations are of relatively recent origin and have left large numbers of genes involved in gastric survival as pseudogenes in the human-restricted strains (30, 31, 37). Accumulation of pseudogenes could therefore be one mechanism by which S. enterica serovars specialize to different environmental conditions. It can thus be speculated that serovars such as S. enterica serovar Gallinarum and S. enterica serovar Pullorum, which are host restricted and cause fowl typhoid and pullorum disease, respectively (50), will also possess a unique set of pseudogenes comprising genes no longer required by these serovars.

Moreover, approximately half of the genes within the core set encoded hypothetical proteins of unknown function, which had been preserved through the vertical evolution of S. enterica subspecies I. Surprisingly, a large number of these genes were also conserved within the E. coli K-12 genome. Their preservation may imply some as yet unknown but nevertheless important function. They could be involved in enhancing the fitness and survival of enteric bacteria under different environmental conditions or even in evading the host immune response; understanding their role requires further work.

To understand the significance of the conserved genes present in all pathogenic S. enterica subspecies I strains, we compared the core gene set of S. enterica subspecies I with the genes present in the commensal E. coli K-12 genome. Salmonella is a close relative of E. coli, many serotypes of which are commensals of mammals and birds, while others are human and animal pathogens. As expected, a large proportion of the genes within the core set (~80%) had homologues in E. coli K-12 strain MG1655, and most of these were also present in S. bongori (data not shown). In addition, we identified several genes in the core S. enterica subspecies I set that had close homologues in other pathogenic bacteria, such as uropathogenic E. coli CFT073 (54) and P. aeruginosa, and that may therefore be associated with virulence in Salmonella. The CydAB proteins of S. enterica serovars Typhi and Typhimurium, which show approximately 70% amino acid identity with the P. aeruginosa cytochrome bd complex (CioAB), were present in our core set. The cytochrome bd complex of P. aeruginosa is cyanide insensitive, allowing the bacterium to respire and grow while it produces cyanide (16). Moreover, production of hydrogen cyanide by P. aeruginosa has been shown to paralyze and kill the nematode Caenorhabditis elegans (20). In contrast, the cytochrome bd orthologue of E. coli, which shares only approximately 30% amino acid identity with S. enterica serovar Typhi or S. enterica serovar Typhimurium CydAB, is expressed under low aeration and in the stationary phase of growth (53). Future work in this area should clarify the evolutionary origin of such genes within S. enterica subspecies I pathogens and show whether, and how, they contribute to the virulence attributes shared among these pathogens.

Separating the core component of the genome from the variable component may also help us in the future to understand the host restriction shown by many Salmonella serovars and phage types. A microarray comparison of the genomes of the host-restricted S. enterica serovar Typhimurium DT2 and DT99 pigeon isolates with that of the broad-host-range S. enterica serovar Typhimurium LT2 found no genetic islands present in LT2 whose loss could be associated with host restriction (1). Similarly, in this study we were unable to distinguish the genomes of two S. enterica serovar Typhimurium pigeon isolates, S6332 and S1055, from those of the remaining S. enterica serovar Typhimurium strains.


In 1998 the White-Kauffmann-Le Minor scheme divided Salmonella, according to antigenic structure, into 2,449 serovars, of which 1,443 belonged to S. enterica subspecies I (38). Of this wide variety of serovars, only a small fraction within subspecies I are commonly encountered as enteric pathogens. In fact, the 12 most prevalent Salmonella serovars have been shown to be responsible for more than 70% of all human Salmonella infections (Centers for Disease Control and Prevention, 2001; http://www.cdc.gov/ncidod/dbmd/phlisdata/Salmonella.htm).

The aim of this study was to determine, by microarray, the common chromosomal gene pool that exists within S. enterica subspecies I, using some of the most prevalent serotypes. The reference strain was S. enterica serovar Typhi CT18, so only genes present on the CT18 chromosome were considered. We developed a mathematical approach that provides a useful tool for comparative genomic hybridization studies: when the genomes of a number of closely related strains are compared with a sequenced strain, it separates the core genes, representing genes conserved within S. enterica subspecies I, from the variable component. This separation is based only on the physical presence or absence of genes in the chromosome, as detected by DNA-DNA hybridization, and not on function. Hence, the core data set includes pseudogenes, which can be inactivated by changes as small as a single base pair that hybridization does not detect. Nevertheless, the core set obtained with this method comprised genes essential for the growth, survival, and virulence of S. enterica subspecies I strains and also contained many genes with homologues in a commensal E. coli strain.
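The idea of ranking genes by variance and then splitting them into core and noncore sets can be illustrated with a small sketch. This is not the published decision tree: it assumes a simplified two-split tree (variance first, then mean) applied to log2(test/reference) hybridization ratios, and the gene names, ratio values, and the cutoffs `MEAN_CUTOFF` and `VARIANCE_CUTOFF` are all hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical log2(test/reference) ratios: one list per gene, one value per test strain.
# A ratio near 0 means the gene hybridizes as well as in the reference strain (present).
RATIOS = {
    "geneA": [0.1, -0.2, 0.0, 0.15],    # present in every test strain
    "geneB": [-0.1, 0.05, -0.05, 0.1],  # present in every test strain
    "geneC": [0.0, -3.1, -2.8, 0.2],    # absent from some test strains
    "geneD": [-3.0, -2.9, -3.2, -3.1],  # absent from all test strains (reference-specific)
}

MEAN_CUTOFF = -1.0     # hypothetical: a lower mean suggests the gene is absent on average
VARIANCE_CUTOFF = 0.5  # hypothetical: a higher variance suggests presence differs by strain


def classify(ratios, mean_cutoff=MEAN_CUTOFF, var_cutoff=VARIANCE_CUTOFF):
    """Rank genes by increasing variance, then label each with a two-split tree.

    Returns a list of (gene, mean, variance, label) tuples in ranked order.
    """
    stats = {gene: (mean(v), pvariance(v)) for gene, v in ratios.items()}
    ranked = sorted(stats, key=lambda gene: stats[gene][1])  # lowest variance first
    result = []
    for gene in ranked:
        m, var = stats[gene]
        if var > var_cutoff:
            label = "variable"                   # present in some strains only
        elif m >= mean_cutoff:
            label = "core"                       # low variance, high mean: present in all
        else:
            label = "noncore, low mean-low variance"  # consistently absent from test strains
        result.append((gene, m, var, label))
    return result


if __name__ == "__main__":
    for gene, m, var, label in classify(RATIOS):
        print(f"{gene}\tmean={m:.3f}\tvar={var:.4f}\t{label}")
```

Splitting on variance first separates genes whose presence differs between strains from those that behave uniformly; the mean then distinguishes uniformly present (core) genes from genes uniformly absent from the test strains, which corresponds to the reference-specific, low-mean-low-variance category described in the text.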

In Salmonella, approximately 25% of all genes are thought to have been acquired after Salmonella separated from E. coli around 100 million years ago (40, 41). Laterally acquired genes, which drive evolutionary diversity and niche specialization, have created the mosaic structure common to bacterial genomes (33, 35). By comparing the genome sequences of host-dependent bacteria, Klasson and Andersson (23) have shown that the minimal gene sets that have evolved are species specific. They further argued that such gene sets can persist in nature for tens of millions of years provided that the environment is rich in nutrients, the host population is large, and there is strong host-level selection for bacterial gene functions (23). The preservation of genes within our core set therefore probably reflects the specificity that S. enterica subspecies I strains have gained as they evolved and adapted to an environment that remained rich in nutrients.

The information gleaned from this study will increase our understanding of the genotypic factors that group these diverse pathogens together within S. enterica subspecies I and will complement other microarray genomic hybridization studies that have examined the genetic factors that differentiate them (11, 39, 41). Understanding the genetic similarity and diversity encompassed within the Salmonella genome will inform not only future intervention strategies for controlling its entry into and propagation through the food chain but also treatment regimens for Salmonella-associated disease.



M.F.A. and M.J.W. are grateful for funding from the Veterinary Laboratories Agency seedcorn fund.

We thank Steve Gordon and Luke Randall at the VLA for many helpful suggestions and strains, respectively. We are also grateful to Daniel James and James Tucker for technical assistance.


Editor: F. C. Fang


Supplemental material for this article may be found at http://iai.asm.org/.


1. Andrews-Polymenis, H. L., W. Rabsch, S. Porwollik, M. McClelland, C. Rosetti, L. G. Adams, and A. J. Baumler. 2004. Host restriction of Salmonella enterica serotype Typhimurium pigeon isolates does not correlate with loss of discrete genes. J. Bacteriol. 186:2619-2628.
2. Austin, J. W., G. Sanders, W. W. Kay, and S. K. Collinson. 1998. Thin aggregative fimbriae enhance Salmonella enteritidis biofilm formation. FEMS Microbiol. Lett. 162:295-301.
3. Barrett, E. L., and M. A. Clark. 1987. Tetrathionate reduction and production of hydrogen sulfide from thiosulfate. Microbiol. Rev. 51:192-205.
4. Barrow, P. A., M. B. Huggins, and M. A. Lovell. 1994. Host specificity of Salmonella infection in chickens and mice is expressed in vivo primarily at the level of the reticuloendothelial system. Infect. Immun. 62:4602-4610.
5. Baumler, A. J., A. J. Gilde, R. M. Tsolis, A. W. van der Velden, B. M. Ahmer, and F. Heffron. 1997. Contribution of horizontal gene transfer and deletion events to development of distinctive patterns of fimbrial operons during evolution of Salmonella serotypes. J. Bacteriol. 179:317-322.
6. Blight, M. A., A. L. Pimenta, J. C. Lazzaroni, C. Dando, L. Kotelevets, S. J. Seror, and I. B. Holland. 1994. Identification and preliminary characterization of temperature-sensitive mutations affecting HlyB, the translocator required for the secretion of haemolysin (HlyA) from Escherichia coli. Mol. Gen. Genet. 245:431-440.
7. Boddicker, J. D., N. A. Ledeboer, J. Jagnow, B. D. Jones, and S. Clegg. 2002. Differential binding to and biofilm formation on HEp-2 cells by Salmonella enterica serovar Typhimurium is dependent upon allelic variation in the fimH gene of the fim gene cluster. Mol. Microbiol. 45:1255-1265.
8. Boyd, E. F., J. Li, H. Ochman, and R. K. Selander. 1997. Comparative genetics of the inv-spa invasion gene complex of Salmonella enterica. J. Bacteriol. 179:1985-1991.
9. Boyd, E. F., F. S. Wang, T. S. Whittam, and R. K. Selander. 1996. Molecular genetic relationships of the salmonellae. Appl. Environ. Microbiol. 62:804-808.
10. Brown, P. K., L. K. Romana, and P. R. Reeves. 1991. Cloning of the rfb gene cluster of a group C2 Salmonella strain: comparison with the rfb regions of groups B and D. Mol. Microbiol. 5:1873-1881.
11. Chan, K., S. Baker, C. C. Kim, C. S. Detweiler, G. Dougan, and S. Falkow. 2003. Genomic comparison of Salmonella enterica serovars and Salmonella bongori by use of an S. enterica serovar Typhimurium DNA microarray. J. Bacteriol. 185:553-563.
12. Christensen, H., S. Nordentoft, and J. E. Olsen. 1998. Phylogenetic relationships of Salmonella based on rRNA sequences. Int. J. Syst. Bacteriol. 48:605-610.
13. Cornelis, G. R., and F. Van Gijsegem. 2000. Assembly and function of type III secretory systems. Annu. Rev. Microbiol. 54:735-774.
14. Cortajarena, A. L., F. M. Goni, and H. Ostolaza. 2002. His-859 is an essential residue for the activity and pH dependence of Escherichia coli RTX toxin alpha-hemolysin. J. Biol. Chem. 277:23223-23229.
15. Crosa, J. H., D. J. Brenner, W. H. Ewing, and S. Falkow. 1973. Molecular relationships among the salmonellae. J. Bacteriol. 115:307-315.
16. Cunningham, L., M. Pitt, and H. D. Williams. 1997. The cioAB genes from Pseudomonas aeruginosa code for a novel cyanide-insensitive terminal oxidase related to the cytochrome bd quinol oxidases. Mol. Microbiol. 24:579-591.
17. Escalante-Semerena, J. C., S. J. Suh, and J. R. Roth. 1990. cobA function is required for both de novo cobalamin biosynthesis and assimilation of exogenous corrinoids in Salmonella typhimurium. J. Bacteriol. 172:273-280.
18. Fazzio, T. G., and J. R. Roth. 1996. Evidence that the CysG protein catalyzes the first reaction specific to B12 synthesis in Salmonella typhimurium, insertion of cobalt. J. Bacteriol. 178:6952-6959.
19. Galan, J. E. 2001. Salmonella interactions with host cells: type III secretion at work. Annu. Rev. Cell Dev. Biol. 17:53-86.
20. Gallagher, L. A., and C. Manoil. 2001. Pseudomonas aeruginosa PAO1 kills Caenorhabditis elegans by cyanide poisoning. J. Bacteriol. 183:6207-6214.
21. Humphries, A. D., M. Raffatellu, S. Winter, E. H. Weening, R. A. Kingsley, R. Droleskey, S. Zhang, J. Figueiredo, S. Khare, J. Nunes, L. G. Adams, R. M. Tsolis, and A. J. Baumler. 2003. The use of flow cytometry to detect expression of subunits encoded by 11 Salmonella enterica serotype Typhimurium fimbrial operons. Mol. Microbiol. 48:1357-1376.
22. Jiang, X. M., B. Neal, F. Santiago, S. J. Lee, L. K. Romana, and P. R. Reeves. 1991. Structure and sequence of the rfb (O antigen) gene cluster of Salmonella serovar typhimurium (strain LT2). Mol. Microbiol. 5:695-713.
23. Klasson, L., and S. G. Andersson. 2004. Evolution of minimal gene sets in host-dependent bacteria. Trends Microbiol. 12:37-43.
24. Le Minor, L., M. Y. Popoff, B. Laurent, and D. Hermant. 1986. Characterization of a 7th subspecies of Salmonella: S. choleraesuis subsp. indica subsp. nov. Ann. Inst. Pasteur Microbiol. 137B:211-217. (In French.)
25. Le Minor, L., M. Veron, and M. Popoff. 1982. The taxonomy of Salmonella. Ann. Microbiol. (Paris) 133:223-243. (In French.)
26. Li, J., H. Ochman, E. A. Groisman, E. F. Boyd, F. Solomon, K. Nelson, and R. K. Selander. 1995. Relationship between evolutionary rate and cellular location among the Inv/Spa invasion proteins of Salmonella enterica. Proc. Natl. Acad. Sci. USA 92:7252-7256.
27. Liu, D., A. M. Haase, L. Lindqvist, A. A. Lindberg, and P. R. Reeves. 1993. Glycosyl transferases of O-antigen biosynthesis in Salmonella enterica: identification and characterization of transferase genes of groups B, C2, and E1. J. Bacteriol. 175:3408-3413.
28. Liu, D., N. K. Verma, L. K. Romana, and P. R. Reeves. 1991. Relationships among the rfb regions of Salmonella serovars A, B, and D. J. Bacteriol. 173:4814-4819.
29. Mandal, B. K. 1979. Typhoid and paratyphoid fever. Clin. Gastroenterol. 8:715-735.
30. McClelland, M., K. E. Sanderson, S. W. Clifton, P. Latreille, S. Porwollik, A. Sabo, R. Meyer, T. Bieri, P. Ozersky, M. McLellan, C. R. Harkins, C. Wang, C. Nguyen, A. Berghoff, G. Elliott, S. Kohlberg, C. Strong, F. Du, J. Carter, C. Kremizki, D. Layman, S. Leonard, H. Sun, L. Fulton, W. Nash, T. Miner, P. Minx, K. Delehaunty, C. Fronick, V. Magrini, M. Nhan, W. Warren, L. Florea, J. Spieth, and R. K. Wilson. 2004. Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet. 36:1268-1274.
31. McClelland, M., K. E. Sanderson, J. Spieth, S. W. Clifton, P. Latreille, L. Courtney, S. Porwollik, J. Ali, M. Dante, F. Du, S. Hou, D. Layman, S. Leonard, C. Nguyen, K. Scott, A. Holmes, N. Grewal, E. Mulvaney, E. Ryan, H. Sun, L. Florea, W. Miller, T. Stoneking, M. Nhan, R. Waterston, and R. K. Wilson. 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-856.
32. Mori, K., R. Bando, N. Hieda, and T. Toraya. 2004. Identification of a reactivating factor for adenosylcobalamin-dependent ethanolamine ammonia lyase. J. Bacteriol. 186:6845-6854.
33. Ochman, H. 2001. Lateral and oblique gene transfer. Curr. Opin. Genet. Dev. 11:616-619.
34. Ochman, H., and E. A. Groisman. 1996. Distribution of pathogenicity islands in Salmonella spp. Infect. Immun. 64:5410-5412.
35. Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.
36. O'Toole, G. A., M. R. Rondon, J. R. Trzebiatowski, S. J. Suh, and J. C. Escalante-Semerena. 1996. Biosynthesis and utilization of adenosyl-cobalamin (coenzyme B12), p. 710-720. In F. C. Neidhardt, R. Curtiss III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella: cellular and molecular biology, 2nd ed. American Society for Microbiology, Washington, D.C.
37. Parkhill, J., G. Dougan, K. D. James, N. R. Thomson, D. Pickard, J. Wain, C. Churcher, K. L. Mungall, S. D. Bentley, M. T. Holden, M. Sebaihia, S. Baker, D. Basham, K. Brooks, T. Chillingworth, P. Connerton, A. Cronin, P. Davis, R. M. Davies, L. Dowd, N. White, J. Farrar, T. Feltwell, N. Hamlin, A. Haque, T. T. Hien, S. Holroyd, K. Jagels, A. Krogh, T. S. Larsen, S. Leather, S. Moule, P. O'Gaora, C. Parry, M. Quail, K. Rutherford, M. Simmonds, J. Skelton, K. Stevens, S. Whitehead, and B. G. Barrell. 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848-852.
38. Popoff, M. Y., J. Bockemuhl, and F. W. Brenner. 1998. Supplement 1997 (no. 41) to the Kauffmann-White scheme. Res. Microbiol. 149:601-604.
39. Porwollik, S., E. F. Boyd, C. Choy, P. Cheng, L. Florea, E. Proctor, and M. McClelland. 2004. Characterization of Salmonella enterica subspecies I genovars by use of microarrays. J. Bacteriol. 186:5883-5898.
40. Porwollik, S., and M. McClelland. 2003. Lateral gene transfer in Salmonella. Microbes Infect. 5:977-989.
41. Porwollik, S., R. M. Wong, and M. McClelland. 2002. Evolutionary genomics of Salmonella: gene acquisitions revealed by microarray analysis. Proc. Natl. Acad. Sci. USA 99:8956-8961.
42. Price-Carter, M., J. Tingey, T. A. Bobik, and J. R. Roth. 2001. The alternative electron acceptor tetrathionate supports B12-dependent anaerobic growth of Salmonella enterica serovar Typhimurium on ethanolamine or 1,2-propanediol. J. Bacteriol. 183:2463-2475.
43. Prouty, A. M., W. H. Schwesinger, and J. S. Gunn. 2002. Biofilm formation and interaction with the surfaces of gallstones by Salmonella spp. Infect. Immun. 70:2640-2649.
44. Raux, E., A. Lanois, F. Levillayer, M. J. Warren, E. Brody, A. Rambach, and C. Thermes. 1996. Salmonella typhimurium cobalamin (vitamin B12) biosynthetic genes: functional studies in S. typhimurium and Escherichia coli. J. Bacteriol. 178:753-767.
45. Reeves, M. W., G. M. Evins, A. A. Heiba, B. D. Plikaytis, and J. J. Farmer III. 1989. Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of Salmonella bongori comb. nov. J. Clin. Microbiol. 27:313-320.
46. Rodrigue, D. C., R. V. Tauxe, and B. Rowe. 1990. International increase in Salmonella enteritidis: a new pandemic? Epidemiol. Infect. 105:21-27.
47. Schoolnik, G. K. 2002. Functional and comparative genomics of pathogenic bacteria. Curr. Opin. Microbiol. 5:20-26.
48. Schoolnik, G. K. 2002. Microarray analysis of bacterial pathogenicity. Adv. Microb. Physiol. 46:1-45.
49. Sheahan, K. L., C. L. Cordero, and K. J. Satchell. 2004. Identification of a domain within the multifunctional Vibrio cholerae RTX toxin that covalently cross-links actin. Proc. Natl. Acad. Sci. USA 101:9798-9803.
50. Shivaprasad, H. L. 2000. Fowl typhoid and pullorum disease. Rev. Sci. Technol. 19:405-424.
51. Thomson, N., S. Baker, D. Pickard, M. Fookes, M. Anjum, N. Hamlin, J. Wain, D. House, Z. Bhutta, K. Chan, S. Falkow, J. Parkhill, M. Woodward, A. Ivens, and G. Dougan. 2004. The role of prophage-like elements in the diversity of Salmonella enterica serovars. J. Mol. Biol. 339:279-300.
52. Townsend, S. M., N. E. Kramer, R. Edwards, S. Baker, N. Hamlin, M. Simmonds, K. Stevens, S. Maloy, J. Parkhill, G. Dougan, and A. J. Baumler. 2001. Salmonella enterica serovar Typhi possesses a unique repertoire of fimbrial gene sequences. Infect. Immun. 69:2894-2901.
53. Trumpower, B. L., and R. B. Gennis. 1994. Energy transduction by cytochrome complexes in mitochondrial and bacterial respiration: the enzymology of coupling electron transfer reactions to transmembrane proton translocation. Annu. Rev. Biochem. 63:675-716.
54. Welch, R. A., V. Burland, G. Plunkett III, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S. R. Liou, A. Boutin, J. Hackett, D. Stroud, G. F. Mayhew, D. J. Rose, S. Zhou, D. C. Schwartz, N. T. Perna, H. L. Mobley, M. S. Donnenberg, and F. R. Blattner. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 99:17020-17024.
55. Winstanley, C., and C. A. Hart. 2001. Type III secretion systems and pathogenicity islands. J. Med. Microbiol. 50:116-126.
56. Xiang, S. H., A. M. Haase, and P. R. Reeves. 1993. Variation of the rfb gene clusters in Salmonella enterica. J. Bacteriol. 175:4877-4884.

Articles from Infection and Immunity are provided here courtesy of American Society for Microbiology (ASM)


Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...