Appl Environ Microbiol. Aug 2008; 74(15): 4703–4710.
Published online Jun 6, 2008. doi:  10.1128/AEM.00132-08
PMCID: PMC2519362

Streptococcus thermophilus Core Genome: Comparative Genome Hybridization Study of 47 Strains

Abstract

A DNA microarray platform based on 2,200 genes from publicly available sequences was designed for Streptococcus thermophilus. We determined how single-nucleotide polymorphisms in the 65- to 75-mer oligonucleotide probe sequences affect the hybridization signals. The microarrays were then used for comparative genome hybridization (CGH) of 47 dairy S. thermophilus strains. An analysis of the exopolysaccharide genes in each strain confirmed previous findings that this class of genes is indeed highly variable. A phylogenetic tree based on the CGH data showed similar distances for most strains, indicating frequent recombination or gene transfer within S. thermophilus. By comparing genome sizes estimated from the microarrays and pulsed-field gel electrophoresis, the amount of unknown DNA in each strain was estimated. A core genome comprised of 1,271 genes detected in all 47 strains was identified. Likewise, a set of noncore genes detected in only some strains was identified. The concept of an industrial core genome is proposed. This is comprised of the genes in the core genome plus genes that are necessary in an applied industrial context.

The genome includes all genetic material of an organism. It includes a number of core genes, which are detected in all strains of a species, as well as variable genes, which are detected in only some of the strains. The core genes are necessary for overall viability, while the variable genes enable the organism to thrive in particular niches. The properties of a strain are the result of both types of genes.

Streptococcus thermophilus is an industrially important species used globally in the production of fermented milk products and cheese. This species has two industrial roles, i.e., the production of lactic acid in cheese and yogurt and the formation of texture and flavor during yogurt production (11, 23). To ensure robust production, especially in the presence of bacteriophage, and to provide product diversity, a large number of different strains are used commercially (23).

Even though S. thermophilus is part of the genus Streptococcus, which includes several pathogens, it has a generally recognized as safe status in the United States and a qualified presumption of safety status in the European Union due to a long history of safe use in food production.

Comparison of complete genome sequences has revealed the absence of genes associated with pathogenicity in three sequenced strains of S. thermophilus, and it is postulated that extensive genome evolution has taken place due to the use of this species in a milk environment for several millennia (4). Here we further explore the gene content and genome evolution of S. thermophilus by comparative genome hybridization (CGH) of 47 strains. The microarray platform harbors probes for the genes identified in three published S. thermophilus genome sequences combined with additional S. thermophilus genes found in GenBank (www.ncbi.nlm.nih.gov). This analysis gives information about the presence or absence of genes in the genomes of the investigated strains, thereby refining the definition of the core genome of this species (13).

MATERIALS AND METHODS

Strains and culture conditions.

All S. thermophilus strains were from the Chr. Hansen culture collection (Hørsholm, Denmark). They were confirmed to be S. thermophilus by sequence analysis of the 16S rRNA gene (data not shown). Strains were grown without shaking at 37°C in M17 broth (Oxoid A/S, Greve, Denmark) supplemented with 2% lactose.

Design of oligonucleotides and preparation of microarrays.

The design of oligonucleotides for the S. thermophilus microarray platform was carried out using OligoWiz 1.0 (17), with the relevant coding sequences (CDSs) as input and 65- to 75-mer oligonucleotides as output. The platform was intended to cover all publicly available CDSs of S. thermophilus (as of August 2005). LMG18311 (4) was arbitrarily chosen as the primary strain, and all CDSs were collected in FASTA format. One copy of the 16S and 23S rRNA genes was included, whereas tRNA genes, which are relatively short, were excluded. CDSs present in CNRZ1066 (4) but not LMG18311 were identified using nucleotide BLAST, with an E value cutoff of 10^−50 (1). Subsequently, CDSs present only in the 199 contigs of the LMD-9 draft genome sequence (14) were identified. Additional entries for S. thermophilus CDSs were downloaded from GenBank, CDSs were extracted using FeatureExtract (www.cbs.dtu.dk/services/FeatureExtract), and unique genes were identified. Finally, unique CDSs from phages DT1 (26) and O1205 (21) were downloaded from GenBank.

All CDSs were used as input to OligoWiz, with LMG18311 set as the reference genome. The numbers of input CDSs were as follows: for LMG18311, 1,893; for CNRZ1066, 103; for LMD-9, 156; for GenBank sequences, 219; and for phages, 45. The resulting oligonucleotides were analyzed against the combined list of CDSs by using BLAST. Some oligonucleotides showed multiple matches (a perfect match over the entire 65- to 75-mer oligonucleotide would give an E value of about 10^−32). For these, we attempted to redesign the oligonucleotide. When this was not possible, we removed the respective oligonucleotides, i.e., 87 for LMG18311, 13 for CNRZ1066, 34 for LMD-9, 30 for GenBank sequences, and 2 for phages. One oligonucleotide was designed for each of 2,250 genes.
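The figure of 2,250 genes follows directly from the input and removal counts given above. A short Python check of that arithmetic (the variable names are illustrative only):

```python
# CDS counts used as OligoWiz input, per source (figures from the text).
input_cds = {"LMG18311": 1893, "CNRZ1066": 103, "LMD-9": 156,
             "GenBank": 219, "phages": 45}

# Oligonucleotides removed because of multiple BLAST matches, per source.
removed = {"LMG18311": 87, "CNRZ1066": 13, "LMD-9": 34,
           "GenBank": 30, "phages": 2}

# Oligonucleotides retained per source, and in total.
retained = {source: input_cds[source] - removed[source] for source in input_cds}
total_retained = sum(retained.values())

print(retained)
print(total_retained)  # 2250
```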

The oligonucleotides are designated as follows, where “#” indicates an integer relating to a gene identifier for the relevant genome or a position: for LMG18311, stl#; for CNRZ1066, stc#; for LMD-9, StheL#; for GenBank sequences, the relevant gene identifier; and for phages, gi23455848_# and gi29165635_#.

We designed multiple oligonucleotides for three important S. thermophilus CDSs for use as controls. These are designated with the suffixes -a, -b, -c, etc. The selected CDSs were the urea amidohydrolase (urease) α subunit (five oligonucleotides), pyruvate kinase (five oligonucleotides), and β-galactosidase (eight oligonucleotides). Twenty negative control oligonucleotides were designed from CDSs of the Leuconostoc mesenteroides ATCC 8293 draft genome sequence (14). These oligonucleotides had no hits among the S. thermophilus CDSs even when the E value was set very high (100) in a BLAST comparison and are designated negLM#.

For determining the sensitivity to one or more sequence variations, we designed oligonucleotides with point mutations, insertions, and deletions, all with one of the pyruvate kinase oligonucleotides (stl1196c) as the basis. These control oligonucleotides were named apm-pyk#, ins-pyk#, and del-pyk#, where “#” is 1 to 8, 1 to 4, and 1 to 4, respectively, and designates the number of sequence changes. The designation “apm” stands for “accumulated point mutations,” “ins” designates insertions, and “del” designates deletions.

To determine the number of bases needed to obtain a signal, oligonucleotides with increasing numbers of base substitutions in a negative control oligonucleotide were designed. The oligonucleotides for one such series were named detLM0001pyk#, where “#” equals 0, 14, 16, 18, 20, 22, 19/20, or 21/22. LM0001 is a negative control oligonucleotide from L. mesenteroides, and “#” designates the number of bases from pyk of S. thermophilus that were substituted in the middle of the oligonucleotide for bases of LM0001. The designations “19/20” and “21/22” indicate one point mutation in 20- and 22-base regions, respectively. A similar series named detLM0356gap was designed, where gap is glyceraldehyde-3-phosphate dehydrogenase of S. thermophilus and LM0356 is the negative control oligonucleotide.
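The construction of such a detection series can be sketched in Python as follows. This is a minimal illustration, not the design procedure actually used: the function name is hypothetical, the sequences are toy strings, and taking the replacement bases from the centre of the donor sequence is an assumption, since the text does not state which pyk bases were used.

```python
def substitute_middle(background: str, donor: str, n: int) -> str:
    """Replace the central n bases of `background` with n bases from `donor`.

    Assumption: the n donor bases are taken from the centre of `donor`.
    """
    if n < 0 or n > len(background) or n > len(donor):
        raise ValueError("n must fit inside both sequences")
    if n == 0:
        return background
    bg_start = (len(background) - n) // 2   # where the substituted block begins
    donor_start = (len(donor) - n) // 2     # where the donor block is cut from
    block = donor[donor_start:donor_start + n]
    return background[:bg_start] + block + background[bg_start + n:]


# Toy sequences standing in for a negative control and a target oligonucleotide.
negative_control = "A" * 10
target = "CCCCGGGGTT"

chimera = substitute_middle(negative_control, target, 4)
print(chimera)  # AAACGGGAAA
```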

In total, 2,304 oligonucleotides were designed and purchased from Bioneer Corporation (Daedeok-gu, Daejeon, South Korea). Lyophilized oligonucleotides were provided following 50-nmol-scale synthesis and BioRP purification. The printing, or spotting, of microarrays was carried out as described previously (M. B. Pedersen et al., submitted for publication).

Isolation of gDNA.

Genomic DNA (gDNA) was isolated from 1.5-ml stationary-phase cultures by use of a DNeasy blood and tissue kit according to the manufacturer's protocol (Qiagen, Valencia, CA), using lysozyme (L6876) from Sigma-Aldrich (Brøndby, Denmark). Elution was done with two 100-μl volumes of H2O. Samples were lyophilized in a Speed-Vac device, and the concentration was adjusted to 0.5 μg/μl with H2O. The yield per sample was typically 5 to 10 μg.

Copying and labeling of gDNA.

The copying of gDNA and labeling with either Cy3 (reference) or Cy5 (test) were done using a CyScribe postlabeling kit normally used for copying RNA into cDNA (Amersham Biosciences, Hillerød, Denmark). The manufacturer's protocol was followed except for the first steps, which are described here. Ten microliters of gDNA (0.5 μg/μl) was mixed with 3 μl H2O and 2 μl random nonamer from the kit for priming. This mixture was incubated in boiling water for 5 min, after which it was put on ice. A mixture containing 1 μl dUTP mix, 1 μl AA-dUTP, 2 μl 10× Klenow buffer, and 1 μl Klenow enzyme (Klenow fragment [3′→5′ exonuclease negative] at 50,000 U/ml [Medinova Scientific]) was added, and the sample was incubated at 37°C for 2 h. Samples were frozen at −20°C or processed immediately.

Hybridization, washing, scanning, and preanalysis of arrays.

Hybridization, washing, scanning, and preanalysis of arrays were carried out as described previously (Pedersen et al., submitted). Normalization was done in Acuity 4.0, using the default parameters (Axon Instruments Inc., Union City, CA). When an array was of poor quality (<70% of the spots/features designated “found” in generating a data set in Acuity), the preparation of the respective array was repeated.

Generation of dendrograms.

The log2 values were exported from Acuity to Excel, and genes determined to be present (those with a log2 value of >−2) (see Results and Discussion for details) were assigned a value of 1 and those determined to be absent were assigned a value of 0. The data were then reimported into Acuity and used to generate dendrograms.
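The conversion to present/absent calls can be sketched in Python. This is a minimal illustration with made-up gene names and values, not the Excel procedure itself; treating a missing value as absent follows the rule given in Results and Discussion that genes without hybridization data are assigned a log2 value of −2.

```python
ABSENT_THRESHOLD = -2.0  # log2 cutoff stated in the text


def call_presence(log2_value):
    """Return 1 (present) for log2 > -2 and 0 (absent) otherwise.

    None stands for a spot without hybridization data; such genes are
    assigned a log2 value of -2 and are therefore called absent.
    """
    if log2_value is None:
        log2_value = ABSENT_THRESHOLD
    return 1 if log2_value > ABSENT_THRESHOLD else 0


# Hypothetical log2 values for four genes in one test strain.
example = {"geneA": 0.1, "geneB": -0.9, "geneC": -3.4, "geneD": None}
calls = {gene: call_presence(value) for gene, value in example.items()}
print(calls)  # {'geneA': 1, 'geneB': 1, 'geneC': 0, 'geneD': 0}
```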

Microarray data accession number.

The platform specifications for the microarrays are available at the NCBI Gene Expression Omnibus (GEO) under accession number GPL6369.

RESULTS AND DISCUSSION

Design of microarray platform.

Our aim was to design an array platform with all publicly available CDSs of S. thermophilus. The following three genome sequences of S. thermophilus were available: S. thermophilus CNRZ1066 (4), LMG18311 (4), and LMD-9 (14). All nonredundant S. thermophilus CDSs in GenBank were also added. In total, probes for more than 2,200 S. thermophilus genes are represented on the microarrays.

In the following, we use the log2 value (log2 ratio), i.e., the log2 of the Cy5 “test strain” hybridization signal (minus background) divided by the Cy3 “reference strain” hybridization signal (minus background). LMG18311 was always the reference strain.
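As a minimal sketch of this definition in Python (the function name and the handling of non-positive background-corrected signals are assumptions made for illustration; the normalization performed in Acuity is not reproduced here):

```python
import math


def log2_ratio(cy5, cy5_background, cy3, cy3_background):
    """log2 of (test signal - background) / (reference signal - background).

    Returns None when either corrected signal is not positive, because the
    logarithm is undefined there (an assumption of this sketch).
    """
    test = cy5 - cy5_background
    reference = cy3 - cy3_background
    if test <= 0 or reference <= 0:
        return None
    return math.log2(test / reference)


# Equal signals give 0; a fourfold weaker test signal gives -2.
print(log2_ratio(1100, 100, 1100, 100))  # 0.0
print(log2_ratio(350, 100, 1100, 100))   # -2.0
```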

Microarray platform validation.

The microarray platform was tested in a self-hybridization with LMG18311, i.e., this strain was used as both the reference and test strains. A plot of the log2 values of the data is presented in Fig. 1. LMG18311 did not hybridize to all probes on the microarrays, as the arrays contain CDSs not present in this strain. The majority (99.4%) of genes present in the strain produced a log2 value of between −0.5 and 0.5 (a ratio of 1.4 up or down); a few gave greater differences, but all genes present in LMG18311 had log2 values of between −1 and 1.

FIG. 1.
log2 values for all detected genes on the microarray, hybridizing LMG18311 against itself. Dashed lines mark values of ±0.5.

Hols et al. (12) identified 3,000 single-nucleotide polymorphisms (SNPs) between LMG18311 and CNRZ1066. We investigated the effect of SNPs on the hybridization signal. The presence of one SNP in the oligonucleotide sequence, as determined from the genome sequences, resulted in log2 values ranging from approximately 0 to about −1 on the CGH array. In the case of six, seven, or more SNPs in the oligonucleotide sequence, the log2 value was around −2 (Fig. 2A). When more SNPs were present, the log2 value dropped even further. The exact effect of a SNP on hybridization is dependent on its position within the oligonucleotide sequence and also on the specific base involved (data not shown).

FIG. 2.
Comparison of LMG18311 and CNRZ1066. (A) The numbers of SNPs between the microarray probe sequence, based on LMG18311, and the published sequence of CNRZ1066 were plotted against log2 values (circles) from the CGH array for genes where the log2 value ...

Comparing the genome sequences of the two strains allows prediction of which oligonucleotides should return a “present” signal from CNRZ1066 and which should return an “absent” signal. From Fig. 2B, it can be seen that absent and present genes (based on the sequenced DNA) were distributed relatively well around a log2 value of about −2. We therefore designated genes with log2 values of <−2 as absent from the test genome. In addition, the genes absent in both the test and reference strains yielded no hybridization data. Such genes were assigned a log2 value of −2, marking them as not present.

We have applied this method to 47 S. thermophilus strains. Genes present in the investigated strains but not on the microarray could naturally not be detected. A single array was done for each strain. While this slightly reduces the accuracy of the results, it allows the inclusion of a greater number of strains. We have previously shown that the general array platform gives highly reliable data using single arrays, i.e., a detection level of 1.5- to 2-fold, up or down, without the use of replicate arrays (10, 18). This is further substantiated by the data presented in Fig. 1.

Phylogenetic tree.

To establish the relatedness of the different strains, a phylogenetic tree was generated using a hierarchical clustering algorithm (Pearson centered algorithm; Acuity) (Fig. 3), following conversion of the microarray data to absent (0) and present (1). Some distinct subgroups appear within the tree. Strain S39-20 clusters in a group of its own. Except for the relatively few strains in the subgroups, most strains have similar phylogenetic distances to most other strains.

FIG. 3.
Phylogenetic tree based on CGH microarray data. The dashed box indicates a subgroup containing PrtS protease-negative strains. The strains S39-69 and S39-72 differ by 35 genes, whereas S39-69 and S39-20 differ by 270 genes.
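The two quantities behind such a tree, a centered Pearson distance between binary gene profiles and the number of genes by which two strains differ, can be sketched in plain Python. The profiles below are toy data with hypothetical strain names, and the sketch covers only the pairwise distance, not the clustering as implemented in Acuity, whose settings are not described here.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - centered Pearson correlation between two presence/absence profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a profile without variation has no defined correlation")
    return 1 - cov / sqrt(var_x * var_y)


def gene_differences(x, y):
    """Number of genes called present in one strain and absent in the other."""
    return sum(a != b for a, b in zip(x, y))


# Toy profiles over six genes (1 = present, 0 = absent).
strain_1 = [1, 1, 1, 0, 0, 1]
strain_2 = [1, 1, 1, 0, 1, 1]
strain_3 = [0, 1, 0, 1, 1, 0]

print(gene_differences(strain_1, strain_2))  # 1
print(gene_differences(strain_1, strain_3))  # 5
print(pearson_distance(strain_1, strain_2) < pearson_distance(strain_1, strain_3))  # True
```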

Delorme et al. (6) concluded that recombination is frequent in Streptococcus salivarius and Streptococcus vestibularis, which are part of the S. salivarius group, which also contains S. thermophilus (8a). The lack of a clear evolutionary path (Fig. 3) indicates that recombination or gene transfer is also frequent in S. thermophilus. This confirms previous observations based on the sequence diversity of exopolysaccharide (EPS) genes (4a) but is in contrast to observations based on multilocus sequence typing results (12).

Interestingly, the strains in the largest subgroup (marked with a dashed square in Fig. 3) all lack the prtS protease gene, important for rapid acidification by S. thermophilus in milk (20). In total, 20 genes that are present in >50% of the other strains are missing from the strains in the prtS-negative group. These include genes for efflux pumps, antimicrobial peptide transporters, ion channels, and response regulators. Likewise, 39 genes are detected in all of the strains of the prtS-negative group but are missing in >50% of the other strains. These include genes for several amino acid uptake systems, maltose/maltodextrin transporters, iron compound uptake systems, and transcriptional regulators. Hence, the prtS-negative strains may be able to compensate for the lack of a protease by having additional amino acid uptake systems, thereby relying on other proteases, for example, those provided by Lactobacillus delbrueckii subsp. bulgaricus during yogurt fermentation, to degrade protein sources. Four strains (S39-20, S39-48, S39-51, and S39-70) outside this group also lack the prtS gene (data not shown).
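The two gene lists described above can be derived from a presence/absence matrix with a simple group contrast. The Python sketch below uses toy data; the strain labels, gene names other than prtS, and the function name are hypothetical.

```python
def group_contrast(presence, group):
    """Contrast a subgroup of strains with all remaining strains.

    presence: dict mapping gene -> dict mapping strain -> 0/1 call.
    Returns (missing_in_group, specific_to_group):
      missing_in_group  - absent from every group strain, present in >50% of the others
      specific_to_group - present in every group strain, absent from >50% of the others
    """
    strains = list(next(iter(presence.values())))
    others = [s for s in strains if s not in group]
    missing_in_group, specific_to_group = [], []
    for gene, calls in presence.items():
        in_group = [calls[s] for s in group]
        fraction_other = sum(calls[s] for s in others) / len(others)
        if not any(in_group) and fraction_other > 0.5:
            missing_in_group.append(gene)
        elif all(in_group) and (1 - fraction_other) > 0.5:
            specific_to_group.append(gene)
    return missing_in_group, specific_to_group


# Toy matrix: strains A and B form the subgroup; C, D, and E are the others.
toy = {
    "prtS":       {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0},
    "aa_uptake1": {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1},
    "core_gene":  {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1},
    "rare_gene":  {"A": 1, "B": 0, "C": 0, "D": 0, "E": 0},
}
missing, specific = group_contrast(toy, group=["A", "B"])
print(missing)   # ['prtS']
print(specific)  # ['aa_uptake1']
```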

Core genome and variable genes.

Core genes are genes which are present in all dairy S. thermophilus strains and represent genes which are necessary for survival under conditions normally encountered by S. thermophilus in an industrial dairy environment. Here we have identified 1,271 core genes detected in all 47 investigated strains (see Table S1 in the supplemental material) and 916 noncore genes detected in only some of the strains.

Some noncore genes are detected in only a few strains, whereas others are detected in almost all strains.

There were 69 noncore genes detected in 45 strains and 233 noncore genes detected in 46 strains (Fig. 4). These 302 genes we designate “conserved genes.” They are probably core genes which have been lost in a few strains (or, in a few cases, genes for which the signal from the respective oligonucleotide is below the threshold due to noise). Similarly, there were between 27 and 58 noncore genes detected in only one to five genomes (Fig. 4). These are likely to be genes which were recently acquired and are designated “recently acquired genes.” There are 183 recently acquired genes in this data set. Finally, the group of noncore genes detected in 6 to 44 genomes we term “variable genes,” of which there are 431 (Fig. 4). Curiously, 14 genes on the array were not detected in any of these strains; these are also likely to belong to the group of “recently acquired genes.”
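The grouping rule can be written compactly in Python. In this sketch the category boundaries are those given in the text, whereas the gene names and detection counts are invented for illustration; the final lines only check that the reported group sizes add up.

```python
from collections import Counter

N_STRAINS = 47


def classify_gene(n_detected):
    """Assign a gene to a group from the number of strains in which it is detected."""
    if not 0 <= n_detected <= N_STRAINS:
        raise ValueError("detection count out of range")
    if n_detected == N_STRAINS:
        return "core"
    if n_detected >= 45:          # 45 or 46 strains
        return "conserved"
    if n_detected >= 6:           # 6 to 44 strains
        return "variable"
    if n_detected >= 1:           # 1 to 5 strains
        return "recently acquired"
    return "not detected"


# Hypothetical detection counts for six genes.
counts = {"geneA": 47, "geneB": 46, "geneC": 45, "geneD": 20, "geneE": 3, "geneF": 0}
groups = Counter(classify_gene(n) for n in counts.values())
print(dict(groups))

# Consistency of the numbers reported in the text.
conserved_total = 69 + 233                  # genes detected in 45 and in 46 strains
noncore_total = conserved_total + 183 + 431
print(conserved_total, noncore_total)       # 302 916
```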

FIG. 4.
Number of noncore genes present in only a given number of strains (1 to 46). Genes found in 45 or 46 strains form a group of conserved genes. Those found in 6 to 44 strains form a group termed variable genes. The group of recently acquired genes is made ...

The distribution of core genes and the various noncore genes within S. thermophilus is illustrated in Fig. 5. The noncore genes consist mainly of genes encoding bacteriocins, efflux/uptake pumps, and proteins involved in EPS biosynthesis and peptide metabolism, phage genes, and phage resistance genes.

FIG. 5.
Relative proportions of different gene groups in 47 S. thermophilus strains.

The definition of the size of the core genome depends on the number of strains investigated, as illustrated in Fig. 6. When a small number of strains are taken into account, the number of core genes is high but decreases rapidly as genomes are added. When more strains are taken into account, the rate of gene exclusion decreases. In this study, we used 47 strains and ended up with a core genome of 1,271 genes. Since the curve in Fig. 6 has not flattened out, it appears that we have not yet reached the point where the core genome no longer decreases when more strains are included in the calculation. Lefébure and Stanhope (13) defined the core genome of S. thermophilus to be 1,487 genes, based on three sequenced strains. They also defined the core genome of the genus Streptococcus to be around 600 genes, indicating that a substantial number of genes are present which give S. thermophilus its unique characteristics.
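A curve of this kind can be produced by intersecting the detected gene sets strain by strain and averaging over several random strain orders. The Python sketch below uses four toy strains; the function names, the fixed random seed, and the data are assumptions for illustration and do not reproduce the calculation behind Fig. 6.

```python
import random


def core_size_curve(genes_by_strain, order):
    """Core genome size after adding each strain in the given order."""
    core = None
    sizes = []
    for strain in order:
        detected = genes_by_strain[strain]
        core = set(detected) if core is None else core & detected
        sizes.append(len(core))
    return sizes


def averaged_curve(genes_by_strain, n_orders=10, seed=0):
    """Mean core size per number of strains over n_orders random strain orders."""
    rng = random.Random(seed)
    strains = list(genes_by_strain)
    curves = []
    for _ in range(n_orders):
        order = strains[:]
        rng.shuffle(order)
        curves.append(core_size_curve(genes_by_strain, order))
    return [sum(curve[i] for curve in curves) / n_orders for i in range(len(strains))]


# Toy data: four strains, four genes.
toy = {
    "s1": {"a", "b", "c", "d"},
    "s2": {"a", "b", "c"},
    "s3": {"a", "b", "d"},
    "s4": {"a", "c", "d"},
}
fixed_order = core_size_curve(toy, ["s1", "s2", "s3", "s4"])
print(fixed_order)  # [4, 3, 2, 1]
mean_curve = averaged_curve(toy)
print(mean_curve[-1])  # 1.0
```

Whatever the order, the final point equals the size of the intersection over all strains; only the shape of the descent depends on the order, which is why several random orders are averaged.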

FIG. 6.
Size of the core genome plotted against the number of strains used for determination. Error bars are for 10 different random input sequences of strains.

S. thermophilus, together with S. salivarius and S. vestibularis, forms the salivarius group of the viridans group streptococci. These species are closely related (8a), and based on the genome sequences of two S. thermophilus strains, it has been suggested that S. thermophilus has evolved during the last 3,000 to 30,000 years, perhaps as part of human dairy activities, which began around 7,000 years ago. Several of the acquired genes found in S. thermophilus appear to originate from other dairy species, such as Lactococcus lactis and Lactobacillus delbrueckii, and thus contribute to its adaptation to the milk environment (7).

Estimation of chromosome size.

The CGH microarrays can be used to determine an approximate chromosome size for the strains. First, the starting position of each gene on the chromosome can be used to estimate the approximate size of each gene. When the gene size is calculated from the start position of one gene to the start position of the next gene, the size includes intergenic/noncoding regions associated with the gene. For genes that do not have an identified position on the main genome sequences, we estimated the size based on the average number of base pairs per S. thermophilus gene. From the information presented by Hols et al. (12), we calculated that S. thermophilus has 944 bp/gene. Furthermore, when the microarrays were designed, all tRNA and rRNA genes were excluded, except for one 23S and one 16S rRNA gene. Bolotin et al. (4) identified 67 tRNAs and 6 rRNA operons in the genomes of CNRZ1066 and LMG18311. To compensate for this, we added 14.2 kb to the values calculated from the microarrays, as tRNA genes have an average size of about 80 bp and rRNA genes have an average size of 2.2 kb (1.5 kb for 16S rRNA and 2.9 kb for 23S rRNA). Furthermore, if plasmid DNA is present in the strains, its size has to be excluded from the size calculations.
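The size calculation can be sketched in Python as follows. The constants 944 bp/gene and 14.2 kb are taken from the text; the function name, the toy positions, and the treatment of the last gene (its span is taken to run to the end of the reference sequence) are assumptions of this sketch rather than details given in the text.

```python
AVG_BP_PER_GENE = 944        # average bp per gene, from the text
RNA_CORRECTION_BP = 14_200   # correction for excluded tRNA and rRNA genes, from the text


def estimate_chromosome_bp(detected_genes, start_positions, reference_length):
    """Estimate chromosome size from the genes detected on the array.

    detected_genes:   iterable of gene identifiers called present.
    start_positions:  dict gene -> start coordinate on the reference chromosome
                      (only for genes with a known position).
    reference_length: length of the reference chromosome in bp (assumption:
                      the last gene's span ends here).
    """
    ordered = sorted(start_positions, key=start_positions.get)
    span = {}
    for i, gene in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = start_positions[ordered[i + 1]]
        else:
            next_start = reference_length
        # Start-to-start distance, so the intergenic region is included.
        span[gene] = next_start - start_positions[gene]
    total = sum(span.get(gene, AVG_BP_PER_GENE) for gene in detected_genes)
    return total + RNA_CORRECTION_BP


# Toy example: three positioned genes on a 3,400-bp "chromosome"; x1 has no position.
positions = {"g1": 0, "g2": 1000, "g3": 2500}
size = estimate_chromosome_bp({"g1", "g3", "x1"}, positions, reference_length=3400)
print(size)  # 17044
```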

To get an estimate of how well the method determines chromosome sizes, the values obtained from the CGH data for the three sequenced strains were compared to the published sizes (12, 14) (Table 1). It is evident that the error for LMD-9 is quite large, probably partly because the oligonucleotides were designed before the LMD-9 genome sequence was completed.

TABLE 1.
Comparison of genome sizes determined with microarrays and sequencing

Using only the data for LMG18311 and CNRZ1066, we estimate that our CGH-based method underestimates chromosome sizes by about 8 kb (~0.4%). The largest chromosome is estimated to be 1,814 kb (strain S39-50), and the smallest is 1,696 kb (strain S39-63). The difference in size, 118 kb, corresponds to roughly 119 genes of average S. thermophilus size; looking at the actual numbers from the arrays, there is a difference of 135 genes. The size estimates were confirmed using pulsed-field gel electrophoresis (PFGE) with four different restriction enzymes on gels including the three reference strains (data not shown).

The chromosome sizes estimated by CGH exclude genes that are present in S. thermophilus but are not present on the array platform. By comparing the chromosome sizes estimated by CGH with chromosome sizes estimated by PFGE, we verified that S39-50 indeed has one of the largest S. thermophilus chromosomes investigated in the current work and that S39-63 contains one of the smallest chromosomes (1,720 to 1,735 kb) (data not shown). S39-50 has a PFGE fingerprint almost identical to that of LMD-9.

S. thermophilus S39-52 is closely related to S39-50, with very similar PFGE fingerprints and phage resistance, but it is lysogenic (data not shown). The PFGE data suggest that S39-52 contains 40 kb of additional DNA. CGH with strain S39-52 furthermore gives signals from 13 phage genes that are not detectable in S39-50. Overall, this indicates that the extra DNA in S39-52 is prophage DNA.

By comparing closely related strains, using the CGH data and PFGE with four different restriction enzymes, we have estimated the amount of unknown DNA in the 47 strains to be between 0 and 50 kb. Tettelin et al. (24) proposed a model to predict the average number of new genes that will be identified by sequencing additional Streptococcus agalactiae and Streptococcus pyogenes strains. The model predicted 33 new genes for each additional S. agalactiae strain sequenced and 27 new genes for each additional S. pyogenes strain sequenced.

The core genome of 1,271 genes is theoretically the smallest number of genes with which an S. thermophilus strain can survive. Of the 47 strains investigated, the smallest genome contains 1,745 genes, or 474 more than the core genome. This indicates that S. thermophilus strains harbor a considerable number of noncore genes, but the exact nature of these is variable.

This has led us to propose a conditional core genome. For instance, S. thermophilus LMD-9 can synthesize histidine, whereas LMG18311 and CNRZ1066 cannot (12). The latter two auxotrophic strains instead harbor systems to acquire histidine from the environment. It is not necessary to harbor both systems, and hence they are not in the core genome, but a strain must harbor one of them to survive. We propose that such types of genes belong to a conditional core genome. Many of the 474 noncore genes belonging to the smallest genome identified in this study may be such conditional core genome genes.
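The idea of a conditional core genome can be expressed as a simple check: for each essential function, a strain must carry at least one of several alternative gene sets. The Python sketch below illustrates this with the histidine example; the gene and function names are placeholders, not identifiers from the array.

```python
def covers_conditional_core(strain_genes, required_functions):
    """True if, for every function, at least one alternative gene set is fully present.

    required_functions: dict mapping a function name to a list of alternative gene sets.
    """
    for alternatives in required_functions.values():
        if not any(alternative <= strain_genes for alternative in alternatives):
            return False
    return True


# Histidine can be supplied either by biosynthesis or by uptake (placeholder gene names).
required = {
    "histidine supply": [frozenset({"his_biosynthesis"}), frozenset({"his_uptake"})],
}

prototroph = {"his_biosynthesis", "core_gene"}   # synthesizes histidine
auxotroph = {"his_uptake", "core_gene"}          # acquires histidine from the environment
neither = {"core_gene"}

print(covers_conditional_core(prototroph, required))  # True
print(covers_conditional_core(auxotroph, required))   # True
print(covers_conditional_core(neither, required))     # False
```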

Another interesting group of genes is the minimal industrial genome. The minimum genome is the core genome plus the conditional core genome, but that is not necessarily applicable in an industrial dairy context. Certain features are needed for strains to be useful in the dairy industry. The strains must (i) be able to grow in laboratory medium when they are first isolated, (ii) grow in production medium to high density, (iii) grow well in milk, (iv) be stable and maintain their activity when stored, and (v) be relatively resistant toward phages. Hence, the industrial minimum genome (presented here) is probably substantially larger than the theoretical minimum genome.

EPS genes.

The ability of S. thermophilus to produce EPS is important for the dairy industry, as it enhances the texture of fermented milk products such as yogurt. S. thermophilus EPS consists of heterosaccharide polymers of primarily galactose, glucose, and rhamnose monomers (8). EPS synthesis in S. thermophilus involves binding of sugar monomers to a lipid carrier, using amino sugars as precursors. This reaction is performed by a galactose-1-phosphate or glucose-1-phosphate transferase, and subsequent attachment of different monomers is performed by glycosyl transferases. In addition to this, enzymes for polymerization and transmembrane translocation are needed (9, 19).

More than 10 EPS clusters in S. thermophilus have been identified and sequenced, indicating a large degree of variability. It was suggested by Broadbent et al. (5) that the organization of structural and regulatory genes within the EPS clusters is modular. All EPS clusters identified so far contain the deoD-epsABCD genes at the 5′ end (see reference 5 and references therein). We found that most strains analyzed in this study do indeed contain the deoD-epsABCD genes. Seven of the strains are missing epsE, encoding a galactose-1-phosphate or glucose-1-phosphate transferase, and two strains are missing epsA, encoding a transcriptional regulator. These missing genes could be compensated for by other EPS genes with similar functions. The S. thermophilus microarray platform contains 118 putative genes involved in EPS synthesis. A few of these genes form clusters where the entire cluster is always either absent or present. The EPS clusters of the different strains seem to be almost “random” assemblies of different EPS genes. The relatedness of the different strains based on their EPS gene contents can be seen in Fig. 7.

FIG. 7.
Phylogenetic tree based on EPS gene content. The gray squares indicate the presence of various EPS genes.

In accordance with the work of Broadbent et al. (5), we speculate that the different EPS genes are harbored in a few physical locations in the chromosome (as in the case of the sequenced strains) and that diversification of the clusters proceeds via horizontal gene transfer and recombination of EPS genes.

Conclusion.

In the present work, we used CGH to identify 1,271 genes belonging to the core genome of the 47 investigated industrial S. thermophilus strains. Interestingly, only a few strains cluster phylogenetically, indicating that S. thermophilus evolves mainly via recombination with other S. thermophilus strains.

The microarrays were also used to estimate the sizes of the chromosomes. The largest chromosome found was about 1,814 kb. In contrast, the smallest chromosome among the 47 investigated strains was 118 kb smaller, containing approximately 135 fewer genes. The size estimates were confirmed by PFGE and could be used to reveal the presence of up to 50 kb of novel DNA in the investigated genomes. The genome size estimates indicate that even the smallest identified genomes are considerably larger than the minimal core genome. We propose that there is a conditional core genome consisting of the core genes plus a subset of genes drawn from a pool of genes encoding essential functions.

CGH provides a detailed picture of the evolution of S. thermophilus strains and might give clues to how the strains are constantly evolving. An understanding of the evolution of S. thermophilus might be used in the search for new industrial strains or the development of new derivatives with improved technological functions.

Acknowledgments

We thank Henrik Bjørn Nielsen (CBS, DTU) for adding S. thermophilus LMG18311 and CNRZ1066 to the homology database of OligoWiz. We also thank Karen Fuglede Appel and Helle Schack Andersen for excellent technical work.

Footnotes

Published ahead of print on 6 June 2008.

Supplemental material for this article may be found at http://aem.asm.org/.

REFERENCES

1. Altschul, S., T. Madden, A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
2. Reference deleted.
3. Reference deleted.
4. Bolotin, A., B. Quinquis, P. Renault, A. Sorokin, S. D. Ehrlich, S. Kulakauskas, A. Lapidus, E. Goltsman, M. Mazur, G. Pusch, M. Fonstein, R. Overbeek, N. Kyprides, B. Purnelle, D. Prozzi, K. Ngui, D. Masuy, F. Hancy, S. Burteau, M. Boutry, J. Delcour, A. Goffeau, and P. Hols. 2004. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat. Biotechnol. 22:1554-1558.
4a. Bourgoin, F., A. Pluvinet, B. Gintz, B. Decaris, and G. Guédon. 1999. Are horizontal transfers involved in the evolution of the Streptococcus thermophilus exopolysaccharide synthesis loci? Gene 233:151-161.
5. Broadbent, J., D. McMahon, D. Welker, C. Oberg, and S. Moineau. 2003. Biochemistry, genetics, and applications of exopolysaccharide production in Streptococcus thermophilus: a review. J. Dairy Sci. 86:407-423.
6. Delorme, C., C. Poyart, S. D. Ehrlich, and P. Renault. 2007. Extent of horizontal gene transfer in evolution of streptococci of the salivarius group. J. Bacteriol. 189:1330-1341.
7. Delorme, C. 22 August 2007, posting date. Safety assessment of dairy organisms: Streptococcus thermophilus. Int. J. Food Microbiol. doi:10.1016/j.ijfoodmicro.2007.08.014.
8. Faber, E., P. Zoon, J. Kamerling, and J. Vliegenthart. 1998. The exopolysaccharides produced by Streptococcus thermophilus Rs and Sts have the same repeating unit but differ in viscosity of their milk cultures. Carbohydr. Res. 310:269-276.
8a. Facklam, R. 2002. What happened to the streptococci: overview of taxonomic and nomenclature changes. Clin. Microbiol. Rev. 15:613-630.
9. García, E., and R. López. 1997. Molecular biology of the capsular genes of Streptococcus pneumoniae. FEMS Microbiol. Lett. 149:1-10.
10. Garrigues, C., B. Stuer-Lauridsen, and E. Johansen. 2005. Characterisation of Bifidobacterium animalis subsp. lactis BB-12 and other probiotic bacteria using genomics, transcriptomics and proteomics. Aust. J. Dairy Technol. 60:84-92.
11. Grappin, R., T. C. Rank, and N. F. Olson. 1985. Primary proteolysis of cheese proteins during ripening. A review. J. Dairy Sci. 68:531-540.
12. Hols, P., F. Hancy, L. Fontaine, B. Grossiord, D. Prozzi, N. Leblond-Bourget, B. Decaris, A. Bolotin, C. Delorme, S. D. Ehrlich, E. Guédon, V. Monnet, P. Renault, and M. Kleerebezem. 2005. New insights in the molecular biology and physiology of Streptococcus thermophilus revealed by comparative genomics. FEMS Microbiol. Rev. 29:435-463. [PubMed]
13. Lefébure, T., and M. J. Stanhope. 2007. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 8:R71. [PMC free article] [PubMed]
14. Makarova, K., A. Slesarev, Y. Wolf, A. Sorokin, B. Mirkin, E. Koonin, A. Pavlov, N. Pavlova, V. Karamychev, N. Polouchine, V. Shakhova, I. Grigoriev, Y. Lou, D. Rohksar, S. Lucas, K. Huang, D. Goodstein, T. Hawkins, V. Plengvidhya, D. Welker, J. Hughes, Y. Goh, A. Benson, K. Baldwin, J. H. Lee, I. Díaz-Muñiz, B. Dosti, V. Smeianov, W. Wechter, R. Barabote, G. Lorca, E. Altermann, R. Barrangou, B. Ganesan, Y. Xie, H. Rawsthorne, D. Tamir, C. Parker, F. Breidt, J. Broadbent, R. Hutkins, D. O'Sullivan, J. Steele, G. Unlu, M. Saier, T. Klaenhammer, P. Richardson, S. Kozyavkin, B. Weimer, and D. Mills. 2006. Comparative genomics of the lactic acid bacteria. Proc. Natl. Acad. Sci. USA 103:15611-15616. [PMC free article] [PubMed]
15. Reference deleted.
16. Reference deleted.
17. Nielsen, H. B., R. Wernersson, and S. Knudsen. 2003. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucleic Acids Res. 31:3491-3496. [PMC free article] [PubMed]
18. Pedersen, M. B., S. L. Iversen, K. I. Sorensen, and E. Johansen. 2005. The long and winding road from the research laboratory to industrial applications of lactic acid bacteria. FEMS Microbiol. Rev. 29:611-624. [PubMed]
19. Roberts, I. S. 1996. The biochemistry and genetics of capsular polysaccharide production in bacteria. Annu. Rev. Microbiol. 50:285-315. [PubMed]
20. Shahbal, S., D. Hemme, and M. Desmazeaud. 1991. High cell wall-associated proteinase activity of some Streptococcus thermophilus strains (H-strains) correlated with a high acidification rate in milk. Lait 71:351-357.
21. Stanley, E., G. F. Fitzgerald, C. Le Marrec, B. Fayard, and D. van Sinderen. 1997. Sequence analysis and characterization of φO1205, a temperate bacteriophage infecting Streptococcus thermophilus CNRZ1205. Microbiology 143:3417-3429. [PubMed]
22. Reference deleted.
23. Tamime, A. Y., and H. C. Deeth. 1980. Yoghurt: technology and biochemistry. J. Food Prot. 43:939-977.
24. Tettelin, H., V. Masignani, M. J. Cieslewicz, C. Donati, D. Medini, N. L. Ward, S. V. Angiuoli, J. Crabtree, A. L. Jones, A. S. Durkin, R. T. Deboy, T. M. Davidsen, M. Mora, M. Scarselli, I. Ros, J. D. Peterson, C. R. Hauser, J. P. Sundaram, W. C. Nelson, R. Madupu, L. M. Brinkac, R. J. Dodson, M. J. Rosovitz, S. A. Sullivan, S. C. Daugherty, D. H. Haft, J. Selengut, M. L. Gwinn, L. Zhou, N. Zafar, H. Khouri, D. Radune, G. Dimitrov, K. Watkins, K. J. O'Connor, S. Smith, T. R. Utterback, O. White, C. E. Rubens, G. Grandi, L. C. Madoff, D. L. Kasper, J. L. Telford, M. R. Wessels, R. Rappuoli, and C. M. Fraser. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome.” Proc. Natl. Acad. Sci. USA 102:13950-13955. [PMC free article] [PubMed]
25. Reference deleted.
26. Tremblay, D. M., and S. Moineau. 1999. Complete genomic sequence of the lytic bacteriophage DT1 of Streptococcus thermophilus. Virology 255:63-76. [PubMed]

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)