Copyright © 2001 Le Flèche et al, licensee BioMed Central Ltd.

A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis

1Centre d'Etudes du Bouchet, BP3, 91710 Vert le Petit, France
2Génomes et Minisatellites, Institut de Génétique et Microbiologie, Bat 400, Université Paris XI, 91405 Orsay cedex, France
3Department of Biomathematical Sciences, Box 1023, Mount Sinai School of Medicine, One Gustave L. Levy Place, New York, USA

Corresponding author.
Philippe Le Flèche: lefleche/at/igmors.u-psud.fr; Yolande Hauck: Yolande.Hauck/at/igmors.u-psud.fr; Lucie Onteniente: Lucie.Onteniente/at/igmors.u-psud.fr; Agnès Prieur: Agnes.Prieur/at/igmors.u-psud.fr; France Denoeud: France.Denoeud/at/igmors.u-psud.fr; Vincent Ramisse: Vincent.Ramisse/at/ceb.etca.fr; Patricia Sylvestre: psylvest/at/pasteur.fr; Gary Benson: benson/at/ecology.biomath.mssm.edu; Françoise Ramisse: f.ramisse/at/freesurf.fr; Gilles Vergnaud: Gilles.Vergnaud/at/igmors.u-psud.fr

Received February 19, 2001; Accepted March 30, 2001.

Abstract

Background

Some pathogenic bacteria are genetically very homogeneous, making strain discrimination difficult. In the last few years, tandem repeats have been increasingly recognized as markers of choice for genotyping a number of pathogens. The rapid evolution of these structures appears to contribute to the phenotypic flexibility of pathogens. The availability of whole-genome sequences has opened the way to the systematic evaluation of tandem repeats diversity and application to epidemiological studies.

Results

This report presents a database (http://minisatellites.u-psud.fr) of tandem repeats from publicly available bacterial genomes which facilitates the identification and selection of tandem repeats. We illustrate the use of this database by the characterization of minisatellites from two important human pathogens, Yersinia pestis and Bacillus anthracis.
In order to avoid simple sequence contingency loci which may be of limited value as epidemiological markers, and to provide genotyping tools amenable to ordinary agarose gel electrophoresis, only tandem repeats with repeat units at least 9 bp long were evaluated. Yersinia pestis contains 64 such minisatellites in which the unit is repeated at least 7 times. An additional collection of 12 loci with at least 6 units, and a high internal conservation were also evaluated. Forty-nine are polymorphic among five Yersinia strains (twenty-five among three Y. pestis strains). Bacillus anthracis contains 30 comparable structures in which the unit is repeated at least 10 times. Half of these tandem repeats show polymorphism among the strains tested.

Conclusions

Analysis of the currently available bacterial genome sequences classifies Bacillus anthracis and Yersinia pestis as having an average (approximately 30 per Mb) density of tandem repeat arrays longer than 100 bp when compared to the other bacterial genomes analysed to date. In both cases, testing a fraction of these sequences for polymorphism was sufficient to quickly develop a set of more than fifteen informative markers, some of which show a very high degree of polymorphism. In one instance, the polymorphism information content index reaches 0.82 with allele length covering a wide size range (600-1950 bp), and nine alleles resolved in the small number of independent Bacillus anthracis strains typed here.

Background

The polymorphism associated with tandem repeats has been instrumental in mammalian genetics for the construction of genetic maps and still is the basis of DNA fingerprinting in forensic applications. Tandem repeats are usually classified among satellites (spanning megabases of DNA, associated with heterochromatin), minisatellites (repeat units in the range 6-100 bp, spanning hundreds of base-pairs) and microsatellites (repeat units in the range 1-5 bp, spanning a few tens of nucleotides).
More recently, a number of studies have supported the notion that tandem repeats reminiscent of mini and microsatellites are likely to be a highly significant source of very informative markers for the identification of pathogenic bacteria, even when these pathogens are recently emerged, highly monomorphic species [1, 2, 3, 4, 5]. This probably reflects the important contribution of tandem repeats to the adaptation of the pathogen to its host. Tandem repeats appear to contribute to phenotypic variation in bacteria in at least two ways. Tandem repeats located within the regulatory region of a gene can constitute an on/off switch of gene expression at the transcriptional level [6, 7]. Similarly, tandem repeats within coding regions whose repeat unit length is not a multiple of three can induce a reversible premature end of translation when a mutation changes the number of repeats (reviewed in [8, 9, 10]). In other instances, the repeat unit length is a multiple of three, and the tandem repeat contributes to a coding region. In such cases, variations in the number of copies modify the gene product itself [11].

Mutation mechanisms of micro and minisatellites have been studied in some detail in eukaryotes, essentially human and yeast (reviewed in [12]). In brief, the data obtained so far suggest that microsatellites mutate by replication slippage processes; mutation rates depend upon the efficiency of mismatch repair mechanisms, and an internal heterogeneity within the array strongly stabilizes the tandem repeat. In contrast, minisatellites mutate predominantly as the result of the repair of a double-strand break initiated within, or very close to, the tandem repeat. In eukaryotes at least, these events can be of replicative origin [13], or can be genetically controlled, and specifically induced, during meiosis, at double-strand break hot-spots.
Minisatellite mutation rate in eukaryotes appears to be insensitive to mismatch repair efficiency, and internal heterogeneity is compatible with a high mutation rate [12, 14].

In bacteria, loci containing a tandem repeat from the microsatellite class (repeat unit sizes of 1-8 bp) have been called simple sequence contingency loci [8]. An altered number of repeats allows for reversible on and off states of expression for the corresponding gene. The mutation rate of a tetranucleotide (microsatellite) tract in Haemophilus influenzae is higher than 10⁻⁴ and contributes to the adaptation of the pathogen to its hosts as the infection progresses [15]. In such an extreme situation, the microsatellite is of limited value for strain identification, epidemiological and phylogenetic studies. The tandem repeat array is composed of perfect copies of the elementary unit, and different alleles are observed in a single culture. In contrast, the phylogenetic identity of minisatellite alleles of identical size can usually be further checked by DNA sequencing, since the repeated units are often not perfect [16]. The pattern of variants along the array provides an additional level of allele identification and phylogenetic information. In addition, tandem repeats with longer repeat unit length can be relatively easily typed in the size range of a few hundred base-pairs using ordinary horizontal gel electrophoresis.

In this report, we will first describe the use of a tandem repeats database for bacterial genomes (http://minisatellites.u-psud.fr) and briefly compare the general characteristics of tandem repeats in a number of bacterial genomes for which the sequence has been determined and made publicly available. We will then show how this tool can easily be applied to the rapid characterization of new highly polymorphic markers in two pathogens, Y. pestis and B. anthracis. Both Y. pestis (causative agent of plague) and B. anthracis (causative agent of anthrax) are recently emerged clones of Y. pseudotuberculosis [17] and B. cereus [18], respectively. In the case of Y. pestis, a high resolution typing tool based on RFLP (Restriction Fragment Length Polymorphism) analysis of IS100 locations has already been developed [17]. However, this technology is more demanding than PCR typing, which justifies the development of a PCR-based assay. In the case of B. anthracis, polymorphisms were initially identified essentially using AFLP (Amplified Fragment Length Polymorphism) typing [19]. Subsequent analyses demonstrated that the most informative fragments in AFLP patterns resulted from tandem repeat array length variations (five minisatellite loci were characterized in this way [2]).

Results and discussion

Use of the tandem repeats database

To date, 36 bacterial genome sequences from 32 species have been released in the public domain and are included in the database (Figure 1A).
As a quick illustration of the use of this database to facilitate the development of genotyping tools for bacterial genomes, we have evaluated the polymorphism associated with tandem repeats from Y. pestis on one hand and B. anthracis on the other (in this second instance, the genome sequence has not been completed yet and does not appear on the publicly accessible Tandem Repeats Database page, Figure 1A).

Application to Y. pestis

(Figure 3A)

Application to B. anthracis

Given the relatively low overall size of most bacterial tandem repeats, tandem repeat search can be run even on unfinished sequences. Tandem Repeats Finder was applied to B. anthracis sequence obtained from The Institute for Genomic Research through the website at http://www.tigr.org. The sequence was recovered as approximately 1000 contigs, for a total amount of slightly more than 5 Mb. Thirty tandem repeats have at least 10 copies of a repeat unit longer than 9 base-pairs. Fourteen of them are polymorphic among the 31 B. anthracis strains typed here (Table 2). Twenty-seven different genotypes are identified. Polymorphism information content (PIC) indexes based on the 27 genotypes vary from 0.07 to 0.82. Nine PIC values are above 0.5. Eight alleles are identified for CEB-Bams30, in a size range 270-900 base-pairs (Figure 5).
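
The PIC (or Nei's diversity) index quoted above is defined in Materials and methods as 1 - Σ (allele frequency)², computed over the unique genotypes. The following is a minimal sketch of that calculation, not the authors' own tooling; the function name `pic_index` and the allele sizes are hypothetical and are not data from this study.

```python
from collections import Counter


def pic_index(alleles):
    """Return 1 - sum(p_i^2), where p_i is the frequency of allele i.

    `alleles` holds one allele call per unique genotype at a single locus.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele sizes (bp) at one locus, one entry per unique genotype.
example_alleles = [270, 270, 360, 360, 360, 450, 900, 900]
print(round(pic_index(example_alleles), 3))
```

A locus with a single allele gives an index of 0, and the index approaches 1 as more alleles occur at similar frequencies.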

Correlations between polymorphism and structural characteristics of minisatellites

We have looked for correlations between on one hand the number of alleles and polymorphism of the minisatellites, and on the other, simple structural characteristics of the tandem repeats in the sequenced strain: motif size, number of motifs, total length, conservation of the motifs along the array (percent identity), GC content, strand bias. In the case of B. anthracis, a highly significant correlation (0.01 level) is observed between polymorphism and both total length and GC content. This is not true for Y. pestis in which a strong correlation is seen between the number of alleles and the conservation of the motifs (Figure 7).
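
According to Materials and methods, these correlations were computed in SPSS as Pearson and non-parametric (Kendall, Spearman) coefficients. As an illustration only, the sketch below computes the Pearson and Spearman coefficients in plain Python; the helper names and the example values are hypothetical, and no significance test is included.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values: array total length (bp) and number of alleles per locus.
total_length = [110, 150, 200, 320, 480]
n_alleles = [1, 2, 2, 4, 6]
print(pearson(total_length, n_alleles), spearman(total_length, n_alleles))
```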

Conclusions

We limited our investigation of tandem repeats here to minisatellites, i.e. repeat units longer than 9 base-pairs, so as to avoid simple sequence contingency loci [8] of limited epidemiological value, and to facilitate the typing of alleles with agarose gel electrophoresis. However, simple sequence contingency loci are also represented in the database and are of great interest for molecular pathogenicity studies [6, 7, 8]. The use of the tandem repeats database was demonstrated here on two of the most genetically homogeneous human pathogens, Y. pestis and B. anthracis. There is consequently a possibility that a common database format for identification and epidemiological analyses of pathogens amenable to minisatellite typing be developed. As more data becomes available on polymorphism associated with tandem repeats, it will be added to the database presented here in order to avoid duplication of work and nomenclature.

Bacterial species differ very significantly in the density of tandem repeats within their genome, and also in their use of tandem repeats. Some species have a very strong excess of tandem repeats with repeat unit lengths that are multiples of three, the most striking examples being M. tuberculosis and P. aeruginosa. Polymorphism in such tandem repeats is likely to modulate the protein structure rather than gene activity. In M. tuberculosis, all tandem repeats with a total length (L) greater than 100 bp and units 9 or 15 base-pairs long are located within ORFs [21]. An important proportion of these tandem repeats correspond to the so-called PE and PPE multigene families [21].

In the two species studied here, tandem repeat polymorphism is strongly correlated with one or more of the sequenced allele characteristics, as illustrated in Figure 7. Five of the B. anthracis markers described here (Ceb-Bams1, 3, 7, 13 and 30) are highly polymorphic, with PIC values (or Nei's index) above 0.7.
In this respect, it is important to observe that the length of the allele observed for Ceb-Bams1 in the Ames strain is not of the size expected from the sequence data (Table 2). This may result either from a high mutation rate at Ceb-Bams1 or from a sequencing error. The expected allele size corresponds to allele 4 (Table 3), which is unlikely for the Ames strain because Ceb-Bams1 allele 4 is observed only in cluster B strains (Figure 6). It is interesting to observe that, although the magnitude of allele size difference has not been taken into account when building the distance matrix, the resulting phylogenetic tree proposed in Figure 6

Materials and methods

Bacterial genomes DNA sequences

Finished sequences in the public domain were recovered by ftp from the NCBI or the Sanger Centre sites (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/bact.html; http://www.sanger.ac.uk/Projects/Microbes/). Preliminary sequence data for B. anthracis was obtained from The Institute for Genomic Research through the website at http://www.tigr.org.

DNA preparation

All strains used here are part of the collection maintained by the Centre d'Etudes du Bouchet (CEB). They originate either from the CIP (Collection Institut Pasteur, http://www.pasteur.fr/) or from AFSSA (Agence Française de Sécurité Sanitaire des Aliments, http://www.afssa.fr/, Dr Josée Vaissaire). DNA from each isolate was obtained by large-batch procedures or by the simplified procedure as described in [2]. In addition, 15 μg of DNA from the B. anthracis Ames strain were kindly provided by Dr Mats Forsman, FOA, Sweden.

Minisatellite PCR amplification and genotyping

PCR reactions were performed in 15 μl containing 1 ng of DNA, 1x Long Range Reaction Buffer 3 (Roche-Boehringer), 1 unit of Taq DNA polymerase, 200 μM of each dNTP, 0.3 μM of each flanking primer. The Taq DNA polymerase was either prepared essentially as described in [22] or purchased from Qbiogen or Roche-Boehringer.
The 1x LongRange Buffer 3 is 1.75 mM MgCl2, 50 mM Tris-HCl pH 9.2, 16 mM (NH4)2SO4. PCR reactions were run on a Perkin-Elmer 9600 or an MJResearch PTC200 thermocycler. An initial denaturation at 96°C for five minutes was followed by 34 cycles of denaturation at 96°C for 20 seconds, annealing at 60°C for 30 seconds, elongation at 65°C for 1 minute, followed by a final extension step of 5 minutes at 65°C. In a few cases, other annealing temperatures and/or elongation times were used (see Tables 1 and 2). Five microliters of the PCR products were run on standard 1% or 2% agarose gel (Qbiogen) in 0.5x TBE buffer at a voltage of 10 V/cm as indicated in Tables 1 and 2. Gel lengths of 10 to 40 cm were used according to PCR product size and motif length. Gels were stained with ethidium bromide and visualized under UV light. Allele sizes were estimated using as size markers the 1 kb ladder plus (Gibco-BRL), which also includes a 100 bp ladder between 100 bp and 500 bp, plus 650, 850 and 1000 bp bands, or the 50 bp ladder (Euromedex), which provides a 50 bp ladder between 50 and 300 bp and a 100 bp ladder from 300 bp to 1000 bp.

Data analysis

Tandem Repeats Finder analysis: Sequences were processed using the Tandem Repeats Finder software (http://c3.biomath.mssm.edu/trf.html). The output was processed to eliminate duplicates before being imported into a database (running under Access2000, Microsoft Corp.) as described previously [12]. The B. anthracis preliminary sequence data file uses FASTA-type headers (i.e. >sequenceId) to separate the independent contigs. The headers were replaced by runs of 10 Ns before running Tandem Repeats Finder.

Blast queries against the M. tuberculosis genome: The open reading frames containing a given tandem repeat from M. tuberculosis were identified by running a BLAST search on the dedicated web page at http://www.sanger.ac.uk/Projects/M_tuberculosis/blast_server.shtml.
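
The contig-joining step described under Tandem Repeats Finder analysis (FASTA headers replaced by runs of 10 Ns) can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name `join_contigs`, the single output header `>joined_contigs`, and the choice to place the spacer only between contigs are all assumptions.

```python
def join_contigs(fasta_text, spacer="N" * 10, header=">joined_contigs"):
    """Concatenate all contigs of a multi-FASTA string into one record.

    Each internal header line is dropped and the contigs are joined with
    `spacer` (10 Ns by default). One header is kept for the joined record;
    that header and its name are assumptions of this sketch.
    """
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the contig collected so far.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))
    return header + "\n" + spacer.join(contigs) + "\n"


# Hypothetical two-contig input.
example = ">contig1\nACGTAC\nGGA\n>contig2\nTTGACA\n"
print(join_contigs(example))
```

The N spacer presumably keeps a repeat search from reporting an array that spans the junction between two unrelated contigs.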

Estimation of the excess of tandem repeats with motif length multiple of three: A χ² test was calculated for the difference between the observed number of tandem repeats with motif length multiple of 3 and the expected number of tandem repeats with motif length multiple of 3 (the expected value in the absence of bias being the total number of tandem repeats divided by 3). The χ² values vary from 0.01 to 253.5. There is a significant excess (χ² > 3.841) for all species but 6 (Buchnera sp, T. maritima, H. influenzae, M. genitalium, R. prowazekii, Y. pestis).

Polymorphism index: The Polymorphism Information Content (PIC) index, or Nei's diversity index, is calculated as 1 - Σ (allele frequency)² based upon the unique genotypes.

Phylogenetic reconstruction: A phenetic approach, based on a distance matrix, was used. The distance matrix between strains was obtained by counting the number of differences between the corresponding genotypes. Then, Neighbor Joining cluster analysis was performed with Phylip [23] accessed at http://www.infobiogen.fr/. An outgroup was arbitrarily chosen among the Bacillus cereus strains (9785), and the input order of species was randomised. Data (genotypes, distance matrix, phylogenetic tree) are available at http://minisatellites.u-psud.fr/ASPSamp/Phylogenie/data.htm

Correlation analysis

Correlations were calculated with the statistical program SPSS: Pearson correlation and non-parametric correlations (Kendall's tau and Spearman's rho) show similar results.

Acknowledgements

Minisatellite investigations in the laboratory are supported by grants from Délégation Générale de l'Armement (DGA/DSA/STTC and DGA/DSA/SPNuc). Preliminary sequence data for B. anthracis was obtained from The Institute for Genomic Research through the website at http://www.tigr.org. Sequencing of B. anthracis was accomplished with support from the Office of Naval Research, the Department of Energy, and the National Institute of Allergy and Infectious Diseases. The Yersinia pestis sequence data were produced by the Y. pestis Sequencing Group at the Sanger Centre and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/yp/. We wish to thank the referees for the significant improvements they have suggested.

References