J Clin Microbiol. Jan 2007; 45(1): 39–46.
Published online Nov 1, 2006. doi:  10.1128/JCM.02483-05
PMCID: PMC1828963

Role of Large Sequence Polymorphisms (LSPs) in Generating Genomic Diversity among Clinical Isolates of Mycobacterium tuberculosis and the Utility of LSPs in Phylogenetic Analysis


Mycobacterium tuberculosis strains contain different genomic insertions or deletions called large sequence polymorphisms (LSPs). Distinguishing between LSPs that occur one time and those that occur repeatedly in a genomic region may provide insights into the biological roles of LSPs and identify useful phylogenetic markers. We analyzed 163 clinical M. tuberculosis isolates for 17 LSPs identified in a genomic comparison of M. tuberculosis strains H37Rv and CDC1551. LSPs were mapped onto a single-nucleotide polymorphism (SNP)-based phylogenetic tree created using nine novel SNP markers that were found to reproduce a 212-SNP-based phylogeny. Four LSPs (group A) mapped to a single SNP tree segment. Two LSPs (group B) and 11 LSPs (group C) were inferred to have arisen independently in the same genomic region either two or more than two times, respectively. None of the group A LSPs but one group B LSP and five group C LSPs were flanked by IS6110 sequences in the reference strains. Genes encoding members of the proline-glutamic acid or proline-proline-glutamic acid protein families were present only in group B or C LSPs. SNP- versus LSP-based phylogenies were also compared. We classified each isolate into 58 LSP types by using a separate LSP-based phylogenetic analysis and mapped the LSP types onto the SNP tree. LSPs often assigned isolates to the correct phylogenetic lineage; however, significant mistakes occurred for 6/58 (10%) of the LSP types. In conclusion, most LSPs occur in genomic regions that are prone to repeated insertion/deletion events and were responsible for an unexpectedly high degree of genomic variation in clinical M. tuberculosis. Group B and C LSPs may represent polymorphisms that occur due to selective pressure and affect the phenotype of the organism, while group A LSPs are preferable phylogenetic markers.

As pathogenic bacteria adapt to their host environments, virulence properties may change through the insertion or deletion (indel) of chromosomal regions and the gain or loss of genetic material (3, 4, 10, 11, 19, 25, 27, 29, 34). Mycobacterium tuberculosis is a major pathogen of humans, and genomic deletions (also known as large sequence polymorphisms [LSPs] or regions of difference) can also be detected in most clinical isolates of this species (16, 23, 32). Studies examining the biological role of LSPs in M. tuberculosis have been inconclusive (26). LSPs were demonstrated to be unique event polymorphisms (UEPs) in a study of 100 clinical M. tuberculosis isolates (23). It was also possible to perform an informative analysis of a large sample of clinical M. tuberculosis isolates by using these LSPs as phylogenetic markers (17). UEP refers to a mutation that has occurred once in the phylogeny of a species (i.e., is “unique”), is irreversible, and does not display homoplasy (23). The observation that most LSPs were UEPs suggested that LSPs were unlikely to have an important role in disease pathogenesis, because mutations that confer an evolutionary advantage to M. tuberculosis should be selected for repeatedly in the evolution of the species (2). Supporting this hypothesis, LSPs were found to have a possible attenuating effect on clinical disease in one retrospective study (26). A study of gene expression in 10 clinical M. tuberculosis isolates also demonstrated that LSPs predominately carried genes that were variably expressed or not expressed in broth cultures (18). These results suggested that LSPs do not generally involve functionally important proteins. Instead, a number of investigators have assumed that LSPs are selectively neutral and have used them as phylogenetic markers for population and evolutionary investigations (17, 23, 24).

Other studies have suggested that LSPs do have a critical role in M. tuberculosis pathogenesis. Clinical M. tuberculosis LSPs had low consistency indices when incorporated into a phylogeny composed of single-nucleotide polymorphisms (SNPs), LSPs, and clinical parameters (16). Furthermore, investigations of clinical strains have detected three apparent genomic “hot spots” for insertion of IS6110 and associated chromosomal deletions (13, 14, 24, 30). Genomic analysis indicates that LSPs almost always include segments of open reading frames (16, 23), although this may be due to a paucity of noncoding regions in the M. tuberculosis genome (7). Finally, Yang et al. (33) recently showed that clinical M. tuberculosis isolates with deletions in the plcD gene (one of the known deletion hot spots) are indeed phenotypically different, exhibiting a twofold increased risk of causing extrapulmonary tuberculosis. Taken together, these results indicate that some LSPs have evolved repeatedly in the radiation of M. tuberculosis and suggest that LSP-associated indels provide a selective advantage to certain M. tuberculosis strains.

Unfortunately, the rates of indels underlying M. tuberculosis LSPs cannot be conveniently measured in the laboratory. This makes it difficult to differentiate experimentally between mutations that are UEPs and mutations that have a tendency to occur repeatedly. Phylogenetic analysis makes an alternative approach available. Clinical strains containing a particular indel can be mapped onto a phylogenetic tree and then examined to determine whether or not they can be traced to a single ancestral event. Indels that have arisen independently multiple times in the population may have significant biological roles. Mutations that appear to have a single origin are more likely to represent UEPs that are evolutionarily neutral (2). A variation of this approach was undertaken in prior LSP studies (16, 23). However, these previous studies also used the LSPs themselves as markers in the phylogenetic tree construction. Furthermore, the largest of these studies defined distinct LSPs quite strictly, choosing to analyze LSPs separately if they had different deletion sites, even if the deletions mapped to identical or overlapping genes. In investigating the biology of LSPs, we propose that it is more important to categorize LSPs according to the gene or genes that are deleted (or inserted) rather than by the exact location of the indel site. This is because the effect of an LSP on microbial phenotype is more likely to be due to the genes that are disrupted or otherwise affected by the LSP rather than the exact indel sites where the LSP occurred. Therefore, we have favored a less restricted definition of LSP that is based on the presence or absence of a gene region rather than on the presence or absence of a specific deletion.

In this report, we present a phylogenetic analysis of gene deletions found in M. tuberculosis LSPs, using an “unequivocal” phylogenetic tree constructed with synonymous SNP markers. We present phylogenetic evidence that many of the gene regions contained within LSPs have been deleted (or possibly inserted) multiple times as separate events in the history of M. tuberculosis divergence, and we identify several possible mechanisms for these genomic changes. Our results suggest that LSPs represent an important mechanism of genetic variation in M. tuberculosis and indicate that further investigations into the functional relevance of LSPs may provide insights into M. tuberculosis pathogenesis and immunity. LSPs, as defined in our study, that recur independently with high frequency may be precluded as phylogenetic markers.


Study population.

The study population has been described previously (16); it consisted of consecutive patients with positive cultures for M. tuberculosis identified at Montefiore Medical Center in the Bronx, NY, between 1989 and 1996. All isolates had been typed with IS6110-based restriction fragment length polymorphism (RFLP) analysis and with a secondary typing procedure if necessary (1). Of the 319 available cultures from that period, 169 of the samples plus the M. tuberculosis reference strains H37Rv and CDC1551 were selected at random for SNP and LSP analysis. Six clinical samples gave indeterminate SNP or LSP results, leaving 163 clinical isolates plus H37Rv and CDC1551 to be included in the present study. The demographic and clinical characteristics of this subset were similar to those of the overall study population and were generally reflective of the diverse nature of New York City residents (1). This subset included M. tuberculosis isolates from a broad range of ethnicities and patients from at least 19 different known countries of origin.

LSP identification.

Eighty-six LSPs larger than 10 base pairs were identified by comparing the genomes of M. tuberculosis strains CDC1551 and H37Rv in a previous investigation (16). Seventeen LSPs were further studied: LSPs 1 through 12 were selected from sequences that were present in CDC1551 but absent from H37Rv; LSPs 13 to 17 were selected from sequences that were present in H37Rv but absent from CDC1551. DNA probes were then prepared for one gene in each LSP by PCR (16). We limited our study to 17 LSPs because of the technical complexity of studying each LSP in large numbers of M. tuberculosis samples. The coordinates for each probe and primer were described in this previous work. Approximately 2 μg of genomic DNA from the clinical M. tuberculosis isolates or CDC1551 and H37Rv was suspended in 2× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) at a final volume of 200 μl. Each sample was boiled for 5 min and then cooled on ice. A multislot hybridization apparatus (Immunoblotter; Immunetics, MA) was assembled as per the manufacturer's recommendations with the modification that the cushion was replaced with five pieces of dry 3-mm Whatman paper underneath one piece of 1-mm Whatman paper soaked in 2× SSC. A prewetted Biotrans Plus nylon membrane (ICN Pharmaceuticals, CA) was placed on top of the thin Whatman paper. The apparatus was assembled, and the cooled genomic DNA was bound in longitudinal strips onto the membrane by rapidly loading the DNA mixture into the apparatus. Bubbles were avoided inside the apparatus by loading a slight excess volume of DNA solution. The apparatus was then disassembled, and the membrane was removed, rinsed in 2× SSC, and then cross-linked with UV light. For identification of the LSPs present in each DNA sample, the membrane was prehybridized for 1 hour in Rapid Hyb buffer (Amersham, CT) at 69°C in a hybridization oven. The still-wet membrane was then reinserted into the multislot hybridization apparatus at 90° from its previous orientation, using the manufacturer's cushion instead of Whatman paper (described above) to seal the apparatus. Each slot was then loaded with approximately 200 μl of boiled and then rapidly ice-cooled hybridization buffer containing γ-32P-labeled probes for the 17 LSPs. The openings of the apparatus were sealed with Parafilm, and the apparatus was incubated at 69°C with occasional gentle rocking for 2 h. The Parafilm was carefully removed, unhybridized probe was sucked out of each hybridization well using a vacuum attached to the wash device supplied by the manufacturer, and each slot was washed (again using the vacuum wash device) with 2× SSC. The apparatus was then disassembled; the membrane was washed one more time in 2× SSC and three times in 0.1× SSC at 69°C and then exposed on film. Using this protocol, 44 different genomic DNA samples could be slotted in an array consisting of 44 lines extending across the membrane. Hybridizing of probes for each LSP at a 90° angle to this array permitted every probe to come into contact with every genomic DNA sample. The presence of a particular LSP in a DNA sample was determined by examining the developed autoradiogram for dark spots. An example of an LSP blot has been shown previously (16).

SNP identification.

We had previously identified six SNP markers that were sufficient to classify a global M. tuberculosis collection into seven phylogenetically distinct “SNP cluster groups” (SCGs) (15). For the current study, we selected a different set of nine SNP markers that enabled us to further subdivide the SCGs into subgroups (SC subgroups), for a total of seven SCGs and five SC subgroups (Table 1). All of the study samples were then tested at the nine SNP loci by using hairpin primer assays as described previously (22) (Table 2), and the alleles were determined.

TABLE 1.
SNP set used to assign the SCGs and SC subgroups
TABLE 2.
Genome locations and hairpin assay primers used for the nine-SNP set

Phylogenetic analysis.

Each isolate was assigned to an SCG or SC subgroup according to the allele pattern at the nine SNP loci (Table 1) and plotted on a neighbor-joining phylogenetic tree previously created by analyzing a global M. tuberculosis collection using 212 SNP markers (15) (Fig. 1). The presence or absence of each LSP was scored as a binary character, and the isolates were also classified into 58 LSP types (LSP-Ts), each defined by a distinct pattern of present and absent LSPs.
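The LSP typing step can be illustrated with a minimal Python sketch, assuming each isolate is represented by a tuple of 0/1 flags (one per LSP). The function name `assign_lsp_types`, the toy isolates, and the integer labels are hypothetical; the labels are assigned in order of first appearance and do not correspond to the LSP-T numbers used in this study.

```python
from collections import defaultdict

def assign_lsp_types(profiles):
    """Group isolates that share an identical LSP presence/absence pattern.

    profiles: dict mapping isolate name -> tuple of 0/1 flags, one per LSP.
    Returns (type_of, members), where type_of maps isolate -> integer label
    and members maps integer label -> list of isolates with that pattern.
    """
    pattern_to_label = {}
    type_of = {}
    members = defaultdict(list)
    for isolate in sorted(profiles):
        pattern = tuple(profiles[isolate])
        if pattern not in pattern_to_label:
            # New pattern: give it the next unused label.
            pattern_to_label[pattern] = len(pattern_to_label) + 1
        label = pattern_to_label[pattern]
        type_of[isolate] = label
        members[label].append(isolate)
    return type_of, dict(members)

# Hypothetical toy data: four isolates scored for three LSPs.
toy_profiles = {
    "iso1": (1, 0, 1),
    "iso2": (1, 0, 1),
    "iso3": (0, 0, 1),
    "iso4": (1, 1, 1),
}
type_of, members = assign_lsp_types(toy_profiles)
print(members)
```

With 17 LSPs, up to 2^17 patterns are possible in principle; the 163 clinical isolates in this study fell into 58 of them.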

FIG. 1.
Phylogeny of the M. tuberculosis study isolates. M. tuberculosis isolates were assigned to each SCG or SC subgroup based on SNP alleles at nine loci. The SCG and SC subgroup designations had been defined in a previous work (15). The number of study strains ...


LSPs occur repeatedly in the M. tuberculosis genome.

In order to perform a phylogenetic analysis of the distributions of M. tuberculosis LSPs, it was first necessary to unambiguously establish the phylogeny of the 163 M. tuberculosis study isolates plus H37Rv and CDC1551. Each isolate was tested for the presence of nine SNP markers, and an SCG or SC subgroup was assigned to each isolate based on the pattern of its SNP alleles. The LSP and SNP alleles for each isolate are shown in Table S1 in the supplemental material. The typed isolates were then plotted onto a phylogenetic tree of M. tuberculosis established previously (15). The study set was found to include members of all SCGs except SCG 7 (which contains primarily Mycobacterium bovis) (Fig. 1). Each M. tuberculosis SCG/SC subgroup contained an average of 18 isolates (range, 0 to 43) and an average of 13 different strains (range, 0 to 38) as defined by the presence of distinct RFLP patterns.

We selected 17 M. tuberculosis LSPs from a larger set of previously identified LSPs (16) to study their distribution on the strain phylogeny (Table 3). The distribution of these LSPs has not been previously examined in a set of phylogenetically characterized clinical strains. Three LSPs (LSPs 10, 11, and 13) were located near two IS1547 elements, which are known to be “hot spots” for IS6110 insertions (17). Each M. tuberculosis isolate was examined for the presence or absence of each of the 17 LSPs by probing for an internal DNA sequence. All of the LSPs were then mapped onto the phylogenetic tree. We found that the majority of LSPs did not appear to be UEPs. Unlike the distribution of the selectively neutral SNPs shown in a previous report (2), only four of the 17 LSPs studied (LSPs 1, 9, 13, and 16) (Fig. 2A) were situated on the phylogenetic tree such that their presence could be explained by a single event in a common ancestor. We have called these LSPs group A LSPs in subsequent discussions. Two other LSPs (LSPs 12 and 14) appeared to have occurred independently at least two times (Fig. 2B). We have called these LSPs group B LSPs. The remaining 11 LSPs (LSPs 2, 3, 4, 5, 6, 7, 8, 10, 11, 15, and 17) were situated on the phylogenetic tree such that they could not have arisen from a single common ancestor and must have arisen independently multiple times (Fig. 3). We have called these LSPs group C LSPs.
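One way to make the group A/B/C assignment concrete is to count the minimum number of presence/absence changes that a tree requires for a given LSP. The sketch below uses Fitch parsimony on a rooted binary tree as a stand-in; the text does not state that this algorithm was used, and the tree encoding, the function names `fitch_changes` and `lsp_group`, and the toy data are assumptions made for illustration only.

```python
def fitch_changes(tree, state):
    """Minimum number of state changes on a rooted binary tree (Fitch).

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    state: dict mapping leaf name -> 0 or 1 (region absent or present).
    """
    def walk(node):
        if isinstance(node, str):
            return {state[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: one change is required on this part of the tree.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]

def lsp_group(n_events):
    """Map an inferred number of independent events to group A, B, or C."""
    if n_events < 1:
        raise ValueError("LSP is invariant in this sample")
    if n_events == 1:
        return "A"
    if n_events == 2:
        return "B"
    return "C"

# Hypothetical eight-isolate tree and three toy LSP distributions.
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
one_clade = dict(a=1, b=1, c=0, d=0, e=0, f=0, g=0, h=0)
two_clades = dict(a=1, b=1, c=0, d=0, e=1, f=1, g=0, h=0)
scattered = dict(a=1, b=0, c=1, d=0, e=1, f=0, g=0, h=0)

groups = [lsp_group(fitch_changes(toy_tree, s))
          for s in (one_clade, two_clades, scattered)]
print(groups)
```

A parsimony count is a lower bound on the number of events, so an LSP scored as group A by this sketch could still have a more complex history that the sampled isolates do not reveal.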

FIG. 2.
Distribution of group A and group B LSPs on the SNP tree. M. tuberculosis strains containing each designated LSP are indicated next to each tree branch. Numbers refer to the total number of strains with the indicated LSP/total number of isolates with ...
FIG. 3.
Distribution of group C LSPs on the SNP tree. M. tuberculosis strains containing group C LSPs in this study are shown. Numbers refer to the total number of strains with the indicated LSP/total number of isolates with the indicated LSP. Thick lines are ...
TABLE 3.
LSP groups and their attributes

The genes that corresponded to the probes for each LSP were then examined (Table 3). We examined all of the genes that were deleted in each LSP as it was originally defined by the CDC1551-H37Rv genomic comparisons, although some clinical strains may have smaller or larger LSPs in each region. None of the group A LSPs and only one of the group B LSPs were flanked by IS6110 elements in either CDC1551 or H37Rv, while 5 of the 11 group C LSPs were flanked by IS6110 elements. These results suggest that recombination between IS6110 elements is one of the mechanisms that generate LSPs that reoccur frequently. Indeed, we also noted that the IS6110 elements adjacent to the locations of four of the group C LSPs (LSPs 3, 4, 10, and 11) lack the characteristic 3- to 4-bp direct repeats, an absence that is indicative of recombination between IS6110 elements (16). This adds further support to the hypothesis that IS6110 is an important driving force for large sequence diversity in M. tuberculosis (5, 13, 24, 30, 31). Genes encoding members of the proline-glutamic acid, proline-glutamic acid polymorphic GC-rich sequence, or proline-proline-glutamic acid (PPE) protein families were not present in any of the group A LSPs, while one of the group B LSPs and three group C LSPs contained PPE genes. Recombination and deletion between these genes, which have substantial sequence similarity, might represent a second mechanism for LSP generation. However, the number of LSPs examined was too small to reasonably test for statistical differences in PPE gene frequency among the LSP groups. Despite these two proposed mechanisms for recurrent LSP generation, one of two group B LSPs and three of 11 group C LSPs were not associated with either flanking IS6110 sequences or repetitive genes.

The LSPs associated with IS6110 in the reference strains did not occur at a higher frequency than other LSPs. The phylogenetic analysis of each LSP (Fig. 2 and 3) suggested that there were 65 independent LSP events in the 165 M. tuberculosis isolates (Table 3) (this population contained many more LSPs, but a group of phylogenetically related isolates with the same LSP was considered to constitute one LSP event). Approximately one-third (6/17) of the LSPs studied were associated with IS6110, and these LSPs were associated with 27/65 (42%) of the independent LSP events in the population. This did not differ significantly from the approximately two-thirds (11/17) of the LSPs studied that were not associated with IS6110 in the reference strains. These LSPs accounted for at least 38/65 (58%) of the independent LSP events.
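The comparison above can be checked with simple arithmetic. The text does not say which statistical test was applied, so the following Python sketch uses a chi-square goodness-of-fit test purely as one plausible illustration: it asks whether the 27 and 38 events depart from what would be expected if events were spread across LSPs in proportion to the 6 and 11 LSPs in each category. The variable names are hypothetical; 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for observed versus expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total_events = 65
lsp_counts = (6, 11)   # IS6110-associated LSPs, non-associated LSPs (of 17)
observed = (27, 38)    # independent events attributed to each category

# Expected events if every LSP recurred at the same average rate.
expected = tuple(total_events * n / sum(lsp_counts) for n in lsp_counts)

chi2 = chi_square_gof(observed, expected)
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, df = 1
significant = chi2 > CRITICAL_5PCT_DF1

print(f"expected = {expected[0]:.1f}, {expected[1]:.1f}; chi2 = {chi2:.2f}")
```

Under these assumptions the statistic is about 1.11, well below the critical value, which is consistent with the statement that the two categories did not differ significantly.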

Twelve of the 17 LSPs in this study represent sequences that are absent in H37Rv but present in CDC1551 (although LSP 6 appears to be present in some H37Rv isolates and must therefore have been deleted recently in a subset of H37Rv isolates in experimental use [16]). Each of these LSPs was also found to be missing in at least one clinical isolate, demonstrating that the H37Rv LSPs did not include unique deletion events that might have occurred as a consequence of a prolonged in vitro culture.

Confirmation of LSP identification and variability within IS6110-defined clusters.

It was important to ensure that the results of this study were not due to artifacts of the LSP identification process. Inconsistencies in detecting LSPs could make it falsely appear as if LSPs were occurring repeatedly as independent events. Repeated probing of the same strain gave identical LSP results, suggesting that the LSP identification process was sound. We also examined strains that were identical by IS6110 RFLP analysis to determine if these closely related strains contained the same LSPs. We found only six instances, in 17 clusters involving 66 isolates, where two isolates within a cluster did not have exactly the same LSP pattern. In each of these cases, only 1 of the 17 LSPs was discordant between the isolates. Furthermore, all six of the mismatched LSPs were group C LSPs (four were LSP 6, one was LSP 2, and one was LSP 10). These results suggest that the small variation in LSP patterns that we observed within isolates of a cluster is due to the propensity of M. tuberculosis to develop independent deletions in these regions. The exact time frame of LSP generation cannot be deduced from this study because the epidemiological connections among the clustered isolates were not well characterized in our data set. Prior reports suggest that differences in LSP patterns are not observed among RFLP-identical isolates with known epidemiological links (16). However, these results do strongly suggest that different LSPs are generated at different rates.
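The within-cluster comparison can be sketched as a pairwise check of LSP profiles among isolates that share an RFLP pattern. This is a minimal Python illustration, assuming hypothetical cluster names, isolate names, and profiles; it reports discordant pairs, which is not necessarily the same unit as the "instances" counted above.

```python
from itertools import combinations

def discordant_pairs(clusters, profiles):
    """List isolate pairs within a cluster whose LSP profiles differ.

    clusters: dict mapping cluster id -> list of isolate names.
    profiles: dict mapping isolate name -> tuple of 0/1 flags, one per LSP.
    Returns a list of (cluster id, isolate a, isolate b, differing LSPs),
    where differing LSPs are 1-based positions in the profile.
    """
    results = []
    for cluster_id in sorted(clusters):
        for a, b in combinations(sorted(clusters[cluster_id]), 2):
            differing = [
                position + 1
                for position, (x, y) in enumerate(zip(profiles[a], profiles[b]))
                if x != y
            ]
            if differing:
                results.append((cluster_id, a, b, differing))
    return results

# Hypothetical toy data: two RFLP clusters, three LSPs per profile.
toy_clusters = {"X": ["isoA", "isoB", "isoC"], "Y": ["isoD", "isoE"]}
toy_profiles = {
    "isoA": (1, 0, 1),
    "isoB": (1, 0, 1),
    "isoC": (1, 1, 1),
    "isoD": (0, 0, 1),
    "isoE": (0, 0, 1),
}
mismatches = discordant_pairs(toy_clusters, toy_profiles)
print(mismatches)
```

In the toy data only LSP 2 differs within cluster X, mirroring the observation that each discordance in the study involved a single LSP.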

Phylogenetic analysis of M. tuberculosis populations by using LSP markers.

LSPs appear to be useful phylogenetic markers for studies of M. tuberculosis (20, 23, 28), especially when the specific identity of each LSP can be confirmed by sequencing the ends of each deletion (23). End sequencing makes it possible to identify which deletions within a similar genome region are, in fact, independent deletions. However, large-scale sequencing of deletion sites is not practical, and even PCR-based identification of specific deletion sites may be difficult if LSPs of similar sizes occur near the same genomic locus. We studied the ability of the LSPs identified in this study to accurately describe phylogenetic relationships among M. tuberculosis isolates. Each of the 163 clinical isolates (H37Rv and CDC1551 were not included in this analysis) was classified into one of 58 LSP types (LSP-Ts), based on the pattern of LSPs that were present (see Table S1 in the supplemental material). Each LSP-T was then located on the SNP tree, and the proximity of all of the isolates with the same LSP-T was examined. LSP-Ts that placed M. tuberculosis isolates together in a manner that was consistent with the SNP tree would be considered good phylogenetic assignments. LSP-Ts that conflicted with the SNP tree would represent inaccurate assignments. Our results showed that LSP-Ts situated most of the M. tuberculosis isolates on the same or an adjoining branch of the SNP tree (Fig. 4). However, six of the LSP-Ts incorrectly grouped isolates together that were more distantly related according to the SNP tree (Fig. 4, LSP-Ts 3, 5, 7, 9, 14, and 15). Many of the LSP-Ts contained only a single M. tuberculosis isolate. We performed a secondary analysis restricted to commonly occurring LSP-Ts by eliminating LSP-Ts that contained fewer than two isolates. This analysis reduced the study to 31 LSP-Ts and 137 isolates. We found that 6/31 (19%) of the LSP-Ts that contained two or more isolates continued to produce important conflicts with the SNP tree. These results confirm our findings with the total study sample.
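The consistency check described above can be sketched as follows, assuming each isolate has already been placed on a branch of the SNP tree and that a list of adjoining branch pairs is available. The branch labels, the adjacency list, and the function name `conflicting_types` are hypothetical; the actual tree topology is given in Fig. 4 and is not reproduced here.

```python
from itertools import combinations

def conflicting_types(members, branch_of, adjacent):
    """Return labels of types whose isolates span non-adjoining branches.

    members: dict mapping type label -> list of isolate names.
    branch_of: dict mapping isolate name -> branch label on the SNP tree.
    adjacent: set of frozensets, each holding two adjoining branch labels.
    """
    conflicts = []
    for label, isolates in members.items():
        branches = {branch_of[isolate] for isolate in isolates}
        for a, b in combinations(sorted(branches), 2):
            if frozenset((a, b)) not in adjacent:
                # Two isolates of the same type sit on distant branches.
                conflicts.append(label)
                break
    return sorted(conflicts)

# Hypothetical toy data: four branches, B1-B2 and B2-B3 adjoin.
toy_adjacent = {frozenset(("B1", "B2")), frozenset(("B2", "B3"))}
toy_branch_of = {"i1": "B1", "i2": "B2", "i3": "B1", "i4": "B4", "i5": "B3"}
toy_members = {1: ["i1", "i2"], 2: ["i3", "i4"], 3: ["i5"]}

conflicts = conflicting_types(toy_members, toy_branch_of, toy_adjacent)
print(conflicts)
```

A type with a single isolate can never conflict under this definition, which is why the secondary analysis restricted to types with two or more isolates yields a higher conflict rate (6/31) than the full set (6/58).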

FIG. 4.
Locations of LSP-Ts on the SNP tree. The locations of clinical M. tuberculosis strains identified by LSP-T are shown relative to the location of each SCG and SC subgroup on the SNP tree. Colored LSP-Ts and connecting lines indicate LSP-Ts that are present ...


This study suggests that LSPs are a substantial source of diversity within the M. tuberculosis genome. While some LSPs appeared to represent rare events in the population, the majority of LSPs appeared to have been generated multiple times in the divergence of M. tuberculosis strains. The low frequency of group A and B LSP events suggests that these LSPs arose from random genomic events and have become associated with a particular phylogenetic lineage. These LSPs may have occurred in the absence of a special mechanism for generating genomic change at high frequency. We suspect that these LSPs are unlikely to result in a selective advantage for the organism; however, this is very difficult to test without additional data.

Group C LSPs are much more variable and appear to have been generated by at least two mechanisms. Forty-five percent of the group C LSPs were flanked by IS6110 transposable elements on at least one side of a reference strain. The presence of IS6110 in proximity to LSP regions that are not present (and likely to be deleted) in other isolates suggests that recombination between nearby IS6110 elements produced a deletion, creating the LSP. IS6110 transposition events may be advantageous, neutral, or detrimental to the bacterial cell depending on the genes involved. Yang et al. (33) have shown that plcD deletions (LSP 4, a group C LSP flanked by IS6110 in our study) do indeed affect bacterial phenotype, in this case showing a strong association with extrapulmonary tuberculosis. This work supports the hypothesis that the variation associated with group C LSPs affects bacterial phenotype (although it is unclear whether an extrapulmonary phenotype should be considered selectively advantageous); it also provides further evidence that IS6110 is a contributing force driving genetic diversity in the M. tuberculosis complex. Indeed, as IS6110 may also be present in the clinical isolates at sites where H37Rv and CDC1551 do not contain IS6110, this element may be playing an even more pivotal role. We speculate that the group C LSPs have occurred under positive selective pressure, and these deletions (LSPs) enhance transmission and other virulence features of M. tuberculosis. Day et al. have demonstrated similar events in Shigella strains, where parallel losses of the cadA locus in different lineages of Shigella were found to be pathoadaptive (8). An alternative hypothesis is that group C LSPs represent highly unstable genomic regions that are repeatedly deleted because the genes encompassed by these LSPs are nonfunctional. Under these circumstances, the repeated loss of these genes could reflect a selective advantage for loss of nonfunctional DNA. 
However, observations in other bacteria suggest that deletion of nonfunctional DNA is a progressive occurrence that begins with mutation of nonfunctional genes into pseudogenes and is only later followed by a series of deletion events (21). In M. tuberculosis, there is no evidence that any of the deleted genes have mutated to pseudogenes. One of the group B and three of the group C LSPs were in PPE genes that others have speculated may be involved in immune variation and evasion (6, 9, 12). Their recurrent deletion in different M. tuberculosis lineages is consistent with the hypothesis that these are escape mutants created by silencing these gene products during the course of infection of mammalian hosts.

Our findings do not directly contradict the work of Hirsh et al. (23), which suggested that virtually all LSPs were unique evolutionary events. First, this prior investigation excluded LSPs originating or terminating in PPE genes, whereas these LSPs were included in our study. Second, our investigation included regions that were deleted in H37Rv relative to the genome of CDC1551. Hirsh et al. examined only regions that are missing in clinical isolates relative to the genome of H37Rv. Finally, we used a hybridization-based approach to identify the presence or absence of genomic regions known to be encompassed by LSPs. In contrast, Hirsh et al. sequenced across each end of the LSP, confirming the exact deletion sites and distinguishing among similar deletion events. It is likely that a reanalysis of this previous work would demonstrate that many LSPs overlap, differing only at the specific deletion sites, and would confirm our observation that many genomic regions were likely to be deleted independently.

Other investigators have suggested that LSPs can provide an accurate genetic marker system for molecular epidemiological and evolutionary studies of M. tuberculosis (20, 28). Our results suggest that LSPs may be informative markers in situations where discrimination of strains is the main objective. However, phylogenetic inference will be complicated by the multiple origins and parallel evolution of many LSPs, which will generate incompatibilities with other phylogenetic markers such as SNP loci. The extent to which this problem can be alleviated by direct sequencing of LSP deletion sites requires further study.

In summary, this work demonstrates that LSPs are predominately genomic deletions that result in an unexpected degree of genomic plasticity in clinical M. tuberculosis isolates. At least one-third of the plasticity in specific genomic regions appears to involve recombination between IS6110 elements in the region. The repeated evolution of some LSPs suggests that these polymorphisms are a critical source of genetic variation that is adaptive and may underlie variation in virulence among M. tuberculosis strains; however, this is difficult to test and warrants future investigational studies of pathogenicity and immunity.



This work was supported by Public Health Service grants AI-46669 and AI-49352 from the National Institutes of Health.


Published ahead of print on 1 November 2006.

Supplemental material for this article may be found at http://jcm.asm.org/.


1. Alland, D., G. E. Kalkut, A. R. Moss, R. A. McAdam, J. A. Hahn, W. Bosworth, E. Drucker, and B. R. Bloom. 1994. Transmission of tuberculosis in New York City. An analysis by DNA fingerprinting and conventional epidemiologic methods. N. Engl. J. Med. 330:1710-1716. [PubMed]
2. Alland, D., T. S. Whittam, M. B. Murray, M. D. Cave, M. H. Hazbon, K. Dix, M. Kokoris, A. Duesterhoeft, J. A. Eisen, C. M. Fraser, and R. D. Fleischmann. 2003. Modeling bacterial evolution with comparative-genome-based marker systems: application to Mycobacterium tuberculosis evolution and pathogenesis. J. Bacteriol. 185:3392-3399. [PMC free article] [PubMed]
3. Baek, S. H., G. Rajashekara, G. A. Splitter, and J. P. Shapleigh. 2004. Denitrification genes regulate Brucella virulence in mice. J. Bacteriol. 186:6025-6031. [PMC free article] [PubMed]
4. Blaser, M. J., and J. C. Atherton. 2004. Helicobacter pylori persistence: biology and disease. J. Clin. Investig. 113:321-333. [PMC free article] [PubMed]
5. Brosch, R., W. J. Philipp, E. Stavropoulos, M. J. Colston, S. T. Cole, and S. V. Gordon. 1999. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect. Immun. 67:5768-5774. [PMC free article] [PubMed]
6. Choudhary, R. K., R. Pullakhandam, N. Z. Ehtesham, and S. E. Hasnain. 2004. Expression and characterization of Rv2430c, a novel immunodominant antigen of Mycobacterium tuberculosis. Protein Expr. Purif. 36:249-253. [PubMed]
7. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M. A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead, and B. G. Barrell. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537-544. [PubMed]
8. Day, W. A., Jr., R. E. Fernandez, and A. T. Maurelli. 2001. Pathoadaptive mutations that enhance virulence: genetic organization of the cadA regions of Shigella spp. Infect. Immun. 69:7471-7480. [PMC free article] [PubMed]
9. Delogu, G., and M. J. Brennan. 2001. Comparative immune response to PE and PE_PGRS antigens of Mycobacterium tuberculosis. Infect. Immun. 69:5606-5611. [PMC free article] [PubMed]
10. de Visser, J. A., A. D. Akkermans, R. F. Hoekstra, and W. M. de Vos. 2004. Insertion-sequence-mediated mutations isolated during adaptation to growth and starvation in Lactococcus lactis. Genetics 168:1145-1157. [PMC free article] [PubMed]
11. Ernst, R. K., D. A. D'Argenio, J. K. Ichikawa, M. G. Bangera, S. Selgrade, J. L. Burns, P. Hiatt, K. McCoy, M. Brittnacher, A. Kas, D. H. Spencer, M. V. Olson, B. W. Ramsey, S. Lory, and S. I. Miller. 2003. Genome mosaicism is conserved but not unique in Pseudomonas aeruginosa isolates from the airways of young children with cystic fibrosis. Environ. Microbiol. 5:1341-1349. [PubMed]
12. Espitia, C., J. P. Laclette, M. Mondragon-Palomino, A. Amador, J. Campuzano, A. Martens, M. Singh, R. Cicero, Y. Zhang, and C. Moreno. 1999. The PE-PGRS glycine-rich proteins of Mycobacterium tuberculosis: a new family of fibronectin-binding proteins? Microbiology 145:3487-3495. [PubMed]
13. Fang, Z., C. Doig, D. T. Kenna, N. Smittipat, P. Palittapongarnpim, B. Watt, and K. J. Forbes. 1999. IS6110-mediated deletions of wild-type chromosomes of Mycobacterium tuberculosis. J. Bacteriol. 181:1014-1020. [PMC free article] [PubMed]
14. Fang, Z., and K. J. Forbes. 1997. A Mycobacterium tuberculosis IS6110 preferential locus (ipl) for insertion into the genome. J. Clin. Microbiol. 35:479-481. [PMC free article] [PubMed]
15. Filliol, I., A. S. Motiwala, M. Cavatore, W. Qi, M. H. Hazbon, M. Bobadilla del Valle, J. Fyfe, L. Garcia-Garcia, N. Rastogi, C. Sola, T. Zozio, M. I. Guerrero, C. I. Leon, J. Crabtree, S. Angiuoli, K. D. Eisenach, R. Durmaz, M. L. Joloba, A. Rendon, J. Sifuentes-Osornio, A. Ponce de Leon, M. D. Cave, R. Fleischmann, T. S. Whittam, and D. Alland. 2006. Global phylogeny of Mycobacterium tuberculosis based on single-nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J. Bacteriol. 188:759-772. [PMC free article] [PubMed]
16. Fleischmann, R. D., D. Alland, J. A. Eisen, L. Carpenter, O. White, J. Peterson, R. DeBoy, R. Dodson, M. Gwinn, D. Haft, E. Hickey, J. F. Kolonay, W. C. Nelson, L. A. Umayam, M. Ermolaeva, S. L. Salzberg, A. Delcher, T. Utterback, J. Weidman, H. Khouri, J. Gill, A. Mikula, W. Bishai, W. R. Jacobs, Jr., J. C. Venter, and C. M. Fraser. 2002. Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J. Bacteriol. 184:5479-5490. [PMC free article] [PubMed]
17. Gagneux, S., K. DeRiemer, T. Van, M. Kato-Maeda, B. C. de Jong, S. Narayanan, M. Nicol, S. Niemann, K. Kremer, M. C. Gutierrez, M. Hilty, P. C. Hopewell, and P. M. Small. 2006. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA 103:2869-2873. [PMC free article] [PubMed]
18. Gao, Q., K. E. Kripke, A. J. Saldanha, W. Yan, S. Holmes, and P. M. Small. 2005. Gene expression diversity among Mycobacterium tuberculosis clinical isolates. Microbiology 151:5-14. [PubMed]
19. Goerke, C., S. Matias y Papenberg, S. Dasbach, K. Dietz, R. Ziebach, B. C. Kahl, and C. Wolz. 2004. Increased frequency of genomic alterations in Staphylococcus aureus during chronic infection is in part due to phage mobilization. J. Infect. Dis. 189:724-734. [PubMed]
20. Goguet de la Salmoniere, Y. O., C. C. Kim, A. G. Tsolaki, A. S. Pym, M. S. Siegrist, and P. M. Small. 2004. High-throughput method for detecting genomic-deletion polymorphisms. J. Clin. Microbiol. 42:2913-2918. [PMC free article] [PubMed]
21. Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol. 21:2172-2181. [PubMed]
22. Hazbon, M. H., and D. Alland. 2004. Hairpin primers for simplified single-nucleotide polymorphism analysis of Mycobacterium tuberculosis and other organisms. J. Clin. Microbiol. 42:1236-1242. [PMC free article] [PubMed]
23. Hirsh, A. E., A. G. Tsolaki, K. DeRiemer, M. W. Feldman, and P. M. Small. 2004. Stable association between strains of Mycobacterium tuberculosis and their human host populations. Proc. Natl. Acad. Sci. USA 101:4871-4876. [PMC free article] [PubMed]
24. Ho, T. B., B. D. Robertson, G. M. Taylor, R. J. Shaw, and D. B. Young. 2000. Comparison of Mycobacterium tuberculosis genomes reveals frequent deletions in a 20 kb variable region in clinical isolates. Yeast 17:272-282. [PMC free article] [PubMed]
25. Israel, D. A., N. Salama, C. N. Arnold, S. F. Moss, T. Ando, H. P. Wirth, K. T. Tham, M. Camorlinga, M. J. Blaser, S. Falkow, and R. M. Peek, Jr. 2001. Helicobacter pylori strain-specific differences in genetic content, identified by microarray, influence host inflammatory responses. J. Clin. Investig. 107:611-620. [PMC free article] [PubMed]
26. Kato-Maeda, M., J. T. Rhee, T. R. Gingeras, H. Salamon, J. Drenkow, N. Smittipat, and P. M. Small. 2001. Comparing genomes within the species Mycobacterium tuberculosis. Genome Res. 11:547-554. [PMC free article] [PubMed]
27. Kuipers, E. J., D. A. Israel, J. G. Kusters, M. M. Gerrits, J. Weel, A. van Der Ende, R. W. van Der Hulst, H. P. Wirth, J. Hook-Nikanne, S. A. Thompson, and M. J. Blaser. 2000. Quasispecies development of Helicobacter pylori observed in paired isolates obtained years apart from the same host. J. Infect. Dis. 181:273-282. [PMC free article] [PubMed]
28. Mostowy, S., D. Cousins, J. Brinkman, A. Aranaz, and M. A. Behr. 2002. Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J. Infect. Dis. 186:74-80. [PubMed]
29. Pearson, B. M., C. Pin, J. Wright, K. I'Anson, T. Humphrey, and J. M. Wells. 2003. Comparative genome analysis of Campylobacter jejuni using whole genome DNA microarrays. FEBS Lett. 554:224-230. [PubMed]
30. Sampson, S. L., M. Richardson, P. D. Van Helden, and R. M. Warren. 2004. IS6110-mediated deletion polymorphism in isogenic strains of Mycobacterium tuberculosis. J. Clin. Microbiol. 42:895-898. [PMC free article] [PubMed]
31. Sreevatsan, S., X. Pan, K. E. Stockbauer, N. D. Connell, B. N. Kreiswirth, T. S. Whittam, and J. M. Musser. 1997. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc. Natl. Acad. Sci. USA 94:9869-9874. [PMC free article] [PubMed]
32. Tsolaki, A. G., A. E. Hirsh, K. DeRiemer, J. A. Enciso, M. Z. Wong, M. Hannan, Y. O. Goguet de la Salmoniere, K. Aman, M. Kato-Maeda, and P. M. Small. 2004. Functional and evolutionary genomics of Mycobacterium tuberculosis: insights from genomic deletions in 100 strains. Proc. Natl. Acad. Sci. USA 101:4865-4870. [PMC free article] [PubMed]
33. Yang, Z., D. Yang, Y. Kong, L. Zhang, C. F. Marrs, B. Foxman, J. H. Bates, F. Wilson, and M. D. Cave. 2005. Clinical relevance of Mycobacterium tuberculosis plcD gene mutations. Am. J. Respir. Crit. Care Med. 171:1436-1442. [PMC free article] [PubMed]
34. Zhong, S., A. Khodursky, D. E. Dykhuizen, and A. M. Dean. 2004. Evolutionary genomics of ecological specialization. Proc. Natl. Acad. Sci. USA 101:11719-11724. [PMC free article] [PubMed]

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)