Copyright: © 2007 Bentley et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Meningococcal Genetic Variation Mechanisms Viewed through Comparative Analysis of Serogroup C Strain FAM18

1 Wellcome Trust Sanger Institute, Hinxton, United Kingdom
2 Bacterial Pathogenesis and Functional Genomics Group, Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
3 Molekulare Biologie, Max-Planck Institut für Infektionsbiologie, Berlin, Germany
Claire M Fraser-Liggett, Editor
The Institute for Genomic Research, United States of America
* To whom correspondence should be addressed. E-mail: sdb/at/sanger.ac.uk
Received September 8, 2006; Accepted December 21, 2006.

Abstract

The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N.
meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements) provide a crucial mechanism for the generation of surface antigen variants. Such variation will have an impact on the interaction with the host tissues, and understanding these mechanisms is important to aid our understanding of the intimate and complex relationship between the human nasopharynx and the meningococcus. Author Summary Human surface tissues, including the skin and gut lining, are host to many different species of bacteria. N. meningitidis is a species of bacteria that is only found in humans where it is able to colonise mucosal surfaces of the nasopharynx (nose and throat). This association is normally harmless and at any one time around 15% of the population are carriers. Some strains of N. meningitidis can cause disease by invading the host tissue leading to septicaemia or meningitis. We aim to gain understanding of the mechanisms by which these bacteria cause disease by studying and comparing genomes from different strains. Here we describe specific genes and associated repetitive DNA sequences that are involved in variation of the bacterial cell surface. 
The repeat sequences encourage the swapping of genes that code for variant copies of cell surface proteins. The resulting variation of the bacterial cell surface appears to be important in the close interaction between host and bacteria and the potential for disease. Introduction N. meningitidis (the meningococcus) colonizes the nonciliated columnar mucosal cells of the human nasopharynx as a harmless commensal organism and, as such, is carried by five to ten percent of the adult population [1,2]. Some strains are able to cross the mucosa into the bloodstream from where they can cause septicaemia or meningitis and, as a result, are a major cause of disease worldwide [2]. Several genetic loci have been associated with disease [3,4], but for most strains the mechanism of virulence is not well defined. The close interaction with the human host is reflected in enriched diversity and variability at the bacterial cell surface. There are 12 different polysaccharide capsules, which are the basis of serogrouping, some of which are virulence determinants [5–7]. Vaccines targeted to the capsule types most commonly associated with disease have been successful, though capsule switching is a cause of concern [8]. Many meningococcal surface-exposed proteins and carbohydrates are also highly variable, creating a major challenge in the development of a universal meningococcal vaccine [9,10]. Current models of bacterial populations describe a spectrum of structures ranging from clonal, where lineages are derived from a common ancestor and horizontal genetic exchange plays no role, to nonclonal (or panmictic), where rates of horizontal genetic exchange are so high that genetic differences between isolates are effectively randomised and individual genetic lineages are undetectable [11]. Extremes are rare with many bacteria having a semiclonal structure where horizontal exchange is common, but groups of clonally related bacteria exist. 
Multilocus sequence typing has played a major role in defining bacterial population structure and shows N. meningitidis to have a fundamentally nonclonal population due to the natural competence and high rates of recombination that characterise the species [12–14]. However, multilocus sequence typing is able to resolve N. meningitidis into groups of related sequence types known as clonal complexes, and studies have shown that while there is enormous diversity in the population as a whole there are relatively few lineages associated with the ability to cause disease [15,16]. Most disease causing strains belong to serogroups A, B, or C, but it is clear that membership of one of the hyperinvasive lineages is equally predictive of the ability to cause disease. The genomes we analyse here represent three (of around ten) such disease associated lineages (Table 1), and it is hoped that comparative genomics will help to unravel the paradox of devastating virulence in an organism that relies on asymptomatic carriage and person-to-person transmission for its proliferation.
The genome sequences of N. meningitidis strains Z2491 [17] and MC58 [18], which belong to serogroups A and B respectively, have been reported previously and allowed the initial identification of both known and potentially novel mechanisms for variation of surface structures, including the transfer of coding information from silent gene cassettes, phase variation through slippage-like mechanisms, local recombination, and the presence of arrays of short noncoding repeats throughout the chromosome. These repeat arrays were postulated to increase the variability of the associated genes through enhanced recombination with externally acquired DNA [17]. Here we report the genome sequence of N. meningitidis serogroup C strain FAM18, a medically important representative of the ET-37/ST-11 complex, which has been a major cause of meningococcal disease worldwide throughout the last century [19] and, despite low carriage rates, continues to be associated with sporadic outbreaks [20–23]. Strain FAM18 was isolated from the cerebrospinal fluid of a child suffering from meningitis by Dr. Janne Cannon and colleagues in North Carolina in the 1980s and remains capable of infection. The genomes of these three strains have been incorporated into pan-Neisseria DNA microarrays and used in three separate comparative genomic hybridisation studies to compare the gene contents of a range of isolates including N. gonorrhoeae strains, invasive and carriage strains of N. meningitidis, and commensal species of Neisseria [24–26]. These studies highlighted differential acquisition of islands between strains, evidence of horizontal DNA transfer between N. meningitidis and N. lactamica, and the potential to define virulence-specific gene sets. 
Here, we have compared the FAM18 genome data with those of strains Z2491 and MC58, to specifically focus upon local sequence divergence and provide evidence for mechanisms whereby recombination of exogenous DNA with specific chromosomal loci is promoted to generate variation in cell surface antigens.

Results/Discussion

We sequenced and annotated the genome of N. meningitidis serogroup C strain FAM18 by standard methods and protocols. The addition of the sequence data present in the FAM18 genome sequence to that from the two previously sequenced N. meningitidis genomes (serogroup A strain Z2491 and serogroup B strain MC58) enabled a three-way whole genome comparison between representatives of three disease-associated lineages within this species. Table 1 shows the general features of these strains and their genomes. For convenience we will subsequently refer to the three strains simply as Z2491, MC58, and FAM18.

Genome Structure

The three sequenced N. meningitidis genomes are largely colinear with only three apparent reciprocal inversions around the origin of replication (Figure 1).
The inversion event closest to the origin of replication (IE1; 3′-adjacent to NMA0220/NMB0050/NMC0034) seems to be due to recombination between repeat arrays in Z2491. Interestingly, one of the arrays flanks a pilin gene, pilC2, while the other is adjacent to genes involved in pilus retraction (pilTU). Thus, in MC58 and FAM18, pilC2 is adjacent to pilTU, while in Z2491 they are distant. Neisserial PilC proteins are important components of the type IV pilus machinery involved in adhesion to host cells, promotion of piliation, and transformation competence [27,28]. The PilT protein is essential for pilus retraction [29], and it has been shown that PilC1 regulates PilT-mediated pilus fibre retraction [28]. Although there is no direct published evidence for coregulated transcription of pilTU and pilC2, it seems plausible that this rearrangement may have an effect on pilus phenotype. The FAM18 pilus regulon also differs from Z2491 and MC58 because of the deletion of much of the pilE/S locus and the insertion of the class II pilin-encoding pilE2 (see below). Further variability at IE1 can be seen with the insertion of a copy of a meningococcal disease associated (MDA) island in the repeat array (see below) directly upstream of pilC2 in FAM18. The MDA island encodes a filamentous bacteriophage that is secreted via the type IV pilus and is specifically associated with strains that have the potential to cause disease [30]. The second inversion, IE2 (5′-adjacent to NMA2200/ NMB0287/NMC0293), is probably due to recombination between copies of IS1106 in FAM18 and is also associated with the insertion of a locus encoding a putative restriction-modification system in FAM18. Restriction-modification systems coordinate the recognition and destruction of “non-self” DNA from sources lacking the same system and for N. meningitidis have been associated with specific lineages [31]. 
The third, IE3, is the most complex of the three inversion events and seems to be due to recombination between loci encoding a large repetitive surface protein and its associated secretion system (NMA0688, NMB0497, NMB1779, and NMC0444; see below). These loci appear to encode two-partner, or type V, secretion systems [32] and are similar in sequence and genetic arrangement to those of Bordetella species where fhaC and fhaB, respectively, encode a secretion accessory protein and a filamentous haemagglutinin important in virulence [33]. Z2491 and FAM18 have a single copy of this locus, while MC58 has two copies that are approximately equidistant from the origin of replication and are the foci of the rearrangement. Prior to duplication the locus has also acquired a novel set of two-partner secretion protein genes and an MDA-related prophage. Duplication of the whole locus and subsequent recombination involving another MDA island may have led to the current genomic arrangement, which would appear to have benefited MC58 with a greater potential variety of surface protein expression. All three of the reciprocal inversion foci that we have described here seem to affect genetic loci with the potential to modulate interaction with the host and/or other strains of N. meningitidis. Despite frequent inter-strain recombination, N. meningitidis genomes maintain a high level of colinearity, so it may be the case that the rearrangements observed in this three-genome comparison have added significance.

Three-Way Coding Sequence Comparison

The predicted amino acid sequences of the coding sequences (CDS) from each of the genome annotations were compared by three-way reciprocal Fasta analysis to assess the numbers of orthologous and unique CDS. The latter were defined as CDS where a reciprocal match was not detected in either of the other two translated genome sequences.
Visualisation and manual curation of the results of this analysis using the Artemis Comparison Tool revealed limitations of the test. This analysis methodology did not take into account the relative chromosomal position of the genes, so the best matches between genes of the different genomes could be those that are in different chromosomal contexts and, therefore, likely to be paralogues (genes of similar sequence in the same genome) rather than true orthologues. Examples of characteristic features in N. meningitidis that confound the reciprocal match test include CDS within loci encoding variable surface proteins such as adhesins or haemagglutinins. In some cases multiple paralogous loci exist within each genome and may be exchanging DNA by intra- and/or inter-genomic recombination. The result is that syntenic loci (those in the same position) are equally diverged from one another as they are from nonsyntenic loci (see below). For convenience we have designated such genes as “variable” to distinguish them from simple orthologues. Paralogous CDS at nonsyntenic loci are also designated as variable. Variable genes tend to occur in clusters, and there is a clear correlation between these gene clusters and regions of low % G + C content. Viewed on a whole genome scale, eight of the nine most prominent GC troughs across the genome of FAM18 coincide with variable loci with the one exception being the ribosomal protein operon (NMC0129–NMC0159) (Figure S1). It was observed in Helicobacter pylori that genes that are not universally present across a number of strains, and are therefore likely to be laterally acquired, tend to have a lower than average GC content [34], and a similar bias has been seen in related enteric genomes [35]. 
It has been suggested that accessory genes (those variably present in different strains within a species) may be subject to different selective pressures to the core genes, and that low % G + C content is one of the results of this difference [36]. It is therefore possible that the low % G + C nature of the variable genes in N. meningitidis may be a consequence of selection for exchange within the species. In addition, the three meningococcal genome sequences were compared using ACEDB, as described previously [37], to identify unique coding sequences, regardless of their annotation in their respective genomes. The results of these two analysis methodologies were combined and, following manual curation, 240 unique genes were identified: 83 (4.1%) in Z2491, 97 (4.8%) in MC58, and 60 (3.0%) in FAM18. Table 2 summarizes the types and numbers of regions containing unique genes and Table S1 details the individual CDS functional annotations. The majority encode hypothetical proteins of unknown function. This is to be expected, because strain-specific genes are generally poorly studied, and largely do not form part of the common and core metabolic functions that have been most studied, and are most readily identifiable through comparison with other well-studied species and biochemical pathways. There are some unique restriction-modification systems and these would be expected to have an impact on the uptake of DNA from N. meningitidis strains in the same niche; such systems have previously been shown to be associated with different lineages [5,38,39].
The majority (39 of 56) of the unique gene clusters contain three or fewer consecutive genes, and 30 of these (68 genes) correspond to known or candidate Minimal Mobile Elements [40] with alternative unique loci present at syntenic locations across the three genomes. With the exception of dam in FAM18, all of the unique restriction-modification system genes are within MMEpheST or MMErfaDclpA. Larger unique regions are often associated with insertion sequence elements (nine of 56 clusters; 53 genes; Table S1) and with a Mu-like prophage (pnm2) present at the same location in all three genomes (between NMA1280 and NMA1323, NMB1077 and NMB1112, and NMC1041 and NMC1056). The IS-associated unique CDS are often small and lie in low % G + C troughs. They are also mostly annotated as “hypothetical proteins” with little information available to allow prediction of the effect of their differential presence. MC58 carries the largest version of prophage pnm2, which includes CDS encoding cell surface antigens able to induce bactericidal antibodies in mice [41]. The presence of unique genes in each genome within these prophage could be due to independent phage insertions, differential gene loss from a larger prophage inserted in a common ancestor, or intergenomic recombination between prophage. Z2491 contains a large unique region of 63 annotated genes (NMA1821–NMA1885), which constitutes a Mu-like prophage (pnm1) shown to be conserved among epidemic serogroup A strains [42], though an association with virulence has not been demonstrated. FAM18 contains a region (NMC0852–NMC0895, IHT-E) that includes genes homologous to lambdoid bacteriophage genes and a transposon carrying a type I secretion system [26].

Repeat Arrays and Flanking Genes

As with other members of the species, the N.
meningitidis FAM18 genome contains many hundreds of repetitive sequence elements ranging from simple sequence repeats associated with phase variable genes (see below), to complete gene cluster duplications (Table 3). DNA uptake sequences (5′-GCCGTCTGAA-3′) are the most abundant repeats and are distributed throughout the genome [43]. Concordant with their % G + C-rich sequence, they are less frequent in low % G + C regions, which often coincide with important genetic loci including those for ribosomal proteins, capsule biosynthesis, pilus biosynthesis, Maf adhesins, prophage, Iga protease, cytolysin transport, and RTX-family exoproteins.
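As a rough illustration of how the DNA uptake sequence could be counted in a genome and why it is described as G + C-rich, the following minimal sketch scans both strands of a sequence for 5′-GCCGTCTGAA-3′. The function names and the toy sequence are hypothetical and are not taken from the study; the authors' own repeat annotation used BLAST and HMMer.

```python
# Minimal sketch (hypothetical helper names): locate the 10-bp DNA uptake
# sequence on both strands and report the G + C content of a sequence.

DUS = "GCCGTCTGAA"  # DNA uptake sequence quoted in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T/N string."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def find_motif_both_strands(seq: str, motif: str = DUS):
    """Return sorted (position, strand) pairs for every motif occurrence."""
    seq = seq.upper()
    hits = []
    for pattern, strand in ((motif, "+"), (reverse_complement(motif), "-")):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequence (made up): one forward copy and one reverse-strand copy.
toy = "AAGCCGTCTGAATTTTTCAGACGGCAA"
toy_hits = find_motif_both_strands(toy)
```

On a real genome the same scan would be run over the full chromosome sequence; the toy string here only demonstrates that both orientations are found.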
The next most abundant repeat types are the “neisserial intergenic mosaic elements” (NIMEs), which consist of 20-bp inverted repeats (ATTCCCNNNNNNNNGGGAAT, dRS3 elements) flanking over 100 families of ~50–150-bp repeat sequences (RS elements) [17]. Also frequent are the “Correia repeat enclosed elements” (known as CREE or Correia elements), which comprise a conserved repeat sequence (156 bp full length or 51 bp internal deletant) bounded by a 51-bp inverted repeat. CREEs are often located upstream of genes [44], have been shown to affect gene expression [45,46], and may be transposable or mobilisable [47]. The numbers of each major repeat type are comparable in the three complete N. meningitidis genomes (Table 3). Comparison of repeat elements between the three genomes revealed no repeat types unique to one genome though it did identify RS element diversity. For example, repeat sequence clustering analysis for Z2491 and FAM18 showed that of the 611 RS elements in FAM18, there are 80 FAM18-specific versions that group into 27 subfamilies, suggesting novel repeat development, possibly generated by recombination. NIMEs are often clustered into long arrays of multiple dRS3s separated by different RS elements. These arrays may also contain other repeats such as CREEs and insertion sequence elements, which may be opportunistic insertions. We have previously suggested that these NIME arrays may encourage sequence variation in neighbouring genes by increasing the frequency of recombination with exogenous DNA, and thus exchange of adjacent sequences, either by acting as substrates for homologous recombination, or as targets for a specific recombinase [17]. The chromosomal position of these repeat arrays is generally consistent between the three genomes suggesting that they were initially introduced in a common ancestor. 
However, comparison of syntenic repeat arrays reveals considerable differences in repeat number and array length, indicating that the arrays themselves are dynamic (Figure 2).
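The dRS3 consensus given above (ATTCCCNNNNNNNNGGGAAT) lends itself to a simple pattern search. The sketch below is an assumption-laden illustration, not the authors' annotation pipeline: it finds exact matches to the consensus and reports the length of the intervening sequence between consecutive dRS3 copies, which in a NIME array would correspond to the RS elements. Because the consensus is its own reverse complement, scanning one strand is sufficient. All names and the toy array are hypothetical.

```python
import re

# Exact-match pattern for the dRS3 consensus; the lookahead lets
# overlapping matches be reported as well.
DRS3_PATTERN = re.compile(r"(?=(ATTCCC[ACGT]{8}GGGAAT))")
DRS3_LENGTH = 20


def find_drs3(seq: str):
    """Return (start, matched_sequence) for each dRS3 consensus match."""
    return [(m.start(), m.group(1)) for m in DRS3_PATTERN.finditer(seq.upper())]


def spacer_lengths(hits):
    """Lengths of the sequence between consecutive dRS3 matches."""
    starts = [start for start, _ in hits]
    return [b - (a + DRS3_LENGTH) for a, b in zip(starts, starts[1:])]


# Toy array (made up): two dRS3 copies separated by a 60-bp spacer.
toy_array = ("TT" + "ATTCCCACGTACGTGGGAAT" + "A" * 60
             + "ATTCCCTTTTAAAAGGGAAT" + "GG")
toy_hits = find_drs3(toy_array)
toy_spacers = spacer_lengths(toy_hits)
```

Real dRS3 copies may deviate from the consensus, so an exact regular expression would undercount them; a profile-based search would be needed for a faithful annotation.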
To study the correlation between repeat arrays and coding sequence divergence, the three pairwise genome comparisons were combined to measure amino acid identities between orthologous CDS. This showed that the average percentage identity between orthologous CDS flanking repeat arrays is significantly (p-value = 5.2 × 10⁻⁶ using a single-tailed t-test) lower than the average percentage identity of orthologues not flanking repeat arrays, supporting the hypothesis that the arrays are associated with increased diversity in flanking genes. Despite this strong association, other measures of the diversifying effect of repeat arrays are less clear cut. The relationship between array length and flanking gene diversity is displayed in Figure 3.
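The comparison of mean identities described above can be sketched as follows. The text does not say which t-test variant was used, so this example assumes Welch's unequal-variance form, and it approximates the single-tailed p-value with a normal distribution (reasonable only for large samples) because the Python standard library has no t-distribution. The identity values are invented purely for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


def one_tailed_p_lower(t: float) -> float:
    """Normal approximation to P(T <= t), i.e. 'a is lower than b'."""
    return NormalDist().cdf(t)


# Invented percentage identities, NOT data from the study.
flanking = [88.0, 90.5, 85.2, 91.0, 87.3, 89.9]
non_flanking = [97.1, 98.4, 96.8, 99.0, 97.7, 98.2]

t_stat, dof = welch_t(flanking, non_flanking)
p_value = one_tailed_p_lower(t_stat)
```

A negative t statistic with a small p-value would indicate that orthologues flanking repeat arrays have lower average identity, which is the direction of the effect reported in the text.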
We further analysed the orthologue sequence identities to test whether the diversifying effect of the repeat arrays could be detected beyond the immediate adjacent CDS (Figure 3). Based on the above findings, we hypothesise that the relative positions of CDS and repeat arrays are under selective pressure such that genes where increased variation is beneficial are more likely to be associated with arrays. The repeat arrays serve to promote recombination with exogenously acquired DNA, increasing the rate of gene exchange at the adjacent loci. Although this does not directly cause increased variation in these genes, it should enhance the exchange of variants, and therefore increase the apparent rate of variation. This correlates with the pattern seen in Figure 3.
Another consideration is that there may be a reciprocal relationship between the genes undergoing repeated recombination to generate antigenically variable mosaics and the flanking repeats. Since there is little or no selective pressure for accurate recombination within the flanking repeat regions, the recombination within these regions is likely to be more “error prone” than that within the coding regions. So, the recombination of these genes may serve to create variation and growth of the flanking repeats, which in turn may favour further recombination within the adjacent coding regions.

NIME Array Structure

The NIME arrays themselves display a striking and regular wave profile for % G + C content with troughs corresponding to RS elements and peaks corresponding to dRS3 (Figure 5).
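A % G + C profile of the kind described for NIME arrays can be computed with a sliding window. The sketch below is a generic illustration under assumed parameters: the window and step sizes are arbitrary choices for the example, not values reported in the study, and the function name is hypothetical.

```python
def gc_profile(seq: str, window: int = 20, step: int = 10):
    """Return (window_start, percent_GC) pairs along the sequence.

    Only full-length windows are reported, so a trailing fragment
    shorter than `window` is ignored.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


# Toy sequence (made up): a GC-only block followed by an AT-only block,
# giving one peak and one trough.
toy = "GC" * 10 + "AT" * 10
toy_profile = gc_profile(toy, window=20, step=20)
```

Applied to a real NIME array with a window close to the dRS3 length, peaks in such a profile would be expected over dRS3 copies and troughs over the RS elements, as the text describes.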
We hypothesise that dRS3 sequences within NIME arrays are binding sites for a site-specific recombinase that enhances recombination between these sequences and exogenously acquired DNA containing other dRS3 elements, thereby promoting variation at a number of genes associated with NIME arrays. If the dRS3-mediated recombination formed an initial cross-over event, then the insertion of linear DNA could be completed by RecA-mediated homologous recombination in the adjacent sequences, ensuring replacement with similar genes. Alternatively, pairs of arrays surrounding genes could both participate in dRS3-mediated recombination, or array-flanked genes on acquired DNA could be inserted into chromosomal arrays. Continued recombination between chromosomal and acquired dRS3 elements, with the functionally selected consequence of exchanging adjacent genes, could have the effect of building up repeat arrays containing “spacer” regions (the RS elements) with specific physical or conformational properties. Bille et al. [30] and Kawai et al. [49] have recently described a type of neisserial filamentous prophage whose presence in meningococcal genomes is associated with the ability to invade host tissues. They have also shown that these bacteriophage integrate into dRS3 repeats by the action of a phage-encoded transposase/recombinase. This protein is therefore a plausible candidate for the specific recombinase predicted by our hypothesis. This phage is a member of a larger family of neisserial phage, and it is therefore reasonable to suppose that this recombinase has been present in the neisserial genome for some time.

Silent Gene Cassette–Mediated Variation

N. meningitidis genomes contain several loci where transcriptionally silent gene cassettes can be used as sources of variation for expressed surface structures and proteins.
Comparison of such variable loci from different strains reveals detail of different genetic arrangements and may be useful for understanding the mechanisms for generation of variants. The best-described example is the pilin-encoding pilE/S system where the expressed pilin (PilE) can be altered by incorporation into the pilE CDS of DNA from 5′-adjacent promoter-less pilS genes [50]. Much of the pilE/S locus has been deleted in FAM18, which is associated with the previously recognized insertion, and conversion to the sole expression, of a class II pilin-encoding gene (pilE2) elsewhere on the chromosome (which is not present in Z2491 and MC58) [51]. Variation of the pilE gene using pilS sequences has been extensively studied [50,52,53], the efficiency of which has been shown in N. gonorrhoeae to involve a short DNA sequence (the Sma/Cla repeat) located downstream of the pilE gene. In N. meningitidis the silent pilS loci are embedded within NIME arrays, and it is possible that the specific dRS3-mediated recombination postulated above may contribute to generating silent variation within pilS sequences. A different mechanism of variation appears to exist for several loci encoding putative haemagglutinins (fhaB) and adhesins (mafB). Downstream of these genes are what appear to be silent cassettes encoding alternative C termini for the encoded proteins. These cassettes contain short repeats that are identical to sequences only present within the upstream genes. We have previously suggested that these repeats could be the substrates for direct recombination, replacing the 3′ end of the gene [17]. The three-way comparison provides more evidence in support of this view, including examples where the C terminus of one of the expressed genes in one genome is identical to a silent cassette in the same locus in another genome.
The maf loci generally comprise tandem mafA and mafB genes, both of which are thought to encode adhesins, followed by a number of putative silent cassettes, and many genes of unknown function (Figure 6).
Although all three maf loci have a similar structure and appear to be encoding a similar product, at the sequence level maf1 and maf2 are more similar to each other, with maf3 having only localised similarity. DNA identities between maf1 and maf2 mafA genes are high (>97%), but their identities to maf3 mafA genes are much lower (~65%). Identities between maf3 mafA genes are greater than 98%. An analogous situation exists for the mafB sequences. The encoded maf3 MafAs have an N-terminal extension relative to the others but all appear to have an intact signal sequence and are likely to be exported. At 41.2%, the average % G + C content of maf loci is markedly lower than the genome average of 51.5%. Furthermore, there is a distinct profile of % G + C content across maf loci. Generally mafA and mafB CDS, including the downstream alternative CDS, correspond to % G + C peaks with some mafA CDS as high as 59% GC. Although the intervening % G + C troughs have been annotated with potential CDS, they have no similarity to genes in the database, and their role is unclear. It is notable that they do not contain any of the repeats associated with repeat arrays. FAM18 and Z2491 have single syntenic fha loci, while MC58 has two that are associated with a genome inversion event (IE3) as described above (Figure 1). The maf and fha loci show considerable potential for generation of multiple versions of the expressed coding sequence and, together with surface structures such as pilus, capsule, and other surface proteins, are likely to be major contributors to cell surface diversity. The presence of multiple syntenic maf loci is striking and suggests an important role.

Phase Variable Genes

Previously, a number of potential phase variable genes have been identified based on the presence of potentially slippage-prone short repetitive sequences, and these lists have been progressively refined, through analysis of neisserial genome sequences, first in N.
meningitidis strains MC58 [18] and Z2491 [17] and then in comparative studies using both of these and N. gonorrhoeae strain FA1090 [37], and subsequently in a study of the commonly used experimental N. gonorrhoeae strains [55] and a partial study of N. meningitidis [56]. Based upon these studies, and those published by others on specific genes, and a four-way comparison using the N. meningitidis FAM18 genome sequence, a revised and updated phase variable gene list is presented here (Table S3). There are now 24 known phase variable genes in the Neisseria spp. and a further 25 strong candidates, counting members of established gene families such as Opa proteins only once. Over half of these encode surface proteins, enzymes that modify surface proteins, or are LPS biosynthesis proteins. This mechanism therefore has a vast capacity to vary the surface-exposed structures and epitopes of N. meningitidis.

Concluding Remarks

Based upon the comparisons of the three meningococcal genomes, we further characterized a number of known and putative mechanisms for the generation of diversity within and between strains of this highly adaptable and variable species. Many of these mechanisms involve random variation, which is locally increased due to the presence of repeats, either generating local instability or serving as substrates for homologous recombination, resulting in altered expression of specific genes (phase variation) or generating allelic diversity within particular surface proteins (pilE, mafB/fhaB families, NIME array-associated genes). While phase variation through homopolymeric tracts has been noted in several other genera, the NIME arrays and cassette-mediated variation described here seem to be specific to Neisseria and may be important characterising features of the genus. NIME arrays are found at syntenic positions in the genomes of N. gonorrhoeae and N. lactamica (http://www.sanger.ac.uk/Projects/N_lactamica) but have not been observed in non-neisserial genomes.
Although two-partner secretion systems analogous to the fha locus are found in other bacterial genomes [33], they only include the two essential components and lack the silent cassettes that enhance variability in N. meningitidis. Notably, the N. gonorrhoeae and N. lactamica genomes both have three maf loci syntenic with those in N. meningitidis, N. gonorrhoeae has two extra maf loci, and neither have fha loci. These genomic differences will affect the cell surface and may relate to niche differences and interactions with the host. Variation of the bacterial cell surface is a common theme in host–pathogen interactions and appears to be important for colonisation of new niches and avoidance of the immune system. Such variation may be even more important for commensal organisms, such as Neisseria, that remain associated with their hosts for long periods. The genome of FAM18, and its comparison with MC58 and Z2491, highlights N. meningitidis as a paradigm of genomic variability linking a combination of DNA uptake and recombination, minimal mobile elements, intergenic repeat arrays, and phase variation to generate and maintain phenotypic diversity focused at the cell surface. Materials and Methods Genome sequencing. N. meningitidis strain FAM18 genomic DNA was prepared as previously described [57]. An approximately 8×-shotgun sequence was produced from a total of 68,352 end-sequences from pUC clones with 1.4–2.0kb inserts using the Big Dye Terminator Cycle Sequencing kit from Applied Biosystems (http://www.appliedbiosystems.com). Reactions were run on Applied Biosystems 3700 sequencers. An approximately 1× coverage was produced from 1,152 end sequences from 10–20 kb inserts cloned into pBACe3.6 and used to scaffold contigs and bridge repeat sequences. The sequence was finished to standard criteria [17]. Sequence assembly, visualisation, and finishing were performed using PHRAP (P. Green, unpublished data; http://www.phrap.org) and Gap4 [58]. 
Annotation and genome comparison. Putative orthologues were identified by reciprocal-best-match FASTA searches between the protein sequences of meningococcal strains Z2491, MC58, and FAM18, with cutoffs of 80% sequence length and 30% identity. The orthologue list was manually curated and annotation was transferred for orthologues common to strains Z2491 and FAM18. All other genes were annotated using standard criteria [17], and the complete genome annotation was then manually curated in Artemis [59]. The strain Z2491 EMBL entry has also been resubmitted to reflect the annotation updates generated during this study and the rotation of the sequence to place the origin of replication at the start. Genome comparisons were visualised using the Artemis Comparison Tool [60]. Repeats were defined and annotated using a combination of BLAST [61] and HMMer [62]. In an independent analysis, the complete genome sequences of N. meningitidis strains FAM18, MC58, and Z2491 and N. gonorrhoeae strain FA1090 were analysed using ACEDB (R. Durbin, J.T. Thierry-Mieg, unpublished data, http://www.acedb.org) as described previously [37,55,63,64]. Perfect sequence repeats characteristic of phase variable genes were identified using ARRAYFINDER [65]. Repeats, the annotations from all four neisserial genome sequences, and other sequence features were displayed in their sequence context within ACEDB. The potential for simple sequence repeats to generate transcriptional or translational phase variation was assessed by examining each repeat in its sequence context, as has been done previously [37,55,63,64]. Unique genes were identified as those for which no homology was displayed, with the display parameters within ACEDB set to 1e−50 for DNA identity and 1e−4 for amino acid similarity.
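The reciprocal-best-match criterion used for orthologue identification at the start of this section can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Hit` record, the function names, and the toy protein identifiers are hypothetical, and it assumes that "80% sequence length" means the alignment spans at least 80% of both proteins, which the text does not state explicitly. Parsing of real FASTA output is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise protein alignment (hypothetical record, not FASTA's own format)."""
    query: str
    subject: str
    identity: float    # percent identity over the aligned region
    aligned_len: int   # number of aligned residues
    query_len: int     # full length of the query protein
    subject_len: int   # full length of the subject protein
    score: float       # alignment score used to rank competing hits


def passes_cutoffs(hit, min_identity=30.0, min_coverage=0.80):
    """Apply the 30% identity and 80% length cutoffs.

    Assumption: the alignment must cover at least 80% of BOTH proteins.
    """
    covers_query = hit.aligned_len >= min_coverage * hit.query_len
    covers_subject = hit.aligned_len >= min_coverage * hit.subject_len
    return hit.identity >= min_identity and covers_query and covers_subject


def best_hits(hits):
    """Map each query to its highest-scoring subject among hits passing the cutoffs."""
    best = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each protein is the other's best match."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: proteins a1, a2 from one genome and b1, b2 from another.
a_vs_b = [
    Hit("a1", "b1", 98.0, 300, 300, 305, 1500.0),
    Hit("a1", "b2", 45.0, 280, 300, 310, 400.0),
    Hit("a2", "b2", 35.0, 100, 400, 310, 90.0),   # fails the length cutoff
]
b_vs_a = [
    Hit("b1", "a1", 98.0, 300, 305, 300, 1500.0),
    Hit("b2", "a1", 45.0, 280, 310, 300, 400.0),
]
pairs = reciprocal_best_matches(a_vs_b, b_vs_a)
print(pairs)
```

In the toy data only a1 and b1 pair up: b2 finds a1 as its best match, but a1 prefers b1, so b2 is left without a putative orthologue. Candidates of this kind are what the manual curation step described above would be expected to review.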
In cases of large paralogous gene families, genes that displayed low homology over only a portion of their length to an annotated feature from another genome sequence were considered in their wider chromosomal context to determine whether the allele is unique or divergent. The results of these independent analyses were combined and curated.

Figure S1: Functions Associated with Percentage G + C Troughs across the FAM18 Genome (59 KB PDF)
Table S1: Genes Unique to Each of the Three N. meningitidis Genomes (298 KB DOC)
Table S2: Repeat Arrays and Functional Annotation of Flanking Genes in FAM18 (75 KB DOC)
Table S3: Phase Variable Genes of the Neisseria spp. (221 KB DOC)

Accession Numbers

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) accession numbers for the genomes discussed in this paper are N. gonorrhoeae (AE004969), N. meningitidis strain FAM18 (AM421808), and N. meningitidis strain Z2491 (AL157959).

Acknowledgments

We acknowledge the use of core facilities at the Wellcome Trust Sanger Institute. We also greatly appreciate the help given by Dr. Simon McGowan in generating figures.

Abbreviations
Footnotes

Competing interests. The authors have declared that no competing interests exist.

A previous version of this article appeared as an Early Online Release on December 21, 2006 (doi:10.1371/journal.pgen.0030023.eor).

Author contributions. SDB, GSV, MA, BB, and JP conceived and designed the experiments. SDB, GSV, LASS, CC, CA, TC, AC, PHD, NEH, KJ, MM, SM, ER, SS, LU, SW, MAQ, NJS, and JP performed the experiments. SDB, GSV, LASS, NJS, and JP analyzed the data. SDB, NJS, and JP wrote the paper.

Funding. This work was supported by the Wellcome Trust through the Beowulf Genomics Initiative.

References