Copyright © 2005, Cold Spring Harbor Laboratory Press

Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects

Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts 02543, USA

1Present address: Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA.
2Corresponding author. E-mail jwernegreen/at/mbl.edu; fax (508) 457-4727.

Received February 3, 2005; Accepted June 7, 2005.

Abstract

The distinct lifestyle of obligately intracellular bacteria can alter fundamental forces that drive and constrain genome change. In this study, sequencing the 792-kb genome of Blochmannia pennsylvanicus, an obligate endosymbiont of Camponotus pennsylvanicus, enabled us to trace evolutionary changes that occurred in the context of a bacterial–ant association. Comparison to the genome of Blochmannia floridanus reveals differential loss of genes involved in cofactor biosynthesis, the composition and structure of the cell wall and membrane, gene regulation, and DNA replication. However, the two Blochmannia species show complete conservation in the order and strand orientation of shared genes. This finding of extreme stasis in genome architecture, also reported previously for the aphid endosymbiont Buchnera, suggests that genome stability characterizes long-term bacterial mutualists of insects and constrains their evolutionary potential. Genome-wide analyses of protein divergences reveal 10- to 50-fold faster amino acid substitution rates in Blochmannia compared to related bacteria. Despite these varying features of genome evolution, a striking correlation in the relative divergences of proteins indicates parallel functional constraints on gene functions across ecologically distinct bacterial groups.
Furthermore, the increased rates of amino acid substitution and gene loss in Blochmannia have occurred in a lineage-specific fashion, which may reflect life history differences of their ant hosts. Genome sequencing provides a rich data set to predict the metabolic capabilities of organisms, and comparative analyses among closely related species offer a powerful approach to examine mechanisms of genome flux. Since 2000, genomics has shed light on the metabolism and evolution of obligately intracellular, mutualistic bacteria that have coevolved with various insect groups for tens to hundreds of millions of years (Baumann et al. 2000). Fully sequenced genomes of Buchnera associated with aphids (Shigenobu et al. 2000; Tamas et al. 2002; van Ham et al. 2003), Wigglesworthia of tsetse flies (Akman et al. 2002), and Blochmannia associated with Camponotus (Gil et al. 2003) are extremely streamlined yet retain basic cellular processes and specific biosynthetic abilities required by the insect host. Genome comparisons within Buchnera have shown stability, with no gene acquisition, inversions, or translocations throughout 50–70 million years (Myr) of evolution within aphids (Tamas et al. 2002) and near-perfect synteny since the establishment of this association 150–200 million years ago (Mya) (van Ham et al. 2003). This exceptional stasis of genome architecture contrasts with lability of free-living and parasitic bacterial genomes. Genome stability may reflect the dearth of molecular tools for gene exchange (e.g., phage, certain rec genes, and repeated DNA sequences) in this mutualist and limited ecological opportunities to recombine with genetically distinct bacteria (Tamas et al. 2002; van Ham et al. 2003; Moran and Plague 2004). Such constraints on genome changes in stable mutualists may profoundly affect the evolutionary potential of these bacteria and their hosts. 
However, owing to the lack of multiple sequenced genomes within endosymbiont groups, genome stability in other long-term endosymbionts has remained untested. In order to contribute to a more comprehensive model of genome evolution in ancient endosymbiotic associations, we have evaluated genome dynamics in Blochmannia, a bacterial genus that is closely related to Buchnera and has cospeciated with ants for ~30 Myr. The wide range of interactions between ants and other species, including plants, fungi, trophobionts, other insects, and diverse bacteria (Dasch et al. 1984; Currie 2001; Zientz et al. 2001), may explain the huge ecological success of ants, which play dominant roles in nutrient turnover in terrestrial ecosystems and include more than twice as many species as mammals (Hölldobler and Wilson 1990). Blochmannia is the most evolutionarily stable ant associate and lives exclusively within cells of the closely related genera Polyrhachis, Colobopsis, and Camponotus (Dasch et al. 1984; Schröder et al. 1996; Sameshima et al. 1999; Sauer et al. 2000; Degnan et al. 2004). Blochmannia has been studied most extensively in Camponotus, the second largest ant genus, with ~1000 species (Bolton 1995) ranging from omnivores to specialists on plant secretions and homopteran exudates (Dasch 1975; Hölldobler and Wilson 1990; Bolton 1995; Davidson 1997, 1998) and with nesting habitats including wood, soil beneath rocks, and the rainforest canopy (Bolton 1995). The 706-kb sequence of B. floridanus (Blochmannia of host Camponotus floridanus) indicated this ant endosymbiont retains numerous metabolic pathways that may be involved in host nutrition, including nitrogen recycling and assimilation, biosynthesis of amino acids and fatty acids, and sulfate reduction (Gil et al. 2003). Blochmannia is thought to be important during host development (Sauer et al. 2002; Wolschin et al. 
2004), but its specific roles in host physiology and ecology and its functional variation across ant species remain unclear. We have sequenced the 792-kb genome of Blochmannia pennsylvanicus (Blochmannia associated with Camponotus pennsylvanicus) to examine genome changes since this lineage and B. floridanus diverged from a common ancestor ~16–20 Mya (Degnan et al. 2004). Comparing genome inventories and architectures of the two Blochmannia strains allowed us to trace functional and structural changes that occurred in the context of this bacterium–ant interaction and to contrast genome dynamics and protein evolution in two mutualist groups (Blochmannia and Buchnera). Results Genome features The B. pennsylvanicus genome consists of a 791,654-bp circular chromosome that we sequenced to 12× coverage (Table 1; Fig. 1
Differential gene loss, yet complete stability of genome architecture within Blochmannia The 86-kb size difference between the Blochmannia genomes (B. pennsylvanicus, 792 kb; B. floridanus, 706 kb) largely reflects differential gene loss between the two lineages, with a greater extent of loss in B. floridanus. Both genomes possess several intact genes that are missing or pseudogenes in the other genome (Fig. 2
Assuming that the common ancestor of the two Blochmannia strains encoded at least their combined set of 615 ORFs, then gene loss or inactivation in the lineage leading to B. floridanus occurred at an approximate rate of one ORF per 0.64–0.80 Myr [(16 Myr)/(25 ORFs lost) to (20 Myr)/(25 ORFs lost)]. Gene loss in the B. pennsylvanicus lineage is apparently ~6.25 times slower, with loss or inactivation of one ORF per 4.0–5.0 Myr [(16 Myr)/(4 ORFs lost) to (20 Myr)/(4 ORFs lost)]. Estimated rates of gene loss in B. floridanus exceed those in Buchnera, in which one ORF was lost or inactivated per 2.70–3.60 Myr between Buchnera–Baizongia pistaciae and Buchnera–Acyrthosiphon pisum [(150 Myr × 2)/(111 ORFs lost) to (200 Myr × 2)/(111 ORFs lost)] and per 1.69–2.37 Myr between Buchnera–A. pisum and Buchnera–Schizaphis graminum [(50 Myr × 2)/(59 ORFs lost) to (70 Myr × 2)/(59 ORFs lost)]. Rates of loss may be underestimated if the same genes were deleted independently along lineages. Mechanisms that influence rates of gene loss may include changes in the strength and/or efficacy of selection to maintain genes, as well as differences in underlying rates of gene knockouts and deletion (see Discussion).

Truncations and frameshifts in otherwise conserved Blochmannia genes

In all, 13 Blochmannia genes show a significant (20%–40%) truncation compared to Escherichia coli orthologs but apparently encode functional proteins. These genes include aceF, aroK, aroQ, ftsK, ftsY, hfq, mreC, pheA, rpoZ, thrS, trpD, yfcB, and yqeI. In all but one case (hfq, see below), truncations are shared by B. pennsylvanicus and B. floridanus and thus likely occurred before the divergence of these lineages. Certain truncations are also shared between Blochmannia and Wigglesworthia (yfcB, rpoZ, ftsK), Buchnera (aroQ), or among all three mutualist groups (aceF, ftsY).
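As a check on the arithmetic, the gene-loss rates quoted earlier in this section can be recomputed from the ORF counts and divergence dates given in the text. The following is a minimal sketch, not part of the original analysis. The function and variable names are hypothetical, and it assumes, as the bracketed expressions do, that losses are averaged over one lineage for each Blochmannia branch and over two lineages for each Buchnera pair.

```python
# Recompute "Myr per ORF lost" from the counts and dates quoted in the text.
# All names here are illustrative; only the numbers come from the article.

def myr_per_orf(orfs_lost, divergence_myr, lineages=1):
    """Average time (Myr) elapsed per ORF lost, summed over `lineages` branches."""
    return divergence_myr * lineages / orfs_lost

# (label, ORFs lost, younger date in Myr, older date in Myr, lineages counted)
cases = [
    ("B. floridanus lineage", 25, 16, 20, 1),
    ("B. pennsylvanicus lineage", 4, 16, 20, 1),
    ("Buchnera B. pistaciae vs. A. pisum", 111, 150, 200, 2),
    ("Buchnera A. pisum vs. S. graminum", 59, 50, 70, 2),
]

rates = {
    label: (myr_per_orf(n, young, k), myr_per_orf(n, old, k))
    for label, n, young, old, k in cases
}

for label, (low, high) in rates.items():
    print(f"{label}: one ORF per {low:.2f}-{high:.2f} Myr")

# Fold difference between the two Blochmannia lineages (same time interval,
# so this reduces to the ratio of ORFs lost, 25/4).
fold = rates["B. pennsylvanicus lineage"][0] / rates["B. floridanus lineage"][0]
print(f"B. pennsylvanicus loses genes about {fold:.2f} times more slowly")
```

Because both Blochmannia lineages span the same interval, the fold difference does not depend on which divergence date is used.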
Although a >20% length reduction is often interpreted as evidence for loss of gene function (e.g., Lerat and Ochman 2004), the sequence conservation of truncated Blochmannia genes suggests they encode functional proteins. First, nonsynonymous divergence (dN) between B. pennsylvanicus and B. floridanus is relatively low at the 12 truncated genes they share (average of 0.3012 ± 0.14), whereas synonymous divergence (dS) exceeds 2 for each gene. Although dN/dS is difficult to calculate because of saturation of synonymous sites, this ratio is below 0.13 for each gene. Moreover, apart from RpoZ, protein divergences at truncated genes are generally comparable to other ORFs (average protein divergence of 0.569 ± 0.281, compared to the genome-wide average of 0.565 ± 0.309). RpoZ is quite divergent (2.16), but the fact that dN/dS < 1 suggests the truncation did not eliminate gene function. The one truncated gene (hfq) in B. pennsylvanicus that lacks an ortholog in B. floridanus has a relatively low protein divergence with the closest outgroup, Wigglesworthia (0.576). Thus, in the absence of detailed biochemical and structural information for each of these genes, a 20% truncation appears too conservative a criterion for inferring loss of function. In Blochmannia, truncated genes clearly differ from annotated pseudogenes, in which frameshift and nonsense mutations introduce multiple stop codons throughout the gene.

We also detected a single, short frameshift in each of B. pennsylvanicus ytfM, ybiS, hisH, and ubiF. The first two genes encode unclassified proteins, while the latter two are required for histidine and ubiquinone biosynthesis, respectively (see Fig. 2
Despite their consistency, we hesitate to interpret these frameshifts as firm evidence of pseudogenes. Notably, apart from a single frameshift, these genes would otherwise encode intact proteins that are relatively conserved between the two Blochmannia genomes, with an average protein divergence (0.67 ± 0.16) that is only slightly above the average for other ORFs (0.565 ± 0.309). The occurrence of frameshifts within homopolymeric tracts is consistent with slippage during transcription (Wagner et al. 1990; Baranov et al. 2005) or during translation (Baranov et al. 2002; Gurvich et al. 2003) that could restore the full-length protein (see Discussion). If the frameshifts have, indeed, disrupted protein functions, these mutations must have occurred very recently, since rapid mutation in Blochmannia (Degnan et al. 2004) is expected to erode pseudogenes quickly.

Metabolic similarities of Blochmannia spp.

Analysis of B. pennsylvanicus and reanalysis of other insect mutualist genomes using MultiFun (Serres and Riley 2000) allowed more comprehensive metabolic comparisons across groups (see Fig. 2

Metabolic differences between Blochmannia spp.

The 30 ORFs that distinguish the two Blochmannia genomes have a range of predicted cellular functions that may alter host–symbiont metabolic exchanges (Figs. 2
Several differences between the Blochmannia genomes involve the biosynthesis, transport, and mediation of cell wall and membrane components. B. pennsylvanicus retains six distinct ORFs that contribute to the de novo synthesis of peptidoglycan (murein), the major constituent of Gram-negative bacterial cell walls (Park 1996). In addition to MurI, it retains the complete pathway for the biosynthesis of isoprenoids (Fig. 4B

B. pennsylvanicus is the first fully sequenced insect mutualist that retains the entire sec-dependent secretory pathway, including the chaperonin SecB. This general pathway mediates the export and translocation of numerous proteins to the periplasm or inner membrane (Pugsley 1993). The other insect mutualist genomes lack certain components (typically secBDF) of this pathway (Shigenobu et al. 2000; Akman et al. 2002; Tamas et al. 2002; Gil et al. 2003; van Ham et al. 2003), although these losses are not expected to eliminate function (Mushegian and Koonin 1996). B. pennsylvanicus also retains the inner-membrane-bound heat-induced protease HtpX and the periplasmic chaperonins LolB and DsbA. Together, these results suggest that B. pennsylvanicus is better able to respond to cellular stress and ensure proper transport, localization, conformation, and decomposition of gene products. Distinct membrane features in B. pennsylvanicus also include its retention of the inner membrane proteins YciC, putatively involved in transport, and BacA, which confers bacitracin resistance. While insect mutualists have lost many regulatory genes, B. pennsylvanicus encodes three that are missing from B. floridanus (hns, hfq, and mntR). Like B. floridanus and Wigglesworthia, B. pennsylvanicus lacks DnaA, considered important in the initiation of DNA replication (Akman et al. 2002; Gil et al. 2003). In B. floridanus, the HU-like nucleoprotein HlpA might be involved in starting the nucleosome (Gil et al. 2003), and in B.
pennsylvanicus, the nucleoprotein Hns might also be recruited to the replication origin for this purpose. Neither Blochmannia genome encodes the θ subunit of the holoenzyme DNA polymerase III, but B. pennsylvanicus retains the τ and γ subunits of DnaX. The τ subunit acts as a molecular tether that couples DnaB (DNA helicase) to the core of DNA polymerase III (α and ε) as the replication fork progresses from the origin (Walker et al. 2000; Gao and McHenry 2001). Thus, the inability of B. floridanus, Buchnera, and Wigglesworthia to transcribe the τ subunit is expected to decrease the efficiency, accuracy, and processivity of the holoenzyme (van Ham et al. 2003). These distinct features of the B. pennsylvanicus replication machinery might contribute to the apparent 29-kb shift observed in B. floridanus GC skew relative to B. pennsylvanicus (see Supplemental Fig. S1). Long intergenic spacers in B. pennsylvanicus Intergenic spacers in B. pennsylvanicus are significantly longer (average 291 bp) than homologous spacers in B. floridanus (average 180 bp; Wilcoxon Rank sum test, p < 0.0001), are longer than spacers in most other bacteria (Bacillus subtilis, average 121.9 bp; Vibrio cholerae, average 156.4 bp; Escherichia coli K12 (http://ecocyc.org), average 128.8 bp) (Mira et al. 2001) and contribute to its larger genome size compared to B. floridanus (77% vs. 84% coding regions) (Table 1). While spacers of the two Blochmannia genomes are too divergent to align, the strict conservation of gene order allowed us to predict homologous spacers based on their position in the chromosome. A detectable relationship between lengths of spacers in the two genomes suggests a certain degree of conservation of spacer length (Fig. 5A
Polymorphism within B. pennsylvanicus Polymorphisms in the pooled symbiont population used for library construction were detectable as well-supported (Phred scores >40) discrepancies in the genome assembly that were represented by at least two independent clones. Nearly all (445/497) polymorphisms are single nucleotide polymorphisms (SNPs). Those located within ORFs occur primarily at third codon positions (Table 2). Although the majority of SNPs are located within ORFs, a disproportionate number of SNPs (35%) and indels (96% of insertions and/or deletions) occur within the intergenic regions (which comprise 23% of the B. pennsylvanicus genome). The two polymorphic indels within ORFs produce amino acid insertions/deletions in sucA and aroE.
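The claim that polymorphisms are over-represented in intergenic regions can be made concrete by comparing the observed intergenic share of each class with the 23% of the genome that is intergenic. The short sketch below is illustrative only. It assumes that every polymorphism that is not a SNP is an indel (497 − 445 = 52), which is consistent with the two ORF-located indels and the 96% figure quoted above, and all variable names are hypothetical.

```python
# Enrichment of polymorphisms in intergenic DNA relative to a uniform
# expectation. Counts and percentages are taken from the text; the assumption
# that all non-SNP polymorphisms are indels is ours.

total_polymorphisms = 497
snps = 445
indels = total_polymorphisms - snps          # assumed: everything else is an indel
indels_in_orfs = 2                           # sucA and aroE

intergenic_fraction_of_genome = 0.23         # 23% of the genome is intergenic
snp_intergenic_fraction = 0.35               # 35% of SNPs fall in intergenic DNA
indel_intergenic_fraction = (indels - indels_in_orfs) / indels

def enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed share to the share expected if sites were uniform."""
    return observed_fraction / expected_fraction

snp_enrichment = enrichment(snp_intergenic_fraction, intergenic_fraction_of_genome)
indel_enrichment = enrichment(indel_intergenic_fraction, intergenic_fraction_of_genome)

print(f"indels: {indels} total, {indel_intergenic_fraction:.0%} intergenic")
print(f"SNP enrichment in intergenic DNA:   {snp_enrichment:.2f}x")
print(f"indel enrichment in intergenic DNA: {indel_enrichment:.2f}x")
```

Under these assumptions, SNPs are roughly 1.5 times and indels roughly 4 times as frequent in intergenic DNA as a uniform distribution would predict.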
Comparison of protein divergences Wide variation in protein divergence across loci indicates variable functional constraint across Blochmannia proteins (Fig. 6
Previous studies have demonstrated a negative relationship between GC content of endosymbiont genes and their level of divergence from free-living relatives (Herbeck et al. 2003; Banerjee et al. 2004), suggesting that amino acid changes in proteins under strong functional constraint are less severely affected by AT mutational bias. A strong negative association between GC content and protein divergence in Blochmannia (Supplemental Fig. S3) indicates this relationship also holds when protein divergences are estimated within a mutualist group (rather than to a more distant free-living relative). Protein divergences were, on average, ~1.88 times faster in the B. floridanus lineage than in the B. pennsylvanicus lineage. Genes with particularly elevated rates in B. floridanus include secG, rpsJ, rpsR, and rnpA, each of which evolves more than 10-fold faster in B. floridanus than in B. pennsylvanicus (Fig. 7
The B. pennsylvanicus genome offered the first opportunity to compare genome-wide patterns of protein evolution in the context of distinct endosymbiotic associations. A previous study showed accelerated evolution at 16S rDNA and at synonymous positions of select Blochmannia genes compared to enteric bacteria and even compared to the rapidly evolving Buchnera (Degnan et al. 2004). Here, we tested whether the Blochmannia genome also undergoes exceptionally fast rates of protein evolution. Strong correlations in divergences at homologous genes indicate parallel functional constraints in Blochmannia compared to Buchnera and E. coli–Photorhabdus luminescens (Fig. 6

Comparisons of average protein divergences within and between major functional categories confirmed many of the observations above (Supplemental Table S5). Blochmannia, Buchnera, and E. coli–P. luminescens show similar relative divergences across most functional categories, such as relatively high divergences of unclassified and hypothetical genes, loci for surface structures, and cell membrane components; moderate divergences of genes encoding cofactor biosynthesis, information transfer, cell processes, and metabolism; and relatively low divergences of genes for nucleotide biosynthesis and amino acid biosynthesis. Striking differences among pairs include relatively high divergence of chaperonins and fatty acid biosynthetic genes in Buchnera, the relative conservation of regulation genes in both endosymbionts, and the slightly higher divergence of translation genes in Blochmannia, as noted above.

Discussion

In contrast to many pathogenic and free-living bacterial species in which lateral gene transfer, chromosomal inversions, and translocations drive changes in gene order and content, we found complete conservation in gene order and orientation between two genomes of Blochmannia that diverged 16–20 Mya. This exceptional genome stability, first demonstrated in Buchnera (Tamas et al.
2002; van Ham et al. 2003), suggests that Blochmannia also lacks genetic machinery for gene inversions or translocation. Notably, like Buchnera, both Blochmannia strains lack RecA, numerous other recombination functions (RadA and RecFOR), and phage, and they have relatively low levels of repeated DNA. The gene content of B. pennsylvanicus differs from B. floridanus only by 3.6% (24/659); however, genes specifically retained in B. pennsylvanicus span varied functional categories that may affect its metabolic capabilities and host interaction. The retention of genes for cell wall integrity, chaperonins, gene regulation, and DNA replication in B. pennsylvanicus may reflect different bacterial requirements for the maintenance of cell processes and structures within C. pennsylvanicus bacteriocytes. Furthermore, the ability to synthesize both coenzyme A and isoprenoids likely benefits B. pennsylvanicus and its ant hosts. Genome stasis implies gene losses in either Blochmannia lineage are irreversible, such that deletions of metabolic functions may constrain the evolutionary potential of this association. Such constraints have been proposed in Buchnera (Tamas et al. 2002), where the loss of genes for sulfur reduction and cysteine biosynthesis in Buchnera–SG may constrain the S. graminum host to its relatively cysteine-rich grass diet. Certain aspects of Blochmannia metabolism remain unclear, owing to the uncertainty of whether or not single frameshifts in particular genes eliminate function. For example, if single indels within poly(A) tracts of hisH and ubiF are subject to correction of some type, the encoded proteins might be functional in B. pennsylvanicus. One possible mechanism for correction may be instability of such frameshifts during DNA replication, such that populations include heterogeneous genomes with different numbers of adenines in the homopolymeric repeats (e.g., Parkhill et al. 2000). However, among the many C.
pennsylvanicus colonies used for symbiont library construction, we found no evidence for variation in lengths of these or any homopolymeric tracts. Alternatively, transcriptional slippage, or “stuttering,” within mononucleotide repeats is well documented in E. coli (Chamberlin and Berg 1962), where it often occurs within poly(A) or poly(T) tracts (Wagner et al. 1990). In a survey of published bacterial genomes, Baranov et al. (2005) identified several “pseudo pseudogenes,” for which transcriptional slippage could correct a frameshift within poly(A) or poly(T) tracts and restore uninterrupted ORFs. These authors describe a mechanism in which the RNA chain dissociates from the DNA template and reassociates in a new location. Third, functional proteins may be restored by frameshifting during translation, or “recoding,” a phenomenon that occurs in yeast (Hansen et al. 2003), archaea (Cobucci-Ponzano et al. 2005), and E. coli (Gurvich et al. 2003) and can also follow poly(A) tracts, especially in the form of A_AAA_AAG (Baranov et al. 2002, 2003). Such slippage has been proposed to explain aberrant indels in animal mitochondrial genes (Beckenbach et al. 2005) and, in principle, might operate in endosymbionts. Given the multiple levels at which frameshifts within homopolymeric sequences may be corrected, we argue that these mutations should be interpreted cautiously, and with consideration of whether the gene otherwise encodes a full-length ORF. Although it is possible that such genes represent very recent loss of function, we take the approach of Baranov et al. (2005) in questioning whether such genes should be annotated as pseudogenes.

Conservation of genome architecture and overall similarity in gene content within Blochmannia contrasts with the exceptionally fast rates of sequence evolution observed in this group. Namely, protein divergences are higher for the two ant mutualists than within much older bacterial pairs.
A previous study showed that amino acid changes were influenced by AT mutational pressure to a greater extent in basal lineages in Buchnera, suggesting endosymbiont proteins were more tolerant of these presumably deleterious changes early in the association (Clark et al. 1999). By the same token, rapid rates of protein evolution in Blochmannia might reflect the younger age of this association (~30 Myr, compared to ~150–200 Myr for Buchnera). Despite an overall rate acceleration in Blochmannia, the relative levels of divergence among proteins were strikingly similar to those of Buchnera and the enterobacteria. Although different types of selection shape free-living and endosymbiotic bacteria, this observed correlation suggests parallel functional constraints across many shared proteins. A deviation from this pattern occurs in Blochmannia, where genes involved in translation are not the most conserved functional category relative to the enterics (Supplemental Table S5). We compared the ribosomal proteins that deviated from the best fit line in Figure 6 Within Blochmannia, faster divergence in the B. floridanus lineage at nearly all (~90%) proteins may reflect elevated mutation rates, reduced selective coefficients, or smaller effective population size of this symbiont and/or its host with associated increased genetic drift. Although data to distinguish these alternatives are limited, an analysis of four gene regions (groEL, rpsB, atpB, and gidA, all of which evolve faster in the B. floridanus than B. pennsylvanicus lineage) showed no consistent increase in dN/dS in B. floridanus or its close relatives (Fry and Wernegreen 2005) that would be predicted under relaxed selection or drift hypotheses. Evidence supporting the mutation hypothesis includes a lower genomic GC content for B. floridanus than B. pennsylvanicus (Table 1). Because the two Blochmannia genomes have the same set of DNA repair genes, there is no a priori reason to propose that B. 
floridanus has a faster rate of mutations per replication. However, the year-round activity of C. floridanus and its relatives in the subgenus Myrmothrix contrasts with the winter dormancy of C. pennsylvanicus and related temperate species in the subgenus Camponotus. This activity may increase the number of host and bacterial generations per year and, consequently, the rate of mutations per unit time. Likewise, a combination of elevated mutation, relaxed selection, and/or increased genetic drift in the B. floridanus lineage may account for faster rates of gene loss compared to the B. pennsylvanicus lineage (see Lawrence and Roth [1999] for theoretical framework). However, given that current data cannot distinguish points along the B. floridanus lineage at which evolutionary rates accelerated or at which particular genes were lost, at this time we cannot link these changes to specific aspects of host ecology. More intensive sampling of Blochmannia across ecologically diverse hosts should allow such connections to be made. Given the wide variation in Camponotus nutritional ecology, ranging from plant-specialists to omnivorous species, it seems unlikely that a single nutrient is lacking in the diet of all species that house Blochmannia. Rather than supplementing specific dietary deficiencies, nutritional functions of Blochmannia may play critical roles during two “starvation” phases of the host when metabolic demands exceed the available food supply—metamorphosis and colony founding (Wheeler and Martinez 1995). Recent work has shown that Blochmannia proliferate during pupation (Wolschin et al. 2004), a stage of metamorphosis when the host must construct all components of the adult body plan with no food intake (Wheeler and Martinez 1995). 
Genome sequence data provide a starting point for experimental analyses to clarify the functional significance of this mutualism, to explore the implications of genome variability on the physiology and ecology of both symbiotic partners, and to clarify the levels and timing of selection that shape this long-term bacterium–ant association.

Methods

Blochmannia genome sequencing and assembly

B. pennsylvanicus genomic DNA (gDNA) was prepared from workers and larvae of C. pennsylvanicus collected from five colonies at two sites in Falmouth, Massachusetts, USA. The gDNA was either extracted directly from the agarose plugs containing the purified bacterial cells (Charles and Ishikawa 1999) or gel-purified from a chromosomal fragment resolved through Pulsed Field Gel Electrophoresis (PFGE) (Wernegreen et al. 2002). Short (1.5–2.5 kb) insert libraries were generated from hydrosheared DNA using a double adaptor kit (SeqWright Inc.) (Andersson et al. 1996). Plasmid clones were purified and bidirectionally sequenced using BigDye v3.0 chemistry on either an ABI3700 or an ABI3730xl (Applied Biosystems). Detailed methods for library construction and sequencing are provided in the Supplemental material. Raw sequence data were analyzed by PHRED (Ewing and Green 1998; Ewing et al. 1998) (http://www.phrap.org/phredphrapconsed.html) and screened using BLASTN/X (Altschul et al. 1990) for ant host contamination. Sequence reads that were putatively identified as γ-Proteobacterial (E ≤ 1e-10) were assembled using ARACHNE 2 (Jaffe et al. 2003) (http://www.genome.wi.mit.edu/wga). The resulting contigs were analyzed by hand in CONSED (Gordon et al. 1998) and using BAMBUS (Pop et al. 2004). The B. pennsylvanicus assembly was aligned using LAGAN (Brudno et al. 2003) (http://lagan.stanford.edu/lagan_web/index.shtml) to the published B. floridanus genome (NC005061), which facilitated primer design for gap closure by PCR.
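The read-screening step described above, in which reads are retained when they have a γ-Proteobacterial BLAST hit at or below an E-value cutoff, might be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline. It assumes hits are available in the standard 12-column tab-delimited BLAST format (query ID in column 1, subject ID in column 2, E-value in column 11) and that the set of subject IDs regarded as γ-Proteobacterial is supplied by the user. The function name, the example IDs, and the 1e-10 default are assumptions.

```python
# Hypothetical read filter: keep reads with at least one hit to an accepted
# subject sequence at E <= max_evalue. Input lines are assumed to follow the
# 12-column tabular BLAST layout; nothing here reproduces the original scripts.

def reads_passing_evalue(blast_lines, allowed_subjects, max_evalue=1e-10):
    """Return the set of query (read) IDs with a qualifying hit."""
    keep = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if subject in allowed_subjects and evalue <= max_evalue:
            keep.add(query)
    return keep

# Toy input: three reads with made-up subject IDs and statistics.
blast_lines = [
    "read1\tsubjA\t98.5\t400\t6\t0\t1\t400\t1000\t1399\t1e-150\t700",
    "read2\tsubjB\t80.0\t120\t24\t0\t1\t120\t50\t169\t1e-05\t60",
    "read3\tsubjC\t99.0\t350\t3\t0\t1\t350\t10\t359\t0.0\t650",
]
allowed = {"subjA", "subjB"}  # stand-in for gamma-proteobacterial subjects

kept = reads_passing_evalue(blast_lines, allowed)
print(sorted(kept))
```

In this toy input, read2 is dropped because its E-value is too weak, and read3 is dropped because its best hit is to a subject outside the accepted set.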
Annotation and metabolic reconstruction

Open reading frames (ORFs) were identified iteratively using GLIMMER v2.10a (Delcher et al. 1999) (http://www.tigr.org/software/glimmer) and gene orthology predictions based on BLASTP sequence similarity to the NR, SWISS-PROT, and ECOLI databases; HMMER searches against Pfam_ls (Bateman et al. 2004); and identification of E. coli orthologs using the Reciprocal Sequence Distance (RSD) program (Wall et al. 2003) (details of the RSD method are noted below under “Comparison of protein divergences”). The few discrepancies among methods were limited to cases of differential loss of one gene from a pair of paralogs (ilvBG, tufAB, argFI), the presence of gene fusions (yidCD) or split genes (trpDG), or failure of RSD to identify a given ortholog because of high sequence divergence. Three pseudogenes (uvrD, yqiC, rpmD) were detected as regions with similarity to functional ORFs in other genomes, but with multiple indels and nonsense mutations resulting in stop codons throughout each gene. The two most degraded pseudogenes (rpmD and yqiC) were undetectable by BLASTX and were only identified because of the conservation of gene order between the two Blochmannia genomes. Among the three pseudogenes, uvrD retains the longest (107 amino acids) intact reading frame with similarity to functional orthologs, but even this region is just 15% of the length of UvrD in E. coli and other outgroups. In this sense, annotated pseudogenes clearly differ from truncated B. pennsylvanicus ORFs, which retain at least 60% of the length of orthologous proteins. The three B. pennsylvanicus pseudogenes also differ from genes with single frameshifts within homopolymeric regions (hisH, ubiF, ytfM, ybiS), since the latter would encode intact, relatively conserved proteins if the frameshift were corrected by slippage during transcription or translation (see Fig. 3

ORFs and RNAs were manually curated using a Generic Model Organism Database (GMOD) Web browser.
ORFs that lacked sequence similarity to any entry in GenBank or the Comprehensive Microbial Resource (Peterson et al. 2001) and lacked any predicted protein domains in Pfam_ls were excluded from the annotation. Functional and pseudo-transfer RNAs (tRNAs) were identified using tRNAscan-SE (http://selab.wustl.edu/cgibin/selab.pl?mode=software), and ribosomal RNAs and structural RNAs were identified by BLASTN searches of the intergenic regions. Blochmannia gene functions and interactions were inferred from orthologs of E. coli K12 MG1655 described in GenProtEC (Serres et al. 2004) (http://genprotec.mbl.edu) and characterized by MultiFun (Serres and Riley 2000), two resources that represent functions of ~80% of the 4401 genes in E. coli K12. Metabolic pathways were evaluated using the reference pathways available for E. coli at EcoCyc (Karp et al. 2004) and KEGG (Kanehisa and Goto 2000). Genomes of other insect mutualists were reanalyzed in the same manner for a consistent metabolic comparison. Comparison of protein divergences The RSD algorithm (Wall et al. 2003) was used to identify the reciprocal best BLAST hits (rbh) between translated ORFs of select bacterial genomes. The program used BLAST to identify potential matches of a given translated gene, aligned all potential matches using CLUSTALW (Thompson et al. 1994), and calculated a maximum likelihood estimation of amino acid substitutions between proteins using PAML (Yang 1997). Protein divergences were based on an empirical amino acid substitution rate matrix (Jones et al. 1992) and accounted for variation in evolutionary rates among protein sites using a γ distribution with shape parameter α = 1.53 (as recommended by Nei et al. 2001). The protein with the lowest divergence was then BLASTed against the first genome, followed by the alignment and divergence calculations. 
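The forward and reciprocal steps of the ortholog test described in this section can be illustrated with a small sketch. It is a simplified stand-in, not the RSD program itself. The candidate search (BLAST in the original) is replaced by a function that returns every protein in the other genome, and the maximum likelihood distance (PAML with an empirical matrix and a γ distribution) is replaced by a simple proportion of mismatched sites between equal-length toy sequences. All function names and sequences are hypothetical.

```python
# Simplified reciprocal-smallest-distance test. `candidates` and `distance`
# are pluggable stand-ins for the BLAST search and the ML distance estimate.

def p_distance(seq_a, seq_b):
    """Proportion of differing sites; a crude stand-in for an ML distance."""
    assert len(seq_a) == len(seq_b), "toy sequences must be pre-aligned"
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def all_candidates(query_seq, genome):
    """Stand-in for a similarity search: every protein is a candidate."""
    return list(genome)

def reciprocal_smallest_distance(query_id, genome_a, genome_b,
                                 candidates=all_candidates, distance=p_distance):
    """Return (ortholog_id, divergence) if the smallest-distance hit is reciprocal."""
    query_seq = genome_a[query_id]
    hits_b = candidates(query_seq, genome_b)
    if not hits_b:
        return None
    # Forward step: the genome B protein with the lowest divergence from the query.
    best_b = min(hits_b, key=lambda b: distance(query_seq, genome_b[b]))
    # Reciprocal step: search genome A with that protein.
    hits_a = candidates(genome_b[best_b], genome_a)
    if not hits_a:
        return None
    best_a = min(hits_a, key=lambda a: distance(genome_b[best_b], genome_a[a]))
    if best_a != query_id:
        return None  # not reciprocal, so no ortholog call is made
    return best_b, distance(query_seq, genome_b[best_b])

genome_a = {"a1": "MKTAYIAK", "a2": "MSLLNQRW"}
genome_b = {"b1": "MKTAYLAK", "b2": "MSLINQRW", "b3": "MKTGYLGK"}

for protein_id in genome_a:
    print(protein_id, reciprocal_smallest_distance(protein_id, genome_a, genome_b))
```

Choosing the partner by smallest estimated distance rather than by best similarity score is the feature that distinguishes this approach from a plain reciprocal best BLAST hit. The sketch keeps that choice but makes no attempt to reproduce the alignment or likelihood steps.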
If the protein match with the lowest divergence was the same as the original query sequence, the pair was considered orthologous and the divergence was retained in the output. Such comparisons were performed within endosymbiont groups: B. pennsylvanicus versus B. floridanus; Buchnera–A. pisum versus Buchnera–S. graminum; and Buchnera–A. pisum versus Buchnera–Baizongia pistaciae. Divergences in endosymbionts were compared to the enterobacterial pairs E. coli versus S. typhimurium, and E. coli versus P. luminescens. Genomes were downloaded from NCBI in June 2004. All genomes were compared to E. coli using RSD for ortholog detection and MultiFun-based functional assignments (detailed in Supplemental material). Rates of protein divergence along the lineages leading to B. pennsylvanicus and B. floridanus were compared using E. coli as an outgroup. Relative rates were calculated as follows, with “B0” representing the common ancestor of the two Blochmannia lineages:

Acknowledgments

We thank Andrew McArthur, Margrethe Serres, Seth Bordenstein, Seth Kauppinen, and four anonymous reviewers for comments on the manuscript, and are grateful to Daniel Hahn, Diana Wheeler, Stefan Cover, and Diana Davidson for helpful discussion of Camponotus biology. Amy McCurley and Josh Larson contributed to the library construction, and Jeremy Brozek confirmed the B. pennsylvanicus hisH sequence. We also thank S. Kauppinen for assistance in aligning and analyzing pseudogene sequences. Andrew McArthur, Margrethe Serres, Michael Cipriano, Sulip Goswami, Hilary Morrison, and Matt Beverly provided intellectual and technical support with genome assembly and annotation. Alessandro Romualdi provided software and support for the JENA Prokaryotic Genome Viewer, Dennis P. Wall provided access to RSD, and D. Wall and Richard Fox offered assistance implementing this program. We thank Brian Dale for permission to collect C. pennsylvanicus for this study. This research was supported by grants to J.J.W. from NIH (R01 GM62626-01) and the NSF (DEB 0089455). Additional support was provided by the NASA Astrobiology Institute (NCC2-1054 and NNA04CC04A). Template preparation and DNA sequencing were performed in the W.M. Keck Facility for Ecological Genomics at the Marine Biological Laboratory.

Notes

[Supplemental material is available online at www.genome.org. The complete, annotated genome sequence has been submitted to GenBank under accession no. CP000016.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3771305.