Proc Natl Acad Sci U S A. Nov 11, 2008; 105(45): 17516–17521.
Published online Nov 5, 2008. doi:  10.1073/pnas.0802782105
PMCID: PMC2579889

Metagenome analysis of an extreme microbial symbiosis reveals eurythermal adaptation and metabolic flexibility


Hydrothermal vent ecosystems support diverse life forms, many of which rely on symbiotic associations to perform functions integral to survival in these extreme physicochemical environments. Epsilonproteobacteria, found both free-living and in intimate associations with vent invertebrates, are the predominant vent-associated microorganisms. The vent-associated polychaete worm Alvinella pompejana hosts a visibly dense fleece of episymbionts on its dorsal surface. The episymbionts are a multispecies consortium of Epsilonproteobacteria present as a biofilm. We unraveled details of these enigmatic, uncultivated episymbionts using environmental genome sequencing. They harbor wide-ranging adaptive traits, including high levels of strain variability analogous to that of epsilonproteobacterial pathogens such as Helicobacter pylori, the metabolic diversity of free-living bacteria, and numerous orthologs of proteins that we hypothesize are each optimally adapted to specific temperature ranges within the 10–65 °C fluctuations characteristic of the A. pompejana habitat. This strategic combination enables the consortium to thrive under diverse thermal and chemical regimes. The episymbionts are metabolically tuned for growth in hydrothermal vent ecosystems, with genes encoding the complete reductive tricarboxylic acid (rTCA) cycle, sulfur oxidation, and denitrification; the episymbiont metagenome also encodes capacity for heterotrophic and aerobic metabolism. Analysis of the environmental genome suggests that A. pompejana may benefit from the episymbionts as a stable source of food and vitamins. The success of Epsilonproteobacteria as episymbionts in hydrothermal vent ecosystems is a product of adaptive capabilities, broad metabolic capacity, strain variance, and virulence traits shared with pathogens.

Keywords: Epsilonproteobacteria, hydrothermal vent

The polychaete Alvinella pompejana is endemic to deep-sea hydrothermal vents on the East Pacific Rise, from 21°N to 32°S latitude (1). This tube-dwelling polychaete forms dense colonies exclusively on the walls of high-temperature black smoker chimneys (2, 3), which are characterized by extreme physicochemical gradients, variable thermal emission rates, and intense mineral precipitation. The high-temperature diffuse flow surrounding the worms' tubes is acidic (pH 4.2–6.1) and carries high levels of total (free + complexed) hydrogen sulfide (>1 mM), ammonia (3.8–10 μM), and reactive heavy metals (0.3–200 μM), including ferrous iron (290–840 μM) (2). Temperatures within the tubes themselves fluctuate from 29 °C to 84 °C, while the chemical conditions are anoxic, slightly acidic to near neutral (pH 5.33–6.9), and rich in electron acceptors (sulfate, nitrate, Fe(III), and Mg) as well as potentially lethal levels of heavy metals (2, 3). The tube fluids contain surprisingly low levels of free H2S (<0.2 μM to 46.53 μM) and are a mix of ambient seawater (72–91%) supplemented with vent-derived emissions (3, 4). An analysis of the thermal tolerance of a structural protein biomarker (5) supports the assertion that A. pompejana is likely among the most thermotolerant and eurythermal metazoans on Earth (6, 7).

A. pompejana is characterized by a filamentous microflora that forms cohesive hair-like projections from mucous glands lining the polychaete's dorsal intersegmentary spaces (8). The episymbiont community is constrained to the bacterial class Epsilonproteobacteria (9), which is widely represented in hydrothermal vent ecosystems: on surfaces, free-living, in symbioses with invertebrates, and in sediments and the shallow subsurface (10, 11). The distribution of the 2 dominant members (5A and 13B) of this epsilonproteobacterial consortium (9) appears highly structured along each hair-like projection. The community was reported to comprise 10 to 15 phylotypes (related at ≥99% SSU rRNA gene identity), of which >98% were Epsilonproteobacteria (9). Most described host–symbiont associations are either monospecific [e.g., Euprymna squid–Vibrio fischeri (12)] or, more often, divergent multispecies associations [e.g., termites (13) and the marine oligochaete Olavius algarvensis (14)]. However, population-level variation has been observed in some symbioses, such as the marine sponge Axinella mexicana, which is associated with 2 dominant crenarchaeotal populations (15), and the gammaproteobacterial endosymbionts of the vent tubeworm Riftia pachyptila (16). The diverse yet phylogenetically constrained episymbionts of A. pompejana stand in stark contrast to a vent-associated shrimp from the Mid-Atlantic Ridge, Rimicaris exoculata, which has established a more conventional single-species association with an epsilonproteobacterium (11). Although the A. pompejana episymbionts have eluded isolation, molecular approaches have revealed that 2 of the dominant epsilonproteobacterial phylotypes are capable of chemolithoautotrophic growth via the reductive TCA cycle (17). Until recently, chemoautotrophic carbon fixation in vent microbial communities was believed to proceed solely via the Calvin–Benson cycle (17).

We performed an environmental genomic analysis of the A. pompejana episymbiont community to define the basic metabolic strategy of the association and to elucidate why Epsilonproteobacteria are so successful in this environment. We hypothesized that, under the extreme constraints imposed by the local geochemical environment, the episymbiotic epsilonproteobacterial consortium employs a core metabolic strategy shared by most members of the community, and that geochemical and thermal fluctuations impose significant selection pressures on the community. Identification of the metabolic and cellular features of this microbial community has provided details about the relationships between the symbionts and their host, as well as the nature of epibiont adaptation in a dynamic physicochemical environment.

Results and Discussion

Metagenome Characteristics and Comparisons.

The primary metagenomic library (97% of the sequence generated) was prepared from a single worm collected at one of the 9°N hydrothermal vent sites. Sequences from 2 other genomic libraries, generated from pools of 4 and 3 worms, respectively (9,061 sequences), were included in the analysis because their community composition was similar to that of the primary dataset; this added approximately 4.5 Mb of DNA to aid in sequence assembly. The highly variable physical and chemical features of the worm habitat (e.g., temperature ≈40–70 °C; supporting information (SI) Table S1) are characteristic of this ecosystem and similar to those measured previously (2, 3).

At least 27 highly related strains of bacteria (and no archaea) were detected from V1–V2 SSU rRNA gene sequences in the A. pompejana episymbiont metagenome (hereafter EM). Two clusters of sequences were related to the previously described ribotypes 5A and 13B (9). These clusters dominate the consortium (35% and 30%, respectively, based on the V1–V2 aligned sequences) (Fig. S1). The shotgun metagenome SSU rRNA gene sequences were very similar (82% of the sequences at >98% identity) to those of a PCR SSU rRNA gene clone library derived from the same DNA (Fig. S2). The PCR library sampled the community more deeply and revealed greater population diversity than the EM data alone (Fig. S3). Rarefaction analysis of protein-coding genes ranging from conserved (rpS12) to variable (recA) (Fig. 1) identified 6 and 31 distinct sequences, respectively.

Fig. 1.
Rarefaction curves for SSU rRNA genes and selected proteins in the EM. Rarefaction curves were determined for the SSU rRNA gene (216 bases, 79 sequences at 99% similarity) and the following proteins: recombinase subunit A (recA, 280 bases, 43 sequences ...
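The authors' exact rarefaction procedure is given in SI Methods. As a hedged illustration only, the sketch below uses Hurlbert's analytical rarefaction formula, one standard way to compute such curves: the expected number of distinct sequence types in a random subsample of n reads is E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the number of reads assigned to type i and N is the total. The function names and input counts are hypothetical.

```python
from math import comb


def expected_richness(counts: list[int], n: int) -> float:
    """Expected number of distinct sequence types in a random subsample of n reads,
    drawn without replacement (Hurlbert's analytical rarefaction)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the total read count")
    denom = comb(total, n)
    # comb(a, b) returns 0 when b > a, so abundant types are always counted once n is large.
    return sum(1 - comb(total - c, n) / denom for c in counts)


def rarefaction_curve(counts: list[int], step: int = 1) -> list[tuple[int, float]]:
    """(subsample size, expected richness) pairs from 1 read up to the full sample."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(1, total + 1, step)]


# Hypothetical reads per distinct sequence type (e.g., recA variants after clustering).
curve = rarefaction_curve([10, 5, 3, 1, 1], step=5)
```

A curve that is still rising at full sample size, as expected for a variable gene such as recA, indicates that deeper sampling would likely reveal further variants.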

The EM dataset consisted of 300,607 high-quality reads, which were assembled into short contigs at 95% nucleotide identity (Table S2). Attempts to bin either the raw or the assembled sequence data by G+C content were unsuccessful because the data were almost uniformly distributed around the 35% average (Fig. S4). The assembled contigs averaged 1,462 bp in length and totaled 38.5 Mb. The total dataset (68 Mb), including singletons, contained 128,412 sequences, from which 103,371 proteins were predicted. Forty-one percent of these predicted proteins could be assigned to a cluster of orthologous groups (COG). Given the complexity of the dataset, the amino acid sequences were clustered at 40% identity to simplify data analysis and to help establish a nonredundant core episymbiont metagenome (hereafter EM-C) (Fig. S5). The EM-C was defined as the 2,812 clusters (of 36,221) containing more than 5 amino acid sequences; these clusters comprised ≈60% of the total predicted proteins.
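G+C binning fails when sequences from different strains fall into the same narrow G+C range. A minimal sketch of the idea, assuming a simple fixed-width histogram (the paper does not specify its binning scheme; the function names are hypothetical):

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percent G+C of a nucleotide sequence, ignoring non-ACGT characters such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs: list[str], bin_width: float = 1.0) -> Counter:
    """Count sequences per G+C bin (keyed by the bin's lower edge, in percent).
    A single dominant peak means G+C alone cannot separate the source genomes."""
    return Counter(int(gc_content(s) // bin_width) * bin_width for s in seqs)
```

For a consortium of closely related Epsilonproteobacteria, most reads would land in a few adjacent bins near 35%, leaving no separable modes.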

The EM-C dataset was compared with Epsilonproteobacteria genomes from pathogens (Helicobacter pylori J99, Campylobacter jejuni 11168, and Helicobacter hepaticus ATCC 51449) and free-living organisms (Sulfurovum sp. NBC37–1, Nitratiruptor sp. SB155–2, and Sulfurimonas denitrificans ATCC 33889). The EM-C contained most of the COGs found in both pathogenic and free-living Epsilonproteobacteria. The pathogen genomes contained, on average, 1,014 different COGs, of which ≈80% are shared among these genomes and 72% are found in the EM-C. The free-living genomes contained, on average, 1,174 different COGs, of which ≈82% are shared and 72% are found in the EM-C. There are 598 predicted proteins associated with COGs common to all 3 datasets. The distributions of COGs among the EM-C, free-living, and pathogen Epsilonproteobacteria genomes were compared (Fig. 2).
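The overlap figures above are set comparisons between COG inventories. A minimal sketch, assuming each genome (and the EM-C) is represented as a set of COG identifiers; the COG assignments and function names below are made up for illustration, and the paper's count of 598 refers to predicted proteins mapped to the shared COGs rather than to the COGs themselves:

```python
def fraction_in(query: set[str], reference: set[str]) -> float:
    """Fraction of the COGs in `query` that also occur in `reference`."""
    return len(query & reference) / len(query) if query else 0.0


def shared_core(genomes: dict[str, set[str]]) -> set[str]:
    """COGs present in every genome of a group (empty set for an empty group)."""
    groups = list(genomes.values())
    return set.intersection(*groups) if groups else set()


def mean_fraction_in(genomes: dict[str, set[str]], reference: set[str]) -> float:
    """Average over genomes of the fraction of each genome's COGs found in `reference`."""
    return sum(fraction_in(cogs, reference) for cogs in genomes.values()) / len(genomes)


# Made-up COG assignments, for illustration only.
pathogens = {
    "pathogen_1": {"COG0001", "COG0002", "COG0003"},
    "pathogen_2": {"COG0001", "COG0002", "COG0004"},
}
free_living = {
    "free_living_1": {"COG0001", "COG0002", "COG0005"},
    "free_living_2": {"COG0001", "COG0005", "COG0006"},
}
em_c = {"COG0001", "COG0003", "COG0005"}

# COGs shared by the pathogen core, the free-living core, and the EM-C.
three_way = shared_core(pathogens) & shared_core(free_living) & em_c
```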

Fig. 2.
COG distributions in the EM-C dataset compared with 3 pathogenic epsilonproteobacteria (bold, capitalized letters) and 3 free-living bacteria (lowercase letters). The letters indicate the specific COG role category.

The distribution of predicted proteins among COG functional classes was similar to that of other Epsilonproteobacteria, including an over-representation of translation COGs relative to transcription COGs (5.6% vs. 1.5% in the EM-C, 5.2% vs. 2.6% in Epsilonproteobacteria, and 3.7% vs. 4.9% across all bacterial genomes). The relatively high frequency of translation COGs reflects the small genome sizes of Epsilonproteobacteria: the basic translation machinery required by any microorganism is roughly constant, so it makes up a larger fraction of a small genome. However, pathogenic epsilonproteobacterial genomes contain fewer translation COGs than free-living genomes and the EM-C; this difference results from multiple copies of both putative methionyl-tRNA formyltransferase and putative rRNA methylase genes in the free-living genomes. The low number of genes in transcription COGs could result from a combination of alternative gene expression regulators such as guanosine 3′-diphosphate 5′-diphosphate (ppGpp), high levels of environmental adaptation, and lack of competition (18). Normalized to genome size, the Epsilonproteobacteria have approximately 4-fold fewer transcriptional regulators than E. coli.
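The percentages above are category frequencies among COG-assigned proteins, and the comparison with E. coli requires normalizing counts to genome size. A minimal sketch, assuming one-letter COG category codes (J, translation; K, transcription) and normalization per 1,000 genes; the function names and inputs are hypothetical:

```python
from collections import Counter


def category_percentages(assignments: list[str]) -> dict[str, float]:
    """Percent of COG-assigned proteins in each one-letter COG functional category."""
    if not assignments:
        raise ValueError("no COG assignments given")
    counts = Counter(assignments)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def per_thousand_genes(count: int, genome_genes: int) -> float:
    """Normalize a gene count (e.g., transcriptional regulators) to genome size."""
    return 1000.0 * count / genome_genes


# Hypothetical category calls for a handful of proteins.
shares = category_percentages(["J", "J", "K", "M"])
translation_to_transcription = shares["J"] / shares["K"]
```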

Signal transduction COGs (COG category T) were more abundant in free-living Epsilonproteobacteria (4.5%) and the EM-C (2.8%) than in pathogens (1.3%), possibly a consequence of the stochastic environments experienced by both the free-living organisms and the episymbionts. The complexity of the cell wall and membrane biosynthesis pathways (COG category M) in the episymbionts, compared with other Epsilonproteobacteria, is reflected in the abundance of genes in this category: it is the most represented COG category, accounting for 8% of the EM-C. The most abundant class of proteins in the entire dataset, however, consisted of putative replication initiator proteins associated with plasmids (Table S3). Efforts to detect plasmid DNA in episymbiont genomic DNA preparations from several A. pompejana samples were unsuccessful; however, tetranucleotide analysis identified contigs composed of unidentifiable DNA and long stretches of hypothetical proteins, some of which may be plasmid in origin (Fig. S6). Not surprisingly for host-attached cells, the EM-C has fewer proteins associated with the cell motility COG (<1%) than either the free-living epsilons (2%) or the pathogens (3.5%) (Fig. 2 and Fig. S6). In the EM-C, these proteins are associated with a type II secretory pathway (Table S4), methyl-accepting chemotaxis, and cytolysin. Unlike the EM-C (Fig. 3) and the free-living epsilons, the pathogens are unable to synthesize all amino acids. COGs found only in pathogens include some outer membrane proteins, urease accessory proteins, and proteins involved in the type IV secretory pathway (Fig. S7).
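Tetranucleotide analysis compares contigs by their 4-mer usage, which tends to be genome-specific. A minimal sketch, assuming raw (not reverse-complement-merged) 4-mer frequencies compared with a Pearson correlation; the paper's exact variant is not stated, and the names here are hypothetical:

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so frequency vectors are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide; windows containing non-ACGT are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in TETRAMERS]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

Contigs with highly correlated profiles can then be grouped; outliers with unusual profiles are candidates for plasmids or genomic islands.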

Fig. 3.
Model of predicted metabolic processes in the episymbiont cell based on annotation of the EM-C. The KEGG database was used as a basis for metabolic reconstruction.

The average G+C content of the pathogen and free-living genomes is 35.4% and 39.4%, respectively, and that of the EM is 35%. There were no interpretable differences in the average pI of the predicted proteomes, nor was pI correlated with genome G+C content, as has been shown for free-living and endosymbiotic Gammaproteobacteria (19).
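A protein's pI is the pH at which its net charge is zero, which can be estimated from side-chain and terminal pKa values with the Henderson–Hasselbalch equation. A minimal sketch, assuming an EMBOSS-like pKa scale (other scales give somewhat different values; the software the authors used is not stated):

```python
# Approximate pKa values (EMBOSS-like scale, an assumption); positive and negative groups.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1 / (1 + 10 ** (ph - PKA_POS["Nterm"]))      # free amino terminus
    charge -= 1 / (1 + 10 ** (PKA_NEG["Cterm"] - ph))     # free carboxyl terminus
    for aa in seq.upper():
        if aa in PKA_POS:  # single residues never match the "Nterm" key
            charge += 1 / (1 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1 / (1 + 10 ** (PKA_NEG[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Averaging `isoelectric_point` over each predicted proteome gives the proteome-level pI values compared here.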

Branch lengths in a phylogeny of concatenated ribosomal proteins (5,235 aa, gaps removed) (Fig. S8) are similar between most Epsilonproteobacteria and the EM-C, with the exception of Nitratiruptor sp. This similarity suggests that the evolutionary forces acting on pathogens (lack of repair genes, high population heterogeneity) similarly affect free-living organisms in the unique hydrothermal vent environment, including the episymbionts of A. pompejana. The ribosomal protein phylogeny also suggests that the EM-C is more similar to other free-living Epsilonproteobacteria than to the pathogens, and that it is most closely related to Sulfurovum sp. NBC37–1. This relationship is also apparent in a maximum-likelihood ribosomal RNA phylogeny (Fig. S3).
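Building the concatenated matrix involves joining per-protein alignments for the same taxa and then dropping gapped columns. A minimal sketch, assuming each alignment is a mapping from taxon name to aligned sequence (e.g., ClustalW output already parsed) and that "gaps removed" means deleting every column with a gap in any taxon; the function names are hypothetical:

```python
def concatenate_alignments(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-protein alignments end to end; only taxa present in every alignment are kept."""
    taxa = set.intersection(*(set(aln) for aln in alignments))
    for aln in alignments:
        if len({len(aln[t]) for t in taxa}) > 1:
            raise ValueError("rows within one alignment differ in length")
    return {t: "".join(aln[t] for aln in alignments) for t in sorted(taxa)}


def strip_gap_columns(aln: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Remove every alignment column that contains a gap in any taxon."""
    keep = [i for i, column in enumerate(zip(*aln.values())) if gap not in column]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in aln.items()}


# Hypothetical two-protein example; taxon "C" is dropped because it lacks the first protein.
matrix = strip_gap_columns(concatenate_alignments([
    {"A": "MK-L", "B": "MKQL"},
    {"A": "VV", "B": "V-", "C": "VV"},
]))
```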

Eurythermalism in the Episymbiont Proteome.

Ecoparalogs are functionally equivalent genes within a single genome that are adapted to different ecological niches (20). Although the complexity of our dataset precluded co-locating homologs in a single genome, numerous divergent gene variants were found. We hypothesize that selection pressures in the hydrothermal vent environment have disproportionate effects on amino acid usage and protein structure, resulting in sequence variability like that found in the EM groEL and recA sequences (Fig. S9).

Predicted structural differences between EM-C gene variants (2 to 7 per protein) and orthologs from other Epsilonproteobacteria were compared for the following proteins: GroES, GroEL, glutamate dehydrogenase (GDH), and 3 proteins of the 50S ribosomal subunit (L1, which is important in rRNA binding; L11, which is involved in the elongation step of translation; and L19, which binds to L14 and helps stabilize the central region of the large subunit). There were 7 GroES variants sharing between 60% and 95% amino acid identity, and 5 L19 variants sharing between 60% and 91% amino acid identity. The variants of every protein considered in this analysis have amino acid differences that could be inferred to confer different thermal optima, including differences in Arg and Lys usage and in the number of salt bridges (0 to 5) (21, 22). Recent experimental work (23) determined in vitro kinetic parameters for 2 EM enzymes, GDH and isopropylmalate dehydrogenase, providing the first experimental evidence of the wide range of thermodynamic parameters found in the EM.
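Identity ranges such as "between 60% and 95%" come from all pairwise comparisons among the aligned variants of a protein. A minimal sketch, assuming pre-aligned sequences and identity computed over columns where neither sequence has a gap (identity definitions vary, so values may differ from the authors'); the variant sequences are hypothetical:

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_range(variants: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity among aligned gene variants."""
    values = [percent_identity(variants[p], variants[q]) for p, q in combinations(variants, 2)]
    return min(values), max(values)


# Hypothetical aligned fragments of three variants of one protein.
low, high = identity_range({"v1": "MKLV", "v2": "MKIV", "v3": "MAIV"})
```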

Comparison of the predicted structures of proteins encoded by EM-C gene variants with those of orthologs from Sulfurovum sp., H. pylori J99, and an unrelated thermophile, Thermus thermophilus, yielded several different structures (Fig. 4): one variant aligned with a thermophile structure and another with a mesophile structure. In addition, EM L19 ribosomal proteins had a number of amino acid substitutions (especially Lys and Arg) (Fig. S10). Differences in amino acid usage and predicted structure may broaden the thermal range over which episymbiont proteins remain stable and functional, which could provide a competitive advantage in vent ecosystems. The SSU rRNA data, which show a diverse population of closely related strains, together with the gene variant data, suggest that the episymbionts comprise populations with different thermal and chemical optima. Epsilonproteobacteria share relatively large core gene sets despite low degrees of synteny and high levels of genome rearrangement (24), a feature that likely contributes to their success and diversity. There is also evidence that the host, A. pompejana, is well adapted to these temperature fluctuations (3, 6, 7).

Fig. 4.
Homology-based 3D models of the L19 ribosomal protein. Proteins are from T. thermophilus, H. pylori J99, and Sulfurovum sp. (A) and from 2 ecoparalogs in the EM (B). Each amino acid is mapped onto the structure based on its BLOSUM60 substitution probability ...

Amino acid usage was also examined in 104 EM-C proteins whose lengths were within 5% of those of their H. pylori orthologs (average length 308 aa). On average (including gene variants, which increase statistical variation), episymbiont proteins had a more acidic pI (7.7 ± 2.4 vs. 8.2 ± 2.1; t test, P < 0.001), were slightly more hydrophilic (grand average of hydropathicity, −0.3 ± 0.3 vs. 0.24 ± 0.35; t test, P < 0.01; as in ref. 25), and used significantly more Asp, Gly, Ile, Arg, and Thr than the pathogen proteins. The pathogen proteins used more Ala, Cys, His, Leu, and Gln (t test, P < 0.01). These results suggest that, on average, the EM-C has an amino acid usage profile more consistent with thermophily than that of its mesophilic epsilonproteobacterial relatives; for example, thermophiles have repeatedly been shown to use more Arg and less Gln and His (21, 26).
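The grand average of hydropathicity (GRAVY) is the mean Kyte–Doolittle hydropathy value over all residues, and amino acid usage is the percentage of each residue type. A minimal sketch of both measures (function names are hypothetical; residues outside the 20 standard amino acids are ignored):

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity; negative values indicate a more hydrophilic protein."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KD[aa] for aa in residues) / len(residues)


def composition(seq: str) -> dict[str, float]:
    """Percent usage of each of the 20 standard amino acids (e.g., to compare Arg vs. Gln)."""
    residues = [aa for aa in seq.upper() if aa in KD]
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in sorted(KD)}
```

Per-protein values computed this way for EM-C proteins and their H. pylori orthologs could then be compared with a t test, as in the text.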

Foreign DNA, Defenses, and Virulence in the EM.

Evidence of foreign DNA further supports our hypothesis that these episymbionts are uniquely adapted to this environment because they retain the ability to continually update and modify their genomic content, as reported for numerous H. pylori populations with flexible (plastic) genomes (e.g., refs. 24 and 27). Abundant plasmid replication proteins (Table S3), a plasmid-encoded colicin V-related protein, multiple integrases, and transposases associated with IS elements are present in the episymbiont core, including several whose sequences are unique relative to the other 6 epsilonproteobacterial genomes used in our analyses (Table S4). Integrases and associated gene cassettes investigated in other vent-associated systems (28) were found to be diverse; together with our observations, this suggests that mobile DNA may play a major role in microbial evolution and adaptation in vent environments. Several genes encoding pili-related proteins, whose orthologs function in type II secretion and exogenous DNA uptake, were identified. The competence pathway of the episymbionts is unknown; however, a complete pathway using type IV secretion, as reported for H. pylori (29), was not found. Tetranucleotide clustering of EM contigs >5 kb identified putative island-associated regions containing long stretches of hypothetical proteins (Fig. S5), further supporting the claim that genomic islands, plasmids, and phage-like elements are significant in this consortium. Furthermore, nucleotide clustering [using D2 cluster (30)] revealed significant amounts of repetitive DNA, which is likely associated with genomic islands and with serine/aspartic acid-rich regions of adhesion factors, as in other epsilonproteobacterial genomes (31).
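Word-count comparisons of the kind D2 cluster builds on score two sequences by how their short-word (k-mer) counts differ, without requiring an alignment. A minimal sketch, assuming the d2 distance is the squared Euclidean distance between k-word count vectors; the actual D2 cluster program also compares fixed-size windows and applies clustering thresholds, which are omitted here, and the function names are hypothetical:

```python
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Counts of every overlapping k-letter word in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(x: str, y: str, k: int = 6) -> int:
    """Sum over all k-words of the squared difference in word counts (0 = identical word usage)."""
    cx, cy = word_counts(x, k), word_counts(y, k)
    # Counter returns 0 for absent words, so the union of keys covers every word seen.
    return sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys())
```

Because repeated motifs inflate specific word counts, sequences sharing repetitive elements score as close under this measure even when they are otherwise rearranged.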

Bacterial defense and virulence traits were also well represented in the EM-C. Several genes were found to encode antimicrobial peptide transport, multidrug transport, and type I and type II restriction endonuclease systems, with homology to those of other Epsilonproteobacteria as well as other bacteria. Numerous virulence genes identified in other Epsilonproteobacteria genomes were detected in the EM-C; the prevalence of virulence genes in the episymbiont core, and of types unique to it, is noteworthy (Table S4).

Energy Metabolism and Carbon Utilization.

A versatile suite of carbon-processing and energy-generating capabilities is present in the epsilonproteobacterial episymbiont consortium. The EM-C encodes complete pathways for the reductive TCA cycle, denitrification (via Nap, NirS, Nor, and Nos), and sulfur oxidation (via the Sox system), similar to other vent-associated epsilonproteobacterial genomes, including those of S. denitrificans, Sulfurovum sp. NBC37–1, and Nitratiruptor sp. SB155–2 (Fig. 3). The EM-C also contains homologs of flavocytochrome c sulfide dehydrogenase, suggesting elemental sulfur and/or polysulfide production, and the enzymes required for direct sulfite oxidation via APS reductase and ATP sulfurylase. In addition, α- and β-subunits of polysulfide reductase were detected; among Epsilonproteobacteria genomes, these subunits are present only in Sulfurimonas denitrificans (32) and Wolinella succinogenes (33). Genes encoding the Dsr and Qmo redox complexes [for dissimilatory sulfate reduction (34)] were not detected. The diverse suite of sulfur-cycling enzymes in the EM-C also allows the metabolism of various sulfur intermediates (including thiosulfate), enabling resistance to pH fluctuations in this environment.
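Calling a pathway "complete" or "not detected" amounts to checking annotated genes against the components a pathway requires. A minimal sketch, assuming pathways are defined as sets of gene labels; the gene sets below use only the labels named in the text and are illustrative, whereas real reconstructions (e.g., based on KEGG, as in Fig. 3) list many more components:

```python
# Hypothetical pathway definitions built only from labels named in the text.
PATHWAYS = {
    "denitrification": {"nap", "nirS", "nor", "nos"},
    "dissimilatory sulfate reduction": {"dsr", "qmo"},
}


def completeness(detected: set[str], required: set[str]) -> tuple[float, set[str]]:
    """Fraction of required components detected, plus the set of missing components."""
    missing = required - detected
    return 1.0 - len(missing) / len(required), missing


# Hypothetical annotation hits for a metagenome.
detected = {"nap", "nirS", "nor", "nos", "sox"}
report = {name: completeness(detected, genes) for name, genes in PATHWAYS.items()}
```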

Glycolysis, the phosphotransferase system (PTS), and the pentose phosphate pathway are more complete than in other Epsilonproteobacteria genomes; for example, phosphorylase and phosphoglucomutase are present, suggesting the potential for glycogen formation and utilization, although ribose-5-phosphate isomerase appears to be missing. The episymbionts are intimately associated with mucopolysaccharide; however, genes encoding metalloendopeptidase, β-glucosidase, xylanase, and a chitin deacetylase were found only in low abundance. Two essential enzymes of the Entner–Doudoroff pathway (6-phosphogluconate dehydratase and 2-keto-3-deoxy-6-phosphogluconate aldolase) are absent from both the episymbiont dataset and the other free-living Epsilonproteobacteria genomes. They are present, however, in the pathogen genomes, indicating that free glucose can be used as an energy source. The EM-C encodes numerous enzymes for pyruvate metabolism, although the most prevalent is the biosynthetic enzyme pyruvate synthase, consistent with carbon fixation via the rTCA cycle. A portion of the episymbiont consortium likely generates energy via pyruvate synthesis from glucose, as all of the enzymes in the pathway are present (Table S3). Although we detected all of the enzymes needed to run the TCA cycle in the oxidative direction, sequences encoding ATP citrate lyase subunits were much more abundant (at least 8:1) than those with homology to citrate synthase. Use of the TCA cycle varies among Epsilonproteobacteria: C. jejuni genomes encode a complete oxidative TCA cycle, whereas both H. pylori genomes and H. hepaticus appear to use a branched pathway in which part of the cycle operates in the reductive direction. The EM-C also encodes 2 fumarate reductases, which may allow fumarate to serve as an alternative electron acceptor, as suggested for other Epsilonproteobacteria genomes (32); this enzyme is also required in the rTCA cycle.
The EM-C encodes both a group I membrane-bound Ni-Fe hydrogenase for H2 uptake (HypF, A, E, D, C, and B) and a group II H2-sensing Ni-Fe hydrogenase (HupV and HupS), similar to S. denitrificans, although it lacks the H2-evolving function found in Sulfurovum sp. and Nitratiruptor sp. (35). Predicted proteins for several NADH dehydrogenases and 2 cytochrome oxidases are also found in the EM dataset. A flavoprotein sharing 30% sequence similarity with putative FixD proteins from several Sulfolobus species is unique among epsilonproteobacterial genomes. Despite being phylogenetically constrained, the epsilonproteobacterial consortium is metabolically versatile, consistent with other Epsilonproteobacteria (32, 35) and with the oligochaete endosymbiotic system (14).

Relationship Between the Host and Episymbiont.

Although these analyses focused on the episymbionts, we suspect that the host benefits nutritionally from this unique symbiosis; indeed, we observed worms eating the fleece off the backs of their neighbors (Movie S1). Because the episymbionts synthesize all of their amino acids, the nutritional benefits may extend beyond a rich source of reduced carbon. The episymbionts could also provide a dietary source of vitamin B6, including pyridoxal 5′-phosphate (PLP), the active form of vitamin B6 and an important cofactor in amino acid metabolism (36). Genes for both de novo PLP synthesis pathways are present, suggesting that the EM can synthesize PLP from D-erythrose 4-phosphate (a pentose phosphate pathway intermediate) and glyceraldehyde 3-phosphate (a glycolytic intermediate). This capability distinguishes the EM from all other sequenced Epsilonproteobacteria. Retention of both routes again suggests that the organisms rely on different modes of metabolism for sustained periods, depending on oxygen availability and the fluctuating redox potential of the environment. Nonetheless, given the location and structure of the episymbiont community on the worm and its presence on all worms collected throughout the host's known geographic range (6,000 km), the partnership is clearly favored.


Conclusions

Analyses of the metagenome and SSU rRNA sequence data from an A. pompejana episymbiont consortium revealed a multispecies community composed exclusively of closely related Epsilonproteobacteria that are related to a sequenced, vent-associated, free-living bacterium, Sulfurovum sp. NBC37–1. Predicted proteins clustered at 40% amino acid identity formed a core dataset representing ≈60% of the sequences. Many clusters consisted of multiple sequence variants that, based on amino acid usage and structural predictions, are suspected to differ in thermal optima. Metabolic reconstruction revealed numerous pathways (e.g., rTCA, denitrification, and sulfur oxidation), and analysis of multiple homologous protein structures confirms that the episymbiont consortium has the capability to thrive in, and mediate, the intense physicochemical conditions of the hydrothermal vent environment. In addition, multiple vitamin and amino acid biosynthetic pathways, together with the persistence and prevalence of the symbiosis across the geographic range of A. pompejana, suggest that the host may benefit from the autotrophic biosynthetic capabilities of the episymbionts. The episymbionts share pathogen and biofilm characteristics that facilitate their success: like their pathogenic epsilonproteobacterial relatives, they harbor the genetic armory required to evade a host response through cell-surface modifications, mobile DNA, and other flexible gene pool features (31, 37).


Methods

Collection, DNA Extraction, and Screening.

A. pompejana specimens, in their associated tubes, were collected on the East Pacific Rise by the deep-submergence research vehicle Alvin in 1994 (13°N), 1999 (9°N), 2001 (9°N), and 2002 (9°N). Before collection, the temperature of the worms' immediate environment was measured, and discrete water samples were taken from within the tubes for geochemical analysis (3). Individual alvinellids were collected, brought to the surface, and frozen (17, 38). The epibionts were removed with sterile forceps and placed in a Tris-SDS-proteinase K lysis buffer for 1 h. Nucleic acids were then recovered using a cetyltrimethylammonium bromide extraction protocol (17), and RNA was removed with RNace-It (Stratagene).

Semiquantitative PCR with universal primers against the actin gene (Applied Biosystems) was used to estimate the extent of eukaryotic contamination before library construction (17). The percentage of eukaryotic DNA was estimated by calculating the approximate number of A. pompejana genome equivalents (haploid genome content, 0.8 pg or 782 Mb) relative to the total amount of DNA (39). The samples were analyzed by denaturing gradient gel electrophoresis (DGGE) of the V3 region of the 16S rRNA gene (40), and samples were chosen for library construction if, based on DGGE analysis, they represented the majority of the episymbiont population.
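The contamination estimate is a unit conversion: host genome equivalents times the haploid genome mass, divided by the total DNA mass. A minimal sketch, assuming the genome-equivalent count is already available from the actin PCR and using the standard conversion of roughly 978 Mb per pg of DNA (which reproduces the 782 Mb quoted for 0.8 pg); the function names and example inputs are hypothetical:

```python
PG_PER_HAPLOID_GENOME = 0.8  # A. pompejana haploid genome content, from the text (39)
MB_PER_PG = 978.0            # standard conversion: about 978 Mb of DNA per picogram


def haploid_genome_mb() -> float:
    """Haploid genome size in megabases implied by the genome mass."""
    return PG_PER_HAPLOID_GENOME * MB_PER_PG


def eukaryotic_dna_percent(genome_equivalents: float, total_dna_ng: float) -> float:
    """Percent of total DNA contributed by host genomes (1 ng = 1,000 pg)."""
    eukaryotic_pg = genome_equivalents * PG_PER_HAPLOID_GENOME
    return 100.0 * eukaryotic_pg / (total_dna_ng * 1000.0)


# Hypothetical example: 1,000 host genome equivalents in 10 ng of extracted DNA.
contamination = eukaryotic_dna_percent(1000, 10.0)
```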

SSU rRNA Gene PCR Clone Library.

The 16S rRNA gene was amplified by PCR using universal primers from DNA extracted from a single episymbiont community (AP201, Extreme 2002, Dive 3836, Bio9 chimney at 9°N, EPR). PCR conditions were essentially as described previously (40); however, low-cycle amplifications were done in triplicate to minimize PCR bias, and the products were pooled before cloning into the PCR TA Topo cloning vector (Invitrogen). The cloned region containing the 16S rRNA gene was amplified with primers M13F and M13R, and the PCR product was sequenced on an ABI3130 Genetic Analyzer (Applied Biosystems) with T3 and T7 primers.

Metagenomic Library Construction and Sequencing.

Initial libraries were prepared using a modified BAC shotgun library construction protocol (Invitrogen). DNA was nebulized to 1- to 4-kb fragments, which were size-selected by agarose gel electrophoresis, end-repaired and phosphorylated using a combination of T4 and Klenow polymerases, and cloned into a blunt-ended TA Topo vector. For large library preparation, DNA was ligated into the gap-free cloning vector pSMART-HCKan (Lucigen). Specific details on DNA sequencing and quality control are provided in SI Methods.

Assembly, Clustering, and Annotation.

After quality screening, 270,000 reads were assembled with PCAP (41), a parallelized version of CAP3 (42), with the following modifications: the overlap score cutoff was increased to 6,000 (a more stringent value, to prevent false joins), a score of 50 was used for masking and identifying repetitive regions, and the overlap percent identity cutoff was set to 95%. The assembly generated approximately 26,000 contigs and approximately 67,000 singletons; 374 contigs were longer than 5 kb (Table S2).
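The 95% overlap percent identity cutoff decides whether two reads are similar enough to be joined. CAP3 and PCAP score gapped, quality-weighted overlaps, so the following is only a simplified, hypothetical sketch: it finds the longest ungapped suffix/prefix overlap between two reads that meets an identity cutoff, and does not model the overlap score cutoff of 6,000:

```python
def best_overlap(a: str, b: str, min_len: int = 40, min_identity: float = 0.95):
    """Longest ungapped overlap of the end of read `a` with the start of read `b`
    whose identity meets `min_identity`; returns (length, identity) or None."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        matches = sum(x == y for x, y in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length, matches / length
    return None


# Hypothetical reads sharing a 5-base overlap.
hit = best_overlap("AAAAACCCCC", "CCCCCGGGGG", min_len=3)
```

Raising the identity cutoff makes joins more conservative, which matters in a metagenome where closely related strains can otherwise be merged into chimeric contigs.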

Amino acid sequences from the assembled dataset, plus unassembled sequences, were clustered using CD-HIT (43). The nonredundant dataset consisted of the longest sequence in each cluster, with clusters defined at ≥40% amino acid identity using a word length of 3 and a tolerance of 5 (44).
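CD-HIT clusters greedily: sequences are processed from longest to shortest, and each joins an existing cluster if it matches that cluster's representative above the identity threshold, otherwise it founds a new cluster. A simplified sketch of that procedure, assuming `difflib` matching blocks as a stand-in for a real alignment (CD-HIT's short-word filter, word length, and tolerance are not modeled); the function names are hypothetical, and `core_clusters` applies the "more than 5 sequences" EM-C criterion from the text:

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Matched residues divided by the length of the shorter sequence (approximate)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.40) -> dict[str, list[str]]:
    """Longest-first greedy clustering; keys are representatives (longest members)."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if approx_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative matched: start a new cluster
            clusters[name] = [name]
    return clusters


def core_clusters(clusters: dict[str, list[str]], more_than: int = 5) -> dict[str, list[str]]:
    """Keep only clusters with more than `more_than` member sequences."""
    return {rep: members for rep, members in clusters.items() if len(members) > more_than}


# Hypothetical protein fragments.
clusters = greedy_cluster({"a": "MKTAYIAKQR", "b": "MKTAYIAKQ", "c": "WWWWWWWW"})
```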

A custom relational database was designed and implemented to store the sequence data and associated automated annotation information (45). We integrated several annotation tools, including BLAST (46), FgenesB (Softberry) for gene calling and COG assignment, PRIAM (47), and a custom Pfam database built from prokaryote-only sequences (M. Gollery, personal communication). The epibiont metagenome database is accessible through a graphical user interface and is available for browsing (http://ocean.dbi.udel.edu; log in as guest).

Tools Used in Data Analysis.

Concatenated ribosomal protein alignments and phylogeny were created using ClustalW and the program VMD (http://www.ks.uiuc.edu/Research/vmd/). The protein maximum likelihood phylogeny was created using PHYLIP (v3.67). Maximum likelihood analysis of nearly full-length SSU rRNA genes was conducted with fastDNAml (48). Comparative analysis of gene content between the EM-C and published Epsilonproteobacteria genomes was performed using published datasets (31, 35) and tools provided in the IMG database (49). Methods for rarefaction analysis are in SI Methods.



We thank Roger Bhan (Sym-Bio Inc.) for bioinformatics support, Martin Gollery (University of Nevada, Reno, NV), Scott Hazelhurst (University of the Witwatersrand, Johannesburg, South Africa), and Win Hide (South African National Bioinformatics Institute, Bellville, South Africa). This research was supported by National Science Foundation Grants OCE-0120648 (to S.S.C., A.E.M., R.F., G.G.), EPS-0447416 (to D.R.I.), EPS-0447610 (to S.S.C.), and OPP-0421514 (to A.E.M.) and the Desert Research Institute's postdoctoral research support program.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The Whole Genome Shotgun project has been deposited in the DDBJ/EMBL/GenBank database (accession no. AAUQ00000000); the version described in this paper is the first version (accession no. AAUQ01000000). The project is registered with project ID 17241 and consists of sequences AAUQ01000001-AAUQ01128934. Traces were submitted to the National Center for Biotechnology Information Trace Archive (accession nos. 1878413299-1878706363); PCR-amplified SSU rRNA gene sequences were also submitted (accession nos. EF462586-EF462837).

This article contains supporting information online at www.pnas.org/cgi/content/full/0802782105/DCSupplemental.


1. Hurtado LA, Lutz RA, Vrijenhoek RC. Distinct patterns of genetic differentiation among annelids of eastern Pacific hydrothermal vents. Mol Ecol. 2004;13:2603–2615.
2. Le Bris N, Zbinden M, Gaill F. Processes controlling the physico-chemical micro-environments associated with Pompeii worms. Deep-Sea Res I. 2005;52:1071–1083.
3. Di Meo-Savoie CA, Luther GW, Cary SC. Physicochemical characterization of the microhabitat of the epibionts associated with Alvinella pompejana, a hydrothermal vent annelid. Geochim Cosmochim Acta. 2004;68:2055–2066.
4. Luther GW, et al. Chemical speciation drives hydrothermal vent ecology. Nature. 2001;410:813–816.
5. Gaill F, Mann K, Wiedemann H, Engel J, Timpl R. Structural comparison of cuticle and interstitial collagens from annelids living in shallow sea-water and at deep-sea hydrothermal vents. J Mol Biol. 1995;246:284–294.
6. Chevaldonné P, Desbruyères D, Childress JJ. Some like it hot - and some even hotter. Nature. 1992;359:593–594.
7. Cary SC, Shank T, Stein J. Worms bask in extreme temperatures. Nature. 1998;391:545–546.
8. Desbruyères D, Gaill F, Laubier L, Fouquet Y. Polychaetous annelids from hydrothermal vent ecosystems: An ecological overview. Bull Biol Soc Wash. 1985;6:103–116.
9. Haddad MA, Camacho F, Durand P, Cary SC. Phylogenetic characterization of the epibiotic bacteria associated with the hydrothermal vent polychaete Alvinella pompejana. Appl Environ Microbiol. 1995;61:1679–1687.
10. Campbell BJ, Engel AS, Porter ML, Takai K. The versatile epsilon-proteobacteria: key players in sulphidic habitats. Nat Rev Microbiol. 2006;4:458–468.
11. Polz MF, Cavanaugh CM. Dominance of one bacterial phylotype at a mid-Atlantic ridge hydrothermal vent site. Proc Natl Acad Sci USA. 1995;92:7232–7236.
12. Ruby EG, Lee KH. The Vibrio fischeri-Euprymna scolopes light organ association: Current ecological paradigms. Appl Environ Microbiol. 1998;64:805–812.
13. Warnecke F, et al. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature. 2007;450:560–565.
14. Woyke T, et al. Symbiosis insights through metagenomic analysis of a microbial consortium. Nature. 2006;443:950–955.
15. Schleper C, et al. Genomic analysis reveals chromosomal variation in natural populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J Bacteriol. 1998;180:5003–5009.
16. Robidart JC, et al. Metabolic versatility of the Riftia pachyptila endosymbiont revealed through metagenomics. Environ Microbiol. 2008;10:727–737.
17. Campbell BJ, Stein JL, Cary SC. Evidence of chemolithoautotrophy in the bacterial community associated with Alvinella pompejana, a hydrothermal vent polychaete. Appl Environ Microbiol. 2003;69:5070–5078.
18. Marais A, Mendz GL, Hazell SL, Mégraud F. Metabolism and genetics of Helicobacter pylori: The genome era. Microbiol Mol Biol Rev. 1999;63:642–674.
19. Wu D, et al. Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol. 2006;4:1079–1092.
20. Sanchez-Perez G, Mira A, Nyrio G, Pasic L, Rodriguez Valera F. Adapting to environmental changes by specialized paralogs. Trends Genet. 2008;24:154–158.
21. Kreil DP, Ouzounis CA. Identification of thermophilic species by the amino acid compositions deduced from their genomes. Nucleic Acids Res. 2001;29:1608–1615.
22. Chakravarty S, Varadarajan R. Elucidation of factors responsible for enhanced thermal stability of proteins: A structural genomics based study. Biochemistry. 2002;41:8152–8161.
23. Lee CK, Cary SC, Murray AE, Daniel RM. Enzymatic approach to eurythermalism of Alvinella pompejana and its episymbionts. Appl Environ Microbiol. 2008;74:774–782.
24. Linz B, Schuster SC. Genomic diversity in Helicobacter and related organisms. Res Microbiol. 2007;158:737–744.
25. Grzymski JJ, et al. Comparative genomics of DNA fragments from six antarctic marine planktonic bacteria. Appl Environ Microbiol. 2006;72:1532–1541.
26. La D, Silver M, Edgar RC, Livesay DR. Using motif-based methods in multiple genome analyses: A case study comparing orthologous mesophilic and thermophilic proteins. Biochemistry. 2003;42:8988–8998.
27. Levine SM, et al. Plastic cells and populations: DNA substrate characteristics in Helicobacter pylori transformation define a flexible but conservative system for genomic variation. FASEB J. 2007;21:3458–3467.
28. Elsaied H, et al. Novel and diverse integron integrase genes and integron-like gene cassettes are prevalent in deep-sea hydrothermal vents. Environ Microbiol. 2007;9:2298–2312.
29. Smeets LC, Kusters JG. Natural transformation in Helicobacter pylori: DNA transport in an unexpected way. Trends Microbiol. 2002;10:159–162.
30. Burke J, Davison D, Hide W. D2_cluster: A validated method for clustering EST and full-length cDNA sequences. Genome Res. 1999;9:1135–1142.
31. Eppinger M, Baar C, Raddatz G, Huson DH, Schuster SC. Comparative analysis of four Campylobacterales. Nat Rev Microbiol. 2004;2:872–885.
32. Sievert SM, et al. Genome of the epsilonproteobacterial chemolithoautotroph Sulfurimonas denitrificans. Appl Environ Microbiol. 2008;74:1145–1156.
33. Jankielewicz A, Schmitz RA, Klimmek O, Kröger A. Polysulfide reductase and formate dehydrogenase from Wolinella succinogenes contain molybdopterin guanine dinucleotide. Arch Microbiol. 1994;162:238–242.
34. Cardoso-Pereira IA. Respiratory membrane complexes of Desulfovibrio. In: Dahl C, Friedrich CG, editors. Microbial Sulfur Metabolism. Berlin: Springer; 2007. pp. 24–35.
35. Nakagawa S, et al. Deep-sea vent epsilon-proteobacterial genomes provide insights into emergence of pathogens. Proc Natl Acad Sci USA. 2007;104:12146–12150.
36. Tanaka T, Tateno Y, Gojobori T. Evolution of vitamin B6 (pyridoxine) metabolism by gain and loss of genes. Mol Biol Evol. 2005;22:243–250.
37. Kraft C, Suerbaum S. Mutation and recombination in Helicobacter pylori: Mechanisms and role in generating strain diversity. Int J Med Microbiol. 2005;295:299–305.
38. Cary SC, Cottrell MT, Stein JL, Camacho F, Desbruyères D. Molecular identification and localization of filamentous symbiotic bacteria associated with the hydrothermal vent annelid Alvinella pompejana. Appl Environ Microbiol. 1997;63:1124–1130.
39. Gregory TR, et al. Eukaryotic genome size databases. Nucleic Acids Res. 2007;35:D332–338.
40. Campbell BJ, Cary SC. Characterization of a novel spirochete associated with the hydrothermal vent polychaete annelid, Alvinella pompejana. Appl Environ Microbiol. 2001;67:110–117.
41. Huang X, Wang J, Aluru S, Yang S-P, Hillier L. PCAP: A whole-genome assembly program. Genome Res. 2003;13:2164–2170.
42. Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res. 1999;9:868–877.
43. Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17:282–283.
44. Li W, Jaroszewski L, Godzik A. Tolerating some redundancy speeds up clustering of large protein databases. Bioinformatics. 2002;18:77–82.
45. Kaplarevic M, Murray AE, Cary SC, Gao GR. EnGENIUS - Environmental genome informational utility system. J Bioinform Comput Biol. 2008;6:6.
46. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
47. Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 2003;31:6633–6639.
48. Olsen GJ, Matsuda H, Hagstrom R, Overbeek R. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput Appl Biosci. 1994;10:41–48.
49. Markowitz V, et al. The integrated microbial genomes (IMG) system in 2007: Data content and analysis tool extensions. Nucleic Acids Res. 2008;36:D528–533.
