Appl Environ Microbiol. 2006 Feb; 72(2): 1532–1541.
PMCID: PMC1392886

Comparative Genomics of DNA Fragments from Six Antarctic Marine Planktonic Bacteria


Six environmental fosmid clones from Antarctic coastal water bacterioplankton were completely sequenced. The genome fragments harbored small-subunit rRNA genes that were between 85 and 91% similar to those of their nearest cultivated relatives. The six fragments span four phyla, including the Gemmatimonadetes, Proteobacteria (α and γ), Bacteroidetes, and high-G+C gram-positive bacteria. Gene-finding and annotation analyses identified 244 total open reading frames. Amino acid comparisons of 123 and 113 Antarctic bacterial amino acid sequences to mesophilic homologs from G+C-specific and SwissProt/UniProt databases, respectively, revealed widespread adaptation to the cold. The most significant changes in these Antarctic bacterial protein sequences included a reduction in salt-bridge-forming residues such as arginine, glutamic acid, and aspartic acid, reduced proline contents, and a reduction in stabilizing hydrophobic clusters. Stretches of disordered amino acids were significantly longer in the Antarctic sequences than in the mesophilic sequences. These characteristics were not specific to any one phylum, COG role category, or G+C content and imply that underlying genotypic and biochemical adaptations to the cold are inherent to life in the permanently subzero Antarctic waters.

Environmental genome sequencing and analysis are rapidly increasing our knowledge of the genetic and functional diversity of bacteria and archaea. Diverse environments have been examined, including soil (31, 38), marine sediments (19), marine waters (5, 36, 40), such as Antarctic systems (4, 8, 23, 27), and acid mine drainage systems (39). These studies have produced physiological insights, such as the widespread nature of rhodopsins in marine bacteria (8, 40), as well as insights into biogeochemistry, such as the genetic inference of the mechanism of methane oxidation in anoxic environments (19). Antarctic metagenomic studies have included an examination of phylogenetic diversity in deep polar front waters (23), comparative genomics of Antarctic and temperate archaeal genome fragments (4, 8), and genome fragment analysis of a marine euryarchaeote (27).

Water temperatures in the coastal water off the Antarctic Peninsula do not exceed 2°C and remain at −1.8°C for most of the year (Palmer Station [http://iceflo.icess.ucsb.edu:8080/data/default.htm]). While little is known about the specific niches different Antarctic bacterial species occupy or the extent and nature of species diversity, extensive work has been done to study microbial activity and the impact of microbial processes on carbon and nitrogen cycling (10, 22). Clearly, many Antarctic bacteria have adapted to cold conditions and achieve growth rates comparable to those in temperate environments (16). However, only recently have comparative genomic studies been used to reveal possible cold adaptations at the predicted amino acid level in bacterial and archaeal genomes (25, 34). One of the major adaptations to cold includes modifications to structural features of proteins. Specific amino acid usage patterns and structural characteristics have emerged from analyses of a limited number of psychrophilic protein crystal structures (12, 14), mesophile and thermophile genomes (20), and psychrophile genomes (25, 34). Amino acid modifications are necessary to overcome structural stability and thermodynamic hurdles that occur at both temperature extremes (32, 33). In general, when cold-adapted proteins are compared to mesophilic proteins, they have fewer structural features that promote stability and more that increase protein flexibility. Structural features of cold-adapted proteins include fewer ion pairs, fewer arginine residues, fewer polar, H-bond-forming residues, fewer proline residues in protein loops, and fewer aromatic interactions than those in mesophilic proteins (12, 15). These conclusions are based on relatively few cloned and sequenced genes and not on genome or genome fragment analyses (33). Myriad, disparate strategies are employed between and among cold-adapted proteins in a fashion that is not yet predictable (14). 
Protein families may adopt certain strategies, but a given microbial genome may employ any combination of strategies.

The motivation for this investigation was twofold. First, we sought to explore environmental genomes of several uncultured Antarctic bacterioplankton isolates. These genomic data, in combination with previous studies (4, 8) of the same library, can be used in the future to target these organisms and their gene expression profiles in the natural environment. Second, we were interested in testing hypotheses of cold adaptation derived from limited psychrophilic enzymes across a large data set from diverse bacteria. This study reports on six environmental fosmid clones from a library created from DNAs collected in nearshore waters off Palmer Peninsula, Antarctica (4). These clones were selected for complete sequencing based on their ecological and evolutionary relevance. Each represents a different uncultivated marine bacterial group, four of which contain ribosomal operons and two of which contain phylogenetically conserved coding regions that we used to infer their affiliation with the Cytophaga/Flavobacteria/Bacteroidetes (CFB) group and α-proteobacterial lineages. Here we present a comparative genomic analysis of the six fragments and data concerning amino acid modifications in cold-adapted microorganisms that appear to be consistent themes in bacterial genomes.


A large contig shotgun fosmid library was prepared by using DNAs from a seawater sample collected from the nearshore waters off Palmer Station, Antarctica (64°46′S, 64°03′W), in the late Austral winter 1996. Details of the library preparation were described by Béjà et al. (4). Fosmid clones with rRNA genes selected for sequencing in this study were screened using a PCR-denaturing gradient gel electrophoresis (PCR-DGGE) approach. In order to assess phylogenetic diversity in the library, 45 96-well plates were screened by PCR using primers targeting the bacterial V3 region of the rRNA operon (28, 30). Standard denaturing gradient gel electrophoresis running conditions were applied (8% acrylamide gel with a 30 to 60% denaturing gradient, run at 62 V for 16 h) on a Bio-Rad DCode system according to the method of Murray et al. (29). Unique bands from the DGGE analysis were excised from the gel, reamplified using non-GC-clamped primers, and subjected to DNA sequence analysis. In addition, two fosmid clones were randomly selected for sequencing.

Sequencing of the small-subunit (SSU) rRNA gene PCR fragments was performed on an ABI Prism 3730 instrument at the Nevada Genomics Center. Well locations of fosmid clones selected for complete sequencing were then identified via row and column DGGE screening in which the correct bands were matched with the original pooled 96-well plate reactions. DNAs from the selected fosmid clones were purified using a Plasmid Mega kit (QIAGEN). Fosmid DNAs were treated with plasmid-safe DNase (Epicenter) to remove remaining host genomic DNA. Fosmid DNAs were then subcloned using a TOPO ShotGun subcloning kit (Invitrogen Life Technologies). In brief, approximately 10 to 20 μg of fosmid DNA was sheared using a nebulizer at 16 psi for 75 s, resulting in fragmented DNA in the size range of 1,500 bp to 3,000 bp. The DNA was then blunt end repaired, dephosphorylated, inserted into vectors, and transformed into One Shot TOP10 electrocompetent Escherichia coli cells. Successful clones were selected for both ampicillin and kanamycin resistance. Genome sequencing was conducted at Amersham Biosciences (now part of GE Healthcare) and the Joint Genome Institute.

Two gene finding programs, FgenesB (Softberry Inc.) and GLIMMER (TIGR), identified open reading frames (ORFs). The results from gene finding using GLIMMER were then processed through TIGR's annotation engine and packaged in the TIGR database MANATEE. A BER search (BLAST Extend Repraze) was performed for each ORF included in the MANATEE database. The two automated annotations were compared manually and combined.

Evolutionary distance was employed to determine relationships between homologous protein coding regions, using PHYLIP (v3.62). Protein distance matrices were calculated using the Jones-Taylor-Thornton model after bootstrapping using the ProtDist package. Bootstrap analysis (1,000 iterations) was performed in PHYLIP using SeqBoot. The PHYLIP package Fitch was used to reconstruct phylogenetic trees from maximum likelihood distance matrices generated with nonresampled amino acid sequence data.
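The bootstrap step can be illustrated with a minimal sketch of column resampling, which is the general idea behind programs such as SeqBoot. This is not PHYLIP code: the function name `bootstrap_columns` and the dictionary-based alignment are assumptions made for illustration only.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an aligned set of sequences.

    `alignment` maps sequence names to aligned strings of equal length.
    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that homologous positions stay together.
    (Hypothetical helper for illustration; not part of PHYLIP.)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"Ant29B7": "MKVLA", "homolog": "MRVIA"}
    rng = random.Random(42)  # fixed seed so the run is repeatable
    replicates = [bootstrap_columns(toy, rng) for _ in range(1000)]
    print(len(replicates), "replicates generated")
```

Each replicate would then be passed to a distance calculation, and the support for a branch is the fraction of replicates in which that branch is recovered.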

A method for amino acid usage analysis was developed. A combined SwissProt/UniProt protein database and custom G+C-specific databases were created locally. The G+C-specific databases were created for each Antarctic bacterial fosmid with genome sequences from at least 10 mesophilic bacteria that were within ~±2.5% of the fosmid G+C content. BLASTP analysis was performed for all Antarctic predicted proteins against the two types of databases. All results that had at least two BLAST hits with an expect value of <10⁻¹⁵ were parsed along with their best matches (up to five). These data were stored in a customized MySQL database and subjected to further amino acid analyses. Amino acid compositions and the protein parameters GRAVY (grand average of hydropathicity) and aliphaticity were calculated on the ExPASy website (http://us.expasy.org and references therein). Predictions of natural disordered residues, lengths of the disordered regions, and overall PONDR scores in the protein coding regions were calculated according to the algorithms VLXT and VL3 described by Dunker et al. (11), using licensed software (PONDR; Molecular Kinetics). The VLXT algorithm was only used to predict regions of disorder of >45 residues. All of the parsed Antarctic and mesophilic amino acid sequences (described above) of >30 residues were analyzed. The PONDR results were uploaded to the customized MySQL database and analyzed. Statistical comparisons of all amino acid contents and of associated parameters and disorder-associated values of the mesophile averages (n = 2 to 5) versus the Antarctic data values were made using a one-sample t test in Statistica (StatSoft).
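The database-selection and hit-filtering rules described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' pipeline: the `Hit` record, the function names, and the choice to treat the ±2.5% window as inclusive are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical record used only in this sketch)."""
    query: str
    subject: str
    evalue: float


def gc_content(seq):
    """G+C content of a nucleotide sequence, as a percentage of A/C/G/T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def within_gc_window(fosmid_gc, genome_gc, window=2.5):
    """True if a genome's G+C content is within +/- `window` percentage
    points of the fosmid's (inclusive bounds are an assumption)."""
    return abs(fosmid_gc - genome_gc) <= window


def select_homologs(hits, max_evalue=1e-15, min_hits=2, max_kept=5):
    """Keep queries with at least `min_hits` hits below `max_evalue`,
    retaining up to `max_kept` best hits (lowest E value) per query."""
    by_query = {}
    for hit in hits:
        if hit.evalue < max_evalue:
            by_query.setdefault(hit.query, []).append(hit)
    return {
        query: sorted(kept, key=lambda h: h.evalue)[:max_kept]
        for query, kept in by_query.items()
        if len(kept) >= min_hits
    }
```

A query with a single qualifying hit is dropped, which mirrors the requirement for at least two homologs before any comparison is made.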

Nucleotide sequence accession numbers.

The annotated sequences determined for this study have been submitted to GenBank under accession numbers DQ295237 to DQ295242.


Here we report the identification of open reading frames and the analysis of predicted amino acid usage for six Antarctic bacterial genome fragments (Table 1). Four of the fragments sequenced, i.e., Ant4D5, Ant39E11, Ant29B7, and Ant4E12, are affiliated with mostly uncultivated and unknown marine bacterial lineages. Ant4D3 and Ant24C4 (γ- and α-proteobacteria, respectively) represent commonly detected phylogenetic groups, although Ant4D3 is only distantly related (91% SSU rRNA gene sequence identity) to its nearest cultivated relative (Fig. 1). Four of the fragments contained rRNA genes, while Ant29B7 and Ant24C4 did not. These inserts contained conserved protein coding regions that made phylogenetic identification possible.

FIG. 1.
Distribution of SSU rRNA gene phylotypes identified in the late-winter Antarctic bacterioplankton clone library. Solid bars represent numbers of unique phylotypes in the designated phyla/classes, and open bars represent total numbers of phylotypes found ...
TABLE 1.
Summary of the six environmental genome fragments analyzed

Gene finding.

A summary of the number of ORFs found, the percentage of hypothetical proteins, the average gene length, and other statistics for each Antarctic fosmid insert is found in Table 1, and a complete list of all ORFs identified and their most similar homologs, positions, and cellular roles can be found in Table S1 in the supplemental material. Ant4D5 had the highest percentage of hypothetical proteins, at 45% of 33 total ORFs, and is least similar to any cultivable relative. The protein coding regions showed no significant similarities to any particular group of bacteria. The top BLAST hit for each of the 23 ORFs with homologs covered 19 different organisms in seven different phyla (see Table S1 in the supplemental material). In contrast, the genome fragment Ant4D3 affiliated with the γ-Proteobacteria contained 50 ORFs, with only 5 ORFs identified as either hypothetical or conserved hypothetical proteins. The majority of the coding regions (68%) were most closely related to γ-Proteobacteria and were typically >70% identical to γ-Proteobacteria ORF homologs.

Affiliation of genomic fragments without SSU rRNA genes.

BLASTP searches against a newer nonredundant database than the one used for the BER search were performed with each of the ORFs identified in Ant24C4. Silicibacter pomeroyi, a common coastal marine bacterium and the only Roseobacter sp. with a known genome sequence, was added to this newer database and was consistently identified as the closest homolog (e.g., ORFA019 had 77% identity and ORFA038 had 73% identity; see Table S2 in the supplemental material for protein distances of six of the predicted proteins). Protein distance matrices for six amino acid sequences versus the top BLASTP hits in the nonredundant database helped substantiate the conclusion that Ant24C4 is most likely affiliated with the α-Proteobacteria, specifically the Roseobacter clade (see Table S2 in the supplemental material).

Phylogenetic analysis of the universal target region of the groEL gene product (21) placed Ant29B7 within the CFB group (Fig. 2). Ant29B7 branches with 14 members of the CFB for which groEL sequences are available. The closest related genome sequences to Ant29B7, those of Bacteroides thetaiotaomicron and Porphyromonas gingivalis (18/30 predicted proteins listed in Table S1 in the supplemental material), were also identified from protein distance comparisons to top 10 BLASTP hits (data not shown).

FIG. 2.
Neighbor-joining tree representing the 184- to 186-amino-acid universal target region of groEL gene products from Ant29B7 and its nearest neighbors.

Gene content.

Relative lengths of the genome fragments and putative ORFs, positions of the ribosomal operon (if present), COG categories of ORFs, and highly conserved genes are presented in Fig. 3. The analysis of gene content for all of the Antarctic genome fragments focused on the following major themes: (i) genes with suspected biogeochemical relevance, (ii) genes involved in amino acid transport and biosynthesis, and (iii) genes with suspected relevance in the cold. In many cases, the genes described fit into more than one category.

FIG. 3.
Linear ORF maps for the six fully sequenced fosmids from the Antarctic marine picoplankton library. ORFs are color coded according to their COG affiliations and to highlight ribosomal operons, where they exist. Highly conserved genes (equivalogs) with ...

(i) Biogeochemically relevant ORF products.

Seventeen of the Ant39E11 ORFs (52%) were most similar to homologs affiliated with members of the CFB group. The ferrous iron transport genes feoA (ORFC019) and feoB (ORFC020) were in this fragment. Also, two peptidases from the M23/M27 (ORFC032) and M20/M25/M40 (ORFC043) families were identified. The Roseobacter-like Ant24C4 fragment contained genes for phosphonate (organic phosphorus) transport and metabolism (ORFA035 to -41). It also contained a putative ammonia monooxygenase (ORFA062). The fragment Ant29B7 contained numerous genes with potential biogeochemical relevance. It contained genes for gliding motility (ORFB009 and -10), NH3-dependent NAD+ synthetase (ORFB012), and a peptidase (ORFB039).

(ii) Amino acid synthesis and transport.

COG categories identified by color in Fig. 3 indicate the large number of amino acid transport and metabolism genes identified in the Antarctic genome fragments, especially for Ant4E12 and Ant4D3. Ant4E12 contains a large operon for glutamate biosynthesis that spans almost 13 kb (ORFF007 to -17). Ant4D3 contains genes for two hydrolases responsible for histidine biosynthesis (ORFD002 and -3), asparaginase (ORFD019), and a suite of amino acid transporters (ORFD021 to -24). There were also genes for aspartate and glutamate biosynthesis (ORF045, -047, and -050). Two copies of the gene for an ABC-type polar amino acid transport protein appear consecutively in this environmental clone (ORFD022 and ORFD023). The coding regions are significantly divergent and are duplicated in other bacterial genomes. Phylogenetic analysis showed clustering of these two proteins in distinct branches (see Fig. S1 in the supplemental material). The ABC transporter from ORFD022 branches together with representatives from the γ-Proteobacteria, while the ABC transporter from ORFD023 branches between two groups of α-Proteobacteria.

(iii) Cold metabolism ORF products.

Numerous predicted proteins from each bacterial genome fragment could play an important role in cold tolerance. Chaperones were identified in three fragments, namely, Ant4D5 (groES; ORFE011), Ant24C4 (dnaK; ORFA058), and Ant29B7 (groES and groEL; ORFB033 and -34). Three other cellular role categories suspected to be important for life in the cold were protein synthesis, DNA metabolism and transcriptional regulation, and DNA binding proteins. Numerous tRNA aminoacylation and nucleotide base-modifying proteins were annotated, including synthetases (Ant4D5, ORFE025; Ant39E11, ORFC031; and Ant4E12, ORFF005), transferases (Ant39E11, ORFC008; and Ant29B7, ORFB026), and a hydrolase (Ant24C4, ORFA014). Genes involved in DNA metabolism included those encoding a DNA helicase (Ant39E11, ORFC013), topoisomerase I (Ant29B7, ORFB025), and DNA polymerase IIIb (Ant39E11, ORFC). Ant4D3 contained recG (ORFD025), an RNA polymerase subunit gene (rpoZ; ORFD027), a tyrosine recombinase gene (ORFD043), and a CbbY-like protein gene (ORFD053). At least one transcriptional regulatory-like protein was identified in each bacterial fragment, except Ant4E12, and these proteins ranged in size from 70 to 420 amino acids.

Amino acid usage. (i) Arginine and proline.

There were 123 and 113 predicted proteins that had at least two homologs (E value, <10⁻¹⁵) in the G+C-specific and SwissProt/UniProt databases, respectively. Amino acid sequences from all Antarctic predicted proteins, except those for Ant24C4, showed significantly reduced Arg/(Arg + Lys) ratios compared to their mesophile homologs (Table 2). These results were consistent regardless of the G+C content of the genomes under comparison. Ant4D3 had significantly reduced arginine usage in 22 of 34 ORF products analyzed against the SwissProt/UniProt database. Half of the amino acid sequences analyzed were also significantly different from homologs in the G+C-specific protein database. Only two and five ORFs had significantly increased Arg/(Arg + Lys) ratios compared to the SwissProt/UniProt and G+C-specific databases, respectively.
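As a rough illustration of how such a comparison can be set up, the sketch below computes the Arg/(Arg + Lys) ratio for a protein and the one-sample t statistic for the mesophile homolog values (n = 2 to 5) against a single Antarctic value. The function names are hypothetical, the analysis reported here was done in Statistica, and the conversion of t to a P value (which needs the t distribution) is deliberately left out.

```python
import math
from statistics import mean, stdev


def arg_lys_ratio(protein):
    """Arg/(Arg + Lys) for a one-letter amino acid sequence."""
    protein = protein.upper()
    n_arg = protein.count("R")
    n_lys = protein.count("K")
    if n_arg + n_lys == 0:
        raise ValueError("sequence contains neither Arg nor Lys")
    return n_arg / (n_arg + n_lys)


def one_sample_t(sample, reference):
    """One-sample t statistic and degrees of freedom.

    `sample` holds the values for the mesophilic homologs and
    `reference` is the single Antarctic value they are tested against.
    A positive t means the mesophile mean exceeds the Antarctic value.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two homolog values")
    spread = stdev(sample)  # sample standard deviation (n - 1)
    if spread == 0:
        raise ValueError("homolog values are identical; t is undefined")
    t = (mean(sample) - reference) / (spread / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    antarctic = arg_lys_ratio("MKKAKRLLKE")
    mesophiles = [arg_lys_ratio(s) for s in
                  ("MRKARRLLRE", "MRRAKRLLKE", "MKRARRLLRE")]
    print(antarctic, one_sample_t(mesophiles, antarctic))
```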

TABLE 2.
Results of amino acid analysis for predicted proteins from six Antarctic bacterial genome fragments versus their mesophilic homologs

The proline contents of three of the Antarctic fosmid clones (Ant4D5, Ant39E11, and Ant4E12; Table 2) were significantly reduced compared to those of their mesophile homologs and were independent of the G+C content. About two-thirds of the amino acid sequences analyzed had significantly reduced proline usage. Only 8 of the combined 47 putative proteins analyzed (for Ant4D5, Ant39E11, and Ant4E12) had increased proline usage compared to their homologs. Ant4D3, Ant24C4, and Ant29B7 had almost equal numbers of sequences that showed increased and decreased proline usage compared to sequences from the BLASTP results.

(ii) Aliphaticity and GRAVY.

The aliphatic index was significantly lower for 35 of 107 (33%) sequences analyzed against custom G+C-specific databases. Seventeen amino acid sequences (16%) had significantly increased aliphatic indices compared to those of the mesophiles. The results were similar for amino acid sequences analyzed against the SwissProt/UniProt database (Table 2). ORFs in Ant4D3, Ant24C4, Ant4D5, and Ant39E11 showed significant reductions in aliphaticity. The calculations of GRAVY indicated that for the sequences with significant BLASTP results, there was no increase in hydrophilicity for the ORF products from the Antarctic genome fragments, except for Ant4D5. Sequences analyzed against the G+C-specific databases indicated that 24/107 had significantly lower GRAVY indices than their homologs, while 31/107 were calculated to have significantly higher indices than their mesophile homologs. These results were similar for sequence comparisons to the SwissProt/UniProt database.
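For readers unfamiliar with the two indices, the following sketch shows how they are commonly defined: GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index is Ala + 2.9 × Val + 3.9 × (Ile + Leu), with each term expressed as mole percent. The values in this study were obtained from the ExPASy website; the code below is only an illustration, and the function names are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _validated(protein):
    """Upper-case the sequence and reject empty or non-standard input."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError("non-standard residues: %s" % sorted(unknown))
    return protein


def gravy(protein):
    """Grand average of hydropathicity: mean hydropathy per residue.
    Lower (more negative) values indicate a more hydrophilic protein."""
    protein = _validated(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    protein = _validated(protein)

    def mole_percent(aa):
        return 100.0 * protein.count(aa) / len(protein)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

Because both indices average over the whole sequence, a short deletion of hydrophobic residues in a long protein may change them very little.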

(iii) Reduced Glu and Asp content.

In every genome fragment except Ant24C4, there was an overwhelming reduction in the acidic residues Glu and Asp compared to mesophile homologs in either database (Table 2). Over 60% of the sequences analyzed against the G+C-specific database had a significant reduction in Glu and/or Asp. Less than 15% of the putative proteins had increases in these residues, and almost all (11 of 14) of these came from Ant24C4. Ant24C4 was the only fragment in this or any amino acid usage category to exhibit a strong trend with the G+C content of the genomes under comparison. Of the 19 Ant24C4 sequences analyzed against the G+C-specific database, 11 had increased Glu and Asp usage and 1 had decreased usage. Fewer sequences had significant alterations, with either increased or decreased usage (three and two, respectively), compared to the SwissProt/UniProt database.

Protein disorder.

One hundred twelve amino acid sequences from Antarctic genome fragments and 509 mesophilic homologs from genomes whose G+C contents were similar (±~2.5%) to those of the Antarctic fosmid inserts were analyzed for regions of disorder, using the VL3 and VLXT algorithms (for regions of disorder of >45 residues) (11). The longest region of disordered residues was longer in 27/111 sequences from the Antarctic fragments than in their homologs, in contrast to only 9 that were shorter (Table 3). Ant4D3 had only eight predicted proteins with significantly more total disordered residues than their closest homologs, while 19 sequences had fewer total disordered residues. We did not exclude data based on a minimum number of disordered residues or based on the average prediction score (PONDR score; Table 3), and there is a high variance in the data with low PONDR scores (i.e., few disordered residues). Amino acid sequences from Ant4E12 appeared to be more disordered than those from their mesophilic homologs. The same trends were found for the average PONDR scores. The VLXT algorithm is a better predictor of long disordered regions (>45 residues). The stretches of disordered residues of >45 amino acids calculated with the VLXT algorithm were also compared. In every instance but two, the Antarctic amino acid sequences had longer disordered stretches.
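The three disorder measures compared here (total disordered residues, longest disordered region, and average score) can be derived from a per-residue score profile as sketched below. The sketch assumes that a predictor returns one score between 0 and 1 per residue and that residues scoring at or above 0.5 are called disordered; both the cutoff and the function names are assumptions for illustration and do not reproduce the licensed PONDR software.

```python
def longest_disordered_run(scores, threshold=0.5):
    """Length of the longest unbroken stretch of residues whose score is
    at or above `threshold` (the 0.5 cutoff is an assumption)."""
    best = current = 0
    for score in scores:
        if score >= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def disorder_summary(scores, threshold=0.5):
    """Summarize a per-residue disorder profile.

    Returns a dict with the total number of disordered residues, the
    longest disordered run, and the mean score over the whole protein.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("empty score profile")
    return {
        "total_disordered": sum(1 for s in scores if s >= threshold),
        "longest_run": longest_disordered_run(scores, threshold),
        "mean_score": sum(scores) / len(scores),
    }


def has_long_region(scores, min_length=46, threshold=0.5):
    """True if the profile contains a disordered region of more than 45
    residues, the length class examined with VLXT in this study."""
    return longest_disordered_run(scores, threshold) >= min_length
```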

TABLE 3.
Results of PONDR for predicted proteins from six Antarctic bacterial genome fragments versus their mesophilic homologs


This report describes the complete sequences of six 39- to 46-kb genome fragments selected from an initial screening of 4,320 environmental fosmid clones from bacterioplankton sampled in the waters off Anvers Island, Antarctica. These cold, stenothermal waters contain a diverse yet poorly understood microbial community in terms of diversity, evolutionary history, physiological capacity, ecological role, and environmental adaptation. Amino acid usage patterns and their relevance to cold adaptation were analyzed in the six genome fragments affiliated with currently uncultivated Antarctic marine bacterioplankton species. This analysis has heretofore not been performed using environmental genomic data. These studies are also important in uncovering new insights into marine bacterioplankton function and diversity, as other studies have shown. For example, environmental genomic studies have revealed new types of phototrophy by targeting specific genes (e.g., proteorhodopsin) (3, 8). Environmental DNA libraries have also been used to screen for the uncultivated marine crenarchaeote GI (4, 35) and to screen for products valuable in drug discovery (6).

Environmental genomic library and description of sequenced fosmid inserts.

The library contained at least 105 bacterial rRNA gene-containing inserts representing 56 different rRNA gene sequences (Fig. 1). We selected fosmid clones that were ecologically relevant (e.g., Ant4D3, an abundant γ-proteobacterial phylotype) and/or phylogenetically unique (e.g., Ant4D5, with one Gemmatimonadetes phylotype).

The Ant4D3 SSU rRNA gene affiliated with the γ-Proteobacteria was the most commonly encountered SSU rRNA gene sequence in the late-winter clone library (14/105 sequences detected). PCR-DGGE analysis of Antarctic planktonic rRNA gene fragments revealed this phylotype at different depths and throughout the year (see Fig. S2 in the supplemental material). This is in contrast to Ant39E11, which was absent at these depths and during these time points (see Fig. S2, ladder marker B, in the supplemental material). Ant4D3 is distantly related to a cultivable representative (91%) (28) within the OMG and much more closely related (95 to 99%) to SSU rRNA clones from polar environments, including a common phylotype detected in DGGE surveys of the Arctic ocean (2). The Ant4D3 genome fragment sequence contributes functional information regarding amino acid biosynthesis, DNA metabolism, and protein translocation and transport of this potentially important microorganism in polar waters.

Genomic sequences from two other fosmid clones are related to bacterioplankton from the commonly occurring Roseobacter group (Ant24C4) and the abundant CFB group (Ant29B7) in Antarctic waters. Ant24C4 was closely affiliated with α-Proteobacteria, most specifically with the recently reported Silicibacter pomeroyi (26). Phylogenetic analysis of the universal target region (21) of the groEL gene indicated that Ant29B7 was closely related to bacteria in the CFB group.

Ant4E12 is the first marine actinobacterium-related genome fragment to be sequenced. This group has often been encountered in oceanographic surveys (~3.5% of environmental sequences and isolates recovered from seawater) (37). Although the roles of this group in ocean systems are not known, their abilities to degrade high-molecular-weight carbon and to produce antimicrobial compounds are of interest.

Two clones, Ant4D5 and Ant39E11, were sequenced that were distantly related to cultivated bacteria (<90% over the entire SSU rRNA gene; Table 1) but appear to be periodically active in the environment. Each clustered with several uncultivated rRNA gene environmental clones from either deep-sea or polar latitudes (92 to 96% identical). In the case of Ant4D5, this is the first description of genome characteristics of this newly described bacterial phylum, i.e., Gemmatimonadetes (41).

Three themes: biogeochemistry, amino acid biosynthesis/transport, and cold tolerance.

Little is known about the physiology of the Antarctic microbes targeted in this study. Small genome fragments provide interesting snapshots of the physiological and metabolic capabilities of an organism and are practical for complex community screening (3, 4, 8, 19). Caution should be used when interpreting the biogeochemical relevance of genes or genome fragments without expression studies or rate measurements. However, gene contents may accelerate our ability to bring related organisms into culture or to measure their activities in situ.

Many predicted proteins were identified in the broad themes of biogeochemical relevance, amino acid synthesis and transport, and aids in cold tolerance. Gene finding identified an anaerobic ferrous iron transport operon (ORFC019 and -020) in Ant39E11. Total iron, but especially ferrous iron, concentrations in the surface ocean are very low (24); therefore, it is assumed that iron acquisition is an important bacterioplankton process. Two peptidases in Ant39E11 were identified from the M25 family (ORFC043) and the M23 family (ORFC032). Uncommon peptidases such as these two may have ecological relevance in Antarctic marine bacteria that rely on the short periods of high phytoplankton biomass for their carbon. Members of the CFB group in marine environments are commonly thought to be associated with particles (9) and are abundant in the Southern Ocean (1). Ant29B7, also a CFB member, contained genes for gliding motility and ammonia metabolism.

Ant24C4, the Roseobacter-affiliated bacterial genome fragment, contained predicted proteins implicated in phosphonate utilization and ammonia oxidation. Interestingly, ammonia oxidation was not reported for Silicibacter pomeroyi, the only Roseobacter for which a complete genome is available (26). The putative amoA gene from Ant24C4 is more similar to that of Magnetospirillum magnetotacticum (37% amino acid identity) than to that of Silicibacter pomeroyi (29% identity).

Given the importance and cost of protein synthesis, there are two main reasons to hypothesize that amino acid transport and biosynthesis pathways might be interesting in cold environments. First, given the high cost of amino acid biosynthesis (especially the high-molecular-weight residues Trp, Tyr, Arg, and Phe), scavenging amino acids in cold environments may be important and cost-effective. Second, when this scavenging is not possible, the proteins involved in biosynthesis will be modified in comparison to their mesophilic homologs. All of the putative proteins for Ant4D3 and Ant4E12 that were implicated in amino acid biosynthesis had at least two amino acid modifications (described in Table 2) that are indicative of cold adaptation. Twelve of these 14 putative proteins had significant reductions in the Arg/(Arg + Lys) ratio compared to their mesophilic homologs (further described in the following section).

Putative proteins involved in cellular roles (chaperones, protein synthesis, DNA metabolism, transcriptional regulation, and DNA binding proteins) that might be particularly affected by cold were abundant in our data set. These putative proteins appear to be highly modified, particularly in terms of decreased Arg/(Arg + Lys) ratios and decreased polar residue usage. In Ant4D3, all five of the putative proteins within these role categories had significantly reduced Arg/(Arg + Lys) ratios. Four of the five had decreased usage of polar residues, and four had significantly increased serine usage (data not shown). Our global analysis did not turn up significant increases in serine usage in the Antarctic bacteria compared to their mesophilic homologs; however, a significant increase in serine usage was reported for Colwellia psychrerythraea 34H (25) compared to mesophile and thermophile genomes.

Amino acid usage and cold adaptation.

For enzymes, increased flexibility and decreased stability translate into greater entropy (17). The thermodynamic effect of cold adaptation is a reduction in the temperature dependence of the maximum catalytic rate (14, 15). This can be achieved through structural plasticity and decreased stability during the activation of an enzyme-substrate complex, resulting in a reduction of the activation enthalpy and an increase in the activation entropy of the reaction. These consequences are similar to reactions involving intrinsically disordered protein motifs (11). These are regions without a defined three-dimensional structure or a time-averaged canonical set of Ramachandran angles (11). The kinetic effects of disorder mimic the effects of cold adaptation, including high specificities coupled to low affinities, binding plasticity, the creation of very large interaction surfaces, and higher rates of association and dissociation (11, 13, 15, 17, 18).

The analysis of all Antarctic ORFs with significant BLASTP results (at least two BLASTP hits with E values of <10^-15) indicated widespread adaptations to the cold, stenothermal environment. About half of the predicted proteins and their closest homologs in the two types of databases used (SwissProt/UniProt and G+C-specific databases) were analyzed for amino acid usage, disordered regions, hydrophilicity, polarity, and aliphaticity. The results suggest that, for these analyses, there is no significant G+C effect on amino acid usage. Finally, it appears that proteins from psychrotrophs have more extensive regions of natural disorder than do those from mesophilic bacteria.

(i) Reductions in arginine and proline.

Decreased arginine and proline usage in psychrophilic enzymes has two main effects, namely, a reduction in salt bridges and an increase in the entropy of the unfolded protein (13, 33). Arginine-to-lysine substitutions that lower the Arg/(Arg + Lys) ratio were significant in a previous study comparing 21 psychrophilic enzymes (18). In almost every protein included in our analysis, the ratio Arg/(Arg + Lys) was reduced compared to those of significant BLASTP matches. Given the frequency of this occurrence, the psychrophilic benefit, and the simplicity of the Arg-to-Lys substitution (a second-position G-to-A purine substitution), this amino acid usage “rule” in cold-adapted proteins may be G+C independent. As more psychrophile genomes become available, this hypothesis will be interesting to test.
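As an illustration of the ratio discussed above, the following minimal Python sketch computes Arg/(Arg + Lys) from a one-letter protein sequence and checks which arginine codons become lysine codons through a single second-position G-to-A change. The function names and the toy sequences are hypothetical and are not taken from the fosmid data; the codon sets follow the standard genetic code.

```python
# Sketch only: helper names and example sequences are illustrative assumptions.

ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}  # standard genetic code
LYS_CODONS = {"AAA", "AAG"}


def arg_lys_ratio(seq: str) -> float:
    """Return Arg/(Arg + Lys) for a one-letter amino acid sequence."""
    seq = seq.upper()
    arg = seq.count("R")
    lys = seq.count("K")
    if arg + lys == 0:
        raise ValueError("sequence contains neither Arg nor Lys")
    return arg / (arg + lys)


def second_position_g_to_a_gives_lys(codon: str) -> bool:
    """True if changing the second base G to A turns this Arg codon into Lys."""
    codon = codon.upper()
    if codon not in ARG_CODONS or codon[1] != "G":
        return False
    return codon[0] + "A" + codon[2] in LYS_CODONS


# Hypothetical toy sequences (not real proteins)
mesophile_like = "MKRRLARKAGR"
antarctic_like = "MKKRLAKKAGK"

print(arg_lys_ratio(mesophile_like))  # 4 Arg, 2 Lys
print(arg_lys_ratio(antarctic_like))  # 1 Arg, 5 Lys
print(sorted(c for c in ARG_CODONS if second_position_g_to_a_gives_lys(c)))
```

Only the AGA and AGG arginine codons reach a lysine codon by this single change; the four CGN codons would need an additional first-position substitution.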

(ii) Reductions in aliphaticity and GRAVY.

In general, it is assumed that an increase in protein flexibility is an advantage at cold temperatures (13). An increase in the protein core flexibility through reduced hydrophobic and/or nonpolar interactions may increase cold-temperature reaction rates of enzymes and increase the efficiency of other structure-dependent processes such as substrate and DNA binding (14, 17, 18). The analyses of aliphatic indices and GRAVY scores did not yield the same results. This could be a result of our analysis methods, which do not account for the class or structural regions (exposed versus buried) of putative proteins being analyzed. Presumably, there are various degrees of selective pressure on amino acid usage for the various functional classes of proteins. A more detailed analysis separating functional classes and including whether residues are buried or exposed is needed (18, 25, 34). Nonetheless, our results suggest that there are significant characteristics of amino acid usage associated with these Antarctic bacterial genome fragments. For example, the arginine repressor argR (ORFF08) from Ant4E12 encodes a signal protein that shows numerous adaptations to cold, including a lowered proline content and a deletion of 12 hydrophobic residues immediately following the disordered DNA binding domain. The tatA gene (ORFD04) from Ant4D3 also encodes an amino acid deletion of a stretch of 10 extremely hydrophobic residues, including Val, Ile, and Leu. While these deletions do not significantly change the GRAVY score of the entire protein, a comparison of the tertiary structures of these Antarctic bacterial proteins with a mesophile structure would help elucidate the importance of such hydrophobic deletions for structure and function.
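The two whole-protein measures named in this subsection can be sketched in a few lines of Python. The sketch assumes the commonly used definitions: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as Ala + 2.9 × Val + 3.9 × (Ile + Leu) in mole percent (Ikai's formula). The text above does not state which implementation was used for this study, and the toy sequences are hypothetical.

```python
# Sketch only: standard Kyte-Doolittle scale; sequences below are made up.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Raises KeyError on non-standard residue codes such as X.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical homolog and the same sequence with a hydrophobic stretch deleted
with_stretch = "MKTSVILVLIAGDEK"
deleted = "MKTSAGDEK"

for name, s in (("with stretch", with_stretch), ("deleted", deleted)):
    print(name, round(gravy(s), 3), round(aliphatic_index(s), 1))
```

In this toy case the deletion lowers both scores sharply because the removed stretch is a large fraction of a very short sequence; in a full-length protein a 10- to 16-residue deletion shifts the whole-protein averages far less, which is consistent with the observation that such deletions need not significantly change the overall GRAVY score.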

The artM gene duplication is found in many bacteria. Phylogenetic analysis of the two predicted artM protein products shows that they branch in separate clusters (see Fig. S1 in the supplemental material). One branch is highly ordered based on phylogenetic relationships within the Bacteria and within class in the case of the Proteobacteria. ORFD022 from Ant4D3 in the phylogenetically ordered branch has a 16-amino-acid deletion of hydrophobic residues. The predicted protein from ORFD023 has numerous insertions and deletions compared to its nearest neighbors, including members of the α- and γ-Proteobacteria. The deletions are almost always from very hydrophobic regions, suggesting that the structural rigidity imparted by hydrophobicity is minimized in this cold-adapted protein. Future work will identify if the two proteins are expressed.

(iii) Reductions in Glu and Asp.

The ratio (Asn + Gln)/(Asn + Gln + Glu + Asp) strongly favored the less polar residues in the Antarctic bacterial amino acid sequences. Only ORFs from Ant24C4 did not demonstrate this tendency. Otherwise, the trend was independent of the G+C content of the database. Using substitution matrices, Gianese et al. found that a Glu→Ala substitution was one of the most significant in their cold adaptation structural analysis (18). Glutamate substitutions were favored in helix structures and in exposed positions of the tertiary structure. The effect of this substitution is to reduce ion pairing, H bonding, and other electrostatic interactions. Also, the substitution of a hydrophilic residue for a hydrophobic one on the surface may destabilize the protein (7). While we did not discriminate between exposed and buried residues, the overwhelming trend in our data suggests that these polar residues are often substituted.
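The ratio used in this subsection is straightforward to compute from a sequence. The following is a minimal sketch; the function name and the example strings are hypothetical and serve only to show the arithmetic.

```python
# Sketch only: illustrative helper, not the analysis code used in the study.

def amide_to_acid_ratio(seq: str) -> float:
    """Return (Asn + Gln) / (Asn + Gln + Glu + Asp) for a protein sequence."""
    seq = seq.upper()
    amides = seq.count("N") + seq.count("Q")
    acids = seq.count("E") + seq.count("D")
    total = amides + acids
    if total == 0:
        raise ValueError("sequence contains no Asn, Gln, Glu, or Asp")
    return amides / total


# Hypothetical toy sequences
print(amide_to_acid_ratio("MNQEDK"))  # 2 amides, 2 acids
print(amide_to_acid_ratio("MNNQEK"))  # 3 amides, 1 acid
```

A higher value means that Asn and Gln make up a larger share of this residue group relative to the charged Glu and Asp residues.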

(iv) Increases in disorder.

Based on analyses of disorder in the predicted proteins of the Antarctic bacterial genome fragments, our data suggest that disordered regions of amino acids are longer in psychrotrophic and psychrophilic bacteria. Our hypothesis is that disorder increases entropy and is necessary to compensate for the structural rigidity encountered at lower temperatures. Cold temperatures place demands on access to enzymatic active sites and binding regions. These demands can be mitigated via increased disorder, decreased hydrophobicity, reduced salt bridges, and hydrophilic insertions and are common phenomena in the Antarctic bacterial amino acid sequences analyzed here.
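Comparing the lengths of disordered stretches between homologs can be sketched as follows. The sketch assumes that a per-residue disorder score is already available from some predictor and that residues scoring at or above 0.5 count as disordered; both the score lists and the threshold are illustrative assumptions, since this section does not specify them.

```python
# Sketch only: scores and threshold are hypothetical placeholders.
from itertools import groupby
from typing import List, Sequence


def disordered_stretch_lengths(scores: Sequence[float],
                               threshold: float = 0.5) -> List[int]:
    """Return the length of every contiguous run of residues scored as disordered."""
    lengths = []
    for is_disordered, run in groupby(scores, key=lambda s: s >= threshold):
        if is_disordered:
            lengths.append(sum(1 for _ in run))
    return lengths


# Hypothetical per-residue scores for two aligned homologs
mesophile_scores = [0.1, 0.6, 0.7, 0.2, 0.3, 0.2, 0.1, 0.4]
antarctic_scores = [0.1, 0.6, 0.7, 0.2, 0.9, 0.9, 0.9, 0.4]

print(disordered_stretch_lengths(mesophile_scores))
print(disordered_stretch_lengths(antarctic_scores))
```

The longest or mean stretch length per protein can then be compared between an Antarctic sequence and its mesophilic homologs.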

We have found that proteins involved in regulation and signal transduction have altered amino acid usages compared to their mesophilic homologs. This is probably due to the need for the structural plasticity necessary for signal binding or recognition, as reported in other cold adaptation genome studies (11, 34). For example, tatB (ORFD05) encodes a significant C-terminal modification to a proline-rich region following an extremely disordered putative binding site. The transcriptional regulator encoded by atoC (ORFB029) contains two binding sites that under cold, rigid conditions require enhanced flexibility and binding access. A reduced number of prolines and a more disordered region around the ATP binding site define the C-terminal section of the protein. The N-terminal helix-turn-helix DNA binding site is defined by much weaker order-disorder transitions, resulting in a higher average disorder strength than those of the mesophiles. These are definitive cold adaptation mechanisms (13).

Among the chaperones, the groES ORF (ORFE011) from Ant4D5 encoded a polar insertion of 16 amino acids that lengthened a stretch of disordered amino acids. The chaperonin genes groES and groEL were also present in the Ant29B7 fosmid. Similar to the groES gene from Ant4D5, ORFB033, the groES gene from Ant29B7, encoded a hydrophobic amino acid deletion. The disordered stretch of the Ant29B7 protein was longer than those of similar mesophile proteins, providing more flexibility.

A sulfur transferase with a rhodanese domain was found in Ant29B7 (ORF061). This protein was significantly more hydrophilic in Ant24C4 than in its mesophilic homologs due to a large 16-amino-acid deletion of hydrophobic amino acids. The N-terminal side of the protein consisted of a disordered region of 40 residues. Stretches of disordered residues were consistently found to be longer in the Antarctic bacterial amino acid sequences than in their mesophilic homologs (Table 3).

These Antarctic bacterial shotgun clones (40 to 44 kb each) have provided a diverse suite of genomic information about central metabolism, environmental sensors, stress responses, cellular transport, and amino acid modifications that demonstrate cold adaptation. Furthermore, our approach has enabled us to study both environmentally relevant organisms and unique members of the community that have neither closely related cultivated relatives nor available genome sequence information (with the exception of Ant24C4, which appears to fall in the Roseobacter clade). Amino acid analysis of the coding regions from these psychrophilic/psychrotrophic marine bacteria spanning four phyla revealed pervasive amino acid modifications characteristic of cold adaptability or decreased structural rigidity. If these modifications are ubiquitous in Antarctic marine organisms, we expect to find similar adaptations in the deep-sea microbes that inhabit the >75% of the ocean that is below 4°C.


This work was supported by the NSF LExEn program and the NSF Office of Polar Programs (OPP0085435 to A.E.M.).

We are especially indebted to our collaborators E. Rubin and associates at the Joint Genome Institute for DNA sequencing services. We thank the DRI IT staff, including S. Liu and P. Neeley, and B. Beck at the Nevada Center for Bioinformatics for providing direction in protein structure analysis. We thank Mihailo Kaplarevic and Garrett Taylor for sharing their programming and database expertise. We also thank J. Campbell and Integrated Genomics, Chicago, IL, for access to Polaribacter filamentus genome data and the Joint Genome Institute for providing DNA sequencing.


Supplemental material for this article may be found at http://aem.asm.org/.


1. Abell, G. C. J., and J. Bowman. 2005. Ecological and biogeographic relationships of class Flavobacteria in the Southern Ocean. FEMS Microbiol. Ecol. 51:265-277. [PubMed]
2. Bano, N., and J. T. Hollibaugh. 2002. Phylogenetic composition of bacterioplankton assemblages from the Arctic Ocean. Appl. Environ. Microbiol. 68:505-518. [PMC free article] [PubMed]
3. Béjà, O., L. Aravind, E. V. Koonin, M. T. Suzuki, A. Hadd, L. P. Nguyen, S. Jovanovich, C. M. Gates, R. A. Feldman, J. L. Spudich, E. N. Spudich, and E. F. DeLong. 2000. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289:1902-1906. [PubMed]
4. Béjà, O., E. V. Koonin, L. Aravind, L. T. Taylor, H. Seitz, J. L. Stein, D. C. Bensen, R. A. Feldman, R. V. Swanson, and E. F. DeLong. 2002. Comparative genomic analysis of archaeal genotypic variants in a single population and in two different oceanic provinces. Appl. Environ. Microbiol. 68:335-345. [PMC free article] [PubMed]
5. Béjà, O., M. T. Suzuki, E. V. Koonin, L. Aravind, A. Hadd, L. P. Nguyen, R. Villacorta, M. Amjadi, C. Garrigues, S. B. Jovanovich, R. A. Feldman, and E. F. DeLong. 2000. Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ. Microbiol. 2:516-529. [PubMed]
6. Courtois, S., C. M. Cappellano, M. Ball, F. X. Francou, P. Normand, G. Helynck, A. Martinez, S. J. Kolvek, J. Hopke, M. S. Osburne, P. R. August, R. Nalin, M. Guerineau, P. Jeannin, P. Simonet, and J. L. Pernodet. 2003. Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl. Environ. Microbiol. 69:49-55. [PMC free article] [PubMed]
7. Creighton, T. E. 1994. The energetic ups and downs of protein folding. Nat. Struct. Biol. 1:135-138. [PubMed]
8. de la Torre, J. R., L. M. Christianson, O. Béjà, M. T. Suzuki, D. M. Karl, J. Heidelberg, and E. F. DeLong. 2003. Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc. Natl. Acad. Sci. USA 100:12830-12835. [PMC free article] [PubMed]
9. DeLong, E. F., D. G. Franks, and A. L. Alldredge. 1993. Phylogenetic diversity of aggregate-attached vs free-living marine bacterial assemblages. Limnol. Oceanogr. 38:924-934.
10. Ducklow, H. 2000. Bacterial production and biomass in the oceans, p. 542. In D. L. Kirchman (ed.), Microbial ecology of the oceans. Wiley-Liss, New York, N.Y.
11. Dunker, A. K., C. J. Brown, J. D. Lawson, L. M. Iakoucheva, and Z. Obradovic. 2002. Intrinsic disorder and protein function. Biochemistry 41:6573-6582. [PubMed]
12. Feller, G. 2003. Molecular adaptations to cold in psychrophilic enzymes. Cell. Mol. Life Sci. 60:648-662. [PubMed]
13. Feller, G., J. L. Arpigny, E. Narinx, and C. Gerday. 1997. Molecular adaptations of enzymes from psychrophilic organisms. Comp. Biochem. Physiol. 118A:495-499.
14. Feller, G., and C. Gerday. 2003. Psychrophilic enzymes: hot topics in cold adaptation. Nat. Rev. Microbiol. 1:200-208. [PubMed]
15. Feller, G., and C. Gerday. 1997. Psychrophilic enzymes: molecular basis of cold adaptation. Cell. Mol. Life Sci. 53:830-841. [PubMed]
16. Fuhrman, J. A., and F. Azam. 1980. Bacterioplankton secondary production estimates for coastal waters of British Columbia, Antarctica, and California. Appl. Environ. Microbiol. 39:1085-1095. [PMC free article] [PubMed]
17. Gerday, C., M. Aittaleb, J. L. Arpigny, E. Baise, J. P. Chessa, G. Garsoux, I. Petrescu, and G. Feller. 1997. Psychrophilic enzymes: a thermodynamic challenge. Biochim. Biophys. Acta 1342:119-131. [PubMed]
18. Gianese, G., P. Argos, and S. Pascarella. 2001. Structural adaptation of enzymes to low temperatures. Protein Eng. 14:141-148. [PubMed]
19. Hallam, S. J., N. Putnam, C. M. Preston, J. C. Detter, D. Rokhsar, P. M. Richardson, and E. F. DeLong. 2004. Reverse methanogenesis: testing the hypothesis with environmental genomics. Science 305:1457-1462. [PubMed]
20. Haney, P. J., J. H. Badger, G. L. Buldak, C. I. Reich, C. R. Woese, and G. J. Olsen. 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. USA 96:3578-3583. [PMC free article] [PubMed]
21. Hill, J. E., S. L. Penny, K. G. Crowell, S. H. Goh, and S. M. Hemmingsen. 2004. cpnDB: a chaperonin sequence database. Genome Res. 14:1669-1675. [PMC free article] [PubMed]
22. Karl, D. M. 1993. Microbial processes in the Southern Ocean, p. 634. In E. I. Friedmann (ed.), Antarctic microbiology. Wiley-Liss, New York, N.Y.
23. Lopez-Garcia, P., A. Lopez-Lopez, D. Moreira, and F. Rodriguez-Valera. 2001. Diversity of free-living prokaryotes from a deep-sea site at the Antarctic polar front. FEMS Microbiol. Ecol. 36:193-202. [PubMed]
24. Martin, J. H., R. M. Gordon, and S. E. Fitzwater. 1990. Iron in Antarctic waters. Nature 345:156-158.
25. Methe, B. A., K. E. Nelson, J. W. Deming, B. Momen, E. Melamud, X. Zhang, J. Moult, R. Madupu, W. C. Nelson, R. J. Dodson, L. M. Brinkac, S. C. Daugherty, A. S. Durkin, R. T. DeBoy, J. F. Kolonay, S. A. Sullivan, L. Zhou, T. M. Davidsen, M. Wu, A. L. Huston, M. Lewis, B. Weaver, J. F. Weidman, H. Khouri, T. R. Utterback, T. V. Feldblyum, and C. M. Fraser. 2005. The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc. Natl. Acad. Sci. USA 102:10913-10918. [PMC free article] [PubMed]
26. Moran, M. A., A. Buchan, J. M. Gonzalez, J. F. Heidelberg, W. B. Whitman, R. P. Kiene, J. R. Henriksen, G. M. King, R. Belas, C. Fuqua, L. Brinkac, M. Lewis, S. Johri, B. Weaver, G. Pai, J. A. Eisen, E. Rahe, W. M. Sheldon, W. Y. Ye, T. R. Miller, J. Carlton, D. A. Rasko, I. T. Paulsen, Q. H. Ren, S. C. Daugherty, R. T. Deboy, R. J. Dodson, A. S. Durkin, R. Madupu, W. C. Nelson, S. A. Sullivan, M. J. Rosovitz, D. H. Haft, J. Selengut, and N. Ward. 2004. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432:910-913. [PubMed]
27. Moreira, D., F. Rodriguez-Valera, and P. Lopez-Garcia. 2004. Analysis of a genome fragment of a deep-sea uncultivated group II euryarchaeote containing 16S rDNA, a spectinomycin-like operon and several energy metabolism genes. Environ. Microbiol. 6:959-969. [PubMed]
28. Murray, A. E., J. T. Hollibaugh, and C. Orrego. 1996. Phylogenetic compositions of bacterioplankton from two California estuaries compared by denaturing gradient gel electrophoresis of 16S rDNA fragments. Appl. Environ. Microbiol. 62:2676-2680. [PMC free article] [PubMed]
29. Murray, A. E., C. M. Preston, R. Massana, L. T. Taylor, A. Blakis, K. Wu, and E. F. DeLong. 1998. Seasonal and spatial variability of bacterial and archaeal assemblages in the coastal waters near Anvers Island, Antarctica. Appl. Environ. Microbiol. 64:2585-2595. [PMC free article] [PubMed]
30. Muyzer, G., E. C. Dewaal, and A. G. Uitterlinden. 1993. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl. Environ. Microbiol. 59:695-700. [PMC free article] [PubMed]
31. Rondon, M. R., P. R. August, A. D. Bettermann, S. F. Brady, T. H. Grossman, M. R. Liles, K. A. Loiacono, B. A. Lynch, I. A. MacNeil, C. Minor, C. L. Tiong, M. Gilman, M. S. Osburne, J. Clardy, J. Handelsman, and R. M. Goodman. 2000. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66:2541-2547. [PMC free article] [PubMed]
32. Russell, N. J. 1990. Cold adaptation of microorganisms. Philos. Trans. R. Soc. Lond. B 326:595-611. [PubMed]
33. Russell, N. J. 2000. Toward a molecular understanding of cold activity of enzymes from psychrophiles. Extremophiles 4:83-90. [PubMed]
34. Saunders, N. F. W., T. Thomas, P. M. G. Curmi, J. S. Mattick, E. Kuczek, R. Slade, J. Davis, P. D. Franzmann, D. Boone, K. Rusterholtz, R. Feldman, C. Gates, S. Bench, K. Sowers, K. Kadner, A. Aerts, P. Dehal, C. Detter, T. Glavina, S. Lucas, P. Richardson, F. Larimer, L. Hauser, M. Land, and R. Cavicchioli. 2003. Mechanisms of thermal adaptation revealed from the genomes of the Antarctic archaea Methanogenium frigidum and Methanococcoides burtonii. Genome Res. 13:1580-1588. [PMC free article] [PubMed]
35. Schleper, C., E. F. DeLong, C. M. Preston, R. A. Feldman, K. Y. Wu, and R. V. Swanson. 1998. Genomic analysis reveals chromosomal variation in natural populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J. Bacteriol. 180:5003-5009. [PMC free article] [PubMed]
36. Stein, J. L., T. L. Marsh, K. Y. Wu, H. Shizuya, and E. F. DeLong. 1996. Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J. Bacteriol. 178:591-599. [PMC free article] [PubMed]
37. Suzuki, M. T., and E. F. DeLong. 2002. Marine prokaryote diversity, p. 209-234. In J. T. Staley and A. L. Reysenbach (ed.), Biodiversity of microbial life. Wiley-Liss Inc., New York, N.Y.
38. Treusch, A. H., A. Kletzin, G. Raddatz, T. Ochsenreiter, A. Quaiser, G. Meurer, S. C. Schuster, and C. Schleper. 2004. Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea. Environ. Microbiol. 6:970-980. [PubMed]
39. Tyson, G. W., J. Chapman, P. Hugenholtz, E. E. Allen, R. J. Ram, P. M. Richardson, V. V. Solovyev, E. M. Rubin, D. S. Rokhsar, and J. F. Banfield. 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37-43. [PubMed]
40. Venter, J. C., K. Remington, J. F. Heidelberg, A. L. Halpern, D. Rusch, J. A. Eisen, D. Y. Wu, I. Paulsen, K. E. Nelson, W. Nelson, D. E. Fouts, S. Levy, A. H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R. Parsons, H. Baden-Tillson, C. Pfannkoch, Y. H. Rogers, and H. O. Smith. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74. [PubMed]
41. Zhang, H., Y. Sekiguchi, S. Hanada, P. Hugenholtz, H. Kim, Y. Kamagata, and K. Nakamura. 2003. Gemmatimonas aurantiaca gen. nov., sp. nov., a gram-negative, aerobic, polyphosphate-accumulating micro-organism, the first cultured representative of the new bacterial phylum Gemmatimonadetes phyl. nov. Int. J. Syst. Evol. Microbiol. 53:1155-1163. [PubMed]

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)