PLoS Genet. Dec 2008; 4(12): e1000310.
Published online Dec 19, 2008. doi:  10.1371/journal.pgen.1000310
PMCID: PMC2597717

Genetic and Computational Identification of a Conserved Bacterial Metabolic Module

Michael T. Laub, Editor

Abstract

We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the α-proteobacterium Caulobacter crescentus and used this module to illustrate a schema for the propagation of pathway-level annotation across bacterial genera. Applying comprehensive forward and reverse genetic methods and genome-wide transcriptional analysis, we (1) confirmed the presence of genes involved in catabolism of the abundant environmental sugar myo-inositol, (2) defined an operon encoding an ABC-family myo-inositol transmembrane transporter, and (3) identified a novel myo-inositol regulator protein and cis-acting regulatory motif that control expression of genes in this metabolic module. Despite being encoded from non-contiguous loci on the C. crescentus chromosome, these myo-inositol catabolic enzymes and transporter proteins form a tightly linked functional group in a computationally inferred network of protein associations. Primary sequence comparison was not sufficient to confidently extend annotation of all components of this novel metabolic module to related bacterial genera. Consequently, we implemented the Graemlin multiple-network alignment algorithm to generate cross-species predictions of genes involved in myo-inositol transport and catabolism in other α-proteobacteria. Although the chromosomal organization of genes in this functional module varied between species, the upstream regions of genes in this aligned network were enriched for the same palindromic cis-regulatory motif identified experimentally in C. crescentus. Transposon disruption of the operon encoding the computationally predicted ABC myo-inositol transporter of Sinorhizobium meliloti abolished growth on myo-inositol as the sole carbon source, confirming our cross-genera functional prediction. 
Thus, we have defined regulatory, transport, and catabolic genes and a cis-acting regulatory sequence that form a conserved module required for myo-inositol metabolism in select α-proteobacteria. Moreover, this study describes a forward validation of gene-network alignment, and illustrates a strategy for reliably transferring pathway-level annotation across bacterial species.

Author Summary

More than 1,000 microbial genomes have been sequenced to date, containing millions of predicted genes. While the broad functional category of many of these individual genes can be reliably predicted using sequence homology, sequence information alone is often insufficient to assign a gene a specific cellular function. Closing this gap in our understanding of gene function will require tremendous experimental effort over a broad phylogenetic cross-section of model microbes, along with computational methods for high-confidence extrapolation of functional information from model organisms to other species. Here, we report the experimental identification of a novel genetic module in the model α-proteobacterium C. crescentus that controls transport and catabolism of the abundant environmental sugar myo-inositol. A combination of computational methods for probabilistic protein-network assignment and gene-network alignment was required to reliably extend the annotation of genes in this metabolic module to related bacterial genera. Our computational predictions of the operon encoding the ABC myo-inositol transporter and an essential enzyme for myo-inositol catabolism in S. meliloti were validated experimentally, demonstrating the feasibility of our method for high-confidence propagation of pathway-level annotation across species.

Introduction

Inositol, or cyclohexanehexol, is one of the most abundant carbohydrates in freshwater and terrestrial ecosystems [1]. Phosphorylated and lipidated derivatives of inositol serve as important signaling molecules in eukaryotic cells and are critical components of cellular membranes. Among prokaryotes, several species of cyanobacteria, eubacteria and archaea are able to synthesize and derivatize inositol [2]. These molecules serve functional roles as antioxidants, osmolytes, cell membrane components, and as carbon storage substrates [3],[4]. Inositol can also serve as the sole carbon and energy source for many bacterial species [5]–[9] and, in its phosphorylated forms, as a source of phosphorus [10]. Thus inositol is an important biomolecule that is involved in multiple aspects of eukaryotic and prokaryotic cellular physiology and is also a critical nutrient and energy source positioned at the intersection of environmental carbon and phosphorus cycles [1].

While cells can derivatize inositol into many different chemical species, the unmodified myo form of inositol (cis-1,2,3,5-trans-4,6-cyclohexanehexol) is among the most abundant species in the environment [1]. The myo-inositol degradation pathway has been characterized biochemically in Klebsiella aerogenes [5],[11]–[13] and Bacillus subtilis [14]. In this pathway, seven proteins convert myo-inositol to CO2, acetyl CoA and dihydroxy-acetone phosphate (Figure 1). Structural and regulatory genes required for myo-inositol catabolism have been identified and characterized in several gram-positive species, including B. subtilis [7],[14], Clostridium perfringens [8], Corynebacterium glutamicum [9], and Lactobacillus casei [15], and in the gram-negative bacteria Rhizobium leguminosarum bv. viciae [6],[16], Sinorhizobium meliloti [3] and Sinorhizobium fredii [17]. Gram-positive species generally exhibit complete and contiguous catabolic operons that are adjacent to genes encoding myo-inositol transporters of the major facilitator superfamily; expression of these genes is controlled by transcriptional regulators of the DeoR or LacI families [7]–[9],[18]. Among gram-negative species, genes involved in myo-inositol metabolism are more dispersed across the chromosome. This lack of chromosomal co-location, together with the difficulty of assigning function to transporter and regulatory proteins using sequence homology alone [19],[20], has made comprehensive identification of the myo-inositol genetic modules more difficult in these species. In this study we have defined the structural and regulatory components of a genetic module controlling myo-inositol transport and catabolism in the gram-negative α-proteobacterium Caulobacter crescentus, and reliably extended this experimental functional annotation to other bacterial genera using a combination of computational network prediction and alignment methods.

Figure 1
Biochemical pathway of myo-inositol degradation in C. crescentus.

Materials and Methods

Strain Construction and Culture Conditions

C. crescentus strain CB15N (NA1000) [21] and strains derived from it were grown in peptone/yeast extract (PYE) or M2 minimal broth [22]. Minimal broth was supplemented with either 0.2% (w/v) myo-inositol (M2I) or 0.2% (w/v) glucose (M2G). Directed deletion strains were constructed by ligating approximately 500 base pair regions flanking the 5′ and 3′ regions of the gene to be deleted into the suicide plasmid pNPTS138 (see Table 1) using the EcoRI and HindIII restriction sites. pNPTS138 carries the nptI gene to select for single integrants on kanamycin and the sacB gene for counterselection on sucrose. The pNPTS138-derived deletion plasmids were transformed into CB15N by electroporation. Initial selection was on 25 µg/ml kanamycin, which was followed by overnight growth in nonselective media and then plating on 3% sucrose to select for cells that had undergone a second crossover event to excise the gene. PCR was used to confirm chromosomal deletions. Cloned fragments to generate pNPTS138 deletion plasmids had the following chromosomal coordinates: ibpA upstream = 955,261–956,039; ibpA downstream = 956,772–957,359. iatA upstream = 956,357–956,940; iatA downstream = 958,429–959,014. iatP upstream = 957,949–958,539; iatP downstream = 959,410–960,005. iolR upstream = 1,442,941–1,443,408; iolR downstream = 1,444,244–1,444,741. All gene deletions were in-frame. The deletion of iolR (CC1297) left the first and last 6 codons intact. The deletion of ibpA (CC0859) left the first 45 and last 38 codons intact. The deletion of iatA (CC0860) left the first 12 and last 9 codons intact. The deletion of iatP (CC0861) left the first 30 and last 13 codons intact. A transporter complementation plasmid was generated by cloning the full transporter locus plus promoter region into the KpnI and NdeI sites of the replicating plasmid pMT630 [23]; cloned chromosomal coordinates = 955,261–960,056.

Table 1
Strains.

Sinorhizobium meliloti Rm2011 and strains derived from it were obtained from the lab of Anke Becker (Bielefeld University, Germany) [24]. S. meliloti was grown in either LB or GTS minimal medium [25] supplemented with 0.2% glucose (GTS-G) or 0.2% inositol (GTS-I) as the sole carbon source. All strains used in this study are listed in Table 1.

Transposon Screen for C. crescentus Strains Deficient in Growth on myo-Inositol

A library of ≈16,000 individual C. crescentus CB15N mutant strains carrying either the Mariner-based Himar-1 transposon [26], the Mu-based HyperMu transposon (Epicentre, Madison, WI), or the Tn5-derived EZ-Tn5 transposon (Epicentre, Madison, WI) was generated and stored in 96-well format (Pritchard, Matteson and Viollier, unpublished). This transposon library was replica stamped from the 96-well plates onto M2 agar supplemented with either 0.2% (w/v) myo-inositol (M2I), 0.1% cellobiose (M2C) or 0.2% glucose (M2G). Strains that grew on M2C and M2G, but not on M2I, were considered to have inositol-conditional mutations. Transposon insertion sites in C. crescentus Himar-1, HyperMu, and EZ-Tn5 mutant strains deficient for growth on M2I were mapped by isolating chromosomal DNA, digesting with HinPI for 10 minutes at 37°C, ligating the digested genomic fragments into circles using T4 DNA ligase, and transforming 1 µl of this ligation reaction into electrocompetent E. coli EC100D pir-116 cells (Epicentre, Madison, WI). These transposons all carry an R6K origin that replicates in a pir+ strain of E. coli. The circularized transposon plasmids were then isolated from E. coli and silica-column purified (Novagen, Madison, WI). The location of the transposon insertion was determined via a single primer sequencing extension reaction from the purified, circularized transposon plasmids. The oligos used to map these transposons are as follows: Himar-1, GATATTGCTGAAGAGCTTGGCGGCGAA; EZ-Tn5, CTACCCTGTGGAACACCTACATCT; HyperMu, AGAGATTTTGAGACAGGATCCG.

DNA Microarray Analysis of Genes Regulated by Growth on myo-Inositol Relative to Glucose

Wild type C. crescentus CB15N cells were grown in either M2 minimal medium supplemented with 0.2% (w/v) glucose (M2G) or M2 supplemented with 0.2% myo-inositol (M2I) to OD660 of 0.3–0.4. 5 ml of 4 replicate cultures (for each carbon condition) were spun down at 10,000× g for 30 seconds, the supernatant was removed, and the cell pellets were flash frozen in liquid nitrogen. RNA was isolated from these cells by incubating in 1 ml of Trizol (Invitrogen, Carlsbad, CA) at 65°C for 10 minutes, adding chloroform, vortexing, spinning, and extracting the aqueous layer. Nucleic acid in the aqueous layer was isopropanol precipitated overnight at −80°C followed by a 30 minute centrifugation at 16,000× g. The ethanol-washed and air-dried nucleic acid pellet was resuspended in 50 µl of nuclease-free water (IDT, Coralville, IA). 1 µl of RNase-free DNase I (Ambion, Austin, TX) was added to the sample and incubated at room temperature for two hours to remove any residual DNA. The nucleic acid in this digested sample was then acid phenol-chloroform (Ambion, Austin, TX) extracted, ethanol precipitated at −80°C overnight, and centrifuged at 16,000× g to produce a DNA-free RNA pellet. RNA quality was assessed via agarose gel electrophoresis and RNA concentration determined by UV spectrophotometry using a Shimadzu UV-1650 (Kyoto, Japan).

Labeled indocarbocyanine-dCTP (Cy3) and indodicarbocyanine-dCTP (Cy5) cDNA was generated from 20 µg of total RNA by reverse transcription with Superscript II reverse transcriptase (Invitrogen, Carlsbad, CA) using 1 µg of random hexamer primers (Invitrogen, Carlsbad, CA). Two samples of cDNA from each RNA type (M2I and M2G) were Cy3 labeled and two were Cy5 labeled. Dye-swapped cDNA from the remaining two samples was generated in order to minimize dye bias in the microarray analysis. Paired Cy3 and Cy5 labeled cDNA from the M2G and M2I samples were hybridized onto spotted DNA oligo arrays using a protocol previously described [27]. After hybridization and washing, the arrays were scanned with a GenePix 4000B scanner (Axon Instruments). Scanned spots were converted to ratios (red/green) with GenePix Pro 5.0 software. Expression ratio data (glucose/inositol) for the four biological replicates were normalized by median centering and analyzed using the Significance Analysis for Microarrays (SAM) package [28]. Genes that showed a 2-fold or greater mean expression change (either up or down in myo-inositol relative to glucose) and that were determined to be significant in SAM using a 5% false discovery cutoff are included in Table S1. DNA microarray data have been deposited in the Gene Expression Omnibus (GEO) database (http://ncbi.nlm.nih.gov/geo) under accession number GSE12414.
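The median centering and 2-fold filter described above can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used in this study: it assumes each array is represented as a list of log2 expression ratios, the function names and toy values are hypothetical, and the SAM significance test is not reproduced.

```python
import math
from statistics import mean, median

def median_center(log_ratios):
    """Subtract the array-wide median so the typical gene has a log2 ratio of 0."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]

def fold_change_hits(arrays, gene_ids, min_fold=2.0):
    """Return {gene: mean centered log2 ratio} for genes meeting the fold cutoff.

    arrays: one list of log2 ratios per replicate array, in gene_ids order.
    """
    centered = [median_center(a) for a in arrays]
    threshold = math.log2(min_fold)  # a 2-fold change is 1 unit on the log2 scale
    hits = {}
    for i, gene in enumerate(gene_ids):
        avg = mean(a[i] for a in centered)
        if abs(avg) >= threshold:  # keep changes in either direction
            hits[gene] = avg
    return hits

# Hypothetical toy data: four replicate arrays, five genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
arrays = [
    [0.1, 2.3, -0.1, 0.0, -1.5],
    [0.3, 2.5, 0.1, 0.2, -1.3],
    [0.0, 2.0, -0.2, 0.1, -1.2],
    [0.2, 2.4, 0.0, 0.2, -1.4],
]
hits = fold_change_hits(arrays, genes)
print(sorted(hits))
```

In practice the fold-change filter would be combined with a significance test such as SAM, since a large mean ratio alone does not control the false discovery rate.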

Promoter Activity Assays

To measure the promoter activities of inositol-regulated genes/operons, we first PCR-amplified promoter regions of three genes. The idhA promoter region extends from chromosomal coordinate 1,443,231 to 1,443,769; the iolC promoter region extends from coordinate 1,443,840 to 1,444,387; the ibpA promoter extends from coordinate 955,627 to 955,895. These fragments were digested and cloned into the reporter plasmid pRKlac290 [29] using the EcoRI and HindIII sites. The resulting promoter-lacZ transcriptional fusion plasmids were introduced into C. crescentus CB15N or CB15NΔiolR by tri-parental conjugation using the E. coli helper strain FC3, which carries the pRK600 plasmid [30].

β-galactosidase activity from the LacZ-promoter reporter strains was determined colorimetrically using cells in log phase (0.1–0.3 OD660) at 30°C; Z-buffer (60 mM Na2HPO4, 60 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4) and an excess of o-nitrophenyl-β-D-galactopyranoside were added to chloroform-permeabilized cells and absorbance was measured at 420 nm on a Spectronic Genesys 20 Spectrophotometer (ThermoFisher Scientific, Waltham, MA). Two palindromic consensus motifs in the iolC promoter, which we also identified upstream of several myo-inositol-regulated genes, were mutated by PCR using mismatched oligos. The site-directed mutagenesis PCR was followed by 1 hour of DpnI digestion, and 1 µl of the digested reaction was transformed into electrocompetent E. coli TOP10 cells (Invitrogen, Carlsbad, CA). pCR-BluntII plasmids (Invitrogen, Carlsbad, CA) containing the mutant iolC promoters were amplified and purified, and the mutated iolC promoters were excised with EcoRI and HindIII and sub-cloned into pRKlac290 to generate mutated PiolC-lacZ transcriptional fusions. The motif that is positioned between 104 and 119 bases upstream of the predicted start codon of iolC was mutated from TGGACCATATGTTCCA to TGTACCATATGTACAA. The motif positioned between 45 and 60 bases upstream of the predicted iolC start codon was mutated from TGGAATATGCGTTACA to TTGCATATGCGGTACA.

Growth Curves

Cell growth in different media types was measured in triplicate in bulk culture grown in 13 mm glass tubes in an Infors tube shaker (ATR Biotech, Laurel, MD), at 30°C, 220 rpm. Density measurements for individual cultures were taken hourly up to 0.3 OD660 in a Genesys20 Spectrophotometer (ThermoFisher Scientific, Waltham, MA) and the growth rate was determined by fitting the data to an exponential growth equation:

$$y = y_0 e^{kt}$$

in Prism (GraphPad Software, San Diego), where y0 is the initial cell density, k is growth rate, and t is time.
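The exponential fit and the doubling times derived from it can be illustrated with a short sketch. The fits in this study were performed in Prism; the code below substitutes a simple least-squares fit of ln(y) against t, which is an assumption made for illustration, and the function names and synthetic data are hypothetical.

```python
import math

def fit_exponential(times, densities):
    """Fit y = y0 * exp(k * t) by least squares on ln(y); returns (y0, k)."""
    n = len(times)
    logs = [math.log(y) for y in densities]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    k = sxy / sxx                        # slope of ln(y) versus t
    y0 = math.exp(l_mean - k * t_mean)   # intercept, back-transformed
    return y0, k

def doubling_time(k):
    """Time for the density to double at growth rate k."""
    return math.log(2) / k

# Synthetic hourly readings: initial OD 0.02, doubling every 2 hours.
times = [0, 1, 2, 3, 4, 5]
true_k = math.log(2) / 2.0
densities = [0.02 * math.exp(true_k * t) for t in times]

y0, k = fit_exponential(times, densities)
print(round(y0, 4), round(doubling_time(k), 2))
```

A log-linear fit weights low-density readings more heavily than a direct nonlinear fit does, so the two approaches can differ slightly on noisy data.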

Computational Protein Network Integration

For each of 305 sequenced prokaryotic genomes, we assembled a battery of different predictors of protein association including coexpression, coinheritance, colocation, and coevolution. We formulated the network integration problem as a binary classifier, where the goal is to distinguish functionally linked protein pairs (L = 1) from non-interacting pairs (L = 0). In this formulation, a vector of interaction predictors is the input to a binary classifier function, which returns the integrated probability that two proteins are functionally linked. To calculate the mapping between raw interaction data and integrated probabilities, the classifier function is trained on a set of known interactions. Applying this classifier to predict interaction probabilities for all protein pairs in a genome yields a probabilistic protein interaction network.

Specifically, we generated the training set of known interactions by using KEGG [31] classifications of individual proteins to produce an annotation of protein pairs. For each pair we recorded if the proteins had overlapping annotations (L = 1), if both were in entirely non-overlapping KEGG categories (L = 0), or if either protein lacked an annotation code or was marked as unknown (L = ?). We also calculated four functional genomic and experimental predictors: 1) coexpression; the Pearson correlation between genes in publicly available DNA microarray expression data, 2) coinheritance; the Pearson correlation between protein phylogenetic profiles [32], 3) coevolution; the Pearson correlation between protein distance matrices, taken elementwise [33], and 4) colocation; the average chromosomal distance between ORFs. Each of these predictors is defined on a pair of proteins rather than an individual protein and can be arranged in a four-dimensional vector:

$$\mathbf{E} = (E_1, E_2, E_3, E_4)$$
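As an illustration of one component of this vector, the coinheritance predictor can be sketched as a Pearson correlation between two phylogenetic profiles. The sketch assumes each profile is a presence/absence (1/0) vector over the same ordered set of genomes; the profiles shown are invented for illustration, and returning 0 for a constant profile is our own convention.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has undefined correlation; treat as no evidence.
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical profiles: 1 = ortholog present in that genome, 0 = absent.
profile_a = [1, 1, 0, 1, 0, 0, 1, 1]
profile_b = [1, 1, 0, 1, 0, 0, 1, 0]  # inherited almost identically to a
profile_c = [0, 0, 1, 0, 1, 1, 0, 0]  # exact complement of a

print(round(pearson(profile_a, profile_b), 3))
print(round(pearson(profile_a, profile_c), 3))
```

The same function applies to the coexpression predictor if the vectors are expression values across microarray conditions.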

It can be shown empirically that the distribution of functionally linked protein pairs is shifted relative to the distribution of functionally unlinked pairs [34]. Intuitively, this means that each genomic evidence type is a predictor of protein functional interaction. We can combine these predictors to obtain the integrated probability of protein interaction via Bayes' rule [35].

In practice, the quotient formula for the Bayesian posterior probability is quite sensitive to fluctuations in the denominator. To deal with this, we used bootstrap aggregation [36] to smooth the posterior as follows:

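$$P(L=1 \mid E_1,E_2,E_3,E_4) = \frac{1}{M}\sum_{m=1}^{M} \frac{P_m(E_1,E_2,E_3,E_4 \mid L=1)\,P(L=1)}{P_m(E_1,E_2,E_3,E_4 \mid L=1)\,P(L=1) + P_m(E_1,E_2,E_3,E_4 \mid L=0)\,P(L=0)}$$

where M is the number of bootstrap replicates and P_m denotes a likelihood estimated from the m-th replicate.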

Thus, for each pair of proteins, we have a value P(L = 1|E1,E2,E3,E4) which represents the integrated probability of protein interaction over several data types.
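The bootstrap-smoothed posterior can be sketched in a few lines. This is an illustrative toy and not the implementation described in [34]: it assumes evidence scores rescaled to [0, 1], histogram likelihood estimates with a pseudocount, and conditional independence between evidence types; all function names, parameters, and data are hypothetical.

```python
import random

def bin_index(value, n_bins):
    """Map an evidence score in [0, 1] to a histogram bin."""
    return min(int(value * n_bins), n_bins - 1)

def likelihood_table(samples, n_bins, pseudocount=1.0):
    """Estimate P(bin | L) for each evidence type from training vectors."""
    dims = len(samples[0])
    counts = [[pseudocount] * n_bins for _ in range(dims)]
    for vec in samples:
        for d, v in enumerate(vec):
            counts[d][bin_index(v, n_bins)] += 1
    return [[c / sum(row) for c in row] for row in counts]

def posterior(vec, pos_tab, neg_tab, prior, n_bins):
    """Bayes' rule, assuming evidence types are conditionally independent."""
    p_pos, p_neg = prior, 1.0 - prior
    for d, v in enumerate(vec):
        b = bin_index(v, n_bins)
        p_pos *= pos_tab[d][b]
        p_neg *= neg_tab[d][b]
    return p_pos / (p_pos + p_neg)

def bagged_posterior(vec, linked, unlinked, n_rep=50, n_bins=5, seed=0):
    """Average the posterior over n_rep bootstrap resamples of the training set."""
    rng = random.Random(seed)
    prior = len(linked) / (len(linked) + len(unlinked))
    total = 0.0
    for _ in range(n_rep):
        pos = [rng.choice(linked) for _ in linked]      # resample with replacement
        neg = [rng.choice(unlinked) for _ in unlinked]
        total += posterior(vec, likelihood_table(pos, n_bins),
                           likelihood_table(neg, n_bins), prior, n_bins)
    return total / n_rep

# Hypothetical training pairs: linked pairs tend to have higher scores.
data_rng = random.Random(1)
linked = [[data_rng.uniform(0.5, 1.0) for _ in range(4)] for _ in range(40)]
unlinked = [[data_rng.uniform(0.0, 0.6) for _ in range(4)] for _ in range(160)]

high = bagged_posterior([0.9, 0.8, 0.9, 0.7], linked, unlinked)
low = bagged_posterior([0.1, 0.2, 0.1, 0.3], linked, unlinked)
print(high > low)
```

Averaging over resamples damps the effect of sparsely populated histogram bins, which is where the quotient in Bayes' rule is most unstable.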

Additional computational details underlying this protein network prediction strategy are discussed in Srinivasan, et al. [34]. A web interface for this functional networking database containing predicted networks for 305 bacterial species is available at http://networks.stanford.edu.

Multiple Network Alignment

Network alignment is a systems-biology analog of sequence alignment that compares protein association networks between different species in an effort to identify conserved functional modules. Such modules are sets of proteins that have both conserved primary sequences and conserved pairwise statistical associations between species. For automated network alignment, we used the experimentally- and computationally-defined myo-inositol network from C. crescentus as a query module. This module was used to conduct query-to-network alignment searches across computationally-predicted protein interaction networks of 5 related α-proteobacterial species [34]; these interaction networks had been previously defined using the statistical protein network prediction strategy outlined above. The bacterial species included in this alignment were Sinorhizobium meliloti, Mesorhizobium loti, Brucella melitensis, Agrobacterium tumefaciens, and Bradyrhizobium japonicum. Initial alignment identified the best match to the query in each protein interaction network.

Specifically, we used the Graemlin algorithm [37] to perform automated cross-species alignment. Graemlin incorporates ideas from sequence alignment to perform query-to-network alignment accurately and efficiently. To search multiple networks for matches to a query module, Graemlin first aligns the query module to the evolutionarily closest network by identifying a high scoring pair of proteins within the query and network and aligning them. Then, Graemlin extends the alignment by aligning the pair of proteins that will increase the score of the alignment the most, continuing until it cannot further increase the score of the alignment. The score for aligning a pair of proteins is higher when the proteins are 1) sequence similar and 2) connected to many proteins in the current alignment. Once Graemlin aligns the query module to the evolutionarily closest (i.e. highest scoring) network, it aligns the resulting alignment to the next evolutionarily closest network. To perform this alignment it uses the same algorithm that it uses to perform the first alignment, with an adjusted scoring function [37]. Graemlin continues performing alignments in this fashion until it has aligned the query to every network. To date, Graemlin is the only algorithm capable of aligning a query module to more than three networks. Our benchmarks have shown that when aligning a query module to a single network, this method of alignment is more accurate and efficient than existing network alignment algorithms [37].
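The seed-and-extend idea described above can be conveyed with a toy greedy aligner. This sketch is not Graemlin and does not reproduce its scoring function [37]; the additive score (sequence similarity plus a bonus per conserved edge), the stopping threshold, and all names and values are assumptions chosen for illustration.

```python
def greedy_align(query_edges, target_edges, seq_sim, edge_bonus=1.0, min_gain=0.5):
    """Greedily align query proteins to target proteins.

    query_edges, target_edges: sets of frozenset({a, b}) association edges.
    seq_sim: {(query_protein, target_protein): similarity score}.
    Returns {query_protein: target_protein}.
    """
    aligned = {}
    while True:
        best_pair, best_gain = None, min_gain
        used_targets = set(aligned.values())
        for (q, t), sim in seq_sim.items():
            if q in aligned or t in used_targets:
                continue
            # Count edges to already-aligned pairs that are conserved in both networks.
            conserved = sum(
                1 for q2, t2 in aligned.items()
                if frozenset((q, q2)) in query_edges
                and frozenset((t, t2)) in target_edges
            )
            gain = sim + edge_bonus * conserved
            if gain > best_gain:
                best_pair, best_gain = (q, t), gain
        if best_pair is None:  # no remaining pair improves the alignment enough
            return aligned
        aligned[best_pair[0]] = best_pair[1]

# Hypothetical query module A-B-C and target network x-y-z plus an unlinked w.
query_edges = {frozenset(("A", "B")), frozenset(("B", "C"))}
target_edges = {frozenset(("x", "y")), frozenset(("y", "z"))}
seq_sim = {("A", "x"): 0.9, ("B", "y"): 0.6, ("B", "w"): 0.7, ("C", "z"): 0.4}

alignment = greedy_align(query_edges, target_edges, seq_sim)
print(alignment)
```

In this toy case B aligns to y rather than to the more sequence-similar w, because y preserves the association with the already-aligned pair; C is then recovered despite weak sequence similarity.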

To improve the predictive power of the alignment, we manually refined the alignment to keep the best candidates in each species using the following criteria: 1) in each species, we considered only transporter operon candidates in which the three ABC transporter components were contiguous on the chromosome; this resulted in several candidate conserved operons in each species, and 2) in each species, we assessed the similarity of each candidate operon to those in all other species in the alignment. We then calculated, for each protein in the candidate operon, the average BLAST significance score to its predicted counterpart in all other species; the candidate operon with the best average significance score (i.e. lowest average p-value) was selected for inclusion in the final cross-species module. Additional computational details underlying this network alignment strategy are discussed in Flannick et al. [37]. The network alignment tool Graemlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu.
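The two refinement criteria can be expressed as a small filter-and-rank step. The sketch below is a simplified stand-in for the manual procedure: it assumes each candidate operon is described by the gene-order positions of its three components and by one BLAST p-value per cross-species comparison, and the candidate names and values are invented.

```python
def is_contiguous(positions):
    """True if the genes occupy consecutive positions in chromosomal gene order."""
    ordered = sorted(positions)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))

def best_operon(candidates):
    """Pick the contiguous candidate with the lowest average p-value.

    candidates: {name: {"positions": [int, ...], "pvalues": [float, ...]}}
    Returns None if no candidate is contiguous.
    """
    contiguous = {
        name: c for name, c in candidates.items() if is_contiguous(c["positions"])
    }
    if not contiguous:
        return None

    def average_pvalue(name):
        pvalues = contiguous[name]["pvalues"]
        return sum(pvalues) / len(pvalues)

    return min(contiguous, key=average_pvalue)

# Hypothetical candidates in one species.
candidates = {
    "op1": {"positions": [10, 11, 12], "pvalues": [1e-40, 1e-35, 1e-50]},
    "op2": {"positions": [200, 201, 202], "pvalues": [1e-60, 1e-55, 1e-70]},
    "op3": {"positions": [300, 305, 306], "pvalues": [1e-90, 1e-85, 1e-95]},
}
print(best_operon(candidates))
```

Here op3 has the strongest sequence matches but is excluded because its components are not adjacent, so op2 is selected.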

Motif Discovery

We used MEME [38] to locate putative regulatory motifs in the upstream regions of genes in the C. crescentus myo-inositol module. In order to refine this motif, and also to investigate its conservation in other species, we used MEME to search 250 base pairs upstream of the predicted translation start sites of genes in the predicted inositol modules in each of the species present in our multi-species network alignment. The MEME search parameters were as follows: motif distribution, 0–1 per sequence; minimum motif width, 6; maximum motif width, 50.
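Preparing the MEME input requires extracting the region upstream of each predicted translation start site, taking gene orientation into account. A minimal sketch follows, assuming the chromosome is available as a plain string with 0-based coordinates; the function names and short test sequences are hypothetical, and wrap-around on a circular chromosome is ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(genome, start, strand, length=250):
    """Return up to `length` bases immediately 5' of a start codon.

    genome: plus-strand chromosome sequence (0-based indexing).
    start:  plus-strand index of the first base of the start codon, read in
            the gene's own orientation (for a minus-strand gene this is the
            highest plus-strand coordinate of the gene).
    strand: "+" or "-".
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    # Minus strand: upstream lies at higher plus-strand coordinates.
    region = genome[start + 1:start + 1 + length]
    return reverse_complement(region)

# Plus-strand gene: ATG begins at index 8.
plus_genome = "GGGGAACCATGTTT"
print(upstream_region(plus_genome, 8, "+", length=4))

# Minus-strand gene: its ATG appears as CAT on the plus strand (indices 3-5).
minus_genome = "TTTCATGGTTCCCC"
print(upstream_region(minus_genome, 5, "-", length=4))
```

Both calls return the sequence in the orientation of the gene, so motifs found in the two regions are directly comparable.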

Results/Discussion

Forward and Reverse Genetic Identification of Genes That Are Essential for Growth on myo-Inositol

Using an arrayed library of ≈16,000 mutant C. crescentus strains carrying transposon insertions, we conducted a forward genetic screen for mutants that could not grow on myo-inositol as the sole carbon source. Three strains, FC354, FC362 and FC536, were discovered that were unable to grow on M2-myo-inositol medium (M2I) but exhibited normal growth on PYE, M2-cellobiose (M2C) and M2-glucose (M2G). Strain FC536 has a transposon insertion in the myo-inositol 2-dehydrogenase (idhA; CC1296, NP_420109) gene. The IdhA homolog from B. subtilis has been characterized biochemically [39], and is known to catalyze the first dehydrogenation reaction in the myo-inositol degradation pathway (Figure 1). Strain FC362 contains a transposon insertion in the iolD gene (CC1299, NP_420112). IolD has also been characterized in B. subtilis where it was shown to catalyze hydrolysis and ring opening of the catabolic intermediate D-2,3-diketo-4-deoxy-epi-inositol to form 5-dehydro-2-deoxy-D-gluconate [14]. The transposon insertion in strain FC362 likely disrupts expression of not only iolD, but also genes downstream of iolD in the operon encoding other known myo-inositol catabolic enzymes (Figure 2C and Figure 1).

Figure 2
Genomic organization of the C. crescentus myo-inositol module and regulation of gene expression by myo-inositol and glucose.

The third strain identified in our screen, FC354, contained a transposon that mapped to CC0860, a gene encoding an ATPase protein in an operon predicted to encode an ATP-binding cassette (ABC) sugar transporter (Figure 2B). This transporter operon is physically separated on the chromosome from the genes encoding the catabolic enzymes by ≈500 kilobases (Figure 2). ABC sugar transporters are inner-membrane transporters that employ three components (a periplasmic sugar binding protein, a transmembrane permease and a cytoplasmic ATPase) to move sugars from the periplasm to the cytoplasm [40]. To confirm that this transporter operon, CC0859–CC0861, is required for growth on myo-inositol, we constructed strains with in-frame deletions of each of these genes individually: C. crescentus strains CB15NΔibpA (CC0859, inositol binding protein, NP_419676), CB15NΔiatA (CC0860, inositol ABC transporter ATPase, NP_419677), and CB15NΔiatP (CC0861, inositol ABC transporter permease, NP_419678). Individual in-frame deletions of each of these genes abolished growth on defined medium containing myo-inositol as the sole carbon source, but not on defined minimal glucose medium (Table 2) or PYE complex medium. Growth on myo-inositol in the individual in-frame transporter deletion strains was restored by complementation with a replicating vector carrying the entire ibpA-iatA-iatP locus under the control of its own promoter (Table 2).

Table 2
Doubling times, in minutes, of C. crescentus strains in M2 minimal medium with either glucose (M2G) or myo-inositol (M2I) as the sole carbon source.

The inability of C. crescentus strains lacking any gene in the ibpA-iatA-iatP operon to grow in myo-inositol demonstrates that this operon encodes the only inner-membrane myo-inositol transporter in C. crescentus.

Microarray Identification of Genes That Are Differentially Expressed in myo-Inositol Relative to Glucose

Whole-genome transcriptional profiling using DNA microarrays was conducted to identify genes with differential regulation in myo-inositol relative to glucose as the sole carbon source. Fifty genes were found to have transcript levels that were at least 2-fold higher in cells grown in myo-inositol than in glucose (see Materials and Methods for data analysis parameters) (Figure 2A and Table S1). Among these genes, as expected, are the catabolic gene idhA and the iolECBDA operon, as well as the gene, ibpA, encoding the periplasmic binding protein of the myo-inositol ABC transporter.

The most highly induced gene in myo-inositol relative to glucose (>4-fold) encodes isocitrate lyase (CC1764, NP_420572), which catalyzes formation of glyoxylate and succinate from isocitrate. This result suggests that growth of C. crescentus on myo-inositol shifts energy metabolism toward the glyoxylate cycle relative to growth on glucose. The transcript encoding the ATPase subunit of a HlyB-family ABC transporter (gene CC1314, NP_420127) is also four-fold more abundant in myo-inositol than in glucose (Table S1). As discussed above, cells with mutations in the ibpA-iatA-iatP transporter operon fail to grow on myo-inositol as the sole carbon source after one week of incubation (Table 2), providing evidence that this HlyB-family transporter is not a redundant myo-inositol transporter. However, this transporter may be involved in transporting derivatized versions of inositol (e.g. inositol phosphates or lipidated inositols).

Expression of Genes in the myo-Inositol Module Is Regulated by IolR and a Conserved cis-Acting Sequence

The gene CC1297 (NP_420110) is annotated as an RpiR-family transcriptional regulator and encodes a putative SIS (Sugar ISomerase; Pfam 01380) domain at its N-terminus. Based on its predicted function as a sugar-binding transcription factor and its chromosomal location adjacent to the iol catabolic operon (Figure 2C), we predicted that CC1297 would regulate transcription of the iol genes. To test this hypothesis, we constructed a strain with an in-frame deletion of this gene and measured expression from the idhA, iolC (NP_420111), and ibpA promoters in wild type and CC1297 deletion strains using promoter-lacZ fusions as transcriptional reporters. These assays revealed significant derepression of transcription from the idhA, ibpA and iolC promoters in a CC1297 deletion background when cells were grown in PYE complex medium (Student's t-test; p<0.0001) (Figure 3A). This result is consistent with the idea that CC1297 is a transcriptional regulator of the iol genes. As such, we have named this gene iolR. Notably, C. crescentus IolR is not homologous to the IolR proteins previously described in Bacillus subtilis, Corynebacterium glutamicum or Clostridium perfringens [8],[9],[18] and thus defines a new class of myo-inositol regulator proteins. In contrast with these unrelated myo-inositol regulator genes, which are induced by myo-inositol [7],[9], expression of C. crescentus iolR is not regulated by myo-inositol based on our microarray transcriptional profiling data.

Figure 3
Regulation of genes in the myo-inositol module by the transcriptional regulator iolR and a conserved promoter sequence.

We then sought to identify possible regulatory motifs in the predicted promoter regions of genes in the myo-inositol metabolic module of C. crescentus. A MEME search [38] of the DNA sequence of these promoters suggested a consensus palindromic motif, GGAANATNCGTTCCA, that is present upstream of ibpA, idhA, iolC and iolA (NP_420115) (Figure 2B and C; Figure 4). The iolC promoter contains two copies of this motif with MEME e-values less than 10⁻⁸ (Figure 4) and with good conservation of the palindrome. Motif 1 is 104 bp upstream of the predicted translation start site of iolC, while motif 2 is 45 bp upstream (Figure 3C). We mutated each of these two motifs away from consensus (Figure 3C), and measured expression from these mutant iolC promoters in complex medium (PYE). Mutation of motif 1 results in significantly higher LacZ activity than the wild-type promoter (Student's t-test; p<0.001), demonstrating that motif 1 is involved in basal repression of iolC expression. Mutation of motif 2 does not affect measured promoter activity in PYE (p>0.05) (Figure 3B). These results demonstrate that, in the case of motif 1, the palindromic sequence we have identified in the promoters of genes required for myo-inositol metabolism is a functionally relevant regulator of gene expression. Future analysis of the regulatory role of IolR, of motif 2 in the iolC promoter, and of the palindromic motifs in the idhA and ibpA promoters promises to provide insight into additional layers of regulation in this genetic module.
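The degree to which a site is palindromic can be quantified by comparing it with its own reverse complement. The sketch below applies this to the wild-type and mutated motif 1 sequences given in Materials and Methods; the scoring function is our own illustrative choice and is not part of the MEME analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def palindrome_matches(site):
    """Count positions at which a site equals its own reverse complement."""
    rc = reverse_complement(site)
    return sum(1 for a, b in zip(site, rc) if a == b)

wild_type_motif1 = "TGGACCATATGTTCCA"  # iolC motif 1, wild type
mutant_motif1 = "TGTACCATATGTACAA"     # iolC motif 1 after site-directed mutagenesis

for name, site in [("wild type", wild_type_motif1), ("mutant", mutant_motif1)]:
    print(name, palindrome_matches(site), "of", len(site))
```

By this measure the wild-type site matches its reverse complement at 14 of 16 positions, whereas the mutated site matches at 8 of 16, consistent with the mutations disrupting the palindromic arms.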

Figure 4
Identification of a conserved motif in the promoters of genes regulated by myo-inositol.

Computational Prediction of a myo-Inositol Catabolic/Transport Module in Caulobacter crescentus

Independent of our experimental work, we applied statistical methods that we previously developed to predict functional associations between genes in prokaryotic genomes (see Materials and Methods and [34]). Figure 5 shows the computationally-predicted “myo-inositol module” of C. crescentus. This is a subset of our whole-genome C. crescentus integrated protein association network, containing proteins encoded in operons at just two distinct chromosomal loci. The first chromosomal locus contains genes (C. crescentus gene numbers CC1296; CC1298–CC1302) that are predicted to be involved in catabolism of myo-inositol by sequence homology to known enzymes involved in myo-inositol catabolism [5],[11]–[13]. The second locus is an operon containing genes (CC0859–CC0861) that are predicted to encode the three components of a canonical ABC transmembrane sugar transporter [41]: a periplasmic sugar-binding protein, an ATPase subunit, and a transmembrane permease. However, the periplasmic sugar-binding protein of this transporter is only generally annotated as a member of the XylF superfamily in the Conserved Domain Database (CDD score<e−15) [42], and its true substrate was not known at the time we constructed our microbial protein association networks. The C. crescentus inositol module also contains the gene, CC1297 (iolR), which is colocated with the predicted myo-inositol catabolic genes and is annotated as encoding a transcriptional regulator.

Figure 5
Computational prediction of the myo-inositol module in C. crescentus.

We found that the transporter and catabolic proteins have strong intra-operon linkage (>80% confidence), which is largely due to high colocation and coinheritance scores (Figure 5). The inter-operon association between the transporter, catabolic, and regulatory proteins, which are encoded from genes at two disparate chromosomal loci, primarily arises from moderate statistical correlations contained within the microarray coexpression component of our model. Using a 30% confidence cutoff, we deduce that the periplasmic sugar-binding protein CC0859 (IbpA) is functionally linked to several genes in the predicted myo-inositol catabolic operon (Figure 5). No other transmembrane transporters in the C. crescentus genome are predicted to associate with the myo-inositol catabolic genes in our network. This linkage between the myo-inositol catabolic proteins and the ABC sugar transporter is missed using a single association metric such as colocation, coinheritance or coexpression alone. An integrative statistical model, which incorporates multiple predictors of association, is required to identify this association.
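The point that weak individual predictors can jointly cross a confidence cutoff can be illustrated with a small Python sketch. It uses a naive-Bayes combination of likelihood ratios as a stand-in; the actual integrated model is described in [34] and is not reproduced here. The prior, the likelihood ratios, and the function name are hypothetical values chosen only for illustration.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence sources into a posterior probability.

    Naive-Bayes assumption: posterior odds = prior odds * product of the
    likelihood ratios contributed by each evidence type.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


CUTOFF = 0.30  # the 30% confidence cutoff mentioned in the text
PRIOR = 0.05   # hypothetical prior probability that two genes are linked

# Hypothetical likelihood ratios for one inter-operon gene pair.
evidence = {"colocation": 1.0, "coinheritance": 2.5, "coexpression": 4.0}

for name, ratio in evidence.items():
    p = posterior_probability(PRIOR, [ratio])
    print(f"{name:>13} alone: {p:.3f} (passes cutoff: {p >= CUTOFF})")

combined = posterior_probability(PRIOR, evidence.values())
print(f"{'combined':>13}      : {combined:.3f} (passes cutoff: {combined >= CUTOFF})")
```

With these toy numbers no single predictor reaches the cutoff, while the combined posterior does, which mirrors the qualitative behavior described above for the transporter and catabolic genes.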

As discussed above, genetic and molecular experiments have confirmed the computationally predicted association between the ABC sugar transporter and the myo-inositol catabolic genes. Specifically, we have shown that 1) the proteins encoded by the transporter operon CC0859–CC0861 (now annotated as IbpA, IatA, and IatP) form the sole myo-inositol inner-membrane transporter in C. crescentus, 2) transposon disruption of the predicted catabolic locus encompassing CC1296; CC1298–CC1302 (annotated as IdhA, IolC, IolD, IolE, IolB, IolA) abolishes growth of C. crescentus on myo-inositol as the sole carbon source, and 3) the transcriptional regulator gene CC1297 (annotated as IolR) regulates expression of the myo-inositol transporter and catabolic genes.

Computational Prediction of Functionally Homologous myo-Inositol Modules in Other Species

Using automated cross-species alignment in combination with manual post-refinement (see Materials and Methods), we identified genetic networks in other bacterial species that we predicted to be functionally homologous to the C. crescentus myo-inositol network (Figure 6). The cross-species alignment conducted in this study indicates significant conservation of the catabolic, regulatory, and transporter proteins across the six α-proteobacterial species aligned. In addition, there are conserved cross-protein functional linkages within each of these species (Figure 6A). Linkage between the transporter and catabolic proteins is particularly strong in M. loti and S. meliloti, as evidenced by the large number of association edges between transporter, catabolic, and regulatory genes in these species (Figure 6A). The module is least conserved in B. japonicum, which is discussed further below.

Figure 6
Cross-species module prediction in five other α-proteobacteria.

As discussed above, we discovered a palindromic motif (GGAA-N6-TTCC) with a moderate MEME e-value upstream of several genes in the C. crescentus inositol module (Figure 4). Reasoning that conservation at the gene system level may imply conservation at the level of gene regulation, we searched 250 bases upstream of the predicted translational start sites of genes in our cross-species network alignment for sequences related to the palindromic motif identified in C. crescentus (Figure 4). In these upstream sequences, we found 21 more examples of this same motif, which was particularly enriched upstream of the predicted homologs of iolC, idhA, and the myo-inositol ABC transporter operons in these species (Figures 6B and 7). Incorporating the upstream sequences from all species in the Graemlin alignment in a MEME motif search dramatically improved the significance score for this regulatory motif (Figure 6B).
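A scan of this kind can be sketched in a few lines of Python. The sketch below slides a 14-bp window across the last 250 bases before a start site and counts mismatches to the GGAA and TTCC half-sites, ignoring the 6-bp spacer. The mismatch tolerance, function names, and toy sequences are assumptions made for illustration; the study itself relied on MEME rather than this simple rule.

```python
HALF_SITES = "GGAA" + "TTCC"  # the two arms of GGAA-N6-TTCC
WINDOW_LEN = 14               # 4 + 6 + 4 bases


def half_site_mismatches(window):
    """Count mismatches to GGAA...TTCC in a 14-bp window; spacer is ignored."""
    arms = window[:4] + window[-4:]
    return sum(a != b for a, b in zip(arms, HALF_SITES))


def scan_upstream(upstream, max_mismatches=1, max_len=250):
    """Return (position, sequence) hits in the last `max_len` upstream bases.

    Positions are negative and relative to the translation start. Because
    GGAA-N6-TTCC equals its own reverse complement, and the mismatch count is
    preserved under reverse complementation, scanning one strand is enough.
    """
    region = upstream[-max_len:]
    hits = []
    for i in range(len(region) - WINDOW_LEN + 1):
        window = region[i:i + WINDOW_LEN]
        if half_site_mismatches(window) <= max_mismatches:
            hits.append((i - len(region), window))
    return hits


# Hypothetical upstream regions keyed by made-up gene labels.
toy_regions = {
    "gene_with_motif": "A" * 20 + "GGAACGTATCTTCC" + "A" * 10,
    "gene_without_motif": "A" * 44,
}
for label, seq in toy_regions.items():
    print(label, scan_upstream(seq))
```

Allowing one mismatch is an arbitrary choice here. A looser tolerance would recover more degenerate sites at the cost of more chance matches, which is the trade-off a weight-matrix method such as MEME handles statistically.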

Figure 7
Genomic organization of the conserved myo-inositol module in five α-proteobacteria.

Notably, B. japonicum is the only one of the six species in our multiple network alignment in which we could not identify this motif upstream of predicted inositol catabolic and transporter genes. Although it contains strong associations at the transporter nodes and for a number of the metabolic genes, it is missing several other myo-inositol catabolic genes and also does not encode a homolog of the regulatory protein IolR (Figure 6). The lack of conservation of several components of the myo-inositol network in B. japonicum decreases our confidence in the functional predictions presented for this species in Figure 6 relative to our predictions for S. meliloti, M. loti, B. melitensis, and A. tumefaciens. We propose that if B. japonicum can metabolize myo-inositol, it employs a different regulatory mechanism, and perhaps enzymatic strategy, than the other α-proteobacteria in our cross-species alignment.

Experimental Validation of Cross-Species Functional Predictions in Sinorhizobium meliloti

The cross-species network predicted that the operon Smb20712-4 (NP_437959, NP_437960, NP_437961) in S. meliloti 1021 is a myo-inositol transporter (Figure 6A). This ABC transporter operon in S. meliloti 1021 is annotated in GenBank as a putative rhizopine transporter, based on homology to the known MocB rhizopine transporter in S. meliloti strain L5-30 [43]. While S. meliloti 1021 cannot metabolize rhizopine [3], rhizopine is derived from myo-inositol [43], suggesting that homology to MocB is a predictor of myo-inositol transport. However, a BLAST search of the S. meliloti 1021 genome using the sequence of the C. crescentus IbpA inositol-binding protein as a query did not identify the periplasmic binding protein of the Smb20712-4 operon as the top hit, but rather another protein, Smb20072, that is also annotated as a periplasmic rhizopine-binding protein. Indeed, a simple BLAST search revealed several different ABC transporter operons in S. meliloti with high probability scores to the experimentally defined C. crescentus myo-inositol transporter (see Table 3 for four candidate operons). Thus, simple pairwise comparisons with the known myo-inositol transporter of C. crescentus cannot easily distinguish the myo-inositol transport system in S. meliloti 1021. Instead, several additional search criteria must be imposed before Smb20712-4 is assigned the highest confidence score as the principal myo-inositol transporter. Specifically, while other operons in S. meliloti showed higher overall homology with select subunits of the C. crescentus ABC transporter, the operon Smb20712-4 clearly showed the highest conservation across all six species in our network alignment (Table 3), and the promoter region of this operon also contained the conserved palindromic motif first identified in C. crescentus (Figure 6B).
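The reasoning in this paragraph, that the best pairwise hit need not be the best module-level candidate, can be illustrated with a short Python sketch. The candidate names, scores, and ranking rule below are invented placeholders, not the values in Table 3 and not the scoring used by Graemlin. They only show how adding cross-species conservation and motif presence can reorder candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # placeholder operon label (hypothetical)
    best_blast_bits: float   # best pairwise score to any query subunit
    species_conserved: int   # aligned species (out of 6) containing the operon
    has_motif: bool          # palindromic motif found in the promoter region


def rank_by_blast(candidates):
    """Rank by pairwise sequence similarity alone."""
    return sorted(candidates, key=lambda c: c.best_blast_bits, reverse=True)


def rank_by_module_evidence(candidates):
    """Rank by conservation first, then motif presence, then similarity."""
    return sorted(
        candidates,
        key=lambda c: (c.species_conserved, c.has_motif, c.best_blast_bits),
        reverse=True,
    )


# Invented toy values for four hypothetical candidate operons.
toy_candidates = [
    Candidate("operon_A", 310.0, 3, False),
    Candidate("operon_B", 280.0, 6, True),
    Candidate("operon_C", 250.0, 2, False),
    Candidate("operon_D", 240.0, 4, False),
]

print("top by BLAST:          ", rank_by_blast(toy_candidates)[0].name)
print("top by module evidence:", rank_by_module_evidence(toy_candidates)[0].name)
```

In this toy case the operon with the highest pairwise score is outranked by one that is conserved in all six species and carries the motif, which is the pattern the text describes for Smb20712-4.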

Table 3
Using the Graemlin network alignment algorithm [37], four candidate myo-inositol ABC transporter operons in S. meliloti were initially identified using the experimentally-defined C. crescentus inositol module as an alignment template.

To test our prediction experimentally, we assessed the growth of strains of S. meliloti Rm2011 (a direct derivative of S. meliloti 1021) carrying Tn5 transposon insertions in either iolA (NP_384832) or in the predicted ABC transporter periplasmic binding protein gene, Smb20712 [24], on GTS minimal medium [25] supplemented with either glucose or myo-inositol as the sole carbon source. Both Tn5 mutant strains grew normally in GTS-glucose and in Luria-Bertani (LB) medium. However, the iolA::Tn5 and Smb20712::Tn5 mutant strains did not grow on GTS with myo-inositol as the sole carbon source (Table 4). These results show that S. meliloti IolA is required for growth on myo-inositol and confirm our cross-species computational prediction that the protein Smb20712 is the functional homolog of the C. crescentus periplasmic inositol-binding protein IbpA. As such, we have annotated Smb20712 as ibpA, Smb20713 as iatA, and Smb20714 as iatP.

Table 4
Doubling times, in minutes, of S. meliloti strains grown in Luria-Bertani (LB) broth, and minimal GTS medium supplemented with either glucose (GTS-G) or myo-inositol (GTS-I) as the sole carbon source.
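For readers unfamiliar with how a doubling time is obtained, the following Python sketch shows the standard exponential-growth calculation from two optical density readings. It is a generic illustration; the measurement protocol behind Table 4 is given in Materials and Methods, and the function name and example readings here are hypothetical.

```python
import math


def doubling_time(od_start, od_end, elapsed_min):
    """Doubling time in minutes, assuming exponential growth between readings.

    Uses t_d = elapsed * ln(2) / ln(od_end / od_start).
    """
    if od_start <= 0 or od_end <= od_start:
        raise ValueError("requires 0 < od_start < od_end (culture must grow)")
    return elapsed_min * math.log(2) / math.log(od_end / od_start)


# Hypothetical readings: a fourfold density increase over 240 minutes
# corresponds to two doublings, so roughly 120 minutes per doubling.
print(round(doubling_time(0.1, 0.4, 240), 1))
```

A strain that does not grow has no defined doubling time under this formula, which is why the function rejects readings that fail to increase.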

Conclusions

Using forward and reverse genetic strategies, we have defined genes in C. crescentus involved in the metabolism of the abundant environmental sugar, myo-inositol. These experiments uncovered an ABC myo-inositol transporter, and identified a novel myo-inositol regulatory gene and conserved cis-acting promoter regulatory sequence that control gene expression. Together, these genes and regulatory sequences form a metabolic module that ensures C. crescentus can regulate gene expression in response to myo-inositol, transport the sugar across its inner membrane, and catabolize the sugar to form the central metabolite acetyl-CoA. Expanding upon these traditional genetic studies, we also presented a schema for generating reliable cross-species annotation of an entire functional genetic module. Specifically, using statistical and computational methods we leveraged our experimental work on C. crescentus myo-inositol metabolism to produce high-quality gene annotations for functionally-homologous myo-inositol transporters, catabolic enzymes, and transcriptional regulators in five related α-proteobacteria. The myo-inositol genes in all of these species were noncontiguous on their respective chromosomes (Figure 7), making it difficult to predict function using co-location.

Our work has demonstrated the efficacy of combining a statistical protein network prediction algorithm [34] and the Graemlin network alignment algorithm [37] in the prediction and extrapolation of metabolic gene function. The method is significantly more robust than simple sequence comparison as a means to transfer annotation across species. Our network prediction and alignment protocol was validated at two levels. First, identification of the palindromic regulatory motif that was defined experimentally in C. crescentus (Figure 4) in the upstream regions of homologous genes/operons in our cross-species network alignment (Figures 6B and 7) provided excellent correlative validation of our alignment and refinement methodology. Second, we directly validated our cross-species functional prediction in S. meliloti, demonstrating that transposon disruption of the iolA gene and the Smb20712-Smb20714 transporter operon abolished growth on myo-inositol as the sole carbon source.

As the number of microbial genome sequences continues to grow, it is imperative that we develop improved methods to define and assign functions to genes, and reliably propagate this functional information across species. This study demonstrates that combining directed genetic, genomic and molecular experiments, statistical functional prediction, and global network alignment provides a powerful means to define and propagate gene function at the pathway level.

Supporting Information

Table S1

Supporting data for DNA microarray experiments. Supporting information file contains two tables that report expression values for Caulobacter grown in glucose versus myo-inositol.

(0.09 MB DOC)

Acknowledgments

We thank Aretha Fiebig for assistance scoring the carbohydrate screen and for helpful suggestions throughout this project, and Sean Pritchard for assistance setting up the initial screen. We also thank the reviewers for their criticism and suggestions, which improved the quality of this manuscript, and the McAdams and Shapiro labs for guidance with microarray data collection.

Footnotes

The authors have declared that no competing interests exist.

CCB was funded by NIH Graduate Training Grant T32 GM007183-32; BSS by an NSF VIGRE postdoctoral fellowship (grant no. EMSW21-VIGRE 0502385); JAF by a Stanford Graduate Fellowship. Computation in the lab of SB was supported by the NSF (Protein Interaction Networks: Integration and Alignment - grant no. DBI-0640211-001). PHV is funded by the U.S. Department of Energy Office of Science (BER, grant no. DE-FG02-05ER64136). SC is funded by a Beckman Young Investigator Award and a grant from the Mallinckrodt Foundation.

References

1. Turner BL, Paphazy MJ, Haygarth PM, McKelvie ID. Inositol phosphates in the environment. Phil Trans Roy Soc B. 2002;357:449–469.
2. Majumder A, Chatterjee A, Dastidar K, Majee M. Diversification and evolution of L-myo-inositol 1-phosphate synthase. FEBS Letters. 2003;553:3–10.
3. Galbraith M, Feng S-F, Borneman J, Triplett E, Bruijn F, et al. A functional myo-inositol catabolism pathway is essential for rhizopine utilization by Sinorhizobium meliloti. Microbiology. 1998;144:2915–2924.
4. Roberts MF. Inositol in bacteria and archaea. In: Majumder AL, Biswas BB, editors. Biology of Inositols and Phosphoinositides. New York: Springer US; 2006.
5. Berman T, Magasanik B. The pathway of myo-inositol degradation in Aerobacter aerogenes - dehydrogenation and dehydration. J Biol Chem. 1965;241:800–806.
6. Poole P, Blyth A, Reid C, Walters K. myo-inositol catabolism and catabolite regulation in Rhizobium leguminosarum bv. viciae. Microbiology. 1994;140:2787–2795.
7. Yoshida K-I, Aoyama D, Ishio I, Shibayama T, Fujita Y. Organization and transcription of the myo-Inositol operon, iol, of Bacillus subtilis. J Bacteriol. 1997;179:4591–4598.
8. Kawsar H, Ohtani K, Okumura K, Hayashi H, Shimizu T. Organization and transcriptional regulation of myo-inositol operon in Clostridium perfringens. FEMS Microbiol Lett. 2004;235:289–295.
9. Krings E, Krumbach K, Bathe B, Kelle R, Wendisch V, et al. Characterization of myo-inositol utilization by Corynebacterium glutamicum: the stimulon, identification of transporters and influence on L-lysine formation. J Bacteriol. 2006;188:8054–8061.
10. Mullaney EJ, Daly CB, Ullah AH. Advances in phytase research. Adv Appl Microbiol. 2000;47:157–199.
11. Anderson W, Magasanik B. The pathway of myo-inositol degradation in Aerobacter aerogenes - conversion of 2-deoxy-5-keto-D-gluconic acid to glycolytic intermediates. J Biol Chem. 1971;246:5662–5675.
12. Anderson W, Magasanik B. The pathway of myo-inositol degradation in Aerobacter aerogenes - identification of the intermediate 2-deoxy-5-keto-D-gluconic acid. J Biol Chem. 1971;246:5653–5661.
13. Berman T, Magasanik B. The pathway of myo-inositol degradation in Aerobacter aerogenes - ring scission. J Biol Chem. 1965;241:807–813.
14. Yoshida K, Yamaguchi M, Morinaga T, Kinehara M, Ikeuchi M, et al. myo-Inositol catabolism in Bacillus subtilis. J Biol Chem. 2008;283:10415–10424.
15. Yebra MJ, Zuniga M, Beaufils S, Perez-Martinez G, Deutscher J, et al. Identification of a gene cluster enabling Lactobacillus casei BL23 to utilize myo-inositol. Appl Environ Microbiol. 2007;73:3850–3858.
16. Fry J, Wood M, Poole P. Investigation of myo-Inositol catabolism in Rhizobium leguminosarum bv. viciae and its effect on nodulation competitiveness. Mol Plant Microbe Interact. 2001;14:1016–1025.
17. Jiang G, Krishnan AH, Kim YW, Wacek TJ, Krishnan HB. A functional myo-inositol dehydrogenase gene is required for efficient nitrogen fixation and competitiveness of Sinorhizobium fredii USDA191 to nodulate soybean (Glycine max [L.] Merr.). J Bacteriol. 2001;183:2595–2604.
18. Yoshida K-I, Shibayama T, Aoyama D, Fujita Y. Interaction of a repressor and its binding sites for regulation of the Bacillus subtilis iol divergon. J Mol Biol. 1999;285:917–929.
19. Albers SV, Koning SM, Konings WN, Driessen AJ. Insights into ABC transport in archaea. J Bioenerg Biomembr. 2004;36:5–15.
20. Barbier-Brygoo H, Gaymard F, Rolland N, Joyard J. Strategies to identify transport systems in plants. Trends Plant Sci. 2001;6:577–585.
21. Evinger M, Agabian N. Envelope-associated nucleoid from Caulobacter crescentus stalked and swarmer cells. J Bacteriol. 1977;132:294–301.
22. Ely B. Genetics of Caulobacter crescentus. Meth Enzymol. 1991;204:372–384.
23. Thanbichler M, Iniesta AA, Shapiro L. A comprehensive set of plasmids for vanillate- and xylose-inducible gene expression in Caulobacter crescentus. Nuc Acids Res. 2007;35:e137. doi:10.1093/nar/gkm818.
24. Pobigaylo N, Wetter D, Szymczak S, Schiller U, Kurtz S, et al. Construction of a large signature-tagged mini-Tn5 transposon library and its application to mutagenesis of Sinorhizobium meliloti. Appl Environ Microbiol. 2006;72:4329–4337.
25. Kiss GB, Vincze E, Kalman Z, Forrai T, Kondorosi A. Genetic and biochemical analysis of mutants affected in nitrate reduction in Rhizobium meliloti. J Gen Microbiol. 1979;113:105–118.
26. Viollier PH, Thanbichler M, McGrath PT, West L, Meewan M, et al. Rapid and sequential movement of individual chromosomal loci to specific subcellular locations during bacterial DNA replication. Proc Natl Acad Sci U S A. 2004;101:9257–9262.
27. Hottes A, Meewan M, Yang D, Arana N, Romero P, et al. Transcriptional profiling of Caulobacter crescentus during growth on complex and minimal media. J Bacteriol. 2004;186:1448–1461.
28. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116–5121.
29. Gober JW, Shapiro L. A developmentally regulated Caulobacter flagellar promoter is activated by 3′ enhancer and IHF binding elements. Mol Biol Cell. 1992;3:913–926.
30. Finan TM, Kunkel B, De Vos GF, Signer ER. Second symbiotic megaplasmid in Rhizobium meliloti carrying exopolysaccharide and thiamine synthesis genes. J Bacteriol. 1986;167:66–72.
31. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006;34:D354–357.
32. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A. 1999;96:4285–4288.
33. Pazos F, Valencia A. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng. 2001;14:609–614.
34. Srinivasan BS, Novak A, Flannick J, Batzoglou S, McAdams HH. Integrated protein interaction networks for 11 microbes. In: Apostolico A, Guerra C, Istrail S, Pevzner P, Waterman M, editors. Research in Computational Molecular Biology. Lecture Notes in Computer Science v. 3909. New York, NY: Springer; 2006.
35. Bayes T. Studies in the history of probability and statistics: IX. Thomas Bayes' essay towards solving a problem in the doctrine of chances. Biometrika. 1958;45:296–315.
36. Breiman L. Bagging predictors. Machine Learning. 1996;24:123–140.
37. Flannick J, Novak A, Srinivasan B, McAdams H, Batzoglou S. Graemlin: general and robust alignment of multiple large interaction networks. Genome Res. 2006;16:1169–1181.
38. Bailey T, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Menlo Park, CA: AAAI Press; 1994.
39. Ramaley R, Fujita Y, Freese E. Purification and properties of Bacillus subtilis inositol dehydrogenase. J Biol Chem. 1979;254:7684–7690.
40. Davidson A, Chen J. ATP-binding cassette transporters in bacteria. Annu Rev Biochem. 2004;73:241–268.
41. Fetsch EE, Davidson AL. Maltose transport through the inner membrane of E. coli. Front Biosci. 2003;8:d652–660.
42. Marchler-Bauer A, Bryant S. CD-Search: protein domain annotations on the fly. Nuc Acids Res. 2004;32:327–331.
43. Rossbach S, Kulpa D, Rossbach U, Bruijn F. Molecular and genetic characterization of the rhizopine catabolism (mocABRC) genes of Rhizobium meliloti L5-30. Mol Gen Genet. 1994;245:11–24.
44. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.
45. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–484.
46. The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29.

Articles from PLoS Genetics are provided here courtesy of Public Library of Science