J Bacteriol. 2004 May; 186(10): 3254–3258.
PMCID: PMC400601

Genome-Wide Analysis of Lipoprotein Expression in Escherichia coli MG1655


To gain insight into the cell envelope of Escherichia coli grown under aerobic and anaerobic conditions, lipoproteins were examined by using functional genomics. The mRNA expression levels of each of these genes under three growth conditions—aerobic, anaerobic, and anaerobic with nitrate—were examined by using both Affymetrix GeneChip E. coli antisense genome arrays and real-time PCR (RT-PCR). Many genes showed significant changes in expression level. The RT-PCR results were in very good agreement with the microarray data. The results of this study represent the first insights into the possible roles of unknown lipoprotein genes and broaden our understanding of the composition of the cell envelope under different environmental conditions. Additionally, these data serve as a test set for the refinement of high-throughput bioinformatic and global gene expression methods.

Bacterial lipoproteins comprise a unique set of proteins modified at their amino-terminal cysteines by the addition of N-acyl and S-diacyl glyceryl groups (30). In Escherichia coli, this lipid serves to anchor these proteins to the inner or outer membrane so that they can function at the lipid-aqueous interface. These proteins can be identified by the presence of a leader with a common consensus sequence (5). The leader is typically between 15 and 40 amino acid residues in length and has at least one arginine or lysine in the first seven residues. The leader is cleaved by signal peptidase II on the amino-terminal side of the cysteine residue, which is then enzymatically modified (30).
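The leader criteria described above can be written as a simple screen. The following Python sketch is illustrative only: the function name and the example sequence are hypothetical, and the test covers just the three stated properties (leader length, a basic residue near the amino terminus, and a cysteine immediately after the leader). It is not the consensus pattern used by the cited databases.

```python
def candidate_leader_lengths(protein, min_len=15, max_len=40):
    """Return leader lengths (in residues) consistent with the stated criteria.

    Assumption: a leader of length n is followed directly by the modified
    cysteine, so the cysteine sits at 0-based index n of the precursor.
    """
    seq = protein.upper()
    # At least one arginine (R) or lysine (K) in the first seven residues.
    if not any(residue in "RK" for residue in seq[:7]):
        return []
    # Keep every leader length in the allowed window that ends just before a cysteine.
    return [n for n in range(min_len, max_len + 1)
            if n < len(seq) and seq[n] == "C"]


# Hypothetical precursor fragment, invented for illustration.
example = "MKKLLIAAVLSGALLAGCSQ"
lengths = candidate_leader_lengths(example)
```

A sequence that passes this screen is only a candidate; the databases cited in the text apply fuller sequence patterns than this minimal check.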

The E. coli genome has previously been searched for potential lipoproteins. Various algorithms have been used for genome sequence analysis to identify potential lipoproteins, and these lipoproteins have been tabulated in databases on the World Wide Web (http://www.mrc-lmb.cam.ac.uk/genomes/dolop/, http://www.expasy.org/prosite, and http://www.projectcybercell.com); from these databases, we compiled a list of 96 lipoproteins. Fifty-six of these genes (58%) have completely unknown functions, a much higher fraction than that for the E. coli genome, in which approximately 25 to 30% of the genes have no known function. Thus, the examination of the expression of the lipoprotein genes under different growth conditions would be a beginning to understanding the function and importance of many of the unknown genes.

Other putative lipoproteins exist in E. coli but were not part of the gene expression study. First, the murein transglycosylase MltE (Blattner no. b1163) is not in any current lipoprotein database but has been experimentally shown to be a lipoprotein (17). Second, yifL (Blattner no. b3808.1) was originally not annotated in the E. coli genome sequencing project, but YifL now appears in the Prosite database (http://www.expasy.org/prosite) as a putative lipoprotein. Also, very small lipoproteins such as the entericidins (EcnA and EcnB) (3) were omitted from this study because they are below the Affymetrix cutoff for open reading frame (ORF) inclusion (150 bp).

In the present study, we used this set of lipoprotein genes to begin analyzing the global changes in gene expression during aerobic and anaerobic growth with a view to understanding the changes in the composition of the cell envelope. We monitored the expression of lipoprotein mRNAs in E. coli MG1655 incubated in glucose defined media (21) either aerobically with shaking in an Erlenmeyer flask or anaerobically in a sealed screw-cap tube; 40 mM KNO3 was added to one set of anaerobic cultures as an alternative electron acceptor. RNA was then isolated from the cells with a MasterPure RNA purification kit (Epicentre Technologies, Madison, Wis.), and cDNA synthesis and labeling were done as described in the Affymetrix GeneChip E. coli Antisense Genome Array Technical Manual (1). Affymetrix GeneChip antisense E. coli genome arrays were used to analyze the complete E. coli transcriptome. Each microarray contained 295,000 probes. Each identified ORF was covered by 15 probe pairs consisting of a perfect match and a 1-nucleotide mismatch pair. If the perfect match probe showed an intensity that was 200 U higher than that of the mismatch probe, the probe pair was considered to be present. An ORF was considered to be present with 95% confidence if neighboring probe pairs within an ORF were present.
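The presence call described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: "neighboring probe pairs" is read here as at least two adjacent present pairs (the `min_run` parameter), a difference of exactly 200 U is counted as present, and the 95% confidence statistic is not reproduced. The function names and intensities are hypothetical.

```python
def probe_pair_present(perfect_match, mismatch, threshold=200.0):
    """A probe pair is called present when PM exceeds MM by the threshold."""
    return (perfect_match - mismatch) >= threshold


def orf_present(probe_pairs, min_run=2, threshold=200.0):
    """Call an ORF present if `min_run` adjacent probe pairs are present.

    probe_pairs: (perfect_match, mismatch) intensity tuples in probe order.
    min_run=2 is an assumed reading of "neighboring probe pairs".
    """
    run = 0
    for perfect_match, mismatch in probe_pairs:
        if probe_pair_present(perfect_match, mismatch, threshold):
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # adjacency is broken, so start counting again
    return False


# Hypothetical intensities for illustration only.
adjacent = orf_present([(500, 100), (450, 200), (100, 90)])
separated = orf_present([(500, 100), (100, 90), (450, 200)])
```

With these invented values, the first ORF has two adjacent present pairs and the second has two present pairs that are not adjacent, so only the first meets the assumed criterion.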

Using this cutoff, we were able to group the lipoproteins into four classes, as listed in Table 1. Twenty-one lipoprotein genes were not expressed (not present in the array analysis) under any of the selected conditions. Ten were present under one growth condition, 5 were present under two conditions, and 60 were present under all three conditions. Sixty-four of the lipoprotein genes were expressed at detectable levels during aerobic growth, the standard experimental growth condition for E. coli. Not surprisingly, lpp, the gene for the major structural outer membrane murein lipoprotein (25), has the highest expression level of all the genes. Other well-known lipoprotein genes highly expressed under aerobic growth conditions include pal, the gene for peptidoglycan-associated lipoprotein (19), and cyoA, which encodes a subunit of the cytochrome O terminal oxidase, the major terminal oxidase of the aerobic respiratory chain (7).
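The four-class grouping amounts to counting, for each gene, the number of growth conditions with a "present" call. The Python sketch below illustrates that bookkeeping and checks that the class sizes reported in the text sum to the 96 genes studied. The gene names and calls in the example are placeholders, not data from the study.

```python
from collections import Counter

CONDITIONS = ("aerobic", "anaerobic", "anaerobic_nitrate")


def count_presence_classes(calls):
    """Map (number of conditions with a present call) -> number of genes."""
    counts = Counter({0: 0, 1: 0, 2: 0, 3: 0})
    for per_condition in calls.values():
        counts[sum(bool(per_condition[c]) for c in CONDITIONS)] += 1
    return dict(counts)


# Hypothetical calls for four placeholder genes (not data from the study).
example_calls = {
    "geneA": {"aerobic": True, "anaerobic": True, "anaerobic_nitrate": True},
    "geneB": {"aerobic": True, "anaerobic": False, "anaerobic_nitrate": False},
    "geneC": {"aerobic": False, "anaerobic": False, "anaerobic_nitrate": False},
    "geneD": {"aerobic": True, "anaerobic": True, "anaerobic_nitrate": False},
}
example_counts = count_presence_classes(example_calls)

# Class sizes reported in the text, keyed by number of conditions present.
reported = {0: 21, 1: 10, 2: 5, 3: 60}
assert sum(reported.values()) == 96
```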

TABLE 1.
E. coli lipoprotein genes and their expression in microarray and RT-PCR analyses

We then used real-time PCR to help better quantify the expression levels for the lipoprotein genes. First, reverse transcription (RT) was carried out with the same total RNA samples used for the microarray analysis and random hexamer primer (Invitrogen, Burlington, Ontario, Canada). RT was performed with SuperScript II (Invitrogen) for reactions with RT (+RT reactions). Control reactions were also performed under the same conditions except that SuperScript II was omitted (−RT reactions). Both types of reactions were used in real-time PCRs.

Primers for real-time PCR were designed with Primer Express 2.0 software from Applied Biosystems (ABI) (Foster City, Calif.). Forward and reverse primer pairs were designed for the 5′ and 3′ regions of each gene and purchased from Sigma Genosys (Oakville, Ontario, Canada). Real-time PCRs were carried out for each primer set with both the +RT and −RT reactions for each growth condition. The reaction buffer contained, in part, 1× ROX glycine conjugate of 5-carboxy-X-rhodamine, with succinimidyl ester as the inert-passive reference dye, and SYBR Green I. The reaction mixtures were aliquoted into 384-well ABI reaction plates. The plates were then placed in an ABI Prism 7900HT RT-PCR machine under the following conditions: stage 1 consisted of 95°C for 45 s; stage 2 consisted of 40 cycles of 95°C for 15 s, followed by 60°C for 1 min; stage 3 consisted of 95°C for 15 s; stage 4 consisted of 60°C for 15 s; and for stage 5, the temperature was ramped to 95°C for 5 s. The RT-PCR data were analyzed with SDS 2.0 software (ABI). Each +RT-versus-−RT reaction set was compared against a standard curve generated for each primer set by using E. coli linear DNA as a standard. A cycle threshold value was chosen that gave a linear regression value greater than 0.996 for each primer set standard curve. The calculated quantity values for each +RT or −RT reaction were standardized within each individual primer set-generated standard curve.
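The standard-curve step can be illustrated with a short Python sketch. It assumes a conventional curve in which the cycle threshold (Ct) is linear in log10 of the template quantity, and it reads the "linear regression value" as the magnitude of the correlation coefficient; neither detail is spelled out in the text. All function names and numbers are hypothetical, and the sketch does not reproduce the SDS 2.0 software.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    Returns (slope, intercept, r), where r is the correlation coefficient.
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate a quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of the DNA standard (arbitrary units).
standard_quantities = [1e2, 1e3, 1e4, 1e5]
standard_cts = [30.0, 26.7, 23.4, 20.1]
slope, intercept, r = fit_standard_curve(standard_quantities, standard_cts)

# Accept the curve only if the fit meets the cutoff quoted in the text.
curve_ok = abs(r) > 0.996
estimated = quantity_from_ct(25.05, slope, intercept)
```

The same inversion would be applied to the Ct values of both the +RT and −RT reactions for a primer set. How the two estimates were combined is not described in the text, so the sketch stops at the per-reaction quantity.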

The RT-PCR data correlate well with the microarray data in that highly expressed genes found in the microarray study also give high RT-PCR signals. However, the RT-PCR results are much more accurate and sensitive and give a wider dynamic range of numbers. Signal intensity ratios for anaerobic-versus-aerobic and anaerobic-plus-nitrate-versus-aerobic data sets were calculated for both microarray (“present” values only) and RT-PCR data and are compared in a scatter plot in Fig. 1. The anaerobic/aerobic ratios had very good correlation between RT-PCR and microarray data, with an R² value of 0.888; the nitrate/aerobic ratios had a slightly lower correlation (R² = 0.757).

FIG. 1.
Scatter plot comparative analysis of microarray and RT-PCR data. The signal intensities of 61 genes which were designated “present” under all three growth conditions in the microarray data set were compared to the signal intensities generated ...
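The comparison of the two platforms rests on two calculations: a per-gene signal ratio between growth conditions and the squared correlation between the ratios from each method. The Python sketch below shows both. The input values are invented, and whether the ratios were log-transformed before the correlation was computed is not stated in the text, so untransformed ratios are assumed here.

```python
def signal_ratios(treatment, reference):
    """Per-gene ratio of treatment signal to reference signal."""
    return [t / r for t, r in zip(treatment, reference)]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical signals for three genes (not data from the study).
array_ratios = signal_ratios([200.0, 50.0, 300.0], [100.0, 100.0, 100.0])
pcr_ratios = signal_ratios([4.0, 1.0, 6.0], [2.0, 2.0, 2.0])
agreement = r_squared(array_ratios, pcr_ratios)
```

In this toy case the two sets of ratios are identical, so the squared correlation is 1; real data would fall below that, as with the values of 0.888 and 0.757 reported above.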

A further examination of gene expression patterns based on the RT-PCR data was then undertaken in order to gain some insight into possible functions of unknown lipoproteins. Growth under anaerobic conditions results in significant changes in the expression of many of the lipoprotein genes relative to aerobic expression. The expression of key structural protein genes such as lpp and pal remained fairly constant under these conditions, which was expected given their “housekeeping” role. However, transcripts of 14 genes are induced twofold or more under anaerobic growth. The gene with the strongest anaerobic induction is slp (Table 1). This gene is known to be induced under conditions of starvation or in the stationary phase (2), so perhaps it is not surprising that it is also induced during slow anaerobic growth. Other genes with strong anaerobic induction include ybgE, which appears to be cotranscribed with the cydAB cytochrome D terminal oxidase (also strongly induced anaerobically [data not shown]), and osmB, a lipoprotein gene which is also induced by high osmotic strength and in the stationary phase (15, 16). Only five genes had twofold or greater reductions of signal intensity under anaerobic conditions relative to that under aerobic conditions. These genes included cusC/ibeB, which is induced by high concentrations of copper ions (20) and appears to be important for virulence and invasion across the blood-brain barrier for other E. coli strains (12, 13), and spr, which encodes a putative penicillin binding protein (11).

The addition of nitrate to an anaerobic culture as an alternative electron acceptor also influences the expression of several of the lipoprotein genes. It is known that the addition of nitrate to anaerobic cultures regulates the expression of many genes, especially those involved in alternative electron transport pathways (26). When grown anaerobically with nitrate, the RT-PCR signals for 13 genes decreased twofold or more and those for 4 genes increased twofold or more relative to those under anaerobic conditions without nitrate. Anaerobically induced genes such as slp and yhiU are repressed with the addition of nitrate. Two of the four lipoproteins induced by the addition of nitrate to an anaerobic culture, albeit with low overall signals, are ymcA and ymcC, which form part of a putative ymcCBA operon. Another gene induced by nitrate is osmE, a putative lipoprotein gene which is induced by high osmotic strength (10).
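The twofold criterion used in the two preceding comparisons (anaerobic versus aerobic, and anaerobic with nitrate versus anaerobic) can be expressed as a small classifier. This Python sketch is illustrative: the labels, the function name, and the signal values are hypothetical, and the source does not state how boundary cases were handled beyond "twofold or more".

```python
def classify_fold_change(test_signal, reference_signal, fold=2.0):
    """Label a gene by the ratio of its signal in two growth conditions."""
    ratio = test_signal / reference_signal
    if ratio >= fold:
        return "induced"      # twofold or greater increase
    if ratio <= 1.0 / fold:
        return "repressed"    # twofold or greater decrease
    return "unchanged"


# Hypothetical signals for illustration only.
labels = [
    classify_fold_change(400.0, 100.0),  # fourfold increase
    classify_fold_change(40.0, 100.0),   # 2.5-fold decrease
    classify_fold_change(150.0, 100.0),  # below the twofold cutoff
]
```

Changing the reference condition from aerobic to anaerobic (without nitrate) gives the second comparison described above without any change to the function.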

Microarrays have fast become a commonly used tool to examine global expression profiles for many bacterial species (see reference 6 for a recent review). Many of these microarray studies have gone one step further by selection of a subset of genes to determine expression by RT-PCR data, which are then compared to the microarray data (4, 8, 9, 18, 22-24, 27-29). However, in these cases, the genes studied by RT-PCR are chosen as a subset of genes of interest that were originally identified by the microarray assay. The approach presented here is different; the gene set to be studied by RT-PCR was not chosen on the basis of the microarray results; instead, it was chosen based on known or predicted functions of the gene products. Even with this unbiased approach to selecting genes for RT-PCR analysis, the correlation between the RT-PCR data and the microarray data is very good. The more accurate and quantitative RT-PCR data were not used just for comparative purposes, however. These data were then used to identify potentially significant unknown lipoprotein genes with either high gene expression levels or significant changes in gene expression depending on growth conditions. This study has produced the first real data reported for these unknown genes and may lead to more effective investigation of these genes in the future. With the usefulness of this approach assured, it is now time to further study the other unknown lipoprotein genes showing either strong or varied expression levels by other means.


As this paper was under review, other potential lipoproteins in E. coli came to our attention, especially those predicted in a recent related paper (14). These genes—rcsF (Blattner no. b0196), borD (b0557), ybjR (b0867), ycaL (b0909), hslJ (b1379), ynfC (b1585), smpA (b2617), yggG (b2936), yidQ (b3688), yidX (b3696), and yjeI (b4144)—are included in the microarray results, but RT-PCR data were not generated.


This work was supported by the Natural Sciences and Engineering Research Council and by Project CyberCell.

J.H.W. is a Canada Research Chair on Membrane Biochemistry. We thank Philip Winter for data analysis.


1. Affymetrix, Inc. 2002, posting date. GeneChip® E. coli antisense genome array technical manual. [Online.] Affymetrix, Inc., Santa Clara, Calif. http://www.affymetrix.com/support/technical/manuals.affx.
2. Alexander, D. M., and A. C. St John. 1994. Characterization of the carbon starvation-inducible and stationary phase-inducible gene slp encoding an outer membrane lipoprotein in Escherichia coli. Mol. Microbiol. 11:1059-1071.
3. Bishop, R. E., B. K. Leskiw, R. S. Hodges, C. M. Kay, and J. H. Weiner. 1998. The entericidin locus of Escherichia coli and its implications for programmed bacterial cell death. J. Mol. Biol. 280:583-596.
4. Boyce, J. D., I. Wilkie, M. Harper, M. L. Paustian, V. Kapur, and B. Adler. 2002. Genomic scale analysis of Pasteurella multocida gene expression during growth within the natural chicken host. Infect. Immun. 70:6871-6879.
5. Braun, V., and H. C. Wu. 1993. Lipoproteins, structure, function, biosynthesis and model for protein export, p. 319-342. In J.-M. Ghuysen and R. Hakenbeck (ed.), Bacterial cell wall, vol. 27. Elsevier Science, Amsterdam, The Netherlands.
6. Conway, T., and G. K. Schoolnik. 2003. Microarray expression profiling: capturing a genome-wide portrait of the transcriptome. Mol. Microbiol. 47:879-889.
7. Gennis, R. B., and V. Stewart. 1996. Respiration, p. 217-261. In F. C. Neidhardt, R. Curtiss, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella: cellular and molecular biology, 2nd ed., vol. 1. ASM Press, Washington, D.C.
8. Graham, M. R., L. M. Smoot, C. A. Migliaccio, K. Virtaneva, D. E. Sturdevant, S. F. Porcella, M. J. Federle, G. J. Adams, J. R. Scott, and J. M. Musser. 2002. Virulence control in group A Streptococcus by a two-component gene regulatory system: global expression profiling and in vivo infection modeling. Proc. Natl. Acad. Sci. USA 99:13855-13860.
9. Guckenberger, M., S. Kurz, C. Aepinus, S. Theiss, S. Haller, T. Leimbach, U. Panzner, J. Weber, H. Paul, A. Unkmeir, M. Frosch, and G. Dietrich. 2002. Analysis of the heat shock response of Neisseria meningitidis with cDNA- and oligonucleotide-based DNA microarrays. J. Bacteriol. 184:2546-2551.
10. Gutierrez, C., S. Gordia, and S. Bonnassie. 1995. Characterization of the osmotically inducible gene osmE of Escherichia coli K-12. Mol. Microbiol. 16:553-563.
11. Hara, H., N. Abe, M. Nakakouji, Y. Nishimura, and K. Horiuchi. 1996. Overproduction of penicillin-binding protein 7 suppresses thermosensitive growth defect at low osmolarity due to an spr mutation of Escherichia coli. Microb. Drug Resist. 2:63-72.
12. Huang, S. H., Y. H. Chen, Q. Fu, M. Stins, Y. Wang, C. Wass, and K. S. Kim. 1999. Identification and characterization of an Escherichia coli invasion gene locus, ibeB, required for penetration of brain microvascular endothelial cells. Infect. Immun. 67:2103-2109.
13. Huang, S. H., and A. Y. Jong. 2001. Cellular mechanisms of microbial proteins contributing to invasion of the blood-brain barrier. Cell. Microbiol. 3:277-287.
14. Juncker, A. S., H. Willenbrock, G. Von Heijne, S. Brunak, H. Nielsen, and A. Krogh. 2003. Prediction of lipoprotein signal peptides in gram-negative bacteria. Protein Sci. 12:1652-1662.
15. Jung, J. U., C. Gutierrez, F. Martin, M. Ardourel, and M. Villarejo. 1990. Transcription of osmB, a gene encoding an Escherichia coli lipoprotein, is regulated by dual signals. Osmotic stress and stationary phase. J. Biol. Chem. 265:10574-10581.
16. Jung, J. U., C. Gutierrez, and M. R. Villarejo. 1989. Sequence of an osmotically inducible lipoprotein gene. J. Bacteriol. 171:511-520.
17. Kraft, A. R., M. F. Templin, and J. V. Holtje. 1998. Membrane-bound lytic endotransglycosylase in Escherichia coli. J. Bacteriol. 180:3441-3447.
18. Lee, J. M., S. Zhang, S. Saha, S. Santa Anna, C. Jiang, and J. Perkins. 2001. RNA expression analysis using an antisense Bacillus subtilis genome array. J. Bacteriol. 183:7371-7380.
19. Mizuno, T. 1981. A novel peptidoglycan-associated lipoprotein (PAL) found in the outer membrane of Proteus mirabilis and other gram-negative bacteria. J. Biochem. (Tokyo) 89:1039-1049.
20. Munson, G. P., D. L. Lam, F. W. Outten, and T. V. O'Halloran. 2000. Identification of a copper-responsive two-component system on the chromosome of Escherichia coli K-12. J. Bacteriol. 182:5864-5871.
21. Neidhardt, F. C., P. L. Bloch, and D. F. Smith. 1974. Culture medium for enterobacteria. J. Bacteriol. 119:736-747.
22. Paustian, M. L., B. J. May, and V. Kapur. 2002. Transcriptional response of Pasteurella multocida to nutrient limitation. J. Bacteriol. 184:3734-3739.
23. Schembri, M. A., K. Kjaergaard, and P. Klemm. 2003. Global gene expression in Escherichia coli biofilms. Mol. Microbiol. 48:253-267.
24. Stintzi, A. 2003. Gene expression profile of Campylobacter jejuni in response to growth temperature variation. J. Bacteriol. 185:2009-2016.
25. Suzuki, H., Y. Nishimura, S. Yasuda, A. Nishimura, M. Yamada, and Y. Hirota. 1978. Murein-lipoprotein of Escherichia coli: a protein involved in the stabilization of bacterial cell envelope. Mol. Gen. Genet. 167:1-9.
26. Unden, G., and J. Bongaerts. 1997. Alternative respiratory pathways of Escherichia coli: energetics and transcriptional regulation in response to electron acceptors. Biochim. Biophys. Acta 1320:217-234.
27. Voyich, J. M., D. E. Sturdevant, K. R. Braughton, S. D. Kobayashi, B. Lei, K. Virtaneva, D. W. Dorward, J. M. Musser, and F. R. DeLeo. 2003. Genome-wide protective response used by group A Streptococcus to evade destruction by human polymorphonuclear leukocytes. Proc. Natl. Acad. Sci. USA 100:1996-2001.
28. Wei, Y., J. M. Lee, D. R. Smulski, and R. A. LaRossa. 2001. Global impact of sdiA amplification revealed by comprehensive gene expression profiling of Escherichia coli. J. Bacteriol. 183:2265-2272.
29. Wilson, J. W., R. Ramamurthy, S. Porwollik, M. McClelland, T. Hammond, P. Allen, C. M. Ott, D. L. Pierson, and C. A. Nickerson. 2002. Microarray analysis identifies Salmonella genes belonging to the low-shear modeled microgravity regulon. Proc. Natl. Acad. Sci. USA 99:13807-13812.
30. Wu, H. C. 1996. Biosynthesis of lipoproteins, p. 1005-1014. In F. C. Neidhardt, R. Curtiss, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella: cellular and molecular biology, 2nd ed., vol. 1. ASM Press, Washington, D.C.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)