J Bacteriol. 2001 Sep; 183(17): 5025–5040.

Characterizations of Highly Expressed Genes of Four Fast-Growing Bacteria


Predicted highly expressed (PHX) genes are characterized for the completely sequenced genomes of the four fast-growing bacteria Escherichia coli, Haemophilus influenzae, Vibrio cholerae, and Bacillus subtilis. Our approach to ascertaining gene expression levels relates to codon usage differences among certain gene classes: the collection of all genes (average gene), the ensemble of ribosomal protein genes, major translation/transcription processing factors, and genes for polypeptides of chaperone/degradation complexes. A gene is predicted highly expressed (PHX) if its codon frequencies are close to those of the ribosomal proteins, major translation/transcription processing factor, and chaperone/degradation standards but strongly deviant from the average gene codon frequencies. PHX genes identified by their codon usage frequencies among prokaryotic genomes commonly include those for ribosomal proteins, major transcription/translation processing factors (several occurring in multiple copies), and major chaperone/degradation proteins. Also PHX genes generally include those encoding enzymes of essential energy metabolism pathways of glycolysis, pyruvate oxidation, and respiration (aerobic and anaerobic), genes of fatty acid biosynthesis, and the principal genes of amino acid and nucleotide biosyntheses. Gene classes generally not PHX include most repair protein genes, virtually all vitamin biosynthesis genes, genes of two-component sensor systems, most regulatory genes, and most genes expressed in stationary phase or during starvation. Members of the set of PHX aminoacyl-tRNA synthetase genes contrast sharply between genomes. There are also subtle differences among the PHX energy metabolism genes between E. coli and B. subtilis, particularly with respect to genes of the tricarboxylic acid cycle. The good agreement of PHX genes of E. coli and B. subtilis with high protein abundances, as assessed by two-dimensional gel determination, is verified. 
Relationships of PHX genes with stoichiometry, multifunctionality, and operon structures are also examined. The spatial distribution of PHX genes within each genome reveals clusters and significantly long regions without PHX genes.

Escherichia coli, Vibrio cholerae, and Haemophilus influenzae are gram-negative γ-proteobacteria that can grow in human tissue and produce or contribute to disease. The principal habitat of E. coli is the human gut, V. cholerae is mainly a freshwater microbe, and H. influenzae is found in the human lung. On the other hand, Bacillus subtilis is a gram-positive, nonpathogenic soil bacterium. The minimal doubling time for these four bacteria in cultures is significantly less than 1 h. Fast growth implies many ribosomes, and these four bacteria have large numbers of rRNA operons per genome.

Predicted highly expressed (PHX) genes are characterized for the rapidly dividing bacteria E. coli, H. influenzae, V. cholerae, and B. subtilis using a method based on codon usage differences among gene classes (21). For complete lists of PHX genes, consult the website ftp://gnomic.stanford.edu/pub (see also Table 2).

Top 20 PHX genes of the four fast-growing bacteria and their predicted E(g) values


Gene or protein expression levels were assessed from codon usage as follows. Let G be a family of genes with average codon frequencies g(x, y, z) for the codon nucleotide triplet (x, y, z), normalized so that

$$\sum_{(x,y,z)=a} g(x,y,z) = 1,$$

where the sum extends over all codons (x, y, z) translated to amino acid a. Let f(x, y, z) indicate the average codon frequencies for the gene family F, normalized to 1 in each amino acid codon family. The codon usage difference of the gene family F (or a single gene) relative to the gene family G, termed the codon bias of F with respect to G, is calculated with the following formula:

$$B(F|G) = \sum_{a} p_a(F) \left[ \sum_{(x,y,z)=a} \left| f(x,y,z) - g(x,y,z) \right| \right],$$

where p_a(F) are the average amino acid frequencies of the genes of F (cf. references 19 and 20). Let C denote the collection of all genes, RP the ribosomal protein genes, CH the chaperone/degradation protein genes, and TF the translation/transcription processing genes. Qualitatively, a gene g is deemed PHX if B(g|C) is appropriately high, whereas B(g|RP), B(g|CH), and B(g|TF) are suitably low. Predicted expression levels with respect to individual standards are based on the ratios

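$$E_{RP}(g) = \frac{B(g|C)}{B(g|RP)}, \quad E_{CH}(g) = \frac{B(g|C)}{B(g|CH)}, \quad E_{TF}(g) = \frac{B(g|C)}{B(g|TF)},$$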

and the combined expression measure is

$$E(g) = \frac{B(g|C)}{\frac{1}{2}B(g|RP) + \frac{1}{4}B(g|CH) + \frac{1}{4}B(g|TF)}.$$

Other weighted combinations are also possible, but the results do not qualitatively change when different weights are used. We impose higher weight on the RP standard because the RP genes are generally the most PHX in all current completely sequenced genomes (21). The specification of the RP, CH, and TF gene classes as standards derives from the observation that these gene classes are consistently highly expressed in most genomes (21). Thus, these three gene classes (RP, CH, and TF) serve as representatives of highly expressed genes, and our method specifies genes with similar codon usages as PHX genes. These assignments are reasonable under fast growth conditions, where there is a need for many ribosomes, for proficient transcription and translation, and for many chaperone/degradation proteins needed to ensure correctly folded, modified, and translocated protein products.

A gene is predicted highly expressed (PHX) if the following two conditions are satisfied: at least two of the three expression values E_RP(g), E_CH(g), and E_TF(g) exceed 1.05, and the general expression level E(g) is ≥1.00. We sometimes refer to genes that do not unequivocally satisfy this definition but that have an E(g) of approximately 1.00 as marginally PHX.
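The procedure described above can be sketched in code. The following minimal Python sketch assumes that the codon bias B(F|G) is the amino-acid-weighted sum of absolute codon frequency differences and that the combined measure E(g) weights the RP standard at 1/2 and the CH and TF standards at 1/4 each. Both functional forms, all function names, and the two-amino-acid toy code are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def codon_frequencies(counts, codon_to_aa):
    """Codon frequencies normalized to sum to 1 within each amino acid family."""
    totals = defaultdict(float)
    for codon, aa in codon_to_aa.items():
        totals[aa] += counts.get(codon, 0)
    return {codon: (counts.get(codon, 0) / totals[aa] if totals[aa] else 0.0)
            for codon, aa in codon_to_aa.items()}


def amino_acid_frequencies(counts, codon_to_aa):
    """Amino acid frequencies p_a of the gene (or gene family) itself."""
    total = sum(counts.get(codon, 0) for codon in codon_to_aa)
    p = defaultdict(float)
    for codon, aa in codon_to_aa.items():
        p[aa] += counts.get(codon, 0) / total
    return p


def codon_bias(f_counts, g_counts, codon_to_aa):
    """B(F|G), assumed form: sum over a of p_a(F) * sum over codons of a of |f - g|."""
    f = codon_frequencies(f_counts, codon_to_aa)
    g = codon_frequencies(g_counts, codon_to_aa)
    p = amino_acid_frequencies(f_counts, codon_to_aa)
    return sum(p[aa] * abs(f[codon] - g[codon])
               for codon, aa in codon_to_aa.items())


def expression_measures(b_c, b_rp, b_ch, b_tf):
    """Return (E_RP, E_CH, E_TF, E) from the four biases of a gene g.

    The 1/2, 1/4, 1/4 weights are an assumption; the text states only that
    the RP standard receives the higher weight.
    """
    e_rp = b_c / b_rp
    e_ch = b_c / b_ch
    e_tf = b_c / b_tf
    e = b_c / (0.5 * b_rp + 0.25 * b_ch + 0.25 * b_tf)
    return e_rp, e_ch, e_tf, e


def is_phx(e_rp, e_ch, e_tf, e):
    """PHX criterion from the text: at least two of three ratios > 1.05 and E >= 1.00."""
    return sum(v > 1.05 for v in (e_rp, e_ch, e_tf)) >= 2 and e >= 1.00


# Hypothetical toy code with two amino acids and two codons each.
TOY_CODE = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}
gene = {"AAA": 9, "AAG": 1, "GAA": 8, "GAG": 2}           # strongly biased gene
all_genes = {"AAA": 50, "AAG": 50, "GAA": 50, "GAG": 50}  # unbiased "average gene"

bias_vs_all = codon_bias(gene, all_genes, TOY_CODE)
print(f"B(g|C) for the toy gene: {bias_vs_all:.2f}")
```

In practice the same `codon_bias` function would be called four times per gene, once against each of the C, RP, CH, and TF codon count tables, and the results passed to `expression_measures` and `is_phx`.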


To expose the significance of the PHX gene classes, we plotted B(g|C) versus B(g|RP) for all individual genes g (≥100 codons in length). The plots are given in Fig. 1 for each of the four rapidly growing bacteria. The distribution of points reveals two horns. The left horn effectively corresponds to the PHX genes. We refer to the right horn as putative alien genes. It consists of genes that significantly differ in their codon usages from the four classes C, RP, CH, and TF and will be discussed in a separate publication. If we replace the horizontal axis B(g|RP) with the coordinates of B(g|TF) or B(g|CH), the plots in Fig. 1 remain largely unchanged (data not shown).

FIG. 1
Genes of ≥100 codons in the four fast-growing bacteria. Each gene is represented by a single point. Its position is determined by its bias relative to all genes B(g|C) and by its bias relative to the RP genes B(g|RP). PHX genes ...

Top 20 PHX genes.

The distribution of PHX genes among the four fast-growing bacteria is displayed in Table 1. The highest E(g) value exceeds 2 in all four genomes. Such high values are rare among the completely sequenced genomes (cf. reference 21). These four bacteria have a substantial number of PHX genes, ranging from 142 to 306.

Distribution of PHX genes in four fast-growing bacteria

Table 2 presents the 20 genes with the highest predicted expression levels in the genomes of E. coli, V. cholerae, H. influenzae, and B. subtilis. In those few instances when the homologous genes in the other genomes are not PHX, their E(g) values are shown in parentheses. The genes are segregated into functional categories. Almost all ribosomal proteins attain high expression levels in all rapidly growing bacteria (Tables 2 and 3). The S1 ribosomal protein gene (exceeding 500 codons in length in most bacteria) in B. subtilis is found at the diminished length of 327 codons but is still PHX, with an E(g) value of 1.20. Ribosomal protein genes are present in single copies, in contrast to rRNA genes, and are predominantly of a high expression level, presumably conforming with stoichiometric requirements for ribosome formation between proteins and RNA and among the proteins themselves.

Predicted expression levels for ribosomal protein genes among four fast-growing bacteria

The major (eubacterial) chaperone/degradation proteins HSP70 (DnaK) and HSP60 (GroEL) and the mRNA degradation protein polynucleotide phosphorylase (Pnp) are prominently PHX. Pnp in B. subtilis, however, is not PHX [E(g) = 0.79]. The corresponding genes in H. influenzae achieve E(g) values of 1.29, 1.47, and 1.72, respectively. The gene enolase (eno), listed under energy metabolism as part of the glycolysis pathway, is potently PHX. It is also a component of the mRNA degradosome in a multifunctional capacity (27) and so has reasons for being potently PHX.

Processing factors for protein synthesis are outstandingly PHX, especially the ATP-dependent DNA-directed RNA polymerase units RpoB and RpoC and the elongation factors EF-G (fus), EF-Tu (tuf), and EF-Ts (tsf). The elongation factor EF-Tu is often present in two copies, both dramatically PHX. B. subtilis has but one copy, and it is PHX. EF-G (fusA) is present in two copies in V. cholerae, with E(g) values of 2.02 and 0.96. The DNA helicase DeaD is PHX in E. coli, V. cholerae, and B. subtilis but not in H. influenzae, nor is the second copy in B. subtilis PHX. It will be interesting to see if there are functional differences between the two copies of DeaD and the EF-G proteins. DeaD box proteins protect mRNA from endonucleases (27).

Many glycolysis genes are among the top PHX genes; they include genes for pyruvate kinase (pykA and pykF), fructose-1,6-bisphosphate aldolase (fba), phosphoglycerate kinase (pgk), enolase (eno), and glyceraldehyde-3-phosphate dehydrogenase (gap).

PHX genes contributing to anaerobic fermentation include the alcohol/acetaldehyde dehydrogenase gene (adhE). Other PHX genes of energy metabolism include several, but significantly not all, genes of the tricarboxylic acid (TCA) cycle. Several subunits of the pyruvate dehydrogenase complex genes are among the top PHX genes. These include genes for multiple copies of the three enzymatic components, pyruvate dehydrogenase E1 (aceE), except in B. subtilis, dihydrolipoamide acetyltransferase E2 (aceF), and lipoamide dehydrogenase E3 (lpdA), all part of the pyruvate oxidation pathway. Genes contributing to proton gradient-driven ATP synthesis (namely, the genes for the two major subunits of the ATP synthase catalytic domain, atpA and atpD) are potently PHX. The PHX gene for adenylosuccinate synthetase (purA) stands out, except in B. subtilis. It participates in the de novo biosynthesis pathway of purine nucleotides and in the first step of AMP biosynthesis. However, the genes for the other enzymes of that pathway are not PHX.

Several porin genes of E. coli, H. influenzae, and V. cholerae are PHX. These are absent from B. subtilis, which is a gram-positive bacterium, lacking the distinctive gram-negative outer membrane. The peptidoglycan-associated lipoprotein (Pal) attached to the outer membrane by a lipid anchor is PHX in gram-negative bacteria. Several lipid biosynthesis PHX genes are among the top 20. The first enzyme of the glyoxylate shunt pathway, isocitrate lyase (AceA), is PHX in the moderately fast-growing Deinococcus radiodurans (90-min average doubling time) and the slow-growing Mycobacterium tuberculosis (24 to 36 h). It exists in E. coli and V. cholerae but is not PHX and has not been detected for most prokaryotic genomes. Isocitrate lyase is widespread in plant and fungal organisms. There is an open reading frame (ORF) (yeiM) of unknown function but possibly encoding a nucleoside transporter, with an E(g) value of 2.00, in V. cholerae, and there is a homolog with an E(g) value of 1.07 in H. influenzae but not PHX in E. coli and B. subtilis. Differences among genes in predicted expression levels present challenging questions for experimentation.

PHX genes in H. influenzae parallel PHX genes in E. coli. These include genes for mainstream glycolysis and TCA enzymes and genes for detoxification and DNA damage control, such as the sodA and catalase genes. The highest E(g) value is 2.01, attained by the elongation factor EF-G (fusA). The heat shock proteins GroEL and DnaK are among the most highly expressed. The ribosome release factor (Rrf) is the top PHX protein in H. influenzae. Rrf is responsible for the release of ribosomes from mRNA at the termination of protein synthesis (37). Rrf is present and generally highly expressed in all eubacterial organisms with completely sequenced genomes but is absent from archaea (35).

Comparison of predicted levels of expression in E. coli with 2D gel patterns.

For many E. coli proteins, two-dimensional (2D) gel electrophoresis data for their abundances during growth in minimal medium are available. We compared the molar abundances of 96 proteins (with lengths of ≥100 amino acids [aa] [45, 46]) with the set of PHX genes (Table 4). Among the 20 most abundant of the 96 proteins, 17 were identified as PHX by our method. Among the 20 least abundant proteins of the 96, only 7 qualified as PHX. Of the remaining 56 proteins, which have intermediate molar abundances on 2D gels, 28 were identified as PHX. This agreement between high 2D gel abundances and high E(g) values supports naming the genes “highly expressed.”
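The tier-by-tier agreement reported above reduces to simple proportions. The short Python sketch below only restates the counts given in the text as fractions; the variable names are illustrative.

```python
# (number identified as PHX, group size) for each abundance tier, as reported in the text
tiers = {
    "20 most abundant": (17, 20),
    "56 intermediate": (28, 56),
    "20 least abundant": (7, 20),
}

# Fraction of each tier that is PHX
fractions = {name: phx / size for name, (phx, size) in tiers.items()}

# Totals across the three tiers
total_phx = sum(phx for phx, _ in tiers.values())
total = sum(size for _, size in tiers.values())

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} PHX")
print(f"overall: {total_phx}/{total} PHX")
```

The PHX fraction falls from 85% in the most abundant tier to 50% in the intermediate tier and 35% in the least abundant tier.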

Comparison of 2D gel expression measurements (45) and predicted E(g) values

Three exceptions to the good agreement between high protein molar abundances and PHX status are MetE, FolA, and IlvE, which are involved in amino acid biosynthesis and methylation. These proteins are among the most abundant in 2D gel determinations but do not qualify as PHX. The enzymatic turnover rate for MetE, determined by kinetic studies, is low but is compensated for with a high molar abundance (12). In E. coli, the methionine biosynthesis pathway includes MetK, with a very high E(g) value, 2.21, whereas MetE has an E(g) value of 0.69 and MetH has an E(g) value of 0.60. MetE and MetH offer strict alternative pathways for L-methionine synthesis. MetK acts on homocysteine to produce S-adenosylmethionine, which serves as a methyl donor for a broad range of metabolites, lipids, and vitamins (41). It has been conjectured that the metE gene or the entire Met operon in E. coli, because of its codon usage, may be a newly laterally transferred gene analogous to the Cob operon of Salmonella enterica serovar Typhimurium (24). FolA (dihydrofolate reductase) registers high 2D gel assessments but has a low E(g) value, 0.60.

Hecker and colleagues (e.g., reference 3) have conducted extensive 2D gel assessments of B. subtilis proteins. Consulting their 2D database (http://microbio2.biologie.uni-greiswald.de:8880), we compared the brightest spots on their gels with the E(g) values for the corresponding proteins: RpS2, 1.84; SerA, 0.62; IlvC, 1.03; AroA, 0.93; Gap, 1.80; PdhC, 2.05; CitC, 1.33; TufA, 1.97; Fus, 2.34; YwjH, 1.01; RpL10, 1.46; ClpP, 1.05; SodA, 1.64; and CitH, 0.81. Most of these proteins are PHX, and several achieve an E(g) value of >1.8. Thus, there is a good correlation of PHX proteins with high 2D gel abundances in B. subtilis, as in E. coli.
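As a rough check of the statement that most of these proteins are PHX, the listed E(g) values can be tallied. The Python sketch below uses E(g) ≥ 1.00 alone as a proxy for PHX status, which is an assumption: the individual E_RP, E_CH, and E_TF values needed for the full criterion are not listed here.

```python
# E(g) values for the brightest B. subtilis 2D gel spots, as listed in the text
e_values = {
    "RpS2": 1.84, "SerA": 0.62, "IlvC": 1.03, "AroA": 0.93, "Gap": 1.80,
    "PdhC": 2.05, "CitC": 1.33, "TufA": 1.97, "Fus": 2.34, "YwjH": 1.01,
    "RpL10": 1.46, "ClpP": 1.05, "SodA": 1.64, "CitH": 0.81,
}

# Proxy for PHX status (assumption): overall E(g) of at least 1.00
at_or_above_one = [name for name, e in e_values.items() if e >= 1.00]

# Proteins whose E(g) strictly exceeds 1.8
above_1_8 = [name for name, e in e_values.items() if e > 1.8]

print(f"{len(at_or_above_one)} of {len(e_values)} have E(g) >= 1.00")
print("E(g) > 1.8:", ", ".join(above_1_8))
```

Under this proxy, 11 of the 14 listed proteins reach E(g) ≥ 1.00 and four exceed 1.8; Gap, at exactly 1.80, falls just short of the strict cutoff.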

Classes of PHX genes.

Tables 3 and 5 through 9 compare the predicted levels of expression, for the four fast-growing bacteria, of all ribosomal protein genes, of the genes for the major transcription/translation processing factors, of the chaperone/degradation protein genes, and of the major energy metabolism genes. The extended repair gene repertoire of the four genomes and the vitamin biosynthesis genes of E. coli are evaluated in terms of E(g) levels (Tables 10 and 11). Each class is discussed in turn.

Predicted expression levels for translation/transcription processing genes among four fast-growing bacteria
Major energy metabolism genes of the four fast-growing bacteria and their predicted E(g) values
Repair proteins
Major vitamin biosynthesis genes of E. coli

(i) Ribosomal protein genes (Table 3).

Ribosomes of the four fast-growing bacteria have practically the same numbers of small- and large-subunit proteins. However, among all prokaryotic genomes, that number ranges from 50 to 65, while in eukaryotes, the number is constant at 79 (except in yeast, 78) (48, 50). This information suggests a greater range of variation in the patterns of protein synthesis among prokaryotes, consistent with the constrained phylogenetic origin of eukaryotic cells compared with the less constrained origin of prokaryotic species.

Thirty-five RP genes are shown in Table 3 (only those ≥100 codons long). Unlike those of yeast and Drosophila, many of the bacterial RP genes are concatenated to form a large operon encompassing 20 to 40% of all RP genes. Genes for some of the major translation/transcription processing factors, including tuf, fus, rpoA, rpoB, and rpoC, are within or near the large RP operon. Other RP operons typically consist of two to five genes. In E. coli, the cluster of L7/L12, L10, L1, L11, rpoB, and rpoC is noteworthy. B. subtilis possesses an RP cluster that effectively combines the two largest clusters of E. coli. In these fast-growing bacteria, most of the eubacterial RP genes are positioned near the origin of replication, oriC. It is evident from Table 3 that virtually all RP genes are PHX. The EF-Tu gene is often duplicated, with both copies being PHX and incorporated near or in an RP cluster. groEL, rpoB, and rpoC also tend to localize to the vicinity of the main RP cluster. Many eukaryotic and eubacterial ribosomal proteins are multifunctional (50).

The “giant” RP (labeled S1 or RpsA, generally exceeding 500 amino acids in length) has a remarkable phylogeny. It is recognized in most eubacteria but is not part of an RP operon, and it generally reaches among the highest expression levels. In B. subtilis, there is an S1 homolog, but it is only 327 codons long, and the S1 gene is entirely missing from the three current completely sequenced mycoplasma genomes. The S1 gene is essential in E. coli, where it is thought to contribute to the initiation of polypeptide synthesis. The absence of an S1 protein in B. subtilis can possibly be compensated for by a strong ribosome binding site (34). The evolutionarily deep branching bacterium Aquifex aeolicus has a giant S1 gene. Thermotoga maritima, allowing for a frameshift, also has an S1 homolog. None of the archaeal genomes has an S1 homolog, and eukaryotic genomes also lack an S1 homolog.

The origin of replication (oriC) for E. coli is identified within the 232-bp interval from 3923372 to 3923603. The major RP cluster is proximal to oriC at 3436600 to 3476134 and contains, in addition to RP genes, genes for the elongation factors EF-Tu and EF-G and two flanking chaperones of the peptidyl-prolyl cis-trans isomerase (PPIase) family. Proximity to oriC implies a higher-than-average gene copy number per rapidly growing cell. A second RP cluster occurs proximally on the other side of oriC and includes genes for a duplicate copy of EF-Tu (tufB) and the DNA-directed RNA polymerase units rpoB and rpoC. The E(g) values for RP genes (≥100 codons long) in E. coli range from 2.44 to 1.13. All but one of the RP genes are PHX; the single exception is L9 in B. subtilis. The majority have E(g) values exceeding 1.50. The correlations of E(g) values among the RP genes of E. coli, V. cholerae, and H. influenzae are high (Table 3).
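To make the notion of proximity to oriC concrete, the coordinates quoted above can be converted into a distance along the circular chromosome. The Python sketch below is illustrative only. The genome length of 4,639,221 bp (the E. coli K-12 MG1655 sequence) is an assumption that is not stated in the text, and the function names are hypothetical.

```python
GENOME_LENGTH = 4_639_221  # assumed: E. coli K-12 MG1655 chromosome length in bp (not from the text)

ORIC = (3_923_372, 3_923_603)        # oriC interval, from the text
RP_CLUSTER = (3_436_600, 3_476_134)  # major RP cluster, from the text


def interval_length(start, end):
    """Length in bp of an inclusive coordinate interval."""
    return end - start + 1


def circular_gap(first, second, genome_length):
    """Shortest gap in bp between two non-overlapping intervals on a circle."""
    gap_forward = (second[0] - first[1]) % genome_length   # end of first to start of second
    gap_backward = (first[0] - second[1]) % genome_length  # end of second to start of first
    return min(gap_forward, gap_backward)


gap = circular_gap(RP_CLUSTER, ORIC, GENOME_LENGTH)
print(f"oriC length: {interval_length(*ORIC)} bp")
print(f"RP cluster length: {interval_length(*RP_CLUSTER)} bp")
print(f"gap between RP cluster and oriC: {gap} bp "
      f"({gap / GENOME_LENGTH:.1%} of the assumed genome length)")
```

The interval lengths follow directly from the quoted coordinates; only the percentage depends on the assumed genome length.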

Does stoichiometry matter? For example, among the RP genes, why aren't all 50S units PHX at the same expression level? A partial answer may be that not all ribosomal proteins play an exclusive role in determining ribosome structure. Some may have a regulatory role (e.g., S1 is proposed to function in translation initiation) (M. Nomura, personal communication) (34). The acidic ribosomal protein component P0 is PHX in archaea but is absent from eubacteria. L7/L12 is also acidic and is thought to act in adapting mRNA chains to the ribosome. Actually, L7/L12 forms dimers with an elongated shape. Two dimers associate with a copy of L10 to form a very strong complex (4). Very relevant is that several ribosomal proteins are multifunctional (50). For example, S9 provides ancillary utility in certain repair activities (49); S16, in part, acts as an endonuclease (31).

(ii) Genes for transcription/translation processing factors (Table 5).

The majority of protein synthesis factors are PHX over all prokaryotic genomes. Expression levels correlate highly across species (Table 5, footnote a). As with the ribosomal proteins, the E(g) values cover a wide range. Elongation factor EF-G (fus) is distinctive, with an E(g) value exceeding 2 for each genome. The highest expression levels in E. coli occur for the RpoB and RpoC subunits of the core RNA polymerase. RpoA is PHX in B. subtilis but not in E. coli, V. cholerae, and H. influenzae. Why are the predicted expression levels for the RpoB and RpoC subunits higher than that for RpoA? Based on the RNA polymerase stoichiometry (one copy of RpoB, one copy of RpoC, but two RpoA units), should one expect elevated expression levels for RpoA compared to RpoB and RpoC? A possible explanation relates to the differences in protein sizes, RpoB and RpoC being larger proteins than RpoA. It has been observed for E. coli that codon choices in long genes tend to be more biased than those in short genes (10). Interestingly, Mycoplasma genitalium, its relative Ureaplasma urealyticum, and the spirochete Treponema pallidum feature PHX RpoA but not RpoB and RpoC.

(iii) Chaperone/degradation protein genes (Table 6).

Predicted expression levels for chaperone/degradation genes among four fast-growing bacteria

Among the top PHX genes in most eubacterial genomes are those for the major chaperone protein archetypes, DnaK and GroEL. These reach E(g) values exceeding 1.3 (>2 in E. coli). The gene for the multifunctional enzyme Pnp, fundamental in RNA processing and mRNA degradation, attains the highest predicted E(g) value, 2.66, among all E. coli genes. Pnp is PHX in many eubacterial genomes but not in B. subtilis.

Thioredoxin (trxA) implements protein folding by catalyzing the formation or disruption of disulfide bonds. The eukaryotic thioredoxin homolog is protein disulfide isomerase, operating in the endoplasmic reticulum. It has been verified experimentally that protein disulfide isomerase augments protein folding needs (7, 15, 47). The highest E(g) values for thioredoxin occur in B. subtilis (1.35) and then in other fast-growing bacteria in the order D. radiodurans (1.23) (data not shown), V. cholerae (1.21), H. influenzae (1.11), and E. coli (1.06).

Peptidyl-prolyl cis-trans isomerases (PPIases) accelerate the proper folding of proteins by promoting the cis-trans isomerization of imide bonds in proline within oligopeptides. E. coli has at least nine PPIases defined by sequence similarity. One of these, the survival protein SurA, enhances the folding of periplasmic and outer membrane proteins. As expected, SurA does not exist in gram-positive B. subtilis, which has neither compartment. Trigger factor (Tig) is a ribosome-associated chaperone that can complement DnaK (8). Tig and DnaK cooperate in the folding of newly synthesized proteins. Simultaneous deletion of Tig and DnaK is lethal under usual growth conditions (43). Tig is broadly PHX for eubacterial genomes but is not found for archaeal genomes. Expression levels of Tig in fast-growing bacteria are quite similar (Table 6).

DegP is a chaperone folding factor that is significantly PHX, with an E(g) value of 1.26; it acts primarily in degrading misfolded proteins in the periplasm. Also associated with periplasmic and cytoplasmic chaperones are several PPIases, including PpiC [E(g) = 1.02], PpiB (1.53), FkpA (1.40), SlyD (2.08), PpiA (0.95), PpiD (1.11), SurA (1.10), FhlB (0.85), and YaaD (0.77); four are active in the periplasm, and five are active in the cytoplasm. Another relevant chaperone protein is disulfide oxidase (DsbA), which is marginally PHX, with an E(g) value of ≈1.02; it senses misfolded proteins in the periplasm.

Correlations among the fast-growing bacteria for levels of expression of major chaperone genes are generally significantly high (Table 6, footnote a). However, E. coli and B. subtilis are only marginally correlated (0.3). In E. coli, degradation proteins are mostly PHX, but this is not consistently the case for the other fast-growing bacteria. Why are the major chaperone genes so often PHX? Chaperone/degradation proteins are vitally needed both during rapid growth and in stationary phase. In normal cell physiology, these proteins have multiple functions: they contribute decisively in ensuring correct protein folding, in remedying misfolded structures, in directing protein trafficking, and in coordinating protein secretion. Chaperone proteins also contribute to conformational changes and to minimizing protein damage during stress.

(iv) Levels of expression of aminoacyl-tRNA synthetases (Table 7).

Predicted expression levels for aminoacyl-tRNA synthetase genes among four fast-growing bacteria

There are 19 PHX tRNA synthetase polypeptides in E. coli, including two subunits of phenylalanyl-tRNA synthetase (PheS-α and PheT-β) and two subunits of glycyl-tRNA synthetase (GlyQ-α and GlyS-β). However, there are only eight in V. cholerae, seven in H. influenzae, and three in B. subtilis. IleS is missing from H. influenzae, and GlnS is missing from B. subtilis, which uses amidotransferase modifications to produce Gln-tRNAGln from Glu-tRNAGlu synthetase. Actually, the GlnS gene is absent from most prokaryotic genomes (14).

Expression level correlations for the tRNA synthetase genes among the three rapidly dividing gram-negative genomes are generally positive but low. On the other hand, the corresponding relationship of B. subtilis with E. coli is uncorrelated (−0.04) and that of B. subtilis with V. cholerae is modestly negatively correlated (−0.24). LysS is the only PHX tRNA synthetase for all four genomes.

There are three aminoacyl-tRNA synthetases in E. coli which occur at only moderate predicted expression levels: CysS, with an E(g) of 0.89; TrpS, with an E(g) of 0.91; and HisS, with an E(g) of 0.74. The average amino acid usage frequencies for E. coli genes correlate positively with the predicted expression levels for tRNA synthetases. Interestingly, the three lowest amino acid usage frequencies in E. coli are for Cys (1.2%), Trp (1.5%), and His (2.3%) (Table 8).

Relationship between aminoacyl-tRNA synthetase expression levels and amino acid frequencies in E. coli proteins

(v) Levels of expression of major energy metabolism genes (Table 9).

Enzymes of major catabolic pathways can be divided into four groups: glycolysis, pyruvate metabolism, the pentose phosphate pathway, and the TCA cycle. The glycolysis genes are predominantly PHX in all four fast-growing bacteria, with very high E(g) values, >2.00, for several of these genes in E. coli. Hexokinase and glucokinase are prominent glycolysis proteins in most eukaryotes, but the former is not found in most prokaryotes, including the four fast-growing bacteria under analysis in this study. Why? In glycolysis, hexokinase converts glucose to glucose-6-phosphate. However, glucose-6-phosphate arises from other hexoses and from glucose transported into the cell via the phosphotransferase system. Perhaps the multiplicity of sources means that glucokinase need not be PHX. Glucokinase occurs in many (but not all) eubacteria, normally at low to moderate E(g) values, 0.3 to 0.8.

The genes for pyruvate dehydrogenase are commonly PHX in the four genomes. The TCA genes are generally PHX in E. coli but generally not PHX in H. influenzae and B. subtilis. In B. subtilis, two TCA genes are PHX and the others cover the range 0.4 to 1.0. Many prominent TCA genes appear to be absent from H. influenzae. Why are TCA genes in B. subtilis mostly not PHX? The TCA cycle, apart from energy (ATP) production, can contribute in myriad ways to cellular needs, especially in making precursors and intermediates to macromolecules, e.g., in amino acid, vitamin, and heme biosyntheses (see Discussion). The order of actions in the TCA cycle is as follows: citrate synthase (GltA; in B. subtilis, there are two versions, designated CitZ and CitA), aconitate hydratase (AcnA/AcnB), isocitrate dehydrogenase (Icd), 2-oxoglutarate dehydrogenase (SucA), succinyl coenzyme A (succinyl-CoA) synthetase (SucD and SucC), succinate dehydrogenase (SdhB, SdhC, and SdhD), fumarate hydratase (FumA, FumB, FumC, or CitG), and malate dehydrogenase (Mdh/CitH). The initial enzymes of the TCA pathway in E. coli are all PHX, with E(g) values ≥1.29, whereas those beyond succinyl-CoA synthetase (except for Mdh) all have E(g) values ≤1.10, and most are not PHX. Apart from the differences in the expression levels among the TCA cycle genes, correlations among genomes for energy metabolism gene expression levels across all four fast-growing bacteria are high, suggesting similar uses for this set of enzymes (Table 9, footnote a).

Certain gene groups generally not PHX.

Specific regulatory proteins or proteins responding to special demands and used few times, as in the highly specialized DNA repair processes, are not expected to be PHX. Also, specific transcription proteins and DNA replication proteins, because the cell assembles few replication machines, tend not to be PHX.

(i) Genomic repair proteins.

Table 10 reports predicted expression levels for the main collection of repair proteins for the four genomes. Only two repair proteins of E. coli reach PHX levels: RecA and Ssb (single-stranded DNA binding protein) [E(g) for both, 1.48]. Two other repair proteins are borderline PHX: Dut (deoxyuridine 5′-triphosphate nucleotide hydrolase) and HepA [E(g) = 0.97 and 0.99, respectively]. Other repair proteins have low to moderate predicted expression levels, the E(g) values almost always in the range from 0.35 to 0.80. These evaluations parallel those for D. radiodurans, in which RecA [E(g), 2.04] has a dramatically high predicted expression level and MutT (gene no. DR2358) reaches an E(g) of 1.29, these being the only two proteins qualifying as PHX (22). The other repair proteins of D. radiodurans have E(g) values in the range 0.40 to 0.80.

(ii) Vitamin biosynthesis proteins (Table 11).

Pathways to the synthesis of vitamins, of which only small amounts are needed to provide adequate cofactor function, have largely low predicted expression levels, with E(g) values of about 0.40 to 0.75. In E. coli, the genes acting in the synthesis of six vitamin cofactors, biotin, thiamine, riboflavin, lipoate, pyridoxal, and cobalamin, were examined. Only RibH, which participates in riboflavin biosynthesis, is PHX in E. coli. Although the enzymes of the biosynthetic pathways are poorly expressed, some of the enzymes that utilize the vitamins as cofactors are highly expressed, for example, biotin carboxylase (a subunit of E. coli acetyl-CoA carboxylase). In B. subtilis, RibE, which is not PHX, in the same pathway forms an oligomer complex with RibH in which the structural union (RibE-RibH) combines 3 units of RibE with 60 units of RibH (23). This anomalous stoichiometry makes it likely that RibH furnishes structural support and, for this reason, is PHX; in this guise, RibH may be used in other capacities. Paradoxically, RibH is not PHX in B. subtilis.

Interestingly, M. tuberculosis features nine PHX proteins among the vitamin biosynthesis pathways. Synechocystis and A. aeolicus each have three PHX vitamin biosynthesis genes, Borrelia burgdorferi has one, Archaeoglobus fulgidus has two, T. pallidum has one, and D. radiodurans has one. The biotin carboxylase protein is PHX in the E. coli, H. influenzae, V. cholerae, Helicobacter pylori, Synechocystis, Chlamydia trachomatis, and A. fulgidus genomes.

(iii) Genes of signal transduction pathways.

In Table 8 of reference 21, the predicted expression levels for several two-component sensor genes (histidine kinases) of E. coli and B. subtilis are reported. In all of those examples, the predicted expression levels were low, the E(g) values ranging from 0.30 to 0.70.

One particular example is the Cpx regulon of the sensor kinase/phosphatase periplasmic family, which encompasses the genes encoding CpxA and CpxR (components of a histidine kinase), CpxP (down regulates the Cpx pathway), and NlpE (membrane lipoprotein), believed to eliminate abnormal proteins in the periplasm and to recover amino acids during nitrogen starvation (32). These proteins regulate a hierarchy of ς factors, including ς32 and ςE, active in autoregulation and repression. The predicted expression levels are low [for CpxA, E(g) = 0.70; for CpxR, E(g) = 0.57; for CpxP, E(g) = 0.62; and for NlpE, E(g) = 0.61], as is common with specific regulatory proteins. Cpx is a sensor kinase acting in the periplasm. The Cpx pathway apparently also monitors pilus assembly during infection of tissues by uropathogenic E. coli (17).

(iv) Principal starvation genes of E. coli and their predicted levels of expression (Table 12).

Genes induced under starvation conditions in E. coli

The genes shown in Table 12 are associated with starvation states, as discussed in a review (26). Three genes in this category are PHX: dps, also labeled pexB [E(g), 1.13], which provides protection from oxidative radicals; rpoH, which encodes ς32 [E(g), 1.46]; and the survival protein, SurA [E(g), 1.10], a chaperone which is a member of the PPIase family. We expect these proteins, by virtue of their codon usage patterns, to be capable of high levels of expression, especially when induced by starvation. Other starvation proteins (Table 12) have low to moderate E(g) values. The ςE factor, which regulates the activity of other periplasmic proteins, is not PHX, and the same is true for ς54 and ς38, which respond to nitrogen and carbon starvation, respectively. However, ς32 (rpoH), the principal chaperone sigma factor, pervasively registers as PHX, presumably to establish high levels of chaperone production.

Homologous PHX genes among the fast-growing bacteria.

Table 13 compares the numbers of homologous PHX gene families among the four rapidly dividing bacteria. There are 60 gene families common to the four fast-growing bacteria, with each member PHX. Thirty-two of these are families of RP genes, eight are families of TF genes, and nine are families of genes essential for energy metabolism. Twenty-three gene families distinguish E. coli with PHX representatives, but these are not PHX in the other three fast growers, including five CH genes and five TF genes.

Families of homologous genes among the four fast-growing bacteria with at least one PHX gene

E. coli and V. cholerae share 124 homologous genes that are both PHX and in total 236 homologous genes with one or both genes being PHX; the respective values for E. coli and H. influenzae are 105 and 226, and the values for V. cholerae and H. influenzae are 94 and 156. Paired PHX genes between fast-growing bacteria and non-fast-growing bacteria are fewer in number (Table 14). Of homologous genes among genomes with at least one PHX gene, the expression levels for E. coli versus archaeal genomes and E. coli versus H. pylori and M. genitalium genomes are uncorrelated or negatively correlated (Table 14). Similarly, V. cholerae, H. influenzae, and B. subtilis expression levels correlate negatively with homologous genes of archaeal genomes, possibly reflecting differences in lifestyles, habitats, and energy sources.
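The pair counts and correlations described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (one tuple per homologous pair, carrying each gene's E(g) value and a PHX flag), the function names, and the choice of the Pearson coefficient are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation in one genome, correlation undefined
    return sxy / sqrt(sxx * syy)


def summarize_homolog_pairs(pairs):
    """pairs: list of (e_a, phx_a, e_b, phx_b) for homologous genes a and b.

    Hypothetical layout: e_* are E(g) values, phx_* are booleans saying
    whether each gene was classified PHX in its own genome.
    """
    both = sum(1 for _, phx_a, _, phx_b in pairs if phx_a and phx_b)
    # Keep only pairs in which one or both genes are PHX.
    kept = [(e_a, e_b) for e_a, phx_a, e_b, phx_b in pairs if phx_a or phx_b]
    corr = None
    if len(kept) >= 2:
        corr = pearson([a for a, _ in kept], [b for _, b in kept])
    return {"both_phx": both, "at_least_one_phx": len(kept), "correlation": corr}


# Toy values, not data from the paper.
example = [
    (1.5, True, 1.2, True),
    (1.1, True, 0.6, False),
    (0.5, False, 0.4, False),
    (0.7, False, 1.3, True),
]
summary = summarize_homolog_pairs(example)
```

Passing the PHX flags in explicitly keeps the sketch independent of the exact PHX criterion, which is defined in Materials and Methods.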

Numbers of pairs of homologous genes with one or both genes PHX and correlations between their E(g) values

Codon usages along the gene and expression levels.

For relatively long genes (≥600 codons long), we determined expression levels with the gene length divided into three equal parts (5′, middle, and 3′ parts). The pairwise correlations among the three parts of the E. coli genes are high, 0.86, 0.85, and 0.88, respectively, indicating that expression levels calculated from codon biases are effectively the same for the three parts of genes.
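The partition into thirds can be sketched as below. The function name, the 600-codon default, and the rule that the 3′ part absorbs any leftover codons are assumptions for illustration; the text states only that genes of at least 600 codons were divided into three equal parts.

```python
def split_into_thirds(cds, min_codons=600):
    """Split a coding sequence into 5', middle, and 3' codon lists.

    Returns None for genes shorter than min_codons. When the codon count
    is not divisible by three, the extra codons go to the 3' part (an
    arbitrary choice made for this sketch).
    """
    n_codons = len(cds) // 3
    if n_codons < min_codons:
        return None
    codons = [cds[i:i + 3] for i in range(0, n_codons * 3, 3)]
    third = n_codons // 3
    five_prime = codons[:third]
    middle = codons[third:2 * third]
    three_prime = codons[2 * third:]
    return five_prime, middle, three_prime


# Toy sequence of 601 codons, not a real gene.
toy_gene = "ATG" + "GCT" * 600
parts = split_into_thirds(toy_gene)
```

Each part can then be scored separately with whatever expression measure is in use, and the three score lists compared across genes.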

Independent of gene size, we observed (20) that the middle and 3′ end of the genes show quite similar codon frequencies, whereas the 5′ third-codon ensemble possesses somewhat different codon frequencies. This finding may reflect differences in translation initiation versus later stages of translation elongation. A prominent example concerns encoding of arginine with major codons (CGN) versus minor codons (AGR). The AGR codons are scarce in E. coli genes and are restricted mostly to the 5′ end of the genes (especially to the initial 30 bp), whereas CGN codons are preferred elsewhere in the genes (6).
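The positional contrast between AGR and CGN arginine codons can be tallied with a short sketch such as the following. The function name and the default window of 10 codons (the initial 30 bp mentioned above) are illustrative assumptions.

```python
AGR_CODONS = {"AGA", "AGG"}
CGN_CODONS = {"CGT", "CGC", "CGA", "CGG"}


def arginine_codon_positions(cds, head_codons=10):
    """Count AGR and CGN arginine codons in the first head_codons codons
    ("head") and in the remainder of the gene ("rest")."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = {"head": {"AGR": 0, "CGN": 0}, "rest": {"AGR": 0, "CGN": 0}}
    for index, codon in enumerate(codons):
        region = "head" if index < head_codons else "rest"
        if codon in AGR_CODONS:
            counts[region]["AGR"] += 1
        elif codon in CGN_CODONS:
            counts[region]["CGN"] += 1
    return counts


# Toy sequence: one AGA near the start, three CGT and one AGG further in.
toy = "ATG" + "AGA" + "GCT" * 8 + "CGT" * 3 + "AGG"
tally = arginine_codon_positions(toy)
```

Summing such tallies over all genes of a genome would give the head-versus-rest proportions of the two codon groups.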

PHX ORFs shared by the four fast-growing genomes.

Genes are considered homologous if their SSPA (significant segment pair alignment) score (percent similarity; see reference 5) is ≥40%. Examples include three ORFs (yaaH, yajC, and yeeX) common to E. coli and V. cholerae, three similar ORFs (yfiD, yjjK, and yebC) present in the genomes of E. coli, V. cholerae, and H. influenzae, respectively, and one ORF (ybaB) common to E. coli and B. subtilis. These PHX genes of unknown function offer attractive candidates for mutagenesis and knockout studies to determine their functions.

Distributions of PHX genes over the chromosomes.

Clusters of PHX genes are displayed in Table 15. Statistical significance was assessed using the r-scan analysis protocol described elsewhere (18).

Clusters of PHX genes

The PHX genes in each cluster generally possess the same transcription orientation, mostly that of the leading strand. However, E. coli features the PHX fumarate reductase operon genes (kb 4380 → 4376) frdD, frdB, and frdA untypically located in the lagging strand (the direction of transcription is indicated by the arrow). The genes encoding the principal units of NADH dehydrogenase I, N, L, I, G, F, and C cover positions 2402 → 2387 (about a 15-kb extent) on the leading strand.

The PHX gene clusters of E. coli, apart from the segments at kb 450 → 447 and kb 4380 → 4376 of the cytochrome o ubiquinol oxidase operon and the fumarate reductase operon, respectively, are all located in the leading strand. Note that the two RP clusters near oriC (kb 3476 → 3437 and kb 4174 → 4183) include a number of TF genes and some PPIase genes. There are no extended intervals devoid of PHX genes in the E. coli genome.

The V. cholerae large chromosome contains two significantly long segments, at kb 43 to 327 and kb 1657 to 1985, each devoid of PHX genes and positioned antipodal in the chromosome. The main PHX clusters correspond to long RP operons located in the leading strand. These descriptions indicate that PHX genes are irregularly distributed in the V. cholerae chromosomes. The V. cholerae genome has two chromosomes (chromosome I, 2.96 Mb, and chromosome II, 1.07 Mb) containing 138 PHX genes and 14 PHX genes, respectively. The PHX genes in the large chromosome comprise 7% of its genes. V. cholerae has a single PHX RP gene on chromosome II.

In H. influenzae, the PHX clusters are of RP genes and protein synthesis genes.

B. subtilis contains a PHX cluster which features a conglomerate of 27 RP genes (kb 118 → 154) intermeshed with the protein synthesis genes rpoB, rpoC, fus, tuf, and rpoA. A compact operon of PHX genes distinguishes five glycolysis genes (kb 3482 → 3475), enolase (eno), phosphoglycerate mutase (pgm), triosephosphate isomerase (tpi), phosphoglycerate kinase (pgk), and glyceraldehyde-3-phosphate dehydrogenase (gap), located in the leading strand. The cluster at kb 3482 → 3475 ostensibly renders the main glycolysis genes highly efficient, putatively making it less important to express many respiration genes. All clusters are located in the leading strand. B. subtilis also has a 245-kb stretch devoid of PHX genes, at kb 35 to 280.
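Intervals devoid of PHX genes, such as those noted above for V. cholerae and B. subtilis, can be enumerated with a sketch like the one below. It only lists gaps between consecutive PHX gene positions on a circular chromosome; it does not implement the r-scan significance test of reference 18. The function name and the treatment of each gene as a single point (for example, its start coordinate in kb) are assumptions.

```python
def phx_free_intervals(phx_positions_kb, chromosome_length_kb, min_gap_kb=0.0):
    """Return (start, end, length) gaps between consecutive PHX genes on a
    circular chromosome, longest first. Genes are treated as points."""
    positions = sorted(set(phx_positions_kb))
    if not positions:
        return [(0.0, chromosome_length_kb, chromosome_length_kb)]
    gaps = []
    for left, right in zip(positions, positions[1:]):
        gaps.append((left, right, right - left))
    # Wrap-around gap from the last gene, through the origin, to the first.
    wrap = positions[0] + chromosome_length_kb - positions[-1]
    gaps.append((positions[-1], positions[0], wrap))
    kept = [gap for gap in gaps if gap[2] >= min_gap_kb]
    return sorted(kept, key=lambda gap: gap[2], reverse=True)


# Toy coordinates, not positions from any of the four genomes.
all_gaps = phx_free_intervals([10, 50, 300], 400)
long_gaps = phx_free_intervals([10, 50, 300], 400, min_gap_kb=100)
```

Whether a gap is unusually long for the number of PHX genes present is a separate statistical question, which the paper addresses with r-scan statistics.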


Gene expression can be evaluated in several ways. One currently popular way centers on DNA microarrays (DNA chips) aiming to dissect gene expression under varied physiological, clinical, and environmental conditions. These DNA chips have been applied to the monitoring of genes in different situations for the discovery of genes associated with diseases; for assessment of gene expression under inducements from drugs, chemicals, or toxins; for ascertainment of genes compensatory for knockout mutations; and for profiling of gene expression patterns in temporal and tissue-specific localizations. The current microarray methodology is restricted to discriminating transcription levels and not levels of translation or protein abundances (33, 42). Also, DNA chip hybridizations are generally unable to detect unambiguously low-abundance gene transcripts. Experimental evaluations of protein abundances under different cellular conditions can be assayed by 2D gel electrophoresis (reviewed in reference 46) supplemented by mass spectrometry (51), by antibody associations, and by biochemical tests. Also, correlations of 2D gel proteomes and microarray assessments of transcriptomes generally appear to be weak (13). However, Futcher and coworkers (11) reexamined these correlations in yeast and found generally good agreement.

Codon choice is presumably influenced by protein structure via evolutionary selection for the most accurately translated sequences at structurally important locations. Codon choices may be different at the beginning of a gene than at the central part of the gene (6). It has been suggested that translation pause sites, especially early in the coding sequence, can slow translation initiation (16). Accordingly, there appear to be conflicting selection pressures imposed by constraints on ribosomal binding for the rate of initiation, rate of elongation, and overall translation fidelities. In rapidly growing cells, where ribosomes are limiting for protein synthesis, a ribosome stalled at a rare codon is unavailable for the synthesis of other proteins, and the higher the molar abundance of the stalled protein, the greater the disruption of cellular growth (52). Protein structure may be correlated with codon usage (e.g., see references 30 and 44). Thanaraj and Argos (44) argue the rare-codon hypothesis for domains and secondary structures, in which repetition of rare codons reduces translation rates and introduces translation pauses, allowing time for protein domains and secondary structures to fold into native structural conformations.

Codon usage offers another way to evaluate gene expression with a different set of limitations. Our sequence methods are effectively complementary to the experimental procedures of 2D gel electrophoresis and DNA microarray analysis in assessing gene expression levels. By our methods, genes similar in codon frequencies to RP, TF, and CH genes but strongly deviant in codon usage from the average gene are identified as PHX. Our analyses and data support the hypothesis that each genome has evolved codon usage patterns indicating “optimal” gene expression levels for most situations of its habitat, energy sources, and lifestyle. The three protein families—ribosomal proteins, major translation/transcription processing factors, and chaperone/degradation proteins—are fundamental at many stages of the cell life in promoting growth and stability. Generally, PHX genes exploit favorable codon usages, tend to possess strong Shine-Dalgarno sequences, and putatively possess strong promoter sequences (cf. reference 21). Some limitations of our method result from an implicit assumption that the codon usage of a gene is not affected by its location in the genome, e.g., G+C-rich versus A+T-rich regions. The high variance of G+C composition (isochores) along mammalian genomes may be prohibitive with respect to predicting gene expression levels from codon usages. However, the nucleotide compositions of bacterial genomes are largely homogeneous. Some genes that deviate in G+C content (e.g., those for transposases or specialized pathogenicity islands) tend to be detected as “putative alien” genes (22, 28).
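The idea of scoring a gene as close to a standard class but far from the average gene can be illustrated with a simplified sketch. The code below is an assumption-laden toy and not the procedure of Materials and Methods: it measures codon usage difference as an amino-acid-weighted sum of absolute differences in codon frequencies (normalized within each amino acid family), and it uses a single standard class, whereas the paper combines the RP, TF, and CH standards. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; "*" marks stop codons.
GENETIC_CODE = {"".join(c): aa
                for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def sense_codons(cds):
    """In-frame codons of cds, skipping stops and unrecognized triplets."""
    cds = cds.upper()
    triplets = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return [t for t in triplets if GENETIC_CODE.get(t, "*") != "*"]


def codon_frequencies(sequences):
    """Codon frequencies normalized to sum to 1 within each amino acid."""
    counts = Counter(c for seq in sequences for c in sense_codons(seq))
    family_totals = Counter()
    for codon, n in counts.items():
        family_totals[GENETIC_CODE[codon]] += n
    return {codon: n / family_totals[GENETIC_CODE[codon]]
            for codon, n in counts.items()}


def codon_bias(gene, reference_sequences):
    """Amino-acid-weighted codon usage difference of gene from a reference set."""
    gene_freqs = codon_frequencies([gene])
    ref_freqs = codon_frequencies(reference_sequences)
    aa_counts = Counter(GENETIC_CODE[c] for c in sense_codons(gene))
    total = sum(aa_counts.values())
    bias = 0.0
    for codon, aa in GENETIC_CODE.items():
        if aa_counts[aa] == 0:
            continue  # amino acid (or stop) absent from this gene
        weight = aa_counts[aa] / total
        bias += weight * abs(gene_freqs.get(codon, 0.0) - ref_freqs.get(codon, 0.0))
    return bias


def expression_ratio(gene, all_genes, standard_genes):
    """Large when gene is far from the average gene and close to the standard."""
    to_standard = codon_bias(gene, standard_genes)
    if to_standard == 0.0:
        return float("inf")
    return codon_bias(gene, all_genes) / to_standard


# Toy sequences using only alanine codons, not real genes.
toy_gene = "GCT" * 4
ratio = expression_ratio(toy_gene, ["GCC" * 4], ["GCTGCTGCTGCC"])
```

In the toy case the gene uses only GCT, the "average" set uses only GCC, and the standard set uses mostly GCT, so the ratio is well above 1.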

What does the expression level E(g) for a gene g reflect? Gene expression in prokaryotes is regulated at initiation, elongation, and termination of transcription and of translation, by different rates of transcription and translation, by differential mRNA stabilities, by segmental stability differences in polycistronic messages, by codon preferences, and by interactions with chaperone and other proteins. Expression is also influenced by lifestyle, habitat, and energy sources. The classes (RP, TF, and CH) of proteins that we have chosen to represent highly expressed genes are needed in high molar abundances when a high rate of protein synthesis is essential.

Multifunctional proteins and PHX levels.

A protein that belongs to a PHX class and that performs several functions might be expected to register higher E(g) values than the average PHX gene. We offer several examples.

Polynucleotide phosphorylase (Pnp) is fundamental in RNA processing and mRNA degradation, and the gene attains the highest E(g) value, 2.66, among all the E. coli genes. Pnp is also a component of the mRNA degradosome, which involves RNase E, DnaK, RhlB helicase, and enolase (27). RNase E is also PHX in E. coli, with an E(g) value of 1.22, but it is not PHX in H. influenzae and V. cholerae and it is missing from B. subtilis. As an important multifunctional protein, Pnp is expected to be PHX at an increased level. The Pnp gene also has the highest E(g) value among all the genes in B. burgdorferi. This gene is also significantly PHX in the genomes of H. influenzae, V. cholerae, Synechocystis, M. tuberculosis, T. pallidum, Chlamydia pneumoniae, A. aeolicus, and T. maritima.

Enolase obtains the very high E(g) values of 2.11 in E. coli, 1.93 in H. influenzae, 1.59 in V. cholerae, and 1.92 in B. subtilis. Again, enolase is multifunctional, acting in energy metabolism (glycolysis) and partly in RNA degradation.

The enzyme aconitate hydratase (aconitase) interconverts citrate and isocitrate in the TCA cycle. Aconitase also serves as a sensor, detecting changes in the redox state and assaying iron content within the cell (36). This protein can further function as a transcriptional activator that specifically regulates gene expression for the transferrin receptor and controls quantities of ferritin (2). At its iron sulfur center, aconitase can be inactivated by oxidative stress or iron deprivation. Aconitase has the highest E(g) value, 2.56, in D. radiodurans (see also reference 22), and its gene is PHX in many genomes.

Apart from structural roles in ribosome formation, several ribosomal proteins act in multifunctional capacities (50). For example, the S9 protein is an accessory protein functioning in DNA repair (49). The E(g) values for S9 in the four fast-growing bacteria studied here are all >1.50, particularly ≥1.90 in E. coli and V. cholerae, significantly higher than the average ribosomal protein E(g) value. The L25 ribosomal protein (93 aa in E. coli) is homologous to the general stress protein (Ctc). This protein achieves the very high E(g) values of 1.90 in E. coli and 1.89 in D. radiodurans. Ctc is PHX also in C. trachomatis, Campylobacter jejuni, H. pylori, T. maritima, and A. aeolicus, none of which carries the L25 gene. In contrast, Ctc is absent in E. coli, V. cholerae, and H. influenzae, but their genomes encode the L25 ribosomal protein. In almost all genomes, the Ctc and L25 protein genes are mutually exclusive. The large ribosomal protein S1 gene is almost always among the top levels of eubacterial PHX genes. We conjecture that the S1 protein (generally ≥500 aa) possesses multifunctional activity yet to be determined. Interestingly, the S1 protein is composed of repetitions of an 86-aa element, usually involving six or more copies.

Other multifunctional PHX proteins from many genomes include glyceraldehyde-3-phosphate dehydrogenase, acting primarily in the first step of the second phase of glycolysis. This protein is very promiscuous, showing uracil DNA glycosylase activity, and binds to tRNA and DNA and to proteins with glutamine repeats. In eukaryotes, it also structurally binds filaments of actin and microtubules (39).

The elongation factor EF-1α is an essential component of the translation apparatus and also has a major function in severing microtubules (38). Phosphoglycerate kinase also functions as a disulfide reductase (25). Many different metabolic proteins serve as crystalline components for the lenses of different animal eyes; these include PPIases, aldehyde dehydrogenase, arginosuccinate lyase, enolase, and aldose reductase.

Contrasts in PHX levels among genes involved in energy metabolism in E. coli and B. subtilis.

As indicated earlier, certain genes of energy metabolism are predominantly PHX in all four fast-growing bacteria and have high expression levels [E(g), often >2.00]. This is manifestly valid (Table 9) for glycolysis genes and for genes of pyruvate oxidation. Why should most of the TCA genes of E. coli be PHX but not those of B. subtilis? We suggest four possible contributing causes. (i) Perhaps B. subtilis makes less use of the TCA cycle for ATP production than E. coli. The principal glycolysis genes of B. subtilis, unlike those of E. coli (dispersed all over the E. coli genome), are encoded from a single cluster (gap, pgk, tpi, pgm, and eno); see our earlier discussion of PHX clusters. (ii) The TCA cycle has at least two main tasks: the first, aerobic energy (ATP) production, and the second, synthesis of carbon chain precursors to various essential metabolites, such as amino acids. Can many of these precursors be more easily acquired by other means in B. subtilis? B. subtilis, in marked contrast to E. coli, has four PHX flagellin genes (flagellin [hag], flagellar hook protein [flgE], flagellar hook basal body [fliE], and flagellin homolog [yvzB]), whereas a single flagellin gene of E. coli is PHX (21). Moreover, flagellar genes are strictly regulated and inducible in E. coli but constitutive in B. subtilis (40). Assuming that soil is the primary B. subtilis habitat and that the human gut is the primary habitat for E. coli, different metabolic patterns may be appropriate. The swimming movements of B. subtilis mediated by its PHX flagellar proteins may facilitate the acquisition of nutrients, such as amino acids, from an assortment of soil sources. B. subtilis also excretes many digestive enzymes in gathering macromolecular nutrients for possible predatory objectives (1). (iii) There are also differences between E. coli and B. subtilis in energy pathways, which can influence expression levels. For example, E. coli uses succinyl-CoA as a precursor in the biosynthesis of lysine and methionine, whereas B. subtilis uses acetyl-CoA for this objective. E. coli possesses isocitrate lyase (AceA), the first enzyme of the glyoxylate shunt pathway, in competition with isocitrate dehydrogenase; the shunt is very effective for acquiring a net carbon gain in the metabolism of fatty acids, whereas B. subtilis lacks AceA. The early genes in the TCA cycle of B. subtilis, those for aconitase and isocitrate dehydrogenase, are PHX, whereas the remaining genes are only predicted moderately expressed. Apparently, the order of TCA genes can be important. (iv) B. subtilis and E. coli are both facultative aerobic organisms (29). For anaerobic respiration, B. subtilis relies exclusively on nitrate or nitrite as its terminal electron acceptor, whereas E. coli has many alternative acceptors.

Highly expressed genes under varying conditions.

Can our methods be applied in conjunction with microarray analysis? We cannot change the codon usage of a given gene, but we can change the gene class standards for discerning expression levels relative to these gene classes (see Materials and Methods). Here, the gene class standards are RP, TF, and CH. It is hypothesized that similarity of codon usages, as characterized in Materials and Methods, for two or more natural gene classes may identify new genes with similar properties, as in the defining gene classes. Effectively, codon usage patterns provide a means to correlate genes and functional categories (20). By using several gene classes as standards, a figure corresponding to Fig. 1 but in multiple dimensions, when coupled to a suitable clustering analysis, may discriminate additional genes highly expressed relative to the different gene class standards. For example, when we compare codon usages of genes with respect to the B. subtilis sporulation genes versus the class of all genes, the two coordinates plot a straight line. In another example, yeast mitochondrial genes feature a melange of PHX genes, putative alien genes, and average genes, and the genes for the ribosomal proteins functioning in the mitochondrion tend to show codon usages akin to average genes.


We thank G. Miklos, F. Neidhardt, A. L. Sonenshein, and A. Spormann for valuable discussions on the manuscript.

This work was supported in part by NIH grants 5R01GM10452-35 and 5R01HG00335-12.


1. Aizawa, S.-I., I. B. Zhulin, L. Marquez-Magana, and G. W. Ordal. Chemotaxis and motility in Bacillus subtilis. In A. L. Sonenshein (ed.), Bacillus subtilis, 2nd ed., in press. ASM Press, Washington, D.C.
2. Alén C, Sonenshein A L. Bacillus subtilis aconitase is an RNA-binding protein. Proc Natl Acad Sci USA. 1999;96:10412–10417.
3. Antelmann H, Bernhardt J, Schmid R, Mach H, Volker U, Hecker M. First steps from two-dimensional protein index towards a response regulation map for Bacillus subtilis. Electrophoresis. 1997;18:1451–1463.
4. Bocharov E V, Gudkov A T, Budovskaya E V, Arseniev A S. Conformational independence of N- and C-domains in ribosomal protein L7/L12 and in the complex with protein L10. FEBS Lett. 1998;423:347–350.
5. Brocchieri L, Karlin S. A symmetric-iterated multiple alignment of protein sequences. J Mol Biol. 1998;276:249–264.
6. Chen G T, Inouye M. Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev. 1994;8:2641–2652.
7. Dai Y, Wang C. A mutant truncated protein disulfide isomerase with no chaperone activity. J Biol Chem. 1997;272:27572–27576.
8. Deuerling E, Schulze-Specking A, Tomoyasu T, Mogk A, Bukau B. Trigger factor and DnaK cooperate in folding of newly synthesized proteins. Nature. 1999;400:693–696.
9. Eisen J A, Hanawalt P C. A phylogenomic study of DNA repair genes, proteins, and processes. Mutat Res. 1999;435:171–213.
10. Eyre-Walker A. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy. Mol Biol Evol. 1996;13:864–872.
11. Futcher B, Latter G I, Monardo P, McLaughlin C S, Garrels J I. A sampling of the yeast proteome. Mol Cell Biol. 1999;19:7357–7368.
12. Gonzalez J C, Peariso K, Penner-Hahn J E, Matthews R G. Cobalamin-independent methionine synthase from Escherichia coli: a zinc metalloenzyme. Biochemistry. 1996;35:12228–12234.
13. Gygi S P, Rochon Y, Franza B R, Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999;19:1720–1730.
14. Handy J, Doolittle R F. An attempt to pinpoint the phylogenetic introduction of glutaminyl-tRNA synthetase among bacteria. J Mol Evol. 1999;49:709–715.
15. Herrmann J M, Malkus P, Schekman R. Out of the ER: outfitters, escorts and guides. Trends Cell Biol. 1999;9:5–7.
16. Irwin B, Heck J D, Hatfield G W. Codon pair utilization biases influence translational elongation step times. J Biol Chem. 1995;270:22801–22806.
17. Jones C H, Danese P N, Pinkner J S, Silhavy T J, Hultgren S J. The chaperone-assisted membrane release and folding pathway is sensed by two signal transduction systems. EMBO J. 1997;16:6394–6406.
18. Karlin S, Macken C. Some statistical problems in the assessment of inhomogeneities of DNA sequence data. J Am Stat Assoc. 1991;86:27–35.
19. Karlin S, Campbell A M, Mrázek J. Comparative DNA analysis across diverse genomes. Annu Rev Genet. 1998;32:185–225.
20. Karlin S, Mrázek J, Campbell A M. Codon usages in different gene classes of the Escherichia coli genome. Mol Microbiol. 1998;29:1341–1355.
21. Karlin S, Mrázek J. Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol. 2000;182:5238–5250.
22. Karlin S, Mrázek J. Predicted highly expressed and putative alien genes of Deinococcus radiodurans and implications for resistance to ionizing radiation damage. Proc Natl Acad Sci USA. 2001;98:5240–5245.
23. Ladenstein R, Schneider M, Huber R, Bartunik H D, Wilson K, Schott K, Bacher A. Heavy riboflavin synthase from Bacillus subtilis. Crystal structure analysis of the icosahedral beta 60 capsid at 3.3 A resolution. J Mol Biol. 1988;203:1045–1070.
24. Lawrence J G, Roth J R. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics. 1996;143:1843–1860.
25. Lay A J, Jiang X M, Kisker O, Flynn E, Underwood A, Condron R, Hogg P J. Phosphoglycerate kinase acts in tumour angiogenesis as a disulphide reductase. Nature. 2000;408:869–873.
26. Matin A, Baetens M, Pandza S, Park C H, Waggoner S. Survival strategies in stationary phase. In: Rosenberg E, editor. Microbial ecology and infectious diseases. Washington, D.C.: American Society for Microbiology; 1999. pp. 32–48.
27. Miczak A, Kaberdin V R, Wei C L, LinChao S. Proteins associated with RNase E in a multicomponent ribonucleolytic complex. Proc Natl Acad Sci USA. 1996;93:3865–3869.
28. Mrázek J, Bhaya D, Grossman A R, Karlin S. Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Res. 2001;29:1590–1601.
29. Nakano M M, Zuber P. Anaerobic growth of a “strict aerobe” (Bacillus subtilis). Annu Rev Microbiol. 1998;52:165–190.
30. Netzer W J, Hartl F U. Protein folding in the cytosol: chaperonin-dependent and -independent mechanisms. Trends Biochem Sci. 1998;23:68–73.
31. Oberto J, Bonnefoy E, Mouray E, Pellegrini O, Wikstrom P M, Rouviere-Yaniv J. The Escherichia coli ribosomal protein S16 is an endonuclease. Mol Microbiol. 1996;19:1319–1330.
32. Raivio T L, Popkin D L, Silhavy T J. The Cpx envelope stress response is controlled by amplification and feedback inhibition. J Bacteriol. 1999;181:5263–5272.
33. Richmond C S, Glasner J D, Mau R, Jin H, Blattner F R. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 1999;27:3821–3835.
34. Rocha E P, Danchin A, Viari A. Translation in Bacillus subtilis: roles and trends of initiation and termination, insights from a genome analysis. Nucleic Acids Res. 1999;27:3567–3576.
35. Rolland N, Janosi L, Block M A, Shuda M, Teyssier E, Miege C, Cheniclet C, Carde J P, Kaji A, Joyard J. Plant ribosome recycling factor homologue is a chloroplastic protein and is bactericidal in Escherichia coli carrying temperature-sensitive ribosome recycling factor. Proc Natl Acad Sci USA. 1999;96:5464–5469.
36. Rouault T A, Klausner R D. Iron-sulfur clusters as biosensors of oxidants and iron. Trends Biochem Sci. 1996;21:174–177.
37. Selmer M, Al-Karadaghi S, Hirokawa G, Kaji A, Liljas A. Crystal structure of Thermotoga maritima ribosome recycling factor: a tRNA mimic. Science. 1999;286:2349–2352.
38. Shiina N, Gotoh Y, Kubomura N, Iwamatsu A, Nishida E. Microtubule severing by elongation factor 1 alpha. Science. 1994;266:282–285.
39. Sirover M A. New insights into an old protein: the functional diversity of mammalian glyceraldehyde-3-phosphate dehydrogenase. Biochim Biophys Acta. 1999;1432:159–184.
40. Soutourina O, Kolb A, Krin E, Laurent-Winter C, Rimsky S, Danchin A, Bertin P. Multiple control of flagellum biosynthesis in Escherichia coli: role of H-NS protein and the cyclic AMP-catabolite activator protein complex in transcription of the flhDC master operon. J Bacteriol. 1999;181:7500–7508.
41. Takusagawa F, Kamitori S, Misaki S, Markham G D. Crystal structure of S-adenosylmethionine synthetase. J Biol Chem. 1996;271:136–147.
42. Tao H, Bausch C, Richmond C, Blattner F R, Conway T. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol. 1999;181:6425–6440.
43. Teter S A, Houry W A, Ang D, Tradler T, Rockabrand D, Fischer G, Blum P, Georgopoulos C, Hartl F U. Polypeptide flux through bacterial Hsp70: DnaK cooperates with trigger factor in chaperoning nascent chains. Cell. 1999;97:755–765.
44. Thanaraj T A, Argos P. Ribosome-mediated translational pause and protein domain organization. Protein Sci. 1996;5:1594–1612.
45. VanBogelen R A, Abshire K Z, Pertsemlidis A, Clark R L, Neidhardt F C. Gene-protein database of Escherichia coli K-12, edition 6. In: Neidhardt F C, Curtiss III R, Ingraham J L, Lin E C C, Low K B, Magasanik B, Reznikoff W S, Riley M, Schaechter M, Umbarger H E, editors. Escherichia coli and Salmonella: cellular and molecular biology. 2nd ed. Washington, D.C.: ASM Press; 1996. pp. 2067–2117.
46. VanBogelen R A, Schiller E E, Thomas J D, Neidhardt F C. Diagnosis of cellular states of microbial organisms using proteomics. Electrophoresis. 1999;20:2149–2159.
47. Wang C C, Tsou C L. Protein disulfide isomerase is both an enzyme and a chaperone. FASEB J. 1993;7:1515–1517.
48. Warner J R. The economics of ribosome biosynthesis in yeast. Trends Biochem Sci. 1999;24:437–440.
49. Woodgate R, Rajagopalan M, Lu C, Echols H. UmuC mutagenesis protein of Escherichia coli: purification and interaction with UmuD and UmuD′. Proc Natl Acad Sci USA. 1989;86:7301–7305.
50. Wool I G. Extraribosomal functions of ribosomal proteins. Trends Biochem Sci. 1996;21:164–165.
51. Yates J R, III. Mass spectrometry. From genomics to proteomics. Trends Genet. 2000;16:5–8.
52. Zahn K. Overexpression of an mRNA dependent on rare codons inhibits protein synthesis and cell growth. J Bacteriol. 1996;178:2926–2933.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)