J Bacteriol. Jun 1997; 179(12): 3899–3913.
PMCID: PMC179198

Compositional biases of bacterial genomes and evolutionary implications.


We compare and contrast genome-wide compositional biases and distributions of short oligonucleotides across 15 diverse prokaryotes that have substantial genomic sequence collections. These include seven complete genomes (Escherichia coli, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Synechocystis sp. strain PCC6803, Methanococcus jannaschii, and Pyrobaculum aerophilum). A key observation concerns the constancy of the dinucleotide relative abundance profiles over multiple disjoint 50-kb contigs within the same genome. (The profile is $\rho^*_{XY} = f^*_{XY} / (f^*_X f^*_Y)$ for all dinucleotides XY, where $f^*_X$ denotes the frequency of the nucleotide X and $f^*_{XY}$ denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complementary sequence.) On the basis of this constancy, we refer to the collection $\{\rho^*_{XY}\}$ as the genome signature. We establish that the differences between the $\{\rho^*_{XY}\}$ vectors of 50-kb sample contigs from different genomes virtually always exceed the differences between contigs from the same genome. Various di- and tetranucleotide biases are identified. In particular, we find that the dinucleotide CpG = CG is underrepresented in many thermophiles (e.g., M. jannaschii, Sulfolobus sp., and M. thermoautotrophicum) but overrepresented in halobacteria. TA is broadly underrepresented in prokaryotes and eukaryotes, but normal counts appear in Sulfolobus and P. aerophilum sequences. Palindromic tetranucleotides are more strongly underrepresented in H. influenzae than in any other bacterial genome. The M. jannaschii sequence is unprecedented in its extreme underrepresentation of CTAG tetranucleotides and in the anomalous distribution of CTAG sites around the genome. Comparing the numbers of long tetranucleotide microsatellites across genomes distinguishes H. influenzae. Dinucleotide relative abundance differences between bacterial sequences are also compared.
For example, in these assessments of differences, the cyanobacteria Synechocystis, Synechococcus, and Anabaena do not form a coherent group and are as far from each other as general gram-negative sequences are from general gram-positive sequences. The difference of M. jannaschii from low-G+C gram-positive sequences is one-half of its difference from gram-negative proteobacteria. Interpretations and hypotheses center on the role of the genome signature in highlighting similarities and dissimilarities across different classes of prokaryotic species, possible mechanisms underlying the genome signature, the form and level of genome compositional flux, the use of the genome signature as a chronometer of molecular phylogeny, and implications for the three putative domains of life (eubacterial, archaeal, and eukaryotic) and for the origin and early evolution of eukaryotes.
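The genome signature and the within- versus between-genome comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' code, and it rests on several assumptions: the two strands are counted separately and pooled (equivalent to concatenating the sequence with its reverse complement, minus the one artificial junction dinucleotide); non-ACGT symbols are skipped; and the difference between two signature vectors, which the abstract does not define, is assumed to be $\delta^*(f,g) = \frac{1}{16}\sum_{XY} |\rho^*_{XY}(f) - \rho^*_{XY}(g)|$, the average absolute difference over the 16 dinucleotides. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from statistics import mean

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT
_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Inverted complementary sequence (non-ACGT symbols are left unchanged)."""
    return seq.translate(_COMP)[::-1]


def genome_signature(seq: str) -> dict[str, float]:
    """rho*_XY = f*_XY / (f*_X * f*_Y), frequencies pooled over seq and its reverse complement."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        # Count each strand on its own so no spurious dinucleotide spans the junction.
        mono.update(b for b in strand if b in BASES)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1)
                  if strand[i] in BASES and strand[i + 1] in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0 or any(mono[b] == 0 for b in BASES):
        raise ValueError("need at least one dinucleotide and all four nucleotides")
    f = {b: mono[b] / n_mono for b in BASES}
    return {xy: (di[xy] / n_di) / (f[xy[0]] * f[xy[1]]) for xy in DINUCS}


def delta_star(sig_f: dict[str, float], sig_g: dict[str, float]) -> float:
    """ASSUMED difference measure: mean |rho*_XY(f) - rho*_XY(g)| over the 16 dinucleotides."""
    return sum(abs(sig_f[xy] - sig_g[xy]) for xy in DINUCS) / len(DINUCS)


def disjoint_contigs(seq: str, size: int = 50_000) -> list[str]:
    """Consecutive non-overlapping windows of `size` bases; a shorter tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def within_genome(seq: str, size: int = 50_000) -> float:
    """Mean delta* over all pairs of disjoint contigs of one genome (needs >= 2 contigs)."""
    sigs = [genome_signature(c) for c in disjoint_contigs(seq, size)]
    return mean(delta_star(a, b) for a, b in combinations(sigs, 2))


def between_genomes(seq_f: str, seq_g: str, size: int = 50_000) -> float:
    """Mean delta* over all pairs taking one contig from each genome."""
    sigs_f = [genome_signature(c) for c in disjoint_contigs(seq_f, size)]
    sigs_g = [genome_signature(c) for c in disjoint_contigs(seq_g, size)]
    return mean(delta_star(a, b) for a, b in product(sigs_f, sigs_g))
```

Under these assumptions, the abstract's central claim corresponds to `between_genomes(g1, g2)` virtually always exceeding `within_genome(g1)` and `within_genome(g2)` for 50-kb windows. A group-level statement such as "one-half of the difference" would be a ratio of two such averages, computed after pooling the contigs of each group. Because both strands are pooled, every signature satisfies $\rho^*_{XY} = \rho^*_{\overline{YX}}$ (for example, $\rho^*_{AC} = \rho^*_{GT}$), which is why only 10 of the 16 values are independent.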



Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)

