Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome

Gene. 2001 Oct 3;276(1-2):89-99. doi: 10.1016/s0378-1119(01)00673-4.

Abstract

As the amount of available DNA sequence data grows, it has become increasingly important to develop tools for comprehensive, systematic analysis and comparison of species-specific characteristics of protein-coding sequences across a wide variety of genomes. In the present study, we used a novel neural-network algorithm, a self-organizing map (SOM), to efficiently and comprehensively analyze codon usage in approximately 60,000 genes from 29 bacterial species simultaneously. This SOM makes it possible to cluster and visualize the genes of individual species separately, at a much higher resolution than can be obtained with principal component analysis. The organization of the SOM can be explained by the genome G+C% and tRNA compositions of the individual species. We used the SOM to examine codon usage heterogeneity in the E. coli O157 genome, which contains 'O157-unique segments' (O-islands), and showed that the SOM is a powerful tool for characterizing horizontally transferred genes.
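
To make the approach concrete, the following is a minimal sketch of how codon usage vectors might be fed to a self-organizing map. It is not the implementation used in the study: the abstract does not specify the codon usage measure, map size, initialization, or learning schedule, so everything here is an assumption. In particular, each gene is represented by its relative frequencies over all 64 codons, and the map is a small rectangular grid trained with the generic online Kohonen update. The names `codon_frequencies`, `best_matching_unit`, and `train_som` are hypothetical helpers introduced only for illustration.

```python
import itertools
import math
import random

# All 64 codons in a fixed order, so every gene maps to the same vector layout.
CODONS = ["".join(c) for c in itertools.product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_frequencies(seq):
    """Return a 64-element list of relative codon frequencies for one gene.

    Assumption: plain relative frequencies; the study may have used a
    different codon usage measure.
    """
    counts = [0.0] * 64
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip codons containing ambiguous bases
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with the online Kohonen update.

    All hyperparameters are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights, scaled to the range of frequency vectors.
    weights = [[[rng.random() / dim for _ in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for k in order:
            x = data[k]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood around the best-matching node.
                    dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-dist2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for j in range(dim):
                        w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


# Toy usage with two made-up sequences (not real genes).
genes = ["ATGAAAAAAAAGTAA", "ATGGCCGCGGCCTGA"]
data = [codon_frequencies(g) for g in genes]
som = train_som(data, rows=3, cols=3, epochs=10)
print([best_matching_unit(som, x) for x in data])
```

In a sketch like this, genes with similar codon usage tend to land on the same or neighbouring nodes, which is the property the abstract relies on when separating species and flagging genes whose codon usage looks atypical for their host genome.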

MeSH terms

  • Algorithms*
  • Base Composition
  • Classification / methods
  • Codon / genetics*
  • Escherichia coli O157 / genetics
  • GC Rich Sequence / genetics
  • Gene Transfer, Horizontal
  • Genes, Bacterial / genetics*
  • Genetic Variation
  • Genome, Bacterial
  • Neural Networks, Computer*
  • Species Specificity

Substances

  • Codon