A new approach for the analysis of bacterial microarray-based Comparative Genomic Hybridization: insights from an empirical study

BMC Genomics. 2005 May 27:6:78. doi: 10.1186/1471-2164-6-78.

Abstract

Background: Microarray-based Comparative Genomic Hybridization (M-CGH) has been used to characterize the extensive intraspecies genetic diversity found in bacteria at the whole-genome level. Although conventional microarray analytical procedures have proved adequate in handling M-CGH data, data interpretation using these methods is based on a continuous character model in which gene divergence and gene absence form a spectrum of decreasing gene conservation levels. However, whereas gene divergence may still be accompanied by retention of gene function, gene absence invariably leads to loss of function. Ignoring this distinction reduces the information that can be gained from M-CGH data. We present here results from experiments in which two genome-sequenced strains of Campylobacter jejuni were compared against each other using M-CGH. Because the gene content of both strains was known a priori, we were able to closely examine the effects of sequence divergence and gene absence on M-CGH data in order to define analytical parameters for M-CGH data interpretation. These parameters should facilitate the examination of the relative effects of sequence divergence and gene absence in comparative genomics analyses of multiple strains of any species for which genome sequence data and a DNA microarray are available.

Results: As a first step towards improving the analysis of M-CGH data, we estimated the degree of experimental error in a series of experiments in which identical samples were compared against each other by M-CGH. This variance estimate was used to validate a Log Ratio-based methodology for identification of outliers in M-CGH data. We then compared the two genome-sequenced strains by M-CGH to examine the effect of probe/target identity on the Log Ratios of signal intensities, using prior knowledge of gene divergence and gene absence to establish Log Ratio thresholds for the identification of absent and conserved genes.

Conclusion: The results from this empirical study validate the Log Ratio thresholds that have been used in other studies to establish gene divergence/absence. Moreover, the analytical framework presented here enhances the information content derived from M-CGH data by shifting the focus from divergent/absent gene detection to accurate detection of conserved and absent genes. This approach closely aligns the technical limitations of M-CGH analysis with practical limitations on the biological interpretation of comparative genomics data.
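
The Log Ratio workflow outlined in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the multiplier k, and the absent_cutoff value are hypothetical placeholders, since the abstract does not state the thresholds that were derived. The sketch assumes experimental error is summarized as the standard deviation of Log Ratios from self-versus-self hybridizations, that genes within k standard deviations of zero are called conserved, and that genes at or below a strongly negative cutoff are called absent.

```python
import math
import statistics


def log_ratios(test, reference):
    """Return log2(test / reference) for paired signal intensities."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def error_sd(self_self_ratios):
    """Sample standard deviation of Log Ratios from a self-vs-self hybridization."""
    return statistics.stdev(self_self_ratios)


def classify(ratio, sd, k=2.0, absent_cutoff=-3.0):
    """Label one gene from its Log Ratio.

    k and absent_cutoff are illustrative assumptions, not values from the study.
    """
    if abs(ratio) <= k * sd:
        return "conserved"   # within the experimental error band
    if ratio <= absent_cutoff:
        return "absent"      # strongly reduced signal in the test strain
    return "uncertain"       # outlier that is neither clearly conserved nor absent


# Made-up self-vs-self Log Ratios used only to estimate the error band.
sd = error_sd([0.1, -0.1, 0.2, -0.2])

# Made-up intensities for three genes (test strain vs reference strain).
ratios = log_ratios([1000, 60, 450], [1000, 1000, 1000])
calls = [classify(r, sd) for r in ratios]
```

Keeping an explicit "uncertain" class reflects the emphasis in the Conclusion on confident calls of conserved and absent genes rather than on separating divergent from absent genes.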

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Campylobacter jejuni / metabolism
  • Chromosome Mapping
  • Conserved Sequence
  • Gene Expression Profiling / methods
  • Genes, Bacterial*
  • Genome
  • Genome, Bacterial
  • Genomics
  • Nucleic Acid Hybridization
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sequence Analysis, DNA