A flexible and powerful Bayesian hierarchical model for ChIP-Chip experiments

Biometrics. 2008 Jun;64(2):468-78. doi: 10.1111/j.1541-0420.2007.00899.x. Epub 2007 Sep 20.

Abstract

Chromatin-immunoprecipitation microarrays (ChIP-chip), which enable researchers to identify regions of a given genome that are bound by specific DNA-binding proteins, present new challenges for statistical analysis because of the large number of probes, the high noise-to-signal ratio, and the spatial dependence between probes. We propose a method called BAC (Bayesian analysis of ChIP-chip) to detect transcription factor bound regions. BAC incorporates the dependence between probes while making few assumptions about the bound regions (e.g., their length). BAC is robust to probe outliers because it uses an exchangeable prior for the variances, which allows different variances for the probes but still shrinks extreme empirical variances. Parameter estimation is carried out using Markov chain Monte Carlo, and inference is based on the joint distribution of the parameters. Bound regions are detected using posterior probabilities computed from the joint posterior distribution of neighboring probes. We show that these posterior probabilities are well calibrated and can be used to obtain an estimate of the false discovery rate. The method is illustrated using two publicly available ChIP-chip data sets containing 18 experimentally validated regions. We compare our method to four other baseline and commonly used techniques, namely Wilcoxon's rank sum test, TileMap, HGMM, and MAT. We found that BAC and HGMM perform best at detecting validated regions. However, HGMM appears to be very sensitive to probe outliers compared to BAC. In addition, we present a simulation study, which shows that BAC is more powerful than the other four techniques under various simulation scenarios while being robust to model misspecification.
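
The abstract states that calibrated posterior probabilities can be used to estimate the false discovery rate but does not say how. One common way to do this, shown here as a minimal sketch and not necessarily the estimator used in the paper, is to average the posterior probability of being unbound (1 - p) over the regions that are called bound. The function names, the threshold convention, and the example probabilities are all hypothetical.

```python
import numpy as np

def estimated_fdr(posterior_probs, threshold):
    """Estimate the FDR among regions called at a posterior probability cutoff.

    Assumes each entry is a calibrated posterior probability that the
    region is bound, so 1 - p is the probability that a call is false.
    """
    p = np.asarray(posterior_probs, dtype=float)
    called = p >= threshold
    if not called.any():
        return 0.0  # nothing called, so no false discoveries
    return float(np.mean(1.0 - p[called]))

def threshold_for_fdr(posterior_probs, target_fdr):
    """Return the lowest cutoff whose estimated FDR stays within target_fdr.

    Returns None when even the single most confident call exceeds the target.
    """
    p = np.sort(np.asarray(posterior_probs, dtype=float))[::-1]  # descending
    # Running mean of (1 - p) over the top-k regions; nondecreasing in k.
    running_fdr = np.cumsum(1.0 - p) / np.arange(1, p.size + 1)
    ok = np.nonzero(running_fdr <= target_fdr)[0]
    if ok.size == 0:
        return None
    return float(p[ok[-1]])

# Hypothetical posterior probabilities for five candidate regions.
probs = [0.99, 0.95, 0.90, 0.60, 0.20]
fdr_at_090 = estimated_fdr(probs, 0.90)
cutoff_10pct = threshold_for_fdr(probs, 0.10)
```

The estimate is only as good as the calibration of the posterior probabilities, which is why the abstract's calibration claim matters for this use.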

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem*
  • Chromatin Immunoprecipitation / methods*
  • Chromosome Mapping
  • Computer Simulation
  • DNA-Binding Proteins / genetics*
  • Data Interpretation, Statistical
  • Microarray Analysis / methods*
  • Models, Genetic*
  • Pattern Recognition, Automated / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA-Binding Proteins