An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments

Nat Biotechnol. 2002 Aug;20(8):835-9. doi: 10.1038/nbt717. Epub 2002 Jul 8.

Abstract

Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has become a popular procedure for studying genome-wide protein-DNA interactions and transcription regulation. However, it can only map probable protein-DNA interaction loci at a resolution of 1-2 kilobases. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP-array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP-array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
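The two strategies the abstract combines can be illustrated with a toy sketch. This is not the MDscan implementation. The only idea taken from the abstract is that word enumeration on the top-ranked ChIP-array sequences seeds position-specific weight matrices (PWMs), which are then updated. Everything else is a labeled assumption: the mismatch-based seed expansion, the scoring function `motif_score`, the acceptance rule in `refine` (which here rescans all sequences, including lower-ranked ones), all function names, and the synthetic toy sequences.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: sequences are uppercase and contain only A/C/G/T


def words(seq, w):
    """All length-w substrings (words) of seq, left to right."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def hamming(a, b):
    """Number of mismatched positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def background(seqs):
    """0th-order base frequencies over all sequences, with +1 pseudocounts."""
    counts = Counter("".join(seqs))
    total = sum(counts[b] for b in BASES) + 4
    return {b: (counts[b] + 1) / total for b in BASES}


def build_pwm(sites, pseudocount=0.5):
    """PWM as a list of per-column base probabilities from aligned sites."""
    pwm = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        total = len(sites) + 4 * pseudocount
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def llr(pwm, word, bg):
    """Log-likelihood ratio of a word under the PWM versus the background."""
    return sum(math.log(col[b] / bg[b]) for col, b in zip(pwm, word))


def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)


def motif_score(pwm, sites, bg):
    """Mean site LLR scaled by log(#sites), so strong AND numerous sites help.
    Hypothetical scoring choice, not MDscan's published scoring function."""
    mean = sum(llr(pwm, s, bg) for s in sites) / len(sites)
    return math.log(len(sites)) * mean


def seed_motifs(top_seqs, w, max_mm, bg, n_keep=3):
    """Word enumeration on the top-ranked sequences only: every word seeds a
    PWM built from all words there within max_mm mismatches of it."""
    pool = [x for s in top_seqs for x in words(s, w)]
    scored = []
    for seed in set(pool):
        sites = [x for x in pool if hamming(x, seed) <= max_mm]
        pwm = build_pwm(sites)
        scored.append((motif_score(pwm, sites, bg), seed, pwm))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first; ties broken by seed
    return scored[:n_keep]


def refine(pwm, seqs, bg, frac=0.5, rounds=5):
    """PWM updating: keep each sequence's best word if its LLR reaches
    frac * (best attainable LLR), rebuild the PWM, repeat until stable."""
    sites = []
    for _ in range(rounds):
        best_possible = sum(max(math.log(col[b] / bg[b]) for b in BASES)
                            for col in pwm)
        new_sites = []
        for s in seqs:
            if len(s) < len(pwm):
                continue
            best = max(words(s, len(pwm)), key=lambda x: llr(pwm, x, bg))
            if llr(pwm, best, bg) >= frac * best_possible:
                new_sites.append(best)
        if not new_sites or new_sites == sites:
            break  # nothing accepted, or the site set has converged
        sites = new_sites
        pwm = build_pwm(sites)
    return pwm, sites


if __name__ == "__main__":
    # Hypothetical toy data, ranked by ChIP enrichment (most enriched first).
    # The top four carry a planted TGAAACA; the fifth has a 1-mismatch variant.
    top = ["ACGTGAAACAGTC", "CGTTGAAACATCA", "GTATGAAACACAG", "TACTGAAACAAGT"]
    rest = ["CTGCTGAAACTGC", "GCTTGCCGTCGTC"]
    bg = background(top + rest)
    score, seed, pwm = seed_motifs(top, w=7, max_mm=1, bg=bg)[0]
    pwm, sites = refine(pwm, top + rest, bg)
    print(seed, consensus(pwm), sites)
```

Restricting seed enumeration to the top-ranked sequences keeps the word pool small and enriched for true sites, which is one plausible reading of how ranking information could speed up the search. The sketch scans only the forward strand; a practical tool would also consider reverse complements.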

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Binding Sites
  • Chromatin / genetics
  • Chromatin / metabolism*
  • Computational Biology / methods
  • DNA / genetics*
  • DNA / metabolism*
  • DNA-Binding Proteins / metabolism*
  • Gene Expression Regulation
  • Genes, Fungal / genetics
  • Internet
  • Oligonucleotide Array Sequence Analysis / methods*
  • Precipitin Tests / methods*
  • Protein Binding
  • Response Elements / genetics
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / metabolism
  • Sensitivity and Specificity
  • Shelterin Complex
  • Software
  • Telomere / genetics
  • Telomere / metabolism
  • Telomere-Binding Proteins / metabolism
  • Time Factors
  • Transcription Factors / metabolism

Substances

  • Chromatin
  • DNA-Binding Proteins
  • RAP1 protein, S cerevisiae
  • Saccharomyces cerevisiae Proteins
  • Shelterin Complex
  • Telomere-Binding Proteins
  • Transcription Factors
  • DNA