Proc Natl Acad Sci U S A. 1989 Feb; 86(4): 1183–1187.
PMCID: PMC286650

Identifying protein-binding sites from unaligned DNA fragments.


The ability to determine important features within DNA sequences from the sequences alone is becoming essential as large-scale sequencing projects are being undertaken. We present a method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments, each known to contain somewhere within it a binding site for that protein. Information about the position or orientation of the binding sites within those fragments is not needed. The method compares the "information content" of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern. The specificity of the protein is represented as a matrix, rather than a consensus sequence, allowing patterns that are typical of regulatory protein-binding sites to be identified. The reliability of the method improves as the number of sequences increases, but the time required increases only linearly with the number of sequences. An example, using known cAMP receptor protein-binding sites, illustrates the method.
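As an illustration of scoring one candidate alignment by its "information content", the following minimal Python sketch builds a base-frequency matrix from equal-length aligned sites and sums, over positions, the relative entropy against a background base distribution. It is not the authors' implementation. The function names, the uniform background of 0.25 per base, and the toy sites are assumptions made for illustration, and no small-sample correction is applied.

```python
import math
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sites):
    """Return one {base: frequency} dict per position of the aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        matrix.append({b: counts[b] / len(sites) for b in BASES})
    return matrix


def information_content(matrix, background=None):
    """Sum over positions of f * log2(f / p), in bits.

    `background` maps each base to its assumed genomic frequency p;
    a uniform 0.25 is used when none is given (an assumption).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = 0.0
    for column in matrix:
        for b in BASES:
            f = column[b]
            if f > 0:  # treat 0 * log(0) as 0
                total += f * math.log2(f / background[b])
    return total


# Hypothetical toy alignments, not data from the paper.
conserved = ["ACGT", "ACGT", "ACGT"]
scrambled = ["AAAA", "CCCC", "GGGG", "TTTT"]

print(information_content(frequency_matrix(conserved)))
print(information_content(frequency_matrix(scrambled)))
```

With a uniform background, a fully conserved position contributes 2 bits and a position where all four bases are equally frequent contributes 0 bits, so alignments that line up the true binding sites tend to score higher than misaligned ones. A search procedure such as the one the abstract describes would compare this score across many candidate alignments.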



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
