Protein Sci. Aug 1995; 4(8): 1587–1595.
PMCID: PMC2143188

Finding flexible patterns in unaligned protein sequences.


We present a new method for identifying conserved patterns in a set of unaligned, related protein sequences. It can discover patterns of a quite general form, allowing for both ambiguous positions and variable-length wildcard regions. The user defines a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class that score highest according to a defined significance measure. Identified patterns may be refined using one of two new algorithms. We also present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
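
To make the pattern class concrete, the following minimal sketch shows what a flexible pattern with an ambiguous position and a variable-length wildcard region looks like, and how one might count the sequences it matches. It is not the discovery algorithm or the significance measure described in the abstract. The function names `prosite_to_regex` and `support`, the example pattern, and the toy sequences are all hypothetical, and only a small subset of PROSITE-style syntax is handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported elements (an assumption of this sketch, not the full syntax):
      A        a single residue
      [DE]     an ambiguous position (any one of the listed residues)
      x        a wildcard position
      e(n)     element e repeated exactly n times
      e(n,m)   element e repeated between n and m times
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core_re = "."                      # wildcard: any residue
        elif re.fullmatch(r"\[[A-Z]+\]", core):
            core_re = core                     # ambiguous position
        elif re.fullmatch(r"[A-Z]", core):
            core_re = core                     # fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core_re + repeat)
    return "".join(parts)

def support(pattern: str, sequences: list[str]) -> int:
    """Count the sequences that contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))

if __name__ == "__main__":
    # Toy, made-up sequences; not taken from any real protein family.
    toy_sequences = ["MKCAAEHLL", "CGGGGDHW", "AAACTEH", "CAAAAAEH"]
    pattern = "C-x(2,4)-[DE]-H"
    print(prosite_to_regex(pattern))
    print(support(pattern, toy_sequences))
```

In this toy example the wildcard region `x(2,4)` admits gaps of two to four residues, so the third and fourth sequences (gaps of one and five residues) do not match. A discovery method would search over many such candidate patterns and rank them; here the pattern is simply given.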



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

