Proc Natl Acad Sci U S A. Dec 6, 1994; 91(25): 12091–12095.
PMCID: PMC45382

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

Abstract

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
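
The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the function names, the uniform background frequencies, the single pseudocount (a much simpler stand-in for the Dirichlet-mixture priors), the integer score scaling, and the use of every database window rather than one segment per sequence are all hypothetical choices. The random-model score distribution is obtained here by convolving the per-column score distributions, and the cutoff is the lowest score that keeps the expected number of chance hits below a fixed target.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background residue frequencies (real data would use observed ones).
BACKGROUND = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
SCALE = 10  # scores are stored as integers in units of 1/SCALE bits


def log_odds_matrix(block, pseudocount=1.0):
    """Integer-scaled log-odds weight matrix from equal-length aligned segments."""
    matrix = []
    for col in range(len(block[0])):
        counts = Counter(seg[col] for seg in block)
        total = len(block) + pseudocount
        column = {}
        for a in ALPHABET:
            # Simple pseudocount prior; a stand-in for Dirichlet-mixture priors.
            freq = (counts[a] + pseudocount * BACKGROUND[a]) / total
            column[a] = round(SCALE * math.log2(freq / BACKGROUND[a]))
        matrix.append(column)
    return matrix


def score(matrix, segment):
    """Sum of per-position weights for one segment."""
    return sum(col[res] for col, res in zip(matrix, segment))


def score_distribution(matrix):
    """Exact distribution of matrix scores for a random segment (background model)."""
    dist = {0: 1.0}
    for col in matrix:
        nxt = {}
        for s, p in dist.items():
            for a in ALPHABET:
                key = s + col[a]
                nxt[key] = nxt.get(key, 0.0) + p * BACKGROUND[a]
        dist = nxt
    return dist


def choose_cutoff(matrix, n_positions, expected_fp):
    """Lowest score whose expected number of chance hits stays <= expected_fp."""
    dist = score_distribution(matrix)
    tail = 0.0
    cutoff = max(dist) + 1  # if even the top score is too likely, accept nothing
    for s in sorted(dist, reverse=True):
        if (tail + dist[s]) * n_positions > expected_fp:
            break
        tail += dist[s]
        cutoff = s
    return cutoff


def iterate_block(seed_block, database, expected_fp=0.01, max_iter=20):
    """Rebuild the matrix and rescan until the block of segments stops changing."""
    width = len(seed_block[0])
    windows = {seq[i:i + width] for seq in database
               for i in range(len(seq) - width + 1)}
    n_positions = sum(max(len(seq) - width + 1, 0) for seq in database)
    block = set(seed_block)
    for _ in range(max_iter):
        matrix = log_odds_matrix(sorted(block))
        cutoff = choose_cutoff(matrix, n_positions, expected_fp)
        new_block = {w for w in windows if score(matrix, w) >= cutoff}
        new_block |= set(seed_block)  # assumption: seed segments are always kept
        if new_block == block:  # converged
            break
        block = new_block
    return sorted(block)


if __name__ == "__main__":
    toy_database = ["MAGKSTLLAA", "PPGKSSQQ", "WWWWWWWW"]
    print(iterate_block(["GKST"], toy_database))
```

Setting `expected_fp` corresponds to the "fixed number" variant of the cutoff; the "fixed proportion" variant mentioned in the abstract would instead bound the expected chance hits relative to the number of segments accepted. The sketch assumes every sequence contains only the 20 standard residue letters.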
