Nucleic Acids Res. Jul 15, 1996; 24(14): 2730–2739.
PMCID: PMC145991

PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

Abstract

DNA translation frames can be disrupted for several reasons, including: (i) errors in sequence determination; (ii) RNA processing, such as intron removal and guide RNA editing; (iii) less commonly, polymerase frameshifting during transcription or ribosomal frameshifting during translation. Frameshifts frequently confound computational work on homologous sequences, such as database searches and inferences about structure, function or phylogeny drawn from multiple alignments. A dynamic alignment algorithm is reported here that compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting. The algorithm has been incorporated into a new package, WiseTools, for comparison of biological sequences. A protein profile can be compared against either a DNA sequence or a protein sequence. The program PairWise may be used interactively to align any two input sequences. SearchWise can search DNA or protein databases, in any combination, using a protein profile or a DNA sequence as the query. Routine application of the programs has revealed a set of database entries containing frameshifts caused by errors in sequence determination.
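To make the idea of scoring a profile against all three frames at once more concrete, here is a minimal sketch of a frameshift-tolerant local alignment. It is not the algorithm implemented in PairWise or SearchWise; it is a simplified illustration under stated assumptions: the profile is a list of per-column dictionaries (amino acid to score), gaps and frameshifts carry simple linear penalties, and the function name `profile_vs_dna` and the parameter values (`gap`, `frameshift`, `mismatch`, `stop`) are hypothetical. Because the recursion advances one nucleotide at a time, every codon ending at every position is considered, so all three frames of the strand are covered in a single pass, and consuming one or two unaligned nucleotides switches frame at a cost.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codon):
    """Translate one codon; ambiguous bases (e.g. N) give 'X'."""
    return CODON_TABLE.get(codon.upper(), "X")


def profile_vs_dna(profile, dna, gap=4.0, frameshift=8.0, mismatch=-1.0, stop=-10.0):
    """Best local alignment score of a protein profile against one DNA strand.

    Hypothetical sketch: `profile` is a list of dicts, one per profile column,
    mapping amino acid -> score; residues absent from a column score `mismatch`.
    Returns (score, profile_end, dna_end) for the best-scoring cell.
    """
    m, n = len(profile), len(dna)
    # S[i][j]: best local score ending at profile column i and DNA prefix length j.
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = (0.0, 0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score = 0.0  # local alignment: a new alignment may start at any cell
            if j >= 3:
                aa = translate(dna[j - 3:j])
                residue = stop if aa == "*" else profile[i - 1].get(aa, mismatch)
                score = max(score,
                            S[i - 1][j - 3] + residue,  # codon aligned to column i
                            S[i][j - 3] - gap)          # extra codon in the DNA
            score = max(score, S[i - 1][j] - gap)         # profile column skipped
            score = max(score, S[i][j - 1] - frameshift)  # skip 1 nt: frame changes
            if j >= 2:
                score = max(score, S[i][j - 2] - frameshift)  # skip 2 nt: frame changes
            S[i][j] = score
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy profile (hypothetical): each column strongly favours one residue of "MKVLAG".
profile = [{aa: 5.0} for aa in "MKVLAG"]
print(profile_vs_dna(profile, "ATGAAAGTTCTGGCTGGT"))   # in frame: (30.0, 6, 18)
print(profile_vs_dna(profile, "ATGAAAGTTCCTGGCTGGT"))  # extra C after GTT: (22.0, 6, 19)
```

In the second call, a single inserted base (the kind of sequencing error the abstract describes) would break a frame-by-frame translation after the third codon, yet the alignment still spans all six profile columns, paying one frameshift penalty (30 - 8 = 22). Raising `frameshift` to a very large value forbids frame changes, and the best score drops below 22, because no single frame encodes the whole motif.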

Full Text

The Full Text of this article is available as a PDF (293K).


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press