• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of pnasPNASInfo for AuthorsSubscriptionsAboutThis Article
Proc Natl Acad Sci U S A. Oct 29, 1996; 93(22): 12098–12103.
PMCID: PMC37949

Multiple DNA and protein sequence alignment based on segment-to-segment comparison.

Abstract

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicity, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.

Full text

Full text is available as a scanned copy of the original print version. Get a printable copy (PDF file) of the complete article (1.1M), or click on a page image below to browse page by page. Links to PubMed are also available for Selected References.

Selected References

These references are in PubMed. This may not be the complete list of references from this article.
  • Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443–453. [PubMed]
  • Murata M, Richardson JS, Sussman JL. Simultaneous comparison of three protein sequences. Proc Natl Acad Sci U S A. 1985 May;82(10):3073–3077. [PMC free article] [PubMed]
  • Argos P. A sensitive procedure to compare amino acid sequences. J Mol Biol. 1987 Jan 20;193(2):385–396. [PubMed]
  • Argos P, Vingron M. Sensitivity comparison of protein amino acid sequences. Methods Enzymol. 1990;183:352–365. [PubMed]
  • Waterman MS, Jones R. Consensus methods for DNA and protein sequence alignment. Methods Enzymol. 1990;183:221–237. [PubMed]
  • Johnson MS, Doolittle RF. A method for the simultaneous alignment of three or more amino acid sequences. J Mol Evol. 1986;23(3):267–278. [PubMed]
  • McClure MA, Vasi TK, Fitch WM. Comparative analysis of multiple protein-sequence alignment methods. Mol Biol Evol. 1994 Jul;11(4):571–592. [PubMed]
  • Taylor WR. Motif-biased protein sequence alignment. J Comput Biol. 1994 Winter;1(4):297–310. [PubMed]
  • Vingron M, Sibbald PR. Weighting in sequence space: a comparison of methods in terms of generalized sequences. Proc Natl Acad Sci U S A. 1993 Oct 1;90(19):8777–8781. [PMC free article] [PubMed]
  • Dang CV, Dolde C, Gillison ML, Kato GJ. Discrimination between related DNA sites by a single amino acid residue of Myc-related basic-helix-loop-helix proteins. Proc Natl Acad Sci U S A. 1992 Jan 15;89(2):599–602. [PMC free article] [PubMed]
  • Feng DF, Doolittle RF. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol. 1987;25(4):351–360. [PubMed]
  • Higgins DG, Sharp PM. Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci. 1989 Apr;5(2):151–153. [PubMed]
  • Higgins DG, Bleasby AJ, Fuchs R. CLUSTAL V: improved software for multiple sequence alignment. Comput Appl Biosci. 1992 Apr;8(2):189–191. [PubMed]
  • Martinez HM. A flexible multiple sequence alignment program. Nucleic Acids Res. 1988 Mar 11;16(5):1683–1691. [PMC free article] [PubMed]
  • Lessel U, Schomburg D. Similarities between protein 3-D structures. Protein Eng. 1994 Oct;7(10):1175–1187. [PubMed]
  • Gotoh O. Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput Appl Biosci. 1993 Jun;9(3):361–370. [PubMed]
  • Vingron M, Waterman MS. Sequence alignment and penalty choice. Review of concepts, case studies and implications. J Mol Biol. 1994 Jan 7;235(1):1–12. [PubMed]

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • Compound
    Compound
    PubChem Compound links
  • PubMed
    PubMed
    PubMed citations for these articles
  • Substance
    Substance
    PubChem Substance links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...