Nucleic Acids Res. Apr 15, 1996; 24(8): 1515–1524.
PMCID: PMC145823

SAGA: sequence alignment by genetic algorithm.

Abstract

We describe a new approach to multiple sequence alignment using genetic algorithms and an associated software package called SAGA. The method evolves a population of alignments in a quasi-evolutionary manner, gradually improving the fitness of the population as measured by an objective function that scores multiple alignment quality. SAGA uses an automatic scheduling scheme to control the usage of 22 different operators that combine alignments or mutate them between generations. When used to optimise the well-known sums-of-pairs objective function, SAGA performs better than some of the widely used alternative packages, both in its ability to reach an optimal solution and in the accuracy of its alignments when compared with reference alignments based on sequences of known tertiary structure. The general attraction of the approach is its ability to optimise any objective function that one can invent.
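
As a rough illustration of the idea in the abstract, the sketch below scores an alignment with a simplified sums-of-pairs function and improves it with a small evolutionary loop. It is not SAGA's implementation: the match, mismatch and gap values, the single `shift_gap` mutation operator, and the elitist selection step are all assumptions chosen for brevity. The abstract describes 22 operators under an automatic scheduling scheme, none of which is reproduced here.

```python
import random
from itertools import combinations

# Assumed toy scoring scheme (not taken from the paper).
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    """Score one aligned position for a pair of rows."""
    if a == "-" and b == "-":
        return 0                      # gap-gap positions are ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sum_of_pairs(alignment):
    """Add up pairwise scores over every pair of rows and every column."""
    return sum(
        pair_score(a, b)
        for row1, row2 in combinations(alignment, 2)
        for a, b in zip(row1, row2)
    )

def shift_gap(alignment, rng):
    """Hypothetical mutation operator: move one gap within one row.

    Residue order and row length are preserved.
    """
    rows = list(alignment)
    i = rng.randrange(len(rows))
    gaps = [k for k, c in enumerate(rows[i]) if c == "-"]
    if not gaps:
        return rows                   # nothing to move in this row
    chars = list(rows[i])
    chars.pop(rng.choice(gaps))
    chars.insert(rng.randrange(len(chars) + 1), "-")
    rows[i] = "".join(chars)
    return rows

def evolve(alignment, generations=200, pop_size=20, seed=0):
    """Evolve a population of alignments, keeping the fittest each generation."""
    rng = random.Random(seed)
    population = [list(alignment) for _ in range(pop_size)]
    for _ in range(generations):
        children = [shift_gap(rng.choice(population), rng)
                    for _ in range(pop_size)]
        # Elitist selection: parents and children compete on fitness.
        ranked = sorted(population + children, key=sum_of_pairs, reverse=True)
        population = ranked[:pop_size]
    return population[0]

start = ["AC-T", "ACGT", "A-GT"]
best = evolve(start, generations=50, pop_size=10, seed=1)
print(sum_of_pairs(start), sum_of_pairs(best), best)
```

Because the fitness function is passed around as an ordinary callable, swapping `sum_of_pairs` for a different scoring function leaves the loop unchanged, which mirrors the abstract's point that the approach can optimise any objective function.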



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
