    Bull Math Biol. 2005 Jan;67(1):169-91.

    Toward an accurate statistics of gapped alignments.

    Kschischo M, Lässig M, Yu YK.

    University of Applied Sciences Koblenz, RheinAhrCampus Remagen, Südallee 2, 53424 Remagen, Germany. kschischo@rheinahrcampus.de

    Sequence alignment has been an invaluable tool for finding homologous sequences. The significance of the homology found is often quantified statistically by p-values. Theory for computing p-values exists for gapless alignments [Karlin, S., Altschul, S.F., 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264-2268; Karlin, S., Dembo A., 1992. Limit distributions of maximal segmental score among Markov-dependent partial sums. Adv. Appl. Probab. 24, 13-140], but a full generalization to alignments with gaps is not yet complete. We present a unified statistical analysis of two common sequence comparison algorithms: maximum-score (Smith-Waterman) alignments and their generalized probabilistic counterparts, including maximum-likelihood alignments and hidden Markov models. The most important statistical characteristic of these algorithms is the distribution function of the maximum score S_max or, for probabilistic alignment, the maximum free energy F_max, for mutually uncorrelated random sequences. This distribution is known empirically to be of the Gumbel form, with an exponential tail P(S_max > x) ≈ exp(-λx) for maximum-score alignment and P(F_max > x) ≈ exp(-λx) for some classes of probabilistic alignment. We derive an exact expression for λ for particular probabilistic alignments. This result is then used to obtain accurate λ values for generic probabilistic and maximum-score alignments. Although the result is demonstrated for a simple match-mismatch scoring system, it is expected to be a good starting point for more general scoring functions.
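
    The empirical Gumbel behaviour of the maximum score can be illustrated with a small simulation. The sketch below is not the authors' method: it scores pairs of uncorrelated random sequences with a basic Smith-Waterman recursion and estimates the tail parameter from the sample standard deviation, using the Gumbel relation std = π / (√6 · λ). The scoring values (match +1, mismatch -1, linear gap cost -2), the four-letter alphabet, the sequence length, and the function names are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the maximum local alignment score S_max (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def estimate_lambda(n_pairs=200, length=60, alphabet="ACGT", seed=0):
    """Moment estimate of lambda from S_max over random sequence pairs.

    Assumes the scores are approximately Gumbel distributed, for which
    the standard deviation equals pi / (sqrt(6) * lambda).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        a = rng.choices(alphabet, k=length)
        b = rng.choices(alphabet, k=length)
        scores.append(smith_waterman_score(a, b))
    lam = math.pi / (math.sqrt(6) * statistics.stdev(scores))
    return lam, scores


if __name__ == "__main__":
    lam, scores = estimate_lambda()
    print(f"mean S_max = {statistics.mean(scores):.2f}, lambda estimate = {lam:.3f}")
```

    A moment estimate of this kind is crude for short sequences and small samples, since integer scores and finite-size effects distort the tail; it only makes the quantity λ in the abstract concrete.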

    PMID: 15691544 [PubMed - indexed for MEDLINE]
