Gap Costs and Lambda Ratios

Gap Costs

The raw score of an alignment is the sum of the scores for aligning pairs of residues and the scores for gaps. Gapped BLAST and PSI-BLAST use "affine gap costs" which charge the score -a for the existence of a gap, and the score -b for each residue in the gap. Thus a gap of k residues receives a total score of -(a+bk); specifically, a gap of length 1 receives the score -(a+b).
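The affine gap formula above can be sketched in a few lines of Python. This is only an illustration: the function names gap_score and raw_score are hypothetical, and the costs a = 11, b = 1 used in the usage note are example values chosen here, not values given in this text.

```python
def gap_score(k, a, b):
    """Score of a single gap of k residues under affine gap costs.

    a is the gap existence cost and b the per-residue cost, so the
    gap contributes -(a + b*k) to the raw score.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return -(a + b * k)


def raw_score(pair_scores, gap_lengths, a, b):
    """Raw alignment score: aligned-pair scores plus all gap scores.

    pair_scores: substitution scores of the aligned residue pairs
    gap_lengths: length (in residues) of each gap in the alignment
    """
    return sum(pair_scores) + sum(gap_score(k, a, b) for k in gap_lengths)
```

With the example costs a = 11 and b = 1, gap_score(1, 11, 1) gives -(11 + 1) = -12, and a gap of three residues gives -(11 + 3) = -14.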

Lambda Ratio

To convert a raw score S into a normalized score S' expressed in bits, one uses the formula S' = (lambda*S - ln K)/(ln 2), where lambda and K are parameters dependent upon the scoring system (substitution matrix and gap costs) employed [7-9]. For determining S', the more important of these parameters is lambda. The "lambda ratio" quoted here is the ratio of the lambda for the given scoring system to the lambda for a system using the same substitution scores but with infinite gap costs, that is, with gaps disallowed [8]. This ratio indicates what proportion of information in an ungapped alignment must be sacrificed in the hope of improving its score through extension using gaps. We have found empirically that the most effective gap costs tend to be those with lambda ratios in the range 0.8 to 0.9.
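As a rough sketch of the two quantities just defined, the following Python computes a bit score from a raw score and a lambda ratio from a pair of lambda values. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen for illustration; real lambda and K values depend on the substitution matrix and gap costs and are not given in this text.

```python
import math


def bit_score(raw, lam, K):
    """Normalized score in bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(K)) / math.log(2)


def lambda_ratio(lam_gapped, lam_ungapped):
    """Ratio of the gapped lambda to the lambda of the same
    substitution scores with infinite gap costs (no gaps allowed)."""
    return lam_gapped / lam_ungapped


# Placeholder parameters, assumed for illustration only.
lam_gapped, K_gapped = 0.27, 0.04
lam_ungapped = 0.32

ratio = lambda_ratio(lam_gapped, lam_ungapped)
print("lambda ratio:", ratio)
print("within 0.8 to 0.9:", 0.8 <= ratio <= 0.9)
print("bit score for S = 100:", bit_score(100, lam_gapped, K_gapped))
```

Because lambda multiplies the raw score while K enters only through a logarithm, a change in lambda shifts S' far more than a comparable change in K, which is one way to see why lambda is the more important parameter.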

[1] Altschul, S.F. (1991) "Amino acid substitution matrices from an information
    theoretic perspective." J. Mol. Biol. 219:555-565.
[2] States, D.J., Gish, W. & Altschul, S.F. (1991) "Improved sensitivity of
    nucleic acid database searches using application-specific scoring matrices."
    Methods 3:66-70.
[3] Altschul, S.F. (1993) "A protein alignment scoring system sensitive at all
    evolutionary distances." J. Mol. Evol. 36:290-300.
[4] Henikoff, S. & Henikoff, J.G. (1992) "Amino acid substitution matrices from
    protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919.
[5] Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) "A model of evolutionary
    change in proteins." In "Atlas of Protein Sequence and Structure, vol. 5,
    suppl. 3," M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found.,
    Washington, DC.
[6] Schwartz, R.M. & Dayhoff, M.O. (1978) "Matrices for detecting distant
    relationships." In "Atlas of Protein Sequence and Structure, vol. 5,
    suppl. 3," M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found.,
    Washington, DC.
[7] Karlin, S. & Altschul, S.F. (1990) "Methods for assessing the statistical
    significance of molecular sequence features by using general scoring
    schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.
[8] Altschul, S.F. & Gish, W. (1996) "Local alignment statistics." Meth.
    Enzymol. 266:460-480.
[9] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller,
    W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of
    protein database search programs." Nucleic Acids Res. 25:3389-3402.

Revised Oct. 11, 2000