Gap Costs and Lambda Ratios
Gap Costs
The raw score of an alignment is the sum of the scores for aligning
pairs of residues and the scores for gaps. Gapped BLAST and PSI-BLAST use
"affine gap costs", which charge the score -a for the existence of a gap
and the score -b for each residue in the gap. Thus a gap of k residues
receives a total score of -(a+bk); in particular, a gap of length 1 receives
the score -(a+b).
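A minimal sketch of the affine gap cost described above, in Python. The function names gap_score and raw_score are hypothetical, and the costs a = 11 and b = 1 and the pair scores in the usage lines are illustrative values assumed for this example, not taken from this text.

```python
def gap_score(a, b, k):
    """Score of one gap of k residues under affine gap costs: -(a + b*k)."""
    if k < 1:
        raise ValueError("a gap must span at least one residue")
    return -(a + b * k)


def raw_score(pair_scores, gap_lengths, a, b):
    """Raw alignment score: aligned-pair scores plus the (negative) gap scores."""
    return sum(pair_scores) + sum(gap_score(a, b, k) for k in gap_lengths)


# Illustrative costs (assumed for this example): a = 11, b = 1.
print(gap_score(11, 1, 1))               # gap of length 1: -(a + b)
print(gap_score(11, 1, 5))               # gap of length 5: -(a + 5b)
print(raw_score([4, 5, 6], [2], 11, 1))  # three aligned pairs and one gap of length 2
```

Because the existence cost a is charged once per gap, one long gap is penalized less than several short gaps covering the same number of residues.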
Lambda Ratio
To convert a raw score S into a normalized score S' expressed in bits,
one uses the formula S' = (lambda*S - ln K)/(ln 2), where lambda and K
are parameters dependent upon the scoring system (substitution matrix and
gap costs) employed [7-9]. For determining S', the more important of these
parameters is lambda. The "lambda ratio" quoted here is the ratio of the
lambda for the given scoring system to that for one using the same substitution
scores, but with infinite gap costs (that is, with gaps disallowed) [8]. This
ratio indicates what proportion of the information in an ungapped alignment
is retained under the gapped scoring system; the remainder is sacrificed in
the hope of improving the alignment's score through extension using gaps. We
have found empirically that the most effective gap costs tend to be those
with lambda ratios in the range 0.8 to 0.9.
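The bit-score formula and the lambda ratio can be sketched in Python as follows. The function names are hypothetical, and the numeric values of lambda and K in the usage lines are illustrative placeholders assumed for this example; real values depend on the substitution matrix and gap costs in use.

```python
import math


def bit_score(raw, lam, K):
    """Normalized score in bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(K)) / math.log(2)


def lambda_ratio(lam_gapped, lam_ungapped):
    """Ratio of the gapped lambda to the lambda for infinite gap costs."""
    return lam_gapped / lam_ungapped


def in_empirical_range(ratio, low=0.8, high=0.9):
    """True if the ratio lies in the empirically effective range quoted above."""
    return low <= ratio <= high


# Placeholder parameters (assumed, not taken from this text).
lam_gapped, K_gapped = 0.27, 0.04
lam_ungapped = 0.32

ratio = lambda_ratio(lam_gapped, lam_ungapped)
print(f"bit score for S = 100: {bit_score(100, lam_gapped, K_gapped):.1f}")
print(f"lambda ratio: {ratio:.3f}, in 0.8-0.9 range: {in_empirical_range(ratio)}")
```

The range check is only a heuristic filter reflecting the empirical observation above; it does not by itself establish that a given pair of gap costs is effective.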
[1] Altschul, S.F. (1991) "Amino acid substitution matrices from an information
theoretic perspective." J. Mol. Biol. 219:555-565.
[2] States, D.J., Gish, W. & Altschul, S.F. (1991) "Improved sensitivity of
nucleic acid database searches using application-specific scoring matrices."
Methods 3:66-70.
[3] Altschul, S.F. (1993) "A protein alignment scoring system sensitive at all
evolutionary distances." J. Mol. Evol. 36:290-300.
[4] Henikoff, S. & Henikoff, J.G. (1992) "Amino acid substitution matrices from
protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919.
[5] Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) "A model of evolutionary
change in proteins." In "Atlas of Protein Sequence and Structure, vol. 5,
suppl. 3," M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found.,
Washington, DC.
[6] Schwartz, R.M. & Dayhoff, M.O. (1978) "Matrices for detecting distant
relationships." In "Atlas of Protein Sequence and Structure, vol. 5,
suppl. 3," M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found.,
Washington, DC.
[7] Karlin, S. & Altschul, S.F. (1990) "Methods for assessing the statistical
significance of molecular sequence features by using general scoring
schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.
[8] Altschul, S.F. & Gish, W. (1996) "Local alignment statistics." Meth.
Enzymol. 266:460-480.
[9] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z.,
Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs." Nucleic Acids Res. 25:3389-3402.
Revised Oct. 11, 2000