Scatterplots show the correlation between D, the fractional identity in the full alphabet computed from CLUSTAL_W alignments, and either the fractional common k-mer count (F, equation 3) or the k-mer distance (Y, equation 4). We show three selected cases: the full alphabet A with k = 3 [the parameters used in (16)]; Dayhoff(6) with k = 6 (used by MAFFT); and an intermediate case, SE-B(10) with k = 4. Note that the relationship between D and Y is approximately linear.
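
To make the plotted quantity F concrete, the following is a minimal sketch of how a fractional common k-mer count could be computed for a pair of sequences, optionally after mapping them to a compressed alphabet. Equation 3 is not reproduced in this caption, so the normalisation used here (shared k-mers, counted with multiplicity, divided by the number of k-mers in the shorter sequence) is an assumption. The function names and the `DAYHOFF6` mapping (the standard six Dayhoff classes, encoded with arbitrary letters) are illustrative and are not taken from any published implementation. The distance Y of equation 4 is not computed, since its definition is not given here.

```python
from collections import Counter

# Standard six Dayhoff classes; the class labels "a".."f" are arbitrary.
DAYHOFF6 = {
    **dict.fromkeys("AGPST", "a"),
    **dict.fromkeys("C", "b"),
    **dict.fromkeys("DENQ", "c"),
    **dict.fromkeys("FWY", "d"),
    **dict.fromkeys("HKR", "e"),
    **dict.fromkeys("ILMV", "f"),
}


def compress(seq, alphabet_map=None):
    """Map each residue to its class; None means the full alphabet."""
    if alphabet_map is None:
        return seq
    return "".join(alphabet_map[residue] for residue in seq)


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fractional_common_kmers(x, y, k, alphabet_map=None):
    """Assumed form of F: shared k-mers / k-mers in the shorter sequence."""
    x = compress(x, alphabet_map)
    y = compress(y, alphabet_map)
    n_possible = min(len(x), len(y)) - k + 1
    if n_possible <= 0:
        raise ValueError("both sequences must contain at least k residues")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())
    return shared / n_possible


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to show the effect of compression.
    print(fractional_common_kmers("AGPST", "TSPGA", k=3))
    print(fractional_common_kmers("AGPST", "TSPGA", k=3, alphabet_map=DAYHOFF6))
```

In this toy pair the two sequences share no 3-mers in the full alphabet but become identical once compressed, which illustrates why a compressed alphabet can tolerate a longer k for the same level of divergence.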