(
A) Total number of possible bifurcating trees for different number of sequences. Computed by equation 5.1 of ref. 1. (
B) Fraction of all topologies that are examined by the neighbor-joining (NJ) (2) method in producing a final tree. For a given number of sequences (
m), the number of topologies explored by the NJ algorithm can be given by [
m(m
2 – 1)/6] – 7. This formula was derived from the observation that at each step of sequence clustering the NJ method examines
mi(
mi – 1)/2 trees for
mi ≥ 5 (assuming that each sequence pairing in the NJ algorithm examines a distinct tree), where
mi is the number of sequences at step
i. For
mi = 4, NJ examines all three possible trees. Because the number of sequences decreases by one in each step of sequence clustering, the total number of topologies examined is given by

. (
C) Proportion of data sets in which the original Tamura–Nei (TN) (3) distance was not applicable for one or more pairwise comparisons for type B model trees (filled symbols) and type C model trees (open symbols) in Fig. 2. The expected maximum distance was 0.47 for 32-sequence trees and 0.61 for 4,096-sequence trees. The simulation procedure was as follows. For a given model tree, a data set of extant nucleotide sequences of
n = 1,000 was generated by using the TN model with
k1 =
k2 = 20,
gA =
gT = 0.4 and
gC =
gG = 0.1. For each model tree, 100 data sets were generated, and the proportion of data sets in which unestimable distances occurred was computed.