
# Bootstrap confidence levels for phylogenetic trees

^{†}Department of Statistics, Stanford University, Stanford, CA 94305; and

^{‡}Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA 30322

^{§}Permanent address: Biometrie–Institut National de la Recherche Agronomique, Montpellier, France.

This article (*Proc. Natl. Acad. Sci. USA* **93,** 7085–7090) is reprinted in its entirety with the author’s corrections incorporated.

Bradley Efron, Elizabeth Halloran, and Susan Holmes

## Abstract

Evolutionary trees are often estimated from DNA or RNA sequence
data. How much confidence should we have in the estimated trees? In
1985, Felsenstein [Felsenstein, J. (1985) *Evolution* 39,
783–791] suggested the use of the bootstrap to answer this question.
Felsenstein’s method, which in concept is a straightforward
application of the bootstrap, is widely used, but has been criticized
as biased in the genetics literature. This paper concerns the use of
the bootstrap in the tree problem. We show that Felsenstein’s method
is not biased, but that it can be corrected to better agree with
standard ideas of confidence levels and hypothesis testing. These
corrections can be made by using the more elaborate bootstrap method
presented here, at the expense of considerably more computation.

The bootstrap, as described in ref. 1, is a computer-based technique for assessing the accuracy of almost any statistical estimate. It is particularly useful in complicated nonparametric estimation problems, where analytic methods are impractical. Felsenstein (2) introduced the use of the bootstrap in the estimation of phylogenetic trees. His technique, which has been widely used, provides assessments of “confidence” for each clade of an observed tree, based on the proportion of bootstrap trees showing that same clade. However Felsenstein’s method has been criticized as biased. Hillis and Bull’s paper (3), for example, says that the bootstrap confidence values are consistently too conservative (i.e., biased downward) as an assessment of the tree’s accuracy.

Is the bootstrap biased for the assessment of phylogenetic trees? We
will show that the answer is no, at least to a first order of
statistical accuracy. Felsenstein’s method provides a reasonable first
approximation to the actual confidence levels of the observed clades.
More ambitious bootstrap methods can be fashioned to give still better
assessments of confidence. We will describe one such method and apply
it to the estimation of a phylogenetic tree for the malaria parasite
*Plasmodium*.

### Bootstrapping Trees

Fig. 1 shows part of a data set used to construct phylogenetic trees for malaria. The data are the aligned sequences of small subunit RNA genes from 11 malaria species of the genus *Plasmodium*. The 11 × 221 data matrix **x** we will first consider is composed of the 221 polytypic sites. Fig. 1 shows the first 20 columns of **x**. There are another 1399 monotypic sites, where the 11 species are identical.

**Fig. 1.** The first 20 columns of the 11 × 221 matrix **x** of polytypic sites for the 11 species of *Plasmodium*, used in most of the analyses below.

Fig. 2 shows a phylogenetic tree constructed from **x**. The tree-building algorithm proceeds in two main steps: (*i*) an 11 × 11 distance matrix D̂ is constructed for the 11 species, measuring differences between the row vectors of **x**; and (*ii*) D̂ is converted into a tree by a connection algorithm that connects the closest two entries (species 9 and 10 here), reduces D̂ to a 10 × 10 matrix according to some merging rule, connects the two closest entries of the new D̂ matrix, etc.

We can indicate the tree-building process schematically as

**x** → D̂ → TREÊ,

the hats indicating that we are dealing with estimated quantities.
A deliberately simple choice of algorithms was made in constructing Fig. 2: D̂ was the matrix of Euclidean distances between the rows of **x**, with (*A*,* G*,* C*,* T*) interpreted numerically as (1, 2, 5, 6), while the connection algorithm merged nodes by maximization. Other, better, tree-building algorithms are available, as mentioned later in the paper. Some of these, such as the maximum parsimony method, do not involve a distance matrix, and some use all of the sites, including the monotypic ones. The discussion here applies just as well to all such tree-building algorithms.

Felsenstein’s method proceeds as follows. A bootstrap data matrix **x*** is formed by randomly selecting 221 columns from the original matrix **x** *with replacement*. For example the first column of **x*** might be the 17th column of **x**, the second might be the 209th column of **x**, the third the 17th column of **x**, etc. Then the original tree-building algorithm is applied to **x***, giving a bootstrap tree TREÊ*,

**x*** → D̂* → TREÊ*.

This whole process is independently repeated some large number *B* of times, *B* = 200 in Fig. 2, and the proportions of bootstrap trees agreeing with the original tree are calculated. “Agreeing” here refers to the topology of the tree and not to the length of its arms.

These proportions are the bootstrap confidence values. For example the 9-10 clade seen in Fig. 2 appeared in 193 of the 200 bootstrap trees, for an estimated confidence value of 0.965. Species 7-8-9-10 occurred as a clade in 199 of the 200 bootstrap trees, giving 0.995 confidence. (Not all of these 199 trees had the configuration shown in Fig. 2; some instead first joined 8 to 9-10 and then 7 to 8-9-10, and there were other variations as well.)
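In code, Felsenstein’s procedure is just column resampling plus tree building. The sketch below is ours, not the paper’s: the tree-builder is a stand-in that records only the first connection step (the closest pair of species under the paper’s (1, 2, 5, 6) coding), and the toy sequences are invented for illustration.

```python
import random

# Numeric coding of the bases, as in the paper's simple distance.
CODE = {"A": 1, "G": 2, "C": 5, "T": 6}

def first_merge(rows):
    """Return the closest pair of species as a frozenset, using squared
    Euclidean distance (same ordering as Euclidean distance).  This is a
    stand-in for a full tree-builder: only the first connection step."""
    best, best_d = None, float("inf")
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            d = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            if d < best_d:
                best, best_d = frozenset((i, j)), d
    return best

def clade_confidence(seqs, pair, B=200, seed=0):
    """Felsenstein's confidence value for `pair`: the proportion of B
    column-resampled data sets whose first merge is that pair."""
    rng = random.Random(seed)
    rows = [[CODE[c] for c in s] for s in seqs]
    n = len(rows[0])
    hits = 0
    for _ in range(B):
        cols = [rng.randrange(n) for _ in range(n)]   # sites drawn with replacement
        boot = [[r[c] for c in cols] for r in rows]
        hits += first_merge(boot) == pair
    return hits / B

# Invented toy data: species 0 and 1 identical, species 2 different at every
# site, so the 0-1 "clade" should appear in every bootstrap tree.
value = clade_confidence(["AGCTAGCTAG", "AGCTAGCTAG", "TCTGTCTGTC"], frozenset((0, 1)))
```

With real data the resampled columns shift the distances, and the value falls below 1 for any clade that is not overwhelmingly supported.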

Felsenstein’s method is, nearly, a standard application of the
*nonparametric bootstrap*. The basic assumption, further
discussed in the next section, is that the columns of the data matrix
**x** are independent of each other and drawn from the same
probability distribution. Of course, if this assumption is a bad one,
then Felsenstein’s method goes wrong, but that is not the point of
concern here nor in the references, and we will take the independence
assumption as a given truth.

The bootstrap is more typically applied to a statistic θ̂ that estimates a parameter of interest θ, both θ̂ and θ being single numbers. For example, θ̂ could be the sample correlation coefficient between the first two malaria species, Pre and Pme, at the 221 sites, with (*A*,* G*,* C*,* T*) interpreted as (1, 2, 5, 6): θ̂ = 0.616. How accurate is θ̂ as an estimate of the true correlation θ? The nonparametric bootstrap answers such questions without making distributional assumptions.

Each bootstrap data set **x*** gives a bootstrap estimate θ̂*, in this case the sample correlation between the first two rows of **x***. The central idea of the bootstrap is to use the observed distribution of the differences θ̂* − θ̂ to infer the unobservable distribution of θ̂ − θ; in other words to learn about the accuracy of θ̂. In our example, the 200 bootstrap replications of θ̂* were observed to have expectation 0.622 and standard deviation 0.052. The inference is that θ̂ is nearly unbiased for estimating θ, with a standard error of about 0.052. We can also calculate bootstrap confidence intervals for θ. A well-developed theory supports the validity of these inferences [see Efron and Tibshirani (1)].

Felsenstein’s application of the bootstrap is nonstandard in one important way: the statistic TREÊ, unlike the correlation coefficient, does not change smoothly as a function of the data set **x**. Rather, TREÊ is constant within large regions of the **x**-space, and then changes discontinuously as certain boundaries are crossed. This behavior raises questions about the bootstrap inferences, questions that are investigated in the sections that follow.

### A Model For The Bootstrap

The rationale underlying the bootstrap confidence values depends
on a simple multinomial probability model. There are *K*
= 4^{11} − 4 possible column vectors for **x**, the
number of vectors of length 11 based on a 4-letter alphabet, not
counting the 4 monotypic ones. Call these vectors
*X*_{1},* X*_{2}, . . .,* X*_{K}, and suppose that each observed column of **x**
is an independent selection from *X*_{1},* X*_{2}, . . .,* X*_{K}, equaling
*X*_{k} with probability π_{k}. This is the
*multinomial model* for the generation of **x**.

Denote the vector of true probabilities by π̃ = (π_{1}, π_{2}, . . ., π_{K}), so the sum of π̃’s coordinates is 1. The data matrix **x** can be characterized by the proportion of its *n* = 221 columns equaling each possible *X*_{k}, say

π̂_{k} = #{columns of **x** equaling *X*_{k}}/*n*,

with π̂ = (π̂_{1}, π̂_{2}, . . ., π̂_{K}). This is a very inefficient way to represent the data, since 4^{11} − 4 is so much bigger than 221, but it is useful for understanding the bootstrap. Later we will see that only the vectors *X*_{k} that actually occur in **x** need be considered, at most *n* of them.

Almost always the distance matrix D̂ is a function of the observed proportions π̂, so we can write the tree-building algorithm as

π̂ → D̂ → TREÊ.

In a similar way the vector of true probabilities π̃ gives a true distance matrix and a true tree,

π̃ → *D* → TREE.

In our example *D* would be the matrix with *ij*th element {∑_{k}π_{k}(*X*_{ki} −* X*_{kj})^{2}}^{1/2}, and TREE the tree obtained by applying the maximizing connection algorithm to *D*.

Fig. 3 is a schematic picture of the space of possible π̃ vectors, divided into regions *R*_{1}, *R*_{2}, . . .. The regions correspond to different possible trees, so if π̂ ∈ *R*_{j} the *j*th possible tree results. We hope that TREÊ = TREE, which is to say that π̃ and π̂ lie in the same region, or at least that TREÊ and TREE agree in their most important aspects.

**Fig. 3.** Schematic picture of the space of possible π̃ vectors; the regions *R*_{1}, *R*_{2}, . . . correspond to the different possible trees.

The bootstrap data matrix **x*** has proportions of columns say π̂* = (π̂_{1}^{*}, π̂_{2}^{*}, . . ., π̂_{K}^{*}). We can indicate the bootstrap tree-building as

π̂* → D̂* → TREÊ*.
The hypothetical example of Fig. 3 puts π̃ and π̂ in the same region, so that the estimate TREÊ exactly equals the true TREE. However π̂* lies in a different region, with TREÊ* not having the 9-10 clade. This actually happened in 7 out of the 200 bootstrap replications for Fig. 2.

What the critics of Felsenstein’s method call its bias is the fact that the probability that TREÊ* = TREE is usually less than the probability that TREÊ = TREE. In terms of Fig. 3, this means that π̂* has less probability than π̂ of lying in the same region as π̃. Hillis and Bull (3) give specific simulation examples. The discussion below is intended to show that this property is not a bias, and that to a first order of approximation the bootstrap confidence values provide a correct assessment of TREÊ’s accuracy. A more valid criticism of Felsenstein’s method, discussed later, involves its relationship with the standard theory of statistical confidence levels based on hypothesis tests.

Returning to the correlation example of the previous section, it is *not* true that θ̂* − θ (as opposed to θ̂* − θ̂) has the same distribution as θ̂ − θ, even approximately. In fact θ̂* − θ will have nearly twice the variance of θ̂ − θ, the sum of the variances of θ̂ around θ and of θ̂* around θ̂. Similarly in Fig. 3 the average distance from π̂* to π̃ will be greater than the average distance from π̂ to π̃. This is the underlying reason for results like those of Hillis and Bull, that π̂* has less probability than π̂ of lying in the same region as π̃. However, to make valid bootstrap inferences we need to use the observed differences between TREÊ* and TREÊ (not between TREÊ* and TREE) to infer the differences between TREÊ and TREE. Just how this can be done is discussed using a simplified model in the next two sections.

### A Simpler Model

The meaning of the bootstrap confidence values can be more easily
explained using a simple normal model rather than the multinomial
model. This same tactic is used in Felsenstein and Kishino (4). Now we
assume that the data **x** = (*x*_{1},* x*_{2}) are a two-dimensional normal vector with expectation vector **μ** = (μ_{1}, μ_{2}) and identity covariance matrix, written

**x** ~ *N*_{2}(**μ**,* I*).

In other words *x*_{1} and *x*_{2} are independent normal variates with expectations μ_{1} and μ_{2}, and variances 1. The obvious estimate of **μ** is μ̂ = **x**, and we will use this notation in what follows. The **μ**-plane is partitioned into regions *R*_{1}, *R*_{2}, *R*_{3}, . . . similarly to Fig. 3. We observe that μ̂ lies in one of these regions, say *R*_{1}, and we wish to assign a confidence value to the event that **μ** itself lies in *R*_{1}.

Two examples are illustrated in Fig. 4. In both of them **x** = μ̂ = (4.5, 0) lies in *R*_{1}, one of two possible regions. Case I has *R*_{2} = {**μ** : μ_{1} ≤ 3}, a half-plane, while case II has *R*_{2} = {**μ** : ‖**μ**‖ ≤ 3}, a disk of radius 3.

**Fig. 4.** The observed μ̂ = (4.5, 0) lies in *R*_{1}, and we wish to assign a confidence value to the event **μ** ∈ *R*_{1}. In case I, *R*_{2} is the half-plane {μ_{1} ≤ 3}; in case II, *R*_{2} is the disk of radius 3.

Bootstrap sampling in our simplified problem can be taken to be

**x*** ~ *N*_{2}(μ̂,* I*).

This is a parametric version of the bootstrap, as in section 6.5 of Efron and Tibshirani (1), rather than the more familiar nonparametric bootstrap considered previously, but it provides the proper analogy with the multinomial model. The dashed circles in Fig. 4 indicate the bootstrap density of μ̂* = **x***, centered at μ̂. Felsenstein’s confidence value α̂ is the bootstrap probability that μ̂* lies in *R*_{1}, say

α̂ = Prob_{μ̂}{μ̂* ∈ *R*_{1}}.

The notation Prob_{μ̂} emphasizes that the bootstrap probability is computed with μ̂ fixed and only μ̂* random. The bivariate normal model of this section is simple enough to allow the α̂ values to be calculated theoretically, without doing simulations,

α̂_{I} = 0.933 and α̂_{II} = 0.949.

Notice that α̂_{II} is bigger than α̂_{I} because *R*_{1} is bigger in case II.
In our normal model, μ̂* − μ̂ has the same distribution as μ̂ − **μ**, both distributions being the standard bivariate normal *N*_{2}(**0**,* I*). The general idea of the bootstrap is to use the observable bootstrap distribution of μ̂* − μ̂ to say something about the unobservable distribution of the error μ̂ − **μ**. Notice, however, that the marginal distribution of μ̂* − **μ** has *twice* as much variance,

μ̂* − **μ** ~ *N*_{2}(**0**, 2*I*).

This generates the “bias” discussed previously, that μ̂* has less probability than μ̂ of being in the same region as **μ**. But this kind of interpretation of bootstrap results cannot give correct inferences. Newton (5) makes a similar point, as do Zharkikh and Li (6) and Felsenstein and Kishino (4).

We can use a Bayesian model to show that α̂ is a reasonable assessment of the probability that *R*_{1} contains **μ**. Suppose we believe *a priori* that **μ** could lie anywhere in the plane with equal probability. Then having observed μ̂, the *a posteriori* distribution of **μ** given μ̂ is *N*_{2}(μ̂,* I*), exactly the same as the bootstrap distribution of μ̂*. In other words, α̂ is the *a posteriori* probability of the event **μ** ∈ *R*_{1}, if we begin with an “uninformative” prior density for **μ**.

Almost the same thing happens in the multinomial model. The bootstrap probability that TREÊ* = TREÊ is almost the same as the *a posteriori* probability that TREE = TREÊ, starting from an uninformative prior density on π̃ [see section 10.6 of Efron (7)]. The same statement holds for any part of the tree, for example the existence of the 9-10 clade in Fig. 2. There are reasons for being skeptical about the Bayesian argument, as discussed in the next section. However, the argument shows that Felsenstein’s bootstrap confidence values are at least reasonable and certainly cannot be universally biased downward.

### Hypothesis-Testing Confidence Levels

Fig. 5 illustrates another more customary way of assigning a confidence level to the event **μ** ∈ *R*_{1}. In both case I and case II, μ̂_{0} = (3, 0) is the closest point to μ̂ on the boundary separating *R*_{1} from *R*_{2}. We now bootstrap from μ̂_{0} rather than from μ̂, obtaining bootstrap data vectors

**x**** ~ *N*_{2}(μ̂_{0},* I*).

[The double star notation is intended to avoid confusion with the previous bootstrap vectors **x*** ~ *N*_{2}(μ̂,* I*).] The *confidence level* α̃ for the event **μ** ∈ *R*_{1} is the probability that the bootstrap vector μ̂** = **x**** lies closer than μ̂ to the boundary. This has a familiar interpretation: 1 − α̃ is the rejection level, one-sided, of the usual likelihood ratio test of the null hypothesis that **μ** does *not* lie in *R*_{1}. Here we are computing α̃ numerically, rather than relying on an asymptotic χ^{2} approximation. In a one-dimensional testing problem, 1 − α̃ would exactly equal the usual *p* value obtained from the test of the null hypothesis that the true parameter value lies in *R*_{2}.

Once again it is simple to compute the confidence level α̃ for our two cases, at least numerically,

α̃_{I} = 0.933 and α̃_{II} = 0.914.

In the first case α̃_{I} equals α̂_{I}, the Felsenstein bootstrap confidence value. However α̃_{II} = 0.914 is less than α̂_{II} = 0.949.
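These levels can also be checked by simulation. In the sketch below (ours), we draw μ̂** from the boundary point μ̂_{0} = (3, 0) and count the draws whose signed distance into *R*_{1} does not exceed that of the observed μ̂, which is 1.5; this one-sided reading reproduces both reported numbers.

```python
import random

def conf_level(depth, N=200_000, seed=2):
    """Monte Carlo confidence level: draw mu-hat** ~ N2(mu0, I) from the
    boundary point mu0 = (3, 0) and count draws whose signed distance into
    R1 is at most that of the observed mu-hat (namely 1.5)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(N):
        m1 = 3.0 + rng.gauss(0, 1)
        m2 = 0.0 + rng.gauss(0, 1)
        hits += depth(m1, m2) <= 1.5
    return hits / N

# Case I: signed distance past the line {mu1 = 3};
# case II: signed distance past the circle of radius 3.
level_I = conf_level(lambda m1, m2: m1 - 3)
level_II = conf_level(lambda m1, m2: (m1 * m1 + m2 * m2) ** 0.5 - 3)
```

Case I recovers 0.933 exactly as before, while the curved boundary of case II pulls the level down to 0.914.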

Why do the answers differ? Comparing Figs. 4 and 5, we see that, roughly speaking, the confidence value α̂ is a probabilistic measure of the distance from μ̂ to the boundary, while the confidence level α̃ measures distance from the boundary to μ̂. The two ways of measuring distance agree for the straight boundary, but not for the curved boundary of case II. Because the boundary curves *away* from μ̂, the confidence value α̂ is increased from the straight-line case. However the set of vectors further than μ̂ from the boundary curves *toward* μ̂_{0}, which decreases α̃. We would get the opposite results if the boundary between *R*_{1} and *R*_{2} curved in the other direction.

The confidence level α̃, rather than α̂, provides the more usual assessment of statistical belief. For example in case II let θ = ‖**μ**‖ be the length of the expectation vector **μ**. Then α̃ = 0.914 is the usual confidence level attained for the event {θ ≥ 3}, based on observing θ̂ = ‖μ̂‖ = 4.5. And {θ ≥ 3} is the same as the event {**μ** ∈ *R*_{1}}.

Using the confidence value α̂ is equivalent to assuming a flat Bayesian prior for **μ**. It can be shown that using α̃ amounts, approximately, to assuming a different prior density for **μ**, one that depends on the shape of the boundary. In case II this prior is uniform in polar coordinates for **μ**, rather than uniform in the original rectangular coordinates [see Tibshirani (8)].

### The Relationship Between the Two Measures of Confidence

There is a simple approximation formula for converting a Felsenstein confidence value α̂ to a hypothesis-testing confidence level α̃. This formula is conveniently expressed in terms of the cumulative distribution function Φ(*z*) of a standard one-dimensional normal variate, and its inverse function Φ^{−1}: Φ(1.645) = 0.95, Φ^{−1}(0.95) = 1.645, etc. We define the “*z* values” corresponding to α̂ and α̃,

ẑ = Φ^{−1}(α̂) and z̃ = Φ^{−1}(α̃).

In case II, ẑ = Φ^{−1}(0.949) = 1.64 and z̃ = Φ^{−1}(0.914) = 1.37.

Now let μ̂** ~ *N*_{2}(μ̂_{0},* I*) as in Fig. 5, and define

*z*_{0} = Φ^{−1}(Prob_{μ̂_{0}}{μ̂** ∈ *R*_{1}}).

For case I it is easy to see that *z*_{0} = Φ^{−1}(0.50) = 0. For case II, standard calculations show that *z*_{0} = Φ^{−1}(0.567) = 0.17.

In normal problems of the sort shown in Figs. 4 and 5 we can approximate z̃ in terms of ẑ and *z*_{0}:

z̃ = ẑ − 2*z*_{0}.  [**1**]

Formula **1** is developed in Efron (9), where it is shown to have “second order accuracy.” This means that in repeated sampling situations [where we observe independent data vectors **x**_{1}, **x**_{2}, . . ., **x**_{n} ~ *N*_{2}(**μ**,* I*) and estimate **μ** by μ̂ = ∑_{i=1}^{n}**x**_{i}/*n*], *z*_{0} is of order 1/√*n*, and formula **1** estimates z̃ with an error of order only 1/*n*.

Second-order accuracy is a large sample property, but it usually indicates good performance in actual problems. For case I, Eq. **1** correctly predicts z̃ = ẑ, both equalling Φ^{−1}(0.933) = 1.50. For case II the prediction is z̃ = 1.64 − 0.34 = 1.30, compared with the actual value z̃ = 1.37.
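The case II prediction can be replayed with the standard-normal helpers in Python’s standard library; this is just a check of the arithmetic, not the paper’s code.

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal cdf
Phi_inv = NormalDist().inv_cdf  # its inverse

# Case II numbers from the text: alpha-hat = 0.949 and z0 = 0.17.
z_hat = Phi_inv(0.949)              # roughly 1.64
z_tilde = z_hat - 2 * 0.17          # formula 1: z-tilde = z-hat - 2 * z0
alpha_tilde = Phi(z_tilde)          # predicted confidence level, about 0.90
```

The predicted z̃ = 1.30 sits close to the exact 1.37; the gap is the second-order error of the approximation.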

Formula **1** allows us to compute the confidence level α̃ for the event {**μ** ∈ *R*_{1}} solely in terms of bootstrap calculations, no matter how complicated the boundary may be. A first level of bootstrap replications with μ̂* ~ *N*_{2}(μ̂,* I*) gives bootstrap data vectors μ̂*(1), μ̂*(2), . . ., μ̂*(*B*), from which we calculate

ẑ = Φ^{−1}(#{μ̂*(*i*) ∈ *R*_{1}}/*B*).

A second level of bootstrap replications with μ̂** ~ *N*_{2}(μ̂_{0},* I*), giving say μ̂**(1), μ̂**(2), . . ., μ̂**(*B*_{2}), allows us to calculate

*z*_{0} = Φ^{−1}(#{μ̂**(*i*) ∈ *R*_{1}}/*B*_{2}).

Then formula **1** gives z̃ = ẑ − 2*z*_{0}.

As few as *B* = 100, or even 50, replications μ̂* are enough to provide a rough but useful estimate of the confidence value α̂. However, because the difference between ẑ = Φ^{−1}(α̂) and z̃ = Φ^{−1}(α̃) is relatively small, considerably larger bootstrap samples are necessary to make formula **1** worthwhile. The calculations in section 9 of Efron (9) suggest both *B* and *B*_{2} must be on the order of at least 1000. This point did not arise in cases I and II, where we were able to do the calculations by direct numerical integration, but it is important in the kind of complicated tree-construction problems we are actually considering.

We now return to the problem of trees, as seen in Fig. 2. The version of formula **1** that applies to the multinomial model of Fig. 3 is

z̃ = (ẑ − *z*_{0})/{1 + *a*(ẑ − *z*_{0})} − *z*_{0}.  [**2**]

Here “*a*” is the *acceleration constant* introduced in ref. 9. It is quite a bit easier to calculate than *z*_{0}, as shown in the next section. Formula **2** is based on the bootstrap confidence intervals called “*BC*_{a}” in ref. 9.

If we tried to draw Fig. 3 accurately we would find that the multi-dimensional boundaries were hopelessly complicated. Nevertheless, formula **2** allows us to obtain a good approximation to the hypothesis-testing confidence level α̃ = Φ(z̃) solely in terms of bootstrap computations. How to do so is illustrated in the next section.

### An Example Concerning the Malaria Data

Fig. 2 shows an estimated confidence value of

α̂ = 0.965

for the existence of the 9-10 clade on the malaria evolutionary tree. This value was based on *B* = 200 bootstrap replications, but (with some luck) it agrees very closely with the value α̂ = 0.962 obtained from *B* = 2000 replications. How does α̂ compare with α̃, the hypothesis-testing confidence level for the 9-10 clade? We will show that

α̃ = 0.942

(or α̃ = 0.938 if we begin with α̂ = 0.962 instead of 0.965). To put it another way, our nonconfidence in the 9-10 clade goes from 1 − α̂ = 0.035 to 1 − α̃ = 0.058, a substantial change.

We will describe, briefly, the computational steps necessary to compute α̃. To do so we need notation for multinomial sampling. Let **P** = (*P*_{1}, *P*_{2}, . . ., *P*_{n}) indicate a probability vector on *n* = 221 components, so the entries of the vector **P** are nonnegative numbers summing to 1. The notation

**P*** ~ Mult(**P**)

will indicate that **P*** = (*P*_{1}^{*}, *P*_{2}^{*}, . . ., *P*_{n}^{*}) is the vector of proportions obtained in a multinomial sample of size *n* from **P**. In other words we independently draw integers *I*_{1}^{*},* I*_{2}^{*}, . . .,* I*_{n}^{*} from {1, 2, . . .,* n*} with probability *P*_{k} on *k*, and record the proportions *P*_{k}^{*} = #{*I*_{i}^{*} =* k*}/*n*. This is the kind of multinomial sampling pictured in Fig. 3, expressed more efficiently in terms of *n* = 221 coordinates instead of *K* = 4^{11} − 4.

Each vector **P*** is associated with a data matrix **x*** that has proportion *P*_{k}^{*} of its columns equal to the *k*th column of the original data matrix **x**. Then **P*** determines a distance matrix and a tree according to the original tree-building algorithm,

**P*** → D̂* → TREÊ*.

The “central” vector

**P**^{(cent)} = (1/*n*, 1/*n*, . . ., 1/*n*)

corresponds to the original data matrix **x** and the original tree TREÊ. Notice that taking **P*** ~ Mult(**P**^{(cent)}) amounts to doing ordinary bootstrap sampling, since then **x*** has its columns chosen independently and with equal probability from the columns of **x**.
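The Mult(**P**) notation, and the equivalence between sampling from **P**^{(cent)} and ordinary column resampling, can be sketched in a few lines (our code):

```python
import random
from collections import Counter

def mult(P, seed=3):
    """Draw P* ~ Mult(P): the vector of proportions from a multinomial
    sample of size n = len(P) with cell probabilities P."""
    rng = random.Random(seed)
    n = len(P)
    draws = rng.choices(range(n), weights=P, k=n)   # the integers I*_1, ..., I*_n
    counts = Counter(draws)
    return [counts[k] / n for k in range(n)]

n = 221
P_cent = [1 / n] * n        # central vector: every observed column weighted equally
P_star = mult(P_cent)       # one ordinary (Felsenstein) bootstrap, in proportion form
```

Each entry of `P_star` is the fraction of resampled columns equal to the corresponding column of **x**, which is exactly the nonparametric bootstrap of the earlier sections.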

Resampling from **P**^{(cent)} means that each of the 221 columns is equally likely, but this is not the same as all possible 11-vectors being equally likely. There were only 149 *distinct* 11-vectors among the columns of **x**, and these are the only ones that can appear in **x***. The vector *TTTTCTTTTTT* appeared seven times among the columns of **x**, so it shows up seven times as frequently in the columns of **x***, compared with *ATAAAAAAAAA*, which appeared only once in **x**.

Here are the steps in the computation of α̃.

*Step 1.* *B* = 2000 first-level bootstrap vectors **P***(1),** P***(2), . . .,** P***(*B*) were obtained as independent multinomials **P*** ~ Mult(**P**^{(cent)}). Some 1923 of the corresponding bootstrap trees had the 9-10 clade, giving the estimate α̂ = 0.962 = 1923/2000.

*Step 2.* The first 200 of these included seven cases without the 9-10 clade. Call the seven **P*** vectors **P**^{(1)},** P**^{(2)}, . . .,** P**^{(7)}. For each of them, a value of *w* between 0 and 1 was found such that the vector

**p**^{(j)} = *w***P**^{(j)} + (1 − *w*)**P**^{(cent)}

was right on the 9-10 boundary. The vectors **p**^{(j)} play the role of μ̂_{0} in Fig. 5.

Finding *w* is easy using a one-dimensional binary search program, as on page 90 of ref. 10. At each step of the search it is only necessary to check whether or not the current value of *w***P**^{(j)} + (1 − *w*)**P**^{(cent)} gives a tree having the 9-10 clade. Twelve steps of the binary search, the number used here, locate the boundary value of *w* within 1/2^{12}.
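The binary search of Step 2 can be written generically. In this sketch (ours), `has_clade` is a hypothetical predicate standing in for “build the tree from *w***P**^{(j)} + (1 − *w*)**P**^{(cent)} and check the clade”:

```python
def boundary_weight(has_clade, steps=12):
    """Binary search for the w at which the clade disappears along the
    path from P(cent) (w = 0, clade present) to P(j) (w = 1, clade
    absent).  Twelve steps bracket the boundary within 1/2**12."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if has_clade(mid):
            lo = mid        # clade still present: boundary lies above mid
        else:
            hi = mid        # clade gone: boundary lies below mid
    return (lo + hi) / 2

# Illustration with a made-up predicate whose true boundary is w = 0.37.
w = boundary_weight(lambda w: w < 0.37)
```

Each step halves the bracketing interval, so the returned midpoint is within 1/2^{12} of the true boundary, exactly the precision quoted in the text.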

*Step 3.* For each of the boundary vectors **p**^{(j)} we generated *B*_{2} = 400 second-level bootstrap vectors

**P**** ~ Mult(**p**^{(j)}),

computed the corresponding tree, and counted the number of trees having the 9-10 clade. The numbers were as follows for the seven cases:

From the total we calculated an estimate of the correction term *z*_{0} in formula **2**,

*z*_{0} = Φ^{−1}(proportion of the 2800 second-level trees having the 9-10 clade) = 0.0995.
Binomial calculations indicate that *z*_{0} =
0.0995 has a standard error of about 0.02 due to the bootstrap sampling
(that is, due to taking 2800 instead of all possible bootstrap
replications), so 2800 is not lavishly excessive. Notice that we could
have started with the 77 out of the 2000 **P*** vectors not
having the 9-10 clade, rather than the 7 out of the first 200, and
taken *B*_{2} = 40 for each
**p**^{(j)}, giving about the same total second-level
sample.

*Step 4.* The acceleration constant “*a*” appearing in formula **2** depends on the direction from **P**^{(cent)} to the boundary, as explained in section 8 of ref. 9. For a given direction vector **U**,

*a* = (1/6)∑_{k}*U*_{k}^{3}/(∑_{k}*U*_{k}^{2})^{3/2}.

*Step 5.* Finally we applied formula **2** with ẑ = Φ^{−1}(0.962) = 1.77, *z*_{0} = 0.0995, and *a* = 0.0129, to get z̃ = 1.54, or α̃ = Φ(z̃) = 0.938. If we begin with ẑ = Φ^{−1}(0.965) then α̃ = 0.942.
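Step 5 reduces to one line of arithmetic. Reading formula **2** in its *BC*_{a} form, z̃ = (ẑ − z₀)/(1 + *a*(ẑ − z₀)) − z₀ (our reading; with *a* = 0 it reduces to formula **1**, and it reproduces every number reported in this section), the conversion is:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
Phi_inv = NormalDist().inv_cdf

def confidence_level(alpha_hat, z0, a):
    """Convert a bootstrap confidence value alpha-hat into the
    hypothesis-testing level alpha-tilde, given the correction z0 and the
    acceleration a (formula 2 read as the BCa conversion; with a = 0 and
    z0 = 0 it returns alpha-hat unchanged)."""
    z_hat = Phi_inv(alpha_hat)
    z_tilde = (z_hat - z0) / (1 + a * (z_hat - z0)) - z0
    return Phi(z_tilde)

# The malaria 9-10 clade: alpha-hat = 0.962, z0 = 0.0995, a = 0.0129.
level = confidence_level(0.962, 0.0995, 0.0129)   # about 0.938
```

Plugging in the Step 1 through Step 4 estimates recovers the reported α̃ = 0.938.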

Notice that in this example we could say that Felsenstein’s bootstrap confidence value was biased *upward*, not downward, at least compared with the hypothesis-testing level α̃. This happened because *z*_{0} was positive, indicating that the 9-10 boundary was curving away from **P**^{(cent)}, just as in case II of Fig. 5. The opposite can also occur, and in fact did for other clades. For example the clade at the top of Fig. 2 that includes all of the species except lizard (species 2) had α̂ = 0.775 compared with α̃ = 0.875.

We carried out these same calculations using the more efficient tree-building algorithm employed in Escalante and Ayala (11); that is, we used Felsenstein’s PHYLIP package (12) on the complete RNA sequences, building neighbor-joining trees based on Kimura’s (13) two-parameter distances.

In order to vary our problem slightly, we looked at the clade 7-8 (Pfr-Pkn), which is more questionable than the 9-10 clade. The tree produced from the original set is:

*Step 1.* *B* = 2000 first-level bootstrap vectors were obtained as before. Some 1218 of the corresponding bootstrap trees had the 7-8 clade, giving the estimate α̂ = 0.609 = 1218/2000.

*Step 2*. We took, as before, seven cases without the 7-8
clade, and for each one found a multinomial vector near the 7-8
boundary.

*Step 3.* For each of the boundary vectors **p**^{(j)} we generated *B*_{2} = 400 second-level bootstrap vectors

**P**** ~ Mult(**p**^{(j)}),

computed the corresponding tree, and counted the number of trees having the 7-8 clade. The numbers were as follows for the seven cases:

From the total we calculated an estimate of the correction term
*z*_{0} in formula **2**,

*Step 4*. The acceleration constant “*a*”
appearing in formula **2** was computed as before giving:

*Step 5.* Finally we applied formula **2** with ẑ = Φ^{−1}(0.609) = 0.277 to get z̃ = 0.417, or α̃ = Φ(z̃) = 0.662. In this case α̃ is bigger than α̂, reflecting the fact that the 7-8 boundary curves toward the central point, at least in a global sense.

Computing α̃ is about 20 times as much work as computing α̂, but it is work for the computer and not for the investigator. Once the tree-building algorithm is available, all of the computations require no more than applying this algorithm to resampled versions of the original data set.

### Discussion and Summary

The discussion in this paper, which has gone lightly over many technical details of statistical inference, makes the following main points about the bootstrapping of phylogenetic trees.

(*i*) The confidence values α̂ obtained by Felsenstein’s bootstrap method are not biased systematically downward.

(*ii*) In a Bayesian sense, the α̂ values can be thought of as reasonable assessments of error for the estimated tree.

(*iii*) More familiar non-Bayesian confidence levels α̃ can also be defined. Typically α̂ and α̃ will converge as the number *n* of independent sites grows large, at rate 1/√*n*.

(*iv*) The α̃ values can be estimated by a two-level bootstrap algorithm.

(*v*) As few as 100 or even 50 bootstrap replications can give useful estimates of α̂, while α̃ estimates require at least 2000 total replications. None of the computations requires more than applying the original tree-building algorithm to resampled data sets.

## Acknowledgments

We are grateful to A. Escalante and F. Ayala (14) for providing us with these data. B.E. is grateful for support from Public Health Service Grant 5 R01 CA59039-20 and National Science Foundation Grant DMS95-04379. E.H. is supported by National Science Foundation Grant DMS94-10138 and National Institutes of Health Grant NIAID R29-A131057.

## Footnotes

The publication costs of this
article were defrayed in part by page
charge payment. This article
must therefore be hereby marked
“*advertisement*”
in accordance with 18 U.S.C.
§1734 solely to indicate this fact.

## References

*SIAM CBMS-NSF Monogr.* **38**.

