D. M. Hillis and J. J. Bull, An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis, Systematic Biology, 42(2), 1993, pp. 182-192
Bootstrapping is a common method for assessing confidence in phylogenetic analyses. Although bootstrapping was first applied in phylogenetics to assess the repeatability of a given result, bootstrap results are commonly interpreted as a measure of the probability that a phylogenetic estimate represents the true phylogeny. Here we use computer simulations and a laboratory-generated phylogeny to test bootstrapping results of parsimony analyses, both as measures of repeatability (i.e., the probability of repeating a result given a new sample of characters) and of accuracy (i.e., the probability that a result represents the true phylogeny). Our results indicate that any given bootstrap proportion provides an unbiased but highly imprecise measure of repeatability, unless the actual probability of replicating the relevant result is nearly one. The imprecision of the estimate is great enough to render it virtually useless as a measure of repeatability. Under conditions thought to be typical of most phylogenetic analyses, however, bootstrap proportions in majority-rule consensus trees provide biased but highly conservative estimates of the probability of correctly inferring the corresponding clades. Specifically, under conditions of equal rates of change, symmetric phylogenies, and internodal change of ≤20% of the characters, bootstrap proportions of ≥70% usually correspond to a probability of ≥95% that the corresponding clade is real. However, under conditions of very high rates of internodal change (approaching randomization of the characters among taxa) or highly unequal rates of change among taxa, bootstrap proportions >50% are overestimates of accuracy.
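As an illustration only, the following Python sketch shows the nonparametric character bootstrap applied to parsimony in the simplest setting where parsimony reduces to counting: four taxa, for which the only parsimony-informative characters are those splitting the taxa two against two. Each pseudoreplicate resamples characters (columns of the data matrix) with replacement, the most-parsimonious unrooted tree is found for it, and a split's bootstrap proportion is the fraction of replicates recovering that split. The taxon labels A to D, the toy data matrix, the function names, and the rule that a tied replicate counts toward no split are all hypothetical assumptions of this sketch, not the authors' procedure (their analyses summarized replicates with majority-rule consensus trees).

```python
import random
from collections import Counter

# Hypothetical labels: the three possible unrooted trees (splits) for taxa A, B, C, D.
SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def supported_split(column):
    """Return the split one character supports under parsimony, or None.

    column: tuple of character states for taxa (A, B, C, D).
    For four taxa only "xxyy"-type patterns are parsimony-informative:
    they cost one step on the split they match and two steps on the others.
    """
    a, b, c, d = column
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None  # uninformative: same length on all three trees


def parsimony_tree(columns):
    """Most-parsimonious split for four taxa, or None if the best score is tied.

    The shortest tree is the split supported by the most informative characters.
    Treating ties as "no split recovered" is an assumption of this sketch.
    """
    counts = Counter(supported_split(col) for col in columns)
    ranked = sorted(((counts[s], s) for s in SPLITS), reverse=True)
    if ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]


def bootstrap_proportions(columns, n_reps=1000, seed=0):
    """Fraction of bootstrap pseudoreplicates in which each split is recovered."""
    rng = random.Random(seed)
    n = len(columns)
    tally = Counter()
    for _ in range(n_reps):
        # Resample n characters with replacement to form one pseudoreplicate.
        replicate = [columns[rng.randrange(n)] for _ in range(n)]
        tally[parsimony_tree(replicate)] += 1
    return {s: tally[s] / n_reps for s in SPLITS}


if __name__ == "__main__":
    # Hypothetical toy matrix: each tuple is one character scored for (A, B, C, D).
    toy = [(0, 0, 1, 1)] * 6 + [(0, 1, 0, 1)] * 2 + [(0, 0, 0, 1)] * 4
    print(parsimony_tree(toy))
    print(bootstrap_proportions(toy, n_reps=1000, seed=42))
```

Repeating this on many datasets simulated on a known tree, and comparing each split's bootstrap proportion with how often the true split is actually recovered, is the kind of comparison between bootstrap proportions and accuracy that the abstract describes; the sketch itself only computes the proportions.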