V. Berry and O. Gascuel, On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain, Molecular Biology and Evolution, 13(7), 1996, pp. 999-1011
In this study we address the problem of interpreting a bootstrap tree.
The main issue is choosing the threshold of clade selection in order
to separate reliable clades from unreliable ones, depending on their
bootstrap proportion. This threshold depends on the chosen error
measure. We investigate error measures that stem from a generalization
of Robinson and Foulds' (1981) distance, used to quantify the
divergence between the true phylogeny and the estimated trees. We
propose two analytical approximations of the optimum threshold of
clade selection to interpret (i.e., reduce) the bootstrap tree. We
performed extensive simulations along the lines of Kuhner and
Felsenstein (1994) using the neighbor-joining and the
maximum-parsimony methods. These simulations show that our
approximations cause only small losses in quality when compared to the
optimum threshold resulting from empirical observation. Next, we
measured the error reduction achieved when estimating the true
phylogeny by the properly reduced bootstrap tree rather than by the
complete original tree, obtained with a classical tree-building
method.
Our simulations on short sequences show that an error reduction of 39%
is achieved with the parsimony method and an error reduction of 33% is
achieved with the distance method when the error is measured with the
standard Robinson and Foulds distance. The observed error reduction is
shown to originate from a substantial decrease in Type I error (wrong
inferences), while Type II error (omitted correct clades) is only
slightly increased. Greater error reduction is achieved when shorter
sequences are used, and when more importance is given to Type I error
than to Type II error. To investigate the causes of error from another
point of view, we propose a general decomposition of the expected
error into two bias terms and one variance term. Results for these
terms show that no fundamental bias is introduced by the bootstrap
process, the only source of bias being structural (lack of
resolution). Moreover, the variance in the estimations is greatly
reduced, providing another explanation for the better results of the
reduced bootstrap tree compared with the original tree estimate.
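
The quantities named in the abstract can be illustrated with a minimal
sketch. It assumes each tree is represented as a set of clades (each
clade a frozenset of taxon names) and that the generalized error is a
weighted sum of Type I and Type II counts; the function names, the
weights, and the toy data are hypothetical and are not taken from the
paper. With equal weights the error is the size of the symmetric
difference between the two clade sets, which is the quantity the
Robinson and Foulds distance counts. The sketch does not check clade
compatibility, which is only guaranteed for thresholds of at least 0.5.

```python
from collections import Counter


def bootstrap_proportions(replicate_trees):
    """Return the fraction of replicate trees that contain each clade."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    n = len(replicate_trees)
    return {clade: c / n for clade, c in counts.items()}


def reduce_tree(proportions, threshold):
    """Keep only clades whose bootstrap proportion exceeds the threshold."""
    return {clade for clade, p in proportions.items() if p > threshold}


def weighted_error(true_clades, estimated_clades,
                   type1_weight=1.0, type2_weight=1.0):
    """Weighted clade error (hypothetical form of a generalized distance).

    Type I  = clades inferred but absent from the true tree.
    Type II = true clades missing from the estimate.
    """
    type1 = len(estimated_clades - true_clades)
    type2 = len(true_clades - estimated_clades)
    return type1_weight * type1 + type2_weight * type2, type1, type2


# Toy data (invented for illustration): five taxa A-E.
true_tree = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
replicates = [
    {frozenset("BC"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AB"), frozenset("ABD"), frozenset("CE")},
    {frozenset("AC"), frozenset("ABC"), frozenset("DE")},
]
original_estimate = replicates[3]  # stands in for the complete tree

proportions = bootstrap_proportions(replicates)
reduced = reduce_tree(proportions, threshold=0.5)

error_original = weighted_error(true_tree, original_estimate)
error_reduced = weighted_error(true_tree, reduced)
```

In this toy case the reduced tree drops the one wrong clade of the
complete tree (Type I falls from 1 to 0) while the omitted true clade
stays omitted (Type II remains 1). Raising type1_weight relative to
type2_weight makes the reduction count for more, which is the
situation the abstract describes as giving more importance to Type I
error.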