ON THE INTERPRETATION OF BOOTSTRAP TREES - APPROPRIATE THRESHOLD OF CLADE SELECTION AND INDUCED GAIN

Authors
V. Berry, O. Gascuel
Citation
V. Berry and O. Gascuel, ON THE INTERPRETATION OF BOOTSTRAP TREES - APPROPRIATE THRESHOLD OF CLADE SELECTION AND INDUCED GAIN, Molecular Biology and Evolution, 13(7), 1996, pp. 999-1011
Number of citations
18
Subject Categories
Biology
ISSN journal
0737-4038
Volume
13
Issue
7
Year of publication
1996
Pages
999 - 1011
Database
ISI
SICI code
0737-4038(1996)13:7<999:OTIOBT>2.0.ZU;2-N
Abstract
In this study we address the problem of interpreting a bootstrap tree. The main issue is choosing the threshold of clade selection in order to separate reliable clades from unreliable ones, depending on their bootstrap proportion. This threshold depends on the chosen error measure. We investigate error measures that stem from a generalization of Robinson and Foulds' (1981) distance, used to quantify the divergence between the true phylogeny and the estimated trees. We propose two analytical approximations of the optimum threshold of clade selection to interpret (i.e., reduce) the bootstrap tree. We performed extensive simulations along the lines of Kuhner and Felsenstein (1994) using the neighbor-joining and the maximum-parsimony methods. These simulations show that our approximations cause only small losses in quality when compared to the optimum threshold resulting from empirical observation. Next, we measured the error reduction achieved when estimating the true phylogeny by the properly reduced bootstrap tree rather than by the complete original tree, obtained with a classical tree-building method. Our simulations on short sequences show that an error reduction of 39% is achieved with the parsimony method and an error reduction of 33% is achieved with the distance method when the error is measured with the standard Robinson and Foulds distance. The observed error reduction is shown to originate from an important decrease in Type I error (wrong inferences), while Type II error (omitted correct clades) is only slightly increased. Greater error reduction is achieved when shorter sequences are used, and when more importance is given to Type I error than to Type II error. To investigate the causes of error from another point of view, we propose a general decomposition of the error expectation in two terms of bias, and one of variance. Results for these terms show that no fundamental bias is introduced by the bootstrap process, the only source of bias being structural (lack of resolution). Moreover, the variance in the estimations is greatly reduced, providing another explanation for the better results of the reduced bootstrap tree compared with the original tree estimate.
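The clade-selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a tree is represented as a set of clades (each a frozenset of taxon names), that reducing the bootstrap tree means discarding clades whose bootstrap proportion falls below a threshold, and that the generalized error is a weighted sum of Type I and Type II counts. The function names, the weights `w_type1` and `w_type2`, the six-taxon example, and the threshold of 0.5 are all hypothetical. With both weights equal to 1, the error reduces to the size of the symmetric difference between the two clade sets, which is the usual unnormalized Robinson and Foulds count.

```python
from typing import Dict, FrozenSet, Set, Tuple

Clade = FrozenSet[str]


def reduce_bootstrap_tree(proportions: Dict[Clade, float],
                          threshold: float) -> Set[Clade]:
    """Keep only the clades whose bootstrap proportion reaches the threshold."""
    return {clade for clade, p in proportions.items() if p >= threshold}


def weighted_error(true_clades: Set[Clade],
                   estimated: Set[Clade],
                   w_type1: float = 1.0,
                   w_type2: float = 1.0) -> Tuple[float, int, int]:
    """Return (weighted error, Type I count, Type II count).

    Type I  = clades inferred but absent from the true tree (wrong inferences).
    Type II = true clades missing from the estimate (omitted correct clades).
    The weighting scheme is an assumption made for this illustration.
    """
    type1 = len(estimated - true_clades)
    type2 = len(true_clades - estimated)
    return w_type1 * type1 + w_type2 * type2, type1, type2


# Hypothetical six-taxon example (taxa A..F); all values are made up.
true_tree = {frozenset("AB"), frozenset("ABC"), frozenset("EF")}

# Clades of the complete estimated tree with their bootstrap proportions.
bootstrap = {
    frozenset("AB"): 0.95,   # correct, well supported
    frozenset("ABD"): 0.35,  # incorrect, weakly supported
    frozenset("EF"): 0.80,   # correct, well supported
}

complete_tree = set(bootstrap)
reduced_tree = reduce_bootstrap_tree(bootstrap, threshold=0.5)

error_complete = weighted_error(true_tree, complete_tree)
error_reduced = weighted_error(true_tree, reduced_tree)
print("complete tree:", error_complete)
print("reduced tree: ", error_reduced)
```

In this toy case the reduction removes the single wrong clade without discarding any correct one, so Type I error drops while Type II error is unchanged. Raising `w_type1` relative to `w_type2` makes that removal count for more, in line with the abstract's remark that the gain is greater when Type I error is given more importance.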