INTERIOR-BRANCHED AND BOOTSTRAP TESTS OF PHYLOGENETIC TREES

Citation
T. Sitnikova et al., INTERIOR-BRANCHED AND BOOTSTRAP TESTS OF PHYLOGENETIC TREES, Molecular Biology and Evolution, 12(2), 1995, pp. 319-333
Number of citations
31
Subject Categories
Biology
ISSN journal
0737-4038
Volume
12
Issue
2
Year of publication
1995
Pages
319 - 333
Database
ISI
SICI code
0737-4038(1995)12:2<319:IABTOP>2.0.ZU;2-5
Abstract
We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree-building method is used. For each interior branch of a predetermined topology, the interior-branch and bootstrap tests provide the confidence values, P_C and P_B, respectively, that indicate the extent of statistical support of the sequence cluster generated by the branch. In phylogenetic analysis these two values are often interpreted in the same way, and if P_C and P_B are high (say, greater than or equal to 0.95), the sequence cluster is regarded as reliable. We have shown that P_C is in fact the complement of the P-value used in the standard statistical test, but P_B is not. Actually, the bootstrap test usually underestimates the extent of statistical support of species cluster. The relationship between the confidence values obtained by the two tests varies with both the topology and expected branch lengths of the true (model) tree. The most conspicuous difference between P_C and P_B is observed when the true tree is starlike, and there is a tendency for the difference to increase as the number of sequences in the tree increases. The reason for this is that the bootstrap test tends to become progressively more conservative as the number of sequences in the tree increases. Unlike the bootstrap, the interior-branch test has the same statistical properties irrespective of the number of sequences used when a predetermined tree is considered. Therefore, the interior-branch test appears to be preferable to the bootstrap test estimated from a given data set. P_C may give an overestimate of statistical confidence. For this case, we developed a method for computing a modified version (P'_C) of the P_C value and showed that this P'_C tends to give a conservative estimate of statistical confidence, though it is not as conservative as P_B.
In this paper we have introduced a model in which evolutionary distances between sequences follow a multivariate normal distribution. This model allowed us to study the relationships between the two tests analytically.
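
As an illustration of the two confidence values discussed in the abstract, the following minimal Python sketch computes a P_C-like value as the complement of a P-value and a P_B-like value as a bootstrap proportion. The abstract does not specify the test statistic, so the one-sided normal test on an interior branch length (estimate divided by its standard error) is an assumption made here, and the function names and inputs are hypothetical rather than taken from the paper.

```python
import math


def interior_branch_confidence(branch_length, std_error):
    """P_C-like value: 1 minus the P-value of a one-sided normal test.

    Assumption (not stated in the abstract): the null hypothesis is that
    the interior branch length is zero, and the estimate divided by its
    standard error is treated as a standard normal deviate.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = branch_length / std_error
    # Upper-tail probability P(Z >= z) for a standard normal variable.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 1.0 - p_value


def bootstrap_confidence(replicate_has_cluster):
    """P_B-like value: fraction of bootstrap replicate trees that
    contain the sequence cluster defined by the interior branch."""
    flags = list(replicate_has_cluster)
    if not flags:
        raise ValueError("at least one bootstrap replicate is required")
    return sum(1 for flag in flags if flag) / len(flags)
```

For example, `interior_branch_confidence(0.02, 0.01)` evaluates the support for a hypothetical branch estimated at two standard errors above zero, while `bootstrap_confidence` takes one boolean per replicate tree. The sketch only shows how each value is defined; it does not reproduce the paper's analysis of how the two values differ.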