THE SAMPLING DISTRIBUTIONS AND COVARIANCE-MATRIX OF PHYLOGENETIC SPECTRA

Citation
P.J. Waddell et al., THE SAMPLING DISTRIBUTIONS AND COVARIANCE-MATRIX OF PHYLOGENETIC SPECTRA, Molecular Biology and Evolution, 11(4), 1994, pp. 630-642
Number of citations
18
Subject Categories
Biology
Journal ISSN
0737-4038
Volume
11
Issue
4
Year of publication
1994
Pages
630 - 642
Database
ISI
SICI code
0737-4038(1994)11:4<630:TSDACO>2.0.ZU;2-C
Abstract
We extend recent advances in computing variance-covariance matrices from genetic distances to a sequence method of phylogenetic analysis. These matrices, together with other statistical properties of corrected sequence spectra, are studied as a foundation for more powerful and testable methods in phylogenetics. We start with $\hat{s}$, a vector of the proportion of sites in a sequence of length $c$ showing each of the possible character-state patterns for $t$ taxa. Hadamard conjugations are then used to calculate $\hat{\gamma}$, a vector of the support for bipartitions, or splits, in the data, after correcting for all implied multiple changes. These corrections are made independently of any tree and are illustrated with Cavender's two-character-state model. Each entry in $\hat{\gamma}$ ($\hat{\gamma}_0$ excluded) that is not associated with an edge on the tree that generated the data is an invariant (sensu Cavender) with an expected value of 0 as the number of sites $c \to \infty$. Under an independent identically distributed model (sites are independent and identically distributed), vector $\hat{s}$ is a random sample from a scaled multinomial distribution. Starting from this point, we illustrate the derivation of $V[\hat{\gamma}]$, the variance-covariance matrix of $\hat{\gamma}$. The bias induced by the delta method, a convenient approximation in deriving $V[\hat{\gamma}]$, is evaluated for both population and sample variance-covariance matrices. It is found to be acceptable in the first case and very good in the second. Likewise, the bias in $\hat{\gamma}$ due to a logarithmic transform and to short sequences is acceptable. We infer the marginal distributions of entries in $\hat{\gamma}$. Simulations with illustrative values of $c$ and $\lambda$ (the rate per site) show how $\hat{\gamma}$ tends to a multivariate normal distribution as $c \to \infty$. Our results extend naturally to four-color (nucleotide) spectra.
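
As an illustrative sketch only (not the authors' code), the two-state calculation summarized in the abstract can be written in Python with NumPy. The sketch makes several assumptions that the abstract does not state: the spectrum s is indexed in the order that matches a Sylvester-type Hadamard matrix H; the Hadamard conjugation takes the form gamma = H^-1 ln(H s); the covariance of the observed spectrum is the scaled multinomial one, (diag(s) - s s^T) / c; and the delta method is applied to first order as V[gamma] ~ J V[s] J^T with J = H^-1 diag(1 / (H s)) H. All function names and the three-taxon example values are hypothetical.

```python
import numpy as np


def hadamard(n):
    """Sylvester-type Hadamard matrix of order n (n assumed to be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def spectrum_from_gamma(gamma):
    """Expected sequence spectrum s = H^-1 exp(H gamma); H^-1 = H / n for this H."""
    n = len(gamma)
    H = hadamard(n)
    return H @ np.exp(H @ gamma) / n


def gamma_from_spectrum(s):
    """Hadamard conjugation gamma = H^-1 ln(H s), made without reference to a tree."""
    n = len(s)
    H = hadamard(n)
    return H @ np.log(H @ s) / n


def delta_method_cov(s, c):
    """First-order (delta method) approximation to V[gamma] for c i.i.d. sites."""
    n = len(s)
    H = hadamard(n)
    # Covariance of pattern proportions under a scaled multinomial distribution.
    V_s = (np.diag(s) - np.outer(s, s)) / c
    # Jacobian of gamma with respect to s.
    J = H @ np.diag(1.0 / (H @ s)) @ H / n
    return J @ V_s @ J.T


# Hypothetical example: t = 3 taxa, so 2^(t-1) = 4 spectrum entries.
# gamma[0] is minus the sum of the remaining entries, so the vector sums to zero.
gamma = np.array([-0.3, 0.1, 0.1, 0.1])
s = spectrum_from_gamma(gamma)
V_gamma = delta_method_cov(s, c=1000)
```

In this sketch the rows of `V_gamma` sum to zero, because the entries of gamma are constrained to sum to zero whenever s sums to one. Substituting an observed spectrum for `s` gives a sample rather than a population estimate of the matrix.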