Pj. Waddell et al., THE SAMPLING DISTRIBUTIONS AND COVARIANCE-MATRIX OF PHYLOGENETIC SPECTRA, Molecular biology and evolution, 11(4), 1994, pp. 630-642
We extend recent advances in computing variance-covariance matrices from genetic distances to a sequence method of phylogenetic analysis. These matrices, together with other statistical properties of corrected sequence spectra, are studied as a foundation for more powerful and testable methods in phylogenetics. We start with $\hat{s}$, a vector of the proportion of sites in a sequence of length $c$ showing each of the possible character-state patterns for $t$ taxa. Hadamard conjugations are then used to calculate $\hat{\gamma}$, a vector of the support for bipartitions, or splits, in the data, after correcting for all implied multiple changes. These corrections are made independently of any tree and are illustrated with Cavender's two-character-state model. Each entry in $\hat{\gamma}$ ($\hat{\gamma}_0$ excluded) that is not associated with an edge on the tree that generated the data is an invariant (sensu Cavender) with an expected value of 0 as the number of sites $c \to \infty$. Under a model in which sites are independent and identically distributed (i.i.d.), the vector $\hat{s}$ is a random sample from a scaled multinomial distribution. Starting from this point, we illustrate the derivation of $V[\hat{\gamma}]$, the variance-covariance matrix of $\hat{\gamma}$. The bias induced by the delta method, a convenient approximation in deriving $V[\hat{\gamma}]$, is evaluated for both population and sample variance-covariance matrices. It is found to be acceptable in the first case and very good in the second. Likewise, the bias in $\hat{\gamma}$ due to the logarithmic transform and to short sequences is acceptable. We infer the marginal distributions of entries in $\hat{\gamma}$. Simulations with illustrative values of $c$ and $\lambda$ (the rate per site) show how $\hat{\gamma}$ tends to multivariate normal as $c \to \infty$. Our results extend naturally to four-color (nucleotide) spectra.
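
The Hadamard conjugation described in the abstract can be sketched for the two-state case as follows. This is only an illustration under stated assumptions, not the authors' code: it assumes the commonly used form gamma = H^{-1} ln(H s) with a Sylvester-ordered Hadamard matrix, indexes splits by a bitmask over taxa 1..t-1, and uses a hypothetical four-taxon tree ((1,2),(3,4)) with made-up edge lengths. The function names, the sequence length and the random seed are likewise illustrative choices.

```python
import numpy as np

def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester Hadamard matrix (entries +1/-1)."""
    H = np.array([[1.0]])
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    for _ in range(k):
        H = np.kron(H, base)
    return H

def spectrum_from_gamma(gamma):
    """Expected pattern spectrum: s = H^{-1} exp(H gamma), with H^{-1} = H / n."""
    n = gamma.size
    H = sylvester_hadamard(n.bit_length() - 1)
    return H @ np.exp(H @ gamma) / n

def gamma_from_spectrum(s):
    """Conjugate spectrum: gamma = H^{-1} ln(H s)."""
    n = s.size
    H = sylvester_hadamard(n.bit_length() - 1)
    return H @ np.log(H @ s) / n

# Hypothetical tree ((1,2),(3,4)); split index = bitmask over taxa 1..3.
# Pendant edges: 1 -> {1}, 2 -> {2}, 4 -> {3}, 7 -> {1,2,3} (i.e. taxon 4).
# Internal edge: 3 -> {1,2}. Indices 5 and 6 are splits not on the tree.
gamma = np.zeros(8)
gamma[[1, 2, 4, 7]] = [0.05, 0.10, 0.08, 0.12]  # illustrative edge lengths
gamma[3] = 0.03
gamma[0] = -gamma[1:].sum()  # gamma_0 makes the entries sum to zero

s = spectrum_from_gamma(gamma)          # expected pattern proportions
gamma_back = gamma_from_spectrum(s)     # recovers gamma exactly (up to rounding)

# Draw an observed spectrum s_hat as a scaled multinomial sample of c sites.
rng = np.random.default_rng(0)
c = 100_000
s_hat = rng.multinomial(c, s / s.sum()) / c
gamma_hat = gamma_from_spectrum(s_hat)

print(np.round(gamma_hat, 4))
```

With a long sequence, the entries of `gamma_hat` at indices 5 and 6 (splits absent from the generating tree) should lie close to 0, which is the invariant property the abstract describes; repeating the draw with different seeds gives an empirical view of the sampling distribution of `gamma_hat`.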