A. Grafen and M. Ridley, NONINDEPENDENCE IN STATISTICAL TESTS FOR DISCRETE CROSS-SPECIES DATA, Journal of theoretical biology, 188(4), 1997, pp. 507-514
The paper describes three previously undetected effects, due to biases
and non-independence, that can arise in statistical tests for
associations between character states in cross-species data. One kind, which
we call the family problem, is general to all known methods. In
phylogenetic data, the ancestral character state from which changes
occur, or below which variation is found, is likely to be the same for
many regions of the tree. The family problem interacts with two kinds of
non-independence that arise because of the methods of reconstruction of
character states that existing tests use. Different kinds of
non-independence arise in methods that reconstruct joint, or single,
character states, respectively. Methods, like Ridley's (1983), that work
with joint character states suffer from the problem that a character
state cannot change to itself with parsimony. Other methods that work
with single character states suffer from the problem that within a
locally variable region of the tree it is more likely with null data
that there will be two single changes in the two characters in separate
branches than one double change in both; associations opposite to the
locally ancestral state are therefore likely to be found in more than
50% of the variable regions. In real data sets, the family problem acts
to spotlight the other kinds of bias: if the family problem is large the
bias in
tests due to the way they reconstruct characters will be large, whereas
if it is small, the local biases tend to cancel and disappear in the
aggregate. (C) 1997 Academic Press Limited.