Measuring the phylogenetic randomness of biological data sets

Citation
W.H.E. Day et al., Measuring the phylogenetic randomness of biological data sets, SYST BIOL, 47(4), 1998, pp. 604-616
Citations number
83
Subject categories
Biology
Journal title
SYSTEMATIC BIOLOGY
ISSN journal
10635157
Volume
47
Issue
4
Year of publication
1998
Pages
604 - 616
Database
ISI
SICI code
1063-5157(199812)47:4<604:MTPROB>2.0.ZU;2-8
Abstract
Two qualitative taxonomic characters are potentially compatible if the states of each can be ordered into a character state tree in such a way that the two resulting character state trees are compatible. The number of potentially compatible pairs (NPCP) of qualitative characters from a data set may be considered to be a measure of its phylogenetic randomness. The value of NPCP depends on the number of evolutionary units (EUs), the number of characters, the number of states in the characters, the distributions of EUs among these states, and the amount and distribution of missing information and so does not directly indicate degree of phylogenetic randomness. Thus, for an observed data set, we used Monte Carlo methods to estimate the probability that a data set chosen equiprobably from among those identical (with respect to all the other above determining features) to the observed data set would have as high (or low) an NPCP as the observed data set. This probability, the realized significance of the observed NPCP, is attractive as an indication of phylogenetic randomness because it does not require the assumptions made by other such methods: No character state trees are assumed and consequently, only potential compatibility can be determined; no particular method of phylogenetic estimation is assumed; and no phylogenetic trees are constructed. We determined the values and significances of NPCP for analyses of 57 data sets taken from 53 published sources. All data sets from 37 of those sources exhibited realized significances of <0.01, indicating high levels of phylogenetic nonrandomness. From each of the remaining 16 sources, at least one data set was more phylogenetically random. Inclusion of outgroups changed significance in some cases, but not always in the same direction. Data sets with significantly low NPCP may be consistent with an ancient hybrid origin (or other ancient polyphyletic gene exchange, crossing over, viral transfer, etc.) of the study group.
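Illustrative sketch
The Monte Carlo procedure described in the abstract can be sketched roughly as follows. This is not the authors' program; it is a minimal illustration under several assumptions that the abstract does not spell out. Potential compatibility of two unordered characters is tested here by checking that the bipartite graph linking co-occurring states is acyclic, which is one standard criterion and may differ in detail from the authors' test. The null model is assumed to permute the observed states of each character among the EUs that have data, leaving missing entries in place, so that the number of EUs, characters, states, state frequencies, and missing-data pattern are all preserved. The observed data set is counted among the replicates when estimating the probabilities, a common convention that is also an assumption. All function names, the `MISSING` marker, and the default of 999 replicates are hypothetical.

```python
import random
from itertools import combinations

MISSING = None  # hypothetical marker for a missing entry


def potentially_compatible(a, b):
    """Return True if characters a and b (state lists indexed by EU) are
    potentially compatible under the assumed criterion: the bipartite graph
    joining co-occurring states contains no cycle."""
    edges = {(x, y) for x, y in zip(a, b)
             if x is not MISSING and y is not MISSING}
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for x, y in edges:
        root_a, root_b = find(("a", x)), find(("b", y))
        if root_a == root_b:
            return False  # this edge would close a cycle
        parent[root_a] = root_b
    return True


def npcp(chars):
    """Number of potentially compatible pairs among a list of characters."""
    return sum(potentially_compatible(a, b) for a, b in combinations(chars, 2))


def shuffle_character(char, rng):
    """Permute the observed states among EUs, keeping missing entries fixed."""
    observed = [s for s in char if s is not MISSING]
    rng.shuffle(observed)
    it = iter(observed)
    return [MISSING if s is MISSING else next(it) for s in char]


def realized_significance(chars, n_rep=999, seed=0):
    """Estimate the probabilities that a random data set has an NPCP at
    least as high, and at least as low, as the observed one."""
    rng = random.Random(seed)
    observed = npcp(chars)
    as_high = as_low = 1  # the observed data set counts as one replicate
    for _ in range(n_rep):
        simulated = npcp([shuffle_character(c, rng) for c in chars])
        as_high += simulated >= observed
        as_low += simulated <= observed
    total = n_rep + 1
    return observed, as_high / total, as_low / total


if __name__ == "__main__":
    # Toy data: 4 characters scored on 6 EUs, one missing entry.
    data = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
        [0, 0, MISSING, 1, 2, 2],
    ]
    print(realized_significance(data, n_rep=199))
```

In this sketch a small "as high" probability would correspond to phylogenetic nonrandomness in the abstract's sense, and a small "as low" probability to the significantly low NPCP case mentioned in its final sentence.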