V. A. Albert et al., Character-state weighting for cladistic analysis of protein-coding DNA sequences, Annals of the Missouri Botanical Garden, 80(3), 1993, pp. 752-766
Nucleotide data are a restricted character system complex enough to confound phylogenetic analyses yet simple enough to permit establishment of probability models for sequence change and corresponding character-state weighting schemes. We have previously developed a general method for weighting DNA data that is here elaborated for protein-coding sequences. Included in the present model are corrections for (i) multiple substitution events, (ii) transition/transversion bias, and (iii) differential proportions of changes occurring at first, second, and third codon positions. This model is shown to be generally consistent for all phylogenetically useful data. Greater understanding of the properties of equal versus differential character-state weighting comes from consideration of numbers of terminal taxa and lengths of tree segments. With insufficient sampling of taxa, differential weighting attempts to correct for undetected multiple substitution events. Both equal and differential weighting should give the same result if sufficient numbers of terminal taxa permit the detection of historically misleading character-state changes. Nevertheless, spurious attraction of tree segments remains a systematic problem that is not easily resolved either by equal weighting or by our differential weighting model, which acts globally rather than adjusting for different probabilities of character-state change among tree segments. Artifactual segment attraction is best understood in terms of asymmetries in lambda (which represents state changes per character during a particular segment interval). We relate the consistency index to numbers of terminal taxa and lambda, illustrating its dependence upon numbers of potential tree segments. Prospects for phylogenetic reconstruction from protein-coding nucleotide data are discussed with reference to the robustness of equal weighting (given our own model) with adequate taxonomic sampling.
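
The role of lambda in undetected multiple substitutions can be illustrated with a minimal sketch. The abstract does not state the functional form of the model, so the code below assumes, purely for illustration, that changes at a character along one tree segment follow a Poisson process with mean lambda. The function names are hypothetical and are not taken from the paper.

```python
import math


def prob_multiple_hits(lam: float) -> float:
    """P(two or more changes on a segment), ASSUMING a Poisson process
    with mean `lam` changes per character (an illustrative assumption,
    not the paper's stated model)."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)


def hidden_fraction(lam: float) -> float:
    """Among characters that changed at least once on the segment,
    the fraction that changed more than once. Only the net result of
    those changes is visible when the segment is not subdivided by
    additional terminal taxa."""
    changed = 1.0 - math.exp(-lam)
    return prob_multiple_hits(lam) / changed if changed > 0.0 else 0.0


if __name__ == "__main__":
    # Short versus long segments: an asymmetry in lambda.
    for lam in (0.05, 0.5, 2.0):
        print(f"lambda={lam}: "
              f"P(>=2 changes)={prob_multiple_hits(lam):.4f}, "
              f"hidden fraction={hidden_fraction(lam):.4f}")
```

Under this assumption, the share of changed characters carrying multiple hits rises steeply with lambda, which is one way to see why long segments invite spurious attraction and why adding terminal taxa (which splits long segments into shorter ones) helps equal weighting.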