Mj. Sanderson et al., Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants, MOL BIOL EV, 17(5), 2000, pp. 782-797
Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and for several outgroups among other vascular plants. Parsimony analyses of the data partitioned into first and second codon positions versus third positions yielded phylogenetic signals that were strongly supported but significantly in conflict. In the former partition, both genes agreed in supporting monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern "anthophyte hypothesis," which places Gnetales as the sister group of flowering plants. A series of simulation studies was undertaken to examine the error rate of parsimony inference. Three kinds of error were examined: random error and systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data from psbB: regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed from these data. No combination of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. The same holds for the anthophyte tree using either psaA third positions or psbB first and second positions. One factor contributing to bias and inconsistency is the combination of extremely short branches at the base of the seed plant radiation with extremely high rates in Gnetales and in the nonseed plant outgroups.
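The central argument above is that parsimony can converge on a wrong tree when long branches are separated by very short ones. The Python sketch below is a generic four-taxon toy, not the authors' simulation. It assumes a Jukes-Cantor (JC69) substitution model, and the branch lengths (0.75 for the two long branches, 0.05 for the short terminal branches, 0.1 for the internal branch), the replicate count, and all function names are hypothetical illustrative choices. It simulates alignments of increasing length on the true tree ((0,1),(2,3)), in which taxa 0 and 2 sit on long branches, scores all three quartet topologies with Fitch parsimony, and reports how often each one is the unique most-parsimonious tree.

```python
"""Toy illustration of long-branch attraction under parsimony (hypothetical parameters)."""
import math
import random
from collections import Counter

STATES = "ACGT"

# The three unrooted quartet topologies for taxa 0..3, written as splits.
TOPOLOGIES = {
    "01|23": ((0, 1), (2, 3)),  # the true tree used for simulation
    "02|13": ((0, 2), (1, 3)),  # groups the two long branches together
    "03|12": ((0, 3), (1, 2)),
}


def evolve(state, t, rng):
    """Evolve one nucleotide along a branch of length t under Jukes-Cantor."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([s for s in STATES if s != state])
    return state


def simulate_site(long_len, short_len, internal_len, rng):
    """Simulate one site on ((0,1),(2,3)); taxa 0 and 2 have long branches."""
    u = rng.choice(STATES)            # internal node joining taxa 0 and 1
    v = evolve(u, internal_len, rng)  # internal node joining taxa 2 and 3
    return (evolve(u, long_len, rng), evolve(u, short_len, rng),
            evolve(v, long_len, rng), evolve(v, short_len, rng))


def fitch_length(site, topology):
    """Fitch parsimony length of one site on a quartet, rooted on its internal edge."""
    (a, b), (c, d) = topology
    steps = 0
    sets = []
    for i, j in ((a, b), (c, d)):
        si, sj = {site[i]}, {site[j]}
        if si & sj:
            sets.append(si & sj)
        else:
            sets.append(si | sj)  # no shared state: take the union, count a step
            steps += 1
    if not (sets[0] & sets[1]):
        steps += 1
    return steps


def best_trees(sites):
    """Return the set of topologies with the minimum total parsimony length."""
    patterns = Counter(sites)  # score each distinct site pattern only once
    lengths = {name: sum(n * fitch_length(p, topo) for p, n in patterns.items())
               for name, topo in TOPOLOGIES.items()}
    best = min(lengths.values())
    return {name for name, length in lengths.items() if length == best}


def lba_frequency(n_sites, reps=100, long_len=0.75, short_len=0.05,
                  internal_len=0.1, seed=1):
    """Fraction of replicates in which each topology is the unique MP tree."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(reps):
        sites = [simulate_site(long_len, short_len, internal_len, rng)
                 for _ in range(n_sites)]
        best = best_trees(sites)
        if len(best) == 1:  # ties count as a win for no topology
            wins[best.pop()] += 1
    return {name: wins[name] / reps for name in TOPOLOGIES}


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, lba_frequency(n))
```

With these settings, the topology that joins the two long branches ("02|13") should become the unique most-parsimonious tree in a growing share of replicates as the number of sites increases, while the true tree is recovered less and less often. Adding data makes the wrong answer more certain rather than correcting it, which is the signature of statistical inconsistency rather than random error. In this toy, shortening the long branches or lengthening the internal branch moves the model back toward the region where parsimony is consistent, loosely mirroring the abstract's point about short basal branches combined with rapidly evolving lineages.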