More taxa or more characters revisited: Combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta: Lepidoptera)
A. Mitchell et al., SYST BIOL, 49(2), 2000, pp. 202-224
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1α data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.
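
The abstract does not spell out how the bootstrap augmentation was carried out. One plausible reading is that the columns (sites) of a single-gene alignment are resampled with replacement until the data set reaches the length of the combined data set. The Python sketch below illustrates that reading only; the function name bootstrap_augment, the dictionary-of-strings alignment layout, and the toy sequences are all hypothetical and are not taken from the paper.

```python
import random


def bootstrap_augment(alignment, target_length, seed=None):
    """Resample alignment columns with replacement up to target_length.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Assumption: "augmentation" means drawing whole columns at random, so
    every taxon receives the same resampled sites.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    n_sites = len(next(iter(alignment.values())))
    if n_sites == 0:
        raise ValueError("alignment has no sites")
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    rng = random.Random(seed)
    # Draw column indices with replacement; the same indices are applied
    # to every taxon so that site patterns stay intact.
    columns = [rng.randrange(n_sites) for _ in range(target_length)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Toy single-gene alignment (made-up data), augmented from 6 to 12 sites.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTAC"}
augmented = bootstrap_augment(toy, target_length=12, seed=1)
```

Because the augmented data set only repeats site patterns already present in the single gene, it adds characters without adding independent evidence, which is the contrast the abstract draws with combining two genes.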