More taxa or more characters revisited: Combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta : Lepidoptera)

Citation
A. Mitchell et al., More taxa or more characters revisited: Combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta : Lepidoptera), SYST BIOL, 49(2), 2000, pp. 202-224
Citations number
59
Subject Categories
Biology
Journal title
SYSTEMATIC BIOLOGY
ISSN journal
10635157
Volume
49
Issue
2
Year of publication
2000
Pages
202 - 224
Database
ISI
SICI code
1063-5157(200006)49:2<202:MTOMCR>2.0.ZU;2-5
Abstract
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1α data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.
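The abstract does not spell out how the bootstrap augmentation was carried out. A minimal sketch, assuming it means resampling aligned sites (columns) of a single-gene data set with replacement until the pseudo-data set reaches the character count of the combined data, might look like this. The function name `bootstrap_augment`, the toy alignment, and the target length are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_augment(alignment, target_length, seed=None):
    """Resample alignment columns with replacement up to target_length.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    This is an illustrative assumption about the procedure, not the
    authors' documented implementation.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()

    rng = random.Random(seed)
    # Draw site indices with replacement. The same draw is applied to every
    # taxon, so each resampled column is an intact column of the original.
    columns = [rng.randrange(n_sites) for _ in range(target_length)]

    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Toy single-gene alignment (hypothetical data): 6 sites, augmented to 12,
# as if matching the length of a two-gene combined matrix.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ACCTAC"}
augmented = bootstrap_augment(toy, target_length=12, seed=1)
```

Because the augmented matrix only reuses existing columns, it has as many characters as a combined data set but no independent information, which is the contrast the abstract draws with sampling a second gene.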