Distance-based species tree estimation under the coalescent: information-theoretic trade-off between number of loci and sequence length

Citation
Mossel, Elchanan and Roch, Sebastien, Distance-based species tree estimation under the coalescent: information-theoretic trade-off between number of loci and sequence length, Annals of Applied Probability, 27(5), 2017, pp. 2926-2955
ISSN journal
1050-5164
Volume
27
Issue
5
Year of publication
2017
Pages
2926 - 2955
Database
ACNP
Abstract
We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, m, needed for an accurate reconstruction and the sequence length, k, of the genes. Specifically, we show that to detect a branch of length f, one needs $m = \Theta(1/[f^{2}\sqrt{k}])$ genes.
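Illustration
The scaling stated in the abstract can be made concrete with a minimal sketch. The function name `loci_needed` and the constant `c` are assumptions introduced here for illustration only: the Theta bound fixes the growth rate of m, not the constant, so the values returned are relative orders of magnitude rather than gene counts prescribed by the paper.

```python
import math


def loci_needed(f: float, k: int, c: float = 1.0) -> float:
    """Order-of-magnitude number of loci, m ~ c / (f^2 * sqrt(k)).

    f: branch length to detect.
    k: sequence length per gene.
    c: hypothetical constant hidden by the Theta notation (assumed 1.0).
    """
    if f <= 0 or k <= 0:
        raise ValueError("f and k must be positive")
    return c / (f ** 2 * math.sqrt(k))


if __name__ == "__main__":
    # Compare how the estimate changes as the sequence length grows.
    for k in (100, 400, 1600):
        print(k, loci_needed(0.01, k))
```

Under this scaling, quadrupling the sequence length k halves the number of genes required, while halving the branch length f quadruples it.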