Usefulness of single nucleotide polymorphism data for estimating population parameters

Citation
M. K. Kuhner et al., Usefulness of single nucleotide polymorphism data for estimating population parameters, GENETICS, 156(1), 2000, pp. 439-447
Number of citations
13
Subject categories
Biology, Molecular Biology & Genetics
Journal title
GENETICS
ISSN journal
0016-6731
Volume
156
Issue
1
Year of publication
2000
Pages
439 - 447
Database
ISI
SICI code
0016-6731(200009)156:1<439:UOSNPD>2.0.ZU;2-Z
Abstract
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Θ = 4N_eμ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Θ. With finite amounts of data the estimates are accurate when Θ is high, but tend to be biased upward when Θ is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Θ than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Θ. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.
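
Illustrative note
The parameter Θ (Theta) defined in the abstract can be made concrete with a small sketch. The code below is not the maximum likelihood method of the paper. It uses Watterson's moment estimator, a much simpler and well-known alternative, and it assumes every site in the sample was surveyed (the "sample SNPs" case), with no correction for ascertainment. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def theta_from_parameters(effective_size, mutation_rate):
    """Theta = 4 * N_e * mu, as defined in the abstract (per-site scale)."""
    return 4.0 * effective_size * mutation_rate


def watterson_theta(num_segregating_sites, num_sequences, num_sites):
    """Per-site Watterson estimate of Theta.

    Assumption: SNPs are counted in the sample itself (sample SNPs).
    This is a moment estimator, not the likelihood approach of the paper,
    and it applies no ascertainment correction.
    """
    if num_sequences < 2:
        raise ValueError("need at least two sequences")
    if num_sites < 1:
        raise ValueError("need at least one site")
    # a_n = 1 + 1/2 + ... + 1/(n-1), the expected tree length factor
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / (a_n * num_sites)


if __name__ == "__main__":
    # Hypothetical values: N_e = 10,000 and mu = 1e-8 per site
    print(theta_from_parameters(10_000, 1e-8))
    # Hypothetical data: 11 segregating sites, 4 sequences, 1,000 sites
    print(watterson_theta(11, 4, 1000))
```

Feeding this estimator a count of SNPs that were first discovered in a separate panel would treat panel SNPs as sample SNPs, which is the kind of misrepresentation the abstract warns leads to large errors.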