M. Kimmel and R. Chakraborty, MEASURES OF VARIATION AT DNA REPEAT LOCI UNDER A GENERAL STEPWISE MUTATION MODEL, Theoretical population biology, 50(3), 1996, pp. 345-367
Polymorphisms at tandem repeat loci are caused by mutations with
allele sizes occasionally altered by more than one repeat unit in both
forward and backward directions. Such mutational changes may occur
with asymmetric probabilities. Therefore, a one-step symmetric
stepwise mutation model may not be appropriate for studying the
population dynamics
at all repeat loci. In this work, we evaluated the expectation and
variance of the within-population variance of the allele size
distribution in a finite population, and the expected homozygosity at
a locus by
the coalescence approach under a general stepwise mutation model,
where mutational transitions of allele sizes can be arbitrary,
including being asymmetric. Under the special cases of symmetric
one-step, two-step, and multi-step geometric distributions of
mutations, our general results reduce to the corresponding results
obtained by earlier investigators. The general results indicate that
in a finite population, which has reached a steady state under the
(general stepwise) mutation and
drift balance, the within-population variance of allele sizes has a
simple expectation (i.e., proportional to N nu, the product of the
mutation rate, nu, and effective population size, N). However, its
stochastic variance is a quadratic function of this composite
parameter, N nu. Furthermore, this second-order variance does not
decay with the number of alleles sampled from a population.
Application of this theory to
data on allele size distributions in unrelated Caucasians from the
CEPH pedigree (obtained from the Genome Data Base) shows that the
relationship of the variance and mean of within-population variance of
allele sizes at tandem repeat loci, grouped by their chromosomal
assignment, has a trend compatible with the theory. However, there is
an indication that the second-order variance is generally
underestimated. One reason for this departure might be that the CEPH
sample may not represent a single homogeneous population that reached
equilibrium at all tandem repeat loci. (C) 1996 Academic Press, Inc.
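
The statement that the expected within-population variance is proportional to N nu can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a diploid population of N individuals (2N gene copies), the standard continuous-time coalescent for a pair of alleles, and defines the within-population variance V as half the expected squared size difference between two randomly sampled alleles. Under those assumptions a textbook coalescent argument gives V = 2 N nu E[U^2], where U is the signed size change of a single mutation. The step distribution and the function names (second_moment, expected_variance, simulate_variance) are hypothetical and chosen only for illustration.

```python
import random


def second_moment(step_dist):
    """E[U^2] for a step-size distribution given as {signed step: probability}."""
    return sum(p * u * u for u, p in step_dist.items())


def expected_variance(N, nu, step_dist):
    """Assumed closed form V = 2 * N * nu * E[U^2] (diploid N, pairwise coalescent)."""
    return 2.0 * N * nu * second_moment(step_dist)


def simulate_variance(N, nu, step_dist, n_pairs=100_000, seed=1):
    """Monte Carlo estimate of V = E[(X1 - X2)^2] / 2 for two sampled alleles.

    Looking backward in time, the two lineages coalesce at rate 1/(2N) and
    mutate at total rate 2*nu, so each event before coalescence is a mutation
    with probability 4*N*nu / (4*N*nu + 1).
    """
    rng = random.Random(seed)
    sizes = list(step_dist)
    weights = [step_dist[u] for u in sizes]
    theta = 4.0 * N * nu
    p_mut = theta / (theta + 1.0)
    total = 0.0
    for _ in range(n_pairs):
        diff = 0
        while rng.random() < p_mut:  # another mutation before coalescence
            step = rng.choices(sizes, weights)[0]
            # The mutation falls on either lineage with equal probability,
            # so it enters the size difference with a random sign.
            diff += step if rng.random() < 0.5 else -step
        total += diff * diff
    return 0.5 * total / n_pairs


if __name__ == "__main__":
    # Hypothetical asymmetric, multi-step mutation distribution.
    steps = {+1: 0.6, -1: 0.3, +2: 0.1}
    print(expected_variance(1000, 5e-4, steps))
    print(simulate_variance(1000, 5e-4, steps))
```

With the hypothetical distribution above, E[U^2] = 1.3, so the assumed closed form gives a value of about 1.3 for N = 1000 and nu = 5e-4, and the simulated estimate should fall close to it. Doubling N (or nu) doubles the closed-form value, which is the proportionality described in the abstract. The sketch does not attempt to reproduce the second-order (quadratic) variance result.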