K. Rohde and R. Fuerst, Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information, HUM MUTAT, 17(4), 2001, pp. 289-295
With the discovery of single nucleotide polymorphisms (SNPs) along the genome, genotyping of large samples of biallelic multilocus genetic phenotypes for (fine) mapping of disease genes or for population studies has become standard practice. A genetic trait, however, is mainly caused by an underlying defective haplotype, and populations are best characterized by their haplotype frequencies. Therefore, it is essential to infer, from the phase-unknown genetic phenotypes in a sample drawn from a population, the haplotype frequencies in the population and the underlying haplotype pairs in the sample, in order to find disease-predisposing genes by some association or haplotype-sharing algorithm. Haplotype frequencies and haplotype pairs are estimated via a maximum likelihood approach by a well-known expectation maximization (EM) algorithm, adapting it to a large number (up to 30) of biallelic loci (SNPs), and including nuclear family information, if available, in the analysis. Parents are treated as an independent sample from the population. Their genotyped offspring reduce the number of potential haplotype pairs for both parents, resulting in a higher accuracy of the estimation, and may also reduce computation time. In a series of simulations our approach of including nuclear family information has been tested against both the EM algorithm without nuclear family information and an alternative approach using GENEHUNTER for the haplotyping of the families, using the locus-by-locus allele counts of the sample. Our new approach is more precise in haplotyping in cases of a high number of heterozygous loci, whereas for a moderate number of heterozygous positions in the sample all three approaches gave the same perfect results. Hum Mutat 17:289-295, 2001. (C) 2001 Wiley-Liss, Inc.
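
The EM estimation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (compatible_pairs, restrict_by_child, em_haplotype_frequencies), the genotype coding (0, 1, or 2 copies of one allele per locus), the Hardy-Weinberg pair probabilities, and the no-recombination rule for using a child are all assumptions made for illustration. The exhaustive enumeration of haplotype pairs is only practical for a few heterozygous loci, not for the 30 loci the paper targets.

```python
from itertools import product
from collections import defaultdict


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with 0, 1 or 2 copies of allele '1' at each biallelic locus
    (an assumed coding). Haplotypes are tuples of 0/1.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed; het loci start at 0
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the first heterozygous locus on h1 so each unordered pair appears once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for locus, b in zip(het, (0,) + bits):
            h1[locus], h2[locus] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def restrict_by_child(father_pairs, mother_pairs, child_genotype):
    """Drop parental pairs that cannot produce the child's genotype.

    Simplifying assumptions: no recombination, no genotyping error, and the
    parents are filtered marginally rather than jointly.
    """
    child = tuple(child_genotype)
    keep_f, keep_m = set(), set()
    for fp in father_pairs:
        for mp in mother_pairs:
            if any(tuple(a + b for a, b in zip(hf, hm)) == child
                   for hf in fp for hm in mp):
                keep_f.add(fp)
                keep_m.add(mp)
    return ([p for p in father_pairs if p in keep_f],
            [p for p in mother_pairs if p in keep_m])


def em_haplotype_frequencies(pair_lists, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies from per-individual pair lists."""
    haps = sorted({h for pl in pair_lists for p in pl for h in p})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pl in pair_lists:
            # E-step: posterior weight of each pair under Hardy-Weinberg proportions.
            w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pl]
            total = sum(w)
            for (a, b), wi in zip(pl, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / (2 * len(pair_lists)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


# Toy sample: the double heterozygote (1, 1) is phase-ambiguous on its own.
genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
freq = em_haplotype_frequencies([compatible_pairs(g) for g in genotypes])
```

In this toy sample the two unambiguous genotypes pull the double heterozygote toward the 00/11 phase, so the estimates for the 01 and 10 haplotypes shrink toward zero. Passing the output of restrict_by_child instead of the full pair lists for the parents is one way to mimic how genotyped offspring reduce the number of candidate pairs.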