AN E-M ALGORITHM AND TESTING STRATEGY FOR MULTIPLE-LOCUS HAPLOTYPES

Citation
Ec. Long et al., AN E-M ALGORITHM AND TESTING STRATEGY FOR MULTIPLE-LOCUS HAPLOTYPES, American Journal of Human Genetics, 56(3), 1995, pp. 799-810
Number of citations
41
Subject Categories
Genetics & Heredity
ISSN journal
00029297
Volume
56
Issue
3
Year of publication
1995
Pages
799 - 810
Database
ISI
SICI code
0002-9297(1995)56:3<799:AEAATS>2.0.ZU;2-5
Abstract
This paper gives an expectation-maximization (EM) algorithm to obtain allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients for multiple-locus systems. It permits high polymorphism and null alleles at all loci. This approach effectively deals with the primary estimation problems associated with such systems: there is no one-to-one correspondence between phenotypic and genotypic categories, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests using likelihood-ratio statistics, which have approximately χ² distributions when sample sizes are large.

We also suggest a data resampling approach to estimate the sampling distributions of test statistics. The resampling approach is more computer-intensive, but it is applicable to all sample sizes. A strategy to test hypotheses about aggregate groups of gametic disequilibrium coefficients is recommended. This strategy minimizes the number of necessary hypothesis tests while at the same time describing the structure of disequilibrium.

These methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well the distributions of the likelihood-ratio statistics are approximated by χ² distributions. In most circumstances, the χ² distributions grossly underestimated the probability of type I errors. However, at times they also overestimated the type I error probability. Accordingly, we recommend hypothesis tests that use the resampling method.
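To make the estimation and testing ideas above concrete, here is a minimal Python sketch under deliberately narrow assumptions: only two codominant loci, no null alleles, and Hardy-Weinberg proportions, whereas the paper handles null alleles and more than two loci. The E-step splits each individual across its possible phase resolutions in proportion to their current probabilities, and the M-step re-estimates haplotype frequencies from the expected counts. The likelihood-ratio statistic compares the EM fit with the no-disequilibrium fit (haplotype frequency = product of allele frequencies). All function names are hypothetical. The resampling scheme, which shuffles locus-B genotypes across individuals to break any association, is an assumed stand-in and not necessarily the authors' procedure.

```python
import math
import random
from collections import Counter
from itertools import product

def phase_resolutions(geno_a, geno_b):
    """Distinct unordered haplotype pairs compatible with one two-locus genotype."""
    (a1, a2), (b1, b2) = geno_a, geno_b
    pairs = {tuple(sorted([(a1, b1), (a2, b2)])),
             tuple(sorted([(a1, b2), (a2, b1)]))}
    return list(pairs)  # one pair if either locus is homozygous, else two

def pair_prob(h, k, freq):
    """Hardy-Weinberg probability of the unordered haplotype pair {h, k}."""
    return freq[h] ** 2 if h == k else 2.0 * freq[h] * freq[k]

def log_likelihood(data, freq):
    return sum(math.log(sum(pair_prob(h, k, freq) for h, k in phase_resolutions(ga, gb)))
               for ga, gb in data)

def em_haplotypes(data, tol=1e-8, max_iter=500):
    """EM estimates of two-locus haplotype frequencies (assumed: codominant, no nulls)."""
    alleles_a = sorted({x for ga, _ in data for x in ga})
    alleles_b = sorted({x for _, gb in data for x in gb})
    haps = list(product(alleles_a, alleles_b))
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_hap = 2 * len(data)
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        # E-step: expected haplotype counts given the current frequencies
        for ga, gb in data:
            res = phase_resolutions(ga, gb)
            weights = [pair_prob(h, k, freq) for h, k in res]
            total = sum(weights)
            for (h, k), w in zip(res, weights):
                counts[h] += w / total
                counts[k] += w / total  # a homozygous pair adds two copies of h
        # M-step: frequencies proportional to expected counts
        new = {h: c / n_hap for h, c in counts.items()}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq

def equilibrium_freq(data):
    """MLE under no gametic disequilibrium: products of allele-count frequencies."""
    ca = Counter(x for ga, _ in data for x in ga)
    cb = Counter(x for _, gb in data for x in gb)
    n = 2 * len(data)
    return {(a, b): (ca[a] / n) * (cb[b] / n) for a in ca for b in cb}

def lrt_statistic(data):
    """G = 2 (log L1 - log L0); clamped at 0 to absorb convergence noise."""
    l1 = log_likelihood(data, em_haplotypes(data))
    l0 = log_likelihood(data, equilibrium_freq(data))
    return max(0.0, 2.0 * (l1 - l0))

def permutation_pvalue(data, n_perm=199, seed=1):
    """Resampling p-value: shuffle locus-B genotypes across individuals (assumed scheme)."""
    rng = random.Random(seed)
    observed = lrt_statistic(data)
    geno_a = [ga for ga, _ in data]
    geno_b = [gb for _, gb in data]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(geno_b)
        if lrt_statistic(list(zip(geno_a, geno_b))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, with 10 individuals homozygous for haplotype (1, 1), 10 homozygous for (2, 2), and 10 double heterozygotes, EM resolves the double heterozygotes toward the coupling phase and estimates (1, 1) and (2, 2) at about 0.5 each, giving G = 100 ln 2 ≈ 69.3. For two loci with k and l alleles, the usual χ² reference for this test has (k - 1)(l - 1) degrees of freedom. The permutation p-value avoids relying on that approximation, which the abstract reports can be poor.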