Data mining applied to linkage disequilibrium mapping

Citation
H.T.T. Toivonen et al., Data mining applied to linkage disequilibrium mapping, AM J HU GEN, 67(1), 2000, pp. 133-145
Number of citations
29
Subject categories
Research/Laboratory Medicine & Medical Technology; Molecular Biology & Genetics
Journal title
AMERICAN JOURNAL OF HUMAN GENETICS
ISSN journal
0002-9297
Volume
67
Issue
1
Year of publication
2000
Pages
133 - 145
Database
ISI
SICI code
0002-9297(200007)67:1<133:DMATLD>2.0.ZU;2-V
Abstract
We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a class of useful haplotype patterns in genetic case-control data and use the algorithm for finding disease-associated haplotypes. The haplotypes are ordered by their strength of association with the phenotype, and all haplotypes exceeding a given threshold level are used for prediction of disease susceptibility-gene location. The method is model-free, in the sense that it does not require (and is unable to utilize) any assumptions about the inheritance model of the disease. The statistical model is nonparametric. The haplotypes are allowed to contain gaps, which improves the method's robustness to mutations and to missing and erroneous data. Experimental studies with simulated microsatellite and SNP data show that the method has good localization power in data sets with large degrees of phenocopies and with lots of missing and erroneous data. The power of HPM is roughly identical for marker maps at a density of 3 single-nucleotide polymorphisms/cM or 1 microsatellite/cM. The capacity to handle high proportions of phenocopies makes the method promising for complex disease mapping. An example of correct disease susceptibility-gene localization with HPM is given with real marker data from families from the United Kingdom affected by type 1 diabetes. The method is extendable to include environmental covariates or phenotype measurements or to find several genes simultaneously.
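The pipeline summarized in the abstract (mine gapped haplotype patterns, rank them by association with case/control status, keep those above a threshold, then use them to predict the gene location) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Pearson chi-square statistic as the association measure, a minimum count of case haplotypes per pattern, and a per-marker score equal to the number of above-threshold patterns whose span covers that marker. All function names, parameters (max_span, max_gaps, threshold, min_case_count) and the toy data are hypothetical.

```python
from itertools import combinations


def matches(pattern, haplotype):
    """True if the haplotype carries every allele fixed by the pattern.
    Markers absent from the pattern are gaps (wildcards)."""
    return all(haplotype[i] == allele for i, allele in pattern)


def chi_square(a, b, c, d):
    """Pearson chi-square for the 2x2 table
    [[cases with pattern, controls with pattern],
     [cases without,      controls without]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def candidate_patterns(cases, max_span=4, max_gaps=1):
    """Enumerate gapped patterns observed in case haplotypes (assumption:
    first and last markers are fixed, up to max_gaps interior wildcards).
    A pattern is a frozenset of (marker_index, allele) pairs; None marks
    a missing allele and may only fall in a gap."""
    found = set()
    m = len(cases[0])
    for hap in cases:
        for start in range(m):
            for end in range(start, min(m, start + max_span)):
                interior = range(start + 1, end)
                for k in range(min(max_gaps, len(interior)) + 1):
                    for gaps in combinations(interior, k):
                        idx = [i for i in range(start, end + 1) if i not in gaps]
                        if all(hap[i] is not None for i in idx):
                            found.add(frozenset((i, hap[i]) for i in idx))
    return found


def hpm_locate(cases, controls, threshold=3.84, min_case_count=2, **kw):
    """Score each marker by how many strongly associated patterns span it.
    Returns (best_marker, scores); ties go to the lowest marker index."""
    m = len(cases[0])
    scores = [0] * m
    for pat in candidate_patterns(cases, **kw):
        a = sum(matches(pat, h) for h in cases)      # cases carrying the pattern
        b = sum(matches(pat, h) for h in controls)   # controls carrying it
        if a < min_case_count:
            continue
        if chi_square(a, b, len(cases) - a, len(controls) - b) >= threshold:
            positions = [i for i, _ in pat]
            for j in range(min(positions), max(positions) + 1):
                scores[j] += 1
    best = max(range(m), key=lambda j: scores[j])
    return best, scores
```

For example, with toy case haplotypes that all share alleles 1,1 at markers 3 and 4, and controls that match the cases everywhere else but not at those two markers, hpm_locate places the best marker at 3 or 4. Allowing gaps means a single missing or mutated allele inside a pattern need not prevent a haplotype from being counted.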