Data mining applied to linkage disequilibrium mapping

Authors

Toivonen, HTT; Onkamo, P; Vasko, K; Ollikainen, V; Sevon, P; Mannila, H; Herr, M; Kere, J

Citation

H.T.T. Toivonen et al., Data mining applied to linkage disequilibrium mapping, AM J HU GEN, 67(1), 2000, pp. 133-145

Citations number

Subject categories

Research/Laboratory Medicine & Medical Technology; Molecular Biology & Genetics

Journal title

AMERICAN JOURNAL OF HUMAN GENETICS

ISSN journal

0002-9297

Volume

67

Issue

1

Year of publication

2000

Pages

133 - 145

Database

ISI

SICI code

0002-9297(200007)67:1<133:DMATLD>2.0.ZU;2-V

Abstract

We introduce a new method for linkage disequilibrium mapping: haplotype pattern mining (HPM). The method, inspired by data mining methods, is based on discovery of recurrent patterns. We define a class of useful haplotype patterns in genetic case-control data and use the algorithm for finding disease-associated haplotypes. The haplotypes are ordered by their strength of association with the phenotype, and all haplotypes exceeding a given threshold level are used for prediction of disease susceptibility-gene location. The method is model-free, in the sense that it does not require (and is unable to utilize) any assumptions about the inheritance model of the disease. The statistical model is nonparametric. The haplotypes are allowed to contain gaps, which improves the method's robustness to mutations and to missing and erroneous data. Experimental studies with simulated microsatellite and SNP data show that the method has good localization power in data sets with large degrees of phenocopies and with lots of missing and erroneous data. The power of HPM is roughly identical for marker maps at a density of 3 single-nucleotide polymorphisms/cM or 1 microsatellite/cM. The capacity to handle high proportions of phenocopies makes the method promising for complex disease mapping. An example of correct disease susceptibility-gene localization with HPM is given with real marker data from families from the United Kingdom affected by type 1 diabetes. The method is extendable to include environmental covariates or phenotype measurements or to find several genes simultaneously.
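
The pipeline described in the abstract (find gapped haplotype patterns, rank them by association with case status, keep those above a threshold, and use them to point at a map location) can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not stated in the abstract: the association measure (a 2x2 chi-square statistic), the location score (the number of strong patterns spanning each marker), the limits `max_len` and `max_gaps`, and all function names. It should be read as a toy illustration of the idea, not as the authors' implementation.

```python
from itertools import combinations


def matches(pattern, haplotype):
    """A pattern matches when every non-gap position equals the haplotype allele.
    Gap (wildcard) positions are represented by None."""
    return all(p is None or p == h for p, h in zip(pattern, haplotype))


def candidate_patterns(cases, max_len=4, max_gaps=1):
    """Enumerate patterns observed in case haplotypes: a window of at most
    max_len consecutive markers, with up to max_gaps interior positions
    replaced by gaps. (Both limits are illustrative assumptions.)"""
    n = len(cases[0])
    patterns = set()
    for hap in cases:
        for start in range(n):
            for end in range(start + 1, min(start + max_len, n) + 1):
                interior = range(start + 1, end - 1)  # window ends stay fixed
                for k in range(max_gaps + 1):
                    for gaps in combinations(interior, k):
                        patterns.add(tuple(
                            hap[i] if start <= i < end and i not in gaps else None
                            for i in range(n)
                        ))
    return patterns


def chi_square(pattern, cases, controls):
    """2x2 chi-square statistic for pattern presence versus case/control status
    (an assumed association measure)."""
    a = sum(matches(pattern, h) for h in cases)      # cases carrying the pattern
    b = len(cases) - a
    c = sum(matches(pattern, h) for h in controls)   # controls carrying the pattern
    d = len(controls) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def marker_frequencies(cases, controls, threshold, max_len=4, max_gaps=1):
    """Return the strong patterns (sorted by score, strongest first) and, for
    each marker, how many strong patterns span it."""
    freq = [0] * len(cases[0])
    strong = []
    for pat in candidate_patterns(cases, max_len, max_gaps):
        case_rate = sum(matches(pat, h) for h in cases) / len(cases)
        control_rate = sum(matches(pat, h) for h in controls) / len(controls)
        score = chi_square(pat, cases, controls)
        # Keep only patterns over-represented in cases and above the threshold.
        if case_rate > control_rate and score >= threshold:
            strong.append((score, pat))
            fixed = [i for i, allele in enumerate(pat) if allele is not None]
            for i in range(fixed[0], fixed[-1] + 1):
                freq[i] += 1
    strong.sort(key=lambda item: item[0], reverse=True)
    return strong, freq
```

Calling `marker_frequencies(cases, controls, threshold)` on two lists of equal-length allele tuples returns the ranked patterns and a per-marker count; under the assumptions above, the markers with the highest counts would be the candidate region for the susceptibility gene. Because gaps match any allele, a single mutated or mistyped marker inside a shared haplotype does not prevent the surrounding pattern from being counted.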