We introduce a new method for linkage disequilibrium mapping: haplotype
pattern mining (HPM). The method, inspired by data mining methods, is based on
discovery of recurrent patterns. We define a class of useful haplotype
patterns in genetic case-control data and use the algorithm for finding
disease-associated haplotypes. The haplotypes are ordered by their strength of
association with the phenotype, and all haplotypes exceeding a given threshold
level are used for prediction of disease susceptibility-gene location. The
method is model-free, in the sense that it does not require (and is unable
to utilize) any assumptions about the inheritance model of the disease.
The statistical model is nonparametric. The haplotypes are allowed to contain
gaps, which improves the method's robustness to mutations and to missing
and erroneous data. Experimental studies with simulated microsatellite and
SNP data show that the method has good localization power in data sets with
large degrees of phenocopies and with lots of missing and erroneous data.
The power of HPM is roughly identical for marker maps at a density of 3
single-nucleotide polymorphisms/cM or 1 microsatellite/cM. The capacity to
handle high proportions of phenocopies makes the method promising for complex
disease mapping. An example of correct disease susceptibility-gene
localization with HPM is given with real marker data from families from the
United Kingdom affected by type 1 diabetes. The method is extendable to include
environmental covariates or phenotype measurements or to find several genes
simultaneously.
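
The pipeline described above (gapped haplotype patterns, ranking by association, thresholding, and location prediction) can be illustrated with a small sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: the association measure is taken to be a one-sided 2x2 chi-square statistic, candidate patterns are enumerated by brute force from case haplotypes, the location is predicted from how many retained patterns span each marker, and the names `matches`, `chi_square`, `candidate_patterns`, `marker_frequencies`, `max_len`, `max_gaps`, and `threshold` are hypothetical. The toy alleles are invented.

```python
from itertools import combinations

WILDCARD = "*"  # marks a gap inside a pattern, or a marker outside the pattern


def matches(pattern, haplotype):
    """True if every fixed (non-wildcard) allele of the pattern equals the haplotype's."""
    return all(p == WILDCARD or p == h for p, h in zip(pattern, haplotype))


def chi_square(pattern, cases, controls):
    """2x2 chi-square for pattern presence vs. case/control status.

    Assumption: only positive association with case status is scored;
    patterns that are not enriched in cases get 0.
    """
    a = sum(matches(pattern, h) for h in cases)      # cases carrying the pattern
    c = sum(matches(pattern, h) for h in controls)   # controls carrying the pattern
    b, d = len(cases) - a, len(controls) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0 or a * d <= b * c:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def candidate_patterns(cases, max_len=4, max_gaps=1):
    """Enumerate patterns seen in case haplotypes: windows of at most max_len
    markers, with at most max_gaps interior markers replaced by a wildcard."""
    n_markers = len(cases[0])
    patterns = set()
    for hap in cases:
        for start in range(n_markers):
            for end in range(start + 1, min(start + max_len, n_markers) + 1):
                interior = range(start + 1, end - 1)  # both end markers stay fixed
                for k in range(max_gaps + 1):
                    for gaps in combinations(interior, k):
                        pat = [WILDCARD] * n_markers
                        for i in range(start, end):
                            if i not in gaps:
                                pat[i] = hap[i]
                        patterns.add(tuple(pat))
    return patterns


def marker_frequencies(cases, controls, threshold=3.84, max_len=4, max_gaps=1):
    """For each marker, count the retained patterns (score >= threshold) spanning it."""
    counts = [0] * len(cases[0])
    for pat in candidate_patterns(cases, max_len, max_gaps):
        if chi_square(pat, cases, controls) >= threshold:
            fixed = [i for i, p in enumerate(pat) if p != WILDCARD]
            for i in range(fixed[0], fixed[-1] + 1):  # gaps inside the span count too
                counts[i] += 1
    return counts


# Invented toy data: five markers; all cases share alleles 7 and 9 at markers 2 and 3.
cases = [(1, 2, 7, 9, 1), (2, 1, 7, 9, 2), (1, 1, 7, 9, 2), (2, 2, 7, 9, 1)]
controls = [(1, 2, 3, 4, 1), (2, 1, 3, 4, 2), (1, 1, 3, 4, 2), (2, 2, 3, 4, 1)]
print(marker_frequencies(cases, controls))
```

In this sketch the markers with the highest counts are taken as the predicted neighbourhood of the susceptibility gene. The threshold of 3.84 is simply the 5% critical value of a chi-square distribution with one degree of freedom, used here as a placeholder for the tunable threshold the text mentions.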