SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association

Citation
Dai, James Y., et al., SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association, Biostatistics (Oxford. Print), 10(4), 2009, pp. 680-693
ISSN journal
14654644
Volume
10
Issue
4
Year of publication
2009
Pages
680 - 693
Database
ACNP
Abstract
Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one single nucleotide polymorphism (SNP) at a time may not fully explore linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes one SNP at a time in a stepwise fashion, and comparing the prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
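
The stepwise search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified stand-in that assumes unphased genotypes coded 0/1/2, uses the joint genotype pattern at the selected SNPs in place of reconstructed haplotypes (so phase ambiguity is ignored), predicts disease status by the training-set case fraction within each pattern rather than by a regression model, and scores models by cross-validated squared error. The names cv_error and stepwise_select, the fold count, and the toy data are all hypothetical.

```python
import random


def cv_error(genotypes, labels, snps, k=5, seed=0):
    """Mean squared prediction error over k folds for one SNP subset.

    Assumption: the joint genotype pattern at `snps` stands in for a
    haplotype; the prediction is the case fraction among training
    subjects with the same pattern (overall fraction if unseen).
    """
    n = len(labels)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        overall = sum(labels[i] for i in train) / len(train)
        sums, counts = {}, {}
        for i in train:
            key = tuple(genotypes[i][s] for s in snps)
            sums[key] = sums.get(key, 0) + labels[i]
            counts[key] = counts.get(key, 0) + 1
        for i in fold:
            key = tuple(genotypes[i][s] for s in snps)
            pred = sums[key] / counts[key] if key in counts else overall
            total += (labels[i] - pred) ** 2
    return total / n


def stepwise_select(genotypes, labels, k=5, seed=0):
    """Grow or shrink the SNP set by one SNP while CV error improves."""
    n_snps = len(genotypes[0])
    current = ()
    best = cv_error(genotypes, labels, current, k, seed)
    improved = True
    while improved:
        improved = False
        # One-step neighbours of the current set: add one SNP or drop one.
        candidates = [tuple(sorted(current + (s,)))
                      for s in range(n_snps) if s not in current]
        candidates += [tuple(s for s in current if s != r) for r in current]
        for cand in candidates:
            err = cv_error(genotypes, labels, cand, k, seed)
            if err < best - 1e-12:
                best, current, improved = err, cand, True
    return current, best


# Hypothetical toy data: all 81 genotype combinations at 4 SNPs, with
# disease status determined entirely by carrying a variant at SNP index 2.
toy_genotypes = [[i % 3, (i // 3) % 3, (i // 9) % 3, (i // 27) % 3]
                 for i in range(81)]
toy_labels = [1 if g[2] > 0 else 0 for g in toy_genotypes]
selected, error = stepwise_select(toy_genotypes, toy_labels)
print("selected SNP indices:", selected, "CV error:", error)
```

Using the same fold assignment for every candidate set keeps the error comparison between models paired, which is one reasonable design choice; the source does not say how folds are drawn.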