Sample size determination for classifiers based on single-nucleotide polymorphisms

Citation
Liu, Xinyu et al., Sample size determination for classifiers based on single-nucleotide polymorphisms, Biostatistics (Oxford. Print), 13(2), 2012, pp. 217-227
ISSN journal
14654644
Volume
13
Issue
2
Year of publication
2012
Pages
217 - 227
Database
ACNP
Abstract
Single-nucleotide polymorphisms (SNPs), believed to account for differences among humans, are widely used to predict disease risk. Clinical samples are typically limited and/or costly to collect, so it is essential to determine an adequate sample size for building a classifier based on SNPs. Such a classifier would achieve correct classifications while keeping the sample size to a minimum, thereby making studies cost-effective. For coded SNP data from 2 classes, an optimal classifier and an approximation to its probability of correct classification (PCC) are derived. A linear classifier is constructed, and an approximation to its PCC is also derived. These approximations are validated through a variety of Monte Carlo simulations. A sample size determination algorithm is given, based on a criterion that ensures the difference between the 2 approximate PCCs is below a threshold, and its effectiveness is illustrated via simulations. For the HapMap data on Chinese and Japanese populations, a linear classifier is built using 51 independent SNPs, and the required total sample sizes are determined using our algorithm as the threshold varies. For example, when the threshold value is 0.05, our algorithm determines a total sample size of 166 (83 Chinese and 83 Japanese) that satisfies the criterion.
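The sample size criterion described in the abstract can be illustrated with a small Python sketch. This is a Monte Carlo stand-in, not the authors' analytic PCC approximations. It assumes independent SNPs coded as minor-allele counts 0/1/2 under Hardy-Weinberg equilibrium, equal class priors, and equal per-class sample sizes. The trained rule is a plug-in log-likelihood-ratio classifier, which under this binomial model is linear in the coded genotypes, and the "optimal" rule uses the true allele frequencies. All function names (`required_sample_size` and the helpers), the 0.5 smoothing constant, and the search grid are hypothetical choices made for illustration.

```python
import math
import random


def sample_genotypes(freqs, rng):
    """One individual's coded genotypes: minor-allele counts 0/1/2 (HWE assumed)."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def llr_weights(p0, p1):
    """Linear rule: log P1(g)/P0(g) = intercept + sum_j slope_j * g_j (binomial model)."""
    intercept = sum(2 * math.log((1 - q1) / (1 - q0)) for q0, q1 in zip(p0, p1))
    slopes = [math.log(q1 / q0) - math.log((1 - q1) / (1 - q0))
              for q0, q1 in zip(p0, p1)]
    return intercept, slopes


def pcc(intercept, slopes, test):
    """Fraction of labelled (genotypes, label) pairs classified correctly."""
    correct = 0
    for g, label in test:
        score = intercept + sum(w * x for w, x in zip(slopes, g))
        correct += int((score > 0) == (label == 1))  # class 1 if score > 0
    return correct / len(test)


def estimate_freqs(samples):
    """Allele frequencies per SNP, smoothed by 0.5 so logs stay finite (hypothetical choice)."""
    n, m = len(samples), len(samples[0])
    return [(sum(s[j] for s in samples) + 0.5) / (2 * n + 1) for j in range(m)]


def required_sample_size(p0, p1, threshold, n_grid=range(10, 301, 10),
                         n_test=1000, n_reps=5, seed=0):
    """Smallest per-class n on n_grid whose mean PCC gap (optimal - trained) is
    below threshold; returns (n, gap), or (None, last_gap) if none qualifies."""
    rng = random.Random(seed)
    # One shared test set for both rules reduces the variance of the gap.
    test = [(sample_genotypes(p0, rng), 0) for _ in range(n_test)]
    test += [(sample_genotypes(p1, rng), 1) for _ in range(n_test)]
    pcc_opt = pcc(*llr_weights(p0, p1), test)
    gap = float("nan")
    for n in n_grid:
        gaps = []
        for _ in range(n_reps):
            train0 = [sample_genotypes(p0, rng) for _ in range(n)]
            train1 = [sample_genotypes(p1, rng) for _ in range(n)]
            weights = llr_weights(estimate_freqs(train0), estimate_freqs(train1))
            gaps.append(pcc_opt - pcc(*weights, test))
        gap = sum(gaps) / n_reps
        if gap < threshold:
            return n, gap
    return None, gap
```

For example, `required_sample_size(p0, p1, threshold=0.05)` returns the smallest per-class n on the grid whose averaged PCC gap falls below 0.05; the total sample size is twice that, mirroring the equal 83 + 83 split in the HapMap example. Averaging over several training replicates is a deliberate choice, since a single Monte Carlo PCC estimate is noisy enough to stop the search early by chance.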