Single-nucleotide polymorphisms (SNPs) have been explored as a
high-resolution marker set for accelerating the mapping of disease genes
(1-11). Here we
report 48,196 candidate SNPs detected by statistical analysis of human
expressed sequence tags (ESTs), associated primarily with coding regions of
genes. We used Bayesian inference to weigh evidence for true polymorphism
versus sequencing error, misalignment or ambiguity, misclustering or chimaeric
EST sequences, assessing data such as raw chromatogram height, sharpness,
overlap and spacing, sequencing error rates, context-sensitivity and cDNA
library origin. Three separate validations verified 70%, 89% and 71% of our
predicted SNPs, respectively: comparison with 54 genes screened for SNPs
independently, verification of HLA-A polymorphisms, and restriction fragment
length polymorphism (RFLP) testing. Our method detects tenfold more true HLA-A
SNPs than previous analyses of the EST data. We found SNPs in a large
fraction of known disease genes, including some disease-causing mutations (for
example, the HbS sickle-cell mutation). Our comprehensive analysis of human
coding region polymorphism provides a public resource for mapping of
disease genes (available at http://www.bioinformatics.ucla.edu/snp).
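
To illustrate the kind of Bayesian weighing described above, the following is a minimal sketch and not the model used in this work. It compares two hypotheses for a single aligned column of EST bases: that all mismatches are sequencing errors, or that the site carries two alleles. The function names, the prior of 0.001, the grid of minor-allele frequencies and the uniform spread of errors over the three other bases are all illustrative assumptions; the sketch uses only a per-base error probability and ignores the chromatogram features, misalignment, chimaerism and library origin considered in the full analysis.

```python
from collections import Counter


def base_likelihood(observed, true_base, error_prob):
    """P(observed base | true base), assuming errors are spread
    evenly over the three other bases (an illustrative assumption)."""
    return 1.0 - error_prob if observed == true_base else error_prob / 3.0


def snp_posterior(bases, error_probs, prior_snp=0.001,
                  minor_freqs=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Posterior probability that one alignment column is a biallelic SNP.

    bases       -- observed base calls, one per EST read
    error_probs -- per-read probability that the call is wrong
    prior_snp   -- hypothetical prior probability of a SNP at a site
    minor_freqs -- hypothetical grid of minor-allele frequencies,
                   averaged with equal weight
    """
    top_two = Counter(bases).most_common(2)
    if len(top_two) < 2:
        return 0.0  # only one base observed: no candidate second allele
    major, minor = top_two[0][0], top_two[1][0]

    # Hypothesis "error": every read derives from the major allele.
    like_error = 1.0
    for base, err in zip(bases, error_probs):
        like_error *= base_likelihood(base, major, err)

    # Hypothesis "SNP": each read derives from the minor allele with
    # probability f and from the major allele otherwise.
    like_snp = 0.0
    for f in minor_freqs:
        like_f = 1.0
        for base, err in zip(bases, error_probs):
            like_f *= ((1.0 - f) * base_likelihood(base, major, err)
                       + f * base_likelihood(base, minor, err))
        like_snp += like_f / len(minor_freqs)

    # Bayes' rule over the two hypotheses.
    evidence = prior_snp * like_snp + (1.0 - prior_snp) * like_error
    return prior_snp * like_snp / evidence


# Four confident G reads among six A reads versus one low-quality G read.
well_supported = snp_posterior(list("AAAAAAGGGG"), [0.01] * 10)
poorly_supported = snp_posterior(list("AAAAAAAAAG"), [0.01] * 9 + [0.2])
```

Under these assumptions, a minor allele seen in several high-quality reads yields a posterior near one, whereas a single low-quality mismatch remains far more likely to be a sequencing error.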