There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Using a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and estimated their allele frequencies in three human populations of different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different cDNA libraries.
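The mismatch-based discovery step, and the estimation of allele frequencies, can be illustrated with a minimal Python sketch. It assumes the ESTs have already been assembled into a gapless, column-aligned cluster of equal-length strings (the assembly itself is not shown), with `-` or `N` marking positions a read does not cover or cannot call. The function names and the `min_minor_count` filter (requiring the variant base in at least two ESTs to discount single-read sequencing errors) are hypothetical and are not the authors' published criteria. The second function estimates an allele frequency from diploid genotype counts by simple gene counting.

```python
from collections import Counter


def candidate_snp_sites(aligned_reads, min_minor_count=2):
    """Return (position, major, minor, counts) for columns where aligned ESTs disagree.

    Hypothetical helper: assumes equal-length, gapless, pre-aligned reads;
    characters other than A/C/G/T are treated as uncalled and ignored.
    """
    if not aligned_reads:
        return []
    sites = []
    for pos in range(len(aligned_reads[0])):
        # Collect the called bases observed at this column across all reads.
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        if len(counts) < 2:
            continue  # all reads agree (or no calls): not a candidate site
        # most_common breaks ties by first-encountered order.
        (major, _), (minor, minor_n) = counts.most_common(2)
        # Hypothetical filter: the variant must be seen in several ESTs.
        if minor_n >= min_minor_count:
            sites.append((pos, major, minor, dict(counts)))
    return sites


def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A from diploid genotype counts (gene counting)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    return (2 * n_aa + n_ab) / total_alleles


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACTTACGT", "ACTTAC--"]
    print(candidate_snp_sites(reads))
    print(allele_frequency(30, 40, 30))
```

With the toy cluster above, only column 2 (G in two reads, T in two reads) is reported; a base seen in a single EST would be dropped by the hypothetical two-read filter, trading some sensitivity for fewer false candidates caused by sequencing errors.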