A common strategy for genotyping large samples begins with the characteriza
tion of human single nucleotide polymorphisms (SNPs) by sequencing candidat
e regions in a small sample for SNP discovery. This is usually followed by
typing in a large sample those sites observed to vary in a smaller sample.
We present results from a systematic investigation of variation at the huma
n apolipoprotein E locus (APOE), as well as the evaluation of the two-tiere
d sampling strategy based on these data. We sequenced 5.5 kb spanning the e
ntire APOE genomic region in a core sample of 72 individuals, including 24
each of African-Americans from Jackson, Mississippi; European-Americans fro
m Rochester, Minnesota; and Europeans from North Karelia, Finland. This seq
uence survey detected 21 SNPs and 1 multiallelic indel, 14 of which had not
been previously reported. Alleles varied in relative frequency among the p
opulations, and 10 sites were polymorphic in only a single population sampl
e. Oligonucleotide ligation assays (OLA) were developed for 20 of these sit
es (omitting the indel and a closely-linked SNP). These were then scored in
2179 individuals sampled from the same three populations (n = 843, 884, an
d 452, respectively). Relative allele frequencies were generally consistent
with estimates from the core sample, although variation was found in some
populations in the larger sample at SNPs that were monomorphic in the corre
sponding smaller core sample. Site variation in the larger samples showed n
o systematic deviation from Hardy-Weinberg expectation. The large OLA sampl
e clearly showed that variation in many, but not all, of OLA-typed SNPs is
significantly correlated with the classical protein-coding variants, implyi
ng that there may be important substructure within the classical epsilon2,
epsilon3, and epsilon4 alleles. Comparison of the levels and patterns of po
lymorphism in the core samples with those estimated for the OLA-typed sampl
es shows how nucleotide diversity is underestimated when only a subset of s
ites are typed and underscores the importance of adequate population sampli
ng at the polymorphism discovery stage.
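
The underestimation of nucleotide diversity noted above can be illustrated with a minimal sketch. It assumes biallelic sites and estimates per-base diversity as the sum of unbiased per-site heterozygosities divided by the sequence length. The minor-allele counts and the rule for choosing which sites are typed are hypothetical and are not data or methods from this study; only the 5.5 kb length and the sample of 24 individuals (48 chromosomes) come from the text. The function names are likewise illustrative.

```python
def site_heterozygosity(minor_count, n_chromosomes):
    """Unbiased expected heterozygosity at one biallelic site."""
    p = minor_count / n_chromosomes
    return n_chromosomes / (n_chromosomes - 1) * 2 * p * (1 - p)


def nucleotide_diversity(minor_counts, n_chromosomes, length_bp):
    """Per-base diversity: summed site heterozygosity over sequence length."""
    total = sum(site_heterozygosity(c, n_chromosomes) for c in minor_counts)
    return total / length_bp


# Hypothetical minor-allele counts at six variable sites (NOT study data).
all_sites = [1, 1, 2, 5, 12, 20]
n_chromosomes = 48   # 24 diploid individuals, as in one core population sample
length_bp = 5500     # 5.5 kb surveyed region

# Hypothetical typing rule: only sites with a minor-allele count of 5 or more
# are carried forward, so the rarer variable sites are never typed.
typed_sites = [c for c in all_sites if c >= 5]

pi_all = nucleotide_diversity(all_sites, n_chromosomes, length_bp)
pi_typed = nucleotide_diversity(typed_sites, n_chromosomes, length_bp)

print(f"diversity, all variable sites: {pi_all:.6f}")
print(f"diversity, typed subset only:  {pi_typed:.6f}")
```

Because every variable site contributes a positive term to the sum, dropping any site can only lower the estimate. The size of the shortfall depends on how many sites go untyped and on their allele frequencies.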