Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene

Citation
D.A. Nickerson et al., Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene, GENOME RES, 10(10), 2000, pp. 1532-1545
Number of citations
60
Subject Categories
Molecular Biology & Genetics
Journal title
GENOME RESEARCH
ISSN journal
1088-9051
Volume
10
Issue
10
Year of publication
2000
Pages
1532 - 1545
Database
ISI
SICI code
1088-9051(200010)10:10<1532:SDALTO>2.0.ZU;2-G
Abstract
A common strategy for genotyping large samples begins with the characterization of human single nucleotide polymorphisms (SNPs) by sequencing candidate regions in a small sample for SNP discovery. This is usually followed by typing in a large sample those sites observed to vary in a smaller sample. We present results from a systematic investigation of variation at the human apolipoprotein E locus (APOE), as well as the evaluation of the two-tiered sampling strategy based on these data. We sequenced 5.5 kb spanning the entire APOE genomic region in a core sample of 72 individuals, including 24 each of African-Americans from Jackson, Mississippi; European-Americans from Rochester, Minnesota; and Europeans from North Karelia, Finland. This sequence survey detected 21 SNPs and 1 multiallelic indel, 14 of which had not been previously reported. Alleles varied in relative frequency among the populations, and 10 sites were polymorphic in only a single population sample. Oligonucleotide ligation assays (OLA) were developed for 20 of these sites (omitting the indel and a closely-linked SNP). These were then scored in 2179 individuals sampled from the same three populations (n = 843, 884, and 452, respectively). Relative allele frequencies were generally consistent with estimates from the core sample, although variation was found in some populations in the larger sample at SNPs that were monomorphic in the corresponding smaller core sample. Site variation in the larger samples showed no systematic deviation from Hardy-Weinberg expectation. The large OLA sample clearly showed that variation in many, but not all, of OLA-typed SNPs is significantly correlated with the classical protein-coding variants, implying that there may be important substructure within the classical epsilon2, epsilon3, and epsilon4 alleles.
Comparison of the levels and patterns of polymorphism in the core samples with those estimated for the OLA-typed samples shows how nucleotide diversity is underestimated when only a subset of sites are typed and underscores the importance of adequate population sampling at the polymorphism discovery stage.