A general approach to single-nucleotide polymorphism discovery

Citation
G.T. Marth et al., A general approach to single-nucleotide polymorphism discovery, NAT GENET, 23(4), 1999, pp. 452-456
Number of citations
23
Subject categories
Molecular Biology & Genetics
Journal title
NATURE GENETICS
Journal ISSN
1061-4036
Volume
23
Issue
4
Year of publication
1999
Pages
452 - 456
Database
ISI
SICI code
1061-4036(199912)23:4<452:AGATSP>2.0.ZU;2-Z
Abstract
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits(1). The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2-5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence(6,7) as a template on which to layer often unmapped, fragmentary sequence data(8-11) and to use base quality values(12) to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
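The abstract does not spell out the POLYBAYES model, but its core idea, weighing the prior chance of a polymorphism against the chance that discordant base calls are sequencing errors given their quality values, can be illustrated with a small Bayesian calculation on one alignment column. The Python sketch below is a hypothetical, heavily simplified model and not the POLYBAYES algorithm. It assumes Phred-scaled qualities (error probability 10^(-Q/10)), that a miscalled base is equally likely to be any of the other three nucleotides, that a site carries at most two alleles, that each aligned sequence independently carries either allele with probability 1/2, and an arbitrary prior polymorphism rate `theta`. The names `snp_probability`, `base_likelihood`, `error_prob` and `theta` are illustrative only, and the clustering, paralogue identification and multiple alignment steps are not modelled.

```python
import math
from itertools import combinations

BASES = "ACGT"


def error_prob(q: int) -> float:
    """Phred quality Q -> error probability 10**(-Q/10), capped at 0.75
    (a call no better than a random guess) so no likelihood becomes zero."""
    return min(10 ** (-q / 10), 0.75)


def base_likelihood(observed: str, true_base: str, q: int) -> float:
    """P(observed call | true nucleotide). Assumption: a miscall is equally
    likely to be any of the three other bases."""
    e = error_prob(q)
    return 1 - e if observed == true_base else e / 3


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) to avoid underflow at deep columns."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def snp_probability(column, theta=1e-3):
    """Posterior probability that an alignment column is polymorphic.

    column: list of (base, phred_quality) pairs, one per aligned sequence;
            bases must be one of A, C, G, T.
    theta:  hypothetical prior probability that the site is polymorphic.
    """
    # Monomorphic hypotheses: every sequence carries the same allele a.
    mono = [
        math.log((1 - theta) / 4)
        + sum(math.log(base_likelihood(b, a, q)) for b, q in column)
        for a in BASES
    ]
    # Biallelic hypotheses {a, c}: each sequence independently carries a or c
    # with probability 1/2 (a deliberately crude allele-frequency assumption).
    pairs = list(combinations(BASES, 2))
    poly = [
        math.log(theta / len(pairs))
        + sum(
            math.log(0.5 * base_likelihood(b, a, q) + 0.5 * base_likelihood(b, c, q))
            for b, q in column
        )
        for a, c in pairs
    ]
    # Posterior = P(polymorphic hypotheses) / P(all hypotheses), in log space.
    return math.exp(log_sum_exp(poly) - log_sum_exp(mono + poly))


if __name__ == "__main__":
    balanced = [("A", 40)] * 4 + [("G", 40)] * 4  # 4:4 split, high quality
    one_bad_call = [("A", 40)] * 7 + [("G", 8)]   # lone low-quality G
    print(snp_probability(balanced))
    print(snp_probability(one_bad_call))
```

Under these assumptions the balanced high-quality column scores close to 1, while the column with a single low-quality discordant call scores close to 0, because one poor call is far more plausibly a sequencing error than a rare variant. Working in log space keeps the per-sequence products from underflowing, which matters when, as the abstract notes, there is no limit on alignment depth.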