A general approach to single-nucleotide polymorphism discovery

Authors

Marth, GT; Korf, I; Yandell, MD; Yeh, RT; Gu, ZJ; Zakeri, H; Stitziel, NO; Hillier, L; Kwok, PY; Gish, WR

Citation

G.T. Marth et al., A general approach to single-nucleotide polymorphism discovery, NAT GENET, 23(4), 1999, pp. 452-456

Citations number

Subject categories

Molecular Biology & Genetics

Journal title

NATURE GENETICS

ISSN journal

1061-4036

Volume

23

Issue

4

Year of publication

1999

Pages

452 - 456

Database

ISI

SICI code

1061-4036(199912)23:4<452:AGATSP>2.0.ZU;2-Z

Abstract

Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits(1). The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2-5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence(6,7) as a template on which to layer often unmapped, fragmentary sequence data(8-11) and to use base quality values(12) to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
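
Illustrative sketch (not part of the original record). The abstract describes using base quality values within a Bayesian framework to estimate the probability that an aligned site is polymorphic. The Python below is a minimal toy model of that idea, not the POLYBAYES algorithm itself. It assumes Phred-scaled qualities (error probability 10^(-Q/10)), errors spread evenly over the three other bases, a uniform prior over bases, and a polymorphic model in which each read is drawn from one of two alleles with equal probability. The function names and the prior value of 0.001 are hypothetical choices made for illustration.

```python
from itertools import combinations

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred-scaled quality value into an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed, true_base, q):
    """P(observed base | true base, quality), errors split evenly over 3 bases."""
    e = phred_to_error(q)
    return 1 - e if observed == true_base else e / 3


def snp_posterior(column, prior_snp=0.001):
    """Posterior probability that one alignment column is polymorphic.

    column: list of (base, phred_quality) pairs, one per aligned read.
    prior_snp: assumed prior probability of a polymorphic site (hypothetical).
    """
    # Monomorphic hypothesis: every read comes from the same true base.
    mono = 0.0
    for true_base in BASES:
        lik = 1.0
        for obs, q in column:
            lik *= base_likelihood(obs, true_base, q)
        mono += lik / len(BASES)

    # Polymorphic hypothesis: two distinct alleles, each read from either one.
    pairs = list(combinations(BASES, 2))
    poly = 0.0
    for a, b in pairs:
        lik = 1.0
        for obs, q in column:
            lik *= 0.5 * base_likelihood(obs, a, q) + 0.5 * base_likelihood(obs, b, q)
        poly += lik / len(pairs)

    # Bayes' rule over the two hypotheses.
    numerator = prior_snp * poly
    return numerator / (numerator + (1 - prior_snp) * mono)


if __name__ == "__main__":
    # The same discrepant base, once with low and once with high quality.
    low_quality = [("A", 40)] * 5 + [("G", 5)]
    high_quality = [("A", 40)] * 5 + [("G", 40)]
    print(snp_posterior(low_quality))
    print(snp_posterior(high_quality))
```

In this toy model a mismatching base with a low quality value is largely explained as a sequencing error, while the same mismatch with a high quality value raises the posterior substantially. That is the qualitative behaviour the abstract attributes to quality-aware SNP detection.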