Statistical learning formulation of the DNA base-calling problem and its solution in a Bayesian EM framework

Citation
M.S. Pereira et al., Statistical learning formulation of the DNA base-calling problem and its solution in a Bayesian EM framework, DISCR APP M, 104(1-3), 2000, pp. 229-258
Number of citations
22
Subject Categories
Engineering Mathematics
Volume
104
Issue
1-3
Year of publication
2000
Pages
229 - 258
Database
ISI
SICI code
Abstract
A novel formulation of the important DNA sequence base-calling problem as well as algorithms for its solution are introduced. The proposed approach is to bring DNA base-calling within the framework of a powerful statistical learning paradigm, which allows the incorporation of prior knowledge about the structure of the problem directly into the base-calling algorithms, without resorting to heuristics. Use of prior knowledge provides constraints which help disambiguate the different possible interpretations that the data may have at regions of low SNR, and is shown to lead to a substantial increase of the number of DNA bases that can be accurately called in such regions. Our experimental results suggest that the proposed algorithms, without being optimized, can achieve base-calling performance that matches, and often exceeds, that of commercially available software. Furthermore, due to their statistical basis, they also provide confidence estimates (in the form of posterior probabilities) for the produced base call decisions, which can be used for sequence assembly and mutation detection purposes. (C) 2000 Elsevier Science B.V. All rights reserved.