Ms. Pereira et al., Statistical learning formulation of the DNA base-calling problem and its solution in a Bayesian EM framework, DISCR APP M, 104(1-3), 2000, pp. 229-258
A novel formulation of the important DNA sequence base-calling problem, as
well as algorithms for its solution, are introduced. The proposed approach
is to bring DNA base-calling within the framework of a powerful statistical
learning paradigm, which allows the incorporation of prior knowledge about
the structure of the problem directly into the base-calling algorithms,
without resorting to heuristics. Use of prior knowledge provides constraints
which help disambiguate the different possible interpretations that the data
may have at regions of low SNR, and is shown to lead to a substantial
increase of the number of DNA bases that can be accurately called in such
regions. Our experimental results suggest that the proposed algorithms,
without being optimized, can achieve base-calling performance that matches,
and often exceeds, that of commercially available software. Furthermore, due
to their statistical basis, they also provide confidence estimates (in the
form of posterior probabilities) for the produced base call decisions, which
can be used for sequence assembly and mutation detection purposes.
(C) 2000 Elsevier Science B.V. All rights reserved.
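
The role of prior knowledge and posterior confidence described in the abstract can be illustrated with a toy sketch. This is not the paper's Bayesian EM algorithm: the Gaussian channel model, the function names `base_posteriors` and `call_base`, and all numeric values are assumptions made purely for illustration. It shows only how a prior can break a tie at an ambiguous (low SNR) position and how the posterior probability doubles as a confidence estimate.

```python
import math

BASES = "ACGT"

def base_posteriors(intensities, prior, sigma=1.0):
    """Posterior P(base | intensities) under a toy Gaussian channel model.

    Assumption (illustrative only): the channel of the true base has mean 1,
    the other three channels have mean 0, and noise is independent Gaussian
    with standard deviation sigma.
    """
    log_post = []
    for k, base in enumerate(BASES):
        log_lik = 0.0
        for j, x in enumerate(intensities):
            mean = 1.0 if j == k else 0.0
            log_lik += -((x - mean) ** 2) / (2.0 * sigma ** 2)
        # Bayes' rule in log space: log prior + log likelihood.
        log_post.append(math.log(prior[base]) + log_lik)
    # Normalize with the max subtracted for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return {b: w / total for b, w in zip(BASES, weights)}

def call_base(intensities, prior, sigma=1.0):
    """Return the MAP base call and its posterior probability (confidence)."""
    post = base_posteriors(intensities, prior, sigma)
    best = max(post, key=post.get)
    return best, post[best]

uniform = {b: 0.25 for b in BASES}
# Hypothetical prior, e.g. from knowledge of the local sequence context.
informed = {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2}

clear_call = call_base([0.9, 0.1, 0.0, 0.2], uniform)
ambiguous_call = call_base([0.5, 0.5, 0.0, 0.0], informed)
```

With a clean signal the call follows the data alone. At the ambiguous position the A and C channels are equally likely under the toy model, so the informed prior decides the call, and the reported posterior stays well below 1 to reflect the remaining uncertainty.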