The faithful recovery of the base sequence in automatic DeoxyriboNucleic
Acid (DNA) sequencing fundamentally depends on the underlying statistics of
the DNA electrophoresis time series. Current DNA sequencing algorithms are
heuristic in nature and modest in their use of statistical information. In
this paper, a formal statistical model of the DNA time series is presented
and then used to construct the optimal maximum-likelihood (ML) processor.
The DNA-ML algorithm that is derived in this paper features Kalman
prediction of peak locations, peak parameter estimation, whitened waveform
comparison, and multiple hypothesis processing using the M-algorithm.
Properties of the algorithm are examined using both simulated and real
data. Model parameters of critical importance and their impact on different
types of error mechanisms, such as insertions and deletions, are pointed
out. The statistical model of the DNA time series and the structure of the
DNA-ML algorithm provide a basis for future investigation and refinement of
DNA sequencing techniques.
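
As a rough illustration of the multiple hypothesis processing step, the M-algorithm can be sketched as a breadth-first search that extends every surviving partial sequence by each candidate base and then keeps only the M best-scoring hypotheses. The sketch below is not the DNA-ML processor itself. The four-channel observation format, the toy Gaussian `log_likelihood`, the `sigma` value, and the sample `peaks` are all assumptions made for illustration, and insertion and deletion hypotheses are left out.

```python
import heapq

BASES = "ACGT"


def log_likelihood(observation, base, sigma=0.2):
    """Toy Gaussian log-likelihood, up to an additive constant (assumed model).

    The channel of the hypothesized base is expected at 1.0 and the other
    three channels at 0.0.
    """
    expected = [1.0 if b == base else 0.0 for b in BASES]
    sq_err = sum((o - e) ** 2 for o, e in zip(observation, expected))
    return -sq_err / (2.0 * sigma ** 2)


def m_algorithm(observations, m=3):
    """Breadth-first search that keeps only the m best partial sequences."""
    survivors = [(0.0, "")]  # (cumulative log-likelihood, sequence so far)
    for obs in observations:
        # Extend every surviving hypothesis by each of the four bases.
        candidates = [
            (score + log_likelihood(obs, base), seq + base)
            for score, seq in survivors
            for base in BASES
        ]
        # Prune: retain the m candidates with the highest cumulative score.
        survivors = heapq.nlargest(m, candidates)
    return max(survivors)


# Hypothetical per-peak channel intensities in the order A, C, G, T.
peaks = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 1.0),
    (0.2, 0.9, 0.0, 0.0),
]
score, sequence = m_algorithm(peaks)
print(sequence)  # AGTC
```

Because the toy scores are independent from peak to peak, pruning never discards the best path here. In a model with correlated peaks or with insertion and deletion hypotheses, the choice of M trades computation against the risk of dropping the correct sequence.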