L. Parida et D. Geiger, Mass estimation of DNA molecules and extraction of ordered restriction maps in optical mapping imagery, ALGORITHMIC, 25(2-3), 1999, pp. 295-310
The Human Genome project has been considered "one of the great enterprises
of 20th Century Science" [8]. The ultimate goal of many efforts in molecula
r biology, including the Human Genome Project, is to determine the entire s
equence of human DNA and to extract genetic information from it. In this co
ntext an important step is to build restriction maps (location of certain i
dentifiable markers) of portions of the DNA [8]. Optimal Mapping [24], [17]
, [21], a process that fixes elongated DNA molecules onto treated surfaces,
is a very promising emerging technology for rapid production of ordered re
striction maps. Our modeling and solution of detection, mass estimation, an
d extraction of ordered restriction maps of DNA molecules attempts to under
stand/answer questions of significant importance to geneticists, biologists
, and chemists.
In this paper we first describe the vision process that identifies the DNA
molecules along with the restriction sites, and then model the restriction
map problem and describe the algorithm that we use to extract the maps from
the data. The input to the vision process is an image of the DNA molecules
. We propose a simple Markov chain to model the DNA fragments, and use Sing
le Source Shortest Path (SSSP) algorithms to detect and measure the apparen
t length of the DNA fragments. This is at the heart of the imaging software
in use at the laboratory for over 2 years: the practical implementation co
ntinues to undergo various fine-tuning processes. In the second part we mod
el the ordered restriction map as a combinatorial problem which we call the
Binary Flip Cut (BFC) problem. We prove hardness results for this and seve
ral of its variants and offer a practical solution using techniques from me
an field annealing/EM algorithm and demonstrate its effectiveness in practi
ce.