D. Greenberg et S. Istrail, THE CHIMERIC MAPPING PROBLEM - ALGORITHMIC STRATEGIES AND PERFORMANCEEVALUATION ON SYNTHETIC GENOMIC DATA, Computers & chemistry, 18(3), 1994, pp. 207-220
The Human Genome Project requires better software for the creation of
physical maps of chromosomes. Current mapping techniques involve break
ing large segments of DNA into smaller, more-manageable pieces, gather
ing information on all the small pieces, and then constructing a map o
f the original large piece from the information about the small pieces
. Unfortunately, in the process of breaking up the DNA some informatio
n is lost and noise of various types is introduced; in particular, the
order of the pieces is not preserved. Thus, the map maker must solve
a combinatorial problem in order to reconstruct the map. Good software
is indispensable for quick, accurate reconstruction. The reconstructi
on is complicated by various experimental errors. A major source of di
fficulty-which sems to be inherent to the recombination technology-is
the presence of chimeric DNA clones. It is fairly common for two disjo
int DNA pieces to form a chimera, i.e. a fusion of two pieces which ap
pears as a single piece. Attempts to order chimera will fail unless th
ey are algorithmically divided into their constituent pieces. Despite
consensus within the genomic mapping community of the critical importa
nce of correcting chimerism, algorithms for solving the chimeric clone
problem have received only passing attention in the literature. Based
on a model proposed by Lander (1992a, b) this paper presents the firs
t algorithms for analyzing chimersim. We construct physical maps in th
e presence of chimerism by creating optimization functions which have
minimizations which correlate with map quality. Despite the fact that
these optimization functions are invariably NP-complete our algorithms
are guaranteed to produce solutions which are close to the optimum. T
he practical import of using these algorithms depends on the strength
of the correlation of the function to the map quality as well as on th
e accuracy of the approximations. We employ two fundamentally differen
t optimization functions as a means of avoiding biases likely to decor
relate the solutions from the desired map. Experiments on simulated da
ta show that both our algorithm which minimizes the number of chimeric
fragments in a solution and our algorithm which minimizes the maximum
number of fragments per clone in a solution do, in fact, correlate to
high quality solutions. Furthermore, tests on simulated data using pa
rameters set to mimic real experiments show that the algorithms have t
he potential to find high quality solutions with real data. We plan to
test our software against real data from the Whitehead Institute and
from Los Alamos Genomic Research Center in the near future.