GREEDY ALGORITHMS FOR THE SHORTEST COMMON SUPERSTRING THAT ARE ASYMPTOTICALLY OPTIMAL

Citation
A. Frieze et W. Szpankowski, GREEDY ALGORITHMS FOR THE SHORTEST COMMON SUPERSTRING THAT ARE ASYMPTOTICALLY OPTIMAL, Algorithmica, 21(1), 1998, pp. 21-36
Citations number
31
Categorie Soggetti
Mathematics,"Computer Science Software Graphycs Programming",Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
21
Issue
1
Year of publication
1998
Pages
21 - 36
Database
ISI
SICI code
0178-4617(1998)21:1<21:GAFTSC>2.0.ZU;2-B
Abstract
There has recently been a resurgence of interest in the shortest commo n superstring problem due to its important applications in molecular b iology (e.g., recombination of DNA) and data compression. The problem is NP-hard, but it has been known for some time that greedy algorithms work well for this problem. More precisely, it was proved in a recent sequence of papers that in the worst case a greedy algorithm produces a superstring that is at most beta times (2 less than or equal to bet a less than or equal to 4) worse than optimal. We analyze the problem in a probabilistic framework, and consider the optimal total overlap O -n(opt) and the overlap O-n(gr) produced by various greedy algorithms. These turn out to be asymptotically equivalent. We show that with hig h probability lim(n-->infinity) O-n(opt)/(n log n) = lim(n-->infinity) O-n(gr)/(n log n) = 1/H, where n is the number of original strings an d H is the entropy of the underlying alphabet. Our results hold under a condition that the lengths of all strings are not too short.