A. Frieze et W. Szpankowski, GREEDY ALGORITHMS FOR THE SHORTEST COMMON SUPERSTRING THAT ARE ASYMPTOTICALLY OPTIMAL, Algorithmica, 21(1), 1998, pp. 21-36
There has recently been a resurgence of interest in the shortest commo
n superstring problem due to its important applications in molecular b
iology (e.g., recombination of DNA) and data compression. The problem
is NP-hard, but it has been known for some time that greedy algorithms
work well for this problem. More precisely, it was proved in a recent
sequence of papers that in the worst case a greedy algorithm produces
a superstring that is at most beta times (2 less than or equal to bet
a less than or equal to 4) worse than optimal. We analyze the problem
in a probabilistic framework, and consider the optimal total overlap O
-n(opt) and the overlap O-n(gr) produced by various greedy algorithms.
These turn out to be asymptotically equivalent. We show that with hig
h probability lim(n-->infinity) O-n(opt)/(n log n) = lim(n-->infinity)
O-n(gr)/(n log n) = 1/H, where n is the number of original strings an
d H is the entropy of the underlying alphabet. Our results hold under
a condition that the lengths of all strings are not too short.