Eh. Yang et Z. Zhang, The shortest common superstring problem: Average case analysis for both exact and approximate matching, IEEE INFO T, 45(6), 1999, pp. 1867-1886
The shortest common superstring problem and its extension to approximate ma
tching are considered in the probability model,where each string in a given
set has the same length and letters of strings are drawn independently fro
m a finite set. In the exact matching case, several algorithms proposed in
the literature are shown to be asymptotically optimal in the sense that the
ratio of the savings resulting from the superstring constructed by each of
these algorithms, that is the difference between the total length of the s
trings in the given set and the length of the superstring, to the optimal s
avings from the shortest superstring approaches in probability to 1 as the
number of strings in the given set increases. In the approximate matching c
ase, a modified version of the shortest common approximate matching superst
ring problem is analyzed; it is demonstrated that the optimal savings in th
is case is given approximately by n log n/I-l(Q, Q, 2D), where n is the num
ber of strings in the given set, Q is the probability distribution governin
g the selection of letters of strings, I-l(Q, Q, 2D) is the lower mutual in
formation between Q and Q with respect to 2D, and D greater than or equal t
o 0 is the distortion allowed in approximate matching. In addition, an appr
oximation algorithm is proposed and proved asymptotically optimal.