The shortest common superstring problem: Average case analysis for both exact and approximate matching

Authors
Citation
Eh. Yang et Z. Zhang, The shortest common superstring problem: Average case analysis for both exact and approximate matching, IEEE INFO T, 45(6), 1999, pp. 1867-1886
Citations number
26
Categorie Soggetti
Information Tecnology & Communication Systems
Journal title
IEEE TRANSACTIONS ON INFORMATION THEORY
ISSN journal
00189448 → ACNP
Volume
45
Issue
6
Year of publication
1999
Pages
1867 - 1886
Database
ISI
SICI code
0018-9448(199909)45:6<1867:TSCSPA>2.0.ZU;2-G
Abstract
The shortest common superstring problem and its extension to approximate ma tching are considered in the probability model,where each string in a given set has the same length and letters of strings are drawn independently fro m a finite set. In the exact matching case, several algorithms proposed in the literature are shown to be asymptotically optimal in the sense that the ratio of the savings resulting from the superstring constructed by each of these algorithms, that is the difference between the total length of the s trings in the given set and the length of the superstring, to the optimal s avings from the shortest superstring approaches in probability to 1 as the number of strings in the given set increases. In the approximate matching c ase, a modified version of the shortest common approximate matching superst ring problem is analyzed; it is demonstrated that the optimal savings in th is case is given approximately by n log n/I-l(Q, Q, 2D), where n is the num ber of strings in the given set, Q is the probability distribution governin g the selection of letters of strings, I-l(Q, Q, 2D) is the lower mutual in formation between Q and Q with respect to 2D, and D greater than or equal t o 0 is the distortion allowed in approximate matching. In addition, an appr oximation algorithm is proposed and proved asymptotically optimal.