APPROXIMATING SHORTEST SUPERSTRINGS WITH CONSTRAINTS

Authors
Citation
T. Jiang et M. Li, APPROXIMATING SHORTEST SUPERSTRINGS WITH CONSTRAINTS, Theoretical computer science, 134(2), 1994, pp. 473-491
Citations number
18
Categorie Soggetti
Computer Sciences","Computer Science Theory & Methods
ISSN journal
03043975
Volume
134
Issue
2
Year of publication
1994
Pages
473 - 491
Database
ISI
SICI code
0304-3975(1994)134:2<473:ASSWC>2.0.ZU;2-G
Abstract
Various versions of the shortest common superstring problem play impor tant roles in data compression and DNA sequencing. Only recently, the open problem of how to approximate a shortest superstring given a set of strings was solved in (Blum et al., 1991; Li, 1990). Blum et al. (1 991) shows that several greedy algorithms produce a superstring of len gth O(n), where n is the optimal length. However, a major problem rema ins open: can we still linearly approximate a superstring in polynomia l time when the superstring is required to be consistent with some giv en negative strings, i.e., it must not contain any negative string? Th e best previous algorithm, Group-Merge given in (Jiang and Li, 1993; L i, 1990), produces a consistent superstring of length theta(n log n). The negative strings make the problem much more difficult and, as we w ill show, a greedy-style algorithm cannot achieve linear approximation for this problem. We present polynomial-time approximation algorithms that produce consistent superstrings of length O(n), for two importan t special cases: (a) when no negative strings contain positive strings as substrings; (b) when there are only a constant number of negative strings. The algorithms are obtained by making an essential use of the Hungarian algorithm, which can find an optimal cycle cover on weighte d graphs. The other main objective of this paper is to analyze the per formance of some greedy-style algorithms for this problem. Due to thei r time efficiency and simplicity, greedy algorithms are of practical i mportance. We introduce a new analysis showing that when no negative s trings contain positive strings, a greedy algorithm achieves O(n(4/3)) and O(n) if the number of negative examples is further bounded by som e constant.