Various versions of the shortest common superstring problem play impor
tant roles in data compression and DNA sequencing. Only recently, the
open problem of how to approximate a shortest superstring given a set
of strings was solved in (Blum et al., 1991; Li, 1990). Blum et al. (1
991) shows that several greedy algorithms produce a superstring of len
gth O(n), where n is the optimal length. However, a major problem rema
ins open: can we still linearly approximate a superstring in polynomia
l time when the superstring is required to be consistent with some giv
en negative strings, i.e., it must not contain any negative string? Th
e best previous algorithm, Group-Merge given in (Jiang and Li, 1993; L
i, 1990), produces a consistent superstring of length theta(n log n).
The negative strings make the problem much more difficult and, as we w
ill show, a greedy-style algorithm cannot achieve linear approximation
for this problem. We present polynomial-time approximation algorithms
that produce consistent superstrings of length O(n), for two importan
t special cases: (a) when no negative strings contain positive strings
as substrings; (b) when there are only a constant number of negative
strings. The algorithms are obtained by making an essential use of the
Hungarian algorithm, which can find an optimal cycle cover on weighte
d graphs. The other main objective of this paper is to analyze the per
formance of some greedy-style algorithms for this problem. Due to thei
r time efficiency and simplicity, greedy algorithms are of practical i
mportance. We introduce a new analysis showing that when no negative s
trings contain positive strings, a greedy algorithm achieves O(n(4/3))
and O(n) if the number of negative examples is further bounded by som
e constant.