This paper describes an algorithm, called SEQUITUR, that identifies hi
erarchical structure in sequences of discrete symbols and uses that in
formation for compression, On many practical sequences it performs wel
l at both compression and structural inference, producing comprehensib
le descriptions of sequence structure in the form of grammar rules, Th
e algorithm can be stated concisely in the form of two constraints on
a context-free grammar, Inference is performed incrementally, the stru
cture faithfully representing the input at all times, It can be implem
ented efficiently and operates in a time that is approximately linear
in sequence length, Despite its simplicity and efficiency, SEQUITUR su
cceeds in inferring a range of interesting hierarchical structures fro
m naturally occurring sequences.