Cg. Nevillmanning et Ih. Witten, IDENTIFYING HIERARCHICAL STRUCTURE IN SEQUENCES - A LINEAR-TIME ALGORITHM, The journal of artificial intelligence research, 7, 1997, pp. 67-82
Citations number
24
Categorie Soggetti
Controlo Theory & Cybernetics","Computer Science Artificial Intelligence
SEQUITUR is an algorithm that infers a hierarchical structure from a s
equence of discrete symbols by replacing repeated phrases with a gramm
atical rule that generates the phrase, and continuing this process rec
ursively. The result is a hierarchical representation of the original
sequence, which offers insights into its lexical structure. The algori
thm is driven by two constraints that reduce the size of the grammar,
and produce structure as a by-product. S EQUITUR breaks new ground by
. operating incrementally. Moreover, the method's simple structure per
mits a proof that it operates in space and time that is linear in the
size of the input. Our implementation can process 50,000 symbols per s
econd and has been applied to an extensive range of real world sequenc
es.