ITA
ENG

SPADE: An efficient algorithm for mining frequent sequences

Authors

Zaki, MJ

Citation

Mj. Zaki, SPADE: An efficient algorithm for mining frequent sequences, MACH LEARN, 42(1-2), 2001, pp. 31-60

Citations number

Categorie Soggetti

AI Robotics and Automatic Control

Journal title

MACHINE LEARNING

ISSN journal

08856125 → ACNP

Volume

Issue

1-2

Year of publication

2001

Pages

31 - 60

Database

ISI

SICI code

0885-6125(200101)42:1-2<31:SAEAFM>2.0.ZU;2-3

Abstract

In this paper we present SPADE, a new algorithm for fast discovery of Seque ntial Patterns. The existing solutions to this problem make repeated databa se scans, and use complex hash structures which have poor locality. SPADE u tilizes combinatorial properties to decompose the original problem into sma ller sub-problems, that can be independently solved in main-memory using ef ficient lattice search techniques, and using simple join operations. All se quences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also has linear scalab ility with respect to the number of input-sequences, and a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.