SPADE: An efficient algorithm for mining frequent sequences

Authors
Citation
Mj. Zaki, SPADE: An efficient algorithm for mining frequent sequences, MACH LEARN, 42(1-2), 2001, pp. 31-60
Citations number
15
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
08856125 → ACNP
Volume
42
Issue
1-2
Year of publication
2001
Pages
31 - 60
Database
ISI
SICI code
0885-6125(200101)42:1-2<31:SAEAFM>2.0.ZU;2-3
Abstract
In this paper we present SPADE, a new algorithm for fast discovery of Seque ntial Patterns. The existing solutions to this problem make repeated databa se scans, and use complex hash structures which have poor locality. SPADE u tilizes combinatorial properties to decompose the original problem into sma ller sub-problems, that can be independently solved in main-memory using ef ficient lattice search techniques, and using simple join operations. All se quences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also has linear scalab ility with respect to the number of input-sequences, and a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.