Mining frequent patterns without candidate generation

Citation
Jw. Han et al., Mining frequent patterns without candidate generation, SIG RECORD, 29(2), 2000, pp. 1-12
Citations number
20
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
29
Issue
2
Year of publication
2000
Pages
1 - 12
Database
ISI
SICI code
0163-5808(200006)29:2<1:MFPWCG>2.0.ZU;2-Q
Abstract
Mining frequent patterns in transaction databases, time-series databases, a nd many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still co stly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure , which is an extended prefix-tree structure for storing compressed, crucia l information about frequent patterns, and develop an efficient FP-tree-bas ed mining method, FP-growth, for mining the complete set of frequent patter ns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, muc h smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partit ioning-based, divide-and-conquer method is used to decompose the mining tas k into a set of smaller tasks for mining confined patterns in conditional d atabases, which dramatically reduces the search space. Our performance stud y shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faste r than the Apriori algorithm and also faster than some recently reported ne w frequent pattern mining methods.