ITA
ENG

Mining frequent patterns without candidate generation

Authors

Han, JW Pei, J Yin, YW

Citation

Jw. Han et al., Mining frequent patterns without candidate generation, SIG RECORD, 29(2), 2000, pp. 1-12

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

01635808 → ACNP

Volume

Issue

Year of publication

2000

Pages

1 - 12

Database

ISI

SICI code

0163-5808(200006)29:2<1:MFPWCG>2.0.ZU;2-Q

Abstract

Mining frequent patterns in transaction databases, time-series databases, a nd many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still co stly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure , which is an extended prefix-tree structure for storing compressed, crucia l information about frequent patterns, and develop an efficient FP-tree-bas ed mining method, FP-growth, for mining the complete set of frequent patter ns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, muc h smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partit ioning-based, divide-and-conquer method is used to decompose the mining tas k into a set of smaller tasks for mining confined patterns in conditional d atabases, which dramatically reduces the search space. Our performance stud y shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faste r than the Apriori algorithm and also faster than some recently reported ne w frequent pattern mining methods.