Mining frequent patterns in transaction databases, time-series databases, a
nd many other kinds of databases has been studied popularly in data mining
research. Most of the previous studies adopt an Apriori-like candidate set
generation-and-test approach. However, candidate set generation is still co
stly, especially when there exist prolific patterns and/or long patterns.
In this study, we propose a novel frequent pattern tree (FP-tree) structure
, which is an extended prefix-tree structure for storing compressed, crucia
l information about frequent patterns, and develop an efficient FP-tree-bas
ed mining method, FP-growth, for mining the complete set of frequent patter
ns by pattern fragment growth. Efficiency of mining is achieved with three
techniques: (1) a large database is compressed into a highly condensed, muc
h smaller data structure, which avoids costly, repeated database scans, (2)
our FP-tree-based mining adopts a pattern fragment growth method to avoid
the costly generation of a large number of candidate sets, and (3) a partit
ioning-based, divide-and-conquer method is used to decompose the mining tas
k into a set of smaller tasks for mining confined patterns in conditional d
atabases, which dramatically reduces the search space. Our performance stud
y shows that the FP-growth method is efficient and scalable for mining both
long and short frequent patterns, and is about an order of magnitude faste
r than the Apriori algorithm and also faster than some recently reported ne
w frequent pattern mining methods.