Top Down FP-Growth for Association Rule Mining
Ke Wang, Liu Tang, Jiawei Han, Junqiang Liu
Simon Fraser University



2. Introduction
Association rule A → B:
– A and B: sets of items
– support: count(AB), the number of transactions containing AB. Frequent: support >= minimum support
– confidence: count(AB) / count(A). Confident: confidence >= minimum confidence
Input: a set of transactions
Find all frequent patterns AB and A
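The definitions of support and confidence can be checked with a few lines of Python. This is only a minimal sketch: the transactions are borrowed from the example used on the later slides, and the helper names `count` and `rule_stats` are made up for illustration.

```python
# Transactions from the running example on the later slides.
transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = count(AB)
    confidence = count(AB) / count(A)
    """
    support = count(set(antecedent) | set(consequent), transactions)
    base = count(antecedent, transactions)
    confidence = support / base if base else 0.0
    return support, confidence


# Rule b -> e: every transaction that contains b also contains e.
support, confidence = rule_stats({"b"}, {"e"}, transactions)
print(support, confidence)
```

With minsup = 2, the rule b → e is frequent (support 3) and has confidence 1.0; the rule c → e has support 2 and confidence 2/3.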

3. TD-FP-Growth for frequent pattern mining
Uses a prefix tree similar to the FP-tree:
– Items in transactions are sorted
– Transactions share prefixes as much as possible
FP-Growth: bottom-up mining
TD-FP-Growth: top-down mining
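The prefix-sharing idea can be sketched in Python as follows. The slides do not state the sort order, so this sketch assumes the usual FP-tree convention (descending frequency, ties broken alphabetically); the names `Node`, `build_prefix_tree` and `paths` are hypothetical.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_prefix_tree(transactions, minsup):
    """Insert sorted, frequent-only transactions so common prefixes are shared."""
    freq = Counter(item for t in transactions for item in t)
    # Assumed order: descending frequency, ties broken alphabetically.
    order = sorted((i for i in freq if freq[i] >= minsup),
                   key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, order


def paths(node, prefix=()):
    """Yield (path from root, node count) for every node below `node`."""
    for item in sorted(node.children):
        child = node.children[item]
        path = prefix + (item,)
        yield path, child.count
        yield from paths(child, path)


transactions = [{"b", "e"}, {"a", "b", "c", "e"}, {"b", "c", "e"},
                {"a", "c", "d"}, {"a"}]
root, order = build_prefix_tree(transactions, minsup=2)
for path, n in paths(root):
    print("-".join(path), n)
```

On the five example transactions with minsup = 2, item d is dropped as infrequent and the three transactions starting with a share a single node a with count 3.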

4. FP-Growth: Bottom-up mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
[Figure: FP-tree with header table (item, head of node-link) for items a, b, c, e; tree nodes: root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes)]
e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)
Mining order: e, c, b, a

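5. FP-Growth: Bottom-up mining
e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)
[Figure: e's conditional FP-tree with nodes root, b: 3, c: 2, and a header table (item, head of node-link) for items b, c]
Drawback: e's conditional FP-tree must be created separately because counts at upper levels are modified.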
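The bottom-up step for item e can be reproduced with a short Python sketch. As a simplification, it reads the prefixes directly from the sorted, frequent-only transactions instead of following node-links in the FP-tree; on this example both give the same pattern base. The function names are hypothetical.

```python
from collections import Counter

# Example transactions after dropping infrequent item d and sorting
# in header-table order a, b, c, e (minsup = 2).
sorted_transactions = [
    ["b", "e"],
    ["a", "b", "c", "e"],
    ["b", "c", "e"],
    ["a", "c"],
    ["a"],
]


def conditional_pattern_base(item, sorted_transactions):
    """Map each non-empty prefix that precedes `item` to its count."""
    base = Counter()
    for t in sorted_transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] += 1
    return base


def conditional_item_counts(base, minsup):
    """Item counts inside a pattern base, keeping only frequent items."""
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    return {i: c for i, c in counts.items() if c >= minsup}


base_e = conditional_pattern_base("e", sorted_transactions)
print(dict(base_e))
print(conditional_item_counts(base_e, minsup=2))
```

Item a occurs only once in e's pattern base, so it falls below minsup and only b (count 3) and c (count 2) remain, matching the node counts of e's conditional FP-tree. The conditional tree is a new structure, which is the cost the drawback above refers to.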

6. FP-Growth: Top-down mining (TD-FP-Growth)
– Nodes at upper levels are processed first
– Counts modified at upper levels are not used at lower levels
– Paths in the original FP-tree are reused for the conditional FP-trees
See the example that follows.

7. TD-FP-Growth
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
CT-tree and header table H (columns: entry value, count, side-link): a: 3, b: 3, c: 3, e: 3
[Figure: CT-tree with nodes root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes)]
Mining order: a, b, c, e

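8. TD-FP-Growth
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}; minsup = 2
CT-tree and header table H (columns: entry value, count, side-link): a: 3, b: 3, c: 3, e: 3
Sub-header table H_c (columns: entry value, count, side-link): a: 2, b: 2
[Figure: CT-tree with nodes root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes), with the annotations a: 2 and b: 1]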
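How a sub-header table such as H_c can be derived from the existing tree is sketched below in Python. This is an illustration under assumptions, not the authors' implementation: side-links are modelled as plain Python lists, the in-place count updates on tree nodes are omitted, and the names `Node`, `build_tree` and `sub_header_table` are hypothetical.

```python
from collections import defaultdict


class Node:
    """Tree node with a parent pointer so paths can be walked upward."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(sorted_transactions):
    """Build the prefix tree plus a header table.

    header[item] = {"count": total count, "nodes": side-link list of nodes}
    """
    root = Node(None, None)
    header = defaultdict(lambda: {"count": 0, "nodes": []})
    for t in sorted_transactions:
        node = root
        for item in t:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item]["nodes"].append(child)
            child.count += 1
            header[item]["count"] += 1
            node = child
    return root, header


def sub_header_table(item, header):
    """Walk up from every node of `item`, adding its count to each ancestor item.

    No new tree is built: only the paths of the original tree are read.
    """
    sub = defaultdict(int)
    for node in header[item]["nodes"]:
        ancestor = node.parent
        while ancestor.item is not None:  # stop at the root
            sub[ancestor.item] += node.count
            ancestor = ancestor.parent
    return dict(sub)


# Example transactions, frequent items only, sorted as a, b, c, e.
sorted_transactions = [["b", "e"], ["a", "b", "c", "e"], ["b", "c", "e"],
                       ["a", "c"], ["a"]]
root, header = build_tree(sorted_transactions)
print({item: entry["count"] for item, entry in header.items()})
print(sub_header_table("c", header))
```

For item c the upward walks visit the paths a-c, a-b-c and b-c, giving counts a: 2 and b: 2, the same values as the sub-header table H_c on the slide.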

9. Performance
Data sets from the UC Irvine Machine Learning Database Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html

Name of dataset | # of transactions | # of items in each transaction | Class distribution | # of distinct items
Dna-train | 2000 | 61 | 23.2%, 24.25%, 52.55% | 240
Connect-4 | 67557 | 43 | 9.55%, 24.62%, 65.83% | 126
Forest | 581012 | 13 | 0.47%, 1.63%, 2.99%, 3.53%, 6.15%, 36.36%, 48.76% | 15916

10. Performance

