Published by Dashawn Uselton. Modified about 1 year ago.

1
Mining Frequent Patterns Using FP-Growth Method
Ivan Tanasić
Department of Computer Engineering and Computer Science, School of Electrical Engineering, University of Belgrade

2
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
◦ Jiawei Han (UIUC)
◦ Jian Pei (Buffalo)
◦ Yiwen Yin (SFU)
◦ Runying Mao (Microsoft)

3
Problem Definition
Mining frequent patterns from a DB
◦ Frequent itemsets (milk + bread)
◦ Frequent sequential patterns (computer -> printer -> paper)
◦ Frequent structural patterns (subgraphs, subtrees)

4
Problem Importance 1/2
Basic data mining (DM) primitive
Used for mining data relationships
◦ Associations
◦ Correlations
Helps with basic DM tasks
◦ Classification
◦ Clustering

5
Problem Importance 2/2
Association rules
◦ buys("laptop") => buys("mouse") [support = 2%, confidence = 30%]
Support = % of all transactions that contain both items
Confidence (for a rule I1 => I2) = % of transactions containing I1 that also contain I2
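The two definitions above can be checked with a minimal sketch, assuming transactions are represented as Python frozensets. The function names `support` and `confidence` and the four toy baskets are hypothetical, chosen only for illustration; the support of a rule is taken as the support of the union of its two sides.

```python
from typing import FrozenSet, List

def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    covering = [t for t in transactions if antecedent <= t]
    if not covering:
        return 0.0
    return sum(1 for t in covering if consequent <= t) / len(covering)

# Hypothetical toy baskets, for illustration only.
baskets = [frozenset(b) for b in (
    {"laptop", "mouse"}, {"laptop"}, {"laptop", "bag"}, {"mouse"},
)]
print(support(baskets, frozenset({"laptop", "mouse"})))                  # 0.25
print(confidence(baskets, frozenset({"laptop"}), frozenset({"mouse"})))  # 0.3333333333333333
```

Here one of four baskets holds both items (support 25%), and one of the three laptop baskets also holds a mouse (confidence 1/3).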

6
Problem Trend
Speeding up Apriori with various techniques
New data structures (trees)
Association-rule-specific algorithms
Specific AR algorithms (OneR, ZeroR)
FP-Growth is still widely used

7
Existing Solutions 1/3 (Apriori)
Agrawal et al. (1994)
Apriori property (AP): all nonempty subsets of a frequent itemset must also be frequent
Starts from 1-itemsets
Join + prune (using AP + min support)
Generates a huge number of candidates
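The join + prune loop can be sketched roughly as follows. This is a simplified illustration, not Agrawal et al.'s exact procedure: the join unions any two frequent k-itemsets that differ in one item (instead of the usual sorted-prefix join), and the names `apriori_gen` and `apriori` are hypothetical. The data are the nine transactions from the FP-tree construction slides, with a minimum support count of 2.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori_gen(frequent_k: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Candidate (k+1)-itemsets from frequent k-itemsets: join, then prune."""
    level = list(frequent_k)
    # Join: union any two frequent k-itemsets that differ in exactly one item.
    joined = {a | b for i, a in enumerate(level) for b in level[i + 1:] if len(a | b) == k + 1}
    # Prune (Apriori property): every k-subset of a candidate must itself be frequent.
    return {c for c in joined if all(frozenset(s) in frequent_k for s in combinations(c, k))}

def apriori(transactions: List[FrozenSet[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise search; returns {itemset: support count} for all frequent itemsets."""
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:                      # first scan: 1-itemsets
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result, k = dict(frequent), 1
    while frequent:
        candidates = apriori_gen(set(frequent), k)
        # One full database scan per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# The nine transactions used in the FP-tree slides, min support count = 2.
db = [frozenset(t.split()) for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
for itemset, count in sorted(apriori(db, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The repeated full scan inside the `while` loop and the candidate sets it must count are exactly the costs that FP-Growth is designed to avoid.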

8
Existing Solutions 2/3 (ECLAT)
Zaki (2000)
Equivalence CLass Transformation
Vertical format: {item, TID_set} instead of {TID, itemset}
Intersects the TID_sets of candidates
TID_sets hold the support information (no further DB scans)
Still generates candidates
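A minimal sketch of the vertical idea, assuming TIDs are numbered from 1 in input order: one scan builds the {item: TID_set} map, and from then on the support of any candidate is just the size of an intersection. The function names `to_vertical` and `eclat` are hypothetical, and the sketch omits optimizations from Zaki's paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """One scan: horizontal {TID: itemset} rows -> vertical {item: TID_set}."""
    vertical: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def eclat(prefix: FrozenSet[str], klass: List[Tuple[str, Set[int]]],
          min_support: int, out: Dict[FrozenSet[str], int]) -> None:
    """Depth-first search over one equivalence class (itemsets sharing `prefix`)."""
    for i, (item, tids) in enumerate(klass):
        pattern = prefix | {item}
        out[pattern] = len(tids)                  # support = |TID_set|, no DB scan
        # New class: extend `pattern` by each later item by intersecting TID_sets.
        extensions = [(other, tids & other_tids) for other, other_tids in klass[i + 1:]]
        frequent_ext = [(o, s) for o, s in extensions if len(s) >= min_support]
        if frequent_ext:
            eclat(pattern, frequent_ext, min_support, out)

db = [set(t.split()) for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
vertical = to_vertical(db)
start = sorted(((item, tids) for item, tids in vertical.items() if len(tids) >= 2),
               key=lambda pair: pair[0])
out: Dict[FrozenSet[str], int] = {}
eclat(frozenset(), start, 2, out)
print(len(out), out[frozenset({"I1", "I3"})])
```

Each intersection still materializes a candidate itemset, which is the "still generates candidates" point on the slide.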

9
Existing Solutions 3/3 (TreeProjection)
Agarwal et al. (2001)
Builds a lexicographic tree and projects the DB into sub-DBs based on the patterns mined so far
Recursively mines the sub-DBs
Less scalable than FP-Growth
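The core idea of projecting along a lexicographic tree can be illustrated with the sketch below. It is not Agarwal et al.'s TreeProjection algorithm (the published method has further counting optimizations not shown here); it only shows how each node's sub-database keeps the transactions containing that node's pattern, restricted to lexicographically later items. The function name `mine_projected` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

def mine_projected(prefix: Tuple[str, ...], db: List[List[str]], min_support: int,
                   out: Dict[FrozenSet[str], int]) -> None:
    """Grow `prefix` one item at a time using its projected database `db`."""
    counts: Dict[str, int] = {}
    for t in db:                                  # one pass over the (small) projection
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    for item in sorted(counts):                   # children in lexicographic order
        if counts[item] < min_support:
            continue
        pattern = prefix + (item,)
        out[frozenset(pattern)] = counts[item]
        # Project: transactions containing `item`, keeping only lexicographically later items.
        projected = [t[t.index(item) + 1:] for t in db if item in t]
        mine_projected(pattern, [t for t in projected if t], min_support, out)

# Each transaction is kept as a sorted list of distinct items.
db = [sorted(t.split()) for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
out: Dict[FrozenSet[str], int] = {}
mine_projected((), db, 2, out)
print(len(out))
```

Because a pattern is only extended with later items, every itemset is produced exactly once, at one node of the lexicographic tree.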

10
FP-Tree construction 1/6
Items sorted by descending support count
Min support = 2
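The first database scan behind this slide can be sketched as follows: count every item, drop items below the minimum support, and reorder each transaction by descending support. The input is the nine transactions shown on slide 15, written here with items in item-number order. Breaking the I1/I3 tie (both have support 6) by item name is an assumption of this sketch that happens to reproduce the slides' order; `first_scan` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

def first_scan(transactions: List[List[str]],
               min_support: int) -> Tuple[Dict[str, int], List[List[str]]]:
    """Count supports, drop infrequent items, sort each transaction by descending support."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Assumption: ties in support are broken by item name (I1 before I3).
    ordered = [sorted((i for i in t if i in frequent), key=lambda item: (-frequent[item], item))
               for t in transactions]
    return frequent, ordered

db = [t.split() for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
frequent, ordered = first_scan(db, 2)
print(frequent)
for t in ordered:
    print(t)
```

With min support = 2 every item survives here (I2: 7, I1: 6, I3: 6, I4: 2, I5: 2), so only the reordering is visible; on other data, infrequent items would simply vanish from the transactions.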

11
FP-Tree construction 2/6
Items sorted by descending support count
T1 = {I2, I1, I5}

12
FP-Tree construction 3/6
Items sorted by descending support count
T1 = {I2, I1, I5}
T2 = {I2, I4}

13
FP-Tree construction 4/6
Items sorted by descending support count
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}

14
FP-Tree construction 5/6
Items sorted by descending support count
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}

15
FP-Tree construction 6/6
Items sorted by descending support count
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}
T5 = {I1, I3}
T6 = {I2, I3}
T7 = {I1, I3}
T8 = {I2, I1, I3, I5}
T9 = {I2, I1, I3}
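The insertion steps shown across the six construction slides can be sketched as below: each sorted transaction is inserted as a path from the root, shared prefixes only increment counts, and every newly created node is appended to a per-item header list (the node-links). The class `FPNode` and the functions `build_fp_tree` and `dump` are hypothetical names, and ties are again broken by item name as an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional

class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[List[str]], min_support: int):
    """Two scans: count supports, then insert each sorted transaction as a path."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {}          # item -> node-links (all nodes of that item)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent), key=lambda x: (-frequent[x], x)):
            child = node.children.get(item)
            if child is None:                     # no shared prefix: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1                      # shared prefix: only the count grows
            node = child
    return root, header

def dump(node: FPNode, depth: int = 0) -> None:
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

db = [t.split() for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
root, header = build_fp_tree(db, 2)
dump(root)
```

For these nine transactions the tree has 10 nodes for 23 item occurrences, which is the compactness argument in miniature.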

16
Mining of the FP-Tree 1/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2, I1: 1}, {I2, I1, I3: 1}} | {I2: 2, I1: 2} | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
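The conditional pattern base in this table can be reproduced with a short sketch. Following an item's node-links and climbing to the root yields the same prefix paths, with the same counts, as taking the part of each sorted transaction that precedes the item, so the sketch reads the sorted transactions of slide 15 directly (an equivalence that assumes the transactions are sorted in the same order used to build the tree). The names `conditional_pattern_base` and `conditional_items` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def conditional_pattern_base(ordered: List[List[str]], item: str) -> Dict[Tuple[str, ...], int]:
    """Prefix paths that precede `item`, with counts (what following node-links yields)."""
    base: Counter = Counter()
    for t in ordered:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:                                # a node directly under the root has no prefix
                base[prefix] += 1
    return dict(base)

def conditional_items(base: Dict[Tuple[str, ...], int], min_support: int) -> Dict[str, int]:
    """Item supports inside the conditional pattern base, keeping only frequent ones."""
    counts: Counter = Counter()
    for path, n in base.items():
        for i in path:
            counts[i] += n
    return {i: c for i, c in counts.items() if c >= min_support}

# Transactions already sorted by descending support (slide 15).
ordered = [t.split() for t in (
    "I2 I1 I5", "I2 I4", "I2 I3", "I2 I1 I4", "I1 I3",
    "I2 I3", "I1 I3", "I2 I1 I3 I5", "I2 I1 I3",
)]
for x in ("I5", "I4", "I3", "I1"):
    base = conditional_pattern_base(ordered, x)
    print(x, base, conditional_items(base, 2))
```

For I3 this gives I2: 4 and I1: 4; the table shows I1 split across the two branches {I2: 4, I1: 2} and {I1: 2}, which is why {I1, I3} has support 4.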

17
Mining of the FP-Tree 2/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2, I1: 1}, {I2, I1, I3: 1}} | {I2: 2, I1: 2} | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {{I2, I1: 1}, {I2: 1}} | {I2: 2} | {I2, I4: 2}

18
Mining of the FP-Tree 3/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2, I1: 1}, {I2, I1, I3: 1}} | {I2: 2, I1: 2} | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {{I2, I1: 1}, {I2: 1}} | {I2: 2} | {I2, I4: 2}
I3 | {{I2, I1: 2}, {I2: 2}, {I1: 2}} | {I2: 4, I1: 2}, {I1: 2} | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}

19
Mining of the FP-Tree 4/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2, I1: 1}, {I2, I1, I3: 1}} | {I2: 2, I1: 2} | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {{I2, I1: 1}, {I2: 1}} | {I2: 2} | {I2, I4: 2}
I3 | {{I2, I1: 2}, {I2: 2}, {I1: 2}} | {I2: 4, I1: 2}, {I1: 2} | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1 | {{I2: 4}} | {I2: 4} | {I2, I1: 4}
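The whole table can be generated by a compact recursive sketch of FP-Growth: build a tree, and for each item (bottom of the header table first) emit suffix + item, collect its conditional pattern base by following node-links up to the root, and recurse on the conditional FP-tree built from that base. Representing a pattern base as (path, count) pairs and naming the helpers `Node`, `build_tree` and `fp_growth` are choices of this sketch, not necessarily how Han et al. implement it; ties are broken by item name as an assumption.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted: List[Tuple[List[str], int]], min_support: int):
    """Build an FP-tree from (items, count) pairs; return its header table and item supports."""
    counts: Counter = Counter()
    for items, n in weighted:
        for i in items:
            counts[i] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root, header = Node(None, None), {}
    for items, n in weighted:
        node = root
        for item in sorted((i for i in items if i in frequent), key=lambda x: (-frequent[x], x)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, frequent

def fp_growth(weighted, min_support, suffix=frozenset(), out=None) -> Dict[FrozenSet[str], int]:
    """Recursively mine conditional FP-trees; returns {itemset: support count}."""
    out = {} if out is None else out
    header, frequent = build_tree(weighted, min_support)
    # Visit items from the bottom of the header table up (least frequent first).
    for item in reversed(sorted(frequent, key=lambda x: (-frequent[x], x))):
        pattern = suffix | {item}
        out[pattern] = frequent[item]
        base = []                                  # conditional pattern base of `item`
        for node in header[item]:                  # follow node-links
            path, parent = [], node.parent
            while parent.item is not None:         # climb to the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:                                   # mine the conditional FP-tree
            fp_growth(base, min_support, pattern, out)
    return out

db = [t.split() for t in (
    "I1 I2 I5", "I2 I4", "I2 I3", "I1 I2 I4", "I1 I3",
    "I2 I3", "I1 I3", "I1 I2 I3 I5", "I1 I2 I3",
)]
patterns = fp_growth([(t, 1) for t in db], 2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

No candidate itemsets are generated: every pattern printed comes directly from a (conditional) tree, and the results match the table plus the five frequent single items.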

20
How much better is it? 1/3
Runtime on sparse data

21
How much better is it? 2/3
Runtime on mixed data

22
How much better is it? 3/3
Compactness

23
Is It Original?
Many methods try to improve Apriori
◦ Hashing
◦ Transaction reduction
◦ Partitioning
◦ Sampling
TreeProjection uses a similar structure, but it is still a different method

24
Importance over time
Basic primitive (a strong foundation for a tall building)
Performance becomes very important as databases grow huge
So does scalability
FP-Growth offers both performance and scalability

25
Conclusion
An important method for solving important DM tasks
Fast
Compact
Scalable (DB projection / tree on disk)

