Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical.


1 Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical Engineering, University of Belgrade

2 Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
◦ Jiawei Han (UIUC)
◦ Jian Pei (Buffalo)
◦ Yiwen Yin (SFU)
◦ Runying Mao (Microsoft)

3 Problem Definition
Mining frequent patterns from a DB
◦ Frequent itemsets (milk + bread)
◦ Frequent sequential patterns (computer -> printer -> paper)
◦ Frequent structural patterns (subgraphs, subtrees)

4 Problem Importance 1/2
Basic data mining (DM) primitive
Used for mining data relationships
◦ Associations
◦ Correlations
Helps with basic DM tasks
◦ Classification
◦ Clustering

5 Problem Importance 2/2
Association rules
◦ buys(“laptop”) => buys(“mouse”) [support = 2%, confidence = 30%]
For a rule I1 => I2:
Support = % of all transactions that contain both I1 and I2
Confidence = % of the transactions containing I1 that also contain I2
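The two definitions above can be checked on a small dataset. The following is a minimal sketch, assuming the nine-transaction example used in the FP-Tree construction slides of this deck; the function names `support` and `confidence` are illustrative, not taken from the presentation.

```python
from fractions import Fraction

# The nine example transactions from the FP-Tree construction slides.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"I1", "I2"}, TRANSACTIONS)          # 4 of 9 transactions
rule_confidence = confidence({"I1"}, {"I2"}, TRANSACTIONS)  # 4 of the 6 containing I1
print(rule_support, rule_confidence)
```

Exact fractions are used here only to keep the arithmetic easy to verify by hand; percentages work just as well.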

6 Problem Trend
Speeding up Apriori with additional techniques (hashing, transaction reduction, partitioning, sampling)
New data structures (trees)
Association-rule-specific algorithms (OneR, ZeroR)
FP-Growth still widely used

7 Existing Solutions 1/3 (Apriori)
Agrawal et al. (1994)
Apriori property (AP): all nonempty subsets of a frequent itemset must also be frequent
Starts from 1-itemsets
Join + prune (using AP + minimum support)
Generates a huge number of candidates
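The join and prune steps can be sketched as follows. This is a simplified illustration rather than the original algorithm's exact candidate generation: it joins any two frequent (k-1)-itemsets whose union has k items, then prunes with the Apriori property. The data is the nine-transaction example from the FP-Tree construction slides, and the function name `apriori` is an assumption.

```python
from itertools import combinations

TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): absolute support count} for all frequent itemsets."""
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = set()
        # Join: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset must itself be frequent (Apriori property).
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

frequent_itemsets = apriori(TRANSACTIONS, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

Each level needs a full pass over the transactions to count candidates, which is the cost the slide refers to.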

8 Existing Solutions 2/3 (ECLAT)
Zaki (2000)
Equivalence CLass Transformation
Vertical format: {item, TID_set} instead of {TID, itemset}
Intersects the TID_sets of candidates
TID_sets hold the support information (no further DB scans)
Still generates candidates
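A minimal sketch of the vertical format and TID_set intersection, assuming the nine-transaction example from the FP-Tree construction slides. The helper names `to_vertical` and `eclat` are illustrative, and the depth-first recursion shown is one common way to organize the search, not necessarily the layout of the original paper.

```python
TRANSACTIONS = {
    "T1": {"I1", "I2", "I5"}, "T2": {"I2", "I4"}, "T3": {"I2", "I3"},
    "T4": {"I1", "I2", "I4"}, "T5": {"I1", "I3"}, "T6": {"I2", "I3"},
    "T7": {"I1", "I3"}, "T8": {"I1", "I2", "I3", "I5"}, "T9": {"I1", "I2", "I3"},
}

def to_vertical(transactions):
    """Convert {TID: itemset} into the vertical format {item: TID_set}."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def eclat(items, min_support, prefix=(), out=None):
    """Depth-first search; `items` is a list of (item, TID_set) pairs, all frequent."""
    if out is None:
        out = {}
    for i, (item, tids) in enumerate(items):
        pattern = prefix + (item,)
        # The size of the TID_set is the support; no database scan is needed.
        out[frozenset(pattern)] = len(tids)
        extensions = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids  # TID_set intersection
            if len(shared) >= min_support:
                extensions.append((other, shared))
        eclat(extensions, min_support, pattern, out)
    return out

vertical = to_vertical(TRANSACTIONS)
start = sorted(((i, t) for i, t in vertical.items() if len(t) >= 2), key=lambda p: p[0])
patterns = eclat(start, min_support=2)
print(len(patterns), patterns[frozenset({"I2", "I3"})])
```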

9 Existing Solutions 3/3 (TreeProjection)
Agarwal et al. (2001)
Creates a lexicographical tree and projects the DB into sub-DBs based on the patterns mined so far
Recursively mines the sub-databases
Less scalable than FP-Growth

10 FP-Tree construction 1/6
Descending support sort
Min support = 2
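The preprocessing step named on this slide can be sketched as below: count item supports, drop items below the minimum support, and sort each transaction by descending support. The input sets are the unsorted versions of the transactions listed on the following slides; breaking ties by item name is an assumption that happens to reproduce the order shown there.

```python
from collections import Counter

RAW_TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def sort_by_support(transactions, min_support):
    """Drop infrequent items and order each transaction by descending support."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, c in counts.items() if c >= min_support}
    # Ties are broken by item name (an assumption, see the text above).
    return [
        sorted((i for i in t if i in frequent), key=lambda i: (-counts[i], i))
        for t in transactions
    ]

sorted_transactions = sort_by_support(RAW_TRANSACTIONS, min_support=2)
for number, items in enumerate(sorted_transactions, start=1):
    print(f"T{number} = {items}")
```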

11 FP-Tree construction 2/6
Descending support sort
T1 = {I2, I1, I5}

12 FP-Tree construction 3/6
Descending support sort
T1 = {I2, I1, I5}
T2 = {I2, I4}

13 FP-Tree construction 4/6
Descending support sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}

14 FP-Tree construction 5/6
Descending support sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}

15 FP-Tree construction 6/6
Descending support sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}
T5 = {I1, I3}
T6 = {I2, I3}
T7 = {I1, I3}
T8 = {I2, I1, I3, I5}
T9 = {I2, I1, I3}
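Inserting the sorted transactions one by one can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `FPNode` and the function `build_fp_tree` are hypothetical names, and the header table is kept as a plain list of nodes per item instead of a linked list of node-links.

```python
class FPNode:
    """One node of the FP-tree: an item, a count, a parent link and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(sorted_transactions):
    """Insert support-sorted transactions; shared prefixes share nodes."""
    root = FPNode(None, None)
    header = {}  # item -> all nodes carrying that item
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header

# The nine transactions exactly as listed on the slide (already sorted).
SORTED_TRANSACTIONS = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]

root, header = build_fp_tree(SORTED_TRANSACTIONS)

def show(node, depth=0):
    """Print the tree with one indented line per node."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)
```

Because seven of the nine transactions start with I2, they share a single I2 node, which is where the compactness of the structure comes from.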

16 Mining of the FP-Tree 1/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1}, {I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}

17 Mining of the FP-Tree 2/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1}, {I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1}, {I2:1}} | {I2:2} | {I2,I4:2}

18 Mining of the FP-Tree 3/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1}, {I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1}, {I2:1}} | {I2:2} | {I2,I4:2}
I3 | {{I2,I1:2}, {I2:2}, {I1:2}} | {I2:4, I1:2}, {I1:2} | {I2,I3:4}, {I1,I3:4}, {I2,I1,I3:2}

19 Mining of the FP-Tree 4/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1}, {I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1}, {I2:1}} | {I2:2} | {I2,I4:2}
I3 | {{I2,I1:2}, {I2:2}, {I1:2}} | {I2:4, I1:2}, {I1:2} | {I2,I3:4}, {I1,I3:4}, {I2,I1,I3:2}
I1 | {{I2:4}} | {I2:4} | {I2,I1:4}
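The mining procedure summarized in the table can be sketched recursively: for each item, collect its prefix paths (the conditional pattern base), build a conditional FP-tree from them, and recurse. This is a simplified sketch under stated assumptions: the names `Node`, `build_tree` and `fp_growth` are illustrative, the single-path shortcut of the original algorithm is omitted, and ties in support are broken by item name, so a conditional tree may be shaped differently from the one in the table while yielding the same patterns.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_paths, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in weighted_paths:
        for item in items:
            support[item] += count
    keep = {i for i, c in support.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_paths:
        node = root
        ordered = sorted((i for i in items if i in keep), key=lambda i: (-support[i], i))
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, support

def fp_growth(weighted_paths, min_support, suffix=frozenset(), out=None):
    """Return {frozenset(pattern): support count}, single items included."""
    if out is None:
        out = {}
    header, support = build_tree(weighted_paths, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        out[pattern] = support[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path[::-1], node.count))
        fp_growth(base, min_support, pattern, out)
    return out

TRANSACTIONS = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
patterns = fp_growth([(t, 1) for t in TRANSACTIONS], min_support=2)
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

The output also contains the five frequent single items, which the table leaves out; the multi-item patterns and their counts agree with the last column of the table.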

20 How much better is it 1/3?
Runtime on sparse data (figure)

21 How much better is it 2/3?
Runtime on mixed data (figure)

22 How much better is it 3/3?
Compactness (figure)

23 Is it Original?
A lot of methods try to improve Apriori
◦ Hashing
◦ Transaction reduction
◦ Partitioning
◦ Sampling
TreeProjection uses a similar structure, but it is still a different method

24 Importance over time
Basic primitive (a strong foundation for a tall building)
Performance becomes very important as databases get huge
So does scalability
FP-Growth offers both performance and scalability

25 Conclusion
An important method for solving important DM tasks
Fast
Compact
Scalable (DB projection / tree on disk)

26 Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical Engineering, University of Belgrade

