Published by Dashawn Uselton. Modified over 2 years ago.

1
Mining Frequent Patterns Using FP-Growth Method
Ivan Tanasić (itanasic@gmail.com)
Department of Computer Engineering and Computer Science, School of Electrical Engineering, University of Belgrade

2
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
◦ Jiawei Han (UIUC)
◦ Jian Pei (Buffalo)
◦ Yiwen Yin (SFU)
◦ Runying Mao (Microsoft)

3
Problem Definition
Mining frequent patterns from a DB
◦ Frequent itemsets (milk + bread)
◦ Frequent sequential patterns (computer -> printer -> paper)
◦ Frequent structural patterns (subgraphs, subtrees)

4
Problem Importance 1/2
Basic DM primitive
Used for mining data relationships
◦ Associations
◦ Correlations
Helps with basic DM tasks
◦ Classification
◦ Clustering

5
Problem Importance 2/2
Association rules
◦ buys(“laptop”) => buys(“mouse”) [support = 2%, confidence = 30%]
For a rule I1 => I2:
◦ Support = % of all transactions that contain both I1 and I2
◦ Confidence = % of transactions containing I1 that also contain I2
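The two measures can be checked by hand on a small database. The sketch below is illustrative only: it reuses the nine transactions from the FP-tree construction slides of this deck, and the helper names `support` and `confidence` are assumptions, not part of the original presentation.

```python
# Nine transactions, as listed on the FP-tree construction slides.
TRANSACTIONS = [
    {"I2", "I1", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I2", "I1", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I2", "I1", "I3", "I5"},
    {"I2", "I1", "I3"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"I1", "I2"}, TRANSACTIONS)
rule_confidence = confidence({"I1"}, {"I2"}, TRANSACTIONS)
print(f"I1 => I2 [support = {rule_support:.0%}, confidence = {rule_confidence:.0%}]")
```

Here I1 and I2 occur together in 4 of the 9 transactions and I1 occurs in 6, so the rule I1 => I2 has support 4/9 and confidence 4/6.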

6
Problem Trend
Speeding up Apriori with additional techniques
New data structures (trees)
Association-rule-specific algorithms (OneR, ZeroR)
FP-Growth still widely used

7
Existing Solutions 1/3 (Apriori)
Agrawal et al. (1994)
Apriori property (AP): all nonempty subsets of a frequent itemset must also be frequent
Starts from 1-itemsets
Join + prune (using AP + min supp)
Generates a huge number of candidates
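The join and prune steps can be sketched as follows. This is a minimal, unoptimized illustration rather than the algorithm exactly as published: the function names are assumptions, the transactions are the nine from the FP-tree construction slides, and min support is taken as an absolute count of 2.

```python
from itertools import combinations

TRANSACTIONS = [
    {"I2", "I1", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I2", "I1", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I2", "I1", "I3", "I5"}, {"I2", "I1", "I3"},
]
MIN_SUPPORT = 2  # absolute count


def count_support(candidates, transactions):
    """One database scan: count the transactions containing each candidate."""
    return {c: sum(c <= t for t in transactions) for c in candidates}


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    items = {item for t in transactions for item in t}
    counts = count_support({frozenset([i]) for i in items}, transactions)
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = count_support(candidates, transactions)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


frequent_itemsets = apriori(TRANSACTIONS, MIN_SUPPORT)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Every level needs its own scan of the database to count candidates, which is the cost FP-Growth is designed to avoid.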

8
Existing Solutions 2/3 (ECLAT)
Zaki (2000)
Equivalence CLass Transformation
Vertical format: {item, TID_set} instead of {TID, itemset}
Intersects TID_sets of candidates
TID_sets hold the support info (no further scans)
Still generates candidates
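A rough sketch of the vertical format and TID_set intersection is given below. It is a simplified depth-first variant written for illustration; the names `to_vertical` and `eclat` are assumptions, TIDs are numbered 1 to 9 in slide order, and min support is an absolute count of 2.

```python
TRANSACTIONS = [
    {"I2", "I1", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I2", "I1", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I2", "I1", "I3", "I5"}, {"I2", "I1", "I3"},
]
MIN_SUPPORT = 2  # absolute count


def to_vertical(transactions):
    """Convert {TID, itemset} rows into the vertical {item: TID_set} format."""
    vertical = {}
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, items, min_support, out):
    """Depth-first search; `items` is a list of (item, TID_set) pairs
    that are all frequent together with `prefix`."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix | {item}
        out[itemset] = len(tids)  # support is just the TID_set size
        extensions = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids  # intersection replaces a database scan
            if len(shared) >= min_support:
                extensions.append((other, shared))
        eclat(itemset, extensions, min_support, out)


vertical = to_vertical(TRANSACTIONS)
start = sorted(
    ((item, tids) for item, tids in vertical.items() if len(tids) >= MIN_SUPPORT),
    key=lambda pair: pair[0],
)
frequent = {}
eclat(frozenset(), start, MIN_SUPPORT, frequent)
print(sorted(vertical["I5"]))  # TIDs of the transactions that contain I5
print(len(frequent))
```

The database is read once to build the TID_sets; after that, every support count comes from a set intersection.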

9
Existing Solutions 3/3 (TreeProjection)
Agarwal et al. (2001)
Creates a lexicographical tree and projects the DB into sub-DBs based on the patterns mined so far
Recursively mines the sub-databases
Less scalable than FP-Growth

10
FP-Tree construction 1/6
Desc. supp. sort
Min support = 2

11
FP-Tree construction 2/6
Desc. supp. sort
T1 = {I2, I1, I5}

12
FP-Tree construction 3/6
Desc. supp. sort
T1 = {I2, I1, I5}
T2 = {I2, I4}

13
FP-Tree construction 4/6
Desc. supp. sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}

14
FP-Tree construction 5/6
Desc. supp. sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}

15
FP-Tree construction 6/6
Desc. supp. sort
T1 = {I2, I1, I5}
T2 = {I2, I4}
T3 = {I2, I3}
T4 = {I2, I1, I4}
T5 = {I1, I3}
T6 = {I2, I3}
T7 = {I1, I3}
T8 = {I2, I1, I3, I5}
T9 = {I2, I1, I3}
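The construction steps on the preceding slides can be sketched as below. The class and function names (`FPNode`, `build_fp_tree`) are illustrative assumptions, and ties in support are broken by item name, which reproduces the item order used on the slides (I2, I1, I3, I4, I5).

```python
from collections import Counter

TRANSACTIONS = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
MIN_SUPPORT = 2  # absolute count, as on the slides


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    # Scan 1: count item supports and keep only the frequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Descending support; ties broken by item name (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header = {item: [] for item in order}  # item -> all nodes holding that item
    # Scan 2: insert each sorted transaction, sharing common prefixes.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


def show(node, depth=0):
    """Print the tree with one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


root, header, order = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
print(order)
show(root)
```

Only two passes over the database are needed: one to count items and one to insert the sorted transactions. Shared prefixes such as I2, I1 are stored once with a count, which is where the compactness comes from.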

16
Mining of the FP-Tree 1/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1},{I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}

17
Mining of the FP-Tree 2/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1},{I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1},{I2:1}} | {I2:2} | {I2,I4:2}

18
Mining of the FP-Tree 3/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1},{I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1},{I2:1}} | {I2:2} | {I2,I4:2}
I3 | {{I2,I1:2},{I2:2},{I1:2}} | {I2:4, I1:2}, {I1:2} | {I2,I3:4}, {I1,I3:4}, {I2,I1,I3:2}

19
Mining of the FP-Tree 4/4
Item | Conditional pattern base | Conditional FP-tree | Frequent patterns generated
I5 | {{I2,I1:1},{I2,I1,I3:1}} | {I2:2, I1:2} | {I2,I5:2}, {I1,I5:2}, {I2,I1,I5:2}
I4 | {{I2,I1:1},{I2:1}} | {I2:2} | {I2,I4:2}
I3 | {{I2,I1:2},{I2:2},{I1:2}} | {I2:4, I1:2}, {I1:2} | {I2,I3:4}, {I1,I3:4}, {I2,I1,I3:2}
I1 | {{I2:4}} | {I2:4} | {I2,I1:4}
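The mining table can be reproduced with a short recursive sketch. This is one possible simplified variant, not necessarily the authors' exact procedure: each conditional pattern base is treated as a small weighted database and a fresh tree is built from it, re-sorting items by their conditional support. All names (`Node`, `build_tree`, `conditional_pattern_base`, `fp_growth`) are illustrative assumptions.

```python
from collections import Counter, defaultdict

TRANSACTIONS = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
MIN_SUPPORT = 2  # absolute count


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_paths, min_support):
    """Build an FP-tree from (items, count) pairs; return header, order, counts."""
    counts = Counter()
    for items, n in weighted_paths:
        for item in set(items):
            counts[item] += n
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, order, counts


def conditional_pattern_base(item, header):
    """Prefix paths leading to each node of `item`, weighted by that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base


def fp_growth(weighted_paths, min_support, suffix=frozenset(), out=None):
    """Return {frozenset: support count} for all frequent patterns."""
    out = {} if out is None else out
    header, order, counts = build_tree(weighted_paths, min_support)
    for item in reversed(order):  # start from the least frequent item
        pattern = suffix | {item}
        out[pattern] = counts[item]
        fp_growth(conditional_pattern_base(item, header), min_support, pattern, out)
    return out


database = [(t, 1) for t in TRANSACTIONS]
header, order, counts = build_tree(database, MIN_SUPPORT)
base_i5 = conditional_pattern_base("I5", header)
patterns = fp_growth(database, MIN_SUPPORT)
print(base_i5)
print(len(patterns))
```

No candidate itemsets are generated and tested against the database; every pattern is read off a conditional tree, and its count is the item's support inside that conditional database.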

20
How much better is it? 1/3
[Figure: runtime on sparse data]

21
How much better is it? 2/3
[Figure: runtime on mixed data]

22
How much better is it? 3/3
[Figure: compactness]

23
Is it Original?
A lot of methods try to improve Apriori
◦ Hashing
◦ Transaction reduction
◦ Partitioning
◦ Sampling
TreeProjection uses a similar structure, but it is still a different method

24
Importance over time
Basic primitive (a strong foundation for a tall building)
Performance becomes very important as databases grow huge
So does scalability
FP-Growth offers both performance and scalability

25
Conclusion
An important method for solving important DM tasks
Fast
Compact
Scalable (DB projection / tree on disk)

26
Mining Frequent Patterns Using FP-Growth Method
Ivan Tanasić (itanasic@gmail.com)
Department of Computer Engineering and Computer Science, School of Electrical Engineering, University of Belgrade
