Mining Frequent Patterns without Candidate Generation.

Slides:

Advertisements

Similar presentations

Mining Association Rules

Advertisements

Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.

CSE 634 Data Mining Techniques

Association rules and frequent itemsets mining

Graph Mining Laks V.S. Lakshmanan

The FP-Growth/Apriori Debate Jeffrey R. Ellis CSE 300 – 01 April 11, 2002.

732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña FP grow algorithm Correlation analysis.

FP-Growth algorithm Vasiljevic Vladica,

FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.

Data Mining Association Analysis: Basic Concepts and Algorithms

FPtree/FPGrowth (Complete Example). First scan – determine frequent 1- itemsets, then build header B8 A7 C7 D5 E3.

CPS : Information Management and Mining

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Data Mining: Association Rule Mining CSE880: Database Systems.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.

Data Mining Association Analysis: Basic Concepts and Algorithms

2015年5月25日星期一 2015年5月25日星期一 2015年5月25日星期一 Data Mining: Concepts and Techniques1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic.

Mining Association Rules in Large Databases

FP-growth. Challenges of Frequent Pattern Mining Improving Apriori Fp-growth Fp-tree Mining frequent patterns with FP-tree Visualization of Association.

Data Mining Association Analysis: Basic Concepts and Algorithms

1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.

Mining Frequent patterns without candidate generation Jiawei Han, Jian Pei and Yiwen Yin.

Association Analysis: Basic Concepts and Algorithms.

Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.

Data Mining Association Analysis: Basic Concepts and Algorithms

FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.

Frequent-Pattern Tree. 2 Bottleneck of Frequent-pattern Mining  Multiple database scans are costly  Mining long patterns needs many passes of scanning.

Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,

1 1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 6 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.

© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.

SEG Tutorial 2 – Frequent Pattern Mining.

Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.

Ch5 Mining Frequent Patterns, Associations, and Correlations

Mining Frequent Patterns without Candidate Generation Presented by Song Wang. March 18 th, 2009 Data Mining Class Slides Modified From Mohammed and Zhenyu’s.

Methods to Improve Apriori’s Efficiency  Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot.

Jiawei Han, Jian Pei, and Yiwen Yin School of Computing Science Simon Fraser University Mining Frequent Patterns without Candidate Generation SIGMOD 2000.

AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar.

Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX By By U.P.Pushpavalli U.P.Pushpavalli II Year ME(CSE) II Year ME(CSE)

Chapter 4 – Frequent Pattern Mining Shuaiqiang Wang ( 王帅强 ) School of Computer Science and Technology Shandong University of Finance and Economics Homepage:

Mining Frequent Patterns without Candidate Generation : A Frequent-Pattern Tree Approach 指導教授：廖述賢博士報告人：朱佩慧班級：管科所博一.

Frequent Pattern  交易資料庫中頻繁的被一起購買的產品  可以做為推薦產品、銷售決策的依據  兩大演算法 Apriori FP-Tree.

Parallel Mining Frequent Patterns: A Sampling-based Approach Shengnan Cong.

Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?

Frequent itemset mining and temporal extensions Sunita Sarawagi

KDD’09,June 28-July 1,2009,Paris,France Copyright 2009 ACM Frequent Pattern Mining with Uncertain Data.

CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.

1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.

1 Data Mining: Mining Frequent Patterns, Association and Correlations.

Association Analysis (3)

Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.

2016年6月14日星期二 2016年6月14日星期二 2016年6月14日星期二 Data Mining: Concepts and Techniques1 Mining Frequent Patterns, Associations, and Correlations (Chapter 5)

Association Rule Mining CENG 514 Data Mining July 2,

MapReduce MapReduce is one of the most popular distributed programming models Model has two phases: Map Phase: Distributed processing based on key, value.

Reducing Number of Candidates

Data Mining: Concepts and Techniques

Market Basket Analysis and Association Rules

Mining Association Rules in Large Databases

Association Rule Mining

Mining Access Pattrens Efficiently from Web Logs Jian Pei, Jiawei Han, Behzad Mortazavi-asl, and Hua Zhu 2000년 5월 26일 DE Lab. 윤지영.

Find Patterns Having P From P-conditional Database

COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong

732A02 Data Mining - Clustering and Association Analysis

Mining Frequent Patterns without Candidate Generation

Frequent-Pattern Tree

FP-Growth Wenlong Zhang.

Mining Association Rules in Large Databases

Presentation transcript:

Mining Frequent Patterns without Candidate Generation

2 Frequent Pattern Mining Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (item sets) with support no less than ξ. Problem TIDItems bought 100{f, a, c, d, g, i, m, p} 200{a, b, c, f, l, m, o} 300 {b, f, h, j, o} 400 {b, c, k, s, p} 500 {a, f, c, e, l, p, m, n} DB: Minimum support: ξ =3 Input: Output : all frequent patterns, i.e., f, a, …, fa, fac, fam, … Problem: How to efficiently find all frequent patterns?

3 Outline Review: –Apriori-like methods Overview: –FP-tree based mining method FP-tree: –Construction, structure and advantages FP-growth: –FP-tree  conditional pattern bases  conditional FP-tree  frequent patterns Experiments Discussion: –Improvement of FP-growth Conclusion

4 The core of the Apriori algorithm: –Use frequent (k – 1)-itemsets (L k-1 ) to generate candidates of frequent k-itemsets C k –Scan database and count each pattern in C k, get frequent k- itemsets ( L k ). E.g., Review TIDItems bought 100{f, a, c, d, g, i, m, p} 200{a, b, c, f, l, m, o} 300 {b, f, h, j, o} 400 {b, c, k, s, p} 500 {a, f, c, e, l, p, m, n} Apriori iteration C1f,a,c,d,g,i,m,p,l,o,h,j,k,s,b,e,n L1f, a, c, m, b, p C2 fa, fc, fm, fp, ac, am, …bp L2 fa, fc, fm, … … Apriori

5 Performance Bottlenecks of Apriori The bottleneck of Apriori: candidate generation –Huge candidate sets: 10 4 frequent 1-itemset will generate 10 7 candidate 2- itemsets To discover a frequent pattern of size 100, e.g., {a 1, a 2, …, a 100 }, one needs to generate  candidates. –Multiple scans of database Review

6 Observations Perform one scan of DB to identify the sets of frequent items Avoid repeatedly scanning DB by using some compact structure. E.g., 1) merge transactions having identical frequent item sets; 2) merge the shared prefix among different frequent item sets. Overview: FP-tree based method TIDItems bought 100{f, a, c, d, g, i, m, p} 200{f, a, c, d, g, i, m, p} 300 {f, a, c, d, e, f, g, h} TIDItems bought N1{f, a, c, d, g, i, m, p}: {f, a, c, d, e, f, g, h} TIDItems bought N1 {i, m, p} : 1 {f, a, c, d, g} : 2 N2 {e, f, g, h}: 1 1) 2)

7 Compress a large database into a compact, Frequent- Pattern tree (FP-tree) structure –highly condensed, but complete for frequent pattern mining –avoid costly database scans Develop an efficient, FP-tree-based frequent pattern mining method (FP-growth) –A divide-and-conquer methodology: decompose mining tasks into smaller ones –Avoid candidate generation: sub-database test only. Ideas Overview: FP-tree based method

FP-tree: Design and Construction

9 Construct FP-tree 2 Steps: 1.Scan the transaction DB once, find frequent 1- itemsets (single item patterns) and order them into a list L in frequency descending order. e.g., L=[f:4, c:4, a:3, b:3, m:3, p:3} 2.Scan DB again, construct FP-tree by processing each transaction in L order. FP-tree

10 Construction Example {} f:4c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2m:1 Header Table Item frequency head f4 c4 a3 b3 m3 p3 TIDItems bought (ordered) frequent items 100{f, a, c, d, g, i, m, p}{f, c, a, m, p} 200{a, b, c, f, l, m, o}{f, c, a, b, m} 300 {b, f, h, j, o}{f, b} 400 {b, c, k, s, p}{c, b, p} 500 {a, f, c, e, l, p, m, n}{f, c, a, m, p} FP-tree L

11 Advantages of the FP-tree Structure The most significant advantage of the FP-tree –Scan the DB only twice. Completeness: –the FP-tree contains all the information related to mining frequent patterns (given the min_support threshold) Compactness: –The size of the tree is bounded by the occurrences of frequent items –The height of the tree is bounded by the maximum number of items in a transaction FP-tree

FP-growth: Mining Frequent Patterns Using FP-tree

13 Mining Frequent Patterns Using FP-tree General idea (divide-and-conquer) Recursively grow frequent patterns using the FP- tree: looking for shorter ones recursively and then concatenating the suffix: –For each frequent item, construct its conditional pattern base, and then its conditional FP-tree; –Repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty, or it contains only one path (single path will generate all the combinations of its sub-paths, each of which is a frequent pattern) FP-Growth

14 3 Major Steps Starting the processing from the end of list L: Step 1: Construct conditional pattern base for each item in the header table Step 2 Construct conditional FP-tree from each conditional pattern base Step 3 Recursively mine conditional FP-trees and grow frequent patterns obtained so far. If the conditional FP-tree contains a single path, simply enumerate all the patterns FP-Growth

15 Step 1: Construct Conditional Pattern Base Starting at the frequent-item header table in the FP-tree Traverse the FP-tree by following the link of each frequent item Accumulate all of transformed prefix paths of that item to form a conditional pattern base Conditional pattern bases itemcond. pattern base pfcam:2, cb:1 mfca:2, fcab:1 bfca:1, f:1, c:1 afc:3 cf:3 {} f:4c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2m:1 Header Table Item frequency head f4 c4 a3 b3 m3 p3 FP-Growth

16 Properties of Step 1 Node-link property –For any frequent item a i, all the possible frequent patterns that contain a i can be obtained by following a i 's node-links, starting from a i 's head in the FP-tree header. Prefix path property –To calculate the frequent patterns for a node a i in a path P, only the prefix sub-path of a i in P need to be accumulated, and its frequency count should carry the same count as node a i. FP-Growth

17 Step 2: Construct Conditional FP-tree For each pattern base –Accumulate the count for each item in the base –Construct the conditional FP-tree for the frequent items of the pattern base m-conditional pattern base: fca:2, fcab:1 {} f:3 c:3 a:3 m-conditional FP-tree  {} f:4 c:3 a:3 b:1m:2 m:1 Header Table Item frequency head f4 c4 a3 b3 m3 p3 FP-Growth 

18 Conditional Pattern Bases and Conditional FP-Tree Empty f {(f:3)}|c{(f:3)}c {(f:3, c:3)}|a{(fc:3)}a Empty{(fca:1), (f:1), (c:1)}b {(f:3, c:3, a:3)}|m{(fca:2), (fcab:1)}m {(c:3)}|p{(fcam:2), (cb:1)}p Conditional FP-treeConditional pattern base Item FP-Growth order of L

19 Step 3: Recursively mine the conditional FP-tree {} f:3 c:3 a:3 m-conditional FP-tree Cond. pattern base of “am”: (fc:3) {} f:3 c:3 am-conditional FP-tree Cond. pattern base of “cm”: (f:3) {} f:3 cm-conditional FP-tree Cond. pattern base of “cam”: (f:3) {} f:3 cam-conditional FP-tree FP-Growth Cond. pattern base of “fm”: 3 Cond. pattern base of “fam”: 3 Cond. pattern base of “m”: (fca:3) add “a” add “c” add “f” add “c” add “f” Frequent Pattern “fcam” Frequent Pattern

20 Principles of FP-Growth Pattern growth property –Let  be a frequent itemset in DB, B be  's conditional pattern base, and  be an itemset in B. Then    is a frequent itemset in DB iff  is frequent in B. Is “fcabm ” a frequent pattern? –“fcab” is a branch of m's conditional pattern base –“b” is NOT frequent in transactions containing “fcab ” –“bm” is NOT a frequent itemset. FP-Growth

21 Single FP-tree Path Generation Suppose an FP-tree T has a single path P. The complete set of frequent pattern of T can be generated by enumeration of all the combinations of the sub-paths of P {} f:3 c:3 a:3 m-conditional FP-tree All frequent patterns concerning m m, fm, cm, am, fcm, fam, cam, fcam  FP-Growth

22 Efficiency Analysis Facts: usually 1.FP-tree is much smaller than the size of the DB 2.Pattern base is smaller than original FP-tree 3.Conditional FP-tree is smaller than pattern base  mining process works on a set of usually much smaller pattern bases and conditional FP-trees  Divide-and-conquer and dramatic scale of shrinking FP-Growth