Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Advisor: Dr. 廖述賢; Presenter: 朱佩慧; Class: first-year PhD, Graduate Institute of Management Sciences.

Similar presentations
Mining Association Rules

Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical.
CSE 634 Data Mining Techniques
Graph Mining Laks V.S. Lakshmanan
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña FP grow algorithm Correlation analysis.
FP-Growth algorithm Vasiljevic Vladica,
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.
Data Mining Association Analysis: Basic Concepts and Algorithms
FPtree/FPGrowth (Complete Example). First scan – determine frequent 1- itemsets, then build header B8 A7 C7 D5 E3.
CPS : Information Management and Mining
Rakesh Agrawal Ramakrishnan Srikant
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining: Association Rule Mining CSE880: Database Systems.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining Association Rules in Large Databases
Data Mining Association Analysis: Basic Concepts and Algorithms
FP-growth. Challenges of Frequent Pattern Mining Improving Apriori Fp-growth Fp-tree Mining frequent patterns with FP-tree Visualization of Association.
FP-Tree/FP-Growth Practice. FP-tree construction null B:1 A:1 After reading TID=1: After reading TID=2: null B:2 A:1 C:1 D:1.
Data Mining Association Analysis: Basic Concepts and Algorithms
1 Association Rule Mining Instructor Qiang Yang Slides from Jiawei Han and Jian Pei And from Introduction to Data Mining By Tan, Steinbach, Kumar.
1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.
Mining Frequent patterns without candidate generation Jiawei Han, Jian Pei and Yiwen Yin.
Association Analysis: Basic Concepts and Algorithms.
Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.
Data Mining Association Analysis: Basic Concepts and Algorithms
FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.
1 Association Rule Mining (II) Instructor: Qiang Yang Thanks: J.Han and J. Pei.
Frequent-Pattern Tree. 2 Bottleneck of Frequent-pattern Mining  Multiple database scans are costly  Mining long patterns needs many passes of scanning.
Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,
1 1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 6 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
SEG Tutorial 2 – Frequent Pattern Mining.
Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
Ch5 Mining Frequent Patterns, Associations, and Correlations
Mining Frequent Patterns without Candidate Generation Presented by Song Wang. March 18 th, 2009 Data Mining Class Slides Modified From Mohammed and Zhenyu’s.
Jiawei Han, Jian Pei, and Yiwen Yin School of Computing Science Simon Fraser University Mining Frequent Patterns without Candidate Generation SIGMOD 2000.
AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.
Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture
EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX By By U.P.Pushpavalli U.P.Pushpavalli II Year ME(CSE) II Year ME(CSE)
Mining Frequent Patterns without Candidate Generation.
Frequent Pattern  交易資料庫中頻繁的被一起購買的產品  可以做為推薦產品、銷售決策的依據  兩大演算法 Apriori FP-Tree.
Parallel Mining Frequent Patterns: A Sampling-based Approach Shengnan Cong.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
Frequent itemset mining and temporal extensions Sunita Sarawagi
KDD’09,June 28-July 1,2009,Paris,France Copyright 2009 ACM Frequent Pattern Mining with Uncertain Data.
1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.
Association Analysis (3)
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
1 Top Down FP-Growth for Association Rule Mining By Ke Wang.
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Frequent Pattern Mining
Byung Joon Park, Sung Hee Kim
Mining Association Rules in Large Databases
Association Rule Mining
A Parameterised Algorithm for Mining Association Rules
Find Patterns Having P From P-conditional Database
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
732A02 Data Mining - Clustering and Association Analysis
Mining Frequent Patterns without Candidate Generation
Frequent-Pattern Tree
FP-Growth Wenlong Zhang.
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
Mining Association Rules in Large Databases
Presentation transcript:

Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Advisor: Dr. 廖述賢; Presenter: 朱佩慧; Class: first-year PhD, Graduate Institute of Management Sciences.

2 AGENDA: Introduction (Review & Overview), Design and Construction (FP-tree), Mining Frequent Patterns (FP-growth), DB Projection, Experiments, Discussion

3 Impact Factor: 2.42 (2007) Subject category "Comp. sc., artificial intelligence": Rank 11 of 93 Subject category "Comp. sc., information systems": Rank 7 of 92

4 Frequent Pattern Mining
Problem: Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (itemsets) with support no less than ξ.
Input (minimum support ξ = 3):
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Output (all frequent patterns): f:4, c:4, a:3, b:3, m:3, p:3, cp:3, fm:3, cm:3, am:3, ca:3, fa:3, fc:3, cam:3, fam:3, fcm:3, fca:3, fcam:3
Problem: How can all frequent patterns be found efficiently?
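For a database this small the output can be checked by brute force. The sketch below (plain Python, not from the paper) enumerates every itemset that occurs in some transaction and keeps those with support ≥ 3; it is exponential in the transaction length, which is exactly why efficient algorithms such as Apriori and FP-growth are needed.

```python
from itertools import combinations
from collections import Counter

# The example transaction database from the slide, minimum support = 3.
DB = [
    {'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
    {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
    {'b', 'f', 'h', 'j', 'o'},
    {'b', 'c', 'k', 's', 'p'},
    {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'},
]
MIN_SUP = 3

# Brute force: count every non-empty itemset occurring in some transaction.
counts = Counter()
for t in DB:
    for k in range(1, len(t) + 1):
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1

frequent = {itemset: c for itemset, c in counts.items() if c >= MIN_SUP}
for itemset, c in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(''.join(itemset), ':', c)
```

The printed patterns match the Output list above (items within a pattern simply appear in alphabetical order).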

5 Content breakdown. Total: 35 pages. Abstract: 15 lines; Introduction: 2 pages (6%); FP-tree (39%): Construction 5 pages (14%), Use 8.5 pages (24%); FP-growth: 3.5 pages (10%); Experiments: 8 pages (23%); Discussion: 3 pages (9%); Conclusion: 1 page (3%).

6 Paper contents (35 pages): the FP-tree accounts for 38.5%.

7 Introduction. Review: Apriori-like methods. Overview: FP-tree based mining method. Additional progress: Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), Dallas, TX, pp. 1–12.

8 The core of the Apriori algorithm: use the frequent (k−1)-itemsets (L_{k−1}) to generate candidate frequent k-itemsets C_k, then scan the database and count each pattern in C_k to obtain the frequent k-itemsets L_k. E.g., for the example database from slide 4:
Apriori iteration:
C1: f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
L1: f, a, c, m, b, p
C2: fa, fc, fm, fp, ac, am, …, bp
L2: fa, fc, fm, …
Number of candidates needed to discover one frequent pattern of length n (its 2^n − 1 non-empty subsets): n = 10 → ≈ 1.02×10^3; n = 20 → ≈ 1.05×10^6; n = 100 → ≈ 1.27×10^30.
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proc. ACM-SIGMOD (SIGMOD'93).
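For contrast with FP-growth, here is a minimal sketch of this generate-and-test loop in Python (the function name and structure are mine, a textbook rendering rather than the paper's code):

```python
from itertools import combinations
from collections import Counter

def apriori(db, min_sup):
    """Level-wise candidate generation and test, as outlined above."""
    db = [frozenset(t) for t in db]
    # L1: frequent 1-itemsets.
    item_counts = Counter(item for t in db for item in t)
    current = {frozenset([i]) for i, c in item_counts.items() if c >= min_sup}
    frequent, k = {}, 1
    while current:
        # Record the support of every frequent k-itemset.
        for itemset in current:
            frequent[itemset] = sum(1 for t in db if itemset <= t)
        # Generate C_{k+1} by joining L_k with itself, pruning any candidate
        # that has an infrequent k-subset (the Apriori property).
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k + 1 and all(frozenset(s) in current
                                           for s in combinations(u, k)):
                    candidates.add(u)
        # Scan the database again to keep only candidates meeting min_sup.
        current = {c for c in candidates
                   if sum(1 for t in db if c <= t) >= min_sup}
        k += 1
    return frequent
```

Calling apriori on the example database with min_sup = 3 reproduces the patterns listed on slide 4, but it rescans the database at every level and materializes every candidate set, which is the bottleneck FP-growth avoids.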

9 Ideas (Overview): Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure: highly condensed, but complete for frequent pattern mining, avoiding costly repeated database scans. Develop an efficient FP-tree-based frequent pattern mining method (FP-growth): a divide-and-conquer methodology that decomposes mining tasks into smaller ones and avoids candidate generation (sub-database test only).

FP-tree: Design and Construction FP-Tree

11 Definition 1 (FP-tree): a frequent-pattern tree consists of (1) a root labeled "null" with a set of item-prefix subtrees as its children; (2) nodes in the item-prefix subtrees, each carrying three fields: item-name, count, and node-link (a pointer to the next node in the tree with the same item-name); and (3) a frequent-item header table, in which each entry holds two fields: item-name and head of node-link, pointing to the first node in the tree carrying that item-name.

12 Construct FP-tree. 1. First scan: collect the set of frequent items. 2. Second scan: construct the FP-tree.

13 First Scan: collect the set of frequent items. Find the frequent items (L1), build the frequent-item header table, and sort each transaction's items in descending frequency order (infrequent items are dropped). Item frequencies: f:4, c:4, a:3, b:3, m:3, p:3.

14 Building the FP-tree from transactions sorted in descending frequency order yields a more compact tree.

15 Second Scan: construct the FP-tree. (a) Create the root node, labeled null. (b) For each transaction, order its frequent items according to the order in L, e.g. TID=100: {f, a, c, d, g, i, m, p} → {f, c, a, m, p}. (c) Build the FP-tree by inserting each frequency-ordered transaction. Tree after TID=100: null → f:1 → c:1 → a:1 → m:1 → p:1; the header table links the items f, c, a, b, m, p.

16 Second Scan (cont.), TID=200: {a, b, c, f, l, m, o} → {f, c, a, b, m}. The transaction shares the prefix f, c, a with the existing path, so the tree becomes f:2 → c:2 → a:2 with two branches below a: (m:1 → p:1) and a new (b:1 → m:1).

17 Second Scan (cont.), TID=300: {b, f, h, j, o} → {f, b}. Only f is shared: f becomes f:3 and a new branch f → b:1 is created.

18 Second Scan (cont.), TID=400: {b, c, k, s, p} → {c, b, p}. No prefix is shared with any existing path, so a new branch c:1 → b:1 → p:1 is created directly under the root.

19 Second Scan (cont.), TID=500: {a, f, c, e, l, p, m, n} → {f, c, a, m, p}. The transaction follows an existing path, giving the final FP-tree: f:4 → c:3 → a:3 → m:2 → p:2, plus the branches a → b:1 → m:1, f → b:1, and root → c:1 → b:1 → p:1; the header table links all nodes carrying the same item.
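A minimal Python sketch of the two-scan construction just illustrated (class and function names are mine, not from the paper):

```python
from collections import Counter

class FPNode:
    """One FP-tree node: item-name, count, parent, children, node-link (Definition 1)."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}        # item-name -> child FPNode
        self.node_link = None     # next node in the tree with the same item-name

def build_fp_tree(db, min_sup):
    # First scan: collect frequent items and their supports.
    supports = Counter(item for t in db for item in t)
    freq = {i: c for i, c in supports.items() if c >= min_sup}
    order = sorted(freq, key=lambda i: (-freq[i], i))   # frequency-descending list L

    root = FPNode(None)
    header = {i: None for i in order}                   # item-name -> head of node-link

    # Second scan: insert each transaction's frequent items in L order.
    for t in db:
        items = [i for i in order if i in t]
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                # Thread the new node into its item's node-link chain
                # (prepended here for brevity; the paper appends).
                child.node_link, header[item] = header[item], child
            child.count += 1
            node = child
    return root, header, order
```

Note that f and c tie at support 4; the alphabetical tie-break used here puts c before f, so the resulting tree can differ slightly in shape from the slides (which order f first), but it encodes the same database.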

20 Efficiency Analysis: the FP-tree is much smaller than the DB; a conditional pattern base is smaller than the original FP-tree; a conditional FP-tree is smaller than its pattern base. The mining process therefore works on a set of usually much smaller pattern bases and conditional FP-trees: divide-and-conquer with a dramatic scale of shrinking.

21 Lemma 2.1: Given a transaction database DB and a support threshold ξ, the complete set of frequent-item projections of the transactions in DB can be derived from DB's FP-tree. Example: the eight projections 1. BACD, 2. BAC, 3. BAC, 4. BAD, 5. BA, 6. B, 7. B, 8. B give the tree null → B:8 → A:5, where A has the children C:3 → D:1 and D:1. For the node a_k = "A": C_{a_k} = 5, while its children's counts sum to C'_{a_k} = 3 + 1 = 4, and support(root → A) = 5; the difference 5 − 4 = 1 means exactly one projection ends at A (the transaction BA). In this way the FP-tree contains the complete information of DB.

22 Lemma 2.2: Given a transaction database DB and a support threshold ξ, the size of the FP-tree is bounded by Σ_{T∈DB} |freq(T)|, and the height of the tree is bounded by max_{T∈DB} {|freq(T)|}. In the figure, the DB holds 8 transactions (1. BACD, 2. BAC, 3. BAC, 4. BAD, 5. BA, 6. B, 7. B, 8. B) yet the FP-tree needs only 5 nodes, and the longest path of the FP-tree equals the longest frequent-item transaction [BACD] (the null root is not counted).
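As a quick check of the bounds on this example: Σ_{T∈DB} |freq(T)| = 4 + 3 + 3 + 3 + 2 + 1 + 1 + 1 = 18, so at most 18 item nodes could ever be needed, while prefix sharing reduces the actual tree to 5 nodes; and max_{T∈DB} |freq(T)| = |BACD| = 4, which is exactly the height of the tree below the null root.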

3.Mining frequent patterns using FP-tree

24 FP-Growth: start the processing from the end of the list L. Step 1: construct the conditional pattern base for each frequent item. Step 2: construct the conditional FP-tree from each conditional pattern base. Step 3: recursively mine the conditional FP-trees and grow the frequent patterns obtained so far. (A sketch of this recursion is given below; the following slides walk through the example.)
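A compact Python sketch of this recursion, operating directly on conditional pattern bases represented as (prefix, count) pairs instead of a pointer-based FP-tree (a simplification of the paper's method; the function name and data layout are mine):

```python
from collections import Counter

def fp_growth(pattern_base, suffix, min_sup, results):
    """Grow frequent patterns from a conditional pattern base.

    pattern_base: list of (prefix_items, count) pairs, prefixes ordered by L.
    suffix:       tuple of items the base is conditioned on (the current pattern).
    """
    # Steps 1-2: count items in the base; frequent ones are the items that would
    # appear in the conditional FP-tree.
    counts = Counter()
    for prefix, cnt in pattern_base:
        for item in prefix:
            counts[item] += cnt
    for item, support in counts.items():
        if support < min_sup:
            continue
        new_suffix = (item,) + suffix
        results[new_suffix] = support
        # Conditional pattern base of `item`: the part of each prefix before it.
        new_base = [(prefix[:prefix.index(item)], cnt)
                    for prefix, cnt in pattern_base if item in prefix]
        # Step 3: recurse on the (smaller) conditional base.
        fp_growth(new_base, new_suffix, min_sup, results)

# Seed with the frequency-ordered frequent-item projections of the example DB.
base = [(('f', 'c', 'a', 'm', 'p'), 1), (('f', 'c', 'a', 'b', 'm'), 1),
        (('f', 'b'), 1), (('c', 'b', 'p'), 1), (('f', 'c', 'a', 'm', 'p'), 1)]
results = {}
fp_growth(base, (), 3, results)   # results maps pattern tuples to supports
```

Run on this seed with ξ = 3, the recursion reproduces the conditional pattern bases and frequent patterns shown on slides 25 to 30.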

25 Conditional FP-tree (suffix p). Conditional pattern base: {(fcam:2), (cb:1)}. Item counts within the base are c:3, f:2, a:2, m:2, b:1, so only c meets ξ = 3. Conditional FP-tree: {(c:3)}|p. Frequent pattern: {(cp:3)}.

26 Conditional FP-tree (suffix m). Conditional pattern base: {(fca:2), (fcab:1)}. Conditional FP-tree: {(f:3, c:3, a:3)}|m; mining it recursively gives {(f:3, c:3)}|am, {(f:3)}|cam, and {(f:3)}|cm. Frequent patterns: {(m:3), (am:3), (cm:3), (fm:3), (cam:3), (fam:3), (fcam:3), (fcm:3)}.

27 Conditional FP-tree (suffix b). Conditional pattern base: {(fca:1), (f:1), (c:1)}. No item in the base reaches ξ = 3, so the conditional FP-tree is empty: {}|b.

28 Conditional FP-tree (suffix a). Conditional pattern base: {(fc:3)}. Conditional FP-tree: {(f:3, c:3)}|a; mining it recursively gives {(f:3)}|ca. Frequent patterns: {(ca:3), (fca:3), (fa:3)}.

29 Conditional FP-tree (suffix c). Conditional pattern base: {(f:3)}. Conditional FP-tree: {(f:3)}|c. Frequent pattern: {(fc:3)}.

30 Conditional FP-tree (suffix f). Conditional pattern base: {} (f heads the list L, so it has no prefix paths), and no conditional FP-tree is needed.

31 Frequent Patterns.
Input (minimum support ξ = 3):
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Output (all frequent patterns): f:4, c:4, a:3, b:3, m:3, p:3, cp:3, fm:3, cm:3, am:3, ca:3, fa:3, fc:3, cam:3, fam:3, fcm:3, fca:3, fcam:3

32 Single prefix path FP-tree.
freq pattern set(P) = {(a:10), (b:8), (c:7), (ab:8), (ac:7), (bc:7), (abc:7)}
freq pattern set(Q) = {(d:4), (e:3), (f:3), (df:3)}
freq pattern set(Q) × freq pattern set(P) = {(ad:4), (ae:3), (af:3), (adf:3), (bd:4), (be:3), …, (abcd:4), (abce:3), (abcf:3), (abcdf:3)}
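When the FP-tree contains a single prefix path, the paper's optimization mines the prefix-path part P and the remaining multipath part Q separately and combines the two pattern sets. A minimal sketch of the combination step (function name and data layout are mine; consistent with the example above, each combined pattern takes the support of its Q part, since every transaction reaching Q also passes through the whole prefix path):

```python
def cross_product(pattern_set_p, pattern_set_q):
    """Combine patterns from the single prefix path P with patterns from Q."""
    combined = {}
    for p_items in pattern_set_p:
        for q_items, q_sup in pattern_set_q.items():
            key = tuple(sorted(set(p_items) | set(q_items)))
            combined[key] = q_sup          # support comes from the Q part
    return combined

P = {('a',): 10, ('b',): 8, ('c',): 7, ('a', 'b'): 8, ('a', 'c'): 7,
     ('b', 'c'): 7, ('a', 'b', 'c'): 7}
Q = {('d',): 4, ('e',): 3, ('f',): 3, ('d', 'f'): 3}

# Full result = P, Q, and cross_product(P, Q) together;
# e.g. cross_product(P, Q)[('a', 'b', 'c', 'd')] == 4, matching (abcd:4) above.
```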

4.Database Projection: from main-memory-based FP mining to disk-based FP mining.

34 Parallel Projection (item frequencies: f:4, c:4, a:3, b:3, m:3, p:3).

35 Partition Projection (item frequencies: f:4, c:4, a:3, b:3, m:3, p:3).
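A rough Python sketch of how the two projection schemes on slides 34 and 35 can be organized, based on their descriptions in the paper (function names are mine; `order` is the frequency-descending list L, and an item's projected transaction is assumed to keep only the frequent items preceding that item in L):

```python
from collections import defaultdict

def parallel_projection(db, order):
    """Each transaction is written to the projected DB of every frequent item it
    contains, so all projected DBs can be mined independently (more disk space)."""
    projected = defaultdict(list)
    for t in db:
        items = [i for i in order if i in t]    # frequency-ordered frequent items
        for k, item in enumerate(items):
            if k > 0:
                projected[item].append(items[:k])
    return projected

def partition_projection(db, order):
    """Each transaction is written only to the projected DB of its last item in L;
    when that DB is processed, its transactions are forwarded onward, so roughly
    one copy of each transaction exists at a time (less disk space)."""
    projected = defaultdict(list)
    for t in db:
        items = [i for i in order if i in t]
        if len(items) > 1:
            projected[items[-1]].append(items[:-1])
    # Process items from least to most frequent, forwarding prefixes as we go.
    for item in reversed(order):
        for prefix in projected[item]:
            # ... mine `prefix` here for patterns ending with `item` ...
            if len(prefix) > 1:
                projected[prefix[-1]].append(prefix[:-1])
    return projected
```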

5.Performance & Compactness

37 Performance & Compactness

38 Performance & Compactness

6.Discussions Improve the performance and scalability of FP-growth

40 Performance Improvement. Projected DBs: partition the DB into a set of projected DBs, then construct an FP-tree and mine it in each projected DB. Disk-resident FP-tree: store the FP-tree on hard disk using a B+-tree structure to reduce I/O cost. FP-tree materialization: an FP-tree built with a low ξ can usually serve most subsequent mining queries. FP-tree incremental update: how to update an FP-tree when new data arrive? ❖ Re-construct the FP-tree, ❖ or do not update the FP-tree.

Thank you for listening. Comments and suggestions are welcome.

42 Conditional FP-trees for a second example (item frequencies B:8, A:7, C:7, D:5, E:3; FP-tree figures omitted):
Suffix B: conditional pattern base {}; conditional FP-tree {}|B; frequent patterns {}
Suffix A: conditional pattern base {B:5}; conditional FP-tree {B:5}|A; frequent patterns {BA:5}
Suffix C: conditional pattern base {BA:3, B:3, A:1}; conditional FP-tree {B:6, A:4}|C; frequent patterns {BC:6, AC:4}
Suffix D: conditional pattern base {BAC:1, BA:1, BC:1, AC:1, A:1}; conditional FP-tree {B:3, A:4, C:3}|D; frequent patterns {BD:3, AD:4, CD:3}
Suffix E: conditional pattern base {BC:1, ACD:1, AD:1}; conditional FP-tree {B:1, A:2, C:2, D:2}|E; frequent patterns {AE:2, CE:2, DE:2}

43 6.Discussion Materialization and maintenance of FP-tree – Main memory → disk-resident FP-trees. – Select a good minimum support threshold ξ. – Incremental update of an FP-tree. Extensions of the frequent-pattern growth method in data mining – FP-tree mining with constraints