FP-Growth Wenlong Zhang

Introduction “In data mining, the task of finding frequent patterns in large databases is very important and has been studied extensively in the past few years. Unfortunately, this task is computationally expensive, especially when a large number of patterns exist.”

Frequent Pattern Mining Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (itemsets) with support no less than ξ.
Input (DB):
TID 100: {f, a, c, d, g, i, m, p}
TID 200: {a, b, c, f, l, m, o}
TID 300: {b, f, h, j, o}
TID 400: {b, c, k, s, p}
TID 500: {a, f, c, e, l, p, m, n}
Minimum support: ξ = 3
Output: all frequent patterns, e.g., f, a, …, fa, fac, fam, fm, am, …
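
To make the problem statement concrete, here is a small illustrative Python sketch (not part of the original slides; the DB literal and the support helper are assumed names) that brute-forces the supports on the five example transactions. Enumerating every candidate itemset like this is exactly what becomes infeasible on large databases.

from itertools import combinations

# The example transaction database from the slide (TID -> items bought).
DB = {
    100: {'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
    200: {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
    300: {'b', 'f', 'h', 'j', 'o'},
    400: {'b', 'c', 'k', 's', 'p'},
    500: {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'},
}
MIN_SUPPORT = 3  # the threshold xi from the slide

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in DB.values() if itemset <= items)

# Brute force: check every possible itemset (fine for 5 tiny transactions,
# hopeless for a real database -- which is the motivation for FP-Growth).
all_items = sorted(set().union(*DB.values()))
frequent = {}
for k in range(1, len(all_items) + 1):
    for cand in combinations(all_items, k):
        s = support(set(cand))
        if s >= MIN_SUPPORT:
            frequent[''.join(cand)] = s

print(frequent)  # includes 'f': 4, 'a': 3, ..., 'acf': 3 (i.e. fac), 'acfm': 3 (fcam)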

Algorithms The Apriori algorithm. Bottlenecks: candidate generation produces huge candidate sets, and candidate testing incurs multiple scans of the database to count the support of each candidate.
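
For comparison, a minimal sketch (assumed names, not from the slides) of Apriori's candidate-generation join-and-prune step; the point is that the number of candidates, and the database scans needed to count them, grows quickly with the number of frequent items.

from itertools import combinations

def apriori_gen(frequent_k):
    """Join step of Apriori: build (k+1)-candidates from frequent k-itemsets,
    then prune any candidate that has an infrequent k-subset."""
    frequent_k = {frozenset(s) for s in frequent_k}
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1:
                # prune: every k-subset of the candidate must itself be frequent
                if all(frozenset(sub) in frequent_k for sub in combinations(union, len(a))):
                    candidates.add(frozenset(union))
    return candidates

# e.g. 10^4 frequent 1-itemsets already yield on the order of 10^7 length-2
# candidates, and testing each level costs another full scan of the database.
print(len(apriori_gen([{'f'}, {'c'}, {'a'}, {'b'}, {'m'}, {'p'}])))  # 15 pairs from 6 items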

FP-Growth: Concept Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure: highly compacted, but complete for frequent pattern mining, avoiding costly repeated database scans. Develop an efficient FP-tree-based frequent pattern mining method (FP-growth): a divide-and-conquer methodology that decomposes the mining task into smaller ones and avoids candidate generation, examining only conditional sub-databases.

FP-Growth: FP-Tree Construction Two steps: (1) Scan the transaction DB once to find the frequent items (single-item patterns) and order them into a list L in descending frequency, in the format (item-name, support), e.g., L = {f:4, c:4, a:3, b:3, m:3, p:3}. (2) Scan the DB a second time; for each transaction, keep only its frequent items, order them according to L, and construct the FP-tree by inserting each frequency-ordered transaction into it.

FP-Tree Construction: Step 1. Scan the DB for the first time to generate L.
Item frequencies (L): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3.
(Transaction DB as before: TID 100 {f, a, c, d, g, i, m, p}; 200 {a, b, c, f, l, m, o}; 300 {b, f, h, j, o}; 400 {b, c, k, s, p}; 500 {a, f, c, e, l, p, m, n}.)
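
A minimal sketch of this first scan in Python (the DB literal and variable names are only illustrative): count every item, keep those meeting the threshold, and sort by descending support.

from collections import Counter

DB = [
    ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    ['b', 'f', 'h', 'j', 'o'],
    ['b', 'c', 'k', 's', 'p'],
    ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
]
MIN_SUPPORT = 3

# First scan: count every item, keep only the frequent ones, sorted by descending count.
counts = Counter(item for transaction in DB for item in transaction)
L = [(item, c) for item, c in counts.most_common() if c >= MIN_SUPPORT]
print(L)
# Counts: f:4, c:4, a:3, b:3, m:3, p:3.  Ties between equally frequent items may come
# out in a different order than the slide's L; any fixed order works, as long as the
# same order is used when sorting the transactions in step 2.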

FP-Tree Construction: Step 2. Scan the DB for the second time and order the frequent items in each transaction:
TID 100: {f, a, c, d, g, i, m, p} → ordered frequent items {f, c, a, m, p}
TID 200: {a, b, c, f, l, m, o} → {f, c, a, b, m}
TID 300: {b, f, h, j, o} → {f, b}
TID 400: {b, c, k, s, p} → {c, b, p}
TID 500: {a, f, c, e, l, p, m, n} → {f, c, a, m, p}
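
A small sketch of this reordering (the list L is taken from the slide; rank is an illustrative helper): infrequent items are dropped and the rest are sorted by their position in L.

# Frequent items in the slide's order; their positions define the sort key.
L = ['f', 'c', 'a', 'b', 'm', 'p']
rank = {item: i for i, item in enumerate(L)}

DB = {
    100: ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    200: ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    300: ['b', 'f', 'h', 'j', 'o'],
    400: ['b', 'c', 'k', 's', 'p'],
    500: ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
}

# Keep only frequent items and order them by their rank in L.
ordered = {tid: sorted((i for i in items if i in rank), key=rank.get)
           for tid, items in DB.items()}
for tid, items in ordered.items():
    print(tid, items)
# 100 ['f', 'c', 'a', 'm', 'p'], 200 ['f', 'c', 'a', 'b', 'm'], 300 ['f', 'b'],
# 400 ['c', 'b', 'p'], 500 ['f', 'c', 'a', 'm', 'p'] -- matching the slide's table.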

FP-Tree Construction: Step 2 (Cont.) [Figure: inserting the first two ordered transactions into an empty tree rooted at {}. After {f, c, a, m, p} the tree is a single path f:1 → c:1 → a:1 → m:1 → p:1; inserting {f, c, a, b, m} increments the shared prefix to f:2 → c:2 → a:2 and adds a new branch b:1 → m:1. Node-links connect nodes that carry the same item.]

FP-Tree Construction: Step 2 (Cont.) [Figure: inserting the remaining ordered transactions {f, b}, {c, b, p}, and {f, c, a, m, p}. The completed tree has root {} with two children: f:4, under which lie the path c:3 → a:3 branching into m:2 → p:2 and b:1 → m:1, plus a separate branch b:1; and c:1, with the branch b:1 → p:1.]

FP-Tree Construction: Step 2 (Cont.) [Figure: the completed FP-tree together with its header table. The header table lists each frequent item (f, c, a, b, m, p) with a head pointer into the chain of node-links connecting all tree nodes that carry that item.]
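
Below is a hedged sketch of how the tree and header table might be represented; FPNode, build_fptree, and the list-based header are illustrative choices, not the paper's exact data structures (the paper chains nodes via node-links; a list per item plays the same role here). The tie-break between equally frequent items may differ from the slide, which changes the tree's shape but not the patterns mined from it.

from collections import Counter, defaultdict

class FPNode:
    """One FP-tree node: an item, its count, a parent pointer, and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> FPNode

def build_fptree(transactions, min_support):
    """Scan once to find frequent items, then insert each ordered transaction."""
    counts = Counter(i for t in transactions for i in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    order = {i: (-c, i) for i, c in frequent.items()}   # descending count, then name

    root = FPNode(None, None)
    header = defaultdict(list)   # item -> all nodes carrying it (stands in for node-links)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent), key=order.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

DB = [
    ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    ['b', 'f', 'h', 'j', 'o'],
    ['b', 'c', 'k', 's', 'p'],
    ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
]
root, header = build_fptree(DB, 3)
print({item: [n.count for n in nodes] for item, nodes in header.items()})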

Advantages of the FP-tree Structure The most significant advantage of the FP-tree: the DB is scanned only twice, and exactly twice. Completeness: the FP-tree contains all the information needed for mining frequent patterns (given the min-support threshold). Compactness: the size of the tree is bounded by the total occurrences of frequent items, and its height is bounded by the maximum number of items in a transaction.

Mining Frequent Patterns Using FP-tree General idea (divide-and-conquer): recursively grow frequent patterns using the FP-tree, finding shorter patterns recursively and then concatenating the suffix. For each frequent item, construct its conditional pattern base and then its conditional FP-tree; repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty or contains only a single path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern).

Mining Frequent Patterns Using FP-tree (cont.) Starting the processing from the end of list L: Step 1: construct the conditional pattern base for each item in the header table. Step 2: construct the conditional FP-tree from each conditional pattern base. Step 3: recursively mine the conditional FP-trees and grow the frequent patterns obtained so far; if a conditional FP-tree contains a single path, simply enumerate all the patterns.

Step 1: Construct Conditional Pattern Base Starting at the bottom of the frequent-item header table in the FP-tree, traverse the FP-tree by following the node-link of each frequent item, and accumulate all the transformed prefix paths of that item to form its conditional pattern base. [Figure: the FP-tree and header table from the previous slide.] Conditional pattern bases:
p: fcam:2, cb:1
m: fca:2, fcab:1
b: fca:1, f:1, c:1
a: fc:3
c: f:3
f: { }
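
As a cross-check (an illustrative sketch, not the slides' traversal code), the same table can be recovered directly from the ordered transactions: the multiset of prefixes that precede an item, with their counts, is exactly that item's conditional pattern base.

from collections import defaultdict

# Ordered transactions from Step 2 (frequent items only, in the order of L).
ordered = [
    ['f', 'c', 'a', 'm', 'p'],
    ['f', 'c', 'a', 'b', 'm'],
    ['f', 'b'],
    ['c', 'b', 'p'],
    ['f', 'c', 'a', 'm', 'p'],
]

bases = {}
for item in ['p', 'm', 'b', 'a', 'c', 'f']:        # bottom of the header table first
    base = defaultdict(int)
    for t in ordered:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:                              # empty prefixes carry no items
                base[prefix] += 1
    bases[item] = {''.join(p): c for p, c in base.items()}

print(bases)
# {'p': {'fcam': 2, 'cb': 1}, 'm': {'fca': 2, 'fcab': 1},
#  'b': {'fca': 1, 'f': 1, 'c': 1}, 'a': {'fc': 3}, 'c': {'f': 3}, 'f': {}}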

Step 2: Construct Conditional FP-tree For each pattern base: accumulate the count of each item in the base, then construct the conditional FP-tree from the items that are frequent within the pattern base. [Figure: m's conditional pattern base is fca:2, fcab:1; accumulating counts gives f:3, c:3, a:3 (and b:1, which is infrequent and dropped), so m's conditional FP-tree is the single path f:3 → c:3 → a:3.]
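
A tiny sketch of this step for the item m (the m_base literal is copied from the slide; the names are illustrative): counts are accumulated per item, infrequent items are dropped, and the surviving prefixes form the conditional FP-tree.

from collections import Counter

MIN_SUPPORT = 3

# m's conditional pattern base, read off the previous slide.
m_base = {('f', 'c', 'a'): 2, ('f', 'c', 'a', 'b'): 1}

# Accumulate the count of each item within the base.
counts = Counter()
for prefix, count in m_base.items():
    for item in prefix:
        counts[item] += count
print(dict(counts))            # {'f': 3, 'c': 3, 'a': 3, 'b': 1}

# Keep only items frequent within the base and rebuild the prefixes from them.
kept = {item for item, c in counts.items() if c >= MIN_SUPPORT}
paths = Counter()
for prefix, count in m_base.items():
    paths[tuple(i for i in prefix if i in kept)] += count
print(dict(paths))             # {('f', 'c', 'a'): 3}: the single path f:3 -> c:3 -> a:3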

Step 3: Recursively Mine the Conditional FP-tree [Figure: recursively mining m's conditional FP-tree (fca:3), i.e., the single path f:3 → c:3 → a:3. Adding "a", "c", or "f" gives "am" with conditional FP-tree (fc:3), "cm" with (f:3), and "fm" with an empty tree; recursing on "am" gives "cam" with conditional FP-tree (f:3), and further recursion yields "fam", "fcm", and "fcam", all with empty conditional FP-trees. Every pattern found this way has support 3.]
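
Tying the three steps together, here is a compact, self-contained sketch of the whole recursion (illustrative names throughout). For brevity it recurses on conditional pattern bases kept as (prefix, count) pairs rather than on materialized FP-tree nodes, and it omits the single-path shortcut from the slide; this simplification still yields the same frequent patterns and supports as the tree-based formulation.

from collections import Counter

def fp_growth(pattern_base, min_support, suffix, results):
    """pattern_base: list of (prefix, count) pairs, i.e. a conditional pattern base.
    suffix: the itemset already fixed on this branch of the recursion."""
    counts = Counter()
    for prefix, count in pattern_base:
        for item in prefix:
            counts[item] += count
    for item, support in counts.items():
        if support < min_support:
            continue
        pattern = (item,) + suffix
        results[frozenset(pattern)] = support
        # item's conditional pattern base within this base: whatever precedes it.
        conditional = [(prefix[:prefix.index(item)], count)
                       for prefix, count in pattern_base if item in prefix]
        fp_growth(conditional, min_support, pattern, results)

# The ordered transactions from Step 2, each with count 1.
ordered = [
    (('f', 'c', 'a', 'm', 'p'), 1),
    (('f', 'c', 'a', 'b', 'm'), 1),
    (('f', 'b'), 1),
    (('c', 'b', 'p'), 1),
    (('f', 'c', 'a', 'm', 'p'), 1),
]
results = {}
fp_growth(ordered, 3, (), results)
for itemset, support in sorted(results.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(''.join(sorted(itemset)), support)   # prints all 18 frequent patterns with supports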

Principles of FP-Growth Pattern growth property: let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B. Is "fcabm" a frequent pattern? "fcab" is a branch of m's conditional pattern base, but "b" is not frequent in that base (it appears only once), so "bm" is not a frequent itemset, and hence neither is "fcabm".
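
A quick sanity check of this example against the raw database (a throwaway snippet, not from the slides):

# Only transaction 200 contains both b and m, so support(bm) = 1 < 3; since any
# superset of an infrequent itemset is itself infrequent, "fcabm" cannot be frequent.
DB = [
    {'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
    {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
    {'b', 'f', 'h', 'j', 'o'},
    {'b', 'c', 'k', 's', 'p'},
    {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'},
]
print(sum(1 for t in DB if {'b', 'm'} <= t))   # 1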

Conditional Pattern Bases and Conditional FP-Trees
Item p: conditional pattern base {(fcam:2), (cb:1)}; conditional FP-tree {(c:3)}|p
Item m: conditional pattern base {(fca:2), (fcab:1)}; conditional FP-tree {(f:3, c:3, a:3)}|m
Item b: conditional pattern base {(fca:1), (f:1), (c:1)}; conditional FP-tree empty
Item a: conditional pattern base {(fc:3)}; conditional FP-tree {(f:3, c:3)}|a
Item c: conditional pattern base {(f:3)}; conditional FP-tree {(f:3)}|c
Item f: conditional pattern base empty; conditional FP-tree empty

References
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-Growth_Algorithm
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques.