Frequent-Pattern Tree


Bottleneck of Frequent-Pattern Mining
Multiple database scans are costly.
Mining long patterns needs many passes of scanning and generates lots of candidates.
To find the frequent itemset {i1, i2, …, i100}:
  # of scans: 100
  # of candidates: C(100,1) + C(100,2) + … + C(100,100) = 2^100 - 1 ≈ 1.27 * 10^30 !
Bottleneck: candidate generation and test.
Can we avoid candidate generation?
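As a quick sanity check on that candidate count, a minimal Python sketch (only the numbers from the slide above are used):

```python
from math import comb

# All non-empty subsets of a 100-item itemset are candidates in the
# candidate-generation-and-test approach.
n_candidates = sum(comb(100, k) for k in range(1, 101))

assert n_candidates == 2**100 - 1
print(f"{float(n_candidates):.3e}")  # ~1.268e+30
```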

Mining Frequent Patterns without Candidate Generation
Grow long patterns from short ones using local frequent items:
If "abc" is a frequent pattern, get all transactions having "abc": DB|abc (the projected database on "abc").
If "d" is a local frequent item in DB|abc, then "abcd" is a frequent pattern.
Get all transactions having "abcd" (the projected database on "abcd") and find longer itemsets.
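A minimal sketch of that projected-database (pattern-growth) idea on a toy transaction list; the database, threshold, and helper names below are illustrative, not from the slides:

```python
from collections import Counter

# Illustrative transaction database; each transaction is a set of items.
DB = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "c", "d"},
    {"b", "e"},
]
MIN_SUP = 2  # absolute support threshold (assumed for this example)

def project(db, pattern):
    """DB|pattern: the transactions that contain every item of `pattern`."""
    return [t for t in db if pattern <= t]

def local_frequent_items(db, pattern):
    """Items outside `pattern` that are frequent within DB|pattern."""
    counts = Counter(item for t in project(db, pattern) for item in t - pattern)
    return {item for item, c in counts.items() if c >= MIN_SUP}

# {a, c} is frequent; d (and b) are locally frequent in DB|{a,c},
# so {a, c, d} and {a, b, c} are frequent as well.
print(local_frequent_items(DB, {"a", "c"}))  # {'b', 'd'} (set order may vary)
```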

Mining Frequent Patterns without Candidate Generation
Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure:
highly condensed, but complete for frequent-pattern mining; avoids costly repeated database scans.
Develop an efficient, FP-tree-based frequent-pattern mining method:
a divide-and-conquer methodology that decomposes mining tasks into smaller ones;
avoids candidate generation by examining only the sub-database (conditional pattern base)!

Construct FP-tree from a Transaction DB (min_sup = 50%, i.e. a count of at least 3 of the 5 transactions)

TID   Items bought               (ordered) frequent items
100   {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
200   {a, b, c, f, l, m, o}      {f, c, a, b, m}
300   {b, f, h, j, o}            {f, b}
400   {b, c, k, s, p}            {c, b, p}
500   {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

Steps:
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns).
   Item counts: f:4, c:4, a:3, b:3, m:3, p:3; the remaining items (l:2, o:2, and d, e, g, h, i, j, k, n, s with count 1) are infrequent.
2. Order the frequent items in frequency-descending order: f, c, a, b, m, p (the L-order).
3. Process the DB based on the L-order.
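A minimal sketch of this first scan and reordering step, using the slide's transaction DB (the variable names are illustrative):

```python
from collections import Counter

DB = {
    100: {"f", "a", "c", "d", "g", "i", "m", "p"},
    200: {"a", "b", "c", "f", "l", "m", "o"},
    300: {"b", "f", "h", "j", "o"},
    400: {"b", "c", "k", "s", "p"},
    500: {"a", "f", "c", "e", "l", "p", "m", "n"},
}
MIN_SUP = 3  # 50% of 5 transactions

# Scan 1: count single items and keep the frequent ones.
counts = Counter(item for items in DB.values() for item in items)
frequent = {item: c for item, c in counts.items() if c >= MIN_SUP}

# L-order: frequent items in frequency-descending order.
l_order = sorted(frequent, key=frequent.get, reverse=True)
# -> the six items f:4, c:4, a:3, b:3, m:3, p:3 (the f/c tie may come out
#    in either order; the slides use f, c, a, b, m, p).

# Rewrite each transaction in L-order, dropping infrequent items.
ordered = {tid: [i for i in l_order if i in items] for tid, items in DB.items()}
```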

Construct FP-tree from a Transaction DB (same transaction DB as above)
Initial FP-tree: just the root {}, with an empty header table.

Header table
Item  Frequency  Head
f     0          nil
c     0          nil
a     0          nil
b     0          nil
m     0          nil
p     0          nil
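A minimal sketch of the node and header-table structures an FP-tree needs; the class and field names are illustrative, but they mirror the item, count, parent/children, and node-link fields the FP-tree is described with:

```python
class FPNode:
    """One FP-tree node: an item, its count on this path, parent/children links,
    and a node-link chaining all nodes that carry the same item."""
    def __init__(self, item, parent=None):
        self.item = item          # None for the root {}
        self.count = 0
        self.parent = parent
        self.children = {}        # item -> FPNode
        self.node_link = None     # next node holding the same item

class FPTree:
    def __init__(self, l_order):
        self.root = FPNode(None)
        # Header table: item -> [frequency, head of node-link chain], in L-order.
        self.header = {item: [0, None] for item in l_order}
```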

Construct FP-tree from a Transaction DB (same transaction DB as above)
Insert {f, c, a, m, p} (transaction 100):
Tree: {} -> f:1 -> c:1 -> a:1 -> m:1 -> p:1
Header table: f 1, c 1, a 1, b 0 (nil), m 1, p 1

Construct FP-tree from a Transaction DB (same transaction DB as above)
Insert {f, c, a, b, m} (transaction 200):
Tree: {} -> f:2 -> c:2 -> a:2, which now branches into a:2 -> m:1 -> p:1 and a:2 -> b:1 -> m:1
Header table: f 2, c 2, a 2, b 1, m 2, p 1

Construct FP-tree from a Transaction DB (same transaction DB as above)
Insert {f, b} (transaction 300):
Tree: f's count rises to 3 and a new branch f:3 -> b:1 is added beside f:3 -> c:2 -> a:2 -> (m:1 -> p:1 | b:1 -> m:1)
Header table: f 3, c 2, a 2, b 2, m 2, p 1

Construct FP-tree from a Transaction DB (same transaction DB as above)
Insert {c, b, p} (transaction 400):
Tree: the root {} gains a second subtree {} -> c:1 -> b:1 -> p:1 beside the f:3 subtree
Header table: f 3, c 3, a 2, b 3, m 2, p 2

Construct FP-tree from a Transaction DB (same transaction DB as above)
Insert {f, c, a, m, p} (transaction 500):
Final tree: {} -> f:4 -> c:3 -> a:3 -> (m:2 -> p:2 | b:1 -> m:1), plus f:4 -> b:1 and {} -> c:1 -> b:1 -> p:1
Header table: f 4, c 4, a 3, b 3, m 3, p 3
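A minimal sketch of the insertion routine that produces the tree above, building on the hypothetical FPNode/FPTree classes and the `ordered` transactions from the earlier sketches:

```python
def insert_transaction(tree, ordered_items):
    """Insert one transaction whose frequent items are already in L-order."""
    node = tree.root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
            # Chain the new node into the header table's node-links for `item`.
            entry = tree.header[item]
            child.node_link = entry[1]
            entry[1] = child
        child.count += 1
        tree.header[item][0] += 1
        node = child

# Building the tree from the slide's five ordered transactions:
# tree = FPTree(l_order)
# for tid, items in ordered.items():
#     insert_transaction(tree, items)
```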

Benefits of the FP-tree Structure
Completeness: preserves complete DB information for frequent-pattern mining (given the prior min support); each transaction maps to one FP-tree path, with counts stored at each node.
Compactness: one FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and counts); irrelevant information is reduced because infrequent items are gone; frequency-descending ordering puts the more frequent items closer to the top of the tree, where they are more likely to be shared.

How Effective Is FP-tree? Dataset: Connect-4 (a dense dataset)

Mining Frequent Patterns Using the FP-tree
General idea (divide-and-conquer): recursively grow frequent-pattern paths using the FP-tree.
Frequent patterns can be partitioned into subsets according to the L-order (f-c-a-b-m-p):
patterns containing p; patterns having m but no p; patterns having b but no m or p; …; patterns having c but none of a, b, m, p; and the pattern f.

Mining Frequent Patterns Using the FP-tree
Step 1: Construct the conditional pattern base for each item in the header table.
Step 2: Construct a conditional FP-tree from each conditional pattern base.
Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far.
If a conditional FP-tree contains a single path, simply enumerate all the patterns (all combinations of the path's items).
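A compact sketch of that recursion, working directly on conditional pattern bases represented as (prefix-path, count) pairs rather than on a materialized tree; a sketch under those assumptions, not the paper's exact pseudocode:

```python
from collections import Counter

def fp_growth(cond_base, suffix, min_sup, results):
    """Pattern growth over a conditional pattern base.
    cond_base: list of (path, count) pairs, each path a tuple of items in L-order.
    suffix: the frozenset pattern being grown; results maps pattern -> support."""
    counts = Counter()
    for path, cnt in cond_base:
        for item in path:
            counts[item] += cnt
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        new_suffix = suffix | {item}
        results[new_suffix] = sup
        # Step 1: item's conditional pattern base = its prefix paths,
        # with locally infrequent items dropped (Step 2's filtering).
        new_base = []
        for path, cnt in cond_base:
            if item in path:
                prefix = tuple(p for p in path[:path.index(item)]
                               if counts[p] >= min_sup)
                if prefix:
                    new_base.append((prefix, cnt))
        # Step 3: recurse on the conditional base, growing the suffix.
        fp_growth(new_base, new_suffix, min_sup, results)

# Top-level call on the reordered transactions of the running example:
# results = {}
# fp_growth([(tuple(t), 1) for t in ordered.values()], frozenset(), 3, results)
# results[frozenset("fcam")] == 3, for instance.
```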

Step 1: Construct the Conditional Pattern Base
Starting at the header table of the FP-tree, traverse the FP-tree by following the node-links of each frequent item, and accumulate all of the item's transformed prefix paths to form its conditional pattern base.

Conditional pattern bases (from the FP-tree built above; header counts f 4, c 4, a 3, b 3, m 3, p 3):
Item   Conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
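A minimal sketch of that traversal on the hypothetical FPTree from the earlier sketches: follow an item's node-link chain and walk each node's parents to collect its transformed prefix path:

```python
def conditional_pattern_base(tree, item):
    """All (prefix_path, count) pairs for `item`, found via its node-link chain."""
    base = []
    node = tree.header[item][1]            # head of item's node-link chain
    while node is not None:
        path, parent = [], node.parent     # climb towards the root
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
        node = node.node_link
    return base

# e.g. conditional_pattern_base(tree, "m") -> [(('f','c','a'), 2), (('f','c','a','b'), 1)]
# (the order of the two entries depends on insertion order)
```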

Step 2: Construct the Conditional FP-tree
For each conditional pattern base (as listed on the previous slide): accumulate the count of each item in the base, then build an FP-tree from the base's locally frequent items (min_sup = 50% of 5 transactions, i.e. 3).

Example: p's conditional pattern base is {fcam:2, cb:1}.
Item counts within it: f:2, c:3, a:2, m:2, b:1. Only c (count 3) is locally frequent,
so p's conditional FP-tree is the single path {} -> c:3.
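A small sketch of that filtering step on a conditional pattern base, reusing the (path, count) representation from the sketches above:

```python
from collections import Counter

def filter_pattern_base(base, min_sup):
    """Drop locally infrequent items from a conditional pattern base; what
    remains is exactly what the conditional FP-tree is built from."""
    counts = Counter()
    for path, cnt in base:
        for item in path:
            counts[item] += cnt
    frequent = {i for i, c in counts.items() if c >= min_sup}
    filtered = []
    for path, cnt in base:
        kept = tuple(i for i in path if i in frequent)
        if kept:
            filtered.append((kept, cnt))
    return filtered

# p's base from the slide:
# filter_pattern_base([(("f","c","a","m"), 2), (("c","b"), 1)], 3)
# -> [(('c',), 2), (('c',), 1)]   i.e. the single-node tree c:3
```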

Mining Frequent Patterns by Creating Conditional Pattern Bases

Item   Conditional pattern base    Conditional FP-tree
p      {(fcam:2), (cb:1)}          {(c:3)} | p
m      {(fca:2), (fcab:1)}         {(f:3, c:3, a:3)} | m
b      {(fca:1), (f:1), (c:1)}     Empty
a      {(fc:3)}                    {(f:3, c:3)} | a
c      {(f:3)}                     {(f:3)} | c
f      Empty                       Empty

Step 3: Recursively Mine the Conditional FP-trees
Collect all patterns that end in p:
  Suffix p(3): frequent pattern p(3); conditional pattern base {fcam:2, cb:1}; conditional FP-tree is the single node c(3).
  Suffix cp(3): frequent pattern cp(3); conditional pattern base nil, so the recursion stops.

Step 3: Recursively Mine the Conditional FP-trees
Collect all patterns that end in m:
  Suffix m(3): frequent pattern m(3); conditional pattern base {fca:2, fcab:1}; conditional FP-tree is the single path f(3) -> c(3) -> a(3).
  Suffix fm(3): frequent pattern fm(3); conditional pattern base nil.
  Suffix cm(3): frequent pattern cm(3); conditional pattern base {f:3}; conditional FP-tree f(3).
  Suffix fcm(3): frequent pattern fcm(3); conditional pattern base nil.
  Suffix am(3): continued on the next slide.

Collect all patterns that end in m (cont'd):
  Suffix am(3): frequent pattern am(3); conditional pattern base {fc:3}; conditional FP-tree f(3) -> c(3).
  Suffix fam(3): frequent pattern fam(3); conditional pattern base nil.
  Suffix cam(3): frequent pattern cam(3); conditional pattern base {f:3}; conditional FP-tree f(3).
  Suffix fcam(3): frequent pattern fcam(3); conditional pattern base nil.

FP-growth vs. Apriori: Scalability With the Support Threshold Data set T25I20D10K

Why Is Frequent-Pattern Growth Fast?
A performance study shows that FP-growth is an order of magnitude faster than Apriori.
Reasons: no candidate generation and no candidate test; a compact data structure; no repeated database scans; the basic operations are just counting and FP-tree building.

Weaknesses of FP-growth
The tree is support-dependent: it cannot accommodate a dynamically changing support threshold.
It cannot accommodate incremental DB updates.
Mining requires recursive operations.

Maximal Patterns and the Border
Maximal patterns: a frequent itemset X is maximal if none of its supersets is frequent.
In the example lattice there are 20 frequent patterns but only 3 maximal patterns, so the 3 maximal patterns can represent all 20.
Maximal patterns = {AD, ACE, BCDE}
[Itemset lattice over items A-E, with the border separating the frequent itemsets from the infrequent ones]
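A minimal sketch of the definition of maximality applied to a collection of frequent itemsets (an illustration of the definition, not part of FP-growth itself):

```python
def maximal_patterns(frequent_itemsets):
    """Frequent itemsets none of whose proper supersets are frequent."""
    freq = [frozenset(x) for x in frequent_itemsets]
    return [x for x in freq if not any(x < y for y in freq)]  # x < y: proper subset

# Applied to the 20 frequent itemsets of the example lattice, this returns
# the three maximal patterns AD, ACE, and BCDE.
```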

Closed Itemsets
An itemset X is closed if there exists no item y (y ∉ X) such that every transaction containing X also contains y.
Example: AC is not closed, since every transaction containing AC also contains W.
CDW is closed, since transaction Tid=2 contains no item beyond C, D, W.

Tid   Itemset
1     ACTW
2     CDW
3
4     ACDW
5     ACDTW
6     CDT
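A small sketch of that closure test on the transaction list above (the Tid-3 row is omitted, since its itemset is not given):

```python
def is_closed(itemset, transactions):
    """True if no extra item appears in every transaction containing `itemset`."""
    itemset = set(itemset)
    supporting = [set(t) for t in transactions if itemset <= set(t)]
    if not supporting:
        return False
    return set.intersection(*supporting) == itemset

transactions = ["ACTW", "CDW", "ACDW", "ACDTW", "CDT"]
print(is_closed("AC", transactions))   # False: W appears in every AC-transaction
print(is_closed("CDW", transactions))  # True
```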

Frequent Closed Patterns
For a frequent itemset X, if there exists no item y (y ∉ X) such that every transaction containing X also contains y, then X is a frequent closed pattern.
Example (min_sup = 2): "acdf" is a frequent closed pattern.
Closed patterns are a concise representation of the frequent patterns: they reduce the number of patterns and rules (N. Pasquier et al., ICDT'99).

TID   Items
10    a, c, d, e, f
20    a, b, e
30    c, e, f
40    a, c, d, f
50