Association Rule Mining. Generating association rules from frequent itemsets. Assume that we have discovered the frequent itemsets and their support. How do we generate association rules?

Similar presentations
Mining Association Rules

Huffman Codes and Association Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
CSE 634 Data Mining Techniques
Association rules and frequent itemsets mining
Graph Mining Laks V.S. Lakshmanan
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña FP grow algorithm Correlation analysis.
FP-Growth algorithm Vasiljevic Vladica,
Data Mining Association Analysis: Basic Concepts and Algorithms
CPS : Information Management and Mining
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining: Association Rule Mining CSE880: Database Systems.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining Association Rules in Large Databases
Data Mining Association Analysis: Basic Concepts and Algorithms
FP-growth. Challenges of Frequent Pattern Mining Improving Apriori Fp-growth Fp-tree Mining frequent patterns with FP-tree Visualization of Association.
Data Mining Association Analysis: Basic Concepts and Algorithms
1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.
Mining Frequent patterns without candidate generation Jiawei Han, Jian Pei and Yiwen Yin.
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.
Frequent-Pattern Tree. 2 Bottleneck of Frequent-pattern Mining  Multiple database scans are costly  Mining long patterns needs many passes of scanning.
Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,
Mining Association Rules
Data Mining: Concepts and Techniques (3rd ed.), Chapter 6. Jiawei Han, Micheline Kamber, and Jian Pei, University of Illinois at Urbana-Champaign.
Mining Association Rules
© Vipin Kumar. CSci 8980: Data Mining (Fall 2002). Army High Performance Computing Research Center, Department of Computer.
SEG Tutorial 2 – Frequent Pattern Mining.
Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
Ch5 Mining Frequent Patterns, Associations, and Correlations
Mining Frequent Patterns without Candidate Generation Presented by Song Wang. March 18 th, 2009 Data Mining Class Slides Modified From Mohammed and Zhenyu’s.
Jiawei Han, Jian Pei, and Yiwen Yin School of Computing Science Simon Fraser University Mining Frequent Patterns without Candidate Generation SIGMOD 2000.
AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.
Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX. By U.P.Pushpavalli, II Year ME(CSE).
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Mining Frequent Patterns without Candidate Generation.
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Advisor: Prof. Shu-Hsien Liao, Ph.D. Presenter: Chu Pei-Hui, first-year doctoral student, Graduate Institute of Management Sciences.
Frequent Pattern: products frequently bought together in a transaction database; a basis for product recommendation and sales decisions. Two major algorithms: Apriori and FP-Tree.
Parallel Mining Frequent Patterns: A Sampling-based Approach Shengnan Cong.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
Frequent itemset mining and temporal extensions Sunita Sarawagi
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
1 Data Mining: Mining Frequent Patterns, Association and Correlations.
Association Analysis (3)
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Data Mining: Concepts and Techniques. Mining Frequent Patterns, Associations, and Correlations (Chapter 5). Tuesday, June 14, 2016.
DATA MINING ASSOCIATION RULES.
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining: Concepts and Techniques
Frequent Pattern Mining
Market Basket Analysis and Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Mining Association Rules in Large Databases
Data Mining Association Analysis: Basic Concepts and Algorithms
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
732A02 Data Mining - Clustering and Association Analysis
Mining Frequent Patterns without Candidate Generation
Frequent-Pattern Tree
FP-Growth Wenlong Zhang.
Mining Association Rules in Large Databases
Presentation transcript:

Association Rule Mining

Generating assoc. rules from frequent itemsets
- Assume that we have discovered the frequent itemsets and their support
- How do we generate association rules?
- Frequent itemsets (with support counts): {1}:2, {2}:3, {3}:3, {5}:3, {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2, {2,3,5}:2
- For each frequent itemset l, find all nonempty proper subsets s; for each s, generate the rule s ⇒ l − s if sup(l)/sup(s) ≥ min_conf
- Example: for {2,3,5} with min_conf = 75%:
  {2,3} ⇒ 5 (conf = 2/2 = 100%) ✓
  {2,5} ⇒ 3 (conf = 2/3 ≈ 67%) ✗
  {3,5} ⇒ 2 (conf = 2/2 = 100%) ✓
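The rule-generation step above can be checked with a short script. A minimal sketch in Python, assuming the support counts listed on this slide; `rules_from` is a name invented here, not a library function:

```python
from itertools import combinations

# Support counts of the frequent itemsets from the slide.
support = {
    frozenset([1]): 2, frozenset([2]): 3, frozenset([3]): 3, frozenset([5]): 3,
    frozenset([1, 3]): 2, frozenset([2, 3]): 2, frozenset([2, 5]): 3,
    frozenset([3, 5]): 2, frozenset([2, 3, 5]): 2,
}

def rules_from(itemset, min_conf):
    l = frozenset(itemset)
    out = []
    for r in range(1, len(l)):                 # all nonempty proper subsets s
        for s in combinations(sorted(l), r):
            s = frozenset(s)
            conf = support[l] / support[s]     # conf(s => l-s) = sup(l)/sup(s)
            if conf >= min_conf:
                out.append((set(s), set(l - s), conf))
    return out

for s, c, conf in rules_from({2, 3, 5}, 0.75):
    print(sorted(s), "=>", sorted(c), f"conf={conf:.2f}")
```

Running it reproduces exactly the two rules marked valid on the slide; {2,5} ⇒ 3 is rejected at 67% confidence.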

Discovering Rules
- Naïve Algorithm:

  for each frequent itemset l do
    for each nonempty proper subset c of l do
      if (support(l) / support(l − c) >= minconf) then
        output the rule (l − c) ⇒ c,
          with confidence = support(l) / support(l − c)
          and support = support(l)

Discovering Rules (2)
- Lemma: if consequent c generates a valid rule, so do all nonempty subsets of c (e.g., if X ⇒ YZ is valid, then XY ⇒ Z and XZ ⇒ Y are too).
- Example: consider a frequent itemset ABCDE. If ACDE ⇒ B and ABCE ⇒ D are the only one-item-consequent rules with minimum confidence, then ACE ⇒ BD is the only other rule that needs to be tested.
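The lemma suggests growing consequents level-wise, Apriori-style: once a consequent fails min_conf, no superset of it needs testing. A sketch of that pruning on the {2,3,5} example from the previous slides; `rules_with_pruning` is a hypothetical name, not a library function:

```python
from itertools import combinations

# Support counts for {2,3,5} and its subsets (from the running example).
support = {frozenset([2]): 3, frozenset([3]): 3, frozenset([5]): 3,
           frozenset([2, 3]): 2, frozenset([2, 5]): 3, frozenset([3, 5]): 2,
           frozenset([2, 3, 5]): 2}

def rules_with_pruning(itemset, min_conf):
    l = frozenset(itemset)
    rules, consequents = [], [frozenset([i]) for i in l]
    while consequents:
        valid = []
        for c in consequents:
            conf = support[l] / support[l - c]
            if conf >= min_conf:
                rules.append((l - c, c, conf))
                valid.append(c)
        # Grow only consequents valid at this level (the lemma: if c fails,
        # every superset of c fails too), and never consume the whole itemset.
        consequents = {a | b for a, b in combinations(valid, 2)
                       if len(a | b) == len(a) + 1 and len(a | b) < len(l)}
    return rules
```

For {2,3,5} at 75% confidence, only {2} and {5} survive as level-1 consequents, so {2,5} is the single level-2 candidate, and it fails.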

Is Apriori Fast Enough? — Performance Bottlenecks
- The core of the Apriori algorithm:
  - Use frequent (k − 1)-itemsets to generate candidate frequent k-itemsets
  - Use database scans and pattern matching to collect counts for the candidate itemsets
- The bottleneck of Apriori: candidate generation
  - Huge candidate sets: 10^4 frequent 1-itemsets will generate about 10^7 candidate 2-itemsets; to discover a frequent pattern of size 100, e.g., {a_1, a_2, …, a_100}, one needs to generate 2^100 ≈ 10^30 candidates
  - Multiple scans of the database: needs (n + 1) scans, where n is the length of the longest pattern
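The candidate-set arithmetic above is easy to reproduce (a quick check, not tied to any particular implementation):

```python
from math import comb

# Candidate 2-itemsets from 10^4 frequent 1-itemsets: C(10^4, 2)
print(comb(10**4, 2))    # 49995000, i.e. on the order of 10^7

# Subsets a size-100 pattern forces Apriori to enumerate: 2^100 - 2
print(2**100 - 2)        # about 1.27 * 10^30
```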

FP-growth: Mining Frequent Patterns Without Candidate Generation
- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - highly condensed, but complete for frequent pattern mining
  - avoids costly database scans
- Develop an efficient, FP-tree-based frequent pattern mining method
  - A divide-and-conquer methodology: decompose mining tasks into smaller ones
  - Avoid candidate generation: sub-database test only!

FP-tree Construction from a Transactional DB (min_support = 3)

TID | Items bought             | (Ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o}    | {f, c, a, b, m}
300 | {b, f, h, j, o}          | {f, b}
400 | {b, c, k, s, p}          | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

Item frequencies: f:4, c:4, a:3, b:3, m:3, p:3

Steps:
1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Order frequent items in descending order of their frequency
3. Scan DB again, construct the FP-tree

FP-tree Construction (after inserting TID 100: {f, c, a, m, p})

root
  f:1
    c:1
      a:1
        m:1
          p:1

FP-tree Construction (after inserting TID 200: {f, c, a, b, m})

root
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

FP-tree Construction (after inserting TID 300: {f, b} and TID 400: {c, b, p})

root
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

FP-tree Construction (after inserting TID 500: {f, c, a, m, p}; final tree)

root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Header Table (item, frequency, head of node-links): f:4, c:4, a:3, b:3, m:3, p:3
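The two-scan construction can be sketched in Python. This is a minimal illustration, not the authors' implementation: `Node` and `build_fptree` are names invented here, and frequency ties are broken alphabetically, so the item order (and hence the tree shape) may differ from the slides while encoding the same database:

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fptree(transactions, min_support):
    # Pass 1: count items and keep only the frequent ones.
    counts = Counter(i for t in transactions for i in t)
    freq = {i: c for i, c in counts.items() if c >= min_support}
    # Frequency-descending order (alphabetical tie-break).
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(order)}
    # Pass 2: insert each transaction's frequent items in that order,
    # sharing prefixes; keep a header table of node-links per item.
    root, header = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            else:
                child.count += 1
            node = child
    return root, header, order

db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
root, header, order = build_fptree(db, 3)
print(order)                                   # frequency-descending item order
print({i: sum(n.count for n in header[i]) for i in order})
```

Whatever tie-break is used, the node counts reached through each item's node-links sum to that item's support, which is the invariant the mining phase relies on.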

Benefits of the FP-tree Structure
- Completeness:
  - never breaks a long pattern of any transaction
  - preserves complete information for frequent pattern mining
- Compactness:
  - reduces irrelevant information: infrequent items are gone
  - frequency-descending ordering: more frequent items are more likely to be shared
  - never larger than the original database (not counting node-links and counts)
  - Example: for the Connect-4 DB, the compression ratio can be over 100

Mining Frequent Patterns Using the FP-tree
- General idea (divide-and-conquer): recursively grow frequent pattern paths using the FP-tree
- Method:
  - For each item, construct its conditional pattern base, and then its conditional FP-tree
  - Repeat the process on each newly created conditional FP-tree
  - Until the resulting FP-tree is empty, or it contains only a single path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)
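The divide-and-conquer recursion can be illustrated with a list-based sketch, where a conditional pattern base is simply a list of (prefix, count) pairs and each transaction is pre-sorted in one global frequency order. This shows the idea of the method, not the pointer-based FP-tree implementation; `fpgrowth` is a name chosen here:

```python
from collections import Counter

def fpgrowth(db, min_sup, suffix=()):
    """db: list of (items_tuple, count), each tuple sorted in one global
    item order. Yields (pattern, support) pairs."""
    counts = Counter()
    for items, c in db:
        for i in items:
            counts[i] += c
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        yield suffix + (item,), sup
        # Conditional pattern base for `item`: the prefix preceding it in
        # every transaction that contains it.
        cond = []
        for items, c in db:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    cond.append((prefix, c))
        yield from fpgrowth(cond, min_sup, suffix + (item,))

# The running example, with items already in frequency order f, c, a, b, m, p.
db = [(tuple("fcamp"), 1), (tuple("fcabm"), 1), (tuple("fb"), 1),
      (tuple("cbp"), 1), (tuple("fcamp"), 1)]
patterns = {frozenset(p): s for p, s in fpgrowth(db, 3)}
print(len(patterns))     # number of frequent itemsets at min_support = 3
```

Because each conditional base contains only items that precede the current item in the global order, every frequent itemset is generated exactly once, with no candidate generation step.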

Mining Frequent Patterns Using the FP-tree (cont’d)  Start with last item in order (i.e., p).  Follow node pointers and traverse only the paths containing p.  Accumulate all of transformed prefix paths of that item to form a conditional pattern base Conditional pattern base for p fcam:2, cb:1 f:4 c:3 a:3 m:2 p:2 c:1 b:1 p:1 p Construct a new FP-tree based on this pattern, by merging all paths and keeping nodes that appear  sup times. This leads to only one branch c:3 Thus we derive only one frequent pattern cont. p. Pattern cp

Mining Frequent Patterns Using the FP-tree (cont’d)  Move to next least frequent item in order, i.e., m  Follow node pointers and traverse only the paths containing m.  Accumulate all of transformed prefix paths of that item to form a conditional pattern base f:4 c:3 a:3 m:2 m m:1 b:1 m-conditional pattern base: fca:2, fcab:1 {} f:3 c:3 a:3 m-conditional FP-tree (contains only path fca:3) All frequent patterns that include m m, fm, cm, am, fcm, fam, cam, fcam  

Properties of the FP-tree for Conditional Pattern Base Construction
- Node-link property: for any frequent item a_i, all the possible frequent patterns that contain a_i can be obtained by following a_i's node-links, starting from a_i's head in the FP-tree header table
- Prefix path property: to calculate the frequent patterns for a node a_i in a path P, only the prefix sub-path of a_i in P needs to be accumulated, and its frequency count should carry the same count as node a_i

Conditional Pattern Bases for the Example

Item | Conditional pattern base | Conditional FP-tree
f    | Empty                    | Empty
c    | {(f:3)}                  | {(f:3)}|c
a    | {(fc:3)}                 | {(f:3, c:3)}|a
b    | {(fca:1), (f:1), (c:1)}  | Empty
m    | {(fca:2), (fcab:1)}      | {(f:3, c:3, a:3)}|m
p    | {(fcam:2), (cb:1)}       | {(c:3)}|p

Principles of Frequent Pattern Growth
- Pattern growth property: let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.
- Example: "abcdef" is a frequent pattern if and only if "abcde" is a frequent pattern and "f" is frequent in the set of transactions containing "abcde"

Why Is Frequent Pattern Growth Fast?
- Performance studies show FP-growth is an order of magnitude faster than Apriori, and also faster than tree-projection
- Reasoning:
  - No candidate generation, no candidate test
  - Uses a compact data structure
  - Eliminates repeated database scans
  - Basic operations are counting and FP-tree building

FP-growth vs. Apriori: Scalability with the Support Threshold (data set T25I20D10K)