Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.

Mining Frequent Itemsets without Candidate Generation: In many cases, the Apriori candidate generate-and-test method significantly reduces the size of candidate sets, leading to a good performance gain. However, it suffers from two nontrivial costs: (1) it may generate a huge number of candidates (for example, 10^4 frequent 1-itemsets lead to more than 10^7 candidate 2-itemsets); (2) it may need to scan the database many times.
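The candidate count quoted above can be checked directly: every unordered pair of frequent 1-itemsets is a candidate 2-itemset. A quick check, with illustrative variable names:

```python
import math

n_frequent_items = 10**4
# Each unordered pair of frequent 1-itemsets forms one candidate 2-itemset.
n_candidate_pairs = math.comb(n_frequent_items, 2)
print(n_candidate_pairs)  # 49995000, which is more than 10**7
```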

Association Rules with Apriori: minimum support = 2/9, minimum confidence = 70%.

Bottleneck of Frequent-Pattern Mining: Multiple database scans are costly. Mining long patterns needs many passes of scanning and generates lots of candidates. To find the frequent itemset $i_1 i_2 \dots i_{100}$: number of scans: 100; number of candidates: $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$. The bottleneck is candidate generation and test. Can we avoid candidate generation?
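The candidate total for a 100-item pattern is the number of non-empty subsets of 100 items. A short check of that arithmetic (variable names are illustrative):

```python
import math

n_items = 100
# Sum the number of k-item candidates for k = 1 .. 100.
total = sum(math.comb(n_items, k) for k in range(1, n_items + 1))
assert total == 2**n_items - 1
print(f"{total:.2e}")  # 1.27e+30
```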

Mining Frequent Patterns Without Candidate Generation: Grow long patterns from short ones using local frequent items. Suppose “abc” is a frequent pattern. Get all transactions having “abc”: DB|abc. If “d” is a local frequent item in DB|abc, then “abcd” is a frequent pattern.
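A minimal sketch of the projection idea, assuming a toy database of five made-up transactions and a minimum count of 2. The function names `project` and `local_frequent_items` are hypothetical helpers, not part of the slides:

```python
from collections import Counter

def project(db, pattern):
    """Return DB|pattern: the transactions that contain every item of pattern."""
    pattern = set(pattern)
    return [t for t in db if pattern <= set(t)]

def local_frequent_items(db, pattern, min_count):
    """Count the items outside pattern within DB|pattern; keep the frequent ones."""
    pattern = set(pattern)
    counts = Counter(
        item for t in project(db, pattern) for item in set(t) - pattern
    )
    return {item: c for item, c in counts.items() if c >= min_count}

# Toy data (assumed for illustration): each string is one transaction.
db = ["abcd", "abce", "abcd", "abf", "cde"]
print(project(db, "abc"))                  # ['abcd', 'abce', 'abcd']
print(local_frequent_items(db, "abc", 2))  # {'d': 2}, so "abcd" is frequent
```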

Process of FP-growth: (1) Scan the DB once and find the frequent 1-itemsets (single-item patterns). (2) Sort the frequent items in descending order of frequency. (3) Scan the DB again and construct the FP-tree.

Association Rules: Let's work through an example.
TID   Items
T100  1, 2, 5
T200  2, 4
T300  2, 3
T400  1, 2, 4
T500  1, 3
T600  2, 3
T700  1, 3
T800  1, 2, 3, 5
T900  1, 2, 3
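The two-scan construction can be sketched on these nine transactions as follows. This is one possible implementation, not the slides' own code: the `Node` class, the `build_fp_tree` name, and the tie-breaking rule (equal counts ordered by item ID) are assumptions, and the minimum count of 2 is taken from the 2/9 minimum support stated on the earlier Apriori slide.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, a count, a parent link, and child nodes."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count the items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Descending frequency; ties broken by item ID (an assumption).
    order = lambda i: (-frequent[i], i)
    root = Node(None, None)
    header = {i: [] for i in sorted(frequent, key=order)}  # item -> its nodes
    # Scan 2: insert each transaction's frequent items in sorted order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in frequent), key=order):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

transactions = [
    [1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3],
]
root, header = build_fp_tree(transactions, min_count=2)
for item, nodes in header.items():
    print(item, [n.count for n in nodes])
```

The header table lists items in the order 2, 1, 3, 4, 5, and the counts on each item's nodes add up to that item's support.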

FP Tree

Mining the FP tree

Benefits of the FP-tree Structure: Completeness: it preserves complete information for frequent pattern mining and never breaks a long pattern of any transaction. Compactness: it reduces irrelevant information, since infrequent items are gone; items are kept in frequency-descending order, so the more frequently an item occurs, the more likely its node is to be shared; the tree is never larger than the original database (not counting node-links and the count field). For the Connect-4 DB, the compression ratio could be over 100.

Exercise: A dataset has five transactions. Let min_support = 60% and min_confidence = 80%. Find all frequent itemsets using the FP-tree.
TID  Items_bought
T1   M, O, N, K, E, Y
T2   D, O, N, K, E, Y
T3   M, A, K, E
T4   M, U, C, K, Y
T5   C, O, O, K, I, E

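Association Rules with Apriori:
Frequent 1-itemsets: K:5, E:4, M:3, O:3, Y:3
Candidate 2-itemsets: KE:4, KM:3, KO:3, KY:3, EM:2, EO:3, EY:2, MO:1, MY:2, OY:2
Frequent 2-itemsets: KE, KM, KO, KY, EO
Frequent 3-itemset: KEO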
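The level-wise Apriori pass for the exercise can be reproduced with a short sketch. The `apriori` function below is an illustrative implementation under stated assumptions: each transaction is written as a string of single-letter items, and the 60% minimum support over five transactions is expressed as a minimum count of 3.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for all frequent itemsets, level by level."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Test step: count each candidate against the whole database.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# The five exercise transactions; duplicate items collapse inside frozenset.
transactions = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
result = apriori(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print("".join(sorted(itemset)), count)
```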

Association Rules with FP Tree: frequent items in descending order of support: K:5, E:4, M:3, O:3, Y:3.

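Association Rules with FP Tree:
Y: conditional pattern base {KEMO:1, KEO:1, KM:1}; conditional FP-tree K:3; frequent pattern KY
O: conditional pattern base {KEM:1, KE:2}; conditional FP-tree KE:3; frequent patterns KO, EO, KEO
M: conditional pattern base {KE:2, K:1}; conditional FP-tree K:3; frequent pattern KM
E: conditional pattern base {K:4}; conditional FP-tree K:4; frequent pattern KE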
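The mining step can be sketched as a recursive FP-growth over conditional pattern bases. This is an illustrative implementation rather than the slides' own: the names `Node`, `build_tree`, and `fp_growth` are hypothetical, ties in frequency are broken alphabetically, and the 60% minimum support is expressed as a minimum count of 3.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs; return (header, frequent)."""
    counts = Counter()
    for items, n in weighted_paths:
        for i in set(items):
            counts[i] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    order = lambda i: (-frequent[i], i)  # descending count, then alphabetical
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        for i in sorted((i for i in set(items) if i in frequent), key=order):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header, frequent

def fp_growth(weighted_paths, min_count, suffix=frozenset()):
    """Return {itemset: support} for all frequent itemsets ending in suffix."""
    header, frequent = build_tree(weighted_paths, min_count)
    patterns = {}
    for item, support in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix path of every node holding item.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree built from that base.
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns

transactions = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
result = fp_growth([(t, 1) for t in transactions], min_count=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print("".join(sorted(itemset)), support)
```

The result contains the same eleven frequent itemsets that the level-wise Apriori search finds for this exercise, without generating any candidate sets.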

FP-Growth vs. Apriori: Scalability with the Support Threshold (data set: T25I20D10K)

Why Is FP-Growth the Winner? Divide-and-conquer: it decomposes both the mining task and the DB according to the frequent patterns obtained so far, which leads to a focused search of smaller databases. Other factors: no candidate generation and no candidate test; a compressed database (the FP-tree structure); no repeated scan of the entire database; the basic operations are counting local frequent items and building sub-FP-trees, with no pattern search and matching.

Strong Association Rules Are Not Necessarily Interesting

Example 5.8: Misleading “Strong” Association Rule. Of the 10,000 transactions analyzed, the data show that 6,000 of the customer transactions included computer games, 7,500 included videos, and 4,000 included both computer games and videos.

Misleading “Strong” Association Rule: For this example, Support(Game & Video) = 4,000 / 10,000 = 40% and Confidence(Game => Video) = 4,000 / 6,000 ≈ 66.7%. Suppose the rule passes our minimum support and confidence thresholds (30% and 60%, respectively).

Misleading “Strong” Association Rule: However, the truth is that computer games and videos are negatively associated, which means that the purchase of one of these items actually decreases the likelihood of purchasing the other. (How do we reach this conclusion?)

Misleading “Strong” Association Rule: 60% of customers buy the game and 75% buy the video. If the two purchases were independent, we would expect 60% * 75% = 45% of customers to buy both, that is, 4,500 transactions, which is more than the 4,000 actually observed.

From Association Analysis to Correlation Analysis: Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$, where $P(A \cup B)$ denotes the probability that a transaction contains both A and B; otherwise, itemsets A and B are dependent and correlated as events. Lift is defined as $\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$. If the value is less than 1, the occurrence of A is negatively correlated with the occurrence of B; if the value is greater than 1, A and B are positively correlated; if the value equals 1, A and B are independent.
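Applying the lift formula to the counts from Example 5.8 (10,000 transactions, 6,000 with games, 7,500 with videos, 4,000 with both) gives a value below 1. A small sketch with illustrative variable names:

```python
n_total = 10_000
n_game, n_video, n_both = 6_000, 7_500, 4_000

support = n_both / n_total        # P(game and video)
confidence = n_both / n_game      # P(video | game)
# Lift: observed joint probability over the product expected under independence.
lift = support / ((n_game / n_total) * (n_video / n_total))

print(f"support={support:.2%} confidence={confidence:.2%} lift={lift:.3f}")
```

The lift is about 0.89, which is less than 1, so games and videos are negatively correlated even though the rule passes both thresholds.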

Mining Multiple-Level Association Rules Items often form hierarchies

Mining Multiple-Level Association Rules: Flexible support settings. Items at the lower level are expected to have lower support.
Example hierarchy: Milk [support = 10%] at level 1; 2% Milk [support = 6%] and Skim Milk [support = 4%] at level 2.
Uniform support: Level 1 min_sup = 5%, Level 2 min_sup = 5%.
Reduced support: Level 1 min_sup = 5%, Level 2 min_sup = 3%.
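The effect of the two settings on the milk example can be sketched as follows. The data layout and the `frequent_items` helper are assumptions made for illustration; the support values and thresholds are the ones on the slide.

```python
# item -> (hierarchy level, support)
supports = {
    "milk": (1, 0.10),
    "2% milk": (2, 0.06),
    "skim milk": (2, 0.04),
}

def frequent_items(supports, min_sup_by_level):
    """Keep the items whose support meets the threshold of their own level."""
    return [
        item for item, (level, sup) in supports.items()
        if sup >= min_sup_by_level[level]
    ]

uniform = frequent_items(supports, {1: 0.05, 2: 0.05})
reduced = frequent_items(supports, {1: 0.05, 2: 0.03})
print(uniform)  # ['milk', '2% milk']
print(reduced)  # ['milk', '2% milk', 'skim milk']
```

With uniform support, skim milk is dropped at level 2; with reduced support, the lower level-2 threshold keeps it.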

Multi-level Association: Redundancy Filtering. Some rules may be redundant due to “ancestor” relationships between items. Example: milk => wheat bread [support = 8%, confidence = 70%]; 2% milk => wheat bread [support = 2%, confidence = 72%]. We say the first rule is an ancestor of the second rule.