ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.

Slides:



Advertisements
Similar presentations
Association Rule Mining
Advertisements

Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
Pertemuan XIV FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining, Frequent-Itemset Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
Data Mining, Frequent-Itemset Mining. Data Mining Some mining problems Find frequent itemsets in "market-basket" data – "50% of the people who buy hot.
1) Go over HW #1 solutions (Due today)
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Fast Algorithms for Association Rule Mining
Market Basket Analysis 포항공대 산업공학과 PASTA Lab. 석사과정 신원영.
Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 7 = Finish chapter 3 and.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Final Exam Review. The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides.
Supermarket shelf management – Market-basket model:  Goal: Identify items that are bought together by sufficiently many customers  Approach: Process.
Data & Text Mining1 Introduction to Association Analysis Zhangxi Lin ISQS 3358 Texas Tech University.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
The Three Analytics Techniques. Decision Trees – Determining Probability.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Frequent-Itemset Mining. Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small.
Association Rule Mining
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Associations and Frequent Item Analysis. 2 Outline  Transactions  Frequent itemsets  Subset Property  Association rules  Applications.
Elsayed Hemayed Data Mining Course
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Elective-I Examination Scheme- In semester Assessment: 30 End semester Assessment :70 Text Books: Data Mining Concepts and Techniques- Micheline Kamber.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
MIS2502: Data Analytics Association Rule Mining David Schuff
Introduction to Data Mining Mining Association Rules Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel
Data Mining – Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent Pattern Mining
William Norris Professor and Head, Department of Computer Science
Exam #3 Review Zuyin (Alvin) Zheng.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms
Frequent patterns and Association Rules
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Association Rule Learning
Association Analysis: Basic Concepts
Presentation transcript:

ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.

What is Association Mining? Discovering interesting relationships between variables in large databases ( Find out which items predict the occurrence of other items Also known as “affinity analysis” or “market basket” analysis

Examples of Association Mining Market basket analysis/affinity analysis What products are bought together? Where to place items on grocery store shelves? Amazon’s recommendation engine “People who bought this product also bought…” Telephone calling patterns Who do a set of people tend to call most often? Social network analysis Determine who you “may know”

Market-Basket Analysis Large set of items e.g., things sold in a supermarket Large set of baskets e.g., things one customer buys in one visit Supermarket chains keep terabytes of this data Informs store layout Suggests “tie-ins” Place spaghetti sauce in the pasta aisle Put diapers on sale and raise the price of beer

Market-Basket Transactions BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke Association Rules from these transactions X  Y (antecedent  consequent) {Diapers}  {Beer}, {Milk, Bread}  {Diapers} {Beer, Bread}  {Milk}, {Bread}  {Milk, Diapers}

Core idea: The itemset Itemset: A group of items of interest {Milk, Beer, Diapers} This itemset is a “3 itemset” because it contains…3 items! An association rule expresses related itemsets X  Y, where X and Y are two itemsets {Milk, Diapers}  {Beer} means “when you have milk and diapers, you also have beer) BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke

Support Support count (  ) Frequency of occurrence of an itemset  {Milk, Beer, Diapers} = 2 (i.e., it’s in baskets 4 and 5) Support (s) Fraction of transactions that contain all itemsets in the relationship X  Y s({Milk, Diapers, Beer}) = 2/5 = 0.4 You can calculate support for both X and Y separately Support for X = 3/5 = 0.6; Support for Y = 3/5 = 0.6 BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke X Y 2 baskets have milk, beer, and diapers 5 baskets total

Confidence Confidence is the strength of the association Measures how often items in Y appear in transactions that contain X BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke This says 67% of the times when you have milk and diapers in the itemset you also have beer! c must be between 0 and 1 1 is a complete association 0 is no association

Some sample rules Association RuleSupport (s)Confidence (c) {Milk,Diapers}  {Beer} 2/5 = 0.42/3 = 0.67 {Milk,Beer}  {Diapers} 2/5 = 0.42/2 = 1.0 {Diapers,Beer}  {Milk} 2/5 = 0.42/3 = 0.67 {Beer}  {Milk,Diapers} 2/5 = 0.42/3 = 0.67 {Diapers}  {Milk,Beer} 2/5 = 0.42/4 = 0.5 {Milk}  {Diapers,Beer} 2/5 = 0.42/4 = 0.5 BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke All the above rules are binary partitions of the same itemset: {Milk, Diapers, Beer}

But don’t blindly follow the numbers Rules originating from the same itemset (X  Y) will have identical support Since the total elements (X  Y) are always the same But they can have different confidence Depending on what is contained in X and Y High confidence suggests a strong association But this can be deceptive Consider {Bread}  {Diapers} Support for the total itemset is 0.6 (3/5) And confidence is 0.75 (3/4) – pretty high But is this just because both are frequently occurring items (s=0.8)? You’d almost expect them to show up in the same baskets by chance

Lift Takes into account how co-occurrence differs from what is expected by chance i.e., if items were selected independently from one another Based on the support metric Support for total itemset X and Y Support for X times support for Y

Lift Example What’s the lift of the association rule {Milk, Diapers}  {Beer} So X = {Milk, Diapers} and Y = {Beer} s({Milk, Diapers, Beer}) = 2/5 = 0.4 s({Milk, Diapers}) = 3/5 = 0.6 s({Beer}) = 3/5 = 0.6 So BasketItems 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke When Lift > 1, the occurrence of X  Y together is more likely than what you would expect by chance

Another example Checking Account Savings Account NoYes No Yes Are people more likely to have a checking account if they have a savings account? Support ({Savings}  {Checking}) = 5000/10000 = 0.5 Support ({Savings}) = 6000/10000 = 0.6 Support ({Checking}) = 8500/10000 = 0.85 Confidence ({Savings}  {Checking}) = 5000/6000 = 0.83 Answer: No In fact, it’s slightly less than what you’d expect by chance!

Selecting the rules We know how to calculate the measures for each rule Support Confidence Lift Then we set up thresholds for the minimum rule strength we want to accept For support – called minsup For confidence – called minconf The steps List all possible association rules Compute the support and confidence for each rule Drop rules that don’t make the minsup and minconf thresholds Use lift to double- check the association

But this can be overwhelming Imagine all the combinations possible in your local grocery store Every product matched with every combination of other products Tens of thousands of possible rule combinations So where do you start?

Once you are confident in a rule, take action {Milk, Diapers}  {Beer} Possible Marketing Actions Lower the price of milk and diapers, raise it on beer Put beer next to one of those products in the store Put beer in a special section of the store away from milk and diapers Create “New Parent Coping Kits” of beer, milk, and diapers Target new beers to people who buy milk and diapers