MIS2502: Data Analytics Association Rule Learning

Slides:



Advertisements
Similar presentations
DATA MINING Association Rule Discovery. AR Definition aka Affinity Grouping Common example: Discovery of which items are frequently sold together at a.
Advertisements

FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or causal structures.
Pertemuan XIV FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Mining Association Rules. Association rules Association rules… –… can predict any attribute and combinations of attributes … are not intended to be used.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Fast Algorithms for Association Rule Mining
Market Basket Analysis 포항공대 산업공학과 PASTA Lab. 석사과정 신원영.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Supermarket shelf management – Market-basket model:  Goal: Identify items that are bought together by sufficiently many customers  Approach: Process.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
The Three Analytics Techniques. Decision Trees – Determining Probability.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Frequent-Itemset Mining. Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small.
Association Rule Mining
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Associations and Frequent Item Analysis. 2 Outline  Transactions  Frequent itemsets  Subset Property  Association rules  Applications.
Charles Tappert Seidenberg School of CSIS, Pace University
CURE Clustering Using Representatives Handles outliers well. Hierarchical, partition First a constant number of points c, are chosen from each cluster.
Elsayed Hemayed Data Mining Course
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
MIS2502: Data Analytics Association Rule Mining David Schuff
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel
Data Mining – Association Rules
By Arijit Chatterjee Dr
Data Mining Association Analysis: Basic Concepts and Algorithms
Knowledge discovery & data mining Association rules and market basket analysis--introduction UCLA CS240A Course Notes*
Frequent Pattern Mining
Association Rules.
I. Association Market Basket Analysis.
Market Basket Many-to-many relationship between different objects
Exam #3 Review Zuyin (Alvin) Zheng.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms
Frequent patterns and Association Rules
MIS2502: Data Analytics Association Rule Mining
Market Basket Analysis and Association Rules
MIS2502: Data Analytics Association Rule Mining
I. Association Market Basket Analysis.
MIS2502: Review for Exam 3 Aaron Zhi Cheng
Data Science in Industry
Department of Computer Science National Tsing Hua University
Lecture 11 (Market Basket Analysis)
Association Rues Analysis .Event A -> Event ?
Chapter 14 – Association Rules
Association Analysis: Basic Concepts
Data Mining CSCI 307, Spring 2019 Lecture 18
Charles Tappert Seidenberg School of CSIS, Pace University
Presentation transcript:

MIS2502: Data Analytics Association Rule Learning Zhe (Joe) Deng deng@temple.edu http://community.mis.temple.edu/zdeng

Association Rule Mining Find out which items predict the occurrence of other items Also known as “affinity analysis” or “market basket” analysis

Case 1: Amazon Recommender System

Case 2: The parable of the beer and diapers A retailer’ selling transaction data was mined extensively and many correlations appeared. Some of these were obvious; people who buy gin are also likely to buy tonic. However, one correlation stood out like a sore thumb because it was so unexpected. On Friday afternoons, young American males who buy diapers also have a predisposition to buy beer. No one had predicted that result, so no one would ever have even asked the question in the first place.

Market-Basket Transactions Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke We usually start from a data set like this – with baskets of transactions And the idea is to find associations between products

Market-Basket Transactions Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke X  Y (antecedent  consequent) (aka LHS RHS) {Diapers}  {Beer}, {Milk, Bread}  {Diapers} {Beer, Bread}  {Milk}, {Bread}  {Milk, Diapers} Association Rules from these transactions

Core idea: The itemset Itemset A group of items of interest {Milk, Diapers, Beer} Association rules express relationships between itemsets X  Y {Milk, Diapers}  {Beer} “when you have milk and diapers, you are also likely to have beer” Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke

How To Measure The Association (X => Y)? Support Count & Support Support is an indication of how frequently the itemset appears in the dataset. Confidence Confidence is an indication of how often the rule has been found to be true. Lift The ratio of the observed support to that expected if X and Y were independent.

Support Count () Support count () In how many baskets does the itemset appear? {Milk, Diapers, Beer} = 2 (i.e., in baskets 3 and 4) You can calculate support count for both X and Y separately {Milk, Diapers} = ? {Beer} = ? Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke X Y 2 baskets have milk, beer, and diapers 5 baskets total

Support (s) Support (s) Fraction of transactions that contain all items in the itemset s({Milk, Diapers, Beer}) = {Milk, Diapers, Beer} /(# of transactions) =2/5 = 0.4 You can calculate support for both X and Y separately Support for X: s{Milk, Diapers}= ? Support for Y: s{Beer}= ? Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke X Y This means 40% of the baskets contain Milk, Diapers and Beers

Confidence (c) Confidence (c) is the strength of the association Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke Confidence (c) is the strength of the association Measures how often items in Y appear in transactions that contain X Support for total itemset X and Y Support for X This says 67% of the times when you have milk and diapers in the itemset you also have beer! c must be between 0 and 1 1 is a complete association 0 is no association

Calculating and Interpreting Confidence Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke Association Rule (XY) Confidence (XY) What it means {Milk,Diapers}  {Beer} 0.4/0.6 = 2/3= 0.67 2 baskets have milk, diapers, beer 3 baskets have milk and diapers So, 67% of the baskets with milk and diapers also have beer {Milk,Beer}  {Diapers} 0.4/0.4 = 2/2= 1.0 2 baskets have milk and beer So, 100% of the baskets with milk and beer also have diapers {Milk}  {Diapers,Beer} 0.4/0.8 = 2/4 = 0.5 4 baskets have milk So, 50% of the baskets with milk also have diapers and beer

But don’t blindly follow the numbers i.e., high confidence suggests a strong association… But this can be deceptive Consider {Bread} {Diapers} Support for the total itemset is 0.6 (3/5) And confidence is 0.75 (3/4) – pretty high But is this just because both are frequently occurring items (s=0.8)? You’d almost expect them to show up in the same baskets by chance

Lift Takes into account how co-occurrence differs from what is expected by chance i.e., if items were selected independently from one another Support for total itemset X and Y Support for X times support for Y

What does the Lift mean? Thus, we can re-write Lift as Recall that 𝑐 𝑋→𝑌 = 𝑠 𝑋→𝑌 𝑠(𝑋) Thus, we can re-write Lift as 𝐿𝑖𝑓𝑡 𝑋→𝑌 = 𝑠 𝑋→𝑌 𝑆 𝑋 ∗𝑆 𝑌 = 𝑠 𝑋→𝑌 𝑆 𝑋 𝑆 𝑌 = 𝑐 𝑋→𝑌 𝑆 𝑌 𝑐 𝑋→𝑌 : how often items in Y appear in transactions that contain X 𝑆(𝑌): how often items in Y appear in all transactions Lift > 1 The occurrence of X  Y together is more likely than what you would expect by chance ( 𝐜 𝐗→𝐘 𝐬 𝐘 >𝟏) Lift<1 The occurrence of X  Y together is less likely than what you would expect by chance Lift=1 The occurrence of X  Y together is the same as what you would expect by chance (i.e. X and Y are independent of each other)

Lift Example What’s the lift for the rule: {Milk, Diapers}  {Beer} So X = {Milk, Diapers} Y = {Beer} s({Milk, Diapers}  {Beer}) = 2/5 = 0.4 s({Milk, Diapers}) = 3/5 = 0.6 s({Beer}) = 3/5 = 0.6 So Basket Items 1 Bread, Milk 2 Bread, Diapers, Beer, Eggs 3 Milk, Diapers, Beer, Coke 4 Bread, Milk, Diapers, Beer 5 Bread, Milk, Diapers, Coke When Lift > 1, the occurrence of X  Y together is more likely than what you would expect by chance

Another example Netflix Cable TV Yes 200 3800 8000 1000 What is the effect of Netflix on Cable TV? (Netflix  CableTV) Total = 200 + 3800 + 8000 + 1000 = 13000 People with both services = 1000/13000  7% People with Cable TV = (8000+1000)/13000  69% People with Netflix = (3800+1000)/13000  37% Having one negatively affects the purchase of the other (lift < 1)

Selecting the rules The steps List all possible association rules We know how to calculate the measures for each rule Support Confidence Lift Then we set up thresholds for the minimum rule strength we want to accept The steps List all possible association rules Compute the support and confidence for each rule Drop rules that don’t make the thresholds Use lift to further check the association

Once you are confident in a rule, take action {Diapers}  {Beer} Possible Marketing Actions Put diaper next to beer in the store Put diaper away from beer in the store (why?) Bundle beer and diaper into “New Parent Coping Kit” What are some others?

Summary Support, confidence, and lift In-Class Activity: Explain what each means Can you have high confidence and low lift? How to compute In-Class Activity: ICA 14: Computing Confidence, Support, and Lift ICA 15: Association Rule Mining Using R

Formulas Support Confidence Lift Fraction of transactions that contain all items Confidence Measures how often items in Y appear in transactions that contain X Formulas Lift How co-occurrence differs from what is expected by chance 𝐿𝑖𝑓𝑡 𝑋→𝑌 = 𝑠 𝑋→𝑌 𝑆 𝑋 ∗𝑆 𝑌 = 𝑠 𝑋→𝑌 𝑆 𝑋 𝑆 𝑌 = 𝑐 𝑋→𝑌 𝑆 𝑌

Time for our 14th & 15th ICA!