Chapter 14 – Association Rules

Chapter 14 – Association Rules DM for Business Intelligence

What are Association Rules? The study of “what goes with what”: “customers who bought X also bought Y,” or which symptoms go with which diagnosis. Transaction-based or event-based; also called “market basket analysis” and “affinity analysis.” Originated with the study of customer transaction databases to determine associations among items purchased. An example of unsupervised learning.

Used in many recommender systems

Generating Rules

Terms The “IF” part = antecedent, denoted (a). The “THEN” part = consequent, denoted (c). “Item set” = the items (e.g., products) comprising the antecedent and consequent, denoted (a U c). The antecedent and consequent are disjoint (i.e., have no items in common); for example, you wouldn’t say “customers who bought beer also bought beer.”

Tiny Example: Phone Cases/Faceplates

Many Rules are Possible For example, Transaction 1 supports several rules, such as: “If red, then white” (“If a red faceplate is purchased, then so is a white one”); “If white, then red”; “If red and white, then green”; and several more.

Frequent Item Sets Ideally, we would examine all possible combinations of items. Problem: the number of combinations, and hence computation time, grows exponentially with the number of items. Solution: consider only “frequent item sets.” The criterion for “frequent” is support.

Support Support = the count (or percent) of total transactions that include an item or item set: Support(a), Support(c), Support(a U c). Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%.
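Support is a direct count over the transactions. Below is a minimal Python sketch; the ten transactions are an assumed reconstruction of the tiny faceplate example (not given in the transcript), chosen so that support{red, white} = 4/10 as on the slide.

```python
# Ten phone-faceplate transactions (an assumed reconstruction of the
# tiny example; chosen so that support{red, white} = 4/10 = 40%).
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]

def support(item_set, transactions):
    """Fraction of transactions containing every item in item_set."""
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t) / len(transactions)

print(support({"red", "white"}, transactions))  # 0.4, i.e. 40%
```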

Tiny Example: Phone Cases/Faceplates Support for the rule “If Red, then White”: Support = 4/10 or 40%.

Apriori Algorithm

Generating Frequent Item Sets For k products: The user sets a minimum support criterion. Generate the list of one-item sets that meet the support criterion. Use the one-item sets to generate the list of two-item sets that meet the criterion. Use the two-item sets to generate three-item sets, and continue up through k-item sets.
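The level-wise procedure above can be sketched in a few lines of Python. The candidate-generation step here (unions of frequent k-sets) is a simplification of the classic Apriori join step, and the transactions are the same assumed reconstruction of the tiny example.

```python
# Level-wise generation of frequent item sets (the Apriori idea):
# build (k+1)-item candidates only from frequent k-item sets.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def apriori_frequent(transactions, min_support):
    n = len(transactions)
    def sup(s):
        return sum(1 for t in transactions if s <= t) / n
    items = {i for t in transactions for i in t}
    # one-item sets meeting the support criterion
    current = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        # candidate (k+1)-sets from unions of frequent k-sets (simplified join)
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        current = [c for c in candidates if sup(c) >= min_support]
        k += 1
    return frequent

freq = apriori_frequent(transactions, min_support=0.2)
print(sorted(tuple(sorted(s)) for s in freq))
```

With a 20% support criterion, {yellow} (support 1/10) is pruned at level one, so no larger set containing yellow is ever generated — that pruning is what keeps the computation tractable.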

Measures of Performance Support = the count or % of total transactions that include an item or item set, e.g., Support(a), Support(c), or Support(a U c). Confidence = the % of antecedent (IF) transactions that also contain the consequent (THEN) item set (a “within sets” measure: IF-and-THENs / IFs), i.e., Support(a U c) / Support(a). Benchmark confidence = transactions with the consequent as a % of all transactions (THENs / total records), i.e., Support(c) / total records. Lift = Confidence / Benchmark confidence. Lift > 1 indicates a rule that is more useful in finding consequent item sets than selecting transactions at random.
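These measures follow directly from support counts. A sketch, using the rule {red, white} ⇒ {green} (whose numbers also appear on the All Rules slide) and the same assumed reconstruction of the transactions:

```python
# Confidence, benchmark confidence, and lift for a rule a => c.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(s):
    return sum(1 for t in transactions if s <= t) / len(transactions)

a, c = {"red", "white"}, {"green"}

confidence = support(a | c) / support(a)   # 0.2 / 0.4 = 50%
benchmark = support(c)                     # 0.2 -> 20%
lift = confidence / benchmark              # 0.5 / 0.2 = 2.5

print(confidence, benchmark, lift)
```

Lift of 2.5 means a transaction containing {red, white} is 2.5 times as likely to contain green as a randomly selected transaction.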

Tiny Example: Phone Cases/Faceplates Analyzing the rule “If Red, then White”: Confidence = 4/5 or 80%.

Tiny Example: Phone Cases/Faceplates Analyzing the rule “If Red, then White”: Confidence = 4/5 or 80%. Benchmark confidence = 8/10 or 80%. Lift = 80% / 80% = 1 (not greater than 1, so the rule provides no lift).

Select Rules with Confidence Generate all rules that meet specified support and confidence: first find the frequent item sets (those with sufficient support, as above), then from these item sets generate the rules with sufficient confidence.

Alternate Data Format: Binary Matrix Analyzing the rule “If Red and White, then Green” (antecedent = a, consequent = c): transactions with Red and White = 4, i.e., Support(a); transactions with Red and White and then Green = 2, i.e., Support(a U c). Confidence = 2/4 or 50%.

Example: Rules from {red, white, green} {red, white} ⇒ {green} with confidence = 2/4 = 50% [(support{red, white, green}) / (support{red, white})]. {red, green} ⇒ {white} with confidence = 2/2 = 100% [(support{red, white, green}) / (support{red, green})]. Plus four more, with confidences of 100%, 33%, 29%, and 100%. If the confidence criterion is 70%, report only rules 2, 3, and 6.
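The six rules can be enumerated mechanically: every nonempty proper subset of the frequent item set becomes an antecedent, with the remaining items as consequent. A sketch over the same assumed transactions; with a 70% confidence criterion it keeps exactly the three 100%-confidence rules.

```python
from itertools import combinations

# Generate all rules from the frequent item set {red, white, green}
# and keep those meeting a 70% confidence criterion.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(s):
    return sum(1 for t in transactions if s <= t) / len(transactions)

itemset = frozenset({"red", "white", "green"})
rules = []
for r in range(1, len(itemset)):                 # antecedent sizes 1 and 2
    for combo in combinations(sorted(itemset), r):
        a = frozenset(combo)
        c = itemset - a
        conf = support(itemset) / support(a)     # Support(a U c) / Support(a)
        rules.append((set(a), set(c), conf))

kept = [rule for rule in rules if rule[2] >= 0.7]
for a, c, conf in kept:
    print(sorted(a), "=>", sorted(c), f"confidence={conf:.0%}")
```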

All Rules (XLMiner Output)

Rule  IF (a)         THEN (c)       Support(a)  Support(c)  Support(a U c)  Confidence  Lift Ratio
1     Red, White     Green          4           2           2               50%         2.5
2     Red, Green     White          2           7           2               100%        1.4
3     White, Green   Red            2           6           2               100%        1.7
4     Red            White, Green   6           2           2               33%         1.7
5     White          Red, Green     7           2           2               29%         1.4
6     Green          Red, White     2           4           2               100%        2.5

Confidence = Support(a U c) / Support(a). Benchmark confidence = Support(c) / total records (here, 10 records). Lift ratio = Confidence / Benchmark confidence.

Interpretation The lift ratio shows how effective the rule is in finding consequents (useful when finding particular consequents is important). Confidence shows the rate at which consequents will be found (useful in estimating the costs of a promotion). Support measures overall impact.

Caution: The Role of Chance Random data can generate apparently interesting association rules. The more rules you produce, the greater this danger. Rules based on large numbers of records are less subject to this danger.
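This danger is easy to demonstrate. A sketch under stated assumptions: simulate baskets in which every item is bought independently (so every true lift equals 1), then count how many pair rules nonetheless show lift well above 1 purely by chance. The item count, purchase probability, and threshold are arbitrary choices for illustration.

```python
import random
from itertools import combinations

# Items are bought independently, so any observed lift != 1 is pure noise.
random.seed(1)
items = range(10)
baskets = [{i for i in items if random.random() < 0.3} for _ in range(100)]

def support(s):
    return sum(1 for b in baskets if s <= b) / len(baskets)

spurious = 0
for i, j in combinations(items, 2):
    s_pair = support({i, j})
    if s_pair == 0:
        continue
    # lift of i => j: (s_pair / s_i) / s_j, which is symmetric in i and j
    lift = s_pair / (support({i}) * support({j}))
    if lift > 1.5:                 # looks "interesting", but is chance
        spurious += 1

print(f"{spurious} of 45 random pair rules have lift > 1.5")
```

Running this with more items or lower support thresholds tends to produce more such spurious "discoveries", which is exactly the slide's warning.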

Summary Association rules (also called affinity analysis or market basket analysis) produce rules about associations among items from a database of transactions. This is unsupervised learning, widely used in recommender systems. The most popular method is the Apriori algorithm. To reduce computation, only “frequent” item sets are considered. Performance is measured by support, confidence, and lift. The procedure can produce too many rules; review is required to identify useful rules and to reduce redundancy.