Chapter 14 – Association Rules

Chapter 14 – Association Rules DM for Business Intelligence

What are Association Rules? The study of “what goes with what”: “customers who bought X also bought Y,” or which symptoms go with which diagnosis. Transaction-based or event-based; also called “market basket analysis” and “affinity analysis.” Originated with the study of customer transaction databases to determine associations among items purchased. An example of unsupervised learning.

Used in many recommender systems

Generating Rules

Terms The “IF” part = antecedent, denoted (a). The “THEN” part = consequent, denoted (c). “Item set” = the items (e.g., products) comprising the antecedent and consequent, denoted (a U c). The antecedent and consequent are disjoint (i.e., have no items in common); for example, you wouldn’t say “customers who bought beer also bought beer.”

Tiny Example: Phone Cases/Faceplates

Many Rules are Possible For example, Transaction 1 supports several rules, such as: “If red, then white” (“If a red faceplate is purchased, then so is a white one”); “If white, then red”; “If red and white, then green”; and several more.

Frequent Item Sets Ideally, we would examine all possible combinations of items. Problem: the number of combinations, and hence computation time, grows exponentially with the number of items. Solution: consider only “frequent item sets.” The criterion for “frequent” is support.

Support Support = the count (or percent) of total transactions that include an item or item set: Support(a), Support(c), Support(a U c). Example: support for the item set {red, white} is 4 out of 10 transactions, or 40%.
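Support is a direct count over the transactions. Below is a minimal Python sketch; the ten transactions are an assumed reconstruction of the tiny faceplate example (not given in the transcript), chosen so that support{red, white} = 4/10 as on the slide.

```python
# Ten phone-faceplate transactions (an assumed reconstruction of the
# tiny example; chosen so that support{red, white} = 4/10 = 40%).
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]

def support(item_set, transactions):
    """Fraction of transactions containing every item in item_set."""
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t) / len(transactions)

print(support({"red", "white"}, transactions))  # 0.4, i.e. 40%
```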

Tiny Example: Phone Cases/Faceplates Support for the rule “If Red, then White”: Support = 4/10 or 40%.

Apriori Algorithm

Generating Frequent Item Sets For k products: The user sets a minimum support criterion. Generate the list of one-item sets that meet the support criterion. Use the one-item sets to generate the list of two-item sets that meet the criterion. Use the two-item sets to generate three-item sets, and continue up through k-item sets.
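The level-wise procedure above can be sketched in a few lines of Python. The candidate-generation step here (unions of frequent k-sets) is a simplification of the classic Apriori join step, and the transactions are the same assumed reconstruction of the tiny example.

```python
# Level-wise generation of frequent item sets (the Apriori idea):
# build (k+1)-item candidates only from frequent k-item sets.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def apriori_frequent(transactions, min_support):
    n = len(transactions)
    def sup(s):
        return sum(1 for t in transactions if s <= t) / n
    items = {i for t in transactions for i in t}
    # one-item sets meeting the support criterion
    current = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        # candidate (k+1)-sets from unions of frequent k-sets (simplified join)
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        current = [c for c in candidates if sup(c) >= min_support]
        k += 1
    return frequent

freq = apriori_frequent(transactions, min_support=0.2)
print(sorted(tuple(sorted(s)) for s in freq))
```

With a 20% support criterion, {yellow} (support 1/10) is pruned at level one, so no larger set containing yellow is ever generated — that pruning is what keeps the computation tractable.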

Measures of Performance Support = the count or % of total transactions that include an item or item set, e.g., Support(a), Support(c), or Support(a U c). Confidence = the % of antecedent (IF) transactions that also contain the consequent (THEN) item set (a “within sets” measure: IF-and-THENs / IFs), i.e., Support(a U c) / Support(a). Benchmark confidence = transactions with the consequent as a % of all transactions (THENs / total records), i.e., Support(c) / total records. Lift = Confidence / Benchmark confidence. Lift > 1 indicates a rule that is more useful in finding consequent item sets than selecting transactions at random.
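These measures follow directly from support counts. A sketch, using the rule {red, white} ⇒ {green} (whose numbers also appear on the All Rules slide) and the same assumed reconstruction of the transactions:

```python
# Confidence, benchmark confidence, and lift for a rule a => c.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(s):
    return sum(1 for t in transactions if s <= t) / len(transactions)

a, c = {"red", "white"}, {"green"}

confidence = support(a | c) / support(a)   # 0.2 / 0.4 = 50%
benchmark = support(c)                     # 0.2 -> 20%
lift = confidence / benchmark              # 0.5 / 0.2 = 2.5

print(confidence, benchmark, lift)
```

Lift of 2.5 means a transaction containing {red, white} is 2.5 times as likely to contain green as a randomly selected transaction.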

Tiny Example: Phone Cases/Faceplates Analyzing the rule “If Red, then White”: Confidence = 4/5 or 80%.

Tiny Example: Phone Cases/Faceplates Analyzing the rule “If Red, then White”: Confidence = 4/5 or 80%. Benchmark confidence = 8/10 or 80%. Lift = 80% / 80% = 1 (not greater than 1, so the rule provides no lift).

Select Rules with Confidence Generate all rules that meet specified support and confidence: first find the frequent item sets (those with sufficient support, as above), then from these item sets generate the rules with sufficient confidence.

Alternate Data Format: Binary Matrix Analyzing the rule “If Red and White, then Green” (antecedent = a, consequent = c): transactions with Red and White = 4, i.e., Support(a); transactions with Red and White and then Green = 2, i.e., Support(a U c). Confidence = 2/4 or 50%.

Example: Rules from {red, white, green} {red, white} ⇒ {green} with confidence = 2/4 = 50% [(support{red, white, green}) / (support{red, white})]. {red, green} ⇒ {white} with confidence = 2/2 = 100% [(support{red, white, green}) / (support{red, green})]. Plus four more, with confidences of 100%, 33%, 29%, and 100%. If the confidence criterion is 70%, report only rules 2, 3, and 6.
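The six rules can be enumerated mechanically: every nonempty proper subset of the frequent item set becomes an antecedent, with the remaining items as consequent. A sketch over the same assumed transactions; with a 70% confidence criterion it keeps exactly the three 100%-confidence rules.

```python
from itertools import combinations

# Generate all rules from the frequent item set {red, white, green}
# and keep those meeting a 70% confidence criterion.
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def support(s):
    return sum(1 for t in transactions if s <= t) / len(transactions)

itemset = frozenset({"red", "white", "green"})
rules = []
for r in range(1, len(itemset)):                 # antecedent sizes 1 and 2
    for combo in combinations(sorted(itemset), r):
        a = frozenset(combo)
        c = itemset - a
        conf = support(itemset) / support(a)     # Support(a U c) / Support(a)
        rules.append((set(a), set(c), conf))

kept = [rule for rule in rules if rule[2] >= 0.7]
for a, c, conf in kept:
    print(sorted(a), "=>", sorted(c), f"confidence={conf:.0%}")
```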

All Rules (XLMiner Output)

Rule  IF (a)         THEN (c)       Support(a)  Support(c)  Support(a U c)  Confidence  Lift Ratio
1     Red, White     Green          4           2           2               50%         2.5
2     Red, Green     White          2           7           2               100%        1.4
3     White, Green   Red            2           6           2               100%        1.7
4     Red            White, Green   6           2           2               33%         1.7
5     White          Red, Green     7           2           2               29%         1.4
6     Green          Red, White     2           4           2               100%        2.5

Confidence = Support(a U c) / Support(a). Benchmark confidence = Support(c) / total records (here, 10 records). Lift ratio = Confidence / Benchmark confidence.

Interpretation The lift ratio shows how effective the rule is in finding consequents (useful when finding particular consequents is important). Confidence shows the rate at which consequents will be found (useful in estimating the costs of a promotion). Support measures overall impact.

Caution: The Role of Chance Random data can generate apparently interesting association rules. The more rules you produce, the greater this danger. Rules based on large numbers of records are less subject to this danger.
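This danger is easy to demonstrate. A sketch under stated assumptions: simulate baskets in which every item is bought independently (so every true lift equals 1), then count how many pair rules nonetheless show lift well above 1 purely by chance. The item count, purchase probability, and threshold are arbitrary choices for illustration.

```python
import random
from itertools import combinations

# Items are bought independently, so any observed lift != 1 is pure noise.
random.seed(1)
items = range(10)
baskets = [{i for i in items if random.random() < 0.3} for _ in range(100)]

def support(s):
    return sum(1 for b in baskets if s <= b) / len(baskets)

spurious = 0
for i, j in combinations(items, 2):
    s_pair = support({i, j})
    if s_pair == 0:
        continue
    # lift of i => j: (s_pair / s_i) / s_j, which is symmetric in i and j
    lift = s_pair / (support({i}) * support({j}))
    if lift > 1.5:                 # looks "interesting", but is chance
        spurious += 1

print(f"{spurious} of 45 random pair rules have lift > 1.5")
```

Running this with more items or lower support thresholds tends to produce more such spurious "discoveries", which is exactly the slide's warning.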

Summary Association rules (also called affinity analysis or market basket analysis) produce rules about associations among items from a database of transactions. This is unsupervised learning, widely used in recommender systems. The most popular method is the Apriori algorithm. To reduce computation, only “frequent” item sets are considered. Performance is measured by support, confidence, and lift. The procedure can produce too many rules; review is required to identify useful rules and to reduce redundancy.