1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.

Slides:



Advertisements
Similar presentations
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 8 = Finish chapter 6 Agenda:
Advertisements

Association Rule Mining
Mining Association Rules in Large Databases
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or causal structures.
Pertemuan XIV FUNGSI MAYOR Assosiation. What Is Association Mining? Association rule mining: –Finding frequent patterns, associations, correlations, or.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Frequent Item Mining.
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Organization “Association Analysis”
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
1) Go over HW #1 solutions (Due today)
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
Eick, Tan, Steinbach, Kumar: Association Analysis Part1 Organization “Association Analysis” 1. What is Association Analysis? 2. Association Rules 3. The.
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 7 = Finish chapter 3 and.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Eick, Tan, Steinbach, Kumar: Association Analysis Part1 Organization “Association Analysis” 1. What is Association Analysis? 2. Association Rules 3. The.
Supermarket shelf management – Market-basket model:  Goal: Identify items that are bought together by sufficiently many customers  Approach: Process.
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data & Text Mining1 Introduction to Association Analysis Zhangxi Lin ISQS 3358 Texas Tech University.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
CS 345: Topics in Data Warehousing Thursday, November 18, 2004.
What is Frequent Pattern Analysis?
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Elective-I Examination Scheme- In semester Assessment: 30 End semester Assessment :70 Text Books: Data Mining Concepts and Techniques- Micheline Kamber.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
MIS2502: Data Analytics Association Rule Mining David Schuff
Introduction to Data Mining Mining Association Rules Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel
Data Mining – Association Rules
Mining Association Rules in Large Databases
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent Pattern Mining
William Norris Professor and Head, Department of Computer Science
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Association Rule Mining
Mining Association Rules in Large Databases
Mining Association Rules in Large Databases
Association Analysis: Basic Concepts
Presentation transcript:

1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based on the occurrences of other items in the transaction l Examples: {Diaper}  {Beer}, {Milk, Bread}  {Eggs,Coke} {Beer, Bread}  {Milk} l Implication means co-occurrence, not causality!

2 Definitions: lItemset –A collection of one or more items –Example: {Milk, Bread, Diaper} –k-itemset = An itemset that contains k items lSupport count (  ) –Frequency of occurrence of an itemset –E.g.  ({Milk, Bread,Diaper}) = 2 lSupport –Fraction of transactions that contain an itemset –E.g. s({Milk, Bread, Diaper}) = 2/5 lFrequent Itemset –An itemset whose support is greater than or equal to a minsup threshold

3 Another Definition: lAssociation Rule –An implication expression of the form X  Y, where X and Y are itemsets –Example: {Milk, Diaper}  {Beer}

4 Even More Definitions: lAssociation Rule Evaluation Metrics –Support (s) =Fraction of transactions that contain both X and Y –Confidence (c) =Measures how often items in Y appear in transactions that contain X lExample:

5 In class exercise: Compute the support for itemsets {a}, {b, d}, and {a,b,d} by treating each transaction ID as a market basket.

6 In class exercise: Use the results in the previous problem to compute the confidence for the association rules {b, d} → {a} and {a} → {b, d}. State what these values mean in plain English.

7 In class exercise: Compute the support for itemsets {a}, {b, d}, and {a,b,d} by treating each customer ID as a market basket.

8 In class exercise: Use the results in the previous problem to compute the confidence for the association rules {b, d} → {a} and {a} → {b, d}. State what these values mean in plain English.

9 An Association Rule Mining Task: l Given a set of transactions T, find all rules having both - support ≥ minsup threshold - confidence ≥ minconf threshold l Brute-force approach: - List all possible association rules - Compute the support and confidence for each rule - Prune rules that fail the minsup and minconf thresholds - Problem: this is computationally prohibitive!

10 The Support and Confidence Requirements can be Decoupled l All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} l Rules originating from the same itemset have identical support but can have different confidence l Thus, we may decouple the support and confidence requirements {Milk,Diaper}  {Beer} (s=0.4, c=0.67) {Milk,Beer}  {Diaper} (s=0.4, c=1.0) {Diaper,Beer}  {Milk} (s=0.4, c=0.67) {Beer}  {Milk,Diaper} (s=0.4, c=0.67) {Diaper}  {Milk,Beer} (s=0.4, c=0.5) {Milk}  {Diaper,Beer} (s=0.4, c=0.5)

11 Two Step Approach: 1) Frequent Itemset Generation = Generate all itemsets whose support ≥ minsup 2) Rule Generation = Generate high confidence ( confidence ≥ minconf ) rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset l Note: Frequent itemset generation is still computationally expensive and your book discusses algorithms that can be used

12 In class exercise #23: Use the two step approach to generate all rules having support ≥.4 and confidence ≥.6 for the transactions below.