Introduction to Machine Learning Lecture 13 Introduction to Association Rules Albert Orriols i Puig Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle Universitat Ramon Llull

Recap of Lectures 5-12
LET’S START WITH DATA CLASSIFICATION
Slide 2

Recap of Lectures 5-12
Data Set → Classification Model. How?
We have seen four different types of approaches to classification:
□ Decision trees (C4.5)
□ Instance-based algorithms (kNN & CBR)
□ Bayesian classifiers (Naïve Bayes)
□ Neural networks (Perceptron, Adaline, Madaline, SVM)
Slide 3

Today’s Agenda
□ Introduction to Association Rules
□ A Taxonomy of Association Rules
□ Measures of Interest
□ Apriori
Slide 4

Introduction to AR
□ Ideas come from market basket analysis (MBA)
   Let’s go shopping!
      Customer1: milk, eggs, sugar, bread
      Customer2: milk, eggs, cereal, bread
      Customer3: eggs, sugar
   What do my customers buy? Which products are bought together?
   Aim: Find associations and correlations between the different items that customers place in their shopping basket
Slide 5

Introduction to AR
□ Formalizing the problem a little bit
   Transaction database T: a set of transactions, $T = \{t_1, t_2, \dots, t_n\}$
   Each transaction contains a set of items (an itemset)
   An itemset is a collection of items, $I = \{i_1, i_2, \dots, i_m\}$
□ General aim:
   Find frequent/interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories.
   Put these relationships in terms of association rules: $X \rightarrow Y$
Slide 6

Example of AR
   TID   Items
   T1    bread, jelly, peanut-butter
   T2    bread, peanut-butter
   T3    bread, milk, peanut-butter
   T4    beer, bread
   T5    beer, milk
Examples:
   bread → peanut-butter
   beer → bread
□ Frequent itemsets: items that frequently appear together
   I = {bread, peanut-butter}
   I = {beer, bread}
Slide 7

What’s an Interesting Rule?
   TID   Items
   T1    bread, jelly, peanut-butter
   T2    bread, peanut-butter
   T3    bread, milk, peanut-butter
   T4    beer, bread
   T5    beer, milk
□ Support count (σ)
   Frequency of occurrence of an itemset
   σ({bread, peanut-butter}) = 3
   σ({beer, bread}) = 1
□ Support
   Fraction of transactions that contain an itemset
   s({bread, peanut-butter}) = 3/5
   s({beer, bread}) = 1/5
□ Frequent itemset
   An itemset whose support is greater than or equal to a minimum support threshold (minsup)
Slide 8
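The three definitions on this slide translate almost directly into code. The following is a minimal Python sketch, not part of the original lecture; the function names `support_count`, `support`, and `is_frequent` are illustrative choices, and the transactions are the five rows T1 to T5 from the slide.

```python
# Transactions T1..T5 from the slide, each stored as a set of items.
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]


def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """s(itemset): fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """True if the support reaches the (fractional) minsup threshold."""
    return support(itemset, transactions) >= minsup


print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"beer", "bread"}, transactions))                 # 0.2
```

With a hypothetical threshold of minsup = 0.6, {bread, peanut-butter} would count as frequent and {beer, bread} would not.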

What’s an Interesting Rule?
□ An association rule is an implication between two itemsets: X → Y
□ Many measures of interest. The two most used are:
   Support (s)
      The frequency of occurrence of the rule, i.e., the fraction of transactions that contain both X and Y
      $s = \frac{\sigma(X \cup Y)}{\#\text{ of trans.}}$
   Confidence (c)
      The strength of the association, i.e., a measure of how often the items in Y appear in transactions that contain X
      $c = \frac{\sigma(X \cup Y)}{\sigma(X)}$
Slide 9

Interestingness of Rules
   TID   Items
   T1    bread, jelly, peanut-butter
   T2    bread, peanut-butter
   T3    bread, milk, peanut-butter
   T4    beer, bread
   T5    beer, milk

   Rule                      s      c
   bread → peanut-butter     0.60   0.75
   peanut-butter → bread     0.60   1.00
   beer → bread              0.20   0.50
   peanut-butter → jelly     0.20   0.33
   jelly → peanut-butter     0.20   1.00
   jelly → milk              0.00   0.00
□ Many other interestingness measures exist
   The methods presented here are based on these two measures (support and confidence)
Slide 10
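The support and confidence of the six example rules can be recomputed from the transaction table. This is a small Python sketch under the definitions s = σ(X ∪ Y) / (# of transactions) and c = σ(X ∪ Y) / σ(X); the helper names `sigma` and `rule_measures` are assumptions made for illustration.

```python
# Transactions T1..T5 from the slide.
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]


def sigma(itemset, transactions):
    """Support count: transactions containing every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rule_measures(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    both = sigma(set(X) | set(Y), transactions)
    antecedent = sigma(X, transactions)
    s = both / len(transactions)
    # Confidence is undefined when X never occurs; report 0.0 in that case.
    c = both / antecedent if antecedent else 0.0
    return s, c


rules = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]
for X, Y in rules:
    s, c = rule_measures(X, Y, transactions)
    print(f"{sorted(X)} -> {sorted(Y)}: s={s:.2f}, c={c:.2f}")
```

Note that bread → peanut-butter and peanut-butter → bread share the same support but differ in confidence, because confidence depends on which side is the antecedent.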

Types of AR
□ Binary association rules: bread → peanut-butter
□ Quantitative association rules: weight in [70kg – 90kg] → height in [170cm – 190cm]
□ Fuzzy association rules: weight in HEAVY → height in TALL
□ Let’s start from the beginning
   Binary association rules – Apriori
Slide 11

Apriori
□ This is the most influential AR miner
□ It consists of two steps
   1. Generate all frequent itemsets, i.e., those whose support ≥ minsup
   2. Use the frequent itemsets to generate association rules
□ So, let’s pay attention to the first step
Slide 12

Apriori
[Figure: itemset lattice over the items A, B, C, D, E]
   null
   A  B  C  D  E
   AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
   ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
   ABCD  ABCE  ABDE  ACDE  BCDE
   ABCDE
Given d items, we have $2^d$ possible itemsets. Do I have to generate them all?
Slide 13
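To get a feel for how quickly this search space grows, the lattice can be enumerated level by level. The sketch below is illustrative only (the name `all_itemsets` is not from the lecture); it generates every subset of the item set, including the empty set shown as "null" in the figure.

```python
from itertools import combinations


def all_itemsets(items):
    """Yield every subset of items, smallest first (the full lattice)."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


lattice = list(all_itemsets("ABCDE"))
print(len(lattice))  # 32, i.e. 2**5

# The count doubles with every additional item.
for d in (5, 10, 20):
    print(d, 2 ** d)
```

Even a modest catalogue of 20 items already yields over a million itemsets, which is why exhaustive enumeration is not a practical option.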

Apriori
□ Let’s avoid expanding the whole graph
□ Key idea:
   Downward closure property: every subset of a frequent itemset is also a frequent itemset
□ Therefore, the algorithm iteratively does:
   Create itemsets
   Only continue exploring those whose support ≥ minsup
Slide 14

Example Itemset Generation
[Figure: the same itemset lattice over A, B, C, D, E (from null up to ABCDE), with one itemset marked "Infrequent itemset"]
Given d items, we have $2^d$ possible itemsets. Do I have to generate them all?
Slide 15

Recovering the Example
Minimum support = 3
   TID   Items
   T1    bread, jelly, peanut-butter
   T2    bread, peanut-butter
   T3    bread, milk, peanut-butter
   T4    beer, bread
   T5    beer, milk

   1-itemsets
   Item       count
   bread      4
   peanut-b   3
   jelly      1
   milk       2
   beer       2

   2-itemsets
   Item              count
   bread, peanut-b   3
Slide 16
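The two tables on this slide can be reproduced with a short level-wise pass. This Python sketch assumes that "Minimum support = 3" is an absolute count rather than a fraction; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Transactions T1..T5 from the slide.
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
MIN_COUNT = 3  # assumption: minimum support given as a count

# Level 1: count single items and keep the frequent ones.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = sorted(i for i, c in item_counts.items() if c >= MIN_COUNT)

# Level 2: only pairs built from frequent items need to be counted.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_COUNT}

print(frequent_items)  # ['bread', 'peanut-butter']
print(frequent_pairs)  # {('bread', 'peanut-butter'): 3}
```

Because jelly, milk, and beer are infrequent on their own, only one candidate pair has to be counted instead of all ten possible pairs.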

Apriori Algorithm
□ k = 1
□ Generate frequent itemsets of length 1
□ Repeat until no frequent itemsets are found
   k := k + 1
   Generate candidate itemsets of size k from the frequent itemsets of size k-1
   Compute the support of each candidate by scanning the DB
Slide 17

Apriori Algorithm
   Algorithm Apriori(T)
      C_1 ← init-pass(T);
      F_1 ← {f | f ∈ C_1, f.count/n ≥ minsup};      // n: no. of transactions in T
      for (k = 2; F_{k-1} ≠ ∅; k++) do
         C_k ← candidate-gen(F_{k-1});
         for each transaction t ∈ T do
            for each candidate c ∈ C_k do
               if c is contained in t then
                  c.count++;
            end
         end
         F_k ← {c ∈ C_k | c.count/n ≥ minsup}
      end
      return F ← ∪_k F_k;
Slide 18

Apriori Algorithm
   Function candidate-gen(F_{k-1})
      C_k ← ∅;
      forall f_1, f_2 ∈ F_{k-1}
            with f_1 = {i_1, …, i_{k-2}, i_{k-1}}
            and f_2 = {i_1, …, i_{k-2}, i'_{k-1}}
            and i_{k-1} < i'_{k-1} do
         c ← {i_1, …, i_{k-1}, i'_{k-1}};            // join f_1 and f_2
         C_k ← C_k ∪ {c};
         for each (k-1)-subset s of c do
            if (s ∉ F_{k-1}) then
               delete c from C_k;                    // prune
         end
      end
      return C_k;
Slide 19
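The join and prune steps of candidate-gen can be sketched in Python as follows. This is one possible rendering, not the lecture's own code: itemsets are represented as frozensets, the lexicographic order required by the join is obtained by sorting, and the sample inputs are chosen for illustration.

```python
from itertools import combinations


def candidate_gen(frequent_prev):
    """Build size-k candidates from the frequent (k-1)-itemsets.

    frequent_prev: iterable of itemsets, all of the same size k-1.
    Returns a set of frozensets of size k.
    """
    prev = {frozenset(f) for f in frequent_prev}
    ordered = sorted(tuple(sorted(f)) for f in prev)
    candidates = set()
    for f1, f2 in combinations(ordered, 2):
        # Join: f1 and f2 share their first k-2 items and differ in the
        # last one (f1[-1] < f2[-1] follows from the sorted order).
        if f1[:-1] == f2[:-1]:
            c = frozenset(f1) | frozenset(f2)
            # Prune: keep c only if every (k-1)-subset is frequent.
            if all(frozenset(s) in prev
                   for s in combinations(sorted(c), len(f1))):
                candidates.add(c)
    return candidates


L2 = [{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]
print(candidate_gen(L2))  # {frozenset({'B', 'C', 'E'})} (element order may vary)

# {A, B} and {A, C} join to {A, B, C}, but {B, C} is missing, so it is pruned.
print(candidate_gen([{"A", "B"}, {"A", "C"}, {"B", "D"}]))  # set()
```

The prune step is what applies the downward closure property: a candidate with even one infrequent subset is discarded before any database scan.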

Example of Apriori Run
Database TDB
   Tid   Items
   10    A, C, D
   20    B, C, E
   30    A, B, C, E
   40    B, E
1st scan → C1
   Itemset   sup
   {A}       2
   {B}       3
   {C}       3
   {D}       1
   {E}       3
L1
   Itemset   sup
   {A}       2
   {B}       3
   {C}       3
   {E}       3
C2 (candidates generated from L1)
   {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2 with support counts
   Itemset   sup
   {A, B}    1
   {A, C}    2
   {A, E}    1
   {B, C}    2
   {B, E}    3
   {C, E}    2
L2
   Itemset   sup
   {A, C}    2
   {B, C}    2
   {B, E}    3
   {C, E}    2
C3 (candidates generated from L2)
   {B, C, E}
3rd scan → L3
   Itemset     sup
   {B, C, E}   2
Slide 20
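The run above can be replayed with a compact Python sketch of the frequent-itemset phase. Two assumptions are made here: the minimum support is taken as an absolute count of 2 (inferred from {D} with sup 1 being dropped while {A} with sup 2 is kept), and candidates are formed as unions of two frequent (k-1)-itemsets followed by the subset check, which yields the same candidate set as the prefix-based join. The function name `apriori` is illustrative.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # One pass over the database per call (kept simple on purpose).
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]): count(frozenset([i])) for i in items}
    level = {c: n for c, n in level.items() if n >= min_count}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: count(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_count}
    return frequent


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_count=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The output lists the four frequent 1-itemsets, the four frequent 2-itemsets, and {B, C, E} with support 2, matching L1, L2, and L3 in the tables.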

Apriori
□ Remember that Apriori consists of two steps
   1. Generate all frequent itemsets, i.e., those whose support ≥ minsup
   2. Use the frequent itemsets to generate association rules
□ We accomplished step 1, so we have all frequent itemsets
□ So, let’s pay attention to the second step
Slide 21

Rule Generation in Apriori
□ Given a frequent itemset L
   Find all non-empty proper subsets F of L such that the association rule F → {L-F} satisfies the minimum confidence
   Create the rule F → {L-F}
□ If L = {A, B, C}
   The candidate rules are: AB → C, AC → B, BC → A, A → BC, B → AC, C → AB
   In general, there are $2^k - 2$ candidate rules, where k is the length of the itemset L
Slide 22
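A brute-force version of this step can be sketched in Python as below. The function name `generate_rules` and the threshold 0.7 are illustrative assumptions; the support counts are those of the itemset {B, C, E} and its subsets from the Apriori run example (database TDB).

```python
from itertools import combinations


def generate_rules(itemset, support_count, minconf):
    """Enumerate all rules F -> (L - F) and keep the confident ones.

    support_count maps frozensets to counts and must contain L and
    every non-empty subset of L (all are frequent by downward closure).
    """
    L = frozenset(itemset)
    rules = []
    for size in range(1, len(L)):          # non-empty proper subsets only
        for antecedent in combinations(sorted(L), size):
            F = frozenset(antecedent)
            conf = support_count[L] / support_count[F]
            if conf >= minconf:
                rules.append((F, L - F, conf))
    return rules


counts = {
    frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
print(len(generate_rules("BCE", counts, minconf=0.0)))  # 6, i.e. 2**3 - 2
for F, rest, conf in generate_rules("BCE", counts, minconf=0.7):
    print(sorted(F), "->", sorted(rest), round(conf, 2))
```

No further database scan is needed at this stage, since every count required for the confidence was already collected while mining the frequent itemsets.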

Can You Be More Efficient?
□ Can we apply the same trick used with support?
   Confidence does not have the anti-monotone property
   That is, is c(AB → D) > c(A → D)? Don’t know!
□ But the confidence of rules generated from the same itemset does have the anti-monotone property
   L = {A, B, C, D}
   c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
   We can apply this property to prune the rule generation
Slide 23
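The inequality chain for rules from the same itemset follows directly from the definition of confidence, $c(X \rightarrow Y) = \sigma(X \cup Y) / \sigma(X)$. A short derivation, written out here as a reasoning aid rather than taken from the slides:

```latex
% All three rules come from L = \{A,B,C,D\}, so they share the numerator.
c(ABC \rightarrow D) = \frac{\sigma(ABCD)}{\sigma(ABC)}, \qquad
c(AB \rightarrow CD) = \frac{\sigma(ABCD)}{\sigma(AB)}, \qquad
c(A \rightarrow BCD) = \frac{\sigma(ABCD)}{\sigma(A)}

% Support count can only grow when items are removed from an itemset:
\sigma(ABC) \le \sigma(AB) \le \sigma(A)

% A larger denominator with the same numerator gives a smaller value:
\frac{\sigma(ABCD)}{\sigma(ABC)} \ge \frac{\sigma(ABCD)}{\sigma(AB)} \ge \frac{\sigma(ABCD)}{\sigma(A)}
\;\Longrightarrow\;
c(ABC \rightarrow D) \ge c(AB \rightarrow CD) \ge c(A \rightarrow BCD)
```

The argument breaks down for rules from different itemsets, such as AB → D and A → D, because then the numerators σ(ABD) and σ(AD) differ as well, and nothing fixes the direction of the comparison.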

Example of Efficient Rule Generation
[Figure: lattice of rules generated from the itemset ABCD, with one rule marked "Low confidence"]
   ABC → D   ABD → C   ACD → B   BCD → A
   AB → CD   AC → BD   AD → BC   BC → AD   BD → AC   CD → AB
   A → BCD   B → ACD   C → ABD   D → ABC
Slide 24

Challenges in AR Mining
□ Challenges
   Apriori scans the database multiple times
   Most often, there is a high number of candidates
   Support counting for candidates can be time-consuming
□ Several methods try to improve on these points by
   Reducing the number of scans of the database
   Shrinking the number of candidates
   Counting the support of candidates more efficiently
Slide 25

Next Class
□ Advanced topics in association rule mining
Slide 26
