Association Rule
Lecturer: Dr. Bo Yuan

Overview
• Frequent Itemsets
• Association Rules
• Sequential Patterns

Future Store

A Real Example

Market-Based Problems
• Finding associations among items in a transactional database.
• Items
  - Bread, Milk, Chocolate, Butter …
• Transaction (Basket)
  - A non-empty subset of all items
• Cross Selling
  - Selling additional products or services to an existing customer.
• Bundle Discount
• Shop Layout Design
  - Minimum Distance vs. Maximum Distance
• “Baskets” & “Items”
  - Sentences & Words

Definitions
• A transaction is a set of items: T = {i_a, i_b, …, i_t}
• T is a subset of I, where I is the set of all possible items.
• The dataset D contains a set of transactions.
• An association rule has the form X → Y, where X and Y are disjoint itemsets.
• A set of items is referred to as an itemset.
• An itemset containing k items is called a k-itemset.
• An itemset can be seen as a conjunction of items.

Transaction   Items
1             Bread, Jelly, Peanut, Butter
2             Bread, Butter
3             Bread, Jelly
4             Bread, Milk, Butter
5             Chips, Milk
6             Bread, Chips
7             Bread, Milk
8             Chips, Jelly

Searching for rules of the form: Bread → Butter

Support of an Itemset

Itemset                                     Support
Bread                                       6/8
Butter                                      3/8
Chips                                       3/8
Jelly                                       3/8
Milk                                        3/8
Peanut                                      1/8
Bread, Butter                               3/8
…
Bread, Butter, Chips                        0/8
…
Bread, Butter, Chips, Jelly                 0/8
…
Bread, Butter, Chips, Jelly, Milk           0/8
…
Bread, Butter, Chips, Jelly, Milk, Peanut   0/8
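The support values can be recomputed directly from the eight transactions on the Transactions slide. The following is a minimal sketch; the names `TRANSACTIONS` and `support` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# The eight transactions listed on the Transactions slide.
TRANSACTIONS = [
    {"Bread", "Jelly", "Peanut", "Butter"},
    {"Bread", "Butter"},
    {"Bread", "Jelly"},
    {"Bread", "Milk", "Butter"},
    {"Chips", "Milk"},
    {"Bread", "Chips"},
    {"Bread", "Milk"},
    {"Chips", "Jelly"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


print(support({"Bread"}))            # 3/4 (that is, 6/8)
print(support({"Bread", "Butter"}))  # 3/8
```

`Fraction` keeps the results exact, so they can be compared with the table without rounding.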

Support & Confidence of Association Rule
• Support(X → Y) = (number of transactions containing every item of X and of Y) / |D|
• Confidence(X → Y) = Support(X → Y) / Support(X)
• Support measures how often the rule occurs in the dataset.
• Confidence measures the strength of the rule.

Using the eight transactions listed earlier:

Rule            Support   Confidence
Bread → Milk    2/8       1/3
Milk → Bread    2/8       2/3

Frequent Itemsets and Strong Rules
• Support and confidence are bounded by thresholds:
  - Minimum support σ
  - Minimum confidence Φ
• A frequent (large) itemset is an itemset with support larger than σ.
• A strong rule is a rule that is frequent and whose confidence is higher than Φ.
• Association Rule Problem
  - Given I, D, σ and Φ, find all strong rules of the form X → Y.
  - The number of all possible association rules is huge.
  - A brute-force strategy is infeasible.
  - A smarter way is to find the frequent itemsets first.
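To put a number on "huge": with d distinct items (d is a symbol introduced here, not used on the slides), choose a non-empty left-hand side of k items and then a non-empty right-hand side from the remaining d - k items. Counting all such rules gives:

```latex
R \;=\; \sum_{k=1}^{d-1} \binom{d}{k} \sum_{j=1}^{d-k} \binom{d-k}{j}
  \;=\; \sum_{k=1}^{d-1} \binom{d}{k}\left(2^{\,d-k} - 1\right)
  \;=\; 3^{d} - 2^{\,d+1} + 1 .
% For the six items of the running example (d = 6):
% R = 729 - 128 + 1 = 602 .
```

Even the six-item toy example admits 602 candidate rules, and the count grows roughly as 3^d.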

The Big Picture
• Step 1: Find all frequent itemsets.
• Step 2: Use frequent itemsets to generate association rules.
  - For each frequent itemset f, create all non-empty proper subsets of f.
  - For each such subset s of f, output s → (f - s) if support(f) / support(s) > Φ.
• Example: {a, b, c} yields the candidate rules
  a b → c, a c → b, b c → a, a → b c, b → a c, c → a b
• The key is to find frequent itemsets.
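Step 2 can be sketched in a few lines. This is an illustrative version under stated assumptions, not the lecturer's code: the function name `rules_from_itemset` is hypothetical, the data are the eight transactions from the Transactions slide, and `f` is assumed to be frequent so that every subset has non-zero support.

```python
from fractions import Fraction
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Jelly", "Peanut", "Butter"},
    {"Bread", "Butter"},
    {"Bread", "Jelly"},
    {"Bread", "Milk", "Butter"},
    {"Chips", "Milk"},
    {"Bread", "Chips"},
    {"Bread", "Milk"},
    {"Chips", "Jelly"},
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in TRANSACTIONS if itemset <= t),
                    len(TRANSACTIONS))


def rules_from_itemset(f, min_conf):
    """Return (lhs, rhs, confidence) for every rule s -> (f - s) whose
    confidence support(f) / support(s) exceeds min_conf."""
    f = frozenset(f)
    rules = []
    # Proper, non-empty subsets only: sizes 1 .. len(f) - 1.
    for size in range(1, len(f)):
        for s in combinations(sorted(f), size):
            s = frozenset(s)
            conf = support(f) / support(s)
            if conf > min_conf:
                rules.append((sorted(s), sorted(f - s), conf))
    return rules


for lhs, rhs, conf in rules_from_itemset({"Bread", "Milk"}, Fraction(1, 2)):
    print(lhs, "->", rhs, conf)
```

With a threshold of 1/2, only Milk → Bread (confidence 2/3) survives; Bread → Milk has confidence 1/3 and is dropped.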

Myth No. 1
• A rule with high confidence is not necessarily plausible.
• For example:
  - |D| = 10000
  - #{DVD} = 7500
  - #{Tape} = 6000
  - #{DVD, Tape} = 4000
  - Thresholds: σ = 30%, Φ = 50%
  - Support(Tape → DVD) = 4000/10000 = 40%
  - Confidence(Tape → DVD) = 4000/6000 ≈ 66.7%
• Now we have a strong rule: Tape → DVD
• It seems that tapes will help promote DVDs.
• However, P(DVD) = 75% > P(DVD | Tape) ≈ 66.7%!
• Tape buyers are less likely to purchase DVDs.
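The comparison between P(DVD | Tape) and P(DVD) can be condensed into a single ratio. That ratio is commonly called lift; the slide does not use the term, so treat the name and the variable names below as additions for illustration.

```python
# Counts from the slide.
n_total = 10000
n_dvd = 7500
n_tape = 6000
n_both = 4000

support = n_both / n_total        # P(Tape and DVD) = 0.4
confidence = n_both / n_tape      # P(DVD | Tape), about 0.667
baseline = n_dvd / n_total        # P(DVD) = 0.75

# Ratio below 1: buying a tape makes a DVD purchase less likely
# than it is for an arbitrary customer.
lift = confidence / baseline

print(f"support={support:.2f} confidence={confidence:.3f} lift={lift:.3f}")
```

The rule clears both thresholds (40% > 30%, 66.7% > 50%), yet the ratio is about 0.889, which is the negative association the slide points out.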

Myth No. 2

Transactions
Bread, Milk
Bread, Battery
Bread, Butter
Bread, Honey
Bread, Chips
Yogurt, Coke
Bread, Battery
Cookie, Jelly

Myth No. 3
• Association ≠ Causality
• P(Y|X) is just the conditional probability.

Itemset Generation

Ø
A    B    C    D
AB   AC   AD   BC   BD   CD
ABC  ABD  ACD  BCD
ABCD
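The lattice over four items can be enumerated level by level, which also shows why exhaustive search scales badly: n items give 2^n itemsets. A short sketch follows; the function name `lattice` is illustrative.

```python
from itertools import combinations


def lattice(items):
    """Return the itemset lattice as a list of levels; level k holds
    all k-itemsets, written as concatenated item names."""
    items = sorted(items)
    return [["".join(c) for c in combinations(items, k)]
            for k in range(len(items) + 1)]


levels = lattice("ABCD")
for level in levels:
    print(level)

# 2**4 = 16 itemsets in total, including the empty set.
print(sum(len(level) for level in levels))
```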

Itemset Calculation

The Apriori Method
• One of the best-known algorithms in data mining
• Key ideas
  - A subset of a frequent itemset must be frequent.
    {Milk, Bread, Coke} is frequent → {Milk, Coke} is frequent
  - The supersets of any infrequent itemset cannot be frequent.
    {Battery} is infrequent → {Milk, Battery} is infrequent
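Both key ideas follow from one fact: adding an item to an itemset can never raise its support. The check below confirms this exhaustively on the eight transactions from the Transactions slide. It is a sanity check on one dataset, not a proof, and the helper names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Jelly", "Peanut", "Butter"},
    {"Bread", "Butter"},
    {"Bread", "Jelly"},
    {"Bread", "Milk", "Butter"},
    {"Chips", "Milk"},
    {"Bread", "Chips"},
    {"Bread", "Milk"},
    {"Chips", "Jelly"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def antimonotone_holds():
    """True if no itemset gains support when one more item is added."""
    for k in range(1, len(ITEMS)):
        for base in combinations(ITEMS, k):
            for extra in ITEMS:
                if extra not in base and support(base + (extra,)) > support(base):
                    return False
    return True


print(antimonotone_holds())  # True
```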

Candidate Pruning
[Figure: itemset lattice over {A, B, C, D}, from Ø to ABCD]

General Procedure
• Generate itemsets of a particular size.
• Scan the database once to see which of them are frequent.
• Use frequent itemsets to generate candidate itemsets of size = size + 1.
• Iteratively find frequent itemsets with cardinality from 1 to k.
• Avoid generating candidates that are known to be infrequent.
• Requires multiple scans of the database.
• Efficient indexing techniques such as hash functions and bitmaps may help.
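The procedure above can be sketched as a level-wise loop. This is a simplified illustration, not the pseudocode from the Apriori Algorithm slide: candidates are built by unioning pairs of frequent k-itemsets, "frequent" means support strictly larger than the threshold (as defined earlier), and no hashing or bitmap indexing is attempted. The names `apriori` and `count_frequent` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset whose support
    is strictly larger than min_support."""
    n = len(transactions)

    def count_frequent(candidates):
        # One scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n > min_support}

    level = count_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Generate (k+1)-candidates from frequent k-itemsets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # ... and drop any candidate with an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        level = count_frequent(candidates)
        result.update(level)
        k += 1
    return result


transactions = [
    {"Bread", "Jelly", "Peanut", "Butter"}, {"Bread", "Butter"},
    {"Bread", "Jelly"}, {"Bread", "Milk", "Butter"}, {"Chips", "Milk"},
    {"Bread", "Chips"}, {"Bread", "Milk"}, {"Chips", "Jelly"},
]
frequent = apriori(transactions, 0.25)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On the eight example transactions with a threshold of 0.25, this finds five frequent single items (Peanut is dropped) and one frequent pair, {Bread, Butter}.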

Apriori Algorithm

L_k → C_{k+1}

L_k → C_{k+1} (ordered list)

Correctness (join)
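The bodies of these slides are not in the transcript, so the following is a sketch of the commonly used join-and-prune step rather than a reproduction of them. It assumes each frequent k-itemset is stored as a sorted tuple, joins two itemsets when they share their first k - 1 items, and then prunes any candidate that has an infrequent k-subset. The function name `generate_candidates` is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Build C_{k+1} from L_k, where L_k is a collection of sorted
    k-tuples. Join on a shared (k-1)-prefix, then prune."""
    prev = sorted(frequent_k)
    prev_set = set(prev)
    k = len(prev[0]) if prev else 0
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                # The list is ordered, so itemsets sharing a prefix are
                # contiguous; no later b can match.
                break
            c = a + (b[-1],)
            # Keep c only if every k-subset of it is frequent.
            if all(s in prev_set for s in combinations(c, k)):
                candidates.append(c)
    return candidates


L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
print(generate_candidates(L3))  # [(1, 2, 3, 4)]
```

In this example, (1, 3, 4) and (1, 3, 5) join to (1, 3, 4, 5), which is pruned because (1, 4, 5) is not in L3. Keeping the list ordered is what lets each candidate be produced by exactly one pair.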

Demo

Clothing Example

Real Examples

Effective Recommendation

Sequential Pattern
[Figure: axes Time and Customer]

Sequence
[Table: columns s, t and Y/N; the Y/N entries read Yes, Yes, No, Yes]

Support of Sequence

CID   Time   Items
A     1      1, 2, 4
A     2      2, 3
A     3      5
B     1      1, 2
B     2      2, 3, 4
C     1      1, 2
C     2      2, 3, 4
C     3      2, 4, 5
D     1      2
D     2      3, 4
D     3      4, 5
E     1      1, 3
E     2      2, 4, 5

Support: 60%, 60%, 80%, 80%, 80%, 60%, 60%, 60%, 60%
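The support of a sequential pattern is the fraction of customers whose time-ordered transactions contain it. The sketch below uses the table above; the patterns queried are illustrative choices rather than entries taken from the slide, and the names `DATA`, `contains` and `sequence_support` are hypothetical.

```python
# One time-ordered list of transactions per customer (CID), from the table.
DATA = {
    "A": [{1, 2, 4}, {2, 3}, {5}],
    "B": [{1, 2}, {2, 3, 4}],
    "C": [{1, 2}, {2, 3, 4}, {2, 4, 5}],
    "D": [{2}, {3, 4}, {4, 5}],
    "E": [{1, 3}, {2, 4, 5}],
}


def contains(sequence, pattern):
    """True if each element of `pattern` is a subset of a distinct
    transaction in `sequence`, with the order preserved."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest transaction that covers this element.
        while pos < len(sequence) and not set(element) <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def sequence_support(pattern, data=DATA):
    """Fraction of customers whose sequence contains `pattern`."""
    return sum(contains(s, pattern) for s in data.values()) / len(data)


print(sequence_support([{1, 2}]))     # 0.6: customers A, B, C
print(sequence_support([{1}, {2}]))   # 0.8: everyone except D
```

Note the difference between the two queries: `[{1, 2}]` requires items 1 and 2 in the same transaction, while `[{1}, {2}]` requires item 1 strictly before item 2.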

Candidate Space

Candidate Generation
• A sequence s1 is merged with another sequence s2 if and only if the subsequence obtained by dropping the first item in s1 is identical to the subsequence obtained by dropping the last item in s2.
[Figure labels: sequences, Candidate Pruning, s1, s2]
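The merge condition can be sketched as follows. The slide states only when two sequences merge; how the merged candidate is assembled is an assumption here, following the usual convention that the last item of s2 is appended to s1 as a new element if it stood alone in s2, and otherwise joins the last element of s1. Sequences are tuples of sorted item tuples. The special case of merging two 1-sequences, which can yield two candidates, is not handled. All function names are hypothetical.

```python
def drop_first(seq):
    """Remove the first item of the first element; drop the element if empty."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq):
    """Remove the last item of the last element; drop the element if empty."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def merge(s1, s2):
    """Return the merged candidate, or None if s1 and s2 do not merge."""
    if drop_first(s1) != drop_last(s2):
        return None
    last_item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # Last item was its own element in s2: append as a new element.
        return s1 + ((last_item,),)
    # Otherwise it joins the last element of s1.
    return s1[:-1] + (s1[-1] + (last_item,),)


print(merge(((1,), (2,), (3,)), ((2,), (3,), (4,))))
print(merge(((1,), (5,), (3,)), ((5,), (3, 4))))
print(merge(((1,), (2,)), ((3,), (4,))))
```

The first call appends {4} as a fourth element, the second extends the last element to {3, 4}, and the third returns `None` because the overlap condition fails.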

Reading Materials
• Text Book
  - J. Han and M. Kamber, Data Mining: Concepts and Techniques, Chapter 6, Morgan Kaufmann.
• Core Papers
  - J. Han, J. Pei, Y. Yin and R. Mao (2004) “Mining frequent patterns without candidate generation: A frequent-pattern tree approach”. Data Mining and Knowledge Discovery, Vol. 8, pp. 53-87.
  - R. Agrawal and R. Srikant (1995) “Mining sequential patterns”. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE).
  - R. Agrawal and R. Srikant (1994) “Fast algorithms for mining association rules in large databases”. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB).
  - R. Agrawal, T. Imielinski, and A. Swami (1993) “Mining association rules between sets of items in large databases”. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

Review
• What is an itemset?
• What is an association rule?
• What is the support of an itemset or an association rule?
• What is the confidence of an association rule?
• What does an association rule tell us, and what does it not?
• What is the key idea of Apriori?
• What is the framework of Apriori?
• What is a good recommendation?
• What is the difference between itemsets and sequences?

Next Week’s Class Talk
• Volunteers are required for next week’s class talk.
• Topic: Frequent Pattern Tree
• Hints:
  - Avoids costly database scans.
  - Avoids costly candidate generation.
  - Applies a partitioning-based, divide-and-conquer method.
• Data Mining and Knowledge Discovery, 8, 53-87, 2004
• Length: 20 minutes plus question time