Association Rule Mining. CIT366: Data Mining & Data Warehousing. Instructor: Bajuna Salehe, The Institute of Finance Management: Computing and IT Dept.

Presentation transcript:

1 of 45 Association Rule Mining
CIT366: Data Mining & Data Warehousing
Instructor: Bajuna Salehe
The Institute of Finance Management: Computing and IT Dept.

2 of 45 What Is Association Mining?
Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database

3 of 45 Motivations For Association Mining
Motivation: finding regularities in data
– What products were often purchased together? Beer and nappies!
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?

4 of 45 Motivations For Association Mining (cont…)
Broad applications
– Basket data analysis, cross-marketing, catalog design, sales campaign analysis
– Web log (click stream) analysis, DNA sequence analysis, etc.

5 of 45 Market Basket Analysis
Market basket analysis is a typical example of frequent itemset mining
Customers' buying habits are discovered by finding associations between the different items that customers place in their “shopping baskets”
This information can be used to develop marketing strategies

6 of 45 Market Basket Analysis (cont…)

7 of 45 Application of Association
Association analysis can help improve marketing strategy by analysing frequent itemsets.
As a marketing manager at Company X, for instance, you would like to determine which items are frequently purchased together within the same transaction.

8 of 45 Application of Association
An example of such a rule, mined from the Company X transactional database, is
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
where X is a variable representing a customer.
A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.

9 of 45 Application of Association
A 1% support means that computer and software were purchased together in 1% of all the transactions under analysis.
This association rule involves a single attribute or predicate (i.e., buys) that occurs more than once in the rule.
Association rules that contain a single predicate are referred to as single-dimensional association rules.

10 of 45 Application of Association
In addition to the marketing application, the same sort of question has the following uses:
Baskets = documents; items = words.
Words appearing frequently together in documents may represent phrases or linked concepts.
Can be used for intelligence gathering.

11 of 45 Application of Association
Baskets = sentences; items = documents.
Two documents with many of the same sentences could represent plagiarism or mirror sites on the Web.

12 of 45 Association Rule Basic Concepts
Let $I = \{I_1, I_2, I_3, \dots, I_m\}$ be a set of items
Let $D$ be a database of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$
If $A$ is a set of items, a transaction $T$ is said to contain $A$ if and only if $A \subseteq T$
An association rule is an implication $A \Rightarrow B$ where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$

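13 of 45 Association Rule Support & Confidence
We say that an association rule $A \Rightarrow B$ holds in the transaction set $D$ with support $s$ and confidence $c$
The support of the association rule is the percentage of transactions in $D$ that contain both $A$ and $B$ (that is, the itemset $A \cup B$)
So the support can be considered the probability $P(A \cup B)$
Here $A \cup B$ is the union of the two itemsets, so $P(A \cup B)$ is the probability that a transaction contains every item of $A$ and of $B$, not the probability that it contains $A$ or $B$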

14 of 45 Association Rule Support & Confidence (cont…)
The confidence of the association rule is the percentage of transactions in $D$ containing $A$ that also contain $B$
So the confidence can be considered the conditional probability $P(B \mid A)$
Association rules that satisfy both a minimum support and a minimum confidence threshold are said to be strong
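The two measures can be sketched in a few lines of Python. This is only an illustration of the definitions above: the function names `support` and `confidence` and the four-transaction toy database `D` are assumptions made here, not part of the lecture.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A u B) / support(A).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy database, invented for this sketch.
D = [
    frozenset({"computer", "software"}),
    frozenset({"computer"}),
    frozenset({"printer", "software"}),
    frozenset({"computer", "software", "printer"}),
]

rule_support = support({"computer", "software"}, D)          # 2 of 4 transactions
rule_confidence = confidence({"computer"}, {"software"}, D)  # 2 of the 3 with "computer"
```

With this toy data the rule computer => software would be strong under, say, a 40% minimum support and a 60% minimum confidence; both thresholds are again only illustrative.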

15 of 45 Itemsets & Frequent Itemsets
An itemset is a set of items
A $k$-itemset is an itemset that contains $k$ items
The occurrence frequency of an itemset is the number of transactions that contain the itemset
– This is also known more simply as the frequency, support count or count
An itemset is said to be frequent if its support count satisfies a minimum support count threshold
The set of frequent $k$-itemsets is denoted $L_k$

16 of 45 Support & Confidence Again
Support and confidence values can be calculated as follows:
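The formulas from this slide did not survive in the transcript. Going by the definitions on the preceding slides, they presumably take the following form, where $\text{support\_count}(X)$ (the number of transactions containing itemset $X$) and $|D|$ (the number of transactions) are notation assumed here:

```latex
\[
\text{support}(A \Rightarrow B) = P(A \cup B)
  = \frac{\text{support\_count}(A \cup B)}{|D|}
\]
\[
\text{confidence}(A \Rightarrow B) = P(B \mid A)
  = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}
\]
```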

17 of 45 Mining Association Rules: An Example
Transaction-id | Items bought
10 | A, B, C
20 | A, C
30 | A, D
40 | B, E, F
Frequent pattern | Support
{A} | 75%
{B} | 50%
{C} | 50%
{A, C} | 50%

18 of 45 Mining Association Rules: An Example (cont…)
Transaction-id | Items bought
10 | A, B, C
20 | A, C
30 | A, D
40 | B, E, F
Frequent pattern | Support
{A} | 75%
{B} | 50%
{C} | 50%
{A, C} | 50%
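The example can be reproduced with a brute-force sketch that checks every possible itemset. The thresholds `MIN_SUPPORT = 0.5` and `MIN_CONFIDENCE = 0.5` are assumptions chosen here because they yield exactly the frequent patterns listed on the slide; the transcript does not state the values used in the lecture.

```python
from itertools import combinations

# Transactions from the slide.
transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}
MIN_SUPPORT = 0.5      # assumed threshold
MIN_CONFIDENCE = 0.5   # assumed threshold


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions.values()) / len(transactions)


# Step 1: enumerate all itemsets and keep the frequent ones (fine for toy data only).
items = sorted(set().union(*transactions.values()))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(combo)
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Step 2: split each frequent itemset into antecedent => consequent.
rules = {}
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = s / frequent[lhs]  # every subset of a frequent itemset is frequent
            if conf >= MIN_CONFIDENCE:
                rules[(lhs, itemset - lhs)] = (s, conf)

for (lhs, rhs), (s, conf) in rules.items():
    print(sorted(lhs), "=>", sorted(rhs), f"support={s:.0%} confidence={conf:.1%}")
```

Under these assumed thresholds the only rules are A => C (support 50%, confidence about 66.7%) and C => A (support 50%, confidence 100%).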

19 of 45 Association Rule Mining
In general, association rule mining can be reduced to the following two steps:
1. Find all frequent itemsets
   Each of these itemsets occurs at least as frequently as a minimum support count
2. Generate strong association rules from the frequent itemsets
   These rules satisfy minimum support and minimum confidence

20 of 45 Combinatorial Explosion!
A major challenge in mining frequent itemsets is that the number of frequent itemsets generated can be massive
For example, a long frequent itemset contains a combinatorial number of shorter frequent sub-itemsets
A frequent itemset of length 100 contains the following number of frequent sub-itemsets:
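The count itself is missing from the transcript. Counting every non-empty sub-itemset of a 100-item set (the full itemset included) gives what the slide presumably showed:

```latex
\[
\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100}
  = 2^{100} - 1 \approx 1.27 \times 10^{30}
\]
```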

21 of 45 The Apriori Algorithm
Any subset of a frequent itemset must be frequent
– If {beer, nappy, nuts} is frequent, so is {beer, nappy}
– Every transaction having {beer, nappy, nuts} also contains {beer, nappy}
Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested!

22 of 45 The Apriori Algorithm (cont…)
The Apriori algorithm is known as a candidate generation-and-test approach
Method:
– Generate length $(k+1)$ candidate itemsets from length $k$ frequent itemsets
– Test the candidates against the DB
Performance studies show the algorithm’s efficiency and scalability

23 of 45 The Apriori Algorithm: An Example
Database TDB:
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E
1st scan, C1:
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{D} | 1
{E} | 3
L1:
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{E} | 3
C2:
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
2nd scan, C2 with counts:
Itemset | sup
{A, B} | 1
{A, C} | 2
{A, E} | 1
{B, C} | 2
{B, E} | 3
{C, E} | 2
L2:
Itemset | sup
{A, C} | 2
{B, C} | 2
{B, E} | 3
{C, E} | 2
C3:
Itemset
{B, C, E}
3rd scan, L3:
Itemset | sup
{B, C, E} | 2
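The level-wise procedure traced in this example can be sketched as below. This is a compact illustration rather than the lecture's implementation: the function name `apriori` is chosen here, and the minimum support count of 2 is inferred from the tables ({D} with count 1 is dropped from L1) rather than stated in the transcript.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # 1st scan: count single items (C1) and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 1
    while level:
        # Generate C(k+1): unions of two frequent k-itemsets that have k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset (Apriori principle).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        # Next scan: count the surviving candidates against the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the TDB database this yields the four itemsets of L1, the four of L2 and {B, C, E} with count 2, matching the tables above. Candidates such as {A, B, C} never reach the third scan because {A, B} is not in L2.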

24 of 45 Important Details Of The Apriori Algorithm
There are two crucial questions in implementing the Apriori algorithm:
– How to generate candidates?
– How to count supports of candidates?

25 of 45 Generating Candidates
There are 2 steps to generating candidates:
– Step 1: Self-joining $L_k$
– Step 2: Pruning
Example of candidate generation
– $L_3 = \{abc, abd, acd, ace, bcd\}$
– Self-joining: $L_3 * L_3$
  abcd from abc and abd
  acde from acd and ace
– Pruning: acde is removed because ade is not in $L_3$
– $C_4 = \{abcd\}$
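The join and prune steps can be sketched as follows. The function name `apriori_gen` and the representation of itemsets as sorted tuples are choices made for this illustration; the sketch assumes a non-empty list of itemsets that all have the same length.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Build C(k+1) from L(k), given as equal-length sorted tuples."""
    prev = sorted(frequent_k)
    prev_set = set(prev)
    k = len(prev[0])
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: p and q share their first k-1 items and differ in the last.
            if p[:-1] == q[:-1]:
                cand = p + (q[-1],)
                # Prune step: every k-subset of the candidate must be in L(k).
                if all(sub in prev_set for sub in combinations(cand, k)):
                    candidates.append(cand)
    return candidates


L3 = [tuple(s) for s in ("abc", "abd", "acd", "ace", "bcd")]
C4 = apriori_gen(L3)
print(C4)
```

For the slide's $L_3$ the join produces abcd and acde, and the prune step discards acde because ade is missing, leaving only abcd in `C4`.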

26 of 45 How to Count Supports Of Candidates?
Why is counting supports of candidates a problem?
– The total number of candidates can be huge
– One transaction may contain many candidates
Method:
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a transaction
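A full hash-tree is beyond a short example, so the sketch below swaps in a simpler structure: a flat Python dictionary (a single hash table) keyed by candidate itemset. The "subset function" is approximated by enumerating the k-item subsets of each transaction and looking each one up. The name `count_supports` is an assumption, and the data is the C2 step of the Apriori example earlier in the deck.

```python
from itertools import combinations


def count_supports(transactions, candidates, k):
    """Count how many transactions contain each k-item candidate."""
    counts = {frozenset(c): 0 for c in candidates}
    for t in transactions:
        # Simplified subset function: try every k-subset of the transaction.
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
c2 = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"), ("C", "E")]
counts = count_supports(tdb, c2, k=2)
for itemset, n in counts.items():
    print(sorted(itemset), n)
```

Enumerating subsets becomes expensive for long transactions. The hash-tree described on the slide addresses this by visiting only the branches that a given transaction can reach.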

27 of 45 Generating Association Rules
Once all frequent itemsets have been found, association rules can be generated
Strong association rules are generated from a frequent itemset by calculating the confidence of each possible rule arising from that itemset and testing it against a minimum confidence threshold
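A minimal sketch of this step is given below. It takes the frequent itemsets and support counts from the Apriori example earlier in the deck; the function name `generate_rules` and the 80% minimum confidence are assumptions made for illustration.

```python
from itertools import combinations


def generate_rules(support_counts, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in support_counts.items():
        # Try every non-empty proper subset of the itemset as antecedent.
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / support_counts[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Frequent itemsets and support counts from the TDB example.
support_counts = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
rules = generate_rules(support_counts, min_confidence=0.8)  # assumed threshold
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```

No further database scan is needed at this stage: every antecedent is a subset of a frequent itemset, so its support count is already in the table.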

28 of 45 Example
TID | List of item_IDs
T100 | Coke, Crisps, Milk
T200 | Crisps, Bread
T300 | Crisps, Nappies
T400 | Coke, Crisps, Bread
T500 | Coke, Nappies
T600 | Crisps, Nappies
T700 | Coke, Nappies
T800 | Coke, Crisps, Nappies, Milk
T900 | Coke, Crisps, Nappies
ID | Item
I1 | Coke
I2 | Crisps
I3 | Nappies
I4 | Bread
I5 | Milk

29 of 45 Example
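The worked content of this slide is not in the transcript. As one way to exercise the transaction table from the previous slide, the sketch below computes the confidence of every rule that can be formed from the itemset {Coke, Crisps, Milk}. That itemset is picked here purely for illustration, and no minimum confidence threshold is assumed.

```python
from itertools import combinations

# Transactions from the example table.
transactions = {
    "T100": {"Coke", "Crisps", "Milk"},
    "T200": {"Crisps", "Bread"},
    "T300": {"Crisps", "Nappies"},
    "T400": {"Coke", "Crisps", "Bread"},
    "T500": {"Coke", "Nappies"},
    "T600": {"Crisps", "Nappies"},
    "T700": {"Coke", "Nappies"},
    "T800": {"Coke", "Crisps", "Nappies", "Milk"},
    "T900": {"Coke", "Crisps", "Nappies"},
}


def count(itemset):
    """Support count: number of transactions containing all of `itemset`."""
    return sum(1 for t in transactions.values() if set(itemset) <= t)


itemset = ("Coke", "Crisps", "Milk")  # illustrative choice, not from the slide
confidences = {}
for r in range(1, len(itemset)):
    for lhs in combinations(itemset, r):
        rhs = tuple(i for i in itemset if i not in lhs)
        confidences[(lhs, rhs)] = count(itemset) / count(lhs)

for (lhs, rhs), conf in confidences.items():
    print(" & ".join(lhs), "=>", " & ".join(rhs), f"{conf:.0%}")
```

The itemset occurs in two transactions (T100 and T800). Rules with Milk in the antecedent reach 100% confidence, whereas Coke & Crisps => Milk reaches only 50%, since Coke and Crisps occur together in four transactions.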

30 of 45 Challenges Of Frequent Pattern Mining
Challenges
– Multiple scans of transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
Improving Apriori: general ideas
– Reduce the number of transaction database scans
– Shrink number of candidates
– Facilitate support counting of candidates

31 of 45 Questions?