Data Mining: Find Information from Data


Data Mining
- Find information from data: data → ? → information
- Questions
  - What data? Any data
  - What information? Anything useful
- Characteristics
  - The data is huge in volume
  - The computation is extremely intensive

Mining Association Rules
CS461 Lecture
Department of Computer Science, Iowa State University, Ames, IA 50011

Basket Data
- Retail organizations, e.g., supermarkets, collect and store massive amounts of sales data, called basket data.
- Each basket is a transaction, which consists of:
  - the transaction date
  - the items bought

Association Rule: Basic Concepts
- Given: (1) a database of transactions, where (2) each transaction is a list of items
- Find: all rules that correlate the presence of one set of items with that of another set of items
- E.g., 98% of people who purchase tires and auto accessories also get automotive services done

Rule Measures: Support and Confidence
[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both.]
- Find all rules X → Y with minimum confidence and support:
  - support, s: the probability that a transaction contains both X and Y
  - confidence, c: the conditional probability that a transaction containing X also contains Y
- With minimum support 50% and minimum confidence 50%, we have:
  - A → C (50%, 66.6%)
  - C → A (50%, 100%)
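The two measures can be sketched in a few lines of Python. The four-transaction database below is an assumption made for illustration (the slide's own table did not survive in the transcript); it was chosen so that it reproduces the figures quoted above, and the function names `support` and `confidence` are likewise hypothetical.

```python
# Hypothetical four-transaction database, chosen for illustration only.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"A", "C"}, transactions))       # 0.5
print(confidence({"A"}, {"C"}, transactions))  # about 0.667
print(confidence({"C"}, {"A"}, transactions))  # 1.0
```

Both rules share the same support because support depends only on the combined itemset {A, C}; only the confidence depends on the direction of the rule.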

Applications
- Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
- Rules with Maintenance Agreement as the consequent (what should the store do to boost Maintenance Agreement sales?)
- Rules with Home Electronics as the antecedent (what other products should the store stock up on?)
- Attached mailing in direct marketing

Challenges
- Find all rules X → Y with minimum support and minimum confidence:
  - X could be any set of items
  - Y could be any set of items
- Naïve approach:
  - Enumerate all candidate rules X → Y
  - For each candidate X → Y, compute its support and confidence and check them against the minimum thresholds
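To see why the naïve approach does not scale, the sketch below enumerates every candidate rule over a small item set. It assumes that X and Y must be non-empty and disjoint (the slide does not state this), in which case the number of rules over d items is 3^d - 2^(d+1) + 1. The names `enumerate_rules` and `count_rules` are hypothetical.

```python
from itertools import combinations

def enumerate_rules(items):
    """Yield every rule (lhs, rhs) with non-empty, disjoint lhs and rhs."""
    items = sorted(items)
    for n in range(2, len(items) + 1):
        for itemset in combinations(items, n):
            # Split the itemset into a non-empty lhs and a non-empty rhs.
            for r in range(1, n):
                for lhs in combinations(itemset, r):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    yield lhs, rhs

def count_rules(d):
    """Closed-form number of candidate rules over d items."""
    return 3**d - 2**(d + 1) + 1

print(len(list(enumerate_rules("ABCD"))), count_rules(4))  # 50 50
print(count_rules(20))  # already in the billions for only 20 items
```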

Mining Frequent Itemsets: the Key Step
- STEP 1: Find the frequent itemsets, i.e., the sets of items that have minimum support. This is the key step.
- STEP 2: Use the frequent itemsets to generate association rules.

Mining Association Rules: An Example
- Min. support 50%, min. confidence 50%
- For rule A → C:
  - support = support({A, C}) = 50%
  - confidence = support({A, C}) / support({A}) = 66.6%

Mining Association Rules: An Example
- Min. support 50%, min. confidence 50%
- How do we generate the frequent itemsets?

Apriori Principle
- Any subset of a frequent itemset must also be a frequent itemset
  - If {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets
  - If {AB} is not a frequent itemset, {ABX} cannot be a frequent itemset

Finding Frequent Itemsets
- Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets):
  - Find the frequent 1-itemsets: {A}, {B}, …
  - Find the frequent 2-itemsets: {AX}, {BX}, …
  - …

The Apriori Algorithm
Pseudo-code:
    Ck: candidate itemsets of size k
    Lk: frequent itemsets of size k
    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;
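The pseudo-code above can be turned into a small runnable sketch. This is one possible rendering, not the authors' implementation: it counts candidates by brute force instead of using a hash-tree, joins any two frequent k-itemsets whose union has k+1 items, and takes `min_support` as a fraction. The database `db` is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # C(k+1): join pairs from Lk, then prune by the Apriori principle.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in current for s in combinations(union, k)
            ):
                candidates.add(union)
        # One database scan to count every surviving candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical database, for illustration only.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
frequent = apriori(db, min_support=0.5)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

With this database and a 50% threshold, the frequent itemsets are {A}, {B}, {C}, and {A, C}.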

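The Apriori Algorithm: Example
[Figure: worked example. Scan database D to count the candidates C1 and obtain L1; generate C2 from L1 and scan D to obtain L2; generate C3 from L2 and scan D to obtain L3.]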

How to Generate Candidates?
- Step 1: self-joining Lk-1
  - Observation: all possible frequent k-itemsets can be generated by self-joining Lk-1
- Step 2: pruning
  - Observation: if any subset of a k-itemset is not a frequent itemset, the k-itemset cannot be frequent

Example of Generating Candidates
- L3 = {abc, abd, acd, ace, bcd}
- Self-joining: L3 * L3
  - abcd from abc and abd
  - acde from acd and ace
- Pruning:
  - acde is removed because ade is not in L3
- C4 = {abcd}

Generating Candidates: Pseudo Code
- Suppose the items in each itemset of Lk-1 are listed in a fixed order
- Step 1: self-joining Lk-1
    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
- Step 2: pruning
    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
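The join and prune steps can be sketched in Python as follows. Itemsets are represented as sorted tuples so that "the first k-2 items agree and the last item of p is smaller than the last item of q" becomes a simple prefix comparison. The function name `apriori_gen` is a hypothetical label, and the input is the L3 from the candidate-generation example.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Build the candidate set Ck from L(k-1) by self-join followed by pruning."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    if not prev:
        return []
    prev_set = set(prev)
    k = len(prev[0]) + 1

    # Step 1: self-join. p and q share their first k-2 items, and p's last
    # item sorts before q's last item, so each candidate is produced once.
    candidates = []
    for p in prev:
        for q in prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.append(p + (q[-1],))

    # Step 2: prune any candidate that has an infrequent (k-1)-subset.
    return [
        c for c in candidates
        if all(s in prev_set for s in combinations(c, k - 1))
    ]

L3 = ["abc", "abd", "acd", "ace", "bcd"]
C4 = apriori_gen(L3)
print(["".join(c) for c in C4])  # ['abcd']
```

The join yields abcd and acde; acde is then pruned because ade (and cde) is not in L3.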

How to Count Supports of Candidates?
- Why is counting the supports of candidates a problem?
  - The total number of candidates can be huge
    - It is too expensive to scan the whole database for each candidate
  - One transaction may contain many candidates
    - It is also expensive to check each transaction against the entire set of candidates
- Method: index the candidate itemsets using a hash-tree
    TID     items
    1       abcdefg
    2       acdefg
    3       abcfg
    4       sdf
    5       dfg
    :::     hfhg
    9..9    dxv
    Frequent 3-itemsets: abc, acd, bcd, ::::, xyz

Hash-Tree
- Leaf node: contains a list of itemsets
- Interior node: contains a hash table
  - Each bucket points to another node
  - Depth of root = 1
  - Buckets of a node at depth d point to nodes at depth d+1
- All itemsets are stored in leaf nodes
[Figure: an interior node at depth 1 whose hash-table buckets (H) point to child nodes.]

Hash-Tree: Example
- For an itemset (K1, K2, K3):
  - Depth 1: hash(K1)
  - Depth 2: hash(K2)
  - Depth 3: hash(K3)
- The hash-tree constructed for C2:
    root
    |-- bread:    {bread, cucumber}, {bread, onion}, {bread, parsley}, {bread, tomato}
    |-- cucumber: {cucumber, onion}, {cucumber, parsley}, {cucumber, tomato}
    |-- onion:    {onion, parsley}, {onion, tomato}
    |-- parsley:  {parsley, tomato}

Hash-Tree: Construction
- Searching for an itemset c:
  - Start from the root
  - At depth d, to choose the branch to follow, apply a hash function to the d-th item of c
- Insertion of an itemset c:
  - Search for the corresponding leaf node
  - Insert the itemset into that leaf
  - If an overflow occurs:
    - Transform the leaf node into an internal node
    - Distribute the entries to the new leaf nodes according to the hash function

Hash-Tree: Counting Support
- Search for all candidate itemsets contained in a transaction T = (t1, t2, …, tn):
  - At the root:
    - Determine the hash values for each item in T
    - Continue the search in the resulting child nodes
  - At an internal node at level d (reached after hashing item ti):
    - Determine the hash values and continue the search for each item tk with k > i
  - At a leaf node:
    - Check whether the itemsets in the leaf node are contained in transaction T
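The construction and search procedures can be sketched together as a small Python class. This is an illustrative sketch under stated assumptions rather than the structure from the original paper: the bucket count, the leaf capacity, the use of Python's built-in `hash`, and all class and method names are hypothetical, and depths are 0-based here whereas the slides count the root as depth 1.

```python
from itertools import combinations

class Node:
    """A leaf while `children` is None, an interior node otherwise."""
    def __init__(self):
        self.children = None  # bucket index -> Node (interior nodes only)
        self.itemsets = []    # stored candidates (leaf nodes only)

class HashTree:
    def __init__(self, k, num_buckets=3, leaf_capacity=2):
        self.k = k                          # size of every stored itemset
        self.num_buckets = num_buckets      # assumed hash-table width
        self.leaf_capacity = leaf_capacity  # assumed overflow threshold
        self.root = Node()

    def _bucket(self, item):
        return hash(item) % self.num_buckets

    def insert(self, itemset):
        self._insert(self.root, tuple(sorted(itemset)), 0)

    def _insert(self, node, itemset, depth):
        # Follow the hash of the item at position `depth` down to a leaf.
        while node.children is not None:
            bucket = self._bucket(itemset[depth])
            node = node.children.setdefault(bucket, Node())
            depth += 1
        node.itemsets.append(itemset)
        # Overflow: turn the leaf into an interior node and redistribute.
        if len(node.itemsets) > self.leaf_capacity and depth < self.k:
            entries, node.itemsets, node.children = node.itemsets, [], {}
            for entry in entries:
                self._insert(node, entry, depth)

    def contained_in(self, transaction):
        """Return the set of stored candidates contained in `transaction`."""
        t = tuple(sorted(transaction))
        found = set()
        self._search(self.root, t, 0, set(t), found)
        return found

    def _search(self, node, t, start, t_items, found):
        if node.children is None:
            # Leaf: verify each stored itemset against the transaction.
            found.update(c for c in node.itemsets if t_items.issuperset(c))
            return
        # Interior node: hash every remaining item and descend.
        for i in range(start, len(t)):
            child = node.children.get(self._bucket(t[i]))
            if child is not None:
                self._search(child, t, i + 1, t_items, found)

items = ["bread", "cucumber", "onion", "parsley", "tomato"]
tree = HashTree(k=2)
for pair in combinations(items, 2):
    tree.insert(pair)
found = tree.contained_in({"bread", "onion", "tomato"})
print(sorted(found))
# [('bread', 'onion'), ('bread', 'tomato'), ('onion', 'tomato')]
```

Collecting matches in a set guards against counting a candidate twice when the same leaf is reached along more than one hash path.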

Note on hash-tree counting: for any itemset c contained in transaction t, the first item of c must be in t. At the root, by hashing on every item in t, we ensure that we only ignore itemsets that start with an item not in t.

Generation of Rules from Frequent Itemsets
- For each frequent itemset X:
  - For each non-empty proper subset A of X, form a rule A → (X - A)
  - Compute the confidence of the rule
  - Delete the rule if it does not have minimum confidence
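The rule-generation step can be sketched as below. The sketch assumes the support counts of all frequent itemsets are already available in a dictionary (by the Apriori principle, every subset of a frequent itemset is itself in that dictionary). The name `generate_rules` and the sample counts are hypothetical.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for every rule meeting min_confidence.

    `frequent` maps each frequent itemset (a frozenset) to its support count.
    """
    rules = []
    for itemset, count in frequent.items():
        # Every non-empty proper subset A of X gives a rule A -> (X - A).
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical support counts, for illustration only.
frequent = {
    frozenset("A"): 3,
    frozenset("B"): 2,
    frozenset("C"): 2,
    frozenset("AC"): 2,
}
rules = generate_rules(frequent, min_confidence=0.5)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 3))
```

No further database scan is needed here, because the confidence of A → (X - A) is just count(X) / count(A).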

Is Apriori Fast Enough? Performance Bottlenecks
- The core of the Apriori algorithm:
  - Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets
  - Use database scans and pattern matching to collect counts for the candidate itemsets
- The bottleneck of Apriori: candidate generation
  - Huge candidate sets:
    - 10^4 frequent 1-itemsets will generate on the order of 10^7 candidate 2-itemsets
    - To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
  - Multiple scans of the database:
    - Needs (n + 1) scans, where n is the length of the longest pattern
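The candidate counts quoted above can be checked with a few lines of arithmetic. The sketch only assumes that every pair of frequent items becomes a candidate 2-itemset and that every non-empty subset of the 100-item pattern is itself frequent; the variable names are hypothetical.

```python
from math import comb

# Candidate 2-itemsets generated from 10^4 frequent 1-itemsets.
pairs = comb(10**4, 2)
print(pairs)  # 49995000, i.e. on the order of 10^7

# Non-empty subsets of a frequent pattern with 100 items.
subsets = 2**100 - 1
print(f"{subsets:.3e}")  # roughly 1.268e+30
```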

Summary
- Association rule mining
  - Probably the most significant contribution from the database community to KDD
  - A large number of papers have been published
- An interesting research direction
  - Association analysis in other types of data: spatial data, multimedia data, time series data, etc.

References
- R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93, 207-216, Washington, D.C.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94, 487-499, Santiago, Chile.