
Presentation transcript:

1 Mining Association Rules Mohamed G. Elfeky

2 Introduction Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. Association Rules: describing association relationships among the attributes in the set of relevant data.

3 Rules
Body ==> Consequent [ Support, Confidence ]
Body: represents the examined data.
Consequent: represents a discovered property of the examined data.
Support: the percentage of records that satisfy both the body and the consequent.
Confidence: among the records that satisfy the body, the percentage that also satisfy the consequent.
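The two measures can be illustrated with a minimal Python sketch. The basket data and the function names `support` and `confidence` are hypothetical, chosen only for this illustration; transactions are assumed to be sets of items.

```python
def support(transactions, body, consequent):
    """Fraction of transactions containing every item of body and consequent."""
    both = frozenset(body) | frozenset(consequent)
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, body, consequent):
    """Among transactions containing the body, the fraction that also contain the consequent."""
    body = frozenset(body)
    both = body | frozenset(consequent)
    body_hits = sum(1 for t in transactions if body <= t)
    both_hits = sum(1 for t in transactions if both <= t)
    return both_hits / body_hits if body_hits else 0.0


# Hypothetical basket data, invented for illustration only.
baskets = [frozenset(b) for b in (
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"tea", "milk", "sugar"},
    {"bread"},
    {"milk", "sugar"},
)]

s = support(baskets, {"tea", "milk"}, {"sugar"})      # 2 of 5 baskets
c = confidence(baskets, {"tea", "milk"}, {"sugar"})   # 2 of the 3 baskets with tea and milk
print(f"Tea ^ Milk ==> Sugar [{s:.2f}, {c:.2f}]")
```
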

4 Association Rules Examples
Basket Data: Tea ^ Milk ==> Sugar [0.3, 0.9]
Relational Data: x.diagnosis = Heart ^ x.sex = Male ==> x.age > 50 [0.4, 0.7]
Object-Oriented Data: s.hobbies = { sport, art } ==> s.age() = Young [0.5, 0.8]

5 Topics of Discussion
Formal Statement of the Problem
Different Algorithms: AIS, SETM, Apriori, AprioriTid, AprioriHybrid
Performance Analysis

6 Formal Statement of the Problem
I = { i_1, i_2, …, i_m } is a set of items.
D is a set of transactions.
Each transaction T is a set of items (a subset of I).
TID is a unique identifier associated with each transaction.
The problem is to generate all association rules whose support and confidence are greater than the user-specified minimum support and minimum confidence.

7 Problem Decomposition
The problem can be decomposed into two subproblems:
1. Find all sets of items (itemsets) whose support (the number of transactions containing the itemset) is greater than the minimum support. These are the large itemsets.
2. Use the large itemsets to generate the desired rules. For each large itemset l, find all non-empty subsets, and for each subset a generate a rule a ==> (l-a) if its confidence is greater than the minimum confidence.
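The second subproblem can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `generate_rules` is hypothetical, the support counts are taken from the example slides, only proper subsets are used so that the consequent is never empty, and the confidence threshold is treated as inclusive (an assumption).

```python
from itertools import combinations


def generate_rules(large, min_conf):
    """large maps each large itemset (a frozenset) to its support count."""
    rules = []
    for l, l_count in large.items():
        for size in range(1, len(l)):          # proper, non-empty subsets only
            for a in combinations(sorted(l), size):
                a = frozenset(a)
                # Every subset of a large itemset is itself large, so large[a] exists.
                conf = l_count / large[a]
                if conf >= min_conf:           # inclusive threshold (assumption)
                    rules.append((a, l - a, conf))
    return rules


# Support counts of the large itemsets from the example slides.
large = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2,
    frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}

rules = generate_rules(large, min_conf=1.0)
for body, consequent, conf in rules:
    print(sorted(body), "==>", sorted(consequent), conf)
```
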

8 General Algorithm
1. In the first pass, the support of each individual item is counted, and the large ones are determined.
2. In each subsequent pass, the large itemsets determined in the previous pass are used to generate new itemsets called candidate itemsets.
3. The support of each candidate itemset is counted, and the large ones are determined.
4. This process continues until no new large itemsets are found.
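The four steps can be sketched as a level-wise loop. The names `find_large_itemsets` and `naive_candidates` are hypothetical, the four-transaction database is an assumption chosen to be consistent with the example slides, and an itemset is treated as large when its count is at least `min_count` (which matches the starred entries in the examples). The candidate generator is deliberately naive; the specific algorithms differ mainly in how they replace it.

```python
def find_large_itemsets(db, min_count, gen_candidates):
    """Level-wise search; db maps TID -> set of items."""
    # Step 1: count individual items and keep the large ones.
    counts = {}
    for items in db.values():
        for item in items:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_count}
    all_large = dict(large)

    # Steps 2-4: generate candidates, count them, stop when nothing is large.
    while large:
        candidates = gen_candidates(set(large))
        counts = {c: sum(1 for items in db.values() if c <= items)
                  for c in candidates}
        large = {s: n for s, n in counts.items() if n >= min_count}
        all_large.update(large)
    return all_large


def naive_candidates(prev_large):
    """Placeholder generator: unions of two large itemsets that grow by one item."""
    k = len(next(iter(prev_large))) + 1
    return {a | b for a in prev_large for b in prev_large if len(a | b) == k}


# Assumed database, consistent with the example slides.
DB = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}

result = find_large_itemsets(DB, min_count=2, gen_candidates=naive_candidates)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```
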

9 AIS Algorithm
Candidate itemsets are generated and counted on-the-fly as the database is scanned.
1. For each transaction, it is determined which of the large itemsets of the previous pass are contained in this transaction.
2. New candidate itemsets are generated by extending these large itemsets with other items in this transaction.
The disadvantage is that this results in unnecessarily generating and counting too many candidate itemsets that turn out to be small.
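A single AIS pass might look like the sketch below. The function name `ais_pass` and the database are assumptions (the database is chosen to be consistent with the example slides). To avoid producing the same candidate twice, the sketch only extends an itemset with items that sort after its largest item, which is also an assumption of this illustration.

```python
def ais_pass(db, prev_large):
    """One AIS pass: generate and count candidates while scanning db (TID -> set of items)."""
    counts = {}
    for items in db.values():
        for l in prev_large:
            if l <= items:                      # large itemset contained in the transaction
                for item in items:
                    if item > max(l):           # extend with later items only (assumption)
                        cand = l | {item}
                        counts[cand] = counts.get(cand, 0) + 1
    return counts


# Assumed database, consistent with the example slides.
DB = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}

L1 = [frozenset({i}) for i in (1, 2, 3, 5)]
C2 = ais_pass(DB, L1)
L2 = [c for c, n in C2.items() if n >= 2]
C3 = ais_pass(DB, L2)
print(len(C2), "candidate pairs,", len(L2), "of them large")
```

Note how candidates such as {1 4} and {3 4} are generated and counted even though item 4 is not large; this is the overhead the slide describes.
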

10 Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3
C2 (* = large)
Itemset   Support
{1 3}*    2
{1 4}     1
{3 4}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2
{1 2}     1
{1 5}     1
C3 (* = large)
Itemset   Support
{1 3 4}   1
{2 3 5}*  2
{1 3 5}   1

11 SETM Algorithm
Candidate itemsets are generated on-the-fly as the database is scanned, but counted at the end of the pass.
1. New candidate itemsets are generated the same way as in the AIS algorithm, but the TID of the generating transaction is saved with the candidate itemset in a sequential structure.
2. At the end of the pass, the support count of the candidate itemsets is determined by aggregating this sequential structure.
It has the same disadvantage as the AIS algorithm. Another disadvantage is that for each candidate itemset, there are as many entries as its support value.
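The separation of generation from counting can be sketched as below. This is a simplified illustration only: the name `setm_pass` is hypothetical, the database is an assumption consistent with the example slides, and a plain Python list stands in for the sequential structure.

```python
from collections import Counter


def setm_pass(db, prev_large):
    """One SETM-style pass: collect (candidate, TID) entries, then aggregate."""
    entries = []                                 # stands in for the sequential structure
    for tid, items in db.items():
        for l in prev_large:
            if l <= items:
                for item in sorted(items):
                    if item > max(l):            # same extension rule as the AIS sketch (assumption)
                        entries.append((l | {item}, tid))
    # Counting happens only after the scan, by aggregating the saved entries.
    counts = Counter(cand for cand, _ in entries)
    return entries, counts


# Assumed database, consistent with the example slides.
DB = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}

L1 = [frozenset({i}) for i in (1, 2, 3, 5)]
entries, counts = setm_pass(DB, L1)
print(len(entries), "entries for", len(counts), "distinct candidates")
```

Each candidate appears in `entries` once per supporting transaction, which is the second disadvantage noted on the slide.
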

12 Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3
C2
Itemset   TID
{1 3}     100
{1 4}     100
{3 4}     100
{2 3}     200
{2 5}     200
{3 5}     200
{1 2}     300
{1 3}     300
{1 5}     300
{2 3}     300
{2 5}     300
{3 5}     300
{2 5}     400
C3
Itemset   TID
{1 3 4}   100
{2 3 5}   200
{1 3 5}   300
{2 3 5}   300

13 Apriori Algorithm
Candidate itemsets are generated using only the large itemsets of the previous pass, without considering the transactions in the database.
1. The set of large itemsets of the previous pass is joined with itself to generate all itemsets whose size is higher by 1.
2. Each generated itemset that has a subset which is not large is deleted. The remaining itemsets are the candidate ones.
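The join and prune steps can be sketched as follows. The function name `apriori_gen` is used here for convenience, and the join is written as "any union of two large itemsets that is exactly one item larger", an assumption that matches the joined itemsets shown in the example; implementations commonly use a narrower join on a shared prefix, which yields the same candidates after pruning.

```python
from itertools import combinations


def apriori_gen(prev_large):
    """Generate size-k candidates from the large itemsets of size k-1."""
    prev_large = set(prev_large)
    if not prev_large:
        return set()
    k = len(next(iter(prev_large))) + 1
    # Join step: unions of two large itemsets that grow by exactly one item.
    joined = {a | b for a in prev_large for b in prev_large if len(a | b) == k}
    # Prune step: delete any itemset that has a (k-1)-subset which is not large.
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(c, k - 1))}


L1 = {frozenset({i}) for i in (1, 2, 3, 5)}
L2 = {frozenset(p) for p in ({1, 3}, {2, 3}, {2, 5}, {3, 5})}

C2 = apriori_gen(L1)   # all six pairs of large items
C3 = apriori_gen(L2)   # {1 2 3} and {1 3 5} are pruned, {2 3 5} survives
print(sorted(sorted(c) for c in C3))
```

No transaction is read during candidate generation; the database is only needed afterwards to count the surviving candidates.
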

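14 Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3
C2 (* = large)
Itemset   Support
{1 2}     1
{1 3}*    2
{1 5}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2
Itemsets produced by the join step: {1 2 3}, {1 3 5}, {2 3 5}
C3 (* = large)
Itemset   Support
{2 3 5}*  2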

15 AprioriTid Algorithm
The database is not used at all for counting the support of candidate itemsets after the first pass.
1. The candidate itemsets are generated the same way as in the Apriori algorithm.
2. Another set C’ is generated, in which each member holds the TID of a transaction together with the candidate itemsets present in that transaction. This set is used to count the support of each candidate itemset.
The advantage is that the number of entries in C’ may be smaller than the number of transactions in the database, especially in the later passes.
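A compact sketch of this idea follows. It is an illustration under assumptions, not the reference algorithm: the names `apriori_tid` and `apriori_gen` are chosen for this example, the database is assumed to be consistent with the example slides, and a candidate is considered present in a transaction when all of its subsets that are one item smaller appear in that transaction's previous C’ entry.

```python
from itertools import combinations


def apriori_gen(prev_large):
    """Join and prune, as in the Apriori candidate generation step."""
    prev_large = set(prev_large)
    if not prev_large:
        return set()
    k = len(next(iter(prev_large))) + 1
    joined = {a | b for a in prev_large for b in prev_large if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(c, k - 1))}


def apriori_tid(db, min_count):
    """db maps TID -> set of items; the database is read only to build the first C'."""
    c_bar = {tid: {frozenset([i]) for i in items} for tid, items in db.items()}
    counts = {}
    for entry in c_bar.values():
        for s in entry:
            counts[s] = counts.get(s, 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_count}
    all_large, history, k = dict(large), {1: c_bar}, 2

    while large:
        candidates = apriori_gen(large)
        counts = dict.fromkeys(candidates, 0)
        next_c_bar = {}
        for tid, entry in c_bar.items():
            # A candidate is present if all its (k-1)-subsets were present before.
            present = {c for c in candidates
                       if all(frozenset(s) in entry for s in combinations(c, k - 1))}
            for c in present:
                counts[c] += 1
            if present:                          # transactions with no candidates drop out
                next_c_bar[tid] = present
        c_bar = next_c_bar
        history[k] = c_bar
        large = {s: n for s, n in counts.items() if n >= min_count}
        all_large.update(large)
        k += 1
    return all_large, history


# Assumed database, consistent with the example slides.
DB = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
all_large, history = apriori_tid(DB, min_count=2)
print({tid: len(entry) for tid, entry in history[2].items()})
```

In this run the third-pass C’ keeps entries for only two of the four transactions, which illustrates why the set tends to shrink in later passes.
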

16 Example
Database
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3
C2 (* = large)
Itemset   Support
{1 2}     1
{1 3}*    2
{1 5}     1
{2 3}*    2
{2 5}*    3
{3 5}*    2
C’2
TID   Itemsets
100   {1 3}
200   {2 3}, {2 5}, {3 5}
300   {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
400   {2 5}
C3 (* = large)
Itemset   Support
{2 3 5}*  2
C’3
TID   Itemsets
200   {2 3 5}
300   {2 3 5}

17 Performance Analysis

18 AprioriHybrid Algorithm
Performance Analysis shows that:
1. Apriori does better than AprioriTid in the earlier passes.
2. AprioriTid does better than Apriori in the later passes.
Hence, a hybrid algorithm can be designed that uses Apriori in the initial passes and switches to AprioriTid when it expects that the set C’ will fit in memory.
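The switching decision could be sketched as below. The slides do not say how the size of C’ is estimated, so everything here is an assumption: the function names, the memory budget expressed as a number of entries, and the estimate itself, which adds the support counts of the current candidates to the number of transactions.

```python
def estimated_c_bar_entries(candidate_counts, num_transactions):
    """Rough size estimate for C' (assumption): candidate supports plus one slot per transaction."""
    return sum(candidate_counts.values()) + num_transactions


def should_switch_to_tid(candidate_counts, num_transactions, memory_budget_entries):
    """Switch from Apriori to AprioriTid once the estimated C' fits in the budget."""
    estimate = estimated_c_bar_entries(candidate_counts, num_transactions)
    return estimate <= memory_budget_entries


# Support counts of the C2 candidates from the example slides.
c2_counts = {
    frozenset({1, 2}): 1, frozenset({1, 3}): 2, frozenset({1, 5}): 1,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
}

estimate = estimated_c_bar_entries(c2_counts, num_transactions=4)
print(estimate, should_switch_to_tid(c2_counts, 4, memory_budget_entries=20))
```
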