Association Rules Apriori Algorithm


Overview
Machine Learning
Sales Transactions and Association Rules
Apriori Algorithm
Example

Machine Learning
Common ground of the presented methods, and how they differ:
Statistical learning methods vs. frequency/similarity-based methods
Data represented in a vector space vs. symbolically
Supervised vs. unsupervised learning
Scalable (works with very large data) vs. not

Extracting Rules from Examples
Apriori Algorithm, FP-Growth
Works on large quantities of stored (symbolic) data; "large" often means extremely large, so it is important that the algorithm scales
Unsupervised learning

Cluster Analysis
K-Means, EM (Expectation Maximization), COBWEB, Clustering Assessment, KNN (k-Nearest Neighbor)
Data are represented in a vector space
Unsupervised learning

Uncertain Knowledge
Naive Bayes, Belief Networks (Bayesian Networks)
The main tool is probability theory, which assigns to each item a numerical degree of belief between 0 and 1
Learning from observation (symbolic data)
Supervised learning

Decision Trees
ID3, C4.5, CART
Learning from observation (symbolic data)
Supervised learning

Supervised Classifiers: Artificial Neural Networks
Feed-forward networks with one layer: Perceptron
With several layers: Backpropagation algorithm, RBF networks
Support Vector Machines
Data are represented in a vector space
Supervised learning

Prediction Linear Regression Logistic Regression

Sales Transaction Table
We would like to perform a basket analysis of the set of products in a single transaction, discovering, for example, that a customer who buys shoes is likely to also buy socks.

Transactional Database
The set of all sales transactions is called the population. We represent the transactions as one record per transaction; each transaction is represented by a data tuple.
TX1: Shoes, Socks, Tie
TX2: Shoes, Socks, Tie, Belt, Shirt
TX3: Shoes, Tie
TX4: Shoes, Socks, Belt

Consider the rule Socks ⇒ Tie
Socks is the rule antecedent
Tie is the rule consequent

Support and Confidence
Any given association rule has a support level and a confidence level.
Support is the percentage of the population that satisfies the rule.
If the percentage of the population in which the antecedent is satisfied is s, then the confidence is the percentage of those transactions in which the consequent is also satisfied.

Transactional Database
For the rule Socks ⇒ Tie:
Support is 50% (2/4)
Confidence is 66.67% (2/3)
TX1: Shoes, Socks, Tie
TX2: Shoes, Socks, Tie, Belt, Shirt
TX3: Shoes, Tie
TX4: Shoes, Socks, Belt
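These two numbers can be checked with a minimal Python sketch (assuming, for illustration, that each transaction is modeled as a set of item names):

```python
# Transactions from the table above, one set per transaction.
transactions = [
    {"Shoes", "Socks", "Tie"},
    {"Shoes", "Socks", "Tie", "Belt", "Shirt"},
    {"Shoes", "Tie"},
    {"Shoes", "Socks", "Belt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"Socks", "Tie"}, transactions))       # 0.5 -> 50%
print(confidence({"Socks"}, {"Tie"}, transactions))  # ≈ 0.667 -> 66.67%
```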

Apriori Algorithm
Mining for associations among items in a large database of sales transactions is an important data mining function.
For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is represented in the association rule below:
Keyboard ⇒ Mouse [support = 6%, confidence = 70%]

Association Rules
Based on the types of values, association rules can be classified into two categories: Boolean association rules and quantitative association rules.
Boolean association rule: Keyboard ⇒ Mouse [support = 6%, confidence = 70%]
Quantitative association rule: (Age = 26...30) ⇒ (Cars = 1, 2) [support = 3%, confidence = 36%]

Minimum Support threshold The support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true

Minimum Confidence Threshold
Confidence is defined as the measure of certainty or trustworthiness associated with each discovered pattern.
It is the probability of B given that all we know is A: confidence(A ⇒ B) = P(B | A)

Itemset
A set of items is referred to as an itemset.
An itemset containing k items is called a k-itemset.
An itemset can be seen as a conjunction of items (or a predicate).

Frequent Itemset
Suppose min_sup is the minimum support threshold.
An itemset satisfies minimum support if its occurrence frequency is greater than or equal to min_sup.
An itemset that satisfies minimum support is called a frequent itemset.

Strong Rules Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong

Association Rule Mining
1. Find all frequent itemsets
2. Generate strong association rules from the frequent itemsets
The Apriori algorithm mines frequent itemsets for Boolean association rules.

Apriori Algorithm
Level-wise search: k-itemsets (itemsets with k items) are used to explore (k+1)-itemsets in transactional databases for Boolean association rules.
First, the set of frequent 1-itemsets is found (denoted L1).
L1 is used to find L2, the set of frequent 2-itemsets.
L2 is used to find L3, and so on, until no frequent k-itemsets can be found.
Finally, strong association rules are generated from the frequent itemsets.

Example (min_sup = 2)
Database TDB:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E
1st scan → C1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
2nd scan → C2: {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
3rd scan → C3: {B, C, E}
L3: {B, C, E}: 2
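The levels above can be reproduced with a brute-force sketch. This enumerates all candidate k-subsets instead of using the Apriori join step, which is fine at this scale:

```python
from itertools import combinations

# Transaction database from the example; min_sup is an absolute count here.
tdb = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
       30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
min_sup = 2

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in tdb.values() if itemset <= items)

def frequent_k(k):
    """All frequent k-itemsets with their support counts (brute force)."""
    items = sorted(set().union(*tdb.values()))
    return {frozenset(c): count(set(c))
            for c in combinations(items, k) if count(set(c)) >= min_sup}

print(frequent_k(1))  # L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
print(frequent_k(2))  # L2: {A,C}: 2, {B,C}: 2, {B,E}: 3, {C,E}: 2
print(frequent_k(3))  # L3: {B,C,E}: 2
```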

The name of the algorithm comes from the fact that it uses prior knowledge of frequent itemsets.
It employs an iterative approach known as level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

Apriori Property
The Apriori property is used to reduce the search space.
Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
It is anti-monotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.

Apriori Property
Reducing the search space matters because finding each Lk requires one full scan of the database (Lk is the set of frequent k-itemsets).
If an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent: P(I) < min_sup.
If an item A is added to the itemset I, the resulting itemset cannot occur more frequently than I; therefore I ∪ A is not frequent either: P(I ∪ A) < min_sup.
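The anti-monotone property gives the standard prune step: a candidate k-itemset can be discarded as soon as any of its (k-1)-subsets is missing from Lk-1. A minimal sketch (function name is illustrative):

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """Prune a k-itemset if any (k-1)-subset is not frequent (Apriori property)."""
    k = len(candidate)
    return any(frozenset(s) not in prev_frequent
               for s in combinations(sorted(candidate), k - 1))

# With L2 = {{A,C}, {B,C}, {B,E}, {C,E}} from the earlier example, candidate
# {A,B,C} is pruned because its subset {A,B} is not frequent, while {B,C,E}
# survives because all of its 2-subsets are in L2.
L2 = {frozenset(s) for s in [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]}
print(has_infrequent_subset({"A", "B", "C"}, L2))  # True  (pruned)
print(has_infrequent_subset({"B", "C", "E"}, L2))  # False (kept)
```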

Scalable Methods for Mining Frequent Patterns
The downward closure property of frequent patterns: any subset of a frequent itemset must be frequent.
If {beer, diaper, nuts} is frequent, so is {beer, diaper}, i.e., every transaction containing {beer, diaper, nuts} also contains {beer, diaper}.
Scalable mining methods: three major approaches
Apriori (Agrawal & Srikant, VLDB '94)
Frequent pattern growth (FP-growth; Han, Pei & Yin, SIGMOD '00)
Vertical data format approach (CHARM; Zaki & Hsiao, SDM '02)

Algorithm
1. Scan the entire transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
2. Use Lk-1 join Lk-1 to generate the set of candidate k-itemsets, and use the Apriori property to prune the infrequent k-itemsets from this set.
3. Scan the transaction database to get the support S of each remaining candidate k-itemset, compare S with min_sup, and get the set of frequent k-itemsets, Lk.
4. If the candidate set is not empty, go to step 2.
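Steps 1-4 can be sketched end to end in Python. The join here simply unions pairs of (k-1)-itemsets and keeps unions of size k, which is a slight simplification of the textbook Lk-1 ⋈ Lk-1 join but produces the same candidates; names are illustrative:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: count items in one full scan to build L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)
    while Lk:
        # Step 2: join L(k-1) with itself, then prune by the Apriori property.
        prev = set(Lk)
        k = len(next(iter(prev))) + 1
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Step 3: one full scan to count candidate supports and build Lk.
        Lk = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t)
            if sup >= min_sup:
                Lk[c] = sup
        frequent.update(Lk)
        # Step 4: the loop repeats until no candidate survives.
    return frequent

freq = apriori([{"A", "C", "D"}, {"B", "C", "E"},
                {"A", "B", "C", "E"}, {"B", "E"}], min_sup=2)
print(freq[frozenset({"B", "C", "E"})])  # 2
```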

Generating Association Rules
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, output the rule s ⇒ (l − s) if its confidence is at least min_conf.
Example: l = {A1, A2, A5}
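This rule-generation step can be sketched as follows. The support counts used for l = {A1, A2, A5} and its subsets are hypothetical, chosen only to make the example concrete:

```python
from itertools import combinations

def generate_rules(l, support_counts, min_conf):
    """Return (antecedent, consequent, confidence) triples for itemset l."""
    l = frozenset(l)
    rules = []
    for r in range(1, len(l)):          # every nonempty proper subset size
        for s in combinations(sorted(l), r):
            s = frozenset(s)
            conf = support_counts[l] / support_counts[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Hypothetical support counts for {A1, A2, A5} and all of its subsets.
counts = {frozenset(x): c for x, c in [
    (("A1",), 6), (("A2",), 7), (("A5",), 2),
    (("A1", "A2"), 4), (("A1", "A5"), 2), (("A2", "A5"), 2),
    (("A1", "A2", "A5"), 2)]}

for ante, cons, conf in generate_rules({"A1", "A2", "A5"}, counts, 0.7):
    print(ante, "=>", cons, conf)
```

With these counts, three rules clear min_conf = 70%: {A5} ⇒ {A1, A2}, {A1, A5} ⇒ {A2}, and {A2, A5} ⇒ {A1}, each with confidence 100%.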

Example
Five transactions from a supermarket (diaper = "fralda" in Portuguese)
TID  List of Items
1    Beer, Diaper, Baby Powder, Bread, Umbrella
2    Diaper, Baby Powder
3    Beer, Diaper, Milk
4    Diaper, Beer, Detergent
5    Beer, Milk, Coca-Cola

Step 1: min_sup = 40% (2/5), C1 → L1
C1:
Item         Support
Beer         4/5
Diaper       4/5
Baby Powder  2/5
Bread        1/5
Umbrella     1/5
Milk         2/5
Detergent    1/5
Coca-Cola    1/5
L1:
Item         Support
Beer         4/5
Diaper       4/5
Baby Powder  2/5
Milk         2/5

Steps 2 and 3: C2 → L2
C2:
Item                 Support
Beer, Diaper         3/5
Beer, Baby Powder    1/5
Beer, Milk           2/5
Diaper, Baby Powder  2/5
Diaper, Milk         1/5
Baby Powder, Milk    0
L2:
Item                 Support
Beer, Diaper         3/5
Beer, Milk           2/5
Diaper, Baby Powder  2/5

Step 4: C3 → empty (min_sup = 40%, i.e. 2/5)
C3:
Item                       Support
Beer, Diaper, Baby Powder  1/5
Beer, Diaper, Milk         1/5
Beer, Milk, Baby Powder    0
Diaper, Baby Powder, Milk  0
No 3-itemset reaches min_sup, so L3 is empty.

Step 5: min_sup = 40%, min_conf = 70%
Rule                  Support(A, B)  Support(A)  Confidence
Beer ⇒ Diaper         60%            80%         75%
Beer ⇒ Milk           40%            80%         50%
Diaper ⇒ Baby Powder  40%            80%         50%
Diaper ⇒ Beer         60%            80%         75%
Milk ⇒ Beer           40%            40%         100%
Baby Powder ⇒ Diaper  40%            40%         100%
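The confidence column can be cross-checked by computing support and confidence directly from the five transactions, using the same brute-force definitions as before:

```python
# The five supermarket transactions from the example.
trans = [
    {"Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"},
    {"Diaper", "Baby Powder"},
    {"Beer", "Diaper", "Milk"},
    {"Diaper", "Beer", "Detergent"},
    {"Beer", "Milk", "Coca-Cola"},
]

def sup(items):
    """Fraction of transactions containing all of `items`."""
    return sum(1 for t in trans if items <= t) / len(trans)

def conf(a, b):
    """Confidence of the rule a => b."""
    return sup(a | b) / sup(a)

print(conf({"Beer"}, {"Diaper"}))         # ≈ 0.75
print(conf({"Milk"}, {"Beer"}))           # ≈ 1.0
print(conf({"Baby Powder"}, {"Diaper"}))  # ≈ 1.0
```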

Results (min_sup = 40%, min_conf = 70%)
Beer ⇒ Diaper: support 60%, confidence 75%
Diaper ⇒ Beer: support 60%, confidence 75%
Milk ⇒ Beer: support 40%, confidence 100%
Baby Powder ⇒ Diaper: support 40%, confidence 100%

Interpretation
Some results are believable, like Baby Powder ⇒ Diaper.
Some rules need additional analysis, like Milk ⇒ Beer.
Some rules are unbelievable, like Diaper ⇒ Beer.
This example could contain unrealistic results because the data set is so small.

Summary
Machine Learning
Sales Transactions and Association Rules
Apriori Algorithm
Example

How to make Apriori faster? FP-growth