Association Rule Mining Zhenjiang Lin Group Presentation April 10, 2007.

2 Overview
- Associations
- Market Basket Analysis
- Basic Concepts
- Frequent Itemsets
- Generating Frequent Itemsets
  - Apriori
  - FP-Growth
- Applications

3 Association Rule Learners
- Discover elements that co-occur frequently within a data set consisting of multiple independent selections of elements (such as purchasing transactions), and discover rules, such as implication or correlation, that relate co-occurring elements.
- Answer questions such as "If a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?"
- Reduce a potentially huge amount of information to a small, understandable set of statistically supported statements.
- Also known as "market basket analysis".

4 Associations
- Rules expressing relationships between items
- Example: {cereal, milk} => fruit
  "People who bought cereal and milk also bought fruit."
- Stores might want to offer specials on milk and cereal to get people to buy more fruit.

5 Market Basket Analysis
- Analyze tables of transactions
- Can we hypothesize?
  - Chips => Salsa
  - Lettuce => Spinach

| Person | Basket |
|---|---|
| A | Chips, Salsa, Cookies, Crackers, Coke, Beer |
| B | Lettuce, Spinach, Oranges, Celery, Apples, Grapes |
| C | Chips, Salsa, Frozen Pizza, Frozen Cake |
| D | Lettuce, Spinach, Milk, Butter |

6 Market Baskets
In general, the data consists of rows with two columns:
- TID: the transaction ID
- Basket: the subset of items in that transaction

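7 Basic Concepts
- Set of items: $I = \{i_1, i_2, \ldots, i_m\}$
- Transaction: $T \subseteq I$
- Association rule: $A \Rightarrow B$, where $A \subseteq I$, $B \subseteq I$, and $A \cap B = \emptyset$
- $D$: the set of transactions (i.e., our data)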

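8 Measuring Interesting Rules
- Support: ratio of the number of transactions containing both A and B to the total number of transactions
  $s(A \Rightarrow B) = \frac{|\{T \in D : A \cup B \subseteq T\}|}{|D|}$
- Confidence: ratio of the number of transactions containing both A and B to the number of transactions containing A
  $c(A \Rightarrow B) = \frac{|\{T \in D : A \cup B \subseteq T\}|}{|\{T \in D : A \subseteq T\}|}$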

9 Measuring Interesting Rules
Rules are included or excluded based on two thresholds:
- Minimum support: how frequently all of the items in a rule appear together in transactions
- Minimum confidence: how frequently transactions containing the left-hand side of a rule also contain the right-hand side

10 Market Basket Analysis
- What is I?
- What is T for person B?
- What is s(Chips => Salsa)?
- What is c(Chips => Salsa)?

| Person | Basket |
|---|---|
| A | Chips, Salsa, Cookies, Crackers, Coke, Beer |
| B | Lettuce, Spinach, Oranges, Celery, Apples, Grapes |
| C | Chips, Salsa, Frozen Pizza, Frozen Cake |
| D | Lettuce, Spinach, Milk, Butter, Chips |
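
As a quick check of these questions, the sketch below computes support and confidence straight from the definitions on slide 8, using exact fractions. Taking I to be only the items that appear in the four baskets is an assumption, and the helper names are illustrative.

```python
from fractions import Fraction

# Baskets from the slide-10 table (here person D's basket also contains Chips)
baskets = {
    "A": {"Chips", "Salsa", "Cookies", "Crackers", "Coke", "Beer"},
    "B": {"Lettuce", "Spinach", "Oranges", "Celery", "Apples", "Grapes"},
    "C": {"Chips", "Salsa", "Frozen Pizza", "Frozen Cake"},
    "D": {"Lettuce", "Spinach", "Milk", "Butter", "Chips"},
}
D = list(baskets.values())  # the set of transactions


def count(itemset):
    """Number of transactions in D that contain every item of itemset."""
    return sum(1 for T in D if itemset <= T)


def support(A, B):
    """s(A => B): fraction of all transactions containing both A and B."""
    return Fraction(count(A | B), len(D))


def confidence(A, B):
    """c(A => B): fraction of transactions containing A that also contain B."""
    return Fraction(count(A | B), count(A))


I = set().union(*D)  # assumption: I = every item that appears in some basket
T_B = baskets["B"]   # the transaction for person B

print(len(I))                            # 16
print(support({"Chips"}, {"Salsa"}))     # 1/2
print(confidence({"Chips"}, {"Salsa"}))  # 2/3
```

Chips and Salsa appear together in 2 of the 4 baskets, and Chips appears in 3, so the rule has support 1/2 and confidence 2/3.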

11 Frequent Itemsets
- itemset: any set of items
- k-itemset: an itemset containing k items
- frequent itemset: an itemset that satisfies a minimum support level
If I contains m items, how many itemsets are there?

12 Strong Association Rules
- Given an itemset, it's easy to generate association rules
- Given the itemset {Chips, Salsa}:
  - => Chips, Salsa
  - Chips => Salsa
  - Salsa => Chips
  - Chips, Salsa =>
- Strong rules are interesting
  - Generally defined as those rules satisfying minimum support and minimum confidence

13 Association Rule Mining
Two basic steps:
1. Find all frequent itemsets
   - Satisfying minimum support
2. Find all strong association rules
   - Generate association rules from frequent itemsets
   - Keep rules satisfying minimum confidence
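
The second step can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's code: it recomputes support counts from the nine transactions of the slide-21 example, assumes {I1, I2, I5} has already been found frequent, and skips the two rules with an empty side that slide 12 lists. `strong_rules` is a hypothetical helper name.

```python
from itertools import combinations

# Nine transactions from the slide-21 example
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in D if itemset <= t)


def strong_rules(itemset, min_conf):
    """Yield (lhs, rhs, confidence) for each split of a frequent itemset.

    Only non-empty left- and right-hand sides are considered.
    """
    itemset = frozenset(itemset)
    total = count(itemset)
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            conf = total / count(lhs)  # c(lhs => rhs) = count(itemset) / count(lhs)
            if conf >= min_conf:
                yield lhs, itemset - lhs, conf


for lhs, rhs, conf in strong_rules({"I1", "I2", "I5"}, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```

With a 70% confidence threshold, three of the six candidate rules survive (I5 => I1, I2; I1, I5 => I2; I2, I5 => I1), each with 100% confidence. No new scan of the data is strictly needed here: every count used is the support of a subset of a frequent itemset, which step 1 has already computed.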

14 Generating Frequent Itemsets
Naïve algorithm:

    n <- |D|
    for each subset s of I do
        l <- 0
        for each transaction T in D do
            if s is a subset of T then
                l <- l + 1
        if minimum support <= l/n then
            add s to frequent subsets
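
A direct Python translation of the naïve algorithm, offered as a sketch: it skips the empty subset, uses the slide-21 transactions as sample data, and `naive_frequent_itemsets` is a hypothetical name. It enumerates all $2^m - 1$ non-empty subsets of I and scans every transaction for each one.

```python
from itertools import chain, combinations

# Sample data: the nine transactions from the slide-21 example
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def naive_frequent_itemsets(D, min_support):
    """Brute force: test every non-empty subset s of I against every transaction T."""
    I = sorted(set().union(*D))
    n = len(D)
    frequent = {}
    subsets = chain.from_iterable(combinations(I, k) for k in range(1, len(I) + 1))
    for s in subsets:
        s = frozenset(s)
        l = sum(1 for T in D if s <= T)  # number of transactions containing s
        if min_support <= l / n:
            frequent[s] = l
    return frequent


print(len(naive_frequent_itemsets(D, 0.2)))  # 13
```

With six distinct items this already means 63 subsets times 9 transactions; each added item doubles the work.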

15 Generating Frequent Itemsets
Analysis of the naïve algorithm:
- There are $2^m$ subsets of I
- Each subset requires a scan of all n transactions
- That is $O(2^m n)$ tests of whether s is a subset of T
- Growth is exponential in the number of items! Can we do better?

16 Generating Frequent Itemsets
Frequent itemsets satisfy the Apriori property: if A is not a frequent itemset, then no superset of A is a frequent itemset.
Proof: Let n be the number of transactions, and suppose A is contained in l transactions. If $A' \supseteq A$, then every transaction containing A' also contains A, so A' is contained in $l' \le l$ transactions. Thus, if $l/n$ is below the minimum support, so is $l'/n$.

17 Generating Frequent Itemsets
Central idea: build candidate k-itemsets from frequent (k-1)-itemsets.
Approach:
- Find all frequent 1-itemsets
- Extend frequent (k-1)-itemsets to candidate k-itemsets
- Prune candidate itemsets that do not meet the minimum support

18 Generating Frequent Itemsets (Basic Apriori)

    L_1 = {frequent 1-itemsets}
    for (k = 2; L_(k-1) is not empty; k++) {
        C_k = generate k-itemset candidates from L_(k-1)
        for each transaction t in D {
            // The candidates that are subsets of t
            C_t = subset(C_k, t)
            for each candidate c in C_t {
                c.count++;
            }
        }
        // Keep candidates only after the full scan of D
        L_k = {c in C_k | c.count >= min_sup}
    }

The frequent itemsets are the union of the L_k.
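
A runnable Python sketch of the level-wise loop above, under a few assumptions: `min_count` is an absolute support count, and the join step simply unions pairs of frequent (k-1)-itemsets that differ in one item instead of using the sorted-prefix join; after the Apriori-property prune this yields the same candidate set. Function and variable names are illustrative.

```python
from itertools import combinations

# Sample data: the nine transactions from the slide-21 example
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def apriori(D, min_count):
    """Level-wise Apriori: build C_k from L_(k-1), count in one scan, keep frequent."""
    counts = {}
    for t in D:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_count}  # L_1
    frequent = dict(L)
    k = 2
    while L:
        # Join: unions of two frequent (k-1)-itemsets that differ in exactly one item
        prev = list(L)
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in L for sub in combinations(c, k - 1))}
        # One scan of D: count the candidates contained in each transaction
        counts = dict.fromkeys(candidates, 0)
        for t in D:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_count}  # L_k
        frequent.update(L)
        k += 1
    return frequent


print(len(apriori(D, min_count=2)))  # 13
```

On this data only two 3-itemsets, {I1, I2, I3} and {I1, I2, I5}, survive pruning and counting, and the single 4-itemset candidate is pruned without a scan because {I1, I3, I5} is infrequent.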

19 FP Growth (Han, Pei, Yin 2000)
- One problematic aspect of Apriori is candidate generation, a source of exponential growth
- Another approach is to use a divide-and-conquer strategy
- Idea: compress the database into a frequent-pattern tree (FP-tree) that represents the frequent items

20 FP Growth (Tree construction)
- Initially, scan the database for frequent 1-itemsets
- Place the resulting set in a list L, in descending order of frequency (support)
- Construct an FP-tree:
  - Create a root node labeled null
  - Scan the database
    - Process the items in each transaction in L order
    - From the root, add nodes in the order in which items appear in the reordered transactions
    - Link nodes representing the same item along different branches

21 Frequent 1-itemsets
- Minimum support of 20% (frequency of 2)
- Frequent 1-itemsets: I1, I2, I3, I4, I5
- Construct list L = {(I2,7), (I1,6), (I3,6), (I4,2), (I5,2)}

| TID | Items |
|---|---|
| 1 | I1, I2, I5 |
| 2 | I2, I4 |
| 3 | I2, I3, I6 |
| 4 | I1, I2, I4 |
| 5 | I1, I3 |
| 6 | I2, I3 |
| 7 | I1, I3 |
| 8 | I1, I2, I3, I5 |
| 9 | I1, I2, I3 |
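
The counts in L can be reproduced with a short sketch. It assumes the 20% threshold is rounded up to a whole count (20% of 9 transactions is 1.8, so at least 2) and that ties in frequency are broken by item name, which matches the order shown above.

```python
import math
from collections import Counter

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support = 0.20
min_count = math.ceil(min_support * len(D))  # 1.8 rounds up to 2

# One scan of the database counts every item
item_counts = Counter(item for t in D for item in t)

# Frequent items in descending support order; ties broken by item name (assumption)
L = sorted(((item, c) for item, c in item_counts.items() if c >= min_count),
           key=lambda pair: (-pair[1], pair[0]))

print(min_count)  # 2
print(L)          # [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]
```

I6 occurs only in transaction 3, so it falls below the threshold and never enters the FP-tree.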

22 Build FP-Tree
- Create root node null
- Scan database
  - Transaction 1: I1, I2, I5; in L order: I2, I1, I5
  - Process the transaction: add nodes in item order, labeled with item and count
- Maintain a header table
Tree after transaction 1: null -> (I2,1) -> (I1,1) -> (I5,1)
Header table (item: count): I2: 1, I1: 1, I3: 0, I4: 0, I5: 1

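23 Build FP-Tree
Transaction 2: I2, I4 (already in L order). It shares the prefix I2, so that node's count increases to 2 and a new child (I4,1) is added under it.
Tree after transaction 2: null -> (I2,2), with children (I1,1) -> (I5,1) and (I4,1)
Header table (item: count): I2: 2, I1: 1, I3: 0, I4: 1, I5: 1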
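
The construction steps from slide 20 can be sketched as below, assuming the same tie-breaking as list L and a plain Python list of nodes per item standing in for the header table's node-links. `FPNode`, `build_fp_tree`, and `show` are hypothetical names, not taken from the FP-Growth paper.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, a parent link, and children by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(D, min_count):
    counts = Counter(item for t in D for item in t)
    # List L: frequent items in descending support order (ties broken by name)
    L = sorted((i for i in counts if counts[i] >= min_count),
               key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(L)}
    root = FPNode(None, None)     # root node labeled null
    header = {item: [] for item in L}  # header table: item -> node-links
    for t in D:
        node = root
        # Keep only frequent items and process them in L order
        for item in sorted((i for i in t if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)  # link the new node into the header table
            child.count += 1
            node = child
    return root, header


def show(node, depth=0):
    """Print the tree as indented (item,count) labels."""
    for child in node.children.values():
        print("  " * depth + f"({child.item},{child.count})")
        show(child, depth + 1)


D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
root, header = build_fp_tree(D, min_count=2)
show(root)
```

Run on all nine transactions, `show` prints (I2,7) with children (I1,4), (I4,1), and (I3,2), plus a second root branch (I1,2) -> (I3,2). Summing the node counts along each item's node-links gives back the header-table supports.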

24 Mining the FP-tree
- Start at the last item in the header table
- Find all paths containing the item by following the node-links
- Identify conditional patterns: patterns in those paths with the required frequency
- Build a conditional FP-tree C
- Append the item to all paths in C, generating frequent patterns
- Mine C recursively (appending the item)
- Remove the item from the table and tree

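25 Mining the FP-Tree
Full FP-tree (header table: I2: 7, I1: 6, I3: 6, I4: 2, I5: 2):

    null
      (I2,7)
        (I1,4)
          (I5,1)
          (I4,1)
          (I3,2)
            (I5,1)
        (I4,1)
        (I3,2)
      (I1,2)
        (I3,2)

Mining I5, the last item in the header table:
- Prefix paths: (I2 I1, 1), (I2 I1 I3, 1)
- Conditional path (I3 is dropped because its count of 1 is below the minimum of 2): (I2 I1, 2)
- Conditional FP-tree: null -> (I2,2) -> (I1,2)
- Frequent pattern generated: (I2 I1 I5, 2)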
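
The full mining procedure can be sketched recursively. This is a compact illustration under stated assumptions, not the implementation from Han, Pei, and Yin: each conditional FP-tree is rebuilt from a weighted list of prefix paths, and instead of physically removing an item from the table and tree, the recursion relies on prefix paths containing only items that come earlier in L. All names are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return root, header, order, supports."""
    support = defaultdict(int)
    for items, n in weighted:
        for i in items:
            support[i] += n
    order = sorted((i for i in support if support[i] >= min_count),
                   key=lambda i: (-support[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for i in sorted((i for i in items if i in rank), key=rank.__getitem__):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])  # node-link
            node = node.children[i]
            node.count += n
    return root, header, order, support


def prefix_paths(header, item):
    """Conditional pattern base: root-to-parent path of each node for item, with its count."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p.item is not None:  # stop at the null root
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
    return base


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frequent itemset: support count} for itemsets extending suffix."""
    _, header, order, support = build_tree(weighted, min_count)
    patterns = {}
    for item in reversed(order):  # start at the last item in the header table
        pattern = suffix | {item}
        patterns[pattern] = support[item]
        # Mine the conditional FP-tree built from this item's prefix paths
        patterns.update(fp_growth(prefix_paths(header, item), min_count, pattern))
    return patterns


D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3", "I6"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
weighted = [(t, 1) for t in D]
patterns = fp_growth(weighted, min_count=2)
print(len(patterns))  # 13
print(prefix_paths(build_tree(weighted, 2)[1], "I5"))
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)]
```

For the example it finds the same 13 frequent itemsets as Apriori, including (I2 I1 I5, 2), and the prefix paths it reports for I5 match the slide.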

26 Applications
- Web Personalization
- Genomic Data

27 Web Personalization
- “Effective Personalization Based on Association Rule Discovery from Web Usage Data,” Mobasher et al., ACM Workshop on Web Information and Data Management
- Personalization and recommendation systems, e.g., Amazon.com’s recommended books

28 Data Preprocessing
- Identify the set of pageviews $P = \{p_1, \ldots, p_n\}$
  - Determine which files result in a single browser display (complicated by frames, images, etc.)
- Identify transactions $T = \{t_1, \ldots, t_m\}$ from session IDs or cookies

29 Data Preprocessing
A transaction t consists of
$t = \{(p^t_1, w(p^t_1)), \ldots, (p^t_l, w(p^t_l))\}$
- w is a weight associated with each pageview
  - Could be binary (purchase or non-purchase)
  - Could be related to the amount of time spent on the page

30 Data Preprocessing
- The paper considered only pageviews in a transaction with w(p) = 1
- The ordering of pageviews did not matter

31 Recommendation Engine
- Has to run online, i.e., must be fast
  - Frequent itemsets are generated first and stored in a graph data structure for efficient searching
- Maintains a history of the user's current session
  - Sets a window size w (e.g., 3)
  - After pageviews A, B, C, the window is {A, B, C}
  - If the user then visits D, the window becomes {B, C, D}
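
The session window and a lookup against stored frequent itemsets can be sketched as follows. This is a simplified stand-in, not the method from the Mobasher et al. paper: a plain dictionary replaces the graph data structure, the itemsets and counts are hypothetical, and a page p is recommended when the window plus p is a frequent itemset, scored by the confidence of the rule window => {p}.

```python
from collections import deque

# Hypothetical frequent itemsets of pageviews with support counts (illustrative only)
frequent = {
    frozenset({"B", "C", "D"}): 4,
    frozenset({"B", "C", "D", "E"}): 3,
    frozenset({"B", "C", "D", "F"}): 1,
}


def recommend(window, frequent, min_conf):
    """Pages p such that window + {p} is frequent, ranked by confidence of window => {p}."""
    w = frozenset(window)
    base = frequent.get(w)
    if base is None:  # the current window is not itself a frequent itemset
        return []
    recs = []
    for itemset, count in frequent.items():
        if len(itemset) == len(w) + 1 and w < itemset:
            (page,) = itemset - w
            conf = count / base
            if conf >= min_conf:
                recs.append((page, conf))
    return sorted(recs, key=lambda pc: pc[1], reverse=True)


session = deque(maxlen=3)  # window size w = 3: older pageviews fall off the left
for page in ["A", "B", "C", "D"]:
    session.append(page)

print(list(session))                      # ['B', 'C', 'D']
print(recommend(session, frequent, 0.5))  # [('E', 0.75)]
```

A bounded `deque` gives the sliding-window behavior for free; F is not recommended because its confidence, 1/4, is below the 0.5 threshold.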

32 Genomic Data
- “Finding Association Rules on Heterogenous Genome Data,” Satou et al.
- Combined data from PDB, SWISS-PROT, and PROSITE
- Data table columns: Protein Name, sequence feature1, sequence feature2, structure feature1, function1, function2 (one row per protein)

33 Genomic Data
- After mining, association rules were generated (minimum support = 5, minimum confidence = 65%)
- Results were post-processed with a maximum support of 30, since itemsets appearing too frequently aren't interesting
- This reduced the output to 381 rules
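
The filtering step can be sketched as a single predicate. The sketch assumes the support thresholds (5 and 30) are absolute counts of proteins and applies the minimum support and confidence again alongside the new maximum; the rules and feature names are placeholders, not results from the paper.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str
    rhs: str
    support: int       # assumed: number of proteins containing both lhs and rhs
    confidence: float


def post_process(rules, min_support=5, max_support=30, min_conf=0.65):
    """Keep rules that are frequent enough, not too common, and confident enough."""
    return [r for r in rules
            if min_support <= r.support <= max_support and r.confidence >= min_conf]


# Hypothetical rules (placeholder feature names, not results from the paper)
rules = [
    Rule("seq_feature_1", "function_1", 12, 0.80),    # kept
    Rule("seq_feature_2", "function_2", 45, 0.90),    # dropped: support above 30
    Rule("struct_feature_1", "function_1", 7, 0.50),  # dropped: confidence below 65%
    Rule("seq_feature_1", "function_2", 3, 0.90),     # dropped: support below 5
]
kept = post_process(rules)
print([r.lhs for r in kept])  # ['seq_feature_1']
```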

34 Genomic Data
- Generated rules were corroborated by biological background data
- Found common substructures in serine endopeptidases
- Rules were not distributed well over protein families
- Some work remains to be done on the data preprocessing stage

35 Association Rule Summary
Association rule mining is a fundamental tool in data mining. Several algorithms:
- Apriori: uses a provable mathematical property (the Apriori property) to improve performance
- FP-Growth: avoids candidate generation by using an effective data structure (the FP-tree)
- Correlation Rules: evaluate interestingness based on statistics
- Query Flocks: generalize the approach for query optimization (incorporation into database systems)

36 Association Rule Summary
There exist several extensions:
- Hierarchical attributes (e.g., year -> month -> week -> day, or computer -> luggable -> handheld -> palm)
- Multilevel/multidimensional
- Numerical attributes
- Constraint-based