Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.


Introduction to Association Rule
  - Associating multiple objects/events together
  - Example: A customer buying a laptop also buys a wireless LAN card (2-itemset)
  - Laptop => Wireless LAN Card

Association Rule (cont)
Measures of Rule Interestingness
  - Support = P(Laptop U LAN card): probability that a transaction contains all the studied items (here, both the Laptop and the LAN card)
  - Confidence = P(LAN card | Laptop) = P(Laptop U LAN card) / P(Laptop): conditional probability that a customer who bought a Laptop also bought a Wireless LAN card
Thresholds:
  - Minimum Support: 25%
  - Minimum Confidence: 30%
Laptop => Wireless LAN Card [Support = 40%, Confidence = 60%]

Association Rule (eg.)
  TID  Items
  1    Bread, Coke, Milk
  2    Chips, Bread
  3    Coke, Eggs, Milk
  4    Bread, Eggs, Milk, Coke
  5    Coke, Eggs, Milk
Min_Sup = 25%, Min_Conf = 25%
Rule: Milk => Eggs
  Support: P(Milk U Eggs) = 3/5 = 60%
  Confidence: P(Eggs | Milk) = P(Milk U Eggs) / P(Milk)
              P(Milk) = 4/5 = 80%
              P(Eggs | Milk) = 60% / 80% = 75%
  (75% confidence that a customer who buys milk also buys eggs)
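The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names `support` and `confidence` are illustrative and not part of the presentation, and the transactions are copied from the table above.

```python
# Transactions copied from the slide's example table (TID 1 to 5).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Chips", "Bread"},
    {"Coke", "Eggs", "Milk"},
    {"Bread", "Eggs", "Milk", "Coke"},
    {"Coke", "Eggs", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


milk_eggs_support = support({"Milk", "Eggs"}, transactions)
milk_eggs_confidence = confidence({"Milk"}, {"Eggs"}, transactions)
print(f"support={milk_eggs_support:.0%} confidence={milk_eggs_confidence:.0%}")
```

Both values clear the 25% thresholds, so Milk => Eggs would be reported as an interesting rule.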

Types of Association
  - Boolean vs. Quantitative
  - Single dimension vs. Multiple dimension
  - Single level vs. Multiple level analysis
Example:
  1.) Gender(X,Male) ^ Income(X,>50K) ^ Age(X,35…50) => Buys(X, BMW Sedan)
  2.) Income(X,>50K) => Buys(X, BMW Sedan)
  3.) Gender(X,Male) ^ Income(X,>50K) ^ Age(X,35…50) => Buys(X, BMW 540i)

Association Rule (DB Miner)

Apriori Algorithm
Purpose
  - To mine frequent itemsets for boolean association rules
  - Uses prior knowledge (the frequent itemsets already found) to generate the next, larger candidates
  - An itemset has to be frequent (Support >= Min_Sup)
  - Anti-monotone concept: if a set cannot pass the min_sup test, all of its supersets will fail as well

Apriori Algorithm Pseudo-Code
Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in database do
          increment the count of all candidates in Ck+1 that are contained in t
      Lk+1 = candidates in Ck+1 with min_support
  end
  return ∪k Lk;
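The pseudo-code can be turned into a short runnable sketch. The version below is one possible reading, not the presenter's implementation: the function name `apriori`, the use of an absolute `min_count` instead of a percentage, and the simple pairwise join are all assumptions made for brevity.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # Join: union pairs of frequent k-itemsets that differ in exactly one item.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune: every k-subset of a candidate must itself be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # One pass over the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# The five example transactions used throughout the slides.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Chips", "Bread"},
    {"Coke", "Eggs", "Milk"},
    {"Bread", "Eggs", "Milk", "Coke"},
    {"Coke", "Eggs", "Milk"},
]
frequent = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, which mirrors the `Lk != ∅` test in the pseudo-code.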

Apriori Algorithm Procedures
Example revisited: 5 items, 5 transactions
  TID  Items
  1    Bread, Coke, Milk
  2    Chips, Bread
  3    Coke, Eggs, Milk
  4    Bread, Eggs, Milk, Coke
  5    Coke, Eggs, Milk
Min_Sup = 25% (Min Support Count = 2 transactions, since 25% of 5 is 1.25), Min_Conf = 25%
Step 1: Scan & find support of each item (C1):
  Item   Support
  Bread  3
  Coke   4
  Milk   4
  Chips  1 (fail)
  Eggs   3
Step 2: Compare with Min_Sup and eliminate (prune) items < Min_Sup (L1):
  Item   Support
  Bread  3
  Coke   4
  Milk   4
  Eggs   3

Apriori Algorithm (cont)
L1 set:
  Items: Bread, Coke, Milk, Eggs
Step 3: Join L1 with L1 to form the candidate pairs (C2)
Supports:
  Bread & Coke: 2/5 = 40%
  Bread & Milk: 2/5 = 40%
  Bread & Eggs: 1/5 = 20%
  Coke & Milk:  4/5 = 80%
  Coke & Eggs:  3/5 = 60%
  Milk & Eggs:  3/5 = 60%
Repeated step: eliminate (prune) items < Min_Sup
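The pair supports for this step can be recounted directly from the five example transactions. A minimal sketch, assuming pairs are stored as alphabetically sorted tuples (a representation chosen here for illustration only):

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Chips", "Bread"},
    {"Coke", "Eggs", "Milk"},
    {"Bread", "Eggs", "Milk", "Coke"},
    {"Coke", "Eggs", "Milk"},
]
frequent_items = ["Bread", "Coke", "Milk", "Eggs"]  # the L1 set
min_count = 2

# Count every pair of frequent items that appears together in a transaction.
pair_counts = Counter()
for t in transactions:
    present = sorted(item for item in frequent_items if item in t)
    for pair in combinations(present, 2):
        pair_counts[pair] += 1

# L2: pairs that reach the minimum support count.
l2 = {pair: n for pair, n in pair_counts.items() if n >= min_count}

for pair, n in sorted(pair_counts.items()):
    status = "" if n >= min_count else " (fail)"
    print(f"{pair[0]} & {pair[1]}: {n}/{len(transactions)}{status}")
```

Only Bread & Eggs falls below the minimum count, so five pairs move on to the next join.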

Apriori Algorithm (cont)
L2 set (pairs that passed Min_Sup):
  Bread & Coke
  Bread & Milk
  Coke & Milk
  Coke & Eggs
  Milk & Eggs
Join L2 with L2, then compare with Min_Sup and eliminate (prune) items < Min_Sup:
  Items                         Support
  Bread & Coke & Milk           2
  Bread & Coke & Eggs           1 (fail)
  Bread & Coke & Milk & Eggs    1 (fail)
  Coke & Milk & Eggs            3
Conclusion:
  - Bread & Coke & Milk have strong correlation
  - Coke & Milk & Eggs have strong correlation
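The join of L2 with itself can be combined with the anti-monotone prune described earlier in the deck, so that candidates with an infrequent subset are discarded before any counting. The sketch below assumes a helper named `generate_candidates` (a hypothetical name) and takes the L2 pairs from the slide as input.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-itemsets, keeping only those whose
    k-subsets are all frequent (anti-monotone prune)."""
    frequent_k = {frozenset(s) for s in frequent_k}
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        if len(union) == k + 1 and all(
            frozenset(sub) in frequent_k for sub in combinations(union, k)
        ):
            candidates.add(union)
    return candidates


# L2 set from the slide.
l2 = [
    ("Bread", "Coke"),
    ("Bread", "Milk"),
    ("Coke", "Milk"),
    ("Coke", "Eggs"),
    ("Milk", "Eggs"),
]
c3 = generate_candidates(l2, 2)
for candidate in sorted(sorted(c) for c in c3):
    print(candidate)
```

With this prune, Bread & Coke & Eggs is never counted because Bread & Eggs already failed Min_Sup; the two candidates that remain are the same two triples the slide reports as passing.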

Sequential Pattern Mining
Introduction
  - Mining of frequently occurring patterns related to time or other sequences
Examples
  - 70% of customers rent Star Wars, then Empire Strikes Back, and then Return of the Jedi
    (Star Wars => Empire Strikes Back => Return of the Jedi)
Application
  - Intrusion detection on computers
  - Web access pattern
  - Predict disease with sequence of symptoms
  - Many other areas

Sequential Pattern Mining (cont)
Steps:
  - Sort Phase: sort by Cust_ID, Transaction_ID
  - Litemset Phase: find large itemsets
  - Transform Phase: eliminates items < min_sup
  - Sequence Phase: find desired sequences
  - Maximal Phase: find the maximal sequences among the set of large sequences

Sequential Pattern Mining (cont)
Example: Database sorted by Cust_ID & Transaction Time (Min_sup = 25%)
  Cust ID  Trans. Time  Items Bought
  1        June         3
  1        June         9
  2        June         1, 2
  2        June         3
  2        June         4, 6, 7
  3        June         3, 5, 7
  4        June         3
  4        June         4, 7
  4        July         9
  5        June         9
Organized format with Cust_ID:
  Cust ID  Original Sequence
  1        {(3) (9)}
  2        {(1,2) (3) (4,6,7)}
  3        {(3,5,7)}
  4        {(3) (4,7) (9)}
  5        {(9)}
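The sort phase that produces the organized format can be sketched as a sort followed by a group-by. The exact transaction dates are not available here, so the second field of each row below is a hypothetical ordering index standing in for the transaction time; the variable names are also illustrative.

```python
from itertools import groupby
from operator import itemgetter

# (cust_id, order, items); "order" is a hypothetical stand-in for the transaction
# time. Rows are deliberately listed out of order to show the effect of sorting.
rows = [
    (4, 2, (4, 7)),
    (2, 3, (4, 6, 7)),
    (1, 2, (9,)),
    (5, 1, (9,)),
    (2, 1, (1, 2)),
    (4, 1, (3,)),
    (3, 1, (3, 5, 7)),
    (1, 1, (3,)),
    (4, 3, (9,)),
    (2, 2, (3,)),
]

# Sort by customer first, then by transaction order.
rows_sorted = sorted(rows, key=itemgetter(0, 1))

# Collect each customer's transactions, in order, into one sequence.
sequences = {
    cust: [items for _, _, items in group]
    for cust, group in groupby(rows_sorted, key=itemgetter(0))
}

for cust, seq in sequences.items():
    print(cust, seq)
```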

Sequential Pattern Mining (cont)
Step 1: Sort (examples of several transactions):
  Cust ID  Original Sequence  Items to study           Support Count
  1        {(3) (9)}          {(3)}, {(9)}, {(3,9)}    3, 3, 2
  5        {(9)}              {(9)}                    1
Conclusion: > 25% Min_sup: {(3) (9)} and {(3) (4,7)}
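The conclusion can be checked by counting how many customers support each pattern. In the sketch below, a customer supports a pattern when the pattern's itemsets appear, in order, inside distinct transactions of that customer; `contains` and `support_count` are illustrative names, and the customer sequences are those of the organized table.

```python
# Customer sequences from the organized table (Cust ID -> ordered transactions).
sequences = {
    1: [(3,), (9,)],
    2: [(1, 2), (3,), (4, 6, 7)],
    3: [(3, 5, 7)],
    4: [(3,), (4, 7), (9,)],
    5: [(9,)],
}


def contains(customer_seq, pattern):
    """True if each itemset of pattern is a subset of a later and later transaction."""
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining transaction that contains this itemset.
        while pos < len(customer_seq) and not set(itemset) <= set(customer_seq[pos]):
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True


def support_count(pattern, sequences):
    """Number of customers whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in sequences.values())


min_count = 2  # 25% of 5 customers is 1.25, so at least 2 customers
for pattern in ([(3,), (9,)], [(3,), (4, 7)], [(1, 2), (3,)]):
    n = support_count(pattern, sequences)
    print(pattern, n, "large" if n >= min_count else "not large")
```

Each customer is counted at most once per pattern, however many times the pattern repeats in that customer's history.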

Sequential Pattern Mining (cont)
Step 2: Litemset phase
  Large Itemset  Mapped To
  (30)           1
  (40)           2
  (70)           3
  (40 70)        4
  (90)           5
  (Items 30, 40, 70 and 90 in this table are the items written 3, 4, 7 and 9 in the sequences.)
Data sequence of each customer:
  Cust ID  Original Sequence     Transformed Cust. Sequence       After mapping
  1        {(3) (9)}             {(3)} {(9)}                      ({1} {5})
  2        {(1,2) (3) (4,6,7)}   {(3)} {(4) (7) (4 7)}            ({1} {2 3 4})
  3        {(3,5,7)}             {(3) (7)}                        ({1,3})
  4        {(3) (4,7) (9)}       {(3)} {(4) (7) (4 7)} {(9)}      ({1} {2 3 4} {5})
  5        {(9)}                 {(9)}                            ({5})
The rightmost column shows each customer's buying pattern in terms of the mapped large itemsets.
Sequences excluded from the answer (below min_support or not maximal):
  {(1,2) (3)}, {(3)}, {(4)}, {(7)}, {(9)}, {(3) (4)}, {(3) (7)}, {(4) (7)}
Support > 25%:
  {(3) (9)}
  {(3) (4 7)}
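The transformation and mapping shown in the table can be sketched as follows. Two assumptions are made: the large itemsets are written with the single-digit items used in the sequences (3, 4, 7, 9), and a transaction that contains no large itemset is dropped from the transformed sequence. The names `litemset_ids` and `transform` are illustrative.

```python
# Large itemsets and the integers they are mapped to.
litemset_ids = {
    frozenset({3}): 1,
    frozenset({4}): 2,
    frozenset({7}): 3,
    frozenset({4, 7}): 4,
    frozenset({9}): 5,
}

# Customer sequences from the organized table.
sequences = {
    1: [(3,), (9,)],
    2: [(1, 2), (3,), (4, 6, 7)],
    3: [(3, 5, 7)],
    4: [(3,), (4, 7), (9,)],
    5: [(9,)],
}


def transform(customer_seq, litemset_ids):
    """Replace each transaction by the ids of the large itemsets it contains."""
    transformed = []
    for transaction in customer_seq:
        items = set(transaction)
        ids = {i for litemset, i in litemset_ids.items() if litemset <= items}
        if ids:  # transactions with no large itemset disappear
            transformed.append(ids)
    return transformed


transformed = {cust: transform(seq, litemset_ids) for cust, seq in sequences.items()}
for cust, seq in transformed.items():
    print(cust, seq)
```

Customer 2's first transaction (1,2) contains no large itemset, which is why the transformed sequence for that customer has only two elements.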

Sequential Pattern Mining Algorithm
AprioriAll
  Counts all large sequences, including those that are not maximal
  Pseudo-code:
    Ck: candidate sequences of size k
    Lk: frequent (large) sequences of size k

    L1 = {large 1-sequences}; // result of litemset phase
    for (k = 2; Lk-1 != ∅; k++) do begin
        Ck = candidates generated from Lk-1;
        for each customer sequence c in database do
            increment the count of all candidates in Ck that are contained in c
        Lk = candidates in Ck with min_support
    end
    Answer = maximal sequences in ∪k Lk;
AprioriSome
  Generates every candidate sequence, but skips counting some large sequences (Forward Phase).
  Then discards candidates that are not maximal and counts the remaining large sequences (Backward Phase).
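A compact sketch of the AprioriAll counting loop is given below. It works on the mapped customer sequences from the earlier example (each transaction is a set of large-itemset ids) and stops before the maximal phase, which is left out to keep the example short. The function names and the absolute `min_count` threshold are assumptions made for illustration.

```python
def contains(customer_seq, candidate):
    """True if the ids in candidate occur, in order, in distinct transactions."""
    pos = 0
    for litemset_id in candidate:
        while pos < len(customer_seq) and litemset_id not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def apriori_all(customer_seqs, min_count):
    """Return [L1, L2, ...], each a set of large sequences given as tuples of ids."""

    def large(candidates):
        return {
            c for c in candidates
            if sum(contains(s, c) for s in customer_seqs) >= min_count
        }

    ids = {i for seq in customer_seqs for transaction in seq for i in transaction}
    current = large({(i,) for i in ids})
    levels = []
    while current:
        levels.append(current)
        # Join: p and q agree on everything but the last id; extend p with q's last id.
        candidates = {p + (q[-1],) for p in current for q in current if p[:-1] == q[:-1]}
        # Prune: dropping any single id must leave a large sequence.
        candidates = {
            c for c in candidates
            if all(c[:i] + c[i + 1:] in current for i in range(len(c)))
        }
        current = large(candidates)
    return levels


# Mapped customer sequences (rightmost column of the transformation table).
customer_seqs = [
    [{1}, {5}],
    [{1}, {2, 3, 4}],
    [{1, 3}],
    [{1}, {2, 3, 4}, {5}],
    [{5}],
]
levels = apriori_all(customer_seqs, min_count=2)
for k, level in enumerate(levels, start=1):
    print(k, sorted(level))
```

Mapped back to itemsets, the 2-sequences (1, 5) and (1, 4) are {(3) (9)} and {(3) (4 7)}, the two sequences the slides report as the answer; the other large sequences would be removed by the maximal phase.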

Episode Extraction
  - Episode: a partially ordered collection of events occurring together
  - Goal: to analyze a sequence of events and to discover recurrent episodes
  - Approach: first find small frequent episodes, then progressively look for larger episodes
  - Types of episodes
      Serial: E occurs before F
      Parallel: no constraints on the relative order of A & B
      Non-Serial/Non-Parallel: occurrence of A & B precedes C

Episode Extraction (cont)
A sequence of events:
  E D F A B C E F C D B A D C E F C B E A E C F A
  S = {(A1,t1), (A2,t2), …, (An,tn)}
  s = {(E,31), (D,32), (F,33), …, (A,65)}
  - A time window is set to bound the interestingness
  - W(s,5) slides over and snapshots the whole sequence
  - eg. the window (w,35,40) contains events A, B, C, E, so the parallel episode (A and B) occurs in it but the serial episode (E before F) does not
  - User specifies in how many windows an episode has to occur to be frequent
  - Formula: frequency of an episode = (number of windows in which the episode occurs) / (total number of windows)
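The sliding-window frequency can be sketched in a few lines. Several assumptions are made here: timestamps are integers, a window of width `width` starting at `start` covers times `start <= t < start + width`, and windows are slid so that every event falls into exactly `width` of them. The three-event sequence is hypothetical toy data, not the sequence from the slide, and all function names are illustrative.

```python
def windows(events, width):
    """Yield the events inside each window of the given width."""
    times = [t for _, t in events]
    first, last = min(times), max(times)
    for start in range(first - width + 1, last + 1):
        yield [(typ, t) for typ, t in events if start <= t < start + width]


def occurs_parallel(window, types):
    """Parallel episode: all event types are present, in any order."""
    return set(types) <= {typ for typ, _ in window}


def occurs_serial(window, types):
    """Serial episode: the event types occur in the given order at increasing times."""
    idx, last = 0, float("-inf")
    for typ, t in sorted(window, key=lambda event: event[1]):
        if idx < len(types) and typ == types[idx] and t > last:
            idx, last = idx + 1, t
    return idx == len(types)


def frequency(events, width, occurs, types):
    """Fraction of windows in which the episode occurs."""
    all_windows = list(windows(events, width))
    return sum(occurs(w, types) for w in all_windows) / len(all_windows)


# Hypothetical toy sequence of (event type, time) pairs.
events = [("A", 1), ("B", 2), ("C", 4)]
parallel_ab = frequency(events, 3, occurs_parallel, ("A", "B"))
serial_ab = frequency(events, 3, occurs_serial, ("A", "B"))
serial_ba = frequency(events, 3, occurs_serial, ("B", "A"))
print(parallel_ab, serial_ab, serial_ba)
```

An episode would then be called frequent when this fraction reaches the user-specified threshold.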

Episode Extraction
Minimal occurrences
  - Look at exact occurrences of episodes & relationships between occurrences
  - Can modify width of window
  - Eliminates unnecessary repetition of the recognition effort
  - Example: mo( ) = {[35,38), [46,48), [57,60)}
  - When an episode is a subepisode of another, this relation is used for discovering all frequent episodes
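For a parallel episode, minimal occurrences can be found by brute force on a short sequence: list every interval that contains all the required event types, then keep the intervals that have no smaller such interval inside them. The sketch below assumes integer timestamps and half-open intervals [start, end), and uses hypothetical toy data rather than the slide's sequence; `minimal_occurrences` is an illustrative name.

```python
def minimal_occurrences(events, types):
    """Intervals [start, end) containing every type in types, with no proper
    sub-interval that also contains them. Brute force, for short sequences only."""
    types = set(types)
    times = sorted({t for _, t in events})
    # Every interval bounded by event times in which the episode occurs.
    occurrences = [
        (s, e + 1)
        for s in times
        for e in times
        if s <= e and types <= {typ for typ, t in events if s <= t <= e}
    ]
    # Keep an interval only if no other occurrence lies inside it.
    return [
        (s, e)
        for s, e in occurrences
        if not any(
            (s2, e2) != (s, e) and s <= s2 and e2 <= e for s2, e2 in occurrences
        )
    ]


# Hypothetical toy sequence of (event type, time) pairs.
events = [("A", 1), ("B", 3), ("A", 7), ("B", 8)]
mo = minimal_occurrences(events, {"A", "B"})
print(mo)
```

Because each minimal occurrence records its own start and end, the window width can be changed afterwards without rescanning the sequence, which is the point made on the slide.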

Applications of Episodes Extraction
  - Computer Security
  - Bioinformatics
  - Finance
  - Market Analysis
  - And more…

References
  - Discovery of Frequent Episodes in Event Sequences (Mannila, Toivonen, Verkamo)
  - Mining Sequential Patterns (Agrawal, Srikant)
  - Principles of Data Mining (Hand, Mannila, Smyth), 2001
  - Data Mining: Concepts and Techniques (Han, Kamber), 2001

END