Module 8: Sequential Pattern Mining


Terminology
- Item
- Itemset
- Sequence (customer-sequence)
- Subsequence
- Support for a sequence
- Large/frequent sequence

Example
Q: How do we find the sequential patterns?

Example: item, itemset, and transaction (figure not preserved)

Example (cont.): sequence and 3-sequence (figure not preserved)

Subsequence

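Example (cont.): two of the sequences shown on the slide illustrate support. The first is supported by customers 1 and 4, that is, the customer-sequences of customers 1 and 4 contain it; the second is supported by customers 2 and 4. (The sequences themselves are not preserved in this transcript.)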

Example (cont.)
Q: Find the large/frequent sequences with minimum support set to 25%.
- Frequent sequence = a sequence whose support is at least the minimum support.
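
The definitions of subsequence and support above can be sketched in Python. This is a minimal illustration, not the slides' own code: the helper names (`is_subsequence`, `supporters`, `support`, `make_seq`) and the five-customer toy database are assumptions, since the slide's figures are not preserved.

```python
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[int]

def is_subsequence(sub: Sequence[Itemset], seq: Sequence[Itemset]) -> bool:
    """True if every itemset of `sub` is a subset of a distinct itemset of `seq`, in order."""
    pos = 0
    for wanted in sub:
        # Greedily move to the earliest remaining element of seq that contains `wanted`.
        while pos < len(seq) and not wanted <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True

def supporters(pattern: Sequence[Itemset], database: Dict[int, List[Itemset]]) -> List[int]:
    """Customer ids whose customer-sequence contains `pattern`."""
    return sorted(c for c, customer_seq in database.items()
                  if is_subsequence(pattern, customer_seq))

def support(pattern: Sequence[Itemset], database: Dict[int, List[Itemset]]) -> float:
    """Fraction of customers supporting `pattern` (each customer counts once)."""
    return len(supporters(pattern, database)) / len(database)

def make_seq(*itemsets) -> List[Itemset]:
    return [frozenset(s) for s in itemsets]

# Hypothetical toy database: customer id -> time-ordered list of transactions.
database = {
    1: make_seq({30}, {90}),
    2: make_seq({10, 20}, {30}, {40, 60, 70}),
    3: make_seq({30, 50, 70}),
    4: make_seq({30}, {40, 70}, {90}),
    5: make_seq({90}),
}

print(supporters(make_seq({30}, {90}), database))
print(support(make_seq({30}, {40, 70}), database))
```

With a 25% minimum support and five customers, a sequence is frequent in this sketch once at least two customers support it.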

The Algorithm: Apriori
Five phases:
- Sort phase
- Large itemset (litemset) phase
- Transformation phase
- Sequence phase
- Maximal phase

Sort phase
- Sort the database with customer-id as the major key and transaction-time as the minor key.
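
A short sketch of the sort phase, assuming transactions arrive as `(customer_id, transaction_time, items)` tuples. The record layout, the sample rows, and the function name `to_customer_sequences` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, transaction_time, items).
transactions = [
    (2, 3, {40, 60, 70}),
    (1, 1, {30}),
    (2, 1, {10, 20}),
    (1, 2, {90}),
    (2, 2, {30}),
]

def to_customer_sequences(records):
    """Sort by (customer_id, transaction_time) and group into customer-sequences."""
    # The tuple key makes customer_id the major key and transaction_time the minor key.
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    sequences = defaultdict(list)
    for customer_id, _time, items in ordered:
        sequences[customer_id].append(frozenset(items))
    return dict(sequences)

customer_sequences = to_customer_sequences(transactions)
for customer_id, sequence in customer_sequences.items():
    print(customer_id, [sorted(t) for t in sequence])
```

After this step the database is implicitly a set of customer-sequences, one per customer, which is the form the later phases work on.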

Litemset phase
- Find the large itemsets (litemsets).
- Itemset mapping: assign each large itemset an identifier.
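
The litemset phase can be sketched as follows. This is a brute-force illustration that enumerates every sub-itemset of every transaction rather than a level-wise Apriori search; the function name `large_itemsets`, the toy database, and the id ordering are assumptions. The one point that matters is that support is counted per customer, not per transaction.

```python
from itertools import combinations
from math import ceil

# Hypothetical toy database: customer id -> time-ordered list of transactions.
database = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def large_itemsets(database, min_support):
    """Return {itemset: integer id} for itemsets supported by enough customers."""
    min_count = ceil(min_support * len(database))
    counts = {}
    for customer_seq in database.values():
        seen = set()  # each customer contributes at most 1 to an itemset's count
        for transaction in customer_seq:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                for combo in combinations(items, size):
                    seen.add(frozenset(combo))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    large = [s for s, n in counts.items() if n >= min_count]
    large.sort(key=lambda s: (len(s), sorted(s)))  # deterministic id assignment
    return {itemset: idx for idx, itemset in enumerate(large, start=1)}

litemset_ids = large_itemsets(database, min_support=0.25)
for itemset, idx in litemset_ids.items():
    print(idx, sorted(itemset))
```

Enumerating all sub-itemsets is exponential in transaction size, so this only suits small examples.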

Transformation phase
- Delete non-large itemsets.
- Map large itemsets to integers.
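
A possible sketch of the transformation phase, assuming the litemset-to-integer mapping is already available. The mapping, the toy database, and the name `transform` are illustrative assumptions. Each transaction is replaced by the set of ids of the litemsets it contains, and transactions containing no litemset are dropped.

```python
# Hypothetical litemset -> integer mapping, as produced by a litemset phase.
litemset_ids = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({90}): 4,
    frozenset({40, 70}): 5,
}

# Hypothetical toy database: customer id -> time-ordered list of transactions.
database = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def transform(database, litemset_ids):
    """Rewrite every transaction as the set of litemset ids it contains."""
    transformed = {}
    for customer, customer_seq in database.items():
        new_seq = []
        for transaction in customer_seq:
            ids = frozenset(i for itemset, i in litemset_ids.items()
                            if itemset <= transaction)
            if ids:  # a transaction with no litemset is deleted
                new_seq.append(ids)
        # A customer may end up with an empty sequence but is kept,
        # so it still counts toward the total number of customers.
        transformed[customer] = new_seq
    return transformed

transformed = transform(database, litemset_ids)
for customer, seq in transformed.items():
    print(customer, [sorted(t) for t in seq])
```

After the transformation, testing whether a customer supports a sequence of litemsets reduces to membership tests on integers.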

Sequence phase
- Use the set of litemsets to find the desired sequences.
- Two families of algorithms:
  - Count-all: AprioriAll
  - Count-some: AprioriSome, DynamicSome

AprioriAll
- The basic method for mining sequential patterns.
- Based on the Apriori algorithm.
- Counts all the large sequences, including non-maximal sequences.
- Uses the Apriori-generate function to generate candidate sequences.

AprioriAll (cont.)
  L_1 = {large 1-sequences};  // result of the litemset phase
  for (k = 2; L_{k-1} ≠ ∅; k++) do begin
    C_k = new candidates generated from L_{k-1};
    foreach customer-sequence c in the database do
      increment the count of all candidates in C_k that are contained in c;
    L_k = candidates in C_k with minimum support;
  end
  Answer = maximal sequences in ∪_k L_k;
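
The level-wise loop can be sketched in Python on a transformed database. To keep the example short, a simplified candidate generator (`extend_by_one`, which appends every large litemset id to every large (k-1)-sequence) is swapped in for the Apriori-generate function; it yields a superset of the Apriori candidates, so the large sequences found are the same. All names and the toy data are assumptions.

```python
from math import ceil

def contains(customer_seq, candidate):
    """True if the litemset ids in `candidate` occur in order in distinct transactions."""
    pos = 0
    for litemset_id in candidate:
        while pos < len(customer_seq) and litemset_id not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True

def extend_by_one(large_prev, large_ids):
    """Simplified stand-in for Apriori-generate: no join on prefixes, no pruning."""
    return {p + (i,) for p in large_prev for i in large_ids}

def apriori_all(database, min_support, generate=extend_by_one):
    """Return {k: {k-sequence: supporting-customer count}} for all large sequences."""
    min_count = ceil(min_support * len(database))

    def count(candidates):
        counts = {c: sum(contains(s, c) for s in database.values()) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    all_ids = {i for s in database.values() for t in s for i in t}
    large = {1: count({(i,) for i in all_ids})}
    large_ids = [c[0] for c in large[1]]
    k = 2
    while large[k - 1]:
        large[k] = count(generate(large[k - 1], large_ids))
        k += 1
    del large[k - 1]  # the last level computed is empty
    return large

# Hypothetical transformed database: each transaction is a set of litemset ids.
transformed = {
    1: [{1}, {4}],
    2: [{1}, {2, 3, 5}],
    3: [{1, 3}],
    4: [{1}, {2, 3, 5}, {4}],
    5: [{4}],
}

result = apriori_all(transformed, min_support=0.25)
for k, sequences in result.items():
    print(k, sorted(sequences.items()))
```

The maximal phase would still have to be applied to the union of the returned levels to obtain the final answer.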

Apriori Candidate Generation
- Generate the candidates for a pass using only the large sequences found in the previous pass, then make a pass over the data to find their support.

Apriori Candidate Generation: algorithm
  L_k: the set of all large k-sequences
  C_k: the set of candidate k-sequences

  // Join step
  insert into C_k
  select p.litemset_1, p.litemset_2, ..., p.litemset_{k-1}, q.litemset_{k-1}
  from L_{k-1} p, L_{k-1} q
  where p.litemset_1 = q.litemset_1, ..., p.litemset_{k-2} = q.litemset_{k-2};

  // Prune step
  forall sequences c ∈ C_k do
    forall (k-1)-subsequences s of c do
      if (s ∉ L_{k-1}) then
        delete c from C_k;
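
The join and prune steps above can be sketched as follows, with each sequence represented as a tuple of litemset ids. The function name `apriori_generate` and the sample set of large 3-sequences are illustrative assumptions.

```python
def apriori_generate(large_prev):
    """Build candidate k-sequences from the set of large (k-1)-sequences."""
    prev = set(large_prev)

    # Join step: pair p and q that agree on their first k-2 litemsets and
    # append q's last litemset to p. As in the join condition above, p and q
    # may be the same sequence, so a repeated litemset can appear.
    joined = {p + (q[-1],) for p in prev for q in prev if p[:-1] == q[:-1]}

    # Prune step: drop a candidate if any (k-1)-subsequence, obtained by
    # deleting one position, is not a large (k-1)-sequence.
    return {
        c for c in joined
        if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))
    }

# Hypothetical set of large 3-sequences over litemset ids.
large_3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
candidates_4 = apriori_generate(large_3)
print(candidates_4)
```

For this input the join produces several 4-sequences, such as (1, 2, 4, 3) and (1, 3, 4, 5), but only (1, 2, 3, 4) has all of its 3-subsequences in the large set, so it is the only candidate whose support needs to be counted.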

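Apriori Candidate Generation, example: Transformed Customer Sequences (table not preserved). Next step: find the large 1-sequences, with minimum support set to 25%.

Example: Large 1-Sequences (table with columns Sequence and Support; entries not preserved). Next step: find the large 2-sequences.

Example: Large 2-Sequences (table with columns Sequence and Support; entries not preserved). Next step: find the large 3-sequences.

Example: Large 3-Sequences (table with columns Sequence and Support; entries not preserved). Next step: find the large 4-sequences.

Example: Large 4-Sequences (table with columns Sequence and Support; the only surviving entry is a support of 2). Next step: find the maximal sequential patterns.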

Maximal phase
- Find the maximal sequences among the set of large sequences.
- In some algorithms, this phase is combined with the sequence phase.

Maximal phase: algorithm
  S: the set of all large sequences
  n: the length of the longest sequence

  for (k = n; k > 1; k--) do
    foreach k-sequence s_k do
      delete from S all subsequences of s_k
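
A minimal sketch of the maximal phase, assuming the large sequences are given as tuples of frozensets. The names `is_subsequence` and `maximal_sequences` and the sample set of large sequences are assumptions. The sketch deletes only proper subsequences, so each s_k itself survives.

```python
def is_subsequence(sub, seq):
    """True if every itemset of `sub` is a subset of a distinct itemset of `seq`, in order."""
    pos = 0
    for wanted in sub:
        while pos < len(seq) and not wanted <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True

def maximal_sequences(large_sequences):
    """Remove every large sequence that is contained in a longer or equally long one."""
    remaining = set(large_sequences)
    longest = max((len(s) for s in remaining), default=0)
    for k in range(longest, 1, -1):  # k = n down to 2, as in the loop above
        for s_k in [s for s in remaining if len(s) == k]:
            if s_k not in remaining:
                continue  # already deleted as a subsequence of another sequence
            remaining = {s for s in remaining
                         if s == s_k or not is_subsequence(s, s_k)}
    return remaining

def seq(*itemsets):
    return tuple(frozenset(s) for s in itemsets)

# Hypothetical set of large sequences, including non-maximal ones.
large = {
    seq({30}), seq({40}), seq({70}), seq({40, 70}), seq({90}),
    seq({30}, {40}), seq({30}, {70}), seq({30}, {90}), seq({30}, {40, 70}),
}

for s in maximal_sequences(large):
    print([sorted(itemset) for itemset in s])
```

Note that two sequences of the same length can still be in a subsequence relation, for example <(30)(40)> and <(30)(40 70)>, which is why the containment test uses itemset inclusion rather than equality.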

Example: find the maximal large sequences (four tables with columns Sequence and Support; the only surviving entry is a support of 2).

Examples of Sequence Data

Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Purchase history of a given customer | A set of items bought by a customer at time t | Books, dairy products, CDs, etc.
Web data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | History of events generated by a given sensor | Events triggered by a sensor at time t | Types of alarms generated by sensors
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C

Examples of Sequence (the example sequences themselves are not preserved in this transcript)
- Web sequence
- Sequence of initiating events causing the nuclear accident at Three Mile Island
- Sequence of books checked out at a library

GSP algorithm

Candidate generation
- Consists of two phases: the join phase and the prune phase.
- Join phase: C_k = F_{k-1} x F_{k-1}. A sequence s1 in F_{k-1} can be joined with a sequence s2 in F_{k-1} if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. The resulting candidate is s1 extended with the last item of s2.
- The added item becomes a separate element if it was a separate element in s2, and part of the last element of s1 otherwise.

Candidate Generation Examples (the concrete sequences w1 and w2 shown on the slide are not preserved in this transcript)
- In the first example, merging w1 and w2 adds event 5 to the last element of w1, because the last two events in w2 (4 and 5) belong to the same element.
- In the second example, merging w1 and w2 adds event 5 as a new element after w1, because the last two events in w2 (4 and 5) do not belong to the same element.
- In the third example, w1 and w2 do not have to be merged to produce the candidate: if that candidate is viable, it can be obtained by merging w1 with a different sequence.
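
The GSP join rule can be sketched as follows, with a sequence written as a tuple of elements and each element as a sorted tuple of items. The function names and the three input pairs are illustrative assumptions, chosen to match the three cases described above. The special case of joining two 1-sequences, where the added item must be tried both as a separate element and as part of the same element, is left out.

```python
def drop_first(seq):
    """Remove the first item of the first element (dropping the element if it empties)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]

def drop_last(seq):
    """Remove the last item of the last element (dropping the element if it empties)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())

def gsp_join(s1, s2):
    """Return the candidate obtained by joining s1 with s2, or None if they do not join."""
    if drop_first(s1) != drop_last(s2):
        return None
    last_item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # The item was a separate element in s2: append it as a new element.
        return s1 + ((last_item,),)
    # Otherwise the item joins the last element of s1.
    return s1[:-1] + (s1[-1] + (last_item,),)

# Hypothetical inputs covering the three cases.
w1 = ((1,), (2, 3), (4,))
same_element = gsp_join(w1, ((2, 3), (4, 5)))          # 4 and 5 share an element in w2
separate_element = gsp_join(w1, ((2, 3), (4,), (5,)))  # 4 and 5 are in different elements
no_join = gsp_join(((1,), (2, 6), (4,)), ((1,), (2,), (4, 5)))

print(same_element)
print(separate_element)
print(no_join)
```

The third pair does not satisfy the join condition, which mirrors the point that such a candidate, if viable, is reached through a different pair of frequent sequences.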

- Pruning phase: delete candidate sequences that have an infrequent (k-1)-subsequence.
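
A sketch of the pruning phase under the same representation (a sequence is a tuple of elements, each a sorted tuple of items). A (k-1)-subsequence is obtained by deleting exactly one item. The function names and the sample sets are illustrative assumptions, and no time constraints such as a maximum gap are considered.

```python
def drop_item(seq, element_index, item_index):
    """Delete one item from one element, removing the element if it becomes empty."""
    element = seq[element_index]
    shrunk = element[:item_index] + element[item_index + 1:]
    return seq[:element_index] + ((shrunk,) if shrunk else ()) + seq[element_index + 1:]

def gsp_prune(candidates, frequent_prev):
    """Keep only candidates whose every (k-1)-subsequence is frequent."""
    kept = set()
    for c in candidates:
        subsequences = (
            drop_item(c, e, i)
            for e in range(len(c))
            for i in range(len(c[e]))
        )
        if all(s in frequent_prev for s in subsequences):
            kept.add(c)
    return kept

# Hypothetical frequent 3-sequences and joined 4-sequence candidates.
frequent_3 = {
    ((1,), (2,), (3,)),
    ((1,), (2, 5)),
    ((1,), (5,), (3,)),
    ((2,), (3,), (4,)),
    ((2, 5), (3,)),
    ((3,), (4,), (5,)),
    ((5,), (3, 4)),
}
candidates_4 = {
    ((1,), (2,), (3,), (4,)),
    ((1,), (2, 5), (3,)),
    ((1,), (5,), (3, 4)),
    ((2,), (3,), (4,), (5,)),
    ((2, 5), (3, 4)),
}

print(gsp_prune(candidates_4, frequent_3))
```

Only the candidate with elements (1), (2 5), (3) survives here, because each of the other candidates has at least one 3-subsequence that is missing from the frequent set.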

GSP Example

Database Example

The mining result