Modul 8: Sequential Pattern Mining

Slides:



Advertisements
Similar presentations
Mining Sequence Data. © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequence Data ObjectTimestampEvents A102, 3, 5 A206, 1 A231 B114,
Advertisements

Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Sequential PAttern Mining using A Bitmap Representation
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
Mining Frequent Patterns II: Mining Sequential & Navigational Patterns Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Data e Web Mining Paolo Gobbo
LOGO Association Rule Lecturer: Dr. Bo Yuan
Mining Sequential Patterns Authors: Rakesh Agrawal and Ramakrishnan Srikant. Presenter: Jeremy Dalmer.
Data Mining Techniques Cluster Analysis Induction Neural Networks OLAP Data Visualization.
Rakesh Agrawal Ramakrishnan Srikant
Efficiently Mining Long Patterns from Databases Roberto J. Bayardo Jr. IBM Almaden Research Center.
Chapter 5: Mining Frequent Patterns, Association and Correlations
Generalized Sequential Pattern (GSP) Step 1: – Make the first pass over the sequence database D to yield all the 1-element frequent sequences Step 2: Repeat.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
4/3/01CS632 - Data Mining1 Data Mining Presented By: Kevin Seng.
Mining Sequential Patterns Rakesh Agrawal Ramakrishnan Srikant Proc. of the Int’l Conference on Data Engineering (ICDE) March 1995 Presenter: Phil Schlosser.
Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University Ch 2 Discovering Association Rules COMP 578 Data Warehousing & Data Mining.
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Association Rule Mining (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)‏
2/8/00CSE 711 data mining: Apriori Algorithm by S. Cha 1 CSE 711 Seminar on Data Mining: Apriori Algorithm By Sung-Hyuk Cha.
Data Mining Association Rules: Advanced Concepts and Algorithms
Mining Association Rules
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
Mining Association Rules
Mining Sequential Patterns: Generalizations and Performance Improvements R. Srikant R. Agrawal IBM Almaden Research Center Advisor: Dr. Hsu Presented by:
Data Mining Association Rules: Advanced Concepts and Algorithms
Data Mining Techniques Sequential Patterns. Sequential Pattern Mining Progress in bar-code technology has made it possible for retail organizations to.
Advanced Association Rule Mining and Beyond. Continuous and Categorical Attributes Example of Association Rule: {Number of Pages  [5,10)  (Browser=Mozilla)}
Chapter 2: Association Rules & Sequential Patterns.
Mining Sequential Patterns Rakesh Agrawal Ramakrishnan Srikant Proc. of the Int ’ l Conference on Data Engineering (ICDE) March 1995 Presenter: Sam Brown.
Modul 8: Sequential Pattern Mining. Terminology  Item  Itemset  Sequence (Customer-sequence)  Subsequence  Support for a sequence  Large/frequent.
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Rules: Advanced Concepts and Algorithms
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 By Gun Ho Lee Intelligent Information Systems Lab.
Data & Text Mining1 Introduction to Association Analysis Zhangxi Lin ISQS 3358 Texas Tech University.
Sequential Pattern Mining
Fast Algorithms For Mining Association Rules By Rakesh Agrawal and R. Srikant Presented By: Chirayu Modi.
Christoph F. Eick Questions and Topics Review Dec. 6, Compare AGNES /Hierarchical clustering with K-means; what are the main differences? 2 Compute.
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar.
Data Mining Association Rules: Advanced Concepts and Algorithms
1 Efficient Algorithms for Incremental Update of Frequent Sequences Minghua ZHANG Dec. 7, 2001.
1 1 MSCIT 5210: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Tan, Steinbach, Kumar.
Data Mining Find information from data data ? information.
S EQUENTIAL P ATTERNS & THE GSP A LGORITHM BY : J OE C ASABONA.
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Course on Data Mining: Seminar Meetings Page 1/30 Course on Data Mining ( ): Seminar Meetings Ass. Rules EpisodesEpisodes Text Mining
Mining Sequential Patterns © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Slides are adapted from Introduction to Data Mining by Tan, Steinbach,
COMP53311 Association Rule Mining Prepared by Raymond Wong Presented by Raymond Wong
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
Rakesh Agrawal Ramakrishnan Srikant Proc. of the Int ’ l Conference on Data Engineering (ICDE) March 1995 Presenter: İlkcan Keleş.
Business Intelligence Technologies – Data Mining ` Lecture 2 Market Basket Analysis, Association Rules 1.
Sequential Pattern Mining
Spring 2016 Presentation by: Julianne Daly
Mining Association Rules: Advanced Concepts and Algorithms
Data Mining Association Rules: Advanced Concepts and Algorithms
Mining Sequential Patterns
Mining Sequential Patterns
Data Mining Association Rules: Advanced Concepts and Algorithms
Association Rule Mining
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
Mining Sequential Patterns
Presentation transcript:

Modul 8: Sequential Pattern Mining

TUGAS 2 ewinarko.staff.ugm.ac.id/datamining/tugas2

Terminology Item Itemset Sequence (Customer-sequence) Subsequence Support for a sequence Large/frequent sequence

Example Q. How to find the sequential patterns?

Example Transaction Item Itemset

Example (cont.) Sequence 3-Sequence

Subsequence

Example (cont.) customer 1 and 4 contain <(30) (90)> <(30) (90)> is supported by customer 1 and 4 <30 (40 70)> is supported by customer 2 and 4

Example (cont.) Q. Find the large/frequent sequences with minimum support set to 25%: Frequent sequence = The sequence with minimum support <(30)>, <(40)>, <(70)>, <(90)> <(30) (40)>, <(30) (70)>, <(40 70)>

Examples of Sequence Data Sequence Database Sequence Element (Transaction) Event (Item) Customer Purchase history of a given customer A set of items bought by a customer at time t Books, diary products, CDs, etc Web Data Browsing activity of a particular Web visitor A collection of files viewed by a Web visitor after a single mouse click Home page, index page, contact info, etc Event data History of events generated by a given sensor Events triggered by a sensor at time t Types of alarms generated by sensors Genome sequences DNA sequence of a particular species An element of the DNA sequence Bases A,T,G,C

Examples of Sequence Web sequence: < {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} > Sequence of initiating events causing the nuclear accident at 3-mile Island: (http://stellar-one.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm) < {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases}> Sequence of books checked out at a library: <{Fellowship of the Ring} {The Two Towers} {Return of the King}>

GSP algorithm

Candidate generation Contains 2 phase: Join phase and Prune phase Ck = Fk-1 x Fk-1 A sequence s1 and s2 in Fk-1 can be joined if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. The resulting sequence is the sequence s1 extended by the last item in s2. The added item becomes a separate element if it was a separate element in s2, and part of element s1 otherwise

Candidate Generation Examples Merging the sequences w1=<{1} {2 3} {4}> and w2 =<{2 3} {4 5}> will produce the candidate sequence < {1} {2 3} {4 5}> because the last two events in w2 (4 and 5) belong to the same element Merging the sequences w1=<{1} {2 3} {4}> and w2 =<{2 3} {4} {5}> will produce the candidate sequence < {1} {2 3} {4} {5}> because the last two events in w2 (4 and 5) do not belong to the same element We do not have to merge the sequences w1 =<{1} {2 6} {4}> and w2 =<{1} {2} {4 5}> to produce the candidate < {1} {2 6} {4 5}> because if the latter is a viable candidate, then it can be obtained by merging w1 with < {1} {2 6} {5}>

Pruning phase: Delete candidate sequences that have an infrequent (k-1)-subsequence.

GSP Example

Database Example

The mining result

The Algorithm Apriori Five phases Sort phase Large itemset phase Transformation phase Sequence phase Maximal phase

Sort phase Sort the database with customer-id as the major key and transaction-time as the minor key

Litemset phase Find the large itemset. Itemsets mapping

Transformation phase Deleting non-large itemsets Mapping large itemsets to integers

Sequence phase Algorithm AprioriAll Algorithm AprioriSome, Use the set of litemsets to find the desired sequence. Two families of algorithms: Count-all: Algorithm AprioriAll Count-some: Algorithm AprioriSome, Algorithm DynamicSome

AprioriAll The basic method to mine sequential patterns Based on the Apriori algorithm. Count all the large sequences, including non-maximal sequences. Use Apriori-generate function to generate candidate sequence.

AprioriAll (cont.) L1 = {large 1-sequences}; // Result of the phase for ( k=2; Lk-1≠Φ; k++) do begin Ck = New candidate generate from Lk-1 foreach customer-sequence c in the database do Increment the count of all candidates in Ck that are contained in c Lk = Candidates in Ck with minimum support. End Answer=Maximal Sequences in UkLk;

Apriori Candidate Generation generate candidates for pass using only the large sequences found in the previous pass and then makes a pass over the data to find their support.

Apriori Candidate Generation Algorithm: Lk the set of all large k-sequences Ck the set of candidate k-sequences insert into Ck select p.litemset1, p.litemset2,…, p.litemsetk-1, q.litemsetk-1 from Lk-1 p, Lk-1 q where p.litemset1=q.litemset1,…, p.litemsetk-2=q.litemsetk-2; forall sequences cCk do forall (k-1)-subsequences s of c do if (sLk-1) then delete c from Ck;

Apriori Candidate Generation Example: Transformed Customer Sequences <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> <{1}{2}{3}{4}> <{1}{3}{5}> <{4}{5}> With minimum set to 25% next step: find the large 1-sequences

Example <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> Large 1-Sequence <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> <{1}{2}{3}{4}> <{1}{3}{5}> <{4}{5}> Sequence Support <1> <2> <3> <4> <5> 4 2 4 4 2 next step: find the large 2-sequences

Example <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> Large 2-Sequence Sequence Support <1 2> 2 <1 3> 4 <1 4> 3 <1 5> <2 3> <2 4> <3 4> <3 5> <4 5> <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> <{1}{2}{3}{4}> <{1}{3}{5}> <{4}{5}> next step: find the large 3-sequences

Example <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> Large 3-Sequence Sequence Support <1 2 3> 2 <1 2 4> <1 3 4> 3 <1 3 5> <2 3 4> <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> <{1}{2}{3}{4}> <{1}{3}{5}> <{4}{5}> next step: find the large 4-sequences

Example <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> Large 4-Sequence Sequence Support <1 2 3 4> 2 <{1 5}{2}{3}{4}> <{1}{3}{4}{3 5}> <{1}{2}{3}{4}> <{1}{3}{5}> <{4}{5}> next step: find the maximal sequential pattern

Maximal phase Find the maximum sequences among the set of large sequences. In some algorithms, this phase is combined with the sequence phase.

Maximal phase for (k = n; k > 1; k--) do foreach k-sequence sk do Algorithm: S the set of all litemsets n the length of the longest sequence for (k = n; k > 1; k--) do foreach k-sequence sk do Delete from S all subsequences of sk

Example Find the maximal large sequences Sequence Support <1> 4 <2> 2 <3> <4> <5> Sequence Support <1 2> 2 <1 3> 4 <1 4> 3 <1 5> <2 3> <2 4> <3 4> <3 5> <4 5> Sequence Support <1 2 3> 2 <1 2 4> <1 3 4> 3 <1 3 5> <2 3 4> Sequence Support <1 2 3 4> 2