Module 8: Sequential Pattern Mining

1 Module 8: Sequential Pattern Mining

2 Terminology
- Item
- Itemset
- Sequence (customer-sequence)
- Subsequence
- Support for a sequence
- Large/frequent sequence

3 Example Q. How to find the sequential patterns?

4 Example Item Itemset Transaction

5 Example (cont.) Sequence 3-Sequence

6 Subsequence

7 Example (cont.)
- (sequence not shown) is supported by customers 1 and 4
- (sequence not shown) is supported by customers 2 and 4
- Customers 1 and 4 contain (sequence not shown)

8 Example (cont.)
Q. Find the large/frequent sequences with minimum support set to 25%.
- Frequent sequence = a sequence whose support is at least the minimum support
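The support definition can be made concrete with a small sketch. The five-customer database below is illustrative (an assumption, not data taken from the slides), and `seq`, `contains`, `supporters`, and `support` are hypothetical helper names. A customer supports a sequence when each element of the sequence is a subset of a distinct transaction of that customer, in order.

```python
def seq(*elements):
    """Build a sequence as a list of itemsets."""
    return [frozenset(e) for e in elements]

# Illustrative customer sequences (assumed data), keyed by customer id.
database = {
    1: seq({30}, {90}),
    2: seq({10, 20}, {30}, {40, 60, 70}),
    3: seq({30, 50, 70}),
    4: seq({30}, {40, 70}, {90}),
    5: seq({90}),
}

def contains(customer_seq, candidate):
    """True if candidate is a subsequence of customer_seq (greedy left-to-right match)."""
    pos = 0
    for element in candidate:
        # Skip transactions that do not include this element as a subset.
        while pos < len(customer_seq) and not element <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next element must match a later transaction
    return True

def supporters(database, candidate):
    """Ids of the customers whose sequence contains the candidate."""
    return [cid for cid, s in database.items() if contains(s, candidate)]

def support(database, candidate):
    """Fraction of customers that support the candidate."""
    return len(supporters(database, candidate)) / len(database)

print(supporters(database, seq({30}, {90})))   # customers supporting the sequence
print(support(database, seq({30}, {40, 70})))  # fraction of supporting customers
```

With a 25% minimum support and five customers, a sequence needs at least two supporting customers to count as large.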

9 The Apriori Algorithm
Five phases:
- Sort phase
- Large itemset (litemset) phase
- Transformation phase
- Sequence phase
- Maximal phase

10 Sort phase
- Sort the database with customer-id as the major key and transaction-time as the minor key.
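A minimal sketch of the sort phase, assuming transactions arrive as `(customer_id, transaction_time, items)` tuples. The tuple layout, the sample rows, and the function name `sort_phase` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative, unsorted transactions: (customer_id, transaction_time, items).
transactions = [
    (2, 3, {40, 60, 70}),
    (1, 1, {30}),
    (2, 1, {10, 20}),
    (4, 2, {40, 70}),
    (1, 2, {90}),
    (2, 2, {30}),
    (3, 1, {30, 50, 70}),
    (4, 1, {30}),
    (4, 3, {90}),
    (5, 1, {90}),
]

def sort_phase(transactions):
    """Sort by customer id (major key) and time (minor key), then group
    each customer's transactions into one customer sequence."""
    ordered = sorted(transactions, key=itemgetter(0, 1))
    return {
        cid: [frozenset(items) for _, _, items in group]
        for cid, group in groupby(ordered, key=itemgetter(0))
    }

customer_sequences = sort_phase(transactions)
for cid, sequence in customer_sequences.items():
    print(cid, [sorted(t) for t in sequence])
```

Grouping after the sort turns the transaction table into one time-ordered sequence per customer, which is the input the later phases work on.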

11 Litemset phase
- Find the large itemsets (litemsets).
- Itemset mapping
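The litemset phase can be sketched as below. This is a brute-force illustration that enumerates every subset of every transaction, not the Apriori-style itemset search a real implementation would use. The database is assumed sample data and `litemset_phase` is a hypothetical name. Support is counted per customer, so an itemset that appears in several transactions of one customer still counts once.

```python
from collections import Counter
from itertools import combinations

def seq(*elements):
    return [frozenset(e) for e in elements]

# Illustrative customer sequences (assumed data).
database = {
    1: seq({30}, {90}),
    2: seq({10, 20}, {30}, {40, 60, 70}),
    3: seq({30, 50, 70}),
    4: seq({30}, {40, 70}, {90}),
    5: seq({90}),
}

def litemset_phase(database, min_support):
    """Return a mapping {large itemset (sorted tuple): integer id}."""
    counts = Counter()
    for sequence in database.values():
        supported = set()  # itemsets supported by this customer
        for transaction in sequence:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                supported.update(combinations(items, size))
        counts.update(supported)  # each customer contributes at most 1
    threshold = min_support * len(database)
    large = sorted(
        (itemset for itemset, n in counts.items() if n >= threshold),
        key=lambda itemset: (len(itemset), itemset),
    )
    return {itemset: number for number, itemset in enumerate(large, start=1)}

litemset_ids = litemset_phase(database, min_support=0.25)
print(litemset_ids)
```

The integer ids are arbitrary labels. The ordering used here (by size, then by items) is only one possible choice.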

12 Transformation phase
- Delete non-large itemsets
- Map large itemsets to integers
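A sketch of the transformation under assumed inputs: each transaction is replaced by the set of ids of the litemsets it contains, transactions that contain no litemset are dropped, and customers left with an empty sequence are dropped from the transformed database. The sample database, the id mapping, and the name `transform` are illustrative assumptions.

```python
def seq(*elements):
    return [frozenset(e) for e in elements]

# Illustrative customer sequences and litemset ids (assumed data).
database = {
    1: seq({30}, {90}),
    2: seq({10, 20}, {30}, {40, 60, 70}),
    3: seq({30, 50, 70}),
    4: seq({30}, {40, 70}, {90}),
    5: seq({90}),
}
litemset_ids = {(30,): 1, (40,): 2, (70,): 3, (90,): 4, (40, 70): 5}

def transform(database, litemset_ids):
    """Rewrite every transaction as the set of litemset ids contained in it."""
    transformed = {}
    for cid, sequence in database.items():
        new_sequence = []
        for transaction in sequence:
            ids = frozenset(
                number
                for itemset, number in litemset_ids.items()
                if set(itemset) <= transaction
            )
            if ids:  # transactions without any litemset are deleted
                new_sequence.append(ids)
        if new_sequence:
            transformed[cid] = new_sequence
    return transformed

transformed = transform(database, litemset_ids)
for cid, sequence in transformed.items():
    print(cid, [sorted(t) for t in sequence])
```

Dropped customers still count toward the total number of customers when support is computed later, so the original customer count should be kept alongside the transformed data.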

13 Sequence phase
- Use the set of litemsets to find the desired sequences.
- Two families of algorithms:
  - Count-all: AprioriAll
  - Count-some: AprioriSome, DynamicSome

14 AprioriAll
- The basic method for mining sequential patterns.
- Based on the Apriori algorithm.
- Counts all the large sequences, including non-maximal sequences.
- Uses the apriori-generate function to generate candidate sequences.

15 AprioriAll (cont.)
    L_1 = {large 1-sequences};  // result of the litemset phase
    for (k = 2; L_{k-1} ≠ ∅; k++) do
    begin
        C_k = new candidates generated from L_{k-1};
        foreach customer-sequence c in the database do
            increment the count of all candidates in C_k that are contained in c;
        L_k = candidates in C_k with minimum support;
    end
    Answer = maximal sequences in ∪_k L_k;
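The loop above can be sketched in Python over a transformed database, where each transaction is a set of litemset ids and each candidate is a tuple of ids. The sample data and the names `contains`, `generate`, and `apriori_all` are assumptions for illustration. The sketch returns every large sequence with its support count and leaves the maximal step out.

```python
# Illustrative transformed database (assumed data): sets of litemset ids.
database = {
    1: [{1}, {4}],
    2: [{1}, {2, 3, 5}],
    3: [{1, 3}],
    4: [{1}, {2, 3, 5}, {4}],
    5: [{4}],
}

def contains(customer_seq, candidate):
    """True if the ids of candidate occur in order in distinct transactions."""
    pos = 0
    for litemset in candidate:
        while pos < len(customer_seq) and litemset not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True

def generate(prev_large):
    """Join sequences that agree on all but the last id, then prune
    candidates that have a (k-1)-subsequence outside prev_large."""
    joined = {p + (q[-1],) for p in prev_large for q in prev_large
              if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev_large for i in range(len(c)))}

def apriori_all(database, min_support):
    threshold = min_support * len(database)

    def large_among(candidates):
        counts = {c: sum(contains(s, c) for s in database.values())
                  for c in candidates}
        return {c: n for c, n in counts.items() if n >= threshold}

    litemsets = {i for s in database.values() for t in s for i in t}
    current = large_among({(i,) for i in litemsets})  # L_1
    all_large = {}
    while current:  # stop when L_k is empty
        all_large.update(current)
        current = large_among(generate(set(current)))
    return all_large

large_sequences = apriori_all(database, min_support=0.25)
for sequence, count in sorted(large_sequences.items()):
    print(sequence, count)
```

Each pass makes one scan over the database to count the current candidates, which mirrors the count-all strategy: every large sequence is counted, including the non-maximal ones.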

16 Apriori Candidate Generation
- Generates the candidates for a pass using only the large sequences found in the previous pass, and then makes a pass over the data to find their support.

17 Apriori Candidate Generation
Algorithm:
- L_k: the set of all large k-sequences
- C_k: the set of candidate k-sequences

    insert into C_k
    select p.litemset_1, p.litemset_2, ..., p.litemset_{k-1}, q.litemset_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.litemset_1 = q.litemset_1, ..., p.litemset_{k-2} = q.litemset_{k-2};

    forall sequences c ∈ C_k do
        forall (k-1)-subsequences s of c do
            if (s ∉ L_{k-1}) then
                delete c from C_k;
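The join and prune steps can be sketched separately. Sequences are tuples of litemset ids. The set of large 3-sequences used as input is an illustrative assumption, and `join` and `prune` are hypothetical names.

```python
def join(prev_large):
    """Self-join L_{k-1}: pair sequences that share their first k-2 ids and
    append the last id of the second sequence to the first."""
    return {p + (q[-1],) for p in prev_large for q in prev_large
            if p[:-1] == q[:-1]}

def prune(candidates, prev_large):
    """Delete candidates that have a (k-1)-subsequence not in L_{k-1}."""
    return {
        c for c in candidates
        if all(c[:i] + c[i + 1:] in prev_large for i in range(len(c)))
    }

# Illustrative set of large 3-sequences (assumed data).
large_3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}

joined = join(large_3)
candidates_4 = prune(joined, large_3)
print(sorted(joined))
print(sorted(candidates_4))
```

Because order matters in a sequence, the join keeps both `(1, 2, 3, 4)` and `(1, 2, 4, 3)`. The prune step then removes every candidate with a missing 3-subsequence, for example `(1, 2, 4, 3)`, whose subsequence `(2, 4, 3)` is not large.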

18 Apriori Candidate Generation
- Example: transformed customer sequences
- Next step: find the large 1-sequences, with minimum support set to 25%

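19 Example: Large 1-Sequences
Table (Sequence / Support; sequence column not shown): supports 4, 2, 4, 4, 2
Next step: find the large 2-sequences

20 Example: Large 2-Sequences
Table (Sequence / Support; sequence column not shown): supports 2, 4, 3, 3, 2, 2, 3, 2, 2
Next step: find the large 3-sequences

21 Example: Large 3-Sequences
Table (Sequence / Support; sequence column not shown): supports 2, 2, 3, 2, 2
Next step: find the large 4-sequences

22 Example: Large 4-Sequences
Table (Sequence / Support; sequence column not shown): support 2
Next step: find the maximal sequential patterns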

23 Maximal phase
- Find the maximal sequences among the set of large sequences.
- In some algorithms, this phase is combined with the sequence phase.

24 Maximal phase
Algorithm:
- S: the set of all large sequences
- n: the length of the longest sequence

    for (k = n; k > 1; k--) do
        foreach k-sequence s_k do
            delete from S all subsequences of s_k;
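A sketch of the maximal phase over itemset sequences (tuples of frozensets). The input set of large sequences is assumed sample data, and `is_subsequence` and `maximal_sequences` are hypothetical names. Unlike the loop above, this sketch also visits k = 1, so that a one-element sequence with a larger itemset can absorb a one-element sequence with a smaller one. That is a choice of this example.

```python
def s(*elements):
    """Build a hashable sequence: a tuple of frozensets."""
    return tuple(frozenset(e) for e in elements)

def is_subsequence(small, big):
    """True if each element of small is a subset of a distinct element of big, in order."""
    pos = 0
    for element in small:
        while pos < len(big) and not element <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True

def maximal_sequences(large_sequences):
    """Process sequences from longest to shortest and delete the
    subsequences of every sequence that is still present."""
    remaining = set(large_sequences)
    if not remaining:
        return remaining
    longest = max(len(x) for x in remaining)
    for k in range(longest, 0, -1):
        for s_k in [x for x in remaining if len(x) == k]:
            if s_k in remaining:  # may have been deleted by a same-length sequence
                remaining -= {x for x in remaining
                              if x != s_k and is_subsequence(x, s_k)}
    return remaining

# Illustrative large sequences (assumed data).
large = {
    s({30}), s({40}), s({70}), s({90}), s({40, 70}),
    s({30}, {40}), s({30}, {70}), s({30}, {90}), s({30}, {40, 70}),
}
maximal = maximal_sequences(large)
for sequence in maximal:
    print([sorted(e) for e in sequence])
```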

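25 Example: find the maximal large sequences
Tables (Sequence / Support; sequence columns not shown):
- Large 1-sequences: supports 4, 2, 4, 4, 2
- Large 2-sequences: supports 2, 4, 3, 3, 2, 2, 3, 2, 2
- Large 3-sequences: supports 2, 2, 3, 2, 2
- Large 4-sequences: support 2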

26 Examples of Sequence Data
Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Purchase history of a given customer | A set of items bought by a customer at time t | Books, dairy products, CDs, etc.
Web data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | History of events generated by a given sensor | Events triggered by a sensor at time t | Types of alarms generated by sensors
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C

27 Examples of Sequences
- Web sequence:
- Sequence of initiating events causing the nuclear accident at Three Mile Island: (http://stellar-one.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm)
- Sequence of books checked out at a library:

28 GSP algorithm

29 Candidate generation
- Consists of two phases: join phase and prune phase.
- Join phase: C_k = F_{k-1} x F_{k-1}
  - Sequences s1 and s2 in F_{k-1} can be joined if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2.
  - The resulting sequence is s1 extended by the last item of s2.
  - The added item becomes a separate element if it was a separate element in s2, and part of the last element of s1 otherwise.

30 Candidate Generation Examples
- Merging the sequences w1 = ⟨{1} {2 3} {4}⟩ and w2 = ⟨{2 3} {4 5}⟩ will produce the candidate sequence ⟨{1} {2 3} {4 5}⟩ because the last two events in w2 (4 and 5) belong to the same element.
- Merging the sequences w1 = ⟨{1} {2 3} {4}⟩ and w2 = ⟨{2 3} {4} {5}⟩ will produce the candidate sequence ⟨{1} {2 3} {4} {5}⟩ because the last two events in w2 (4 and 5) do not belong to the same element.
- We do not have to merge the sequences w1 = ⟨{1} {2 6} {4}⟩ and w2 = ⟨{1} {2} {4 5}⟩ to produce the candidate ⟨{1} {2 6} {4 5}⟩ because if the latter is a viable candidate, then it can be obtained by merging w1 with ⟨{2 6} {4 5}⟩.
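The GSP join rule can be sketched as follows. Sequences are tuples of elements, and each element is a tuple of items in sorted order. The input sequences are illustrative assumptions and `drop_first`, `drop_last`, and `gsp_join` are hypothetical names. The special case of joining two 1-sequences, which should also yield a single two-item element, is left out of this sketch.

```python
def drop_first(sequence):
    """Remove the first item of the first element (and the element if it becomes empty)."""
    head = sequence[0][1:]
    return ((head,) if head else ()) + sequence[1:]

def drop_last(sequence):
    """Remove the last item of the last element (and the element if it becomes empty)."""
    tail = sequence[-1][:-1]
    return sequence[:-1] + ((tail,) if tail else ())

def gsp_join(s1, s2):
    """Return s1 extended by the last item of s2, or None if they cannot be joined."""
    if drop_first(s1) != drop_last(s2):
        return None
    last_item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # The item was a separate element in s2: append it as a new element.
        return s1 + ((last_item,),)
    # Otherwise the item joins the last element of s1.
    merged = tuple(sorted(s1[-1] + (last_item,)))
    return s1[:-1] + (merged,)

w1 = ((1,), (2, 3), (4,))
print(gsp_join(w1, ((2, 3), (4, 5))))      # item 5 joins the last element
print(gsp_join(w1, ((2, 3), (4,), (5,))))  # item 5 becomes a new element
print(gsp_join(((1,), (2, 6), (4,)), ((1,), (2,), (4, 5))))  # not joinable
```

Requiring the two trimmed sequences to match exactly is what makes the third merge unnecessary: the same candidate is reached through a different, joinable pair.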

31 Pruning phase: delete candidate sequences that have an infrequent (k-1)-subsequence.
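A sketch of the pruning step, assuming no timing constraints, so that every subsequence obtained by deleting a single item must be frequent. The frequent 3-sequences and the candidates are illustrative assumptions, and `drop_item`, `one_item_shorter`, and `gsp_prune` are hypothetical names.

```python
def drop_item(sequence, i, j):
    """Delete item j of element i; drop the element if it becomes empty."""
    element = sequence[i][:j] + sequence[i][j + 1:]
    return sequence[:i] + ((element,) if element else ()) + sequence[i + 1:]

def one_item_shorter(sequence):
    """All (k-1)-subsequences obtained by deleting exactly one item."""
    return {
        drop_item(sequence, i, j)
        for i, element in enumerate(sequence)
        for j in range(len(element))
    }

def gsp_prune(candidates, frequent_prev):
    """Keep only candidates whose (k-1)-subsequences are all frequent."""
    return {c for c in candidates if one_item_shorter(c) <= frequent_prev}

# Illustrative frequent 3-sequences and 4-candidates (assumed data).
frequent_3 = {
    ((1, 2), (3,)),
    ((1, 2), (4,)),
    ((1,), (3, 4)),
    ((1, 3), (5,)),
    ((2,), (3, 4)),
    ((2,), (3,), (5,)),
}
candidates_4 = {((1, 2), (3, 4)), ((1, 2), (3,), (5,))}

print(gsp_prune(candidates_4, frequent_3))
```

The second candidate is removed because deleting item 2 leaves `((1,), (3,), (5,))`, which is not among the frequent 3-sequences.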

32

33 GSP Example

34 Database Example

35 The mining result

