Module 8: Sequential Pattern Mining


1 Module 8: Sequential Pattern Mining

2 Assignment 2: ewinarko.staff.ugm.ac.id/datamining/tugas2

3 Terminology: item; itemset; sequence (customer-sequence); subsequence; support for a sequence; large/frequent sequence

4 Example Q. How do we find the sequential patterns?

5 Example Transaction Item Itemset

6 Example (cont.) Sequence 3-Sequence

7 Subsequence

8 Example (cont.)
The sequences of customers 1 and 4 contain <(30) (90)>, so <(30) (90)> is supported by customers 1 and 4.
<(30) (40 70)> is supported by customers 2 and 4.
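The containment test behind "is supported by" can be sketched in Python. This is only an illustration: the customer table from the slide is not part of this transcript, so the `database` below is assumed data chosen to agree with the two statements above, and the names `contains` and `supporters` are hypothetical.

```python
def contains(customer_sequence, pattern):
    """Return True if `pattern` is a subsequence of `customer_sequence`.

    Both arguments are lists of itemsets (Python sets). Each itemset of the
    pattern must be a subset of a distinct transaction, in the same order.
    """
    position = 0
    for itemset in pattern:
        # Greedily take the earliest remaining transaction that includes this itemset.
        while position < len(customer_sequence) and not itemset <= customer_sequence[position]:
            position += 1
        if position == len(customer_sequence):
            return False
        position += 1
    return True


# Assumed customer sequences (customer id -> list of transactions).
database = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def supporters(pattern):
    """Customer ids whose sequence contains the pattern."""
    return [cid for cid, seq in database.items() if contains(seq, pattern)]


print(supporters([{30}, {90}]))
print(supporters([{30}, {40, 70}]))
```

Note that customer 3 buys 30 and 70 in a single transaction, which does not count as <(30) (70)>: the two elements of a pattern must match two different transactions.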

9 Example (cont.) Q. Find the large/frequent sequences with minimum support set to 25%.
A frequent sequence is a sequence that meets the minimum support: <(30)>, <(40)>, <(70)>, <(90)>, <(30) (40)>, <(30) (70)>, <(40 70)>

10 Examples of Sequence Data
Sequence Database | Sequence | Element (Transaction) | Event (Item)
Customer | Purchase history of a given customer | A set of items bought by a customer at time t | Books, dairy products, CDs, etc.
Web data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
Event data | History of events generated by a given sensor | Events triggered by a sensor at time t | Types of alarms generated by sensors
Genome sequences | DNA sequence of a particular species | An element of the DNA sequence | Bases A, T, G, C

11 Examples of Sequences
Web sequence: < {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >
Sequence of initiating events causing the nuclear accident at Three Mile Island: < {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main water pump trips} {main turbine trips} {reactor pressure increases} >
Sequence of books checked out at a library: < {Fellowship of the Ring} {The Two Towers} {Return of the King} >

12 GSP algorithm

13 Candidate generation
Consists of two phases: a join phase and a prune phase.
Join phase: Ck = Fk-1 x Fk-1. Two sequences s1 and s2 in Fk-1 can be joined if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. The resulting candidate is s1 extended by the last item of s2. The added item becomes a separate element if it was a separate element in s2, and part of the last element of s1 otherwise.

14 Candidate Generation Examples
Merging the sequences w1 = <{1} {2 3} {4}> and w2 = <{2 3} {4 5}> produces the candidate sequence <{1} {2 3} {4 5}>, because the last two events in w2 (4 and 5) belong to the same element.
Merging the sequences w1 = <{1} {2 3} {4}> and w2 = <{2 3} {4} {5}> produces the candidate sequence <{1} {2 3} {4} {5}>, because the last two events in w2 (4 and 5) do not belong to the same element.
We do not have to merge the sequences w1 = <{1} {2 6} {4}> and w2 = <{1} {2} {4 5}> to produce the candidate <{1} {2 6} {4 5}>, because if the latter is a viable candidate, it can be obtained by merging w1 with <{2 6} {4 5}>.
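The join rule and the merging examples above can be sketched in Python. This is a minimal illustration and not a full GSP implementation: sequences are represented as tuples of sorted item tuples, the names `drop_first`, `drop_last` and `gsp_join` are hypothetical, and the special case of joining two 1-sequences is not handled.

```python
def drop_first(seq):
    """Remove the first item of the first element (dropping the element if it becomes empty)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + tuple(seq[1:])


def drop_last(seq):
    """Remove the last item of the last element (dropping the element if it becomes empty)."""
    tail = seq[-1][:-1]
    return tuple(seq[:-1]) + ((tail,) if tail else ())


def gsp_join(s1, s2):
    """Return the candidate obtained by joining s1 with s2, or None if they cannot be joined."""
    if drop_first(s1) != drop_last(s2):
        return None
    item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # The item was a separate element in s2, so it becomes a separate element.
        return tuple(s1) + ((item,),)
    # Otherwise the item joins the last element of s1.
    return tuple(s1[:-1]) + (tuple(sorted(s1[-1] + (item,))),)


w1 = ((1,), (2, 3), (4,))
print(gsp_join(w1, ((2, 3), (4, 5))))       # first example above
print(gsp_join(w1, ((2, 3), (4,), (5,))))   # second example above
print(gsp_join(((1,), (2, 6), (4,)), ((1,), (2,), (4, 5))))  # third example: no join
```

In the third example the join returns `None`, while joining `((1,), (2, 6), (4,))` with `((2, 6), (4, 5))` yields the candidate <{1} {2 6} {4 5}>.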

15 Pruning phase: Delete candidate sequences that have an infrequent (k-1)-subsequence.
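A possible sketch of the pruning step in Python follows. It assumes no timing constraints, so the (k-1)-subsequences of a candidate are obtained by deleting any single item. The frequent 3-sequences and the candidates are assumed illustrative data (they are not taken from this transcript), and the function names are hypothetical.

```python
def drop_one_item(seq):
    """Yield every (k-1)-subsequence obtained by deleting exactly one item from seq."""
    for i, element in enumerate(seq):
        for j in range(len(element)):
            rest = element[:j] + element[j + 1:]
            yield seq[:i] + ((rest,) if rest else ()) + seq[i + 1:]


def prune(candidates, frequent_prev):
    """Keep only candidates whose (k-1)-subsequences are all frequent."""
    return [c for c in candidates
            if all(s in frequent_prev for s in drop_one_item(c))]


# Assumed frequent 3-sequences (tuples of sorted item tuples).
frequent_3 = {
    ((1,), (2,), (3,)),
    ((1,), (2, 5)),
    ((1,), (5,), (3,)),
    ((2,), (3,), (4,)),
    ((2, 5), (3,)),
    ((3,), (4,), (5,)),
    ((5,), (3, 4)),
}

# Assumed candidate 4-sequences produced by a join phase.
candidates_4 = [
    ((1,), (2,), (3,), (4,)),
    ((1,), (2, 5), (3,)),
    ((1,), (5,), (3, 4)),
    ((2,), (3,), (4,), (5,)),
    ((2, 5), (3, 4)),
]

print(prune(candidates_4, frequent_3))
```

With this data only <{1} {2 5} {3}> survives; every other candidate has at least one 3-subsequence missing from `frequent_3`.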

16 GSP Example

17 Database Example

18 The mining result

19 The Apriori Algorithm
Five phases: sort phase, large itemset (litemset) phase, transformation phase, sequence phase, maximal phase

20 Sort phase Sort the database with customer-id as the major key and transaction-time as the minor key
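As a small illustration of the sort phase, the Python sketch below sorts transaction rows by customer id and then by transaction time, and groups them into one sequence per customer. The rows and the name `sort_phase` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (customer_id, transaction_time, items bought)
rows = [
    (2, 3, {40, 60, 70}),
    (1, 2, {90}),
    (2, 1, {10, 20}),
    (1, 1, {30}),
    (2, 2, {30}),
]


def sort_phase(rows):
    """Sort by customer id (major key) and transaction time (minor key),
    then collect each customer's transactions into a customer sequence."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {cid: [items for _, _, items in group]
            for cid, group in groupby(ordered, key=itemgetter(0))}


customer_sequences = sort_phase(rows)
print(customer_sequences)
```

Sorting on the two keys only (rather than on whole rows) avoids comparing the item sets themselves.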

21 Litemset phase: Find the set of all large itemsets (litemsets) and map each litemset to an identifier.
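One way to picture the litemset phase is the brute-force Python sketch below. It counts an itemset once per customer, no matter how many of that customer's transactions contain it. The customer sequences are assumed illustrative data, the function name is hypothetical, and enumerating every subset of every transaction is only reasonable for tiny examples.

```python
from itertools import combinations
from math import ceil


def large_itemsets(customer_sequences, min_support):
    """Return {itemset: number of supporting customers} for all large itemsets."""
    min_count = ceil(min_support * len(customer_sequences))
    counts = {}
    for sequence in customer_sequences:
        seen = set()  # itemsets this customer supports (counted once per customer)
        for transaction in sequence:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                seen.update(combinations(items, size))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


# Assumed customer sequences (one list of transactions per customer).
customer_sequences = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

litemsets = large_itemsets(customer_sequences, 0.25)
# Map litemsets to consecutive integers; the particular order is arbitrary.
litemset_ids = {s: i for i, s in
                enumerate(sorted(litemsets, key=lambda s: (len(s), s)), start=1)}
print(litemsets)
print(litemset_ids)
```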

22 Transformation phase: Delete non-large itemsets from each transaction and map the remaining large itemsets to integers.
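The transformation can be sketched in Python as follows. Each transaction is replaced by the set of identifiers of the litemsets it contains, and transactions containing no litemset are dropped. Both the customer sequences and the mapping `litemset_ids` are assumed illustrative data, and `transform` is a hypothetical name.

```python
def transform(customer_sequences, litemset_ids):
    """Replace every transaction by the ids of the litemsets contained in it."""
    transformed = []
    for sequence in customer_sequences:
        new_sequence = []
        for transaction in sequence:
            ids = {i for litemset, i in litemset_ids.items()
                   if set(litemset) <= transaction}
            if ids:  # a transaction without any litemset is deleted
                new_sequence.append(ids)
        transformed.append(new_sequence)
    return transformed


# Assumed input data.
customer_sequences = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
# Assumed mapping from litemsets to integers.
litemset_ids = {(30,): 1, (40,): 2, (70,): 3, (40, 70): 4, (90,): 5}

transformed = transform(customer_sequences, litemset_ids)
print(transformed)
```

A customer whose transactions are all deleted would end up with an empty sequence in this sketch; keeping that entry preserves the total customer count used for support.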

23 Sequence phase
Use the set of litemsets to find the desired sequences. Two families of algorithms:
Count-all: Algorithm AprioriAll
Count-some: Algorithm AprioriSome, Algorithm DynamicSome

24 AprioriAll
The basic method for mining sequential patterns, based on the Apriori algorithm. It counts all the large sequences, including non-maximal sequences, and uses the apriori-generate function to generate candidate sequences.

25 AprioriAll (cont.)
L1 = {large 1-sequences}; // Result of the litemset phase
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = new candidates generated from Lk-1
    foreach customer-sequence c in the database do
        increment the count of all candidates in Ck that are contained in c
    Lk = candidates in Ck with minimum support
end
Answer = maximal sequences in ∪k Lk;
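The loop above might be realized in Python roughly as follows. This is a sketch under simplifying assumptions, not the original implementation: sequences are tuples of litemset ids, the database is the set of five transformed customer sequences used in the worked example of these slides, all function names are hypothetical, and the final maximal step is left out.

```python
from math import ceil


def contains(customer_seq, pattern):
    """True if the litemset ids of `pattern` occur in order in distinct transactions."""
    pos = 0
    for litemset in pattern:
        while pos < len(customer_seq) and litemset not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def generate(prev):
    """Join L(k-1) with itself on the first k-2 litemsets, then prune."""
    prev = set(prev)
    joined = {p + (q[-1],) for p in prev for q in prev if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}


def apriori_all(db, min_support):
    """Return {large sequence: support count}, including non-maximal sequences."""
    min_count = ceil(min_support * len(db))

    def count(candidates):
        supports = {c: sum(contains(seq, c) for seq in db) for c in candidates}
        return {c: n for c, n in supports.items() if n >= min_count}

    large = count({(i,) for seq in db for transaction in seq for i in transaction})
    all_large = dict(large)
    while large:  # stop when no large k-sequence is found
        large = count(generate(large))
        all_large.update(large)
    return all_large


db = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
result = apriori_all(db, 0.25)
for sequence in sorted(result, key=lambda s: (len(s), s)):
    print(sequence, result[sequence])
```

With 5 customers and a minimum support of 25%, a sequence needs at least 2 supporting customers.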

26 Apriori Candidate Generation
Generates the candidates for a pass using only the large sequences found in the previous pass, and then makes a pass over the data to find their support.

27 Apriori Candidate Generation
Algorithm:
Lk: the set of all large k-sequences
Ck: the set of candidate k-sequences
insert into Ck
    select p.litemset1, p.litemset2, …, p.litemsetk-1, q.litemsetk-1
    from Lk-1 p, Lk-1 q
    where p.litemset1 = q.litemset1, …, p.litemsetk-2 = q.litemsetk-2;
forall sequences c ∈ Ck do
    forall (k-1)-subsequences s of c do
        if (s ∉ Lk-1) then delete c from Ck;
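The join and prune steps of this candidate generation can be sketched in Python. Sequences are tuples of litemset ids, the function names are hypothetical, and the input is the set of large 3-sequences from the worked example in these slides. As in the select statement, a sequence may be joined with itself, which is what allows candidates with a repeated litemset.

```python
def join_step(large_prev):
    """Pair sequences that agree on their first k-2 litemsets and
    append the last litemset of the second one to the first."""
    return {p + (q[-1],)
            for p in large_prev for q in large_prev
            if p[:-1] == q[:-1]}


def prune_step(candidates, large_prev):
    """Delete candidates that have a (k-1)-subsequence outside L(k-1)."""
    return {c for c in candidates
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


def apriori_generate(large_prev):
    large_prev = set(large_prev)
    return prune_step(join_step(large_prev), large_prev)


large_3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(sorted(join_step(large_3)))
print(apriori_generate(large_3))
```

For example, <1 2 4 3> is produced by the join but pruned because its subsequence <2 4 3> is not a large 3-sequence.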

28 Apriori Candidate Generation
Example: transformed customer sequences
<{1 5} {2} {3} {4}>
<{1} {3} {4} {3 5}>
<{1} {2} {3} {4}>
<{1} {3} {5}>
<{4} {5}>
With minimum support set to 25%, the next step is to find the large 1-sequences.

29 Example: large 1-sequences
Sequence | Support
<1> | 4
<2> | 2
<3> | 4
<4> | 4
<5> | 4
Next step: find the large 2-sequences.

30 Example: large 2-sequences
Sequence | Support
<1 2> | 2
<1 3> | 4
<1 4> | 3
<1 5> | 2
<2 3> | 2
<2 4> | 2
<3 4> | 3
<3 5> | 2
<4 5> | 2
Next step: find the large 3-sequences.

31 Example: large 3-sequences
Sequence | Support
<1 2 3> | 2
<1 2 4> | 2
<1 3 4> | 3
<1 3 5> | 2
<2 3 4> | 2
Next step: find the large 4-sequences.

32 Example: large 4-sequences
Sequence | Support
<1 2 3 4> | 2
Next step: find the maximal sequential patterns.

33 Maximal phase: Find the maximal sequences among the set of large sequences. In some algorithms, this phase is combined with the sequence phase.

34 Maximal phase
Algorithm:
S: the set of all large sequences
n: the length of the longest sequence
for (k = n; k > 1; k--) do
    foreach k-sequence sk do
        delete from S all subsequences of sk

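35 Example: find the maximal large sequences
Large 1-sequences
Sequence | Support
<1> | 4
<2> | 2
<3> | 4
<4> | 4
<5> | 4
Large 2-sequences
Sequence | Support
<1 2> | 2
<1 3> | 4
<1 4> | 3
<1 5> | 2
<2 3> | 2
<2 4> | 2
<3 4> | 3
<3 5> | 2
<4 5> | 2
Large 3-sequences
Sequence | Support
<1 2 3> | 2
<1 2 4> | 2
<1 3 4> | 3
<1 3 5> | 2
<2 3 4> | 2
Large 4-sequences
Sequence | Support
<1 2 3 4> | 2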
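The maximal phase can be tried out on this example with a short Python sketch. It walks through the large sequences from longest to shortest and deletes every proper subsequence of each sequence that is still present. Sequences are tuples of litemset ids and the function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting litemsets."""
    remaining = iter(long)
    # `x in remaining` advances the iterator, so matches must occur in order.
    return all(x in remaining for x in short)


def maximal_sequences(large):
    """Delete from the set of large sequences every subsequence of a longer one."""
    remaining = set(large)
    for s in sorted(large, key=len, reverse=True):
        if s in remaining:
            remaining -= {t for t in remaining
                          if t != s and is_subsequence(t, s)}
    return remaining


large = [
    (1,), (2,), (3,), (4,), (5,),
    (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5),
    (1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4),
    (1, 2, 3, 4),
]
print(sorted(maximal_sequences(large), key=len, reverse=True))
```

For this input the remaining sequences are <1 2 3 4>, <1 3 5> and <4 5>. The sketch also visits 1-sequences, which is harmless because they have no proper non-empty subsequences.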

