Sequential PAttern Mining Using A Bitmap Representation Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick Dept. of Computer Science Cornell University.


1 Sequential PAttern Mining Using A Bitmap Representation. Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick, Dept. of Computer Science, Cornell University (SIGKDD 2002). Presenters: 李佩書 P, 楊璨瑜 P, 陳奕廷 P, 李昕純 Q. 2014/11/20

2 Outline 1. Introduction 2. The SPAM algorithm 3. Data representation 4. Experimental 5. Conclusion & Discussion 2014/11/20 2

3 Introduction

4 Sequential Patterns: introduced by R. Agrawal and R. Srikant (ICDE 1995). Algorithms: AprioriAll, AprioriSome, PrefixSpan, …

5 Problem

6 SPAM Algorithm: the Sequential PAttern Mining algorithm. It is the first depth-first search (DFS) strategy for mining sequential patterns, and it uses a vertical bitmap representation for simple, efficient counting.

7 The SPAM Algorithm

8 Lexicographic Tree
Sequence-extended sequence (S-step): generated by adding a new transaction consisting of a single item to the end of the sequence. Ex: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b}, {a})
Itemset-extended sequence (I-step): generated by adding an item to the last itemset in the sequence. Ex 1: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b, d}). Ex 2: ({a, b, c}, {a, b, d}) → ({a, b, c}, {a, b, d, c})
Each node n is associated with two sets. S_n: the set of candidate items for S-step extensions. I_n: the set of candidate items for I-step extensions.
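The two extension steps can be sketched in a few lines of Python. This is only an illustration of the definitions on this slide: the function names s_extend and i_extend and the list-of-lists encoding of a sequence are assumptions made for the example, not part of SPAM itself.

```python
def s_extend(seq, item):
    """S-step: append a new transaction holding a single item to the sequence."""
    return seq + [[item]]


def i_extend(seq, item):
    """I-step: add an item to the last itemset of the sequence."""
    return seq[:-1] + [seq[-1] + [item]]


# The sequence used in the slide's examples.
s = [["a", "b", "c"], ["a", "b"]]

print(s_extend(s, "a"))  # [['a', 'b', 'c'], ['a', 'b'], ['a']]
print(i_extend(s, "d"))  # [['a', 'b', 'c'], ['a', 'b', 'd']]
```

Both helpers return a new list and leave the input sequence unchanged, which is convenient when the same parent node is extended with several different items.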

9 [Figure: lexicographic tree for the item set I = {a, b}]

10 Pruning: Apriori-based. The goal is to minimize the sizes of S_n and I_n by pruning candidates during the DFS. Two techniques: S-step pruning and I-step pruning.

11 S-step Pruning: S({a}) = {a, b, c, d}; I({a}) = {b, c, d}; S({a}, {a}) = S({a}, {b}) = {a, b, c, d}; I({a}, {a}) = {b, c, d}; I({a}, {b}) = {c, d}

12 I-step Pruning: S({a, b}) = S({a, d}) = {a, b}; I({a}, {b}) = {c, d}; I({a}, {d}) = {}
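A minimal sketch of how the candidate sets shrink during the DFS is given below. It follows the Apriori idea named on the pruning slides: an item whose extension is infrequent at a node is not passed on to that node's children. Support is counted here by brute force rather than with bitmaps, and the toy database, the names contains, support and dfs_pruning, and the threshold of 2 are all assumptions made for illustration.

```python
def contains(transactions, seq):
    """True if the itemsets of seq occur, in order, in one customer's transactions."""
    pos = 0
    for itemset in seq:
        while pos < len(transactions) and not set(itemset) <= transactions[pos]:
            pos += 1
        if pos == len(transactions):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True


def support(db, seq):
    """Number of customers whose transaction list contains seq."""
    return sum(contains(txs, seq) for txs in db)


def dfs_pruning(db, min_sup, seq, s_cand, i_cand, found):
    found[seq] = support(db, seq)
    # S-step: only items whose sequence extension is frequent survive.
    s_temp = [i for i in s_cand if support(db, seq + ((i,),)) >= min_sup]
    for i in s_temp:
        dfs_pruning(db, min_sup, seq + ((i,),), s_temp,
                    [j for j in s_temp if j > i], found)
    # I-step: only items whose itemset extension is frequent survive.
    i_temp = [i for i in i_cand
              if support(db, seq[:-1] + (seq[-1] + (i,),)) >= min_sup]
    for i in i_temp:
        dfs_pruning(db, min_sup, seq[:-1] + (seq[-1] + (i,),), s_temp,
                    [j for j in i_temp if j > i], found)


# Hypothetical toy database: one list of transactions per customer.
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]
min_sup = 2
frequent_items = [i for i in "abcd" if support(db, ((i,),)) >= min_sup]
found = {}
for i in frequent_items:
    dfs_pruning(db, min_sup, ((i,),), frequent_items,
                [j for j in frequent_items if j > i], found)
```

After the run, found maps every visited frequent sequence to its support. For instance, the extension ({a}, {a}) is infrequent in this toy data, so item a is never tried again as an S-step candidate below the node ({a}).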

14 Data Representation

15 If the size of a sequence is between 2^k + 1 and 2^(k+1), it is treated as a 2^(k+1)-bit sequence. Each candidate sequence is stored as a vertical bitmap. Each customer is assigned a fixed slice of each bitmap for all of its transactions.
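The sizing rule above amounts to rounding the number of transactions up to the next power of two. The helper below is a small sketch of that arithmetic; the name slice_bits is an assumption, and any minimum or maximum slice size used by a real implementation is deliberately left out.

```python
def slice_bits(num_transactions):
    """Smallest power of two that is >= num_transactions."""
    bits = 1
    while bits < num_transactions:
        bits *= 2
    return bits


# Sizes from 2^k + 1 up to 2^(k+1) all map to a 2^(k+1)-bit slice.
for n in (3, 4, 5, 8, 9):
    print(n, "->", slice_bits(n))
```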

16 Bitmap of an itemset: B({a}) & B({b}) = B({a, b}), where & is a bitwise AND.
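As a rough illustration of the vertical layout, the sketch below builds one bitmap per item, with one bit per transaction and one slice per customer, and then ANDs two item bitmaps to obtain the bitmap of the itemset. The toy database and the names item_bitmap and bitmap_and are assumptions; padding each slice to a power of two is omitted to keep the example short.

```python
def item_bitmap(db, item):
    """One slice (list of 0/1 bits) per customer, one bit per transaction."""
    return [[int(item in tx) for tx in txs] for txs in db]


def bitmap_and(x, y):
    """Bitwise AND of two bitmaps that share the same slice layout."""
    return [[p & q for p, q in zip(sx, sy)] for sx, sy in zip(x, y)]


# Hypothetical toy database: one list of transactions per customer.
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]

b_a = item_bitmap(db, "a")
b_b = item_bitmap(db, "b")
b_ab = bitmap_and(b_a, b_b)  # bit is 1 where a transaction holds both a and b

print(b_a)   # [[1, 0, 0], [0, 1], [1, 0]]
print(b_b)   # [[1, 1, 1], [1, 1], [1, 1]]
print(b_ab)  # [[1, 0, 0], [0, 1], [1, 0]]
```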

17 Bitmap of a sequence: define B(s) as the bitmap for sequence s. The bit for transaction j is set to 1 if the last itemset of s is in transaction j and the other itemsets of s are, in order, in transactions before j; otherwise it is set to 0.
Example 1, for the sequence ({b}, {c}):
Customer ID  Transaction ID  Itemset
1            1               {b}
1            2               {d}
1            3               {e}
1            4               {c}

24 Example 2, for the sequence ({a}, {b, d}):
Customer ID  Transaction ID  Itemset
1            1               {a, b, d}
1            3               {b, c, d}
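The definition of B(s) can be checked directly against the two examples. The sketch below computes the bits for a single customer by brute force, straight from the definition, rather than by the bitmap operations SPAM uses; the function names are assumptions made for the example.

```python
def occurs_in_order(transactions, itemsets):
    """True if the itemsets occur, in order, in distinct transactions."""
    pos = 0
    for itemset in itemsets:
        while pos < len(transactions) and not set(itemset) <= transactions[pos]:
            pos += 1
        if pos == len(transactions):
            return False
        pos += 1
    return True


def sequence_bitmap(transactions, seq):
    """Bit j is 1 if the last itemset is in transaction j and the rest come before j."""
    bits = []
    for j, tx in enumerate(transactions):
        last_here = set(seq[-1]) <= tx
        bits.append(int(last_here and occurs_in_order(transactions[:j], seq[:-1])))
    return bits


# Example 1: sequence ({b}, {c}) over the transactions {b}, {d}, {e}, {c}.
example1 = [{"b"}, {"d"}, {"e"}, {"c"}]
print(sequence_bitmap(example1, [("b",), ("c",)]))  # [0, 0, 0, 1]

# Example 2: sequence ({a}, {b, d}) over the transactions {a, b, d}, {b, c, d}.
example2 = [{"a", "b", "d"}, {"b", "c", "d"}]
print(sequence_bitmap(example2, [("a",), ("b", "d")]))  # [0, 1]
```

In Example 2 the first transaction also contains {b, d}, but its bit stays 0 because {a} must appear in a strictly earlier transaction.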

25 S-step Process. Step 1: apply the S-step process to B({a}) to construct the transformed bitmap B({a})_s. Step 2: AND B({a})_s with B({b}); the result is the bitmap of ({a}, {b}). Support = 2

26 S-step Process. Step 1: apply the S-step process to B({a}) to construct the transformed bitmap B({a})_s. Step 2: AND B({a})_s with B({b}).
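The two steps can be sketched as follows. The transformation used here is the usual one for SPAM: within each customer's slice, every bit up to and including the first 1 is cleared and every bit after it is set. The input bitmaps are written out by hand for a hypothetical three-customer database, and the function names are assumptions.

```python
def s_step_transform(bitmap):
    """Per customer: clear bits up to the first 1, set every bit after it."""
    out = []
    for bits in bitmap:
        first = bits.index(1) if 1 in bits else len(bits)
        out.append([int(j > first) for j in range(len(bits))])
    return out


def bitmap_and(x, y):
    return [[p & q for p, q in zip(sx, sy)] for sx, sy in zip(x, y)]


def support(bitmap):
    """Number of customers whose slice has at least one bit set."""
    return sum(1 for bits in bitmap if any(bits))


# Hand-written bitmaps for an assumed toy database with three customers.
b_a = [[1, 0, 0], [0, 1], [1, 0]]
b_b = [[1, 1, 1], [1, 1], [1, 1]]

b_a_s = s_step_transform(b_a)   # Step 1: transformed bitmap B({a})_s
b_seq = bitmap_and(b_a_s, b_b)  # Step 2: bitmap of the sequence ({a}, {b})

print(b_a_s)           # [[0, 1, 1], [0, 0], [0, 1]]
print(b_seq)           # [[0, 1, 1], [0, 0], [0, 1]]
print(support(b_seq))  # 2
```

The second customer contributes nothing: its only a is in its last transaction, so no transaction remains in which b could follow.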

27 I-step Process. Support = 2

28 I-step Process
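An I-step needs no transformation: the bitmap of the current sequence is ANDed directly with the bitmap of the new item. The sketch below packs every customer's slice into a single Python integer so that the AND is one operation over the whole database. The slice width of 4 bits, the hand-written input bitmaps, and the names pack and support are assumptions made for illustration.

```python
SLICE = 4  # bits reserved per customer (assumes at most 4 transactions each)


def pack(slices):
    """Pack per-customer bit lists into one integer, SLICE bits per customer."""
    value = 0
    for c, bits in enumerate(slices):
        for j, bit in enumerate(bits):
            value |= bit << (c * SLICE + j)
    return value


def support(value, num_customers):
    """Count the customers whose slice is non-zero."""
    mask = (1 << SLICE) - 1
    return sum(1 for c in range(num_customers) if (value >> (c * SLICE)) & mask)


# Assumed bitmaps for three customers: the sequence ({a}, {b}) and the item d.
b_seq = pack([[0, 1, 1], [0, 0], [0, 1]])
b_d = pack([[1, 1, 1], [0, 0], [0, 1]])

b_new = b_seq & b_d  # bitmap of the itemset-extended sequence ({a}, {b, d})
print(support(b_new, 3))  # 2
```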

29 Experimental

30 Comparison with SPADE and PrefixSpan. Method 1: compare across various minimum support values on small, medium, and large datasets. Method 2: compare across several dataset parameters: number of customers, number of transactions per customer, number of items per transaction, and average length of the maximal sequences.

32 Conclusion & Discussion

33 CONCLUSION. ALGORITHM: outperforms SPADE and PrefixSpan on large datasets; faster than SPADE and PrefixSpan. DATA REPRESENTATION: bitmap representation; S-step/I-step traversal; S-step/I-step pruning; especially efficient when the sequential patterns are very long.

34 Implementing the SPAM algorithm: SPMF is an open-source data mining framework written in Java.

35 DISCUSSION 1. SPAM assumes that the entire database fits completely into main memory. What is the solution when it does not? 2. Why do they set the size of a sequence to be between 2^k + 1 and 2^(k+1)?

