Published by Brent Packard. Modified over 2 years ago.

1
Sequential PAttern Mining Using A Bitmap Representation
Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick
Dept. of Computer Science, Cornell University (SIGKDD 2002)
Presenters: 李佩書, 楊璨瑜, 陳奕廷, 李昕純
2014/11/20

2
Outline
1. Introduction
2. The SPAM algorithm
3. Data representation
4. Experiments
5. Conclusion & Discussion

3
Introduction

4
Sequential Patterns
- Introduced by R. Agrawal and R. Srikant (ICDE 1995)
- Algorithms: AprioriAll, AprioriSome, PrefixSpan, …

5
Problem

6
SPAM Algorithm
- SPAM: Sequential PAttern Mining
- The first depth-first search (DFS) strategy for mining sequential patterns
- Vertical bitmap representation for simple, efficient counting

7
The SPAM Algorithm

8
Lexicographic Tree
- Sequence-extended sequence (S-step): generated by adding a new transaction consisting of a single item to the end of the sequence
  Ex: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b}, {a})
- Itemset-extended sequence (I-step): generated by adding an item to the last itemset in the sequence
  Ex 1: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b, d})
  Ex 2: ({a, b, c}, {a, b, d}) → ({a, b, c}, {a, b, d, c})
- Each node n is associated with two sets:
  S_n: the set of candidate items for S-step extensions
  I_n: the set of candidate items for I-step extensions
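The two extension steps can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the names `s_extend`, `i_extend`, and `show`, and the choice to model a sequence as a tuple of frozensets, are assumptions made here.

```python
from typing import FrozenSet, Tuple

# A sequence is modeled as an ordered tuple of itemsets (assumed representation).
Seq = Tuple[FrozenSet[str], ...]


def s_extend(seq: Seq, item: str) -> Seq:
    """S-step: append a new transaction that holds the single item."""
    return seq + (frozenset({item}),)


def i_extend(seq: Seq, item: str) -> Seq:
    """I-step: add the item to the last itemset of the sequence."""
    return seq[:-1] + (seq[-1] | {item},)


def show(seq: Seq) -> str:
    """Render a sequence in the slide notation, items sorted alphabetically."""
    return "(" + ", ".join("{" + ", ".join(sorted(s)) + "}" for s in seq) + ")"


start: Seq = (frozenset("abc"), frozenset("ab"))
print(show(s_extend(start, "a")))  # ({a, b, c}, {a, b}, {a})
print(show(i_extend(start, "d")))  # ({a, b, c}, {a, b, d})
```

Each node of the lexicographic tree would call these helpers once per item in its S_n and I_n sets to produce its children.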

9
I = {a, b}

10
Pruning
- Apriori-based
- Minimizes the size of S_n and I_n
- Prunes candidates during the DFS
- S-step pruning
- I-step pruning

11
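S-step Pruning
Candidate sets at node ({a}):
S({a}) = {a, b, c, d}
I({a}) = {b, c, d}
Without pruning, the children would use:
S({a}, {a}) = S({a}, {b}) = {a, b, c, d}
I({a}, {a}) = {b, c, d}
I({a}, {b}) = {c, d}
By the Apriori principle, any item whose S-step extension of ({a}) is infrequent can be dropped from these sets.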

12
I-step Pruning
S({a, b}) = S({a, d}) = {a, b}
I({a}, {b}) = {c, d}
I({a}, {d}) = {}
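The pruning idea can be sketched as a function that computes the candidate sets handed to each child of a node. This is a hedged illustration, not the authors' code: `child_candidates` and the `is_frequent` callback are hypothetical names, and the choice of which extensions are infrequent is made up for the example.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Seq = Tuple[FrozenSet[str], ...]


def s_extend(seq: Seq, item: str) -> Seq:
    return seq + (frozenset({item}),)


def i_extend(seq: Seq, item: str) -> Seq:
    return seq[:-1] + (seq[-1] | {item},)


def child_candidates(
    seq: Seq, s_n: Set[str], i_n: Set[str], is_frequent: Callable[[Seq], bool]
) -> Dict[Seq, Tuple[Set[str], Set[str]]]:
    """Return {child: (S_child, I_child)} for every frequent child of seq."""
    # Keep only the items whose extension of this node is frequent.
    s_temp = sorted(i for i in s_n if is_frequent(s_extend(seq, i)))
    i_temp = sorted(i for i in i_n if is_frequent(i_extend(seq, i)))
    children: Dict[Seq, Tuple[Set[str], Set[str]]] = {}
    for i in s_temp:
        # S-step child: I-candidates are the surviving S items greater than i.
        children[s_extend(seq, i)] = (set(s_temp), {j for j in s_temp if j > i})
    for i in i_temp:
        # I-step child: I-candidates are the surviving I items greater than i.
        children[i_extend(seq, i)] = (set(s_temp), {j for j in i_temp if j > i})
    return children


A: Seq = (frozenset({"a"}),)
# Assumption for illustration: these three extensions of ({a}) are infrequent.
INFREQUENT = {s_extend(A, "c"), s_extend(A, "d"), i_extend(A, "c")}
children = child_candidates(
    A, {"a", "b", "c", "d"}, {"b", "c", "d"}, lambda s: s not in INFREQUENT
)
for child, (s_set, i_set) in children.items():
    print([sorted(x) for x in child], sorted(s_set), sorted(i_set))
```

Under these assumptions every child receives S = {a, b}, so items c and d are never tried again as S-step extensions below node ({a}).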

14
Data Representation

15
- If the size of a sequence is between $2^k + 1$ and $2^{k+1}$, it is treated as a $2^{k+1}$-bit sequence
- Each candidate sequence is stored as a vertical bitmap
- Each customer is assigned a fixed slice of each bitmap for all of its transactions
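The sizing rule amounts to rounding a customer's transaction count up to the next power of two. A minimal sketch follows; the function name `slice_bits` is an assumption, and any minimum slice size that a real implementation imposes is not modeled.

```python
def slice_bits(n_transactions: int) -> int:
    """Smallest power of two that can hold n_transactions bits."""
    if n_transactions >= 1:
        bits = 1
        while n_transactions > bits:
            bits *= 2
        return bits
    raise ValueError("a customer needs at least one transaction")


# Sizes from 2^2 + 1 = 5 up to 2^3 = 8 all map to an 8-bit slice.
print([slice_bits(n) for n in (5, 6, 7, 8)])  # [8, 8, 8, 8]
print(slice_bits(9))  # 16
```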

16
Bitmap of itemset
B({a}) & B({b}) = B({a, b})
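A small sketch of how item bitmaps might be built and ANDed into an itemset bitmap. The toy database `DB`, the 4-bit slice size, and all function names are assumptions for illustration. Bitmaps are plain lists of 0/1 for readability, where a real implementation would pack them into machine words.

```python
SLICE = 4  # bits reserved per customer (assumed)

# Toy data (assumption): customer id -> transactions in time order.
DB = {
    1: [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    2: [{"b"}, {"a", "b", "c"}],
    3: [{"a", "b"}, {"b", "c", "d"}],
}


def item_bitmap(item):
    """One bit per transaction slot, set when the transaction contains the item."""
    bits = []
    for cid in sorted(DB):
        # Pad each customer's transactions to the fixed slice size.
        padded = DB[cid] + [set()] * (SLICE - len(DB[cid]))
        bits.extend(1 if item in txn else 0 for txn in padded)
    return bits


def bitmap_and(x, y):
    """Bitwise AND of two bitmaps of equal length."""
    return [p & q for p, q in zip(x, y)]


def fmt(bits):
    """Group the bits by customer slice for printing."""
    return " ".join(
        "".join(map(str, bits[i:i + SLICE])) for i in range(0, len(bits), SLICE)
    )


b_a, b_b = item_bitmap("a"), item_bitmap("b")
print(fmt(b_a))                   # 1000 0100 1000
print(fmt(b_b))                   # 1110 1100 1100
print(fmt(bitmap_and(b_a, b_b)))  # 1000 0100 1000
```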

17
Bitmap of sequence
Define B(s) as the bitmap for sequence s.
For each transaction j: if the last itemset of s is in transaction j and the other itemsets of s are in transactions before j, set bit j to 1; otherwise set it to 0.
Example 1:
Customer ID  Transaction ID  Itemset
1            1               {b}
1            2               {d}
1            3               {e}
1            4               {c}
Sequence: ({b}, {c})

24
Example 2:
Customer ID  Transaction ID  Itemset
1            1               {a, b, d}
1            3               {b, c, d}
Sequence: ({a}, {b, d})
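The definition of B(s) can be checked directly against both examples with a naive sketch. This computes the bits straight from the definition rather than by the S-step/I-step bitmap operations; the function names are assumptions, and only the rows shown in the example tables are used.

```python
def occurs_in_order(prefix, txns):
    """True if the itemsets of prefix appear, in order, in distinct transactions."""
    remaining = iter(txns)
    # any() consumes the iterator up to the first match, so each itemset
    # must be found strictly after the previous one.
    return all(any(itemset.issubset(t) for t in remaining) for itemset in prefix)


def sequence_bits(seq, txns):
    """Bits of B(seq) for one customer, one bit per transaction."""
    *prefix, last = seq
    return [
        1 if last.issubset(txns[j]) and occurs_in_order(prefix, txns[:j]) else 0
        for j in range(len(txns))
    ]


# Example 1: customer 1 with transactions {b}, {d}, {e}, {c}.
ex1 = sequence_bits([{"b"}, {"c"}], [{"b"}, {"d"}, {"e"}, {"c"}])
print(ex1)  # [0, 0, 0, 1]

# Example 2: the two listed transactions of customer 1.
ex2 = sequence_bits([{"a"}, {"b", "d"}], [{"a", "b", "d"}, {"b", "c", "d"}])
print(ex2)  # [0, 1]
```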

25
S-step Process
Step 1: Apply the S-step process to B({a}) to construct the transformed bitmap B({a})_s
Step 2: AND B({a})_s with B({b})
Support = 2
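The two steps can be sketched as follows, using the S-step transformation described in the SPAM paper: within each customer's slice, every bit up to and including the first 1 is cleared and every later bit is set. The input bitmaps are toy values assumed for illustration, with 4-bit slices for three customers.

```python
SLICE = 4  # bits per customer slice (assumed)


def s_step_transform(bits, slice_size=SLICE):
    """Per slice: clear bits up to the first 1 (inclusive), set all later bits."""
    out = []
    for start in range(0, len(bits), slice_size):
        seen_one = False
        for bit in bits[start:start + slice_size]:
            out.append(1 if seen_one else 0)
            seen_one = seen_one or bit == 1
    return out


def bitmap_and(x, y):
    return [p & q for p, q in zip(x, y)]


def support(bits, slice_size=SLICE):
    """Number of customers whose slice contains at least one 1."""
    return sum(
        1 for s in range(0, len(bits), slice_size) if any(bits[s:s + slice_size])
    )


# Toy bitmaps (assumption): three customers, slices shown left to right.
b_a = [1, 0, 0, 0,  0, 1, 0, 0,  1, 0, 0, 0]  # B({a})
b_b = [1, 1, 1, 0,  1, 1, 0, 0,  1, 1, 0, 0]  # B({b})

b_a_s = s_step_transform(b_a)   # Step 1: 0111 0011 0111
b_seq = bitmap_and(b_a_s, b_b)  # Step 2: 0110 0000 0100 = B(({a}, {b}))
print(b_seq, support(b_seq))    # support is 2
```

Support is counted per customer, not per bit, which is why the two 1 bits in the first slice contribute only once.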

27
I-step Process
Support = 2
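Unlike the S-step, the I-step needs no transformation: the sequence bitmap is ANDed directly with the item bitmap. A minimal sketch follows; the bitmaps are toy values assumed for illustration (4-bit slices, three customers), extending ({a}, {b}) to ({a}, {b, d}).

```python
SLICE = 4  # bits per customer slice (assumed)


def bitmap_and(x, y):
    return [p & q for p, q in zip(x, y)]


def support(bits, slice_size=SLICE):
    """Number of customers whose slice contains at least one 1."""
    return sum(
        1 for s in range(0, len(bits), slice_size) if any(bits[s:s + slice_size])
    )


# Toy bitmaps (assumption).
b_seq = [0, 1, 1, 0,  0, 0, 0, 0,  0, 1, 0, 0]  # B(({a}, {b}))
b_d = [1, 1, 1, 0,  0, 0, 0, 0,  0, 1, 0, 0]    # B({d})

b_ext = bitmap_and(b_seq, b_d)  # B(({a}, {b, d}))
print(b_ext, support(b_ext))    # support is 2
```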

29
Experiments

30
Comparison With SPADE and PrefixSpan
Method 1: compare for various minimum support values on
- Small datasets
- Medium datasets
- Large datasets
Method 2: compare while varying several parameters of the dataset
- Number of customers
- Number of transactions per customer
- Number of items per transaction
- Average length of the maximal sequences

32
Conclusion & Discussion

33
CONCLUSION
ALGORITHM
- Outperforms SPADE and PrefixSpan on large datasets
- Faster than SPADE and PrefixSpan
DATA REPRESENTATION
- Bitmap representation
- S-step/I-step traversal
- S-step/I-step pruning
Especially efficient when the sequential patterns are very long

34
Implementation of the SPAM algorithm
- SPMF is an open-source data mining framework
- Written in Java

35
DISCUSSION
1. SPAM assumes that the entire database fits completely into main memory. What is the solution when it does not?
2. Why is the size of a sequence set between $2^k + 1$ and $2^{k+1}$?
