# Sequential PAttern Mining using A Bitmap Representation


2014/11/20

Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick, Dept. of Computer Science, Cornell University (SIGKDD 2002)

Presenters: 李佩書 P, 楊璨瑜 P, 陳奕廷 P, 李昕純 Q

### Outline

- Introduction
- The SPAM algorithm
- Data representation
- Experimental
- Conclusion & Discussion

## Introduction

### Sequential Patterns

Introduced by R. Agrawal and R. Srikant (ICDE 1995).

Algorithms: AprioriAll, AprioriSome, PrefixSpan, …

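### Problem

Mining sequential patterns: given a sequence database $D$ and a minimum support threshold $\mathit{minSup}$, find all frequent sequential patterns, i.e., every sequence $S_a$ with $\mathrm{sup}_D(S_a) \ge \mathit{minSup}$, where $\mathrm{sup}_D(S_a)$ is the number of customer sequences in $D$ that contain $S_a$.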
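The support definition can be made concrete with a naive counter. This is a minimal sketch, assuming each customer sequence is a Python list of item sets (one set per transaction, in time order) and that minSup is an absolute count; the names `contains`, `support`, and `is_frequent` and the two-customer database are hypothetical.

```python
from typing import List, Set, Tuple

Pattern = Tuple[Set[str], ...]        # ordered itemsets, e.g. ({"a"}, {"b"})
Customer = List[Set[str]]             # transactions of one customer, in order

def contains(customer: Customer, pattern: Pattern) -> bool:
    """Greedy subsequence test: each itemset of the pattern must be a subset of
    a distinct transaction, and those transactions must appear in order."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer) and not itemset <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1                      # the next itemset must match a later transaction
    return True

def support(db: List[Customer], pattern: Pattern) -> int:
    """sup_D(pattern): number of customer sequences in db that contain pattern."""
    return sum(contains(customer, pattern) for customer in db)

def is_frequent(db: List[Customer], pattern: Pattern, min_sup: int) -> bool:
    return support(db, pattern) >= min_sup

db = [
    [{"a", "b", "d"}, {"b", "c", "d"}],   # hypothetical customer 1
    [{"b"}, {"a", "b", "c"}],             # hypothetical customer 2
]
print(support(db, ({"a"}, {"b"})))        # 1
```

Customer 2 does not support ({a}, {b}) because its only {b}-containing transaction after the one with {a} does not exist; matching the earliest possible transaction at each step is enough to decide containment.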

### SPAM Algorithm

SPAM (Sequential PAttern Mining):

- The first depth-first search (DFS) strategy for mining sequential patterns
- A vertical bitmap representation for simple, efficient support counting

## The SPAM Algorithm

### Lexicographic Tree

- Sequence-extended sequence (S-step): generated by adding a new transaction consisting of a single item to the end of the sequence.
  - Ex: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b}, {a})
- Itemset-extended sequence (I-step): generated by adding an item to the last itemset of the sequence; the added item must be lexicographically greater than every item already in that itemset.
  - Ex 1: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b, d})
  - Ex 2: ({a, b, c}, {a, b, d}) → ({a, b, c}, {a, b, d, c}) is not a valid I-step, because c is not greater than d.
- Each node n has two candidate sets:
  - $S_n$: the set of candidate items for S-step extensions
  - $I_n$: the set of candidate items for I-step extensions
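The two extension rules can be sketched as small functions. This assumes sequences are tuples of itemsets and each itemset is a tuple kept in lexicographic order; `s_extend` and `i_extend` are hypothetical names, not part of any SPAM implementation.

```python
from typing import Tuple

Itemset = Tuple[str, ...]       # items stored in lexicographic order
Sequence = Tuple[Itemset, ...]

def s_extend(seq: Sequence, item: str) -> Sequence:
    """S-step: append a new transaction consisting of the single item."""
    return seq + ((item,),)

def i_extend(seq: Sequence, item: str) -> Sequence:
    """I-step: add the item to the last itemset; it must exceed every item there."""
    if not seq:
        raise ValueError("an I-step needs a non-empty sequence")
    last = seq[-1]
    if item <= last[-1]:        # last[-1] is the largest item of the sorted itemset
        raise ValueError(f"{item!r} is not greater than {last[-1]!r}")
    return seq[:-1] + (last + (item,),)

print(s_extend((("a", "b", "c"), ("a", "b")), "a"))  # (('a', 'b', 'c'), ('a', 'b'), ('a',))
print(i_extend((("a", "b", "c"), ("a", "b")), "d"))  # (('a', 'b', 'c'), ('a', 'b', 'd'))
```

Rejecting items that are not larger than the current maximum is what makes every sequence reachable by exactly one path in the tree, so calling `i_extend` with Ex 2's arguments raises `ValueError`.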

[Figure: lexicographic sequence tree for I = {a, b}]
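Since the tree is infinite, a figure can only show its top levels. The sketch below lists the tree for a given item set in depth-first (pre-order) order, S-step children before I-step children, and stops once a sequence holds `max_items` items in total; the cut-off and the function name `enumerate_tree` are assumptions made for illustration.

```python
from typing import List, Tuple

Sequence = Tuple[Tuple[str, ...], ...]

def enumerate_tree(items: List[str], max_items: int) -> List[Sequence]:
    """Pre-order listing of the lexicographic sequence tree, truncated so that
    no listed sequence contains more than max_items items in total."""
    items = sorted(items)
    out: List[Sequence] = []

    def visit(seq: Sequence, size: int) -> None:
        if size == max_items:
            return
        for i in items:                              # S-step children
            child = seq + ((i,),)
            out.append(child)
            visit(child, size + 1)
        if seq:
            for i in items:                          # I-step children
                if i > seq[-1][-1]:
                    child = seq[:-1] + (seq[-1] + (i,),)
                    out.append(child)
                    visit(child, size + 1)

    visit((), 0)                                     # the root is the empty sequence
    return out

for node in enumerate_tree(["a", "b"], 2):
    print(node)
```

For I = {a, b} and two items this yields ({a}), ({a},{a}), ({a},{b}), ({a,b}), ({b}), ({b},{a}), ({b},{b}); node ({b}) has no I-step child because no item is greater than b.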

### Pruning

Pruning is Apriori-based: if a sequence is infrequent, no sequence that contains it can be frequent. SPAM applies this during the DFS to minimize the size of the candidate sets $S_n$ and $I_n$, using two techniques:

- S-step pruning
- I-step pruning
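Putting the tree traversal and both pruning rules together gives a small depth-first miner. This is a sketch under stated assumptions: support is counted by scanning the database for every candidate (SPAM itself uses bitmaps), minSup is an absolute count, and `dfs_pruning`, `mine`, and the toy database are hypothetical. Each node tests its candidates once, and only the frequent ones (`s_temp`, `i_temp`) are passed down to its children.

```python
from typing import List, Set, Tuple

Itemset = Tuple[str, ...]          # items kept in sorted order
Sequence = Tuple[Itemset, ...]
Database = List[List[Set[str]]]    # customers -> ordered transactions

def contains(customer: List[Set[str]], pattern: Sequence) -> bool:
    pos = 0
    for itemset in pattern:
        while pos < len(customer) and not set(itemset) <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True

def support(db: Database, pattern: Sequence) -> int:
    return sum(contains(c, pattern) for c in db)

def s_ext(node: Sequence, i: str) -> Sequence:
    return node + ((i,),)

def i_ext(node: Sequence, i: str) -> Sequence:
    return node[:-1] + (node[-1] + (i,),)

def dfs_pruning(db: Database, node: Sequence, s_cands: List[str],
                i_cands: List[str], min_sup: int, out: List[Sequence]) -> None:
    # S-step and I-step pruning: keep only candidates whose extension is frequent.
    s_temp = [i for i in s_cands if support(db, s_ext(node, i)) >= min_sup]
    i_temp = [i for i in i_cands if support(db, i_ext(node, i)) >= min_sup]
    for i in s_temp:
        child = s_ext(node, i)
        out.append(child)
        dfs_pruning(db, child, s_temp, [j for j in s_temp if j > i], min_sup, out)
    for i in i_temp:
        child = i_ext(node, i)
        out.append(child)
        dfs_pruning(db, child, s_temp, [j for j in i_temp if j > i], min_sup, out)

def mine(db: Database, min_sup: int) -> List[Sequence]:
    items = sorted({i for c in db for t in c for i in t})
    freq = [i for i in items if support(db, ((i,),)) >= min_sup]
    out: List[Sequence] = []
    for i in freq:                     # one DFS per frequent 1-item sequence
        node = ((i,),)
        out.append(node)
        dfs_pruning(db, node, freq, [j for j in freq if j > i], min_sup, out)
    return out

db = [[{"a", "b"}, {"b"}], [{"a"}, {"b"}]]
print(mine(db, 2))   # [(('a',),), (('a',), ('b',)), (('b',),)]
```

Recursion stops on its own: a sequence with more transactions than any customer has support 0, so its candidate lists become empty.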

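### S-step Pruning

At node ({a}), let S({a}) = {a, b, c, d} and I({a}) = {b, c, d}. Before pruning, the S-step children inherit these candidates: S({a}, {a}) = S({a}, {b}) = {a, b, c, d}, I({a}, {a}) = {b, c, d}, and I({a}, {b}) = {c, d}. If the S-extensions ({a}, {c}) and ({a}, {d}) are infrequent, then no extension of ({a}, {a}) or ({a}, {b}) that uses c or d can be frequent either, because each would contain ({a}, {c}) or ({a}, {d}). Pruning c and d gives S({a}, {a}) = S({a}, {b}) = {a, b}, I({a}, {a}) = {b}, and I({a}, {b}) = {}.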
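The S-step pruning rule for node ({a}) with S({a}) = {a, b, c, d} can be sketched as follows. The frequent S-extensions are passed in as an assumption (here, hypothetically, only ({a}, {a}) and ({a}, {b}) are frequent); `s_step_prune` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

def s_step_prune(
    s_candidates: List[str], frequent: Set[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each frequent S-step child (keyed by the appended item i), return its
    (S_child, I_child) candidate sets after S-step pruning."""
    s_temp = [i for i in s_candidates if i in frequent]   # drop infrequent S-extensions
    return {i: (list(s_temp), [j for j in s_temp if j > i]) for i in s_temp}

children = s_step_prune(["a", "b", "c", "d"], {"a", "b"})
print(children)  # {'a': (['a', 'b'], ['b']), 'b': (['a', 'b'], [])}
```

Every child receives the same pruned S-set, while its I-set keeps only the surviving items greater than the item just appended.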

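### I-step Pruning

Continuing at node ({a}), the I-step children are ({a, b}), ({a, c}), and ({a, d}), and they inherit the S-step-pruned set: S({a, b}) = S({a, d}) = {a, b}. Before I-step pruning, I({a, b}) = {c, d} and I({a, d}) = {}. If ({a, c}) is infrequent, then ({a, b, c}) cannot be frequent either, so c is removed and I({a, b}) = {d}.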
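The I-step case can be sketched the same way. This assumes I({a}) = {b, c, d}, that the S-candidates were already pruned to {a, b}, and, hypothetically, that only ({a, b}) and ({a, d}) are frequent; `i_step_prune` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

def i_step_prune(
    i_candidates: List[str], frequent: Set[str], s_temp: List[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each frequent I-step child (keyed by the added item i), return its
    (S_child, I_child): S_child is the parent's already-pruned S-step set and
    I_child keeps only frequent I-step items greater than i."""
    i_temp = [i for i in i_candidates if i in frequent]   # I-step pruning
    return {i: (list(s_temp), [j for j in i_temp if j > i]) for i in i_temp}

children = i_step_prune(["b", "c", "d"], {"b", "d"}, ["a", "b"])
print(children)  # {'b': (['a', 'b'], ['d']), 'd': (['a', 'b'], [])}
```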


## Data Representation

We store each candidate sequence as a vertical bitmap, with one bit per transaction in the dataset. Each customer is assigned a fixed slice of every bitmap that covers all of its transactions. If a customer's sequence has between $2^k + 1$ and $2^{k+1}$ transactions, its slice is $2^{k+1}$ bits long.
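The slice-size rule and the per-item vertical bitmaps can be sketched as below. This assumes bitmaps are plain Python lists of 0/1 (a real implementation would pack bits into machine words), that padding bits stay 0, and that customers are laid out one after another; `slice_bits` and `build_item_bitmaps` are hypothetical names.

```python
from typing import Dict, List, Set

def slice_bits(n_transactions: int) -> int:
    """Bits reserved for a customer with n transactions: 2**(k+1) when
    2**k + 1 <= n <= 2**(k+1), i.e. the smallest power of two >= n."""
    if n_transactions < 1:
        raise ValueError("a customer needs at least one transaction")
    return 1 << (n_transactions - 1).bit_length()

def build_item_bitmaps(db: List[List[Set[str]]]) -> Dict[str, List[int]]:
    """One vertical bitmap per item: a bit is 1 iff the item occurs in that
    transaction. Each customer occupies a fixed, padded slice."""
    total = sum(slice_bits(len(c)) for c in db)
    items = sorted({i for c in db for t in c for i in t})
    bitmaps = {i: [0] * total for i in items}
    offset = 0
    for customer in db:
        for j, transaction in enumerate(customer):
            for i in transaction:
                bitmaps[i][offset + j] = 1
        offset += slice_bits(len(customer))   # jump to the next customer's slice
    return bitmaps

db = [[{"a", "b"}, {"b"}, {"a"}], [{"b"}]]   # hypothetical: 3 and 1 transactions
print(build_item_bitmaps(db))  # {'a': [1, 0, 1, 0, 0], 'b': [1, 1, 0, 0, 1]}
```

With fixed slices, a customer's bits always start at a known offset, so every bitmap in the search can be ANDed position by position without any per-customer bookkeeping.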

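### Bitmap of an Itemset

The bitmap of an itemset is the bitwise AND (&) of its items' bitmaps, e.g., B({a, b}) = B({a}) & B({b}).

[Figure: example bitmaps of {a}, {b}, and {a, b}]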
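A minimal sketch of the itemset bitmap, assuming bitmaps are equal-length Python lists of 0/1; the bitmap values below are hypothetical, not taken from the slide's figure.

```python
from typing import List

def and_bitmaps(x: List[int], y: List[int]) -> List[int]:
    """Bitwise AND of two equal-length vertical bitmaps."""
    if len(x) != len(y):
        raise ValueError("bitmaps must have the same length")
    return [p & q for p, q in zip(x, y)]

B_a = [1, 0, 1, 0]            # hypothetical bitmap of {a}
B_b = [1, 1, 0, 0]            # hypothetical bitmap of {b}
B_ab = and_bitmaps(B_a, B_b)  # {a, b} occurs only where both bits are 1
print(B_ab)                   # [1, 0, 0, 0]
```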

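### Bitmap of a Sequence

Define B(s) as the bitmap for sequence s. The bit for transaction j is set to 1 if the last itemset of s is contained in transaction j and all other itemsets of s appear, in order, in earlier transactions of the same customer; otherwise it is set to 0.

Example 1: s = ({b}, {c}), for a single customer

| Transaction ID | Itemset |
|---|---|
| 1 | {b} |
| 2 | {d} |
| 3 | {e} |
| 4 | {c} |

Only transaction 4 contains {c} with {b} in an earlier transaction, so B(({b}, {c})) has a 1 only in the bit for transaction 4.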

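The definition of B(s) can be checked directly for one customer. This sketch assumes the customer's transactions are a list of Python sets and computes each bit from the definition rather than with SPAM's bit operations; `contains` and `sequence_bitmap` are hypothetical names.

```python
from typing import List, Set, Tuple

Pattern = Tuple[Set[str], ...]

def contains(transactions: List[Set[str]], pattern: Pattern) -> bool:
    """True if the pattern's itemsets occur, in order, in distinct transactions."""
    pos = 0
    for itemset in pattern:
        while pos < len(transactions) and not itemset <= transactions[pos]:
            pos += 1
        if pos == len(transactions):
            return False
        pos += 1
    return True

def sequence_bitmap(transactions: List[Set[str]], s: Pattern) -> List[int]:
    """Bit j is 1 iff transaction j contains the last itemset of s and all
    earlier itemsets of s occur, in order, in transactions before j."""
    *prefix, last = s
    return [
        int(last <= t and contains(transactions[:j], tuple(prefix)))
        for j, t in enumerate(transactions)
    ]

customer = [{"b"}, {"d"}, {"e"}, {"c"}]              # Example 1
print(sequence_bitmap(customer, ({"b"}, {"c"})))     # [0, 0, 0, 1]
```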

Example 2: s = ({a}, {b, d})

| Transaction ID | Itemset |
|---|---|
| 1 | {a, b, d} |
| 3 | {b, c, d} |
| 6 | -- |

### S-step Process

- Step 1: Build the transformed bitmap $B(\{a\})_s$: in each customer's slice, every bit up to and including the first 1 bit of B({a}) is set to 0, and every later bit is set to 1.
- Step 2: AND the transformed bitmap $B(\{a\})_s$ with the item bitmap B({b}) to obtain B(({a}, {b})).

In this example, the support of ({a}, {b}) is 2.
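The two S-step operations can be sketched on per-customer slices. This assumes a bitmap is a list of slices (one list of 0/1 per customer) and that support is the number of slices containing at least one 1; the three-customer bitmaps are hypothetical, and `s_transform`, `and_bitmaps`, and `support` are illustrative names.

```python
from typing import List

Bitmap = List[List[int]]   # one list of bits per customer slice

def s_transform(bitmap: Bitmap) -> Bitmap:
    """Per slice: zero every bit up to and including the first 1, set every later bit to 1."""
    out = []
    for slice_ in bitmap:
        if 1 in slice_:
            k = slice_.index(1)
            out.append([0] * (k + 1) + [1] * (len(slice_) - k - 1))
        else:
            out.append([0] * len(slice_))
    return out

def and_bitmaps(x: Bitmap, y: Bitmap) -> Bitmap:
    return [[p & q for p, q in zip(sx, sy)] for sx, sy in zip(x, y)]

def support(bitmap: Bitmap) -> int:
    """Number of customers whose slice has at least one 1 bit."""
    return sum(1 in slice_ for slice_ in bitmap)

# Hypothetical item bitmaps for three customers (slices of 4, 2 and 2 bits).
B_a = [[1, 0, 0, 0], [0, 1], [1, 0]]
B_b = [[1, 1, 1, 0], [1, 1], [1, 1]]

B_a_b = and_bitmaps(s_transform(B_a), B_b)   # bitmap of ({a}, {b})
print(B_a_b, support(B_a_b))                 # [[0, 1, 1, 0], [0, 0], [0, 1]] 2
```

The transform marks every transaction that comes after the customer's first occurrence of the prefix, so a single AND with the item bitmap finds each later transaction that contains b.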

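### I-step Process

An I-step needs no transformation: the bitmap of the extended sequence is the bitwise AND of the current sequence's bitmap and the bitmap of the added item. In this example, the support of the resulting sequence is 2.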
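A matching sketch of the I-step, with the same assumptions as before (per-customer slices of 0/1, support = slices with a 1 bit) and hypothetical bitmaps for ({a}, {b}) and item d.

```python
from typing import List

Bitmap = List[List[int]]   # one list of bits per customer slice

def i_step(seq_bitmap: Bitmap, item_bitmap: Bitmap) -> Bitmap:
    """I-step: AND the sequence bitmap with the item bitmap; no transform needed."""
    return [[p & q for p, q in zip(s, t)] for s, t in zip(seq_bitmap, item_bitmap)]

def support(bitmap: Bitmap) -> int:
    return sum(1 in slice_ for slice_ in bitmap)

B_a_b = [[0, 1, 1, 0], [0, 0], [0, 1]]   # hypothetical bitmap of ({a}, {b})
B_d = [[1, 1, 1, 0], [0, 0], [0, 1]]     # hypothetical bitmap of item d
B_a_bd = i_step(B_a_b, B_d)              # bitmap of ({a}, {b, d})
print(B_a_bd, support(B_a_bd))           # [[0, 1, 1, 0], [0, 0], [0, 1]] 2
```

A 1 bit in B(({a}, {b})) already says that transaction j holds the last itemset with the prefix earlier, so ANDing with d only adds the requirement that d is in the same transaction j.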

## Experimental

### Comparison with SPADE and PrefixSpan

- Method 1: compare performance for various minimum support values on small, medium, and large datasets.
- Method 2: vary several parameters of the dataset:
  - number of customers
  - number of transactions per customer
  - number of items per transaction
  - average length of the maximal sequences


## Conclusion & Discussion

### Conclusion

- Algorithm
  - Outperforms (runs faster than) SPADE and PrefixSpan on large datasets
- Data representation
  - Bitmap representation
  - S-step/I-step traversal
  - S-step/I-step pruning
  - Especially efficient when the sequential patterns are very long

### Implementing the SPAM Algorithm

SPMF is an open-source data mining framework written in Java that provides an implementation of SPAM.

Philippe Fournier-Viger, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Cheng-Wei Wu, and Vincent S. Tseng, "SPMF: A Java Open-Source Pattern Mining Library," accepted and to appear in Journal of Machine Learning Research.

### Discussion

- SPAM assumes that the entire database fits completely into main memory. What is the solution when it does not?
- Why is a customer's sequence with between $2^k + 1$ and $2^{k+1}$ transactions stored as a $2^{k+1}$-bit slice?