Presentation is loading. Please wait.

Presentation is loading. Please wait.

1/16 PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth

Similar presentations


Presentation on theme: "1/16 PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth"— Presentation transcript:

1 1/16 PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth

2 2/16 2. Problem statement 5. Algorithm 1. Introduction 3. Existing Solutions 4. Proposed Solution 6. Conclusion 7. References Contet

3 3/16 Introduction Given a set of sequences, where each sequence consists of a list of elements and each element consists of set of items. ◦ - 5 elements, 9 items ◦ - 9-sequence ◦ ≠ id Sequence

4 4/16 Subsequence vs. super sequence Given two sequences α = and β =. α is called a subsequence of β, denoted as α ⊆ β, if there exist integers 1≤j 1

5 5/16 Sequential Pattern Mining Find all the frequent subsequences, i.e. the subsequences whose occurrence frequency in the set of sequences is no less than min_support (user-specified). id Sequence Solution – 53 frequent subsequences: min_support = 2

6 6/16 Existing Solutions Apriori-like approaches (AprioriSome (1995.), AprioriAll (1995.), DynamicSome (1995.), GSP (1996.)): ◦ Potentially huge set of candidate sequences, ◦ „ Multiple scans of databases, ◦ „ Difficulties at mining long sequential patterns. FreeSpan (2000.) - pattern groth method (Frequent pattern-projected Sequential pattern mining) General idea is to use frequent items to recursively project sequence databases into a smaller projected databases, and grow subsequence fragments in each projected database. PrefixSpan (Prefix-projected Sequential pattern mining) ◦ „ Less projections and quickly shrinking sequence.

7 7/16 Prefix Given two sequences α = and β =, m≤n. Sequence β is called a prefix of α if and only if: ◦ b i = a i for i ≤ m-1; ◦ b m ⊆ a m ; Example : ◦ α = ◦ β =

8 8/16 Projection Given sequences α and β, such that β is a subsequence of α. A subsequence α ’ of sequence α is called a projection of α w.r.t. β prefix if and only if: ◦ α ’ has prefix β ; ◦ There exist no proper super-sequence α ’’ of α ’ such that: α ’’ is a subsequence of α and also has prefix β. Example: α = β = α ’ =

9 9/16 Postfix Let α ’ = be the projection of α w.r.t. prefix β = (m ≤n) „. Sequence γ = is called the postfix of α w.r.t. prefix β, denoted as γ = α / β, where a’’ m =(a m - a’ m ).„ We also denote α = β ⋅ γ. Example: α ’ =, β =, γ =.

10 10/16 PrefixSpan – Algorithm Input of the algorithm : A sequence database S, and the minimum support threshold min_support. „Output of the algorithm: The complete set of sequential patterns. „Subroutine: PrefixSpan( α, L, S| α ). Parameters: ◦ α : sequential pattern, ◦ „L: the length of α ; ◦ „ S| α : : the α -projected database, if α ≠<>; otherwise; the sequence database S. Call PrefixSpan(<>,0,S). id Sequence

11 11/16 PrefixSpan – Algorithm (2) Method: 1. Scan S | α once, find the set of frequent items b such that: ◦ b can be assembled to the last element of α to form a sequential pattern; or ◦ can be appended to α to form a sequential pattern. 2. For each frequent item b: ◦ append it to α to form a sequential pattern α ’ and output α ’; ◦ output α ’; 3. For each α ’: ◦ construct α ’-projected database S | α ’ and ◦ call PrefixSpan( α ’, L+1, S | α ’ ).

12 12/16 1. Find length1sequential patterns: 2. Divide search space Prefix id Sequence PrefixSpan - Example

13 13/16 PrefixSpan – Example (2) Find subsets of sequential patterns: <>

14 14/16 Conclusions PrefixSpan ◦ „ Efficient pattern growth method. ◦ „ Outperforms both GSP and FreeSpan. ◦ „ Explores prefix-projection in sequential pattern mining. ◦ „ Mines the complete set of patterns, but reduces the effort of candidate subsequence generation. ◦ „ Prefix-projection reduces the size of projected database and leads to efficient processing. ◦ „ Bi-level projection and pseudo-projection may improve mining efficiency.

15 15/16 References Pei J., Han J., Mortazavi-Asl J., Pinto H., Chen Q., Dayal U., Hsu M., “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth”, 17th International Conference on Data Engineering (ICDE), April Agrawal R., Srikant R., “Mining sequential patterns”, Proceedings 1995 Int. Conf. Very Large Data Bases (VLDB’94), pp , Han J., Dong G., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., ”Freespan: Frequent pattern-projected sequential pattern mining”, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pp , Wojciech Stach, “http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput /slides/PrefixSpan-Wojciech.pdf”.

16 16/16 Thank you for attention. Questions? Lazar Arsić 2011/3301


Download ppt "1/16 PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth"

Similar presentations


Ads by Google