Download presentation

Published byAmira Jerry Modified over 2 years ago

1
PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth

2
**Contet 1. Introduction 2. Problem statement 3. Existing Solutions**

4. Proposed Solution 5. Algorithm 6. Conclusion 7. References

3
**Introduction Given a set of sequences, where**

each sequence consists of a list of elements and each element consists of set of items. <a(abc)(ac)d(cf)> - 5 elements, 9 items <a(abc)(ac)d(cf)> - 9-sequence <a(abc)(ac)d(cf)> ≠ <a(ac)(abc)d(cf)> id Sequence 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc>

4
**Subsequence vs. super sequence**

Given two sequences α=<a1a2…an> and β=<b1b2…bm>. α is called a subsequence of β, denoted as α⊆ β, if there exist integers 1≤j1<j2<…<jn≤m such that a1⊆bj1, a2 ⊆bj2,…, an⊆bjn . β is a super sequence of α. Example: β =<a(abc)(ac)d(cf)> Correct : α1=<aa(ac)d(c)> α2=<(ac)(ac)d(cf)> α3=<ac> Not correct: α4=<df(cf)> α5=<(cf)d> α6=<(abc)dcf

5
**Sequential Pattern Mining**

Find all the frequent subsequences, i.e. the subsequences whose occurrence frequency in the set of sequences is no less than min_support (user-specified). Solution – 53 frequent subsequences: <a><aa> <ab> <a(bc)> <a(bc)a> <aba> <abc> <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <ac> <aca> <acb> <acc> <ad> <adc> <af> <b> <ba> <bc> <(bc)> <(bc)a> <bd> <bdc> <bf> <c> <ca> <cb> <cc> <d> <db> <dc> <dcb> <e> <ea> <eab> <eac> <eacb> <eb> <ebc> <ec> <ecb> <ef> <efb> <efc> <efcb> <f> <fb> <fbc> <fc> <fcb> id Sequence 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> min_support = 2

6
Existing Solutions Apriori-like approaches (AprioriSome (1995.), AprioriAll (1995.), DynamicSome (1995.), GSP (1996.)): Potentially huge set of candidate sequences, Multiple scans of databases, Difficulties at mining long sequential patterns. FreeSpan (2000.) - pattern groth method (Frequent pattern-projected Sequential pattern mining) General idea is to use frequent items to recursively project sequence databases into a smaller projected databases , and grow subsequence fragments in each projected database. PrefixSpan (Prefix-projected Sequential pattern mining) Less projections and quickly shrinking sequence.

7
Prefix Given two sequences α=<a1a2…an> and β=<b1b2…bm>, m≤n. Sequence β is called a prefix of α if and only if: bi= ai for i ≤ m-1; bm ⊆ am; Example : α =<a(abc)(ac)d(cf)> β =<a(abc)a>

8
**Projection Given sequences α and β, such that β is a subsequence of α.**

A subsequence α’ of sequence α is called a projection of α w.r.t. β prefix if and only if: α’ has prefix β; There exist no proper super-sequence α’’ of α’ such that: α’’ is a subsequence of α and also has prefix β. Example: α =<a(abc)(ac)d(cf)> β =<(bc)a> α’ =<(bc)(ac)d(cf)>

9
**Postfix Let α’ =<a1,a2…an> be the projection of α w.r.t. prefix**

β=<a1a2…am-1a’m> (m ≤n) . Sequence γ=<a’’mam+1…an> is called the postfix of α w.r.t. prefix β, denoted as γ= α/ β, where a’’m=(am - a’m). We also denote α =β ⋅ γ. Example: α’ =<a(abc)(ac)d(cf)>, β =<a(abc)a>, γ=<(_c)d(cf)>.

10
**PrefixSpan – Algorithm**

Input of the algorithm : A sequence database S, and the minimum support threshold min_support. Output of the algorithm: The complete set of sequential patterns. Subroutine: PrefixSpan(α, L, S|α). Parameters: α: sequential pattern, L: the length of α; S|α: : the α-projected database, if α ≠<>; otherwise; the sequence database S. Call PrefixSpan(<>,0,S). id Sequence 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc>

11
**PrefixSpan – Algorithm (2)**

Method: 1. Scan S|α once, find the set of frequent items b such that: b can be assembled to the last element of α to form a sequential pattern; or <b> can be appended to α to form a sequential pattern. 2. For each frequent item b: append it to α to form a sequential pattern α’ and output α’; output α’; 3. For each α’: construct α’-projected database S|α’ and call PrefixSpan(α’, L+1, S|α’).

12
**PrefixSpan - Example 1. Find length1sequential patterns:**

2. Divide search space id Sequence 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> <a> <b> <c> <d> <e> <f> <g> 4 3 1 <a><b><c><d><e><f> Prefix <a> <(abc)(ac)d(cf)> <(_d)c(bc)(ae)> <(_b)(df)cb> <(_f)cbc> <b> <(_c)(ac)d(cf)> <(_c)(ae)> <(df)cb> <c> <c> <(ac)d(cf)> <(bc)(ae)> <b> <bc> <d> <(cf)> <c(bc)(ae)> <(_f)cb> <e> <(_f)(ab)(df)cb> <(af)cbc> <f> <(ab)(df)cb> <cbc>

13
**PrefixSpan – Example (2)**

Find subsets of sequential patterns: <d> <(cf)> <c(bc)(ae)> <(_f)cb> <a> <b> <c> <d> <e> <f> <_f> 1 2 3 <db> <dc> <db> <(_c)(ae)> <dc> <(bc)(ae)> <b> <b> <a> <e> <c> 2 1 <dcb> <dcb> <>

14
**Conclusions PrefixSpan Efficient pattern growth method.**

Outperforms both GSP and FreeSpan. Explores prefix-projection in sequential pattern mining. Mines the complete set of patterns, but reduces the effort of candidate subsequence generation. Prefix-projection reduces the size of projected database and leads to efficient processing. Bi-level projection and pseudo-projection may improve mining efficiency.

15
References Pei J., Han J., Mortazavi-Asl J., Pinto H., Chen Q., Dayal U., Hsu M., “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth”, 17th International Conference on Data Engineering (ICDE), April 2001. Agrawal R., Srikant R., “Mining sequential patterns”, Proceedings Int. Conf. Very Large Data Bases (VLDB’94), pp , 1999. Han J., Dong G., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., ”Freespan: Frequent pattern-projected sequential pattern mining”, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pp , 2000. Wojciech Stach, “http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput /slides/PrefixSpan-Wojciech.pdf”.

16
**Thank you for attention. Questions**

Thank you for attention. Questions? Lazar Arsić 2011/3301

Similar presentations

OK

CloSpan: Mining Closed Sequential Patterns in Large Datasets Xifeng Yan, Jiawei Han and Ramin Afshar Proceedings of 2003 SIAM International Conference.

CloSpan: Mining Closed Sequential Patterns in Large Datasets Xifeng Yan, Jiawei Han and Ramin Afshar Proceedings of 2003 SIAM International Conference.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on current account deficit us Ppt on shipping corporation of india Ppt on social contract theory thomas Ppt on magnetic levitation transportation Ppt on unity in diversity dance Ppt on self awareness theory Create ppt on ipad 2 Ppt on solid dielectrics and waves Ppt on online banking management system Ppt on international business environment