PREFIXSPAN ALGORITHM: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
al113301m@student.etf.rs

Contents
1. Introduction
2. Problem statement
3. Existing solutions
4. Proposed solution
5. Algorithm
6. Conclusion
7. References

Introduction

A sequence database is a set of sequences. Each sequence is an ordered list of elements, and each element is a set of items.

Example: the sequence <a(abc)(ac)d(cf)>
- has 5 elements and 9 items, so it is a 9-sequence (the length of a sequence is its total number of items);
- differs from <a(ac)(abc)d(cf)>, because the order of the elements matters.

Example sequence database:

id   Sequence
10   <a(abc)(ac)d(cf)>
20   <(ad)c(bc)(ae)>
30   <(ef)(ab)(df)cb>
40   <eg(af)cbc>
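
The notation above can be mirrored in code. The following is a minimal sketch, assuming single-character items and representing each element as a frozenset; the helper name seq is hypothetical and not part of the slides.

```python
def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

s = seq("a", "abc", "ac", "d", "cf")        # <a(abc)(ac)d(cf)>
num_elements = len(s)                        # number of elements
num_items = sum(len(e) for e in s)           # sequence length = total number of items
print(num_elements, num_items)               # prints: 5 9

# The order of elements matters; the order of items inside an element does not.
same_after_swap = s == seq("a", "ac", "abc", "d", "cf")
same_after_reorder = s == seq("a", "cba", "ca", "d", "fc")
print(same_after_swap, same_after_reorder)   # prints: False True
```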

Subsequence vs. super-sequence

Given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm>, α is called a subsequence of β, denoted α ⊆ β, if there exist integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ bj1, a2 ⊆ bj2, …, an ⊆ bjn. In that case β is a super-sequence of α.

Example: β = <a(abc)(ac)d(cf)>

Subsequences of β:
α1 = <aa(ac)d(c)>
α2 = <(ac)(ac)d(cf)>
α3 = <ac>

Not subsequences of β:
α4 = <df(cf)>
α5 = <(cf)d>
α6 = <(abc)dcf>
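
The subsequence test can be sketched as a greedy left-to-right scan: each element of α is matched against the earliest remaining element of β that contains it. This is an illustrative sketch, assuming single-character items; the names seq and is_subsequence are hypothetical.

```python
def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

def is_subsequence(alpha, beta):
    """Return True if alpha ⊆ beta under the definition above."""
    j = 0
    for a in alpha:
        # Advance to the earliest unused element of beta that contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # indices j1 < j2 < ... must be strictly increasing
    return True

beta = seq("a", "abc", "ac", "d", "cf")                 # <a(abc)(ac)d(cf)>
accepted = [seq("a", "a", "ac", "d", "c"),              # α1
            seq("ac", "ac", "d", "cf"),                 # α2
            seq("a", "c")]                              # α3
rejected = [seq("d", "f", "cf"),                        # α4
            seq("cf", "d"),                             # α5
            seq("abc", "d", "c", "f")]                  # α6
print([is_subsequence(a, beta) for a in accepted])      # prints: [True, True, True]
print([is_subsequence(a, beta) for a in rejected])      # prints: [False, False, False]
```

Greedy matching is sufficient here because matching an element as early as possible never reduces the options for the elements that follow.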

Sequential Pattern Mining

Find all frequent subsequences, i.e. the subsequences whose support (the number of sequences in the database that contain them) is no less than a user-specified threshold min_support.

For the example database (ids 10, 20, 30, 40) and min_support = 2, the solution consists of 53 frequent subsequences:

<a> <aa> <ab> <a(bc)> <a(bc)a> <aba> <abc> <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <ac> <aca> <acb> <acc> <ad> <adc> <af>
<b> <ba> <bc> <(bc)> <(bc)a> <bd> <bdc> <bf>
<c> <ca> <cb> <cc>
<d> <db> <dc> <dcb>
<e> <ea> <eab> <eac> <eacb> <eb> <ebc> <ec> <ecb> <ef> <efb> <efc> <efcb>
<f> <fb> <fbc> <fc> <fcb>
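
The support of a single candidate pattern can be checked directly against the example database. The sketch below is a brute-force check under the same assumptions as the slides (single-character items); seq, is_subsequence, support and DATABASE are hypothetical names, and this is not how PrefixSpan itself counts support.

```python
def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

def is_subsequence(alpha, beta):
    """Greedy check that every element of alpha fits, in order, into beta."""
    j = 0
    for a in alpha:
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True

DATABASE = {
    10: seq("a", "abc", "ac", "d", "cf"),   # <a(abc)(ac)d(cf)>
    20: seq("ad", "c", "bc", "ae"),         # <(ad)c(bc)(ae)>
    30: seq("ef", "ab", "df", "c", "b"),    # <(ef)(ab)(df)cb>
    40: seq("e", "g", "af", "c", "b", "c"), # <eg(af)cbc>
}

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for s in database.values() if is_subsequence(pattern, s))

print(support(seq("a"), DATABASE))                  # prints: 4
print(support(seq("ab", "d", "c"), DATABASE))       # <(ab)dc>, prints: 2
print(support(seq("g"), DATABASE))                  # prints: 1 (below min_support = 2)
```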

Existing Solutions

Apriori-like approaches: AprioriSome (1995), AprioriAll (1995), DynamicSome (1995), GSP (1996). Their drawbacks:
- a potentially huge set of candidate sequences;
- multiple scans of the database;
- difficulties in mining long sequential patterns.

FreeSpan (2000), Frequent pattern-projected Sequential pattern mining, is a pattern-growth method. The general idea is to use frequent items to recursively project the sequence database into a set of smaller projected databases, and to grow subsequence fragments in each projected database.

PrefixSpan (Prefix-projected Sequential pattern mining) needs fewer projections, and its projected sequences shrink quickly.

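Prefix

Given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm> with m ≤ n, β is called a prefix of α if and only if:
1. bi = ai for i ≤ m-1;
2. bm ⊆ am.

The PrefixSpan paper adds a third condition that the slide omits: all items in (am - bm) come alphabetically after the items in bm. This assumes that the items within each element are listed in alphabetical order.

Example: α = <a(abc)(ac)d(cf)>, β = <a(abc)a>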
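
The prefix test can be sketched as follows. The code checks the two conditions from the slide plus the alphabetical-order condition from the PrefixSpan paper, which the slide does not state; that extra check, the rejection of an empty β, and the names seq and is_prefix are assumptions of this sketch.

```python
def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

def is_prefix(beta, alpha):
    """Return True if beta is a prefix of alpha (elements must be non-empty)."""
    m = len(beta)
    if not 0 < m <= len(alpha):
        return False                      # assumption: an empty beta is rejected
    # Condition 1: the first m-1 elements are identical.
    if any(beta[i] != alpha[i] for i in range(m - 1)):
        return False
    # Condition 2: the last element of beta is contained in the m-th element of alpha.
    last_b, last_a = beta[-1], alpha[m - 1]
    if not last_b <= last_a:
        return False
    # Extra condition from the paper: leftover items come alphabetically after b_m.
    return all(x > max(last_b) for x in last_a - last_b)

alpha = seq("a", "abc", "ac", "d", "cf")            # <a(abc)(ac)d(cf)>
print(is_prefix(seq("a", "abc", "a"), alpha))       # <a(abc)a>, prints: True
print(is_prefix(seq("a", "ab"), alpha))             # <a(ab)>,   prints: True
print(is_prefix(seq("a", "b"), alpha))              # <ab>,      prints: False
```

The last call shows why the extra condition matters: <ab> satisfies conditions 1 and 2, but the item a left over in (abc) precedes b, so <ab> is not treated as a prefix.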

Projection

Given sequences α and β such that β is a subsequence of α, a subsequence α’ of α is called a projection of α w.r.t. prefix β if and only if:
1. α’ has prefix β;
2. there is no proper super-sequence α’’ of α’ such that α’’ is a subsequence of α and also has prefix β.

In other words, α’ is the largest subsequence of α that starts with β.

Example: α = <a(abc)(ac)d(cf)>, β = <(bc)a>, α’ = <(bc)(ac)d(cf)>

Postfix

Let α’ = <a1 a2 … an> be the projection of α w.r.t. prefix β = <a1 a2 … am-1 a’m> (m ≤ n). The sequence γ = <a’’m am+1 … an> is called the postfix of α w.r.t. prefix β, denoted γ = α / β, where a’’m = (am - a’m). We also write α = β ⋅ γ.

Example: α’ = <a(abc)(ac)d(cf)>, β = <a(abc)a>, γ = <(_c)d(cf)>. The underscore in (_c) marks that c belongs to the same element as the last item of the prefix.
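
Computing a postfix from a projection and its prefix can be sketched as below. The sketch assumes that β is already known to be a prefix of α’ and represents the postfix as a pair (leftover items of the split element, remaining elements); the names seq, postfix and render are hypothetical.

```python
def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

def postfix(projection, prefix):
    """Postfix of a projection w.r.t. its prefix, as (leftover, rest).

    leftover is a''_m = a_m - a'_m; rest is <a_(m+1) ... a_n>.
    Assumes prefix really is a prefix of projection.
    """
    m = len(prefix)
    leftover = projection[m - 1] - prefix[-1]
    return leftover, projection[m:]

def render(leftover, rest):
    """Format a postfix in the slide notation, e.g. <(_c)d(cf)>."""
    parts = []
    if leftover:
        parts.append("(_" + "".join(sorted(leftover)) + ")")
    for element in rest:
        items = "".join(sorted(element))
        parts.append(items if len(element) == 1 else "(" + items + ")")
    return "<" + "".join(parts) + ">"

gamma = render(*postfix(seq("a", "abc", "ac", "d", "cf"), seq("a", "abc", "a")))
print(gamma)        # prints: <(_c)d(cf)>

# Projection example: alpha' = <(bc)(ac)d(cf)> with prefix <(bc)a>.
gamma2 = render(*postfix(seq("bc", "ac", "d", "cf"), seq("bc", "a")))
print(gamma2)       # prints: <(_c)d(cf)>
```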

PrefixSpan - Algorithm

Input: a sequence database S and the minimum support threshold min_support.
Output: the complete set of sequential patterns.
Subroutine: PrefixSpan(α, L, S|α), with parameters:
- α: a sequential pattern;
- L: the length of α;
- S|α: the α-projected database if α ≠ <>, otherwise the sequence database S.
Initial call: PrefixSpan(<>, 0, S).

PrefixSpan - Algorithm (2)

Method:
1. Scan S|α once and find the set of frequent items b such that
   (a) b can be assembled to the last element of α to form a sequential pattern, or
   (b) <b> can be appended to α to form a sequential pattern.
2. For each frequent item b, append it to α to form a sequential pattern α’, and output α’.
3. For each α’, construct the α’-projected database S|α’ and call PrefixSpan(α’, L+1, S|α’).
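
The three steps above can be sketched in Python as follows. This is an illustrative implementation, not the authors' code: it assumes single-character items, uses physical projection (each projected sequence is stored as a pair of leftover items and remaining elements), and does not carry the parameter L explicitly. All function names are hypothetical.

```python
from collections import Counter

def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

def render(pattern):
    """Format a pattern in the slide notation, e.g. <(ab)dc>."""
    parts = ["".join(sorted(e)) for e in pattern]
    return "<" + "".join(p if len(p) == 1 else f"({p})" for p in parts) + ">"

def tail(element, b):
    """Items of an element that come alphabetically after b."""
    return frozenset(x for x in element if x > b)

def project_s(rest, b):
    """Postfix after appending <b> as a new element (step 1b)."""
    for k, element in enumerate(rest):
        if b in element:
            return tail(element, b), rest[k + 1:]
    return None

def project_i(leftover, rest, grown_last, b):
    """Postfix after assembling b into the last element of the prefix (step 1a)."""
    if b in leftover:
        return tail(leftover, b), rest
    for k, element in enumerate(rest):
        if grown_last <= element:
            return tail(element, b), rest[k + 1:]
    return None

def prefixspan(database, min_support):
    """Return {pattern: support} for every sequential pattern in the database."""
    patterns = {}

    def mine(prefix, projected):
        last = prefix[-1] if prefix else frozenset()
        i_count, s_count = Counter(), Counter()
        for leftover, rest in projected:          # step 1: one scan of S|α
            i_items, s_items = set(leftover), set()
            for element in rest:
                s_items |= element
                if last and last <= element:
                    i_items |= tail(element, max(last))
            i_count.update(i_items)               # each item counted once per sequence
            s_count.update(s_items)
        for b, n in sorted(i_count.items()):      # steps 2 and 3, case (a)
            if n >= min_support:
                grown = prefix[:-1] + [last | {b}]
                patterns[render(grown)] = n
                rows = [project_i(l, r, grown[-1], b) for l, r in projected]
                mine(grown, [row for row in rows if row is not None])
        for b, n in sorted(s_count.items()):      # steps 2 and 3, case (b)
            if n >= min_support:
                grown = prefix + [frozenset(b)]
                patterns[render(grown)] = n
                rows = [project_s(r, b) for _, r in projected]
                mine(grown, [row for row in rows if row is not None])

    mine([], [(frozenset(), list(s)) for s in database])
    return patterns

DATABASE = [seq("a", "abc", "ac", "d", "cf"), seq("ad", "c", "bc", "ae"),
            seq("ef", "ab", "df", "c", "b"), seq("e", "g", "af", "c", "b", "c")]
patterns = prefixspan(DATABASE, min_support=2)
print(len(patterns))    # prints: 53
```

On the example database with min_support = 2 the sketch returns 53 patterns, which agrees with the count stated on the Sequential Pattern Mining slide.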

PrefixSpan - Example

1. Find length-1 sequential patterns (min_support = 2). Support of each item in the example database:

Item      <a>  <b>  <c>  <d>  <e>  <f>  <g>
Support    4    4    4    3    3    3    1

The length-1 sequential patterns are <a>, <b>, <c>, <d>, <e>, <f>; <g> is infrequent.

2. Divide the search space by prefix and build the projected database for each prefix:

Prefix   Projected database
<a>      <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>
<b>      <(_c)(ac)d(cf)>, <(_c)(ae)>, <(df)cb>, <c>
<c>      <(ac)d(cf)>, <(bc)(ae)>, <b>, <bc>
<d>      <(cf)>, <c(bc)(ae)>, <(_f)cb>
<e>      <(_f)(ab)(df)cb>, <(af)cbc>
<f>      <(ab)(df)cb>, <cbc>
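
Both steps of this example can be reproduced with a short script. The sketch below counts item supports, drops the infrequent item g (which is why <(af)cbc> rather than <g(af)cbc> appears for prefix <e>), and projects each sequence on a single-item prefix. It assumes single-character items; all names are hypothetical.

```python
from collections import Counter

def seq(*elements):
    """Build a sequence: each argument is one element, written as a string of items."""
    return [frozenset(e) for e in elements]

MIN_SUPPORT = 2
DATABASE = {
    10: seq("a", "abc", "ac", "d", "cf"),
    20: seq("ad", "c", "bc", "ae"),
    30: seq("ef", "ab", "df", "c", "b"),
    40: seq("e", "g", "af", "c", "b", "c"),
}

def item_supports(database):
    """Step 1: count in how many sequences each item occurs."""
    counts = Counter()
    for sequence in database.values():
        counts.update(frozenset().union(*sequence))
    return counts

def project(sequence, item):
    """Postfix of a sequence w.r.t. the prefix <item>, as (leftover, rest)."""
    for k, element in enumerate(sequence):
        if item in element:
            return frozenset(x for x in element if x > item), sequence[k + 1:]
    return None

def render(leftover, rest):
    """Format a postfix in the slide notation, e.g. <(_f)cb>."""
    parts = ["(_" + "".join(sorted(leftover)) + ")"] if leftover else []
    for element in rest:
        items = "".join(sorted(element))
        parts.append(items if len(element) == 1 else "(" + items + ")")
    return "<" + "".join(parts) + ">"

def projected_database(database, item, frequent):
    """Step 2: non-empty postfixes of all sequences w.r.t. <item>."""
    rows = []
    for sequence in database.values():
        pruned = [e & frequent for e in sequence if e & frequent]  # drop infrequent items
        result = project(pruned, item)
        if result and (result[0] or result[1]):
            rows.append(render(*result))
    return rows

supports = item_supports(DATABASE)
frequent = frozenset(x for x, n in supports.items() if n >= MIN_SUPPORT)
for item in sorted(frequent):
    print(f"<{item}>", supports[item], projected_database(DATABASE, item, frequent))
```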

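PrefixSpan - Example (2)

Find subsets of sequential patterns, here the patterns with prefix <d>.

<d>-projected database: <(cf)>, <c(bc)(ae)>, <(_f)cb>

Item      <a>  <b>  <c>  <d>  <e>  <f>  <_f>
Support    1    2    3    0    1    1    1

Frequent items: b and c, giving the patterns <db> and <dc>.

<db>-projected database: <(_c)(ae)>. No item is frequent, so this branch ends.

<dc>-projected database: <(bc)(ae)>, <b>

Item      <b>  <a>  <e>  <c>
Support    2    1    1    1

Frequent item: b, giving the pattern <dcb>.

<dcb>-projected database: no frequent items remain (<>), so the recursion stops.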

Conclusions

PrefixSpan:
- is an efficient pattern-growth method;
- outperforms both GSP and FreeSpan;
- explores prefix-projection in sequential pattern mining;
- mines the complete set of patterns while reducing the effort of candidate subsequence generation;
- shrinks the projected databases through prefix-projection, which leads to efficient processing;
- may become more efficient with bi-level projection and pseudo-projection.

References

Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U., Hsu M.-C., “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth”, 17th International Conference on Data Engineering (ICDE), April 2001.
Agrawal R., Srikant R., “Mining sequential patterns”, Proceedings 1995 Int. Conf. Data Engineering (ICDE’95), pp. 3-14, 1995.
Han J., Dong G., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., “FreeSpan: Frequent pattern-projected sequential pattern mining”, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pp. 355-359, 2000.
Wojciech Stach, http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput695-04/slides/PrefixSpan-Wojciech.pdf

Thank you for your attention. Questions?
Lazar Arsić, 2011/3301