Temporal Database Paper Reading
R95922007, first-year master's student, Computer Science and Information Engineering, 馬智釗
Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K. Huang, C. Chang


Introduction
Discover frequent serial episodes to find relationships between events.
- explain the problems that cause a particular event
- predict future results
Episode: a partially ordered collection of events occurring together.
- the user defines "how close is close enough"
- win: the width of the time window

Three classes of episodes
Introduced by Mannila et al.
Serial episodes
- patterns with a total order in the sequence
Parallel episodes
- no constraints on the relative order
Composite episodes
- serial combination of parallel episodes

Examples : episodes

Algorithms (old)
Presented by Mannila et al.
Find parallel and serial episodes that are frequent enough.
WINEPI
- considers the support of an episode
MINEPI
- considers the number of minimal occurrences of an episode

WINEPI
Consider the sequence S = A3 A4 B5 B6 (event A at times 3 and 4, event B at times 5 and 6).
Support: the number of sliding windows of width win that contain the episode.
Given win = 3, there are six windows:
W1 = A3, W2 = A3 A4, W3 = A3 A4 B5, W4 = A4 B5 B6, W5 = B5 B6, W6 = B6.
The serial episode <A,B> is supported by two windows (W3 and W4).
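The window count in this example can be checked with a short Python sketch. It assumes a simple sequence stored as (event, time) pairs and slides a window of width win from the first window that touches the sequence to the last one. The function names `windows`, `contains_serial` and `winepi_support` are illustrative and do not come from the paper.

```python
def windows(seq, win):
    """Yield the events inside each sliding window of width `win`."""
    first = min(t for _, t in seq)
    last = max(t for _, t in seq)
    # The first window ends at the first event, the last one starts at the last event.
    for start in range(first - win + 1, last + 1):
        yield [(e, t) for e, t in seq if start <= t < start + win]


def contains_serial(window, episode):
    """True if the episode's events occur in order at strictly increasing times."""
    prev = float("-inf")
    for wanted in episode:
        times = [t for e, t in window if e == wanted and t > prev]
        if not times:
            return False
        prev = min(times)  # greedy: take the earliest possible match
    return True


def winepi_support(seq, episode, win):
    """Number of sliding windows that contain the serial episode."""
    return sum(contains_serial(w, episode) for w in windows(seq, win))


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(len(list(windows(S, 3))))          # six windows
print(winepi_support(S, ["A", "B"], 3))  # windows containing A followed by B
```

With win = 3 the sketch enumerates the six windows W1 to W6 and counts W3 and W4 for the episode A followed by B.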

MINEPI
Consider the sequence S = A3 A4 B5 B6.
Minimal occurrence: an interval that contains episode α while no proper sub-interval does.
<A> has mo support 2.
- intervals [3,3] and [4,4]
<A,B> has mo support 1.
- interval [4,5]
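A minimal Python sketch of minimal occurrences for a serial episode on a simple sequence follows. It assumes one event per time stamp, computes for every start the earliest interval that contains the episode, and then keeps only intervals with no other candidate inside them. The names `earliest_end` and `minimal_occurrences` are illustrative, and no maxwin limit is applied here.

```python
def earliest_end(seq, episode, start):
    """Earliest end time of an occurrence of `episode` that starts at `start`."""
    prev = start
    for wanted in episode[1:]:
        times = [t for e, t in seq if e == wanted and t > prev]
        if not times:
            return None
        prev = min(times)
    return prev


def minimal_occurrences(seq, episode):
    """Intervals containing the episode such that no proper sub-interval does."""
    starts = sorted(t for e, t in seq if e == episode[0])
    candidates = []
    for s in starts:
        end = earliest_end(seq, episode, s)
        if end is not None:
            candidates.append((s, end))
    # Drop every candidate that properly contains another candidate.
    return [c for c in candidates
            if not any(o != c and c[0] <= o[0] and o[1] <= c[1] for o in candidates)]


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(minimal_occurrences(S, ["A"]))       # [(3, 3), (4, 4)]
print(minimal_occurrences(S, ["A", "B"]))  # [(4, 5)]
```

The interval [3,5] also contains A followed by B, but it is discarded because [4,5] lies inside it.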

Complex sequences
Several events can occur at the same time.
A temporal database is a complex sequence with temporal attributes.
Example: [figure: a complex sequence with event labels A D B A B E C E A B F A C E B D F D]

Algorithms (new)
Extend the algorithms to deal with complex sequences.
MINEPI+
- depth-first enumeration that generates the frequent episodes by equalJoin and temporalJoin
EMMA
- Episodes Mining using Memory Anchor
- uses memory anchors to accelerate the mining task

More about MINEPI
Breadth-first manner
- enumerates longer episodes from shorter ones
Parameters
- maxwin: maximum window width for an episode
- minsup: minimum support for an episode to count as frequent
Temporal Join
- connects events from different time intervals

Example: MINEPI
S = A1 A2 B3 A4 B5, maxwin = 4, minsup = 2
Find the frequent 1-episodes first
- mo(A) = {[1,1],[2,2],[4,4]}, mo(B) = {[3,3],[5,5]}
Temporal Join with maxwin = 4
- possible occurrences of <A,B>: [1,3],[2,3],[2,5],[4,5]
- mo(<A,B>) = {[2,3],[4,5]} (choose the minimal ones)
- support(<A,B>) = {[1,4],[2,5],[4,5]}
- support count = 3, counting distinct start points
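The numbers in this example can be reproduced with a small Python sketch. It assumes the join condition that an occurrence of B must start after the occurrence of A ends and that the end minus the start stays below maxwin, which is what excludes [1,5] here. The helper names `temporal_join`, `minimal` and `support_count` are illustrative.

```python
def temporal_join(left, right, maxwin):
    """Pair each left interval with every later right interval inside the window."""
    return [(ls, re)
            for ls, le in left
            for rs, re in right
            if rs > le and re - ls < maxwin]


def minimal(intervals):
    """Keep the intervals that do not properly contain another interval."""
    return [c for c in intervals
            if not any(o != c and c[0] <= o[0] and o[1] <= c[1] for o in intervals)]


def support_count(intervals):
    """Count distinct start points."""
    return len({start for start, _ in intervals})


mo_a = [(1, 1), (2, 2), (4, 4)]
mo_b = [(3, 3), (5, 5)]
occurrences = temporal_join(mo_a, mo_b, maxwin=4)
print(occurrences)                 # [(1, 3), (2, 3), (2, 5), (4, 5)]
print(minimal(occurrences))        # [(2, 3), (4, 5)]
print(support_count(occurrences))  # 3
```

The three distinct start points are 1, 2 and 4, which gives the support count of 3 and clears minsup = 2.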

MINEPI+
Must deal with complex sequences.
Depth-first manner to save memory
Equal Join
- connects events in the same interval
Bound List
For a serial episode P
- {[ts_i, te_i] : S contains P in the interval [ts_i, te_i]}
For an event Y
- {[t_i, t_i] : S contains Y at time t_i}

Example: bound list
maxwin = 4.
Bound list of the first example episode: {[1,4],[3,6]}.
Bound list of the second example episode: {[4,4],[6,6]}.
[figure: the example complex sequence with event labels A D B A B E C E A B F A C E B D F D]

Operations
Given a serial episode P and an event f.
- P.boundlist = {[ts_1,te_1], …, [ts_n,te_n]}
- f.boundlist = {[ts'_1,ts'_1], …, [ts'_m,ts'_m]}
Equal Join: P1 = P ⊙ f
- P1.boundlist consists of the [ts_i,te_i] such that te_i = ts'_j for some j (1 ≤ j ≤ m)
Temporal Join: P2 = P · f
- P2.boundlist consists of the [ts_i,ts'_j] such that ts'_j - ts_i < maxwin and ts'_j > te_i for some j (1 ≤ j ≤ m)
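As a rough illustration, the two joins can be written directly over bound lists in Python. The sketch assumes bound lists are lists of (start, end) tuples and that the temporal join keeps a pair when the event occurs after the prefix ends and within maxwin of the prefix start. The bound lists `P` and `f` below are made-up values for demonstration, not data from the slides.

```python
def equal_join(p_bl, f_bl):
    """Keep the prefix bounds whose end time coincides with an occurrence of f."""
    f_times = {t for t, _ in f_bl}
    return [(ts, te) for ts, te in p_bl if te in f_times]


def temporal_join(p_bl, f_bl, maxwin):
    """Extend prefix bounds with later occurrences of f inside the window."""
    return [(ts, t)
            for ts, te in p_bl
            for t, _ in f_bl
            if t > te and t - ts < maxwin]


# Hypothetical bound lists, chosen only to exercise both joins.
P = [(1, 2), (4, 6)]
f = [(2, 2), (3, 3), (5, 5)]
print(equal_join(P, f))               # [(1, 2)]
print(temporal_join(P, f, maxwin=4))  # [(1, 3)]
```

The equal join leaves the interval unchanged because f is added to the last itemset of P, while the temporal join moves the end of the interval to the time of f.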

Drawbacks of MINEPI+
Huge number of combinations
- consider |I| frequent 1-episodes
- O(|I|^2) checks for temporal joins and equal joins
Unnecessary joins
- temporal joins for a prefix should be skipped if the number of extendable matching bounds < minsup × |TDB|
Duplicate joins
- the example episode on the slide needs 4+1 joins

EMMA
Divided into three phases
(I) Mine the frequent itemsets in the complex sequence.
(II) Encode each frequent itemset with a unique ID and construct an encoded horizontal database.
(III) Mine episodes in the encoded database.
Depth-First Search
Memory Anchor
- uses the boundlists to access information
- the timelists of frequent itemsets are their boundlists
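Phases (I) and (II) can be pictured with a brute-force Python sketch. It assumes the complex sequence is a dictionary from time stamp to the set of events at that time, enumerates every sub-itemset of every time stamp, and treats minsup as an absolute count called `min_count`. This is only an illustration of the encoding idea, not the itemset miner used in the paper, and the sample database `tdb` is made up.

```python
from collections import defaultdict
from itertools import combinations


def encode(tdb, min_count):
    """Return itemset IDs, the timelist of each ID, and the encoded database."""
    timelists = defaultdict(list)
    for t, events in sorted(tdb.items()):
        for k in range(1, len(events) + 1):
            for itemset in combinations(sorted(events), k):
                timelists[itemset].append(t)
    frequent = {s: ts for s, ts in timelists.items() if len(ts) >= min_count}
    # Number the frequent itemsets: shorter itemsets first, then alphabetically.
    ordered = sorted(frequent, key=lambda s: (len(s), s))
    ids = {s: i for i, s in enumerate(ordered, start=1)}
    id_timelists = {ids[s]: ts for s, ts in frequent.items()}
    encoded = {t: sorted(i for i, ts in id_timelists.items() if t in ts) for t in tdb}
    return ids, id_timelists, encoded


# Hypothetical complex sequence with four time stamps.
tdb = {1: {"A", "B"}, 2: {"A"}, 3: {"A", "B"}, 4: {"C"}}
ids, id_timelists, encoded = encode(tdb, min_count=2)
print(ids)           # {('A',): 1, ('B',): 2, ('A', 'B'): 3}
print(id_timelists)  # {1: [1, 2, 3], 2: [1, 3], 3: [1, 3]}
print(encoded)       # {1: [1, 2, 3], 2: [1], 3: [1, 2, 3], 4: []}
```

Each timelist doubles as the boundlist of its ID, since an itemset observed at time t occupies the interval [t,t].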

Example: database
minsup = 5

Combine episodes
Only combine existing episodes with a "local" frequent 1-tuple episode.
- avoids generating a huge number of candidates
Projected boundlist (PBL)
- episode #3 has boundlist {[1,1],[2,2],[4,4],[8,8],[11,11],[14,14],[15,15]}
- given maxwin = 4, the projected boundlist is {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- note that |TDB| = 16

Example: PBL
#3.timelist = {1,2,4,8,11,14,15}, with maxwin = 4 and |TDB| = 16.
1 → [2,4]
2 → [3,5]
4 → [5,7]
8 → [9,11]
11 → [12,14]
14 → [15,16]
15 → [16,16]
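The mapping in this example is consistent with one rule: a bound [ts, te] projects to [te + 1, min(ts + maxwin - 1, |TDB|)], and bounds whose projection would be empty are dropped. That rule is inferred from the listed values rather than quoted from the paper, and the function name `projected_boundlist` is illustrative. A minimal Python sketch under that assumption:

```python
def projected_boundlist(boundlist, maxwin, tdb_len):
    """Project each bound onto the time range where an extension may still occur."""
    pbl = []
    for ts, te in boundlist:
        lo = te + 1                         # the next event must come after the bound
        hi = min(ts + maxwin - 1, tdb_len)  # and stay inside the window and the database
        if lo <= hi:
            pbl.append((lo, hi))
    return pbl


bl3 = [(t, t) for t in (1, 2, 4, 8, 11, 14, 15)]
print(projected_boundlist(bl3, maxwin=4, tdb_len=16))
# [(2, 4), (3, 5), (5, 7), (9, 11), (12, 14), (15, 16), (16, 16)]
```

The last two intervals are clipped at 16 because the database ends there.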

Local frequent ID
A local frequent ID has a boundlist that matches into another episode's PBL.
- #3.PBL = {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- #4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}
Record the boundlist of the ID while examining it.
- the boundlist is then available immediately at the temporal join
- the temporal join of #3 and #4 has boundlist {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}

Example: temporal join
#4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}.
Recall the construction of #3.PBL:
1 → [2,4]: [3,3] lies in it
2 → [3,5]: [3,3] lies in it (take the minimal one)
4 → [5,7]: [5,5] lies in it
8 → [9,11]: [9,9] lies in it
11 → [12,14]: [12,12] lies in it
14 → [15,16]: [16,16] lies in it
15 → [16,16]: [16,16] lies in it
Result: {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
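The matching step traced here can be sketched in Python with a binary search over the sorted timelist of the ID. The sketch assumes that each prefix bound is extended with the earliest occurrence that falls inside its projected bound, which corresponds to the "take the minimal one" remark. The function name `pbl_temporal_join` is illustrative.

```python
from bisect import bisect_left


def pbl_temporal_join(prefix_bl, id_times, maxwin, tdb_len):
    """Extend each prefix bound with the earliest ID occurrence in its projected bound."""
    joined = []
    for ts, te in prefix_bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        k = bisect_left(id_times, lo)  # first occurrence at or after lo
        if k < len(id_times) and id_times[k] <= hi:
            joined.append((ts, id_times[k]))
    return joined


bl3 = [(t, t) for t in (1, 2, 4, 8, 11, 14, 15)]
times4 = [3, 5, 6, 9, 12, 13, 16]
print(pbl_temporal_join(bl3, times4, maxwin=4, tdb_len=16))
# [(1, 3), (2, 3), (4, 5), (8, 9), (11, 12), (14, 16), (15, 16)]
```

Because the timelist is sorted, each lookup costs a logarithmic number of comparisons instead of a scan over every occurrence of the ID.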

Procedure: emmajoin
Recursively extend the episodes
- until no more serial episodes can be extended
Avoid the unnecessary checking of MINEPI+
- stop when the number of extendable bounds for a serial episode is less than minsup × |TDB|
Example: episode #2
- #2.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}
- #2.PBL = {[4,6],[7,9],[10,12],[13,15]} (|TDB| = 16)
- #2 need not be extended if minsup = 5, since only four bounds are extendable

Example: emmajoin
#3.BL = {[1,1],[4,4],[8,8],[11,11],[14,14],[15,15]}.
#7.BL = {[1,1],[4,4],[8,8],[11,11],[14,14]}.
#9.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}.
Call emmajoin to extend each 1-tuple episode.
#3.PBL = {[2,4],[5,7],[9,11],[12,14],[15,16],[16,16]}.
Find the local frequent IDs in #3.PBL.

Example: emmajoin (cont.)
minsup = 5, maxwin = 4.
By temporal join, five candidate bound lists result:
- BL = {}
- BL = {[1,4],[8,11],[11,14],[14,15]}
- BL = {}
- BL = {[1,4],[8,11],[11,14]}
- BL = {[1,3],[4,6],[8,9],[11,12],[14,16]}
- only the last candidate reaches minsup; it is generated from prefix #3
- recursively call emmajoin to extend it
- its PBL = {[4,4],[7,7],[10,11],[13,14]}
- there are no local frequent IDs since minsup = 5
Back to calling emmajoin for episode #7.
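A compact Python sketch of the recursive extension is given below. It makes several simplifying assumptions: only temporal joins between encoded IDs are performed, minsup is treated as an absolute count named `min_count`, each prefix bound is extended with the earliest matching occurrence, and recursion stops when fewer than `min_count` bounds have a non-empty projected bound. The function names are illustrative, and the input reuses the three timelists of #3, #7 and #9 from the example, so the result is not meant to reproduce every bound list on the slide.

```python
def temporal_join(prefix_bl, id_times, maxwin, tdb_len):
    """Extend each prefix bound with the earliest ID occurrence in its projected bound."""
    joined = []
    for ts, te in prefix_bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        match = next((t for t in id_times if lo <= t <= hi), None)
        if match is not None:
            joined.append((ts, match))
    return joined


def extendable(bl, maxwin, tdb_len):
    """Number of bounds whose projected bound is non-empty."""
    return sum(1 for ts, te in bl if te + 1 <= min(ts + maxwin - 1, tdb_len))


def emmajoin(prefix, bl, timelists, maxwin, tdb_len, min_count, out):
    """Record the episode, then extend it depth-first with locally frequent IDs."""
    out[prefix] = bl
    if extendable(bl, maxwin, tdb_len) < min_count:
        return  # pruning: too few bounds can still be extended
    for item_id, times in timelists.items():
        new_bl = temporal_join(bl, times, maxwin, tdb_len)
        if len(new_bl) >= min_count:
            emmajoin(prefix + (item_id,), new_bl, timelists,
                     maxwin, tdb_len, min_count, out)


def mine(timelists, maxwin, tdb_len, min_count):
    """Start one depth-first search from every frequent ID."""
    out = {}
    for item_id, times in timelists.items():
        if len(times) >= min_count:
            emmajoin((item_id,), [(t, t) for t in times], timelists,
                     maxwin, tdb_len, min_count, out)
    return out


timelists = {3: [1, 4, 8, 11, 14, 15], 7: [1, 4, 8, 11, 14], 9: [3, 6, 9, 12, 16]}
episodes = mine(timelists, maxwin=4, tdb_len=16, min_count=5)
print(sorted(episodes))
```

On this input the search from #9 stops immediately because only four of its bounds are extendable, and the extensions (3, 9) and (7, 9) are not extended further for the same reason.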

Experiments
On a dataset composed of 10 stocks.
Parameters: maxwin / minsup.
- running time grows as maxwin increases
- running time grows as minsup decreases
- because the number of frequent episodes increases
EMMA runs faster than MINEPI+.
MINEPI+ uses less space than EMMA.
- EMMA needs a large amount of memory as minsup decreases

Conclusion
Modify MINEPI into MINEPI+
- for mining episodes in a complex sequence
Propose EMMA
- avoids the drawbacks of MINEPI+
EMMA is more efficient than MINEPI+.
Future work
- only serial episodes were discussed
- parallel and composite episodes remain to be solved