Discovering multi-label temporal patterns in sequence databases. Yen-Liang Chen, Shin-Yi Wu, Yu-Cheng Wang. Information Sciences (IS), 2011.



OUTLINE
1. Introduction
2. Related works
3. Problem definition
4. The algorithm
5. Performance evaluation and real case experiments
6. Conclusions and future work

1. Introduction
Multi-label event

1. Introduction
Multi-label temporal pattern representation
MLTPM (Multi-label temporal pattern mining) is an algorithm for discovering multi-label temporal patterns from multi-label sequence data.

2. Related works
Allen-based representation (“Maintaining knowledge about temporal intervals”)
Kam and Fu’s method
TPrefixSpan
HTPM

3. Problem definition
Let event types 1, 2, ..., u be all the event types in temporal database D.
Let L_i = {l_{i,1}, l_{i,2}, …, l_{i,t}} be the set of all labels for event type i.
A multi-label item has three related attributes:
1. event type
2. occurrence number of the event type
3. label index

3. Problem definition
We define the following notations for a multi-label item it: it.eType is its event type, it.oNum is its occurrence number, and it.lNum is its label index.
A multi-label sequence is a sequence of multi-label items. The length of a multi-label sequence is the total number of items in it.

3. Problem definition
EXAMPLE
The first occurrence of event type a, with three statuses: (a_{1,1}, a_{1,3}, a_{1,2}), length = 3
The second occurrence of event type a, with two statuses: (a_{2,2}, a_{2,3}), length = 2
The first occurrence of event type b, with two statuses: (b_{1,2}, b_{1,4}), length = 2
The second occurrence of event type b, with two statuses: (b_{2,2}, b_{2,3}), length = 2
The first occurrence of event type c, with one status: (c_{1,2}), length = 1

3. Problem definition
EXAMPLE
(a_{1,1}, a_{1,3}, a_{1,2}) is the first occurrence of event type a in the sequence. (a_{2,2}, a_{2,3}) is the second occurrence.
a_{1,1}.oNum = 1, a_{1,1}.lNum = 1, a_{1,1}.eType = a
a_{2,3}.oNum = 2, a_{2,3}.lNum = 3, a_{2,3}.eType = a
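The three attributes above map naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `Item` and the tuple-based layout are assumptions made for illustration, while the field names `eType`, `oNum`, and `lNum` come from the slide.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """One multi-label item (hypothetical container; field names follow the slide)."""
    eType: str  # event type
    oNum: int   # occurrence number of the event type
    lNum: int   # label index


# The two occurrences of event type a from the example.
first_a = [Item("a", 1, 1), Item("a", 1, 3), Item("a", 1, 2)]
second_a = [Item("a", 2, 2), Item("a", 2, 3)]

# The length of an occurrence is its number of items (statuses).
print(len(first_a), len(second_a))
print(second_a[1].oNum, second_a[1].lNum, second_a[1].eType)
```

A named tuple keeps items hashable and comparable, which is convenient when items are later used as dictionary keys or grouped by (eType, oNum).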

3. Problem definition
Let time(u) be the occurrence time of item u. Then the order relation Rel(u, v) of two items u and v is defined as "<" if time(u) < time(v), and as "=" if time(u) = time(v).
EX: Rel(a_{1,1}, b_{1,2}) = "<", because time(a_{1,1}) = 4 < time(b_{1,2}) = 6.
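The order relation can be sketched directly from occurrence times. This is an illustrative sketch only: the dictionary `occurrence_time` and the (eType, oNum, lNum) tuple keys are assumptions, and raising an error when u occurs after v is a design choice made here, since the definition only covers "<" and "=".

```python
def rel(time_u, time_v):
    """Order relation of two items given their occurrence times."""
    if time_u < time_v:
        return "<"
    if time_u == time_v:
        return "="
    # The definition does not cover this case; u is expected to come first.
    raise ValueError("u occurs after v; swap the arguments")


# Occurrence times from the example, keyed by (eType, oNum, lNum).
occurrence_time = {("a", 1, 1): 4, ("b", 1, 2): 6}

print(rel(occurrence_time[("a", 1, 1)], occurrence_time[("b", 1, 2)]))  # <
```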

3. Problem definition
A multi-label temporal sequence or pattern is a sequence of multi-label items interweaved with temporal relationships.

3. Problem definition
In a multi-label sequence or a multi-label temporal pattern, item u must be placed before item v based on the following conditions:

3. Problem definition
EXAMPLE
a_{1,1} < a_{1,2}, b_{1,1} < b_{1,2}
a_{1,2} = a_{2,2}
a_{1,3} = b_{1,1}
a_{2,2} = a_{1,3}

3. Problem definition
Function Small(⊕_r, ⊕_{r+1}, …, ⊕_q), where ⊕_i ∈ {<, =}, outputs "<" if any ⊕_i, r ≤ i ≤ q, is "<". Otherwise, the output of Small is "=".
EX: If mltp = (a_{1,1} < b_{1,2} < a_{1,2} < a_{1,3} = b_{1,3} = c_{1,1}), then
Rel(a_{1,2}, c_{1,1}) = Small(<, =, =) = "<", and
Rel(a_{1,3}, c_{1,1}) = Small(=, =) = "=".
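The Small function and its use for non-adjacent items can be sketched as follows. The helper name `rel_in_pattern` and the 0-based indexing are assumptions for illustration; `ops[k]` is taken to be the relation written between items k and k+1 of the pattern.

```python
def small(*ops):
    """Return "<" if any relation is "<", otherwise "="."""
    return "<" if "<" in ops else "="


def rel_in_pattern(ops, i, j):
    """Relation between the items at 0-based positions i < j of a pattern.

    ops[k] is the relation between item k and item k + 1.
    """
    if not 0 <= i < j <= len(ops):
        raise ValueError("need 0 <= i < j <= len(ops)")
    return small(*ops[i:j])


# mltp = (a_{1,1} < b_{1,2} < a_{1,2} < a_{1,3} = b_{1,3} = c_{1,1})
ops = ["<", "<", "<", "=", "="]

print(rel_in_pattern(ops, 2, 5))  # Rel(a_{1,2}, c_{1,1}) -> <
print(rel_in_pattern(ops, 3, 5))  # Rel(a_{1,3}, c_{1,1}) -> =
```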

3. Problem definition
EXAMPLE
mltp = (a_{1,1} < a_{1,2} < a_{1,3} < b_{1,3} < b_{1,4})
mlts = (a_{1,1} < a_{1,2} < b_{1,2} < a_{1,3} < b_{1,3} < c_{1,1} < a_{2,2} < b_{2,3} < b_{2,4})
We show that mltp ⊆ mlts because we can find s_1, s_2, s_4, s_8, and s_9 in mlts.

3. Problem definition (Cont.)
(1) Type equivalence            (2) Label equivalence
p_1.eType = s_1.eType = a       p_1.lNum = s_1.lNum = 1
p_2.eType = s_2.eType = a       p_2.lNum = s_2.lNum = 2
p_3.eType = s_4.eType = a       p_3.lNum = s_4.lNum = 3
p_4.eType = s_8.eType = b       p_4.lNum = s_8.lNum = 3
p_5.eType = s_9.eType = b       p_5.lNum = s_9.lNum = 4
mltp = (a_{1,1} < a_{1,2} < a_{1,3} < b_{1,3} < b_{1,4})
mlts = (a_{1,1} < a_{1,2} < b_{1,2} < a_{1,3} < b_{1,3} < c_{1,1} < a_{2,2} < b_{2,3} < b_{2,4})

3. Problem definition (Cont.)
(3) Occurrence number agreement:
p_1, p_2, p_3 have the same event type and occurrence number, and so do s_1, s_2, s_4.
p_4, p_5 have the same event type and occurrence number, and so do s_8, s_9.
mltp = (a_{1,1} < a_{1,2} < a_{1,3} < b_{1,3} < b_{1,4})
mlts = (a_{1,1} < a_{1,2} < b_{1,2} < a_{1,3} < b_{1,3} < c_{1,1} < a_{2,2} < b_{2,3} < b_{2,4})

3. Problem definition (Cont.)
(4) Same label ordering:
¤_1 = Small(⊕_1) = Small(<) = "<"
¤_2 = Small(⊕_2, ⊕_3) = Small(<, <) = "<"
¤_3 = Small(⊕_4, ⊕_5, ⊕_6, ⊕_7) = Small(<, <, <, <) = "<"
¤_4 = Small(⊕_8) = Small(<) = "<"
mltp = (a_{1,1} < a_{1,2} < a_{1,3} < b_{1,3} < b_{1,4})
mlts = (a_{1,1} < a_{1,2} < b_{1,2} < a_{1,3} < b_{1,3} < c_{1,1} < a_{2,2} < b_{2,3} < b_{2,4})
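The four containment conditions can be checked mechanically once a candidate mapping from pattern items to sequence items is given. The sketch below is an illustration under stated assumptions, not the paper's procedure: the function name `matches`, the `Item` type, and the 0-based index list `idx` are hypothetical, and condition (3) is read here as "two pattern items share an event occurrence exactly when their mapped sequence items do", which is an assumption that goes slightly beyond what the slides state. It only verifies a given mapping; it does not search for one.

```python
from typing import NamedTuple


class Item(NamedTuple):
    eType: str  # event type
    oNum: int   # occurrence number
    lNum: int   # label index


def small(ops):
    """Return "<" if any relation in ops is "<", otherwise "="."""
    return "<" if "<" in ops else "="


def matches(p_items, p_ops, s_items, s_ops, idx):
    """Check conditions (1)-(4) for pattern item i mapped to s_items[idx[i]]."""
    n = len(p_items)
    if len(idx) != n or any(not 0 <= j < len(s_items) for j in idx):
        return False
    if any(a >= b for a, b in zip(idx, idx[1:])):
        return False  # mapped positions must be strictly increasing
    chosen = [s_items[j] for j in idx]
    # (1) type equivalence and (2) label equivalence
    for p, s in zip(p_items, chosen):
        if p.eType != s.eType or p.lNum != s.lNum:
            return False
    # (3) occurrence number agreement (assumed to hold in both directions)
    for i in range(n):
        for k in range(i + 1, n):
            same_p = p_items[i][:2] == p_items[k][:2]
            same_s = chosen[i][:2] == chosen[k][:2]
            if same_p != same_s:
                return False
    # (4) same label ordering: each pattern relation equals Small over the
    # sequence relations lying between the two mapped positions
    for i in range(n - 1):
        if p_ops[i] != small(s_ops[idx[i]:idx[i + 1]]):
            return False
    return True


mltp = [Item("a", 1, 1), Item("a", 1, 2), Item("a", 1, 3),
        Item("b", 1, 3), Item("b", 1, 4)]
mlts = [Item("a", 1, 1), Item("a", 1, 2), Item("b", 1, 2), Item("a", 1, 3),
        Item("b", 1, 3), Item("c", 1, 1), Item("a", 2, 2), Item("b", 2, 3),
        Item("b", 2, 4)]
p_ops = ["<"] * 4
s_ops = ["<"] * 8

# s_1, s_2, s_4, s_8, s_9 in the slide's 1-based numbering
print(matches(mltp, p_ops, mlts, s_ops, [0, 1, 3, 7, 8]))  # True
```

Mapping p_4 to s_5 instead of s_8 fails condition (3), because s_5 and s_9 belong to different occurrences of event type b.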

4. The algorithm
There are two kinds of multi-label temporal patterns.
Intra-event pattern: it consists of only one event occurrence. intra-L_k is the set of frequent intra-event patterns with length k, where k is the number of items.
Inter-event pattern: it consists of more than one event occurrence. inter-L_k is the set of frequent inter-event patterns with length k, where k is the number of event occurrences.

4. The algorithm
MLTPM (Multi-label temporal pattern mining)
Phase 1: intra-event pattern mining, discovering patterns with only one event occurrence.
Phase 2: inter-event pattern mining, discovering patterns with more than one event occurrence.
EX: The multi-label temporal pattern a_{1,1} < a_{1,2} < a_{1,3} < a_{2,2} < a_{2,4} is treated as an inter-event pattern because event type a occurs twice.
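The distinction between the two kinds of pattern comes down to counting distinct event occurrences. A minimal sketch, assuming items are (eType, oNum, lNum) tuples and using a hypothetical helper name `is_intra_event`:

```python
def is_intra_event(items):
    """True if all items belong to a single event occurrence."""
    occurrences = {(e_type, o_num) for e_type, o_num, _ in items}
    return len(occurrences) == 1


# a_{1,1} < a_{1,2} < a_{1,3} < a_{2,2} < a_{2,4}
pattern = [("a", 1, 1), ("a", 1, 2), ("a", 1, 3), ("a", 2, 2), ("a", 2, 4)]

print(is_intra_event(pattern))      # False: event type a occurs twice
print(is_intra_event(pattern[:3]))  # True: only the first occurrence of a
```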

4. The algorithm
Phase 1

4. The algorithm
EXAMPLE
Joining (a_{1,1}) and (a_{1,2}), we obtain the pattern (a_{1,1} < a_{1,2}).
But occurrence records ... and ... cannot be joined in this phase.

4. The algorithm
EXAMPLE
Generate intra-L_k from intra-L_{k-1}

4. The algorithm
Phase 2

4. The algorithm
After phase 1, we combine all intra-event patterns to obtain inter-L_1.
When generating inter-L_2, GenInterLk joins all pairs of inter-event patterns (including self-joins) in inter-L_1.
The occurrence records for two patterns in inter-L_1 are joinable if:
(1) the patterns have different event types, or
(2) the patterns have the same event type but different occurrence numbers.

4. The algorithm
EXAMPLE
The two inter-L_1 patterns (a_{1,1} < a_{1,2}) and (b_{1,2} < b_{1,3}) have different event types, so they are joinable.

4. The algorithm
EXAMPLE
Although the two inter-L_1 patterns (b_{1,2}) and (b_{1,2} < b_{1,3}) have the same event type, their occurrence records have different occurrence numbers.
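The joinability test for inter-L_1 can be sketched as a single predicate over occurrence records. This is an illustrative sketch: representing an occurrence record as an (eType, oNum) pair and the name `joinable_l1` are assumptions, and the occurrence numbers used for the second example are hypothetical because the slide does not list them.

```python
def joinable_l1(rec_u, rec_v):
    """Two inter-L_1 occurrence records are joinable if they differ in
    event type, or share the event type but differ in occurrence number."""
    (type_u, onum_u), (type_v, onum_v) = rec_u, rec_v
    return type_u != type_v or onum_u != onum_v


# Different event types (first example).
print(joinable_l1(("a", 1), ("b", 1)))  # True
# Same event type, different occurrence numbers (hypothetical numbers).
print(joinable_l1(("b", 1), ("b", 2)))  # True
# Same event type and same occurrence: not joinable in phase 2.
print(joinable_l1(("b", 1), ("b", 1)))  # False
```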

4. The algorithm
When generating inter-L_k (k > 2), GenInterLk only joins pairs of inter-event patterns in inter-L_{k-1} that have the same first (k-2) events. These shared events must have the same occurrence number and the same occurrence time.
Two occurrence records for patterns in inter-L_{k-1} are joinable if:
(1) they have different last event types, or
(2) they have the same last event type but different occurrence numbers.

EXAMPLE
The two inter-L_2 patterns are joinable because:
(1) They have the same first event, a_{1,1}.
(2) Although they have the same last event type b, they have different occurrence numbers.
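The join condition for k > 2 adds a shared-prefix check to the last-event test. The sketch below is a reading of the slide text rather than the paper's GenInterLk: an occurrence record is assumed to be a list of (eType, oNum, time) triples, one per event occurrence, and the name `joinable_lk` and the sample times are hypothetical.

```python
def joinable_lk(rec_u, rec_v):
    """Joinability of two occurrence records for inter-L_{k-1} patterns (k > 2).

    Each record is a list of (eType, oNum, time) triples, one per event.
    """
    if len(rec_u) != len(rec_v) or len(rec_u) < 2:
        return False
    # The first k-2 events must agree in type, occurrence number, and time.
    if rec_u[:-1] != rec_v[:-1]:
        return False
    # The last events must differ in event type or in occurrence number.
    return rec_u[-1][:2] != rec_v[-1][:2]


# Two inter-L_2 records sharing the first event (an occurrence of a) and
# ending in different occurrences of b; the times are made up.
rec_u = [("a", 1, 4), ("b", 1, 6)]
rec_v = [("a", 1, 4), ("b", 2, 9)]

print(joinable_lk(rec_u, rec_v))  # True
print(joinable_lk(rec_u, rec_u))  # False: identical last event occurrence
```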

5. Performance evaluation and real case experiments

6. Conclusions and future work
MLTPM