CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data Yi-Cheng Chen, Wen-Chih Peng and Suh-Yin Lee ICDM 2011.

1 CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data
Yi-Cheng Chen, Wen-Chih Peng and Suh-Yin Lee
ICDM 2011

2 Outline (2012/6/13)
• Motivation
• Preliminaries
• Endpoint representation
• CEMiner algorithm
• Experimental results
• Conclusion

3 Motivation
• Existing studies focus only on mining closed sequential patterns from time point-based data.

4 Motivation (cont.)
• In this paper, we discuss and design an efficient method to discover closed temporal patterns from interval-based data.
• Three contributions:
  • We simplify the processing of complex relations: only “before”, “after” and “equal” need to be considered.
  • Endpoint representation.
  • A novel algorithm, CEMiner (Closed Endpoint Temporal Miner).

5 Preliminaries
• Definition 1. Event interval and event sequence
  • Let E = {e_1, e_2, …, e_k} be the set of event symbols, e.g., {A, B, C, D, E}.
  • The triplet (e_i, s_i, f_i) is an event interval, e.g., (A, 2, 7).
  • An event sequence is a series of event interval triplets.
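Definition 1 can be pictured as a small data structure. The sketch below is illustrative only: it assumes s_i and f_i are the starting and finishing times of the interval, and the names `EventInterval` and `make_sequence`, as well as the second interval (B, 5, 10), are hypothetical and not taken from the slides.

```python
from typing import List, NamedTuple


class EventInterval(NamedTuple):
    symbol: str  # event symbol e_i, drawn from the symbol set E
    start: int   # s_i (assumed here to be the starting time)
    finish: int  # f_i (assumed here to be the finishing time)


def make_sequence(intervals: List[EventInterval]) -> List[EventInterval]:
    """Validate the intervals and return them ordered by time."""
    for iv in intervals:
        if iv.start > iv.finish:
            raise ValueError(f"interval {iv} finishes before it starts")
    # Ordering by (start, finish, symbol) is a choice made for this sketch.
    return sorted(intervals, key=lambda iv: (iv.start, iv.finish, iv.symbol))


# (A, 2, 7) is the example from the slide; (B, 5, 10) is made up.
seq = make_sequence([EventInterval("B", 5, 10), EventInterval("A", 2, 7)])
print(seq)
```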

6 Preliminaries (cont.)
• Definition 2. Temporal database
  • Database DB = {r_1, r_2, …, r_m}, where each record r_i consists of a sequence-id (SID) and an event.
  • DB is called a temporal database.
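One way to read Definition 2 is as a flat table of (SID, event) records that is grouped by SID before mining. The following is a minimal sketch under that assumption; `group_by_sid` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_sid(records):
    """Group (sid, event) records into one event list per sequence-id."""
    db = defaultdict(list)
    for sid, event in records:
        db[sid].append(event)
    return dict(db)


# Made-up records; each event is written as a (symbol, start, finish) triplet.
records = [(1, ("A", 2, 7)), (1, ("B", 5, 10)), (2, ("A", 1, 4))]
db = group_by_sid(records)
print(db)
```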

7 Endpoint representation
• When describing relationships among more than three events, Allen’s temporal logic may suffer from several problems.
• A suitable representation is very important for describing a temporal pattern.
• A new expression, the endpoint representation, is proposed to address the ambiguity and scalability problems.

9 Endpoint representation (cont.)
• The endpoint representation has several benefits:
  • Scalability
  • Nonambiguity
  • Simplicity
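The idea of an endpoint representation can be sketched as follows: each interval contributes a starting and a finishing endpoint, and the endpoints are ordered by time, so only "before", "after" and "equal" relations between endpoints remain. This is a simplified illustration, not the paper's exact encoding: the `+`/`-` suffixes, the function name `to_endpoint_sequence` and the sample intervals are assumptions, and repeated occurrences of the same symbol are not distinguished.

```python
from itertools import groupby


def to_endpoint_sequence(intervals):
    """Turn (symbol, start, finish) triplets into time-ordered endpoint groups.

    Endpoints that occur at the same time end up in the same inner list
    ("equal"); earlier groups come "before" later ones.
    """
    points = []
    for symbol, start, finish in intervals:
        points.append((start, symbol + "+"))   # starting endpoint
        points.append((finish, symbol + "-"))  # finishing endpoint
    points.sort()
    return [
        [label for _, label in group]
        for _, group in groupby(points, key=lambda p: p[0])
    ]


endpoints = to_endpoint_sequence([("A", 2, 7), ("B", 5, 10), ("C", 7, 10)])
print(endpoints)
```

Because the result is an ordinary sequence of endpoint groups, adding more intervals only lengthens the sequence; no pairwise relation table between intervals is needed.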

10 CEMiner algorithm

11 CEMiner algorithm (cont.)
• Definition 4. Closed temporal pattern
  • CTP = { α | (α ∈ TP) ∧ (∄ β ∈ TP such that (α ⊂ β) ∧ (support(α) = support(β))) }
  • Given two sequences α and β, α is a closed temporal pattern if
    • α is a temporal pattern, and
    • there does not exist a proper supersequence β of α with support(α) = support(β).
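Definition 4 can be checked directly by brute force on a small set of patterns whose supports are already known. The sketch below only illustrates the definition; it is not how CEMiner works, it treats patterns as flat tuples of items, and the names `is_subsequence`, `closed_patterns` and the sample supports are made up for the example.

```python
def is_subsequence(a, b):
    """True if a can be obtained from b by deleting items (order preserved)."""
    it = iter(b)
    return all(item in it for item in a)


def closed_patterns(support):
    """Keep each pattern that has no proper supersequence with equal support.

    `support` maps a pattern (tuple of items) to its support count.
    """
    closed = []
    for alpha, sup_a in support.items():
        absorbed = any(
            alpha != beta and sup_a == sup_b and is_subsequence(alpha, beta)
            for beta, sup_b in support.items()
        )
        if not absorbed:
            closed.append(alpha)
    return closed


# Hypothetical supports: ("A",) is absorbed by ("A", "B"), which has the same support.
support = {("A",): 3, ("A", "B"): 3, ("A", "C", "B"): 2}
result = closed_patterns(support)
print(result)
```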

13 CEMiner algorithm (cont.)
• Closure checking
  • To verify that a new temporal pattern p is closed, we must check whether p is a sub-sequence or super-sequence of an existing temporal pattern p′ and whether the projected databases of p and p′ are equal.
  • This paper borrows BI-Directional Extension [WH04] to check pattern closure:
    • Forward-extension
    • Backward-extension

15 CEMiner algorithm (cont.)
• If there exists neither a forward-extension endpoint nor a backward-extension endpoint for an endpoint sequence, it must be a closed endpoint sequence.
• CEMiner checks closure in two directions:
  • Forward directional checking
  • Backward directional checking

16 CEMiner algorithm (cont.)
• Definition. First instance of a prefix sequence
  • Example: the first instance of the prefix sequence AB in the sequence CAABC is CAAB.
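The example suggests that the first instance is the shortest leading part of the sequence that contains the prefix as a subsequence. Under that reading, a minimal sketch (the function name `first_instance` is an assumption) matches each prefix item at its earliest possible position:

```python
def first_instance(prefix, sequence):
    """Return the shortest leading part of `sequence` containing `prefix`
    as a subsequence, or None if `prefix` does not occur in it."""
    pos = 0
    for item in prefix:
        try:
            # Earliest match of this item at or after the current position.
            pos = sequence.index(item, pos) + 1
        except ValueError:
            return None
    return sequence[:pos]


print(first_instance("AB", "CAABC"))
```

For the slide's example, A is matched at the second character and B at the fourth, so the call returns `CAAB`.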

20 CEMiner algorithm
• We use three pruning strategies to reduce the search space efficiently and effectively:
  • (1) Pre-pruning
  • (2) Post-pruning
  • (3) Pair-pruning

25 CEMiner algorithm (cont.)
• Pair-pruning:
  • If the endpoint is a starting endpoint, we can omit the closure checking, because the starting endpoint and the finishing endpoint always occur in pairs in an endpoint sequence.
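The pair-pruning rule can be pictured as a cheap test applied before the closure check. The sketch below assumes, purely for illustration, that endpoints are strings ending in `+` (starting) or `-` (finishing) and that the rule applies to the last endpoint of the current pattern; the function name `needs_closure_check` is hypothetical.

```python
def needs_closure_check(pattern):
    """Return False when the closure check can be skipped by pair-pruning.

    `pattern` is a list of endpoint labels such as "A+" or "A-".
    """
    if not pattern:
        return False
    # A pattern ending in a starting endpoint is skipped, since its
    # matching finishing endpoint has yet to appear in the pattern.
    return not pattern[-1].endswith("+")


print(needs_closure_check(["A+", "B+"]))        # ends with a starting endpoint
print(needs_closure_check(["A+", "B+", "A-"]))  # ends with a finishing endpoint
```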

32 Experimental results

33 Conclusion
• We develop an efficient algorithm, CEMiner, to discover closed temporal patterns without candidate generation, based on the proposed endpoint representation.
• The algorithm further employs three pruning methods to reduce the search space effectively.

