
Mining Multidimensional Sequential Patterns over Data Streams Chedy Raїssi and Marc Plantevit DaWak_2008.


1 Mining Multidimensional Sequential Patterns over Data Streams Chedy Raїssi and Marc Plantevit DaWak_2008

2 Outline: Introduction, Problem Definition, The MDSDS Approach, Experimental Results, Conclusions

3 Introduction We propose to exploit the intrinsic multidimensionality of data streams to extract more interesting sequential patterns. The search space in a multidimensional framework is huge, so we focus only on the most specific abstraction level for items instead of mining at all possible levels.

4 Problem Definition
▫Multidimensional item: a = (d_1, ..., d_m). The wildcard value ∗ in a dimension can be read as ALL.
▫Multidimensional itemset: i = {a_1, ..., a_k}
▫Multidimensional sequence: s = ⟨i_1, ..., i_l⟩, an ordered list of multidimensional itemsets.

5 Cont. We focus on the most specific frequent items to generate the multidimensional sequential patterns. E.g.
▫If the items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and (∗, ∗, ∗, Wii), because each of them is a generalization of one of the first two.
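The filtering rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: items are represented as tuples, the wildcard as the string "*", and the helper names `generalizes` and `most_specific` are hypothetical.

```python
WILDCARD = "*"  # assumed encoding of the wildcard value

def generalizes(g, s):
    """True if g is a strict generalization of s.

    g generalizes s when every dimension of g is either the wildcard
    or equal to the corresponding dimension of s, and g differs from s.
    """
    return g != s and all(gv == WILDCARD or gv == sv for gv, sv in zip(g, s))

def most_specific(frequent_items):
    """Keep only the items that do not generalize another frequent item."""
    items = set(frequent_items)
    return {a for a in items if not any(generalizes(a, b) for b in items)}

# The five frequent items from the slide's example.
frequent = [
    ("LA", "*", "M", "*"),
    ("*", "*", "M", "Wii"),
    ("LA", "*", "*", "*"),
    ("*", "*", "M", "*"),
    ("*", "*", "*", "Wii"),
]
result = most_specific(frequent)
print(sorted(result))
```

On the slide's example only (LA, ∗, M, ∗) and (∗, ∗, M, Wii) survive. The pairwise comparison is quadratic in the number of frequent items, which is acceptable for a sketch but may not be how a streaming system would do it.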

6 Cont. A data stream DS = B_0, B_1, ..., B_n is a sequence of batches, where each batch B_i = {B_1, B_2, B_3, ..., B_k}.
[Figure: example data stream, with labels B_0, B_1 and B_1, B_2, B_3]

7 Cont. [Figure: specialization example with min_sup = 50%]
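To make the role of a threshold such as min_sup = 50% concrete, the following sketch counts the support of a multidimensional item over one batch. It is written under assumptions: the batch contents are invented for illustration, a sequence is modeled as a list of itemsets, and a sequence is taken to support an item when at least one of its items matches it on every non-wildcard dimension. The names `covers` and `support` are hypothetical.

```python
WILDCARD = "*"  # assumed encoding of the wildcard value

def covers(pattern, item):
    """True if pattern equals item on every non-wildcard dimension."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, item))

def support(pattern, batch):
    """Fraction of sequences in the batch containing an item covered by pattern."""
    hits = sum(
        any(covers(pattern, item) for itemset in seq for item in itemset)
        for seq in batch
    )
    return hits / len(batch)

# Hypothetical batch of four sequences; each sequence is a list of itemsets.
batch = [
    [[("LA", "Educ", "M", "Wii")], [("NY", "Prof", "F", "PS3")]],
    [[("LA", "Prof", "M", "Wii")]],
    [[("NY", "Educ", "M", "Xbox")]],
    [[("SF", "Prof", "F", "PS3")]],
]
min_sup = 0.5

candidates = [("LA", "*", "M", "*"), ("*", "*", "M", "*"), ("SF", "*", "*", "*")]
frequent = [p for p in candidates if support(p, batch) >= min_sup]
print(frequent)
```

With this invented batch, (LA, ∗, M, ∗) reaches exactly 50% and (∗, ∗, M, ∗) reaches 75%, while (SF, ∗, ∗, ∗) stays at 25% and is dropped.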

8 The MDSDS Approach
▫MDSDS extracts the most specific multidimensional items.
▫MDSDS uses a data structure consisting of a prefix-tree and tilted-time window tables.
▫Patterns fall into three classes: (1) frequent patterns, (2) sub-frequent patterns, and (3) infrequent patterns, which are not stored in the prefix-tree.
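The slide does not say how the three classes are separated. A common convention in stream mining, assumed here and not confirmed by the slides, uses two thresholds: a support threshold sigma and a smaller error threshold epsilon. The sketch below classifies patterns under that assumption and keeps only the ones that would be stored; `classify`, `sigma` and `epsilon` are illustrative names.

```python
def classify(support, sigma, epsilon):
    """Classify a pattern by its support (assumed two-threshold scheme).

    frequent:      support >= sigma
    sub-frequent:  epsilon <= support < sigma
    infrequent:    support < epsilon
    """
    if not 0 <= epsilon <= sigma <= 1:
        raise ValueError("expected 0 <= epsilon <= sigma <= 1")
    if support >= sigma:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"
    return "infrequent"

# Hypothetical supports for three patterns.
supports = {("a",): 0.60, ("a", "b"): 0.30, ("c",): 0.05}
sigma, epsilon = 0.5, 0.1

# Infrequent patterns are not stored, mirroring point (3) on the slide.
stored = {}
for pattern, sup in supports.items():
    label = classify(sup, sigma, epsilon)
    if label != "infrequent":
        stored[pattern] = label
print(stored)
```

A plain dictionary stands in for the prefix-tree here; it only shows which patterns would be kept, not how they share prefixes.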

9 Cont. Step 1: mine the most specific multidimensional items.
▫Multidimensional representation: (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
▫Detect specialization or generalization between items.
[Figure: diagram with nodes numbered 1 to 15]

10 Cont. Step 2:
▫Sub-frequent sequences may become frequent in future batches.
▫The PrefixSpan algorithm is used to mine the multidimensional sequences efficiently.

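11 PrefixSpan algorithm (min_sup = 2)
1. Find length-1 sequential patterns: ⟨a⟩:4, ⟨b⟩:4, ⟨c⟩:4, ⟨d⟩:3, ⟨e⟩:3, ⟨f⟩:3.
2. Divide the search space into six subsets: (1) the patterns having prefix ⟨a⟩; …; and (6) the patterns having prefix ⟨f⟩.
▫⟨a⟩-projected database: ⟨(abc)(ac)d(cf)⟩, ⟨(_d)c(bc)(ae)⟩, ⟨(_b)(df)cb⟩, ⟨(_f)cbc⟩.
▫The length-2 sequential patterns with prefix ⟨a⟩: ⟨aa⟩:2, ⟨ab⟩:4, ⟨(ab)⟩:2, ⟨ac⟩:4, ⟨ad⟩:2, ⟨af⟩:2.
▫…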

12 Cont.
3. Find subsets of sequential patterns: each subset is mined recursively from its projected database.
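The three steps above can be sketched as a short recursive function. This is a deliberately simplified variant, not the full algorithm: it assumes every element of a sequence is a single item, so itemset patterns such as ⟨(ab)⟩ are out of scope, and the small database is invented for illustration. The function name `prefixspan` and its signature are hypothetical.

```python
from collections import Counter

def prefixspan(db, min_sup):
    """Simplified PrefixSpan for sequences of single items.

    db is a list of sequences (lists of hashable items); min_sup is an
    absolute count. Returns a dict mapping each frequent pattern
    (a tuple of items) to its support count.
    """
    patterns = {}

    def mine(prefix, projected):
        # Step 1: count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_sup:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Step 2: project on the item, keeping what follows its
            # first occurrence in every sequence that contains it.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            # Step 3: mine the projected database recursively.
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in db])
    return patterns

# Hypothetical database of three sequences.
db = [list("abc"), list("acb"), list("bc")]
found = prefixspan(db, min_sup=2)
for pattern, count in sorted(found.items()):
    print("".join(pattern), count)
```

Projecting on the first occurrence of an item is enough in this setting because any later occurrence yields a suffix contained in the one already kept.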

13 Cont. Step 3:
▫Tilted-time windows table
▫Updating and pruning are performed after each batch is received from the data stream.
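One way to picture a tilted-time windows table is a logarithmic scheme in which recent batches are kept at fine granularity and older ones are merged into coarser slots. The sketch below assumes that scheme, with two slots per level and slots at level i covering 2^i batches; the slides do not state the granularity actually used, and pruning is left out. The class name `TiltedTimeWindow` and its methods are hypothetical.

```python
class TiltedTimeWindow:
    """Logarithmic tilted-time window for one pattern (illustrative sketch)."""

    def __init__(self, per_level=2):
        self.per_level = per_level
        # levels[i] holds counts, newest first; each covers 2**i batches.
        self.levels = []

    def add_batch(self, count):
        """Record the pattern's support count in the newest batch."""
        carry, level = count, 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([])
            slots = self.levels[level]
            slots.insert(0, carry)
            if len(slots) > self.per_level:
                # Merge the two oldest slots and push them one level up.
                oldest = slots.pop()
                second_oldest = slots.pop()
                carry = oldest + second_oldest
                level += 1
            else:
                carry = None

    def windows(self):
        """Return (batches covered, count) pairs, newest first."""
        return [(2 ** i, c) for i, slots in enumerate(self.levels) for c in slots]

    def total(self):
        """Total count over the whole recorded history."""
        return sum(sum(slots) for slots in self.levels)

# Feed eight batches whose counts are 1, 2, ..., 8.
table = TiltedTimeWindow()
for batch_count in range(1, 9):
    table.add_batch(batch_count)
print(table.windows())
print(table.total())
```

After eight batches the table holds four slots instead of eight, and the total count is preserved, which is the trade-off the structure is meant to offer: exact recent history and compressed older history.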

14 Tilted-time windows

15 Cont.

16 Experimental Results

17 Cont.

18 Cont.

19 Conclusions
▫Experiments on real data gathered from TCP/IP network traffic provide compelling evidence that accurate and fast results can be obtained for multidimensional sequential pattern mining.
▫We propose to take the multidimensional framework into account in order to detect high-level changes such as trends.

