Published by Julien Mayberry. Modified over 2 years ago.

1
Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams
SIGMOD’14
Toshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda

2
Introduction
▪ A data stream: an unbounded sequence of data arriving at high speed
▪ FIM-DS: Frequent Itemset Mining from Data Streams
▪ e.g., itemsets with their frequency counts: {a}:4, {b}:3, {c}:3
▪ Applications of FIM-DS: monitoring surveillance systems, communication networks, the retail industry, ...
▪ A challenging problem of FIM-DS: handling the huge combinatorial number of entries that are generated from each streaming transaction and stored in memory
▪ This study considers approximation techniques for FIM-DS.
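To make the combinatorial problem concrete, here is a minimal sketch of exact (non-approximate) itemset counting. It is not the method of the paper. The function name `count_itemsets` and the four-transaction stream in the usage note are hypothetical, with the stream chosen so that the singleton counts match the {a}:4, {b}:3, {c}:3 example above.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(stream):
    """Exactly count every non-empty itemset contained in each transaction.

    A transaction with L distinct items contributes 2**L - 1 itemsets,
    which is the combinatorial blow-up described on the slide.
    """
    counts = Counter()
    for transaction in stream:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts
```

For the hypothetical stream `[{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"a", "b", "c"}]`, the counts for `("a",)`, `("b",)` and `("c",)` are 4, 3 and 3. A single transaction with 10 items already creates 1023 entries, so one long bursty transaction can exhaust memory.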

3
Introduction
Some approximation methods for FIM-DS:
▪ Parameter-oriented approaches
▪ One-scan approximation algorithms
▪ Two types: the deletion approach and the random sampling approach
▪ They guarantee that the frequencies of the resulting itemsets have errors bounded by a given parameter
▪ No false negatives under some conditions

4
Introduction 4

6
Contents
▪ Introduction
▪ Preliminary and Background
▪ Parameter-Oriented vs. Resource-Oriented
▪ LC-SS Algorithm
▪ Skip LC-SS Algorithm (Introduction, Performance, and Improvement)
▪ Further Improvements
▪ Conclusion and Future Work

7
Notations and Terminology 7

11
Notations and Terminology 11

12
Lossy Counting algorithm 12

13
Lossy Counting algorithm 13

14
Lossy Counting algorithm
▪ The challenging problem:
▪ The LC algorithm must generate (and check) every subset of each transaction at least once
▪ This causes a combinatorial explosion of memory consumption
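For reference, the following is a minimal sketch of Lossy Counting in its item-level form (one item per stream element), following the usual description of the algorithm with an error parameter epsilon. The names `lossy_counting` and `frequent_items` are hypothetical. For itemsets, each stream element would instead be every subset of a transaction, which is where the explosion noted above comes from.

```python
import math


def lossy_counting(stream, epsilon):
    """Item-level Lossy Counting sketch.

    Returns (entries, n), where entries maps item -> [count, max_error]
    and n is the number of stream elements seen.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            # The item may have been seen and deleted in earlier buckets,
            # so its count can be under-estimated by at most bucket - 1.
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: delete entries that cannot be frequent.
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Report items whose stored count is at least (support - epsilon) * n."""
    return {k for k, (count, _) in entries.items() if count >= (support - epsilon) * n}
```

The table size here depends on epsilon and on the stream, not on a fixed memory budget, so this is a parameter-oriented method: the error is fixed in advance and memory consumption follows from it.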

15
Space-Saving algorithm 15

16
Space-Saving algorithm 16

18
Parameter-Oriented vs. Resource-Oriented

21
Parameter-Oriented vs. Resource-Oriented

22
Parameter-Oriented vs. Resource-Oriented
▪ Resource-oriented approaches:
▪ Approximation methods
▪ They guarantee a resource-specified constraint, such as a bound on memory consumption or on data processing time
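One way a memory bound can be guaranteed is shown by the Space-Saving algorithm mentioned earlier in the deck. The sketch below is the item-level version with a table of at most `k` entries. It is an illustration of the resource-oriented idea only, not the LC-SS algorithm, and the function name `space_saving` is hypothetical.

```python
def space_saving(stream, k):
    """Item-level Space-Saving sketch: never stores more than k entries.

    Returns a dict mapping item -> [count, error], where the true frequency
    of a stored item lies between count - error and count.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < k:
            counters[item] = [1, 0]
        else:
            # Table is full: replace the entry with the smallest count.
            victim = min(counters, key=lambda x: counters[x][0])
            floor = counters.pop(victim)[0]
            # The newcomer inherits the evicted count as its possible error.
            counters[item] = [floor + 1, floor]
    return counters
```

Here the memory budget `k` is the input and the approximation error is whatever results from it, which is the reverse of the parameter-oriented setting.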

24
LC-SS Algorithm 24

25
LC-SS Algorithm 25

26
LC-SS Algorithm 26

27
LC-SS Algorithm 27

28
The validity in the LC-SS Algorithm 28

29
The validity in the LC-SS Algorithm 29

30
The validity in the LC-SS Algorithm
▪ Theorem 2 guarantees the validity (i.e., completeness and accuracy) of the outputs of Algorithm 1.

31
The validity in the LC-SS Algorithm 31

33
Skip LC-SS Algorithm 33

34
Skip LC-SS Algorithm 34

35
Skip LC-SS Algorithm 35

36
Skip LC-SS Algorithm 36

37
Skip LC-SS Algorithm 37

38
The Validity of the output 38

39
Performance of the Skip LC-SS algorithm
▪ Data: online records of earthquake occurrences in Japan from 1981 to 2013, consisting of 16769 transactions with 1229 items
▪ Implementation: C
▪ Environment: Mac Pro, Mac OS 10.6, 3.33GHz CPU, 16GB memory

40
Performance of the Skip LC-SS algorithm 40

41
Performance of the Skip LC-SS algorithm 41

42
Performance of the Skip LC-SS algorithm 42

43
Improvement of Skip LC-SS algorithm
▪ Two bottlenecks:
▪ 1. updating entries
▪ 2. replacing entries
▪ The replacement operation tends to be more expensive than the update operation

44
R-skip 44

45
T-skip 45

50
Further Improvements
▪ Key idea: use stream reduction to dynamically reduce each transaction
▪ One fact: most items in bursty transactions are non-frequent
▪ The principle of anti-monotonicity: no itemset that contains a non-frequent item can be frequent
▪ Eliminate non-frequent items from each transaction
▪ In this paper, the SS-ST algorithm is used to perform the stream reduction
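The effect of eliminating non-frequent items can be sketched as follows. This is not the SS-ST algorithm, whose details are not given on the slides. As a loudly stated assumption, it keeps exact per-item counts in an unbounded `Counter` and drops every item whose running count is below `min_support * n`; the names `reduce_stream` and `min_support` are hypothetical.

```python
from collections import Counter


def reduce_stream(stream, min_support):
    """Yield each transaction with its currently non-frequent items removed.

    Assumption: exact item counts are kept in memory; the paper uses the
    SS-ST algorithm for this step instead.
    """
    item_counts = Counter()
    for n, transaction in enumerate(stream, start=1):
        item_counts.update(set(transaction))
        threshold = min_support * n
        # By anti-monotonicity, an itemset containing a dropped item
        # cannot be frequent, so only the surviving items need expanding.
        yield {i for i in transaction if item_counts[i] >= threshold}
```

With `min_support = 0.5`, the hypothetical stream `[{"a", "b"}, {"a", "c"}, {"a", "d", "e", "f"}]` is reduced to `[{"a", "b"}, {"a", "c"}, {"a"}]`: the third, longer transaction shrinks from 4 items (15 non-empty subsets) to 1 item (1 subset).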

51
SS-ST algorithm 51

52
Experimental results 52

53
Experimental results
▪ Web-log data: 19466 transactions with 9961 items; the maximal length decreases by 29 from 106

54
Experimental results
▪ Retail data: 88162 transactions with 16470 items; the maximal length decreases by 58 from 76

56
Conclusion 56

57
Future Work
▪ 1. Introduce efficient data structures for the Skip LC-SS algorithm
▪ 2. Investigate an adaptive approach, based on the Skip LC-SS algorithm, that can fit the available resources in the context of FIM-DS

58
Thank you! 58
