Published by Julien Mayberry. Modified over 2 years ago.

1
Resource-oriented Approximation for Frequent Itemset Mining from Bursty Data Streams
SIGMOD’14
Toshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda

2
Introduction
▪A data stream: an unbounded sequence of data arriving at high speed
▪FIM-DS: Frequent Itemset Mining from Data Streams
▪e.g., itemsets with their frequencies: {a}:4, {b}:3, {c}:3
▪Applications of FIM-DS: monitoring surveillance systems, communication networks, the retail industry, and so on
▪A challenging problem of FIM-DS: handling the huge combinatorial number of entries that are generated from each streaming transaction and stored in memory
▪This study considers approximation techniques for FIM-DS.
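To make the combinatorial problem concrete, here is a minimal sketch of exact (non-approximate) itemset counting. The toy stream and the helper names `nonempty_subsets` and `count_itemsets` are assumptions made for illustration; they are not taken from the paper. The stream is chosen so that the single-item counts match the example {a}:4, {b}:3, {c}:3.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(transaction):
    """Yield every non-empty subset of a transaction as a sorted tuple."""
    items = sorted(set(transaction))
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def count_itemsets(stream):
    """Exact counting: one table entry per distinct itemset ever seen."""
    counts = Counter()
    for transaction in stream:
        counts.update(nonempty_subsets(transaction))
    return counts


# Hypothetical toy stream of five transactions.
stream = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
counts = count_itemsets(stream)

# A single transaction with n items contributes 2**n - 1 itemsets.
subsets_of_20_items = 2 ** 20 - 1
```

A transaction of only 20 items already yields more than a million candidate entries, which is why exact counting is not feasible on bursty streams.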

3
Introduction
Some approximation methods for FIM-DS:
▪Parameter-oriented approaches
▪One-scan approximation algorithms
▪Two types: the deletion approach and the random sampling approach
▪They provide a guarantee that the resulting itemsets have frequencies with errors bounded by a given parameter
▪No false negatives under certain conditions

6
Contents
▪Introduction
▪Preliminary and Background
▪Parameter-Oriented vs. Resource-Oriented
▪LC-SS Algorithm
▪Skip LC-SS Algorithm (Introduction, Performance, and Improvement)
▪Further Improvements
▪Conclusion and Future Work

7
Notations and Terminology

14
Lossy Counting algorithm
▪The challenging problem:
▪The LC algorithm must generate (and check) every subset of each transaction at least once
▪This causes a combinatorial explosion of memory consumption
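The bottleneck can be seen in a minimal sketch of textbook Lossy Counting (Manku and Motwani) applied naively to itemsets. This is an illustration under assumptions, not the formulation used on the slides: the function name `lossy_count`, the per-transaction bucket handling, and the toy stream are all hypothetical.

```python
from itertools import combinations
from math import ceil


def nonempty_subsets(transaction):
    """Yield every non-empty subset of a transaction as a sorted tuple."""
    items = sorted(set(transaction))
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def lossy_count(stream, epsilon):
    """One pass of Lossy Counting; returns ({itemset: (count, delta)}, n)."""
    width = ceil(1 / epsilon)        # bucket width
    table = {}                       # itemset -> [count, delta]
    n = 0                            # transactions seen so far
    for transaction in stream:
        n += 1
        bucket = ceil(n / width)     # id of the current bucket
        # Every subset is generated here, even if it is pruned later.
        for itemset in nonempty_subsets(transaction):
            if itemset in table:
                table[itemset][0] += 1
            else:
                table[itemset] = [1, bucket - 1]
        if n % width == 0:           # bucket boundary: prune weak entries
            weak = [s for s, (c, d) in table.items() if c + d <= bucket]
            for itemset in weak:
                del table[itemset]
    return {s: tuple(v) for s, v in table.items()}, n


# Hypothetical toy stream; epsilon = 0.5 gives a bucket width of 2.
table, n = lossy_count([{"a", "b"}, {"a"}, {"a", "c"}, {"a", "b"}], 0.5)
```

The error parameter bounds the undercount of each surviving entry by epsilon times n, but it places no bound on how large the table grows between two pruning steps.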

15
Space-Saving algorithm

18
Parameter-Oriented vs. Resource-Oriented

22
Parameter-Oriented vs. Resource-Oriented
▪Resource-oriented approaches:
▪Approximation methods
▪Guarantee a resource-specified constraint, such as memory consumption or data processing time
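As an illustration of a memory-bounded method, here is a minimal sketch of the classic Space-Saving counter for single items (Metwally et al.), which never stores more than k entries. It is not the itemset variant developed in this work; the function name `space_saving` and the toy stream are assumptions.

```python
def space_saving(stream, k):
    """Track at most k items; returns {item: (count, error)}.

    For every tracked item, count - error <= true frequency <= count.
    """
    table = {}                               # item -> [count, error]
    for item in stream:
        if item in table:
            table[item][0] += 1              # update an existing entry
        elif len(table) < k:
            table[item] = [1, 0]             # free slot: exact count so far
        else:
            # Table full: replace the entry with the smallest count.
            victim = min(table, key=lambda x: table[x][0])
            min_count = table.pop(victim)[0]
            table[item] = [min_count + 1, min_count]
    return {item: tuple(v) for item, v in table.items()}


# Hypothetical toy stream with a table limited to k = 2 entries.
result = space_saving(["a", "a", "a", "b", "c", "a"], k=2)
```

Here the memory budget k is fixed in advance and the estimation error follows from it, which is the reverse of a parameter-oriented method, where the error bound is fixed and the memory use follows.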

24
LC-SS Algorithm

30
The validity of the LC-SS Algorithm
▪Theorem 2 guarantees the validity (i.e., completeness and accuracy) of the outputs of Algorithm 1.

33
Skip LC-SS Algorithm

38
The Validity of the Output

39
Performance of the Skip LC-SS algorithm
▪Data: online data on earthquake occurrences in Japan from 1981 to 2013, consisting of 16769 transactions with 1229 items
▪Implementation: C
▪Environment: Mac Pro, Mac OS 10.6, 3.33 GHz, 16 GB

43
Improvement of the Skip LC-SS algorithm
▪Two bottlenecks:
▪1. Updating entries
▪2. Replacing entries
▪The replacement operation tends to be more expensive than the update operation

44
R-skip

45
T-skip

50
Further Improvements
▪Key idea: use stream reduction to dynamically reduce each transaction
▪One fact: most items in bursty transactions are non-frequent
▪The principle of anti-monotonicity: every itemset containing a non-frequent item is itself non-frequent
▪Eliminate non-frequent items from each transaction
▪In this paper, the SS-ST algorithm is used to perform the stream reduction
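The idea of stream reduction can be sketched as a filter placed in front of the itemset miner. The sketch below uses an exact running item counter as a stand-in, since the internals of SS-ST are not described here; the function name `reduce_stream`, the `min_support` parameter, and the toy stream are assumptions.

```python
from collections import Counter


def reduce_stream(stream, min_support):
    """Yield each transaction with its currently non-frequent items removed.

    Stand-in for SS-ST: an item is kept when its running count is at least
    min_support times the number of transactions seen so far.
    """
    item_counts = Counter()
    for n, transaction in enumerate(stream, start=1):
        item_counts.update(set(transaction))
        yield {i for i in transaction if item_counts[i] >= min_support * n}


# Hypothetical toy stream; the items x, y, z each occur only once.
stream = [{"a", "x"}, {"a", "b"}, {"a", "y"}, {"a", "b", "z"}]
reduced = list(reduce_stream(stream, min_support=0.5))
```

Because every shortened transaction has fewer items, the number of subsets the miner must generate drops exponentially. A bounded-memory reducer would replace the exact `Counter` with an approximate item counter.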

51
SS-ST algorithm

53
Experimental results
▪Web-log data: 19466 transactions with 9961 items; the maximal length decreases by 29 from 106

54
Experimental results
▪Retail data: 88162 transactions with 16470 items; the maximal length decreases by 58 from 76

56
Conclusion

57
Future Work
▪1. Introduce efficient data structures for the Skip LC-SS algorithm
▪2. Investigate an adaptive approach using the Skip LC-SS algorithm that can fit the relevant resource in the context of FIM-DS

58
Thank you!
