Download presentation

Presentation is loading. Please wait.

Published byIssac Gillard Modified over 2 years ago

1
1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University

2
2 Introduction Algorithm Analysis Outline of Talk

3
3 Time Data stream: For simplicity assume unit valued elements

4
4 Current time Most recent time window of duration Compute the sum of elements with time stamps in time window Goal: Data stream:

5
5 Example I: All packets on a network link, maintain the number of different ip sources in the last one hour Example II: Large database, continuously maintain averages and frequency moments

6
6 Synchronous stream t i : In ascending order Asynchronous stream t i : No order guaranteed Data stream:

7
7 Why Asynchronous Data Streams? Synchronous streamAsynchronous stream Synchronous Asynchronous Merge w/o control Network delay & multi-path routing

8
8 Processing Requirements: One pass processing Small workspace: poly-logarithmic in the size of data Fast processing time per element Approximate answers are ok

9
9 Our results: A deterministic data aggregation algorithm Time: Space: Relative Error:

10
10 Previous Work: [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002] Deterministic, Synchronous [Tirthapura, Xu, Busch, PODC, 2006] Randomized, Asynchronous Merging buckets Random sampling

11
11 Introduction Algorithm Analysis Outline of Talk

12
12 Current time Time Data stream: For simplicity assume unit valued elements

13
13 Current time Most recent time window of duration Data stream: Compute the sum of elements with time stamps in time window Goal:

14
14 Divide time into periods of duration

15
15 The sliding window may span at most two time periods sliding window

16
16 sliding window Sum can be written as two sub-sums In two time periods

17
17 sliding window Data structure that maintains an estimate of In left time period

18
18 Without loss of Generality, Consider data structure in time period

19
19 Data structure consists of various levels is an upper bound of the sum in a period

20
20 Counts up to elements Time period Bucket at Level Consider level

21
21 Increase counter value Stream:

22
22 Increase counter value Stream:

23
23 Stream: Increase counter value

24
24 Increase counter value Stream:

25
25 Stream: Counter threshold of reached Split bucket

26
26 Stream: New buckets have threshold also

27
27 Stream: Increase appropriate bucket

28
28 Stream: Increase appropriate bucket

29
29 Stream: Increase appropriate bucket

30
30 Stream: Split bucket

31
31 Stream:

32
32 Stream: Increase appropriate bucket

33
33 Stream: Split bucket

34
34 Stream:

35
35 Splitting Tree

36
36 Leaf buckets of duration 1 are not split any further Max depth =

37
37 The initial bucket may be split into many buckets Leaf buckets

38
38 Due to space limitations we only keep the last buckets Leaf buckets

39
39 Suppose we want to find the sum of elements in time period

40
40 of splitting threshold Consider various levels

41
41 First level with a leaf bucket that intersects timeline

42
42 Estimate of S: Consider buckets on right of timeline

43
43 First level with a leaf bucket On right timeline OR

44
44 Introduction Algorithm Analysis Outline of Talk

45
45 Suppose that we use level in order to compute the estimate

46
46 Stream: A data element is counted in the appropriate bucket Consider splitting threshold level

47
47 Stream: We can assume that the element is placed in the respective bucket

48
48 Stream: We can assume that when bucket splits the element is placed in an arbitrary child bucket

49
49 Stream: If: GOOD! Element counted in correct bucket

50
50 Stream: If: BAD! Element counted in wrong bucket

51
51 Consider Leaf Buckets If GOOD!

52
52 Consider Leaf Buckets If BAD! Element counted in wrong bucket

53
53 :elements of left part counted on right Consider Leaf Buckets :elements of right part counted on left

54
54 elements of left part counted on right Must have been initially inserted in one of these buckets

55
55 Since tree depth

56
56 Since tree depth Similarly, we can prove Therefore:

57
57 It can be proven Since

58
58 It can be proven Since Combined with We obtain relative error :

Similar presentations

OK

Mining High-Speed Data Streams Presented by: William Kniffin Pedro Domingos Geoff Hulten Sixth ACM SIGKDD International Conference - 2000.

Mining High-Speed Data Streams Presented by: William Kniffin Pedro Domingos Geoff Hulten Sixth ACM SIGKDD International Conference - 2000.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on obesity management program Ppt on social networking download Ppt on hiv/aids in hindi Ppt on review of literature science Ppt on ip address classes and special ip Ppt on adjectives for grade 3 Ppt on fire extinguisher types for electronics Ppt on carl friedrich gauss Ppt on eye os glasses Ppt on different types of computer softwares logos