Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku, Stanford University. PODS, June 16, 2004.

1 Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku, Stanford University. PODS, June 16, 2004.

2-5 Sliding Window Model
[Figure: a stream of numbers laid out along a time axis, with a window covering the most recent elements. As the window slides forward, the SUM of its contents changes from 66 to 59. The individual stream values cannot be separated in the transcript.]
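As a point of reference for the model, here is a minimal sketch of an exact sliding-window SUM that stores the whole window. The function name `sliding_sums` and the sample stream are made up for illustration; the slide's own stream values are not recoverable.

```python
from collections import deque

def sliding_sums(stream, window_size):
    """Yield the sum of the most recent `window_size` elements after each arrival."""
    window = deque()
    total = 0
    for x in stream:
        window.append(x)
        total += x
        if len(window) > window_size:
            total -= window.popleft()  # the oldest element leaves the window
        yield total

# Hypothetical stream, not the one shown on the slides.
sums = list(sliding_sums([3, 1, 4, 1, 5, 9, 2, 6], window_size=3))
print(sums)
```

This exact approach keeps every element of the window in memory, which is the cost the rest of the talk tries to avoid.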

6 Statistics over Sliding Windows
- Easy if we store the entire window.
- Storing the entire window is expensive.
  - Space: a "last 1 hour" window at 1000 elements/sec
- Focus of much previous work: compute approximate statistics using limited space.
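The space example on this slide works out as follows. The 8-bytes-per-element figure is an assumption added for illustration; the talk only gives the window length and the arrival rate.

```python
SECONDS_PER_HOUR = 60 * 60
ARRIVAL_RATE = 1000        # elements per second, from the slide
BYTES_PER_ELEMENT = 8      # assumption, not stated in the talk

window_elements = SECONDS_PER_HOUR * ARRIVAL_RATE
window_bytes = window_elements * BYTES_PER_ELEMENT

print(window_elements)             # elements held by a "last 1 hour" window
print(window_bytes / 1e6, "MB")    # storage under the 8-byte assumption
```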

7 Contributions
- Algorithms for computing approximate quantiles and approximate frequency counts over sliding windows
- Space requirement: $\frac{1}{\epsilon}\,\mathrm{polylog}\left(\frac{1}{\epsilon}, N\right)$
  - є = error parameter
  - N = size of the window
  - Logarithmic in window size (N)
  - (Almost) linear in $\frac{1}{\epsilon}$

8 Contributions over Previous Work
- Frequency counts: first known algorithm for the sliding window model
- Quantiles: improves over [LLXY ’04]
  - [LLXY ’04] space: $\frac{1}{\epsilon^2}\,\mathrm{polylog}\left(\frac{1}{\epsilon}, N\right)$
  - Quadratic in $\frac{1}{\epsilon}$

9 Rest of the Talk
- Formal problem specification
  - Sliding windows
  - (Approximate) frequency counts
- Our algorithms
  - Fixed-size sliding windows
  - Variable-size sliding windows
- Frequency counts only; for quantiles, see the paper.

10 Sliding Windows
- Two abstract window models
  - Fixed-size sliding windows: row-based windows
  - Variable-size sliding windows: time-based windows, shared windows

11-14 Fixed-Size Sliding Windows
[Figure: the window slides along the stream over time while its size stays fixed at N = 5.]

15-21 Variable-Size Sliding Windows
[Figure: the window over the stream grows and shrinks over time; across the seven slides its size N is 5, 6, 7, 6, 5, 4, 3.]

22 Frequency Counts
Input multiset: 1 5 1 3 2 1 7 3 1 4 1 3 5 3 1 3 3 9 1 7

Element  Count
1        7
3        6
5        2
7        2
9        1
2        1
4        1

Select Element, Count(*) From Multiset Group by Element
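The query on this slide is an exact group-by count. A minimal Python equivalent is sketched below; it assumes the run of digits in the transcript is a multiset of 20 single-digit elements, which is consistent with the counts listed on the slide and with M = 20 on a later slide.

```python
from collections import Counter

# Assumption: the transcript's digit run is 20 single-digit elements.
multiset = [1, 5, 1, 3, 2, 1, 7, 3, 1, 4, 1, 3, 5, 3, 1, 3, 3, 9, 1, 7]

# Equivalent of: Select Element, Count(*) From Multiset Group by Element
counts = Counter(multiset)

for element, count in counts.most_common():
    print(element, count)
```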

23 Approximate Frequency Counts
- Output: elements and their approximate counts
- Approximate count: True Count − єM < Approximate Count ≤ True Count
  - Error parameter: є
  - Size of input: M
- Only elements with Approximate Count > 0 are reported.
- References: [MG ’82, DLM ’02, MM ’02, KSP ’03]
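The definition on this slide can be read as a predicate on a candidate output. The sketch below checks it directly; the function name `is_valid_approximation` and the candidate outputs are hypothetical, and the input is the 20-element multiset assumed from the previous slide.

```python
from collections import Counter

def is_valid_approximation(data, approx, eps):
    """Check True - eps*M < Approx <= True for every element.

    Elements missing from `approx` are treated as having approximate count 0.
    """
    true = Counter(data)
    m = len(data)
    for element in set(true) | set(approx):
        f = true[element]              # Counter returns 0 for unseen elements
        est = approx.get(element, 0)
        if not (f - eps * m < est <= f):
            return False
    return True

data = [1, 5, 1, 3, 2, 1, 7, 3, 1, 4, 1, 3, 5, 3, 1, 3, 3, 9, 1, 7]

# Hypothetical candidate outputs with eps = 0.25, so eps * M = 5.
ok = is_valid_approximation(data, {1: 4, 3: 2}, eps=0.25)
too_low = is_valid_approximation(data, {1: 2, 3: 2}, eps=0.25)   # 7 - 5 < 2 fails
too_high = is_valid_approximation(data, {1: 8, 3: 6}, eps=0.25)  # 8 > true count 7
```

Several different outputs can satisfy the definition for the same input, which is why the output size can be kept small.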

24-25 Approximate Frequency Counts
Input: 1 5 1 3 2 1 7 3 1 4 1 3 5 3 1 3 3 9 1 7 (input size M = 20)
Error parameter: є = 0.25; absolute error: єM = 5
[Table: Element, True Count, Approx. Count, Error. The two slides show different tables for the same input. The true counts are 7, 6, 1, 2, 2, 1, 1 for elements 1, 3, 4, 5, 7, 9, 2; the Approx. Count and Error columns cannot be reliably separated in the transcript.]

26 Approximate Frequency Counts
- All elements with frequency ≥ єM appear in the output.
- There exists an output with ≤ $\frac{1}{\epsilon}$ elements.
- Theorem: An approximate frequency count of size $O\left(\frac{1}{\epsilon}\right)$ can be produced in one pass over the input using $O\left(\frac{1}{\epsilon}\right)$ space.
- References: [MG ’82, DLM ’02, KSP ’03]
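One well-known way to get a one-pass summary of this kind is a counter-based scheme in the style of [MG ’82]. The sketch below is an illustration under stated assumptions, not necessarily the exact variant used in the cited papers: it keeps at most k = ⌈1/є⌉ counters, and the name `misra_gries` is chosen here for convenience.

```python
import math

def misra_gries(stream, eps):
    """One-pass summary with at most ceil(1/eps) counters.

    Each decrement round discards k + 1 occurrences (k counters plus the
    arriving element), so every count is underestimated by at most
    M / (k + 1) < eps * M and never overestimated.
    """
    k = math.ceil(1 / eps)
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Input assumed from the transcript's digit run (20 single-digit elements).
data = [1, 5, 1, 3, 2, 1, 7, 3, 1, 4, 1, 3, 5, 3, 1, 3, 3, 9, 1, 7]
summary = misra_gries(data, eps=0.25)
print(summary)
```

The summary never holds more than ⌈1/є⌉ entries, which matches the O(1/є) size and space in the theorem.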

27 Rest of the Talk
- Formal problem specification
  - Sliding windows
  - (Approximate) frequency counts
- Our algorithms
  - Fixed-size sliding windows
  - Variable-size sliding windows
- Frequency counts only; for quantiles, see the paper.

28 Fixed-Size Sliding Windows
- Window size: N
- Error parameter: є
- Absolute error: єN

29-37 Overview
[Figure: animation over a window of size N; only the label N survives in the transcript.]

38 Details
[Figure: the window of size N. Surviving labels: N, єN and 4 (possibly the fraction єN/4), log(1/є), error parameters є_0, є_1, є_2, and "= O(єN)".]

39 Error Invariant
Absolute error of all blocks is identical:
$\epsilon_i N_i = \dfrac{\epsilon N}{\log\left(\frac{1}{\epsilon}\right)}$
- $\epsilon_i$: error parameter for block
- $N_i$: number of elements in block
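To see what the invariant implies, the sketch below solves it for a block's error parameter. The block sizes, the window size, and the base-2 logarithm are assumptions chosen for illustration; the transcript does not state them.

```python
import math

def block_error_parameter(eps, window_size, block_size):
    """Solve eps_i * N_i = eps * N / log(1/eps) for eps_i (log base 2 assumed)."""
    absolute_error = eps * window_size / math.log2(1 / eps)
    return absolute_error / block_size

eps, window_size = 1 / 16, 1024          # hypothetical parameters
for block_size in (64, 128, 256):        # hypothetical block sizes
    print(block_size, block_error_parameter(eps, window_size, block_size))
```

Under these assumptions every block carries the same absolute error (16 elements), so larger blocks are summarized with proportionally smaller error parameters.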

40 Merge Operation
[Figure: diagram over the window of size N; only the label N survives in the transcript.]

41 Add approximate counts of elements. Absolute error adds up. Here $f$ is the true count and $\tilde{f}$ the approximate count.
Block 1: $f_1 - \epsilon_1 N_1 < \tilde{f}_1 \le f_1$
Block 2: $f_2 - \epsilon_2 N_2 < \tilde{f}_2 \le f_2$
Block 1 + Block 2: $(f_1 + f_2) - (\epsilon_1 N_1 + \epsilon_2 N_2) < \tilde{f}_1 + \tilde{f}_2 \le f_1 + f_2$
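The merge step on this slide can be sketched as a per-element addition of two summaries. The helper names and the two small blocks below are hypothetical; the point is only that the worst-case error of the merged summary is bounded by the sum of the two blocks' absolute errors.

```python
from collections import Counter

def merge_counts(approx1, approx2):
    """Merge two approximate-count summaries by adding counts per element."""
    merged = Counter(approx1)
    merged.update(approx2)   # Counter.update adds counts rather than replacing
    return dict(merged)

def max_error(data, approx):
    """Largest underestimate of any element's true count."""
    true = Counter(data)
    return max(true[e] - approx.get(e, 0) for e in true)

# Hypothetical blocks and summaries.
block1, approx1 = [1, 1, 1, 2, 2, 3], {1: 2, 2: 1}   # eps1 = 0.25, N1 = 6
block2, approx2 = [1, 2, 2, 2], {2: 2}               # eps2 = 0.5,  N2 = 4

merged = merge_counts(approx1, approx2)
bound = 0.25 * len(block1) + 0.5 * len(block2)       # eps1*N1 + eps2*N2
print(merged, max_error(block1 + block2, merged), bound)
```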

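42 Error Analysis
[Figure: error contributions over the window of size N. Surviving labels: O(єN), log(1/є), єN, and plus signs; the full expression is not recoverable from the transcript.]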

43 Space Requirement
[Figure: the window of size N with the same labels as on slide 38: N, єN and 4, log(1/є), є_0, є_1, є_2.]

44 Approximate Frequency Counts
- All elements with frequency ≥ єM appear in the output.
- There exists an output with ≤ $\frac{1}{\epsilon}$ elements.
- Theorem: An approximate frequency count of size $O\left(\frac{1}{\epsilon}\right)$ can be produced in one pass over the input using $O\left(\frac{1}{\epsilon}\right)$ space.
- References: [MG ’82, DLM ’02, KSP ’03]

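45 Space Requirement
Space required for level-ℓ blocks = (size of approx. count) × (number of "active" blocks):
$\frac{1}{\epsilon_\ell} \times \frac{N}{N_\ell} = \frac{N}{\epsilon N / \log\left(\frac{1}{\epsilon}\right)} = \frac{1}{\epsilon}\log\left(\frac{1}{\epsilon}\right)$
Total space:
$\frac{1}{\epsilon}\log\left(\frac{1}{\epsilon}\right) \times \log\left(\frac{1}{\epsilon}\right) = \frac{1}{\epsilon}\log^2\left(\frac{1}{\epsilon}\right)$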
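The space calculation on this slide can be evaluated numerically. The sketch below assumes base-2 logarithms and ignores constant factors, neither of which the transcript specifies; the function name is made up.

```python
import math

def fixed_window_space(eps):
    """Return (per-level space, total space) in units of counters.

    per level: (1/eps) * log(1/eps); total: per level * log(1/eps) levels.
    Log base 2 and unit constants are assumptions.
    """
    levels = math.log2(1 / eps)
    per_level = (1 / eps) * levels
    return per_level, per_level * levels

per_level, total = fixed_window_space(1 / 16)
print(per_level, total)
```

Note that the result does not depend on the window size N, only on є.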

46 Fixed-Size Sliding Windows: Summary
Theorem: є-approximate frequency counts can be maintained over a fixed-size sliding window of size N using $\frac{1}{\epsilon}\log^2\left(\frac{1}{\epsilon}\right)$ space.

47 Variable-Size Windows
- Error parameter: є
- Variable window size: n
- Variable absolute error: єn

48 Fixed-Size Window Algorithm?
[Figure: the fixed-size window diagram over a window of size N, with the same labels as on slide 38.]

49 Fixed-Size Window Algorithm?
[Figure: F(є, N), the fixed-size window algorithm, applied to a window of current size n. The absolute error is єN, so the error parameter = єN/n.]

50 Limited Variability
- F(є/2, N) computes є-approximate frequency counts for window sizes N/2 ≤ n ≤ N.
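The error bound behind this claim can be checked in one line, assuming (as on the fixed-size slides) that F(є, N) has absolute error єN. This only covers the error bound; how F answers queries over the most recent n < N elements is in the paper.

```latex
\text{absolute error of } F(\epsilon/2, N) = \frac{\epsilon}{2}\, N,
\qquad
n \ge \frac{N}{2} \;\Longrightarrow\; \frac{\epsilon}{2}\, N \le \epsilon\, n .
```

So for any window size n between N/2 and N, the absolute error stays within the єn allowed for a window of size n.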

51 Variable-Size Windows
[Figure: along the time axis, a window of current size n covered by instances F(є/2, N), F(є/2, N/2), ..., F(є/2, 2/є): log(єn) instances in total, where $N = 2^p \ge n > N/2$.]
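The ladder of instances on this slide can be enumerated for concrete values. The sketch below assumes 2/є is a power of two and that sizes double from 2/є up to the first size N ≥ n; the function name and the sample values are made up.

```python
import math

def instance_sizes(eps, n):
    """Window sizes of the F(eps/2, .) instances: 2/eps, 4/eps, ..., N with N >= n > N/2."""
    size = 2 / eps
    sizes = [size]
    while size < n:
        size *= 2
        sizes.append(size)
    return sizes

sizes = instance_sizes(eps=1 / 8, n=100)
print(sizes)                         # the largest entry is N
print(len(sizes), math.log2(sizes[-1] / 8))   # instance count vs log2(eps * N)
```

With these values the largest instance has N = 128 ≥ n = 100 > N/2 = 64, and the number of instances equals log2(єN), in line with the log(єn) count on the slide.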

52-58 Variable-Size Windows
[Figure: animation of the instances F(є/2, N), F(є/2, N/2), ..., F(є/2, 2/є) along the time axis as the window size n changes. F(є/2, N) is absent on slides 55-57 and present again on slide 58.]

59 Variable-Size Windows: Summary
Theorem: є-approximate frequency counts can be maintained over variable-size windows using $\frac{1}{\epsilon}\log^2\left(\frac{1}{\epsilon}\right)\log(\epsilon n)$ space, where n is the current size of the sliding window.

60 See Paper for …
- Randomized algorithms for frequency counts
- Deterministic and randomized algorithms for quantiles
- A general technique for variable-size window algorithms
  - Converts fixed-size window algorithms to variable-size window algorithms
  - Works for Sum, Bit-Count

61 References used in Talk
[DLM ’02]: E. D. Demaine, A. Lopez-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. ESA 2002.
[KSP ’03]: R. M. Karp, S. Shenker, and C. H. Papadimitriou. A simple algorithm for finding frequent elements in streams and bags. TODS 2003.
[LLXY ’04]: X. Lin, H. Lu, J. Xu, and J. X. Yu. Continuously maintaining quantile summaries of the most recent N elements over a data stream. ICDE 2004.
[MG ’82]: J. Misra and D. Gries. Finding repeated elements. Sci. Comput. Programming, 1982.
[MM ’02]: G. S. Manku and R. Motwani. Approximate frequency counts over data streams. VLDB 2002.

