Published by Alejandro Maynard. Modified over 4 years ago.

1
Optimal Space Lower Bounds for All Frequency Moments
David Woodruff
MIT
dpwood@mit.edu

2
The Streaming Model
Example stream: 0113734 …
Stream of elements $a_1, \ldots, a_q$, each in $\{1, \ldots, m\}$.
Want to compute statistics on the stream.
Elements are arranged in adversarial order.
Algorithms are given one pass over the stream.
Goal: a minimum-space algorithm.

3
Frequency Moments
$q$ = stream size, $m$ = universe size, $f_i$ = number of occurrences of item $i$.
Define the $k$-th frequency moment: $F_k = \sum_{i=1}^{m} f_i^k$.
Applications:
$F_0$ = number of distinct elements in the stream, $F_1 = q$.
$F_2$ = repeat rate; computing self-join sizes in databases.
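A small worked example may help fix the definitions. The stream below is hypothetical (it is not taken from the slides), and $F_0$ uses the usual convention that items with $f_i = 0$ contribute nothing.

```latex
\begin{align*}
a &= (1, 1, 3, 7, 3, 4), \quad q = 6, \\
f_1 &= 2, \quad f_3 = 2, \quad f_4 = 1, \quad f_7 = 1, \quad f_i = 0 \text{ otherwise}, \\
F_0 &= |\{1, 3, 4, 7\}| = 4, \\
F_1 &= 2 + 2 + 1 + 1 = 6 = q, \\
F_2 &= 2^2 + 2^2 + 1^2 + 1^2 = 10.
\end{align*}
```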

4
The Best Deterministic Algorithm
Trivial algorithm for $F_k$: store and update the frequency $f_i$ of each item $i$.
Space: $m$ items $i$, $\log q$ bits for each $f_i$.
Total space = $O(m \log q)$.
Negative result [AMS96]: any algorithm computing $F_k$ exactly must use $\Omega(m)$ space.
Can we do better?
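The trivial algorithm can be sketched in a few lines. This is a minimal illustration, assuming items are hashable Python values; the class name `ExactFrequencyMoments` and the sample stream are hypothetical and not from the slides.

```python
from collections import Counter


class ExactFrequencyMoments:
    """One-pass exact F_k: keep one counter f_i per distinct item seen."""

    def __init__(self):
        self.freq = Counter()  # item -> f_i

    def update(self, item):
        # Process one stream element.
        self.freq[item] += 1

    def moment(self, k):
        # F_0 counts distinct items; otherwise sum f_i^k over seen items.
        if k == 0:
            return len(self.freq)
        return sum(f ** k for f in self.freq.values())


alg = ExactFrequencyMoments()
for a in [1, 1, 3, 7, 3, 4]:  # hypothetical stream
    alg.update(a)
moments = [alg.moment(k) for k in range(3)]  # F_0, F_1, F_2
```

The dictionary holds up to $m$ counters of about $\log q$ bits each, which is where the $O(m \log q)$ total comes from.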

5
Approximating $F_k$
Negative result [AMS96]: any deterministic algorithm that outputs $x$ with $|F_k - x| < \varepsilon F_k$ must use $\Omega(m)$ space.
What about randomized approximation algorithms?
A randomized algorithm $A$ $\varepsilon$-approximates $F_k$ if $A$ outputs $x$ with $\Pr[\,|F_k - x| < \varepsilon F_k\,] > 2/3$.

6
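Previous Work
Upper bounds: can $\varepsilon$-approximate $F_0$ [BJKST02], $F_2$ [AMS96], and $F_k$ for $k > 2$ [CK04], with space respectively: [bounds missing from transcript]
Lower bounds:
[AMS96] $\forall k$, $\varepsilon$-approximating $F_k$ needs $\Omega(\log m)$ space.
[IW03] $\varepsilon$-approximating $F_0$ requires $\Omega(\varepsilon^{-2})$ space if [condition on $\varepsilon$ missing from transcript].
Questions: Does the bound hold for $k \neq 0$? Does it hold for $F_0$ for smaller $\varepsilon$?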

7
First Result
Optimal lower bound: $\forall k \neq 1$ and any $\varepsilon = \Omega(m^{-1/2})$, any $\varepsilon$-approximator for $F_k$ must use $\Omega(\varepsilon^{-2})$ bits of space.
$F_1 = q$ is computed trivially in $\log q$ space.
$F_k$ is computed in $O(m \log q)$ space, so we need $\varepsilon = \Omega(m^{-1/2})$.
Technique: reduction from a 2-party protocol for computing the Hamming distance $\Delta(x,y)$.

8
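Idea Behind Lower Bounds
[Figure: Alice holds $x \in \{0,1\}^m$ and forms stream $s(x)$; Bob holds $y \in \{0,1\}^m$ and forms stream $s(y)$. The $(1 \pm \varepsilon)$ $F_k$ algorithm $A$ is run on $s(x)$, and its internal state $S$ is passed from Alice to Bob.]
Compute $(1 \pm \varepsilon) F_k(s(x) \circ s(y))$ w.p. > 2/3.
Idea: if this decides $f(x,y)$ w.p. > 2/3, then the space used by $A$ is at least the randomized 1-way communication complexity of $f(\cdot,\cdot)$.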
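The reduction can be illustrated with a toy simulation in which the message is literally the streaming algorithm's state. This is only a sketch under stated assumptions: the exact counter stands in for the $(1 \pm \varepsilon)$ algorithm $A$, the encoding `stream_of` (emit index $i$ once per set bit) is a hypothetical choice of $s(\cdot)$, and `alice`/`bob` are illustrative names.

```python
import pickle
from collections import Counter


def stream_of(bits):
    """Hypothetical encoding s(.): emit index i once for every set bit."""
    return [i for i, b in enumerate(bits) if b]


def alice(x):
    """Run the streaming algorithm on s(x) and send its state as the message."""
    state = Counter()
    for item in stream_of(x):
        state[item] += 1
    return pickle.dumps(state)  # the single Alice-to-Bob message


def bob(message, y, k):
    """Resume from Alice's state, process s(y), and output F_k of s(x) then s(y)."""
    state = pickle.loads(message)
    for item in stream_of(y):
        state[item] += 1
    return sum(f ** k for f in state.values())


x = [1, 0, 1, 1]
y = [1, 1, 0, 1]
msg = alice(x)
result = bob(msg, y, k=2)
message_bits = 8 * len(msg)  # communication cost = size of the state
```

Any lower bound on the message length needed to decide $f$ therefore carries over to the state size of the streaming algorithm.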

9
Randomized 1-way comm. complexity
Boolean function $f: X \times Y \to \{0,1\}$.
Alice has $x \in X$, Bob has $y \in Y$. Bob wants $f(x,y)$.
Only 1 message is sent: it must be from Alice to Bob.
Comm. cost of a protocol = expected length of the longest message sent over all inputs.
The $\delta$-error randomized 1-way comm. complexity of $f$, $R_\delta(f)$, is the comm. cost of the optimal protocol computing $f$ w.p. $\geq 1 - \delta$.
How do we lower bound $R_\delta(f)$?

10
The VC Dimension [KNR]
$F = \{f : X \to \{0,1\}\}$ is a family of Boolean functions; each $f \in F$ is a length-$|X|$ bitstring.
For $S \subseteq X$, the shatter coefficient $SC(f_S)$ of $S$ is $|\{f|_S : f \in F\}|$ = number of distinct bitstrings when $F$ is restricted to $S$.
$SC(F, p) = \max_{S \subseteq X,\ |S| = p} SC(f_S)$.
If $SC(f_S) = 2^{|S|}$, $S$ is shattered by $F$.
The VC dimension of $F$, $VCD(F)$, is the size of the largest $S$ shattered by $F$.
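For small families these quantities can be computed by brute force. The sketch below assumes each function is given as a tuple of bits indexed by $X = \{0, \ldots, |X|-1\}$; the function names and the threshold family used as an example are hypothetical, not from the slides.

```python
from itertools import combinations


def shatter_coefficient(family, subset):
    """Number of distinct bitstrings when the family is restricted to subset."""
    return len({tuple(f[i] for i in subset) for f in family})


def sc(family, domain_size, p):
    """SC(F, p): largest shatter coefficient over all subsets of size p."""
    return max(shatter_coefficient(family, s)
               for s in combinations(range(domain_size), p))


def vc_dimension(family, domain_size):
    """Size of the largest subset on which all 2^|S| patterns appear."""
    d = 0
    for p in range(1, domain_size + 1):
        if sc(family, domain_size, p) == 2 ** p:
            d = p
    return d


# Hypothetical example: threshold functions f_j(i) = 1 iff i >= j on X = {0,1,2,3}.
thresholds = [tuple(1 if i >= j else 0 for i in range(4)) for j in range(5)]
sc2 = sc(thresholds, 4, 2)
vcd = vc_dimension(thresholds, 4)
```

Thresholds never produce the pattern (1, 0) on a pair of points, so no pair is shattered and the VC dimension is 1.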

11
Shatter Coefficient Theorem
Notation: for $f: X \times Y \to \{0,1\}$, define $f_X = \{ f_x : Y \to \{0,1\} \mid x \in X \}$, where $f_x(y) = f(x,y)$.
Theorem [BJKS]: for every $f: X \times Y \to \{0,1\}$ and every $p \geq VCD(f_X)$, $R_{1/3}(f) = \Omega(\log SC(f_X, p))$.

12
Hamming Distance Decision Problem (HDDP)
Set $t = \Theta(1/\varepsilon^2)$.
Alice has $x \in \{0,1\}^t$, Bob has $y \in \{0,1\}^t$.
Promise problem: either $\Delta(x,y) \leq t/2 - t^{1/2}$, in which case $f(x,y) = 0$, OR $\Delta(x,y) > t/2$, in which case $f(x,y) = 1$.
We will lower bound $R_{1/3}(f)$ via $SC(f_X, t)$, but first, a critical lemma…
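The promise problem is easy to state as code. In this minimal sketch the function names are hypothetical, and returning `None` for inputs that violate the promise is an illustrative choice (the slides leave such inputs unconstrained).

```python
import math


def hamming(x, y):
    """Hamming distance Delta(x, y) between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y))


def hddp(x, y):
    """Return 0 if Delta <= t/2 - sqrt(t), 1 if Delta > t/2, None otherwise."""
    t = len(x)
    d = hamming(x, y)
    if d <= t / 2 - math.sqrt(t):
        return 0
    if d > t / 2:
        return 1
    return None  # promise violated: the distance falls in the gap


zeros = [0] * 16
close = [1] * 4 + [0] * 12   # distance 4 = 16/2 - sqrt(16)
far = [1] * 9 + [0] * 7      # distance 9 > 8
gap = [1] * 6 + [0] * 10     # distance 6 lies in the gap
answers = [hddp(zeros, close), hddp(zeros, far), hddp(zeros, gap)]
```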

13
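Main Lemma
[Figure: a set $S \subseteq \{0,1\}^n$ split into $T$ and $S - T$, with a separator $y$.]
Show $\exists\, S \subseteq \{0,1\}^n$ with $|S| = n$ s.t. there exist $2^{\Omega(n)}$ good sets $T \subseteq S$, where $T$ is good if $\exists$ a separator $y \in \{0,1\}^n$ s.t.
1. $\forall t \in T$, $\Delta(y,t) \leq n/2 - c n^{1/2}$ for some $c > 0$
2. $\forall t \in S - T$, $\Delta(y,t) > n/2$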

14
Lemma Solves HDDP Complexity
Theorem: $R_{1/3}(f) = \Omega(t) = \Omega(\varepsilon^{-2})$.
Proof:
1. Alice gets $y_T$ for a random good set $T$, applying the main lemma with $n = t$.
2. Bob gets a random $s \in S$.
3. Let $f: \{y_T\}_T \times S \to \{0,1\}$.
4. Main Lemma $\Rightarrow$ $SC(f) = 2^{\Omega(t)}$.
5. [BJKS] $\Rightarrow$ $R_{1/3}(f) = \Omega(t) = \Omega(\varepsilon^{-2})$.

15
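Back to Frequency Moments
Idea: use an $\varepsilon$-approximator for $F_k$ in a protocol to solve HDDP.
[Figure: Alice holds $y \in \{0,1\}^t$ and feeds stream $a_y$ to the $F_k$ algorithm; its state is passed to Bob, who holds $s \in S \subseteq \{0,1\}^t$ and feeds stream $a_s$.]
The $i$-th universe element is included exactly once in the auxiliary stream $a_y$ (resp. $a_s$) if and only if $y_i$ (resp. $s_i$) = 1.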

16
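Solving HDDP with $F_k$
Alice/Bob compute an $\varepsilon$-approximation to $F_k(a_y \circ a_s)$.
$F_k(a_y \circ a_s) = 2^k\,\mathrm{wt}(y \wedge s) + 1^k\,\Delta(y,s)$.
For $k \neq 1$, $F_k(a_y \circ a_s)$ together with $\mathrm{wt}(y)$ and $\mathrm{wt}(s)$ determines $\Delta(y,s)$, because $\mathrm{wt}(y \wedge s) = (\mathrm{wt}(y) + \mathrm{wt}(s) - \Delta(y,s))/2$. [The explicit formula on the slide is missing from this transcript.]
Conclusion: $\varepsilon$-approximating $F_k(a_y \circ a_s)$ decides HDDP, so the space for $F_k$ is $\Omega(t) = \Omega(\varepsilon^{-2})$.
Alice also transmits $\mathrm{wt}(y)$ in $\log m$ space.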
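The identity above can be checked numerically. Substituting $\mathrm{wt}(y \wedge s) = (\mathrm{wt}(y) + \mathrm{wt}(s) - \Delta(y,s))/2$ gives $F_k = 2^{k-1}(\mathrm{wt}(y) + \mathrm{wt}(s)) + (1 - 2^{k-1})\,\Delta(y,s)$, which can be solved for $\Delta$ whenever $k \neq 1$. The sketch below uses the exact $F_k$ for illustration (a real protocol would only have an approximation), and all function names and the sample vectors are hypothetical.

```python
def hamming(y, s):
    """Hamming distance Delta(y, s)."""
    return sum(a != b for a, b in zip(y, s))


def fk_of_concatenation(y, s, k):
    """F_k of a_y followed by a_s: element i has frequency y_i + s_i."""
    return sum((a + b) ** k for a, b in zip(y, s) if a + b > 0)


def recover_distance(fk, wt_y, wt_s, k):
    """Solve F_k = 2^(k-1) (wt(y) + wt(s)) + (1 - 2^(k-1)) Delta for Delta."""
    if k == 1:
        raise ValueError("F_1 = wt(y) + wt(s) carries no information about Delta")
    half = 2 ** (k - 1)
    return (fk - half * (wt_y + wt_s)) / (1 - half)


y = [1, 0, 1, 1, 0, 1]
s = [1, 1, 0, 1, 0, 0]
wt_and = sum(a & b for a, b in zip(y, s))
recovered = {k: recover_distance(fk_of_concatenation(y, s, k), sum(y), sum(s), k)
             for k in (0, 2, 3)}
```

For $k = 1$ the coefficient of $\Delta$ vanishes, which matches the exclusion of $k = 1$ in the lower bound.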

17
But How to Prove Main Lemma?
Recall: show $\exists\, S \subseteq \{0,1\}^n$ with $|S| = n$ s.t. there exist $2^{\Omega(n)}$ sets $T \subseteq S$ so that $\exists$ a separator $y \in \{0,1\}^n$ s.t.
1) $\forall t \in T$, $\Delta(y,t) \leq n/2 - c n^{1/2}$ for some $c > 0$
2) $\forall t \in S - T$, $\Delta(y,t) > n/2$
Use the probabilistic method:
For $S$, choose $n$ random elements of $\{0,1\}^n$.
Key step: show that the probability an arbitrary $T \subseteq S$ satisfies (1) and (2) is $> 2^{-zn}$ for a constant $z < 1$.
Hence the expected number of such $T$ is $2^{\Omega(n)}$.
So there exists $S$ with $2^{\Omega(n)}$ such $T$.

18
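Proving the Main Lemma
Let $T = \{t_1, \ldots, t_{n/2}\} \subseteq S$ be arbitrary.
Let $y_i = \mathrm{majority}(t_{1,i}, \ldots, t_{n/2,i})$ for all $i \in [n]$.
What is the probability $p$ that both:
1) $\forall t \in T$, $\Delta(y,t) \leq n/2 - c n^{1/2}$ for some $c > 0$
2) $\forall t \in S - T$, $\Delta(y,t) > n/2$
For 1, let $x = \Pr[\forall t \in T,\ \Delta(y,t) \leq n/2 - c n^{1/2}]$.
For 2, let $y = \Pr[\forall t \in S - T,\ \Delta(y,t) > n/2] = 2^{-n/2}$ (on the left-hand side, $y$ names a probability, not the separator).
By independence, $p = x \cdot y$.
It remains to lower bound $x$…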
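The construction of the separator can be sketched directly. This is an illustration only: the function names are hypothetical, the tie-breaking rule (ties go to 1) is an assumption the slides do not specify, and the tiny hand-picked sets stand in for the random $S$ of the proof.

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(u, v))


def majority_word(T):
    """y_i = majority of the i-th bits of the strings in T (ties go to 1)."""
    n = len(T[0])
    return [1 if 2 * sum(t[i] for t in T) >= len(T) else 0 for i in range(n)]


def is_good(T, rest, c):
    """Check conditions (1) and (2) with y the majority word of T."""
    y = majority_word(T)
    n = len(y)
    cond1 = all(hamming(y, t) <= n / 2 - c * n ** 0.5 for t in T)
    cond2 = all(hamming(y, t) > n / 2 for t in rest)
    return cond1 and cond2


T = [[1, 1, 1, 1], [1, 1, 1, 1]]
good = is_good(T, [[0, 0, 0, 0], [0, 0, 0, 1]], c=0.5)  # rest is far from y
bad = is_good(T, [[1, 1, 0, 0]], c=0.5)                 # distance 2 is not > n/2
```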

19
The Matrix Problem
WLOG, assume $y = 1^n$ (recall $y$ is the majority word).
Want to lower bound $\Pr[\forall t \in T,\ \Delta(y,t) \leq n/2 - c n^{1/2}]$.
Equivalent to a matrix problem:
$t_1$ → 101001000101111001
$t_2$ → 100101011100011110
… → 001110111101010101
$t_{n/2}$ → 101010111011100011
Given a random $n/2 \times n$ binary matrix with each column having majority 1, what is the probability that each row has at least $n/2 + c n^{1/2}$ 1s?

20
Bipartite Graphs
The matrix problem becomes a bipartite graph counting problem:
How many bipartite graphs exist on $n/2$ by $n$ vertices s.t. each left vertex has degree $> n/2 + c n^{1/2}$ and each right vertex has degree $> n/4$ (a majority of its $n/2$ possible neighbors)?

21
Second Result
Bipartite graph count: a probabilistic argument shows there are at least $2^{n^2/2 - zn/2 - n}$ such bipartite graphs for a constant $z < 1$.
The analysis generalizes to show that the number of bipartite graphs on $m + n$ vertices with each left vertex having degree $> n/2$ and each right vertex having degree $> m/2$ is $> 2^{mn - zm - n}$.
Previous known count: $2^{mn - m - n}$ [MW, personal comm.]. It follows easily from a correlation inequality of Kleitman.
Our proof uses correlation inequalities, but with a more involved analysis.
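For very small $m$ and $n$ the count can be checked by exhaustive enumeration. The sketch below is a sanity check only, not the probabilistic argument from the talk; the function name is hypothetical, and the graph is represented as an $m \times n$ 0/1 biadjacency matrix.

```python
from itertools import product


def count_majority_graphs(m, n):
    """Count m-by-n 0/1 matrices with every row sum > n/2 and every column sum > m/2."""
    count = 0
    for bits in product((0, 1), repeat=m * n):
        rows = [bits[r * n:(r + 1) * n] for r in range(m)]
        rows_ok = all(2 * sum(row) > n for row in rows)
        cols_ok = all(2 * sum(row[c] for row in rows) > m for c in range(n))
        if rows_ok and cols_ok:
            count += 1
    return count


count = count_majority_graphs(3, 3)
previous_bound = 2 ** (3 * 3 - 3 - 3)  # the 2^(mn - m - n) count for m = n = 3
```

For $m = n = 3$ the valid matrices are exactly those whose zeros form a partial matching, which gives 34 graphs against the earlier bound of $2^3 = 8$. The enumeration takes $2^{mn}$ steps, so it is only feasible for tiny cases.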

22
Summary
Results:
Optimal lower bound: $\forall k \neq 1$ and any $\varepsilon = \Omega(m^{-1/2})$, any $\varepsilon$-approximator for $F_k$ must use $\Omega(\varepsilon^{-2})$ bits of space.
Bipartite graph count: the number of bipartite graphs on $m + n$ vertices with each left vertex having degree $> n/2$ and each right vertex having degree $> m/2$ is at least $2^{mn - zm - n}$ for a constant $z < 1$.
