1 Information Complexity Lower Bounds for Data Streams David Woodruff IBM Almaden

2 Last time: 1-Way Communication of Index
Alice has x ∈ {0,1}^n, Bob has i ∈ [n]
Alice sends a (randomized) message M to Bob
I(M ; X) = Σ_i I(M ; X_i | X_{&lt;i}) ≥ Σ_i I(M ; X_i) = n − Σ_i H(X_i | M)
Fano: H(X_i | M) ≤ H(δ) if Bob can recover X_i from M with probability 1 − δ
CC_δ(Index) ≥ I(M ; X) ≥ n(1 − H(δ))
The same lower bound applies if the protocol is only correct on average over x and i drawn independently from a uniform distribution
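The closing bound n(1 − H(δ)) is easy to evaluate numerically. A minimal sketch (the function names are mine, for illustration only):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def index_lower_bound(n, delta):
    """Fano-based 1-way lower bound: CC_delta(Index) >= n * (1 - H(delta))."""
    return n * (1 - binary_entropy(delta))

# Even at error probability 1/3 the bound remains a constant fraction of n.
print(index_lower_bound(1000, 1/3))
```

Note that the bound degenerates to 0 as δ approaches 1/2, which is consistent with a trivial coin-flipping protocol.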

3 Distributional Communication Complexity
[figure: Alice holds X, Bob holds Y; they want to compute f(X,Y)]

4 Indexing is Universal for Product Distributions [Kremer, Nisan, Ron]
[figure: 0/1 matrix illustrating the reduction]

5 Indexing with Low Error
The Index problem with error probability 1/3 and with error probability 0 both have Θ(n) communication
In some applications we want lower bounds in terms of the error probability
Indexing on Large Alphabets:
– Alice has x ∈ {0,1}^{n/δ} with wt(x) = n, Bob has i ∈ [n/δ]
– Bob wants to decide if x_i = 1 with error probability δ
– [Jayram, W] 1-way communication is Ω(n log(1/δ))

6 Beyond Product Distributions

7 Non-Product Distributions
Needed for stronger lower bounds
Example: approximate |x|_∞ up to a multiplicative factor of B in a stream
– Lower bounds for heavy hitters, p-norms, etc.
Gap_∞(x,y) Problem: x ∈ {0, …, B}^n, y ∈ {0, …, B}^n
Promise: |x − y|_∞ ≤ 1 or |x − y|_∞ ≥ B
The hard distribution is non-product
Ω(n/B²) lower bound [Saks, Sun] [Bar-Yossef, Jayram, Kumar, Sivakumar]

8 Direct Sums
Gap_∞(x,y) doesn’t have a hard product distribution, but it has a hard distribution μ = λ^n in which the coordinate pairs (x_1, y_1), …, (x_n, y_n) are independent
– with probability 1 − 1/n, (x_i, y_i) is random subject to |x_i − y_i| ≤ 1
– with probability 1/n, (x_i, y_i) is random subject to |x_i − y_i| ≥ B
Direct Sum: solving Gap_∞(x,y) requires solving n single-coordinate sub-problems g
In g, Alice and Bob have J, K ∈ {0, …, B}, and want to decide if |J − K| ≤ 1 or |J − K| ≥ B
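To make the hard distribution concrete, here is a sketch of a sampler for μ = λ^n. It assumes "random subject to" means uniform over the pairs satisfying the constraint (an interpretation, not stated on the slide), and the helper name is mine:

```python
import random

def sample_hard_pair(n, B, rng=random):
    """Sample (x, y) from mu = lambda^n: each coordinate pair is
    'close' (|x_i - y_i| <= 1) with probability 1 - 1/n and
    'far' (|x_i - y_i| >= B) with probability 1/n."""
    x, y = [], []
    for _ in range(n):
        if rng.random() < 1.0 / n:
            # Far pair: over {0, ..., B}, only (0, B) and (B, 0)
            # can satisfy |x_i - y_i| >= B.
            xi, yi = rng.choice([(0, B), (B, 0)])
        else:
            # Close pair: pick x_i, then a neighbor within distance 1.
            xi = rng.randrange(B + 1)
            yi = rng.choice([v for v in (xi - 1, xi, xi + 1) if 0 <= v <= B])
        x.append(xi)
        y.append(yi)
    return x, y
```

With probability roughly (1 − 1/n)^n ≈ 1/e every coordinate is close, so a Gap_∞ protocol must effectively detect the single planted far coordinate when one exists.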

9 Direct Sum Theorem
Let M be the message from Alice to Bob
For X, Y ∼ μ, I(M ; X, Y) = H(X, Y) − H(X, Y | M) is the information cost of the protocol
[BJKS]: why not measure I(M ; X, Y) only when (X, Y) satisfy |X − Y|_∞ ≤ 1?
– Is I(M ; X, Y) still large?
– Let us go back to protocols correct on each X, Y w.h.p.
Redefine μ = λ^n, where (X_i, Y_i) ∼ λ is random subject to |X_i − Y_i| ≤ 1
IC(g) = inf_ψ I(ψ ; J, K), where ψ ranges over all 2/3-correct 1-way protocols for g, and J, K ∼ λ
Is I(M ; X, Y) = Ω(n) · IC(g)?

10 The Embedding Step
I(M ; X, Y) ≥ Σ_i I(M ; X_i, Y_i)
We need to show I(M ; X_i, Y_i) ≥ IC(g) for each i
[figure: Alice and Bob embed their inputs J, K into the i-th coordinate of X, Y]
Suppose Alice and Bob could fill in the remaining coordinates j of X, Y so that (X_j, Y_j) ∼ λ
Then we get a correct protocol for g!
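The first inequality on this slide is superadditivity of information over independent coordinates; spelled out with the chain rule and the independence of the pairs (X_i, Y_i):

```latex
\begin{align*}
I(M; X, Y)
  &= \sum_i I(M; X_i, Y_i \mid X_{<i}, Y_{<i}) && \text{(chain rule)} \\
  &= \sum_i \Big( H(X_i, Y_i \mid X_{<i}, Y_{<i}) - H(X_i, Y_i \mid M, X_{<i}, Y_{<i}) \Big) \\
  &\ge \sum_i \Big( H(X_i, Y_i) - H(X_i, Y_i \mid M) \Big)
      && \text{(independence; conditioning reduces entropy)} \\
  &= \sum_i I(M; X_i, Y_i).
\end{align*}
```

The independence of the coordinate pairs under μ = λ^n is exactly what makes the first term drop its conditioning.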

11 Conditional Information Cost
(X_j, Y_j) ∼ λ is not a product distribution
[BJKS] Define D = ((P_1, V_1), …, (P_n, V_n)):
– P_j uniform on {Alice, Bob}
– V_j uniform on {1, …, B} if P_j = Alice
– V_j uniform on {0, …, B−1} if P_j = Bob
– If P_j = Alice, then Y_j = V_j and X_j is uniform on {V_j − 1, V_j}
– If P_j = Bob, then X_j = V_j and Y_j is uniform on {V_j, V_j + 1}
X and Y are independent conditioned on D!
I(M ; X, Y | D) = Ω(n) · IC(g | (P, V)) holds now!
IC(g | (P, V)) = inf_ψ I(ψ ; J, K | (P, V)), where ψ ranges over all 2/3-correct protocols for g, and J, K ∼ λ
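A sketch of one coordinate of the conditioning D (the function name is mine). The point to notice: once (P, V) is revealed, one of the two inputs is a deterministic function of V and the other is uniform on two values, which is why X and Y become conditionally independent:

```python
import random

def sample_coordinate(B, rng=random):
    """Sample (P, V, X, Y) for one coordinate of the BJKS conditioning."""
    P = rng.choice(["Alice", "Bob"])
    if P == "Alice":
        V = rng.randrange(1, B + 1)    # V uniform on {1, ..., B}
        Y = V                          # Bob's input is fixed by (P, V)
        X = rng.choice([V - 1, V])     # Alice's input uniform on {V-1, V}
    else:
        V = rng.randrange(0, B)        # V uniform on {0, ..., B-1}
        X = V                          # Alice's input is fixed by (P, V)
        Y = rng.choice([V, V + 1])     # Bob's input uniform on {V, V+1}
    return P, V, X, Y
```

Every sample satisfies |X − Y| ≤ 1, so marginally (X, Y) is distributed according to a "close" pair, as λ requires.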

12 Primitive Problem
Need to lower bound IC(g | (P, V))
For fixed P = Alice and V = v, this is I(ψ ; J), where J is uniform over {v − 1, v}
From previous lectures: I(ψ ; J) ≥ D_JS(ψ_{v−1,v}, ψ_{v,v})
IC(g | (P, V)) ≥ E_v[D_JS(ψ_{v−1,v}, ψ_{v,v}) + D_JS(ψ_{v,v}, ψ_{v,v+1})] / 2
Forget about distributions, let’s move to unit vectors!
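The Jensen-Shannon divergence used here is, for distributions p and q with midpoint m = (p + q)/2, D_JS(p, q) = (KL(p‖m) + KL(q‖m))/2. A minimal numeric sketch, in bits:

```python
import math

def kl(p, q):
    """KL divergence in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2
```

Identical transcript distributions give D_JS = 0 (the message reveals nothing about the input), and disjointly supported ones give the maximum value 1 bit.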

13 Hellinger Distance
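The body of this slide was not captured in the transcript; the standard definition it refers to is the Hellinger distance h(p, q) with h(p, q)² = 1 − Σ_i √(p_i q_i). A small sketch:

```python
import math

def hellinger(p, q):
    """Hellinger distance: h(p, q)^2 = 1 - sum_i sqrt(p_i * q_i)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against float round-off
```

Unlike KL, h is a genuine metric, and h(p, q) equals 1/√2 times the Euclidean distance between the unit vectors (√p_i)_i and (√q_i)_i, since ‖√p − √q‖² = 2 − 2Σ_i √(p_i q_i). This is what "move to unit vectors" on the previous slide refers to.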

14 Pythagorean Property

15 Lower Bounding the Primitive Problem

16 Direct Sum Wrapup
Ω(n/B²) bound for Gap_∞(x, y)
A similar argument gives an Ω(n) bound for disjointness [BJKS]
[MYW] Sometimes one can “beat” a direct sum: solving all n copies simultaneously with constant probability is as hard as solving each copy with probability 1 − 1/n
– E.g., the 1-way communication complexity of Equality
Direct sums are nice, but often a problem can’t be split into simpler smaller problems, e.g., there is no known embedding step for gap-Hamming

