Download presentation

Presentation is loading. Please wait.

Published byHannah Pearson Modified over 4 years ago

1
Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO Based on a paper in STOC, 2012

2
k-party Number-In-Hand Model P1P1 P2P2 P3P3 PkPk P4P4 … x1x1 x2x2 x3x3 x4x4 xkxk Goals: - compute a function f(x 1, …, x k ) - minimize communication complexity -Player to player communication - Protocol transcript always determines who speaks next -Player to player communication - Protocol transcript always determines who speaks next

3
k-party Number-In-Hand Model C P1P1 P2P2 P3P3 PkPk … Convenient to introduce a coordinator C All communication goes through the coordinator Communication only affected by a factor of 2 x1x1 x2x2 x3x3 xkxk

4
Model Motivation Data distributed and stored in the cloud –Impractical to put data on a single device Sensor networks –Communication is power-intensive Network routers –Bandwidth limitations Distributed functional monitoring Authors: Can, Cormode, Huang, Muthukrishnan, Patt- Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, …

5
k-Party Number-In-Hand Model C P1P1 P2P2 P3P3 PkPk … x1x1 x2x2 x3x3 xkxk Which functions do we care about? - 8 i, x i 2 {0,1, … n} n - x = x 1 + x 2 + … + x k - f(x) = |x| p = (Σ i x i p ) 1/p - |x| 0 is number of non-zero coordinates - Talk will focus on |x| 0 and |x| 2 For distributed databases: |x| 0 is number of distinct elements |x| 2 2 is known as self-join size |x| 2 useful for regression, low-rank approx For distributed databases: |x| 0 is number of distinct elements |x| 2 2 is known as self-join size |x| 2 useful for regression, low-rank approx Important for applications that the x i are non- negative

6
Randomized Communication Complexity What is the randomized communication cost of f? i.e., the minimal cost of a protocol, which for every set of inputs, fails in computing f with probability < 1/3 (n) cost for |x| 0 and |x| 2 Reduction from 2-Player Set-Disjointness (DISJ) Alice has a set S µ [n] Bob has a set T µ [n] Either |S Å T| = 0 or |S Å T| = 1 |S Å T| = 1 ! DISJ(S,T) = 1, |S Å T| = 0 ! DISJ(S,T) = 0 [KS, R] (n) communication Prohibitive

7
Approximate Answers Compute a relation with probability > 2/3: f(x) 2 (1 ± ε) |x| 0 f(x) 2 (1 ± ε) |x| 2 What is the randomized communication cost as a function of k, ε, and n? Will ignore log(nk/ε) factors Understanding dependence on ε is critical, e.g., ε<.01

8
Previous Results |x| 0 : (k + ε -2 ) and O(k ¢ ε -2 ) |x| 2 : (k + ε -2 ) and O(k ¢ ε -2 )

9
Our Results |x| 0 : (k + ε -2 ) and O(k ¢ ε -2 ) (k ¢ ε -2 ) |x| 2 : (k + ε -2 ) and O(k ¢ ε -2 ) (k ¢ ε -2 ) First lower bounds to depend on product of k and ε - 2 Implications for data streams: - First tight space lower bound for estimating number of distinct elements without using the Gap-Hamming Problem - Improves lower bound for estimation of |x| p, p > 2

10
Previous Lower Bounds Lower bounds for |x| 0 and |x| [CMY] (k) [ABC] (ε -2 ) Reduction from Gap-Orthogonality (GAP-ORT) P 1, P 2 have u, v 2 {0,1} ε -2, respectively | ¢ (u, v) – 1/(2ε 2 )| 2/ε [CR, S] (ε -2 ) communication

11
Talk Outline Lower Bounds –|x| 0 –|x| 2

12
Lower Bound for |x| 0 Improve bound to optimal (k ¢ ε -2 ) Study a simpler problem: k-GAP-THRESH –Each player P i holds a bit Z i –Z i are i.i.d. Bernoulli( ¯ ) –Decide if i=1 k Z i > ¯ k + ( ¯ k) 1/2 or i=1 k Z i < ¯ k - ( ¯ k) 1/2 Otherwise dont care Rectangle property: for any correct protocol transcript ¿, Z 1, Z 2, …, Z k are independent conditioned on ¿

13
Rectangle Property of Communication Let r be the randomness of C, P 1, …, P k For any fixed r, the set S of inputs giving rise to a transcript ¿ is a combinatorial rectangle: S = S 1 x S 2 x … x S k If input distribution is a product distribution, conditioned on ¿ and r, inputs are independent Since this holds for every r, inputs are independent conditioned on ¿

14
k-GAP-THRESH C P1P1 P2P2 P3P3 PkPk … Z1Z1 Z2Z2 Z3Z3 ZkZk The Z i are i.i.d. Bernoulli( ¯ ) Coordinator wants to decide if: i=1 k Z i > ¯ k + ( ¯ k) 1/2 or i=1 k Z i < ¯ k - ( ¯ k) 1/2 By independence of the Z i | ¿, equivalent to C having noisy independent copies of the Z i

15
A Key Lemma Lemma: For any protocol ¦ which succeeds w.pr. >.99, the transcript ¿ is such that w.pr. > 1/2, for at least k/2 different i, H(Z i | ¿ ) < H(.01 ¯ ) Proof: Suppose ¿ does not satisfy this –With large probability, ¯ k - O( ¯ k) 1/2 i=1 k Z i | ¿ ] < ¯ k + O( ¯ k) 1/2 –Since the Z i are independent given ¿, i=1 k Z i | ¿ is a sum of independent Bernoullis –Since most H(Z i | ¿ ) are large, by anti-concentration, both events occur with constant probability: i=1 k Z i | ¿ > ¯ k + ( ¯ k) 1/2, i=1 k Z i | ¿ < ¯ k - ( ¯ k) 1/2 So ¦ cant succeed with large probability

16
Composition Idea C P1P1 P2P2 P3P3 PkPk … Z3Z3 Z2Z2 Z1Z1 ZkZk The input to P i in k-GAP-THRESH, denoted Z i, is the output of a 2-party Disjointness (DISJ) instance between C and S i - Let S be a random set of size 1/(4ε 2 ) from {1, 2, …, 1/ε 2 } - For each i, if Z i = 1, then choose T i of size 1/(4ε 2 ) so that DISJ(S, T i ) = 1, else choose T i so that DISJ(S, T i ) = 0 - Distributional complexity of solving DISJ with probability 1- ¯ /100, when DISJ(S,T) = 1 with probability ¯, is (1/ε 2 ) [R] DISJ

17
Putting it All Together Key Lemma ! For most i, H(Z i | ¿ ) < H(.01 ¯ ) Since H(Z i ) = H( ¯ ) for all i, for most i protocol ¦ solves DISJ(X, Y i ) with probability ¸ 1- ¯ /100 For most i, the communication between C and P i is (ε -2 ) –Otherwise, C could simulate the other players without any communication and contradict lower bound for DISJ(X, Y i ) Total communication is (k ¢ ε -2 ) Can show a reduction to estimating |x| 0

18
Reduction to |x| 0 Think of C as a player Cs input vector x C is characteristic vector of the set [1/ε 2 ] \ S P i s input vector x i is characteristic vector of the set T i When |T i Å S| = 1, support of x = x C + i x i usually increases by 1 Choose ¯ = £ (1/(ε 2 k)) so that i=1 k Z i = ¯ k +- ( ¯ k) 1/2 = 1/ε 2 +- 1/ε

19
Talk Outline Lower Bounds –|x| 0 –|x| 2

20
Lower Bound for Euclidean Norm Improve (k + ε - ) bound to optimal (k ¢ ε -2 ) Use Gap-Orthogonality (GAP-ORT(X, Y)) –GAP-ORT(X,Y) = 1 –Alice, Bob have X, Y 2 {0,1} ε -2 –Decide: | ¢ (X, Y) – 1/(2ε 2 )| 2/ε –Consider uniform distribution on X,Y [KLLRX, CKW] For any protocol ¦ that solves GAP- ORT with constant probability, I(X, Y; ¦ ) = H(X,Y) – H(X,Y | ¦ ) = (1/ε 2 )

21
Information Implications By chain rule, I(X, Y ; ¦ ) = i=1 1/ε 2 I(X i, Y i ; ¦ | X < i, Y < i ) = (ε -2 ) For most i, I(X i, Y i ; ¦ | X < i, Y < i ) = (1)

22
XOR DISJ Choose random j 2 [n] and random S 2 {00, 10, 01, 11}: S = 00: j doesnt occur in any T i S = 10: j occurs only in T 1, …, T k/2 S = 01: j occurs only in T k/, …, T k S = 11: j occurs in T 1, …, T k Every j j occurs in at most one set T i Output equals 1 if S 2 {10, 01}, otherwise output is 0 I( ¦ ; T 1, …, T k | j, S, D) = (k) for any ¦ for which I( ¦ ; S) = (1) P1P1 P k/2+1 …PkPk P k/2 T1T1 T k/2+1 T k/2 T k µ [n] We compose GAP-ORT with a variant of k-Party DISJ … ……

23
GAP-ORT + XOR DISJ Take 1/ε 2 independent copies of XOR DISJ –T i = (T i 1, …, T i k ), j i, S i, D i are variables for i-th instance Is the number of outputs equal to 1 about 1/(2ε 2 ) +-1/ε or about 1/(2ε 2 ) +- 2/ε? XOR DISJ instance … { 1/ε 2

24
Intuitive Proof GAP-ORT is embedded inside of GAP-ORT + XOR DISJ Output is XOR of bits in S Implies for any correct protocol ¦ : For most i, I(S i ; ¦ | S < i ) = (1) Implies via a direct sum: For most i, I( ¦ ; T i | j, S, D, T < i ) = (k) Implies via the chain rule: I( ¦ ; T 1, …, T 1/ε 2 | j, S, D) = (k/ε 2 ) Implies communication is (k/ε 2 )

25
Conclusions Tight communication lower bounds for estimating |x| 0 and |x| 2 Techniques imply tight lower bounds for empirical entropy, heavy hitters, quantiles Other results: –Model in which the x i undergo poly(n) additive updates to their coordinates –Coordinator continually maintains (1+ε)-approximation –Improve k 2 /poly(ε) to k/poly(ε) communication for |x| 2

Similar presentations

OK

Turnstile Streaming Algorithms Might as Well Be Linear Sketches Yi Li Huy L. Nguyen David Woodruff.

Turnstile Streaming Algorithms Might as Well Be Linear Sketches Yi Li Huy L. Nguyen David Woodruff.

© 2018 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google