 # Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO.

## Presentation transcript:

Tight Bounds for Distributed Functional Monitoring. David Woodruff (IBM Almaden), Qin Zhang (Aarhus University, MADALGO).

Distributed Functional Monitoring. [Figure: sites $P_1, P_2, P_3, \ldots, P_k$ hold inputs $x_1, x_2, x_3, \ldots, x_k$ and communicate with a coordinator $C$; updates arrive at the sites over time.] Static case vs. dynamic case. Updates: $x_i \leftarrow x_i + e_j$. Problems on $x_1 + x_2 + \cdots + x_k$: sampling, p-norms, heavy hitters, compressed sensing, quantiles, entropy. Authors: Can, Cormode, Huang, Muthukrishnan, Patt-Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, many others.

Motivation. Data distributed and stored in the cloud: impractical to put data on a single device. Sensor networks: communication is very power-intensive. Network routers: bandwidth limitations.

Problems. Which functions $f(x_1, \ldots, x_k)$ do we care about? $x_1, \ldots, x_k$ are non-negative length-$n$ vectors and $x = \sum_{i=1}^{k} x_i$. We consider $f(x_1, \ldots, x_k) = |x|_p = \left(\sum_{i=1}^{n} x_i^p\right)^{1/p}$, where inside this sum $x_i$ denotes the $i$-th coordinate of $x$. $|x|_0$ is the number of non-zero coordinates. What is the randomized communication cost of these problems? That is, the minimal cost of a protocol which, for every input, fails with probability $< 1/3$. Static case, dynamic case.
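As a point of reference, the quantity being approximated can be computed centrally when all site vectors are available in one place. The following is a minimal sketch, not part of the talk; the function name `p_norm_of_sum` and the list-of-lists input layout are assumptions made for illustration.

```python
def p_norm_of_sum(site_vectors, p):
    """Return |x|_p for x = sum of the sites' non-negative vectors.

    site_vectors: list of k equal-length lists (one per site); hypothetical layout.
    p = 0 is treated as the number of non-zero coordinates of x.
    """
    # x is the coordinate-wise sum of the k site vectors.
    x = [sum(column) for column in zip(*site_vectors)]
    if p == 0:
        return sum(1 for value in x if value != 0)
    return sum(value ** p for value in x) ** (1.0 / p)


sites = [[1, 0, 2], [2, 0, 2]]   # k = 2 sites, n = 3 coordinates
print(p_norm_of_sum(sites, 2))   # Euclidean norm of x = [3, 0, 4]
print(p_norm_of_sum(sites, 0))   # number of non-zero coordinates
```

Computing this exactly requires shipping every vector to one place, which is the cost the protocols in the talk try to avoid.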

Exact Answers. An $\Omega(n)$ communication bound for computing $|x|_p$, $p \neq 1$. Reduction from 2-Player Set-Disjointness (DISJ): Alice has a set $S \subseteq [n]$ of size $n/4$, Bob has a set $T \subseteq [n]$ of size $n/4$, with either $|S \cap T| = 0$ or $|S \cap T| = 1$. Is $S \cap T = \emptyset$? $|X \cap Y| = 1 \Rightarrow \mathrm{DISJ}(X,Y) = 1$, $|X \cap Y| = 0 \Rightarrow \mathrm{DISJ}(X,Y) = 0$. [KS, R]: $\Omega(n)$ communication. Prohibitive for applications.

Approximate Answers. $f(x_1, \ldots, x_k) = (1 \pm \varepsilon)\,|x|_p$. What is the randomized communication cost as a function of $k$, $\varepsilon$, and $n$? Ignore $\log(nk/\varepsilon)$ factors.

Previous Results. Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative):

- $|x|_0$: $\Omega(k + \varepsilon^{-2})$ and $O(k \cdot \varepsilon^{-2})$
- $|x|_p$: $\Omega(k + \varepsilon^{-2})$
- $|x|_2$: $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$
- $|x|_p$, $p > 2$: $O(k^{2p+1} n^{1-2/p} \cdot \mathrm{poly}(1/\varepsilon))$

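Our Results. Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative):

- $|x|_0$: $\Omega(k + \varepsilon^{-2})$ and $O(k \cdot \varepsilon^{-2})$; lower bound improved to $\Omega(k \cdot \varepsilon^{-2})$
- $|x|_p$: $\Omega(k + \varepsilon^{-2})$ improved to $\Omega(k^{p-1} \cdot \varepsilon^{-2})$. Talk will focus on $p = 2$
- $|x|_2$: $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ improved to $O(k \cdot \mathrm{poly}(1/\varepsilon))$
- $|x|_p$, $p > 2$: $O(k^{2p+1} n^{1-2/p} \cdot \mathrm{poly}(1/\varepsilon))$ improved to $O(k^{p-1} \cdot \mathrm{poly}(1/\varepsilon))$

First lower bounds to depend on the product of $k$ and $\varepsilon^{-2}$. Upper bound doesn't depend polynomially on $n$.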

Talk Outline. Lower bounds: non-zero elements, Euclidean norm. Upper bounds: p-norm.

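Previous Lower Bounds. Lower bounds for any p-norm, $p \neq 1$: [CMY] $\Omega(k)$; [ABC] $\Omega(\varepsilon^{-2})$. Reduction from Gap-Orthogonality (GAP-ORT): Alice and Bob have $u, v \in \{0,1\}^{\varepsilon^{-2}}$, respectively, and $|\Delta(u, v) - 1/(2\varepsilon^2)|$ is either $< 1/\varepsilon$ or $> 2/\varepsilon$. [CR, S]: $\Omega(\varepsilon^{-2})$ communication.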


Lower Bound for Distinct Elements. Improve bound to optimal $\Omega(k \cdot \varepsilon^{-2})$. Simpler problem: k-GAP-THRESH. Each site $P_i$ holds a bit $Z_i$. The $Z_i$ are i.i.d. Bernoulli($\beta$). Decide if $\sum_{i=1}^{k} Z_i > \beta k + (\beta k)^{1/2}$ or $\sum_{i=1}^{k} Z_i < \beta k - (\beta k)^{1/2}$; otherwise don't care. Rectangle property: for any correct protocol transcript $\tau$, $Z_1, Z_2, \ldots, Z_k$ are independent conditioned on $\tau$.
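To make the promise in k-GAP-THRESH concrete, here is a small sketch that samples the sites' bits and labels an instance as above the upper threshold, below the lower threshold, or in the don't-care region. The function names and the string labels are illustrative assumptions, not notation from the talk.

```python
import math
import random


def sample_bits(k, beta, rng):
    """Draw k i.i.d. Bernoulli(beta) bits, one per site."""
    return [1 if rng.random() < beta else 0 for _ in range(k)]


def gap_thresh_label(bits, beta):
    """Classify an instance of k-GAP-THRESH (labels are hypothetical names)."""
    k = len(bits)
    total = sum(bits)
    mean = beta * k
    dev = math.sqrt(beta * k)
    if total > mean + dev:
        return "high"
    if total < mean - dev:
        return "low"
    return "dont_care"   # inside the gap: any answer is acceptable


rng = random.Random(0)
bits = sample_bits(100, 0.5, rng)
print(sum(bits), gap_thresh_label(bits, 0.5))
```

A correct protocol only has to separate the "high" and "low" cases; the sketch says nothing about how many bits of communication that takes.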

A Key Lemma. Lemma: For any protocol $\Pi$ which succeeds with probability $> 0.9999$, the transcript $\tau$ is such that with probability $> 1/2$, for at least $k/2$ different $i$, $H(Z_i \mid \tau) < H(0.01\beta)$. Proof: Suppose $\tau$ does not satisfy this. With large probability, $\beta k - O((\beta k)^{1/2}) < \mathbf{E}\left[\sum_{i=1}^{k} Z_i \mid \tau\right] < \beta k + O((\beta k)^{1/2})$. Since the $Z_i$ are independent given $\tau$, $\sum_{i=1}^{k} Z_i \mid \tau$ is a sum of independent Bernoullis. Since most $H(Z_i \mid \tau)$ are large, by anti-concentration, both events occur with constant probability: $\sum_{i=1}^{k} Z_i \mid \tau > \beta k + (\beta k)^{1/2}$ and $\sum_{i=1}^{k} Z_i \mid \tau < \beta k - (\beta k)^{1/2}$. So $\Pi$ can't succeed with large probability.

Composition Idea. [Figure: coordinator $C$ runs a DISJ instance with each of the sites $P_1, P_2, P_3, \ldots, P_k$; the outputs are $Z_1, Z_2, Z_3, \ldots, Z_k$.] The input to $P_i$ in k-GAP-THRESH, denoted $Z_i$, is the output of a 2-party Disjointness (DISJ) instance between $C$ and $S_i$. Let $X$ be a random set of size $1/(4\varepsilon^2)$ from $\{1, 2, \ldots, 1/\varepsilon^2\}$. For each $i$, if $Z_i = 1$, then choose $Y_i$ so that $\mathrm{DISJ}(X, Y_i) = 1$, else choose $Y_i$ so that $\mathrm{DISJ}(X, Y_i) = 0$. Distributional complexity of DISJ: $\Omega(1/\varepsilon^2)$ [Razborov]. Can think of $C$ as a player.
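The composed input distribution can be sketched as follows. This is an illustration under stated assumptions: the slide does not give the size of each $Y_i$, so the sketch takes it to be $1/(4\varepsilon^2)$, matching the set sizes on the Exact Answers slide, and it uses the convention that DISJ is 1 exactly when the sets share one element. The function name and parameters are hypothetical.

```python
import random


def compose_instance(k, inv_eps_sq, beta, rng):
    """Sample X for the coordinator and (Z_i, Y_i) for each of k sites.

    inv_eps_sq plays the role of 1/eps^2 (assumed divisible by 4).
    Assumption: |Y_i| = |X| = inv_eps_sq / 4; the slide leaves |Y_i| unspecified.
    """
    universe = list(range(1, inv_eps_sq + 1))
    size = inv_eps_sq // 4
    X = set(rng.sample(universe, size))
    outside = [e for e in universe if e not in X]
    Z, Ys = [], []
    for _ in range(k):
        z = 1 if rng.random() < beta else 0
        if z == 1:
            # Intersecting case: exactly one shared element, so DISJ(X, Y_i) = 1.
            shared = rng.choice(sorted(X))
            Y = {shared} | set(rng.sample(outside, size - 1))
        else:
            # Disjoint case: DISJ(X, Y_i) = 0.
            Y = set(rng.sample(outside, size))
        Z.append(z)
        Ys.append(Y)
    return X, Z, Ys


X, Z, Ys = compose_instance(k=8, inv_eps_sq=16, beta=0.5, rng=random.Random(0))
print(Z, [len(X & Y) for Y in Ys])
```

By construction, the intersection size of $X$ with $Y_i$ equals $Z_i$, so learning the bit $Z_i$ amounts to solving the $i$-th DISJ instance.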

Putting it All Together. Key Lemma $\Rightarrow$ for most $i$, $H(Z_i \mid \tau) < H(0.01\beta)$. Since $H(Z_i) = H(\beta)$ for all $i$, for most $i$ protocol $\Pi$ solves $\mathrm{DISJ}(X, Y_i)$ with constant probability. Since the $Z_i \mid \tau$ are independent, solving DISJ requires communication $\Omega(\varepsilon^{-2})$ on each of $k/2$ copies. Total communication is $\Omega(k \cdot \varepsilon^{-2})$. Can show a reduction: $|x|_0 > 1/(2\varepsilon^2) + 1/\varepsilon$ if $\sum_{i=1}^{k} Z_i > \beta k + (\beta k)^{1/2}$, and $|x|_0 < 1/(2\varepsilon^2) - 1/\varepsilon$ if $\sum_{i=1}^{k} Z_i < \beta k - (\beta k)^{1/2}$.


Lower Bound for Euclidean Norm. Improve $\Omega(k + \varepsilon^{-2})$ bound to optimal $\Omega(k \cdot \varepsilon^{-2})$. Base problem: Gap-Orthogonality (GAP-ORT$(X, Y)$). Consider the uniform distribution on $(X, Y)$. We observe an information lower bound for GAP-ORT: Sherstov's lower bound for GAP-ORT holds for the uniform distribution on $(X, Y)$. [BBCR] + [Sherstov] $\Rightarrow$ for any protocol $\Pi$ and $t > 0$, $I(X, Y; \Pi) = \Omega(1/(\varepsilon^2 \log t))$ or $\Pi$ uses $t$ communication.

Information Implications. By chain rule, $I(X, Y; \Pi) = \sum_{i=1}^{1/\varepsilon^2} I(X_i, Y_i; \Pi \mid X_{<i}, Y_{<i}) = \Omega(\varepsilon^{-2})$. For most $i$, $I(X_i, Y_i; \Pi \mid X_{<i}, Y_{<i}) = \Omega(1)$. Maximum Likelihood Principle: non-trivial advantage in guessing $(X_i, Y_i)$.

2-BIT k-Party DISJ. [Figure: sites $P_1, P_2, P_3, \ldots, P_k$ hold sets $T_1, T_2, T_3, \ldots, T_k \subseteq [k^2]$.] Choose a random $j \in [k^2]$; one of four cases holds: $j$ doesn't occur in any $T_i$; $j$ occurs only in $T_1, \ldots, T_{k/2}$; $j$ occurs only in $T_{k/2+1}, \ldots, T_k$; $j$ occurs in $T_1, \ldots, T_k$. All $j' \neq j$ occur in at most one set $T_i$ (assume $k \geq 4$). We show $\Omega(k)$ information cost. We compose GAP-ORT with a variant of k-Party DISJ.

Rough Composition Idea. [Figure: one GAP-ORT instance composed with $1/\varepsilon^2$ 2-BIT k-party DISJ instances.] Bits $X_i$ and $Y_i$ in GAP-ORT determine the output of the $i$-th 2-BIT k-party DISJ instance. An algorithm for approximating Euclidean norm solves GAP-ORT, therefore solves most 2-BIT k-party DISJ instances. Show $\Omega(k/\varepsilon^2)$ overall information is revealed: information adds (if we condition on enough helper variables), and $P_i$ participates in all instances.


Algorithm for p-norm. We get $k^{p-1}\,\mathrm{poly}(1/\varepsilon)$, improving $k^{2p+1} n^{1-2/p}\,\mathrm{poly}(1/\varepsilon)$ for general $p$ and $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ for $p = 2$. Our protocol is the first 1-way protocol, that is, all communication is from sites to coordinator. Focus on Euclidean norm ($p = 2$) in talk. Non-negative vectors. Just determine if Euclidean norm exceeds a threshold $\theta$.

The Most Naïve Thing to Do. $x_i$ is Site $i$'s current vector, and $x = \sum_{i=1}^{k} x_i$. Suppose Site $i$ sees an update $x_i \leftarrow x_i + e_j$. Send $j$ to the Coordinator with a certain probability that only depends on $k$ and $\theta$?

Sample and Send. [Figure: sites $P_1, P_2, P_3, \ldots, P_k$ and coordinator $C$; each site's vector is a block of ones in otherwise zero coordinates, with a brace labeled $k$; the two inputs shown are labeled $|x|^2 = k^2$ and $|x|^2 = 2k^2$.] Send each update with probability at least $1/k$: communication $= O(k)$, so okay. Now suppose $x$ has $k^4$ coordinates that are 1, and may have a unique coordinate which is $k^2$, occurring $k$ times on each site. Send each update with probability $1/k^2$: this will find the large coordinate, but communication is $\Omega(k^2)$.

What Is Happening? Sampling with probability $\approx 1/k^2$ is good to get a few samples from the heavy item. But all the light coordinates are in the way, making the communication $\Omega(k^2)$. Suppose we put a barrier of $k$, that is, sample with probability $\approx 1/k^2$ but only send an item if it has occurred at least $k$ times on a site. Now communication is $O(1)$ and we have found the heavy coordinate. But light coordinates also contribute to the overall $|x|^2$ value.
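The effect of the barrier on this hard input can be checked by counting which updates are even eligible to be sampled. The sketch below builds the input described on the slide ($k^4$ light coordinates, plus one heavy coordinate occurring $k$ times on each site) and counts eligible updates with and without a barrier; expected communication is that count times the sampling probability $1/k^2$. The function names and the round-robin placement of light coordinates are assumptions for illustration.

```python
from collections import Counter


def hard_instance(k):
    """Return a list of (site, coordinate) updates for the hard input."""
    updates = []
    # k^4 light coordinates, each incremented once; spread round-robin (assumption).
    for j in range(k ** 4):
        updates.append((j % k, j))
    heavy = k ** 4   # index of the single heavy coordinate
    for site in range(k):
        updates.extend([(site, heavy)] * k)   # k occurrences on every site
    return updates


def eligible_updates(updates, barrier=1):
    """Count updates whose per-site count has reached the barrier."""
    counts = Counter()
    eligible = 0
    for site, j in updates:
        counts[(site, j)] += 1
        if counts[(site, j)] >= barrier:
            eligible += 1
    return eligible


k = 4
updates = hard_instance(k)
print(eligible_updates(updates) / k ** 2)             # expected messages, no barrier
print(eligible_updates(updates, barrier=k) / k ** 2)  # expected messages, barrier k
```

Without the barrier the light coordinates dominate the expected message count; with the barrier only the heavy coordinate can ever be sent, which is the slide's point, at the price of ignoring the light coordinates' contribution.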

Algorithm for Euclidean Norm. Sample at different scales with different barriers. Use a public coin to create $O(\log n)$ groups $T_1, \ldots, T_{\log n}$ of the $n$ input coordinates; $T_z$ contains $n/2^z$ random coordinates. Suppose Site $i$ sees the update $x_i \leftarrow x_i + e_j$. For each $T_z$ containing $j$: if $(x_i)_j > (\theta/2^z)^{1/2}/k$, then with probability $(2^z/\theta)^{1/2} \cdot \mathrm{poly}(\varepsilon^{-1} \log n)$, send $(j, z)$ to the coordinator. Expected communication is $\tilde{O}(k)$. If a group of coordinates contributes to $|x|^2$, there is a $z$ for which a few coordinates in the group are sampled multiple times.
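The site-side rule can be sketched as below. This is an illustration under several assumptions, not the authors' implementation: the public coin is modeled as a shared seed, each $T_z$ includes every coordinate independently with probability $2^{-z}$ (so it has about $n/2^z$ coordinates rather than exactly that many), and the unspecified $\mathrm{poly}(\varepsilon^{-1} \log n)$ factor is a hypothetical `boost` parameter. The coordinator's decision rule is not described on the slide and is omitted.

```python
import math
import random


def make_groups(n, seed):
    """Public-coin groups T_1..T_L, L = ceil(log2 n); T_z keeps each coordinate w.p. 2^-z."""
    rng = random.Random(seed)   # shared seed stands in for the public coin
    levels = max(1, math.ceil(math.log2(n)))
    return [{j for j in range(n) if rng.random() < 2.0 ** -z}
            for z in range(1, levels + 1)]


class Site:
    def __init__(self, n, k, theta, groups, boost=1.0, seed=None):
        self.x = [0] * n          # this site's current vector x_i
        self.k = k
        self.theta = theta
        self.groups = groups
        self.boost = boost        # hypothetical stand-in for poly(1/eps * log n)
        self.rng = random.Random(seed)   # private randomness of the site

    def update(self, j):
        """Apply x_i <- x_i + e_j and return the (j, z) messages to send."""
        self.x[j] += 1
        messages = []
        for z, group in enumerate(self.groups, start=1):
            if j not in group:
                continue
            barrier = math.sqrt(self.theta / 2 ** z) / self.k
            if self.x[j] > barrier:
                prob = min(1.0, self.boost * math.sqrt(2 ** z / self.theta))
                if self.rng.random() < prob:
                    messages.append((j, z))
        return messages


groups = make_groups(16, seed=1)
site = Site(n=16, k=2, theta=4.0, groups=groups, boost=1.0, seed=0)
print(site.update(3))
```

All communication in this sketch flows from the site to the coordinator, matching the 1-way property claimed for the protocol.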

Conclusions. Improved communication lower and upper bounds for estimating $|x|_p$. Implies tight lower bounds for estimating entropy, heavy hitters, quantiles. Implications for the data stream model: first lower bound for $|x|_0$ without Gap-Hamming; useful information cost lower bound for Gap-Hamming, or protocol has very large communication; improve $\Omega(n^{1-2/p}/\varepsilon^{2/p})$ bound for estimating $|x|_p$ in a stream to $\Omega(n^{1-2/p}/\varepsilon^{4/p})$.
