Tight Bounds for Distributed Functional Monitoring. David Woodruff (IBM Almaden), Qin Zhang (Aarhus University, MADALGO).

Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO Based on a paper in STOC, 2012.

Presentation transcript:

Tight Bounds for Distributed Functional Monitoring
David Woodruff (IBM Almaden), Qin Zhang (Aarhus University, MADALGO)

Distributed Functional Monitoring
[Figure: a coordinator C communicates with k sites $P_1, P_2, P_3, \dots, P_k$. Site $P_i$ holds input $x_i$, and updates arrive at the sites over time.]
Updates: $x_i \leftarrow x_i + e_j$
Static case vs. dynamic case
Problems on $x_1 + x_2 + \dots + x_k$: sampling, p-norms, heavy hitters, compressed sensing, quantiles, entropy
Authors: Can, Cormode, Huang, Muthukrishnan, Patt-Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, many others

Motivation
- Data distributed and stored in the cloud
  - Impractical to put data on a single device
- Sensor networks
  - Communication very power-intensive
- Network routers
  - Bandwidth limitations

Problems
Which functions $f(x_1, \dots, x_k)$ do we care about?
- $x_1, \dots, x_k$ are non-negative length-n vectors
- $x = \sum_{i=1}^{k} x_i$
- $f(x_1, \dots, x_k) = |x|_p = \left( \sum_{j=1}^{n} x_j^p \right)^{1/p}$, where $x_j$ is the j-th coordinate of the sum $x$
- $|x|_0$ is the number of non-zero coordinates
What is the randomized communication cost of these problems? I.e., the minimal cost of a protocol which, for every input, fails with probability < 1/3
Static case, dynamic case
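To make the definitions concrete, here is a minimal Python sketch that evaluates $|x|_p$ on the sum of the site vectors. The function name `p_norm_of_sum` and the toy inputs are illustrative assumptions, not from the talk. It computes the function centrally and says nothing about communication cost.

```python
from typing import Sequence


def p_norm_of_sum(site_vectors: Sequence[Sequence[float]], p: float) -> float:
    """Return |x|_p for x = x_1 + ... + x_k; p = 0 counts non-zero coordinates."""
    n = len(site_vectors[0])
    # Coordinate-wise sum over the k sites.
    x = [sum(v[j] for v in site_vectors) for j in range(n)]
    if p == 0:
        return float(sum(1 for c in x if c != 0))
    return sum(c ** p for c in x) ** (1.0 / p)


# Three sites, n = 4; the sum is x = [4, 0, 3, 0].
sites = [[1, 0, 2, 0], [0, 0, 1, 0], [3, 0, 0, 0]]
print(p_norm_of_sum(sites, 0))  # number of non-zero coordinates of x
print(p_norm_of_sum(sites, 2))  # Euclidean norm of x
```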

Exact Answers
An $\Omega(n)$ communication bound for computing $|x|_p$, $p \neq 1$
Reduction from 2-Player Set-Disjointness (DISJ)
- Alice has a set $S \subseteq [n]$ of size n/4
- Bob has a set $T \subseteq [n]$ of size n/4 with either $|S \cap T| = 0$ or $|S \cap T| = 1$
- Is $S \cap T = \emptyset$?
- $|X \cap Y| = 1 \rightarrow \mathrm{DISJ}(X, Y) = 1$, $|X \cap Y| = 0 \rightarrow \mathrm{DISJ}(X, Y) = 0$
- [KS, R] $\Omega(n)$ communication
Prohibitive for applications
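One standard way to see why an exact answer solves DISJ (an illustration, not necessarily the reduction used in the talk): if Alice and Bob hold the indicator vectors of S and T, then $\sum_j x_j^p = |S| + |T| + (2^p - 2)\,|S \cap T|$, which reveals $|S \cap T|$ whenever $p \neq 1$. The sketch below checks this; both function names are hypothetical.

```python
def pth_moment_of_indicator_sum(S: set, T: set, n: int, p: float) -> float:
    """Return sum_j x_j^p for x = 1_S + 1_T (for p = 0, the number of non-zeros)."""
    total = 0.0
    for j in range(n):
        c = (j in S) + (j in T)  # coordinate value is 0, 1, or 2
        if c:
            total += c ** p
    return total


def intersection_size_from_moment(moment: float, size_s: int, size_t: int, p: float) -> int:
    """Invert moment = |S| + |T| + (2^p - 2) |S ∩ T|; requires p != 1."""
    return round((moment - size_s - size_t) / (2 ** p - 2))


n = 16
S = {0, 1, 2, 3}
T_hit = {3, 4, 5, 6}    # intersects S in exactly one element
T_miss = {4, 5, 6, 7}   # disjoint from S
for p in (0, 2, 3):
    for T in (T_hit, T_miss):
        m = pth_moment_of_indicator_sum(S, T, n, p)
        print(p, intersection_size_from_moment(m, len(S), len(T), p))
```

For p = 1 the factor $2^p - 2$ vanishes, which matches the exclusion of p = 1 in the bound.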

Approximate Answers
$f(x_1, \dots, x_k) = (1 \pm \varepsilon)\,|x|_p$
What is the randomized communication cost as a function of k, ε, and n?
Ignore log(nk/ε) factors

Previous Results
Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)
- $|x|_0$: $\Omega(k + \varepsilon^{-2})$ and $O(k \cdot \varepsilon^{-2})$
- $|x|_p$: $\Omega(k + \varepsilon^{-2})$
- $|x|_2$: $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$
- $|x|_p$, p > 2: $O(k^{2p+1} n^{1-2/p} \cdot \mathrm{poly}(1/\varepsilon))$

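Our Results
Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)
- $|x|_0$: previously $\Omega(k + \varepsilon^{-2})$ and $O(k \cdot \varepsilon^{-2})$; new lower bound $\Omega(k \cdot \varepsilon^{-2})$
- $|x|_p$: previously $\Omega(k + \varepsilon^{-2})$; new lower bound $\Omega(k^{p-1} \cdot \varepsilon^{-2})$. Talk will focus on p = 2
- $|x|_2$: previously $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$; new upper bound $O(k \cdot \mathrm{poly}(1/\varepsilon))$
- $|x|_p$, p > 2: previously $O(k^{2p+1} n^{1-2/p} \cdot \mathrm{poly}(1/\varepsilon))$; new upper bound $O(k^{p-1} \cdot \mathrm{poly}(1/\varepsilon))$
First lower bounds to depend on the product of k and $\varepsilon^{-2}$
Upper bound doesn't depend polynomially on n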

Talk Outline
- Lower Bounds
  - Non-zero elements
  - Euclidean norm
- Upper Bounds
  - p-norm

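Previous Lower Bounds
Lower bounds for any p-norm, $p \neq 1$
- [CMY] $\Omega(k)$
- [ABC] $\Omega(\varepsilon^{-2})$
Reduction from Gap-Orthogonality (GAP-ORT)
- Alice, Bob have $u, v \in \{0,1\}^{\varepsilon^{-2}}$, respectively
- Promise on the gap $|\Delta(u, v) - 1/(2\varepsilon^2)|$, measured against $2/\varepsilon$
- [CR, S] $\Omega(\varepsilon^{-2})$ communication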


Lower Bound for Distinct Elements
Improve bound to optimal $\Omega(k \cdot \varepsilon^{-2})$
Simpler problem: k-GAP-THRESH
- Each site $P_i$ holds a bit $Z_i$
- The $Z_i$ are i.i.d. Bernoulli($\beta$)
- Decide if $\sum_{i=1}^{k} Z_i > \beta k + (\beta k)^{1/2}$ or $\sum_{i=1}^{k} Z_i < \beta k - (\beta k)^{1/2}$. Otherwise don't care
Rectangle property: for any correct protocol transcript $\tau$, $Z_1, Z_2, \dots, Z_k$ are independent conditioned on $\tau$
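The promise problem can be written as a small reference function. This is a centralized sketch for checking answers, not a protocol; the name `k_gap_thresh` and the choice to label the two sides 1 and 0 are assumptions made here for illustration.

```python
import math
from typing import Optional, Sequence


def k_gap_thresh(bits: Sequence[int], beta: float) -> Optional[int]:
    """Return 1 if the sum is above beta*k + sqrt(beta*k), 0 if it is below
    beta*k - sqrt(beta*k), and None in the don't-care region."""
    k = len(bits)
    mean = beta * k
    gap = math.sqrt(beta * k)
    total = sum(bits)
    if total > mean + gap:
        return 1
    if total < mean - gap:
        return 0
    return None  # promise not met: any answer is acceptable


# k = 100 and beta = 0.25 give thresholds 25 + 5 and 25 - 5.
print(k_gap_thresh([1] * 31 + [0] * 69, 0.25))
print(k_gap_thresh([1] * 19 + [0] * 81, 0.25))
print(k_gap_thresh([1] * 25 + [0] * 75, 0.25))
```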

A Key Lemma
Lemma: For any protocol $\Pi$ which succeeds w.pr. > .9999, the transcript $\tau$ is such that w.pr. > 1/2, for at least k/2 different i, $H(Z_i \mid \tau) < H(.01\beta)$
Proof: Suppose $\tau$ does not satisfy this
- With large probability, $\beta k - O((\beta k)^{1/2}) < \mathbb{E}\left[\sum_{i=1}^{k} Z_i \mid \tau\right] < \beta k + O((\beta k)^{1/2})$
- Since the $Z_i$ are independent given $\tau$, $\sum_{i=1}^{k} Z_i \mid \tau$ is a sum of independent Bernoullis
- Since most $H(Z_i \mid \tau)$ are large, by anti-concentration, both events occur with constant probability: $\sum_{i=1}^{k} Z_i \mid \tau > \beta k + (\beta k)^{1/2}$ and $\sum_{i=1}^{k} Z_i \mid \tau < \beta k - (\beta k)^{1/2}$
So $\Pi$ can't succeed with large probability
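The anti-concentration step can be checked empirically in the simplest case, where the bits are i.i.d. Bernoulli($\beta$) with no conditioning. This is only a special case of the independent-but-not-identical sum in the proof. The function name `tail_frequencies` and the parameter values are assumptions chosen for illustration.

```python
import math
import random
from typing import Tuple


def tail_frequencies(k: int, beta: float, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate how often a sum of k i.i.d. Bernoulli(beta) bits lands above
    beta*k + sqrt(beta*k) and below beta*k - sqrt(beta*k)."""
    rng = random.Random(seed)
    hi = beta * k + math.sqrt(beta * k)
    lo = beta * k - math.sqrt(beta * k)
    above = below = 0
    for _ in range(trials):
        s = sum(rng.random() < beta for _ in range(k))
        above += s > hi
        below += s < lo
    return above / trials, below / trials


above, below = tail_frequencies(k=200, beta=0.1, trials=4000)
print(f"above upper threshold: {above:.3f}, below lower threshold: {below:.3f}")
```

Both frequencies stay bounded away from zero, so a protocol that learns little about the individual bits cannot tell the two sides of the promise apart.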

Composition Idea
[Figure: coordinator C connected to sites $P_1, P_2, P_3, \dots, P_k$; the link between C and $P_i$ is labelled with the bit $Z_i$ and carries a DISJ instance.]
The input to $P_i$ in k-GAP-THRESH, denoted $Z_i$, is the output of a 2-party Disjointness (DISJ) instance between C and $S_i$
- Let X be a random set of size $1/(4\varepsilon^2)$ from $\{1, 2, \dots, 1/\varepsilon^2\}$
- For each i, if $Z_i = 1$, then choose $Y_i$ so that $\mathrm{DISJ}(X, Y_i) = 1$, else choose $Y_i$ so that $\mathrm{DISJ}(X, Y_i) = 0$
- Distributional complexity $\Omega(1/\varepsilon^2)$ [Razborov]
Can think of C as a player
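A sketch of how such a composed input could be sampled. The slide does not give the size of each $Y_i$; taking $|Y_i| = 1/(4\varepsilon^2)$ is an assumption borrowed from the two-player DISJ setup earlier in the talk. The names `disj`, `sample_composed_instance`, and `eps_inv_sq` (standing for $1/\varepsilon^2$) are hypothetical.

```python
import random
from typing import List, Set, Tuple


def disj(X: Set[int], Y: Set[int]) -> int:
    """Talk's convention: 1 if |X ∩ Y| = 1, 0 if |X ∩ Y| = 0."""
    size = len(X & Y)
    if size > 1:
        raise ValueError("promise violated: intersection larger than 1")
    return size


def sample_composed_instance(
    k: int, eps_inv_sq: int, beta: float, rng: random.Random
) -> Tuple[Set[int], List[Set[int]], List[int]]:
    """Sample X, the sets Y_1..Y_k, and the bits Z_1..Z_k with DISJ(X, Y_i) = Z_i.
    eps_inv_sq plays the role of 1/eps^2 and is assumed divisible by 4."""
    universe = list(range(1, eps_inv_sq + 1))
    m = eps_inv_sq // 4                      # |X| = 1/(4 eps^2)
    X = set(rng.sample(universe, m))
    outside = [e for e in universe if e not in X]
    Z, Ys = [], []
    for _ in range(k):
        z = 1 if rng.random() < beta else 0  # Z_i ~ Bernoulli(beta)
        if z == 1:
            shared = rng.choice(sorted(X))   # exactly one common element
            Y = {shared} | set(rng.sample(outside, m - 1))
        else:
            Y = set(rng.sample(outside, m))  # disjoint from X
        Z.append(z)
        Ys.append(Y)
    return X, Ys, Z


X, Ys, Z = sample_composed_instance(k=8, eps_inv_sq=16, beta=0.5, rng=random.Random(1))
print(Z, [disj(X, Y) for Y in Ys])
```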

Putting it All Together
Key Lemma $\rightarrow$ for most i, $H(Z_i \mid \tau) < H(.01\beta)$
Since $H(Z_i) = H(\beta)$ for all i, for most i protocol $\Pi$ solves $\mathrm{DISJ}(X, Y_i)$ with constant probability
Since the $Z_i \mid \tau$ are independent, solving DISJ requires communication $\Omega(\varepsilon^{-2})$ on each of k/2 copies
Total communication is $\Omega(k \cdot \varepsilon^{-2})$
Can show a reduction:
- $|x|_0 > 1/(2\varepsilon^2) + 1/\varepsilon$ if $\sum_{i=1}^{k} Z_i > \beta k + (\beta k)^{1/2}$
- $|x|_0 < 1/(2\varepsilon^2) - 1/\varepsilon$ if $\sum_{i=1}^{k} Z_i < \beta k - (\beta k)^{1/2}$


Lower Bound for Euclidean Norm
Improve $\Omega(k + \varepsilon^{-2})$ bound to optimal $\Omega(k \cdot \varepsilon^{-2})$
Base problem: Gap-Orthogonality (GAP-ORT(X, Y))
- Consider uniform distribution on (X, Y)
We observe an information lower bound for GAP-ORT
Sherstov's lower bound for GAP-ORT holds for uniform distribution on (X, Y)
[BBCR] + [Sherstov] $\rightarrow$ for any protocol $\Pi$ and t > 0, $I(X, Y; \Pi) = \Omega(1/(\varepsilon^2 \log t))$ or $\Pi$ uses t communication

Information Implications
By chain rule, $I(X, Y; \Pi) = \sum_{i=1}^{1/\varepsilon^2} I(X_i, Y_i; \Pi \mid X_{<i}, Y_{<i}) = \Omega(\varepsilon^{-2})$
For most i, $I(X_i, Y_i; \Pi \mid X_{<i}, Y_{<i}) = \Omega(1)$
Maximum Likelihood Principle: non-trivial advantage in guessing $(X_i, Y_i)$
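The step from the sum to individual terms is an averaging argument. A sketch under stated assumptions: write $m = 1/\varepsilon^2$, let $a_i$ denote the i-th term, and let $c > 0$ be the constant hidden in the $\Omega(\varepsilon^{-2})$ bound. These symbols are introduced here for illustration. Since $X_i$ and $Y_i$ are single bits, each term is at most 2, and a constant fraction of the terms must then be $\Omega(1)$:

```latex
\begin{align*}
  & a_i := I(X_i, Y_i; \Pi \mid X_{<i}, Y_{<i}), \qquad
    0 \le a_i \le H(X_i, Y_i) \le 2, \qquad
    \sum_{i=1}^{m} a_i \ge c\,m \\
  & G := \{\, i : a_i \ge c/2 \,\} \\
  & c\,m \;\le\; \sum_{i \in G} a_i + \sum_{i \notin G} a_i
        \;\le\; 2\,|G| + \tfrac{c}{2}\,(m - |G|)
        \;\le\; 2\,|G| + \tfrac{c}{2}\,m \\
  & \Longrightarrow \quad |G| \;\ge\; \tfrac{c}{4}\,m
\end{align*}
```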

2-BIT k-Party DISJ
[Figure: sites $P_1, P_2, P_3, \dots, P_k$ holding sets $T_1, T_2, T_3, \dots, T_k \subseteq [k^2]$.]
Choose a random $j \in [k^2]$; one of four cases holds:
- j doesn't occur in any $T_i$
- j occurs only in $T_1, \dots, T_{k/2}$
- j occurs only in $T_{k/2+1}, \dots, T_k$
- j occurs in $T_1, \dots, T_k$
All $j' \neq j$ occur in at most one set $T_i$ (assume $k \geq 4$)
We show $\Omega(k)$ information cost
We compose GAP-ORT with a variant of k-Party DISJ
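A sketch of how an input with this structure could be generated. The slide does not give the distribution over the four cases or over the other elements, so the choices below (the four cases equally likely, every other element placed on one uniformly random site or on none) are assumptions for illustration, as is the name `sample_two_bit_disj`. The two output bits record whether j appears in the first half and in the second half of the sets.

```python
import random
from typing import List, Set, Tuple


def sample_two_bit_disj(k: int, rng: random.Random) -> Tuple[List[Set[int]], int, Tuple[int, int]]:
    """Return (T_1..T_k, special element j, (bit for first half, bit for second half))."""
    j = rng.randrange(1, k * k + 1)        # special element from [k^2]
    first_half = rng.random() < 0.5        # assumption: the two bits are uniform
    second_half = rng.random() < 0.5
    sets: List[Set[int]] = [set() for _ in range(k)]
    half = k // 2
    for i in range(k):
        in_first = i < half
        if (in_first and first_half) or (not in_first and second_half):
            sets[i].add(j)
    for e in range(1, k * k + 1):
        if e == j:
            continue
        owner = rng.randrange(k + 1)       # value k means "on no site"
        if owner < k:
            sets[owner].add(e)             # every other element is on at most one site
    return sets, j, (int(first_half), int(second_half))


sets, j, bits = sample_two_bit_disj(k=4, rng=random.Random(0))
print(j, bits, [sorted(T) for T in sets])
```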

Rough Composition Idea
[Figure: one GAP-ORT instance sitting on top of $1/\varepsilon^2$ instances of 2-BIT k-party DISJ.]
Show $\Omega(k/\varepsilon^2)$ overall information is revealed
Bits $X_i$ and $Y_i$ in GAP-ORT determine output of i-th 2-BIT k-party DISJ instance
An algorithm for approximating Euclidean norm solves GAP-ORT, therefore solves most 2-BIT k-party DISJ instances
- Information adds (if we condition on enough helper variables)
- $P_i$ participates in all instances


Algorithm for p-norm
We get $k^{p-1}\,\mathrm{poly}(1/\varepsilon)$, improving $k^{2p+1} n^{1-2/p}\,\mathrm{poly}(1/\varepsilon)$ for general p and $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ for p = 2
Our protocol is the first 1-way protocol, that is, all communication is from sites to coordinator
Focus on Euclidean norm (p = 2) in talk
Non-negative vectors
Just determine if Euclidean norm exceeds a threshold θ

The Most Naïve Thing to Do
$x_i$ is Site i's current vector
$x = \sum_{i=1}^{k} x_i$
Suppose Site i sees an update $x_i \leftarrow x_i + e_j$
Send j to Coordinator with a certain probability that only depends on k and θ?

Sample and Send
[Figure: sites $P_1, P_2, \dots, P_k$ and coordinator C, shown with k-row 0/1 matrices in which row i has a block of k ones in the i-th block of coordinates; labels: |x|^2 = k^2 and |x|^2 = 2k.]
- Send each update with probability at least 1/k
- Communication = O(k), so okay
Suppose x has $k^4$ coordinates that are 1, and may have a unique coordinate which is $k^2$, occurring k times on each site
- Send update with probability $1/k^2$
- Will find the large coordinate
- But communication is $\Omega(k^2)$
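The arithmetic behind the second example can be spelled out: $k^4$ unit updates come from the light coordinates, and $k^2$ updates come from the heavy coordinate (k on each of the k sites). The helper name `expected_messages` and the value k = 10 are illustrative assumptions.

```python
from fractions import Fraction


def expected_messages(num_updates: int, send_prob: Fraction) -> Fraction:
    """Expected number of messages when each update is sent independently."""
    return num_updates * send_prob


k = 10
rate = Fraction(1, k ** 2)                 # sampling probability 1/k^2
light = expected_messages(k ** 4, rate)    # k^4 coordinates, each updated once
heavy = expected_messages(k ** 2, rate)    # one coordinate, k updates on each of k sites
print(f"light: {light}, heavy: {heavy}")
```

The heavy coordinate is sampled about once in expectation, while the light coordinates alone account for about $k^2$ messages.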

What Is Happening?
Sampling with probability $\approx 1/k^2$ is good to get a few samples from the heavy item
But all the light coordinates are in the way, making the communication $\Omega(k^2)$
Suppose we put a barrier of k, that is, sample with probability $\approx 1/k^2$ but only send an item if it has occurred at least k times on a site
Now communication is O(1) and we have found the heavy coordinate
But light coordinates also contribute to the overall $|x|_2$ value

Algorithm for Euclidean Norm
Sample at different scales with different barriers
Use public coin to create O(log n) groups $T_1, \dots, T_{\log n}$ of the n input coordinates
- $T_z$ contains $n/2^z$ random coordinates
Suppose Site i sees the update $x_i \leftarrow x_i + e_j$
- For each $T_z$ containing j: if $(x_i)_j > (\theta/2^z)^{1/2}/k$, then with probability $(2^z/\theta)^{1/2} \cdot \mathrm{poly}(\varepsilon^{-1} \log n)$, send (j, z) to the coordinator
Expected communication $\tilde{O}(k)$
If a group of coordinates contributes to $|x|_2$, there is a z for which a few coordinates in the group are sampled multiple times
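The site-side rule can be sketched as follows. Several details are assumptions made for illustration: a shared seed stands in for the public coin, the groups are drawn independently of one another, and the unspecified $\mathrm{poly}(\varepsilon^{-1} \log n)$ factor is replaced by a hypothetical `boost` parameter. The slide does not describe how the coordinator turns the received (j, z) pairs into a decision, so that part is omitted.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def make_groups(n: int, seed: int) -> List[Set[int]]:
    """Groups T_1..T_{floor(log2 n)}; T_z holds n // 2**z random coordinates.
    Every site calls this with the same seed (stand-in for the public coin)."""
    rng = random.Random(seed)
    levels = int(math.log2(n))
    return [set(rng.sample(range(n), n // 2 ** z)) for z in range(1, levels + 1)]


class Site:
    def __init__(self, k: int, theta: float, groups: List[Set[int]],
                 boost: float, rng: random.Random) -> None:
        self.k = k
        self.theta = theta
        self.groups = groups
        self.boost = boost          # hypothetical stand-in for poly(1/eps * log n)
        self.rng = rng
        self.counts: Dict[int, int] = defaultdict(int)  # this site's vector x_i

    def update(self, j: int) -> List[Tuple[int, int]]:
        """Apply x_i <- x_i + e_j and return the (j, z) messages to send."""
        self.counts[j] += 1
        messages = []
        for z, group in enumerate(self.groups, start=1):
            if j not in group:
                continue
            barrier = math.sqrt(self.theta / 2 ** z) / self.k
            if self.counts[j] > barrier:
                prob = min(1.0, math.sqrt(2 ** z / self.theta) * self.boost)
                if self.rng.random() < prob:
                    messages.append((j, z))
        return messages


groups = make_groups(n=16, seed=7)
site = Site(k=4, theta=64.0, groups=groups, boost=2.0, rng=random.Random(0))
sent = [msg for j in [3, 3, 3, 5, 3] for msg in site.update(j)]
print([len(g) for g in groups], sent)
```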

Conclusions
Improved communication lower and upper bounds for estimating $|x|_p$
Implies tight lower bounds for estimating entropy, heavy hitters, quantiles
Implications for data stream model
- First lower bound for $|x|_0$ without Gap-Hamming
- Useful information cost lower bound for Gap-Hamming, or protocol has very large communication
- Improve $\Omega(n^{1-2/p}/\varepsilon^{2/p})$ bound for estimating $|x|_p$ in a stream to $\Omega(n^{1-2/p}/\varepsilon^{4/p})$