The Message Passing Communication Model David Woodruff IBM Almaden.

Slides:



Advertisements
Similar presentations
Estimating Distinct Elements, Optimally
Advertisements

Rectangle-Efficient Aggregation in Spatial Data Streams Srikanta Tirthapura David Woodruff Iowa State IBM Almaden.
Optimal Approximations of the Frequency Moments of Data Streams Piotr Indyk David Woodruff.
1+eps-Approximate Sparse Recovery Eric Price MIT David Woodruff IBM Almaden.
The Data Stream Space Complexity of Cascaded Norms T.S. Jayram David Woodruff IBM Almaden.
Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO Based on a paper in STOC, 2012.
The Complexity of Linear Dependence Problems in Vector Spaces David Woodruff IBM Almaden Joint work with Arnab Bhattacharyya, Piotr Indyk, and Ning Xie.
Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO.
Optimal Space Lower Bounds for All Frequency Moments David Woodruff MIT
Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM.
Optimal Space Lower Bounds for all Frequency Moments David Woodruff Based on SODA 04 paper.
The Average Case Complexity of Counting Distinct Elements David Woodruff IBM Almaden.
Optimal Bounds for Johnson- Lindenstrauss Transforms and Streaming Problems with Sub- Constant Error T.S. Jayram David Woodruff IBM Almaden.
Numerical Linear Algebra in the Streaming Model
Sublinear-time Algorithms for Machine Learning Ken Clarkson Elad Hazan David Woodruff IBM Almaden Technion IBM Almaden.
Xiaoming Sun Tsinghua University David Woodruff MIT
Tight Lower Bounds for the Distinct Elements Problem David Woodruff MIT Joint work with Piotr Indyk.
Subspace Embeddings for the L1 norm with Applications Christian Sohler David Woodruff TU Dortmund IBM Almaden.
Truthful Mechanisms for Combinatorial Auctions with Subadditive Bidders Speaker: Shahar Dobzinski Based on joint works with Noam Nisan & Michael Schapira.
Circuit and Communication Complexity. Karchmer – Wigderson Games Given The communication game G f : Alice getss.t. f(x)=1 Bob getss.t. f(y)=0 Goal: Find.
The Communication Complexity of Approximate Set Packing and Covering
An Ω(n 1/3 ) Lower Bound for Bilinear Group Based Private Information Retrieval Alexander Razborov Sergey Yekhanin.
Approximation Algorithms Chapter 14: Rounding Applied to Set Cover.
Multicut Lower Bounds via Network Coding Anna Blasiak Cornell University.
Sketching for M-Estimators: A Unified Approach to Robust Regression
Turnstile Streaming Algorithms Might as Well Be Linear Sketches Yi Li Huy L. Nguyen David Woodruff.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 8 May 4, 2005
Avraham Ben-Aroya (Tel Aviv University) Oded Regev (Tel Aviv University) Ronald de Wolf (CWI, Amsterdam) A Hypercontractive Inequality for Matrix-Valued.
1 Sampling Lower Bounds via Information Theory Ziv Bar-Yossef IBM Almaden.
Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining Petros Drineas Rensselaer Polytechnic Institute (joint.
Preference Analysis Joachim Giesen and Eva Schuberth May 24, 2006.
Sketching for M-Estimators: A Unified Approach to Robust Regression Kenneth Clarkson David Woodruff IBM Almaden.
Collecting Correlated Information from a Sensor Network Micah Adler University of Massachusetts, Amherst.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 13 June 22, 2005
How Robust are Linear Sketches to Adaptive Inputs? Moritz Hardt, David P. Woodruff IBM Research Almaden.
Dana Moshkovitz, MIT Joint work with Subhash Khot, NYU.
Ch. 8 & 9 – Linear Sorting and Order Statistics What do you trade for speed?
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
Ragesh Jaiswal Indian Institute of Technology Delhi Threshold Direct Product Theorems: a survey.
Tight Bounds for Graph Problems in Insertion Streams Xiaoming Sun and David P. Woodruff Chinese Academy of Sciences and IBM Research-Almaden.
Information Complexity Lower Bounds for Data Streams David Woodruff IBM Almaden.
Streaming Algorithms Piotr Indyk MIT. Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Network.
Yaomin Jin Design of Experiments Morris Method.
Lower Bounds for Read/Write Streams Paul Beame Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc) University of Washington.
Information Theory for Data Streams David P. Woodruff IBM Almaden.
PODC Distributed Computation of the Mode Fabian Kuhn Thomas Locher ETH Zurich, Switzerland Stefan Schmid TU Munich, Germany TexPoint fonts used in.
One-way multi-party communication lower bound for pointer jumping with applications Emanuele Viola & Avi Wigderson Columbia University IAS work done while.
Communication vs. Computation S Venkatesh Univ. Victoria Presentation by Piotr Indyk (MIT) Kobbi Nissim Microsoft SVC Prahladh Harsha MIT Joe Kilian NEC.
The Cost of Fault Tolerance in Multi-Party Communication Complexity Binbin Chen Advanced Digital Sciences Center Haifeng Yu National University of Singapore.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
Data Stream Algorithms Lower Bounds Graham Cormode
Lower bounds on data stream computations Seminar in Communication Complexity By Michael Umansky Instructor: Ronitt Rubinfeld.
Tight Bound for the Gap Hamming Distance Problem Oded Regev Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Approximation Algorithms based on linear programming.
Sketching complexity of graph cuts Alexandr Andoni joint work with: Robi Krauthgamer, David Woodruff.
Theory of Computational Complexity Probability and Computing Ryosuke Sasanuma Iwama and Ito lab M1.
A Story of Principal Component Analysis in the Distributed Model David Woodruff IBM Almaden Based on works with Christos Boutsidis, Ken Clarkson, Ravi.
PROBABILITY AND COMPUTING RANDOMIZED ALGORITHMS AND PROBABILISTIC ANALYSIS CHAPTER 1 IWAMA and ITO Lab. M1 Sakaidani Hikaru 1.
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
Probabilistic Algorithms
Information Complexity Lower Bounds
Stochastic Streams: Sample Complexity vs. Space Complexity
New Characterizations in Turnstile Streams with Applications
Streaming & sampling.
Lecture 18: Uniformity Testing Monotonicity Testing
Sublinear Algorithmic Tools 2
Turnstile Streaming Algorithms Might as Well Be Linear Sketches
The Communication Complexity of Distributed Set-Joins
Linear sketching with parities
Presentation transcript:

The Message Passing Communication Model David Woodruff IBM Almaden

k-party Number-In-Hand Model P1P1 P2P2 P3P3 PkPk P4P4 … x1x1 x2x2 x3x3 x4x4 Goals: - compute a function f(x 1, …, x k ) - minimize communication complexity - Point-to-point communication - Protocol transcript determines who speaks next - Point-to-point communication - Protocol transcript determines who speaks next xkxk

k-party Number-In-Hand Model C P1P1 P2P2 P3P3 PkPk … Convenient to introduce a “coordinator” C All communication goes through the coordinator Communication only affected by a factor of 2 (plus one word per message) x1x1 x2x2 x3x3 xkxk

Model Motivation Data distributed and stored in the cloud –For speed –Just doesn’t fit on one device Sensor networks / Network routers –Communication very power-intensive –Bandwidth limitations Distributed functional monitoring –Continuously monitor a statistic of distributed data –Don’t want to keep sending all data to one place

Randomized Communication Complexity Randomized communication complexity R(f) of a function f: The communication cost of a protocol is the sum of all individual message lengths, maximized over all inputs and random coins R(f) is the minimal cost of a protocol, which for every set of inputs, fails in computing f with probability < 1/3

Talk Outline Database Problems Graph Problems Linear-Algebra Problems Recent Work / Conclusions

Database Problems C P1P1 P2P2 P3P3 PkPk … x1x1 x2x2 x3x3 Some well-studied problems - Server i has x i - x = x 1 + x 2 + … + x k - f(x) = |x| p = (Σ i x i p ) 1/p - for binary vectors x i, |x| 0 is the number of distinct values (focus of this talk) xixi

Exact Number of Distinct Elements  (n) randomized complexity for exact computation of |x| 0 Lower bound holds already for 2 players Reduction from 2-Player Set-Disjointness (DISJ) Either |S Å T| = 0 or |S Å T| = 1 |S Å T| = 1 ! DISJ(S,T) = 1, |S Å T| = 0 ! DISJ(S,T) = 0 [KS, R]  (n) communication |x| 0 = |S| + |T| - |S Å T| S µ [n] T µ [n]

Approximate Answers Output an estimate f(x) with f(x) 2 (1 ± ε) |x| 0 What is the randomized communication cost as a function of k, ε, and n? Note that understanding the dependence on ε is critical, e.g., ε <.01

An Upper Bound Player i interprets its input as the i-th set in a data stream Players run a data stream algorithm, and pass the state of the algorithm to each other There is a data stream algorithm for estimating # of distinct elements using O(1/ ε 2 + log n) bits of space Gives a protocol with O(k/ ε 2 + k log n) communication …

Lower Bound This approach is optimal! We show an  (k/ ε 2 + k log n) communication lower bound First show an  (k/ ε 2 ) bound [W, Zhang 12] –Start with a simpler problem GAP- THRESHOLD

Lower Bound for Approximate |x| 0 GAP-THRESHOLD problem: –Player P i holds a bit Z i –Z i are i.i.d. Bernoulli(1/2) –Decide if  i=1 k Z i > k/2 + k 1/2 or   i=1 k Z i < k/2 - k 1/2 Otherwise don’t care (promise problem) Intuitively  (k) bits of communication is required Sampling doesn’t work… How to prove such a statement??

Rectangle Property Claim: for any protocol transcript ¿, it holds that Z 1, Z 2, …, Z k are independent conditioned on ¿ Can assume players are deterministic by Yao’s minimax principle The input vector Z in {0,1} k giving rise to a transcript ¿ is a combinatorial rectangle: S = S 1 x S 2 x … x S k where S i in {0,1} Since the Z i are i.i.d. Bernoulli(1/2), conditioned on being in S, they are still independent!

GAP-THRESHOLD C P1P1 P2P2 P3P3 PkPk … Z1Z1 Z2Z2 Z3Z3 ZkZk The Z i are i.i.d. Bernoulli(1/2) Coordinator wants to decide if:  i=1 k Z i > k/2 + k 1/2 or   i=1 k Z i < k/2 - k 1/2 By independence of the Z i | ¿, it is equivalent to fixing some Z i to be 0 or 1, and the remaining Z i to be Bernoulli(1/2)

The Proof Lemma [Unbiased Conditional Expectation]: W.pr. 2/3, over the transcript ¿,  i=1 k Z i | ¿ ] – k/2 | < 100 k 1/2 Otherwise, since Var[  i=1 k Z i | ¿ ] 1/2,  i=1 k Z i – k/2| > 50k 1/2 contradicting concentration Lemma [Lots of Randomness After Conditioning]: If the communication is o(k), then w.pr. 1-o(1), over the transcript ¿, for a 1-o(1) fraction of the indices i, Z i | ¿ is Bernoulli(1/2)

The Proof Continued Let’s condition on a ¿ satisfying the previous two lemmas Lemma [Anti-Concentration]: With probability.001, over the Z i | ¿  i=1 k Z i | ¿ ] -  i=1 k Z i | ¿ > 100 k 1/2 With probability.001, over the Z i | ¿  i=1 k Z i | ¿ -  i=1 k Z i | ¿ ] > 100 k 1/2 These follow by anti-concentration So the protocol fails with this probability

Generalizations Generalizes to: Z i are i.i.d. Bernoulli(β), β > 1/k Coordinator wants to decide if:  i=1 k Z i > βk + (β k) 1/2 or   i=1 k Z i < βk – (βk) 1/2 When the players have internal randomness, the proof generalizes: any successful protocol must satisfy: Pr ¿ [for 1-o(1) fraction of indices i, H(Z i | ¿ ) = o(1)] > 2/3 How to get a lower bound for approximating |x| 0 ?

Composition Idea C P1P1 P2P2 P3P3 PkPk … T3T3 T2T2 T1T1 TkTk - Let S be a random set from {1, 2, …, m} -If Z i = 1, give P i a random set T i so that DISJ(S,T i ) = 1, else give P i a random set T i so that DISJ(S,T i ) = 0 -Is  i=1 k DISJ(S,T i ) > k/2 + k 1/2 or   i=1 k DISJ(S, T i )< k/2 - k 1/2 ? -Equivalently, is  i=1 k Z i > k/2 + k 1/2 or   i=1 k Z i < k/2 - k 1/2 -Our Result: total communication is Ω(mk) DISJ

Composition Idea Continued For this composed problem, a correct protocol satisfies: Pr ¿ [for 1-o(1) fraction of indices i, H(Z i | ¿ ) = o(1)] > 2/3 Most DISJ instances are “solved” by the protocol How to formalize? Suppose the communication were o(km) By averaging, there is a player P i so that The communication between C and P i is o(m) H(Z i | ¿ ) = o(1) with large probability

The Punch Line Reduce to a 2-player problem! Let the two players in the 2-player DISJ problem be the coordinator C and P i C can sample the inputs of all players P j for j != i Run the multi-player protocol. Messages between C and P j is sent, for j != i, can be simulated locally! So total communication is o(m) to solve DISJ with large probability, a contradiction! C PiPi PkPk … T3T3 T2T2 T1T1

Reduction to |x| 0 m = 1/ε 2. Coordinator wants to decide if:  i=1 k Z i > βk + (β k) 1/2 or   i=1 k Z i < βk – (βk) 1/2 Set probability β of intersection to be 1/(kε 2 ) Approximating |x| 0 up to 1+ε solves this problem C P1P1 P2P2 P3P3 PkPk … T3T3 T2T2 T1T1 TkTk DISJ

Other Lower Bound for |x| 0 Overall lower bound is  (k/ ε 2 + k log n) The k log n lower bound also a reduction to a 2-player problem! [W, Zhang 14] –This time to a 2-player Equality problem (details omitted)

Talk Outline Database Problems Graph Problems Linear-Algebra Problems Recent Work / Conclusions

Graph Problems [W,Zhang13] Canonical hard-multiplayer problem for graph problems: k x n binary matrix A –Each player has a row of A –Is the number of columns with at least one 1 larger than n/2? Requires  (kn) bits of communication to solve with probability at least 2/3  (kn) lower bound for connectivity and bipartiteness without edge duplications

Talk Outline Database Problems Graph Problems Linear-Algebra Problems Recent Work / Conclusions

Linear Algebra [Li,Sun,Wang,W] k players each have an n x n matrix in a finite field of p elements Players want to know if the sum of their matrices is invertible Randomized  (kn 2 log p) communication lower bound Same lower bound for rank, solving linear equations

Talk Outline Database Problems Graph Problems Linear-Algebra Problems Recent Work / Conclusions

Recent Work Braverman et al. obtain  (kn) lower bound for k-player disjointness –Strengthens canonical hard problem for graphs (additional applications like diameter) Chattopadhyay, Radhakrishnan, Rudra study multiplayer communication in topologies other than star topology –Obtain bounds that depend on 1-median of the network

Conclusion Illustrated techniques for lower bounds for multiplayer communication via the distinct elements problem Many tight lower bounds known –Statistical problems (lp norms) –Graph problems –Linear algebra problems Future directions –Rounds vs. communication –Topology-sensitive problems