1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 13 June 22, 2005

2 Data Stream Space Lower Bounds

3 Data Streams: Reminder
Data x ∈ A^n (n is very large); the program streams through the data, using limited memory, and outputs an approximation of f(x).
How do we prove memory lower bounds for data stream algorithms?

4 Communication Complexity [Yao 79]
Alice holds a and Bob holds b; they exchange messages m_1, m_2, m_3, m_4, …, and a referee announces the output.
π(a,b): the "transcript" of protocol π on input (a,b); cost(π) = Σ_i |m_i|.
CC(g) = min over protocols π that compute g of cost(π).

5 1-Way Communication Complexity
Alice sends a single message m_1, Bob replies with a single message m_2, and the referee announces the output.
π(a,b): the "transcript" on input (a,b); cost(π) = |m_1| + |m_2|.
CC^1(g) = min over 1-way protocols π that compute g of cost(π).

6 Data Stream Space Lower Bounds
f: A^n → B: a function.
DS(f): data stream space complexity of f = minimum amount of memory used by any data stream algorithm computing f.
g: A^(n/2) × A^(n/2) → B: g(x,y) = f(xy).
CC^1(g): 1-way communication complexity of g.
Proposition: DS(f) ≥ Ω(CC^1(g)).
Corollary: In order to obtain a lower bound on DS(f), it suffices to prove a lower bound on CC^1(g).

7 Reduction: 1-Way CC to DS
Proof of the reduction: let S = DS(f) and let M be an S-space data stream algorithm for f.
Lemma: There is a 1-way CC protocol π for g whose cost is S + log(|B|).
Proof: π works as follows (a small simulation sketch follows below):
- Alice runs M on x.
- Alice sends the state of M to Bob (S bits).
- Bob continues the execution of M on y.
- Bob sends the output of M (log(|B|) bits).
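A minimal Python sketch (mine, not from the lecture) of this simulation, using the hypothetical names StreamAlg, alice, and bob; the toy algorithm M is an exact distinct-elements counter, kept only to make the state-passing concrete.

import pickle

class StreamAlg:
    # Toy S-space data stream algorithm: exact distinct-element counting
    # over {1,...,m}; its whole memory is the boolean array below.
    def __init__(self, m):
        self.seen = [False] * (m + 1)
    def process(self, item):
        self.seen[item] = True
    def output(self):
        return sum(self.seen)

def alice(x, m):
    M = StreamAlg(m)
    for a in x:
        M.process(a)              # Alice streams her half x
    return pickle.dumps(M)        # one message: the state of M (~S bits)

def bob(state_msg, y):
    M = pickle.loads(state_msg)   # Bob restores M's memory...
    for b in y:
        M.process(b)              # ...continues the run on his half y...
    return M.output()             # ...and sends the output (log|B| bits)

# 1-way protocol for g(x,y) = f(xy); its cost is |state| + |output|.
x, y, m = [1, 2, 3], [3, 4], 5
print(bob(alice(x, m), y))        # prints 4 = DE(xy)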

8 Distinct Elements: Reminder
Input: a vector x ∈ {1,2,…,m}^n.
DE(x) = number of distinct elements of x.
- Example: if x = (1,2,3,1,2,3), then DE(x) = 3.
Space complexity of DE:
- Randomized approximate: O(log m) space.
- Deterministic approximate: Ω(m) space.
- Randomized exact: Ω(m) space.

9 The Equality Function
EQ: X × X → {0,1}; EQ(x,y) = 1 iff x = y.
Theorem: CC^1(EQ) ≥ Ω(log |X|).
Proof:
- π: any 1-way protocol for EQ.
- π_A(x): Alice's message on input x.
- π_B(m,y): Bob's message when receiving message m from Alice and holding input y.
- π(x,y) = (π_A(x), π_B(π_A(x), y)): the transcript on (x,y).

10 Equality Lower Bound
Proof (cont.):
- Suppose |π_A(x)| < log|X| for all x ∈ X.
- Then the number of distinct messages of Alice is < 2^(log|X|) = |X|.
- By the pigeonhole principle, there exist two distinct inputs x, x' ∈ X s.t. π_A(x) = π_A(x').
- Therefore, π(x,x) = π(x',x).
- But EQ(x,x) ≠ EQ(x',x). Contradiction.

11 Combinatorial Designs
A family of subsets T_1,…,T_k of a universe U s.t.:
1. For each i, |T_i| = |U|/4.
2. For each i ≠ j, |T_i ∩ T_j| ≤ |U|/8.
Fact: There exist designs of size k = 2^Ω(|U|). (They can be obtained from binary error-correcting codes with constant rate and constant relative minimum distance.)

12 Reduction from EQ to DE
U = {1,2,…,m}; X = {T_1,…,T_k}: a design of size k = 2^Ω(m).
EQ: X × X → {0,1}.
Note (with the design parameters of the previous slide, |T_i| = m/4 and pairwise intersections ≤ m/8):
- If x = y, then DE(xy) = m/4.
- If x ≠ y, then DE(xy) ≥ 3m/8.
Therefore: a deterministic data stream algorithm that approximates DE to within a factor better than 3/2 with space S gives a 1-way protocol for EQ with cost S + O(log m).
Conclusion: DS(DE) ≥ Ω(m).
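A hedged numerical illustration (mine, not from the slides) of the gap this reduction exploits, using two tiny hand-picked sets that satisfy the design conditions of the previous slide.

def DE(stream):
    # number of distinct elements in a stream
    return len(set(stream))

m = 16
# Two sets of size m/4 = 4 whose intersection has size 2 <= m/8.
T1 = [1, 2, 3, 4]
T2 = [3, 4, 5, 6]

print(DE(T1 + T1))   # x = y : DE(xy) = m/4 = 4
print(DE(T1 + T2))   # x != y: DE(xy) >= 3m/8 = 6
# Any deterministic algorithm approximating DE within a factor < 3/2
# distinguishes the two cases, hence solves EQ on the design.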

13 Randomized Communication Complexity
Alice & Bob are allowed to use random coin tosses.
For any inputs x,y, the referee needs to output g(x,y) w.p. 1 − δ.
RCC(g) = minimum cost of a randomized protocol that computes g with error 1/4.
RCC^1(g) = minimum cost of a randomized 1-way protocol that computes g with error 1/4.
- Question: what is RCC^1(EQ)? (A hint follows below.)
RDS(f) = minimum amount of memory used by any randomized data stream algorithm computing f with error 1/4.
Lemma: RDS(f) ≥ Ω(RCC^1(g)).
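The slide leaves RCC^1(EQ) as a question. As a hint, here is a hedged sketch (mine, not from the lecture) of the classical fingerprinting protocol: Alice sends a random prime p together with x mod p, so EQ is decided with one-sided error using O(log n)-bit messages; all names below are illustrative.

import random

def random_prime(bound):
    # Rejection-sample a prime <= bound (trial division; fine for small bounds).
    while True:
        p = random.randint(2, bound)
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):
            return p

def alice_msg(x_bits):
    n = len(x_bits)
    x = int(x_bits, 2)
    p = random_prime(max(4, n * n))   # a larger prime range drives the error down
    return p, x % p                   # O(log n)-bit message

def bob_decide(msg, y_bits):
    p, fingerprint = msg
    return int(int(y_bits, 2) % p == fingerprint)   # 1 means "probably equal"

x, y = "1011001110100101", "1011001110100100"
print(bob_decide(alice_msg(x), x))   # equal inputs: always 1
print(bob_decide(alice_msg(x), y))   # unequal inputs: 0 with high probability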

14 Set Disjointness
U = {1,2,…,m}; X = 2^U = {0,1}^m.
Disj: X × X → {0,1}; Disj(x,y) = 1 iff x ∩ y ≠ ∅.
Equivalently: Disj(x,y) = OR_{i=1..m} (x_i AND y_i).
Theorem [Kalyanasundaram-Schnitger 88]: RCC(Disj) ≥ Ω(m).

15 Reduction from Disj to DE
U = {1,2,…,m}; X = 2^U.
Disj: X × X → {0,1}.
Note: DE(xy) = |x ∪ y|. Hence:
- If x ∩ y = ∅, then DE(xy) = |x| + |y|.
- If x ∩ y ≠ ∅, then DE(xy) < |x| + |y|.
Therefore: a randomized data stream algorithm that computes DE exactly with space S gives a 1-way randomized protocol for Disj with cost S + O(log m).
Conclusion: RDS(DE) ≥ Ω(m).
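A hedged sketch (mine, not from the slides) of the resulting decision rule; Bob additionally needs |x|, which accounts for the O(log m) extra bits.

def DE(stream):
    return len(set(stream))

def disj_from_de(x_set, y_set):
    # The sets intersect iff the concatenated stream has strictly fewer
    # distinct elements than |x| + |y| (DE is computed exactly here).
    stream = list(x_set) + list(y_set)
    return int(DE(stream) < len(x_set) + len(y_set))   # 1 iff x ∩ y ≠ ∅

print(disj_from_de({1, 2, 3}, {4, 5}))   # disjoint     -> 0
print(disj_from_de({1, 2, 3}, {3, 5}))   # intersecting -> 1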

16 Information Theory Primer
X: a random variable on U.
H(X) = entropy of X = amount of "uncertainty" in X (in bits).
- Ex: if X is uniform, then H(X) = log(|U|).
- Ex: if X is constant, then H(X) = 0.
H(X | Y) = conditional entropy of X given Y = amount of uncertainty left in X after knowing Y.
- Ex: H(X | X) = 0.
- Ex: if X,Y are independent, H(X | Y) = H(X).
I(X ; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X) = mutual information between X and Y.
(The slide shows a Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), I(X;Y), and H(X,Y).)
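A small hedged Python illustration (mine, not from the slides) that computes these quantities directly from a joint distribution; it also checks the sub-additivity property stated on the next slide.

from math import log2
from collections import defaultdict

def entropy(p):
    # p: dict mapping outcomes to probabilities
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginals(joint):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), q in joint.items():
        px[x] += q
        py[y] += q
    return px, py

# A small joint distribution of two correlated bits.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px, py = marginals(joint)

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(joint)
H_X_given_Y = H_XY - H_Y        # chain rule: H(X,Y) = H(Y) + H(X|Y)
I_XY = H_X - H_X_given_Y        # mutual information I(X;Y)

print(H_X, H_X_given_Y, I_XY)   # I(X;Y) > 0: the bits are correlated
print(H_XY <= H_X + H_Y)        # sub-additivity of entropy: True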

17 Information Theory Primer (cont.)
Sub-additivity of entropy: H(X,Y) ≤ H(X) + H(Y), with equality iff X,Y are independent.
Conditional mutual information: I(X ; Y | Z) = H(X | Z) − H(X | Y,Z).

18 Information Complexity
g: A × B → C: a function.
π: a communication protocol that computes g.
μ: a distribution on A × B; (X,Y): a random variable with distribution μ.
Information cost of π: icost(π) = I(X,Y ; π(X,Y)).
Information complexity of g: IC_μ(g) = min over protocols π that compute g of icost(π).
Lemma: For any μ, RCC(g) ≥ IC_μ(g).

19 Direct Sum for Information Complexity
We want to:
- Find a distribution μ on {0,1}^m × {0,1}^m.
- Show that IC_μ(Disj) ≥ Ω(m).
Recall that Disj(x,y) = OR_{i=1..m} (x_i AND y_i).
We will prove a "direct sum" theorem for IC_μ(Disj):
- Disj is a "sum" of m independent copies of AND.
- Hence, the information complexity of Disj is m times the information complexity of AND.
We will define a distribution ν on {0,1} × {0,1} and then define μ = ν^m.
"Theorem" [Direct Sum]: IC_μ(Disj) ≥ m · IC_ν(AND).
It would then suffice to prove an Ω(1) lower bound on IC_ν(AND).

20 Conditional Information Complexity
We cannot prove the direct sum directly for information complexity.
Recall:
- μ: a distribution on {0,1}^m × {0,1}^m.
- (X,Y): a random variable with distribution μ.
(X,Y) is product if X,Y are independent.
Z: some auxiliary random variable on a domain S.
(X,Y) is product conditioned on Z if for any z ∈ S, X,Y are independent conditioned on the event { Z = z }.
Conditional information complexity of g given Z: CIC_μ(g | Z) = min over protocols π that compute g of I(X,Y ; π(X,Y) | Z).
Lemma: For any μ and Z, IC_μ(g) ≥ CIC_μ(g | Z).

21 Input Distributions for Set Disjointness
ν: a distribution on pairs in {0,1} × {0,1}.
(U,V): a random variable with distribution ν, generated as follows:
- Choose a uniform bit D.
- If D = 0, choose U uniformly from {0,1} and set V = 0.
- If D = 1, choose V uniformly from {0,1} and set U = 0.
We define μ = ν^m (a small sampler is sketched below). Note:
- μ is not product.
- Conditioned on Z = D^m, μ is product.
- For every (x,y) in the support of μ, Disj(x,y) = 0.
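A hedged sampler (mine, not from the slides) for ν and μ = ν^m; on every sampled pair the coordinate-wise AND is 0, so Disj(x,y) = 0 throughout the support, and fixing D^m makes X and Y independent.

import random

def sample_nu():
    # One coordinate: (U, V, D), with U·V = 0 always.
    d = random.randint(0, 1)
    if d == 0:
        return random.randint(0, 1), 0, d   # U uniform, V = 0
    else:
        return 0, random.randint(0, 1), d   # U = 0, V uniform

def sample_mu(m):
    coords = [sample_nu() for _ in range(m)]
    x = [u for u, v, d in coords]
    y = [v for u, v, d in coords]
    z = [d for u, v, d in coords]           # the conditioning variable D^m
    return x, y, z

x, y, z = sample_mu(8)
print(any(xi and yi for xi, yi in zip(x, y)))   # Disj(x,y): always False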

22 Direct Sum for IC [Bar-Yossef, Jayram, Kumar, Sivakumar 02]
Theorem: CIC_μ(Disj | D^m) ≥ m · CIC_ν(AND | D).
Proof outline:
- Decomposition step: I(X,Y ; π(X,Y) | D^m) ≥ Σ_i I((X_i,Y_i) ; π(X,Y) | D^m).
- Reduction step: I((X_i,Y_i) ; π(X,Y) | D^m) ≥ CIC_ν(AND | D).

23 Decomposition Step
I(X,Y ; π(X,Y) | D^m)
= H(X,Y | D^m) − H(X,Y | π(X,Y), D^m)
≥ Σ_i H(X_i,Y_i | D^m) − Σ_i H(X_i,Y_i | π(X,Y), D^m)
(by independence of (X_1,Y_1),…,(X_m,Y_m) and by sub-additivity of entropy)
= Σ_i I((X_i,Y_i) ; π(X,Y) | D^m).

24 Reduction Step
Want to show: I((X_i,Y_i) ; π(X,Y) | D^m) ≥ CIC_ν(AND | D).
I((X_i,Y_i) ; π(X,Y) | D^m) = Σ_{d_{-i}} Pr(D_{-i} = d_{-i}) · I((X_i,Y_i) ; π(X,Y) | D_i, D_{-i} = d_{-i}).
A protocol for computing AND(x_i,y_i) (see the sketch below):
- For all j ≠ i, Alice and Bob select X_j and Y_j independently using d_j.
- Alice and Bob run the protocol π on X = (X_1,…,X_{i−1}, x_i, X_{i+1},…,X_m) and Y = (Y_1,…,Y_{i−1}, y_i, Y_{i+1},…,Y_m).
- Note that Disj(X,Y) = AND(x_i,y_i).
- The icost of this protocol = I((X_i,Y_i) ; π(X,Y) | D_i, D_{-i} = d_{-i}).
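A hedged sketch (mine, not from the slides) of this embedding; disj_protocol stands in for the given protocol π (it is computed exactly here), and all names are illustrative.

import random

def disj_protocol(x, y):
    # Placeholder for the given protocol π for Disj.
    return int(any(xi and yi for xi, yi in zip(x, y)))

def and_protocol(i, x_i, y_i, d, m):
    # Alice and Bob embed (x_i, y_i) into coordinate i; for j != i they
    # sample X_j, Y_j privately according to d[j] (one side is always 0,
    # so the padded coordinates never create an intersection).
    X, Y = [0] * m, [0] * m
    for j in range(m):
        if j == i:
            X[j], Y[j] = x_i, y_i
        elif d[j] == 0:
            X[j] = random.randint(0, 1)     # Alice's private sample, Y_j = 0
        else:
            Y[j] = random.randint(0, 1)     # Bob's private sample, X_j = 0
    return disj_protocol(X, Y)              # equals AND(x_i, y_i)

m, i = 8, 3
d = [random.randint(0, 1) for _ in range(m)]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_protocol(i, a, b, d, m))   # last column = a AND b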

25 End of Lecture 13