Published by Jesse Chase. Modified over 3 years ago.

1
Tight Lower Bounds for the Distinct Elements Problem
David Woodruff, MIT, dpwood@mit.edu
Joint work with Piotr Indyk

2
The Problem
Stream of elements $a_1, \ldots, a_n$, each in $\{1, \ldots, m\}$
Want $F_0$ = number of distinct elements
Elements arrive in adversarial order
Algorithms are given one pass over the stream
Goal: minimum-space algorithm
Example stream: 0 1 1 3 7 3 4 …

3
A Trivial Algorithm
Example stream: 0 1 1 3 7 3 4 …
Keep the m-bit characteristic vector v of the stream: j in stream $\Leftrightarrow v_j = 1$
The vector evolves from 00000000 to 10011011
$F_0 = \mathrm{wt}(10011011) = 5$
Space = m
Can we do better?
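The trivial algorithm can be sketched in a few lines. This is a minimal illustration, assuming elements are indexed 0 to m-1 as in the slide's example stream; the function name is hypothetical.

```python
def distinct_elements_exact(stream, m):
    """Exact F_0 in one pass using an m-bit characteristic vector."""
    v = [0] * m                 # v[j] == 1  <=>  j has appeared in the stream
    for a in stream:
        v[a] = 1
    return sum(v)               # wt(v) = number of distinct elements

stream = [0, 1, 1, 3, 7, 3, 4]  # the example stream 0113734 from the slide
f0 = distinct_elements_exact(stream, m=8)
```

The space cost is the m bits of v, independent of the stream length n.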

4
Negative Results
Any algorithm computing $F_0$ exactly must use $\Omega(m)$ space [AMS96]
Any deterministic algorithm that outputs x with $|F_0 - x| < \varepsilon F_0$ must use $\Omega(m)$ space [AMS96]
What about randomized approximation algorithms?

5
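Randomized Approximation Algorithms for $F_0$
An $O(\log\log m/\varepsilon^2 + \log m \log 1/\varepsilon)$-space algorithm outputs x with $\Pr[|F_0 - x| < \varepsilon F_0] > 3/4$ [BJKST02]
Lots of hashing tricks
Is this optimal?
Previous lower bounds: $\Omega(\log m)$ [AMS96], $\Omega(1/\varepsilon)$ [Bar-Yossef]
Open problem of [BJKST02]: the gap $1/\varepsilon \ll 1/\varepsilon^2$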

6
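Idea Behind Lower Bounds
Alice holds $x \in \{0,1\}^m$ and derives a stream s(x); Bob holds $y \in \{0,1\}^m$ and derives a stream s(y)
Alice runs a $(1 \pm \varepsilon)$ $F_0$ algorithm A on s(x) and sends the internal state S of A to Bob
Bob continues A on s(y) and computes a $(1 \pm \varepsilon)$ approximation of $F_0(s(x) \circ s(y))$ w.p. > 3/4
Idea: if this lets Bob decide f(x,y) w.p. > 3/4, the space used by A is at least f's randomized 1-way communication complexity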
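The simulation argument can be sketched as follows. This is an illustration only: the class BitVectorF0 is a hypothetical stand-in that computes F_0 exactly with an m-bit vector, not the approximation algorithm from the talk, and the function name is an assumption. What matters is that the single message is exactly the algorithm's internal state.

```python
class BitVectorF0:
    """Stand-in streaming algorithm: exact F_0 via an m-bit vector."""
    def __init__(self, m):
        self.state = [0] * m

    def process(self, stream):
        for a in stream:
            self.state[a] = 1

    def estimate(self):
        return sum(self.state)

def one_way_protocol(sx, sy, m):
    # Alice runs A on s(x) and sends only A's internal state S.
    alice = BitVectorF0(m)
    alice.process(sx)
    message = list(alice.state)      # the single message; m bits for this stand-in

    # Bob resumes A from S, feeds s(y), and reads off F_0 of the concatenation.
    bob = BitVectorF0(m)
    bob.state = message
    bob.process(sy)
    return bob.estimate(), len(message)

estimate, bits_sent = one_way_protocol([0, 1, 1, 3], [7, 3, 4], m=8)
```

Any streaming algorithm using S bits of space yields a one-way protocol with an S-bit message in this way, so a communication lower bound for f transfers to a space lower bound.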

7
Randomized 1-way Communication Complexity
Boolean function $f: X \times Y \to \{0,1\}$
Alice has $x \in X$, Bob has $y \in Y$. Bob wants f(x,y)
Only one message is sent: it must go from Alice to Bob
Communication cost of a protocol = expected length of the longest message sent, over all inputs
The $\delta$-error randomized 1-way communication complexity of f, $R_\delta(f)$, is the communication cost of an optimal protocol computing f w.p. $\ge 1 - \delta$
How do we lower bound $R_\delta(f)$?

8
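The VC Dimension [KNR]
$\mathcal{F} = \{f : X \to \{0,1\}\}$ is a family of Boolean functions
Each $f \in \mathcal{F}$ is a length-$|X|$ bit string
For $S \subseteq X$, the shatter coefficient $SC(f_S)$ of S is $|\{f|_S : f \in \mathcal{F}\}|$ = number of distinct bit strings when $\mathcal{F}$ is restricted to S
$SC(\mathcal{F}, p) = \max_{S \subseteq X,\, |S| = p} SC(f_S)$
If $SC(f_S) = 2^{|S|}$, S is shattered by $\mathcal{F}$
The VC dimension of $\mathcal{F}$, $VCD(\mathcal{F})$, is the size of the largest S shattered by $\mathcal{F}$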
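These definitions can be checked by brute force on a small family. The sketch below assumes X = {0, ..., n-1} and represents each function as a length-n tuple of bits; the function names and the toy threshold family are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def shatter_coefficient(family, S):
    """Number of distinct bit strings when the family is restricted to S."""
    return len({tuple(f[i] for i in S) for f in family})

def sc(family, n, p):
    """SC(F, p): maximum shatter coefficient over subsets S of size p."""
    return max(shatter_coefficient(family, S)
               for S in combinations(range(n), p))

def vc_dimension(family, n):
    """Size of the largest S whose shatter coefficient is 2^|S|."""
    best = 0
    for p in range(1, n + 1):
        if sc(family, n, p) == 2 ** p:
            best = p
    return best

# Toy family on X = {0, 1, 2, 3}: f_k(i) = 1 iff i >= k, for k = 0..4.
n = 4
thresholds = [tuple(1 if i >= k else 0 for i in range(n)) for k in range(n + 1)]
```

For the threshold family no pair of points is shattered (the pattern "1 then 0" never occurs), so its VC dimension is 1 even though the family has five members.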

9
Shatter Coefficient Theorem
Notation: for $f: X \times Y \to \{0,1\}$, define $f_X = \{ f_x : Y \to \{0,1\} \mid x \in X \}$, where $f_x(y) = f(x,y)$
Theorem [BJKS]: for every $f: X \times Y \to \{0,1\}$ and every $p \ge VCD(f_X)$, $R_{1/4}(f) = \Omega(\log SC(f_X, p))$

10
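The $\Omega(1/\varepsilon)$ Lower Bound [Bar-Yossef]
Alice has $x \in_R \{0,1\}^m$ with $\mathrm{wt}(x) = m/2$
Bob has $y \in_R \{0,1\}^m$ with $\mathrm{wt}(y) = \varepsilon m$, and either $\mathrm{wt}(x \wedge y) = 0$, in which case f(x,y) = 0, or $\mathrm{wt}(x \wedge y) = \varepsilon m$, in which case f(x,y) = 1
$R_{1/4}(f) = \Omega(VCD(f_X)) = \Omega(1/\varepsilon)$ [Bar-Yossef]
Let s(x), s(y) be any streams with characteristic vectors x, y
$f(x,y) = 1 \Rightarrow F_0(s(x) \circ s(y)) = m/2$
$f(x,y) = 0 \Rightarrow F_0(s(x) \circ s(y)) = m/2 + \varepsilon m$
$(1 + \varepsilon')\, m/2 < (1 - \varepsilon')(m/2 + \varepsilon m)$ for $\varepsilon' = \Theta(\varepsilon)$
Hence a $(1 \pm \varepsilon')$ $F_0$ algorithm can decide f, so any such algorithm uses $\Omega(1/\varepsilon)$ space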
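The separation condition on this slide can be worked out explicitly. In the derivation below, $\varepsilon$ denotes the density of Bob's vector and $\varepsilon'$ the relative error of the $F_0$ algorithm; these symbol names are assumptions made for the sketch.

```latex
\begin{align*}
(1+\varepsilon')\,\frac{m}{2} < (1-\varepsilon')\left(\frac{m}{2} + \varepsilon m\right)
&\iff \frac{1}{2} + \frac{\varepsilon'}{2}
      < \frac{1}{2} + \varepsilon - \frac{\varepsilon'}{2} - \varepsilon\varepsilon' \\
&\iff \varepsilon'(1+\varepsilon) < \varepsilon \\
&\iff \varepsilon' < \frac{\varepsilon}{1+\varepsilon}.
\end{align*}
```

In particular any $\varepsilon' \le \varepsilon/2$ works when $\varepsilon < 1$, so the two approximation intervals are disjoint for an error parameter that is a constant fraction of $\varepsilon$.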

11
Our Results
Remainder of talk: an $\Omega(1/\varepsilon^2)$ lower bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, for any k > 0
$\Rightarrow$ the $O(\log\log m/\varepsilon^2 + \log m \log 1/\varepsilon)$ upper bound is almost optimal
IDEA: reduce from a protocol for computing the dot product

12
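The Promise Problem
Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in Y$
$t = \Theta(1/\varepsilon^2)$, Y = basis of unit vectors of $\mathbb{R}^t$
Promise problem $\Pi$: either $\langle x, y \rangle = 0$, in which case f(x,y) = 0, or $\langle x, y \rangle = 2/t^{1/2}$, in which case f(x,y) = 1
$X = \{x \in [0,1]^t : \|x\| = 1 \text{ and } \exists\, y \in Y \text{ s.t. } (x,y) \in \Pi\}$
We lower bound $R_{1/4}(f)$ via $SC(f_X, t)$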

13
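Bounding $SC(f_X, t)$
Theorem: $SC(f_X, t) = 2^{\Omega(t)}$
Proof:
1. $\forall\, T \subset Y$ s.t. $|T| = t/4$, put $x_T = (2/t^{1/2}) \cdot \sum_{e \in T} e$
2. Define $X_1 \subset X$ as $X_1 = \{x_T \mid T \subset Y, |T| = t/4\}$
3. Claim: $\forall\, s \in \{0,1\}^t$ with $\mathrm{wt}(s) = t/4$, s is in the truth table of $f_{X_1}$
4. Proof of claim:
   1. Let $s \in \{0,1\}^t$ with 1s in positions $i_1, \ldots, i_{t/4}$
   2. Put $T = \{e_{i_1}, \ldots, e_{i_{t/4}}\}$. $\forall\, e \in T$, $\langle e, x_T \rangle = 2/t^{1/2}$
   3. $\forall\, e \in Y - T$, $\langle e, x_T \rangle = 0$
5. There are $2^{\Omega(t)}$ such s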
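The construction in the proof can be verified numerically for a small t. The sketch below assumes t = 16 (any multiple of 4 would do) and identifies the basis vector e_i with coordinate i; the function name is hypothetical.

```python
from itertools import combinations
from math import comb, isclose, sqrt

def x_T(T, t):
    """x_T = (2 / sqrt(t)) * sum of the basis vectors e_i for i in T."""
    return [2 / sqrt(t) if i in T else 0.0 for i in range(t)]

t = 16                                   # small t divisible by 4, for illustration
truth_tables = set()
for T in combinations(range(t), t // 4):
    x = x_T(set(T), t)
    assert isclose(sum(c * c for c in x), 1.0)      # ||x_T|| = 1
    # <e_i, x_T> is coordinate i; f = 1 exactly when it equals 2/sqrt(t).
    row = tuple(1 if isclose(c, 2 / sqrt(t)) else 0 for c in x)
    truth_tables.add(row)

num_tables = len(truth_tables)
```

Each weight-t/4 string arises from exactly one T, so the count is the binomial coefficient C(t, t/4), which is at least $(t/(t/4))^{t/4} = 4^{t/4} = 2^{t/2}$.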

14
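Bounding $R_{1/4}(f)$
Corollary: $R_{1/4}(f) = \Omega(\log SC(f_X, t)) = \Omega(t) = \Omega(1/\varepsilon^2)$
Reduction: we need a protocol computing f whose communication equals the space used by any $(1 \pm \varepsilon)$ $F_0$ approximation algorithm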

15
Reduction
Recall: $\langle x, y \rangle = 0$ if f(x,y) = 0, and $\langle x, y \rangle = 2/t^{1/2}$ if f(x,y) = 1
Goal: reduce the separation of $\langle x, y \rangle$ to a separation of $F_0(s(x) \circ s(y))$ for streams s(x), s(y) that Alice and Bob can derive from x, y
Use the relation: $\|y - x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x, y \rangle$
$f(x,y) = 0 \Rightarrow \|y - x\| = 2^{1/2}$
$f(x,y) = 1 \Rightarrow \|y - x\| < 2^{1/2}(1 - 1/t^{1/2}) = 2^{1/2}(1 - \Theta(\varepsilon))$
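The two distance values follow from the relation in a couple of steps. The derivation below uses $\|x\| = \|y\| = 1$ and the elementary inequality $\sqrt{1-u} < 1 - u/2$ for $0 < u \le 1$ (so it assumes $t \ge 4$).

```latex
\begin{align*}
\|y-x\|^2 &= \|y\|^2 + \|x\|^2 - 2\langle x, y\rangle = 2 - 2\langle x, y\rangle, \\
f(x,y)=0:\quad \|y-x\| &= \sqrt{2}, \\
f(x,y)=1:\quad \|y-x\| &= \sqrt{2 - 4/\sqrt{t}}
  = \sqrt{2}\,\sqrt{1 - 2/\sqrt{t}}
  < \sqrt{2}\left(1 - 1/\sqrt{t}\right).
\end{align*}
```

With $t = \Theta(1/\varepsilon^2)$ the term $1/\sqrt{t}$ is $\Theta(\varepsilon)$, which gives the stated gap between the two cases.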

16
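Overview of Reduction
Input: Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in E$
1. Low-distortion embedding $\phi: \ell_2^t \to \ell_1^{\mathrm{poly}(t)}$, giving $\phi(x)$, $\phi(y)$
2. Rational approximation of $\phi(x)$, $\phi(y)$
3. Scale rationals to integers
4. Convert integer coordinates to unary to get {0,1} vectors x, y
Alice runs the $F_0$ algorithm on s(x) and passes its state to Bob, who continues on s(y)
The estimate of $F_0(s(x) \circ s(y))$ can decide f(x,y) w.p. $\ge 3/4$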

17
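Embedding $\ell_2^t$ into $\ell_1^{\mathrm{poly}(t)}$
A $(1+\gamma)$-distortion embedding $\phi: \ell_2^t \to \ell_1^d$ is a mapping s.t. $\forall\, p, q \in \ell_2^t$, $\|p - q\|_2 \le \|\phi(p) - \phi(q)\|_1 \le (1+\gamma)\,\|p - q\|_2$
Theorem [FLM77]: $\forall\, \gamma > 0$ $\exists$ a $(1+\gamma)$-distortion embedding $\phi: \ell_2^t \to \ell_1^d$ with $d = O(t \cdot \log(1/\gamma)/\gamma^2)$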

18
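Embedding $\ell_2^t$ into $\ell_1^d$
Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in E$
Using Theorem [FLM77] with a low-distortion embedding $\phi: \ell_2^t \to \ell_1^d$, Alice and Bob get $\phi(x), \phi(y) \in \mathbb{R}^d$ with $d = O(t \cdot \log(1/\gamma)/\gamma^2)$
Distortion parameter $\gamma$: specified later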

19
Rational Approximation
$z = z(t): \mathbb{N} \to \mathbb{N}$; assume $z \ge d$
Approximate each coordinate of the output of the embedding by an integer multiple of 1/z

20
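Scaling
Let $\alpha$ (resp. $\beta$) be Alice's (resp. Bob's) rational approximation of $\phi(x)$ (resp. $\phi(y)$)
Alice (resp. Bob) multiplies each coordinate of $\alpha$ (resp. $\beta$) by z and obtains $s(\alpha)$ (resp. $s(\beta)$)
Claim: coordinates are integers in the range [-2z, 2z]
Proof:
1. $|\alpha| \le |\phi(\cdot)| + d/z \le 2$
2. $|s(\alpha)| = z|\alpha|$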
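The rational approximation and scaling steps can be sketched together. This is a minimal illustration: the input vector phi_x is made-up data standing in for an embedded vector with l_1 norm about 1, the value z = 100 is arbitrary, and the function names are hypothetical. Rounding to the nearest multiple of 1/z is one way to approximate; the slides do not specify the rounding rule.

```python
from fractions import Fraction

def rational_approx(v, z):
    """Round each coordinate to the nearest integer multiple of 1/z."""
    return [Fraction(round(c * z), z) for c in v]

def scale(alpha, z):
    """Multiply by z; every coordinate becomes an integer."""
    scaled = [a * z for a in alpha]
    assert all(s.denominator == 1 for s in scaled)
    return [int(s) for s in scaled]

# Hypothetical embedded vector (not taken from the slides).
phi_x = [0.31, -0.127, 0.4049, 0.158]
z = 100
alpha = rational_approx(phi_x, z)
s_alpha = scale(alpha, z)
```

Each coordinate moves by at most 1/z, so the l_1 norm changes by at most d/z, which is at most 1 under the assumption z ≥ d. That is where the range [-2z, 2z] comes from.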

21
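Converting to Unary
Here $s(\alpha)$, $s(\beta)$ are Alice's and Bob's scaled integer vectors
For i = 1 to d:
  $j \leftarrow s(\alpha)_i$
  Replace $s(\alpha)_i$ with $1^{2z+j}\, 0^{2z-j}$
Bob does the same for $s(\beta)$
x, y denote the new length-4dz bitstrings
$\mathrm{wt}(x) = |s(\alpha)|$, $\mathrm{wt}(y) = |s(\beta)|$
$\Delta(x,y) = |s(\alpha) - s(\beta)|$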
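The unary conversion and the resulting distance identity can be sketched as below. The sample integer vectors and the function names are hypothetical; the encoding 1^(2z+j) 0^(2z-j) is the one on the slide.

```python
def to_unary(s, z):
    """Encode each integer j in [-2z, 2z] as 1^(2z+j) 0^(2z-j)."""
    bits = []
    for j in s:
        assert -2 * z <= j <= 2 * z
        bits += [1] * (2 * z + j) + [0] * (2 * z - j)
    return bits

def hamming(x, y):
    """Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y))

z = 3
s_alpha = [2, -1, 0]        # hypothetical scaled vectors, d = 3
s_beta = [-3, -1, 4]
x, y = to_unary(s_alpha, z), to_unary(s_beta, z)
l1_distance = sum(abs(a - b) for a, b in zip(s_alpha, s_beta))
```

Within each block of 4z bits the two encodings differ in exactly |j - j'| positions, so the Hamming distance of the bitstrings equals the l_1 distance of the integer vectors.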

22
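Reducing $\Delta(x,y)$ to $F_0$
Alice (Bob) chooses a stream $a_x$ ($a_y$) with characteristic vector x (y)
Lemma: if $\lambda_1 < \mathrm{wt}(x), \mathrm{wt}(y) < \lambda_2$, then $\lambda_1 + \Delta(x,y)/2 < F_0(a_x \circ a_y) < \lambda_2 + \Delta(x,y)/2$
Follows from the fact: $F_0(a_x \circ a_y) = \mathrm{wt}(x \vee y)$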
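The lemma rests on the identity wt(x ∨ y) = (wt(x) + wt(y) + Δ(x,y))/2, which turns bounds on the two weights into bounds on F_0. The sketch below checks it on a pair of made-up characteristic vectors; the streams simply list the positions of the 1s, and all names are hypothetical.

```python
def f0_of_concatenation(x, y):
    """F_0 of a_x followed by a_y, where each stream lists the 1-positions."""
    a_x = [i for i, b in enumerate(x) if b]
    a_y = [i for i, b in enumerate(y) if b]
    return len(set(a_x + a_y))

def wt(v):
    """Hamming weight."""
    return sum(v)

def delta(x, y):
    """Hamming distance."""
    return sum(a != b for a, b in zip(x, y))

x = [1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical characteristic vectors
y = [0, 1, 1, 0, 1, 1, 0, 0]
f0 = f0_of_concatenation(x, y)
```

Since both weights lie strictly between the lemma's lower and upper thresholds, their average does too, and adding Δ(x,y)/2 gives the stated bounds.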

23
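Reducing $\Delta(x,y)$ to $F_0$
Use the lemma to separate $F_0(a_x \circ a_y)$ in the two cases f(x,y) = 0 and f(x,y) = 1
Set the embedding distortion $\gamma = \Theta(\varepsilon)$ and $z = \Theta(1/\varepsilon^5 \log 1/\varepsilon)$ so that the two cases are distinguished by a $(1 \pm \Theta(\varepsilon))$ $F_0$ algorithm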

24
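Conclusions
$a_x$, $a_y$ must be in a universe of size $\ge 4zd = \Theta(\log(1/\varepsilon)/\varepsilon^9)$
The reduction is only valid if $4zd \le m$
$\Rightarrow$ $\Omega(1/\varepsilon^2)$ bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, $\forall\, k > 0$
Recently the lower bound was improved to $\Omega(1/\varepsilon^2)$ for $\varepsilon \ge m^{-1/2}$, which is optimal
Idea: find the set of vectors directly in Hamming space via an involved probabilistic method argument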
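The range of $\varepsilon$ in the final bound follows from the universe-size constraint. A short sketch of the calculation, where the constants $C$ and $c$ are assumptions introduced for the derivation:

```latex
\begin{align*}
4zd &= \Theta\!\left(\frac{\log(1/\varepsilon)}{\varepsilon^{9}}\right)
     \le C\,\varepsilon^{-(9+k)}
     && \text{since } \log(1/\varepsilon) = O(\varepsilon^{-k}) \text{ for any fixed } k > 0, \\
\varepsilon \ge c\, m^{-1/(9+k)}
    &\;\Longrightarrow\; C\,\varepsilon^{-(9+k)} \le C\, c^{-(9+k)}\, m \le m
     && \text{once } c \ge C^{1/(9+k)}.
\end{align*}
```

So the requirement $4zd \le m$ holds whenever $\varepsilon = \Omega(m^{-1/(9+k)})$, which is the range stated for the lower bound.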
