# Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

T.S. Jayram and David Woodruff, IBM Almaden



## Data Stream Model

Have a stream of m updates to an n-dimensional vector v:

- each update adds x to coordinate i
- insertion model: all updates x are positive
- turnstile model: x can be positive or negative
- stream length and update magnitudes are < poly(n)

Estimate statistics of v:

- number of distinct elements F_0
- L_p norm |v|_p = (Σ_i |v_i|^p)^{1/p}
- entropy
- and so on

Goal: output a (1+ε)-approximation with limited memory.
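A minimal sketch of this model (illustrative only, not a space-efficient algorithm): maintain v explicitly under turnstile updates and evaluate the statistics directly; `process_stream`, `f0`, and `lp_norm` are hypothetical helper names.

```python
def process_stream(n, updates):
    """Naively maintain v under a stream of updates (i, x); a real
    streaming algorithm must use far less than the Theta(n log n) bits
    this explicit vector takes."""
    v = [0] * n
    for i, x in updates:       # turnstile model: x may be negative
        v[i] += x
    return v

def f0(v):
    """F_0: number of non-zero coordinates (distinct elements)."""
    return sum(1 for vi in v if vi != 0)

def lp_norm(v, p):
    """L_p norm: |v|_p = (sum_i |v_i|^p)^(1/p)."""
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

v = process_stream(5, [(0, 3), (2, -1), (0, -3)])
print(f0(v), lp_norm(v, 2))    # coordinate 0 cancels out under turnstile
```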

## Lots of Optimal Papers

Lots of optimal results:

- An optimal algorithm for the distinct elements problem [KNW]
- Fast moment estimation in optimal space [KNPW]
- A near-optimal algorithm for estimating entropy of a stream [CCM]
- Optimal approximations of the frequency moments of data streams [IW]
- A near-optimal algorithm for L1-difference [NW]
- Optimal space lower bounds for all frequency moments [W]

This paper: Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error.

## What Is Optimal?

F_0 = number of non-zero entries in v.

For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space, with 2/3 success probability.

- This success probability can be amplified by independent repetition: run several copies and output the median estimate.
- If we want high probability, say 1 − 1/n, this increases the space by a multiplicative log n factor.
- So "optimal" algorithms are only optimal among algorithms with constant success probability.
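As a concrete illustration of the amplification step (a sketch, not from the slides): running k independent copies of a 2/3-success estimator and taking the median fails only when more than half the copies fail, which happens with probability exp(−Ω(k)); so k = O(log 1/δ) copies give success probability 1 − δ. Here `estimate_once` is a hypothetical stand-in for any constant-success-probability estimator.

```python
import random
import statistics

def amplify_by_median(estimate_once, k):
    """Run k independent copies of a 2/3-success estimator and return
    the median; the median is accurate unless more than half the copies
    fail, an event of probability exp(-Omega(k))."""
    return statistics.median(estimate_once() for _ in range(k))

def estimate_once(true_value=100.0, eps=0.1):
    """Toy estimator: a (1+eps)-approximation w.p. 2/3, garbage otherwise."""
    if random.random() < 2 / 3:
        return true_value * random.uniform(1 - eps, 1 + eps)  # success
    return true_value * random.uniform(2, 10)                 # wild failure

print(amplify_by_median(estimate_once, k=31))  # close to 100 w.h.p.
```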

## Can We Improve the Lower Bounds?

Gap-Hamming: Alice holds x ∈ {0,1}^{ε^{-2}}, Bob holds y ∈ {0,1}^{ε^{-2}}, with the promise that the fractional Hamming distance satisfies either Δ(x,y) > 1/2 + ε or Δ(x,y) < 1/2 − ε.

- Lower bound of Ω(ε^{-2}) with 1/3 error probability.
- But there is a trivial upper bound of ε^{-2} bits with 0 error probability (Alice sends all of x), so the dependence on the error probability cannot be improved here.
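A minimal sketch of that zero-error upper bound (the function name and decision rule are illustrative assumptions): Alice transmits her entire ε^{-2}-bit string and Bob decides the promise exactly.

```python
def gap_hamming_zero_error(x, y, eps):
    """Trivial 0-error protocol: Alice sends all of x (len(x) = eps^-2
    bits); Bob computes the fractional Hamming distance exactly."""
    assert len(x) == len(y)
    frac_dist = sum(xi != yi for xi, yi in zip(x, y)) / len(x)
    # Promise: frac_dist is either > 1/2 + eps or < 1/2 - eps.
    return frac_dist > 0.5           # True: "far", False: "close"

x = [1, 0, 1, 1]                     # eps = 1/2, so eps^-2 = 4 coordinates
y = [0, 1, 0, 1]
print(gap_hamming_zero_error(x, y, eps=0.5))  # 3/4 > 1/2 + eps -> "far"
```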

## Our Results

## Streaming Results

Independent repetition is optimal!

Estimating the L_p norm in the turnstile model up to 1+ε w.p. 1−δ:

- Ω(ε^{-2} log n log 1/δ) bits for any p
- [KNW] get O(ε^{-2} log n log 1/δ) for 0 ≤ p ≤ 2

Estimating F_0 in the insertion model up to 1+ε w.p. 1−δ:

- Ω(ε^{-2} log 1/δ + log n) bits
- [KNW] get O(ε^{-2} log 1/δ) for ε^{-2} > log n

Estimating entropy in the turnstile model up to 1+ε w.p. 1−δ:

- Ω(ε^{-2} log n log 1/δ) bits
- Improves the Ω(ε^{-2} log n) bound of [KNW]

## Johnson-Lindenstrauss Transforms

Let A be a random matrix so that with probability 1−δ, for any fixed q ∈ R^d:

|Aq|_2 = (1 ± ε) |q|_2

- [JL]: A can be an (ε^{-2} log 1/δ) × d matrix; Gaussian or sign entries work
- [Alon]: A needs to have Ω((ε^{-2} log 1/δ) / log 1/ε) rows
- Our result: A needs to have Ω(ε^{-2} log 1/δ) rows, matching the upper bound
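A minimal sketch of the classical upper bound, assuming NumPy (the constant 2 in the row count is an illustrative assumption, not the optimal one): project with a dense Gaussian matrix of k = Θ(ε^{-2} log 1/δ) rows scaled by 1/√k, and check that a fixed vector's norm is preserved.

```python
import math
import numpy as np

def jl_matrix(d, eps, delta, rng):
    """Gaussian JL matrix with k = Theta(eps^-2 log 1/delta) rows,
    scaled so that E[|Aq|_2^2] = |q|_2^2."""
    k = math.ceil(2 * eps ** -2 * math.log(1 / delta))
    return rng.standard_normal((k, d)) / math.sqrt(k)

rng = np.random.default_rng(0)
d, eps, delta = 10_000, 0.1, 0.01
A = jl_matrix(d, eps, delta, rng)

q = rng.standard_normal(d)                 # any fixed vector
ratio = np.linalg.norm(A @ q) / np.linalg.norm(q)
print(ratio)                               # within 1 +/- eps w.h.p.
```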

## Communication Complexity Separation

Alice holds x, Bob holds y, and they wish to compute f(x,y) ∈ {0,1}.

- D_{ρ,1/3}(f) = communication of the best one-way deterministic protocol that errs w.p. 1/3 on distribution ρ
- [KNR]: R^{||}_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)

## Communication Complexity Separation (continued)

VC-dimension of f(x,y) ∈ {0,1}: the maximum number r of columns of the communication matrix such that all 2^r row patterns occur on these columns.

- [KNR]: R^{||}_{1/3}(f) = Θ(VC-dimension(f))
- Our result: there exist f and g, each with VC-dimension k, but R^{||}_δ(f) = Θ(k log 1/δ) while R^{||}_δ(g) = Θ(k)
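A minimal brute-force sketch of the VC-dimension definition above (exponential time, purely illustrative; `vc_dimension` is a hypothetical helper): a set of r columns is shattered if restricting the rows to those columns yields all 2^r bit patterns.

```python
from itertools import combinations

def vc_dimension(M):
    """VC-dimension of a 0/1 communication matrix M: the largest r such
    that some r columns are shattered, i.e. the rows restricted to those
    columns realize all 2^r possible bit patterns."""
    n_cols = len(M[0])
    best = 0
    for r in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), r):
            patterns = {tuple(row[c] for c in cols) for row in M}
            if len(patterns) == 2 ** r:
                best = r
    return best

# Both columns of this 4x2 matrix are shattered: all four patterns occur.
M = [[0, 0],
     [0, 1],
     [1, 0],
     [1, 1]]
print(vc_dimension(M))  # 2
```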

## Our Techniques

## Lopsided Set Intersection (LSI)

Alice holds S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob holds T ⊆ {1, 2, …, U} with |T| = 1/δ, where U = ε^{-2} · δ^{-1}. Question: is S ∩ T = ∅?

- Alice cannot describe S with o(ε^{-2} log U) bits
- If S and T are uniform, then with constant probability S ∩ T = ∅
- R^{||}_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log 1/δ)
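A minimal sketch of that hard distribution (parameter choices are illustrative assumptions): with |S| = ε^{-2}, |T| = δ^{-1}, and U = ε^{-2}δ^{-1}, the expected intersection size is 1, so a uniform pair is disjoint with constant probability.

```python
import random

def lsi_instance(eps, delta, rng):
    """Sample a uniform LSI instance: |S| = eps^-2, |T| = 1/delta,
    universe size U = eps^-2 / delta."""
    U = round(eps ** -2 / delta)
    S = set(rng.sample(range(U), round(eps ** -2)))
    T = set(rng.sample(range(U), round(1 / delta)))
    return S, T

rng = random.Random(0)
trials = 1000
empty = sum(1 for _ in range(trials)
            if not set.intersection(*lsi_instance(0.1, 0.01, rng)))
print(empty / trials)   # roughly e^-1: a constant fraction are disjoint
```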

## Lopsided Set Intersection (LSI2)

Same setup, but Bob holds a single element: S ⊆ {1, 2, …, U} with |S| = 1/ε², T ⊆ {1, 2, …, U} with |T| = 1, and U = ε^{-2} · δ^{-1}. Question: is S ∩ T = ∅?

- R^{||}_{δ/3}(LSI2) ≥ R^{||}_{1/3}(LSI) = Ω(ε^{-2} log 1/δ)
- Reduction: run the LSI2 protocol on each of the 1/δ elements of Bob's LSI set and take a union bound over them

## Low-Error Inner Product

Alice holds x ∈ {0, ε}^U with |x|_2 = 1; Bob holds y ∈ {0, 1}^U with |y|_2 = 1; U = ε^{-2} · δ^{-1}. Question: does ⟨x,y⟩ = 0?

- x is the scaled indicator vector of Alice's set S, and y is the indicator of Bob's single element j, so ⟨x,y⟩ = ε if j ∈ S and ⟨x,y⟩ = 0 otherwise
- Estimating ⟨x,y⟩ up to additive Θ(ε) w.p. 1−δ therefore solves LSI2 w.p. 1−δ
- R^{||}_δ(inner-product_ε) = Ω(ε^{-2} log 1/δ)
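A minimal sketch of this encoding (the helper names and the small universe size are assumptions consistent with the slide): Alice's set becomes a scaled indicator vector of unit L_2 norm, Bob's element a standard basis vector, and the inner product cleanly separates the two cases.

```python
def encode_alice(S, U, eps):
    """x in {0, eps}^U: scaled indicator of S; |x|_2 = eps * sqrt(|S|),
    which is 1 when |S| = eps^-2."""
    return [eps if i in S else 0.0 for i in range(U)]

def encode_bob(j, U):
    """y in {0,1}^U: the standard basis vector e_j, so |y|_2 = 1."""
    return [1.0 if i == j else 0.0 for i in range(U)]

eps = 0.5                      # |S| = eps^-2 = 4
U = 16                         # illustrative universe size
S = {1, 4, 9, 12}
x = encode_alice(S, U, eps)

for j in (4, 7):               # one element in S, one not
    y = encode_bob(j, U)
    ip = sum(xi * yi for xi, yi in zip(x, y))
    print(j, ip)               # eps if j in S, else 0
```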

## L_2-Estimation

Alice holds x ∈ {0, ε}^U with |x|_2 = 1; Bob holds y ∈ {0, 1}^U with |y|_2 = 1; U = ε^{-2} · δ^{-1}. What is |x−y|_2?

- |x−y|_2² = |x|_2² + |y|_2² − 2⟨x,y⟩ = 2 − 2⟨x,y⟩
- Estimating |x−y|_2 up to a (1 + Θ(ε)) factor therefore solves inner-product_ε
- So R^{||}_δ(L_2-estimation_ε) = Ω(ε^{-2} log 1/δ)
- The log 1/δ factor is new, but we want an Ω(ε^{-2} log n log 1/δ) lower bound
- Can use a known trick to get an extra log n factor
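A minimal numeric check of the identity above (the specific vectors are illustrative):

```python
import math

def l2(v):
    return math.sqrt(sum(vi * vi for vi in v))

x = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0]   # in {0, eps}^U with eps = 1/2; |x|_2 = 1
y = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # in {0, 1}^U; |y|_2 = 1

ip = sum(xi * yi for xi, yi in zip(x, y))
lhs = l2([xi - yi for xi, yi in zip(x, y)]) ** 2
print(lhs, 2 - 2 * ip)               # |x-y|_2^2 == 2 - 2<x,y>: both 1.0
```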

## Augmented Lopsided Set Intersection (ALSI2)

Universe [U] = [ε^{-2} · δ^{-1}].

- Alice holds S_1, …, S_r ⊆ [U] with |S_i| = 1/ε² for all i
- Bob holds j ∈ [U], an index i* ∈ {1, 2, …, r}, and the suffix S_{i*+1}, …, S_r
- Question: is j ∈ S_{i*}?
- R^{||}_{1/3}(ALSI2) = Ω(r ε^{-2} log 1/δ)

## Reduction of ALSI2 to L_2-Estimation

- Alice encodes each S_i as a vector x_i (as before) and sets x = Σ_i 10^i · x_i
- Bob encodes j as y_{i*}; knowing the suffix S_{i*+1}, …, S_r, he sets y = 10^{i*} · y_{i*} + Σ_{i > i*} 10^i · x_i
- Then y − x = 10^{i*} y_{i*} − Σ_{i ≤ i*} 10^i · x_i, so |y−x|_2 is dominated by 10^{i*} |y_{i*} − x_{i*}|_2
- Set r = Θ(log n)
- R^{||}_δ(L_2-estimation_ε) = Ω(ε^{-2} log n log 1/δ)
- Streaming space ≥ R^{||}_δ(L_2-estimation_ε)
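A minimal sketch of the scaling trick (a simplified model with blocks living on disjoint coordinates; the dimensions, base 10, and block contents are illustrative assumptions): the suffix blocks cancel, and the squared distance at scale i* dominates all lower scales.

```python
import math

def embed(blocks):
    """Concatenate blocks b_1, ..., b_r, scaling block i by 10^i."""
    out = []
    for i, b in enumerate(blocks, start=1):
        out.extend(10 ** i * v for v in b)
    return out

def l2(v):
    return math.sqrt(sum(vi * vi for vi in v))

r, dim = 4, 8
x_blocks = [[1.0] * dim for _ in range(r)]   # Alice's x_i (illustrative)
i_star = 3
y_star = [1.0] * (dim - 1) + [0.0]           # Bob's y_{i*}: one coordinate off
# Bob zeroes the blocks below i*, places y_{i*} at scale i*, and copies
# Alice's known suffix blocks so they cancel in the difference.
y_blocks = ([[0.0] * dim for _ in range(i_star - 1)]
            + [y_star] + x_blocks[i_star:])

diff = [a - b for a, b in zip(embed(x_blocks), embed(y_blocks))]
dominant = 10 ** i_star * l2([a - b
                              for a, b in zip(x_blocks[i_star - 1], y_star)])
print(l2(diff), dominant)   # the scale-i* term dominates |y - x|_2
```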

## Lower Bounds for Johnson-Lindenstrauss

- Alice and Bob use public randomness to agree on a JL matrix A
- Alice holds x ∈ {−n^{O(1)}, …, n^{O(1)}}^t; Bob holds y ∈ {−n^{O(1)}, …, n^{O(1)}}^t
- Alice sends Ax; Bob computes Ax − Ay = A(x−y) and hence |A(x−y)|_2
- So they can estimate |x−y|_2 up to 1+ε w.p. 1−δ, using O(#rows(A) · log n) bits of communication
- Hence #rows(A) = Ω(r ε^{-2} log 1/δ / log n)
- Set r = Θ(log n) to get #rows(A) = Ω(ε^{-2} log 1/δ)

## Low-Error Hamming Distance

Δ(x,y) = Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n.

- R^{||}_δ(Δ(x,y)_ε) = Ω(ε^{-2} log 1/δ log n)
- Proof: reduction from ALSI2, via Gap-Hamming-to-LSI2 reductions that tolerate low error
- Implies our lower bounds for estimating any L_p norm, distinct elements, and entropy

## Conclusions

Prove the first streaming space lower bounds that depend on the error probability δ:

- Optimal for L_p norms and distinct elements
- Improves the lower bound for entropy
- Optimal dimensionality bound for JL transforms

Adds several twists to augmented-indexing proofs:

- Augmented indexing with a small set in a large domain
- Proof builds upon lopsided set disjointness lower bounds
- Uses multiple Gap-Hamming to Indexing reductions that handle low error

## ALSI2 to Hamming Distance

Recall: Alice holds S_1, …, S_r ⊆ [ε^{-2} · δ^{-1}] with |S_i| = 1/ε² for all i; Bob holds j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r.

For a single set S and element j:

- Let t = ε^{-2} log 1/δ
- Use the public coin to generate t random strings b_1, …, b_t ∈ {0,1}^U
- Alice sets x_i = majority_{k ∈ S} b_{i,k}; Bob sets y_i = b_{i,j}
- If j ∈ S, each y_i agrees with x_i with probability 1/2 + Θ(ε); if j ∉ S, the agreement probability is exactly 1/2, so Δ(x,y) separates the two cases w.p. 1−δ

Embed multiple copies by duplicating coordinates at different scales.
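A minimal simulation of the single-set gadget (parameter values and the `gadget` helper are illustrative assumptions): when j ∈ S the agreement rate between x and y sits noticeably above 1/2; when j ∉ S it is 1/2.

```python
import math
import random

def gadget(S, j, t, U, rng):
    """One gadget: t public random strings b_i in {0,1}^U. Alice's bit
    x_i is the majority of b_i over S; Bob's bit y_i is b_i[j]. Returns
    the fraction of the t coordinates on which x and y agree."""
    agree = 0
    for _ in range(t):
        b = [rng.randint(0, 1) for _ in range(U)]
        x_i = 1 if 2 * sum(b[k] for k in S) > len(S) else 0  # |S| odd here
        agree += (x_i == b[j])
    return agree / t

rng = random.Random(1)
eps, delta = 0.2, 0.01
U = round(eps ** -2 / delta)                 # universe size 2500
S = set(range(round(eps ** -2)))             # |S| = 25
t = round(eps ** -2 * math.log(1 / delta))   # ~ eps^-2 log 1/delta repetitions
print(gadget(S, 0, t, U, rng))       # j = 0 in S: agreement ~ 1/2 + Theta(eps)
print(gadget(S, U - 1, t, U, rng))   # j not in S: agreement ~ 1/2
```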
