Published by Chana Pitman. Modified over 2 years ago.

1
Efficient Algorithms via Precision Sampling
Robert Krauthgamer (Weizmann Institute)
Joint work with: Alexandr Andoni (Microsoft Research), Krzysztof Onak (CMU)

2
Goal
Compute the fraction of Dacians in the empire.
Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$.

3
Sampling
Send accountants to a subset $J$ of provinces, $|J| = m$.
Estimator: $\tilde{S} = \frac{n}{m} \sum_{j \in J} a_j$
Chebyshev bound: with 90% success probability,
$0.5 \cdot S - O(n/m) < \tilde{S} < 2 \cdot S + O(n/m)$
For constant additive error, need $m \sim n$.
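The sampling estimator on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `sampling_estimate` is hypothetical, and the sketch assumes that $J$ is drawn uniformly at random without replacement, which the slide does not spell out.

```python
import random

def sampling_estimate(a, m, rng=random):
    """Estimate S = sum(a) from m provinces chosen uniformly at random.

    Assumption (not stated on the slide): J is sampled without replacement.
    """
    n = len(a)
    J = rng.sample(range(n), m)          # the subset J of provinces, |J| = m
    return sum(a[j] for j in J) * n / m  # rescale the sampled sum by n/m
```

With `m == len(a)` every province is visited and the estimate equals the exact sum; smaller `m` trades accuracy for fewer visits, in line with the $O(n/m)$ additive term above.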

4
Send accountants to each province, but require only approximate counts.
Estimate $a_i$ up to pre-selected precision $u_i$, i.e. $|a_i - \tilde{a}_i| < u_i$

5
Formalization: Estimator (Alg) vs. Adversary
1. Adversary: fix (hidden) $a_1, a_2, \dots, a_n$
1. Estimator: fix precisions $u_i$
2. Adversary: fix $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$ s.t. $|a_i - \tilde{a}_i| < u_i$

6
Precision Sampling Lemma
Goal: estimate $\sum_i a_i$ from $\{\tilde{a}_i\}$ satisfying $|a_i - \tilde{a}_i| < u_i$

7
Precision Sampling Algorithm
Precision Sampling Lemma: can get, with 90% success, O(1) additive error and 1.5 multiplicative error:
$S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
with average cost equal to $O(\log n)$.
Algorithm: choose each $u_i \in [0,1]$ i.i.d.
Estimator: $\tilde{S}$ = number of $i$'s such that $\tilde{a}_i / u_i > 6$ (and normalize).
Outline of analysis:
  $E[\tilde{S}] = \sum_i \Pr[\tilde{a}_i / u_i > 6] = \sum_i \Pr[a_i > (6 \pm 1) u_i] \approx \sum_i a_i / 6$.
  Actually, $\tilde{a}_i$ may also have 1.5-multiplicative error w.r.t. $a_i$.
  $E[1/u_i] = O(\log n)$ w.h.p. (after truncation).
General version, with $\varepsilon$ additive error and $1+\varepsilon$ multiplicative error:
  $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$
  with average cost $O(\varepsilon^{-3} \log n)$.
  Concrete distribution of $u_i$: minimum of $O(\varepsilon^{-3})$ uniform r.v.
  Estimator: function of $[\tilde{a}_i / u_i - 4/\varepsilon]_+$ and the $u_i$'s.
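A minimal sketch of the simple counting estimator (threshold 6) may help make the outline concrete. Several details are assumptions rather than part of the slide: each $u_i$ is taken to be uniform on $(0,1]$, the normalization is taken to be multiplication by 6 (since $\Pr[\tilde{a}_i/u_i > 6] \approx a_i/6$ under that assumption), and the adversary is replaced by a random perturbation of size below $u_i$. The name `precision_sampling_estimate` is hypothetical.

```python
import random

THRESHOLD = 6  # the constant used by the slide's estimator

def precision_sampling_estimate(a, rng):
    """One run of the counting estimator on hidden values a_i in [0, 1].

    Returns (estimate of sum(a), average of 1/u_i over all i).
    """
    count = 0
    total_cost = 0.0
    for a_i in a:
        u_i = 1.0 - rng.random()                # assumed uniform precision in (0, 1]
        noise = rng.uniform(-0.9, 0.9) * u_i    # stand-in adversary, |noise| < u_i
        a_tilde = a_i + noise                   # approximate count seen by the estimator
        total_cost += 1.0 / u_i                 # cost model: 1/u_i per province
        if a_tilde / u_i > THRESHOLD:
            count += 1
    # Assumed normalization: each i is counted with probability about a_i / 6.
    return THRESHOLD * count, total_cost / len(a)
```

The returned average cost is the raw mean of $1/u_i$; the $O(\log n)$ bound on the slide refers to a truncated version, which this sketch does not implement.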

8
Why?
Save time:
  Problem: computing edit distance between two strings [FOCS'10]
  New algorithm that obtains $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time, via a property-testing-like algorithm using Precision Sampling (recursively)
Save space:
  Problem: compute norms/moments of frequencies in a data stream [FOCS'11]
  A simple and unified approach to compute all $\ell_p$-norms/moments, and related problems

9
Streaming/sketching
[Table: IP | Frequency]
Challenge: log statistics of the data, using small space

10
Streaming moments
Setup: $1+\varepsilon$ estimate frequencies in small space
Let $x_i$ = frequency of IP $i$
$p$-th moment: $\sum_i x_i^p$
  $p = 1$: keep one counter!
  $p \in [0,2]$: space $O(\varepsilon^{-2} \cdot \log n)$ [AMS'96, I'00, GC'07, Li'08, NW'10, KNW'10, KNPW'11]
  $p > 2$: space $\tilde{O}_\varepsilon(n^{1-2/p})$ [AMS'96, SS'02, BJKS'02, CKS'03, IW'05, BGKS'06, BO'11]
Generally, $x \in \mathbb{R}^n$ (updates: to coordinate $i$ with $\pm 1$)
Sketch = embedding into a "space" of small dimension
Usually linear, $L: \mathbb{R}^n \to \mathbb{R}^m$ for $m \ll n$, thus $L(x \pm e_i) = Lx \pm Le_i$
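For reference, the quantity being approximated can be computed exactly if one is willing to store a counter per coordinate. The sketch below is that naive baseline, not one of the small-space algorithms cited above; the function name `exact_moment` and the `(item, delta)` stream format are assumptions made for illustration.

```python
from collections import Counter

def exact_moment(stream, p):
    """Exact p-th frequency moment sum_i |x_i|^p, keeping one counter per item.

    `stream` yields (item, delta) pairs, where delta is +1 or -1.
    Space is linear in the number of distinct items, which is exactly
    what the streaming algorithms on this slide try to avoid.
    """
    x = Counter()
    for item, delta in stream:
        x[item] += delta            # update coordinate `item` of the vector x
    return sum(abs(c) ** p for c in x.values())
```

For `p = 1` and insert-only streams this collapses to counting the updates, which is why a single counter suffices in that case.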

11
$\ell_p$ moments
Theorem: linear sketch for $\ell_p$ with O(1) approximation and $O(n^{1-2/p} \log n)$ space (90% succ. prob.).
  = weak embedding of $\ell_p^n$ into $\ell_\infty^m$ of dimension $m = O(n^{1-2/p} \log n)$
Sketch:
  pick random $u_i \in [0,1]$, $r_i \in \{\pm 1\}$, and let $y_i = r_i \cdot x_i / u_i^{1/p}$
  throw the $y_i$'s into a hash table $H$ with $m = O(n^{1-2/p} \log n)$ cells
Estimator: via PSL, or just $\max_{j \in [m]} |H[j]|^p$
Randomness: O(1) independence suffices
[Figure: the coordinates $x_1, \dots, x_6$ of $x$ are hashed into the cells $1, \dots, m$ of $H$; the example cells hold $y_1 + y_3$, $y_4$, and $y_2 + y_5 + y_6$.]
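The sketch and its max-based estimator can be illustrated as follows. This is a toy version under stated assumptions: $u_i$ is drawn uniformly from $(0,1]$, the signs and cell assignments come from Python's `random` module and are stored explicitly (a real streaming implementation would derive them from limited-independence hash functions to stay in small space), and the class name `LpMaxSketch` is hypothetical.

```python
import random

class LpMaxSketch:
    """Toy linear sketch: y_i = r_i * x_i / u_i**(1/p), hashed into m cells."""

    def __init__(self, n, m, p, seed=0):
        rng = random.Random(seed)
        self.p = p
        self.u = [1.0 - rng.random() for _ in range(n)]    # assumed uniform u_i in (0, 1]
        self.r = [rng.choice((-1, 1)) for _ in range(n)]   # random signs r_i
        self.h = [rng.randrange(m) for _ in range(n)]      # cell assigned to coordinate i
        self.H = [0.0] * m                                 # the hash table H

    def update(self, i, delta):
        """Apply x_i += delta; by linearity only one cell of H changes."""
        self.H[self.h[i]] += delta * self.r[i] / self.u[i] ** (1.0 / self.p)

    def estimate(self):
        """Return max_j |H[j]|^p, the simpler of the two estimators on the slide."""
        return max(abs(c) for c in self.H) ** self.p
```

Two sketches built with the same seed can be added cell by cell to obtain the sketch of the summed vectors, which is the linearity property $L(x \pm e_i) = Lx \pm Le_i$ in action.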

12
Under the Hood: Using PSL
Idea: use PSL to compute the sum $\|x\|_p^p = \sum_i |x_i|^p$
  Assume $\|x\|_2 = 1$ by scaling
  Set PSL additive error $\varepsilon$ small compared to $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$
Outline:
  1. Pick $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/p}$
  2. Compute every $y_i^p = x_i^p / u_i$ within additive approximation 1; done via heavy hitters of the vector $y$
  3. Use PSL on $|y_i^p u_i| = |x_i|^p$ to compute the sum $\sum_i |x_i|^p$
Space bound is controlled by the norm $\|y\|_2^2$, since heavy hitters under $\ell_2$ is the best we can do.
Notice $E\|y\|_2^2 = \|x\|_2^2 \cdot E[1/u^{2/p}] \le (1/\varepsilon)^{2/p} = (n^{p/2-1})^{2/p}$.
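The lower bound on $\|x\|_p^p$ used when setting $\varepsilon$, and the exponent in the final space bound, can be checked with a short derivation. The first line is the standard comparison between $\ell_2$ and $\ell_p$ norms for $p \ge 2$ (a consequence of Hölder's inequality); taking $\varepsilon$ of order $n^{-(p/2-1)}$ is read off from the last line of the slide rather than stated there explicitly.

```latex
% Norm comparison for p >= 2 and x in R^n:
\|x\|_2 \le n^{\frac{1}{2} - \frac{1}{p}} \, \|x\|_p
\quad\Longrightarrow\quad
\|x\|_p^p \ge \frac{\|x\|_2^p}{n^{p/2 - 1}} .

% With \|x\|_2 = 1 and 1/\varepsilon = n^{p/2 - 1}:
\left( \frac{1}{\varepsilon} \right)^{2/p}
  = \left( n^{p/2 - 1} \right)^{2/p}
  = n^{1 - 2/p} .
```

The exponent $1 - 2/p$ agrees with the $O(n^{1-2/p} \log n)$ space bound in the theorem, up to the logarithmic factor.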

13
More Streaming Algorithms
Other streaming algorithms:
  Same algorithm for all $p$-moments, including $p \le 2$
    For $p > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    For $p \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  Algorithms for mixed norms ($\ell_p$ of $\ell_q$) [CM05, GBD08, JW09]
    Space bounded by (Rademacher) $p$-type constant
  Algorithm for the $\ell_p$-sampling problem [MW'10]
    This work extended to give tight bounds by [JST'10]
Connections:
  Inspired by the streaming algorithm of [IW05], but simpler
  Turns out to be a distant relative of Priority Sampling [DLT'07]

14
Finale
Other applications for the Precision Sampling framework?
Better algorithms for precision sampling? For average cost (for $1+\varepsilon$ approximation):
  Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
  Lower bound: $\Omega(\varepsilon^{-2} \log n)$
Bounds for other cost models?
  E.g., for 1/square root of precision, the bound is $O(\varepsilon^{-3/2})$
Other forms of "access" to the $a_i$'s?
