Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak.


1 Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak (CMU)

2 Goal
- Compute the fraction of Dacians in the empire
- Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$

3 Sampling
- Send accountants to a subset $J$ of provinces, $|J| = m$
- Estimator: $\tilde{S} = \frac{n}{m} \sum_{j \in J} a_j$
- Chebyshev bound: with 90% success probability, $0.5 \cdot S - O(n/m) < \tilde{S} < 2 \cdot S + O(n/m)$
- For constant additive error, need $m \sim n$
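The subset-sampling estimator above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `sampling_estimate` and the choice to sample $J$ uniformly without replacement are assumptions.

```python
import random

def sampling_estimate(a, m, rng):
    """Estimate sum(a) from m entries chosen uniformly without replacement.

    Hypothetical helper: `a` plays the role of (a_1, ..., a_n) and the
    sampled index set plays the role of J with |J| = m.
    """
    n = len(a)
    subset = rng.sample(range(n), m)  # the subset J
    return sum(a[j] for j in subset) * n / m  # rescale by n/m

if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.random() for _ in range(1000)]
    print("true sum:", sum(a))
    print("estimate from m=100:", sampling_estimate(a, 100, rng))
```

When $m = n$ the estimator returns the exact sum; for smaller $m$ the additive error grows like $n/m$, which is why constant additive error needs $m \sim n$.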

4 Precision Sampling Framework
- Send accountants to each province, but require only approximate counts
- Estimate $a_i$ up to a pre-selected precision $u_i$, i.e. $|a_i - \tilde{a}_i| < u_i$
- Challenge: achieve a good tradeoff between
  - quality of approximation to $S$
  - total cost of computing each $\tilde{a}_i$ (within precision $u_i$)

5 Formalization

Estimator (Alg)                                                  | Adversary
                                                                 | 1. fix (hidden) $a_1, a_2, \dots, a_n$
1. fix precisions $u_i$                                          |
                                                                 | 2. fix $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$ s.t. $|a_i - \tilde{a}_i| < u_i$
3. report $\tilde{S}$ s.t. $|\sum_i a_i - \tilde{S}| < 1$        |

- What is our cost model?
  - Here, average cost $= \frac{1}{n} \sum_i 1/u_i$
  - Achieving precision $u_i$ requires $1/u_i$ "resources": e.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples
- For example, can choose all $u_i = 1/n$
  - Average cost $\approx n$
  - This is best possible, if the estimator is $\tilde{S} = \sum_i \tilde{a}_i$
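A small worked example of this cost model, under assumed names (`average_cost` and `naive_estimate` are illustrative, not from the talk): with all $u_i = 1/n$ the average cost is $n$, and summing the approximate values keeps the total error below 1.

```python
def average_cost(precisions):
    """Average cost (1/n) * sum_i 1/u_i for the chosen precisions u_i."""
    return sum(1.0 / u for u in precisions) / len(precisions)

def naive_estimate(approx_values):
    """The plain estimator S~ = sum_i a~_i."""
    return sum(approx_values)

if __name__ == "__main__":
    n = 4
    a = [0.25, 0.5, 0.75, 1.0]                     # hidden values
    u = [1.0 / n] * n                              # uniform precisions u_i = 1/n
    approx = [x + 0.9 * w for x, w in zip(a, u)]   # adversary stays within u_i
    print("average cost:", average_cost(u))
    print("error:", abs(naive_estimate(approx) - sum(a)))
```

Each term is off by less than $1/n$, so the $n$ terms together are off by less than 1, matching step 3 of the protocol.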

6 Precision Sampling Lemma
- Goal: estimate $\sum_i a_i$ from $\{\tilde{a}_i\}$ satisfying $|a_i - \tilde{a}_i| < u_i$
- Precision Sampling Lemma: can get, with 90% success:
  - $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  - with average cost $O(\log n)$
- General version: additive error $\varepsilon$ and multiplicative error $1+\varepsilon$: $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$, with average cost $O(\varepsilon^{-3} \log n)$
- Example: distinguish $\sum_i a_i = 3$ vs. $\sum_i a_i = 1$
- Consider two extreme cases:
  - if three $a_i = 1$: estimate all $a_i$ with a crude approximation ($u_i = 0.1$)
  - if all $a_i = 3/n$: estimate a few with a good approximation ($u_i = 1/n$), the rest with $u_i = 1$

7 Precision Sampling Algorithm
- Precision Sampling Lemma: can get, with 90% success:
  - $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  - with average cost equal to $O(\log n)$
- General version: $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$, with average cost $O(\varepsilon^{-3} \log n)$
- Algorithm:
  - Choose each $u_i \in [0,1]$ i.i.d.
    - concrete distribution for general $\varepsilon$: minimum of $O(\varepsilon^{-3})$ uniform r.v.
  - Estimator: $\tilde{S}$ = count of the $i$'s s.t. $\tilde{a}_i / u_i > 6$ (and normalize)
    - for general $\varepsilon$: a function of $[\tilde{a}_i / u_i - 4/\varepsilon]_+$ and the $u_i$'s
- Outline of analysis:
  - $E[\tilde{S}] = \sum_i \Pr[\tilde{a}_i / u_i > 6] = \sum_i \Pr[a_i > (6 \pm 1) u_i] \approx \sum_i a_i / 6$
  - Actually, $\tilde{a}_i$ may also have 1.5-multiplicative error w.r.t. $a_i$
  - $E[1/u_i] = O(\log n)$ w.h.p. (after truncation)
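The constant-factor version of the algorithm can be simulated as below. This is a rough sketch under several assumptions that are not in the slides: $u_i$ is drawn uniformly from $[0,1]$ and truncated below at $1/n$, "normalize" is taken to mean multiplying the count by 6, and the adversary is simulated by random noise of magnitude below $u_i$. A single trial has high variance, so the demo averages many trials.

```python
import random

THRESHOLD = 6.0  # the constant 6 from the slide

def precision_sampling_trial(a, rng):
    """One trial: returns (estimate of sum(a), average cost (1/n) * sum 1/u_i)."""
    n = len(a)
    count = 0
    total_cost = 0.0
    for a_i in a:
        # Assumed truncation: never ask for precision finer than 1/n.
        u_i = max(rng.random(), 1.0 / n)
        total_cost += 1.0 / u_i
        # Simulated adversary: any value within u_i of the true a_i.
        approx_i = a_i + u_i * rng.uniform(-0.99, 0.99)
        if approx_i / u_i > THRESHOLD:
            count += 1
    # Assumed normalization: Pr[a_i / u_i > 6] is about a_i / 6.
    return THRESHOLD * count, total_cost / n

if __name__ == "__main__":
    rng = random.Random(0)
    a = [0.3] * 1000  # true sum S = 300
    trials = [precision_sampling_trial(a, rng) for _ in range(200)]
    print("mean estimate:", sum(t[0] for t in trials) / len(trials))
    print("mean average cost:", sum(t[1] for t in trials) / len(trials))
```

With the truncation at $1/n$, the expected value of $1/u_i$ is about $\ln n + 1$, which is the $O(\log n)$ average cost in the lemma.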

8 Why?
- Save time:
  - Problem: computing edit distance between two strings [FOCS’10]
  - new algorithm that obtains a $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time
  - via a property-testing-like algorithm using Precision Sampling (recursively)
- Save space:
  - Problem: compute norms/moments of frequencies in a data stream [FOCS’11]
  - a simple and unified approach to compute all $\ell_p$-norms/moments, and related problems

9 Streaming/sketching
Stream of IP addresses: 131.107.65.14, 18.0.1.12, 80.97.56.20, ...

IP             Frequency
131.107.65.14  3
18.0.1.12      2
80.97.56.20    2
127.0.0.1      9
192.168.0.1    8
257.2.5.7      0
16.09.20.11    1

- Challenge: log statistics of the data, using small space

10 Streaming moments
- Setup:
  - $(1+\varepsilon)$-estimate frequencies in small space
  - Let $x_i$ = frequency of IP $i$
  - $p$-th moment: $\sum_i x_i^p$
    - $p = 1$: keep one counter!
    - $p \in [0,2]$: space $O(\varepsilon^{-2} \cdot \log n)$ [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11]
    - $p > 2$: space $\tilde{O}_\varepsilon(n^{1-2/p})$ [AMS’96, SS’02, BJKS’02, CKS’03, IW’05, BGKS’06, BO’11]
  - Generally, $x \in \mathbb{R}^n$ (updates: to coordinate $i$ with $\pm 1$)
- Sketch = embedding into a "space" of small dimension
  - Usually, linear $L: \mathbb{R}^n \to \mathbb{R}^m$ for $m \ll n$, thus $L(x \pm e_i) = Lx \pm Le_i$
(Table: IP / Frequency: 131.107.65.14 / 3, 18.0.1.12 / 2, 80.97.56.20 / 2)

11 $\ell_p$ moments
- Theorem: linear sketch for $\ell_p$ with $O(1)$ approximation, and $O(n^{1-2/p} \log n)$ space (90% succ. prob.)
  - = weak embedding of $\ell_p^n$ into $\ell_\infty^m$ of dim $m = O(n^{1-2/p} \log n)$
- Sketch:
  - pick random $u_i \in [0,1]$, $r_i \in \{\pm 1\}$ and let $y_i = r_i \cdot x_i / u_i^{1/p}$
  - throw the $y_i$'s into a hash table $H$ with $m = O(n^{1-2/p} \log n)$ cells
- Estimator:
  - via PSL or just $\max_{j \in [m]} |H[j]|^p$
- Randomness: $O(1)$ independence suffices
(Figure: $x = (x_1, x_2, x_3, x_4, x_5, x_6)$ hashed into cells $1, \dots, m$ of $H = (y_1 + y_3,\ y_4,\ y_2 + y_5 + y_6)$)
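The sketch and the max estimator on this slide can be illustrated as follows. This is only a toy version: the class name `LpMaxSketch` is hypothetical, and it stores fully random $u_i$, $r_i$ and hash values for every coordinate, which a real small-space implementation would replace with limited-independence hash functions.

```python
import random

class LpMaxSketch:
    """Toy linear sketch: H[h(i)] += r_i * x_i / u_i^(1/p)."""

    def __init__(self, n, m, p, seed=0):
        rng = random.Random(seed)
        self.p = p
        self.table = [0.0] * m  # the hash table H with m cells
        self.scale = []         # r_i / u_i^(1/p) for each coordinate i
        self.cell = []          # hash cell h(i) for each coordinate i
        for _ in range(n):
            u = 1.0 - rng.random()       # uniform on (0, 1], avoids u = 0
            r = rng.choice((-1.0, 1.0))  # random sign r_i
            self.scale.append(r / u ** (1.0 / p))
            self.cell.append(rng.randrange(m))

    def update(self, i, delta):
        """Apply the stream update x_i += delta; the sketch is linear in x."""
        self.table[self.cell[i]] += delta * self.scale[i]

    def estimate(self):
        """The simple estimator max_j |H[j]|^p."""
        return max(abs(h) for h in self.table) ** self.p

if __name__ == "__main__":
    sk = LpMaxSketch(n=16, m=4, p=3.0, seed=1)
    for i, freq in [(0, 3), (5, 2), (9, 2)]:
        for _ in range(freq):
            sk.update(i, 1.0)
    print("estimate of sum_i |x_i|^p:", sk.estimate())
```

Because `update` only adds a fixed multiple of `delta` to one cell, the sketch of $x \pm e_i$ equals the sketch of $x$ plus or minus the sketch of $e_i$, which is the linearity property used for streaming updates.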

12 Under the Hood: Using PSL
- Idea: use PSL to compute the sum $\|x\|_p^p = \sum_i |x_i|^p$
  - Assume $\|x\|_2 = 1$ by scaling
  - Set PSL additive error $\varepsilon$ small compared to $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$
- Outline:
  1. Pick the $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/p}$
  2. Compute every $y_i^p = x_i^p / u_i$ within additive approximation 1
     - done via heavy hitters of the vector $y$
  3. Use PSL on $|y_i^p u_i| = |x_i|^p$ to compute the sum $\sum_i |x_i|^p$
- Space bound is controlled by the norm $\|y\|_2^2$
  - since heavy hitters under $\ell_2$ is the best we can do
  - Notice $E\|y\|_2^2 = \|x\|_2^2 \cdot E[1/u^{2/p}] \le (1/\varepsilon)^{2/p} = (n^{p/2-1})^{2/p}$
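The norm comparison behind the choice of $\varepsilon$, and the resulting space bound, can be spelled out as follows. This is a sketch assuming $p > 2$ and $\|x\|_2 = 1$; the Hölder step is a standard argument that the slide does not show.

```latex
% Hölder's inequality with exponents p/2 and p/(p-2):
\|x\|_2^2 = \sum_{i=1}^n |x_i|^2 \cdot 1
  \le \Big(\sum_{i=1}^n |x_i|^p\Big)^{2/p} \cdot n^{1-2/p}
  = n^{1-2/p}\,\|x\|_p^2 .
% Raising both sides to the power p/2:
\|x\|_2^p \le n^{p/2-1}\,\|x\|_p^p
  \quad\Longrightarrow\quad
  \|x\|_p^p \ge \frac{\|x\|_2^p}{n^{p/2-1}} = \frac{1}{n^{p/2-1}} .
% Taking \varepsilon on the order of 1/n^{p/2-1} in the slide's bound:
E\|y\|_2^2 \le (1/\varepsilon)^{2/p} = \big(n^{p/2-1}\big)^{2/p} = n^{1-2/p} .
```

The final exponent matches the $n^{1-2/p}$ factor in the space bound of the $\ell_p$ theorem.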

13 More Streaming Algorithms
- Other streaming algorithms:
  - Same algorithm for all $p$-moments, including $p \le 2$
    - For $p > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    - For $p \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  - Algorithms for mixed norms ($\ell_p$ of $\ell_q$) [CM05, GBD08, JW09]
    - Space bounded by (Rademacher) $p$-type constant
  - Algorithm for the $\ell_p$-sampling problem [MW’10]
    - This work extended to give tight bounds by [JST’10]
- Connections:
  - Inspired by the streaming algorithm of [IW05], but simpler
  - Turns out to be a distant relative of Priority Sampling [DLT’07]

14 Finale
- Other applications for the Precision Sampling framework?
- Better algorithms for precision sampling?
  - For average cost (for $1+\varepsilon$ approximation):
    - Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
    - Lower bound: $\Omega(\varepsilon^{-2} \log n)$
  - Bounds for other cost models?
    - E.g., for 1/square root of precision, the bound is $O(\varepsilon^{-3/2})$
- Other forms of "access" to the $a_i$'s?

