Published by Chana Pitman. Modified over 4 years ago.

1
Efficient Algorithms via Precision Sampling
Robert Krauthgamer (Weizmann Institute)
Joint work with: Alexandr Andoni (Microsoft Research), Krzysztof Onak (CMU)

2
Goal
Compute the fraction of Dacians in the empire.
Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$.

3
Sampling
Send accountants to a subset $J$ of provinces, $|J| = m$.
Estimator: $\tilde{S} = \frac{n}{m} \sum_{j \in J} a_j$
Chebyshev bound: with 90% success probability, $0.5 \cdot S - O(n/m) < \tilde{S} < 2 \cdot S + O(n/m)$.
For constant additive error, need $m \sim n$.
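
The sampling estimator above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the data `a` is synthetic, the seed is arbitrary, and the function name `sampling_estimate` is hypothetical.

```python
import random

def sampling_estimate(a, m, rng):
    """Estimate S = sum(a) from m provinces sampled without replacement."""
    n = len(a)
    sample = rng.sample(range(n), m)          # the subset J, with |J| = m
    return sum(a[j] for j in sample) * n / m  # rescale the partial sum by n/m

rng = random.Random(0)                 # fixed seed, an arbitrary choice for reproducibility
n = 10_000
a = [rng.random() for _ in range(n)]   # hypothetical data, each a_i in [0, 1)
S = sum(a)

full = sampling_estimate(a, n, rng)    # m = n: every province is visited, so the estimate is exact
small = sampling_estimate(a, 100, rng) # m = 100: cheap, but only a rough estimate
print(S, full, small)
```

With `m = n` the estimator reproduces `S` up to floating-point rounding; shrinking `m` trades accuracy for cost, which is the tension the next slides address.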

4
Precision Sampling Framework
Send accountants to each province, but require only approximate counts.
Estimate $a_i$ up to a pre-selected precision $u_i$, i.e. $|a_i - \tilde{a}_i| < u_i$.
Challenge: achieve a good tradeoff between
  quality of approximation to $S$
  total cost of computing each $\tilde{a}_i$ (within precision $u_i$)

5
Formalization
Estimator (Alg) vs. Adversary:
  1. Adversary: fix (hidden) $a_1, a_2, \dots, a_n$. Estimator: fix precisions $u_i$.
  2. Adversary: fix $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$ s.t. $|a_i - \tilde{a}_i| < u_i$.
  3. Estimator: report $\tilde{S}$ s.t. $|\sum_i a_i - \tilde{S}| < 1$.
What is our cost model?
  Here, average cost $= \frac{1}{n} \sum_i 1/u_i$.
  Achieving precision $u_i$ requires $1/u_i$ "resources": e.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples.
For example, can choose all $u_i = 1/n$.
  Average cost $\approx n$.
  This is best possible if the estimator is $\tilde{S} = \sum_i \tilde{a}_i$.
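
To make the cost model concrete, here is a minimal sketch that evaluates the average cost for two choices of precisions. The helper name `average_cost` and the "mixed" choice of precisions are assumptions made for illustration only.

```python
def average_cost(precisions):
    """Average cost (1/n) * sum_i 1/u_i, following the cost model on this slide."""
    return sum(1.0 / u for u in precisions) / len(precisions)

n = 1000
uniform = [1.0 / n] * n                      # all u_i = 1/n
print(average_cost(uniform))                 # approximately n

# Hypothetical mixed choice: a few precise estimates, the rest crude.
mixed = [1.0 / n] * 10 + [1.0] * (n - 10)
print(average_cost(mixed))                   # (10 * n + (n - 10)) / n
```

The uniform choice pays about `n` per coordinate on average, while the mixed choice is far cheaper; whether a cheap choice still yields a good estimate of the sum is exactly what the framework has to establish.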

6
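Precision Sampling Lemma
Goal: estimate $\sum_i a_i$ from $\{\tilde{a}_i\}$ satisfying $|a_i - \tilde{a}_i| < u_i$.
Precision Sampling Lemma: can get, with 90% success, O(1) additive error and 1.5 multiplicative error:
  $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  with average cost $O(\log n)$.
More generally, $\varepsilon$ additive error and $1+\varepsilon$ multiplicative error:
  $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$
  with average cost $O(\varepsilon^{-3} \log n)$.
Example: distinguish $\sum_i a_i = 3$ vs. $\sum_i a_i = 1$. Consider two extreme cases:
  if three $a_i = 1$: estimate all $a_i$ with a crude approximation ($u_i = 0.1$)
  if all $a_i = 3/n$: estimate a few with a good approximation $u_i = 1/n$, the rest with $u_i = 1$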

7
Precision Sampling Algorithm
Precision Sampling Lemma: can get, with 90% success, O(1) additive error and 1.5 multiplicative error:
  $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  with average cost $O(\log n)$.
Algorithm:
  Choose each $u_i \in [0,1]$ i.i.d.
  Estimator: $\tilde{S}$ = the number of $i$'s s.t. $\tilde{a}_i / u_i > 6$ (normalized).
Outline of analysis:
  $E[\tilde{S}] = \sum_i \Pr[\tilde{a}_i / u_i > 6] = \sum_i \Pr[a_i > (6 \pm 1) u_i] \approx \sum_i a_i / 6$.
  Actually, $\tilde{a}_i$ may also have 1.5-multiplicative error w.r.t. $a_i$.
  $E[1/u_i] = O(\log n)$ w.h.p. (after truncation).
General version, with $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$ and average cost $O(\varepsilon^{-3} \log n)$:
  Estimator: a function of $[\tilde{a}_i / u_i - 4/\varepsilon]_+$ and the $u_i$'s.
  Concrete distribution of $u_i$: minimum of $O(\varepsilon^{-3})$ uniform r.v.
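
The basic algorithm (uniform $u_i$, threshold 6) can be simulated as below. This is a sketch under stated assumptions, not the authors' implementation: the adversary is replaced by random noise of magnitude at most $u_i$, the data is synthetic, and the function name is hypothetical.

```python
import random

def precision_sampling_estimate(a, rng, threshold=6.0):
    """Count the i's with approx_i / u_i > threshold, then normalize by the threshold."""
    count = 0
    for a_i in a:
        u_i = rng.random()                     # precision u_i, uniform in [0, 1)
        noise = rng.uniform(-1.0, 1.0) * u_i   # stand-in adversary (assumption): |noise| <= u_i
        approx = a_i + noise                   # the approximate value handed to the estimator
        if approx > threshold * u_i:           # same test as approx / u_i > threshold, without dividing
            count += 1
    return threshold * count                   # E[count] is roughly S / threshold

rng = random.Random(0)                         # arbitrary fixed seed
a = [0.3] * 1000                               # hypothetical data with S = 300
S = sum(a)

trials = [precision_sampling_estimate(a, rng) for _ in range(100)]
mean_estimate = sum(trials) / len(trials)
print(S, mean_estimate)
```

Because $\Pr[a_i > (6 \pm 1) u_i]$ lies between $a_i/7$ and $a_i/5$ for uniform $u_i$, the expected estimate stays between $(6/7) S$ and $(6/5) S$ whatever the adversary does within the allowed precision. A single run is noisy, which is why the demo averages several trials.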

8
Why?
Save time:
  Problem: computing edit distance between two strings [FOCS'10]
  New algorithm that obtains a $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time,
  via a property-testing-like algorithm using Precision Sampling (recursively).
Save space:
  Problem: compute norms/moments of frequencies in a data stream [FOCS'11]
  A simple and unified approach to compute all $\ell_p$-norms/moments, and related problems.

9
Streaming/sketching
Challenge: log statistics of the data, using small space.
Stream of IP addresses: 131.107.65.14, 18.0.1.12, 80.97.56.20

IP              Frequency
131.107.65.14   3
18.0.1.12       2
80.97.56.20     2
127.0.0.1       9
192.168.0.1     8
257.2.5.7       0
16.09.20.11     1

10
Streaming moments
Setup: $1+\varepsilon$ estimate of frequencies in small space.
  Let $x_i$ = frequency of IP $i$.
  $p$-th moment: $\sum_i x_i^p$
  $p = 1$: keep one counter!
  $p \in [0,2]$: space $O(\varepsilon^{-2} \cdot \log n)$ [AMS'96, I'00, GC'07, Li'08, NW'10, KNW'10, KNPW'11]
  $p > 2$: space $\tilde{O}_\varepsilon(n^{1-2/p})$ [AMS'96, SS'02, BJKS'02, CKS'03, IW'05, BGKS'06, BO'11]
Generally, $x \in \mathbb{R}^n$ (updates: to coordinate $i$ with $\pm 1$).
Sketch = embedding into a "space" of small dimension.
  Usually linear, $L: \mathbb{R}^n \to \mathbb{R}^m$ for $m \ll n$, thus $L(x \pm e_i) = Lx \pm Le_i$.
Example frequencies: 131.107.65.14: 3, 18.0.1.12: 2, 80.97.56.20: 2
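
As a baseline for what a sketch has to approximate, the moments of the small IP example can be computed exactly. The snippet below is an illustrative sketch only: it stores every frequency, so it uses space linear in the number of distinct IPs, which is precisely what streaming algorithms avoid. The helper name `moment` is hypothetical.

```python
from collections import Counter

def moment(freq, p):
    """p-th frequency moment: sum of |x_i|^p over the nonzero frequencies."""
    return sum(abs(x) ** p for x in freq.values() if x != 0)

# Stream matching the example frequencies 3, 2, 2.
stream = ["131.107.65.14"] * 3 + ["18.0.1.12"] * 2 + ["80.97.56.20"] * 2
freq = Counter(stream)       # exact frequency vector x

print(moment(freq, 0))       # 3: number of distinct IPs
print(moment(freq, 1))       # 7: the stream length, so one counter suffices
print(moment(freq, 2))       # 9 + 4 + 4 = 17
```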

11
$\ell_p$ moments
Theorem: linear sketch for $\ell_p$ with O(1) approximation and $O(n^{1-2/p} \log n)$ space (90% success probability).
  = weak embedding of $\ell_p^n$ into $\ell_\infty^m$ of dimension $m = O(n^{1-2/p} \log n)$
Sketch:
  pick random $u_i \in [0,1]$, $r_i \in \{\pm 1\}$, and let $y_i = r_i \cdot x_i / u_i^{1/p}$
  throw the $y_i$'s into a hash table $H$ with $m = O(n^{1-2/p} \log n)$ cells
Estimator: via PSL, or just $\max_{j \in [m]} |H[j]|^p$
Randomness: O(1) independence suffices
[Figure: $x = (x_1, \dots, x_6)$ is hashed into $H$ with cells $1, \dots, m$; the cells shown hold $y_1 + y_3$, $y_4$, and $y_2 + y_5 + y_6$.]
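
The sketch described on this slide can be written as a small class that supports stream updates. This is a minimal sketch under assumptions that are not in the slides: the class name `LpMaxSketch` is hypothetical, the per-coordinate randomness is drawn fully independently and stored explicitly (so this toy version uses O(n) space, whereas the slide notes that O(1)-wise independence suffices), and `m` is passed in rather than derived from `n` and `p`.

```python
import random

class LpMaxSketch:  # hypothetical name, for illustration only
    def __init__(self, n, m, p, seed=0):
        rng = random.Random(seed)
        self.p = p
        self.table = [0.0] * m                       # hash table H with m cells
        # scale[i] = r_i / u_i^(1/p), with r_i in {-1, +1} and u_i in (0, 1].
        self.scale = [
            rng.choice((-1.0, 1.0)) / (1.0 - rng.random()) ** (1.0 / p)
            for _ in range(n)
        ]
        self.cell = [rng.randrange(m) for _ in range(n)]  # cell that coordinate i hashes to

    def update(self, i, delta):
        """Apply x_i += delta; the sketch is linear, so only one cell changes."""
        self.table[self.cell[i]] += delta * self.scale[i]

    def estimate(self):
        """The simple estimator max_j |H[j]|^p."""
        return max(abs(h) for h in self.table) ** self.p

sketch = LpMaxSketch(n=6, m=3, p=3.0)
for i, delta in [(0, 1), (3, 4), (0, 1), (5, -2)]:   # a short hypothetical update stream
    sketch.update(i, delta)
print(sketch.estimate())
```

Since each update touches a single cell additively, inserting and then deleting the same item returns the table to its previous state, which is the linearity property $L(x \pm e_i) = Lx \pm Le_i$ in action.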

12
Under the Hood: Using PSL
Idea: use PSL to compute the sum $\|x\|_p^p = \sum_i |x_i|^p$.
  Assume $\|x\|_2 = 1$ by scaling.
  Set the PSL additive error $\varepsilon$ small compared to $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$.
Outline:
  1. Pick $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/p}$.
  2. Compute every $y_i^p = x_i^p / u_i$ within additive approximation 1; done via heavy hitters of the vector $y$.
  3. Use PSL on $|y_i^p u_i| = |x_i|^p$ to compute the sum $\sum_i |x_i|^p$.
Space bound is controlled by the norm $\|y\|_2^2$, since heavy hitters under $\ell_2$ is the best we can do.
Notice $E\|y\|_2^2 = \|x\|_2^2 \cdot E[1/u^{2/p}] \le (1/\varepsilon)^{2/p} = (n^{p/2-1})^{2/p} = n^{1-2/p}$.
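
The choice of additive error rests on the inequality $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$ for $p > 2$, which follows from the power mean inequality. The snippet below is a quick numerical sanity check, not a proof; the random test vector, the seed, and the helper name `lp_p` are assumptions for illustration.

```python
import random

def lp_p(x, p):
    """||x||_p^p, i.e. the sum of |x_i|^p."""
    return sum(abs(v) ** p for v in x)

rng = random.Random(1)                       # arbitrary fixed seed
n, p = 500, 3.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical input vector
norm2 = sum(v * v for v in x) ** 0.5
x = [v / norm2 for v in x]                   # scale so that ||x||_2 = 1

lower = 1.0 / n ** (p / 2 - 1)               # ||x||_2^p / n^(p/2-1) with ||x||_2 = 1
print(lower, lp_p(x, p))                     # the second value should not be smaller

# The exponent arithmetic behind the space bound: (n^(p/2-1))^(2/p) = n^(1-2/p).
print((n ** (p / 2 - 1)) ** (2 / p), n ** (1 - 2 / p))
```

The bound is tight for a perfectly flat vector (all coordinates equal to $n^{-1/2}$), which is the worst case for the additive error.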

13
More Streaming Algorithms
Other streaming algorithms:
  Same algorithm for all $p$-moments, including $p \le 2$
    For $p > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    For $p \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  Algorithms for mixed norms ($\ell_p$ of $\ell_q$) [CM05, GBD08, JW09]
    Space bounded by (Rademacher) $p$-type constant
  Algorithm for the $\ell_p$-sampling problem [MW'10]
    This work extended to give tight bounds by [JST'10]
Connections:
  Inspired by the streaming algorithm of [IW05], but simpler
  Turns out to be a distant relative of Priority Sampling [DLT'07]

14
Finale
Other applications for the Precision Sampling framework?
Better algorithms for precision sampling?
  For average cost (for $1+\varepsilon$ approximation):
    Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
    Lower bound: $\Omega(\varepsilon^{-2} \log n)$
  Bounds for other cost models?
    E.g., for cost $1/\sqrt{\text{precision}}$, the bound is $O(\varepsilon^{-3/2})$
Other forms of "access" to the $a_i$'s?
