Download presentation

Presentation is loading. Please wait.

Published byChana Pitman Modified over 3 years ago

1
Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak (CMU)

2
Goal Compute the fraction of Dacians in the empire Estimate S=a 1 +a 2 +…a n where a i [0,1]

3
Sampling Send accountants to a subset J of provinces, |J|=m Estimator: S ̃ =∑ j J a j * n / m Chebyshev bound: with 90% success probability 0.5*S – O(n/m) < S ̃ < 2*S + O(n/m) For constant additive error, need m~ n

4
Send accountants to each province, but require only approximate counts Estimate a i up to pre-selected precision u i i.e. |a i – a ̃ i |<u i Challenge: achieve good tradeoff between quality of approximation to S total cost of computing each a ̃ i (within precision u i ) Precision Sampling Framework

5
Formalization Estimator (Alg)Adversary 1. fix (hidden) a 1,a 2,…a n 1. fix precisions u i 2. fix ã 1,ã 2,…ã n s.t. |a i –ã i |<u i 3. report S̃ s.t. |∑ i a i –S̃| < 1 What is our cost model? Here, average cost = 1/n * ∑ i 1/u i Achieving precision u i requires 1/u i “resources”: e.g., if a i is itself a sum a i =∑ j a ij computed by subsampling, then one needs Θ( 1/u i ) samples For example, can choose all u i =1/n Average cost ≈ n This is best possible, if estimator S ̃ = ∑ i a ̃ i

6
Precision Sampling Lemma Goal: estimate ∑a i from {a ̃ i } satisfying |a i -a ̃ i |<u i. Precision Sampling Lemma: can get, with 90% success: O(1) additive error and 1.5 multiplicative error: S – O(1) < S ̃ < 1.5*S + O(1) with average cost O(log n) Example: distinguish Σ a i =3 vs Σ a i =1 Consider two extreme cases: if three a i =1: estimate all a i with crude approx (u i =0.1) if all a i =3/n: estimate few with good approx u i =1/n, the rest with u i =1 ε1+ε S – ε < S̃ < (1+ ε)S + ε O(ε -3 log n)

7
Precision Sampling Algorithm Precision Sampling Lemma: can get, with 90% success: O(1) additive error and 1.5 multiplicative error: S – O(1) < S ̃ < 1.5*S + O(1) with average cost equal to O(log n) Algorithm: Choose each u i [0,1] i.i.d. Estimator: S ̃ = count number of i‘s s.t. a ̃ i /u i > 6 (and normalize) Outline of analysis: E[ S ̃ ] = ∑ i Pr[a ̃ i /u i > 6] = ∑ i Pr[a i > (6±1)u i ] ≈ ∑ a i /6. Actually, a ̃ i may have also 1.5-multiplicative error w.r.t. a i E[1/u i ] = O(log n) w.h.p. (after truncation) function of [ã i /u i - 4/ε] + and u i ’s concrete distrib. = minimum of O(ε -3 ) uniform r.v. O(ε -3 log n) ε1+ε S – ε < S̃ < (1+ ε)S + ε

8
Why? Save time: Problem: computing edit distance between two strings [FOCS’10] new algorithm that obtains (log n) 1/ ε approximation in n 1+O( ε ) time via property-testing-like algorithm using Precision Sampling (recursively) Save space: Problem: compute norms/moments of frequencies in a data- stream [FOCS’11] a simple and unified approach to compute all l p -norms/moments, and related problems

9
Streaming/sketching IPFrequency 131.107.65.143 18.0.1.122 80.97.56.202 131.107.65.14 18.0.1.12 80.97.56.20 IPFrequency 131.107.65.143 18.0.1.122 80.97.56.202 127.0.0.19 192.168.0.18 257.2.5.70 16.09.20.111 Challenge: log statistics of the data, using small space

10
Streaming moments Setup: 1+ ε estimate frequencies in small space Let x i = frequency of IP i p th moment: Σ i x i p p=1: keep one counter! p [0,2]: space O( ε -2 ¢ log n) [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11] p>2: space O ̃ ε (n 1-2/p ) [AMS’96, SS’02, BJKS’02, CKS’03, IW’05, BGKS’06, BO’11] Generally, x R n (updates: to coordinate i with ±1) Sketch = embedding into a “space” of small dimension Usually, linear L:R n R m for m ¿ n, thus L(x±e i )=Lx±Le i IPFrequency 131.107.65.143 18.0.1.122 80.97.56.202

11
l p moments Theorem: linear sketch for l p with O(1) approximation, and O(n 1-2/p log n) space (90% succ. prob.). =weak embedding of l p n into l ∞ m of dim m=O(n 1-2/p log n) Sketch: pick random u i [0,1], r i {±1} and let y i = r i ∙x i /u i 1/p throw y i ‘s into hash table H with m=O(n 1-2/p log n) cells Estimator: via PSL or just Max j [m] |H[j]| p Randomness: O(1) independence suffices x1x1 x2x2 x3x3 x4x4 x5x5 x6x6 y1+y3y1+y3 y4y4 y2+y5+y6y2+y5+y6 x= H= 1 … m

12
Under the Hood: Using PSL Idea: Use PSL to compute the sum ||x|| p p =∑ i |x i | p Assume ||x|| 2 =1 by scaling Set PSL additive error ε small compared to ||x|| 2 p /n p/2-1 · ||x|| p p Outline: 1. Pick u i ’s according to PSL and let y i =x i /u i 1/p 2. Compute every y i p =x i p /u i within additive approximation 1 done via heavy hitters of the vector y 3. Use PSL on |y i p u i |=|x i | p to compute the sum ∑ i |x i | p Space bound is controlled by the norm ||y|| 2 2. Since heavy hitters under l 2 is the best we can do Notice E||y|| 2 2 = ||x|| 2 2 ¢ E[1/u 2/p ] · (1/ ε ) 2/p =(n p/2-1 ) 2/p.

13
More Streaming Algorithms Other streaming algorithms: Same algorithm for all p-moments, including p≤2 For p>2, improves existing space bounds [AMS96, IW05, BGKS06, BO10] For p≤2, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11] Algorithms for mixed norms ( l p of l q ) [CM05, GBD08, JW09] Space bounded by (Rademacher) p-type constant Algorithm for l p -sampling problem [MW’10] This work extended to give tight bounds by [JST’10] Connections: Inspired by the streaming algorithm of [IW05], but simpler Turns out to be distant relative of Priority Sampling [DLT’07]

14
Finale Other applications for Precision Sampling framework? Better algorithms for precision sampling? For average cost (for 1+ ε approximation) Upper bound: O( ε -3 log n) (tight for our algorithm) Lower bound: Ω ( ε -2 log n) Bounds for other cost models? E.g., for 1/square root of precision, the bound is O( ε -3/2 ) Other forms of “access” to a i ’s?

Similar presentations

OK

Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.

Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.

© 2018 SlidePlayer.com Inc.

All rights reserved.

By using this website, you agree with our use of **cookies** to functioning of the site. More info in our Privacy Policy and Google Privacy & Terms.

Ads by Google

Ppt on south african culture pictures Ppt on railway track maintenance Ppt on new technology 2013 Ppt on various types of web browsers and their comparative features Ppt on drama julius caesar by william shakespeare Ppt on flora and fauna of kerala Ppt on event driven programming concept Ppt on diode as rectifier bridge Ppt on different occupations workers Ppt on eisenmenger syndrome image