Sublinear Algorithms via Precision Sampling
Alexandr Andoni (Microsoft Research)
joint work with: Robert Krauthgamer (Weizmann Inst.), Krzysztof Onak (CMU)

Goal
- Compute the number of Dacians in the empire.
- Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$, sublinearly…

Sampling
- Send accountants to a subset $J$ of provinces, with $m = |J|$.
- Estimator: $\tilde{S} = \frac{n}{m} \sum_{j \in J} a_j$
- Chebyshev bound: with 90% success probability, $0.5 \cdot S - O(n/m) < \tilde{S} < 2 \cdot S + O(n/m)$
- For constant additive error, need $m \sim n$.
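
The plain sampling estimator above can be sketched in a few lines. This is a minimal illustration, assuming the subset $J$ is drawn uniformly without replacement and has size $m$; the function name `sampling_estimate` and the toy inputs are hypothetical, not from the talk.

```python
import random


def sampling_estimate(a, m, rng):
    """Estimate S = sum(a) from m sampled coordinates, rescaled by n/m."""
    n = len(a)
    # Subset J of m provinces, drawn uniformly without replacement (an assumption).
    J = rng.sample(range(n), m)
    return (n / m) * sum(a[j] for j in J)


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.random() for _ in range(10_000)]  # toy values a_i in [0, 1)
    print("true sum:", sum(a))
    print("estimate from m=100 samples:", sampling_estimate(a, 100, rng))
```

With $m = n$ the estimator reads every coordinate and returns the exact sum, which matches the slide's remark that constant additive error needs $m \sim n$.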

Precision Sampling Framework
- Send accountants to each province, but require only approximate counts.
- Estimate $\tilde{a}_i$ up to some pre-selected precision $u_i$: $|a_i - \tilde{a}_i| < u_i$
- Challenge: achieve a good trade-off between
  - quality of approximation to $S$
  - total cost of estimating each $\tilde{a}_i$ to precision $u_i$

Formalization
Sum Estimator vs. Adversary:
1. Adversary: fix $a_1, a_2, \dots, a_n$. Sum Estimator: fix precisions $u_i$.
2. Adversary: fix $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$ s.t. $|a_i - \tilde{a}_i| < u_i$.
3. Sum Estimator: given $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$, output $\tilde{S}$ s.t. $|\sum_i a_i - \tilde{S}| < 1$.
- What is cost?
  - Here, average cost $= \frac{1}{n} \sum_i 1/u_i$
  - To achieve precision $u_i$, use $1/u_i$ "resources": e.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples.
- For example, can choose all $u_i = 1/n$
  - Average cost $\approx n$
  - This is best possible if the estimator is $\tilde{S} = \sum_i \tilde{a}_i$

Precision Sampling Lemma
- Goal: estimate $\sum_i a_i$ from $\{\tilde{a}_i\}$ satisfying $|a_i - \tilde{a}_i| < u_i$.
- Precision Sampling Lemma: can get, with 90% success:
  - $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  - with average cost equal to $O(\log n)$
- More generally, $\varepsilon$ additive error and $(1+\varepsilon)$ multiplicative error: $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$, with average cost $O(\varepsilon^{-3} \log n)$
- Example: distinguish $\sum_i a_i = 5$ vs $\sum_i a_i = 0$
  - Consider two extreme cases:
  - if five $a_i = 1$: sample all, but need only a crude approximation ($u_i = 1/10$)
  - if all $a_i = 5/n$: only a few with good approximation $u_i = 1/n$, and the rest with $u_i = 1$
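
To make the cost model concrete, the sketch below evaluates the average cost $\frac{1}{n} \sum_i 1/u_i$ for three precision schedules: all $u_i = 1/n$, all $u_i = 1/10$, and a mixed schedule with a few fine precisions and the rest set to 1. The values `n = 1024` and `m = 4` ("a few") are illustrative assumptions; the slides do not fix them.

```python
def average_cost(u):
    """Average cost (1/n) * sum_i 1/u_i of a list of precisions u."""
    return sum(1.0 / ui for ui in u) / len(u)


n = 1024  # illustrative size (assumption)
m = 4     # how many coordinates get the fine precision 1/n (assumption)

uniform_fine = [1.0 / n] * n            # every u_i = 1/n
crude = [0.1] * n                       # every u_i = 1/10
mixed = [1.0 / n] * m + [1.0] * (n - m) # a few at 1/n, the rest at 1

if __name__ == "__main__":
    print("all u_i = 1/n  :", average_cost(uniform_fine))
    print("all u_i = 1/10 :", average_cost(crude))
    print("mixed schedule :", average_cost(mixed))
```

The uniform schedule costs about $n$, while the two schedules from the example cost a constant, which is the gap the lemma exploits without knowing in advance which case holds.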

Precision Sampling Algorithm
- Precision Sampling Lemma: can get, with 90% success:
  - $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5 \cdot S + O(1)$
  - with average cost equal to $O(\log n)$
- Algorithm:
  - Choose each $u_i \in [0,1]$ i.i.d.
  - Estimator: $\tilde{S}$ = number of $i$'s s.t. $\tilde{a}_i / u_i > 6$ (modulo a normalization constant)
- Proof of correctness:
  - we use only $\tilde{a}_i$ which are a $(1+\varepsilon)$-approximation to $a_i$
  - $E[\tilde{S}] \approx \sum_i \Pr[a_i / u_i > 6] = \sum_i a_i / 6$
  - $E[1/u] = O(\log n)$ w.h.p.
- For the $(1+\varepsilon)$ version, $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$ with average cost $O(\varepsilon^{-3} \log n)$:
  - concrete distribution for $u_i$ = minimum of $O(\varepsilon^{-3})$ uniform random variables (u.r.v.)
  - Estimator: function of $[\tilde{a}_i / u_i - 4/\varepsilon]_+$ and the $u_i$'s
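
The constant-factor version of the algorithm can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes each $u_i$ is uniform on $(0,1]$ (so that $\Pr[a_i/u_i > 6] = \Pr[u_i < a_i/6] = a_i/6$), it reads "modulo a normalization constant" as multiplying the count by 6, and the adversary in `adversarial_estimates` is a made-up stand-in that perturbs each $a_i$ by less than $u_i$.

```python
import random


def draw_precisions(n, rng):
    """Draw each u_i i.i.d. uniform on (0, 1]; excluding 0 avoids division by zero."""
    return [1.0 - rng.random() for _ in range(n)]


def adversarial_estimates(a, u, rng):
    """Hypothetical adversary: returns a_tilde_i with |a_i - a_tilde_i| < u_i."""
    return [ai + ui * rng.uniform(-0.999, 0.999) for ai, ui in zip(a, u)]


def precision_sampling_estimate(a_tilde, u, threshold=6.0):
    """Count the i's with a_tilde_i / u_i > threshold, rescaled by the threshold."""
    count = sum(1 for at, ui in zip(a_tilde, u) if at / ui > threshold)
    return threshold * count  # normalization by 6 is an assumption, see lead-in


def average_cost(u):
    """Average cost (1/n) * sum_i 1/u_i."""
    return sum(1.0 / ui for ui in u) / len(u)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    a = [rng.random() for _ in range(n)]  # toy values a_i in [0, 1)
    u = draw_precisions(n, rng)
    a_tilde = adversarial_estimates(a, u, rng)
    print("true sum    :", sum(a))
    print("estimate    :", precision_sampling_estimate(a_tilde, u))
    print("average cost:", average_cost(u))
```

Only coordinates whose precision happens to be much smaller than their value contribute to the count, so for those coordinates $\tilde{a}_i$ is automatically a good relative approximation of $a_i$.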

Why?
- Save time:
  - Problem: computing edit distance between two strings
  - new algorithm that obtains $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time
  - via efficient property-testing algorithm that uses Precision Sampling
  - More details: see the talk by Robi on Friday!
- Save space:
  - Problem: compute norms/frequency moments in streams
  - gives a simple and unified approach to compute all $\ell_p$, $F_k$ moments, and other goodies
  - More details: now

Streaming frequencies
- Setup:
  - $(1+\varepsilon)$-estimate frequencies in small space
  - Let $x_i$ = frequency of ethnicity $i$
  - $k$-th moment: $\sum_i x_i^k$
  - $k \in [0,2]$: space $O(1/\varepsilon^2)$ [AMS'96, I'00, GC07, Li08, NW10, KNW10, KNPW11]
  - $k > 2$: space $\tilde{O}(n^{1-2/k})$ [AMS'96, SS'02, BYJKS'02, CKS'03, IW'05, BGKS'06, BO10]
- Sometimes frequencies $x_i$ are negative:
  - If measuring traffic difference (delay, etc.)
  - We want linear "dim reduction" $L: \mathbb{R}^n \to \mathbb{R}^m$, $m \ll n$

Ethnicity    Frequency
Dacians      358
Galois       12
Barbarians   2988

Norm Estimation via Precision Sampling
- Idea:
  - Use PSL to compute the sum $\|x\|_k^k = \sum_i |x_i|^k$
- General approach:
  1. Pick $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/k}$
  2. Compute all $y_i^k$ up to additive approximation $O(1)$
     - Can be done by computing the heavy hitters of the vector $y$
  3. Use PSL to compute the sum $\|x\|_k^k = \sum_i |x_i|^k$
- Space bound is controlled by the norm $\|y\|_2$
  - Since heavy hitters under $\ell_2$ is the best we can do
  - Note that $\|y\|_2 \le \|x\|_2 \cdot E[1/u_i]$

Streaming $F_k$ moments
- Theorem: linear sketch for $F_k$ with $O(1)$ approximation, $O(1)$ update, and $O(n^{1-2/k} \log n)$ space (in words).
- Sketch:
  - Pick random $u_i \in [0,1]$, $s_i \in \{\pm 1\}$, and let $y_i = s_i \cdot x_i / u_i^{1/k}$
  - throw into one hash table $H$, of size $m = O(n^{1-2/k} \log n)$ cells
- Update: on $(i, a)$
  - $H[h(i)] \mathrel{+}= s_i \cdot a / u_i^{1/k}$
- Estimator:
  - $\max_{j \in [m]} |H[j]|^k$
- Randomness: $O(1)$ independence suffices

[Figure: the vector $x = (x_1, x_2, x_3, x_4, x_5, x_6)$ is hashed into the table $H = (y_1 + y_3,\ y_4,\ y_2 + y_5 + y_6)$.]
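
The update and estimator rules above can be sketched as a small class. This is an illustration under stated assumptions, not the construction from the talk: the class name `FkSketch` is hypothetical, the per-coordinate values $u_i$, $s_i$, $h(i)$ are derived from a seeded `random.Random` (effectively fully random, rather than the $O(1)$-wise independent families the slide says suffice), and the estimator is returned without any normalization constant.

```python
import random


class FkSketch:
    """One-table linear sketch following the slide's update rule (illustrative)."""

    def __init__(self, k, m, seed=0):
        self.k = k
        self.m = m              # number of cells in the hash table H
        self.seed = seed
        self.H = [0.0] * m

    def _coordinate_randomness(self, i):
        # Re-derive (u_i, s_i, h(i)) from the seed so nothing is stored per coordinate.
        rng = random.Random(f"{self.seed}:{i}")
        u = 1.0 - rng.random()          # u_i in (0, 1]
        s = rng.choice((-1.0, 1.0))     # s_i in {+1, -1}
        cell = rng.randrange(self.m)    # h(i) in {0, ..., m-1}
        return u, s, cell

    def update(self, i, a):
        """Process stream item (i, a): H[h(i)] += s_i * a / u_i^(1/k)."""
        u, s, cell = self._coordinate_randomness(i)
        self.H[cell] += s * a / u ** (1.0 / self.k)

    def estimate(self):
        """Return max_j |H[j]|^k (no normalization constant applied)."""
        return max(abs(v) for v in self.H) ** self.k


if __name__ == "__main__":
    sketch = FkSketch(k=3, m=64, seed=42)
    for i, a in [(0, 358.0), (1, 12.0), (2, 2988.0)]:  # frequencies from the table
        sketch.update(i, a)
    print("raw estimate:", sketch.estimate())
```

Because every update is an addition into one cell, the sketch is linear: inserting $(i, a)$ and then $(i, -a)$ leaves the table unchanged, which is what allows negative frequencies.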

More Streaming Algorithms
- Other streaming algorithms:
  - Algorithm for all $k$-moments, including $k \le 2$
    - For $k > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    - For $k \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  - Improved algorithm for mixed norms ($\ell_p$ of $\ell_k$) [CM05, GBD08, JW09]
    - space bounded by (Rademacher) $p$-type constant
  - Algorithm for $\ell_p$-sampling problem [MW'10]
    - This work extended to give tight bounds by [JST'11]
- Connections:
  - Inspired by the streaming algorithm of [IW05], but simpler
  - Turns out to be a distant relative of Priority Sampling [DLT'07]

Finale
- Other applications for the Precision Sampling framework?
- Better algorithms for precision sampling?
  - Best bound for average cost (for $1+\varepsilon$ approximation)
    - Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
    - Lower bound: $\Omega(\varepsilon^{-2} \log n)$
  - Bounds for other cost models?
    - E.g., for 1/square root of precision, the bound is $O(\varepsilon^{-3/2})$
- Other forms of "access" to $a_i$'s?
