
1
Fast Moment Estimation in Data Streams in Optimal Space

Daniel Kane (Harvard), Jelani Nelson (MIT), Ely Porat (Bar-Ilan), David Woodruff (IBM)

2
ℓ_p-estimation: Problem Statement

Model
- x = (x_1, x_2, …, x_n) starts off as 0^n
- Stream of m updates (j_1, v_1), …, (j_m, v_m)
- Update (j, v) causes the change x_j ← x_j + v, with v ∈ {−M, −M+1, …, M}

Problem
- Output ℓ_p = ∑_{j=1}^n |x_j|^p = |x|_p^p
- Want small space and fast update time
- For simplicity: n, m, M are polynomially related
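
For concreteness, here is the streaming model in code: a minimal linear-space baseline that applies the updates and computes ℓ_p exactly. Everything in this sketch is illustrative; the point of the talk is that sublinear-space algorithms can approximate this answer.

```python
# Naive linear-space baseline for the turnstile streaming model above;
# the talk's algorithms achieve the same goal in far less space.
from collections import defaultdict

def naive_lp(updates, p):
    """Apply updates (j, v), i.e. x_j += v, then return l_p = sum_j |x_j|^p."""
    x = defaultdict(int)          # x starts as the all-zeros vector 0^n
    for j, v in updates:          # stream of m updates
        x[j] += v                 # each v lies in {-M, ..., M}
    return sum(abs(xj) ** p for xj in x.values())

# Example: x ends up as (3, -2), so l_1 = 3 + 2 = 5
print(naive_lp([(0, 1), (1, -2), (0, 2)], p=1))  # 5
```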

3
Some Bad News

- Alon, Matias, and Szegedy: no sublinear-space algorithms unless we allow both
  – Approximation (output may be (1 ± ε)·ℓ_p)
  – Randomization (allow 1% failure probability)
- New goal: output (1 ± ε)·ℓ_p with probability 99%

4
Some More Bad News

- Estimating ℓ_p for p > 2 in a stream requires n^{1−2/p} space [AMS, IW, SS]
- We focus on the feasible regime, p ∈ (0, 2)
- p = 0 and p = 2 are well understood
  – p = 0 is the number of distinct elements
  – p = 2 is the Euclidean norm

5
Applications for p ∈ [1, 2)

- The ℓ_p norm for p ∈ [1, 2) is less sensitive to outliers
  – Nearest neighbor
  – Regression
  – Subspace approximation
- Nearest neighbor: query point a ∈ R^d, database points b_1, b_2, …, b_n; want argmin_j |a − b_j|_p
- Less likely to be spoiled by noise in each coordinate (toy example below)
- Can quickly replace d-dimensional points with small sketches
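
A minimal numeric illustration of the robustness claim (my own toy example, not from the talk): under ℓ_2, a single corrupted coordinate can flip a nearest-neighbor answer that ℓ_1 gets right.

```python
# Toy example: one outlier coordinate spoils the l_2 answer but not l_1.
def lp_dist(a, b, p):
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

a  = [0.0] * 10
b1 = [0.0] * 9 + [2.0]   # the true match, with one corrupted coordinate
b2 = [0.5] * 10          # a genuinely farther point, moderately off everywhere

for p in (1, 2):
    print(p, lp_dist(a, b1, p), lp_dist(a, b2, p))
# p=1: b1 -> 2.0, b2 -> 5.0    (the corrupted true match still wins)
# p=2: b1 -> 2.0, b2 -> ~1.58  (the single outlier flips the answer)
```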

6
Applications for p ∈ (0, 1)

- Best entropy estimation in a stream [HNO]
  – Empirical entropy = ∑_j q_j log(1/q_j), where q_j = |x_j|/|x|_1
  – Estimates |x|_p for O(log(1/ε)) different p ∈ (0, 1)
  – Interpolates a polynomial through these values to estimate the entropy
  – Entropy is used for detecting DoS attacks, etc.
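
For reference, here is the empirical-entropy quantity computed directly offline (an illustrative sketch; the streaming algorithm of [HNO] never sees the q_j explicitly and instead interpolates from ℓ_p estimates):

```python
# Direct (offline) computation of empirical entropy sum_j q_j * log(1/q_j),
# with q_j = |x_j| / |x|_1, matching the formula on this slide.
import math

def empirical_entropy(x):
    l1 = sum(abs(xj) for xj in x)
    qs = [abs(xj) / l1 for xj in x if xj != 0]
    return sum(q * math.log(1.0 / q) for q in qs)

print(empirical_entropy([2, 2, 4]))  # q = (1/4, 1/4, 1/2) -> ~1.04 nats
```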

7
Previous Work for p ∈ (0, 2)

- Many players: FKSV, I, KNW, GC, NW, AOK
- Tradeoffs possible
  – Can get optimal ε^{−2} log n bits of space, but then the update time is at least 1/ε^2
  – BIG difference in practice between ε^{−2} update time and O(1) (e.g., AMS vs. TZ for p = 2)
  – No known way to get close to optimal space with less than poly(1/ε) update time

8
Our Results

For every p ∈ (0, 2)
– Estimate ℓ_p with optimal ε^{−2} log n bits of space
– log^2(1/ε) · log log(1/ε) update time
– An exponential improvement over the previous update time

For entropy
– An exponential improvement over the previous update time (polylog(1/ε) versus poly(1/ε))

9
Our Algorithm

- Split coordinates into head and tail
  – j ∈ head if |x_j|^p ≥ ε^2 |x|_p^p
  – j ∈ tail if |x_j|^p < ε^2 |x|_p^p
- Estimate |x|_p^p = |x_head|_p^p + |x_tail|_p^p separately
- Two completely different procedures (the split is sketched below)
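
A minimal offline sketch of the head/tail definitions (the streaming algorithm never sees x explicitly; the function and variable names here are mine):

```python
# Offline illustration of the head/tail split from this slide.
def head_tail_split(x, p, eps):
    lpp = sum(abs(xj) ** p for xj in x)            # |x|_p^p
    head = [j for j, xj in enumerate(x) if abs(xj) ** p >= eps**2 * lpp]
    tail = [j for j, xj in enumerate(x) if abs(xj) ** p <  eps**2 * lpp]
    return head, tail

# Each head coordinate alone carries an eps^2 fraction of |x|_p^p,
# so there can be at most 1/eps^2 of them.
print(head_tail_split([100, 1, 2, 1], p=1, eps=0.3))  # head=[0], tail=[1, 2, 3]
```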

10
Outline

- Estimating |x_head|_p^p
- Estimating |x_tail|_p^p
- Putting it all together

11
Simplifications

- We can assume we know the set of head coordinates, as well as their signs
- These can be found using known algorithms [CountSketch]
- Challenge: we still need ∑_{j ∈ head} |x_j|^p

12
Estimating |x_head|_p^p

[Figure: a hash table with log(1/ε) rows and 1/ε^2 columns; each coordinate x_j is hashed to a unique column in each row]

- We DO NOT maintain the sum of the values in each cell
- We DO NOT maintain the inner product of the values in a cell with a random sign vector
- Key idea: for each cell c, if S is the set of items hashed to c, let
  V(c) = ∑_{j ∈ S} x_j · exp(2πi·h(j)/r),
  where r is a parameter and i = sqrt(−1)
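
A small sketch of what one cell maintains, under the assumption that h(j) is uniform in {0, …, r−1} (variable names are mine, not the paper's):

```python
# One cell of the table stores a single complex number
# V(c) = sum_{j in S} x_j * exp(2*pi*i*h(j)/r).
import cmath, random

r = 16                                    # r is a parameter of the scheme
h = {}                                    # lazily sampled random h(j)

def root_of_unity(j):
    if j not in h:
        h[j] = random.randrange(r)        # h(j) uniform in {0, ..., r-1}
    return cmath.exp(2j * cmath.pi * h[j] / r)

cell = 0 + 0j                             # V(c) for one cell
def update_cell(j, v):                    # stream update (j, v) hashed to c
    global cell
    cell += v * root_of_unity(j)
```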

13
Our Algorithm

To estimate |x_head|_p^p:
– For each j in the head, find an arbitrary cell c(j) containing j and no other head coordinates
– Compute y_j = sign(x_j) · exp(−2πi·h(j)/r) · V(c(j))
  (recall V(c) = ∑_{j ∈ S} x_j · exp(2πi·h(j)/r))
– The expected value of y_j is |x_j|
– What can we say about y_j^p? What does that even mean?
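
A Monte-Carlo sanity check (illustrative, not the paper's analysis) that E[y_j] = |x_j|: the other items' roots of unity have mean zero, so their contributions cancel in expectation.

```python
# Empirically verify E[y_j] = |x_j| for one cell containing several items.
import cmath, random

def one_trial(xs, j, r=16):
    h = [random.randrange(r) for _ in xs]
    V = sum(x * cmath.exp(2j * cmath.pi * hk / r) for x, hk in zip(xs, h))
    sign = 1 if xs[j] >= 0 else -1
    return sign * cmath.exp(-2j * cmath.pi * h[j] / r) * V   # y_j

xs = [-7, 1, 2, -1, 1]                       # xs[0] plays the head coordinate
est = sum(one_trial(xs, 0) for _ in range(200000)) / 200000
print(est)                                   # close to |x_0| = 7 (+ ~0j)
```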

14
Our Algorithm

- Recall y_j = sign(x_j) · exp(−2πi·h(j)/r) · V(c(j))
- What is y_j^{1/2} if y_j = −4?
  – −4 = 4·exp(πi)
  – (−4)^{1/2} = 2·exp(πi/2) = 2i, or 2·exp(−πi/2) = −2i
- By y_j^p we mean |y_j|^p · exp(i·p·arg(y_j)), where arg(y_j) ∈ (−π, π] is the angle of y_j in the complex plane
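
In code, the slide's convention is the principal branch of the complex power (Python's cmath.phase returns exactly arg(y) ∈ (−π, π]):

```python
# Principal-branch complex power, matching the convention above.
import cmath

def principal_power(y, p):
    return abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))

print(principal_power(-4 + 0j, 0.5))   # ~2j, i.e. 2*exp(i*pi/2)
```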

15
Our Algorithm

- Wishful thinking: Estimator = ∑_{j ∈ head} y_j^p
- Intuitively, when p = 1 this is unbiased, since E[y_j] = |x_j|
- For general p this may be complex, so how about Estimator = Re[∑_{j ∈ head} y_j^p]?
- Almost correct, but we want optimal space, and we're ignoring most of the cells
- Better: y_j = mean over cells c isolating j of sign(x_j) · exp(−2πi·h(j)/r) · V(c)
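
Putting the last two slides together, a hedged sketch of the head estimator, assuming the y_j have already been averaged over their isolating cells:

```python
# Head estimate = real part of the sum of principal-branch p-th powers.
import cmath

def head_estimate(ys, p):                  # ys: one averaged y_j per head coord
    return sum(abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))
               for y in ys).real
```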

16
Analysis

- Why did we use roots of unity?
- The estimator is the real part of ∑_{j ∈ head} y_j^p
- ∑_{j ∈ head} y_j^p = ∑_{j ∈ head} |x_j|^p · (1 + z_j)^p for z_j = (y_j − |x_j|)/|x_j|
- Apply the generalized binomial theorem:
  E[|x_j|^p (1 + z_j)^p] = |x_j|^p · ∑_{k=0}^∞ {p choose k} E[z_j^k] = |x_j|^p + small,
  since E[z_j^k] = 0 for 0 < k < r
- Generalized binomial coefficient: {p choose k} = p·(p−1)⋯(p−k+1)/k! = O(1/k^{1+p})
- Intuitively, the variance is small because head coordinates don't collide
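
A quick numeric check of the generalized binomial theorem and the coefficient decay used above (a sketch with my own helper name):

```python
# Generalized binomial coefficients {p choose k} = p(p-1)...(p-k+1)/k!,
# and a check that (1+z)^p = sum_k {p choose k} z^k for small |z|.
def gen_binom(p, k):
    c = 1.0
    for i in range(k):
        c *= (p - i) / (i + 1)
    return c

p, z = 0.5, 0.1
series = sum(gen_binom(p, k) * z**k for k in range(30))
print(series, (1 + z) ** p)                          # both ~1.0488
print([abs(gen_binom(p, k)) for k in (5, 10, 20)])   # decays like 1/k^{1+p}
```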

17
Outline

- Estimating |x_head|_p^p
- Estimating |x_tail|_p^p
- Putting it all together

18
Our Algorithm x(b) Estimating |x tail | p p xjxj In each bucket b maintain an unbiased estimator of the p-th power of the p-norm |x(b)| p p in the bucket [Li] If Z 1, …, Z s are p-stable, for any vector a = (a 1, …, a s ), j=1 s Z j ¢ a j » |a| p Z, for Z also p-stable Add up estimators in all buckets not containing a head coordinate (variance is small)

19
Outline

- Estimating |x_head|_p^p
- Estimating |x_tail|_p^p
- Putting it all together

20
Complexity

Bag of tricks. Example:
- For optimal space, in the buckets of the light estimator, we prove that ε^{−p}-wise independent p-stable variables suffice
  – Rewrite Li's estimator so that [KNW] can be applied
- Naively, this needs one degree-ε^{−p} polynomial evaluation per update
- Instead: batch ε^{−p} updates together and do fast multipoint evaluation (sketched below)
  – Can be deamortized
  – Uses the fact that different buckets are pairwise independent
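
A loose sketch of the batching idea in my own framing: the k-wise independent hash is a polynomial mod a prime, and the per-point loop below is what fast multipoint evaluation would replace (the real algorithm's details differ):

```python
# Batched evaluation of a degree-d hash polynomial. Naively, each update
# costs one Horner evaluation (O(d)); batching d updates lets fast
# multipoint evaluation do them all in O(d log^2 d), i.e. polylog amortized.
PRIME = (1 << 61) - 1

def horner(coeffs, x):                     # evaluate sum_i c_i x^i mod PRIME
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc

class BatchedHash:
    def __init__(self, coeffs):
        self.coeffs, self.buf = coeffs, []

    def push(self, update):                # buffer updates until a full batch
        self.buf.append(update)
        if len(self.buf) >= len(self.coeffs):
            batch, self.buf = self.buf, []
            # Placeholder per-point loop; fast multipoint evaluation goes here.
            return [(j, v, horner(self.coeffs, j)) for j, v in batch]
        return []
```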

21
Complexity

Example #2:
- Finding the head coordinates requires ε^{−2} log^2 n space
- Reduce the universe size to poly(1/ε) by hashing (sketched below)
- Now it requires ε^{−2} log n log(1/ε) space
- Replace ε with ε · log^{1/2}(1/ε)
- The head estimator is okay, but slightly adjust the light estimator
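
A hedged sketch of the universe reduction (the specific hash family and the cube exponent are illustrative assumptions, not the paper's choices):

```python
# Map indices in [n] down to a domain of size poly(1/eps), so that
# head-finding works over short identifiers instead of log(n)-bit ones.
import random

def make_universe_reduction(eps, prime=(1 << 61) - 1):
    m = int(1.0 / eps) ** 3                 # poly(1/eps) target universe
    a, b = random.randrange(1, prime), random.randrange(prime)
    return lambda j: ((a * j + b) % prime) % m

reduce_id = make_universe_reduction(eps=0.01)
print(reduce_id(123456789))                 # identifier in [0, 10^6)
```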

22
Conclusion

For every p ∈ (0, 2)
– Estimate ℓ_p with optimal ε^{−2} log n bits of space
– log^2(1/ε) · log log(1/ε) update time
– An exponential improvement over the previous update time

For entropy
– An exponential improvement over the previous update time (polylog(1/ε) versus poly(1/ε))
