# Fast Moment Estimation in Data Streams in Optimal Space

Daniel Kane (Harvard), Jelani Nelson (MIT), Ely Porat (Bar-Ilan), David Woodruff (IBM)


## $l_p$-estimation: Problem Statement

- Model
  - $x = (x_1, x_2, \ldots, x_n)$ starts off as $0^n$
  - Stream of $m$ updates $(j_1, v_1), \ldots, (j_m, v_m)$
  - Update $(j, v)$ causes the change $x_j \leftarrow x_j + v$
  - $v \in \{-M, -M+1, \ldots, M\}$
- Problem
  - Output $l_p = \sum_{j=1}^n |x_j|^p = |x|_p^p$
  - Want small space and fast update time
- For simplicity: $n$, $m$, $M$ are polynomially related
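As a point of reference, the update model can be sketched with an exact, linear-space baseline. This is only an illustration of the problem statement, not a streaming algorithm; the function name `exact_moment` and the sample stream are made up for the example.

```python
from collections import defaultdict

def exact_moment(updates, p):
    """Apply updates (j, v) to x = 0^n and return sum_j |x_j|^p exactly."""
    x = defaultdict(int)              # x starts off as the all-zeros vector
    for j, v in updates:
        x[j] += v                     # update (j, v): x_j <- x_j + v
    return sum(abs(val) ** p for val in x.values() if val != 0)

# Hypothetical stream; the final vector has x_1 = 2, x_2 = 0, x_3 = 5.
stream = [(1, 3), (2, -4), (1, -1), (3, 5), (2, 4)]
moment_1 = exact_moment(stream, 1)        # |2| + |5|
moment_half = exact_moment(stream, 0.5)   # sqrt(2) + sqrt(5)
```

This baseline stores one counter per touched coordinate, which is exactly the space cost the rest of the talk tries to avoid.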

## Some Bad News

- Alon, Matias, and Szegedy: no sublinear-space algorithms unless we allow
  - Approximation (allow output to be $(1 \pm \varepsilon)\, l_p$)
  - Randomization (allow 1% failure probability)
- New goal
  - Output $(1 \pm \varepsilon)\, l_p$ with probability 99%

## Some More Bad News

- Estimating $l_p$ for $p > 2$ in a stream requires $n^{1-2/p}$ space [AMS, IW, SS]
- We focus on the feasible regime, $p \in (0,2)$
- $p = 0$ and $p = 2$ are well understood
  - $p = 0$ is the number of distinct elements
  - $p = 2$ is the Euclidean norm

## Applications for $p \in [1,2)$

- The $l_p$-norm for $p \in [1,2)$ is less sensitive to outliers
  - Nearest neighbor
  - Regression
  - Subspace approximation
- Nearest neighbor example
  - Query point $a \in \mathbb{R}^d$
  - Database points $b_1, b_2, \ldots, b_n$
  - Want $\arg\min_j |a - b_j|_p$
  - Less likely to be spoiled by noise in each coordinate
- Can quickly replace $d$-dimensional points with small sketches
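The outlier point can be illustrated with a tiny exact nearest-neighbor search. The helper `nearest_neighbor_lp` and the two database points are hypothetical, chosen so that a single noisy coordinate changes the answer under $p = 2$ but not under $p = 1$; no sketching is involved.

```python
def nearest_neighbor_lp(a, points, p):
    """Return the index j minimizing |a - b_j|_p (compares p-th powers; same argmin)."""
    def dist_p(b):
        return sum(abs(ai - bi) ** p for ai, bi in zip(a, b))
    return min(range(len(points)), key=lambda j: dist_p(points[j]))

query = [0.0, 0.0, 0.0, 0.0]
database = [
    [1.5, 1.5, 1.5, 1.5],   # moderately far from the query in every coordinate
    [4.0, 0.0, 0.0, 0.0],   # equal to the query except for one noisy coordinate
]
nn_p1 = nearest_neighbor_lp(query, database, 1)   # distances 6 vs 4: picks index 1
nn_p2 = nearest_neighbor_lp(query, database, 2)   # squared distances 9 vs 16: picks index 0
```

Under $p = 1$ the point that matches the query everywhere except one coordinate still wins; under $p = 2$ that single coordinate dominates the distance.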

## Applications for $p \in (0,1)$

- Best entropy estimation in a stream [HNO]
  - Empirical entropy $= \sum_j q_j \log(1/q_j)$, where $q_j = |x_j|/|x|_1$
  - Estimates $|x|_p$ for $O(\log 1/\varepsilon)$ different $p \in (0,1)$
  - Interpolates a polynomial through these values to estimate entropy
  - Entropy used for detecting DoS attacks, etc.
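To see why moments carry entropy information, the sketch below computes the empirical entropy exactly and compares it with the quantity $(1 - \sum_j q_j^p)/(p-1)$ (the Tsallis entropy), which depends on $x$ only through $|x|_p^p$ and $|x|_1$ and tends to the entropy as $p \to 1$. This is an illustration of the connection only; it is not the interpolation procedure of [HNO], and both function names are assumptions made for the example.

```python
import math

def empirical_entropy(x):
    """Exact sum_j q_j log(1/q_j) with q_j = |x_j| / |x|_1 (natural log)."""
    l1 = sum(abs(v) for v in x)
    return sum((abs(v) / l1) * math.log(l1 / abs(v)) for v in x if v != 0)

def tsallis_entropy(x, p):
    """(1 - sum_j q_j^p) / (p - 1), computed from |x|_p^p and |x|_1 only."""
    l1 = sum(abs(v) for v in x)
    moment_p = sum(abs(v) ** p for v in x)       # |x|_p^p
    return (1.0 - moment_p / l1 ** p) / (p - 1.0)

x = [5, 5, 5, 5]
h_exact = empirical_entropy(x)          # log 4 for the uniform distribution on 4 items
h_approx = tsallis_entropy(x, 0.999)    # close to h_exact because p is near 1
```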

## Previous Work for $p \in (0,2)$

- Lot of players
  - FKSV, I, KNW, GC, NW, AOK
- Tradeoffs possible
  - Can get optimal $\varepsilon^{-2} \log n$ bits of space, but then the update time is at least $1/\varepsilon^2$
  - BIG difference in practice between $\varepsilon^{-2}$ update time and $O(1)$ (e.g., AMS vs. TZ for $p = 2$)
  - No way to get close to optimal space with less than $\mathrm{poly}(1/\varepsilon)$ update time

## Our Results

- For every $p \in (0,2)$
  - estimate $l_p$ with optimal $\varepsilon^{-2} \log n$ bits of space
  - $\log^2(1/\varepsilon) \log\log(1/\varepsilon)$ update time
  - exponential improvement over previous update time
- For entropy
  - Exponential improvement over previous update time ($\mathrm{polylog}(1/\varepsilon)$ versus $\mathrm{poly}(1/\varepsilon)$)

## Our Algorithm

- Split coordinates into head and tail
  - $j \in$ head if $|x_j|^p \ge \varepsilon^2 |x|_p^p$
  - $j \in$ tail if $|x_j|^p < \varepsilon^2 |x|_p^p$
- Estimate the two parts of $|x|_p^p = |x_{\text{head}}|_p^p + |x_{\text{tail}}|_p^p$ separately
- Two completely different procedures
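The head/tail definition can be checked offline on a small vector. This is a sketch of the definition only (it reads the whole vector, which a streaming algorithm cannot do); `split_head_tail` and the sample values are assumptions made for the example.

```python
def split_head_tail(x, p, eps):
    """Offline split: j is in the head iff |x_j|^p >= eps^2 * |x|_p^p."""
    total = sum(abs(v) ** p for v in x)          # |x|_p^p
    threshold = eps ** 2 * total
    head = [j for j, v in enumerate(x) if abs(v) ** p >= threshold]
    tail = [j for j, v in enumerate(x) if abs(v) ** p < threshold]
    return head, tail, total

x = [100, -80, 1, 1, -1, 2, 0, 1]
p, eps = 1.0, 0.5
head, tail, total = split_head_tail(x, p, eps)
head_mass = sum(abs(x[j]) ** p for j in head)    # |x_head|_p^p
tail_mass = sum(abs(x[j]) ** p for j in tail)    # |x_tail|_p^p
```

Because every head coordinate accounts for at least an $\varepsilon^2$ fraction of $|x|_p^p$, there can be at most $1/\varepsilon^2$ of them.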

## Outline

- Estimating $|x_{\text{head}}|_p^p$
- Estimating $|x_{\text{tail}}|_p^p$
- Putting it all together

## Simplifications

- We can assume we know the set of head coordinates, as well as their signs
  - Can be found using known algorithms [CountSketch]
- Challenge
  - Need $\sum_{j \in \text{head}} |x_j|^p$

## Estimating $|x_{\text{head}}|_p^p$

- Table with $\log(1/\varepsilon)$ rows and $1/\varepsilon^2$ columns
- Hash each coordinate $x_j$ to a unique column in each row
- We DO NOT maintain the sum of values in each cell
- We DO NOT maintain the inner product of values in a cell with a random sign vector
- Key idea: for each cell $c$, if $S$ is the set of items hashed to $c$, let
  $V(c) = \sum_{j \in S} x_j \cdot \exp(2\pi i\, h(j)/r)$
  - $r$ is a parameter, $i = \sqrt{-1}$
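A minimal sketch of the table of complex counters follows. Several choices here are assumptions for illustration: the class name `RootsOfUnitySketch`, the use of fully random hashes stored in dictionaries (which is not space-efficient), and drawing a separate $h(j)$ per row.

```python
import cmath
import random

class RootsOfUnitySketch:
    """Complex counters V(c) = sum over items j in cell c of x_j * exp(2*pi*i*h(j)/r)."""

    def __init__(self, rows, cols, r, seed=0):
        self.rows, self.cols, self.r = rows, cols, r
        self.rng = random.Random(seed)
        self.table = [[0j] * cols for _ in range(rows)]
        self.col_hash = [dict() for _ in range(rows)]     # column of j in each row
        self.phase_hash = [dict() for _ in range(rows)]   # h(j) in {0, ..., r-1}

    def _hashes(self, row, j):
        # Assumption: hashes are drawn lazily and remembered; a real
        # implementation would use a limited-independence hash family.
        if j not in self.col_hash[row]:
            self.col_hash[row][j] = self.rng.randrange(self.cols)
            self.phase_hash[row][j] = self.rng.randrange(self.r)
        return self.col_hash[row][j], self.phase_hash[row][j]

    def update(self, j, v):
        """Process stream update (j, v) in every row."""
        for row in range(self.rows):
            col, h = self._hashes(row, j)
            self.table[row][col] += v * cmath.exp(2j * cmath.pi * h / self.r)

sketch = RootsOfUnitySketch(rows=3, cols=16, r=8)
sketch.update(7, 5)
sketch.update(7, -5)   # cancels the previous update, since V(c) is linear in x
```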

## Our Algorithm

- To estimate $|x_{\text{head}}|_p^p$
  - For each $j$ in the head, find an arbitrary cell $c = c(j)$ containing $j$ and no other head coordinates
  - Compute $y_j = \mathrm{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$
    - Recall $V(c) = \sum_{j \in S} x_j \cdot \exp(2\pi i\, h(j)/r)$
  - The expected value of $y_j$ is $|x_j|$
  - What can we say about $y_j^p$?
  - What does it mean?

## Our Algorithm

- Recall $y_j = \mathrm{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$
- What is $y_j^{1/2}$ if $y_j = -4$?
  - $-4 = 4\exp(\pi i)$
  - $(-4)^{1/2} = 2\exp(\pi i/2) = 2i$ or $2\exp(-\pi i/2) = -2i$
- By $y_j^p$ we mean $|y_j|^p \exp(i\, p \arg(y_j))$, where $\arg(y_j) \in (-\pi, \pi]$ is the angle of $y_j$ in the complex plane
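The branch convention on this slide can be written directly with Python's `cmath`. The function name `principal_power` is an assumption made for the example.

```python
import cmath

def principal_power(y, p):
    """Return |y|^p * exp(i * p * arg(y)), using the angle reported by cmath.phase."""
    y = complex(y)
    if y == 0:
        return 0j
    # cmath.phase returns an angle in [-pi, pi]; -pi occurs only when the
    # imaginary part is negative zero, so ordinary inputs land in (-pi, pi].
    return abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))

root = principal_power(-4, 0.5)   # arg(-4) = pi, so this is 2 * exp(i*pi/2)
```

With this convention the value $2i$ is selected for $(-4)^{1/2}$, not $-2i$.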

## Our Algorithm

- Wishful thinking: Estimator $= \sum_{j \in \text{head}} y_j^p$
- Intuitively, when $p = 1$, since $E[y_j] = |x_j|$, we have an unbiased estimator
- For general $p$, this may be complex, so how about Estimator $= \mathrm{Re}\left[\sum_{j \in \text{head}} y_j^p\right]$?
- Almost correct, but we want optimal space, and we're ignoring most of the cells
- Better: $y_j = \mathrm{Mean}_{\text{cells } c \text{ isolating } j}\ \mathrm{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$
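The averaged estimator can be simulated offline as follows. This is a sketch under stated assumptions, not the authors' implementation: the head set and signs are taken as known (as in the simplification above), hashes are fully random and drawn per row, and a head coordinate with no isolating cell is simply skipped. The name `head_estimate` is hypothetical.

```python
import cmath
import random

def head_estimate(x, head, p, rows, cols, r, seed=0):
    """Re[ sum_{j in head} y_j^p ], with y_j averaged over cells that isolate j."""
    rng = random.Random(seed)
    n = len(x)
    # Assumption: col[t][j] is j's column in row t, h[t][j] its phase index.
    col = [[rng.randrange(cols) for _ in range(n)] for _ in range(rows)]
    h = [[rng.randrange(r) for _ in range(n)] for _ in range(rows)]
    omega = lambda k: cmath.exp(2j * cmath.pi * k / r)

    # Build the table: V(c) = sum over items j in cell c of x_j * omega(h(j)).
    V = [[0j] * cols for _ in range(rows)]
    for t in range(rows):
        for j in range(n):
            V[t][col[t][j]] += x[j] * omega(h[t][j])

    total = 0j
    for j in head:
        sign = 1 if x[j] > 0 else -1
        # Rows in which no other head coordinate shares j's cell.
        isolating = [t for t in range(rows)
                     if all(col[t][k] != col[t][j] for k in head if k != j)]
        if not isolating:
            continue   # simplification made for this sketch only
        y = sum(sign * omega(-h[t][j]) * V[t][col[t][j]] for t in isolating) / len(isolating)
        if y != 0:
            # Principal-branch power |y|^p * exp(i * p * arg(y)).
            total += abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))
    return total.real

x = [0] * 50
x[3], x[17], x[40] = 10, -20, 5          # a vector with head coordinates only
estimate = head_estimate(x, [3, 17, 40], p=1.5, rows=5, cols=400, r=8)
exact = 10 ** 1.5 + 20 ** 1.5 + 5 ** 1.5
```

With no tail mass, each isolating cell returns $|x_j|$ up to rounding, so `estimate` matches `exact`; adding small tail coordinates to `x` perturbs each $y_j$ around $|x_j|$.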

## Analysis

- Why did we use roots of unity?
- The estimator is the real part of $\sum_{j \in \text{head}} y_j^p$
- $\sum_{j \in \text{head}} y_j^p = \sum_{j \in \text{head}} |x_j|^p (1+z_j)^p$ for $z_j = (y_j - |x_j|)/|x_j|$
- Can apply the generalized binomial theorem:
  $E[|x_j|^p (1+z_j)^p] = |x_j|^p \sum_{k=0}^{\infty} \binom{p}{k} E[z_j^k] = |x_j|^p + \text{small}$,
  since $E[z_j^k] = 0$ if $0 < k < r$
- Generalized binomial coefficient: $\binom{p}{k} = p(p-1)\cdots(p-k+1)/k! = O(1/k^{1+p})$
- Intuitively, the variance is small because head coordinates don't collide
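Two ingredients of this argument can be checked numerically: the generalized binomial coefficients and the cancellation of low-order moments of a random $r$-th root of unity, which is the basic fact behind the vanishing moments claimed on the slide. The function names are assumptions made for the example, and the check covers a single random root of unity only, not the full variable $z_j$.

```python
import cmath

def gen_binom(p, k):
    """Generalized binomial coefficient p(p-1)...(p-k+1) / k!."""
    out = 1.0
    for i in range(k):
        out *= (p - i) / (i + 1)
    return out

def root_of_unity_moment(r, k):
    """E[omega^(k*h)] for h uniform in {0, ..., r-1} and omega = exp(2*pi*i/r)."""
    return sum(cmath.exp(2j * cmath.pi * k * h / r) for h in range(r)) / r

p, r = 0.5, 8
coeffs = [gen_binom(p, k) for k in range(r + 1)]               # coeffs[2] = -1/8
moments = [root_of_unity_moment(r, k) for k in range(r + 1)]   # zero for 0 < k < r
```

The moments are 1 at $k = 0$ and $k = r$ and vanish in between, so only terms of order $r$ and higher survive in the expansion, and those are damped by the decaying coefficients.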

## Outline

- Estimating $|x_{\text{head}}|_p^p$
- Estimating $|x_{\text{tail}}|_p^p$
- Putting it all together

## Our Algorithm: Estimating $|x_{\text{tail}}|_p^p$

(Figure: coordinates $x_j$ hashed into buckets; $x(b)$ is the portion of $x$ in bucket $b$.)

- In each bucket $b$, maintain an unbiased estimator of the $p$-th power of the $p$-norm, $|x(b)|_p^p$, in the bucket [Li]
- If $Z_1, \ldots, Z_s$ are $p$-stable, then for any vector $a = (a_1, \ldots, a_s)$, $\sum_{j=1}^s Z_j \cdot a_j \sim |a|_p Z$, for $Z$ also $p$-stable
- Add up estimators in all buckets not containing a head coordinate (variance is small)
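The stability property can be observed empirically. The sketch below samples symmetric $p$-stable variables with the Chambers-Mallows-Stuck formula and, for $p = 1$ (Cauchy), recovers $|a|_1$ as the median of $|\sum_j Z_j a_j|$. The median is used here only because it is easy to verify; it is not Li's estimator from the slide, and the function names are assumptions.

```python
import math
import random

def sample_p_stable(p, rng):
    """Symmetric p-stable sample via the Chambers-Mallows-Stuck formula, 0 < p <= 2."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0) or 1e-300           # guard against an exact zero
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

def median_sketch_norm(a, p, trials, seed=0):
    """Median of |sum_j Z_j a_j| over independent trials with p-stable Z_j."""
    rng = random.Random(seed)
    vals = sorted(abs(sum(sample_p_stable(p, rng) * aj for aj in a))
                  for _ in range(trials))
    return vals[trials // 2]

a = [1.0, -2.0, 3.0]
# For p = 1 the median of |Z| is 1, so the result should be near |a|_1 = 6.
est = median_sketch_norm(a, p=1.0, trials=20001)
```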

## Outline

- Estimating $|x_{\text{head}}|_p^p$
- Estimating $|x_{\text{tail}}|_p^p$
- Putting it all together

## Complexity

- Bag of tricks
- Example
  - For optimal space, in buckets in the light (tail) estimator, we prove that $1/\varepsilon^p$-wise independent $p$-stable variables suffice
    - Rewrite Li's estimator so that [KNW] can be applied
  - Need to evaluate a degree-$1/\varepsilon^p$ polynomial per update
  - Instead: batch $1/\varepsilon^p$ updates together and do fast multipoint evaluation
    - Can be deamortized
    - Use that different buckets are pairwise independent
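The per-update cost can be seen in the standard construction of a $k$-wise independent hash family as a random polynomial of degree $k-1$ over a prime field. This is assumed here to be the kind of polynomial the slide refers to. The field size, the names, and the naive batch loop are illustrative; the fast multipoint evaluation itself is not implemented.

```python
import random

PRIME = (1 << 61) - 1   # a Mersenne prime; the field size is an illustrative choice

def make_kwise_hash(k, seed=0):
    """Random polynomial of degree k-1 over GF(PRIME): a k-wise independent family."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(PRIME) for _ in range(k)]   # coeffs[i] multiplies x^i

    def h(x):
        acc = 0
        for c in reversed(coeffs):       # Horner's rule: k multiplications per call
            acc = (acc * x + c) % PRIME
        return acc

    return h, coeffs

def evaluate_batch(h, points):
    """Naive batch evaluation; fast multipoint evaluation would replace this loop."""
    return [h(x) for x in points]

h, coeffs = make_kwise_hash(4)
values = evaluate_batch(h, [0, 1, 2])
```

Evaluating one point costs time linear in the degree, which is why evaluating a batch of about as many points as the degree at once, with a faster multipoint algorithm, lowers the amortized cost per update.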

## Complexity

- Example #2
  - Finding head coordinates requires $\varepsilon^{-2} \log^2 n$ space
  - Reduce the universe size to $\mathrm{poly}(1/\varepsilon)$ by hashing
  - Now requires $\varepsilon^{-2} \log n \log(1/\varepsilon)$ space
  - Replace $\varepsilon$ with $\varepsilon \log^{1/2}(1/\varepsilon)$
  - Head estimator okay, but slightly adjust light estimator
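One way to read the substitution, as plain arithmetic on the stated bound and assuming $\varepsilon$ is small enough that $\log(1/\varepsilon) \ge 1$ (so that $\varepsilon' \ge \varepsilon$):

$$
\varepsilon' = \varepsilon \log^{1/2}(1/\varepsilon)
\;\Longrightarrow\;
\varepsilon'^{-2} \log n \, \log(1/\varepsilon')
\;\le\; \frac{\varepsilon^{-2}}{\log(1/\varepsilon)} \, \log n \, \log(1/\varepsilon)
\;=\; \varepsilon^{-2} \log n .
$$

Under this reading, running the head-finding step with the slightly larger parameter $\varepsilon'$ removes the extra $\log(1/\varepsilon)$ factor, which would explain why the light estimator then needs a small adjustment.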

## Conclusion

- For every $p \in (0,2)$
  - estimate $l_p$ with optimal $\varepsilon^{-2} \log n$ bits of space
  - $\log^2(1/\varepsilon) \log\log(1/\varepsilon)$ update time
  - exponential improvement over previous update time
- For entropy
  - Exponential improvement over previous update time ($\mathrm{polylog}(1/\varepsilon)$ versus $\mathrm{poly}(1/\varepsilon)$)
