Published by Colin Bolton. Modified over 5 years ago.

1
The Data Stream Space Complexity of Cascaded Norms
T.S. Jayram, David Woodruff
IBM Almaden

2
Data Streams
- Algorithms access data in a sequential fashion
- One pass / small space
- Need to be randomized and approximate [FM, MP, AMS]
[Figure: an algorithm with a small main memory reading the stream 2 3 4 16 0 100 5 4 501 200 401 2 3 6 0]

3
Frequency Moments and Norms
- The stream defines updates to a set of items $1, 2, \ldots, d$; $f_i$ is the weight of item $i$
- Positive-only vs. turnstile model
- $k$-th frequency moment: $F_k = \sum_i |f_i|^k$
- $p$-th norm: $L_p = \|f\|_p = \left(\sum_i |f_i|^p\right)^{1/p}$
- Maximum frequency: $p = \infty$
- Distinct elements: $p = 0$
- Heavy hitters
- Assume the length of the stream and the magnitude of the updates are $\le \mathrm{poly}(d)$
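As a reference point for what the small-space algorithms approximate, here is a minimal exact computation of the quantities on this slide from a turnstile stream. The `(item, delta)` update format and the function names are our own illustrative assumptions; this sketch stores every $f_i$, so it uses $\Theta(d)$ space rather than small space.

```python
from collections import defaultdict
import math

def frequencies(stream):
    """Turnstile model: each update (i, delta) adds delta to f_i; delta may be negative."""
    f = defaultdict(int)
    for i, delta in stream:
        f[i] += delta
    return dict(f)

def F(f, k):
    """k-th frequency moment F_k = sum_i |f_i|^k; k = 0 counts the nonzero f_i."""
    if k == 0:
        return sum(1 for v in f.values() if v != 0)
    return sum(abs(v) ** k for v in f.values())

def L(f, p):
    """p-th norm (sum_i |f_i|^p)^(1/p), with the two special cases from the slide."""
    if p == math.inf:
        return max((abs(v) for v in f.values()), default=0)  # maximum frequency
    if p == 0:
        return F(f, 0)  # distinct elements
    return F(f, p) ** (1.0 / p)

# Item 1 gets +3 then -1, item 2 gets +5, item 3 is inserted and then fully deleted.
f = frequencies([(1, 3), (2, 5), (1, -1), (3, 2), (3, -2)])
print(f, F(f, 2), L(f, 0), L(f, math.inf))
```

Item 3 ends with weight 0, so it does not count as a distinct element even though it appeared in the stream; this is the difference between the positive-only and turnstile models.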

4
Classical Results
- Approximating $L_p$ and $F_p$ is the same problem
- For $0 \le p \le 2$, $F_p$ is approximable in $\tilde O(1)$ space (AMS, FM, Indyk, …)
- For $p > 2$, $F_p$ is approximable in $\tilde O(d^{1-2/p})$ space (IW); this is best possible (BJKS, CKS)
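A short calculation behind the first bullet, for $p > 0$, where $L_p = F_p^{1/p}$. The names $\tilde F$ (an estimate of $F_p$) and $\tilde L$ (an estimate of $L_p$) are ours:

```latex
(1-\varepsilon)F_p \le \tilde F \le (1+\varepsilon)F_p
\;\Longrightarrow\;
(1-\varepsilon)^{1/p} L_p \le \tilde F^{1/p} \le (1+\varepsilon)^{1/p} L_p,
\qquad
(1-\varepsilon)L_p \le \tilde L \le (1+\varepsilon)L_p
\;\Longrightarrow\;
(1-\varepsilon)^{p} F_p \le \tilde L^{p} \le (1+\varepsilon)^{p} F_p .
```

For constant $p$ and $\varepsilon \le 1$, both $(1\pm\varepsilon)^{1/p}$ and $(1\pm\varepsilon)^{p}$ lie within $1 \pm c_p\varepsilon$ for a constant $c_p$ depending only on $p$, so running either algorithm with accuracy $\varepsilon/c_p$ gives the other guarantee.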

5
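Cascaded Aggregates
- The stream defines updates to pairs of items in $\{1, 2, \ldots, n\} \times \{1, 2, \ldots, d\}$; $f_{ij}$ is the weight of item $(i, j)$
- Two aggregates $P$ and $Q$: apply $Q$ to each row, then apply $P$ to the resulting vector of $n$ row values
- $P \circ Q$ is the cascaded aggregate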
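A minimal exact evaluation of a cascaded aggregate $P \circ Q$, assuming the weights arrive as a dictionary keyed by $(i, j)$ (our choice of input format; the helper names are hypothetical). The example instance is $F_2 \circ F_0$, one of the $P \circ F_0$ queries from the multigraph motivation: rows are sources, columns are destinations, and $F_0$ of a row counts the destinations that source accessed.

```python
def cascaded(f, n, d, P, Q):
    """Evaluate the cascaded aggregate P(Q(row 1), ..., Q(row n)).

    f maps pairs (i, j) in {1..n} x {1..d} to weights; missing pairs have weight 0.
    Q maps one row (f_i1, ..., f_id) to a number; P maps the n row values to the answer.
    """
    rows = [[f.get((i, j), 0) for j in range(1, d + 1)] for i in range(1, n + 1)]
    return P([Q(row) for row in rows])

def distinct(row):
    """F_0 of a row: number of nonzero entries."""
    return sum(1 for x in row if x != 0)

def second_moment(values):
    """F_2 of a vector: sum of squares."""
    return sum(v * v for v in values)

# Source 1 reaches destinations 1 and 2; source 2 reaches destination 2 only.
edges = {(1, 1): 1, (1, 2): 3, (2, 2): 1}
result = cascaded(edges, n=2, d=3, P=second_moment, Q=distinct)
```

The row values are $(2, 1)$, so $F_2 \circ F_0 = 2^2 + 1^2 = 5$. Note that the repeated edge weight 3 does not matter to $F_0$.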

6
Motivation
- Multigraph streams for analyzing IP traffic [Cormode-Muthukrishnan]
- Corresponds to $P \circ F_0$ for different $P$s
- $F_0$ returns the number of destinations accessed by each source
- Also introduced the more general problem of estimating $P \circ Q$
- Computing complex join estimates
- Product metrics [Andoni-Indyk-Krauthgamer]
- Stock volatility, computational geometry, operator norms

7
The Picture: Estimating $L_k \circ L_p$
[Figure: the $(p, k)$ plane with ticks at $0, 1, 2, \infty$ and the diagonal $k = p$; regions are labeled with space bounds including $\Theta(1)$, $n^{1-2/k}$, $d^{1-2/p}$, $n^{1-2/k} d^{1-2/p}$, $n^{1-1/k}$, $n$, and $d$, and one region is marked "?"]
- We give a 1-pass $\tilde O(n^{1-2/k} d^{1-2/p})$-space algorithm when $k \ge p$
- We also provide a matching lower bound based on multiparty disjointness
- We give the $\Omega(n^{1-1/k})$ bound for $L_k \circ L_0$ and $L_k \circ L_1$
- $\tilde O(n^{1/2})$ for $L_2 \circ L_0$ without deletions [CM]
- $\tilde O(n^{1-1/k})$ for $L_k \circ L_p$ for any $p \in \{0, 1\}$ in the turnstile model [MW]
- Other annotations in the figure: [Ganguly] (without deletions); "Follows from techniques of [ADIW]"; "Our upper bound"

8
Our Problem: $F_k \circ F_p$
$F_k \circ F_p(M) = \sum_i \left(\sum_j |f_{ij}|^p\right)^k = \sum_i F_p(\text{Row } i)^k$, where $M$ is the $n \times d$ matrix with entries $f_{ij}$
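A direct, non-streaming evaluation of this definition, with $M$ given as a list of rows (an illustrative input format of ours). With $k = 1$ the outer aggregate is a plain sum, so $F_1 \circ F_p(M)$ is just $F_p$ of $M$ viewed as one long vector.

```python
def Fp(row, p):
    """F_p of one row: sum_j |f_ij|^p."""
    return sum(abs(x) ** p for x in row)

def Fk_of_Fp(M, k, p):
    """F_k o F_p(M) = sum_i (sum_j |f_ij|^p)^k = sum_i F_p(Row i)^k."""
    return sum(Fp(row, p) ** k for row in M)

M = [[1, 2, 0],
     [0, 0, 3]]
flat = [x for row in M for x in row]  # M treated as a single vector
```

For this $M$, the row values are $F_1 = (3, 3)$ and $F_2 = (5, 9)$, giving $F_2 \circ F_1(M) = 18$ and $F_2 \circ F_2(M) = 106$.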

9
High Level Ideas: $F_k \circ F_p$
1. We want the $F_k$-value of the vector $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$
2. We try to sample a row $i$ with probability $\propto F_p(\text{Row } i)$
3. Spend an extra pass to compute $F_p(\text{Row } i)$
4. We could then output $F_p(M) \cdot F_p(\text{Row } i)^{k-1}$ (this can be seen as a generalization of [AMS])
How do we do the sampling efficiently?
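Steps 2 to 4 give an unbiased estimator: if row $i$ is drawn with probability $F_p(\text{Row } i)/F_p(M)$, the expected output is $\sum_i F_p(\text{Row } i)^k = F_k \circ F_p(M)$. The sketch below checks this on a toy matrix. It is an illustration under our own assumptions: the row values are computed directly instead of via small-space sampling and an extra pass, and variance reduction is done simply by averaging many independent samples.

```python
import random

def row_values(M, p):
    """The vector (F_p(Row 1), ..., F_p(Row n))."""
    return [sum(abs(x) ** p for x in row) for row in M]

def one_sample_estimate(M, k, p, rng):
    """Draw row i with Pr[i] = F_p(Row i) / F_p(M); output F_p(M) * F_p(Row i)^(k-1)."""
    r = row_values(M, p)
    total = sum(r)  # F_p(M); assumed nonzero
    i = rng.choices(range(len(r)), weights=r)[0]
    return total * r[i] ** (k - 1)

def exact_expectation(M, k, p):
    """E[estimate] = sum_i (r_i / F_p(M)) * F_p(M) * r_i^(k-1) = sum_i r_i^k."""
    r = row_values(M, p)
    total = sum(r)
    return sum((ri / total) * total * ri ** (k - 1) for ri in r)

M = [[1, 2, 0], [0, 0, 3]]
rng = random.Random(0)
samples = [one_sample_estimate(M, 2, 2, rng) for _ in range(20000)]
average = sum(samples) / len(samples)  # should be close to F_2 o F_2(M) = 25 + 81 = 106
```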

10
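Review: Estimating $F_p$ [IW]
- Level sets: $S_t$ is the set of items whose magnitude $|f_i|$ is roughly $(1+\varepsilon)^t$
- Level $t$ is good if $|S_t|(1+\varepsilon)^{2t} \ge F_2/B$
- Items from such level sets are also good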
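An offline sketch of level sets and the goodness test on this slide. The exact convention $(1+\varepsilon)^t \le |f_i| < (1+\varepsilon)^{t+1}$ is our assumption ([IW] may index levels differently), and the helper names are hypothetical. The streaming algorithm only estimates these sizes; here they are computed exactly.

```python
import math
from collections import defaultdict

def level(a, eps):
    """Level t with (1+eps)^t <= a < (1+eps)^(t+1), for a > 0 (assumed convention)."""
    t = math.floor(math.log(a) / math.log(1 + eps))
    while (1 + eps) ** (t + 1) <= a:  # correct possible floating-point rounding
        t += 1
    while (1 + eps) ** t > a:
        t -= 1
    return t

def level_sets(f, eps):
    """Map each level t to the indices i of the nonzero f_i in that level."""
    S = defaultdict(list)
    for i, v in enumerate(f):
        if v != 0:
            S[level(abs(v), eps)].append(i)
    return dict(S)

def good_levels(f, eps, B):
    """Levels t with |S_t| (1+eps)^(2t) >= F_2 / B."""
    F2 = sum(v * v for v in f)
    return {t for t, items in level_sets(f, eps).items()
            if len(items) * (1 + eps) ** (2 * t) >= F2 / B}

f = [10, 1, 1, 1, 1]
```

With $\varepsilon = 1$ the levels are powers of 2 and $F_2 = 104$. For $B = 4$ only the heavy item's level $t = 3$ is good ($64 \ge 26$); raising $B$ to 30 also makes the four light items' level $t = 0$ good ($4 \ge 104/30$).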

11
$\varepsilon$-Histogram [IW]
- Finds approximate sizes $s_t$ of the level sets
- For all $S_t$, $s_t \le (1+\varepsilon)|S_t|$
- For good $S_t$, $s_t \ge (1-\varepsilon)|S_t|$
- Also provides $\tilde O(1)$ random samples from each good $S_t$
- Space: $\tilde O(B)$

12
Sampling Rows According to $F_p$ Value
- Treat the $n \times d$ matrix $M$ as a vector and run the $\varepsilon$-Histogram on it for a certain $B$
- Obtain a $(1 \pm \varepsilon)$-approximation $s_t$ to $|S_t|$ for good $t$
- $F_k \circ F_p(M') \ge (1-\varepsilon) F_k \circ F_p(M)$, where $M'$ is $M$ restricted to good items (Hölder's inequality)
- To sample:
  1. Choose a good $t$ with probability $s_t(1+\varepsilon)^{pt}/\tilde F_p(M')$, where $\tilde F_p(M') = \sum_{\text{good } t} s_t (1+\varepsilon)^{pt}$
  2. Choose a random sample $(i, j)$ from $S_t$
  3. Let row $i$ be the current sample
- $\Pr[\text{row } i] = \sum_t \left[\frac{s_t(1+\varepsilon)^{pt}}{\tilde F_p(M')}\right] \cdot \left[\frac{|S_t \cap \text{row } i|}{|S_t|}\right] \approx \frac{F_p(\text{row } i)}{F_p(M')}$
Problems:
1. The high-level algorithm requires many samples (up to $n^{1-1/k}$) from the $S_t$, but [IW] gives only $\tilde O(1)$, and we can't afford to repeat it in low space
2. The algorithm may misclassify a pair $(i, j)$ into $S_t$ when it is in $S_{t-1}$
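The sketch below computes, exactly, the row distribution produced by the two-stage sampler on a small matrix. It makes simplifying assumptions that are ours, not the slide's: exact sizes $|S_t|$ stand in for the estimates $s_t$, every nonempty level is treated as good, and level $t$ means $(1+\varepsilon)^t \le |f_{ij}| < (1+\varepsilon)^{t+1}$. Under these assumptions each weight $(1+\varepsilon)^{pt}$ underestimates $|f_{ij}|^p$ by less than a factor $(1+\varepsilon)^p$, so $\Pr[\text{row } i]$ stays within a factor $(1+\varepsilon)^{\pm p}$ of $F_p(\text{row } i)/F_p(M)$.

```python
import math
from collections import defaultdict

def level(a, eps):
    """Level t with (1+eps)^t <= a < (1+eps)^(t+1), for a > 0 (assumed convention)."""
    t = math.floor(math.log(a) / math.log(1 + eps))
    while (1 + eps) ** (t + 1) <= a:  # correct possible floating-point rounding
        t += 1
    while (1 + eps) ** t > a:
        t -= 1
    return t

def row_sampling_distribution(M, p, eps):
    """Exact Pr[row i] for: pick level t w.p. |S_t|(1+eps)^(pt) / total, then a uniform (i, j) in S_t."""
    S = defaultdict(list)  # level t -> nonzero entries (i, j) in that level
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            if x != 0:
                S[level(abs(x), eps)].append((i, j))
    weight = {t: len(items) * (1 + eps) ** (p * t) for t, items in S.items()}
    total = sum(weight.values())  # plays the role of the sum over t of s_t (1+eps)^(pt)
    prob = [0.0] * len(M)
    for t, items in S.items():
        for i, _j in items:
            # Pr[choose t] * Pr[this entry | t]; sums to Pr[t] * |S_t intersect row i| / |S_t|
            prob[i] += (weight[t] / total) / len(items)
    return prob

M = [[1, 2, 0], [0, 0, 3], [5, 1, 1]]
p, eps = 2, 0.5
prob = row_sampling_distribution(M, p, eps)
row_fp = [sum(abs(x) ** p for x in row) for row in M]
target = [v / sum(row_fp) for v in row_fp]  # F_p(row i) / F_p(M)
```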

13
High Level Ideas: $F_k \circ F_p$
1. We want the $F_k$-value of the vector $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$
2. We try to sample a row $i$ with probability $\propto F_p(\text{Row } i)$
3. Spend an extra pass to compute $F_p(\text{Row } i)$
4. We could then output $F_p(M) \cdot F_p(\text{Row } i)^{k-1}$ (this can be seen as a generalization of [AMS])
How do we avoid an extra pass?

14
Avoiding an Extra Pass
- Now we can sample a row $i$ with probability $\propto F_p(\text{Row } i)$
- We design a new $F_k$-algorithm to run on $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$ which only receives IDs $i$ with probability $\propto F_p(\text{Row } i)$
- For each $j \in [\log n]$, the algorithm:
  1. Chooses a random subset of $n/2^j$ rows
  2. Samples a row $i$ from this set with $\Pr[\text{Row } i] \propto F_p(\text{Row } i)$
- We show that $\tilde O(n^{1-1/k})$ oracle samples are enough to estimate $F_k$ up to a $1 \pm \varepsilon$ factor
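The two steps the algorithm performs at each subsampling level can be sketched as a single oracle query. This only illustrates the sampling interface: `subsampled_oracle` is a hypothetical helper, `values[i]` stands in for $F_p(\text{Row } i)$ (which the real algorithm never sees directly), and how the $F_k$ estimate is assembled from these IDs is not reproduced here.

```python
import random

def subsampled_oracle(values, j, rng):
    """Hypothetical oracle query at subsampling level j.

    Keep a uniformly random subset of n / 2^j rows (rounded down, at least one),
    then return a row from that subset with probability proportional to values[i],
    or None if every kept row has value 0.
    """
    n = len(values)
    size = max(1, n >> j)
    subset = rng.sample(range(n), size)
    weights = [values[i] for i in subset]
    if sum(weights) == 0:
        return None
    return rng.choices(subset, weights=weights)[0]

values = [0, 5, 0, 2, 7, 1, 0, 3]
rng = random.Random(1)
levels = range(len(values).bit_length())  # j = 0, 1, ..., about log2 n
results = {j: [subsampled_oracle(values, j, rng) for _ in range(200)] for j in levels}
```

At $j = 0$ the subset is every row, so the query is the plain $\propto F_p(\text{Row } i)$ sampler; larger $j$ restricts attention to fewer rows, which gives lighter rows a chance to be reported.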

15
New Lower Bounds
- Alice holds an $n \times d$ matrix $A$; Bob holds an $n \times d$ matrix $B$ ($\Delta$ denotes Hamming distance)
- NO instance: for all rows $i$, $\Delta(A_i, B_i) \le 1$
- YES instance: there is a unique row $j$ for which $\Delta(A_j, B_j) = d$, and for all $i \ne j$, $\Delta(A_i, B_i) \le 1$
- We show that distinguishing these cases requires $\Omega(n/d)$ randomized communication (CC)
- This implies that estimating $L_k(L_0)$ or $L_k(L_1)$ needs $\Omega(n^{1-1/k})$ space
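One way to see the last bullet, under assumptions we add for illustration: binary matrices, Alice's rows inserted and Bob's deleted in a turnstile stream so the algorithm sees $A - B$, and $d = 2n^{1/k}$. Row $i$ of $A - B$ has exactly $\Delta(A_i, B_i)$ nonzero entries, so $L_k \circ L_0 \le n^{1/k}$ on NO instances and $\ge d = 2n^{1/k}$ on YES instances. A streaming algorithm that tells these apart yields a protocol (Alice sends its memory state to Bob), so its space is $\Omega(n/d) = \Omega(n^{1-1/k})$. Because the entries of $A - B$ lie in $\{-1, 0, 1\}$, each row's $L_1$ equals its $L_0$, which covers $L_k \circ L_1$ as well. The instance generator below is hypothetical.

```python
import random

def hamming(x, y):
    """Delta(x, y): number of coordinates where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def make_instance(n, d, yes, rng):
    """Hypothetical generator of a NO or YES instance over {0, 1}."""
    A = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    B = [row[:] for row in A]
    for i in range(n):
        if rng.random() < 0.5:  # rows differ in at most one coordinate
            B[i][rng.randrange(d)] ^= 1
    if yes:
        j = rng.randrange(n)  # the unique far row
        B[j] = [1 - a for a in A[j]]
    return A, B

def Lk_of_L0(A, B, k):
    """L_k o L_0 of the difference matrix A - B; row i has Delta(A_i, B_i) nonzeros."""
    return sum(hamming(a, b) ** k for a, b in zip(A, B)) ** (1.0 / k)

rng = random.Random(0)
n, k = 64, 2
d = 2 * round(n ** (1 / k))  # d = 2 n^(1/k) = 16
no_value = Lk_of_L0(*make_instance(n, d, False, rng), k)
yes_value = Lk_of_L0(*make_instance(n, d, True, rng), k)
```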

16
Information Complexity
- Paradigm [CSWY, BJKS]: the information cost IC is the amount of information the transcript reveals about the inputs
- For any function $f$, $\mathrm{CC}(f) \ge \mathrm{IC}(f)$
- Using their direct sum theorem, it suffices to show an $\Omega(1/d)$ information cost of a protocol for deciding whether $\Delta(x,y) = d$ or $\Delta(x,y) \le 1$
- Caveat: the distribution is only on instances where $\Delta(x,y) \le 1$
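Under our reading that the direct sum theorem lets each of the $n$ row instances contribute the single-row information cost, the facts on this slide combine into the $\Omega(n/d)$ communication bound for the matrix problem:

```latex
\mathrm{CC}(f) \;\ge\; \mathrm{IC}(f) \;\ge\; n \cdot \Omega\!\left(\frac{1}{d}\right) \;=\; \Omega\!\left(\frac{n}{d}\right)
```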

17
Working with Hellinger Distance
- Given the probability distribution vector $\pi(x,y)$ over transcripts of an input $(x,y)$, let $\psi(x,y)_\tau = \pi(x,y)_\tau^{1/2}$ for all $\tau$
- The information cost can be lower bounded by $\mathbb{E}_{\Delta(u,v)=1} \|\psi(u,u) - \psi(u,v)\|^2$
- Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP])
- Short diagonals property: $\mathbb{E}_{\Delta(u,v)=1} \|\psi(u,u) - \psi(u,v)\|^2 \ge \frac{1}{d}\, \mathbb{E}_{\Delta(u,v)=d} \|\psi(u,u) - \psi(u,v)\|^2$
[Figure: a quadrilateral with sides $a, b, c, d$ and diagonals $e, f$, illustrating $a^2 + b^2 + c^2 + d^2 \ge e^2 + f^2$]
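A small numerical check of the quadrilateral inequality behind the short diagonals property. The four points are square-root vectors $\psi$ of randomly generated transcript distributions; this is an illustrative assumption, not an actual protocol. Because each $\pi$ sums to 1, each $\psi$ is a unit vector, and $\|\psi - \psi'\|^2$ equals twice the squared Hellinger distance under the convention $h^2 = 1 - \sum_\tau \sqrt{\pi_\tau \pi'_\tau}$. In Euclidean space the gap between the squared sides and the squared diagonals equals $\|x_1 - x_2 + x_3 - x_4\|^2$, which is why it is never negative.

```python
import math
import random

def sqrt_vector(pi):
    """psi_tau = pi_tau^(1/2) for every transcript tau."""
    return [math.sqrt(x) for x in pi]

def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def random_distribution(m, rng):
    """A random probability vector over m transcripts (illustration only)."""
    w = [rng.random() for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

def short_diagonals_gap(x1, x2, x3, x4):
    """(sum of squared sides) - (sum of squared diagonals) of quadrilateral x1 x2 x3 x4."""
    sides = sq_dist(x1, x2) + sq_dist(x2, x3) + sq_dist(x3, x4) + sq_dist(x4, x1)
    diagonals = sq_dist(x1, x3) + sq_dist(x2, x4)
    return sides - diagonals

rng = random.Random(2)
pts = [sqrt_vector(random_distribution(6, rng)) for _ in range(4)]
gap = short_diagonals_gap(*pts)
# Closed form of the gap: ||x1 - x2 + x3 - x4||^2
closed_form = sum((a - b + c - e) ** 2 for a, b, c, e in zip(*pts))
```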

18
Open Problems
- $L_k \circ L_p$ estimation for $k < p$
- Other cascaded aggregates, e.g., entropy
- Cascaded aggregates with 3 or more stages
