# The Data Stream Space Complexity of Cascaded Norms

T.S. Jayram and David Woodruff, IBM Almaden


Data streams

- Algorithms access data in a sequential fashion.
- One pass, small space.
- Algorithms need to be randomized and approximate [FM, MP, AMS].

[Figure: an algorithm and its main memory reading the stream 2 3 4 16 0 100 5 4 501 200 401 2 3 6 0.]

Frequency Moments and Norms

- Stream defines updates to a set of items $1, 2, \ldots, d$.
- $f_i$ = weight of item $i$.
- Positive-only vs. turnstile model.
- $k$-th frequency moment: $F_k = \sum_i |f_i|^k$.
- $p$-th norm: $L_p = \|f\|_p = \left(\sum_i |f_i|^p\right)^{1/p}$.
- Maximum frequency: $p = \infty$.
- Distinct elements: $p = 0$.
- Heavy hitters.
- Assume the length of the stream and the magnitude of updates are at most $\mathrm{poly}(d)$.
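As a concrete illustration of these definitions, the following Python sketch builds the frequency vector from a list of `(item, delta)` updates and evaluates the aggregates offline. It stores one counter per item, so it is not a small-space streaming algorithm; the function names and the example stream are hypothetical.

```python
from collections import defaultdict


def frequencies(updates):
    """Accumulate (item, delta) updates into a frequency vector f.

    Negative deltas are allowed, which corresponds to the turnstile model.
    """
    f = defaultdict(int)
    for item, delta in updates:
        f[item] += delta
    return dict(f)


def moment(f, k):
    """F_k = sum_i |f_i|^k; for k = 0 this counts the nonzero entries."""
    if k == 0:
        return sum(1 for v in f.values() if v != 0)
    return sum(abs(v) ** k for v in f.values())


def norm(f, p):
    """L_p = F_p^(1/p) for p > 0; p = inf gives the maximum |f_i|."""
    if p == float("inf"):
        return max((abs(v) for v in f.values()), default=0)
    return moment(f, p) ** (1.0 / p)


# Hypothetical stream: item 2 is inserted and then fully deleted.
stream = [(1, 5), (3, 1), (2, 4), (3, 1), (3, -1), (2, -4)]
f = frequencies(stream)
```

Here item 2 ends with weight zero, so it no longer counts toward the number of distinct elements even though it appeared in the stream.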

Classical Results

- Approximating $L_p$ and $F_p$ is the same problem.
- For $0 \le p \le 2$, $F_p$ is approximable in $\tilde{O}(1)$ space (AMS, FM, Indyk, …).
- For $p > 2$, $F_p$ is approximable in $\tilde{O}(d^{1-2/p})$ space (IW); this is best possible (BJKS, CKS).

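Cascaded Aggregates

- Stream defines updates to pairs of items in $\{1, 2, \ldots, n\} \times \{1, 2, \ldots, d\}$.
- $f_{ij}$ = weight of item $(i, j)$.
- Two aggregates $P$ and $Q$.
- $P \circ Q$ = cascaded aggregate.

[Figure: $Q$ is applied to each row of the matrix and $P$ to the resulting values, giving $P \circ Q$.]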
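The composition can be sketched offline in a few lines of Python: apply $Q$ to each row, then $P$ to the vector of row results. This is only an illustration on a fully stored matrix; the helper names (`cascade`, `F0`, `make_Fk`) and the example data are hypothetical.

```python
def cascade(P, Q, matrix):
    """Apply Q to every row, then P to the vector of row results."""
    return P([Q(row) for row in matrix])


def F0(values):
    """Number of nonzero entries (distinct elements)."""
    return sum(1 for v in values if v != 0)


def make_Fk(k):
    """Return a function computing F_k = sum |v|^k of a vector."""
    return lambda values: sum(abs(v) ** k for v in values)


# Hypothetical 3 x 4 matrix: rows are sources, columns are destinations,
# entry (i, j) is the weight f_ij accumulated from the stream.
M = [
    [2, 0, 1, 0],
    [0, 0, 0, 7],
    [1, 1, 1, 1],
]

max_distinct = cascade(max, F0, M)      # max over rows of #nonzero entries
f2_of_f0 = cascade(make_Fk(2), F0, M)   # sum over rows of (#nonzero)^2
```

The rows have 2, 1, and 4 nonzero entries, so `max_distinct` is 4 and `f2_of_f0` is 4 + 1 + 16 = 21.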

Motivation

- Multigraph streams for analyzing IP traffic [Cormode-Muthukrishnan]
  - Corresponds to $P \circ F_0$ for different $P$'s.
  - $F_0$ returns the number of destinations accessed by each source.
  - Also introduced the more general problem of estimating $P \circ Q$.
- Computing complex join estimates.
- Product metrics [Andoni-Indyk-Krauthgamer].
- Stock volatility, computational geometry, operator norms.

The Picture: Estimating $L_k \circ L_p$

[Figure: space bounds over the $(p, k)$ plane, with the line $k = p$ marked. Labels recoverable from the figure include $n^{1-2/k} d^{1-2/p}$, $n^{1-1/k}$, $d^{1-2/p}$, $\Theta(1)$, and "?".]

- We give a 1-pass $\tilde{O}(n^{1-2/k} d^{1-2/p})$ space algorithm when $k \ge p$. We also provide a matching lower bound based on multiparty disjointness.
- We give the $\Omega(n^{1-1/k})$ bound for $L_k \circ L_0$ and $L_k \circ L_1$.
  - $\tilde{O}(n^{1/2})$ for $L_2 \circ L_0$ without deletions [CM].
  - $\tilde{O}(n^{1-1/k})$ for $L_k \circ L_p$ for any $p \in \{0, 1\}$ in the turnstile model [MW].
- Other figure annotations: "[Ganguly] (without deletions)", "Follows from techniques of [ADIW]", "Our upper bound".

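Our Problem: $F_k \circ F_p$

$$F_k \circ F_p(M) = \sum_i \Big( \sum_j |f_{ij}|^p \Big)^k = \sum_i F_p(\text{Row } i)^k,$$

where $M = (f_{ij})$ is the $n \times d$ matrix of weights.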
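A small worked instance may help fix the notation. The matrix and the choice $p = 1$, $k = 2$ below are illustrative and not taken from the talk.

```latex
M = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}, \qquad p = 1,\ k = 2:
\qquad
F_1(\text{Row } 1) = 3,\quad F_1(\text{Row } 2) = 3,
\qquad
F_2 \circ F_1(M) = 3^2 + 3^2 = 18 .
```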

High Level Ideas: $F_k \circ F_p$

1. We want the $F_k$-value of the vector $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$.
2. We try to sample a row $i$ with probability $\propto F_p(\text{Row } i)$.
3. Spend an extra pass to compute $F_p(\text{Row } i)$.
4. Could then output $F_p(M) \cdot F_p(\text{Row } i)^{k-1}$ (can be seen as a generalization of [AMS]).

How do we do the sampling efficiently?
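The four steps above describe an unbiased estimator: if row $i$ is drawn with probability $F_p(\text{Row } i)/F_p(M)$, the expected output is $\sum_i F_p(\text{Row } i)^k$. The sketch below checks this with exact arithmetic on a stored matrix. It assumes ideal sampling and integer $p$ and $k$, and it does not address small space or variance; all names are hypothetical.

```python
from fractions import Fraction


def row_moment(row, p):
    """F_p of one row: sum_j |f_ij|^p (p a positive integer here)."""
    return sum(abs(v) ** p for v in row)


def cascaded_moment(M, k, p):
    """F_k o F_p (M) = sum_i F_p(Row i)^k, computed exactly."""
    return sum(row_moment(row, p) ** k for row in M)


def estimator_distribution(M, k, p):
    """List (probability, output) pairs of the one-sample estimator.

    Row i is drawn with probability F_p(Row i) / F_p(M) and the
    estimator then outputs F_p(M) * F_p(Row i)^(k-1).
    """
    row_vals = [row_moment(row, p) for row in M]
    total = sum(row_vals)  # F_p(M), viewing M as one long vector
    return [(Fraction(v, total), total * v ** (k - 1)) for v in row_vals]


def expected_output(M, k, p):
    """Exact expectation of the estimator over the choice of row."""
    return sum(prob * out for prob, out in estimator_distribution(M, k, p))


# Hypothetical example matrix with integer weights.
M = [[1, 2, 0], [0, 3, 1], [4, 0, 0]]
```

For this matrix with $p = 1$ and $k = 3$, the row values are 3, 4, 4, and both `expected_output(M, 3, 1)` and `cascaded_moment(M, 3, 1)` equal 155.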

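Review: Estimating $F_p$ [IW]

- Level sets: $S_t$ collects the items whose frequency magnitude is about $(1+\varepsilon)^t$.
- Level $t$ is good if $|S_t| (1+\varepsilon)^{2t} \ge F_2 / B$.
- Items from such level sets are also good.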
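The goodness condition can be made concrete with an offline sketch. The slide does not spell out the level sets, so the code assumes $S_t = \{ i : (1+\varepsilon)^t \le |f_i| < (1+\varepsilon)^{t+1} \}$; that definition, the function names, and the example vector are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def level_sets(f, eps):
    """Group item indices by level t, where (1+eps)^t <= |f_i| < (1+eps)^(t+1).

    This definition of S_t is an assumption; zero entries belong to no level.
    """
    levels = defaultdict(list)
    for i, v in enumerate(f):
        if v != 0:
            t = math.floor(math.log(abs(v)) / math.log(1 + eps))
            levels[t].append(i)
    return dict(levels)


def good_levels(f, eps, B):
    """Return the levels t with |S_t| * (1+eps)^(2t) >= F_2 / B."""
    F2 = sum(v * v for v in f)
    return sorted(
        t
        for t, items in level_sets(f, eps).items()
        if len(items) * (1 + eps) ** (2 * t) >= F2 / B
    )


# Hypothetical frequency vector: one heavy item and a few light ones.
f = [100, 3, 3, 2, 0, 1]
```

With `eps = 1.0` the levels are powers of two. For `B = 4` only the level holding the heavy item is good; raising `B` to 1000 lowers the threshold enough for the level containing the items of weight 2 and 3 to qualify as well.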

$\varepsilon$-Histogram [IW]

- Finds approximate sizes $s_t$ of level sets.
- For all $S_t$, $s_t \le (1+\varepsilon)|S_t|$.
- For good $S_t$, $s_t \ge (1-\varepsilon)|S_t|$.
- Also provides $\tilde{O}(1)$ random samples from each good $S_t$.
- Space: $\tilde{O}(B)$.

Sampling Rows According to $F_p$ Value

- Treat the $n \times d$ matrix $M$ as a vector:
  - Run $\varepsilon$-Histogram on $M$ for a certain $B$.
  - Obtain a $(1 \pm \varepsilon)$-approximation $s_t$ to $|S_t|$ for good $t$.
  - $F_k \circ F_p(M') \ge (1-\varepsilon)\, F_k \circ F_p(M)$, where $M'$ is $M$ restricted to good items (Hölder's inequality).
- To sample:
  - Choose a good $t$ with probability $s_t (1+\varepsilon)^{pt} / \tilde{F}_p(M)$, where $\tilde{F}_p(M) = \sum_{\text{good } t} s_t (1+\varepsilon)^{pt}$.
  - Choose a random sample $(i, j)$ from $S_t$.
  - Let row $i$ be the current sample.

$$\Pr[\text{row } i] = \sum_t \frac{s_t (1+\varepsilon)^{pt}}{\tilde{F}_p(M)} \cdot \frac{|S_t \cap \text{row } i|}{|S_t|} \approx \frac{F_p(\text{row } i)}{F_p(M)}$$

Problems

1. The high-level algorithm requires many samples (up to $n^{1-1/k}$) from the $S_t$, but [IW] gives just $\tilde{O}(1)$. We can't afford to repeat in low space.
2. The algorithm may misclassify a pair $(i, j)$ into $S_t$ when it is in $S_{t-1}$.
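The probability calculation can be checked offline under idealized conditions. The sketch below assumes the level sets are $S_t = \{ (i,j) : (1+\varepsilon)^t \le |f_{ij}| < (1+\varepsilon)^{t+1} \}$, that every level is good, and that the sizes are known exactly ($s_t = |S_t|$). None of these hold for the real streaming algorithm, and both problems listed above are ignored. Function names are hypothetical.

```python
import math
from collections import defaultdict


def row_sampling_probs(M, p, eps):
    """Exact row probabilities of the two-stage sampler, run offline.

    Stage 1 picks level t with probability |S_t| (1+eps)^(pt) / F~_p(M).
    Stage 2 picks a uniform pair (i, j) from S_t and reports row i.
    Assumes S_t = {(i, j): (1+eps)^t <= |f_ij| < (1+eps)^(t+1)} and that
    the level sizes are known exactly (s_t = |S_t|, every level good).
    """
    level_size = defaultdict(int)                      # |S_t|
    level_row = defaultdict(lambda: defaultdict(int))  # |S_t intersect row i|
    for i, row in enumerate(M):
        for v in row:
            if v != 0:
                t = math.floor(math.log(abs(v)) / math.log(1 + eps))
                level_size[t] += 1
                level_row[t][i] += 1
    approx_Fp = sum(s * (1 + eps) ** (p * t) for t, s in level_size.items())
    probs = [0.0] * len(M)
    for t, s in level_size.items():
        pr_level = s * (1 + eps) ** (p * t) / approx_Fp
        for i, count in level_row[t].items():
            probs[i] += pr_level * count / s
    return probs


def target_probs(M, p):
    """The ideal distribution F_p(row i) / F_p(M)."""
    vals = [sum(abs(v) ** p for v in row) for row in M]
    return [v / sum(vals) for v in vals]


# Hypothetical example matrix.
M = [[5, 0, 2], [1, 1, 1], [0, 9, 3]]
```

Because each entry in $S_t$ has $|f_{ij}|^p$ within a factor $(1+\varepsilon)^p$ of $(1+\varepsilon)^{pt}$, the computed probabilities stay within a factor $(1+\varepsilon)^p$ of the target distribution in this idealized setting.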

High Level Ideas: $F_k \circ F_p$

1. We want the $F_k$-value of the vector $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$.
2. We try to sample a row $i$ with probability $\propto F_p(\text{Row } i)$.
3. Spend an extra pass to compute $F_p(\text{Row } i)$.
4. Could then output $F_p(M) \cdot F_p(\text{Row } i)^{k-1}$ (can be seen as a generalization of [AMS]).

How do we avoid an extra pass?

Avoiding an Extra Pass

- Now we can sample a row $i$ with probability $\propto F_p(\text{Row } i)$.
- We design a new $F_k$-algorithm to run on $(F_p(\text{Row } 1), \ldots, F_p(\text{Row } n))$ which only receives IDs $i$ with probability $\propto F_p(\text{Row } i)$.
- For each $j \in [\log n]$, the algorithm does:
  1. Choose a random subset of $n/2^j$ rows.
  2. Sample a row $i$ from this set with $\Pr[\text{Row } i] \propto F_p(\text{Row } i)$.
- We show that $\tilde{O}(n^{1-1/k})$ oracle samples are enough to estimate $F_k$ up to $1 \pm \varepsilon$.
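The access pattern of the two numbered steps can be sketched as follows. The code only shows how the subsampled oracle is queried; how the talk's $F_k$-algorithm combines the returned IDs into an estimate is not given on the slide and is not reproduced here. The row weights stand in for $F_p(\text{Row } i)$, and the names are hypothetical.

```python
import random


def oracle_samples(row_weights, rng):
    """For each j = 1..floor(log2 n), subsample n / 2^j rows uniformly,
    then draw one row from the subsample with probability proportional
    to its weight (standing in for F_p(Row i)).

    Returns a list of (j, subset, sampled_row) triples.
    """
    n = len(row_weights)
    out = []
    j = 1
    while n >> j >= 1:
        subset = rng.sample(range(n), n >> j)
        weights = [row_weights[i] for i in subset]
        if sum(weights) > 0:
            sampled = rng.choices(subset, weights=weights, k=1)[0]
        else:
            sampled = None  # every row in the subset is empty
        out.append((j, subset, sampled))
        j += 1
    return out


# Hypothetical row values F_p(Row 1), ..., F_p(Row 8).
row_weights = [4, 0, 1, 9, 2, 0, 7, 3]
samples = oracle_samples(row_weights, random.Random(0))
```

With eight rows the loop produces three rounds, using subsets of 4, 2, and 1 rows. Smaller subsets give light rows a chance to be reported, since heavy rows are less likely to be present to dominate the draw.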

New Lower Bounds

- Alice holds an $n \times d$ matrix $A$; Bob holds an $n \times d$ matrix $B$.
- NO instance: for all rows $i$, $\Delta(A_i, B_i) \le 1$.
- YES instance: there is a unique row $j$ for which $\Delta(A_j, B_j) = d$, and for all $i \ne j$, $\Delta(A_i, B_i) \le 1$.
- We show distinguishing these cases requires $\Omega(n/d)$ randomized communication complexity (CC).
- Implies estimating $L_k(L_0)$ or $L_k(L_1)$ needs $\Omega(n^{1-1/k})$ space.
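To see why the promise problem relates to cascaded norms, the sketch below assumes that $\Delta$ is the Hamming distance between binary rows and evaluates $F_k \circ F_0$ on $A - B$. In a NO instance the value is at most $n$, while in a YES instance it is at least $d^k$, so once $d^k$ is a sufficiently large multiple of $n$ a constant-factor approximation separates the cases. Taking $d$ on the order of $n^{1/k}$ is consistent with the $\Omega(n^{1-1/k})$ bound on the slide, but the exact parameters used in the talk are not stated there; the numbers below are illustrative and the function names hypothetical.

```python
def hamming(x, y):
    """Number of coordinates where rows x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def classify(A, B):
    """Return 'NO', 'YES', or None if the pair is outside the promise."""
    d = len(A[0])
    dists = [hamming(a, b) for a, b in zip(A, B)]
    far = [i for i, h in enumerate(dists) if h == d]
    near = [i for i, h in enumerate(dists) if h <= 1]
    if len(near) == len(dists):
        return "NO"
    if len(far) == 1 and len(near) == len(dists) - 1:
        return "YES"
    return None


def cascaded_F0(A, B, k):
    """F_k o F_0 of A - B: sum over rows of (Hamming distance)^k."""
    return sum(hamming(a, b) ** k for a, b in zip(A, B))


# Hypothetical instance with n = 9 rows, d = 3 columns, k = 2 (so d^k = n).
n, d, k = 9, 3, 2
A = [[0] * d for _ in range(n)]
B_no = [[1, 0, 0] if i % 2 == 0 else [0, 0, 0] for i in range(n)]
B_yes = [row[:] for row in B_no]
B_yes[4] = [1, 1, 1]  # the unique row at distance d
```

In this example the NO instance has value 5 and the YES instance has value 13, of which 9 comes from the single far row.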

Information Complexity Paradigm

- [CSWY, BJKS]: the information cost IC is the amount of information the transcript reveals about the inputs.
- For any function $f$, $\mathrm{CC}(f) \ge \mathrm{IC}(f)$.
- Using their direct sum theorem, it suffices to show an $\Omega(1/d)$ information cost of a protocol for deciding if $\Delta(x, y) = d$ or $\Delta(x, y) \le 1$.
- Caveat: the distribution is only on instances where $\Delta(x, y) \le 1$.

Working with Hellinger Distance

- Given the probability distribution vector $\pi(x, y)$ over transcripts of an input $(x, y)$, let $\psi(x, y)_\tau = \pi(x, y)_\tau^{1/2}$ for all $\tau$.
- Information cost can be lower bounded by $\sum_{\Delta(u,v) = 1} \|\psi(u,u) - \psi(u,v)\|^2$.
- Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP]).
- Short diagonals property:

$$\sum_{\Delta(u,v) = 1} \|\psi(u,u) - \psi(u,v)\|^2 \;\ge\; \frac{1}{d} \sum_{\Delta(u,v) = d} \|\psi(u,u) - \psi(u,v)\|^2$$

[Figure: a quadrilateral with sides $a, b, c, d$ and diagonals $e, f$, illustrating $a^2 + b^2 + c^2 + d^2 \ge e^2 + f^2$.]
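The quadrilateral inequality from the figure holds for any four points in Euclidean space, because the difference between the two sides equals $\|A - B + C - D\|^2$. The sketch below checks this numerically on four square-root probability vectors $\psi$. The random distributions are placeholders rather than transcripts of any real protocol, and the code does not reproduce how the talk derives the $1/d$ factor from this property.

```python
import math
import random


def hellinger_vector(dist):
    """psi_tau = sqrt(pi_tau): maps a distribution to a unit vector."""
    return [math.sqrt(q) for q in dist]


def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sides_minus_diagonals(A, B, C, D):
    """(a^2 + b^2 + c^2 + d^2) - (e^2 + f^2) for the quadrilateral ABCD."""
    sides = sq_dist(A, B) + sq_dist(B, C) + sq_dist(C, D) + sq_dist(D, A)
    diagonals = sq_dist(A, C) + sq_dist(B, D)
    return sides - diagonals


def random_distribution(m, rng):
    """A random probability vector over m hypothetical transcripts."""
    w = [rng.random() for _ in range(m)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(1)
points = [hellinger_vector(random_distribution(6, rng)) for _ in range(4)]
gap = sides_minus_diagonals(*points)
# The gap equals ||A - B + C - D||^2, which is never negative.
closing = [a - b + c - d for a, b, c, d in zip(*points)]
```

Taking square roots places every distribution on the unit sphere, which is what lets statements about transcript distributions be phrased in terms of squared Euclidean distances.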

Open Problems

- $L_k \circ L_p$ estimation for $k < p$.
- Other cascaded aggregates, e.g. entropy.
- Cascaded aggregates with 3 or more stages.