# Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM.

## Presentation on theme: "Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM."— Presentation transcript:

Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM

The Problems Given n x d matrix A and n x d matrix B, we want estimators for The matrix product A T B The matrix X * minimizing ||AX-B|| –A slightly generalized version of least-squares regression Given integer k, the matrix A k of rank k minimizing ||A-A k || We consider the Frobenius matrix norm: root of sum of squares

General Properties of Our Algorithms 1 pass over matrix entries, given in any order (allow multiple updates) Maintain compressed versions or sketches of matrices Do small work per entry to maintain the sketches Output result using the sketches Randomized approximation algorithms

Matrix Compression Methods In a line of similar efforts… –Element-wise sampling [AM01], [AHK06] –Sketching / Random Projection: maintain a small number of random linear combinations of rows or columns [S06] –Row / column sampling [DK01], [DKM04], [DMM08] –Usually more than 1 pass Here: sketching

Outline Matrix Product Regression Low-rank approximation

Approximate Matrix Product A and B have n rows, we want to estimate A T B Let S be an n x m sign (Rademacher) matrix –Each entry is +1 or -1 with probability ½ –m is small, to be specified –Entries are O(log 1/δ)-wise independent Sketches are S T A and S T B Our estimate of A T B is [A T S][S T B]/m Easy to maintain sketches given updates –O(m) time per update, O(mc log(nc)) bits of space for S T A and S T B –Output [A T S][S T B]/m using fast rectangular matrix multiplication

Expected Error and a Tail Estimate Using linearity of expectation, E[A T SS T B/m] = A T E[SS T ]B/m = A T B Moreover, for δ, ε > 0, there is m = O(log 1/ δ) ε -2 so that Pr[||A T SS T B/m-A T B|| > ε ||A|| ||B||] · δ (again ||C|| = [Σ i, j C i, j 2 ] 1/2 ) This tail estimate seems to be new –Follows from bounding O(log 1/δ)-th moment of ||A T SS T B/m-A T B|| Improves space of Sarl Ó s 1-pass algorithm by a log c factor

Matrix Product Lower Bound Our algorithm is space-optimal for constant δ –a new lower bound Reduction from a communication game –Augmented Indexing, players Alice and Bob –Alice has random x 2 {0,1} s –Bob has random i 2 {1, 2, …, s} also x i+1, …, x s Alice sends Bob one message Bob should output x i with probability at least 2/3 Theorem [MNSW]: Message must be (s) bits on average

Lower Bound Proof Set s := £ (cε -2 log cn) Alice makes matrix U –Uses x 1 …x s Bob makes matrix U and B –Uses i and x i+1, …, x s Alg input will be A:=U+U and B –A and B are n x c/2 Alice: –Runs streaming matrix product Alg on U –Sends Alg state to Bob –Bob continues Alg with A := U + U and B A T B determines x i with probability at least 2/3 –By choice of U, U, B –Solving Augmented Indexing So space of Alg must be (s) = (cε -2 log cn) bits

Lower Bound Proof U = U (1) ; U (2) ; …, U (log (cn)) ; 0s –U (k) is an £ ( ε -2 ) x c/2 submatrix with entries in {-10 k, 10 k } –U (k) i, j = 10 k if matched entry of x is 0, else U (k) i, j = -10 k Bobs index i corresponds to U (k*) i*, j* U is such that A = U+U = U (1) ; U (2) ; …, U (k*) ; 0s –U is determined from x i+1, …, x s A T B is i * -th row of U (k*) ||A|| ¼ ||U (k *) || since the entries of A grow geometrically ε 2 ||A|| 2 ¢ ||B|| 2, the squared error, is small, so most entries of the approximation to A T B have the correct sign

Linear Regression The problem: min X ||AX-B|| X * minimizing this has X * = A - B, where A - is the pseudo-inverse of A The algorithm is –Maintain S T A and S T B –Return X solving min X ||S T (AX-B)|| Main claim: if A has rank k, there is an m = O(kε -1 log(1/δ)) so that with probability at least 1- δ, ||AX-B|| · (1+ε)||AX * -B|| –That is, relative error for X is small A new lower bound (omitted here) shows our space is optimal

Regression Analysis Why should X be good? First reduce to showing that ||A(X * -X)|| is small –Use normal equation A T AX * = A T B –Implies ||AX-B|| 2 = ||AX * -B|| 2 + ||A(X-X * )|| 2

Regression Analysis Continued Bound ||A(X-X * )|| 2 using several tools: –Normal equation (S T A) T (S T A)X = (S T A) T (S T B) –Our tail estimate for matrix product –Subspace JL: for m = O(kε -1 log(1/δ)), S T approximately preserves lengths of all vectors in a k- space Overall, ||A(X-X * )|| 2 = O(ε)||AX * -B|| 2 But ||AX-B|| 2 = ||AX * -B|| 2 + ||A(X-X * )|| 2

Best Low-Rank Approximation For any matrix A and integer k, there is a matrix A k of rank k that is closest to A among all matrices of rank k. LSI, PCA, recommendation systems, clustering The sketch S T A holds information about A There is a rank k matrix A k in the rowspace of S T A so that ||A-A k || · (1+ε)||A-A k ||

Low-Rank Approximation via Regression Why is there such an A k? Apply the regression results with A ! A k, B ! A The X minimizing ||S T (A k X-A)|| has ||A k X-A|| · (1+ ε)||A k X * -A|| But here X * = I, and X = (S T A k ) - S T A So the matrix A k X = A k (S T A k ) - S T A: –Has rank k –Is in the rowspace of S T A –Is within 1+ ε of the smallest distance of rank-k matrix Problem: seems to require 2 passes

1 Pass Algorithm Suppose R is a d x m sign matrix. By regression results transposed, the columnspace of AR contains a good rank-k approximation to A X minimizing ||ARX-A|| has ||ARX-A|| · (1+ ε)||A-A k || Apply regression results with A ! AR and B ! A and X minimizing ||S T (ARX-A)|| So X=(S T AR) - S T A has ||ARX-A|| · (1+ ε)||ARX-A|| · (1+ε) 2 ||A-A k || Algorithm maintains AR and S T A, and computes ARX = AR(S T AR) - S T A Compute best rank-k approximation to ARX in the columnspace of AR (ARX has rank kε -2 )

Concluding Remarks Space bounds are tight for product, regression Get 1-pass and O(kε -2 (n+dε -2 )log(nd)) space for low-rank approximation. –First sublinear 1-pass streaming algorithm with relative error –Show our space is almost optimal via new lower bound (omitted) –Total work is O(Nkε -4 ), where N is the number of non-zero entries of A –In next talk, [NDT] give a faster algorithm for dense matrices Improve the dependence on ε Look at tight bounds for multiple passes

A Lower Bound binary string x index i and x i+1, x i+2, … matrix A 10, -10 -10, 10 -10,-10 … -100, -100 100, 100 … -1000, -1000 -1000, 1000 1000, 1000 … ………………………… 0, 0 … 10000, -10000 -10000, -10000 … 0, 0 … 00………00……… Error now dominated by block of interest Bob also inserts a k x k identity submatrix into block of interest n-k rows k ε -1 columns per block 0s k rows

Lower Bound Details Block of interest: n-k rows k ε -1 columns * k rows 0s P*I k Bob inserts k x k identity submatrix, scaled by large value P Show any rank-k approximation must err on all of shaded region So good rank-k approximation likely has correct sign on Bobs entry

Similar presentations