Presentation on theme: "Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic."— Presentation transcript:
Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic
Linear Compression (learning Fourier coeffs, linear sketching, finite rate of innovation, compressed sensing...) Setup: –Data/signal in n-dimensional space : x E.g., x is an 1000x1000 image n=1000,000 –Goal: compress x into a “sketch” Ax, where A is a m x n “sketch matrix”, m << n Requirements: –Plan A: want to recover x from Ax Impossible: undetermined system of equations –Plan B: want to recover an “approximation” x* of x Sparsity parameter k Want x* such that ||x*-x|| p C(k) ||x’-x|| q ( l p /l q guarantee ) over all x’ that are k-sparse (at most k non-zero entries) The best x* contains k coordinates of x with the largest abs value Want: –Good compression (small m) –Efficient algorithms for encoding and recovery Why linear compression ? = A x Ax x x* k=2
Linear compression: applications Data stream algorithms (e.g. for network monitoring) –Efficient increments: A(x+ ) = Ax + A Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk’06] Pooling, Microarray Experiments [Kainkaryam, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk] source destination
Constructing matrix A Choose encoding matrix A at random (the algorithms for recovering x* are more complex) –Sparse matrices: Data stream algorithms Coding theory (LDPCs) –Dense matrices: Compressed sensing Complexity/learning theory (Fourier matrices) “Traditional” tradeoffs: –Sparse: computationally more efficient, explicit –Dense: shorter sketches Goal: the “best of both worlds”
Prior and New Results PaperRand. / Det. Sketch length Encode time Column sparsity Recovery timeApprox
Prior and New Results PaperR/ D Sketch lengthEncode time Column sparsity Recovery timeApprox [CCF’02], [CM’06] Rk log nn log nlog nn log nl2 / l2 Rk log c nn log c nlog c nk log c nl2 / l2 [CM’04]Rk log nn log nlog nn log nl1 / l1 Rk log c nn log c nlog c nk log c nl1 / l1 [CRT’04] [RV’05] Dk log(n/k)nk log(n/k)k log(n/k)ncnc l2 / l1 Dk log c nn log nk log c nncnc l2 / l1 [GSTV’06] [GSTV’07] Dk log c nn log c nlog c nk log c nl1 / l1 Dk log c nn log c nk log c nk 2 log c nl2 / l1 [BGIKS’08]Dk log(n/k)n log(n/k)log(n/k)ncnc l1 / l1 [GLR’08]Dk logn logloglogn kn 1-a n 1-a ncnc l2 / l1 [NV’07], [DM’08], [NT’08], [BD’08], [GK’09], … Dk log(n/k)nk log(n/k)k log(n/k)nk log(n/k) * Tl2 / l1 Dk log c nn log nk log c nn log n * Tl2 / l1 [IR’08], [BIR’08]Dk log(n/k)n log(n/k)log(n/k)n log(n/k)l1 / l1 Excellent Scale: Very GoodGoodFair Dk log(n/k)n log(n/k)log(n/k)n log(n/k) * Tl1 / l1 [BI’09]Dk log(n/k)n log(n/k)log(n/k)nlog(n/k)*T*stuffl1 / l1 “state of art” [GLSP’09]Rk log(n/k)n log c nlog c nk log c nl2 / l1
Recovery “in principle” (when is a matrix “good”)
Restricted Isometry Property (RIP) * - sufficient property of a dense matrix A: is k-sparse || || 2 ||A || 2 C || || 2 Holds w.h.p. for: –Random Gaussian/Bernoulli: m=O(k log (n/k)) –Random Fourier: m=O(k log O(1) n) Consider random m x n 0-1 matrices with d ones per column Do they satisfy RIP ? –No, unless m= (k 2 ) [Chandar’07] However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk- Karloff-Strauss’08]: is k-sparse d (1- ) || || 1 ||A || 1 d|| || 1 Sufficient (and necessary) condition: the underlying graph is a ( k, d(1- /2) )-expander dense vs. sparse * Other (weaker) conditions exist, e.g., “kernel of A is a near-Euclidean subspace of l1”. More later.
Expanders A bipartite graph is a (k,d(1- ))-expander if for any left set S, |S|≤k, we have |N(S)|≥(1- )d |S| Constructions: –Randomized: m=O(k log (n/k)) –Explicit: m=k quasipolylog n Plenty of applications in computer science, coding theory etc. In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08] n m d S N(S)
Proof: d(1- /2)-expansion RIP-1 Want to show that for any k-sparse we have d (1- ) || || 1 ||A || 1 d|| || 1 RHS inequality holds for any LHS inequality: –W.l.o.g. assume | 1 |≥… ≥| k | ≥ | k+1 |=…= | n |=0 –Consider the edges e=(i,j) in a lexicographic order –For each edge e=(i,j) define r(e) s.t. r(e)=-1 if there exists an edge (i’,j)<(i,j) r(e)=1 if there is no such edge Claim 1: ||A || 1 ≥∑ e=(i,j) | i |r e Claim 2: ∑ e=(i,j) | i |r e ≥ (1- ) d|| || 1 n m d
Matching Pursuit(s) Iterative algorithm: given current approximation x* : –Find (possibly several) i s. t. A i “correlates” with Ax-Ax*. This yields i and z s. t. ||x*+ze i -x|| p << ||x* - x|| p –Update x* –Sparsify x* (keep only k largest entries) –Repeat Norms: –p=2 : CoSaMP, SP, IHT etc (dense matrices) –p=1 : SMP, this paper (sparse matrices) –p=0 : LDPC bit flipping (sparse matrices) = i i A x*-xAx-Ax*
Sequential Sparse Matching Pursuit Algorithm: –x*=0 –Repeat T times Repeat S=O(k) times –Find i and z that minimize* ||A(x*+ze i )-b|| 1 –x* = x*+ze i Sparsify x* (set all but k largest entries of x* to 0) Similar to SMP, but updates done sequentially A i N(i) x-x* Ax-Ax* * Set z=median[ (Ax*-Ax) N(i) ]. Instead, one could first optimize (gradient) i and then z [ Fuchs’09]
SSMP: Running time Algorithm: –x*=0 –Repeat T times For each i=1…n compute* z i that achieves D i =min z ||A(x*+ze i )-b|| 1 and store D i in a heap Repeat S=O(k) times –Pick i,z that yield the best gain –Update x* = x*+ze i –Recompute and store D i’ for all i’ such that N(i) and N(i’) intersect Sparsify x* (set all but k largest entries of x* to 0) Running time: T [ n(d+log n) + k nd/m*d (d+log n)] = T [ n(d+log n) + nd (d+log n)] = T [ nd (d+log n)] A i x-x* Ax-Ax*
SSMP: Approximation guarantee Want to find k-sparse x* that minimizes ||x-x*|| 1 By RIP1, this is approximately the same as minimizing ||Ax-Ax*|| 1 Need to show we can do it greedily a1a1 a2a2 x a1a1 a2a2 x Supports of a 1 and a 2 have small overlap (typically)
Experiments 256x256 SSMP is ran with S=10000,T=20. SMP is ran for 100 iterations. Matrix sparsity is d=8.
Conclusions Even better algorithms for sparse approximation (using sparse matrices) State of the art: can do 2 out of 3: –Near-linear encoding/decoding –O(k log (n/k)) measurements –Approximation guarantee with respect to L2/L1 norm Questions: –3 out of 3 ? “Kernel of A is a near-Euclidean subspace of l1” [Kashin et al] Constructions of sparse A exist [Guruswami-Lee-Razborov, Guruswami-Lee-Wigderson] Challenge: match the optimal measurement bound –Explicit constructions This talk Thanks!