Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic
Goal of this talk “Stay awake until the end”
Linear Compression (learning Fourier coeffs, linear sketching, finite rate of innovation, compressed sensing...)
Setup:
– Data/signal in n-dimensional space: x. E.g., x is a 1000x1000 image, n=1,000,000
– Goal: compress x into a “sketch” Ax, where A is an m x n “sketch matrix”, m << n
Requirements:
– Plan A: want to recover x from Ax. Impossible: underdetermined system of equations
– Plan B: want to recover an “approximation” x* of x
  – Sparsity parameter k
  – Want x* such that ||x*-x||_p ≤ C(k) ||x’-x||_q (l_p/l_q guarantee) over all x’ that are k-sparse (at most k non-zero entries)
  – The best x* contains the k coordinates of x with the largest absolute value
Want:
– Good compression (small m)
– Efficient algorithms for encoding and recovery
Why linear compression?
[Figure: the sketch Ax as the product of A and x; a signal x and its approximation x* for k=2]
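The benchmark in Plan B can be made concrete with a small sketch. The helper names below (`best_k_sparse`, `lp_norm`, `tail_error`) are hypothetical, chosen only for illustration; the code computes the best k-sparse approximation of x and the error ||x’-x||_q that the l_p/l_q guarantee compares against.

```python
def best_k_sparse(x, k):
    """Keep the k entries of x with the largest absolute value, zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def lp_norm(v, p):
    """l_p norm of a vector, for p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def tail_error(x, k, q=1):
    """min over k-sparse x' of ||x' - x||_q, attained by best_k_sparse(x, k)."""
    xk = best_k_sparse(x, k)
    return lp_norm([a - b for a, b in zip(x, xk)], q)


# Toy signal (made up for illustration): two large entries plus small noise.
x = [0.1, -7.0, 0.0, 3.0, -0.2]
x_best = best_k_sparse(x, 2)
print(x_best, tail_error(x, 2, q=1))
```

A recovery algorithm with an l1/l1 guarantee must return x* whose l1 error is within a factor C(k) of `tail_error(x, k, 1)`.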
Linear compression: applications
– Data stream algorithms (e.g. for network monitoring)
  – Efficient increments: A(x+Δ) = Ax + AΔ
– Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk’06]
– Pooling, Microarray Experiments [Kainkaryam, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk]
[Figure labels: source, destination]
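The "efficient increments" point can be illustrated as follows. This is a minimal sketch, assuming a 0-1 sketch matrix with d ones per column stored as column supports; the function names and the toy stream are hypothetical. By linearity, an update x[i] += delta touches only the d sketch entries in column i.

```python
import random


def column_supports(m, n, d, seed=0):
    """cols[i] lists the d distinct rows where column i of A holds a 1."""
    rng = random.Random(seed)
    return [rng.sample(range(m), d) for _ in range(n)]


def sketch(cols, x, m):
    """Compute Ax from scratch for the 0-1 matrix given by its column supports."""
    y = [0.0] * m
    for i, xi in enumerate(x):
        if xi:
            for j in cols[i]:
                y[j] += xi
    return y


def update(y, cols, i, delta):
    """Apply x[i] += delta to the sketch in O(d) time: A(x + delta*e_i) = Ax + delta*A_i."""
    for j in cols[i]:
        y[j] += delta


m, n, d = 8, 20, 3                      # illustrative sizes only
cols = column_supports(m, n, d)
x = [0.0] * n
y = [0.0] * m
stream = [(3, 2.0), (7, -1.0), (3, 0.5), (19, 4.0)]   # (coordinate, increment) pairs
for i, delta in stream:
    x[i] += delta                       # kept here only to check the result
    update(y, cols, i, delta)
```

In a real stream the vector x is never stored; only the sketch y is maintained.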
Constructing matrix A
Choose encoding matrix A at random (the algorithms for recovering x* are more complex)
– Sparse matrices: data stream algorithms; coding theory (LDPCs)
– Dense matrices: compressed sensing; complexity/learning theory (Fourier matrices)
“Traditional” tradeoffs:
– Sparse: computationally more efficient, explicit
– Dense: shorter sketches
Goal: the “best of both worlds”
Prior and New Results
(R/D: randomized / deterministic. A blank Paper cell continues the row above.)
Paper | R/D | Sketch length | Encode time | Column sparsity | Recovery time | Approx
[CCF’02], [CM’06] | R | k log n | n log n | log n | n log n | l2/l2
 | R | k log^c n | n log^c n | log^c n | k log^c n | l2/l2
[CM’04] | R | k log n | n log n | log n | n log n | l1/l1
 | R | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
[CRT’04], [RV’05] | D | k log(n/k) | nk log(n/k) | k log(n/k) | n^c | l2/l1
 | D | k log^c n | n log n | k log^c n | n^c | l2/l1
[GSTV’06], [GSTV’07] | D | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
 | D | k log^c n | n log^c n | k log^c n | k^2 log^c n | l2/l1
[BGIKS’08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1
[GLR’08] | D | k (log n)^(logloglog n) | k n^(1-a) | n^(1-a) | n^c | l2/l1
[NV’07], [DM’08], [NT’08], [BD’08], [GK’09], … | D | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) * T | l2/l1
 | D | k log^c n | n log n | k log^c n | n log n * T | l2/l1
[IR’08], [BIR’08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1
 | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) * T | l1/l1
[BI’09] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) * T * stuff | l1/l1
[GLSP’09] | R | k log(n/k) | n log^c n | log^c n | k log^c n | l2/l1
Scale (color-coded on the slide): Excellent, Very Good, Good, Fair
Slide annotation: “state of art”
Recovery “in principle” (when is a matrix “good”)
Restricted Isometry Property (RIP)* - sufficient property of a dense matrix A:
  Δ is k-sparse ⇒ ||Δ||_2 ≤ ||AΔ||_2 ≤ C ||Δ||_2
Holds w.h.p. for:
– Random Gaussian/Bernoulli: m=O(k log (n/k))
– Random Fourier: m=O(k log^O(1) n)
Consider random m x n 0-1 matrices with d ones per column. Do they satisfy RIP?
– No, unless m=Ω(k^2) [Chandar’07]
However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss’08]:
  Δ is k-sparse ⇒ d(1-ε) ||Δ||_1 ≤ ||AΔ||_1 ≤ d ||Δ||_1
Sufficient (and necessary) condition: the underlying graph is a (k, d(1-ε/2))-expander
* Other (weaker) conditions exist, e.g., “kernel of A is a near-Euclidean subspace of l1”. More later.
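The RIP-1 inequalities can be probed numerically. The following is a minimal sketch, assuming a random 0-1 matrix with exactly d ones per column; the sizes and function names are illustrative and not taken from the talk. It samples random k-sparse vectors Δ and records the ratio ||AΔ||_1 / (d ||Δ||_1). The upper bound of 1 always holds, while the observed minimum only suggests a value for 1-ε, since random sampling does not explore the worst case.

```python
import random


def column_supports(m, n, d, rng):
    """cols[i] lists the d distinct rows where column i of A holds a 1."""
    return [rng.sample(range(m), d) for _ in range(n)]


def rip1_ratio(cols, m, d, delta):
    """Return ||A delta||_1 / (d ||delta||_1) for a sparse delta given as {index: value}."""
    y = [0.0] * m
    for i, v in delta.items():
        for j in cols[i]:
            y[j] += v
    return sum(abs(t) for t in y) / (d * sum(abs(v) for v in delta.values()))


rng = random.Random(0)
m, n, d, k = 200, 1000, 8, 10           # illustrative sizes only
cols = column_supports(m, n, d, rng)

ratios = []
for _ in range(200):
    support = rng.sample(range(n), k)
    # Random signs and magnitudes bounded away from zero.
    delta = {i: rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 1.5) for i in support}
    ratios.append(rip1_ratio(cols, m, d, delta))

print("min ratio:", min(ratios), "max ratio:", max(ratios))
```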
Expanders
A bipartite graph is a (k, d(1-ε))-expander if for any left set S, |S|≤k, we have |N(S)| ≥ (1-ε) d |S|
Constructions:
– Randomized: m=O(k log (n/k))
– Explicit: m=k quasipolylog n
Plenty of applications in computer science, coding theory etc.
In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08]
[Figure: bipartite graph with labels n, m, d, S, N(S)]
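The expander definition can be checked directly on tiny graphs. This is a brute-force sketch with a hypothetical function name and a hand-made example graph; it enumerates every left set of size at most k, so it is exponential in k and meant only to make the definition concrete.

```python
from itertools import combinations


def is_expander(cols, k, d, eps):
    """Check |N(S)| >= (1 - eps) * d * |S| for every left set S with 1 <= |S| <= k.

    cols[i] is the list of right-side neighbors of left node i (each of size d).
    """
    n = len(cols)
    for size in range(1, k + 1):
        for S in combinations(range(n), size):
            neighbors = set()
            for i in S:
                neighbors.update(cols[i])
            if len(neighbors) < (1 - eps) * d * size:
                return False
    return True


# Hand-made graph: 4 left nodes of degree d=2, 6 right nodes.
# Left nodes 0 and 3 share right node 0, so |N({0,3})| = 3 instead of 4.
graph = [[0, 1], [2, 3], [4, 5], [0, 2]]
print(is_expander(graph, k=2, d=2, eps=0.25))
print(is_expander(graph, k=2, d=2, eps=0.1))
```

With eps=0.25 the worst pair still meets the bound 0.75 * 2 * 2 = 3, while eps=0.1 requires 3.6 neighbors and fails.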
Proof: d(1-ε/2)-expansion ⇒ RIP-1
Want to show that for any k-sparse Δ we have d(1-ε) ||Δ||_1 ≤ ||AΔ||_1 ≤ d ||Δ||_1
RHS inequality holds for any Δ
LHS inequality:
– W.l.o.g. assume |Δ_1| ≥ … ≥ |Δ_k| ≥ |Δ_{k+1}| = … = |Δ_n| = 0
– Consider the edges e=(i,j) in a lexicographic order
– For each edge e=(i,j) define r(e) s.t.
  – r(e)=-1 if there exists an edge (i’,j)<(i,j)
  – r(e)=1 if there is no such edge
Claim 1: ||AΔ||_1 ≥ ∑_{e=(i,j)} |Δ_i| r(e)
Claim 2: ∑_{e=(i,j)} |Δ_i| r(e) ≥ (1-ε) d ||Δ||_1
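For reference, the RHS inequality and the way the two claims combine can be written out as follows. Here E denotes the edge set of the bipartite graph underlying A, a symbol introduced only for this note; the first chain is the triangle inequality plus the fact that every column has exactly d ones.

```latex
\|A\Delta\|_1
  = \sum_{j=1}^{m} \Bigl|\sum_{i:\,(i,j)\in E} \Delta_i\Bigr|
  \le \sum_{j=1}^{m} \sum_{i:\,(i,j)\in E} |\Delta_i|
  = \sum_{i=1}^{n} d\,|\Delta_i|
  = d\,\|\Delta\|_1 .

% LHS inequality: chain Claim 1 and Claim 2.
\|A\Delta\|_1
  \;\ge\; \sum_{e=(i,j)\in E} |\Delta_i|\, r(e)
  \;\ge\; (1-\epsilon)\, d\, \|\Delta\|_1 .
```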
Matching Pursuit(s)
Iterative algorithm: given current approximation x*:
– Find (possibly several) i s.t. A_i “correlates” with Ax-Ax*. This yields i and z s.t. ||x*+ze_i-x||_p << ||x*-x||_p
– Update x*
– Sparsify x* (keep only k largest entries)
– Repeat
Norms:
– p=2: CoSaMP, SP, IHT etc (dense matrices)
– p=1: SMP, this paper (sparse matrices)
– p=0: LDPC bit flipping (sparse matrices)
[Figure labels: A, column i, x*-x, Ax-Ax*]
Sequential Sparse Matching Pursuit
Algorithm:
– x*=0
– Repeat T times
  – Repeat S=O(k) times
    – Find i and z that minimize* ||A(x*+ze_i)-b||_1
    – x* = x*+ze_i
  – Sparsify x* (set all but k largest entries of x* to 0)
Similar to SMP, but updates done sequentially
* Set z=median[ (Ax-Ax*)_N(i) ]. Instead, one could first optimize (gradient) i and then z [Fuchs’09]
[Figure labels: A, i, N(i), x-x*, Ax-Ax*]
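The loop above can be sketched in a few lines. This is a naive illustration, not the authors' implementation: it rescans all n columns at every inner step instead of using a heap, it takes b = Ax with no noise, and the choice S = 2k, the function names, and the toy matrix are all assumptions made for the example. For a 0-1 column, the z minimizing ||A(x*+ze_i)-b||_1 is the median of the residual b-Ax* restricted to N(i).

```python
from statistics import median


def sketch(cols, x, m):
    """Compute Ax for the 0-1 matrix whose column i has ones in rows cols[i]."""
    y = [0.0] * m
    for i, xi in enumerate(x):
        if xi:
            for j in cols[i]:
                y[j] += xi
    return y


def sparsify(x, k):
    """Set all but the k largest-magnitude entries of x to 0."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def ssmp(cols, b, m, k, T=10, S=None):
    """Naive SSMP sketch: greedy single-coordinate l1 updates, then sparsify."""
    n = len(cols)
    S = 2 * k if S is None else S       # assumption: any S = O(k)
    x = [0.0] * n
    r = list(b)                         # residual b - A x*
    for _ in range(T):
        for _ in range(S):
            best_gain, best_i, best_z = 0.0, None, 0.0
            for i in range(n):
                vals = [r[j] for j in cols[i]]
                z = median(vals)        # minimizes sum_j |r_j - z| over N(i)
                gain = sum(abs(v) for v in vals) - sum(abs(v - z) for v in vals)
                if gain > best_gain:
                    best_gain, best_i, best_z = gain, i, z
            if best_i is None:          # no coordinate reduces the l1 residual
                break
            x[best_i] += best_z
            for j in cols[best_i]:
                r[j] -= best_z
        x = sparsify(x, k)
        r = [bj - yj for bj, yj in zip(b, sketch(cols, x, m))]
    return x


# Toy instance: m=6 rows, n=4 columns, d=3 ones per column, overlapping supports.
cols = [[0, 1, 2], [2, 3, 4], [4, 5, 0], [1, 3, 5]]
x_true = [2.0, 0.0, -1.0, 0.0]
b = sketch(cols, x_true, 6)
x_hat = ssmp(cols, b, m=6, k=2, T=3)
print(x_hat)
```

On this toy instance the two greedy updates pick coordinates 0 and 2 in turn and drive the residual to zero.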
SSMP: Running time
Algorithm:
– x*=0
– Repeat T times
  – For each i=1…n compute z_i that achieves D_i=min_z ||A(x*+ze_i)-b||_1 and store D_i in a heap
  – Repeat S=O(k) times
    – Pick i,z that yield the best gain
    – Update x* = x*+ze_i
    – Recompute and store D_i’ for all i’ such that N(i) and N(i’) intersect
  – Sparsify x* (set all but k largest entries of x* to 0)
Running time:
  T [ n(d+log n) + k · (nd/m) · d · (d+log n) ]
  = T [ n(d+log n) + nd (d+log n) ]
  = T [ nd (d+log n) ]
[Figure labels: A, i, x-x*, Ax-Ax*]
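One way to read the running-time expression is sketched below. It assumes each row of A holds about nd/m ones and that m is on the order of kd, which is consistent with the sketch length k log(n/k) and column sparsity log(n/k) in the results table; neither assumption is stated on the slide itself.

```latex
% Per outer iteration:
%   heap construction: n columns, each costing O(d) for the median plus O(\log n) for the heap;
%   S = O(k) updates, each touching the columns i' that share a row with column i:
%     d rows per column, about nd/m columns per row, so about d \cdot nd/m columns,
%     each recomputed at cost O(d + \log n).
T\Bigl[\, n(d+\log n) \;+\; k \cdot \frac{nd}{m}\cdot d\,(d+\log n) \Bigr]
  \;=\; T\Bigl[\, n(d+\log n) \;+\; nd\cdot\frac{kd}{m}\,(d+\log n) \Bigr]
  \;=\; O\bigl(T\, nd\,(d+\log n)\bigr)
  \qquad\text{when } m = \Theta(kd).
```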
SSMP: Approximation guarantee
Want to find k-sparse x* that minimizes ||x-x*||_1
By RIP-1, this is approximately the same as minimizing ||Ax-Ax*||_1
Need to show we can do it greedily
[Figure: x with columns a_1 and a_2; supports of a_1 and a_2 have small overlap (typically)]
Experiments
256x256
SSMP is run with S=10000, T=20. SMP is run for 100 iterations. Matrix sparsity is d=8.
Conclusions
Even better algorithms for sparse approximation (using sparse matrices)
State of the art: can do 2 out of 3:
– Near-linear encoding/decoding
– O(k log (n/k)) measurements
– Approximation guarantee with respect to L2/L1 norm
Questions:
– 3 out of 3?
  – “Kernel of A is a near-Euclidean subspace of l1” [Kashin et al]
  – Constructions of sparse A exist [Guruswami-Lee-Razborov, Guruswami-Lee-Wigderson]
  – Challenge: match the optimal measurement bound
– Explicit constructions
[Slide annotation: “This talk”]
Thanks!