# Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic.

## Presentation on theme: "Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic."— Presentation transcript:

Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic

Linear Compression (a.k.a. linear sketching, compressed sensing) Setup: –Data/signal in n-dimensional space : x E.g., x is an 1000x1000 image  n=1000,000 –Goal: compress x into a “sketch” Ax, where A is a carefully designed m x n matrix, m << n Requirements: –Plan A: want to recover x from Ax Impossible: undetermined system of equations –Plan B: want to recover an “approximation” x* of x Sparsity parameter k Want x* such that ||x*-x|| p  C(k) ||x’-x|| q ( l p /l q guarantee ) over all x’ that are k-sparse (at most k non-zero entries) The best x* contains k coordinates of x with the largest abs value Want: –Good compression (small m) –Efficient algorithms for encoding and recovery Why linear compression ? = A x Ax x x* k=2

Applications

Application I: Monitoring Network Traffic Router routs packets (many packets) –Where do they come from ? –Where do they go to ? Ideally, would like to maintain a traffic matrix x[.,.] –Easy to update: given a (src,dst) packet, increment x src,dst –Requires way too much space! (2 32 x 2 32 entries) –Need to compress x, increment easily Using linear compression we can: –Maintain sketch Ax under increments to x, since A(x+  ) = Ax + A  –Recover x* from Ax source destination x

Other applications Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk’06] Microarray Experiments/Pooling [Kainkaryam, Gilbert, Shearer, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk]

Known constructions+algorithms

Constructing matrix A Choose encoding matrix A at random (the algorithms for recovering x* are more complex) –Sparse matrices: Data stream algorithms Coding theory (LDPCs) –Dense matrices: Compressed sensing Complexity theory (Fourier) “Traditional” tradeoffs: –Sparse: computationally more efficient, explicit –Dense: shorter sketches Goals: unify, find the “best of all worlds”

Result Table [CCF’02], [CM’06] Rk log nn log nlog nn log nl2 / l2 Rk log c nn log c nlog c nk log c nl2 / l2 [CM’04]Rk log nn log nlog nn log nl1 / l1 Rk log c nn log c nlog c nk log c nl1 / l1 Scale: ExcellentVery GoodGoodFair Caveats: (1) only results for general vectors x are shown; (2) all bounds up to O() factors; (3) specific matrix type often matters (Fourier, sparse, etc); (4) ignore universality, explicitness, etc (5) most “dominated” algorithms not shown; Legend: n=dimension of x m=dimension of Ax k=sparsity of x* T = #iterations Approx guarantee: l2/l2: ||x-x*|| 2  C||x-x’|| 2 l2/l1: ||x-x*|| 2  C||x-x’|| 1 /k 1/2 l1/l1: ||x-x*|| 1  C||x-x’|| 1 [CRT’04] [RV’05] Dk log(n/k)nk log(n/k)k log(n/k)ncnc l2 / l1 Dk log c nn log nk log c nncnc l2 / l1 [GSTV’06] [GSTV’07] Dk log c nn log c nlog c nk log c nl1 / l1 Dk log c nn log c nk log c nk 2 log c nl2 / l1 [BGIKS’08]Dk log(n/k)n log(n/k)log(n/k)ncnc l1 / l1 [GLR’08]Dk logn logloglogn kn 1-a n 1-a ncnc l2 / l1 [NV’07], [DM’08], [NT’08,BD’08] Dk log(n/k)nk log(n/k)k log(n/k)nk log(n/k) * Tl2 / l1 Dk log c nn log nk log c nn log n * Tl2 / l1 [IR’08]Dk log(n/k)n log(n/k)log(n/k)n log(n/k)l1 / l1 [BIR’08]Dk log(n/k)n log(n/k)log(n/k)n log(n/k) *Tl1 / l1 PaperRand. / Det. Sketch length Encode time Col. sparsity/ Update time Recovery timeApprx [DIP’09]D  (k log(n/k)) l1 / l1 [CDD’07]D  (n) l2 / l2

Techniques

Restricted Isometry Property (RIP) - key property of a dense matrix A: x is k-sparse  ||x|| 2  ||Ax|| 2  C ||x|| 2 Holds w.h.p. for: –Random Gaussian/Bernoulli: m=O(k log (n/k)) –Random Fourier: m=O(k log O(1) n) Consider random m x n 0-1 matrices with d ones per column Do they satisfy RIP ? –No, unless m=  (k 2 ) [Chandar’07] However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk- Karloff-Strauss’08]: x is k-sparse  d (1-  ) ||x|| 1  ||Ax|| 1  d||x|| 1 Sufficient (and necessary) condition: the underlying graph is a ( k, d(1-  /2) )-expander dense vs. sparse

Expanders A bipartite graph is a (k,d(1-  ))-expander if for any left set S, |S|≤k, we have |N(S)|≥(1-  )d |S| Constructions: –Randomized: m=O(k log (n/k)) –Explicit: m=k quasipolylog n Plenty of applications in computer science, coding theory etc. In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08] n m d S N(S)

Instead of RIP in the L 2 norm, we have RIP in the L 1 norm Suffices for these results: Main drawback: l1/l1 guarantee Better approx. guarantee with same time and sketch length Other sparse matrix schemes, for (almost) k-sparse vectors: LDPC-like: [Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08] L1 minimization: [Wang-Wainwright-Ramchandran’08] Message passing: [Sarvotham-Baron-Baraniuk’06,’08, Lu-Montanari-Prabhakar’08] [BGIKS’08]Dk log(n/k)n log(n/k)log(n/k)ncnc l1 / l1 [IR’08]Dk log(n/k)n log(n/k)log(n/k)n log(n/k)l1 / l1 [BIR’08]Dk log(n/k)n log(n/k)log(n/k)n log(n/k) *Tl1 / l1 PaperRand. / Det. Sketch length Encode time Sparsity/ Update time Recovery timeApprx ? dense vs. sparse

Algorithms/Proofs

Proof: d(1-  /2)-expansion  RIP-1 Want to show that for any k-sparse x we have d (1-  ) ||x|| 1  ||Ax|| 1  d||x|| 1 RHS inequality holds for any x LHS inequality: –W.l.o.g. assume |x 1 |≥… ≥|x k | ≥ |x k+1 |=…= |x n |=0 –Consider the edges e=(i,j) in a lexicographic order –For each edge e=(i,j) define r(e) s.t. r(e)=-1 if there exists an edge (i’,j)<(i,j) r(e)=1 if there is no such edge Claim: ||Ax|| 1 ≥∑ e=(i,j) |x i |r e n m

Proof: d(1-  /2)-expansion  RIP-1 (ctd) Need to lower-bound ∑ e z e r e where z (i,j) =|x i | Let R b = the sequence of the first bd r e ’s From graph expansion, R b contains at most  /2 bd -1’s (for b=1, it contains no -1’s) The sequence of r e ’s that minimizes ∑ e z e r e is 1,1,…,1,-1,..,-1, 1,…,1, … d  /2 d (1-  /2)d Thus ∑ e z e r e ≥ (1-  )∑ e z e = (1-  ) d||x|| 1 d

A satisfies RIP-1  LP works [Berinde-Gilbert-Indyk-Karloff-Strauss’08] Compute a vector x* such that Ax=Ax* and ||x*|| 1 minimal Then we have, over all k-sparse x’ ||x-x*|| 1 ≤ C min x’ ||x-x’|| 1 –C  2 as the expansion parameter  0 Can be extended to the case when Ax is noisy

A satisfies RIP-1  Sparse Matching Pursuit works [Berinde-Indyk-Ruzic’08] Algorithm: –x*=0 –Repeat T times Compute c=Ax-Ax* = A(x-x*) Compute ∆ such that ∆ i is the median of its neighbors in c Sparsify ∆ (set all but 2k largest entries of ∆ to 0) x*=x*+∆ Sparsify x* (set all but k largest entries of x* to 0) After T=log() steps we have ||x-x*|| 1 ≤ C min k-sparse x’ ||x-x’|| 1 A ∆ c

Experiments Probability of recovery of random k-sparse +1/-1 signals from m measurements –Sparse matrices with d=10 1s per column –Signal length n=20,000 SMP LP

Conclusions Sparse approximation using sparse matrices State of the art: can do 2 out of 3: –Near-linear encoding/decoding –O(k log (n/k)) measurements –Approximation guarantee with respect to L2/L1 norm Open problems: –3 out of 3 ? –Explicit constructions ? This talk

Resources References: –R. Berinde, A. Gilbert, P. Indyk, H. Karloff, M. Strauss, “Combining geometry and combinatorics: a unified approach to sparse signal recovery”, Allerton, 2008. –R. Berinde, P. Indyk, M. Ruzic, “Practical Near-Optimal Sparse Recovery in the L1 norm”, Allerton, 2008. –R. Berinde, P. Indyk, “Sparse Recovery Using Sparse Random Matrices”, 2008. –P. Indyk, M. Ruzic, “Near-Optimal Sparse Recovery in the L1 norm”, FOCS, 2008.

Running times

Download ppt "Sparse Recovery Using Sparse (Random) Matrices Piotr Indyk MIT Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic."

Similar presentations