# (1+ε)-Approximate Sparse Recovery: Eric Price (MIT), David Woodruff (IBM Almaden)



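Compressed Sensing. Choose an $r \times n$ matrix $A$. Given $x \in \mathbb{R}^n$, compute $Ax$. Output a vector $y$ so that $|x - y|_p \le (1+\varepsilon)\,|x - x_{\text{top } k}|_p$, where $x_{\text{top } k}$ is the $k$-sparse vector of the largest-magnitude coefficients of $x$, and $p = 1$ or $p = 2$. Minimize the number $r = r(n, k, \varepsilon)$ of measurements, subject to $\Pr_A[\text{success}] > 2/3$.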
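As a concrete reading of the recovery guarantee, the following Python sketch checks whether a candidate output $y$ satisfies $|x - y|_p \le (1+\varepsilon)|x - x_{\text{top } k}|_p$ for a fixed $x$. The helper names `top_k`, `lp_norm`, and `satisfies_guarantee` are hypothetical, and ties among equal-magnitude coordinates are broken arbitrarily by the sort.

```python
def top_k(x, k):
    """Return the k-sparse vector that keeps the k largest-magnitude entries of x."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out

def lp_norm(v, p):
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

def satisfies_guarantee(x, y, k, eps, p):
    """Check |x - y|_p <= (1 + eps) * |x - x_top_k|_p."""
    tail = [xi - ti for xi, ti in zip(x, top_k(x, k))]
    err = [xi - yi for xi, yi in zip(x, y)]
    return lp_norm(err, p) <= (1 + eps) * lp_norm(tail, p)
```

Returning $x_{\text{top } k}$ itself always meets the bound, since its error is exactly the tail.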

Previous Work. For $p = 1$ [IR, …]: $r = O(k\log(n/k)/\varepsilon)$ (deterministic $A$). For $p = 2$ [GLPS]: $r = O(k\log(n/k)/\varepsilon)$. In both cases, $r = \Omega(k\log(n/k))$ [DIPW]. What is the dependence on $\varepsilon$?

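Why 1+ε Is Important. Suppose $x = e_i + u$, where $e_i = (0, 0, \dots, 0, 1, 0, \dots, 0)$ and $u$ is a random unit vector orthogonal to $e_i$. Consider $y = 0^n$: then $|x - y|_2 = |x|_2 \le \sqrt{2}\,|x - e_i|_2$, so a $\sqrt{2}$-approximate scheme may output the all-zeros vector. It's a trivial solution! $(1+\varepsilon)$-approximate recovery fixes this. In some applications, we can have $1/\varepsilon = 100$ and $\log n = 32$.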
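A small numerical check of this example, written as a Python sketch under the assumption that $u$ is drawn by zeroing coordinate $i$ of a Gaussian vector and normalizing (one of several ways to get a random unit vector orthogonal to $e_i$). The function name `trivial_solution_ratio` is hypothetical.

```python
import math
import random

def trivial_solution_ratio(n, i, seed=0):
    """Build x = e_i + u with u a random unit vector orthogonal to e_i, and
    return |x - 0^n|_2 / |x - x_top1|_2, the approximation factor of y = 0^n."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u[i] = 0.0  # zeroing coordinate i makes u orthogonal to e_i
    norm_u = math.sqrt(sum(v * v for v in u))
    x = [v / norm_u for v in u]
    x[i] = 1.0  # x = e_i + u
    top = max(range(n), key=lambda j: abs(x[j]))  # largest coordinate, i.e. i
    tail = math.sqrt(sum(v * v for j, v in enumerate(x) if j != top))
    return math.sqrt(sum(v * v for v in x)) / tail
```

The ratio is $\sqrt{2}$, so the zero vector is acceptable for any approximation factor of at least $\sqrt{2}$ but not for $1+\varepsilon$ with $\varepsilon < \sqrt{2} - 1$.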

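Our Results vs. Previous Work.

| | Previous upper bound | Our upper bound | Our lower bound |
|---|---|---|---|
| $p = 1$ | $O(k\log(n/k)/\varepsilon)$ [IR, …] | $O(k\log(n/k)\cdot\log^2(1/\varepsilon)/\varepsilon^{1/2})$ (randomized) | $\Omega(k\log(1/\varepsilon)/\varepsilon^{1/2})$ |
| $p = 2$ | $O(k\log(n/k)/\varepsilon)$ [GLPS] | | $\Omega(k\log(n/k)/\varepsilon)$ |

Previous lower bounds: $\Omega(k\log(n/k))$. Our lower bounds hold for randomized schemes with constant success probability.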

Comparison to Deterministic Schemes. We get an $\tilde O(k/\varepsilon^{1/2})$ randomized upper bound for $p = 1$. We show $\Omega(k\log(n/k)/\varepsilon)$ for $p = 1$ for deterministic schemes. So randomized is easier than deterministic.

Our Sparse-Output Results. Output a vector $y$ from $Ax$ so that $|x - y|_p \le (1+\varepsilon)|x - x_{\text{top } k}|_p$. Sometimes we want $y$ to be $k$-sparse; then $r = \tilde\Omega(k/\varepsilon^p)$. Both results are tight up to logarithmic factors. Recall that for non-sparse output, $r = \tilde\Theta(k/\varepsilon^{p/2})$.

Talk Outline. (1) $\tilde O(k/\varepsilon^{1/2})$ upper bound for $p = 1$; (2) lower bounds.

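Simplifications. We want $\tilde O(k/\varepsilon^{1/2})$ for $p = 1$. Replace $k$ with 1: sample a $1/k$ fraction of the coordinates, solve the problem for $k = 1$ on the sample, repeat $\tilde O(k)$ times independently, and combine the solutions found.

[Figure: a signal with entries $\varepsilon/k, \dots, \varepsilon/k$ followed by tail entries $1/n, \dots, 1/n$; after sampling, a single $\varepsilon/k$ entry among tail entries $1/n$.]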
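The sampling step of the reduction can be simulated: a heavy coordinate becomes a $k = 1$ instance when it is the only heavy coordinate in a $1/k$-rate sample, which happens in a given repetition with probability $(1/k)(1-1/k)^{k-1} \approx 1/(ek)$. This Python sketch only checks how often that happens across independent repetitions; the $k = 1$ solver itself is not implemented, and the name `isolated_heavy` and the repetition count are illustrative assumptions.

```python
import random

def isolated_heavy(heavy, n, k, reps, seed=0):
    """Heavy coordinates that, in at least one repetition, were the only heavy
    coordinate in a sample that keeps each coordinate with probability 1/k."""
    rng = random.Random(seed)
    found = set()
    for _ in range(reps):
        sample = [i for i in range(n) if rng.random() < 1.0 / k]
        sampled_heavy = [i for i in sample if i in heavy]
        if len(sampled_heavy) == 1:
            # This sample is a k = 1 instance; a (hypothetical) k = 1 solver
            # run on it would be responsible for recovering sampled_heavy[0].
            found.add(sampled_heavy[0])
    return found
```

With $k = 10$ heavy coordinates, a few hundred repetitions (on the order of $k \log k$ times a constant) isolate every heavy coordinate at least once with high probability.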

The Case k = 1. Assume $|x - x_{\text{top}}|_1 = 1$ and $x_{\text{top}} = \varepsilon$. First attempt: use CountMin [CM]. Randomly partition the coordinates into $B$ buckets and maintain the sum $\sum_{i:\,h(i) = j} x_i$ in each bucket $j$. The expected $\ell_1$-mass of noise in a bucket is $1/B$. If $B = \Theta(1/\varepsilon)$, most buckets have count $< \varepsilon/2$. Repeat $O(\log n)$ times.
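A minimal Python sketch of this first attempt, assuming a nonnegative $x$ so that taking the minimum over repetitions of a coordinate's bucket sum is a valid CountMin estimate. Explicit random bucket lists stand in for hash functions, and the constant 8 in $B = 8/\varepsilon$, the 12 repetitions, and the name `countmin_recover` are arbitrary choices for illustration.

```python
import random

def countmin_recover(x, eps, reps=12, seed=0):
    """Return the coordinates whose CountMin estimate is at least eps/2.
    Assumes x is nonnegative, so every bucket sum overestimates its members."""
    n = len(x)
    B = max(1, round(8 / eps))           # B = Theta(1/eps) buckets per repetition
    rng = random.Random(seed)
    estimates = [float("inf")] * n
    for _ in range(reps):                # O(log n) independent repetitions
        h = [rng.randrange(B) for _ in range(n)]  # random partition into B buckets
        buckets = [0.0] * B
        for i, xi in enumerate(x):
            buckets[h[i]] += xi          # each bucket sum is one linear measurement
        for i in range(n):
            estimates[i] = min(estimates[i], buckets[h[i]])
    return [i for i in range(n) if estimates[i] >= eps / 2]
```

This uses $\Theta(\log n/\varepsilon)$ measurements in total, which is the cost the next slides try to beat.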

Second Attempt. But we wanted $\tilde O(1/\varepsilon^{1/2})$ measurements, and the error in a bucket is $1/B$, so we need $B \approx 1/\varepsilon$. What about CountSketch [CCF-C]? Give each coordinate $i$ a random sign $\sigma(i) \in \{-1, 1\}$. Randomly partition the coordinates into $B$ buckets and maintain $\sum_{i:\,h(i) = j} \sigma(i) \cdot x_i$ in the $j$-th bucket. The bucket error is $(\sum_{i \ne \text{top}} x_i^2 / B)^{1/2}$. Is this better?
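A minimal CountSketch in Python, following the description above: random bucket map $h$, random signs $\sigma$, and signed bucket sums, with a coordinate estimated by the median of $\sigma(i)$ times its bucket over independent tables. Explicit random lists stand in for hash functions, and the names `countsketch` and `estimate` are hypothetical.

```python
import random
import statistics

def countsketch(x, B, reps, seed=0):
    """Build `reps` independent CountSketch tables with B buckets each.
    Each table stores the bucket map h, the signs sigma, and the bucket sums
    sum_{i: h(i) = j} sigma(i) * x_i (one linear measurement per bucket)."""
    n = len(x)
    rng = random.Random(seed)
    tables = []
    for _ in range(reps):
        h = [rng.randrange(B) for _ in range(n)]        # random partition into buckets
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # random signs
        buckets = [0.0] * B
        for i, xi in enumerate(x):
            buckets[h[i]] += sigma[i] * xi
        tables.append((h, sigma, buckets))
    return tables

def estimate(tables, i):
    """Median over tables of sigma(i) times the bucket that contains i."""
    return statistics.median(sigma[i] * buckets[h[i]] for h, sigma, buckets in tables)
```

The random signs make the other coordinates in a bucket cancel in expectation, which is why the error scales with the $\ell_2$ norm of the tail rather than its $\ell_1$ norm.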

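CountSketch. Bucket error $\mathrm{Err} = (\sum_{i \ne \text{top}} x_i^2 / B)^{1/2}$. All $|x_i| \le \varepsilon$ and $|x - x_{\text{top}}|_1 = 1$, so $\sum_{i \ne \text{top}} x_i^2 \le (1/\varepsilon)\cdot\varepsilon^2 = \varepsilon$. So $\mathrm{Err} \le (\varepsilon/B)^{1/2}$, which needs to be at most $\varepsilon$. Solving, $B \ge 1/\varepsilon$. CountSketch isn't better than CountMin.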

Main Idea. We insist on using CountSketch with $B = 1/\varepsilon^{1/2}$. Suppose $\mathrm{Err} = (\sum_{i \ne \text{top}} x_i^2 / B)^{1/2} = \varepsilon$. This means $\sum_{i \ne \text{top}} x_i^2 = \varepsilon^{3/2}$. Forget about $x_{\text{top}}$! Let's make up the mass another way.

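Main Idea. We have $\sum_{i \ne \text{top}} x_i^2 = \varepsilon^{3/2}$. Intuition: suppose all $x_i$, $i \ne \text{top}$, are the same or 0. Then (# non-zero) · value $= 1$ and (# non-zero) · value² $= \varepsilon^{3/2}$. Hence value $= \varepsilon^{3/2}$ and # non-zero $= 1/\varepsilon^{3/2}$. Sample an $\varepsilon$-fraction of the coordinates uniformly at random! Then value $= \varepsilon^{3/2}$ and # non-zero sampled $= 1/\varepsilon^{1/2}$, so the $\ell_1$-contribution is $\varepsilon$. Find all the non-zeros with $\tilde O(1/\varepsilon^{1/2})$ measurements.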
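The arithmetic of the uniform case can be checked directly. This Python sketch (hypothetical function `uniform_tail_example`) takes $\ell_1$ tail mass 1 and squared $\ell_2$ tail mass $\varepsilon^{3/2}$, and returns the common value, the number of nonzeros, the expected number kept by $\varepsilon$-rate sampling, and their $\ell_1$ contribution.

```python
def uniform_tail_example(eps):
    """Tail of equal nonzero values with l1 mass 1 and squared l2 mass eps**1.5."""
    l1_mass, l2_sq_mass = 1.0, eps ** 1.5
    value = l2_sq_mass / l1_mass       # (#nz * v^2) / (#nz * v) = v
    num_nonzero = l1_mass / value      # #nz * v = 1
    sampled = eps * num_nonzero        # expected nonzeros kept at rate eps
    sampled_l1 = sampled * value       # l1 contribution of the sampled nonzeros
    return value, num_nonzero, sampled, sampled_l1
```

For $\varepsilon = 1/16$ this gives value $1/64$, 64 nonzeros, 4 sampled, and $\ell_1$ contribution $1/16 = \varepsilon$.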

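General Setting. Suppose $\sum_{i \ne \text{top}} x_i^2 = \varepsilon^{3/2}$, and let $S_j = \{i : 1/4^j < x_i^2 \le 1/4^{j-1}\}$. Then $\sum_{i \ne \text{top}} x_i^2 = \varepsilon^{3/2}$ implies there is a $j$ for which $|S_j|/4^j = \tilde\Omega(\varepsilon^{3/2})$.

[Figure: groups of coordinates with values $\varepsilon^{3/2}, \dots, \varepsilon^{3/2}$; $4\varepsilon^{3/2}, \dots, 4\varepsilon^{3/2}$; $16\varepsilon^{3/2}, \dots, 16\varepsilon^{3/2}$; …; up to $\varepsilon^{3/4}$.]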
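A Python sketch of the level decomposition: each nonzero tail entry is assigned to the $j$ with $1/4^j < x_i^2 \le 1/4^{j-1}$, and the level maximizing $|S_j|/4^j$ is reported. The names `level_of` and `heaviest_level` are hypothetical, and entries are assumed to satisfy $0 < |x_i| \le 1$.

```python
from collections import defaultdict

def level_of(xi):
    """Index j with 1/4**j < xi**2 <= 1/4**(j-1); assumes 0 < |xi| <= 1."""
    x2 = xi * xi
    j = 1
    while x2 <= 4.0 ** (-j):
        j += 1
    return j

def heaviest_level(tail):
    """Group nonzero tail entries into the sets S_j and return the j that
    maximizes |S_j| / 4**j, together with all the groups (lists of indices)."""
    levels = defaultdict(list)
    for i, xi in enumerate(tail):
        if xi != 0:
            levels[level_of(xi)].append(i)
    best = max(levels, key=lambda j: len(levels[j]) / 4.0 ** j)
    return best, levels
```

For example, 64 entries of value $1/8$ land in $S_4$ with score $64/4^4 = 1/4$, beating 3 entries of value $1/2$ in $S_2$ with score $3/16$.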

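General Setting. If $|S_j| < 1/\varepsilon^{1/2}$, then $1/4^j > \varepsilon^2$, so $1/2^j > \varepsilon$; this can't happen, since every $|x_i| \le \varepsilon$. Otherwise, sample at rate $1/(|S_j|\varepsilon^{1/2})$ to get $1/\varepsilon^{1/2}$ elements of $S_j$; the $\ell_1$-mass of $S_j$ in the sample is $> \varepsilon$. Can we find the sampled elements of $S_j$? Use $\sum_{i \ne \text{top}} x_i^2 = \varepsilon^{3/2}$: the $\ell_2^2$ of the sample is about $\varepsilon^{3/2} \cdot 1/(|S_j|\varepsilon^{1/2}) = \varepsilon/|S_j|$. Using CountSketch with $1/\varepsilon^{1/2}$ buckets, the bucket error is $\sqrt{\varepsilon^{1/2} \cdot \varepsilon^{3/2} \cdot 1/(|S_j|\varepsilon^{1/2})} = \sqrt{\varepsilon^{3/2}/|S_j|}$, which is about $1/2^j$, the magnitude of the elements of $S_j$, because $|S_j|/4^j \approx \varepsilon^{3/2}$.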
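To check the chain of identities, this Python sketch (hypothetical `level_sampling_parameters`) assumes $|S_j|/4^j$ equals $\varepsilon^{3/2}$ exactly, ignoring the logarithmic factors hidden in $\tilde\Omega$, and evaluates each quantity from the slide.

```python
import math

def level_sampling_parameters(eps, size_sj):
    """Quantities from the slide, for a level S_j with |S_j| / 4**j = eps**1.5."""
    four_to_j = size_sj / eps ** 1.5              # 4**j, from |S_j| / 4**j = eps^{3/2}
    magnitude = 1 / math.sqrt(four_to_j)          # 1/2**j, the scale of entries of S_j
    rate = 1 / (size_sj * math.sqrt(eps))         # sampling rate 1/(|S_j| eps^{1/2})
    expected_sampled = size_sj * rate             # = 1/eps^{1/2}
    sample_l2_sq = eps ** 1.5 * rate              # = eps/|S_j|
    buckets = 1 / math.sqrt(eps)                  # CountSketch with 1/eps^{1/2} buckets
    bucket_error = math.sqrt(sample_l2_sq / buckets)  # = sqrt(eps^{3/2}/|S_j|)
    return magnitude, expected_sampled, sample_l2_sq, bucket_error
```

With $\varepsilon = 1/16$ and $|S_j| = 16$ (so $j = 5$), the sample has 4 expected elements, $\ell_2^2 = 1/256 = \varepsilon/|S_j|$, and bucket error $1/32 = 1/2^j$.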

Algorithm Wrap-up. Sub-sample $O(\log(1/\varepsilon))$ times in powers of 2. In each level of sub-sampling, maintain a CountSketch with $\tilde O(1/\varepsilon^{1/2})$ buckets. Find as many heavy coordinates as you can! Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling. This wouldn't work for CountMin, as the bucket error could be $\varepsilon$ because of $n-1$ items each of value $\varepsilon/(n-1)$.
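The nested sub-sampling can be sketched in Python by drawing one uniform number $u_i$ per coordinate and keeping $i$ at level $\ell$ iff $u_i < 2^{-\ell}$, so each level is a subset of the one before. This is an assumption about how the powers-of-2 levels are realized; the name `subsample_levels` is hypothetical, and the per-level CountSketch is omitted.

```python
import math
import random

def subsample_levels(n, eps, seed=0):
    """Nested subsamples at rates 1, 1/2, 1/4, ..., down to about eps.
    Coordinate i survives to level l iff u_i < 2**-l, so the levels are nested."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    num_levels = math.ceil(math.log2(1 / eps)) + 1   # O(log(1/eps)) levels
    return [[i for i in range(n) if u[i] < 2.0 ** -l] for l in range(num_levels)]
```

In the scheme described above, each returned level would feed its own CountSketch, and the heavy coordinates found at every level would be collected.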

Talk Outline. (1) $\tilde O(k/\varepsilon^{1/2})$ upper bound for $p = 1$; (2) lower bounds.

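Our Results. General results: $\tilde\Omega(k/\varepsilon^{1/2})$ for $p = 1$; $\Omega(k\log(n/k)/\varepsilon)$ for $p = 2$. Sparse output: $\tilde\Omega(k/\varepsilon)$ for $p = 1$; $\tilde\Omega(k/\varepsilon^2)$ for $p = 2$. Deterministic: $\Omega(k\log(n/k)/\varepsilon)$ for $p = 1$.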

Simultaneous Communication Complexity. Alice holds $x$ and Bob holds $y$. Alice and Bob each send a single message, $M_A(x)$ and $M_B(y)$ respectively, to the referee, who outputs $f(x, y)$ with constant probability. The communication cost $CC(f)$ is the maximum message length, over the randomness of the protocol and all possible inputs. The parties share randomness. What is $f(x, y)$?

Reduction to Compressed Sensing. Shared randomness decides the matrix $A$. Alice sends $Ax$ to the referee, and Bob sends $Ay$ to the referee. The referee computes $A(x+y) = Ax + Ay$ and runs the compressed sensing recovery algorithm. If the output of the algorithm solves $f(x, y)$, then (# rows of $A$) · (# bits per measurement) $> CC(f)$.
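The reduction relies only on the linearity of $A$. This Python sketch assumes the shared randomness is a common seed from which both parties generate the same Gaussian matrix (the slide does not specify the distribution of $A$); `shared_matrix`, `sketch`, and `referee` are hypothetical names.

```python
import random

def shared_matrix(rows, n, seed):
    """Both parties derive the same Gaussian matrix A from the shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(rows)]

def sketch(A, v):
    """The message a party sends: the measurement vector A v."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def referee(msg_alice, msg_bob):
    # Linearity: Ax + Ay = A(x + y), the input the recovery algorithm expects
    return [a + b for a, b in zip(msg_alice, msg_bob)]
```

Each message has one entry per row of $A$, which is where the (# rows) · (# bits per measurement) term in the bound comes from.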

A Unified View. General results (Direct-Sum Gap-$\ell_1$): $\tilde\Omega(k/\varepsilon^{1/2})$ for $p = 1$; $\tilde\Omega(k/\varepsilon)$ for $p = 2$. Sparse output (Indexing): $\tilde\Omega(k/\varepsilon)$ for $p = 1$; $\tilde\Omega(k/\varepsilon^2)$ for $p = 2$. Deterministic (Equality): $\Omega(k\log(n/k)/\varepsilon)$ for $p = 1$. Tighter log factors are achievable by looking at Gaussian channels.

General Results: k = 1, p = 1. Alice and Bob have $x$ and $y$, respectively, in $\mathbb{R}^m$. There is a unique $i^*$ for which $(x+y)_{i^*} = d$. For all $j \ne i^*$, $(x+y)_j \in \{0, c, -c\}$, where $|c| < |d|$. Finding $i^*$ requires $\Omega(m/(d/c)^2)$ communication [SS, BJKS]. With $m = 1/\varepsilon^{3/2}$, $c = \varepsilon^{3/2}$, and $d = \varepsilon$, we need $\Omega(1/\varepsilon^{1/2})$ communication.
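Plugging the slide's parameters into the Gap-$\ell_1$ bound, with constants dropped; `gap_l1_instance` is a hypothetical helper written in Python.

```python
def gap_l1_instance(eps):
    """Parameters of the k = 1, p = 1 hard instance and the resulting bound."""
    m = eps ** -1.5            # universe size m = 1/eps^{3/2}
    c = eps ** 1.5             # non-special entries of x + y lie in {0, c, -c}
    d = eps                    # (x + y)_{i*} = d
    bound = m / (d / c) ** 2   # Omega(m / (d/c)^2) communication, constants dropped
    return m, c, d, bound
```

Since $(d/c)^2 = 1/\varepsilon$, the bound is $\varepsilon^{-3/2}\cdot\varepsilon = \varepsilon^{-1/2}$. If every coordinate other than $i^*$ is nonzero, the tail has $\ell_1$ mass $mc = 1$, matching the normalization used in the upper bound.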

General Results: k = 1, p = 1. But the compressed sensing algorithm doesn't need to find $i^*$. If it doesn't, then it needs to transmit a lot of information about the tail. The tail is a random low-weight vector in $\{0, \varepsilon^{3/2}, -\varepsilon^{3/2}\}^{1/\varepsilon^3}$; the argument uses a distributional lower bound and RS codes. Sending a vector $y$ within $1-\varepsilon$ of the tail in $\ell_1$-norm needs $1/\varepsilon^{1/2}$ communication.

General Results: k = 1, p = 2. Same argument, different parameters: $\Omega(1/\varepsilon)$ communication. What about general $k$?

Handling General k. Bounded Round Direct Sum Theorem [BR] (with a slight modification): given $k$ copies of a function $f$, with input pairs independently drawn from $\mu$, solving a $2/3$ fraction of them needs $\Omega(k \cdot CC_\mu(f))$ communication.

[Figure: instance for $p = 1$, with entries labeled $\varepsilon^{3/2}, \dots, \varepsilon^{3/2}$ and $\varepsilon^{1/2}$, and a brace marking $k$ copies.]

Handling General k. $CC = \Omega(k/\varepsilon^{1/2})$ for $p = 1$, and $CC = \Omega(k/\varepsilon)$ for $p = 2$. What is implied about compressed sensing?

Rounding Matrices [DIPW]. $A$ is a matrix of real numbers; we can assume it has orthonormal rows. Round the entries of $A$ to $O(\log n)$ bits, obtaining a matrix $A'$. Careful: $A'x = A(x+s)$ for a small $s$, but $s$ depends on $A$, so there is no guarantee that recovery works. This can be fixed by looking at $A(x+s+u)$ for a random $u$.
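With orthonormal rows, $AA^T = I$, so one valid choice of shift is $s = A^T(A' - A)x$: then $A(x+s) = Ax + (A' - A)x = A'x$. This Python sketch checks that identity numerically; the Gram-Schmidt orthonormalization, the 20-bit rounding, and the helper names are assumptions for illustration.

```python
import random

def gram_schmidt(rows):
    """Orthonormalize the rows (assumed linearly independent)."""
    basis = []
    for v in rows:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def round_entries(M, bits):
    """Round every entry to the nearest multiple of 2**-bits."""
    scale = 2 ** bits
    return [[round(a * scale) / scale for a in row] for row in M]

def rounding_shift(A, A_rounded, x):
    """s = A^T (A' - A) x; with orthonormal rows, A(x + s) = A'x."""
    diff = [p - q for p, q in zip(matvec(A_rounded, x), matvec(A, x))]
    return [sum(A[r][j] * diff[r] for r in range(len(A))) for j in range(len(x))]
```

The shift is tiny because each entry moves by at most $2^{-(\text{bits}+1)}$, but it is a function of $x$ and $A$, which is exactly the dependence the slide warns about.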

Lower Bounds for Compressed Sensing. (# rows of $A$) · (# bits per measurement) $> CC(f)$. By rounding, # bits per measurement $= O(\log n)$. In our hard instances, the universe size is $\mathrm{poly}(k/\varepsilon)$, so (# rows of $A$) · $O(\log(k/\varepsilon)) > CC(f)$. Hence # rows of $A = \tilde\Omega(k/\varepsilon^{1/2})$ for $p = 1$ and # rows of $A = \tilde\Omega(k/\varepsilon)$ for $p = 2$.

Sparse-Output Results. Sparse output (Indexing): $\tilde\Omega(k/\varepsilon)$ for $p = 1$; $\tilde\Omega(k/\varepsilon^2)$ for $p = 2$.

Sparse Output Results: Indexing. Alice has $x \in \{0,1\}^n$; Bob has $i \in \{1, 2, \dots, n\}$. What is $x_i$? $CC(\text{Indexing}) = \Omega(n)$.

$\Omega(1/\varepsilon)$ Bound for k = 1, p = 1. Let $x \in \{-\varepsilon, \varepsilon\}^{1/\varepsilon}$ and $y = e_i$, and consider $x + y$. If the output is required to be 1-sparse, it must place its mass on the $i$-th coordinate, and that mass must be close to $1+\varepsilon$ if $x_i = \varepsilon$ and close to $1-\varepsilon$ otherwise. This generalizes to $k > 1$ to give $\tilde\Omega(k/\varepsilon)$, and to $p = 2$ to give $\tilde\Omega(k/\varepsilon^2)$.
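The decoding step can be checked by brute force. The tail of $x + y$ has $\ell_1$ mass $(1/\varepsilon - 1)\varepsilon = 1 - \varepsilon$, so an output $a e_i$ has error $|1 \pm \varepsilon - a| + (1 - \varepsilon)$, and staying within $(1+\varepsilon)(1-\varepsilon)$ forces $|1 \pm \varepsilon - a| \le \varepsilon - \varepsilon^2$, which reveals the sign of $x_i$ through whether $a > 1$. The Python sketch below assumes the output is supported on coordinate $i$, as argued above; `one_sparse_error` and `decode_bit` are hypothetical names.

```python
def one_sparse_error(z, i, a):
    """l1 error of the 1-sparse output a * e_i against the vector z = x + e_i."""
    return sum(abs(zj - (a if j == i else 0.0)) for j, zj in enumerate(z))

def decode_bit(a):
    """Mass above 1 means x_i = +eps; mass below 1 means x_i = -eps."""
    return 1 if a > 1.0 else -1
```

Every admissible output decodes the bit correctly, so the referee learns $x_i$, which is an instance of Indexing on $1/\varepsilon$ bits.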

Deterministic Results. Deterministic (Equality): $\Omega(k\log(n/k)/\varepsilon)$ for $p = 1$.

Deterministic Results: Equality. Alice has $x \in \{0,1\}^n$; Bob has $y \in \{0,1\}^n$. Is $x = y$? Deterministic $CC(\text{Equality}) = \Omega(n)$.

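$\Omega(k\log(n/k)/\varepsilon)$ for p = 1. Choose $\log n$ signals $x^1, \dots, x^{\log n}$, each with $k/\varepsilon$ values equal to $\varepsilon/k$, and let $x = \sum_{i=1}^{\log n} 10^i x^i$. Choose $\log n$ signals $y^1, \dots, y^{\log n}$, each with $k/\varepsilon$ values equal to $\varepsilon/k$, and let $y = \sum_{i=1}^{\log n} 10^i y^i$. Consider $x - y$: the compressed sensing output is $0^n$ iff $x = y$.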
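A Python sketch of the encoding $x = \sum_i 10^i x^i$ using exact rational arithmetic, so that $x - y$ is exactly zero when the signals agree. The function `encode` and the representation of each $x^i$ by its support are hypothetical; each support is assumed to have $k/\varepsilon$ indices.

```python
from fractions import Fraction

def encode(supports, n, k, eps):
    """x = sum_{i=1}^{log n} 10**i * x^i, where x^i has value eps/k on each
    index in supports[i-1] (each support is assumed to have k/eps indices)."""
    x = [Fraction(0)] * n
    for level, support in enumerate(supports, start=1):
        for j in support:
            x[j] += Fraction(10) ** level * eps / k
    return x
```

Each level contributes $\ell_1$ mass $(k/\varepsilon)\cdot(\varepsilon/k)\cdot 10^i = 10^i$, so the geometric weights keep the levels at very different scales.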

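General Results: Gaussian Channels (k = 1, p = 2). Alice has a signal $x = \varepsilon^{1/2} e_i$ for a random $i \in [n]$, and transmits $x$ over a noisy channel with independent $N(0, 1/n)$ noise $w$ on each coordinate. Consider any row vector $a$ of $A$. The channel output is $\langle a, x\rangle + \langle a, w\rangle$, where $\langle a, w\rangle$ is $N(0, |a|_2^2/n)$, and $E_i[\langle a, x\rangle^2] = \varepsilon|a|_2^2/n$. By the Shannon-Hartley theorem, $I(i; \langle a, x\rangle + \langle a, w\rangle) = I(\langle a, x\rangle; \langle a, x\rangle + \langle a, w\rangle) \le \tfrac{1}{2}\log(1+\varepsilon) = O(\varepsilon)$.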
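The signal-to-noise computation behind this bound, as a Python sketch: for any row $a$, the ratio of $E_i[\langle a, x\rangle^2]$ to the noise variance is exactly $\varepsilon$, and $\tfrac12\ln(1+\varepsilon) \le \varepsilon/2$. The function `gaussian_channel_bound` is hypothetical and reports the bound in nats.

```python
import math

def gaussian_channel_bound(a, eps):
    """For x = sqrt(eps) * e_i with i uniform over [n] and N(0, 1/n) noise per
    coordinate, return (snr, 0.5 * ln(1 + snr)) for the measurement <a, x + noise>."""
    n = len(a)
    norm_sq = sum(aj * aj for aj in a)
    signal_power = eps * norm_sq / n      # E_i[<a, x>^2] = eps * |a|_2^2 / n
    noise_power = norm_sq / n             # <a, noise> ~ N(0, |a|_2^2 / n)
    snr = signal_power / noise_power
    return snr, 0.5 * math.log(1 + snr)   # Shannon-Hartley bound, in nats
```

The row $a$ cancels out of the ratio, so no single measurement can carry more than $O(\varepsilon)$ information about $i$, however $A$ is chosen.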

Summary of Results. General results: $\tilde\Theta(k/\varepsilon^{p/2})$. Sparse output: $\tilde\Theta(k/\varepsilon^p)$. Deterministic: $\Theta(k\log(n/k)/\varepsilon)$ for $p = 1$.
