Sudocodes: Fast Measurement and Reconstruction of Sparse Signals
Shriram Sarvotham, Dror Baron, Richard Baraniuk
ECE Department, Rice University
dsp.rice.edu/cs
(Speaker note: this came out of my personal experience with 301, Fourier analysis and linear systems.)
Sparse Signal Acquisition
Consider a length-N signal x that contains only K << N non-zero coefficients. Are there efficient ways to measure and recover x?
- Traditional DSP approach: acquisition obtains N measurements; sparsity is exploited only in the processing stage.
- New Compressed Sensing (CS) approach: acquisition obtains just M << N measurements; sparsity is exploited during signal acquisition. [Candes et al.; Donoho]
CS Revelation
Measure the signal with a few random linear projections (inner products): y = Phi x, where y holds the M measurements, x is the sparse signal, and K is the information rate.
Revelation: a small M is sufficient to encode x.
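A minimal sketch of this acquisition step in NumPy (the sizes N, K, M below are illustrative choices, not figures from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 1000, 10, 200            # illustrative: signal length, sparsity, measurements

# K-sparse signal: K nonzero coefficients at random locations
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

# Random measurement matrix: each row defines one random linear projection
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Acquisition: M inner products -- far fewer numbers than the N samples of x
y = Phi @ x
```

Each entry of y is one inner product of x with a random direction; the whole of x is never sampled directly.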
CS Reconstruction
Reconstruct x given y = Phi x. Phi has fewer rows than columns, so this is an ill-posed inverse problem.
Reconstruction approach: search over the subspace of explanations consistent with the measurements and find the most likely explanation. Sparsity serves as a strong prior.
CS Performance Metrics
- Efficiency in encoding: how small can we push M?
- Reconstruction complexity: critical for a practical decoder.
Reconstruction: Traditional L2 Approach
Goal: given measurements y, find signal x. Fewer rows than columns in the measurement matrix; ill-posed, with infinitely many solutions.
Classical solution: least squares, i.e. the minimum-L2-norm solution x_hat = Phi^T (Phi Phi^T)^{-1} y.
Problem: a small L2 norm doesn't imply sparsity.
Reconstruction: L0 Approach
Modern solution: exploit the sparsity of x. Of the infinitely many solutions, seek the sparsest one: minimize ||x||_0 subject to y = Phi x, where ||x||_0 is the number of nonzero entries.
If M is only slightly larger than K, then perfect reconstruction w/ high probability. [Bresler et al.; Wakin et al.]
Performance: the most efficient encoding, but combinatorial computational complexity.
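The combinatorial cost is visible in a toy brute-force L0 search (tiny illustrative sizes; a realistic instance is intractable, which is the slide's point):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N, K = 8, 2
M = K + 1                          # very few random measurements

x = np.zeros(N)
x[[2, 5]] = [1.5, -0.7]            # the hidden 2-sparse signal
Phi = rng.standard_normal((M, N))
y = Phi @ x

# L0 search: try every size-K support, keep the one that explains y exactly
x_hat = None
for S in combinations(range(N), K):
    cols = list(S)
    z, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
    if np.allclose(Phi[:, cols] @ z, y, rtol=0, atol=1e-8):  # exact fit found
        x_hat = np.zeros(N)
        x_hat[cols] = z
        break
```

The loop visits up to C(N, K) supports, which is exactly what makes L0 reconstruction combinatorial.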
The CS Miracle: L1
Modern solution: exploit the sparsity of x. Of the infinitely many solutions, seek the one with the smallest L1 norm: minimize ||x||_1 subject to y = Phi x.
If M = O(K log(N/K)), then perfect reconstruction w/ high probability. [Candes et al.; Donoho]
Performance: efficient encoding, and polynomial O(N^3) computational complexity via linear programming.
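The L1 problem can be posed as a linear program via the standard split x = u - v with u, v >= 0. A sketch with illustrative sizes, using SciPy's generic LP solver (not the talk's implementation):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
N, K, M = 60, 3, 25                 # illustrative sizes

x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N))
y = Phi @ x

# min ||x||_1  s.t.  Phi x = y, rewritten with x = u - v, u, v >= 0:
#   min 1'u + 1'v  s.t.  [Phi, -Phi] [u; v] = y
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]
```

The LP has 2N variables and M equality constraints, which is where the polynomial-in-N cost of L1 reconstruction comes from.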
But... L1 Is Still Inadequate!
L1 minimization is still impractical for many applications. Reconstruction times:
- N=1,000: t = 10 seconds
- N=10,000: t = 3 hours
- N=100,000: t = 140 days
Examples where N is this large are not uncommon; L1 is impractical. We need new measurement and reconstruction strategies.
This is where Sudocodes come in!
Sudocodes: Overview
- Efficient encoding (small M)
- Low reconstruction complexity
Numerical results are phenomenal. Example: N=100,000, K=1,000 reconstructs in t=5.47 seconds from M=5,132 measurements.
Drawback: works only for a specific signal class.
Signal Model
The signal x contains exactly K non-zero coefficients.
Condition on the non-zero coefficients of x: let S = the set of non-zero coefficients of x; the sum of any subset of S is unique up to the measurement precision.
This holds with high probability when the non-zero coefficients are drawn from a continuous distribution; otherwise, pre-process the signal by dithering.
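The distinct-subset-sums condition is easy to check numerically for a small draw (illustrative example with 6 coefficients):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
vals = rng.standard_normal(6)       # nonzero coefficients from a continuous distribution

# Enumerate all 2^6 subset sums; with probability 1 they are pairwise distinct
n = len(vals)
subset_sums = sorted(vals[list(S)].sum() for r in range(n + 1)
                     for S in combinations(range(n), r))
min_gap = min(b - a for a, b in zip(subset_sums, subset_sums[1:]))
```

If the coefficients instead came from a discrete alphabet (where subset sums collide), adding a small continuous dither before measuring restores the property, as the slide suggests.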
Sudocode Strategy
The measurement matrix Phi is a sparse 0/1 matrix: each row of Phi contains L randomly placed 1's, with L chosen based on N and K.
The special structure of Phi enables fast measurement and reconstruction of y = Phi x: y holds the M measurements, x is the sparse signal with K nonzero entries, and Phi is the sparse 0/1 matrix.
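Building such a matrix is straightforward; the value of L here is a hypothetical choice on the order of N/K, in the spirit of the design discussion later in the talk:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, M = 1000, 20, 300             # illustrative sizes
L = N // (2 * K)                    # L = 25 here; a hypothetical O(N/K) choice

# Each row of Phi contains exactly L randomly placed 1's
Phi = np.zeros((M, N), dtype=np.int64)
for i in range(M):
    Phi[i, rng.choice(N, size=L, replace=False)] = 1

# K-sparse signal and its sudocode measurements
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
y = Phi @ x                          # each y(i) is a sum of at most L coefficients
```

Because each row is a plain sum of L entries of x, measurement needs no multiplications, which is part of what makes acquisition fast.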
Sudocode Reconstruction
Process each measurement y(i) in succession: can the value of y(i) resolve any coefficient(s) of x?
Case 1: Zero Measurement
Inference: all coefficients involved in the measurement are zero, so up to L coefficients can be resolved with one measurement.
Recovered coefficients and the corresponding columns of Phi can be ignored in the remaining processing.
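Case 1 as a tiny worked example (hypothetical toy row over a 6-sample signal):

```python
import numpy as np

# One measurement row touching coefficients {0, 3, 4}
phi_row = np.array([1, 0, 0, 1, 1, 0])
x = np.array([0.0, 2.5, 0.0, 0.0, 0.0, -1.0])
y0 = phi_row @ x                     # = x[0] + x[3] + x[4] = 0

resolved = {}                        # index -> recovered value
if y0 == 0:
    # A zero sum of distinct-subset-sum coefficients forces each one to zero
    for j in np.flatnonzero(phi_row):
        resolved[j] = 0.0
```

One zero measurement resolves up to L coefficients at once, which is why zero measurements are so valuable for sparse signals.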
Case 2: #(support set) = 1
When a row of Phi contains only one non-zero entry (here, row 2), the measurement trivially gives the value of the corresponding coefficient, which is thereby resolved.
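Case 2, again as a hypothetical toy example:

```python
import numpy as np

phi_row = np.array([0, 1, 0, 0])     # a singleton row: exactly one 1
x = np.array([0.0, 3.2, 0.0, -1.1])
y0 = phi_row @ x

resolved = {}
support = np.flatnonzero(phi_row)
if support.size == 1:
    # The measurement equals the lone coefficient it touches
    resolved[int(support[0])] = y0
```

Note that the support here is the row's support after already-resolved columns have been removed, so singleton rows become more common as decoding progresses.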
Case 3: Matching Measurements
Inference: matching measurements come from summing the same set of non-zero coefficients (by the distinct-subset-sums property).
Identify the disjoint support and the common support: coefficients in the disjoint support must be zero, which helps resolve coefficients.
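Case 3 worked on a hypothetical pair of rows: equal nonzero measurement values imply both rows see the same set of nonzero coefficients, so every index in the symmetric difference of their supports must be zero.

```python
import numpy as np

Phi = np.array([[1, 1, 0, 1, 0],     # row 0 touches {0, 1, 3}
                [0, 1, 1, 1, 0]])    # row 1 touches {1, 2, 3}
x = np.array([0.0, 2.0, 0.0, 1.5, 0.0])
y = Phi @ x                          # both measurements equal 3.5

resolved = {}
common = set()
if y[0] != 0 and y[0] == y[1]:
    s0 = set(np.flatnonzero(Phi[0]).tolist())
    s1 = set(np.flatnonzero(Phi[1]).tolist())
    # Disjoint support: indices seen by one row but not the other -> zero
    for j in s0 ^ s1:
        resolved[j] = 0.0
    # Common support: indices seen by both rows; their values sum to y[0]
    common = s0 & s1
```

If the common support later shrinks to a single index (e.g. after further decoding), its value is given directly by the shared measurement.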
Sudoku Puzzles
The name "Sudocodes" is inspired by Sudoku puzzles. Thanks to Ingrid Daubechies for pointing out the connection.
Two-Phase Decoding
Part of x is left unresolved by the sparse measurements.
- Phase 1: decode most of the coefficients from the sparse measurements.
- Phase 2: decode the remaining coefficients.
Why? When most coefficients are already decoded, Phase 2 saves a factor in the number of measurements.
Phase 2 Measurements and Decoding
The Phase 2 measurement matrix is non-sparse (dense) and of small dimension.
Resolve the remaining coefficients by inverting the sub-matrix of the Phase 2 matrix restricted to the unresolved coefficients.
Phase 2 complexity is cubic in the number of unresolved coefficients. Key: choose the phase split so that the Phase 2 complexity stays small.
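A sketch of the Phase 2 step, assuming Phase 1 left a short list of unresolved indices (names and sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
unresolved = [2, 5, 7]               # indices Phase 1 could not resolve
x = np.zeros(N)
x[unresolved] = [1.0, -2.0, 0.5]     # already-resolved coefficients are zero here,
                                     # so no correction of y2 is needed in this toy

# Dense Phase 2 measurements: one row per unresolved coefficient
Phi2 = rng.standard_normal((len(unresolved), N))
y2 = Phi2 @ x

# Invert the square sub-matrix restricted to the unresolved columns
x_rest = np.linalg.solve(Phi2[:, unresolved], y2)
```

In general the contributions of coefficients already resolved in Phase 1 would first be subtracted from y2; solving the remaining k x k system costs O(k^3), which is cheap when Phase 1 has resolved almost everything.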
Accelerated Decoding I: Fast Matching
Use a binary search tree to store the measurements; searching for a matching measurement then costs O(log M) per lookup.
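A sketch of the matching structure, using a sorted list with `bisect` as a stand-in for the binary search tree (the tolerance and function names are assumptions, not from the talk):

```python
import bisect

store = []                            # sorted (measurement value, row index) pairs

def find_match(val, row, tol=1e-9):
    """Return the row of an earlier measurement equal to val (up to tol);
    otherwise remember (val, row) and return None. Lookup is O(log M)."""
    i = bisect.bisect_left(store, (val, row))
    for j in (i - 1, i):              # a match, if any, is adjacent to position i
        if 0 <= j < len(store) and abs(store[j][0] - val) <= tol:
            return store[j][1]
    store.insert(i, (val, row))       # O(M) for a Python list; a real BST keeps O(log M)
    return None
```

Processing measurements 3.5, 1.2, then 3.5 again reports that the third matches the first, triggering the Case 3 inference.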
Accelerated Decoding II: Avalanche
If a coefficient is resolved, search past measurements for potential coefficient revelations: each newly resolved coefficient can unlock further measurements, which in turn resolve more coefficients.
Design of the Sudo Measurement Matrix
Choice of L: set L based on N and K. For large N, L scales on the order of N/K.
Number of Measurements
Theorem: with this choice of L, Phase 1 exactly reconstructs the coefficients with high probability using a modest number of measurements M.
Proof sketch:
Choice of L
(Plot: required M versus L for K = 0.02N, for a given choice of N and K.)
Choice of L
Numerical evidence also suggests L = O(N/K).
Related Work
- [Cormode, Muthukrishnan]: a CS scheme based on group testing. M = O(K log^2 N); complexity O(K log^2 N).
- [Gilbert et al.] Chaining Pursuit: a CS scheme based on group testing and iterating the solution. Complexity O(K log^2 N log^2 K). Works best for super-sparse signals.
Performance Comparison
                       Chaining Pursuit           Sudocodes
N=10,000,  K=10        M=5,915,  t=0.16 sec       M=461,   t=0.14 sec
N=10,000,  K=100       M=90,013, t=2.43 sec       M=803,   t=0.37 sec
N=100,000              M=17,398, t=1.13 sec       M=931,   t=1.09 sec
N=100,000, K=1,000     M>10^6,   t>30 sec         M=5,132, t=5.47 sec
Chaining Pursuit works admirably for small K, but its oversampling factor is huge (efficiency is low); it also works for compressible signals.
Sudocodes: very efficient yet fast reconstruction, but works only on a restricted class of signals.
Sudocode Applications
- Erasure codes in P2P and distributed file storage
- Streaming compressed digital content
- Thresholded DCT/wavelet coefficients for sudocoding
- Partial reconstruction of signals (e.g., detection)
Ongoing Work
- Exploit statistical dependencies between non-zero coefficients
- Adaptive linear projections
- Algorithms to handle noisy measurements
Conclusions
- Sudocodes are a highly efficient CS technique with low complexity
- Key idea: use a sparse Phi
- Numerical results are very encouraging
- Applications to erasure codes, P2P networks
- However: works only for a very specific sparse signal class