Compressible priors for high-dimensional statistics
Volkan Cevher, LIONS / Laboratory for Information and Inference Systems

Dimensionality Reduction
– Compressive sensing: non-adaptive measurements
– Sparse Bayesian learning: dictionary of features
– Information theory: coding frame
– Theoretical computer science: sketching matrix / expander

Dimensionality Reduction
Challenge:
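The challenge equation on this slide did not survive the transcript. A standard statement of the compressive-sensing setup the talk works with (my notation, not the slide's) is:

```latex
% Assumed measurement model: M non-adaptive linear measurements of a signal
% x in R^N, with M much smaller than N, possibly corrupted by noise n.
\[
  y = \Phi x + n, \qquad \Phi \in \mathbb{R}^{M \times N}, \quad M \ll N,
\]
% and the challenge is to recover x (approximately) from y and \Phi.
```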

Approaches
            Deterministic    Probabilistic
Prior       sparsity         compressibility
Metric                       likelihood function

Deterministic View

My Insights on Compressive Sensing
1. Sparse or compressible: not sufficient alone
2. Projection: information preserving (stable embedding / special null space)
3. Decoding algorithms: tractable

Deterministic Priors
Sparse signal: only K out of N coordinates nonzero
– model: union of all K-dimensional subspaces aligned with the coordinate axes
Example: a 2-sparse signal in 3 dimensions

Deterministic Priors
Sparse signal: only K out of N coordinates nonzero
– model: union of K-dimensional subspaces
Compressible signal: sorted coordinates decay rapidly to zero (power-law decay, e.g., wavelet coefficients)
– model: weak ℓp ball
– well-approximated by a K-sparse signal (simply by thresholding)
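Since the decay plot itself is not in the transcript, here is a minimal sketch (my own, with an illustrative decay exponent) of the thresholding statement above: for a signal with power-law-decaying sorted coefficients, keeping only the K largest entries already captures almost all of the energy.

```python
# Sketch: best K-term approximation by thresholding a compressible signal.
# The signal below is synthetic; the decay exponent 1.5 is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
N, K = 10_000, 100

# Synthetic compressible signal: sorted magnitudes decay like i^(-1.5).
magnitudes = np.arange(1, N + 1, dtype=float) ** -1.5
x = rng.permutation(magnitudes) * rng.choice([-1.0, 1.0], size=N)

# Best K-term approximation: keep the K largest-magnitude coordinates, zero the rest.
x_K = np.zeros_like(x)
largest = np.argsort(np.abs(x))[-K:]
x_K[largest] = x[largest]

rel_err = np.linalg.norm(x - x_K) / np.linalg.norm(x)
print(f"relative 2-norm error of the best {K}-term approximation: {rel_err:.3e}")
```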

Restricted Isometry Property (RIP)
Preserves the structure of sparse/compressible signals
RIP of order 2K: the distance between any two K-sparse signals x1 and x2 (points on the K-planes) is approximately preserved by the projection
A random Gaussian matrix has the RIP with high probability if M is on the order of K log(N/K)
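As a sanity check (mine, not from the slides), the sketch below draws a random Gaussian matrix with M on the order of K log(N/K) rows and measures how much it distorts distances between K-sparse vectors; all dimensions are illustrative.

```python
# Sketch: empirical near-isometry of a random Gaussian measurement matrix
# on differences of K-sparse vectors (which are at most 2K-sparse).
import numpy as np

rng = np.random.default_rng(1)
N, K = 1000, 10
M = int(4 * K * np.log(N / K))                      # heuristic M ~ K log(N/K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)      # scaling so E||Phi v||^2 = ||v||^2

ratios = []
for _ in range(200):
    v = np.zeros(N)
    support = rng.choice(N, size=2 * K, replace=False)
    v[support] = rng.standard_normal(2 * K)
    ratios.append(np.linalg.norm(Phi @ v) / np.linalg.norm(v))

print(f"M = {M}; distortion ratios lie in [{min(ratios):.2f}, {max(ratios):.2f}]")
```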

Robust Null Space Property (RNSP)
RNSP in the 1-norm (RNSP-1)
RNSP-1 <> instance optimality with respect to the best K-term approximation error
[Cohen, Dahmen, and DeVore; Xu and Hassibi; Davies and Gribonval]
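The slide's own formulas were lost in the transcript; a standard way to write the two quantities it names (generic constants ρ, τ) is:

```latex
% Assumed standard statement of RNSP-1 of order K: there exist 0 < \rho < 1
% and \tau > 0 such that, for every v \in \mathbb{R}^N and every index set S
% with |S| \le K,
\[
  \|v_S\|_1 \;\le\; \rho \,\|v_{S^c}\|_1 + \tau \,\|\Phi v\|_2 ,
\]
% and the best K-term approximation error in the 1-norm is
\[
  \sigma_K(x)_1 \;=\; \min_{\|z\|_0 \le K} \|x - z\|_1 .
\]
```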

Recovery Algorithms
Goal: given y = Φx + n, recover x
Convex optimization formulations
– basis pursuit, Lasso, scalarization …
– iterative re-weighted algorithms
Greedy algorithms: CoSaMP, IHT, SP
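Below is a minimal sketch (mine) of one of the greedy decoders listed above, iterative hard thresholding; the step size and iteration count are illustrative choices rather than the tuned variants discussed in the talk.

```python
# Sketch: iterative hard thresholding (IHT) for K-sparse recovery from y ~= Phi @ x.
import numpy as np

def iht(y, Phi, K, n_iter=300, step=None):
    """Return a K-sparse estimate of x."""
    if step is None:
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2     # conservative gradient step size
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = x + step * (Phi.T @ (y - Phi @ x))       # gradient step on 0.5*||y - Phi x||^2
        small = np.argsort(np.abs(x))[:-K]           # all but the K largest entries
        x[small] = 0.0                               # hard threshold
    return x

# Tiny noise-free demo with illustrative dimensions.
rng = np.random.default_rng(2)
N, M, K = 256, 80, 5
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
x_hat = iht(Phi @ x_true, Phi, K)
print("relative recovery error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```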

Performance of Recovery (q=1)
Tractability: polynomial time
Sparse signals: instance optimal
– noise-free measurements: exact recovery
– noisy measurements: stable recovery
Compressible signals: instance optimal
– recovery as good as the best K-sparse approximation (via RIP)
– CS recovery error ≤ signal K-term approximation error + noise
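The annotated inequality on this slide did not survive; the standard RIP-based guarantee that matches its labels (C and D are generic constants) is:

```latex
% Assumed form of the bound: CS recovery error, K-term approximation error, noise.
\[
  \underbrace{\|\hat{x} - x\|_2}_{\text{CS recovery error}}
  \;\le\;
  \underbrace{\frac{C}{\sqrt{K}}\,\sigma_K(x)_1}_{\text{signal $K$-term approx error}}
  \;+\;
  \underbrace{D\,\|n\|_2}_{\text{noise}} .
\]
```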

The Probabilistic View

Probabilistic View
Goal: given y = Φx + n, recover x
Prior: iid generalized Gaussian distribution (GGD)
(iid: independent and identically distributed)
Algorithms: via Bayesian inference arguments (q = 1 <> deterministic view)
– maximize prior
– prior thresholding
– maximum a posteriori (MAP)
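The densities and objectives were shown as equations on the slide; the standard form of the connection being invoked (my rendering) is that an iid GGD prior together with Gaussian noise turns MAP estimation into ℓq-regularized least squares:

```latex
% Assumed standard form: p(x_i) \propto \exp(-|x_i/\lambda|^q) and Gaussian noise give
\[
  \hat{x}_{\mathrm{MAP}}
  \;=\; \arg\min_{x} \; \tfrac{1}{2}\,\|y - \Phi x\|_2^2 \;+\; \mu \sum_{i=1}^{N} |x_i|^q ,
\]
% so q = 1 (Laplacian prior) reproduces the Lasso / basis-pursuit decoder
% of the deterministic view.
```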

Probabilistic View
Goal: given y = Φx, recover x
Prior: iid generalized Gaussian distribution (GGD)
Stable embedding: an experiment, and a paradox
– q = 1: x consists of N iid samples from the GGD (no noise)
– recover using 1-norm minimization
– need M ~ 0.9 N measurements (Gaussian Φ) vs. the usual M ~ K log(N/K)
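The sketch below (mine, with the dimensions shrunk so the linear program stays fast) reproduces the flavor of this experiment: iid Laplacian signals measured with a Gaussian matrix and decoded by 1-norm minimization are only recovered accurately once M approaches N.

```python
# Sketch of the experiment: iid Laplacian signal, Gaussian measurements,
# decoding by basis pursuit (1-norm minimization as a linear program).
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """min ||x||_1 subject to Phi x = y, via the standard LP reformulation."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])      # minimize sum of slack t
    A_ub = np.block([[np.eye(N), -np.eye(N)],          #  x - t <= 0
                     [-np.eye(N), -np.eye(N)]])        # -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([Phi, np.zeros((M, N))])
    bounds = [(None, None)] * N + [(0, None)] * N      # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:N]

rng = np.random.default_rng(3)
N = 128
x = rng.laplace(scale=1.0, size=N)                     # iid Laplacian "signal"
for M in (32, 64, 96, 115):                            # illustrative sweep of M
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_hat = basis_pursuit(Phi, Phi @ x)
    err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
    print(f"M = {M:3d}: relative recovery error {err:.2f}")
```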

Approaches
Do nothing / ignore: be content with where we are…
– generalizes well
– robust

Compressible Priors [VC; Gribonval, VC, Davies; VC, Gribonval, Davies]
* You could be a Bayesian if … your observations are less important than your prior.

Compressible Priors
Goal: seek distributions whose iid realizations can be well-approximated as sparse
Definition: via the decay of the sorted coefficients, i.e., the relative k-term approximation error
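The defining formula on this slide was lost; in the cited Gribonval/Cevher/Davies line of work it is (my rendering) the relative best k-term approximation error of the sorted coefficients:

```latex
% With x_{(1)}, ..., x_{(N)} the coefficients sorted by decreasing magnitude,
% the relative best k-term approximation error in the p-norm is
\[
  \bar{\sigma}_k(x)_p
  \;=\; \frac{\sigma_k(x)_p}{\|x\|_p}
  \;=\; \frac{\bigl(\sum_{j>k} |x_{(j)}|^p\bigr)^{1/p}}{\|x\|_p},
\]
% and a prior is called compressible when, for iid realizations, this relative
% error becomes negligible already for k much smaller than N.
```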

Compressible Priors
Goal: seek distributions whose iid realizations can be well-approximated as sparse
Motivations:
– deterministic embedding scaffold for the probabilistic view
– analytical proxies for sparse signals: learning (e.g., dimensionality-reduced data), algorithms (e.g., structured sparsity)
– information theoretic (e.g., coding)
Main concept: order statistics

Key Proposition

Example 1
Consider the Laplacian distribution (with scale parameter 1)
– the G-function is straightforward to derive
Laplacian distribution <> NOT 1- or 2-compressible
Why does 1-norm minimization work for sparse recovery, then?
– the sparsity-enforcing nature of the cost function?
– the compressible nature of the unknown vector x?
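A quick numerical illustration (mine) of the claim that the Laplacian is not compressible: the relative 2-norm error of the best k-term approximation of an iid Laplacian vector stays far from zero unless k is a sizeable fraction of N, and it does not improve as N grows.

```python
# Sketch: iid Laplacian vectors are not compressible in the sense defined above.
import numpy as np

rng = np.random.default_rng(4)
for N in (1_000, 100_000):
    x = rng.laplace(scale=1.0, size=N)
    sorted_sq = np.sort(x**2)[::-1]                   # squared magnitudes, descending
    tail_energy = np.cumsum(sorted_sq[::-1])[::-1]    # tail_energy[k] = sum of sorted_sq[k:]
    total_energy = tail_energy[0]
    for frac in (0.01, 0.1, 0.5):
        k = int(frac * N)
        rel_err = np.sqrt(tail_energy[k] / total_energy)
        print(f"N = {N:7d}, k = {frac:4.0%} of N: relative 2-norm error {rel_err:.2f}")
```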

Sparse Modeling vs. Sparsity Promotion
Bayesian interpretation of sparse recovery <> inconsistent
Comparison of four decoding algorithms

Example 2

Approximation Heuristic for Compressible Priors via Order Statistics
Probabilistic signal model <> iid coefficients; the sorted coefficients are the order statistics of the draw
Deterministic signal model <> quantile approximation of those order statistics
Magnitude quantile function (MQF): the inverse of the cdf of the coefficient magnitudes
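A small sketch (mine) of the quantile heuristic for a concrete case: for N iid Laplacian draws, the magnitudes are exponential with cdf F(t) = 1 - exp(-t), so the MQF predicts that the k-th largest magnitude is close to -log(k/N).

```python
# Sketch: the k-th largest magnitude of N iid draws ~ MQF evaluated at 1 - k/N.
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
x = np.sort(np.abs(rng.laplace(scale=1.0, size=N)))[::-1]   # sorted magnitudes, descending

for k in (10, 100, 1_000, 10_000):
    mqf = -np.log(k / N)                                     # F^{-1}(1 - k/N) for Exp(1) magnitudes
    print(f"k = {k:6d}: empirical x_(k) = {x[k - 1]:6.3f},  MQF approximation = {mqf:6.3f}")
```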

Compressible Priors w/ Examples

Dimensional (in)Dependence
Dimensional independence <> unbounded moments <> truly logarithmic embedding
– the instance-optimal bound (CS recovery error ≤ signal K-term approximation error + noise) still applies
Dimensional dependence <> bounded moments (example: iid Laplacian)
– not so much! (the same conclusion can be obtained via the G-function)

Why should we care?
Natural images: wavelet coefficients
Deterministic view vs. probabilistic view
– Besov spaces vs. GGD and scale mixtures
– wavelet thresholding vs. Shannon source coding
[Choi, Baraniuk; Wainwright, Simoncelli; …]

Berkeley Natural Images Database

Learned parameters depend on the dimension

Why should we care?
Natural images (coding / quantization): wavelet coefficients
Deterministic view vs. probabilistic view
– Besov spaces vs. GGD and scale mixtures
– wavelet thresholding vs. Shannon source coding (histogram fits, KL divergence) [bad ideas]
Conjecture: wavelet coefficients of natural images belong to a dimension-independent (non-iid) compressible prior

Incompressibility of Natural Images
1-norm instance optimality blows up
Is compressive sensing USELESS for natural images?

Instance Optimality in Probability to the Rescue

Incompressibility of Natural Images
Is compressive sensing USELESS for natural images? Not according to Theorem 2!
– for large N, 1-norm minimization is still near-optimal
But are we not missing something practical?

Incompressibility of Natural Images
Natural images have finite energy because we have finite dynamic range. While image resolution keeps increasing, the dynamic range does not. In this setting, compressive sensing using naïve sparsity will not be useful.

Other Bayesian Interpretations
Multivariate Lomax distribution (non-iid, compressible w/ r = 1)
– maximize prior
– prior thresholding
– maximum a posteriori (MAP)
Interactions of Gamma and GGD
– iterative re-weighted algorithms, fixed-point continuation
– each coordinate has a generalized Pareto, GPD(x_i; q, ·), marginal
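For concreteness, one common form of the re-weighted scheme this slide alludes to (generic constants μ and ε; not necessarily the exact variant used in the talk) alternates a weighted 1-norm problem with a weight update:

```latex
% Assumed generic re-weighted \ell_1 iteration:
\[
  x^{(t+1)} \;=\; \arg\min_{x} \; \tfrac{1}{2}\,\|y - \Phi x\|_2^2
              \;+\; \mu \sum_{i=1}^{N} w_i^{(t)} |x_i| ,
  \qquad
  w_i^{(t)} \;=\; \frac{1}{|x_i^{(t)}| + \epsilon} ,
\]
% the kind of update that hierarchical Gamma-GGD priors justify through
% MAP / expectation arguments.
```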

Summary of Results

Conclusions
Compressible priors <> probabilistic models for compressible signals (deterministic view)
q-parameter <> (un)bounded moments
– independent of N: truly logarithmic embedding with tractable recovery, dimension-agnostic learning
– not independent of N: many restrictions (embedding, recovery, learning)
Natural images <> CS is not a good idea w/ naïve sparsity
Why would compressible priors be more useful than their deterministic counterparts?
– the ability to determine the goodness or confidence of estimates

Pathology of Dimensional Dependence
Parameter learning <> dimension dependent