An Introduction to Sparse dictionary learning

1 An Introduction to Sparse dictionary learning

2 1. Definition of Sparsity
Theoretical studies suggest that the primary visual cortex (area V1) uses a sparse code to efficiently represent natural scenes. Sparse coding is a way of representing some phenomenon with as few variables as possible. Typical uses: compression, analysis, denoising.

3 2. Sparse Representation
x ≈ D·a
- x ∈ ℝ^n is the signal.
- D = [d_1, d_2, …, d_m] ∈ ℝ^(n×m), with d_i^T d_i = 1, is the dictionary: a set of normalized "basis vectors" or "atoms".
- a ∈ ℝ^m is a sparse vector (a few non-zero entries, the rest zero or near zero) that solves the equation.
Signal ≈ Dictionary × Sparse vector.
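As a minimal numerical sketch of this relation (toy sizes and values are our own choices, not from the slides), one could write in MATLAB:

n = 8; m = 20;
D = randn(n, m);
D = bsxfun(@rdivide, D, sqrt(sum(D.^2)));   % normalize each atom so d'*d = 1
a = zeros(m, 1);
a([3 11 17]) = [1.5; -0.7; 2.0];            % only 3 of the 20 entries are non-zero
x = D * a;                                  % the signal is a sparse combination of atoms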

4 3. Why Sparsity? Sparsity provides the ability to represent a large amount of data while keeping less information (saving memory/storage). But the main property of sparsity is this: "If you can create a model such that the objects can be represented with much less information than originally, but still with high fidelity to the original, it means that you have created a good model, because you have removed the redundancies of the original representation."

5 4. Sparse Representation Overview
When using sparse representation for feature extraction, you may wonder: Does the sparsity property actually exist in the data? Do sparse features really give better results? Do they carry any semantic meaning? The short answer is "sometimes, not always". Below is a list of successful applications: image denoising, deblurring, inpainting, super-resolution, restoration, quality assessment, classification, segmentation, signal processing, object tracking, texture classification, image retrieval, bioinformatics, biometrics, and many other areas of artificial intelligence.

6 4. Sparse Representation Overview
Sparse representation assumes that a sparsity property exists in the data; otherwise sparse representation means nothing. The representation of x over D is used as its feature, and D can be used to efficiently store and reconstruct x.

7 4. Sparse Representation Overview
The general framework of sparse representation is to use a linear combination of some samples or "atoms" to represent the probe sample and to calculate the representation solution, i.e. to find the representation coefficients a of the atoms d_i for the sample x, and then to use that solution to reconstruct the desired result.

8 4. Sparse Representation Categories
Sparse representation is dominated by the regularizer imposed on the representation solution. It can therefore be grouped into five general categories according to the norm used:
- l0-norm minimization: α* = argmin ||α||_0
- l1-norm minimization: α* = argmin ||α||_1
- lp-norm minimization: α* = argmin ||α||_p^p
- l2-norm minimization: α* = argmin ||α||_2^2
- l2,1-norm minimization: α* = argmin ||X − Dα||_2,1 + μ||α||_2,1

A small MATLAB helper that evaluates these norms (the mixed lp,q case takes a matrix):

function out = l_norm(varargin)
    % l_norm(A, p)    -> lp norm of the vector A (p = 0 counts non-zero entries)
    % l_norm(A, p, q) -> mixed lp,q norm of the matrix A
    A = varargin{1};
    p = varargin{2};
    if nargin == 2
        if p == 0
            out = nnz(A);                   % l0: number of non-zero entries
        else
            out = sum(abs(A).^p)^(1/p);     % lp: p-th root of the sum of |a_i|^p
        end
    else
        q = varargin{3};
        % lp norm of each row of A (via the transpose), combined with an lq norm
        out = sum(sum(abs(A').^p).^(q/p))^(1/q);
    end
end
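For example, assuming the helper above is saved as l_norm.m on the MATLAB path, a few sample calls (the values in the comments follow directly from the definitions):

a = [0 3 0 -4 0];
l_norm(a, 0)      % 2: two non-zero entries
l_norm(a, 1)      % 7: sum of absolute values
l_norm(a, 2)      % 5: Euclidean length
A = [1 2; 0 0; 3 4];
l_norm(A, 2, 1)   % l2 norm of each row, then summed: sqrt(5) + 0 + 5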

9 4.1. Comparison of L-norms
Before the algorithms that find the sparse vector a for a given x and D, let's take a closer look at the L0 / Lp / L1 / L2 / L2,1 norms.
- L0-regularization: ||x||_0 = #(i | x_i ≠ 0). Counts the non-zero elements. Sparse outputs; built-in feature selection.
- Lp-regularization (0 < p < 1): ||x||_p = (Σ_i |x_i|^p)^(1/p). The p-th root of the sum of the elements raised to the p-th power. Sparse outputs.
- L1-regularization: ||x||_1 = Σ_i |x_i|. Sparse outputs; built-in feature selection.
- L2-regularization: ||x||_2 = (Σ_i x_i^2)^(1/2). No sparse outputs; no feature selection.
- L2,1-regularization: ||X||_p,q = (Σ_{j=1..n} (Σ_{i=1..m} |x_ij|^p)^(q/p))^(1/q) with p = 2, q = 1. Here X is a matrix, whereas x is a vector in the other four norms. More robust, since the error of each data point (a column) is not squared; used in robust data analysis and sparse coding.

10 4.1. Why is solution uniqueness important?
A unique linear regression estimate can nevertheless have quite poor predictive accuracy. For example: the green line (L2-norm) is the unique shortest path, while the red, blue, and yellow paths (L1-norm) all have the same length for the same route. So the L2-norm has a unique solution while the L1-norm does not.

11 4.1. Why is built-in feature selection important?
Suppose the model has 1000 coefficients but only 10 of them are non-zero. This effectively says that "the other 990 predictors are useless for predicting the target values". This is a direct result of the L1-norm, which tends to produce sparse coefficients.
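As a hedged illustration (the toy sizes and the regularization weight below are our own choices, and the lasso function requires MATLAB's Statistics and Machine Learning Toolbox), an L1-regularized fit keeps only a handful of 1000 candidate predictors:

n = 200; p = 1000;
A = randn(n, p);
beta_true = zeros(p, 1);
beta_true(1:10) = 5;                 % only 10 informative predictors
y = A * beta_true + 0.1 * randn(n, 1);
B = lasso(A, y, 'Lambda', 0.5);      % L1-regularized regression at one lambda
nnz(B)                               % typically close to 10: the rest are driven to exactly zero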

12 5. Methods of Solving Sparse Coding
The following algorithms try to find a sparse vector α for a given signal x and dictionary D. They fall into four broad strategies:

Greedy optimization strategy:
- Matching Pursuit
- Orthogonal Matching Pursuit
- Orthogonal Matching Pursuit for large matrices

Constrained optimization strategy:
- Gradient Projection Sparse Reconstruction
- Interior-point method based sparse representation
- Alternating direction method (ADM) based sparse representation

Proximity algorithm based optimization strategy:
- Soft thresholding or shrinkage operator
- Sparse reconstruction by separable approximation
- Iterative shrinkage thresholding algorithm
- Fast iterative shrinkage thresholding algorithm
- l1/2-norm regularization based sparse representation
- Augmented Lagrange Multiplier based optimization

Homotopy algorithm based sparse representation:
- LASSO homotopy
- BPDN homotopy
- Iterative reweighting l1-norm minimization via homotopy

13 5.1. Matching Pursuit Algorithm
It is a coordinate descent algorithm: at each iteration it selects one entry of the current estimate a and updates it. It is closely related to projection pursuit algorithms from the statistics literature.

Input: dictionary D, signal x, target sparsity k.
1. Initialize a = 0.
2. While ||a||_0 < k: select the coordinate with the maximum partial derivative, j = argmax_j |d_j^T (x − D·a)|, and update a_j.
Output: the sparse vector a.

function [a] = matching_pursuit(x, D, K)
    [n, m] = size(D);
    a = zeros(m, 1);
    while l_norm(a, 0) < K
        % coordinate most correlated with the current residual
        [~, argmax_k] = max(abs(D' * (x - D * a)));
        % one-dimensional update of the selected coefficient
        a(argmax_k) = a(argmax_k) + D(:, argmax_k)' * (x - D * a);
    end
end
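A hypothetical usage sketch (toy sizes are our own; it assumes l_norm.m and matching_pursuit.m from the previous slides are on the path):

n = 64; m = 256;
D = randn(n, m);
D = bsxfun(@rdivide, D, sqrt(sum(D.^2)));   % unit-norm atoms
a_true = zeros(m, 1);
a_true(randperm(m, 3)) = randn(3, 1);       % a 3-sparse ground truth
x = D * a_true;
a_mp = matching_pursuit(x, D, 3);           % sparse estimate using at most 3 atoms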

14 5.2. Orthogonal Matching Pursuit Algorithm
It is similar in spirit to matching pursuit, but it forces the residual to remain orthogonal to all previously selected atoms; equivalently, the algorithm re-optimizes the values of all non-zero coefficients each time it selects a new atom.

Input: dictionary D, signal x.
1. Select the atom d_k with the maximum projection onto the residual.
2. Solve α_k = argmin ||x − D_k α_k||.
3. Update the residual r = x − D_k α_k.
4. Check the terminating condition (e.g. ||α||_0); if not met, go to step 1.
Output: the sparse vector α.

An optimized version of this algorithm, suited to large matrices, is presented as MATLAB source on the next slide.

15 5.3. Orthogonal Matching Pursuit Algorithm – for large matrices
function [a] = orthogonal_matching_pursuit(x, D, k)
    % OMP with an incremental QR factorization of the selected atoms,
    % so no full least-squares solve is needed at every iteration.
    [m, n] = size(D);
    Q    = zeros(m, k);
    R    = zeros(k, k);
    Rinv = zeros(k, k);
    w    = zeros(m, k);
    a    = zeros(1, n);
    Res  = x;
    for J = 1:k
        % atom most correlated with the current residual
        [~, argmax_k(J)] = max(abs(D' * Res));
        w(:, J) = D(:, argmax_k(J));
        % Gram-Schmidt: orthogonalize against previously selected atoms
        for I = 1:J-1
            R(I, J) = Q(:, I)' * w(:, J);
            w(:, J) = w(:, J) - R(I, J) * Q(:, I);
        end
        R(J, J) = norm(w(:, J));
        Q(:, J) = w(:, J) / R(J, J);
        % keep the residual orthogonal to every selected atom
        Res = Res - Q(:, J) * (Q(:, J)' * Res);
        % update the inverse of R incrementally
        Rinv(J, J) = 1 / R(J, J);
        Rinv(1:J-1, J) = -Rinv(1:J-1, 1:J-1) * R(1:J-1, J) * Rinv(J, J);
    end
    % back-substitute to get the coefficients of the selected atoms
    xx = Rinv * Q' * x;
    a(argmax_k) = xx;
end
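Continuing the hypothetical matching-pursuit example two slides back (same toy D and x), a quick comparison could look like this:

a_omp = orthogonal_matching_pursuit(x, D, 3);   % returns a 1-by-m row vector
norm(x - D * a_omp')                            % residual after re-fitting all 3 selected coefficients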

16 6. Dictionary Learning Now let's look at the reverse problem: can we design dictionaries by learning? In the preceding slides we generally assumed that the (over-complete) basis D exists and is known. In practice, however, we usually need to build it. Our goal is to find the dictionary D that yields sparse representations of the training signals.

17 6. How can we get the dictionary D?
Ideal case of a dictionary — we expect: X − D·a = 0.
Until now, no algorithm is guaranteed to find a global minimizer.
Known categories of dictionaries:
- Predefined dictionaries based on various types of wavelets
- Dictionary learning

18 6.1. Predefined dictionaries based on various types of wavelets
- DCT/wavelet dictionaries
- Time-frequency dictionaries

19 6.2. Dictionary Learning Algorithms
Some representative algorithms are the following:

Supervised dictionary learning:
- Discriminative KSVD
- Metaface dictionary learning
- Label consistent KSVD
- Discriminative non-parametric dictionary learning
- Fisher discrimination dictionary learning

Unsupervised dictionary learning:
- K-SVD
- Nonparametric Bayesian dictionary learning
- Tree-structured dictionary learning
- Locality constrained linear coding for unsupervised dictionary learning

20 K-SVD Algorithm
Overall flow: Initialize Dictionary → Sparse Coding (OMP) → Update Dictionary (one atom at a time).
Initialize Dictionary: select atoms from the input. Atoms can be patches from the image, and the patches overlap.

21 K-SVD Algorithm
Sparse Coding (OMP): use OMP or any other fast method. The output gives the sparse code for all signals, minimizing the representation error.

22 K-SVD Algorithm
Update Dictionary (one atom at a time): replace an unused atom with the most poorly represented signal. Identify the signals that use the k-th atom (the non-zero entries in the k-th row of X).

23 K-SVD Algorithm
Deselect the k-th atom from the dictionary, form the coding error matrix of these signals, and minimize this error matrix with a rank-1 approximation from the SVD.

24 K-SVD Algorithm
[U, S, V] = svd(Ek). Replace the coefficients of atom d_k in X with the entries of s_1·v_1, and set d_k = u_1 / ||u_1||_2.
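A minimal MATLAB sketch of this dictionary-update step (the function and variable names are our own; it assumes Y holds the training signals as columns, D the current dictionary, and X the sparse codes from the OMP stage):

function [D, X] = ksvd_update_atom(D, X, Y, k)
    % Update atom k of D and its coefficients in X via a rank-1 SVD of the error.
    omega = find(X(k, :) ~= 0);                  % signals that actually use atom k
    if isempty(omega)
        % unused atom: replace it with the worst-represented (normalized) signal
        [~, worst] = max(sum((Y - D * X).^2, 1));
        D(:, k) = Y(:, worst) / norm(Y(:, worst));
        return;
    end
    % coding error of those signals when atom k is removed
    Ek = Y(:, omega) - D * X(:, omega) + D(:, k) * X(k, omega);
    [U, S, V] = svds(Ek, 1);                     % best rank-1 approximation
    D(:, k) = U(:, 1);                           % new unit-norm atom
    X(k, omega) = S(1, 1) * V(:, 1)';            % new coefficients for atom k
end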

25 7. Applications

26 7.1. Super Resolution using Sparse Coding
This application uses sparse coding to increase the resolution of a given image by a factor of 2. Open and run the Main.m file, select an image (jpg, bmp, png), and wait for the results. The application uses a predefined dictionary built from several natural images.

27 7.1. Super Resolution using Sparse Coding

28 7.2. Image Denoising using Sparse Coding
This application uses sparse coding to denoise a given image. Open and run the Main.m file, select an image (jpg, bmp, png), and wait for the results. The application builds a dictionary from the noisy image with the K-SVD algorithm and then reconstructs the image from its sparse coefficients.

29 7.2. Image Denoising using Sparse Coding

