Design of Non-Linear Kernel Dictionaries for Object Recognition


1 Design of Non-Linear Kernel Dictionaries for Object Recognition
Murad Megjhani, MATH 6397

2 Agenda
Sparse Coding
Dictionary Learning
Problem Formulation (Kernel)
Results and Discussion

3 Motivation
Given a 16x16 (or n×n) image patch x, we can represent it using 256 real numbers (pixels). Problem: can we find or learn a better representation? Given a set of images, learn a better way to represent them than raw pixels.

4 What is a Sparse Linear Model?
Let D = [d1, …, dK] ∈ Rn×K be a set of normalized "basis vectors"; we call it a dictionary. D is "adapted" to x if it can represent x with a few basis vectors, that is, if there exists a sparse vector γ ∈ RK such that x ≈ Dγ. We call γ the sparse code.
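To make the idea concrete, here is a minimal Python/NumPy sketch (not from the slides; the dimensions and the non-zero indices are made up for illustration) of a signal built as a sparse combination of dictionary atoms:

```python
import numpy as np

# Toy sparse linear model x ≈ D @ gamma with a 3-sparse code.
rng = np.random.default_rng(0)

n, K = 16, 64                             # signal dimension and dictionary size (illustrative)
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)            # normalize every atom (column) to unit norm

gamma = np.zeros(K)
gamma[[5, 20, 41]] = [0.8, 0.3, 0.5]      # only three non-zero coefficients: the sparse code

x = D @ gamma                             # the signal is a combination of just three atoms
print("non-zeros in gamma:", np.count_nonzero(gamma))
```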

5 Sparse Coding Illustration
Learn bases [d1, …, d64] from natural image patches. A test example x is then well approximated by only a few atoms, x ≈ 0.8·di + 0.3·dj + 0.5·d63, so its feature representation is the sparse code [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …] = [γ1, …, γ64]: compact and easily interpretable.

6 Notation: X ≈ DΓ
x ∈ Rn : n-dimensional signal vector.
X ∈ Rn×N : matrix of N signal vectors of dimension n.
Atom di : elementary signal, a representative template.
D ∈ Rn×K : over-complete basis (dictionary) of size K, with K >> n.
γ ∈ RK : sparse representation of an input signal.
Γ ∈ RK×N : matrix of sparse vectors.

7 Sparse Coding Problem
For each signal xj, solve minγj ||xj − Dγj||2² + λ ψ(γj): the first term is the data-fitting term, and ψ is a sparsity-inducing regularization on γj.
The l0 "pseudo-norm": ||γ||0 ≡ #{i s.t. γ[i] ≠ 0} (NP-hard).
The l1 norm: ||γ||1 ≡ Σi |γ[i]| (convex).
This is a selection problem. When ψ is the l1 norm, the problem is known as the LASSO [1] or Basis Pursuit [2]. When ψ is the l0 pseudo-norm, the problem is typically solved greedily with Matching Pursuit [3] or Orthogonal Matching Pursuit [4].
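As a hedged illustration of the l1 case, the following Python sketch solves the LASSO form of the sparse coding problem with scikit-learn's off-the-shelf solver; the synthetic data, the regularization weight alpha and the availability of scikit-learn are assumptions for the example, not part of the slides:

```python
import numpy as np
from sklearn.linear_model import Lasso    # assumes scikit-learn is installed

# Sparse-code a single signal x on a fixed dictionary D by solving the LASSO.
rng = np.random.default_rng(1)
n, K = 32, 128
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)

true_gamma = np.zeros(K)
true_gamma[rng.choice(K, size=4, replace=False)] = rng.standard_normal(4)
x = D @ true_gamma                        # synthetic signal with a 4-sparse code

# Note: scikit-learn's Lasso minimizes (1/(2*n_samples))*||x - D@gamma||^2 + alpha*||gamma||_1,
# so alpha plays the role of lambda up to a scaling by n.
lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10_000)
lasso.fit(D, x)
gamma_hat = lasso.coef_
print("recovered support:", np.flatnonzero(gamma_hat))
```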

8 Dictionary Learning Problem
minD,Γ Σj ||xj − Dγj||2² + λ ψ(γj): the data-fitting term is now minimized over the dictionary D as well as over the sparse codes.
Designed dictionaries: D is fixed analytically, as with Haar wavelets [5], curvelets [6], …
Learned dictionaries: D is estimated from data, as in "Sparse coding with an over-complete basis set" [7] and K-SVD [8].
We will study two of these algorithms today and see how they can be kernelized [9].
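For the learned-dictionary route, a quick way to see the objective in action is scikit-learn's generic dictionary learner. This is a sketch under the assumption that scikit-learn is available; it is not the K-SVD of [8], but it optimizes the same kind of data-fit-plus-sparsity objective:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning   # assumed available

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 64))        # 500 toy signals of dimension n = 64 (rows = samples)

K, T = 128, 5                             # dictionary size and sparsity level (illustrative)
learner = DictionaryLearning(n_components=K,
                             transform_algorithm="omp",
                             transform_n_nonzero_coefs=T,
                             max_iter=20)
Gamma = learner.fit_transform(X)          # sparse codes, one row per signal, shape (500, K)
D = learner.components_.T                 # learned dictionary with atoms as columns, shape (64, K)
```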

9 Orthogonal Matching Pursuit
1. Initialize the residual, the active set and the coefficients.
2. for iter = 1…T do
3.   Select the atom which most reduces the objective.
4.   Update the active set.
5.   Update the residual (orthogonal projection).
6.   Update the coefficients.
7. end for

10 Sparse Coding Algorithms
Matching Pursuit: MP is a greedy algorithm that selects one atom at a time [3]. Step 1: find the single atom that best matches the signal. Subsequent steps: given the previously selected atoms, find the next one that best fits the residual. Orthogonal MP (OMP) is an improved version that re-evaluates all coefficients by least squares after each selection [4].

11 Orthogonal Matching Pursuit
Input: x, dictionary D, sparsity threshold T
1. Initialize the residual, the active set and the coefficients.
2. for iter = 1…T do
3.   Select the atom which most reduces the objective.
4.   Update the active set. (MP updates only the one coefficient corresponding to the selected atom.)
5.   Update the coefficients. (OMP re-computes all the coefficients in the active set.)
6.   Update the residual.
7. end for
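The pseudocode above translates almost line by line into NumPy. The sketch below is an illustrative implementation (the variable names are mine, not the slides') of OMP with the least-squares coefficient refit that distinguishes it from plain MP:

```python
import numpy as np

def omp(x, D, T):
    """Orthogonal Matching Pursuit sketch: greedily pick the atom most
    correlated with the residual, then re-fit ALL active coefficients
    by least squares (the 'orthogonal' step)."""
    n, K = D.shape
    gamma = np.zeros(K)
    residual = x.copy()
    active = []                                   # indices of selected atoms
    for _ in range(T):
        # atom (assumed unit-norm columns) that best matches the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in active:
            active.append(k)
        # least-squares update of all coefficients in the active set
        coeffs, *_ = np.linalg.lstsq(D[:, active], x, rcond=None)
        gamma[:] = 0.0
        gamma[active] = coeffs
        residual = x - D[:, active] @ coeffs      # residual after orthogonal projection
    return gamma
```

Plain MP would skip the least-squares refit and only set the coefficient of the newly selected atom.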

12 Dictionary Learning Algorithms: K-SVD
X ≈ DΓ. The algorithm alternates: initialize D → sparse code (compute Γ) → dictionary update → repeat.

13 Dictionary Learning Algorithms: K-SVD
The K-SVD algorithm trains an explicit dictionary from examples. Input: the set of examples X; output: D and Γ. The target function to minimize is
minD,Γ ||X − DΓ||F²  s.t. ||γi||0 ≤ T for all i,
i.e. the examples are linear combinations of the atoms, and each representation uses at most T atoms.

14 Dictionary Learning Algorithms: K-SVD
Sparse coding stage: with D fixed, solve for the jth example
γj = argminγ ||xj − Dγ||2²  s.t. ||γ||0 ≤ T.
For sparse coding, use batch OMP or any other sparse coding algorithm.

15 Dictionary Learning Algorithms: K-SVD
Dictionary update stage: for the kth atom, isolate its contribution by forming the residual Ek = X − Σj≠k dj γj (with γj the jth row of Γ), so that the objective for atom k becomes ||Ek − dk γk||F².

16 Dictionary Learning Algorithms: K-SVD
Dictionary update stage: an unconstrained rank-1 approximation of Ek would fill in the whole row γk and destroy its sparsity. We can do better!

17 Dictionary Learning Algorithms: K-SVD
Dictionary update stage: we want to solve min over dk and γkR of ||EkR − dk γkR||F², where EkR keeps only the columns of Ek corresponding to the examples that actually use atom dk. When updating γk, only the coefficients of those examples are re-computed, so sparsity is preserved. Solve with a rank-1 SVD of EkR.

18 Dictionary Learning Algorithms: K-SVD
Summary: initialize the dictionary → sparse code using OMP → dictionary update, atom by atom together with the corresponding coefficients → repeat.

19 Dictionary Learning Algorithms: K-SVD
Summary:
Input: X, sparsity threshold T.
Initialization: set a random normalized dictionary matrix D(0) ∈ Rn×K. Set J = 1.
Repeat until convergence (or for a fixed number of iterations):
  Sparse coding stage: use any pursuit algorithm to compute γi for i = 1…N.
  Codebook update stage: for k = 1…K
    Define the group of examples that use dk: ωk = {i | 1 ≤ i ≤ N, γi(k) ≠ 0}.
    Compute the residual Ek.
    Restrict Ek by choosing only the columns corresponding to those examples that use dk in their representation, obtaining EkR.
    Apply the SVD decomposition EkR = UΔV^T; update dk ← u1 and the restricted coefficients γkR ← σ1 v1^T.
  Set J = J + 1.
Output: D, Γ
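A compact NumPy sketch of the whole loop is given below. It follows the slide's structure but uses scikit-learn's OMP for the sparse coding stage, and all dimensions and iteration counts are illustrative assumptions rather than the authors' settings:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp    # assumed available for the sparse-coding stage

def ksvd(X, K, T, n_iter=20, seed=0):
    """K-SVD sketch. X holds signals as columns, shape (n, N)."""
    rng = np.random.default_rng(seed)
    n, N = X.shape
    D = rng.standard_normal((n, K))
    D /= np.linalg.norm(D, axis=0)                # random normalized initial dictionary

    for _ in range(n_iter):
        # Sparse coding stage: Gamma holds the codes as columns, shape (K, N)
        Gamma = orthogonal_mp(D, X, n_nonzero_coefs=T)
        # Codebook update stage: atom by atom
        for k in range(K):
            omega = np.flatnonzero(Gamma[k, :])   # examples that use atom d_k
            if omega.size == 0:
                continue
            # error without atom k, restricted to the examples that use it
            Ek = X[:, omega] - D @ Gamma[:, omega] + np.outer(D[:, k], Gamma[k, omega])
            U, s, Vt = np.linalg.svd(Ek, full_matrices=False)
            D[:, k] = U[:, 0]                     # new atom = first left singular vector
            Gamma[k, omega] = s[0] * Vt[0, :]     # updated coefficients for atom k
    return D, Gamma
```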

20 Non-Linear Dictionary Learning: Problem Formulation
The goal is to learn a non-linear dictionary in the feature space H by solving
  minD,Γ ||Φ(X) − DΓ||F²  s.t. ||γi||0 ≤ T0 for all i,   (1)
where Φ(·) maps the input signals into H.
Proposition 1**: there exists an optimal solution D* to (1) of the form D* = Φ(X)A for some coefficient matrix A ∈ RN×K. Equation (1) can therefore be rewritten as
  minA,Γ ||Φ(X) − Φ(X)AΓ||F²  s.t. ||γi||0 ≤ T0 for all i.
**For the proof, refer to Appendix VI of H. Van Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Design of non-linear kernel dictionaries for object recognition," IEEE Trans. Image Process., vol. 22, no. 12, pp. 5123–5135, Dec. 2013 [9].

21 Non-Linear Dictionary Learning: Problem Formulation
This motivates formulating dictionary learning in the feature space using the kernel trick: since the dictionary has the form Φ(X)A, the objective and the pursuit steps only involve inner products Φ(xi)ᵀΦ(xj), which can be evaluated through a kernel function κ(xi, xj) = <Φ(xi), Φ(xj)> without ever computing Φ explicitly.
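The practical consequence is that only Gram-matrix entries are ever needed. The small NumPy sketch below (the polynomial kernel and all sizes are assumptions for illustration) shows how inner products between feature-space atoms Φ(X)aᵢ are computed without forming Φ:

```python
import numpy as np

def poly_kernel(X, Y, degree=4, c=1.0):
    """Polynomial kernel k(x, y) = (x.T y + c)**degree between columns of X and Y."""
    return (X.T @ Y + c) ** degree

rng = np.random.default_rng(3)
n, N, K = 16, 100, 30
X = rng.standard_normal((n, N))                   # training signals as columns
A = rng.standard_normal((N, K))                   # dictionary coefficient matrix (D = Phi(X) @ A)

K_XX = poly_kernel(X, X)                          # N x N Gram matrix K(X, X)
# Inner products between feature-space atoms: <Phi(X) a_i, Phi(X) a_j> = a_i.T @ K(X,X) @ a_j
atom_gram = A.T @ K_XX @ A                        # K x K matrix, computed without Phi
```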

22 Kernel Orthogonal Matching Pursuit (KOMP)
Input: z, kernel κ, sparsity threshold T, coefficient matrix A
1. Initialize the active set and the coefficients.
2. for iter = 1…T do
3.   Select the atom which most reduces the objective.
4.   Update the active set.
5.   Update the coefficients (all coefficients in the active set, using only kernel evaluations).
6.   Update the residual.
7. end for
Output: γ
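Below is an illustrative NumPy sketch of KOMP along these lines (the naming is mine, not the authors' code). The atom-selection and coefficient-update formulas are the standard kernelized versions of the OMP steps: <φ(z), Φ(X)aᵢ> = aᵢᵀK(X,z), and the active-set least squares uses the Gram matrix A_Sᵀ K(X,X) A_S:

```python
import numpy as np

def komp(k_zX, K_XX, A, T):
    """Kernel OMP sketch.
    k_zX : (N,) vector of kernel values k(z, x_j) between the test signal and training signals
    K_XX : (N, N) Gram matrix of the training signals
    A    : (N, K) coefficient matrix, so the dictionary is Phi(X) @ A
    T    : sparsity threshold"""
    N, Kdict = A.shape
    gamma = np.zeros(Kdict)
    active = []
    for _ in range(T):
        # correlation of every atom with the current residual phi(z) - Phi(X) @ A @ gamma
        corr = A.T @ k_zX - A.T @ (K_XX @ (A @ gamma))
        k = int(np.argmax(np.abs(corr)))
        if k not in active:
            active.append(k)
        # least-squares coefficients over the active set, using only kernel values
        A_S = A[:, active]
        G = A_S.T @ K_XX @ A_S                    # Gram matrix of the active atoms
        b = A_S.T @ k_zX
        coeffs = np.linalg.solve(G + 1e-10 * np.eye(len(active)), b)
        gamma[:] = 0.0
        gamma[active] = coeffs
    return gamma
```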

23 Kernel Dictionary Learning Algorithms: Kernel K-SVD
Summary:
Input: X, sparsity threshold T, kernel function κ.
Initialization: set a random normalized coefficient matrix A(0) ∈ RN×K. Set J = 1.
Repeat until convergence (or for a fixed number of iterations):
  Sparse coding stage: use KOMP to compute γi for i = 1…N, given A(J−1).
  Codebook update stage: for each column ak(J−1) of A(J−1), k = 1…K
    Define the group of examples that use ak: ωk = {i | 1 ≤ i ≤ N, γi(k) ≠ 0}.
    Compute the error matrix Ek.
    Restrict Ek by choosing only the columns corresponding to those examples that use ak in their representation, obtaining EkR.
    Apply the SVD (eigen) decomposition (EkR)ᵀ K(X,X) EkR = V Δ Vᵀ, where v1 is the first column of V, corresponding to the largest singular value σ1² in Δ; update ak ← (1/σ1) EkR v1 and the restricted coefficients γkR ← σ1 v1ᵀ.
  Set J = J + 1.
Output: A, Γ
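The codebook-update stage can also be written directly in terms of the Gram matrix. The sketch below follows the update described on this slide (eigen-decomposition of (EkR)ᵀ K(X,X) EkR); it is an illustrative reading of the algorithm, not the authors' implementation, and assumes the codes Γ have already been computed, e.g. with the KOMP sketch above:

```python
import numpy as np

def kernel_ksvd_update(K_XX, A, Gamma):
    """One codebook-update pass of kernel K-SVD (sketch).
    K_XX  : (N, N) Gram matrix of the training signals
    A     : (N, K) dictionary coefficient matrix (dictionary = Phi(X) @ A)
    Gamma : (K, N) sparse codes"""
    N, K = A.shape
    for k in range(K):
        omega = np.flatnonzero(Gamma[k, :])            # examples that use atom a_k
        if omega.size == 0:
            continue
        # coefficient-space error without atom k: Phi(X) @ M is the feature-space residual
        M = np.eye(N) - A @ Gamma + np.outer(A[:, k], Gamma[k, :])
        EkR = M[:, omega]                              # restricted to the examples in omega
        # best rank-1 fit in feature space via eigen-decomposition of EkR^T K_XX EkR
        evals, V = np.linalg.eigh(EkR.T @ K_XX @ EkR)
        sigma1 = np.sqrt(max(evals[-1], 1e-12))
        v1 = V[:, -1]
        A[:, k] = EkR @ v1 / sigma1                    # updated atom coefficients
        Gamma[k, omega] = sigma1 * v1                  # updated (restricted) code for atom k
    return A, Gamma
```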

24 Kernel-Dictionary Learning Algorithms: K-SVD

25 Results and Discussion
Outline: synthetic data (not discussed), kernel sparse representation, digit recognition, Caltech-101 and Caltech-256 object recognition.
Kernel sparse representation: the experiment compares the mean-squared error (MSE) of an image from the USPS dataset when approximated using the first m dominant kernel PCA components versus m kernel dictionary atoms (i.e. T0 = m), for m = 1, …, 20. "It is clearly seen that the MSE decays much faster for kernel KSVD than kernel PCA with respect to the number of selected bases. This observation implies that the image is nonlinearly sparse and learning a dictionary in the high dimensional feature space can provide a better representation of data."

26 Results and Discussion
Digit Recognition
Dataset: USPS handwritten digit database. Image size 16x16, so each vector has dimension 256; number of classes = 10 (one per digit). Ntrain = 500 samples for training and Ntest = 200 samples for testing, for each class.
Parameter selection: done through 5-fold cross-validation.
Parameters: K (dictionary size) = 300 atoms; T0 (sparsity constraint) = 5; maximum number of training iterations = 80; kernel type: polynomial kernel of degree 4.

27 Results and Discussion
Digit Recognition, Approach 1: Distributive Approach
Training: a separate dictionary is learned for each class using kernel K-SVD: from the training examples Xi = [xi1, …, xiN] ∈ R256×500, i = 1…10, compute the coefficient matrices Ai ∈ R500×300.
Pre-images of the learned atoms: since the dictionary is learned in the kernel space, the atoms have to be mapped back to Euclidean space in order to visualize them [10].
Testing: given a test image z ∈ R256, perform KOMP with each Ai to obtain codes γi ∈ R300, compute the reconstruction residuals r1, …, r10, and assign z to the class with the smallest residual.

28 Results and Discussion
Digit Recognition, Approach 2: Collective Approach
Training: as before, a separate dictionary is learned for each class using kernel K-SVD: Xi = [xi1, …, xiN] ∈ R256×500, i = 1…10, giving Ai ∈ R500×300; pre-images of the learned atoms can again be computed for visualization [10].
Testing: given a test image z ∈ R256, perform KOMP on the joined dictionary [A1, …, A10] with sparsity constraint T = 10 (in this case), obtaining a single code γ ∈ R3000. Class-wise residuals r1, …, r10 are then computed from the parts of γ associated with each class dictionary, and z is assigned to the class with the smallest residual.
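To show how the class-wise residuals drive the decision, here is a hedged Python sketch of residual-based classification for the distributive approach. It reuses the komp and poly_kernel sketches from earlier slides; those helper names and the explicit min-residual rule are my illustration, not code from the paper. The feature-space reconstruction error is expanded purely in kernel values:

```python
import numpy as np
# assumes `komp` (KOMP sketch above) and a kernel function such as `poly_kernel` are defined

def kernel_residual(k_zz, k_zX, K_XX, A, gamma):
    """||phi(z) - Phi(X) @ A @ gamma||^2 written only with kernel evaluations."""
    Ag = A @ gamma
    return k_zz - 2.0 * (Ag @ k_zX) + Ag @ K_XX @ Ag

def classify(z, X_list, A_list, kernel, T):
    """Distributive classification: sparse-code z on each class dictionary and
    return the index of the class with the smallest reconstruction residual."""
    k_zz = kernel(z[:, None], z[:, None])[0, 0]
    residuals = []
    for X_c, A_c in zip(X_list, A_list):
        K_XX = kernel(X_c, X_c)
        k_zX = kernel(X_c, z[:, None])[:, 0]      # k(x_j, z) for every training sample of class c
        gamma_c = komp(k_zX, K_XX, A_c, T)        # sparse code on class c's kernel dictionary
        residuals.append(kernel_residual(k_zz, k_zX, K_XX, A_c, gamma_c))
    return int(np.argmin(residuals))
```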

29 Results and Discussion
Digit Recognition: performance comparison of KSVD, kernel PCA, kernel K-SVD and kernel MOD under different scenarios (missing pixels, different noise levels).
Figure: comparison of digit recognition accuracies for the different methods in the presence of (a) missing pixels and (b) Gaussian noise. Red and orange represent the distributive and collective classification approaches for kernel KSVD, respectively. To avoid clutter, only the distributive approach is reported for kernel MOD, which gives better performance on this dataset.
A second set of experiments examines the effect of parameter choices on overall recognition performance. Figure: kernel KSVD classification accuracy versus the degree of the polynomial kernel on the USPS dataset. The best error rate of 1.6% is achieved with a polynomial degree of 4.

30 Results and Discussion
Caltech-101 and Caltech-256 Object Recognition
Dataset: Caltech-101 database; 101 object classes plus 1 background class, collected from the Internet. The number of images per category varies, and the average image size is about 300x300 pixels. It is a diverse and challenging dataset, including objects such as buildings, musical instruments, animals and natural scenes.
Parameter selection: done through 5-fold cross-validation.
Parameters: K (dictionary size) = 300 atoms; T0 (sparsity constraint) = 5; maximum number of training iterations = 80; kernel type: polynomial kernel of degree 4.

31 Results and Discussion
Caltech-101 and Caltech-256 Object Recognition
Protocol: train on N images per class, with N = {5, 10, 15, 20, 25, 30}, and test on the rest. Some categories are very small, so they end up with just a single image for testing. To compensate for the variation in class size, the recognition results are normalized by the number of test images to obtain per-class accuracies; the final result is the average of the per-class accuracies across the 102 categories.
Figure: confusion matrix of kernel KSVD recognition performance on the Caltech-101 dataset. Rows and columns correspond to true labels and predicted labels, respectively. The dictionary is learned from 3030 images, each class contributing 30 images, and the sparsity is set to 30 for both training and testing. Although the confusion matrix contains all classes, only a subset of class labels is displayed for legibility.

32 Results and Discussion
Caltech-101 and Caltech-256 Object Recognition

Performance comparison on the Caltech-101 dataset (accuracy in %, by number of training samples per class):

#train samples        5      10      15      20      25      30
Griffin [11]       44.2    54.5    59.0    63.3    65.8    67.6
Gemert [12]           -       -       -       -       -   64.16
Yang [13]             -       -    67.0       -       -    73.2
KSVD [8]           49.8    59.8    65.2    68.7    71.0       -
LC-KSVD [14]       54.0    63.1    67.7    70.5    72.3    73.6
D-Kernel (MOD)     53.8    64.0    70.2    73.8    76.4    78.1
C-Kernel (MOD)     56.2       -    72.4    75.6    77.5    80.0
D-Kernel (KSVD)    54.2    64.5       -    74.0    76.5    78.5
C-Kernel (KSVD)    56.5    67.2    72.5    75.8    77.6    80.1

Performance comparison on the Caltech-256 dataset (accuracy in %):

#train samples       15      30
Griffin [11]       28.3    34.1
Gemert [12]           -    27.2
Yang [13]          34.4    41.2
D-Kernel (MOD)     34.2       -
C-Kernel (MOD)     34.6    42.7
D-Kernel (KSVD)    34.5    41.4
C-Kernel (KSVD)    34.8    42.5

33 References
[1] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," J. R. Stat. Soc. Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic Decomposition by Basis Pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.
[3] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, 1993.
[4] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proc. 27th Asilomar Conf. on Signals, Systems and Computers, 1993, pp. 40–44.
[5] A. Haar, "Zur Theorie der orthogonalen Funktionensysteme," Math. Ann., vol. 71, no. 1, pp. 38–53, 1911.
[6] E. J. Candès and D. L. Donoho, "Curvelets: A surprisingly effective nonadaptive representation for objects with edges," 2000.
[7] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?," Vision Res., vol. 37, no. 23, pp. 3311–3325, 1997.
[8] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[9] H. Van Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa, "Design of non-linear kernel dictionaries for object recognition," IEEE Trans. Image Process., vol. 22, no. 12, pp. 5123–5135, Dec. 2013.
[10] J. T. Kwok and I. W. Tsang, "The pre-image problem in kernel methods," IEEE Trans. Neural Netw., vol. 15, no. 6, pp. 1517–1525, Nov. 2004.
[11] G. Griffin, A. Holub, and P. Perona, "Caltech-256 Object Category Dataset," California Institute of Technology, Mar. 2007.
[12] J. C. van Gemert, J.-M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders, "Kernel codebooks for scene categorization," in Computer Vision – ECCV 2008, Springer, 2008, pp. 696–709.
[13] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1794–1801.
[14] Z. Jiang, Z. Lin, and L. S. Davis, "Learning a discriminative dictionary for sparse coding via label consistent K-SVD," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1697–1704.

34 Thank You
Q&A (let Q be sparse)

