1 Modification of Correlation Kernels in SVM, KPCA and KCCA in Texture Classification Yo Horikawa Kagawa University, Japan

2 ・ Support vector machine (SVM)
・ Kernel principal component analysis (kPCA)
・ Kernel canonical correlation analysis (kCCA)
with modified versions of correlation kernels → invariant texture classification
The performance of the modified correlation kernels and of the kernel methods is compared.

3 Support vector machine (SVM)
Sample data: x_i (1 ≤ i ≤ n), belonging to class c_i ∈ {-1, 1}.
SVM learns a discriminant function for test data x:
  d(x) = sgn(Σ_{i=1}^{n'} α_i c_i k(x, x_si) + b)
where α_i and b are obtained by solving a quadratic programming problem.
Kernel function: the inner product of nonlinear maps φ(x): k(x_i, x_j) = φ(x_i)・φ(x_j)
Support vectors x_si (1 ≤ i ≤ n', n' ≤ n): a subset of the sample data.
Feature extraction is done implicitly in the SVM through the kernel function and the support vectors.
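As a rough illustration of how such a kernel plugs into an SVM, the sketch below trains scikit-learn's SVC on a precomputed Gram matrix. The dummy 50×50 patches and the placeholder linear_kernel are assumptions of this sketch; one of the correlation kernels defined on the later slides can be substituted for it.

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(xi, xj):
    # Placeholder kernel; substitute a correlation kernel such as the one on slide 8.
    return float(np.sum(xi * xj))

def gram_matrix(samples_a, samples_b, kernel):
    """Gram matrix K[i, j] = kernel(a_i, b_j) for lists of image patches."""
    return np.array([[kernel(a, b) for b in samples_b] for a in samples_a])

rng = np.random.default_rng(0)
X_train = [rng.standard_normal((50, 50)) for _ in range(20)]  # dummy 50x50 patches
y_train = np.array([1] * 10 + [-1] * 10)                      # two classes
X_test = [rng.standard_normal((50, 50)) for _ in range(5)]

K_train = gram_matrix(X_train, X_train, linear_kernel)
K_test = gram_matrix(X_test, X_train, linear_kernel)          # rows: test, cols: train

svm = SVC(kernel="precomputed", C=100)  # soft-margin C = 100 as on slide 14
svm.fit(K_train, y_train)
print(svm.predict(K_test))
```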

4 Kernel principal component analysis (kPCA)
Principal components of the nonlinear map φ(x_i) are obtained through the eigenproblem:
  Φv = λv   (Φ: kernel matrix, Φ_ij = φ(x_i)・φ(x_j) = k(x_i, x_j))
Let v_r = (v_r1, …, v_rn)^T (1 ≤ r ≤ R ≤ n) be the eigenvectors in non-increasing order of the corresponding non-zero eigenvalues λ_r, normalized so that λ_r v_r^T v_r = 1.
The rth principal component u_r for new data x is obtained by
  u_r = Σ_{i=1}^{n} v_ri φ(x_i)・φ(x) = Σ_{i=1}^{n} v_ri k(x_i, x)
Classification methods, e.g. the nearest-neighbor method, can then be applied in the principal component space (u_1, …, u_R).
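A minimal numpy sketch of this projection step, assuming the top R eigenvalues are positive; kernel centering is not mentioned on the slide, so it is omitted here as well, and the function names are assumptions of this sketch.

```python
import numpy as np

def kpca_fit(K, R):
    """Eigen-decompose the kernel matrix Phi and return projection coefficients,
    normalized so that lambda_r * (v_r . v_r) = 1 as on this slide."""
    eigvals, eigvecs = np.linalg.eigh(K)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:R]      # keep the R largest
    lam, V = eigvals[order], eigvecs[:, order]
    return V / np.sqrt(lam)                    # columns v_r with lam_r * v_r.v_r = 1

def kpca_project(V, k_new):
    """Principal components u_r = sum_i v_ri k(x_i, x), given k_new[i] = k(x_i, x)."""
    return k_new @ V

# Example with a random positive semi-definite kernel matrix:
A = np.random.default_rng(1).standard_normal((30, 30))
K = A @ A.T
V = kpca_fit(K, R=5)
u = kpca_project(V, K[0])   # projection of the first sample onto (u_1, ..., u_5)
```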

5 Kernel canonical correlation analysis (kCCA)
Pairs of feature vectors of sample objects: (x_i, y_i) (1 ≤ i ≤ n)
kCCA finds projections (canonical variates) (u, v) that yield maximum correlation between φ(x) and θ(y):
  (u, v) = (w_φ・φ(x), w_θ・θ(y)),  w_φ = Σ_{i=1}^{n} f_i φ(x_i),  w_θ = Σ_{i=1}^{n} g_i θ(y_i)
where f = (f_1, …, f_n)^T and g = (g_1, …, g_n)^T are eigenvectors of a generalized eigenvalue problem defined by the kernel matrices
  Φ_ij = φ(x_i)・φ(x_j),  Θ_ij = θ(y_i)・θ(y_j)
and the n×n identity matrix I.

6 Application of kCCA to classification problems
Use an indicator vector as the second feature vector y:
  y = (y_1, …, y_nc) corresponding to x, with y_c = 1 if x belongs to class c and y_c = 0 otherwise (n_c: the number of classes).
The mapping θ of y is not used (y is taken as is).
A total of n_c - 1 eigenvectors f_r = (f_r1, …, f_rn) (1 ≤ r ≤ n_c - 1) corresponding to non-zero eigenvalues are obtained.
Canonical variates u_r (1 ≤ r ≤ n_c - 1) for a new object (x, ?) are calculated by
  u_r = Σ_{i=1}^{n} f_ri φ(x_i)・φ(x) = Σ_{i=1}^{n} f_ri k(x_i, x)
Classification methods can then be applied in the canonical variate space (u_1, …, u_{nc-1}).
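The generalized eigenproblem itself appears on the slide as a figure and is not reproduced in this transcript, so the sketch below uses one standard regularized kernel-CCA formulation with a linear kernel on the class-indicator vectors; the function names, the block structure and the placement of the regularization terms are assumptions and may differ from the authors' exact setup.

```python
import numpy as np
from scipy.linalg import eigh

def kcca_indicator_fit(K, labels, n_classes, gamma=0.1):
    """Regularized kernel CCA between the input kernel matrix K and class-indicator
    vectors. Returns F of shape (n, n_classes - 1); canonical variates of new data
    are k_new @ F with k_new[i] = k(x_i, x)."""
    n = K.shape[0]
    Y = np.zeros((n, n_classes))
    Y[np.arange(n), labels] = 1.0              # indicator vectors, theta(y) = y
    Ky = Y @ Y.T                               # linear kernel on the indicators
    # Generalized eigenproblem A z = rho B z with z = (f, g):
    A = np.block([[np.zeros((n, n)), K @ Ky],
                  [Ky @ K, np.zeros((n, n))]])
    B = np.block([[K @ K + gamma * np.eye(n), np.zeros((n, n))],
                  [np.zeros((n, n)), Ky @ Ky + gamma * np.eye(n)]])
    rho, Z = eigh(A, B)                        # ascending eigenvalues
    return Z[:n, np.argsort(rho)[::-1][:n_classes - 1]]

def kcca_project(F, k_new):
    """Canonical variates u_r = sum_i f_ri k(x_i, x) for new data."""
    return k_new @ F
```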

7 Correlation kernel
The kth-order autocorrelation of data x_i(t):
  r_xi(t_1, t_2, …, t_{k-1}) = ∫ x_i(t) x_i(t+t_1) ··· x_i(t+t_{k-1}) dt
The inner product between r_xi and r_xj is calculated with the kth power of the (2nd-order) cross-correlation function:
  r_xi・r_xj = ∫ {cc_xi,xj(t_1)}^k dt_1,  where cc_xi,xj(t_1) = ∫ x_i(t) x_j(t+t_1) dt
The explicit calculation of the autocorrelation values is avoided. → High-order autocorrelations are tractable at practical computational cost.
・ Linear correlation kernel: K(x_i, x_j) = r_xi・r_xj
・ Gaussian correlation kernel: K(x_i, x_j) = exp(-μ|r_xi - r_xj|²) = exp(-μ(r_xi・r_xi + r_xj・r_xj - 2 r_xi・r_xj))
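A quick numerical check of this inner-product shortcut, a sketch on short 1-D signals using circular (periodic) correlations, where the equality is exact; with the truncated lag ranges used on the next slide it holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 16, 3
x, y = rng.standard_normal(N), rng.standard_normal(N)

# Brute force: explicit k-th order (circular) autocorrelations, then their inner product.
def autocorr(z, t1, t2):
    idx = np.arange(N)
    return np.sum(z[idx] * z[(idx + t1) % N] * z[(idx + t2) % N])

r_x = np.array([autocorr(x, t1, t2) for t1 in range(N) for t2 in range(N)])
r_y = np.array([autocorr(y, t1, t2) for t1 in range(N) for t2 in range(N)])
direct = r_x @ r_y

# Correlation-kernel shortcut: sum of the k-th power of the circular cross-correlation.
cc = np.array([np.sum(x * np.roll(y, -t)) for t in range(N)])
shortcut = np.sum(cc ** k)

print(direct, shortcut)   # the two values agree up to floating-point error
```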

8 Calculation of correlation kernels r_xi・r_xj for 2-dimensional image data x(l, m) (1 ≤ l ≤ L, 1 ≤ m ≤ M)
・ Calculate the cross-correlations between x_i(l, m) and x_j(l, m):
  cc_xi,xj(l_1, m_1) = Σ_{l=1}^{L-l_1} Σ_{m=1}^{M-m_1} x_i(l, m) x_j(l+l_1, m+m_1) / (LM)   (0 ≤ l_1 ≤ L_1-1, 0 ≤ m_1 ≤ M_1-1)
・ Sum up the kth power of the cross-correlations:
  r_xi・r_xj = Σ_{l_1=0}^{L_1-1} Σ_{m_1=0}^{M_1-1} {cc_xi,xj(l_1, m_1)}^k / (L_1 M_1)
[Slide figure: an L×M image x_i(l, m) is cross-correlated with x_j(l+l_1, m+m_1) over an L_1×M_1 range of lags, and the kth powers of the cross-correlations are summed.]
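A direct, minimal implementation of the computation on this slide; the function names and the Gaussian wrapper are assumptions of this sketch, and the /(LM) and /(L_1 M_1) normalizations follow the formulas above.

```python
import numpy as np

def correlation_kernel(xi, xj, k=2, L1=10, M1=10):
    """k-th order correlation kernel r_xi . r_xj for 2-D images: cross-correlations
    over an L1 x M1 range of lags, raised to the k-th power and summed, so explicit
    k-th order autocorrelations are never formed."""
    L, M = xi.shape
    total = 0.0
    for l1 in range(L1):
        for m1 in range(M1):
            cc = np.sum(xi[:L - l1, :M - m1] * xj[l1:, m1:]) / (L * M)
            total += cc ** k
    return total / (L1 * M1)

def gaussian_correlation_kernel(xi, xj, k=2, mu=1.0, L1=10, M1=10):
    """Gaussian correlation kernel of slide 7, built from the linear one."""
    rii = correlation_kernel(xi, xi, k, L1, M1)
    rjj = correlation_kernel(xj, xj, k, L1, M1)
    rij = correlation_kernel(xi, xj, k, L1, M1)
    return np.exp(-mu * (rii + rjj - 2.0 * rij))
```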

9 Problem of correlation kernels
As the order k of the correlation kernel increases, generalization ability and robustness are lost:
  r_xi・r_xj = Σ_{t_1} {cc_xi,xj(t_1)}^k → δ_ij   (k → ∞)
For test data x (≠ x_i), r_xi・r_x ≈ 0.
In kCCA, Φ = I and Θ is a block matrix, so the eigenvectors become f = (p_1, …, p_1, p_2, …, p_2, …, p_C, …, p_C) (f_i = p_c if x_i ∊ class c).
For sample data, the canonical variates lie on a line through the origin corresponding to their class:
  u_xi = (r_xi・r_xi) p_c,  p_c = (p_{c,1}, …, p_{c,C-1}),  if x_i ∊ class c
For test data: u_x ≈ 0.

10 Fig. A. Scatter diagrams of canonical variates (u_1, u_2) and (u_3, u_1) of Test 1 data of texture images in the Brodatz album in kCCA. Plotted are squares (■) for D4, crosses (×) for D84, circles (●) for D5 and triangles (Δ) for D92. (a) linear kernel (ⅰ), (b) Gaussian kernel (ⅱ), (c) 2nd-order correlation kernel (ⅲ), (d) 3rd-order correlation kernel (ⅲ), (e) 4th-order correlation kernel (ⅲ), (f) 10th-order correlation kernel (ⅲ). For most of the test data, u ≈ 0.

11 Modification of correlation kernels
・ The kth root of the kth-order correlation kernel in the limit k → ∞ is related to the max norm, which is the limit of the L_p norm ||x||_p = {Σ|x_i|^p}^{1/p} as p → ∞. The max norm corresponds to the peak response of a matched filter, which maximizes the SNR, and is therefore expected to be robust. The correlation kernel can thus be modified by taking its kth root, with its sign taken into account.
・ A difference between the even- and odd-order correlations is that the odd-order autocorrelations are blind to sinusoidal signals and to random signals with symmetric distributions. This is because a change in the sign of the original data (x → -x) changes the signs of the odd-order autocorrelations but not of the even-order ones. In the correlation kernel, this appears as the parity of the power to which the cross-correlations are raised, so the absolute values of the cross-correlations can be used instead.
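For reference, the L_p-to-max-norm limit behind the first point is the standard identity (written here in LaTeX for the cross-correlation values over the lag range):

```latex
\lim_{k \to \infty}\Bigl(\sum_{l_1, m_1}\bigl|cc_{x_i,x_j}(l_1, m_1)\bigr|^{k}\Bigr)^{1/k}
  \;=\; \max_{l_1, m_1}\bigl|cc_{x_i,x_j}(l_1, m_1)\bigr|
```

so the kth root of the (absolute) kth-order correlation kernel tends to the max-norm kernel as k grows, which motivates the P, AP, APA, Max and MaxA kernels on the next slide.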

12 Proposed modified autocorrelation kernels
L_p norm kernel (P): sgn(Σ_{l_1,m_1} {cc_xi,xj(l_1, m_1)}^k) · |Σ_{l_1,m_1} {cc_xi,xj(l_1, m_1)}^k|^{1/k}
Absolute kernel (A): Σ_{l_1,m_1} |cc_xi,xj(l_1, m_1)|^k
Absolute L_p norm kernel (AP): |Σ_{l_1,m_1} {cc_xi,xj(l_1, m_1)}^k|^{1/k}
Absolute L_p norm absolute kernel (APA): {Σ_{l_1,m_1} |cc_xi,xj(l_1, m_1)|^k}^{1/k}
Max norm kernel (Max): max_{l_1,m_1} cc_xi,xj(l_1, m_1)
Max norm absolute kernel (MaxA): max_{l_1,m_1} |cc_xi,xj(l_1, m_1)|
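A sketch of these six kernels as functions of the cross-correlation array cc_xi,xj(l_1, m_1), e.g. computed as on slide 8; the function name is an assumption, and the signed kth root for the P kernel is my reading of the sign handling described on slide 11.

```python
import numpy as np

def modified_correlation_kernels(cc, k):
    """The six modified kernels of this slide, given the L1 x M1 array cc of
    cross-correlations between two images."""
    s = np.sum(cc ** k)
    return {
        "P":    np.sign(s) * np.abs(s) ** (1.0 / k),       # signed k-th root
        "A":    np.sum(np.abs(cc) ** k),
        "AP":   np.abs(s) ** (1.0 / k),
        "APA":  np.sum(np.abs(cc) ** k) ** (1.0 / k),
        "Max":  np.max(cc),
        "MaxA": np.max(np.abs(cc)),
    }
```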

13 Classification experiment
Fig. 1. Texture images. Table 1. Sample and test sets.
4-class classification problems with SVM, kPCA and kCCA.
Original images: 512×512 pixels (256 gray levels) from the VisTex database and the Brodatz album.
Sample and test images: 50×50 pixels, chosen from the original images with random shift, scaling, rotation and Gaussian noise (100 images each).
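A hedged sketch of how such sample and test patches might be generated with scipy.ndimage; the distortion ranges and the exact cropping procedure are not given in the transcript, so the function name and the default values below are placeholders.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def random_patch(image, rng, size=50, scale=None, angle=None, noise_std=None):
    """Crop a randomly shifted size x size patch from a texture image, optionally
    scaled, rotated, or corrupted with additive Gaussian noise."""
    img = image.astype(float)
    if scale is not None:
        img = zoom(img, scale)
    if angle is not None:
        img = rotate(img, angle, reshape=False)
    top = rng.integers(0, img.shape[0] - size + 1)
    left = rng.integers(0, img.shape[1] - size + 1)
    patch = img[top:top + size, left:left + size]
    if noise_std is not None:
        patch = patch + rng.normal(0.0, noise_std, patch.shape)
    return patch

# Example: a shifted, slightly rotated, noisy patch from a dummy 512x512 image.
rng = np.random.default_rng(0)
texture = rng.integers(0, 256, (512, 512))
p = random_patch(texture, rng, angle=15.0, noise_std=5.0)
```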

14 Kernel functions K(x_i, x_j)
Linear kernel: x_i・x_j
Gaussian kernel: exp(-μ||x_i - x_j||²)
Correlation kernels: r_xi・r_xj (C2-10)
Modified correlation kernels: P2-10, A3-7, AP3-7, APA3-7, Max, MaxA
Range of correlation lags: L_1 = M_1 = 10 (in 50×50-pixel images)
The simple nearest-neighbor classifier is used for classification in the principal component space (u_1, …, u_R) for kPCA and in the canonical variate space (u_1, …, u_{C-1}) for kCCA.
Parameter values are chosen empirically (soft margin: C = 100, regularization: γ_x = γ_y = 0.1).
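The nearest-neighbor step in the projected space can be as simple as the sketch below, assuming the training and test variates are stacked row-wise into arrays; the function name is an assumption of this sketch.

```python
import numpy as np

def nearest_neighbor_predict(U_train, labels_train, U_test):
    """1-NN classification in the principal component / canonical variate space."""
    d = np.linalg.norm(U_test[:, None, :] - U_train[None, :, :], axis=2)
    return np.asarray(labels_train)[np.argmin(d, axis=1)]
```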

15 Fig. 2. Correct classification rates (CCR (%)) in SVM.

16 Fig. 3. Correct classification rates (CCR (%)) in kPCA.

17 Fig. 4. Correct classification rates (CCR (%)) in kCCA.

18 Comparison of the performance
Correct classification rates (CCRs) of the correlation kernels (C2-10) are low for odd orders and for higher orders. With the modification, the L_p norm kernels (P2-10) give high CCRs even for higher orders and the absolute kernels (A3-7) give high CCRs even for odd orders. Their combinations (AP3-7, APA3-7) and the max norm kernels (Max, MaxA) also show good performance.
Table 2. Highest correct classification rates.

19 Summary
Modified versions of the correlation kernels are proposed.
・ Application of the L_p norm and the max norm → the poor generalization of the higher-order correlation kernels is improved.
・ Use of the absolute values of the cross-correlations → the inferior performance of the odd-order correlation kernels relative to the even-order ones, due to their blindness to sinusoidal or symmetrically distributed signals, is also improved.
SVM, kPCA and kCCA with the modified correlation kernels show good performance in texture classification experiments.