
Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis
Haesun Park, Georgia Institute of Technology, Atlanta, GA, USA (joint work with C. Park)
KAIST, Korea, June 2007

Clustering and Classification

Clustering: grouping of data based on similarity measures

Classification: assigning a class label to new, unseen data

Data Mining
Mining or discovery of new information - patterns or rules - from large databases.
Data preparation: preprocessing; dimension reduction (feature selection, feature extraction, data reduction).
Core tasks: classification, clustering, association analysis, regression, probabilistic modeling, ...

Optimal Feature Extraction
- Reduce the dimensionality of the data space
- Minimize the effects of redundant features and noise
Then apply a classifier to predict the class label of new data. As the number of features grows, ever more data is needed to generalize well: the curse of dimensionality.

Linear Dimension Reduction
Maximize class separability in the reduced dimensional space.

Nonlinear Dimension Reduction
What if the data is not linearly separable?

Contents
- Linear Discriminant Analysis
- Nonlinear Dimension Reduction based on Kernel Methods: Nonlinear Discriminant Analysis
- Application to Fingerprint Classification

Linear Discriminant Analysis (LDA)
For a given data set {a_1, ..., a_n} partitioned into classes N_1, ..., N_c with centroids c_i = (1/n_i) sum_{a in N_i} a:
Within-class scatter matrix S_w = sum_{i=1}^{c} sum_{a in N_i} (a - c_i)(a - c_i)^T; trace(S_w) measures how tightly each class clusters around its centroid.

Between-class scatter matrix S_b = sum_{i=1}^{c} n_i (c_i - c)(c_i - c)^T, where c is the global centroid; trace(S_b) measures how far apart the class centroids are.
A linear transformation G^T maps the data a_1, ..., a_n to G^T a_1, ..., G^T a_n. The goal is to choose G to maximize trace(G^T S_b G) while minimizing trace(G^T S_w G).

Eigenvalue problem: the columns of the optimal G are the leading eigenvectors of S_w^{-1} S_b, i.e. solutions of S_w^{-1} S_b x = λ x.
Since rank(S_b) <= number of classes - 1, at most c - 1 useful discriminant directions exist.
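To make the construction concrete, here is a minimal NumPy/SciPy sketch of classical LDA as defined on these slides. The function name, the toy data, and the small ridge term (added so that S_w is numerically invertible) are our own choices, not part of the talk.

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, dim):
    """Classical LDA: leading eigenvectors of S_w^{-1} S_b.

    X : (n, d) data matrix, one sample per row; y : (n,) class labels.
    Returns G of shape (d, dim), with dim <= (number of classes - 1).
    """
    d = X.shape[1]
    c_all = X.mean(axis=0)                      # global centroid
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xi = X[y == label]
        c_i = Xi.mean(axis=0)                   # class centroid
        S_w += (Xi - c_i).T @ (Xi - c_i)        # within-class scatter
        diff = (c_i - c_all).reshape(-1, 1)
        S_b += len(Xi) * (diff @ diff.T)        # between-class scatter
    S_w += 1e-6 * np.eye(d)                     # ridge: keep S_w invertible
    # Generalized symmetric eigenproblem S_b x = lambda S_w x,
    # equivalent to the eigenproblem of S_w^{-1} S_b.
    vals, vecs = eigh(S_b, S_w)
    return vecs[:, ::-1][:, :dim]               # top-dim eigenvectors

# Toy usage: 2 classes in 5 dimensions -> at most 1 discriminant direction.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
G = lda(X, y, dim=1)
print((X @ G).shape)                            # (40, 1)
```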

Face Recognition
Each face image (92 x ... pixels) is flattened into a high-dimensional vector; G^T reduces the dimension so as to maximize the distances among classes, and an unknown face is then matched in the reduced space.

Text Classification
A bag of words: each document is represented by the frequencies of the words it contains, e.g. an Education document (faculty, student, syllabus, grade, tuition, ...) vs. a Recreation document (movie, music, sport, Hollywood, theater, ...). G^T then reduces these word-frequency vectors.

Generalized LDA Algorithms
Undersampled problems - high dimensionality and a small number of data points - make S_w singular, so S_w^{-1} S_b cannot be computed.

Nonlinear Dimension Reduction based on Kernel Methods

Nonlinear Dimension Reduction
First map the data by a nonlinear mapping Φ, then apply a linear dimension reduction G^T in the mapped space.

Kernel Method
If a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Φ for which <Φ(x), Φ(y)> = k(x, y) holds: A → Φ(A).
For a finite data set A = [a_1, ..., a_n], Mercer's condition can be rephrased as: the kernel matrix K = [k(a_i, a_j)] is positive semi-definite.

Nonlinear Dimension Reduction by Kernel Methods
Given a kernel function k(x, y), the data is implicitly mapped by Φ and the linear dimension reduction G^T is performed in the feature space, with all computations expressed through k.

Positive Definite Kernel Functions
Gaussian kernel: k(x, y) = exp(-||x - y||^2 / σ^2)
Polynomial kernel: k(x, y) = (x^T y + c)^d
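As an illustration of these kernels and of the finite-set Mercer condition above, a minimal sketch that builds the kernel matrix for a small random data set and checks that it is positive semi-definite; the function names and parameter defaults are our own.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def polynomial_kernel(x, y, c=1.0, d=3):
    # k(x, y) = (x . y + c)^d
    return (x @ y + c) ** d

def kernel_matrix(A, k):
    """Kernel matrix K with K[i, j] = k(a_i, a_j) for rows a_i of A."""
    n = A.shape[0]
    return np.array([[k(A[i], A[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))
for k in (gaussian_kernel, polynomial_kernel):
    K = kernel_matrix(A, k)
    # Mercer's condition for a finite set: K is positive semi-definite.
    print(k.__name__, np.linalg.eigvalsh(K).min() >= -1e-9)
```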

Nonlinear Discriminant Analysis using Kernel Methods
We want to apply LDA to the mapped data {Φ(a_1), ..., Φ(a_n)}, i.e. to solve S_b x = λ S_w x for the scatter matrices of the feature space, where <Φ(x), Φ(y)> = k(x, y).

Since the feature space is accessible only through inner products, each a_i is represented by the i-th column of the kernel matrix
K = [k(a_i, a_j)], i, j = 1, ..., n,
and the problem becomes a generalized eigenvalue problem S_b u = λ S_w u for the scatter matrices of this kernel representation. Now apply the generalized LDA algorithms; a code sketch follows below.
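Here is the promised sketch, under one standard reading of this construction: each sample is represented by its column of the kernel matrix K, and the lda() routine from the earlier sketch is reused on that representation (its small ridge term playing the role of regularization). It illustrates the idea; it is not necessarily the exact algorithm of the talk.

```python
import numpy as np

def kernel_lda(A, y, k, dim):
    """Nonlinear discriminant analysis: LDA on kernel-matrix columns.

    A : (n, d) data, y : (n,) labels, k : kernel function k(x, y).
    Returns (K, U): the kernel matrix and (n, dim) discriminant directions,
    so that training point a_i is embedded as U.T @ K[:, i].
    """
    n = A.shape[0]
    K = np.array([[k(A[i], A[j]) for j in range(n)] for i in range(n)])
    U = lda(K, y, dim)          # reuse the lda() sketch given earlier
    return K, U

def embed_new(A, x, k, U):
    """Embed an unseen point x via its kernel values against training data."""
    kx = np.array([k(a, x) for a in A])
    return U.T @ kx
```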

Generalized LDA Algorithms
When S_w is singular, work with the null and range spaces directly:
Minimize trace(x^T S_w x): x^T S_w x = 0 for x ∈ null(S_w).
Maximize trace(x^T S_b x): x^T S_b x ≠ 0 for x ∈ range(S_b).

Generalized LDA Algorithms
RLDA: add a positive diagonal matrix εI to S_w so that S_w + εI is nonsingular (see the sketch after this list).
LDA/GSVD: apply the generalized singular value decomposition (GSVD) to {H_w, H_b}, where S_b = H_b H_b^T and S_w = H_w H_w^T.
To-N(S_w): project onto the null space of S_w, then maximize the between-class scatter in the projected space.

To-R(S_b): transform to the range space of S_b, then diagonalize the within-class scatter matrix in the transformed space.
To-NR(S_w): reduce the data dimension by PCA, then maximize the between-class scatter in range(S_w) and null(S_w).
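Of these variants, RLDA is the simplest to write down. A minimal sketch; the regularization parameter eps is a free choice, not a value from the talk.

```python
import numpy as np
from scipy.linalg import eigh

def rlda(X, y, dim, eps=1e-3):
    """Regularized LDA: replace S_w by S_w + eps * I so it is nonsingular.

    Useful in the undersampled case (dimensionality > number of samples),
    where S_w itself cannot be inverted.
    """
    d = X.shape[1]
    c_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):                  # scatter matrices as in lda()
        Xi = X[y == label]
        c_i = Xi.mean(axis=0)
        S_w += (Xi - c_i).T @ (Xi - c_i)
        diff = (c_i - c_all).reshape(-1, 1)
        S_b += len(Xi) * (diff @ diff.T)
    vals, vecs = eigh(S_b, S_w + eps * np.eye(d))   # regularized problem
    return vecs[:, ::-1][:, :dim]
```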

Data Sets
[Table: for each data set - Musk, Isolet, Car, Mfeature, Bcancer, Bscale - the dimension, the number of data points, and the number of classes; all from the UCI Machine Learning Repository]

Experimental Settings
Split the original data into training data and test data. Apply the dimension reduction - determined by a kernel function k and a linear transformation G^T - to both, then predict the class labels of the test data using the training data, as in the sketch below.
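A minimal sketch of such a pipeline, assuming the kernel_lda(), embed_new(), and gaussian_kernel() sketches from earlier sections; the nearest-centroid classifier and the 70/30 split are our assumptions, as the slides do not specify the classifier.

```python
import numpy as np

def evaluate(A, y, k, dim, train_frac=0.7, seed=0):
    """Train/test split -> kernel discriminant reduction -> nearest centroid."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(A))
    n_tr = int(train_frac * len(A))
    tr, te = idx[:n_tr], idx[n_tr:]

    K, U = kernel_lda(A[tr], y[tr], k, dim)     # fit on training data only
    Z_tr = (U.T @ K).T                          # training data, reduced
    centroids = {c: Z_tr[y[tr] == c].mean(axis=0) for c in np.unique(y[tr])}

    correct = 0
    for i in te:
        z = embed_new(A[tr], A[i], k, U)        # reduce the test point
        pred = min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
        correct += (pred == y[i])
    return correct / len(te)                    # prediction accuracy
```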

Prediction Accuracies
[Figure: prediction accuracies of the generalized LDA methods; each color represents a different data set]

Linear and Nonlinear Discriminant Analysis
[Figure: linear vs. nonlinear discriminant analysis accuracies across the data sets]

Face Recognition

Application of Nonlinear Discriminant Analysis to Fingerprint Classification

Fingerprint Classification
Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch. Images from NIST Fingerprint Database 4.

Previous Works in Fingerprint Classification
Feature representation: minutiae, Gabor filtering, directional partitioning.
Classifiers applied: neural networks, support vector machines, probabilistic neural networks.
Our approach: construct core directional images by the DFT, then apply dimension reduction by nonlinear discriminant analysis.

Construction of Core Directional Images
[Figure: fingerprint images and their core directional images for Left Loop, Right Loop, and Whorl]

Construction of Core Directional Images
[Figure: the detected core point]

Discrete Fourier Transform (DFT)
For an M x N image f(x, y): F(u, v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x, y) exp(-2πi (ux/M + vy/N)).

Construction of Directional Images
- Computation of local dominant directions by DFT and directional filtering (a sketch follows below)
- Core point detection
- Reconstruction of core directional images
Benefits: fast computation of the DFT via the FFT; reliable for low-quality images.
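A minimal sketch of the first step under our own reading of it: take the 2-D FFT of a local block, pick the strongest non-DC frequency, and report the ridge direction as perpendicular to that frequency vector. The block handling is an assumption; the talk's directional filtering is not reproduced here.

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge direction of a local image block.

    Returns an angle in [0, pi): ridges run perpendicular to the
    strongest spatial frequency found by the 2-D FFT.
    """
    F = np.fft.fftshift(np.fft.fft2(block))
    mag = np.abs(F)
    c = np.array(block.shape) // 2
    mag[c[0], c[1]] = 0.0                      # suppress the DC term
    u, v = np.unravel_index(np.argmax(mag), mag.shape)
    freq = np.array([u, v]) - c                # dominant frequency vector
    angle = np.arctan2(freq[0], freq[1])       # direction of the frequency
    return (angle + np.pi / 2) % np.pi         # ridges are perpendicular

# Toy usage: a block of horizontal stripes has a vertical frequency,
# so the reported ridge direction is ~0 (horizontal).
yy = np.arange(16).reshape(-1, 1) * np.ones((1, 16))
stripes = np.sin(2 * np.pi * yy / 4)
print(dominant_direction(stripes))             # close to 0.0
```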

Computation of local dominant directions by DFT and directional filtering
[Figure: local dominant directions estimated over the fingerprint image]

Construction of Directional Images
[Figure: construction of directional images (105 x ..., ... x 512)]

Nonlinear Discriminant Analysis
Each core directional image, a point in a (105 x ...)-dimensional space, is mapped by G^T to a 4-dimensional space that maximizes the separability of the five classes (Left Loop, Right Loop, Whorl, Tented Arch, Arch). Four dimensions suffice because rank(S_b) <= 5 - 1 = 4.

Comparison of Experimental Results
[Table: prediction accuracies (%) at several rejection rates (%) on NIST Database 4, for Nonlinear LDA/GSVD, PCASYS +, Jain et al. [1999, TPAMI], and Yao et al. [2003, PR]]

Summary
- Nonlinear feature extraction based on kernel methods: nonlinear discriminant analysis and the kernel orthogonal centroid method (KOC)
- A comparison of generalized linear and nonlinear discriminant analysis algorithms
- Application to fingerprint classification

Dimension reduction - feature transformation: each new feature is a linear combination of the original features.
Feature selection: select a subset of the original features, e.g. gene selection in gene expression microarray data analysis.
Visualization of high-dimensional data: visual data mining.

Core Point Detection
θ_{i,j}: the dominant direction in the neighborhood centered at (i, j).
Consistency of the local dominant directions is measured by
| sum over i, j = -1, 0, 1 of [cos(2θ_{i,j}), sin(2θ_{i,j})] |,
the distance from the starting point to the finishing point when the doubled-angle unit vectors are chained head to tail. The position with the lowest value (the least consistent directions) is taken as the core point.
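A minimal sketch of this measure, assuming a precomputed array theta of local dominant directions (e.g. from the dominant_direction() sketch earlier); doubling the angles makes θ and θ + π count as the same direction.

```python
import numpy as np

def consistency(theta, i, j):
    """|sum of [cos 2*theta, sin 2*theta]| over the 3x3 neighborhood of (i, j)."""
    s = np.zeros(2)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            t = theta[i + di, j + dj]
            s += np.array([np.cos(2 * t), np.sin(2 * t)])
    return np.linalg.norm(s)

def core_point(theta):
    """Location of the lowest consistency value = the core point."""
    h, w = theta.shape
    scores = np.full((h, w), np.inf)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            scores[i, j] = consistency(theta, i, j)
    return np.unravel_index(np.argmin(scores), scores.shape)
```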

References
- L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33, 2000.
- P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1), 2003.
- H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data - with application to face recognition, Pattern Recognition, 34, 2001.
- J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36, 2003.
- H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003.
- S. Mika et al., Fisher discriminant analysis with kernels, in Neural Networks for Signal Processing IX (J. Larsen and S. Douglas, eds.), pp. 41-48, IEEE, 1999.
- B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, 1998.
- G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12, 2000.
- V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12, 2000.
- S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2), 2002.
- C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1), 2005.
- A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4), 1999.
- Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2), 2003.
- C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4), 2004.
- C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear.
- C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006.