1
Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis
Haesun Park
Georgia Institute of Technology, Atlanta, GA, USA (joint work with C. Park)
KAIST, Korea, June 2007
2
Clustering
3
Clustering: grouping of data based on similarity measures
4
Classification: assign a class label to new, unseen data
5
Data Mining: mining or discovery of new information - patterns or rules - from large databases
Data preparation: preprocessing; dimension reduction (feature selection, feature extraction, data reduction)
Tasks: classification, clustering, association analysis, regression, probabilistic modeling, ...
6
Optimal feature extraction
- Reduce the dimensionality of the data space
- Minimize the effects of redundant features and noise
Then apply a classifier to predict the class label of new data.
Curse of dimensionality: as the number of features grows, classification of new data becomes harder unless the amount of training data grows with it.
7
Linear dimension reduction: maximize class separability in the reduced-dimensional space
9
What if the data is not linearly separable? Nonlinear Dimension Reduction
10
Contents Linear Discriminant Analysis Nonlinear Dimension Reduction based on Kernel Methods - Nonlinear Discriminant Analysis Application to Fingerprint Classification
11
Linear Discriminant Analysis (LDA)
For a given data set {a1, ..., an} with class centroids c1, ..., ck and global centroid c:
Within-class scatter matrix: Sw = sum over classes i of sum over a in class i of (a - ci)(a - ci)^T; trace(Sw) measures how tightly each class clusters around its centroid.
12
Between-class scatter matrix: Sb = sum over classes i of ni (ci - c)(ci - c)^T; trace(Sb) measures how far apart the class centroids are.
A linear transform G^T maps a1, ..., an to G^T a1, ..., G^T an. Goal: maximize trace(G^T Sb G) while minimizing trace(G^T Sw G).
13
Eigenvalue problem: the columns of the optimal G are the leading eigenvectors of Sw^-1 Sb, i.e. solutions of Sw^-1 Sb x = λ x. Since rank(Sb) ≤ number of classes - 1, at most (number of classes - 1) useful dimensions exist.
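The steps above can be sketched in a few lines of NumPy. This is a toy illustration of classical LDA, assuming Sw is nonsingular (more data points than dimensions); the function name and defaults are mine, not from the slides:

```python
import numpy as np

def lda_transform(X, y, dim):
    """Classical LDA: maximize trace(G^T Sb G) while minimizing
    trace(G^T Sw G). X: (n, d) data, y: class labels.
    Assumes Sw is nonsingular (not the generalized/GSVD variant)."""
    c = X.mean(axis=0)                       # global centroid
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        ck = Xk.mean(axis=0)                 # class centroid
        Sw += (Xk - ck).T @ (Xk - ck)        # within-class scatter
        Sb += len(Xk) * np.outer(ck - c, ck - c)  # between-class scatter
    # Solve Sw^-1 Sb x = lambda x; rank(Sb) <= #classes - 1
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    G = evecs[:, order[:dim]].real           # top eigenvectors as columns
    return X @ G
```

With two classes only one discriminant direction carries information, matching the rank(Sb) ≤ (number of classes - 1) bound above.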
14
Face Recognition: each 92 x 112 face image is a 10304-dimensional vector; apply dimension reduction G^T to maximize the distances among classes.
15
Text Classification
A bag of words: each document is represented by the frequencies of the words it contains, then reduced by G^T.
Education: faculty, student, syllabus, grade, tuition, ...
Recreation: movie, music, sport, Hollywood, theater, ...
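A minimal sketch of the bag-of-words representation described above; the vocabulary and the two documents are made up for illustration:

```python
from collections import Counter

# Fixed vocabulary (illustrative terms from the two topics above).
vocab = ["faculty", "student", "tuition", "movie", "music", "sport"]

def bag_of_words(doc):
    """Represent a document as word frequencies over the vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

edu = bag_of_words("Student tuition and student grades faculty")
rec = bag_of_words("Movie music movie sport")
```

Each document becomes a fixed-length frequency vector, so a linear transform G^T can be applied to the whole collection.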
16
Generalized LDA Algorithms
Undersampled problems: high dimensionality and a small number of data points make Sw singular, so Sw^-1 Sb cannot be computed.
17
Nonlinear Dimension Reduction based on Kernel Methods
18
Nonlinear Dimension Reduction: a nonlinear mapping Φ into feature space, followed by linear dimension reduction G^T.
19
Kernel Method
If a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Φ for which ⟨Φ(x), Φ(y)⟩ = k(x, y) holds.
For a finite data set A = [a1, ..., an], Mercer's condition can be rephrased as: the kernel matrix K = [k(ai, aj)] is positive semi-definite.
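The finite-data rephrasing of Mercer's condition can be checked numerically. The sketch below builds a Gaussian kernel matrix and confirms it is positive semi-definite; the data and σ value are arbitrary choices for illustration:

```python
import numpy as np

def gaussian_kernel_matrix(A, sigma=1.0):
    """K[i, j] = exp(-||a_i - a_j||^2 / sigma^2)."""
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma**2)

# Random data; for a valid kernel the matrix must be symmetric and
# positive semi-definite, i.e. all eigenvalues >= 0 up to round-off.
A = np.random.default_rng(0).normal(size=(10, 3))
K = gaussian_kernel_matrix(A)
eigs = np.linalg.eigvalsh(K)
```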
20
Nonlinear Dimension Reduction by Kernel Methods: given a kernel function k(x, y), implicitly map the data by Φ and perform linear dimension reduction G^T in the feature space.
21
Positive Definite Kernel Functions
Gaussian kernel: k(x, y) = exp(-||x - y||^2 / σ^2)
Polynomial kernel: k(x, y) = (x^T y + c)^d
22
Nonlinear Discriminant Analysis using Kernel Methods
Map the data {a1, a2, ..., an} to {Φ(a1), ..., Φ(an)} and apply LDA there: solve Sb x = λ Sw x in the feature space, using only inner products ⟨Φ(x), Φ(y)⟩ = k(x, y).
23
Since the solution lies in the span of Φ(a1), ..., Φ(an), the problem Sb x = λ Sw x can be rewritten over the kernel matrix
K = [k(ai, aj)], i, j = 1, ..., n
as Sb u = λ Sw u in kernel coordinates; then apply the generalized LDA algorithms.
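One way to realize this in code is to run LDA on the rows of the kernel matrix, i.e. on the data expressed in kernel coordinates. The ridge term below stands in for one of the generalized LDA algorithms (RLDA-style), since Sw is typically singular in kernel space; the ridge value, kernel width, and data are illustrative choices, not from the slides:

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma=1.0):
    """K[i, j] = exp(-||a_i - b_j||^2 / sigma^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma**2)

def kernel_lda(K, y, dim, ridge=1e-3):
    """LDA on kernel coordinates: point a_i is represented by the i-th
    row of K = [k(a_i, a_j)]. The ridge regularizes the (typically
    singular) within-class scatter."""
    n = K.shape[0]
    c = K.mean(axis=0)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for lbl in np.unique(y):
        Kk = K[y == lbl]
        ck = Kk.mean(axis=0)
        Sw += (Kk - ck).T @ (Kk - ck)
        Sb += len(Kk) * np.outer(ck - c, ck - c)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(n), Sb))
    order = np.argsort(evals.real)[::-1]
    return K @ evecs[:, order[:dim]].real   # nonlinear features
```

On two concentric circles, which no linear projection can separate, a one-dimensional kernel-LDA feature separates the classes almost perfectly on the training data.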
24
Generalized LDA Algorithms
Minimize trace(x^T Sw x): x^T Sw x = 0 ⇔ x ∈ null(Sw)
Maximize trace(x^T Sb x): x^T Sb x ≠ 0 ⇔ x ∈ range(Sb)
25
Generalized LDA algorithms
- RLDA: add a positive multiple of the identity, λI, to Sw so that Sw + λI is nonsingular.
- LDA/GSVD: apply the generalized singular value decomposition (GSVD) to {Hb, Hw}, where Sb = Hb Hb^T and Sw = Hw Hw^T.
- To-N(Sw): project to the null space of Sw, then maximize the between-class scatter in the projected space.
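A quick numerical illustration of the RLDA idea: in the undersampled setting Sw is rank-deficient, and adding a small multiple of the identity restores invertibility. The dimensions and λ below are arbitrary:

```python
import numpy as np

# Undersampled setting: scatter built from only 3 points in 5 dimensions,
# so Sw is singular (rank <= 3 < 5).
d = 5
H = np.random.default_rng(1).normal(size=(d, 3))   # plays the role of Hw
Sw = H @ H.T                                       # rank-deficient scatter
lam = 1e-2                                         # illustrative lambda
Sw_reg = Sw + lam * np.eye(d)                      # RLDA regularization
```

Since Sw is positive semi-definite, every eigenvalue of Sw + λI is at least λ, so the regularized matrix is always invertible.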
26
- To-R(Sb): transform to the range space of Sb, then diagonalize the within-class scatter matrix in the transformed space.
- To-NR(Sw): reduce the data dimension by PCA, then maximize the between-class scatter in range(Sw) and null(Sw).
27
Data sets (from the UCI Machine Learning Repository):

Data      dim   no. of data   no. of classes
Musk      166   6599          2
Isolet    617   7797          26
Car       6     1728          4
Mfeature  649   2000          10
Bcancer   9     699           2
Bscale    4     625           3
28
Experimental Settings
Split the original data into training and test sets. Choose a kernel function k and a linear transform G^T for dimension reduction from the training data, then predict the class labels of the test data.
29
[Figure: prediction accuracies of the methods; each color represents a different data set]
30
[Figure: linear vs. nonlinear discriminant analysis accuracies across the data sets]
31
Face Recognition
32
Application of Nonlinear Discriminant Analysis to Fingerprint Classification
33
Fingerprint Classification
Five classes: left loop, right loop, whorl, arch, tented arch.
From NIST Fingerprint Database 4.
34
Previous Works in Fingerprint Classification
- Feature representation: minutiae, Gabor filtering, directional partitioning
- Classifiers: neural networks, support vector machines, probabilistic neural networks
Our Approach
- Construct core directional images by the DFT
- Dimension reduction by nonlinear discriminant analysis
35
Construction of Core Directional Images [examples: left loop, right loop, whorl]
36
Construction of Core Directional Images: locate the core point.
37
Discrete Fourier transform (DFT): F(u, v) = Σx Σy f(x, y) exp(-2πi (ux/M + vy/N))
39
Construction of Directional Images
- Computation of local dominant directions by DFT and directional filtering
- Core point detection
- Reconstruction of core directional images
Fast computation of the DFT by the FFT; reliable for low-quality images.
40
Computation of local dominant directions by DFT and directional filtering
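A simplified stand-in for this step: estimate a patch's dominant ridge direction from the peak of its 2-D DFT magnitude. The directional filtering of the original method is omitted, and the synthetic patch and angle convention are my assumptions:

```python
import numpy as np

def dominant_direction(patch):
    """Dominant ridge orientation (degrees in [0, 180)) from the peak
    of the patch's 2-D DFT magnitude. Ridges run perpendicular to the
    dominant spatial-frequency vector."""
    n = patch.shape[0]
    F = np.fft.fftshift(np.abs(np.fft.fft2(patch - patch.mean())))
    iy, ix = np.unravel_index(np.argmax(F), F.shape)
    fy, fx = iy - n // 2, ix - n // 2        # peak frequency coordinates
    return (np.degrees(np.arctan2(fy, fx)) + 90.0) % 180.0

# Synthetic patch: vertical stripes, i.e. ridges along the y-axis (90 deg).
n = 32
row = np.sin(2 * np.pi * 4 * np.arange(n) / n)
patch = np.tile(row, (n, 1))
```

Transposing the patch rotates the stripes by 90 degrees, and the estimate follows.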
41
Construction of Directional Images: a 512 x 512 fingerprint image is reduced to a 105 x 105 core directional image.
42
Nonlinear Discriminant Analysis: the 105 x 105 directional images live in an 11025-dimensional space; G^T reduces them to a 4-dimensional space (5 classes - 1), maximizing the separability of left loop, right loop, whorl, arch, and tented arch.
43
Comparison of Experimental Results on NIST Database 4 (prediction accuracies, %):

Rejection rate (%)          0      1.8    8.5    20.0
Nonlinear LDA/GSVD          90.7   91.3   92.8   95.3
PCASYS+                     89.7   90.5   92.8   95.6
Jain et al. [1999, TPAMI]   -      90.0   91.2   93.5
Yao et al. [2003, PR]       -      90.0   92.2   95.6
44
Summary Nonlinear Feature Extraction based on Kernel Methods - Nonlinear Discriminant Analysis - Kernel Orthogonal Centroid Method (KOC) A comparison of Generalized Linear and Nonlinear Discriminant Analysis Algorithms Application to Fingerprint Classification
45
Dimension reduction
- Feature transformation: linear combinations of the original features
- Feature selection: select a subset of the original features (e.g., gene selection in gene expression microarray data analysis)
Visualization of high-dimensional data; visual data mining.
46
Core point detection
θ(i, j): dominant direction on the neighborhood centered at (i, j).
Measure the consistency of local dominant directions by
| Σ over i, j = -1, 0, 1 of [cos 2θ(i, j), sin 2θ(i, j)] | :
the distance from the starting point to the finishing point of the chained unit vectors. The lowest value marks the core point.
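The consistency measure above can be written directly; the 3 x 3 neighborhoods of test angles are illustrative:

```python
import numpy as np

def direction_consistency(theta):
    """|sum over the neighborhood of [cos 2θ, sin 2θ]|: the distance from
    the start to the end of the chained unit vectors. Doubling the angle
    makes directions θ and θ + π count as identical."""
    v = np.exp(2j * theta)              # cos(2θ) + i·sin(2θ)
    return abs(v.sum())

uniform = np.full((3, 3), 0.3)                      # all directions agree
spread = np.linspace(0.0, np.pi, 9).reshape(3, 3)   # directions fan out
```

Nine agreeing directions give the maximum value 9, while directions spread over [0, π) nearly cancel; a core point sits where this value is lowest.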
47
References
- L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000
- P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003
- H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34:2067-2070, 2001
- J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003
- H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003
- S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999
- B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998
- G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000
- V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000
48
- S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002
- C.H. Park and H. Park, Nonlinear discriminant analysis based on the generalized singular value decomposition, SIMAX, 27(1):98-102, 2005
- A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999
- Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003
- C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810
- C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear
- C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006