Published by Brian Riley. Modified over 9 years ago.
1
Combining Ensemble Technique of Support Vector Machines with the Optimal Kernel Method for High Dimensional Data Classification
I-Ling Chen (1), Bor-Chen Kuo (1), Chen-Hsuan Li (2), Chih-Cheng Hung (3)
(1) Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
(2) Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
(3) School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.
2
Outline
Introduction
Statement of problems
The Objective
Literature Review
Support Vector Machines
  – Kernel method
Multiple Classifier System
  – Random subspace method, Dynamic subspace method
An Optimal Kernel Method for selecting RBF Kernel Parameter
Optimal Kernel-based Dynamic Subspace Method
Experimental Design and Results
Conclusion and Future Work
3
INTRODUCTION
4
Hughes Phenomenon (Hughes, 1968)
Also called the curse of dimensionality or the peaking phenomenon.
A small sample size, N, combined with a high dimensionality, d, leads to low classification performance.
5
Support Vector Machines (SVM)
Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998).
SVM is robust and effective against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006).
SVM includes:
  – Kernel Trick
  – Support Vector Learning
6
The Goal of Kernel Method for Classification
Samples in the same class should be mapped into the same area.
Samples in different classes should be mapped into different areas.
7
Support Vector Learning
SV learning tries to learn a linear separating hyperplane for a two-class classification problem from a given training set.
[Figure: illustration of SV learning with the kernel trick. Labels: nonlinear feature mapping, optimal hyperplane, margins, support vectors.]
8
Multiple Classifier System
Approaches to building classifier ensembles: Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.
There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets (Ho, 1998; Yang, Kuo, Yu, & Chuang, 2010).
9
THE FRAMEWORK OF RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (HO, 1998) Given the learning algorithm, SVM, and the ensemble size, S.
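The random subspace idea can be sketched in a few lines: train S base classifiers, each on a randomly chosen subset of w features, and fuse their predictions by majority vote. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `train_rsm`, `predict_rsm` and `fit_nearest_centroid` are hypothetical, and a nearest-centroid learner stands in for the SVM so that the sketch runs with the standard library alone.

```python
import random
from collections import Counter

def fit_nearest_centroid(X, y):
    """Stand-in base learner (NOT an SVM): returns a predict(row) function."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(row):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return predict

def train_rsm(X, y, S, w, fit=fit_nearest_centroid, seed=0):
    """Train S base classifiers, each on w features drawn uniformly at random."""
    rng = random.Random(seed)
    d = len(X[0])
    ensemble = []
    for _ in range(S):
        feats = sorted(rng.sample(range(d), w))      # random feature subset
        X_sub = [[row[j] for j in feats] for row in X]
        ensemble.append((feats, fit(X_sub, y)))
    return ensemble

def predict_rsm(ensemble, row):
    """Fuse the base classifiers' decisions by majority voting."""
    votes = Counter(model([row[j] for j in feats]) for feats, model in ensemble)
    return votes.most_common(1)[0][0]

# Toy usage: 4 features, 2 classes.
X = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.0, 0.1],
     [1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 0.8]]
y = [0, 0, 1, 1]
ensemble = train_rsm(X, y, S=5, w=2)
print(predict_rsm(ensemble, [0.05, 0.0, 0.1, 0.0]))
```

Any learner that returns a prediction function, an SVM included, can be passed as `fit`.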
10
THE INADEQUACIES OF RSM
(Setting: given the learning algorithm, SVM, the ensemble size, S, and the subspace dimensionality, w, features are selected at random.)
* Irregular rule: Each individual feature potentially possesses different discriminative power for classification. A randomized feature-selection strategy cannot distinguish informative features from redundant ones.
* Implicit number: RSM does not say how to choose a suitable subspace dimensionality for the SVM. Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.
11
DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)
Two importance distributions:
  – Importance distribution of feature weight, the W distribution, which models the selection probability of each feature.
  – Importance distribution of subspace dimensionality, the R distribution, which automatically determines a suitable subspace size.
[Figure: W distributions, density (%) versus feature. Panels: class separability of LDA for each feature; re-substitution accuracy for each feature. R distribution: initialization R0, followed by kernel smoothing.]
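How the two distributions could drive subspace selection is sketched below: first draw a subspace size from R, then draw that many distinct features with probabilities given by W. This is an illustrative reading of the slide, not the procedure of Yang et al. (2010). The function names and the sequential weighted-sampling scheme are assumptions.

```python
import random

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def sample_subspace(W, R, rng):
    """Draw one feature subset.

    W[j] : selection probability of feature j (feature-weight distribution).
    R[k] : probability that the subspace has k + 1 features.
    Assumes enough features with positive weight remain for the drawn size.
    """
    size = rng.choices(range(1, len(R) + 1), weights=R, k=1)[0]
    remaining = list(range(len(W)))
    weights = list(W)
    chosen = []
    for _ in range(size):
        # Weighted draw without replacement: remove each pick from the pool.
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(idx))
        weights.pop(idx)
    return sorted(chosen)

# Toy usage: feature 0 is five times as important as the others,
# and R puts all its mass on subspaces of size 2.
W = normalize([5, 1, 1, 1])
R = [0, 1, 0, 0]
rng = random.Random(0)
print(sample_subspace(W, R, rng))
```

With these toy weights, feature 0 appears in most sampled subspaces, which is the intended contrast with the uniform selection used by RSM.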
12
THE FRAMEWORK OF DSM BASED ON SVM Given the learning algorithm, SVM, and the ensemble size, S.
13
INADEQUACIES OF DSM
* Kernel function: The SVM algorithm provides an effective way to perform supervised classification. However, the choice of kernel function critically influences the performance of SVM.
* Time-consuming: Choosing a proper kernel function, or a better kernel parameter, for SVM is important yet ordinarily time-consuming. This is especially so in DSM, where the R distribution is updated using the re-substitution accuracy.
14
An Optimal Kernel Method for Selecting RBF Kernel Parameter
The performance of SVM depends on choosing a proper kernel function or proper parameters of a kernel function. Li, Lin, Kuo, and Chu (2010) present a novel criterion to choose a proper parameter σ of the RBF kernel function automatically.
Gaussian Radial Basis Function (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2σ^2))
In the feature space determined by the RBF kernel, the norm of every sample is one and the kernel values are positive. Hence, the samples are mapped onto the surface of a hypersphere.
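The two properties stated above can be checked numerically. The sketch below uses the standard Gaussian RBF form with parameter σ. It shows that k(x, x) = 1 (unit norm in feature space) and that kernel values lie in (0, 1], so the squared feature-space distance 2 - 2k(x, z) lies in [0, 2). The function names are illustrative.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def feature_space_sq_distance(x, z, sigma):
    """||phi(x) - phi(z)||^2 = k(x,x) + k(z,z) - 2 k(x,z) = 2 - 2 k(x,z)."""
    return 2.0 - 2.0 * rbf_kernel(x, z, sigma)

x, z = [1.0, 2.0, 3.0], [2.0, 0.0, 3.5]
print(rbf_kernel(x, x, sigma=1.0))   # squared norm of phi(x): exactly 1.0
print(rbf_kernel(x, z, sigma=1.0))   # strictly between 0 and 1
print(feature_space_sq_distance(x, z, sigma=1.0))
```

Because every mapped sample has unit norm, k(x, z) is also the cosine of the angle between two mapped samples. A small σ pushes distinct samples toward orthogonality, and a large σ pulls them together.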
15
Kernel-based Dynamic Subspace Method (KDSM)
16
THE FRAMEWORK OF KDSM
[Diagram labels: Original Dataset X; Optimal RBF Kernel Algorithm; Kernel Space (L-dimension); Separability; Feature (Band); Kernel-based W distribution; Optimal RBF Kernel Algorithm + Kernel Smoothing; Kernel-based Feature Selection Distribution M_dist; Subspace Pool (Reduced Dataset); Multiple Classifiers; Decision Fusion (Majority Voting). The loop repeats until the performance of classification is stable.]
17
Experiment Design
Algorithm    Description
SVM_CV       A single SVM without any dimension reduction, tuned with the CV method
SVM_OP       A single SVM without any dimension reduction, tuned with the OP method
DSM_W_ACC    DSM with the re-substitution accuracy as the feature weights
DSM_W_LDA    DSM with the separability of Fisher’s LDA as the feature weights
KDSM         Kernel-based dynamic subspace method proposed in this research
CV: 5-fold cross-validation
OP: the optimal kernel method for choosing the RBF kernel parameter σ
We use a grid search within the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper RBF kernel parameter (2σ^2), and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper value of the slack-variable parameter that controls the margins.
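The CV baseline amounts to scoring every (kernel parameter, slack parameter) pair with 5-fold cross-validation and keeping the best pair. The sketch below shows only that search skeleton. The slack values come from the slide, but the slide gives just the range [0.01, 10] for the kernel parameter, so the grid points in `KERNEL_VALUES` are an assumption. The `cv_score` callback is also hypothetical and would wrap SVM training and validation in practice.

```python
import itertools
import random

# Slack-variable parameter candidates, as listed on the slide.
SLACK_VALUES = [0.1, 1, 10, 20, 60, 100, 160, 200, 1000]
# ASSUMPTION: only the range [0.01, 10] is given; these grid points are illustrative.
KERNEL_VALUES = [0.01, 0.03, 0.1, 0.3, 1, 3, 10]

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]            # k nearly equal folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def grid_search(cv_score):
    """Return the (kernel_value, slack_value) pair with the highest CV score."""
    grid = itertools.product(KERNEL_VALUES, SLACK_VALUES)
    return max(grid, key=lambda pair: cv_score(*pair))

# Toy usage with a made-up score surface that peaks at (1, 10).
best = grid_search(lambda kernel_value, slack_value:
                   -abs(kernel_value - 1) - abs(slack_value - 10))
print(best)
```

The grid has 7 x 9 = 63 candidate pairs here, each needing five train-and-validate runs. The OP method selects the kernel parameter from a criterion instead of searching over it this way.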
18
EXPERIMENTAL DATASET
Hyperspectral image data: Washington, DC Mall [IR image]
Number of bands (dimensionality): d = 191
Number of classes: 7
Category (No. of labeled data): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840)
19
Experimental Results
                          SVM_CV    SVM_OP    DSM_W_ACC    DSM_W_LDA    KDSM
Case 1  Accuracy (%)      83.66     83.79     85.49        87.47        88.64
        CPU Time (sec)    30.35     3.10      6045.31      2188.62      155.31
Case 2  Accuracy (%)      86.39     87.89     88.74        89.43        92.53
        CPU Time (sec)    116.02    6.65      21113.75     4883.92      308.26
Case 3  Accuracy (%)      94.69     95.31     95.94        96.94        97.43
        CPU Time (sec)    5858.18   376.99    1165048.6    220121.62    17847.7
There are three cases in the Washington, DC Mall dataset: case 1, N_i = 20; case 2, N_i = 40; case 3, N_i = 300.
N_i: the number of training samples in class i
N: the number of all training samples
20
Experiment Results in Washington, DC Mall
The outcome of classification by using various multiple classifier systems:
             Case 1               Case 2               Case 3
Method       Accuracy   Ratio     Accuracy   Ratio     Accuracy   Ratio
DSM_W_ACC    85.49%     38.924    88.74%     68.493    95.94%     65.277
DSM_W_LDA    87.47%     14.092    89.43%     15.844    96.94%     12.333
KDSM         88.64%     1         92.53%     1         97.43%     1
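The slide does not define the Ratio column, but its values match each method's CPU time divided by the KDSM CPU time for the same case. The sketch below recomputes those ratios from the CPU times reported on the preceding results slide. The variable and function names are illustrative.

```python
# CPU times in seconds, taken from the experimental results slide.
CPU_TIME_SEC = {
    "Case 1": {"DSM_W_ACC": 6045.31, "DSM_W_LDA": 2188.62, "KDSM": 155.31},
    "Case 2": {"DSM_W_ACC": 21113.75, "DSM_W_LDA": 4883.92, "KDSM": 308.26},
    "Case 3": {"DSM_W_ACC": 1165048.6, "DSM_W_LDA": 220121.62, "KDSM": 17847.7},
}

def time_ratios(times, baseline="KDSM"):
    """CPU time of each method relative to the baseline method."""
    return {method: round(t / times[baseline], 3) for method, t in times.items()}

for case, times in CPU_TIME_SEC.items():
    print(case, time_ratios(times))
```

In case 1, for example, 6045.31 / 155.31 is about 38.924: DSM with re-substitution accuracy weights took roughly 39 times as long as KDSM.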
21
Classification Maps with N_i = 20 in Washington, DC Mall
[Figure: classification maps. Panels: SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM. Legend: Background, Water, Tree, Path, Grass, Roof, Road, Shadow.]
22
Classification Maps (roof) with N_i = 40
[Figure: classification maps. Panels: SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM. Legend: Background, Water, Tree, Path, Grass, Roof, Road, Shadow.]
23
Classification Maps with N_i = 300 in Washington, DC Mall
[Figure: classification maps. Panels: SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM. Legend: Background, Water, Tree, Path, Grass, Roof, Road, Shadow.]
24
Conclusions
In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting a proper RBF parameter and the dynamic subspace method within a subspace-selection-based MCS, in order to improve classification results on high-dimensional datasets.
The experimental results show that KDSM achieved the highest classification accuracy among all classifiers in every case of the Washington, DC Mall dataset.
Moreover, compared with DSM, KDSM not only yields more accurate classification but also saves computation time.
25
Thank You