Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel K(A, A') is fully dense.
- Computational complexity depends on the number of training points m.
- The separating surface depends on almost the entire dataset.
Complexity of nonlinear SSVM:
- Runs out of memory while storing the m × m kernel matrix.
- Long CPU time to compute the dense kernel matrix.
- Need to generate and store m² entries.
- Need to store the entire dataset even after solving the problem.
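To make the storage difficulty concrete, here is a back-of-the-envelope check; the problem size m = 100,000 is a hypothetical figure chosen for illustration, not a number from the slides.

```python
# Memory needed for a dense m x m kernel matrix stored as float64.
m = 100_000                        # hypothetical problem size
entries = m * m                    # must generate and store m^2 entries
gigabytes = entries * 8 / 1e9      # 8 bytes per float64 entry
print(f"{gigabytes:.0f} GB")       # -> 80 GB, far beyond typical main memory
```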
Reduced Support Vector Machine
(i) Choose a random subset matrix Ā ∈ R^(m̄ × n) of the entire data matrix A ∈ R^(m × n), with m̄ ≪ m.
(ii) Solve the following problem by the Newton method, with the corresponding diagonal label matrix D̄:
min over (ū, γ): (ν/2) ||p(e − D(K(A, Ā')D̄ū − eγ), α)||² + (1/2)(ū'ū + γ²)
(iii) The nonlinear classifier is defined by the optimal solution (ū, γ) of step (ii):
Nonlinear classifier: f(x) = sign(K(x', Ā')D̄ū − γ)
Using the small square kernel K(Ā, Ā') instead of the rectangular kernel K(A, Ā') gives lousy results!
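A minimal sketch of the three steps, assuming a Gaussian kernel. All names (gaussian_kernel, rsvm_fit) and parameter values are illustrative, the diagonal label matrix D̄ is absorbed into ū, and step (ii) is handed to a generic smooth optimizer instead of the Newton method the slides use.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rsvm_fit(A, d, m_bar=50, nu=1.0, alpha=5.0, gamma=1.0, seed=0):
    """RSVM sketch: A is the data matrix, d the +/-1 labels, m_bar the reduced size."""
    rng = np.random.default_rng(seed)
    A_bar = A[rng.choice(len(A), size=m_bar, replace=False)]  # (i) random subset
    K = gaussian_kernel(A, A_bar, gamma)                      # rectangular K(A, A_bar')

    def p(x):  # smooth plus function: p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x))
        return x + np.logaddexp(0.0, -alpha * x) / alpha

    def obj(z):                                               # (ii) smooth objective
        u, g = z[:-1], z[-1]
        margin = 1.0 - d * (K @ u - g)
        return 0.5 * nu * np.sum(p(margin) ** 2) + 0.5 * (u @ u + g * g)

    z = minimize(obj, np.zeros(m_bar + 1), method="L-BFGS-B").x
    u, g = z[:-1], z[-1]
    return lambda X: np.sign(gaussian_kernel(X, A_bar, gamma) @ u - g)  # (iii) classifier
```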
A Nonlinear Kernel Application
Checkerboard training set: 1000 points in R². Separate 486 asterisks from 514 dots.
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU seconds)
The three method columns follow the surrounding slides: RSVM with the rectangular kernel K(A, Ā'), conventional SSVM with the full kernel K(A, A'), and an SVM using only the small square kernel K(Ā, Ā').

Dataset (m × n, m̄)             RSVM K(A,Ā')     SSVM K(A,A')     K(Ā,Ā') only
                                %      sec       %      sec       %      sec
Cleveland Heart 297 × 13, 30    86.47  3.04      85.92  32.42     76.88  1.58
BUPA Liver 345 × 6, 35          74.86  2.68      73.62  32.61     68.95  2.04
Ionosphere 351 × 34, 35         95.19  5.02      94.35  59.88     88.70  2.13
Pima Indians 768 × 8, 50        78.64  5.72      76.59  328.3     57.32  4.64
Tic-Tac-Toe 958 × 9, 96         98.75  14.56     98.43  1033.5    88.24  8.87
Mushroom 8124 × 22, 215         89.04  466.20    N/A    N/A       83.90  221.50

(The full-kernel SSVM runs out of memory on Mushroom, hence the N/A.)
RSVM on Large UCI Adult Dataset
Average test set correctness % and standard deviation over 50 runs (RSVM standard deviation = 0.001 in every case).

(Train, Test)       RSVM % ± std      K(Ā,Ā') % ± std     m̄      m̄/m
(6414, 26148)       84.47 ± 0.001     77.03 ± 0.014       210    3.2%
(11221, 21341)      84.71 ± 0.001     75.96 ± 0.016       225    2.0%
(16101, 16461)      84.90 ± 0.001     75.45 ± 0.017       242    1.5%
(22697, 9865)       85.31 ± 0.001     76.73 ± 0.018       284    1.2%
(32562, 16282)      85.07 ± 0.001     76.95 ± 0.013       326    1.0%
[Figure: training time (CPU sec.) versus training set size for RSVM, SMO, and PCGC.]
Support Vector Regression (Linear Case)
Given the training set S = {(x_i, y_i)}, i = 1, ..., m, find a linear function f(x) = x'w + b, where (w, b) is determined by solving a minimization problem that guarantees the smallest overall empirical error made by f.
Motivated by SVM: ||w|| should be as small as possible, and some tiny errors should be discarded.
The ε-Insensitive Loss Function
The loss made by the estimation function f at the data point (x_i, y_i) is the ε-insensitive loss
|y_i − f(x_i)|_ε = max{0, |y_i − f(x_i)| − ε}:
if |y_i − f(x_i)| ≤ ε the loss is zero, otherwise it is |y_i − f(x_i)| − ε.
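In code the definition is a single line; the function name and the default ε are illustrative.

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps=0.1):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)
```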
ε-Insensitive Linear Regression
Find (w, b) with the smallest overall ε-insensitive error over the training set.
The ε-Insensitive Support Vector Regression Model
Motivated by SVM: ||w|| should be as small as possible, and some tiny errors should be discarded:
min over (w, b): (1/2)||w||² + C Σ_{i=1}^m |y_i − (x_i'w + b)|_ε,
where |ξ|_ε = max{0, |ξ| − ε}.
Reformulated ε-SVR as a Constrained Minimization Problem
min over (w, b, ξ, ξ*): (1/2)||w||² + C Σ_{i=1}^m (ξ_i + ξ_i*)
subject to: y_i − (x_i'w + b) ≤ ε + ξ_i, (x_i'w + b) − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0.
This minimization problem has n + 1 + 2m variables and 2m constraints, which enlarges the problem size and the computational complexity of solving it.
SV Regression by Minimizing the Quadratic ε-Insensitive Loss
We minimize ||w||² and the fitting error at the same time (Occam's razor: the simplest is the best). This gives the following (nonsmooth) problem:
min over (w, b): (1/2)(||w||² + b²) + (C/2) Σ_{i=1}^m |y_i − (x_i'w + b)|_ε²,
where |ξ|_ε = max{0, |ξ| − ε}. Adding the b² term gives strong convexity of the problem.
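A sketch of this objective in code; the names are illustrative, A is the m × n data matrix and y the targets.

```python
import numpy as np

def quad_eps_objective(w, b, A, y, C=1.0, eps=0.1):
    """0.5*(||w||^2 + b^2) + (C/2) * sum_i |y_i - (A_i w + b)|_eps^2."""
    r = np.maximum(0.0, np.abs(y - (A @ w + b)) - eps)  # eps-insensitive residuals
    return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(r * r)
```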
The ε-Insensitive Loss Function
The Quadratic ε-Insensitive Loss Function
Smoothing the Quadratic ε-Insensitive Function
Use the p-function p(x, α) = x + (1/α) log(1 + exp(−αx)), a smooth approximation of the plus function max{0, x}, to replace the quadratic ε-insensitive function with the smooth
p_ε²(x, α) = p(x − ε, α)² + p(−x − ε, α)²,
as in the sketch below.
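A sketch of the smoothing, assuming this standard form of the p-function and of its ε-insensitive square.

```python
import numpy as np

def p(x, alpha=5.0):
    """Smooth plus function p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x)) ~ max(0, x)."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha  # numerically stable log(1 + e^{-ax})

def p_eps_squared(x, eps=0.1, alpha=5.0):
    """Smooth replacement for |x|_eps^2: one p-term for each side of the eps-tube."""
    return p(x - eps, alpha) ** 2 + p(-x - eps, alpha) ** 2
```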
ε-Insensitive Smooth Support Vector Regression
min over (w, b): (1/2)(||w||² + b²) + (C/2) Σ_{i=1}^m p_ε²(y_i − (x_i'w + b), α)
This is a strongly convex minimization problem without any constraints, and the objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it; a generic skeleton follows.
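A generic Newton-Armijo skeleton, a sketch rather than the authors' implementation; it applies to any strongly convex, twice-differentiable objective such as the smoothed problem above, with f, grad, and hess supplied by the caller.

```python
import numpy as np

def newton_armijo(f, grad, hess, z0, tol=1e-6, max_iter=50, beta=0.5, sigma=1e-4):
    """Newton's method with Armijo backtracking line search."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:            # stationary point reached
            break
        step = np.linalg.solve(hess(z), -g)    # Newton direction (Hessian is PD)
        t, fz = 1.0, f(z)
        # Armijo rule: shrink t until f decreases by at least sigma * t * g'step
        while f(z + t * step) > fz + sigma * t * (g @ step) and t > 1e-12:
            t *= beta
        z = z + t * step
    return z
```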
Nonlinear ε-SVR
Based on the duality theorem and the KKT optimality conditions, the solution w can be represented as a linear combination of the training data, w = A'u. In the nonlinear case the inner products are replaced by kernel entries, so Aw becomes K(A, A')u.
Nonlinear SVR
Let K = K(A, A') and replace w by A'u in the regression problem. The nonlinear regression function is
f(x) = K(x', A')u + b,
evaluated as in the sketch below.
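A prediction sketch with a Gaussian kernel; the names are illustrative, and u and b are assumed to come from solving the smooth problem.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svr_predict(X, A, u, b, gamma=1.0):
    """Nonlinear regression function f(x) = K(x', A') u + b, at each row of X."""
    return gaussian_kernel(X, A, gamma) @ u + b
```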
Nonlinear Smooth Support Vector ε-Insensitive Regression
The nonlinear analogue of the smooth problem above, with Aw replaced by K(A, A')u and ||w||² by u'u:
min over (u, b): (1/2)(u'u + b²) + (C/2) Σ_{i=1}^m p_ε²(y_i − (K(A, A')_i u + b), α)
Numerical Results: Experimental Setup
- Training and testing sets are split by the slice method.
- A Gaussian kernel is used to generate the nonlinear ε-SVR in all experiments.
- The reduced kernel technique is utilized when the training dataset is bigger than 1000 points.
- Error measure: the 2-norm relative error ||y − ŷ||₂ / ||y||₂, where y denotes the observations and ŷ the predicted values (in code below).
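The error measure in code (a one-line sketch; names illustrative).

```python
import numpy as np

def relative_error(y, y_hat):
    """2-norm relative error ||y - y_hat||_2 / ||y||_2 (y: observations, y_hat: predictions)."""
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)
```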
101 Data Points, Nonlinear SSVR with Gaussian Kernel
Target function plus noise; noise: mean = 0. Training time: 0.3 sec.
First Artificial Dataset
Random noise with mean = 0, standard deviation 0.04.
ε-SSVR: training time 0.016 sec., error 0.059.
LIBSVM: training time 0.015 sec., error 0.068.
481 Data Points: Original Function vs. Estimated Function
Noise: mean = 0. Training time: 9.61 sec. Mean absolute error (MAE) over the 49 × 49 mesh points: 0.1761.
Original Function vs. Estimated Function, Using the Reduced Kernel
Noise: mean = 0. Training time: 22.58 sec. MAE over the 49 × 49 mesh points: 0.0513.
Real Datasets
Linear ε-SSVR: Tenfold Numerical Results
Nonlinear ε-SSVR: Tenfold Numerical Results (1/2)
Nonlinear ε-SSVR: Tenfold Numerical Results (2/2)