Difficulties with Nonlinear SVM for Large Problems
- The nonlinear kernel K(A, A') is fully dense.
- Computational complexity depends on the number of training points m.
- The separating surface depends on almost the entire dataset.
Complexity of nonlinear SSVM:
- Runs out of memory while storing the m × m kernel matrix.
- Long CPU time to compute the dense kernel matrix.
- Need to generate and store m² entries.
- Need to store the entire dataset even after solving the problem.
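To make the storage difficulty concrete, here is a back-of-the-envelope check; the problem size m = 100,000 is a hypothetical figure chosen for illustration, not a number from the slides.

```python
# Memory needed for a dense m x m kernel matrix stored as float64.
m = 100_000                        # hypothetical problem size
entries = m * m                    # must generate and store m^2 entries
gigabytes = entries * 8 / 1e9      # 8 bytes per float64 entry
print(f"{gigabytes:.0f} GB")       # -> 80 GB, far beyond typical main memory
```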
Reduced Support Vector Machine
(i) Choose a random subset matrix Ā ∈ R^(m̄ × n) of the entire data matrix A ∈ R^(m × n), with m̄ ≪ m.
(ii) Solve the following problem by the Newton method, with the corresponding diagonal label matrix D̄:
min over (ū, γ): (ν/2) ||p(e − D(K(A, Ā')D̄ū − eγ), α)||² + (1/2)(ū'ū + γ²)
(iii) The nonlinear classifier is defined by the optimal solution (ū, γ) of step (ii):
Nonlinear classifier: f(x) = sign(K(x', Ā')D̄ū − γ)
Using the small square kernel K(Ā, Ā') instead of the rectangular kernel K(A, Ā') gives lousy results!
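A minimal sketch of the three steps, assuming a Gaussian kernel. All names (gaussian_kernel, rsvm_fit) and parameter values are illustrative, the diagonal label matrix D̄ is absorbed into ū, and step (ii) is handed to a generic smooth optimizer instead of the Newton method the slides use.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rsvm_fit(A, d, m_bar=50, nu=1.0, alpha=5.0, gamma=1.0, seed=0):
    """RSVM sketch: A is the data matrix, d the +/-1 labels, m_bar the reduced size."""
    rng = np.random.default_rng(seed)
    A_bar = A[rng.choice(len(A), size=m_bar, replace=False)]  # (i) random subset
    K = gaussian_kernel(A, A_bar, gamma)                      # rectangular K(A, A_bar')

    def p(x):  # smooth plus function: p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x))
        return x + np.logaddexp(0.0, -alpha * x) / alpha

    def obj(z):                                               # (ii) smooth objective
        u, g = z[:-1], z[-1]
        margin = 1.0 - d * (K @ u - g)
        return 0.5 * nu * np.sum(p(margin) ** 2) + 0.5 * (u @ u + g * g)

    z = minimize(obj, np.zeros(m_bar + 1), method="L-BFGS-B").x
    u, g = z[:-1], z[-1]
    return lambda X: np.sign(gaussian_kernel(X, A_bar, gamma) @ u - g)  # (iii) classifier
```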
A Nonlinear Kernel Application
Checkerboard training set: 1000 points in R². Separate 486 asterisks from 514 dots.
Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU seconds)
The three method columns follow the surrounding slides: RSVM with the rectangular kernel K(A, Ā'), conventional SSVM with the full kernel K(A, A'), and an SVM using only the small square kernel K(Ā, Ā').

Dataset (m × n, m̄)             RSVM K(A,Ā')     SSVM K(A,A')     K(Ā,Ā') only
                                %      sec       %      sec       %      sec
Cleveland Heart 297 × 13, 30    86.47  3.04      85.92  32.42     76.88  1.58
BUPA Liver 345 × 6, 35          74.86  2.68      73.62  32.61     68.95  2.04
Ionosphere 351 × 34, 35         95.19  5.02      94.35  59.88     88.70  2.13
Pima Indians 768 × 8, 50        78.64  5.72      76.59  328.3     57.32  4.64
Tic-Tac-Toe 958 × 9, 96         98.75  14.56     98.43  1033.5    88.24  8.87
Mushroom 8124 × 22, 215         89.04  466.20    N/A    N/A       83.90  221.50

(The full-kernel SSVM runs out of memory on Mushroom, hence the N/A.)
RSVM on Large UCI Adult Dataset
Average test set correctness % and standard deviation over 50 runs (RSVM standard deviation = 0.001 in every case).

(Train, Test)       RSVM % ± std      K(Ā,Ā') % ± std     m̄      m̄/m
(6414, 26148)       84.47 ± 0.001     77.03 ± 0.014       210    3.2%
(11221, 21341)      84.71 ± 0.001     75.96 ± 0.016       225    2.0%
(16101, 16461)      84.90 ± 0.001     75.45 ± 0.017       242    1.5%
(22697, 9865)       85.31 ± 0.001     76.73 ± 0.018       284    1.2%
(32562, 16282)      85.07 ± 0.001     76.95 ± 0.013       326    1.0%
[Figure: training time (CPU sec.) versus training set size for RSVM, SMO, and PCGC.]
Support Vector Regression (Linear Case)
Given the training set S = {(x_i, y_i)}, i = 1, ..., m, find a linear function f(x) = x'w + b, where (w, b) is determined by solving a minimization problem that guarantees the smallest overall empirical error made by f.
Motivated by SVM: ||w|| should be as small as possible, and some tiny errors should be discarded.
The ε-Insensitive Loss Function
The loss made by the estimation function f at the data point (x_i, y_i) is the ε-insensitive loss
|y_i − f(x_i)|_ε = max{0, |y_i − f(x_i)| − ε}:
if |y_i − f(x_i)| ≤ ε the loss is zero, otherwise it is |y_i − f(x_i)| − ε.
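In code the definition is a single line; the function name and the default ε are illustrative.

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps=0.1):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)
```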
ε-Insensitive Linear Regression
Find (w, b) with the smallest overall ε-insensitive error over the training set.
The ε-Insensitive Support Vector Regression Model
Motivated by SVM: ||w|| should be as small as possible, and some tiny errors should be discarded:
min over (w, b): (1/2)||w||² + C Σ_{i=1}^m |y_i − (x_i'w + b)|_ε,
where |ξ|_ε = max{0, |ξ| − ε}.
Reformulated ε-SVR as a Constrained Minimization Problem
min over (w, b, ξ, ξ*): (1/2)||w||² + C Σ_{i=1}^m (ξ_i + ξ_i*)
subject to: y_i − (x_i'w + b) ≤ ε + ξ_i, (x_i'w + b) − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0.
This minimization problem has n + 1 + 2m variables and 2m constraints, which enlarges the problem size and the computational complexity of solving it.
SV Regression by Minimizing the Quadratic ε-Insensitive Loss
We minimize ||w||² and the fitting error at the same time (Occam's razor: the simplest is the best). This gives the following (nonsmooth) problem:
min over (w, b): (1/2)(||w||² + b²) + (C/2) Σ_{i=1}^m |y_i − (x_i'w + b)|_ε²,
where |ξ|_ε = max{0, |ξ| − ε}. Adding the b² term gives strong convexity of the problem.
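A sketch of this objective in code; the names are illustrative, A is the m × n data matrix and y the targets.

```python
import numpy as np

def quad_eps_objective(w, b, A, y, C=1.0, eps=0.1):
    """0.5*(||w||^2 + b^2) + (C/2) * sum_i |y_i - (A_i w + b)|_eps^2."""
    r = np.maximum(0.0, np.abs(y - (A @ w + b)) - eps)  # eps-insensitive residuals
    return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(r * r)
```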
The ε-Insensitive Loss Function
The Quadratic ε-Insensitive Loss Function
Smoothing the Quadratic ε-Insensitive Function
Use the p-function p(x, α) = x + (1/α) log(1 + exp(−αx)), a smooth approximation of the plus function max{0, x}, to replace the quadratic ε-insensitive function with the smooth
p_ε²(x, α) = p(x − ε, α)² + p(−x − ε, α)²,
as in the sketch below.
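A sketch of the smoothing, assuming this standard form of the p-function and of its ε-insensitive square.

```python
import numpy as np

def p(x, alpha=5.0):
    """Smooth plus function p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x)) ~ max(0, x)."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha  # numerically stable log(1 + e^{-ax})

def p_eps_squared(x, eps=0.1, alpha=5.0):
    """Smooth replacement for |x|_eps^2: one p-term for each side of the eps-tube."""
    return p(x - eps, alpha) ** 2 + p(-x - eps, alpha) ** 2
```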
ε-Insensitive Smooth Support Vector Regression
min over (w, b): (1/2)(||w||² + b²) + (C/2) Σ_{i=1}^m p_ε²(y_i − (x_i'w + b), α)
This is a strongly convex minimization problem without any constraints, and the objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it; a generic skeleton follows.
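A generic Newton-Armijo skeleton, a sketch rather than the authors' implementation; it applies to any strongly convex, twice-differentiable objective such as the smoothed problem above, with f, grad, and hess supplied by the caller.

```python
import numpy as np

def newton_armijo(f, grad, hess, z0, tol=1e-6, max_iter=50, beta=0.5, sigma=1e-4):
    """Newton's method with Armijo backtracking line search."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:            # stationary point reached
            break
        step = np.linalg.solve(hess(z), -g)    # Newton direction (Hessian is PD)
        t, fz = 1.0, f(z)
        # Armijo rule: shrink t until f decreases by at least sigma * t * g'step
        while f(z + t * step) > fz + sigma * t * (g @ step) and t > 1e-12:
            t *= beta
        z = z + t * step
    return z
```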
Nonlinear ε-SVR
Based on the duality theorem and the KKT optimality conditions, the solution w can be represented as a linear combination of the training data, w = A'u. In the nonlinear case the inner products are replaced by kernel entries, so Aw becomes K(A, A')u.
Nonlinear SVR
Let K = K(A, A') and replace w by A'u in the regression problem. The nonlinear regression function is
f(x) = K(x', A')u + b,
evaluated as in the sketch below.
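A prediction sketch with a Gaussian kernel; the names are illustrative, and u and b are assumed to come from solving the smooth problem.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svr_predict(X, A, u, b, gamma=1.0):
    """Nonlinear regression function f(x) = K(x', A') u + b, at each row of X."""
    return gaussian_kernel(X, A, gamma) @ u + b
```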
Nonlinear Smooth Support Vector ε-Insensitive Regression
The nonlinear analogue of the smooth problem above, with Aw replaced by K(A, A')u and ||w||² by u'u:
min over (u, b): (1/2)(u'u + b²) + (C/2) Σ_{i=1}^m p_ε²(y_i − (K(A, A')_i u + b), α)
Numerical Results: Experimental Setup
- Training and testing sets are split by the slice method.
- A Gaussian kernel is used to generate the nonlinear ε-SVR in all experiments.
- The reduced kernel technique is utilized when the training dataset is bigger than 1000 points.
- Error measure: the 2-norm relative error ||y − ŷ||₂ / ||y||₂, where y denotes the observations and ŷ the predicted values (in code below).
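The error measure in code (a one-line sketch; names illustrative).

```python
import numpy as np

def relative_error(y, y_hat):
    """2-norm relative error ||y - y_hat||_2 / ||y||_2 (y: observations, y_hat: predictions)."""
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)
```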
101 Data Points, Nonlinear SSVR with Gaussian Kernel
Target function plus noise; noise: mean = 0. Training time: 0.3 sec.
First Artificial Dataset
Random noise with mean = 0, standard deviation 0.04.
ε-SSVR: training time 0.016 sec., error 0.059.
LIBSVM: training time 0.015 sec., error 0.068.
481 Data Points: Original Function vs. Estimated Function
Noise: mean = 0. Training time: 9.61 sec. Mean absolute error (MAE) over the 49 × 49 mesh points: 0.1761.
Original Function vs. Estimated Function, Using the Reduced Kernel
Noise: mean = 0. Training time: 22.58 sec. MAE over the 49 × 49 mesh points: 0.0513.
Real Datasets
Linear ε-SSVR: Tenfold Numerical Results
Nonlinear ε-SSVR: Tenfold Numerical Results (1/2)
Nonlinear ε-SSVR: Tenfold Numerical Results (2/2)