Presentation transcript: "Difficulties with Nonlinear SVM for Large Problems"

1 Difficulties with Nonlinear SVM for Large Problems
 The nonlinear kernel K(A, A′) is fully dense
 Computational complexity depends on m, the number of data points
 Separating surface depends on almost the entire dataset
 Complexity of nonlinear SSVM: the Newton system involves the full m × m kernel
 Runs out of memory while storing the m × m kernel matrix
 Long CPU time to compute the dense kernel matrix
 Need to generate and store m² entries
 Need to store the entire dataset even after solving the problem
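To make the storage and time bottleneck concrete, here is a minimal numpy sketch; the Gaussian kernel and the sizes are illustrative assumptions, not figures from the slides:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel K[i, j] = exp(-gamma * ||A_i - B_j||^2); every entry
    # is nonzero, so the m x m matrix K(A, A') is fully dense
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

A = np.random.randn(500, 10)
K = rbf_kernel(A, A)                        # already 500 x 500 = 250,000 entries

m = 20000                                   # still modest for a "large problem"
print(f"K(A, A') entries: {m * m:,}")       # m^2 = 400,000,000 entries
print(f"storage at 8 bytes each: {m * m * 8 / 1e9:.1f} GB")  # ~3.2 GB
```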

2 Reduced Support Vector Machine
(i) Choose a small random subset matrix Ā ∈ R^{m̄×n} of the entire data matrix A ∈ R^{m×n}
(ii) Solve the following problem by the Newton method, with D̄ the diagonal label matrix corresponding to Ā:
min over (ū, γ): (ν/2) ‖p(e − D(K(A, Ā′)D̄ū − eγ), α)‖²₂ + ½(ū′ū + γ²)
(iii) The nonlinear classifier is defined by the optimal solution (ū, γ) of step (ii):
Nonlinear classifier: K(x′, Ā′)D̄ū − γ = 0
Using the small square kernel K(Ā, Ā′) instead of the rectangular kernel K(A, Ā′) gives lousy results!
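A minimal sketch of the reduced-kernel idea, assuming a Gaussian kernel. The regularized least-squares solve below is only a stand-in for the smooth Newton solve of step (ii), and all names and defaults are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def rsvm_fit(A, d, mbar=50, nu=100.0, gamma=1.0, seed=0):
    """Hypothetical RSVM-style fit: build the rectangular reduced kernel
    K(A, Abar') and solve a ridge-regularized least-squares problem as a
    stand-in for the smooth Newton solve described in the slide."""
    rng = np.random.default_rng(seed)
    Abar = A[rng.choice(len(A), mbar, replace=False)]   # random subset, step (i)
    K = rbf_kernel(A, Abar, gamma)                      # m x mbar, NOT m x m
    Z = np.hstack([K, -np.ones((len(A), 1))])           # columns for (u, gamma)
    # regularized normal equations toward labels d in {-1.0, +1.0}
    ug = np.linalg.solve(Z.T @ Z + np.eye(mbar + 1) / nu, Z.T @ d)
    u, g = ug[:-1], ug[-1]
    return Abar, u, g   # classify x by sign(rbf_kernel(x, Abar, gamma) @ u - g)
```

Only the m × m̄ rectangular kernel is ever formed, so memory grows linearly in m rather than quadratically.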

3 A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in R²
Separate 486 Asterisks from 514 Dots
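The slides do not give the generator, but a checkerboard training set like the one pictured can be sampled in a few lines of numpy; the grid size and ranges here are assumptions:

```python
import numpy as np

def checkerboard(m=1000, cells=4, seed=0):
    """Sample m points uniformly in [0, 1]^2 and label them +1/-1 by a
    cells x cells checkerboard pattern (layout is an assumption; the
    slides do not specify the grid)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(m, 2))
    d = np.where(((X[:, 0] * cells).astype(int) +
                  (X[:, 1] * cells).astype(int)) % 2 == 0, 1.0, -1.0)
    return X, d

X, d = checkerboard()
print((d == 1).sum(), (d == -1).sum())   # roughly balanced, as in 486 vs. 514
```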

4 Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000

5 RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

6 RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU seconds)

Dataset (m x n), m̄             RSVM K(A, Ā′)    SSVM K(A, A′)     SVM K(Ā, Ā′)
Cleveland Heart 297 x 13, 30    86.47    3.04    85.92    32.42    76.88    1.58
BUPA Liver 345 x 6, 35          74.86    2.68    73.62    32.61    68.95    2.04
Ionosphere 351 x 34, 35         95.19    5.02    94.35    59.88    88.70    2.13
Pima Indians 768 x 8, 50        78.64    5.72    76.59    328.3    57.32    4.64
Tic-Tac-Toe 958 x 9, 96         98.75   14.56    98.43   1033.5    88.24    8.87
Mushroom 8124 x 22, 215         89.04  466.20    N/A (out of memory)  83.90  221.50

7 RSVM on Large UCI Adult Dataset
Average Test Correctness % and Standard Deviation over 50 Runs

(Train, Test)        K(A, Ā′)          K(Ā, Ā′)          m̄      m̄/m
(6414, 26148)        84.47 ± 0.001     77.03 ± 0.014     210     3.2%
(11221, 21341)       84.71 ± 0.001     75.96 ± 0.016     225     2.0%
(16101, 16461)       84.90 ± 0.001     75.45 ± 0.017     242     1.5%
(22697, 9865)        85.31 ± 0.001     76.73 ± 0.018     284     1.2%
(32562, 16282)       85.07 ± 0.001     76.95 ± 0.013     326     1.0%

8 CPU Time (sec.) vs. Training Set Size: RSVM compared with SMO and PCGC (figure)

9 Support Vector Regression (Linear Case)
 Given the training set: {(xᵢ, yᵢ)}, i = 1, …, m
 Find a linear function f(x) = w′x + b, where (w, b) is determined by solving a minimization problem that guarantees the smallest overall error made by f
 Motivated by SVM: ‖w‖ should be as small as possible, and some tiny errors should be discarded

10 ε-Insensitive Loss Function
 The ε-insensitive loss of a residual ξ is defined as |ξ|_ε = max{0, |ξ| − ε}
 The loss made by the estimation function f at the data point (xᵢ, yᵢ) is |yᵢ − f(xᵢ)|_ε
 If |ξ| ≤ ε the loss is zero; otherwise it grows linearly in |ξ|
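In code the ε-insensitive loss is a one-liner; the function name and the default ε are illustrative choices:

```python
import numpy as np

def eps_insensitive(xi, eps=0.1):
    # |xi|_eps = max{0, |xi| - eps}: residuals inside the eps-tube cost nothing
    return np.maximum(0.0, np.abs(xi) - eps)

print(eps_insensitive(np.array([-0.3, -0.05, 0.0, 0.05, 0.3])))
# -> 0.2, 0, 0, 0, 0.2
```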

11 ε-Insensitive Linear Regression
Find (w, b) with the smallest overall error:
min over (w, b): Σᵢ |yᵢ − (w′xᵢ + b)|_ε

12 ε-Insensitive Support Vector Regression Model
Motivated by SVM:  ‖w‖ should be as small as possible  some tiny errors should be discarded
min over (w, b): ½‖w‖²₂ + C Σᵢ |ξᵢ|_ε, where ξ = y − (Aw + eb)

13 Reformulated ε-SVR as a Constrained Minimization Problem

min over (w, b, ξ, ξ*): ½ w′w + C e′(ξ + ξ*)
subject to:
Aw + eb − y ≤ eε + ξ
y − Aw − eb ≤ eε + ξ*
ξ, ξ* ≥ 0

This minimization problem has n + 1 + 2m variables and 2m constraints, which enlarges the problem size and the computational complexity of solving it.
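A direct transcription of this constrained program, here using the cvxpy modeling library (the library choice and all data are assumptions for illustration). Note that it carries exactly the n + 1 + 2m variables and the 2m tube constraints counted above, plus the nonnegativity bounds:

```python
import numpy as np
import cvxpy as cp

m, n = 100, 5
A = np.random.randn(m, n)
y = np.random.randn(m)
C, eps = 10.0, 0.1

w, b = cp.Variable(n), cp.Variable()
xi, xis = cp.Variable(m, nonneg=True), cp.Variable(m, nonneg=True)
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xis)),
    [A @ w + b - y <= eps + xi,     # upper side of the eps-tube
     y - A @ w - b <= eps + xis])   # lower side of the eps-tube
problem.solve()
print(problem.value)
```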

14 SV Regression by Minimizing the Quadratic ε-Insensitive Loss
 We minimize the regularization term ½(‖w‖²₂ + b²) and the loss at the same time
 Occam's razor: the simplest is the best
 We have the following (nonsmooth) problem: min over (w, b): ½(‖w‖²₂ + b²) + (C/2) Σᵢ |ξᵢ|²_ε, where ξ = y − (Aw + eb)
 Including b² gives strong convexity of the problem
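As a sketch, the nonsmooth objective of this slide in numpy; names and default constants are illustrative:

```python
import numpy as np

def quad_eps_loss(xi, eps=0.1):
    # quadratic eps-insensitive loss |xi|_eps^2 = max{0, |xi| - eps}^2:
    # convex and differentiable, but not twice differentiable at |xi| = eps
    return np.maximum(0.0, np.abs(xi) - eps) ** 2

def ssvr_objective(w, b, X, y, C=10.0, eps=0.1):
    # 0.5*(||w||^2 + b^2) + (C/2) * sum_i |y_i - (x_i'w + b)|_eps^2;
    # the b^2 term is what makes the problem strongly convex
    r = X @ w + b - y
    return 0.5 * (w @ w + b * b) + 0.5 * C * quad_eps_loss(r, eps).sum()
```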

15 ε-Insensitive Loss Function (figure)

16 Quadratic ε-Insensitive Loss Function (figure)

17 Use the Quadratic ε-Insensitive p-Function
Replace the quadratic ε-insensitive function |ξ|²_ε by the smooth p²_ε(ξ, α), defined via the p-function p(x, α) = x + (1/α) log(1 + e^{−αx}) as:
p²_ε(ξ, α) = p(ξ − ε, α)² + p(−ξ − ε, α)²
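A quick numerical check of the smoothing (the α values are arbitrary choices): since p(x, α) approaches the plus function max{x, 0} as α grows, the smooth surrogate tracks the exact quadratic ε-insensitive loss:

```python
import numpy as np

def p(x, alpha):
    # smooth plus function p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha*x));
    # np.logaddexp keeps the evaluation stable for large |alpha * x|
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def quad_eps_smooth(xi, eps=0.1, alpha=20.0):
    # smooth quadratic eps-insensitive function p_eps^2(xi, alpha)
    return p(xi - eps, alpha) ** 2 + p(-xi - eps, alpha) ** 2

xi = np.linspace(-1.0, 1.0, 5)
exact = np.maximum(0.0, np.abs(xi) - 0.1) ** 2
print(np.abs(quad_eps_smooth(xi) - exact).max())  # small; shrinks as alpha grows
```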


19 ε-Insensitive Smooth Support Vector Regression
This problem is a strongly convex minimization problem without any constraints. The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it.
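A minimal Newton-Armijo sketch for this strongly convex smooth problem, shown for the linear case. Everything below (names, constants, stopping rule) is an illustrative reconstruction under the slide's formulation, not the authors' code:

```python
import numpy as np

def p(x, a):
    return x + np.logaddexp(0.0, -a * x) / a          # smooth plus function

def sig(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def ssvr_newton_armijo(X, y, C=100.0, eps=0.1, a=5.0, tol=1e-8, iters=50):
    """Minimize f(w,b) = 0.5*(||w||^2 + b^2)
                         + (C/2) * sum [p(r-eps,a)^2 + p(-r-eps,a)^2],
    with r = Xw + eb - y, by Newton steps plus Armijo backtracking."""
    m, n = X.shape
    Z = np.hstack([X, np.ones((m, 1))])               # fold b into u = [w; b]
    u = np.zeros(n + 1)

    def f(u):
        r = Z @ u - y
        return 0.5 * u @ u + 0.5 * C * (p(r - eps, a)**2 + p(-r - eps, a)**2).sum()

    for _ in range(iters):
        r = Z @ u - y
        pp, pm = p(r - eps, a), p(-r - eps, a)
        sp, sm = sig(a * (r - eps)), sig(a * (-r - eps))  # p'(x,a) = sigmoid(a*x)
        g = u + C * (Z.T @ (pp * sp - pm * sm))           # gradient
        if np.linalg.norm(g) < tol:
            break
        # Hessian I + C * Z' diag(d) Z, with d = (p')^2 + p * p'' per branch
        d = sp**2 + a * pp * sp * (1 - sp) + sm**2 + a * pm * sm * (1 - sm)
        step = np.linalg.solve(np.eye(n + 1) + C * Z.T @ (d[:, None] * Z), -g)
        t, f0 = 1.0, f(u)                                 # Armijo backtracking
        while f(u + t * step) > f0 + 1e-4 * t * (g @ step):
            t *= 0.5
        u = u + t * step
    return u[:-1], u[-1]                                  # (w, b)

X = np.random.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.05 * np.random.randn(200)
w, b = ssvr_newton_armijo(X, y)
```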

20 Nonlinear ε-SVR
Based on the duality theorem and the KKT optimality conditions, the primal variable can be written as w = A′ū. In the nonlinear case, the linear inner product AA′ is replaced by a nonlinear kernel K(A, A′).

21 Nonlinear SVR
Let K(A, A′) take the place of AA′ and solve for (ū, b).
Nonlinear regression function: f(x) = K(x′, A′)ū + b
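Under the same assumptions, the nonlinear case simply feeds the kernel matrix to the linear solver, reusing ssvr_newton_armijo from the sketch above; the Gaussian kernel and the toy data are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

A = np.random.randn(200, 1)
y = np.sin(3.0 * A[:, 0]) + 0.05 * np.random.randn(200)

K = rbf_kernel(A, A, gamma=2.0)                      # K(A, A') replaces A
u, b = ssvr_newton_armijo(K, y, C=100.0, eps=0.05)   # solver from slide 19's sketch

def f(x_new):
    # nonlinear regression function f(x) = K(x', A') u + b
    return rbf_kernel(x_new, A, gamma=2.0) @ u + b
```

For large datasets, the rectangular reduced kernel K(A, Ā′) from slide 2 can stand in for the full K(A, A′).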

22 Nonlinear Smooth Support Vector ε-Insensitive Regression
min over (ū, b): ½(ū′ū + b²) + (C/2) Σᵢ p²_ε((K(A, A′)ū + eb − y)ᵢ, α)

23 Numerical Results: Experimental Setup
 Training set and testing set are split by the slice method
 Gaussian kernel is used to generate the nonlinear ε-SVR in all experiments
 The reduced kernel technique is utilized when the training dataset is bigger than 1000 points
 Error measure: 2-norm relative error ‖y − ŷ‖₂ / ‖y‖₂, where y are the observations and ŷ the predicted values
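The stated error measure as a sketch (the function name is an assumption):

```python
import numpy as np

def relative_error_2norm(y_obs, y_pred):
    # 2-norm relative error: ||y - yhat||_2 / ||y||_2
    return np.linalg.norm(y_obs - y_pred) / np.linalg.norm(y_obs)
```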

24 First Example: 101 Data Points, Function Plus Noise
Noise: mean = 0. Nonlinear SSVR with Gaussian kernel. Training time: 0.3 sec.

25 First Artificial Dataset: ε-SSVR vs. LIBSVM
Random noise with mean = 0, standard deviation 0.04.
ε-SSVR: training time 0.016 sec., error 0.059
LIBSVM: training time 0.015 sec., error 0.068

26 481 Data Points: Original vs. Estimated Function
Noise: mean = 0. Training time: 9.61 sec. Mean Absolute Error (MAE) over the 49×49 mesh points: 0.1761.

27 Original vs. Estimated Function Using the Reduced Kernel
Noise: mean = 0. Training time: 22.58 sec. MAE over the 49×49 mesh points: 0.0513.

28 Real Datasets

29 Linear ε-SSVR: Tenfold Numerical Results

30 Nonlinear ε-SSVR: Tenfold Numerical Results (1/2)

31 Nonlinear ε-SSVR: Tenfold Numerical Results (2/2)
