Published by Jada Whicker. Modified about 1 year ago.

1
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 7: SPARSE KERNEL MACHINES

2
Outline
- The problem: finding a sparse decision (and regression) machine that uses kernels
- The solution: Support Vector Machines (SVMs) and Relevance Vector Machines (RVMs)
- The core ideas behind the solutions
- The mathematical details

3
The problem (1)
Methods introduced in Chapters 3 and 4 (example: least-squares and logistic regression):
- take into account all data points in the training set -> cumbersome
- do not take advantage of kernel methods -> basis functions have to be explicit

4
The problem (2)
Kernel methods require evaluation of the kernel function $k(\mathbf{x}_n, \mathbf{x}_m)$ for all pairs of training points $\mathbf{x}_n, \mathbf{x}_m$ -> cumbersome
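To make the cost concrete, the following minimal pure-Python sketch builds the full Gram matrix $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)$. The Gaussian kernel, its `gamma` parameter and the three toy points are illustrative assumptions, not taken from the slides; the point is only that $N$ training points require $N^2$ kernel evaluations.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2); gamma is illustrative."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """Evaluate the kernel for every pair (x_n, x_m): N * N evaluations."""
    return [[kernel(xn, xm) for xm in points] for xn in points]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical toy inputs
K = gram_matrix(X, gaussian_kernel)
n_evaluations = len(K) * len(K[0])
print(n_evaluations)  # 9 for N = 3
```

Doubling $N$ quadruples the number of evaluations, which is what sparse kernel machines try to avoid at prediction time.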

5
The solution (1)
Support vector machines (SVMs) are kernel machines that compute a decision boundary making sparse use of data points.

6
The solution (2)
Relevance vector machines (RVMs) are kernel machines that compute a posterior class probability making sparse use of data points.

7
The solution (3)
SVMs as well as RVMs can also be used for regression.
[Figure panels: SVM, RVM (even sparser!)]

8
SVM: The core idea (1)
The class separator that maximizes the margin between itself and the nearest data points is expected to have the smallest generalization error (a claim motivated by statistical learning theory).

9
SVM: The core idea (2)
In input space:

10
SVM: The core idea (3)
For regression:

11
RVM: The core idea (1)
Exclude basis vectors whose presence reduces the probability of the observed data (the marginal likelihood, or evidence).

12
RVM: The core idea (2)
For classification and regression:
[Figure panels: Classification, Regression]

13
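SVM: The details (1)
Equation of the decision surface:
$$y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b = 0$$
Distance of a point $\mathbf{x}_n$ with target $t_n \in \{-1, +1\}$ from the decision surface (for a correctly classified point, $t_n y(\mathbf{x}_n) > 0$):
$$\frac{t_n y(\mathbf{x}_n)}{\|\mathbf{w}\|} = \frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}$$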
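As a quick numerical check of these formulas, here is a minimal sketch assuming the identity feature map $\phi(\mathbf{x}) = \mathbf{x}$ and targets $t_n \in \{-1, +1\}$; the weight vector, bias and test points are made up for illustration. It evaluates $y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b$ and the distance $t_n y(\mathbf{x}_n)/\|\mathbf{w}\|$.

```python
import math

def decision_value(w, b, phi_x):
    """y(x) = w^T phi(x) + b for an explicit feature vector phi(x)."""
    return sum(wi * pi for wi, pi in zip(w, phi_x)) + b

def distance_to_surface(w, b, phi_x, t):
    """t * y(x) / ||w||: distance of a correctly classified point from y(x) = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return t * decision_value(w, b, phi_x) / norm_w

w, b = (3.0, 4.0), -5.0  # hypothetical parameters, ||w|| = 5
print(distance_to_surface(w, b, (3.0, 4.0), +1))  # 4.0
print(distance_to_surface(w, b, (0.0, 0.0), -1))  # 1.0
```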

14
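SVM: The details (2)
Distance of a point from the decision surface:
$$\frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}$$
Maximum margin solution:
$$\arg\max_{\mathbf{w}, b}\left\{\frac{1}{\|\mathbf{w}\|}\min_n\left[t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)\right]\right\}$$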
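The maximum margin criterion $\frac{1}{\|\mathbf{w}\|}\min_n t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)$ can be evaluated directly for candidate separators. The sketch below compares two hand-picked candidates on a toy data set (assumed identity feature map; data and candidates are hypothetical); it does not search over $(\mathbf{w}, b)$.

```python
import math

def margin(w, b, X, T):
    """(1/||w||) * min_n t_n (w^T x_n + b), assuming phi(x) = x."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    closest = min(t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
                  for x, t in zip(X, T))
    return closest / norm_w

X = [(2.0, 0.0), (-2.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]  # hypothetical toy data
T = [+1, -1, +1, -1]
candidates = {"w=(1,0)": ((1.0, 0.0), 0.0), "w=(1,1)": ((1.0, 1.0), 0.0)}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, X, T), 4))
# w=(1,0) 1.0
# w=(1,1) 1.4142
```

For this symmetric toy set, the direction $\mathbf{w} \propto (1, 1)$ is in fact the maximum margin direction, with margin $\sqrt{2}$.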

15
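SVM: The details (3)
Distance of a point from the decision surface:
$$\frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}$$
This distance is unchanged under the rescaling $\mathbf{w} \to \kappa\mathbf{w}$, $b \to \kappa b$. We therefore may rescale such that
$$t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) = 1$$
for the point closest to the surface. All data points then satisfy $t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1$.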
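The rescaling argument can be checked numerically: multiplying $(\mathbf{w}, b)$ by $\kappa = 1/\min_n t_n y(\mathbf{x}_n)$ leaves the distance unchanged while making $t_n y(\mathbf{x}_n) = 1$ for the closest points. Minimal sketch, assuming the identity feature map and a hypothetical toy data set.

```python
import math

def functional_margins(w, b, X, T):
    """t_n (w^T x_n + b) for every point, assuming phi(x) = x."""
    return [t * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, t in zip(X, T)]

def canonical_rescale(w, b, X, T):
    """Scale (w, b) by kappa so that the closest point has t_n y(x_n) = 1."""
    kappa = 1.0 / min(functional_margins(w, b, X, T))
    return [kappa * wi for wi in w], kappa * b

def geometric_margin(w, b, X, T):
    """Distance of the closest point: min_n t_n y(x_n) / ||w||."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(functional_margins(w, b, X, T)) / norm_w

X = [(2.0, 0.0), (-2.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]  # hypothetical toy data
T = [+1, -1, +1, -1]
w, b = (1.0, 1.0), 0.0
w_c, b_c = canonical_rescale(w, b, X, T)
print(w_c, b_c)                            # [0.5, 0.5] 0.0
print(functional_margins(w_c, b_c, X, T))  # [1.0, 1.0, 1.0, 1.0]
```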

16
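SVM: The details (4)
Since maximizing $1/\|\mathbf{w}\|$ is equivalent to minimizing $\|\mathbf{w}\|^2$, we can reduce the problem to
$$\arg\min_{\mathbf{w}, b}\frac{1}{2}\|\mathbf{w}\|^2$$
under the constraint
$$t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) \ge 1, \quad n = 1, \dots, N.$$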
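With the canonical constraints in place, comparing separators reduces to comparing $\frac12\|\mathbf{w}\|^2$ among feasible candidates. The sketch below only checks feasibility and evaluates the objective for a few hand-picked $\mathbf{w}$ (with $b = 0$, identity feature map, hypothetical toy data); actually solving the quadratic program needs a QP solver.

```python
def is_feasible(w, b, X, T, tol=1e-12):
    """Check the canonical constraints t_n (w^T x_n + b) >= 1 for all n."""
    return all(t * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1 - tol
               for x, t in zip(X, T))

def primal_objective(w):
    """(1/2) ||w||^2, to be minimized subject to the constraints."""
    return 0.5 * sum(wi * wi for wi in w)

X = [(2.0, 0.0), (-2.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]  # hypothetical toy data
T = [+1, -1, +1, -1]
for w in [(1.0, 0.0), (0.5, 0.5), (0.25, 0.25)]:
    print(w, is_feasible(w, 0.0, X, T), primal_objective(w))
# (1.0, 0.0) True 0.5
# (0.5, 0.5) True 0.25
# (0.25, 0.25) False 0.0625
```

The smallest feasible objective belongs to $\mathbf{w} = (0.5, 0.5)$; $(0.25, 0.25)$ has a smaller norm but violates the constraints.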

17
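SVM: The details (5)
To solve this, we introduce Lagrange multipliers $a_n \ge 0$ and minimize
$$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n\left\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n) + b) - 1\right\}$$
with respect to $\mathbf{w}$ and $b$ (and maximize with respect to $\mathbf{a}$). Setting the derivatives to zero gives $\mathbf{w} = \sum_n a_n t_n \phi(\mathbf{x}_n)$ and $\sum_n a_n t_n = 0$. Equivalently, we can maximize the dual representation
$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$$
subject to $a_n \ge 0$ and $\sum_n a_n t_n = 0$, where the kernel function $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T\phi(\mathbf{x}')$ can be chosen without specifying $\phi$ explicitly.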
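The dual objective only needs kernel values, never $\phi$. This minimal sketch evaluates $\tilde L(\mathbf{a}) = \sum_n a_n - \frac12\sum_n\sum_m a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$ and the constraints $a_n \ge 0$, $\sum_n a_n t_n = 0$ on a two-point toy problem; the linear kernel and the data are illustrative assumptions.

```python
def linear_kernel(x, z):
    """k(x, z) = x^T z (equivalent to phi(x) = x)."""
    return sum(xi * zi for xi, zi in zip(x, z))

def dual_objective(a, X, T, kernel):
    """L~(a) = sum_n a_n - 1/2 sum_n sum_m a_n a_m t_n t_m k(x_n, x_m)."""
    quad = sum(a[n] * a[m] * T[n] * T[m] * kernel(X[n], X[m])
               for n in range(len(a)) for m in range(len(a)))
    return sum(a) - 0.5 * quad

def dual_feasible(a, T, tol=1e-12):
    """Constraints a_n >= 0 and sum_n a_n t_n = 0."""
    return all(an >= 0 for an in a) and abs(sum(an * t for an, t in zip(a, T))) < tol

X = [(1.0, 0.0), (-1.0, 0.0)]  # hypothetical toy data
T = [+1, -1]
for a_val in (0.25, 0.5, 1.0):
    a = [a_val, a_val]
    print(a_val, dual_feasible(a, T), dual_objective(a, X, T, linear_kernel))
# 0.25 True 0.375
# 0.5 True 0.5
# 1.0 True 0.0
```

For this toy problem the equality constraint forces $a_1 = a_2 = a$, so $\tilde L = 2a - 2a^2$, maximized at $a = 0.5$; the corresponding $\mathbf{w} = \sum_n a_n t_n \mathbf{x}_n = (1, 0)$.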

18
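SVM: The details (6)
Because of the (Karush-Kuhn-Tucker) condition
$$a_n\left\{t_n y(\mathbf{x}_n) - 1\right\} = 0,$$
only those $a_n > 0$ survive for which $\mathbf{x}_n$ is on the margin, i.e. $t_n y(\mathbf{x}_n) = 1$. These points are the support vectors. This leads to sparsity.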
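Once the $a_n$ are known, the support vectors are simply the points with $a_n > 0$, and $b$ can be obtained from them. A numerically stable choice (as in PRML Section 7.1) averages over the support vector set $\mathcal{S}$: $b = \frac{1}{N_{\mathcal{S}}}\sum_{n \in \mathcal{S}}\left(t_n - \sum_{m \in \mathcal{S}} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m)\right)$. The threshold `eps` for treating $a_n$ as nonzero, the linear kernel and the toy data are assumptions.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def support_vectors(a, eps=1e-8):
    """Indices n with a_n > eps (numerically nonzero multipliers)."""
    return [n for n, an in enumerate(a) if an > eps]

def bias_from_support_vectors(a, X, T, kernel, eps=1e-8):
    """b = (1/N_S) sum_{n in S} (t_n - sum_{m in S} a_m t_m k(x_n, x_m))."""
    S = support_vectors(a, eps)
    total = sum(T[n] - sum(a[m] * T[m] * kernel(X[n], X[m]) for m in S) for n in S)
    return total / len(S)

X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]  # hypothetical toy data
T = [+1, -1, +1]
a = [0.5, 0.5, 0.0]  # dual solution for this toy problem; x_3 is not a support vector
print(support_vectors(a))                                 # [0, 1]
print(bias_from_support_vectors(a, X, T, linear_kernel))  # 0.0
```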

19
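SVM: The details (7)
Based on numerical optimization of the parameters $a_n$ and $b$, predictions on new data points can be made by evaluating the sign of
$$y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b,$$
where only the support vectors ($a_n > 0$) contribute.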
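Prediction then needs kernel evaluations against the support vectors only. Minimal sketch with an assumed linear kernel and a hypothetical toy solution ($\mathbf{a} = (0.5, 0.5, 0)$, $b = 0$, which is optimal for these three points); points with $a_n \approx 0$ are skipped, and a tie $y(\mathbf{x}) = 0$ is assigned to class $+1$ by an arbitrary convention.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def predict(x, a, X, T, b, kernel, eps=1e-8):
    """Return (sign of y(x), y(x)) with y(x) = sum_n a_n t_n k(x, x_n) + b.
    Terms with a_n <= eps are skipped; y(x) = 0 is mapped to +1 (arbitrary)."""
    y = sum(an * tn * kernel(x, xn) for an, tn, xn in zip(a, T, X) if an > eps) + b
    return (1 if y >= 0 else -1), y

X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]  # hypothetical toy data
T = [+1, -1, +1]
a = [0.5, 0.5, 0.0]
b = 0.0
print(predict((2.0, 0.0), a, X, T, b, linear_kernel))   # (1, 2.0)
print(predict((-0.5, 3.0), a, X, T, b, linear_kernel))  # (-1, -0.5)
```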

20
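SVM: The details (8)
In cases where the data points are not separable in feature space, we need a soft margin, i.e. a (limited) tolerance for points on the wrong side of their margin boundary, including misclassified points. To achieve this, we introduce slack variables $\xi_n \ge 0$ with $\xi_n = 0$ for points on the correct side of (or on) their margin boundary and $\xi_n = |t_n - y(\mathbf{x}_n)|$ otherwise, and relax the constraint to
$$t_n y(\mathbf{x}_n) \ge 1 - \xi_n.$$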
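For targets $t_n \in \{-1, +1\}$, the usual slack definition ($\xi_n = 0$ for points on the correct side of their margin boundary, $\xi_n = |t_n - y(\mathbf{x}_n)|$ otherwise) can be written compactly as $\xi_n = \max(0, 1 - t_n y(\mathbf{x}_n))$, because $|t_n - y(\mathbf{x}_n)| = |1 - t_n y(\mathbf{x}_n)|$. The sketch below uses hypothetical decision values and reads off the three regimes: $\xi_n = 0$, $0 < \xi_n \le 1$ (inside the margin, not misclassified), $\xi_n > 1$ (misclassified).

```python
def slack(t, y):
    """xi_n = max(0, 1 - t_n * y(x_n)) for a target t_n in {-1, +1}."""
    return max(0.0, 1.0 - t * y)

def regime(t, y):
    """Classify a point by its slack value, given its decision value y."""
    xi = slack(t, y)
    if xi == 0.0:
        return xi, "on or beyond the correct margin boundary"
    if xi <= 1.0:
        return xi, "inside the margin, not misclassified"
    return xi, "misclassified"

# hypothetical (target, decision value) pairs
for t, y in [(+1, 2.0), (+1, 0.5), (+1, -0.5), (-1, -1.0)]:
    print(t, y, regime(t, y))
```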

21
SVM: The details (9)
Graphically:

22
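SVM: The details (10)
The same procedure as before (with additional Lagrange multipliers $\mu_n \ge 0$ and corresponding additional constraints) again yields a sparse kernel-based solution: maximize
$$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$$
subject to the box constraints $0 \le a_n \le C$ and $\sum_n a_n t_n = 0$, where $C > 0$ weights the slack penalty. Predictions again use $y(\mathbf{x}) = \sum_n a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$, and only points with $a_n > 0$ (support vectors) contribute.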
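The box-constrained dual is a quadratic program. A common way to solve it is sequential minimal optimization (SMO), which repeatedly optimizes a pair $(a_i, a_j)$ analytically while keeping $\sum_n a_n t_n = 0$. The sketch below is a simplified SMO variant (deterministic partner choice, none of Platt's selection heuristics, no caching), suitable only for tiny problems; the tolerances, iteration caps, linear kernel and toy data are assumptions.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def simplified_smo(X, T, C, kernel, tol=1e-3, max_passes=5, max_iter=10000):
    """Maximize the soft-margin dual subject to 0 <= a_n <= C, sum_n a_n t_n = 0,
    by analytically optimizing pairs (a_i, a_j). Returns (a, b)."""
    n = len(T)
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    a = [0.0] * n
    b = 0.0

    def y(i):  # current decision value y(x_i)
        return sum(a[m] * T[m] * K[m][i] for m in range(n)) + b

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = y(i) - T[i]
            # Skip i unless it violates the KKT conditions by more than tol.
            if not ((T[i] * Ei < -tol and a[i] < C) or (T[i] * Ei > tol and a[i] > 0)):
                continue
            for j in range(n):  # first partner j that makes progress
                if j == i:
                    continue
                Ej = y(j) - T[j]
                ai_old, aj_old = a[i], a[j]
                # Interval [L, H] for a_j keeping 0 <= a <= C and sum a_n t_n fixed.
                if T[i] != T[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]  # curvature along the pair
                if L >= H or eta >= 0:
                    continue
                aj = min(H, max(L, aj_old - T[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-10:
                    continue
                ai = ai_old + T[i] * T[j] * (aj_old - aj)
                b1 = b - Ei - T[i] * (ai - ai_old) * K[i][i] - T[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - T[i] * (ai - ai_old) * K[i][j] - T[j] * (aj - aj_old) * K[j][j]
                a[i], a[j] = ai, aj
                b = b1 if 0 < ai < C else b2 if 0 < aj < C else 0.5 * (b1 + b2)
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return a, b

X = [(1.0, 0.0), (-1.0, 0.0)]  # hypothetical toy data
T = [+1, -1]
a, b = simplified_smo(X, T, C=10.0, kernel=linear_kernel)
print(a, b)  # [0.5, 0.5] 0.0
```

Each pair update preserves $t_i a_i + t_j a_j$, so the equality constraint holds throughout; clipping to $[L, H]$ enforces the box constraints.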

23
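SVM: The details (11)
The soft-margin approach can be formulated as minimizing the regularized error function
$$C\sum_{n=1}^{N}\xi_n + \frac{1}{2}\|\mathbf{w}\|^2.$$
This formulation can be extended to use SVMs for regression:
$$C\sum_{n=1}^{N}(\xi_n + \hat{\xi}_n) + \frac{1}{2}\|\mathbf{w}\|^2$$
subject to $t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n$ and $t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n$, where $\xi_n \ge 0$ and $\hat{\xi}_n \ge 0$ are slack variables describing the position of a data point above or below a tube of width $2\epsilon$ around the estimate $y$.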
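Both error functions are easy to evaluate once the decision values are known. Minimal sketch: classification slacks $\xi_n = \max(0, 1 - t_n y_n)$, regression slacks $\xi_n = \max(0, t_n - y_n - \epsilon)$ above the tube and $\hat\xi_n = \max(0, y_n - t_n - \epsilon)$ below it, plus the two regularized objectives $C\sum_n \xi_n + \frac12\|\mathbf{w}\|^2$ and $C\sum_n(\xi_n + \hat\xi_n) + \frac12\|\mathbf{w}\|^2$. All numbers are hypothetical.

```python
def hinge_slacks(T, Y):
    """Classification slacks xi_n = max(0, 1 - t_n y_n), t_n in {-1, +1}."""
    return [max(0.0, 1.0 - t * y) for t, y in zip(T, Y)]

def tube_slacks(T, Y, eps):
    """Regression slacks: xi_n above the tube, xi_hat_n below it (width 2*eps)."""
    xi = [max(0.0, t - y - eps) for t, y in zip(T, Y)]
    xi_hat = [max(0.0, y - t - eps) for t, y in zip(T, Y)]
    return xi, xi_hat

def soft_margin_error(w, xi, C):
    """C * sum_n xi_n + 1/2 ||w||^2."""
    return C * sum(xi) + 0.5 * sum(wi * wi for wi in w)

def svr_error(w, xi, xi_hat, C):
    """C * sum_n (xi_n + xi_hat_n) + 1/2 ||w||^2."""
    return C * sum(p + q for p, q in zip(xi, xi_hat)) + 0.5 * sum(wi * wi for wi in w)

xi, xi_hat = tube_slacks([1.0, 0.0, 0.4], [0.25, 1.0, 0.5], eps=0.5)
print(xi, xi_hat)                          # [0.25, 0.0, 0.0] [0.0, 0.5, 0.0]
print(svr_error((1.0,), xi, xi_hat, C=2.0))  # 2.0
```

The third point lies inside the tube, so it contributes no slack.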

24
SVM: The details (12)
Graphically:

25
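SVM: The details (13)
Again, optimization using Lagrange multipliers $a_n, \hat{a}_n \in [0, C]$ yields a sparse kernel-based solution:
$$y(\mathbf{x}) = \sum_{n=1}^{N}(a_n - \hat{a}_n)k(\mathbf{x}, \mathbf{x}_n) + b.$$
Only points on or outside the $\epsilon$-tube have $a_n - \hat{a}_n \neq 0$ (support vectors).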
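For regression the prediction is $y(\mathbf{x}) = \sum_n (a_n - \hat a_n)k(\mathbf{x}, \mathbf{x}_n) + b$, and only points with $a_n \neq \hat a_n$ contribute. The sketch below assumes a 1-D linear kernel and made-up multipliers and bias; it does not solve for them.

```python
def linear_kernel_1d(x, z):
    """k(x, z) = x * z for scalar inputs (illustrative choice)."""
    return x * z

def svr_predict(x, a, a_hat, X, b, kernel):
    """y(x) = sum_n (a_n - a_hat_n) k(x, x_n) + b; terms with a_n == a_hat_n are skipped."""
    return sum((an - ahn) * kernel(x, xn)
               for an, ahn, xn in zip(a, a_hat, X) if an != ahn) + b

def svr_support_vectors(a, a_hat, eps=1e-8):
    """Indices n with |a_n - a_hat_n| > eps."""
    return [n for n, (an, ahn) in enumerate(zip(a, a_hat)) if abs(an - ahn) > eps]

X = [1.0, 2.0, 3.0]
a = [0.5, 0.0, 0.0]      # hypothetical multipliers, for illustration only
a_hat = [0.0, 0.0, 0.25]
b = 0.1
print(svr_support_vectors(a, a_hat))  # [0, 2]
print(svr_predict(2.0, a, a_hat, X, b, linear_kernel_1d))
```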

26
SVM: Limitations
- Output is a decision, not a posterior probability.
- Extension of classification to more than two classes is problematic.
- The parameters C and ϵ have to be found by methods such as cross-validation.
- Kernel functions are required to be positive definite.
