1 Statistical Learning
Dong Liu, Dept. EEIS, USTC

2 Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM

3 Linear classification
Several linear decision boundaries can separate the data. Which one is optimal?

4 Classification margin
We want to maximize the margin.
Intuitively, this makes the classifier the most tolerant to noise.
Theoretically, this gives the classifier the best generalization ability.

5 Geometric margin & Functional margin
For a point, its distance to the decision boundary gives the geometric margin; the functional margin is its unnormalized counterpart (see the formulas below). Since we can scale both w and b by a common factor without changing the geometric margin, we can fix the functional margin of the closest samples to 1.
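The slide's formulas did not survive extraction; a standard reconstruction in LaTeX, assuming labels y_n in {-1, +1} and decision boundary w^T x + b = 0:
\[ \text{distance of } x_n \text{ to the boundary} = \frac{|w^\top x_n + b|}{\lVert w \rVert} \]
\[ \text{geometric margin: } \gamma = \min_n \frac{y_n (w^\top x_n + b)}{\lVert w \rVert}, \qquad \text{functional margin: } \hat{\gamma} = \min_n \, y_n (w^\top x_n + b) \]
Rescaling (w, b) to (\kappa w, \kappa b) leaves \gamma unchanged, so we may set \hat{\gamma} = 1.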

6 Maximize geometric margin 1/2
The problem is to maximize the geometric margin subject to the margin constraints, which is equivalent to minimizing a quadratic objective under linear constraints (see below).
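A standard reconstruction of the two formulations (the second follows because, once the functional margin is fixed to 1, maximizing 1/||w|| is the same as minimizing ||w||^2/2):
\[ \max_{w, b} \; \frac{1}{\lVert w \rVert} \quad \text{s.t.} \quad y_n (w^\top x_n + b) \ge 1, \; n = 1, \dots, N \]
\[ \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_n (w^\top x_n + b) \ge 1, \; n = 1, \dots, N \]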

7 Maximize geometric margin 2/2
Using Lagrange multipliers, we form the Lagrangian of this problem. According to the KKT conditions, w is determined by the samples that have non-zero multipliers; these samples are termed support vectors (see below).
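A standard reconstruction of the Lagrangian and the relevant KKT conditions:
\[ L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_n \alpha_n \big[ y_n (w^\top x_n + b) - 1 \big], \quad \alpha_n \ge 0 \]
\[ \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_n \alpha_n y_n x_n, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_n \alpha_n y_n = 0 \]
\[ \alpha_n \big[ y_n (w^\top x_n + b) - 1 \big] = 0 \;\Rightarrow\; \alpha_n > 0 \text{ only if } y_n (w^\top x_n + b) = 1 \]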

8 Support vectors
[Figure: the margin is bounded by the hyperplanes wT x + b = 1 and wT x + b = -1.] The margin is determined by the samples with non-zero Lagrange multipliers; these samples are termed support vectors.

9 Lagrange dual
For a general constrained optimization problem, we try to solve the primal problem together with its Lagrange dual problem (see below). Under certain conditions (e.g. convexity plus a constraint qualification such as Slater's condition), the original problem and its dual problem are equivalent.
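A standard reconstruction of the general primal and dual:
\[ \text{primal:} \quad \min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; h_j(x) = 0 \]
\[ L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x), \quad \lambda_i \ge 0 \]
\[ \text{dual:} \quad \max_{\lambda \ge 0, \, \nu} \; \Big( \min_x L(x, \lambda, \nu) \Big) \]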

10 Lagrange dual of max-margin
The original problem, its dual problem, and the accompanying KKT conditions are given below.
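A standard reconstruction:
\[ \text{original:} \quad \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_n (w^\top x_n + b) \ge 1 \]
\[ \text{dual:} \quad \max_\alpha \; \sum_n \alpha_n - \frac{1}{2} \sum_{n, m} \alpha_n \alpha_m y_n y_m \, x_n^\top x_m \quad \text{s.t.} \quad \alpha_n \ge 0, \; \sum_n \alpha_n y_n = 0 \]
\[ \text{KKT:} \quad \alpha_n \ge 0, \quad y_n (w^\top x_n + b) - 1 \ge 0, \quad \alpha_n \big[ y_n (w^\top x_n + b) - 1 \big] = 0 \]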

11 Solution of max-margin
Once we have solved the dual problem, we can recover the weight vector and the bias (see below). Summary: for max-margin classification, we solve the dual problem to find the support vectors, and then determine the best decision boundary.
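A standard reconstruction of the recovered solution, where b* is computed from any support vector x_m:
\[ w^* = \sum_n \alpha_n^* y_n x_n, \qquad b^* = y_m - (w^*)^\top x_m \]
\[ f(x) = \operatorname{sign}\big( (w^*)^\top x + b^* \big) \]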

12 Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM

13 Why soft margin 1/4
If the dataset is not linearly separable, how can we define the margin?

14 Why soft margin 2/4
We may still define the margin by disregarding the "error" samples.

15 Why soft margin 3/4
Or, even if the dataset is linearly separable, we may still prefer a larger margin.

16 Why soft margin 4/4
We may still define the margin but allow some samples to be exceptions.

17 Soft margin formulation 1/3
We change our objective to penalize margin violations through an indicator function (see below), in contrast to the "hard" margin formulation.
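A common reconstruction; the trade-off constant C is an assumption about the slide's notation:
\[ \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_n \mathbb{1}\big[ y_n (w^\top x_n + b) < 1 \big], \qquad \mathbb{1}[z] = \begin{cases} 1 & z \text{ true} \\ 0 & \text{otherwise} \end{cases} \]
\[ \text{hard margin, for comparison:} \quad \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_n (w^\top x_n + b) \ge 1 \]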

18 Soft margin formulation 2/3
Since the indicator function is intractable, we replace it with the hinge loss, so the problem becomes an unconstrained one (see below). It can be interpreted as minimizing the hinge loss with L2-norm regularization.
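Replacing the indicator with the hinge loss gives, in a standard reconstruction:
\[ \min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_n \max\big( 0, \; 1 - y_n (w^\top x_n + b) \big) \]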

19 Soft margin formulation 3/3
Define slack variables; the problem becomes a constrained quadratic program (see below). [Figure: soft-margin classifier with the hyperplanes wT x + b = 0, wT x + b = 1, and wT x + b = -1.]
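With slack variables, a standard reconstruction of the quadratic program:
\[ \min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_n \xi_n \quad \text{s.t.} \quad y_n (w^\top x_n + b) \ge 1 - \xi_n, \; \xi_n \ge 0, \; n = 1, \dots, N \]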

20 Soft margin solution 1/2
Using Lagrange multipliers, we form the Lagrangian; the KKT conditions are given below.
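A standard reconstruction of the Lagrangian and the stationarity conditions, with multipliers alpha_n for the margin constraints and mu_n for xi_n >= 0:
\[ L = \frac{1}{2} \lVert w \rVert^2 + C \sum_n \xi_n - \sum_n \alpha_n \big[ y_n (w^\top x_n + b) - 1 + \xi_n \big] - \sum_n \mu_n \xi_n \]
\[ w = \sum_n \alpha_n y_n x_n, \qquad \sum_n \alpha_n y_n = 0, \qquad C - \alpha_n - \mu_n = 0 \]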

21 Soft margin solution 2/2
Thus the Lagrange dual problem is as given below. Samples are categorized by their multiplier values, and those with non-zero multipliers are the support vectors.
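A standard reconstruction of the dual and the resulting sample categories:
\[ \max_\alpha \; \sum_n \alpha_n - \frac{1}{2} \sum_{n, m} \alpha_n \alpha_m y_n y_m \, x_n^\top x_m \quad \text{s.t.} \quad 0 \le \alpha_n \le C, \; \sum_n \alpha_n y_n = 0 \]
alpha_n = 0: correctly classified, outside the margin; 0 < alpha_n < C: exactly on the margin (xi_n = 0); alpha_n = C: on or inside the margin, possibly misclassified (xi_n >= 0). Samples with alpha_n > 0 are the support vectors.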

22 Support vectors in soft margin SVM
[Figure: the margin is bounded by the hyperplanes wT x + b = 1 and wT x + b = -1.] The margin is determined by the samples with non-zero Lagrange multipliers; these samples are termed support vectors.

23 Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM

24 Using basis functions
A non-linear transform of the inputs allows for an easier (linear) classification.

25 SVM with basis functions
Consider solving the max-margin problem in the feature space of basis functions. The dual problem and the solution are given below.
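A standard reconstruction in the feature space phi(x); whether the slide uses hard or soft constraints (the upper bound C) is an assumption:
\[ \max_\alpha \; \sum_n \alpha_n - \frac{1}{2} \sum_{n, m} \alpha_n \alpha_m y_n y_m \, \phi(x_n)^\top \phi(x_m) \quad \text{s.t.} \quad 0 \le \alpha_n \le C, \; \sum_n \alpha_n y_n = 0 \]
\[ w = \sum_n \alpha_n y_n \phi(x_n), \qquad f(x) = \operatorname{sign}\Big( \sum_n \alpha_n y_n \, \phi(x_n)^\top \phi(x) + b \Big) \]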

26 From basis function to kernel function
We notice that the basis functions appear only in the form of inner products. We define the inner product of basis functions as the kernel function (see below). The space of basis functions is then termed a Reproducing Kernel Hilbert Space (RKHS).
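In LaTeX, the definition referred to above:
\[ k(x, x') = \phi(x)^\top \phi(x') \]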

27 Kernel function: example
For example, we can prove that a quadratic polynomial of the inner product is a kernel function, since we can write out its basis functions explicitly. Similarly, we can prove that the kernel functions listed below are valid (RBF = Radial-Basis Function).
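A standard reconstruction; the exact examples on the slide may differ. For the quadratic kernel on R^2, an explicit feature map exists:
\[ k(x, x') = (x^\top x')^2, \qquad \phi(x) = \big( x_1^2, \; \sqrt{2}\, x_1 x_2, \; x_2^2 \big)^\top \;\Rightarrow\; \phi(x)^\top \phi(x') = (x^\top x')^2 \]
\[ \text{linear: } k(x, x') = x^\top x', \qquad \text{polynomial: } k(x, x') = (x^\top x' + c)^d, \qquad \text{RBF: } k(x, x') = \exp\!\Big( -\frac{\lVert x - x' \rVert^2}{2 \sigma^2} \Big) \]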

28 Kernel function: benefit
For SVM (and similar methods), defining a kernel function is equivalent to designing basis functions; this is termed the kernel trick.
Sometimes it is easier to express a model with a kernel function than with basis functions, e.g. the RBF kernel.
Sometimes we can prove that a function is a kernel function even though it is difficult to write out its corresponding basis functions.
If a function satisfies Mercer's condition, it is a kernel function.

29 Kernelized SVM
The dual problem and, once it is solved, the decision function are given below.
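A standard reconstruction:
\[ \max_\alpha \; \sum_n \alpha_n - \frac{1}{2} \sum_{n, m} \alpha_n \alpha_m y_n y_m \, k(x_n, x_m) \quad \text{s.t.} \quad 0 \le \alpha_n \le C, \; \sum_n \alpha_n y_n = 0 \]
\[ f(x) = \operatorname{sign}\Big( \sum_n \alpha_n y_n \, k(x_n, x) + b \Big) \]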

30 Kernelized SVM: example
Using the RBF kernel, the decision boundary becomes non-linear in the input space.
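A minimal sketch (not from the slides) of this kind of example, assuming scikit-learn is available; the toy dataset and the values of C and gamma are chosen only for illustration:

# Soft-margin SVM with an RBF kernel on a linearly non-separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)  # non-linear two-class data

clf = SVC(kernel="rbf", C=1.0, gamma=2.0)  # C: soft-margin penalty, gamma: RBF width
clf.fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))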

31 More about the kernel trick
There are many problems that can be formulated using the kernel trick, i.e. using a kernel function in place of basis functions. The Representation Theorem claims that the solution of a regularized learning problem over an RKHS can be expressed as a kernel expansion over the training samples (see below).
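A standard statement of the theorem, assuming a loss L, a regularization weight lambda, and an RKHS H induced by the kernel k; the exact form on the slide may differ:
\[ \min_{f \in \mathcal{H}} \; \sum_n L\big( y_n, f(x_n) \big) + \lambda \lVert f \rVert_{\mathcal{H}}^2 \quad \Rightarrow \quad f^*(x) = \sum_n \beta_n \, k(x, x_n) \]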

32 Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM

33 SMO algorithm
The (dual) problem is the kernelized soft-margin dual given above. For this problem, sequential minimal optimization (SMO) is an efficient algorithm: choose two Lagrange multipliers as variables (two at a time, because the equality constraint on the multipliers must be maintained), optimize over them while keeping the other multipliers unchanged, and iterate.

34 SMO algorithm: considering two variables
Due to the equality constraint from the KKT conditions, the two chosen multipliers are linearly related: either their sum or their difference is constant, depending on whether the two labels agree. Our objective then becomes a quadratic function of a single variable, which can be minimized analytically and clipped to the box constraint (see below).
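A standard reconstruction of the analytic two-variable update (Platt's SMO), where E_i is the prediction error on sample i and [L, H] is the interval implied by 0 <= alpha_i <= C and the linear constraint:
\[ E_i = f(x_i) - y_i, \qquad \eta = k(x_1, x_1) + k(x_2, x_2) - 2\, k(x_1, x_2) \]
\[ \alpha_2^{\text{new}} = \operatorname{clip}\!\Big( \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta}, \; L, \; H \Big), \qquad \alpha_1^{\text{new}} = \alpha_1 + y_1 y_2 \big( \alpha_2 - \alpha_2^{\text{new}} \big) \]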

35 Chapter summary
Dictionary / Toolbox: Kernel trick; Margin (geometric ~, soft ~); Mercer's condition; Representation theorem; RKHS; Slack variable; Support vector; Hinge loss; Kernel function; Lagrange dual; Sequential minimal optimization (SMO)

