
1
Classification IV. Lecturer: Dr. Bo Yuan

2
Overview: Support Vector Machines

3
Linear Classifier: the decision boundary is the hyperplane w · x + b = 0, with normal vector w; points with w · x + b > 0 fall on one side and points with w · x + b < 0 on the other.
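A minimal sketch of this decision rule in Python (the values of w, b, and the test point are made up for illustration):

import numpy as np

# Linear classifier: the predicted class is the sign of w . x + b.
w = np.array([2.0, -1.0])   # normal vector of the separating hyperplane
b = -0.5                    # offset of the hyperplane

def predict(x):
    return 1 if np.dot(w, x) + b > 0 else -1

print(predict(np.array([1.0, 0.5])))   # prints 1, since 2*1.0 - 1*0.5 - 0.5 = 1 > 0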

4
Distance to Hyperplane: the distance from a point x to the hyperplane is measured along the normal direction, to its projection x' on the hyperplane.
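Filling in the standard formula behind this figure: the distance from a point x to the hyperplane w · x + b = 0, and its projection x' onto it, are

\[
d(x) = \frac{|w \cdot x + b|}{\|w\|},
\qquad
x' = x - \frac{w \cdot x + b}{\|w\|^2}\, w .
\]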

5
Selection of Classifiers: which classifier is the best? All have the same training error. How about generalization?

6
Unknown Samples: classifier B divides the space more consistently (unbiased).

7
Margins: the training points that lie closest to the decision boundary are the support vectors.

8
Margins: The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. Intuitively, it is safer to choose a classifier with a larger margin: it leaves a wider buffer zone for mistakes. The hyperplane is determined by only a few data points, the support vectors; all the others can be discarded. Selecting the classifier with the maximum margin gives the Linear Support Vector Machine (LSVM), which works very well in practice. How do we specify the margin formally?

9
Margins: the "Predict Class = +1" zone and the "Predict Class = -1" zone are separated by the hyperplane w·x + b = 0; the margin boundaries are w·x + b = +1 (through a positive support vector x⁺) and w·x + b = -1 (through a negative support vector x⁻), and M denotes the margin width.
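The margin width follows from the two margin planes: writing x⁺ = x⁻ + λw for a pair of closest points on them,

\[
w \cdot x^{+} + b = +1, \quad w \cdot x^{-} + b = -1
\;\Rightarrow\; \lambda \|w\|^2 = 2
\;\Rightarrow\; M = \|x^{+} - x^{-}\| = \lambda \|w\| = \frac{2}{\|w\|} .
\]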

10
Objective Function: correctly classify all data points and maximize the margin. This leads to a quadratic optimization problem of the form "minimize … subject to …".
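The formulas on this slide are in its figures; the standard statement is that maximizing M = 2/‖w‖ is equivalent to minimizing ½‖w‖², so the problem reads

\[
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\qquad \text{subject to} \qquad
y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N .
\]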

11
Lagrange Multipliers: introducing one multiplier per constraint turns the primal into its dual problem. A quadratic problem again!
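In its standard form (consistent with the primal above), the dual is

\[
\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\qquad \text{subject to} \qquad
\alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0 .
\]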

12
Solutions of w & b: the solution, and the resulting classifier, depend on the training data only through inner products.
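In the usual notation, with α_i the dual multipliers and x_k any support vector (α_k > 0):

\[
w = \sum_i \alpha_i y_i\, x_i,
\qquad
b = y_k - w \cdot x_k,
\qquad
f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i\, (x_i \cdot x) + b \Big).
\]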

13
An Example: two training points in the (x₁, x₂) plane, (1, 1) with label +1 and (0, 0) with label -1.
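Working the example through the conditions above, with both points lying exactly on the margin planes:

\[
w \cdot (1,1) + b = +1, \qquad w \cdot (0,0) + b = -1
\;\Rightarrow\; b = -1, \quad w_1 + w_2 = 2,
\]

and minimizing ‖w‖ under this constraint gives w = (1, 1), the boundary x₁ + x₂ = 1, and margin M = 2/‖w‖ = √2.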

14
Soft Margin: the figure shows points that violate the margin boundaries w·x + b = +1 and w·x + b = -1; the amount of each violation is measured by a slack variable.

15
Soft Margin
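The formulas on this slide are not in the transcript; the standard soft-margin objective, matching the slack-variable picture above, is

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\qquad \text{subject to} \qquad
y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 ,
\]

where C trades margin width against training errors.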

16
Non-linear SVMs: data that cannot be separated by a threshold on the original axis x become linearly separable after adding the feature x² (figure: the same 1-D points plotted on x, and then in the (x, x²) plane).
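A hypothetical instance of this 1-D picture (the point values are made up): let x = -2 and x = +2 be positive and x = 0 be negative. No single threshold on x separates them, but after the map

\[
\Phi(x) = (x,\ x^2): \quad -2 \mapsto (-2, 4), \quad +2 \mapsto (2, 4), \quad 0 \mapsto (0, 0),
\]

the horizontal line x² = 2 separates the two classes.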

17
Feature Space: Φ: x → φ(x), e.g. mapping the input coordinates (x₁, x₂) to the features (x₁², x₂²).

18
Feature Space: Φ: x → φ(x); a boundary that is non-linear in the input coordinates (x₁, x₂) becomes a linear boundary in the feature space.

19
Quadratic Basis Functions: φ(x) collects constant terms, linear terms, pure quadratic terms, and quadratic cross-terms; the number of terms grows quadratically with the input dimension.
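One standard choice of quadratic basis for x ∈ R^d (the √2 factors are included so that the dot product simplifies shortly):

\[
\varphi(x) = \big(1,\ \sqrt{2}\,x_1, \dots, \sqrt{2}\,x_d,\ x_1^2, \dots, x_d^2,\ \sqrt{2}\,x_1 x_2, \dots, \sqrt{2}\,x_{d-1} x_d\big),
\]

with 1 + d + d + \binom{d}{2} = (d+1)(d+2)/2 terms in total.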

20
Calculation of Φ(x_i)·Φ(x_j)

21
It turns out …
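… that for the quadratic basis above, the feature-space inner product collapses to an expression in the original space:

\[
\Phi(a) \cdot \Phi(b) = 1 + 2\,(a \cdot b) + (a \cdot b)^2 = (1 + a \cdot b)^2 ,
\]

so an inner product between O(d²)-dimensional feature vectors costs only one O(d) dot product.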

22
Kernel Trick: The linear classifier relies on dot products between vectors, x_i · x_j. If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(x_i) · φ(x_j). A kernel function is a function that corresponds to an inner product in some expanded feature space: K(x_i, x_j) = φ(x_i) · φ(x_j). Example: x = [x₁, x₂]; K(x_i, x_j) = (1 + x_i · x_j)².
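A minimal numeric check of this example in Python (phi below is one standard feature map that reproduces this kernel; the sample vectors a and b are made up):

import numpy as np

# Check that K(a, b) = (1 + a.b)^2 equals the explicit feature-space dot product.
def phi(x):
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,   # linear terms
                     x1 ** 2, x2 ** 2,                   # pure quadratic terms
                     np.sqrt(2) * x1 * x2])              # quadratic cross-term

def K(a, b):
    return (1.0 + np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
print(K(a, b), np.dot(phi(a), phi(b)))   # both print 4.0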

23
Kernels
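The kernels most commonly listed in this context (p, σ, κ, δ are kernel parameters):

\[
K(x_i, x_j) = x_i \cdot x_j \ \text{(linear)}, \qquad
K(x_i, x_j) = (1 + x_i \cdot x_j)^p \ \text{(polynomial)},
\]
\[
K(x_i, x_j) = \exp\!\big(-\|x_i - x_j\|^2 / 2\sigma^2\big) \ \text{(Gaussian RBF)}, \qquad
K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j - \delta) \ \text{(sigmoid)}.
\]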

24
String Kernel: similarity between text strings, e.g. "car" vs. "custard" ("car" occurs as a subsequence of "custard").

25
Solutions of w & b
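With a kernel, w = Σᵢ αᵢ yᵢ φ(xᵢ) lives in feature space and never needs to be formed explicitly, while b can still be computed from any support vector x_k:

\[
b = y_k - \sum_i \alpha_i y_i\, K(x_i, x_k) .
\]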

26
Decision Boundaries
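The resulting decision rule, where the sum effectively runs only over the support vectors (αᵢ > 0):

\[
f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i\, K(x_i, x) + b \Big).
\]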

27
More Maths …

28
SVM Roadmap: Linear Classifier + Maximum Margin → Linear SVM; Noise → Soft Margin; Nonlinear Problem → map to a feature space, a · b → Φ(a)·Φ(b); High Computational Cost → Kernel Trick, K(a,b) = Φ(a)·Φ(b).

29
Reading Materials. Text Book: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000. Online Resources: a list of papers uploaded to the web learning portal; Wikipedia & Google.

30
Review: What is the definition of margin in a linear classifier? Why do we want to maximize the margin? What is the mathematical expression of the margin? How do we solve the objective function in an SVM? What are support vectors? What is a soft margin? How does an SVM solve nonlinear problems? What is the so-called "kernel trick"? What are the commonly used kernels?

31
Next Week's Class Talk: volunteers are required for next week's class talk. Topic: SVM in Practice. Hints: applications, demos, multi-class problems, software (a very popular toolbox: LIBSVM), or any other interesting topics beyond this lecture. Length: 20 minutes plus question time.
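As a pointer for the "Software" hint, a minimal usage sketch assuming scikit-learn is available (its SVC class wraps the LIBSVM solver; the toy data below are made up):

import numpy as np
from sklearn.svm import SVC   # scikit-learn's SVC wraps LIBSVM

# Toy data, made up for illustration: two points per class in 2-D.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # soft-margin SVM with an RBF kernel
clf.fit(X, y)

print(clf.predict([[0.8, 0.9]]))   # label predicted for a new point
print(clf.support_vectors_)        # the support vectors found by the solver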
