Presentation is loading. Please wait.

Presentation is loading. Please wait.

LOGO Classification IV Lecturer: Dr. Bo Yuan

Similar presentations


Presentation on theme: "LOGO Classification IV Lecturer: Dr. Bo Yuan"— Presentation transcript:

1 LOGO Classification IV Lecturer: Dr. Bo Yuan

2 Overview  Support Vector Machines 2

3 Linear Classifier 3 w · x + b >0 w · x + b <0 w · x + b =0 w

4 Distance to Hyperplane 4 x x'x'

5 Selection of Classifiers 5 Which classifier is the best? All have the same training error. How about generalization? ?

6 Unknown Samples 6 A B Classifier B divides the space more consistently (unbiased).

7 Margins 7 Support Vectors

8 Margins  The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point.  Intuitively, it is safer to choose a classifier with a larger margin.  Wider buffer zone for mistakes  The hyperplane is decided by only a few data points.  Support Vectors!  Others can be discarded!  Select the classifier with the maximum margin.  Linear Support Vector Machines (LSVM)  Works very well in practice.  How to specify the margin formally? 8

9 Margins 9 “Predict Class = +1” zone “Predict Class = -1” zone wx+b=1 wx+b=0 wx+b=-1 X-X- x+x+ M=Margin Width

10 Objective Function  Correctly classify all data points:  Maximize the margin  Quadratic Optimization Problem  Minimize  Subject to 10

11 Lagrange Multipliers 11 Dual Problem Quadratic problem again!

12 Solutions of w & b 12 inner product

13 An Example 13 (1, 1, +1) (0, 0, -1) x1x1 x2x2

14 Soft Margin 14 wx+b=1 wx+b=0 wx+b=-1 77  11 22

15 Soft Margin 15

16 Non-linear SVMs 16 0 x 0 x x2x2 x

17 Feature Space 17 Φ: x → φ(x) x1x1 x2x2 x12x12 x22x22

18 Feature Space 18 Φ: x → φ(x) x2x2 x1x1

19 Quadratic Basis Functions 19 Constant Terms Linear Terms Pure Quadratic Terms Quadratic Cross-Terms Number of terms

20 Calculation of Φ(x i )·Φ(x j ) 20

21 It turns out … 21

22 Kernel Trick The linear classifier relies on dot products between vectors x i ·x j If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: φ(x i ) ·φ(x j ) A kernel function is some function that corresponds to an inner product in some expanded feature space: K(x i, x j ) = φ(x i ) ·φ(x j ) Example: x=[x 1, x 2 ]; K(x i, x j ) = (1 + x i · x j ) 2 22

23 Kernels 23

24 String Kernel 24 Similarity between text strings: Car vs. Custard

25 Solutions of w & b 25

26 Decision Boundaries 26

27 More Maths … 27

28 SVM Roadmap 28 Kernel Trick K(a,b)= Φ(a)·Φ(b) a · b → Φ(a)·Φ(b) High Computational Cost Soft Margin Nonlinear Problem Linear SVM Noise Linear Classifier Maximum Margin

29 Reading Materials  Text Book  Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press,  Online Resources      A list of papers uploaded to the web learning portal  Wikipedia & Google 29

30 Review  What is the definition of margin in a linear classifier?  Why do we want to maximize the margin?  What is the mathematical expression of margin?  How to solve the objective function in SVM?  What are support vectors?  What is soft margin?  How does SVM solve nonlinear problems?  What is so called “kernel trick”?  What are the commonly used kernels? 30

31 Next Week’s Class Talk  Volunteers are required for next week’s class talk.  Topic : SVM in Practice  Hints:  Applications  Demos  Multi-Class Problems  Software A very popular toolbox: Libsvm  Any other interesting topics beyond this lecture  Length: 20 minutes plus question time 31


Download ppt "LOGO Classification IV Lecturer: Dr. Bo Yuan"

Similar presentations


Ads by Google