Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn

Overview 2: Support Vector Machines

Linear Classifier 3: The hyperplane w · x + b = 0 splits the input space; points with w · x + b > 0 are predicted as one class, points with w · x + b < 0 as the other. (Figure: the hyperplane with its normal vector w.)
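The decision rule above can be sketched in a few lines; the weight vector w and bias b below are illustrative values chosen by hand, not learned from data:

```python
# Minimal linear classifier: predict by the sign of w . x + b.
# w and b here are illustrative, not trained.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def predict(w, b, x):
    """Return +1 if w . x + b > 0, else -1."""
    return 1 if dot(w, x) + b > 0 else -1

w, b = [1.0, 1.0], -1.0   # hyperplane x1 + x2 - 1 = 0

print(predict(w, b, [2.0, 2.0]))   # w . x + b > 0 side -> 1
print(predict(w, b, [0.0, 0.0]))   # w . x + b < 0 side -> -1
```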

Distance to Hyperplane 4: For a point x with projection x′ onto the hyperplane, the distance is |w · x + b| / ‖w‖. (Figure: x and its foot x′ on the hyperplane.)
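A quick numerical check of the distance formula, using an illustrative hyperplane and point:

```python
import math

# Distance from a point x to the hyperplane w . x + b = 0:
#     d = |w . x + b| / ||w||
# Values below are illustrative.

def distance_to_hyperplane(w, b, x):
    wx = sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(wx + b) / norm_w

d = distance_to_hyperplane([1.0, 1.0], -1.0, [1.0, 1.0])
print(d)  # |1 + 1 - 1| / sqrt(2) = 1 / sqrt(2) ≈ 0.7071
```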

Selection of Classifiers 5: Which classifier is the best? All have the same training error. How about generalization?

Unknown Samples 6: Classifier B divides the space more consistently (unbiased). (Figure: classifiers A and B on unseen samples.)

Margins 7: The data points lying on the margin boundaries are the support vectors. (Figure: margin with support vectors highlighted.)

Margins  The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point.  Intuitively, it is safer to choose a classifier with a larger margin.  Wider buffer zone for mistakes  The hyperplane is decided by only a few data points.  Support Vectors!  Others can be discarded!  Select the classifier with the maximum margin.  Linear Support Vector Machines (LSVM)  Works very well in practice.  How to specify the margin formally? 8

Margins 9: The planes w · x + b = +1 (edge of the "predict class = +1" zone) and w · x + b = −1 (edge of the "predict class = −1" zone) run parallel to the boundary w · x + b = 0; the margin width between them, measured between support vectors x⁺ and x⁻, is M = 2 / ‖w‖.

Objective Function 10: Correctly classify all data points: yᵢ(w · xᵢ + b) ≥ 1 for all i. Maximizing the margin 2 / ‖w‖ is equivalent to minimizing (1/2)‖w‖². This is a quadratic optimization problem: minimize (1/2)‖w‖² subject to yᵢ(w · xᵢ + b) ≥ 1.

Lagrange Multipliers 11: Introducing multipliers αᵢ ≥ 0 gives the dual problem: maximize Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢαⱼ yᵢyⱼ (xᵢ · xⱼ) subject to αᵢ ≥ 0 and Σᵢ αᵢyᵢ = 0. A quadratic problem again!

Solutions of w & b 12: w = Σᵢ αᵢyᵢxᵢ, and b = yₖ − w · xₖ for any support vector xₖ (one with αₖ > 0). The decision function f(x) = sign(Σᵢ αᵢyᵢ (xᵢ · x) + b) depends on the data only through inner products.

An Example 13: Two training points in the (x₁, x₂) plane: (1, 1, +1) and (0, 0, −1).
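For these two points the optimization can be solved by hand: both points are support vectors, and enforcing y(w · x + b) = 1 on each while minimizing ‖w‖ gives w = (1, 1) and b = −1. A sketch that verifies this solution:

```python
import math

# Worked example with the two training points from this slide:
#   x = (1, 1) with y = +1, and x = (0, 0) with y = -1.
# The hand-derived maximum-margin solution is w = (1, 1), b = -1.

points = [((1.0, 1.0), +1), ((0.0, 0.0), -1)]
w, b = (1.0, 1.0), -1.0

# Support vectors satisfy y * (w . x + b) = 1 exactly.
checks = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in points]
print(checks)  # -> [1.0, 1.0]

margin = 2.0 / math.sqrt(sum(wi * wi for wi in w))
print(margin)  # 2 / sqrt(2) = sqrt(2) ≈ 1.4142
```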

Soft Margin 14: Slack variables ξᵢ measure how far each point violates the margin. (Figure: the planes w · x + b = 1, 0, −1 with slacks ξ₁, ξ₂, ξ₇ marked.)

Soft Margin 15: Minimize (1/2)‖w‖² + C Σᵢ ξᵢ subject to yᵢ(w · xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0; the parameter C trades margin width against training errors.
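At the optimum the slack of each point is ξᵢ = max(0, 1 − yᵢ(w · xᵢ + b)). A small sketch of the three cases, with an illustrative (not trained) w and b:

```python
# Soft-margin slack: xi = max(0, 1 - y * (w . x + b)) measures how far a
# point falls inside the margin or beyond the boundary. w, b, and the
# points are illustrative.

def slack(w, b, x, y):
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * activation)

w, b = [1.0, 1.0], -1.0
print(slack(w, b, [2.0, 2.0], +1))     # safely outside the margin -> 0.0
print(slack(w, b, [0.75, 0.75], +1))   # inside the margin -> 0.5
print(slack(w, b, [0.0, 0.0], +1))     # misclassified -> 2.0
```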

Non-linear SVMs 16: A one-dimensional dataset that no threshold on x can separate becomes linearly separable after the mapping x → x². (Figure: the points on the x axis before and after the mapping.)
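This can be checked numerically. With illustrative 1-D data (negatives clustered around the origin, positives on both sides), no single threshold on x separates the classes, but a threshold on x² does:

```python
# Illustrative 1-D data in the spirit of the slide: the inner points are
# one class, the outer points the other. No threshold on x separates them
# (positives lie on both sides of the negatives), but in the feature x**2
# a single threshold (here 2.0) works.

xs     = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [ +1,   +1,   -1,  -1,  -1,  +1,  +1]

separable = all((x * x > 2.0) == (y == +1) for x, y in zip(xs, labels))
print(separable)  # -> True
```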

Feature Space 17: Φ: x → φ(x), e.g. mapping (x₁, x₂) to features including x₁² and x₂².

Feature Space 18: Φ: x → φ(x). (Figure: the mapped data in the feature space.)

Calculation of Φ(xᵢ) · Φ(xⱼ) 20

It turns out … 21

Kernel Trick 22: The linear classifier relies on dot products between vectors, xᵢ · xⱼ. If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xᵢ) · φ(xⱼ). A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ). Example: for x = [x₁, x₂], K(xᵢ, xⱼ) = (1 + xᵢ · xⱼ)².
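The example kernel can be verified directly: (1 + a · b)² equals an explicit dot product in the 6-D feature space φ(x) = (1, √2·x₁, √2·x₂, x₁², √2·x₁x₂, x₂²), so the kernel computes the high-dimensional inner product without ever forming φ:

```python
import math

# Verify K(a, b) = (1 + a . b)**2 for 2-D inputs against the explicit
# feature map phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, sqrt(2)x1x2, x2^2).

def kernel(a, b):
    return (1.0 + a[0] * b[0] + a[1] * b[1]) ** 2

def phi(x):
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = (1.0, 2.0), (3.0, 0.5)
print(kernel(a, b))          # (1 + 3 + 1)**2 = 25.0
print(dot(phi(a), phi(b)))   # same value, up to floating-point rounding
```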

Kernels 23: Commonly used kernels include the linear kernel K(xᵢ, xⱼ) = xᵢ · xⱼ, the polynomial kernel (1 + xᵢ · xⱼ)ᵖ, the Gaussian (RBF) kernel exp(−‖xᵢ − xⱼ‖² / 2σ²), and the sigmoid kernel tanh(κ xᵢ · xⱼ + θ).

String Kernel 24 Similarity between text strings: Car vs. Custard
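One of the simplest string kernels is the "spectrum" kernel, which counts shared substrings of length k; the kernel in the lecture may be a different one (e.g. based on gapped subsequences), so treat this purely as an illustration of measuring string similarity via a kernel:

```python
from collections import Counter

# Spectrum string kernel: K(s, t) = sum over k-length substrings of
# count_in_s * count_in_t. Shown here on the slide's "Car vs. Custard".

def spectrum_kernel(s, t, k=2):
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[sub] * ct[sub] for sub in cs)

print(spectrum_kernel("car", "custard"))  # only "ar" is shared -> 1
print(spectrum_kernel("car", "car"))      # "ca" and "ar" -> 2
```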

Solutions of w & b 25: With a kernel, the decision function becomes f(x) = sign(Σᵢ αᵢyᵢ K(xᵢ, x) + b); w itself need never be computed explicitly.

Decision Boundaries 26

More Maths … 27

SVM Roadmap 28: Linear Classifier → Maximum Margin → Linear SVM. Noise → Soft Margin. Nonlinear Problem → replace a · b with Φ(a) · Φ(b) (high computational cost) → Kernel Trick: K(a, b) = Φ(a) · Φ(b).

Reading Materials 29:
- Text Book: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
- Online Resources:
  - http://www.kernel-machines.org/
  - http://www.support-vector-machines.org/
  - http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
  - http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- A list of papers uploaded to the web learning portal
- Wikipedia & Google

Review 30:
- What is the definition of margin in a linear classifier?
- Why do we want to maximize the margin?
- What is the mathematical expression of the margin?
- How do we solve the objective function in SVM?
- What are support vectors?
- What is soft margin?
- How does SVM solve nonlinear problems?
- What is the so-called "kernel trick"?
- What are the commonly used kernels?

Next Week's Class Talk 31:
- Volunteers are required for next week's class talk.
- Topic: SVM in Practice
- Hints:
  - Applications
  - Demos
  - Multi-Class Problems
  - Software (a very popular toolbox: LIBSVM)
  - Any other interesting topics beyond this lecture
- Length: 20 minutes plus question time