Classification IV — Lecturer: Dr. Bo Yuan
Overview: Support Vector Machines
Linear Classifier: The hyperplane w · x + b = 0 splits the space into the region w · x + b > 0 (one class) and the region w · x + b < 0 (the other class).
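A classifier of this form can be sketched in a few lines of Python; the weight vector and bias below are illustrative values, not taken from the slides:

```python
# Minimal linear classifier: label = sign(w . x + b).
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def classify(w, b, x):
    """Return +1 if w . x + b > 0, else -1."""
    return 1 if dot(w, x) + b > 0 else -1

# Illustrative hyperplane: x1 + x2 - 1 = 0
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [2.0, 2.0]))  # point on the positive side -> +1
print(classify(w, b, [0.0, 0.0]))  # point on the negative side -> -1
```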
Distance to Hyperplane: For a point x with projection x′ onto the hyperplane w · x + b = 0, the distance from x to the hyperplane is |w · x + b| / ‖w‖.
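The distance formula can be checked numerically; the point and hyperplane below are illustrative:

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0:
    |w . x + b| / ||w||."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(wx + b) / norm

# Point (3, 4) and the hyperplane x2 = 0 (w = (0, 1), b = 0): distance is 4.
print(distance_to_hyperplane([0.0, 1.0], 0.0, [3.0, 4.0]))  # 4.0
```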
Selection of Classifiers: Which classifier is the best? All have the same training error. How about generalization?
Unknown Samples: Of the two classifiers A and B, classifier B divides the space more consistently (unbiased).
Margins: The data points closest to the decision boundary are the support vectors.
Margins: The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. Intuitively, it is safer to choose a classifier with a larger margin: it leaves a wider buffer zone for mistakes. The hyperplane is decided by only a few data points, the support vectors; the others can be discarded. Selecting the classifier with the maximum margin gives the Linear Support Vector Machine (LSVM), which works very well in practice. How do we specify the margin formally?
Margins: The decision boundary w · x + b = 0 sits midway between the hyperplane w · x + b = 1 (bounding the "Predict Class = +1" zone) and the hyperplane w · x + b = -1 (bounding the "Predict Class = -1" zone). For support vectors x+ and x- on these two hyperplanes, the margin width is M = 2 / ‖w‖.
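The margin width M follows from the two bounding hyperplanes; a short standard derivation, not spelled out on the slide:

```latex
Let $x^+$ and $x^-$ be support vectors with
$w \cdot x^+ + b = 1$ and $w \cdot x^- + b = -1$.
Subtracting gives $w \cdot (x^+ - x^-) = 2$, and projecting
$x^+ - x^-$ onto the unit normal $w / \|w\|$ yields
\[
  M = \frac{w \cdot (x^+ - x^-)}{\|w\|} = \frac{2}{\|w\|}.
\]
```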
Objective Function: Correctly classify all data points: yi (w · xi + b) ≥ 1. Maximizing the margin 2/‖w‖ then becomes a quadratic optimization problem: minimize (1/2)‖w‖², subject to yi (w · xi + b) ≥ 1 for all i.
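In practice this QP is handed to a solver. As a rough illustration only, a Pegasos-style subgradient method on the soft-margin hinge loss finds a similar separator; the data, regularization constant, and iteration count below are all illustrative, and this is an approximation, not the exact QP on the slide:

```python
def train_svm_subgradient(points, labels, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the hinge loss
    (a soft-margin approximation, not the exact hard-margin QP)."""
    dim = len(points[0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrink, then a hinge-loss step if the
            # point is inside the margin.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

# Illustrative linearly separable data (symmetric, so no bias term needed).
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
Y = [1, 1, -1, -1]
w = train_svm_subgradient(X, Y)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1 for x in X]
print(preds)  # all four training points classified correctly
```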
Lagrange Multipliers: Introducing multipliers αi ≥ 0 gives the dual problem: maximize Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj), subject to αi ≥ 0 and Σi αi yi = 0. A quadratic problem again!
Feature Space: Φ: x → φ(x). For example, mapping (x1, x2) to (x1², x2²) can turn a boundary that is nonlinear in the input space into a linear one in the feature space.
Quadratic Basis Functions: φ(x) collects a constant term (1), linear terms (√2 x1, …, √2 xm), pure quadratic terms (x1², …, xm²), and quadratic cross-terms (√2 x1x2, …, √2 xm−1xm). The number of terms is (m + 1)(m + 2)/2 ≈ m²/2.
Calculation of Φ(xi) · Φ(xj): Computing this dot product explicitly in the quadratic feature space costs O(m²) operations. It turns out that Φ(xi) · Φ(xj) = (1 + xi · xj)², which can be evaluated in only O(m) operations.
Kernel Trick: The linear classifier relies on dot products between vectors, xi · xj. If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(xi) · φ(xj). A kernel function is a function that corresponds to an inner product in some expanded feature space: K(xi, xj) = φ(xi) · φ(xj). Example: x = [x1, x2]; K(xi, xj) = (1 + xi · xj)².
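The identity K(xi, xj) = (1 + xi · xj)² = φ(xi) · φ(xj) can be checked numerically for the 2-D quadratic map (the √2 factors on the linear and cross terms are what make it exact); the test vectors are illustrative:

```python
import math

def phi(x):
    """Explicit quadratic feature map for x = [x1, x2]:
    (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def kernel(a, b):
    """Quadratic kernel: K(a, b) = (1 + a . b)^2."""
    return (1.0 + sum(ai * bi for ai, bi in zip(a, b))) ** 2

a, b = [1.0, 2.0], [3.0, 0.5]
explicit = sum(u * v for u, v in zip(phi(a), phi(b)))
print(explicit, kernel(a, b))  # both equal (1 + a . b)^2 = 25.0
```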
String Kernel: Similarity between text strings, e.g. "car" vs. "custard".
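The slide does not reproduce the kernel's definition. As one concrete and simple instance, a k-spectrum kernel counts shared k-grams between two strings; this is an assumption-level sketch, not necessarily the string kernel used in the lecture:

```python
def kgram_counts(s, k):
    """Count the occurrences of each length-k substring of s."""
    counts = {}
    for i in range(len(s) - k + 1):
        g = s[i:i + k]
        counts[g] = counts.get(g, 0) + 1
    return counts

def spectrum_kernel(s, t, k):
    """K(s, t) = sum over k-grams u of count_s(u) * count_t(u)."""
    cs, ct = kgram_counts(s, k), kgram_counts(t, k)
    return sum(n * ct.get(g, 0) for g, n in cs.items())

print(spectrum_kernel("car", "custard", 1))  # shared letters c, a, r -> 3
print(spectrum_kernel("car", "custard", 2))  # shared bigram "ar" -> 1
```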
Solutions of w & b: w = Σi αi yi xi, and b = yk − w · xk for any support vector xk (i.e., any k with αk > 0).
Decision Boundaries: f(x) = sign(Σi αi yi K(xi, x) + b); only the support vectors (αi > 0) contribute to the sum.
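With the dual multipliers in hand, classification needs only kernel evaluations against the support vectors. For a tiny 1-D problem with support vectors x = +1 (label +1) and x = -1 (label -1), the hard-margin solution worked out by hand is α = (0.5, 0.5) and b = 0, so f(x) = sign(x); the sketch below plugs those values in:

```python
def decision(alphas, labels, svs, b, kernel, x):
    """f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b)."""
    s = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, labels, svs)) + b
    return 1 if s > 0 else -1

def linear(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# 1-D toy problem: support vectors +1 and -1, alpha = (0.5, 0.5), b = 0.
svs, labels, alphas, b = [[1.0], [-1.0]], [1, -1], [0.5, 0.5], 0.0
print(decision(alphas, labels, svs, b, linear, [2.5]))   # sign(2.5)  -> +1
print(decision(alphas, labels, svs, b, linear, [-0.3]))  # sign(-0.3) -> -1
```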
More Maths …
SVM Roadmap: Linear Classifier → Maximum Margin → Linear SVM. Noise → Soft Margin. Nonlinear Problem → replace a · b with Φ(a) · Φ(b). High Computational Cost → Kernel Trick, K(a, b) = Φ(a) · Φ(b).
Reading Materials: Text Book: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000. Online Resources: a list of papers uploaded to the web learning portal; Wikipedia & Google.
Review: What is the definition of margin in a linear classifier? Why do we want to maximize the margin? What is the mathematical expression of the margin? How do we solve the objective function in SVM? What are support vectors? What is soft margin? How does SVM solve nonlinear problems? What is the so-called "kernel trick"? What are the commonly used kernels?
Next Week’s Class Talk: Volunteers are required for next week’s class talk. Topic: SVM in Practice. Hints: applications; demos; multi-class problems; software (a very popular toolbox: LIBSVM); any other interesting topics beyond this lecture. Length: 20 minutes plus question time.