Presentation on theme: "Recap Finds the boundary with “maximum margin”"— Presentation transcript:

1 Recap
- Finds the boundary with “maximum margin”
- Uses “slack variables” to deal with outliers
- Uses “kernels”, and the “kernel trick”, to solve nonlinear problems.

2 SVM error function = hinge loss + 1/margin
…where the hinge loss is summed over the datapoints, and the second term is proportional to the inverse of the margin, 1/m.
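
A rough sketch (not from the slides) of how this error could be computed, assuming NumPy and writing the margin penalty in its usual ||w||² form (since the margin is 1/||w||); the datapoints and weights are made up:

    import numpy as np

    def svm_objective(w, b, X, t, C=1.0):
        # Soft-margin SVM error: C * (sum of hinge losses) + ||w||^2.
        # The ||w||^2 term plays the role of the "1/margin" penalty,
        # since the margin is 1/||w||.  X is (N, D); t is in {-1, +1}.
        scores = X @ w + b                         # model output for each datapoint
        hinge = np.maximum(0.0, 1.0 - t * scores)  # per-datapoint hinge loss
        return C * hinge.sum() + w @ w

    # Tiny illustrative numbers (not from the slides)
    X = np.array([[2.0, 1.0], [0.5, 0.2], [-1.0, -2.0]])
    t = np.array([+1, +1, -1])
    w = np.array([0.5, 0.5])
    print(svm_objective(w, 0.0, X, t))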

3 Slack variables (aka soft margins): each datapoint is allowed some “slack”, and the error function pays a penalty for using slack.
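
A small hedged example of the slack penalty in practice, using scikit-learn's SVC (its C parameter is the price of slack); the overlapping data is synthetic:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two overlapping classes, so some points must use slack
    X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    # Small C: slack is cheap (wide margin, many violations tolerated);
    # large C: slack is expensive (narrow margin, few violations).
    for C in [0.01, 1.0, 100.0]:
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C:>7}: {clf.n_support_.sum()} support vectors")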

4 What about much more complicated data?
- project into a high dimensional space (2D → 3D), and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear
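
A minimal sketch of this idea (not from the slides), assuming a quadratic feature map (x1, x2) → (x1², √2·x1·x2, x2²) and any off-the-shelf linear classifier; the circular-boundary data is synthetic:

    import numpy as np
    from sklearn.linear_model import LogisticRegression  # any linear model will do

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, (200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)  # circular boundary: not linear in 2D

    def phi(X):
        # Quadratic feature map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
        return np.column_stack([X[:, 0] ** 2,
                                np.sqrt(2) * X[:, 0] * X[:, 1],
                                X[:, 1] ** 2])

    acc_2d = LogisticRegression().fit(X, y).score(X, y)            # linear model in 2D
    acc_3d = LogisticRegression().fit(phi(X), y).score(phi(X), y)  # same model, 3D feature space
    print(acc_2d, acc_3d)  # the projected version separates the classes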

5 The Kernel Trick

6 Slight rearrangement of our model – still equivalent though.
Remember matrix notation – this is a “dot product”

7 Project into higher dimensional space… (figure: 2D axes x1, x2 become 3D axes x1, x2, x3) …our new feature space. BUT WHERE DO WE GET THIS FROM!?

8 The Representer Theorem
(Kimeldorf and Wahba, 1978) For a linear model, the optimal parameter vector is always a linear combination of the training examples: w = Σ_n α_n x_n.
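
A quick hedged check of this with scikit-learn: for a linear-kernel SVC, the combination coefficients live in dual_coef_ (non-zero only for support vectors), and combining the training examples with them recovers the weight vector:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=60, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # w = sum_n alpha_n x_n : the (signed) coefficients are in dual_coef_,
    # and only the support vectors get non-zero weight.
    w_from_examples = clf.dual_coef_ @ clf.support_vectors_
    print(np.allclose(w_from_examples, clf.coef_))  # True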

9 The Kernel Trick, PART 1
Substitute this into our model… giving Σ_n α_n (x_n^T x). Or, with our hypothetical high dimensional feature space: Σ_n α_n (φ(x_n)^T φ(x)).

10 The Kernel Trick, PART 2
Wouldn't it be nice if we didn't have to think up the feature mapping φ at all, and could just skip straight to the scalar value we need, φ(x_n)^T φ(x), directly…? …If we had such a kernel function, k(x_n, x) = φ(x_n)^T φ(x), …our model would look like this: Σ_n α_n k(x_n, x).
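
A sketch of what that kernelised model looks like in code, with a made-up kernel and made-up dual weights, just to show that only scalar kernel values are ever needed (none of these numbers come from the slides):

    import numpy as np

    def poly_kernel(a, b, d=2):
        # k(a, b) = (a . b)^d : the scalar value, computed in the original space
        return (a @ b) ** d

    def predict(x, X_train, alphas, bias=0.0, kernel=poly_kernel):
        # Kernelised model: f(x) = sum_n alpha_n * k(x_n, x) + bias
        return sum(a * kernel(x_n, x) for a, x_n in zip(alphas, X_train)) + bias

    # Hypothetical training points and dual weights, purely illustrative
    X_train = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
    alphas = np.array([0.7, -0.2, -0.5])
    print(predict(np.array([0.5, 0.5]), X_train, alphas))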

11 Kernels
For example… the polynomial kernel. When d=2, the implicit feature space is: … but we never actually calculate it!
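
A hedged numeric check of this, assuming the d=2 polynomial kernel is k(x, z) = (x·z)² in 2D (the slide's exact kernel definition is in the figure, so it may differ by a constant term); the kernel value matches a dot product in the implicit space (x1², √2·x1·x2, x2²) without ever building it:

    import numpy as np

    def phi(x):
        # The implicit feature space for (x . z)^2 in 2D
        return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])

    print((x @ z) ** 2)     # kernel value, computed in the original 2D space
    print(phi(x) @ phi(z))  # same number, via the explicit 3D feature space we never need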

12 - project into a high dimensional space (2D → 3D), and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear

13 Polynomial kernel, with d=2

14 The Polynomial Kernel
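
A small hedged example of the polynomial kernel in scikit-learn, on data where a degree-2 boundary is needed (the dataset and parameter values are illustrative, not from the slides):

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Concentric circles: a straight line cannot separate them, a d=2 polynomial boundary can
    X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

    linear = SVC(kernel="linear").fit(X, y).score(X, y)
    poly2 = SVC(kernel="poly", degree=2, coef0=1).fit(X, y).score(X, y)
    print(linear, poly2)  # the polynomial kernel fits the circular boundary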

15 The RBF (Gaussian) Kernel
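
A hedged sketch of the RBF (Gaussian) kernel and of fitting an RBF-kernel SVM with scikit-learn (the gamma value and data are illustrative):

    import numpy as np
    from sklearn.svm import SVC

    def rbf_kernel(a, b, gamma=1.0):
        # k(a, b) = exp(-gamma * ||a - b||^2): the Gaussian / RBF kernel
        return np.exp(-gamma * np.sum((a - b) ** 2))

    # scikit-learn's SVC uses this kernel by default; gamma controls how
    # localised the influence of each support vector is.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))
    y = (np.hypot(X[:, 0], X[:, 1]) < 1.0).astype(int)
    print(SVC(kernel="rbf", gamma=1.0).fit(X, y).score(X, y))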

16 Varying two things at once!
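
The “two things” here are presumably the slack penalty C and the kernel parameter (for the RBF kernel, gamma); a hedged sketch of varying them jointly with a scikit-learn grid search, on synthetic data:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

    # C and gamma both control boundary flexibility, so they must be tuned jointly
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]},
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)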

17 Summary of things…

18 SVMs versus Neural Networks
SVMs:
- Started from solid theory
- Theory led to many extensions (SVMs for text, images, graphs)
- Almost no parameter tuning
- Highly efficient to train
- Single optimum
- Highly resistant to overfitting
Neural Nets:
- Started from bio-inspired heuristics
- Ended up at theory equivalent to statistics ideas
- Extensions only by biological analogy
- Good performance = lots of parameter tuning
- Computationally intensive to train
- Suffers from local optima
- Prone to overfitting

19 SVMs, done. Tonight… read chapter 4 while it’s still fresh.
Remember, by next week – read chapter 5.

20 Examples of Parameters obeying the Representer Theorem
(Worked example in the slide figures: two labelled training points, p and n.)

21 (Figure only: the worked example, continued.)

22 (Figure only: the worked example, continued.)

23 We had before the equation for an x on the boundary, and we just worked out w in terms of the examples p and n, which gives us the expression for t. The w and t are both linear functions of the examples.
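
The actual points from the slide figures are not recoverable in this transcript, so here is a hypothetical two-point version of the same calculation, showing that both w and the threshold t come out as linear functions of the examples p and n:

    import numpy as np

    # Hypothetical positive and negative examples (not the points used on the slides)
    p = np.array([3.0, 2.0])
    n = np.array([1.0, 0.0])

    # With just these two points on the margin, the maximum-margin boundary is
    # the perpendicular bisector of the segment from n to p:
    w = p - n              # the weight vector: a linear combination of the examples
    t = w @ (p + n) / 2.0  # the threshold: also linear in the examples
    print(w, t)
    print(w @ p - t, w @ n - t)  # equal size, opposite sign: both points sit on the margin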

