Fun with Hyperplanes: Perceptrons, SVMs, and Friends




1 Fun with Hyperplanes: Perceptrons, SVMs, and Friends
Adapted from slides by Ryan Gabbard, CIS 391 – Introduction to Artificial Intelligence

2 Universal Machine Learning Diagram
Naïve Bayes classifiers are one example.

3 Generative v. Discriminative Models
Generative question: “How can we model the joint distribution of the classes and the features?” Discriminative question: “What features distinguish the classes from one another?” Why waste energy on modeling stuff we don’t care about? Let’s optimize the job we’re trying to do directly!

4 Example
Modeling what sort of bizarre distribution produced these training points is hard, but distinguishing the classes is a piece of cake! (Chart from MIT tech report #507, Tony Jebara.)

5 Linear Classification

6 Representing Lines
How do we represent a line? In general, a hyperplane is defined by the set of points x where w · x + b = 0. Why bother with this weird representation?
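
As a concrete illustration (not part of the original slides; the numbers are made up), the line y = 2x + 1 rewritten in hyperplane form is 2x - y + 1 = 0, i.e. w = (2, -1), b = 1. A minimal Python sketch:

```python
import numpy as np

# The line y = 2x + 1 in hyperplane form: 2x - y + 1 = 0
w = np.array([2.0, -1.0])
b = 1.0

def on_hyperplane(x):
    """A point lies on the hyperplane exactly when w . x + b == 0."""
    return np.isclose(w @ x + b, 0.0)

print(on_hyperplane(np.array([0.0, 1.0])))  # True: (0, 1) is on y = 2x + 1
print(on_hyperplane(np.array([1.0, 0.0])))  # False: (1, 0) is not
```

One payoff of this representation: it works unchanged in any number of dimensions, where “line” no longer makes sense but “hyperplane” does.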

7 Projections
Alternate intuition: recall that the dot product of two vectors is simply the product of their lengths and the cosine of the angle between them, a · b = |a| |b| cos θ.
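
A quick numeric check of that identity (illustrative only; the vectors are arbitrary):

```python
import numpy as np

a = np.array([3.0, 0.0])  # lies along the x-axis
b = np.array([1.0, 1.0])  # 45 degrees above the x-axis

algebraic = a @ b  # 3*1 + 0*1 = 3.0
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(np.pi / 4)

print(algebraic, geometric)  # both 3.0: a . b = |a| |b| cos(theta)
```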

8 Now classification is easy!
But... how do we learn this mysterious model vector?
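
Concretely, classifying means checking which side of the hyperplane a point falls on. A minimal sketch (the model vector here is the illustrative one from above, not a learned one):

```python
import numpy as np

def classify(w, b, x):
    """Return +1 or -1 depending on which side of w . x + b = 0 the point lies."""
    return 1 if w @ x + b > 0 else -1

w, b = np.array([2.0, -1.0]), 1.0
print(classify(w, b, np.array([5.0, 0.0])))  # +1: w . x + b = 11 > 0
print(classify(w, b, np.array([0.0, 5.0])))  # -1: w . x + b = -4 < 0
```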

9 Perceptron Learning Algorithm
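
The slide’s pseudocode did not survive in this transcript; what follows is the standard perceptron training loop, sketched in Python (mistake-driven: on each misclassified example, nudge the model vector toward that example):

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Standard perceptron. X: (n_samples, n_features); y: labels in {-1, +1}."""
    X = np.hstack([X, np.ones((len(X), 1))])  # fold the bias b into w
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # mistake (wrong side, or on the boundary)
                w += yi * xi        # update: w <- w + y * x
    return w
```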

10 Perceptron Update Example I

11 Perceptron Update Example II

12 Properties of the Simple Perceptron
You can prove that if it’s possible to separate the data with a hyperplane (i.e., if the data is linearly separable), then the algorithm will converge to a separating hyperplane. But what if it isn’t separable? Then the perceptron is very unstable and bounces all over the place.

13 Voted Perceptron
Works just like a regular perceptron, except you keep track of all the intermediate models you created. When you want to classify something, you let each of the (many, many) models vote on the answer and take the majority.
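
A sketch of that idea, where each intermediate model gets one vote per training example it survived (that weighting scheme follows Freund and Schapire’s voted perceptron; details beyond the slide are assumptions):

```python
import numpy as np

def voted_perceptron_train(X, y, epochs=10):
    """Like the regular perceptron, but keep every intermediate model."""
    X = np.hstack([X, np.ones((len(X), 1))])  # fold in the bias
    w, count, models = np.zeros(X.shape[1]), 1, []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                models.append((w.copy(), count))  # retire this model with its votes
                w, count = w + yi * xi, 1
            else:
                count += 1  # this model survived one more example
    models.append((w, count))
    return models

def voted_perceptron_predict(models, x):
    """Each stored model casts `count` votes; return the majority label."""
    x = np.append(x, 1.0)
    votes = sum(c * np.sign(w @ x) for w, c in models)
    return 1 if votes > 0 else -1
```

The run-time cost is visible here: prediction loops over many stored models, which is why the averaged variant below is attractive.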

14 Properties of Voted Perceptron
Simple! Much better generalization performance than the regular perceptron (almost as good as SVMs, covered later). Can use the ‘kernel trick’ (also covered later). Training is as fast as the regular perceptron, but run time is slower.

15 Averaged Perceptron
Extremely simple! Return as your final model the average of all your intermediate models. This is an approximation to the voted perceptron that is nearly as fast to train and exactly as fast to run as a regular perceptron.
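
A sketch (same assumptions as the perceptron code above; we average over one intermediate model per training example seen):

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Perceptron that returns the average of all intermediate models."""
    X = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)  # running sum of every intermediate model
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
            w_sum += w  # accumulate after every example, mistake or not
    return w_sum / (epochs * len(X))  # one vector: runs as fast as a regular perceptron
```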

16 What’s wrong with these hyperplanes?

17 They’re unjustifiably biased!

18 A less biased choice

19 Margin
The margin is the distance from the hyperplane to the closest point in the training data. We tend to get better generalization to unseen data if we choose the separating hyperplane which maximizes the margin.
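
For a hyperplane w · x + b = 0, the distance from a point x to it is |w · x + b| / ‖w‖, so the margin is the minimum of that quantity over the training set. A small sketch (the data is made up):

```python
import numpy as np

def margin(w, b, X):
    """Distance from the hyperplane w . x + b = 0 to the closest point in X."""
    return (np.abs(X @ w + b) / np.linalg.norm(w)).min()

X = np.array([[0.0, 3.0], [4.0, 0.0], [5.0, 5.0]])
print(margin(np.array([1.0, 1.0]), -1.0, X))  # ~1.414: the point (0, 3) is closest
```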

20 Support Vector Machines
Another learning method, which explicitly calculates the maximum-margin hyperplane by solving a gigantic quadratic programming problem. Generally considered the highest-performing current machine learning technique, but it’s relatively slow and very complicated.
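
In practice nobody solves that quadratic program by hand. As an illustration only (assuming scikit-learn is available; the toy data is made up), a linear max-margin SVM fits in a few lines:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [2, 0.5], [3, 3], [4, 4], [5, 3.5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin
clf.fit(X, y)

print(clf.coef_, clf.intercept_)  # the max-margin hyperplane's w and b
print(clf.support_vectors_)       # the training points that pin down the margin
```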

21 Margin-Infused Relaxed Algorithm (MIRA)
Multiclass: each class has a prototype vector. Classify an instance by choosing the class whose prototype vector has the greatest dot product with the instance. During training, when updating, make the ‘smallest’ change (in a sense) to the prototype vectors which guarantees correct classification by a minimum margin. MIRA pays attention to the margin directly.
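
A sketch of one common form of the multiclass update (the exact step size τ is an assumption here; published MIRA variants differ in how it is chosen and capped):

```python
import numpy as np

def mira_update(prototypes, x, true_class):
    """On a mistake, make the smallest change (in squared-norm terms) to the two
    affected prototypes that makes the true class win by a margin of 1."""
    scores = prototypes @ x
    predicted = int(np.argmax(scores))  # class with the greatest dot product
    if predicted != true_class:
        tau = (scores[predicted] - scores[true_class] + 1.0) / (2.0 * (x @ x))
        prototypes[true_class] += tau * x  # pull the true prototype toward x
        prototypes[predicted] -= tau * x   # push the wrong prototype away
    return prototypes
```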

22 What if it isn’t separable?

23 Project it to someplace where it is!

24 Kernel Trick
If our data isn’t linearly separable, we can define a projection to map it into a much higher-dimensional feature space where it is. For algorithms where everything can be expressed in terms of dot products of instances (SVM, voted perceptron, MIRA), this can be done efficiently using something called the ‘kernel trick’.
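
As a tiny illustration, the quadratic kernel K(x, z) = (x · z)² gives exactly the dot product in an explicit squared-feature space, without ever constructing that space:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2 -- the same value, computed in the original space."""
    return (x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(phi(x) @ phi(z))         # 121.0, via the explicit projection
print(quadratic_kernel(x, z))  # 121.0, via the kernel trick
```

An algorithm that only ever touches instances through dot products can swap in K and, in effect, learn a hyperplane in the projected space at essentially no extra cost.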

