Slide 1: Introduction to Artificial Intelligence, CSCI 3202: The Perceptron Algorithm
Greg Grudic

Slide 2: Questions?

Slide 3: Binary Classification
A binary classifier is a mapping from a set of d inputs to a single output which can take on one of two values.
In the most general setting, specifying the output classes as -1 and +1 is arbitrary!
– Often done as a mathematical convenience.
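
The mapping itself appeared on the slide as an image; a standard way to write it, consistent with the -1/+1 convention above (the notation is an assumption), is:

    f: \mathbb{R}^d \rightarrow \{-1, +1\}, \qquad \hat{y} = f(x_1, \dots, x_d)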

Slide 4: A Binary Classifier
Given learning data, a classification model is constructed.

Slide 5: Linear Separating Hyper-Planes

Slide 6: Linear Separating Hyper-Planes
The model and its decision boundary (a reconstruction is sketched below).
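
The slide's equations were images; a linear classifier consistent with the surrounding slides (the hatted betas are estimated from data; this exact notation is an assumption) would be:

    \hat{y} = f(\mathbf{x}) = \operatorname{sign}\!\left(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_d x_d\right) = \operatorname{sign}\!\left(\hat{\beta}_0 + \hat{\boldsymbol{\beta}}^{T}\mathbf{x}\right)

with the decision boundary given by the hyperplane \hat{\beta}_0 + \hat{\boldsymbol{\beta}}^{T}\mathbf{x} = 0.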

Slide 7: Linear Separating Hyper-Planes
The model parameters are the betas. The hat on the betas means that they are estimated from the data.
Many different learning algorithms have been proposed for determining them.

Slide 8: Rosenblatt's Perceptron Learning Algorithm
Dates back to the 1950s and is the motivation behind neural networks.
The algorithm (a code sketch is given after this list):
– Start with a random hyperplane.
– Incrementally modify the hyperplane so that misclassified points move closer to the correct side of the boundary.
– Stop when all learning examples are correctly classified.
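
The slides describe the procedure only in words; a minimal Python sketch of the online update (function and variable names are illustrative, not from the lecture) might look like this:

    import numpy as np

    def perceptron_train(X, y, learning_rate=1.0, max_epochs=1000):
        # X is an (N, d) array of inputs; y holds labels in {-1, +1}.
        n_samples, n_features = X.shape
        beta = np.zeros(n_features)   # weight vector (the "hatted" betas)
        beta0 = 0.0                   # intercept
        for _ in range(max_epochs):
            errors = 0
            for xi, yi in zip(X, y):
                # a point is misclassified when y_i * (beta0 + x_i . beta) <= 0
                if yi * (beta0 + xi @ beta) <= 0:
                    beta += learning_rate * yi * xi
                    beta0 += learning_rate * yi
                    errors += 1
            if errors == 0:           # stop once every example is classified correctly
                break
        return beta, beta0

A call such as beta, beta0 = perceptron_train(X_train, y_train) returns one separating hyperplane if the data admit one; max_epochs is only a safeguard against the non-separable case discussed later.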

Slide 9: Rosenblatt's Perceptron Learning Algorithm
The algorithm is based on the following property:
– the signed distance of any point to the boundary.
Therefore, given the set of misclassified learning examples, we can push them closer to the boundary by minimizing the criterion below.
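
The formulas here were images on the original slides; the standard forms of the signed distance and of the resulting perceptron criterion (with M denoting the set of misclassified examples, a notational assumption) are:

    d(\mathbf{x}) = \frac{\beta_0 + \boldsymbol{\beta}^{T}\mathbf{x}}{\lVert \boldsymbol{\beta} \rVert}

    D(\boldsymbol{\beta}, \beta_0) = -\sum_{i \in M} y_i\left(\beta_0 + \boldsymbol{\beta}^{T}\mathbf{x}_i\right) \;\ge\; 0

Each misclassified point contributes a positive term, so driving D toward zero pushes those points toward (and eventually across) the boundary.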

Slide 10: Rosenblatt's Minimization Function
This is classic machine learning!
– First define a cost function in model parameter space.
– Then find an algorithm that modifies the parameters so that this cost function is minimized.
One such algorithm is gradient descent.

Slide 11: Gradient Descent

Slide 12: The Gradient Descent Algorithm
The learning rate defines the step size of each update (a generic form is sketched below).
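
The update equation was an image; a generic gradient-descent step on a cost D(θ) with learning rate ρ > 0 (the symbols are assumptions, not the slide's notation) is:

    \theta^{(k+1)} = \theta^{(k)} - \rho \, \nabla_{\theta} D\!\left(\theta^{(k)}\right)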

Slide 13: The Gradient Descent Algorithm for the Perceptron
Two versions of the perceptron algorithm (both updates are written out below):
– Update one misclassified point at a time (online).
– Update all misclassified points at once (batch).
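
Applying gradient descent to the perceptron criterion sketched under slide 9 gives the following updates (again a reconstruction, not the slide's own equations):

Batch (all misclassified points in M at once):

    \boldsymbol{\beta} \leftarrow \boldsymbol{\beta} + \rho \sum_{i \in M} y_i \mathbf{x}_i, \qquad \beta_0 \leftarrow \beta_0 + \rho \sum_{i \in M} y_i

Online (one misclassified point (\mathbf{x}_i, y_i) at a time):

    \boldsymbol{\beta} \leftarrow \boldsymbol{\beta} + \rho\, y_i \mathbf{x}_i, \qquad \beta_0 \leftarrow \beta_0 + \rho\, y_i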

Slide 14: The Learning Data
Matrix representation of N learning examples with d-dimensional inputs.
Training data (a conventional matrix form is given below).
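
One conventional way to write the training data in matrix form (the notation is an assumption):

    X = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nd} \end{pmatrix} \in \mathbb{R}^{N \times d}, \qquad \mathbf{y} = (y_1, \dots, y_N)^{T}, \quad y_i \in \{-1, +1\}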

Slide 15: The Good Theoretical Properties of the Perceptron Algorithm
If a solution exists, the algorithm will always converge in a finite number of steps!
Question: does a solution always exist?

Slide 16: Linearly Separable Data
Which of these datasets are separable by a linear boundary?
[Figure: two scatter plots, panels (a) and (b)]

Slide 17: Linearly Separable Data
Which of these datasets are separable by a linear boundary?
[Figure: the same two panels, (a) and (b); one is marked "Not Linearly Separable!"]

Slide 18: Bad Theoretical Properties of the Perceptron Algorithm
If the data is not linearly separable, the algorithm cycles forever!
– It cannot converge!
– This property "stopped" active research in this area between 1968 and 1984 (Perceptrons, Minsky and Papert, 1969).
Even when the data is separable, there are infinitely many solutions.
– Which solution is best?
When the data is linearly separable, the number of steps to converge can be very large (it depends on the size of the gap between the classes).

Slide 19: What about Nonlinear Data?
Data that is not linearly separable is called nonlinear data.
Nonlinear data can often be mapped into a nonlinear space where it is linearly separable.

Slide 20: Nonlinear Models
The linear model, the nonlinear (basis function) model, and examples of nonlinear basis functions (all three are sketched below).
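
The slide's equations were images; standard forms consistent with the rest of the lecture (the particular basis functions listed here are assumptions) are:

Linear model:

    f(\mathbf{x}) = \operatorname{sign}\!\left(\hat{\beta}_0 + \sum_{j=1}^{d} \hat{\beta}_j x_j\right)

Nonlinear (basis function) model:

    f(\mathbf{x}) = \operatorname{sign}\!\left(\hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j \, \phi_j(\mathbf{x})\right)

Example nonlinear basis functions: \phi(\mathbf{x}) = x_1 x_2, \quad x_1^2, \quad \exp\!\left(-\lVert \mathbf{x} - \mathbf{c} \rVert^2\right), \ldots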

Slide 21: Linear Separating Hyper-Planes in Nonlinear Basis Function Space

Slide 22: An Example

Slide 23: Kernels as Nonlinear Transformations
– Polynomial
– Sigmoid
– Gaussian or Radial Basis Function (RBF)
(Common forms of each are sketched below.)
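
The kernel formulas on the slide were images; the usual textbook forms (the parameters p, κ, θ and σ are generic, not taken from the slides) are:

Polynomial:

    K(\mathbf{x}, \mathbf{x}') = \left(1 + \mathbf{x}^{T}\mathbf{x}'\right)^{p}

Sigmoid:

    K(\mathbf{x}, \mathbf{x}') = \tanh\!\left(\kappa\, \mathbf{x}^{T}\mathbf{x}' + \theta\right)

Gaussian or RBF:

    K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}}\right)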

Slide 24: The Kernel Model
Training data as before (a sketch of the model is given below).
The number of basis functions equals the number of training examples!
– Unless some of the betas get set to zero…
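
A standard way to write the kernel model implied by this slide, with one coefficient per training example (a reconstruction, not the slide's own equation):

    f(\mathbf{x}) = \operatorname{sign}\!\left(\hat{\beta}_0 + \sum_{i=1}^{N} \hat{\beta}_i \, K(\mathbf{x}, \mathbf{x}_i)\right)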

Slide 25: Gram (Kernel) Matrix
Built from the training data (a short construction is sketched below).
Properties:
– Positive definite matrix
– Symmetric
– Positive on the diagonal
– N by N
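
A short Python sketch of building the N-by-N Gram matrix with entries K[i, j] = K(x_i, x_j); the RBF kernel and the names used here are illustrative assumptions:

    import numpy as np

    def rbf_kernel(x, z, sigma=1.0):
        # Gaussian (RBF) kernel between two d-dimensional points
        return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

    def gram_matrix(X, kernel=rbf_kernel):
        # Return the N x N Gram matrix for the rows of X
        n = X.shape[0]
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = kernel(X[i], X[j])
        return K  # symmetric, with positive entries on the diagonal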

Slide 26: Picking a Model Structure?
How do you pick the kernels?
– Kernel parameters: these are called learning parameters or hyperparameters.
Two approaches to choosing learning parameters:
– Bayesian: learning parameters must maximize the probability of correct classification on future data, based on prior biases.
– Frequentist: use the training data to learn the model parameters, and use validation data to pick the best hyperparameters (a minimal sketch follows this list).
More on learning parameter selection later.
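
A minimal sketch of the frequentist recipe, assuming scikit-learn's RBF-kernel SVC as the underlying classifier; the library, the split size, and the candidate values are assumptions, not part of the lecture:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def pick_gamma(X, y, candidates=(0.01, 0.1, 1.0, 10.0)):
        # Choose the RBF kernel width (gamma) that scores best on held-out validation data.
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
        best_gamma, best_score = None, -np.inf
        for gamma in candidates:
            model = SVC(kernel="rbf", gamma=gamma)
            model.fit(X_train, y_train)        # model parameters learned on training data
            score = model.score(X_val, y_val)  # hyperparameter judged on validation data
            if score > best_score:
                best_gamma, best_score = gamma, score
        return best_gamma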

Slide 27: Perceptron Algorithm Convergence
Two problems:
– No convergence when the data is not separable in basis function space.
– Infinitely many solutions when the data is separable.
Can we modify the algorithm to fix these problems?