PERCEPTRON LEARNING David Kauchak CS 451 – Fall 2013.

Admin: Assignment 1 solution available online. Assignment 2 is due Sunday at midnight. The Assignment 2 competition site will be set up later today.

Machine learning models: Some machine learning approaches make strong assumptions about the data. If the assumptions are true, this can often lead to better performance; if they aren’t, the model can fail miserably. Other approaches don’t make many assumptions about the data. This allows us to learn from more varied data, but they are more prone to overfitting and generally require more training data.

What is the data generating distribution?

Actual model

Model assumptions: If you don’t have strong assumptions about the model, it can take you longer to learn. Assume now that our model of the blue class is two circles.

What is the data generating distribution?

Actual model

What is the data generating distribution? Knowing the model beforehand can drastically improve learning and reduce the number of examples required.

What is the data generating distribution?

Make sure your assumption is correct, though!

Machine learning models: What model assumptions (if any) do k-NN and decision trees make about the data? Are there data sets that could never be learned correctly by either?

k-NN model (k = 1)

Decision tree model: axis-aligned splits/cuts of the data (regions labeled label 1, label 2, label 3).

Bias: The “bias” of a model is how strong the model assumptions are. Low-bias classifiers make minimal assumptions about the data (k-NN and decision trees are generally considered low-bias); high-bias classifiers make strong assumptions about the data.

Linear models: A strong, high-bias assumption is linear separability: in two dimensions the classes can be separated by a line; in higher dimensions we need hyperplanes. A linear model is a model that assumes the data is linearly separable.

Hyperplanes: A hyperplane is the generalization of a line/plane to a high-dimensional space. What defines a line? What defines a hyperplane?

Defining a line: Any pair of values (w1, w2) defines a line through the origin (drawn in the f1-f2 plane).

Defining a line: Any pair of values (w1, w2) defines a line through the origin: w1 f1 + w2 f2 = 0. We can also view this as the line perpendicular to the weight vector; here w = (1, 2).
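As a quick check of the perpendicularity claim (worked arithmetic, assuming the line-through-origin form above): any point on the line f1 + 2 f2 = 0, for example (2, -1), has dot product 1*2 + 2*(-1) = 0 with w = (1, 2), so the direction along the line is orthogonal to the weight vector.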

Classifying with a line: w = (1, 2). Mathematically, how can we classify points based on a line? Example points: (1, 1) on the BLUE side and (1, -1) on the RED side.

Classifying with a line: w = (1, 2). For (1, 1) and (1, -1), the sign of w1 f1 + w2 f2 indicates which side of the line the point falls on: positive for the BLUE side, negative for the RED side.
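Working through the arithmetic implied by w = (1, 2) (the numeric values are filled in here, not copied from the original figure): for (1, 1), w·f = 1*1 + 2*1 = 3 > 0, so it falls on the positive (BLUE) side; for (1, -1), w·f = 1*1 + 2*(-1) = -1 < 0, so it falls on the negative (RED) side.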

Defining a line: Any pair of values (w1, w2) defines a line through the origin. How do we move the line off of the origin?

Defining a line: Adding a constant (bias) term to the equation moves the line off of the origin; now it intersects at -1.
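As an illustration with assumed numbers (the exact constant from the original figure isn't in the text): keeping w = (1, 2) and adding the constant 2, the line f1 + 2 f2 + 2 = 0 no longer passes through the origin; it crosses the f2 axis at f2 = -1 and the f1 axis at f1 = -2.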

Linear models: A linear model in n-dimensional space (i.e. n features) is defined by n+1 weights: in two dimensions, a line; in three dimensions, a plane; in n dimensions, a hyperplane (where b = -a).
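Written out, these are the standard forms (a sketch of the usual notation, assumed here since the equations themselves aren't in the text):

    line (2 dimensions):       0 = b + w1*f1 + w2*f2
    plane (3 dimensions):      0 = b + w1*f1 + w2*f2 + w3*f3
    hyperplane (n dimensions): 0 = b + w1*f1 + w2*f2 + ... + wn*fn

Equivalently, the hyperplane a = w1*f1 + ... + wn*fn with everything moved to one side, which is where b = -a comes from.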

Classifying with a linear model: We can classify with a linear model by checking the sign: the classifier takes the features f1, f2, …, fn and outputs “positive example” or “negative example” depending on the sign.
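A minimal sketch of this classifier in code (an assumed implementation, not from the slides; the function and variable names are illustrative):

    # Classify by the sign of b + sum_i w_i * f_i: positive example if the
    # score is greater than zero, negative example otherwise.
    def predict(w, b, f):
        score = b + sum(wi * fi for wi, fi in zip(w, f))
        return "positive" if score > 0 else "negative"

    # Example with the line from the earlier slides: w = (1, 2), no offset.
    print(predict([1, 2], 0, [1, 1]))    # score  3 -> positive
    print(predict([1, 2], 0, [1, -1]))   # score -1 -> negative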

Learning a linear model: Geometrically, we know what a linear model represents, and given a linear model (i.e. a set of weights and b) we can classify examples. Given training data (data with labels), how do we learn a linear model?

Positive or negative? NEGATIVE

Positive or negative? NEGATIVE

Positive or negative? POSITIVE

Positive or negative? NEGATIVE

Positive or negative? POSITIVE

Positive or negative? POSITIVE

Positive or negative? NEGATIVE

Positive or negative? POSITIVE

A method to the madness: blue = positive, yellow triangles = positive, all others negative. How is this learning setup different from the learning we’ve done before? When might this arise?

Online learning algorithm: we only get to see one labeled example at a time! The algorithm learns from the labeled data incrementally, updating the model after each example.
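A sketch of the online protocol (hypothetical helper names; `update` stands in for whatever learning rule we choose, such as the perceptron-style update developed next):

    # Online learning: labeled examples arrive one at a time, and the model
    # is adjusted after each example rather than after seeing the whole set.
    def online_learn(stream, w, b, update):
        for f, label in stream:            # one (features, label) pair at a time
            w, b = update(w, b, f, label)  # update may leave w, b unchanged
        return w, b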

Learning a linear classifier: w = (1, 0). What does this model currently say?

Learning a linear classifier: with w = (1, 0), everything with f1 > 0 is classified POSITIVE and everything with f1 < 0 is classified NEGATIVE.

Learning a linear classifier: w = (1, 0); new example (-1, 1). Is our current guess right or wrong?

Learning a linear classifier: w = (1, 0); example (-1, 1). The model predicts negative, which is wrong. How should we update the model?

Learning a linear classifier: w = (1, 0); example (-1, 1). We should move the model in the direction that corrects this mistake.

A closer look at why we got it wrong: the prediction is w1*f1 + w2*f2 = (1)(-1) + (0)(1) = -1, but we’d like this value to be positive since the example (-1, 1) is positive. Which of the weights contributed to the mistake?

A closer look at why we got it wrong: for the positive example (-1, 1), w1 contributed in the wrong direction (it multiplies the negative feature f1 = -1), while w2 could have contributed (f2 = 1 is a positive feature) but didn’t, since w2 = 0. How should we change the weights?

A closer look at why we got it wrong: for the positive example (-1, 1), decrease w1 (1 -> 0) and increase w2 (0 -> 1).
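This change is exactly what the standard perceptron update w_i <- w_i + f_i * label produces (stated here for reference; the slides reach it by the informal argument above): with label = +1 and f = (-1, 1), w1 = 1 + (-1) = 0 and w2 = 0 + 1 = 1, so w moves from (1, 0) to (0, 1).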

Learning a linear classifier: updated w = (0, 1); example (-1, 1). Graphically, this also makes sense!

Learning a linear classifier: w = (0, 1); new example (1, -1). Is our current guess right or wrong?

Learning a linear classifier: w = (0, 1); example (1, -1). The model predicts negative, which is correct. How should we update the model?

Learning a linear classifier: w = (0, 1); example (1, -1). Already correct… don’t change it!

Learning a linear classifier: w = (0, 1); new example (-1, -1). Is our current guess right or wrong?

Learning a linear classifier: w = (0, 1); example (-1, -1). The model predicts negative, which is wrong. How should we update the model?

Learning a linear classifier: w = (0, 1); example (-1, -1). We should move the model in the direction that corrects this mistake.

A closer look at why we got it wrong: the prediction is w1*f1 + w2*f2 = (0)(-1) + (1)(-1) = -1, but we’d like this value to be positive since the example (-1, -1) is positive. Which of the weights contributed to the mistake?

A closer look at why we got it wrong: for the positive example (-1, -1), w1 didn’t contribute (w1 = 0) but could have, while w2 contributed in the wrong direction (it multiplies f2 = -1 with w2 = 1). How should we change the weights?

A closer look at why we got it wrong: for the positive example (-1, -1), decrease both weights: w1 goes 0 -> -1 and w2 goes 1 -> 0.
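Again this matches the update w_i <- w_i + f_i * label: with label = +1 and f = (-1, -1), w1 = 0 + (-1) = -1 and w2 = 1 + (-1) = 0, so w moves from (0, 1) to (-1, 0), the weight vector shown on the next slide.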

Learning a linear classifier: the data set and the weight vector w = (-1, 0).

    f1   f2   label
    -1   -1   positive
    -1    1   positive
     1    1   negative
     1   -1   negative
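A runnable sketch that puts the walkthrough together (assumptions not stated on the slides: we start from w = (1, 0) as in the worked example, use no bias term, treat a score of exactly 0 as negative, and sweep the examples in the order listed above rather than the order the slides used; the result, w = (-1, 0), matches the final slide):

    def predict(w, f):
        # Sign of the dot product decides the class: +1 (positive) or -1 (negative).
        score = w[0] * f[0] + w[1] * f[1]
        return 1 if score > 0 else -1

    def train(data, w, max_passes=10):
        for _ in range(max_passes):
            mistakes = 0
            for f, label in data:
                if predict(w, f) != label:        # only update on a mistake
                    w = (w[0] + label * f[0],     # nudge each weight toward
                         w[1] + label * f[1])     # correcting this example
                    mistakes += 1
            if mistakes == 0:                     # stop once everything is classified correctly
                break
        return w

    data = [((-1, -1), +1), ((-1, 1), +1), ((1, 1), -1), ((1, -1), -1)]
    print(train(data, (1, 0)))   # -> (-1, 0)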