Lecture 13 – Perceptrons. Machine Learning, March 16, 2010.

Last Time: Hidden Markov Models – sequential modeling represented in a graphical model

Today: Perceptrons, leading to Neural Networks – aka Multilayer Perceptron Networks – but more accurately, Multilayer Logistic Regression Networks

Review: Fitting Polynomial Functions. Fitting nonlinear 1-D functions. Polynomial: f(x) = θ_0 + θ_1 x + θ_2 x² + … + θ_D x^D. Risk: squared error, R(θ) = Σ_i (y_i − f(x_i))²

Review: Fitting Polynomial Functions. Order-D polynomial regression on a 1-D variable is the same as D-dimensional linear regression: extend the feature vector from a scalar x to (1, x, x², …, x^D). More generally, any fixed feature expansion works, as sketched below.
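A minimal sketch of that equivalence on toy 1-D data; the helper name and the data are assumptions for illustration:

    import numpy as np

    def polynomial_features(x, D):
        # Map each scalar x to the feature vector [1, x, x^2, ..., x^D].
        return np.vander(x, N=D + 1, increasing=True)

    # Toy 1-D data, assumed purely for illustration.
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 20)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(20)

    # Order-D polynomial regression = linear regression on the expanded features.
    Phi = polynomial_features(x, D=3)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    y_hat = Phi @ theta   # fitted values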

Neuron Inspires Regression: graphical representation of linear regression – the McCulloch-Pitts neuron – note: not a graphical model

Neuron Inspires Regression: edges multiply the signal (x_i) by a weight (θ_i); nodes sum their inputs. Equivalent to linear regression.
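A tiny sketch of that picture (the function name is an assumption, not lecture notation):

    import numpy as np

    def linear_neuron(x, theta):
        # Each edge multiplies its input x_i by its weight theta_i; the node sums them.
        return float(np.dot(theta, x))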

Introducing Basis Functions: graphical representation of feature extraction. Edges multiply the signal by a weight; nodes apply a function, φ_d.

Extension to More Features: graphical representation of feature extraction with several basis functions
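A small sketch of feature extraction through basis functions; the particular basis set is an assumption for illustration:

    import numpy as np

    def basis_neuron(x, theta, basis_functions):
        # Nodes apply the basis functions phi_d to the raw input;
        # edges weight the results, and the output node sums them.
        phi = np.array([phi_d(x) for phi_d in basis_functions])
        return float(np.dot(theta, phi))

    # An assumed polynomial basis, just for illustration.
    basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2]
    output = basis_neuron(0.5, np.array([0.1, -0.3, 2.0]), basis)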

Combining Function: how do we construct the neuron's output? Linear neuron: the output is the weighted sum itself, y = Σ_d θ_d φ_d(x).

Combining Function: sigmoid (squashing) function. Logistic neuron: y = 1 / (1 + exp(−θᵀx)). Classification using the same metaphor.
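A minimal sketch of the logistic neuron (function names are assumptions):

    import numpy as np

    def sigmoid(a):
        # Squashing function: maps any real activation into (0, 1).
        return 1.0 / (1.0 + np.exp(-a))

    def logistic_neuron(x, theta):
        # Same weighted sum as the linear neuron, followed by the sigmoid.
        return sigmoid(np.dot(theta, x))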

Logistic Neuron Optimization: minimizing R(θ) is more difficult. Bad news: there is no closed-form solution. Good news: it's convex.
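A hedged sketch of the gradient we descend instead, assuming the risk is the negative log-likelihood of a logistic neuron with labels in {0, 1} (the slide's exact R(θ) is not reproduced here):

    import numpy as np

    def logistic_risk_gradient(theta, X, y):
        # Gradient of the negative log-likelihood risk for a logistic neuron.
        # Rows of X are data points; labels y are in {0, 1}.
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted probabilities
        return X.T @ (p - y)                   # setting this to zero has no closed form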

Aside: Convex Regions. Convex: for any pair of points x_a and x_b within a region, every point x_c on the line segment between x_a and x_b is also in the region.

Aside: Convex Functions. A function is convex if the region above its graph (its epigraph) is a convex region – equivalently, for any pair of points x_a and x_b, the chord between (x_a, f(x_a)) and (x_b, f(x_b)) lies on or above the function.

Aside: Convex Functions. A convex function has a single (global) minimum – every local minimum is the global one. How does this help us? It (nearly) guarantees the optimality of gradient descent.
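One way to probe the chord condition numerically; this helper and its tolerance are illustrative assumptions, not part of the lecture:

    import numpy as np

    def appears_convex(f, x_a, x_b, num=100, tol=1e-9):
        # Numerically check the chord condition between x_a and x_b:
        # f(t*x_a + (1 - t)*x_b) <= t*f(x_a) + (1 - t)*f(x_b) for t in [0, 1].
        for t in np.linspace(0.0, 1.0, num):
            if f(t * x_a + (1 - t) * x_b) > t * f(x_a) + (1 - t) * f(x_b) + tol:
                return False
        return True

    print(appears_convex(lambda z: z ** 2, -2.0, 3.0))         # True: squared loss
    print(appears_convex(lambda z: float(z >= 0), -1.0, 1.0))  # False: 0/1 step "loss"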

Gradient Descent: the gradient ∇_θ R(θ) is defined (even though we can't set it to zero and solve directly); it points in the direction of fastest increase.

Gradient Descent: the gradient points in the direction of fastest increase; to minimize R, move in the opposite direction.

Gradient Descent: initialize θ randomly, then update with small steps, θ ← θ − η ∇_θ R(θ). With a suitable step size η this is (nearly) guaranteed to converge to the minimum. A sketch of the loop follows.
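A minimal sketch of the descent loop described above; the function name, step size, and stopping rule are assumptions for illustration:

    import numpy as np

    def gradient_descent(grad, theta0, eta=0.1, num_steps=1000, tol=1e-6):
        # Repeat theta <- theta - eta * grad(theta) from a (random) starting point.
        # A small eta converges slowly; too large an eta can overshoot and oscillate.
        theta = np.asarray(theta0, dtype=float)
        for _ in range(num_steps):
            g = grad(theta)
            if np.linalg.norm(g) < tol:   # gradient (near) zero: stop
                break
            theta = theta - eta * g
        return theta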

Gradient Descent: with the same random initialization and small-step updates, the iterates can oscillate (or diverge) if the step size η is too large.

Gradient Descent: can also stall if the gradient is ever 0 at a point that is not the minimum – hence the convergence guarantee is only "nearly".

Back to Neurons: the linear neuron (y = θᵀx) and the logistic neuron (y = σ(θᵀx)).

Classification: replace the squashing function with a hard threshold and count strictly classification error – the Perceptron.

Classification Error: only count errors when a classification is incorrect. The sigmoid, by contrast, assigns greater-than-zero error even to correct classifications.
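A quick numeric illustration of that claim, using a made-up activation and the log loss of a sigmoid output (an assumed choice of loss):

    import numpy as np

    # A correctly classified point (true label 1, activation +2) still carries
    # nonzero sigmoid-based loss, whereas the 0/1 classification error counts nothing.
    p = 1.0 / (1.0 + np.exp(-2.0))  # sigmoid output, about 0.88
    log_loss = -np.log(p)           # about 0.13 > 0 despite the correct prediction
    zero_one_error = 0              # classification error for this point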

Classification Error: only count errors when a classification is incorrect. Perceptrons use strictly classification error.

Perceptron Error: perceptrons use strictly classification error, but we can't do gradient descent on it – the 0/1 loss is a step function whose gradient is zero almost everywhere.

Perceptron Loss: since the classification loss can't be descended, define the Perceptron Loss – loss is calculated only for misclassified data points, e.g. L(θ) = Σ over misclassified i of −y_i θᵀx_i – giving a piecewise-linear risk rather than a step function.

Perceptron Loss: loss is calculated only for misclassified data points, which makes gradient descent straightforward (a sketch follows).
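A sketch of the perceptron loss and its (sub)gradient, assuming the common {-1, +1} label convention (the slide's notation is not reproduced):

    import numpy as np

    def perceptron_loss_and_grad(theta, X, y):
        # Perceptron loss: sum of -y_i * (theta . x_i) over misclassified points only.
        # It is piecewise linear in theta, so the (sub)gradient below is easy to descend.
        # Rows of X are data points; labels y are in {-1, +1}.
        margins = y * (X @ theta)
        mis = margins < 0                                   # misclassified points
        loss = -np.sum(margins[mis])
        grad = -(X[mis].T @ y[mis]) if np.any(mis) else np.zeros_like(theta)
        return loss, grad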

Perceptron vs. Logistic Regression: logistic regression has a hard time eliminating all errors (2 errors in the pictured example), while perceptrons often do better because errors carry increased importance (0 errors in the same example).

Stochastic Gradient Descent: instead of calculating the gradient over all points and then updating, update the parameters for each misclassified point during training. Update rule: θ ← θ + η y_i x_i.

Online Perceptron Training: update the weights for each data point. Iterate over the points x_i: if x_i is correctly classified, leave θ unchanged; else apply the update θ ← θ + η y_i x_i. Theorem: if the points x_i in X are linearly separable, then this process converges to a θ* with zero training error in a finite number of steps. A sketch of the algorithm follows.
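A minimal sketch of that training loop, assuming labels in {-1, +1} and a fixed learning rate η:

    import numpy as np

    def train_perceptron(X, y, eta=1.0, max_epochs=100):
        # Online perceptron training: sweep over the points, updating only on mistakes.
        # Rows of X are data points; labels y are in {-1, +1}.
        theta = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(theta, x_i) <= 0:   # x_i misclassified (or on the boundary)
                    theta += eta * y_i * x_i        # otherwise theta is left unchanged
                    mistakes += 1
            if mistakes == 0:                       # separable data: zero training error
                return theta
        return theta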

Linearly Separable: two classes of points are linearly separable iff there exists a line (a hyperplane, in higher dimensions) such that all points of one class fall on one side and all points of the other class fall on the other side.

Next Time: Multilayer Neural Networks