
Announcements

1. Textbook will be on reserve at the library.
2. Topic schedule change; modified reading assignment:
   This week: linear discrimination, evaluating classifiers.
   Extra reading: T. Fawcett, An introduction to ROC analysis, Sections 1-4 (linked from class web page).
3. No class Monday (MLK day).
4. Guest lecture Wednesday: Josh Hugues on multi-layer perceptrons.

Perceptrons as simple neural networks

[Diagram: inputs x1, ..., xn with weights w1, ..., wn, plus a constant +1 input with bias weight w0, all feeding a single unit that produces the output o.]
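The unit in the diagram thresholds the weighted sum of its inputs plus the bias. A minimal sketch of that computation, assuming ±1 outputs and a threshold at 0 (the function name and the example weights are illustrative, not taken from the course materials):

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold the weighted sum w0 + w1*x1 + ... + wn*xn.
    w = (w0, w1, ..., wn) includes the bias weight w0; x = (x1, ..., xn)."""
    s = w[0] + np.dot(w[1:], x)
    return 1 if s > 0 else -1

# Illustrative weights (w0, w1, w2) = (-0.5, 1.0, 1.0) on a 2-feature input.
print(perceptron_output(np.array([-0.5, 1.0, 1.0]), np.array([1.0, 0.0])))  # prints 1
```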

Geometry of the perceptron

[Figure: in 2D (Feature 1 vs. Feature 2), the perceptron's decision boundary is a hyperplane, here a line, separating the two classes.]

In-class exercise

Work with one neighbor on this:
(a) Find weights (w0, w1, w2) for a perceptron that separates “true” and “false” in x1 ∧ x2. Find the slope and intercept, and sketch the separation line defined by this discriminant, showing that it separates the points correctly.
(b) Do the same, but for x1 ∨ x2.
(c) What (if anything) might make one separation line better than another?
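One way to check a candidate answer is to rewrite the boundary w0 + w1*x1 + w2*x2 = 0 as a line in the (x1, x2) plane and test all four Boolean inputs. A sketch under that assumption; the weights below are one hypothetical candidate, not the intended answer:

```python
def boundary_slope_intercept(w0, w1, w2):
    """Rewrite w0 + w1*x1 + w2*x2 = 0 as x2 = slope*x1 + intercept (requires w2 != 0)."""
    return -w1 / w2, -w0 / w2

# One hypothetical candidate; with these weights only (1, 1) lands on the positive side.
w0, w1, w2 = -1.5, 1.0, 1.0
print(boundary_slope_intercept(w0, w1, w2))          # (-1.0, 1.5)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:      # check the four Boolean inputs
    print((x1, x2), "->", 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1)
```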

Training a perceptron

1. Start with random weights, w = (w1, w2, ..., wn).
2. Select a training example (x_k, t_k).
3. Run the perceptron with input x_k and weights w to obtain the output o.
4. Let η be the learning rate (a user-set parameter). Update each weight:
   w_i ← w_i + η (t_k − o) x_i, where x_i is the i-th component of x_k.
5. Go to 2.
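A compact sketch of this loop, assuming ±1 targets, a threshold at 0, and the common trick of prepending a constant +1 input so that the bias weight w0 is updated like any other weight (the function name and default settings are illustrative):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=10, seed=0):
    """Perceptron learning rule: w_i <- w_i + eta * (t_k - o) * x_i,
    with a +1 component prepended to each x so that w[0] is the bias weight."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])
    w = rng.uniform(-0.05, 0.05, Xb.shape[1])   # 1. start with small random weights
    for _ in range(epochs):
        for xk, tk in zip(Xb, t):               # 2. select a training example
            o = 1 if np.dot(w, xk) > 0 else -1  # 3. run the perceptron
            w += eta * (tk - o) * xk            # 4. apply the update rule
    return w                                    # 5. (loop back to step 2 each pass)

# Example: learn OR, with target -1 for (0, 0) and +1 otherwise.
w = train_perceptron([[0, 0], [0, 1], [1, 0], [1, 1]], [-1, 1, 1, 1])
```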

In-class exercise

S = {((0,0), −1), ((0,1), +1), ((1,1), +1)}
Let w = (w0, w1, w2) = (0.1, 0.1, −0.3).

1. Calculate the new perceptron weights after each training example is processed. Let η = ….
2. What is the accuracy on the training data after one epoch of training? Did the accuracy improve?

Perceptron learning rule: w_i ← w_i + η (t − o) x_i

[Diagram: perceptron with a +1 input weighted by w0 = 0.1, x1 weighted by 0.1, x2 weighted by −0.3, and output o.]
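For reference, the epoch can be traced in code. The learning rate above did not survive transcription, so η = 0.2 below is only an assumed value for illustration:

```python
import numpy as np

# Training set S and initial weights from the exercise; eta is an assumed value.
S = [((0, 0), -1), ((0, 1), 1), ((1, 1), 1)]
w = np.array([0.1, 0.1, -0.3])      # (w0, w1, w2)
eta = 0.2                           # assumption for illustration only

for (x1, x2), t in S:
    x = np.array([1.0, x1, x2])     # prepend the +1 bias input
    o = 1 if np.dot(w, x) > 0 else -1
    w = w + eta * (t - o) * x       # perceptron learning rule
    print((x1, x2), "t =", t, "o =", o, "new w =", w)

# Accuracy on S after this single epoch.
correct = sum((1 if np.dot(w, np.array([1.0, x1, x2])) > 0 else -1) == t
              for (x1, x2), t in S)
print("accuracy:", correct / len(S))
```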

Homework 1 summary

[Diagram: perceptron with 64 inputs x1, ..., x64, weights w1, ..., w64, bias weight w0 on a +1 input, and output o.]

Task: 8 vs. 0. The training and test data are feature vectors (x1, ..., x64), each labeled 8 or 0.
1. Train the perceptron on the 8 vs. 0 training data.
2. Calculate accuracy on the training data.
3. Calculate accuracy on the test data.
4. Give the confusion matrix (actual class vs. predicted class) for the test data.

Homework 1 summary

[Diagram: perceptron with 64 inputs x1, ..., x64, weights w1, ..., w64, bias weight w0 on a +1 input, and output o.]

Task: 8 vs. 1. The training and test data are feature vectors (x1, ..., x64), each labeled 8 or 1.
1. Train the perceptron on the 8 vs. 1 training data.
2. Calculate accuracy on the training data.
3. Calculate accuracy on the test data.
4. Give the confusion matrix (actual class vs. predicted class) for the test data.
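The evaluation steps for either task can be sketched as below, assuming the examples are stored as a NumPy array with 64 features per row and targets coded as +1 for “8” and −1 for the other digit (this coding, and the variable names, are assumptions rather than part of the assignment):

```python
import numpy as np

def evaluate(w, X, t):
    """Accuracy and 2x2 confusion matrix for a trained perceptron.
    Rows of the matrix = actual class (+1, then -1); columns = predicted class."""
    Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend the +1 bias input
    pred = np.where(Xb @ w > 0, 1, -1)
    accuracy = np.mean(pred == t)
    confusion = np.array([[np.sum((t == a) & (pred == p)) for p in (1, -1)]
                          for a in (1, -1)])
    return accuracy, confusion

# Usage with hypothetical arrays:
#   acc_train, _ = evaluate(w, X_train, t_train)
#   acc_test, cm = evaluate(w, X_test, t_test)
```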

Questions on HW

- What should the “threshold value” be?
- What should the target and output values look like?
- The assignment says we will train 10 separate perceptrons; shouldn’t this be 9?

1960s: Rosenblatt proved that the perceptron learning rule converges to correct weights in a finite number of steps, provided the training examples are linearly separable.

1969: Minsky and Papert proved that perceptrons cannot represent non-linearly separable target functions. However, they proved that any transformation can be carried out by adding a fully connected hidden layer.

XOR function

  x1   x2   x1 XOR x2
   0    0       0
   0    1       1
   1    0       1
   1    1       0

[Figure: the four points plotted in the (x1, x2) plane; no single line separates the positive examples from the negative ones.]
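As a concrete illustration of what a hidden layer buys: no single perceptron can compute XOR, but two threshold units feeding a third can, since XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). The weights below are hand-picked for illustration, not learned:

```python
def step(s):
    """Threshold unit with 0/1 output."""
    return 1 if s > 0 else 0

def xor_two_layer(x1, x2):
    """XOR via one hidden layer of two threshold units plus an output unit."""
    h_or = step(-0.5 + x1 + x2)        # fires unless both inputs are 0
    h_nand = step(1.5 - x1 - x2)       # fires unless both inputs are 1
    return step(-1.5 + h_or + h_nand)  # AND of the two hidden units

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", xor_two_layer(a, b))   # 0, 1, 1, 0
```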

Multi-layer perceptron example

[Figure: decision regions of a multilayer feedforward network.] The network was trained to recognize 1 of 10 vowel sounds occurring in the context “h_d” (e.g., “had”, “hid”). The network input consists of two parameters, F1 and F2, obtained from a spectral analysis of the sound. The 10 network outputs correspond to the 10 possible vowel sounds. (From T. M. Mitchell, Machine Learning.)

Good news: Adding a hidden layer allows more target functions to be represented.

Bad news: No algorithm for learning in multi-layered networks, and no convergence theorem!

Quote from Minsky and Papert’s book, Perceptrons (1969): “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”

Two major problems they saw were:
1. How can the learning algorithm apportion credit (or blame) to individual weights for incorrect classifications that depend on a (sometimes) large number of weights?
2. How can such a network learn useful higher-order features?

Good news: Successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation). These remain successful, in spite of the lack of a convergence theorem.
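A minimal sketch of that credit-apportionment idea: back-propagation for a one-hidden-layer sigmoid network on the XOR data, with squared-error loss and batch gradient descent. The layer sizes, learning rate, and epoch count are illustrative choices, and, consistent with the lack of a convergence theorem, whether the network reaches zero error depends on the random initialization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor_backprop(epochs=5000, eta=0.5, n_hidden=2, seed=0):
    """Backpropagation sketch: a hidden layer of sigmoid units and one sigmoid output."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(0.0, 1.0, (2, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                  # forward pass: hidden activations
        o = sigmoid(h @ W2 + b2)                  # forward pass: network outputs
        delta_o = (o - t) * o * (1 - o)           # error term at the output
        delta_h = (delta_o @ W2.T) * h * (1 - h)  # error apportioned to hidden units
        W2 -= eta * h.T @ delta_o                 # gradient-descent weight updates
        b2 -= eta * delta_o.sum(axis=0)
        W1 -= eta * X.T @ delta_h
        b1 -= eta * delta_h.sum(axis=0)
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)  # outputs on the four XOR inputs

print(train_xor_backprop())
```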