
1 Neural Networks
Reading: Chapter 3 in Nilsson text
Lab: Work on assignment 2 (due 19th in class)

2 The Brain -- A Paradox
- The brain consists of 10^11 neurons, each of which is connected to 10^4 neighbors.
- Each neuron is slow (about 1 millisecond to respond to a new stimulus).
- Yet the brain is astonishingly fast at perceptual tasks (e.g. recognizing a face).
- The brain can also store and retrieve prodigious amounts of multimedia information.
- The brain uses a very different computing architecture: parallel distributed processing.

3 A brief history of neural networks
- 1960-65: Perceptron model developed by Rosenblatt
- 1969: Perceptron model analyzed by Minsky and Papert
- 1975: Werbos's thesis at Harvard lays the roots for multi-layer neural networks
- 1985: Two-volume book on parallel distributed processing
- 1990: Neural networks enter mainstream applications (stock market, OCR, robotics)

4 ALVINN: A Neural Net that Drives a Truck (Pomerleau)

5 Object Recognition using Neural Networks (Georgios Theocharous et al.)

6 Network learns to predict direction and distance to trash can

7 Training Methods
- Single threshold-logic units:
  - Perceptron training rule
  - Widrow-Hoff method
- Gradient descent methods
- Arbitrary feedforward neural networks:
  - Sigmoid activation function (smooth nonlinearity)
  - Backpropagation algorithm

8 Threshold-logic Units
[Diagram: inputs x1, x2, ..., xn, each multiplied by a weight w1, w2, ..., wn, are summed and passed through a threshold (nonlinear) function to produce the output.]

9 TLUs define a hyperplane
Denote the input vector by X = (x0, x1, x2, ..., xn) (note the n+1 inputs, where x0 is always set to 1).
Denote the weight vector by W = (w0, w1, ..., wn) (note that w0 is the threshold weight).
Output = 0 if X*W < 0, else Output = 1.
For n = 2, the decision boundary X*W = 0 is the equation of a line in 2D:
w0 + w1*x1 + w2*x2 = 0, i.e. x2 = -(w1/w2)*x1 + (-w0/w2)
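As a concrete illustration, here is a minimal Python sketch of a TLU and its decision line (the weights and inputs below are made up for the example, not taken from the slides):

def tlu_output(w, x):
    # x[0] is always 1, and w[0] is the threshold weight
    s = sum(wj * xj for wj, xj in zip(w, x))   # weighted sum W * X
    return 1 if s >= 0 else 0                  # Output = 0 if the sum is negative, else 1

# Hypothetical 2D example: the decision line is w0 + w1*x1 + w2*x2 = 0
w = [-1.0, 1.0, 1.0]                 # i.e. the line x1 + x2 = 1
print(tlu_output(w, [1, 0.2, 0.3]))  # -> 0 (below the line)
print(tlu_output(w, [1, 0.8, 0.9]))  # -> 1 (above the line)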

10 TLU in 2D
[Plot: the decision line x2 = -(w1/w2)*x1 + (-w0/w2) in the (x1, x2) plane; its slope and intercept are determined by the weights.]

11 Classifier Problem Matlab demop1 perceptron demo (neural net toolbox)

12 Initial Classifier

13 Final Classifier

14 Error Plot

15 Gradient Descent Methods
Define the squared "error" to be the sum over examples e in D of (d[e] - o[e])^2, where D is the labeled "training" set of input/output pairs, d[e] is the desired output, and o[e] is the actual output.
Since the output of the TLU is a function of the weights W, we want to adjust the weights to minimize the squared error.
But how do we minimize the error? That is, will increasing a given weight increase the error, or reduce it?
KEY IDEA: Compute the gradient of the error with respect to W.
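A one-function Python sketch of this error measure (the training set D is assumed to be a list of (input vector, desired output) pairs; tlu_output is the sketch from slide 9):

def squared_error(w, D):
    # sum of squared differences between desired and actual outputs over the training set
    return sum((d - tlu_output(w, x)) ** 2 for x, d in D)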

16 Error Gradients
Each component of the gradient gives the slope of the error function with respect to one weight.
Given the gradient, we adjust each weight in the direction of the negative gradient (since we want to reduce the error): w[j] = w[j] - η * ∂E/∂w[j], where η is the learning rate.
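A minimal Python sketch of one such update step (the gradient vector grad is assumed to have been computed elsewhere, as on slides 18-20):

def gradient_step(w, grad, eta=0.1):
    # move every weight a small step against its error gradient
    return [wj - eta * gj for wj, gj in zip(w, grad)]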

17 Learning by Gradient Descent
[Analogy: a boombox with an antenna input, a sound output, and many unlabeled volume and tone controls. Which way should each control be turned to reduce the volume?]

18 Error gradient computation
By the chain rule, the gradient of the error with respect to each weight is ∂E/∂w[j] = -2 * Σ over examples e of (d[e] - o[e]) * ∂o[e]/∂w[j], so the gradient depends on how the unit's output changes with each weight.

19

20 Threshold Units without thresholding
Let the output of the unit simply be the weighted sum: o = Σ over j of w[j]*x[j].
We ignore the thresholding and train the weights so that an output of 1 is produced exactly, and a desired output of 0 is replaced by -1.
In this case we get that ∂o/∂w[j] = x[j].
The error gradient then becomes ∂E/∂w[j] = -2 * Σ over examples e of (d[e] - o[e]) * x[j,e] (the constant factor can be absorbed into the learning rate).
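A Python sketch of this gradient for the unthresholded (linear) unit, with the factor of 2 dropped since it can be absorbed into the learning rate (function names are illustrative):

def linear_output(w, x):
    # unthresholded unit: the output is just the weighted sum
    return sum(wj * xj for wj, xj in zip(w, x))

def error_gradient(w, D):
    # gradient of E = sum over e of (d[e] - o[e])^2 w.r.t. each weight (factor of 2 dropped)
    grad = [0.0] * len(w)
    for x, d in D:
        o = linear_output(w, x)
        for j in range(len(w)):
            grad[j] -= (d - o) * x[j]
    return grad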

21 Widrow-Hoff Training Method (Batch)
1. Set the weights to small random values (e.g. between -0.1 and 0.1).
2. Set the learning rate η to some small fixed positive value (e.g. 0.1).
3. Repeat // until training error is low enough
     set error = 0 and diff[j] = 0 for all j;
     for each training example e begin
       error = error + square(d[e] - o[e]);
       for j = 0 to N do diff[j] = diff[j] + (d[e] - o[e]) * x[j,e]; // x[0,e] is always 1
     end
     for j = 0 to N do w[j] = w[j] + η * diff[j];
   until error < desired_value.
4. Store the weight vector w in a file.
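A runnable Python version of this batch procedure (a sketch only; the data format -- a list of (x, d) pairs with x[0] = 1 -- and the stopping parameters are assumptions, not from the slides):

import random

def widrow_hoff_batch(D, n, eta=0.1, desired_error=0.01, max_passes=1000):
    # D: training examples (x, d), where x has n+1 entries and x[0] is always 1
    w = [random.uniform(-0.1, 0.1) for _ in range(n + 1)]   # step 1: small random weights
    for _ in range(max_passes):                             # step 3: repeat
        error = 0.0
        diff = [0.0] * (n + 1)
        for x, d in D:
            o = sum(wj * xj for wj, xj in zip(w, x))        # unthresholded output
            error += (d - o) ** 2
            for j in range(n + 1):
                diff[j] += (d - o) * x[j]
        for j in range(n + 1):
            w[j] += eta * diff[j]                           # one weight update per pass
        if error < desired_error:
            break
    return w                                                # step 4: save/return the weights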

22 Widrow-Hoff Training Method (Incremental)
1. Set the weights to small random values (e.g. between -0.1 and 0.1).
2. Set the learning rate η to some small fixed positive value (e.g. 0.1).
3. Repeat // until training error is low enough
     set error = 0; // total squared error
     for each training example e begin
       for j = 0 to N do w[j] = w[j] + η * (d[e] - o[e]) * x[j,e]; // x[0,e] is always 1
       error = error + square(d[e] - o[e]);
     end
   until error < desired_value.
4. Store the weight vector w in a file.
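And a runnable Python sketch of the incremental version (same assumed data format as the batch sketch above):

import random

def widrow_hoff_incremental(D, n, eta=0.1, desired_error=0.01, max_passes=1000):
    # weights are updated immediately after each training example
    w = [random.uniform(-0.1, 0.1) for _ in range(n + 1)]
    for _ in range(max_passes):
        error = 0.0
        for x, d in D:
            o = sum(wj * xj for wj, xj in zip(w, x))
            for j in range(n + 1):
                w[j] += eta * (d - o) * x[j]    # per-example update; x[0] is always 1
            error += (d - o) ** 2
        if error < desired_error:
            break
    return w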

23 Smooth Nonlinearity Functions
We would like smooth nonlinearities, which make it possible to compute gradients. Let s be the weighted sum of the inputs. Examples:
1. Sigmoid function: o = 1 / (1 + e^(-s))
2. Hyperbolic tangent function: o = tanh(s) = (e^s - e^(-s)) / (e^s + e^(-s))
What is the gradient of the output o w.r.t. the weighted sum s for these functions?
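A short Python sketch of both nonlinearities and their gradients (the derivatives answer the question on this slide and are used on the next one):

import math

def sigmoid(s):
    # output lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_grad(s):
    # do/ds = o * (1 - o), conveniently expressed via the output itself
    o = sigmoid(s)
    return o * (1.0 - o)

def tanh_grad(s):
    # d(tanh s)/ds = 1 - tanh(s)^2
    return 1.0 - math.tanh(s) ** 2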

24 Training with Sigmoidal Units
Note that for sigmoidal units ∂o/∂s = o * (1 - o) (show this!).
So the training rule is (incremental version): w[j] = w[j] + η * (d[e] - o[e]) * o[e] * (1 - o[e]) * x[j,e].
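A Python sketch of this incremental rule for a single sigmoid unit (a sketch under the same assumed data format as before; desired outputs d are assumed to lie in [0, 1]):

import math
import random

def train_sigmoid_unit(D, n, eta=0.5, passes=100):
    # incremental rule: w[j] += eta * (d - o) * o * (1 - o) * x[j]
    w = [random.uniform(-0.1, 0.1) for _ in range(n + 1)]
    for _ in range(passes):
        for x, d in D:
            s = sum(wj * xj for wj, xj in zip(w, x))
            o = 1.0 / (1.0 + math.exp(-s))      # sigmoid output
            for j in range(n + 1):
                w[j] += eta * (d - o) * o * (1.0 - o) * x[j]
    return w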

