Published by Mariah Hamilton. Modified over 8 years ago.
1
Learning: Neural Networks
Artificial Intelligence
CMSC 25000
February 3, 2005
2
Roadmap
Neural Networks
  – Motivation: Overcoming perceptron limitations
  – Motivation: ALVINN
  – Heuristic Training
      Backpropagation; Gradient descent
      Avoiding overfitting
      Avoiding local minima
  – Conclusion: Teaching a Net to talk
3
Perceptron Summary
Motivated by neuron activation
Simple training procedure
Guaranteed to converge
  – IF linearly separable
4
Neural Nets
Multi-layer perceptrons
  – Inputs: real-valued
  – Intermediate “hidden” nodes
  – Output(s): one (or more) discrete-valued
[Figure: network with inputs X1 to X4, a hidden layer, and outputs Y1 and Y2]
5
Neural Nets
Pro: More general than perceptrons
  – Not restricted to linear discriminants
  – Multiple outputs: one classification each
Con: No simple, guaranteed training procedure
  – Use greedy, hill-climbing procedure to train
  – “Gradient descent”, “Backpropagation”
6
Solving the XOR Problem
[Figure: network with inputs x1, x2; hidden nodes o1, o2; output y; weights w11, w21, w01, w12, w22, w02, w13, w23, w03]
Network Topology: 2 hidden nodes, 1 output
Desired behavior:
  x1  x2  o1  o2  y
  0   0   0   0   0
  1   0   0   1   1
  0   1   0   1   1
  1   1   1   1   0
Weights:
  w11 = w12 = 1
  w21 = w22 = 1
  w01 = 3/2; w02 = 1/2; w03 = 1/2
  w13 = -1; w23 = 1
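The weights on this slide can be checked against the desired-behavior table with a few lines of Python. This is a minimal sketch that assumes each w0j acts as the threshold of node j (the node fires when its weighted input sum exceeds w0j); the function names `step` and `xor_net` are illustrative and not from the slides.

```python
def step(total, threshold):
    """Hard threshold unit: outputs 1 when the weighted sum exceeds the threshold."""
    return 1 if total > threshold else 0


def xor_net(x1, x2):
    """Return (o1, o2, y) for the 2-2-1 network using the weights from the slide."""
    # Assumption: w0j is the threshold of node j; wij connects input i to node j.
    w11 = w21 = w12 = w22 = 1.0
    w01, w02, w03 = 1.5, 0.5, 0.5
    w13, w23 = -1.0, 1.0
    o1 = step(w11 * x1 + w21 * x2, w01)  # fires only when both inputs are 1 (AND)
    o2 = step(w12 * x1 + w22 * x2, w02)  # fires when at least one input is 1 (OR)
    y = step(w13 * o1 + w23 * o2, w03)   # OR but not AND, which is XOR
    return o1, o2, y


for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, *xor_net(x1, x2))
```

Under that threshold assumption the hidden nodes behave like AND and OR, and the output node combines them into XOR, which a single perceptron cannot represent.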
7
Neural Net Applications
Speech recognition
Handwriting recognition
NETtalk: Letter-to-sound rules
ALVINN: Autonomous driving
8
ALVINN
Driving as a neural network
Inputs:
  – Image pixel intensities
      I.e. lane lines
5 Hidden nodes
Outputs:
  – Steering actions
      E.g. turn left/right; how far
Training:
  – Observe human behavior: sample images, steering
9
Backpropagation
Greedy, hill-climbing procedure
  – Weights are parameters to change
  – Original hill-climb changes one parameter/step
      Slow
  – If smooth function, change all parameters/step
      Gradient descent
  – Backpropagation: Computes current output, works backward to correct error
10
Producing a Smooth Function
Key problem:
  – Pure step threshold is discontinuous
      Not differentiable
Solution:
  – Sigmoid (squashed ‘s’ function): Logistic fn
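To make the contrast concrete, here is a small sketch comparing a step threshold with the logistic sigmoid. The names `step` and `sigmoid` are illustrative, and placing the step at zero is an assumption made only for this comparison.

```python
import math


def step(z):
    """Pure step threshold: jumps from 0 to 1 at z = 0 (assumed threshold position)."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Logistic function: a smooth, differentiable 'squashed s' between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5.0, -1.0, -0.001, 0.001, 1.0, 5.0):
    print(f"z={z:7.3f}  step={step(z):.0f}  sigmoid={sigmoid(z):.4f}")
```

Near z = 0 the step output jumps by a full unit for an arbitrarily small change in z, while the sigmoid output changes only slightly, which is what makes a gradient available for training.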
11
Neural Net Training
Goal:
  – Determine how to change weights to get correct output
      Large change in weight to produce large reduction in error
Approach:
  Compute actual output: o
  Compare to desired output: d
  Determine effect of each weight w on error = d - o
  Adjust weights
12
Neural Net Example
[Figure: network with inputs x1, x2; hidden units with net inputs z1, z2 and outputs y1, y2; output unit with net input z3 and output y3; weights w11, w21, w01, w12, w22, w02, w13, w23, w03]
xi: ith sample input vector
w: weight vector
yi*: desired output for ith sample
Sum of squares error over training samples
Full expression of output in terms of input and weights
From 6.034 notes, Lozano-Perez
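The two quantities named on this slide, the network output as a function of inputs and weights and the sum of squares error, can be sketched as follows. The sketch assumes sigmoid units, assumes each w0j enters as a subtracted threshold, and assumes the conventional factor of 1/2 in the error; the dictionary layout and function names are illustrative.

```python
import math


def s(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w):
    """Return (y1, y2, y3) for input x = (x1, x2) and a dict of weights w."""
    x1, x2 = x
    z1 = w["w11"] * x1 + w["w21"] * x2 - w["w01"]  # net input of hidden unit 1
    z2 = w["w12"] * x1 + w["w22"] * x2 - w["w02"]  # net input of hidden unit 2
    y1, y2 = s(z1), s(z2)
    z3 = w["w13"] * y1 + w["w23"] * y2 - w["w03"]  # net input of the output unit
    return y1, y2, s(z3)


def sum_squares_error(samples, w):
    """Half the summed squared difference between desired and actual output."""
    return 0.5 * sum((desired - forward(x, w)[2]) ** 2 for x, desired in samples)


names = ["w01", "w11", "w21", "w02", "w12", "w22", "w03", "w13", "w23"]
zero_weights = {name: 0.0 for name in names}
xor_samples = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
print(sum_squares_error(xor_samples, zero_weights))
```

With all weights at zero every unit outputs 0.5, so each XOR sample contributes the same squared error; training then amounts to adjusting `w` to push this number down.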
13
Gradient Descent
Error: Sum of squares error of inputs with current weights
Compute rate of change of error wrt each weight
  – Which weights have greatest effect on error?
  – Effectively, partial derivatives of error wrt weights
      In turn, depend on other weights => chain rule
14
Gradient Descent
E = G(w)
  – Error as function of weights
Find rate of change of error
  – Follow steepest rate of change
  – Change weights s.t. error is minimized
[Figure: plot of error E against weight w, showing the curve G(w), its slope dG/dw, local minima, and two points w0 and w1]
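A one-dimensional sketch shows both the descent rule and the local-minimum issue. The error function G below is a made-up quartic chosen only because it has two minima; it does not come from the slides, and the step size and step count are likewise illustrative.

```python
def G(w):
    """Hypothetical error curve with a global minimum near w = -1.30
    and a shallower local minimum near w = 1.13."""
    return w**4 - 3 * w**2 + w


def dG_dw(w):
    """Derivative of G with respect to w."""
    return 4 * w**3 - 6 * w + 1


def descend(w, r=0.01, steps=500):
    """Repeatedly move w against the slope of G, scaled by the rate r."""
    for _ in range(steps):
        w -= r * dG_dw(w)
    return w


for start in (-2.0, 2.0):
    end = descend(start)
    print(f"start={start:5.1f}  end={end:7.4f}  G(end)={G(end):7.4f}")
```

Starting on the right-hand side of the curve, the procedure settles in the shallower minimum and never sees the deeper one, because each step only follows the local slope.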
15
Gradient of Error
[Figure: network with inputs x1, x2; hidden units with net inputs z1, z2 and outputs y1, y2; output unit with net input z3 and output y3; weights w11, w21, w01, w12, w22, w02, w13, w23, w03]
Note: Derivative of sigmoid: $\frac{ds(z_1)}{dz_1} = s(z_1)\,(1 - s(z_1))$
From 6.034 notes (MIT AI lecture notes), Lozano-Perez, 2000
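The sigmoid derivative identity on this slide is easy to confirm numerically. The sketch below compares s(z)(1 - s(z)) with a central finite difference; the helper names and the step size h are illustrative choices.

```python
import math


def s(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def ds_dz(z):
    """Closed-form derivative from the slide: s(z) * (1 - s(z))."""
    return s(z) * (1.0 - s(z))


def numeric_derivative(f, z, h=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)


for z in (-2.0, 0.0, 0.5, 3.0):
    print(f"z={z:5.1f}  closed form={ds_dz(z):.8f}  numeric={numeric_derivative(s, z):.8f}")
```

Because the derivative is expressed in terms of the unit's own output, backpropagation can reuse values already computed in the forward pass.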
16
From Effect to Update
Gradient computation:
  – How each weight contributes to performance
To train:
  – Need to determine how to CHANGE weight based on contribution to performance
  – Need to determine how MUCH change to make per iteration
      Rate parameter ‘r’
        – Large enough to learn quickly
        – Small enough to reach but not overshoot target values
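The trade-off in choosing the rate parameter can be seen on the simplest possible error curve. This sketch uses E(w) = w^2, which is an illustrative stand-in rather than anything from the slides; its gradient is 2w, so each update multiplies w by (1 - 2r).

```python
def descend(r, w=1.0, steps=50):
    """Gradient descent on E(w) = w**2 with rate r, starting from w."""
    for _ in range(steps):
        w -= r * 2.0 * w  # dE/dw = 2w
    return w


for r in (0.001, 0.1, 1.1):
    print(f"r={r:5.3f}  w after 50 steps={descend(r):.6g}")
```

A very small rate barely moves the weight in 50 steps, a moderate rate reaches the minimum at w = 0, and a rate above 1 overshoots by more than it corrects, so the weight grows instead of shrinking.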
17
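Backpropagation Procedure
Pick rate parameter ‘r’
Until performance is good enough,
  – Do forward computation to calculate output
  – Compute Beta in output node with $\beta_z = d_z - o_z$
  – Compute Beta in all other nodes with $\beta_j = \sum_k w_{j \to k}\, o_k (1 - o_k)\, \beta_k$
  – Compute change for all weights with $\Delta w_{i \to j} = r\, o_i\, o_j (1 - o_j)\, \beta_j$
[Figure: nodes i, j, k in successive layers]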
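The procedure can be sketched for a 2-2-1 sigmoid network. The update rules used here are an assumption based on the standard textbook form of this procedure: Beta at the output is d - o, Beta at a hidden node j is the sum over successors k of w(j,k) o_k (1 - o_k) Beta_k, and each weight from i to j changes by r o_i o_j (1 - o_j) Beta_j. Thresholds are modelled as weights on a constant input of -1, and all names (`INPUTS`, `weight_changes`, `train`) are illustrative.

```python
import math
import random

# Which sources feed each node; "b" is a constant -1 input that carries the threshold.
INPUTS = {"h1": ["b", "x1", "x2"], "h2": ["b", "x1", "x2"], "y": ["b", "h1", "h2"]}


def s(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w):
    """Forward computation: return the output o of every node for input x."""
    o = {"b": -1.0, "x1": x[0], "x2": x[1]}
    for node in ("h1", "h2", "y"):  # hidden nodes first, then the output node
        o[node] = s(sum(w[src, node] * o[src] for src in INPUTS[node]))
    return o


def weight_changes(x, d, w, r):
    """Compute Beta backward from the output, then the change for every weight."""
    o = forward(x, w)
    beta = {"y": d - o["y"]}  # Beta in the output node
    for h in ("h1", "h2"):    # Beta in the other nodes (one successor each here)
        beta[h] = w[h, "y"] * o["y"] * (1.0 - o["y"]) * beta["y"]
    return {(i, j): r * o[i] * o[j] * (1.0 - o[j]) * beta[j] for (i, j) in w}


def train(samples, w, r=0.5, tolerance=0.1, max_epochs=20000):
    """Update after each sample until every output is within tolerance of its target."""
    for epoch in range(max_epochs):
        if all(abs(d - forward(x, w)["y"]) < tolerance for x, d in samples):
            return epoch
        for x, d in samples:
            for key, change in weight_changes(x, d, w, r).items():
                w[key] += change
    return max_epochs


if __name__ == "__main__":
    random.seed(1)
    weights = {(src, node): random.uniform(-0.5, 0.5)
               for node in INPUTS for src in INPUTS[node]}
    xor = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 0.0)]
    print("epochs used:", train(xor, weights))
```

With these rules each weight change equals -r times the partial derivative of the half squared error for that sample, so the loop is gradient descent. Whether a given run reaches the tolerance depends on the random starting weights, since the search can stall in a local minimum.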
18
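Backprop Example
[Figure: network with inputs x1, x2; hidden units with net inputs z1, z2 and outputs y1, y2; output unit with net input z3 and output y3; weights w11, w21, w01, w12, w22, w02, w13, w23, w03]
Forward prop: Compute z_i and y_i given x_k, w_l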
19
Backpropagation Observations
Procedure is (relatively) efficient
  – All computations are local
      Use inputs and outputs of current node
What is “good enough”?
  – Rarely reach target (0 or 1) outputs
      Typically, train until within 0.1 of target
20
Neural Net Summary
Training:
  – Backpropagation procedure
      Gradient descent strategy (usual problems)
Prediction:
  – Compute outputs based on input vector & weights
Pros: Very general, fast prediction
Cons: Training can be VERY slow (1000s of epochs), overfitting
21
Training Strategies
Online training:
  – Update weights after each sample
Offline (batch training):
  – Compute error over all samples
      Then update weights
Online training is “noisy”
  – Sensitive to individual instances
  – However, may escape local minima
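The difference between the two schedules is only where the weight update happens. The sketch below uses a single linear unit with one weight (output = w * x), which is a simplification chosen for brevity and not the network from the slides; the sample data and rate are made up.

```python
def online_epoch(w, samples, r):
    """Online training: update the weight after every sample."""
    for x, d in samples:
        w += r * (d - w * x) * x
    return w


def batch_epoch(w, samples, r):
    """Offline (batch) training: sum the changes over all samples, then update once."""
    total = sum((d - w * x) * x for x, d in samples)
    return w + r * total


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by d = 2 * x
w_online = w_batch = 0.0
for epoch in range(50):
    w_online = online_epoch(w_online, samples, r=0.05)
    w_batch = batch_epoch(w_batch, samples, r=0.05)
print(w_online, w_batch)
```

In the online version each sample sees the weight already changed by the previous sample, so the path depends on sample order; the batch version takes one step per epoch along the summed gradient.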
22
Training Strategy
To avoid overfitting:
  – Split data into: training, validation, & test
      Also, avoid excess weights (fewer weights than # samples)
Initialize with small random weights
  – Small changes have noticeable effect
Use offline training
  – Until validation set minimum
Evaluate on test set
  – No more weight changes
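One way to express "train until the validation set minimum" is an early-stopping loop. This is a sketch under stated assumptions: the 60/20/20 split, the `patience` parameter, and the callables `train_epoch` and `validation_error` are all hypothetical and would be supplied by whatever network code is in use.

```python
import random


def split_data(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the samples and split into training, validation, and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def train_until_validation_minimum(train_epoch, validation_error, weights,
                                   patience=5, max_epochs=1000):
    """Keep the weights with the lowest validation error seen so far.

    train_epoch(weights) must return new weights without modifying its argument;
    training stops after `patience` epochs in a row with no improvement.
    """
    best_error, best_weights, since_best = validation_error(weights), weights, 0
    for _ in range(max_epochs):
        weights = train_epoch(weights)
        error = validation_error(weights)
        if error < best_error:
            best_error, best_weights, since_best = error, weights, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_weights, best_error
```

The test set is touched only once, after this loop returns, so its error is an estimate on data that influenced neither the weights nor the stopping point.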
23
Classification
Neural networks best for classification task
  – Single output -> Binary classifier
  – Multiple outputs -> Multiway classification
Applied successfully to learning pronunciation
  – Sigmoid pushes to binary classification
      Not good for regression
24
Neural Net Example
NETtalk: Letter-to-sound by net
Inputs:
  – Need context to pronounce
      7-letter window: predict sound of middle letter
      29 possible characters: alphabet + space + comma + period
  – 7*29 = 203 inputs
80 Hidden nodes
Output: Generate 60 phones
  – Nodes map to 26 units: 21 articulatory, 5 stress/sil
      Vector quantization of acoustic space
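The input size of 7*29 = 203 is what results from giving each of the 7 window positions its own group of 29 units, one per character. The sketch below assumes exactly that one-unit-per-character layout, and it pads word ends with spaces; both choices and all names are illustrative, not details confirmed by the slides.

```python
import string

ALPHABET = string.ascii_lowercase + " ,."  # 26 letters + space + comma + period = 29
WINDOW = 7


def encode_window(window):
    """Encode a 7-character window as 7 * 29 = 203 binary inputs."""
    if len(window) != WINDOW:
        raise ValueError("window must contain exactly 7 characters")
    vector = [0] * (WINDOW * len(ALPHABET))
    for position, ch in enumerate(window):
        vector[position * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vector


def windows(text):
    """Yield (window, middle_letter) pairs, padding both ends with spaces."""
    pad = " " * (WINDOW // 2)
    padded = pad + text.lower() + pad
    for i in range(len(text)):
        window = padded[i:i + WINDOW]
        yield window, window[WINDOW // 2]


for window, middle in windows("cat"):
    print(repr(window), middle, sum(encode_window(window)))
```

Each window switches on exactly 7 of the 203 inputs, and the network's task is to predict the sound of the middle letter given its three neighbors on each side.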
25
Neural Net Example: NETtalk
Learning to talk:
  – 5 iterations/1024 training words: bound/stress
  – 10 iterations: intelligible
  – 400 new test words: 80% correct
Not as good as DECtalk, but automatic
26
Neural Net Conclusions
Simulation based on neurons in brain
Perceptrons (single neuron)
  – Guaranteed to find linear discriminant
      IF one exists -> problem: XOR
Neural nets (Multi-layer perceptrons)
  – Very general
  – Backpropagation training procedure
      Gradient descent: local minima, overfitting issues