1
Artificial Neural Networks ML 4.6-4.9 Paul Scheible
2
BackPropagation Algorithm
3
Convergence to Local Minima
- Backpropagation performs well in many practical problems
- Local minima are less troubling than one might think
- Multi-dimensional problems are unlikely to have a local minimum in every dimension at once
- Any dimension without a local minimum provides an escape route
- Starting with small initial weights tends to avoid local minima
4
Convergence to Local Minima
- Still no known method for predicting when local minima will present a problem
- Use momentum (see the update rule sketched below)
- Use stochastic (incremental) gradient descent
- Train several networks on the same data from different random initial weights
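For reference, the momentum heuristic modifies the backpropagation weight update so that each step partially repeats the previous one; in the notation commonly used for backpropagation (η is the learning rate, α the momentum constant, δ_j the error term of unit j, and x_ji the i-th input to unit j):

    \Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1)

A momentum constant α near 1 lets the search carry through small local minima and move faster along flat regions of the error surface.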
5
Representational Power
- Boolean functions: every Boolean function can be represented by a network with a single hidden layer, though the number of hidden units may grow with the number of inputs (see the XOR sketch below)
- Continuous functions: every bounded continuous function can be approximated arbitrarily closely by a network with one hidden layer of sigmoid units
- Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with two hidden layers
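To make the Boolean-function claim concrete, here is a minimal sketch of a two-layer sigmoid network computing XOR with hand-chosen (not learned) weights; the specific weight values are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def xor_net(x1, x2):
        """Two-layer sigmoid network computing XOR with hand-set weights.
        Hidden unit 1 acts as OR, hidden unit 2 as NAND, and the output
        unit as AND of the two hidden units."""
        h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # ~ OR(x1, x2)
        h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # ~ NAND(x1, x2)
        return sigmoid(20 * h1 + 20 * h2 - 30)  # ~ AND(h1, h2)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_net(a, b)))   # prints 0, 1, 1, 0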
6
Other Aspects
- Continuous hypothesis space
- Inductive bias: smooth interpolation between data points
- Able to determine its own internal (hidden layer) representations
- Avoiding overfitting: weight decay, or cross validation with a held-out validation set (sketched below)
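A common way to apply the cross-validation idea is validation-set early stopping: keep whichever weights gave the lowest error on held-out data. The sketch below assumes hypothetical train_one_epoch and error_on callables supplied by the surrounding training code:

    import copy

    def train_with_early_stopping(net, train_one_epoch, error_on, val_data, max_epochs=1000):
        """Train while tracking validation error; return the weights that
        generalized best rather than the weights from the final epoch."""
        best_error = float("inf")
        best_net = copy.deepcopy(net)
        for _ in range(max_epochs):
            train_one_epoch(net)                # one pass of backpropagation (hypothetical callable)
            error = error_on(net, val_data)     # error on held-out data (hypothetical callable)
            if error < best_error:
                best_error, best_net = error, copy.deepcopy(net)
        return best_net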
7
Face Recognition Example
8
Problem
- Identify face orientation from an image
- Training set of 640 images at 120x128 pixel resolution with greyscale values from 0 to 255
- Training set has varied backgrounds, clothing, expressions, and eye wear (sunglasses)
9
Design Choices
- Input encoding (sketched below):
  - Image reduced to 30x32 pixels
  - Pixels averaged over blocks to obtain the reduced values
  - Pixel values linearly scaled from the 0-255 range to the 0-1 range
- Output encoding:
  - Four outputs
  - Each output corresponds to one face orientation
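A sketch of the input and output encodings described above, assuming the raw image arrives as a 120x128 NumPy array of 0-255 intensities and that the four orientation labels are those used in the textbook face dataset (left, right, straight, up); the 4x4 block size follows from reducing 120x128 to 30x32:

    import numpy as np

    def encode_image(image):
        """Reduce a 120x128 greyscale image to 30x32 mean-pooled values in [0, 1]."""
        blocks = np.asarray(image, dtype=float).reshape(30, 4, 32, 4)
        coarse = blocks.mean(axis=(1, 3))    # average each 4x4 block of pixels
        return (coarse / 255.0).ravel()      # 960 network inputs scaled to 0..1

    ORIENTATIONS = ["left", "right", "straight", "up"]

    def encode_target(orientation):
        """1-of-4 output encoding: one output unit per face orientation."""
        target = np.zeros(len(ORIENTATIONS))
        target[ORIENTATIONS.index(orientation)] = 1.0
        return target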
10
Design Choices
- Network graph structure: acyclic (feedforward), two layers of units
- Hidden units:
  - 3 units (chosen): about five minutes to train
  - 30 units: about one hour to train
- Learning rate η: 0.3
- Momentum α: 0.3
11
Design Choices
- Full (batch) gradient descent over all training examples
- Hidden-to-output weights initialized to small random values
- Input-to-hidden weights initialized to zero
(initialization sketched below)
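A minimal sketch of the initialization described above for the 960-3-4 network (960 inputs from the 30x32 encoding, 3 hidden units, 4 outputs); the ±0.05 range for the random output weights is an assumption, since the slide only says "small":

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_hidden, n_outputs = 960, 3, 4

    # Input-to-hidden weights (plus a bias column) start at zero.
    w_hidden = np.zeros((n_hidden, n_inputs + 1))

    # Hidden-to-output weights (plus a bias column) start at small random values.
    w_output = rng.uniform(-0.05, 0.05, size=(n_outputs, n_hidden + 1))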
12
Advanced Topics
13
Alternative Error Functions
- Add a penalty term for weight magnitude (one common form shown below)
- Biases learning toward small-magnitude weight vectors
- Equivalent to weight decay
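One common form of the penalized error, written in the same sum-of-squared-errors style used for backpropagation, with γ controlling how strongly large weights are punished:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \text{outputs}} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2

Each gradient-descent step then shrinks every weight slightly toward zero, which is exactly the weight-decay effect noted above.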
14
Alternative Error Functions
- Add a term for the error in the slope (derivative) of the target function
- Requires knowledge of the target function's derivatives
15
Alternative Error Functions
- Minimize cross entropy (written out below), appropriate when the output is interpreted as a probability
- Relate weights to each other by some design constraint (weight tying / weight sharing)
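For a network whose single output o_d is interpreted as the probability that the target t_d is 1, the cross-entropy criterion referred to above is typically written as:

    -\sum_{d \in D} \bigl( t_d \log o_d + (1 - t_d) \log (1 - o_d) \bigr)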
16
Alternative Error Minimization Procedures
- Line search: pick the step size that minimizes error along the chosen direction (sketched below)
- Conjugate gradient: a sequence of line searches along successive conjugate directions
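A rough sketch of the line-search idea using a simple grid of candidate step sizes (real implementations bracket the minimum more carefully, and conjugate gradient additionally chooses successive search directions that do not undo earlier progress); error_fn is a hypothetical callable mapping a weight vector to its error:

    import numpy as np

    def line_search_step(weights, direction, error_fn, etas=np.logspace(-4, 0, 20)):
        """Move along `direction` by whichever candidate step size minimizes the error."""
        candidates = [weights + eta * direction for eta in etas]
        errors = [error_fn(w) for w in candidates]
        return candidates[int(np.argmin(errors))]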
17
Recurrent Networks
- Directed cyclic graphs: outputs or hidden states are fed back as inputs at the next time step (sketched below)
- Used to represent recursive, time-dependent functions such as time-series prediction
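A minimal sketch of the feedback idea: the unit's state at time t is fed back as an extra input at time t+1, so the same weights are applied recursively along the sequence (a generic recurrent sigmoid unit, not a specific architecture from the slides):

    import numpy as np

    def run_recurrent_unit(xs, w_in=1.0, w_rec=1.0, b=0.0):
        """Apply one recurrent sigmoid unit over a sequence of scalar inputs."""
        h, outputs = 0.0, []
        for x in xs:
            h = 1.0 / (1.0 + np.exp(-(w_in * x + w_rec * h + b)))  # state feeds back into itself
            outputs.append(h)
        return outputs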
18
Dynamic Modification of Network Structure
- Cascade-Correlation
- Start with a single layer (no hidden units)
- If the error is too great, add a hidden unit and retrain, holding the new unit's incoming weights constant
- Add additional hidden units until the error is acceptable
- Can easily result in overfitting
19
Pruning
- Start with a complex network
- Prune unneeded nodes and weights:
  - Weights close to zero (a magnitude-based sketch follows below)
  - Nodes with little effect on the output (a better criterion)
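A simple sketch of the magnitude-based criterion (weights close to zero are removed by setting them to zero); judging a node's effect on the output would instead require measuring how much the error changes when that node is deleted:

    import numpy as np

    def prune_small_weights(weight_matrix, threshold=1e-3):
        """Return a copy of the weight matrix with near-zero weights set to zero."""
        pruned = np.array(weight_matrix, copy=True)
        pruned[np.abs(pruned) < threshold] = 0.0
        return pruned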