1
Artificial Neural Networks ML 4.6-4.9 Paul Scheible
2
BackPropagation Algorithm
3
Convergence to Local Minima
- Backpropagation performs well in many practical problems
- Local minima are less troubling than one might think
- Multi-dimensional problems are unlikely to have a local minimum in every dimension at once
- Any dimension without a local minimum provides an escape route
- Starting with small initial weights tends to avoid local minima
4
Convergence to Local Minima
- Still no known method for predicting when local minima will present a problem
- Use momentum (see the update rule sketched below)
- Use stochastic (incremental) gradient descent
- Train several networks on the same data from different random initial weights
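For reference, the momentum heuristic modifies the backpropagation weight update so that each step partially repeats the previous one; in the notation commonly used for backpropagation (η is the learning rate, α the momentum constant, δ_j the error term of unit j, and x_ji the i-th input to unit j):

    \Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1)

A momentum constant α near 1 lets the search carry through small local minima and move faster along flat regions of the error surface.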
5
Representational Power
- Boolean functions: every Boolean function can be represented by a network with a single hidden layer, though the number of hidden units may grow with the number of inputs (see the XOR sketch below)
- Continuous functions: every bounded continuous function can be approximated arbitrarily closely by a network with one hidden layer of sigmoid units
- Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with two hidden layers
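To make the Boolean-function claim concrete, here is a minimal sketch of a two-layer sigmoid network computing XOR with hand-chosen (not learned) weights; the specific weight values are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def xor_net(x1, x2):
        """Two-layer sigmoid network computing XOR with hand-set weights.
        Hidden unit 1 acts as OR, hidden unit 2 as NAND, and the output
        unit as AND of the two hidden units."""
        h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # ~ OR(x1, x2)
        h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # ~ NAND(x1, x2)
        return sigmoid(20 * h1 + 20 * h2 - 30)  # ~ AND(h1, h2)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_net(a, b)))   # prints 0, 1, 1, 0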
6
Other Aspects
- Continuous hypothesis space
- Inductive bias: smooth interpolation between data points
- Able to determine its own internal (hidden layer) representations
- Avoiding overfitting: weight decay, or cross validation with a held-out validation set (sketched below)
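A common way to apply the cross-validation idea is validation-set early stopping: keep whichever weights gave the lowest error on held-out data. The sketch below assumes hypothetical train_one_epoch and error_on callables supplied by the surrounding training code:

    import copy

    def train_with_early_stopping(net, train_one_epoch, error_on, val_data, max_epochs=1000):
        """Train while tracking validation error; return the weights that
        generalized best rather than the weights from the final epoch."""
        best_error = float("inf")
        best_net = copy.deepcopy(net)
        for _ in range(max_epochs):
            train_one_epoch(net)                # one pass of backpropagation (hypothetical callable)
            error = error_on(net, val_data)     # error on held-out data (hypothetical callable)
            if error < best_error:
                best_error, best_net = error, copy.deepcopy(net)
        return best_net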
7
Face Recognition Example
8
Problem
- Identify face orientation from an image
- Training set of 640 images at 120x128 pixel resolution with greyscale values from 0 to 255
- Training set has varied backgrounds, clothing, expressions, and eye wear (sunglasses)
9
Design Choices
- Input encoding (sketched below):
  - Image reduced to 30x32 pixels
  - Pixels averaged over blocks to obtain the reduced values
  - Pixel values linearly scaled from the 0-255 range to the 0-1 range
- Output encoding:
  - Four outputs
  - Each output corresponds to one face orientation
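A sketch of the input and output encodings described above, assuming the raw image arrives as a 120x128 NumPy array of 0-255 intensities and that the four orientation labels are those used in the textbook face dataset (left, right, straight, up); the 4x4 block size follows from reducing 120x128 to 30x32:

    import numpy as np

    def encode_image(image):
        """Reduce a 120x128 greyscale image to 30x32 mean-pooled values in [0, 1]."""
        blocks = np.asarray(image, dtype=float).reshape(30, 4, 32, 4)
        coarse = blocks.mean(axis=(1, 3))    # average each 4x4 block of pixels
        return (coarse / 255.0).ravel()      # 960 network inputs scaled to 0..1

    ORIENTATIONS = ["left", "right", "straight", "up"]

    def encode_target(orientation):
        """1-of-4 output encoding: one output unit per face orientation."""
        target = np.zeros(len(ORIENTATIONS))
        target[ORIENTATIONS.index(orientation)] = 1.0
        return target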
10
Design Choices
- Network graph structure: acyclic (feedforward), two layers of units
- Hidden units:
  - 3 units (chosen): about five minutes to train
  - 30 units: about one hour to train
- Learning rate η: 0.3
- Momentum α: 0.3
11
Design Choices
- Full (batch) gradient descent over all training examples
- Hidden-to-output weights initialized to small random values
- Input-to-hidden weights initialized to zero
(initialization sketched below)
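A minimal sketch of the initialization described above for the 960-3-4 network (960 inputs from the 30x32 encoding, 3 hidden units, 4 outputs); the ±0.05 range for the random output weights is an assumption, since the slide only says "small":

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_hidden, n_outputs = 960, 3, 4

    # Input-to-hidden weights (plus a bias column) start at zero.
    w_hidden = np.zeros((n_hidden, n_inputs + 1))

    # Hidden-to-output weights (plus a bias column) start at small random values.
    w_output = rng.uniform(-0.05, 0.05, size=(n_outputs, n_hidden + 1))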
12
Advanced Topics
13
Alternative Error Functions
- Add a penalty term for weight magnitude (one common form shown below)
- Biases learning toward small-magnitude weight vectors
- Equivalent to weight decay
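One common form of the penalized error, written in the same sum-of-squared-errors style used for backpropagation, with γ controlling how strongly large weights are punished:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \text{outputs}} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2

Each gradient-descent step then shrinks every weight slightly toward zero, which is exactly the weight-decay effect noted above.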
14
Alternative Error Functions
- Add a term for the error in the slope (derivative) of the target function
- Requires knowledge of the target function's derivatives
15
Alternative Error Functions
- Minimize cross entropy (written out below), appropriate when the output is interpreted as a probability
- Relate weights to each other by some design constraint (weight tying / weight sharing)
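For a network whose single output o_d is interpreted as the probability that the target t_d is 1, the cross-entropy criterion referred to above is typically written as:

    -\sum_{d \in D} \bigl( t_d \log o_d + (1 - t_d) \log (1 - o_d) \bigr)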
16
Alternative Error Minimization Procedures
- Line search: pick the step size that minimizes error along the chosen direction (sketched below)
- Conjugate gradient: a sequence of line searches along successive conjugate directions
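A rough sketch of the line-search idea using a simple grid of candidate step sizes (real implementations bracket the minimum more carefully, and conjugate gradient additionally chooses successive search directions that do not undo earlier progress); error_fn is a hypothetical callable mapping a weight vector to its error:

    import numpy as np

    def line_search_step(weights, direction, error_fn, etas=np.logspace(-4, 0, 20)):
        """Move along `direction` by whichever candidate step size minimizes the error."""
        candidates = [weights + eta * direction for eta in etas]
        errors = [error_fn(w) for w in candidates]
        return candidates[int(np.argmin(errors))]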
17
Recurrent Networks
- Directed cyclic graphs: outputs or hidden states are fed back as inputs at the next time step (sketched below)
- Used to represent recursive, time-dependent functions such as time-series prediction
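A minimal sketch of the feedback idea: the unit's state at time t is fed back as an extra input at time t+1, so the same weights are applied recursively along the sequence (a generic recurrent sigmoid unit, not a specific architecture from the slides):

    import numpy as np

    def run_recurrent_unit(xs, w_in=1.0, w_rec=1.0, b=0.0):
        """Apply one recurrent sigmoid unit over a sequence of scalar inputs."""
        h, outputs = 0.0, []
        for x in xs:
            h = 1.0 / (1.0 + np.exp(-(w_in * x + w_rec * h + b)))  # state feeds back into itself
            outputs.append(h)
        return outputs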
18
Dynamic Modification of Network Structure
- Cascade-Correlation
- Start with a single layer (no hidden units)
- If the error is too great, add a hidden unit and retrain, holding the new unit's incoming weights constant
- Add additional hidden units until the error is acceptable
- Can easily result in overfitting
19
Pruning
- Start with a complex network
- Prune unneeded nodes and weights:
  - Weights close to zero (a magnitude-based sketch follows below)
  - Nodes with little effect on the output (a better criterion)
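A simple sketch of the magnitude-based criterion (weights close to zero are removed by setting them to zero); judging a node's effect on the output would instead require measuring how much the error changes when that node is deleted:

    import numpy as np

    def prune_small_weights(weight_matrix, threshold=1e-3):
        """Return a copy of the weight matrix with near-zero weights set to zero."""
        pruned = np.array(weight_matrix, copy=True)
        pruned[np.abs(pruned) < threshold] = 0.0
        return pruned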