Back Propagation and Representation in PDP Networks Psychology 209 February 6, 2013.


Homework 4

Part 1, due Feb 13
– Complete Exercises 5.1 and 5.2.
– It may be helpful to carry out some explorations of parameters, as suggested in Exercise 5.3. This may help you achieve a solution in the last part of the homework, below. However, no write-up is required for this.

Part 2, due Feb 20
– Consult Chapter 8 of the PDP book by Rumelhart, Hinton, and Williams (in the readings directory for Feb 6). Consider the problems described there that were solved using back propagation and choose one, or create a problem of your own to investigate with back propagation.
– Carry out Exercise 5.4, creating your own network, template, pattern, and startup file (similar to bpxor.m), and answer the questions.

The Perceptron

For input pattern p, with teacher t_p and output o_p, change the threshold and the weights as follows:
  Δθ = −ε (t_p − o_p)
  Δw_i = ε (t_p − o_p) i_ip
Note: including a bias = −θ in the net input and using a threshold of 0, then treating the bias as a weight from a unit that is always on, is equivalent.
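As a concrete illustration (not from the original slides), here is a minimal Python sketch of this rule, with the bias folded in as a weight from an always-on unit as the note suggests; the AND training set and the learning rate eta are assumptions for the example.

```python
import numpy as np

# Perceptron sketch: the bias is handled as a weight from a unit that is always on,
# so the threshold is 0 and the bias weight is learned like any other weight.
patterns = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # [always-on, i1, i2]
targets = np.array([0, 0, 0, 1])                                   # AND (assumed example task)
w = np.zeros(3)
eta = 1.0                                                          # assumed learning rate

for epoch in range(10):
    for i_p, t_p in zip(patterns, targets):
        o_p = 1 if w @ i_p > 0 else 0       # output with threshold at 0
        w += eta * (t_p - o_p) * i_p        # weights change only when the output is wrong

print(w)  # with this setup the final weights (e.g., [-2, 2, 1]) implement AND
```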

AND, OR, XOR

Adding a unit to make XOR solvable

Gradient Descent Learning in the 'LMS' Associator

Output is a linear function of the inputs and weights:
  o_p = Σ_i w_i i_ip
Find a learning rule to minimize the summed squared error:
  E = ½ Σ_p (t_p − o_p)²
Taking derivatives, we find:
  ∂E/∂w_i = −Σ_p (t_p − o_p) i_ip
Consider the policy:
  Δw_i = −ε ∂E/∂w_i
This breaks down into the sum over patterns of terms of the form:
  Δ_p w_i = ε (t_p − o_p) i_ip
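The derivation above corresponds to the per-pattern 'delta rule' update. Below is a minimal sketch, assuming a bias-free linear unit, the OR targets used in the error-surface example on the next slide, and a learning rate eps of 0.1.

```python
import numpy as np

# LMS / delta-rule sketch:  o_p = w . i_p,  per-pattern step  w_i += eps * (t_p - o_p) * i_ip
inputs = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = np.array([0., 1., 1., 1.])      # OR, as in the error-surface example
w = np.zeros(2)
eps = 0.1                                 # assumed learning rate

for epoch in range(200):
    for i_p, t_p in zip(inputs, targets):
        o_p = w @ i_p                     # linear output
        w += eps * (t_p - o_p) * i_p      # descend the summed squared error

# Without a bias the linear unit cannot produce OR exactly; the weights settle near
# the least-squares minimum of the bowl-shaped error surface (about [2/3, 2/3] here).
print(w)
```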

Error Surface for OR function in LMS Associator

What if we want to learn how to solve XOR? We need to figure out how to adjust the weights coming into the 'hidden' unit, following the principle of gradient descent:

We start with an even simpler problem

Assume the units are linear, both weights = 0.5, and i = 1, t = 1. We use the chain rule to calculate ∂E/∂w for each weight, w_10 and w_21.
Weight changes should follow the gradient:
  Δw = −ε ∂E/∂w
First we unpack the chain, then we calculate its elements.
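A worked version of this calculation, as a Python sketch; the error form E = ½(t − a_2)² and the learning rate are assumptions.

```python
# Two-weight linear chain:  a1 = w10 * i,  a2 = w21 * a1,  E = 0.5 * (t - a2)**2
i, t = 1.0, 1.0
w10, w21 = 0.5, 0.5
eps = 0.1                            # assumed learning rate

a1 = w10 * i
a2 = w21 * a1                        # = 0.25, so E = 0.5 * (1 - 0.25)**2
dE_da2 = -(t - a2)                   # dE/da2
dE_dw21 = dE_da2 * a1                # chain: dE/da2 * da2/dw21
dE_dw10 = dE_da2 * w21 * i           # chain: dE/da2 * da2/da1 * da1/dw10

w21 -= eps * dE_dw21                 # weight changes follow the negative gradient
w10 -= eps * dE_dw10
print(dE_dw10, dE_dw21)              # both gradients equal -0.375 with these values
```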

Including a non-linear activation function

Let
  a = f(net) = 1 / (1 + e^(−net))
Then
  f′(net) = f(net) (1 − f(net)) = a (1 − a)
So our chains from before become:
  ∂E/∂w_21 = −(t − a_2) f′(net_2) a_1
  ∂E/∂w_10 = −(t − a_2) f′(net_2) w_21 f′(net_1) i
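A small sketch of the logistic function and its derivative as used above:

```python
import numpy as np

def logistic(net):
    """a = f(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def logistic_prime(net):
    """f'(net) = f(net) * (1 - f(net)) = a * (1 - a)."""
    a = logistic(net)
    return a * (1.0 - a)

print(logistic(0.0), logistic_prime(0.0))  # 0.5 and 0.25: the derivative peaks at net = 0
```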

Including the activation function in the chain rule, and allowing more than one output unit, leads to the formulation below, in which we use δ_i to represent −∂E/∂net_i.

The δ term for an output unit i:
  δ_i = (t_i − a_i) f′(net_i)
The δ term for a hidden unit j:
  δ_j = f′(net_j) Σ_i δ_i w_ij
We can continue this back indefinitely:
  δ_s = f′(net_s) Σ_r δ_r w_rs
The weight change rule at every layer is:
  Δw_rs = ε δ_r a_s
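A sketch of these δ terms and the weight change rule for a single pattern in a small 2-3-1 network; the layer sizes, initialization, and learning rate are assumptions for the example.

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)
W_hid = rng.normal(scale=0.5, size=(3, 2))   # weights into the hidden units j
W_out = rng.normal(scale=0.5, size=(1, 3))   # weights w_ij into the output units i
x = np.array([1.0, 0.0])
t = np.array([1.0])
eps = 0.5                                    # assumed learning rate

# Forward pass
a_hid = logistic(W_hid @ x)
a_out = logistic(W_out @ a_hid)

# Delta terms, as on the slide
delta_out = (t - a_out) * a_out * (1 - a_out)            # delta_i = (t_i - a_i) f'(net_i)
delta_hid = a_hid * (1 - a_hid) * (W_out.T @ delta_out)  # delta_j = f'(net_j) sum_i delta_i w_ij

# Weight change rule at every layer:  delta w_rs = eps * delta_r * a_s
W_out += eps * np.outer(delta_out, a_hid)
W_hid += eps * np.outer(delta_hid, x)
```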

Back propagation algorithm

Propagate activation forward
– Activation can only flow from lower-numbered units to higher-numbered units.
Propagate "error" backward
– Error flows from higher-numbered units back to lower-numbered units.
Calculate the 'weight error derivative' terms, δ_r a_s.
One can change the weights after processing a single pattern, or accumulate weight error derivatives over a batch of patterns before changing the weights.
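Putting the pieces together, here is a self-contained sketch of batch-mode back propagation on XOR; the network size, learning rate, epoch count, and random seed are all assumptions, and convergence can depend on them.

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])                 # XOR targets
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)
eps = 0.5

for epoch in range(10000):
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    for x, t in zip(X, T):
        a1 = logistic(W1 @ x + b1)            # propagate activation forward
        a2 = logistic(W2 @ a1 + b2)
        d2 = (t - a2) * a2 * (1 - a2)         # propagate error backward
        d1 = a1 * (1 - a1) * (W2.T @ d2)
        dW2 += np.outer(d2, a1); db2 += d2    # accumulate weight error derivatives
        dW1 += np.outer(d1, x);  db1 += d1
    W1 += eps * dW1; b1 += eps * db1          # change weights once per batch (epoch)
    W2 += eps * dW2; b2 += eps * db2

# The outputs typically approach the XOR targets 0, 1, 1, 0.
print([logistic(W2 @ logistic(W1 @ x + b1) + b2).item() for x in X])
```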

Variants/Embellishments to back propagation

Full "batch mode" (epoch-wise) learning rule with weight decay and momentum:
  Δw_rs = ε Σ_p δ_rp a_sp − ω w_rs + α Δw_rs(prev)
Weights can alternatively be updated after each pattern or after every k patterns.
An alternative error measure has both conceptual and practical advantages:
  CE_p = −Σ_i [ t_ip log(a_ip) + (1 − t_ip) log(1 − a_ip) ]
If the targets are actually probabilistic, minimizing CE_p maximizes the probability of the observed target values. This also eliminates the 'pinned output unit' problem.
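A minimal sketch of the epoch-wise update with weight decay and momentum, and of the cross-entropy measure; the parameter names map ε to eps, ω to omega, and α to alpha, and the specific values are assumptions.

```python
import numpy as np

def batch_update(W, wed_sum, prev_dW, eps=0.1, omega=1e-4, alpha=0.9):
    """dW = eps * sum_p(delta_rp * a_sp) - omega * W + alpha * dW(prev); returns new W and dW."""
    dW = eps * wed_sum - omega * W + alpha * prev_dW
    return W + dW, dW

def cross_entropy(t, a, clip=1e-12):
    """CE_p = -sum_i [ t_i log(a_i) + (1 - t_i) log(1 - a_i) ], clipped to avoid log(0)."""
    a = np.clip(a, clip, 1.0 - clip)
    return -np.sum(t * np.log(a) + (1.0 - t) * np.log(1.0 - a))

# Example with made-up numbers: the measure is small when the outputs match the targets.
print(cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
```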

Why is back propagation important?

It provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.
– Contrary to expectation, it does not get stuck in local minima, except in cases where the network is exceptionally tightly constrained.
It allows networks to learn how to represent information as well as how to use it.
It raises questions about the nature of representations and of what must be specified in order to learn them.

Is Backprop biologically plausible?

Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell.
But we shouldn't be too literal-minded about the actual biological implementation of the learning rule.
Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information.