Neural Networks Lecture 11: Learning in recurrent networks Geoffrey Hinton.

Recurrent networks
If the connectivity has directed cycles, the network can do much more than just compute a fixed sequence of non-linear transforms:
– It can oscillate. Good for motor control?
– It can settle to point attractors. Good for classification.
– It can behave chaotically. But that is usually a bad idea for information processing.
– It can remember things for a long time. The network has internal state, so it can decide to ignore the input for a while if it wants to.
– It can model sequential data in a natural way. There is no need to use delay taps to spatialize time.

An advantage of modeling sequential data
We can get a teaching signal by trying to predict the next term in a series.
– This seems much more natural than trying to predict one pixel in an image from the other pixels.
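
For concreteness, here is a minimal sketch (in Python, not from the lecture) of what such a teaching signal looks like: the targets are simply the sequence shifted by one step.

```python
# A next-step prediction target needs no separate labels:
# the inputs are the sequence itself, and the targets are the same
# sequence shifted by one step.
sequence = [0.1, 0.4, 0.3, 0.9, 0.7]

inputs = sequence[:-1]    # what the network sees at each step
targets = sequence[1:]    # what it should predict at that step

for x, t in zip(inputs, targets):
    print(f"input {x:.1f} -> target {t:.1f}")
```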

The equivalence between layered, feedforward nets and recurrent nets
[Diagram: the recurrent net unrolled into four layers, time=0 through time=3, with the same weights w1, w2, w3, w4 reused at every time step.]
Assume that there is a time delay of 1 in using each connection. The recurrent net is just a layered net that keeps reusing the same weights.
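
A small sketch of this equivalence, assuming a simple tanh recurrent unit (the weight names and sizes are illustrative): running the net for four time steps is exactly the computation of a four-layer feedforward net that reuses the same weights at every layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(2, 3)) * 0.5   # input-to-hidden weights, reused at every step
W_hh = rng.normal(size=(3, 3)) * 0.5   # hidden-to-hidden weights, reused at every step

def step(h_prev, x):
    """One time step of the recurrent net = one layer of the unrolled net."""
    return np.tanh(x @ W_xh + h_prev @ W_hh)

xs = rng.normal(size=(4, 2))   # a sequence of 4 input vectors (time=0 .. time=3)
h = np.zeros(3)                # initial hidden state
for t, x in enumerate(xs):
    h = step(h, x)             # the same weights are applied at every "layer"
    print(f"time={t}: hidden state {np.round(h, 3)}")
```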

Backpropagation with weight constraints
It is easy to modify the backprop algorithm to incorporate linear constraints between the weights. We compute the gradients as usual, and then modify the gradients so that they satisfy the constraints.
– So if the weights started off satisfying the constraints, they will continue to satisfy them.
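
A toy sketch of the idea, assuming the constraint is simply w1 = w2 (the numbers are made up): the two gradients are computed as usual and their sum is used for both weights, so weights that start equal stay equal.

```python
def constrained_update(w1, w2, grad_w1, grad_w2, lr=0.1):
    # Compute dE/dw1 and dE/dw2 as usual, then use their sum for both
    # weights: if w1 and w2 start off satisfying w1 == w2, they stay equal.
    shared_grad = grad_w1 + grad_w2
    return w1 - lr * shared_grad, w2 - lr * shared_grad

w1 = w2 = 0.5
w1, w2 = constrained_update(w1, w2, grad_w1=0.2, grad_w2=-0.05)
print(w1, w2)   # both weights receive the identical update
```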

Backpropagation through time
We could convert the recurrent net into a layered, feed-forward net and then train the feed-forward net with weight constraints.
– This is clumsy. It is better to move the algorithm to the recurrent domain rather than moving the network to the feed-forward domain.
So the forward pass builds up a stack of the activities of all the units at each time step. The backward pass peels activities off the stack to compute the error derivatives at each time step. After the backward pass we add together the derivatives at all the different times for each weight.
There is an irritating extra problem:
– We need to specify the initial state of all the non-input units. We could backpropagate all the way to these initial activities and learn the best values for them.
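
Here is a compact sketch of backpropagation through time for an assumed simple tanh recurrent net with a squared-error target on the hidden units at every step (not the exact network from the lecture): the forward pass pushes activities onto a stack, the backward pass pops them off, and the per-step weight derivatives are added together.

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh = rng.normal(size=(2, 3)) * 0.1
W_hh = rng.normal(size=(3, 3)) * 0.1

def bptt(xs, targets, h0):
    # Forward pass: build up a stack of the activities at every time step.
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(x @ W_xh + hs[-1] @ W_hh))

    # Backward pass: peel activities off the stack and accumulate gradients.
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    dh_next = np.zeros(3)
    for t in reversed(range(len(xs))):
        dh = (hs[t + 1] - targets[t]) + dh_next   # error injected here + error from later steps
        dpre = dh * (1.0 - hs[t + 1] ** 2)        # back through the tanh
        dW_xh += np.outer(xs[t], dpre)            # add together derivatives at all times
        dW_hh += np.outer(hs[t], dpre)
        dh_next = dpre @ W_hh.T                   # pass the error back one time step
    return dW_xh, dW_hh

xs = rng.normal(size=(5, 2))
targets = rng.normal(size=(5, 3))
dW_xh, dW_hh = bptt(xs, targets, h0=np.zeros(3))
print(dW_xh.shape, dW_hh.shape)
```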

Teaching signals for recurrent networks
We can specify targets in several ways:
– Specify desired final activities of all the units.
– Specify desired activities of all units for the last few steps. This is good for learning attractors, and it is easy to add in extra error derivatives as we backpropagate.
– Specify the desired activity of a subset of the units. The other units are "hidden".
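
The three schemes differ only in which time steps and units receive an injected error derivative; a rough sketch (the shapes and names are illustrative):

```python
import numpy as np

T, n_units = 6, 4
activities = np.random.rand(T, n_units)   # what the units actually did
targets = np.random.rand(T, n_units)      # what we would like them to do

# (a) desired final activities of all the units: only the last step contributes
mask_final = np.zeros((T, n_units))
mask_final[-1] = 1.0

# (b) desired activities for the last few steps (good for learning attractors)
mask_last_few = np.zeros((T, n_units))
mask_last_few[-3:] = 1.0

# (c) desired activity of a subset of the units; the rest are "hidden"
mask_subset = np.zeros((T, n_units))
mask_subset[:, :2] = 1.0

# The extra error derivatives injected during the backward pass, e.g. scheme (b):
error_derivs = mask_last_few * (activities - targets)
print(error_derivs)
```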

A good problem for a recurrent network
We can train a feedforward net to do binary addition, but there are obvious regularities that it cannot capture:
– We must decide in advance the maximum number of digits in each number.
– The processing applied to the beginning of a long number does not generalize to the end of the long number because it uses different weights.
As a result, feedforward nets do not generalize well on the binary addition task.
[Diagram: a feedforward net with hidden units for the binary addition task.]

The algorithm for binary addition
[State diagram: a finite state automaton with four states, combining carry/no carry with print 1/print 0.]
This is a finite state automaton. It decides what transition to make by looking at the next column. It prints after making the transition. It moves from right to left over the two input numbers.
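
A small, assumed implementation of this automaton: at each column the current carry state and the two input digits determine the transition, and the digit that gets printed is part of the new state.

```python
def add_binary(a_bits, b_bits):
    """a_bits, b_bits: lists of 0/1 digits, most significant digit first."""
    carry = 0                 # start in a "no carry" state
    printed = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):   # move right to left
        total = a + b + carry
        carry, digit = divmod(total, 2)   # the transition: the new state is (carry, digit to print)
        printed.append(digit)             # print after making the transition
    if carry:
        printed.append(1)                 # a final carry adds one more digit
    return list(reversed(printed))

print(add_binary([1, 0, 1, 1], [0, 1, 1, 1]))   # 1011 + 0111 = 10010
```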

A recurrent net for binary addition
The network has two input units and one output unit. The network is given two input digits at each time step. The desired output at each time step is the output for the column that was provided as input two time steps ago.
– It takes one time step to update the hidden units based on the two input digits.
– It takes another time step for the hidden units to cause the output.
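
A sketch (with an assumed encoding) of how such a training stream can be laid out, reusing the numbers from the automaton example above: the target at each step is the sum digit for the column that was presented two steps earlier.

```python
a = [1, 0, 1, 1]       # first number, most significant digit first
b = [0, 1, 1, 1]       # second number
s = [1, 0, 0, 1, 0]    # their sum, as produced by the automaton above

T = len(s) + 2         # two extra steps so every sum digit gets its turn as a target
inputs = [(a[-1 - t], b[-1 - t]) if t < len(a) else (0, 0) for t in range(T)]
targets = [s[-1 - (t - 2)] if t >= 2 else None for t in range(T)]

for t, (x, y) in enumerate(zip(inputs, targets)):
    print(f"t={t}: input digits {x}, target {y}")
```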

The connectivity of the network
The 3 hidden units have all possible interconnections in all directions.
– This allows a hidden activity pattern at one time step to vote for the hidden activity pattern at the next time step.
The input units have feedforward connections that allow them to vote for the next hidden activity pattern.
[Diagram: 3 fully interconnected hidden units.]
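
In terms of parameters, this connectivity amounts to three weight matrices; a sketch of the shapes the slide implies (the nonlinearities are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
W_xh = rng.normal(size=(2, 3)) * 0.1   # feedforward connections: 2 inputs vote for the next hidden pattern
W_hh = rng.normal(size=(3, 3)) * 0.1   # all possible interconnections among the 3 hidden units
W_hy = rng.normal(size=(3, 1)) * 0.1   # the hidden units drive the single output unit

def step(x, h_prev):
    # The previous hidden pattern and the current input jointly vote for
    # the next hidden pattern, which then determines the output.
    h = np.tanh(x @ W_xh + h_prev @ W_hh)
    y = 1.0 / (1.0 + np.exp(-(h @ W_hy)))   # logistic output unit (an assumption)
    return h, y

h, y = step(np.array([1.0, 0.0]), np.zeros(3))
print(np.round(h, 3), float(y[0]))
```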

What the network learns
It learns four distinct patterns of activity for the 3 hidden units. These patterns correspond to the nodes in the finite state automaton.
– Do not confuse units in a neural network with nodes in a finite state automaton. Nodes are like activity vectors.
– The automaton is restricted to be in exactly one state at each time. The hidden units are restricted to have exactly one vector of activity at each time.
A recurrent network can emulate a finite state automaton, but it is exponentially more powerful. With N hidden neurons it has 2^N possible binary activity vectors in the hidden units.
– This is important when the input stream has several separate things going on at once. A finite state automaton cannot cope with this properly.