CSC321: Introduction to Neural Networks and machine Learning Lecture 16: Hopfield nets and simulated annealing Geoffrey Hinton.

Slides:



Advertisements
Similar presentations
Bioinspired Computing Lecture 16
Advertisements

Learning in Neural and Belief Networks - Feed Forward Neural Network 2001 년 3 월 28 일 안순길.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 7: Learning in recurrent networks Geoffrey Hinton.
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
CS 678 –Boltzmann Machines1 Boltzmann Machine Relaxation net with visible and hidden units Learning algorithm Avoids local minima (and speeds up learning)
Kostas Kontogiannis E&CE
CSC321: Neural Networks Lecture 3: Perceptrons
1 Neural networks 3. 2 Hopfield network (HN) model A Hopfield network is a form of recurrent artificial neural network invented by John Hopfield in 1982.
Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Prof. Carla P. Gomes Module: Neural Networks: Concepts (Reading:
Network Goodness and its Relation to Probability PDP Class Winter, 2010 January 13, 2010.
Basic Models in Neuroscience Oren Shriki 2010 Associative Memory 1.
CSC321: Introduction to Neural Networks and Machine Learning Lecture 20 Learning features one layer at a time Geoffrey Hinton.
Neural Networks Chapter 2 Joost N. Kok Universiteit Leiden.
8/6/20151 Neural Networks CIS 479/579 Bruce R. Maxim UM-Dearborn.
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero
CSC2535: 2013 Advanced Machine Learning Lecture 3a: The Origin of Variational Bayes Geoffrey Hinton.
Neural Networks Lecture 8: Two simple learning algorithms
CS623: Introduction to Computing with Neural Nets (lecture-10) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay.
© Negnevitsky, Pearson Education, Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works Introduction, or.
Presentation on Neural Networks.. Basics Of Neural Networks Neural networks refers to a connectionist model that simulates the biophysical information.
CSC2535: Computation in Neural Networks Lecture 11: Conditional Random Fields Geoffrey Hinton.
Neural Networks Architecture Baktash Babadi IPM, SCS Fall 2004.
Neural Networks Ellen Walker Hiram College. Connectionist Architectures Characterized by (Rich & Knight) –Large number of very simple neuron-like processing.
CSC321: Neural Networks Lecture 12: Clustering Geoffrey Hinton.
Introduction to Artificial Neural Network Models Angshuman Saha Image Source: ww.physiol.ucl.ac.uk/fedwards/ ca1%20neuron.jpg.
Artificial Neural Network Supervised Learning دكترمحسن كاهاني
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
Learning Lateral Connections between Hidden Units Geoffrey Hinton University of Toronto in collaboration with Kejie Bao University of Toronto.
Neural Networks and Fuzzy Systems Hopfield Network A feedback neural network has feedback loops from its outputs to its inputs. The presence of such loops.
CSC321: Neural Networks Lecture 2: Learning with linear neurons Geoffrey Hinton.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 11: Bayesian learning continued Geoffrey Hinton.
Geoffrey Hinton CSC2535: 2013 Lecture 5 Deep Boltzmann Machines.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 9: Ways of speeding up the learning and preventing overfitting Geoffrey Hinton.
CIAR Second Summer School Tutorial Lecture 1a Sigmoid Belief Nets and Boltzmann Machines Geoffrey Hinton.
CSC321: Neural Networks Lecture 24 Products of Experts Geoffrey Hinton.
CSC 2535 Lecture 8 Products of Experts Geoffrey Hinton.
ADVANCED PERCEPTRON LEARNING David Kauchak CS 451 – Fall 2013.
Activations, attractors, and associators Jaap Murre Universiteit van Amsterdam
CSC2535 Lecture 4 Boltzmann Machines, Sigmoid Belief Nets and Gibbs sampling Geoffrey Hinton.
CSC321: Introduction to Neural Networks and Machine Learning Lecture 18 Learning Boltzmann Machines Geoffrey Hinton.
The Essence of PDP: Local Processing, Global Outcomes PDP Class January 16, 2013.
Constraint Satisfaction and Schemata Psych 205. Goodness of Network States and their Probabilities Goodness of a network state How networks maximize goodness.
CSC321 Introduction to Neural Networks and Machine Learning Lecture 3: Learning in multi-layer networks Geoffrey Hinton.
CIAR Summer School Tutorial Lecture 1b Sigmoid Belief Nets Geoffrey Hinton.
CSC321: Lecture 7:Ways to prevent overfitting
Neural Networks Teacher: Elena Marchiori R4.47 Assistant: Kees Jong S2.22
CSC321: Introduction to Neural Networks and Machine Learning Lecture 19: Learning Restricted Boltzmann Machines Geoffrey Hinton.
Image Source: ww.physiol.ucl.ac.uk/fedwards/ ca1%20neuron.jpg
Boltzman Machines Stochastic Hopfield Machines Lectures 11e 1.
Activations, attractors, and associators Jaap Murre Universiteit van Amsterdam en Universiteit Utrecht
Chapter 6 Neural Network.
CSC321: Introduction to Neural Networks and Machine Learning Lecture 23: Linear Support Vector Machines Geoffrey Hinton.
Lecture 9 Model of Hopfield
Neural Networks Lecture 11: Learning in recurrent networks Geoffrey Hinton.
CSC321: Introduction to Neural Networks and Machine Learning Lecture 17: Boltzmann Machines as Probabilistic Models Geoffrey Hinton.
CSC2535 Lecture 5 Sigmoid Belief Nets
CSC321: Computation in Neural Networks Lecture 21: Stochastic Hopfield nets and simulated annealing Geoffrey Hinton.
CSC2535: Computation in Neural Networks Lecture 8: Hopfield nets Geoffrey Hinton.
CSC 2535: Computation in Neural Networks Lecture 10 Learning Deterministic Energy-Based Models Geoffrey Hinton.
CSC Lecture 23: Sigmoid Belief Nets and the wake-sleep algorithm Geoffrey Hinton.
Machine Learning Artificial Neural Networks MPλ ∀ Stergiou Theodoros 1.
J. Kubalík, Gerstner Laboratory for Intelligent Decision Making and Control Artificial Neural Networks II - Outline Cascade Nets and Cascade-Correlation.
Some Slides from 2007 NIPS tutorial by Prof. Geoffrey Hinton
CSC321 Lecture 18: Hopfield nets and simulated annealing
CSC321: Neural Networks Lecture 19: Boltzmann Machines as Probabilistic Models Geoffrey Hinton.
Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)
Boltzmann Machine (BM) (§6.4)
CSC321 Winter 2007 Lecture 21: Some Demonstrations of Restricted Boltzmann Machines Geoffrey Hinton.
CS623: Introduction to Computing with Neural Nets (lecture-11)
CSC 578 Neural Networks and Deep Learning
Presentation transcript:

CSC321: Introduction to Neural Networks and machine Learning Lecture 16: Hopfield nets and simulated annealing Geoffrey Hinton

Hopfield Nets A Hopfield net is composed of binary threshold units with recurrent connections between them. Recurrent networks of non-linear units are generally very hard to analyze. They can behave in many different ways: –Settle to a stable state –Oscillate –Follow chaotic trajectories that cannot be predicted far into the future. But Hopfield realized that if the connections are symmetric, there is a global energy function –Each “configuration” of the network has an energy. –The binary threshold decision rule causes the network to settle to an energy minimum.

The energy function The global energy is the sum of many contributions. Each contribution depends on one connection weight and the binary states of two neurons: The simple quadratic energy function makes it easy to compute how the state of one neuron affects the global energy:

Settling to an energy minimum Pick the units one at a time and flip their states if it reduces the global energy. Find the minima in this net If units make simultaneous decisions the energy could go up

How to make use of this type of computation Hopfield proposed that memories could be energy minima of a neural net. The binary threshold decision rule can then be used to “clean up” incomplete or corrupted memories. –This gives a content-addressable memory in which an item can be accessed by just knowing part of its content (like google) –It is robust against hardware damage.

Storing memories If we use activities of 1 and -1, we can store a state vector by incrementing the weight between any two units by the product of their activities. –Treat biases as weights from a permanently on unit With states of 0 and 1 the rule is slightly more complicated.

Spurious minima Each time we memorize a configuration, we hope to create a new energy minimum. But what if two nearby minima merge to create a minimum at an intermediate location? This limits the capacity of a Hopfield net. Using Hopfield’s storage rule the capacity of a totally connected net with N units is only 0.15N memories. –This does not make efficient use of the bits required to store the weights in the network.

Avoiding spurious minima by unlearning Hopfield, Feinstein and Palmer suggested the following strategy: –Let the net settle from a random initial state and then do unlearning. –This will get rid of deep, spurious minima and increase memory capacity. Crick and Mitchison proposed unlearning as a model of what dreams are for. –That’s why you don’t remember them (Unless you wake up during the dream) But how much unlearning should we do? –And can we analyze what unlearning achieves?

Willshaw nets We can improve efficiency by using sparse vectors and only allowing one bit per weight. –Turn on a synapse when input and output units are both active. For retrieval, set the output threshold equal to the number of active input units –This makes false positives improbable in output units with dynamic thresholds

An iterative storage method Instead of trying to store vectors in one shot as Hopfield does, cycle through the training set many times. –use the perceptron convergence procedure to train each unit to have the correct state given the states of all the other units in that vector. –This uses the capacity of the weights efficiently.

Another computational role for Hopfield nets Instead of using the net to store memories, use it to construct interpretations of sensory input. –The input is represented by the visible units. –The interpretation is represented by the states of the hidden units. –The badness of the interpretation is represented by the energy This raises two difficult issues: –How do we escape from poor local minima to get good interpretations? –How do we learn the weights on connections to the hidden units? Visible units. Used to represent the inputs Hidden units. Used to represent an interpretation of the inputs

An example: Interpreting a line drawing Use one “2-D line” unit for each possible line in the picture. –Any particular picture will only activate a very small subset of the line units. Use one “3-D line” unit for each possible 3-D line in the scene. –Each 2-D line unit could be the projection of many possible 3-D lines. Make these 3-D lines compete. Make 3-D lines support each other if they join in 3-D. Make them strongly support each other if they join at right angles. Join in 3-D Join in 3-D at right angle 2-D lines 3-D lines picture

Noisy networks find better energy minima A Hopfield net always makes decisions that reduce the energy. –This makes it impossible to escape from local minima. We can use random noise to escape from poor minima. –Start with a lot of noise so its easy to cross energy barriers. –Slowly reduce the noise so that the system ends up in a deep minimum. This is “simulated annealing”. A B C

Stochastic units Replace the binary threshold units by binary stochastic units that make biased random decisions. –The “temperature” controls the amount of noise –Decreasing all the energy gaps between configurations is equivalent to raising the noise level. temperature

The annealing trade-off At high temperature the transition probabilities for uphill jumps are much greater. At low temperature the equilibrium probabilities of good states are much better than the equilibrium probabilities of bad ones. Energy increase

How temperature affects transition probabilities A B A B High temperature transition probabilities Low temperature transition probabilities

Thermal equilibrium Thermal equilibrium is a difficult concept! –It does not mean that the system has settled down into the lowest energy configuration. –The thing that settles down is the probability distribution over configurations. The best way to think about it is to imagine a huge ensemble of systems that all have exactly the same energy function. –The probability distribution is just the fraction of the systems that are in each possible configuration. –We could start with all the systems in the same configuration, or with an equal number of systems in each possible configuration. –After running the systems stochastically in the right way, we eventually reach a situation where the number of systems in each configuration remains constant even though any given system keeps moving between configurations

An analogy Imagine a casino in Las Vegas that is full of card dealers (we need many more than 52! of them). We start with all the card packs in standard order and then the dealers all start shuffling their packs. –After a few time steps, the king of spades still has a good chance of being next to queen of spades. The packs have not been fully randomized. –After prolonged shuffling, the packs will have forgotten where they started. There will be an equal number of packs in each of the 52! possible orders. –Once equilibrium has been reached, the number of packs that leave a configuration at each time step will be equal to the number that enter the configuration. The only thing wrong with this analogy is that all the configurations have equal energy, so they all end up with the same probability.