CS 678 – Boltzmann Machines

Boltzmann Machine
Relaxation net with visible and hidden units
Learning algorithm
Avoids local minima (and speeds up learning) by using simulated annealing with stochastic nodes

Node activation: Logistic Function
Node k outputs s_k = 1 with probability p(s_k = 1) = 1 / (1 + e^(-ΔE_k / T)) and 0 otherwise, where ΔE_k = Σ_j w_kj s_j is the node's net input and T is the temperature parameter
Nodes perform asynchronous random updates
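
A minimal sketch of this stochastic update rule in Python (the names prob_on and update_node are illustrative, not from the slides):

```python
import numpy as np

def prob_on(net, T):
    """Probability that a node outputs 1, given its net input and temperature T."""
    return 1.0 / (1.0 + np.exp(-net / T))

def update_node(s, W, k, T, rng):
    """Asynchronous stochastic update of node k in the 0/1 state vector s (in place)."""
    net = W[k] @ s                    # weighted sum over the other nodes (W[k, k] == 0)
    s[k] = 1 if rng.random() < prob_on(net, T) else 0
    return s
```

Repeatedly calling update_node on randomly chosen nodes gives the asynchronous random update described above.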

Network Energy and Simulated Annealing
Energy is defined as in the Hopfield network: E = -Σ_{i<j} w_ij s_i s_j
Simulated annealing during relaxation
– Start with a high temperature T (more randomness and larger jumps)
– Progressively lower T while relaxing until equilibrium is reached
– Escapes local minima and speeds up learning
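
A one-line sketch of that energy, assuming a symmetric weight matrix with zero diagonal, 0/1 states, and no bias terms:

```python
import numpy as np

def energy(s, W):
    """Hopfield-style energy E = -1/2 * sum_ij w_ij s_i s_j (W symmetric, zero diagonal)."""
    return -0.5 * s @ W @ s
```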

Boltzmann Learning
Physical systems at thermal equilibrium obey the Boltzmann distribution: P(state) ∝ e^(-E(state)/T)
P+(V_α) = probability that the visible nodes V are in state α during training (clamped)
P-(V_α) = probability that V is in state α when running free
Goal: P-(V_α) ≈ P+(V_α)
What are the probabilities for all states assuming the following training set (goal stable states)?
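
For a network small enough to enumerate, the free-running probabilities P-(V_α) can be computed directly from the Boltzmann distribution. A sketch (free_visible_probs is a hypothetical helper; the visible units are assumed to be the first n_visible indices):

```python
import itertools
import numpy as np

def free_visible_probs(W, n_visible, T=1.0):
    """P-(V_alpha): equilibrium probability of each visible configuration, by brute-force
    enumeration of all 0/1 states of a small network (Boltzmann distribution)."""
    n = W.shape[0]
    probs = {}
    for state in itertools.product([0, 1], repeat=n):
        s = np.array(state)
        E = -0.5 * s @ W @ s                     # Hopfield-style energy
        v = state[:n_visible]                    # visible part of the state
        probs[v] = probs.get(v, 0.0) + np.exp(-E / T)
    Z = sum(probs.values())                      # partition function
    return {v: p / Z for v, p in probs.items()}
```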

Boltzmann Learning
Information gain G measures the difference between P+(V_α) and P-(V_α): G = Σ_α P+(V_α) ln[P+(V_α) / P-(V_α)]
G = 0 if the probabilities are the same, else positive
Thus we can derive a gradient descent algorithm for weight change by taking the partial derivative and moving in the negative gradient direction: ∂G/∂w_ij = -(1/T)(p+_ij - p-_ij), giving Δw_ij ∝ (p+_ij - p-_ij)
where p_ij = probability that node i and node j simultaneously output 1 when in equilibrium (p+_ij with the visible nodes clamped, p-_ij when free)
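
A sketch of both quantities, assuming P+ and P- are given as arrays over visible states and p+_ij, p-_ij as co-occurrence matrices (all names illustrative):

```python
import numpy as np

def information_gain(P_plus, P_minus):
    """G = sum_alpha P+(alpha) * ln(P+(alpha) / P-(alpha)); zero when the two distributions match."""
    P_plus, P_minus = np.asarray(P_plus, float), np.asarray(P_minus, float)
    mask = P_plus > 0
    return float(np.sum(P_plus[mask] * np.log(P_plus[mask] / P_minus[mask])))

def weight_update(p_plus, p_minus, C=2.0):
    """Delta w_ij = C * (p+_ij - p-_ij), from the clamped and free co-occurrence matrices."""
    return C * (np.asarray(p_plus) - np.asarray(p_minus))
```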

Network Relaxation/Annealing
A network time step is a period in which each node has updated approximately once
1. Initialize node activations (input)
– Hidden node activations are initialized randomly
– Visible nodes: all random; or a subset set to an initial state, the others random; or a subset clamped, the others set to random or initial states
2. Relax following an annealing schedule (an illustrative schedule appears in the sketch below)
3. *Gather statistics for m (e.g. 10) time steps: p_ij = #times_both_on / m
4. Set the final node state (output) to 1 if it was 1 during the majority of the m time steps (could also output the probability or net value)
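
A sketch of steps 1–4, reusing the logistic rule from above; the annealing schedule shown is only an illustrative placeholder, not the schedule used in the lecture:

```python
import numpy as np

# Illustrative annealing schedule: (temperature, time steps) pairs - an assumption, not the lecture's
DEFAULT_SCHEDULE = ((20.0, 2), (15.0, 2), (12.0, 2), (10.0, 4))

def relax_and_gather(W, clamped=None, m=10, schedule=DEFAULT_SCHEDULE, rng=None):
    """Anneal to (approximate) equilibrium, then gather p_ij and output states over m time steps."""
    rng = rng or np.random.default_rng()
    n = W.shape[0]
    clamped = clamped or {}                      # {node_index: fixed 0/1 value}
    s = rng.integers(0, 2, size=n)               # 1. random initial activations ...
    for idx, val in clamped.items():             # ... with the clamped subset held fixed
        s[idx] = val

    def sweep(T):                                # one time step: each free node updates ~once
        for k in rng.permutation(n):
            if k in clamped:
                continue
            net = W[k] @ s
            s[k] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-net / T)) else 0

    for T, steps in schedule:                    # 2. relax following the annealing schedule
        for _ in range(steps):
            sweep(T)

    counts = np.zeros_like(W, dtype=float)       # 3. gather statistics for m time steps
    on_counts = np.zeros(n)
    T_final = schedule[-1][0]
    for _ in range(m):
        sweep(T_final)
        counts += np.outer(s, s)
        on_counts += s
    p_ij = counts / m                            # fraction of steps both nodes were on
    output = (on_counts / m > 0.5).astype(int)   # 4. majority state over the m steps
    return p_ij, output
```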

Boltzmann Learning Algorithm
Until convergence (Δw < ε)
  For each pattern in the training set
    Clamp the pattern on all visible units
    Anneal several times, calculating p+_ij over m time steps
  end
  Average p+_ij over all patterns
  Unclamp all visible units
  Anneal several times, calculating p-_ij over m time steps
  Update weights: Δw_ij = C(p+_ij - p-_ij)
End
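
A compact sketch of this loop, reusing the hypothetical relax_and_gather helper sketched earlier (the constant C, tolerance ε, and repetition counts are illustrative assumptions):

```python
import numpy as np

def train_boltzmann(W, patterns, visible_idx, C=2.0, eps=1e-3, n_anneals=2, m=10, max_epochs=500):
    """patterns: list of 0/1 vectors over the visible units (indices in visible_idx)."""
    for _ in range(max_epochs):
        # Positive (clamped) phase: average p+_ij over all training patterns
        p_plus = np.zeros_like(W)
        for pat in patterns:
            clamp = {i: v for i, v in zip(visible_idx, pat)}
            for _ in range(n_anneals):
                p_ij, _ = relax_and_gather(W, clamped=clamp, m=m)
                p_plus += p_ij
        p_plus /= len(patterns) * n_anneals

        # Negative (free) phase: p-_ij with all visible units unclamped
        p_minus = np.zeros_like(W)
        for _ in range(len(patterns) * n_anneals):
            p_ij, _ = relax_and_gather(W, clamped=None, m=m)
            p_minus += p_ij
        p_minus /= len(patterns) * n_anneals

        dW = C * (p_plus - p_minus)
        np.fill_diagonal(dW, 0.0)                # no self-connections
        W += dW
        if np.max(np.abs(dW)) < eps:             # convergence: all weight changes below epsilon
            break
    return W
```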

4-2-4 Simple Encoder Example
Map a single active input node to the corresponding single output node
Requires ≥ log2(n) hidden nodes
1. Anneal and gather p+_ij for each pattern twice (10 time steps for the gather). Noise: flip a 1 to 0 with probability 0.15 and a 0 to 1 with probability 0.05. Annealing schedule:
2. Anneal and gather p-_ij in the free state an equal number of times
3. Δw_ij = 2(p+_ij - p-_ij)
Average: 110 training cycles
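
As a usage illustration only (again assuming the hypothetical helpers above, and omitting the input-noise step), the 4-2-4 encoder setup might look like:

```python
import numpy as np

n_visible, n_hidden = 8, 2                      # 4 input + 4 output visible units, 2 hidden
n = n_visible + n_hidden
rng = np.random.default_rng(0)

W = rng.normal(0.0, 0.1, size=(n, n))
W = (W + W.T) / 2                               # symmetric weights
np.fill_diagonal(W, 0.0)                        # no self-connections

# Training patterns: the one-hot input pattern repeated on the output units
patterns = [np.concatenate([p, p]) for p in np.eye(4, dtype=int)]
visible_idx = list(range(n_visible))

W = train_boltzmann(W, patterns, visible_idx, C=2.0)
```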

4-2-4 Encoder weights before and after training
Note the common recursive weight representation
What is the network topology?

Shifting network, ~9000 cycles
Note no explicit I/O directionality

Boltzmann Learning
But can this Boltzmann algorithm learn the XOR function?
It has hidden nodes
But the weight updates are first order (a la the perceptron learning rule)

Boltzmann Summary
Stochastic relaxation – escapes local minima and improves learning speed
Hidden nodes and a learning algorithm – an improvement over the Hopfield network
The learning algorithm is slow, and it needs to be extended to learn higher-order interactions
A different way of thinking about learning – creating a probabilistic environment to match the goals
Deep learning will use the Boltzmann machine (particularly the restricted Boltzmann machine) as a key component