The Helmholtz Machine (P. Dayan, G. E. Hinton, R. M. Neal, R. S. Zemel)

The Helmholtz Machine, P. Dayan, G. E. Hinton, R. M. Neal, R. S. Zemel. Computational Modeling of Intelligence, 11.04.22 (Fri). Summarized by Joon Shik Kim.

Abstract Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model. Each pattern can be generated in exponentially many ways, and we describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations.

Introduction (1/3) Following Helmholtz, we view the human perceptual system as a statistical inference engine whose function is to infer the probable causes of sensory input. A recognition model is used to infer a probability distribution over the underlying causes from the sensory input.

Introduction (2/3) A separate generative model is used to train the recognition model. Figure 1: Shift patterns.

Introduction (3/3) First, the direction of the shift was chosen. Each pixel in the bottom row was then turned on (white) with probability 0.2, the top row was made as a copy of the bottom row shifted in the chosen direction, and the middle rows were made as copies of these two rows. If we treat the top two rows as a left retina and the bottom two rows as a right retina, detecting the direction of the shift resembles the task of extracting depth from simple stereo images. The extra redundancy facilitates the search for the correct generative model.
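A minimal sketch of how such shift patterns could be generated, written in Python/NumPy; the image width, the single-pixel shift, and the wraparound at the edges are illustrative assumptions, not details from the paper.

import numpy as np

def make_shift_pattern(width=8, p_on=0.2, rng=np.random.default_rng()):
    # One 4 x width binary pattern: bottom row random, top row a shifted copy,
    # middle rows duplicates of those two rows (extra redundancy).
    shift = rng.choice([-1, 1])                        # direction of the shift
    bottom = (rng.random(width) < p_on).astype(int)    # pixels on with probability 0.2
    top = np.roll(bottom, shift)                       # shifted copy (wraparound assumed)
    return np.stack([top, top, bottom, bottom]), shift

pattern, shift = make_shift_pattern()
print(shift)
print(pattern)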

The Recognition Distribution (1/6) The log probability of generating a particular example, d, from a model with parameters θ is
log P(d|θ) = log Σ_α P(α|θ) P(d|α, θ),
where the α range over configurations of the hidden causes. We call each possible set of causes an explanation of the pattern.

The Recognition Distribution (2/6) We define the energy of explanation α to be
E_α(d; θ) = −log[ P(α|θ) P(d|α, θ) ].
The posterior distribution of an explanation given d and θ is then the Boltzmann distribution
P(α|d, θ) = exp(−E_α(d; θ)) / Σ_α′ exp(−E_α′(d; θ)).

The Recognition Distribution (3/6) The Helmholtz free energy is
F(d; θ) = −log Σ_α exp(−E_α(d; θ)) = −log P(d|θ).
Any probability distribution Q over the explanations will have at least as high a free energy as the Boltzmann distribution, so the log probability of the data can be written as
log P(d|θ) = −F(d; θ, Q) + Σ_α Q(α) log[ Q(α) / P(α|d, θ) ],
where F(d; θ, Q) = Σ_α Q(α) E_α(d; θ) + Σ_α Q(α) log Q(α) is the free energy based on the incorrect, or nonequilibrium, posterior Q.

The Recognition Distribution (4/6) The last term in the equation above is the Kullback-Leibler divergence between Q(α) and the true posterior P(α|d, θ); it is non-negative and zero only when the two distributions coincide. The distribution Q is produced by a separate recognition model with its own parameters, Φ. These parameters are optimized at the same time as the parameters of the generative model, θ, to maximize the overall fit, i.e. the negative free energy −F(d; θ, Φ) summed over the training examples.
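To make the bound concrete, here is a small numerical check (a sketch with invented toy probabilities) that −F(d; θ, Q) + KL(Q ‖ P(α|d, θ)) equals log P(d|θ), so that −F is a lower bound on the log probability which is tight exactly when Q is the true posterior.

import numpy as np

# Toy joint probabilities P(alpha, d | theta) over four explanations for one fixed d.
p_joint = np.array([0.05, 0.10, 0.02, 0.08])          # invented numbers
log_p_d = np.log(p_joint.sum())                       # log P(d | theta)
posterior = p_joint / p_joint.sum()                   # Boltzmann distribution P(alpha | d, theta)
energy = -np.log(p_joint)                             # E_alpha(d; theta)

def free_energy(q):
    # F(d; theta, Q) = sum_a Q(a) E_a + sum_a Q(a) log Q(a)
    return np.dot(q, energy) + np.dot(q, np.log(q))

q = np.array([0.4, 0.3, 0.2, 0.1])                    # an arbitrary (wrong) recognition distribution
kl = np.dot(q, np.log(q / posterior))                 # KL(Q || P(alpha | d, theta))

print(np.isclose(log_p_d, -free_energy(q) + kl))      # True: log P = -F + KL
print(-free_energy(q) <= log_p_d)                     # True: -F is a lower bound
print(np.isclose(-free_energy(posterior), log_p_d))   # True: the bound is tight at the posterior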

The Recognition Distribution (5/6) Figure 2: Graphical view of our approximation.

The Recognition Distribution (6/6)

The Deterministic Helmholtz Machine (1/5) The Helmholtz machine is a connectionist system with multiple layers of neuron-like binary stochastic processing units connected hierarchically by two sets of weights. The top-down connections θ implement the generative model; the bottom-up connections Φ implement the recognition model.

The Deterministic Helmholtz Machine (2/5) The key simplifying assumption is that the recognition distribution for a particular example d, Q(Φ, d), is factorial (separable) within each layer. Suppose there are h stochastic binary units in layer l. Q(Φ, d) assumes that the activity of any one unit in layer l is independent of the activities of all the other units in that layer, so the recognition model need only specify h probabilities rather than 2^h − 1.
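For instance, with h = 3 units a full joint distribution over the 2^3 binary states needs 2^3 − 1 = 7 independent numbers, whereas a factorial distribution needs only the 3 per-unit probabilities. A short sketch (values invented):

import numpy as np

q = np.array([0.9, 0.2, 0.7])      # h = 3 per-unit "on" probabilities

def factorial_prob(state, q):
    # Probability of a joint binary state under a factorial distribution.
    state = np.asarray(state)
    return np.prod(np.where(state == 1, q, 1.0 - q))

states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(sum(factorial_prob(s, q) for s in states))   # 1.0: the 2^h probabilities sum to one
print(factorial_prob((1, 0, 1), q))                # 0.9 * 0.8 * 0.7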

The Deterministic Helmholtz Machine (3/5) With the factorial assumption, the free energy F(d; θ, Φ) can be written in terms of the per-unit recognition probabilities q_j^l for each unit j in layer l, where
q_j^l = σ( Σ_k φ_kj^l q_k^(l−1) )
is computed in a single bottom-up pass and σ(x) = 1/(1 + exp(−x)) is the logistic function.
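A minimal sketch of that bottom-up pass, assuming one recognition weight matrix per layer and no biases (both simplifications are mine):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_pass(d, phi):
    # Factorial recognition probabilities q for each layer, computed bottom-up.
    # d   : binary data vector (the lowest layer).
    # phi : list of bottom-up weight matrices; phi[l] maps layer l to layer l+1.
    q = [d.astype(float)]
    for W in phi:
        q.append(sigmoid(q[-1] @ W))   # q_j^{l+1} = sigma(sum_k q_k^l phi_kj)
    return q

rng = np.random.default_rng(0)
phi = [rng.normal(0, 0.1, (32, 8)), rng.normal(0, 0.1, (8, 4))]
qs = recognition_pass(rng.integers(0, 2, 32), phi)
print([layer.shape for layer in qs])   # [(32,), (8,), (4,)]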

The Deterministic Helmholtz Machine (4/5)

The Deterministic Helmholtz Machine (5/5) One could perform stochastic gradient ascent in the negative free energy across all the data using the above expression together with a form of the REINFORCE algorithm.
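As a rough illustration of the REINFORCE idea (the score-function gradient estimator), the sketch below does stochastic gradient ascent on the expectation of an arbitrary objective over independent stochastic binary units; the objective, sizes, and learning rate are invented, and this is not the paper's exact update.

import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(5)                      # parameters of 5 independent binary units
target = np.array([1, 0, 1, 1, 0])

def f(s):
    return -np.sum((s - target) ** 2)     # an arbitrary objective to be maximized in expectation

for step in range(2000):
    q = 1.0 / (1.0 + np.exp(-logits))     # firing probabilities
    s = (rng.random(5) < q).astype(float) # sample binary states
    grad = f(s) * (s - q)                 # REINFORCE: f(s) * d log q(s) / d logits
    logits += 0.1 * grad                  # noisy but unbiased gradient ascent

print(np.round(1.0 / (1.0 + np.exp(-logits)), 2))   # probabilities drift toward the target pattern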

The Wake-Sleep Algorithm (1/3) Borrowing an idea from the Boltzmann machine, we get the wake-sleep algorithm, which is a very simple learning scheme for layered networks of stochastic binary units. During the wake phase, data d from the world are presented at the lowest layer and binary activations of units at successively higher layers are picked according to the recognition probabilities.

The Wake-Sleep Algorithm (2/3) The top-down generative weights from layer l+1 to l are then altered to reduce the Kullback-Leibler divergence between actual activations and the generated probabilities. In the sleep phase, the recognition weights are turned off and the top-down weights are used to activate the units. The network thus generates a random instance from its generative model.

The Wake-Sleep Algorithm (3/3) Since the network generated the instance itself, it knows the true underlying causes and therefore has available the target values for the hidden units that are required to train the bottom-up weights. Unfortunately, there is no single cost function that is reduced by these two procedures.
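The two phases can be put together in a short sketch; the layer sizes, learning rate, logistic units without biases, the crude uniform top-layer prior, and the delta-rule updates are illustrative assumptions rather than details taken verbatim from the paper.

import numpy as np

rng = np.random.default_rng(0)
sizes = [32, 8, 4]                        # layer sizes, bottom (data) to top
lr = 0.05

# Bottom-up recognition weights phi[l]: layer l -> layer l+1.
phi = [rng.normal(0, 0.1, (sizes[l], sizes[l + 1])) for l in range(len(sizes) - 1)]
# Top-down generative weights theta[l]: layer l+1 -> layer l.
theta = [rng.normal(0, 0.1, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def wake_phase(d):
    # Drive the network bottom-up with the recognition weights, sampling binary states,
    # then train the generative weights to reproduce each layer from the layer above.
    s = [d.astype(float)]
    for W in phi:
        s.append(sample(sigmoid(s[-1] @ W)))
    for l in range(len(theta)):
        p = sigmoid(s[l + 1] @ theta[l])                 # top-down prediction of layer l
        theta[l] += lr * np.outer(s[l + 1], s[l] - p)    # delta rule

def sleep_phase():
    # Fantasize top-down from the generative model, then train the recognition weights
    # to recover the known causes from the fantasy data.
    s = [sample(np.full(sizes[-1], 0.5))]                # crude top-layer prior
    for l in reversed(range(len(theta))):
        s.insert(0, sample(sigmoid(s[0] @ theta[l])))
    for l in range(len(phi)):
        q = sigmoid(s[l] @ phi[l])                       # bottom-up prediction of layer l+1
        phi[l] += lr * np.outer(s[l], s[l + 1] - q)

for _ in range(100):
    wake_phase(rng.integers(0, 2, sizes[0]).astype(float))
    sleep_phase()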

Discussion (1/2) Instead of using a fixed, independent prior distribution for each of the hidden units in a layer, the Helmholtz machine makes this prior more flexible by deriving it from the top-down activities of units in the layer above. The old idea of analysis-by-synthesis assumes that the cortex contains a generative model of the world and that recognition involves inverting the generative model in real time.

Discussion (2/2) An interesting first step toward interaction within layers would be to organize their units into small clusters with local excitation and longer-range inhibition, as is seen in the columnar structure of the brain.