Wake-Sleep algorithm for Representational Learning


1 Wake-Sleep algorithm for Representational Learning
Hamid Reza Maei, Physiol. & Neurosci. Program, University of Toronto

2 Motivation
The brain is able to learn the underlying representation of its input data (e.g. images) in an unsupervised manner. This is a challenge for neural networks, because: 1. they need a specific teacher to supply the desired output, and 2. they need to train all the connections. The wake-sleep algorithm avoids these two problems. [Figure: layered network with data layer d, recognition weights R1, R2 going up and generative weights G1, G2 going down between the visible (V) and hidden (H) layers.]

3 Logistic belief network
[Figure: two-layer belief network with units x_i in layer X, units y_j in layer Y, generative weights G_ij from X to Y and generative biases g_j, feeding the data layer d.] Each binary unit turns on with a probability given by the logistic function of its top-down input, p(y_j = 1 | x) = sigmoid(g_j + sum_i x_i G_ij). Advantage: the conditional distributions are factorial: given the states of the layer above, the units within a layer are conditionally independent. A sampling sketch follows below.
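To make the factorial conditionals concrete, here is a minimal sketch (not from the slides; the layer sizes and variable names are illustrative assumptions) of sampling one layer of a logistic belief network top-down with numpy:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_layer_topdown(x, G, g, rng):
    """Sample binary units y given parent states x.

    p(y_j = 1 | x) = sigmoid(g_j + sum_i x_i G_ij); the units y_j are
    conditionally independent (factorial) given x.
    """
    p = sigmoid(g + x @ G)
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
G = rng.normal(0, 0.1, size=(1, 8))   # top layer (1 unit) -> middle layer (8 units)
g = np.zeros(8)                       # generative biases of the middle layer
x = np.array([1.0])                   # state of the top unit
y = sample_layer_topdown(x, G, g, rng)
print(y)
```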

4 Learning Generative weights
Inference of P(h|d; G) is intractable. Explaining away (the Sprinkler / Rain / Wet example): given that the grass is wet, Sprinkler and Rain are conditionally dependent, so the posterior over the hidden causes is not factorial. Though it is very crude, let's approximate P(h|d; G) with a factorial distribution Q(h|d; R), parameterized by a separate set of recognition weights R. A numeric illustration of explaining away follows below.
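A tiny numeric illustration of explaining away (the probabilities are hypothetical choices, not from the slides): once we observe Wet, learning that it Rained lowers the probability that the Sprinkler was on, so the two causes are dependent given the evidence.

```python
from itertools import product

# Hypothetical prior and conditional probabilities for the Sprinkler/Rain/Wet net.
p_sprinkler = 0.3
p_rain = 0.2
def p_wet(s, r):  # P(Wet=1 | Sprinkler=s, Rain=r)
    return {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}[(s, r)]

def joint(s, r, w):
    ps = p_sprinkler if s else 1 - p_sprinkler
    pr = p_rain if r else 1 - p_rain
    pw = p_wet(s, r) if w else 1 - p_wet(s, r)
    return ps * pr * pw

# P(Sprinkler=1 | Wet=1) vs. P(Sprinkler=1 | Wet=1, Rain=1)
p_w = sum(joint(s, r, 1) for s, r in product((0, 1), repeat=2))
p_s_given_w = sum(joint(1, r, 1) for r in (0, 1)) / p_w
p_rw = sum(joint(s, 1, 1) for s in (0, 1))
p_s_given_rw = joint(1, 1, 1) / p_rw
print(p_s_given_w, p_s_given_rw)  # observing Rain "explains away" the Sprinkler
```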

5 Any guarantee for the improvement of learning?
YES! Using Jensen's inequality we find a lower bound on the log likelihood: log P(d; G) >= sum_h Q(h|d; R) log [ P(h, d; G) / Q(h|d; R) ] = -F(d; G, R), where F is the free energy. Equivalently, log P(d; G) = -F(d; G, R) + KL( Q(h|d; R) || P(h|d; G) ). Thus, decreasing the free energy increases the lower bound and therefore pushes up the log likelihood. This leads to the wake phase.
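As a sanity check on the bound, here is a small sketch (my own toy construction, not from the talk) that builds an arbitrary joint P(h, d) over two binary hidden units and one binary visible unit, picks an arbitrary factorial Q(h|d), and verifies numerically that log P(d) = -F + KL(Q || P(h|d)) >= -F:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy joint P(h1, h2, d) over two binary hidden units and one binary visible unit.
P_joint = rng.random((2, 2, 2))
P_joint /= P_joint.sum()

d = 1                                 # observed data value
P_hd = P_joint[:, :, d]               # P(h, d)
P_d = P_hd.sum()                      # marginal likelihood P(d)
P_post = P_hd / P_d                   # true posterior P(h | d)

# Factorial recognition distribution Q(h | d) = q1(h1) * q2(h2).
q1, q2 = 0.7, 0.4
Q = np.outer([1 - q1, q1], [1 - q2, q2])

free_energy = np.sum(Q * (np.log(Q) - np.log(P_hd)))   # F(d; G, R)
kl_q_p = np.sum(Q * (np.log(Q) - np.log(P_post)))      # KL(Q || P(h|d))

print(np.log(P_d), -free_energy + kl_q_p)  # equal
print(np.log(P_d) >= -free_energy)         # the lower bound holds
```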

6 Wake phase
The true posterior is replaced by Q(h|d; R). Reminder: get samples (x°, y°) of the hidden units from the factorial distribution Q(h|d; R) in a bottom-up pass, then use these samples in the generative model to change the generative weights (a delta rule; a sketch follows below).
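A minimal sketch of the wake-phase update for one pair of layers (the shapes, names, and learning rate are illustrative assumptions): the hidden states come from a bottom-up recognition sample, and the generative weights move the top-down prediction toward the states that were actually sampled.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def wake_update(x, y, G, g, eps=0.1):
    """Delta rule for generative weights G (layer X above -> layer Y below).

    x, y: binary states sampled bottom-up by the recognition model.
    The generative prediction p = sigmoid(g + x G) is moved toward y.
    """
    p = sigmoid(g + x @ G)
    G += eps * np.outer(x, y - p)
    g += eps * (y - p)
    return G, g

rng = np.random.default_rng(0)
x = (rng.random(8) < 0.5).astype(float)    # recognition sample in layer X
y = (rng.random(16) < 0.5).astype(float)   # recognition sample in layer Y
G = rng.normal(0, 0.1, size=(8, 16)); g = np.zeros(16)
G, g = wake_update(x, y, G, g)
```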

7 Learning recognition weights
Taking the derivative of the free energy with respect to R gives complicated results that are computationally intractable. What should be done? Switch the two distributions inside the KL divergence (KL is not a symmetric function!), i.e. use KL(P, Q) instead of KL(Q, P) in the free energy, and change the recognition weights to minimize this swapped free energy. This leads to the sleep phase.
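The asymmetry is easy to see numerically; a small sketch with two hypothetical discrete distributions:

```python
import numpy as np

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

p = np.array([0.8, 0.1, 0.1])  # "true" posterior (hypothetical)
q = np.array([0.4, 0.3, 0.3])  # factorial-style approximation (hypothetical)
print(kl(q, p), kl(p, q))      # different values: KL is not symmetric
```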

8 Sleep phase
1. Get samples (x•, y•) generated by the generative model in a top-down pass, using "data" that comes from nowhere (a dream). 2. Change the recognition connections using the delta rule (a sketch follows below). Any guarantee for improvement in the sleep phase? NO! Sleep-phase approximation vs. wake-phase approximation: in the sleep phase we are minimizing KL(P, Q), which is wrong; in the wake phase we are minimizing KL(Q, P), which is the right thing to do.
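The mirror-image sketch for the sleep phase (same illustrative shapes and names as above): the states come from a top-down dream generated by the generative model, and the recognition weights move the bottom-up prediction toward the dreamed states.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sleep_update(y, x, R, r, eps=0.1):
    """Delta rule for recognition weights R (layer Y below -> layer X above).

    y, x: binary states produced top-down by the generative model ("dream").
    The recognition prediction q = sigmoid(r + y R) is moved toward x.
    """
    q = sigmoid(r + y @ R)
    R += eps * np.outer(y, x - q)
    r += eps * (x - q)
    return R, r

rng = np.random.default_rng(0)
x = (rng.random(8) < 0.5).astype(float)    # dreamed state of the upper layer X
y = (rng.random(16) < 0.5).astype(float)   # dreamed state of the lower layer Y
R = rng.normal(0, 0.1, size=(16, 8)); r = np.zeros(8)
R, r = sleep_update(y, x, R, r)
```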

9 The wake-sleep algorithm
1. Wake phase: -Use the recognition weights to perform a bottom-up pass, creating samples for the layers above from the data. -Train the generative weights using the samples obtained from the recognition model. 2. Sleep phase: -Use the generative weights to reconstruct data by performing a top-down pass. -Train the recognition weights using the samples obtained from the generative model. [Figure: layered network with data layer d, recognition weights R1, R2 and generative weights G1, G2.] What is the wake-sleep algorithm really trying to achieve? It turns out that its goal is to learn representations that are economical to describe; we can make this precise using Shannon's coding theory. A sketch of the full training loop follows below.
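Putting the two phases together, here is a compact sketch of the full loop for a network with a single hidden layer (the layer sizes, learning rate, and toy data are my own illustrative choices, not the network from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample(p, rng):
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
n_vis, n_hid, eps = 16, 8, 0.05
R = np.zeros((n_vis, n_hid)); r = np.zeros(n_hid)   # recognition weights / biases
G = np.zeros((n_hid, n_vis)); g = np.zeros(n_vis)   # generative weights / biases
b = np.zeros(n_hid)                                 # generative bias of the hidden layer

data = (rng.random((500, n_vis)) < 0.3).astype(float)  # toy binary "images"

for epoch in range(10):
    for d in data:
        # Wake phase: recognition (bottom-up) sample, then train the generative model.
        h = sample(sigmoid(r + d @ R), rng)
        p_h = sigmoid(b)                    # generative prediction of the hidden layer
        b += eps * (h - p_h)
        p_d = sigmoid(g + h @ G)            # generative prediction of the data
        G += eps * np.outer(h, d - p_d)
        g += eps * (d - p_d)

        # Sleep phase: generative (top-down) dream, then train the recognition model.
        h_dream = sample(sigmoid(b), rng)
        d_dream = sample(sigmoid(g + h_dream @ G), rng)
        q_h = sigmoid(r + d_dream @ R)      # recognition prediction of the dreamed hidden state
        R += eps * np.outer(d_dream, h_dream - q_h)
        r += eps * (h_dream - q_h)
```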

10 Simple example
Training: 1. For 4x4 images, we use a belief network with one visible layer and two hidden layers (binary neurons): -The visible layer has 16 neurons. -The first hidden layer (8 neurons) represents which individual bars are present. -The top hidden layer (1 neuron) decides between vertical and horizontal bars. 2. The network was trained on 2x10^6 random examples. Hinton et al., Science (1995). A data-generation sketch follows below.
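A sketch of how 4x4 bar images of this kind can be generated; the bar probability and other details are illustrative assumptions and may differ from the exact process used in Hinton et al. (1995):

```python
import numpy as np

def random_bar_image(rng, p_bar=0.5):
    """Generate one 4x4 binary image of horizontal OR vertical bars.

    The orientation is chosen uniformly, then each of the 4 bars of that
    orientation is switched on independently with probability p_bar
    (illustrative value).
    """
    img = np.zeros((4, 4))
    bars = rng.random(4) < p_bar
    if rng.random() < 0.5:
        img[bars, :] = 1.0   # horizontal bars: whole rows
    else:
        img[:, bars] = 1.0   # vertical bars: whole columns
    return img.ravel()       # 16-dimensional input vector

rng = np.random.default_rng(0)
dataset = np.stack([random_bar_image(rng) for _ in range(10)])  # the talk used 2x10^6 examples
print(dataset[0].reshape(4, 4))
```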

11 Wake-sleep algorithm on the 20 Newsgroups data set
-It contains about 20,000 articles. -Many categories fall into overlapping clusters. -We used a tiny version of this data set, with the binary occurrence of 100 words across postings, which can be divided into 4 classes: comp.*, sci.*, rec.*, talk.*

12 Training
Visible layer: 100 visible units. First hidden layer: 50 hidden units. Second hidden layer: hidden units in the top layer. For training we used 60% of the data (9,745 training examples) and kept the rest for testing the model (6,497 testing examples).

13 Just for fun!
Performance for the model trained on comp.* (class 1): 'windows', 'win', 'video', 'card', 'dos', 'memory', 'program', 'ftp', 'help', 'system'. Performance for the model trained on talk.* (class 4): 'god', 'bible', 'jesus', 'question', 'christian', 'israel', 'religion', 'card', 'jews'; 'world', 'jews', 'war', 'religion', 'god', 'jesus', 'christian', 'israel', 'children', 'food'.

14 Testing (classification)
Learn two separate wake-sleep models on the two classes 1 and 4, i.e. comp.* and talk.* respectively. Present examples from classes 1 and 4 to each of the two learned models and compute the free energy as a score under each model (see the sketch below). [Figure: free-energy scores of examples from class 1 and class 4 under the model trained on comp.* and under the model trained on talk.*]
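A sketch of the scoring step under stated assumptions: for each trained model we estimate the free energy of a document by sampling its hidden units from the recognition distribution, and assign the document to the model with the lower free energy. The one-hidden-layer model, the Monte-Carlo estimator, and the helper names are my own illustrative choices; the slides do not specify the exact scoring procedure.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bernoulli_loglik(s, p):
    """Sum of log Bernoulli probabilities of binary states s under probabilities p."""
    return float(np.sum(s * np.log(p) + (1 - s) * np.log(1 - p)))

def free_energy(d, model, rng, n_samples=20):
    """Monte-Carlo estimate of F(d) = E_Q[log Q(h|d) - log P(h, d)] for a
    one-hidden-layer wake-sleep model (a dict of R, r, G, g, b as in the loop above)."""
    R, r, G, g, b = (model[k] for k in ("R", "r", "G", "g", "b"))
    q = sigmoid(r + d @ R)                  # factorial recognition posterior
    total = 0.0
    for _ in range(n_samples):
        h = (rng.random(q.shape) < q).astype(float)
        log_q = bernoulli_loglik(h, q)
        log_p = bernoulli_loglik(h, sigmoid(b)) + bernoulli_loglik(d, sigmoid(g + h @ G))
        total += log_q - log_p
    return total / n_samples

def classify(d, model_comp, model_talk, rng):
    # Lower free energy = the document is better explained by that model.
    return "comp.*" if free_energy(d, model_comp, rng) < free_energy(d, model_talk, rng) else "talk.*"
```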

15 Naïve Bayes classifier
Assumptions: P(c_j) is the frequency of each class in the training examples (9,745); conditional independence of the words given the class. Use Bayes' rule and learn the model parameters by maximum likelihood (e.g. for classes 1 and 4). Prediction on the test examples: present a test example from class 1 or 4 to the trained model and predict which class it belongs to; this gives about 80% correct predictions. Most probable words in each class: comp.*: 'windows', 'help', 'problem', 'system', 'computer', 'software', 'program', 'university', 'drive'. talk.*: 'fact', 'god', 'government', 'question', 'world', 'christian', 'case', 'course', 'state', 'jews'. McCallum et al. (1998). A minimal naive-Bayes sketch follows below.
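For comparison, a minimal Bernoulli naive-Bayes sketch over the same kind of binary word-occurrence vectors; the Laplace smoothing, toy data, and variable names are my own choices:

```python
import numpy as np

def train_bernoulli_nb(X, y):
    """X: (n_docs, n_words) binary matrix; y: class labels (0/1).
    Returns class priors and per-class word probabilities with Laplace smoothing."""
    priors, word_probs = [], []
    for c in (0, 1):
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))
        word_probs.append((Xc.sum(axis=0) + 1) / (len(Xc) + 2))
    return np.array(priors), np.array(word_probs)

def predict(X, priors, word_probs):
    # log P(c) + sum_w [ x_w log p_cw + (1 - x_w) log(1 - p_cw) ]  (conditional independence)
    log_post = np.log(priors) + X @ np.log(word_probs.T) + (1 - X) @ np.log(1 - word_probs.T)
    return log_post.argmax(axis=1)

rng = np.random.default_rng(0)
X = (rng.random((200, 100)) < 0.1).astype(float)  # toy binary word-occurrence data
y = (rng.random(200) < 0.5).astype(int)           # toy labels for the two classes
priors, word_probs = train_bernoulli_nb(X, y)
print((predict(X, priors, word_probs) == y).mean())  # training accuracy on the toy data
```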

16 Conclusion
Wake-sleep is an unsupervised learning algorithm. The higher hidden layers store the learned representations. Although we used very crude approximations, it works very well on some realistic data. Wake-sleep tries to make the representation economical to describe (Shannon's coding theory).

17 Flaws of wake-sleep algorithm
The sleep phase has horrible assumptions (although it worked!): -It minimizes KL(P||Q) rather than KL(Q||P). -The recognition weights are trained not from data space but from dream space! *Variational approximations.

18 Using complementary priors to eliminate explaining away
1. Because of explaining away, inference in a belief network is hard; a complementary prior removes the correlations it creates in the hidden layers. Do complementary priors exist? A very hard question, and not obvious! But it is possible to remove the effect of explaining away using an infinite layered architecture with tied weights (layers V0, H0, V1, H1, ... connected by G and G^T), which is equivalent to a Restricted Boltzmann Machine: inference is very easy because the conditional distributions are factorial (see the sketch below). Hinton et al., Neural Computation (2006); Hinton et al., Science (2006).
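To illustrate why inference in an RBM is easy: given the visible vector, the hidden units are conditionally independent, so the exact posterior is a single sigmoid per unit. A minimal sketch with illustrative shapes:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_vis, n_hid = 16, 8
W = rng.normal(0, 0.1, size=(n_vis, n_hid))  # symmetric weights (G and G^T tied)
b_hid = np.zeros(n_hid)

v = (rng.random(n_vis) < 0.5).astype(float)  # an observed visible vector
p_h = sigmoid(b_hid + v @ W)                 # exact factorial posterior p(h_j = 1 | v)
h = (rng.random(n_hid) < p_h).astype(float)  # one posterior sample, no iterative inference needed
print(p_h)
```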

