Stacking RBMs and Auto-encoders for Deep Architectures
References: [Bengio, 2009], [Vincent et al., 2008]
2011/03/03, 강병곤

2 Introduction
Deep architectures learn representations at several levels of abstraction
–Representations are learned implicitly
–Training proceeds layer by layer, without supervision
Generative model
–Stack Restricted Boltzmann Machines (RBMs)
–The result is a deep belief network (DBN)
Discriminative model
–Stack auto-encoders (AEs)
–The result is a multi-layer classifier

3 Generative Model
Given a training set $\{x^{(i)}\}_{i=1}^{n}$:
–Construct a generative model that produces samples from the same distribution
–Start with sigmoid belief networks
The top-most layer has no parents, so each of its units needs its own parameter: a Bernoulli prior

4 Deep Belief Network
Same as a sigmoid belief network, but with a different top-layer structure
–Use an RBM to model the top two layers, $P(h^{l-1}, h^l)$
Restricted Boltzmann Machine (more on the next slide):
–Divided into a hidden layer and a visible layer (2 levels)
–The connections form a bipartite graph
It is called restricted because there are no connections among units in the same layer

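5 Restricted Boltzmann Machines
Energy-based model for the joint distribution of the visible units $x$ and the hidden units $h$:
$$E(x, h) = -b^\top x - c^\top h - h^\top W x, \qquad P(x, h) = \frac{e^{-E(x, h)}}{Z}, \qquad Z = \sum_{x, h} e^{-E(x, h)}$$
–Or, summing out $h$, express it as a distribution of the visible variable:
$$P(x) = \frac{1}{Z} \sum_h e^{-E(x, h)}$$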
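
A quick numerical check of these definitions, as a Python sketch. It assumes binary units and a hypothetical RBM with 3 visible and 2 hidden units whose parameter values are arbitrary; it enumerates every configuration to get $Z$ exactly and confirms that $P(x)$ sums to 1. It also uses the standard free energy $F(x) = -b^\top x - \sum_i \log(1 + e^{c_i + W_i x})$, for which $P(x) = e^{-F(x)}/Z$. Enumeration only works at toy sizes, which is why $Z$ becomes the obstacle during training.

```python
import itertools
import math

# Hypothetical toy RBM (arbitrary values): 3 binary visible units, 2 binary hidden units.
# E(x, h) = -b.x - c.h - h^T W x, where W has one row per hidden unit.
W = [[0.5, -1.0, 0.3],
     [0.2, 0.4, -0.7]]
b = [0.1, -0.2, 0.3]   # visible biases
c = [-0.5, 0.6]        # hidden biases

def energy(x, h):
    e = -sum(bj * xj for bj, xj in zip(b, x))
    e -= sum(ci * hi for ci, hi in zip(c, h))
    e -= sum(h[i] * sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(h)))
    return e

def free_energy(x):
    # F(x) = -b.x - sum_i log(1 + exp(c_i + W_i x)): h summed out analytically
    fe = -sum(bj * xj for bj, xj in zip(b, x))
    for i in range(len(c)):
        fe -= math.log1p(math.exp(c[i] + sum(W[i][j] * x[j] for j in range(len(x)))))
    return fe

visible = list(itertools.product([0, 1], repeat=3))
hidden = list(itertools.product([0, 1], repeat=2))

# Partition function: exp(-E) summed over every joint configuration (x, h).
Z = sum(math.exp(-energy(x, h)) for x in visible for h in hidden)

def p_visible(x):
    # Marginal P(x) = sum_h exp(-E(x, h)) / Z
    return sum(math.exp(-energy(x, h)) for h in hidden) / Z

total = sum(p_visible(x) for x in visible)
print(round(total, 6))  # 1.0
```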

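6 RBMs (Cont’d)
How posteriors factorize: notice that the energy has the form
$$E(x, h) = -\beta(x) + \sum_i \gamma_i(x, h_i), \qquad \beta(x) = b^\top x, \quad \gamma_i(x, h_i) = -h_i\,(c_i + W_i x)$$
where $W_i$ is the $i$-th row of $W$
–Then
$$P(h \mid x) = \frac{\prod_i e^{-\gamma_i(x, h_i)}}{\prod_i \sum_{\tilde h_i} e^{-\gamma_i(x, \tilde h_i)}} = \prod_i P(h_i \mid x)$$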
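
The factorization can be checked by brute force. This Python sketch assumes a hypothetical toy RBM with binary units and arbitrary parameters; it computes $P(h \mid x)$ once by normalizing $e^{-E(x,h)}$ over all hidden vectors and once as a product of per-unit sigmoid terms, and the two agree for every $(x, h)$.

```python
import itertools
import math

# Hypothetical toy RBM: 3 visible units, 2 hidden units, arbitrary parameter values.
W = [[0.5, -1.0, 0.3],
     [0.2, 0.4, -0.7]]
b = [0.1, -0.2, 0.3]
c = [-0.5, 0.6]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def energy(x, h):
    wx = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(h))]
    return (-sum(bj * xj for bj, xj in zip(b, x))
            - sum(ci * hi for ci, hi in zip(c, h))
            - sum(hi * wxi for hi, wxi in zip(h, wx)))

def posterior_bruteforce(h, x):
    # P(h | x) = exp(-E(x, h)) / sum_h' exp(-E(x, h')), enumerating every h'
    norm = sum(math.exp(-energy(x, hp)) for hp in itertools.product([0, 1], repeat=len(c)))
    return math.exp(-energy(x, h)) / norm

def posterior_factorized(h, x):
    # P(h | x) = prod_i P(h_i | x), with P(h_i = 1 | x) = sigm(c_i + W_i x)
    p = 1.0
    for i, hi in enumerate(h):
        q = sigmoid(c[i] + sum(W[i][j] * x[j] for j in range(len(x))))
        p *= q if hi == 1 else 1.0 - q
    return p

x = (1, 0, 1)
for h in itertools.product([0, 1], repeat=2):
    print(h, round(posterior_bruteforce(h, x), 6), round(posterior_factorized(h, x), 6))
```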

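7 More on Posteriors
Using the same factorization trick, we can compute the posterior of each binary hidden unit:
$$P(h_i = 1 \mid x) = \frac{e^{c_i + W_i x}}{1 + e^{c_i + W_i x}} = \operatorname{sigm}(c_i + W_i x)$$
–The posterior on the visible units is derived the same way: $P(x_j = 1 \mid h) = \operatorname{sigm}\bigl(b_j + \sum_i W_{ij} h_i\bigr)$
Due to this factorization, Gibbs sampling is easy: all of $h$ is sampled given $x$ at once, then all of $x$ given $h$
For binary $h$, each conditional is just the sigmoid function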
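
Block Gibbs sampling follows directly from these two conditionals. The Python sketch below assumes binary units and plain-list parameters; `sample_h_given_x`, `sample_x_given_h` and `gibbs_chain` are hypothetical helper names. Each half-step samples an entire layer in one pass because the units within a layer are conditionally independent.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_h_given_x(x, W, c, rng):
    # Every hidden unit independently: h_i ~ Bernoulli(sigm(c_i + W_i x))
    probs = [sigmoid(c[i] + sum(W[i][j] * x[j] for j in range(len(x)))) for i in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs], probs

def sample_x_given_h(h, W, b, rng):
    # Every visible unit independently: x_j ~ Bernoulli(sigm(b_j + sum_i W_ij h_i))
    probs = [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs], probs

def gibbs_chain(x0, W, b, c, steps, rng):
    # Alternate h ~ P(h | x) and x ~ P(x | h); return the last visible sample.
    x = list(x0)
    for _ in range(steps):
        h, _ = sample_h_given_x(x, W, c, rng)
        x, _ = sample_x_given_h(h, W, b, rng)
    return x

# Hypothetical parameters: 2 hidden units x 3 visible units.
W = [[0.5, -1.0, 0.3], [0.2, 0.4, -0.7]]
b = [0.1, -0.2, 0.3]
c = [-0.5, 0.6]
rng = random.Random(0)
x_final = gibbs_chain([1, 0, 1], W, b, c, steps=100, rng=rng)
print(x_final)
```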

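8 Training RBMs
Given parameters θ = {W, b, c}, compute the log-likelihood gradient for the steepest ascent method:
$$\frac{\partial \log P(x)}{\partial \theta} = -\sum_h P(h \mid x)\,\frac{\partial E(x, h)}{\partial \theta} + \sum_{\tilde x, h} P(\tilde x, h)\,\frac{\partial E(\tilde x, h)}{\partial \theta}$$
–The first term is easy because $P(h \mid x)$ factorizes, but the second term is intractable: it is an expectation under the model distribution, which involves the partition function $Z$
–Use k-step Gibbs sampling, started at the training example, to draw an approximate sample for the second term (contrastive divergence, CD-k)
–k = 1 performs well empirically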
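
A minimal CD-k sketch in Python, assuming binary units, a hypothetical toy data set and hyperparameters chosen only for illustration. The positive statistics come from the data; the negative statistics come from a k-step Gibbs chain started at that data point. As a common variance-reducing choice (an assumption, not necessarily the slide's exact recipe), the hidden probabilities rather than sampled hidden bits enter the update. The RBM is small enough that the exact log-likelihood can be computed to confirm that training improves it.

```python
import itertools
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def hidden_probs(x, W, c):
    # P(h_i = 1 | x) = sigm(c_i + W_i x)
    return [sigmoid(c[i] + sum(W[i][j] * x[j] for j in range(len(x)))) for i in range(len(c))]

def visible_probs(h, W, b):
    # P(x_j = 1 | h) = sigm(b_j + sum_i W_ij h_i)
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(b))]

def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd_k_update(x0, W, b, c, lr, k, rng):
    """One CD-k step on example x0; W, b, c are updated in place."""
    assert k >= 1
    ph0 = hidden_probs(x0, W, c)            # positive phase: statistics at the data
    h = bernoulli(ph0, rng)
    for _ in range(k):                       # negative phase: k steps of block Gibbs
        xk = bernoulli(visible_probs(h, W, b), rng)
        phk = hidden_probs(xk, W, c)
        h = bernoulli(phk, rng)
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * x0[j] - phk[i] * xk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (x0[j] - xk[j])

def log_likelihood(data, W, b, c):
    """Exact sum of log P(x); feasible only because the toy RBM can be enumerated."""
    def neg_free_energy(x):
        return sum(bj * xj for bj, xj in zip(b, x)) + sum(
            math.log1p(math.exp(c[i] + sum(W[i][j] * x[j] for j in range(len(x)))))
            for i in range(len(c)))
    log_z = math.log(sum(math.exp(neg_free_energy(x))
                         for x in itertools.product([0, 1], repeat=len(b))))
    return sum(neg_free_energy(x) - log_z for x in data)

rng = random.Random(0)
data = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]                  # hypothetical data
W = [[rng.gauss(0.0, 0.01) for _ in range(4)] for _ in range(2)]   # 2 hidden x 4 visible
b, c = [0.0] * 4, [0.0] * 2
before = log_likelihood(data, W, b, c)
for epoch in range(500):
    for x in data:
        cd_k_update(x, W, b, c, lr=0.1, k=1, rng=rng)
after = log_likelihood(data, W, b, c)
print(round(before, 3), round(after, 3))
```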

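9 Training DBNs
Every time we see a sample x, training lowers the energy at that point, so the model assigns it more probability
Start from the bottom layer and move up, training each layer as an RBM without supervision
–Set $h^0 = x$; once the RBM at level k is trained, sample $h^k \sim Q(h^k \mid h^{k-1})$ to obtain the training data for level k+1
–Each layer has its own set of parameters
*Q(·) is the RBM posterior for the hidden variables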
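
Greedy layer-wise training, sketched in Python under stated assumptions: binary units, each layer trained as its own RBM with CD-1, and the next layer's training set produced by sampling from that RBM's posterior $Q$. The names `train_rbm` and `train_dbn`, the toy data and the hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def up(x, W, c):
    # Q(h_i = 1 | x) = sigm(c_i + W_i x)
    return [sigmoid(ci + sum(w * xj for w, xj in zip(Wi, x))) for Wi, ci in zip(W, c)]

def down(h, W, b):
    # P(x_j = 1 | h) = sigm(b_j + sum_i W_ij h_i)
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(b))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one binary RBM with CD-1 and return its own parameters (W, b, c)."""
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
    b, c = [0.0] * n_visible, [0.0] * n_hidden
    for _ in range(epochs):
        for x0 in data:
            ph0 = up(x0, W, c)
            x1 = sample(down(sample(ph0, rng), W, b), rng)
            ph1 = up(x1, W, c)
            for i in range(n_hidden):
                for j in range(n_visible):
                    W[i][j] += lr * (ph0[i] * x0[j] - ph1[i] * x1[j])
                c[i] += lr * (ph0[i] - ph1[i])
            for j in range(n_visible):
                b[j] += lr * (x0[j] - x1[j])
    return W, b, c

def train_dbn(data, layer_sizes, epochs, lr, rng):
    """Bottom-up greedy training: h^0 = x, and each new RBM is trained on
    samples h^{k-1} drawn from the posterior of the RBM just below it."""
    layers, h = [], [list(x) for x in data]
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(h, n_hidden, epochs, lr, rng)
        layers.append((W, b, c))
        h = [sample(up(x, W, c), rng) for x in h]   # h^k ~ Q(h^k | h^{k-1})
    return layers

rng = random.Random(1)
data = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0]]
dbn = train_dbn(data, layer_sizes=[4, 3], epochs=50, lr=0.1, rng=rng)
print([len(W) for W, _, _ in dbn])  # [4, 3]
```

Each entry of `dbn` keeps its own (W, b, c), matching the point that every layer has its own parameter set.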

10 How to sample from DBNs
1. Sample a visible vector $h^{l-1}$ from the top-level RBM (using Gibbs sampling)
2. For k = l-1 down to 1: sample $h^{k-1} \sim P(\cdot \mid h^k)$ from the DBN model
3. $x = h^0$ is the final sample
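
The three steps map onto a short Python sketch. It assumes binary units and a hypothetical layout in which `layers[k-1] = (W, b, c)` links $h^{k-1}$ to $h^k$. The weights in the usage example are random and untrained, so the output only exercises the procedure and is not a meaningful sample. The top entry is run as an undirected RBM; below it only the top-down weights and visible biases are used.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def up(v, W, c):
    # P(h_i = 1 | v) = sigm(c_i + W_i v)
    return [sigmoid(ci + sum(w * vj for w, vj in zip(Wi, v))) for Wi, ci in zip(W, c)]

def down(h, W, b):
    # P(v_j = 1 | h) = sigm(b_j + sum_i W_ij h_i)
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(b))]

def sample_dbn(layers, gibbs_steps, rng):
    """layers[k-1] = (W, b, c) links h^{k-1} (len(b) units) to h^k (len(c) units)."""
    W_top, b_top, c_top = layers[-1]
    # 1. Gibbs chain in the top-level RBM, started from random bits, yields h^{l-1}.
    v = sample([0.5] * len(b_top), rng)
    for _ in range(gibbs_steps):
        v = sample(down(sample(up(v, W_top, c_top), rng), W_top, b_top), rng)
    # 2. For k = l-1 down to 1: h^{k-1} ~ P(. | h^k), using W and b of level k.
    h = v
    for W, b, _ in reversed(layers[:-1]):
        h = sample(down(h, W, b), rng)
    # 3. x = h^0 is the final sample.
    return h

rng = random.Random(2)

def random_layer(n_visible, n_hidden):
    # Hypothetical untrained parameters, only to exercise the procedure.
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_visible)] for _ in range(n_hidden)]
    return W, [0.0] * n_visible, [0.0] * n_hidden

layers = [random_layer(6, 4), random_layer(4, 3)]   # x: 6 units, h^1: 4, h^2: 3
x = sample_dbn(layers, gibbs_steps=50, rng=rng)
print(x)
```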

11 Discriminative Model
Receive an input x to classify
–Unlike the DBN sampler, which generates samples without any input
A multi-layer neural network should do
–Use auto-encoders to discover compact representations
–Use denoising AEs to add robustness to corruption

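12 Auto-encoders
A neural network whose target output is its own input
–Hence the name “auto”
–But it has one hidden layer that forms a representation of the input
Input x (d-dimensional) → hidden representation y (d'-dimensional) → reconstruction z (d-dimensional)
–For a plain AE, a lower-dimensional representation (d' < d) is needed so that the network cannot simply learn the identity function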

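13 AE Mechanism
Parameterize each layer with θ = {W, b}: the encoder computes $y = f_\theta(x) = s(Wx + b)$ and the decoder computes $z = g_{\theta'}(y) = s(W'y + b')$, with θ' = {W', b'} and s the sigmoid
Aim to “reconstruct” the input by minimizing the reconstruction error
$$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\bigl(x^{(i)}, z^{(i)}\bigr)$$
–where $z^{(i)} = g_{\theta'}(f_\theta(x^{(i)}))$ and L is, e.g., the squared error $\lVert x - z \rVert^2$ or, for $x \in [0, 1]^d$, the cross-entropy $L_H(x, z) = -\sum_k \bigl[x_k \log z_k + (1 - x_k) \log(1 - z_k)\bigr]$
Can train in an “unsupervised” way
–For any x in the training set, train the AE to reconstruct x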
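
A minimal auto-encoder in Python, assuming sigmoid units, untied weights (`W2`, `b2` stand in for W' and b') and the cross-entropy loss $L_H$. The class name, toy data and learning rate are hypothetical. `sgd_step` takes the input and the target as separate arguments; for a plain AE both are the same x, which is what "unsupervised" means here.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class AutoEncoder:
    """Encoder y = s(W x + b) with d' units; decoder z = s(W2 y + b2) with d units."""
    def __init__(self, d, d_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d_hidden)]
        self.b = [0.0] * d_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(d_hidden)] for _ in range(d)]
        self.b2 = [0.0] * d

    def encode(self, x):
        return [sigmoid(bi + sum(w * xj for w, xj in zip(Wi, x))) for Wi, bi in zip(self.W, self.b)]

    def decode(self, y):
        return [sigmoid(bk + sum(w * yi for w, yi in zip(Wk, y))) for Wk, bk in zip(self.W2, self.b2)]

    def loss(self, x):
        # Cross-entropy reconstruction error L_H(x, z), for x in [0, 1]^d
        z = self.decode(self.encode(x))
        eps = 1e-12
        return -sum(xk * math.log(zk + eps) + (1 - xk) * math.log(1 - zk + eps)
                    for xk, zk in zip(x, z))

    def sgd_step(self, x, target, lr):
        """One gradient step on L_H(target, z), where z is reconstructed from x."""
        y = self.encode(x)
        z = self.decode(y)
        dz = [zk - tk for zk, tk in zip(z, target)]   # sigmoid + cross-entropy gradient
        dy = [sum(self.W2[k][i] * dz[k] for k in range(len(dz))) * y[i] * (1 - y[i])
              for i in range(len(y))]
        for k in range(len(dz)):
            for i in range(len(y)):
                self.W2[k][i] -= lr * dz[k] * y[i]
            self.b2[k] -= lr * dz[k]
        for i in range(len(dy)):
            for j in range(len(x)):
                self.W[i][j] -= lr * dy[i] * x[j]
            self.b[i] -= lr * dy[i]

rng = random.Random(0)
data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]   # hypothetical data
ae = AutoEncoder(d=6, d_hidden=3, rng=rng)                          # d' = 3 < d = 6
before = sum(ae.loss(x) for x in data)
for _ in range(300):
    for x in data:
        ae.sgd_step(x, x, lr=0.5)   # plain AE: the input is also the target
after = sum(ae.loss(x) for x in data)
print(round(before, 3), round(after, 3))
```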

14 Denoising Auto-encoders
Also need to be robust to missing data
–Same structure as a regular AE
–But trained on corrupted inputs: the corrupted input x̃ is fed in, and the reconstruction is compared with the clean x
–Corruption: randomly choose a fixed fraction of the input components and set them to 0
Rationale: learning the latent structure is what makes it possible to rebuild missing data
–So the hidden layer learns a representation of that structure
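
The corruption step and the resulting training pairs, as a Python sketch. It assumes "a fixed fraction" is rounded to a whole number of components; `mask_corrupt` and `denoising_pairs` are hypothetical names. The key point is that the network sees x̃ while its error is measured against the original x.

```python
import random

def mask_corrupt(x, fraction, rng):
    """Masking noise: choose round(fraction * d) components at random and set them
    to 0; every other component is copied unchanged."""
    d = len(x)
    dropped = set(rng.sample(range(d), int(round(fraction * d))))
    return [0 if j in dropped else xj for j, xj in enumerate(x)]

def denoising_pairs(data, fraction, rng):
    # Each pair is (corrupted input x~, clean target x): the DAE reads x~, but its
    # reconstruction error is computed against the original x.
    return [(mask_corrupt(x, fraction, rng), x) for x in data]

rng = random.Random(0)
x = [1, 1, 0, 1, 1, 0, 1, 1]
x_tilde = mask_corrupt(x, 0.25, rng)   # 2 of the 8 components are zeroed
print(x_tilde)
```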

15 Training Stacked DAEs
Stack the DAEs to form a deep architecture
–Take each DAE’s hidden layer
–This hidden layer becomes the input layer of the next DAE
Training is simple. Given a training set $\{(x^{(i)}, y^{(i)})\}$:
–Initialize each layer sequentially, in an unsupervised fashion
–Each layer’s output is fed as input to the next layer
–Finally, fine-tune the entire architecture with supervised learning on the training set
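
An end-to-end Python sketch of this recipe, under several assumptions: sigmoid layers, masking noise, cross-entropy losses, a single logistic output unit for a binary label, and hypothetical toy data and hyperparameters. Each DAE's decoder is discarded after pretraining. During fine-tuning the clean inputs flow through the stacked encoders, and backpropagation updates every layer together with the output unit.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer_out(W, b, x):
    # s(W x + b) for one fully connected sigmoid layer
    return [sigmoid(bi + sum(w * xj for w, xj in zip(Wi, x))) for Wi, bi in zip(W, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def pretrain_dae(inputs, n_hidden, fraction, epochs, lr, rng):
    """Train one denoising AE (masking noise, cross-entropy) and keep only its encoder."""
    d = len(inputs[0])
    W, b = rand_matrix(n_hidden, d, rng), [0.0] * n_hidden
    W2, b2 = rand_matrix(d, n_hidden, rng), [0.0] * d
    for _ in range(epochs):
        for x in inputs:
            drop = set(rng.sample(range(d), int(round(fraction * d))))
            x_tilde = [0.0 if j in drop else xj for j, xj in enumerate(x)]
            y = layer_out(W, b, x_tilde)
            z = layer_out(W2, b2, y)
            dz = [zk - xk for zk, xk in zip(z, x)]   # error against the CLEAN input
            dy = [sum(W2[k][i] * dz[k] for k in range(d)) * y[i] * (1 - y[i])
                  for i in range(n_hidden)]
            for k in range(d):
                for i in range(n_hidden):
                    W2[k][i] -= lr * dz[k] * y[i]
                b2[k] -= lr * dz[k]
            for i in range(n_hidden):
                for j in range(d):
                    W[i][j] -= lr * dy[i] * x_tilde[j]
                b[i] -= lr * dy[i]
    return W, b

def train_stacked_dae(xs, ys, sizes, fraction, epochs, lr, rng):
    # 1. Unsupervised, layer by layer: each layer's clean output feeds the next DAE.
    layers, h = [], xs
    for n_hidden in sizes:
        W, b = pretrain_dae(h, n_hidden, fraction, epochs, lr, rng)
        layers.append((W, b))
        h = [layer_out(W, b, x) for x in h]
    # 2. Supervised fine-tuning: logistic output unit, backprop through all layers.
    out_w, out_b = [0.0] * sizes[-1], 0.0
    for _ in range(epochs):
        for x, target in zip(xs, ys):
            acts = [x]
            for W, b in layers:
                acts.append(layer_out(W, b, acts[-1]))
            top = acts[-1]
            p = sigmoid(out_b + sum(w * a for w, a in zip(out_w, top)))
            d_out = p - target   # cross-entropy gradient w.r.t. the output logit
            delta = [out_w[i] * d_out * top[i] * (1 - top[i]) for i in range(len(top))]
            out_w = [w - lr * d_out * a for w, a in zip(out_w, top)]
            out_b -= lr * d_out
            for li in range(len(layers) - 1, -1, -1):
                W, b = layers[li]
                prev = acts[li]
                # Error signal for the layer below, computed before W changes.
                below = [sum(W[i][j] * delta[i] for i in range(len(delta))) * prev[j] * (1 - prev[j])
                         for j in range(len(prev))]
                for i in range(len(delta)):
                    for j in range(len(prev)):
                        W[i][j] -= lr * delta[i] * prev[j]
                    b[i] -= lr * delta[i]
                delta = below
    return layers, out_w, out_b

def predict(model, x):
    layers, out_w, out_b = model
    for W, b in layers:
        x = layer_out(W, b, x)
    return sigmoid(out_b + sum(w * a for w, a in zip(out_w, x)))

rng = random.Random(0)
xs = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
ys = [1, 1, 0, 0]
model = train_stacked_dae(xs, ys, sizes=[4, 3], fraction=0.2, epochs=300, lr=0.3, rng=rng)
print([round(predict(model, x), 2) for x in xs])
```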

16 References
[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009.
[Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of ICML 2008.

