Published by Maximillian Gowing. Modified over 2 years ago.

1
Stacking RBMs and Auto-encoders for Deep Architectures
References: [Bengio, 2009], [Vincent et al., 2008]
2011/03/03, 강병곤

2
Introduction
Deep architectures learn representations at various levels
– Representations are learned implicitly
– Training proceeds layer by layer, unsupervised
Generative model
– Stack Restricted Boltzmann Machines (RBMs)
– The stack forms a deep belief network (DBN)
Discriminative model
– Stack auto-encoders (AEs)
– The stack forms a multi-layer classifier

3
Generative Model
Given a training set $\{x_i\}_{i=1}^{n}$,
– Construct a generative model that produces samples from the same distribution
– Start with sigmoid belief networks
The top-most layer has no parents, so each of its components needs its own parameter, i.e., a Bernoulli prior
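As a reference point, a sigmoid belief network with l hidden layers can be written as below. The notation is adapted from [Bengio, 2009], with $h^0 = x$ to match the sampling slide; $\operatorname{sigm}(a) = 1/(1 + e^{-a})$, and $\pi_i$ is our own symbol for the Bernoulli prior parameter of top-layer unit i.

```latex
\begin{aligned}
P(x, h^1, \dots, h^l) &= P(h^l) \prod_{k=1}^{l} P(h^{k-1} \mid h^k), \qquad h^0 = x,\\
P(h^{k-1}_i = 1 \mid h^k) &= \operatorname{sigm}\Big(b^{k-1}_i + \sum_j W^k_{ij}\, h^k_j\Big),\\
P(h^l) &= \prod_i P(h^l_i), \qquad P(h^l_i = 1) = \pi_i .
\end{aligned}
```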

4
Deep Belief Network
Same as a sigmoid belief network, but with a different top-layer structure
– Use an RBM to model the top two layers jointly
Restricted Boltzmann Machine (more on the next slide):
– Units are divided into a hidden layer and a visible layer (2 levels)
– The connections form a bipartite graph
Called “restricted” because there are no connections among units in the same layer
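Using $h^0 = x$ as on the sampling slide, [Bengio, 2009] writes the DBN joint distribution as below: the lower layers keep the sigmoid-belief-network conditionals $P(h^{k-1} \mid h^k)$, and only the top two layers are replaced by an RBM joint $P(h^{l-1}, h^l)$.

```latex
P(x, h^1, \dots, h^l) = \Big(\prod_{k=1}^{l-1} P(h^{k-1} \mid h^k)\Big)\, P(h^{l-1}, h^l), \qquad h^0 = x
```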

5
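Restricted Boltzmann Machines
Energy-based model for the hidden-visible joint distribution: $P(x, h) = e^{-E(x, h)} / Z$, with energy $E(x, h) = -b^\top x - c^\top h - h^\top W x$ and partition function $Z = \sum_{x, h} e^{-E(x, h)}$
– Or, summing out the hidden units, express it as a distribution over the visible variable: $P(x) = \frac{1}{Z} \sum_h e^{-E(x, h)}$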
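To make the role of the partition function concrete, the sketch below computes P(x) for a tiny binary RBM by brute-force enumeration of every (x, h) configuration. It assumes the energy E(x, h) = -b^T x - c^T h - h^T W x with W stored as one row of visible weights per hidden unit; the parameter values are arbitrary and purely illustrative.

```python
import itertools
import math


def energy(W, b, c, x, h):
    """E(x, h) = -b^T x - c^T h - h^T W x for a binary RBM (W is hidden x visible)."""
    return (-sum(bj * xj for bj, xj in zip(b, x))
            - sum(ci * hi for ci, hi in zip(c, h))
            - sum(h[i] * W[i][j] * x[j] for i in range(len(h)) for j in range(len(x))))


def exact_marginal(W, b, c):
    """Return {x: P(x)} by enumerating all (x, h) pairs; cost grows as 2^(n_visible + n_hidden)."""
    xs = list(itertools.product([0, 1], repeat=len(b)))
    hs = list(itertools.product([0, 1], repeat=len(c)))
    # Unnormalized P(x): sum_h exp(-E(x, h)).
    unnorm = {x: sum(math.exp(-energy(W, b, c, x, h)) for h in hs) for x in xs}
    Z = sum(unnorm.values())  # the partition function
    return {x: u / Z for x, u in unnorm.items()}


# Hypothetical toy parameters: 2 hidden units, 3 visible units.
W = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]
b = [0.1, -0.2, 0.0]  # visible biases
c = [0.0, 0.3]        # hidden biases
P = exact_marginal(W, b, c)
print(max(P, key=P.get), round(sum(P.values()), 6))
```

The double loop over all visible and hidden configurations is exactly what becomes infeasible at realistic layer sizes, which is why training has to approximate the term involving Z.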

6
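RBMs (Cont’d)
How the posterior factorizes: notice that the energy has the form $E(x, h) = -\beta(x) + \sum_i \gamma_i(x, h_i)$
– Then, $P(h \mid x) = \prod_i \frac{e^{-\gamma_i(x, h_i)}}{\sum_{\tilde h_i} e^{-\gamma_i(x, \tilde h_i)}} = \prod_i P(h_i \mid x)$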
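For the RBM in particular, with energy $E(x, h) = -b^\top x - c^\top h - h^\top W x$ and $W_i$ denoting row i of W (our notation), the energy splits into a term $\beta(x)$ that depends only on x plus per-unit terms $\gamma_i(x, h_i)$. Evaluating each per-unit factor at $h_i \in \{0, 1\}$ gives a sigmoid, and the same argument with the roles of x and h swapped gives the visible conditional:

```latex
\begin{aligned}
-b^\top x - c^\top h - h^\top W x &= -\underbrace{b^\top x}_{\beta(x)} + \sum_i \underbrace{\big(-h_i\,(c_i + W_i x)\big)}_{\gamma_i(x,\,h_i)},\\
P(h_i = 1 \mid x) &= \frac{e^{c_i + W_i x}}{e^{0} + e^{c_i + W_i x}} = \operatorname{sigm}(c_i + W_i x),\\
P(x_j = 1 \mid h) &= \operatorname{sigm}\Big(b_j + \sum_i W_{ij}\, h_i\Big).
\end{aligned}
```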

7
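More on Posteriors
Using the same factorization trick, we can compute the posterior over the hidden units
– The posterior over the visible units can be derived similarly
Because of the factorization, Gibbs sampling is easy: $P(h_i = 1 \mid x) = \operatorname{sigm}(c_i + W_i x)$, where $\operatorname{sigm}(a) = 1/(1 + e^{-a})$ and $W_i$ is row i of W
This is just the sigmoid function, for binary h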
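Because both conditionals factorize, one step of block Gibbs sampling draws all hidden units at once and then all visible units at once. The sketch below illustrates this in plain Python for a binary RBM; the function names, the hidden-by-visible layout of W, and the toy weights are illustrative assumptions, not taken from the slides.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def p_h_given_x(W, c, x):
    """P(h_i = 1 | x) = sigm(c_i + W_i x); W holds one row of visible weights per hidden unit."""
    return [sigm(c_i + sum(w * x_j for w, x_j in zip(W_i, x))) for W_i, c_i in zip(W, c)]


def p_x_given_h(W, b, h):
    """P(x_j = 1 | h) = sigm(b_j + sum_i W_ij h_i), i.e. uses column j of W."""
    return [sigm(b_j + sum(W[i][j] * h[i] for i in range(len(h)))) for j, b_j in enumerate(b)]


def bernoulli(probs, rng):
    """Draw each unit independently; valid because the conditionals factorize."""
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(W, b, c, x, rng):
    """One block Gibbs step: all hidden units given x, then all visible units given h."""
    h = bernoulli(p_h_given_x(W, c, x), rng)
    x_new = bernoulli(p_x_given_h(W, b, h), rng)
    return h, x_new


# Hypothetical untrained toy RBM: 3 hidden units, 4 visible units.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(3)]
b = [0.0] * 4  # visible biases
c = [0.0] * 3  # hidden biases
x = [1, 0, 1, 0]
for _ in range(5):  # a short chain started at x
    h, x = gibbs_step(W, b, c, x, rng)
print(h, x)
```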

8
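Training RBMs
Given parameters θ = {W, b, c}, compute the log-likelihood gradient for steepest ascent, where E is the RBM energy:
$\frac{\partial \log P(x)}{\partial \theta} = -\sum_h P(h \mid x)\,\frac{\partial E(x, h)}{\partial \theta} + \sum_{\tilde x, h} P(\tilde x, h)\,\frac{\partial E(\tilde x, h)}{\partial \theta}$
– The first term is easy to compute, but the second term is intractable because of the partition function
– Use k-step Gibbs sampling to draw approximate samples $(\tilde x, h)$ for the second term
– k = 1 performs well empirically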
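Running k steps of Gibbs sampling started at a training example is what [Bengio, 2009] calls contrastive divergence (CD-k). A minimal sketch follows, assuming a binary RBM stored as plain Python lists. It uses the hidden probabilities rather than sampled hidden states in the update, a common lower-variance choice; the toy data, learning rate and epoch count are hypothetical.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def p_h_given_x(W, c, x):
    return [sigm(c_i + sum(w * x_j for w, x_j in zip(W_i, x))) for W_i, c_i in zip(W, c)]


def p_x_given_h(W, b, h):
    return [sigm(b_j + sum(W[i][j] * h[i] for i in range(len(h)))) for j, b_j in enumerate(b)]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd_k_update(W, b, c, x0, k, lr, rng):
    """In-place CD-k update of a binary RBM on one training vector x0."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ph0 = p_h_given_x(W, c, x0)       # positive phase: P(h = 1 | x0)
    h = bernoulli(ph0, rng)
    for _ in range(k):                # k block Gibbs steps starting at the data
        xk = bernoulli(p_x_given_h(W, b, h), rng)
        phk = p_h_given_x(W, c, xk)
        h = bernoulli(phk, rng)
    # Approximate gradient: data statistics minus statistics at the end of the chain.
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * x0[j] - phk[i] * xk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (x0[j] - xk[j])


def recon_error(W, b, c, data):
    """Mean squared error of the mean-field reconstruction: a rough progress check only."""
    total = 0.0
    for x in data:
        r = p_x_given_h(W, b, p_h_given_x(W, c, x))
        total += sum((xj - rj) ** 2 for xj, rj in zip(x, r))
    return total / len(data)


rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # hypothetical toy training set
W = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(3)]
b, c = [0.0] * 4, [0.0] * 3
before = recon_error(W, b, c, data)
for epoch in range(1000):
    for x0 in data:
        cd_k_update(W, b, c, x0, k=1, lr=0.1, rng=rng)
after = recon_error(W, b, c, data)
print(round(before, 3), round(after, 3))
```

Reconstruction error is not the quantity CD-k approximately follows the gradient of; it is used here only as a cheap sanity check.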

9
Training DBNs
Every time we see a training sample x, we lower the energy at that point (raising its probability)
Start from the bottom layer and move up, training each layer unsupervised
– Each layer has its own set of parameters
*Q(·) is the RBM posterior over the hidden variables
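The greedy layer-wise procedure can be sketched as follows, assuming binary RBMs stored as plain Python lists. Each new RBM is trained with CD-1 on samples of Q(h | ·) propagated upward through the layers already trained, which stay frozen. The layer sizes, epoch count, learning rate and data are hypothetical.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def p_h_given_x(W, c, x):
    return [sigm(c_i + sum(w * x_j for w, x_j in zip(W_i, x))) for W_i, c_i in zip(W, c)]


def p_x_given_h(W, b, h):
    return [sigm(b_j + sum(W[i][j] * h[i] for i in range(len(h)))) for j, b_j in enumerate(b)]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(W, b, c, x0, lr, rng):
    """In-place CD-1 update of one RBM on one input vector x0."""
    ph0 = p_h_given_x(W, c, x0)
    x1 = bernoulli(p_x_given_h(W, b, bernoulli(ph0, rng)), rng)
    ph1 = p_h_given_x(W, c, x1)
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * x0[j] - ph1[i] * x1[j])
        c[i] += lr * (ph0[i] - ph1[i])
    for j in range(len(b)):
        b[j] += lr * (x0[j] - x1[j])


def train_dbn(data, layer_sizes, epochs, lr, rng):
    """Greedy layer-wise training; returns [(W, b, c), ...] from the bottom RBM to the top RBM."""
    layers = []
    n_in = len(data[0])
    for n_hidden in layer_sizes:
        W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        b, c = [0.0] * n_in, [0.0] * n_hidden
        for _ in range(epochs):
            for x in data:
                v = x
                # Map x upward through the frozen lower layers by sampling Q(h | v).
                for W_l, _, c_l in layers:
                    v = bernoulli(p_h_given_x(W_l, c_l, v), rng)
                cd1_update(W, b, c, v, lr, rng)
        layers.append((W, b, c))
        n_in = n_hidden
    return layers


rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # hypothetical toy data
dbn = train_dbn(data, layer_sizes=[4, 3], epochs=200, lr=0.1, rng=rng)
print([(len(W), len(W[0])) for W, _, _ in dbn])  # (n_hidden, n_visible) per layer → [(4, 6), (3, 4)]
```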

10
How to sample from DBNs
1. Sample the visible layer $h^{l-1}$ of the top-level RBM (using Gibbs sampling)
2. For $k = l-1$ down to 1, sample $h^{k-1} \sim P(\cdot \mid h^k)$ from the DBN model
3. $x = h^0$ is the final sample
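The three sampling steps translate almost line for line into code. This sketch assumes each layer is stored as a tuple (W^k, b^{k-1}, c^k) linking h^{k-1} (below) to h^k (above), and uses random, untrained weights purely to exercise the procedure, so the samples it produces are meaningless; the helper `random_dbn` and the chain length `n_gibbs` are hypothetical choices.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def p_h_given_x(W, c, x):
    return [sigm(c_i + sum(w * x_j for w, x_j in zip(W_i, x))) for W_i, c_i in zip(W, c)]


def p_x_given_h(W, b, h):
    return [sigm(b_j + sum(W[i][j] * h[i] for i in range(len(h)))) for j, b_j in enumerate(b)]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def random_dbn(sizes, rng):
    """Hypothetical untrained DBN; sizes = [dim(h^0), dim(h^1), ..., dim(h^l)]."""
    layers = []
    for n_vis, n_hid in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_vis)] for _ in range(n_hid)]
        layers.append((W, [0.0] * n_vis, [0.0] * n_hid))
    return layers


def sample_dbn(layers, n_gibbs, rng):
    """Draw one x from a DBN whose top two layers form an RBM."""
    W_top, b_top, c_top = layers[-1]
    # Step 1: Gibbs chain in the top-level RBM; v ends as a sample of h^{l-1}.
    v = [rng.randint(0, 1) for _ in b_top]
    for _ in range(n_gibbs):
        h = bernoulli(p_h_given_x(W_top, c_top, v), rng)
        v = bernoulli(p_x_given_h(W_top, b_top, h), rng)
    # Step 2: for k = l-1 down to 1, sample h^{k-1} ~ P(. | h^k) with layer k's weights.
    for W, b, _ in reversed(layers[:-1]):
        v = bernoulli(p_x_given_h(W, b, v), rng)
    # Step 3: v is now h^0 = x.
    return v


rng = random.Random(0)
dbn = random_dbn([6, 4, 3], rng)  # x is 6-dimensional, l = 2 hidden layers
x = sample_dbn(dbn, n_gibbs=50, rng=rng)
print(x)
```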

11
Discriminative Model
Receives an input x to classify
– Unlike DBNs, which had no inputs
A multi-layer neural network should do
– Use auto-encoders to discover compact representations
– Use denoising AEs to add robustness to corruption

12
Auto-encoders
A neural network where input = output, i.e., it is trained to reproduce its input
– Hence the name “auto”
– It has one hidden layer that holds a representation of the input
Figure: input x (d-dimensional) → hidden layer y (d'-dimensional) → output z (d-dimensional)
(A lower-dimensional representation, d' < d, is necessary to avoid learning the identity function)

13
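AE Mechanism
Parameterize each layer with parameters θ = {W, b}
Aim to “reconstruct” the input by minimizing the reconstruction error $\frac{1}{n}\sum_{i=1}^{n} L(x_i, z_i)$
– where $y_i = s(W x_i + b)$, $z_i = s(W' y_i + b')$, s is the sigmoid, and L is a loss such as squared error or cross-entropy
Can be trained in an “unsupervised” way
– For any x in the training set, train the AE to reconstruct x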
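A minimal sketch of this mechanism, assuming the sigmoid encoder y = s(Wx + b) and decoder z = s(W'y + b') of [Vincent et al., 2008] with untied weights, and cross-entropy as the reconstruction error (squared error is another common choice). It trains by plain stochastic gradient descent; the helper names, toy data and hyperparameters are hypothetical.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def encode(W, b, x):
    """Hidden representation y = s(W x + b); W has d' rows of d weights."""
    return [sigm(b_i + sum(w * x_k for w, x_k in zip(row, x))) for row, b_i in zip(W, b)]


def decode(W2, b2, y):
    """Reconstruction z = s(W' y + b'); W' has d rows of d' weights."""
    return [sigm(b_j + sum(w * y_i for w, y_i in zip(row, y))) for row, b_j in zip(W2, b2)]


def cross_entropy(x, z, eps=1e-12):
    """L(x, z) = -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)]."""
    return -sum(xk * math.log(zk + eps) + (1 - xk) * math.log(1 - zk + eps) for xk, zk in zip(x, z))


def init_ae(d, d_hidden, rng):
    """Untied parameters: encoder {W, b} and decoder {W', b'}."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d_hidden)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d)]
    return W, [0.0] * d_hidden, W2, [0.0] * d


def sgd_step(params, x_in, target, lr):
    """One back-propagation step on L(target, decode(encode(x_in))); updates params in place."""
    W, b, W2, b2 = params
    y = encode(W, b, x_in)
    z = decode(W2, b2, y)
    dz = [zk - tk for zk, tk in zip(z, target)]  # dL/d(pre-activation of z)
    dy = [sum(W2[j][i] * dz[j] for j in range(len(dz))) * y[i] * (1 - y[i])
          for i in range(len(y))]               # back-propagated to y's pre-activation
    for j, row in enumerate(W2):
        for i in range(len(y)):
            row[i] -= lr * dz[j] * y[i]
        b2[j] -= lr * dz[j]
    for i, row in enumerate(W):
        for k in range(len(x_in)):
            row[k] -= lr * dy[i] * x_in[k]
        b[i] -= lr * dy[i]


def total_loss(params, data):
    W, b, W2, b2 = params
    return sum(cross_entropy(x, decode(W2, b2, encode(W, b, x))) for x in data)


rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]         # hypothetical toy training set, d = 4
params = init_ae(d=4, d_hidden=2, rng=rng)  # d' = 2 < d
before = total_loss(params, data)
for epoch in range(2000):
    for x in data:
        sgd_step(params, x, x, lr=0.5)  # plain AE: input and target are the same x
after = total_loss(params, data)
print(round(before, 3), round(after, 3))
```

`sgd_step` takes the network input and the reconstruction target separately; for a plain AE both are x, which keeps the same step usable when the input is corrupted.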

14
Denoising Auto-encoders
Also need to be robust to missing data
– Same structure as a regular AE
– But trained to reconstruct the clean input from a corrupted version of it
– Corrupt each input by forcing a fixed number of randomly chosen components to zero
Rationale: learning the latent structure is what makes it possible to rebuild missing data
– The hidden layer will learn a representation of that structure
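The denoising variant changes only what goes into the network: it sees a corrupted copy of x but is still scored against the clean x. The sketch below assumes the masking noise described in [Vincent et al., 2008] (a fixed number of randomly chosen components set to 0) and otherwise the same assumed sigmoid encoder/decoder and cross-entropy loss as a plain AE; the helper names, data, corruption level `nu` and hyperparameters are illustrative.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def encode(W, b, x):
    """y = s(W x + b)."""
    return [sigm(b_i + sum(w * x_k for w, x_k in zip(row, x))) for row, b_i in zip(W, b)]


def decode(W2, b2, y):
    """z = s(W' y + b')."""
    return [sigm(b_j + sum(w * y_i for w, y_i in zip(row, y))) for row, b_j in zip(W2, b2)]


def cross_entropy(x, z, eps=1e-12):
    return -sum(xk * math.log(zk + eps) + (1 - xk) * math.log(1 - zk + eps) for xk, zk in zip(x, z))


def corrupt(x, nu, rng):
    """Masking noise: set floor(nu * d) randomly chosen components to 0; x itself is unchanged."""
    masked = set(rng.sample(range(len(x)), int(nu * len(x))))
    return [0 if k in masked else xk for k, xk in enumerate(x)]


def sgd_step(params, x_in, target, lr):
    """One gradient step on L(target, z) with z = decode(encode(x_in)); updates params in place."""
    W, b, W2, b2 = params
    y = encode(W, b, x_in)
    z = decode(W2, b2, y)
    dz = [zk - tk for zk, tk in zip(z, target)]
    dy = [sum(W2[j][i] * dz[j] for j in range(len(dz))) * y[i] * (1 - y[i]) for i in range(len(y))]
    for j, row in enumerate(W2):
        for i in range(len(y)):
            row[i] -= lr * dz[j] * y[i]
        b2[j] -= lr * dz[j]
    for i, row in enumerate(W):
        for k in range(len(x_in)):
            row[k] -= lr * dy[i] * x_in[k]
        b[i] -= lr * dy[i]


def clean_loss(params, data):
    W, b, W2, b2 = params
    return sum(cross_entropy(x, decode(W2, b2, encode(W, b, x))) for x in data)


rng = random.Random(0)
data = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]  # hypothetical toy data, d = 8
d, d_hidden, nu = 8, 3, 0.25
params = (
    [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d_hidden)], [0.0] * d_hidden,
    [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d)], [0.0] * d,
)
before = clean_loss(params, data)
for epoch in range(1000):
    for x in data:
        x_tilde = corrupt(x, nu, rng)         # corrupted input ...
        sgd_step(params, x_tilde, x, lr=0.5)  # ... but the target is the clean x
after = clean_loss(params, data)
print(round(before, 3), round(after, 3))
```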

15
Training Stacked DAEs
Stack the DAEs to form a deep architecture
– Take each DAE’s hidden layer
– This hidden layer becomes the input of the next layer
Training is simple. Given a training set $\{(x_i, y_i)\}$,
– Initialize each layer (sequentially) in an unsupervised fashion
– Each layer’s output is fed as input to the next layer
– Finally, fine-tune the entire architecture with supervised learning on the training set
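The unsupervised initialization phase can be sketched as below, assuming the same sigmoid DAE with masking noise and cross-entropy loss. Each layer's DAE is trained on the clean codes produced by the layers beneath it, and only its encoder is kept. The supervised fine-tuning step (adding an output layer and back-propagating through the whole stack) is left out to keep the sketch short; helper names, layer sizes and hyperparameters are hypothetical.

```python
import math
import random


def sigm(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def encode(W, b, x):
    return [sigm(b_i + sum(w * x_k for w, x_k in zip(row, x))) for row, b_i in zip(W, b)]


def decode(W2, b2, y):
    return [sigm(b_j + sum(w * y_i for w, y_i in zip(row, y))) for row, b_j in zip(W2, b2)]


def corrupt(x, nu, rng):
    """Masking noise: zero out floor(nu * d) randomly chosen components."""
    masked = set(rng.sample(range(len(x)), int(nu * len(x))))
    return [0 if k in masked else xk for k, xk in enumerate(x)]


def init_dae(d, d_hidden, rng):
    W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d_hidden)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d)]
    return W, [0.0] * d_hidden, W2, [0.0] * d


def sgd_step(params, x_in, target, lr):
    """Cross-entropy gradient step; targets may be real values in [0, 1] above the first layer."""
    W, b, W2, b2 = params
    y = encode(W, b, x_in)
    z = decode(W2, b2, y)
    dz = [zk - tk for zk, tk in zip(z, target)]
    dy = [sum(W2[j][i] * dz[j] for j in range(len(dz))) * y[i] * (1 - y[i]) for i in range(len(y))]
    for j, row in enumerate(W2):
        for i in range(len(y)):
            row[i] -= lr * dz[j] * y[i]
        b2[j] -= lr * dz[j]
    for i, row in enumerate(W):
        for k in range(len(x_in)):
            row[k] -= lr * dy[i] * x_in[k]
        b[i] -= lr * dy[i]


def pretrain_stack(data, layer_sizes, nu, epochs, lr, rng):
    """Greedy unsupervised pre-training; returns each layer's encoder (W, b), bottom first."""
    encoders = []
    rep = data
    for d_hidden in layer_sizes:
        params = init_dae(len(rep[0]), d_hidden, rng)
        for _ in range(epochs):
            for x in rep:
                sgd_step(params, corrupt(x, nu, rng), x, lr)  # corrupted in, clean target
        W, b, _, _ = params  # the decoder is discarded after pre-training
        encoders.append((W, b))
        rep = [encode(W, b, x) for x in rep]  # clean codes become the next layer's inputs
    return encoders


def features(encoders, x):
    """Deep representation: apply the pre-trained encoders in order."""
    for W, b in encoders:
        x = encode(W, b, x)
    return x


rng = random.Random(0)
data = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]  # hypothetical toy data
encoders = pretrain_stack(data, layer_sizes=[4, 2], nu=0.25, epochs=300, lr=0.5, rng=rng)
print([features(encoders, x) for x in data])
```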

16
References
[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009.
[Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of ICML 2008.
