Goodfellow: Chapter 14 Autoencoders
Dr. Charles Tappert
The information here, although greatly condensed, comes almost entirely from the chapter content.
Chapter 14 Sections
Introduction
1. Undercomplete Autoencoders
2. Regularized Autoencoders
3. Representational Power, Layer Size and Depth
4. Stochastic Encoders and Decoders
5. Denoising Autoencoders
6. Learning Manifolds with Autoencoders
7. Contractive Autoencoders
8. Predictive Sparse Decomposition
9. Applications of Autoencoders
Introduction
An autoencoder is a neural network trained to copy its input to its output
The network has encoder and decoder functions
Autoencoders should not copy perfectly
They are restricted by design to copy only approximately
By doing so, they learn useful properties of the data
Modern autoencoders use stochastic mappings
Autoencoders were traditionally used for dimensionality reduction as well as feature learning
Structure of an Autoencoder
An autoencoder maps an input x through an encoder f to an internal representation (the hidden layer, or code, h) and through a decoder g to a reconstruction r. Figure 14.1 (Goodfellow 2016)
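A minimal PyTorch sketch of this encoder/decoder structure (not from the slides; the layer sizes, activation, and loss shown are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Basic autoencoder: r = g(f(x)). Sizes here are illustrative."""
    def __init__(self, n_input=784, n_code=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_input, n_code), nn.ReLU())  # encoder f
        self.g = nn.Linear(n_code, n_input)                            # decoder g

    def forward(self, x):
        h = self.f(x)     # code h
        return self.g(h)  # reconstruction r

model = Autoencoder()
x = torch.randn(8, 784)             # a batch of inputs
loss = nn.MSELoss()(model(x), x)    # trained to copy its input to its output
```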
1. Undercomplete Autoencoders
There are several ways to design autoencoders to copy only approximately
The system learns useful properties of the data
One way is to make the dimension of h smaller than the dimension of x
Undercomplete: h has smaller dimension than x
Overcomplete: h has greater dimension than x
Principal Component Analysis (PCA)
An undercomplete autoencoder with a linear decoder and MSE loss function learns the same subspace as PCA
Nonlinear encoder/decoder functions yield more powerful nonlinear generalizations of PCA
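A sketch of the linear undercomplete case (stand-in data; the dimensions, optimizer, and step count are illustrative assumptions), where the learned code spans the same subspace as the leading principal components of the centered data:

```python
import torch
import torch.nn as nn

# Linear undercomplete autoencoder trained with MSE loss.
n_x, n_h = 20, 3                          # dim(h) < dim(x): undercomplete
encoder = nn.Linear(n_x, n_h, bias=False)
decoder = nn.Linear(n_h, n_x, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

X = torch.randn(1024, n_x)                # stand-in data; use centered real data in practice
for _ in range(2000):
    r = decoder(encoder(X))
    loss = ((r - X) ** 2).mean()          # MSE reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
```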
Avoiding Trivial Identity
• Undercomplete autoencoders
  • h has lower dimension than x
  • f or g has low capacity (e.g., linear g)
  • Must discard some information in h
• Overcomplete autoencoders
  • h has higher dimension than x
  • Must be regularized
(Goodfellow 2016)
2. Regularized Autoencoders
Allow the overcomplete case but regularize
Use a loss function that encourages properties other than copying the input to the output
Sparsity of the representation
Smallness of the derivative of the representation
Robustness to noise or missing inputs
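For instance, a sparse autoencoder adds an L1 penalty on the code to the reconstruction loss. A minimal sketch (the overcomplete sizes and penalty weight are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Sparse autoencoder loss: reconstruction error plus an L1 penalty on the code h.
f = nn.Sequential(nn.Linear(784, 1024), nn.Sigmoid())  # overcomplete encoder (code larger than input)
g = nn.Linear(1024, 784)                                # decoder
lam = 1e-3                                              # sparsity penalty weight

x = torch.randn(8, 784)
h = f(x)
r = g(h)
loss = ((r - x) ** 2).mean() + lam * h.abs().sum(dim=1).mean()
```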
3. Representational Power, Layer Size, and Depth
Autoencoders are often trained with only a single-layer encoder and a single-layer decoder
Using deep encoders and decoders offers the usual advantages of deep feedforward networks
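A sketch of a deeper encoder/decoder stack (the layer widths are illustrative assumptions):

```python
import torch.nn as nn

# Deep encoder and decoder: each is itself a feedforward network.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 32),
)
decoder = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784),
)
```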
4. Stochastic Encoders and Decoders
Modern autoencoders use stochastic mappings
We can generalize the notion of the encoding and decoding functions to encoding and decoding distributions
Stochastic Autoencoders
A stochastic autoencoder maps x to a reconstruction r through an encoding distribution p_encoder(h | x) and a decoding distribution p_decoder(x | h). Figure 14.2 (Goodfellow 2016)
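One common concrete choice (a sketch assuming binary inputs; the sizes are illustrative) is a decoder that outputs the parameters of a per-dimension Bernoulli distribution, trained by minimizing -log p_decoder(x | h):

```python
import torch
import torch.nn as nn

# Decoder outputs the logits of p_decoder(x | h), a Bernoulli per dimension;
# training minimizes the negative log-likelihood -log p_decoder(x | h).
f = nn.Linear(784, 32)                    # encoder produces the code h
g = nn.Linear(32, 784)                    # decoder outputs Bernoulli logits

x = torch.rand(8, 784).round()            # stand-in binary data
h = f(x)
logits = g(h)
nll = nn.functional.binary_cross_entropy_with_logits(logits, x)
```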
5. Denoising Autoencoders
A denoising autoencoder (DAE) is one that receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as its output
Learn the reconstruction distribution
Choose a training sample from the training data
Obtain a corrupted version from the corruption process
Use the pair of clean and corrupted samples to estimate the reconstruction distribution
Denoising Autoencoder
The computational graph of a denoising autoencoder: a corruption process C(x̃ | x) introduces noise to produce x̃, the encoder computes h = f(x̃), and the loss is L = - log p_decoder(x | h = f(x̃)). Figure 14.3 (Goodfellow 2016)
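A minimal training-step sketch (Gaussian corruption as the assumed C, with illustrative noise level and sizes); with a Gaussian decoder the negative log-likelihood reduces to squared error against the clean input:

```python
import torch
import torch.nn as nn

# One denoising training step: corrupt x, encode the corrupted input,
# and reconstruct the *clean* x.
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU())   # encoder
g = nn.Linear(128, 784)                             # decoder
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

x = torch.randn(8, 784)                   # clean training batch (stand-in data)
x_tilde = x + 0.3 * torch.randn_like(x)   # C(x_tilde | x): add Gaussian noise
loss = ((g(f(x_tilde)) - x) ** 2).mean()  # squared error against the clean x
opt.zero_grad(); loss.backward(); opt.step()
```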
Denoising Autoencoders Learn a Manifold
A corrupted point x̃, drawn from a gray circle of equiprobable corruptions C(x̃ | x), is mapped by g ∘ f back to the nearest point on the data manifold. Figure 14.4 (Goodfellow 2016)
Vector Field Learned by a Denoising Autoencoder
(Goodfellow 2016)
6. Learning Manifolds with Autoencoders
Like other machine learning algorithms, autoencoders exploit the idea that data concentrates around a low-dimensional manifold
Autoencoders take the idea further and aim to learn the structure of the manifold
Tangent Hyperplane of a Manifold
Amount of vertical translation defines a coordinate along a 1D manifold tracing out a curved path through image space Figure 14.6 (Goodfellow 2016)
Learning a Collection of 0-D Manifolds by Resisting Perturbation
The reconstruction function r(x), plotted against the identity and the optimal reconstruction, is nearly flat around the data points x0, x1, x2, so the reconstruction is invariant to small perturbations near the data points. Figure 14.7 (Goodfellow 2016)
Non-Parametric Manifold Learning with Nearest-Neighbor Graphs
Figure 14.8 (Goodfellow 2016)
Tiling a Manifold with Local Coordinate Systems
Each local patch is like a flat Gaussian “pancake” Figure 14.9 (Goodfellow 2016)
7. Contractive Autoencoders
The contractive autoencoder (CAE) uses a regularizer to make the derivatives of f(x) as small as possible
The name contractive arises from the way the CAE warps space
The input neighborhood is contracted to a smaller output neighborhood
The CAE is contractive only locally
Contractive Autoencoders
Ω(h) = λ ‖∂f(x)/∂x‖²_F   (14.18)
Tangent vectors at an input point (a dog from the CIFAR-10 dataset), as estimated by local PCA (no sharing across regions) and by a contractive autoencoder. Figure 14.10 (Goodfellow 2016)
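A sketch of this penalty for a one-layer sigmoid encoder (the sizes and λ are illustrative assumptions); for h = σ(Wx + b) the Jacobian is diag(h(1 − h))·W, so its squared Frobenius norm has the closed form used below:

```python
import torch
import torch.nn as nn

# Contractive penalty Ω(h) = λ‖∂f(x)/∂x‖²_F for h = sigmoid(W x + b):
# the Jacobian is diag(h(1-h)) @ W, so ‖J‖²_F = sum_j (h_j(1-h_j))² ‖W_j‖².
W = nn.Parameter(0.01 * torch.randn(128, 784))
b = nn.Parameter(torch.zeros(128))
lam = 0.1

x = torch.randn(8, 784)
h = torch.sigmoid(x @ W.t() + b)                    # code h, shape (batch, 128)
dh = h * (1 - h)                                    # elementwise sigmoid derivative
frob_sq = ((dh ** 2) @ (W ** 2).sum(dim=1)).mean()  # ‖∂f(x)/∂x‖²_F, averaged over the batch
penalty = lam * frob_sq                             # added to the reconstruction loss
```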
8. Predictive Sparse Decomposition
Predictive Sparse Decomposition (PSD) is a model that is a hybrid of sparse coding and parametric autoencoders
The model consists of an encoder and decoder that are both parametric
Predictive sparse coding is an example of learned approximate inference (section 19.5)
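A rough sketch of PSD training, which minimizes ‖x − g(h)‖² + λ|h|₁ + γ‖h − f(x)‖² alternately over the code h and over the encoder/decoder parameters (the sizes, weights λ and γ, learning rates, and step counts are illustrative assumptions):

```python
import torch
import torch.nn as nn

# PSD: a parametric encoder f predicts the sparse code h that a parametric
# decoder g uses to reconstruct x.
f = nn.Linear(784, 256)          # parametric encoder that predicts h
g = nn.Linear(256, 784)          # parametric decoder
lam, gam = 0.1, 1.0
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

x = torch.randn(8, 784)
h = f(x).detach().clone().requires_grad_(True)   # initialize the code from the encoder
h_opt = torch.optim.Adam([h], lr=1e-2)

def psd_loss(x, h):
    return ((x - g(h)) ** 2).sum() + lam * h.abs().sum() + gam * ((h - f(x)) ** 2).sum()

for _ in range(20):                              # inner loop: optimize the code h
    h_opt.zero_grad(); psd_loss(x, h).backward(); h_opt.step()

opt.zero_grad(); psd_loss(x, h.detach()).backward(); opt.step()  # outer step: update f and g
```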
9. Applications of Autoencoders
Autoencoder applications
Feature learning
Good features can be obtained in the hidden layer
Dimensionality reduction
For example, a 2006 study produced better results than PCA, with the representation easier to interpret and the categories manifested as well-separated clusters
Information retrieval
A task that benefits more than usual from dimensionality reduction is the information retrieval task of finding entries in a database that resemble a query entry
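A simplified retrieval sketch in the spirit of the chapter's semantic hashing idea (the trained encoder, code size, and binarization threshold are illustrative assumptions): database entries and the query are mapped to short binary codes, and the entries with the closest codes are returned:

```python
import torch
import torch.nn as nn

# Retrieval with autoencoder codes: encode entries to short binary codes and
# return the entries whose codes are closest (in Hamming distance) to the query's.
f = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())   # stand-in for a trained encoder

database = torch.randn(1000, 784)                     # stand-in database entries
query = torch.randn(1, 784)                           # stand-in query entry

db_codes = (f(database) > 0.5).float()                # binarized codes for all entries
q_code = (f(query) > 0.5).float()                     # binarized code for the query
hamming = (db_codes != q_code).sum(dim=1)             # Hamming distance to the query code
top5 = torch.argsort(hamming)[:5]                     # indices of the 5 most similar entries
```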