1
Auto-encoder (draft), KH Wong, v.9c3
2
Overview: Introduction, Theory, Architecture, Application example
3
Introduction: What is an autoencoder? Applications and method
An unsupervised method. Applications: noise removal, dimensionality reduction. Method: use noise-free ground-truth data (e.g. MNIST) plus self-generated noise to train the network; the trained network can then remove noise from inputs (e.g. handwritten characters) similar to the ground-truth data.
4
Noise removal result: original images in the top rows, corrupted input in the middle rows, denoised output in the bottom rows (see plt.title in the code below).
5
Autoencoder structure
An autoencoder is a feedforward neural network that learns to reproduce its input (possibly corrupted by noise) at the output, i.e. y^(i) = x^(i). The input-to-hidden part is the encoder; the hidden-to-output part is the decoder.
6
Theory: x -> F -> x', with encoder z = σ(Wx + b) and decoder x' = σ'(W'z + b')
Autoencoders are trained to minimize the reconstruction error (e.g. squared error), often referred to as the "loss":
L(x, x') = ||x − x'||² = ||x − σ'(W' σ(Wx + b) + b')||²
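To make the encoder/decoder mapping above concrete, here is a minimal sketch of a single-hidden-layer dense autoencoder in Keras. It is only an illustration of z = σ(Wx + b) and x' = σ'(W'z + b') with an MSE loss; the layer sizes and names (input_dim, encoding_dim) are assumptions and this is not the convolutional network used later in these slides.

# Minimal dense autoencoder sketch (illustrative; sizes are assumptions)
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_dim = 784      # e.g. a flattened 28x28 MNIST image
encoding_dim = 32    # size of the latent code z

x_in = Input(shape=(input_dim,), name='encoder_input')
z = Dense(encoding_dim, activation='relu', name='code')(x_in)              # z = sigma(Wx + b)
x_out = Dense(input_dim, activation='sigmoid', name='decoder_output')(z)   # x' = sigma'(W'z + b')

autoencoder = Model(x_in, x_out)
autoencoder.compile(optimizer='adam', loss='mse')   # L(x, x') = ||x - x'||^2
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)  # train to reconstruct the input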
7
Architecture Encoder and decoder
Training can use typical backpropagation methods.
8
Training: add noise to clean MNIST samples and train the autoencoder with backpropagation, using the noisy images as input and the same clean MNIST samples as the target output.
9
Recall: after training, the autoencoder can remove noise.
Noisy input -> trained autoencoder -> denoised output.
10
Code Part 1: obtain the dataset and add noise

# part 1 ---------------------------------------------------
np.random.seed(1337)

# MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

image_size = x_train.shape[1]
x_train = np.reshape(x_train, [-1, image_size, image_size, 1])
x_test = np.reshape(x_test, [-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Generate corrupted MNIST images by adding noise with normal dist
# centered at 0.5 and std=0.5
noise = np.random.normal(loc=0.5, scale=0.5, size=x_train.shape)
x_train_noisy = x_train + noise
noise = np.random.normal(loc=0.5, scale=0.5, size=x_test.shape)
x_test_noisy = x_test + noise
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
11
Part 2: First build the Encoder Model

# Network parameters
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
latent_dim = 16
# Encoder/Decoder number of CNN layers and filters per layer
layer_filters = [32, 64]

# Build the Autoencoder Model
# First build the Encoder Model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# Stack of Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use MaxPooling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# Shape info needed to build Decoder Model
shape = K.int_shape(x)

# Generate the latent vector
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Instantiate Encoder Model
encoder = Model(inputs, latent, name='encoder')
encoder.summary()
12
Part 3: Build the Decoder Model

latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

# Stack of Transposed Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use UpSampling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

# Final layer reconstructs a single-channel image
# (this call was truncated on the slide; kernel_size and padding='same' are assumed here)
x = Conv2DTranspose(filters=1,
                    kernel_size=kernel_size,
                    padding='same')(x)

outputs = Activation('sigmoid', name='decoder_output')(x)

# Instantiate Decoder Model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

# Autoencoder = Encoder + Decoder
# Instantiate Autoencoder Model
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()

autoencoder.compile(loss='mse', optimizer='adam')
13
Part 4: Train the autoencoder, decode images and display the result

autoencoder.fit(x_train_noisy,
                x_train,
                validation_data=(x_test_noisy, x_test),
                epochs=30,
                batch_size=batch_size)

# Predict the Autoencoder output from corrupted test images
x_decoded = autoencoder.predict(x_test_noisy)

# Display a grid of original, corrupted and denoised test images
rows, cols = 10, 30
num = rows * cols
imgs = np.concatenate([x_test[:num], x_test_noisy[:num], x_decoded[:num]])
imgs = imgs.reshape((rows * 3, cols, image_size, image_size))
imgs = np.vstack(np.split(imgs, rows, axis=1))
imgs = imgs.reshape((rows * 3, -1, image_size, image_size))
imgs = np.vstack([np.hstack(i) for i in imgs])
imgs = (imgs * 255).astype(np.uint8)

plt.figure()
plt.axis('off')
plt.title('Original images: top rows, '
          'Corrupted Input: middle rows, '
          'Denoised Input: third rows')
plt.imshow(imgs, interpolation='none', cmap='gray')
Image.fromarray(imgs).save('corrupted_and_denoised.png')
plt.show()
14
Code: https://towardsdatascience

'''Trains a denoising autoencoder on MNIST dataset.

Denoising is one of the classic applications of autoencoders.
The denoising process removes unwanted noise that corrupted the
true signal.

Noise + Data ---> Denoising Autoencoder ---> Data

Given a training dataset of corrupted data as input and
true signal as output, a denoising autoencoder can recover the
hidden structure to generate clean data.

This example has modular design. The encoder, decoder and autoencoder
are 3 models that share weights. For example, after training the
autoencoder, the encoder can be used to generate latent vectors
of input data for low-dim visualization like PCA or TSNE.
'''

# keras >> tensorflow.keras, modification by khw
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow.keras as keras
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Activation, Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# ... followed by the code of Parts 1-4 above
# (dataset loading and noise corruption, encoder, decoder, training and display).
15
Deep Autoencoders
16
Deep Autoencoders: architecture
A deep autoencoder is constructed by extending the encoder and decoder of a basic autoencoder with multiple hidden layers. Gradient vanishing problem: the gradient becomes too small as it is propagated back through many layers.
17
Denoising Autoencoders
By adding stochastic noise to the input, we can force the autoencoder to learn more robust features. The loss function of the denoising autoencoder compares the reconstruction of the corrupted input against the clean input (see the formula below). Key points:
1. A higher-level representation should be rather stable and robust under corruptions of the input.
2. Performing the denoising task well requires extracting features that capture useful structure in the input distribution.
3. Denoising is not the primary goal; it is advocated and investigated as a training criterion for learning to extract useful features that will constitute a better higher-level representation.
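A standard way to write the denoising-autoencoder objective (our notation; the slide's original formula is not reproduced here, f is the encoder, g the decoder, and x̃ the corrupted input):

\mathcal{L}_{\text{DAE}} = \frac{1}{N}\sum_{i=1}^{N}\big\| x^{(i)} - g\big(f(\tilde{x}^{(i)})\big) \big\|^2, \qquad \tilde{x}^{(i)} = x^{(i)} + \text{noise}

Note that the reconstruction is compared against the clean x^{(i)}, not against the corrupted input.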
18
Training Denoising Autoencoder
As with the deep autoencoder, we can stack multiple denoising autoencoders layer-wise to form a Stacked Denoising Autoencoder.
19
Deep Reinforcement Learning
Applications: playing Atari games, AlphaGo.
20
Variational autoencoder
21
Variational Autoencoder (VAE) vs. Autoencoder
Autoencoders: During training you present a pattern with artificially added noise to the encoder and feed the same clean pattern as the target output. Then use backpropagation to train the autoencoder network. So it is unsupervised learning (no labeled data is needed). It can be used for data compression and noise removal. During recall, when a noisy pattern is presented to the input, a de-noised pattern appears at the output.
Variational autoencoders: Instead of learning a pattern from a pattern, variational autoencoders learn the parameters of a probability distribution from the input patterns. We then use the learned parameters to generate new data. So it is a generative model, similar to a GAN (Generative Adversarial Network).
22
Variational autoencoder: https://jaan
Variational autoencoders are cool. They let us design complex generative models of data and fit them to large datasets. They can generate images of fictional celebrity faces and high-resolution digital artwork. Examples: VAE faces, VAE faces demo, VAE MNIST, VAE street addresses. (Figure: fictional celebrity faces generated by a variational autoencoder, by Alec Radford.)
23
https://www.jeremyjordan.me/variational-autoencoders/
24
Variational autoencoder
25
Example of variational autoencoder
26
Autoencoders are designed to reproduce their input, especially for images. The key point is to reproduce the input from a learned encoding.
27
Variational Autoencoder (VAE)
z = latent variable, obtained by sampling. The encoder and the decoder are neural networks. The latent variables z are drawn from a probability distribution that depends on the input X, and the reconstruction is chosen probabilistically from z. That means after the encoder outputs the mean µ and variance σ², we sample z from the distribution N(µ, σ²), e.g. going from a 500-dimensional X to a 30-dimensional z. Encoder: Q(z|X); Decoder: P(X|z).
28
VAE Encoder: The encoder takes the input and returns the parameters of a probability density (e.g., a Gaussian), i.e. it gives the mean and covariance matrix. We can sample from this distribution to get random values of the lower-dimensional representation z. It is implemented as a neural network: each input x gives a mean vector and a diagonal covariance matrix that determine the Gaussian density. The parameters θ of this network need to be learned, so we need to set up a loss function.
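In symbols (our notation; a diagonal Gaussian is the usual choice), the encoder with parameters θ defines:

q_\theta(z \mid x) = \mathcal{N}\big(z;\ \mu_\theta(x),\ \mathrm{diag}(\sigma_\theta^2(x))\big)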
29
VAE Decoder: The decoder takes the latent variable z and returns the parameters of a distribution, e.g. the mean and variance for each pixel in the output. The reconstruction is produced by sampling. It is also implemented as a neural network, whose parameters φ are learned.
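Correspondingly (again in our notation), for a Gaussian output the decoder with parameters φ defines:

p_\phi(x \mid z) = \mathcal{N}\big(x;\ \mu_\phi(z),\ \mathrm{diag}(\sigma_\phi^2(z))\big)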
30
VAE loss function: For a plain autoencoder, the loss is the L2 distance between output and input (or the clean input in the denoising case). For a VAE, we need to learn the parameters of two probability distributions. For a single input x_i, we maximize the expected probability of reconstructing x_i, or equivalently minimize the expected negative log-likelihood, where the expectation is taken with respect to z under the current encoder distribution.
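Written out in standard notation (an assumption about the exact form intended on the slide), the reconstruction part of the loss for one input x_i is:

\mathcal{L}_{\text{recon}}(x_i) = -\,\mathbb{E}_{z \sim q_\theta(z \mid x_i)}\big[\log p_\phi(x_i \mid z)\big]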
31
VAE loss function. Problem: the weights may adjust to memorize input images via z, i.e. inputs that we regard as similar may end up very different in z space. We prefer continuous latent representations that give meaningful parameterizations, e.g. smooth changes from one digit to another. Solution: try to force q(z|x) to be close to a standard normal (or some other simple density).
32
VAE loss function
For a single data point x_i we get the loss function shown below. The first term promotes recovery of the input. The second term keeps the encoding continuous: the encoding is compared to a fixed p(z) regardless of the input, which inhibits memorization. With this loss function the VAE can (almost) be trained using gradient descent on minibatches.
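In standard notation (our reconstruction; the slide's formula is assumed to be the usual negative ELBO), the per-example loss is:

\mathcal{L}(x_i;\theta,\phi) = -\,\mathbb{E}_{z \sim q_\theta(z \mid x_i)}\big[\log p_\phi(x_i \mid z)\big] + \mathrm{KL}\big(q_\theta(z \mid x_i)\,\|\,p(z)\big), \qquad p(z)=\mathcal{N}(0,I)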
33
VAE loss function
For a single data point x_i we get the same loss function as above. Problem: the expectation would usually be approximated by drawing samples and averaging, and this sampling step is not differentiable with respect to θ and φ.
34
Some math background is needed:
See the appendix: the expected negative log-likelihood, conditional expectation, etc.
35
Example: A Tutorial on Information Maximizing Variational Autoencoders (InfoVAE)
36
VAE loss function. Problem: the expectation would usually be approximated by drawing samples and averaging, which is not differentiable with respect to θ and φ.
37
VAE loss function, reparameterization: if z is N(μ(x_i), Σ(x_i)), then we can sample z using z = μ(x_i) + Σ(x_i)^{1/2} ε, where ε ~ N(0, I). So we only draw samples from N(0, I), which does not depend on the parameters.
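A minimal Keras-style sketch of this reparameterization step (variable names are illustrative assumptions; the slides' full Keras VAE code appears later):

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
# Sampling is moved into eps, so gradients can flow through mu and log_var.
import tensorflow.keras.backend as K

def sample_z(args):
    mu, log_var = args
    eps = K.random_normal(shape=K.shape(mu))   # eps ~ N(0, I); does not depend on the parameters
    return mu + K.exp(0.5 * log_var) * eps     # sigma = exp(0.5 * log_var)

# typically wrapped in a Lambda layer, e.g. z = Lambda(sample_z)([z_mu, z_log_var])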
38
VAE generative model: After training, q(z|x) is close to a standard normal N(0, I), which is easy to sample. Using a sample of z from q(z|x_i) as input to the decoder p(x|z) gives an approximate reconstruction of x_i, at least in expectation. If we sample any z from N(0, I) and feed it to the decoder p(x|z), then we can approximate the entire data distribution p(x), i.e. we can generate new samples that look like the training data but are not in the training set.
39
Implementation: learning
Example
40
Algorithm: learning
We also want the output to be similar to the input. The encoder distribution Q(z|X) = N(µ_{z|X}, Σ_{z|X}) should be close to N(0, I); the decoder models P(X|z).
Figure 3: An initial attempt at a variational autoencoder without the "reparameterization trick". Objective functions shown in red. We cannot back-propagate through the stochastic sampling operation because it is not a continuous deterministic function.
Figure 4: A variational autoencoder with the "reparameterization trick". Notice that all operations between the inputs and objectives are continuous deterministic functions, allowing back-propagation to occur.
41
Training: Loss L is to be minimized
42
KL Divergence for 2 Gaussians
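For reference (our notation; this is the closed form typically used in the VAE loss), the KL divergence between a diagonal Gaussian q = N(μ, diag(σ²)) in k dimensions and the standard normal p = N(0, I) is:

\mathrm{KL}(q\,\|\,p) = \frac{1}{2}\sum_{j=1}^{k}\big(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\big)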
43
Algorithm: recall after training
Figure 2: A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: the circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator).
Figure 5: The generative model component of a variational autoencoder in test mode.
44
Math
45
Math: the loss L is to be minimized
46
Kullback–Leibler divergence KL (D) for two Gaussians
47
Training: Loss L is to be minimized
48
Implementation: Keras
49
Keras
50
Keras implementation of VAE

# Imports needed by this snippet (added; not shown on the slide)
from tensorflow.keras.layers import Input, Dense, Lambda, Multiply, Add
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras import backend as K

# Network parameters
original_dim = 784
intermediate_dim = 256
latent_dim = 2
batch_size = 100
epochs = 50
epsilon_std = 1.0

# Encoder: x -> hidden -> (z_mu, z_log_var)
x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)

z_mu = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

# Custom layer that adds the KL term to the model loss (see sketch below)
z_mu, z_log_var = KLDivergenceLayer()([z_mu, z_log_var])

# Use of Lambda: convert log variance to standard deviation
z_sigma = Lambda(lambda t: K.exp(.5 * t))(z_log_var)

# Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
eps = Input(tensor=K.random_normal(shape=(K.shape(x)[0], latent_dim)))
z_eps = Multiply()([z_sigma, eps])
z = Add()([z_mu, z_eps])

# Decoder: z -> hidden -> predicted output x_pred
decoder = Sequential([
    Dense(intermediate_dim, input_dim=latent_dim, activation='relu'),
    Dense(original_dim, activation='sigmoid')
])
x_pred = decoder(z)
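The snippet above uses KLDivergenceLayer, which is not defined on the slide. A minimal sketch of such a layer (an assumption about the intended implementation, following the usual add_loss pattern) could look like this:

# Hypothetical sketch of the missing KLDivergenceLayer (not from the slides)
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K

class KLDivergenceLayer(Layer):
    """Identity layer that adds KL(q(z|x) || N(0, I)) to the model loss."""
    def call(self, inputs):
        mu, log_var = inputs
        kl = -0.5 * K.sum(1 + log_var - K.square(mu) - K.exp(log_var), axis=-1)
        self.add_loss(K.mean(kl))   # KL term is added to the total training loss
        return inputs               # passes mu and log_var through unchanged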
52
Derivation of expected value E()
53
Math background: Inverse transform sampling is a method for sampling from any distribution given its cumulative distribution function (CDF), F(x). For a given distribution with CDF F(x), it works as follows:
1. Sample a value u from a uniform distribution on [0, 1].
2. Define the inverse of the CDF as F⁻¹(u) (its domain is a probability value in [0, 1]).
3. F⁻¹(u) is a sample from your target distribution.
54
Proof: The proof of correctness is actually pretty simple. Let U be a uniform random variable on [0, 1] and consider the transformation F⁻¹(U) as before; then we have the derivation below. Thus F⁻¹(U) has the distribution of our target random variable (since the cumulative distribution function F(x) is the same). It's important to note what we did: we took an easy-to-sample random variable U, applied a deterministic transformation F⁻¹(U), and ended up with a random variable distributed according to our target distribution.
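Concretely, for any x:

P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x)

where the first equality uses that F is non-decreasing and the second that U is uniform on [0, 1].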
55
Example: As a simple example, we can try to generate an exponential distribution, with CDF F(x) = 1 − e^(−λx) for x ≥ 0. The inverse is given by x = F⁻¹(u) = −(1/λ) log(1 − u). Thus, we can sample from an exponential distribution just by repeatedly evaluating this expression with uniformly distributed random numbers.
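A small NumPy sketch of this example (illustrative; the rate parameter lam is an assumption):

import numpy as np

lam = 2.0                                    # assumed rate parameter of the exponential
u = np.random.uniform(0.0, 1.0, size=10000)  # uniform samples on [0, 1]
x = -np.log(1.0 - u) / lam                   # inverse CDF: x = F^{-1}(u) = -(1/lam) * ln(1 - u)
# x is now approximately exponentially distributed with rate lam
print(x.mean())                              # should be close to 1/lam = 0.5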
56
Extensions: Now, instead of starting from a uniform distribution, what happens if we want to start from another distribution, say a normal distribution? We first apply the reverse of inverse transform sampling, called the Probability Integral Transform. So the steps would be:
1. Sample from a normal distribution.
2. Apply the probability integral transform, using the CDF (cumulative distribution function) of the normal distribution, to get a uniformly distributed sample.
3. Apply inverse transform sampling with the inverse CDF of the target distribution to get a sample from the target distribution.
What about extending to multiple dimensions? We can break up the joint distribution into its conditional components and sample each sequentially to construct the overall sample:
P(x_1, …, x_n) = P(x_n | x_{n−1}, …, x_1) ⋯ P(x_2 | x_1) P(x_1)    (4)
In detail, first sample x_1 using the method above, then x_2 | x_1, then x_3 | x_2, x_1, and so on. Of course, this implicitly means you would have the CDF of each of those distributions available, which in practice might not be possible.
57
A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: the circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator). Note: we can put another distribution on X, such as a Bernoulli for binary data parameterized by p = g(z; θ). The important part is that we're able to maximize the likelihood over the θ parameters. Implicitly, we want our output to be continuous in θ so we can take its gradient.
58
A hard fit: First, we need to define the probability of seeing a single example x. The probability of a single sample is just the joint probability of our model with Z marginalized (i.e. integrated) out. Since we don't have an analytical form of the density, we approximate the integral by averaging over M samples from Z ~ N(0, I). We then obtain the log-likelihood by taking the log of this density and summing over all of our N observations (see below).
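In symbols (our notation), the marginal likelihood of one example, its Monte-Carlo approximation, and the dataset log-likelihood are:

p(x) = \int p(x \mid z)\,p(z)\,dz \;\approx\; \frac{1}{M}\sum_{m=1}^{M} p(x \mid z_m), \quad z_m \sim \mathcal{N}(0, I)

\log L = \sum_{i=1}^{N} \log p(x_i)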
59
Appendix
60
Training Denoising Autoencoder on MNIST
The following pictures show the difference between the resulting filters of a denoising autoencoder trained on MNIST with different noise ratios: no noise (noise ratio = 0%) vs. noise ratio = 30%. Diagram from (Hinton and Salakhutdinov, 2006).
61
Objective function Auto-encoder v.9c3
64
Negative Log-Likelihood (NLL)
65
Softmax Activation Function
66
Negative Log-Likelihood (NLL)
67
Derivation of the softmax
71
Conditional probability density
72
Expectation Conditional probability 1
Example: E(X | Y) = { 9/4 with probability 1/8, 18/4 with probability 7/8 }. Total average: E(X) = E_Y{ E(X|Y) } = (9/4) × (1/8) + (18/4) × (7/8) = 4.22.
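Written out, this is the law of total expectation (tower property):

E[X] = E_Y\big[E[X \mid Y]\big] = \tfrac{9}{4}\cdot\tfrac{1}{8} + \tfrac{18}{4}\cdot\tfrac{7}{8} = \tfrac{135}{32} \approx 4.22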
73
Expectation Conditional probability 2 -- example
74
Prior and posterior probability densities