
1 Auto-encoder (draft) KH Wong Auto-encoder v.9c3

2 Overview: Introduction, Theory, Architecture, Application example

3 Introduction What is an auto-encoder? Applications, Method
An unsupervised method. Applications: noise removal and dimensionality reduction. Method: use noise-free ground-truth data (e.g. MNIST) plus self-generated noise to train the network. The trained network can then remove noise from inputs (e.g. handwritten characters) that are similar to the ground-truth data.

4 Noise removal Result: original images (top rows), corrupted input (middle rows), denoised output (bottom rows).

5 Autoencoder structure
An autoencoder is a feedforward neural network that learns to predict the input (possibly corrupted by noise) itself at the output, i.e. y(i) = x(i). The input-to-hidden part corresponds to an encoder; the hidden-to-output part corresponds to a decoder.

6 Theory x -> F -> x'
Encoder: z = σ(Wx + b). Decoder: x' = σ'(W'z + b').
Autoencoders are trained to minimize reconstruction errors (such as squared errors), often referred to as the "loss":
L(x, x') = ||x − x'||² = ||x − σ'(W' σ(Wx + b) + b')||²
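A minimal NumPy sketch of this forward pass and loss (illustrative only; the 784/64 dimensions, the sigmoid activations, and the random weights are assumptions, not from the slides):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.random(784)                          # one flattened 28x28 image

# Encoder parameters (784 -> 64) and decoder parameters (64 -> 784)
W, b = rng.normal(0, 0.01, (64, 784)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (784, 64)), np.zeros(784)

z = sigmoid(W @ x + b)                       # z = sigma(W x + b)
x_rec = sigmoid(W2 @ z + b2)                 # x' = sigma'(W' z + b')
loss = np.sum((x - x_rec) ** 2)              # L(x, x') = ||x - x'||^2

In practice W, b, W', b' are learned by backpropagation to minimize this loss over the training set (see slide 7).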

7 Architecture Encoder and decoder
Training can use typical backpropagation methods

8 Training Use the MNIST data set with added noise to train the autoencoder by backpropagation. (Diagram: clean MNIST samples + added noise → autoencoder trained by backpropagation to reproduce the same clean MNIST samples.)

9 Recall After training, the autoencoder can remove noise. (Diagram: noisy input → trained autoencoder → denoised output.)

10 Code Part 1: obtain the dataset and add noise

#part1 ---------------------------------------------------
np.random.seed(1337)

# MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

image_size = x_train.shape[1]
x_train = np.reshape(x_train, [-1, image_size, image_size, 1])
x_test = np.reshape(x_test, [-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Generate corrupted MNIST images by adding noise with normal dist
# centered at 0.5 and std=0.5
noise = np.random.normal(loc=0.5, scale=0.5, size=x_train.shape)
x_train_noisy = x_train + noise
noise = np.random.normal(loc=0.5, scale=0.5, size=x_test.shape)
x_test_noisy = x_test + noise
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

11 Part 2: Build the Encoder Model

# Network parameters
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
latent_dim = 16
# Encoder/Decoder number of CNN layers and filters per layer
layer_filters = [32, 64]

# Build the Autoencoder Model
# First build the Encoder Model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# Stack of Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use MaxPooling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# Shape info needed to build Decoder Model
shape = K.int_shape(x)

# Generate the latent vector
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Instantiate Encoder Model
encoder = Model(inputs, latent, name='encoder')
encoder.summary()

12 Part 3: Build the Decoder Model

latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

# Stack of Transposed Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use UpSampling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

# Final layer reconstructs a single-channel image
x = Conv2DTranspose(filters=1,
                    kernel_size=kernel_size,
                    padding='same')(x)

outputs = Activation('sigmoid', name='decoder_output')(x)

# Instantiate Decoder Model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

# Autoencoder = Encoder + Decoder
# Instantiate Autoencoder Model
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()

autoencoder.compile(loss='mse', optimizer='adam')

13 Part 4: Train the autoencoder, decode images, display results

autoencoder.fit(x_train_noisy,
                x_train,
                validation_data=(x_test_noisy, x_test),
                epochs=30,
                batch_size=batch_size)

# Predict the Autoencoder output from corrupted test images
x_decoded = autoencoder.predict(x_test_noisy)

# Display the original, corrupted and denoised test images
rows, cols = 10, 30
num = rows * cols
imgs = np.concatenate([x_test[:num], x_test_noisy[:num], x_decoded[:num]])
imgs = imgs.reshape((rows * 3, cols, image_size, image_size))
imgs = np.vstack(np.split(imgs, rows, axis=1))
imgs = imgs.reshape((rows * 3, -1, image_size, image_size))
imgs = np.vstack([np.hstack(i) for i in imgs])
imgs = (imgs * 255).astype(np.uint8)

plt.figure()
plt.axis('off')
plt.title('Original images: top rows, '
          'Corrupted Input: middle rows, '
          'Denoised Input: third rows')
plt.imshow(imgs, interpolation='none', cmap='gray')
Image.fromarray(imgs).save('corrupted_and_denoised.png')
plt.show()

14 Code https://towardsdatascience
Result: original images (top rows), corrupted input (middle rows), denoised output (bottom rows).

'''Trains a denoising autoencoder on MNIST dataset.

Denoising is one of the classic applications of autoencoders.
The denoising process removes unwanted noise that corrupted the
true signal.

Noise + Data ---> Denoising Autoencoder ---> Data

Given a training dataset of corrupted data as input and
true signal as output, a denoising autoencoder can recover the
hidden structure to generate clean data.

This example has modular design. The encoder, decoder and autoencoder
are 3 models that share weights. For example, after training the
autoencoder, the encoder can be used to generate latent vectors
of input data for low-dim visualization like PCA or TSNE.
'''
# keras >> tensorflow.keras, modification by khw
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow.keras as keras
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Activation, Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# (The remainder of the listing repeats Parts 1-4 shown on slides 10-13.)

15 Deep Autoencoders

16 Deep Autoencoders: architecture
A deep autoencoder is constructed by extending the encoder and decoder of an autoencoder with multiple hidden layers. Gradient vanishing problem: the gradient becomes too small as it passes back through many layers.
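A minimal Keras sketch of such a deep autoencoder (illustrative; the 784-128-64-32 layer sizes and the activations are assumptions, not taken from the slides):

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))                        # flattened 28x28 image

# Deep encoder: several hidden layers shrinking to a small code
h = Dense(128, activation='relu')(inputs)
h = Dense(64, activation='relu')(h)
code = Dense(32, activation='relu', name='code')(h)

# Deep decoder: mirror of the encoder
h = Dense(64, activation='relu')(code)
h = Dense(128, activation='relu')(h)
outputs = Dense(784, activation='sigmoid')(h)

deep_autoencoder = Model(inputs, outputs, name='deep_autoencoder')
deep_autoencoder.compile(loss='mse', optimizer='adam')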

17 Denoising Autoencoders
By adding stochastic noise to the input, we can force the autoencoder to learn more robust features. The loss function of the denoising autoencoder compares the reconstruction of the corrupted input with the original clean input, where the corrupted input is obtained by adding noise to the clean input.
1. A higher-level representation should be rather stable and robust under corruptions of the input.
2. Performing the denoising task well requires extracting features that capture useful structure in the input distribution.
3. Denoising is not the primary goal. It is advocated and investigated as a training criterion for learning to extract useful features that will constitute a better higher-level representation.

18 Training Denoising Autoencoder
Like the deep autoencoder, we can stack multiple denoising autoencoders layer by layer to form a Stacked Denoising Autoencoder (a sketch of layer-wise training follows below).
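A compact sketch of greedy layer-wise training of a stacked denoising autoencoder in Keras (illustrative; the layer sizes, Gaussian corruption level, and the random stand-in data are assumptions, not from the slides):

import numpy as np
from tensorflow.keras.layers import Dense, GaussianNoise, Input
from tensorflow.keras.models import Model

def train_dae_layer(data, n_hidden, noise_std=0.5, epochs=10):
    """Train one denoising autoencoder layer; return (encoder, codes)."""
    n_in = data.shape[1]
    inputs = Input(shape=(n_in,))
    corrupted = GaussianNoise(noise_std)(inputs)    # corrupt input during training only
    code = Dense(n_hidden, activation='relu')(corrupted)
    recon = Dense(n_in)(code)                       # linear reconstruction of the clean input
    dae = Model(inputs, recon)
    dae.compile(loss='mse', optimizer='adam')
    dae.fit(data, data, epochs=epochs, batch_size=128, verbose=0)
    encoder = Model(inputs, code)                   # noise layer is inactive at inference
    return encoder, encoder.predict(data, verbose=0)

# Greedy layer-wise stacking: each layer is trained on the codes of the previous one
x = np.random.rand(1000, 784).astype('float32')     # stand-in for flattened MNIST vectors
encoders = []
layer_input = x
for n_hidden in [256, 64, 16]:
    enc, layer_input = train_dae_layer(layer_input, n_hidden)
    encoders.append(enc)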

19 Deep Reinforcement Learning
Applications: playing Atari games, AlphaGo.

20 Variational autoencoder

21 Variational Autoencoder (VAE) vs. Autoencoder
Autoencoders: During training you present a pattern with artificially added noise to the encoder and feed the same clean pattern to the output. Then you use backpropagation to train the autoencoder network, so it is unsupervised learning (no labeled data is needed). It can be used for data compression and noise removal. During recall, when a noisy pattern is presented to the input, a de-noised pattern appears at the output.
Variational autoencoders: Instead of learning a pattern from a pattern, variational autoencoders learn the parameters of a probability distribution from the input patterns. We then use the learned parameters to generate new data, so it is a generative model, similar to a GAN (Generative Adversarial Network).

22 variational autoencoder https://jaan
Variational autoencoders are cool. They let us design complex generative models of data, and fit them to large datasets. They can generate images of fictional celebrity faces and high-resolution digital artwork. (Links on the original slide: VAE faces, VAE faces demo, VAE MNIST, VAE street addresses.) Figure: fictional celebrity faces generated by a variational autoencoder (by Alec Radford).

23 https://www.jeremyjordan.me/variational-autoencoders/

24 Variational autoencoder

25 Example of variational autoencoder

26 Autoencoders Autoencoders are designed to reproduce their input, especially for images. The key point is to reproduce the input from a learned encoding.

27 Variational Autoencoder (VAE)
The encoder and the decoder are neural networks. The latent variable z is drawn from a probability distribution that depends on the input X, and the reconstruction is chosen probabilistically from z. That is, after the encoder produces the mean µ and variance σ², z is sampled from N(µ, σ²); for example, a 500-dimensional X may be encoded into a 30-dimensional z. (Diagram: X → Encoder Q(z|X) → z sampled from N(µ, σ) → Decoder P(X|z).)

28 VAE Encoder The encoder takes the input x and returns the parameters of a probability density q(z|x) (e.g., Gaussian), i.e. it gives the mean and covariance matrix. We can sample from this distribution to get random values of the lower-dimensional representation z. Implemented via a neural network: each input x gives a vector mean and a diagonal covariance matrix that determine the Gaussian density. The parameters 𝜃 of the NN need to be learned, so we need to set up a loss function.

29 VAE Decoder The decoder takes the latent variable z and returns the parameters of a distribution p(x|z), e.g. the mean and variance for each pixel in the output. The reconstruction is produced by sampling. It is implemented via a neural network whose parameters 𝜙 are learned.

30 VAE loss function Loss function for an autoencoder: the L2 distance between output and input (or the clean input in the denoising case). For a VAE, we need to learn the parameters of two probability distributions. For a single input xi, we maximize the expected value of reconstructing xi, or equivalently minimize the expected negative log-likelihood. This takes the expected value of the loss with respect to z over the current distribution.

31 VAE loss function Problem: the weights may adjust to memorize input images via z. I.e., inputs that we regard as similar may end up very different in z space. We prefer continuous latent representations to give meaningful parameterizations, e.g. smooth changes from one digit to another. Solution: try to force q(z|x) to be close to a standard normal (or some other simple density).

32 VAE loss function
For a single data point xi we get the loss function shown below. The first term promotes recovery of the input. The second term keeps the encoding continuous: the encoding is compared to a fixed p(z) regardless of the input, which inhibits memorization. With this loss function the VAE can (almost) be trained using gradient descent on minibatches.
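A standard way to write this single-data-point loss (reconstructed here; the slide's own equation is not in the transcript):

\mathcal{L}(\theta,\phi;x_i) = -\,\mathbb{E}_{z\sim q(z\mid x_i)}\big[\log p(x_i\mid z)\big] + \mathrm{KL}\big(q(z\mid x_i)\,\|\,p(z)\big), \qquad p(z)=\mathcal{N}(0,I)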

33 VAE loss function
For a single data point xi we get the loss function above. Problem: the expectation would usually be approximated by choosing samples and averaging, which is not differentiable wrt 𝜃 and 𝜙.

34 Some math background is needed:
See appendix: the expected negative log-likelihood, conditional expectation, etc.

35 Example A Tutorial on Information Maximizing Variational Autoencoders (InfoVAE)

36 VAE loss function Problem: The expectation would usually be approximated by choosing samples and averaging. This is not differentiable wrt 𝜃 and 𝜙.

37 VAE loss function Reparameterization: if z is N(µ(xi), Σ(xi)), then we can sample z using z = µ(xi) + Σ(xi)^(1/2) ε, where ε is N(0, I). So we can draw samples from N(0, I), which doesn't depend on the parameters.
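A small NumPy sketch of this reparameterization for a diagonal Gaussian (illustrative; the 16-dimensional latent and the log-variance parameterization are assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16

# Outputs of the encoder for one input x_i (illustrative values)
mu = rng.normal(size=latent_dim)             # mean mu(x_i)
log_var = rng.normal(size=latent_dim)        # log of the diagonal of Sigma(x_i)

# z = mu(x_i) + sqrt(Sigma(x_i)) * eps,  eps ~ N(0, I)
eps = rng.standard_normal(latent_dim)        # the noise does not depend on the parameters
z = mu + np.exp(0.5 * log_var) * eps         # differentiable w.r.t. mu and log_var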

38 VAE generative model After training, q(z|x) is close to a standard normal N(0, I), which is easy to sample. Using a sample of z from q(z|xi) as input to the decoder and sampling from p(x|z) gives an approximate reconstruction of xi, at least in expectation. If we sample any z from N(0, I) and use it as input to the decoder to sample from p(x|z), then we can approximate the entire data distribution p(x). I.e., we can generate new samples that look like the input data but aren't in the input.
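A short sketch of using a VAE decoder as a generator (illustrative; the stand-in decoder below is untrained, and the 2-dimensional latent and 784-dimensional output mirror the Keras VAE on slide 50 rather than being prescribed by this slide):

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

latent_dim, original_dim = 2, 784

# Stand-in decoder; in practice use the trained VAE decoder
latent_inputs = Input(shape=(latent_dim,))
h = Dense(256, activation='relu')(latent_inputs)
decoder = Model(latent_inputs, Dense(original_dim, activation='sigmoid')(h))

z = np.random.normal(size=(16, latent_dim))   # sample z ~ N(0, I)
generated = decoder.predict(z)                # with a trained decoder these resemble MNIST digits
generated = generated.reshape(-1, 28, 28)     # reshape flat 784-vectors to 28x28 images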

39 Implementation: learning example

40 Algorithm: learning
We also want the output to be similar to the input. The approximate posterior Q(z|x) = N(µ_z|X, Σ_z|X) should be close to N(0, I).
Figure 3: An initial attempt at a variational autoencoder without the "reparameterization trick". Objective functions shown in red. We cannot back-propagate through the stochastic sampling operation because it is not a continuous deterministic function.
Figure 4: A variational autoencoder with the "reparameterization trick". Notice that all operations between the inputs and objectives are continuous deterministic functions, allowing back-propagation to occur.

41 Training: Loss L is to be minimized

42 KL Divergence for 2 Gaussians
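For a diagonal Gaussian q = N(µ, σ²) and p = N(0, I), the standard closed form used in the VAE loss (reconstructed here, not copied from the slide) is:

\mathrm{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,I)\big) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^{2} + \sigma_j^{2} - \log\sigma_j^{2} - 1\right)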

43 Algorithm: recall after training
Figure 2: A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator).
Figure 5: The generative model component of a variational autoencoder in test mode.

44 Math

45 Math: the loss L is to be minimized

46 Kullback–Leibler divergence D_KL for two Gaussians

47 Training: Loss L is to be minimized

48 Implementation: Keras

49 Keras

50 Keras implementation of VAE

original_dim = 784
intermediate_dim = 256
latent_dim = 2
batch_size = 100
epochs = 50
epsilon_std = 1.0

x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)

z_mu = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)
z_mu, z_log_var = KLDivergenceLayer()([z_mu, z_log_var])

# Use of Lambda: convert log variance to std dev
z_sigma = Lambda(lambda t: K.exp(.5 * t))(z_log_var)

eps = Input(tensor=K.random_normal(shape=(K.shape(x)[0], latent_dim)))
z_eps = Multiply()([z_sigma, eps])
z = Add()([z_mu, z_eps])

decoder = Sequential([
    Dense(intermediate_dim, input_dim=latent_dim, activation='relu'),
    Dense(original_dim, activation='sigmoid')
])

x_pred = decoder(z)        # predicted (reconstructed) output
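KLDivergenceLayer is a custom layer that is not defined on the slide; a minimal sketch of what such a layer typically looks like, using Keras's add_loss mechanism (an assumption, not necessarily the author's exact code):

from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K

class KLDivergenceLayer(Layer):
    """Identity layer that adds the KL divergence to N(0, I) to the model loss."""
    def call(self, inputs):
        mu, log_var = inputs
        kl = -0.5 * K.sum(1 + log_var - K.square(mu) - K.exp(log_var), axis=-1)
        self.add_loss(K.mean(kl))
        return inputs

With the KL term attached this way, the model can be compiled with only a reconstruction loss (e.g. per-pixel binary cross-entropy between x and x_pred).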


52 Derivation of the expected value E(·)

53 Math background Inverse transform sampling is a method for sampling from any distribution given its cumulative distribution function (CDF) F(x). For a given distribution with CDF F(x), it works as follows:
1. Sample a value u in [0,1] from a uniform distribution.
2. Define the inverse of the CDF as F⁻¹(u) (its domain is a probability value in [0,1]).
3. F⁻¹(u) is a sample from your target distribution.

54 Proof The proof of correctness is actually pretty simple. Let U be a uniform random variable on [0,1] and F⁻¹(U) the transformation as before; then we have the identity below. Thus, we have shown that F⁻¹(U) has the distribution of our target random variable (since its cumulative distribution function (CDF) is the same F(x)). It's important to note what we did: we took an easy-to-sample random variable U, performed a deterministic transformation F⁻¹(U), and ended up with a random variable distributed according to our target distribution.
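The identity referred to above, in its standard form (reconstructed, not copied from the slide):

P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x)

The first equality uses that F is non-decreasing; the second uses that U is uniform on [0,1].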

55 Example As a simple example, we can try to generate an exponential distribution with CDF F(x) = 1 − e^(−λx) for x ≥ 0. The inverse is x = F⁻¹(u) = −(1/λ) log(1 − u). Thus, we can sample from an exponential distribution just by repeatedly evaluating this expression with uniformly distributed random numbers.
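A small NumPy sketch of this recipe (illustrative; the rate λ = 1.5 and the sample count are arbitrary choices, not from the slide):

import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                   # rate parameter lambda (illustrative)
u = rng.uniform(size=10000)                 # step 1: u ~ Uniform[0, 1]
x = -np.log(1.0 - u) / lam                  # step 2: x = F^{-1}(u) = -(1/lambda) log(1 - u)

# Sanity check: the sample mean should be close to 1/lambda
print(x.mean(), 1.0 / lam)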

56 Extensions Now, instead of starting from a uniform distribution, what happens if we want to start from another distribution, say a normal distribution? We first apply the reverse of inverse transform sampling, called the Probability Integral Transform. So the steps would be (see the sketch below):
1. Sample from a normal distribution.
2. Apply the probability integral transform using the CDF (cumulative distribution function) of the normal distribution to get a uniformly distributed sample.
3. Apply inverse transform sampling with the inverse CDF of the target distribution to get a sample from our target distribution.
What about extending to multiple dimensions? We can just break up the joint distribution into its conditional components and sample each sequentially to construct the overall sample:
P(x1, …, xn) = P(xn | xn−1, …, x1) … P(x2 | x1) P(x1)    (4)
In detail, first sample x1 using the method above, then x2|x1, then x3|x2,x1, and so on. Of course, this implicitly means you would have the CDF of each of those conditional distributions available, which in practice might not be possible.
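A short NumPy/SciPy sketch of the normal → uniform → target pipeline (illustrative; the exponential target with λ = 2 is an arbitrary choice, not from the slide):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 2.0

s = rng.standard_normal(10000)              # 1. sample from a normal distribution
u = stats.norm.cdf(s)                       # 2. probability integral transform -> uniform on [0, 1]
x = stats.expon(scale=1.0 / lam).ppf(u)     # 3. inverse CDF of the target -> target samples

print(x.mean(), 1.0 / lam)                  # sanity check: mean of Exp(lambda) is 1/lambda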

57 A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator). Note: we can put another distribution on X, like a Bernoulli for binary data parameterized by p = g(z;θ). The important part is that we're able to maximize the likelihood over the θ parameters. Implicitly, we will want our output variable to be continuous in θ so we can take its gradient.

58 A hard fit First, we need to define the probability of seeing a single example x: the probability of a single sample is just the joint probability of our given model, marginalizing out (i.e. integrating over) Z. Since we don't have an analytical form of the density, we approximate the integral by averaging over M samples from Z ∼ N(0, I). Putting together the log-likelihood (obtained by taking the log of the density and summing over all of our N observations) gives the expressions below.
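In standard notation, the expressions described here can be written as (reconstructed, not copied from the slide):

p(x) = \int p(x \mid z)\,p(z)\,dz \;\approx\; \frac{1}{M}\sum_{m=1}^{M} p(x \mid z_m), \qquad z_m \sim \mathcal{N}(0, I)

\log p(x_1,\ldots,x_N) \;\approx\; \sum_{i=1}^{N} \log\!\left(\frac{1}{M}\sum_{m=1}^{M} p(x_i \mid z_m)\right)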

59 Appendix

60 Training Denoising Autoencoder on MNIST
The following pictures show the difference between the resulting filters of a denoising autoencoder trained on MNIST with different noise ratios: no noise (noise ratio = 0%) vs. noise ratio = 30%. Diagram from (Hinton and Salakhutdinov, 2006).

61 Objective function


64 Negative Log-Likelihood (NLL)

65 Softmax Activation Function

66 Negative Log-Likelihood (NLL)

67 Derivation of the softmax


71 Conditional probability density

72 Expectation and conditional probability (1)
Example: E(X | Y) = {9/4 with probability 1/8, 18/4 with probability 7/8}. The total average is E(X) = E_Y{E(X|Y)} = (9/4)×(1/8) + (18/4)×(7/8) = 4.22.

73 Expectation and conditional probability (2): example

74 Prior and posterior probability densities

