
1 Auto-encoder (draft) KH Wong Auto-encoder v.9c3

2 Overview: Introduction, Theory, Architecture, Application example

3 Introduction What is an auto-encoder? Applications, Method
An unsupervised method. Applications: noise removal and dimensionality reduction. Method: use noise-free ground-truth data (e.g. MNIST) plus self-generated noise to train the network. The trained network can then remove noise from inputs (e.g. handwritten characters) that are similar to the ground-truth data.

4 Noise removal Result: original images (top rows), corrupted input (middle rows), denoised output (bottom rows).

5 Autoencoder structure
An autoencoder is a feedforward neural network that learns to predict the input (possibly corrupted by noise) itself at the output, i.e. y(i) = x(i). The input-to-hidden part corresponds to an encoder; the hidden-to-output part corresponds to a decoder.

6 Theory x -> F -> x'
Encoder: z = σ(Wx + b). Decoder: x' = σ'(W'z + b').
Autoencoders are trained to minimize reconstruction errors (such as squared errors), often referred to as the "loss":
L(x, x') = ||x − x'||² = ||x − σ'(W' σ(Wx + b) + b')||²
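A minimal NumPy sketch of this forward pass and loss (illustrative only; the 784/64 dimensions, the sigmoid activations, and the random weights are assumptions, not from the slides):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.random(784)                          # one flattened 28x28 image

# Encoder parameters (784 -> 64) and decoder parameters (64 -> 784)
W, b = rng.normal(0, 0.01, (64, 784)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (784, 64)), np.zeros(784)

z = sigmoid(W @ x + b)                       # z = sigma(W x + b)
x_rec = sigmoid(W2 @ z + b2)                 # x' = sigma'(W' z + b')
loss = np.sum((x - x_rec) ** 2)              # L(x, x') = ||x - x'||^2

In practice W, b, W', b' are learned by backpropagation to minimize this loss over the training set (see slide 7).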

7 Architecture Encoder and decoder
Training can use typical backpropagation methods

8 Training Use the MNIST data set with added noise to train the autoencoder by backpropagation. (Diagram: clean MNIST samples + added noise → autoencoder trained by backpropagation to reproduce the same clean MNIST samples.)

9 Recall After training, the autoencoder can remove noise. (Diagram: noisy input → trained autoencoder → denoised output.)

10 Code Part 1: obtain the dataset and add noise

#part1 ---------------------------------------------------
np.random.seed(1337)

# MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

image_size = x_train.shape[1]
x_train = np.reshape(x_train, [-1, image_size, image_size, 1])
x_test = np.reshape(x_test, [-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Generate corrupted MNIST images by adding noise with normal dist
# centered at 0.5 and std=0.5
noise = np.random.normal(loc=0.5, scale=0.5, size=x_train.shape)
x_train_noisy = x_train + noise
noise = np.random.normal(loc=0.5, scale=0.5, size=x_test.shape)
x_test_noisy = x_test + noise
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

11 Part 2: Build the Encoder Model

# Network parameters
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
latent_dim = 16
# Encoder/Decoder number of CNN layers and filters per layer
layer_filters = [32, 64]

# Build the Autoencoder Model
# First build the Encoder Model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# Stack of Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use MaxPooling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# Shape info needed to build Decoder Model
shape = K.int_shape(x)

# Generate the latent vector
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Instantiate Encoder Model
encoder = Model(inputs, latent, name='encoder')
encoder.summary()

12 Part 3: Build the Decoder Model

latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

# Stack of Transposed Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use UpSampling2D as alternative to strides>1
#    - faster but not as good as strides>1
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

# Final layer reconstructs a single-channel image
x = Conv2DTranspose(filters=1,
                    kernel_size=kernel_size,
                    padding='same')(x)

outputs = Activation('sigmoid', name='decoder_output')(x)

# Instantiate Decoder Model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

# Autoencoder = Encoder + Decoder
# Instantiate Autoencoder Model
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()

autoencoder.compile(loss='mse', optimizer='adam')

13 Part 4: Train the autoencoder, decode images, display results

autoencoder.fit(x_train_noisy,
                x_train,
                validation_data=(x_test_noisy, x_test),
                epochs=30,
                batch_size=batch_size)

# Predict the Autoencoder output from corrupted test images
x_decoded = autoencoder.predict(x_test_noisy)

# Display the original, corrupted and denoised test images
rows, cols = 10, 30
num = rows * cols
imgs = np.concatenate([x_test[:num], x_test_noisy[:num], x_decoded[:num]])
imgs = imgs.reshape((rows * 3, cols, image_size, image_size))
imgs = np.vstack(np.split(imgs, rows, axis=1))
imgs = imgs.reshape((rows * 3, -1, image_size, image_size))
imgs = np.vstack([np.hstack(i) for i in imgs])
imgs = (imgs * 255).astype(np.uint8)

plt.figure()
plt.axis('off')
plt.title('Original images: top rows, '
          'Corrupted Input: middle rows, '
          'Denoised Input: third rows')
plt.imshow(imgs, interpolation='none', cmap='gray')
Image.fromarray(imgs).save('corrupted_and_denoised.png')
plt.show()

14 Code https://towardsdatascience
Result: original images (top rows), corrupted input (middle rows), denoised output (bottom rows).

'''Trains a denoising autoencoder on MNIST dataset.

Denoising is one of the classic applications of autoencoders.
The denoising process removes unwanted noise that corrupted the
true signal.

Noise + Data ---> Denoising Autoencoder ---> Data

Given a training dataset of corrupted data as input and
true signal as output, a denoising autoencoder can recover the
hidden structure to generate clean data.

This example has modular design. The encoder, decoder and autoencoder
are 3 models that share weights. For example, after training the
autoencoder, the encoder can be used to generate latent vectors
of input data for low-dim visualization like PCA or TSNE.
'''
# keras >> tensorflow.keras, modification by khw
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow.keras as keras
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Activation, Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# (The remainder of the listing repeats Parts 1-4 shown on slides 10-13.)

15 Deep Autoencoders

16 Deep Autoencoders: architecture
A deep autoencoder is constructed by extending the encoder and decoder of an autoencoder with multiple hidden layers. Gradient vanishing problem: the gradient becomes too small as it passes back through many layers.
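A minimal Keras sketch of such a deep autoencoder (illustrative; the 784-128-64-32 layer sizes and the activations are assumptions, not taken from the slides):

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))                        # flattened 28x28 image

# Deep encoder: several hidden layers shrinking to a small code
h = Dense(128, activation='relu')(inputs)
h = Dense(64, activation='relu')(h)
code = Dense(32, activation='relu', name='code')(h)

# Deep decoder: mirror of the encoder
h = Dense(64, activation='relu')(code)
h = Dense(128, activation='relu')(h)
outputs = Dense(784, activation='sigmoid')(h)

deep_autoencoder = Model(inputs, outputs, name='deep_autoencoder')
deep_autoencoder.compile(loss='mse', optimizer='adam')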

17 Denoising Autoencoders
By adding stochastic noise to the input, we can force the autoencoder to learn more robust features. The loss function of the denoising autoencoder compares the reconstruction of the corrupted input with the original clean input, where the corrupted input is obtained by adding noise to the clean input.
1. A higher-level representation should be rather stable and robust under corruptions of the input.
2. Performing the denoising task well requires extracting features that capture useful structure in the input distribution.
3. Denoising is not the primary goal. It is advocated and investigated as a training criterion for learning to extract useful features that will constitute a better higher-level representation.

18 Training Denoising Autoencoder
Like the deep autoencoder, we can stack multiple denoising autoencoders layer by layer to form a Stacked Denoising Autoencoder (a sketch of layer-wise training follows below).
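A compact sketch of greedy layer-wise training of a stacked denoising autoencoder in Keras (illustrative; the layer sizes, Gaussian corruption level, and the random stand-in data are assumptions, not from the slides):

import numpy as np
from tensorflow.keras.layers import Dense, GaussianNoise, Input
from tensorflow.keras.models import Model

def train_dae_layer(data, n_hidden, noise_std=0.5, epochs=10):
    """Train one denoising autoencoder layer; return (encoder, codes)."""
    n_in = data.shape[1]
    inputs = Input(shape=(n_in,))
    corrupted = GaussianNoise(noise_std)(inputs)    # corrupt input during training only
    code = Dense(n_hidden, activation='relu')(corrupted)
    recon = Dense(n_in)(code)                       # linear reconstruction of the clean input
    dae = Model(inputs, recon)
    dae.compile(loss='mse', optimizer='adam')
    dae.fit(data, data, epochs=epochs, batch_size=128, verbose=0)
    encoder = Model(inputs, code)                   # noise layer is inactive at inference
    return encoder, encoder.predict(data, verbose=0)

# Greedy layer-wise stacking: each layer is trained on the codes of the previous one
x = np.random.rand(1000, 784).astype('float32')     # stand-in for flattened MNIST vectors
encoders = []
layer_input = x
for n_hidden in [256, 64, 16]:
    enc, layer_input = train_dae_layer(layer_input, n_hidden)
    encoders.append(enc)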

19 Deep Reinforcement Learning
Applications: playing Atari games, AlphaGo.

20 Variational autoencoder

21 Variational Autoencoder (VAE) vs. Autoencoder
Autoencoders: During training you present a pattern with artificially added noise to the encoder and feed the same clean pattern to the output. Then you use backpropagation to train the autoencoder network, so it is unsupervised learning (no labeled data is needed). It can be used for data compression and noise removal. During recall, when a noisy pattern is presented to the input, a de-noised pattern appears at the output.
Variational autoencoders: Instead of learning a pattern from a pattern, variational autoencoders learn the parameters of a probability distribution from the input patterns. We then use the learned parameters to generate new data, so it is a generative model, similar to a GAN (Generative Adversarial Network).

22 variational autoencoder https://jaan
Variational autoencoders are cool. They let us design complex generative models of data, and fit them to large datasets. They can generate images of fictional celebrity faces and high-resolution digital artwork. (Links on the original slide: VAE faces, VAE faces demo, VAE MNIST, VAE street addresses.) Figure: fictional celebrity faces generated by a variational autoencoder (by Alec Radford).

23 https://www.jeremyjordan.me/variational-autoencoders/

24 Variational autoencoder

25 Example of variational autoencoder

26 Autoencoders Autoencoders are designed to reproduce their input, especially for images. The key point is to reproduce the input from a learned encoding.

27 Variational Autoencoder (VAE)
The encoder and the decoder are neural networks. The latent variable z is drawn from a probability distribution that depends on the input X, and the reconstruction is chosen probabilistically from z. That is, after the encoder produces the mean µ and variance σ², z is sampled from N(µ, σ²); for example, a 500-dimensional X may be encoded into a 30-dimensional z. (Diagram: X → Encoder Q(z|X) → z sampled from N(µ, σ) → Decoder P(X|z).)

28 VAE Encoder The encoder takes the input x and returns the parameters of a probability density q(z|x) (e.g., Gaussian), i.e. it gives the mean and covariance matrix. We can sample from this distribution to get random values of the lower-dimensional representation z. Implemented via a neural network: each input x gives a vector mean and a diagonal covariance matrix that determine the Gaussian density. The parameters 𝜃 of the NN need to be learned, so we need to set up a loss function.

29 VAE Decoder The decoder takes the latent variable z and returns the parameters of a distribution p(x|z), e.g. the mean and variance for each pixel in the output. The reconstruction is produced by sampling. It is implemented via a neural network whose parameters 𝜙 are learned.

30 VAE loss function Loss function for an autoencoder: the L2 distance between output and input (or the clean input in the denoising case). For a VAE, we need to learn the parameters of two probability distributions. For a single input xi, we maximize the expected value of reconstructing xi, or equivalently minimize the expected negative log-likelihood. This takes the expected value of the loss with respect to z over the current distribution.

31 VAE loss function Problem: the weights may adjust to memorize input images via z. I.e., inputs that we regard as similar may end up very different in z space. We prefer continuous latent representations to give meaningful parameterizations, e.g. smooth changes from one digit to another. Solution: try to force q(z|x) to be close to a standard normal (or some other simple density).

32 VAE loss function
For a single data point xi we get the loss function shown below. The first term promotes recovery of the input. The second term keeps the encoding continuous: the encoding is compared to a fixed p(z) regardless of the input, which inhibits memorization. With this loss function the VAE can (almost) be trained using gradient descent on minibatches.
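A standard way to write this single-data-point loss (reconstructed here; the slide's own equation is not in the transcript):

\mathcal{L}(\theta,\phi;x_i) = -\,\mathbb{E}_{z\sim q(z\mid x_i)}\big[\log p(x_i\mid z)\big] + \mathrm{KL}\big(q(z\mid x_i)\,\|\,p(z)\big), \qquad p(z)=\mathcal{N}(0,I)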

33 VAE loss function
For a single data point xi we get the loss function above. Problem: the expectation would usually be approximated by choosing samples and averaging, which is not differentiable wrt 𝜃 and 𝜙.

34 Some math background is needed:
See appendix: the expected negative log-likelihood, conditional expectation, etc.

35 Example A Tutorial on Information Maximizing Variational Autoencoders (InfoVAE)

36 VAE loss function Problem: The expectation would usually be approximated by choosing samples and averaging. This is not differentiable wrt 𝜃 and 𝜙.

37 VAE loss function Reparameterization: if z is N(µ(xi), Σ(xi)), then we can sample z using z = µ(xi) + Σ(xi)^(1/2) ε, where ε is N(0, I). So we can draw samples from N(0, I), which doesn't depend on the parameters.
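A small NumPy sketch of this reparameterization for a diagonal Gaussian (illustrative; the 16-dimensional latent and the log-variance parameterization are assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16

# Outputs of the encoder for one input x_i (illustrative values)
mu = rng.normal(size=latent_dim)             # mean mu(x_i)
log_var = rng.normal(size=latent_dim)        # log of the diagonal of Sigma(x_i)

# z = mu(x_i) + sqrt(Sigma(x_i)) * eps,  eps ~ N(0, I)
eps = rng.standard_normal(latent_dim)        # the noise does not depend on the parameters
z = mu + np.exp(0.5 * log_var) * eps         # differentiable w.r.t. mu and log_var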

38 VAE generative model After training, q(z|x) is close to a standard normal N(0, I), which is easy to sample. Using a sample of z from q(z|xi) as input to the decoder and sampling from p(x|z) gives an approximate reconstruction of xi, at least in expectation. If we sample any z from N(0, I) and use it as input to the decoder to sample from p(x|z), then we can approximate the entire data distribution p(x). I.e., we can generate new samples that look like the input data but aren't in the input.
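A short sketch of using a VAE decoder as a generator (illustrative; the stand-in decoder below is untrained, and the 2-dimensional latent and 784-dimensional output mirror the Keras VAE on slide 50 rather than being prescribed by this slide):

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

latent_dim, original_dim = 2, 784

# Stand-in decoder; in practice use the trained VAE decoder
latent_inputs = Input(shape=(latent_dim,))
h = Dense(256, activation='relu')(latent_inputs)
decoder = Model(latent_inputs, Dense(original_dim, activation='sigmoid')(h))

z = np.random.normal(size=(16, latent_dim))   # sample z ~ N(0, I)
generated = decoder.predict(z)                # with a trained decoder these resemble MNIST digits
generated = generated.reshape(-1, 28, 28)     # reshape flat 784-vectors to 28x28 images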

39 Implementation: learning example

40 Algorithm: learning
We also want the output to be similar to the input. The approximate posterior Q(z|x) = N(µ_z|X, Σ_z|X) should be close to N(0, I).
Figure 3: An initial attempt at a variational autoencoder without the "reparameterization trick". Objective functions shown in red. We cannot back-propagate through the stochastic sampling operation because it is not a continuous deterministic function.
Figure 4: A variational autoencoder with the "reparameterization trick". Notice that all operations between the inputs and objectives are continuous deterministic functions, allowing back-propagation to occur.

41 Training: Loss L is to be minimized

42 KL Divergence for 2 Gaussians
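For a diagonal Gaussian q = N(µ, σ²) and p = N(0, I), the standard closed form used in the VAE loss (reconstructed here, not copied from the slide) is:

\mathrm{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,I)\big) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^{2} + \sigma_j^{2} - \log\sigma_j^{2} - 1\right)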

43 Algorithm: recall after training
Figure 2: A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator).
Figure 5: The generative model component of a variational autoencoder in test mode.

44 Math

45 Math: the loss L is to be minimized

46 Kullback–Leibler divergence D_KL for two Gaussians

47 Training: Loss L is to be minimized

48 Implementation: Keras

49 Keras

50 Keras implementation of VAE

original_dim = 784
intermediate_dim = 256
latent_dim = 2
batch_size = 100
epochs = 50
epsilon_std = 1.0

x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)

z_mu = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)
z_mu, z_log_var = KLDivergenceLayer()([z_mu, z_log_var])

# Use of Lambda: convert log variance to std dev
z_sigma = Lambda(lambda t: K.exp(.5 * t))(z_log_var)

eps = Input(tensor=K.random_normal(shape=(K.shape(x)[0], latent_dim)))
z_eps = Multiply()([z_sigma, eps])
z = Add()([z_mu, z_eps])

decoder = Sequential([
    Dense(intermediate_dim, input_dim=latent_dim, activation='relu'),
    Dense(original_dim, activation='sigmoid')
])

x_pred = decoder(z)        # predicted (reconstructed) output
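KLDivergenceLayer is a custom layer that is not defined on the slide; a minimal sketch of what such a layer typically looks like, using Keras's add_loss mechanism (an assumption, not necessarily the author's exact code):

from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K

class KLDivergenceLayer(Layer):
    """Identity layer that adds the KL divergence to N(0, I) to the model loss."""
    def call(self, inputs):
        mu, log_var = inputs
        kl = -0.5 * K.sum(1 + log_var - K.square(mu) - K.exp(log_var), axis=-1)
        self.add_loss(K.mean(kl))
        return inputs

With the KL term attached this way, the model can be compiled with only a reconstruction loss (e.g. per-pixel binary cross-entropy between x and x_pred).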


52 Derivation of the expected value E(·)

53 Math background Inverse transform sampling is a method for sampling from any distribution given its cumulative distribution function (CDF) F(x). For a given distribution with CDF F(x), it works as follows:
1. Sample a value u in [0,1] from a uniform distribution.
2. Define the inverse of the CDF as F⁻¹(u) (its domain is a probability value in [0,1]).
3. F⁻¹(u) is a sample from your target distribution.

54 Proof The proof of correctness is actually pretty simple. Let U be a uniform random variable on [0,1] and F⁻¹(U) the transformation as before; then we have the identity below. Thus, we have shown that F⁻¹(U) has the distribution of our target random variable (since its cumulative distribution function (CDF) is the same F(x)). It's important to note what we did: we took an easy-to-sample random variable U, performed a deterministic transformation F⁻¹(U), and ended up with a random variable distributed according to our target distribution.
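The identity referred to above, in its standard form (reconstructed, not copied from the slide):

P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x)

The first equality uses that F is non-decreasing; the second uses that U is uniform on [0,1].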

55 Example As a simple example, we can try to generate an exponential distribution with CDF F(x) = 1 − e^(−λx) for x ≥ 0. The inverse is x = F⁻¹(u) = −(1/λ) log(1 − u). Thus, we can sample from an exponential distribution just by repeatedly evaluating this expression with uniformly distributed random numbers.
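A small NumPy sketch of this recipe (illustrative; the rate λ = 1.5 and the sample count are arbitrary choices, not from the slide):

import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                                   # rate parameter lambda (illustrative)
u = rng.uniform(size=10000)                 # step 1: u ~ Uniform[0, 1]
x = -np.log(1.0 - u) / lam                  # step 2: x = F^{-1}(u) = -(1/lambda) log(1 - u)

# Sanity check: the sample mean should be close to 1/lambda
print(x.mean(), 1.0 / lam)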

56 Extensions Now, instead of starting from a uniform distribution, what happens if we want to start from another distribution, say a normal distribution? We first apply the reverse of inverse transform sampling, called the Probability Integral Transform. So the steps would be (see the sketch below):
1. Sample from a normal distribution.
2. Apply the probability integral transform using the CDF (cumulative distribution function) of the normal distribution to get a uniformly distributed sample.
3. Apply inverse transform sampling with the inverse CDF of the target distribution to get a sample from our target distribution.
What about extending to multiple dimensions? We can just break up the joint distribution into its conditional components and sample each sequentially to construct the overall sample:
P(x1, …, xn) = P(xn | xn−1, …, x1) … P(x2 | x1) P(x1)    (4)
In detail, first sample x1 using the method above, then x2|x1, then x3|x2,x1, and so on. Of course, this implicitly means you would have the CDF of each of those conditional distributions available, which in practice might not be possible.
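A short NumPy/SciPy sketch of the normal → uniform → target pipeline (illustrative; the exponential target with λ = 2 is an arbitrary choice, not from the slide):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 2.0

s = rng.standard_normal(10000)              # 1. sample from a normal distribution
u = stats.norm.cdf(s)                       # 2. probability integral transform -> uniform on [0, 1]
x = stats.expon(scale=1.0 / lam).ppf(u)     # 3. inverse CDF of the target -> target samples

print(x.mean(), 1.0 / lam)                  # sanity check: mean of Exp(lambda) is 1/lambda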

57 A graphical model of a typical variational autoencoder (without an "encoder", just the "decoder"). We're using a modified plate notation: circles represent variables/parameters, rectangular boxes with a number in the lower right corner represent multiple instances of the contained variables, and the little diagram in the middle represents a deterministic neural network (function approximator). Note: we can put another distribution on X, like a Bernoulli for binary data parameterized by p = g(z;θ). The important part is that we're able to maximize the likelihood over the θ parameters. Implicitly, we will want our output variable to be continuous in θ so we can take its gradient.

58 A hard fit First, we need to define the probability of seeing a single example x: the probability of a single sample is just the joint probability of our given model, marginalizing out (i.e. integrating over) Z. Since we don't have an analytical form of the density, we approximate the integral by averaging over M samples from Z ∼ N(0, I). Putting together the log-likelihood (obtained by taking the log of the density and summing over all of our N observations) gives the expressions below.
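In standard notation, the expressions described here can be written as (reconstructed, not copied from the slide):

p(x) = \int p(x \mid z)\,p(z)\,dz \;\approx\; \frac{1}{M}\sum_{m=1}^{M} p(x \mid z_m), \qquad z_m \sim \mathcal{N}(0, I)

\log p(x_1,\ldots,x_N) \;\approx\; \sum_{i=1}^{N} \log\!\left(\frac{1}{M}\sum_{m=1}^{M} p(x_i \mid z_m)\right)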

59 Appendix

60 Training Denoising Autoencoder on MNIST
The following pictures show the difference between the resulting filters of a denoising autoencoder trained on MNIST with different noise ratios: no noise (noise ratio = 0%) vs. noise ratio = 30%. Diagram from (Hinton and Salakhutdinov, 2006).

61 Objective function


64 Negative Log-Likelihood (NLL)

65 Softmax Activation Function

66 Negative Log-Likelihood (NLL)

67 Derivation of the softmax


71 Conditional probability density

72 Expectation and conditional probability (1)
Example: E(X | Y) = {9/4 with probability 1/8, 18/4 with probability 7/8}. The total average is E(X) = E_Y{E(X|Y)} = (9/4)×(1/8) + (18/4)×(7/8) = 4.22.

73 Expectation and conditional probability (2): example

74 Prior and posterior probability densities

