
Generative Adversarial Network (GAN)


1 Generative Adversarial Network (GAN)
I should follow this plan: f-GAN, W-GAN (×2), LS-GAN (loss-sensitive GAN), EB-GAN; Restricted Boltzmann Machine: Gibbs sampling; outlook.

2 NIPS 2016 Tutorial: Generative Adversarial Networks
Author: Ian Goodfellow. Paper: Video: Neural-Information-Processing-Systems-Conference-NIPS-2016/Generative-Adversarial-Networks. You can find tips for training GANs here:

3 Review

4 Generation Writing Poems? Drawing?
Generation: writing poems? Drawing? It is like teaching a child to draw.

5 Review: Auto-encoder
An NN Encoder maps an image to a code, and an NN Decoder maps the code back to an image; the two are trained so that the output image is as close as possible to the input. To generate, randomly generate a vector as the code and feed it to the NN Decoder to get an image.

6 Review: Auto-encoder
[Figure: vectors sampled from a 2D code space, over the range −1.5 to 1.5 in each dimension, are fed to the NN Decoder to visualize the images it generates.]

7 Review: Auto-encoder
[Figure: images decoded from codes sampled in the range −1.5 to 1.5.]

8 Auto-encoder / VAE
In a plain auto-encoder, the NN Encoder maps the input to a code and the NN Decoder reconstructs the output, minimizing the reconstruction error. In a VAE, the encoder outputs means $m_1, m_2, m_3$ and variance parameters $\sigma_1, \sigma_2, \sigma_3$; the code is $c_i = \exp(\sigma_i) \times e_i + m_i$, where $e_i$ is sampled from a normal distribution. Training minimizes the reconstruction error plus the regularization term $\sum_{i=1}^{3} \bigl( \exp(\sigma_i) - (1 + \sigma_i) + m_i^2 \bigr)$. Reference: Auto-Encoding Variational Bayes.
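To make the reparameterization concrete, here is a minimal NumPy sketch of the code construction and the regularization term above; the encoder outputs m and sigma are toy values of my own, not taken from the lecture's demo.

```python
# Minimal sketch of the VAE reparameterization trick and regularization term.
import numpy as np

rng = np.random.default_rng(0)

m = np.array([0.5, -0.2, 0.1])       # means from the encoder (toy values)
sigma = np.array([0.0, 0.3, -0.4])   # variance parameters from the encoder (toy values)

# Reparameterization: c_i = exp(sigma_i) * e_i + m_i, with e ~ N(0, 1)
e = rng.standard_normal(3)
c = np.exp(sigma) * e + m            # code fed to the NN Decoder

# Regularization term minimized together with the reconstruction error:
# sum_i exp(sigma_i) - (1 + sigma_i) + m_i^2
reg = np.sum(np.exp(sigma) - (1.0 + sigma) + m**2)
print(c, reg)
```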

9 Problems of VAE
VAE does not really try to simulate real images. The NN Decoder output is only trained to be as close as possible to the target, so two outputs that each differ from the target by one pixel receive the same loss, even though one looks realistic and the other looks fake.

10 The evolution of generation
The generator and the discriminator evolve together: NN Generator v1 produces images, Discriminator v1 (a binary classifier) learns to tell them from real images; then NN Generator v2 learns to fool Discriminator v1, Discriminator v2 learns to tell Generator v2's images from real images, and so on (v3, ...).

12 GAN – Discriminator
NN Generator v1 (something like the decoder in a VAE) takes a randomly sampled vector and produces images. Discriminator v1 takes an image and outputs 1/0 (real or fake): real images are labeled 1 and generated images are labeled 0.

13 GAN – Generator
Randomly sample a vector and feed it to NN Generator v1. The generator and the discriminator together form one network; using gradient descent, update the parameters of the generator (obtaining Generator v2) while keeping the discriminator fixed, so that the generator's output is classified as "real" (discriminator output as close to 1 as possible, e.g. pushed from 0.13 toward 1.0).
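A minimal sketch of "generator + discriminator = a network", assuming TensorFlow/Keras; the layer sizes and dimensions are toy choices of my own, not the lecture's code. The discriminator is frozen inside the stacked model, so gradient descent on the stack only updates the generator.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

z_dim, x_dim = 10, 784

generator = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(z_dim,)),
    layers.Dense(x_dim, activation="sigmoid"),
])
discriminator = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(x_dim,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Stack them: z -> G(z) -> D(G(z)); freeze D so only G's parameters move.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

z = np.random.randn(32, z_dim)
# Train G so that D(G(z)) is classified as "real" (label 1) while D stays fixed.
gan.train_on_batch(z, np.ones((32, 1)))
```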

14 GAN – Generating anime character faces. Source of images: DCGAN:

15 GAN – Generating anime character faces: 100 rounds

16 GAN – Generating anime character faces: 1000 rounds

17 GAN – Generating anime character faces: 2000 rounds

18 GAN – Generating anime character faces: 5000 rounds

19 GAN – Generating anime character faces: 10,000 rounds

20 GAN – Generating anime character faces: 20,000 rounds

21 GAN – Generating anime character faces: 50,000 rounds

22 Basic Idea of GAN

23 Maximum Likelihood Estimation
Given a data distribution $P_{data}(x)$, we have a distribution $P_G(x;\theta)$ parameterized by $\theta$; e.g. $P_G(x;\theta)$ is a Gaussian mixture model and $\theta$ contains the means and variances of the Gaussians. We want to find $\theta$ such that $P_G(x;\theta)$ is close to $P_{data}(x)$. Sample $x^1, x^2, \dots, x^m$ from $P_{data}(x)$; we can compute $P_G(x^i;\theta)$, so the likelihood of generating the samples is $L = \prod_{i=1}^{m} P_G(x^i;\theta)$. Find $\theta^*$ maximizing this likelihood.
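As a toy illustration of the idea (not part of the lecture), here is maximum likelihood for a single Gaussian $P_G(x;\theta)$, where $\theta^*$ has a closed form; the data distribution below is my own choice.

```python
# Fit a single Gaussian P_G(x; theta) to samples drawn from P_data by maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # samples x^1..x^m from P_data

# For a Gaussian, the theta maximizing L = prod_i P_G(x^i; theta) is the
# sample mean and the sample standard deviation.
mu_star = x.mean()
sigma_star = x.std()

log_likelihood = np.sum(-0.5 * np.log(2 * np.pi * sigma_star**2)
                        - (x - mu_star)**2 / (2 * sigma_star**2))
print(mu_star, sigma_star, log_likelihood)
```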

24 Maximum Likelihood Estimation
$\theta^* = \arg\max_\theta \prod_{i=1}^{m} P_G(x^i;\theta) = \arg\max_\theta \sum_{i=1}^{m} \log P_G(x^i;\theta)$, with $x^1, x^2, \dots, x^m$ from $P_{data}(x)$
$\approx \arg\max_\theta E_{x\sim P_{data}}[\log P_G(x;\theta)]$
$= \arg\max_\theta \left( \int_x P_{data}(x)\log P_G(x;\theta)\,dx - \int_x P_{data}(x)\log P_{data}(x)\,dx \right)$ (the second term does not depend on $\theta$)
$= \arg\min_\theta KL\bigl(P_{data}(x) \,\|\, P_G(x;\theta)\bigr)$
How to have a very general $P_G(x;\theta)$?

25 Now $P_G(x;\theta)$ is a NN
The generator G maps a prior distribution to the data space: $x = G(z)$ with $z \sim P_{prior}(z)$, so $P_G(x) = \int_z P_{prior}(z)\, I_{[G(z)=x]}\, dz$. It is difficult to compute this likelihood.

26 Basic Idea of GAN
Generator G: a function with input z and output x. Given a prior distribution $P_{prior}(z)$, a probability distribution $P_G(x)$ is defined by the function G; it is hard to learn G by maximum likelihood.
Discriminator D: a function with input x and output a scalar. It evaluates the "difference" between $P_G(x)$ and $P_{data}(x)$.
There is a function $V(G,D)$ such that $G^* = \arg\min_G \max_D V(G,D)$.

27 Basic Idea
$G^* = \arg\min_G \max_D V(G,D)$, where $V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$.
Given a generator G, $\max_D V(G,D)$ evaluates the "difference" between $P_G$ and $P_{data}$. Then pick the G whose $P_G$ is most similar to $P_{data}$, i.e. the candidate among $G_1, G_2, G_3, \dots$ with the smallest $\max_D V(G,D)$.

28 $\max_D V(G,D)$   ($G^* = \arg\min_G \max_D V(G,D)$)
Given G, what is the optimal $D^*$ maximizing
$V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$
$= \int_x P_{data}(x)\log D(x)\,dx + \int_x P_G(x)\log(1-D(x))\,dx$
$= \int_x \bigl[ P_{data}(x)\log D(x) + P_G(x)\log(1-D(x)) \bigr]\,dx$ ?
Assuming D(x) can take any value independently at each x, the optimal $D^*$ at a given x maximizes $P_{data}(x)\log D(x) + P_G(x)\log(1-D(x))$.

29 $\max_D V(G,D)$
Given x, find the $D^*$ maximizing $f(D) = a\log D + b\log(1-D)$, where $a = P_{data}(x)$ and $b = P_G(x)$:
$\frac{df(D)}{dD} = a\times\frac{1}{D} + b\times\frac{1}{1-D}\times(-1) = 0$
$a\times\frac{1}{D^*} = b\times\frac{1}{1-D^*}$, so $a(1-D^*) = bD^*$, $a - aD^* = bD^*$, and $D^* = \frac{a}{a+b}$.
Therefore $D^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$, which lies between 0 and 1 (so it can be produced by a sigmoid output).
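A quick numerical check of this result (the values of a and b below are toy choices of my own):

```python
# Verify that D* = a / (a + b) maximizes f(D) = a log D + b log(1 - D).
import numpy as np

a, b = 0.7, 0.3                       # stand-ins for P_data(x) and P_G(x)
D = np.linspace(0.01, 0.99, 999)
f = a * np.log(D) + b * np.log(1.0 - D)

print(D[np.argmax(f)], a / (a + b))   # both are approximately 0.7
```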

30 $\max_D V(G,D)$
For each generator the optimal discriminator is different: $D_1^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_{G_1}(x)}$, $D_2^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_{G_2}(x)}$, and so on. The resulting value $V(G_1, D_1^*)$ measures the "difference" between $P_{G_1}$ and $P_{data}$. [Figure: the curves $V(G_1,D)$, $V(G_2,D)$, $V(G_3,D)$ as functions of D, each peaking at its own optimal discriminator.]

31 $\max_D V(G,D)$
$V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$ and $D^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$, so
$\max_D V(G,D) = V(G,D^*)$
$= E_{x\sim P_{data}}\left[\log \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}\right] + E_{x\sim P_G}\left[\log \frac{P_G(x)}{P_{data}(x)+P_G(x)}\right]$
$= \int_x P_{data}(x)\log \frac{\frac{1}{2}P_{data}(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx + \int_x P_G(x)\log \frac{\frac{1}{2}P_G(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx$
$= 2\log\frac{1}{2} + \int_x P_{data}(x)\log \frac{P_{data}(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx + \int_x P_G(x)\log \frac{P_G(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx$, where $2\log\frac{1}{2} = -2\log 2$.

32 $\max_D V(G,D) = V(G,D^*)$
With $D^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$:
$\max_D V(G,D) = V(G,D^*)$
$= -2\log 2 + \int_x P_{data}(x)\log \frac{P_{data}(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx + \int_x P_G(x)\log \frac{P_G(x)}{\bigl(P_{data}(x)+P_G(x)\bigr)/2}\,dx$
$= -2\log 2 + KL\left(P_{data}(x) \,\Big\|\, \frac{P_{data}(x)+P_G(x)}{2}\right) + KL\left(P_G(x) \,\Big\|\, \frac{P_{data}(x)+P_G(x)}{2}\right)$
$= -2\log 2 + 2\,JSD\bigl(P_{data}(x) \,\|\, P_G(x)\bigr)$, where JSD is the Jensen-Shannon divergence.
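The identity can be checked numerically on small discrete distributions; the two distributions below are my own toy example, not from the lecture.

```python
# Sanity check that max_D V(G, D) = V(G, D*) = -2 log 2 + 2 JSD(P_data || P_G).
import numpy as np

p_data = np.array([0.1, 0.4, 0.5])
p_g    = np.array([0.3, 0.3, 0.4])

d_star = p_data / (p_data + p_g)

# V(G, D*) = E_{P_data}[log D*(x)] + E_{P_G}[log(1 - D*(x))]
v = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

def kl(p, q):
    return np.sum(p * np.log(p / q))

m = (p_data + p_g) / 2.0
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

print(v, -2 * np.log(2) + 2 * jsd)   # the two numbers agree
```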

33 In the end ……
$V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$
Generator G, discriminator D; we are looking for $G^* = \arg\min_G \max_D V(G,D)$.
Given G, $\max_D V(G,D) = -2\log 2 + 2\,JSD\bigl(P_{data}(x) \,\|\, P_G(x)\bigr)$, and the JSD lies between 0 and $\log 2$.
What is the optimal G? The one with $P_G(x) = P_{data}(x)$, for which the JSD is 0.

34 Algorithm
$G^* = \arg\min_G \max_D V(G,D)$; write $L(G) = \max_D V(G,D)$.
To find the best G minimizing the loss function $L(G)$: $\theta_G \leftarrow \theta_G - \eta\,\partial L(G)/\partial\theta_G$, where $\theta_G$ are the parameters defining G.
How do we differentiate a max? If $f(x) = \max\{D_1(x), D_2(x), D_3(x)\}$, then $\frac{df(x)}{dx} = \frac{dD_i(x)}{dx}$ where $D_i(x)$ is the max one, i.e. take the derivative of whichever function currently attains the maximum.

35 Algorithm
$G^* = \arg\min_G \max_D V(G,D)$, $L(G) = \max_D V(G,D)$. Given $G_0$:
- Find $D_0^*$ maximizing $V(G_0, D)$. ($V(G_0, D_0^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_0}(x)$.)
- $\theta_G \leftarrow \theta_G - \eta\,\partial V(G, D_0^*)/\partial\theta_G$; obtain $G_1$. (Decrease the JS divergence(?))
- Find $D_1^*$ maximizing $V(G_1, D)$. ($V(G_1, D_1^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_1}(x)$.)
- $\theta_G \leftarrow \theta_G - \eta\,\partial V(G, D_1^*)/\partial\theta_G$; obtain $G_2$. (Decrease the JS divergence(?))
- ……

36 Algorithm
Given $G_0$, find $D_0^*$ maximizing $V(G_0, D)$; $V(G_0, D_0^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_0}(x)$. Updating $\theta_G \leftarrow \theta_G - \eta\,\partial V(G, D_0^*)/\partial\theta_G$ to obtain $G_1$ makes $V(G_1, D_0^*)$ smaller than $V(G_0, D_0^*)$. Does this decrease the JS divergence? (Is this slide right?) Only if we can assume $D_0^* \approx D_1^*$, so that $V(G_1, D_0^*) \approx V(G_1, D_1^*)$, the JS divergence for $G_1$. Hence: don't update G too much.

37 In practice …
$V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$
Given G, how do we compute $\max_D V(G,D)$? Sample $x^1, x^2, \dots, x^m$ from $P_{data}(x)$ and sample $\tilde{x}^1, \tilde{x}^2, \dots, \tilde{x}^m$ from the generator's $P_G(x)$, then maximize
$\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(\tilde{x}^i)\bigr)$
This is exactly training a binary classifier whose output is D(x) by minimizing cross-entropy: minimize $-\log D(x)$ if x is a positive example and $-\log(1-D(x))$ if x is a negative example.

38 Maximizing $\tilde{V}$ = training a binary classifier
A binary classifier with output f(x) is trained by minimizing cross-entropy: minimize $-\log f(x)$ if x is a positive example, and $-\log(1-f(x))$ if x is a negative example. Here D is a binary classifier (it can be deep) with parameters $\theta_d$; $x^1, x^2, \dots, x^m$ from $P_{data}(x)$ are the positive examples and $\tilde{x}^1, \tilde{x}^2, \dots, \tilde{x}^m$ from $P_G(x)$ are the negative examples. Minimizing the cross-entropy loss
$L = -\frac{1}{m}\sum_{i=1}^{m}\log D(x^i) - \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(\tilde{x}^i)\bigr)$
is the same as maximizing
$\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(\tilde{x}^i)\bigr)$
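A tiny numerical check of this equivalence; the discriminator outputs below are toy values of my own.

```python
# Minimizing binary cross-entropy over real/generated examples == maximizing V~.
import numpy as np

d_real = np.array([0.9, 0.7, 0.8])   # D(x^i) on samples from P_data (positive examples)
d_fake = np.array([0.2, 0.4, 0.1])   # D(x~^i) on samples from P_G (negative examples)

v_tilde = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
cross_entropy = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

print(v_tilde, cross_entropy)        # cross_entropy == -v_tilde
```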

39 Algorithm
Initialize $\theta_d$ for D and $\theta_g$ for G. In each training iteration:
Learning D (repeat k times):
- Sample m examples $x^1, x^2, \dots, x^m$ from the data distribution $P_{data}(x)$.
- Sample m noise samples $z^1, z^2, \dots, z^m$ from the prior $P_{prior}(z)$, obtaining generated data $\tilde{x}^i = G(z^i)$.
- Update the discriminator parameters $\theta_d$ to maximize $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(\tilde{x}^i)\bigr)$: $\theta_d \leftarrow \theta_d + \eta\,\nabla\tilde{V}(\theta_d)$.
Learning G (only once per iteration):
- Sample another m noise samples $z^1, z^2, \dots, z^m$ from the prior $P_{prior}(z)$.
- Update the generator parameters $\theta_g$ to minimize $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(G(z^i))\bigr)$ (the first term does not depend on $\theta_g$): $\theta_g \leftarrow \theta_g - \eta\,\nabla\tilde{V}(\theta_g)$. (Why the "1−"?)
Because D is only trained for k steps, we can only find a lower bound of $\max_D V(G,D)$. A Keras-style sketch of this loop follows below.
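This is a sketch of the training loop above, reusing the hypothetical generator, discriminator, and stacked gan models from the earlier Keras sketch; x_data, the batch size, and k are placeholders of my own, not the lecture's code.

```python
import numpy as np

def train(generator, discriminator, gan, x_data,
          iterations=1000, k=1, m=32, z_dim=10):
    for it in range(iterations):
        # ---- Learning D (repeat k times) ----
        for _ in range(k):
            x_real = x_data[np.random.randint(0, len(x_data), m)]   # from P_data
            z = np.random.randn(m, z_dim)                           # from P_prior
            x_fake = generator.predict(z, verbose=0)                # x~ = G(z)
            # Maximizing V~ == minimizing cross-entropy with labels 1 (real) / 0 (fake)
            discriminator.train_on_batch(x_real, np.ones((m, 1)))
            discriminator.train_on_batch(x_fake, np.zeros((m, 1)))
        # ---- Learning G (only once) ----
        z = np.random.randn(m, z_dim)
        # D is frozen inside `gan`; label 1 pushes D(G(z)) toward "real"
        gan.train_on_batch(z, np.ones((m, 1)))
```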

40 Objective Function for Generator in Real Implementation
The generator's part of $V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))]$ is $E_{x\sim P_G}[\log(1-D(x))]$, which is slow at the beginning of training: when $D(x)$ is close to 0, $\log(1-D(x))$ is nearly flat, so the gradient is small. Real implementation: minimize $V = E_{x\sim P_G}[-\log D(x)]$ instead, i.e. label the x sampled from $P_G$ as positive when updating the generator; $-\log D(x)$ is steep exactly where $\log(1-D(x))$ is flat.
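Note that the Keras-style sketches above, which train the stacked model with binary cross-entropy against labels of 1, already implement this $-\log D(G(z))$ objective. The toy numbers below (my own) illustrate why the original objective is slow at the beginning.

```python
# Compare the gradients of the two generator objectives when D(G(z)) is near 0.
import numpy as np

d_of_gz = np.array([0.01, 0.05, 0.1])   # D(G(z)) early in training: close to 0

loss_original  = np.mean(np.log(1.0 - d_of_gz))    # log(1 - D(G(z)))
loss_heuristic = np.mean(-np.log(d_of_gz))          # -log D(G(z))

# Derivatives with respect to D(G(z)):
grad_original  = np.mean(-1.0 / (1.0 - d_of_gz))    # about -1 when D(G(z)) is near 0
grad_heuristic = np.mean(-1.0 / d_of_gz)            # very large magnitude near 0
print(grad_original, grad_heuristic)
```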

41 Demo
The code used in the demo is from: CNN_GAN_v2.ipynb (keras-adversarial). There is also a simple TensorFlow version, but I could not run it, probably because of a version issue.

42 Issues in Evaluating the Divergence

43 Evaluating JS divergence
Martin Arjovsky, Léon Bottou, Towards Principled Methods for Training Generative Adversarial Networks, 2017, arXiv preprint

44 Evaluating JS divergence
The JS divergence estimated by the discriminator tells us little: it looks much the same whether the generator is weak or strong. [Figure: discriminator training curves for a weak generator and for a strong generator.]

45 Discriminator
$V = E_{x\sim P_{data}}[\log D(x)] + E_{x\sim P_G}[\log(1-D(x))] \approx \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(\tilde{x}^i)\bigr)$
In theory $\max_D V(G,D) = -2\log 2 + 2\,JSD\bigl(P_{data}(x)\,\|\,P_G(x)\bigr)$; in practice the estimate is almost always 0, i.e. the measured JSD is $\log 2$. Reason 1: we approximate the expectations by sampling, and a strong discriminator can separate the finite real and generated samples perfectly even when the underlying distributions overlap. Weaken your discriminator? But can a weak discriminator compute the JS divergence?

46 Discriminator
(The same estimate $\max_D V(G,D) = -2\log 2 + 2\,JSD\bigl(P_{data}(x)\,\|\,P_G(x)\bigr)$ sticks at 0, i.e. JSD $= \log 2$.) Reason 2: the nature of the data. Both $P_{data}(x)$ and $P_G(x)$ are low-dimensional manifolds in a high-dimensional space, so they usually do not have any overlap, and the JS divergence between non-overlapping distributions is always $\log 2$.

47 Evaluation
[Figure: a sequence of generator distributions moving closer and closer to $P_{data}(x)$, i.e. getting "better".]

48 Evaluation
$P_{G_0}(x)$ does not overlap $P_{data}(x)$, so $JSD(P_{G_0}\,\|\,P_{data}) = \log 2$. After training, $P_{G_{50}}(x)$ is much closer to $P_{data}(x)$ but still does not overlap it, so $JSD(P_{G_{50}}\,\|\,P_{data}) = \log 2$ as well; according to the JSD the generator is "not really better". Only when $P_G(x)$ finally overlaps $P_{data}(x)$ does the divergence drop to $JSD(P_G\,\|\,P_{data}) = 0$.

49 Add Noise
Add some artificial noise to the inputs of the discriminator, and make the labels noisy for the discriminator. Then the discriminator cannot perfectly separate real and generated data, and $P_{data}(x)$ and $P_G(x)$ have some overlap. The noise should decay over time.
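One possible way to implement the input noise; the decay schedule and function names below are my own assumptions, not the lecture's recipe.

```python
# Add Gaussian noise, with a standard deviation that decays over training,
# to whatever batch is fed to the discriminator.
import numpy as np

def noisy(batch, iteration, sigma0=0.5, decay=0.999):
    """Return the batch plus Gaussian noise whose std decays with the iteration."""
    sigma = sigma0 * (decay ** iteration)
    return batch + sigma * np.random.randn(*batch.shape)

# Usage inside the D-update of the training loop above (hypothetical names):
#   discriminator.train_on_batch(noisy(x_real, it), np.ones((m, 1)))
#   discriminator.train_on_batch(noisy(x_fake, it), np.zeros((m, 1)))
```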

50 Mode Collapse

51 Mode Collapse
[Figure: the generated distribution covers only a small region of the data distribution.]

52 Mode Collapse
What we want: $P_G$ spread over the whole $P_{data}$. In reality: $P_G$ concentrates on only part of $P_{data}$.

53 Flaw in Optimization?
(Modified from Ian Goodfellow's tutorial.) $KL = \int P_{data}\log\frac{P_{data}}{P_G}\,dx$ versus reverse $KL = \int P_G\log\frac{P_G}{P_{data}}\,dx$. Maximum likelihood minimizes $KL(P_{data}\,\|\,P_G)$, which encourages $P_G$ to cover all of $P_{data}$; minimizing $KL(P_G\,\|\,P_{data})$ (reverse KL) instead lets $P_G$ settle on a single mode of $P_{data}$. However, this may not be the real reason for mode collapse (based on Ian Goodfellow's tutorial).
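The asymmetry can be seen on a toy discrete example (my own numbers): forward KL heavily penalizes a $P_G$ that misses a mode of $P_{data}$, while reverse KL is happier with a $P_G$ that sits on one mode.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.49, 0.02, 0.49])            # bimodal "data" distribution
q_spread = np.array([0.34, 0.32, 0.34])     # covers both modes
q_onemode = np.array([0.90, 0.05, 0.05])    # collapses onto one mode

print(kl(p, q_spread),  kl(p, q_onemode))   # forward KL prefers the spread-out q
print(kl(q_spread, p),  kl(q_onemode, p))   # reverse KL prefers the one-mode q
```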

54 So many GANs ……
Modifying the optimization of GAN: fGAN, WGAN, Least-square GAN, Loss-sensitive GAN, Energy-based GAN, Boundary-seeking GAN, Unrolled GAN, ……
Different structure from the original GAN: Conditional GAN, Semi-supervised GAN, InfoGAN, BiGAN, Cycle GAN, Disco GAN, VAE-GAN, ……
Application: deep convolutional generative adversarial networks (DCGANs)

55 Conditional GAN

56 Motivation: a generator that takes text as input and produces an image
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee, “Generative Adversarial Text-to-Image Synthesis”, ICML 2016. Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas, “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks”, arXiv preprint, 2016. Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee, “Learning What and Where to Draw”, NIPS 2016.

57 Motivation: the challenge
A conventional NN maps text $c$ to a single image $x$ (a point, not a distribution). Given the text "train", many different images are correct answers, so such a network tends to produce something like an average of them.

58 Conditional GAN
Training data: pairs $(\hat{c}, \hat{x})$; learn to approximate $P(x|c)$. The generator G takes the condition $c$ and a sample $z$ from the prior distribution and outputs $x = G(c, z)$ (the generator may learn to ignore z; some implementations use dropout instead).
D (v1): a scalar-output discriminator that sees only $x$, trained so that $G(\hat{c})$ is classified as positive. Problem: G can then generate x not related to c.
D (v2): a scalar-output discriminator that sees the pair $(c, x)$. Positive example: $(\hat{c}, \hat{x})$. Negative examples: $(\hat{c}, G(\hat{c}))$ and $(\hat{c}', \hat{x})$, i.e. a real image paired with the wrong condition. A sketch of this pair-scoring discriminator follows below.
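A minimal Keras sketch of the second (pair-scoring) discriminator; the layer sizes and input dimensions are toy choices of my own.

```python
from tensorflow import keras
from tensorflow.keras import layers

c_dim, x_dim = 16, 784

c_in = keras.Input(shape=(c_dim,))          # condition, e.g. a text embedding
x_in = keras.Input(shape=(x_dim,))          # image (real or generated)
h = layers.Concatenate()([c_in, x_in])      # D(v2) scores the (c, x) pair jointly
h = layers.Dense(128, activation="relu")(h)
score = layers.Dense(1, activation="sigmoid")(h)

cond_discriminator = keras.Model([c_in, x_in], score)
cond_discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Positive pairs: (c_hat, x_hat).  Negative pairs: (c_hat, G(c_hat)) and
# (c_hat', x_hat), i.e. a real image with the wrong condition.
```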

59 Text to Image - Results

60 Text to Image - Results "red flower with black center"

61 Image-to-image Translation
Façade: the front face or outward appearance of a building, e.g. "That is the facade of the Palace." Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks”, arXiv preprint, 2016

62

63 Image-to-image Translation - Results

64 Speech Enhancement GAN

65 Speech Enhancement GAN
Using Least-square GAN

66 Least-square GAN
D has a linear output (no sigmoid).
For the discriminator: $\min_D \; E_{x\sim P_{data}}\bigl[(D(x)-b)^2\bigr] + E_{x\sim P_G}\bigl[(D(x)-a)^2\bigr]$, e.g. with target $b = 1$ for real data and $a = 0$ for generated data.
For the generator: $\min_G \; E_{z\sim P_{prior}}\bigl[(D(G(z))-c)^2\bigr]$, e.g. with target $c = 1$, pushing generated samples toward the "real" value.
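A toy sketch of these two objectives with the common choice a = 0, b = 1, c = 1 (these targets are my assumption; the slide leaves them as a, b, c), evaluated on made-up linear discriminator outputs.

```python
import numpy as np

a, b, c = 0.0, 1.0, 1.0

d_real = np.array([0.8, 1.2, 0.9])    # linear D outputs on real samples (toy values)
d_fake = np.array([-0.1, 0.3, 0.2])   # linear D outputs on G(z) (toy values)

d_loss = np.mean((d_real - b) ** 2) + np.mean((d_fake - a) ** 2)   # discriminator objective
g_loss = np.mean((d_fake - c) ** 2)                                # generator objective
print(d_loss, g_loss)
```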

67 Least-square GAN: Demo
The code used in the demo is from: CNN_GAN_v2.ipynb (keras-adversarial). There is also a simple TensorFlow version, but I could not run it, probably because of a version issue.

