Image recognition: Defending against adversarial attacks using a Generative Adversarial Network (GAN). Speaker: Guofei Pang, Division of Applied Mathematics, Brown University. Presentation based on the paper: Ilyas, Andrew, et al. "The Robust Manifold Defense: Adversarial Training using Generative Models." arXiv preprint arXiv:1712.09196 (2017).
Outline: (1) Adversarial attacks; (2) Generative Adversarial Network (GAN); (3) How to defend against attacks using GAN; (4) Numerical results. 2/25
Adversarial Attacks 3/25
Adversarial Attacks 4/25
Adversarial Attacks. Image as a vector: x = (x_j), j = 1, 2, …, n·m, for an image with n rows and m columns. 5/25
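A minimal sketch of the "image as a vector" idea above, flattening an n × m grayscale image into a vector of length n·m (the image values and the MNIST-like size are made up for illustration):

```python
import numpy as np

n, m = 28, 28                  # e.g. an MNIST-sized image (assumption)
image = np.random.rand(n, m)   # toy pixel intensities in [0, 1]

# The image as a vector x = (x_j), j = 1, ..., n*m
x = image.reshape(n * m)
print(x.shape)                 # (784,)
```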
Adversarial Attacks. Adversarial examples for a classifier C(): a pair of inputs x1 and x2 with ||x1 − x2||_2 < e0 but |C(x1) − C(x2)| > f0. A person says they are of the same class, but the classifier says they are completely different! 6/25
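The two conditions on this slide can be sketched directly; the classifier C and the thresholds e0, f0 below are toy stand-ins, not from the paper:

```python
import numpy as np

def is_adversarial_pair(C, x1, x2, e0=0.1, f0=0.5):
    """A pair is adversarial if inputs are close but predictions differ."""
    close_inputs = np.linalg.norm(x1 - x2) < e0     # ||x1 - x2||_2 < e0
    far_outputs = abs(C(x1) - C(x2)) > f0           # |C(x1) - C(x2)| > f0
    return close_inputs and far_outputs

# Toy classifier with a sharp decision threshold: tiny input changes
# near the boundary flip the output label.
C = lambda x: float(x.sum() > 1.0)
x1 = np.array([0.499, 0.499])
x2 = np.array([0.502, 0.502])
print(is_adversarial_pair(C, x1, x2))   # True: inputs ~0.004 apart, labels differ
```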
Adversarial Attacks. Why is the classifier fooled by these examples? 7/25
Adversarial Attacks. Why is the classifier fooled by these examples? An intuition from the authors: a natural image lies on a low-dimensional manifold, while a noisy image lies on a high-dimensional manifold, and high dimensionality is tough for the classifier. 8/25
Generative adversarial network (GAN). The generator G() is a deep neural network that maps a low-dimensional noise input z, say z ~ N(0, I), to a high-dimensional synthetic image x' = G(z). After training, the original image x and the synthetic image x' have similar PDFs: G() has learned the underlying distribution of the image dataset. 9/25
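The generator's role can be illustrated by its dimensions alone. Here G is just a fixed random linear map followed by tanh; a real GAN generator is a trained deep network, so this toy stands in only for the low-to-high-dimensional mapping (latent dimension k and image dimension d are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 8, 784                  # latent dim k << image dim d (assumption)
W = rng.standard_normal((d, k))

def G(z):
    # Toy "generator": linear map squashed into [-1, 1] like pixel intensities
    return np.tanh(W @ z)

z = rng.standard_normal(k)     # noisy input z ~ N(0, I)
x_prime = G(z)
print(x_prime.shape)           # (784,): low-dimensional z -> high-dimensional image
```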
Generative adversarial network (GAN). Convergence state: p_data(x) = p_G(x). Green solid line: probability density function (PDF) of the generator G(); black dotted line: PDF of the original images x, i.e., p_data(x); blue dashed line: output of the discriminator D(). 10/25
Generative adversarial network (GAN) 11/25
How to defend against attacks using GAN: Invert and Classify. G() is pre-trained and has learned the underlying distribution of the training (image) dataset. The original image x could leave the low-dimensional manifold when noise enters; the synthetic image x' = G(z*) preserves the low-dimensional manifold and is what the classifier C() sees. 12/25
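A sketch of the Invert and Classify step: before classifying an input x, project it onto the generator's range by solving z* = argmin_z ||G(z) − x||², then classify G(z*) instead of x. The linear toy generator, step size, and iteration count below are my assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 4, 32
W = rng.standard_normal((d, k))

def G(z):                              # toy linear "generator"
    return W @ z

def invert(x, steps=500):
    """Gradient descent on ||G(z) - x||^2 to find z*."""
    lr = 1.0 / (2 * np.linalg.norm(W, 2) ** 2)   # safe step for this quadratic
    z = np.zeros(k)
    for _ in range(steps):
        grad = 2 * W.T @ (G(z) - x)    # gradient of ||G(z) - x||^2
        z -= lr * grad
    return z

x_clean = G(rng.standard_normal(k))                # an image on the manifold
x_noisy = x_clean + 0.1 * rng.standard_normal(d)   # off-manifold perturbation
z_star = invert(x_noisy)
# The projection G(z*) is closer to the clean image than the noisy input is:
print(np.linalg.norm(G(z_star) - x_clean) < np.linalg.norm(x_noisy - x_clean))
```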
How to defend against attacks using GAN: Enhanced Invert and Classify. G() is pre-trained as before, but now the classifier C() is also retrained, minimizing the classification loss under an upper bound on the attack magnitude, again with the synthetic image x' = G(z*) (which preserves the low-dimensional manifold) as input. 13/25
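A hedged reconstruction of the training objective behind Enhanced Invert and Classify, assuming the standard adversarial-training min-max form: ε is the upper bound on the attack magnitude, L the classification loss, θ the classifier parameters, and z*(u) the latent code inverting the generator (the notation here is mine, not necessarily the paper's):

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)}\;
\max_{\|\delta\|_{2}\le \epsilon}\;
\mathcal{L}\!\left(C_{\theta}\!\big(G(z^{*}(x+\delta))\big),\, y\right),
\qquad
z^{*}(u) \;=\; \arg\min_{z}\; \|G(z) - u\|_{2}^{2}
```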
Numerical results: first-order classifier attacks for handwritten digit classification. 14/25
Numerical results: first-order classifier attacks for handwritten digit classification (continued). 15/25
Numerical results: first-order classifier attacks for handwritten digit classification (continued). 16/25
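The first-order attacks on these slides use input gradients. A minimal sketch in the spirit of the fast gradient sign method: perturb x by eps in the sign direction of the loss gradient. The logistic classifier and eps below are toy assumptions, not the experiments' models:

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(10)        # weights of a toy linear (logistic) classifier

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fgsm(x, y, eps=0.1):
    # Gradient of the logistic loss -log p(y|x) with respect to the input x
    p = sigmoid(w @ x)
    grad_x = (p - y) * w
    # Step of size eps in the sign direction, increasing the loss
    return x + eps * np.sign(grad_x)

x = rng.standard_normal(10)
x_adv = fgsm(x, y=1.0)
print(np.max(np.abs(x_adv - x)))   # perturbation bounded by eps = 0.1
```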
Numerical results: first-order classifier attacks for gender classification. 17/25
Numerical results: first-order classifier attacks for gender classification (continued). 18/25
Numerical results: substitute model attacks; results from Invert and Classify. 19/25
Numerical results: comparison between Invert and Classify and Enhanced Invert and Classify. 20/25
Numerical results 21/25
Numerical results 22/25
Numerical results 23/25
Numerical results 24/25
Thinking: GAN for regression problems? GAN versus other neural networks? One defense strategy for all types of attacks? 25/25