ConvNets for Image Classification


1 ConvNets for Image Classification
Notes: Welcome to the ML meetup, and thanks to Citrix for hosting. Part 1 is a lecture-style introduction to convnets (if you already know everything about convnets, it may be boring), using image classification as the example because it makes it intuitive what's going on. Part 2 covers teaching convnets to play Go.
Christfried Focke, Santa Barbara Machine Learning MeetUp, May 2, 2016

2 Convolutional Neural Networks (CNNs)
Popular uses:
Speech recognition
Haptic input recognition
Text classification
Image classification (many applications): identify people or objects, read printed or handwritten text
Playing Atari games
Playing Go (Nature 529, 484–489, January 2016)
Extremely useful for marketing and advertisement
Notes: Relax, speak passionately, control the audience.

3 Image Classification
Images as "volumes": width x height positions, 3 layers deep for RGB.
A dense (fully connected) network has bad scaling behavior:
# weights ~ depth x height x width x # nodes

4 ConvNets
Instead, make use of translational invariance:
Slide a low-dimensional filter across the image; the same filter is used to create the entire feature map.
# weights ~ depth x (filter size) x # filters
Easier to learn, faster to compute, less prone to overfitting.
Notes: Built-in translation invariance: it doesn't matter where the car is; a dense network would have to learn the car in different positions. (Reference to image: the red numbers don't change.)
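To make the scaling difference concrete, a back-of-the-envelope count (image size, node count, and filter count are illustrative assumptions, not numbers from the talk):

```python
# Parameter counts for a hypothetical 224x224 RGB image.
height, width, depth = 224, 224, 3

# Dense layer: every pixel of every channel connects to every hidden node.
hidden_nodes = 1000
dense_weights = depth * height * width * hidden_nodes
print(f"dense: {dense_weights:,} weights")  # 150,528,000

# Conv layer: one small filter shared across all spatial positions.
filter_size, num_filters = 5, 64
conv_weights = depth * filter_size * filter_size * num_filters
print(f"conv:  {conv_weights:,} weights")   # 4,800
```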

5 Architecture
A conv layer is followed by:
A non-linear activation function (e.g. ReLU)
A pooling layer (down-sampling): reduces storage requirements, gives invariance under small perturbations, controls overfitting
Stack many such layers to learn more complex filters.
At the end: a fully connected (dense) network for classification.
Notes: Conv layers are memory intensive, so pooling throws away spatial information; memory is the largest bottleneck on modern GPUs. A softmax function converts raw values into posterior probabilities, which provides a measure of certainty.
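As a concrete illustration, a minimal version of this stack; PyTorch and all layer sizes are my choices, not the talk's:

```python
import torch
import torch.nn as nn

# conv -> ReLU -> pool blocks, then a dense classifier with softmax output.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # conv layer
    nn.ReLU(),                                    # non-linear activation
    nn.MaxPool2d(2),                              # pooling / down-sampling
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # stack another conv block
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),                    # dense classifier (32x32 input)
)
logits = model(torch.randn(1, 3, 32, 32))
probs = torch.softmax(logits, dim=1)  # posterior probability per class
```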

6 What does the ConvNet see?
Interpret the filter parameters as "images":
The 1st layer learns to look for lines, the 2nd layer for textures, ..., the nth layer for dogs, cars, people, ...
Notes: The network learns filters that recognize features such as horizontal lines or dogs; the filters recognize higher-level features the deeper they sit in the network.

7 Convolution
1-D image with 3 pixels: x = (x_0, x_1, x_2); filter: w.
Output (feature map): y_i = sum_j x_j w_{i-j}. The bar (w-bar) means "flip": convolution slides the flipped filter across the image, while correlation slides it unflipped.
More generally, in 2D: y_{ij} = sum_{mn} x_{mn} w_{i-m, j-n}.
Notes: Get to the point. The output is large when w and x have large overlap; convolution expresses the amount of overlap of one function as it is shifted over the other.
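A quick numeric check of the flip, in NumPy (the 3-pixel image and filter values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # 1-D "image" with 3 pixels
w = np.array([1.0, 0.0, -1.0])  # filter

# np.convolve flips the filter before sliding it; np.correlate does not.
conv = np.convolve(x, w, mode="valid")   # sum_j x[j] * w[i - j]
corr = np.correlate(x, w, mode="valid")  # sum_j x[i + j] * w[j]
print(conv, corr)  # [2.] [-2.] -- flipping the filter changes the sign here
```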

8 In 2D
(Figure credit: Leow Wee Kheng.)

9 Conv Layer
Zero-padding to avoid shrinking feature maps.
Input dimensions: W1 x H1 x D1
Choose: # filters K, stride S, filter size F, zero-padding P
Output dimensions: W2 = (W1 - F + 2P)/S + 1, H2 = (H1 - F + 2P)/S + 1, D2 = K
# Parameters = (F * F * D1 + 1) * K (weights plus one bias per filter)
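These formulas are easy to wrap in a helper; a minimal sketch (the example numbers follow the CS231n conventions linked on slide 12):

```python
def conv_output_size(W1, H1, D1, K, F, S, P):
    """Output volume and parameter count for a conv layer."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1 + 1) * K  # weights per filter + one bias each
    return (W2, H2, K), params

# e.g. a 32x32x3 input with 10 filters of size 5, stride 1, padding 2:
print(conv_output_size(32, 32, 3, K=10, F=5, S=1, P=2))
# ((32, 32, 10), 760)
```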

10 Pooling Layer
Input dimensions: W1 x H1 x D1
Choose: stride S, filter size F
Output dimensions: W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1, D2 = D1
# Parameters = 0
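The same sliding-window arithmetic with no padding and no learnable weights; a companion to the conv helper above:

```python
def pool_output_size(W1, H1, D1, F, S):
    """Output volume for a pooling layer; zero parameters to learn."""
    return ((W1 - F) // S + 1, (H1 - F) // S + 1, D1), 0

print(pool_output_size(32, 32, 10, F=2, S=2))  # ((16, 16, 10), 0)
```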

11 Backpropagation
Convolution: the backward pass is also a convolution, but with flipped filters (easy to see in Einstein summation convention).
Max-pooling: only route the gradient to the input that had the highest value in the forward pass (keep track of the argmax indices during the forward pass).
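A minimal NumPy sketch of the max-pooling gradient routing described above (2x2 windows, stride 2; the input size is illustrative):

```python
import numpy as np

# Forward pass over one 4x4 feature map, remembering argmax indices.
x = np.random.randn(4, 4)
out, idx = np.zeros((2, 2)), {}
for i in range(2):
    for j in range(2):
        window = x[2*i:2*i+2, 2*j:2*j+2]
        r, c = np.unravel_index(window.argmax(), window.shape)
        idx[(i, j)] = (2*i + r, 2*j + c)  # winner tracked during forward pass
        out[i, j] = window.max()

# Backward pass: each output gradient flows only to its winning input.
dout = np.ones((2, 2))
dx = np.zeros_like(x)
for (i, j), (r, c) in idx.items():
    dx[r, c] += dout[i, j]
```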

12 Example http://cs231n.github.io/convolutional-networks/
Volume sizes count twice: activations and their gradients (the gradients could be dropped at production/inference time).
Memory requirements from parameters ~ 3 x # parameters: the parameters themselves, their gradients, and a 'step cache' if using momentum.

13 Summary ConvNets are well suited for image classification:
Built-in translation invariance
Parameter sharing (each filter is reused across the whole image)
Pooling layers: robustness against small perturbations, reduced memory requirements, controlled overfitting
Classification happens in dense layers at the end
Many related applications that are intuitive for humans; not well suited for tasks that require reflection

14 Teaching ConvNets to play Go
Notes: Project confidence. Something extraordinary happened; intrigued, I read the papers and want to share what I learned: how AlphaGo defeats humanity.
Christfried Focke, Santa Barbara Machine Learning MeetUp, May 2, 2016

15 What is Go?
19x19 board; capture the opponent's stones by surrounding them.
Special "ko" rules prevent infinite loops.
A player can pass a turn (usually not advantageous); the game ends after both players pass consecutively.
The winner is determined by special counting rules.

16 Why is it hard?
The search space is enormous.
In theory one can compute the "optimal value function" in a search tree with b^d possible sequences, where b = # legal moves and d = length of game.
Chess: b ~ 35, d ~ 80. Go: b ~ 250, d ~ 150, far more than the # atoms in the observable universe (~10^80).
No good heuristics to determine who is winning: in chess you can examine what pieces are left plus some heuristics; in Go the number of stones for each player is a very poor indicator.
Notes: Deep Blue would only search to d = 6 (~10^9 sequences) plus an evaluation function; the evaluation function replaces the subtree below. Pick the move that leads to the 'least bad' worst-case outcome.
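As a sanity check on those magnitudes (b ~ 35, d ~ 80 for chess and b ~ 250, d ~ 150 for Go are the standard figures from the AlphaGo paper):

```python
from math import log10

# Rough search-space sizes: b ** d, expressed as powers of ten.
chess = 80 * log10(35)    # ~124 digits
go = 150 * log10(250)     # ~360 digits
print(f"chess ~ 10^{chess:.0f}, go ~ 10^{go:.0f}")
# atoms in the observable universe ~ 10^80; Go's game tree dwarfs it
```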

17 Previous Efforts
Hand-crafted rules; symmetries to reduce the search space.
Monte Carlo Tree Search (MCTS): simulate playouts using a random or cheap best-move heuristic; positions are judged good when the playout wins the majority of games; the policy used to select moves improves over time and converges to optimal play; no domain knowledge required.
Predict human expert moves: feature construction or shallow NNs.
Deep Convolutional Neural Networks (DCNNs).

18 AlphaGo
Two components: MCTS (brute force) and DCNNs ("intuition" for the MCTS).
4 separate DCNNs: 3 policy networks, 1 value network.
p_sigma: supervised learning (SL) policy network trained on expert moves (fast learning updates, high-quality gradients)
p_pi: fast SL policy network to sample actions during rollouts
p_rho: reinforcement learning (RL) policy network, improves the SL network by playing against itself
v_theta: RL value network to predict the winner
The evaluation function is learned, not designed.
Nature 529, 484–489, January 2016.
Notes: The January paper predates the March match. Symmetries are dynamically sampled rather than hardcoded into the weights. The target function is non-smooth: minor changes can dramatically alter which move is played next.

19 Policy Networks
p_sigma(a|s): sigma = weights, a = legal move, s = current state.
Randomly sample state-action pairs (s, a); use stochastic gradient ascent to maximize the likelihood of the expert move.
13-layer network trained on 30 million positions (KGS Go server).
Tradeoff between network size and evaluation time:
Full network p_sigma: 57% accuracy
Additional rollout policy p_pi: 24% accuracy, but far cheaper to evaluate
Reinforcement learning: play the current network against a randomly chosen previous iteration.
Notes: 57% accuracy suggests human Go play is intuitive rather than reflective. RL should not predict human moves but the best moves.
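A sketch of one such SL update in PyTorch; the toy linear network and random batch below are hypothetical stand-ins for the real 13-layer network and KGS data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the real network and sampled (s, a) pairs.
policy_net = nn.Linear(19 * 19, 361)         # toy: flat board -> move logits
states = torch.randn(32, 19 * 19)            # batch of 32 board states
expert_moves = torch.randint(0, 361, (32,))  # expert move indices

# Cross-entropy = -log p(a|s), so minimizing it with SGD is stochastic
# gradient ascent on the log-likelihood of the expert move.
optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.003)
loss = F.cross_entropy(policy_net(states), expert_moves)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```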

20 Policy Networks
The fast feed-forward network is much stronger than tree search: intuition > reflection.
RL wins 80% against SL.
RL wins 85% against Pachi (a sophisticated MCTS program).

21 Combining Value and Policy Networks
Train the value network on state-outcome pairs; it outputs a single prediction instead of a probability distribution.
Value of a state = mix of the value network output and the simulation result.
Simulations come from the fast rollout policy; prior probabilities come from the slow policy network.
Encourage exploration by penalizing multiple visits of the same edge.
The evaluation mechanisms are complementary: the value network approximates the outcome of games played by the strong but impractically slow p_rho, while rollouts can precisely score and evaluate the outcome of games played by the weaker but faster rollout policy p_pi.
Notes: Intuition + reflection are complementary.

22 Combining Value and Policy Networks
Select the action a_t = argmax_a [Q(s_t, a) + u(s_t, a)], where Q is the action value and the bonus u(s, a) ~ P(s, a) / (1 + N(s, a)) combines the prior probability P with the visit count N.
Prefer high priors and low visit counts; asymptotically, high action values dominate.
Leaf evaluation mixes the value network output with the rollout outcome.
The SL policy network performed better than the stronger RL policy network for priors: humans select a diverse set of promising moves, whereas RL optimizes for the single best move.
Notes: Intuition + reflection are complementary.
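A small sketch of the selection rule and leaf mix under my reading of the paper (the c_puct constant and the toy numbers are assumptions):

```python
import numpy as np

def select_action(Q, N, P, c_puct=5.0):
    """argmax_a Q(s,a) + u(s,a): high prior and low visit count give a
    large bonus; as N grows, the action value Q dominates."""
    u = c_puct * P * np.sqrt(N.sum()) / (1.0 + N)
    return int(np.argmax(Q + u))

def leaf_value(v_theta, z_rollout, lam=0.5):
    """Mix value-network output with the rollout result."""
    return (1 - lam) * v_theta + lam * z_rollout

# Toy example: the unvisited high-prior action wins the bonus.
Q = np.zeros(3)
N = np.array([0.0, 5.0, 1.0])
P = np.array([0.5, 0.3, 0.2])
print(select_action(Q, N, P))  # 0
```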

23 SL > RL: humans select a diverse set of promising moves, RL the single best move
a) Selection: traverse from the root to a leaf s_L according to a_t (prefer high priors, low visit counts, asymptotically high action values).
b) Expansion: a successor state is added to the search tree once its visit count exceeds n_thresh.
c) Evaluation: add s_L to the queue for the value network; roll out s_t for t >= L using p_pi.
d) Backup: update the action values from the value network (and rollout) results.
Select the final move with the maximum visit count (less sensitive to outliers).
Resign when max_a Q(s, a) < -0.8.

24 Network Architectures
Policy network: non-linearity ReLU, stride 1.
Input: 19x19 board with 48 feature planes (per the paper).
1st hidden layer: kernel size 5x5; hidden layers 2-12: kernel size 3x3.
Output layer: softmax over board points.
Value network: similar to the policy network, but the output layer is a fully connected linear layer with a single tanh unit.
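A sketch of that policy-network shape in PyTorch, based on my reading of the Nature paper's description (the k = 192 filter count is the paper's match setting; the padding choices are my assumptions):

```python
import torch.nn as nn

k = 192  # filters per layer (192 in the match version, per the paper)
layers = [nn.Conv2d(48, k, kernel_size=5, padding=2), nn.ReLU()]  # layer 1
for _ in range(11):  # hidden layers 2-12
    layers += [nn.Conv2d(k, k, kernel_size=3, padding=1), nn.ReLU()]
layers += [nn.Conv2d(k, 1, kernel_size=1), nn.Flatten()]  # 1 logit per point
policy_net = nn.Sequential(*layers)  # apply softmax over the 361 outputs at use time
```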

25 Performance

26 Summary
Go was seen as the most challenging classic game for AI:
Enormous search space; difficulty evaluating board positions and moves.
AlphaGo uses a combination of intuition and reflection:
Value networks to evaluate board positions
Policy networks to select moves
SL from human experts and RL from self-play
MCTS for lookahead search
99.8% winning rate against other Go programs; won 4:1 against Go professional Lee Sedol in March 2016.

27 Features


