1 Today's Topics

11/10/15, CS 540 - Fall 2015 (Shavlik©), Lecture 21, Week 10

More on DEEP ANNs
– Convolution
– Max Pooling
– Drop Out
Final ANN Wrapup

FYI: Some Resources
http://deeplearning.net/
http://googleresearch.blogspot.com/2015/11/tensorflow-googles-latest-machine_9.html
https://research.facebook.com/blog/879898285375829/fair-open-sources-deep-learning-modules-for-torch/

2 Back to Deep ANNs - Convolution & Max Pooling

[Figure: a deep network whose hidden layers alternate convolution and max-pooling steps.]
C = Convolution, MP = Max Pooling (covered next); ie, a CHANGE OF REPRESENTATION

3 Look for 8's in all the Right Places

Imagine we have a great '8' detector expressed as an 8x8 array of 0-1's (see upper left of the slide). We want to find all the 8's in a 1024x1024 image of 0-1's.

Q: What might we do?
A: 'Slide' the detector across the image and count the # of matching bits between the detector and the 'overlaid' image patch. If the count is greater than some threshold, say an '8' is there.
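A minimal sketch of this 'slide the detector and count matching bits' idea, assuming NumPy; the names (find_matches, detector, threshold) are illustrative, not from the lecture.

import numpy as np

def find_matches(image, detector, threshold):
    """Slide a binary detector over a binary image; return the top-left
    corners of every window where enough bits match."""
    H, W = image.shape
    h, w = detector.shape
    hits = []
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = image[r:r + h, c:c + w]
            score = np.sum(patch == detector)   # number of matching 0-1 bits
            if score > threshold:
                hits.append((r, c))
    return hits

# Toy usage: a random 8x8 'detector' slid over a random 16x16 image,
# requiring at least 56 of the 64 bits to match.
rng = np.random.default_rng(0)
detector = rng.integers(0, 2, size=(8, 8))
image = rng.integers(0, 2, size=(16, 16))
print(find_matches(image, detector, threshold=55))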

4 Look for 8's in all the Right Places (cont.)

Q: What about 8's in the image larger than 8x8 bits?
A: Use 'detectors' of, say, 16x16, 32x32, 64x64, etc.

PS: Could also 'slide' slightly rotated 8's of various sizes (too much rotation and it becomes the infinity symbol!)
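One possible way to handle the larger sizes, reusing the hypothetical find_matches sketch above: upscale the 8x8 detector to 16x16, 32x32, and 64x64 and slide each one. Again an illustrative sketch, not the course's code.

import numpy as np

def enlarge(detector, factor):
    """Nearest-neighbour upscaling of a 0-1 detector by an integer factor."""
    return np.kron(detector, np.ones((factor, factor), dtype=detector.dtype))

def find_matches_multiscale(image, detector, match_fraction=0.9):
    hits_by_size = {}
    for factor in (1, 2, 4, 8):                 # 8x8, 16x16, 32x32, 64x64 detectors
        big = enlarge(detector, factor)
        threshold = match_fraction * big.size   # scale the threshold with detector area
        hits_by_size[big.shape] = find_matches(image, big, threshold)
    return hits_by_size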

5 Back to Deep ANNs - Convolution (cont.)

The 'sliding window' is the basic idea of convolution,
– but each 'template' is a HU and the wgts are learned
– some HUs are coupled
– each group of HUs learns what to 'look for'
– we do hard-code the 'size' of the 'template'

[Figure: input units rep'ing the image, with HU1 and HU2 each connected to its own window of inputs.]

Our code would employ weight sharing, ie the corresponding weights in each HU (eg, the two thicker lines in the figure) would always have the same value.

Note: HU77, say, would connect to the same INPUTS as HU1 but would have different wgts, ie it would be a different 'feature detector'.
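A minimal sketch of convolution with weight sharing, assuming NumPy: one 8x8 weight matrix (the shared 'template') is applied at every window position, so every hidden unit in the resulting feature map uses the same weights, while a second weight matrix over the same inputs plays the role of the HU77-style units, a different 'feature detector'. Names and sizes are illustrative.

import numpy as np

def conv2d(image, weights, bias=0.0):
    """'Valid' 2D convolution (cross-correlation, as in most deep-learning code):
    the same weights are reused at every window position."""
    H, W = image.shape
    h, w = weights.shape
    out = np.empty((H - h + 1, W - w + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + h, c:c + w] * weights) + bias
    return out

rng = np.random.default_rng(1)
image = rng.random((32, 32))
filter_a = rng.standard_normal((8, 8))   # one shared set of learned weights
filter_b = rng.standard_normal((8, 8))   # same inputs, different weights: another feature detector
print(conv2d(image, filter_a).shape, conv2d(image, filter_b).shape)   # (25, 25) (25, 25)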

6 BACKUP: A Possibly Helpful Slide on Convolution from the Web

7 Back to Deep ANNs - Max Pooling

Researchers have empirically found it helpful to 'clean up' the convolution scores by
– creating the next layer of HUs, where each HU holds the MAX score in an N x N window, for various values of N and across various locations
– this is called MAX POOLING (example on the next slide; a code sketch follows)
– Advanced note (not on final): I'm not sure if people (a) use the differentiable 'soft max' (https://en.wikipedia.org/wiki/Softmax_function) and BP through all nodes or (b) only BP through the max node; I'd guess (b)
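A minimal sketch of max pooling over one feature map, assuming NumPy; window size n and stride are parameters (stride = n gives non-overlapping pooling, stride = 1 overlapping). Names are illustrative.

import numpy as np

def max_pool(feature_map, n, stride):
    """Each output unit holds the MAX of an n x n window of the input."""
    H, W = feature_map.shape
    rows = (H - n) // stride + 1
    cols = (W - n) // stride + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = feature_map[r * stride : r * stride + n,
                                 c * stride : c * stride + n]
            out[r, c] = window.max()   # keep only the largest convolution score
    return out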

8 Back to Deep ANNs - Max Pooling Example (connections not shown)

Hidden Layer i (a 4x4 array of values; one entry is unreadable in this transcript and shown as ?):

  -4   5   4   6
   0  -3   2   ?
   7   8  -5   9
   3   0  -4   1

Possible Nodes in Hidden Layer i + 1:

4x4 max (a single value):
   9

2x2 max, non-overlapping (a 2x2 array):
   5   6
   8   9

2x2 max, overlapping (a 3x3 array; it contains the non-overlapping result, so no need for both):
   5   5   6
   8   8   9
   8   8   9
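The slide's numbers can be checked with the max_pool sketch above; the unreadable entry in hidden layer i is assumed to be 1 here (any value <= 6 yields the same pooled outputs).

import numpy as np

layer_i = np.array([[-4,  5,  4,  6],
                    [ 0, -3,  2,  1],   # the 1 is an assumed fill-in for the unreadable entry
                    [ 7,  8, -5,  9],
                    [ 3,  0, -4,  1]])

print(max_pool(layer_i, n=4, stride=4))   # [[9.]]
print(max_pool(layer_i, n=2, stride=2))   # [[5. 6.] [8. 9.]]
print(max_pool(layer_i, n=2, stride=1))   # [[5. 5. 6.] [8. 8. 9.] [8. 8. 9.]]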

9 Back to Deep ANNs - Drop Out (from Hinton's Group)

Each time one example is processed (forward + back prop) during TRAINING, randomly turn off ('drop out') a fraction (say, p = 1/2) of the input and hidden units.

During TESTING, scale all weights by (1 - p), since that is the fraction of the time each unit was present during training (ie, so on average, the weighted sums are the same).

Adds ROBUSTNESS – the network needs to learn multiple ways to compute the function being learned.
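A minimal sketch of dropout applied to one layer's activations, assuming NumPy; p is the dropped fraction, and at test time the outputs are scaled by (1 - p). Here the activations are scaled rather than the weights, which has the same effect on the downstream weighted sums. Names are illustrative.

import numpy as np

rng = np.random.default_rng(2)

def dropout_forward(activations, p, training):
    if training:
        keep_mask = rng.random(activations.shape) >= p   # randomly turn off a fraction p of the units
        return activations * keep_mask
    return activations * (1.0 - p)                       # testing: scale so weighted sums match on average

h = rng.random(10)                                  # some hidden-unit activations
print(dropout_forward(h, p=0.5, training=True))     # roughly half of the units zeroed out
print(dropout_forward(h, p=0.5, training=False))    # all units present, scaled by 0.5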

10 Back to Deep ANNs - Drop Out as an Ensemble

Drop Out can be viewed as training an ensemble of 'thinned' ANNs
– ie, consider all possible ANNs that one can construct by 'thinning' the non-output nodes in the original ANN
– in each Drop Out step we are training ONE of these (but note that ALL are trained, since the wgts are shared)

[Figure: the full network 'becomes' a thinned network with some non-output nodes dropped.]

We implicitly store O(2^N) networks in 1, where N = # of non-output nodes.

11 Warning: At the Research Frontier

Research on Deep ANNs is changing rapidly, and a lot of IT-industry money is dedicated to it.

Until recently, people used unsupervised ML to train all the HU layers except the final one (surprisingly, BP works through many levels when there is much data!).

So this 'slide deck' is likely to be out of date soon, if not already.

12 Neural Network Wrapup

ANNs compute weighted sums to make decisions.
Use (stochastic) gradient descent to adjust the weights in order to reduce error (or cost).
Only finds local minima, though (but good enough!).
Impressive testset accuracy, especially Deep ANNs on (mainly) vision tasks and natural-language tasks.
Slow training (GPUs, parallelism, advanced optimization methods, etc help).
Learned models are hard to interpret.
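As one concrete illustration of 'weighted sums plus (stochastic) gradient descent', here is a sketch of a single SGD step for one sigmoid unit with squared error; the unit, data, and learning rate are made up for the example.

import numpy as np

def sgd_step(w, x, target, lr=0.1):
    out = 1.0 / (1.0 + np.exp(-np.dot(w, x)))        # weighted sum passed through a sigmoid
    grad = (out - target) * out * (1.0 - out) * x    # gradient of 1/2 * (out - target)^2 w.r.t. w
    return w - lr * grad                             # move the weights downhill on the error

w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])   # one training example (x[0] = 1 acts as the bias input)
for _ in range(100):
    w = sgd_step(w, x, target=1.0)
print(w)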

