
1 Welcome deep loria !

2 Deep Loria
Mailing list: deeploria@inria.fr
Web site:
Git repository:

3 deeploria: involvement
I'm no DL expert!!! (at most a trigger)
Deeploria will be what you make of it:
Need volunteers!!
Propose anything
Organize, participate, animate...
Next meeting (please think about it):
Coffee & discussion session?
→ paper reading group: who's willing to take care of it?
Demo for Yann LeCun's visit?
…?

4 Outline
Motivation
Lightning-speed overview of DNN basics:
Neuron vs. random variable; activations
Layers: dense, RNN
Vanishing gradient
More layers: LSTM, RBM/DBN, CNN, autoencoder
Implementation with Keras/Theano

5 Why all this buzz about DNNs?
Because of expressive power.
cf. “On the Expressive Power of Deep Learning: A Tensor Analysis” by Nadav Cohen, Or Sharir, Amnon Shashua:
“[...] besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network”

6 Basic neuron

7 Activations
sigmoid = logistic
relu = rectified linear
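Not from the slide, just a minimal numpy sketch to fix the two formulas:

import numpy as np

def sigmoid(x):
    # logistic function: squashes any real x into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: 0 for negative inputs, identity otherwise
    return np.maximum(0.0, x)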

8 Dense layer
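A sketch only (the names are mine, not the slide's): a dense layer is an affine map followed by an activation, y = f(Wx + b).

import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    # fully-connected layer: every output depends on every input
    return activation(W.dot(x) + b)

# toy usage: 4 inputs mapped to 3 outputs with random weights
W = np.random.randn(3, 4)
b = np.zeros(3)
y = dense_layer(np.ones(4), W, b)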

9 Alternative “neuron”
Graphical model: node = random variable
Connection = “dependency” between variables
Restricted Boltzmann Machine (RBM)

10 Training
Dense: minimize the error → Stochastic Gradient Descent (SGD) = gradient descent (back-propagation)
RBM: minimize the energy → Contrastive Divergence = gradient descent (with Gibbs sampling)
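A hedged sketch of one SGD update for the dense case (a single linear neuron with squared error; the names are hypothetical, not from the slide):

import numpy as np

def sgd_step(w, x, target, lr=0.01):
    # one stochastic gradient descent update for a linear neuron
    # with squared error E = 0.5 * (w.x - target)^2
    pred = w.dot(x)
    grad = (pred - target) * x   # dE/dw by the chain rule (back-propagation in miniature)
    return w - lr * grad         # step against the gradient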

11 DNN vs. DBN
N x Dense → DNN (Deep Neural Network)
N x RBM → DBN (Deep Belief Network)
Dense layers are discriminative = model the “boundary” between classes
RBMs are generative = model each class
Performance: RBM better (?)
Efficiency: RBM much more difficult to train
Usage: 90% for Dense

12 Recurrent neural network
Take the past into account to predict the next step
Just like HMMs, CRFs...
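A minimal sketch of the recurrence (simple Elman-style RNN, not from the slide): the hidden state is what carries the past into the next prediction.

import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    # h_t = tanh(W_x . x_t + W_h . h_{t-1} + b): the state h accumulates the past
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x.dot(x) + W_h.dot(h) + b)
        states.append(h)
    return states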

13 Issue 1: Vanishing gradient
Back-propagation of the error E = chain rule: N layers → N factors of activation gradients
The gradient decreases exponentially with N
Consequence: the layers furthest from the output (the “deepest” ones) are never learnt
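Written out with my own notation (a_k for the activations of layer k), the chain-rule product behind this is:

\frac{\partial E}{\partial W_1}
  = \frac{\partial E}{\partial a_N}
    \left( \prod_{k=2}^{N} \frac{\partial a_k}{\partial a_{k-1}} \right)
    \frac{\partial a_1}{\partial W_1},
\qquad
\left\| \frac{\partial a_k}{\partial a_{k-1}} \right\| < 1
\ \Rightarrow\ \text{the product shrinks exponentially with } N.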

14 Vanishing gradient
Solutions:
More data!
Rectified linear (gradient = 1)
Unsupervised pre-training: DBN
Autoencoders
LSTMs instead of RNNs

15 Autoencoders
Re-create the inputs = model of the data
With dimensionality reduction = compression
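A hedged Keras sketch (the dimensions are made up, and a reasonably recent Keras is assumed): the network is trained to reproduce its own input through a narrow middle layer.

from keras.models import Sequential
from keras.layers import Dense

# toy autoencoder: 784-dim inputs squeezed to 32 dims, then reconstructed
autoencoder = Sequential()
autoencoder.add(Dense(32, activation='relu', input_dim=784))   # encoder (compression)
autoencoder.add(Dense(784, activation='sigmoid'))              # decoder (reconstruction)
autoencoder.compile(loss='mse', optimizer='adam')
# trained with the inputs as their own targets: autoencoder.fit(X, X, ...)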

16 LSTM

17 Vanishing gradient

18 Issue 2: overfitting

19 Overfitting: solutions
Share weights: e.g. convolutional layers
Regularization: e.g. Dropout, DropConnect...

20 Time to code, isn't it?

21 Keras example: Reuters
Trains an MLP to classify texts into 46 topics
In the root dir of Keras, run: python examples/reuters_mlp.py

22 Keras example
[Architecture diagram: input of dimension max_words → hidden layer of 512 units → output of 46 topics]
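Reconstructed from the numbers on the slide, roughly what examples/reuters_mlp.py builds (the value of max_words is an assumption):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

max_words = 1000   # size of the bag-of-words vocabulary (assumed value)
nb_topics = 46     # Reuters newswire topics

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))           # regularization against overfitting
model.add(Dense(nb_topics))
model.add(Activation('softmax'))  # outputs normalized as topic probabilities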

23 Tricks for the model
Score = categorical cross-entropy = a kind of smooth, continuous classification error
Softmax = normalizes the outputs as probabilities
Adam = adaptive gradient method (adaptive per-parameter learning rates)
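All three choices show up in a single compile call; a sketch, given the model built on the previous slide (the metrics argument may differ slightly across Keras versions):

model.compile(loss='categorical_crossentropy',  # smooth surrogate of the 0/1 classification error
              optimizer='adam',                 # adaptive per-parameter learning rates
              metrics=['accuracy'])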

24 Tricks for the data
X_train = int[# sentences][# words] = word indexes
The Tokenizer converts each list of word indexes into a matrix: # sentences x bag-of-words vectors (dim = # words)
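A sketch of that conversion with the Keras Tokenizer (the constructor argument is num_words in recent Keras versions, nb_words in older ones):

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=max_words)  # keep only the max_words most frequent words
# X_train: list of sentences, each a list of word indexes
# -> matrix of shape (# sentences, max_words), one bag-of-words row per sentence
X_train = tokenizer.sequences_to_matrix(X_train, mode='binary')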

25 Plot accuracy as a function of epochs
sudo apt-get install python-matplotlib

import matplotlib.pyplot as plt
[…]
plt.plot(history.history['acc'])
plt.show()

26 Plot matrix of weights
plt.matshow(model.get_weights()[0], cmap=plt.cm.gray)
plt.show()
Or save it instead: plt.savefig("fig.png")

27 Rules of thumb
Check overfitting: plot training acc vs. test acc
Check vanishing gradient: plot weights or gradients
Normalize your inputs & outputs
Try to automatically augment your training set: add noise, rotate/translate images...
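A minimal sketch of the last two points (pure numpy; the helper names are mine):

import numpy as np

def normalize(X):
    # zero mean, unit variance per feature
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def augment_with_noise(X, Y, sigma=0.01, copies=2):
    # crude augmentation: jitter each sample with Gaussian noise, keep the same labels
    noisy = [X + sigma * np.random.randn(*X.shape) for _ in range(copies)]
    return np.vstack([X] + noisy), np.concatenate([Y] * (copies + 1))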

28

