
1 Deep Learning, Bing-Chen Tsai, 1/21

2 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

3 Neural networks
- Supervised learning: the training data consist of inputs together with their corresponding outputs (labels).
- Unsupervised learning: the training data consist of inputs only, with no corresponding outputs.

4 Neural networks
- Generative model: models the joint distribution of the input and the output, P(x, y).
- Discriminative model: models the posterior probabilities, P(y | x).
(Figure: joint densities P(x, y1), P(x, y2) versus posteriors P(y1 | x), P(y2 | x).)
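Not on the slide, but a useful reminder of how the two quantities are related: the posterior a discriminative model outputs can be recovered from the joint distribution a generative model learns.

```latex
P(y \mid x) \;=\; \frac{P(x, y)}{P(x)} \;=\; \frac{P(x, y)}{\sum_{y'} P(x, y')}
```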

5 Neural networks
- What is a neuron? Common unit types:
  - Linear neurons
  - Binary threshold neurons
  - Sigmoid neurons
  - Stochastic binary neurons
(Figure: a unit with inputs x1, x2, weights w1, w2 and bias b; a binary threshold neuron outputs y = 1 if w1*x1 + w2*x2 + b >= 0, and 0 otherwise. A code sketch of the four unit types follows below.)
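A minimal sketch of the four unit types listed on the slide, written with NumPy; the function names and the explicit bias term b are my own illustrative choices, not from the deck.

```python
import numpy as np

def linear_neuron(x, w, b):
    # Output is simply the weighted sum of the inputs plus a bias.
    return np.dot(w, x) + b

def binary_threshold_neuron(x, w, b):
    # Output 1 if the total input reaches the threshold, otherwise 0.
    return 1.0 if np.dot(w, x) + b >= 0 else 0.0

def sigmoid_neuron(x, w, b):
    # Smooth, real-valued output between 0 and 1.
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def stochastic_binary_neuron(x, w, b, rng=np.random.default_rng()):
    # Treat the sigmoid output as the probability of emitting a 1.
    p = sigmoid_neuron(x, w, b)
    return 1.0 if rng.random() < p else 0.0
```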

6 Neural networks
- Two-layer neural network (sigmoid neurons), trained with back-propagation:
  - Step 1: Randomly initialize the weights, then determine the output vector (forward pass).
  - Step 2: Evaluate the gradient of an error function.
  - Step 3: Adjust the weights.
  - Repeat steps 1-3 until the error is low enough. (A code sketch follows below.)
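A compact sketch of the three steps for a two-layer sigmoid network, assuming a squared-error loss and plain gradient descent (the deck does not specify either); hyperparameters such as n_hidden and lr are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_two_layer_net(X, Y, n_hidden=8, lr=0.5, epochs=5000, seed=0):
    """Two-layer sigmoid network trained with plain back-propagation."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize the weights.
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        # Forward pass: determine the output vector.
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # Step 2: gradient of the squared-error function E = 0.5 * sum((O - Y)^2).
        dO = (O - Y) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        # Step 3: adjust the weights; repeat until the error is low enough.
        W2 -= lr * H.T @ dO; b2 -= lr * dO.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

# Example: learn XOR, which requires the hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
params = train_two_layer_net(X, Y)
```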

7 Neural networks
- Back-propagation is not good for deep learning:
  - It requires labeled training data, but almost all data is unlabeled.
  - The learning time is very slow in networks with multiple hidden layers.
  - It can get stuck in poor local optima; for deep nets these are far from optimal.
- Instead, learn P(input), not P(output | input). What kind of generative model should we learn?

8 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

9 Graphical model
- A graphical model is a probabilistic model in which a graph denotes the conditional dependence structure between random variables.
- In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.

10 Graphical model
- Directed graphical model
- Undirected graphical model
(Figure: the same four variables A, B, C, D drawn once with directed edges and once with undirected edges.)

11 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

12 Belief nets
- A belief net is a directed acyclic graph composed of stochastic variables: stochastic hidden causes at the top, visible units at the bottom.
- When the units are stochastic binary neurons, it is a sigmoid belief net.

13 Belief nets
- We would like to solve two problems:
  - The inference problem: infer the states of the unobserved (hidden) variables.
  - The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.

14 Belief nets
- It is easy to generate a sample from P(v | h): sample the stochastic hidden causes top-down, then the visibles.
- It is hard to infer P(h | v), because of explaining away. (A sampling sketch follows below.)
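A minimal sketch of why generation is easy in a sigmoid belief net: ancestral (top-down) sampling only ever needs P(v | h), which factorizes over the visible units. The single-hidden-layer shape and the variable names are illustrative assumptions, not the deck's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_sigmoid_belief_net(W, b_h, b_v, rng=np.random.default_rng()):
    """Ancestral (top-down) sampling: hidden causes first, then visibles.

    W: (n_hidden, n_visible) directed weights from hidden causes to visibles.
    """
    # Sample each stochastic binary hidden cause from its prior.
    h = (rng.random(b_h.shape) < sigmoid(b_h)).astype(float)
    # Given the causes, the visible units are independent, so P(v | h) is easy.
    p_v = sigmoid(h @ W + b_v)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v
```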

15 Belief nets
- Explaining away: two independent hidden causes become dependent once their common effect is observed.
(Figure: hidden causes H1 and H2 with a common observed effect V.)

16 Belief nets
- Some methods for learning deep belief nets:
  - Monte Carlo methods, but they are painfully slow for large, deep belief nets.
  - Learning with samples from the wrong distribution.
  - Use Restricted Boltzmann Machines.

17 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

18 Boltzmann Machine
- It is an undirected graphical model with visible units i and hidden units j.
- The energy of a joint configuration is defined by the unit biases and the pairwise weights. (The standard form is written out below.)
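A standard way to write this energy for a general Boltzmann machine, assuming the usual form from Hinton's course (which this deck follows): bias terms for visible and hidden units plus all pairwise weights.

```latex
E(\mathbf{v}, \mathbf{h}) =
  -\sum_{i \in \mathrm{vis}} v_i b_i
  -\sum_{k \in \mathrm{hid}} h_k b_k
  -\sum_{i < j} v_i v_j w_{ij}
  -\sum_{i, k} v_i h_k w_{ik}
  -\sum_{k < l} h_k h_l w_{kl}
```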

19 Boltzmann Machine
- An example of how the weights define a distribution: every joint configuration of the units gets a probability proportional to e^(-E).
(Figure: a small network with hidden units h1, h2 and visible units v1, v2, with the energy and probability of each joint configuration.)
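Written out, with Z the partition function summing over all joint configurations (standard notation, not taken from the slide):

```latex
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v}', \mathbf{h}'} e^{-E(\mathbf{v}', \mathbf{h}')},
\qquad
p(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{v}, \mathbf{h})
```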

20 Boltzmann Machine
- A very surprising fact: the derivative of the log probability of one training vector v under the model is the difference between
  - the expected value of the product of states at thermal equilibrium when v is clamped on the visible units, and
  - the expected value of the product of states at thermal equilibrium with no clamping.
(The gradient is written out below.)
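In the usual notation (s_i and s_j are the states of the two units joined by weight w_ij), the fact on the slide reads:

```latex
\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}}
  = \langle s_i s_j \rangle_{\mathbf{v}} - \langle s_i s_j \rangle_{\mathrm{model}}
```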

21 Boltzmann Machines
- Restricted Boltzmann Machine: we restrict the connectivity to make learning easier.
  - Only one layer of hidden units (we will deal with more layers later).
  - No connections between hidden units, which makes the updates more parallel.
(Figure: a layer of hidden units connected only to the visible units.)

22 Boltzmann Machines
- The Boltzmann machine learning algorithm for an RBM: starting from a data vector at t = 0, alternate between updating all the hidden units j and all the visible units i (t = 1, t = 2, ...) until the chain reaches thermal equilibrium at t = infinity; the t = 0 and t = infinity statistics give the two terms of the gradient.

23 Boltzmann Machines
- Contrastive divergence: a very surprising short-cut.
  - Put the data on the visible units at t = 0, update the hidden units, produce a reconstruction on the visible units at t = 1, and update the hidden units once more; the weight update uses the data statistics minus the reconstruction statistics.
  - This is not following the gradient of the log likelihood, but it works well. (A CD-1 sketch follows below.)
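A minimal CD-1 update for a binary RBM, following the t = 0 / t = 1 picture on the slide; the mini-batch interface, the learning rate, and the use of hidden probabilities rather than sampled states in the statistics are my own illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(V0, W, b_v, b_h, lr=0.1, rng=np.random.default_rng()):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    V0: (batch, n_visible) binary data vectors.
    W:  (n_visible, n_hidden) weights; b_v, b_h: visible and hidden biases.
    """
    # t = 0: drive the hidden units from the data.
    p_h0 = sigmoid(V0 @ W + b_h)
    H0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # t = 1: reconstruct the visible units, then re-infer the hidden units.
    p_v1 = sigmoid(H0 @ W.T + b_v)
    V1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(V1 @ W + b_h)

    # Update: <v_i h_j>_data minus <v_i h_j>_reconstruction.
    n = V0.shape[0]
    W += lr * (V0.T @ p_h0 - V1.T @ p_h1) / n
    b_v += lr * (V0 - V1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h
```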

24 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

25 DBN
- In a deep belief net it is easy to generate a sample from P(v | h), but hard to infer P(h | v) because of explaining away.
- Using an RBM to initialize the weights lets learning reach a good optimum.

26 DBN
- Combining two RBMs to make a DBN:
  - Train the first RBM on the data, then copy the binary state for each v up to its hidden layer.
  - Train the second RBM on those hidden states.
  - Compose the two RBM models to make a single DBN model: it's a deep belief net! (A stacking sketch follows below.)
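A greedy layer-wise stacking sketch, reusing the cd1_update function from the contrastive-divergence sketch above; the helper names, epoch count, and the use of hidden probabilities (rather than sampled binary states) as the next layer's input are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    # Assumes the cd1_update sketch from the contrastive-divergence slide above.
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (data.shape[1], n_hidden))
    b_v, b_h = np.zeros(data.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        W, b_v, b_h = cd1_update(data, W, b_v, b_h, lr, rng)
    return W, b_v, b_h

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: train the first RBM on the data, then
    train each following RBM on the hidden activities of the one below."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_v, b_h = train_rbm(layer_input, n_hidden)
        rbms.append((W, b_v, b_h))
        # Copy the (expected) binary state for each v as input to the next RBM.
        layer_input = sigmoid(layer_input @ W + b_h)
    return rbms
```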

27 DBN
- Why can we use an RBM to initialize the belief net's weights?
  - An infinite sigmoid belief net with replicated (tied) weights is equivalent to an RBM.
  - Inference in this directed net is trivial: we just multiply v0 by W transposed.
  - The model above h0 implements a complementary prior, so multiplying v0 by W transposed gives the product of the likelihood term and the prior term.
(Figure: an infinite stack of layers v0, h0, v1, h1, v2, h2, etc. with replicated weights.)
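Concretely, because the complementary prior makes the posterior factorial, each unit in h0 can be inferred independently by applying the transposed weights and a logistic; this is the standard result, with the bias term b_j left implicit on the slide.

```latex
p(h^0_j = 1 \mid \mathbf{v}^0)
  = \sigma\Big(b_j + \sum_i v^0_i \, w_{ij}\Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```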

28 DBN (figure only: variables X1, X2, X3, X4)

29 DBN (figure only: two copies of variables X1, X2, X3, X4)

30 DBN (figure only)

31 DBN
- Combining two RBMs to make a DBN (same construction as before): train the first RBM on the data, copy the binary state for each v, train the second RBM on those states, then compose the two RBM models to make a single DBN model. It's a deep belief net!

32 Reference
- Deep Belief Nets, 2007 NIPS tutorial, G. Hinton
- https://class.coursera.org/neuralnets/class/index
- Machine Learning course lecture notes

