Deep Learning Bing-Chen Tsai 1/21.


2 Outline: Neural networks, Graphical model, Belief nets, Boltzmann machine, DBN, Reference

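3 Neural networks: supervised vs. unsupervised learning
Supervised learning: the training data consists of input information together with the corresponding output information. Unsupervised learning: the training data consists of input information without the corresponding output information. Example: given a large set of two-dimensional points, in supervised learning the points are distinguished by color (their class); in unsupervised learning they carry no color distinction.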

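4 Neural networks: generative vs. discriminative models
Generative model: models the distribution of the input as well as the output, $P(x, y)$. Discriminative model: models the posterior probabilities $P(y \mid x)$.
[Figure: joint densities $P(x, y_1)$ and $P(x, y_2)$ beside posteriors $P(y_1 \mid x)$ and $P(y_2 \mid x)$]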
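A generative model that stores $P(x, y)$ can still answer the discriminative question, because $P(y \mid x) = P(x, y) / \sum_{y'} P(x, y')$. The sketch below assumes a small discrete joint table whose numbers are made up for illustration; `JOINT` and `posterior` are hypothetical names.

```python
# Hypothetical joint distribution P(x, y) over a discrete input x and a label y.
JOINT = {
    ("a", "y1"): 0.30, ("a", "y2"): 0.10,
    ("b", "y1"): 0.15, ("b", "y2"): 0.45,
}

def posterior(joint, x):
    """Return P(y | x) by normalizing the joint P(x, y) over all labels y."""
    row = {y: p for (xi, y), p in joint.items() if xi == x}
    evidence = sum(row.values())  # P(x) = sum over y of P(x, y)
    return {y: p / evidence for y, p in row.items()}

# A discriminative model would store only these conditionals, not P(x).
print(posterior(JOINT, "a"))
```

The reverse direction does not work: from $P(y \mid x)$ alone one cannot recover $P(x)$, which is why only the generative model can be used to generate inputs.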

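5 Neural networks: what is a neuron?
[Figure: a neuron with inputs $x_1, x_2$, weights $w_1, w_2$, bias $b$, and output $y$]
Each neuron computes its total input $z = b + \sum_i x_i w_i$. Common neuron types:
- Linear neurons: $y = z$
- Binary threshold neurons: $y = 1$ if $z \ge 0$, $0$ otherwise
- Sigmoid neurons: $y = 1 / (1 + e^{-z})$
- Stochastic binary neurons: $P(y = 1) = 1 / (1 + e^{-z})$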
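The four neuron types differ only in how they map the same total input $z$ to an output. A minimal sketch, assuming the threshold is placed at $z = 0$ (so the bias absorbs it); the function names are hypothetical.

```python
import math
import random

def total_input(x, w, b):
    """z = b + sum_i x_i * w_i, shared by all four neuron types."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def linear(z):
    return z

def binary_threshold(z):
    # Outputs 1 if z >= 0, and 0 otherwise.
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Smooth, bounded output in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def stochastic_binary(z, rng=random):
    # Outputs 1 with probability sigmoid(z), otherwise 0.
    return 1 if rng.random() < sigmoid(z) else 0

z = total_input([1.0, 0.0], [0.5, -2.0], b=-0.5)
```

A stochastic binary neuron has the same mean output as a sigmoid neuron, but each individual output is a 0/1 sample.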

6 Neural networks: two-layer neural networks (sigmoid neurons)
Back-propagation:
Step 1: Randomly initialize the weights and compute the output vector (forward pass).
Step 2: Evaluate the gradient of an error function with respect to every weight.
Step 3: Adjust the weights against the gradient.
Repeat the forward pass and steps 2 and 3 until the error is low enough.
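The three steps can be sketched for one hidden layer of sigmoid neurons and a single sigmoid output. This is a minimal full-batch version, assuming a squared-error function, a fixed learning rate, and XOR as a toy data set; `train_two_layer` and all its parameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_two_layer(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Full-batch back-propagation for one sigmoid hidden layer and one sigmoid output."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1 (done once): random initial weights; the last entry of each row is a bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g2 = [0.0] * (n_hidden + 1)
        loss = 0.0
        for x, t in data:
            xb = list(x) + [1.0]
            # Forward pass: hidden activations, then the output y.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * hi for w, hi in zip(w2, hb)))
            loss += 0.5 * (y - t) ** 2  # squared-error function E
            # Step 2: gradient of E, propagated backwards through the sigmoids.
            dy = (y - t) * y * (1 - y)
            for j in range(n_hidden + 1):
                g2[j] += dy * hb[j]
            for j in range(n_hidden):
                dh = dy * w2[j] * h[j] * (1 - h[j])
                for i in range(n_in + 1):
                    g1[j][i] += dh * xb[i]
        losses.append(loss)
        # Step 3: adjust every weight against its gradient.
        for j in range(n_hidden + 1):
            w2[j] -= lr * g2[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w1[j][i] -= lr * g1[j][i]
    return w1, w2, losses

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
_, _, losses = train_two_layer(XOR)
```

Every example here is labeled, which is exactly the requirement the next slide criticizes.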

7 Neural networks: back-propagation is not good for deep learning
- It requires labeled training data, but almost all data is unlabeled.
- The learning time is very slow in networks with multiple hidden layers.
- It can get stuck in poor local optima; for deep nets these are far from optimal.
Instead, learn P(input), not P(output | input). What kind of generative model should we learn?


9 Graphical model
A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.
[Figure: example graphical model over A, B, C, D] In this example, D depends on A, B, and C, and C depends on B and D.

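10 Graphical model: directed vs. undirected
Directed graphical model (A → B, A → C, B → D, C → D): $P(A,B,C,D) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B,C)$. Directed acyclic graphs have many variants, e.g. Markov chains (MC) and hidden Markov models (HMM).
Undirected graphical model: $P(A,B,C,D) = \frac{1}{Z}\,\phi(A,B,C)\,\varphi(B,C,D)$, where $Z$ is the normalizing constant.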
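Both factorizations define a proper distribution, but in different ways: the directed one multiplies already-normalized conditionals, while the undirected one multiplies arbitrary nonnegative potentials and divides by $Z$. A minimal sketch over binary A, B, C, D; every probability and potential value below is a made-up assumption.

```python
import itertools

# Hypothetical conditional probability tables for P(A) P(B|A) P(C|A) P(D|B,C).
p_a = {1: 0.6, 0: 0.4}
p_b_given_a = {1: 0.7, 0: 0.2}                 # P(B=1 | A=a)
p_c_given_a = {1: 0.1, 0: 0.5}                 # P(C=1 | A=a)
p_d_given_bc = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(D=1 | B, C)

def bern(p1, value):
    """Probability that a binary variable with P(=1) = p1 takes `value`."""
    return p1 if value == 1 else 1.0 - p1

def directed_joint(a, b, c, d):
    # Each factor is normalized, so the product needs no extra constant.
    return (p_a[a] * bern(p_b_given_a[a], b) * bern(p_c_given_a[a], c)
            * bern(p_d_given_bc[(b, c)], d))

# Hypothetical nonnegative potentials for the undirected model.
def phi(a, b, c):
    return 2.0 if a == b == c else 1.0

def varphi(b, c, d):
    return 3.0 if b + c + d >= 2 else 0.5

CONFIGS = list(itertools.product([0, 1], repeat=4))
# Z sums the unnormalized product over all 2^4 configurations.
Z = sum(phi(a, b, c) * varphi(b, c, d) for a, b, c, d in CONFIGS)

def undirected_joint(a, b, c, d):
    return phi(a, b, c) * varphi(b, c, d) / Z
```

Computing $Z$ requires a sum over every configuration, which is what makes undirected models expensive as the number of variables grows.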


12 Belief nets
A belief net is a directed acyclic graph composed of stochastic variables. [Figure: layers of stochastic hidden causes above visible units] When the units are stochastic binary neurons, it is a sigmoid belief net. (Speaker note: explain why stochastic binary neurons are used.)

13 Belief nets
We would like to solve two problems:
- The inference problem: infer the states of the unobserved variables.
- The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.

14 Belief nets
It is easy to generate a sample from $P(v \mid h)$.
It is hard to infer $P(h \mid v)$, because of explaining away.
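Generating from a sigmoid belief net is a single top-down pass (ancestral sampling): sample the hidden causes from their priors, then each visible unit from $P(v_i = 1 \mid h)$. A minimal sketch, assuming one layer of binary hidden causes with logistic priors; `sample_sbn`, the weights `w[j][i]`, and the biases `b_h`, `b_v` are hypothetical. By contrast, computing $P(h \mid v)$ exactly needs a sum over all $2^n$ hidden configurations.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_sbn(w, b_h, b_v, rng):
    """Ancestral sampling in a two-layer sigmoid belief net.

    w[j][i] is the weight from hidden cause j to visible unit i.
    """
    # Top layer: hidden causes are independent a priori, P(h_j = 1) = sigmoid(b_h[j]).
    h = [1 if rng.random() < sigmoid(bj) else 0 for bj in b_h]
    # Bottom layer: given h, each visible unit is sampled independently,
    # P(v_i = 1 | h) = sigmoid(b_v[i] + sum_j h_j * w[j][i]).
    v = []
    for i, bi in enumerate(b_v):
        z = bi + sum(hj * w[j][i] for j, hj in enumerate(h))
        v.append(1 if rng.random() < sigmoid(z) else 0)
    return h, v

# With extreme (assumed) parameters the sample is effectively deterministic.
h, v = sample_sbn(w=[[100.0]], b_h=[50.0], b_v=[-50.0], rng=random.Random(0))
```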

15 Belief nets: explaining away
[Figure: two hidden causes H1 and H2 with a common effect V]
H1 and H2 are independent a priori, but they become dependent once we observe an effect V that both can influence: $P(H_1, H_2 \mid V) \neq P(H_1 \mid V)\,P(H_2 \mid V)$.
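Explaining away can be checked by exact enumeration over the four states of (H1, H2). The parameters are assumptions chosen to make the effect obvious: both causes are rare a priori (bias -10), either cause alone turns V on with probability 0.5, and V is almost never on without a cause.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed parameters: hidden bias, hidden-to-visible weight, visible bias.
B_H, W, B_V = -10.0, 20.0, -20.0

def posterior_given_v1():
    """Exact P(h1, h2 | V = 1) by enumerating all four hidden configurations."""
    prior_on = sigmoid(B_H)
    unnorm = {}
    for h1, h2 in itertools.product([0, 1], repeat=2):
        prior = (prior_on if h1 else 1 - prior_on) * (prior_on if h2 else 1 - prior_on)
        likelihood = sigmoid(B_V + W * h1 + W * h2)  # P(V = 1 | h1, h2)
        unnorm[(h1, h2)] = prior * likelihood
    total = sum(unnorm.values())
    return {k: p / total for k, p in unnorm.items()}

post = posterior_given_v1()
p_h1 = post[(1, 0)] + post[(1, 1)]   # P(H1 = 1 | V = 1), about 0.5
p_h2 = post[(0, 1)] + post[(1, 1)]   # P(H2 = 1 | V = 1), about 0.5
# Once H2 is known to be on, it "explains away" V and H1 becomes very unlikely.
p_h1_given_h2 = post[(1, 1)] / (post[(0, 1)] + post[(1, 1)])
```

If H1 and H2 were independent given V, $P(H_1 = 1, H_2 = 1 \mid V = 1)$ would be close to $0.5 \times 0.5$; instead it is tiny.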

16 Belief nets
Some methods for learning deep belief nets:
- Monte Carlo methods, but they are painfully slow for large, deep belief nets.
- Learning with samples from the wrong distribution.
- Using Restricted Boltzmann Machines.


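18 Boltzmann Machine
It is an undirected graphical model. [Figure: hidden units j connected to visible units i]
The energy of a joint configuration (ignoring bias terms) is $E(v, h) = -\sum_{i,j} v_i h_j w_{ij}$, where $v_i$ is the binary state of visible unit $i$, $h_j$ is the binary state of hidden unit $j$, and $w_{ij}$ is the weight between them.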

19 Boltzmann Machine
An example of how weights define a distribution
[Figure: a small network with hidden units h1, h2 and visible units v1, v2; one legible weight is -1]
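The weights define a distribution through $p(v, h) = e^{-E(v, h)} / Z$, with $Z$ summing $e^{-E}$ over every joint configuration, and $p(v) = \sum_h p(v, h)$. A minimal sketch by brute-force enumeration; only the -1 weight is legible in the slide, so the connection pattern and the other two weights (+2 and +1) are assumptions, and there are no biases.

```python
import itertools
import math

# Assumed weights between pairs of units (only the -1 is from the slide).
WEIGHTS = {("v1", "h1"): 2.0, ("h1", "h2"): -1.0, ("h2", "v2"): 1.0}
UNITS = ["v1", "v2", "h1", "h2"]

def energy(state):
    """E(s) = -sum over connected pairs (a, b) of s_a * s_b * w_ab."""
    return -sum(state[a] * state[b] * w for (a, b), w in WEIGHTS.items())

def boltzmann_distribution():
    """Return (state, p(v, h)) with p = exp(-E) / Z for every joint configuration."""
    configs = [dict(zip(UNITS, bits)) for bits in itertools.product([0, 1], repeat=len(UNITS))]
    unnorm = [math.exp(-energy(s)) for s in configs]
    z = sum(unnorm)  # partition function
    return [(s, u / z) for s, u in zip(configs, unnorm)]

def visible_marginal():
    """p(v) = sum over h of p(v, h)."""
    p_v = {}
    for s, p in boltzmann_distribution():
        key = (s["v1"], s["v2"])
        p_v[key] = p_v.get(key, 0.0) + p
    return p_v
```

Configurations with low energy get high probability; with these assumed weights, $v = (1, 1)$ is the most probable visible vector.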

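20 Boltzmann Machine: a very surprising fact
$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle s_i s_j \rangle_v - \langle s_i s_j \rangle_{\text{model}}$
The left side is the derivative of the log probability of one training vector $v$ under the model. $\langle s_i s_j \rangle_v$ is the expected value of the product of states at thermal equilibrium when $v$ is clamped on the visible units; $\langle s_i s_j \rangle_{\text{model}}$ is the expected value of the product of states at thermal equilibrium with no clamping.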
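The formula can be checked numerically on a tiny Boltzmann machine where both expectations are computable by enumeration. A minimal sketch, assuming the same kind of small network as the previous example (weights are assumptions, no biases); it compares the clamped-minus-free statistic with a central finite difference of $\log p(v)$.

```python
import itertools
import math

UNITS = ["v1", "v2", "h1", "h2"]
WEIGHTS = {("v1", "h1"): 2.0, ("h1", "h2"): -1.0, ("h2", "v2"): 1.0}  # assumed

def energy(state, weights):
    return -sum(state[a] * state[b] * w for (a, b), w in weights.items())

def all_states(fixed=None):
    """Enumerate joint states, optionally with some units clamped to `fixed`."""
    free = [u for u in UNITS if not fixed or u not in fixed]
    for bits in itertools.product([0, 1], repeat=len(free)):
        state = dict(fixed or {})
        state.update(zip(free, bits))
        yield state

def log_p_visible(v, weights):
    """log p(v) = log sum_h exp(-E(v, h)) - log Z."""
    log_num = math.log(sum(math.exp(-energy(s, weights)) for s in all_states(v)))
    log_z = math.log(sum(math.exp(-energy(s, weights)) for s in all_states()))
    return log_num - log_z

def expected_product(pair, weights, clamp=None):
    """<s_a s_b> at equilibrium, with the visible units clamped to `clamp` if given."""
    states = list(all_states(clamp))
    unnorm = [math.exp(-energy(s, weights)) for s in states]
    a, b = pair
    return sum(u * s[a] * s[b] for s, u in zip(states, unnorm)) / sum(unnorm)

def gradient(v, pair, weights):
    # d log p(v) / d w_ab = <s_a s_b>_v (clamped) - <s_a s_b>_model (free)
    return expected_product(pair, weights, clamp=v) - expected_product(pair, weights)

v = {"v1": 1, "v2": 0}
pair = ("v1", "h1")
eps = 1e-5
plus, minus = dict(WEIGHTS), dict(WEIGHTS)
plus[pair] += eps
minus[pair] -= eps
finite_diff = (log_p_visible(v, plus) - log_p_visible(v, minus)) / (2 * eps)
analytic = gradient(v, pair, WEIGHTS)
```

The surprising part is locality: the gradient for $w_{ij}$ needs only statistics of units $i$ and $j$, even though changing $w_{ij}$ affects the whole distribution.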

21 Boltzmann Machines: Restricted Boltzmann Machine
We restrict the connectivity to make learning easier:
- Only one layer of hidden units (we will deal with more layers later).
- No connections between hidden units, which makes the updates more parallel.
[Figure: hidden units j connected only to visible units i]
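Because there are no hidden-to-hidden connections, the hidden units are conditionally independent given $v$, so $P(h_j = 1 \mid v) = \sigma(c_j + \sum_i v_i w_{ij})$ can be computed for all $j$ in parallel. A minimal sketch that checks this factorization against the posterior computed directly from the energy; the weights and the biases `B_VIS`, `B_HID` (which the slides ignore) are assumptions.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical RBM: 3 visible, 2 hidden; W[i][j] connects visible i to hidden j.
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
B_VIS = [0.1, -0.2, 0.0]
B_HID = [-0.5, 0.3]

def energy(v, h):
    # E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_{i,j} v_i h_j w_ij (no hidden-hidden terms)
    return (-sum(b * vi for b, vi in zip(B_VIS, v))
            - sum(c * hj for c, hj in zip(B_HID, h))
            - sum(v[i] * h[j] * W[i][j] for i in range(len(v)) for j in range(len(h))))

def p_hidden_on(v):
    """Each hidden unit independently: P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i w_ij)."""
    return [sigmoid(B_HID[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(B_HID))]

def brute_force_posterior(v):
    """P(h | v) computed directly from the energy, for comparison."""
    hs = list(itertools.product([0, 1], repeat=len(B_HID)))
    unnorm = [math.exp(-energy(v, h)) for h in hs]
    z = sum(unnorm)
    return {h: u / z for h, u in zip(hs, unnorm)}

v = (1, 0, 1)
probs = p_hidden_on(v)
posterior = brute_force_posterior(v)
```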

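22 Boltzmann Machines: the Boltzmann machine learning algorithm for an RBM
[Figure: alternating updates between visible units i and hidden units j at t = 0, 1, 2, ..., infinity]
Start with a training vector on the visible units, update all hidden units in parallel, then all visible units in parallel, and repeat. With learning rate $\varepsilon$, the weight update compares the statistics at $t = 0$ with those at equilibrium: $\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^\infty)$.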

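23 Boltzmann Machines: contrastive divergence, a very surprising short-cut
[Figure: visible units i and hidden units j at t = 0 (data) and t = 1 (reconstruction)]
With learning rate $\varepsilon$: $\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1)$. This is not following the gradient of the log likelihood, but it works well.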
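A minimal CD-1 sketch for a binary RBM. It follows the rule $\Delta w_{ij} = \varepsilon(\langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1)$ one example at a time; as assumptions, it samples binary hidden states at $t = 0$ but uses probabilities (not samples) for the reconstruction and the $t = 1$ hidden units, a common simplification. All names, the learning rate, and the two-pattern data set are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, w, c):
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v)))) for j in range(len(c))]

def visible_probs(h, w, b):
    return [sigmoid(b[i] + sum(h[j] * w[i][j] for j in range(len(h)))) for i in range(len(b))]

def train_rbm_cd1(data, n_hidden, lr=0.1, epochs=500, seed=0):
    """CD-1: delta w_ij = lr * (<v_i h_j>^0 - <v_i h_j>^1)."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_vis)]
    b = [0.0] * n_vis     # visible biases
    c = [0.0] * n_hidden  # hidden biases
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, w, c)                     # t = 0: driven by the data
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            v1 = visible_probs(h0, w, b)                     # t = 1: reconstruction
            ph1 = hidden_probs(v1, w, c)
            for i in range(n_vis):
                b[i] += lr * (v0[i] - v1[i])
                for j in range(n_hidden):
                    w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            for j in range(n_hidden):
                c[j] += lr * (ph0[j] - ph1[j])
    return w, b, c

def reconstruction_error(data, w, b, c):
    """Mean squared error between each v and its mean-field reconstruction."""
    err = 0.0
    for v in data:
        v_rec = visible_probs(hidden_probs(v, w, c), w, b)
        err += sum((vi - ri) ** 2 for vi, ri in zip(v, v_rec))
    return err / len(data)

DATA = [[1, 1, 0, 0], [0, 0, 1, 1]]
w, b, c = train_rbm_cd1(DATA, n_hidden=4, epochs=1000)
```

Reconstruction error is only a rough progress signal here: as the slide notes, CD-1 is not the true log-likelihood gradient.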


25 DBN
It is easy to generate a sample from $P(v \mid h)$.
It is hard to infer $P(h \mid v)$, because of explaining away. Using RBMs to initialize the weights can lead to a good optimum.

26 DBN: combining two RBMs to make a DBN
Train the first RBM on the data. Then, for each v, copy the binary states of its hidden units and use them as data to train the second RBM. Compose the two RBM models to make a single DBN model: it's a deep belief net!
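The greedy stacking procedure can be sketched as a loop over layers. This is a minimal version under stated assumptions: it reuses a compact CD-1 trainer (reconstructions use probabilities), samples binary hidden states as the "copied" data for the next RBM, and the layer sizes, data, and names such as `train_dbn` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def up(v, w, c):
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v)))) for j in range(len(c))]

def down(h, w, b):
    return [sigmoid(b[i] + sum(h[j] * w[i][j] for j in range(len(h)))) for i in range(len(b))]

def train_rbm(data, n_hidden, rng, lr=0.1, epochs=300):
    """Compact CD-1 trainer for one RBM layer."""
    n_vis = len(data[0])
    w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_vis)]
    b, c = [0.0] * n_vis, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = up(v0, w, c)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            v1 = down(h0, w, b)
            ph1 = up(v1, w, c)
            for i in range(n_vis):
                b[i] += lr * (v0[i] - v1[i])
                for j in range(n_hidden):
                    w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            for j in range(n_hidden):
                c[j] += lr * (ph0[j] - ph1[j])
    return w, b, c

def train_dbn(data, layer_sizes, seed=0):
    """Train an RBM, copy its binary hidden states for each v, train the next RBM on them."""
    rng = random.Random(seed)
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        w, b, c = train_rbm(layer_input, n_hidden, rng)
        layers.append((w, b, c))
        # Binary hidden states become the training data for the next RBM.
        layer_input = [[1 if rng.random() < p else 0 for p in up(v, w, c)] for v in layer_input]
    return layers

DATA = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
layers = train_dbn(DATA, [3, 2])
```

The returned per-layer weights would serve as the initial weights of the composed DBN.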

27 DBN
Why can we use an RBM to initialize the weights of a belief net? An RBM is equivalent to an infinite sigmoid belief net with replicated weights. [Figure: directed layers ..., h2, v2, h1, v1, h0, v0, with the same weights between each pair of adjacent layers]
Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose. The model above h0 implements a complementary prior, so multiplying v0 by W transpose gives the product of the likelihood term and the prior term.

28 DBN: complementary prior
A Markov chain is a sequence of variables $X_1, X_2, \ldots$ with the Markov property $P(X_t \mid X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$.
A Markov chain is stationary if the transition probabilities do not depend on time: $P(X_t = x' \mid X_{t-1} = x) = T(x \to x')$, where $T(x \to x')$ is called the transition matrix.
If a Markov chain is ergodic, it has a unique equilibrium distribution: $P_t(X_t = x) \to P_\infty(X = x)$ as $t \to \infty$.
[Figure: chain X1 → X2 → X3 → X4]
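Convergence to the equilibrium distribution can be seen by repeatedly applying $P_t(x') = \sum_x P_{t-1}(x)\,T(x \to x')$ from different starting distributions. A minimal sketch with a hypothetical two-state ergodic chain whose equilibrium is $(5/6, 1/6)$.

```python
def step(dist, T):
    """One step: P_t(x2) = sum_x P_{t-1}(x) * T[x][x2], where T[x][x2] = T(x -> x2)."""
    n = len(T)
    return [sum(dist[x] * T[x][x2] for x in range(n)) for x2 in range(n)]

def equilibrium(T, start, steps=200):
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, T)
    return dist

# Hypothetical stationary (time-homogeneous), ergodic two-state chain.
T = [[0.9, 0.1],
     [0.5, 0.5]]

from_state_0 = equilibrium(T, [1.0, 0.0])
from_state_1 = equilibrium(T, [0.0, 1.0])
```

Both runs end at the same distribution, which is what "unique equilibrium" means.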

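29 DBN
Most Markov chains used in practice satisfy detailed balance: $P_\infty(X)\,T(X \to X') = P_\infty(X')\,T(X' \to X)$, e.g. Gibbs sampling, Metropolis-Hastings, slice sampling, ...
Such Markov chains are reversible: the chain X1 → X2 → X3 → X4 and the same chain run backwards have the same joint probability,
$P_\infty(X_1)\,T(X_1 \to X_2)\,T(X_2 \to X_3)\,T(X_3 \to X_4) = T(X_1 \leftarrow X_2)\,T(X_2 \leftarrow X_3)\,T(X_3 \leftarrow X_4)\,P_\infty(X_4)$.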
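Detailed balance and reversibility can be verified exactly for a small Metropolis chain. A minimal sketch, assuming a hypothetical three-state target distribution and a uniform proposal over the other states; `fractions.Fraction` keeps the arithmetic exact so the checks are equalities, not tolerances.

```python
from fractions import Fraction
from itertools import product

# Hypothetical target (equilibrium) distribution over three states.
P_INF = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
N = len(P_INF)

def metropolis_matrix(p):
    """T(x -> y): propose another state uniformly, accept with min(1, p(y) / p(x))."""
    n = len(p)
    T = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                T[x][y] = Fraction(1, n - 1) * min(Fraction(1), p[y] / p[x])
        T[x][x] = 1 - sum(T[x][y] for y in range(n) if y != x)  # rejected moves stay put
    return T

T = metropolis_matrix(P_INF)

def detailed_balance_holds(p, T):
    return all(p[x] * T[x][y] == p[y] * T[y][x] for x in range(len(p)) for y in range(len(p)))

def forward_path_prob(path):
    # P_inf(X1) T(X1 -> X2) T(X2 -> X3) T(X3 -> X4)
    prob = P_INF[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= T[a][b]
    return prob

def backward_path_prob(path):
    # T(X1 <- X2) T(X2 <- X3) T(X3 <- X4) P_inf(X4): the same path run in reverse
    prob = P_INF[path[-1]]
    for a, b in zip(path, path[1:]):
        prob *= T[b][a]
    return prob
```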

30 DBN
$P(Y_k = 1 \mid X_{k+1}) = \sigma(W^T X_{k+1} + c)$
$P(X_k = 1 \mid Y_k) = \sigma(W Y_k + b)$
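Generating from the infinite directed net with tied weights means applying these two conditionals alternately, which is alternating Gibbs sampling in the RBM; so the distribution reaching the bottom layer should match the RBM's $p(v)$. A minimal sketch that pushes an exact distribution down the layers instead of sampling; the weights `W`, biases `B` (visible) and `C` (hidden), and all function names are assumptions.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tied weights W[i][j] (X unit i, Y unit j) and biases b (X), c (Y).
W = [[1.0, -0.5], [0.8, 1.2]]
B = [-0.2, 0.1]
C = [0.3, -0.4]
XS = list(itertools.product([0, 1], repeat=len(B)))
YS = list(itertools.product([0, 1], repeat=len(C)))

def bernoulli_prod(state, probs):
    """Probability of a binary vector under independent units with P(unit = 1) = probs."""
    out = 1.0
    for s, p in zip(state, probs):
        out *= p if s else 1.0 - p
    return out

def p_y_given_x(y, x):
    # P(Y_k = 1 | X_{k+1}) = sigmoid(W^T X_{k+1} + c), independently per unit
    return bernoulli_prod(y, [sigmoid(C[j] + sum(W[i][j] * x[i] for i in range(len(x))))
                              for j in range(len(C))])

def p_x_given_y(x, y):
    # P(X_k = 1 | Y_k) = sigmoid(W Y_k + b), independently per unit
    return bernoulli_prod(x, [sigmoid(B[i] + sum(W[i][j] * y[j] for j in range(len(y))))
                              for i in range(len(B))])

def propagate_down(dist_x, layers):
    """Push a distribution over X_{k+1} down through `layers` tied (Y, X) layer pairs."""
    for _ in range(layers):
        dist_y = {y: sum(dist_x[x] * p_y_given_x(y, x) for x in XS) for y in YS}
        dist_x = {x: sum(dist_y[y] * p_x_given_y(x, y) for y in YS) for x in XS}
    return dist_x

def neg_energy(x, y):
    # -E(x, y) = b.x + c.y + x^T W y
    return (sum(B[i] * x[i] for i in range(len(B)))
            + sum(C[j] * y[j] for j in range(len(C)))
            + sum(x[i] * W[i][j] * y[j] for i in range(len(B)) for j in range(len(C))))

def rbm_visible_marginal():
    """Exact RBM p(x) = sum_y exp(-E(x, y)) / Z, for comparison."""
    unnorm = {x: sum(math.exp(neg_energy(x, y)) for y in YS) for x in XS}
    z = sum(unnorm.values())
    return {x: u / z for x, u in unnorm.items()}

start = {x: (1.0 if x == (0, 0) else 0.0) for x in XS}
bottom = propagate_down(start, 100)
exact = rbm_visible_marginal()
```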


32 Reference
G. Hinton, Deep Belief Nets, NIPS 2007 tutorial
Machine learning course lecture notes

