3 Neural networks: Supervised learning vs. unsupervised learning
Supervised learning: the training data consists of inputs together with their corresponding outputs.
Unsupervised learning: the training data consists of inputs without their corresponding outputs.
Example: given a large set of two-dimensional points, the supervised case has a colour label for each point, while the unsupervised case has no colour labels.
4 Neural networks: Generative model vs. discriminative model
Generative model: models the joint distribution of inputs and outputs, P(x, y).
Discriminative model: models the posterior probabilities, P(y | x).
[Figure: the joint densities P(x, y1), P(x, y2) and the posteriors P(y1 | x), P(y2 | x).]
5 Neural networks: What is a neuron?
Types of neuron: linear neurons, binary threshold neurons, sigmoid neurons, stochastic binary neurons.
Binary threshold neuron with inputs $x_1, x_2$, weights $w_1, w_2$ and bias $b$:
$y = 1$ if $b + \sum_i x_i w_i \ge 0$, and $y = 0$ otherwise.
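To make the threshold rule concrete, here is a minimal NumPy sketch of a binary threshold neuron; the example inputs, weights and bias are made-up values.

```python
import numpy as np

# A minimal sketch of a binary threshold neuron; the inputs, weights and bias
# below are made-up example values.
def binary_threshold_neuron(x, w, b):
    """Return 1 if the total input b + sum_i x_i * w_i is non-negative, else 0."""
    z = b + np.dot(w, x)
    return 1 if z >= 0 else 0

x = np.array([1.0, 0.0])        # inputs x1, x2
w = np.array([0.7, -0.4])       # weights w1, w2
print(binary_threshold_neuron(x, w, b=-0.5))   # prints 1, since 0.7 - 0.5 >= 0
```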
6 Neural networks: Two-layer neural networks (sigmoid neurons) and back-propagation
Step 1: randomly initialise the weights and determine the output vector.
Step 2: evaluate the gradient of an error function.
Step 3: adjust the weights. Repeat the forward pass, gradient evaluation and weight update (steps 1-3) until the error is low enough.
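As a rough illustration of the three steps, here is a minimal NumPy sketch that trains a two-layer sigmoid network with back-propagation on a toy XOR task; the layer sizes, learning rate and epoch count are arbitrary choices for the example, and (as the next slide warns) such a run can occasionally get stuck in a poor local optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialise the weights (layer sizes are illustrative).
n_in, n_hidden, n_out = 2, 3, 1
W1 = rng.normal(0, 0.5, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_out, n_hidden)); b2 = np.zeros(n_out)

# Toy XOR training data (an illustrative stand-in for real data).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

lr = 1.0
for epoch in range(10000):            # repeat steps 1-3 until the error is low enough
    H = sigmoid(X @ W1.T + b1)        # forward pass: hidden activations
    Y = sigmoid(H @ W2.T + b2)        # forward pass: output vector
    # Step 2: evaluate the gradient of the squared-error function.
    dY = (Y - T) * Y * (1 - Y)        # delta at the output layer
    dH = (dY @ W2) * H * (1 - H)      # delta back-propagated to the hidden layer
    # Step 3: adjust the weights down the gradient.
    W2 -= lr * dY.T @ H;  b2 -= lr * dY.sum(0)
    W1 -= lr * dH.T @ X;  b1 -= lr * dH.sum(0)

print(np.round(Y, 2))  # outputs should move toward [0, 1, 1, 0] as the error falls
```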
7 Neural networks: Back-propagation is not well suited to deep learning
It requires labelled training data, but almost all data is unlabelled.
The learning time is very slow in networks with multiple hidden layers.
It can get stuck in poor local optima, which for deep nets are often far from optimal.
Instead, learn P(input) rather than P(output | input). What kind of generative model should we learn?
9 Graphical model
A graphical model is a probabilistic model in which a graph denotes the conditional dependence structure between random variables.
In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.
10 Graphical model: Directed vs. undirected graphical models
Directed graphical model (a directed acyclic graph over A, B, C, D):
$P(A,B,C,D) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B,C)$
DAG models have many variants, e.g. Markov chains, hidden Markov models, and so on.
Undirected graphical model:
$P(A,B,C,D) = \frac{1}{Z}\,\varphi(A,B,C)\,\varphi(B,C,D)$
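To make the directed factorisation concrete, here is a minimal sketch with invented conditional probability tables; it just verifies that the product of the factors sums to one over all joint states.

```python
# A small sketch of the directed factorisation P(A,B,C,D) = P(A) P(B|A) P(C|A) P(D|B,C)
# for binary variables A, B, C, D; all probability tables are made-up illustrative numbers.
P_A   = {1: 0.6, 0: 0.4}                                          # P(A = a)
P_B_A = {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8}      # P(B = b | A = a), keyed by (b, a)
P_C_A = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.5, (0, 0): 0.5}      # P(C = c | A = a), keyed by (c, a)
P_D1_BC = {(1, 1): 0.95, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.05}  # P(D = 1 | B = b, C = c)

def joint(a, b, c, d):
    p_d1 = P_D1_BC[(b, c)]
    p_d = p_d1 if d == 1 else 1.0 - p_d1
    return P_A[a] * P_B_A[(b, a)] * P_C_A[(c, a)] * p_d

# The product of the conditional factors is a proper joint distribution:
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)   # 1.0 (up to floating point)
```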
12 Belief nets
A belief net is a directed acyclic graph composed of stochastic variables: layers of stochastic hidden causes above a layer of visible units.
When the units are stochastic binary neurons (each unit turns on with a probability given by the logistic function of its total input), it is a sigmoid belief net.
13 Belief nets: We would like to solve two problems
The inference problem: infer the states of the unobserved variables (the stochastic hidden causes) given the visible units.
The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.
14 Belief nets
It is easy to generate a sample from P(v | h), but it is hard to infer P(h | v), because of explaining away between the stochastic hidden causes given the visible units.
15 Belief nets: Explaining away
H1 and H2 are independent, but they become dependent when we observe an effect V that they can both influence: $P(H_1 \mid V)$ and $P(H_2 \mid V)$ are dependent.
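The sketch below puts illustrative numbers on this: once V is observed, learning that H2 is on sharply lowers the probability that H1 is on, so the two causes are no longer independent. All priors and likelihoods are invented for the example.

```python
from itertools import product

# A numeric sketch of explaining away; every probability here is invented for
# the example. H1 and H2 are independent, rare causes of a common effect V.
pH1, pH2 = 0.1, 0.1

def pV1(h1, h2):                 # P(V = 1 | H1, H2): either cause makes V likely
    return 0.95 if (h1 or h2) else 0.01

def joint(h1, h2, v):
    p = (pH1 if h1 else 1 - pH1) * (pH2 if h2 else 1 - pH2)
    return p * (pV1(h1, h2) if v else 1 - pV1(h1, h2))

def prob_h1(given_h2=None):      # P(H1 = 1 | V = 1 [, H2 = given_h2])
    num = den = 0.0
    for h1, h2 in product((0, 1), repeat=2):
        if given_h2 is not None and h2 != given_h2:
            continue
        p = joint(h1, h2, 1)
        den += p
        if h1 == 1:
            num += p
    return num / den

print(prob_h1())            # P(H1=1 | V=1)        ~ 0.50
print(prob_h1(given_h2=1))  # P(H1=1 | V=1, H2=1)  = 0.10: H2 explains V away
```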
16 Belief nets: Some methods for learning deep belief nets
Monte Carlo methods, but they are painfully slow for large, deep belief nets.
Learning with samples from the wrong distribution.
Use Restricted Boltzmann Machines.
18 Boltzmann Machine
A Boltzmann machine is an undirected graphical model over binary visible units (indexed by $i$) and hidden units (indexed by $j$). The energy of a joint configuration $(v, h)$ is
$E(v,h) = -\sum_i v_i b_i - \sum_j h_j b_j - \sum_{i<i'} v_i v_{i'} w_{ii'} - \sum_{i,j} v_i h_j w_{ij} - \sum_{j<j'} h_j h_{j'} w_{jj'}$
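A minimal sketch of this energy function with random illustrative parameters; the visible-visible and hidden-hidden weight matrices are taken to be symmetric with zero diagonal, as the formula assumes.

```python
import numpy as np

rng = np.random.default_rng(1)

# A minimal sketch of the energy of a joint configuration (v, h); the weights
# and biases are random illustrative values. W_vv and W_hh are symmetric with
# zero diagonal, so the 1/2 factors turn them into sums over pairs i < i', j < j'.
def energy(v, h, b_v, b_h, W_vh, W_vv, W_hh):
    return -(v @ b_v + h @ b_h + v @ W_vh @ h
             + 0.5 * v @ W_vv @ v + 0.5 * h @ W_hh @ h)

nv, nh = 3, 2
v = rng.integers(0, 2, nv).astype(float)
h = rng.integers(0, 2, nh).astype(float)
b_v, b_h = rng.normal(size=nv), rng.normal(size=nh)
W_vh = rng.normal(size=(nv, nh))
A = np.triu(rng.normal(size=(nv, nv)), 1); W_vv = A + A.T   # symmetric, zero diagonal
B = np.triu(rng.normal(size=(nh, nh)), 1); W_hh = B + B.T
print(energy(v, h, b_v, b_h, W_vh, W_vv, W_hh))
```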
19 Boltzmann Machine
An example of how the weights define a distribution.
[Figure: a small network of visible units v1, v2 and hidden units h1, h2 whose weights determine the probability of every joint configuration.]
20 Boltzmann Machine: A very surprising fact
The derivative of the log probability of one training vector v under the model is
$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle s_i s_j \rangle_{v} - \langle s_i s_j \rangle_{\text{model}}$
where $\langle s_i s_j \rangle_{v}$ is the expected value of the product of states at thermal equilibrium when v is clamped on the visible units, and $\langle s_i s_j \rangle_{\text{model}}$ is the expected value of the product of states at thermal equilibrium with no clamping.
21 Boltzmann Machines: Restricted Boltzmann Machine
We restrict the connectivity to make learning easier:
only one layer of hidden units (we will deal with more layers later), and
no connections between hidden units (which makes the updates more parallel).
22 Boltzmann Machines: The Boltzmann machine learning algorithm for an RBM
[Figure: alternating Gibbs sampling. Start with a data vector on the visible units i at t = 0, then alternately update all the hidden units j and all the visible units i at t = 1, 2, ..., until the chain reaches thermal equilibrium at t = infinity.]
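A sketch of this algorithm for an RBM, with the positive-phase statistics measured while the data is clamped on the visible units and the negative-phase statistics measured after a long run of alternating Gibbs sampling; the network size, data and chain length are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A sketch of the two phases of the learning rule for an RBM, estimated with
# alternating Gibbs sampling; the sizes, data and chain length are illustrative.
nv, nh = 6, 4
W  = rng.normal(0, 0.1, (nv, nh))
bv = np.zeros(nv)
bh = np.zeros(nh)

def sample_h(v):          # P(h_j = 1 | v): hidden units are conditionally independent
    p = sigmoid(v @ W + bh)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v(h):          # P(v_i = 1 | h): visible units are conditionally independent
    p = sigmoid(h @ W.T + bv)
    return (rng.random(p.shape) < p).astype(float), p

v_data = rng.integers(0, 2, (10, nv)).astype(float)   # stand-in for a training batch

# Positive phase (t = 0): the data vector is clamped on the visible units.
h0, ph0 = sample_h(v_data)
positive = v_data.T @ ph0 / len(v_data)                # <v_i h_j> with v clamped

# Negative phase (t -> infinity): run the chain long enough to approach equilibrium.
v, h = v_data.copy(), h0
for _ in range(1000):                                  # this is what makes learning so slow
    v, _ = sample_v(h)
    h, _ = sample_h(v)
negative = v.T @ h / len(v)                            # <v_i h_j> with no clamping

W += 0.1 * (positive - negative)                       # follow the log-likelihood gradient
```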
23 Boltzmann Machines: Contrastive divergence, a very surprising short-cut
[Figure: one step of alternating Gibbs sampling; the visible units hold the data at t = 0 and a reconstruction at t = 1.]
Instead of running the chain to equilibrium, compare the statistics measured on the data (t = 0) with those measured on the one-step reconstruction (t = 1). This is not following the gradient of the log likelihood, but it works well.
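A sketch of the short-cut, reusing the sample_h / sample_v helpers and the W, bv, bh, v_data variables from the previous sketch: the negative statistics come from a one-step reconstruction of the data rather than from the equilibrium distribution. The learning rate and number of sweeps are illustrative.

```python
# A sketch of the CD-1 short-cut, reusing sample_h, sample_v, v_data, W, bv, bh
# from the previous sketch; learning rate and number of sweeps are illustrative.
lr = 0.1
for step in range(100):
    h0, ph0 = sample_h(v_data)            # t = 0: hidden states driven by the data
    v1, pv1 = sample_v(h0)                # t = 1: one-step reconstruction of the data
    h1, ph1 = sample_h(pv1)
    # Compare data statistics with reconstruction statistics instead of running
    # the chain all the way to thermal equilibrium.
    W  += lr * (v_data.T @ ph0 - pv1.T @ ph1) / len(v_data)
    bv += lr * (v_data - pv1).mean(axis=0)
    bh += lr * (ph0 - ph1).mean(axis=0)
```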
25 DBN
It is easy to generate a sample from P(v | h), but it is hard to infer P(h | v) because of explaining away. Using an RBM to initialise the weights lets learning reach a good optimum.
26 DBN: Combining two RBMs to make a DBN
Train the first RBM on the data. Then copy the binary hidden state it produces for each training vector v and use those states as data to train the second RBM. Composing the two RBM models gives a single DBN model: a deep belief net.
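A sketch of the two-RBM stacking procedure under a CD-1 trainer like the one sketched above; the layer sizes, epoch counts and random stand-in data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A sketch of greedy layer-wise DBN training: train one RBM with CD-1, copy its
# binary hidden states for each data vector, and train a second RBM on them.
# All sizes, epoch counts and the random "data" are illustrative.
def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    nv = data.shape[1]
    W, bv, bh = rng.normal(0, 0.1, (nv, n_hidden)), np.zeros(nv), np.zeros(n_hidden)
    for _ in range(epochs):                       # CD-1 updates, as sketched earlier
        ph0 = sigmoid(data @ W + bh)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + bv)
        ph1 = sigmoid(pv1 @ W + bh)
        W  += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        bv += lr * (data - pv1).mean(0)
        bh += lr * (ph0 - ph1).mean(0)
    return W, bv, bh

X = rng.integers(0, 2, (100, 20)).astype(float)   # stand-in for real training vectors

W1, bv1, bh1 = train_rbm(X, n_hidden=10)          # train the first RBM on the data
H1 = (rng.random((len(X), 10)) < sigmoid(X @ W1 + bh1)).astype(float)  # copy binary states
W2, bv2, bh2 = train_rbm(H1, n_hidden=5)          # train the second RBM on those states
```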
27 DBN: Why can we use an RBM to initialise the weights of a belief net?
An infinite sigmoid belief net (with layers ..., v2, h1, v1, h0, v0) that replicates the same weights at every layer is equivalent to an RBM.
Inference in this directed net with replicated weights is trivial: we just multiply v0 by W transpose. The model above h0 implements a complementary prior, so multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
28 DBN: Complementary prior
A Markov chain is a sequence of variables $X_1, X_2, \dots$ with the Markov property
$P(X_t \mid X_1, \dots, X_{t-1}) = P(X_t \mid X_{t-1}).$
A Markov chain is stationary if the transition probabilities do not depend on time:
$P(X_t = x' \mid X_{t-1} = x) = T(x \to x'),$
where $T(x \to x')$ is called the transition matrix.
If a Markov chain is ergodic, it has a unique equilibrium distribution:
$P_t(X_t = x) \to P_\infty(X = x)$ as $t \to \infty.$
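A small sketch of these definitions with an illustrative two-state transition matrix: iterating the chain converges to the unique equilibrium distribution, which is the left eigenvector of the transition matrix with eigenvalue 1.

```python
import numpy as np

# A small sketch of a stationary, ergodic Markov chain converging to its unique
# equilibrium distribution; the 2-state transition matrix is illustrative.
T = np.array([[0.9, 0.1],      # T[x, x'] = P(X_t = x' | X_{t-1} = x)
              [0.5, 0.5]])

p = np.array([1.0, 0.0])       # start with all probability on state 0
for t in range(50):
    p = p @ T                  # P_t approaches P_infinity as t grows
print(p)                       # ~ [0.833, 0.167]

# The equilibrium distribution satisfies pi = pi T, i.e. it is the left
# eigenvector of T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print(pi / pi.sum())           # the same distribution, ~ [0.833, 0.167]
```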