1 Boltzmann Machines: Stochastic Hopfield Machines. Lecture 11e https://class.coursera.org/neuralnets-2012-001/lecture/131

2 Document classification given binary vectors. Nuclear power station example – we don't want positive examples!

3 Two ways a model can generate data: 1) Causal model: first generate the latent variables (hidden units), then … 2) Boltzmann machines: …
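
The Boltzmann machine way of generating data relies on stochastic binary units (a Hopfield net with noise): each unit turns on with a probability given by the logistic of its energy gap. A minimal sketch of one such update, assuming weights W with zero diagonal, biases b, and a temperature T (all names are illustrative, not from the slides):

```python
import numpy as np

def sample_unit(i, s, W, b, T=1.0, rng=np.random.default_rng()):
    """Stochastically update binary unit i of a Boltzmann machine.

    The unit turns on with probability sigmoid(energy_gap / T), where the
    energy gap is the unit's total input from the other units plus its bias.
    """
    energy_gap = b[i] + W[i] @ s          # assumes W[i, i] == 0 (no self-connection)
    p_on = 1.0 / (1.0 + np.exp(-energy_gap / T))
    s[i] = 1 if rng.random() < p_on else 0
    return s
```

Repeatedly applying such updates to randomly chosen units lets the network settle toward its equilibrium (Boltzmann) distribution.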

4

5

6

7

8

9 What to do when the network is big

10 What is Needed for Learning:

11 Learning in Boltzmann Machines. Lecture 12a

12 Modelling the input vectors. There are no labels; we want to build a model of a set of input vectors.

13

14

15

16 Given that each weight needs to know about all the other weights, it is very surprising that there is such a simple learning algorithm:

17 The rule compares two statistics: how often units i and j are on together when a data vector v is clamped on the visible units, and how often i and j are on together when nothing is clamped (i.e., when sampling from the model).

18 The first term in the rule says to raise the weights in proportion to the product of the activities of the units (Hebbian learning). But if we only used this rule, the weights would all become positive and the whole system would blow up. So the second term says to lower the weights in proportion to how often the two units are on together when you are sampling from the model's distribution. An alternative view is that the first term is like the storage term for a Hopfield net and the second term is for getting rid of spurious minima; this is the correct way of thinking about unlearning, and it tells you exactly how much unlearning to do.
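
As a rough sketch of the rule these two slides describe: the weight change is proportional to the difference between the data-clamped and free-running co-occurrence statistics. The function below assumes those two statistics have already been estimated by sampling; the array names are illustrative, not from the slides.

```python
import numpy as np

def boltzmann_weight_update(W, stats_data, stats_model, lr=0.01):
    """One step of the Boltzmann machine learning rule.

    stats_data[i, j]  ~ how often units i and j are on together with a data
                        vector clamped on the visible units (positive phase).
    stats_model[i, j] ~ how often they are on together when nothing is
                        clamped, i.e. sampling from the model (negative phase).
    The first (Hebbian) term raises the weights; the second term does the
    unlearning that keeps them from all growing without bound.
    """
    return W + lr * (stats_data - stats_model)
```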

19

20 Unlearning to get rid of the spurious minima

21
- Sampling how often two units are on together amounts to measuring the correlation between the two units.
- Repeat over all the data vectors.
- You expect the energy landscape to have many different minima that are fairly well separated and have about the same energy.
- The aim is to model a set of images that all have about the same (low) energy, while unreasonable images have very high energy.

22 Restricted Boltzmann Machines. Lecture 12c

23 A much simplified architecture: no connections between the hidden units.
- If the visible units are given, the equilibrium distribution of the hidden units can be computed in one step, because the hidden units are all independent of one another given the states of the visible units.
- The proper Boltzmann machine learning algorithm is still slow for a restricted Boltzmann machine.
- In 1998, Hinton found a shortcut for restricted Boltzmann machines: it is only an approximation, but it works well in practice and caused a resurgence in this area.
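
A minimal sketch of why the hidden units reach their equilibrium distribution in one step: given the visible vector, each hidden unit's probability of turning on depends only on the visible units, so all of them can be sampled at once. Array names are illustrative.

```python
import numpy as np

def sample_hidden_given_visible(v, W, b_hidden, rng=np.random.default_rng()):
    """Sample every hidden unit of an RBM in parallel, given visible vector v.

    With no hidden-to-hidden connections, p(h_j = 1 | v) is just the logistic
    of that unit's total input, independently for each j.
    """
    p_h = 1.0 / (1.0 + np.exp(-(b_hidden + v @ W)))   # shape: (num_hidden,)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    return h, p_h
```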

24

25 Note that this does not depend on what the other units are doing, so it can be computed for all of them in parallel.

26 Fantasy particles are global configurations. After each weight update, you update the fantasy particles a little, which should bring them back close to equilibrium. The algorithm works very well at building density models.
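
A hedged sketch of the fantasy-particle idea described above (essentially what later became known as persistent contrastive divergence): each particle is a global configuration that gets a little more alternating Gibbs updating after every weight change, keeping it close to equilibrium. Names are illustrative, not from the slides.

```python
import numpy as np

def refresh_fantasy_particles(v, W, b_visible, b_hidden, n_steps=1,
                              rng=np.random.default_rng()):
    """Nudge a batch of fantasy particles (rows of v are visible configurations)
    back toward equilibrium with a few alternating Gibbs updates."""
    for _ in range(n_steps):
        p_h = 1.0 / (1.0 + np.exp(-(b_hidden + v @ W)))
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = 1.0 / (1.0 + np.exp(-(b_visible + h @ W.T)))
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v
```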

27 An alternative but much faster algorithm:

28 Hinton 2002
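
A minimal sketch of the shortcut referred to here, contrastive divergence with a single reconstruction step (CD-1). The helper names, batching, and the use of probabilities for the pairwise statistics are assumptions, not taken from the slides.

```python
import numpy as np

def cd1_update(v_data, W, b_visible, b_hidden, lr=0.01,
               rng=np.random.default_rng()):
    """One CD-1 weight update for an RBM, given a batch of binary data vectors."""
    # Positive phase: hidden units driven by the data.
    p_h_data = 1.0 / (1.0 + np.exp(-(b_hidden + v_data @ W)))
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # One reconstruction step instead of running all the way to equilibrium.
    p_v_recon = 1.0 / (1.0 + np.exp(-(b_visible + h_data @ W.T)))
    p_h_recon = 1.0 / (1.0 + np.exp(-(b_hidden + p_v_recon @ W)))

    # Difference of pairwise statistics, as in the full Boltzmann machine rule.
    positive = v_data.T @ p_h_data
    negative = p_v_recon.T @ p_h_recon
    return W + lr * (positive - negative) / len(v_data)
```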

29

30

31

32 Example of Contrastive Divergence. Lecture 12d

33

34

35

36

37

38 RBMs for Collaborative Filtering. Lecture 12e

