
1 LAC group, 16/06/2011

2 So far...  Directed graphical models  Bayesian networks: useful because both the structure and the parameters provide a natural representation for many types of real-world domains.

3 This chapter...  Undirected graphical models: useful for modelling phenomena where we cannot determine the directionality of the interaction between the variables. They offer a different, simpler perspective on directed models (both the independence structure and the inference task)

4 This chapter...  Introduce a framework that allows both directed and undirected edges  Note: some of the results in this chapter require that we restrict attention to distributions over discrete state spaces.  Discrete vs. continuous state spaces: e.g. Boolean vs. real-valued variables (see sec. 2.1.6)

5 The 4 students example (the misconception example, sec. 3.4.2, ex. 3.8)  Four students get together in pairs to work on their homework for a class. The pairs that meet are shown via the edges (lines) of this undirected graph:  A: Alice  B: Bob  C: Charles  D: Debbie [Graph: a 4-cycle with edges A–B, B–C, C–D, D–A]

6 The 4 students example We want to model a distribution that satisfies the following independencies: 1) A is independent of C given B and D 2) B is independent of D given A and C

7 The 4 students example PROBLEM 1: If we try to capture these independencies with a Bayesian network, we run into trouble:  Any Bayesian network I-map of such a distribution will have extraneous edges  At least one of the desired independence statements will not be captured (cont’d)

8 The 4 students example (cont’d)  Any Bayesian network will require us to specify the directionality of the influence Also:  The interactions look symmetrical, and we would like to model this without representing a direction of influence.

9 The 4 students example SOLUTION 1: An undirected graph, here a Markov network structure:  Nodes (circles) represent variables  Edges (lines) represent a notion of direct probabilistic interaction between the neighbouring variables, not mediated by any other variable in the network.

10 The 4 students example PROBLEM 2:  How do we parameterise this undirected graph?  A CPD (conditional probability distribution) is not useful, as the interaction is not directed  We would like to capture the affinities between the related variables, e.g. Alice and Bob are more likely to agree than disagree

11 The 4 students example SOLUTION 2:  Associate A and B with a general-purpose function: a factor

12 The 4 students example  Here we focus only on non-negative factors. Factor: Let D be a set of random variables. We define a factor φ to be a function from Val(D) to R. A factor is non-negative if all its entries are non-negative. Scope: The set of variables D is called the scope of the factor and is denoted Scope[φ].
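As a concrete illustration, a factor over discrete variables can be coded as a lookup table from joint assignments of its scope to non-negative reals. This Factor class is a minimal sketch of my own, not the book's implementation:

```python
class Factor:
    """A factor: a function from joint assignments of its scope to R,
    stored as a lookup table. Non-negative if all entries are >= 0."""

    def __init__(self, scope, table):
        self.scope = tuple(scope)  # ordered variable names, e.g. ("A", "B")
        self.table = dict(table)   # maps value tuples to reals, e.g. {(0, 0): 30}
        assert all(v >= 0 for v in self.table.values()), "non-negative factor expected"

    def __call__(self, assignment):
        # assignment: dict mapping variable name -> value, e.g. {"A": 0, "B": 1}
        return self.table[tuple(assignment[v] for v in self.scope)]
```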

13 The 4 students example  Let’s define the factor over A and B that captures the fact that Alice and Bob are more likely to agree than disagree: φ1(A,B): Val(A,B) → R+ The value associated with a particular assignment (a, b) denotes the affinity between the two values: the higher the value of φ1(a, b), the more compatible the two values are

14 The 4 students example  Fig 4.1/a shows one possible compatibility factor for A and B  Not normalised (see the partition function later for how to do this)  φ1(A,B): (a0, b0) → 30, (a0, b1) → 5, (a1, b0) → 1, (a1, b1) → 10 (0: right, 1: wrong/has the misconception)

15 The 4 students example  φ1(A,B) asserts that:  it is more likely that Alice and Bob agree: φ1(a0, b0) = 30 and φ1(a1, b1) = 10, so they are more likely to be either both right or both wrong  if they disagree, Alice is more likely to be right (φ1(a0, b1) = 5) than Bob (φ1(a1, b0) = 1) (0: right, 1: wrong/has the misconception)

16 The 4 students example  φ3(C,D) asserts that:  Charles and Debbie argue all the time and will end up disagreeing anyway: φ3(c0, d1) and φ3(c1, d0) are the high-affinity entries  φ3(C,D): (c0, d0) → 1, (c0, d1) → 100, (c1, d0) → 100, (c1, d1) → 1 (0: right, 1: wrong/has the misconception)
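With the hypothetical Factor class above, the two factors spelled out on these slides can be written down directly. φ2(B,C) and φ4(D,A) are not given in this transcript, so the entries below are placeholders chosen to reward agreement within each pair:

```python
# Factors from the slides (0 = right, 1 = wrong/has the misconception).
phi1 = Factor(("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
phi3 = Factor(("C", "D"), {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1})

# Not shown in the transcript: placeholder values rewarding agreement.
phi2 = Factor(("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})
phi4 = Factor(("D", "A"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})
```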

17 The 4 students example So far:  defined the local interactions between variables/nodes Next step:  define a global model: combine these local interactions by multiplying them, as with a Bayesian network

18 The 4 students example A possible GLOBAL MODEL: P(a,b,c,d) = φ1(a, b) · φ2(b, c) · φ3(c, d) · φ4(d, a) PROBLEM: Nothing guarantees that the result is a normalised distribution (see fig. 4.2, middle column)

19 The 4 students example SOLUTION Take the product of the local factors and normalise it: P(a,b,c,d) = 1/Z · φ1(a, b) · φ2(b, c) · φ3(c, d) · φ4(d, a) where Z = Σa,b,c,d φ1(a, b) · φ2(b, c) · φ3(c, d) · φ4(d, a) Z is a normalising constant known as the partition function: “partition” as in Markov random fields in statistical physics; “function” because Z is a function of the parameters [important for machine learning]

20 The 4 students example  See figure 4.2 for the calculations of the joint distribution  Exercise: calculate the unnormalised measure of the assignment a1, b1, c0, d1 (as in the sketch below)
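A brute-force sketch of this computation, building on the hypothetical Factor objects above: multiply the local factors to get the unnormalised measure, sum over all 2^4 assignments for Z, and divide to normalise. The helper names (unnormalised, P, assignments) are mine, not the book's:

```python
from itertools import product

factors = [phi1, phi2, phi3, phi4]

def unnormalised(assignment):
    # Product of all local factors at one joint assignment.
    result = 1.0
    for phi in factors:
        result *= phi(assignment)
    return result

# All 16 joint assignments of (A, B, C, D).
assignments = [dict(zip("ABCD", values)) for values in product((0, 1), repeat=4)]

# Partition function: sum of the unnormalised measure over all assignments.
Z = sum(unnormalised(x) for x in assignments)

def P(assignment):
    return unnormalised(assignment) / Z

# The assignment from slide 20: a1, b1, c0, d1.
print(unnormalised({"A": 1, "B": 1, "C": 0, "D": 1}), Z)
```

With the placeholder φ2 and φ4 above, the measure of (a1, b1, c0, d1) works out to 10 · 1 · 100 · 100 = 100000.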

21 The 4 students example  We can use the normalised joint distribution to answer questions like:  How likely is Bob to have the misconception?  How likely is Bob to have the misconception, given that Charles doesn’t?

22 The 4 students example  How likely is Bob to have the misconception? P(b1) ≈ 0.732 P(b0) ≈ 0.268 So Bob is almost three times as likely to have the misconception as not

23 The 4 students example  How likely is Bob to have the misconception, given that Charles doesn’t? P(b1 | c0) ≈ 0.06
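Both queries reduce to sums over the joint distribution; a sketch using the helpers defined above (the exact numbers depend on the placeholder entries chosen for φ2 and φ4):

```python
# Marginal query: P(B = 1), summing the joint over A, C, D.
p_b1 = sum(P(x) for x in assignments if x["B"] == 1)

# Conditional query: P(B = 1 | C = 0) = P(B = 1, C = 0) / P(C = 0).
p_b1_and_c0 = sum(P(x) for x in assignments if x["B"] == 1 and x["C"] == 0)
p_c0 = sum(P(x) for x in assignments if x["C"] == 0)
p_b1_given_c0 = p_b1_and_c0 / p_c0

print(p_b1, p_b1_given_c0)
```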

24 The 4 students example Advantages of this approach:  Allows great flexibility in representing interactions between variables.  We can change the nature of the interaction between A and B by simply modifying the entries in the factor, without worrying about normalisation constraints or interactions with other factors

25 The 4 students example  Tight connection between the factorisation of the distribution and its independence properties:  Factorisation: P(A,B,C,D) = 1/Z · φ1(A, B) · φ2(B, C) · φ3(C, D) · φ4(A, D)

26 The 4 students example  Using the factorisation above, we can decompose the distribution in several ways, e.g. P(A,B,C,D) = [1/Z · φ1(A, B) · φ2(B, C)] · φ3(C, D) · φ4(A, D) and infer that (B ⊥ D | A, C): the bracketed term is a function of (A, B, C) and the remainder is a function of (A, C, D), so B and D never appear in the same factor
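As a numerical sanity check, this inferred independence can be verified from the joint distribution computed earlier: for every (a, c), P(b, d | a, c) should equal P(b | a, c) · P(d | a, c). A sketch using the helpers from the earlier blocks:

```python
from itertools import product

# Check (B ⊥ D | A, C): P(b, d | a, c) == P(b | a, c) * P(d | a, c).
for a, c in product((0, 1), repeat=2):
    p_ac = sum(P(x) for x in assignments if x["A"] == a and x["C"] == c)
    for b, d in product((0, 1), repeat=2):
        p_bd_ac = sum(P(x) for x in assignments
                      if (x["A"], x["B"], x["C"], x["D"]) == (a, b, c, d)) / p_ac
        p_b_ac = sum(P(x) for x in assignments
                     if x["A"] == a and x["B"] == b and x["C"] == c) / p_ac
        p_d_ac = sum(P(x) for x in assignments
                     if x["A"] == a and x["C"] == c and x["D"] == d) / p_ac
        assert abs(p_bd_ac - p_b_ac * p_d_ac) < 1e-12
```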

