1 Probabilistic Graphical Models seminar 15/16 (0368-4511-01) Haim Kaplan Tel Aviv University

2 What is a Probabilistic Graphical Model (PGM)? A method to represent a joint probability distribution over a set of random variables X = (X_1,…,X_n)

3 Explicit Representation This is huge: a table with one column per variable X_1,…,X_n and a column P(X_1,…,X_n), one row per joint assignment (e.g. a row 1,0,1,1,1,0,0,0,1,… with probability 1/100). For n binary variables the table has 2^n rows.

4 PGM Use the special structure of the distribution to get a more compact representation. Two kinds: Bayesian networks (directed) and Markov networks (undirected)

5 A Bayesian Network (Chapter 3) (figure: an example network with its conditional probability tables)

6 A Markov Model (Chapter 4) (figure: an undirected graph over nodes A, B, C, D) Potential functions are defined over cliques.

7 A Markov Model (figure: the graph over A, B, C, D together with its clique potential tables; a minimal numerical sketch follows below)
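To make the clique-potential representation concrete, here is a minimal Python sketch, not taken from the slides: the graph (a 4-cycle over A, B, C, D), the potential tables, and all numbers are illustrative assumptions. The unnormalized score of an assignment is the product of its clique potentials, and dividing by the partition function Z turns scores into probabilities.

```python
from itertools import product

# Hypothetical pairwise potentials on the edges of the cycle A-B-C-D-A.
# Each potential maps a pair of binary values to a non-negative number.
phi_AB = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
phi_BC = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_CD = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
phi_DA = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def score(a, b, c, d):
    """Unnormalized score: product of the clique (edge) potentials."""
    return phi_AB[(a, b)] * phi_BC[(b, c)] * phi_CD[(c, d)] * phi_DA[(d, a)]

# Partition function Z: sum of scores over all 2^4 assignments.
Z = sum(score(a, b, c, d) for a, b, c, d in product([0, 1], repeat=4))

def prob(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = score / Z."""
    return score(a, b, c, d) / Z

print(prob(0, 0, 0, 0))   # e.g. the probability that all four variables are 0
```

On this toy model Z is a sum over only 2^4 assignments; in general computing Z exactly is the expensive part.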

8 Inference (chapters 9-13) Given a PGM we want to compute: – Conditional probability query: P(Y|E=e), the probability distribution over the values of Y given that E=e. – Maximum a posteriori (MAP): argmax_y P(y|E=e), the assignment y to Y that maximizes P(y|E=e). (Called Most Probable Explanation (MPE) when Y consists of all remaining variables.)
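As a reference point before the efficient algorithms, here is a brute-force sketch of both queries over an explicit joint table; the three-variable joint P(A, B, C) and its numbers are made up for illustration. A conditional query sums and renormalizes the entries consistent with the evidence, and a MAP query picks the most probable consistent assignment.

```python
from itertools import product

# Hypothetical explicit joint distribution P(A, B, C) over binary variables,
# stored as a dictionary from assignments to probabilities (sums to 1).
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}
names = ("A", "B", "C")

def conditional(query_var, evidence):
    """P(query_var | evidence), where evidence is a dict like {'C': 1}."""
    scores = {}
    for assignment, p in joint.items():
        values = dict(zip(names, assignment))
        if all(values[v] == val for v, val in evidence.items()):
            scores[values[query_var]] = scores.get(values[query_var], 0.0) + p
    z = sum(scores.values())                 # P(evidence)
    return {val: p / z for val, p in scores.items()}

def map_assignment(evidence):
    """Most probable full assignment among those consistent with the evidence."""
    best = max((a for a in joint
                if all(dict(zip(names, a))[v] == val for v, val in evidence.items())),
               key=lambda a: joint[a])
    return dict(zip(names, best))

print(conditional("A", {"C": 1}))   # P(A | C=1)
print(map_assignment({"C": 1}))     # most probable assignment with C=1
```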

9 Examples

10 Hidden Markov models for the parts-of-speech problem X_i's value is a part of speech (noun, verb, preposition, …) O_i's value is a word

11 Hidden Markov models for the parts-of-speech problem We observe the O_i's and would like to compute the assignment to the X_i's that maximizes P(X_1,…,X_n | O_1,…,O_n).
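One standard way to answer this MAP query on a chain-structured model is the Viterbi algorithm. The sketch below uses a toy tag set, a toy sentence, and made-up transition and emission probabilities (all of them are illustrative assumptions), and returns the most probable tag sequence.

```python
# Toy HMM for part-of-speech tagging; all probabilities are made up.
tags = ["NOUN", "VERB", "DET"]
start = {"NOUN": 0.3, "VERB": 0.1, "DET": 0.6}
trans = {  # P(next tag | current tag)
    "NOUN": {"NOUN": 0.2, "VERB": 0.7, "DET": 0.1},
    "VERB": {"NOUN": 0.3, "VERB": 0.1, "DET": 0.6},
    "DET":  {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
}
emit = {   # P(word | tag)
    "NOUN": {"dog": 0.5, "barks": 0.1, "the": 0.0},
    "VERB": {"dog": 0.1, "barks": 0.8, "the": 0.0},
    "DET":  {"dog": 0.0, "barks": 0.0, "the": 1.0},
}

def viterbi(words):
    """Most probable tag sequence for the observed words."""
    # delta[t][x] = best score of any tag sequence ending in tag x at position t
    delta = [{x: start[x] * emit[x].get(words[0], 1e-9) for x in tags}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for x in tags:
            prev, p = max(((y, delta[-1][y] * trans[y][x]) for y in tags),
                          key=lambda item: item[1])
            scores[x] = p * emit[x].get(w, 1e-9)
            pointers[x] = prev
        delta.append(scores)
        back.append(pointers)
    # Trace the best path backwards.
    best = max(tags, key=lambda x: delta[-1][x])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))   # expected: ['DET', 'NOUN', 'VERB']
```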

12 Phylogenetic trees

13 Pedigrees

14 Phenotype vs Genotype

15 Pedigrees (figure: a pedigree network over the father, mother, and offspring variables F_1, F_2, M_1, M_2, O_1, O_2)

16 More than one gene (figure: a copy of the pedigree network over F_1, F_2, M_1, M_2, O_1, O_2 for each locus) Haplotype variables form a hidden Markov model

17 More than one individual (figure: the pedigree networks for two loci extended with additional individuals A_1, A_2)

18 Computer vision applications – examples: image segmentation (2D, 3D), stereo depth estimation; there are many others…

19 Image segmentation Separate foreground (object) from background

20 Image segmentation Each pixel is a vertex A vertex connects to its neighbors

21 Image segmentation For every pixel X we have a factor Ф(X): we determine Ф(b) and Ф(f), the values for labeling X background or foreground.

22 Image segmentation For every pair of adjacent pixels X, Y we have a factor Ф(X,Y): we determine Ф(b,b), Ф(f,f), Ф(b,f), Ф(f,b).

23 Image segmentation We perform a MAP query: find the joint labeling of all pixels that maximizes the product of all the factors.
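Exact MAP on a pixel grid is expensive, so as an illustration here is a sketch of one simple approximate approach, iterated conditional modes (ICM), which is not the method prescribed by the slides. The tiny "image", the intensity-based unary factors, and the smoothness weight are all invented for the example; each pixel is repeatedly set to the label that maximizes the product of the factors touching it, given its neighbors' current labels.

```python
import numpy as np

# Tiny grayscale "image": bright pixels are likely foreground (label 1).
img = np.array([[0.9, 0.8, 0.2],
                [0.7, 0.6, 0.1],
                [0.2, 0.1, 0.0]])

def unary(value, label):
    """Ф(X): made-up per-pixel factor favoring foreground for bright pixels."""
    return value if label == 1 else 1.0 - value

def pairwise(l1, l2, w=2.0):
    """Ф(X,Y): made-up smoothness factor rewarding equal neighbor labels."""
    return w if l1 == l2 else 1.0

def icm(img, sweeps=5):
    """Approximate MAP labeling by iterated conditional modes."""
    h, w = img.shape
    labels = (img > 0.5).astype(int)          # initialize from the unaries
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                best_label, best_score = labels[i, j], -1.0
                for lab in (0, 1):
                    score = unary(img[i, j], lab)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            score *= pairwise(lab, labels[ni, nj])
                    if score > best_score:
                        best_label, best_score = lab, score
                labels[i, j] = best_label
    return labels

print(icm(img))   # 1 = foreground, 0 = background
```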

24 3D segmentation Consecutive video frames are adjacent layers in a 3D grid

25 3D segmentation

26 Stereo vision – computing depths (figure: the left and right cameras image real-world point 1 at coordinate x_1 in image 1 and y_1 in image 2; disparity 1 = x_1 - y_1)

27 Stereo vision – computing depths (figure: the same cameras image real-world point 2 at coordinates x_2 and y_2, giving disparity 2 = x_2 - y_2 in addition to disparity 1 = x_1 - y_1)

28 Stereo vision – computing depths Disparity is usually a small number of pixels We want to label each pixel with its disparity A multi-label problem

29 Stereo vision – computing depths Compute Ф_p(d) for each pixel p: how likely is p to have disparity d? Compute Ф_{p,q}(d_1,d_2) for each pair of adjacent pixels p, q: how likely are p and q to have disparities d_1 and d_2, respectively? (figure: the pixel at x in one image is compared with the pixel at x+d in the other)
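As a concrete illustration of the unary term only, the sketch below builds Ф_p(d) for a single scan line by comparing the left-image pixel at p with the right-image pixel at p - d (the scan-line intensities, the disparity range, the sign convention, and the Gaussian-style scoring are all assumptions made for the example). The pairwise term Ф_{p,q} would then be chosen to favor similar disparities at adjacent pixels.

```python
import numpy as np

# Made-up intensity values along one scan line of the left and right images.
left  = np.array([0.1, 0.2, 0.9, 0.8, 0.3, 0.2, 0.1])
right = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1])
max_disparity = 3

def unary_factors(left, right, max_disparity, sigma=0.1):
    """Ф_p(d): how well pixel p of the left line matches pixel p - d of the
    right line, scored with a made-up Gaussian of the intensity difference."""
    n = len(left)
    phi = np.zeros((n, max_disparity + 1))
    for p in range(n):
        for d in range(max_disparity + 1):
            q = p - d
            if 0 <= q < n:
                diff = left[p] - right[q]
                phi[p, d] = np.exp(-(diff ** 2) / (2 * sigma ** 2))
    return phi

phi = unary_factors(left, right, max_disparity)
print(phi.argmax(axis=1))   # per-pixel best disparity, ignoring smoothness
```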

30 Interactions between proteins (figure: interaction variables I(p_1,p_2), I(p_1,p_3), I(p_2,p_3) connected to localization variables L(p_1,S), L(p_2,S), L(p_3,S))

31 Interactions between proteins (figure: the same network with additional localization variables L(p_1,A), L(p_2,A), L(p_3,A))

32 Interactions between proteins (figure: the network further extended with experiment variables Ex_1(p_2,p_3), Ex_2(p_2,p_3), Ex_2(p_1,p_3))

33 Inference (chapters 9-13) Given a PGM we want to compute: – Conditional probability query: P(Y|E=e), the probability distribution over the values of Y given that E=e. – Maximum a posteriori (MAP): argmax_y P(y|E=e), the assignment y to Y that maximizes P(y|E=e). (Called Most Probable Explanation (MPE) when Y consists of all remaining variables.)

34 Complexity (chapter 9) These problems are NP-hard…sometimes even to approximate

35 Exact Solutions An example:

36 Exact Solutions – variable elimination Compute P(X_1)?

37 Exact Solutions – variable elimination

38 Explicit Computation Sum all rows with X_1=1 and all rows with X_1=0 (table: one column per variable X_1,…,X_n and a column Ф(X_1,…,X_n), one row per assignment)

39 Variable elimination Summing X_6 out of the factor Ф(X_2,X_5,X_6) yields the smaller intermediate factor m(X_2,X_5).

40 Variable elimination


42 If the variables are binary, the temporary tables are of size 2^3, while the global table is of size 2^6
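A sketch of a single elimination step, assuming factors are stored as numpy arrays indexed by their variables (the factor and its values are invented): summing a variable out of Ф(X_2,X_5,X_6) produces the intermediate table m(X_2,X_5), which for binary variables has 2^2 entries instead of 2^3.

```python
import numpy as np

# A made-up factor Ф(X2, X5, X6) over three binary variables,
# stored as an array indexed by (x2, x5, x6).
phi = np.arange(1.0, 9.0).reshape(2, 2, 2)

# Eliminate X6: sum over its axis to get the intermediate factor m(X2, X5).
m = phi.sum(axis=2)

print(phi.shape)   # (2, 2, 2) -> 2^3 entries
print(m.shape)     # (2, 2)    -> 2^2 entries
print(m)           # m[x2, x5] = sum over x6 of phi[x2, x5, x6]
```

When the eliminated variable appears in several factors, those factors are first multiplied together (with their axes aligned) and only then summed, which is where large intermediate tables can arise.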

43 Elimination order There are many possible elimination orders Which one do we want to pick ? When we eliminate a variable X, how large is the intermediate table that we create ?

44 Elimination order (figure: a graph with a variable X that is about to be eliminated)

45 The number of variables that appear in factors together with X equals the number of neighbors of X in the graph

46 Elimination order To make the graph reflect the factors that exist after the elimination of X, we connect the neighbors of X into a clique. We want an elimination order that does not generate large cliques. Finding the optimal order is NP-hard, but there are good heuristics (a sketch of one appears below).
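One common heuristic of this kind is min-fill: repeatedly eliminate the variable whose elimination would add the fewest new edges when its neighbors are turned into a clique. The sketch below is only an illustration; the adjacency-dictionary representation and the example graph are assumptions.

```python
def min_fill_order(adj):
    """Greedy min-fill elimination order for an undirected graph.
    adj: dict mapping each vertex to the set of its neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        # Count the edges that eliminating v would add among its neighbors.
        def fill(v):
            ns = list(adj[v])
            return sum(1 for i in range(len(ns)) for j in range(i + 1, len(ns))
                       if ns[j] not in adj[ns[i]])
        v = min(adj, key=fill)
        # Connect v's neighbors into a clique, then remove v from the graph.
        for a in adj[v]:
            for b in adj[v]:
                if a != b:
                    adj[a].add(b)
        for a in adj[v]:
            adj[a].discard(v)
        order.append(v)
        del adj[v]
    return order

# Made-up example graph: a 4-cycle with a pendant vertex.
graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"A", "C", "E"}, "E": {"D"}}
print(min_fill_order(graph))
```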

47 Suppose the graph is a tree (figure: a tree over X_1,…,X_7)

48 (figure: the tree after X_7 has been eliminated; X_1,…,X_6 remain)

49 (figure: the tree after X_5 has also been eliminated; X_1, X_2, X_3, X_4, X_6 remain)

50 (figure: after X_6 is eliminated as well, only X_1, X_2, X_3, X_4 remain) On a tree we can always eliminate a leaf, so no new edges are created and the intermediate tables stay small.

51 Computing many marginals? P(X_1), P(X_2), P(X_3), … We want to recycle parts of the computation.

52 Sum-product algorithm (figure: the tree over X_1,…,X_7 with messages such as m_75(X_5), m_57(X_7), m_63(X_3), m_34(X_4), m_12(X_2) passed along its edges)
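A minimal sketch of the idea on a chain X_1 – X_2 – X_3 – X_4 (the pairwise potentials below are invented): the forward and backward messages are each computed once and then reused, so all single-variable marginals come out of one pass in each direction instead of a separate elimination per marginal.

```python
import numpy as np

# Made-up pairwise potentials psi[i] on the edge (X_{i+1}, X_{i+2});
# all four variables are binary.
psi = [np.array([[4.0, 1.0], [1.0, 2.0]]),
       np.array([[3.0, 1.0], [1.0, 3.0]]),
       np.array([[4.0, 1.0], [1.0, 4.0]])]

# Forward messages: fwd[i] has summed out X_1,...,X_i, leaving X_{i+1} free.
fwd = [np.ones(2)]
for p in psi:
    fwd.append(fwd[-1] @ p)

# Backward messages: bwd[i] has summed out X_{i+2},...,X_4, leaving X_{i+1} free.
bwd = [np.ones(2)]
for p in reversed(psi):
    bwd.append(p @ bwd[-1])
bwd.reverse()

# Each marginal is the (normalized) product of the two messages meeting there.
for i in range(4):
    unnormalized = fwd[i] * bwd[i]
    print(f"P(X_{i+1}) =", unnormalized / unnormalized.sum())
```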

53 Generalizations Junction tree algorithm Belief propagation

54 Sampling methods (chapter 12) Sample from the distribution. Estimate P(y|E=e) by the fraction of samples (among those for which E=e) in which Y=y. How do we sample efficiently?

55 Sampling methods (chapter 12) Use Markov chains (with stationary distribution P(Y|E=e)): the Gibbs chain, Metropolis-Hastings. We discuss issues like the mixing time of the chain.
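A minimal Gibbs-chain sketch (the small pairwise chain, its potentials, and the evidence are all invented for illustration): each step resamples one non-evidence variable from its conditional distribution given the current values of its neighbors, and after a burn-in period the fraction of samples with Y=y estimates P(y|E=e).

```python
import random

# Made-up pairwise Markov chain X1 - X2 - X3 over binary variables.
psi = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
edges = [("X1", "X2"), ("X2", "X3")]
evidence = {"X3": 1}                      # E = e
variables = ["X1", "X2", "X3"]

def gibbs_estimate(query_var, query_val, steps=20000, burn_in=2000):
    state = {v: evidence.get(v, random.randint(0, 1)) for v in variables}
    hits = total = 0
    for t in range(steps):
        for v in variables:
            if v in evidence:
                continue
            # Unnormalized score of each value, given the neighbors' values.
            scores = []
            for val in (0, 1):
                s = 1.0
                for a, b in edges:
                    if v == a:
                        s *= psi[(val, state[b])]
                    elif v == b:
                        s *= psi[(state[a], val)]
                scores.append(s)
            # Sample v from its conditional distribution.
            p1 = scores[1] / (scores[0] + scores[1])
            state[v] = 1 if random.random() < p1 else 0
        if t >= burn_in:
            total += 1
            hits += (state[query_var] == query_val)
    return hits / total

print(gibbs_estimate("X1", 1))   # estimate of P(X1=1 | X3=1)
```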

56 Learning (Chapters 17-20) Find the PGM, given samples d_1,d_2,…,d_m from the PGM. There are two levels of difficulty here: – The graph structure is known, and we just estimate the factors – We need to estimate the graph structure as well Sometimes values of variables in a sample are missing

57 Learning -- techniques Maximum likelihood estimation – Find the factors that maximize the probability of sampling d_1,d_2,…,d_m – The problem usually decomposes for Bayesian networks; it is harder for Markov networks Bayesian estimation: assume some prior on the model
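A sketch of how maximum likelihood decomposes when the structure is known, using a made-up two-variable Bayesian network A → B and made-up samples: each conditional probability table is estimated separately from the relevant counts in the data.

```python
from collections import Counter

# Made-up samples d_1,...,d_m from a two-variable network A -> B.
samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (0, 0)]

# MLE for P(A): just the empirical frequencies of A.
count_a = Counter(a for a, _ in samples)
p_a = {a: count_a[a] / len(samples) for a in (0, 1)}

# MLE for P(B | A): frequencies of B within each value of A,
# estimated independently for every parent configuration.
count_ab = Counter(samples)
p_b_given_a = {a: {b: count_ab[(a, b)] / count_a[a] for b in (0, 1)}
               for a in (0, 1)}

print("P(A) =", p_a)
print("P(B | A) =", p_b_given_a)
```

For Markov networks the partition function couples all the factors, so the likelihood does not split into independent pieces like this and typically requires iterative optimization.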

