Slide 1: Ch 8. Graphical Models
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by B.-H. Kim, Biointelligence Laboratory, Seoul National University (http://bi.snu.ac.kr/)

Slide 2: Contents
8.3 Markov Random Fields
- 8.3.1 Conditional independence properties
- 8.3.2 Factorization properties
- 8.3.3 Illustration: Image de-noising
- 8.3.4 Relation to directed graphs
8.4 Inference in Graphical Models
- 8.4.1 Inference on a chain
- 8.4.2 Trees
- 8.4.3 Factor graphs
- 8.4.4 The sum-product algorithm
- 8.4.5 The max-sum algorithm
- 8.4.6 Exact inference in general graphs
- 8.4.7 Loopy belief propagation
- 8.4.8 Learning the graph structure

Slide 3: Directed graphs vs. undirected graphs
Both are graphical models, and each:
- specifies a factorization (how the joint distribution is expressed);
- defines a set of conditional independence properties.
Directed graphs build the factorization from parent-child links and local conditional distributions; undirected graphs build it from maximal cliques and potential functions.
Chain graphs: graphs that include both directed and undirected links.

Slide 4: 8.3.1 Conditional independence properties
In directed graphs:
- the 'd-separation' test: check whether all paths connecting two sets of nodes are 'blocked';
- subtle case: 'head-to-head' nodes.
In undirected graphs:
- simple graph separation (simpler than in directed graphs);
- check whether all paths between A and B are blocked by C: after removing the nodes in C, is there any path remaining?
- the Markov blanket of a node in an undirected graph is simply its set of neighbours.
(Shaded circles denote evidence, i.e. observed variables.)

Slide 5: 8.3.2 Factorization properties
A clique is a subset of the nodes in a graph such that there exists a link between all pairs of nodes in the subset; a maximal clique is one to which no further node can be added.
Functions of the maximal cliques become the factors in the decomposition of the joint distribution: the potential functions, normalized by the partition function (normalization constant); see below.
Potential functions are not restricted to marginal or conditional distributions.
The partition function is a major limitation of undirected graphs, but it can be side-stepped when we only need local conditional distributions, where it cancels.
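Written out (PRML eqs. 8.39-8.40), for maximal cliques C with potential functions psi_C:

$$
p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C), \qquad
Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C).
$$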

Slide 6: 8.3.2 Factorization properties (cont'd)
Consider the formal connection between conditional independence and factorization, with the restriction that the potential functions be strictly positive.
Hammersley-Clifford theorem: the set of distributions consistent with the conditional independence properties read from the graph (the graphical model as a filter) and the set consistent with the factorization over maximal cliques are identical.
Since the potentials are strictly positive, each can be expressed in exponential form via an energy function, giving a Boltzmann distribution.
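Written out (PRML eq. 8.41), with E(x_C) the energy function and the exponential representation the Boltzmann distribution:

$$
\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}.
$$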

Slide 7: 8.3.3 Illustration: Image de-noising (1)
Setting:
- image as a set of binary pixel values in {-1, +1};
- y_i: pixel i in the observed noisy image;
- x_i: pixel i in the unknown noise-free image;
- noise: randomly flipping the sign of pixels with some small probability.
Goal: recover the original noise-free image.
[Figure: the original binary image and a noisy version with 10% of the pixels flipped]

Slide 8: 8.3.3 Illustration: Image de-noising (2)
Prior knowledge (when the noise level is small):
- strong correlation between x_i and y_i;
- strong correlation between neighbouring pixels x_i and x_j.
Corresponding Markov random field (an Ising model):
- a simple energy function for the {x_i, y_i} cliques: -eta * x_i * y_i;
- for the neighbouring-pixel cliques {x_i, x_j}: -beta * x_i * x_j;
- a bias term (preference for one particular sign): h * x_i.
The complete energy function for the model and the joint distribution follow below.
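Written out (PRML eqs. 8.42-8.43), where the sum over {i, j} runs over pairs of neighbouring pixels:

$$
E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i \;-\; \beta \sum_{\{i,j\}} x_i x_j \;-\; \eta \sum_i x_i y_i,
\qquad
p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \exp\{-E(\mathbf{x}, \mathbf{y})\}.
$$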

Slide 9: 8.3.3 Illustration: Image de-noising (3)
Image restoration results:
- Iterated conditional modes (ICM), a coordinate-wise gradient ascent:
  - initialization: x_i = y_i for all i;
  - take one node at a time, evaluate the total energy for each of its states, and change the state if that results in lower energy;
  - repeat until some stopping criterion is satisfied.
- Graph-cut algorithm: guaranteed to find the global maximum for this Ising-type model.
[Figure: original image, 10% noise, restoration by ICM, restoration by graph-cut]
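A minimal sketch of ICM for this model, assuming the image is a 2-D NumPy array with entries in {-1, +1}; the function name and interface are ours, and the default parameters (beta = 1.0, eta = 2.1, h = 0) follow the values reported for the book's example:

```python
def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, max_sweeps=10):
    """Iterated conditional modes for the binary de-noising MRF.

    y: observed noisy image, a 2-D array with entries in {-1, +1}.
    Returns the restored image x (a local minimum of the energy E(x, y)).
    """
    x = y.copy()                      # initialization: x_i = y_i for all i
    rows, cols = x.shape

    def local_energy(s, i, j):
        # terms of E(x, y) that involve pixel (i, j) when it is in state s
        nb = sum(x[a, b]
                 for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                 if 0 <= a < rows and 0 <= b < cols)
        return h * s - beta * s * nb - eta * s * y[i, j]

    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # flip pixel (i, j) only if doing so lowers the total energy
                if local_energy(-x[i, j], i, j) < local_energy(x[i, j], i, j):
                    x[i, j] = -x[i, j]
                    changed = True
        if not changed:               # a full sweep with no flips: local optimum
            break
    return x
```

Because every accepted flip lowers the energy, the procedure terminates at a local minimum, which is why ICM's restoration is worse than the global optimum found by graph-cut.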

Slide 10: 8.3.4 Relation to directed graphs (1)
Converting a directed graph to an undirected graph:
- Case 1: a directed chain. Simply identify each conditional distribution with a potential, e.g. psi_{n-1,n}(x_{n-1}, x_n) = p(x_n | x_{n-1}); in this case the partition function Z = 1.

Slide 11: 8.3.4 Relation to directed graphs (2)
Converting a directed graph to an undirected graph:
- Case 2: the general case requires moralization ('marrying the parents'):
  - add undirected links between all pairs of parents of each node;
  - drop the arrows, resulting in the moral graph.
- A fully connected moral graph exhibits no conditional independence properties, in contrast to the original directed graph, so we should add the fewest extra links needed, retaining the maximum number of independence properties.
Usage example: exact inference algorithms, e.g. the junction tree algorithm.

Slide 12: 8.3.4 Relation to directed graphs (3)
Directed and undirected graphs can express different conditional independence properties.
Specific view: the graphical model as a filter (a 'map') on distributions:
- D map (dependency map): every conditional independence statement satisfied by the distribution is reflected in the graph; e.g. a completely disconnected graph is a trivial D map for any distribution.
- I map (independence map): every conditional independence statement implied by the graph is satisfied by the distribution; e.g. a fully connected graph is a trivial I map for any distribution.
- Perfect map: both an I map and a D map.

Slide 13: 8.3.4 Relation to directed graphs (4)
D: the set of distributions that can be represented as a perfect map using a directed graph.
U: the corresponding set using an undirected graph.
(The two sets partially overlap; neither contains the other.)

Slide 14: Contents (verbatim repeat of slide 2, shown as a divider before 8.4 Inference in Graphical Models)

Slide 15: Introduction / Guidelines
Inference in graphical models:
- given evidence (some nodes are clamped to observed values),
- we wish to compute the posterior distributions of other nodes.
Inference algorithms on graphical structures; main idea: propagation of local messages.
- Exact inference (section 8.4): sum-product algorithm, max-product algorithm, junction tree algorithm.
- Approximate inference (chapters 10, 11): loopy belief propagation with a message-passing schedule (8.4.7); variational methods; sampling methods (Monte Carlo methods).
[Figure: an example graph over nodes A, B, C, D, E and the cliques ABD, BCD, CDE]

Slide 16: Graphical interpretation of Bayes' theorem
Given structure: a two-node graph x -> y.
We observe the value of y.
Goal: infer the posterior distribution over x.
- The marginal distribution p(x) is a prior over the latent variable x.
- We can evaluate the marginal distribution p(y).
- By Bayes' theorem we can then calculate p(x | y).
[Figure panels (a)-(c): the graph x -> y; the same graph with y observed (shaded); the graph with the arrow reversed, corresponding to p(x | y)]
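Written out, the three quantities on this slide:

$$
p(x, y) = p(x)\, p(y \mid x), \qquad
p(y) = \sum_{x'} p(y \mid x')\, p(x'), \qquad
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.
$$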

Slide 17: 8.4.1 Inference on a chain (1)
Specific setting:
- N nodes, each a discrete variable with K states;
- => each potential function is a K-by-K table, giving (N-1)K^2 parameters in total.
Problem: inferring the marginal distribution p(x_n).
- Naive implementation: first evaluate the joint distribution and then perform the summations explicitly => K^N values for x, exponential growth with N.
- Efficient algorithm: exploit the conditional independence structure; each summation effectively removes a variable from the distribution.
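The chain joint being marginalized (PRML eq. 8.49):

$$
p(\mathbf{x}) = \frac{1}{Z}\,\psi_{1,2}(x_1, x_2)\,\psi_{2,3}(x_2, x_3)\cdots\psi_{N-1,N}(x_{N-1}, x_N).
$$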

Slide 18: 8.4.1 Inference on a chain (2)
The desired marginal is expressed as the rearranged sum written out below.
Key concept of the underlying idea: multiplication is distributive over addition, e.g. ab + ac (3 operations) equals a(b + c) (2 operations).
The computational cost is linear in the length of the chain.
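Rearranging the sums so that each runs over a single variable (PRML eq. 8.52):

$$
p(x_n) = \frac{1}{Z}
\Bigl[\sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \Bigl[\sum_{x_1} \psi_{1,2}(x_1, x_2)\Bigr] \cdots\Bigr]
\Bigl[\sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \Bigl[\sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)\Bigr] \cdots\Bigr].
$$

The first bracket is the forward message mu_alpha(x_n), the second the backward message mu_beta(x_n); the total cost is O(N K^2) rather than O(K^N).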

Slide 19: 8.4.1 Inference on a chain (3)
Powerful interpretation of (8.52): the passing of local messages around on the graph.
- mu_alpha(x_n): a message passed forwards along the chain;
- mu_beta(x_n): a message passed backwards along the chain.
Both messages can be evaluated recursively, as written out below.
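The recursions and the resulting marginal (PRML eqs. 8.54-8.57):

$$
\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n)\, \mu_\alpha(x_{n-1}), \qquad
\mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1})\, \mu_\beta(x_{n+1}),
$$
$$
p(x_n) = \frac{1}{Z}\, \mu_\alpha(x_n)\, \mu_\beta(x_n).
$$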

Slide 20: 8.4.1 Inference on a chain (4)
Evaluating the marginals for every node in the chain:
- computing each marginal one by one, separately, would be wasteful, with much duplicated work;
- instead, store all of the intermediate messages along the way: one forward and one backward pass give every marginal (see the sketch below).
If some of the nodes in the graph are observed, the corresponding variables are clamped to their observed values, so no summation is performed over them; the joint distribution is simply multiplied by an indicator I(x_n, x̂_n) for each observed node.
The joint distribution of two neighbouring nodes is obtained the same way: p(x_{n-1}, x_n) = (1/Z) mu_alpha(x_{n-1}) psi_{n-1,n}(x_{n-1}, x_n) mu_beta(x_n).
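A minimal sketch of this two-pass procedure in Python/NumPy; the function name and interface are ours, assuming the chain potentials are supplied as a list of K-by-K tables:

```python
import numpy as np

def chain_marginals(psi):
    """Marginals p(x_n) of a chain MRF p(x) ~ prod_n psi[n][x_n, x_{n+1}],
    computed with one forward and one backward pass of messages.

    psi: list of N-1 potential tables, each a (K, K) NumPy array.
    Returns a list of N length-K vectors, the normalized marginals.
    """
    N = len(psi) + 1
    K = psi[0].shape[0]
    alpha = [np.ones(K) for _ in range(N)]   # forward messages mu_alpha
    beta = [np.ones(K) for _ in range(N)]    # backward messages mu_beta
    for n in range(1, N):                    # forward pass
        alpha[n] = psi[n - 1].T @ alpha[n - 1]
    for n in range(N - 2, -1, -1):           # backward pass
        beta[n] = psi[n] @ beta[n + 1]
    Z = float(alpha[-1] @ beta[-1])          # partition function, same at every node
    return [alpha[n] * beta[n] / Z for n in range(N)]
```

Every marginal then comes from the two stored messages at that node, matching the O(N K^2) total cost.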

Slide 21: 8.4.2 Trees
Efficient exact inference using local message passing:
- in the case of a chain: linear time in the number of nodes;
- in the more general case of trees: the sum-product algorithm.
A tree in an undirected graph: there is one, and only one, path between any pair of nodes.
A tree in a directed graph: a single root node with no parents; all other nodes have exactly one parent. Conversion to an undirected graph yields an undirected tree, with no extra links added during the moralization step.
Polytree: a directed graph in which some nodes have more than one parent, but there is still only one path between any two nodes.

Slide 22: 8.4.3 Factor graphs (1)
Factor graphs:
- introduce additional nodes for the factors themselves,
- making the decomposition/factorization explicit.
The joint distribution is written as a product of factors over subsets of variables (see below); the factors arise from the conditional distributions of a directed graph or the clique potentials of an undirected graph.
Factor graphs are bipartite: links only connect variable nodes to factor nodes.
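The factorization a factor graph encodes (PRML eq. 8.59), where x_s denotes the subset of variables on which factor f_s depends:

$$
p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s).
$$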

Slide 23: 8.4.3 Factor graphs (2)
Conversion:
- an undirected graph => factor graph: one variable node per variable, one factor node per maximal-clique potential;
- a directed graph => factor graph: one factor node per conditional distribution.
There can be multiple factor graphs, all of which correspond to the same undirected/directed graph.

Slide 24: 8.4.3 Factor graphs (3)
Converting a directed or undirected tree to a factor graph: the result is again a tree (no loops; one and only one path connecting any two nodes).
In the case of a directed polytree:
- conversion to an undirected graph introduces loops due to the moralization step,
- whereas conversion to a factor graph avoids them.

Slide 25: 8.4.3 Factor graphs (4)
Local cycles in a directed graph can be removed on conversion to a factor graph.
Factor graphs are more specific about the precise form of the factorization; they carry no corresponding conditional independence properties.

Slide 26: 8.4.4 The sum-product algorithm (0)
The sum-product algorithm:
- allows us to take a joint distribution p(x) expressed as a factor graph and efficiently find marginals over the component variables;
- an exact inference algorithm applicable to tree-structured graphs.
The max-sum algorithm: a related technique for finding the most probable state (8.4.5).

Slide 27: 8.4.4 The sum-product algorithm (1)
Basic setting:
- suppose that all of the variables are discrete, so marginalization corresponds to performing sums (the framework is equally applicable to linear-Gaussian models);
- the original graph is an undirected tree, a directed tree, or a polytree => the corresponding factor graph has a tree structure.
Goal: exact inference for finding marginals.

Slide 28: 8.4.4 The sum-product algorithm (2)
View the variable x as the root; the factorization then decomposes over the subtrees attached to x.
Two distinct kinds of message:
- from factor nodes to variable nodes: mu_{f->x}(x);
- from variable nodes to factor nodes: mu_{x->f}(x).
The marginal at x is the product of all incoming factor-to-variable messages; see the definitions below.
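The message definitions (PRML eqs. 8.66 and 8.69) and the resulting marginal (eq. 8.63), where ne(.) denotes the neighbours of a node:

$$
\mu_{f \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m),
$$
$$
\mu_{x \to f}(x) = \prod_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x), \qquad
p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x).
$$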

Slide 29: 8.4.4 The sum-product algorithm (3)
Messages are computed recursively. There are two base cases at leaf nodes:
- a leaf variable node sends mu_{x->f}(x) = 1;
- a leaf factor node sends mu_{f->x}(x) = f(x).
Each node can send a message towards the root once it has received messages from all of its other neighbours.
Once the root node has received messages from all of its neighbours, the required marginal can be evaluated.

Slide 30: 8.4.4 The sum-product algorithm (4)
To find the marginals for every variable node in the graph:
- running the algorithm separately for each node would be wasteful;
- an efficient procedure 'overlays' multiple message passings:
  - Step 1: arbitrarily pick any node and designate it as the root.
  - Step 2: propagate messages from the leaves to the root.
  - Step 3: the root, having now received messages from all of its neighbours, sends messages back outwards all the way to the leaves.
By then a message has passed in both directions across every link, and every node has received a message from all of its neighbours, so we can readily calculate the marginal distribution for every variable in the graph.

Slide 31: 8.4.4 The sum-product algorithm (5)
Issue of normalization:
- if the factor graph was derived from a directed graph, the joint distribution is already correctly normalized;
- if it was derived from an undirected graph, there is an unknown normalization coefficient 1/Z: first run the sum-product algorithm to find the unnormalized marginals, then obtain Z by normalizing any one of them.

Slide 32: 8.4.4 The sum-product algorithm (6-1)
A simple example to illustrate the operation of the sum-product algorithm: four variables x_1, ..., x_4 with joint p(x) proportional to f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_2, x_4).
Designate node x_3 as the root; the leaf nodes are then x_1 and x_4.

Slide 33: 8.4.4 The sum-product algorithm (6-2)
The example continued: messages are first propagated from the leaves to the root, then from the root back to the leaves; the leaf-to-root sequence is written out below.
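From the leaves to the root (following PRML eqs. 8.74 onward for this example):

$$
\mu_{x_1 \to f_a}(x_1) = 1, \quad
\mu_{f_a \to x_2}(x_2) = \sum_{x_1} f_a(x_1, x_2), \quad
\mu_{x_4 \to f_c}(x_4) = 1, \quad
\mu_{f_c \to x_2}(x_2) = \sum_{x_4} f_c(x_2, x_4),
$$
$$
\mu_{x_2 \to f_b}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2), \qquad
\mu_{f_b \to x_3}(x_3) = \sum_{x_2} f_b(x_2, x_3)\, \mu_{x_2 \to f_b}(x_2),
$$

and analogously from the root back to the leaves, starting with mu_{x_3 -> f_b}(x_3) = 1.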

Slide 34: 8.4.4 The sum-product algorithm (6-3)
The example concluded: multiplying the incoming messages at any variable node reproduces its (unnormalized) marginal.
The sum-product algorithm applied to a graph of linear-Gaussian variables => linear dynamical systems (LDS) in chapter 13.

Slide 35: 8.4.5 The max-sum algorithm (1)
Goal of the algorithm:
- find a setting of the variables that has the largest probability, and
- find the value of that probability.
It is an application of dynamic programming in the context of graphical models.
Problem description: maximize the joint distribution over all variables simultaneously; exchanging the max and product operators results in a much more efficient computation.

Slide 36: 8.4.5 The max-sum algorithm (2)
In practice, to prevent numerical underflow in products of small probabilities, we take logarithms:
- the logarithm is a monotonic function, so maximization commutes with it;
- the distributive property is preserved: max(a + b, a + c) = a + max(b, c).
This turns max-product into the max-sum algorithm.
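The resulting messages (following PRML eqs. 8.93-8.94) replace the sums of sum-product by maxima and the products by sums of logs:

$$
\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \Bigl[\ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m)\Bigr],
\qquad
\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x).
$$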

Slide 37: 8.4.5 The max-sum algorithm (3)
Finding the configuration of the variables for which the joint distribution attains its maximum value requires a rather different kind of message passing: keeping track of which values of the variables gave rise to the maximum state of each variable.
For each state of a given variable, there is a unique state of the previous variable that maximizes the probability (indicated by the lines connecting the nodes in the trellis figure); by back-tracking along these we can build a globally consistent maximizing configuration.
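A minimal sketch of max-sum with back-tracking on a chain (the Viterbi recursion), assuming log-space pairwise potentials; the function name and interface are ours:

```python
import numpy as np

def max_sum_chain(log_psi):
    """Most probable configuration of a chain model via max-sum.

    log_psi: list of N-1 arrays; log_psi[n][i, j] = ln psi(x_n = i, x_{n+1} = j).
    Returns (config, score): the maximizing states and the maximal
    unnormalized log-probability.
    """
    N = len(log_psi) + 1
    msg = np.zeros(log_psi[0].shape[0])      # message at the first node (ln 1 = 0)
    back = []                                # back-pointers phi(x_{n+1})
    for n in range(N - 1):
        scores = msg[:, None] + log_psi[n]   # scores[i, j] over (x_n, x_{n+1})
        back.append(scores.argmax(axis=0))   # best x_n for each state of x_{n+1}
        msg = scores.max(axis=0)             # new message at x_{n+1}
    config = [int(msg.argmax())]             # maximizing state of the last node
    for n in range(N - 2, -1, -1):           # back-track to the first node
        config.append(int(back[n][config[-1]]))
    config.reverse()
    return config, float(msg.max())
```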

Slide 38: 8.4.5 The max-sum algorithm (4)
The max-sum algorithm, with back-tracking, gives an exact maximizing configuration for the variables provided the factor graph is a tree.
- Important application: the Viterbi algorithm for HMMs (ch. 13).
For many practical applications we have to deal with graphs having loops; the generalization of the message-passing framework to arbitrary graph topologies is the junction tree algorithm.

Slide 39: 8.4.6 Exact inference in general graphs
Junction tree algorithm (refer to the textbook for the full construction):
- at its heart is the simple idea, used already, of exploiting the factorization properties of the distribution to allow sums and products to be interchanged,
- so that partial summations can be performed, avoiding having to work directly with the joint distribution.

Slide 40: 8.4.7 Loopy belief propagation
For many problems of practical interest, we use approximation methods:
- variational methods => ch. 10;
- sampling methods, also called Monte Carlo methods => ch. 11.
One simple approach to approximate inference in graphs with loops: simply apply the sum-product algorithm anyway, even though there is no guarantee that it will yield good results — loopy belief propagation.
We then need to define a message-passing schedule: flooding schedule, serial schedules, pending messages.

Slide 41: 8.4.8 Learning the graph structure
Learning the graph structure itself from data requires:
- a space of possible structures, and
- a measure that can be used to score each structure.
From a Bayesian viewpoint, we would like to compute a posterior distribution over graph structures given the data, giving a score for each model; see below.
Tough points:
- the marginalization over latent variables needed for the score is a challenging computational problem;
- exploring the space of structures is also problematic: the number of different graph structures grows exponentially with the number of nodes, so we usually resort to heuristics.
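The Bayesian score referred to above: for a graph structure m with prior p(m) and marginal likelihood p(D | m),

$$
p(m \mid \mathcal{D}) \propto p(m)\, p(\mathcal{D} \mid m).
$$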

