Readings: K&F: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7. Markov networks, factor graphs, and a unified view. Start approximate inference (if we are lucky…).

Presentation transcript:

Markov networks, factor graphs, and a unified view
Start approximate inference (if we are lucky…)
Readings: K&F: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7
Graphical Models – 10708, Carlos Guestrin, Carnegie Mellon University, October 27th, 2006

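Factorization in Markov networks
- Given an undirected graph $H$ over variables $\mathbf{X} = \{X_1, \dots, X_n\}$.
- A distribution $P$ factorizes over $H$ if there exist
  - subsets of variables $D_1 \subseteq \mathbf{X}, \dots, D_m \subseteq \mathbf{X}$, such that each $D_i$ is fully connected in $H$, and
  - non-negative potentials (or factors) $\phi_1(D_1), \dots, \phi_m(D_m)$, also known as clique potentials,
  such that
  $$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \phi_i(D_i), \qquad Z = \sum_{x_1, \dots, x_n} \prod_{i=1}^{m} \phi_i(D_i).$$
- $P$ is then also called a Markov random field over $H$, or a Gibbs distribution over $H$.
10-708 – Carlos Guestrin 2006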
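
To make the factorization concrete, here is a minimal brute-force sketch. It assumes binary variables and a hypothetical chain A - B - C with two made-up "agreement" potentials; the function name `gibbs_distribution` and the table layout are illustrative choices, not something from the slides.

```python
import itertools

def gibbs_distribution(variables, factors):
    """Brute-force P(x) = (1/Z) * prod_i phi_i(D_i) over binary variables.

    `factors` is a list of (scope, table) pairs; `table` maps a tuple of
    values for `scope` to a non-negative potential value.
    """
    unnormalized = {}
    for values in itertools.product([0, 1], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = 1.0
        for scope, table in factors:
            score *= table[tuple(assignment[v] for v in scope)]
        unnormalized[values] = score
    Z = sum(unnormalized.values())  # partition function
    return {x: s / Z for x, s in unnormalized.items()}, Z

# Hypothetical chain A - B - C: neighbors prefer to take the same value.
agree = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}
P, Z = gibbs_distribution(
    ["A", "B", "C"],
    [(("A", "B"), agree), (("B", "C"), agree)],
)
```

The potentials are not probabilities: only after dividing by `Z` do the scores sum to one. Enumerating all assignments is exponential in the number of variables, so this is useful only for checking small examples.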

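Global Markov assumption in Markov networks
- A path $X_1 - \dots - X_k$ is active when a set of variables $\mathbf{Z}$ is observed if none of the $X_i \in \{X_1, \dots, X_k\}$ are observed (are part of $\mathbf{Z}$).
- Variables $\mathbf{X}$ are separated from $\mathbf{Y}$ given $\mathbf{Z}$ in graph $H$, written $\mathrm{sep}_H(\mathbf{X}; \mathbf{Y} \mid \mathbf{Z})$, if there is no active path between any $X \in \mathbf{X}$ and any $Y \in \mathbf{Y}$ given $\mathbf{Z}$.
- The global Markov assumption for a Markov network $H$ is
  $$I(H) = \{ (\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}) : \mathrm{sep}_H(\mathbf{X}; \mathbf{Y} \mid \mathbf{Z}) \}.$$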
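
Separation can be checked with a graph search that refuses to step onto observed nodes. The sketch below assumes the graph is given as an adjacency dictionary and that the three sets are disjoint; the function name `separated` and the 4-cycle example are illustrative.

```python
from collections import deque

def separated(adj, xs, ys, zs):
    """Return True if no active path connects xs to ys given observed zs.

    A path is active only if none of its nodes is observed, so we run a
    breadth-first search that never enters a node in zs.
    """
    zs = set(zs)
    targets = set(ys) - zs
    start = set(xs) - zs
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node in targets:
            return False  # reached Y along an unobserved path
        for neighbor in adj[node]:
            if neighbor not in zs and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return True

# Hypothetical 4-cycle A - B - C - D - A.
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
both_blocked = separated(cycle, ["A"], ["C"], ["B", "D"])  # both paths blocked
one_blocked = separated(cycle, ["A"], ["C"], ["B"])        # path via D is active
```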

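Representation Theorem for Markov Networks
- If the joint probability distribution $P$ factorizes over $H$, i.e. $P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i} \phi_i(D_i)$, then $H$ is an I-map for $P$.
- If $H$ is an I-map for $P$ and $P$ is a positive distribution, then the joint probability distribution $P$ factorizes over $H$ in the same form (the Hammersley-Clifford theorem).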

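Local independence assumptions for a Markov network
- Separation defines global independencies.
- Pairwise Markov independence: pairs of non-adjacent variables are independent given all others, i.e. $(X \perp Y \mid \mathbf{X} - \{X, Y\})$ for every non-adjacent pair $X, Y$.
- Markov blanket: a variable is independent of the rest given its neighbors, i.e. $(X \perp \mathbf{X} - \{X\} - \mathrm{MB}(X) \mid \mathrm{MB}(X))$, where $\mathrm{MB}(X)$ is the set of neighbors of $X$ in $H$.
- (Figure: example network over $T_1, \dots, T_9$.)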

Equivalence of independencies in Markov networks
- Soundness Theorem: for all positive distributions $P$, the following three statements are equivalent:
  - $P$ entails the global Markov assumptions
  - $P$ entails the pairwise Markov assumptions
  - $P$ entails the local Markov assumptions (Markov blanket)

Minimal I-maps and Markov Networks
- A fully connected graph is an I-map.
- Remember minimal I-maps? A "simplest" I-map: deleting any edge makes it no longer an I-map.
- In a BN, there is no unique minimal I-map.
- Theorem: in a Markov network, the minimal I-map is unique (for positive distributions $P$).
- Many ways to find the minimal I-map, e.g., take the pairwise Markov assumption $(X \perp Y \mid \mathbf{X} - \{X, Y\})$ for each pair; if $P$ doesn't entail it, add the edge $X - Y$.
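
The pairwise construction can be sketched directly when the joint distribution is small enough to tabulate. The code below assumes binary variables and a positive joint given as a dictionary; the helper name `minimal_imap_edges` and the chain-shaped example distribution are hypothetical.

```python
import itertools

def minimal_imap_edges(variables, joint, tol=1e-9):
    """Edges X - Y for which (X indep Y | all other variables) fails.

    `joint` maps each tuple of binary values (ordered as `variables`)
    to its probability.
    """
    n = len(variables)
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        dependent = False
        for rest_values in itertools.product([0, 1], repeat=len(rest)):
            def p(xi, xj):
                full = [0] * n
                full[i], full[j] = xi, xj
                for k, v in zip(rest, rest_values):
                    full[k] = v
                return joint[tuple(full)]
            # For binary X and Y, conditional independence given this
            # context holds iff the 2x2 slice has zero determinant.
            if abs(p(0, 0) * p(1, 1) - p(0, 1) * p(1, 0)) > tol:
                dependent = True
                break
        if dependent:
            edges.add((variables[i], variables[j]))
    return edges

# Hypothetical joint that factorizes along the chain A - B - C.
agree = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}
raw = {(a, b, c): agree[(a, b)] * agree[(b, c)]
       for a, b, c in itertools.product([0, 1], repeat=3)}
total = sum(raw.values())
joint = {x: v / total for x, v in raw.items()}
edges = minimal_imap_edges(["A", "B", "C"], joint)
```

For this joint, A and C are independent given B, so no A - C edge is added and the chain structure is recovered.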

How about a perfect map?
- Remember perfect maps? The independencies in the graph are exactly the same as those in $P$.
- For BNs, a perfect map doesn't always exist; counterexample: Swinging Couples.
- How about for Markov networks?

Unifying properties of BNs and MNs
- BNs:
  - give you: V-structures; CPTs are conditional probabilities; can directly compute the probability of a full instantiation
  - but: require acyclicity, and thus have no perfect map for swinging couples
- MNs:
  - give you: cycles, and perfect maps for swinging couples
  - but: don't have V-structures; cannot interpret potentials as probabilities; require a partition function
- Remember PDAGs? Skeleton + immoralities provides a (somewhat) unified representation; see the book for details.

What you need to know so far about Markov networks
- Markov network representation:
  - undirected graph
  - potentials over cliques (or sub-cliques)
  - normalize to obtain probabilities: need the partition function
- Representation Theorem for Markov networks:
  - if $P$ factorizes over $H$, then $H$ is an I-map for $P$
  - if $H$ is an I-map for $P$, then $P$ is guaranteed to factorize only for positive distributions
- Independence in Markov nets:
  - active paths and separation
  - pairwise Markov and Markov blanket assumptions
  - equivalence for positive distributions
- Minimal I-maps in MNs are unique.
- Perfect maps don't always exist.

Some common Markov networks and generalizations
- Pairwise Markov networks
- A very simple application in computer vision
- Logarithmic representation
- Log-linear models
- Factor graphs

Pairwise Markov Networks
- All factors are over single variables or pairs of variables:
  - node potentials $\phi_i(X_i)$
  - edge potentials $\phi_{ij}(X_i, X_j)$
- Factorization:
  $$P(\mathbf{X}) = \frac{1}{Z} \prod_{i} \phi_i(X_i) \prod_{(i,j) \in E} \phi_{ij}(X_i, X_j)$$
- Note that there may be bigger cliques in the graph, but we only consider pairwise potentials.
- (Figure: example network over $T_1, \dots, T_9$.)

A very simple vision application
- Image segmentation: separate foreground from background.
- Graph structure: pairwise Markov net; a grid with one node per pixel.
- Node potential: "background color" v. "foreground color".
- Edge potential: neighbors like to be of the same class.
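
A toy version of the segmentation model, as a sketch only: the squared-distance node potential, the constant `coupling` bonus for agreeing neighbors, and the parameter values are all assumptions chosen for illustration, not the potentials used in the lecture.

```python
def log_score(labels, pixels, fg_mean=0.8, bg_mean=0.2, coupling=1.0):
    """Unnormalized log-probability of a 0/1 labeling on a 4-connected grid.

    labels[r][c] is 1 for foreground and 0 for background;
    pixels[r][c] is a gray level in [0, 1].
    """
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            # Log node potential: how well the pixel matches its class color.
            mean = fg_mean if labels[r][c] == 1 else bg_mean
            total += -(pixels[r][c] - mean) ** 2
            # Log edge potentials: reward agreeing down and right neighbors.
            if r + 1 < rows and labels[r][c] == labels[r + 1][c]:
                total += coupling
            if c + 1 < cols and labels[r][c] == labels[r][c + 1]:
                total += coupling
    return total

pixels = [[0.9, 0.8],
          [0.1, 0.2]]
smooth = log_score([[1, 1], [0, 0]], pixels)        # bright row is foreground
checkerboard = log_score([[1, 0], [0, 1]], pixels)  # neighbors all disagree
```

The smooth labeling scores higher on both counts: its node terms fit the pixel values better and two of its four edges agree.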

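Logarithmic representation
- Standard model:
  $$P(\mathbf{X}) = \frac{1}{Z} \prod_{i} \phi_i(D_i)$$
- Log representation of a potential (assuming a positive potential):
  $$\phi_i(D_i) = \exp(-\epsilon_i(D_i)), \qquad \epsilon_i(D_i) = -\ln \phi_i(D_i),$$
  where $\epsilon_i$ is also called the energy function.
- Log representation of the Markov net:
  $$P(\mathbf{X}) = \frac{1}{Z} \exp\Big(-\sum_{i} \epsilon_i(D_i)\Big)$$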

Log-linear Markov network (most common representation)
- A feature is some function $f[D]$ for some subset of variables $D$, e.g., an indicator function.
- A log-linear model over a Markov network $H$ consists of:
  - a set of features $f_1[D_1], \dots, f_k[D_k]$
    - each $D_i$ is a subset of a clique in $H$
    - two $f$'s can be over the same variables
  - a set of weights $w_1, \dots, w_k$, usually learned from data
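
A small sketch of a log-linear model with indicator features. It assumes the form $P(x) \propto \exp(\sum_i w_i f_i(x))$ with a positive sign in the exponent (some texts put a minus sign there); the two features and their weights are made up for illustration.

```python
import itertools
import math

def log_linear_distribution(variables, features, weights):
    """P(x) = (1/Z) * exp(sum_i w_i * f_i(x)) over binary variables.

    Each feature is a function of the full assignment dict, although it
    only looks at its own subset of variables.
    """
    scores = {}
    for values in itertools.product([0, 1], repeat=len(variables)):
        x = dict(zip(variables, values))
        scores[values] = math.exp(sum(w * f(x) for f, w in zip(features, weights)))
    Z = sum(scores.values())
    return {v: s / Z for v, s in scores.items()}

# Two hypothetical indicator features; both touch variable B.
features = [
    lambda x: 1.0 if x["A"] == x["B"] else 0.0,  # A and B agree
    lambda x: 1.0 if x["B"] == 1 else 0.0,       # B is "on"
]
weights = [2.0, 0.5]
P = log_linear_distribution(["A", "B"], features, weights)
```

Compared with a full table over a clique, only the weights need to be stored and learned, which is why this parameterization is convenient for learning.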

Structure in cliques
- Possible potentials for this graph: (figure: graph over nodes A, B, C)

Factor graphs
- Very useful for approximate inference: they make factor dependency explicit.
- Bipartite graph:
  - variable nodes (ovals) for $X_1, \dots, X_n$
  - factor nodes (squares) for $\phi_1, \dots, \phi_m$
  - edge $X_i - \phi_j$ if $X_i \in \mathrm{Scope}[\phi_j]$
- More explicit representation, but exactly equivalent.
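
The bipartite structure is easy to build from factor scopes alone. In this sketch the factor names (`phi_ABC`, `phi_AB`, and so on) and the three-variable example are hypothetical; the point is that two different factorizations over the same fully connected Markov network give different factor graphs.

```python
def build_factor_graph(factor_scopes):
    """Return (variable -> factors, factor -> variables) adjacency maps.

    `factor_scopes` maps a factor name to the tuple of variables in its scope.
    An edge X_i - phi_j exists exactly when X_i is in Scope[phi_j].
    """
    var_to_factors = {}
    factor_to_vars = {}
    for name, scope in factor_scopes.items():
        factor_to_vars[name] = list(scope)
        for variable in scope:
            var_to_factors.setdefault(variable, []).append(name)
    return var_to_factors, factor_to_vars

# Same Markov network (triangle A, B, C), two different factorizations.
single = build_factor_graph({"phi_ABC": ("A", "B", "C")})
pairwise = build_factor_graph({
    "phi_AB": ("A", "B"),
    "phi_BC": ("B", "C"),
    "phi_AC": ("A", "C"),
})
```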

Exact inference in MNs and Factor Graphs
- The variable elimination algorithm was presented in terms of factors, so exactly the same VE algorithm can be applied to MNs and factor graphs.
- Junction tree algorithms also apply directly here:
  - triangulate the MN graph as we did with the moralized graph
  - each factor belongs to a clique
  - same message passing algorithms
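
Because variable elimination only manipulates factors, a sketch needs nothing specific to Bayes nets. The code below assumes binary variables, factors stored as `(scope, table)` pairs with tuple scopes, and a caller-supplied elimination order; the function names are illustrative.

```python
import itertools

def multiply(f, g):
    """Product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))  # order-preserving union
    table = {}
    for values in itertools.product([0, 1], repeat=len(scope)):
        x = dict(zip(scope, values))
        table[values] = ft[tuple(x[v] for v in fs)] * gt[tuple(x[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    fs, ft = f
    idx = fs.index(var)
    scope = fs[:idx] + fs[idx + 1:]
    table = {}
    for values, val in ft.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + val
    return scope, table

def eliminate(factors, order):
    """Eliminate variables in `order`; return the remaining product factor."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        factors = [f for f in factors if var not in f[0]]
        product = touching[0]
        for f in touching[1:]:
            product = multiply(product, f)
        factors.append(sum_out(product, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical chain A - B - C with "agreement" edge potentials.
agree = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}
chain = [(("A", "B"), agree), (("B", "C"), agree)]
scope, table = eliminate(chain, ["A", "B"])  # unnormalized marginal over C
Z = sum(table.values())                      # partition function
```

In a Markov network the result is unnormalized, so the final factor must be divided by its sum; that sum is the partition function.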

Summary of types of Markov nets
- Pairwise Markov networks
  - very common
  - potentials over nodes and edges
- Log-linear models
  - log representation of potentials
  - linear coefficients learned from data
  - most common for learning MNs
- Factor graphs
  - explicit representation of factors: you know exactly what factors you have
  - very useful for approximate inference

What you learned about so far
- Bayes nets
- Junction trees
- (General) Markov networks
- Pairwise Markov networks
- Factor graphs
- How do we transform between them?
- More formally: I give you a graph in one representation; find an I-map in the other.

From Bayes nets to Markov nets
- (Figure: Bayes net over Intelligence, Grade, SAT, Letter, Job.)

BNs → MNs: Moralization
- (Figure: network over Difficulty, SAT, Grade, Happy, Job, Coherence, Letter, Intelligence, before and after moralization.)
- Theorem: given a BN $G$, the Markov net $H$ formed by moralizing $G$ is the minimal I-map for $I(G)$.
- Intuition:
  - in a Markov net, each factor must correspond to a subset of a clique
  - the factors in BNs are the CPTs
  - CPTs are factors over a node and its parents
  - thus a node and its parents must form a clique
- Effect: some independencies that could be read from the BN graph become hidden.
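
Moralization itself is a short graph operation: drop edge directions and connect ("marry") every pair of parents that share a child. The sketch below assumes the BN is given as a parent list per node; the four edges in the example are an assumed fragment for illustration, since the slide's figure is not reproduced here.

```python
import itertools

def moralize(parents):
    """Undirected edge set of the moral graph of a BN.

    `parents` maps each node to the list of its parents.
    """
    edges = set()
    for child, node_parents in parents.items():
        # Keep every original edge, now undirected.
        for parent in node_parents:
            edges.add(frozenset((parent, child)))
        # Marry the parents so that child + parents form a clique.
        for p, q in itertools.combinations(node_parents, 2):
            edges.add(frozenset((p, q)))
    return edges

# Assumed fragment: Difficulty -> Grade <- Intelligence -> SAT, Grade -> Letter.
bn = {
    "Difficulty": [],
    "Intelligence": [],
    "Grade": ["Difficulty", "Intelligence"],
    "SAT": ["Intelligence"],
    "Letter": ["Grade"],
}
moral_edges = moralize(bn)
```

The added Difficulty - Intelligence edge is where information is lost: in the BN the two are marginally independent, but that independence can no longer be read off the moral graph.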

From Markov nets to Bayes nets
- (Figure: Markov net over Exam, Grade, Job, Letter, Intelligence, SAT.)

MNs → BNs: Triangulation
- (Figure: network over Exam, Grade, Job, Letter, Intelligence, SAT, before and after triangulation.)
- Theorem: given an MN $H$, let $G$ be the Bayes net that is a minimal I-map for $I(H)$; then $G$ must be chordal.
- Intuition:
  - v-structures in a BN introduce immoralities
  - these immoralities were not present in the Markov net
  - the triangulation eliminates immoralities
- Effect: many independencies that could be read from the MN graph become hidden.

Markov nets v. Pairwise MNs
- (Figure: example network; recoverable node labels: B, C.)
- Every Markov network can be transformed into a pairwise Markov net:
  - introduce an extra "variable" for each factor over three or more variables
  - the domain size of the extra variable is exponential in the number of variables in the factor
- Effect:
  - any local structure in the factor is lost
  - a chordal MN doesn't look chordal anymore

Overview of types of graphical models and transformations between them

Approximate inference overview
- So far: VE and junction trees
  - exact inference
  - exponential in tree-width
- There are many, many approximate inference algorithms for PGMs.
- We will focus on three representative ones:
  - sampling
  - variational inference
  - loopy belief propagation and generalized belief propagation
- There will be a special recitation by Pradeep Ravikumar on more advanced methods.

Approximating the posterior v. approximating the prior
- The prior model represents the entire world; the world is complicated, thus the prior model can be very complicated.
- Posterior: after making observations
  - sometimes we can become much more sure about the way things are
  - sometimes it can be approximated by a simple model
- First approach to approximate inference: find a simple model that is "close" to the posterior.
- Fundamental problems:
  - what is close?
  - the posterior is the intractable result of inference; how can we approximate what we don't have?
- (Figure: network over Difficulty, SAT, Grade, Happy, Job, Coherence, Letter, Intelligence.)

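KL divergence: Distance between distributions
- Given two distributions $p$ and $q$, the KL divergence is
  $$D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$
- $D(p \| q) = 0$ iff $p = q$.
- Not symmetric: $p$ determines where the difference is important:
  - $p(x) = 0$ and $q(x) \neq 0$: the term contributes 0
  - $p(x) \neq 0$ and $q(x) = 0$: the term is infinite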
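
The asymmetry and the two boundary cases show up immediately on a small example. This sketch assumes discrete distributions stored as dictionaries over the same outcomes; the numbers are made up.

```python
import math

def kl(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total

p = {"a": 0.5, "b": 0.5, "c": 0.0}
q = {"a": 0.9, "b": 0.05, "c": 0.05}

forward = kl(p, q)  # finite: q covers the support of p
reverse = kl(q, p)  # infinite: q(c) > 0 but p(c) = 0
```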

Find simple approximate distribution
- Suppose $p$ is the intractable posterior.
- Want to find a simple $q$ that approximates $p$.
- KL divergence is not symmetric:
  - $D(p \| q)$
    - the true distribution $p$ defines the support of the difference
    - the "correct" direction
    - will be intractable to compute
  - $D(q \| p)$
    - the approximate distribution defines the support
    - tends to give overconfident results
    - will be tractable

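Back to graphical models
- Inference in a graphical model:
  - $P(\mathbf{x}) = \frac{1}{Z} \prod_{j} \phi_j(D_j)$
  - want to compute $P(X_i \mid e)$
  - our $p$: the posterior $P(\mathbf{X} \mid e)$
- What is the simplest $q$?
  - every variable is independent: $q(\mathbf{x}) = \prod_{i} q_i(x_i)$
  - this is the mean field approximation
  - can compute any probability very efficiently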

$D(p \| q)$ for mean field: KL the right way
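
The derivation for this slide is not preserved in the transcript. The following is a sketch of the standard argument, assuming a fully factored $q(\mathbf{x}) = \prod_i q_i(x_i)$ and writing $H_p$ for the entropy of $p$ and $p(x_i)$ for its single-variable marginals.

```latex
\begin{align*}
D(p \,\|\, q)
  &= \sum_{\mathbf{x}} p(\mathbf{x}) \log \frac{p(\mathbf{x})}{\prod_i q_i(x_i)} \\
  &= \sum_{\mathbf{x}} p(\mathbf{x}) \log p(\mathbf{x})
     - \sum_i \sum_{\mathbf{x}} p(\mathbf{x}) \log q_i(x_i) \\
  &= -H_p(\mathbf{X}) - \sum_i \sum_{x_i} p(x_i) \log q_i(x_i).
\end{align*}
% Minimizing over each q_i (subject to \sum_{x_i} q_i(x_i) = 1) gives
% q_i(x_i) = p(x_i): the best factored q matches the marginals of p.
```

The optimum sets each $q_i$ to the true marginal $p(x_i)$, but computing those marginals is exactly the inference problem we could not solve, which is why this direction is intractable.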

$D(q \| p)$ for mean field: KL the reverse direction
$$D(q \| p) = \sum_{\mathbf{x}} q(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})}$$
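
The expansion for this slide is also missing from the transcript. A sketch of the standard decomposition, assuming $p(\mathbf{x}) = \frac{1}{Z} \prod_j \phi_j(D_j)$ with positive factors (any evidence already folded into them) and a fully factored $q(\mathbf{x}) = \prod_i q_i(x_i)$:

```latex
\begin{align*}
D(q \,\|\, p)
  &= \sum_{\mathbf{x}} q(\mathbf{x}) \log q(\mathbf{x})
     - \sum_{\mathbf{x}} q(\mathbf{x}) \log p(\mathbf{x}) \\
  &= -\sum_i H(q_i)
     - \sum_j \mathbb{E}_{q}\big[ \log \phi_j(D_j) \big]
     + \log Z .
\end{align*}
% H(q_i) is the entropy of the single-variable distribution q_i.
% Each expectation involves only the variables in the scope D_j,
% and \log Z does not depend on q.
```

Every term that depends on $q$ is local: the entropies are per variable and each expectation sums over one factor's scope only. Since $\log Z$ is a constant with respect to $q$, the objective can be minimized without ever computing it, which is what makes this direction tractable.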

What you need to know so far
- Goal: find an efficient distribution that is close to the posterior.
- Distance: measure distance in terms of KL divergence.
- Asymmetry of KL: $D(p \| q) \neq D(q \| p)$.
- The right KL is intractable, so we use the reverse KL.