Undirected Models: Markov Networks. David Page, Fall 2009. CS 731: Advanced Methods in Artificial Intelligence, with Biomedical Applications.


Markov networks. Undirected graphs (cf. Bayesian networks, which are directed). A Markov network represents the joint probability distribution over events, which are represented by variables. Nodes in the network represent variables.

Markov network structure. A table (also called a potential or a factor) can be associated with each complete subgraph of the network graph. Table values are nonnegative but otherwise unrestricted: they are not necessarily probabilities and not necessarily less than 1.

Obtaining the full joint distribution. The full joint distribution is the normalized product of all of the potentials: P(X_1, …, X_n) = (1/Z) ∏_i ϕ_i(D_i), where each ϕ_i is one of the potentials and D_i is the complete subgraph (set of variables) it is defined over. You may also see the formula written with D_i replacing X_i.

Normalization constant. Z is the normalization constant (similar to α in Bayesian inference): Z = Σ_{X_1,…,X_n} ∏_i ϕ_i(D_i), the sum of the unnormalized product over all complete assignments. Z is also called the partition function.

Steps for calculating the probability distribution. The method is similar to that for a Bayesian network: multiply the distributions of the factors (potentials) together to get the joint distribution, then normalize the resulting table so that it sums to 1.
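To make the recipe concrete, here is a minimal sketch in Python (the variables, network structure, and potential values are illustrative assumptions, not an example from the lecture): it multiplies the potential tables over every complete assignment, computes Z, and normalizes.

```python
# Sketch: joint distribution of a small Markov network from its potentials.
from itertools import product

variables = ["A", "B", "C"]          # all binary: values 0/1

# Potentials over complete subgraphs (here, the edges A-B and B-C).
phi_AB = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
phi_BC = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}
potentials = [(("A", "B"), phi_AB), (("B", "C"), phi_BC)]

def unnormalized(assignment):
    """Product of all potentials evaluated at a full assignment (a dict)."""
    score = 1.0
    for scope, table in potentials:
        score *= table[tuple(assignment[v] for v in scope)]
    return score

# Partition function Z = sum of the unnormalized score over all assignments.
all_assignments = [dict(zip(variables, vals))
                   for vals in product([0, 1], repeat=len(variables))]
Z = sum(unnormalized(a) for a in all_assignments)

# Normalized joint distribution P(A, B, C).
joint = {tuple(a.values()): unnormalized(a) / Z for a in all_assignments}
print(joint)
```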

Topics for the remainder of the lecture: the relationship between Markov network and Bayesian network conditional dependencies, inference in Markov networks, and variations of Markov networks.

Independence in Markov networks. Two nodes in a Markov network are independent if and only if every path between them is cut off by evidence. In the example graph on the slide, nodes B and D are independent of (separated from) node E.
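A minimal sketch of this separation test, under the assumption of a small illustrative graph (not the slide's figure): two nodes are independent given the evidence exactly when removing the evidence nodes disconnects them.

```python
# Sketch: separation test in an undirected graph.
from collections import deque

adj = {  # undirected adjacency lists (illustrative graph)
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C", "E"}, "E": {"D"},
}

def separated(x, y, evidence, adj):
    """True if every path from x to y is blocked by an evidence node."""
    if x in evidence or y in evidence:
        return True
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr in evidence or nbr in seen:
                continue                 # evidence nodes cut the path
            if nbr == y:
                return False             # found an unblocked path
            seen.add(nbr)
            queue.append(nbr)
    return True

print(separated("B", "E", {"D"}, adj))   # True: D cuts every B-E path
print(separated("B", "C", set(), adj))   # False: B-A-C is unblocked
```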

Markov blanket. In a Markov network, the Markov blanket of a node is its set of neighbors; given its Markov blanket, the node is conditionally independent of all other nodes.
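A one-function sketch (the adjacency structure is an assumed example):

```python
# Sketch: the Markov blanket of a node in a Markov network is its neighbor set.
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}

def markov_blanket(node, adj):
    return set(adj[node])

print(markov_blanket("A", adj))   # {'B', 'C'}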

Converting between a Bayesian network and a Markov network. The same data flow must be maintained in the conversion; sometimes new dependencies must be introduced to maintain it. When converting to a Markov net, the dependencies of the Markov net must be a superset of the Bayes net dependencies: I(Bayes) ⊆ I(Markov). When converting to a Bayes net, the dependencies of the Bayes net must be a superset of the Markov net dependencies: I(Markov) ⊆ I(Bayes). (Here I(·) denotes the dependencies each network encodes: the converted network may drop independence assertions but must never add one.)

Convert Bayesian network to Markov network. Maintain I(Bayes) ⊆ I(Markov); the structure must be able to handle any evidence. The data flow issue: with evidence at D, data flows between B and C in the Bayesian network, but data does not flow between B and C in the (naively undirected) Markov network. Diverging and linear connections behave the same in Bayes and Markov networks; the problem exists only for converging connections.

Convert Bayesian network to Markov network:
1. Maintain the structure of the Bayes net.
2. Eliminate directionality.
3. Moralize (connect, i.e. "marry", every pair of parents that share a child).
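A minimal sketch of these three steps (the parent structure below is an assumed example, not the lecture's network):

```python
# Sketch: Bayes net -> Markov net by dropping directions and moralizing.
from itertools import combinations

parents = {          # Bayes net as a child -> parents map (illustrative structure)
    "A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"],
}

def moralize(parents):
    """Return the undirected (moral) graph as an adjacency-set dict."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                       # steps 1-2: keep structure, drop directionality
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):   # step 3: marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    return adj

print(moralize(parents))   # B and C become connected because both are parents of D
```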

Convert Markov network to Bayesian network. Maintain I(Markov) ⊆ I(Bayes). The data flow issue: if evidence exists at A, data can flow from B to C in the Bayesian net but cannot flow from B to C in the Markov net; the problem exists for diverging connections.

Convert Markov network to Bayesian network. 1. Triangulate the graph; this guarantees that all dependencies of the Markov net can be represented (the chordal graph asserts no independencies that the Markov net lacks).

Convert Markov network to Bayesian network. 2. Add directionality: traverse the nodes, numbering them as you go (analogous to a topological sort), and add directionality in the direction of the sort, directing each edge from its lower-numbered to its higher-numbered endpoint.
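A rough sketch of this step, assuming the graph has already been triangulated and that the node ordering comes from a suitable traversal (e.g. maximum cardinality search); the graph and ordering below are illustrative:

```python
# Sketch: direct the edges of a (triangulated) undirected graph along a node ordering.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
order = ["A", "B", "C", "D"]            # number nodes as you visit them
rank = {v: i for i, v in enumerate(order)}

# Direct every undirected edge from the lower-numbered to the higher-numbered endpoint.
directed_edges = sorted(
    (u, v) for u in adj for v in adj[u] if rank[u] < rank[v]
)
print(directed_edges)   # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```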

Variable elimination in Markov networks. ϕ represents a potential; potential tables must be over complete subgraphs of the Markov network.

Variable elimination in Markov networks. Example: compute P(D | ¬c). In any table that mentions C, set the entries that contradict the evidence (¬c) to 0; then combine and marginalize potentials exactly as in Bayesian network variable elimination.
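The following self-contained sketch carries out that recipe on a tiny, assumed network (binary variables; the potentials over {A, C} and {A, D} and their values are illustrative, not the lecture's numbers): it zeroes out entries inconsistent with ¬c, multiplies the factors, sums out the non-query variables, and normalizes to obtain P(D | ¬c).

```python
# Sketch: variable elimination with evidence in a small Markov network.
from itertools import product

class Factor:
    def __init__(self, scope, table):
        self.scope = list(scope)          # ordered variable names
        self.table = dict(table)          # assignment tuple -> nonnegative value

def reduce_evidence(f, evidence):
    """Zero out rows that contradict the evidence (as on the slide)."""
    new = {}
    for key, val in f.table.items():
        consistent = all(evidence.get(v, key[i]) == key[i] for i, v in enumerate(f.scope))
        new[key] = val if consistent else 0.0
    return Factor(f.scope, new)

def multiply(f, g):
    scope = f.scope + [v for v in g.scope if v not in f.scope]
    table = {}
    for vals in product([0, 1], repeat=len(scope)):   # all variables binary
        a = dict(zip(scope, vals))
        table[vals] = (f.table[tuple(a[v] for v in f.scope)] *
                       g.table[tuple(a[v] for v in g.scope)])
    return Factor(scope, table)

def marginalize(f, var):
    scope = [v for v in f.scope if v != var]
    table = {}
    for key, val in f.table.items():
        reduced = tuple(x for v, x in zip(f.scope, key) if v != var)
        table[reduced] = table.get(reduced, 0.0) + val
    return Factor(scope, table)

# Potentials over the complete subgraphs {A, C} and {A, D} (illustrative values).
phi_AC = Factor(["A", "C"], {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
phi_AD = Factor(["A", "D"], {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0})

evidence = {"C": 0}                                  # ¬c
factors = [reduce_evidence(f, evidence) for f in (phi_AC, phi_AD)]

# Combine, sum out everything except the query variable D, then normalize.
joint = factors[0]
for f in factors[1:]:
    joint = multiply(joint, f)
for var in ["A", "C"]:
    joint = marginalize(joint, var)
total = sum(joint.table.values())
print({key[0]: v / total for key, v in joint.table.items()})   # P(D | ¬c)
```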

Junction trees for Markov networks. Don't moralize (the graph is already undirected), but you must triangulate; the rest of the algorithm is the same as for Bayesian networks.

Gibbs sampling for Markov networks. Example: P(D | ¬c). Resample the non-evidence variables in a pre-defined order or a random order. Suppose we begin with A: B and C are the Markov blanket of A, so calculate P(A | B, C) using the current Gibbs sampling values of B and C. Note: never change C, since it is evidence. (The slide tracks the current assignment to the variables A, B, C, D, E, F.)
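A compact sketch of the sampler (the network, potentials, and values are illustrative assumptions; only the evidence variable C is held fixed): each variable is redrawn from the product of the potentials that mention it, evaluated at the current values of its Markov blanket, and samples of D estimate P(D | ¬c).

```python
# Sketch: Gibbs sampling in a small Markov network.
import random

random.seed(0)
# Potentials over the edges of an assumed network A-B, A-C, B-D.
potentials = {
    ("A", "B"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
    ("A", "C"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    ("B", "D"): {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0},
}
evidence = {"C": 0}                                   # ¬c is never resampled
state = {"A": 0, "B": 0, "C": 0, "D": 0}              # arbitrary initial assignment

def resample(var, state):
    """Draw var from P(var | Markov blanket) using only potentials that mention var."""
    weights = []
    for value in (0, 1):
        trial = dict(state, **{var: value})
        w = 1.0
        for scope, table in potentials.items():
            if var in scope:
                w *= table[tuple(trial[v] for v in scope)]
        weights.append(w)
    return random.choices((0, 1), weights=weights)[0]

counts = {0: 0, 1: 0}
for sweep in range(5000):                             # (no burn-in, for brevity)
    for var in ("A", "B", "D"):                       # skip the evidence variable C
        state[var] = resample(var, state)
    counts[state["D"]] += 1

total = sum(counts.values())
print({d: c / total for d, c in counts.items()})      # estimate of P(D | ¬c)
```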

Example: Gibbs sampling. Resample the probability distribution of A: take the potentials over (A, C) and (A, B), keep the rows consistent with the current values of C and B, multiply them, and normalize the result to a distribution over a and ¬a. (The slide shows the two potential tables, the current assignment to A–F, and the normalized result.)

Example: Gibbs sampling (continued). Resample the probability distribution of B in the same way, using the potentials over (B, D) and (A, B) evaluated at the current values of D and A, then normalizing. (The slide shows the potential tables, the current assignment to A–F, and the normalized result over b and ¬b.)

Loopy Belief Propagation. Cluster graphs with undirected cycles are "loopy". The algorithm is not guaranteed to converge, but in practice it is very effective.

Loopy Belief Propagation. We want one cluster node for every potential: moralize the original graph, do not triangulate, and create one node for every clique of the resulting Markov network.
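A rough sketch of this construction (assuming the networkx library is available; the Bayes net below is an illustrative example): moralize, skip triangulation, and take one cluster per maximal clique.

```python
# Sketch: cluster nodes for loopy BP = maximal cliques of the moral graph.
import networkx as nx

parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}   # illustrative Bayes net

moral = nx.Graph()
moral.add_nodes_from(parents)
for child, ps in parents.items():
    moral.add_edges_from((child, p) for p in ps)                                 # drop directions
    moral.add_edges_from((p, q) for i, p in enumerate(ps) for q in ps[i + 1:])   # marry parents

clusters = [frozenset(c) for c in nx.find_cliques(moral)]   # one cluster per maximal clique
print(clusters)   # for this graph: {A, B, C} and {B, C, D}
```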

Running intersection property. Every variable in the intersection of two cluster nodes must be carried through every node along exactly one path between the two nodes. This is similar to (but weaker than) the junction tree property. See also K&F p. 347.

Running intersection property (continued). Variables may be eliminated from edges (sepsets) so that the clique graph does not violate the running intersection property; this may result in a loss of information in the graph.

Special cases of Markov networks: log-linear models and conditional random fields (CRFs).

Log linear model. Normalization: as before, the partition function is Z = Σ_{X_1,…,X_n} ∏_i ϕ_i(D_i).

Log linear model. Rewrite each potential as ϕ_i(D_i) = exp(−ε_i(D_i)), where ε_i(D_i) = −ln ϕ_i(D_i); that is, for every entry V in the potential table, replace V with −ln V.

Log linear models. Use the negative natural log of each number in a potential. This allows us to replace a potential table with one or more features; each potential is represented by a set of features with associated weights. Anything that can be represented in a log-linear model can also be represented in a Markov model.

Log linear model probability distribution: P(X_1, …, X_n) = (1/Z) exp( −Σ_i w_i f_i(D_i) ), where each f_i is a binary feature, w_i is its weight (the negative log of the corresponding potential entry), and Z sums the exponential over all complete assignments.

Log linear model. Example feature f_i: b → a, with weight w. When the feature is violated (b true, a false) the potential entry is e^−w; otherwise it is e^0 = 1:

       a          ¬a
 b     e^0 = 1    e^−w
 ¬b    e^0 = 1    e^0 = 1

which is proportional to

       a      ¬a
 b     e^w    1
 ¬b    e^w    e^w

Trivial example. Features and weights: f_1: a ∧ b with weight −ln V_1; f_2: ¬a ∧ b with weight −ln V_2; f_3: a ∧ ¬b with weight −ln V_3; f_4: ¬a ∧ ¬b with weight −ln V_4. Features are binary (true or false) and are not necessarily mutually exclusive, although they are in this example: in any complete setting exactly one of these features is true. The corresponding potential table is:

       a      ¬a
 b     V_1    V_2
 ¬b    V_3    V_4

Trivial example (continued). With weights w_i = −ln V_i, each complete assignment makes exactly one feature true, so its unnormalized probability is exp(−w_i) = V_i, recovering the original potential table entry.
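A small sketch verifying this equivalence numerically (the V values are assumed illustrative numbers, not from the lecture): the potential table over A, B and its log-linear re-encoding with one indicator feature per entry and weight w_i = −ln V_i define the same distribution.

```python
# Sketch: a potential table and its log-linear encoding give the same distribution.
import math
from itertools import product

V = {(1, 1): 2.0, (0, 1): 5.0, (1, 0): 1.0, (0, 0): 4.0}          # phi(A, B): V1..V4

# Each feature is the indicator of one complete setting of (A, B), weight = -ln(V).
features = [(setting, -math.log(v)) for setting, v in V.items()]

def unnorm_potential(a, b):
    return V[(a, b)]

def unnorm_loglinear(a, b):
    # exp(-sum_i w_i * f_i(a, b)); exactly one feature fires for each setting
    return math.exp(-sum(w for setting, w in features if setting == (a, b)))

Z_pot = sum(unnorm_potential(a, b) for a, b in product((0, 1), repeat=2))
Z_log = sum(unnorm_loglinear(a, b) for a, b in product((0, 1), repeat=2))

for a, b in product((0, 1), repeat=2):
    print((a, b),
          round(unnorm_potential(a, b) / Z_pot, 6),
          round(unnorm_loglinear(a, b) / Z_log, 6))   # the two columns match
```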

Conditional Random Field (CRF). A CRF focuses on the conditional distribution of a subset of the variables given the others. ϕ_1(D_1), …, ϕ_m(D_m) are the factors that annotate the network. The normalization constant, which now depends on the conditioning variables, is the only difference between this and the standard Markov network definition.
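In symbols (a standard statement of the definition the slide is describing; writing Y for the target variables and X for the conditioning variables is an assumption consistent with the factors ϕ_i(D_i) above):

```latex
P(Y \mid X) \;=\; \frac{1}{Z(X)} \prod_{i=1}^{m} \phi_i(D_i),
\qquad
Z(X) \;=\; \sum_{Y} \prod_{i=1}^{m} \phi_i(D_i).
```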