Exact Inference in Bayes Nets

Inference Techniques
Exact inference:
- Variable elimination
- Belief propagation (polytrees)
- Junction tree algorithm (arbitrary graphs)
- Kalman filter
- Adams & MacKay changepoint
Approximate inference:
- Loopy belief propagation
- Rejection sampling
- Importance sampling
- Markov chain Monte Carlo (MCMC)
- Gibbs sampling
- Variational approximations
- Expectation maximization (forward-backward algorithm)
- Particle filters
Some of these techniques are covered later in the semester.
Polytree: a directed acyclic graph whose underlying undirected graph is a tree, so a node may have multiple parents.

Notation
- U: set of nodes in a graph
- X_i: random variable associated with node i
- π_i: parents of node i
- Joint probability: $p(x_U) = \prod_{i \in U} p(x_i \mid x_{\pi_i})$
- General form to include undirected as well as directed graphs: $p(x_U) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$, where C is an index over cliques and Z is the normalizing constant
- To apply the general form to a directed graph, turn the directed graph into a moral graph
- Moral graph: connect all parents of each node and remove arrows

Common Inference Problems
Assume a partition of the graph into subsets: O = observations, U = unknowns, N = nuisance variables
- Computing marginals (averaging over nuisance variables)
- Computing MAP probability
- Given observations O, find the distribution over U
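The three problems above can be illustrated on a tiny joint table. This is only a sketch: the table `joint`, its axis layout, and the choice of one binary variable per subset are assumptions made for illustration, not part of the slides.

```python
import numpy as np

# Hypothetical joint table p(o, u, n) over three binary variables.
# Axis 0 = observed O, axis 1 = unknown U, axis 2 = nuisance N.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Marginalize the nuisance variable: p(o, u) = sum_n p(o, u, n)
p_ou = joint.sum(axis=2)

# Condition on the observation O = 1: p(u | o=1) = p(o=1, u) / p(o=1)
p_u_given_o1 = p_ou[1] / p_ou[1].sum()

# MAP value of U given O = 1
u_map = int(np.argmax(p_u_given_o1))
```

Summing over the full table like this is exponential in the number of variables, which is what the algorithms on the following slides try to avoid.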

Variable Elimination
E.g., calculating the marginal p(x5)
Elimination order: 1, 2, 4, 3
Intermediate terms produced: m12(x2), m23(x3), m43(x3), m35(x5)

Variable Elimination
E.g., calculating the conditional p(x5|x2,x4)
Intermediate term shown: m43(x3)
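A minimal sketch of both calculations follows. The graph from the slide figure is not recoverable from the transcript, so the code assumes a tree with edges (1,2), (2,3), (4,3), (3,5), which is consistent with the message names m12, m23, m43, m35. The pairwise potentials `psi12`, `psi23`, `psi43`, `psi35` and the evidence values are made up for illustration.

```python
import numpy as np

K = 3  # number of states per variable (assumed)
rng = np.random.default_rng(1)
# Assumed pairwise potentials, each indexed [first variable, second variable].
psi12 = rng.random((K, K))
psi23 = rng.random((K, K))
psi43 = rng.random((K, K))
psi35 = rng.random((K, K))

# Marginal p(x5) with elimination order 1, 2, 4, 3.
m12 = psi12.sum(axis=0)        # m12(x2) = sum_{x1} psi12(x1, x2)
m23 = m12 @ psi23              # m23(x3) = sum_{x2} m12(x2) psi23(x2, x3)
m43 = psi43.sum(axis=0)        # m43(x3) = sum_{x4} psi43(x4, x3)
m35 = (m23 * m43) @ psi35      # m35(x5) = sum_{x3} m23(x3) m43(x3) psi35(x3, x5)
p_x5 = m35 / m35.sum()

# Brute-force check: build the full joint and sum out everything but x5.
joint = np.einsum("ab,bc,dc,ce->abcde", psi12, psi23, psi43, psi35)
p_x5_brute = joint.sum(axis=(0, 1, 2, 3))
p_x5_brute /= p_x5_brute.sum()

# Conditional p(x5 | x2=0, x4=1): evidence turns sums into slices.
# x1 only contributes a constant once x2 is fixed, so it drops out on normalization.
m23_e = psi23[0]               # psi23(x2=0, x3)
m43_e = psi43[1]               # psi43(x4=1, x3)
p_cond = (m23_e * m43_e) @ psi35
p_cond /= p_cond.sum()

p_cond_brute = joint[:, 0, :, 1, :].sum(axis=(0, 1))
p_cond_brute /= p_cond_brute.sum()
```

Each message is a vector of length K, whereas the brute-force joint has K^5 entries.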

Quiz: Variable Elimination
[Figure: graph over nodes A, B, C, D]
- Elimination order for P(C)?
- Elimination order for P(D)?

What If Wrong Order Is Chosen?
[Figure: graph including nodes B and C]
- Compute P(B) with order D, E, C, A
- Compute P(B) with order A, C, D, E
- For C, eliminate: B, A, D, E
- For B, eliminate: D, E, C, A

Weaknesses of Variable Elimination
1. Efficiency of the algorithm depends strongly on the elimination order, and finding the best order is an NP-hard problem.
2. Inefficient if you want to compute multiple queries conditioned on the same evidence.
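To make the first weakness concrete, the sketch below simulates elimination on an undirected graph and reports the size of the largest set of variables that ever has to be handled together. The helper `max_elimination_clique` and the star-shaped example graph are hypothetical, chosen only to show how much the order can matter.

```python
def max_elimination_clique(edges, order):
    """Simulate elimination; return the largest (node + neighbors) set size seen."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    largest = 0
    for node in order:
        ns = nbrs.pop(node, set())
        largest = max(largest, len(ns) + 1)
        # Eliminating a node connects all of its remaining neighbors.
        for a in ns:
            nbrs[a].discard(node)
            nbrs[a] |= ns - {a}
    return largest

# Star graph: hub H connected to leaves A, B, C, D.
star = [("H", leaf) for leaf in "ABCD"]
good = max_elimination_clique(star, ["A", "B", "C", "D", "H"])  # leaves first
bad = max_elimination_clique(star, ["H", "A", "B", "C", "D"])   # hub first
```

Eliminating the leaves first never touches more than two variables at once, while eliminating the hub first creates a term over all five. The cost of a term is exponential in this size.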

Message Passing
Inference as an elimination process → inference as passing messages along edges of the (moral) graph
Leads to efficient inference when you want to make multiple inferences, because each message can contribute to more than one marginal.

Message Passing mij(xj): intermediate term i is variable being summed over, j is other variable Note dependence on elimination ordering M12 =

What are these messages?
- The message from X_i to X_j says, "X_i thinks that X_j belongs in these states with various likelihoods."
- Messages are similar to likelihoods: they are non-negative and don't have to sum to 1, but you can normalize them without affecting the results (which adds some numerical stability).
- A large message value m_ij(x_j) means that X_i believes X_j = x_j with high probability.
- The result of message passing is a consensus that determines the marginal probabilities of all variables.

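Belief Propagation (Pearl, 1983)
- i: node we're sending from
- j: node we're sending to
- N(i): neighbors of i
- N(i)\j: all neighbors of i excluding j
- Message: $m_{ij}(x_j) = \sum_{x_i} \psi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{ki}(x_i)$
- e.g., computing the marginal probability: $p(x_j) \propto \psi_j(x_j) \prod_{k \in N(j)} m_{kj}(x_j)$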

Belief Propagation (Pearl, 1983)
- i: node we're sending from; j: node we're sending to
- Start with i = leaf nodes of the undirected graph (nodes with one edge), for which N(i)\j = Ø
- Tree structure guarantees that each node i can collect messages from all of N(i)\j before passing a message on to j
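The schedule above can be sketched with a memoized recursion: a message from i to j is computed only after the messages into i from N(i)\j, and leaves have none to wait for. This is a sketch under assumptions, not Pearl's original formulation: it assumes a pairwise model with unary potentials, and the function name `bp_marginals`, the dictionaries `unary` and `pairwise`, and the four-node example tree are all hypothetical.

```python
import numpy as np

def bp_marginals(unary, pairwise):
    """Sum-product on a tree.

    unary: {node: vector over states}
    pairwise: {(i, j): matrix indexed [x_i, x_j]}, one entry per tree edge
    """
    nbrs = {i: set() for i in unary}
    for i, j in pairwise:
        nbrs[i].add(j)
        nbrs[j].add(i)

    def psi(i, j):
        # Potential oriented as [x_i, x_j], whichever way the edge was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    msgs = {}

    def message(i, j):
        if (i, j) not in msgs:
            incoming = unary[i].copy()
            for k in nbrs[i] - {j}:          # N(i)\j, empty for leaves
                incoming = incoming * message(k, i)
            msgs[(i, j)] = incoming @ psi(i, j)   # sum over x_i
        return msgs[(i, j)]

    marginals = {}
    for j in unary:
        belief = unary[j].copy()
        for k in nbrs[j]:
            belief = belief * message(k, j)
        marginals[j] = belief / belief.sum()
    return marginals

# Hypothetical tree: 1 - 2, 2 - 3, 2 - 4, with three states per node.
rng = np.random.default_rng(2)
K = 3
unary = {i: rng.random(K) + 0.1 for i in (1, 2, 3, 4)}
pairwise = {e: rng.random((K, K)) + 0.1 for e in [(1, 2), (2, 3), (2, 4)]}
marg = bp_marginals(unary, pairwise)

# Brute-force check against the full joint.
joint = np.einsum("a,b,c,d,ab,bc,bd->abcd",
                  unary[1], unary[2], unary[3], unary[4],
                  pairwise[(1, 2)], pairwise[(2, 3)], pairwise[(2, 4)])
brute2 = joint.sum(axis=(0, 2, 3))
brute2 /= brute2.sum()
brute3 = joint.sum(axis=(0, 1, 3))
brute3 /= brute3.sum()
```

Because messages are cached, each directed edge is computed once and reused for every marginal, which is the advantage over rerunning variable elimination per query.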

Computing MAP Probability Same operation with summation replaced by max
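A small sketch of the max version on a three-variable chain x1 - x2 - x3. The chain, the potentials `psi12` and `psi23`, and the backpointer arrays are assumptions for illustration; recovering the maximizing assignment needs the backpointers in addition to the messages.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 3
psi12 = rng.random((K, K))   # indexed [x1, x2]
psi23 = rng.random((K, K))   # indexed [x2, x3]

# Same recursion as sum-product, with max in place of sum.
m12 = psi12.max(axis=0)              # m12(x2) = max_{x1} psi12(x1, x2)
back1 = psi12.argmax(axis=0)         # best x1 for each x2
scores23 = m12[:, None] * psi23      # indexed [x2, x3]
m23 = scores23.max(axis=0)           # m23(x3) = max_{x2} m12(x2) psi23(x2, x3)
back2 = scores23.argmax(axis=0)      # best x2 for each x3

# Decode by following backpointers from the last variable.
x3 = int(m23.argmax())
x2 = int(back2[x3])
x1 = int(back1[x2])
map_bp = (x1, x2, x3)

# Brute-force check over the full (unnormalized) joint.
joint = psi12[:, :, None] * psi23[None, :, :]
map_brute = tuple(int(v) for v in np.unravel_index(joint.argmax(), joint.shape))
```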

Polytrees
- Exact inference via belief propagation and variable elimination is possible for polytrees
- Polytree: directed graph with at most one undirected path between any two vertices, i.e., a DAG with no undirected cycles
- If there were undirected cycles, message passing would produce infinite loops

Efficiency of Belief Propagation
With trees, BP terminates after two steps:
- 1 step to propagate information from outside in
- 1 step to propagate information from inside out
- boils down to a calculation like variable elimination over all eliminations
With polytrees, belief propagation converges in time
- linearly related to the diameter of the net, but multiple iterations are required (not 1 pass as for trees)
- polynomial in the number of states of each variable
diameter = longest path between two leaves

Inference Techniques
Exact inference: variable elimination; belief propagation (polytrees); junction tree algorithm (arbitrary graphs); Kalman filter; Adams & MacKay changepoint
Approximate inference: loopy belief propagation; rejection sampling; importance sampling; Markov chain Monte Carlo (MCMC); Gibbs sampling; variational approximations; expectation maximization (forward-backward algorithm); particle filters
Some of these techniques are covered later in the semester.

Junction Tree Algorithm
Works for general graphs
- not just trees, but also graphs with cycles
- both directed and undirected graphs
Basic idea
- Eliminate cycles by clustering nodes into cliques
- Perform belief propagation on cliques
Exact inference of (clique) marginals

Junction Tree Algorithm
1. Moralization: if the graph is directed, turn it into an undirected graph by linking the parents of each node and dropping arrows
2. Triangulation: decide on an elimination order. Imagine removing nodes in order and adding a link between the remaining neighbors of node i when node i is removed. E.g., elimination order (5, 4, 3, 2)
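Steps 1 and 2 can be sketched as two small graph routines. The function names `moralize` and `triangulate`, the example DAG, and the 4-cycle are hypothetical; the graph from the slide figure is not available in the transcript.

```python
from itertools import combinations

def moralize(parents):
    """parents: {node: list of parents}. Returns undirected edges as frozensets."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))      # drop the arrow
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))          # link co-parents
    return edges

def triangulate(edges, order):
    """Remove nodes in `order`, linking the remaining neighbors of each.

    Returns (fill_edges, maximal_cliques).
    """
    nbrs = {}
    for e in edges:
        u, v = tuple(e)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    fill, cliques = set(), []
    for node in order:
        ns = nbrs.pop(node, set())
        cliques.append(frozenset(ns | {node}))
        for a, b in combinations(sorted(ns), 2):
            if b not in nbrs[a]:
                fill.add(frozenset((a, b)))
                nbrs[a].add(b)
                nbrs[b].add(a)
        for a in ns:
            nbrs[a].discard(node)
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    return fill, maximal

# Hypothetical DAG: A -> B, A -> C, B -> D, C -> D. Moralization links B and C.
moral = moralize({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]})

# Undirected 4-cycle 1-2-3-4-1: eliminating node 1 first adds the chord 2-4.
cycle = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}
fill, cliques = triangulate(cycle, [1, 2, 3, 4])
```

The maximal cliques returned here are the nodes of the junction tree built in the next step.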

Junction Tree Algorithm
3. Construct the junction tree
- one node for every maximal clique
- form a maximum-weight spanning tree of cliques, weighting each edge by the number of variables the two cliques share
- a clique tree is a junction tree if, for every pair of cliques V and W, all cliques on the (unique) path between V and W contain V∩W
- if this property holds, then local propagation of information will lead to global consistency
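A sketch of step 3, assuming the maximal cliques are already known. `junction_tree` builds a maximum-weight spanning tree with Kruskal's algorithm, and `has_running_intersection` checks the junction tree property in an equivalent form: for every variable, the cliques containing it must form a connected subtree. Both function names and the example cliques are hypothetical.

```python
from itertools import combinations

def junction_tree(cliques):
    """Kruskal maximum-weight spanning tree; edge weight = |C_i ∩ C_j|."""
    candidates = sorted(
        ((len(a & b), i, j)
         for (i, a), (j, b) in combinations(enumerate(cliques), 2) if a & b),
        reverse=True,
    )
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

def has_running_intersection(cliques, tree):
    """True if, for each variable, the cliques holding it are connected in `tree`."""
    for v in set().union(*cliques):
        holders = {i for i, c in enumerate(cliques) if v in c}
        sub_edges = [(i, j) for i, j in tree if i in holders and j in holders]
        # A sub-forest on n nodes is connected exactly when it has n - 1 edges.
        if len(sub_edges) != len(holders) - 1:
            return False
    return True

cliques = [frozenset("ABC"), frozenset("BCD"), frozenset("CDE")]
tree = junction_tree(cliques)
ok = has_running_intersection(cliques, tree)

# A spanning tree that is not a junction tree: ABC - CDE - BCD.
# B is in ABC and BCD, but not in CDE, which lies on the path between them.
bad_tree = [(0, 2), (2, 1)]
bad = has_running_intersection(cliques, bad_tree)
```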

Junction Tree Algorithm
[Figure: two clique trees, one that is a junction tree and one that is not]

Junction Tree Algorithm
4. Transfer the potentials from the original graph to the moral graph
- define a potential for each clique, ψ_C(x_C)

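Junction Tree Algorithm
5. Propagate
- Given the junction tree and potentials on the cliques, can send messages from clique C_i to C_j
- S_ij: nodes shared by i and j
- N(i): neighboring cliques of i
- Message: $m_{ij}(x_{S_{ij}}) = \sum_{x_{C_i \setminus S_{ij}}} \psi_{C_i}(x_{C_i}) \prod_{k \in N(i) \setminus j} m_{ki}(x_{S_{ki}})$
- Messages get sent in all directions
- Once messages have propagated, can determine the marginal probability of any clique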

Computational Complexity of Exact Inference
- Exponential in the number of nodes in a clique, because we need to sum (or integrate) over all joint states of those nodes
- Goal is to find a triangulation that yields the smallest maximal clique, which is an NP-hard problem
- → Approximate inference

Loopy Belief Propagation
Instead of making a single pass from the leaves, perform iterative refinement of the message variables:
1. initialize all message variables to 1
2. recompute each message variable assuming the current values of the other variables
3. iterate until convergence
For polytrees, guaranteed to converge in time ~ longest undirected path through the tree.
For general graphs, some sufficient conditions for convergence are known, and on some graphs it is known not to converge.
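The iteration can be sketched on a single loop of four binary variables. Everything specific here is an assumption for illustration: the cycle graph, the potentials `unary` and `pair`, the deliberately weak couplings (chosen so the approximation is close), the synchronous update schedule, and the stopping tolerance.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]                    # one undirected cycle
unary = {i: rng.random(K) + 0.5 for i in nodes}
pair = {e: 1.0 + 0.2 * rng.random((K, K)) for e in edges}   # indexed [x_i, x_j]

def psi(i, j):
    return pair[(i, j)] if (i, j) in pair else pair[(j, i)].T

nbrs = {i: [j for j in nodes if (i, j) in pair or (j, i) in pair] for i in nodes}

# Initialize all messages to 1, then recompute every message from the old values.
msgs = {(i, j): np.ones(K) for i in nodes for j in nbrs[i]}
for _ in range(100):
    new = {}
    for (i, j) in msgs:
        prod = unary[i].copy()
        for k in nbrs[i]:
            if k != j:
                prod = prod * msgs[(k, i)]
        m = prod @ psi(i, j)
        new[(i, j)] = m / m.sum()        # normalize for numerical stability
    delta = max(np.abs(new[key] - msgs[key]).max() for key in msgs)
    msgs = new
    if delta < 1e-10:
        break

beliefs = {}
for j in nodes:
    b = unary[j].copy()
    for k in nbrs[j]:
        b = b * msgs[(k, j)]
    beliefs[j] = b / b.sum()

# Exact marginal of node 0 by brute force, for comparison.
joint = np.einsum("a,b,c,d,ab,bc,cd,da->abcd",
                  unary[0], unary[1], unary[2], unary[3],
                  pair[(0, 1)], pair[(1, 2)], pair[(2, 3)], pair[(3, 0)])
exact0 = joint.sum(axis=(1, 2, 3))
exact0 /= exact0.sum()
```

The beliefs are approximations, not exact marginals, because evidence circulates around the loop and is counted more than once. With stronger couplings the gap to `exact0` would generally grow.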