Slide 1 Directed Graphical Probabilistic Models: inference William W. Cohen Machine Learning 10-601 Feb 2008.

Slide 2 REVIEW OF DIRECTED GRAPHICAL MODELS AND D-SEPARATION

Slide 3 Summary of Monday (1): Bayes nets. Many problems can be solved using the joint probability P(X1,…,Xn). Bayes nets describe a way to compactly write the joint. [Figure: the Monty Hall Bayes net — A: first guess, B: the money, C: the goat, D: stick or swap?, E: second guess — with CPTs P(A), P(B), P(C|A,B), and P(E|A,C,D).] Conditional independence: each node is conditionally independent of its non-descendants given its parents.

Slide 4 Conditional Independence. Independence: P(X,Y) = P(X)P(Y). Conditional independence: P(X,Y|Z) = P(X|Z)P(Y|Z). Claim: if X is conditionally independent of Y given Z, then P(X|Y,Z) = P(X|Z) (a fancy version of the chain rule, then the definition of conditional independence). "It's really a lot like independence."
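The slide's equations were images and did not survive the transcript; the following is a reconstruction of the standard definitions and of the two-step derivation that the annotations ("fancy version of the chain rule", "def'n of cond. indep.") point to:

```latex
\begin{align*}
&\text{Independence: } P(X,Y) = P(X)\,P(Y) \\
&\text{Conditional independence: } P(X,Y \mid Z) = P(X \mid Z)\,P(Y \mid Z) \\
&\text{Claim (if $X$ is conditionally independent of $Y$ given $Z$):} \\
P(X \mid Y,Z) &= \frac{P(X,Y \mid Z)}{P(Y \mid Z)}
      && \text{(fancy version of the chain rule)} \\
 &= \frac{P(X \mid Z)\,P(Y \mid Z)}{P(Y \mid Z)}
      && \text{(definition of conditional independence)} \\
 &= P(X \mid Z)
\end{align*}
```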

Slide 5 Summary of Monday (2): d-separation. [Figure: the three ways a path from X to Y can be blocked by evidence E, each involving an intermediate node Z.] There are three ways paths from X to Y given evidence E can be blocked. X is d-separated from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then X is conditionally independent of Y given E.

Slide 6 d-separation continued… [Figure: chain X → E → Y, with CPTs P(X), P(E|X), P(Y|E).] Question: is X independent of Y (with E unobserved)? It depends… on the CPTs. This is why d-separation implies conditional independence but not the converse…

Slide 7 d-separation continued… [Figure: chain X → E → Y.] Question: is X conditionally independent of Y given E? Yes!

Slide 8 d-separation continued… [Figure: chain X → E → Y.] Question: is X conditionally independent of Y given E? Yes! [Derivation via Bayes rule, a fancier version of Bayes rule, and the result from the previous slide.]

Slide 9 d-separation. [Figure: the three blocking configurations for a path from X to Y given evidence E, each involving an intermediate node Z.]

Slide 10 d-separation. [Figure: the three blocking configurations for a path from X to Y given E, with the remaining cases marked with question marks.]

Slide 11 d-separation continued… [Figure: common cause E → X, E → Y.] Question: is X conditionally independent of Y given E? Yes!

Slide 12 d-separation continued… [Figure: common cause E → X, E → Y, with CPT P(E).] Question: is X independent of Y (with E unobserved)? No.

Slide 13 d-separation. [Figure: the three blocking configurations for a path from X to Y given E, with the remaining cases marked with question marks.]

Slide 14 d-separation continued… [Figure: common effect X → E ← Y.] Question: is X independent of Y (with E unobserved)? Yes!

Slide 15 d-separation continued… [Figure: common effect X → E ← Y, with CPT P(E|X,Y) and joint P(E,X,Y).] Question: is X conditionally independent of Y given E?

Slide 16 d-separation continued… [Figure: common effect X → E ← Y, with CPT P(E|X,Y) and joint P(E,X,Y).] Question: is X conditionally independent of Y given E? No!

Slide 17 d-separation continued… [Figure: common effect X → E ← Y, with CPT P(E|X,Y) and joint P(E,X,Y).] Question: is X conditionally independent of Y given E? No!

Slide 18 "Explaining away". [Figure: common effect X → E ← Y; X independent of Y: YES, X conditionally independent of Y given E: NO.] This is "explaining away": E is a common symptom of two causes, X and Y. After observing E=1, both X and Y become more probable. After observing E=1 and X=1, Y becomes less probable (compared to just E=1), since X alone is enough to "explain" E=1.
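A tiny numeric sketch of this effect by brute-force enumeration; the CPT values below are invented for illustration, not taken from the slides:

```python
# "Explaining away": X and Y are independent binary causes of a common effect E.
P_X1, P_Y1 = 0.1, 0.1                          # P(X=1), P(Y=1)
P_E1 = {(0, 0): 0.01, (1, 0): 0.9,             # P(E=1 | X=x, Y=y)
        (0, 1): 0.9,  (1, 1): 0.99}

def joint(x, y, e):
    px = P_X1 if x else 1 - P_X1
    py = P_Y1 if y else 1 - P_Y1
    pe = P_E1[(x, y)] if e else 1 - P_E1[(x, y)]
    return px * py * pe

def prob_y1(**evidence):
    """P(Y=1 | evidence) by brute-force enumeration of the joint."""
    worlds = [(x, y, e) for x in (0, 1) for y in (0, 1) for e in (0, 1)
              if evidence.get("x", x) == x and evidence.get("e", e) == e]
    den = sum(joint(*w) for w in worlds)
    num = sum(joint(*w) for w in worlds if w[1] == 1)
    return num / den

print("P(Y=1)            = %.3f" % prob_y1())          # the prior, 0.100
print("P(Y=1 | E=1)      = %.3f" % prob_y1(e=1))       # goes up after seeing E=1
print("P(Y=1 | E=1, X=1) = %.3f" % prob_y1(e=1, x=1))  # drops once X=1 explains E
```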

Slide 19 INFERENCE IN DGM from: Russell and Norvig

Slide 20 Great Ideas in ML: Message Passing — count the soldiers. [Figure: soldiers standing in a line; each passes "N behind you" messages forward ("1 behind you", "2 behind you", …, "5 behind you") and "N before you" messages backward ("1 before you", …, "5 before you"), plus "there's 1 of me".] Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 21 Great Ideas in ML: Message Passing — count the soldiers. [Figure: a soldier in the line receives "2 before you" from one side and "3 behind you" from the other, and only sees its incoming messages.] Belief: must be 2 + 3 + 1 = 6 of us. Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 22 Great Ideas in ML: Message Passing — count the soldiers. [Figure: a different soldier receives "1 before you" and "4 behind you", and only sees its incoming messages.] Belief: must be 1 + 4 + 1 = 6 of us (the previous soldier likewise computed 2 + 3 + 1 = 6). Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 23 Great Ideas in ML: Message Passing — each soldier receives reports from all branches of the tree. [Figure: tree of soldiers; one branch reports "7 here", another "3 here", so the outgoing message is "11 here" (= 7 + 3 + 1 of me).] Adapted from MacKay (2003) textbook.

Slide 24 Great Ideas in ML: Message Passing — each soldier receives reports from all branches of the tree. [Figure: two branches report "3 here" each, so the outgoing message is "7 here" (= 3 + 3 + 1).] Adapted from MacKay (2003) textbook.

Slide 25 Great Ideas in ML: Message Passing — each soldier receives reports from all branches of the tree. [Figure: branches report "7 here" and "3 here"; the outgoing message is "11 here" (= 7 + 3 + 1).] Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 26 Great Ideas in ML: Message Passing — each soldier receives reports from all branches of the tree. [Figure: a soldier receives "7 here" and "3 here" from its branches.] Belief: must be 14 of us. Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 27 Great Ideas in ML: Message Passing — each soldier receives reports from all branches of the tree. [Figure: the same counting example; belief: must be 14 of us.] This wouldn't work correctly with a "loopy" (cyclic) graph. Adapted from MacKay (2003) textbook. Thanks Matt Gormley.
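A minimal Python sketch of the soldier-counting idea (the tree and node names below are invented): each node tells a neighbor how many soldiers are on its own side of that edge, and a node's belief is the sum of its incoming messages plus one. As the slide notes, this recursion would not terminate on a cyclic graph.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical tree of "soldiers" (undirected edges); any tree works.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

@lru_cache(maxsize=None)
def count_message(src, dst):
    """Message src -> dst: how many soldiers are on src's side of the edge."""
    return 1 + sum(count_message(n, src) for n in neighbors[src] if n != dst)

# A node's belief is the sum of its incoming messages, plus one for itself.
for node in sorted(neighbors):
    belief = 1 + sum(count_message(n, node) for n in neighbors[node])
    print(node, "believes there are", belief, "of us")  # prints 5 for every node
```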

Slide 28 Message Passing and Inference. Message passing: – Handles the "simple" (and tractable) case of polytrees exactly – Handles intractable cases approximately – Often an important part of exact algorithms for more complex cases

Slide 29 Message Passing and Counting. Message passing is almost like counting: instead of passing counts, we'll be passing probability distributions (beliefs) of various types.

Slide 30 Inference in Bayes Nets: the belief propagation algorithm.

Slide 31 Inference in Bayes nets. General problem: given evidence E1,…,Ek, compute P(X|E1,…,Ek) for any X. Big assumption: the graph is a "polytree" (at most one undirected path between any two nodes X, Y). Notation: [Figure: node X with parents U1, U2, children Y1, Y2, and the children's other parents Z1, Z2.]
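The polytree condition is equivalent to the undirected skeleton of the DAG being acyclic, which a union-find pass can check; a small sketch (the helper name is made up, the example net is the Monty Hall net from Slide 3, which the later slides note is not a polytree):

```python
# Check the polytree assumption: the undirected skeleton of the DAG must be
# acyclic, i.e. at most one undirected path between any pair of nodes.
def is_polytree(nodes, directed_edges):
    parent = {n: n for n in nodes}           # union-find over the skeleton
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression
            x = parent[x]
        return x
    for u, v in directed_edges:
        ru, rv = find(u), find(v)
        if ru == rv:                         # this edge closes an undirected cycle
            return False
        parent[ru] = rv
    return True

# Monty Hall net from the earlier slides: A,B -> C and A,C,D -> E.  Not a polytree.
print(is_polytree("ABCDE", [("A","C"),("B","C"),("A","E"),("C","E"),("D","E")]))
```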

Slide 32 Great Ideas in ML: Message Passing (repeated). [Figure: the tree-counting example; belief: must be 14 of us.] Each soldier receives reports from all branches of the tree; this wouldn't work correctly with a "loopy" (cyclic) graph. Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Slide 33 Hot* or not? [Figure: the Monty Hall Bayes net — A: first guess, B: the money, C: the goat, D: stick or swap?, E: second guess.] *Hot = polytree (at most one undirected path between every pair of nodes).

Slide 34 Inference in Bayes nets: P(X|E). [Figure: the evidence E splits into E+ (causal support, reaching X through its parents) and E- (evidential support, reaching X through its children).]

Slide 35 Inference in Bayes nets: P(X|E+). [Derivation: condition on X's parents, apply d-separation twice, and write the parent term as a product.]

Slide 36 Inference in Bayes nets: P(X|E+). [Annotated derivation: P(X|u1,…,um) is a CPT table lookup; each parent term is a recursive call to P(·|E+) using the evidence for Uj that doesn't go through X.] So far: a simple way of propagating requests for "belief due to causal evidence" up the tree, i.e. info on Pr(X|E+) flows down.
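The equation itself was an image and did not survive the transcript; the annotations above (CPT table lookup, recursive call, evidence for Uj that doesn't go through X) match the standard Pearl-style decomposition, reconstructed here as a sketch:

```latex
% Causal support: condition on X's parents U_1,...,U_m, apply d-separation,
% and write the parent term as a product (polytree assumption).
\[
P(X \mid E^+)
  = \sum_{u_1,\dots,u_m}
    \underbrace{P(X \mid u_1,\dots,u_m)}_{\text{CPT table lookup}}
    \prod_{j}
    \underbrace{P\bigl(u_j \mid E^+_{U_j \setminus X}\bigr)}_{\text{recursive call to } P(\cdot \mid E^+)}
\]
```

Here E+_{Uj\X} denotes the evidence connected to Uj other than through X.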

Slide 37 Inference in Bayes nets: P(X|E)

Slide 38 Inference in Bayes nets: P(E-|X). [Derivation: simplified using d-separation plus the polytree structure; each factor is a recursive call to P(E-|·).] So far: a simple way of propagating requests for "belief due to evidential support" down the tree, i.e. info on Pr(E-|X) flows up.

Slide 39 Inference in Bayes nets: P(E-|X). [Derivation: simplified; recursive calls to P(E-|·) and to P(·|E+).] The usual implementation is message passing: – Send values for P(E-|X) up the tree (vertex to parent), waiting for all of a vertex's children's messages before sending – Send values for P(X|E+) down the tree (parent to child), waiting for all of a vertex's parents' messages before sending – Compute P(X|E) after all messages to X are received.

Slide 40 Inference in Bayes nets: P(E-|X). [Derivation: apply our decomposition, then d-separation, yielding the recursion.]

Slide 41 Inference in Bayes nets: P(E-|X).

Slide 42 Inference in Bayes nets: P(E-|X). [Annotated derivation: the expression combines a recursive call to P(E-|·), a CPT, and a recursive call to P(·|E).]
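As with the causal-support slide, the formula was an image; the annotations (recursive call to P(E-|·), CPT, recursive call to P(·|E)) and the recap on the next slide describe the standard form, reconstructed here as a sketch:

```latex
% Evidential support: factor over X's children Y_1,...,Y_n (polytree), then
% expand each child Y_k over its value y_k and its other parents Z.
\[
P(E^- \mid X)
  = \prod_{k} \sum_{y_k} \sum_{z}
    \underbrace{P\bigl(E^-_{Y_k} \mid y_k\bigr)}_{\text{recursive call to } P(E^- \mid \cdot)}
    \;\underbrace{P(y_k \mid X, z)}_{\text{CPT}}
    \;\underbrace{P\bigl(z \mid E_{Z \setminus Y_k}\bigr)}_{\text{recursive call, a simpler instance of } P(X \mid E)}
\]
```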

Slide 43 Recap: Message Passing for BP. We reduced P(X|E) to a product of two recursively calculated parts: P(X=x|E+), i.e., the CPT for X combined with a product of "forward" messages from its parents; and P(E-|X=x), i.e., a combination of "backward" messages from its children, CPTs, and P(Z|E_{Z\Yk}), a simpler instance of P(X|E). This can also be implemented by message passing (belief propagation). Messages are distributions, i.e., vectors.

Slide 44 Recap: Message Passing for BP. Top-level algorithm: – Pick one vertex as the "root"; any node with only one edge is a "leaf" – Pass messages from the leaves to the root – Pass messages from the root to the leaves – Now every X has received P(X|E+) and P(E-|X) and can compute P(X|E).
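Not the full polytree algorithm from the slides, just a minimal numeric sketch of the P(X|E+)/P(E-|X) split on a three-node chain X1 → X2 → X3 with invented CPTs, checked against brute-force enumeration:

```python
import numpy as np

# Two-part BP sketch on a chain Bayes net X1 -> X2 -> X3 (binary variables).
P_X1 = np.array([0.6, 0.4])                      # P(X1)
P_X2_given_X1 = np.array([[0.7, 0.3],            # rows: x1, cols: x2
                          [0.2, 0.8]])
P_X3_given_X2 = np.array([[0.9, 0.1],            # rows: x2, cols: x3
                          [0.4, 0.6]])

# Evidence: X3 = 1.  Query: P(X2 | X3 = 1).
evidence_x3 = 1

# "Causal support" message flowing down from X1 to X2: pi(x2) = P(x2 | E+)
pi_x2 = P_X1 @ P_X2_given_X1                     # sum_x1 P(x1) P(x2 | x1)

# "Evidential support" message flowing up from X3 to X2: lambda(x2) = P(E- | x2)
lam_x2 = P_X3_given_X2[:, evidence_x3]           # P(X3 = 1 | x2)

# Combine and normalize: P(X2 | E) is proportional to P(X2 | E+) * P(E- | X2)
belief_x2 = pi_x2 * lam_x2
belief_x2 /= belief_x2.sum()
print("P(X2 | X3=1) =", belief_x2)

# Sanity check against brute-force enumeration of the joint P(x1, x2, x3).
joint = P_X1[:, None, None] * P_X2_given_X1[:, :, None] * P_X3_given_X2[None, :, :]
brute = joint[:, :, evidence_x3].sum(axis=0)
print("brute force  =", brute / brute.sum())
```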

Slide 45 From Russell and Norvig

Slide 46 [Figure: a chain of variables annotated with probabilities such as Pr(X2=1|X1=1) and Pr(X3=1|X2=2).] Inference ~ counting the total weight of paths from E to X, combining evidential support and causal support.

Slide 47 More on message passing/BP. BP for other graphical models: – Markov networks – Factor graphs. BP for non-polytrees: – Small tree-width graphs – Loopy BP

Slide 48 Markov blanket. The Markov blanket for a random variable A is the set of variables B1,…,Bk that A is not conditionally independent of, i.e., not d-separated from, given all the other variables. For a DGM this includes A's parents, children, and "co-parents" (the other parents of A's children). Why co-parents? Explaining away.
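A small helper illustrating the parents-children-co-parents rule (the function name is made up; the DAG is the Monty Hall net from Slide 3, given as a child-to-parents mapping):

```python
# Markov blanket in a DAG: parents, children, and co-parents (other parents of
# the node's children).  The graph is given as a mapping child -> set of parents.
def markov_blanket(node, parents):
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]}
    return (parents.get(node, set()) | children | coparents) - {node}

# Monty Hall net from the slides: C's parents are {A, B}; E's parents are {A, C, D}.
dag = {"C": {"A", "B"}, "E": {"A", "C", "D"}}
print(markov_blanket("A", dag))   # {'B', 'C', 'D', 'E'}
print(markov_blanket("B", dag))   # {'A', 'C'}
```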

Slide 49 Markov network. A Markov network is a set of random variables in an undirected graph where each variable is conditionally independent of all other variables given its neighbors. E.g.: [an example independence statement read off the graph].

Slide 50 Markov networks. A Markov network is a set of random variables in an undirected graph where each variable is conditionally independent of all other variables given its neighbors. Instead of CPTs there are clique potentials φC, one per clique C of the graph. So the Markov blanket and d-separation machinery is much simpler, but there's no "generative" story.

Slide 51 Markov networks. Instead of CPTs there are clique potentials, which define a joint distribution. [Figure: a potential table φC for each clique C, e.g. a potential over the clique {A,B,D}, mapping each assignment of the clique's variables to a nonnegative value.]
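The joint the slide refers to is the standard normalized product of clique potentials:

```latex
% Joint distribution of a Markov network with clique potentials \phi_C:
\[
P(x_1,\dots,x_n) \;=\; \frac{1}{Z} \prod_{\text{cliques } C} \phi_C(x_C),
\qquad
Z \;=\; \sum_{x} \prod_{\text{cliques } C} \phi_C(x_C)
\]
```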

Slide 52 More on message passing/BP. BP can be extended to other graphical models: – Markov networks, if they are polytrees – "Factor graphs", which are a generalization of both Markov networks and DGMs (arguably cleaner than DGM BP). BP for non-polytrees: – Small tree-width graphs – Loopy BP

Slide 53 Small tree-width graphs. [Figure, left: the Monty Hall net (A: first guess, B: the money, C: the goat, D: stick or swap?, E: second guess) is not a polytree. Right: merging A, C, E into a single variable ACE (guesses A, E and the goat C) with parents B (the money) and D (stick or swap?) gives a polytree, with a larger CPT P(ACE|B,D) enumerating all joint values of A, C, E.]

Slide 54 Small tree-width graphs. [Figure: the same transformation as the previous slide — the original Monty Hall net is not a polytree, while the version with the merged variable ACE is — shown with the corresponding potential tables.]

Slide 55 More on message passing/BP. BP can be extended to other graphical models: – Markov networks, if they are polytrees – "Factor graphs". BP for non-polytrees: – Small tree-width graphs: convert to polytrees and run normal BP – Loopy BP

Slide 56 Great Ideas in ML: Message Passing (repeated). [Figure: the tree-counting example; belief: must be 14 of us.] Each soldier receives reports from all branches of the tree; this wouldn't work correctly with a "loopy" (cyclic) graph. Adapted from MacKay (2003) textbook. Thanks Matt Gormley.

Recap: Loopy BP. Top-level algorithm: – Initialize every X's messages P(X|E+) and P(E-|X) to vectors of all 1's. – Repeat: have every X send its children/parents messages (which will be incorrect at first), and for every X, update P(X|E+) and P(E-|X) based on the last messages received (also incorrect at first). – In a tree this will eventually converge to the right values. – In a general graph it might converge; it's non-trivial to predict when and if, but… it's often a good approximation.
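The slides describe loopy BP in terms of the DGM's P(X|E+)/P(E-|X) messages; as a hedged illustration of the same initialize-to-ones, iterate-until-(maybe)-convergence idea, here is a minimal sum-product loopy BP sketch on a tiny pairwise Markov network (the graph, names, and potential values are all invented):

```python
import numpy as np

# Loopy BP sketch on a small pairwise Markov network over binary variables.
# The graph is a 3-cycle A-B-C, so it is genuinely "loopy".
nodes = ["A", "B", "C"]
node_pot = {v: np.array([1.0, 2.0]) for v in nodes}       # unary potentials
pair = np.array([[2.0, 1.0],
                 [1.0, 2.0]])                              # phi(x_u, x_v)
edge_pot = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "A")]:
    edge_pot[(u, v)] = pair
    edge_pot[(v, u)] = pair.T                              # same potential, other direction

def neighbors(v):
    return [u for (u, w) in edge_pot if w == v]

# Messages m[(u, v)]: message from u to v, initialized to all 1's.
msgs = {k: np.ones(2) for k in edge_pot}

for iteration in range(100):
    new_msgs = {}
    for (u, v) in msgs:
        # Product of u's unary potential and all incoming messages except v's.
        prod = node_pot[u].copy()
        for w in neighbors(u):
            if w != v:
                prod = prod * msgs[(w, u)]
        m = edge_pot[(u, v)].T @ prod                      # sum over x_u
        new_msgs[(u, v)] = m / m.sum()                     # normalize for stability
    converged = all(np.allclose(new_msgs[k], msgs[k]) for k in msgs)
    msgs = new_msgs
    if converged:
        break

# Approximate marginals ("beliefs"): unary potential times all incoming messages.
for v in nodes:
    b = node_pot[v].copy()
    for w in neighbors(v):
        b = b * msgs[(w, v)]
    print(v, b / b.sum())
```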