DAGs, I-Maps, Factorization, d-Separation, Minimal I-Maps, Bayesian Networks
Slides by Nir Friedman

Probability Distributions
- Let X_1, …, X_n be random variables
- Let P be a joint distribution over X_1, …, X_n
- If the variables are binary, then we need O(2^n) parameters to describe P. Can we do better?
- Key idea: use properties of independence

Independent Random Variables
- Two variables X and Y are independent if P(X = x | Y = y) = P(X = x) for all values x, y. That is, learning the value of Y does not change the prediction of X.
- If X and Y are independent, then P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
- In general, if X_1, …, X_n are independent, then P(X_1, …, X_n) = P(X_1) … P(X_n)
  - Requires only O(n) parameters
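A small illustrative sketch (Python, all numbers and names made up): with full independence, the joint over n binary variables is just the product of n one-parameter marginals.

```python
from itertools import product

# Marginals for three binary variables assumed independent (illustrative numbers).
marginals = {"X1": {0: 0.7, 1: 0.3},
             "X2": {0: 0.6, 1: 0.4},
             "X3": {0: 0.9, 1: 0.1}}

def joint(assignment):
    """Under full independence, P(X1=x1, X2=x2, X3=x3) = P(X1=x1) P(X2=x2) P(X3=x3)."""
    p = 1.0
    for var, val in assignment.items():
        p *= marginals[var][val]
    return p

# Three numbers (one per variable) describe all 2**3 joint entries; sanity check they sum to 1.
total = sum(joint(dict(zip(marginals, vals))) for vals in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-9
```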

Conditional Independence
- Unfortunately, most random variables of interest are not independent of each other
- A more suitable notion is that of conditional independence
- Two variables X and Y are conditionally independent given Z if P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all values x, y, z. That is, learning the value of Y does not change the prediction of X once we know the value of Z.
- Notation: Ind(X ; Y | Z)
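A hedged Python sketch (not from the slides) of how such a statement can be checked numerically on a small joint table; the example distribution, with X and Y as noisy copies of a fair coin Z, is invented for illustration.

```python
from collections import defaultdict
from itertools import product

def cond_indep(joint, tol=1e-9):
    """Check Ind(X ; Y | Z) for a joint given as {(x, y, z): prob} over ALL value combinations.

    Uses the equivalent product form: P(x, y, z) * P(z) == P(x, z) * P(y, z) for all x, y, z.
    """
    pxz, pyz, pz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pxz[(x, z)] += p
        pyz[(y, z)] += p
        pz[z] += p
    return all(abs(p * pz[z] - pxz[(x, z)] * pyz[(y, z)]) < tol
               for (x, y, z), p in joint.items())

# Hypothetical example: Z is a fair coin; X and Y are independent noisy copies of Z.
joint = {(x, y, z): 0.5 * (0.8 if x == z else 0.2) * (0.7 if y == z else 0.3)
         for x, y, z in product((0, 1), repeat=3)}
print(cond_indep(joint))  # True: Ind(X ; Y | Z), although X and Y are marginally dependent
```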

Example: Family Trees
- Noisy stochastic process: a pedigree
- A node represents an individual's genotype
- Modeling assumption: ancestors can affect a descendant's genotype only by passing genetic material through the intermediate generations
(Figure: pedigree of Homer, Marge, Bart, Lisa, Maggie)

Markov Assumption
- We now make this independence assumption more precise for directed acyclic graphs (DAGs)
- Each random variable X is independent of its non-descendants, given its parents Pa(X)
- Formally, Ind(X ; NonDesc(X) | Pa(X))
(Figure: a node X with parents Y_1, Y_2, plus an ancestor, a non-descendant, and a descendant)

Markov Assumption Example
In this example (as listed by the sketch below):
- Ind(E ; B)
- Ind(B ; E, R)
- Ind(R ; A, B, C | E)
- Ind(A ; R | B, E)
- Ind(C ; B, E, R | A)
(Network: Earthquake, Burglary, Alarm, Radio, Call)
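A small Python sketch (not from the slides) that reads these statements off a DAG: for every node X it reports Ind(X ; NonDesc(X) \ Pa(X) | Pa(X)); run on the network above, it reproduces the list on this slide.

```python
def markov_independencies(g):
    """For each node X in DAG g (dict: node -> list of children), yield
    (X, non-descendants of X excluding Pa(X), Pa(X)), read as Ind(X ; NonDesc(X) | Pa(X))."""
    parents = {v: set() for v in g}
    for u, children in g.items():
        for c in children:
            parents[c].add(u)

    def descendants(node):
        seen, stack = set(), list(g.get(node, []))
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(g.get(n, []))
        return seen

    for x in g:
        nondesc = set(g) - {x} - descendants(x) - parents[x]
        yield x, sorted(nondesc), sorted(parents[x])

# The earthquake/burglary network from this slide.
g = {"B": ["A"], "E": ["A", "R"], "A": ["C"], "R": [], "C": []}
for x, nd, pa in markov_independencies(g):
    print(f"Ind({x}; {', '.join(nd) or '∅'} | {', '.join(pa) or '∅'})")
# e.g. Ind(B; E, R | ∅), Ind(R; A, B, C | E), Ind(C; B, E, R | A), ...
```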

I-Maps  A DAG G is an I-Map of a distribution P if the all Markov assumptions implied by G are satisfied by P (Assuming G and P both use the same set of random variables) Examples: XYXY

Factorization  Given that G is an I-Map of P, can we simplify the representation of P ? u Example:  Since Ind(X;Y), we have that P(X|Y) = P(X)  Applying the chain rule P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)  Thus, we have a simpler representation of P(X,Y) XY

Factorization Theorem
Thm: if G is an I-Map of P, then P(X_1, …, X_n) = ∏_i P(X_i | Pa(X_i))
Proof:
- By the chain rule: P(X_1, …, X_n) = ∏_i P(X_i | X_1, …, X_{i-1})
- wlog, X_1, …, X_n is an ordering consistent with G
- From this assumption: Pa(X_i) ⊆ {X_1, …, X_{i-1}} and {X_1, …, X_{i-1}} \ Pa(X_i) ⊆ NonDesc(X_i)
- Since G is an I-Map, Ind(X_i ; NonDesc(X_i) | Pa(X_i))
- Hence, Ind(X_i ; {X_1, …, X_{i-1}} \ Pa(X_i) | Pa(X_i))
- We conclude, P(X_i | X_1, …, X_{i-1}) = P(X_i | Pa(X_i)), which gives the factorization above
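A numeric illustration of the key proof step (Python, with invented CPD numbers): for a joint built over the chain A → B → C, the conditional P(C | A, B) recomputed from the joint coincides with P(C | B) = P(C | Pa(C)).

```python
from itertools import product

# Illustrative CPDs for the chain A -> B -> C (all numbers made up).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}    # keyed by (a, b)
p_c_given_b = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.75}  # keyed by (b, c)
joint = {(a, b, c): p_a[a] * p_b_given_a[(a, b)] * p_c_given_b[(b, c)]
         for a, b, c in product((0, 1), repeat=3)}

def marginal(positions):
    """Sum the joint down to the given index positions of (A, B, C)."""
    out = {}
    for key, pr in joint.items():
        sub = tuple(key[i] for i in positions)
        out[sub] = out.get(sub, 0.0) + pr
    return out

p_ab, p_b, p_bc = marginal((0, 1)), marginal((1,)), marginal((1, 2))
for (a, b, c), pr in joint.items():
    # Key proof step: P(C=c | A=a, B=b) equals P(C=c | B=b), because Pa(C) = {B}
    # and A is a non-descendant of C.
    assert abs(pr / p_ab[(a, b)] - p_bc[(b, c)] / p_b[(b,)]) < 1e-9
```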

Factorization Example
P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)   (chain rule)
versus
P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)   (using the network structure)
(Network: Earthquake, Burglary, Alarm, Radio, Call)

Consequences
- We can write P in terms of "local" conditional probabilities
- If G is sparse, that is, |Pa(X_i)| < k:
  - each conditional probability can be specified compactly, e.g. for binary variables these require O(2^k) parameters
  - the representation of P is compact: linear in the number of variables
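A quick parameter count in Python for binary variables, using the earthquake/burglary structure from the earlier slides (the function name is mine): each node with k parents needs 2^k free numbers, versus 2^n - 1 for the full joint table.

```python
def bn_param_count(g):
    """Free parameters for binary variables: each node with k parents needs
    2**k numbers (one P(X = 1 | parent configuration) per configuration)."""
    n_parents = {v: 0 for v in g}
    for children in g.values():
        for c in children:
            n_parents[c] += 1
    return sum(2 ** k for k in n_parents.values())

# The earthquake/burglary network from the earlier slides.
g = {"B": ["A"], "E": ["A", "R"], "A": ["C"], "R": [], "C": []}
print(bn_param_count(g), "vs", 2 ** len(g) - 1)   # 10 vs 31 for the full joint table
```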

Conditional Independencies
- Let Markov(G) be the set of Markov independencies implied by G
- The decomposition (factorization) theorem shows: G is an I-Map of P ⇒ P(X_1, …, X_n) = ∏_i P(X_i | Pa(X_i))
- We can also show the opposite. Thm: if P(X_1, …, X_n) = ∏_i P(X_i | Pa(X_i)), then G is an I-Map of P

Proof (Outline)
(Example on the slide: a small network over X, Y, and Z)

Implied Independencies
- Does a graph G imply additional independencies as a consequence of Markov(G)?
- We can define a logic of independence statements
- We have already seen some axioms:
  - Ind(X ; Y | Z) ⇒ Ind(Y ; X | Z)
  - Ind(X ; Y_1, Y_2 | Z) ⇒ Ind(X ; Y_1 | Z)
- We can continue this list…

d-Separation
- A procedure d-sep(X; Y | Z, G) that, given a DAG G and sets X, Y, and Z, returns either yes or no
- Goal: d-sep(X; Y | Z, G) = yes iff Ind(X ; Y | Z) follows from Markov(G)

Paths
- Intuition: dependency must "flow" along paths in the graph
- A path is a sequence of neighboring variables
- Examples (in the earthquake network): R ← E → A ← B and C ← A ← E → R

Path Blockage
- We want to know when a path is:
  - active: creates dependency between the end nodes
  - blocked: cannot create dependency between the end nodes
- We want to classify the situations in which paths are active, given the evidence

Path Blockage
Three cases:
- Common cause (e.g., R ← E → A): blocked when the common cause E is in the evidence, active otherwise
(Figure: blocked vs. unblocked versions of the E, R, A fragment)

Path Blockage
Three cases:
- Common cause
- Intermediate cause (e.g., E → A → C): blocked when the intermediate node A is in the evidence, active otherwise
(Figure: blocked vs. unblocked versions of the E, A, C fragment)

Path Blockage
Three cases:
- Common cause
- Intermediate cause
- Common effect (e.g., E → A ← B): blocked when neither A nor any of its descendants (here C) is in the evidence; active when A or one of its descendants is observed
(Figure: blocked vs. unblocked versions of the E, B, A, C fragment)

Path Blockage: General Case
A path is active, given evidence Z, if
- whenever we have a common-effect configuration A → B ← C along the path, B or one of its descendants is in Z, and
- no other node on the path is in Z
A path is blocked, given evidence Z, if it is not active.

Example
- d-sep(R, B) = yes
(Network: Earthquake, Burglary, Alarm, Radio, Call)

Example
- d-sep(R, B) = yes
- d-sep(R, B | A) = no
(Network: Earthquake, Burglary, Alarm, Radio, Call)

Example
- d-sep(R, B) = yes
- d-sep(R, B | A) = no
- d-sep(R, B | E, A) = yes
(Network: Earthquake, Burglary, Alarm, Radio, Call)

d-Separation  X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z. u Checking d-separation can be done efficiently (linear time in number of edges) Bottom-up phase: Mark all nodes whose descendents are in Z X to Y phase: Traverse (BFS) all edges on paths from X to Y and check if they are blocked

Soundness
Thm:
- If G is an I-Map of P and d-sep(X; Y | Z, G) = yes,
- then P satisfies Ind(X ; Y | Z)
Informally: any independence reported by d-separation is satisfied by the underlying distribution.

Completeness
Thm:
- If d-sep(X; Y | Z, G) = no,
- then there is a distribution P such that G is an I-Map of P and P does not satisfy Ind(X ; Y | Z)
Informally: any independence not reported by d-separation might be violated by the underlying distribution; we cannot determine this by examining the graph structure alone.

I-Maps Revisited
- The fact that G is an I-Map of P might not be that useful
- For example, complete DAGs: a DAG G is complete if we cannot add an arc without creating a cycle
- These DAGs do not imply any independencies
- Thus, they are I-Maps of any distribution
(Figure: two complete DAGs over X_1, X_2, X_3, X_4)

Minimal I-Maps
A DAG G is a minimal I-Map of P if
- G is an I-Map of P, and
- if G' ⊂ G (G' is G with one or more arcs removed), then G' is not an I-Map of P
That is, removing any arc from G introduces (conditional) independencies that do not hold in P.

Minimal I-Map Example
- If the DAG shown is a minimal I-Map, then the graphs obtained from it by removing an arc are not I-Maps
(Figure: a DAG over X_1, …, X_4 and several variants, each with an arc removed)

Constructing Minimal I-Maps
The factorization theorem suggests an algorithm (sketched below):
- Fix an ordering X_1, …, X_n
- For each i, select Pa_i to be a minimal subset of {X_1, …, X_{i-1}} such that Ind(X_i ; {X_1, …, X_{i-1}} \ Pa_i | Pa_i)
- Clearly, the resulting graph is a minimal I-Map.
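A minimal Python sketch of this construction, assuming access to an independence oracle `indep(x, others, pa)` for the distribution P (the oracle and all names are mine, not from the slides); it searches predecessor subsets by increasing size, so it is only practical for a small number of variables.

```python
from itertools import combinations

def minimal_imap(order, indep):
    """Build a minimal I-Map for the fixed ordering `order`.

    `indep(x, others, pa)` answers whether Ind(x ; others | pa) holds in P; it must
    return True whenever `others` is empty.  For each X_i we search predecessor
    subsets by increasing size and keep the first one that makes X_i independent
    of the remaining predecessors.
    """
    graph = {x: [] for x in order}
    for i, x in enumerate(order):
        preds = list(order[:i])
        done = False
        for size in range(len(preds) + 1):
            for pa in combinations(preds, size):
                rest = [v for v in preds if v not in pa]
                if indep(x, rest, list(pa)):
                    for p in pa:
                        graph[p].append(x)
                    done = True
                    break
            if done:
                break
    return graph

# With a trivial oracle that declares everything independent, every Pa_i is empty and
# the result is the edgeless DAG; with a real independence test, different orderings
# can produce different minimal I-Maps (next slide).
print(minimal_imap(["X1", "X2", "X3"], lambda x, rest, pa: True))
```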

Non-Uniqueness of the Minimal I-Map
- Unfortunately, there may be several minimal I-Maps for the same distribution
- Applying the I-Map construction procedure with different orderings can lead to different structures
(Figure: the original I-Map of the alarm network vs. the minimal I-Map obtained with the ordering C, R, A, E, B)

P-Maps  A DAG G is P-Map (perfect map) of a distribution P if Ind(X; Y | Z) if and only if d-sep(X; Y |Z, G) = yes Notes: u A P-Map captures all the independencies in the distribution u P-Maps are unique, up to DAG equivalence

P-Maps
- Unfortunately, some distributions do not have a P-Map
- Example: (a concrete distribution over A, B, C, given on the slide)
- A minimal I-Map for it: (graph over A, B, C, shown on the slide)
- This is not a P-Map, since Ind(A ; C) holds in the distribution but d-sep(A ; C) = no
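The transcript does not show the slide's concrete example, so the sketch below assumes the standard one: A and B are independent fair coins and C is their XOR. It verifies numerically that Ind(A ; C) holds marginally, even though any I-Map built from the ordering A, B, C must keep arcs into C from both A and B.

```python
from itertools import product

# Assumed example: A, B independent fair coins, C = A xor B.
# Only the combinations with positive probability are listed.
joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

def marginal(keep):
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

p_ac, p_a, p_c = marginal((0, 2)), marginal((0,)), marginal((2,))
# Marginally, A and C are independent ...
assert all(abs(p_ac.get((a, c), 0.0) - p_a[(a,)] * p_c[(c,)]) < 1e-9
           for a, c in product((0, 1), repeat=2))
# ... yet C is a deterministic function of A and B, so a minimal I-Map (e.g. for the
# ordering A, B, C) keeps the arcs A -> C <- B, and then d-sep(A; C) = no even though
# Ind(A; C) holds: no DAG can be a P-Map of this distribution.
```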

Bayesian Networks
- A Bayesian network specifies a probability distribution via two components:
  - a DAG G
  - a collection of conditional probability distributions P(X_i | Pa_i)
- The joint distribution P is defined by the factorization P(X_1, …, X_n) = ∏_i P(X_i | Pa_i)
- Additional requirement: G is a minimal I-Map of P
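A minimal sketch of these two components in Python, using the structure of the earthquake/burglary network from the earlier slides; the CPT numbers are invented for illustration and the names are mine.

```python
# A Bayesian network = DAG (here given via parent lists) + one CPD per node.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
cpt = {  # P(X = 1 | parent assignment), parents ordered as in `parents[X]` (made-up numbers)
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(0,): 0.001, (1,): 0.9},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
    "C": {(0,): 0.05, (1,): 0.7},
}

def joint_prob(assignment):
    """P(assignment) = prod_i P(x_i | pa_i)  -- the Bayesian-network factorization."""
    prob = 1.0
    for var, val in assignment.items():
        pa_vals = tuple(assignment[p] for p in parents[var])
        p1 = cpt[var][pa_vals]
        prob *= p1 if val == 1 else 1.0 - p1
    return prob

print(joint_prob({"B": 0, "E": 0, "R": 0, "A": 0, "C": 0}))
```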

Summary
We explored DAGs as a representation of conditional independencies:
- Markov independencies of a DAG
- the tight correspondence between Markov(G) and the factorization defined by G
- d-separation, a sound and complete procedure for computing the consequences of the independencies
- the notion of a minimal I-Map
- P-Maps
This theory is the basis of Bayesian networks.