Gated Graphs and Causal Inference

Gated Graphs and Causal Inference John Winn Microsoft Research, Cambridge with lots of input from Tom Minka Networks: Processes and Causality, September 2012

Outline
- Graphical models of mixtures
- Gated graphs
- d-separation in gated graphs
- Inference in gated graphs
- Modelling interventions with gated graphs
- Causal inference with gated graphs

A mixture of two Gaussians
$P(X) = P(C=1)\,\mathcal{N}(X \mid \mu_1, \sigma_1^2) + P(C=2)\,\mathcal{N}(X \mid \mu_2, \sigma_2^2)$
[Figure: plot of $P(X)$ against $X$, showing the two components $C=1$ and $C=2$.]
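
A minimal Python sketch of this density; the mixture weight, means and variances are invented values, not from the talk.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(x | mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_density(x, p1=0.3, mu1=-2.0, s1=1.0, mu2=3.0, s2=0.5):
    """P(X) = P(C=1) N(X | mu1, s1^2) + P(C=2) N(X | mu2, s2^2)."""
    return p1 * normal_pdf(x, mu1, s1) + (1 - p1) * normal_pdf(x, mu2, s2)

print(mixture_density(np.linspace(-6.0, 6.0, 5)))
```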

Mixture as a Bayesian Network
$P(X \mid C, \mu_1, \mu_2, \sigma_1, \sigma_2) = \delta(C=1)\,\mathcal{N}(X \mid \mu_1, \sigma_1^2) + \delta(C=2)\,\mathcal{N}(X \mid \mu_2, \sigma_2^2)$
All structure is lost!

Mixture as a Factor Graph
$P(X \mid C, \mu_1, \mu_2, \sigma_1, \sigma_2) = \mathcal{N}(X \mid \mu_1, \sigma_1^2)^{\delta(C=1)}\,\mathcal{N}(X \mid \mu_2, \sigma_2^2)^{\delta(C=2)}$
Context-specific independence is lost!

Mixture as a Gated Graph
$P(X \mid C, \mu_1, \mu_2, \sigma_1, \sigma_2) = \mathcal{N}(X \mid \mu_1, \sigma_1^2)^{\delta(C=1)}\,\mathcal{N}(X \mid \mu_2, \sigma_2^2)^{\delta(C=2)}$
Context-specific independence is retained!

gated graphs

The Gate
A gate contains one or more factors and is switched on or off by a selector variable $c$ matching the gate's key:
Gate: $\left(\prod_i f_i(X_i)\right)^{\delta(c=\mathrm{key})}$
[Minka & Winn, Gates. NIPS 2009]
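
A minimal sketch of the gate as a Python function under this definition; the factor values in the mixture usage below are invented.

```python
import math

def gate_factor(contained_factors, selector_value, key):
    """A gate: (prod_i f_i(X_i)) ** delta(selector == key).
    When the selector does not match the key, the gate is off and contributes 1."""
    if selector_value != key:
        return 1.0
    value = 1.0
    for f, args in contained_factors:
        value *= f(*args)
    return value

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The two-Gaussian mixture written as two gates keyed on the selector C:
def p_x_given_c(x, c, mu1=-2.0, s1=1.0, mu2=3.0, s2=0.5):
    return (gate_factor([(normal_pdf, (x, mu1, s1))], c, key=1) *
            gate_factor([(normal_pdf, (x, mu2, s2))], c, key=2))

print(p_x_given_c(0.5, c=1), p_x_given_c(0.5, c=2))
```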

Mixture of Gaussians
$P(X \mid C) = \mathcal{N}(X \mid \mu_1, \sigma_1^2)^{\delta(C=1)}\,\mathcal{N}(X \mid \mu_2, \sigma_2^2)^{\delta(C=2)}$
Gate block: the two gates share the selector $C$ and have mutually exclusive keys.

Model Selection Model 1 Model 2

Structure learning: edge presence/absence, variable presence/absence, edge type.
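
A sketch of the edge presence/absence case: the factor linking a candidate parent X to Y sits inside a gate keyed on a binary "edge present" variable, with a parent-free factor used when the edge is absent. The tables and the single observation are invented.

```python
import numpy as np

def p_y_given_x_edge(y, x, edge_present, f_table, g_vec):
    """Gated likelihood for Y: use f(Y | X) when the edge gate is on,
    otherwise fall back to the parent-free factor g(Y)."""
    return f_table[x, y] if edge_present else g_vec[y]

# Illustrative tables (assumptions, not from the slides):
f_table = np.array([[0.9, 0.1],
                    [0.2, 0.8]])     # P(Y | X) when the edge exists
g_vec = np.array([0.5, 0.5])         # P(Y) when the edge is absent

# Posterior over edge presence from one observation (x=1, y=1), uniform prior:
lik_edge = p_y_given_x_edge(1, 1, True, f_table, g_vec)
lik_no_edge = p_y_given_x_edge(1, 1, False, f_table, g_vec)
print(lik_edge / (lik_edge + lik_no_edge))   # ~0.62: data mildly favour the edge
```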

Example: image edge model

Example: genetic association study

D-separation in gated graphs

d-separation in factor graphs
Tests whether X is independent of Y given Z.
Criterion 1: Observed node on path
Criterion 2: No observed descendant

d-separation with gates
The gate selector acts like another parent of the variables inside the gate.
[Figure: example graphs over $X$, $W$, $Y$ with gates keyed T/F on a selector $Z$.]
Criterion 1: Observed node on path
Criterion 2: No observed descendant

d-separation with gates
Paths are blocked by gates that are off, but pass through gates that are on.
[Figure: two cases, $Z=\mathrm{T}$ and $Z=\mathrm{F}$; the path from $X$ to $Y$ runs through gates keyed T and F, and only the gate matching $Z$ is on.]
Criterion 3 (context-sensitive): Path passes through off gate

d-separation summary
Criterion 1: Observed node on path
Criterion 2: No observed descendant
Criterion 3 (new): Path passes through off gate
The new criterion allows new independencies to be detected, even if they apply only in particular contexts.

Inference in gated graphs

Inference in Gated Graphs
Extended forms of standard algorithms:
- belief propagation
- expectation propagation
- variational message passing
- Gibbs sampling
Algorithms become more accurate and more efficient by exploiting conditional independencies.
Free software at http://research.microsoft.com/infernet
[Minka & Winn, Gates. NIPS 2009]

BP in factor graphs
Variable to factor: $m_{i \to f}(X_i) = \prod_{a \neq f} m_{a \to i}(X_i)$
Factor to variable: $m_{f \to i}(X_i) = \sum_{X_f \setminus X_i} f(X_f) \prod_{j \neq i} m_{j \to f}(X_j)$
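
A minimal sketch of these two messages for discrete variables, with each factor stored as a numpy array having one axis per adjacent variable; the two-variable usage at the end is an invented example.

```python
import numpy as np

def msg_var_to_factor(cardinality, incoming_msgs):
    """m_{i->f}(X_i) = product over neighbouring factors a != f of m_{a->i}(X_i)."""
    out = np.ones(cardinality)
    for m in incoming_msgs:
        out = out * m
    return out

def msg_factor_to_var(factor, target_axis, incoming_msgs):
    """m_{f->i}(X_i): multiply the factor table by the messages on the other axes,
    then sum out every axis except the target variable's axis."""
    t = factor.astype(float).copy()
    for axis, m in enumerate(incoming_msgs):
        if axis == target_axis or m is None:
            continue
        shape = [1] * t.ndim
        shape[axis] = len(m)
        t = t * m.reshape(shape)
    other_axes = tuple(a for a in range(t.ndim) if a != target_axis)
    return t.sum(axis=other_axes)

# Usage: a single factor f(A, B) over two binary variables.
f = np.array([[0.9, 0.1],
              [0.2, 0.8]])                       # f[a, b]
m_A_to_f = msg_var_to_factor(2, [])              # A is a leaf: uniform message
print(msg_factor_to_var(f, target_axis=1, incoming_msgs=[m_A_to_f, None]))  # [1.1 0.9]
```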

BP in a gate block
Factor $f_k$ to selector (evidence): $m_{f \to C}(C) = \delta(C=k) \sum_{X_f} f(X_f) \prod_j m_{j \to f}(X_j)$
Factor $f_k$ to variable (after leaving the gate): the within-gate message $m_{f \to i}(X_i)$ is rescaled by $\dfrac{m_{f \to C}(k)\, m_{C \to G}(k)}{\sum_{X_i'} m_{f \to i}(X_i')\, m_{i \to f}(X_i')}$ (the scale factor).
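
A sketch of the first of these messages, the per-key evidence sent to the selector, for discrete factor tables; the two gate factors and the incoming message are invented values.

```python
import numpy as np

def gate_evidence_to_selector(gate_factor_tables, incoming_msgs_per_gate):
    """For each gate key k, compute sum_{X_f} f_k(X_f) * prod_j m_{j->f_k}(X_j).
    Returns an unnormalised message over the selector's values."""
    evidence = []
    for table, msgs in zip(gate_factor_tables, incoming_msgs_per_gate):
        t = table.astype(float).copy()
        for axis, m in enumerate(msgs):
            shape = [1] * t.ndim
            shape[axis] = len(m)
            t = t * m.reshape(shape)
        evidence.append(t.sum())
    return np.array(evidence)

# Two gates keyed on C in {0, 1}, each containing a different factor over one variable X:
f0 = np.array([0.7, 0.3])
f1 = np.array([0.1, 0.9])
m_from_X = np.array([0.9, 0.1])
print(gate_evidence_to_selector([f0, f1], [[m_from_X], [m_from_X]]))  # [0.66 0.18]
```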

Modelling Interventions with gated graphs (yes – I’m finally getting round to talking about causality)

Intervention with Gates
[Figure: a gate block keyed on a binary variable doZ; the False gate contains the usual mechanism f linking Y to Z, and the True gate contains the intervention factor I that sets Z directly.]

Normal (no intervention): doZ = F, so the gate containing the mechanism f from Y to Z is on and the intervention gate containing I is off.

Intervention on Z: doZ = T, so the intervention gate containing I is on and the normal mechanism f from Y to Z is switched off.
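
A minimal sketch of this construction as a conditional table for Z: the doZ selector chooses between the normal mechanism and a factor that clamps Z. The mechanism table and the clamped value are invented.

```python
import numpy as np

def p_z_given_y_do(y, do_z, z_forced, f_table):
    """P(Z | Y, doZ) built from a gate block keyed on doZ:
    doZ = False -> the mechanism gate is on, Z follows f(Z | Y);
    doZ = True  -> the intervention gate is on, Z is clamped to z_forced."""
    if not do_z:
        return f_table[y]                  # row y of f_table is P(Z | Y = y)
    clamped = np.zeros(f_table.shape[1])
    clamped[z_forced] = 1.0
    return clamped

# Illustrative mechanism P(Z | Y) for binary Y and Z (values are assumptions):
f_table = np.array([[0.8, 0.2],
                    [0.3, 0.7]])
print(p_z_given_y_do(y=1, do_z=False, z_forced=1, f_table=f_table))  # [0.3 0.7]
print(p_z_given_y_do(y=1, do_z=True,  z_forced=1, f_table=f_table))  # [0. 1.]
```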

Example model

Example model with interventions

do calculus
Rules for rewriting $P(y \mid \hat{x})$ in terms of $P(y \mid x)$ etc., where $\hat{x}$ stands for "an intervention on $x$".
Rule 1: $P(y \mid \hat{x}, z) = P(y \mid \hat{x})$ if $y$ is independent of $z$ in the graph with the parent edges of $x$ removed.
Rule 2: $P(y \mid \hat{z}) = P(y \mid z)$ if $y$ is independent of $z$ in the graph with the child edges of $z$ removed.
Rule 3: $P(y \mid \hat{z}) = P(y)$ if $y$ is independent of $z$ in the graph with the parent edges of $z$ removed, provided no descendant of $z$ is observed.
[Pearl, Causal diagrams for empirical research, Biometrika 1995]

Rule 1: deletion of observations
do calculus: $P(y \mid \hat{x}, z) = P(y \mid \hat{x})$, obtained by removing the parent edges of $x$.
gates: $P(y \mid doX=T, z) = P(y \mid doX=T)$; with $doX=T$ the gate containing the parent edges of $x$ is off (Criterion 3: gate is off).

Rule 2: action/observation exchange
do calculus: $P(y \mid \hat{z}) = P(y \mid z)$, obtained by removing the child edges of $z$.
gates: $P(y \mid doZ=T, z) = P(y \mid doZ=F, z)$; $z$ is observed, so it blocks the path to its children (Criterion 1: observed node on path).

Rule 3: deletion of actions
do calculus: $P(y \mid \hat{z}) = P(y)$, obtained by removing the parent edges of $z$.
gates: $P(y \mid doZ) = P(y)$; no descendant of $z$ is observed (Criterion 2: no observed descendant).

do calculus equivalence The three rules of do calculus are a special case of the three d-separation criteria applied to the gated graph of an intervention.
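
A small numerical illustration of this equivalence, by direct enumeration rather than the talk's message passing; all tables below are invented. In a confounded model ($W \to Z$, $W \to Y$, $Z \to Y$), switching the intervention gate cuts the $W \to Z$ mechanism and clamps $Z$, and the resulting posterior for $Y$ differs from ordinary conditioning on $Z$.

```python
import numpy as np

# Binary W, Z, Y with W confounding Z and Y (illustrative tables):
P_W = np.array([0.6, 0.4])                          # P(W)
P_Z_given_W = np.array([[0.7, 0.3],                 # P(Z | W=0)
                        [0.2, 0.8]])                # P(Z | W=1)
P_Y_given_ZW = np.array([[[0.9, 0.1], [0.5, 0.5]],  # P(Y | Z=0, W=0/1)
                         [[0.4, 0.6], [0.1, 0.9]]]) # P(Y | Z=1, W=0/1)

def p_y_do_z(z):
    """Inference in the gated graph with doZ = True: the mechanism P(Z | W) is
    gated off and Z is clamped to z, so W keeps its prior and is summed out."""
    return sum(P_W[w] * P_Y_given_ZW[z, w] for w in (0, 1))

def p_y_obs_z(z):
    """Ordinary conditioning P(Y | Z = z), for comparison."""
    joint = sum(P_W[w] * P_Z_given_W[w, z] * P_Y_given_ZW[z, w] for w in (0, 1))
    return joint / joint.sum()

print(p_y_do_z(1))   # interventional posterior for Y: [0.28 0.72]
print(p_y_obs_z(1))  # observational posterior differs, because W confounds Z and Y
```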

Causal inference with gated graphs

Causal Inference using BP

Causal Inference using BP Intervention on X Posterior for Y

Causal Inference using BP Posterior for Y Intervention on Z

Learning causal structure Does A cause B or B cause A? A, B are binary. f is noisy equality with flip probability q.

Learning causal structure Add gated structure for intervention on B

Learning causal structure
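
A sketch of this comparison by direct enumeration, under the setup from the earlier slide (A, B binary, f a noisy equality with flip probability q); the flip probability and the interventional data are invented. The factor by which the intervention chooses B's value is the same under both hypotheses, so it is dropped.

```python
import numpy as np

q = 0.1                                    # flip probability of the noisy-equality factor f

def f_eq(x, y):
    """Noisy equality: x agrees with y with probability 1 - q."""
    return 1 - q if x == y else q

def lik_A_causes_B(a, b, do_b):
    # A ~ Bernoulli(0.5); if B is intervened on, its mechanism f(B | A) is gated off.
    return 0.5 * (1.0 if do_b else f_eq(b, a))

def lik_B_causes_A(a, b, do_b):
    # B ~ Bernoulli(0.5) unless intervened on; A always follows f(A | B).
    return (1.0 if do_b else 0.5) * f_eq(a, b)

# Hypothetical data (a, b, doB): A keeps agreeing with B even when B is set by
# intervention, which only the B -> A structure explains.
data = [(1, 1, True), (0, 0, True), (1, 1, True), (0, 0, True)]
L_AB = np.prod([lik_A_causes_B(*d) for d in data])
L_BA = np.prod([lik_B_causes_A(*d) for d in data])
print("P(A -> B | data) =", L_AB / (L_AB + L_BA))   # ~0.09, so B -> A is favoured
```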

…and without interventions
[Figure: an asymmetric model relating $X$ and $Y$ with parameters $r$, $1-r$ and $g(r)$.]
Thanks to Bernhard!

…and without interventions Same algorithm as before

Dominik’s idea

Conclusions
Causal reasoning is a special case of probabilistic inference:
- The rules of do-calculus arise from testing d-separation in the gated graph.
- Causal inference can be performed using probabilistic inference in the gated graph.
- Causal structure can be discovered by using gates in two ways: to model interventions and/or to compare alternative structures.

Future directions
- Imperfect interventions: partial compliance, mechanism change
- Counterfactuals: variables that differ in the real and counterfactual worlds lie in different gates; variables common to both worlds lie outside the gates

Thank you!

Imperfect Interventions: ‘fat hand’ interventions, mechanism change, partial compliance