PRINCIPLES UNDERLYING CAUSAL SEARCH ALGORITHMS

Fundamental problem
- As we have all heard many times: "Correlation is not causation!"

Fundamental problem
- Why is this slogan correct?
- Causal hypotheses make implicit claims about the effects of intervening on (manipulating) one or more variables
- Hypotheses about association or correlation make no such claims; correlation or probabilistic dependence can be produced in many ways

Fundamental problem
- Some of the possible reasons why X and Y might be associated:
  - Sheer chance
  - X causes Y
  - Y causes X
  - Some third variable Z influences both X and Y
  - The value of X (or a cause of X) and the value of Y (or a cause of Y) influence whether an individual is in the sample (sample selection bias)

Fundamental problem
- The fundamental problem of causal search: for any particular set of data, there are often many different causal structures that could have produced that data
- The Causation → Association map is many-to-one

Fundamental problem
- Okay, so what can we do about this?
- Use the data to figure out as much as possible (though it usually won't be everything)
  - Requires developing search procedures
- Then try to narrow the possibilities
  - Use other knowledge (e.g., time order, interventions)
  - Get better / different data (e.g., run an experiment)

Always remember… Even if we cannot discover the whole truth, we might be able to find some of the truth!

Markov equivalence
- Formally: two causal graphs are members of the same Markov equivalence class iff they imply exactly the same (un)conditional independence relations among the observed variables
  - By the Markov and Faithfulness assumptions
- Remember that d-separation gives a purely graphical criterion for determining all of the (un)conditional independencies
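
Since d-separation does the graphical work behind Markov equivalence, a small sketch may help make it concrete. This is one standard way to test d-separation (the moralized-ancestral-graph criterion, equivalent to the path-based definition); the DAG encoding as {node: list of parents} and the function name are illustrative choices, not anything from the slides.

```python
def d_separated(dag, xs, ys, zs):
    """Test whether sets xs and ys are d-separated given zs in a DAG
    ({node: list of parents}), via the moralized-ancestral-graph criterion."""
    # 1. Restrict attention to xs, ys, zs and all of their ancestors
    ancestors, frontier = set(), list(set(xs) | set(ys) | set(zs))
    while frontier:
        n = frontier.pop()
        if n not in ancestors:
            ancestors.add(n)
            frontier.extend(dag[n])
    # 2. Moralize: connect co-parents of each node, then drop edge directions
    undirected = {n: set() for n in ancestors}
    for child in ancestors:
        parents = list(dag[child])
        for p in parents:
            undirected[child].add(p); undirected[p].add(child)
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                undirected[parents[i]].add(parents[j])
                undirected[parents[j]].add(parents[i])
    # 3. Remove the conditioning nodes and test whether xs can reach ys
    blocked = set(zs)
    seen, frontier = set(), [x for x in xs if x not in blocked]
    while frontier:
        n = frontier.pop()
        if n in ys:
            return False
        if n not in seen:
            seen.add(n)
            frontier.extend(v for v in undirected[n] if v not in blocked)
    return True

g = {"X": [], "Y": [], "Z": ["X", "Y"]}          # the collider X -> Z <- Y
print(d_separated(g, {"X"}, {"Y"}, set()))        # True: X and Y independent
print(d_separated(g, {"X"}, {"Y"}, {"Z"}))        # False: dependent given the collider Z
```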

Markov equivalence
- The "Fundamental Problem of Causal Inference" can be restated as: for some sets of independence relations, the Markov equivalence class is not a singleton
- Markov equivalence classes give a precise characterization of what can be inferred from independencies alone

Markov equivalence
- Examples (three variables X, Y, Z):
  - X ⊥ {Y, Z} ⇒ graphs in which X has no edges, with Y and Z connected in either direction
  - X ⊥ Y | Z ⇒ the chain X → Z → Y, the chain X ← Z ← Y, or the fork X ← Z → Y
  - X ⊥ Y ⇒ the collider X → Z ← Y

Markov equivalence
- Two more examples:
  - Are these graphs Markov equivalent?
  - Are these two graphs?
  - [pairs of three-variable graphs shown on slide]

Shared structure
- What is shared by all of the graphs in a Markov equivalence class?
- Same "skeleton", i.e., they all have the same adjacency relations
- Same "unshielded colliders", i.e., X → Y ← Z with no edge between X and Z
- Sometimes, other edges have the same direction
- In these last two cases, we can infer that the true graph contains the shared directed edges
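
The skeleton-plus-unshielded-colliders characterization suggests a direct equivalence test. A minimal sketch, assuming DAGs are given as sets of (parent, child) edge tuples (an encoding chosen here purely for illustration):

```python
from itertools import combinations

def skeleton(dag):
    """Undirected adjacencies of a DAG given as a set of (parent, child) edges."""
    return {frozenset(e) for e in dag}

def unshielded_colliders(dag):
    """Triples x -> z <- y with no edge between x and y."""
    skel = skeleton(dag)
    colliders = set()
    for z in {v for e in dag for v in e}:
        parents = [p for (p, c) in dag if c == z]
        for x, y in combinations(parents, 2):
            if frozenset((x, y)) not in skel:
                colliders.add((frozenset((x, y)), z))
    return colliders

def markov_equivalent(dag1, dag2):
    """Same skeleton and same unshielded colliders (Verma & Pearl's criterion)."""
    return (skeleton(dag1) == skeleton(dag2)
            and unshielded_colliders(dag1) == unshielded_colliders(dag2))

chain = {("X", "Z"), ("Z", "Y")}
fork = {("Z", "X"), ("Z", "Y")}
collider = {("X", "Z"), ("Y", "Z")}
print(markov_equivalent(chain, fork))      # True: chain and fork are equivalent
print(markov_equivalent(chain, collider))  # False: the collider stands alone
```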

Shared structure as patterns
- Since every graph in a Markov equivalence class has the same adjacencies, we can represent the whole class using a pattern
- A pattern is itself a graph, but its edges represent edges in the graphs of the class

Shared structure as patterns
- A pattern can have directed and undirected edges
- It represents all graphs that can be created by adding arrowheads to the undirected edges without creating either (i) a cycle or (ii) a new unshielded collider
- Let's try some examples…

Shared structure as patterns
- Pattern: Nitrogen — PlantGrowth — Bees
- Graphs it represents:
  - Nitrogen → PlantGrowth → Bees
  - Nitrogen ← PlantGrowth → Bees
  - Nitrogen ← PlantGrowth ← Bees

Shared structure as patterns Nitrogen → PlantGrowth ← Bees
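
To make the pattern semantics concrete, here is a sketch that enumerates the graphs a pattern represents by trying every orientation of its undirected edges and discarding any that create a cycle or a new unshielded collider. It reuses unshielded_colliders() from the sketch above; the remaining names are illustrative. Run on the Nitrogen — PlantGrowth — Bees pattern, it prints the three graphs listed two slides back and excludes the collider shown just above.

```python
from itertools import product

def has_cycle(edges):
    """Is any node reachable from itself along directed edges?"""
    def reachable(a, b, seen=()):
        return any(c == b or (c not in seen and reachable(c, b, seen + (c,)))
                   for p, c in edges if p == a)
    return any(reachable(n, n) for n in {v for e in edges for v in e})

def pattern_members(directed, undirected):
    """All DAGs obtained by orienting the undirected edges of a pattern
    without creating a cycle or a new unshielded collider."""
    base = unshielded_colliders(set(directed))
    for flips in product([False, True], repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        g = set(directed) | set(oriented)
        if not has_cycle(g) and unshielded_colliders(g) == base:
            yield sorted(g)

# Pattern: Nitrogen — PlantGrowth — Bees (no directed edges, two undirected edges)
for g in pattern_members([], [("Nitrogen", "PlantGrowth"), ("PlantGrowth", "Bees")]):
    print(g)   # prints the three graphs above; the collider orientation is excluded
```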

Formal problem of search
- Given some dataset D, find: the Markov equivalence class, represented as a pattern P, that predicts exactly the independence relations found in the data
- More colloquially: find the causal graphs that could have produced data like this

Hard to find a pattern
- "Gee, how hard could this be? Just test all of the associations, find the Markov equivalence class, then write down the pattern for it. Voila! We're doing causal learning!"
- Big problem: the number of independencies to test grows exponentially in the number of variables:
  - 2 variables ⇒ 1 test
  - 3 variables ⇒ 6 tests
  - 4 variables ⇒ 24 tests
  - 5 variables ⇒ 80 tests
  - 6 variables ⇒ 240 tests
  - and so on…
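
Those counts correspond to testing every pair of variables against every conditioning set drawn from the remaining variables, i.e. C(n,2) · 2^(n−2) tests; the formula is inferred here from the numbers on the slide rather than stated there. A quick check:

```python
from math import comb

def num_independence_tests(n):
    # Every pair of variables, conditioned on every subset of the other n - 2
    return comb(n, 2) * 2 ** (n - 2)

print([num_independence_tests(n) for n in range(2, 7)])  # [1, 6, 24, 80, 240]
```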

General features of causal search
- Huge model and parameter spaces, even when we (necessarily) use prior information about the family of probability distributions
- Relevant statistics must be rapidly computed
- But substantive knowledge about the domain may restrict the space of alternative models:
  - Time order of variables
  - Required cause/effect relationships
  - Existence or non-existence of latent variables

Three schemata for search
- Bayesian / score-based: find the graph(s) with highest P(graph | data)
- Constraint-based: find the graph(s) that predict exactly the observed associations and independencies
- Combined: get "close" with constraint-based search, then find the best graph using score-based search

Bayesian / score-based
- Informally:
  - Give each model an initial score using "prior beliefs"
  - Update each score based on the likelihood of the data if the model were true
  - Output the highest-scoring model
- Formally:
  - Specify P(M, v) for all models M and possible parameter values v of M
  - For any data D, P(D | M, v) can easily be calculated
  - P(M | D) ∝ ∫ P(D | M, v) P(M, v) dv  (integrating or summing over the parameter values v)
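
As a concrete stand-in for that score: the BIC approximates the log of the marginal likelihood ∫ P(D | M, v) P(M, v) dv, so a minimal scorer for discrete data might look like the sketch below. This is not the slides' scoring function; the data layout and names are assumptions made for illustration.

```python
from collections import Counter
from math import log, prod

def bic_score(dag, data, card):
    """BIC score for a discrete Bayesian network (higher is better).
    dag:  {node: list of parents};  data: list of {node: value} rows;
    card: {node: number of possible values of that node}."""
    n = len(data)
    score = 0.0
    for node, parents in dag.items():
        joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
        marg = Counter(tuple(r[p] for p in parents) for r in data)
        # Maximum-likelihood log-likelihood of this node given its parents
        score += sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
        # BIC penalty: (log n / 2) * number of free parameters in this node's CPT
        score -= 0.5 * log(n) * (card[node] - 1) * prod(card[p] for p in parents)
    return score

# Toy comparison: "X causes Y" vs. "X and Y unrelated" on strongly associated data
data = ([{"X": 0, "Y": 0}] * 40 + [{"X": 1, "Y": 1}] * 40
        + [{"X": 0, "Y": 1}] * 10 + [{"X": 1, "Y": 0}] * 10)
card = {"X": 2, "Y": 2}
print(bic_score({"X": [], "Y": ["X"]}, data, card))   # higher (better) ...
print(bic_score({"X": [], "Y": []}, data, card))      # ... than the empty graph
```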

Bayesian / score-based
- In practice, this strategy is completely computationally intractable: there are too many graphs to check them all
- So, we use a greedy search strategy:
  - Start with an initial graph
  - Iteratively compare the current graph's score (∝ posterior probability) with that of each 1- or 2-step modification of that graph, by edge addition, deletion, or reversal
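
A sketch of that greedy loop, assuming the {node: list of parents} encoding and a user-supplied score function (for instance, a closure around the bic_score sketch above); first-improvement hill climbing is used here purely for brevity:

```python
def neighbors(dag, nodes):
    """All one-edge modifications of dag: edge additions, deletions, reversals."""
    for child in nodes:
        for parent in nodes:
            if parent == child:
                continue
            new = {n: list(ps) for n, ps in dag.items()}
            if parent in dag[child]:
                new[child].remove(parent)                 # deletion
                yield new
                if child not in dag[parent]:
                    rev = {n: list(ps) for n, ps in new.items()}
                    rev[parent].append(child)             # reversal
                    yield rev
            else:
                new[child].append(parent)                 # addition
                yield new

def is_acyclic(dag):
    """Depth-first check that following parent links never revisits a node."""
    seen, done = set(), set()
    def visit(n):
        if n in done:
            return True
        if n in seen:
            return False
        seen.add(n)
        ok = all(visit(p) for p in dag[n])
        done.add(n)
        return ok
    return all(visit(n) for n in dag)

def greedy_search(nodes, score_fn):
    """Hill-climb over DAGs until no single-edge change improves score_fn(dag)."""
    current = {n: [] for n in nodes}          # start from the empty graph
    current_score = score_fn(current)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current, nodes):
            if not is_acyclic(cand):
                continue
            s = score_fn(cand)
            if s > current_score:
                current, current_score, improved = cand, s, True
    return current, current_score

# e.g. with the BIC sketch above:
# best, best_score = greedy_search(list(card), lambda g: bic_score(g, data, card))
```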

Bayesian / score-based
- Problem #1: local maxima; greedy searches often get stuck
- Solution: greedy search over Markov equivalence classes rather than over graphs (Meek)
  - Has a proof of correctness and convergence (Chickering)
  - But it gets to the right answer slowly

Bayesian / score-based
- Problem #2: unobserved variables
  - Huge number of graphs
  - Huge number of different parameterizations
  - No fast, general way to compute likelihoods from latent-variable models
- Partial solution: focus on a small, "plausible" set of models for which we can compute scores

Constraint-based
- Implementation of the earlier idea: "build" the Markov equivalence class that predicts the pattern of association actually found in the data
  - Compatible with a variety of statistical techniques
  - Note that we might have to introduce a latent variable to explain the pattern of statistics
- Important constraints on search:
  - Minimize the number of statistical tests
  - Minimize the size of the conditioning sets (why?)

Constraint-based
- Algorithm step #1: discover the adjacencies
  - Create the complete graph with undirected edges
  - Test all pairs X, Y for unconditional independence; remove the X—Y edge if they are independent
  - Test all adjacent X, Y for independence given a single neighbor N; remove the X—Y edge if they are independent
  - Test adjacent pairs given two neighbors, and so on…
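
A simplified sketch of this adjacency phase, in the spirit of the PC algorithm, written against an independence "oracle" so any statistical test can be plugged in; the function and variable names are illustrative:

```python
from itertools import combinations

def find_adjacencies(nodes, indep):
    """Simplified PC-style adjacency search.
    indep(x, y, S) returns True iff x and y are judged independent given the set S.
    Returns the surviving undirected edges (frozensets) and the separating sets."""
    adj = {frozenset(p) for p in combinations(nodes, 2)}   # start from the complete graph
    sepset = {}
    for size in range(len(nodes) - 1):                     # 0, 1, 2, ... conditioning variables
        for edge in list(adj):
            x, y = tuple(edge)
            # Candidate conditioning variables: current neighbors of x or y
            nbrs = {z for e in adj if x in e or y in e for z in e} - {x, y}
            for S in combinations(sorted(nbrs), size):
                if indep(x, y, set(S)):
                    adj.discard(edge)
                    sepset[edge] = set(S)
                    break
    return adj, sepset
```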

Constraint-based
- Algorithm step #2: (try to) orient edges
- "Unshielded triple": X — C — Y, with X, Y not adjacent
  - If X & Y are independent given a set S containing C, then C must be a non-collider, since we had to condition on C to achieve d-separation
  - If X & Y are independent given a set S not containing C, then C must be a collider, since the path through C is inactive without conditioning on C
- Then do further orientations to ensure acyclicity and to keep identified non-colliders as non-colliders
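
And a matching sketch of the collider-orientation step, consuming the adjacencies and separating sets returned by find_adjacencies above (the further orientation rules mentioned in the last bullet are omitted here):

```python
from itertools import combinations

def orient_colliders(adj, sepset):
    """For every unshielded triple X - C - Y, orient X -> C <- Y whenever C is
    not in the set that separated X and Y. Returns the directed edges found."""
    directed = set()
    nodes = {v for e in adj for v in e}
    for c in nodes:
        nbrs = sorted({v for e in adj if c in e for v in e} - {c})
        for x, y in combinations(nbrs, 2):
            if frozenset((x, y)) not in adj:               # x and y not adjacent: unshielded
                if c not in sepset.get(frozenset((x, y)), set()):
                    directed.add((x, c))
                    directed.add((y, c))
    return directed
```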

Constraint-based example
- Variables are {X, Y, Z, W}
- Only independencies are:
  - X ⊥ Y
  - X ⊥ W | Z
  - Y ⊥ W | Z

Constraint-based example
- Step 1: form the complete undirected graph over X, Y, Z, W

Constraint-based example
- Step 2: for each pair of variables, remove the edge between them if they are unconditionally independent
  - X ⊥ Y ⇒ remove the X—Y edge

Constraint-based example
- Step 3: for each adjacent pair, remove the edge if they are independent conditional on some variable adjacent to one of them
  - {X, Y} ⊥ W | Z ⇒ remove the X—W and Y—W edges

Constraint-based example
- Step 4: continue removing edges, checking independence conditional on 2 (or 3, or 4, or…) variables
  - No further edges are removed here; the remaining skeleton is X — Z, Y — Z, Z — W

Constraint-based example
- Step 5: orientation
  - For X — Z — Y: since X ⊥ Y without conditioning on Z, make Z a collider: X → Z ← Y
  - Since Z is a non-collider between X and W (Z was in the separating set), we must orient Z — W away from Z: Z → W
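
Putting the sketches together on this example: an oracle encoding exactly the three independencies above, fed to the find_adjacencies and orient_colliders functions sketched earlier.

```python
def indep(x, y, S):
    """Independence oracle for this example: the only independencies are
    X ⊥ Y, X ⊥ W | Z, and Y ⊥ W | Z."""
    facts = [({"X", "Y"}, set()), ({"X", "W"}, {"Z"}), ({"Y", "W"}, {"Z"})]
    return ({x, y}, set(S)) in facts

adj, sepset = find_adjacencies(["W", "X", "Y", "Z"], indep)
print(sorted(tuple(sorted(e)) for e in adj))
# [('W', 'Z'), ('X', 'Z'), ('Y', 'Z')]  -- the skeleton X - Z, Y - Z, Z - W
print(orient_colliders(adj, sepset))
# {('X', 'Z'), ('Y', 'Z')}  -- i.e. X -> Z <- Y; the further orientation
# rules (not sketched here) would then give Z -> W
```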

Constraint-based output
- Searches that allow for latent variables can also output edges of the form X o→ Y
- This indicates one of three possibilities:
  - X → Y
  - At least one unobserved common cause of X and Y
  - Both of these

Interventions to the rescue?
- Interventions helped us solve an earlier equivalence-class problem
  - Randomization meant that a Treatment-Effect association ⇒ T → E
- Interventions alter equivalence classes, but don't make them all into singletons
- The fundamental problem of search remains

Before X-intervention
- [slide shows the set of candidate graphs over X, Y, Z before intervening]

After X-intervention
- [slide shows the candidate graphs as modified by the intervention on X]

Search with interventions
- Search with interventions is the same as search with observations, except:
  - We adjust the graphs in the search space to account for the intervention
  - For multiple experiments, we search for graphs that lie in every output equivalence class
- More complicated than this in the real world due to sampling variation

Example  Observation  Y Z | X ⇒  Intervention on X  Y {X, Z} ⇒ &  Only possible graph: X Y Z X Y Z X Y Z X Y Z Y Z X X Y Z

Looking ahead…
- Have:
  - A basic formal representation for causation
  - The fundamental causal asymmetry (of intervention)
  - Inference & reasoning methods
  - Search & causal discovery principles
- Need:
  - Search & causal discovery methods that work in the real world