
V13: Causality

Aims:
(1) understand the causal relationships between the variables of a network;
(2) interpret a Bayesian network as a causal model whose edges have causal significance.

For standard probabilistic queries it does not matter whether a Bayesian model is causal or not; it matters only that it encodes the "right" distribution. A correlation between two variables X and Y can arise in multiple settings:
- when X causes Y,
- when Y causes X,
- or when X and Y are both effects of a single cause.

SS lecture 13, Mathematics of Biological Networks
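The last point can be made concrete with a small simulation: two structurally different models, one with a direct edge X → Y and one in which a single cause W drives both X and Y, produce the same qualitative correlation. All parameter values below are hypothetical, chosen only for illustration.

```python
import random

random.seed(0)
N = 50_000

# Model A (hypothetical parameters): X directly causes Y.
xs_a, ys_a = [], []
for _ in range(N):
    x = random.random() < 0.5
    y = random.random() < (0.8 if x else 0.2)   # Y depends on X
    xs_a.append(x)
    ys_a.append(y)

# Model B (hypothetical parameters): a common cause W drives both X and Y;
# there is no edge between X and Y.
xs_b, ys_b = [], []
for _ in range(N):
    w = random.random() < 0.5
    x = random.random() < (0.8 if w else 0.2)   # X depends only on W
    y = random.random() < (0.8 if w else 0.2)   # Y depends only on W
    xs_b.append(x)
    ys_b.append(y)

def p_y_given_x(xs, ys, xval):
    """Empirical estimate of P(Y = 1 | X = xval)."""
    sel = [y for x, y in zip(xs, ys) if x == xval]
    return sum(sel) / len(sel)

# Both models show P(Y=1 | X=1) > P(Y=1 | X=0): the observational
# distribution alone cannot tell the two causal structures apart.
print("A:", p_y_given_x(xs_a, ys_a, True), p_y_given_x(xs_a, ys_a, False))
print("B:", p_y_given_x(xs_b, ys_b, True), p_y_given_x(xs_b, ys_b, False))
```

Distinguishing the two structures requires either interventions or extra assumptions, which is exactly the topic of this lecture.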

Intervention queries

When some variable W causally affects both X and Y, we generally observe a correlation between them. If we know about the existence of W and can observe it, we can disentangle the correlation between X and Y that is induced by W.

One approach to modeling causal relationships uses the notion of an ideal intervention: an intervention of the form do(Z := z), which forces the variable Z to take the value z and has no other immediate effect.

In intervention queries, we are interested in answering queries of the form P(Y | do(z)) or P(Y | do(z), X = x).

Case study: student example

Let us revisit our simple student example from lecture V8 and consider a particular student, Gump. Conditioning on the observation that Gump receives an A in the class increases the probability that he has high intelligence, his probability of getting a high SAT score, and his probability of getting a job.

Consider a situation where Gump is lazy and, rather than working hard to get an A in the class, pays someone to hack into the university computer system and change his grade in the course to an A. What is his probability of getting a good job in this case? Intuitively, the company where Gump is applying only has access to his transcript, so we might expect P(J | do(g1)) = P(J | g1).

What about the other two probabilities? Intuitively, we feel that the manipulation of Gump's grade should affect our beliefs neither about his intelligence nor about his SAT score. Thus, we expect P(i1 | do(g1)) = P(i1) and P(s1 | do(g1)) = P(s1).

An appropriate graphical model for the postintervention situation is shown in the figure on the right. Here, the grade (and thus his chance of getting the job) no longer depends on intelligence or difficulty of the class. This model is an instance of a mutilated network.

Latent variables

In practice, however, there is a huge set of possible latent variables, representing factors in the world that we cannot observe and often are not even aware of. A latent variable may induce correlations between the observed variables that do not correspond to causal relations between them, and hence forms a confounding factor in our goal of determining causal interactions.

For the purpose of causal inference, it is critical to disentangle the component of the correlation between X and Y that is due to causal relationships from the component due to these confounding factors. Unfortunately, in complex real-world settings it is virtually impossible to identify all relevant latent variables and quantify their effects.

Causal models

A causal model has the same form as a probabilistic Bayesian network: it consists of a directed acyclic graph over the random variables in the domain. The model asserts that each variable X is governed by a causal mechanism that (stochastically) determines its value based on the values of its parents.

A causal mechanism takes the same form as a standard CPD: for a node X and its parents U, the causal model specifies, for each value u of U, a distribution over the values of X. The difference from a probabilistic Bayesian network lies in the interpretation of edges: in a causal model, we assume that X's parents are its direct causes.

The assumption that CPDs correspond to causal mechanisms forms the basis for the treatment of intervention queries. When we intervene at a variable X, setting its value to x, we replace its original causal mechanism with one that dictates that it should take the value x.

The model presented before is an instance of a mutilated network. In a mutilated network B_{Z=z}, we eliminate all incoming edges into each variable Zi ∈ Z (here: grade) and set its value to be zi with probability 1.
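A minimal sketch of the mutilation operation on the student network: the CPD of the intervened variable is replaced by a point mass, and all other CPDs are kept. All CPD numbers below are hypothetical, chosen only to illustrate the mechanics; they are not from the lecture. With them we can check the claims above: observing an A changes our belief about intelligence, while setting the grade by intervention does not.

```python
from itertools import product

# Hypothetical CPDs for a binary student network D, I, G(D,I), S(I), J(G,S).
P_D = [0.6, 0.4]                                    # P(D = d)
P_I = [0.7, 0.3]                                    # P(I = i)
P_G = {(0, 0): 0.3, (0, 1): 0.9,                    # P(G = 1 | D = d, I = i)
       (1, 0): 0.05, (1, 1): 0.5}
P_S = {0: 0.05, 1: 0.8}                             # P(S = 1 | I = i)
P_J = {(0, 0): 0.1, (0, 1): 0.6,                    # P(J = 1 | G = g, S = s)
       (1, 0): 0.7, (1, 1): 0.9}

def joint(d, i, g, s, j, do_g=None):
    """Joint probability of one world; if do_g is set, G's CPD is
    replaced by an indicator (the mutilated network B_{G:=do_g})."""
    if do_g is not None:
        pg = 1.0 if g == do_g else 0.0
    else:
        pg = P_G[(d, i)] if g else 1 - P_G[(d, i)]
    ps = P_S[i] if s else 1 - P_S[i]
    pj = P_J[(g, s)] if j else 1 - P_J[(g, s)]
    return P_D[d] * P_I[i] * pg * ps * pj

def prob(event, do_g=None):
    """Sum the joint over all worlds (d, i, g, s, j) where event holds."""
    return sum(joint(*w, do_g=do_g)
               for w in product([0, 1], repeat=5) if event(*w))

p_i1 = prob(lambda d, i, g, s, j: i == 1)
# Observing an A raises the probability of high intelligence ...
p_i1_given_g1 = (prob(lambda d, i, g, s, j: i == 1 and g == 1)
                 / prob(lambda d, i, g, s, j: g == 1))
# ... but *setting* the grade by intervention leaves it unchanged:
p_i1_do_g1 = prob(lambda d, i, g, s, j: i == 1, do_g=1)

print(p_i1, p_i1_given_g1, p_i1_do_g1)
```

With these (made-up) numbers, P(i1 | do(g1)) equals the prior P(i1), while P(i1 | g1) is strictly larger, exactly as argued on the slide.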

[Causal models, slides 8–10: slide content (equations) not captured in the transcript]

Simpson's paradox

Consider the problem of trying to determine whether a drug is beneficial in curing a particular disease within some population of patients. Statistics show that, within the population, 57.5% of patients who took the drug (D) are cured (C), whereas only 50% of the patients who did not take the drug are cured. Given these statistics, one might believe that the drug is beneficial.

However, within the subpopulation of male patients, 70% of those who took the drug are cured, whereas 80% of those who did not take the drug are cured. Within the subpopulation of female patients, 20% of those who took the drug are cured, whereas 40% of those who did not take the drug are cured.
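The aggregate and subgroup numbers are mutually consistent once the subgroup sizes are fixed. One choice that reproduces the slide's figures exactly (a hypothetical assumption, not stated on the slide) is that 75% of drug takers are male but only 25% of non-takers are:

```python
# Hypothetical gender composition of the two treatment groups.
male_share = {"drug": 0.75, "no_drug": 0.25}

# Subgroup cure rates from the slide.
cure_rate = {("drug", "male"): 0.70, ("drug", "female"): 0.20,
             ("no_drug", "male"): 0.80, ("no_drug", "female"): 0.40}

def overall(group):
    """Population cure rate as the mixture of the two subgroup rates."""
    m = male_share[group]
    return (m * cure_rate[(group, "male")]
            + (1 - m) * cure_rate[(group, "female")])

# The drug is worse in *each* subgroup, yet better in the aggregate:
# overall("drug") ≈ 0.575, overall("no_drug") = 0.5.
print(overall("drug"), overall("no_drug"))
```

The reversal arises because gender is associated with both treatment choice and outcome: the drug group is dominated by the subgroup with the higher baseline cure rate.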

[Simpson's paradox, slides 12–13: slide content not captured in the transcript]

Structural Causal Identifiability

Fully specifying a causal model is often impossible. At least for intervention queries, we must disentangle the causal influence of X on Y from other factors leading to correlations between them.

Consider a pair of variables X, Y with an observed correlation between them, and imagine that our task is to determine P(Y | do(X)). Let us even assume that X temporally precedes Y, so that we know Y cannot cause X.

However, if we consider the possibility that at least some of the correlation between X and Y is due to a hidden common cause, we have no way of determining how much effect perturbing X would have on Y. If all of the correlation is due to a causal link, then P(Y | do(X)) = P(Y | X). Conversely, if all of the correlation is due to the hidden common cause, then P(Y | do(X)) = P(Y). In general, any value between these two distributions is possible.

[Query Simplification Rules, slides 16–20: slide content (the three simplification rules) not captured in the transcript]

Iterated Query Simplification

There are many queries where none of the three rules apply directly. But we will see that we can also perform other transformations on the query that allow the rules to be applied.

Let us revisit the figure on the right, which involves the query P(J | do(G)). None of our rules apply directly to this query:
- we cannot eliminate the intervention, since P(J | do(G)) ≠ P(J);
- we also cannot turn the intervention into an observation, since P(J | do(G)) ≠ P(J | G), because intervening at G affects J only via the single edge G → J, whereas conditioning on G also influences J by the indirect trail G ← I → S → J.

This trail is called a back-door trail, since it leaves G via the "back door".
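Conditioning on I blocks the back-door trail G ← I → S → J, which yields the adjustment P(J | do(g)) = Σ_i P(i) P(J | g, i), where P(J | g, i) = Σ_s P(s | i) P(J | g, s) because S ⊥ G given I. With hypothetical CPD numbers (purely illustrative, not from the lecture) we can compare this to naive conditioning:

```python
# Hypothetical CPDs for the student network (illustrative numbers only).
P_I = [0.7, 0.3]                                            # P(I = i)
P_D = [0.6, 0.4]                                            # P(D = d)
P_G1 = {(0, 0): 0.3, (0, 1): 0.9, (1, 0): 0.05, (1, 1): 0.5}  # P(G=1|D,I)
P_S1 = [0.05, 0.8]                                          # P(S = 1 | I = i)
P_J1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}   # P(J=1|G,S)

def p_s(s, i):
    return P_S1[i] if s else 1 - P_S1[i]

# Back-door adjustment over I:
# P(J=1 | do(G=1)) = sum_i P(i) * sum_s P(s|i) * P(J=1 | G=1, s)
p_do = sum(P_I[i] * sum(p_s(s, i) * P_J1[(1, s)] for s in (0, 1))
           for i in (0, 1))

# Naive conditioning P(J=1 | G=1) instead lets information flow along the
# back door: first the posterior P(I | G=1) ...
p_g1_i = [sum(P_D[d] * P_I[i] * P_G1[(d, i)] for d in (0, 1))
          for i in (0, 1)]
p_i_given_g1 = [p / sum(p_g1_i) for p in p_g1_i]
# ... then the average of P(J=1 | G=1, s) under the *shifted* SAT distribution.
p_cond = sum(p_i_given_g1[i] * sum(p_s(s, i) * P_J1[(1, s)] for s in (0, 1))
             for i in (0, 1))

print(round(p_do, 3), round(p_cond, 3))  # intervention ≈ 0.755, observation ≈ 0.802
```

The two queries disagree precisely because conditioning on G = 1 raises the posterior on high intelligence and hence on a high SAT score, a shift that intervening on G does not produce.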

[Iterated Query Simplification, slides 22–23: slide content not captured in the transcript]

[Revisit Simpson's paradox, slide 24: slide content not captured in the transcript]

Case study: lung cancer

In the early 1960s, following a significant increase in the number of smokers that occurred around World War II, people began to notice a substantial increase in the number of cases of lung cancer. After many studies, a correlation was noticed between smoking and lung cancer, in both directions: the frequency of smokers among lung cancer patients was substantially higher than in the general population, and the frequency of lung cancer patients within the population of smokers was substantially higher than within the population of nonsmokers.

This led the Surgeon General, in 1964, to issue a report linking cigarette smoking to lung cancer. This report came under severe attack by the tobacco industry.

The industry claimed that the observed correlation could also be explained by a model in which there is no causal relationship between smoking and lung cancer. Instead, an unobserved genotype might exist that simultaneously causes cancer and a desire for nicotine. There exist several possible models: a direct causal effect of smoking on cancer, or an indirect influence via a latent common parent (the genotype).

[Case study: lung cancer, slides 27–28: slide content not captured in the transcript]

Pearl model: we now need to assess, from the marginal distribution over the observed variables alone, the parametrization of the three links. Unfortunately, it is impossible to determine the parameters of these links from the observational data alone, since both original models can explain the data perfectly. Pearl therefore refined the model somewhat by introducing an additional assumption, and could then determine estimates for the links.

[Case study: lung cancer, slides 30–35: slide content not captured in the transcript]