Holland on Rubin’s Model Part II

Formalizing These Intuitions. In the 1920s and 1930s Jerzy Neyman, a Polish statistician, developed a mathematical model that allowed him to make sense of intuitions like those I have been discussing. Neyman applied this model in the analysis of results of randomized experiments, which had only recently been invented by R. A. Fisher, a British statistician. In the 1970s Donald Rubin, an American statistician, expanded this model to cover the more complicated cases of non-randomized "observational" studies. I will give a brief introduction to the Neyman-Rubin model and show its connection to the ideas I have been discussing.

It is easiest to talk about experiments and observational studies where there are only two causes or treatment conditions, t and c. In this setting there is a sample of "units" (these are samples of material, people, parts of agricultural fields, etc.), each of which is "subjected to" one of the two treatment conditions, t or c. Denote by x the treatment given to a unit, so x_i = t or c for unit i. Later on we record the value of some outcome, y_i, for unit i. So the data are pairs, (y_i, x_i), for each unit i. That is all we get to observe.

Data Analysis. In a situation like this, about the only thing that a sensible person knows how to do is to compute the mean value of y for those units for which x_i = t and compare it to the mean value of y for those units for which x_i = c. At the population level (i.e., big samples) this is a comparison of

E(y | x = t) and E(y | x = c).   (1)

When does the difference,

E(y | x = t) - E(y | x = c),   (2)

have an interpretation as a Causal Effect?
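As a concrete sketch, this comparison is just a difference of group means. The toy data and function names here are my own, not from the lecture:

```python
# Toy data: observed pairs (y_i, x_i) -- all we get to see for each unit.
data = [
    (5.0, "t"), (7.0, "t"), (6.0, "t"),
    (4.0, "c"), (3.0, "c"), (5.0, "c"),
]

def group_mean(pairs, condition):
    """Mean of y over the units assigned to the given treatment condition."""
    ys = [y for (y, x) in pairs if x == condition]
    return sum(ys) / len(ys)

# Sample analogue of the difference E(y | x = t) - E(y | x = c) in (2).
diff = group_mean(data, "t") - group_mean(data, "c")
print(diff)  # 6.0 - 4.0 = 2.0
```

Whether this number 2.0 means anything causally is exactly the question the rest of the lecture takes up.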

Return to the idea of a Minimal Ideal Comparative Experiment. It has three parts:

1. Two identical units of study.
2. Two precisely defined and executed experimental conditions.
3. A precisely measured outcome observed on each unit an appropriate time after exposure to the experimental conditions.

We never thought seriously that we could find "two identical units," but the Neyman-Rubin model replaces this impossible idea with one that is possible to think about.

We can imagine a unit being exposed to either of the two treatment conditions. If we do, then there are two Potential Outcomes that we might observe for unit i:

Y_ti = outcome for unit i if it is exposed to t,
Y_ci = outcome for unit i if it is exposed to c.

Once we go this far, it is not hard to realize that the observed outcome, y_i, is actually the realization of one of two different potential outcomes:

If x_i = t, then y_i = Y_ti, and
If x_i = c, then y_i = Y_ci.

Thus, the observed outcome, y_i, is not the simple datum it might first appear to be. This is all a result of thinking about causation, and it goes well beyond the data to the causal interpretation of the data.
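A small sketch (with hypothetical numbers of my own) of how assignment turns two potential outcomes into one observed outcome:

```python
# Each unit i carries two potential outcomes (Y_ti, Y_ci), but the
# assignment x_i determines which one is realized as the observed y_i.
potential = {
    1: (10.0, 7.0),   # unit 1: (Y_t1, Y_c1)
    2: (8.0, 8.0),    # unit 2: (Y_t2, Y_c2)
    3: (6.0, 5.0),    # unit 3: (Y_t3, Y_c3)
}
assignment = {1: "t", 2: "c", 3: "t"}

def observed(i):
    """y_i = Y_ti if x_i = t, and y_i = Y_ci if x_i = c."""
    y_t, y_c = potential[i]
    return y_t if assignment[i] == "t" else y_c

ys = {i: observed(i) for i in potential}
print(ys)  # {1: 10.0, 2: 8.0, 3: 6.0} -- Y_c1, Y_t2, Y_c3 stay unobserved
```

The unrealized half of each pair is precisely what the data can never show us directly.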

Now let's go back to (1) and (2) specified previously. In terms of the Potential Outcomes, Y_t and Y_c, the difference in (2) is

E(Y_t | x = t) - E(Y_c | x = c).   (3)

We are not done yet. We need to define the Average Causal Effect (ACE) of t relative to c. Since we envision every real unit having two potential values, Y_ti and Y_ci, from the two Potential Outcomes, we can certainly entertain the idea of their difference,

Y_ti - Y_ci.   (4)

This difference is the Causal Effect of t relative to c on Y for unit i.

If we average this difference over all of the units in the population we get the Average Causal Effect (ACE) of t relative to c on y, i.e.,

ACE = E(Y_t - Y_c).   (5)

For reasons of simplicity, I will introduce the idea of the ACE on the treated group, or the effect of the treatment on the treated, that is,

ACE(x = t) = E(Y_t - Y_c | x = t),   (6)

so that we may re-express (6) as

ACE(x = t) = E(Y_t | x = t) - E(Y_c | x = t).   (7)

Returning now to (3), we see that

ACE(x = t) = E(Y_t | x = t) - E(Y_c | x = c)   (8)
           + E(Y_c | x = c) - E(Y_c | x = t).   (9)

(8) is the difference between the means of the t and c groups at the population level. But (9) involves something we can know, i.e., E(Y_c | x = c), and something that is impossible to know directly, i.e., E(Y_c | x = t). E(Y_c | x = t) is an example of a counterfactual expected value.

Counterfactuals come up in discussions of causation by those who don't have a model that lets them be very concrete about them. The difference, E(Y_c | x = c) - E(Y_c | x = t), will be zero if E(Y_c | x = c) = E(Y_c | x = t). A condition that ensures this is that Y_c is statistically independent of x. How can this occur?
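One way it can fail to occur is when units select their own treatment. Here is a simulated sketch (my own construction, not from the lecture) in which assignment depends on Y_c, so the counterfactual term in (9) is far from zero:

```python
import random

random.seed(0)

# Simulate units whose assignment depends on Y_c: units with low Y_c
# take treatment t. The true unit-level effect is 2 for everyone.
units = []
for _ in range(100_000):
    y_c = random.gauss(0.0, 1.0)
    y_t = y_c + 2.0
    x = "t" if y_c < 0 else "c"   # self-selection, not randomization
    units.append((y_t, y_c, x))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

e_yc_given_c = mean(y_c for (_, y_c, x) in units if x == "c")
e_yc_given_t = mean(y_c for (_, y_c, x) in units if x == "t")

# The term in (9): E(Y_c | x = c) - E(Y_c | x = t). Here it is large
# and positive, so the naive difference in means is badly biased.
bias = e_yc_given_c - e_yc_given_t
print(round(bias, 2))  # large and positive: selection, not causation
```

Because Y_c and x are dependent here, comparing group means mixes the causal effect with this selection term.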

The effect of randomization. When we randomize units to treatments we use an external mechanism, such as the toss of a coin or a table of random numbers, to assign treatments to units. This has the effect of making the assignment variable, x, statistically independent of any variable defined on the units, including Y_c. Thus, under random assignment, at the level of the population we have E(Y_c | x = c) = E(Y_c | x = t).

Hence, we have the equality

ACE(x = t) = E(Y_t | x = t) - E(Y_c | x = c),   (10)

or, using the original notation,

E(y | x = t) - E(y | x = c) = ACE(x = t).   (11)

Equation (11) is very important. It shows that a causal parameter, the ACE, is equal to something that we can estimate with data, and thus do statistical inference about.
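A simulated check (again my own sketch) that random assignment makes (11) work: the observed difference in means recovers the ACE.

```python
import random

random.seed(1)

# Units with a true effect of 2 for everyone, but now treatment is
# assigned by a coin flip, independent of Y_c.
units = []
for _ in range(100_000):
    y_c = random.gauss(0.0, 1.0)
    y_t = y_c + 2.0
    x = "t" if random.random() < 0.5 else "c"   # external random mechanism
    units.append((y_t, y_c, x))

# Observed y is Y_t in the t group and Y_c in the c group.
treated = [y_t for (y_t, _, x) in units if x == "t"]
control = [y_c for (_, y_c, x) in units if x == "c"]

diff = sum(treated) / len(treated) - sum(control) / len(control)
print(round(diff, 2))  # close to the true ACE of 2.0
```

Contrast this with the self-selection simulation above: same units, different assignment mechanism, and only randomization makes the simple comparison causal.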

From this point of view, causal inference is statistical inference about causal parameters, not something about ethereal quantities that have little reality. Note also that the admonition that a "cause" should be something that could be a treatment in some experiment is given more force by the identity between an ACE and a difference between means that can be estimated from data.

Causal Models. A causal model is an assumption about the Potential Outcomes. Here are two very common ones. Homogeneous Units: Y_ti = Y_t and Y_ci = Y_c for all i, so it does not matter which i we look at; we get the same outcome under t or c. This is the basic tool of most of lab science, where the units are carefully created samples of material for study.

Constant Causal Effects: Y_ti - Y_ci = k for every i. The effect of t relative to c is the same for all units. Clearly Homogeneous Units implies Constant Causal Effects, but the converse is not necessarily true, so Constant Causal Effects is a weaker assumption than Homogeneous Units. Constant Causal Effects may be thought of as a formalization of Hume's "constant conjunction" condition for causality. It is also the reason you learn about statistical models in which the distributions have the same shape but different means.
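A small sketch (with numbers of my own) of the distinction: units that are heterogeneous, yet obey Constant Causal Effects, so the two outcome distributions differ only by a shift.

```python
# Heterogeneous control outcomes, but Y_ti = Y_ci + k for every unit.
k = 2.0
y_c_values = [3.0, 5.0, 5.0, 8.0]          # units differ under c...
y_t_values = [y + k for y in y_c_values]   # ...yet the effect is constant

unit_effects = [t - c for t, c in zip(y_t_values, y_c_values)]
print(unit_effects)  # [2.0, 2.0, 2.0, 2.0]: the same effect for all i

# Homogeneous Units would also require all y_c_values to be equal;
# they are not, so the converse implication fails for these units.
mean_shift = sum(y_t_values) / len(y_t_values) - sum(y_c_values) / len(y_c_values)
print(mean_shift)  # 2.0: same shape, means shifted by k
```

This is the shift model behind the "same shape, different means" families of distributions mentioned above.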

Without going any further, let me just end by saying that the Neyman-Rubin model can be used to illuminate any causal discussion or idea and should be part of any scientist's tool kit.