Causal Inference. Yu Xie, University of Michigan. Presentation transcript:
Causal Questions A causal question is a simple question involving the relationship between two theoretical concepts: a cause and an effect. Cause => Effect? Or, X => Y?
Evaluation Research In high demand by policy makers. Definition: Evaluation research, or program evaluation, refers to the kind of applied social research that attempts to evaluate the effectiveness of social programs. Key to all evaluation research is causal inference: i.e., evaluating effectiveness of programs.
Economics Tradition Tradition of structuralism. Step 1: deriving structural equations based on theory. Step 2: combining the structural equations into an estimable form. Step 3: interpreting the parameters as “structural” (invariant, essential, “truth-like”).
Example: Mincer’s Human Capital Model for Earnings Maximization model: cost of education versus lifelong return on education. Derive $\ln(Y) = \beta_0 + \beta_1\,\mathrm{edu} + \beta_2\,\mathrm{exp} + \beta_3\,\mathrm{exp}^2$. However, a statistician or sociologist may choose the same regression function on the basis of observed data. The game is different.
Statistics Tradition Experimental tradition. Even with observational data, assume ignorability, that is, potential Y is independent of treatment given other covariates. However, statisticians are reluctant to accept “structural parameters” that are universal - attention to heterogeneous effects. Reluctant to make and suspicious of strong parametric assumptions imposed by researchers.
Example: Harding’s Study Propensity score matching on residence in poor neighborhood. Assumption: no other unobserved confounders conditional on observed covariates considered in the study. Matching to leave the distributions of the treated cases undisturbed. Sensitivity analysis to see the plausibility of the ignorability assumption.
Example: Need to Control for Selection on Observables [Path diagram linking SES, Head Start, and Education; path signs shown: +, +, −.] The observed, bivariate relationship between Head Start participation and educational outcomes may be negative.
More Examples Does cohabitation decrease or increase the likelihood of divorce? Is it better to have more siblings or fewer siblings for educational attainment? What is the earnings return to college education?
Causal Effect as a Counterfactual Question For causal inference, one should ask the counterfactual question: for those who received “treatment” (subscript 1), what would have happened to them if they had not been treated? Or, $y_1^t - y_1^c$ (t denoting treatment; c denoting control). Note that $y_1^t$ is observed, but $y_1^c$ is not.
Causal Effect as a Counterfactual Question (continued) For those who did not receive treatment (subscript 2), what would have happened to them if they had been treated? Or, $y_2^t - y_2^c$ (t denoting treatment; c denoting control). Note that $y_2^c$ is observed, but $y_2^t$ is not. The problem is one of missing data.
Assumption for Simple Comparison If subjects who are treated are, on average, “comparable” to subjects who are untreated (which can be achieved by randomization), we can assume away the problem by averaging: $E(y_1^c) = E(y_2^c)$, $E(y_1^t) = E(y_2^t)$. In that case, $E(y_1^t - y_1^c) = E(y_2^t - y_2^c) = E(y_1^t - y_2^c)$. I.e., the simple comparison is valid.
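The averaging argument on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, the outcome distribution, and the constant effect of 2.0 are all made up for illustration and do not come from the presentation.

```python
import random
import statistics

random.seed(0)

# Simulated potential outcomes for hypothetical subjects (illustrative values).
# y_c: outcome without treatment; y_t: outcome with treatment.
n = 20000
true_effect = 2.0
y_c = [random.gauss(10.0, 3.0) for _ in range(n)]
y_t = [yc + true_effect for yc in y_c]

# Randomized assignment: a coin flip that ignores the potential outcomes.
d = [random.random() < 0.5 for _ in range(n)]

# Each subject reveals only one potential outcome (the missing-data problem).
treated_obs = [y_t[i] for i in range(n) if d[i]]
control_obs = [y_c[i] for i in range(n) if not d[i]]

# Simple comparison of the observed group means.
simple_diff = statistics.fmean(treated_obs) - statistics.fmean(control_obs)
print(round(simple_diff, 2))
```

Because assignment is unrelated to the potential outcomes, the treated and untreated groups are comparable on average, so the simple difference should land close to the assumed effect.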
Now Consider the Usual Case Population is divided into two subpopulations: $P_1$ if $D_i = 1$, $P_0$ if $D_i = 0$. Use the following notation:
$q$ = proportion of $P_0$ in $P$
$E(Y_1^T) = E(Y^T \mid D=1)$, $E(Y_1^C) = E(Y^C \mid D=1)$
$E(Y_0^T) = E(Y^T \mid D=0)$, $E(Y_0^C) = E(Y^C \mid D=0)$
By the total expectation rule:
$E(Y^T - Y^C) = E(Y_1^T - Y_1^C)(1-q) + E(Y_0^T - Y_0^C)\,q = E(Y_1^T - Y_0^C) - E(Y_1^C - Y_0^C) - (\delta_1 - \delta_0)\,q$,
where $\delta_1 = E(Y_1^T - Y_1^C)$, $\delta_0 = E(Y_0^T - Y_0^C)$.
Two Potential Sources of Bias on Unobservables The standard estimator $E(Y_1^T - Y_0^C)$ contains two sources of bias: (1) The average difference between the two groups in the absence of treatment (“heterogeneity bias”). (2) The difference in the average treatment effect between the two groups (“endogeneity bias”). Both sources of bias average to zero under randomized assignment.
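The decomposition on this slide can be checked with exact arithmetic. The sketch below uses made-up group means (none of these numbers come from the presentation) and verifies that the population average effect equals the simple comparison minus the baseline difference between groups and minus the difference in group effects weighted by q.

```python
from fractions import Fraction as F

# Hypothetical group means of the potential outcomes (illustrative numbers only).
q = F(3, 4)                  # proportion of P0 (D = 0) in the population
EY1T, EY1C = F(12), F(8)     # D = 1 group: mean outcome with and without treatment
EY0T, EY0C = F(7), F(5)      # D = 0 group: mean outcome with and without treatment

delta1 = EY1T - EY1C         # average effect in the D = 1 group
delta0 = EY0T - EY0C         # average effect in the D = 0 group

# Left-hand side: population average effect by the total expectation rule.
ate = delta1 * (1 - q) + delta0 * q

# Right-hand side: simple comparison minus the two bias terms.
simple_comparison = EY1T - EY0C
baseline_bias = EY1C - EY0C
effect_bias = (delta1 - delta0) * q
rhs = simple_comparison - baseline_bias - effect_bias

print(ate, rhs)
```

The intermediate step is that the effect in the D = 1 group can be written as the simple comparison minus the baseline difference, and the weighted average of the two group effects equals the D = 1 effect minus q times the gap between the two effects.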
Observable Selectivity Bias If subjects who receive treatment and those who do not are different only in observed characteristics, this type of selectivity is called observable selectivity. This problem can be handled by statistical controls in multivariate analysis to make the two groups comparable (or, differences between the two groups are “ignorable” conditional on covariates). Often called “omitted variable bias.” This is the basis for multivariate analysis.
Conditions for Omitted Variable Bias (1) Correlation Condition: The omitted variable is correlated with the independent variable of primary interest; (2) Relevance Condition: The omitted variable affects the dependent variable. If one of the two conditions is not met, an omitted variable does not introduce a bias.
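A short simulation can illustrate why both conditions are needed. This is a sketch with assumed values: the linear model, the coefficient 1.0 on x, and the parameter names `rho` and `gamma` are hypothetical and chosen only for illustration.

```python
import random

random.seed(1)

def slope(x, y):
    """OLS slope from regressing y on x alone (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(rho, gamma, n=20000, beta=1.0):
    """Simulate y = beta*x + gamma*z + noise, then omit z from the regression.

    rho controls how strongly z depends on x (correlation condition);
    gamma is the effect of z on y (relevance condition).
    """
    x = [random.gauss(0, 1) for _ in range(n)]
    z = [rho * xi + random.gauss(0, 1) for xi in x]
    y = [beta * xi + gamma * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]
    return slope(x, y)

both = short_regression_slope(rho=0.5, gamma=2.0)          # both conditions hold
uncorrelated = short_regression_slope(rho=0.0, gamma=2.0)  # correlation condition fails
irrelevant = short_regression_slope(rho=0.5, gamma=0.0)    # relevance condition fails

print(round(both, 2), round(uncorrelated, 2), round(irrelevant, 2))
```

Under these assumptions the slope is pushed away from 1.0 only in the first case; when either condition fails, omitting z leaves the estimate near the assumed coefficient.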
Experimental Approach Experimental design eliminates both types of problems. Example: High/Scope Perry Preschool study conducted in Ypsilanti. Manski and Garfinkel (1992): experimental designs suffer from shortcomings that are often overlooked. Manski and Garfinkel refer to experimental approach as “reduced-form.”
Shortcomings of Experimental Approach We cannot always extrapolate results from an experimental setting to a natural setting. Thus, Manski and Garfinkel openly criticize experimental designs: "In fact, reduced-form experimental evaluation actually requires that a highly specific and suspect structural assumption hold: Individuals and organizations must respond in the same way to the experimental version of a program as they would to the actual version." (p.17) I.e., lacking “external validity.”
Structural Approach Manski and Garfinkel propose the "structural" approach as an alternative. Definition: structural approach refers to statistical methods that model causal processes based on observational data. Head Start example: control on SES, parental involvement, etc. Requires strong social science theories.
Structural vs. Reduced-Form Equations 1. Structural Equations Structural equations are theoretically derived equations that often have endogenous variables as independent variables. 2. Reduced-Form Equations Reduced-form equations are equations in which all independent variables are exogenous variables. I.e., in reduced-form equations, we purposely ignore intermediate variables.
Comparison of the Two Approaches Advantages of Structural Approach: Since it is conducted in a natural setting, its findings are directly relevant to the whole population. In contrast, results from an experimental design need to be extrapolated. It is less costly. In contrast, experimental research is very expensive. It builds upon and contributes to theory. In contrast, the reduced-form approach only yields simple answers to simple questions.
Advantages of Reduced-form Approach Biases due to unobservables can be eliminated through randomization. It requires fewer assumptions. It does not require complicated statistical models that the public and government officials have difficulty understanding.
Research Design Approaches Quasi-Experiment: utilizing spatial variation; utilizing temporal variation. Clustering Design: fixed effects model. Instrumental-Variable Estimation: special type of structural approach.
Examples: Quasi-Experiment Design Utilizing Spatial Variation Certain policies are introduced in State A but not in State B. States A and B are otherwise comparable. Observe how outcome Y differs between State A and State B. Pace of economic reforms in China differs greatly by region. Associate regional variation in returns to education with regional variation in depth of economic reforms.
Examples: Quasi-Experiment Design Utilizing Temporal Variation Declining significance of race? Examine temporal changes in SES differences by race. Hope to see a narrowing of racial gaps, particularly after the civil rights movement. Effect of a new instructional method:
Extra 1: Propensity Score P(D=1) = probability of treatment. Could be a function of other observed variables, the vector z. We can estimate P(D=1) through a logit model: $\mathrm{logit}(P) = \beta' z$. Under the assumption of no other relevant factors, group T and group C are comparable within levels of the estimated propensity score.
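The logit step can be sketched without any statistics library. The example below is an assumption-laden illustration, not the procedure used in any study cited here: it simulates one observed covariate z, fits the logit of P(D=1) on z by Newton-Raphson, and stores the fitted probabilities as estimated propensity scores. The true coefficients (-0.5 and 1.0) and the helper name `fit_logit` are hypothetical.

```python
import math
import random

random.seed(2)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Simulated data: one observed covariate z drives treatment assignment.
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
d = [1 if random.random() < sigmoid(-0.5 + 1.0 * zi) else 0 for zi in z]

def fit_logit(z, d, iterations=25):
    """Fit logit(P(D=1)) = b0 + b1*z by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di in zip(z, d):
            p = sigmoid(b0 + b1 * zi)
            w = p * (1.0 - p)
            g0 += di - p              # gradient of the log-likelihood
            g1 += (di - p) * zi
            h00 += w                  # entries of the negative Hessian
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0_hat, b1_hat = fit_logit(z, d)

# Estimated propensity score for every unit.
pscore = [sigmoid(b0_hat + b1_hat * zi) for zi in z]
print(round(b0_hat, 2), round(b1_hat, 2))
```

With more covariates one would normally use an established logit routine; the two-parameter solver is written out here only to keep the sketch self-contained.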
Extra 2: Instrumental-Variable Approach Condition: IV Z does not affect Y except through X, meaning: Z is correlated with Y but does not affect Y directly (called “exclusion restriction”). Z is also correlated with X but not perfectly. It’s very hard to find a good Z. [Path diagram with nodes X, Y, Z, and U.]
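A simulated example may make the instrumental-variable logic concrete. It is a sketch under assumptions that are not in the slides: a linear model, an unobserved confounder u of X and Y, an instrument z that enters y only through x, and an assumed effect of 1.5. The IV estimate is computed as the simple ratio of covariances for one instrument and one regressor.

```python
import random

random.seed(3)

def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (r - mb) for p, r in zip(a, b)) / len(a)

# Simulated data. u is an unobserved confounder of x and y;
# z shifts x but affects y only through x (the exclusion restriction).
n = 50000
true_effect = 1.5
u = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols_slope = cov(x, y) / cov(x, x)   # contaminated by the confounder u
iv_slope = cov(z, y) / cov(z, x)    # instrumental-variable (ratio) estimate

print(round(ols_slope, 2), round(iv_slope, 2))
```

In this setup the ordinary regression slope overstates the effect because u raises both x and y, while the ratio estimate recovers a value close to the assumed effect. The simulation cannot show how hard it is to find such a z in practice.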
Extra 3: Fixed Effects Model Sibling models. Family SES and environment are shared.
$Y_{i1} = \beta X_{i1} + \alpha_i + \varepsilon_{i1}$
$Y_{i2} = \beta X_{i2} + \alpha_i + \varepsilon_{i2}$
$\alpha_i$ and $X$ may be correlated. Take the difference between the two equations:
$Y_{i2} - Y_{i1} = \beta (X_{i2} - X_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$
Resulting in a more robust equation. Properties of the fixed effects approach: All fixed characteristics are controlled. It consumes a lot of information. Unobserved heterogeneity is controlled at the group level (fixed effects).
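The sibling-difference idea can be sketched with simulated families. Everything numeric here is an assumption for illustration: each family has a shared unobserved effect that is correlated with X, the assumed coefficient on X is 0.5, and each family contributes exactly two siblings.

```python
import random

random.seed(4)

def slope(x, y):
    """OLS slope from regressing y on x alone (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Simulated sibling pairs. family_effect is shared by both siblings
# and is correlated with X by construction.
n_families = 20000
beta = 0.5
x1, x2, y1, y2 = [], [], [], []
for _ in range(n_families):
    family_effect = random.gauss(0, 1)
    xa = family_effect + random.gauss(0, 1)
    xb = family_effect + random.gauss(0, 1)
    x1.append(xa)
    x2.append(xb)
    y1.append(beta * xa + family_effect + random.gauss(0, 1))
    y2.append(beta * xb + family_effect + random.gauss(0, 1))

# Pooled regression ignores the shared family effect.
pooled = slope(x1 + x2, y1 + y2)

# Differencing within each sibling pair removes the shared family effect.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
within = slope(dx, dy)

print(round(pooled, 2), round(within, 2))
```

The pooled slope absorbs the family effect and drifts well above 0.5, while the within-pair slope stays near it. The cost is that only within-family variation in X is used, which is one sense in which the approach consumes a lot of information.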
Decomposing Correlation/Covariance via Path Analysis Total effect: regression coefficient in the reduced-form regression. Direct effect: regression coefficient in the structural regression. Indirect effect: product of regression coefficients in structural regressions. In a linear equation system: Total effect = direct effect + indirect effect.
Decomposition of Correlation Observed correlation between two variables = total effects plus associated total effects. Example: correlation between U and X in the Blau-Duncan model. More decomposition examples in the Blau-Duncan model.
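The identity "total effect = direct effect + indirect effect" can be verified numerically for a linear system. The sketch assumes a hypothetical three-variable path model (x affects y directly and through an intermediate variable m) with made-up coefficients; it is not the Blau-Duncan model.

```python
import random

random.seed(5)

def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (r - mb) for p, r in zip(a, b)) / len(a)

# Hypothetical linear path model: x -> m -> y and x -> y.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

# Reduced-form regression: y on x only (the intermediate variable is ignored).
total = cov(x, y) / cov(x, x)

# Structural regressions: m on x, and y on both x and m.
a = cov(x, m) / cov(x, x)
det = cov(x, x) * cov(m, m) - cov(x, m) ** 2
direct = (cov(m, m) * cov(x, y) - cov(x, m) * cov(m, y)) / det
b = (cov(x, x) * cov(m, y) - cov(x, m) * cov(x, y)) / det
indirect = a * b

print(round(total, 3), round(direct + indirect, 3))
```

For least-squares estimates the decomposition holds exactly in the sample, not just on average, which is why the two printed numbers agree up to floating-point rounding.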
Table 1: Typology of Workers in Labor Market Transition
                         Current State Sector        Current Market Sector
Initial State Sector     Type I (Stayers)            Type II (Later Entrants)
Initial Market Sector    Type III (Market Losers)    Type IV (Early Birds)
Wu and Xie (2003) We found that higher earnings returns to education in the market sector are limited only to recent market entrants, and that early market entrants resemble workers in the state sector in both the level of earnings and returns to education.
Interpretation This suggests that higher returns to education in the market sector should not be construed as caused by marketization per se, and that the sorting process of workers in labor markets helps explain the sectoral differentials.
Jann’s (2005) Criticisms There is no statistical difference in returns to education between early entrants and late entrants. Thus, Wu and Xie’s conclusion is incorrect.
Xie and Wu’s (2005) Reply Classical statistical tests are mainly functions of the sample size. Statistical methods should be secondary to substantive applications. Social processes generating the three groups are cumulative so that the three groups are not symmetric.
Figure 1. Flow Chart of Labor Market Transitions in China, 1978–1996. [Flow chart across the years 1978, 1987, and 1996, separating the state sector from the market sector. Boxes: Experienced Workers (1197); Stayers (1068); Early Birds (129); New Entrants to the State Sector (522); Stayers (1590); Stayers (1337); Later Entrants (253). Labels: p1 = 0.11, p2 = 0.16; d = 1, d = 2.]
Two Quantities of Interest Because $p_2$ is small at 0.16, we may give full weight to stayers among those who did not experience early entry, yielding the comparison between early birds and stayers:
Propensity Score Analysis Under ignorability, we can adjust for systematic differences between the treatment group and the control group with propensity scores. We then divide the sample into groups defined by propensity scores. Covariates are balanced between the treatment group and the control group within each propensity score group.
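Dividing the sample into propensity score groups can be sketched as follows. This is a simplified illustration, not the multi-level analysis referred to elsewhere in the presentation: the data are simulated, the propensity score is known by construction instead of estimated, ten equal-sized strata are an arbitrary choice, and the assumed treatment effect is 1.0.

```python
import math
import random

random.seed(6)

n = 20000
true_effect = 1.0
rows = []
for _ in range(n):
    z = random.gauss(0, 1)                 # observed covariate
    p = 1.0 / (1.0 + math.exp(-z))         # propensity score, known by construction
    d = 1 if random.random() < p else 0    # treatment depends on z only
    y = true_effect * d + 2.0 * z + random.gauss(0, 1)
    rows.append((p, d, y))

def mean(values):
    return sum(values) / len(values)

# Unadjusted comparison: confounded because z affects both d and y.
naive = mean([y for p, d, y in rows if d == 1]) - mean([y for p, d, y in rows if d == 0])

# Sort by propensity score and cut into equal-sized strata.
rows.sort(key=lambda r: r[0])
n_strata = 10
size = n // n_strata
stratum_effects = []
for s in range(n_strata):
    block = rows[s * size:(s + 1) * size]
    treated = [y for p, d, y in block if d == 1]
    control = [y for p, d, y in block if d == 0]
    stratum_effects.append(mean(treated) - mean(control))

# Strata are equal-sized, so an unweighted average gives the overall estimate.
stratified = mean(stratum_effects)

print(round(naive, 2), round(stratified, 2))
```

Within a stratum, treated and control units have similar propensity scores and hence similar values of z, so most of the confounding is removed; a small residual bias remains because the strata have finite width. Inspecting `stratum_effects` one by one is also a simple way to look for effects that vary across the propensity score range.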
Remember Education is not the treatment. Market entry is the treatment. In propensity score analysis, education (along with other covariates) is part of the propensity score.
Figure 2a. Histogram of the Estimated Propensity Score for Early Entry (Early Entrants versus Late Entrants + Stayers)
Figure 2b. Histogram of the Estimated Propensity Score for Late Entry (Late Entrants versus Stayers)
Substantive Findings for Early Entrants There is no market premium. The null effect holds throughout the range of the propensity score. Analysis is through multi-level modeling, with groups defined by propensity scores.
Substantive Findings for Late Entrants Are very different. There is a market premium. Plus, the market premium is concentrated among individuals with the lowest propensity to make the transition. Analysis is through multi-level modeling, with groups defined by propensity scores.
Interpretation and Conclusion Low-propensity later entrants are workers who are doing especially well in the state sector. For them, the attraction of the market sector needs to be large enough to more than compensate for the advantages they already enjoy in the state sector. Therefore, only those with the best market opportunities actually make the transition. These results illustrate a classic violation of the ignorability assumption, the problem of endogeneity: individuals select their “treatment” based on the anticipated outcome, which is not homogeneous across workers. This kind of insight into social processes can never be produced by analyses such as Jann’s.