1 Program Evaluation and Panel Discrete Data Models – Some Considerations on Methodology and Applications Cheng Hsiao
2 1. Essential Issues for Program Evaluation
2. Panel Discrete Choice Models
3. General Principle of Estimating Structural Parameters in the Presence of Incidental Parameters
4. Parametric Approach
5. Semi-Parametric Approach
6. Bias-Reduced Approach
7. Concluding Remarks
3 1. Essential Issues for Program Evaluation
Definition of Treatment Effect
Let $(y_{0i}, y_{1i})$ denote the potential outcomes of the i th individual in the untreated and treated state. Then, the treatment effect on the i th individual is just $\Delta_i = y_{1i} - y_{0i}$.
4 Two measures of treatment effects are of interest to policy makers: the average treatment effect (ATE), $E(y_{1i} - y_{0i})$, and the treatment effect on the treated (TT), $E(y_{1i} - y_{0i} \mid d_i = 1)$, where $d_i = 1$ indicates that the i th individual receives treatment.
5 The ATE is of interest if one is interested in the effect of treatment for a randomly assigned individual or population mean response to treatment. The TT is of interest if the same selection rule for treatment continues in the future.
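6 The observed data are in the form $(y_i, d_i)$, i = 1,…,N, where $d_i = 1$ if the i th individual receives treatment and $d_i = 0$ if not, and $y_i = d_i y_{1i} + (1 - d_i) y_{0i}$. In other words, we do not simultaneously observe the untreated outcome $y_{0i}$ and the treated outcome $y_{1i}$. An individual is either observed with $y_{0i}$ or $y_{1i}$. Using
$$E(y_i \mid d_i = 1) - E(y_i \mid d_i = 0) \qquad (1)$$
as a measure of ATE could be subject to two sources of bias: selection on observables and selection on unobservables.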
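7 If selection into the treatment is random, then $E(y_{1i} \mid d_i = 1) = E(y_{1i})$ and $E(y_{0i} \mid d_i = 0) = E(y_{0i})$, and (1) can be used to estimate ATE (≡ TT).
8 If $E(y_{1i} \mid d_i = 1) \neq E(y_{1i})$ or $E(y_{0i} \mid d_i = 0) \neq E(y_{0i})$, then (1) will be a biased estimate of ATE or TT.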
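The contrast between random and non-random selection can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the data-generating process, the function name `simulate_mean_difference`, and the parameter `selection_strength` are all assumptions made for illustration. The true ATE is set to 1, and treatment depends on an unobservable `u` only when `selection_strength` is non-zero.

```python
import numpy as np

def simulate_mean_difference(selection_strength, n=200_000, seed=0):
    """Return (naive mean difference, ATE, TT) for one simulated population.

    Hypothetical data-generating process, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                          # unobservable driving the outcome
    y0 = 1.0 + u                                    # untreated potential outcome
    y1 = 2.0 + u + rng.normal(scale=0.5, size=n)    # treated potential outcome, ATE = 1
    # Treatment is random when selection_strength = 0; otherwise
    # individuals with a large u are more likely to be treated.
    d = (selection_strength * u + rng.normal(size=n) > 0).astype(int)
    y = d * y1 + (1 - d) * y0                       # only one potential outcome is observed
    naive = y[d == 1].mean() - y[d == 0].mean()     # sample analogue of the mean difference
    ate = (y1 - y0).mean()
    tt = (y1 - y0)[d == 1].mean()
    return naive, ate, tt

naive_random, ate_random, tt_random = simulate_mean_difference(0.0)
naive_selected, ate_selected, tt_selected = simulate_mean_difference(1.0)
print("random selection:  naive, ATE, TT =", naive_random, ate_random, tt_random)
print("selection on u:    naive, ATE, TT =", naive_selected, ate_selected, tt_selected)
```

Under random assignment the naive difference should sit close to the ATE; once treatment depends on `u`, the naive difference overstates it in this setup because treated individuals have systematically higher unobservables.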
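9 Suppose we can decompose the outcomes in terms of the effects of observables and the effects of unobservables as
$$y_{ji} = m_j(x_i) + \epsilon_{ji}, \quad j = 0, 1,$$
where $m_j(x_i)$ denotes the effects of observable factors, $\epsilon_{ji}$ denotes the effects of unobservable factors, j = 0,1, and $E(\epsilon_{ji} \mid x_i) = 0$. Then
$$ATE(x) = m_1(x) - m_0(x)$$
and
$$TT(x) = m_1(x) - m_0(x) + E(\epsilon_{1i} - \epsilon_{0i} \mid x, d_i = 1).$$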
10 Selection on observables Selection on unobservables
11 Conditional Independence Assumption – selection is ignorable after controlling for a set of observable confounders $x_i$: if $(y_{0i}, y_{1i}) \perp d_i \mid x_i$, then $E(y_{1i} \mid x_i, d_i = 1) - E(y_{0i} \mid x_i, d_i = 0) = E(y_{1i} - y_{0i} \mid x_i)$.
12 Adjustment Methods
1. If there is no selection on unobservables (i.e. the effects of unobservable factors, j = 0,1, are unrelated to treatment status once the observables are controlled for)
1a. Propensity Score Matching Method (Rosenbaum and Rubin (1983))
Let $p(x_i) = P(d_i = 1 \mid x_i)$, then
(i) $x_i \perp d_i \mid p(x_i)$
(ii) $(y_{0i}, y_{1i}) \perp d_i \mid p(x_i)$
13 (ii) → If a subclass of units or a matched treatment-control pair is homogeneous in $p(x)$, the treated and control units in that subclass or matched pair will have the same distribution of $x$. Then, at any value of the propensity score, the difference between the treatment and the control means is an unbiased estimate of the ATE at that value of the propensity score (treatment assignment is ignorable).
15 Issues of this approach
(i) The conditional independence assumption is not a testable hypothesis
(ii) Estimates are sensitive to how blocks of $p(x)$ are constructed
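The stratification idea can be sketched in a few lines. The following is an illustrative sketch under stated assumptions, not the procedure used in the study: the simulated data, the helper names `fit_logit` and `stratified_effects`, the logit form of the propensity score, and the choice of 10 equal-frequency blocks are all hypothetical. The true treatment effect is fixed at 2 and selection operates only through the observables.

```python
import numpy as np

def fit_logit(X, d, iterations=25):
    """Logistic regression by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (d - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def stratified_effects(y, d, pscore, n_blocks=10):
    """Average within-block treated-control differences over propensity score blocks."""
    edges = np.quantile(pscore, np.linspace(0.0, 1.0, n_blocks + 1))
    block = np.clip(np.searchsorted(edges, pscore, side="right") - 1, 0, n_blocks - 1)
    diffs, n_all, n_treated = [], [], []
    for b in range(n_blocks):
        treated = (block == b) & (d == 1)
        control = (block == b) & (d == 0)
        if treated.sum() == 0 or control.sum() == 0:
            continue                      # no overlap in this block, so skip it
        diffs.append(y[treated].mean() - y[control].mean())
        n_all.append(treated.sum() + control.sum())
        n_treated.append(treated.sum())
    ate = np.average(diffs, weights=n_all)        # weight blocks by all units
    atet = np.average(diffs, weights=n_treated)   # weight blocks by treated units
    return ate, atet

# Hypothetical data: selection on observables only, true effect = 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])
true_p = 1.0 / (1.0 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))
d = (rng.uniform(size=n) < true_p).astype(int)
y = 1.0 + x[:, 0] + 0.5 * x[:, 1] + 2.0 * d + rng.normal(size=n)

pscore_hat = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
naive = y[d == 1].mean() - y[d == 0].mean()
ate_hat, atet_hat = stratified_effects(y, d, pscore_hat)
print("naive:", naive, "stratified ATE:", ate_hat, "stratified ATET:", atet_hat)
```

Changing `n_blocks` gives a feel for issue (ii): the estimates move with the way the blocks are constructed, and coarse blocks leave more within-block imbalance.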
16 Decriminalization and Marijuana Smoking Prevalence: Evidence from Australia Kannika Damrongplasit, Cheng Hsiao and Xueyan Zhao
17 The global sale of illegal drugs – US $150 billion (2001)
US drugs policy costs $30-40 billion a year
18 Decriminalized states (year): South Australia (1987), ACT (1992), Northern Territory (1996), Western Australia (2003)
Non-decriminalized states: New South Wales, Queensland, Victoria, Tasmania
19 What is Decriminalization Policy?
Reduction of penalties for possession and cultivation of marijuana for personal consumption (i.e. minor possession)
In Australia, it is called “Expiation System”
- Still an offence to use or grow marijuana
- The offence is expiable by payment of a fine with no imprisonment and no criminal record if the fine is paid.
20 Debates on Marijuana Decriminalization
Supporting Arguments
- Criminal offence from marijuana possession is too severe
- Allow separation of marijuana market from other harder drugs
- Reduce law enforcement and criminal justice resources
Opposing Arguments
- Increase marijuana smoking prevalence
- Greater use of other illicit drugs
21 Sources of Data
(i) 2001 National Drug Strategy Household Survey (NDSHS)
- Nationally representative survey of the non-institutionalized civilian population aged 14 and above
- 14008 observations in the resulting sample after deleting missing data (the total before deletion is not shown)
- Treatment group = 2968, Control group = 11040
(ii) Australian Illicit Drug Report
(iii) Australian Bureau of Statistics
22 Variable: Description
y: 1 if using marijuana in the last 12 months, otherwise 0
Decrim: 1 if residing in decriminalized states
P MAR: ln(real price of marijuana)
Income: ln(real household annual income before tax)
Age1419: 1 if age is 14 to 19 years old
Age2024: 1 if age is 20 to 24 years old
Age2529: 1 if age is 25 to 29 years old
Age3034: 1 if age is 30 to 34 years old
Age3539: 1 if age is 35 to 39 years old
Age4069: 1 if age is 40 to 69 years old
Age70: 1 if age is 70 years old and above; it is a reference category and is omitted from the estimation
Male: 1 for male and 0 for female
Married: 1 if married
Divorce: 1 if divorced
Widow: 1 if widowed
Never Married: 1 if single; it is a reference category and is omitted from the estimation
# Depchild: Number of dependent children aged 14 or below in the household
Degree: 1 if university degree
Working Status: 1 if respondent is unemployed, and 0 otherwise
Aboriginal: 1 if Aboriginal or Torres Strait Islander
Unemployment rate: State unemployment rate (%)
23 Summary Statistics: mean and S.D. of each variable for All Data (N = 14008), Treatment (N = 2968) and Control (N = 11040). The table values are not preserved.
Variables: y, Decrim, P MAR, Income, Age1419, Age2024, Age2529, Age3034, Age3539, Age4069, Male, Married, Divorce, Widow, # Depchild, Degree, Working Status, Aboriginal, Unemployment rate
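24 Non-parametric Model: Propensity Score Stratification Matching
Propensity score is $p(x_i) = P(d_i = 1 \mid x_i)$.
Under the assumptions $(y_{0i}, y_{1i}) \perp d_i \mid x_i$ and $0 < p(x_i) < 1$, then
$$ATE = E_{p(x)}\left[ E(y_i \mid d_i = 1, p(x_i)) - E(y_i \mid d_i = 0, p(x_i)) \right]$$
and
$$ATET = E_{p(x) \mid d = 1}\left[ E(y_i \mid d_i = 1, p(x_i)) - E(y_i \mid d_i = 0, p(x_i)) \right].$$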
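26 Propensity score stratification results. Columns: range of estimated propensity score, blocking rule, number of treatment observations, number of control observations, ATE, ATET. The observation counts, interval lengths and point estimates are not preserved; only significance marks and standard errors (in parentheses) remain.
Range | Blocking rule | ATE | ATET
0.05 – 0.45 | Length of interval | *** (0.026) | (0.053)
0.05 – 0.45 | With STATA interval | *** (0.025) | * (0.038)
(range not preserved) | Length of interval | *** (0.018) | ** (0.025)
(range not preserved) | With STATA interval | *** (0.020) | ** (0.025)
0.05 – 0.4 | Length of interval | *** (0.017) | † (0.014)
0.05 – 0.4 | With STATA interval | *** (0.026) | ** (0.012)
0.1 – 0.35 | Length of interval | *** (0.016) | † (0.015)
0.1 – 0.35 | With STATA interval | *** (0.020) | † (0.015)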
27 Violation of the balancing condition $x \perp d \mid p(x)$ could happen because
(1) Our sample size is not large enough to perform a reliable non-parametric estimation
- For many ranges of the propensity score, there are no overlapping observations
- Within the overlapping range, t-squared tests unambiguously reject the balancing condition
or (2) The conditional independence assumption is violated
28 Non-parametric Approach
Advantages
- Does not impose any distributional assumption
Disadvantages
- Conditional independence assumption is the maintained hypothesis
- Only takes account of selection on observables
Parametric Approach
Advantages
- Takes account of both selection on observables and unobservables
- Can estimate the impact of other explanatory variables on smoking outcome in addition to the effect of decriminalization policy
Disadvantages
- Need to impose both functional form and distributional assumptions
29 Endogenous Probit Switching Model (Model 1)
30 Average Treatment Effect Model 1 & 2: If sample is randomly drawn, Model 3 & 4:
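33 Average Treatment Effect and Marginal Effect
Columns: Binary Probit, Bivariate Probit, Two-part: Treatment, Two-part: Control, Switching: Treatment, Switching: Control
Rows: ATE, then marginal effects of Decrim, P MAR, Income, Age1419, Age2024, Age2529, Age3034, Age3539, Age4069, Male, Married, Divorce, Widow, Degree, Working Status, Aboriginal
ATE (standard error): Binary Probit 0.037*** (0.0002); Bivariate Probit 0.040*** (0.0002); Two-part 0.137*** (0.002); Switching 0.163*** (0.002)
Marginal effects: many cells are not preserved, so the surviving entries cannot be matched to rows. They are listed per model in their original order, with a bare significance mark where the number is missing.
Binary Probit: 0.067*** *** *** 0.598*** 0.578*** 0.570*** 0.443*** 0.085*** *** *** *** 0.110**
Bivariate Probit: *** *** 0.599*** 0.578*** 0.571*** 0.443*** 0.085*** *** *** *** 0.109**
Two-part (Treatment, then Control): *** *** 0.523*** 0.520*** 0.507*** 0.510*** 0.388*** 0.078*** *** † *** *** *** 0.599*** 0.601*** 0.580*** 0.567*** 0.450*** 0.089*** *** *** *** 0.172***
Switching (Treatment, then Control): *** *** 0.518*** 0.514*** 0.502*** 0.505*** 0.385** 0.078*** *** *** *** *** 0.628*** 0.631*** 0.604*** 0.588*** 0.451*** 0.078*** *** *** *** 0.143***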
34 Panel Data – Allow better control of selection on observables and unobservables
Difference-in-Difference Method
$$DID = E(y_{jt} - y_{js}) - E(y_{it} - y_{is})$$
$y_{jt}$ - outcome of the j th individual after the treatment
$y_{js}$ - outcome of the j th individual before the treatment
$y_{it}$ - outcome of the i th individual who did not receive treatment at time t
$y_{is}$ - outcome of the i th individual at time s
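A short simulation shows why differencing helps when selection operates through a time-invariant unobservable. This is a sketch under assumptions that are not in the slides: the function name `did`, the two-period setup, the common time trend of 0.3 and the true effect of 0.5 are all chosen for illustration.

```python
import numpy as np

def did(y_treated_after, y_treated_before, y_control_after, y_control_before):
    """Difference-in-difference of group means."""
    change_treated = np.mean(y_treated_after) - np.mean(y_treated_before)
    change_control = np.mean(y_control_after) - np.mean(y_control_before)
    return change_treated - change_control

# Hypothetical two-period panel: alpha is a time-invariant unobservable
# that affects both the outcome and the probability of treatment.
rng = np.random.default_rng(0)
n = 5000
alpha = rng.normal(size=n)
treated = alpha + rng.normal(size=n) > 0
effect, time_trend = 0.5, 0.3
y_before = alpha + rng.normal(scale=0.5, size=n)
y_after = alpha + time_trend + effect * treated + rng.normal(scale=0.5, size=n)

did_hat = did(y_after[treated], y_before[treated],
              y_after[~treated], y_before[~treated])
cross_section = y_after[treated].mean() - y_after[~treated].mean()
print("DID estimate:", did_hat)
print("post-period cross-section difference:", cross_section)
```

Differencing over time removes `alpha` and the common trend, so the DID estimate should be close to 0.5 here, while the cross-section comparison also picks up the difference in `alpha` between the two groups. The sketch relies on both groups sharing the same time trend.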
35 Cross-Section vs Panel Discrete Modeling
40 6. Bias-Reduced Estimator Mean Square Error = Suppose
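41 Consider the log-likelihood function of N cross-sectional units observed over T time periods,
$$\log L(\beta, \alpha_1, \ldots, \alpha_N) = \sum_{i=1}^{N} \ell_i(\beta, \alpha_i),$$
where $\ell_i(\beta, \alpha_i)$ denotes the log-likelihood function of the T time series observations for the i th individual. For instance, consider a binary choice model of the form $y_{it} = 1$ if $x_{it}'\beta + \alpha_i + \epsilon_{it} > 0$ and $y_{it} = 0$ otherwise, with $P(y_{it} = 1 \mid x_{it}, \alpha_i) = F(x_{it}'\beta + \alpha_i)$.
42 Then
$$\ell_i(\beta, \alpha_i) = \sum_{t=1}^{T} \left[ y_{it} \log F(x_{it}'\beta + \alpha_i) + (1 - y_{it}) \log\left(1 - F(x_{it}'\beta + \alpha_i)\right) \right].$$
The MLE is obtained by simultaneously solving for $(\beta, \alpha_1, \ldots, \alpha_N)$ from
$$\sum_{i=1}^{N} \frac{\partial \ell_i(\beta, \alpha_i)}{\partial \beta} = 0 \qquad (1)$$
$$\frac{\partial \ell_i(\beta, \alpha_i)}{\partial \alpha_i} = 0, \quad i = 1, \ldots, N \qquad (2)$$
43 The MLE of $\beta$ can also be derived by first obtaining $\hat{\alpha}_i$ from (2) as a function of $\beta$, $\hat{\alpha}_i(\beta)$, substituting into the likelihood function to form the concentrated log-likelihood function,
$$\log L^{c}(\beta) = \sum_{i=1}^{N} \ell_i(\beta, \hat{\alpha}_i(\beta)) \qquad (3)$$
then solving
$$\frac{\partial \log L^{c}(\beta)}{\partial \beta} = 0 \qquad (4)$$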
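The concentration step can be made concrete with a small fixed-effects logit example. This is a sketch under assumptions, not the estimator used in the slides: a static logit with one regressor, T = 2, and the helper names `alpha_hat` and `concentrated_loglik` are hypothetical. Each individual effect is solved from its own first-order condition for a given slope, and the concentrated log-likelihood is then maximized over a grid. Units whose outcome never changes are dropped because their individual effect diverges and they carry no information about the slope.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def alpha_hat(beta, x, y, iterations=30):
    """Solve sum_t [y_it - F(beta * x_it + alpha_i)] = 0 for every unit by Newton steps."""
    alpha = -beta * x.mean(axis=1)                 # starting value
    for _ in range(iterations):
        p = logistic(beta * x + alpha[:, None])
        score = (y - p).sum(axis=1)
        info = (p * (1.0 - p)).sum(axis=1)
        alpha = alpha + np.clip(score / info, -1.0, 1.0)   # damped Newton step
    return alpha

def concentrated_loglik(beta, x, y):
    """Log-likelihood with each alpha_i replaced by alpha_hat_i(beta)."""
    a = alpha_hat(beta, x, y)
    p = logistic(beta * x + a[:, None])
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical panel: N = 4000 units, T = 2 periods, true slope = 1.
rng = np.random.default_rng(0)
N, T, beta_true = 4000, 2, 1.0
alpha_true = rng.normal(size=N)
x = rng.normal(size=(N, T))
y = (rng.uniform(size=(N, T)) < logistic(beta_true * x + alpha_true[:, None])).astype(float)

# Keep only units whose outcome changes over time.
switchers = (y.sum(axis=1) > 0) & (y.sum(axis=1) < T)
xs, ys = x[switchers], y[switchers]

grid = np.linspace(0.0, 4.0, 401)
values = [concentrated_loglik(b, xs, ys) for b in grid]
beta_mle = grid[int(np.argmax(values))]
print("true beta:", beta_true, "concentrated-likelihood MLE:", beta_mle)
```

With T = 2 the maximizer lands near twice the true slope rather than near the true slope, which is the known inconsistency of the fixed-effects logit MLE for T = 2 and the kind of incidental-parameter bias that bias-reduced estimators aim to shrink.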
44 When T is finite, expanding the score of the concentrated log-likelihood around $\alpha_i$, and evaluating it at $\hat{\alpha}_i(\beta)$: (5)
46 is derived by solving
48 Monte Carlo experiments conducted by Carro (2006) have shown that when T = 8, the bias of the modified MLE for dynamic probit and logit models is negligible. Another advantage of the Arellano-Carro approach is its generality. For instance, a dynamic logit model with time dummy explanatory variables cannot meet the Honoré and Kyriazidou (2000) conditions for generating a consistent estimator, but can still be estimated by the modified MLE with good finite sample properties.
49 Advantages of the Carro (2006) procedure:
1. No need to transform the parameters of interest into (information) orthogonal parameters as done in Cox and Reid (1987, JRSS B) or Arellano (2003).
2. No need to impose any conditions on the observed data as in the case of Honoré and Kyriazidou (2000). In other words, all observed data can be utilized to obtain the modified MLE.
50 Issues:
1. The individual-effect estimate $\hat{\alpha}_i(\beta)$ may not have a closed-form solution, nor is the evaluation of the expectation term trivial. Hence, computation can be tedious, e.g. in the case of the logit model.
51 It will be useful to derive a bias-reduced estimator that has the form
52 7. Concluding Remarks Issues: (i)Is there a simultaneous equation framework for discrete data?
53 If $y_1$ and $y_2$ are continuous, one can find a particular realized value of ($u_1$, $u_2$) satisfying the observed ($y_1$, $y_2$), or one can consider that there exist independent shocks ($u_1$, $u_2$) such that there exist ($y_1$, $y_2$) satisfying the model.
54 But if $y_1$ is dichotomous and $y_2$ exogenous, then $E(y_1 \mid y_2) = F$ and the realized $u_1$ takes the value $(1 - F)$ or $(-F)$, where $F$ depends on $y_2$. If $y_2$ is endogenous, then $u_1$ cannot be an independent shock.
55 (ii) Is there an equivalent limited information framework for discrete data simultaneous equation models?
(iii) Cross-sectional dependence