The Grand Experiment
- Water was supplied to households by competing private companies
- Sometimes different companies supplied households in the same street
- In south London there were two main companies:
  - Lambeth Company (water supply from Thames Ditton, 22 miles upstream)
  - Southwark and Vauxhall Company (water supply from the Thames)
In the 1853/54 cholera outbreak
- Death rates (per common population base) by water company:
  - Lambeth: 10
  - Southwark and Vauxhall: 150
- Might be the water, but perhaps other factors
- Snow compared death rates in the 1849 epidemic:
  - Lambeth: 150
  - Southwark and Vauxhall: 125
- In 1852 the Lambeth Company had moved its supply from Hungerford Bridge (central London) to Thames Ditton
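What Would Be a Good Estimate of the Effect of Clean Water?
Death rates:
                              1849   1853/54   Difference
Lambeth                        150        10         -140
Southwark and Vauxhall         125       150           25
Difference                      25      -140         -165
(‘Difference’ column: 1853/54 minus 1849; ‘Difference’ row: Lambeth minus Southwark and Vauxhall)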
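The arithmetic in the table can be checked in a few lines of Python. This is a minimal sketch: the dictionary layout and the `did` helper are illustrative names, not part of the original material, and the rates are the ones quoted for Lambeth and Southwark and Vauxhall.

```python
# Death rates from the Snow example (same units as the table).
rates = {
    "Lambeth": {"1849": 150, "1853/54": 10},
    "Southwark and Vauxhall": {"1849": 125, "1853/54": 150},
}


def did(treat, control, pre="1849", post="1853/54"):
    """Differences-in-differences: change for the treated group minus change for the control group."""
    return (treat[post] - treat[pre]) - (control[post] - control[pre])


lambeth = rates["Lambeth"]
svc = rates["Southwark and Vauxhall"]

estimate = did(lambeth, svc)
# The same number, computed as (post-period gap) minus (pre-period gap).
gap_version = (lambeth["1853/54"] - svc["1853/54"]) - (lambeth["1849"] - svc["1849"])

print(estimate, gap_version)  # -165 -165
```

Taking the change in each row first, or the gap in each column first, gives the same answer; that symmetry is what the notation slide expresses with the $\mu_{it}$.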
This Is the Basic Idea of Differences-in-Differences
- Have already seen the idea of using differences to estimate causal effects
  - Treatment/control groups in experimental data
- Often would like to find ‘treatment’ and ‘control’ groups that can be assumed to be similar in every way except receipt of treatment
- This may be very difficult to do
A Weaker Assumption
- Assume that, in the absence of treatment, the difference between the ‘treatment’ and ‘control’ groups is constant over time
- With this assumption, can use observations on the treatment and control groups pre- and post-treatment to estimate the causal effect
- Idea:
  - Difference pre-treatment is the ‘normal’ difference
  - Difference post-treatment is the ‘normal’ difference + causal effect
  - Difference-in-difference is the causal effect
A Graphical Representation
[Figure: outcome y plotted against time for the treatment and control groups, pre- and post-treatment, with points A, B and C marked]
What Is the D-in-D Estimate?
- Standard differences estimator is the gap AB
- But the ‘normal’ difference is estimated as CB
- Hence the D-in-D estimate is AC
- Note: assumes trends in the outcome variable are the same for the treatment and control groups
  - This is not testable directly
  - With two periods can get no idea of its plausibility, but can with more periods
Some Notation
- Define $\mu_{it} = E(y_{it})$
  - where $i = 0$ is the control group and $i = 1$ is the treatment group
  - where $t = 0$ is the pre-period and $t = 1$ is the post-period
- Standard ‘differences’ estimate of the causal effect is an estimate of: $\mu_{11} - \mu_{01}$
- ‘Differences-in-differences’ estimate of the causal effect is an estimate of: $(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})$
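How to Estimate?
- Can write the D-in-D estimate as: $(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})$
- This is simply the difference in the change of the treatment and control groups, so can estimate it as:
$$\Delta y_i = \beta_0 + \beta_1 X_i + \Delta\varepsilon_i$$
where $\Delta y_i = y_{i1} - y_{i0}$, $X_i = 1$ for the treatment group and $X_i = 0$ for the control group; the OLS estimate of $\beta_1$ is the D-in-D estimate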
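A minimal sketch of the ‘regress the change on a treatment dummy’ idea, in pure Python. The data and the `ols_slope` helper are made up for illustration; the point is that with a 0/1 regressor the OLS slope is exactly the difference between the mean change of the treatment group and the mean change of the control group.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical panel: each individual observed before (y0) and after (y1),
# with X = 1 for the treatment group and X = 0 for the control group.
y0 = [10.0, 12.0, 11.0, 9.0, 8.0, 10.0]
y1 = [15.0, 18.0, 16.0, 10.0, 9.0, 12.0]
X = [1, 1, 1, 0, 0, 0]

dy = [after - before for before, after in zip(y0, y1)]  # change for each individual
beta1 = ols_slope(X, dy)

# With a 0/1 regressor, the slope equals the difference in mean changes.
mean_change_treat = sum(d for d, x in zip(dy, X) if x == 1) / X.count(1)
mean_change_control = sum(d for d, x in zip(dy, X) if x == 0) / X.count(0)

print(round(beta1, 6), round(mean_change_treat - mean_change_control, 6))  # 4.0 4.0
```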
- The D-in-D regression on changes is simply the ‘differences’ estimator applied to the difference
- To implement this, need repeat observations on the same individuals
- May not have this: individuals observed pre- and post-treatment may be different
- What can we do in this case?
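In this case can estimate:
$$y_{it} = \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \varepsilon_{it}$$
where $X_i = 1$ for the treatment group and $T_t = 1$ in the post-period
- D-in-D estimate is the estimate of $\beta_3$: why is this?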
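One way to answer the question, assuming the estimating equation is the interacted regression $y_{it} = \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \varepsilon_{it}$, with $X_i$ a treatment-group dummy, $T_t$ a post-period dummy and $E(\varepsilon_{it} \mid X_i, T_t) = 0$, is to write out the four group means and substitute them into the D-in-D expression $(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})$:

```latex
\begin{aligned}
\mu_{00} &= \beta_0 \\
\mu_{01} &= \beta_0 + \beta_2 \\
\mu_{10} &= \beta_0 + \beta_1 \\
\mu_{11} &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00}) &= (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{aligned}
```

Because the model has one parameter per group-period cell, OLS fits the four cell means exactly, so $\hat\beta_3$ equals the D-in-D of the sample means.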
A Comparison of the Two Methods
- Where we have repeated observations, could use both methods
- Will give the same parameter estimates
- But will give different standard errors
  - ‘Levels’ version will assume residuals are independent across observations: unlikely to be a good assumption when the same individual appears in both periods
- Can deal with this by:
  - Clustering (by individual)
  - Or estimating the ‘differences’ version
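A pure-Python check, on made-up panel data, that the ‘levels’ regression with an interaction and the ‘differences’ regression give the same point estimate. The small `solve`/`ols` helpers are illustrative (a normal-equations solver, not any statistics package's API), and standard errors are deliberately not computed, since that is exactly where the two versions differ.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; rows are observations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical panel (same individuals observed twice); X = 1 marks the treatment group.
y0 = [10.0, 12.0, 11.0, 9.0, 8.0, 10.0]
y1 = [15.0, 18.0, 16.0, 10.0, 9.0, 12.0]
X = [1, 1, 1, 0, 0, 0]

# 'Levels' version: stack both periods and regress y on [1, X, T, X*T].
rows, y = [], []
for t, outcome in ((0, y0), (1, y1)):
    for xi, yi in zip(X, outcome):
        rows.append([1.0, xi, t, xi * t])
        y.append(yi)
beta_levels = ols(rows, y)[3]  # coefficient on the interaction X*T

# 'Differences' version: regress the change in y on [1, X].
dy = [after - before for before, after in zip(y0, y1)]
beta_diff = ols([[1.0, xi] for xi in X], dy)[1]

print(round(beta_levels, 6), round(beta_diff, 6))  # 4.0 4.0
```

The levels version treats the 12 stacked rows as independent; with real data its default standard error would need clustering by individual to be comparable.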
Other Regressors
- Can put in other regressors as before
- Perhaps should think about the way in which they enter the estimating equation
- E.g. if the level of W affects the level of y, then should include ΔW in the differences version
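As a worked illustration, assume (hypothetical notation, not from the slides) that W enters the levels equation linearly with coefficient $\gamma$, alongside a treatment-group dummy $X_i$ and a post-period dummy $T_t$. Differencing each individual's two observations removes everything that does not change over time:

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \gamma W_{it} + \varepsilon_{it} \\
\Delta y_i = y_{i1} - y_{i0} &= \beta_2 + \beta_3 X_i + \gamma\,\Delta W_i + \Delta\varepsilon_i
\end{aligned}
```

So the change $\Delta W_i$, not the level of W, belongs in the differences version; a regressor that is constant over time for an individual drops out altogether.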
Differential Trends in Treatment and Control Groups
- Key assumption underlying the validity of the D-in-D estimate is that the difference between the treatment and control groups would have remained constant in the absence of treatment
- Can never test this directly
- With only two periods, can get no idea of its plausibility
- But can with more than two periods
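With a third period the common-trends assumption can at least be probed. One common check, sketched here under made-up group means (it is not a procedure taken from the slides), is a ‘placebo’ D-in-D between two pre-treatment periods: if the groups were on parallel paths before treatment, it should be close to zero. With real data the placebo would be compared with its standard error rather than with exactly zero.

```python
def did(means, pre, post):
    """D-in-D from group means; means[group][period], group 1 = treatment, 0 = control."""
    return (means[1][post] - means[1][pre]) - (means[0][post] - means[0][pre])


# Hypothetical group means: two pre-treatment periods (-1 and 0), one post-treatment period (1).
means = {
    1: {-1: 20.0, 0: 22.0, 1: 30.0},  # treatment group
    0: {-1: 15.0, 0: 17.0, 1: 19.0},  # control group
}

# Placebo: both periods are before treatment, so under common trends this should be near zero.
placebo = did(means, pre=-1, post=0)
# The actual D-in-D estimate across the treatment date.
estimate = did(means, pre=0, post=1)

print(placebo, estimate)  # 0.0 6.0
```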
An Example: “Vertical Relationships and Competition in Retail Gasoline Markets”, by Justine Hastings, American Economic Review, 2004
- Interested in the effect of vertical integration on retail petrol prices
- Investigates the take-over in California of the independent ‘Thrifty’ chain of petrol stations by ARCO (more vertically integrated)
- Defines the treatment group as petrol stations that had a ‘Thrifty’ within 1 mile
- Control group: those that did not
- Lots of reasons why these groups might be different, so a D-in-D approach seems a good idea
This Picture Contains the Relevant Information
- Can see a D-in-D estimate of +5c per gallon
- Can also see that trends before and after the change are very similar, so the D-in-D assumption looks plausible
A Case Which Does Not Look So Good: Ashenfelter’s Dip
- Interested in the effect of government-sponsored training (under the Manpower Development and Training Act, MDTA) on earnings
- Treatment group: those who received training in 1964
- Control group: a random sample of the population as a whole
Things to Note
- Earnings for trainees are very low in 1964 because they were training, not working, in that year: should ignore this year
- A simple D-in-D approach would compare earnings in 1965 with 1963
- But earnings of trainees in 1963 seem to show a ‘dip’, so the D-in-D assumption is probably not valid
- Probably because those who enter training are those who had a bad shock (e.g. job loss)
Differences-in-Differences: Summary
- A very useful and widespread approach
- Validity does depend on the assumption that trends would have been the same in the absence of treatment
- Can use other periods to see whether this assumption is plausible
- Uses 2 observations on the same individual: the most rudimentary form of panel data
A Brief Introduction to Panel Data
- Panel data has both a time-series and a cross-section dimension: N individuals over T periods
- Will restrict attention to balanced panels: the same number of observations on each individual
- Whole books have been written about it, but the basics can be understood very simply and are not very different from what we have seen before
- Asymptotics are typically done for large N, small T
- Use $y_{it}$ to denote the variable for individual i at time t
The Pooled Model
- Can simply ignore the panel nature of the data and estimate: $y_{it} = \beta' x_{it} + \varepsilon_{it}$
- This will be consistent if $E(\varepsilon_{it} \mid x_{it}) = 0$ or $\text{plim}(X'\varepsilon/N) = 0$
- But computed standard errors will only be consistent if errors are uncorrelated across observations
- This is unlikely:
  - Correlation between residuals of the same individual in different time periods
  - Correlation between residuals of different individuals in the same time period (aggregate shocks)
A More Plausible Model
$$y_{it} = \beta' x_{it} + \theta_i + \varepsilon_{it}$$
- Should recognise this as a model with ‘group-level’ dummies or residuals
- Here, the individual is a ‘group’
Three Models
- Fixed Effects Model
  - Treats $\theta_i$ as a parameter to be estimated (like $\beta$)
  - Consistency does not require anything about the correlation between $\theta_i$ and $x_{it}$
- Random Effects Model
  - Treats $\theta_i$ as part of the residual (like $\varepsilon_{it}$)
  - Consistency does require no correlation between $\theta_i$ and $x_{it}$
- Between-Groups Model
  - Runs a regression on the averages for each individual
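Proposition 5.2
The fixed effect estimator of $\beta$ will be consistent if:
- $E(\varepsilon_{it} \mid x_{it}) = 0$
- $\text{Rank}(X, D) = N + K$, where D is the matrix of individual dummies and K is the number of regressors in X
Proof: simple application of what you should know about the linear regression model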
Intuition
- First condition should be obvious: regressors uncorrelated with residuals
- Second condition requires the regressors to be of full rank
- Main way in which this is likely to fail in the fixed effects model is if some regressors vary only across individuals and not over time
- Such a variable is perfectly multicollinear with the individual fixed effect
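Estimating the Fixed Effects Model
- Can estimate by ‘brute force’: include a separate dummy variable for every individual, but there may be a lot of them
- Can also estimate in mean-deviation form:
$$y_{it} - \bar y_i = \beta'(x_{it} - \bar x_i) + (\varepsilon_{it} - \bar\varepsilon_i)$$
where $\bar y_i = T^{-1}\sum_t y_{it}$, and similarly for $\bar x_i$ and $\bar\varepsilon_i$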
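How Does De-meaning Work?
- $\theta_i$ is constant over time for each individual, so $\theta_i - \bar\theta_i = 0$ and the fixed effect drops out
- Can then do simple OLS on the de-meaned variables
- STATA command is like:
. xtreg y x, fe i(id)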
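A minimal pure-Python sketch of the de-meaning step for a single regressor (not Stata's implementation; the function names and data are hypothetical). The made-up panel has $y_{it} = 2x_{it} + \theta_i$ exactly, with $\theta_i$ rising with the individual's x, so pooled OLS is biased while the within estimator recovers the slope. Standard errors are not computed; note that OLS on de-meaned data would also need a degrees-of-freedom correction for the N individual means that were estimated.

```python
from collections import defaultdict


def group_means(ids, values):
    """Mean of `values` for each individual, keyed by id."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        total[i] += v
        count[i] += 1
    return {i: total[i] / count[i] for i in total}


def within_slope(ids, x, y):
    """Fixed-effects (within-group) slope: OLS of de-meaned y on de-meaned x, no constant."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    x_dm = [xi - x_bar[i] for i, xi in zip(ids, x)]
    y_dm = [yi - y_bar[i] for i, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


def pooled_slope(x, y):
    """Pooled OLS slope of y on a constant and x, ignoring the panel structure."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Hypothetical balanced panel: 3 individuals x 3 periods, y = 2x + theta_i exactly,
# where the individual effect theta_i (0, 10, 20) rises with the individual's x.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
theta = {0: 0.0, 1: 10.0, 2: 20.0}
y = [2.0 * xi + theta[i] for i, xi in zip(ids, x)]

print(within_slope(ids, x, y), pooled_slope(x, y))  # 2.0 5.0
```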
Problems with the Fixed Effect Estimator
- Only uses variation within individuals: sometimes called the ‘within-group’ estimator
- This variation may be a small part of the total (so low precision) and more prone to measurement error (so more attenuation bias)
- Cannot use it to estimate the effect of a regressor that is constant for an individual
Random Effects Estimator
- Treats $\theta_i$ as part of the residual (like $\varepsilon_{it}$)
- Consistency does require no correlation between $\theta_i$ and $x_{it}$
- Should recognise this as like the model with clustered standard errors
- But the random effects estimator is a feasible GLS estimator
More on the RE Estimator
- Will not describe how we compute $\hat\Omega$: see Wooldridge
- STATA command:
. xtreg y x, re i(id)
Proposition 5.3
The random effects estimator of $\beta$ will be consistent if:
a. $E(\varepsilon_{it} \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$
b. $E(\theta_i \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$
c. $\text{Rank}(X'\Omega^{-1}X) = K$
Proof: the RE estimator is a special case of the feasible GLS estimator, so the conditions for consistency are the same. The error has two components, so need both a. and b.
Comments
- The assumption about exogeneity of the errors is stronger than for the FE model: need to assume $\varepsilon_{it}$ is uncorrelated with x in every period (past, present and future); this is called strong exogeneity
- The assumption about the rank condition is weaker than for the FE model, e.g. can estimate the effect of variables that are constant for a given individual
Another Reason Why We May Prefer the RE to the FE Model
- If the exogeneity assumptions are satisfied, the RE estimator will be more efficient than the FE estimator
- An application of the general principle that imposing a true restriction on the data leads to an efficiency gain
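Another Useful Result
- Can show that the RE estimator can be thought of as an OLS regression of:
$$y_{it} - \lambda \bar y_i$$
on:
$$x_{it} - \lambda \bar x_i$$
where:
$$\lambda = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\theta^2}}$$
with $\sigma_\varepsilon^2$ and $\sigma_\theta^2$ the variances of $\varepsilon_{it}$ and $\theta_i$
- This is sometimes called quasi-time demeaning
- See Wooldridge (ch. 10, pp. 286-7) if you want to know more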
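The quasi-demeaning result can be illustrated with a short pure-Python sketch. It takes the variance components as given (in practice they come from the estimated $\hat\Omega$, which is not shown) and uses a made-up 3x3 panel in which $\theta_i$ rises with x. Setting $\lambda = 0$ reproduces pooled OLS, $\lambda = 1$ reproduces the within (FE) estimator, and RE sits in between. Function names are illustrative only.

```python
import math
from collections import defaultdict


def re_lambda(sigma2_eps, sigma2_theta, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma2_eps / (sigma2_eps + T * sigma2_theta))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_theta))


def group_means(ids, values):
    """Mean of `values` for each individual, keyed by id."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        total[i] += v
        count[i] += 1
    return {i: total[i] / count[i] for i in total}


def quasi_demeaned_slope(ids, x, y, lam):
    """OLS slope of y_it - lam*ybar_i on x_it - lam*xbar_i.

    A free intercept stands in for the transformed constant (1 - lam); when
    lam = 1 both transformed series have mean zero, so it does not affect the slope.
    """
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    z = [xi - lam * x_bar[i] for i, xi in zip(ids, x)]
    w = [yi - lam * y_bar[i] for i, yi in zip(ids, y)]
    mz, mw = sum(z) / len(z), sum(w) / len(w)
    return sum((a - mz) * (b - mw) for a, b in zip(z, w)) / sum((a - mz) ** 2 for a in z)


# Hypothetical panel: y = 2x + theta_i, with theta_i = 0, 10, 20 rising with x.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [2.0 * xi + 10.0 * i for i, xi in zip(ids, x)]

lam = re_lambda(sigma2_eps=1.0, sigma2_theta=1.0, T=3)  # assumed variance components
slope_pooled = quasi_demeaned_slope(ids, x, y, 0.0)
slope_fe = quasi_demeaned_slope(ids, x, y, 1.0)
slope_re = quasi_demeaned_slope(ids, x, y, lam)
print(lam, slope_pooled, slope_fe, round(slope_re, 4))  # 0.5 5.0 2.0 4.3077
```

In this example $\theta_i$ is correlated with x, so the RE slope is not consistent; the sketch only shows the mechanics of the transformation.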
Between-Groups Estimator
- This takes individual means and estimates the regression by OLS:
$$\bar y_i = \beta' \bar x_i + \theta_i + \bar\varepsilon_i$$
- Stata command is: xtreg y x, be i(id)
- Condition for consistency is the same as for the RE estimator
- But the BE estimator is less efficient, as it does not exploit variation in the regressors for a given individual
- And cannot estimate the effect of variables like time trends, whose average values do not vary across individuals
- So why would anyone ever use it? Let’s think about measurement error
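A minimal pure-Python sketch of the between-groups estimator (illustrative names and made-up data, not Stata's implementation): collapse to one mean per individual, then run OLS with a constant. It also shows why a time trend cannot be estimated: its individual means are identical, so there is no variation left.

```python
from collections import defaultdict


def group_means(ids, values):
    """Mean of `values` for each individual, keyed by id (in first-seen order)."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        total[i] += v
        count[i] += 1
    return {i: total[i] / count[i] for i in total}


def between_slope(ids, x, y):
    """Between-groups slope: OLS (with constant) of individual means of y on individual means of x."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    xs = [x_bar[i] for i in x_bar]
    ys = [y_bar[i] for i in x_bar]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0:
        raise ValueError("individual means of x do not vary: slope not identified")
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx


# Hypothetical panel: 3 individuals x 2 periods, with y = 1 + 2x exactly.
ids = [0, 0, 1, 1, 2, 2]
x = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
y = [1.0 + 2.0 * xi for xi in x]
be = between_slope(ids, x, y)
print(be)  # 2.0

# A time trend has the same average (1.5) for every individual, so BE cannot use it.
trend = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
try:
    between_slope(ids, trend, y)
    trend_identified = True
except ValueError as err:
    trend_identified = False
    print(err)
```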
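Measurement Error in Panel Data Models
- Assume the true model is:
$$y_{it} = \beta x^*_{it} + \theta_i + \varepsilon_{it}$$
where x is one-dimensional
- Assume $E(\varepsilon_{it} \mid x^*_{i1}, \ldots, x^*_{it}, \ldots, x^*_{iT}) = 0$ and $E(\theta_i \mid x^*_{i1}, \ldots, x^*_{it}, \ldots, x^*_{iT}) = 0$, so that the RE and BE estimators are consistent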
Measurement Error Model
- Assume:
$$x_{it} = x^*_{it} + u_{it}, \qquad x^*_{it} = x^*_i + \eta_{it}$$
where $u_{it}$ is classical measurement error, $x^*_i$ is the average value of $x^*$ for individual i, and $\eta_{it}$ is the variation around this value, assumed to be iid and uncorrelated with $x^*_i$ and $u_{it}$
- We know this measurement error is likely to cause attenuation bias, but the size of the bias will vary between the FE, RE and BE estimators
Proposition 5.4
- For the FE model we have:
- For the BE model we have:
- For the RE model we have:
- Where:
What Should We Learn from This?
- All rather complicated: don’t worry too much about the details
- But the intuition is simple
- Attenuation bias is largest for the FE estimator: Var(x*) does not appear in the denominator, because the FE estimator does not use this variation in the data
- Attenuation bias is larger for the RE than the BE estimator, as T > 1 > κ
- The averaging in the BE estimator reduces the importance of measurement error
- Important to note that these results depend on the particular assumption about the measurement error process and the nature of the variation in $x_{it}$: things would be very different if the measurement error for a given individual did not vary over time
- But the general point is that measurement-error considerations could affect the choice of model to estimate with panel data
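A small simulation can make the FE versus BE ranking concrete. It is a sketch under assumptions chosen purely for illustration: $\beta = 1$, $N = 2000$, $T = 5$, every variance component (individual level of $x^*$, $\eta_{it}$, $u_{it}$, $\theta_i$, $\varepsilon_{it}$) set to 1, and $\theta_i$ independent of x. RE is omitted because it needs estimated variance components. Under these settings the FE slope should be attenuated towards roughly 0.5 and the BE slope towards roughly 6/7 (about 0.86).

```python
import random
from collections import defaultdict


def group_means(ids, values):
    """Mean of `values` for each individual, keyed by id."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        total[i] += v
        count[i] += 1
    return {i: total[i] / count[i] for i in total}


def ols_slope(x, y):
    """OLS slope of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def within_slope(ids, x, y):
    """FE slope: OLS of de-meaned y on de-meaned x."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    x_dm = [xi - x_bar[i] for i, xi in zip(ids, x)]
    y_dm = [yi - y_bar[i] for i, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


def between_slope(ids, x, y):
    """BE slope: OLS of individual means of y on individual means of x."""
    x_bar, y_bar = group_means(ids, x), group_means(ids, y)
    return ols_slope([x_bar[i] for i in x_bar], [y_bar[i] for i in x_bar])


random.seed(0)
N, T, beta = 2000, 5, 1.0
ids, x_obs, y = [], [], []
for i in range(N):
    x_star_i = random.gauss(0, 1)  # individual-level mean of the true regressor
    theta_i = random.gauss(0, 1)   # individual effect, independent of x
    for t in range(T):
        x_star = x_star_i + random.gauss(0, 1)  # eta_it: variation around the individual level
        u = random.gauss(0, 1)                   # classical measurement error
        ids.append(i)
        x_obs.append(x_star + u)
        y.append(beta * x_star + theta_i + random.gauss(0, 1))

fe = within_slope(ids, x_obs, y)
be = between_slope(ids, x_obs, y)
print(f"FE: {fe:.3f}  BE: {be:.3f}  (true beta = {beta})")
```

Making the measurement error constant over time for each individual (drawing `u` once per individual) would reverse the picture: FE would then difference it away entirely, which is the slide's warning about the dependence on the error process.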
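Time Effects
- Have treated the time and individual dimensions asymmetrically: no good reason for this
- Errors are likely to be correlated for different individuals in the same time period; the most common way to deal with this is to include a set of time dummies:
$$y_{it} = \beta' x_{it} + \theta_i + \tau_t + \varepsilon_{it}$$
where $\tau_t$ is the effect of time period t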
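Estimating the Fixed Effects Model in Differences
- Can also get rid of the fixed effect by differencing:
$$\Delta y_{it} = \beta' \Delta x_{it} + \Delta\varepsilon_{it}$$
where $\Delta y_{it} = y_{it} - y_{i,t-1}$; $\theta_i$ drops out because it does not change over time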
Comparison of the Two Methods
- Estimate parameters by OLS on the differenced data
- If only 2 observations per individual, get the same estimates as the ‘de-meaning’ method
- But standard errors are different
- Why? Different assumptions about autocorrelation in the residuals
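A pure-Python check, on a made-up two-period panel, that first differencing and de-meaning give the same slope when T = 2. This sketch assumes the differenced regression has no constant; if the fixed-effects model also includes a time dummy, the matching differenced regression would include an intercept. Function names are illustrative.

```python
def fd_slope(x0, x1, y0, y1):
    """First-difference slope: OLS of (y1 - y0) on (x1 - x0) with no constant."""
    dx = [b - a for a, b in zip(x0, x1)]
    dy = [b - a for a, b in zip(y0, y1)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def fe_slope_two_periods(x0, x1, y0, y1):
    """Within slope with T = 2: de-mean each individual's pair of observations."""
    num = den = 0.0
    for xa, xb, ya, yb in zip(x0, x1, y0, y1):
        xm, ym = (xa + xb) / 2, (ya + yb) / 2
        for xv, yv in ((xa, ya), (xb, yb)):
            num += (xv - xm) * (yv - ym)
            den += (xv - xm) ** 2
    return num / den


# Hypothetical two-period panel for four individuals.
x0, x1 = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 3.0, 7.0]
y0, y1 = [5.0, 9.0, 10.0, 20.0], [9.0, 14.0, 12.0, 30.0]

fd = fd_slope(x0, x1, y0, y1)
fe = fe_slope_two_periods(x0, x1, y0, y1)
print(fd, fe)
```

With T = 2 each de-meaned observation is plus or minus half the first difference, so both numerator and denominator are simply halved; the point estimates coincide even though the implied error assumptions, and hence the standard errors, do not.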
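What Are These Assumptions?
- For the de-meaned model: $\varepsilon_{it}$ serially uncorrelated
- For the differenced model: $\Delta\varepsilon_{it}$ serially uncorrelated
- These are not consistent: if $\varepsilon_{it}$ is serially uncorrelated with constant variance, then $\Delta\varepsilon_{it}$ has first-order autocorrelation of $-1/2$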
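A quick simulation shows why the two assumptions cannot both hold. If $\varepsilon_{it}$ is serially uncorrelated with variance $\sigma^2$, then $\text{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) = -\sigma^2$ while $\text{Var}(\Delta\varepsilon_{it}) = 2\sigma^2$, so the differenced errors have autocorrelation $-1/2$. For simplicity this sketch uses one long simulated series rather than a panel; the helper name is illustrative.

```python
import random


def first_order_autocorr(z):
    """Sample correlation between consecutive terms z_t and z_{t-1}."""
    m = sum(z) / len(z)
    num = sum((a - m) * (b - m) for a, b in zip(z, z[1:]))
    den = sum((a - m) ** 2 for a in z)
    return num / den


random.seed(1)
n = 100_000
eps = [random.gauss(0, 1) for _ in range(n)]    # serially uncorrelated errors
d_eps = [b - a for a, b in zip(eps, eps[1:])]  # their first differences

r_levels = first_order_autocorr(eps)
r_diffs = first_order_autocorr(d_eps)
print(round(r_levels, 3), round(r_diffs, 3))
```

Running the same check on a random walk (cumulative sums of `eps`) would show the reverse: the differences are uncorrelated but the levels are strongly autocorrelated.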
This Leads to Time Series…
- Which is ‘better’ depends on which assumption is right: how can we decide this?
- We are not going to cover this in this course