Non-Experimental Data: Natural Experiments and more on IV.

1 Non-Experimental Data: Natural Experiments and more on IV


3 Non-Experimental Data
Refers to all data that has not been collected as part of an experiment.
Quality of analysis depends on how well one can deal with problems of:
– Omitted variables
– Reverse causality
– Measurement error
– Selection
Or… how close one can get to experimental conditions.

4 Natural/‘Quasi’ Experiments
Used to refer to a situation that is not experimental but is ‘as if’ it were.
Not a precise definition: saying your data is a ‘natural experiment’ makes it sound better.
Refers to case where variation in X is ‘good variation’ (directly, or indirectly via an instrument).
A famous example: London, 1854.

5 The Case of the Broad Street Pump
Regular cholera epidemics in 19th-century London.
Widely believed to be caused by ‘bad air’.
John Snow thought ‘bad water’ was the cause.
Experimental design would be to randomly give some people good water and some bad water.
Ethical problems with this.

6 Soho Outbreak, August/September 1854
People closest to Broad Street pump most likely to die.
But they breathe the same air, so this does not resolve the air vs. water hypothesis.
Nearby workhouse had own well and few deaths.
Nearby brewery had own well and no deaths (workers all drank beer).

7 Why is this a Natural Experiment?
Variation in water supply ‘as if’ it had been randomly assigned, with other factors (‘air’) held constant.
Can then estimate treatment effect using difference in means.
Or run regression of death on water source, distance to pump, other factors.
Strongly suggests water is the cause.
Woman died in Hampstead, niece in Islington.
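A minimal sketch of the two estimators mentioned on this slide, using made-up data: the sample size, death risks and variable names below are hypothetical and are not Snow's figures. It illustrates that with a single binary treatment, the difference in means and the OLS slope on the treatment dummy coincide.

```python
import numpy as np

# Hypothetical illustration only: simulated data, not Snow's records.
rng = np.random.default_rng(0)
n = 1000
pump_water = rng.integers(0, 2, size=n)               # 1 = drank pump water
death_prob = np.where(pump_water == 1, 0.30, 0.05)    # assumed death risks
died = (rng.random(n) < death_prob).astype(float)

# Treatment effect as a simple difference in means.
diff_in_means = died[pump_water == 1].mean() - died[pump_water == 0].mean()

# The same effect as the slope in an OLS regression of death on a constant
# and the treatment dummy.
X = np.column_stack([np.ones(n), pump_water])
beta_hat, *_ = np.linalg.lstsq(X, died, rcond=None)

print(diff_in_means, beta_hat[1])
```

Adding further columns to `X` (for example a distance-to-pump variable) would give the regression version with other factors included.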

8 What’s that got to do with it?
Aunt liked taste of water from Broad Street pump.
Had it delivered every day.
Niece had visited her.
Investigation of well found contamination by sewer.
This is non-experimental data but analysed in a way that makes a very powerful case, and with no theory either.

9 Methods for Analysing Data from Natural Experiments
If data is ‘as if’ it were experimental then can use all techniques described for experimental data:
– OLS (perhaps Snow case)
– IV to get appropriate units of measurement
Will say more about IV than OLS:
– IV perhaps more common
– If can use OLS, not much more to say
– With IV there is more to say: weak instruments

10 Conditions for Instrument Validity
To be a valid instrument:
– Must be correlated with X: testable
– Must be uncorrelated with ‘error’: untestable, so have to argue case for this assumption
These conditions are guaranteed with an instrument for experimental data.
But more problematic for data from quasi-experiments.

11 Bombs, Bones and Breakpoints: The Geography of Economic Activity
Davis and Weinstein, AER, 2002.
Existence of agglomerations (e.g. cities) a puzzle.
Land and labour costs higher, so why don’t firms relocate to increase profits?
Must be some compensatory productivity effect.
Different hypotheses about this:
– Locational fundamentals
– Increasing returns (Krugman) – path-dependence

12 Testing these Hypotheses
Consider a temporary shock to city population.
Locational fundamentals theory would predict no permanent effect.
Increasing returns would suggest permanent effect.
Would like to do experiment of randomly assigning shocks to city size.
This is not going to happen.

13 The Davis-Weinstein Idea
Use US bombing of Japanese cities in WW2.
This is a ‘natural experiment’, not a true experiment, because:
– WW2 not caused by desire to test theories of economic geography
– Pattern of US bombing not random
Sample is 303 Japanese cities; data is:
– Population before and after bombing
– Measures of destruction

14 Basic Equation
$\Delta s_{i,47-40}$ is change in population between just before and just after the war.
$\Delta s_{i,60-47}$ is change in population over the later period.
How to test hypotheses:
– Locational fundamentals predicts $\beta_1 = -1$
– Increasing returns predicts $\beta_1 = 0$
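The equation itself appears to have been an image and is missing from the transcript. From the definitions on this slide it is presumably of the following form; the intercept $\beta_0$ and the error term $\varepsilon_i$ are assumptions, not taken from the source.

```latex
\Delta s_{i,60-47} = \beta_0 + \beta_1 \, \Delta s_{i,47-40} + \varepsilon_i
```

Under this reading, $\beta_1 = -1$ means the wartime shock is fully reversed by 1960, while $\beta_1 = 0$ means later growth is unrelated to the shock, so the shock is permanent.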

15 The IV Approach
$\Delta s_{i,47-40}$ might be influenced by both permanent and temporary factors.
Only want the part that is transitory shock caused by war damage.
Instrument $\Delta s_{i,47-40}$ by measures of death and destruction.
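A minimal sketch of the IV idea as two-stage least squares, on simulated data. All numbers, coefficients and variable names are hypothetical (this is not the Davis-Weinstein data); the point is only that OLS is pulled away from the true coefficient by an unobserved factor while IV recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta_true = -1.0

z = rng.normal(size=n)                    # instrument (e.g. a destruction measure)
u = rng.normal(size=n)                    # unobserved factor affecting both x and y
x = 0.8 * z + u + rng.normal(size=n)      # endogenous regressor
y = beta_true * x + 2.0 * u + rng.normal(size=n)

def add_const(v):
    """Prepend a column of ones to a 1-D regressor."""
    return np.column_stack([np.ones(len(v)), v])

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: regress x on the instrument and keep the fitted values.
x_fitted = add_const(z) @ ols(add_const(z), x)

# Second stage: regress y on the fitted values from the first stage.
beta_2sls = ols(add_const(x_fitted), y)[1]

# Naive OLS for comparison; inconsistent here because x is correlated with u.
beta_ols = ols(add_const(x), y)[1]

print(beta_ols, beta_2sls)
```

Note that standard errors from a hand-rolled second stage are not valid; packaged IV routines correct for this.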

16 The First-Stage: Correlation of Δs i,47-40 with Z

17 Why Do We Need First-Stage?
Establishes instrument relevance: correlation of X and Z.
Gives an idea of how strong this correlation is (the ‘weak instrument’ problem).
In this case the reported first-stage is not obviously the one implicit in what follows.
– That would be bad practice

18 The IV Estimates

19 Why Are these other variables included?
Potential criticisms of instrument exogeneity:
– Government post-war reconstruction expenses correlated with destruction and had an effect on population growth
– US bombing heavier on cities of strategic importance (perhaps they had higher growth rates)
Inclusion of the extra variables designed to head off these criticisms.
Assumption is that of exogeneity conditional on the inclusion of these variables.
Conclusion favours locational fundamentals view.

20 An additional piece of supporting evidence…
Always trying to build a strong evidence base: many potential ways to do this, not just estimating equations.

21 The Problem of Weak Instruments
Say that instruments are ‘weak’ if correlation between X and Z is low (after inclusion of other exogenous variables).
Rule of thumb: if F-statistic on instruments in first-stage is less than 10 then there may be a problem (will explain this a bit later).
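A rough sketch of how the first-stage F-statistic in this rule of thumb could be computed by hand, as a comparison of restricted and unrestricted residual sums of squares. The helper name `first_stage_F`, the simulated data and the coefficients 0.5 and 0.02 are all hypothetical choices for illustration.

```python
import numpy as np

def first_stage_F(x, Z, W):
    """F-statistic for H0: all coefficients on Z are zero in a regression
    of x on [W, Z], where W holds the other exogenous variables
    (including the constant) and Z holds the instruments."""
    def rss(M):
        coef = np.linalg.lstsq(M, x, rcond=None)[0]
        resid = x - M @ coef
        return resid @ resid

    full = np.column_stack([W, Z])
    n, k = full.shape
    q = Z.shape[1]                       # number of instruments tested
    rss_restricted, rss_unrestricted = rss(W), rss(full)
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)
v = rng.normal(size=n)
W = np.ones((n, 1))                      # constant only
Z = z.reshape(-1, 1)                     # a single instrument

F_strong = first_stage_F(0.5 * z + v, Z, W)    # instrument moves x a lot
F_weak = first_stage_F(0.02 * z + v, Z, W)     # instrument barely moves x

print(F_strong, F_weak)
```

With one instrument, this F-statistic equals the square of the t-statistic on the instrument in the first-stage regression.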

22 Why Do Weak Instruments Matter?
A whole range of problems tends to arise if instruments are weak.
Asymptotic problems:
– High asymptotic variance
– Small departures from instrument exogeneity lead to big inconsistencies
Finite-sample problems:
– Small-sample distribution may be very different from asymptotic one
– May be large bias
– Computed variance may be wrong
– Distribution may be very different from normal

23 Asymptotic Problems I: Low Precision
Asymptotic variance of IV estimator is larger the weaker the instruments.
Intuition: variance of any estimator tends to be lower the bigger the variation in X; think of $\sigma^2 (X'X)^{-1}$.
IV only uses variation in X that is associated with Z.
As instruments get weaker, using less and less variation in X.

24 Asymptotic Problems II: Small Departures from Instrument Exogeneity Lead to Big Inconsistencies
Suppose true causal model is $y = X\beta + Z\gamma + \varepsilon$.
So possibly direct effect of Z on y.
Instrument exogeneity is $\gamma = 0$.
Obviously want this to be zero but might hope that no big problem if ‘close to zero’, i.e. a small deviation from exogeneity.

25 But this will not be the case if instruments weak…
Consider just-identified case.
If instruments weak then $\Sigma_{ZX}$ small, so $\Sigma_{ZX}^{-1}$ large, so $\gamma$ multiplied by a large number.
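The expression this slide refers to did not survive in the transcript. A sketch of the standard derivation for the just-identified case, under the model $y = X\beta + Z\gamma + \varepsilon$ with $Z$ uncorrelated with $\varepsilon$, is given here. The symbol $\Sigma_{ZZ}$ (probability limit of $Z'Z/n$) is introduced as an assumption, by analogy with $\Sigma_{ZX}$ (probability limit of $Z'X/n$).

```latex
\hat\beta_{IV} = (Z'X)^{-1} Z'y
             = \beta + (Z'X)^{-1} Z'Z \,\gamma + (Z'X)^{-1} Z'\varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat\beta_{IV} = \beta + \Sigma_{ZX}^{-1}\,\Sigma_{ZZ}\,\gamma
```

The inconsistency is zero when $\gamma = 0$, and for a given $\gamma \neq 0$ it grows as $\Sigma_{ZX}$ shrinks.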

26 An Example: The Return to Education
Economists long interested in whether investment in human capital is a ‘good’ investment.
Some theory shows that coefficient on s in the regression $y = \beta_0 + \beta_1 s + \beta_2 x + \varepsilon$ is a measure of the rate of return to education.
OLS estimates around 8%: suggests very good investment.
Might be liquidity constraints.
Might be bias.

27 Potential Sources of Bias
Most commonly mentioned is ‘ability bias’.
Ability correlated with earnings independent of education.
Ability correlated with education.
If ability omitted from ‘x’ variables then usual formula for omitted variables bias suggests upward bias in OLS estimate.
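The ‘usual formula’ is not written out on the slide. A sketch for the simplest case, with no other covariates, where $a$ denotes ability and $\delta$ its effect on earnings (both symbols are assumptions introduced here):

```latex
y = \beta_0 + \beta_1 s + \delta a + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat\beta_1^{OLS} = \beta_1 + \delta\,\frac{\operatorname{Cov}(s,a)}{\operatorname{Var}(s)}
```

With $\delta > 0$ (ability raises earnings) and $\operatorname{Cov}(s,a) > 0$ (ability and education positively correlated), the second term is positive, which is the upward bias.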

28 Potential Solution
Find an instrument correlated with education but uncorrelated with ‘ability’ (or other excluded variables).
Angrist-Krueger, “Does Compulsory School Attendance Affect Schooling and Earnings?”, QJE 1991, suggest using quarter of birth.
Argue correlated with education because of school start age policies and school leaving laws (instrument relevance).
Don’t have to accept this: can test it.

29 A graphical version of first-stage (correlation between education and Z)

30 In this case…
Their instrument is binary so IV estimator can be written in Wald form.
And this leads to following expression for potential inconsistency:
Note denominator is difference in schooling for those born in first and other quarters.
Instrument will be ‘weak’ if this difference is small.
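The Wald form and the inconsistency expression are missing from the transcript. A reconstruction under stated assumptions: $Z = 1$ for those born in the first quarter, $\gamma$ is the direct effect of $Z$ on earnings as in the model $y = X\beta + Z\gamma + \varepsilon$, and other covariates are ignored.

```latex
\hat\beta_{Wald} = \frac{\bar y_{Z=1} - \bar y_{Z=0}}{\bar s_{Z=1} - \bar s_{Z=0}},
\qquad
\operatorname{plim}\,\hat\beta_{Wald} = \beta_1 + \frac{\gamma}{E[s \mid Z=1] - E[s \mid Z=0]}
```

The second expression follows because $E[y \mid Z=1] - E[y \mid Z=0] = \beta_1 \,(E[s \mid Z=1] - E[s \mid Z=0]) + \gamma$.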

31 Their Results

32 Interpretation (and Potential Criticism)
IV estimates not much below OLS estimates (higher in one case).
Suggests ‘ability bias’ no big deal.
But instrument is weak.
Being born in 1st quarter reduces education by 0.1 years.
Means ‘γ’ will be multiplied by 10 (= 1/0.1).

33 But why should we have γ≠0?
Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education.
Bound, Jaeger and Baker argued that there is evidence that quarter of birth is correlated with:
– Mental and physical health
– Socioeconomic status of parents
Unlikely that any effects are large, but they don’t have to be when instruments are weak.

34 An example: UK data Effect is small but significantly different from zero

35 A Back-of-the-Envelope Calculation
Being born in first quarter means 0.01 less likely to have a managerial/professional parent.
Being a manager/professional raises log earnings by 0.64.
Correlation between earnings of children and parents 0.4.
Effect on earnings through this route 0.01*0.64*0.4=0.00256, i.e. ¼ of 1 per cent.
Small, but weak instrument causes effect on inconsistency of IV estimate to be multiplied by 10, giving 0.0256.
Now large relative to OLS estimate of 0.08.
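The arithmetic on this slide, written out step by step. The inputs are the figures quoted on the slides (including the 0.1-year first-stage difference from slide 32); the variable names are illustrative only.

```python
# Inputs quoted on the slides.
parent_gap = 0.01          # lower chance of a managerial/professional parent
earnings_premium = 0.64    # log-earnings premium for managers/professionals
intergen_corr = 0.4        # parent-child earnings correlation
first_stage_gap = 0.1      # years of schooling difference (slide 32)
ols_estimate = 0.08        # OLS return to education

# Direct effect of quarter of birth on earnings through parental background.
direct_effect = parent_gap * earnings_premium * intergen_corr   # about 0.00256

# Dividing by the small first-stage gap multiplies the effect by 10.
inconsistency = direct_effect / first_stage_gap                 # about 0.0256

# Size of the inconsistency relative to the OLS estimate.
relative_size = inconsistency / ols_estimate                    # about 0.32

print(direct_effect, inconsistency, relative_size)
```

So a direct effect of a quarter of one per cent translates into an inconsistency of roughly a third of the OLS estimate.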

36 Summary
Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak.
Suspect this is often of great practical importance.
Quite common to use ‘odd’ instrument: argue that ‘no reason to believe’ it is correlated with ε but show correlation with X.

37 Finite Sample Problems
This is a very complicated topic.
Exact results for special cases, approximations for more general cases.
Hard to say anything that is definitely true but can give useful guidance.
Problems in 3 areas:
– Bias
– Incorrect measurement of variance
– Non-normal distribution
But really all different symptoms of same thing.

38 Review and Reminder
If ask STATA to estimate equation by IV:
– Coefficients computed using formula given
– Standard errors computed using formula for asymptotic variance
– T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed
All are asymptotic results.

39 Difference between asymptotic and finite-sample distributions
A difference between the two is the normal case.
Only in special cases, e.g. linear regression model with normally distributed errors, are small-sample and asymptotic distributions the same.
Difference likely to be bigger:
– The smaller the sample size
– The weaker the instruments
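A small Monte Carlo sketch of the finite-sample point, assuming a deliberately simple design: one instrument, no constant, and errors correlated with the first-stage disturbance. The function name `simulate_iv`, the sample size, the number of replications and the first-stage coefficients are all hypothetical choices.

```python
import numpy as np

def simulate_iv(pi, n=200, reps=2000, beta=1.0, seed=0):
    """Return `reps` just-identified IV estimates of beta when the
    first-stage coefficient on the instrument is `pi`."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)                       # instrument
        v = rng.normal(size=n)                       # first-stage disturbance
        eps = 0.8 * v + 0.6 * rng.normal(size=n)     # error correlated with v
        x = pi * z + v                               # endogenous regressor
        y = beta * x + eps
        estimates[r] = (z @ y) / (z @ x)             # IV estimator, no constant
    return estimates

def iqr(a):
    """Interquartile range, a spread measure that is robust to outliers."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

strong = simulate_iv(pi=1.0)     # strong first stage
weak = simulate_iv(pi=0.05)      # weak first stage

print("strong: median", np.median(strong), "IQR", iqr(strong))
print("weak:   median", np.median(weak), "IQR", iqr(weak))
```

Medians and interquartile ranges are used rather than means and variances because the just-identified IV estimator can have very heavy tails when the instrument is weak.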

40 Rule of Thumb for Weak Instruments
F-test for instruments in first-stage >10.
Stricter than significance: e.g. with one instrument F = t², so F=10 is equivalent to t≈3.2.

41 Conclusion
Natural experiments useful source of knowledge.
Often requires use of IV.
Instrument exogeneity and relevance need justification.
Weak instruments potentially serious.
Good practice to present first-stage regression.
Finding more robust alternative to IV an active research area.
