Introduction Who I am Trade-offs Textbook Grading Homework Implementation Session 1 The two econometric problems Randomization as the Golden Benchmark Outline of the Course
Who I am Applied empirical economist. Work on urban economics, economics of education, applied econometrics in accounting. Emphasis on the identification of causal effects. Careful empirical work: clean data work, correct identification of causal effects. Large datasets: – 100+ million observations, administrative datasets, geographic information software. Implementation of econometric procedures in Stata/Mata.
Trade-offs Classroom is heterogeneous. – In tastes, mathematics level, needs, prior knowledge. Different fields have different habits. – E.g. “endogeneity” is not an issue/the same issue in OB, Finance, Strategy, or TOM. Conclusion: – Course provides a particular spin on econometrics, with mathematics when needed, applications. This is a difficult course, even for students with a prior course in econometrics.
Textbooks *William H. Greene, Econometric Analysis, 6th edition. Jeffrey Wooldridge, Econometric Analysis of Cross Section and Panel Data. Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics. Applied Econometrics using Stata, Cameron et al.
Prerequisites I assume you know: – Statistics Random variables. Moments of random variables (mean, variance, kurtosis, skewness). Probabilities. – Real analysis Integral of functions, derivatives. Convergence of a function at x or at infinity. – Matrix algebra Inverse, multiplication, projections.
Grading Exam: 60% Participation: 10% Homework: 30% – One problem set in-between Econometrics A and B.
Implementation Stata version 12. – License for PhD students. Ask IT. 5555 or Alina Jacquet. – Interactive mode, Do files, Mata programming. – Compulsory for this course. MATLAB, not for everybody. – Coding econometric procedures yourself, e.g. GMM.
Outline for Session 1 Introduction 1.Correlation and Causation 2.The Two Econometric Problems 3.Treatment Effects
1. CORRELATION AND CAUSATION Session 1 - Introduction
1. The perils of confounding correlation and causation How can we boost children’s reading scores? – Shoe size is correlated with IQ. Women earn less than men. – Sign of discrimination? Health is negatively correlated with the number of days spent in hospital. – Do hospitals kill patients?
Potential outcomes framework A.k.a. the “Rubin causality model”. Outcome with the treatment Y(1), outcome without the treatment Y(0). Treatment status D=0,1. FUNDAMENTAL PROBLEM OF ECONOMETRICS: For any given subject, either Y(1) or Y(0) is observed, never both; equivalently, only Y = Y(1) D + Y(0) (1 - D) is observed. What would have happened if a given subject had received a different treatment?
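A minimal sketch of the observation rule Y = Y(1) D + Y(0) (1 - D). The `Subject` class and the numbers are hypothetical, chosen only to show that one potential outcome per subject is always hidden:

```python
from dataclasses import dataclass


@dataclass
class Subject:
    y1: float  # potential outcome with the treatment, Y(1)
    y0: float  # potential outcome without the treatment, Y(0)
    d: int     # treatment status, D = 0 or 1

    def observed(self) -> float:
        """Observed outcome Y = Y(1) D + Y(0) (1 - D)."""
        return self.y1 * self.d + self.y0 * (1 - self.d)

    def counterfactual(self) -> float:
        """The potential outcome that is never observed for this subject."""
        return self.y1 * (1 - self.d) + self.y0 * self.d


# Hypothetical subjects: in real data only observed() would be available.
treated = Subject(y1=12.0, y0=10.0, d=1)
control = Subject(y1=9.0, y0=8.0, d=0)
print(treated.observed(), control.observed())
```

In a simulation both potential outcomes are known, which is what makes it possible to check an estimator against the truth; in real data the counterfactual has to be recovered through assumptions.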
Naïve estimator of the treatment effect Δ = E(Y|D=1) – E(Y|D=0). Does that identify any relevant parameter? Notice that: – Δ = E(Y|D=1) – E(Y|D=0) = E(Y(1)|D=1) – E(Y(0)|D=0). What are we looking for?
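Ignorable Treatment (Rubin 1983) Assume (Y(1), Y(0)) ⊥ D, i.e. the potential outcomes are independent of treatment status. Then E(Y(0)|D=1) = E(Y(0)|D=0) = E(Y(0)). Similarly for Y(1). Then the naïve estimator satisfies: E(Y|D=1) – E(Y|D=0) = E(Y(1)) – E(Y(0)) = E(Y(1) – Y(0)).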
Another Interpretation Assume Y(D) = a + bD + ε. ε is the “unobservables”. The naïve estimator Δ = E(Y|D=1) – E(Y|D=0) = b + E(ε|D=1) – E(ε|D=0). Selection bias: S = E(ε|D=1) – E(ε|D=0). – Overestimates the effect if S>0. – Underestimates the effect if S<0.
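The decomposition "naïve estimator = b + S" can be checked on simulated data. The sketch below is illustrative only: the values of `a` and `b`, the normal unobservables and the self-selection rule (subjects with high unobservables are more likely to take the treatment) are all assumptions, not part of the course material.

```python
import random
from statistics import fmean

random.seed(0)
a, b = 1.0, 2.0          # assumed intercept and true treatment effect
n = 10_000

# Unobservables, and self-selection: a high eps makes treatment more likely.
eps = [random.gauss(0, 1) for _ in range(n)]
d = [1 if e + random.gauss(0, 1) > 0 else 0 for e in eps]
y = [a + b * di + ei for di, ei in zip(d, eps)]

# Naive estimator: difference in mean outcomes between treated and untreated.
naive = (fmean([yi for yi, di in zip(y, d) if di == 1])
         - fmean([yi for yi, di in zip(y, d) if di == 0]))

# Selection bias: difference in mean unobservables between the two groups.
s = (fmean([ei for ei, di in zip(eps, d) if di == 1])
     - fmean([ei for ei, di in zip(eps, d) if di == 0]))

print(f"true b = {b}, naive = {naive:.3f}, selection bias S = {s:.3f}")
```

Here S > 0 by construction, so the naïve comparison overestimates b. The selection bias is computable only because the simulation exposes ε; with real data it is unobserved.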
Definitions Treatment Effect. Y(1)-Y(0) Average Treatment Effect (ATE). E(Y(1)-Y(0)) Average Treatment Effect on the Treated (ATT). E(Y(1)-Y(0)|D=1) Average Treatment Effect on the Untreated (ATU). E(Y(1)-Y(0)|D=0)
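Two identities tie these definitions to the naïve comparison of treated and untreated subjects. They use only the observation rule Y = Y(1) D + Y(0) (1 - D) and the law of iterated expectations. ATE, ATT and ATU abbreviate the average treatment effect in the population, on the treated and on the untreated:

```latex
\begin{align*}
E(Y \mid D=1) - E(Y \mid D=0)
  &= E(Y(1) \mid D=1) - E(Y(0) \mid D=0) \\
  &= \underbrace{E(Y(1) - Y(0) \mid D=1)}_{\text{ATT}}
   + \underbrace{E(Y(0) \mid D=1) - E(Y(0) \mid D=0)}_{\text{selection bias}}, \\[4pt]
\text{ATE} &= P(D=1)\,\text{ATT} + P(D=0)\,\text{ATU}.
\end{align*}
```

The second line adds and subtracts E(Y(0)|D=1). The naïve comparison therefore recovers the ATT only when treated and untreated subjects would have had the same average outcome without the treatment.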
Randomization as the Golden Benchmark Effect of a medical treatment. – Treatment and control group. – Randomization of the assignment to the treatment and to the control. Why randomize? … effect of jumping without a parachute on the probability of death.
With ignorability… If the treatment is ignorable (e.g. if the treatment has been randomly assigned to subjects) then – ATE = ATT = ATU
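A simulation sketch of this equality under random assignment. Everything numerical here is an assumption made for illustration: normal potential outcomes, heterogeneous individual effects `tau`, and a fair coin flip for treatment. The three averages agree up to sampling noise, and the naïve difference in means lands close to them.

```python
import random
from statistics import fmean

random.seed(1)
n = 20_000

y0 = [random.gauss(10, 1) for _ in range(n)]    # outcome without treatment
tau = [random.gauss(2, 1) for _ in range(n)]    # individual effect Y(1) - Y(0)
y1 = [u + t for u, t in zip(y0, tau)]           # outcome with treatment
d = [random.randint(0, 1) for _ in range(n)]    # coin flip, independent of (y0, y1)

ate = fmean(tau)
att = fmean([t for t, di in zip(tau, d) if di == 1])
atu = fmean([t for t, di in zip(tau, d) if di == 0])

# Observed outcome and the naive difference in means.
y = [a * di + u * (1 - di) for a, u, di in zip(y1, y0, d)]
naive = (fmean([yi for yi, di in zip(y, d) if di == 1])
         - fmean([yi for yi, di in zip(y, d) if di == 0]))

print(f"ATE={ate:.3f} ATT={att:.3f} ATU={atu:.3f} naive={naive:.3f}")
```

Replacing the coin flip with a rule that depends on `tau` or `y0` breaks the equality, which is the self-selection case.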
Selection bias Why is there a selection bias? – In medicine, in economics, in management? 1.Self-selection of subjects into the treatment. 2.Correlation between unobservables and observables, e.g. industry, gender, income.
2. THE TWO ECONOMETRIC PROBLEMS Session 1 - Introduction
2. The Two Econometric Problems Identification and Inference – “Studies of identification seek to characterize the conclusions that could be drawn if one could use the sampling process to obtain an unlimited number of observations.” – “Studies of statistical inference seek to characterize the generally weaker conclusions that can be drawn from a finite number of observations.”
Identification vs inference Consider a survey of a random subset of 1,302 French individuals. Identification: – Can you identify the average income in France? Inference: – How close to the true average income is the mean income in the sample? – I.e. what is the confidence interval around the estimate of the average income in France?
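The inference question can be sketched with a normal-approximation 95% confidence interval for a sample mean. The incomes below are simulated from a log-normal distribution with made-up parameters, not French survey data; only the sample size of 1,302 comes from the example.

```python
import math
import random
from statistics import fmean, stdev

random.seed(2)
n = 1302

# Hypothetical right-skewed "incomes"; the parameters are illustrative only.
sample = [random.lognormvariate(10.0, 0.6) for _ in range(n)]

m = fmean(sample)                      # point estimate of the average income
se = stdev(sample) / math.sqrt(n)      # standard error of the sample mean
z = 1.96                               # normal critical value for 95% coverage
ci = (m - z * se, m + z * se)

print(f"mean = {m:.0f}, 95% CI = [{ci[0]:.0f}, {ci[1]:.0f}]")
```

Random sampling settles identification: the sample mean targets the population mean. The width of the interval, which shrinks at rate 1/sqrt(n), is the inference part.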
Identification vs inference Consider a lab experiment with 9 rats, randomly assigned to a treatment group and a control group. Identification: – Can you identify the effect of the medication on the rats using the random assignment? Inference: – With 9 rats, can you say anything about the effectiveness of the medication?
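One way to see what 9 rats can and cannot tell us is an exact randomization (permutation) test, a technique brought in here for illustration rather than taken from the slides. The outcomes and the 4-treated/5-control split are hypothetical. Under the null of no effect for any rat, every assignment of 4 rats to treatment is equally likely, so all of them can be enumerated.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical outcomes for 9 rats; the first four were assigned to treatment.
outcomes = [7.1, 6.8, 7.5, 8.0, 5.9, 6.2, 6.0, 6.4, 5.5]
treated_idx = (0, 1, 2, 3)


def diff_in_means(treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [outcomes[i] for i in treated]
    c = [outcomes[i] for i in range(len(outcomes)) if i not in treated]
    return fmean(t) - fmean(c)


observed = diff_in_means(treated_idx)

# All ways of choosing which 4 of the 9 rats are treated.
assignments = list(combinations(range(len(outcomes)), len(treated_idx)))

# One-sided p-value: share of assignments with a difference at least as large.
n_extreme = sum(diff_in_means(a) >= observed for a in assignments)
p_value = n_extreme / len(assignments)

print(f"{len(assignments)} assignments, p-value = {p_value:.4f}")
```

In these made-up data every treated rat outperforms every control rat, so the observed assignment is the single most extreme one. With only 126 possible assignments, the one-sided p-value cannot fall below 1/126: random assignment identifies the effect, but the small sample caps how strong the statistical evidence can be.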
This session This session has focused on identification. – i.e. I assume we have a potentially infinite dataset. – I focus on the conditions for the identification of the causal effect of a variable. Next session: what problems appear because we have a limited number of observations?
LOOKING FORWARD: OUTLINE OF THE COURSE Session 1 - Introduction
Outline of the course 1.Introduction: Identification 2.Introduction: Inference 3.Linear Regression 4.Identification Issues in Linear Regressions 5.Inference Issues in Linear Regressions
6.Identification in Simultaneous Equation Models 7.Instrumental variable (IV) estimation 8.Finding IVs: Identification strategies 9.Panel data analysis
10.Bootstrap 11.Generalized Method of Moments (GMM) 12.GMM: Dynamic Panel Data estimation 13.Maximum Likelihood (ML): Introduction 14.ML: Probit and Logit