Panel Data Analysis: Introduction

And now for… Panel Data!
- Panel data has both a time-series and a cross-section component: we observe the same (e.g.) people over time
- You've already used it! Difference-in-differences is a panel (or pooled cross-section) data technique
- Panel data can be used to address some kinds of omitted variable bias: e.g., use "yourself in a later period" as the comparison group for yourself today
- If the omitted variable is fixed over time, this "fixed effects" approach removes the bias

Unobserved Fixed Effects
Initially consider having two periods of data (t = 1, 2), and suppose the population model is:
  y_it = β0 + δ0·d2_t + β1·x_it1 + … + βk·x_itk + a_i + u_it
- Subscripts: i indexes the person, t the period; the third subscript on x is the variable number
- d2_t: dummy for t = 2 (an intercept shift)
- a_i: the "person effect"; it has no t subscript, so it is the time-constant component of the composite error
- u_it: the "idiosyncratic error"

Unobserved Fixed Effects
The population model is
  y_it = β0 + δ0·d2_t + β1·x_it1 + … + βk·x_itk + a_i + u_it
- If a_i is correlated with the x's, OLS will be biased, since a_i is part of the composite error term
- Aside: the composite error v_it = a_i + u_it also suffers from autocorrelation:
  Cov(v_i1, v_i2) = Cov(a_i, a_i) + 2·Cov(u_it, a_i) + Cov(u_i2, u_i1) = Var(a_i)
  so OLS standard errors are biased (downward); more on this later
- But supposing the u_it are not correlated with the x's (only the fixed part of the error is), we can "difference out" the unobserved fixed effect…
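Dividing by the variance of the composite error gives the implied serial correlation, a standard result worth stating explicitly (this step is not on the slide; it assumes u_it has constant variance and is uncorrelated with a_i and across periods):

  Corr(v_i1, v_i2) = σa² / (σa² + σu²) > 0

Any person-level variance σa² > 0 therefore induces positive serial correlation, which is exactly why the pooled OLS standard errors come out too small.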

First differences
Period 2: y_i2 = β0 + δ0·1 + β1·x_i21 + … + βk·x_i2k + a_i + u_i2
Period 1: y_i1 = β0 + δ0·0 + β1·x_i11 + … + βk·x_i1k + a_i + u_i1
Difference: Δy_i = δ0 + β1·Δx_i1 + … + βk·Δx_ik + Δu_i
- Δy_i, Δx_i1, …, Δx_ik are the "differenced data": the changes in y, x1, x2, …, xk from period 1 to period 2
- Need to be careful about the organization of the data to be sure you compute the correct changes (see the sketch below)
- The differenced model has no correlation between the x's and the new error term (*just by assumption*), so no bias
- (Also, the autocorrelation is taken out)
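A minimal Stata sketch of that bookkeeping (the names id, t, y, and x1 are hypothetical, not from the slides):

. * two-period panel: one row per (id, t), with t = 1 or 2
. sort id t
. by id: gen dy = y - y[_n-1]      // missing in period 1
. by id: gen dx1 = x1 - x1[_n-1]
. reg dy dx1, robust               // the constant estimates the intercept shift δ0

The sort before the by: prefix is what guards against computing changes across the wrong rows.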

Differencing with Multiple Periods
- Can extend this method to more periods: simply difference all adjacent periods
- So with 3 periods, subtract period 1 from period 2 and period 2 from period 3, giving 2 observations per individual; and so on
- Also: include dummies for each period, so-called "period dummies" or "period effects" (a sketch follows below)
- Assuming the Δu_it are uncorrelated over time (and with the Δx's), we can estimate by OLS
- Otherwise, autocorrelation (and omitted variable bias) remain
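In Stata this might look as follows (a hedged sketch; id, year, y, and x are again hypothetical names):

. * panel with T >= 3 periods per unit
. xtset id year
. reg D.y D.x i.year, robust    // D. takes adjacent-period changes; i.year adds the period dummies

The first period drops out automatically because D.y is missing there, and Stata omits one period dummy as the base category.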

Two-period example from the textbook
- Does a higher unemployment rate raise crime?
- Data from 46 U.S. cities (the cross-sectional unit, "i") in 1982 and 1987 (the two years, "t")
- Regress crmrte (crimes per 1,000 population) on unem (the unemployment rate) and a dummy for 1987
- First, let's see the data…
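As a sketch of the setup (the identifiers city and year are assumed names; only crmrte, unem, and d87 appear on the slides):

. * assumed layout: one row per city-year, with variables city, year (1982 or 1987), crmrte, unem
. gen d87 = (year == 1987)
. reg crmrte unem d87, robust    // the pooled regression shown on the next slide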

[Data preview: columns crmrte, unem, d87; the table values were not preserved in the transcript]

Pooled cross-section regression

. reg crmrte unem d87, robust
Linear regression    Number of obs = 92    F(2, 89) = 0.63
[remaining statistics and the coefficient table for unem, d87, and _cons not preserved in the transcript]

- 92 observations
- Nothing significant, and the magnitudes of the coefficients are small

First-difference regression (c = "change" = Δ)

. reg ccrmrte cunem, robust
Linear regression    Number of obs = 46    F(1, 44) = 7.40
[remaining statistics and the coefficient table for cunem and _cons not preserved in the transcript]

- Now only 46 observations (why? Differencing uses up the first year, leaving one change per city)
- Both the intercept shift (now the constant) and the unemployment rate are significant
- Also: the magnitudes are larger

[Data preview: columns crmrte, ccrmrte, unem, cunem, d87; the table values were not preserved in the transcript]
Data convention: the change is stored in the later period's observation (so it is missing in the first period).
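A hedged Stata sketch of how these change variables could be constructed under that convention (city and year are assumed names):

. sort city year
. by city: gen ccrmrte = crmrte - crmrte[_n-1]    // missing in 1982; the 1982-to-1987 change sits in the 1987 row
. by city: gen cunem = unem - unem[_n-1]
. reg ccrmrte cunem, robust                       // 46 observations: one change per city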

Why did the coefficient estimates get larger and more significant?
- Perhaps the cross-section regression suffered from omitted variables bias [Cov(x_it, a_i) ≠ 0]
- Third factors, fixed across the two periods, which raise the unemployment rate and lower the crime rate
  - (??) More generous unemployment benefits? …
- To be clear: taking differences can make omitted variables bias worse in some cases
- To oversimplify, it depends on which is larger: Cov(Δx_it, Δu_it) or Cov(x_it, a_i) (see the sketch below)
- A possible example: crime and police
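To make that comparison concrete, in the one-regressor case the two probability limits are (a standard sketch, not from the slides; bPOLS is the pooled OLS slope and bFD the first-difference slope):

  plim bPOLS = β1 + Cov(x_it, a_i + u_it) / Var(x_it)
  plim bFD = β1 + Cov(Δx_it, Δu_it) / Var(Δx_it)

Differencing removes Cov(x_it, a_i) from the numerator, but it also shrinks Var(Δx_it) when x changes little within cities, so even a small Cov(Δx_it, Δu_it) can translate into a large bias.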

More police cause more crime?! (lpolpc = log police per capita)

. reg crmrte lpolpc d87, robust
Linear regression    Number of obs = 92    F(2, 89) = 9.72
[remaining statistics and the coefficient table for lpolpc, d87, and _cons not preserved in the transcript]

- A 100% increase in police officers per capita is associated with 41 more crimes per 1,000 population
- Seems unlikely to be causal! (What's going on?!)

In first differences

. reg ccrmrte clpolpc, robust
Linear regression    Number of obs = 46    F(1, 44) = 4.13
[remaining statistics and the coefficient table for clpolpc and _cons not preserved in the transcript]

- A 100% increase in police officers per capita is now associated with 85 more crimes per 1,000 population!!
- Could it be that omitted variables bias is worse in changes in this case?
- On the other hand, the confidence interval is wide

Bottom line
- Estimating in "differences" is not a panacea
- Though we usually trust this variation more than cross-sectional variation, it does not always suffer from less bias
  - Another example: differencing also exacerbates bias from measurement error (more on that soon!)
- Instead, as usual, a credible "natural experiment" is what is really critical