Panel Data Models ECON 6002 Econometrics I Memorial University of Newfoundland Adapted from Vera Tabakova’s notes.

 15.1 Grunfeld’s Investment Data  15.2 Sets of Regression Equations  15.3 Seemingly Unrelated Regressions  15.4 The Fixed Effects Model  15.4 The Random Effects Model  Extensions RCM, dealing with endogeneity when we have static variables

The different types of panel data sets can be described as:
• “long and narrow,” with a “long” time dimension and a “narrow” cross section of few units;
• “short and wide,” with many units observed over a short period of time;
• “long and wide,” indicating that both N and T are relatively large.

The data consist of T = 20 years of data for N = 10 large firms. Let $y_{it} = INV_{it}$, $x_{2it} = V_{it}$ and $x_{3it} = K_{it}$. Notice the subindices! $V_{it}$ is the value of the firm's stock, a proxy for expected profits. $K_{it}$ is the capital stock, a proxy for the desired permanent capital stock.

For simplicity we focus on only two firms (General Electric, i = 3, and Westinghouse, i = 8). In STATA:

    keep if (i==3 | i==8)

Assumption (15.5) says that the errors in both investment functions (i) have zero mean, (ii) are homoskedastic with constant variance, and (iii) are not correlated over time; autocorrelation does not exist. The two equations do have different error variances.

    * Separate least squares regressions, saving each sum of squared errors
    reg inv v k if i==3
    scalar sse_ge = e(rss)
    reg inv v k if i==8
    scalar sse_we = e(rss)

• Let $D_i$ be a dummy variable equal to 1 for the Westinghouse observations and 0 for the General Electric observations.

    * Create dummy variable
    gen d = (i == 8)
    gen dv = d*v
    gen dk = d*k
    * Estimate dummy variable model
    reg inv d v dv k dk
    test d dv dk

This assumption says that the error terms in the two equations, at the same point in time, are correlated. This kind of correlation is called a contemporaneous correlation.
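As a sketch of what this assumption looks like in symbols, assuming the two errors are written $e_{GE,t}$ and $e_{WE,t}$ (notation not shown in the transcript), the contemporaneous covariance is a constant, while errors from different periods remain uncorrelated:

```latex
\operatorname{cov}(e_{GE,t},\, e_{WE,t}) = \sigma_{GE,WE},
\qquad
\operatorname{cov}(e_{GE,t},\, e_{WE,s}) = 0 \quad \text{for } t \neq s .
```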

Econometric software includes commands for SUR (or SURE) that carry out the following steps:
(i) Estimate the equations separately using least squares;
(ii) Use the least squares residuals from step (i) to estimate the error variances and the contemporaneous covariance;
(iii) Use the estimates from step (ii) to estimate the two equations jointly within a generalized least squares framework.

    * Open and summarize data
    use grunfeld2, clear
    summarize
    * SUR
    sureg (inv_ge v_ge k_ge) (inv_we v_we k_we), corr
    test ([inv_ge]_cons = [inv_we]_cons) ///
         ([inv_ge]_b[v_ge] = [inv_we]_b[v_we]) ///
         ([inv_ge]_b[k_ge] = [inv_we]_b[k_we])

There are two situations where separate least squares estimation is just as good as the SUR technique: (i) when the equation errors are not contemporaneously correlated; (ii) when the same explanatory variables appear in each equation. If the explanatory variables in each equation are different, then a test to see if the correlation between the errors is significantly different from zero is of interest.

In this case we have 3 parameters in each equation so:

Testing for correlated errors for two equations: the LM statistic exceeds 3.84, the 5% critical value of a $\chi^2$-distribution with one degree of freedom. Hence we reject the null hypothesis of no correlation between the errors and conclude that there are potential efficiency gains from estimating the two investment equations jointly using SUR.
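The formulas behind this test did not survive in the transcript. A hedged reconstruction, assuming the usual textbook notation ($\hat{e}$ for least squares residuals, $K_{GE} = K_{WE} = 3$ parameters per equation, $T = 20$):

```latex
\hat{\sigma}_{GE,WE}
  = \frac{1}{\sqrt{T-K_{GE}}\,\sqrt{T-K_{WE}}}\sum_{t=1}^{T}\hat{e}_{GE,t}\,\hat{e}_{WE,t}
  = \frac{1}{T-3}\sum_{t=1}^{T}\hat{e}_{GE,t}\,\hat{e}_{WE,t}

r^{2}_{GE,WE} = \frac{\hat{\sigma}^{2}_{GE,WE}}{\hat{\sigma}^{2}_{GE}\,\hat{\sigma}^{2}_{WE}},
\qquad
LM = T\, r^{2}_{GE,WE} \;\sim\; \chi^{2}_{(1)} \ \text{under } H_0 .
```

With T = 20 the divisor is T − 3 = 17, and the null of no contemporaneous correlation is rejected at the 5% level when LM > 3.84.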

Testing for correlated errors for three equations:

Testing for correlated errors for M equations: Under the null hypothesis that there are no contemporaneous correlations, this LM statistic has a $\chi^2$-distribution with M(M–1)/2 degrees of freedom, in large samples.
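The statistics themselves are missing from the transcript. A sketch of the standard form, assuming $r_{ij}$ denotes the residual correlation between equations $i$ and $j$:

```latex
% Three equations: 3(3-1)/2 = 3 degrees of freedom
LM = T\left(r_{12}^{2} + r_{13}^{2} + r_{23}^{2}\right)

% M equations: M(M-1)/2 degrees of freedom
LM = T\sum_{i=2}^{M}\sum_{j=1}^{i-1} r_{ij}^{2}
```

Setting M = 2 leaves the single term $T r_{12}^2$ with one degree of freedom, which matches the two-equation case.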

Most econometric software will perform an F-test and/or a Wald $\chi^2$-test; in the context of SUR equations both tests are large sample approximate tests. The F-statistic has J numerator degrees of freedom and $(MT - K)$ denominator degrees of freedom, where J is the number of hypotheses, M is the number of equations, K is the total number of coefficients in the whole system, and T is the number of time series observations per equation. The $\chi^2$-statistic has J degrees of freedom.

We cannot consistently estimate the 3×N×T parameters in (15.9) with only NT total observations. But we can impose some more structure… We consider only one-way effects and assume common slope parameters across cross-sectional units.

All behavioral differences between individual firms and over time are captured by the intercept. Individual intercepts are included to “control” for these firm-specific differences.
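The equation for this slide is not in the transcript. A sketch of the model being described, using the $y_{it}$, $x_{2it}$, $x_{3it}$ notation of the Grunfeld data and assuming the intercept is written $\beta_{1i}$:

```latex
y_{it} = \beta_{1i} + \beta_{2}\,x_{2it} + \beta_{3}\,x_{3it} + e_{it},
\qquad i = 1,\dots,N,\quad t = 1,\dots,T .
```

Only the intercept carries the subscript $i$; the slopes $\beta_2$ and $\beta_3$ are common to all firms and all periods.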

This specification is sometimes called the least squares dummy variable model, or the fixed effects model.
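A minimal Stata sketch of the two equivalent ways to estimate this model. It assumes a long-form Grunfeld file with the variables used elsewhere in these notes (`inv`, `v`, `k`, firm identifier `i`) plus a year variable `t`; the file name `grunfeld` and the variable `t` are assumptions.

```stata
* Assumption: long-form data with firm id i, year t, and variables inv, v, k
use grunfeld, clear
xtset i t

* Least squares dummy variable model: one intercept per firm, no common constant
regress inv ibn.i v k, noconstant

* Within (fixed effects) estimator: same slope estimates, dummies not shown
xtreg inv v k, fe
```

The foot of the `xtreg, fe` output reports an F test that all firm effects are zero, which is the test of equal intercepts.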

These N – 1 = 9 joint null hypotheses are tested using the usual F-test statistic. In the restricted model all the intercept parameters are equal. If we call their common value $\beta_1$, then the restricted model is $y_{it} = \beta_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}$.

We reject the null hypothesis that the intercept parameters for all firms are equal. We conclude that there are differences in firm intercepts, and that the data should not be pooled into a single model with a common intercept parameter.
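A sketch of the test being applied, assuming $SSE_R$ and $SSE_U$ denote the sums of squared errors from the restricted (pooled) and unrestricted (dummy variable) models:

```latex
H_0:\ \beta_{11} = \beta_{12} = \dots = \beta_{1N}
\qquad
H_1:\ \text{the } \beta_{1i} \text{ are not all equal}

F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(NT-K)},
\qquad J = N-1 = 9,\quad NT-K = 10\times 20 - 12 = 188 .
```

Here K = 12 counts the ten firm intercepts and the two common slopes of the unrestricted model.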

ONE PROBLEM: Even with the trick of using the within estimator, we still implicitly (even if no longer explicitly) include N–1 dummy variables in our model (not N, since we remove the intercept), so we use up N–1 degrees of freedom. It might therefore not be the most efficient way to estimate the common slope.
ANOTHER ONE: By using deviations from the means, the procedure wipes out all the static (time-invariant) variables, whose effects might be of interest.
To overcome these problems, we can consider the random effects (or error components) model.
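To see why deviations from means remove static variables, here is a sketch of the within transformation, assuming bars denote averages over time for firm $i$ and $z_i$ stands for any hypothetical time-invariant regressor:

```latex
\bar{y}_{i} = \beta_{1i} + \beta_{2}\,\bar{x}_{2i} + \beta_{3}\,\bar{x}_{3i} + \bar{e}_{i}

y_{it} - \bar{y}_{i}
  = \beta_{2}\,(x_{2it} - \bar{x}_{2i}) + \beta_{3}\,(x_{3it} - \bar{x}_{3i}) + (e_{it} - \bar{e}_{i})

z_{i} - \bar{z}_{i} = 0 \quad \text{for every } t .
```

Subtracting the firm mean eliminates $\beta_{1i}$, and for the same reason any regressor that does not vary over time becomes a column of zeros, so its coefficient cannot be estimated.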

Equation labels: randomness of the intercept ($u_i$); usual error ($e_{it}$).

Because the random effects regression error has two components, one for the individual and one for the regression, the random effects model is often called an error components model. The sum of the two components is a composite error.
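The model equations are missing from the transcript. A sketch under the usual notation, assuming $u_i$ is the individual component, $e_{it}$ the regression error and $v_{it}$ the composite error:

```latex
\beta_{1i} = \bar{\beta}_{1} + u_{i},
\qquad E(u_i) = 0,\quad \operatorname{var}(u_i) = \sigma_u^{2}

y_{it} = \bar{\beta}_{1} + \beta_{2}\,x_{2it} + \beta_{3}\,x_{3it} + v_{it},
\qquad v_{it} = u_{i} + e_{it} .
```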

There are several correlations that can be considered. Write the composite error as $v_{it} = u_i + e_{it}$.
• The correlation between two individuals, i and j, at the same point in time, t. The covariance for this case is given by $cov(v_{it}, v_{jt}) = 0$.

• The correlation between errors on the same individual (i) at different points in time, t and s. The covariance for this case is given by $cov(v_{it}, v_{is}) = \sigma_u^2$.

• The correlation between errors for different individuals in different time periods. The covariance for this case is $cov(v_{it}, v_{js}) = 0$.
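A short derivation of the one nonzero case, assuming the standard random effects assumptions (not spelled out in the transcript) that $u_i$ and $e_{it}$ have zero means, are uncorrelated with each other, and that $e_{it}$ is uncorrelated over time:

```latex
\operatorname{cov}(v_{it}, v_{is})
  = E\big[(u_i + e_{it})(u_i + e_{is})\big]
  = E(u_i^{2}) + E(u_i e_{is}) + E(e_{it} u_i) + E(e_{it} e_{is})
  = \sigma_u^{2}, \qquad t \neq s

\operatorname{var}(v_{it}) = \sigma_u^{2} + \sigma_e^{2},
\qquad
\rho = \operatorname{corr}(v_{it}, v_{is}) = \frac{\sigma_u^{2}}{\sigma_u^{2} + \sigma_e^{2}} .
```

The errors for the same individual are correlated only through the shared component $u_i$, and the correlation is the same for any pair of periods.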

Summary for now
• Pooled OLS vs different intercepts: test (use a Chow-type test after FE, or run RE and test whether the variance of the intercept component of the error is zero)
• You cannot pool into OLS? Then…
• FE vs RE: test (Hausman type)
• Different slopes too, perhaps? => use SURE or RCM and test for equality of slopes across units
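The first steps of this testing sequence can be sketched in Stata as follows. The sketch assumes long-form Grunfeld data with firm identifier `i`, year `t` and variables `inv`, `v`, `k`; the file name and `t` are assumptions.

```stata
* Assumption: long-form data with firm id i and year t
use grunfeld, clear
xtset i t

* Chow-type test: the F test that all u_i = 0 is printed below the FE estimates
xtreg inv v k, fe

* Random effects, followed by the Breusch-Pagan LM test that Var(u) = 0
xtreg inv v k, re
xttest0
```

If either test rejects, pooled OLS with a common intercept is not appropriate, and the next step is the choice between FE and RE.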

Summary for now
• Note that there is within variation versus between variation
• The OLS is an unweighted average of the between estimator and the within estimator
• The RE is a weighted average of the between estimator and the within estimator
• The FE is also a weighted average of the between estimator and the within estimator, with zero as the weight for the between part

Summary for now
• Because RE puts a nonzero weight on the between part while FE puts zero weight on it, you can now see where the extra efficiency of RE comes from!
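One way to see the weighting is the partial demeaning that GLS applies in the random effects model. This is a sketch in standard notation ($\sigma_u^2$ for the variance of the individual component, $\sigma_e^2$ for the variance of the usual error), which the transcript does not show:

```latex
y_{it}^{*} = y_{it} - \alpha\,\bar{y}_{i},
\qquad
x_{kit}^{*} = x_{kit} - \alpha\,\bar{x}_{ki},
\qquad
\alpha = 1 - \frac{\sigma_e}{\sqrt{T\sigma_u^{2} + \sigma_e^{2}}}
```

When $\sigma_u^2 = 0$, $\alpha = 0$ and RE reduces to pooled OLS. As $T\sigma_u^2$ grows large relative to $\sigma_e^2$, $\alpha$ approaches 1 and RE approaches the within (FE) estimator.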

Summary for now
• The RE uses information from both the cross-sectional variation in the panel and the time series variation, so it mixes long-run (LR) and short-run (SR) effects
• The FE uses only information from the time series variation, so it estimates SR effects

Summary for now
• With a panel, we can learn about dynamic effects from a short panel, whereas we need a long time series on a single cross-sectional unit to learn about dynamics from a time series data set

If the random error is correlated with any of the right-hand side explanatory variables in a random effects model then the least squares and GLS estimators of the parameters are biased and inconsistent.

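We expect to find $var(b_{FE,k}) - var(b_{RE,k}) > 0$, and this difference is the variance of $b_{FE,k} - b_{RE,k}$ because Hausman proved that $cov(b_{FE,k}, b_{RE,k}) = var(b_{RE,k})$.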
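A sketch of the t-type Hausman statistic for a single coefficient, assuming $b_{FE,k}$ and $b_{RE,k}$ denote the fixed and random effects estimates of $\beta_k$ and $se(\cdot)$ their standard errors:

```latex
\operatorname{var}(b_{FE,k} - b_{RE,k})
  = \operatorname{var}(b_{FE,k}) + \operatorname{var}(b_{RE,k})
    - 2\operatorname{cov}(b_{FE,k}, b_{RE,k})
  = \operatorname{var}(b_{FE,k}) - \operatorname{var}(b_{RE,k})

t = \frac{b_{FE,k} - b_{RE,k}}
         {\big[\operatorname{se}(b_{FE,k})^{2} - \operatorname{se}(b_{RE,k})^{2}\big]^{1/2}}
```

The second equality in the first line uses Hausman's covariance result; the statistic is compared with standard normal critical values in large samples.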

Comparing the test statistic for the coefficient of SOUTH with the standard 5% large-sample critical value of 1.96, we reject the hypothesis that the estimators yield identical results. Our conclusion is that the random effects estimator is inconsistent, and we should use the fixed effects estimator, or we should attempt to improve the model specification.
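A hedged Stata sketch of this comparison. The file name `nls_panel`, the identifiers `id` and `year`, and the regressor list are assumptions modelled on the textbook's wage panel; only the variable SOUTH is named in these notes.

```stata
* Assumption: long-form wage panel; file and variable names are hypothetical
use nls_panel, clear
xtset id year

* Fixed effects (Stata omits time-invariant regressors such as educ and black)
xtreg lwage educ exper exper2 tenure tenure2 black south union, fe
estimates store fe
scalar b_fe  = _b[south]
scalar se_fe = _se[south]

* Random effects
xtreg lwage educ exper exper2 tenure tenure2 black south union, re
estimates store re
scalar b_re  = _b[south]
scalar se_re = _se[south]

* t-type Hausman statistic for the single coefficient on south
display (b_fe - b_re) / sqrt(se_fe^2 - se_re^2)

* Joint chi-square Hausman test on all coefficients common to both models
hausman fe re
```

The `display` line follows the single-coefficient test on the slide, while `hausman` tests all common coefficients jointly, so the two need not lead to the same decision.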

If the random error is correlated with any of the right-hand side explanatory variables in a random effects model, then the least squares and GLS estimators of the parameters are biased and inconsistent, and we would have to use the FE model. But with FE we lose the static variables. Solutions? Hausman-Taylor (HT), Amemiya-MaCurdy (AM), Breusch-Mizon-Schmidt (BMS) and other instrumental variables models could help.
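As an illustration of the Hausman-Taylor route, Stata's `xthtaylor` keeps the static variables in the model. The data file, variable names and the choice of which regressors to treat as endogenous are all assumptions made for this sketch.

```stata
* Assumption: hypothetical long-form wage panel with id and year identifiers
use nls_panel, clear
xtset id year

* Hausman-Taylor: south (time-varying) and educ (time-invariant) are treated
* as correlated with the individual effect; the rest are treated as exogenous
xthtaylor lwage educ exper exper2 tenure tenure2 black south union, ///
    endog(south educ)
```

Unlike FE, this reports coefficients for the static variables `educ` and `black`. Adding the `amacurdy` option requests the Amemiya-MaCurdy version of the estimator.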

Further issues
We can generalise the random effects idea and allow for different slopes too: the Random Coefficients Model (RCM). Now it is the slope parameters that differ across units but, as in the RE model, they are drawn from a common distribution. The RCM is, in a way, to the RE model what the SURE model is to the FE model.
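A minimal sketch of a random coefficients fit in Stata, assuming the long-form Grunfeld variables used earlier in these notes (`inv`, `v`, `k`, firm identifier `i`) and an assumed year variable `t` and file name `grunfeld`:

```stata
* Assumption: long-form data with firm id i and year t
use grunfeld, clear
xtset i t

* Swamy random-coefficients model: slopes vary across firms around a common mean
xtrc inv v k
```

The output includes a test of parameter constancy, which bears on whether common slopes are a reasonable restriction.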

Further issues
• Unit root tests and cointegration in panels
• Dynamics in panels

Further issues
• Of course it is not necessary that one of the dimensions of the panel is time as such. Example: i are students and t is for each quiz they take
• Of course we could have a one-way effect model on the time dimension instead
• Or a two-way model
• Or a three-way model! But things get a bit more complicated there…

Further issues
• Another way to have more fun with panel data is to consider dependent variables that are not continuous
• Logit, probit, count data can be considered
• STATA has commands for these
• Based on maximum likelihood and other estimation techniques we have not yet considered

Further issues
• You can understand the use of the FE model as a solution to omitted variable bias
• If the unmeasured variables left in the error term are not correlated with the ones in the model, we would not have a bias in OLS, so we can safely use RE
• If the unmeasured variables left in the error term are correlated with the ones in the model, we would have a bias in OLS, so we cannot use RE. We should not leave them out, and we should use FE, which bundles them together in each cross-sectional dummy

Further issues
• Another criterion to choose between FE and RE
• If the panel includes all the relevant cross-sectional units, use FE; if it is only a random sample from a population, RE is more appropriate (as long as it is valid)

Readings
• Wooldridge’s book on panel data
• Baltagi’s book on panel data
• Greene’s coverage is also good

Principles of Econometrics, 3rd Edition
• Balanced panel
• Breusch-Pagan test
• Cluster corrected standard errors
• Contemporaneous correlation
• Endogeneity
• Error components model
• Fixed effects estimator
• Fixed effects model
• Hausman test
• Heterogeneity
• Least squares dummy variable model
• LM test
• Panel corrected standard errors
• Pooled panel data regression
• Pooled regression
• Random effects estimator
• Random effects model
• Seemingly unrelated regressions
• Unbalanced panel
