Linear Panel Data Models


Linear Panel Data Models
Chapter 6: More on Linear Panel Data Models. In this chapter, we present some additional linear panel estimation methods, in particular those involving instrumental variable (IV) estimation. The main topics include:
IV estimation – linear models: basic IV theory, model setup, common IV estimators (IV, 2SLS, and GMM), instrument validity and relevance, etc.
Panel IV estimation: the xtivreg command.
Hausman-Taylor estimator for the FE model.
Zhenlin Yang
zlyang@smu.edu.sg http://www.mysmu.edu/faculty/zlyang/

6.1. IV Estimation – Linear Models
OLS estimator. Consider the multiple linear regression model
  y_i = α + x_i′β + u_i = X_i′β + u_i,  i = 1, …, n,
where the regressor vector X_i includes the intercept. The ordinary least squares (OLS) estimator of β is
  β̂_OLS = (X′X)⁻¹X′y,
which minimizes the sum of squared errors, Σ_{i=1}^n (y_i − X_i′β)² = (y − Xβ)′(y − Xβ).
The condition for β̂_OLS to be valid (unbiased, consistent):
  (i) E(u_i | X_i) = 0 (exogeneity of regressors).
Under (i), E(β̂_OLS | X) = E[(X′X)⁻¹X′y | X] = β + (X′X)⁻¹X′E(u | X) = β, implying that E(β̂_OLS) = β (unbiased).
The conditions for β̂_OLS to be efficient:
  (ii) E(u_i² | X_i) = σ² (conditional homoskedasticity), and
  (iii) E(u_i u_j | X_i, X_j) = 0, i ≠ j (conditional zero correlation).
Under (ii) and (iii), Var(β̂_OLS) = σ²(X′X)⁻¹ (efficient).
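The OLS formula above can be checked numerically. The following is a minimal numpy sketch; the simulated data and the true coefficient values are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# design matrix: intercept plus one exogenous regressor
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)  # u ~ N(0, 1), independent of X

# beta_hat_OLS = (X'X)^{-1} X'y, computed via a linear solve
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

With exogenous regressors, beta_ols lands close to beta_true, consistent with condition (i).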

OLS Estimator
Heteroskedasticity-robust standard errors. If the homoskedasticity assumption is violated, i.e., E(u_i² | X_i) = σ_i² (heteroskedasticity), the OLS estimator β̂_OLS remains valid (unbiased, consistent), but Var(β̂_OLS) ≠ σ²(X′X)⁻¹; instead,
  Var(β̂_OLS) = (X′X)⁻¹ X′diag(σ_i²)X (X′X)⁻¹.
A heteroskedasticity-robust estimator of Var(β̂_OLS) is
  V̂_robust(β̂_OLS) = (X′X)⁻¹ [N/(N−k−1)] Σ_{i=1}^n û_i² X_i X_i′ (X′X)⁻¹,
where the û_i = y_i − X_i′β̂_OLS are the OLS residuals.
Cluster-robust standard errors. If observations possess some cluster or group structure, such that the errors are correlated within a cluster but uncorrelated across clusters, then a cluster-robust estimator of Var(β̂_OLS) is
  V̂_cluster(β̂_OLS) = (X′X)⁻¹ [G/(G−1)][N/(N−k−1)] Σ_g X_g û_g û_g′ X_g′ (X′X)⁻¹,
where û_g is the vector of OLS residuals for the g-th cluster and X_g is the matrix of regressors' values in the g-th cluster, g = 1, …, G, G → ∞.
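A numpy sketch of the heteroskedasticity-robust (sandwich) variance estimator. The data-generating process is illustrative, and the finite-sample correction n/(n − p), with p the total number of columns of X, plays the role of the slide's N/(N − k − 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# heteroskedastic errors: the error standard deviation grows with |x|
u = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
resid = y - X @ beta_ols

# sandwich: (X'X)^{-1} [c * sum_i u_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
V_robust = XtX_inv @ ((n / (n - p)) * meat) @ XtX_inv
se_robust = np.sqrt(np.diag(V_robust))

# classical (non-robust) standard errors for comparison
s2 = resid @ resid / (n - p)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))
```

The point estimate beta_ols is the same either way; only the standard errors change.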

GLS Estimator
If E(uu′ | X) = σ²Ω, where Ω ≠ I is a known correlation matrix (i.e., (ii) and/or (iii) is violated), the generalized least-squares (GLS) estimator is
  β̂_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y,
which minimizes the sum of squares (y − Xβ)′Ω⁻¹(y − Xβ), and Var(β̂_GLS) = σ²(X′Ω⁻¹X)⁻¹.
Both β̂_OLS and β̂_GLS are unbiased and consistent, but β̂_GLS is more efficient than β̂_OLS, because Var(β̂_GLS) = σ²(X′Ω⁻¹X)⁻¹ is "less than" (in the matrix sense) the variance of β̂_OLS.
In the case where Ω is known up to a finite number of parameters γ, i.e., Ω = Ω(γ), and a consistent estimator γ̂ of γ is available, the feasible GLS (FGLS) estimator of β and its variance are
  β̂_FGLS = (X′Ω̂⁻¹X)⁻¹X′Ω̂⁻¹y, where Ω̂ = Ω(γ̂);
  V̂ar(β̂_FGLS) = σ̂²(X′Ω̂⁻¹X)⁻¹, where σ̂² is a consistent estimator of σ².
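A numpy sketch of GLS with a known Ω, here an AR(1)-style correlation matrix with ρ = 0.5 (an illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# known correlation matrix: Omega[i, j] = rho^|i - j| (AR(1) pattern)
rho = 0.5
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])
# draw errors with Cov(u) = Omega via its Cholesky factor
L = np.linalg.cholesky(Omega)
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

# beta_hat_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

In practice one would factor Ω rather than invert it explicitly; the explicit inverse is kept here only to mirror the formula on the slide.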

IV Estimation: Basic Idea
The most critical assumption for the validity of the usual linear regression analysis is the exogeneity assumption, E(u | X) = 0. Violation of this assumption renders OLS and GLS inconsistent.
Instrumental variables (IV) estimation provides a consistent estimator under the strong assumption that valid instruments exist, where the instruments Z are variables that (i) are correlated with the regressors X, and (ii) satisfy E(u | Z) = 0.
Consider the simplest linear regression model without an intercept:
  y = βx + u,
where y measures earnings, x measures years of schooling, and u is the error term. If this simplest model assumes x is unrelated to u, then the only effect of x on y is a direct effect, via the term βx, as shown in the path diagram below.

IV Estimation: Basic Idea
In the path diagram, the absence of an arrow between u and x means that there is no association between x and u:
  x ──→ y ←── u
Then the OLS estimator β̂ = Σ_i x_i y_i / Σ_i x_i² is consistent for β.
The error u embodies all factors other than schooling that determine earnings. One such factor is ability, which is likely correlated with x, as high ability tends to lead to more schooling:
  x ──→ y ←── u, with an added arrow u ──→ x
The OLS estimator β̂ is then inconsistent for β, as β̂ combines the desired direct effect of schooling on earnings (β) with the indirect effect of ability: high x ↔ high u → high y.
A regressor x is called endogenous when it arises within the system, so that it is related to u; as a consequence, E(u | x) ≠ 0. By contrast, an exogenous regressor arises outside the system and is unrelated to u.

IV Estimation: Basic Idea
An obvious solution to the endogeneity problem is to include regressors that control for ability, called the control-function approach. But such regressors may not be available, and even when they are (e.g., IQ scores), there are questions about the extent to which they measure inherent ability.
The IV approach provides an alternative solution. Let z be an instrument such that changes in z are associated with changes in x but do not lead to changes in y, except indirectly via x. This leads to the path diagram:
  z ──→ x ──→ y ←── u
For example, proximity to college (z) may determine college attendance (x) but does not directly determine earnings (y).
The IV estimator for this simple example is β̂_IV = Σ_i z_i y_i / Σ_i z_i x_i. The IV estimator β̂_IV is consistent for β provided the instrument z is uncorrelated with the error u and correlated with the regressor x.
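The schooling example can be simulated to see both estimators at once. In this sketch, "ability" plays the role of the omitted factor and z plays the role of an instrument such as proximity to college; all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
ability = rng.normal(size=n)            # unobserved; drives both x and u
z = rng.normal(size=n)                  # instrument: shifts x, unrelated to u
x = z + ability + rng.normal(size=n)    # schooling
u = ability + rng.normal(size=n)        # earnings error contains ability
beta = 1.5                              # true direct effect of x on y
y = beta * x + u

beta_ols = (x @ y) / (x @ x)            # inconsistent: absorbs the ability path
beta_iv = (z @ y) / (z @ x)             # IV estimator from the slide
```

Here beta_ols converges to β + Cov(x, u)/Var(x) = 1.5 + 1/3, while beta_iv converges to the true β = 1.5.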

IV Estimation: Model Setup
We now consider the more general regression model with a scalar dependent variable y₁ which depends on m endogenous regressors y₂ and K₁ exogenous regressors X₁ (including an intercept). This model is called a structural equation:
  y_{1i} = y_{2i}′β₁ + X_{1i}′β₂ + u_i,  i = 1, …, n.  (6.1)
The u_i are assumed to be uncorrelated with X_{1i}, but are correlated with y_{2i}, rendering the OLS estimator of β = (β₁′, β₂′)′ inconsistent.
To obtain a consistent estimator, we assume the existence of at least m instruments X₂ for y₂ that satisfy the condition E(u_i | X_{2i}) = 0. The instruments X₂ also need to be correlated with y₂, so that they provide some information on the variables being instrumented. One way to see this is through the first-stage equation for each component y_{2j} of y₂:
  y_{2ji} = X_{1i}′π₁ + X_{2i}′π₂ + ε_{ji},  j = 1, …, m.  (6.2)

IV Estimators: IV, 2SLS, and GMM
Write model (6.1) as (with the dependent variable denoted y rather than y₁)
  y_i = X_i′β + u_i,  (6.3)
where the regressor vector X_i′ = (y_{2i}′, X_{1i}′) combines the endogenous and exogenous variables. Now let Z_i′ = (X_{1i}′, X_{2i}′), called collectively the vector of IVs, where X₁ serves as the (ideal) instrument for itself and X₂ is the instrument for y₂.
The instruments satisfy the conditional moment restriction
  E(u_i | Z_i) = 0.  (6.4)
This implies the population moment condition
  E[Z_i(y_i − X_i′β)] = 0.  (6.5)
We say "we regress y on X using instruments Z." The IV estimators are the solutions to the sample analogue of (6.5).

IV Estimators: IV, 2SLS, and GMM
Case I: dim(Z) = dim(X). The number of instruments exactly equals the number of regressors, called the just-identified case. The sample analogue of (6.5) is
  (1/n) Σ_{i=1}^n Z_i(y_i − X_i′β) = 0,  (6.6)
which can be written in vector form as Z′(y − Xβ) = 0. Solving for β leads to the IV estimator:
  β̂_IV = (Z′X)⁻¹Z′y.
Case II: dim(Z) < dim(X), called the not-identified case, where there are fewer instruments than regressors. In this case, no consistent IV estimator exists.
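Case I in matrix form can be sketched with one endogenous regressor, one outside instrument, and one exogenous regressor serving as its own instrument; the data-generating process is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
w = rng.normal(size=n)                     # unobserved confounder
x1 = rng.normal(size=n)                    # exogenous regressor (its own instrument)
x2 = rng.normal(size=n)                    # outside instrument for y2
y2 = x2 + w + rng.normal(size=n)           # endogenous regressor
y = 1.0 + 2.0 * y2 + 0.5 * x1 + w + rng.normal(size=n)

X = np.column_stack([np.ones(n), y2, x1])  # regressors
Z = np.column_stack([np.ones(n), x2, x1])  # instruments; dim(Z) = dim(X)

# beta_hat_IV = (Z'X)^{-1} Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
```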

IV Estimators: IV, 2SLS, and GMM
Case III: dim(Z) > dim(X), called the over-identified case, where there are more instruments than regressors. Then Z′(y − Xβ) = 0 has no solution for β, because it is a system of dim(Z) equations in dim(X) unknowns.
One possibility is to arbitrarily drop instruments to get back to the just-identified case. But there are more efficient estimators. One is the two-stage least-squares (2SLS) estimator:
  β̂_2SLS = (X′Z(Z′Z)⁻¹Z′X)⁻¹X′Z(Z′Z)⁻¹Z′y,
which is obtained by running two OLS regressions: an OLS regression of (6.2) to get the predicted y₂, say ŷ₂, and then an OLS regression of (6.1) with y₂ replaced by ŷ₂. This estimator is the most efficient if the errors are independent and homoskedastic.
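A 2SLS sketch for the over-identified case, with two instruments for one endogenous regressor. It uses the projection X̂ = Z(Z′Z)⁻¹Z′X, so that the 2SLS formula reduces to (X̂′X)⁻¹X̂′y; the simulated data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
w = rng.normal(size=n)                       # unobserved confounder
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                      # two instruments, one endogenous regressor
y2 = z1 + 0.5 * z2 + w + rng.normal(size=n)  # first stage, as in (6.2)
y = 1.0 + 2.0 * y2 + w + rng.normal(size=n)  # structural equation, as in (6.1)

X = np.column_stack([np.ones(n), y2])
Z = np.column_stack([np.ones(n), z1, z2])    # dim(Z) > dim(X): over-identified

# first stage: project X onto the column space of Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# second stage: OLS of y on the projected regressors
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
```

Because the projection matrix is symmetric and idempotent, X̂′X = X′Z(Z′Z)⁻¹Z′X, so this matches the formula on the slide.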

IV Estimators: IV, 2SLS, and GMM Case III: GMM estimator.

Instrument Validity and Relevance

6.2. Panel IV estimation

Panel IV estimation

Hausman-Taylor Estimator