Linear Panel Data Models


1 Linear Panel Data Models
Chapter 6: More on Linear Panel Data Models. In this chapter, we present some additional linear panel estimation methods, in particular those involving instrumental variable (IV) estimation. The main topics include: IV estimation in linear models (basic IV theory, model setup, common IV estimators: IV, 2SLS, and GMM, instrument validity and relevance, etc.); panel IV estimation: the xtivreg command; and the Hausman-Taylor estimator for the FE model. Zhenlin Yang

2 6.1. IV Estimation – Linear Models
OLS estimator. Consider the multiple linear regression model: $y_i = \alpha + X_i'\beta + u_i = \mathbf{X}_i'\boldsymbol\beta + u_i$, $i = 1, \ldots, n$. The ordinary least squares (OLS) estimator of $\boldsymbol\beta$ is $\hat{\boldsymbol\beta}_{\mathrm{OLS}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'y$, which minimizes the sum of squared errors, $\sum_{i=1}^{n}(y_i - \mathbf{X}_i'\boldsymbol\beta)^2 = (y - \mathbf{X}\boldsymbol\beta)'(y - \mathbf{X}\boldsymbol\beta)$. The condition for $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$ to be valid (unbiased, consistent) is: (i) $\mathrm{E}(u_i \mid X_i) = 0$ (exogeneity of regressors). Under (i), $\mathrm{E}(\hat{\boldsymbol\beta}_{\mathrm{OLS}} \mid \mathbf{X}) = \mathrm{E}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'y \mid \mathbf{X}] = \boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{E}(u \mid \mathbf{X}) = \boldsymbol\beta$, implying that $\mathrm{E}(\hat{\boldsymbol\beta}_{\mathrm{OLS}}) = \boldsymbol\beta$ (unbiased). The conditions for $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$ to be efficient are: (ii) $\mathrm{E}(u_i^2 \mid X_i) = \sigma^2$ (conditional homoskedasticity), and (iii) $\mathrm{E}(u_i u_j \mid X_i, X_j) = 0$, $i \neq j$ (conditional zero correlation). Under (ii) and (iii), $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}} \mid \mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ (efficient).
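As a minimal numerical sketch of the OLS formula, specialized to the scalar no-intercept case where $\hat\beta_{\mathrm{OLS}} = \sum_i x_i y_i / \sum_i x_i^2$ (the data values are made up for illustration):

```python
# OLS slope for the no-intercept scalar model y_i = x_i*beta + u_i.
# beta_hat = sum(x_i*y_i) / sum(x_i^2), the scalar version of (X'X)^{-1} X'y.
def ols_slope(x, y):
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # exactly y = 2x, so the estimator recovers beta = 2
print(ols_slope(x, y))     # -> 2.0
```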

3 OLS Estimator
Heteroskedasticity-robust standard errors. If the homoskedasticity assumption is violated, i.e., $\mathrm{E}(u_i^2 \mid X_i) = \sigma_i^2$ (heteroskedasticity), the OLS estimator $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$ remains valid (unbiased, consistent), but $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}}) \neq \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$; instead, $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}}) = (\mathbf{X}'\mathbf{X})^{-1}\big[\sum_{i=1}^{n}\sigma_i^2 X_i X_i'\big](\mathbf{X}'\mathbf{X})^{-1}$. A heteroskedasticity-robust estimator of $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}})$ is $\widehat V_{\mathrm{robust}}(\hat{\boldsymbol\beta}_{\mathrm{OLS}}) = (\mathbf{X}'\mathbf{X})^{-1}\big[\tfrac{n}{n-k-1}\sum_{i=1}^{n}\hat u_i^2 X_i X_i'\big](\mathbf{X}'\mathbf{X})^{-1}$, where the $\hat u_i$ are the OLS residuals, $\hat u_i = y_i - \mathbf{X}_i'\hat{\boldsymbol\beta}_{\mathrm{OLS}}$. Cluster-robust standard errors. If the observations possess some cluster or group structure, such that the errors are correlated within a cluster but uncorrelated across clusters, then a cluster-robust estimator of $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}})$ is $\widehat V_{\mathrm{cluster}}(\hat{\boldsymbol\beta}_{\mathrm{OLS}}) = (\mathbf{X}'\mathbf{X})^{-1}\big[\tfrac{G}{G-1}\tfrac{n}{n-k-1}\sum_{g}\mathbf{X}_g'\hat{\mathbf{u}}_g\hat{\mathbf{u}}_g'\mathbf{X}_g\big](\mathbf{X}'\mathbf{X})^{-1}$, where $\hat{\mathbf{u}}_g$ is the vector of OLS residuals for the $g$th cluster, and $\mathbf{X}_g$ is the matrix of regressor values in the $g$th cluster, $g = 1, \ldots, G$, with $G \to \infty$.
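A sketch of the robust "sandwich" variance in the same scalar no-intercept case, where it reduces to $\widehat V = (\sum_i x_i^2)^{-1}\big[\sum_i \hat u_i^2 x_i^2\big](\sum_i x_i^2)^{-1}$; the degrees-of-freedom correction is omitted for brevity, and the data are made up:

```python
def ols_slope(x, y):
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def robust_var(x, y):
    # Sandwich estimator for the scalar no-intercept model:
    # "bread" = (sum x_i^2)^{-1}, "meat" = sum u_hat_i^2 * x_i^2.
    b = ols_slope(x, y)
    u = [yi - xi * b for xi, yi in zip(x, y)]        # OLS residuals
    bread = 1.0 / sum(xi ** 2 for xi in x)
    meat = sum(ui ** 2 * xi ** 2 for xi, ui in zip(x, u))
    return bread * meat * bread

x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]
v = robust_var(x, y)   # positive by construction
# Note: OLS residuals are orthogonal to x, i.e., sum(u_i * x_i) = 0.
```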

4 GLS Estimator
If $\mathrm{E}(uu' \mid \mathbf{X}) = \sigma^2\Omega$, where $\Omega \neq I$ is a known correlation matrix ((ii) and/or (iii) violated), the generalized least-squares (GLS) estimator is $\hat{\boldsymbol\beta}_{\mathrm{GLS}} = (\mathbf{X}'\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\Omega^{-1}y$, which minimizes the weighted sum of squares $(y - \mathbf{X}\boldsymbol\beta)'\Omega^{-1}(y - \mathbf{X}\boldsymbol\beta)$, with $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{GLS}}) = \sigma^2(\mathbf{X}'\Omega^{-1}\mathbf{X})^{-1}$. Both $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$ and $\hat{\boldsymbol\beta}_{\mathrm{GLS}}$ are unbiased and consistent, but $\hat{\boldsymbol\beta}_{\mathrm{GLS}}$ is more efficient than $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$, because $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{GLS}}) = \sigma^2(\mathbf{X}'\Omega^{-1}\mathbf{X})^{-1}$ is "less than" $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{OLS}})$ in the matrix (positive semi-definite) sense. In the case where $\Omega$ is known up to a finite number of parameters $\gamma$, i.e., $\Omega = \Omega(\gamma)$, and a consistent estimator $\hat\gamma$ of $\gamma$ is available, the feasible GLS (FGLS) estimator of $\boldsymbol\beta$ and its variance are: $\hat{\boldsymbol\beta}_{\mathrm{FGLS}} = (\mathbf{X}'\hat\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat\Omega^{-1}y$, where $\hat\Omega = \Omega(\hat\gamma)$; and $\mathrm{Var}(\hat{\boldsymbol\beta}_{\mathrm{FGLS}}) = \hat\sigma^2(\mathbf{X}'\hat\Omega^{-1}\mathbf{X})^{-1}$, where $\hat\sigma^2$ is a consistent estimator of $\sigma^2$.
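When $\Omega$ is diagonal, GLS reduces to weighted least squares with weights $1/\sigma_i^2$; here is a minimal sketch for the scalar no-intercept case, $\hat\beta_{\mathrm{GLS}} = \sum_i w_i x_i y_i / \sum_i w_i x_i^2$, with made-up data and a made-up diagonal $\Omega$:

```python
def gls_slope(x, y, omega_diag):
    # GLS for a diagonal Omega: weight each observation by 1/omega_ii.
    w = [1.0 / o for o in omega_diag]
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi ** 2 for wi, xi in zip(w, x))
    return num / den

x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.0]
omega = [1.0, 4.0, 9.0]   # error variance assumed to grow with x_i
print(gls_slope(x, y, omega))
```

Note that if the data satisfy $y_i = \beta x_i$ exactly, the weighting does not matter and any valid weights return the same $\beta$.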

5 IV Estimation: Basic Idea
The most critical assumption for the validity of the usual linear regression analysis is the exogeneity assumption, $\mathrm{E}(u \mid X) = 0$. Violation of this assumption renders OLS and GLS inconsistent. Instrumental variables (IV) estimation provides a consistent estimator under the strong assumption that valid instruments exist, where the instruments $Z$ are variables that: (i) are correlated with the regressors $X$; and (ii) satisfy $\mathrm{E}(u \mid Z) = 0$. Consider the simplest linear regression model without an intercept: $y = x\beta + u$, where $y$ measures earnings, $x$ measures years of schooling, and $u$ is the error term. If this simplest model assumes $x$ is unrelated to $u$, then the only effect of $x$ on $y$ is a direct effect, via the term $x\beta$, as shown in the path diagram on the next slide.

6 IV Estimation: Basic Idea
In the path diagram [x → y, with u → y], the absence of a direct arrow from $u$ to $x$ means that there is no association between $x$ and $u$. Then, the OLS estimator $\hat\beta = \sum_i x_i y_i / \sum_i x_i^2$ is consistent for $\beta$. The error $u$ embodies all factors other than schooling that determine earnings. One such factor is ability, which is likely correlated with $x$, as high ability tends to lead to more schooling [path diagram: x → y, with u → y and u → x]. The OLS estimator $\hat\beta$ is then inconsistent for $\beta$, as $\hat\beta$ combines the desired direct effect of schooling on earnings ($\beta$) with the indirect effect of ability: high $x$ ⟹ high $u$ ⟹ high $y$. A regressor $x$ is called endogenous when it arises within a system that influences $u$; as a consequence, $\mathrm{E}(u \mid x) \neq 0$. By contrast, an exogenous regressor arises outside the system and is unrelated to $u$.

7 IV Estimation: Basic Idea
An obvious solution to the endogeneity problem is to include as regressors controls for ability; this is called the control function approach. But such regressors may not be available, and even when they are (e.g., IQ scores), there are questions about the extent to which they measure inherent ability. The IV approach provides an alternative solution. Let $z$ be an instrument such that changes in $z$ are associated with changes in $x$ but do not lead to changes in $y$, except indirectly via $x$. This leads to the path diagram: z → x → y, with u → y. For example, proximity to college ($z$) may determine college attendance ($x$) but not directly determine earnings ($y$). The IV estimator for this simple example is $\hat\beta_{\mathrm{IV}} = \sum_i z_i y_i / \sum_i z_i x_i$. The IV estimator $\hat\beta_{\mathrm{IV}}$ is consistent for $\beta$ provided the instrument $z$ is unrelated to the error $u$ and correlated with the regressor $x$.
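A minimal sketch of the scalar IV estimator; the numbers are constructed (hypothetically) so that $z$ is exactly orthogonal to $u$ in-sample, making the endogeneity bias of OLS visible while IV recovers the true $\beta = 2$:

```python
def ols_slope(x, y):
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def iv_slope(z, x, y):
    # beta_IV = sum(z_i * y_i) / sum(z_i * x_i)
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))

z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]                    # sum(z_i*u_i) = 0: z uncorrelated with u
x = [zi + ui for zi, ui in zip(z, u)]         # x is endogenous: it contains u
y = [2.0 * xi + ui for xi, ui in zip(x, u)]   # true beta = 2

print(iv_slope(z, x, y))   # -> 2.0 (IV removes the endogeneity bias)
print(ols_slope(x, y))     # biased upward here, since sum(x_i*u_i) > 0
```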

8 IV Estimation: Model Setup
We now consider the more general regression model with a scalar dependent variable $y_1$, which depends on $m$ endogenous regressors $\mathbf{y}_2$ and $K_1$ exogenous regressors $\mathbf{X}_1$ (including an intercept). This model is called a structural equation, with
$y_{1i} = \mathbf{y}_{2i}'\boldsymbol\beta_1 + \mathbf{X}_{1i}'\boldsymbol\beta_2 + u_i$, $i = 1, \ldots, n$. (6.1)
The $u_i$ are assumed to be uncorrelated with $\mathbf{X}_{1i}$, but are correlated with $\mathbf{y}_{2i}$, rendering the OLS estimator of $\boldsymbol\beta = (\boldsymbol\beta_1', \boldsymbol\beta_2')'$ inconsistent. To obtain a consistent estimator, we assume the existence of at least $m$ instruments $\mathbf{X}_2$ for $\mathbf{y}_2$ that satisfy the condition $\mathrm{E}(u_i \mid \mathbf{X}_{2i}) = 0$. The instruments $\mathbf{X}_2$ need to be correlated with $\mathbf{y}_2$ so that they provide some information on the variables being instrumented. One way to see this is through the first-stage (reduced-form) equations: for each component $y_{2j}$ of $\mathbf{y}_2$,
$y_{2ji} = \mathbf{X}_{1i}'\boldsymbol\pi_{1j} + \mathbf{X}_{2i}'\boldsymbol\pi_{2j} + \varepsilon_{ji}$, $j = 1, \ldots, m$. (6.2)

9 IV Estimators: IV, 2SLS, and GMM
Write the model (6.1) as (the dependent variable is denoted by $y$ rather than $y_1$)
$y_i = \mathbf{X}_i'\boldsymbol\beta + u_i$, (6.3)
where the regressor vector $\mathbf{X}_i' = [\mathbf{y}_{2i}' \;\; \mathbf{X}_{1i}']$ combines the endogenous and exogenous variables. Now, let $\mathbf{Z}_i' = [\mathbf{X}_{1i}' \;\; \mathbf{X}_{2i}']$, called collectively the vector of IVs, where $\mathbf{X}_1$ serves as the (ideal) instrument for itself and $\mathbf{X}_2$ is the instrument for $\mathbf{y}_2$. The instruments satisfy the conditional moment restriction
$\mathrm{E}(u_i \mid \mathbf{Z}_i) = 0$. (6.4)
This implies the following population moment condition:
$\mathrm{E}[\mathbf{Z}_i(y_i - \mathbf{X}_i'\boldsymbol\beta)] = 0$. (6.5)
We say "we regress $y$ on $\mathbf{X}$ using instruments $\mathbf{Z}$". The IV estimators are the solutions to the sample analogue of (6.5).

10 IV Estimators: IV, 2SLS, and GMM
Case I: dim(Z) = dim(X): the number of instruments exactly equals the number of regressors, called the just-identified case. The sample analogue of (6.5) is
$\frac{1}{n}\sum_{i=1}^{n}\mathbf{Z}_i(y_i - \mathbf{X}_i'\boldsymbol\beta) = 0$, (6.6)
which can be written in vector form as $\mathbf{Z}'(y - \mathbf{X}\boldsymbol\beta) = 0$. Solving for $\boldsymbol\beta$ leads to the IV estimator: $\hat{\boldsymbol\beta}_{\mathrm{IV}} = (\mathbf{Z}'\mathbf{X})^{-1}\mathbf{Z}'y$.
Case II: dim(Z) < dim(X), called the not-identified case, where there are fewer instruments than regressors. In this case, no consistent IV estimator exists.

11 IV Estimators: IV, 2SLS, and GMM
Case III: dim(Z) > dim(X), called the over-identified case, where there are more instruments than regressors. Then $\mathbf{Z}'(y - \mathbf{X}\boldsymbol\beta) = 0$ has no solution for $\boldsymbol\beta$ because it is a system of dim(Z) equations in dim(X) unknowns. One possibility is to arbitrarily drop instruments to get to the just-identified case. But there are more efficient estimators. One is the two-stage least-squares (2SLS) estimator:
$\hat{\boldsymbol\beta}_{\mathrm{2SLS}} = [\mathbf{X}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'y$,
which is obtained by running two OLS regressions: an OLS regression of (6.2) to get the predicted $\mathbf{y}_2$, say $\hat{\mathbf{y}}_2$; and an OLS regression of (6.1) with $\mathbf{y}_2$ replaced by $\hat{\mathbf{y}}_2$. This estimator is the most efficient if the errors are independent and homoskedastic.
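A sketch of 2SLS as two OLS passes, in an over-identified scalar case (one endogenous regressor $x$, two instruments $z_1, z_2$, no exogenous regressors); the data are made up so that both instruments are exactly orthogonal to $u$ in-sample, and the true $\beta$ is 2:

```python
def solve2x2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule for a 2x2 linear system (the first-stage normal equations).
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

def tsls_slope(z1, z2, x, y):
    # First stage: OLS of x on (z1, z2) to get fitted values x_hat.
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1x = sum(a * xi for a, xi in zip(z1, x))
    s2x = sum(b * xi for b, xi in zip(z2, x))
    p1, p2 = solve2x2(s11, s12, s12, s22, s1x, s2x)
    xhat = [p1 * a + p2 * b for a, b in zip(z1, z2)]
    # Second stage: OLS of y on x_hat.
    return sum(xh * yi for xh, yi in zip(xhat, y)) / sum(xh * xh for xh in xhat)

z1 = [1.0, 0.0, 1.0, 0.0]
z2 = [0.0, 1.0, 0.0, 1.0]
u  = [1.0, -1.0, -1.0, 1.0]                         # orthogonal to both z1 and z2
x  = [a + 2.0 * b + ui for a, b, ui in zip(z1, z2, u)]
y  = [2.0 * xi + ui for xi, ui in zip(x, u)]        # true beta = 2
print(tsls_slope(z1, z2, x, y))                     # -> 2.0
```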

12 IV Estimators: IV, 2SLS, and GMM
Case III: GMM estimator.

13 Instrument Validity and Relevance

14 6.2. Panel IV estimation

15 Panel IV estimation

16 Hausman-Taylor Estimator

