# EC220 - Introduction to econometrics (chapter 14)

Christopher Dougherty

Slideshow: regression analysis with panel data

Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 14). [Teaching Resource] © 2012 The Author

This version available at: Available in LSE Learning Resources Online: May 2012

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

REGRESSION ANALYSIS WITH PANEL DATA: INTRODUCTION
A panel data set, or longitudinal data set, is one where there are repeated observations on the same units. The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time. The National Longitudinal Survey of Youth is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

A balanced panel is one where every unit is surveyed in every time period. The NLSY is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died.
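As a rough illustration of the balanced/unbalanced distinction, the sketch below checks whether every unit appears in every period. The function name `is_balanced` and the toy identifiers and years are hypothetical, chosen only for the example; they are not taken from the NLSY.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit is observed in every time period.

    `observations` is an iterable of (unit_id, period) pairs.
    """
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit, period in observations:
        periods_by_unit[unit].add(period)
        all_periods.add(period)
    return all(seen == all_periods for seen in periods_by_unit.values())


# Hypothetical toy panel: respondent 3 was not interviewed in 1981.
toy_panel = [
    (1, 1979), (1, 1980), (1, 1981),
    (2, 1979), (2, 1980), (2, 1981),
    (3, 1979), (3, 1980),
]

n_units = len({unit for unit, _ in toy_panel})
n_periods = len({period for _, period in toy_panel})

print(is_balanced(toy_panel))                # False
print(n_units * n_periods, len(toy_panel))   # 9 8 (potential vs actual observations)
```

With n units and T periods, a balanced panel has exactly n × T observations; an unbalanced one has fewer.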

Panel data sets have several advantages over cross-section data sets:

- They may make it possible to overcome a problem of bias caused by unobserved heterogeneity.
- They make it possible to investigate dynamics without relying on retrospective questions that may yield data subject to measurement error.
- They are often very large. If there are n units and T time periods, the potential number of observations is nT.
- Because they tend to be expensive to undertake, they are often well designed and have high response rates. The NLSY is an example.

We will start with an example of the use of panel data to investigate simple dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in full-time employment.

NLSY 1988 data (standard errors in parentheses)

| Dependent variable: LGEARN | (1) | (2) | (3) |
|---|---|---|---|
| MARRIED | […] (0.024) | […] (0.028) | – |
| SOONMARR | – | […] (0.037) | –0.066 (0.034) |
| SINGLE | – | – | –0.163 (0.028) |
| R² | […] | […] | […] |
| n | […] | […] | […] |

Here is the result of regressing the logarithm of hourly earnings on a dummy variable for being married and a set of control variables (years of schooling, ASVABC score, years of tenure and its square, years of work experience and its square, etc.; coefficients not shown).

Married males earn 12.9 percent more than single males and the effect is highly significant (standard error in parentheses).

The effect has often been found in the literature. One explanation is that marriage entails financial responsibilities (in particular, the raising of children) that may encourage men to work harder or seek better paying jobs.

Another is that certain unobserved qualities that are valued by employers are also valued by potential spouses and hence are conducive to getting married. According to this explanation the dummy variable for being married is acting as a proxy for these qualities.

Other explanations have been proposed, but we will restrict attention to these two. With cross-sectional data it is difficult to discriminate between them.

However, with panel data one can find out whether there is an uplift at the time of marriage or soon after, as would be predicted by the increased productivity hypothesis, or whether men who end up married tend to earn more even when unmarried, as would be predicted by the unobserved heterogeneity hypothesis.

We define a second dummy variable SOONMARR equal to 1 if the respondent was single in 1988 but married within the next four years. The omitted category consists of those who were single in 1988 and still single four years later.
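The coding of the two dummies can be sketched as follows. The function and argument names (`marital_dummies`, `married_1988`, `married_four_years_later`) are hypothetical, and real survey data would need handling of missing interviews that this sketch ignores.

```python
def marital_dummies(married_1988, married_four_years_later):
    """Return (MARRIED, SOONMARR) for one respondent.

    married_1988: True if married at the 1988 interview.
    married_four_years_later: True if married by four years after 1988.
    Both argument names are illustrative assumptions.
    """
    married = 1 if married_1988 else 0
    soonmarr = 1 if (not married_1988 and married_four_years_later) else 0
    return married, soonmarr


print(marital_dummies(True, True))    # (1, 0): married by 1988
print(marital_dummies(False, True))   # (0, 1): single in 1988, married soon after
print(marital_dummies(False, False))  # (0, 0): omitted (reference) category
```

Because the two dummies are never both 1, each respondent falls into exactly one of the three categories.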

Under the null hypothesis that the marital effect is dynamic and marriage encourages men to earn more, the coefficient of SOONMARR should be 0 because the men in this category were still single as of 1988.

The t statistic is 3.10, so the coefficient is significantly different from 0 at the 0.1 percent level, leading us to reject the null hypothesis at that level.

However, if the alternative hypothesis of unobserved heterogeneity is true, the coefficient of SOONMARR should be equal to that of MARRIED, but it is lower.

To test whether it is significantly lower, the easiest method is to change the reference category to those who were married by 1988 and to introduce a new dummy variable SINGLE that is equal to 1 if the respondent was single in 1988 and still single four years later.

The coefficient of SOONMARR now estimates the difference between the coefficient for those married within the next four years and that for those married by 1988, and if the second hypothesis is true, it should be equal to 0.
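The effect of changing the reference category can be checked on simulated data. The sketch below uses made-up coefficients and random data, not the NLSY results, and omits the control variables. It assumes NumPy is available and uses a small hypothetical helper `ols`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Hypothetical status: 0 = single throughout, 1 = married soon after the
# base year, 2 = already married in the base year.
status = rng.integers(0, 3, size=n)
married = (status == 2).astype(float)
soonmarr = (status == 1).astype(float)
single = (status == 0).astype(float)

# Made-up "true" earnings premia relative to the single group.
y = 2.0 + 0.16 * married + 0.10 * soonmarr + rng.normal(0.0, 0.5, size=n)


def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_single_ref = ols(y, married, soonmarr)   # reference category: single
b_married_ref = ols(y, soonmarr, single)   # reference category: married

# SOONMARR with married as reference equals (SOONMARR - MARRIED) from the
# first fit; SINGLE equals minus the original MARRIED coefficient.
print(b_single_ref[2] - b_single_ref[1], b_married_ref[1])
print(-b_single_ref[1], b_married_ref[2])
```

The two fits are reparameterizations of the same model, so the printed pairs agree up to floating-point error. The practical gain from refitting is that the standard error of the difference is reported directly.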

The t statistic is –1.93, so we (just) do not reject the second hypothesis at the 5 percent level. The evidence is more compatible with the first hypothesis, but it is possible that neither hypothesis is correct on its own and the truth might reside in some compromise.
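The arithmetic behind this test can be reproduced from the rounded table entries for SOONMARR (coefficient –0.066, standard error 0.034). The rounded figures give about –1.94 rather than –1.93, presumably because the reported statistic was computed from unrounded estimates. The 1.96 critical value is the usual large-sample two-sided 5 percent value, used here as an assumption.

```python
coef_soonmarr = -0.066   # rounded coefficient from the table
se_soonmarr = 0.034      # rounded standard error from the table

t_stat = coef_soonmarr / se_soonmarr
critical_5pct = 1.96     # large-sample two-sided 5 percent critical value

print(round(t_stat, 2))             # -1.94
print(abs(t_stat) > critical_5pct)  # False: not rejected at the 5 percent level
```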

The starting point for a discussion of regression models using panel data is an equation of the form

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}$$

where the $X_j$ variables are observed and the $Z_p$ variables are unobserved.

The index i refers to the unit of observation, t refers to the time period, and j and p are used to differentiate between different observed and unobserved explanatory variables. $\varepsilon_{it}$ is a disturbance term assumed to satisfy the regression model assumptions.

A trend term t has been introduced to allow for a shift of the intercept over time. If the implicit assumption of a constant rate of change seems too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.
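A minimal sketch of building such a set of time dummies is given below. The helper name `time_dummies` and the survey years are hypothetical, chosen only for illustration.

```python
def time_dummies(periods, reference):
    """Return the dummy labels and one row of 0/1 dummies per observation.

    A dummy is created for every distinct period except `reference`.
    """
    levels = sorted(set(periods) - {reference})
    rows = [[1 if t == level else 0 for level in levels] for t in periods]
    return levels, rows


# Hypothetical survey rounds for two units observed in three periods each.
periods = [1988, 1990, 1992, 1988, 1990, 1992]
levels, rows = time_dummies(periods, reference=1988)

print(levels)   # [1990, 1992]
print(rows[0])  # [0, 0]  (reference period)
print(rows[1])  # [1, 0]
```

Each dummy coefficient then measures the shift of the intercept in that period relative to the reference period, without imposing a constant rate of change.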

The $X_j$ variables are usually the variables of interest, while the $Z_p$ variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

Note that the unobserved heterogeneity is assumed to be unchanging and accordingly the $Z_p$ variables do not have a time subscript.

Because the $Z_p$ variables are unobserved, there is no means of obtaining information about the $\sum_p \gamma_p Z_{pi}$ component of the model and it is convenient to define a term $\alpha_i$, known as the unobserved effect, representing the joint impact of the $Z_p$ variables on $Y_i$.
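Written out, and assuming the unobserved variables are indexed p = 1, ..., s with coefficients $\gamma_p$ (the summation limits are an assumption made for this sketch), the definition amounts to:

```latex
\alpha_i = \sum_{p=1}^{s} \gamma_p Z_{pi}
```

Because no $Z_{pi}$ carries a time subscript, $\alpha_i$ is constant over time for each unit i.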

Hence we can rewrite the regression model as

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

The characterization of the $\alpha_i$ component will be seen to be crucially important in what follows.

First, however, note that if the $X_j$ controls are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

In that case the $\alpha_i$ term may be dropped and pooled OLS may be used to fit the model, treating all the observations for all of the time periods as a single sample.
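A minimal sketch of pooled OLS on simulated data, assuming NumPy is available: all unit-period observations are stacked and fitted as a single sample. The coefficients, panel dimensions, and the single regressor are made up for illustration, and the data are generated with no unobserved effect, which is the case described here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 5

# Stack observations unit by unit: t runs 1..T within each unit.
t = np.tile(np.arange(1, n_periods + 1), n_units).astype(float)
x = rng.normal(size=n_units * n_periods)

# Made-up model with no unobserved effect: Y = 1.0 + 0.5 X + 0.1 t + error.
y = 1.0 + 0.5 * x + 0.1 * t + rng.normal(scale=0.3, size=x.size)

# Pooled OLS: one regression over all n_units * n_periods observations.
X = np.column_stack([np.ones_like(x), x, t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # estimates of the intercept, the X coefficient, and the trend
```

In this setting the pooled estimates should land close to the values used to generate the data, since nothing has been omitted from the model.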