REGRESSION ANALYSIS WITH PANEL DATA: INTRODUCTION


A panel data set, or longitudinal data set, is one where there are repeated observations on the same units. The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time. The National Longitudinal Survey of Youth (NLSY) is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

A balanced panel is one where every unit is surveyed in every time period. The NLSY is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died.
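The definition of a balanced panel can be checked mechanically. The following is a minimal sketch, assuming the panel is stored as (unit, period) pairs; the function name `is_balanced` and the toy identifiers are hypothetical and are not taken from the NLSY.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit is observed in every time period.

    `observations` is an iterable of (unit_id, period) pairs. This layout is
    an assumption made for the sketch. A period in which no unit at all was
    observed cannot be detected from the data alone.
    """
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit, period in observations:
        periods_by_unit[unit].add(period)
        all_periods.add(period)
    return all(periods == all_periods for periods in periods_by_unit.values())


# Toy panel: three units, three annual rounds.
balanced = [(unit, year) for unit in ("a", "b", "c") for year in (1979, 1980, 1981)]
# Unit "b" was not interviewed in 1980, so the panel becomes unbalanced.
unbalanced = [obs for obs in balanced if obs != ("b", 1980)]

print(is_balanced(balanced), is_balanced(unbalanced))
```

Storing the set of periods per unit keeps the check independent of the order in which the observations arrive.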


Panel data sets have several advantages over cross-section data sets: They may make it possible to overcome a problem of bias caused by unobserved heterogeneity. They make it possible to investigate dynamics without relying on retrospective questions that may yield data subject to measurement error. They are often very large. If there are n units and T time periods, the potential number of observations is nT. Because they tend to be expensive to undertake, they are often well designed and have high response rates. The NLSY is an example.

We will start with an example of the use of panel data to investigate simple dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in full-time employment.

NLSY 1988 data. Dependent variable: LGEARN

              OLS
MARRIED       0.129
              (0.024)
SOONMARR      -
SINGLE        -
R^2           0.271
n             1538

Here is the result of regressing the logarithm of hourly earnings on a dummy variable for being married and a set of control variables (years of schooling, ASVABC score, years of tenure and its square, years of work experience and its square, and so on; coefficients not shown).

Married males earn approximately 12.9 percent more than single males, holding the control variables constant, and the effect is highly significant (standard error in parentheses).
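Reading a dummy coefficient in a logarithmic regression as a percentage is an approximation. As a small illustration, the sketch below compares the approximate effect with the exact proportional effect exp(b) - 1 for the reported coefficient 0.129; the function name is hypothetical.

```python
import math


def exact_proportional_effect(coefficient):
    """Exact proportional change in Y implied by a dummy-variable coefficient
    when the dependent variable is log(Y)."""
    return math.exp(coefficient) - 1.0


married_coefficient = 0.129  # OLS estimate reported in the table
approximate = married_coefficient
exact = exact_proportional_effect(married_coefficient)

print(f"approximate: {approximate:.1%}, exact: {exact:.1%}")
# prints "approximate: 12.9%, exact: 13.8%"
```

The gap between the two figures grows with the size of the coefficient, so the approximation is reasonable here but would be less so for larger estimates.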

The effect has often been found in the literature. One explanation is that marriage entails financial responsibilities, in particular the raising of children, that may encourage men to work harder or seek better paying jobs.

Another is that certain unobserved qualities that are valued by employers are also valued by potential spouses and hence are conducive to getting married. According to this explanation, the dummy variable for being married is acting as a proxy for these qualities.

Other explanations have been proposed, but we will restrict attention to these two. With cross-sectional data it is difficult to discriminate between them.

However, with panel data one can find out whether there is an uplift at the time of marriage or soon after, as would be predicted by the increased productivity hypothesis, or whether men who end up married tend to earn more even when unmarried, as would be predicted by the unobserved heterogeneity hypothesis.

We define a second dummy variable SOONMARR equal to 1 if the respondent was single in 1988 but married within the next four years. The omitted category consists of those who were single in 1988 and still single four years later.

NLSY 1988 data. Dependent variable: LGEARN

              OLS        Fixed effects
MARRIED       0.129      0.163
              (0.024)    (0.028)
SOONMARR      -          0.096
                         (0.037)
SINGLE        -          -
R^2           0.271      0.274
n             1538       1538
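The construction of the two dummies can be sketched as follows. This is an illustration only: it assumes that marital status is recorded as a boolean in 1988 and again four years later, and the function and argument names are hypothetical rather than NLSY variable names.

```python
def marital_dummies(married_1988, married_four_years_later):
    """Return (MARRIED, SOONMARR) for one respondent.

    MARRIED  = 1 if the respondent was married in 1988.
    SOONMARR = 1 if the respondent was single in 1988 but married within
               the next four years (approximated here by the status
               recorded four years later, which is an assumption).
    Respondents with both dummies equal to 0 form the omitted category.
    """
    married = int(married_1988)
    soonmarr = int((not married_1988) and married_four_years_later)
    return married, soonmarr


print(marital_dummies(True, True))    # married by 1988
print(marital_dummies(False, True))   # single in 1988, married soon after
print(marital_dummies(False, False))  # still single: omitted category
```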

Under the null hypothesis that the marital effect is dynamic and marriage encourages men to earn more, the coefficient of SOONMARR should be 0 because the men in this category were still single as of 1988.

The t statistic is 2.60, so the coefficient is significantly different from 0 at the 1 percent level, leading us to reject the null hypothesis at that level.

However, if the alternative hypothesis of unobserved heterogeneity is true, the coefficient of SOONMARR should be equal to that of MARRIED, but it is lower (0.096 against 0.163).

To test whether it is significantly lower, the easiest method is to change the reference category to those who were married by 1988 and to introduce a new dummy variable SINGLE that is equal to 1 if the respondent was single in 1988 and still single four years later.

NLSY 1988 data. Dependent variable: LGEARN

              OLS        Fixed effects   Fixed effects
MARRIED       0.129      0.163           -
              (0.024)    (0.028)
SOONMARR      -          0.096           -0.066
                         (0.037)         (0.034)
SINGLE        -          -               -0.163
                                         (0.028)
R^2           0.271      0.274           0.274
n             1538       1538            1538

The coefficient of SOONMARR now estimates the difference in log earnings between those who married within the next four years and those who were married by 1988, controlling for the other variables. If the second hypothesis is true, it should be equal to 0.
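The effect of changing the reference category can be written out explicitly. The notation below is introduced only for this sketch and does not appear in the slides: c is the intercept for the still-single group, and the beta terms are the dummy coefficients under each choice of reference category.

```latex
% Notation assumed for this sketch (not from the slides):
%   c, \beta_M, \beta_S    : intercept, MARRIED and SOONMARR coefficients
%                            when "still single" is the reference category
%   c', \beta_S', \beta_N' : intercept, SOONMARR and SINGLE coefficients
%                            when "married by 1988" is the reference category
\begin{align*}
\text{Reference = still single:}\quad
  & \text{single: } c, \quad \text{soon married: } c + \beta_S, \quad \text{married: } c + \beta_M \\
\text{Reference = married by 1988:}\quad
  & \text{married: } c', \quad \text{soon married: } c' + \beta_S', \quad \text{single: } c' + \beta_N' \\
\text{Matching the group intercepts:}\quad
  & c' = c + \beta_M, \quad \beta_S' = \beta_S - \beta_M, \quad \beta_N' = -\beta_M \\
\text{With the reported estimates:}\quad
  & \hat{\beta}_S' \approx 0.096 - 0.163 = -0.067, \quad \hat{\beta}_N' = -0.163
\end{align*}
```

The small gap between -0.067 and the reported -0.066 is consistent with rounding of the published coefficients. The advantage of re-estimating with the new reference category is that the regression output supplies the standard error of the difference directly.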

The t statistic is -1.93, so we (just) do not reject the second hypothesis at the 5 percent level. Since the first hypothesis was rejected at the 1 percent level, the evidence is more compatible with the second hypothesis, but it is possible that neither hypothesis is correct on its own and the truth might reside in some compromise.
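The two t tests on the SOONMARR coefficient can be reproduced approximately from the rounded figures in the table. The sketch below is not the original computation: it uses the standard normal distribution as a large-sample stand-in for the t distribution (an assumption that seems reasonable with 1,538 observations), and the helper name `t_test` is hypothetical.

```python
from statistics import NormalDist


def t_test(estimate, std_error, hypothesized=0.0):
    """Return (t statistic, two-sided p-value).

    The p-value uses the standard normal distribution as a large-sample
    approximation to the t distribution.
    """
    t = (estimate - hypothesized) / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# SOONMARR with "still single" as the reference category:
# H0 is that the coefficient is 0 (increased productivity hypothesis).
t_dynamic, p_dynamic = t_test(0.096, 0.037)

# SOONMARR with "married by 1988" as the reference category:
# H0 is that the coefficient is 0 (unobserved heterogeneity hypothesis).
t_hetero, p_hetero = t_test(-0.066, 0.034)

print(f"reference = single:  t = {t_dynamic:.2f}, p = {p_dynamic:.3f}")
print(f"reference = married: t = {t_hetero:.2f}, p = {p_hetero:.3f}")
```

With the rounded inputs the statistics come out at about 2.59 and -1.94 rather than the reported 2.60 and -1.93, which were presumably computed from unrounded estimates. The conclusions are unchanged: the first p-value is below 0.01 and the second is just above 0.05.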

The starting point for a discussion of regression models using panel data is an equation of the form

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}$$

where the $X_j$ variables are observed and the $Z_p$ variables are unobserved.

The index i refers to the unit of observation, t refers to the time period, and j and p are used to differentiate between different observed and unobserved explanatory variables. The term $\varepsilon_{it}$ is a disturbance term assumed to satisfy the regression model assumptions.

A trend term t has been introduced to allow for a shift of the intercept over time. If the implicit assumption of a constant rate of change seems too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.
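Replacing the trend by period dummies can be sketched as follows. The function name, the `D<period>` naming of the dummies, and the example years are all hypothetical choices made for illustration.

```python
def period_dummies(period, periods, reference):
    """Return 0/1 dummies for one observation, one per time period
    except the reference period."""
    return {f"D{p}": int(p == period) for p in periods if p != reference}


periods = [1988, 1989, 1990]

# An observation from 1989, with 1988 as the reference period.
print(period_dummies(1989, periods, reference=1988))
# prints {'D1989': 1, 'D1990': 0}

# An observation from the reference period has all dummies equal to 0.
print(period_dummies(1988, periods, reference=1988))
# prints {'D1989': 0, 'D1990': 0}
```

Omitting the reference period avoids perfect collinearity with the intercept; each dummy coefficient then measures the shift of the intercept relative to that period.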

The $X_j$ variables are usually the variables of interest, while the $Z_p$ variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

Note that the unobserved heterogeneity is assumed to be unchanging and accordingly the $Z_p$ variables do not have a time subscript.

Because the $Z_p$ variables are unobserved, there is no means of obtaining information about the $\sum_{p=1}^{s} \gamma_p Z_{pi}$ component of the model and it is convenient to define a term $\alpha_i$, known as the unobserved effect, representing the joint impact of the $Z_p$ variables on $Y_i$:

$$\alpha_i = \sum_{p=1}^{s} \gamma_p Z_{pi}$$

Hence we can rewrite the regression model as

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}$$

The characterization of the $\alpha_i$ component will be seen to be crucially important in what follows.

First, however, note that if the $X_j$ controls are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

In that case the $\alpha_i$ term may be dropped and pooled OLS may be used to fit the model, treating all the observations for all of the time periods as a single sample.
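The role of the unobserved effect in pooled OLS can be illustrated with a small simulation. Everything in the sketch is an assumption made for illustration: a single X variable, no trend term, normally distributed components, a true slope of 2, and an unobserved effect that (when present) is correlated with X. It is not taken from the slides or from the NLSY.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_pooled_slope(with_unobserved_effect, n_units=500, n_periods=5,
                          beta=2.0, seed=0):
    """Simulate a panel, stack all n*T observations, and fit pooled OLS."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_units):
        # alpha_i is fixed over time for each unit; it is 0 when the
        # controls are assumed to capture everything relevant.
        alpha_i = rng.gauss(0.0, 1.0) if with_unobserved_effect else 0.0
        for _ in range(n_periods):
            # X is correlated with alpha_i whenever alpha_i is nonzero.
            x_it = alpha_i + rng.gauss(0.0, 1.0)
            y_it = 1.0 + beta * x_it + alpha_i + rng.gauss(0.0, 1.0)
            x.append(x_it)
            y.append(y_it)
    return ols_slope(x, y)


slope_without = simulate_pooled_slope(with_unobserved_effect=False)
slope_with = simulate_pooled_slope(with_unobserved_effect=True)
print(f"no unobserved effect:   slope = {slope_without:.2f} (true value 2.0)")
print(f"with unobserved effect: slope = {slope_with:.2f} (true value 2.0)")
```

Under these assumptions the pooled estimate is close to the true slope when there is no unobserved effect, and is pushed upward when an unobserved effect correlated with X is left in the disturbance term.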

Copyright Christopher Dougherty 2013. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 14.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

2013.09.01
