Christopher Dougherty EC220 - Introduction to econometrics (chapter 14) Slideshow: regression analysis with panel data


Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 14). [Teaching Resource] © 2012 The Author

This version available at: Available in LSE Learning Resources Online: May 2012

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

REGRESSION ANALYSIS WITH PANEL DATA: INTRODUCTION

A panel data set, or longitudinal data set, is one where there are repeated observations on the same units. The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time.

The National Longitudinal Survey of Youth (NLSY) is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

A balanced panel is one where every unit is surveyed in every time period. The NLSY is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died.

Panel data sets have several advantages over cross-section data sets:

They may make it possible to overcome a problem of bias caused by unobserved heterogeneity.

They make it possible to investigate dynamics without relying on retrospective questions that may yield data subject to measurement error.

They are often very large. If there are n units and T time periods, the potential number of observations is nT.

Because they tend to be expensive to undertake, they are often well designed and have high response rates. The NLSY is an example.
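The ideas of a balanced panel and of nT potential observations can be illustrated with a minimal sketch. The data and the function name `panel_summary` are hypothetical, chosen only for illustration; the sketch simply counts units, periods, and observed unit-period pairs.

```python
# Hypothetical panel: (unit_id, period) pairs for which an observation exists.
observations = [
    ("A", 1988), ("A", 1989), ("A", 1990),
    ("B", 1988), ("B", 1990),            # unit B was not interviewed in 1989
    ("C", 1988), ("C", 1989), ("C", 1990),
]

def panel_summary(obs):
    """Return n, T, the potential count nT, the actual count, and a balanced flag."""
    units = {unit for unit, _ in obs}
    periods = {period for _, period in obs}
    n, T = len(units), len(periods)
    potential = n * T            # every unit observed in every period
    actual = len(set(obs))       # distinct unit-period pairs actually observed
    return {"n": n, "T": T, "potential": potential,
            "actual": actual, "balanced": actual == potential}

summary = panel_summary(observations)
print(summary)
```

Here the panel is unbalanced because one unit is missing in one period, so the actual number of observations falls short of nT.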

NLSY 1988 data
Dependent variable LGEARN

MARRIED      0.129      0.163      –
            (0.024)    (0.028)
SOONMARR     –          0.096     –0.066
                       (0.037)    (0.034)
SINGLE       –          –         –0.163
                                  (0.028)
n            1,538      1,538      1,538

We will start with an example of the use of panel data to investigate simple dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in full-time employment.

Here is the result of regressing the logarithm of hourly earnings on a dummy variable for being married and a set of control variables (years of schooling, ASVABC score, years of tenure and its square, years of work experience and its square, etc.; coefficients not shown).

Married males earn 12.9 percent more than single males and the effect is highly significant (standard error in parentheses).

The effect has often been found in the literature. One explanation is that marriage entails financial responsibilities (in particular, the raising of children) that may encourage men to work harder or seek better-paying jobs.

Another is that certain unobserved qualities that are valued by employers are also valued by potential spouses and hence are conducive to getting married. According to this explanation the dummy variable for being married is acting as a proxy for these qualities.

Other explanations have been proposed, but we will restrict attention to these two. With cross-sectional data it is difficult to discriminate between them.

However, with panel data one can find out whether there is an uplift at the time of marriage or soon after, as would be predicted by the increased productivity hypothesis, or whether men who end up married tend to earn more even when unmarried, as would be predicted by the unobserved heterogeneity hypothesis.

We define a second dummy variable SOONMARR equal to 1 if the respondent was single in 1988 but married within the next four years. The omitted category consists of those who were single in 1988 and still single four years later.
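The construction of the two dummy variables can be sketched as follows. The function name and the two boolean inputs are hypothetical, and the sketch assumes that "married within the next four years" can be read off marital status four years after 1988; the actual NLSY coding may differ.

```python
def marital_dummies(married_1988, married_four_years_later):
    """Return (MARRIED, SOONMARR) for one respondent.

    Both arguments are hypothetical boolean fields. The omitted (reference)
    category is single in 1988 and still single four years later, for which
    both dummies are 0.
    """
    married = 1 if married_1988 else 0
    soonmarr = 1 if (not married_1988 and married_four_years_later) else 0
    return married, soonmarr

# One respondent from each of the three categories.
print(marital_dummies(True, True))    # married by 1988
print(marital_dummies(False, True))   # single in 1988, married soon after
print(marital_dummies(False, False))  # reference category
```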

Under the null hypothesis that the marital effect is dynamic and marriage encourages men to earn more, the coefficient of SOONMARR should be 0 because the men in this category were still single as of 1988.

The t statistic is 3.10 and so the coefficient is significantly different from 0 at the 0.1 percent level, leading us to reject the null hypothesis at that level.

However, if the alternative hypothesis is true, the coefficient of SOONMARR should be equal to that of MARRIED, but it is lower.

To test whether it is significantly lower, the easiest method is to change the reference category to those who were married by 1988 and to introduce a new dummy variable SINGLE that is equal to 1 if the respondent was still single four years later.

The coefficient of SOONMARR now estimates the difference between the coefficients of those married by 1988 and those married within the next four years, and if the second hypothesis is true, it should be equal to 0.
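The change of reference category can be written out as a short sketch. The symbols are notation introduced here for illustration, not taken from the slides: $\delta_M$ and $\delta_S$ are the coefficients of MARRIED and SOONMARR when the still-single group is the reference category, starred symbols are the coefficients when the married group is the reference category, $u$ is the disturbance term, and the control variables are omitted for brevity.

```latex
\begin{aligned}
\text{Reference: still single} \quad & LGEARN = \beta_1 + \delta_M\, MARRIED + \delta_S\, SOONMARR + u \\
\text{Reference: married} \quad & LGEARN = \beta_1^{*} + \delta_S^{*}\, SOONMARR + \delta_N^{*}\, SINGLE + u \\
& \beta_1^{*} = \beta_1 + \delta_M, \qquad \delta_S^{*} = \delta_S - \delta_M, \qquad \delta_N^{*} = -\delta_M
\end{aligned}
```

With the reported estimates, the SINGLE coefficient of –0.163 implies $\delta_M = 0.163$, so $\delta_S^{*}$ should be about 0.096 – 0.163 = –0.067, which agrees with the reported –0.066 up to rounding.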

The t statistic is –1.93, so we (just) do not reject the second hypothesis at the 5 percent level. The evidence is more compatible with the first hypothesis, but it is possible that neither hypothesis is correct on its own and the truth might reside in some compromise.
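The test can be reproduced approximately from the rounded figures in the table. This is a minimal sketch that assumes a two-sided test with the large-sample critical value of 1.96, which seems reasonable with 1,538 observations. The rounded inputs give a t statistic of about –1.94 rather than the reported –1.93, presumably because the reported value was computed from unrounded estimates.

```python
def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the true coefficient is 0."""
    return coefficient / standard_error

# Large-sample two-sided critical value at the 5 percent level (an assumption here).
CRITICAL_5_PERCENT = 1.96

# SOONMARR coefficient and standard error with the married group as reference.
t_soonmarr = t_statistic(-0.066, 0.034)
reject = abs(t_soonmarr) > CRITICAL_5_PERCENT

print(round(t_soonmarr, 2), reject)
```

The absolute value falls just short of 1.96, which is why the hypothesis is only just not rejected.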

The starting point for a discussion of regression models using panel data is an equation of the form shown above, where the X_j variables are observed and the Z_p variables are unobserved.

The index i refers to the unit of observation, t refers to the time period, and j and p are used to differentiate between different observed and unobserved explanatory variables. ε_it is a disturbance term assumed to satisfy the regression model assumptions.
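The equation itself did not survive in this transcript. Under the definitions given in the text, it would presumably take a form like the sketch below. The coefficient symbols $\beta_j$, $\gamma_p$, and $\delta$ and the summation limits $k$ and $s$ are assumptions introduced here; the indices, the trend $t$, and the disturbance term follow the surrounding description.

```latex
Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}
```

Note that $X_{jit}$ carries both a unit and a time subscript, whereas $Z_{pi}$ carries only a unit subscript, reflecting the assumption that the unobserved variables do not change over time.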

A trend term t has been introduced to allow for a shift of the intercept over time. If the implicit assumption of a constant rate of change seems too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.
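Replacing the trend with period dummies can be sketched as follows. The function name, the dummy names, and the years are hypothetical; the point is only that one dummy is created for each period other than the reference period.

```python
def time_dummies(period, periods, reference):
    """Return 0/1 dummies for `period`, one per period except `reference`."""
    return {f"D{p}": int(period == p) for p in periods if p != reference}

periods = [1988, 1989, 1990]

# An observation from 1989: only the 1989 dummy is switched on.
row_1989 = time_dummies(1989, periods, reference=1988)

# An observation from the reference period: all dummies are 0.
row_1988 = time_dummies(1988, periods, reference=1988)

print(row_1989)
print(row_1988)
```

Omitting the reference period avoids perfect multicollinearity with the intercept, so each dummy coefficient measures the shift relative to that period.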

The X_j variables are usually the variables of interest, while the Z_p variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

Note that the unobserved heterogeneity is assumed to be unchanging and accordingly the Z_p variables do not have a time subscript.

Because the Z_p variables are unobserved, there is no means of obtaining information about the Σ γ_p Z_p component of the model and it is convenient to define a term α_i, known as the unobserved effect, representing the joint impact of the Z_p variables on Y_i.

Hence we can rewrite the regression model as shown. The characterization of the α_i component will be seen to be crucially important in what follows.
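The rewritten model is not reproduced in this transcript. A sketch consistent with the text would define the unobserved effect as the combined contribution of the unobserved variables and substitute it into the model. The symbols $\beta_j$, $\gamma_p$, $\delta$, $k$, and $s$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
\alpha_i &= \sum_{p=1}^{s} \gamma_p Z_{pi} \\
Y_{it} &= \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \alpha_i + \delta t + \varepsilon_{it}
\end{aligned}
```

Because $\alpha_i$ has no time subscript, it acts as a unit-specific shift of the intercept that is the same in every period.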

First, however, note that if the X_j controls are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

In that case the α_i term may be dropped and pooled OLS may be used to fit the model, treating all the observations for all of the time periods as a single sample.
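Pooled OLS can be sketched in its simplest form, with one explanatory variable and an intercept. The data and the function name `pooled_ols` are hypothetical, and the trend term and further regressors are left out to keep the example short. The key step is stacking every unit-period observation into one sample before fitting.

```python
# Hypothetical panel: for each unit, a list of (X, Y) pairs, one per time period.
panel = {
    "unit_1": [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)],
    "unit_2": [(2.0, 5.0), (3.0, 6.8), (4.0, 9.1)],
}

def pooled_ols(panel):
    """Fit Y = b1 + b2*X by OLS, ignoring which unit each observation came from."""
    # Stack all observations for all units and periods into a single sample.
    pooled = [obs for series in panel.values() for obs in series]
    n_obs = len(pooled)
    mean_x = sum(x for x, _ in pooled) / n_obs
    mean_y = sum(y for _, y in pooled) / n_obs
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pooled)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pooled)
    b2 = s_xy / s_xx              # slope
    b1 = mean_y - b2 * mean_x     # intercept
    return b1, b2

b1, b2 = pooled_ols(panel)
print(b1, b2)
```

This treats the unit identity as irrelevant, which is appropriate only when there is no relevant unobserved effect.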

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 14.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.

