REGRESSION ANALYSIS WITH PANEL DATA: INTRODUCTION

A panel data set, or longitudinal data set, is one where there are repeated observations on the same units. The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time.

The National Longitudinal Survey of Youth is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

A balanced panel is one where every unit is surveyed in every time period. The NLSY is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died.
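The definition of a balanced panel can be illustrated with a minimal sketch. The function name `is_balanced` and the toy respondent data are hypothetical, and the check only sees units that appear at least once in the data.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit appears in every time period.

    `observations` is an iterable of (unit_id, period) pairs, one per
    completed interview.
    """
    periods_by_unit = defaultdict(set)
    all_periods = set()
    for unit_id, period in observations:
        periods_by_unit[unit_id].add(period)
        all_periods.add(period)
    # Balanced means each unit was observed in the full set of periods.
    return all(periods == all_periods for periods in periods_by_unit.values())


# Hypothetical toy data: in the second panel, respondent 2 missed 1981.
balanced = [(1, 1979), (1, 1980), (1, 1981), (2, 1979), (2, 1980), (2, 1981)]
unbalanced = [(1, 1979), (1, 1980), (1, 1981), (2, 1979), (2, 1980)]

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

The first call reports a balanced panel and the second an unbalanced one, which is the situation described for the NLSY.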

Panel data sets have several advantages over cross-section data sets:

They may make it possible to overcome a problem of bias caused by unobserved heterogeneity.

They make it possible to investigate dynamics without relying on retrospective questions that may yield data subject to measurement error.

They are often very large. If there are n units and T time periods, the potential number of observations is nT.

Because they tend to be expensive to undertake, they are often well designed and have high response rates. The NLSY is an example.

We will start with an example of the use of panel data to investigate simple dynamics. We will use data from the 1988 round of the NLSY for 1,538 males in full-time employment.

Here is the result of regressing the logarithm of hourly earnings on a dummy variable for being married and a set of control variables (years of schooling, ASVABC score, years of tenure and its square, years of work experience and its square, etc.; coefficients not shown).

NLSY 1988 data
Dependent variable: LGEARN

              OLS
MARRIED       0.129
             (0.024)
SOONMARR      -
SINGLE        -
R²            …
n             1538

Married males earn 12.9 percent more than single males, and the effect is highly significant (standard error in parentheses).
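The arithmetic behind this statement can be checked with a short sketch. It assumes that LGEARN is a natural logarithm, in which case the coefficient is an approximation to the proportional difference and exp(b) - 1 gives the exact figure. The variable names are illustrative.

```python
import math

coefficient = 0.129      # MARRIED coefficient reported in the table
standard_error = 0.024   # its standard error

# t statistic for the null hypothesis that the coefficient is zero.
t_statistic = coefficient / standard_error

# Usual reading of a dummy coefficient in a semilogarithmic model.
approx_effect = coefficient
# Exact proportional difference implied by the same coefficient.
exact_effect = math.exp(coefficient) - 1

print(f"t statistic: {t_statistic:.2f}")
print(f"approximate effect: {approx_effect:.1%}")
print(f"exact effect: {exact_effect:.1%}")
```

Under that assumption the exact difference is a little under 14 percent, slightly above the 12.9 percent approximation, and the t statistic is above 5.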

The effect has often been found in the literature. One explanation is that marriage entails financial responsibilities (in particular, the raising of children) that may encourage men to work harder or seek better paying jobs.

Another is that certain unobserved qualities that are valued by employers are also valued by potential spouses and hence are conducive to getting married. According to this explanation, the dummy variable for being married is acting as a proxy for these qualities.
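The second explanation can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reconstruction of the NLSY data: the variable `quality`, the sample size, and all numerical parameters are hypothetical. Earnings depend only on the unobserved quality, so the true effect of marriage is zero by construction.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 5000
married_earnings, single_earnings = [], []
for _ in range(n):
    quality = random.gauss(0.0, 1.0)  # unobserved by the researcher
    # Higher quality makes marriage more likely.
    married = quality + random.gauss(0.0, 1.0) > 0
    # Log earnings depend on quality only: the true marriage effect is zero.
    lgearn = 2.0 + 0.2 * quality + random.gauss(0.0, 0.3)
    (married_earnings if married else single_earnings).append(lgearn)


def mean(values):
    return sum(values) / len(values)


# With a single dummy regressor, the OLS slope equals the difference
# between the two group means.
married_coefficient = mean(married_earnings) - mean(single_earnings)
print(f"estimated MARRIED coefficient: {married_coefficient:.3f}")
```

The estimated coefficient is clearly positive even though marriage has no effect in the simulated data, which is the sense in which the dummy variable acts as a proxy.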

Other explanations have been proposed, but we will restrict attention to these two. With cross-sectional data it is difficult to discriminate between them.

However, with panel data one can find out whether there is an uplift at the time of marriage or soon after, as would be predicted by the increased productivity hypothesis, or whether men who end up married tend to earn more even when unmarried, as would be predicted by the unobserved heterogeneity hypothesis.

We define a second dummy variable SOONMARR equal to 1 if the respondent was single in 1988 but married within the next four years. The omitted category consists of those who were single in 1988 and still single four years later.

NLSY 1988 data
Dependent variable: LGEARN

              OLS        Fixed effects
MARRIED       0.129      …
             (0.024)    (0.028)
SOONMARR      -          0.096
                        (0.037)
SINGLE        -          -
R²            …          …
n             1538       …
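The coding of the two dummy variables can be sketched as follows. The function name and its two boolean arguments are hypothetical; they stand for the respondent's marital status in 1988 and four years later.

```python
def marital_dummies(married_in_1988, married_four_years_later):
    """Return (MARRIED, SOONMARR) for one respondent.

    The omitted (reference) category, coded (0, 0), is single in 1988
    and still single four years later.
    """
    if married_in_1988:
        return 1, 0
    if married_four_years_later:
        return 0, 1
    return 0, 0


print(marital_dummies(True, True))    # married by 1988
print(marital_dummies(False, True))   # single in 1988, married soon after
print(marital_dummies(False, False))  # reference category
```

Each respondent falls into exactly one of the three categories, so at most one of the two dummies is equal to 1.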

Under the null hypothesis that the marital effect is dynamic and marriage encourages men to earn more, the coefficient of SOONMARR should be 0 because the men in this category were still single as of 1988.

The t statistic is 2.60, so the coefficient is significantly different from 0 at the 1 percent level, leading us to reject the null hypothesis at that level.
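As a rough check on the significance statement, the two-sided p-value can be computed with a standard normal approximation. This is an assumption that seems reasonable with more than 1,500 observations, where the t distribution is very close to the normal. The function name is illustrative.

```python
import math


def two_sided_p_value(t_statistic):
    """Two-sided p-value using the standard normal approximation."""
    return math.erfc(abs(t_statistic) / math.sqrt(2))


p_value = two_sided_p_value(2.60)
print(f"p-value: {p_value:.4f}")
```

The p-value is below 0.01, in line with rejection at the 1 percent level. Dividing the rounded coefficient by the rounded standard error, 0.096 / 0.037, gives about 2.59, so the reported 2.60 was presumably computed from unrounded figures.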

However, if the alternative hypothesis is true, the coefficient of SOONMARR should be equal to that of MARRIED, but it is lower.

To test whether it is significantly lower, the easiest method is to change the reference category to those who were married by 1988 and to introduce a new dummy variable SINGLE that is equal to 1 if the respondent was single in 1988 and still single four years later.

NLSY 1988 data
Dependent variable: LGEARN

              OLS        Fixed effects   Fixed effects
MARRIED       0.129      …               -
             (0.024)    (0.028)
SOONMARR      -          0.096           -0.066
                        (0.037)         (0.034)
SINGLE        -          -               -0.163
                                        (0.028)
R²            …          …               …
n             1538       …               …

The coefficient of SOONMARR now estimates the difference between the coefficients of those married by 1988 and those married within the next four years, and if the second hypothesis is true, it should be equal to 0.
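The effect of changing the reference category can be written out explicitly. The symbols below are assumptions introduced for illustration: \delta_M and \delta_S denote the coefficients of MARRIED and SOONMARR when the reference category is single and still single, and the dots stand for the control variables.

```latex
\begin{align*}
LGEARN &= \beta_1 + \delta_M\, MARRIED + \delta_S\, SOONMARR + \dots + u \\
\intertext{Since $MARRIED = 1 - SOONMARR - SINGLE$,}
LGEARN &= (\beta_1 + \delta_M) + (\delta_S - \delta_M)\, SOONMARR
          - \delta_M\, SINGLE + \dots + u
\end{align*}
```

On this reading the reported SINGLE coefficient of -0.163 implies \delta_M = 0.163, and \delta_S - \delta_M = 0.096 - 0.163 = -0.067, which agrees with the reported -0.066 up to rounding.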

The t statistic is -1.93, so we (just) do not reject the second hypothesis at the 5 percent level. The evidence is more compatible with the first hypothesis, but it is possible that neither hypothesis is correct on its own and the truth might reside in some compromise.

The starting point for a discussion of regression models using panel data is an equation of the form shown on the slide, where the X_j variables are observed and the Z_p variables are unobserved.
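The equation itself is not reproduced in this transcript. A form consistent with the description in the surrounding text would be the following, where the coefficient symbols \beta_j, \gamma_p, and \delta and the summation limits k and s are assumptions following common textbook notation:

```latex
Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit}
       + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}
```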

The index i refers to the unit of observation, t refers to the time period, and j and p are used to differentiate between different observed and unobserved explanatory variables. ε_it is a disturbance term assumed to satisfy the regression model assumptions.

A trend term t has been introduced to allow for a shift of the intercept over time. If the implicit assumption of a constant rate of change seems too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.
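Replacing the trend with time dummies can be sketched as below. The function name, the `D<period>` naming of the dummies, and the example years are hypothetical.

```python
def time_dummies(period, periods, reference):
    """Return 0/1 dummies for `period`, one per period except `reference`."""
    return {f"D{p}": int(p == period) for p in periods if p != reference}


periods = [1988, 1989, 1990]

print(time_dummies(1989, periods, reference=1988))
print(time_dummies(1988, periods, reference=1988))
```

An observation from the reference period has all dummies equal to 0, so the intercept refers to that period and each dummy coefficient measures the shift relative to it.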

The X_j variables are usually the variables of interest, while the Z_p variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

Note that the unobserved heterogeneity is assumed to be unchanging and accordingly the Z_p variables do not have a time subscript.

Because the Z_p variables are unobserved, there is no means of obtaining information about the γ_p Z_p component of the model and it is convenient to define a term α_i, known as the unobserved effect, representing the joint impact of the Z_p variables on Y_i.

Hence we can rewrite the regression model as shown. The characterization of the α_i component will be seen to be crucially important in what follows.
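The rewritten model is not reproduced in the transcript. Under the same assumed notation (\gamma_p for the coefficients of the unobserved variables, \delta for the trend coefficient, and limits k and s), the definition of the unobserved effect and the resulting equation would read:

```latex
\alpha_i = \sum_{p=1}^{s} \gamma_p Z_{pi},
\qquad
Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit}
       + \alpha_i + \delta t + \varepsilon_{it}
```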

First, however, note that if the X_j controls are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

In that case the α_i term may be dropped and pooled OLS may be used to fit the model, treating all the observations for all of the time periods as a single sample.
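Pooled OLS can be sketched with simulated data. The example below is a simplification under stated assumptions: there is a single explanatory variable, the trend is omitted for brevity, there is no unobserved effect, and the parameter values and sample dimensions are invented.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

n_units, n_periods = 50, 4
beta_1, beta_2 = 1.0, 2.0  # hypothetical true parameters

# Stack every (unit, period) observation into one sample of n*T rows.
xs, ys = [], []
for i in range(n_units):
    for t in range(n_periods):
        x = random.uniform(0.0, 10.0)
        # No unit-specific term: the unobserved effect is absent.
        y = beta_1 + beta_2 * x + random.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)

# Simple-regression OLS formulas applied to the pooled sample.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

print(len(xs), round(slope, 3), round(intercept, 3))
```

The unit and period indices play no role in the fitting: the 200 observations are treated as one cross-section, which is appropriate here only because the simulated data contain no unobserved effect.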

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 14.1 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.