# Panel and Time Series Cross Section Models

## Presentation on theme: "Panel and Time Series Cross Section Models"— Presentation transcript:

Panel and Time Series Cross Section Models
Sociology 229: Advanced Regression Copyright © 2010 by Evan Schofer Do not copy or distribute without permission

Announcements Assignment 5 Due Assignment 6 handed out

Panel Data Panel data typically refers to a particular type of multilevel data Measurement at over time (T1, T2, T3…) is nested within persons (or firms or countries) Each time is referred to as a “wave” Person 1 T2 T1 T4 T3 T5 Person 2 Person 3 Person 4

Panel Data Panel data involves combining:
Information about multiple cases A “cross-sectional” component Information about cases over time A ‘longitudinal’ or ‘time series’ component Panel datasets are described in terms of: N, the number of individual cases T, the number of waves If N is “large” compared to T, the dataset is called “cross-sectionally dominant” If T is “large” compared to N, the dataset is called “time-series dominant” “Time-series Cross-section” / TSCS data: small set of units (usually 10-30), moderate T

Panel Terminology Issue: Panel means 2 things:
1. Panel is an umbrella term for all data with time-series and cross-section components 2. Panel refers specifically to datasets with large N, small T (cross-sectionally dominant) Example: a survey of 1000 people at 3 points in time Another distinction: Balanced vs unbalanced Panel data are said to be “balanced” if information on each person is available for all T If data is missing for some cases at some points in time, the data are “unbalanced” This is common for many datasets on countries or firms.

Panel Data Example ID Wave Grade Math Test Family income Class Size
Gender 1 5 67 80K 18 2 7 78 82K 22 3 9 85 88K 26 34 27K 41 23K 33

Benefits of Panel Data 1. Pooling (either cases or across time) provides richer information More observations = better 2. Panel data is longitudinal You can follow individual cases over time Allows us to study dynamic processes Baltagi: “Dynamics of adjustment” Provides opportunities to better tease out causal relationships 3. Panel models allow us to control for individual heterogeneity.

Benefits of Panel Data 4. Panel data allows investigation of issues that are obscured in cross-sectional data Example: women’s participation in the labor force Suppose a cross-sectional dataset shows that 50% of women are in the labor force What’s going on? Are 50% of women pursuing work and 50% staying at home? Are 100% of women seeking employment, but many experiencing unemployment at any given time? Or something in between? Without longitudinal information, we can’t develop a clear picture.

Problems with Panel Data
1. Violates OLS independence assumption Clustering by cases Clustering by time Other sources? Ex: spatial correlation 2. For small N large T (TSCS): Poolability Appropriate to combine very different cases? 3. Serial correlation Temporally adjacent cases may have correlated error 4. Non-stationarity (for “larger T” data) 5. Other issues: heteroskedasticity, etc…

Panel Data Strategies Traditional strategy: Fixed vs. random effects
Use Hausman test to choose More recent strategies: 1. Distinction between tools for “panel” vs “TSCS” data Different problems, different solutions 2. Many more options New solutions to existing problems 3. More attention to “dynamic” models Models that include lagged values (lags of Y) Issue: overall, not much consensus about what constitutes “best practices”.

Main focus of econometric studies: Large N, small T panels Concern with “the omitted variables problem” Unobserved heterogeneity “Individual” heterogeneity Unobserved effects Example: Individual wages Often modeled as a function of education, experience... But, individuals differ in ways that are difficult to control Strategy: Use a model that “gets rid of” individual variability Such as “fixed effects”…

Basic “static” panel model w/ unobserved effects: Subscripts i, t refer to cases & time periods bX = covariates mi = unobserved unit-specific error nit = idiosyncratic error Look familiar? Basically the same as the random & fixed effects models discussed previously… Other texts use a or u or zeta instead of “mu” Other texts use e for error, instead of “nu” Issue: whether to treat mi as “fixed” versus “random” But, there are other choices, as well…

Unobserved Effects Basic “static” panel model w/ unobserved effects:
Issue: mi is the problem The “unobserved” time-invariant features of an individual that may cause their wages to be especially high or low across time Ex: Too much TV as a child; or lots of parental pressure to make \$ Or genes or IQ or whatever… Common strategies: 1. Wipe it out: first differences 2. Build it into the model: fixed effects Or random effects, if we think mi is not correlated with Xs 3. Control for it in some other way – such as with a lagged DV Which already might reflect the impact of mi.

Unobserved Effects: First Differences
First differences (2 waves of data): Which can be expressed generally as: Result: mi goes away! Differencing allows us to estimate coefficients, purged of unit-specific (time-invariant) unobserved effects

Unobserved Effects: Fixed Effects
Fixed Effects – start with the same basic model: A second approach: the “within” transformation Center everything around unit-specific mean (‘time demeaning’) Which wipes out mu: Equivalent to putting in dummy variables for every case Estimating m as a “fixed” effect In stata: xtreg wage educ, fe

Fixed Effects vs. First Differencing
FE and FD are very similar approaches In fact, for 2 wave datasets, results are identical For 3+ waves, results can differ Which is better for large N, small T? FE is more efficient if no serial correlation FD is better if lots of serial correlation Ex: concerns about nonstationarity Which is better for small N, larger T (TSCS)? More likely issues with nonstationarity & spurious regression problem… FE = problematic, FD = better But, FD can turn a I(1) process into weakly dependent. Fixed!

Fixed Effects vs. First Differencing
General remarks (Wooldridge 2009): Both FE and FD are biased if variables aren’t strictly exogenous X variables should be uncorrelated with present, past, & future error… Including inclusion of lagged dependent variable BUT: bias in FE declines with large T Bias in FD does not… Wooldridge 2009, Ch 14 (p. 587): “Generally, it is difficult to choose between FE and FD when they give substantially different results. It makes sense to report both sets of results and try to determine why they differ.”

Fixed Effects vs. First Differencing
More remarks 1. FE and FD cannot estimate effects of variables that do not change over time And can have problems if variables change rarely… 2. Both FE and FD are not as efficient as models that include “between case” variability A trade off between efficiency and potential bias (if unobserved effects are correlated with Xs) 3. Both FE and FD are very sensitive to measurement error Within-case variability is often small… may be swamped by error/noise.

“Two-way” Error Components
Unobserved effects may occur over time Example: common effect of year (e.g., a recession) on wages A “crossed” multilevel model What Baltagi (2008) calls 2-way error components Versus basic FE/RE, which has one eror component Strategies: Use dummies for time in combination with FE/FD Specify a “crossed” multilevel model in Stata See Rabe-Hesketh and Skrondal Basically, create a 3-level model, nesting groups underneath.

Random Effects Random effects: Additional efficiency at the cost of additional assumptions Key assumption: unit-specific unobserved effect is not correlated with X variables If assumptions aren’t met, results are biased Omitted X variables often induce correlation between other X variables and the unobserved effect If main purpose of panel analysis is to avoid unobserved effects, fixed effects is a safer choice In stata: xtreg wage educ, re – for GLS estimator xtreg wage educ, mle – for ML estimator

Random Effects The random effects model
Then, instead of the “within” transformation, we take out part of within-case variation: GLS Estimator: “Quasi time-demeaned data” Where:

Random Effects Random effects model is a hybrid: An intermediate model between OLS and FE If T is large, it becomes more like FE If the unobserved effect has small variance (isn’t important), RE becomes more like OLS If unobserved effect is big (cases hugely differ), results will be more like FE GLS random effects estimator is for large N Its properties for small N large T aren’t well studied Beck and Katz advise against it And, if you must use random effects, they recommend ML estimator

Random Effects When to use random effects?
1. If you are confident that unit unobserved effect isn’t correlated with Xs Either for theoretical reasons… Or, because you have lots of good controls Ex: lots of regional dummies for countries… 2. Your main focus is time-constant variables FE isn’t an option. Hopefully #1 applies. 3. When your focus is between case variability And/or there is hardly any within-case variability (#1!) 4. When a Hausman test indicates that they yield similar results

Hausman Specification Test
Hausman Specification Test: A tool to help evaluate fit of fixed vs. random effects Logic: Both fixed & random effects models are consistent if models are properly specified However, some model violations cause random effects models to be inconsistent Ex: if X variables are correlated to random error In short: Models should give the same results… If not, random effects may be biased If results are similar, use the most efficient model: random effects If results diverge, odds are that the random effects model is biased. In that case use fixed effects…

Hausman Specification Test
Strategy: Estimate both fixed & random effects models Save the estimates each time Finally invoke Hausman test Ex: xtreg var1 var2 var3, i(groupid) fe estimates store fixed xtreg var1 var2 var3, i(groupid) re estimates store random hausman fixed random

Hausman Specification Test
Example: Environmental attitudes fe vs re . hausman fixed random ---- Coefficients ---- | (b) (B) (b-B) sqrt(diag(V_b-V_B)) | fixed random Difference S.E. age | male | dmar | demp | educ | incomerel | ses | b = consistent under Ho and Ha; obtained from xtreg B = inconsistent under Ha, efficient under Ho; obtained from xtreg Test: Ho: difference in coefficients not systematic chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B) = Prob>chi2 = Direct comparison of coefficients… Non-significant p-value indicates that models yield similar results…

Hausman Specification Test
Issues with Hausman tests (Wooldridge 2009) 1. Fail to reject means either: FE and RE are similar… you’re good! FE estimates are VERY imprecise Large differences from RE are nevertheless insignificant That can happen if the data are awful/noisy. Watch out! 2. Watch for difference between “statistical significance” and “practical significance” With a huge sample, the Hausman test may “fail” even though RE is nearly the same as FE If differences are tiny, you could argue that RE is appropriate.

Allison’s Hybrid Approach
Allison (2009) suggests a ‘hybrid’ approach that provides benefits of FE and RE Also discussed in Gelman & Hill Builds on idea of decomposing X vars into mean, deviation 1. Compute case-specific mean variables egen meanvar1 = mean(var1), by(groupid) 2. Transform X variables into deviations Subtract case-specific mean egen withinvar1 = var1 – meanvar1 3. Do not transform the dependent variable Y 4. Include both X deviation & X mean variables 5. Estimate with a RE model

Allison’s Hybrid Approach
Benefits of hybrid approach: 1. Effects of “X-deviation” variables are equivalent to results from a fixed effects model All time-constant factors are controlled 2. Allows inclusion of time-constant X variables 3. You can build a general multilevel model Random slope coefficients; more than 2 level models… 4. You can directly test FE vs RE No Hausman test needed X-mean and X-deviation coefficients should be equal Conduct a Wald test for equality of coefficients Also differing X-mean & X-deviation coefs are informative.

SEM Approaches Both FE and RE can be estimated using SEM Benefits:
See Allison 2009, Bollen & Brand forthcoming (SF) Software: LISREL, EQS, AMOS, Mplus Some limited models in Stata using GLLAMM Benefits: FE and RE are nested… can directly compare with fit statistics (BIC, IFI, RMSEA) – HUGE advantage Allows tremendous flexibility Lagged DVs Can relax assumptions that covariate effects or variances are constant across waves of data Flexibility in correlation between unobserved effect and Xs And more!

SEM Approaches RE in SEM (from Allison 2009)
Unobserved effect correlated with Y across waves (Assumes no correlation with X) RE in SEM (from Allison 2009)

SEM Approaches FE in SEM (from Allison 2009)
Unobserved effect correlated with Y AND X at all waves

Serial Correlation Issue: What about correlated error in nearby waves of data? So far we’ve focused on correlated error due to unobserved effect (within each case) Strategies: Use a model that accounts for serial correlation STATA: xtregar – FE and RE with “AR(1) disturbance” Many other options… xtgee, etc. Develop a “dynamic model” Actually model the patterns of correlation across Y Include the lagged dependent variable (Y)… or many lags! This causes bias in FE, RE – so other models needed.

Dynamic Panel Models What if we have a dynamic process?
Examples from Baltagi 2008: Cigarette consumption (in US states) – lots of inertia Democracy (in countries) We might consider a model like: Y from the prior period is included as an independent variable Issue: FE, RE estimators are biased Time-demeaned (or quasi-demeaned) lag y correlated with error FE is biased for small T. Gets better as T gets bigger (like 30) RE also biased.

Dynamic Panel Models One solution: Use FD and instrumental variables
Strategy: If there’s a problem between error and lag Y, let’s find a way to calculate a NEW version of lag y that doesn’t pose a problem Idea: Further lags of Y aren’t an issue in a FD model. Use them as “instrumental variables” as a proxy for lag Y Arellano Bond GMM estimator A FD estimator Lag of levels as an instrument for differenced Y Arellano-Bover/Blundell-Bond “System GMM” estimator Expand on this by using lags of differences and levels as instruments Generalized Method of Moments (GMM) estimation.

Dynamic Panel Models Stata: xtabond – basic Arellano-Bond GMM model
xtdpdsys – System GMM estimator xtdpd – A flexible command to build system GMM models Lots of control over lag structure Can run non-dynamic models Key assumptions / issues Serial correlation of differenced errors limited to 1 lag No overidentifying restrictions (“Sargan test”) How many instruments? Criticisms: Angrist & Pichke 2009: assumptions not always plausible Allison 2009 Bollen and Brand, forth: Hard to compare models.

Dynamic Panel Models General remarks:
It is important to think carefully about dynamic processes… How long does it take things to unfold? What lags does it make sense to include? With huge datasets, we can just throw lots in With smaller datasets, it is important to think things through.

Methods: IV panel models
Traditional instrumental variable panel estimator: bX = exogenous covariates gZ = endogenous covariates (may be related to nit) mi = unobserved unit-specific error nit = idiosyncratic error Treat mi as random, fixed, or use differencing to wipe it out Use contemporaneous or lagged X and (appropriate) lags of Z as instruments in two-stage estimation of yit. Works if lag Z is plausibly exogenous.

TSCS Data Time Series Cross Section Data
Example: economic variables for industrialized countries Often countries Often ~30-40 years of data Beck (2001) No specific minimum, but be suspicious of T<10 Large N isn’t required (though not harmful)

TSCS Data: OLS PCSE Beck & Katz 2001
“Old” view: Use FGLS to deal with heteroskedasticity & correlated errors Problem: This underestimates standard errors New view: Use OLS regression With “panel corrected” standard errors To address panel heteroskedasticity With FE to deal with unit heterogeneity With lagged dependent variable in the model To address serial correlation.

TSCS Data: Dynamics Beck & Katz 2009 examine dynamic models
OLS PCSE with lagged Y and FE Still appropriate Better than some IV estimators But, didn’t compare to System GMM. Plumper, Troeger, Manow (2005) FE isn’t theoretically justified and absorbs theoretically important variance Lagged Y absorbs theoretically important temporal variation Theory must guide model choices…

TSCS Data: Nonstationary Data
Issue: Analysis of longitudinal (time-series) data is going through big changes Realization that strongly trending data cause problems Random walk / unit root processes / integrated of order 1 / non-stationary data Converse: stationary data, integrated of order zero The “spurious regression” problem Strategies: Tests for “unit root” in time series & panel data Differencing as a solution A reason to try FD models.

Panel Data Remarks 1. Panel data strategies are taught as “fixes”
How do I “fix” unobserved effects? How do I “fix” dynamics/serial correlation? But, the fixes really change what you are modeling A FE (within) model is a very different look at your data, compared to OLS Goal: learn the “fixes”… but get past that… start to think about interpretation 2. Much strife in literature People arguing over what “fix” is best Don’t be afraid of criticism… but expect that people will weigh in with different views

Panel Data Remarks 3. MOST IMPORTANT THING: Try a wide range of models
If your findings are robust, you’re golden If not, differences will help you figure out what is going on… Either way, you don’t get “surprised” when your results go away after following the suggestion of a reviewer!

Reading Discussion Schofer, Evan and Wesley Longhofer. “The Structural Sources of Associational Life.” Working Paper.