Panel Data Analysis
Stefan Trappl, Constanze Fay
Schularick, Taylor (2012)
Does credit growth predict financial crises?
Analysis of macroeconomic panel data for developed countries.
Two time periods: pre- and post-WW2; 79 major banking crises in 14 countries.
Dependent variable: "financial crisis" from Bordo et al. (2001) and Reinhart, Rogoff (2009).
Independent variables: lagged credit and money supply, loans and bank assets, inflation, investment, GDP.
Trappl, Fay, 11/11/2014
Our approach:
Does income inequality predict financial crises?
We use the Schularick/Taylor dataset, reduced to 8 countries because income-inequality data are available only for those.
Dependent binary variable: "financial crisis (0/1)".
Independent variables: lagged credit and money supply, loans, investment, personal income inequality (measured by the "Top 1% income share").
Inequality dataset by Thomas Piketty: Capital in the 21st Century.
Schularick/Taylor model
Logistic regression estimating the probability of a crisis from credit growth in the previous periods.
OLS and logit models with country and year fixed effects.
Equation (labels on the slide): probability of crisis; lagged credit growth; control variables.
Our model:
Generalized linear mixed-effects regression (GLMM) estimating the probability of a crisis from income inequality in the previous periods.
GLMM with country as the grouping variable.
Equation (labels on the slide): probability of crisis; fixed effects and random effects terms; error term.
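A minimal sketch of such a model in R, assuming a random intercept per country and the glmer function of the lme4 package. The data are simulated and the column names (country, year, top1, crisis) are hypothetical; this is not the authors' dataset or their exact specification.

```r
library(lme4)

set.seed(1)
# Simulated panel: 8 countries, 60 years (hypothetical data, for illustration only)
d <- expand.grid(country = factor(1:8), year = 1:60)
u <- rnorm(8, sd = 0.5)                      # country-specific random intercepts
d$top1 <- runif(nrow(d), 5, 20)              # stand-in for the lagged top 1% income share
eta <- -4 + 0.15 * d$top1 + u[as.integer(d$country)]
d$crisis <- rbinom(nrow(d), 1, plogis(eta))  # binary crisis indicator (0/1)

# Logistic GLMM: fixed effect of inequality, random intercept grouped by country
fit <- glmer(crisis ~ top1 + (1 | country), data = d, family = binomial)
summary(fit)
```

The term (1 | country) is what makes country the grouping variable; further lagged regressors would enter as additional fixed-effect terms.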
XLConnect package
Java-based; reads and writes Excel worksheets from within R.
Alternative: the RODBC package, whose Excel connection functions work only in the 32-bit version of R (switch from 64-bit to 32-bit under "Tools/Global Options" in RStudio).
The "incomplete final line" warning that read.table raises when creating data frames from Excel or .csv files can be worked around by loading the data through the JGR console (File/Load data).
Load Excel sheets in R with either the loadWorkbook or the readWorksheetFromFile function. Always save the workbook, otherwise your changes are not written to the file!
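A short round trip with XLConnect might look as follows. The file is a temporary one and the sheet name and columns are made up for illustration; a working Java installation is assumed.

```r
library(XLConnect)

path <- tempfile(fileext = ".xlsx")          # hypothetical file location

# Create a workbook, add a sheet and write a small data frame to it
wb <- loadWorkbook(path, create = TRUE)
createSheet(wb, name = "panel")
toy <- data.frame(country = c("A", "A", "B", "B"),
                  year    = c(2000, 2001, 2000, 2001),
                  crisis  = c(0, 1, 0, 0))
writeWorksheet(wb, toy, sheet = "panel")
saveWorkbook(wb)                             # nothing reaches the disk before this call

# Read the sheet back in one step
dat <- readWorksheetFromFile(path, sheet = "panel")
str(dat)
```

Reading through loadWorkbook followed by readWorksheet gives the same result and is convenient when several sheets of one file are needed.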
Panel data analysis
Packages in R:
Paneldata: linear models for panel data
pdR: panel data regression
pglm: panel generalized linear models
phtt: panel data analysis with heterogeneous time trends
plm: linear models for panel data
lme4, nlme: maximum likelihood estimation with panels
Data preparation:
The pdata.frame function in plm prepares data frames for panel data analysis. Its "index" argument indicates which columns to treat as the unit and time variables:
the default (NULL) assumes that the first two columns are (1) the unit and (2) the time period (most granular level);
a number gives the number of units in a balanced panel;
a character string or vector names the individual and time columns, e.g. c("state", "year").
OLS does not account for heterogeneity across units or time.
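As an illustration of the data preparation step, the sketch below uses the Grunfeld data that ships with plm (an assumption made for the example; the slides use the Schularick/Taylor data).

```r
library(plm)

data("Grunfeld", package = "plm")   # example panel: columns firm, year, inv, value, capital

# Name the unit column first and the time column second
pgr <- pdata.frame(Grunfeld, index = c("firm", "year"))

pd <- pdim(pgr)     # panel dimensions: number of units, periods, observations
print(pd)
head(index(pgr))    # the unit/time index attached to the data
```

Passing index explicitly is safer than relying on the default column order, especially after merging or reordering columns.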
Models in the plm package
Individual heterogeneity across units is captured by two error components: an individual component that does not change over time, and an idiosyncratic component assumed to be well behaved and iid.
Decision diagram (reconstructed from the slide labels):
Errors uncorrelated with regressors and white noise? Yes: OLS pooling model. No: next question.
Errors uncorrelated with regressors? Yes: random effects model ("random"). No: fixed effects model ("within"), or first-differencing model if the errors are persistent.
Models in plm
Types of models:
Pooling model ("pooling"): OLS on the pooled panel data; the time series component is not considered.
Fixed effects model ("within", equivalent to unit dummy variables): based on deviations from the individual means.
First-differences model ("fd", uses the lagged values): removes time-invariant individual error components by first-differencing; preferred when the errors are persistent over time.
Random effects model ("random"): assumes the individual error component is uncorrelated with the regressors; if so, more efficient than fixed effects.
Between model ("between"): based on the time averages of each unit; discards within-group variability but is suited to non-stationary data; used for estimating long-run relationships.
Variable coefficient models assume that coefficients vary around an average.
FGLS is used when errors are heteroscedastic and autocorrelated; with fixed effects, fixed effects FGLS.
plm estimates on transformed data: time-demeaned for fixed effects ("within"), quasi time-demeaned for the random effects model, no demeaning for pooling/OLS.
Functions:
plm: within, between and random effects models
pvcm: models with variable coefficients
pggls: FGLS
pgmm: GMM
Effects: individual or time effects; if there are time effects, use the gls function in the nlme package (see John Fox's appendix on time-series regression).
Call: plm(formula, data, index, effect, model)
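The model types listed above can be compared side by side. The sketch below again assumes the Grunfeld example data from plm rather than the crisis dataset; only the model argument changes between calls.

```r
library(plm)

data("Grunfeld", package = "plm")
form <- inv ~ value + capital
idx  <- c("firm", "year")

# Same formula and data, different estimators
pool <- plm(form, data = Grunfeld, index = idx, model = "pooling")
fe   <- plm(form, data = Grunfeld, index = idx, model = "within")
fd   <- plm(form, data = Grunfeld, index = idx, model = "fd")
re   <- plm(form, data = Grunfeld, index = idx, model = "random")
be   <- plm(form, data = Grunfeld, index = idx, model = "between")

# Compare the slope on 'value' across estimators
slopes <- sapply(list(pooling = pool, within = fe, fd = fd,
                      random = re, between = be),
                 function(m) coef(m)[["value"]])
print(round(slopes, 4))

# Estimated firm-specific intercepts of the within model
head(fixef(fe))
```

Setting effect = "time" or effect = "twoways" in the same calls would switch from individual effects to time or two-way effects.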
Which model to choose?
1. "Poolability" test, with H0 implying that OLS is the appropriate model: there are no fixed effects, units are sufficiently homogeneous, and coefficients are the same for all units. The F-test compares the model for the full sample with a model based on a separate equation for each unit. Functions: pooltest(plm, pvcm with model = "within") or pFtest. A significant F statistic leads to rejection of H0, implying that there are fixed effects.
2. Test for individual or time effects: plmtest(plm, type, effect). type selects the Lagrange multiplier test ("bp", "honda", "kw", "ghm"); effect is "individual", "time" or "twoways".
3. Test to choose between the fixed and random effects models: Hausman-type test comparing the two estimators under the null of no significant difference between the models, in which case the random effects model is more efficient. Function: phtest(plm "within", plm "random"). Assume random effects if n is large relative to t, so that individual effects can be viewed as random.
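The three tests can be run in sequence. This sketch assumes the Grunfeld example data from plm; the choice of the "bp" variant for plmtest is only one of the options named on the slide.

```r
library(plm)

data("Grunfeld", package = "plm")
form <- inv ~ value + capital
idx  <- c("firm", "year")

pool <- plm(form, data = Grunfeld, index = idx, model = "pooling")
fe   <- plm(form, data = Grunfeld, index = idx, model = "within")
re   <- plm(form, data = Grunfeld, index = idx, model = "random")
vc   <- pvcm(form, data = Grunfeld, index = idx, model = "within")  # one equation per unit

# 1. Poolability: pooled model against unit-by-unit equations, and F test for fixed effects
pt <- pooltest(pool, vc)
ft <- pFtest(fe, pool)

# 2. Lagrange multiplier test for individual effects (Breusch-Pagan variant)
lm_bp <- plmtest(pool, effect = "individual", type = "bp")

# 3. Hausman-type test: fixed versus random effects
ht <- phtest(fe, re)

print(pt); print(ft); print(lm_bp); print(ht)
```

Each call returns a standard htest object, so the statistic and p-value can be extracted with $statistic and $p.value.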
Which model to choose?
4. Test for serial correlation of the error term: individual effects always cause serial correlation in the composite error; in addition there may be the usual AR(1) correlation of the idiosyncratic error term. As these tests have power against each other, joint tests are needed, which, however, give no information on the reason for rejection. plm offers several joint, marginal and conditional tests; they are problematic if the errors are not normal and homoscedastic.
5. Further diagnostics and screening tests; for dynamic models and when regressors are not exogenous: GMM.
In short panels with a large number of units, serial correlation is less of a problem, because with so many observations the error correlations appear random. This does not hold for macro models with long time series.
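A few of the serial correlation tests in plm, sketched on the Grunfeld example data (an assumption for illustration; which test is appropriate depends on the panel at hand).

```r
library(plm)

data("Grunfeld", package = "plm")
form <- inv ~ value + capital
idx  <- c("firm", "year")

pool <- plm(form, data = Grunfeld, index = idx, model = "pooling")
fe   <- plm(form, data = Grunfeld, index = idx, model = "within")

# Breusch-Godfrey type test for serial correlation in the panel model residuals
bg <- pbgtest(fe, order = 1)

# Wooldridge test for AR(1) errors in fixed effects models
wa <- pwartest(fe)

# Joint test for AR(1) errors and random effects, computed from the pooling model
jt <- pbsytest(pool, test = "j")

print(bg); print(wa); print(jt)
```

A rejection of the joint test does not say whether random effects, AR(1) errors or both are responsible, which is the caveat raised in point 4.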
Panel analysis functions
Command: ls("package:plm")
"between" "Between" "cipstest" "dynformula" "ercomp" "fixef" "has.intercept" "index" "mtest" "pbgtest" "pbltest" "pbsytest" "pcce" "pcdtest" "pdata.frame" "pdim" "pdwtest" "pFormula" "pFtest" "pggls" "pgmm" "pht" "phtest" "plm" "plm.data" "plmtest" "pmg" "pmodel.response" "pooltest" "purtest" "pvar" "pvcm" "pvcovHC" "pwartest" "pwfdtest" "pwtest" "r.squared" "sargan" "vcovBK" "vcovHC" "vcovSCC" "Within"
Functions:
plm: function (formula, data, subset, na.action, effect = c("individual", "time", "twoways"), model = c("within", "random", "ht", "between", "pooling", "fd"), random.method = c("swar", "walhus", "amemiya", "nerlove", "kinla"), inst.method = c("bvk", "baltagi"), restrict.matrix = NULL, restrict.rhs = NULL, index = NULL, ...)
pdata.frame: function (x, index = NULL, drop.index = FALSE, row.names = TRUE)
Exploratory data analysis: use "|" in the formula of the scatterplot function of the car package to take both the unit and the year dimension into account.
Literature: Croissant, Y., Millo, G.: Panel Data Econometrics in R: The plm Package.
Literature
Schularick, Moritz, and Alan M. Taylor (2012). "Credit Booms Gone Bust: Monetary Policy, Leverage Cycles, and Financial Crises." American Economic Review, 102(2).