STAT 497 LECTURE NOTE 9 DIAGNOSTIC CHECKS

DIAGNOSTIC CHECKS After identifying and estimating a time series model, the goodness-of-fit of the model and the validity of its assumptions should be checked. If the model passes these diagnostic checks, we can go on to construct the ARIMA forecasts.

1. NORMALITY OF ERRORS Check the histogram of the standardized residuals. Draw a normal Q-Q plot of the standardized residuals (the points should lie on a straight 45° line). Look at Tukey's simple five-number summary, together with the skewness (should be 0 for a normal distribution) and the kurtosis (should be 3 for a normal distribution), or the excess kurtosis (should be 0 for a normal distribution).
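As a quick illustration, these checks can be carried out in R along the following lines (a minimal sketch; 'fit' stands for an estimated ARIMA model, and skewness() and kurtosis() are taken from the moments package, which is not used elsewhere in these notes):

library(moments)
r <- rstandard(fit)          # standardized residuals of the fitted model
hist(r, main = 'Standardized residuals')
qqnorm(r); qqline(r)         # points should lie close to the 45-degree line
fivenum(r)                   # Tukey's five-number summary
skewness(r)                  # approximately 0 under normality
kurtosis(r)                  # approximately 3 under normality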

1. NORMALITY OF ERRORS Jarque-Bera Normality Test: The skewness and kurtosis of the residuals are used to construct this test statistic. The Jarque-Bera (1981) test examines whether the coefficients of skewness and excess kurtosis are jointly zero.

1. NORMALITY OF ERRORS The JB test statistic is JB = n[S^2/6 + (K − 3)^2/24], where S is the sample skewness and K is the sample kurtosis of the residuals. Under the null hypothesis of normality, JB is asymptotically chi-square distributed with 2 degrees of freedom. If JB > χ²(α, 2), reject the null hypothesis that the residuals are normally distributed.
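The statistic can also be computed directly from the residuals. A minimal sketch, again assuming 'fit' denotes the estimated model:

a  <- resid(fit)
n  <- length(a)
m2 <- mean((a - mean(a))^2)
S  <- mean((a - mean(a))^3) / m2^(3/2)    # sample skewness
K  <- mean((a - mean(a))^4) / m2^2        # sample kurtosis
JB <- n * (S^2 / 6 + (K - 3)^2 / 24)      # asymptotically chi-square(2) under H0
JB > qchisq(0.95, df = 2)                 # TRUE => reject normality at the 5% level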

1. NORMALITY OF ERRORS The chi-square approximation, however, is overly sensitive for small samples, rejecting the null hypothesis often when it is in fact true. Furthermore, the distribution of p-values departs from a uniform distribution and becomes a right-skewed unimodal distribution, especially for small p-values. This leads to a large Type I error rate. The table below shows some p-values approximated by a chi-square distribution that differ from their true alpha levels for very small samples. You can also use the Shapiro-Wilk test.

1. NORMALITY OF ERRORS Calculated p-value equivalents to true alpha levels at given sample sizes:

True α level   n = 20   n = 30   n = 50   n = 70   n = 100
.1             .307     .252     .201     .183     .1560
.05            .1461    .109     .079     .067     .062
.025           .051     .0303    .020     .016     .0168
.01            .0064    .0033    .0015    .0012    .002

2. DETECTION OF THE SERIAL CORRELATION In time series regressions, the OLS residuals are often found to be serially correlated with their own lagged values. Under serial correlation, OLS is no longer an efficient linear estimator, the usual standard errors are incorrect (so the significance of the coefficients is generally overstated), and the OLS estimates are biased and inconsistent if a lagged dependent variable is used as a regressor.

2. DETECTION OF THE SERIAL CORRELATION The Durbin-Watson test is designed for the ordinary regression model with deterministic independent variables; it is not appropriate for time series models with lagged dependent variables. It only tests for AR(1) errors, and the model should contain a constant term and deterministic independent variables.
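For such a regression, the Durbin-Watson test is available in the lmtest package. A minimal sketch, where 'lm_fit' is a hypothetical lm object for an ordinary regression:

library(lmtest)
dwtest(lm_fit)   # H0: no first-order (AR(1)) autocorrelation in the errors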

2. DETECTION OF THE SERIAL CORRELATION The Serial Correlation Lagrange Multiplier (Breusch-Godfrey) test is valid in the presence of lagged dependent variables, and it tests for higher-order AR(r) errors.

2. DETECTION OF THE SERIAL CORRELATION The test hypothesis is that there is no serial correlation up to order r, i.e., H0: ρ1 = ρ2 = … = ρr = 0. The test statistic is LM = n·R², which is asymptotically chi-square distributed with r degrees of freedom; R² is obtained from the auxiliary regression of the residuals on the original regressors and the lagged residuals ê(t−1), …, ê(t−r).

2. DETECTION OF THE SERIAL CORRELATION Determination of r: no obvious answer exists. In empirical studies, r = p + 1 lags is often used for AR and ARMA models; for seasonal series, r = s.
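The Breusch-Godfrey test is implemented in the lmtest package. A minimal sketch, again using a hypothetical lm object 'lm_fit' and r = 4 lags:

library(lmtest)
bgtest(lm_fit, order = 4)   # H0: no serial correlation up to order r = 4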

2. DETECTION OF THE SERIAL CORRELATION Ljung-Box (Modified Box-Pierce) or Portmanteau Lack-of-Fit Test: Box and Pierce (1970) developed a test to check the autocorrelation structure of the residuals; it was later modified by Ljung and Box. The null hypothesis to be tested is that the first K residual autocorrelations are jointly zero: H0: ρ1 = ρ2 = … = ρK = 0.

2. DETECTION OF THE SERIAL CORRELATION The test statistic is Q = n(n + 2) Σ (for k = 1, …, K) r̂k²/(n − k), where r̂k is the lag-k autocorrelation of the residuals and n is the number of observations.

2. DETECTION OF THE SERIAL CORRELATION If the correct model has been estimated, Q is approximately chi-square distributed with K − p − q degrees of freedom. If Q > χ²(α, K − p − q), reject H0. This means that autocorrelation remains in the residuals and the assumption is violated; check the model again. It is usually better to add another lag to the AR or MA part of the model.
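In R, Box.test() carries out both the Box-Pierce and the Ljung-Box versions. A minimal sketch, assuming 'fit' is the estimated ARIMA model and that p + q = 3 ARMA coefficients were estimated (a hypothetical count; the fitdf argument subtracts it from the degrees of freedom):

Box.test(resid(fit), lag = 15, type = "Ljung-Box", fitdf = 3)   # df = 15 - 3 = 12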

3. DETECTING HETEROSCEDASTICITY Heteroscedasticity is a violation of the constant error variance assumption. It occurs when the variance of the errors changes over time.

3. DETECTING HETEROSCEDASTICITY ACF-PACF PLOT OF SQUARED RESIDUALS: Since {at} is a zero-mean process, the variance of at is given by the expected value of the squared at's. So, if the at's are homoscedastic, the variance will be constant (not changing over time), and when we look at the ACF and PACF plots of the squared residuals, they should stay within the 95% white noise limits. If not, this is a sign of heteroscedasticity.
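A minimal sketch of this check, assuming 'fit' is the estimated ARIMA model; the McLeod-Li test in the TSA package applies the same portmanteau idea to the squared residuals:

a2 <- resid(fit)^2
par(mfrow = c(1, 2))
acf(a2)                        # spikes outside the limits suggest changing variance
pacf(a2)
McLeod.Li.test(object = fit)   # Ljung-Box-type test applied to the squared residuals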

3. DETECTING HETEROSCEDASTICITY Let rt be the log return of an asset at time t. We are going to study volatility: such a series is either serially uncorrelated or has only minor low-order serial correlation, but it is a dependent series. Examine the ACF of the residuals and the squared residuals for the calamari catch data. The catch data had a definite seasonality, which was removed; the remaining series was then modelled with an AR(5) model, and the residuals of this model were obtained. There are various definitions of what constitutes weak dependence of a time series. However, the operational definition of independence here will be that both the autocorrelation function of the series and that of the squared series show no autocorrelation. If there is no serial correlation in the series but there is in the squared series, then we will say there is weak dependence. This leads us to examine the volatility of the series, since that is exemplified by the squared terms.

3. DETECTING HETEROSCEDASTICITY Figure 1: Residuals after AR(5) fitted to the deseasoned calamari data Figure 2: Autocorrelation of the squared residuals

3. DETECTING HETEROSCEDASTICITY Figure 3: Autocorrelation for the log returns for the Intel series

3. DETECTING HETEROSCEDASTICITY Figure 4: ACF of the squared returns. Figure 5: PACF of the squared returns. Combining these three plots, it appears that this series is serially uncorrelated but dependent. Volatility models attempt to capture such dependence in the return series.

3. DETECTING HETEROSCEDASTICITY If we ignore heteroscedasticity: The OLS estimator is unbiased but not efficient; the GLS or WLS estimator is the Gauss-Markov estimator. The estimate of the variance of the OLS estimator is a biased estimator of the true variance, so the classical testing procedures are invalidated. Now, the question is: how can we detect heteroscedasticity?

3. DETECTING HETEROSCEDASTICITY White's General Test for Heteroscedasticity: After the identified model is estimated, we obtain the residuals. Then, the error variance can be written as a function of the explanatory variables, their squares, and their cross-products.

3. DETECTING HETEROSCEDASTICITY Then, construct the artificial regression of the squared residuals on the explanatory variables, their squares, and their cross-products. The homoscedastic case implies that all slope coefficients in this artificial regression are jointly zero (α1 = α2 = … = 0, β1 = β2 = … = 0, γ1 = γ2 = … = 0); therefore, under homoscedasticity the artificial regression should have no explanatory power.

3. DETECTING HETEROSCEDASTICITY Then, the test statistic is n·R² from the artificial regression; under the null hypothesis of homoscedasticity it is asymptotically chi-square distributed with m degrees of freedom, where m is the number of variables in the artificial regression except the constant term.
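Since the statistic is just n·R² from the auxiliary regression, it can be computed by hand. A minimal sketch, assuming a hypothetical regression fit 'lm_fit' with two regressors x1 and x2 in a data frame 'dat':

u2   <- resid(lm_fit)^2
aux  <- lm(u2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2), data = dat)
m    <- length(coef(aux)) - 1                  # regressors excluding the constant
stat <- length(u2) * summary(aux)$r.squared    # n * R^2
pchisq(stat, df = m, lower.tail = FALSE)       # p-value under H0: homoscedasticity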

3. DETECTING HETEROSCEDASTICITY The Breusch-Pagan Test: It is a Lagrange Multiplier test for heteroscedasticity. Consider the squared errors of the time series, and assume that they can be written as an AR(m) process. Then, consider testing whether the coefficients of that AR(m) representation are jointly zero.

3. DETECTING HETEROSCEDASTICITY Note that we need to evaluate the conditional expectation of the squared error term, E(at² | past) = α0 + α1·a²(t−1) + … + αm·a²(t−m). The homoscedastic case implies that α1 = α2 = … = αm = 0.

3. DETECTING HETEROSCEDASTICITY The problem, however, is that we do not observe the error term at, but it can be replaced by the estimated residual ât. A simple approach is to run the regression of ât² on its m lagged values and test whether the slope coefficients are all equal to zero.

3. DETECTING HETEROSCEDASTICITY The test statistic is n·R² from this regression; under the null hypothesis of homoscedasticity it is asymptotically chi-square distributed with m degrees of freedom, where m is the number of variables in the artificial regression except the constant term.
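A minimal sketch of this regression-based version on the squared residuals of a time series model, assuming 'fit' is the estimated model and m = 4 lags:

a2   <- resid(fit)^2
m    <- 4
X    <- embed(a2, m + 1)                     # columns: a2_t, a2_{t-1}, ..., a2_{t-m}
aux  <- lm(X[, 1] ~ X[, -1])                 # regress a2_t on its m lags
stat <- nrow(X) * summary(aux)$r.squared     # n * R^2, chi-square(m) under H0
pchisq(stat, df = m, lower.tail = FALSE)     # p-value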

3. DETECTING HETEROSCEDASTICITY If we reject the null hypothesis, this means that the error variance is not constant. It is changing over time. Therefore, we need to model the volatility. ARCH (Autoregressive Conditional Heteroskedasticity) or GARCH (Generalized Autoregressive Conditional Heteroskedasticity) modeling helps us to model the error variance.
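As a minimal sketch of the modeling step (not part of the lecture example), a GARCH(1,1) can be fitted to the residuals with the garch() function from the tseries package; 'fit' again denotes the estimated ARIMA model:

library(tseries)
a <- resid(fit)
g <- garch(a, order = c(1, 1))   # GARCH(1,1) for the conditional error variance
summary(g)                       # coefficient estimates and residual diagnostics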

EXAMPLE (BEER)
> library(TSA)
> fit=arima(beer,order=c(3,1,0),seasonal=list(order=c(3,0,0), period=4))
> par(mfrow=c(1,3))
> plot(window(rstandard(fit),start=c(1975,1)), ylab='Standardized Residuals',type='o')
> abline(h=0)
> acf(as.vector(window(rstandard(fit),start=c(1975,1))), lag.max=36)
> pacf(as.vector(window(rstandard(fit),start=c(1975,1))), lag.max=36)

EXAMPLE (BEER)
> fit2=arima(beer,order=c(2,1,1),seasonal=list(order=c(3,0,1), period=4))
> fit2

Call:
arima(x = beer, order = c(2, 1, 1), seasonal = list(order = c(3, 0, 1), period = 4))

Coefficients:
          ar1      ar2      ma1    sar1    sar2     sar3     sma1
      -0.2567  -0.4255  -0.4990  1.1333  0.2656  -0.3991  -0.9721
s.e.   0.1426   0.1280   0.1501  0.1329  0.1656   0.1248   0.1160

sigma^2 estimated as 1.564:  log likelihood = -157.47,  aic = 328.95

> plot(window(rstandard(fit2),start=c(1975,1)), ylab='Standardized Residuals',type='o')
> abline(h=0)
> acf(as.vector(window(rstandard(fit2),start=c(1975,1))), lag.max=36)
> pacf(as.vector(window(rstandard(fit2),start=c(1975,1))), lag.max=36)

EXAMPLE (BEER)

EXAMPLE (BEER) > hist(rstandard(fit2), xlab='Standardized Residuals')

EXAMPLE (BEER) > qqnorm(rstandard(fit2)) > qqline(rstandard(fit2))

EXAMPLE (BEER)
> shapiro.test(window(rstandard(fit2),start=c(1975,1)))

        Shapiro-Wilk normality test

data:  window(rstandard(fit2), start = c(1975, 1))
W = 0.9857, p-value = 0.4181

> jarque.bera.test(resid(fit2))

        Jarque Bera Test

data:  resid(fit2)
X-squared = 1.0508, df = 2, p-value = 0.5913

EXAMPLE (BEER) > tsdiag(fit2)

EXAMPLE (BEER)
> Box.test(resid(fit2),lag=15,type = c("Ljung-Box"))

        Box-Ljung test

data:  resid(fit2)
X-squared = 24.2371, df = 15, p-value = 0.06118

> Box.test(resid(fit2),lag=15,type = c("Box-Pierce"))

        Box-Pierce test

data:  resid(fit2)
X-squared = 21.4548, df = 15, p-value = 0.1229

EXAMPLE (BEER)
> rr=resid(fit2)^2
> par(mfrow=c(1,2))
> acf(rr)
> pacf(rr)

EXAMPLE (BEER)
> par(mfrow=c(1,1))
> result=plot(fit2,n.ahead=12,ylab='Series & Forecasts',col=NULL,pch=19)
> abline(h=coef(fit2))
> forecast=result$pred
> cbind(beer,forecast)
> plot(fit2,n1=1975,n.ahead=12,ylab='Series, Forecasts, Actuals & Limits', pch=19)