Chapter 9 Model Building


Chapter 9: Model Building

NOTE: Some slides have blank sections. They are based on a teaching style in which the corresponding blanks (derivations, theorem proofs, examples, ...) are worked out in class on the board or overhead projector.

Check for Model Appropriateness

The residuals should resemble white noise. The residuals are computed from the data using the final parameter estimates; they are calculated after estimation (tswge computes them via backcasting).

Testing Residuals for White Noise
1. Check sample autocorrelations of residuals against the 95% limit lines.
2. Ljung-Box test (portmanteau test)

portmanteau [port man tō] 1. a large travelling case made of stiff leather, esp. one hinged at the back so as to open out into two compartments 2. embodying several uses or qualities

Testing Residuals for White Noise (continued)
Ljung-Box (portmanteau) test
- Hypotheses: H0: the residuals are white noise (ρ(1) = ρ(2) = ... = ρ(K) = 0) vs. Ha: at least one ρ(k) ≠ 0, 1 ≤ k ≤ K
- Test statistic: Q = n(n + 2) Σ_{k=1}^{K} r_k² / (n − k), where r_k is the k-th sample autocorrelation of the residuals; under H0, Q is approximately chi-square with K − p − q degrees of freedom for an ARMA(p,q) fit

3. Other tests for randomness
- runs test
- etc.
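The Ljung-Box statistic described above is easy to compute directly. Below is a minimal pure-Python sketch, for illustration only: it is not tswge's ljung.wge, and the helper names acf and ljung_box are made up.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series x."""
    n = len(x)
    xbar = sum(x) / n
    c0 = sum((v - xbar) ** 2 for v in x) / n
    return [sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def ljung_box(res, K, p=0, q=0):
    """Ljung-Box Q statistic and chi-square df for ARMA(p,q) residuals."""
    n = len(res)
    r = acf(res, K)
    Q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, K + 1))
    return Q, K - p - q

# demo: residuals that really are white noise should give Q near its df
random.seed(1)
wn = [random.gauss(0, 1) for _ in range(200)]
Q, df = ljung_box(wn, K=24)
print(Q, df)
```

For true white noise, Q is approximately chi-square with K − p − q degrees of freedom, so values far in the right tail indicate residual autocorrelation.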

Examining Residuals for Some Examples
- Demo: generate a realization from an ARMA(2,1) model
- Example 8.7: simulated seasonal data (fig8.8)
- Example 8.8: airline data (airlog)

tswge demo

d1 = gen.arma.wge(n=200, phi=c(1.2,-.8), theta=.9)
plotts.sample.wge(d1)
# overfit AR models
est.ar.wge(d1, p=10, type='burg')
# est.ar.wge(d1, p=12, type='burg')
aic.wge(d1, p=0:6, q=0:2)
d1.est = est.arma.wge(d1, p= )   # order left blank (filled in class)
# check residuals
plotts.sample.wge(d1.est$res, arlimits=TRUE)
ljung.wge(d1.est$res, p= )   # blank filled in class
# final model
mean(d1)

Example 8.7 Simulated Seasonal Data
[Figure: data and sample autocorrelations; residuals and residual sample autocorrelations]

tswge demo

dd1 = gen.aruma.wge(n=200, s=12, phi=c(1.25,-.9))
dd1 = dd1 + 50
plotts.sample.wge(dd1)
# overfit AR models
ov14 = est.ar.wge(dd1, p=14, type='burg')
# ov16 = est.ar.wge(dd1, p=16, type='burg')
dd1.12 = artrans.wge(dd1, phi.tr=c(0,0,0,0,0,0,0,0,0,0,0,1))
plotts.sample.wge(dd1.12)
aic.wge(dd1.12, p=0:6, q=0:2)
# estimate model parameters
dd1.12.est = est.arma.wge(dd1.12, p= )   # order left blank (filled in class)
# check residuals
plotts.sample.wge(dd1.12.est$res, arlimits=TRUE)
ljung.wge(dd1.12.est$res, p= )   # blank filled in class
# final model
mean(dd1)

Example 8.8 Airline Data
[Figure: data and sample autocorrelations; residuals and residual sample autocorrelations]

tswge demo: log airline data

data(airlog)
plotts.sample.wge(airlog)
# overfit AR models
ov14 = est.ar.wge(airlog, p=14, type='burg')
# ov16 = est.ar.wge(airlog, p=16, type='burg')
# transform data
la.12 = artrans.wge(airlog, phi.tr=c(0,0,0,0,0,0,0,0,0,0,0,1))
plotts.sample.wge(la.12)
aic.wge(la.12, p=0:13, q=0:2)
# estimate parameters of stationary part
la.12.est = est.arma.wge(la.12, p= )   # order left blank (filled in class)
# check residuals
plotts.sample.wge(la.12.est$res, arlimits=TRUE)
ljung.wge(la.12.est$res, p= )   # blank filled in class
# final model

Another Important Check for Model Appropriateness

Does the model make sense?
- stationary vs. nonstationary
- seasonal vs. non-seasonal
- correlation-based vs. signal-plus-noise model
- Are the characteristics of the fitted model consistent with those of the data?
- Do the forecasts and spectral estimates make sense?
- Do realizations from the model, and their characteristics, behave like the data?

Stationarity vs. Nonstationarity?

A decision that must be consciously made by the investigator.

Tools:
- overfitting
- unit root tests (Dickey-Fuller test, etc.)
  - designed to test the null hypothesis that the process has a unit root
  - i.e., the decision to include a unit root in the model is based on a decision NOT to reject a null hypothesis
  - in reality, the models (1 - B)Xt = at and (1 - .97B)Xt = at will both produce realizations for which the null hypothesis is usually not rejected by the Dickey-Fuller test
- an understanding of the physical problem and the properties of the selected model
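To illustrate the point about (1 - B)Xt = at versus (1 - .97B)Xt = at, here is a minimal pure-Python sketch of the simplest Dickey-Fuller regression (with a constant and no augmentation lags; the large-sample 5% critical value is roughly -2.86). The helper names are made up, and this is a teaching sketch, not a substitute for a proper unit root test implementation.

```python
import random

def ols_line(y, x):
    """OLS of y on [1, x]; returns the slope estimate and its t-statistic."""
    n = len(y)
    xb = sum(x) / n; yb = sum(y) / n
    sxx = sum((v - xb) ** 2 for v in x)
    b = sum((x[i] - xb) * (y[i] - yb) for i in range(n)) / sxx
    a = yb - b * xb
    sse = sum((y[i] - a - b * x[i]) ** 2 for i in range(n))
    se_b = (sse / (n - 2) / sxx) ** 0.5
    return b, b / se_b

def dickey_fuller_t(x):
    """t-statistic for rho = 0 in the regression  dx_t = c + rho * x_{t-1} + e_t."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    return ols_line(dx, x[:-1])[1]

random.seed(7)
# random walk: (1 - B)X_t = a_t
rw = [0.0]
for _ in range(499):
    rw.append(rw[-1] + random.gauss(0, 1))
# near-unit-root stationary AR(1): (1 - .97B)X_t = a_t
ar = [0.0]
for _ in range(499):
    ar.append(0.97 * ar[-1] + random.gauss(0, 1))
print(dickey_fuller_t(rw), dickey_fuller_t(ar))
```

The test rejects the unit root null only when the t-statistic falls below the critical value; for realizations like these, both the true random walk and the near-unit-root stationary model typically fail to produce rejection, which is the slide's point.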

Modeling Global Temperature Data

(a) Stationary model:

data(hadley)
mean(hadley)
plotts.sample.wge(hadley)
aic.wge(hadley, p=0:6, q=0:1)
# estimate stationary model
had.est = est.arma.wge(hadley, p= )   # order left blank (filled in class)
# check residuals
plotts.sample.wge(had.est$res, arlimits=TRUE)
ljung.wge(had.est$res, p= )   # blank filled in class
# other realizations from this model
demo = gen.arma.wge(n=160, phi=had.est$phi, theta=had.est$theta)
plotts.sample.wge(demo)

(b) Nonstationary model:

data(hadley)
plotts.sample.wge(hadley)
# overfit AR models
d8 = est.ar.wge(hadley, p=8, type='burg')
# d12 = est.ar.wge(hadley, p=12, type='burg')
# difference the data
h.dif = artrans.wge(hadley, phi.tr=1)
plotts.sample.wge(h.dif)
aic.wge(h.dif, p=0:6, q=0:1)
h.dif.est = est.arma.wge(h.dif, p= )   # order left blank (filled in class)
# examine residuals
plotts.sample.wge(h.dif.est$res, arlimits=TRUE)
ljung.wge(h.dif.est$res, p= )   # blank filled in class
# other realizations from this model
demo = gen.aruma.wge(n=160, d=1, phi=h.dif.est$phi, theta=h.dif.est$theta)
plotts.sample.wge(demo)

Forecasts using the Stationary Model

data(hadley)
fore.arma.wge(hadley, phi=c(1.27,-.47,.19), theta=.63, n.ahead=25, limits=FALSE)

Forecasts using the Nonstationary Model

data(hadley)
fore.aruma.wge(hadley, d=1, phi=c(.33,-.18), theta=.7, n.ahead=25, limits=FALSE)

Notes:
- the two models are quite similar but produce very different forecasts
- it is important to understand the properties of the selected model: the selection of a stationary model will automatically produce forecasts that eventually tend toward the mean of the observed data (i.e., it was the decision to use the stationary model that produced these results)
- beware of results by investigators who choose a model in order to produce desired results

Deterministic Signal-plus-Noise Models

Example signals: st = a + bt (linear trend); st = C cos(2πft + ψ), C constant

Recall: sometimes it's not easy to tell whether a deterministic signal is present in the data.
[Figure: example realizations. Is there a deterministic signal?]

Realizations: is there a deterministic signal?
Recall: sometimes it's not easy to tell whether a deterministic signal is present in the data.
[Figure: global temperature data]

Realizations from the stationary model fit to temperature data

Another Possible Model

Question: Should the observed increasing temperature trend be predicted to continue?
- based on standard ARMA/ARUMA fitting, the answer is "No"

Another possible model: a deterministic "signal + noise" model, which is nonstationary due to its non-constant mean.

Common Strategy for Assessing Which Model Is Appropriate
- consider the model Xt = a + bt + Zt, and assume Zt is AR(p) with zero mean
- note that this is different from the usual regression setting, since the noise is correlated
- in the presence of noise with positive autocorrelations, the incorrect procedure of testing H0: b = 0 with the usual regression methods results in inflated observed significance levels
- testing H0: b = 0 is a difficult problem

Important Points: - realizations from AR (ARMA/ARUMA) models have random trends

Cochrane-Orcutt Method

Question: Is there a deterministic trend in the data (that should be predicted to continue)?
Technique: the Cochrane-Orcutt method
Test: H0: b = 0
- IF we conclude b = 0: fit an AR model; the trend is not predicted to continue
- IF we conclude b ≠ 0: the trend is predicted to continue

Cochrane-Orcutt Test
Note: Test H0: b = 0 using the usual t statistic for the slope from the Cochrane-Orcutt transformed regression.
Woodward and Gray (1993, Journal of Climate) showed:
- even when using the Cochrane-Orcutt method to remove the correlated errors, the resulting test still has inflated observed significance levels
- the same is true with ML methods
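The Cochrane-Orcutt procedure itself can be sketched in a few lines. This is a minimal pure-Python illustration assuming AR(1) errors and a linear time regressor; the helper names ols and cochrane_orcutt are made up, and this is not the implementation studied by Woodward and Gray.

```python
import random

def ols(y, x):
    """OLS of y on [1, x]: returns intercept, slope, and slope t-statistic."""
    n = len(y)
    xb = sum(x) / n; yb = sum(y) / n
    sxx = sum((v - xb) ** 2 for v in x)
    b = sum((x[i] - xb) * (y[i] - yb) for i in range(n)) / sxx
    a = yb - b * xb
    sse = sum((y[i] - a - b * x[i]) ** 2 for i in range(n))
    se_b = (sse / (n - 2) / sxx) ** 0.5
    return a, b, b / se_b

def cochrane_orcutt(y, x, n_iter=10):
    """Iterative Cochrane-Orcutt for AR(1) errors: returns slope, t-stat, phi."""
    a, b, _ = ols(y, x)
    for _ in range(n_iter):
        e = [y[i] - a - b * x[i] for i in range(len(y))]      # current residuals
        phi = sum(e[i] * e[i - 1] for i in range(1, len(e))) / sum(v * v for v in e)
        ys = [y[i] - phi * y[i - 1] for i in range(1, len(y))]  # quasi-differenced y
        xs = [x[i] - phi * x[i - 1] for i in range(1, len(x))]  # quasi-differenced x
        astar, b, t = ols(ys, xs)
        a = astar / (1 - phi)                                  # undo the transform
    return b, t, phi

# demo: trendless AR(1) noise, so the estimated slope should be near zero
random.seed(3)
z = [0.0]
for _ in range(199):
    z.append(0.9 * z[-1] + random.gauss(0, 1))
x = list(range(200))
b, t, phi = cochrane_orcutt(z, x)
print(b, t, phi)
```

Even with the quasi-differencing removing most of the error autocorrelation, the slide's point stands: the resulting t test still rejects H0: b = 0 more often than its nominal level.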

Simulation Results
Observed significance levels for tests of H0: b = 0 (nominal level = 5%)
- b = 0 (i.e., the null hypothesis of zero slope is true)
- 1000 replicates generated from each model
- hypothesis of zero slope tested at α = .05 for each realization using Cochrane-Orcutt

n      M1     M2     M3
50     18.4   27.2   37.2
100    16.0   20.0   28.4
250    8.0    12.4   17.6
1000   4.8    8.4    9.6

(M1-M3 denote the three noise models considered.)

Bootstrap Method
Woodward, Bottone, and Gray (1997), Journal of Agricultural, Biological, and Environmental Statistics (JABES), 403-416.
Given a time series realization that may have a "deterministic trend", to test H0: b = 0:
1. Based on the observed data, calculate a test statistic Q that tests for trend (e.g., the Cochrane-Orcutt statistic), where large Q suggests a trend.
2. Fit a stationary AR model (with constant mean) to the observed time series data and generate K realizations from this model.
3. For each realization in (2), calculate the test statistic in (1).
4. Reject H0: b = 0 if Q in (1) exceeds the 95th percentile of the bootstrap-based test statistics in (3).
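The four steps above can be sketched in plain Python. This is a deliberately simplified illustration, under assumptions not in the slides: the null-model fit is restricted to AR(1), and the trend statistic is the plain OLS slope |t| rather than the Cochrane-Orcutt statistic; all helper names are made up.

```python
import random

def slope_stat(y):
    """|t-statistic| for the OLS slope of y regressed on t = 0..n-1."""
    n = len(y)
    x = list(range(n))
    xb = (n - 1) / 2; yb = sum(y) / n
    sxx = sum((v - xb) ** 2 for v in x)
    b = sum((x[i] - xb) * (y[i] - yb) for i in range(n)) / sxx
    a = yb - b * xb
    sse = sum((y[i] - a - b * x[i]) ** 2 for i in range(n))
    return abs(b) / (sse / (n - 2) / sxx) ** 0.5

def bootstrap_trend_test(y, K=200, seed=0):
    """Returns (Q, bootstrap 95th percentile); reject H0: b = 0 if Q > percentile."""
    rng = random.Random(seed)
    q_obs = slope_stat(y)                    # step 1: statistic on observed data
    yb = sum(y) / len(y)
    d = [v - yb for v in y]
    phi = sum(d[i] * d[i - 1] for i in range(1, len(d))) / sum(v * v for v in d)
    s2 = sum(v * v for v in d) / len(d) * (1 - phi * phi)   # innovation variance
    q_boot = []
    for _ in range(K):                       # step 2: realizations from the AR fit
        z = [0.0]                            # (no burn-in, for brevity)
        for _ in range(len(y) - 1):
            z.append(phi * z[-1] + rng.gauss(0, s2 ** 0.5))
        q_boot.append(slope_stat([v + yb for v in z]))      # step 3: statistic each
    q_boot.sort()
    return q_obs, q_boot[int(0.95 * K)]      # step 4: compare to 95th percentile

# demo: trendless AR(1) data, so Q should usually not exceed the percentile
random.seed(5)
y = [0.0]
for _ in range(99):
    y.append(0.8 * y[-1] + random.gauss(0, 1))
q, crit = bootstrap_trend_test(y, K=200, seed=1)
print(q, crit)
```

Because the bootstrap critical value is computed under correlated noise rather than from the usual t distribution, it is much larger, which is what pulls the observed significance levels back toward the nominal 5%.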

Simulation Results
Observed significance levels for tests of H0: b = 0 (nominal level = 5%)
- b = 0 (i.e., the null hypothesis of zero slope is true)
- 1000 replicates generated from each model

         Cochrane-Orcutt       Bootstrap
n      M1     M2     M3      M1    M2    M3
50     18.4   27.2   37.2    5.9   6.4   3.8
100    16.0   20.0   28.4    4.1   8.1   5.3
250    8.0    12.4   17.6    5.1   3.8   10.0
1000   4.8    8.4    9.6     7.6   6.4   6.1

(M1-M3 denote the three noise models considered.)

Comments:
1. If a significant slope is detected, then forecasts from the model Xt = a + bt + Zt will forecast an existing trend to continue; such a model should be used with caution if more than short-term forecasts are needed.
2. Jon Sanders (2009) developed bootstrap-based procedures to test for a significant
   - monotonic trend
   - nonparametric trend

Checking Realization Characteristics
Do realizations generated from the model have characteristics consistent with those of the actual data?
- similar realizations?
- similar sample autocorrelations?
- similar spectral densities (nonparametric)?
- ...
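The sample-autocorrelation comparison can be operationalized with a short sketch: simulate realizations from the fitted model and compare their sample autocorrelations with those of the data. A pure-Python illustration follows (a generic AR(p) generator standing in for tswge's gen.arma.wge; the AR(2) coefficients are those of the simulated demo earlier in the chapter, not a fit to real data).

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    xb = sum(x) / n
    c0 = sum((v - xb) ** 2 for v in x)
    return [sum((x[t] - xb) * (x[t + k] - xb) for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def gen_ar(phi, n, rng, burn=100):
    """Generate a realization from a zero-mean AR(p) model with coefficients phi."""
    p = len(phi)
    x = [0.0] * p
    for _ in range(n + burn):
        x.append(sum(phi[j] * x[-1 - j] for j in range(p)) + rng.gauss(0, 1))
    return x[-n:]

rng = random.Random(42)
data = gen_ar([1.2, -0.8], 200, rng)        # stand-in for the observed series
model_real = gen_ar([1.2, -0.8], 200, rng)  # realization from the "fitted" model
r_data = sample_acf(data, 10)
r_model = sample_acf(model_real, 10)
# a model is plausible if r_model tracks r_data across the lags
```

In practice one would generate many model realizations and check whether the data's sample autocorrelations fall within the spread of the simulated ones, lag by lag.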

Sunspot Data: 1749-2008 (ss08)

Note: the data set ss08 includes sunspot numbers for the years 1925-2008, which are not included in the classic sunspot data set sunspot.classic.
We consider two models:
- AR(2)
- AR(9)

Comparing Realizations
[Figure: sunspot data; realizations from the AR(2) model; realizations from the AR(9) model]

Comparing Sample Autocorrelations (Figure 9.10)
[Figure: sample autocorrelations of the sunspot data, the AR(2) model, and the AR(9) model]

Comparing Nonparametric Spectral Densities (Figure 9.11)
[Figure: nonparametric spectral densities of the sunspot data, the AR(2) model, and the AR(9) model]

Time Series Model Checking using Parametric Bootstraps
Tsay (Applied Statistics, 1992):
- generates multiple realizations from the fitted model
- checks whether the actual data are consistent with the realizations from the fitted model
Woodward and Gray (Journal of Climate, 1995): given a time series realization,
- generate bootstrap realizations from each of two candidate models
- use discriminant analysis to ascertain which model generates realizations that best match the characteristics of the observed realization

Comprehensive Analysis of Time Series Data Involves:
I. Examination
   - of data
   - of sample autocorrelations
II. Obtaining a model
   - stationary or nonstationary
   - correlation-based or signal-plus-noise
   - identifying p and q
   - estimating coefficients
III. Checking for model appropriateness
IV. Obtaining forecasts, spectral estimates, etc., as dictated by the situation

Important Note: The previous discussion focused only on deciding among
- ARMA(p,q)
- ARUMA(p,d,q)
- signal-plus-noise
There are MANY other models and tools available to the time series analyst, among which are:
- long memory models (Ch. 11)
- multivariate and state-space models (Ch. 10)
- ARCH/GARCH models (Ch. 4)
- wavelet analysis (Ch. 12)
- models for data with time-varying frequencies (TVF) (Ch. 13)