
1 Ka-fu Wong, University of Hong Kong: Some Final Words

2 Unobserved components model of time series
According to the unobserved components model of a time series, the series y_t has three components:
y_t = T_t + S_t + C_t
where T_t is the time trend, S_t is the seasonal component, and C_t is the cyclical component.

3 y_t = T_t + S_t + C_t: Deterministic trend
The linear trend model: T_t = β_0 + β_1 t, t = 1, …, T.
The polynomial trend model: T_t = β_0 + β_1 t + β_2 t^2 + … + β_p t^p, where p is a positive integer.
For economic time series we almost never require p > 2. That is, if the linear trend model is not adequate, the quadratic trend model will usually work: T_t = β_0 + β_1 t + β_2 t^2.
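A minimal sketch of fitting these two trend models by OLS, assuming numpy and statsmodels are available; the series and its coefficients are simulated purely for illustration, not taken from the lecture.

```python
# Illustrative only: fit linear and quadratic deterministic trends by OLS
# (simulated data; the trend coefficients below are arbitrary).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 120
t = np.arange(1, T + 1)
y = 10 + 0.5 * t + 0.01 * t**2 + rng.normal(scale=5, size=T)  # made-up series

X_lin = sm.add_constant(t)                             # regressors [1, t]
X_quad = sm.add_constant(np.column_stack([t, t**2]))   # regressors [1, t, t^2]

lin = sm.OLS(y, X_lin).fit()
quad = sm.OLS(y, X_quad).fit()
print("linear AIC:", lin.aic, " quadratic AIC:", quad.aic)
```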

4 y_t = T_t + S_t + C_t: Seasonality
Quarterly seasonality: S_t = γ_1 D_1t + γ_2 D_2t + γ_3 D_3t + γ_4 D_4t, or S_t = γ_1 + γ_2 D_2t + γ_3 D_3t + γ_4 D_4t, where D_it = 1 if t falls in quarter i, and 0 otherwise.
Monthly seasonality: S_t = γ_1 D_1t + γ_2 D_2t + … + γ_12 D_12t, or S_t = γ_1 + γ_2 D_2t + … + γ_12 D_12t, where D_it = 1 if t falls in month i, and 0 otherwise.
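A short sketch of building the quarterly dummies and estimating the full-dummy (no-intercept) version of the seasonal model; the seasonal means are invented for the example.

```python
# Sketch: construct quarterly dummies D_1..D_4 and estimate
# S_t = gamma_1 D_1t + ... + gamma_4 D_4t (no intercept). Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 80
quarter = (np.arange(T) % 4) + 1                    # 1,2,3,4,1,2,...
D = np.column_stack([(quarter == i).astype(float) for i in range(1, 5)])
y = D @ np.array([5.0, 7.0, 6.0, 9.0]) + rng.normal(size=T)  # made-up seasonal means

res = sm.OLS(y, D).fit()                            # estimates of gamma_1..gamma_4
print(res.params)
```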

5 y_t = T_t + S_t + C_t: Cyclical component
C_t is usually assumed to be covariance stationary. Covariance stationarity refers to a set of restrictions/conditions on the underlying probability structure of a time series that has proven especially valuable for the purpose of forecasting:
1. Constant mean
2. Constant (and finite) variance
3. Stable autocovariance function

6 Wold's theorem
According to Wold's theorem, if y_t is a zero-mean covariance stationary process then it can be written in the form
y_t = Σ_{i=0}^{∞} b_i ε_{t-i} = ε_t + b_1 ε_{t-1} + b_2 ε_{t-2} + …
where the ε's are (i) WN(0, σ^2), (ii) b_0 = 1, and (iii) Σ_{i=0}^{∞} b_i^2 < ∞.
In other words, each y_t can be expressed as a single linear function of current and (possibly an infinite number of) past drawings of the white noise process ε_t. If y_t depends on an infinite number of past ε's, the weights on these ε's, i.e., the b_i's, must go to zero as i gets large (and they must go to zero at a fast enough rate for the sum of squared b_i's to converge).

7 Innovations
ε_t is called the innovation in y_t because ε_t is the part of y_t not predictable from the past history of y_t, i.e., E(ε_t | y_{t-1}, y_{t-2}, …) = 0.
Hence, the forecast (conditional expectation) is
E(y_t | y_{t-1}, y_{t-2}, …) = E(y_t | ε_{t-1}, ε_{t-2}, …)
= E(ε_t + b_1 ε_{t-1} + b_2 ε_{t-2} + … | ε_{t-1}, ε_{t-2}, …)
= E(ε_t | ε_{t-1}, ε_{t-2}, …) + E(b_1 ε_{t-1} + b_2 ε_{t-2} + … | ε_{t-1}, ε_{t-2}, …)
= 0 + (b_1 ε_{t-1} + b_2 ε_{t-2} + …)
= b_1 ε_{t-1} + b_2 ε_{t-2} + …
And the one-step-ahead forecast error is
y_t - E(y_t | y_{t-1}, y_{t-2}, …) = (ε_t + b_1 ε_{t-1} + b_2 ε_{t-2} + …) - (b_1 ε_{t-1} + b_2 ε_{t-2} + …) = ε_t.

8 Mapping Wold to a variety of models
It turns out that the Wold representation can usually be well approximated by a variety of models that can be expressed in terms of a very small number of parameters:
the moving-average (MA) models,
the autoregressive (AR) models, and
the autoregressive moving-average (ARMA) models.

9 Mapping Wold to a variety of models
For example, suppose that the Wold representation has the form
y_t = Σ_{i=0}^{∞} b^i ε_{t-i} for some b with 0 < b < 1 (i.e., b_i = b^i).
Then it can be shown that y_t = b y_{t-1} + ε_t, which is an AR(1) model.

10 Moving Average (MA) Models
If y_t is a (zero-mean) covariance stationary process, then Wold's theorem tells us that y_t can be expressed as a linear combination of current and past values of a white noise process ε_t. That is,
y_t = Σ_{i=0}^{∞} b_i ε_{t-i},
where the ε's are (i) WN(0, σ^2), (ii) b_0 = 1, and (iii) Σ_{i=0}^{∞} b_i^2 < ∞.
Suppose that for some positive integer q, it turns out that b_{q+1}, b_{q+2}, … are all equal to zero. That is, suppose that y_t depends on current and only a finite number of past values of ε:
y_t = ε_t + b_1 ε_{t-1} + … + b_q ε_{t-q}.
This is called a q-th order moving average process, MA(q).

11 Autoregressive Models (AR(p))
In certain circumstances, the Wold form for y_t can be "inverted" into a finite-order autoregressive form, i.e.,
y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t.
This is called a p-th order autoregressive process, AR(p). Note that it has p unknown coefficients: φ_1, …, φ_p.
Note too that the AR(p) model looks like a standard linear regression model with zero-mean, homoskedastic, and serially uncorrelated errors.

12 AR(p): y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t
The coefficients of the AR(p) model of a covariance stationary time series must satisfy the stationarity condition: consider the values of x that solve the equation
1 - φ_1 x - … - φ_p x^p = 0.
These x's must all be greater than 1 in absolute value.
For example, if p = 1 (the AR(1) case), consider the solutions to 1 - φx = 0. The only value of x that satisfies this equation is x = 1/φ, which is greater than one in absolute value if and only if the absolute value of φ is less than one. So |φ| < 1 is the stationarity condition for the AR(1) model.
The condition guarantees that the impact of ε_t on y_{t+τ} decays to zero as τ increases.
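A small sketch of checking this root condition numerically with numpy; the helper function and the example coefficient values are mine, not from the slides.

```python
# Sketch: check the AR(p) stationarity condition by finding the roots of
# 1 - phi_1 x - ... - phi_p x^p and verifying they all lie outside the unit circle.
import numpy as np

def ar_is_stationary(phi):
    phi = np.asarray(phi, dtype=float)
    # numpy.roots wants coefficients ordered from the highest power down to the constant
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots

print(ar_is_stationary([0.6]))        # AR(1) with |phi| < 1 -> stationary
print(ar_is_stationary([0.5, 0.3]))   # AR(2) example -> stationary
print(ar_is_stationary([1.2]))        # |phi| > 1 -> not stationary
```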

13 AR(p): y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t
The autocovariance and autocorrelation functions, γ(τ) and ρ(τ), will be non-zero for all τ. Their exact shapes will depend upon the signs and magnitudes of the AR coefficients, though we know that they decay to zero as τ goes to infinity.
The partial autocorrelation function, p(τ), will be equal to 0 for all τ > p. The exact shape of the pacf for 1 ≤ τ ≤ p will depend on the signs and magnitudes of φ_1, …, φ_p.
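A quick illustration of this ACF/PACF pattern on a simulated AR(2), assuming statsmodels is available; the AR coefficients are made up for the example.

```python
# Sketch: the sample ACF of an AR(2) decays gradually while the sample PACF
# cuts off after lag 2 (series simulated with arbitrary coefficients).
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(2)
T = 500
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

print("ACF :", np.round(acf(y, nlags=6), 2))
print("PACF:", np.round(pacf(y, nlags=6), 2))
```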

14 Approximation
Any invertible MA process may be approximated by an AR(p) process for sufficiently large p, and the residuals will appear to be white noise.
Any covariance stationary AR process may be approximated by an MA(q) process for sufficiently large q, and the residuals will appear to be white noise.
In fact, a stationary AR(p) process can be written exactly as an infinite-order MA process, and an MA(q) process that can be written exactly as an infinite-order AR process is called invertible.

15 Choice of ARMA(p,q) models
Estimate a low-order ARMA model and check the autocorrelation and partial autocorrelation of the residuals. If the model is a good approximation, the residuals should exhibit the properties of white noise in both the autocorrelation and the partial autocorrelation functions.
Estimate ARMA models with various combinations of p and q. Choose the model with the smallest AIC or SIC. When the two criteria conflict, choose the more parsimonious model.
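A rough sketch of this selection step, assuming statsmodels is available (its ARIMA class reports AIC and BIC, where BIC plays the role of the SIC); the data are a simulated AR(1) standing in for a real series.

```python
# Sketch: fit ARMA(p,q) for small p, q and compare AIC and SIC (BIC in statsmodels).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.zeros(300)
for t in range(1, 300):                       # simulated AR(1) stand-in for real data
    y[t] = 0.7 * y[t - 1] + rng.normal()

results = {}
for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (res.aic, res.bic)

best_aic = min(results, key=lambda k: results[k][0])
best_bic = min(results, key=lambda k: results[k][1])
print("AIC picks (p,q) =", best_aic, "  SIC/BIC picks (p,q) =", best_bic)
```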

16 Has the probability structure remained the same throughout the sample?
Check parameter constancy if we are suspicious.
Allow for breaks if they are known.
If we know breaks exist but do not know exactly where the break point is, try to identify the breaks.

17 Assessing Model Stability Using Recursive Estimation and Recursive Residuals
Forecast: if the model's parameters are different during the forecast period than they were during the sample period, then the model we estimated will not be very useful, regardless of how well it was estimated.
Model: if the model's parameters were unstable over the sample period, then the model was not even a good representation of how the series evolved over the sample period.

18 Are the parameters constant over the sample?
Consider the model of Y that combines the trend and AR(p) components into the following form:
Y_t = β_0 + β_1 t + β_2 t^2 + … + β_s t^s + φ_1 Y_{t-1} + … + φ_p Y_{t-p} + ε_t,
where the ε's are WN(0, σ^2). We propose using results from the recursive estimation method to evaluate parameter stability over the sample period t = 1, …, T.
Fit the model (by OLS) for t = p+1, …, T*, using an increasing number of observations in each estimation:
Regression 1: t = p+1, …, 2p+s+1
Regression 2: t = p+1, …, 2p+s+2
Regression 3: t = p+1, …, 2p+s+3
…
Regression T-2p-s: t = p+1, …, T

19 Recursive estimation
The recursive estimation yields, for each T* = 2p+s+1, …, T, a full set of parameter estimates: the estimated trend coefficients and the estimated AR coefficients φ̂_{j,T*}, j = 1, …, p.
If the model is stable over time, then as T* increases the recursive parameter estimates should stabilize at some level.
A model parameter is unstable if it does not appear to stabilize as T* increases, or if there appears to be a sharp break in the behavior of the sequence before and after some T*.
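A minimal sketch of recursive (expanding-window) OLS in plain numpy; the model, the minimum sample size, and the data are all chosen arbitrarily for illustration.

```python
# Sketch of recursive estimation: re-fit y_t = b0 + b1*t + phi*y_{t-1} by OLS on
# expanding samples and watch whether the coefficient paths settle down.
# (Simulated stable data; in the unstable case the paths would drift or jump.)
import numpy as np

rng = np.random.default_rng(4)
T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.02 * t + 0.5 * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(T - 1), np.arange(2, T + 1), y[:-1]])  # 1, t, y_{t-1}
Y = y[1:]

paths = []
for T_star in range(20, T - 1):               # minimum sample size chosen arbitrarily
    beta, *_ = np.linalg.lstsq(X[:T_star], Y[:T_star], rcond=None)
    paths.append(beta)
paths = np.array(paths)
print("last few recursive estimates of phi:", np.round(paths[-5:, 2], 3))
```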

20 Example: when parameters are stable (data plot and plot of recursive parameter estimates)

21 Example: when there is a break in parameters (data plot and plot of recursive parameter estimates)

22 Recursive Residuals and the CUSUM Test
The CUSUM ("cumulative sum") test is often used to test the null hypothesis of model stability, based on the residuals from the recursive estimates.
The CUSUM statistic is calculated for each t. Under the null hypothesis of stability, the statistic follows the CUSUM distribution. If the calculated CUSUM statistics appear too large to have been drawn from the CUSUM distribution, we reject the null hypothesis (of model stability).

23 CUSUM
Let e_{t+1,t} denote the one-step-ahead forecast error associated with forecasting Y_{t+1} based on the model fit over the sample period ending in period t. These are called the recursive residuals:
e_{t+1,t} = Y_{t+1} - Ŷ_{t+1,t},
where the t subscript on the estimated parameters refers to the fact that they were estimated from a sample whose last observation was in period t.

24 CUSUM
Let σ_{1,t} denote the standard error of the one-step-ahead forecast of Y formed at time t, i.e., σ_{1,t} = sqrt(var(e_{t+1,t})).
Define the standardized recursive residuals, w_{t+1,t}, according to w_{t+1,t} = e_{t+1,t} / σ_{1,t}.
Fact: under our maintained assumptions, including model homogeneity, w_{t+1,t} ~ i.i.d. N(0,1).
Note that there will be a set of standardized recursive residuals for each sample.

25 CUSUM
The CUSUM (cumulative sum) statistics are defined as the cumulative sums of the standardized recursive residuals,
CUSUM_t = Σ_{j=k}^{t} w_{j+1,j}, for t = k, k+1, …, T-1,
where k = 2p+s+1 is the minimum sample size for which we can fit the model.
Under the null hypothesis, the CUSUM_t statistic is drawn from a CUSUM(t-k) distribution, which is symmetric and centered at 0, with dispersion that increases as t-k increases.
We reject the null hypothesis at the 5% significance level if CUSUM_t is below the 2.5th percentile or above the 97.5th percentile of the CUSUM(t-k) distribution.
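A rough numpy sketch of the standardized recursive residuals and their cumulative sum for a simple trend regression; the data, the starting sample size, and the model are illustrative assumptions, and the CUSUM critical bounds are not computed here.

```python
# Rough sketch of recursive residuals and the CUSUM statistic for a linear model
# y = X b + e (a simple trend regression on simulated data).
import numpy as np

rng = np.random.default_rng(5)
T = 150
t = np.arange(1, T + 1)
y = 2.0 + 0.1 * t + rng.normal(size=T)
X = np.column_stack([np.ones(T), t])

k0 = 10                                        # first sample size used for estimation
w = []                                         # standardized recursive residuals
for m in range(k0, T):
    Xm, ym = X[:m], y[:m]
    XtX_inv = np.linalg.inv(Xm.T @ Xm)
    b = XtX_inv @ Xm.T @ ym
    resid = ym - Xm @ b
    s2 = resid @ resid / (m - X.shape[1])      # residual variance estimate
    x_next = X[m]
    fe = y[m] - x_next @ b                     # one-step-ahead forecast error
    se = np.sqrt(s2 * (1.0 + x_next @ XtX_inv @ x_next))
    w.append(fe / se)

cusum = np.cumsum(w)                           # CUSUM_t = running sum of the w's
print("CUSUM at end of sample:", round(cusum[-1], 2))
```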

26 Example: when parameters are stable

27 Example: when there is a break in parameters

28 Accounting for a structural break
Suppose it is known that there is a structural break in the trend of a series in 1998, due to the Asian Financial Crisis.

29 Accounting for a structural break
Introduce dummy variables into the regression to jointly estimate the pre-break and post-break trend parameters: β_{0,1}, β_{1,1} (intercept and slope before the break) and β_{0,2}, β_{1,2} (intercept and slope after the break).
Let D_t = 0 if t = 1, …, T_0, and D_t = 1 if t > T_0. Run the regression over the full sample:
y_t = δ_0 + δ_1 D_t + δ_2 t + δ_3 (D_t × t) + ε_t, t = 1, …, T.
Then β_{0,1} = δ_0, β_{1,1} = δ_2, β_{0,2} = δ_0 + δ_1, and β_{1,2} = δ_2 + δ_3.
Suppose we want to allow β_0 to change at T_0 but force β_1 to remain fixed (i.e., a shift in the intercept of the trend line): run the regression of y_t on 1, D_t, and t to estimate δ_0, δ_1, and δ_2 (= β_1).
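A short sketch of the full-break regression above, assuming numpy and statsmodels; the series, the break date T_0, and the break sizes are all invented for the example.

```python
# Sketch: trend regression with a known break at T0, allowing both the intercept
# and the slope of the trend to change. Data and T0 are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, T0 = 120, 70
t = np.arange(1, T + 1)
D = (t > T0).astype(float)                    # 0 before the break, 1 after
y = 5 + 0.3 * t + D * (-10 + 0.2 * (t - T0)) + rng.normal(size=T)  # made-up break

X = np.column_stack([np.ones(T), D, t, D * t])  # regressors: 1, D_t, t, D_t * t
res = sm.OLS(y, X).fit()
print(res.params)     # delta_0, delta_1, delta_2, delta_3 in the slide's notation
```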

30 Linear regression models
Endogenous variable vs. exogenous (explanatory) variables.
The rule, rather than the exception, is that all variables are endogenous.

31 Vector Autoregressions, VAR(p): allowing cross-variable dynamics
A VAR(1) of two variables: the variable vector consists of two elements, and the regressors consist of the variable vector lagged one period only, e.g.,
y_{1,t} = φ_{11} y_{1,t-1} + φ_{12} y_{2,t-1} + ε_{1,t}
y_{2,t} = φ_{21} y_{1,t-1} + φ_{22} y_{2,t-1} + ε_{2,t}
The innovations ε_{1,t} and ε_{2,t} are allowed to be correlated with each other.

32 Estimation of Vector Autoregressions
Run OLS regressions equation by equation. OLS estimation turns out to have very good statistical properties when each equation has the same regressors, as in standard VARs.
Otherwise, a more complicated estimation procedure called seemingly unrelated regression, which explicitly accounts for correlation across equation disturbances, would be needed to obtain estimates with good statistical properties.

33 Forecasting with Vector Autoregressions
Given the parameters (or parameter estimates), forecasts are built up recursively: from the last observations (y_{1,T}, y_{2,T}) we form the one-step-ahead forecasts (ŷ_{1,T+1}, ŷ_{2,T+1}); plugging these back into the system gives (ŷ_{1,T+2}, ŷ_{2,T+2}); plugging those in gives (ŷ_{1,T+3}, ŷ_{2,T+3}); and so on.
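A compact sketch of estimating and forecasting a bivariate VAR(1), assuming statsmodels is available; the coefficient matrix and data are simulated. With identical regressors in every equation, this is numerically the same as running OLS equation by equation.

```python
# Sketch: estimate a bivariate VAR(1) and iterate it forward for forecasts.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
T = 300
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])                     # made-up VAR(1) coefficient matrix
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

res = VAR(y).fit(1)                            # VAR(1) by equation-by-equation OLS
print(res.coefs[0])                            # estimated coefficient matrix
print(res.forecast(y[-1:], steps=3))           # iterate the system forward 3 periods
```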

34 Impulse response functions
With a bivariate autoregression, we can compute four sets of impulse-response functions:
y_1 innovations (ε_{1,t}) on y_1
y_1 innovations (ε_{1,t}) on y_2
y_2 innovations (ε_{2,t}) on y_1
y_2 innovations (ε_{2,t}) on y_2

35 Variance decomposition
How much of the h-step-ahead forecast error variance of variable i is explained by innovations to variable j, for h = 1, 2, …?
With a bivariate autoregression, we can compute four sets of variance decompositions:
y_1 innovations (ε_{1,t}) on y_1
y_1 innovations (ε_{1,t}) on y_2
y_2 innovations (ε_{2,t}) on y_1
y_2 innovations (ε_{2,t}) on y_2
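A sketch of obtaining impulse responses and variance decompositions from a fitted VAR via statsmodels; the data are simulated as in the earlier VAR sketch, and the attribute names used (orth_irfs, decomp) are what I believe the library exposes, so treat them as assumptions.

```python
# Sketch: impulse responses and forecast error variance decompositions
# from a bivariate VAR(1) (simulated data).
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(8)
T = 300
A = np.array([[0.5, 0.1], [0.2, 0.4]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

res = VAR(y).fit(1)
irf = res.irf(10)                    # the four response paths over 10 periods
print(irf.orth_irfs.shape)           # (11, 2, 2): horizon x responding variable x shock
fevd = res.fevd(10)                  # shares of h-step forecast error variance, h=1..10
print(fevd.decomp.shape)             # (2, 10, 2): variable x horizon x shock
```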

36 Assessing optimality with respect to an information set: the Mincer-Zarnowitz regression
Consider the regression y_{t+h} = β_0 + β_1 ŷ_{t+h,t} + u_t. If the forecast ŷ_{t+h,t} is optimal, we should have (β_0, β_1) = (0, 1), that is,
y_{t+h} = 0 + 1·ŷ_{t+h,t} + u_t,
so the forecast error satisfies e_{t+h,t} = y_{t+h} - ŷ_{t+h,t} = 0 + 0·ŷ_{t+h,t} + u_t.
Equivalently, in the regression e_{t+h,t} = y_{t+h} - ŷ_{t+h,t} = γ_0 + γ_1 ŷ_{t+h,t} + u_t, optimality implies (γ_0, γ_1) = (0, 0).
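A small sketch of running a Mincer-Zarnowitz regression with statsmodels; the forecasts and realizations are simulated, and the constraint-string form of f_test (with parameter names "const" and "x1") is an assumption about the library's interface.

```python
# Sketch of a Mincer-Zarnowitz regression: regress realizations on forecasts and
# jointly test (beta_0, beta_1) = (0, 1). Data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
yhat = rng.normal(size=n)                     # stand-in for the forecasts y_{t+h,t}
y = yhat + rng.normal(scale=0.5, size=n)      # realization = forecast + error

X = sm.add_constant(yhat)
res = sm.OLS(y, X).fit()
print(res.params)                             # should be close to (0, 1)
print(res.f_test("const = 0, x1 = 1"))        # joint test of forecast optimality
```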

37 Measures of Forecast Accuracy

38 Measures of Forecast Accuracy

39 Statistical Comparison of Forecast Accuracy

40 Statistical Comparison of Forecast Accuracy
The variance of the mean loss differential involves the sample autocovariances of d at various displacements τ.
Implementation of the test: run a regression of the loss differential on a constant.
See West, Kenneth and Michael W. McCracken (1998): "Regression-Based Tests of Predictive Ability," International Economic Review 39, 817-840.
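A minimal sketch of that implementation: regress the loss differential on a constant with HAC (Newey-West) standard errors to allow for its serial correlation. The forecast errors, the choice of squared-error loss, and the lag truncation are illustrative assumptions.

```python
# Sketch: compare two forecasts by regressing the loss differential
# d_t = L(e_a,t) - L(e_b,t) on a constant, with HAC standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 200
e_a = rng.normal(scale=1.0, size=n)           # forecast errors of model a (simulated)
e_b = rng.normal(scale=1.2, size=n)           # forecast errors of model b (simulated)
d = e_a**2 - e_b**2                           # squared-error loss differential

res = sm.OLS(d, np.ones(n)).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.params[0], res.tvalues[0])          # mean loss differential and its t-stat
```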

41 Forecast Encompassing
Consider the regression of the realization on the two competing forecasts, y_{t+h} = β_a ŷ^a_{t+h,t} + β_b ŷ^b_{t+h,t} + ε_{t+h,t}.
(β_a, β_b) = (1, 0): model a forecast-encompasses model b.
(β_a, β_b) = (0, 1): model b forecast-encompasses model a.
For general values of (β_a, β_b): neither model encompasses the other.
To test forecast encompassing, run the above regression and test the joint hypothesis (β_a, β_b) = (1, 0), or (β_a, β_b) = (0, 1).

42 Forecast combination
Typically, we obtain the combining weights by regressing the realized values on the competing forecasts.
Extensions: allowing for time-varying weights, and allowing for serial correlation in the errors, as in h-step-ahead forecast combinations.
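A short sketch of the basic combining regression with statsmodels; the two forecasts and the realizations are simulated, and including an intercept is one common (but not the only) convention.

```python
# Sketch: combine two forecasts by regressing the realization on both of them;
# the fitted coefficients serve as combining weights.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 200
y = rng.normal(size=n)                            # realizations (simulated)
f_a = y + rng.normal(scale=0.8, size=n)           # forecast from model a (simulated)
f_b = y + rng.normal(scale=1.2, size=n)           # forecast from model b (simulated)

X = sm.add_constant(np.column_stack([f_a, f_b]))
res = sm.OLS(y, X).fit()
print(res.params)          # intercept and combining weights on forecasts a and b
combined = res.predict(X)  # the combined forecast
```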

43 AR(1) vs. Random Walk
AR(1): y_t = b_1 y_{t-1} + ε_t, ε_t ~ WN(0, σ^2), so y_t - b_1 y_{t-1} = ε_t, i.e., (1 - b_1 L) y_t = ε_t.
Random walk: y_t = y_{t-1} + ε_t, ε_t ~ WN(0, σ^2), so y_t - y_{t-1} = ε_t, i.e., (1 - L) y_t = ε_t, or Δy_t = ε_t.
The random walk is I(1): integrated of order 1.

44 Application: Forecast of US GDP per capita
Deterministic or stochastic trend? If we believe that the world is better modelled as an AR process around a deterministic trend, then a series that is well below trend (for example, after a recession) is expected to rebound back toward the trend.

45 ARIMA(p,d,q)
Δ^d y_t is ARMA(p,q). For example,
d = 1: Δy_t = (1 - L) y_t = y_t - y_{t-1}
d = 2: Δ^2 y_t = (1 - L)^2 y_t = Δ(Δy_t) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}

46 Similarity of ARIMA(p,1,q) to a random walk
ARIMA(p,1,q) processes are appropriately made stationary by differencing.
Shocks (ε_t) to ARIMA(p,1,q) processes have permanent effects. Hence, shock persistence means that optimal forecasts, even at very long horizons, do not completely revert to a mean or a trend.
The variance of an ARIMA(p,1,q) process grows without bound as time progresses, so the uncertainty associated with our forecasts grows with the forecast horizon, and the width of our interval forecasts grows without bound with the horizon.

47 Difference or not?

48 ARCH(p) process
Examples:
ARCH(1): σ_t^2 = ω + α_1 ε_{t-1}^2
ARCH(2): σ_t^2 = ω + α_1 ε_{t-1}^2 + α_2 ε_{t-2}^2
Some properties: (1) the unconditional mean, (2) the unconditional variance, (3) the conditional variance.

49 ARCH(1): σ_t^2 = ω + α_1 ε_{t-1}^2
Note that E[ε_t^2] = E[E(ε_t^2 | Ω_{t-1})] = E(σ_t^2) = σ^2, and since E(ε_t) = 0, the unconditional variance E[(ε_t - E(ε_t))^2] equals E[ε_t^2].
Taking unconditional expectations of both sides of the ARCH(1) equation:
E[σ_t^2] = ω + α_1 E[ε_{t-1}^2]
σ^2 = ω + α_1 σ^2
σ^2 = ω / (1 - α_1)
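A quick numpy sketch that simulates an ARCH(1) process (with arbitrary ω and α_1) and checks that its sample variance is close to the unconditional variance ω/(1 - α_1) derived above.

```python
# Sketch: simulate an ARCH(1) process and compare its sample variance with
# the unconditional variance omega / (1 - alpha_1).
import numpy as np

rng = np.random.default_rng(12)
omega, alpha1 = 0.2, 0.5                       # made-up ARCH(1) parameters
T = 100_000
eps = np.zeros(T)
for t in range(1, T):
    sigma2 = omega + alpha1 * eps[t - 1] ** 2  # conditional variance
    eps[t] = np.sqrt(sigma2) * rng.normal()

print("sample variance   :", round(eps.var(), 3))
print("omega/(1-alpha1)  :", round(omega / (1 - alpha1), 3))
```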

50 GARCH(p,q)
Backward substitution on σ_t^2 yields an infinite-order ARCH process with restrictions on the coefficients. (Analogy: an ARMA(p,q) process can be written as an MA(∞) process.)
GARCH can therefore be viewed as a parsimonious way to approximate a high-order ARCH process.

51 Extensions of ARCH and GARCH Models
GARCH-in-Mean (i.e., GARCH-M): high risk, high return; the conditional variance enters the conditional mean regression.

52 Estimating, Forecasting, and Diagnosing GARCH Models
Diagnostics: estimate the model without GARCH in the usual way, then look at the time-series properties of the squared residuals (correlogram, AIC, SIC, etc.). An ARMA(1,1) pattern in the squared residuals implies GARCH(1,1).

53 Estimating, Forecasting, and Diagnosing GARCH Models
Estimation: usually by maximum likelihood under the assumption of a normal distribution; maximum likelihood estimation finds the parameter values that maximize the likelihood function.
Forecasting: in financial applications, volatility forecasts are often of direct interest, e.g., the 1-step-ahead conditional variance. A better volatility forecast also yields a better forecast confidence interval.
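A sketch of GARCH(1,1) estimation by maximum likelihood and a one-step-ahead variance forecast, assuming the third-party `arch` package is installed (its exact interface may differ across versions); the return series is simulated rather than real financial data.

```python
# Sketch: GARCH(1,1) fit by (normal) maximum likelihood and a 1-step-ahead
# conditional variance forecast using the `arch` package (assumed available).
import numpy as np
from arch import arch_model

rng = np.random.default_rng(13)
returns = rng.normal(scale=1.0, size=1000)    # stand-in for demeaned returns

am = arch_model(returns, vol="Garch", p=1, q=1, mean="Zero", dist="normal")
res = am.fit(disp="off")                      # maximum likelihood under normality
print(res.params)                             # omega, alpha[1], beta[1]
fcast = res.forecast(horizon=1)
print(fcast.variance.iloc[-1])                # 1-step-ahead conditional variance
```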

54 Does anything beat a GARCH(1,1) out of sample?
No. So, use GARCH(1,1) if no other information is available.

55 Additional interesting topics / references
Forecasting turning points: Lahiri, Kajal and Geoffrey H. Moore (1991): Leading Economic Indicators: New Approaches and Forecasting Records, Cambridge University Press.
Forecasting cycles: Niemira, Michael P. and Philip A. Klein (1994): Forecasting Financial and Economic Cycles, John Wiley and Sons.

56 Forecasting y_t
Using past values of y_t: ARMA(p,q).
Using other variables x_t: linear regression of y_t on x_t; vector autoregressions of y_t and x_t.
Deterministic elements: trend and seasonality.

57 Forecasting volatility
Stochastic volatility models
GARCH models

58 Probabilistic structure has changed
Regime-switching models
Dummy variables to account for the change in structure

59 Nonlinearity
Regime-switching models
Include nonlinear terms
Threshold models

60 Using models as an approximation of the real world
No one knows what the true model is. Even if we knew the true model, it might require including too many variables, which would not be feasible.

61 End

