

Model specification (identification)

We already know about the sample autocorrelation function (SAC):

r_k = Σ_{t=k+1}^{n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²

Properties:
- Not unbiased (since it is a ratio between two random variables)
- The bias decreases with n
- The variance is complicated; it is common to use general large-sample results
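As a sketch, the SAC can be computed directly from its definition in pure Python (the function name `sample_acf` is illustrative, not from the slides):

```python
# Sample autocorrelation function (SAC):
# r_k = sum_{t=k+1}^n (Y_t - Ybar)(Y_{t-k} - Ybar) / sum_{t=1}^n (Y_t - Ybar)^2
def sample_acf(y, max_lag):
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    return [sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

print(sample_acf([1, 2, 3, 4], 1))   # [0.25]
```

Note that the denominator always sums over all n observations, which is one reason r_k is not an unbiased estimator of ρ_k.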

Large-sample results (asymptotics):

For large n, the random vector √n·(r_1 − ρ_1, …, r_m − ρ_m) has an approximate multivariate normal distribution with zero mean vector and covariance matrix (c_ij), where (Bartlett's formula)

c_ij = Σ_{k=−∞}^{∞} (ρ_{k+i}·ρ_{k+j} + ρ_{k−i}·ρ_{k+j} − 2·ρ_i·ρ_k·ρ_{k+j} − 2·ρ_j·ρ_k·ρ_{k+i} + 2·ρ_i·ρ_j·ρ_k²)

This gives that
- Var(r_k) ≈ c_kk / n → 0 as n → ∞
- Corr(r_k, r_j) ≈ c_kj / √(c_kk·c_jj) does not diminish as n → ∞
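Bartlett's covariance formula can be checked numerically by truncating the infinite sum. As a sketch (`bartlett_c` is an illustrative name), for an AR(1) with φ = 0.5 it reproduces the known limit Var(r_1) ≈ (1 − φ²)/n:

```python
# Bartlett's large-sample covariance c_ij for sample autocorrelations,
# evaluated by truncating the infinite sum; rho is the theoretical
# autocorrelation function of the assumed model.
def bartlett_c(i, j, rho, kmax=200):
    total = 0.0
    for k in range(-kmax, kmax + 1):
        total += (rho(k + i) * rho(k + j)
                  + rho(k - i) * rho(k + j)
                  - 2 * rho(i) * rho(k) * rho(k + j)
                  - 2 * rho(j) * rho(k) * rho(k + i)
                  + 2 * rho(i) * rho(j) * rho(k) ** 2)
    return total

# AR(1) with phi = 0.5: rho_k = phi^|k|, so c_11 should equal 1 - phi^2 = 0.75
phi = 0.5
rho = lambda k: phi ** abs(k)
print(bartlett_c(1, 1, rho))   # approx 0.75, i.e. Var(r_1) ~ (1 - phi^2)/n
```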

Hence, the distribution of r_k will depend on the correlation structure of Y_t and accordingly on the model behind it (i.e. whether it is an AR(1), an ARMA(2,1), etc.)

For an AR(1), i.e. Y_t = φ·Y_{t−1} + e_t:

Var(r_1) ≈ (1 − φ²)/n, and Var(r_k) ≈ (1/n)·(1 + φ²)/(1 − φ²) for large k, i.e. not dependent on k for large lags

For an MA(q):

Var(r_k) ≈ (1/n)·(1 + 2·(ρ_1² + … + ρ_q²)) for k > q, i.e. not dependent on k after the qth lag

For white noise:

Var(r_k) ≈ 1/n
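The MA(q) variance result is what gives the usual significance limits for the SAC plot. A minimal sketch (the function name is illustrative):

```python
import math

# Large-sample standard error of r_k for lags k > q under an MA(q) model:
# sqrt((1 + 2 * sum_{j=1}^q rho_j^2) / n).
# With q = 0 (white noise) this reduces to 1/sqrt(n).
def sac_stderr_ma(rhos, n):
    return math.sqrt((1 + 2 * sum(r * r for r in rhos)) / n)

print(sac_stderr_ma([], 100))   # white noise, n = 100: 1/sqrt(100) = 0.1
```

Comparing r_k to ±2 times this standard error is the usual informal test of whether the SAC has "died out" after lag q.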

Partial autocorrelation function

Describes the "specific" part of the correlation between Y_t and Y_{t−k} that is not due to successive serial correlations through the intermediate variables Y_{t−1}, Y_{t−2}, …, Y_{t−k+1}. Partial correlations are used for other types of data as well (for instance in linear models for cross-sectional data).

Patterns:
- For an AR(p)-process, φ_kk cuts off after lag p (i.e. the same type of behaviour as ρ_k has for an MA(q)-process)
- For an MA(q)-process, φ_kk shows approximately the same pattern as ρ_k does for an AR(p)-process

Estimation from data, Sample Partial Autocorrelation function (SPAC):

There is no explicit formula; the estimates have to be computed recursively.

Properties of SPAC: More involved, but for an AR(p)-process the SPAC-values for lags greater than p are approximately normally distributed with zero mean and variance 1/n.
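The recursive estimation mentioned above can be sketched with a Durbin-Levinson-type recursion that turns sample autocorrelations r_1, …, r_m into sample partial autocorrelations (an assumed implementation, for illustration):

```python
# Sample partial autocorrelations phi_kk from sample autocorrelations.
# Recursion: phi_11 = r_1,
#   phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j),
#   phi_kj = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}.
def sample_pacf(r):
    pacf = [r[0]]
    phi_prev = [r[0]]                         # row phi_{1,.}
    for k in range(2, len(r) + 1):
        num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        phi_prev = phi
        pacf.append(phi_kk)
    return pacf

# For theoretical AR(1) autocorrelations rho_k = 0.6^k the PACF cuts off after lag 1
print(sample_pacf([0.6, 0.36, 0.216]))   # [0.6, 0.0, 0.0]
```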

Extended Autocorrelation function (EACF)

One (of several) tools to improve the choice of orders for ARMA(p, q)-processes. Very clear as a theoretical function, but noisy when estimated on series that are not very long.

AR, MA or ARMA?

No pattern at all?

EACF table for Y

AR/MA  0 1 2 3 4 5 6 7 8 9 10 11 12 13
  0    o o o o o o o o o o o  o  o  o
  1    o o o o o o o o o o o  o  o  o
  2    x o o o o o o o o o o  o  o  o
  3    o o x o o o o o o o o  o  o  o
  4    x o o o o o o o o o o  o  o  o
  5    x x o o o o o o o o o  o  o  o
  6    o x o o o x o o o o o  o  o  o
  7    o o o o o x o o o o o  o  o  o

ARMA(0,0) or ARMA(1,0)?

True process: Y_t = 1.3 + 0.2·Y_{t−1} + e_t − 0.1·e_{t−1}
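The true process behind the table can be simulated to see how such an EACF would be estimated in practice. A minimal sketch (function name, seed and the N(0,1) innovations are assumptions for illustration):

```python
import random

# Simulate Y_t = 1.3 + 0.2 * Y_{t-1} + e_t - 0.1 * e_{t-1}, e_t ~ N(0, 1)
def simulate_arma11(n, const=1.3, phi=0.2, theta=0.1, seed=1):
    rng = random.Random(seed)
    y = []
    e_prev = rng.gauss(0, 1)
    y_prev = const / (1 - phi)        # start at the process mean 1.625
    for _ in range(n):
        e = rng.gauss(0, 1)
        y_t = const + phi * y_prev + e - theta * e_prev
        y.append(y_t)
        y_prev, e_prev = y_t, e
    return y

y = simulate_arma11(5000)
print(sum(y) / len(y))   # close to the process mean 1.3 / (1 - 0.2) = 1.625
```

With φ = 0.2 and θ = 0.1 both small, the sample ACF and PACF of such a series die out almost immediately, which is exactly why the table is ambiguous between ARMA(0,0) and ARMA(1,0).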

Model selection from more analytical tools

Dickey-Fuller Unit-Root test

H_0: The process Y_t is difference non-stationary (∇Y_t is stationary)
H_a: The process Y_t is stationary

Augmented Dickey-Fuller (ADF) setup: write the model as

∇Y_t = a·Y_{t−1} + b_1·∇Y_{t−1} + … + b_k·∇Y_{t−k} + e_t

If ρ = 1 (difference non-stationary) → a = ρ − 1 = 0

Fit the model and test H_0: a = 0 (difference non-stationary) vs. H_a: a < 0 (stationary) using the test statistic t = â / se(â). However, this statistic is not t-distributed under H_0. A different sampling distribution has been derived and tabulated (and is programmed in R).
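As a sketch, the simplest version of this regression (non-augmented, no intercept, no lagged differences — a simplification of the ADF setup above, for illustration only) and its test statistic:

```python
import math

# Simplified Dickey-Fuller regression: fit diff(Y_t) = a * Y_{t-1} + e_t
# by least squares and return the t-ratio for H0: a = 0.
# NOTE: under H0 this ratio does NOT follow a t distribution; the
# Dickey-Fuller tables (as implemented in R) must be used instead.
def df_tstat(y):
    x = y[:-1]                                        # Y_{t-1}
    d = [y[t] - y[t - 1] for t in range(1, len(y))]   # diff(Y_t)
    sxx = sum(v * v for v in x)
    a = sum(xi * di for xi, di in zip(x, d)) / sxx    # OLS slope
    resid = [di - a * xi for xi, di in zip(x, d)]
    s2 = sum(r * r for r in resid) / (len(d) - 1)     # residual variance
    return a / math.sqrt(s2 / sxx)

# A strongly mean-reverting series gives a large negative statistic
print(df_tstat([1, -1, 2, -2, 1, -1, 2, -2]))
```

For a random walk (ρ = 1, so a = 0) the statistic tends to be near zero, while a stationary series pulls it negative; the rejection region is in the left tail.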

Akaike's criteria

For an ARMA(p,q)-process, let k = p + q + 1 and find the values of the parameters p, q, φ_1, …, φ_p, θ_1, …, θ_q of the model that minimize

AIC = −2·log{max L(p, q, φ_1, …, φ_p, θ_1, …, θ_q)} + 2·k

[Akaike's Information Criterion]. Works well when the true process has (at least one part of) infinite order.

BIC = −2·log{max L(p, q, φ_1, …, φ_p, θ_1, …, θ_q)} + k·log(n)

[(Schwarz) Bayesian Information Criterion]. Works well when we "know" that the true process is a finite-order ARMA(p,q).
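The two criteria above are trivial to compute once the maximized log-likelihood is available (here `loglik` is assumed to come from whatever fitting routine is used):

```python
import math

# AIC and BIC as defined above: loglik is the maximized log-likelihood,
# k = p + q + 1 the number of estimated parameters, n the series length.
def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

print(aic(-100.0, 3))        # 206.0
print(bic(-100.0, 3, 100))   # 200 + 3*log(100)
```

Since log(n) > 2 already for n ≥ 8, BIC penalizes extra parameters harder than AIC and therefore tends to pick smaller (finite-order) models.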