# EC220 - Introduction to econometrics (revision lectures 2011)

Christopher Dougherty

Slideshow: stationary processes

Original citation: Dougherty, C. (2011) EC220 - Introduction to econometrics (revision lectures 2011). [Teaching Resource] © 2011 The Author. This version available at: Available in LSE Learning Resources Online: May 2012. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

STATIONARY PROCESSES A very simple example: the AR(1) process $X_t = \beta_2 X_{t-1} + \varepsilon_t$, where $|\beta_2| < 1$ and $\varepsilon_t$ is iid (independently and identically distributed) with zero mean and finite variance.

STATIONARY PROCESSES The figure shows 50 realizations of the process. The AR(1) process is said to be stationary: the distribution of $X_t$ across the different realizations does not depend on time.
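A rough way to see this is to simulate many realizations and compare the cross-sectional mean and variance at different dates. The sketch below is an illustration only: the coefficient 0.8, the N(0,1) innovations, the starting value of zero and the use of 2000 realizations (rather than the 50 in the figure, to make the sample moments steadier) are all assumptions, and `simulate_ar1` is a made-up helper name.

```python
import random
import statistics

def simulate_ar1(beta2, n_periods, x0=0.0, rng=None):
    """One realization X_0..X_T of X_t = beta2 * X_{t-1} + e_t with e_t ~ N(0, 1)."""
    rng = rng or random.Random()
    path = [x0]
    for _ in range(n_periods):
        path.append(beta2 * path[-1] + rng.gauss(0.0, 1.0))
    return path

rng = random.Random(0)
beta2 = 0.8  # assumed value, any |beta2| < 1 would do
paths = [simulate_ar1(beta2, 200, rng=rng) for _ in range(2000)]

def cross_section(t):
    """Values of X_t across all simulated realizations."""
    return [p[t] for p in paths]

# With unit innovation variance the limiting variance is 1 / (1 - beta2^2).
limit_variance = 1.0 / (1.0 - beta2 ** 2)

for t in (50, 100, 200):
    xs = cross_section(t)
    print(t, round(statistics.mean(xs), 2), round(statistics.variance(xs), 2))
```

If the process is stationary, the printed mean should stay near zero and the printed variance near `limit_variance` at every date.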

STATIONARY PROCESSES Conditions for weak stationarity:

1. The population mean of the distribution is independent of time. Intuition: this fails if the process tends to go up or down with time (e.g. if there is a time trend).
2. The population variance of the distribution is independent of time. Intuition: this fails if different realizations of the process would spread out over time (e.g. a random walk).
3. The population covariance between its values at any two time points depends only on the distance between those points, and not on time.

STATIONARY PROCESSES Example: let's check the expectation for an AR(1) process. $\beta_2^t$ tends to zero as $t$ becomes large since $|\beta_2| < 1$. Hence $\beta_2^t X_0$ will tend to zero and the first condition will still be satisfied, apart from initial effects. Technically, we should say that the process is asymptotically stationary (although you still need to check conditions 2 and 3: the variance and covariance will also tend to a limit). Usually we call asymptotically stationary processes stationary.
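The check can be written out by substituting repeatedly for the lagged value. The lines below are a sketch under two assumptions that are not stated in the transcript: the starting value $X_0$ is treated as fixed, and $\sigma_\varepsilon^2$ denotes the variance of the innovations.

```latex
\begin{aligned}
X_t &= \beta_2^{t} X_0 + \beta_2^{t-1}\varepsilon_1 + \dots + \beta_2\varepsilon_{t-1} + \varepsilon_t \\
E(X_t) &= \beta_2^{t} X_0 \;\to\; 0 \\
\operatorname{var}(X_t) &= \left(1 + \beta_2^{2} + \dots + \beta_2^{2(t-1)}\right)\sigma_\varepsilon^2
  = \frac{1-\beta_2^{2t}}{1-\beta_2^{2}}\,\sigma_\varepsilon^2
  \;\to\; \frac{\sigma_\varepsilon^2}{1-\beta_2^{2}} \\
\operatorname{cov}(X_t, X_{t+s}) &= \beta_2^{s}\operatorname{var}(X_t)
  \;\to\; \frac{\beta_2^{s}\,\sigma_\varepsilon^2}{1-\beta_2^{2}}
\end{aligned}
```

All three limits are free of $t$, and the covariance depends only on the distance $s$, which is what the three conditions require.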

NONSTATIONARY PROCESSES
What if we take an AR(1) process with $\beta_2 = 1$? This is a random walk, and it's no longer stationary. The figure shows an example realization of a random walk for the case where $\varepsilon_t$ has a normal distribution with zero mean and unit variance.

NONSTATIONARY PROCESSES
This figure shows the results of a simulation with 50 realizations. The distribution changes as t increases, becoming increasingly spread out, so the process is non-stationary. Its variance depends on time.

NONSTATIONARY PROCESSES
Random walk: the variance of the sum of the innovations is equal to the sum of their individual variances. The covariances are all zero because the innovations are assumed to be generated independently. Hence the population variance of $X_t$ is directly proportional to $t$. The variance of the process depends on time, and does not tend to any finite limit. Note: the shock in each time period is permanently incorporated in the series. There is no tendency for the effects of the shocks to attenuate with time.
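The proportionality of the variance to $t$ can be checked by simulation. This is a minimal sketch, assuming N(0,1) innovations, a starting value of zero and 2000 realizations; none of these settings come from the slides.

```python
import random
import statistics

rng = random.Random(1)
n_paths, n_periods = 2000, 200  # assumed simulation sizes

paths = []
for _ in range(n_paths):
    x, path = 0.0, [0.0]
    for _ in range(n_periods):
        x += rng.gauss(0.0, 1.0)  # X_t = X_{t-1} + e_t
        path.append(x)
    paths.append(path)

# Cross-sectional variance of X_t at selected dates.
variance_at = {t: statistics.variance([p[t] for p in paths]) for t in (50, 100, 200)}

for t, v in variance_at.items():
    print(t, round(v, 1), round(v / t, 2))  # the ratio v / t should stay close to 1
```

With unit innovation variance, the ratio of the cross-sectional variance to $t$ should stay roughly constant, in contrast to the stationary AR(1) case where the variance settles at a finite limit.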

NONSTATIONARY PROCESSES
Random walk with drift: this process is known as a random walk with drift, the drift referring to the systematic change in the expectation from one time period to the next. Now both the expectation and the variance depend on time.
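The slide's equation is not reproduced in the transcript. A sketch of the usual form, assuming the drift is written $\beta_1$ (the symbol used on the following slide), the starting value $X_0$ is fixed, and $\sigma_\varepsilon^2$ is the innovation variance:

```latex
\begin{aligned}
X_t &= \beta_1 + X_{t-1} + \varepsilon_t
     = X_0 + \beta_1 t + \varepsilon_1 + \dots + \varepsilon_t \\
E(X_t) &= X_0 + \beta_1 t \\
\operatorname{var}(X_t) &= t\,\sigma_\varepsilon^2
\end{aligned}
```

The expectation changes by $\beta_1$ each period and the variance grows in proportion to $t$, so conditions 1 and 2 for weak stationarity both fail.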

NONSTATIONARY PROCESSES
The mean is drifting upwards because $\beta_1$ has been taken to be positive. If $\beta_1$ were negative, it would be drifting downwards. And, as in the case of the random walk with no drift, the distribution spreads out around its mean.

NONSTATIONARY PROCESSES
Deterministic trend: another nonstationary time series is one possessing a deterministic time trend. It is nonstationary because the expected value of $X_t$ is not independent of $t$.

NONSTATIONARY PROCESSES
The figure shows 50 realizations of a variation where the disturbance term is the stationary process $u_t = 0.8u_{t-1} + \varepsilon_t$. The underlying trend line is shown in white.

NONSTATIONARY PROCESSES
Two very common forms of non-stationarity: Difference-stationarity, I(1): $X_t$ is not stationary, but $\Delta X_t = X_t - X_{t-1}$ is stationary. Trend-stationarity: $X_t - \beta_1 - \beta_2 t$ is stationary (for some coefficients $\beta_1$ and $\beta_2$).

NONSTATIONARY PROCESSES
Difference-stationarity: if a non-stationary process can be transformed into a stationary process by differencing, it is said to be difference-stationary, integrated of order 1, or I(1). A random walk, with or without drift, is an example.
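As an illustration, differencing a simulated random walk with drift should leave a series that fluctuates around the drift with a stable variance. The drift of 0.5, the N(0,1) innovations and the sample length are assumed values chosen for the sketch.

```python
import random
import statistics

rng = random.Random(4)
drift, n_periods = 0.5, 2000  # hypothetical values for illustration

# Random walk with drift: X_t = drift + X_{t-1} + e_t
x = [0.0]
for _ in range(n_periods):
    x.append(drift + x[-1] + rng.gauss(0.0, 1.0))

# First difference: dX_t = X_t - X_{t-1} = drift + e_t
dx = [x[t] - x[t - 1] for t in range(1, len(x))]

first_half, second_half = dx[:1000], dx[1000:]
print(round(statistics.mean(dx), 2))
print(round(statistics.variance(first_half), 2), round(statistics.variance(second_half), 2))
```

The level series wanders ever further from its start, while the differenced series has roughly the same mean and variance in both halves of the sample.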

NONSTATIONARY PROCESSES
Trend-stationarity: a non-stationary time series is described as being trend-stationary if it can be transformed into a stationary process by extracting a time trend. For example, the very simple model $X_t = \beta_1 + \beta_2 t + \varepsilon_t$ can be detrended by regressing $X_t$ on $t$ and taking the residuals.
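Detrending by regression can be sketched in a few lines. The series below is simulated with an assumed intercept of 10, an assumed trend slope of 0.2 and the AR(1) disturbance $u_t = 0.8u_{t-1} + \varepsilon_t$ from the earlier figure; `detrend` is a hypothetical helper, not a library function.

```python
import random

def detrend(x):
    """Regress x_t on a constant and t = 1..T by OLS; return (residuals, intercept, slope)."""
    n = len(x)
    t = range(1, n + 1)
    t_bar = (n + 1) / 2
    x_bar = sum(x) / n
    slope = (sum((s - t_bar) * (v - x_bar) for s, v in zip(t, x))
             / sum((s - t_bar) ** 2 for s in t))
    intercept = x_bar - slope * t_bar
    residuals = [v - intercept - slope * s for s, v in zip(t, x)]
    return residuals, intercept, slope

rng = random.Random(5)
u, series = 0.0, []
for s in range(1, 501):
    u = 0.8 * u + rng.gauss(0.0, 1.0)   # stationary AR(1) disturbance
    series.append(10.0 + 0.2 * s + u)   # assumed trend line 10 + 0.2 t

detrended, b1, b2 = detrend(series)
print(round(b1, 2), round(b2, 3))
```

The residuals in `detrended` are the detrended variable: the trend has been removed and what is left is an estimate of the stationary disturbance.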

Spurious regressions with variables possessing deterministic trends
What if you regress $Y_t$ on $X_t$? Intuition: the processes X and Y have nothing to do with one another, except for the fact that both tend to increase (or perhaps decrease) with time. For example, perhaps $X_t$ is the number of calories consumed by an average American in year $t$, and $Y_t$ is the proportion of women who vote. Because both of these have increased steadily over the past few decades, a regression of one on the other might lead you to conclude that giving women the right to vote caused Americans to eat bigger meals. This is spurious: both variables are simply increasing with time, which is omitted from the regression.

Spurious regressions with variables possessing deterministic trends
This problem is easy to solve. There are two equivalent options. One option is to detrend the variables, so that you are only looking at fluctuations around the time trend, and regress detrended $Y_t$ on detrended $X_t$. A second option is to regress $Y_t$ on $X_t$ and $t$, including $t$ in your regression directly, since it is acting as an omitted variable.
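A small simulation can illustrate both the problem and the first remedy. Everything numerical here is assumed for the sketch: the two series are unrelated by construction, with trend slopes 0.5 and 0.3 and N(0,1) noise, and `ols` is a hypothetical helper for a one-regressor regression. By the Frisch-Waugh-Lovell theorem, the slope from the detrended regression equals the coefficient of $X_t$ in a regression of $Y_t$ on $X_t$ and $t$, which is why the two options are equivalent.

```python
import random

def ols(x, y):
    """Simple regression of y on a constant and x; return (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, residuals

rng = random.Random(2)
t = list(range(1, 201))
x = [1.0 + 0.5 * s + rng.gauss(0.0, 1.0) for s in t]  # trending, unrelated to y
y = [2.0 + 0.3 * s + rng.gauss(0.0, 1.0) for s in t]  # trending, unrelated to x

# Naive regression of Y on X: picks up the common trend.
_, naive_slope, _ = ols(x, y)

# Detrend each variable on t, then regress the detrended Y on the detrended X.
_, _, x_detrended = ols(t, x)
_, _, y_detrended = ols(t, y)
_, detrended_slope, _ = ols(x_detrended, y_detrended)

print(round(naive_slope, 2), round(detrended_slope, 2))
```

The naive slope should come out close to the ratio of the two trend slopes (0.3 / 0.5), while the detrended slope should be close to zero.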

SPURIOUS REGRESSIONS Spurious regressions with variables that are random walks. What if you regress $Y_t$ on $X_t$? It turns out that two independent random walks, regressed one on the other, are likely to produce a significant slope coefficient, despite the fact that the series are unrelated. Granger and Newbold generated 100 pairs of random walks, each with a sample size of 50 time periods. In 77 of the trials, the slope coefficient was significant at the 5 percent level: a lot of Type I errors! Something is wrong with the OLS estimator when the variables are random walks…

SPURIOUS REGRESSIONS Let's do our own simulation. Our model will be that $Y_t$ is related to $X_t$ and we will regress $Y_t$ on $X_t$ where $Y_t$ and $X_t$ are generated as independent random walks. There is no reason that one should be related to the other. We will examine the distribution of the OLS slope estimator $b_2$. We will see what happens when we test the null hypothesis $H_0: \beta_2 = 0$, knowing that it is true because the series have been generated independently. We will see that the OLS estimator behaves very strangely.
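A simulation of this kind might look like the following sketch. It is not the code behind the slides: the 500 trials, the sample size of 50, the N(0,1) innovations and the two-sided critical value of 2.01 (roughly the 5 percent t value with 48 degrees of freedom) are assumptions, and all function names are made up for the example.

```python
import math
import random

def slope_t_stat(x, y):
    """t statistic for the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b2 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b1 = my - b2 * mx
    rss = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))
    return b2 / math.sqrt(rss / (n - 2) / sxx)

def random_walk(n, rng):
    """Cumulated N(0,1) innovations."""
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def white_noise(n, rng):
    """iid N(0,1) draws, the well-behaved benchmark case."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def rejection_rate(generator, n_trials=500, n=50, crit=2.01, seed=3):
    """Share of trials in which H0: beta2 = 0 is rejected for two independent series."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        x, y = generator(n, rng), generator(n, rng)
        if abs(slope_t_stat(x, y)) > crit:
            rejections += 1
    return rejections / n_trials

rw_rate = rejection_rate(random_walk)
iid_rate = rejection_rate(white_noise)
print(rw_rate, iid_rate)
```

For the iid series the rejection rate should be near the nominal 5 percent; for the random walks it should be far higher, in line with the Granger and Newbold result.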

SPURIOUS REGRESSIONS [Figure: distribution of $b_2$ when $Y_t$ and $X_t$ are both iid N(0,1)]
This is what we would expect to see if the dependent variable is independent of the regressor. As the sample size increases, the distribution of the OLS estimator should collapse to a spike at zero, since the estimator is consistent. This is not what we will see in the case of random walks.

SPURIOUS REGRESSIONS [Figure: distribution of $b_2$ for T = 25, 50, 100, 200 when $Y_t$ and $X_t$ are both random walks]
Look what happens in the random walk regressions. While the distribution of $b_2$ is centred on zero, it refuses to collapse to a spike. The estimator will be unbiased, but inconsistent. Even for very large sample sizes, you're likely to get some estimates that are very far from the true parameter value, which is zero.

SPURIOUS REGRESSIONS
This is the first time that we have seen an estimator that is inconsistent because its distribution fails to collapse to a spike. Other estimators we saw were asymptotically biased (the spike was in the wrong place) or inefficient (the spike collapsed slowly).

SPURIOUS REGRESSIONS True DGP for $Y_t$: $Y_t = Y_{t-1} + \varepsilon_t$. What we did: we fitted $Y_t = \beta_1 + \beta_2 X_t + u_t$.
We've misspecified the model. We omitted $Y_{t-1}$. By omitting $Y_{t-1}$, we have forced it into the disturbance term. Since $Y_t$ is a random walk, this implies that $u_t$ is also a random walk. This invalidates the OLS procedure for calculating standard errors, which assumes that $u_t$ is generated randomly from one observation to the next (no autocorrelation).

SPURIOUS REGRESSIONS
In milder cases of autocorrelation, such as AR(1) errors, the standard errors were wrong, and the variance of the estimator was large, but this variance eventually collapsed to zero. In the case of non-stationary disturbances (in this case, a random walk), the problem is much worse. The OLS estimator can even be inconsistent; the variance is large and stays large. Lesson: if the error term is non-stationary, OLS estimates might be inconsistent.

TEST OF NONSTATIONARITY: INTRODUCTION
General model: $Y_t = \beta_1 + \beta_2 Y_{t-1} + \delta t + \varepsilon_t$. This process is stationary if $-1 < \beta_2 < 1$ and $\delta = 0$. Otherwise it is non-stationary. We will exclude $\beta_2 > 1$ because explosive processes are seldom encountered in economics. We will also exclude $\beta_2 < -1$ because it doesn't make much sense. This will be the motivation for a test for non-stationarity, which is often called a unit root test. We will be more general: in our test we will actually include two lags in the model, and the process is stationary if $-1 < \beta_2 + \beta_3 < 1$.

TEST OF NONSTATIONARITY
Augmented Dickey-Fuller test. Main condition for stationarity: $-1 < \beta_2 + \beta_3 < 1$. The ADF test involves a one-sided t-test on the coefficient of $Y_{t-1}$. The null hypothesis is non-stationarity. Because the regressor is non-stationary under the null, the t-statistic will not have its usual distribution. You need to use Dickey-Fuller critical values. This test has low power: it is very hard to reject the null when the process is stationary but highly autocorrelated.
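The low power can be illustrated with a simplified version of the test. The sketch below is the basic Dickey-Fuller regression with a constant, no trend and no extra lag, so it is not the augmented test described on the slide. The critical value of -2.86 is the approximate large-sample 5 percent Dickey-Fuller value for that case; the sample size, number of trials and AR coefficients are assumptions, and the function names are made up.

```python
import math
import random

def df_t_stat(y):
    """t statistic on Y_{t-1} in the OLS regression dY_t = a + g * Y_{t-1} + e_t."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    mx, my = sum(lagged) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in lagged)
    g = sum((v - mx) * (d - my) for v, d in zip(lagged, dy)) / sxx
    a = my - g * mx
    rss = sum((d - a - g * v) ** 2 for v, d in zip(lagged, dy))
    return g / math.sqrt(rss / (n - 2) / sxx)

def ar1(beta2, n, rng):
    """AR(1) series with N(0,1) innovations; beta2 = 1 gives a random walk."""
    x, path = 0.0, []
    for _ in range(n):
        x = beta2 * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path

DF_CRIT_5PCT = -2.86  # approximate asymptotic value (constant, no trend)

def rejection_rate(beta2, n=100, n_trials=400, seed=6):
    """Share of simulated series for which the unit root null is rejected."""
    rng = random.Random(seed)
    hits = sum(df_t_stat(ar1(beta2, n, rng)) < DF_CRIT_5PCT for _ in range(n_trials))
    return hits / n_trials

size = rejection_rate(1.0)                     # null true: should be near 5 percent
power_high_persistence = rejection_rate(0.95)  # stationary but highly autocorrelated
power_low_persistence = rejection_rate(0.5)    # stationary, mild autocorrelation
print(size, power_high_persistence, power_low_persistence)
```

The gap between the last two numbers is the point of the slide: the test rejects easily when the autocorrelation is mild but struggles when the process is stationary yet close to a unit root.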

COINTEGRATION In general, a linear combination of two or more time series will be nonstationary if one or more of them is nonstationary. However, if there is a long-run relationship between the time series, the outcome may be different. Perhaps, for example, $X_t$ is a random walk, and $Y_t$ just doubles $X_t$: $Y_t = 2X_t$. Then $Y_t$ is also a random walk, but there is a linear combination of $X_t$ and $Y_t$ that is stationary: $Y_t - 2X_t = 0$. This process is identically zero, which is stationary.

COINTEGRATION We've seen that regressing one non-stationary process on another can lead to problems, i.e. spurious regressions. However, if the two processes are cointegrated, the regression has meaning. How can you tell if you've got a meaningful regression? If the processes are cointegrated, the error term $\varepsilon_t$ in the regression will be stationary, because although the two processes depend on time (they are non-stationary), they move together in such a way that their difference does not.

How do we test for co-integration?
If the relationship is known (from theory, for example), you just use a standard unit root test on that relationship. If the cointegrating relationship has to be estimated, the test is an indirect one because it must be performed on the residuals from the regression, rather than on the disturbance term. OLS coefficients minimize the sum of the squares of the residuals, and the mean of the residuals is automatically zero, so the residuals will tend to appear more stationary than the underlying disturbances actually are. To allow for this, we use critical values that are larger in absolute value than those for the standard unit root test, making it harder to reject the null of non-stationarity.

COINTEGRATION Example: permanent consumption and permanent income are cointegrated. Neither is stationary, but their difference is.

COINTEGRATION To test for cointegration, it is necessary to evaluate whether the disturbance term is a stationary process. In the case of the example of consumer expenditure and income, it is sufficient to perform a standard ADF unit root test on the difference between the two series, because the relationship is known.

In the partial adjustment model, there is some long-run "target" relationship between X and Y, but in the short term, the dependent variable adapts smoothly. So, in the short run, $Y_t$ is a weighted average of the target value, $Y_t^*$, and last period's value, $Y_{t-1}$. Intuition: for example, if I lose my job, I should sell my house and move into a studio, but in reality I move into a smaller house, then a few years later an apartment, and eventually a studio, because I don't want to change my lifestyle so suddenly. I will eventually reach the long-run target.
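The slide's equations are not reproduced in the transcript. The standard form of the model can be sketched as follows, where the symbols are assumptions: $\lambda$ (between 0 and 1) is the weight given to the target, and $\beta_1$, $\beta_2$ and $u_t$ belong to the target relationship.

```latex
\begin{aligned}
Y_t^{*} &= \beta_1 + \beta_2 X_t + u_t
  && \text{(target relationship)} \\
Y_t &= \lambda Y_t^{*} + (1-\lambda)\,Y_{t-1}
  && \text{(partial adjustment)} \\
Y_t &= \beta_1\lambda + \beta_2\lambda X_t + (1-\lambda)\,Y_{t-1} + \lambda u_t
  && \text{(equation to be fitted)}
\end{aligned}
```

Setting $Y_t = Y_{t-1}$ in the last line and dropping the disturbance recovers the long-run relationship $Y = \beta_1 + \beta_2 X$, so the short-run coefficient of $X_t$ is $\beta_2\lambda$ and the long-run coefficient is $\beta_2$.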

There are a few problems with fitting this equation. Including a lagged dependent variable introduces finite-sample bias. If the disturbance term in the target relationship is autocorrelated, the disturbance term in the fitted relationship will be autocorrelated too. Combined with the lagged dependent variable, this means that OLS would yield inconsistent estimates, and you should use an AR(1) estimation method instead.

In the case of the adaptive expectations model, Y depends on what we expect X to be next year. For example, the number of employees a firm hires will depend on its expected profits for the coming year. An intelligent firm will adapt its expectation as it obtains new information. The expectation for next year will be a weighted average of this year's actual value and the value that had been expected. We do not observe the firm's expected values. We can substitute repeatedly for the expected value of X next period, resulting in an equation with lagged values of X, enough lags being taken to render negligible the coefficient of the unobservable variable $X^e_{t-s+1}$.

A second option is to express Y as a function of X and lagged Y. The disturbance term is a compound of $u_t$ and $u_{t-1}$. Thus if the disturbance term in the original model satisfies the regression model assumptions (is not autocorrelated), the disturbance term in the regression model will be subject to MA(1) autocorrelation (first-order moving average autocorrelation). If you compare the composite disturbance terms for observations $t$ and $t-1$, you will see that they have a component $u_{t-1}$ in common.
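Both forms of the adaptive expectations model can be sketched as follows. The equations are not in the transcript, so the notation is an assumption: $X^e_{t+1}$ is the value of X expected for next period, $\lambda$ is the weight given to the actual value when the expectation is revised, and $\beta_1$, $\beta_2$, $u_t$ belong to the original model.

```latex
\begin{aligned}
Y_t &= \beta_1 + \beta_2 X^{e}_{t+1} + u_t \\
X^{e}_{t+1} &= \lambda X_t + (1-\lambda)\,X^{e}_{t} \\[4pt]
\text{Repeated substitution:}\quad
Y_t &= \beta_1 + \beta_2\lambda X_t + \beta_2\lambda(1-\lambda)X_{t-1} + \dots
      + \beta_2\lambda(1-\lambda)^{s-1}X_{t-s+1} \\
    &\qquad + \beta_2(1-\lambda)^{s}X^{e}_{t-s+1} + u_t \\[4pt]
\text{Using lagged } Y\text{:}\quad
Y_t &= \beta_1\lambda + (1-\lambda)\,Y_{t-1} + \beta_2\lambda X_t + u_t - (1-\lambda)\,u_{t-1}
\end{aligned}
```

The last line follows from lagging the first equation, solving it for $\beta_2 X^e_t$ and substituting. Its disturbance term $u_t - (1-\lambda)u_{t-1}$ shares the component $u_{t-1}$ with the disturbance term for observation $t-1$, which is the source of the MA(1) autocorrelation.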

The combination of (moving-average) autocorrelation and the presence of the lagged dependent variable in the regression model causes a violation of Assumption C.7: the regressor and the disturbance term are correlated, because $u_{t-1}$ is a component of both $Y_{t-1}$ and the composite disturbance term. OLS estimates will be biased and inconsistent. Under these conditions, the other regression model should be used instead (the one without lagged Y's).

However, look what happens if $u_t$ is AR(1) (instead of i.i.d.). Then the composite disturbance term at time $t$ will be as shown. Now, under reasonable assumptions, both $\rho$ and $\lambda$ should lie between 0 and 1. Hence it is possible that the coefficient of $u_{t-1}$ may be small enough for the autocorrelation to be negligible. If that is the case, OLS could be used to fit the regression model after all. Just perform an h test to check that there is no (significant) autocorrelation.
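The composite disturbance term is not reproduced in the transcript. A sketch of it, assuming the composite term in the lagged-Y form is $u_t - (1-\lambda)u_{t-1}$, where $\lambda$ is the expectations adjustment weight, and that the AR(1) process is $u_t = \rho u_{t-1} + \varepsilon_t$:

```latex
\begin{aligned}
u_t - (1-\lambda)\,u_{t-1}
  &= \rho u_{t-1} + \varepsilon_t - (1-\lambda)\,u_{t-1} \\
  &= \varepsilon_t - (1 - \lambda - \rho)\,u_{t-1}
\end{aligned}
```

When $\lambda + \rho$ is close to 1, the coefficient of $u_{t-1}$ is close to zero and the composite term is approximately the innovation $\varepsilon_t$ alone.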

COMMON FACTOR TEST If the disturbance term u is an AR(1) process, the model can be rewritten with $Y_t$ depending on $X_t$, $Y_{t-1}$, $X_{t-1}$, and a disturbance term $\varepsilon_t$ that is not subject to autocorrelation. This model is nonlinear in parameters since there is a restriction. It can be thought of as a special case of a more general model involving the same variables.

COMMON FACTOR TEST The restricted version imposes an interpretation on the coefficient of $Y_{t-1}$ that may not be valid.
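The restricted and unrestricted models are not reproduced in the transcript. For a single explanatory variable they can be sketched as follows, assuming the original model is $Y_t = \beta_1 + \beta_2 X_t + u_t$ with $u_t = \rho u_{t-1} + \varepsilon_t$, and writing the unrestricted coefficients as $\lambda_1, \dots, \lambda_4$ (an assumed labelling, unrelated to the adjustment weight in the dynamic models above).

```latex
\begin{aligned}
\text{Restricted model:}\quad
Y_t &= \beta_1(1-\rho) + \rho\,Y_{t-1} + \beta_2 X_t - \beta_2\rho\,X_{t-1} + \varepsilon_t \\
\text{Unrestricted model:}\quad
Y_t &= \lambda_1 + \lambda_2 Y_{t-1} + \lambda_3 X_t + \lambda_4 X_{t-1} + \varepsilon_t \\
\text{Restriction:}\quad
\lambda_4 &= -\lambda_2\lambda_3
\end{aligned}
```

The restricted model comes from subtracting $\rho$ times the lagged original equation from the original equation. The coefficient of $X_{t-1}$ is then forced to equal minus the product of the coefficients of $Y_{t-1}$ and $X_t$, which is the nonlinear restriction.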

COMMON FACTOR TEST If the original specification has two explanatory variables, the AR(1) special case is again a restricted version of a more general model. In this case, however, the restricted version incorporates two restrictions.

COMMON FACTOR TEST One can, and one should, test the validity of the restrictions. The test is known as the common factor test.

COMMON FACTOR TEST The test involves a comparison of $RSS_R$ and $RSS_U$, the residual sums of squares in the restricted and unrestricted specifications.

COMMON FACTOR TEST $RSS_R$ can never be smaller than $RSS_U$ and it will in practice be greater, because imposing a restriction in general leads to some loss of goodness of fit. The question is whether the loss of goodness of fit is significant.

COMMON FACTOR TEST Because the restrictions are nonlinear, the F test is inappropriate. Instead, we construct a test statistic from $RSS_R$ and $RSS_U$.

COMMON FACTOR TEST Under the null hypothesis that the restrictions are valid, the test statistic has a $\chi^2$ (chi-squared) distribution with degrees of freedom equal to the number of restrictions. It is in principle a large-sample test.
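The test statistic itself is not reproduced in the transcript. The form usually given for this test is $n \log(RSS_R / RSS_U)$, where $n$ is the number of observations; the sketch below assumes that form. The sample size and residual sums of squares are hypothetical numbers chosen only to show the arithmetic, and `common_factor_statistic` is a made-up helper name.

```python
import math

# Standard 5 percent critical values of chi-squared, keyed by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.84, 2: 5.99}

def common_factor_statistic(n, rss_r, rss_u):
    """Large-sample statistic n * log(RSS_R / RSS_U), assumed form of the test."""
    if rss_r < rss_u:
        raise ValueError("RSS_R cannot be smaller than RSS_U")
    return n * math.log(rss_r / rss_u)

# Hypothetical inputs: 36 observations, one restriction (one explanatory variable).
n, rss_r, rss_u = 36, 0.0025, 0.0020
stat = common_factor_statistic(n, rss_r, rss_u)
reject = stat > CHI2_CRIT_5PCT[1]
print(round(stat, 2), reject)
```

A statistic above the critical value means the loss of fit from imposing the restriction is significant, so the AR(1) specification is rejected in favour of the unrestricted model.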

COMMON FACTOR TEST Note that by fitting these models we claim to rid our specification of autocorrelation. The h test should confirm this.