Autocorrelation in Regression Analysis: Tests for Autocorrelation; Examples; Durbin-Watson Tests; Modeling Autoregressive Relationships
What causes autocorrelation? Misspecification; data manipulation (before receipt or after receipt); event inertia; spatial ordering.
Checking for Autocorrelation: Test: Durbin-Watson statistic
|    Positive     |   Zone of   |       No        |   Zone of   |    Negative     |
| autocorrelation | indecision  | autocorrelation | indecision  | autocorrelation |
|_________________|_____________|________|________|_____________|_________________|
0              d-lower       d-upper     2    4-d-upper     4-d-lower             4
Below d-lower or above 4-d-lower: autocorrelation is clearly evident.
In either zone of indecision: ambiguous; cannot rule out autocorrelation.
Between d-upper and 4-d-upper: autocorrelation is not evident.
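As a rough illustration of what the Durbin-Watson statistic measures, the sketch below computes d as the sum of squared successive differences of the residuals divided by the sum of squared residuals. It is written in plain Python rather than Stata; the function name `durbin_watson` and the residual values are made up for illustration and are not from the lecture's data.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residuals, ordered in time (illustration only).
trending = [1.0, 0.8, 0.6, 0.3, -0.2, -0.5, -0.9, -1.1]     # long same-sign runs
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period

print(durbin_watson(trending))     # well below 2: points toward positive autocorrelation
print(durbin_watson(alternating))  # well above 2: points toward negative autocorrelation
```

Values near 2 fall in the middle zone of the scale; values near 0 or near 4 fall toward the positive or negative extremes.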
Consider the following regression: Because this is time series data, we should consider the possibility of autocorrelation. To run the Durbin-Watson test, we first have to specify the data as time series with the tsset command. Next we use the dwstat command (estat dwatson in newer versions of Stata).
[Stata output: regression of price on ice and quantity; F(2, 325), 328 observations. Numeric values not preserved.]
[Stata output: Durbin-Watson d-statistic(3, 328). Value not preserved.]
Find the d-upper and d-lower: Check a Durbin-Watson table for the values of d-upper and d-lower. For n = 20, k = 2, and α = .05 the values are:
–Lower = [value not preserved]
–Upper = [value not preserved]
[Stata output: Durbin's alternative test for autocorrelation (H0: no serial correlation), reporting lags(p), chi2, df, and Prob > chi2. Numeric values not preserved.]
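Once d-lower and d-upper have been looked up, deciding which zone a d-statistic falls in is mechanical. The sketch below shows that decision rule in Python; the function name `dw_zone` is hypothetical, and the bounds 1.0 and 1.5 are placeholders chosen for readability, not values from a Durbin-Watson table.

```python
def dw_zone(d, d_lower, d_upper):
    """Classify a Durbin-Watson d-statistic using table bounds d_lower and d_upper."""
    if not 0 <= d <= 4:
        raise ValueError("d must lie between 0 and 4")
    if not 0 < d_lower <= d_upper < 2:
        raise ValueError("expected 0 < d_lower <= d_upper < 2")
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "indecision"
    if d < 4 - d_upper:
        return "no autocorrelation evident"
    if d <= 4 - d_lower:
        return "indecision"
    return "negative autocorrelation"


# Placeholder bounds (NOT table values), used only to exercise each zone.
for d in (0.5, 1.2, 2.0, 2.7, 3.5):
    print(d, dw_zone(d, d_lower=1.0, d_upper=1.5))
```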
Alternatives to the d-statistic: The d-statistic is not valid in models with a lagged dependent variable.
–In the case of a lagged LHS variable you must use the Durbin-a test, that is, Durbin's alternative test (the command is durbina in Stata; estat durbinalt in newer versions).
Also, the d-statistic tests only for first-order autocorrelation. For other orders you may use the Durbin-a test.
–Why would you suspect other than first-order autocorrelation?
The Runs Test: An alternative to the D-W test is a formalized examination of the signs of the residuals. In the absence of autocorrelation, we would expect the signs of the residuals to be random. The first step is to estimate the model and predict the residuals.
Runs continued: Next, order the signs of the residuals against time (or against the spatial ordering in the case of cross-sectional data) and see whether there are excessive “runs” of positives or negatives. Alternatively, you can graph the residuals and look for the same patterns.
Runs test continued: The final step is to compare the observed number of runs with its expected mean and standard deviation in a standard significance test. Stata does this automatically with the runtest command!
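The three steps of the runs test can be sketched in a few lines of Python. This is a minimal version using the usual normal approximation for the number of runs, with residual signs taken around zero; the function name `runs_test` and the residuals are hypothetical, and Stata's runtest may differ in details such as the threshold used to split the series.

```python
import math


def runs_test(residuals):
    """Return (observed runs, expected runs, z) for the signs of the residuals."""
    signs = [e > 0 for e in residuals if e != 0]  # exact zeros are dropped
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")
    n = n_pos + n_neg
    # A new run starts whenever the sign changes from one observation to the next.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("too few observations for the approximation")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical residuals: ten positives followed by ten negatives (only two runs).
residuals = [0.5] * 10 + [-0.5] * 10
runs, expected, z = runs_test(residuals)
print(runs, expected, z)  # far fewer runs than expected, so z is strongly negative
```

Too few runs (a large negative z) suggests positive autocorrelation; too many runs (a large positive z) suggests negative autocorrelation.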
Visual diagnosis of autocorrelation (in a single series): A correlogram is a good tool for identifying whether a series is autocorrelated.
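A correlogram simply plots the sample autocorrelation of a series at successive lags. The sketch below computes those autocorrelations in Python and prints a crude text version of the plot; the function name `autocorrelations` and the example series are assumptions made for illustration.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a single series."""
    n = len(series)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant")
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, n)) / denominator
        for k in range(1, max_lag + 1)
    ]


# Hypothetical steadily trending series (illustration only).
series = [1, 2, 3, 4, 5, 6, 7, 8]
for lag, r in enumerate(autocorrelations(series, max_lag=3), start=1):
    bar = "#" * round(abs(r) * 20)
    sign = "+" if r >= 0 else "-"
    print(f"lag {lag}: {r:6.3f} {sign}{bar}")
```

Bars that die out slowly across lags are the typical visual signature of an autocorrelated series.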
Dealing with autocorrelation: D-W is not appropriate for autoregressive (AR) models, in which the dependent variable depends on its own lagged values. In this case, we use the Durbin alternative test. For AR models, we need to explicitly estimate the correlation between Y_i and Y_{i-1} as a model parameter. Techniques: AR1 models (closest to regression; first order only); ARIMA (any order).
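To make "estimate the correlation between Y_i and Y_{i-1} as a model parameter" concrete, the sketch below fits a first-order autoregression by regressing each value on the previous one with ordinary least squares. This is a simplified conditional least squares version, not the maximum likelihood routine a full ARIMA command would typically use; the function name `fit_ar1` and the series are hypothetical.

```python
def fit_ar1(series):
    """Fit y_t = const + phi * y_{t-1} by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    current = series[1:]    # y_t
    previous = series[:-1]  # y_{t-1}
    n = len(current)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((x - mean_prev) ** 2 for x in previous)
    if sxx == 0:
        raise ValueError("lagged series is constant")
    sxy = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(previous, current))
    phi = sxy / sxx
    const = mean_curr - phi * mean_prev
    return const, phi


# Hypothetical series built from y_t = 2 + 0.5 * y_{t-1} with no noise.
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
const, phi = fit_ar1(series)
print(const, phi)  # recovers the constant and the autoregressive coefficient
```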
Dealing with Autocorrelation: There are several approaches to resolving problems of autocorrelation.
–Lagged dependent variables
–Differencing the dependent variable
–GLS
–ARIMA
Lagged dependent variables: The most common solution.
–Simply create a new variable that equals Y at t-1, and use it as a RHS variable.
To do this in Stata, use the generate command with the new variable equal to L.variable:
–gen lagy = L.y
–gen laglagy = L2.y
This correction should be based on a theoretical belief about the specification. It may cause more problems than it solves, and it also costs a degree of freedom (a lost observation).
–There are several advanced techniques for dealing with this as well.
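The lost observation is easy to see when the lag is built by hand. The Python sketch below mimics what a lag operator does to a time-ordered list, using None for the values that have no predecessor; the function name `lag` and the numbers are made up for illustration.

```python
def lag(series, k=1):
    """Shift a time-ordered series back by k periods, padding the start with None."""
    if not 0 <= k <= len(series):
        raise ValueError("k must be between 0 and len(series)")
    return [None] * k + list(series[: len(series) - k])


# Hypothetical dependent variable, ordered in time.
y = [10, 12, 15, 11, 14]
lagy = lag(y)        # plays the role of L.y
laglagy = lag(y, 2)  # plays the role of L2.y

# Only rows with no missing lag can enter the regression.
usable = [(a, b, c) for a, b, c in zip(y, lagy, laglagy) if b is not None and c is not None]
print(lagy)
print(laglagy)
print(len(y) - len(usable), "observations lost")
```

Each additional lag used as a regressor removes one more observation from the start of the sample.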
Differencing: Differencing is simply the act of subtracting the previous observation's value from the current observation. To do this in Stata, again use the generate command, with a capital D (instead of the L for lags).
–This process is effective; however, it is an EXPENSIVE correction.
–This technique “throws away” long-term trends.
–It assumes that rho = 1 exactly.
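A short Python sketch shows both what differencing computes and why it "throws away" a long-term trend. The function name `difference` and the series are hypothetical; None marks the first observation, which has no predecessor.

```python
def difference(series):
    """First difference: current value minus previous value (first entry is missing)."""
    return [None] + [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical series with a steady upward trend of 3 per period.
y = [100, 103, 106, 109, 112]
dy = difference(y)
print(dy)  # the trend in levels becomes a constant in differences
```

Subtracting the whole previous value is the same as subtracting rho times the previous value with rho fixed at 1, which is why the correction is only exact when rho = 1.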
GLS and ARIMA: GLS approaches use maximum likelihood to estimate rho and correct the model.
–These are good corrections, and can be replicated in OLS.
ARIMA is an acronym for Autoregressive Integrated Moving Average.
–This process is a univariate “filter” used to cleanse variables of a variety of pathologies before analysis.
Corrections based on Rho: There are several ways to estimate rho, the simplest being to calculate it from the residuals. We then transform the dependent variable and the regressors using the estimated rho and estimate the regression on the transformed variables.
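The slide's own equations are not available here, so the sketch below assumes the standard quasi-differencing approach: estimate rho by regressing each residual on the previous one (no intercept), then form ystar_t = y_t - rho * y_{t-1} and xstar_t = x_t - rho * x_{t-1}, and regress ystar on xstar. Under that assumption the intercept of the transformed regression equals the original intercept times (1 - rho). The function names and numbers are hypothetical.

```python
def estimate_rho(residuals):
    """Slope from regressing e_t on e_{t-1} without an intercept."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    denominator = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    if denominator == 0:
        raise ValueError("lagged residuals are all zero")
    return numerator / denominator


def quasi_difference(series, rho):
    """Return series_t - rho * series_{t-1}; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Hypothetical OLS residuals and dependent variable (illustration only).
residuals = [2.0, 1.0, 0.5, 0.25, 0.125]
y = [10.0, 12.0, 15.0, 11.0]

rho = estimate_rho(residuals)
ystar = quasi_difference(y, rho)
print(rho)
print(ystar)  # one element shorter than y
```

The same `quasi_difference` call would be applied to every regressor before re-running the regression on the starred variables.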
High tech solutions: Stata also offers the option of estimating the model with an AR(1) correction (with multiple ways of estimating rho). There is also what is known as a Prais-Winsten regression, which generates values for the lost observation. For the truly adventurous, there is also the option of doing a full ARIMA model.
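What "generates values for the lost observation" means can be shown with the usual Prais-Winsten transformation: the first observation is kept by scaling it by the square root of (1 - rho squared), while every later observation is quasi-differenced. The Python sketch below assumes that textbook form; the function name `prais_winsten_transform`, the value of rho, and the data are hypothetical.

```python
import math


def prais_winsten_transform(series, rho):
    """Quasi-difference a series but keep a rescaled first observation."""
    if not -1 < rho < 1:
        raise ValueError("rho must be strictly between -1 and 1")
    if not series:
        raise ValueError("series is empty")
    first = math.sqrt(1 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Hypothetical dependent variable and rho (illustration only).
y = [10.0, 12.0, 15.0, 11.0]
ystar = prais_winsten_transform(y, rho=0.6)
print(ystar)  # same length as y: no observation is dropped
```

The same transformation would be applied to each regressor, including the constant, before estimating the regression on the transformed data.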
Prais-Winsten regression: [Stata output: Prais-Winsten AR(1) regression, iterated estimates, of price on ice and quantity; F(2, 325). The output reports the estimated rho and the Durbin-Watson statistic for the original and the transformed model. Numeric values not preserved.]
ARIMA: The ARIMA model allows us to test the hypothesis of autocorrelation and remove it from the data. This is an iterative process akin to the purging we did when creating the ystar variable.
The model: [Stata output: ARIMA regression of price with a first-order autoregressive (ar L1.) term; sample 1 to 328, 328 observations; Wald chi2(1). Slide annotations label the ar L1. coefficient as the "significant lag" and the "estimate of rho". Numeric values not preserved.]
The residuals of the ARIMA model: There are a few significant lags a ways back. Generally we should expect some, but this mess is probably an indicator of a seasonal trend (well beyond the scope of this lecture)!
ARIMA with a covariate: [Stata output: ARIMA regression of price on ice and quantity with a first-order autoregressive (ar L1.) term; sample 1 to 328, 328 observations; Wald chi2(3). Numeric values not preserved.]
Final thoughts: Each correction has a “best” application.
–If we wanted to evaluate a mean shift (a dummy-variable-only model), calculating rho would not be a good choice. We would instead want to use the lagged dependent variable.
–Also, where we want to test the effect of inertia, it is probably better to use the lag.
Final Thoughts Continued:
–In small N, calculating rho tends to be more accurate.
–ARIMA is one of the best options; however, it is very complicated!
–When dealing with time, the number of time periods and the spacing of the observations are VERY IMPORTANT!
–When using estimates of rho, a good rule of thumb is to make sure you have [number not preserved] time points at a minimum, and more if the observations are too close together for the process you are observing!
Next Time: Review for Exam
–Plenary session
Exam Posting
–Available after class Wednesday