Checking for Autocorrelation

Test: Durbin-Watson statistic

 Positive        Zone of               No autocorrelation        Zone of         Negative
 autocorrelation indecision                                      indecision      autocorrelation
|_______________|_______________|_______________|_______________|_______________|_______________|
0            d-lower         d-upper            2          4 - d-upper     4 - d-lower          4

- d below d-lower (or above 4 - d-lower): autocorrelation is clearly evident
- d inside a zone of indecision: ambiguous; cannot rule out autocorrelation
- d between d-upper and 4 - d-upper: autocorrelation is not evident
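For reference, the statistic behind this diagram is usually written as below. The notation is an assumption, not taken from the slides: $e_t$ is the OLS residual at time $t$, $n$ is the number of observations, and $\hat{\rho}$ is the first-order sample autocorrelation of the residuals.

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2\,(1 - \hat{\rho})
```

The approximation explains the scale: with $\hat{\rho}$ near 0, d is near 2; with strong positive autocorrelation d approaches 0; with strong negative autocorrelation d approaches 4.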
Consider the following regression:

[Stata regression output: price regressed on ice and quantity, with a constant; F(2, 325). Numeric values not preserved.]

Because this is time series data, we should consider the possibility of autocorrelation. To run the Durbin-Watson test, we first have to declare the data as time series with the tsset command. Next we use the dwstat command.

Durbin-Watson d-statistic(3, 328) =
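The sequence described above might look like the following sketch. The time variable name `time` is an assumption (the slides do not name it); the other variable names come from the output on the slide.

```stata
* Declare the data as time series; "time" is a hypothetical variable name.
* If the data have no time index, "gen time = _n" can stand in for one.
tsset time

* The regression shown on the slide
regress price ice quantity

* Durbin-Watson d-statistic (dwstat is the older name for this test)
estat dwatson
```

The reported d is then compared with d-lower and d-upper from a Durbin-Watson table.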
Find the d-upper and d-lower

Check a Durbin-Watson table for the values of d-upper and d-lower.
For n = 20 and k = 2, α = .05, the values are:
Lower = 1.643
Upper = 1.704

Durbin's alternative test for autocorrelation
[Stata output: one row for lags(p) = 1, with columns chi2, df, and Prob > chi2. Numeric values not preserved.]
H0: no serial correlation
Alternatives to the d-statistic

- The d-statistic is not valid in models with a lagged dependent variable.
- With a lagged LHS variable, you must use Durbin's alternative test (the command is durbina in Stata).
- Also, the d-statistic only tests for first-order autocorrelation. For other orders you may use Durbin's alternative test.
- Why would you suspect other than first-order autocorrelation?
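A minimal sketch of Durbin's alternative test after a model with a lagged dependent variable. It assumes the data are already tsset on a time variable (called `time` here, a hypothetical name) and reuses the variables from the earlier regression.

```stata
* "time" is an assumed variable name
tsset time

* Model with a lagged dependent variable on the right-hand side
regress price L.price ice quantity

* Durbin's alternative test, first order (older syntax: durbina)
estat durbinalt

* The same test for orders 1 through 4
estat durbinalt, lags(1/4)
```

The choice of four lags is only illustrative; the order to test should follow from how the data were collected (for example, quarterly observations).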
The Runs Test

An alternative to the D-W test is a formalized examination of the signs of the residuals. In the absence of autocorrelation, we would expect the signs of the residuals to be random.

The first step is to estimate the model and predict the residuals.
Runs continued

Next, order the signs of the residuals against time (or against the spatial ordering in the case of cross-sectional data) and see if there are excessive "runs" of positives or negatives. Alternatively, you can graph the residuals and look for the same patterns.
Runs test continued

The final step is to compare the observed number of runs with its expected mean and standard deviation in a standard t-test.

Stata does this automatically with the runtest command!
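The three steps of the runs test could be sketched as follows. The residual variable name `e` is hypothetical, and the data are assumed to be sorted in time order, since runtest works through the observations in their current order.

```stata
* Step 1: estimate the model and save the residuals
regress price ice quantity
predict e, residuals

* Steps 2 and 3: count runs of residuals above and below zero
* and compare the count with what randomness would produce
runtest e, threshold(0)
```

Without the threshold(0) option, runtest splits the series at its median rather than at zero.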
Visual diagnosis of autocorrelation (in a single series)

A correlogram is a good tool for identifying whether a series is autocorrelated.
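In Stata, a correlogram for a single series can be produced roughly as below. The time variable name `time` and the choice of 20 lags are assumptions made for illustration.

```stata
* "time" is an assumed variable name
tsset time

* Table of autocorrelations, partial autocorrelations and Q statistics
corrgram price, lags(20)

* Graph of the autocorrelations with confidence bands
ac price, lags(20)
```

Autocorrelations that stay outside the confidence bands over several lags suggest the series is autocorrelated.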
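Dealing with autocorrelation

- D-W is not appropriate for auto-regressive (AR) models, where a lagged value of the dependent variable appears on the right-hand side, for example $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 X_t + \varepsilon_t$.
- In this case, we use the Durbin alternative test.
- For AR models, we need to explicitly estimate the correlation between $Y_t$ and $Y_{t-1}$ as a model parameter.
- Techniques:
  - AR1 models (closest to regression; 1st order only)
  - ARIMA (any order)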
Dealing with Autocorrelation

There are several approaches to resolving problems of autocorrelation:
- Lagged dependent variables
- Differencing the dependent variable
- GLS
- ARIMA
Lagged dependent variables

- The most common solution.
- Simply create a new variable that equals Y at t-1, and use it as a RHS variable.
- To do this in Stata, simply use the generate command with the new variable equal to L.variable:
    gen lagy = L.y
    gen laglagy = L2.y
- This correction should be based on a theoretical belief about the specification.
- May cause more problems than it solves.
- Also costs a degree of freedom (lost observation).
- There are several advanced techniques for dealing with this as well.
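Applied to the price regression used earlier, the lagged dependent variable correction might look like this sketch. The names `time` and `lagprice` are hypothetical.

```stata
* "time" is an assumed variable name
tsset time

* Create the lag; the first observation has no predecessor, so it is missing
gen lagprice = L.price
count if missing(lagprice)

* Use the lag as a right-hand-side variable
regress price lagprice ice quantity

* Equivalent, using the lag operator directly in the variable list
regress price L.price ice quantity
```

The count line makes the cost visible: each lag added to the model removes one more usable observation from the start of the series.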
Differencing

- Differencing is simply the act of subtracting the previous observation's value from the current observation.
- To do this in Stata, again use the generate command, with a capital D (instead of the L for lags).
- This process is effective; however, it is an EXPENSIVE correction.
- This technique "throws away" long-term trends.
- Assumes that rho = 1 exactly.
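A short sketch of differencing with the D operator. The names `time` and `dprice` are hypothetical, and differencing the regressors along with the dependent variable is an assumption here: it is the form that corresponds to rho = 1 exactly.

```stata
* "time" is an assumed variable name
tsset time

* First difference of the dependent variable: price(t) - price(t-1)
gen dprice = D.price

* First-difference model, using the D operator in the variable list
regress D.price D.ice D.quantity
```

As with lags, the first observation is lost, and anything constant or slowly trending in the levels drops out of the differenced series.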
GLS and ARIMA

- GLS approaches use maximum likelihood to estimate rho and correct the model.
- These are good corrections, and can be replicated in OLS.
- ARIMA is an acronym for Autoregressive Integrated Moving Average.
- This process is a univariate "filter" used to cleanse variables of a variety of pathologies before analysis.
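Corrections based on rho

There are several ways to estimate rho, the simplest being to calculate it from the residuals.

We then estimate the regression by transforming the variables so that $Y_t^* = Y_t - \hat{\rho}\, Y_{t-1}$ and $X_t^* = X_t - \hat{\rho}\, X_{t-1}$.

This gives the regression: $Y_t^* = \beta_0 (1 - \hat{\rho}) + \beta_1 X_t^* + u_t$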
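Done by hand, this correction could be sketched as follows: estimate rho by regressing the residuals on their own lag, then subtract rho times the lagged value from each variable. All new names (`time`, `e`, `rho`, `ystar`, `icestar`, `qstar`) are hypothetical, and this is one simple way of estimating rho, not the only one.

```stata
* "time" is an assumed variable name
tsset time

* Original regression and its residuals
regress price ice quantity
predict e, residuals

* Estimate rho from the residuals
regress e L.e, noconstant
scalar rho = _b[L.e]

* Transform each variable: value minus rho times its lag
gen ystar   = price    - scalar(rho)*L.price
gen icestar = ice      - scalar(rho)*L.ice
gen qstar   = quantity - scalar(rho)*L.quantity

* Regression on the transformed variables
regress ystar icestar qstar
```

The slope coefficients are read as in the original model, while the constant in the transformed regression estimates the original intercept multiplied by (1 - rho). The first observation is lost because it has no lag.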
High tech solutions

- Stata also offers the option of estimating the model with an AR term (with multiple ways of estimating rho).
- There is also what is known as a Prais-Winsten regression, which generates values for the lost observation.
- For the truly adventurous, there is also the option of doing a full ARIMA model.
Prais-Winsten regression

Prais-Winsten AR(1) regression -- iterated estimates

[Stata output: price regressed on ice and quantity, with a constant; F(2, 325). The coefficient table is followed by a row for rho. Numeric values not preserved.]

Durbin-Watson statistic (original)
Durbin-Watson statistic (transformed)
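Output of this kind comes from Stata's prais command. A minimal sketch, assuming the data are tsset on a variable called `time` (a hypothetical name):

```stata
* "time" is an assumed variable name
tsset time

* Prais-Winsten AR(1) regression, iterated estimates
prais price ice quantity

* Cochrane-Orcutt variant, which drops the first observation instead
prais price ice quantity, corc

* A different way of estimating rho, here based on the Durbin-Watson statistic
prais price ice quantity, rhotype(dw)
```

Comparing the original and transformed Durbin-Watson statistics at the bottom of the output shows how much of the autocorrelation the correction removed.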
ARIMA

The ARIMA model allows us to test the hypothesis of autocorrelation and remove it from the data.

This is an iterative process akin to the purging we did when creating the ystar variable.
The model

[Stata output: ARIMA regression of price with a constant and one AR term (ar L1.), OPG standard errors, Wald chi2(1), /sigma. Numeric values not preserved.]

Annotations on the output: Estimate of rho; Significant lag.
The residuals of the ARIMA model

There are a few significant lags a ways back. Generally we should expect some, but this mess is probably an indicator of a seasonal trend (well beyond the scope of this lecture)!
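A residual check of this kind could be reproduced along the following lines. The names `time` and `res` and the choice of 40 lags are assumptions for illustration.

```stata
* "time" is an assumed variable name
tsset time

* AR(1) model for price
arima price, ar(1)

* Save the residuals and inspect their correlogram
predict res, residuals
corrgram res, lags(40)
ac res, lags(40)

* Portmanteau (Q) test for white noise in the residuals
wntestq res
```

If the model had removed all of the autocorrelation, the residuals would look like white noise, with only an occasional lag outside the confidence bands.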
ARIMA with a covariate

[Stata output: ARIMA regression of price on ice and quantity with a constant and one AR term (ar L1.), OPG standard errors, Wald chi2(3), /sigma. Numeric values not preserved.]
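Adding covariates to the ARIMA model only requires listing them after the dependent variable. A sketch, again assuming the data are tsset on a hypothetical `time` variable:

```stata
* "time" is an assumed variable name
tsset time

* Regression of price on ice and quantity with an AR(1) term
arima price ice quantity, ar(1)
```

The Wald chi2 now has three degrees of freedom because it tests ice, quantity, and the AR term jointly.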
Final thoughts

- Each correction has a "best" application.
- If we wanted to evaluate a mean shift (dummy variable only model), calculating rho will not be a good choice. Then we would want to use the lagged dependent variable.
- Also, where we want to test the effect of inertia, it is probably better to use the lag.
Final Thoughts Continued

- In small N, calculating rho tends to be more accurate.
- ARIMA is one of the best options; however, it is very complicated!
- When dealing with time, the number of time periods and the spacing of the observations is VERY IMPORTANT!
- When using estimates of rho, a good rule of thumb is to make sure you have time points at a minimum. More if the observations are too close for the process you are observing!
Next Time: Review for Exam

Exam Posting
Plenary Session
Available after class Wednesday