Presentation transcript: "Autocorrelation in Regression Analysis"
1 Autocorrelation in Regression Analysis
Tests for Autocorrelation
Examples
Durbin-Watson Tests
Modeling Autoregressive Relationships
2 What causes autocorrelation?
Misspecification
Data manipulation
  Before receipt
  After receipt
Event inertia
Spatial ordering
3 Checking for Autocorrelation
Test: the Durbin-Watson statistic,

d = Σ (e_t − e_{t−1})² / Σ e_t²   (sums over t = 2..n and t = 1..n)

Decision zones (d runs from 0 to 4, with 2 indicating no autocorrelation):
  0 to d-lower: positive autocorrelation — autocorrelation is clearly evident
  d-lower to d-upper: zone of indecision — ambiguous, cannot rule out autocorrelation
  d-upper to 4 − d-upper: autocorrelation is not evident
  4 − d-upper to 4 − d-lower: zone of indecision — ambiguous, cannot rule out autocorrelation
  4 − d-lower to 4: negative autocorrelation — autocorrelation is clearly evident
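The d statistic on this slide can be computed directly from the residuals. A minimal sketch in Python with NumPy (the deck itself uses Stata's dwstat; this stand-alone version, with simulated residuals, is only for illustration):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Values near 2 suggest no first-order autocorrelation;
    near 0, positive autocorrelation; near 4, negative."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=500)        # independent residuals: d close to 2
walk = np.cumsum(white)             # strongly positively autocorrelated: d near 0
print(durbin_watson(white))
print(durbin_watson(walk))
```

The same zones from the slide then apply: compare d against d-lower and d-upper from a Durbin-Watson table for your n and k.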
4 Consider the following regression
[Stata OLS output: regression of price on ice and quantity, n = 328; the coefficient estimates, standard errors, and fit statistics did not survive transcription]
Because this is time series data, we should consider the possibility of autocorrelation. To run the Durbin-Watson test, first we have to specify the data as time series with the tsset command. Next we use the dwstat command.
Durbin-Watson d-statistic(3, 328) =
5 Find the d-upper and d-lower
Check a Durbin-Watson table for the values of d-upper and d-lower. For n = 20, k = 2, and α = .05 the values are:
Lower = 1.643
Upper = 1.704

Durbin's alternative test for autocorrelation
lags(p) | chi2 | df | Prob > chi2
1 | [values omitted in the transcript]
H0: no serial correlation
6 Alternatives to the d-statistic
The d-statistic is not valid in models with a lagged dependent variable.
In the case of a lagged LHS variable you must use Durbin's alternative test (the durbina command in Stata).
Also, the d-statistic tests only for first-order autocorrelation. In other instances you may use Durbin's alternative test.
Why would you suspect other than 1st-order autocorrelation?
7 The Runs Test
An alternative to the D-W test is a formal examination of the signs of the residuals. In the absence of autocorrelation, we would expect the signs of the residuals to be random.
The first step is to estimate the model and predict the residuals.
8 Runs continued
Next, order the signs of the residuals against time (or spatial ordering, in the case of cross-sectional data) and see if there are excessive "runs" of positives or negatives. Alternatively, you can graph the residuals and look for the same trends.
9 Runs test continued
The final step is to use the expected mean and variance of the number of runs in a standard z-test.
Stata does this automatically with the runtest command!
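The steps on the last three slides can be sketched in Python. This is a minimal stand-in for what Stata's runtest does (Wald-Wolfowitz runs test on the signs of the residuals), not the command itself:

```python
import math

def runs_test(residuals):
    """Count runs of same-signed residuals, then compare the count to its
    expectation under randomness via a z statistic."""
    signs = [1 if e > 0 else 0 for e in residuals if e != 0]
    n1 = sum(signs)                 # number of positive residuals
    n2 = len(signs) - n1            # number of negative residuals
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)

# Perfectly alternating signs give the maximum number of runs and a large
# positive z (too many runs); long blocks of one sign give a large negative z.
r, z = runs_test([1, -1, 1, -1, 1, -1, 1, -1])
print(r, round(z, 2))  # → 8 2.29
```

A z far from zero in either direction is evidence against the "random signs" null, i.e. evidence of autocorrelation.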
10 Visual diagnosis of autocorrelation (in a single series)
A correlogram is a good tool for identifying whether a series is autocorrelated.
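A correlogram just plots the sample autocorrelations against the lag. A hand-rolled sketch of the quantities it displays (NumPy only; the simulated AR(1) series and the name acf are illustrative, not from the deck):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_k = sum((x_t - xbar)(x_{t+k} - xbar)) / sum((x_t - xbar)^2).
    A correlogram plots r_k against k, usually with ~±2/sqrt(n) bands."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, nlags + 1)])

# An AR(1) series with rho = 0.8: the autocorrelations decay roughly geometrically.
rng = np.random.default_rng(1)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * y[t - 1] + rng.normal()
print(np.round(acf(y, 4), 2))  # roughly 0.8, 0.64, 0.51, 0.41
```

A slow geometric decay like this is the classic correlogram signature of first-order autocorrelation.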
11 Dealing with autocorrelation
D-W is not appropriate for auto-regressive (AR) models, where the lagged dependent variable appears on the right-hand side:

Y_t = β0 + β1*Y_{t−1} + β2*X_t + ε_t

In this case, we use the Durbin alternative test.
For AR models, we need to explicitly estimate the correlation between Y_t and Y_{t−1} as a model parameter.
Techniques:
AR1 models (closest to regression; 1st order only)
ARIMA (any order)
12 Dealing with Autocorrelation
There are several approaches to resolving problems of autocorrelation:
Lagged dependent variables
Differencing the dependent variable
GLS
ARIMA
13 Lagged dependent variables
The most common solution.
Simply create a new variable that equals Y at t−1, and use it as a RHS variable.
To do this in Stata, use the generate command with the new variable set equal to L.variable:
gen lagy = L.y
gen laglagy = L2.y
This correction should be based on a theoretical belief about the specification.
May cause more problems than it solves.
Also costs a degree of freedom (one lost observation).
There are several advanced techniques for dealing with this as well.
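The Stata lines above translate to simple array slicing in Python. A sketch with a tiny hypothetical series (the data and variable names are invented for illustration):

```python
import numpy as np

# gen lagy = L.y  ->  lagy[t] = y[t-1]; the first observation is lost,
# which is the "costs a degree of freedom" point above.
y = np.array([3.0, 5.0, 4.0, 6.0, 7.0])
lagy = y[:-1]    # L.y, aligned against y[1:]
lag2y = y[:-2]   # L2.y, aligned against y[2:]

# Regress y_t on a constant and its own lag via least squares.
X = np.column_stack([np.ones(len(y) - 1), lagy])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(beta)  # intercept 3.7, coefficient on L.y 0.4 for this toy series
```

Note the alignment: each lag consumes one observation at the start of the sample, exactly as in Stata.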
14 Differencing
Differencing is simply the act of subtracting the previous observation's value from the current observation.
To do this in Stata, again use the generate command, with a capital D instead of the L for lags.
This process is effective; however, it is an EXPENSIVE correction:
This technique "throws away" long-term trends
It assumes that rho = 1 exactly
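The first difference D.y is the same operation as NumPy's diff; a one-line sketch on an invented series:

```python
import numpy as np

# D.y in Stata is y_t - y_{t-1}; np.diff computes the same thing,
# again at the cost of one observation.
y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
dy = np.diff(y)
print(dy)  # → [ 2. -1.  4. -1.]
```

Any level information in y (its long-term trend) is gone from dy, which is the trade-off noted above.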
15 GLS and ARIMA
GLS approaches use maximum likelihood to estimate rho and correct the model.
These are good corrections, and they can be replicated in OLS.
ARIMA is an acronym for Autoregressive Integrated Moving Average.
This process is a univariate "filter" used to cleanse variables of a variety of pathologies before analysis.
16 Corrections based on rho
There are several ways to estimate rho, the simplest being to calculate it from the residuals:

rho-hat = Σ e_t*e_{t−1} / Σ e_{t−1}²   (sums over t = 2..n)

We then estimate the regression by transforming the regressors so that:

Y*_t = Y_t − rho*Y_{t−1}   and   X*_t = X_t − rho*X_{t−1}

This gives the regression:

Y*_t = β0(1 − rho) + β1*X*_t + u_t
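The transformation above is one pass of a Cochrane-Orcutt correction. A self-contained sketch in Python with simulated data (true rho and slope are assumptions of the simulation, not from the deck):

```python
import numpy as np

def cochrane_orcutt(y, x):
    """One pass of the rho-based correction sketched above:
    1. OLS of y on x, keep residuals e_t
    2. rho_hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2)
    3. quasi-difference: y*_t = y_t - rho*y_{t-1}, x*_t = x_t - rho*x_{t-1}
    4. OLS on the transformed data (intercept is b0*(1 - rho))."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
    ystar = y[1:] - rho * y[:-1]
    xstar = x[1:] - rho * x[:-1]
    Xs = np.column_stack([np.ones_like(xstar), xstar])
    bstar, *_ = np.linalg.lstsq(Xs, ystar, rcond=None)
    return rho, bstar

# Simulated regression with AR(1) errors, true rho = 0.7 and slope = 2.0.
rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
rho, bstar = cochrane_orcutt(y, x)
print(round(rho, 2))       # close to the true 0.7
print(np.round(bstar, 2))  # slope close to the true 2.0
```

In practice this is iterated until rho converges, which is what Stata's iterated estimators do.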
17 High tech solutions
Stata also offers the option of estimating the model with AR errors (with multiple ways of estimating rho). There is also what is known as a Prais-Winsten regression, which generates a value for the lost observation.
For the truly adventurous, there is also the option of doing a full ARIMA model.
18 Prais-Winsten regression
Prais-Winsten AR(1) regression -- iterated estimates
[Stata output: regression of price on ice and quantity with an AR(1) correction; the coefficient estimates, rho, and the original and transformed Durbin-Watson statistics did not survive transcription]
19 ARIMA
The ARIMA model allows us to test the hypothesis of autocorrelation and remove it from the data.
This is an iterative process, akin to the purging we did when creating the ystar variable.
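The AR(1) part of this can be sketched without any ARIMA machinery: estimate rho, then check that the filtered innovations are no longer autocorrelated. A minimal stand-in for Stata's arima command, on simulated data (the true rho of 0.6 is an assumption of the simulation):

```python
import numpy as np

# Simulate an AR(1) series y_t = 0.6*y_{t-1} + noise.
rng = np.random.default_rng(3)
n = 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# Estimate rho by regressing y_t on y_{t-1} (no intercept), then "purge":
rho = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
innov = y[1:] - rho * y[:-1]                      # filtered innovations
r1 = np.corrcoef(innov[1:], innov[:-1])[0, 1]     # their lag-1 autocorrelation
print(round(rho, 2))  # close to the true 0.6
print(round(r1, 2))   # close to 0 once the AR(1) structure is removed
```

A full ARIMA model generalizes this filtering to higher AR orders plus differencing and moving-average terms, estimated iteratively by maximum likelihood.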
20 The model
[Stata ARIMA output: price modeled with an AR(1) term; the table shows the estimate of rho as a significant L1 coefficient in the ARMA block, plus /sigma; the numeric values did not survive transcription]
21 The residuals of the ARIMA model
There are a few significant lags a ways back. Generally we should expect some, but this mess is probably an indicator of a seasonal trend (well beyond the scope of this lecture)!
22 ARIMA with a covariate
[Stata ARIMA output: price modeled on ice and quantity with an AR(1) term; the coefficient estimates and /sigma did not survive transcription]
23 Final thoughts
Each correction has a "best" application. If we wanted to evaluate a mean shift (a dummy-variable-only model), calculating rho would not be a good choice; there we would want to use the lagged dependent variable.
Also, where we want to test the effect of inertia, it is probably better to use the lag.
24 Final Thoughts Continued
In small-N settings, calculating rho tends to be more accurate.
ARIMA is one of the best options; however, it is very complicated!
When dealing with time, the number of time periods and the spacing of the observations are VERY IMPORTANT!
When using estimates of rho, a good rule of thumb is to make sure you have a minimum number of time points -- more if the observations are too close together for the process you are observing!
25 Next Time: Review for Exam
Exam posting
Plenary session
Available after class Wednesday