Diagnostics – Part II Using statistical tests to check to see if the assumptions we made about the model are realistic.


Diagnostic methods: Some simple (but subjective) plots (then). Some formal statistical tests (now).

Simple linear regression model: The response $Y_i$ is a function of a systematic linear component and a random error component, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with assumptions that: Error terms have mean 0, i.e., $E(\varepsilon_i) = 0$. $\varepsilon_i$ and $\varepsilon_j$ ($i \neq j$) are uncorrelated (independent). Error terms have the same variance, i.e., $\mathrm{Var}(\varepsilon_i) = \sigma^2$. Error terms $\varepsilon_i$ are normally distributed.
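Every test below works on the residuals $e_i = Y_i - \hat{Y}_i$, which stand in for the unobservable $\varepsilon_i$. A minimal pure-Python sketch of getting them from a least-squares fit (the function name and the toy data are hypothetical, not from the slides; in Minitab you would simply store RESI1):

```python
from statistics import fmean


def fit_simple_linear_regression(x, y):
    """Least-squares fit of Y_i = b0 + b1*X_i; returns (b0, b1, residuals)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    # Residuals e_i = Y_i - Yhat_i estimate the error terms eps_i.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Hypothetical toy data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, e = fit_simple_linear_regression(x, y)
print(round(b0, 2), round(b1, 2))  # 0.05 1.99
```

With an intercept in the model the residuals always sum to zero, so they can only estimate the mean-0 assumption indirectly; the remaining assumptions are what the tests below check.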

Why should we keep NAGGING ourselves about the model? All of the estimates, confidence intervals, prediction intervals, hypothesis tests, etc. have been developed assuming that the model is correct. If the model is incorrect, then the formulas and methods we use are at risk of being incorrect. (Some are more forgiving than others.)

Summary of the tests we'll learn: Durbin-Watson test for detecting correlated (adjacent) error terms. Modified Levene test for constant error variance. (Ryan-Joiner) correlation test for normality of error terms.

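The Durbin-Watson test for uncorrelated (adjacent) error terms. $H_0$: adjacent error terms are uncorrelated vs. $H_A$: adjacent error terms are positively auto-correlated. Durbin-Watson test statistic, computed from the residuals $e_t$ in time order: $D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$. Compare D to the Durbin-Watson test bounds in Table B.7: If D > upper bound ($d_U$), conclude the error terms are not positively auto-correlated. If D < lower bound ($d_L$), conclude positive auto-correlation. If D is between the two bounds, the test is inconclusive.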
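A minimal sketch of the statistic and the decision rule, assuming the residuals are already in time order and that $d_L$, $d_U$ are read from a table such as Table B.7 (the function names are hypothetical):

```python
def durbin_watson(residuals):
    """D = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2, residuals in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """One-sided test for positive autocorrelation using table bounds d_L, d_U."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no positive autocorrelation"
    return "inconclusive"


# Runs of same-sign residuals give small D; alternating signs give large D.
print(durbin_watson([1, 1, 1, -1, -1, -1]))  # about 0.667
print(durbin_watson([1, -1, 1, -1]))         # 3.0
```

Adjacent residuals that resemble each other make the numerator small, which is why a small D points to positive auto-correlation.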

Example: Blaisdell Company Seasonally adjusted quarterly data, 1988 to 1992. Reasonable fit, but are the error terms positively auto-correlated?

Blaisdell Company Example: Durbin-Watson test. Stat >> Regression >> Regression. Under Options…, select Durbin-Watson statistic. Output: Durbin-Watson statistic = 0.73. Table B.7 with level of significance α = 0.01, (p-1) = 1 predictor variable, and n = 20 (5 years, 4 quarters each) gives $d_L$ = 0.95 and $d_U$ = 1.15. Since D = 0.73 < $d_L$ = 0.95, conclude that the error terms are positively auto-correlated.
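The Blaisdell decision, plugged into a small sketch of the rule. The second function uses the common rough approximation $D \approx 2(1 - r)$, where $r$ is the lag-1 autocorrelation of the residuals; that approximation is an addition here, not something the slides state:

```python
def dw_decision(d, d_lower, d_upper):
    """One-sided Durbin-Watson decision for positive autocorrelation."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no positive autocorrelation"
    return "inconclusive"


def approx_lag1_autocorrelation(d):
    """Rule of thumb: D is roughly 2(1 - r), so r is roughly 1 - D/2."""
    return 1 - d / 2


# Values quoted in the Blaisdell example (alpha = 0.01, p - 1 = 1, n = 20).
D, D_L, D_U = 0.73, 0.95, 1.15
print(dw_decision(D, D_L, D_U))  # positive autocorrelation
print(round(approx_lag1_autocorrelation(D), 3))
```

By that rough rule, D = 0.73 corresponds to a lag-1 residual autocorrelation of about 0.64, a sizable positive value.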

For completeness' sake, one more thing about the Durbin-Watson test: If a test for negative auto-correlation is desired, use D* = 4 - D instead. If D* < $d_L$, conclude that the error terms are negatively auto-correlated. If a two-sided test is desired (both positive and negative auto-correlation possible), conduct both one-sided tests, with D and D*, separately. The level of significance is then 2α.
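A sketch of the two-sided version: the same table bounds are applied once to D (positive auto-correlation) and once to D* = 4 - D (negative auto-correlation). The function name is hypothetical; remember that the overall significance level is then 2α.

```python
def dw_two_sided(d, d_lower, d_upper):
    """Run both one-sided Durbin-Watson tests; overall level is 2*alpha."""

    def one_sided(stat):
        if stat < d_lower:
            return "yes"
        if stat > d_upper:
            return "no"
        return "inconclusive"

    d_star = 4 - d  # statistic for the negative-autocorrelation test
    return {"positive": one_sided(d), "negative": one_sided(d_star)}


print(dw_two_sided(0.73, 0.95, 1.15))  # {'positive': 'yes', 'negative': 'no'}
print(dw_two_sided(3.5, 0.95, 1.15))   # {'positive': 'no', 'negative': 'yes'}
```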

Modified Levene Test for nonconstant error variance. Divide the data set into two roughly equal-sized groups based on the level of X. If the error variance is either increasing or decreasing with X, the absolute deviations of the residuals around their group median will tend to be larger for one of the two groups. Use the two-sample t* test to test whether the mean of the absolute deviations for one group differs significantly from the mean of the absolute deviations for the second group.
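A minimal sketch of the modified Levene t* statistic with the pooled-variance two-sample t. Splitting at the median of X is an assumption made here to get "two roughly equal-sized groups"; any sensible cut on X works. The toy residuals are hypothetical, chosen so that the spread grows with X:

```python
from math import sqrt
from statistics import mean, median, variance


def modified_levene_t(x, residuals, split=None):
    """Pooled two-sample t* on |e - group median|; returns (t_star, df)."""
    if split is None:
        split = median(x)  # assumption: cut at the median of X
    g1 = [e for xi, e in zip(x, residuals) if xi <= split]
    g2 = [e for xi, e in zip(x, residuals) if xi > split]
    if len(g1) < 2 or len(g2) < 2:
        raise ValueError("each group needs at least two residuals")
    m1, m2 = median(g1), median(g2)
    d1 = [abs(e - m1) for e in g1]  # absolute deviations, low-X group
    d2 = [abs(e - m2) for e in g2]  # absolute deviations, high-X group
    n1, n2 = len(d1), len(d2)
    s2 = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / (n1 + n2 - 2)
    t_star = (mean(d1) - mean(d2)) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t_star, n1 + n2 - 2


x = list(range(1, 9))
e = [0.1, -0.1, 0.2, -0.2, 1, -1, 2, -2]  # spread grows with X
t_star, df = modified_levene_t(x, e)
print(round(t_star, 2), df)
```

Compare |t*| with the t(1 - α/2; n - 2) percentile; a large |t*| suggests the error variance is not constant.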

Modified Levene Test in Minitab Use Manip >> Code >> Numeric to numeric … to create a GROUP variable based on the values of X. Stat >> Regression >> Regression. Under Storage …, select residuals. Stat >> Basic statistics >> 2 Variances … Specify Samples (RESI1) and Subscripts (GROUP). Select OK. Look in session window for Levene P-value.
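The Minitab steps can be mirrored in code: code X into a GROUP variable, then compute a Levene statistic on the residuals by group. This sketch assumes that Minitab's "Levene's Test (any continuous distribution)" is the median-centered version, i.e., a one-way ANOVA F on the absolute deviations from each group median; under that assumption, with two groups it equals the square of the pooled t* described above. The cutoff and the toy data are hypothetical.

```python
from statistics import mean, median


def code_groups(x, cutoff):
    """Mimic the Code step: GROUP = 1 if X <= cutoff, else 2."""
    return [1 if xi <= cutoff else 2 for xi in x]


def levene_statistic(residuals, groups):
    """Median-centered Levene statistic: one-way ANOVA F on |e - group median|."""
    labels = sorted(set(groups))
    devs = {}
    for g in labels:
        grp = [e for e, gi in zip(residuals, groups) if gi == g]
        m = median(grp)
        devs[g] = [abs(e - m) for e in grp]
    all_devs = [d for g in labels for d in devs[g]]
    k, n = len(labels), len(all_devs)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more points than groups")
    grand = mean(all_devs)
    ss_between = sum(len(devs[g]) * (mean(devs[g]) - grand) ** 2 for g in labels)
    ss_within = sum((d - mean(devs[g])) ** 2 for g in labels for d in devs[g])
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)


x = list(range(1, 9))
resi1 = [0.1, -0.1, 0.2, -0.2, 1, -1, 2, -2]
group = code_groups(x, cutoff=4)
f_stat, dfs = levene_statistic(resi1, group)
print(round(f_stat, 3), dfs)
```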

Example: How is plutonium activity related to alpha particle counts?

A residual versus fits plot suggesting non-constant error variance

Plutonium Alpha Example: Modified Levene's Test. Minitab output: Levene's Test (any continuous distribution): Test Statistic: 9.452, P-Value: 0.006. It is highly unlikely (P = 0.006) that we'd get such an extreme Levene statistic (L = 9.452) if the error variances of the two groups were equal. Reject the null hypothesis at the 0.01 level and conclude that the error variances are not constant.
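If we assume the reported Levene statistic is the median-centered ANOVA F on the absolute deviations, then with two groups it is the square of the modified Levene t*, so the output can be read on the t* scale as well (a translation under that assumption, not part of Minitab's output):

```latex
L = (t^*)^2 \quad\Longrightarrow\quad |t^*| = \sqrt{9.452} \approx 3.07,
\qquad P = 0.006 < \alpha = 0.01 \;\Longrightarrow\; \text{reject } H_0:\ \sigma_1^2 = \sigma_2^2 .
```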

(Ryan-Joiner) Correlation test for normality of error terms in Minitab. $H_0$: Error terms are normally distributed vs. $H_A$: Error terms are not normally distributed. Stat >> Regression >> Regression. Under Storage…, select residuals. Stat >> Basic Statistics >> Normality Test. Select residuals (RESI1) and request the Ryan-Joiner test. Select OK.
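The Ryan-Joiner statistic is the correlation between the ordered residuals and their normal scores; values near 1 support normality. A minimal sketch, assuming Blom-type normal scores $\Phi^{-1}((i - 3/8)/(n + 1/4))$ (an assumption about the exact scores Minitab uses). It returns only the statistic R; deciding still requires comparing R with tabulated critical values for the chosen α and n, which this sketch does not supply.

```python
from math import sqrt
from statistics import NormalDist


def ryan_joiner(values):
    """Correlation between the sorted values and approximate normal scores."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    e = sorted(values)
    std_normal = NormalDist()
    # Blom-type normal scores; symmetric, so they sum to zero.
    b = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    e_bar = sum(e) / n
    num = sum((ei - e_bar) * bi for ei, bi in zip(e, b))
    ss_e = sum((ei - e_bar) ** 2 for ei in e)
    if ss_e == 0:
        raise ValueError("values must not all be equal")
    return num / sqrt(ss_e * sum(bi ** 2 for bi in b))


# Hypothetical residuals with one large outlier: R falls well below 1.
print(round(ryan_joiner([1, 1, 1, 1, 1, 1, 1, 1, 1, 10]), 3))
```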

100 chi-square (1 df) data values

Normal probability plot and test for 100 chi-square (1 df) data values

100 normal(0,1) data values

Normal probability plot and test for 100 normal(0,1) data values
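The chi-square versus normal comparison can be re-created by simulation: draw 100 normal(0,1) values and 100 chi-square (1 df) values (squares of standard normals) and compute the Ryan-Joiner correlation for each. This is a sketch with an arbitrary seed and Blom-type normal scores (both assumptions), so the numbers will not match the slides' plots; the pattern should: the skewed chi-square sample gives a noticeably smaller R.

```python
import random
from math import sqrt
from statistics import NormalDist


def ryan_joiner(values):
    """Correlation between the sorted values and approximate normal scores."""
    n = len(values)
    e = sorted(values)
    std_normal = NormalDist()
    b = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    e_bar = sum(e) / n
    num = sum((ei - e_bar) * bi for ei, bi in zip(e, b))
    ss_e = sum((ei - e_bar) ** 2 for ei in e)
    return num / sqrt(ss_e * sum(bi ** 2 for bi in b))


rng = random.Random(2024)  # arbitrary seed; exact values depend on it
normal_data = [rng.gauss(0, 1) for _ in range(100)]
chisq_data = [rng.gauss(0, 1) ** 2 for _ in range(100)]  # chi-square(1) = Z^2

r_normal = ryan_joiner(normal_data)
r_chisq = ryan_joiner(chisq_data)
print(f"normal(0,1):   R = {r_normal:.4f}")
print(f"chi-square(1): R = {r_chisq:.4f}")
```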

Normal probability plot for Tree diameter (X) and C-dating Age (Y)

Tree diameter and Age Example: Ryan-Joiner Correlation Test

Some closing comments: Checking assumptions is important, but be aware of the "robustness" of your methods, so you don't get too hung up. Model checking is an art as well as a science. Do not think that there is some definitive correct answer "in the back of the book." Use your knowledge of the subject matter.
