Download presentation

Presentation is loading. Please wait.

1
Topic 9: Remedies

2
**Outline Review diagnostics for residuals Discuss remedies**

Nonlinear relationship Nonconstant variance Non-Normal distribution Outliers

3
**Diagnostics for residuals**

Look at residuals to find serious violations of the model assumptions nonlinear relationship nonconstant variance non-Normal errors presence of outliers a strongly skewed distribution

4
**Recommendations for checking assumptions**

Plot Y vs X (is it a linear relationship?) Look at distribution of residuals Plot residuals vs X, time, or any other potential explanatory variable Use the i=sm## in symbol statement to get smoothed curves

5
**Plots of Residuals Plot residuals vs Look for nonrandom patterns**

Time (order) X or predicted value (b0+b1X) Look for nonrandom patterns outliers (unusual observations)

6
Residuals vs Order Pattern in plot suggests dependent errors / lack of indep Pattern usually a linear or quadratic trend and/or cyclical If you are interested read KNNL pgs

7
**Residuals vs X Can look for nonconstant variance**

nonlinear relationship outliers somewhat address Normality of residuals

8
Tests for Normality H0: data are an i.i.d. sample from a Normal population Ha: data are not an i.i.d. sample from a Normal population KNNL (p 115) suggest a correlation test that requires a table look-up

9
Tests for Normality We have several choices for a significance testing procedure Proc univariate with the normal option provides four proc univariate normal; Shapiro-Wilk is a common choice

10
**All P-values > 0.05…Do not reject H0**

Tests for Normality Test Statistic p Value Shapiro-Wilk W Pr < W 0.8626 Kolmogorov-Smirnov D Pr > D >0.1500 Cramer-von Mises W-Sq Pr > W-Sq >0.2500 Anderson-Darling A-Sq Pr > A-Sq All P-values > 0.05…Do not reject H0

11
**Other tests for model assumptions**

Durbin-Watson test for serially correlated errors (KNNL p 114) Modified Levene test for homogeneity of variance (KNNL p ) Breusch-Pagan test for homogeneity of variance (KNNL p 118) For SAS commands see topic9.sas

12
**Plots vs significance test**

Plots are more likely to suggest a remedy Significance tests results are very dependent on the sample size; with sufficiently large samples we can reject most null hypotheses

13
**Default graphics with SAS 9.3**

proc reg data=toluca; model hours=lotsize; id lotsize; run;

14
**Will discuss these diagnostics more in multiple regression**

Provides rule of thumb limits Questionable observation (30,273)

15
Additional summaries Rstudent: Studentized residual…almost all should be between ± 2 Leverage: “Distance” of X from center…helps determine outlying X values in multivariable setting…outlying X values may be influential Cooks’D: Influence of ith case on all predicted values

18
Lack of fit When we have repeat observations at different values of X, we can do a significance test for nonlinearity Browse through KNNL Section 3.7 Details of approach discussed when we get to KNNL 17.9, p 762 Basic idea is to compare two models Gplot with a smooth is a better (i.e., simpler) approach

19
**SAS code and output proc reg data=toluca;**

model hours=lotsize / lackfit; run; Analysis of Variance Source DF Sum of Squares Mean Square F Value Pr > F Model 1 252378 105.88 <.0001 Error 23 54825 Lack of Fit 9 17245 0.71 0.6893 Pure Error 14 37581 Corrected Total 24 307203

20
**Nonlinear relationships**

We can model many nonlinear relationships with linear models, some have several explanatory variables (i.e., multiple linear regression) Y = β0 + β1X + β2X2 + e (quadratic) Y = β0 + β1log(X) + e

21
**Nonlinear Relationships**

Sometimes can transform a nonlinear equation into a linear equation Consider Y = β0exp(β1X) + e Can form linear model using log log(Y) = log(β0) + β1X + log(e) Note that we have changed our assumption about the error

22
**Nonlinear Relationship**

We can perform a nonlinear regression analysis KNNL Chapter 13 SAS PROC NLIN

23
Nonconstant variance Sometimes we model the way in which the error variance changes may be linearly related to X We can then use a weighted analysis KNNL 11.1 Use a weight statement in PROC REG

24
**Non-Normal errors Transformations often help**

Use a procedure that allows different distributions for the error term SAS PROC GENMOD

25
**Generalized Linear Model**

Possible distributions of Y: Binomial (Y/N or percentage data) Poisson (Count data) Gamma (exponential) Inverse gaussian Negative binomial Multinomial Specify a link function for E(Y)

26
**Ladder of Reexpression (transformations)**

1.5 p Transformation is xp 1.0 0.5 0.0 -0.5 -1.0

27
**Circle of Transformations**

X down, Y up X up, Y up Y X X up, Y down X down, Y down

28
**Box-Cox Transformations**

Also called power transformations These transformations adjust for non-Normality and nonconstant variance Y´ = Y or Y´ = (Y - 1)/ In the second form, the limit as approaches zero is the (natural) log

29
**Important Special Cases**

= 1, Y´ = Y1, no transformation = .5, Y´ = Y1/2, square root = -.5, Y´ = Y-1/2, one over square root = -1, Y´ = Y-1 = 1/Y, inverse = 0, Y´ = (natural) log of Y

30
Box-Cox Details We can estimate by including it as a parameter in a non-linear model Y = β0 + β1X + e and using the method of maximum likelihood Details are in KNNL p SAS code is in boxcox.sas

31
**Box-Cox Solution Standardized transformed Y is K1(Y - 1) if ≠ 0**

K2log(Y) if = 0 where K2 = ( Yi)1/n (the geometric mean) and K1 = 1/ ( K2 -1) Run regressions with X as explanatory variable estimated minimizes SSE

32
**Example data a1; input age plasma @@; cards;**

4 6.23 ;

34
Box Cox Procedure *Procedure that will automatically find the Box-Cox transformation; proc transreg data=a1; model boxcox(plasma)=identity(age); run;

35
**Lambda R-Square Log Like -2.50 0.76 -17.0444 -2.00 0.80 -12.3665 **

Transformation Information for BoxCox(plasma) Lambda R-Square Log Like * < * < - Best Lambda * - Confidence Interval + - Convenient Lambda

36
***The first part of the program gets the geometric mean;**

data a2; set a1; lplasma=log(plasma); proc univariate data=a2 noprint; var lplasma; output out=a3 mean=meanl;

37
data a4; set a2; if _n_ eq 1 then set a3; keep age yl l; k2=exp(meanl); do l = -1.0 to 1.0 by .1; k1=1/(l*k2**(l-1)); yl=k1*(plasma**l -1); if abs(l) < 1E-8 then yl=k2*log(plasma); output; end;

38
**proc sort data=a4 out=a4;**

by l; proc reg data=a4 noprint outest=a5; model yl=age; data a5; set a5; n=25; p=2; sse=(n-p)*(_rmse_)**2; proc print data=a5; var l sse;

39
Obs l sse

40
symbol1 v=none i=join; proc gplot data=a5; plot sse*l; run;

42
data a1; set a1; tplasma = plasma**(-.5); tage = (age+.5)**(-.5); symbol1 v=circle i=sm50; proc gplot; plot tplasma*age; proc sort; by tage; proc gplot; plot tplasma*tage; run;

45
Background Reading Sections describe significance tests for assumptions (read it if you are interested). Box-Cox transformation is in nknw132.sas Read sections 4.1, 4.2, 4.4, 4.5, and 4.6

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google