
1 Anareg Week 10: Multicollinearity, interesting special cases, polynomial regression

2 Multicollinearity The numerical analysis problem is that the matrix X'X is close to singular and is therefore difficult to invert accurately. The statistical problem is that there is too much correlation among the explanatory variables, so it is difficult to determine the individual regression coefficients.
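The slides use SAS; as an illustration in Python (hypothetical data, not the course data set), here is a sketch of how the condition number of X'X blows up as two predictors become nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)

# As a second predictor goes from loosely to almost perfectly correlated
# with x1, the condition number of X'X explodes and the inverse becomes
# numerically fragile.
conds = []
for noise in (1.0, 0.01, 0.0001):
    x2 = x1 + noise * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    conds.append(np.linalg.cond(X.T @ X))
    print(f"noise={noise:g}  cond(X'X)={conds[-1]:.3g}")
```

A large condition number means small changes in the data produce large changes in the inverted matrix, and hence in the estimated coefficients.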

3 Multicollinearity (2) Solve the statistical problem and the numerical problem will also be solved. The statistical problem is more serious than the numerical one: we want to refine a model that has redundancy in the explanatory variables even if X'X can be inverted without difficulty.

4 Multicollinearity (3) Extreme cases can help us understand the problem. If all X's are uncorrelated, Type I SS and Type II SS will be the same, i.e., the contribution of each explanatory variable to the model will be the same whether or not the other explanatory variables are in the model. If there is a linear combination of the explanatory variables that is a constant (e.g. X1 = X2, so that X1 - X2 = 0), then the Type II SS for the X's involved will be zero.

5 An example Y = gpa; X1 = hsm; X3 = hss; X4 = hse; X5 = satm; X6 = satv; X7 = genderm. Define: sat = satm + satv. We will regress Y on sat, satm, and satv.
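The slides run this in SAS; a Python sketch of the same idea follows. The CS data set itself is not on the slide, so satm and satv are simulated here (hypothetical numbers); any data with sat = satm + satv behaves the same way:

```python
import numpy as np

# satm and satv are simulated stand-ins for the CS data (not the real values).
rng = np.random.default_rng(1)
n = 224                                   # matches the corrected-total df of 223
satm = rng.normal(600.0, 80.0, size=n)
satv = rng.normal(550.0, 80.0, size=n)
sat = satm + satv                         # exact linear combination
gpa = 0.002 * satm + 0.001 * satv + rng.normal(0.0, 0.5, size=n)

X = np.column_stack([np.ones(n), sat, satm, satv])    # 4 columns, but...
beta, res, rank, sv = np.linalg.lstsq(X, gpa, rcond=None)
print(rank)    # 3 -- only 3 independent columns, so model df = rank - 1 = 2
```

The design matrix has 4 columns but rank 3, which is exactly why the output on the next slide shows dfM = 2 despite three X's in the model statement.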

6 Output
Source           DF
Model              2
Error            221
Corrected Total  223
Something is wrong: dfM = 2, but there are 3 X's.

7 Output (2) NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

8 Output (3) NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown. satv = sat - satm

9 Output (4)
Var   DF  Par Est  St Err  t      P
Int   1     1.28    0.37    3.43  0.0007
sat   B    -0.00    0.00   -0.04  0.9684
satm  B     0.00    0.00    2.10  0.0365
satv  0     0      ...

10 Extent of multicollinearity Our CS example had one explanatory variable equal to a linear combination of other explanatory variables. This is the most extreme case of multicollinearity and is detected by statistical software because (X'X) does not have an inverse. We are concerned with less extreme cases.

11 Effects of multicollinearity Regression coefficients are not well estimated and may be meaningless, and similarly for the standard errors of these estimates. Type I SS and Type II SS will differ. R² and predicted values are usually OK.
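A small Python simulation (hypothetical data) makes the contrast concrete: with two nearly collinear predictors, the individual slopes wander, but their sum, the fitted values, and R² stay well behaved:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)            # nearly collinear with x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)    # true slopes are 1 and 1

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b

# Individual slopes are poorly determined, but their sum -- and hence the
# fitted values and R^2 -- is estimated well.
r2 = 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print("b1 =", b[1], " b2 =", b[2], " b1+b2 =", b[1] + b[2], " R^2 =", r2)
```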

12 Two separate problems Numerical accuracy: (X'X) is difficult to invert, so we need good software. Statistical problem: results are difficult to interpret, so we need a better model.

13 Polynomial regression We can do linear, quadratic, cubic, etc. by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression We can do this with more than one explanatory variable When we do this we generally create a multicollinearity problem

14 Polynomial Regression (2) We can remove the correlation between explanatory variables and their squares: center (subtract the mean) before squaring. NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
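A quick Python check of the centering trick, using an arbitrary all-positive predictor (illustrative numbers, not course data):

```python
import numpy as np

x = np.arange(1.0, 11.0)                 # any all-positive predictor
r_raw = np.corrcoef(x, x ** 2)[0, 1]
print(r_raw)                             # about 0.97: x and x^2 nearly collinear

xc = x - x.mean()                        # center first, then square
r_centered = np.corrcoef(xc, xc ** 2)[0, 1]
print(r_centered)                        # 0 here: x is symmetric about its mean
```

For a predictor symmetric about its mean, centering removes the x-versus-x² correlation entirely; for skewed predictors it still reduces it substantially.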

15 Interaction Models With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable. Special cases: one independent variable, second order; one independent variable, third order; two independent variables, second order.

16 One Independent Variable – Second Order The regression model: Yi = β0 + β1xi + β11xi² + εi. The mean response is a parabola and is frequently called a quadratic response function. β0 represents the mean response of Y when x = 0, β1 is often called the linear effect coefficient, and β11 the quadratic effect coefficient.

17 One Independent Variable – Third Order The regression model: Yi = β0 + β1xi + β11xi² + β111xi³ + εi. The mean response is a cubic function of x.

18 Two Independent Variables – Second Order The regression model: Yi = β0 + β1xi1 + β2xi2 + β11xi1² + β22xi2² + β12xi1xi2 + εi. The mean response is the equation of a conic section. The coefficient β12 is often called the interaction effect coefficient.
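To see what β12 does, here is a tiny Python sketch with made-up coefficients (not from the slides), with the quadratic terms set to zero so the interaction is isolated:

```python
# Hypothetical coefficients (not from the slides), chosen only to isolate the
# interaction: beta11 and beta22 are taken as 0 so the quadratic terms drop out.
b0, b1, b2, b12 = 10.0, 2.0, 3.0, 1.5

def mean_response(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Effect of raising x1 from 0 to 1 at two different values of x2:
effect_at_x2_0 = mean_response(1, 0) - mean_response(0, 0)   # = b1       = 2.0
effect_at_x2_1 = mean_response(1, 1) - mean_response(0, 1)   # = b1 + b12 = 3.5
print(effect_at_x2_0, effect_at_x2_1)
```

The effect of a unit increase in x1 shifts by exactly β12 for each unit increase in x2, which is what "the effect of one variable depends on the value of another" means.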

19 NKNW Example p 330 The response variable is the life (in cycles) of a power cell. The explanatory variables are charge rate (3 levels) and temperature (3 levels). This is a designed experiment.

20 Check the data
Obs  cycles  chrate  temp
1      150     0.6     10
2       86     1.0     10
3       49     1.4     10
4      288     0.6     20
5      157     1.0     20
6      131     1.0     20
7      184     1.0     20
8      109     1.4     20
9      279     0.6     30
10     235     1.0     30
11     224     1.4     30

21 Create the new variables and run the regression Create the new variables: chrate2 = chrate*chrate; temp2 = temp*temp; ct = chrate*temp; Then regress cycles on chrate, temp, chrate2, temp2, and ct.

22 Output
a. Regression Coefficients
Var       b       s(b)    t      Pr > |t|
Int      162.84   16.61    9.81  <.0002
Chrate   -55.83   13.22   -4.22  <0.01
Temp      75.50   13.22    5.71  <0.005
Chrate2   27.39   20.34    1.35  0.2359
Temp2    -10.61   20.34   -0.52  0.6244
ct        11.50   16.19    0.71  0.5092

23 Output (2)
b. ANOVA Table
Source                      df   SS      MS
Regression                   5   55366   11073
  X1                         1   18704
  X2 | X1                    1   34201
  X1² | X1, X2               1    1646
  X2² | X1, X2, X1²          1     285
  X1X2 | X1, X2, X1², X2²    1     529
Error                        5    5240   1048
Total                       10   60606
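The coefficients above can be reproduced in Python from the slide-20 data. They correspond to NKNW-style coded predictors x1 = (chrate - 1)/0.4 and x2 = (temp - 20)/10 (the coding is not shown on the slides, so treat it as an assumption here):

```python
import numpy as np

# Power-cell data from slide 20.
cycles = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], float)
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp   = np.array([10., 10., 10., 20., 20., 20., 20., 20., 30., 30., 30.])

# NKNW-style coded predictors (an assumption; not stated on the slide).
x1 = (chrate - 1.0) / 0.4
x2 = (temp - 20.0) / 10.0

X = np.column_stack([np.ones(11), x1, x2, x1**2, x2**2, x1 * x2])
b, *_ = np.linalg.lstsq(X, cycles, rcond=None)
sse = np.sum((cycles - X @ b) ** 2)
print(np.round(b, 2))    # 162.84, -55.83, 75.5, 27.39, -10.61, 11.5
print(round(sse))        # 5240 -- matches the Error SS in the ANOVA table
```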

24 Conclusion We have a multicollinearity problem. Let's look at the correlations (use proc corr). There are some very high correlations: r(chrate, chrate2) = 0.99103 and r(temp, temp2) = 0.98609.
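The slides compute these with SAS proc corr; the same numbers fall out of NumPy using the data from slide 20:

```python
import numpy as np

# chrate and temp values from slide 20.
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp   = np.array([10., 10., 10., 20., 20., 20., 20., 20., 30., 30., 30.])

r_chrate = np.corrcoef(chrate, chrate ** 2)[0, 1]
r_temp = np.corrcoef(temp, temp ** 2)[0, 1]
print(round(r_chrate, 5))   # 0.99103
print(round(r_temp, 5))     # 0.98609
```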

25 A remedy We can remove the correlation between explanatory variables and their squares: center (subtract the mean) before squaring. NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).

26 Last slide Read NKNW 7.6 to 7.7 and the problems on pp 317-326. We used programs cs4.sas and NKNW302.sas to generate the output for today.

27 Last slide Read NKNW 8.5 and Chapter 9

