2018-12-06 Lecture 5 732G21/732G28/732A35 Linköpings universitet
Extra sums of squares

The extra sum of squares is the difference between the SSE for a model with a certain set of predictors and the SSE for a model with the same predictors plus one or more additional predictors.

Consider the model

Yi = β0 + β1Xi1 + εi

Then the extra sum of squares from adding X2 to the model is

SSR(X2 | X1) = SSE(X1) - SSE(X1, X2)
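The definition above can be sketched directly with least squares: fit the smaller and the larger model and take the difference in SSE. A minimal illustration with numpy on simulated data (the variables x1, x2 and all numbers are made-up assumptions, not the lecture's data):

```python
import numpy as np

def sse(predictors, y):
    """Residual sum of squares from an OLS fit of y on the given predictors (plus intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=30)

sse_x1 = sse([x1], y)          # SSE(X1): model with X1 only
sse_x1_x2 = sse([x1, x2], y)   # SSE(X1, X2): model with both predictors

# Extra sum of squares SSR(X2 | X1): the reduction in SSE from adding X2
ssr_x2_given_x1 = sse_x1 - sse_x1_x2
```

Because the models are nested, SSE(X1, X2) can never exceed SSE(X1), so the extra sum of squares is always non-negative.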
Salary example

Regression Analysis: Salary (Y) versus Age (X1)

The regression equation is
Salary (Y) = 8.45 + 0.547 Age (X1)

Predictor   Coef    SE Coef  T     P
Constant    8.454   4.848    1.74  0.132
Age (X1)    0.5471  0.1099   4.98  0.003

S = 4.05592   R-Sq = 80.5%   R-Sq(adj) = 77.2%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       1  407.30  407.30  24.76  0.003
Residual Error   6   98.70   16.45
Total            7  506.00
Salary (Y)   Age (X1)   Highschool points (X2)
17           21           0
30           32         120
27           40          40
35           56          90
44           61         160
38           55         160
36           39         140
25           33          80
Regression Analysis: Salary (Y) versus Age (X1), Highschool points (X2)

The regression equation is
Salary (Y) = 10.1 + 0.319 Age (X1) + 0.0805 Highschool points (X2)

Predictor               Coef     SE Coef  T     P
Constant                10.126   2.347    4.32  0.008
Age (X1)                0.31869  0.07225  4.41  0.007
Highschool points (X2)  0.08049  0.01746  4.61  0.006

S = 1.93941   R-Sq = 96.3%   R-Sq(adj) = 94.8%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       2  487.19  243.60  64.76  0.000
Residual Error   5   18.81    3.76
Total            7  506.00

Source                  DF  Seq SS
Age (X1)                 1  407.30
Highschool points (X2)   1   79.90
Partial F-test

Tests whether a subset of the regression coefficients are all zero:

H0: βq = βq+1 = … = βp-1 = 0
Ha: not all of the β in H0 are zero

Test statistic, with SSE(R) from the reduced model (without the tested predictors) and SSE(F) from the full model:

F* = [(SSE(R) - SSE(F)) / (p - q)] / [SSE(F) / (n - p)]

Reject H0 if F* > F(1-α; p-q; n-p)
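The test can be sketched in a few lines of numpy: fit the full and the reduced model, then form F* from the two SSE values. The simulated data, sample size, and tested subset below are illustrative assumptions, not the lecture's example:

```python
import numpy as np

def sse(predictors, y):
    """Residual sum of squares from an OLS fit of y on the given predictors (plus intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 + rng.normal(size=n)   # here x2 and x3 are truly irrelevant

# Full model: intercept + x1 + x2 + x3  -> p = 4 parameters
# Reduced model under H0 (beta2 = beta3 = 0): intercept + x1  -> q = 2 parameters
p, q = 4, 2
sse_full = sse([x1, x2, x3], y)
sse_reduced = sse([x1], y)

# F* = [(SSE(R) - SSE(F)) / (p - q)] / [SSE(F) / (n - p)]
f_star = ((sse_reduced - sse_full) / (p - q)) / (sse_full / (n - p))
# Compare f_star with the table value F(1 - alpha; p - q; n - p) to decide.
```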
Salary (Y)   Age (X1)   Highschool points (X2)   Female/Male (X3)
17           21           0                      1
30           32         120                      1
27           40          40                      0
35           56          90                      0
44           61         160                      1
38           55         160                      0
36           39         140                      1
25           33          80                      0
Regression Analysis: Salary (Y) versus Age (X1), Highschool point, ...

The regression equation is
Salary (Y) = 7.13 + 0.393 Age (X1) + 0.0652 Highschool points (X2) + 2.73 Female/Male (X3)

Predictor               Coef     SE Coef  T     P
Constant                7.132    2.155    3.31  0.030
Age (X1)                0.39317  0.06201  6.34  0.003
Highschool points (X2)  0.06521  0.01441  4.52  0.011
Female/Male (X3)        2.732    1.185    2.31  0.082

S = 1.42101   R-Sq = 98.4%   R-Sq(adj) = 97.2%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       3  497.92  165.97  82.20  0.000
Residual Error   4    8.08    2.02
Total            7  506.00

Source                  DF  Seq SS
Age (X1)                 1  407.30
Highschool points (X2)   1   79.90
Female/Male (X3)         1   10.73
Summary of tests of regression coefficients

Test whether a single βk = 0: t-test
Test whether all β = 0: overall F-test
Test whether a subset of the β = 0: partial F-test
Coefficient of partial determination

Tells us how large a share of the variation in Y that is left unexplained by the current model is explained when another predictor is added.

Consider the model

Yi = β0 + β1Xi1 + εi

and add X2. The coefficient of partial determination between Y and X2, given X1, is

R²(Y2|1) = SSR(X2 | X1) / SSE(X1) = (SSE(X1) - SSE(X1, X2)) / SSE(X1)
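Computationally this is just a ratio of the two SSE values from the extra-sums-of-squares decomposition. A small simulated sketch (data and variable names are illustrative assumptions):

```python
import numpy as np

def sse(predictors, y):
    """Residual sum of squares from an OLS fit of y on the given predictors (plus intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 1.0 * x1 + 0.7 * x2 + rng.normal(size=50)

sse_x1 = sse([x1], y)          # SSE(X1)
sse_x1_x2 = sse([x1, x2], y)   # SSE(X1, X2)

# R^2(Y2|1): share of the variation unexplained by X1 that X2 then explains
r2_partial = (sse_x1 - sse_x1_x2) / sse_x1
```

Since 0 <= SSE(X1, X2) <= SSE(X1), the coefficient always lies between 0 and 1.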
Multicollinearity

Multicollinearity is present when there is high correlation among the predictors.

Multicollinearity causes:
- The estimated regression coefficients change greatly when a predictor is added to or deleted from the model.
- The standard errors of the regression coefficients become very large, so conclusions from the model become imprecise.
- The estimated regression coefficients may be non-significant even though the corresponding predictors are highly correlated with Y.
- We interpret regression coefficients one at a time, holding the other predictors constant. With highly correlated predictors this interpretation breaks down: if we change one of them, the others change too (holding them fixed is possible mathematically, but not logically).
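The inflated standard errors can be demonstrated with a short simulation: fit the same kind of model once with nearly uncorrelated predictors and once with nearly collinear ones, and compare the estimated coefficient standard errors. All numbers below are illustrative assumptions:

```python
import numpy as np

def coef_se(predictors, y):
    """Estimated standard errors of the OLS coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (len(y) - X.shape[1])
    return np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                   # essentially uncorrelated with x1
x2_coll = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
noise = rng.normal(size=n)
y_indep = 1.0 + x1 + x2_indep + noise
y_coll = 1.0 + x1 + x2_coll + noise

se_indep = coef_se([x1, x2_indep], y_indep)
se_coll = coef_se([x1, x2_coll], y_coll)
# The slope standard errors are far larger in the nearly collinear case.
```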
Indications of the presence of multicollinearity

- Large changes in the regression coefficients when a predictor is added or deleted.
- Non-significant t-tests on the regression coefficients of variables that, judging from the scatter plot matrix and the correlation matrix (and from subject-matter logic), seemed very important.
- Estimated regression coefficients with the opposite sign to what we expect.
Formal test of the presence of multicollinearity

The Variance Inflation Factor (VIF) for predictor Xk is

VIFk = 1 / (1 - Rk²)

where Rk² is the coefficient of determination when performing a regression of Xk versus the other X-variables in the model.

Consider a model with predictors X1, X2 and X3:

VIF1 = 1/(1 - R1²), where R1² comes from regressing X1 on X2 and X3
VIF2 = 1/(1 - R2²), where R2² comes from regressing X2 on X1 and X3
VIF3 = 1/(1 - R3²), where R3² comes from regressing X3 on X1 and X2

Decision rule: if the largest VIF exceeds 10, or if the average of the VIFs is considerably larger than 1, there may be multicollinearity in the model.
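The definition above translates directly into code: regress each predictor on the others, take the resulting R², and invert. A numpy sketch, where the simulated predictors and the near-linear dependence of x3 on x1 and x2 are illustrative assumptions:

```python
import numpy as np

def vif(X):
    """VIF_k = 1 / (1 - R_k^2), where R_k^2 is the R^2 from regressing
    the k-th predictor on all the other predictors (with intercept)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for k in range(X.shape[1]):
        yk = X[:, k]
        others = np.delete(X, k, axis=1)
        Xd = np.column_stack([np.ones(len(yk)), others])
        beta, *_ = np.linalg.lstsq(Xd, yk, rcond=None)
        resid = yk - Xd @ beta
        r2 = 1.0 - float(resid @ resid) / float(((yk - yk.mean()) ** 2).sum())
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + rng.normal(scale=0.1, size=100)   # nearly a linear combination of x1 and x2
vifs = vif(np.column_stack([x1, x2, x3]))
# The largest VIF far exceeds 10 here, flagging multicollinearity.
```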