Presentation on theme: "Class 23 The most over-rated statistic The four assumptions The most Important hypothesis test yet Using yes/no variables in regressions."— Presentation transcript:

1 Class 23: The most over-rated statistic; The four assumptions; The most important hypothesis test yet; Using yes/no variables in regressions

2 Adjusted R-square
Pg 9-12 Pfeifer note

Hours data (15 jobs): 2, 4.17, 4.42, 4.75, 4.83, 6.67, 7, 7.08, 7.17, 7.17, 10, 12, 12.5, 13.67, 15.08

Descriptive statistics for Hours:
Mean 7.900667
Standard Error 1.003487
Median 7.08
Mode 7.17
Standard Deviation 3.886488
Sample Variance 15.10479
Kurtosis -0.75506
Skewness 0.524811
Range 13.08
Minimum 2
Maximum 15.08
Sum 118.51
Count 15

Our better method of forecasting hours would use a mean of 7.9 and a standard deviation of 3.89 (and the t-distribution with 14 dof). This is the variation in Hours that regression will try to explain.
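As a check on the slide's numbers, here is a minimal standard-library Python sketch that recomputes the descriptive statistics and forms the kind of probability forecast the slide describes; the 95% critical value 2.145 for 14 dof is a t-table value, not part of the slide.

```python
import statistics

# The 15 Hours observations from the slide (7.17 appears twice; it is the mode)
hours = [2, 4.17, 4.42, 4.75, 4.83, 6.67, 7, 7.08, 7.17, 7.17,
         10, 12, 12.5, 13.67, 15.08]

mean = statistics.mean(hours)   # about 7.9007
sd = statistics.stdev(hours)    # sample standard deviation, about 3.8865
n = len(hours)                  # 15 observations, so 14 dof

# Probability forecast for a new job's hours: mean +/- t * sd
t_crit = 2.145  # 95% two-tailed t critical value, 14 dof (table value, an assumption here)
interval = (mean - t_crit * sd, mean + t_crit * sd)
print(mean, sd, interval)
```

The interval is wide because only the overall mean is used; the regression on the next slide tries to explain some of this variation.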

3 Adjusted R-square
Pg 9-12 Pfeifer note

Our better method of forecasting hours for job A would use a mean of 10.51 and standard deviation of 2.77 (and the t-distribution with 13 dof). This is the variation in Hours the regression leaves unexplained.

Data (MSF, Hours):
26, 2
34.2, 4.17
29, 4.42
34.3, 4.75
85.9, 4.83
143.2, 6.67
85.5, 7
140.6, 7.08
140.6, 7.17
40.4, 7.17
101, 10
239.7, 12
179.3, 12.5
126.5, 13.67
140.8, 15.08

SUMMARY OUTPUT -- Regression Statistics:
Multiple R 0.7260033
R Square 0.5270808
Adjusted R Square 0.4907024
Standard Error 2.7735959
Observations 15

ANOVA df: Regression 1, Residual 13, Total 14

Coefficients: Intercept 3.312316, MSF 0.0444895
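To see where the slope, intercept, R², and standard error in the output come from, here is a minimal least-squares sketch in plain Python (no Excel needed), using the 15 (MSF, Hours) pairs above.

```python
# Simple least-squares regression of Hours on MSF, standard library only
msf = [26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6, 140.6, 40.4,
       101, 239.7, 179.3, 126.5, 140.8]
hours = [2, 4.17, 4.42, 4.75, 4.83, 6.67, 7, 7.08, 7.17, 7.17,
         10, 12, 12.5, 13.67, 15.08]

n = len(msf)
xbar = sum(msf) / n
ybar = sum(hours) / n
sxx = sum((x - xbar) ** 2 for x in msf)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(msf, hours))
syy = sum((y - ybar) ** 2 for y in hours)

b = sxy / sxx                                # slope, about 0.0445
a = ybar - b * xbar                          # intercept, about 3.312
sse = syy - b * sxy                          # residual sum of squares
r2 = 1 - sse / syy                           # R Square, about 0.527
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)    # Adjusted R Square, about 0.491
se = (sse / (n - 2)) ** 0.5                  # Standard Error, about 2.774
print(b, a, r2, adj_r2, se)
```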

4 Adjusted R-square Pg 9-12 Pfeifer note

5 From the Pfeifer note: [three scatterplot panels illustrating Adj R-square = 0.0, Adj R-square = 0.5, and Adj R-square = 1.0; the regression standard error runs from s (the standard deviation of Y, when nothing is explained) down to 0 (a perfect fit)]

6 Why Pfeifer says R² is over-rated
There is no standard for how large it should be.
– In some situations an adjusted R² of 0.05 would be FANTASTIC. In others, an adjusted R² of 0.96 would be DISAPPOINTING.
It has no real use.
– Unlike the "standard error", which is needed to make probability forecasts.
It is usually redundant.
– When comparing models, lower standard errors mean higher adjusted R².
– The correlation coefficient (which shares the same sign as b) ≈ the square root of adjusted R².
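The redundancy claim can be made concrete: for a regression, adjusted R² is fully determined by the regression standard error and the sample variance of Y, via adj R² = 1 − s²ₑ / s²ᵧ. A quick check using the reported values from the MSF/Hours regression:

```python
# Adjusted R-square is redundant given the standard error:
# adj_R2 = 1 - (regression standard error)^2 / (sample variance of Y)
se_regression = 2.7735959    # Standard Error from the regression output
var_y = 15.10479             # Sample Variance of Hours from the descriptive stats

adj_r2 = 1 - se_regression ** 2 / var_y
print(adj_r2)                # about 0.4907, matching the Excel output
```

This is why, when comparing models on the same Y, ranking by standard error and ranking by adjusted R² always give the same answer.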

7 The Coal Pile Example
The firm needed a way to estimate the weight of a coal pile (based on its dimensions).

Data (W, D, h, d):
56, 20, 10, 15
93, 25, 10, 20
161, 30, 12, 24
31, 15, 12, 10
70, 20, 14, 13
76, 20, 14, 13
375, 40, 16, 32
34, 15, 14, 8
45, 20, 8, 16
58, 20, 10, 15

SUMMARY OUTPUT -- Regression Statistics:
Multiple R 0.986792416
R Square 0.973759272
Adjusted R Square 0.960638908
Standard Error 20.56622179
Observations 10

ANOVA df: Regression 3, Residual 6, Total 9

Coefficients: Intercept -294.6954733, D -15.12016461, h 23.02366255, d 27.62139918

96% of the variation in W is explained by this regression. We just used MULTIPLE regression.

8 The Coal Pile Example
Engineer Bob calculated the Volume of each pile and used simple regression… 100% of the variation in W is explained by this regression. The standard error went from 20.6 to 2.8!!!

Data (W, Vol):
56, 2421.64
93, 3992.44
161, 6898.94
31, 1492.26
70, 3038.44
76, 3038.44
375, 16353.04
34, 1499.06
45, 2044.13
58, 2421.64

SUMMARY OUTPUT -- Regression Statistics:
Multiple R 0.999673782
R Square 0.99934767
Adjusted R Square 0.999266129
Standard Error 2.808218162
Observations 10

ANOVA df: Regression 1, Residual 8, Total 9

Coefficients: Intercept 0.668821408, Vol 0.022970159
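Bob's simple regression can be reproduced without Excel; here is a pure-Python sketch using the (W, Vol) pairs above.

```python
# Simple regression of weight W on Engineer Bob's computed volume Vol
w   = [56, 93, 161, 31, 70, 76, 375, 34, 45, 58]
vol = [2421.64, 3992.44, 6898.94, 1492.26, 3038.44,
       3038.44, 16353.04, 1499.06, 2044.13, 2421.64]

n = len(w)
xbar = sum(vol) / n
ybar = sum(w) / n
sxx = sum((x - xbar) ** 2 for x in vol)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(vol, w))
syy = sum((y - ybar) ** 2 for y in w)

b = sxy / sxx                              # slope, about 0.022970
a = ybar - b * xbar                        # intercept, about 0.669
r2 = 1 - (syy - b * sxy) / syy             # about 0.9993
se = ((syy - b * sxy) / (n - 2)) ** 0.5    # about 2.81, down from 20.6
print(b, a, r2, se)
```

One well-chosen predictor (volume, built from the dimensions) beats three raw dimension predictors here.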

9 The Four Assumptions Sec 5 of Pfeifer note Sec 12.4 of EMBS

10 Our better method of forecasting hours for job A would use a mean of 10.51 and standard deviation of 2.77 (and the t-distribution with 13 dof).

Data (MSF, Hours):
26, 2
34.2, 4.17
29, 4.42
34.3, 4.75
85.9, 4.83
143.2, 6.67
85.5, 7
140.6, 7.08
140.6, 7.17
40.4, 7.17
101, 10
239.7, 12
179.3, 12.5
126.5, 13.67
140.8, 15.08

SUMMARY OUTPUT -- Regression Statistics:
Multiple R 0.7260033
R Square 0.5270808
Adjusted R Square 0.4907024
Standard Error 2.7735959
Observations 15

ANOVA df: Regression 1, Residual 13, Total 14

Coefficients: Intercept 3.312316, MSF 0.0444895

The four assumptions:
Linearity
Independence (all 15 points count equally)
Homoskedasticity
Normality

Sec 5 of Pfeifer note; Sec 12.4 of EMBS

11 Hypotheses H0: P=0.5 (LTT, wunderdog) H0: Independence (supermarket job and response, treatment and heart attack, light and myopia, tosser and outcome) H0: μ=100 (IQ) H0: μ M = μ F (heights, weights, batting average) H0: μ compact = μ mid = μ large (displacement) P 13 of Pfeifer note Sec 12.5 of EMBS

12 H0: b=0 P 13 of Pfeifer note Sec 12.5 of EMBS

13 Testing b=0 is EASY!!!

Data (MSF, Hours):
26, 2
34.2, 4.17
29, 4.42
34.3, 4.75
85.9, 4.83
143.2, 6.67
85.5, 7
140.6, 7.08
140.6, 7.17
40.4, 7.17
101, 10
239.7, 12
179.3, 12.5
126.5, 13.67
140.8, 15.08

           Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept  3.3123        1.4021          2.3624   0.0344   0.2832     6.3414
MSF        0.0445        0.0117          3.8064   0.0022   0.0192     0.0697

In the MSF row: 0.0117 is the standard error of the coefficient, 3.8064 is the t-stat to test b=0, and 0.0022 is the 2-tailed p-value.

P 13 of Pfeifer note; Sec 12.5 of EMBS
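The t-stat in the table is just the coefficient divided by its standard error. A sketch using the rounded values from the output; the 13-dof critical value 2.160 is a t-table value, not part of the slide.

```python
# Testing H0: b = 0 -- the t-stat is coefficient / standard error of coefficient
b = 0.0445          # MSF coefficient (rounded, from the output)
se_b = 0.0117       # standard error of the coefficient

t_stat = b / se_b   # about 3.80 (Excel reports 3.8064 from unrounded values)

# With 13 dof, the 5% two-tailed critical value is about 2.160 (t table),
# so |t_stat| > 2.160 and we reject H0: b = 0 at the 5% level,
# consistent with the reported two-tailed p-value of 0.0022.
reject = abs(t_stat) > 2.160
print(t_stat, reject)
```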

14 Using Yes/No variable in Regression

Car  Class    Displacement  Fuel Type  Hwy MPG
1    Midsize  3.5           R          28
2    Midsize  3             R          26
3    Large    3             P          26
4    Large    3.5           P          25
…
58   Compact  6             P          20
59   Midsize  2.5           R          30
60   Midsize  2             R          32

Class and Fuel Type are categorical; Displacement and Hwy MPG are numerical. n = 60.

Does MPG "depend" on fuel type?

Sec 8 of Pfeifer note; Sec 13.7 of EMBS

15 Fuel type (yes/no) and mpg (numerical)
Un-stack the data so there are two columns of MPG data. Data Analysis, t-Test: Two-Sample Assuming Equal Variances.

                               P          R
Mean                           24.33333   27.70833
Variance                       12.4       9.519928
Observations                   36         24
Pooled Variance                11.2579
Hypothesized Mean Difference   0
df                             58
t Stat                         -3.81704
P(T<=t) one-tail               0.000165
t Critical one-tail            1.671553
P(T<=t) two-tail               0.000331
t Critical two-tail            2.001717

H0: μP = μR, or H0: μP − μR = 0

Sec 8 of Pfeifer note; Sec 13.7 of EMBS
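The Excel output can be reproduced from the summary statistics alone. Here is a standard-library sketch of the equal-variance (pooled) two-sample t statistic:

```python
# Two-sample t-test assuming equal variances, from summary statistics
n_p, mean_p, var_p = 36, 24.33333, 12.4        # Premium-fuel cars
n_r, mean_r, var_r = 24, 27.70833, 9.519928    # Regular-fuel cars

# Pooled variance weights each sample variance by its degrees of freedom
pooled = ((n_p - 1) * var_p + (n_r - 1) * var_r) / (n_p + n_r - 2)  # about 11.2579

t_stat = (mean_p - mean_r) / (pooled * (1 / n_p + 1 / n_r)) ** 0.5  # about -3.817
df = n_p + n_r - 2                                                  # 58
print(pooled, t_stat, df)
```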

16 Using Yes/No variables in Regression
1. Convert the categorical variable into a 1/0 DUMMY variable.
– Use an if statement to do this.
– It won't matter which category is assigned 1 and which is assigned 0.
– It doesn't even matter what two numbers you assign to the two categories (regression will adjust).
2. Regress MPG (numerical) on DUMMY (1/0 numerical).
3. Test H0: b=0 using the regression output.
Sec 8 of Pfeifer note; Sec 13.7 of EMBS
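Steps 1 and 2 can be sketched in plain Python. The rows below are a small hypothetical data set (not the actual 60-car data); the point is that the if-statement builds the dummy, and with a single 1/0 predictor the fitted slope is exactly mean(MPG | dummy=1) − mean(MPG | dummy=0).

```python
# Hypothetical mini data set (fuel type, highway MPG) -- for illustration only
fuel = ["R", "R", "P", "P", "P", "P", "P", "R", "R"]
mpg  = [28,  26,  26,  25,  21,  25,  20,  30,  32]

# Step 1: the "if statement" -- 1 for Premium, 0 for Regular
dummy = [1 if f == "P" else 0 for f in fuel]

# Step 2: regress MPG on the dummy (ordinary least squares)
n = len(mpg)
xbar = sum(dummy) / n
ybar = sum(mpg) / n
sxx = sum((x - xbar) ** 2 for x in dummy)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(dummy, mpg))
b = sxy / sxx            # slope
a = ybar - b * xbar      # intercept

# With a 1/0 predictor: slope = mean(P) - mean(R), intercept = mean(R)
mean_p = sum(y for f, y in zip(fuel, mpg) if f == "P") / fuel.count("P")
mean_r = sum(y for f, y in zip(fuel, mpg) if f == "R") / fuel.count("R")
print(b, a, mean_p - mean_r, mean_r)
```

This is why it doesn't matter which category gets the 1: swapping the coding just flips the sign of the slope and shifts the intercept to the other group's mean.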

17 Using Yes/No variables in Regression

Fuel Type  Dprem  Hwy MPG
R          0      28
R          0      26
P          1      26
P          1      25
…
P          1      21
P          1      25
P          1      20
R          0      30
R          0      32

SUMMARY OUTPUT -- Regression Statistics:
Adj R Square 0.1870
Standard Error 3.3553
Observations 60

ANOVA        df   SS       MS       F       Sig F
Regression   1    164.025  164.025  14.570  3.306E-04
Residual     58   652.958  11.258
Total        59   816.983

           Coeff    Std Error  t Stat    P-value
Intercept  27.708   0.6849     40.4564   3.321E-44
Dprem      -3.375   0.8842     -3.8170   3.306E-04

Sec 8 of Pfeifer note; Sec 13.7 of EMBS

18 Regression with one Dummy variable
H0: μP = μR, or H0: μP − μR = 0, or H0: b = 0
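The equivalence of these three hypotheses shows up directly in the two outputs: the dummy regression's intercept is the Regular-fuel mean, intercept + slope is the Premium mean, and the t-stat on Dprem (-3.8170) matches the two-sample t-stat (-3.81704). A quick arithmetic check with the reported values:

```python
# Regression-with-dummy output vs. two-sample t-test output (reported values)
intercept, slope = 27.708, -3.375      # from the Dprem regression
mean_r, mean_p = 27.70833, 24.33333    # from the two-sample t-test

# Intercept estimates the Regular mean; intercept + slope estimates the Premium mean
print(intercept, intercept + slope)    # about 27.708 and 24.333
```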

19 What we learned today We learned about “adjusted R square” – The most over-rated statistic of all time. We learned the four assumptions required to use regression to make a probability forecast of Y│X. – And how to check each of them. We learned how to test H0: b=0. – And why this is such an important test. We learned how to use a yes/no variable in a regression. – Create a dummy variable.

