# Class 25: T-test 2-sample ≡ Regression with Dummy; Understanding Multiple Regression; ANOVA ≡ Regression with p-1 Dummies (EMBS 13.7, Pfeifer Note: section 8, pages 39-42)

Class 25 topics: T-test 2-sample ≡ Regression with Dummy; Understanding Multiple Regression; ANOVA ≡ Regression with p-1 Dummies. Reading: EMBS 13.7; Pfeifer Note, section 8 (pages 39-42).

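T-test 2-sample ≡ Regression with Dummy

t-Test: Two-Sample Assuming Equal Variances (H0: μS = μW)

| | Hours(S) | Hours(W) |
|---|---|---|
| Mean | 10.05886 | 9.728033 |
| Variance | 3.967602 | 3.399553 |
| Observations | 70 | 61 |
| Pooled Variance | 3.703393 | |
| Hypothesized Mean Difference | 0 | |
| df | 129 | |
| t Stat | 0.981467 | |
| P(T<=t) one-tail | 0.1641 | |
| t Critical one-tail | 1.656752 | |
| P(T<=t) two-tail | 0.328199 | |
| t Critical two-tail | 1.978524 | |

Sample of the data (Ds = 1 if Spencer drove, 0 if Williams):

| Miles | Stops | Hours | Ds |
|---|---|---|---|
| 331 | 3 | 10.17 | 0 |
| 206 | 2 | 8.00 | 0 |
| 221 | 4 | 8.25 | 0 |
| ... | ... | ... | ... |
| 320 | 9 | 11.50 | 1 |
| 181 | 9 | 9.50 | 1 |
| 369 | 7 | 11.75 | 1 |

Regression of Hours on Ds (H0: b = 0):

| ANOVA | df | SS | MS | F | Signif F |
|---|---|---|---|---|---|
| Regression | 1 | 3.567397723 | 3.5674 | 0.963278 | 0.3282 |
| Residual | 129 | 477.7376725 | 3.70339 | | |
| Total | 130 | 481.3050702 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 9.7280 | 0.2464 | 39.48 | 0.0000 |
| Ds | 0.3308 | 0.3371 | 0.98 | 0.3282 |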
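As a rough check on the equivalence, the pooled t statistic can be recomputed from Excel's summary statistics alone, and its square should match the regression F. This is a minimal sketch; the variable names are illustrative, and because the inputs are Excel's rounded displays the results should agree only to about four decimal places.

```python
from math import sqrt

# Summary statistics copied from the Excel t-test output
mean_s, var_s, n_s = 10.05886, 3.967602, 70   # Spencer
mean_w, var_w, n_w = 9.728033, 3.399553, 61   # Williams

# Pooled variance: df-weighted average of the two sample variances
pooled_var = ((n_s - 1) * var_s + (n_w - 1) * var_w) / (n_s + n_w - 2)

# Standard error of the difference in means, then the t statistic
se_diff = sqrt(pooled_var * (1 / n_s + 1 / n_w))
t_stat = (mean_s - mean_w) / se_diff

# For a regression with a single X, the ANOVA F equals t squared
f_stat = t_stat ** 2

print(round(pooled_var, 4), round(t_stat, 4), round(f_stat, 4))
```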

Hours vs Ds: The average Spencer route took about 1/3 hour (0.33 hours) longer than the average Williams route. The sample mean was 9.73 hours for Williams and 10.06 hours for Spencer. The b coefficient was NOT statistically significant (p = 0.33).

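The regression line goes through the two sample means. With a 0/1 dummy, the intercept is the mean of the 0 group (Williams, 9.728), and the slope is ALWAYS the difference in sample means (10.059 - 9.728 = 0.331).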
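With a 0/1 dummy, least squares puts the intercept at the mean of the 0 group and the slope at the difference in group means, and the slope's t statistic equals the pooled two-sample t. The sketch below checks this on only the six routes listed on the slide (not all 131), so its numbers will not match the Excel output. The identities themselves hold for any data. The helper names are illustrative.

```python
from math import sqrt

# The six routes shown on the slide (Ds = 1 if Spencer, 0 if Williams)
hours = [10.17, 8.00, 8.25, 11.50, 9.50, 11.75]
ds = [0, 0, 0, 1, 1, 1]

def mean(v):
    return sum(v) / len(v)

def svar(v):
    """Sample variance with n - 1 in the denominator."""
    m = mean(v)
    return sum((y - m) ** 2 for y in v) / (len(v) - 1)

# Simple least squares: slope = Sxy / Sxx, intercept = ybar - slope * xbar
xbar, ybar = mean(ds), mean(hours)
sxx = sum((x - xbar) ** 2 for x in ds)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(ds, hours))
slope = sxy / sxx
intercept = ybar - slope * xbar

williams = [y for x, y in zip(ds, hours) if x == 0]
spencer = [y for x, y in zip(ds, hours) if x == 1]

# t statistic of the slope: b / sqrt(MSE / Sxx)
n = len(hours)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ds, hours))
t_regression = slope / sqrt(sse / (n - 2) / sxx)

# Pooled two-sample t statistic (equal variances assumed)
n_s, n_w = len(spencer), len(williams)
pooled = ((n_s - 1) * svar(spencer) + (n_w - 1) * svar(williams)) / (n - 2)
t_pooled = (mean(spencer) - mean(williams)) / sqrt(pooled * (1 / n_s + 1 / n_w))
```

The two t statistics agree because the regression's residual sum of squares is exactly the within-group sum of squares, and Sxx for a dummy equals n_s * n_w / n.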

"Left-Handers Die Younger, Study Says; Finding That Trait Cuts Lifespan 9 Years Draws Surprise, Skepticism" (April 4, 1991 | Malcolm Gladwell | Copyright Malcolm Gladwell). Researchers surveyed the next of kin listed on the death records of 2 California counties to determine the handedness of the deceased. Young children and homicide victims were eliminated. Age of death (AOD) in years was regressed against DR (1 if right-handed, 0 if left-handed).

Pfeifer’s Trick: They want you to assume X causes Y. ALWAYS ask whether Y could be causing X. Then ask whether both X and Y are caused by some Z.

Retailers running Oracle are 32% more profitable than their peers.

Female athletes in the nationwide survey were less than half as likely to get pregnant as female non-athletes (5% and 11%, respectively).

"People without health insurance more likely to forego routine physical exams" (Medical Studies/Trials; published Wednesday, 4-Apr-2007)

"There is a fairly long history of research showing that early cannabis (marijuana) use is associated with increased risks for later use of so-called 'hard drugs,' but that research is based on the fact that most heroin and cocaine users report first having used cannabis," says lead author Michael T.

Understanding Multiple Regression
- In Excel, just highlight multiple adjacent columns of independent (X) variables.
- The Regression output gives a coefficient for each of the X variables,
  - as well as a standard error, t-stat, and p-value.
- The multiple regression equation is a PACKAGE DEAL.
  - You have to use the entire equation to make valid predictions.

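Multiple Regression Example

| Miles | Stops | Ds | Hours |
|---|---|---|---|
| 331 | 3 | 0 | 10.17 |
| 206 | 2 | 0 | 8.00 |
| 221 | 4 | 0 | 8.25 |
| ... | ... | ... | ... |
| 320 | 9 | 1 | 11.50 |
| 181 | 9 | 1 | 9.50 |
| 369 | 7 | 1 | 11.75 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 367.7819 | 122.5940 | 137.1476 | 1.1566E-39 |
| Residual | 127 | 113.5232 | 0.8939 | | |
| Total | 130 | 481.3051 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 4.2087 | 0.3009 | 13.99 | 1.740E-27 |
| Miles | 0.0168 | 0.0009 | 17.82 | 2.386E-36 |
| Stops | 0.3234 | 0.0329 | 9.84 | 2.491E-17 |
| Ds | -0.9649 | 0.1788 | -5.40 | 3.231E-07 |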

Multiple Regression Coefficients

| | Coefficients | Williams (Ds = 0) | Spencer (Ds = 1) |
|---|---|---|---|
| Intercept | 4.2087 | 1 | 1 |
| Miles | 0.0168 | 260 | 260 |
| Stops | 0.3234 | 6 | 6 |
| Ds | -0.9649 | 0 | 1 |
| Hours Hat | | 10.527 | 9.562 |

Point forecasts for a route with 260 miles and 6 stops: 10.527 hours if driven by Williams, 9.562 hours if driven by Spencer.
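A point forecast plugs values for every X into the whole equation. The sketch below uses the coefficients as displayed (rounded to 4 decimals), so it gives about 10.517 and 9.552 rather than the slide's 10.527 and 9.562, presumably because Excel used the unrounded coefficients. The Williams-minus-Spencer gap is the Ds coefficient either way. The `forecast` helper is illustrative.

```python
# Coefficients as displayed (rounded) in the Excel output
coef = {"Intercept": 4.2087, "Miles": 0.0168, "Stops": 0.3234, "Ds": -0.9649}

def forecast(miles, stops, ds):
    """Point forecast of Hours using the entire equation (the package deal)."""
    return (coef["Intercept"] + coef["Miles"] * miles
            + coef["Stops"] * stops + coef["Ds"] * ds)

williams = forecast(260, 6, 0)  # Ds = 0: Williams drives
spencer = forecast(260, 6, 1)   # Ds = 1: Spencer drives
```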

Multiple Regression Example (continued): In the regression of Hours on Miles, Stops, and Ds, the coefficient of Ds is -0.96, and it IS significant (p = 3.231E-07). Spencer takes LESS time!

What??? The simple regression of Hours on Ds alone gives Ds = +0.3308 (p = 0.3282): Spencer takes more time. The multiple regression on Miles, Stops, and Ds gives Ds = -0.9649 (p = 3.231E-07): Spencer takes less time. Yes, he does!!!!

What?? When packaged with Miles and Stops, subtract 0.96 hours if Spencer, not Williams, drives. Alone, add 0.33 hours if Spencer, not Williams, drives.

Multiple Regression
- The coefficient of X depends on what other X’s are in the model!
  - Alone, it is how the forecast of Y changes if X changes by 1 (NOT keeping the other X’s constant).
  - In a multiple regression, the coefficient of X is how the forecast of Y changes if X changes by 1 (KEEPING all the other X’s constant).
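One way to see why the coefficient changes: with two X's fitted by least squares, the simple slope of Y on X1 equals the multiple-regression slope on X1 plus the X2 coefficient times the slope of X2 on X1. In the sketch below, $b_0, b_1, b_2$ are the multiple-regression coefficients, $e_i$ the residuals, $S_{11}=\sum_i (x_{1i}-\bar x_1)^2$ and $S_{12}=\sum_i (x_{1i}-\bar x_1)(x_{2i}-\bar x_2)$. The residuals sum to zero and are uncorrelated with $x_1$ (least-squares normal equations), which kills the $e_i$ term.

$$
y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i, \qquad \sum_i (x_{1i}-\bar x_1)\,e_i = 0
$$

$$
S_{1y} = \sum_i (x_{1i}-\bar x_1)(y_i-\bar y) = b_1 S_{11} + b_2 S_{12}
\quad\Longrightarrow\quad
b_{\text{simple}} = \frac{S_{1y}}{S_{11}} = b_1 + b_2\,\frac{S_{12}}{S_{11}}
$$

So the simple and multiple slopes on X1 coincide only when $S_{12} = 0$ (X1 and X2 uncorrelated) or $b_2 = 0$.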

Multiple Regression
- Allows us to compare Williams and Spencer even though they drove routes of different difficulty… if we have the data.
- It is the ANSWER to the tough question.
  - Spencer's (S) hours are higher, but perhaps because S had higher Miles and Stops.
  - In the multiple regression, we separate the effects of MILES, STOPS, and DRIVER on Hours.
  - So the DRIVER gets the coefficient he deserves, because MILES and STOPS get their own coefficients.

Other tough questions?
- Hospital A has a high death rate.
  - But maybe A treats sicker people.
- Private-school kids do better in college.
  - But maybe they were smarter to begin with, had access to tutors, etc.
- ND had a great record.
  - But maybe they played an easier schedule.
- People who took the expensive drug had better outcomes.
  - But the drug was expensive. Maybe those who took the drug had better health care, better diets, etc. than those who did not.
- People who took the drug (followed instructions) did better.
  - But maybe taking the drug is a signal of other things about these people that explain why they did better.
- Women make 70 cents on the dollar compared to men.
- Girls who play sports do better in school.

Price vs Speed and Type
- Corporate printers were higher priced.
  - In part because they were faster?
- Faster printers were higher priced.
  - In part because they were corporate?

| Name | Type | Dcorp | Speed | Price |
|---|---|---|---|---|
| Minolta-QMS PagePro 1250W | Small Office | 0 | 12 | 199 |
| Brother HL-1850 | Small Office | 0 | 10 | 499 |
| Lexmark E320 | Small Office | 0 | 12.2 | 299 |
| Minolta-QMS PagePro 1250E | Small Office | 0 | 10.3 | 299 |
| HP Laserjet 1200 | Small Office | 0 | 11.7 | 399 |
| Xerox Phaser 4400/N | Corporate | 1 | 17.8 | 1850 |
| Brother HL-2460N | Corporate | 1 | 16.1 | 1000 |
| IBM Infoprint 1120n | Corporate | 1 | 11.8 | 1387 |
| Lexmark W812 | Corporate | 1 | 19.8 | 2089 |
| Oki Data B8300n | Corporate | 1 | 28.2 | 2200 |

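| Regression Statistics | Dcorp and Speed | Dcorp only | Speed only |
|---|---|---|---|
| Multiple R | 0.9502 | 0.9024 | 0.8409 |
| R Square | 0.9029 | 0.8144 | 0.7071 |
| Adjusted R Square | 0.8751 | 0.7912 | 0.6705 |
| Standard Error | 281.9757 | 364.6325 | 458.0249 |
| Observations | 10 | 10 | 10 |
| ANOVA df: Regression | 2 | 1 | 1 |
| ANOVA df: Residual | 7 | 8 | 8 |
| ANOVA df: Total | 9 | 9 | 9 |
| Coefficient: Intercept | -312.86 | 339 | -745.480629 |
| Coefficient: Dcorp | 931.24 | 1366.2 | |
| Coefficient: Speed | 58.00 | | 117.9173201 |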
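The three sets of coefficients can be reproduced from the ten printers in the table by solving the least-squares normal equations X'X b = X'y. This is a minimal sketch: `solve` and `ols` are illustrative helpers, and plain Gaussian elimination on the normal equations is adequate for this tiny, well-conditioned data set but is not necessarily how Excel computes its output.

```python
# Printer data from the table: (Dcorp, Speed, Price)
data = [
    (0, 12.0, 199), (0, 10.0, 499), (0, 12.2, 299), (0, 10.3, 299), (0, 11.7, 399),
    (1, 17.8, 1850), (1, 16.1, 1000), (1, 11.8, 1387), (1, 19.8, 2089), (1, 28.2, 2200),
]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

price = [p for _, _, p in data]
both = ols([(1, d, s) for d, s, _ in data], price)    # Intercept, Dcorp, Speed
dcorp_only = ols([(1, d) for d, _, _ in data], price)  # Intercept, Dcorp
speed_only = ols([(1, s) for _, s, _ in data], price)  # Intercept, Speed
```

Dcorp and Speed are correlated (corporate printers are faster), which is why each coefficient shrinks when the other variable joins the model.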

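Total vs Exams One and Two

| ID | Exam One | Exam Two | Total |
|---|---|---|---|
| 1 | 10 | 200 | 210 |
| 2 | 20 | 180 | 200 |
| 3 | 40 | 120 | 160 |
| 4 | 60 | 89 | 149 |
| 5 | 80 | 50 | 130 |
| 6 | 90 | 60 | 150 |
| 7 | 100 | 10 | 110 |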
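In this table every Total is exactly Exam One + Exam Two, so the multiple regression of Total on both exams has coefficients 1 and 1 (intercept 0, no residual). Alone, however, Exam One gets a negative slope, because students with high Exam One scores had low Exam Two scores. The sketch checks this and also uses the least-squares identity "simple slope = b1 + b2 × (slope of X2 on X1)". The `cross` helper is illustrative.

```python
exam_one = [10, 20, 40, 60, 80, 90, 100]
exam_two = [200, 180, 120, 89, 50, 60, 10]
total = [210, 200, 160, 149, 130, 150, 110]

def cross(u, v):
    """Centered cross-product: sum of (u - mean u) * (v - mean v)."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))

# Multiple regression: Total = 0 + 1*ExamOne + 1*ExamTwo fits exactly
b1, b2 = 1.0, 1.0

# Simple regression of Total on Exam One alone
simple_slope = cross(exam_one, total) / cross(exam_one, exam_one)

# Same number via the identity: b1 + b2 * (slope of Exam Two on Exam One)
via_identity = b1 + b2 * cross(exam_one, exam_two) / cross(exam_one, exam_one)
```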

Designed Experiment: X1, X2, Y data table.

Coefficients: Intercept 20; X1 5 (multiple) and 5 (simple); X2 5 (multiple) and 5 (simple).

Multiple coefficients are different from simple coefficients only when the X’s are correlated. Regression accounts for the correlation among the X’s.
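To illustrate the rule, the sketch below uses two hypothetical four-point data sets (not the slide's data), both generated from the same hypothetical rule Y = 20 + 5·X1 + 5·X2. In the designed (orthogonal) layout the simple and multiple slopes on X1 agree; when X2 moves with X1, the simple slope absorbs part of X2's effect. The helpers solve the 2x2 centered normal equations by Cramer's rule and are illustrative names.

```python
def cross(u, v):
    """Centered cross-product: sum of (u - mean u) * (v - mean v)."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))

def multiple_slopes(x1, x2, y):
    """Slopes (b1, b2) of y on x1 and x2 together, via Cramer's rule."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s11 * s2y - s12 * s1y) / det

def simple_slope(x, y):
    """Slope of y on x alone."""
    return cross(x, y) / cross(x, x)

def rule(a, b):
    # Hypothetical relationship used to generate Y
    return 20 + 5 * a + 5 * b

x1 = [0, 0, 1, 1]
x2_design = [0, 1, 0, 1]   # 2x2 design: X1 and X2 uncorrelated
x2_corr = [0, 1, 1, 1]     # X2 tends to be 1 when X1 is 1: correlated
y_design = [rule(a, b) for a, b in zip(x1, x2_design)]
y_corr = [rule(a, b) for a, b in zip(x1, x2_corr)]

b1_design, _ = multiple_slopes(x1, x2_design, y_design)
b1_corr, _ = multiple_slopes(x1, x2_corr, y_corr)
simple_design = simple_slope(x1, y_design)
simple_corr = simple_slope(x1, y_corr)
```

In the correlated case the multiple regression still recovers 5 for X1, while X1 alone gets 7.5.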

Regression Hypothesis Testing: Simple

For the simple regression of Hours on Ds, Significance F (0.3282) and the Ds P-value (0.3282) both test H0: b = 0. This is the same as the two-tail p-value of the two-sample t-test (0.328199).

Regression Hypothesis Testing: Multiple

For the multiple regression of Hours on Miles (b1), Stops (b2), and Ds (b3):
- Significance F (1.1566E-39) tests H0: b1 = b2 = b3 = 0.
- Each coefficient's t Stat and P-value test H0: b1 = 0, H0: b2 = 0, and H0: b3 = 0, each as part of the multiple regression package.
  - For example, the Ds P-value (3.231E-07) tests H0: b3 = 0 | b1, b2, that is, given that Miles and Stops are in the model.
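Both test statistics come straight from the printed tables: F is the regression mean square over the residual mean square, and each t is a coefficient over its standard error. A minimal sketch using the printed values (variable names are illustrative):

```python
# Values copied from the multiple-regression ANOVA and coefficient tables
ss_reg, df_reg = 367.7819, 3
ss_res, df_res = 113.5232, 127

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res                 # tests H0: b1 = b2 = b3 = 0
r_squared = ss_reg / (ss_reg + ss_res)   # share of Total SS explained

coef_ds, se_ds = -0.9649, 0.1788
t_ds = coef_ds / se_ds                   # tests H0: b3 = 0 | b1, b2
```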


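ANOVA ≡ Regression with p-1 Dummies

H0: μC = μL = μM

Anova: Single Factor

| SUMMARY: Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Compact | 19 | 81.8 | 4.305 | 1.281 |
| Large | 16 | 53.1 | 3.319 | 0.160 |
| Midsize | 25 | 62.3 | 2.492 | 0.216 |

| ANOVA: Source of Variation | SS | df | MS | F | P-value |
|---|---|---|---|---|---|
| Between Groups | 35.517 | 2 | 17.759 | 33.045 | 2.96E-10 |
| Within Groups | 30.632 | 57 | 0.537 | | |
| Total | 66.149 | 59 | | | |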
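The single-factor ANOVA can be rebuilt from the SUMMARY table: the between-groups SS uses group sizes and means, and the within-groups SS pools (n - 1) × variance. A minimal sketch; because the printed variances are rounded, the within-groups SS comes out near 30.64 instead of 30.632, and F lands near 33.03 instead of 33.045.

```python
# (count, sum, sample variance) per group from the SUMMARY table
groups = {
    "Compact": (19, 81.8, 1.281),
    "Large": (16, 53.1, 0.160),
    "Midsize": (25, 62.3, 0.216),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / n_total

# Between groups: size times squared distance of each group mean from the grand mean
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
# Within groups: pooled (n - 1) * variance
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1       # p - 1
df_within = n_total - len(groups)  # N - p
f_stat = (ss_between / df_between) / (ss_within / df_within)
```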

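ANOVA ≡ Regression with p-1 Dummies
1. Create dummy variables.
2. Regress Displacement on any 2 of the 3 dummies.
3. Test H0: bC = bL = 0. The p-value is "Significance F".

| Car | Class | Disp | Fuel | MPG | Dc | DL | Dm |
|---|---|---|---|---|---|---|---|
| 1 | Midsize | 3.5 | R | 28 | 0 | 0 | 1 |
| 2 | Midsize | 3 | R | 26 | 0 | 0 | 1 |
| 3 | Large | 3 | P | 26 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 58 | Compact | 6 | P | 20 | 1 | 0 | 0 |
| 59 | Midsize | 2.5 | R | 30 | 0 | 0 | 1 |
| 60 | Midsize | 2 | R | 32 | 0 | 0 | 1 |

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.733 |
| R Square | 0.537 |
| Adjusted R Square | 0.521 |
| Standard Error | 0.733 |
| Observations | 60 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 35.517 | 17.759 | 33.045 | 2.96E-10 |
| Residual | 57 | 30.632 | 0.537 | | |
| Total | 59 | 66.149 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.492 | 0.147 | 16.997 | 5.53E-24 |
| Dc | 1.813 | 0.223 | 8.127 | 4.23E-11 |
| DL | 0.827 | 0.235 | 3.523 | 8.49E-04 |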
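With an intercept plus dummies for all but one group, least-squares fitted values are the group means. So the intercept is the mean of the omitted (baseline) group, here Midsize, and each dummy coefficient is that group's mean minus the baseline mean. A minimal sketch using the SUMMARY sums and counts:

```python
# Group means from the SUMMARY table (Sum / Count)
mean_compact = 81.8 / 19
mean_large = 53.1 / 16
mean_midsize = 62.3 / 25   # Midsize: the omitted dummy (Dm), i.e. the baseline

intercept = mean_midsize                # forecast when Dc = DL = 0
b_compact = mean_compact - mean_midsize
b_large = mean_large - mean_midsize
```

The regression's F test of H0: bC = bL = 0 is then the same test as the ANOVA's H0: μC = μL = μM, which is why both outputs show F = 33.045.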

Class 25
- T-test 2-sample ≡ Regression with Dummy
- Understanding Multiple Regression
  - If X’s are correlated (and they usually are), multiple and simple coefficients measure different things. I hope you know what…
- ANOVA ≡ Regression with p-1 Dummies
  - Don’t use an index (comp=1, mid=2, large=3).
  - Create p-1 dummies (columns).
- Reading: EMBS 13.7; Pfeifer Note, section 8 (pages 39-42)
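An index such as comp=1, mid=2, large=3 forces the model to treat the classes as equally spaced steps on one line. A set of p-1 dummy columns lets each class have its own mean. A minimal sketch of building those columns, using the car classes of the six cars shown on the regression slide; `make_dummies` is a hypothetical helper, and the baseline is chosen by the caller.

```python
def make_dummies(labels, baseline):
    """Return {level: 0/1 column} for every level except the baseline (p - 1 columns)."""
    levels = sorted(set(labels) - {baseline})
    return {lvl: [1 if lab == lvl else 0 for lab in labels] for lvl in levels}

classes = ["Midsize", "Midsize", "Large", "Compact", "Midsize", "Midsize"]
dummies = make_dummies(classes, baseline="Midsize")
# dummies["Compact"] is [0, 0, 0, 1, 0, 0]; dummies["Large"] is [0, 0, 1, 0, 0, 0]
```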

Assignment 26 Due Wednesday
