Class 25 T-test 2-sample ≡ Regression with Dummy Understanding Multiple Regression. ANOVA ≡ Regression with p-1 Dummies EMBS 13.7 Pfeifer Note: section.


1 Class 25 T-test 2-sample ≡ Regression with Dummy Understanding Multiple Regression. ANOVA ≡ Regression with p-1 Dummies EMBS 13.7 Pfeifer Note: section 8 (pages 39-42)

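2 T-test 2-sample ≡ Regression with Dummy. Excel output 1, t-Test: Two-Sample Assuming Equal Variances, tests H0: μS = μW. Columns: Hours(S), Hours(W). Rows: Mean; Variance; Observations (70 and 61); Pooled Variance; Hypothesized Mean Difference (0); df (129); t Stat; P(T<=t) one-tail; t Critical one-tail; P(T<=t) two-tail; t Critical two-tail. Excel output 2, regression of Hours on Ds (data columns: Miles, Stops, Hours, Ds), tests H0: b=0. ANOVA table columns: df, SS, MS, F, Signif F; rows: Regression, Residual, Total. Coefficient table columns: Coefficients, Standard Error, t Stat, P-value; rows: Intercept, Ds.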

3 Hours vs Ds. The average Spencer route took 1/3 hour more than the average Williams route, whose sample mean was 9.73 hours. The b coefficient was NOT statistically significant.

4 Regression Line goes through the two sample means. The slope is ALWAYS the difference in sample means.
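
The equivalence on slides 2 through 4 can be checked numerically. The sketch below uses a small set of hypothetical hours (invented for illustration, not the class data) and only the Python standard library. It assumes the coding Ds = 1 for Spencer and Ds = 0 for Williams, and the helper names `pooled_t` and `simple_regression` are made up for this example.

```python
from math import sqrt
from statistics import mean

# Hypothetical route hours, invented for illustration (NOT the class data).
hours_w = [9.1, 10.4, 9.8, 9.5, 10.0]          # Williams, Ds = 0
hours_s = [10.2, 9.9, 10.8, 10.1, 9.6, 10.6]   # Spencer,  Ds = 1

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (mean of a minus mean of b)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)            # pooled variance
    return (ma - mb) / sqrt(sp2 * (1 / na + 1 / nb))

def simple_regression(x, y):
    """Least-squares intercept, slope, and slope t statistic for y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)         # standard error of the slope
    return intercept, slope, slope / se_slope

# Stack both drivers into one Hours column with a dummy column Ds.
ds = [0] * len(hours_w) + [1] * len(hours_s)
hours = hours_w + hours_s

intercept, slope, t_reg = simple_regression(ds, hours)
t_test = pooled_t(hours_s, hours_w)

print("intercept vs mean(Williams):", intercept, mean(hours_w))
print("slope vs difference in means:", slope, mean(hours_s) - mean(hours_w))
print("regression t vs two-sample t:", t_reg, t_test)
```

With any two groups, the intercept should equal the mean of the Ds = 0 group, the slope should equal the difference in sample means, and the two t statistics should agree up to rounding.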

5 Left-Handers Die Younger, Study Says; Finding That Trait Cuts Lifespan 9 Years Draws Surprise, Skepticism. April 4, 1991 | Malcolm Gladwell | Copyright Malcolm Gladwell. The researchers surveyed next of kin from the death records of 2 California counties to determine the handedness of the deceased. Young children and homicide victims were eliminated. Age of Death (AOD) in years was regressed against DR (1 if right, 0 if left).

6 Pfeifer’s Trick They want you to assume X causes Y. ALWAYS ask if Y could be causing X. And then ask if both Y and X are caused by Z.

7 Retailers running Oracle are 32% more profitable than their peers.

8 Female athletes in the nationwide survey were less than half as likely to get pregnant as female non-athletes (5% and 11%, respectively).

9 People without health insurance more likely to forego routine physical exams Medical Studies/Trials Published: Wednesday, 4-Apr-2007

10 "There is a fairly long history of research showing that early cannabis (marijuana) use is associated with increased risks for later use of so-called 'hard drugs,' but that research is based on the fact that most heroin and cocaine users report first having used cannabis," says lead author Michael T.

11 Understanding Multiple Regression. In Excel, just highlight multiple adjacent columns of independent (X) variables. Regression output gives a coefficient for each of the X variables, –as well as a standard error, t-stat, and p-value. The multiple regression equation is a PACKAGE DEAL. –You have to use the entire equation to make valid predictions.

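12 Multiple Regression Example. Data columns: Miles, Stops, Ds, Hours. ANOVA table columns: df, SS, MS, F, Significance F; rows: Regression, Residual, Total; Significance F is on the order of E-39. Coefficient table columns: Coefficients, Standard Error, t Stat, P-value; rows: Intercept, Miles, Stops, Ds, with P-values on the order of E-27, E-36, E-17, and E-07 respectively.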

13 Multiple Regression. Coefficients table rows: Intercept, Miles, Stops, Ds. Inputs for the two forecasts: Intercept 1 and 1; Miles 260 and 260; Stops 6 and 6; Ds 0 (Williams) and 1 (Spencer). Hours Hat: point forecast for a route with 260 miles, 6 stops, driven by Spencer, and point forecast for a route with 260 miles, 6 stops, driven by Williams.
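
A point forecast uses the whole equation: multiply each coefficient by its input and add them up. The sketch below is only an illustration of that arithmetic. The Intercept, Miles, and Stops coefficients are hypothetical placeholders (the fitted values are not legible in this transcript); only the Ds value of -0.96 is taken from the slides.

```python
# Coefficients: Intercept, Miles, and Stops are HYPOTHETICAL placeholders.
# Only Ds = -0.96 is quoted in the slides.
coef = {"Intercept": 1.50, "Miles": 0.020, "Stops": 0.40, "Ds": -0.96}

def hours_hat(miles, stops, ds):
    """Point forecast: sum of coefficient times input over the whole package."""
    inputs = {"Intercept": 1, "Miles": miles, "Stops": stops, "Ds": ds}
    return sum(coef[name] * inputs[name] for name in coef)

williams = hours_hat(miles=260, stops=6, ds=0)
spencer = hours_hat(miles=260, stops=6, ds=1)

print("Williams forecast:", williams)
print("Spencer forecast: ", spencer)
print("Spencer minus Williams:", spencer - williams)
```

Whatever the other coefficients are, two forecasts for the same Miles and Stops differ by exactly the Ds coefficient.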

14 Multiple Regression Example. Data columns: Miles, Stops, Ds, Hours. ANOVA table columns: df, SS, MS, F, Significance F; rows: Regression, Residual, Total; Significance F is on the order of E-39. Coefficient table rows: Intercept, Miles, Stops, Ds, with P-values on the order of E-27, E-36, E-17, and E-07 respectively. The coefficient of Ds is negative and IS significant. Spencer takes LESS time!

15 What??? Simple regression (Hours on Ds; coefficient rows: Intercept, Ds): Spencer takes more time. Multiple regression (Hours on Miles, Stops, Ds; P-values on the order of E-27, E-36, E-17, and E-07 for Intercept, Miles, Stops, and Ds): Spencer takes less time. Yes, he does!!!!

16 What?? When packaged with Miles and Stops, subtract 0.96 hours if Spencer, not Williams, drives. Alone, without Miles and Stops, add 0.33 hours if Spencer, not Williams, drives.

17 Multiple Regression The coefficient of X depends on what other X’s are in the model! –Alone, it is how the forecast of Y changes if X changes by 1 (not keeping all the other X’s constant). –In a multiple regression, the coefficient of X is how the forecast of Y changes if X changes by 1 (KEEPING all the other X’s constant).
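
The sign flip can be reproduced with a toy data set. In the sketch below the routes are hypothetical (not the class data): Spencer is given the longer routes, and Hours is generated with a true driver effect of -1.0. The helper `ols` is a made-up name for a small normal-equations solver written with the standard library only.

```python
def ols(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    A column of ones is prepended, so the result is [intercept, b1, b2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical routes: Williams (Ds = 0) drives short routes, Spencer (Ds = 1) long ones.
miles = [100, 120, 140, 160, 200, 220, 240, 260]
stops = [3, 4, 4, 5, 6, 7, 7, 8]
ds = [0, 0, 0, 0, 1, 1, 1, 1]
# Hours built from an assumed equation in which Spencer is 1.0 hour FASTER.
hours = [2 + 0.02 * m + 0.5 * s - 1.0 * d for m, s, d in zip(miles, stops, ds)]

simple = ols([ds], hours)                  # [intercept, b_Ds]
multiple = ols([miles, stops, ds], hours)  # [intercept, b_Miles, b_Stops, b_Ds]

print("Ds coefficient alone:          ", simple[1])
print("Ds coefficient in the package: ", multiple[3])
```

Alone, the Ds coefficient is positive because it absorbs the effect of Spencer's harder routes; packaged with Miles and Stops, it recovers the negative driver effect.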

18 Multiple Regression allows us to compare Williams and Spencer even though they drove routes of different difficulty…if we have the data. It is the ANSWER to the tough question. –S hours are higher, but perhaps because S had higher Miles and Stops. –In the multiple regression, we separate the effects of MILES, STOPS, and DRIVER on Hours. –So the DRIVER gets the coefficient he deserves because MILES and STOPS get their own coefficients.

19 Other tough questions? Hospital A has a high death rate. –But maybe A treats sicker people. Private school kids do better in college. –But maybe they were smarter to begin with, had access to tutors, etc. ND had a great record. –But maybe they played an easier schedule. People who took the expensive drug had better outcomes. –But the drug was expensive. Maybe those who took the drug had better health care, better diets, etc. than those who did not. People who took the drug (followed instructions) did better. –But maybe taking the drug is a signal of other things about these people that explain why they did better. Women make 70 cents on the dollar compared to men. Girls who play sports do better in school.

20 Price vs Speed and Type. Corporate printers were higher priced. –In part because they were faster? Faster printers were higher priced. –In part because they were corporate? Data columns: Name, Type, Dcorp, Speed, Price. Small Office printers: Minolta-QMS PagePro 1250W, Brother HL-1850, Lexmark E320, Minolta-QMS PagePro 1250E, HP Laserjet 1200. Corporate printers: Xerox Phaser 4400/N, Brother HL-2460N, IBM Infoprint 1120n, Lexmark W812, Oki Data B8300n.

21 Regression Statistics rows: Multiple R; R Square; Adjusted R Square; Standard Error; Observations (10). ANOVA df: Regression 2; Residual 7; Total 9. Coefficients rows: Intercept, Dcorp, Speed.

22 Total vs Exams One and Two. Data columns: ID, Exam One, Exam Two, Total.

23 Designed Experiment. Data columns: X1, X2, Y. Coefficients: Intercept 20; X1 5 and 5; X2 5 and 5. Multiple coefficients differ from simple coefficients only when the X’s are correlated. Regression accounts for the correlation among the X’s.
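
A balanced design makes the point concrete. The sketch below assumes a 2x2 experiment with X1 and X2 each set to 0 or 1 (the levels are hypothetical) and Y generated from the slide's coefficients, 20 + 5 X1 + 5 X2. Because X1 and X2 are uncorrelated by design, the simple and multiple slopes should match. The helper `ols` is a made-up name for a small standard-library solver.

```python
def ols(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    A column of ones is prepended, so the result is [intercept, b1, b2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Balanced 2x2 design (hypothetical levels): every X1 level meets every X2 level once.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [20 + 5 * a + 5 * b for a, b in zip(x1, x2)]

simple_x1 = ols([x1], y)[1]        # slope of Y on X1 alone
simple_x2 = ols([x2], y)[1]        # slope of Y on X2 alone
b0, b1, b2 = ols([x1, x2], y)      # multiple regression

print("X1: simple", simple_x1, "multiple", b1)
print("X2: simple", simple_x2, "multiple", b2)
print("multiple-regression intercept:", b0)
```

Only the slopes agree between the two fits; the intercept of each simple regression still differs from the multiple-regression intercept because it absorbs the average effect of the omitted X.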

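24 Regression Hypothesis Testing: Simple. Regression of Hours on Ds tests H0: b=0. ANOVA table columns: df, SS, MS, F, Signif F; rows: Regression, Residual, Total. Coefficient table columns: Coefficients, Standard Error, t Stat, P-value; rows: Intercept, Ds.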

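25 Regression Hypothesis Testing: Multiple. ANOVA table columns: df, SS, MS, F, Significance F; rows: Regression, Residual, Total. Significance F (on the order of E-39) tests H0: b1=b2=b3=0. Coefficient table rows: Intercept (P-value on the order of E-27), Miles (E-36, tests H0: b1=0), Stops (E-17, tests H0: b2=0), Ds (E-07, tests H0: b3=0). Each coefficient test is made as part of the multiple regression package, for example H0: b3=0│b1,b2.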

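26 T-test 2-sample ≡ Regression with Dummy. Excel output 1, t-Test: Two-Sample Assuming Equal Variances, tests H0: μS = μW. Columns: Hours(S), Hours(W). Rows: Mean; Variance; Observations (70 and 61); Pooled Variance; Hypothesized Mean Difference (0); df (129); t Stat; P(T<=t) one-tail; t Critical one-tail; P(T<=t) two-tail; t Critical two-tail. Excel output 2, regression of Hours on Ds (data columns: Miles, Stops, Hours, Ds), tests H0: b=0. ANOVA table columns: df, SS, MS, F, Signif F; rows: Regression, Residual, Total. Coefficient table columns: Coefficients, Standard Error, t Stat, P-value; rows: Intercept, Ds.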

27 ANOVA ≡ Regression with p-1 Dummies. H0: μC = μL = μM. Anova: Single Factor. SUMMARY table columns: Groups, Count, Sum, Average, Variance; rows: Compact, Large, Midsize. ANOVA table columns: Source of Variation, SS, df, MS, F, P-value; rows: Between Groups (P-value on the order of E-10), Within Groups, Total.

28 ANOVA ≡ Regression with p-1 Dummies. 1. Create dummy variables. 2. Regress Displacement on any 2 of the 3 dummies. 3. Test H0: bC = bL = 0; the p-value is “Significance F”. Data columns: Car, Class, Disp, Fuel, MPG, Dc, DL, Dm; sample rows (Class, Disp, Fuel): Midsize, 3.5, R; Midsize, 3, R; Large, 3, P; Compact, 6, P; Midsize, 2.5, R; Midsize, 2, R. SUMMARY OUTPUT, Regression Statistics: Multiple R 0.733; R Square 0.537; Adjusted R Square 0.521; Standard Error 0.733; Observations 60. ANOVA table columns: df, SS, MS, F, Significance F; rows: Regression (Significance F on the order of E-10), Residual, Total. Coefficient table columns: Coefficients, Standard Error, t Stat, P-value; rows: Intercept (P-value on the order of E-24), Dc (E-11), DL.
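
The three steps can be sketched on a small hypothetical data set (the displacements below are invented, not the 60 cars from class). The sketch computes the single-factor ANOVA F directly, then regresses on the Compact and Large dummies with Midsize as the baseline, and compares the two F statistics. The helpers `anova_f` and `ols` are made-up names written with the standard library only.

```python
from statistics import mean

# Hypothetical engine displacements by car class (invented for illustration).
groups = {
    "Compact": [2.0, 2.2, 1.8, 2.4],
    "Large": [3.8, 4.6, 3.5, 4.2],
    "Midsize": [3.0, 2.5, 3.5, 2.7, 3.2],
}

def anova_f(groups):
    """Single-factor ANOVA F statistic: (SSB / (p-1)) / (SSW / (n-p))."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ssw = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    p, n = len(groups), len(values)
    return (ssb / (p - 1)) / (ssw / (n - p))

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Step 1: create dummies. Step 2: keep only p-1 = 2 of them (Midsize is the baseline).
y, dc, dl = [], [], []
for name, vals in groups.items():
    for v in vals:
        y.append(v)
        dc.append(1.0 if name == "Compact" else 0.0)
        dl.append(1.0 if name == "Large" else 0.0)

b0, bc, bl = ols([dc, dl], y)

# Step 3: regression F statistic for H0: bC = bL = 0.
fitted = [b0 + bc * c + bl * l for c, l in zip(dc, dl)]
ssr = sum((f - mean(y)) ** 2 for f in fitted)
sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_reg = (ssr / 2) / (sse / (len(y) - 3))
f_anova = anova_f(groups)

print("ANOVA F:     ", f_anova)
print("Regression F:", f_reg)
print("intercept vs Midsize mean:", b0, mean(groups["Midsize"]))
```

The intercept is the baseline (Midsize) mean and each dummy coefficient is that class's mean minus the baseline mean, which is why any 2 of the 3 dummies give the same F.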

29 Class 25 T-test 2-sample ≡ Regression with Dummy Understanding Multiple Regression. –If X’s are correlated (and they usually are), multiple and simple coefficients measure different things. I hope you know what… ANOVA ≡ Regression with p-1 Dummies –Don’t use an index comp=1, mid=2, large=3. –Create p-1 dummies (columns) EMBS 13.7 Pfeifer Note: section 8 (pages 39-42)

30 Assignment 26 Due Wednesday

