Published by Luisa Raff. Modified over 2 years ago.

1
Exam Feb 28: sets 1, 2
Set 1 due Thurs
Memo C-1 due Feb 14
Free tutoring will be available next week
Plan A: MW 4-6 PM OR Plan B: TT 2-4 PM
VOTE for Plan A or Plan B
Announce results Thurs

2
Kinderman Supplement
Ch 2: Multiple Regression
Ch 3: Analysis of Variance

3
MULTIPLE REGRESSION Kinderman, Ch 2

4
Example
Reference: Statistics for Managers, by Levine, David M; Berenson; Stephan. Second edition (1999), Prentice Hall

5
Y = dependent variable = heating oil sales (gal)
X1 = temperature (degrees)
X2 = insulation (inches)
X1 and X2 are independent variables
Y = bo + b1X1 + b2X2
Enter the data in Excel
NOTE: If you can't find Data Analysis, try Add-Ins

6
Y = 562 - 5X1 - 20X2
Bottom table: Coefficients column

7
Interpret coefficients
Intercept = bo = 562: If temp = 0 and insulation = 0, heating oil sales = 562 gallons
b1 = -5: For homes with the same insulation, each 1 degree increase in temperature should decrease heating oil sales by 5 gallons
b2 = -20: For months with the same temperature, each additional inch of insulation should decrease sales by 20 gallons

8
Categorical Variables
X = 0 or 1
Example: 0 if male, 1 if female
Example: 1 if graduate, 0 if dropout
Example: 1 if citizen, 0 if alien
NOTE: not used in this fuel oil example

9
Estimate sales if temp = 30, insulation = 6
Y = 562 - 5(30) - 20(6) = 562 - 150 - 120 = 292 gal
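The estimate is a direct substitution into the fitted equation. A minimal Python sketch, using the rounded coefficients quoted on the slides; the function name `predict_sales` is made up for illustration:

```python
# Rounded coefficients read from the Excel output (Y = 562 - 5*X1 - 20*X2).
B0, B1, B2 = 562, -5, -20

def predict_sales(temp_degrees, insulation_inches):
    """Predicted heating oil sales (gal) from Y = b0 + b1*X1 + b2*X2."""
    return B0 + B1 * temp_degrees + B2 * insulation_inches

print(predict_sales(30, 6))  # 292
```

Calling `predict_sales(0, 0)` returns the intercept, 562, which matches the interpretation of bo.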

10
Standard Error = 26
Top table
Interpret: Typical fuel oil sales were about 26 gal away from the average fuel oil sales of other homes with the same temp and insulation

11
COEFFICIENT OF MULTIPLE DETERMINATION
Top table, R Square
Interpret: 96% of the total variation in fuel oil sales can be explained by variation in temperature and insulation

12
Is there a relationship between the independent variables and the dependent variable?
Ho: Null hypothesis: All slope coefficients = 0
Ho: NO relationship
H1: Alternative hypothesis: At least one slope coefficient is not zero
H1: There is a relationship

13
Computer output: sample data
Hypotheses: population parameters
Ho: Parameters = 0, but sample data make it appear that there is a relationship
Simple regression: Ho: zero slope vs H1: slope positive or slope negative

14
Exponents
$10^{-1} = 0.1$
$10^{-2} = 0.01$

15
Decision Rule
Reject Ho if "Significance F" < alpha
Middle table
Fuel oil example: Significance F = 1.6E-09
Excel: E = exponent, so 1.6E-09 = $1.6 \times 10^{-9}$ = 0.0000000016
Effectively zero
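Python reads the same E notation as Excel, so the conversion and the decision rule can be checked in a few lines. This is only a sketch; the variable names are illustrative, and alpha = .05 is the level the slides assume.

```python
significance_f = float("1.6E-09")  # value quoted from the Excel output
alpha = 0.05                       # significance level assumed in the slides

# Write the number out in fixed-point form with ten decimal places.
print(f"{significance_f:.10f}")    # 0.0000000016

# Decision rule: reject Ho when Significance F < alpha.
print(significance_f < alpha)      # True
```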

16
Significance F = p-value for the F test
Excel uses the label "P-value" only for the t distribution
Significance F = probability that F is greater than the sample F when Ho is true

17
Assume alpha = .05
Since Significance F = 1.6E-09 (effectively 0) < .05, reject Ho
We conclude there IS a relationship between fuel oil sales and the independent variables

18
Which independent variables seem to be important factors?
Ho: Temperature is not an important factor
H1: Temperature is important
Reject Ho if p-value < alpha
Bottom table: P-value column, X1 row
P-value = 1.6E-09, effectively zero
Reject Ho: temperature is important

19
Insulation
Ho: Insulation is unimportant
H1: Insulation is important
P-value = 1.9E-06, effectively zero
Reject Ho: insulation is important

20
Analysis of Variance (ANOVA) Kinderman, Ch 3

21
X = number of auto accidents
Live in city      Live in suburb    Live in rural
1                 2                 1
3                 0                 0
2                 1                 0

22
Hypothesis Testing
Ho: µ1 = µ2 = µ3
H1: Not all means are equal
H1: There are differences among the 3 populations
H1: Average number of accidents differs depending on where you live

23
This course: manual calculations
If you used computer software, you could have as many populations as needed
Homework, exam: 3 populations
Computer: 4 or more populations
Ex: Ethnic classifications at CSUN

24
Sample Sizes
Column 1: n1 = number of drivers sampled from policyholders living in city = 3
Column 2: n2 = number sampled from suburban drivers = 3
Column 3: n3 = number sampled from rural drivers = 3
Sample size = number of rows of data
Kinderman example: different sample sizes

25
n = n1 + n2 + n3 = 3 + 3 + 3 = 9

26
X = number of auto accidents
Live in city      Live in suburb    Live in rural
1 = X1            2                 1
3 = X2            0                 0
2 = X3            1                 0

29
Do not assume n1=3 on exam

31
X = number of auto accidents
Live in city      Live in suburb    Live in rural
1 = X1            2                 1
3 = X2            0                 0
2 = X3            1                 0
Σ = 6             Σ = 3             Σ = 1
Sample mean = 2   Sample mean = 1   Sample mean = .3
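The column sums and sample means can be reproduced with a few lines of Python. A minimal sketch; the dictionary name `accidents` and its keys are illustrative labels, not taken from the slides.

```python
# Accident counts for each sampled driver, read column by column from the table.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

sums = {group: sum(counts) for group, counts in accidents.items()}
# Divide by len(counts) rather than a hard-coded 3, since sample sizes can differ.
means = {group: sums[group] / len(counts) for group, counts in accidents.items()}

for group in accidents:
    print(f"{group}: sum = {sums[group]}, sample mean = {means[group]:.1f}")
```

The rural mean is exactly 1/3; the slides show it rounded to .3.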

34
Hypotheses
Ho: Differences in sample means are due to chance; there would be no differences if ALL drivers were included (Prop 103)
H1: Population means are different because city drivers have more accidents

38
Grand mean = (6 + 3 + 1)/9 = 10/9 = 1.1

39
SSB = Sum of Squares Between
Between the 3 groups
Explained variation
Here: variation in number of accidents explained by where you live (city, suburb, rural)
If where you live did not affect accidents, we would expect SSB = 0
Next slide: SSB formula

42
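This example
$SSB = 3(2 - 1.1)^2 + 3(1 - 1.1)^2 + 3(.3 - 1.1)^2 = 4.2$
Note: 4.2 comes from carrying the unrounded means (grand mean 10/9, rural mean 1/3); plugging in the rounded 1.1 and .3 gives 4.38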
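As a check on the arithmetic, SSB can be computed directly from the raw counts. A minimal Python sketch (variable names are illustrative) that keeps the means at full precision:

```python
groups = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]  # city, suburb, rural

n = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n

# Each group contributes (group size) * (group mean - grand mean)^2.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(round(ssb, 2))  # 4.22
```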

43
MSB = Mean Square Between
MSB = SSB/2
Note: OK for this course, but bigger problems would have a bigger denominator
MSB = 4.2/2 = 2.1

44
SSE = Sum of Squared Errors
Variation within groups
Ex: Variation within the group of city drivers
Unexplained variation
If every city driver had the same number of accidents, etc, we would expect SSE = 0
Formula on next slide

48
$SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2 = 4.67$
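The same sum can be checked from the raw counts: every observation is compared with the mean of its own group. A minimal Python sketch with illustrative variable names:

```python
groups = [[1, 3, 2], [2, 0, 1], [1, 0, 0]]  # city, suburb, rural

sse = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    # Squared distance of each driver from the mean of that driver's own group.
    sse += sum((x - group_mean) ** 2 for x in g)

print(round(sse, 2))  # 4.67
```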

49
MSE = Mean Square Error
Also called Mean Square Within
Next slide is the formula for this course; bigger problems have a bigger denominator

52
MSE = SSE/(n - 3) = 4.67/6 = 0.78
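The slides note that bigger problems have bigger denominators. In general, each mean square divides a sum of squares by its degrees of freedom. Writing $k$ for the number of groups (a symbol introduced here, not used in the slides):

```latex
MSB = \frac{SSB}{k - 1}, \qquad MSE = \frac{SSE}{n - k}
```

With $k = 3$ and $n = 9$ these reduce to the course formulas, $MSB = SSB/2$ and $MSE = SSE/6$.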

53
F RATIO
Sample F statistic
Test statistic
Abbreviation: sam F

56
Sam F = MSB/MSE = 2.1/0.78 = 2.7
Extreme case #1: Where you live does not affect number of accidents, so SSB = 0, so MSB = 0, so sam F = 0
Extreme case #2: Every city driver has same number of accidents, etc, so SSE = 0, so MSE = 0, so sam F is very large
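The whole chain from raw counts to the sample F can be sketched as one small Python function. The name `f_ratio` is made up for illustration, and the denominators are written as (number of groups - 1) and (n - number of groups), which equal 2 and n - 3 for the three-group problems in this course.

```python
def f_ratio(groups):
    """Sample F = MSB / MSE for a list of samples, one list per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return msb / mse

# Accident data from the slides: city, suburb, rural.
print(round(f_ratio([[1, 3, 2], [2, 0, 1], [1, 0, 0]]), 1))  # 2.7

# Extreme case #1 (made-up data): all three sample means equal 2, so sam F = 0.
print(round(f_ratio([[1, 3, 2], [2, 3, 1], [3, 1, 2]]), 1))  # 0.0
```

Extreme case #2 cannot be run through this sketch: with no variation inside any group, MSE is 0 and the final division raises `ZeroDivisionError`, which is the code's version of "sam F is very large".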

57
Critical F = cr F
F table at end of Kinderman Supplement: Appendix A, Table A.3, p 60 in Second Edition (assumes alpha = .05)
Column = 2 (denominator of MSB)
Row = n - 3 (denominator of MSE)
Correct for this course, different for bigger problems

58
Example
Column = 2
Row = 9 - 3 = 6
Cr F = 5.14
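The table value can be cross-checked without the table in this particular case. When the numerator has 2 degrees of freedom (three groups), the upper-tail probability of the F distribution has a closed form, P(F > x) = (1 + 2x/d)^(-d/2), where d is the denominator degrees of freedom, and this can be solved for the critical value. The sketch below relies on that shortcut. It is not the method of the slides, the function name is made up, and it does not extend to bigger problems, where the table or a statistics library (for example `scipy.stats.f.ppf`) would be needed.

```python
def critical_f_two_numerator_df(denominator_df, alpha=0.05):
    """Upper-tail critical F, valid ONLY when the numerator has 2 df.

    Solves (1 + 2x/d)^(-d/2) = alpha for x.
    """
    d = denominator_df
    return (d / 2) * (alpha ** (-2 / d) - 1)

# Row = n - 3 = 6 in the accident example.
print(round(critical_f_two_numerator_df(6), 2))  # 5.14
```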

60
Decision Rule
Reject Ho if sam F > cr F
Only right tail, since SSB > 0 and SSE > 0, so sam F > 0
If you reject Ho, you conclude that where you live affects number of accidents
If you do not reject Ho, you conclude that there is too much variation within city drivers, etc, to draw any conclusions

61
Example
Since 2.7 is NOT > 5.14, we can NOT reject Ho
Differences between city and suburb, etc, are NOT significant

62
Computer Approach
Similar to multiple regression
Reject Ho if Significance F < alpha
Needed if more than 3 groups
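Software reports Significance F directly. For the three-group accident example it can also be sketched by hand, because with 2 numerator degrees of freedom the upper-tail F probability has the closed form (1 + 2F/d)^(-d/2), with d the denominator degrees of freedom. The function name below is hypothetical and the formula holds only for 2 numerator degrees of freedom, so this illustrates the decision rule rather than replacing software for more than 3 groups.

```python
def significance_f_two_numerator_df(sample_f, denominator_df):
    """P(F > sample_f), valid ONLY when the numerator has 2 df."""
    d = denominator_df
    return (1 + 2 * sample_f / d) ** (-d / 2)

sample_f = 19 / 7   # unrounded sample F for the accident data (about 2.7)
alpha = 0.05        # significance level assumed in the slides

p = significance_f_two_numerator_df(sample_f, 6)
print(round(p, 3))  # 0.145
print(p < alpha)    # False
```

Since Significance F is not below alpha, Ho is not rejected, the same conclusion as comparing sam F = 2.7 with cr F = 5.14.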
