# Exam Feb 28: sets 1,2 Set 1 due Thurs Memo C-1 due Feb 14 Free tutoring will be available next week Plan A: MW 4-6PM OR Plan B: TT 2-4PM VOTE for Plan.


Exam Feb 28: sets 1, 2. Set 1 due Thurs. Memo C-1 due Feb 14. Free tutoring will be available next week. Plan A: MW 4-6PM, or Plan B: TT 2-4PM. VOTE for Plan A or Plan B. Announce results Thurs.

Kinderman Supplement. Ch 2: Multiple Regression. Ch 3: Analysis of Variance.

MULTIPLE REGRESSION Kinderman, Ch 2

Example reference: Levine, David M.; Berenson; Stephan. Statistics for Managers, second edition (1999), Prentice Hall.

Y = dependent variable = heating oil sales (gal). X1 = temperature (degrees) and X2 = insulation (inches) are the independent variables. The model is $Y = b_0 + b_1 X_1 + b_2 X_2$. Enter the data into Excel. NOTE: If you can't find Data Analysis, try Add-Ins.

$Y = 562 - 5X_1 - 20X_2$ (bottom table, Coefficients column).

Interpret coefficients:

- Intercept = $b_0$ = 562: if temperature = 0 and insulation = 0, heating oil sales = 562 gallons.
- $b_1 = -5$: for all homes with the same insulation, each 1 degree increase in temperature should decrease heating oil sales by 5 gallons.
- $b_2 = -20$: for all homes with the same temperature, each additional 1 inch of insulation should decrease sales by 20 gallons.

Categorical variables: X = 0 or 1. Examples: 0 if male, 1 if female; 1 if graduate, 0 if dropout; 1 if citizen, 0 if alien. NOTE: not used in this fuel oil example.
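A 0/1 coding of this kind can be sketched in Python. The respondent labels below are hypothetical, invented only to illustrate the coding rule "0 if male, 1 if female".

```python
# Hypothetical labels, invented for illustration (not part of the fuel oil data).
respondents = ["male", "female", "female", "male"]

# Code the category as 0 or 1 so it can be entered as an X variable.
x_female = [1 if label == "female" else 0 for label in respondents]

print(x_female)  # [0, 1, 1, 0]
```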

Estimate sales if temp = 30 and insulation = 6: $Y = 562 - 5(30) - 20(6) = 292$ gal.
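The same estimate can be checked with a few lines of Python. This is a minimal sketch: the function name `predict_sales` is hypothetical, and the coefficients are the rounded values from the slides.

```python
def predict_sales(temp_degrees, insulation_inches):
    """Estimated heating oil sales (gal) from the fitted equation Y = 562 - 5*X1 - 20*X2."""
    b0, b1, b2 = 562, -5, -20  # rounded coefficients from the slides
    return b0 + b1 * temp_degrees + b2 * insulation_inches

print(predict_sales(30, 6))  # 292
```

Holding insulation fixed and raising the temperature by 1 degree lowers the estimate by 5 gallons, which is the interpretation of the temperature coefficient.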

Standard Error = 26 (top table). Interpret: typical fuel oil sales were about 26 gal away from the average fuel oil sales of other homes with the same temperature and insulation.

COEFFICIENT OF MULTIPLE DETERMINATION: top table, R Square. Interpret: 96% of the total variation in fuel oil sales can be explained by variation in temperature and insulation.

Is there a relationship between the independent variables and the dependent variable?

- Ho (null hypothesis): all slope coefficients = 0, i.e. NO relationship.
- H1 (alternative hypothesis): at least one coefficient is not zero, i.e. there is a relationship.

The computer output describes the sample data; the hypotheses are about population parameters. Ho: the parameters = 0, and the sample data only make it appear that there is a relationship. Simple regression: Ho: zero slope vs H1: slope positive or slope negative.

Exponents: $10^{-1} = 0.1$ and $10^{-2} = 0.01$.

Decision rule: reject Ho if "Significance F" < alpha (middle table). Fuel oil example: Significance F = 1.6E-09. In Excel, E means exponent: 1.6E-09 = $1.6 \times 10^{-9}$ = 0.0000000016, which is practically zero.

Significance F is a p-value; Excel uses the label "P-value" only for the t distribution. Significance F = probability that F is greater than the sample F, assuming Ho is true.

Assume alpha = .05. Since Significance F is practically 0, which is less than .05, reject Ho. We conclude there IS a relationship between fuel oil sales and the independent variables.

Which independent variables seem to be important factors? Temperature: Ho: temperature is not an important factor; H1: temperature is important. Reject Ho if p-value < alpha (bottom table, P-value column, X1 row). P-value = 1.6E-09, or practically zero, so reject Ho: temperature is important.

Insulation: Ho: insulation is unimportant; H1: insulation is important. P-value = 1.9E-06, or practically zero, so reject Ho: insulation is important.
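The three decisions in the fuel oil example (the overall test and one test per independent variable) all apply the same rule, which can be sketched in Python. The p-values are the ones quoted from the Excel output; the dictionary keys are just illustrative labels, and alpha = .05 is the value assumed in the slides.

```python
alpha = 0.05

# Values quoted from the Excel output in the slides.
p_values = {
    "overall (Significance F)": float("1.6E-09"),
    "temperature (X1)": float("1.6E-09"),
    "insulation (X2)": float("1.9E-06"),
}

# E-notation is a power of ten: 1.6E-09 is the same number as 0.0000000016.
assert float("1.6E-09") == 0.0000000016

# Decision rule: reject Ho when the p-value is below alpha.
decisions = {name: p < alpha for name, p in p_values.items()}

for name, reject in decisions.items():
    print(name, "-> reject Ho" if reject else "-> do not reject Ho")
```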

Analysis of Variance (ANOVA) Kinderman, Ch 3

X = number of auto accidents

| Live in city | Live in suburb | Live in rural |
|---|---|---|
| 1 | 2 | 1 |
| 3 | 0 | 0 |
| 2 | 1 | 0 |

Hypothesis testing:

- Ho: $\mu_1 = \mu_2 = \mu_3$
- H1: not all means are equal. There are differences among the 3 populations: the average number of accidents differs depending on where you live.

This course: manual calculations, so homework and exam problems have 3 populations. If you used computer software, you could have as many populations as needed (4 or more). Ex: ethnic classifications at CSUN.

Sample sizes (number of rows of data in each column):

- Column 1: n1 = number of drivers sampled from policyholders living in the city = 3
- Column 2: n2 = number sampled from suburban drivers = 3
- Column 3: n3 = number sampled from rural drivers = 3

The Kinderman example has different sample sizes.

$n = n_1 + n_2 + n_3 = 3 + 3 + 3 = 9$

X = number of auto accidents

| Live in city | Live in suburb | Live in rural |
|---|---|---|
| 1 = $X_{11}$ | 2 | 1 |
| 3 = $X_{21}$ | 0 | 0 |
| 2 = $X_{31}$ | 1 | 0 |

Do not assume n1=3 on exam

X = number of auto accidents

| | Live in city | Live in suburb | Live in rural |
|---|---|---|---|
| | 1 = $X_{11}$ | 2 | 1 |
| | 3 = $X_{21}$ | 0 | 0 |
| | 2 = $X_{31}$ | 1 | 0 |
| Σ | 6 | 3 | 1 |
| Sample mean | 2 | 1 | .3 |

Hypotheses: Ho: the differences in sample means are due to chance, and there would be no differences if ALL drivers were included (Prop 103). H1: the population means are different because city drivers have more accidents.

Grand mean = (6 + 3 + 1)/9 = 1.1
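The sample sizes, sample means, and grand mean can be reproduced with a short Python sketch. The data are the nine accident counts from the table; the variable names are illustrative.

```python
# Accident counts from the slides, one list per group.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# Sample size and sample mean for each group.
sample_sizes = {group: len(xs) for group, xs in accidents.items()}
sample_means = {group: sum(xs) / len(xs) for group, xs in accidents.items()}

# Grand mean: total of all observations divided by total sample size n.
n = sum(sample_sizes.values())
grand_mean = sum(sum(xs) for xs in accidents.values()) / n

print(n)                      # 9
print(round(grand_mean, 1))   # 1.1
```

Using `len(xs)` rather than a hard-coded 3 keeps the sketch valid when the groups have different sample sizes.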

SSB = Sum of Squares Between (between the 3 groups) = explained variation. Here: the variation in number of accidents explained by where you live (city, suburb, rural). If where you live did not affect accidents, we would expect SSB = 0. Next slide: SSB formula.


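This example: $SSB = 3(2-1.1)^2 + 3(1-1.1)^2 + 3(.3-1.1)^2 = 4.2$ (computed with the unrounded means 1/3 and 10/9; the rounded values shown give a slightly larger number).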

MSB = Mean Square Between = SSB/2. Note: dividing by 2 is OK for this course, but bigger problems would have a bigger denominator. MSB = 4.2/2 = 2.1.
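The SSB and MSB calculations can be sketched in Python as follows. The slides only say to divide by 2; writing the denominator as (number of groups - 1) is the standard generalization and is an assumption added here, not something stated in the slides.

```python
# Accident counts from the slides, one list per group.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

n = sum(len(xs) for xs in accidents.values())
grand_mean = sum(sum(xs) for xs in accidents.values()) / n

# SSB: for each group, sample size times squared distance of its mean from the grand mean.
ssb = sum(
    len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
    for xs in accidents.values()
)

# MSB: SSB divided by (number of groups - 1), which is 2 for three groups.
msb = ssb / (len(accidents) - 1)

print(round(ssb, 1))  # 4.2
print(round(msb, 1))  # 2.1
```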

SSE = Sum of Squared Error = variation within groups (ex: variation within the group of city drivers) = unexplained variation. If every city driver had the same number of accidents, and likewise for the other groups, we would expect SSE = 0. Formula on next slide.


$SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2 = 4.67$

MSE = Mean Square Error, also called Mean Square Within. Next slide is the formula for this course; bigger problems have a bigger denominator.

MSE = SSE/(n - 3) = 4.67/6 = 0.78
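SSE and MSE can be checked the same way. This is a sketch under the course's setup of 3 groups, where the MSE denominator is n - 3; writing it as n minus the number of groups is the standard generalization and is an assumption added here.

```python
# Accident counts from the slides, one list per group.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

n = sum(len(xs) for xs in accidents.values())

# SSE: squared distance of every observation from its own group's sample mean.
sse = 0.0
for xs in accidents.values():
    group_mean = sum(xs) / len(xs)
    sse += sum((x - group_mean) ** 2 for x in xs)

# MSE: SSE divided by (n - number of groups), which is 9 - 3 = 6 here.
mse = sse / (n - len(accidents))

print(round(sse, 2))  # 4.67
print(round(mse, 2))  # 0.78
```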

F RATIO = sample F statistic = test statistic, abbreviated sam F.

Sam F = MSB/MSE = 2.1/0.78 = 2.7. Extreme case #1: where you live does not affect the number of accidents, so SSB = 0, so MSB = 0, so sam F = 0. Extreme case #2: every city driver has the same number of accidents, etc., so SSE = 0, so MSE = 0, so sam F is very large.

Critical F = cr F, from the F table at the end of the Kinderman Supplement (Appendix A, Table A.3, p 60 in the second edition; assumes alpha = .05). Column = 2 (denominator of MSB). Row = n - 3 (denominator of MSE). Correct for this course, different for bigger problems.

Example: column = 2, row = 9 - 3 = 6, so cr F = 5.14.

Hypothesis testing:

- Ho: $\mu_1 = \mu_2 = \mu_3$
- H1: not all means are equal. There are differences among the 3 populations: the average number of accidents differs depending on where you live.

Decision rule: reject Ho if sam F > cr F. Only the right tail is used since SSB and SSE cannot be negative, so sam F cannot be negative. If you reject Ho, you conclude that where you live affects the number of accidents. If you do not reject Ho, you conclude that there is too much variation within city drivers, etc. to draw any conclusions.

Example: since 2.7 is NOT > 5.14, we can NOT reject Ho. Differences between city and suburb, etc. are NOT significant.
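The whole manual test can be sketched end to end in Python. The critical value 5.14 is taken from the slides (column 2, row 6, alpha = .05) rather than computed, and the variable names are illustrative.

```python
# Accident counts from the slides, one list per group.
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

k = len(accidents)                                  # number of groups (3)
n = sum(len(xs) for xs in accidents.values())       # total sample size (9)
grand_mean = sum(sum(xs) for xs in accidents.values()) / n

ssb = 0.0  # explained variation (between groups)
sse = 0.0  # unexplained variation (within groups)
for xs in accidents.values():
    group_mean = sum(xs) / len(xs)
    ssb += len(xs) * (group_mean - grand_mean) ** 2
    sse += sum((x - group_mean) ** 2 for x in xs)

msb = ssb / (k - 1)   # divide by 2 in this course
mse = sse / (n - k)   # divide by n - 3 in this course
sample_f = msb / mse

critical_f = 5.14     # from the F table quoted in the slides
reject_h0 = sample_f > critical_f

print(round(sample_f, 1))  # 2.7
print(reject_h0)           # False
```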

Computer approach: similar to multiple regression. Reject Ho if Significance F < alpha. Needed if there are more than 3 groups.
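For the accident example, the Significance F that software would report can be approximated by hand in Python. This is only a sketch resting on one fact that is not in the slides: when the F table column is 2 (as with 3 groups), the right-tail probability has the closed form $(1 + 2F/d_2)^{-d_2/2}$, where $d_2 = n - 3$. The function name is hypothetical, and the formula does not apply to problems with more than 3 groups.

```python
def significance_f_three_groups(sample_f, n):
    """Right-tail probability P(F > sample_f) for 3 groups and n observations.

    Assumption: this closed form holds only when the numerator
    degrees of freedom equal 2 (three groups).
    """
    d2 = n - 3  # denominator of MSE
    return (1 + 2 * sample_f / d2) ** (-d2 / 2)

alpha = 0.05
p_value = significance_f_three_groups(2.7, 9)  # sam F and n from the slides
reject_h0 = p_value < alpha

print(round(p_value, 2))  # 0.15
print(reject_h0)          # False
```

As a consistency check, plugging in the critical value 5.14 with n = 9 gives a probability very close to .05, so this rule and the sam F > cr F rule lead to the same decision.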
