Height and Weight
Is CM or INCHES the better predictor of KG?
- Whichever has the lower standard error (it will also have a variety of better stats), NOT whichever has the bigger coefficient.
- A multiple regression lets you test:
  - H0: all b's = 0 (nothing in the model matters)
  - H0: b1 = 0 given all the other b's
- When using both CM and INCHES:
  - We reject H0: b1 = b2 = 0.
  - We fail to reject H0: b1 = 0 given b2.
  - We fail to reject H0: b2 = 0 given b1.
- You need either CM or INCHES, but not both, because they are highly correlated.
- Regressions ALWAYS go through the sample averages.
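A minimal pure-Python sketch of two claims on this slide, using made-up height and weight numbers (`fit_line`, `heights_cm` and the other names are hypothetical). It assumes the inch values are exact conversions of the centimeter values, so the two fits give identical forecasts; real, separately rounded measurements would be highly but not perfectly correlated. The point it illustrates: the inch slope is 2.54 times bigger purely because of units, so coefficient size says nothing about which X is the better predictor.

```python
# Least-squares line y = a + b*x written out by hand (no libraries).
# Height/weight values are made up for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


heights_cm = [150.0, 158.0, 163.0, 170.0, 177.0, 185.0]
weights_kg = [50.0, 55.0, 61.0, 66.0, 72.0, 80.0]
x_bar = sum(heights_cm) / len(heights_cm)
y_bar = sum(weights_kg) / len(weights_kg)

a_cm, b_cm = fit_line(heights_cm, weights_kg)
# The line goes through the sample averages: the forecast at x_bar equals y_bar.
print(abs((a_cm + b_cm * x_bar) - y_bar) < 1e-9)  # True

# Same heights in inches (assumed exact conversion): the slope is 2.54 times bigger...
heights_in = [h / 2.54 for h in heights_cm]
a_in, b_in = fit_line(heights_in, weights_kg)
print(abs(b_in - 2.54 * b_cm) < 1e-9)  # True

# ...but every forecast is the same, so the bigger coefficient is not the better predictor.
same = all(abs((a_cm + b_cm * hc) - (a_in + b_in * hi)) < 1e-9
           for hc, hi in zip(heights_cm, heights_in))
print(same)  # True
```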
Things I expect you will know
How to interpret a regression using p−1 dummy variables:
- The p possible forecasts equal the sample average Y for each of the p groups.
- The intercept is the average of the left-out group.
- The coefficients are differences in group averages (each group minus the left-out group).
- The p-value (Significance F) matches the one from ANOVA single factor.
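These dummy-variable facts can be checked numerically. The sketch below uses three hypothetical groups (A is the left-out group) and a small hand-written normal-equations solver `ols` (an illustrative helper, not a library call). It recovers the group averages from the regression and shows that the regression F equals the single-factor ANOVA F, which is why their p-values match.

```python
# Regress Y on p - 1 = 2 dummies (group A left out) and compare with ANOVA.
# The three groups and their values are hypothetical.

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


groups = {
    "A": [10.0, 12.0, 14.0],          # left-out group
    "B": [15.0, 17.0, 19.0, 21.0],
    "C": [8.0, 9.0, 10.0],
}
rows, y = [], []
for name, values in groups.items():
    for v in values:
        rows.append([1.0, float(name == "B"), float(name == "C")])
        y.append(v)

coef = ols(rows, y)  # [intercept, D_B, D_C]
print([round(c, 6) for c in coef])  # [12.0, 6.0, -3.0]

n, p = len(y), len(groups)
y_bar = sum(y) / n
fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
print(sorted(set(round(fi, 6) for fi in fitted)))  # [9.0, 12.0, 18.0] = group averages

# Regression F from the fitted values
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
f_regression = (ssr / (p - 1)) / (sse / (n - p))

# Single-factor ANOVA F from between- and within-group sums of squares
ssb = sum(len(v) * (sum(v) / len(v) - y_bar) ** 2 for v in groups.values())
ssw = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
f_anova = (ssb / (p - 1)) / (ssw / (n - p))
print(round(f_regression, 4), round(f_anova, 4))  # 17.325 17.325
```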
Things I expect you will know
How to interpret a residual (error):
- It is Y − Ŷ.
- It is the vertical distance of each Y from the line. Positive means above the line.
- It measures the difference between the actual Y and the expected Y (based on the X's).
- The most overweight girl (for her height) is the girl with the largest positive residual.
- Check the Residuals box to get residuals.
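A sketch of residuals as Y − Ŷ, using four hypothetical girls (the names and measurements are made up; `fit_line` is an illustrative helper). The largest positive residual picks out the most overweight girl for her height, and the same lookup with `min` finds the most negative residual.

```python
# Residuals e = Y - Y_hat from a simple least-squares line (made-up data).

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


names = ["Ann", "Bea", "Cat", "Dee"]      # hypothetical girls
heights = [150.0, 160.0, 170.0, 180.0]    # cm
weights = [50.0, 62.0, 60.0, 70.0]        # kg

a, b = fit_line(heights, weights)
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]  # Y - Y_hat
print([round(e, 2) for e in residuals])  # [-1.8, 4.4, -3.4, 0.8]

most_over = names[residuals.index(max(residuals))]   # largest positive residual
most_under = names[residuals.index(min(residuals))]  # most negative residual
print(most_over, most_under)  # Bea Cat
```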
Things I expect you will know
How to interpret a coefficient in a multiple regression:
- It measures the change in expected Y for a one-unit change in that X, keeping all other X's constant.
- If I keep miles and stops constant and change from Williams to Spencer, expect 0.97 hours less.
- If I change from Williams to Spencer without holding miles and stops constant, expect 0.33 hours more.
- It is the easy way to answer some questions: if the previous rating goes from 17.5 to 20, how will the expected rating change? (Multiply the per-point coefficient by 2.5.)
Things I expect you will know
How to use a regression model to calculate a point forecast:
- Plug and chug. I use SUMPRODUCT.
- You must know which X's to plug in.
- It is a package deal: you must know and plug in ALL the X's.
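SUMPRODUCT can be mimicked in a few lines. This is a sketch with hypothetical coefficients (intercept, miles, stops); `point_forecast` is an illustrative name, not something from the course files. The length check enforces the package deal, and the last lines show the coefficient interpretation: changing one X while holding the others fixed moves the forecast by that coefficient times the change.

```python
def point_forecast(coefficients, xs):
    """SUMPRODUCT of (intercept, b1, ..., bk) with (1, x1, ..., xk)."""
    if len(xs) != len(coefficients) - 1:
        # Package deal: a value is required for every X in the model.
        raise ValueError("supply one X value for every slope coefficient")
    return sum(b * x for b, x in zip(coefficients, [1.0] + list(xs)))


coefs = [1.0, 0.5, 2.0]                # hypothetical: intercept, b_miles, b_stops
print(point_forecast(coefs, [10, 3]))  # 12.0

# Holding stops constant, 2 more miles changes the forecast by b_miles * 2.
change = point_forecast(coefs, [12, 3]) - point_forecast(coefs, [10, 3])
print(change)  # 1.0
```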
Things I expect you will know
How to use a regression model to calculate a probability:
- The question gives you the Y.
- Plug and chug to get Ŷ.
- Calculate t = (Y − Ŷ) / standard error.
- Use T.DIST.RT(t, dof).
- dof is n − the total number of regression terms (including the intercept).
- This requires the FOUR assumptions.
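Outside Excel there is no T.DIST.RT, so this sketch computes the same right-tail probability by integrating the t density with Simpson's rule, using only the standard library. The inputs (Y, Ŷ, standard error, n, number of terms) are hypothetical. If SciPy is available, `scipy.stats.t.sf(t, dof)` should give the same value; the hand-rolled version is here only to show what the function computes.

```python
import math


def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_const = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                 - 0.5 * math.log(dof * math.pi))
    return math.exp(log_const - (dof + 1) / 2 * math.log1p(t * t / dof))


def t_dist_rt(t, dof, steps=20000):
    """Right-tail probability P(T > t), the quantity Excel's T.DIST.RT reports."""
    # By symmetry P(T > t) = 0.5 - (area from 0 to t); Simpson's rule, steps even.
    h = abs(t) / steps
    area = t_pdf(0.0, dof) + t_pdf(abs(t), dof)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


# Hypothetical question: probability that the actual Y exceeds 40 when Y_hat is 30.
y, y_hat, std_error = 40.0, 30.0, 5.0  # made-up Y, Y_hat and regression standard error
n, terms = 30, 3                       # terms = intercept + 2 slopes
t = (y - y_hat) / std_error
prob = t_dist_rt(t, n - terms)
print(round(prob, 4))

print(round(t_dist_rt(1.0, 1), 6))  # 0.25 (dof = 1 is the Cauchy case, known exactly)
```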
Things I expect you will know
If the coefficient of X1 changes when X2 is included in the model, you know X1 and X2 are correlated.
You can use the two regression results (the change in X1's coefficient together with the sign of X2's coefficient) to tell whether X1 and X2 are positively or negatively correlated:
- Ds was positively correlated with Miles.
- Fact was negatively correlated with Stars.
- Nobel was positively correlated with Yanks.
- Speed was positively correlated with Dcorporate.
- Exam 1 was negatively correlated with Exam 2.
Oh… Fact movies had fewer Stars!
UNDERSTANDING
Coefficient    Y on Fact    Y on Fact and Stars
Constant                    12.568
Fact           1.40         1.799
Stars                       1.259
Holding Stars constant raises the Fact coefficient from 1.40 to 1.80: Fact movies had fewer Stars, and Stars has a positive coefficient.
Secret Formula
Regress Y on X1 (coefficient $\tilde{b}_1$); regress Y on X1 and X2 (coefficients $b_1$, $b_2$); regress X2 on X1 (slope $c$). Then
$$c = \frac{\tilde{b}_1 - b_1}{b_2}$$
For Fact (X1) and Stars (X2), with $\tilde{b}_1 = 1.40$, $b_1 = 1.80$ and $b_2 = 1.26$:
$$c = \frac{1.40 - 1.80}{1.26} = -0.32$$
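A numerical check of the secret formula on hypothetical movie data (the Fact, Stars and Y values are made up; `ols` is a hand-written normal-equations solver, not a library call). For any least-squares fit with an intercept, the simple slope equals $b_1 + b_2 c$, which rearranges to the formula above. The check is written in that multiplied-out form so it cannot divide by zero.

```python
# Check the "secret formula" on hypothetical movie data.
# Fact = 1 for fact-based movies, Stars = number of stars, Y = rating (all made up).

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def mean(v):
    return sum(v) / len(v)


fact = [0, 0, 0, 0, 1, 1, 1]
stars = [3, 4, 5, 4, 2, 3, 3]
y = [10.0, 12.5, 15.0, 11.0, 11.5, 13.0, 12.0]

b1_tilde = ols([[1, f] for f in fact], y)[1]                  # Y on X1
_, b1, b2 = ols([[1, f, s] for f, s in zip(fact, stars)], y)  # Y on X1 and X2
c = ols([[1, f] for f in fact], stars)[1]                     # X2 on X1

# b1_tilde = b1 + b2 * c, i.e. c = (b1_tilde - b1) / b2
print(abs(b1_tilde - (b1 + b2 * c)) < 1e-9)  # True

# With a 0/1 dummy, c is the difference in average Stars (Fact minus non-Fact).
diff = (mean([s for f, s in zip(fact, stars) if f == 1])
        - mean([s for f, s in zip(fact, stars) if f == 0]))
print(abs(c - diff) < 1e-9)  # True

# The slide's numbers
print(round((1.40 - 1.80) / 1.26, 2))  # -0.32
```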
Fact movies averaged 0.32 fewer Stars!
UNDERSTANDING the Secret Formula: because Fact is a 0/1 dummy variable, the slope c from regressing Stars on Fact is the difference in average Stars between Fact and non-Fact movies.
Regression is the line through a cloud of points
- Scatter-plot the cloud.
- It is up to YOU to interpret the results.
- Don't assume X causes Y. Y might be causing X, or both might be caused by Z.
- Don't assume better-fitting lines are better at forecasting. They usually are not: too good a fit means too complicated a model, which means poorer forecasting performance.
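One reason a better fit is not evidence of better forecasts: in-sample R² can never go down when a variable is added, even one that is pure noise. This sketch uses made-up data and a made-up noise column; `ols` and `r_squared` are illustrative helpers. It only demonstrates the in-sample effect, not out-of-sample forecasting error.

```python
# Adding an unrelated X still cannot lower in-sample R^2 (made-up data).

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def r_squared(rows, y):
    """1 - SSE/SST for the least-squares fit of y on the given rows."""
    coef = ols(rows, y)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
noise = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2]  # made up, unrelated to y

r2_small = r_squared([[1.0, xi] for xi in x], y)
r2_big = r_squared([[1.0, xi, ni] for xi, ni in zip(x, noise)], y)
print(r2_big >= r2_small)  # True
```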
Class 28 Assignment
Variable                 Description
School                   The name of the university
Graduation Rate          Percentage of enrollees who graduate
% of Classes Under 20    Percentage of classes offered with <= 20 students
Student/Faculty Ratio    Number of students enrolled divided by total number of faculty
Alumni Giving Rate       Percentage of living alumni who gave to the university in 2000

                     Graduation Rate   % of Classes Under 20   Student/Faculty Ratio   Alumni Giving Rate
Mean                 83.042            55.729                  11.542                  29.271
Median               83.5              59.5                    10.5                    29
Mode                 92                65                      13
Standard Deviation   8.607             13.194                  4.851                   13.441
Skewness             -0.282            -0.501                  0.582                   0.370
Minimum              66                                        3                       7
Maximum              97                77                      23                      67
Count                48                48                      48                      48
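Regress Giving Rate on Grad Rate; check that the coefficient is positive
1. Test the hypothesis that graduation rate and alumni giving rate are (linearly) independent. We expect universities with higher graduation rates to have higher mean giving rates. [15 points]
H0: b = 0 (independent) versus Ha: b > 0.
- Regress Giving Rate on Grad Rate.
- Check that the coefficient is positive.
- Divide the reported p-value by 2, because the test is one-tailed. In a simple regression it appears in two places: Significance F and the P-value column.
- Reject H0 if the result is less than 0.05.

                  Coefficients   Standard Error   t Stat   P-value
Intercept         -68.76         12.58            -5.46    1.82E-06
Graduation Rate   1.18                                     5.24E-10

The coefficient is positive and 5.24E-10 / 2 is far below 0.05, so we reject H0.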
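The decision rule above as a small function. `one_tailed_reject` is a hypothetical helper; the example plugs in the coefficient and two-tailed p-value from this question (1.18 and 5.24E-10). A coefficient with the wrong sign can never reject in a one-tailed test, which is why the sign is checked first.

```python
def one_tailed_reject(coef, p_two_sided, expected_sign=1, alpha=0.05):
    """Reject H0: b = 0 in favor of b having the expected sign?"""
    if coef * expected_sign <= 0:
        return False  # wrong sign: the one-tailed p-value is above 0.5
    return p_two_sided / 2 < alpha


print(one_tailed_reject(1.18, 5.24e-10))  # True
```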
2. If the graduation rate of school A is 5 percentage points higher than that of school B, how much higher do we expect school A's giving rate to be? [10 points]
Using the above regression (graduation rate is all we know), the expected giving rate will be 1.18 × 5 = 5.9 percentage points higher for school A.
3. If you learn that A and B above have identical student/faculty ratios, what is your revised answer to question 2? Be certain to explain why it went up (if it went up), why it went down (if it went down), or why it stayed the same. Direct your response to a university administrator. [15 points]
Regress Giving Rate on Graduation Rate and Student/Faculty Ratio:
- If we keep SFR constant, expected Giving Rate goes up 0.76 points per point of graduation rate, so the revised answer is 0.76 × 5 = 3.8 percentage points.
- If we don't keep SFR constant, expected Giving Rate goes up 1.18 points per point.
- Schools with higher grad rates had LOWER SFR (that makes sense).
- If we don't hold SFR constant, increases in grad rate come with decreases in SFR, and the combined effect of the two is 1.18.
- So if grad rate is higher (but SFR is not lower), expect a 0.76 increase per point.
- If grad rate is higher (and SFR is lower, as in the data), expect a 1.18 increase per point.
4. Provide a point forecast of alumni giving rate for a university with a graduation rate of 80, 65 percent of its classes with 20 or fewer students, and a student/faculty ratio of 20. [25 points]

                        Coefficients   Standard Error   t Stat   P-value
Intercept                                                        0.2433
Graduation Rate         0.7482         0.1660           4.5082   0.0000
% of Classes Under 20   0.0290         0.1393           0.2084   0.8358
Student/Faculty Ratio                  0.3867                    0.0035

Don't use % of Classes Under 20 (p-value 0.8358). The best model includes Grad Rate and SFR; % of classes under 20 is not needed.
Use this model. Plug and chug: SUMPRODUCT of its coefficients (Intercept, Graduation Rate, Student/Faculty Ratio) with (1, 80, 20).
POINT FORECAST: 16.43
5. Of the 48 universities in the data set, which one has the most surprisingly low alumni giving rate? [10 points]
The university with the most negative residual: use the best model, ask for residuals, and find the minimum.
MICHIGAN!
6. Bo notices that some of the 48 have "university" in their names, some have "college" and the rest have "institute". Bo wonders whether these names are predictive of student/faculty ratio. (Formulate and test a relevant hypothesis.) [25 points]
Three groups (p = 3): use ANOVA or a regression of SFR on 2 dummies.
H0: average SFR is the same for universities, colleges and institutes. Ha: at least one group average differs.

SUMMARY OUTPUT
ANOVA
              df   SS   MS   F        Significance F
Regression    2              2.3290   0.1090
Residual      45
Total         47

              Coefficients   Standard Error   t Stat   P-value
Intercept                    0.7114                    0.0000
Dcollege                     3.4120                    0.9156
Dinstitute                                             0.0363

Significance F = 0.1090 > 0.05, so we fail to reject H0 at the 5% level.
Get Ready…
- More practice problems (with answers) are on the website.
- I'll host office hours Sunday night.
- I am available Monday and Tuesday until 2pm.
- Check the website to see where I am; you are welcome to join us.