BA 275 Quantitative Business Methods
Agenda: Residual Analysis, Multiple Linear Regression, Adjusted R-squared, Prediction
Simple Linear Regression Model
Population: the true effect of X on Y (β1).
Sample: the estimated effect of X on Y (b1).
Key questions:
1. Does X have any effect on Y?
2. If yes, how large is the effect?
3. Given X, what is the estimated Y?
Hypothesis Testing
Key Q1: Does X have any effect on Y?
H0: β1 = 0 vs. Ha: β1 ≠ 0
Test statistic: t = b1 / SE(b1). Read b0, b1, SE(b0), and SE(b1) from the regression output.
Degrees of freedom = n – p – 1, where p = number of independent variables used.
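The following Python sketch shows where the numbers in the output come from, for the simple-regression case (p = 1). It uses a small made-up data set and a hypothetical helper `simple_ols`. The `x` and `y` values are illustrative only and are not the course data. The sketch computes b0, b1, their standard errors, and the test statistic t = b1 / SE(b1) with n – p – 1 degrees of freedom.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper for illustration).

    Returns the estimates, their standard errors, the t statistic for
    H0: beta1 = 0, and the residual degrees of freedom n - p - 1 with p = 1.
    """
    n = len(x)
    p = 1  # one independent variable
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals (actual Y minus estimated Y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - p - 1
    s_e = math.sqrt(sse / df)  # standard error of estimate
    se_b1 = s_e / math.sqrt(sxx)
    se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
            "t_b1": b1 / se_b1, "df": df}


# Hypothetical data, for illustration only
result = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

For these made-up numbers, b1 = 0.6, SE(b1) ≈ 0.283, and t ≈ 2.12 with 3 degrees of freedom. The two-sided 5% critical value t(0.025, 3) is about 3.182, so H0: β1 = 0 would not be rejected here.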
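Interval Estimation
Key Q2: If so, how large is the effect?
Confidence interval for β1: b1 ± t(α/2, n – p – 1) × SE(b1). Read b0, b1, SE(b0), and SE(b1) from the regression output.
Degrees of freedom = n – p – 1, where p = number of independent variables used.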
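The interval on this slide is b1 ± t(α/2, n – p – 1) × SE(b1). Below is a minimal sketch of that arithmetic. It assumes the critical value was looked up in a t table beforehand. The helper name and the inputs are hypothetical and chosen only to illustrate.

```python
def coefficient_ci(estimate, std_error, t_crit):
    """Confidence interval estimate ± t_crit * SE for one regression coefficient.

    t_crit is assumed to be t(alpha/2, n - p - 1), taken from a t table.
    """
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Hypothetical inputs: b1 = 0.6, SE(b1) = 0.2828, 95% level with 3 df (t ≈ 3.182)
low, high = coefficient_ci(0.6, 0.2828, 3.182)
print(f"95% CI for beta1: ({low:.4f}, {high:.4f})")
```

This interval contains 0. That matches the two-sided t test at the same 5% level, which would also fail to reject H0: β1 = 0.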
Prediction and Confidence Intervals
Key Q3: Given X, what is the estimated Y?
What is your estimated price of that 2000-sf house on 9th Street?
Quick answer (size in thousands of sf, so X = 2): estimated price = -15.1245 + 76.1745 (2) = 137.2245
What is the average price of a house that occupies 2000 sf?
What is the difference?
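The quick answer is a plug-in calculation with the fitted line. Here is a minimal check in Python. It assumes, as the "(2)" suggests, that house size enters the model in thousands of square feet. `estimated_price` is a hypothetical helper name.

```python
def estimated_price(size_thousand_sf, b0=-15.1245, b1=76.1745):
    """Point estimate from the fitted simple regression on this slide."""
    return b0 + b1 * size_thousand_sf


print(round(estimated_price(2), 4))  # 137.2245
```

The same point estimate answers both questions. What differs is the interval around it: a prediction interval for one house, and a confidence interval for the average price.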
Prediction and Confidence Intervals
Prediction interval: for an individual Y at a given X
Confidence interval: for the mean of Y at a given X
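In simple regression, both intervals are centered on the same point estimate ŷ0 = b0 + b1x0. They differ only by the extra 1 under the square root:
confidence interval (mean of Y): ŷ0 ± t(α/2, n – 2) · s_e · √(1/n + (x0 – x̄)²/Sxx)
prediction interval (one Y): ŷ0 ± t(α/2, n – 2) · s_e · √(1 + 1/n + (x0 – x̄)²/Sxx)
The sketch below computes both. It uses made-up data, a critical value supplied by the caller, and a hypothetical helper `intervals_at`.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Point estimate, confidence interval, and prediction interval at x0.

    Simple linear regression; t_crit is assumed to be t(alpha/2, n - 2).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s_e * math.sqrt(core)      # mean of Y at x0
    pi_half = t_crit * s_e * math.sqrt(1 + core)  # a single Y at x0
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))


# Hypothetical data; t(0.025, 3) ≈ 3.182 for a 95% level with n = 5
y_hat, ci, pi = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(y_hat, ci, pi)
```

The prediction interval is always wider. It must cover the scatter of a single observation around the line, not just the uncertainty in the line itself.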
Model Comparison: A Good Fit? SS = Sum of Squares = ???
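For a least-squares fit with an intercept, the total variation splits as SST = SSR + SSE. From these sums, R² = SSR / SST. Adjusted R² = 1 – [SSE / (n – p – 1)] / [SST / (n – 1)], which penalizes adding predictors that explain little. Here is a minimal sketch; the helper name and the observed and fitted values are hypothetical.

```python
def sums_of_squares(y, y_hat, p):
    """SST, SSR, SSE, R-squared, and adjusted R-squared for a fitted model.

    y: observed values; y_hat: fitted values from a least-squares fit with an
    intercept; p: number of independent variables.
    """
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (residual)
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": r2, "adj_R2": adj_r2}


# Hypothetical observed values and their least-squares fitted values (p = 1)
summary = sums_of_squares([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2], p=1)
print(summary)
```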
Residual Analysis
The three conditions required for the validity of the regression analysis are:
1. The error variable is normally distributed.
2. The error variance is constant for all values of x.
3. The errors are independent of each other.
How can we diagnose violations of these conditions?
Residual Analysis
We do not observe ε (the random error), but we can calculate residuals from the sample:
Residual = actual Y – estimated Y
Examining the residuals (or standardized residuals) helps detect violations of the required conditions.
Residuals, Standardized Residuals, and Studentized Residuals
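The three kinds of residuals differ only in how they are scaled, and conventions vary between textbooks and software. This sketch assumes one common set of definitions for simple regression:
- residual: e_i = y_i – ŷ_i
- standardized residual: e_i / s_e
- (internally) studentized residual: e_i / (s_e √(1 – h_i)), with leverage h_i = 1/n + (x_i – x̄)²/Sxx

Some packages report an externally studentized (deleted) residual instead, so the numbers may not match a given program's output. The helper name and data are hypothetical.

```python
import math


def residual_table(x, y):
    """Residuals, standardized residuals, and studentized residuals (simple regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # actual Y - estimated Y
    s_e = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    standardized = [e / s_e for e in resid]
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    studentized = [e / (s_e * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return resid, standardized, studentized


# Hypothetical data
resid, standardized, studentized = residual_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for row in zip(resid, standardized, studentized):
    print("%8.4f %8.4f %8.4f" % row)
```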
The random error ε is normally distributed.
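A normal probability plot is the usual visual check. As a rough numeric companion, this sketch correlates the sorted residuals with normal scores at plotting positions (i – 0.5)/n, which is one common choice among several. A correlation close to 1 is consistent with normality; a clearly lower value points to skewness or outliers. The helper name and the residual values are hypothetical. The sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
from statistics import NormalDist
import math


def normal_scores_correlation(residuals):
    """Correlation between sorted residuals and standard normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_r = sum(ordered) / n
    mean_z = sum(scores) / n
    cov = sum((r - mean_r) * (z - mean_z) for r, z in zip(ordered, scores))
    var_r = sum((r - mean_r) ** 2 for r in ordered)
    var_z = sum((z - mean_z) ** 2 for z in scores)
    return cov / math.sqrt(var_r * var_z)


# Hypothetical residuals: roughly symmetric vs. one large positive outlier
print(normal_scores_correlation([-1.2, -0.5, -0.1, 0.2, 0.5, 1.1]))
print(normal_scores_correlation([1, 1, 1, 1, 1, 1, 1, 1, 1, 10]))
```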
The error variance σε² is constant for all values of X and of the estimated Y: constant spread!
Constant Variance
When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.
Diagnose heteroscedasticity by plotting the residuals against the predicted values ŷ.
[Figure: residuals plotted against ŷ; the spread increases with ŷ]
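The residual plot is the primary diagnostic. As an informal numeric companion (not a formal test), this sketch sorts the residuals by ŷ and compares the spread in the upper half with the spread in the lower half. A ratio well above 1 matches the "spread increases with ŷ" pattern. The helper name and the numbers are hypothetical.

```python
import statistics


def spread_ratio(y_hat, residuals):
    """Std. dev. of residuals in the upper half of y_hat divided by that in the lower half.

    With an odd number of points, the middle observation is left out.
    """
    pairs = sorted(zip(y_hat, residuals))  # order residuals by predicted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return statistics.pstdev(upper) / statistics.pstdev(lower)


# Hypothetical predicted values and residuals whose spread grows with y_hat
y_hat = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
print(spread_ratio(y_hat, residuals))  # well above 1 suggests heteroscedasticity
```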
The errors are independent of each other. We do not want to see any pattern.
Non-independence of Error Variables
[Figure, panel 1: residuals plotted against time. Note the runs of positive residuals, replaced by runs of negative residuals.]
[Figure, panel 2: residuals plotted against time. Note the oscillating behavior of the residuals around zero.]
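For residuals in time order, a common numeric summary is the Durbin–Watson statistic d = Σ(e_t – e_{t–1})² / Σe_t², which ranges from 0 to 4. The slides do not use it. Values near 2 suggest no first-order autocorrelation. Values well below 2 go with runs of same-sign residuals, and values well above 2 with oscillation around zero. The sketch below uses hypothetical residuals. Formal cutoffs come from Durbin–Watson tables, which are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residual sequences
runs = [1, 1, 1, -1, -1, -1]         # runs of positive, then negative residuals
oscillating = [1, -1, 1, -1, 1, -1]  # alternates around zero
print(durbin_watson(runs), durbin_watson(oscillating))
```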
Multiple Regression Model
Correlations
Fitted Model
Multiple Regression Analysis
Dependent variable: Price
Parameter   Estimate   Standard Error   T Statistic   P-Value
CONSTANT    -1336.41   173.344          -7.70957      0.0000
Age         12.7351    0.902317         14.1138       0.0000
Bidder      85.8023    8.70515          9.8565        0.0000
Q: Effect of AGE? H0: βAGE = 0 vs. Ha: βAGE ≠ 0
Q: Effect of BIDDER? H0: βBIDDER = 0 vs. Ha: βBIDDER ≠ 0
Degrees of freedom = ?
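Each T statistic in the table is the estimate divided by its standard error. With p = 2 independent variables (Age and Bidder), the degrees of freedom are n – p – 1 = n – 3. The sample size n is not shown in this excerpt, so the sketch takes it as an argument. The value 30 in the usage line is purely hypothetical, as are the helper names.

```python
# Estimates and standard errors copied from the regression output on this slide
coefficients = {
    "CONSTANT": (-1336.41, 173.344),
    "Age": (12.7351, 0.902317),
    "Bidder": (85.8023, 8.70515),
}


def t_statistics(coefs):
    """t = estimate / standard error, testing H0: beta = 0 for each parameter."""
    return {name: est / se for name, (est, se) in coefs.items()}


def residual_df(n, p):
    """Degrees of freedom n - p - 1 for a model with p independent variables."""
    return n - p - 1


print(t_statistics(coefficients))
print(residual_df(30, 2))  # hypothetical n = 30
```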
Fitted Model
Estimated price = -1336.41 + 12.7351 AGE + 85.8023 BIDDER
Analysis of Variance ? ?
Model Selection
Using the Model
What is the total variation of auction prices?
How much of it has been explained by the model?
If there are 10 bidders and the clock is 100 years old, what is the expected auction price?
If AGE is held fixed and the number of bidders increases from 10 to 11, how much does PRICE increase?
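The first two questions are answered from the ANOVA table: the total variation is SST, and the part explained by the model is SSR. The last two are plug-in calculations with the fitted model, sketched below. The helper name is hypothetical. AGE is in years as on the slide, and price is in whatever units the data use.

```python
# Coefficients of the fitted model: Estimated price = b0 + b_age*AGE + b_bidder*BIDDER
B0, B_AGE, B_BIDDER = -1336.41, 12.7351, 85.8023


def expected_price(age, bidders):
    """Estimated auction price from the fitted multiple regression."""
    return B0 + B_AGE * age + B_BIDDER * bidders


price = expected_price(age=100, bidders=10)
# Holding AGE fixed, one more bidder changes the estimate by B_BIDDER
# (up to floating-point rounding)
increase = expected_price(100, 11) - expected_price(100, 10)
print(round(price, 3), round(increase, 4))  # 795.123 85.8023
```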
Prediction and Confidence Intervals
Fitted Model: Estimated price = -1336.41 + 12.7351 AGE + 85.8023 BIDDER
Statgraphics demo
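Statgraphics reports these intervals directly. The sketch below shows the standard textbook matrix formulas, which we assume match what the software computes. Definitions:
- X is the design matrix: a column of 1s plus the predictors.
- x0 = (1, AGE, BIDDER) is the new point.
- h0 = x0ᵀ(XᵀX)⁻¹x0.
- t = t(α/2, n – p – 1).

The confidence interval for the mean price is ŷ0 ± t·s_e·√h0. The prediction interval for one clock is ŷ0 ± t·s_e·√(1 + h0). The helper name and inputs are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mlr_intervals(predictors, y, x0, t_crit):
    """Estimate, confidence interval (mean Y), and prediction interval (one Y) at x0.

    predictors: n rows of independent-variable values (e.g. [AGE, BIDDER]);
    x0: one row of the same variables; t_crit: assumed t(alpha/2, n - p - 1).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    x0 = np.concatenate([[1.0], np.asarray(x0, dtype=float)])
    n, k = X.shape  # k = p + 1 columns including the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    resid = y - X @ beta
    s_e = np.sqrt(resid @ resid / (n - k))  # standard error of estimate
    h0 = x0 @ np.linalg.inv(X.T @ X) @ x0
    y_hat = x0 @ beta
    ci_half = t_crit * s_e * np.sqrt(h0)
    pi_half = t_crit * s_e * np.sqrt(1 + h0)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Hypothetical single-predictor check; for the clock data, pass rows of
# [AGE, BIDDER] and x0 = [100, 10].
est, ci, pi = mlr_intervals([[1], [2], [3], [4], [5]], [2, 4, 5, 4, 5], [3], t_crit=3.182)
print(est, ci, pi)
```

With one predictor, h0 reduces to 1/n + (x0 – x̄)²/Sxx, so this agrees with the simple-regression intervals.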