# Ch. 14: The Multiple Regression Model building

## Presentation transcript

Ch. 14: The Multiple Regression Model building
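Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

Here $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.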

The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Here $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.

Interpretation of the slopes (each is referred to as a net regression coefficient): $b_1$ is the change in the mean of Y per unit change in X1, taking into account the effect of X2 (or net of X2). $b_0$ is the Y-intercept; it is interpreted the same way as in simple regression.
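As a minimal sketch of how the coefficients $b_0, b_1, b_2$ can be estimated from sample data, the following uses NumPy's least-squares solver. The data are hypothetical and noise-free (generated from Y = 1 + 2·X1 + 3·X2), chosen only so that the recovered estimates are easy to check; the variable names are illustrative and do not come from the presentation.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 1 + 2*X1 + 3*X2,
# so the least-squares estimates should recover those three values.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# The least-squares solution minimizes the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# Predicted values and residuals e_i = Y_i - Yhat_i.
y_hat = X @ coef
residuals = y - y_hat

print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

With real data the residuals would not be zero, and each slope would be read as the change in the mean of Y per unit change in that X, net of the other X.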

Graph of a Two-Variable Model
[Figure: three-dimensional graph of the two-variable model, with axes Y, X1 and X2, showing the slope for variable X1 and the slope for variable X2.]

Example: Simple Regression Results Multiple Regression Results
Check the size and significance level of the coefficients, the F-value, the R-square, etc. You will see what the "net of" effects are.

Using The Equation to Make Predictions
Predict the appraised value at the average lot size (7.24) and the average number of rooms (7.12). What is the total effect of a 2000 sf increase in lot size and 2 additional rooms?

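Coefficient of Multiple Determination, $r^2$, and Adjusted $r^2$
The coefficient of multiple determination reports the proportion of total variation in Y explained by all X variables taken together (the model):

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

Adjusted $r^2$: $r^2$ never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.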
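The proportion of explained variation can be computed directly from the observed and fitted values. The sketch below assumes the fitted values come from a least-squares fit that includes an intercept, in which case SSR/SST equals 1 - SSE/SST; the function name and the small data set are hypothetical.

```python
def r_squared(y, y_hat):
    """Return r^2 = 1 - SSE/SST for observed values y and fitted values y_hat.

    For a least-squares fit with an intercept this equals SSR/SST.
    """
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return 1.0 - sse / sst


# Hypothetical example: SST = 5 and SSE = 1, so r^2 is 0.8.
print(r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
```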

What is the net effect of adding a new variable?
We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted $r^2$ shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$$r^2_{adj} = 1 - \left[(1 - r^2)\,\frac{n - 1}{n - k - 1}\right]$$

(where n = sample size, k = number of independent variables)

- Penalizes excessive use of unimportant independent variables
- Smaller than $r^2$
- Useful in comparing among models
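The trade-off between added explanatory power and the lost degree of freedom can be illustrated with the standard adjusted $r^2$ formula, $1 - (1 - r^2)(n - 1)/(n - k - 1)$. The function name and the numbers below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 for sample size n and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison with n = 11 observations:
# a third variable raises r^2 only from 0.80 to 0.81.
two_vars = adjusted_r_squared(0.80, n=11, k=2)    # about 0.75
three_vars = adjusted_r_squared(0.81, n=11, k=3)  # about 0.729
print(two_vars, three_vars)
```

In this made-up case the adjusted value falls even though $r^2$ rose, which is the situation the penalty is meant to flag.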

Multiple Regression Assumptions
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent

Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$

These residual plots are used in multiple regression:

- Residuals vs. $\hat{Y}_i$
- Residuals vs. $X_{1i}$
- Residuals vs. $X_{2i}$
- Residuals vs. time (if time series data)

Two variable model
[Figure: three-dimensional plot with axes Y, X1 and X2, showing a sample observation $Y_i$ at $(x_{1i}, x_{2i})$, its predicted value $\hat{Y}_i$, and the residual $e_i = Y_i - \hat{Y}_i$.]

The best-fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e^2$.

Are Individual Variables Significant?
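Use t-tests of the individual variable slopes. Each test shows whether there is a linear relationship between the variable Xi and Y.

Hypotheses:

- H0: $\beta_i = 0$ (no linear relationship)
- H1: $\beta_i \neq 0$ (a linear relationship does exist between Xi and Y)

Test statistic:

$$t = \frac{b_i - 0}{S_{b_i}} \qquad (\text{d.f.} = n - k - 1)$$

Confidence interval for the population slope $\beta_i$:

$$b_i \pm t_{\alpha/2,\, n-k-1}\, S_{b_i}$$

where $S_{b_i}$ is the standard error of $b_i$.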
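A small sketch of the slope test and interval, using the standard forms $t = b_i / S_{b_i}$ and $b_i \pm t_{crit} S_{b_i}$. The critical value is assumed to be looked up by the reader (t table with n - k - 1 degrees of freedom) and passed in; the function names and numbers are hypothetical.

```python
def slope_t_statistic(b, se):
    """t statistic for H0: beta_i = 0, given estimate b and its standard error se."""
    return b / se


def slope_confidence_interval(b, se, t_crit):
    """Confidence interval b +/- t_crit * se for the population slope.

    t_crit is the two-tailed critical value with n - k - 1 degrees of
    freedom, supplied by the caller (for example from a t table).
    """
    half_width = t_crit * se
    return (b - half_width, b + half_width)


# Hypothetical values: slope estimate 6, standard error 2, critical t of 2.
print(slope_t_statistic(6.0, 2.0))               # 3.0
print(slope_confidence_interval(6.0, 2.0, 2.0))  # (2.0, 10.0)
```

If the interval excludes 0, the two-tailed t-test at the same significance level rejects H0.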

Is the Overall Model Significant?
F-test for overall significance of the model: shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F test statistic.

Hypotheses:

- H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
- H1: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)

Test statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
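The overall F statistic is the ratio of the mean square for regression to the mean square error, $F = (SSR/k) / (SSE/(n - k - 1))$. A minimal sketch follows, with a hypothetical function name and made-up sums of squares.

```python
def overall_f_statistic(ssr, sse, n, k):
    """F = MSR / MSE for a model with k independent variables and n observations."""
    msr = ssr / k            # mean square regression, k degrees of freedom
    mse = sse / (n - k - 1)  # mean square error, n - k - 1 degrees of freedom
    return msr / mse


# Hypothetical sums of squares: SSR = 80, SSE = 20, n = 13, k = 2.
print(overall_f_statistic(80.0, 20.0, n=13, k=2))  # 20.0
```

The result is compared with the critical F value for k and n - k - 1 degrees of freedom.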

Testing Portions of the Multiple Regression Model
The goal is to find out whether including an individual Xj, or a set of Xs, significantly improves the model, given that the other independent variables are already included in the model.

Two measures:

- Partial F-test criterion
- The coefficient of partial determination

Contribution of a Single Independent Variable Xj
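$$SSR(X_j \mid \text{all variables except } X_j) = SSR(\text{all variables}) - SSR(\text{all variables except } X_j)$$

This measures the contribution of Xj in explaining the total variation in Y (SST).

Consider here a 3-variable model:

$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$

The first term on the right is the regression sum of squares of the unrestricted (UR) model, $SSR_{UR}$; the second is that of the restricted (R) model, $SSR_R$.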

The Partial F-Test Statistic
Consider the hypothesis test:

- H0: variable Xj does not significantly improve the model after all other variables are included
- H1: variable Xj significantly improves the model after all other variables are included

Test statistic:

$$F = \frac{SSR(X_j \mid \text{all variables except } X_j)}{MSE}$$

Note that the numerator is the contribution of Xj to the regression. If the actual F statistic is greater than the critical F, the conclusion is: reject H0; adding Xj does improve the model.
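A sketch of the partial F computation for a single variable, assuming the usual textbook form in which the contribution of Xj (SSR of the full model minus SSR of the model without Xj) is divided by the MSE of the full model. The function name and the numbers are hypothetical.

```python
def partial_f_statistic(ssr_full, ssr_reduced, sse_full, n, k):
    """Partial F for one variable Xj.

    ssr_full    -- SSR of the model with all k variables
    ssr_reduced -- SSR of the model with all variables except Xj
    sse_full    -- SSE of the model with all k variables
    """
    contribution = ssr_full - ssr_reduced  # SSR(Xj | all variables except Xj)
    mse_full = sse_full / (n - k - 1)
    return contribution / mse_full


# Hypothetical values: SSR(all) = 80, SSR(without Xj) = 60, SSE = 20, n = 13, k = 2.
print(partial_f_statistic(80.0, 60.0, 20.0, n=13, k=2))  # 10.0
```

For a single variable the critical value has 1 and n - k - 1 degrees of freedom.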

Coefficient of Partial Determination for one or a set of variables
Measures the proportion of total variation in the dependent variable (SST) that is explained by Xj while controlling for (holding constant) the other explanatory variables
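The sketch below assumes the usual textbook definition of the coefficient of partial determination: the contribution of Xj divided by the variation left unexplained by the other variables, that is, SSR(Xj | others) / (SST - SSR(all) + SSR(Xj | others)). The function name and numbers are hypothetical.

```python
def partial_r_squared(sst, ssr_full, ssr_reduced):
    """Coefficient of partial determination for Xj.

    sst         -- total sum of squares
    ssr_full    -- SSR of the model with all variables
    ssr_reduced -- SSR of the model with all variables except Xj
    """
    contribution = ssr_full - ssr_reduced  # SSR(Xj | all variables except Xj)
    return contribution / (sst - ssr_full + contribution)


# Hypothetical values: SST = 100, SSR(all) = 80, SSR(without Xj) = 60.
print(partial_r_squared(100.0, 80.0, 60.0))  # 0.5
```

Here Xj explains half of the variation that the other variables leave unexplained.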

Using Dummy Variables
A dummy variable is a categorical explanatory variable with two levels (yes or no, on or off, male or female), coded as 0 or 1. Regression intercepts are different if the variable is significant. The model assumes equal slopes for the other variables. If there are more than two levels, the number of dummy variables needed is (number of levels - 1).
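To illustrate the (number of levels - 1) rule, here is a minimal sketch that codes a categorical variable into 0/1 dummy columns, leaving one level out as the reference. The function name, the level names, and the choice of reference level are all hypothetical.

```python
def dummy_code(values, reference):
    """Code a categorical variable as 0/1 dummies, omitting the reference level.

    Returns the list of non-reference levels (one dummy column each)
    and one row of 0/1 codes per observation.
    """
    if reference not in values:
        raise ValueError("reference level must appear in the data")
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical house styles: three levels need two dummy variables.
levels, rows = dummy_code(["ranch", "condo", "split", "condo"], reference="condo")
print(levels)  # ['ranch', 'split']
print(rows)    # [[1, 0], [0, 0], [0, 1], [0, 0]]
```

An observation at the reference level has all dummies equal to 0, so its intercept is $b_0$ alone.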

Different Intercepts, same slope
[Figure: Y (sales) with two parallel regression lines: Fire Place (X2 = 1), with intercept b0 + b2, and No Fire Place (X2 = 0), with intercept b0.]

If H0: $\beta_2 = 0$ is rejected, then "Fire Place" has a significant effect on values.

Interaction Between Explanatory Variables
An interaction model hypothesizes interaction between pairs of X variables: the response to one X variable may vary at different levels of another X variable. The model contains two-way cross-product terms, for example $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$.

Effect of interaction:

- Without an interaction term, the effect of X1 on Y is measured by $\beta_1$.
- With an interaction term, the effect of X1 on Y is measured by $\beta_1 + \beta_3 X_2$.
- The effect changes as X2 changes.

Slopes are different if the effect of X1 on Y depends on X2 value
Example: Suppose X2 is a dummy variable and the estimated regression equation is

$$\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$$

- X2 = 1: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
- X2 = 0: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$

[Figure: the two lines plotted against X1, with tick marks at 0.5, 1 and 1.5.]
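The example equation can be evaluated directly to confirm that the slope of X1 depends on X2. The coefficients 1, 2, 3 and 4 are the ones from the example; the function names are hypothetical.

```python
def predict(x1, x2):
    """Estimated Y from the example: 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_of_x1(x2):
    """Effect of a one-unit change in X1 at a given X2: b1 + b3*X2 = 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # Change in predicted Y when X1 goes from 0.5 to 1.5 (a one-unit step).
    step = predict(1.5, x2) - predict(0.5, x2)
    print(f"X2={x2}: intercept={predict(0, x2)}, slope={slope_of_x1(x2)}, step={step}")
```

For X2 = 0 the line is 1 + 2·X1, and for X2 = 1 it is 4 + 6·X1, so both the intercept and the slope change with the dummy variable.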
