# Multiple Regression (OMS 633)


Multiple Regression involves the use of more than one independent variable. (Multivariate analysis involves more than one dependent variable.) Adding more variables will help us explain more variance; the trick becomes: are the additional variables significant, and do they improve the overall model? Additionally, the added independent variables should not be too highly related to each other!

Multiple Regression A sample data set:
- Sales = hundreds of gallons
- Price = price per gallon
- Advertising = hundreds of dollars

Analyzing the output
- Evaluate for multicollinearity
- State and interpret the equation
- Interpret Adjusted R²
- Interpret S_yx
- Are the independent variables significant?
- Is the model significant?
- Forecast and develop a prediction interval
- Examine the error terms
- Calculate MAD, MSE, MAPE, MPE
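The fitting step behind this checklist can be sketched with ordinary least squares. The data below are HYPOTHETICAL stand-ins (the actual OMS 633 data set is not reproduced in these slides); only the model form, Sales = b0 + b1(Price) + b2(Adv), comes from the deck.

```python
import numpy as np

# Hypothetical observations: price ($/gallon), advertising (hundreds of $),
# sales (hundreds of gallons)
price = np.array([1.30, 2.00, 1.70, 1.50, 1.60, 1.20, 1.60, 1.40, 1.00, 1.10])
adv   = np.array([9.0, 7.0, 5.0, 14.0, 15.0, 12.0, 6.0, 10.0, 15.0, 21.0])
sales = np.array([10.0, 6.0, 5.0, 12.0, 10.0, 15.0, 5.0, 12.0, 17.0, 20.0])

# Design matrix with an intercept column: Y = b0 + b1*Price + b2*Adv + e
X = np.column_stack([np.ones_like(price), price, adv])
coefs, residuals, rank, _ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coefs
print(f"Sales = {b0:.2f} + {b1:.2f}(Price) + {b2:.2f}(Adv)")
```

With real output in hand, the remaining checklist items (R², S_yx, t-stats, F) are read off the regression summary.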

Correlation Matrix Simple correlation for each combination of variables (independents vs. independents; independents vs. dependent)

Multicollinearity It’s possible that the independent variables are related to one another. If they are highly related, this condition is called multicollinearity. Problems:
- A regression coefficient that is positive in sign in a two-variable model may change to a negative sign.
- Estimates of the regression coefficients change greatly from sample to sample because the standard errors of the regression coefficients are large.
- Highly interrelated independent variables can explain some of the same variance in the dependent variable, so there is no added benefit even though R² has increased.
When two independent variables are highly correlated (roughly .7 or above), we throw one of them out.
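The screen described above, a correlation matrix with the .7 rule applied to the independent-vs-independent entries, can be sketched as follows; the data are HYPOTHETICAL.

```python
import numpy as np

price = np.array([1.30, 2.00, 1.70, 1.50, 1.60, 1.20, 1.60, 1.40, 1.00, 1.10])
adv   = np.array([9.0, 7.0, 5.0, 14.0, 15.0, 12.0, 6.0, 10.0, 15.0, 21.0])
sales = np.array([10.0, 6.0, 5.0, 12.0, 10.0, 15.0, 5.0, 12.0, 17.0, 20.0])

# Rows/columns: sales, price, adv
corr = np.corrcoef([sales, price, adv])

# Only independent-vs-independent correlations signal multicollinearity
r_price_adv = corr[1, 2]
if abs(r_price_adv) > 0.7:
    print(f"price and adv are highly correlated (r = {r_price_adv:.2f}): drop one")
else:
    print(f"price vs adv: r = {r_price_adv:.2f} -- no multicollinearity flag")
```

The independent-vs-dependent correlations (first row of the matrix) are still useful: they tell us which predictors are worth keeping.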

Multiple Regression Equation Gallon Sales = 16.4 - 8.2476 (Price) + 0.59 (Adv)

Regression Coefficients b₀ is the Y-intercept: the value of sales when X₁ and X₂ are 0. b₁ and b₂ are net regression coefficients: the change in Y per unit change in the relevant independent variable, holding the other independent variables constant.

Regression Coefficients For each unit increase (\$1.00) in price, sales will decrease 8.25 hundred gallons, holding advertising constant. For each unit increase (\$100, represented as 1) in advertising, sales will increase 0.59 hundred gallons, holding price constant. Be very careful about the units! An advertising value of 10 indicates \$1,000 because advertising is in hundreds. Gallons = 16.4 - 8.2476 (1.00) + 0.59 (10) = 14.05, or about 1,405 gallons

Regression Coefficients How does a one-cent increase in price affect sales (holding advertising at \$1,000)? 16.4 - 8.25(1.01) + 0.59(10) = 13.9675. If price stays at \$1.00 and we increase advertising by \$100, from \$1,000 to \$1,100: 16.4 - 8.25(1.00) + 0.59(11) = 14.64
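The fitted equation from the slides works well as a small forecasting function. The coefficients below come straight from the deck (b₁ = -8.2476 is rounded to -8.25 in the hand calculations above, so results differ slightly in the cents).

```python
def gallon_sales(price, adv):
    """Sales in hundreds of gallons; price in $/gallon, adv in hundreds of $."""
    return 16.4 - 8.2476 * price + 0.59 * adv

base = gallon_sales(1.00, 10)      # price $1.00, advertising $1,000
penny = gallon_sales(1.01, 10)     # one-cent price increase
more_adv = gallon_sales(1.00, 11)  # $100 more advertising
print(base, penny, more_adv)
```

Note how the function makes the unit discipline explicit: adv = 10 means \$1,000, and the output is in hundreds of gallons.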

Regression Statistics
- Standard error of the estimate
- R² and Adjusted R²

Same formulas as Simple Regression SSR/SST (this is an UNADJUSTED R²). Adjusted R² from the ANOVA table = 1 - MSE/(SST/(n-1)). 91% of the variance in gallons sold is explained by price per gallon and advertising.

Standard Error of the Estimate Measures the standard amount that the actual values (Y) differ from the estimated values. No change in formula except that, in this example, k = 3 (the number of estimated coefficients, including the intercept). We can still use the square root of MSE.
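The two slides above reduce to a few lines of arithmetic on the ANOVA sums of squares. The SST/SSR values below are HYPOTHETICAL (chosen so R² matches the 91% quoted in the deck); k = 3 counts the intercept and two slopes.

```python
import math

SST = 100.0   # total sum of squares (hypothetical)
SSR = 91.0    # regression (explained) sum of squares (hypothetical)
SSE = SST - SSR
n, k = 13, 3  # n observations, k estimated coefficients

r2 = SSR / SST                                   # unadjusted R^2
adj_r2 = 1 - (SSE / (n - k)) / (SST / (n - 1))   # adjusted R^2
s_yx = math.sqrt(SSE / (n - k))                  # std. error of estimate = sqrt(MSE)
print(r2, adj_r2, s_yx)
```

Adjusted R² is always at most the unadjusted R², since it penalizes each added coefficient through the n - k divisor.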

Evaluate the Independent Variables H₀: The regression coefficient is not significantly different from zero. Hₐ: The regression coefficient is significantly different from zero. Use the t-stat and the p-value to evaluate EACH independent variable. If an independent variable is NOT significant, we remove it from the model and re-run!
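The coefficient test sketched above is just t = b / se(b) compared against a t-table value. The standard error and critical value below are HYPOTHETICAL (the critical value depends on the chosen alpha and on df = n - k); only the -8.2476 coefficient comes from the deck.

```python
def is_significant(b, se_b, t_critical):
    """Reject H0 (coefficient = 0) when |t| exceeds the t-table value."""
    t_stat = b / se_b
    return abs(t_stat) > t_critical, t_stat

# Hypothetical se(b) = 2.0 and critical t = 2.228 (alpha = .05, df = 10)
sig, t = is_significant(-8.2476, 2.0, 2.228)
print(f"t = {t:.2f}, significant: {sig}")
```

Equivalently, reject H₀ when the printed p-value is below the chosen alpha level, which is what we read off the computer output in practice.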

Evaluate the Model H₀: The model is NOT valid; there is NOT a statistical relationship between the dependent and independent variables. Hₐ: The model is valid; there is a statistical relationship between the dependent and independent variables. If F from the ANOVA is greater than the F from the F-table, reject H₀: the model is valid. We can also look at the p-values: if the p-value is less than our set α level, we REJECT H₀.
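The overall F-test comes straight from the ANOVA table: F = MSR/MSE. The sums of squares and the table value below are HYPOTHETICAL, chosen only to illustrate the mechanics.

```python
SSR, SSE = 91.0, 9.0              # hypothetical ANOVA sums of squares
k, n = 3, 13                      # k estimated coefficients, n observations

MSR = SSR / (k - 1)               # regression mean square (df = k - 1)
MSE = SSE / (n - k)               # error mean square (df = n - k)
F = MSR / MSE
F_table = 4.10                    # hypothetical F-table value at the chosen alpha
reject_H0 = F > F_table           # reject => the model is valid
print(F, reject_H0)
```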

Forecast and Prediction Interval Same as simple regression; however, many times we will not have the correction factor (the formula under the square root). It is acceptable to use the standard error of the estimate provided in the computer output.
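Dropping the correction factor leaves the approximate interval: point forecast plus or minus t times the standard error of the estimate. The forecast, s_yx, and t value below are HYPOTHETICAL.

```python
forecast = 14.05     # point forecast, hundreds of gallons (hypothetical)
s_yx = 0.95          # standard error of the estimate from the output (hypothetical)
t = 2.228            # t-table value for the chosen alpha and df (hypothetical)

lower = forecast - t * s_yx
upper = forecast + t * s_yx
print(f"{lower:.2f} to {upper:.2f} hundred gallons")
```

Because the correction factor is omitted, this interval is slightly too narrow, especially for forecasts far from the mean of the X values.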

Examining the Errors Heteroscedasticity exists when the residuals do not have a constant variance across an entire range of values. Run an autocorrelation on the error terms to determine if the errors are random. If the errors are not random, the model needs to be re-evaluated. More on this in Chapter 9. Evaluate with MAD, MAPE, MPE, MSE
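The error diagnostics named above, MAD, MSE, MAPE, MPE, plus a lag-1 autocorrelation of the residuals as a quick randomness check, can be sketched as follows; the actual/forecast values are HYPOTHETICAL.

```python
actual   = [10.0, 6.0, 5.0, 12.0, 10.0, 15.0, 5.0, 12.0, 17.0, 20.0]
forecast = [10.5, 6.8, 4.1, 11.0, 10.9, 14.2, 5.6, 11.5, 16.8, 20.6]
errors = [a - f for a, f in zip(actual, forecast)]
n = len(errors)

mad  = sum(abs(e) for e in errors) / n                          # mean absolute deviation
mse  = sum(e * e for e in errors) / n                           # mean squared error
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n * 100  # mean abs. % error
mpe  = sum(e / a for e, a in zip(errors, actual)) / n * 100       # mean % error (bias)

# Lag-1 autocorrelation of the residuals: near zero suggests random errors
mean_e = sum(errors) / n
num = sum((errors[t] - mean_e) * (errors[t - 1] - mean_e) for t in range(1, n))
den = sum((e - mean_e) ** 2 for e in errors)
r1 = num / den
print(mad, mse, mape, mpe, r1)
```

An r1 far from zero means the errors are not random and the model needs to be re-evaluated, as the slide says.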

Dummy Variables Used to determine the relationship between qualitative independent variables and a dependent variable. Examples:
- Differences based on gender
- Effect of training/no training on performance
- Seasonal data (quarters)
We use 0 and 1 to indicate “off” or “on”. For example, code males as 1 and females as 0.

Dummy Variables The data indicate job performance rating based on achievement test score and gender, coded female (0) and male (1). How do males and females differ in their job performance?

Dummy Variables The regression equation: Job performance = -1.96 + 0.12 (test score) - 2.18 (gender). Holding gender constant, a one-unit increase in test score increases the job performance rating by 0.12 points. Holding test score constant, males receive a 2.18-point lower performance rating than females. Stated differently, females have a 2.18-point higher job performance rating than males, holding test score constant.
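The dummy-variable equation from the slides as a function (gender: 0 = female, 1 = male); the test score of 80 is a HYPOTHETICAL input.

```python
def job_performance(test_score, gender):
    """Fitted rating; gender is a dummy: 0 = female, 1 = male."""
    return -1.96 + 0.12 * test_score - 2.18 * gender

female = job_performance(80, 0)
male   = job_performance(80, 1)
print(female, male, female - male)
```

Because gender only takes the values 0 and 1, its coefficient is exactly the male-female gap at any fixed test score, which is why dummy coding makes group comparisons so direct.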

Dummy Variable Analysis
- Evaluate for multicollinearity
- State and interpret the equation
- Interpret Adjusted R²
- Interpret S_yx
- Are the independent variables significant?
- Is the model significant?
- Forecast and develop a prediction interval
- Examine the error terms
- Calculate MAD, MSE, MAPE, MPE

Model Evaluation
- If the variables indicate multicollinearity, run the model and interpret it, but then re-run the best model (i.e., throw out one of the highly correlated variables).
- If one of the independent variables is NOT significant (whether a dummy variable or otherwise), throw it out and re-run the model.
- If the overall model is not significant, it's back to the drawing board: we need to gather better predictor variables… maybe an elective course!

Stepwise Regression Sometimes we will have a great number of variables. Running a correlation matrix will help determine whether any variables should NOT be in the model (low correlation with the dependent variable). We can also run different types of regression, such as stepwise regression.

Stepwise Regression Adds one variable at a time, one step at a time, based on explained variance (and highest correlation with the dependent variable). The independent variable that explains the most variance in the dependent variable is entered into the model first. A partial F-test determines whether a new variable stays or is eliminated.
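The forward step of this procedure can be sketched as follows. The data are SIMULATED, and the fixed R² gain threshold is a crude stand-in for the partial F-test the slide describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                     # pure noise: unrelated to y
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

def r_squared(y, cols):
    """Unadjusted R^2 of an OLS fit (with intercept) on the given columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

all_vars = {"x1": x1, "x2": x2, "x3": x3}
selected, current_r2 = [], 0.0
threshold = 0.01        # minimum R^2 gain to enter (stand-in for the partial F-test)
remaining = list(all_vars)
while remaining:
    # Try each remaining candidate and keep the one with the best R^2
    scores = {name: r_squared(y, [all_vars[v] for v in selected] + [all_vars[name]])
              for name in remaining}
    best = max(scores, key=scores.get)
    if scores[best] - current_r2 < threshold:
        break                               # no candidate improves enough: stop
    selected.append(best)
    current_r2 = scores[best]
    remaining.remove(best)
print(selected, round(current_r2, 3))
```

Each pass enters the variable that explains the most additional variance, exactly the ordering rule the slide describes; real stepwise routines use the partial F statistic (and can also remove previously entered variables) rather than a fixed R² threshold.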