Introduction to Multiple Regression, Lecture 11. The Multiple Regression Model. Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables.

Presentation transcript:

Introduction to Multiple Regression Lecture 11

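The Multiple Regression Model
Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (X_i).
Multiple regression model with k independent variables:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i \]
where \( \beta_0 \) is the Y-intercept, \( \beta_1, \ldots, \beta_k \) are the population slopes, and \( \varepsilon_i \) is the random error.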

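Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} \]
where \( \hat{Y}_i \) is the estimated (or predicted) value of Y, \( b_0 \) is the estimated intercept, and \( b_1, \ldots, b_k \) are the estimated slope coefficients.
In this chapter we also use STATA.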
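The estimation step can be sketched in a few lines of plain Python. This is a minimal illustration, not the procedure Excel or STATA uses internally: it assumes the least-squares criterion and solves the normal equations directly. The helper names `solve_linear_system` and `fit_ols` and the six data rows are hypothetical, and the data are generated without noise so the known coefficients should be recovered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 for the intercept
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 10 - 2*X1 + 3*X2 (no random error).
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [10 - 2 * x1 + 3 * x2 for x1, x2 in x_rows]
coefficients = fit_ols(x_rows, y)
print([round(c, 6) for c in coefficients])
```

With real data the random error term means the estimates only approximate the population coefficients; statistical software also reports standard errors, which this sketch omits.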

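Multiple Regression Equation (continued)
Two-variable model:
\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \]
[Figure: the fitted equation drawn on Y, X_1, and X_2 axes; b_1 is the slope for variable X_1 and b_2 is the slope for variable X_2.]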

Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising (in $100s)
Data are collected for 15 weeks.

Pie Sales Example
Multiple regression equation: Sales = b_0 + b_1 (Price) + b_2 (Advertising)
[Table: 15 weeks of data with columns Week, Pie Sales, Price ($), Advertising ($100s).]

Excel Multiple Regression Output
[Excel output. Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 15. ANOVA table: df, SS, MS, F, Significance F for the Regression, Residual, and Total rows. Coefficient table: Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95% for Intercept, Price, and Advertising.]

The Multiple Regression Equation
Sales = b_0 + b_1 (Price) + b_2 (Advertising), where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
b_1 (negative in this example): sales will decrease, on average, by |b_1| pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b_2 (positive in this example): sales will increase, on average, by b_2 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
\[ \hat{Y} = b_0 + b_1 (5.50) + b_2 (3.5) \]
Note that Advertising is in $100s, so $350 means that X_2 = 3.5. Substituting the estimated coefficients gives the predicted sales in pies.
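The unit conversion is the easy step to get wrong, so here is a small sketch of the prediction. The coefficient values are round, hypothetical numbers chosen for illustration, not the fitted values from the Excel output, and `predict_sales` is an illustrative name.

```python
def predict_sales(b0, b1, b2, price_dollars, advertising_dollars):
    """Predicted weekly pie sales from the two-variable regression equation."""
    # Advertising enters the model in units of $100, so convert first.
    advertising_hundreds = advertising_dollars / 100.0
    return b0 + b1 * price_dollars + b2 * advertising_hundreds


# Hypothetical coefficients (assumed for illustration only).
b0, b1, b2 = 300.0, -25.0, 75.0
predicted = predict_sales(b0, b1, b2, price_dollars=5.50, advertising_dollars=350.0)
print(predicted)
```

Passing 350 directly as X_2 instead of 3.5 would inflate the advertising term a hundredfold, which is why the conversion is done inside the function.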

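Coefficient of Multiple Determination
Reports the proportion of total variation in Y explained by all X variables taken together:
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]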

Multiple Coefficient of Determination
[Excel regression output for the pie sales example.]
The R Square value, expressed as a percentage, is the share of the variation in pie sales that is explained by the variation in price and advertising.

Adjusted r^2
r^2 never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.
What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted r^2 (continued)
Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:
\[ r^2_{adj} = 1 - \left[ (1 - r^2) \, \frac{n - 1}{n - k - 1} \right] \]
(where n = sample size, k = number of independent variables)
It penalizes excessive use of unimportant independent variables, is smaller than r^2, and is useful in comparing among models.
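A short sketch makes the penalty concrete. It assumes the standard definitions r^2 = SSR/SST and r^2_adj = 1 - (1 - r^2)(n - 1)/(n - k - 1). The sums of squares are hypothetical, chosen so that a third variable raises r^2 only slightly.

```python
def r_squared(ssr, sst):
    """Proportion of total variation in Y explained by the regression."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """r^2 adjusted for sample size n and number of independent variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 15  # sample size, as in the 15-week example

# Hypothetical sums of squares: a third variable raises SSR from 60.0 to 60.5.
r2_two = r_squared(60.0, 100.0)
r2_three = r_squared(60.5, 100.0)

adj_two = adjusted_r_squared(r2_two, n, k=2)
adj_three = adjusted_r_squared(r2_three, n, k=3)
print(round(adj_two, 4), round(adj_three, 4))
```

Here r^2 goes up when the third variable is added, but adjusted r^2 goes down: the small gain in explanatory power does not offset the lost degree of freedom.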

Adjusted r^2
[Excel regression output for the pie sales example.]
The Adjusted R Square value, expressed as a percentage, is the share of the variation in pie sales that is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Is the Model Significant?
F Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the X variables considered together and Y. Use the F-test statistic.
Hypotheses:
H_0: \( \beta_1 = \beta_2 = \cdots = \beta_k = 0 \) (no linear relationship)
H_1: at least one \( \beta_i \neq 0 \) (at least one independent variable affects Y)

F Test for Overall Significance
Test statistic:
\[ F_{STAT} = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)} \]
where F_STAT has numerator d.f. = k and denominator d.f. = (n - k - 1).
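As a rough sketch of the calculation, the function below computes F_STAT from the sums of squares. The SSR and SSE values are hypothetical. The critical value 3.885 is the tabulated upper 5% point of the F distribution with 2 and 12 degrees of freedom, entered by hand because the Python standard library has no F distribution.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic: mean square regression over mean square error."""
    msr = ssr / k              # numerator d.f. = k
    mse = sse / (n - k - 1)    # denominator d.f. = n - k - 1
    return msr / mse


# Hypothetical sums of squares for a model with n = 15 and k = 2.
f_stat = f_statistic(ssr=60.0, sse=40.0, n=15, k=2)

# Tabulated critical value F(0.05; 2, 12), taken from an F table.
critical_value = 3.885
reject_h0 = f_stat > critical_value
print(round(f_stat, 3), reject_h0)
```

In practice the p-value reported as Significance F gives the same decision: reject H_0 when it is below the chosen significance level.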

F Test for Overall Significance (continued)
[Excel regression output for the pie sales example.]
The F value in the ANOVA table is F_STAT = MSR / MSE, with 2 and 12 degrees of freedom. Significance F is the p-value for the F test.

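F Test for Overall Significance (continued)
H_0: \( \beta_1 = \beta_2 = 0 \)
H_1: \( \beta_1 \) and \( \beta_2 \) not both zero
\( \alpha = .05 \), df_1 = 2, df_2 = 12
Critical value: F_{0.05} = 3.885
Test statistic: F_STAT = MSR / MSE
Decision: Since the F_STAT test statistic is in the rejection region (p-value < .05), reject H_0.
Conclusion: There is evidence that at least one independent variable affects Y.
[Figure: F distribution with the rejection region of area \( \alpha = .05 \) to the right of the critical value F_{0.05}; do not reject H_0 to the left, reject H_0 to the right.]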

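Residuals in Multiple Regression
Two-variable model: \( \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \)
Residual: \( e_i = Y_i - \hat{Y}_i \)
The best-fit equation is found by minimizing the sum of squared errors, \( \sum e^2 \).
[Figure: a sample observation Y_i at (x_{1i}, x_{2i}) and its fitted value \( \hat{Y}_i \), drawn on Y, X_1, and X_2 axes; the vertical gap between them is the residual.]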

Multiple Regression Assumptions
Errors (residuals) from the regression model: \( e_i = Y_i - \hat{Y}_i \)
Assumptions:
The errors are normally distributed.
The errors have a constant variance.
The model errors are independent.

Residual Plots Used in Multiple Regression
These residual plots are used in multiple regression:
Residuals vs. \( \hat{Y}_i \)
Residuals vs. X_{1i}
Residuals vs. X_{2i}
Residuals vs. time (if time series data)
Use the residual plots to check for violations of regression assumptions.

Are Individual Variables Significant?
Use t tests of individual variable slopes. Each test shows if there is a linear relationship between the variable X_j and Y, holding constant the effects of the other X variables.
Hypotheses:
H_0: \( \beta_j = 0 \) (no linear relationship)
H_1: \( \beta_j \neq 0 \) (linear relationship does exist between X_j and Y)

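Are Individual Variables Significant? (continued)
H_0: \( \beta_j = 0 \) (no linear relationship)
H_1: \( \beta_j \neq 0 \) (linear relationship does exist between X_j and Y)
Test statistic:
\[ t_{STAT} = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1) \]
where \( S_{b_j} \) is the standard error of the slope coefficient b_j.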

Are Individual Variables Significant? (continued)
[Excel regression output for the pie sales example.]
t Stat for Price is t_STAT = -2.306, with p-value .0398.
t Stat for Advertising is t_STAT = 2.855, with p-value .0145.

Inferences about the Slope: t Test Example
H_0: \( \beta_j = 0 \)
H_1: \( \beta_j \neq 0 \)
d.f. = 15 - 2 - 1 = 12, \( \alpha = .05 \), \( t_{\alpha/2} = 2.1788 \)
For Price, t_STAT = -2.306, with p-value .0398. For Advertising, t_STAT = 2.855, with p-value .0145.
Decision: Reject H_0 for each variable. The test statistic for each variable falls in the rejection region (p-values < .05).
Conclusion: There is evidence that both Price and Advertising affect pie sales at \( \alpha = .05 \).
[Figure: t distribution with rejection regions of area \( \alpha/2 = .025 \) in each tail, beyond \( -t_{\alpha/2} \) and \( t_{\alpha/2} \); do not reject H_0 between them.]

Confidence Interval Estimate for the Slope
Confidence interval for the population slope \( \beta_j \):
\[ b_j \pm t_{\alpha/2} \, S_{b_j} \]
where t has (n - k - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.
Example: Form a 95% confidence interval for the effect of changes in price (X_1) on pie sales, using the Price coefficient and its standard error from the Excel coefficient table:
-24.975 ± (2.1788)(10.832)
So the interval is (-48.576, -1.374).
(This interval does not contain zero, so price has a significant effect on sales.)

Confidence Interval Estimate for the Slope (continued)
Confidence interval for the population slope \( \beta_j \)
Example: Excel output also reports these interval endpoints in the Lower 95% and Upper 95% columns of the coefficient table.
[Excel coefficient table: Coefficients, Standard Error, …, Lower 95%, Upper 95% for Intercept, Price, and Advertising.]
Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price, holding the effect of advertising constant.
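The t statistic and the interval for the Price slope can be checked with a few lines of arithmetic. In this sketch the standard error 10.832 and the critical value 2.1788 come from the slides. The coefficient -24.975 is an assumption, back-solved from the reported interval endpoint of 1.37 because the coefficient itself did not survive in the transcript.

```python
def slope_t_statistic(b_j, se_b_j):
    """t statistic for H0: beta_j = 0."""
    return (b_j - 0.0) / se_b_j


def slope_confidence_interval(b_j, se_b_j, t_critical):
    """Confidence interval b_j +/- t_critical * S_bj for the population slope."""
    margin = t_critical * se_b_j
    return (b_j - margin, b_j + margin)


b_price = -24.975   # assumption: inferred from the interval endpoints
se_price = 10.832   # standard error of the Price slope (from the slide)
t_critical = 2.1788 # t value for 95% confidence with 12 d.f. (from the slide)

t_stat = slope_t_statistic(b_price, se_price)
lower, upper = slope_confidence_interval(b_price, se_price, t_critical)
print(round(t_stat, 3), round(lower, 3), round(upper, 3))

# The two checks agree: |t| exceeds the critical value exactly when
# the interval excludes zero.
significant = abs(t_stat) > t_critical
excludes_zero = lower > 0 or upper < 0
```

Both checks lead to the same decision at the 5% level, which is why the slide can read significance straight off the interval.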

Using Dummy Variables
A dummy variable is a categorical independent variable with two levels (yes or no, on or off, male or female), coded as 0 or 1.
Using one assumes that the slopes associated with the numerical independent variables do not change with the value of the categorical variable.
If there are more than two levels, the number of dummy variables needed is (number of levels - 1).

Dummy-Variable Example (with 2 Levels)
Let:
Y = pie sales
X_1 = price
X_2 = holiday (X_2 = 1 if a holiday occurred during the week; X_2 = 0 if there was no holiday that week)

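Dummy-Variable Example (with 2 Levels) (continued)
Holiday (X_2 = 1): \( \hat{Y} = b_0 + b_1 X_1 + b_2 (1) = (b_0 + b_2) + b_1 X_1 \)
No holiday (X_2 = 0): \( \hat{Y} = b_0 + b_1 X_1 + b_2 (0) = b_0 + b_1 X_1 \)
The two lines have different intercepts (b_0 + b_2 versus b_0) but the same slope.
If H_0: \( \beta_2 = 0 \) is rejected, then "Holiday" has a significant effect on pie sales.
[Figure: two parallel lines of Y (sales) against X_1 (price), the Holiday line with intercept b_0 + b_2 and the No Holiday line with intercept b_0.]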

Interpreting the Dummy Variable Coefficient (with 2 Levels)
Example: Sales = b_0 + b_1 (Price) + b_2 (Holiday)
Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
b_2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.
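A quick sketch shows why the dummy coefficient is a pure intercept shift. The value b_2 = 15 is from the slide. The intercept and price slope used here are hypothetical placeholders, and `predict_weekly_sales` is an illustrative name.

```python
def predict_weekly_sales(price, holiday, b0, b1, b2):
    """Predicted pies per week; holiday is the 0/1 dummy variable."""
    if holiday not in (0, 1):
        raise ValueError("holiday must be coded 0 or 1")
    return b0 + b1 * price + b2 * holiday


b0, b1 = 300.0, -30.0  # hypothetical intercept and price slope
b2 = 15.0              # holiday coefficient from the slide

# The gap between holiday and non-holiday weeks is b2 at every price,
# because both lines share the same slope b1.
gaps = [
    predict_weekly_sales(price, 1, b0, b1, b2)
    - predict_weekly_sales(price, 0, b0, b1, b2)
    for price in (4.0, 5.5, 7.0)
]
print(gaps)
```

If the effect of a holiday were expected to depend on price, the equal-slope assumption would not hold and this simple dummy coding would be too restrictive.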

Dummy-Variable Models (more than 2 Levels)
The number of dummy variables is one less than the number of levels.
Example: Y = house price; X_1 = square feet
If the style of the house is also thought to matter: Style = ranch, split level, colonial
There are three levels, so two dummy variables are needed.

Dummy-Variable Models (more than 2 Levels) (continued)
Example: Let "colonial" be the default category, and let X_2 and X_3 be used for the other two categories:
Y = house price
X_1 = square feet
X_2 = 1 if ranch, 0 otherwise
X_3 = 1 if split level, 0 otherwise
The multiple regression equation is:
\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 \]

Interpreting the Dummy Variable Coefficients (with 3 Levels)
Consider the regression equation \( \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 \), with house price in thousands of dollars:
For a colonial: X_2 = X_3 = 0, so \( \hat{Y} = b_0 + b_1 X_1 \)
For a ranch: X_2 = 1; X_3 = 0, so \( \hat{Y} = b_0 + b_1 X_1 + b_2 \)
For a split level: X_2 = 0; X_3 = 1, so \( \hat{Y} = b_0 + b_1 X_1 + b_3 \)
With the same square feet, a ranch will have an estimated average price b_2 thousand dollars more than a colonial.
With the same square feet, a split-level will have an estimated average price b_3 thousand dollars more than a colonial.
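The coding scheme can be written out as a small sketch. It follows the slide in treating colonial as the default category. The function names and all coefficient values are hypothetical, since the fitted equation is not preserved in the transcript.

```python
STYLES = ("colonial", "ranch", "split level")


def encode_style(style):
    """Return the dummy pair (X2, X3); colonial is the default (0, 0)."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    x2 = 1 if style == "ranch" else 0
    x3 = 1 if style == "split level" else 0
    return x2, x3


def predict_price(square_feet, style, b0, b1, b2, b3):
    """Estimated house price (in thousands of dollars)."""
    x2, x3 = encode_style(style)
    return b0 + b1 * square_feet + b2 * x2 + b3 * x3


# Hypothetical coefficients, assumed for illustration only.
coefs = dict(b0=20.0, b1=0.05, b2=25.0, b3=20.0)
colonial = predict_price(2000, "colonial", **coefs)
ranch = predict_price(2000, "ranch", **coefs)
split_level = predict_price(2000, "split level", **coefs)
print(colonial, ranch - colonial, split_level - colonial)
```

Three levels need only two dummy variables because the default category is already captured by the intercept; adding a third dummy would make the predictors perfectly collinear with it.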

Logistic Regression
Used when the dependent variable Y is binary (i.e., Y takes on only two values).
Examples:
Customer prefers Brand A or Brand B
Employee chooses to work full-time or part-time
Loan is delinquent or is not delinquent
Person voted in last election or did not
Logistic regression allows you to predict the probability of a particular categorical response.

Logistic Regression (continued)
Logistic regression is based on the odds ratio, which represents the probability of a success compared with the probability of failure:
\[ \text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}} \]
The logistic regression model is based on the natural log of this odds ratio.

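Logistic Regression (continued)
Logistic regression model:
\[ \ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i \]
where k = number of independent variables in the model and \( \varepsilon_i \) = random error in observation i.
Logistic regression equation:
\[ \ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} \]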

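Estimated Odds Ratio and Probability of Success
Once you have the logistic regression equation, compute the estimated odds ratio:
\[ \text{estimated odds ratio} = e^{\ln(\text{estimated odds ratio})} \]
The estimated probability of success is
\[ \text{estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}} \]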
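These two conversions are short enough to sketch directly. The example assumes the usual form of the logistic regression equation, in which the natural log of the estimated odds ratio is linear in the X variables. The coefficients and the input value are hypothetical, chosen so that the log odds come out to exactly zero.

```python
import math


def estimated_odds_ratio(coefficients, x_values):
    """Exponentiate the fitted log odds b0 + b1*X1 + ... + bk*Xk."""
    b0, slopes = coefficients[0], coefficients[1:]
    if len(slopes) != len(x_values):
        raise ValueError("need one X value per slope coefficient")
    log_odds = b0 + sum(b * x for b, x in zip(slopes, x_values))
    return math.exp(log_odds)


def probability_of_success(odds_ratio):
    """Convert an odds ratio into a probability between 0 and 1."""
    return odds_ratio / (1.0 + odds_ratio)


# Hypothetical fitted equation: ln(odds) = -1.0 + 0.5 * X1
coefficients = [-1.0, 0.5]
odds = estimated_odds_ratio(coefficients, [2.0])  # log odds = 0
probability = probability_of_success(odds)
print(odds, probability)
```

An odds ratio of 1 means success and failure are equally likely, so the estimated probability is one half; larger odds push the probability toward 1 without ever reaching it.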