Chapter 20 Linear and Multiple Regression


Empirical Models
- Study of the relationship between two or more variables
- Response variable (dependent, output)
- Predictor or explanatory variables (independent, input)
- Deterministic relationship: the outcomes can be predicted precisely (physics, chemistry, etc.)
- Regression analysis: statistical tools used to model and explore relationships between variables

Regression Analysis
- Simple regression models: one explanatory variable (linear or non-linear)
- Multiple regression models: two or more explanatory variables

Simple Linear Regression Model
Population regression line: y = A + Bx + ε
- y = response variable
- A = y-intercept (population parameter)
- B = slope (population parameter)
- x = explanatory variable
- ε = random error (missing or omitted variables, random variation)
Estimated regression equation: ŷ = a + bx
- ŷ = estimated value of y for a given x

Scatterplots and Least Squares Line
- ε (residual): difference between the actual value y and the predicted value of y for population data
- e: error for the estimated equation, e = y − ŷ
- Sum of squared errors: SSE = Σe²

Scatterplots and Least Squares Line
- The least squares method finds a and b that minimize SSE
- a and b are called the least squares estimates of A and B
- Excel: SLOPE(y, x), INTERCEPT(y, x), FORECAST(x, known_ys, known_xs)

Scatterplots and Least Squares Line – Example 20.1

  x (Income)  y (Food Exp)  (x-x̄)²     (y-ȳ)²    (x-x̄)(y-ȳ)
  63          16            73.4694    0.0204    -1.2245
  88          25            1127.0408  78.4490   297.3469
  38          13            269.8980   9.8776    51.6327
  70          19            242.4694   8.1633    44.4898
  27          9             752.3265   51.0204   195.9184
  51          15            11.7551    1.3061    3.9184
  44          16            108.7551   0.0204    1.4898

  Mean: x̄ = 54.4286, ȳ = 16.1429
  Std Dev: 20.7594, 4.9809
  Sum: SSxx = 2585.7143, SSyy = 148.8571, SSxy = 593.5714
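The least squares estimates can be verified numerically. Below is a minimal Python sketch (the slides themselves use Excel and Minitab, not Python) that computes b = SSxy/SSxx and a = ȳ − b·x̄ from the Example 20.1 data; the seventh food-expenditure value (16) is inferred from the residual table of Example 20.2.

```python
# Least squares estimates for Example 20.1 (income x, food expenditure y).
# The seventh y value (16) is inferred from the residual table of Example 20.2.
x = [63, 88, 38, 70, 27, 51, 44]
y = [16, 25, 13, 19, 9, 15, 16]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = ss_xy / ss_xx          # slope estimate
a = y_bar - b * x_bar      # intercept estimate

print(round(b, 5), round(a, 3))  # 0.22956 3.648, matching the Minitab output
```

These values agree with the regression equation y = 3.65 + 0.230x reported on the next slides.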

Scatterplots and Least Squares Line – Example 20.1
Minitab: Graph > Scatterplot

Scatterplots and Least Squares Line – Example 20.1
Minitab: Stat > Regression

Regression Analysis: y versus x
The regression equation is y = 3.65 + 0.230 x

Predictor   Coef     SE Coef  T     P
Constant    3.648    1.802    2.02  0.099
x           0.22956  0.03122  7.35  0.001

S = 1.58733  R-Sq = 91.5%  R-Sq(adj) = 89.8%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      1   136.26  136.26  54.08  0.001
Residual Error  5   12.60   2.52
Total           6   148.86

Interpretations of a and b
- a: the intercept on the y axis at x = 0; caution on extrapolation beyond the range of the data
- b: the slope, the change in y due to an increase of one unit in x
- Positive relationship when b > 0; negative relationship when b < 0

Assumptions of the Regression Model
For y = A + Bx + ε:
- The random error ε has a mean equal to zero, so μ_y|x = A + Bx
- The errors associated with different observations are independent
- For any given x, the distribution of errors is normal
- The distribution of population errors for each x has the same standard deviation, σ

Standard Deviation of Random Errors
For the population, y = A + Bx + ε, and σ is the standard deviation of all errors ε. Since σ is unknown, it is estimated by the standard deviation of the errors for the sample data: se = sqrt(SSE / (n − 2)).
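As a sketch of this calculation (again in Python rather than the slides' Excel/Minitab, and assuming the Example 20.1 data with the seventh y value inferred as 16), se can be computed directly from the residuals:

```python
import math

# Standard deviation of errors se = sqrt(SSE / (n - 2)) for Example 20.2.
x = [63, 88, 38, 70, 27, 51, 44]
y = [16, 25, 13, 19, 9, 15, 16]   # last value inferred from the residual table
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
se = math.sqrt(sse / (n - 2))

print(round(sse, 4), round(se, 5))  # 12.5981 1.58733, matching S in the output
```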

Standard Deviation of Errors – Example 20.2

  Income  Food Exp  ŷ        e        e²
  63      16        18.1105  -2.1105  4.4542
  88      25        23.8494  1.1506   1.3238
  38      13        12.3715  0.6285   0.3950
  70      19        19.7174  -0.7174  0.5147
  27      9         9.8464   -0.8464  0.7164
  51      15        15.3558  -0.3558  0.1266
  44      16        13.7489  2.2511   5.0675

  Sum:              Σe = 0.0000, Σe² = 12.5981 (= SSE)

Standard Deviation of Errors – Example 20.2
Same Minitab output as in Example 20.1: se = S = 1.58733, with SSE = 12.60, MSE = 2.52, and DF = n − 2 = 5 in the Residual Error row of the Analysis of Variance table.

Coefficient of Determination and Correlation
- Measures how well the explanatory variable explains the response variable in the regression model
- Total sum of squares: SST
- Regression sum of squares: SSR
- SST = SSR + SSE
- Coefficient of determination: R² = SSR / SST (ρ² for population data), 0 ≤ R² ≤ 1
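The decomposition SST = SSR + SSE and the resulting R² can be sketched in Python (illustrative only; the seventh y value is inferred as 16 from the residual table):

```python
# Coefficient of determination R² = SSR / SST for the Example 20.1 data.
x = [63, 88, 38, 70, 27, 51, 44]
y = [16, 25, 13, 19, 9, 15, 16]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # error sum of squares
ssr = sst - sse                                       # regression sum of squares
r_sq = ssr / sst

print(round(r_sq, 3))  # 0.915, i.e. R-Sq = 91.5%
```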

Coefficient of Determination – Example 20.3

  Income  Food Exp  ŷ        (y-ŷ)² = e²  (y-ȳ)²   (ŷ-ȳ)²
  63      16        18.1105  4.4542       0.0204   3.8716
  88      25        23.8494  1.3238       78.4490  59.3915
  38      13        12.3715  0.3950       9.8776   14.2228
  70      19        19.7174  0.5147       8.1633   12.7774
  27      9         9.8464   0.7164       51.0204  39.6453
  51      15        15.3558  0.1266       1.3061   0.6195
  44      16        13.7489  5.0675       0.0204   5.7311

  ȳ = 16.1429; Sum: SSE = 12.5981, SST = 148.8571, SSR = 136.2591

Coefficient of Determination – Example 20.3
Same Minitab output as in Example 20.1: R² = R-Sq = 91.5%, with SSR = 136.26 and SST = 148.86 in the Analysis of Variance table.

Correlation
- Pearson product-moment correlation coefficient: measures the strength of the linear association between two variables
- Correlation coefficient: ρ for population data, r for sample data
- −1 ≤ ρ ≤ 1 and −1 ≤ r ≤ 1
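For the simple linear case, r = SSxy / √(SSxx · SSyy), and r² equals the coefficient of determination. A minimal sketch using the Example 20.1/20.4 data (seventh y value inferred as 16):

```python
import math

# Pearson correlation coefficient for Example 20.4.
x = [63, 88, 38, 70, 27, 51, 44]
y = [16, 25, 13, 19, 9, 15, 16]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 3), round(r ** 2, 3))  # 0.957 0.915 -> r² equals R²
```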

Correlation – Example 20.4
Same data as Example 20.1, with SSxx = 2585.7143, SSyy = 148.8571, and SSxy = 593.5714, giving r = SSxy / √(SSxx · SSyy) = 0.957.

Multiple Regression Model
Population regression line: y = A + B₁x₁ + B₂x₂ + … + B_k x_k + ε
- y = response variable
- A = constant term (population parameter)
- B's = regression coefficients of the x's (population parameters)
- x's = explanatory variables
- ε = random error (missing or omitted variables, random variation)
Estimated regression equation: ŷ = a + b₁x₁ + … + b_k x_k
- ŷ = estimated value of y for given x's

Least Squares Line
- ε (residual): difference between the actual value y and the predicted value of y for population data
- e: error for the estimated equation
- Sum of squared errors: SSE = Σe²
- The regression equation is obtained by minimizing SSE
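Minimizing SSE with several explanatory variables leads to the normal equations (XᵀX)b = Xᵀy. Below is a self-contained Python sketch of this idea using entirely hypothetical data (y = 2 + 3·x1 + 0.5·x2 exactly, no error term), so the fit should recover the coefficients; this is an illustration of the method, not the slides' Minitab workflow:

```python
# Sketch: least squares fit for a multiple regression model by solving the
# normal equations (XᵀX)b = Xᵀy with Gaussian elimination. Hypothetical data.

def fit_multiple(rows, y):
    """rows: list of (x1, ..., xk) tuples; returns [a, b1, ..., bk]."""
    X = [(1.0,) + tuple(r) for r in rows]        # prepend intercept column
    p = len(X[0])
    # Normal equations: A = XᵀX, rhs = Xᵀy
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (rhs[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]          # hypothetical (x1, x2)
y = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in rows]           # exact linear response
coef = fit_multiple(rows, y)
print([round(c, 6) for c in coef])  # [2.0, 3.0, 0.5]
```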

Assumptions of the Multiple Regression Model
- The random error ε has a mean equal to zero
- The errors associated with different observations are independent
- The distribution of errors is normal
- The distribution of population errors for each x has the same standard deviation, σ
- The explanatory variables are not linearly related
- There is zero correlation between the random error ε and each explanatory variable x_i

Standard Deviation of Random Errors
For the population, σ is the standard deviation of all errors ε. Since σ is unknown, it is estimated by the standard deviation of the errors for the sample data: se = sqrt(SSE / (n − k − 1)).

Coefficient of Multiple Determination
- Total sum of squares: SST
- Sum of squared errors: SSE
- Regression sum of squares: SSR
- SST = SSR + SSE
- Coefficient of multiple determination: R² = SSR / SST, 0 ≤ R² ≤ 1

Adjusted Coefficient of Multiple Determination and Correlation
Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)
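The adjusted R² penalizes additional predictors. A small sketch, checked against the Example 20.1 output (R-Sq = 91.5%, R-Sq(adj) = 89.8%, n = 7 observations, k = 1 predictor):

```python
# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1).
def adjusted_r_sq(r_sq, n, k):
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

r_sq = 0.915368  # R² from Example 20.3 before rounding to 91.5%
print(round(adjusted_r_sq(r_sq, n=7, k=1), 3))  # 0.898, i.e. 89.8%
```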

Coefficient of Determination – Example 20.3
Same Minitab output as in Example 20.1: R-Sq(adj) = 89.8% (with R-Sq = 91.5%, n = 7, k = 1).

Multiple Regression Model - Example Period Promotion Demand 1 10 37 2 12 40 3 41 4 5 45 6 14 50 7 43 8 47 9 56 15 52 11 55 16 54

Multiple Regression Analysis - Example

Regression Analysis: Demand versus Period, Promotion
The regression equation is Demand = 32.5 + 1.61 Period + 0.268 Promotion

Predictor   Coef    SE Coef  T     P
Constant    32.530  9.051    3.59  0.006
Period      1.6077  0.4748   3.39  0.008
Promotion   0.2678  0.8796   0.30  0.768

S = 3.38207  R-Sq = 80.5%  R-Sq(adj) = 76.2%

Multiple Regression Analysis - Example

Analysis of Variance
Source          DF  SS      MS      F      P
Regression      2   425.97  212.99  18.62  0.001
Residual Error  9   102.95  11.44
Total           11  528.92

Source     DF  Seq SS
Period     1   424.91
Promotion  1   1.06

Unusual Observations
Obs  Period  Demand  Fit     SE Fit  Residual  St Resid
9    9.0     56.000  50.213  2.073   5.787     2.17R
R denotes an observation with a large standardized residual.

Multiple Regression Analysis
- Test of overall significance on the set of regression coefficients B₁, B₂, …, B_k
- Test on an individual regression coefficient, B_i
- Develop a confidence interval for an individual regression coefficient, B_i

Test of Overall Significance of Multiple Regression Model
Null hypothesis: H0: B₁ = B₂ = … = B_k = 0
Alt. hypothesis: H1: at least one B_i ≠ 0
Test statistic: F0 = MSR / MSE, with degrees of freedom k and n − k − 1
P-value: P(F > F0); rejection criterion: F0 > F_α,k,n−k−1
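As a sketch, the F statistic can be computed from the ANOVA sums of squares; checked here against the Example 20.1 table (SSR = 136.26, SSE = 12.60, n = 7, k = 1):

```python
# Overall significance test statistic F0 = MSR / MSE.
def f_statistic(ssr, sse, n, k):
    msr = ssr / k             # mean square regression
    mse = sse / (n - k - 1)   # mean square error
    return msr / mse

f0 = f_statistic(ssr=136.2591, sse=12.5981, n=7, k=1)
print(round(f0, 2))  # 54.08, matching the Minitab F column
```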

Test of Overall Significance of Simple Regression Model – Example
Same Minitab output as in Example 20.1: F = 54.08 with DF = (1, 5) and P = 0.001, so the regression is significant.

Sampling Distribution of b
b is normally distributed with mean B and standard deviation σ_b = σ / √SSxx, which is estimated by sb = se / √SSxx.

Test of Overall Significance of Simple Regression Model – Example
Same Minitab output as in Example 20.1: b = Coef = 0.22956 and sb = SE Coef = 0.03122 for x.

Test on an Individual Regression Coefficient
Null hypothesis: H0: B_i = B_i0
Test statistic: t0 = (b_i − B_i0) / s_b_i, with degrees of freedom n − k − 1

  Alt. hypothesis    P-value          Rejection criterion
  H1: B_i ≠ B_i0     2·P(t > |t0|)    t0 > t_α/2,n−k−1 or t0 < −t_α/2,n−k−1
  H1: B_i > B_i0     P(t > t0)        t0 > t_α,n−k−1
  H1: B_i < B_i0     P(t < t0)        t0 < −t_α,n−k−1
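A minimal sketch of the test statistic, checked against Example 20.1 (b = 0.22956, sb = 0.03122, H0: B = 0):

```python
# Individual-coefficient test statistic t0 = (b - B0) / s_b.
def t_statistic(b, s_b, b0=0.0):
    return (b - b0) / s_b

t0 = t_statistic(b=0.22956, s_b=0.03122)
print(round(t0, 2))  # 7.35, matching the Minitab T column for x
```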

Test of Overall Significance of Simple Regression Model – Example
Same Minitab output as in Example 20.1: t0 = T = 7.35 for x, with P = 0.001.

Develop a Confidence Interval for an Individual Regression Coefficient
The (1 − α) confidence interval is b_i ± t_α/2,n−k−1 · s_b_i.
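A sketch for the slope in Example 20.1; the critical value t(.025, 5) = 2.571 is taken from a standard t table (95% interval, df = n − 2 = 5):

```python
# 95% confidence interval b ± t(α/2, n-k-1) * s_b for the Example 20.1 slope.
b, s_b = 0.22956, 0.03122
t_crit = 2.571                      # t(.025, 5) from a t table
lo, hi = b - t_crit * s_b, b + t_crit * s_b
print(f"({lo:.3f}, {hi:.3f})")      # (0.149, 0.310)
```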

Scatterplots and Least Squares Line – Example 20.5

  Exp.  Premium  (x-x̄)²    (y-ȳ)²     (x-x̄)(y-ȳ)
  5     92       39.0625   30.2500    -34.3750
  2     127      85.5625   1640.2500  -374.6250
  12    73       0.5625    182.2500   -10.1250
  9     104      5.0625    306.2500   -39.3750
  15    65       14.0625   462.2500   -80.6250
  6     82       27.5625   20.2500    23.6250
  25    62       189.0625  600.2500   -336.8750
  16    87       22.5625   0.2500     2.3750

  Mean: x̄ = 11.25, ȳ = 86.5
  Std Dev: 7.4017, 21.5208
  Sum: SSxx = 383.5000, SSyy = 3242.0000, SSxy = -850.0000

Standard Deviation of Errors – Example 20.5

  Exp.  Premium  ŷ         e         e²
  5     92       100.3527  -8.3527   69.7671
  2     127      107.0020  19.9980   399.9218
  12    73       84.8377   -11.8377  140.1307
  9     104      91.4870   12.5130   156.5761
  15    65       78.1884   -13.1884  173.9338
  6     82       98.1362   -16.1362  260.3784
  25    62       56.0241   5.9759    35.7111
  16    87       75.9720   11.0280   121.6175

  Sum:           Σe = 0.0000, Σe² = 1358.0365 (= SSE)

Coefficient of Determination – Example 20.5

  Exp.  Premium  ŷ         (y-ŷ)² = e²  (y-ȳ)²     (ŷ-ȳ)²
  5     92       100.3527  69.7671      30.25      191.8965
  2     127      107.0020  399.9218     1640.25    420.3302
  12    73       84.8377   140.1307     182.25     2.7633
  9     104      91.4870   156.5761     306.25     24.8698
  15    65       78.1884   173.9338     462.25     69.0828
  6     82       98.1362   260.3784     20.25      135.4022
  25    62       56.0241   35.7111      600.25     928.7793
  16    87       75.9720   121.6175     0.25       110.8394

  ȳ = 86.5; Sum: SSE = 1358.037, SST = 3242.00, SSR = 1883.9635

Correlation – Example 20.5
Same data as the Example 20.5 table above, with SSxx = 383.5000, SSyy = 3242.0000, and SSxy = −850.0000, giving r = SSxy / √(SSxx · SSyy) = −0.762 (so r² = 0.581, matching R-Sq = 58.1%).

Test on an Individual Regression Coefficient – Example 20.5
Null hypothesis: H0: B = 0; Alt. hypothesis: H1: B < 0, α = .05
Test statistic: t0 = b / sb = −2.2164 / 0.7682 = −2.89, with degrees of freedom n − 2 = 6
Critical value: −t.05,6 = −1.9432; p-value = .0139
Since t0 = −2.89 < −1.9432, reject H0
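The same one-sided test, sketched in Python from the coefficient and standard error reported in the Example 20.5 Minitab output:

```python
# One-sided test H0: B = 0 vs H1: B < 0 for Example 20.5.
# b = -2.2164, s_b = 0.7682, df = n - 2 = 6; critical value -t(.05, 6) = -1.9432.
b, s_b = -2.2164, 0.7682
t0 = b / s_b
reject = t0 < -1.9432
print(round(t0, 2), reject)  # -2.89 True -> reject H0
```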

Confidence Interval for an Individual Regression Coefficient – Example 20.5
With α = 10%: b ± t.05,6 · sb = −2.2164 ± (1.9432)(0.7682) = −2.2164 ± 1.4928, i.e. approximately (−3.709, −0.724)

Scatterplots and Least Squares Line –Example 20.5

Scatterplots and Least Squares Line – Example 20.5

Regression Analysis: Premium versus Exp.
The regression equation is Premium = 111 - 2.22 Exp.

Predictor  Coef     SE Coef  T      P
Constant   111.43   10.15    10.98  0.000
Exp.       -2.2164  0.7682   -2.89  0.028

S = 15.0446  R-Sq = 58.1%  R-Sq(adj) = 51.1%

Analysis of Variance
Source          DF  SS      MS      F     P
Regression      1   1884.0  1884.0  8.32  0.028
Residual Error  6   1358.0  226.3
Total           7   3242.0

Residual Analysis
From Minitab:
- Histogram of the residuals
- Normal probability plot of the residuals
- Residuals versus fitted values
- Residuals versus order of data
- Residuals versus predictors

Cautions in Using Regression
- Determining whether a model is good or bad: R² and the correlation coefficient are not enough
- Watch for outliers and influential observations
- Avoid multicollinearity
- Take extra precaution with extrapolation