Multiple Regression
Learning Objectives
• Explain the Linear Multiple Regression Model
• Interpret Linear Multiple Regression Computer Output
• Test for Overall Significance
• Explain Multicollinearity
• Describe the Types of Multiple Regression Models
Multiple Regression Models
(Tree diagram with Linear and Non-Linear branches; model types shown: Linear, Dummy Variable, Interaction, Polynomial, Logit, Square Root, Log, Reciprocal, Exponential)
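Linear Multiple Regression Model
• Relationship Between 1 Dependent & 2 or More Independent Variables Is a Linear Function
  $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
  – $Y$: Dependent (Response) Variable
  – $X_1, \ldots, X_k$: Independent (Explanatory) Variables
  – $\alpha$: Population Y-Intercept
  – $\beta_1, \ldots, \beta_k$: Population Slopes
  – $\varepsilon$: Random Error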
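Population Bivariate Linear Multiple Regression Model
(Figure: response plane over the $(X_1, X_2)$ plane, with Y on the vertical axis)
  Observed Y: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
  Response Plane: $\mu_{YX} = \alpha + \beta_1 X_1 + \beta_2 X_2$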
Sample Bivariate Linear Multiple Regression Model
(Figure: response plane over the $(X_1, X_2)$ plane, with Y on the vertical axis)
  Observed Y: $Y = a + b_1 X_1 + b_2 X_2 + e$
  Response Plane: $\hat{Y} = a + b_1 X_1 + b_2 X_2$
Regression Modeling Steps
1. Define Problem or Question
2. Specify Model
3. Collect Data
4. Do Descriptive Data Analysis
5. Estimate Unknown Parameters
6. Evaluate Model
7. Use Model for Prediction
Multiple Linear Regression Coefficient Equations Too Complicated By Hand! Ouch!
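Because the coefficient equations are impractical by hand, the estimates are normally left to software. The following is a minimal sketch of least squares via the normal equations, using only the Python standard library. The helper names (`solve_linear_system`, `fit_least_squares`) are hypothetical, and the six data rows are illustrative assumptions rather than values taken from the slides; they were chosen so that the fitted slopes agree with the .2049 and .2805 quoted in this deck.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_least_squares(y, *xs):
    """Return [a, b1, ..., bk] minimizing the sum of squared residuals."""
    rows = [[1.0, *vals] for vals in zip(*xs)]  # leading 1 carries the intercept
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Illustrative values (assumed, not taken from the slide's data table).
resp = [1, 4, 1, 3, 2, 4]    # ad responses (00)
size = [1, 8, 3, 5, 6, 10]   # ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # circulation (000)

a, b1, b2 = fit_least_squares(resp, size, circ)
print(f"a = {a:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

In practice a statistics package would be used instead; the point of the sketch is only that the estimates come from solving a small linear system.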
Parameter Estimation Example
• You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You’ve collected the following data:
  (data table with columns Resp, Size, Circ)
Parameter Estimation Computer Output
(Output listing; callouts mark the estimates $a$, $b_1$, and $b_2$)
Interpretation of Coefficients Solution
• Slope ($b_1$ = .2049)
  – For Each 1 Sq. In. Increase in Ad Size, the # Responses to Ad Is Expected to Increase by .2049 (Hundreds), i.e. About 20.49 Responses, Holding Circulation Constant
• Slope ($b_2$ = .2805)
  – For Each 1,000 Paper (1 Unit) Increase in Circulation, the # Responses to Ad Is Expected to Increase by .2805 (Hundreds), i.e. About 28.05 Responses, Holding Ad Size Constant
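The "holding the other variable constant" reading can be checked numerically. This is a small sketch using the two slopes quoted on the slide; the function name `predicted_responses` and its zero default intercept are placeholders, since the difference between two predictions does not depend on the intercept.

```python
B1_AD_SIZE = 0.2049      # slope from the slide: responses (00) per sq. in.
B2_CIRCULATION = 0.2805  # slope from the slide: responses (00) per 1,000 papers


def predicted_responses(size, circ, intercept=0.0):
    """Predicted responses in hundreds; the intercept is a placeholder."""
    return intercept + B1_AD_SIZE * size + B2_CIRCULATION * circ


# Raise ad size by 1 sq. in. while holding circulation fixed at 5 (000).
change = predicted_responses(7, 5) - predicted_responses(6, 5)
print(round(change, 4))        # change in hundreds of responses
print(round(change * 100, 2))  # same change expressed in responses
```

The change equals $b_1$ whatever circulation level is held fixed, which is what "net" slope means here.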
Evaluating the Model
• How Well Does the Model Describe the Relationship Between the Variables?
• Closeness of ‘Best Fit’
• Assumptions Met
• Significance of Parameter Estimates
• Correlation Between X Variables
• Outliers (Unusual Observations)
Evaluating Multiple Regression Model Steps
• Examine Variation Measures
• Do Residual Analysis
• Test Parameter Significance
  – Overall Model
  – Individual Coefficients
• Test for Multicollinearity
• Do Influence Analysis
Coefficient of Multiple Determination
• Proportion of Variation in Y ‘Explained’ by All X Variables Taken Together
• $r^2_{Y.12} = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SSR}{SST}$
• Never Decreases When New X Variable Is Added to Model
  – Only Y Values Determine SST
  – Disadvantage When Comparing Models
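As a rough sketch of the SSR / SST ratio, the snippet below computes it from observed and fitted values. The data rows are illustrative assumptions (not from the slides), the slopes are the ones quoted in this deck, and the intercept 0.0640 is an assumed value consistent with those rows. Because the coefficients are rounded, the result is approximate.

```python
def r_squared(y, y_hat):
    """Coefficient of multiple determination: SSR / SST."""
    y_bar = sum(y) / len(y)
    ssr = sum((fit - y_bar) ** 2 for fit in y_hat)  # explained variation
    sst = sum((obs - y_bar) ** 2 for obs in y)      # total variation
    return ssr / sst


# Illustrative values (assumed, not taken from the slide's data table).
resp = [1, 4, 1, 3, 2, 4]    # ad responses (00)
size = [1, 8, 3, 5, 6, 10]   # ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # circulation (000)

# Slopes as quoted in the deck; the intercept is an assumption.
a, b1, b2 = 0.0640, 0.2049, 0.2805
fitted = [a + b1 * s + b2 * c for s, c in zip(size, circ)]

r2 = r_squared(resp, fitted)
print(f"r^2 = {r2:.4f}")
```

Note that SST in `r_squared` uses only the Y values, which is why adding an X variable can change SSR but never SST.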
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y ‘Explained’ by All X Variables Taken Together
• Reflects
  – Sample Size
  – Number of Independent Variables
• Smaller Than $r^2_{Y.12}$
• Used to Compare Models
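One common form of the adjustment, which the slides do not spell out, is $r^2_{adj} = 1 - (1 - r^2)\frac{n - 1}{n - k - 1}$. The sketch below applies it; the function name and all input values are hypothetical, with the first $r^2$ picked so the result lines up with the 95.61% figure reported in this deck for n = 6 and k = 2 (both assumed).

```python
def adjusted_r_squared(r2, n, k):
    """Adjust r^2 for sample size n and number of X variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs: a third X variable nudges r^2 up only slightly.
two_vars = adjusted_r_squared(0.97366, n=6, k=2)
three_vars = adjusted_r_squared(0.9740, n=6, k=3)
print(f"k = 2: {two_vars:.4f}   k = 3: {three_vars:.4f}")
```

Here the unadjusted value rises with the extra variable while the adjusted value falls, which is the reason the adjusted measure is preferred for comparing models.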
Coefficient of Determination Computer Output
• $r^2_{adj}$ = .9561: After Adjusting for Sample Size & Number of X Variables, 95.61% of Variation in Y Is Explained by Ad Size & Circulation
Coefficients of Correlation
Correlation Matrix Computer Output
(Output listing; callouts mark $r_{Y1}$, $r_{Y2}$, $r_{12}$, and the diagonal, which is all 1’s)
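A correlation matrix of this kind can be sketched with a plain Pearson correlation. The helper `pearson_r` and the data rows below are illustrative assumptions, not values recovered from the slide's output.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Illustrative values (assumed, not taken from the slide's data table).
data = {
    "Resp (Y)": [1, 4, 1, 3, 2, 4],
    "Size (X1)": [1, 8, 3, 5, 6, 10],
    "Circ (X2)": [2, 8, 1, 7, 4, 6],
}

names = list(data)
matrix = [[pearson_r(data[r], data[c]) for c in names] for r in names]
for name, row in zip(names, matrix):
    print(f"{name:>10}", "  ".join(f"{value:.4f}" for value in row))
```

The matrix is symmetric with 1's on the diagonal, so only the three off-diagonal values ($r_{Y1}$, $r_{Y2}$, $r_{12}$) carry information.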
Testing Overall Significance
• Tests If There Is a Linear Relationship Between All X Variables Together & Y
• Uses F Test Statistic
• Hypotheses
  – $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No Linear Relationship)
  – $H_1$: At Least One Coefficient Is Not 0 (At Least One X Variable Affects Y)
Analysis of Variance
  Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
  Regression          | SSR            | $\nu_1 = k$          | MSR = SSR / k
  Residual (Error)    | SSE            | $\nu_2 = n - k - 1$  | MSE = SSE / (n - k - 1)
  Total               | SST            | n - 1              |
(Computer output; callouts mark the degrees of freedom k and n - k - 1, the F statistic MSR/MSE, and the P-value)
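The F statistic follows directly from the mean squares. A minimal sketch, with a hypothetical helper name and hypothetical sums of squares for n = 6 observations and k = 2 X variables:

```python
def overall_f_test(ssr, sse, n, k):
    """Return (MSR, MSE, F) for H0: all k slopes are 0."""
    df_regression = k       # numerator degrees of freedom
    df_error = n - k - 1    # denominator degrees of freedom
    msr = ssr / df_regression
    mse = sse / df_error
    return msr, mse, msr / mse


# Hypothetical sums of squares, chosen only to illustrate the arithmetic.
msr, mse, f_stat = overall_f_test(ssr=9.25, sse=0.25, n=6, k=2)
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
```

The P-value is then the upper-tail area of the F distribution with k and n - k - 1 degrees of freedom, which a statistics package reports alongside F.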
Multicollinearity
• High Correlation Between X Variables
• Coefficients Measure Combined Effect
• Leads to Unstable Coefficients Depending on X Variables in Model
• Always Exists; a Matter of Degree
• Example: Using Both Sales & Profit as Explanatory Variables in Same Model
When independent variables are highly correlated, the net regression coefficients become unreliable and odd results can occur.
Odd Things Happen
• Examine Correlation Matrix
  – Correlations between pairs of X variables are higher than their correlations with the Y variable
• Sign of slope changes from simple to multiple regression equation
• Model passes F-test, but not individual t-tests
• Correlation matrix shows different sign than net coefficient (slope)
Detecting Multicollinearity
• Examine Correlation Matrix
  – Correlations Between Pairs of X Variables Are More than With Y Variable
  – Rule of Thumb: Potential problem if r > .7 for any 2 independent variables
• Few Remedies
  – Obtain New Sample Data
  – Eliminate One Correlated X Variable
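The rule of thumb can be applied mechanically to every pair of X variables. The sketch below is one way to do that; the helper names and data are illustrative assumptions, and comparing the absolute value of r against .7 (so that strong negative correlations are also flagged) is an assumption beyond what the slide states.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def flag_collinear_pairs(predictors, threshold=0.7):
    """List pairs of X variables whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) > threshold:  # assumption: sign is ignored
            flagged.append((name_a, name_b, round(r, 4)))
    return flagged


# Illustrative X variables (assumed values, not taken from the slides).
predictors = {
    "size": [1, 8, 3, 5, 6, 10],
    "circ": [2, 8, 1, 7, 4, 6],
}
flagged = flag_collinear_pairs(predictors)
print(flagged)
```

A flagged pair signals only a potential problem; whether to drop or transform a variable is a separate judgment.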
Example variables (table not recoverable): Age, Experience, Salary
Possible Solutions
• Delete one or more of the correlated variables
  – Drop a variable if the t statistic for its net regression coefficient < 1
  – Drop the variable if $R_c^2$ increases upon its deletion
• Change form of one or more of the independent variables
  – Change actual salary to real salary
  – Divide income by population for a per capita income
• Exclude variable with lower correlation with y
Dummy-Variable Regression Model
1. Involves Categorical X Variable with 2 Levels, e.g., Male-Female; College-No College
2. Variable Levels Coded 0 & 1
3. Assumes Only Intercept Is Different; Slopes Are Constant Across Categories
4. Dummy-Variable Model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
Dummy-Variable Model Relationships
(Figure: Y versus $X_1$ with one line for Category 1 and one for Category 2; Same Slope, Different Y-Intercepts)
Dummy-Variable Model Worksheet
• $X_2$ Levels: 0 = Group 1, 1 = Group 2
• Run Regression with Y, $X_1$, $X_2$
  (worksheet columns: Case, Y, $X_1$, $X_2$)
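Interpreting Dummy-Variable Model Equation
Given: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$, where Y = Starting Salary of College Grads, $X_1$ = GPA, $X_2$ = 0 if Male, 1 if Female ($b_0$ is the sample intercept, written $a$ earlier)
  Males ($X_2 = 0$): $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$
  Females ($X_2 = 1$): $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
Same Slopes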
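Dummy-Variable Model Example
Computer Output: $\hat{Y} = 3 + b_1 X_1 + 7 X_2$, with $X_2$ = 0 if Male, 1 if Female (the numeric value of $b_1$ is not legible in the source)
  Males ($X_2 = 0$): $\hat{Y} = 3 + b_1 X_1 + 7(0) = 3 + b_1 X_1$
  Females ($X_2 = 1$): $\hat{Y} = 3 + b_1 X_1 + 7(1) = (3 + 7) + b_1 X_1$
Same Slopes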
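The "same slope, different intercept" behavior can be seen by evaluating the fitted equation for both groups. This sketch assumes the example's output gives an intercept of 3 and a dummy coefficient of 7, with males coded 0 and females coded 1; the GPA slope of 5 and the function name are hypothetical, chosen only for illustration.

```python
B0 = 3.0  # intercept (assumed reading of the slide's output)
B2 = 7.0  # dummy-variable coefficient (assumed reading of the slide's output)
B1 = 5.0  # hypothetical GPA slope, chosen only for illustration


def predicted_salary(gpa, female):
    """Predicted starting salary; `female` is the 0/1 dummy X2."""
    return B0 + B1 * gpa + B2 * female


for gpa in (2.0, 3.0, 4.0):
    male = predicted_salary(gpa, 0)
    fem = predicted_salary(gpa, 1)
    print(f"GPA {gpa}: male {male}, female {fem}, gap {fem - male}")
```

The gap between the two groups equals the dummy coefficient at every GPA, so the two fitted lines are parallel and differ only in their intercepts (3 versus 3 + 7).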
Conclusion
• Explained the Linear Multiple Regression Model
• Interpreted Linear Multiple Regression Computer Output
• Explained Multicollinearity
• Described the Types of Multiple Regression Models