Population multiple regression model Data for multiple regression Multiple linear regression model Confidence intervals and significance tests Squared multiple correlation $R^2$
Extension of SLR Statistical model Estimation of the parameters and interpretation R-square with MLR Anova Table F-Test and t-tests
Most things are conceptually similar to SLR and an extension of what we learned through chapters 2 and 10. However, most things get much more complex, including the SAS output and learning to interpret it. Lastly, whereas before there usually was a set procedure to analyze the data, we now will have to be more flexible and take things as they come, so to speak.
Population Multiple Regression Equation Up to this point, we have considered in detail the linear regression model in which the mean response, $\mu_y$, is related to one explanatory variable $x$: $\mu_y = \beta_0 + \beta_1 x$. Usually, more complex linear models are needed in practical situations. There are many problems in which knowledge of more than one explanatory variable is necessary in order to obtain a better understanding and better prediction of a particular response. In multiple regression, the response variable y depends on p explanatory variables $x_1, x_2, \ldots, x_p$ through its mean: $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.
Data for Multiple Regression The data for a simple linear regression problem consist of n observations of two variables. Data for multiple linear regression consist of the values of a response variable y and p explanatory variables on each of n cases. We write the data and enter them into software in the form:

                 Variables
Case    x1     x2     ...    xp     y
1       x11    x12    ...    x1p    y1
2       x21    x22    ...    x2p    y2
...     ...    ...    ...    ...    ...
n       xn1    xn2    ...    xnp    yn
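As a rough sketch of how this layout looks once it is in software, the snippet below builds an n-by-p array of explanatory variables and a length-n response in Python with NumPy. The numbers are synthetic and made up purely for illustration (they are not the GPA data), and the names `x`, `y`, and `X` are assumptions of this example.

```python
import numpy as np

# Synthetic illustration only: n = 6 cases, p = 3 explanatory variables.
rng = np.random.default_rng(0)
n, p = 6, 3
x = rng.integers(1, 11, size=(n, p)).astype(float)  # row i holds x_i1, ..., x_ip
y = rng.normal(loc=2.5, scale=0.5, size=n)          # one response value per case

# Prepend a column of ones so an intercept can be estimated with the slopes.
X = np.column_stack([np.ones(n), x])

print(X.shape)  # n rows, p + 1 columns
print(y.shape)  # n responses
```

Each row of `X` is one case from the table above; the extra leading column of ones is a common convention for carrying the intercept.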
We are interested in finding variables to predict college GPA. Grades from high school will be used as potential explanatory variables (also called predictors), namely: HSM (math grades), HSS (science grades), and HSE (English grades). Since there are several explanatory variables or x’s, they need to be distinguished using subscripts: ◦ X1 = HSM ◦ X2 = HSS ◦ X3 = HSE
Why not do several Simple Linear Regressions? ◦ Regress GPA on HSM: significant? ◦ Regress GPA on HSS: significant? ◦ Regress GPA on HSE: significant? Why not? ◦ Each alone may not explain GPA very well at all, but used together they may explain GPA quite well. ◦ Predictors could (and usually do) overlap some, so we’d like to distinguish this overlap (and remove it) if possible.
Unfortunately because scatterplots are restricted to only 2 axes (Y-axis and X-axis), they are less useful here. Could plot Y with each predictor separately, like an SLR, but this is just a preliminary look at each of the variables and cannot tell us whether we have a good MLR or not.
The deviations $\varepsilon_i$ are assumed to be independent and $N(0, \sigma)$. The parameters of the model are: $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\sigma$. ◦ Estimates then become $b_0$, $b_1$, $b_2$, $b_3$, and $s$
$b_0$ is still the intercept. $b_1$ is the estimated “slope” for $\beta_1$; it explains how y changes as $x_1$ changes. Then $b_2$ is the estimated “slope” for $\beta_2$; it explains how y changes as $x_2$ changes ◦ Suppose $b_2 = 0.7$; then if I change $x_2$ by 1 point, y changes by 0.7, etc. ◦ The same interpretation as in SLR, with the other predictors held fixed
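To spell out the 0.7 example as a worked equation: assuming the fitted equation has the form $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$, raising $x_2$ by one point while $x_1$ and $x_3$ stay where they are changes the prediction by exactly $b_2$.

```latex
\begin{aligned}
\hat{y}(x_1,\, x_2 + 1,\, x_3) - \hat{y}(x_1,\, x_2,\, x_3)
  &= \bigl[b_0 + b_1 x_1 + b_2 (x_2 + 1) + b_3 x_3\bigr]
   - \bigl[b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3\bigr] \\
  &= b_2 = 0.7
\end{aligned}
```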
Predicted values ◦ Given values for $x_1$, $x_2$, and $x_3$, plug those into the regression equation and get a $\hat{y}$ Residuals ◦ Still Observed – Predicted = $y - \hat{y}$ ◦ Calculations and interpretations are the same Assumptions ◦ Independence, Linearity, Constant Variance and Normality ◦ Use the same plots, same interpretation
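A minimal sketch of these calculations in Python with NumPy follows. It fits a three-predictor model by least squares, then forms predicted values and residuals. The data are synthetic, and the "true" coefficients used to generate them are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=(n, 3))                      # synthetic x1, x2, x3
# Hypothetical generating coefficients, for illustration only.
y = 1.0 + 0.5 * x[:, 0] + 0.7 * x[:, 1] + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])             # intercept column plus predictors
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # b = [b0, b1, b2, b3]

y_hat = X @ b                                    # predicted values
resid = y - y_hat                                # observed minus predicted

# Prediction for one new case: plug x1, x2, x3 into the fitted equation.
new_case = np.array([1.0, 0.2, -0.1, 0.4])       # leading 1.0 multiplies b0
print(b)
print(new_case @ b)
```

The residuals in `resid` are what would go into the usual diagnostic plots for checking the assumptions.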
Confidence intervals for the slopes ◦ Still of the form $b_i \pm t^* SE_{b_i}$ ◦ *CHANGE!!! DF = n – p – 1, where p is the number of predictors in our model. Recall in SLR we only had 1 predictor, or one x. So, df = n – 1 – 1 = n – 2 for SLR. Now we have p predictors. For the GPA example, df = n – 3 – 1 = n – 4
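The degrees-of-freedom change can be seen in a short sketch. The function below is a hypothetical helper (not SAS output) that computes the coefficient standard errors with df = n – p – 1 and forms intervals of the usual estimate plus or minus t* times SE form. The critical value `t_star` is passed in by the reader from a t table, and the data are synthetic.

```python
import numpy as np

def slope_intervals(X, y, t_star):
    """Return (b, se, lower, upper); X must have a leading column of ones."""
    n, k = X.shape                  # k = p + 1 (intercept plus p predictors)
    df = n - k                      # same as n - p - 1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / df         # s^2, the estimate of sigma^2
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se, b - t_star * se, b + t_star * se

# Synthetic data: n = 50 cases, p = 3 predictors, so df = 50 - 3 - 1 = 46.
rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(scale=0.5, size=50)
X = np.column_stack([np.ones(50), x])

# 2.013 is approximately the 95% t critical value for 46 df (from a t table).
b, se, lower, upper = slope_intervals(X, y, t_star=2.013)
for name, lo, hi in zip(["b0", "b1", "b2", "b3"], lower, upper):
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```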
Since there is more than 1 predictor, a simple t-test will not suffice to test whether there is a significant linear relationship or not. The good news… ◦ The fundamental principle is still the same ◦ To help with understanding let’s look at what R-square means…
Still trying to explain the changes in Y R-square measures the % of explained variation by the regression line. ◦ So in SLR, this is just the percent explained by the changes in x. In MLR, it represents the percent explained by all predictors combined simultaneously. ◦ Problem: What if the predictors are overlapping? ◦ In fact, they almost always overlap at least a little bit
[Figure: the rectangle represents the total variation of Y; the ovals labeled X1, X2, and X3 represent the predictors. Note the OVERLAP!]
First, we need a number to describe the total variation (the yellow box) ◦ SST = Total Sums of Squares Next we need to describe the parts explained by the different predictors. ◦ Unfortunately, for now, all we get is one number for all the variables together. ◦ SSM = Model Sums of Squares (Regression) Then naturally, $R^2$ = SSM/SST ◦ The amount of variation the regression explains out of the total variation
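As a small illustration of SSM/SST, the sketch below computes both sums of squares from a least-squares fit in Python with NumPy. The data are synthetic, so the resulting R-square has nothing to do with the GPA example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=(n, 3))
# Synthetic response; the generating coefficients are illustrative assumptions.
y = 2.0 + 0.6 * x[:, 0] + 0.2 * x[:, 1] + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)       # total variation of Y
ssm = np.sum((y_hat - y.mean()) ** 2)   # variation explained by all predictors together
r2 = ssm / sst
print(round(r2, 4))
```

Note that `ssm` is a single number for the whole model, which is exactly the limitation the slide points out: it does not say how much each predictor contributes.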
Using the same principle, a single t-test for each predictor is not good enough, we need a collective test for all predictors at the same time. ◦ ANOVA Table
Breaks up the different pieces of sums of squares ◦ SST = Total variation ◦ SSM = Part explained by the model(regression) ◦ SSE = Leftover unexplained portion Called Error Sums of Squares Let’s look again…
ANOVA Table for Multiple Regression

Source   df           Sum of squares SS   Mean square MS    F         P-value
Model    p            SSM (from data)     MSM = SSM/DFM     MSM/MSE   From Table
Error    n − p − 1    SSE (from data)     MSE = SSE/DFE
Total    n − 1        SST (from data)

SSM = model sums of squares
SSE = error sums of squares
SST = total sums of squares
SST = SSM + SSE
DFM = p
DFE = n – p – 1
DFT = n – 1
DFT = DFM + DFE
Additionally, the ANOVA Table tests whether or not there is a significant multiple linear regression ◦ Test statistic is F = MSM/MSE Under $H_0$, F has an F distribution (see Table E) with p and n – p – 1 degrees of freedom (two types): ◦ “Degrees of freedom in the numerator" DFM = p ◦ “Degrees of freedom in the denominator" DFE = n – p – 1
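The pieces of the ANOVA table can be assembled in a few lines. The sketch below uses a hypothetical helper, `anova_table`, on synthetic data. It returns the degrees of freedom, sums of squares, mean squares, and F = MSM/MSE. It stops short of the P-value, which would come from an F table or statistical software.

```python
import numpy as np

def anova_table(X, y):
    """ANOVA quantities for a design matrix X with a leading column of ones."""
    n, k = X.shape
    p = k - 1                                  # number of predictors
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    ssm = float(np.sum((y_hat - y.mean()) ** 2))
    sse = float(np.sum((y - y_hat) ** 2))
    dfm, dfe = p, n - p - 1
    msm, mse = ssm / dfm, sse / dfe
    return {"DFM": dfm, "DFE": dfe, "DFT": n - 1,
            "SSM": ssm, "SSE": sse, "SST": ssm + sse,
            "MSM": msm, "MSE": mse, "F": msm / mse}

# Synthetic data, for illustration only.
rng = np.random.default_rng(4)
n = 60
x = rng.normal(size=(n, 3))
y = 2.0 + 0.8 * x[:, 0] + 0.3 * x[:, 1] + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x])

table = anova_table(X, y)
for key, value in table.items():
    print(key, round(value, 3))
```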
The hypotheses for the F-test are as follows: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ $H_a$: Some $\beta_i \neq 0$ (only need one non-zero $\beta_i$) So a rejection of the null indicates that collectively the Xs do well at explaining Y What it doesn’t show is which of the Xs are doing “the explaining” ◦ We’ll come back to this later
Since the P-value for the F-test is small, <0.0001, we reject $H_0$. There is a significant Multiple Linear Regression between Y and the Xs. ◦ Model is useful in predicting Y My data provides evidence that there is a significant linear regression between GPA and the predictors HSM, HSS, and HSE
The t-tests now become useful in determining which predictors are actually contributing to the explanation of Y. There are several different methods of determining which Xs are the best ◦ All possible models selection ◦ Forward selection ◦ Stepwise selection ◦ Backward elimination We will just learn backward elimination…
So suppose X1 does a good job explaining Y by itself. Then maybe X2 and X3 are “piggybacking” into the model. ◦ They themselves aren’t good by themselves but combined with X1, all three look good collectively in the MLR. [Figure: ovals X1, X2, and X3 inside the rectangle for the total variation of Y]
A t-test in MLR is similar to what it was in SLR Hypotheses: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ The difference is this is testing the usefulness or significance of X1 AFTER X2 and X3 are already in the model (“added last”).
P-value very significant. HSM significant; HSS, HSE not.
So neither X2 nor X3 is significant when added last ◦ The backward elimination procedure removes ONLY the single worst predictor, then reruns the MLR with all remaining variables. NOTE: this changes the entire MLR model. Since X2 is the least significant added last, it is removed… What will the new model be without X2?
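The remove-one-then-refit loop can be sketched as follows. This is a simplified stand-in, not SAS's procedure: instead of P-values it compares each predictor's |t| statistic against a cutoff `t_crit` supplied by the reader. Within a single fitted model the smallest |t| corresponds to the largest P-value, so the "worst" predictor is the same. The cutoff is held fixed even though the error degrees of freedom change as predictors are dropped. The function names, the cutoff of 2.0, and the synthetic data are all assumptions of this example.

```python
import numpy as np

def t_statistics(X, y):
    """t = b / SE(b) for each coefficient; X has a leading column of ones."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)               # n - k equals n - p - 1
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b / se

def backward_eliminate(x, y, names, t_crit):
    """Drop the single least significant predictor, refit, and repeat."""
    keep = list(range(x.shape[1]))
    while keep:
        X = np.column_stack([np.ones(len(y)), x[:, keep]])
        t = np.abs(t_statistics(X, y))[1:]     # skip the intercept
        worst = int(np.argmin(t))              # least significant, added last
        if t[worst] >= t_crit:
            break                              # everything left is significant
        del keep[worst]                        # remove ONLY the worst, then refit
    return [names[j] for j in keep]

# Synthetic data in which only x1 truly affects y.
rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=(n, 3))
y = 1.0 + 1.0 * x[:, 0] + rng.normal(scale=0.5, size=n)

kept = backward_eliminate(x, y, ["x1", "x2", "x3"], t_crit=2.0)
print(kept)
```

Because the model is refit after every removal, the t statistics of the remaining predictors are recomputed each time, which is why the order of removal matters.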
Changes in MLR if the model changes: ◦ The MLR regression line ◦ Parameter estimates ◦ Predicted values, Residuals ◦ R-square ◦ ANOVA table ◦ F-test, T-tests ◦ Assumptions ◦ EVERYTHING!!!
So what’s the next step of backward elimination?
The t-test for X3 now has a better P-value than before, but should it be removed also? What are reasonable levels for alpha in MLR? ◦ There is no default alpha level like in SLR. ◦ It just depends on the researcher. ◦ SAS defaults to α = 0.15, why? Suppose we decide to remove X3 based on this P-value, what will the new model be without X3?
[Figure: ovals X1, X2, and X3 inside the rectangle for the total variation of Y] Take out X2, what happens in the picture? Then take out X3, what happens?
Remember what made the regression line in SLR best? The Least Squares Regression refers to making the ERROR sums of squares as small as possible ◦ If SSE is as small as possible then SSM (the explained variation) is as LARGE as possible!
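One informal way to see the "as small as possible" claim is to compare the SSE at the least-squares coefficients with the SSE at many nearby coefficient vectors. The sketch below does this on synthetic data. The perturbation size and the number of trials are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
x = rng.normal(size=(n, 3))
# Synthetic response; generating coefficients are illustrative assumptions.
y = 1.0 + x @ np.array([0.5, 0.3, 0.1]) + rng.normal(scale=0.4, size=n)
X = np.column_stack([np.ones(n), x])

def sse(coef):
    """Error sum of squares for a given coefficient vector."""
    r = y - X @ coef
    return float(r @ r)

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares estimates

# Try 1000 randomly perturbed coefficient vectors near b.
perturbed = [sse(b + rng.normal(scale=0.1, size=4)) for _ in range(1000)]
print(sse(b), min(perturbed))
```

Every perturbed vector gives a larger SSE than `b` does. Since SST is fixed by the data and SST = SSM + SSE at the least-squares fit, the smallest SSE goes with the largest SSM.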
The actual data will not fit the regression line exactly: DATA = FIT + RESIDUAL ◦ FIT is the MLR regression line ◦ RESIDUAL (“noise”) = $\varepsilon_i$ ◦ The deviations $\varepsilon_i$ are still assumed to be independent and $N(0, \sigma)$.
All the same things for SLR now apply ◦ Can do confidence intervals for slope estimates ◦ R-square ◦ Predictions and Residuals ◦ Prediction Intervals for individuals ◦ Confidence interval for Mean response for a group Must check model assumptions, if something is violated needs to be addressed. ◦ Interpretation is the same as before.
Did we get all significant predictors? Yes! According to the original model containing X1, X2, and X3, we chose the best predictors. Did we get all significant predictors? ◦ No way! We could have left out predictors to begin with.
R-square helps us see how much room for improvement there is! P-value very significant. $R^2$ is fairly small (20.46%). There could be other variables that can explain the remaining 79.54%.
Book continues the GPA example with a look at adding several more potential predictors (scores from the SAT exam) So are we done?