STAT E-150 Statistical Methods

Multiple Regression

Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the ages of 18 and 39, a healthy body fat percent is 8% to 19%. (For women it is 21% to 32%.) It is not easy to measure body fat percent, but we can find a model for the relationship between body fat percent and waist size and use it to find the body fat percent associated with a given waist size.

The scatterplot indicates a positive linear relationship between waist size and body fat percent:

The SPSS output shows a significant linear relationship between the two variables.
R² = .678, so we know that almost 68% of the variability in body fat percentage is accounted for by waist size. What other variables might be used to predict body fat percentage? Can we improve the prediction by including additional variables?

Coefficients (Dependent Variable: Pct BF)
              B        Std. Error   Beta     t         Sig.
(Constant)             2.717                           .000
Waist         1.700    .074         .824     22.875

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .824    .678       .677                4.7126

The Multiple Linear Regression Model
We have n observations on k explanatory variables X1, X2, X3, …, Xk and a response variable, Y. The multiple regression model is:

  Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where ε ~ N(0, σε) and the errors are independent of one another. The predictor variables may be higher powers or other functions of quantitative variables, coded categorical variables, or interaction terms. The main restriction is that the model is linear; that is, each term is a constant multiple of a predictor.

Fitting a Multiple Linear Regression Model
As we did in simple linear regression, we will choose a possible set of predictors, estimate the coefficients based on sample data, and assess the fit. We will again use the sum of squared residuals, where each residual is the difference between the actual Y value and the Y value predicted by the prediction equation. We will use SPSS to find the estimates of the coefficients βi that minimize the sum of the squared residuals.
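
The least-squares idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not what SPSS does internally: the function names, the normal-equations approach, and the six data rows are all assumptions made up for this example, and the response values are generated from a known equation so the recovered coefficients can be checked.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sum_squared_residuals(rows, y, b):
    """Sum of (actual - predicted)^2 for coefficients b."""
    predicted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))


# Hypothetical (x1, x2) pairs, NOT the course data.
rows = [(32, 68), (36, 70), (40, 72), (34, 74), (38, 66), (42, 69)]
# Responses generated exactly from y = 2 + 1.5*x1 - 0.5*x2 (no error term).
y = [2 + 1.5 * x1 - 0.5 * x2 for x1, x2 in rows]

coefficients = fit_least_squares(rows, y)
print([round(c, 4) for c in coefficients])
print(round(sum_squared_residuals(rows, y, coefficients), 6))
```

Because the made-up responses contain no error term, the fitted coefficients should come back very close to 2, 1.5 and -0.5, with a sum of squared residuals near zero. With real data the residuals would not vanish, and the fitted coefficients would be estimates of the βi.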

We will test the hypotheses
H0: β1 = β2 = β3 = … = βk = 0
Ha: The slopes are not all zero.

Our assumptions are:
- The y-values are independent of each other
- Y has a constant variance for any combination of predictors
- The values of y are normally distributed for any fixed set of values for the explanatory variables

That is, the errors are independent values from a N(0, σε) distribution.

If the null hypothesis is rejected, then test a null hypothesis for each of the coefficients:
H0: βj = 0
Ha: βj ≠ 0

Note: If the null hypothesis is not rejected, it does not mean that the corresponding predictor variable has no relationship to y; it means that the predictor variable contributes nothing to modeling y after allowing for all the other predictors.

The hypotheses for fitting a multiple linear regression model to predict body fat percentage based on waist size and height are
H0: βheight = βwaist = 0
Ha: The slopes are not both zero.

Here are the scatterplots using the individual predictors:
Although this suggests a linear relationship between waist size and body fat percentage, there doesn't appear to be a linear relationship between height and body fat percentage.

Here are some of the results for a multiple regression analysis with both height and waist as predictors:

Coefficients (Dependent Variable: Pct BF)
              B        Std. Error   Beta     t         Sig.
(Constant)    -3.110   7.687                 -.405     .686
Waist         1.773    .072         .859     24.768    .000
Height        -.601    .110         -.190    -5.470    .000

The p-value for height is close to 0, so we know that height does contribute to the multiple regression model.

The graph shown below is called a scatterplot matrix. It shows the scatterplots for all pairs of the variables we are using.

Which pair of variables shows a strong linear relationship? Pct BF and Waist
Which pair of variables shows a weak linear relationship? Height and Waist
Which pair of variables shows no linear relationship? Pct BF and Height

Residual Analysis
These plots tell us that there is no particular pattern in the residuals, and that the distribution of the residuals is close to normal.

Use the SPSS output provided to answer the questions below:

Coefficients (Dependent Variable: Pct BF)
              B        Std. Error   Beta     t         Sig.
(Constant)    -3.110   7.687                 -.405     .686
Waist         1.773    .072         .859     24.768    .000
Height        -.601    .110         -.190    -5.470    .000

Model Summary (Predictors: (Constant), Height, Waist; Dependent Variable: Pct BF)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .845    .713       .711                4.4598

What is the fitted regression equation?
%BodyFat = -3.110 + 1.773 waist - .601 height

What does the value 1.773 tell you? An increase of one inch in the waist measurement is associated with an increase of 1.773 in body fat percentage for men of a particular height.

What change in body fat percentage is associated with each additional inch of height? An increase of one inch in height is associated with a decrease of .601 in body fat percentage for men of a particular waist size.
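
As a small illustration of how the fitted equation is used, the sketch below plugs the coefficients from the SPSS Coefficients table (-3.110, 1.773, -.601) into a Python function. The function name and the example man (36-inch waist, 70 inches tall) are hypothetical and chosen only for this example.

```python
# Coefficients copied from the SPSS Coefficients table for the
# two-predictor model (dependent variable: Pct BF).
INTERCEPT = -3.110
B_WAIST = 1.773
B_HEIGHT = -0.601


def predict_pct_body_fat(waist_in, height_in):
    """Predicted body fat percentage for a waist and height in inches."""
    return INTERCEPT + B_WAIST * waist_in + B_HEIGHT * height_in


# Hypothetical man: 36-inch waist, 70 inches tall.
base = predict_pct_body_fat(36, 70)
# Same height, waist one inch larger.
wider = predict_pct_body_fat(37, 70)
# Same waist, one inch taller.
taller = predict_pct_body_fat(36, 71)

print(round(base, 3))
print(round(wider - base, 3))   # change per inch of waist, height fixed
print(round(taller - base, 3))  # change per inch of height, waist fixed
```

The two differences reproduce the slope interpretations: holding the other predictor fixed, one more inch of waist changes the prediction by the waist coefficient, and one more inch of height changes it by the height coefficient.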

What is the value of R²? What does it tell you? R² = .713, which tells us that height and waist size together account for about 71.3% of the variation in body fat percentage for men.

Use the SPSS results to complete the hypothesis test:
The p-value of the test statistic is: p = 0+
What can you conclude? Since p is close to zero, the null hypothesis is rejected. These data indicate that there is a linear relationship between body fat percentage and the predictor variables Waist and Height.
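
The overall test statistic for H0: all slopes are zero is an F ratio, and it can be recovered approximately from R². The sketch below uses the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)) with the rounded R² = .713 from the Model Summary, k = 2 predictors, and n = 250 (an assumption inferred from the total df of 249 in the ANOVA table). Because R² is rounded, the result is only an approximation to the F value SPSS would print.

```python
def overall_f_statistic(r_squared, n, k):
    """F ratio for testing that all k slopes are zero, computed from R^2."""
    explained_per_df = r_squared / k
    unexplained_per_df = (1 - r_squared) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# R^2 from the Model Summary; n assumed from total df = 249; two predictors.
f_value = overall_f_statistic(r_squared=0.713, n=250, k=2)
print(round(f_value, 1))
```

An F ratio this large on 2 and 247 degrees of freedom is consistent with the reported p-value of essentially zero.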

We also want to estimate the standard deviation of the error term, σε.
As we add a new predictor to the model, we have a new coefficient to estimate, and so we lose one more degree of freedom. The estimate for the standard error of the multiple regression model with k predictors is

  σ̂ε = √( SSE / (n − k − 1) )

where SSE is the sum of the squared residuals.

Use the SPSS output to find the standard error of this regression model:
The Model Summary table reports Std. Error of the Estimate = 4.4598.
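
As a quick check, the standard error of the regression can be reproduced from the ANOVA table, which lists a residual mean square of 19.890 on 247 degrees of freedom. The sketch below assumes n = 250 (total df 249) and k = 2 predictors; the variable names are chosen only for this example.

```python
import math

# Values read from the SPSS ANOVA table for the two-predictor model.
mean_square_residual = 19.890
residual_df = 247

# Assumed sample size (total df = 249) and number of predictors.
n, k = 250, 2
assert residual_df == n - k - 1

# SSE = mean square * df; standard error = sqrt(SSE / (n - k - 1)).
sse = mean_square_residual * residual_df
standard_error = math.sqrt(sse / (n - k - 1))
print(round(standard_error, 4))
```

The result agrees, up to rounding, with the Std. Error of the Estimate in the Model Summary table.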

Assessing a Multiple Regression Model
Individual t-Tests for Coefficients in Multiple Regression
In order to determine whether any one of the predictor variables is helpful to include in the model, we test the coefficient for that predictor:
H0: βi = 0
Ha: βi ≠ 0
The test statistic is t = bi / SE(bi), where bi is the estimated coefficient and SE(bi) is its standard error, with n − k − 1 degrees of freedom.

It is important to remember that the meaning of each coefficient depends on all of the predictors in the regression model. If we fail to reject the null hypothesis, it means that the corresponding predictor variable contributes nothing to the multiple regression model after allowing for all other predictors.

Use the SPSS output to test the coefficients in our model:

Coefficients (Dependent Variable: Pct BF)
              B        Std. Error   Beta     t         Sig.
(Constant)    -3.110   7.687                 -.405     .686
Waist         1.773    .072         .859     24.768    .000
Height        -.601    .110         -.190    -5.470    .000

H0: βheight = 0
Ha: βheight ≠ 0
t = -5.470   p = 0+
What is your conclusion? Since p is close to 0, we reject the null hypothesis. There is evidence that the percent of body fat is related to height. We can conclude that the body fat percentage changes as height changes, for men with the same waist size.

H0: βwaist = 0
Ha: βwaist ≠ 0
t = 24.768   p = 0+
What is your conclusion? Since p is close to 0, we reject the null hypothesis. There is evidence that the percent of body fat is related to waist size. We can conclude that the body fat percentage changes as waist size changes, for men of the same height.
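
The t statistics in the Coefficients table are each coefficient divided by its standard error. The sketch below recomputes them from the rounded B and Std. Error values shown in the table, so the results match the printed t values (-5.470 and 24.768) only approximately; SPSS works from unrounded numbers. The dictionary layout and names are assumptions for this example.

```python
# (B, Std. Error) pairs as printed, rounded, in the Coefficients table.
coefficient_table = {
    "Height": (-0.601, 0.110),
    "Waist": (1.773, 0.072),
}

# Degrees of freedom n - k - 1, assuming n = 250 and k = 2.
degrees_of_freedom = 250 - 2 - 1

t_statistics = {
    name: b / std_error
    for name, (b, std_error) in coefficient_table.items()
}

for name, t in t_statistics.items():
    print(name, round(t, 2), "on", degrees_of_freedom, "df")
```

Both statistics are far from zero relative to a t distribution with 247 degrees of freedom, which is why the reported p-values are essentially zero.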

Can we do a one-tailed test?
H0: βwaist = 0
Ha: βwaist > 0
t = 24.768   p = .000/2 = 0+
What is your conclusion? Since p is close to 0, we reject the null hypothesis. There is evidence that the percent of body fat is positively related to waist size. We can conclude that the body fat percentage increases as the waist size increases, for men of the same height.

Adjusted R²
The adjusted R² is an adjustment to R² that takes the sample size and the number of parameters (βj) into consideration. Unlike R², which never decreases when predictors are added to the model, the adjusted R² increases only when a new predictor improves the fit enough to offset the lost degree of freedom, and so it can be useful in comparing regression models with different numbers of predictor variables.
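
A short sketch of the adjustment, using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The R² values come from the two Model Summary tables; n = 250 is an assumption inferred from the total df of 249 in the ANOVA table, and the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 250  # assumed from total df = 249

# Waist only (k = 1) versus Waist and Height (k = 2).
adj_one_predictor = adjusted_r_squared(0.678, n, k=1)
adj_two_predictors = adjusted_r_squared(0.713, n, k=2)

print(round(adj_one_predictor, 3))
print(round(adj_two_predictors, 3))
```

Rounded to three decimals, these agree with the Adjusted R Square values in the two Model Summary tables (.677 and .711). The adjusted value rises when height is added, which supports keeping height in the model.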

Creating a Scatterplot Matrix
Click on Graphs > Chart Builder.
Select Scatter/Dot from the list of charts.
Drag the Scatterplot Matrix to the window.
Drag the matrix variables to the horizontal axis.
Click on OK. The scatterplot matrix will appear in the Output Viewer.

Estimating the Model
Click on Analyze > Regression > Linear.
Drag the dependent variable and all independent variables to the appropriate locations.
Click on OK.

This will produce several tables:
Model Summary (Predictors: (Constant), Waist, Height)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .845    .713       .711                4.4598

Coefficients (Dependent Variable: Pct BF)
              B        Std. Error   Beta     t         Sig.
(Constant)    -3.110   7.687                 -.405     .686
Height        -.601    .110         -.190    -5.470    .000
Waist         1.773    .072         .859     24.768    .000

ANOVA (Predictors: (Constant), Waist, Height; Dependent Variable: Pct BF)
              Sum of Squares   df     Mean Square   F     Sig.
Regression                     2                          .000
Residual                       247    19.890
Total                          249

If you click on Plots in the Linear Regression dialog box, you will get this dialog box:
Plot *ZRESID on the Y axis against *ZPRED on the X axis. You may also choose to create a Normal Probability Plot and/or histogram of the residuals.

Click on Continue and then OK. Here are the results: