Published by Winfred Bryan. Modified over 8 years ago.
1
SW388R6 Data Analysis and Computers I Slide 1 Multiple Regression Key Points about Multiple Regression Sample Homework Problem Solving the Problem with SPSS Logic for multiple Regression
2
SW388R6 Data Analysis and Computers I Slide 2 Key points about multiple regression Few, if any, phenomena in social and behavioral research can be explained with a single predictor. More realistically, social phenomena are very complex, requiring a number of predictors to model the relationship. Multiple regression is an extension of simple linear regression that enables us to include multiple predictors in our regression equation. The interpretation of a multiple regression is very similar to the interpretation of a simple linear regression, but there are important differences.
3
SW388R6 Data Analysis and Computers I Slide 3 Similarities and differences - 1 In both simple linear and multiple regression, there is an ANOVA test of the overall relationship. In both simple linear and multiple regression, R² represents the proportion of variance explained (error reduced) in predicting the dependent variable based on the independent variable or variables. In both simple linear and multiple regression, Multiple R represents the strength of the relationship and the effect size. In multiple regression it is always positive and is not equal to any of the beta coefficients.
4
SW388R6 Data Analysis and Computers I Slide 4 Similarities and differences - 2 In simple linear regression, the significance of the overall relationship and the relationship of each independent variable were the same. In multiple regression, there is a test of significance for the coefficient of each independent variable. There is no necessary relationship between the significance of the overall relationship and the significance of the relationships for each of the individual predictors. When the overall relationship is significant, it is possible that none, some, or all of the individual relationships will be significant.
5
SW388R6 Data Analysis and Computers I Slide 5 Similarities and differences - 3 Multiple regression is required to satisfy all of the assumptions of simple linear regression: 1. The relationship is linear 2. The residuals have the same variance 3. The residuals are independent of each other 4. The residuals are normally distributed Plus one additional assumption: The independent variables are independent of one another, i.e. they add to the variance explained in the dependent variable rather than explain the same variance explained by other independent variables.
6
SW388R6 Data Analysis and Computers I Slide 6 Similarities and differences - 4 In a multiple regression equation, the coefficient for each individual variable represents the change in the dependent variable that it is uniquely responsible for, i.e. with the other independent variables held constant. When individual predictors are correlated, part of the explanation of the dependent variable is made jointly by both and is not credited to either individual predictor. In extreme cases, the relationship between independent variables is so strong that neither is credited with explaining the dependent variable, even though both might have a strong individual relationship to the dependent variable.
7
SW388R6 Data Analysis and Computers I Slide 7 Similarities and differences - 5 If this happens, a predictor that really has a strong relationship may have a b coefficient that is not statistically significant. Concluding from the non-significant b coefficient that the variable has no relationship would be an error. To satisfy the assumption of independence of variables, our regression must not include variables that are collinear. The diagnostic statistic for detecting multicollinearity is “tolerance,” which SPSS includes in the table of coefficients.
8
SW388R6 Data Analysis and Computers I Slide 8 Similarities and differences - 6 In extreme cases of multicollinearity, SPSS cannot compute the regression equation. In this case, SPSS will exclude the variable which it identifies as producing the problem, even though we have told it to include the variable in the analysis.
9
SW388R6 Data Analysis and Computers I Slide 9 Similarities and differences - 7 Having more than one predictor in the regression equation leads to the question of which variable has the more important relationship to the dependent variable, i.e. which has the largest impact on the predicted scores. Since beta coefficients are standardized, the one with the largest absolute value (ignoring the sign) is the most important, since it is the amount of increase in standard deviations for the dependent variable that is produced by a one standard deviation change in the independent variable.
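The relationship between b coefficients and standardized beta coefficients described above can be sketched in a few lines. This is an illustration on synthetic data, not the course data set: the function name `standardized_betas` and the simulated variables are assumptions made for the example, and NumPy's least-squares solver stands in for SPSS.

```python
import numpy as np

def standardized_betas(X, y):
    """Return (b, beta): unstandardized slopes and standardized betas from OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b = coefs[1:]  # drop the intercept
    # beta = b * (SD of the predictor) / (SD of the dependent variable)
    beta = b * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return b, beta

# Synthetic data: the second predictor is measured on a much larger scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * [1.0, 10.0]
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

b, beta = standardized_betas(X, y)
print("b coefficients:   ", np.round(b, 3))
print("beta coefficients:", np.round(beta, 3))
```

In this simulated case the first predictor has the larger b coefficient but the second has the larger beta, which is why the comparison of importance is made on the standardized coefficients.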
10
SW388R6 Data Analysis and Computers I Slide 10 Change in response for sample size On the simple linear regression problems, the answer was an Incorrect application of a statistic if the sample size available to the analysis was less than the number recommended by Tabachnick and Fidell. In reviewing problems, there were numerous occasions when a smaller sample yielded a statistically significant result, making the response Incorrect application of a statistic inappropriate itself. For these problems, I am changing the response to adding a caution when the answer is true. This reflects the possibility that planning a sample of the given size risked not finding a significant result, but does not negate an otherwise useful result.
11
SW388R6 Data Analysis and Computers I Slide 11 Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use .05 for alpha. "Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001). "Population growth rate" significantly predicted "infant mortality rate", ß = -0.393, t(91) = -4.04, p < .001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". "Total fertility rate" significantly predicted "infant mortality rate", ß = 0.965, t(91) = 8.90, p < .001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". Sample homework problem: Multiple regression – part 1 This is the general framework for the problems in the homework assignment on multiple regression problems. The problem includes a statement for the overall relationship, an individual statement for each of the independent variables, and a statement on the relative importance of predictors.
12
SW388R6 Data Analysis and Computers I Slide 12 (cont’d) "Percent of the population below poverty line" significantly predicted "infant mortality rate", ß = 0.280, t(91) = 4.41, p <.001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate". "Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables. o True o True with caution o False o Incorrect application of a statistic Sample homework problem: Multiple regression - part 2 The problem includes a statement for the overall relationship, an individual statement for each of the independent variables, and a statement on the relative importance of predictors.
13
SW388R6 Data Analysis and Computers I Slide 13 Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use.05 for alpha. "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p <.001). "Population growth rate" significantly predicted "infant mortality rate", ß = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". "Total fertility rate" significantly predicted "infant mortality rate", ß = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". Sample homework problem: Data set and alpha The first paragraph identifies: The data set to use, e.g. 2001WorldFactbook.sav The alpha level for the hypothesis test
14
SW388R6 Data Analysis and Computers I Slide 14 Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use.05 for alpha. "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p <.001). "Population growth rate" significantly predicted "infant mortality rate", ß = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". "Total fertility rate" significantly predicted "infant mortality rate", ß = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". Sample homework problem: The overall relationship The second paragraph states the finding that we want to verify with a multiple regression. The finding identifies: The independent variables The dependent variable The strength of the relationship
15
SW388R6 Data Analysis and Computers I Slide 15 Based on information from the data set 2001WorldFactbook.sav, is the following statement true, false, or an incorrect application of a statistic? Use.05 for alpha. "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p <.001). "Population growth rate" significantly predicted "infant mortality rate", ß = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". "Total fertility rate" significantly predicted "infant mortality rate", ß = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". Sample homework problem: Individual relationships Each of the paragraphs for the individual independent variables contains: A statement about the significance of the relationship between the individual independent variable and the dependent variable A statement about the direction of the relationship between the individual independent variable and the dependent variable
16
SW388R6 Data Analysis and Computers I Slide 16 "Percent of the population below poverty line" significantly predicted "infant mortality rate", ß = 0.280, t(91) = 4.41, p < .001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate". "Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables. o True o True with caution o False o Incorrect application of a statistic Sample homework problem: Importance of variables The answer will be True if all parts of the problem are correct. The answer to a problem will be Incorrect application of a statistic if the level of measurement or multicollinearity requirement is violated. The answer to a problem will be True with caution if the analysis includes an ordinal variable or we do not meet the sample size requirement. The answer will be False if any part of the problem is not correct. The last paragraph is a statement of the relative importance of the predictors, e.g. which variable makes the largest change in the dependent variable.
17
SW388R6 Data Analysis and Computers I Slide 17 Solving the problem with SPSS: Level of measurement Multiple regression requires that the dependent variable be interval and the independent variables be interval or dichotomous. "Infant mortality rate" [infmort] is interval level, satisfying the requirement for the dependent variable. "Population growth rate" [pgrowth] is interval level, satisfying the requirement for the independent variable. "Total fertility rate" [fertrate] is interval level, satisfying the requirement for the independent variable. "Percent of the population below poverty line" [poverty] is interval level, satisfying the requirement for the independent variable.
18
SW388R6 Data Analysis and Computers I Slide 18 Solving the problem with SPSS: Multiple regression -1 Before we can address the other issues involved in solving the problem, we need to generate the SPSS output. Select Regression > Linear… from the Analyze menu.
19
SW388R6 Data Analysis and Computers I Slide 19 Solving the problem with SPSS: Multiple regression -2 First, move the dependent variable infmort to the Dependent list box. Second, move the independent variables pgrowth, fertrate, and poverty to the Independents list box. Third, click on the Statistics button to add the additional statistics.
20
SW388R6 Data Analysis and Computers I Slide 20 Solving the problem with SPSS: Multiple regression -3 First, in addition to the SPSS defaults, we mark the check boxes for Descriptives and Collinearity diagnostics. Second, click on the Continue button to close the dialog box.
21
SW388R6 Data Analysis and Computers I Slide 21 Solving the problem with SPSS: Multiple regression -4 When we return to the Linear Regression dialog box, we click on OK to obtain the output.
22
SW388R6 Data Analysis and Computers I Slide 22 Solving the problem with SPSS: Multicollinearity The tolerance values for all of the independent variables are larger than 0.10: "population growth rate" [pgrowth] (0.287), "total fertility rate" [fertrate] (0.230) and "percent of the population below poverty line" [poverty] (0.673). Multicollinearity is not a problem in this regression analysis.
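For readers who want to see what the tolerance statistic measures, here is a minimal sketch. It assumes the usual definition, tolerance = 1 - R² from regressing each independent variable on the remaining independent variables. The function name `tolerance` and the simulated predictors are illustrative assumptions; the values 0.287, 0.230, and 0.673 above come from the SPSS output, not from this code.

```python
import numpy as np

def tolerance(X):
    """Tolerance of each column: 1 - R^2 from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coefs
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 - r_squared
    return out

# Synthetic predictors: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)

tol = tolerance(np.column_stack([x1, x2, x3]))
print(np.round(tol, 3))
print("multicollinearity problem:", bool((tol < 0.10).any()))
```

The two nearly identical predictors fall below the 0.10 cutoff used in these slides, while the unrelated predictor has a tolerance close to 1.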
23
SW388R6 Data Analysis and Computers I Slide 23 Solving the problem with SPSS: Sample size Using the rule of thumb from Tabachnick and Fidell that the required number of cases should be the larger of (the number of independent variables x 8) + 50 or the number of independent variables + 105, multiple regression with three independent variables requires 108 cases. With 95 valid cases, the sample size requirement is not satisfied. A caution should be added to our findings. NOTE: adding a caution to our findings rather than concluding that it is not an appropriate use of statistics is a more reasonable response than what we did for simple linear regression.
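The sample size arithmetic can be checked with a short sketch. The function name `required_cases` is an assumption for illustration; the two formulas and the numbers 3, 95, and 108 are the ones stated on the slide.

```python
def required_cases(num_predictors):
    """Larger of (8 x predictors + 50) and (predictors + 105), per the slide's rule of thumb."""
    return max(8 * num_predictors + 50, num_predictors + 105)

required = required_cases(3)     # max(74, 108)
valid_cases = 95                 # valid cases reported for this problem
needs_caution = valid_cases < required

print(required, needs_caution)
```

With three independent variables the second formula governs, and 95 valid cases fall short of it, so the finding carries a caution.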
24
SW388R6 Data Analysis and Computers I Slide 24 Solving the problem with SPSS: Interpreting the overall relationship - 1 The overall relationship between the independent variables "population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] and the dependent variable "infant mortality rate" [infmort] was statistically significant, R² = 0.753, F(3, 91) = 92.67, p < .001. The first sentence in the finding states that: "Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001). The R² of .753 is the reduction in error achieved by using scores for "population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] to predict scores for "infant mortality rate" [infmort].
25
SW388R6 Data Analysis and Computers I Slide 25 Solving the problem with SPSS: Interpreting the overall relationship - 2 We reject the null hypothesis that all of the partial slopes (b coefficients) = 0 and conclude that at least one of the partial slopes (b coefficients) ≠ 0. The first sentence in the finding states that: "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p <.001).
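The overall F test and its degrees of freedom are tied to R², the number of cases, and the number of predictors. The sketch below uses the standard formula F = (R² / k) / ((1 - R²) / (n - k - 1)); the function name is an assumption for illustration. Because it starts from the rounded R² of 0.753, the result comes out near 92.5 rather than exactly matching the 92.67 in the SPSS output.

```python
def f_from_r_squared(r_squared, n_cases, n_predictors):
    """Overall F statistic and its degrees of freedom from R^2, n, and k."""
    df_model = n_predictors
    df_resid = n_cases - n_predictors - 1
    f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_stat, df_model, df_resid

# Values from the sample problem: R^2 = 0.753, 95 valid cases, 3 predictors.
f_stat, df_model, df_resid = f_from_r_squared(0.753, 95, 3)
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
```

The degrees of freedom, 3 and 91, agree with the F(3, 91) reported in the finding.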
26
SW388R6 Data Analysis and Computers I Slide 26 Solving the problem with SPSS: Interpreting the overall relationship - 3 The Multiple R of 0.868 was correctly characterized as a strong relationship, using Cohen’s criteria: r < .1 = Trivial; .1 ≤ r < .3 = Small or weak; .3 ≤ r < .5 = Medium or moderate; r ≥ .5 = Large or strong. The first sentence in the finding states that: "Population growth rate" [pgrowth], "total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p < .001).
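Cohen's cutoffs as listed on this slide can be written as a small lookup. The function name `cohen_label` is an assumption for illustration; the thresholds and labels are the ones given above, and Multiple R is taken as the square root of R².

```python
import math

def cohen_label(r):
    """Classify a correlation-type effect size using the cutoffs on the slide."""
    r = abs(r)
    if r < 0.1:
        return "Trivial"
    if r < 0.3:
        return "Small or weak"
    if r < 0.5:
        return "Medium or moderate"
    return "Large or strong"

multiple_r = math.sqrt(0.753)  # R^2 from the sample problem
print(round(multiple_r, 3), cohen_label(multiple_r))
```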
27
SW388R6 Data Analysis and Computers I Slide 27 Solving the problem with SPSS: Interpreting individual relationships - 1 The individual relationship between the independent variable "population growth rate" [pgrowth] and the dependent variable "infant mortality rate" [infmort] was statistically significant, β = - 0.393, t(91) = -4.04, p <.001. The second sentence in the finding states that: "Population growth rate" significantly predicted "infant mortality rate", β = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". We reject the null hypothesis that the partial slope (b coefficient) for the variable "population growth rate" = 0 and conclude that the partial slope (b coefficient) for the variable "population growth rate" ≠ 0.
28
SW388R6 Data Analysis and Computers I Slide 28 Solving the problem with SPSS: Interpreting individual relationships - 2 The second sentence in the finding states that: "Population growth rate" significantly predicted "infant mortality rate", β = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". The negative sign of the B coefficient and the Beta coefficient implies that higher values of "population growth rate" were inversely related to lower values of "infant mortality rate".
29
SW388R6 Data Analysis and Computers I Slide 29 Solving the problem with SPSS: Interpreting individual relationships - 3 The individual relationship between the independent variable "total fertility rate" [fertrate] and the dependent variable "infant mortality rate" [infmort] was statistically significant, β = 0.965, t(91) = 8.90, p <.001. The third sentence in the finding states that: "Total fertility rate" significantly predicted "infant mortality rate", β = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". We reject the null hypothesis that the partial slope (b coefficient) for the variable "total fertility rate" = 0 and conclude that the partial slope (b coefficient) for the variable "total fertility rate" ≠ 0.
30
SW388R6 Data Analysis and Computers I Slide 30 Solving the problem with SPSS: Interpreting individual relationships - 4 The third sentence in the finding states that: "Total fertility rate" significantly predicted "infant mortality rate", β = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". The positive sign of the B coefficient and the Beta coefficient implies that higher values of "total fertility rate" were directly related to higher values of "infant mortality rate".
31
SW388R6 Data Analysis and Computers I Slide 31 Solving the problem with SPSS: Interpreting individual relationships - 5 The individual relationship between the independent variable "percent of the population below poverty line" [poverty] and the dependent variable "infant mortality rate" [infmort] was statistically significant, β = 0.280, t(91) = 4.41, p < .001. The fourth sentence in the finding states that: "Percent of the population below poverty line" significantly predicted "infant mortality rate", β = 0.280, t(91) = 4.41, p < .001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate". We reject the null hypothesis that the partial slope (b coefficient) for the variable "percent of the population below poverty line" = 0 and conclude that the partial slope (b coefficient) for the variable "percent of the population below poverty line" ≠ 0.
32
SW388R6 Data Analysis and Computers I Slide 32 Solving the problem with SPSS: Interpreting individual relationships - 6 The fourth sentence in the finding states that: "Percent of the population below poverty line" significantly predicted "infant mortality rate", β = 0.280, t(91) = 4.41, p <.001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate". The positive sign of the B coefficient and the Beta coefficient implies that higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate".
33
SW388R6 Data Analysis and Computers I Slide 33 Solving the problem with SPSS: Interpreting individual relationships - 7 The fifth sentence in the finding states that: "Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables. "Total fertility rate" [fertrate] was the most important predictor because the absolute value of its beta coefficient (0.965) was larger than the absolute value of the beta coefficients for the other independent variables.
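Picking the most important predictor amounts to comparing absolute beta values. A minimal sketch using the three beta coefficients reported in this problem (the dictionary and variable names are illustrative):

```python
# Standardized beta coefficients from the sample problem.
betas = {"pgrowth": -0.393, "fertrate": 0.965, "poverty": 0.280}

# Rank predictors by absolute beta, ignoring the sign.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
most_important = ranked[0]

print(ranked, most_important)
```

Note that pgrowth ranks second even though its coefficient is negative, because only the magnitude matters for importance.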
34
SW388R6 Data Analysis and Computers I Slide 34 Solving the problem with SPSS: Answering the question The findings for this problem state that: "Population growth rate" [pgrowth],"total fertility rate" [fertrate] and "percent of the population below poverty line" [poverty] significantly predicted "infant mortality rate" [infmort]. The relationship was strong and reduced the error in predicting "infant mortality rate" by approximately 75% (R² = 0.753, F(3, 91) = 92.67, p <.001). "Population growth rate" significantly predicted "infant mortality rate", ß = -0.393, t(91) = -4.04, p <.001. Higher values of "population growth rate" were inversely related to lower values of "infant mortality rate". "Total fertility rate" significantly predicted "infant mortality rate", ß = 0.965, t(91) = 8.90, p <.001. Higher values of "total fertility rate" were directly related to higher values of "infant mortality rate". "Percent of the population below poverty line" significantly predicted "infant mortality rate", ß = 0.280, t(91) = 4.41, p <.001. Higher values of "percent of the population below poverty line" were directly related to higher values of "infant mortality rate". "Total fertility rate" [fertrate] was the most important predictor of the value of "infant mortality rate" [infmort] compared to the other independent variables. All of the statements of findings are true, so the answer to the question is True with caution. The caution is added because we did not satisfy the required sample size.
35
SW388R6 Data Analysis and Computers I Slide 35 Logic for multiple regression: Level of measurement Measurement level of dependent variable? Nominal/Dichotomous: Inappropriate application of a statistic. Interval/Ordinal: continue. Measurement level of independent variable? Nominal: Inappropriate application of a statistic. Interval/Ordinal/Dichotomous: continue. Strictly speaking, the test requires an interval level variable. We will allow ordinal level variables with a caution.
36
SW388R6 Data Analysis and Computers I Slide 36 Logic for multiple regression: multicollinearity Compute linear regression including descriptive statistics. Tolerance for all independent variables ≥ 0.10? No: Inappropriate application of a statistic. Yes: continue.
37
SW388R6 Data Analysis and Computers I Slide 37 Logic for multiple regression: Sample size requirement Compute linear regression including descriptive statistics. Valid cases satisfy the computed requirement? No: caution added to any true findings. Yes: continue. The sample size requirement is the larger of: (the number of independent variables x 8) + 50, or the number of independent variables + 105. NOTE: violation of sample size requirements is a caution rather than an inappropriate application of a statistic.
38
SW388R6 Data Analysis and Computers I Slide 38 Logic for multiple regression: Significant, non-trivial overall relationship Probability for F-test for all coefficients less than or equal to alpha? No: False. Yes: continue. Effect size (Multiple R) is not trivial by Cohen’s scale, i.e. equal to or larger than 0.10? No: False. Yes: continue.
39
SW388R6 Data Analysis and Computers I Slide 39 Logic for multiple regression: Strength of overall relationship Strength of relationship correctly interpreted (Multiple R)? No: False. Yes: continue. Reduction in error correctly interpreted based on Multiple R²? No: False. Yes: continue.
40
SW388R6 Data Analysis and Computers I Slide 40 Logic for multiple regression: Significance and direction individual relationships Probability for t-test for B coefficient less than or equal to alpha? No: False. Yes: continue. Direction of relationship correctly interpreted based on B or Beta coefficient? No: False. Yes: continue. These steps must be repeated for each independent variable.
41
SW388R6 Data Analysis and Computers I Slide 41 Logic for multiple regression: Importance of individual predictors Predictor with largest absolute Beta identified as most important? No: False. Yes: continue. The statistics in the SPSS output match all of the statistics cited in the problem? No: False. Yes: True. Add caution if dependent or independent variable is ordinal or we do not meet sample size requirement.
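The decision logic on slides 35 through 41 can be condensed into one function. This is a sketch of the flow as described, not the course's grading code: the function name, its parameters, and the collapsing of the individual significance, strength, direction, and importance checks into a single `findings_correct` flag are all simplifying assumptions.

```python
def answer_problem(levels_valid, has_ordinal, tolerances, valid_cases, findings_correct):
    """Return the homework answer implied by the slides' decision logic."""
    # Level of measurement or multicollinearity violations rule out the statistic.
    if not levels_valid:
        return "Incorrect application of a statistic"
    if any(t < 0.10 for t in tolerances):
        return "Incorrect application of a statistic"
    # Any incorrect statement about significance, strength, direction,
    # importance, or the cited statistics makes the answer False.
    if not findings_correct:
        return "False"
    # Ordinal variables or too few cases only add a caution.
    k = len(tolerances)
    required = max(8 * k + 50, k + 105)
    if has_ordinal or valid_cases < required:
        return "True with caution"
    return "True"

# The sample problem: interval variables, tolerances from the SPSS output,
# 95 valid cases, and all statements of findings correct.
result = answer_problem(
    levels_valid=True,
    has_ordinal=False,
    tolerances=[0.287, 0.230, 0.673],
    valid_cases=95,
    findings_correct=True,
)
print(result)
```

Applied to the sample problem, the function reaches the same answer as slide 34 because only the sample size requirement is unmet.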