Copyright © 2013, 2009, and 2007, Pearson Education, Inc.
Chapter 13 Multiple Regression, Section 13.1: Using Several Variables to Predict a Response

Regression Models: The model that contains only two variables, x and y, is called a bivariate model.

Regression Models: Suppose there are two predictors, denoted by x₁ and x₂. The model μ_y = α + β₁x₁ + β₂x₂ is called a multiple regression model.

Multiple Regression Model: The multiple regression model relates the mean μ_y of a quantitative response variable y to a set of explanatory variables x₁, x₂, …

Multiple Regression Model: For three explanatory variables, the multiple regression equation is μ_y = α + β₁x₁ + β₂x₂ + β₃x₃.

Multiple Regression Model: The sample prediction equation with three explanatory variables is ŷ = a + b₁x₁ + b₂x₂ + b₃x₃.

Example: Predicting Selling Price Using House Size and Number of Bedrooms. The data set "house selling prices" contains observations on 100 home sales in Florida in November 2003. A multiple regression analysis was done with selling price as the response variable and house size and number of bedrooms as the explanatory variables.

Example (continued): Output from the analysis (Table 13.3, Regression of Selling Price on House Size and Bedrooms) gives the regression equation price = 60,102 + 63.0(house size) + 15,170(bedrooms).

Example (continued): The prediction equation is ŷ = 60,102 + 63.0x₁ + 15,170x₂, where y = selling price, x₁ = house size, and x₂ = number of bedrooms.

Example (continued): One house listed in the data set had house size x₁ = 1679 square feet and x₂ = 3 bedrooms. Its predicted selling price is ŷ = 60,102 + 63.0(1679) + 15,170(3) = $211,389.

Example (continued): Its residual is y − ŷ = 232,500 − 211,389 = $21,111. The residual tells us that the actual selling price was $21,111 higher than predicted.
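To make the arithmetic concrete, here is a minimal Python sketch of the prediction and residual computations above. The equation comes from the slides; the actual selling price of $232,500 is inferred from the stated residual, and the function name is our own.

```python
# Sketch: predicted selling price and residual for the example house.
b0, b1, b2 = 60_102, 63.0, 15_170      # intercept, per-sq-ft, per-bedroom

def predict_price(size_sqft, bedrooms):
    """Prediction equation: y-hat = 60,102 + 63.0*size + 15,170*bedrooms."""
    return b0 + b1 * size_sqft + b2 * bedrooms

y_hat = predict_price(1679, 3)         # 211,389
residual = 232_500 - y_hat             # 21,111: house sold above prediction
print(f"predicted: {y_hat:,.0f}  residual: {residual:,.0f}")
```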

The Number of Explanatory Variables: You should not use many explanatory variables in a multiple regression model unless you have lots of data. A rough guideline is that the sample size n should be at least 10 times the number of explanatory variables. For example, to use two explanatory variables, you should have at least n = 20.

Plotting Relationships: Always look at the data before doing a multiple regression. Most software can construct a scatterplot matrix: a single graph containing a scatterplot for each pair of variables.

Plotting Relationships: Figure 13.1 shows a scatterplot matrix for selling price, house size, and number of bedrooms. The middle plot in the top row has house size on the x-axis and selling price on the y-axis. The first plot in the second row reverses this, with selling price on the x-axis and house size on the y-axis. Question: Why are the plots of main interest the ones in the first row?
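As an illustration, a scatterplot matrix can be drawn with pandas. The data below are simulated stand-ins (the Florida data file is not reproduced in these slides), so only the plotting pattern should be taken from this sketch.

```python
# Sketch: scatterplot matrix for price, house size, and bedrooms.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
size = rng.uniform(1000, 3000, 100)                 # square feet
beds = rng.integers(2, 5, 100)                      # bedrooms
price = 60_102 + 63.0 * size + 15_170 * beds + rng.normal(0, 30_000, 100)
df = pd.DataFrame({"price": price, "size": size, "bedrooms": beds})

pd.plotting.scatter_matrix(df, figsize=(6, 6))      # one panel per variable pair
plt.show()
```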

Interpretation of Multiple Regression Coefficients: The simplest way to interpret a multiple regression equation is to look at it in two dimensions as a function of a single explanatory variable, fixing the values of the other explanatory variable(s).

Interpretation (continued): Example using the housing data: suppose we fix the number of bedrooms at x₂ = 3. The prediction equation becomes ŷ = 60,102 + 63.0x₁ + 15,170(3) = 105,612 + 63.0x₁.

Interpretation (continued): Since the slope coefficient of x₁ is 63.0, the predicted selling price increases, for houses with this number of bedrooms, by $63.00 for every additional square foot of house size. For a 100 square-foot increase in house size, the predicted selling price increases by 100(63.00) = $6,300.

Summarizing the Effect While Controlling for a Variable: The multiple regression model assumes that the slope for a particular explanatory variable is identical for all fixed values of the other explanatory variables.

Controlling (continued): For example, the coefficient of x₁ in the prediction equation ŷ = 60,102 + 63.0x₁ + 15,170x₂ is 63.0 regardless of whether we plug in x₂ = 1, x₂ = 2, or x₂ = 3.

Controlling (continued): Figure 13.2 shows the relationship between ŷ and x₁ for the multiple regression equation when the number of bedrooms x₂ = 1, 2, or 3. Question: The lines move upward (to higher ŷ-values) as x₂ increases. How would you interpret this fact?

Slopes in Multiple Regression and in Bivariate Regression: In multiple regression, a slope describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model. Bivariate regression has only a single explanatory variable, so a slope in bivariate regression describes the effect of that variable while ignoring all other possible explanatory variables.

Importance of Multiple Regression: One of the main uses of multiple regression is to identify potential lurking variables and control for them by including them as explanatory variables in the model. Doing so can have a major impact on a variable's apparent effect. When we control for a variable, we keep that variable from influencing the associations among the other variables in the study.

Chapter 13 Multiple Regression, Section 13.2: Extending the Correlation and R-Squared for Multiple Regression

Multiple Correlation: To summarize how well a multiple regression model predicts y, we analyze how well the observed y values correlate with the predicted values ŷ. The multiple correlation, denoted by R, is the correlation between the observed y values and the predicted values ŷ.

Multiple Correlation (continued): For each subject, the regression equation provides a predicted value ŷ, so each subject has an observed y-value and a predicted ŷ-value. Table 13.4 shows selling prices and their predicted values for the two home sales listed in Table 13.1; the predictors are x₁ = house size and x₂ = number of bedrooms.

Multiple Correlation (continued): The correlation computed between all pairs of observed y-values and predicted ŷ-values is the multiple correlation, R. The larger the multiple correlation, the better the predictions of y by the set of explanatory variables.

Multiple Correlation (continued): The R-value always falls between 0 and 1. In this way, the multiple correlation R differs from the bivariate correlation r between y and a single variable x, which falls between −1 and +1.
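A short sketch of the definition: fit a model by least squares, then correlate observed values with fitted values. The data here are simulated, not the slides' data set.

```python
# Sketch: the multiple correlation R = corr(y, y-hat).
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(1, 3, n), rng.integers(2, 5, n)])
y = X @ np.array([60.1, 63.0, 15.2]) + rng.normal(0, 40, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
y_hat = X @ beta
R = np.corrcoef(y, y_hat)[0, 1]                # always between 0 and 1
print(round(R, 3))
```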

R-squared: For predicting y, the square of R describes the relative improvement from using the prediction equation instead of using the sample mean ȳ. The error in using the prediction equation to predict y is summarized by the residual sum of squares, SSE = Σ(y − ŷ)².

R-squared (continued): The error in using ȳ to predict y is summarized by the total sum of squares, TSS = Σ(y − ȳ)².

R-squared (continued): The proportional reduction in error is R² = (TSS − SSE)/TSS.

R-squared (continued): The better the predictions from the regression equation, the larger R² is. For multiple regression, R² is the square of the multiple correlation R.
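A tiny numeric sketch of the proportional-reduction-in-error formula, with made-up observed and fitted values:

```python
# Sketch: R^2 = (TSS - SSE) / TSS from sums of squares.
import numpy as np

y = np.array([145.0, 210.0, 180.0, 320.0, 250.0])      # toy observed values
y_hat = np.array([150.0, 200.0, 190.0, 300.0, 255.0])  # toy fitted values

sse = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares around y-bar
r_squared = (tss - sse) / tss        # proportional reduction in error
print(round(r_squared, 3))
```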

Example: Predicting House Selling Prices. For the 200 observations on y = selling price, x₁ = house size, and x₂ = number of bedrooms, an ANOVA (analysis of variance) table was created (Table 13.5: ANOVA table and R-squared for predicting house selling price, in thousands of dollars, using house size, in thousands of square feet, and number of bedrooms). The table displays the sums of squares in the SS column.

Example (continued): The R² value can be computed from the sums of squares in the table: R² = (TSS − SSE)/TSS = 0.52.

Example (continued): Using house size and number of bedrooms together to predict selling price reduces the prediction error by 52%, relative to using ȳ alone to predict selling price.

Example (continued): Find and interpret the multiple correlation. R = √R² = √0.52 = 0.72. There is a moderately strong association between the observed and the predicted selling prices. House size and number of bedrooms are very helpful in predicting selling prices.

Example (continued): If we used a bivariate regression model to predict selling price with house size as the predictor, r² would be 0.51. If we used a bivariate regression model with number of bedrooms as the predictor, r² would be 0.1.

Example (continued): The multiple regression model has R² = 0.52, a similar value to using only house size as a predictor. There is clearly more to this prediction than using only one variable in the model, and interpretation of results is important: larger lot sizes in this area could mean older homes with smaller house size or fewer bedrooms or bathrooms.

Example (continued): Table 13.6 shows R² values for several multiple regression models for y = house selling price.

Example (continued): Although R² goes up by only small amounts after house size is in the model, this does not mean that the other predictors are only weakly correlated with selling price. Because the predictors are themselves highly correlated, once one or two of them are in the model, the remaining ones don't help much in adding to the predictive power. For instance, lot size is highly positively correlated with number of bedrooms and with size of house. So, once number of bedrooms and size of house are included as predictors in the model, there's not much benefit to including lot size as an additional predictor.

Properties of R²: The previous example showed that R² for the multiple regression model was larger than r² for a bivariate model using only one of the explanatory variables. A key property of R² is that it cannot decrease when predictors are added to a model.

Properties of R² (continued):  R² falls between 0 and 1.  The larger the value, the better the explanatory variables collectively predict y.  R² = 1 only when all residuals are 0, that is, when all regression predictions are perfect.  R² = 0 when the correlation between y and each explanatory variable equals 0.

Properties of R² (continued):  R² gets larger, or at worst stays the same, whenever an explanatory variable is added to the multiple regression model.  The value of R² does not depend on the units of measurement.

Chapter 13 Multiple Regression, Section 13.3: Using Multiple Regression to Make Inferences

Inferences about the Population: Assumptions required when using a multiple regression model to make inferences about the population:  The regression equation truly holds for the population means. This implies that there is a straight-line relationship between the mean of y and each explanatory variable, with the same slope at each value of the other predictors.

Inferences about the Population (continued):  The data were gathered using randomization.  The response variable y has a normal distribution at each combination of values of the explanatory variables, with the same standard deviation.

Inferences about Individual Regression Parameters: Consider a particular parameter, β₁.  If β₁ = 0, the mean of y is identical for all values of x₁, at fixed values of the other explanatory variables.  So H₀: β₁ = 0 states that y and x₁ are statistically independent, controlling for the other variables.  This means that once the other explanatory variables are in the model, it doesn't help to have x₁ in the model.

SUMMARY: Significance Test about a Multiple Regression Parameter. 1. Assumptions:  Each explanatory variable has a straight-line relation with μ_y, with the same slope for all combinations of values of the other predictors in the model.  Data gathered using randomization.  Normal distribution for y with the same standard deviation at each combination of values of the other predictors in the model.

2. Hypotheses: H₀: β₁ = 0, Ha: β₁ ≠ 0.  When H₀ is true, y is independent of x₁, controlling for the other predictors.

3. Test statistic: t = (b₁ − 0)/se, where se is the standard error of b₁.

4. P-value: Two-tail probability from the t-distribution of values larger than the observed t test statistic (in absolute value). The t-distribution has df = n − (number of parameters in the regression equation).

5. Conclusion: Interpret the P-value in context; compare to the significance level if a decision is needed.
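A sketch of steps 3 and 4 in Python, using the age coefficient from the athletes example that follows. The standard error of 0.648 is an assumption back-calculated from the reported t statistic and P-value, not a number shown in these slides.

```python
# Sketch: t statistic and two-sided P-value for H0: beta = 0.
from scipy import stats

b, se = -0.96, 0.648        # slope estimate and (assumed) standard error
n, n_params = 64, 4         # 64 athletes; intercept + 3 slopes
t = (b - 0) / se            # about -1.48
df = n - n_params           # df = 60
p_value = 2 * stats.t.sf(abs(t), df)
print(round(t, 2), round(p_value, 3))   # about -1.48 and 0.144
```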

Example: What Helps Predict a Female Athlete's Weight? The "College Athletes" data set comes from a study of 64 University of Georgia female athletes. The study measured several physical characteristics, including total body weight in pounds (TBW), height in inches (HGT), percent body fat (%BF), and age.

Example (continued): Table 13.10 shows the results of fitting a multiple regression model for predicting weight using the other variables (HGT = height, %BF = body fat, and age of subject).

Example (continued): The prediction equation is ŷ = −97.7 + 3.43(HGT) + 1.36(%BF) − 0.96(AGE). Interpret the effect of age on weight in this equation.

Example (continued): The slope coefficient of age is −0.96. For athletes having fixed values of HGT and %BF, the predicted weight decreases by 0.96 pounds for each 1-year increase in age. Note that the ages vary only between 17 and 23.

Example (continued): Run a hypothesis test to determine whether age helps to predict weight, given that you already know height and percent body fat. 1. Assumptions: The 64 female athletes were a convenience sample, not a random sample, so caution should be taken when making inferences about all female college athletes.

Example (continued): 2. Hypotheses: H₀: β₃ = 0, Ha: β₃ ≠ 0. 3. Test statistic: t = (b₃ − 0)/se = −0.96/0.648 = −1.48.

Example (continued): 4. P-value: reported in the output as 0.144. 5. Conclusion: The P-value of 0.144 does not give much evidence against the null hypothesis that β₃ = 0. Age may not significantly help predict weight if we already know height and percent body fat.

Confidence Interval for a Multiple Regression Parameter: A 95% confidence interval for a slope parameter β in multiple regression is b ± t.₀₂₅(se). The t-score has df = n − (number of parameters in the model). The assumptions are the same as for the t test.

CI (continued): Construct and interpret a 95% CI for β₃, the effect of age while controlling for height and percent body fat: b₃ ± t.₀₂₅(se) = −0.96 ± 2.00(0.648), which is (−2.26, 0.34), using df = 64 − 4 = 60.

CI (continued): At fixed values of x₁ and x₂, we infer that the population mean of weight changes very little (and maybe not at all) for a 1-year increase in age.  The confidence interval contains 0.  Age may have no effect on weight, once we control for height and percent body fat.
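The interval can be checked numerically; se = 0.648 is again the assumed standard error from the t-test sketch above.

```python
# Sketch: 95% CI for the age slope, b +/- t.025 * se.
from scipy import stats

b, se = -0.96, 0.648
df = 64 - 4                              # n minus number of parameters
t_crit = stats.t.ppf(0.975, df)          # about 2.00
ci = (b - t_crit * se, b + t_crit * se)
print(tuple(round(v, 2) for v in ci))    # about (-2.26, 0.34)
```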

Estimating Variability Around the Regression Equation: A standard deviation parameter, σ, describes variability of the observations around the regression equation. Its sample estimate is s = √(SSE/df) = √(Σ(y − ŷ)²/(n − number of parameters)).

Example: What Helps Predict a Female Athlete's Weight? Table 13.7 shows the ANOVA table for the multiple regression analysis of the "College Athletes" data set.

Example (continued): For female athletes at particular values of height, percent body fat, and age, estimate the standard deviation of their weights. Begin by finding the Mean Square Error: MSE = SSE/(64 − 4) = 102.2. Notice that this value (102.2) appears in the MS column of the ANOVA table.

Example (continued): The standard deviation is s = √102.2 = 10.1. This value is also displayed in the ANOVA table. For athletes with fixed values of height, percent body fat, and age, the weights vary with a standard deviation of about 10 pounds.
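A one-line check of the arithmetic:

```python
# Sketch: residual standard deviation s = sqrt(MSE).
import math

mse = 102.2              # Mean Square Error from the ANOVA table
s = math.sqrt(mse)       # about 10.1 pounds
print(round(s, 1))
```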

Example (continued): Insight: If the conditional distributions of weight are approximately bell-shaped, about 95% of the weight values fall within about 2s = 20 pounds of the true regression equation.

The Collective Effect of Explanatory Variables: Do the explanatory variables collectively have a statistically significant effect on y? With 3 predictors in a model, we can check this by testing H₀: β₁ = β₂ = β₃ = 0 against Ha: at least one β parameter differs from 0.

The Collective Effect (continued): The test statistic for H₀ is denoted by F. It equals the ratio of the mean squares from the ANOVA table: F = (Mean square for regression)/(Mean square error).

The Collective Effect (continued):  When H₀ is true, the expected value of the F test statistic is approximately 1.  When H₀ is false, F tends to be larger than 1.  The larger the F test statistic, the stronger the evidence against H₀.

SUMMARY: F Test That All Beta Parameters = 0. 1. Assumptions:  Multiple regression equation holds  Data gathered randomly  Normal distribution for y with the same standard deviation at each combination of predictors

2. Hypotheses: H₀: β₁ = β₂ = … = 0, Ha: at least one β parameter differs from 0. 3. Test statistic: F = (Mean square for regression)/(Mean square error).

4. P-value: Right-tail probability above the observed F test statistic value, from the F distribution with:  df1 = number of explanatory variables  df2 = n − (number of parameters in regression equation)

5. Conclusion: The smaller the P-value, the stronger the evidence that at least one explanatory variable has an effect on y.  If a decision is needed, reject H₀ if P-value ≤ significance level, such as 0.05.
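A sketch of step 4 with scipy; the F value and degrees of freedom anticipate the athletes example that follows.

```python
# Sketch: right-tail P-value for the F test that all betas = 0.
from scipy import stats

F = 40.5                   # MS(Regression) / MS(Error) from the output
df1, df2 = 3, 64 - 4       # number of predictors; n - number of parameters
p_value = stats.f.sf(F, df1, df2)
print(f"{p_value:.2e}")    # essentially 0, reported as 0.000
```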

Example: What Helps Predict a Female Athlete's Weight? For the 64 female college athletes, the regression model for predicting y = weight using x₁ = height, x₂ = percent body fat, and x₃ = age is summarized in the ANOVA table on the next slide.

Example (continued): Table 13.7 shows the ANOVA table for the multiple regression analysis of the "College Athletes" data set.

Example (continued): Use the output in the ANOVA table to test H₀: β₁ = β₂ = β₃ = 0 against Ha: at least one β parameter differs from 0.

Example (continued):  The observed F statistic is 40.5.  The corresponding P-value is 0.000.  We can reject H₀ at the 0.05 significance level. In summary, we conclude that at least one predictor has an effect on weight.

Example (continued): Insight:  The F test tells us that at least one explanatory variable has an effect.  If the explanatory variables are chosen sensibly, at least one should have some predictive power.  The F test result tells us whether there is sufficient evidence to make it worthwhile to consider the individual effects, using t tests.

Example (continued): The individual t tests identify which of the variables are significant, controlling for the other variables (Table 13.10: Multiple Regression Analysis for Predicting Weight; predictors are HGT = height, %BF = body fat, and age of subject).

Example (continued):  If a variable turns out not to be significant, it can be removed from the model.  In this example, age can be removed from the model.

Chapter 13 Multiple Regression, Section 13.4: Checking a Regression Model Using Residual Plots

Assumptions for Inference with a Multiple Regression Model: 1. The regression equation approximates well the true relationship between the predictors and the mean of y. 2. The data were gathered randomly. 3. y has a normal distribution with the same standard deviation at each combination of predictors.

Checking Shape and Detecting Unusual Observations: To check Assumption 3 (the conditional distribution of y is normal at any fixed values of the explanatory variables), construct a histogram of the standardized residuals.  The histogram should be approximately bell-shaped.  Nearly all the standardized residuals should fall between −3 and +3; any residual outside these limits is a potential outlier.
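A sketch of this check with statsmodels on simulated data (the house price file itself is not included in the slides):

```python
# Sketch: histogram of standardized residuals after an OLS fit.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 100
X = sm.add_constant(np.column_stack([rng.uniform(1, 3, n), rng.uniform(5, 30, n)]))
y = X @ np.array([60.0, 66.0, 2.0]) + rng.normal(0, 30, n)

fit = sm.OLS(y, X).fit()
std_resid = fit.get_influence().resid_studentized_internal  # standardized residuals
plt.hist(std_resid, bins=15, edgecolor="black")  # expect a bell shape within (-3, 3)
plt.show()
```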

Example: House Selling Price. For the house selling price data, a MINITAB histogram of the standardized residuals for the multiple regression model predicting selling price by house size and lot size is displayed on the following slide.

Example (continued): Figure 13.4 shows the histogram of standardized residuals for the multiple regression model predicting selling price. Question: Give an example of a shape for this histogram that would indicate that a few observations are highly unusual.

Example (continued):  The residuals are roughly bell-shaped about 0.  They fall mostly between about −3 and +3.  No severe nonnormality is indicated.

Plotting Residuals against Each Explanatory Variable:  Plots of residuals against each explanatory variable help us check for potential problems with the regression model.  Ideally, the residuals should fluctuate randomly about the horizontal line at 0.  There should be no obvious change in trend or change in variation as the values of the explanatory variable increase.

Plotting Residuals (continued): Figure 13.5 shows possible patterns for residuals plotted against an explanatory variable x. Question: Why does the pattern in (b) suggest that the effect of x is not linear?
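The residual-versus-predictor plot can be sketched the same way; a patternless horizontal band around 0 supports the straight-line model.

```python
# Sketch: residuals plotted against one explanatory variable.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(1, 3, 100)                   # e.g., house size
y = 60 + 66 * x + rng.normal(0, 30, 100)
fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(x, fit.resid)                    # raw residuals vs predictor
plt.axhline(0, color="red")
plt.xlabel("x (house size)")
plt.ylabel("residual")
plt.show()
```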

Chapter 13 Multiple Regression, Section 13.5: Regression and Categorical Predictors

Indicator Variables: Regression models can represent the categories of a categorical explanatory variable using artificial variables, called indicator variables.  The indicator variable for a particular category is binary.  It equals 1 if the observation falls into that category and 0 otherwise.

Indicator Variables (continued): In the house selling prices data set, the condition of the house is a categorical variable, measured with categories (good, not good). The indicator variable x for condition is x = 1 if the house is in good condition and x = 0 if the house is not in good condition.

Indicator Variables (continued): The regression model is then μ_y = α + βx, with x as just defined. Substituting the possible values 1 and 0 for x gives μ_y = α + β for homes in good condition and μ_y = α for homes not in good condition. The difference between the mean selling prices is (α + β) − α = β, so the coefficient β of the indicator variable x is the difference between the mean selling prices for homes in good condition and for homes not in good condition.
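A sketch of fitting such a model with the statsmodels formula interface and simulated data; all column names and coefficient values here are hypothetical.

```python
# Sketch: regression with a 0/1 indicator for house condition.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 100
size = rng.uniform(1.0, 3.0, n)            # thousands of square feet
good = rng.integers(0, 2, n)               # 1 = good condition, 0 = not
price = 96 + 66.5 * size + 10 * good + rng.normal(0, 30, n)
df = pd.DataFrame({"price": price, "size": size, "good": good})

fit = smf.ols("price ~ size + good", data=df).fit()
print(fit.params)   # 'good' coefficient = difference in mean price at fixed size
```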

Example: Including Condition in Regression for House Selling Price. Table 13.11 shows output from the regression of y = selling price on x₁ = house size and x₂ = the indicator variable for condition (good, not good).

Example (continued): 1. Find and plot the lines showing how predicted selling price varies as a function of house size, for homes in good condition or not in good condition. 2. Interpret the coefficient of the indicator variable for condition.

Example (continued): The regression equation from the MINITAB output has the form ŷ = b₀ + 66.5x₁ + b₂x₂, where the house-size slope is 66.5 and the intercept b₀ and condition coefficient b₂ are read from Table 13.11.

Example (continued):  For homes not in good condition, x₂ = 0.  The prediction equation then simplifies to ŷ = b₀ + 66.5x₁.

Example (continued):  For homes in good condition, x₂ = 1.  The prediction equation then simplifies to ŷ = (b₀ + b₂) + 66.5x₁.

Example (continued): Figure 13.7 plots the equation relating ŷ = predicted selling price to x₁ = house size, according to x₂ = condition (1 = good, 0 = not good). Question: Why are the lines parallel?

Example (continued): Both lines have the same slope, 66.5.  The line for homes in good condition is above the other line (not good) because its y-intercept is larger. This means that for any fixed value of house size, the predicted selling price is higher for homes in better condition.  However, the P-value of 0.453 for the test of the coefficient of the indicator variable suggests that this difference is not statistically significant.

Is There Interaction? For two explanatory variables, interaction exists between them in their effects on the response variable when the slope of the relationship between μ_y and one of them changes as the value of the other changes.

Example: Interaction in Effects on House Selling Price. Suppose the actual population relationship between house size and the mean selling price has a steeper slope for homes in good condition than for homes not in good condition.  Then the slope for the effect of x₁ differs for the two conditions, and there is interaction between house size and condition in their effects on selling price. See Figure 13.8 on the next slide.

Example (continued): Figure 13.8 shows an example of interaction: there is a larger slope between selling price and house size for homes in good condition than for homes in other conditions.

Example (continued): How can you allow for interaction when you do a regression analysis?  With two explanatory variables, one quantitative and one categorical, you can fit a separate regression line, with a different slope between the response and the quantitative variable, for each category of the categorical variable, as sketched below.
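Reusing df from the indicator sketch above: in the statsmodels formula interface, size * good expands to both main effects plus their product, and the product term's coefficient is the difference in slopes between the two conditions.

```python
# Sketch: allow a different house-size slope for each condition.
fit_int = smf.ols("price ~ size * good", data=df).fit()
print(fit_int.params)   # 'size:good' = slope difference (interaction)
```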

Chapter 13 Multiple Regression, Section 13.6: Modeling a Categorical Response

Modeling a Categorical Response Variable: The regression models studied so far are designed for a quantitative response variable y. When y is categorical, a different regression model applies, called logistic regression.

Examples of Logistic Regression:  A voter's choice in an election (Democrat or Republican), with explanatory variables annual income, political ideology, religious affiliation, and race.  Whether a credit card holder pays their bill on time (yes or no), with explanatory variables family income and the number of months in the past year that the customer paid the bill on time.

The Logistic Regression Model:  Denote the possible outcomes for y as 0 and 1.  Use the generic terms failure (for outcome = 0) and success (for outcome = 1).  The population mean of the 0/1 scores equals the population proportion of '1' outcomes (successes); that is, μ_y = p.  The proportion p also represents the probability that a randomly selected subject has a successful outcome.

The Logistic Regression Model (continued):  A straight-line model is usually inadequate for modeling a probability.  A more realistic model has a curved S-shape instead of a straight-line trend.  The regression equation that models this S-shaped curve is known as the logistic regression equation.

The Logistic Regression Model (continued): Figure 13.10 shows two possible regressions for a probability p of a binary response variable; a straight line is usually less appropriate than an S-shaped curve. Question: Why is the straight-line regression model for a binary response variable often poor?

The Logistic Regression Model (continued): A regression equation for an S-shaped curve for the probability of success p is p = e^(α+βx)/(1 + e^(α+βx)). This equation for p is called the logistic regression equation. Logistic regression is used when the response variable has only two possible outcomes (it's binary).

Example: Travel Credit Cards. An Italian study with 100 randomly selected Italian adults considered factors that are associated with whether a person possesses at least one travel credit card. Table 13.12 on the next slide shows results for the first 15 people on this response variable and on the person's annual income (in thousands of euros).

Example (continued): Table 13.12 lists annual income (in thousands of euros) and whether the person possesses a travel credit card; the response y equals 1 if a person has a travel credit card and 0 otherwise.

Example (continued): Let x = annual income and let y = whether the person possesses a travel credit card (1 = yes, 0 = no). Table 13.13 shows the software output for the logistic regression analysis of the Italian credit card data.

Example (continued): Substituting the estimates of α and β into the logistic regression formula yields p̂ = e^(−3.52+0.105x)/(1 + e^(−3.52+0.105x)).

Example (continued): Find the estimated probability of possessing a travel credit card at the lowest and highest annual income levels in the sample, which were x = 12 and x = 65.

Example (continued): For x = 12 thousand euros, the estimated probability of possessing a travel credit card is p̂ = e^(−3.52+0.105(12))/(1 + e^(−3.52+0.105(12))) = e^(−2.26)/(1 + e^(−2.26)) = 0.09.

Example (continued): For x = 65 thousand euros, the estimated probability of possessing a travel credit card is p̂ = e^(−3.52+0.105(65))/(1 + e^(−3.52+0.105(65))) = e^(3.305)/(1 + e^(3.305)) = 0.97.
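Both computations can be scripted. The estimates α = −3.52 and β = 0.105 are the rounded values consistent with the probabilities quoted above; with this rounding the x = 65 result comes out 0.96 rather than 0.97.

```python
# Sketch: logistic curve p = e^(a+bx) / (1 + e^(a+bx)).
import math

def logistic_p(x, a=-3.52, b=0.105):   # rounded estimates (assumed)
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

print(round(logistic_p(12), 2))   # about 0.09
print(round(logistic_p(65), 2))   # about 0.96 (0.97 with unrounded estimates)
```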

Example (continued): Insight:  Annual income has a strong positive effect on having a travel credit card.  The estimated probability of having a travel credit card changes from 0.09 to 0.97 as annual income changes over its range.

Example: Estimating the Proportion of Students Who've Used Marijuana. A three-variable contingency table from a survey of senior high-school students is shown on the next slide. The students were asked whether they had ever used alcohol, cigarettes, or marijuana. We treat marijuana use as the response variable and cigarette use and alcohol use as explanatory variables.

Example (continued): Table 13.14 shows alcohol, cigarette, and marijuana use for high school seniors.

Example (continued):  Let y indicate marijuana use, coded (1 = yes, 0 = no).  Let x₁ be an indicator variable for alcohol use, coded (1 = yes, 0 = no).  Let x₂ be an indicator variable for cigarette use, coded (1 = yes, 0 = no).

Example (continued): Table 13.15 shows MINITAB output for estimating the probability of marijuana use based on alcohol use and cigarette use.

Example (continued): The logistic regression prediction equation is p̂ = e^(−5.31+2.99x₁+2.85x₂)/(1 + e^(−5.31+2.99x₁+2.85x₂)).

Example (continued): For those who have not used alcohol or cigarettes, x₁ = x₂ = 0. For them, the estimated probability of marijuana use is p̂ = e^(−5.31)/(1 + e^(−5.31)) = 0.005.

Example (continued): For those who have used alcohol and cigarettes, x₁ = x₂ = 1. For them, the estimated probability of marijuana use is p̂ = e^(0.53)/(1 + e^(0.53)) = 0.63.
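The same check in code, treating the coefficients (−5.31 constant, 2.99 alcohol, 2.85 cigarettes) as the values read from the output; they reproduce the two probabilities above.

```python
# Sketch: estimated marijuana-use probability from the two indicators.
import math

def p_hat(x1, x2, a=-5.31, b1=2.99, b2=2.85):
    z = a + b1 * x1 + b2 * x2
    return math.exp(z) / (1 + math.exp(z))

print(round(p_hat(0, 0), 3))   # neither alcohol nor cigarettes: about 0.005
print(round(p_hat(1, 1), 2))   # both: about 0.63
```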

Example (continued): SUMMARY: The probability that students have tried marijuana seems to depend greatly on whether they've used alcohol and/or cigarettes.

