Agresti/Franklin Statistics, Chapter 12: Multiple Regression


1 Agresti/Franklin Statistics, 1 of 141 Chapter 12 Multiple Regression Learn… To use multiple regression analysis to predict a response variable using more than one explanatory variable.

2 Agresti/Franklin Statistics, 2 of 141 Section 12.1 How Can We Use Several Variables to Predict a Response?

3 Agresti/Franklin Statistics, 3 of 141 Regression Models The model that contains only two variables, x and y, is called a bivariate model.

4 Agresti/Franklin Statistics, 4 of 141 Regression Models The regression equation for the bivariate model is: µ_y = α + βx

5 Agresti/Franklin Statistics, 5 of 141 Regression Models Suppose there are two predictors, denoted by x₁ and x₂. This is called a multiple regression model.

6 Agresti/Franklin Statistics, 6 of 141 Regression Models The regression equation for this multiple regression model with two predictors is: µ_y = α + β₁x₁ + β₂x₂

7 Agresti/Franklin Statistics, 7 of 141 Multiple Regression Model The multiple regression model relates the mean µ_y of a quantitative response variable y to a set of explanatory variables x₁, x₂, …

8 Agresti/Franklin Statistics, 8 of 141 Multiple Regression Model Example: For three explanatory variables, the multiple regression equation is: µ_y = α + β₁x₁ + β₂x₂ + β₃x₃

9 Agresti/Franklin Statistics, 9 of 141 Multiple Regression Model Example: The sample prediction equation with three explanatory variables is: ŷ = a + b₁x₁ + b₂x₂ + b₃x₃

10 Agresti/Franklin Statistics, 10 of 141 Example: Predicting Selling Price Using House and Lot Size The "house selling prices" data set contains observations on 100 home sales in Florida in November 2003. A multiple regression analysis was done with selling price as the response variable and with house size and lot size as the explanatory variables.

11 Agresti/Franklin Statistics, 11 of 141 Example: Predicting Selling Price Using House and Lot Size Output from the analysis:

12 Agresti/Franklin Statistics, 12 of 141 Example: Predicting Selling Price Using House and Lot Size Prediction Equation: where y = selling price, x₁ = house size, and x₂ = lot size

13 Agresti/Franklin Statistics, 13 of 141 Example: Predicting Selling Price Using House and Lot Size One house listed in the data set had house size = 1240 square feet, lot size = 18,000 square feet, and selling price = $145,000. Find its predicted selling price:

14 Agresti/Franklin Statistics, 14 of 141 Example: Predicting Selling Price Using House and Lot Size Find its residual: the residual is the observed price minus the predicted price, y − ŷ. The residual tells us that the actual selling price was $37,724 higher than predicted, so the predicted selling price was $145,000 − $37,724 = $107,276.

15 Agresti/Franklin Statistics, 15 of 141 The Number of Explanatory Variables You should not use many explanatory variables in a multiple regression model unless you have lots of data. A rough guideline is that the sample size n should be at least 10 times the number of explanatory variables.
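
The rough sample-size guideline on this slide can be written as a one-line check. This is only a sketch: the function name `enough_data` and the `ratio` parameter are illustrative choices, not part of the slides.

```python
def enough_data(n_observations: int, n_predictors: int, ratio: int = 10) -> bool:
    """Rough guideline: n should be at least `ratio` times the number of predictors."""
    return n_observations >= ratio * n_predictors


# House selling price example from the slides: 100 sales, 2 predictors.
print(enough_data(100, 2))    # True
# A hypothetical model with 12 predictors would be too large for 100 sales.
print(enough_data(100, 12))   # False
```

The guideline is a rule of thumb, so a result of False is a warning to be cautious rather than a strict prohibition.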

16 Agresti/Franklin Statistics, 16 of 141 Plotting Relationships Always look at the data before doing a multiple regression. Most software has the option of constructing scatterplots on a single graph for each pair of variables. This is called a scatterplot matrix.

17 Agresti/Franklin Statistics, 17 of 141 Plotting Relationships

18 Agresti/Franklin Statistics, 18 of 141 Interpretation of Multiple Regression Coefficients The simplest way to interpret a multiple regression equation looks at it in two dimensions as a function of a single explanatory variable. We can look at it this way by fixing values for the other explanatory variable(s).

19 Agresti/Franklin Statistics, 19 of 141 Interpretation of Multiple Regression Coefficients Example using the housing data: Suppose we fix x₁ = house size at 2000 square feet. The prediction equation becomes:

20 Agresti/Franklin Statistics, 20 of 141 Interpretation of Multiple Regression Coefficients Since the slope coefficient of x₂ is 2.84, the predicted selling price for 2000-square-foot houses increases by $2.84 for every square-foot increase in lot size. For a 1000-square-foot increase in lot size, the predicted selling price of 2000-square-foot houses increases by 1000(2.84) = $2840.

21 Agresti/Franklin Statistics, 21 of 141 Interpretation of Multiple Regression Coefficients Example using the housing data: Suppose we fix x₂ = lot size at 30,000 square feet. The prediction equation becomes:

22 Agresti/Franklin Statistics, 22 of 141 Interpretation of Multiple Regression Coefficients Since the slope coefficient of x₁ is 53.8, the predicted selling price for houses with a lot size of 30,000 sq. ft. increases by $53.80 for every square-foot increase in house size.

23 Agresti/Franklin Statistics, 23 of 141 Interpretation of Multiple Regression Coefficients In summary, an increase of a square foot in house size has a larger impact on the selling price ($53.80) than an increase of a square foot in lot size ($2.84). We can compare slopes for these explanatory variables because their units of measurement are the same (square feet). Slopes cannot be compared when the units differ.
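
The slope interpretations for the housing data can be checked with a few lines of Python. This is a minimal sketch using only the two slopes quoted in the slides (53.8 and 2.84); the function name is illustrative, and the intercept is deliberately left out because it cancels when two predictions are subtracted.

```python
# Slopes quoted in the slides for the housing prediction equation.
SLOPE_HOUSE = 53.8   # dollars per square foot of house size (x1)
SLOPE_LOT = 2.84     # dollars per square foot of lot size (x2)


def predicted_price_change(d_house_sqft: float, d_lot_sqft: float) -> float:
    """Change in predicted selling price for given changes in x1 and x2.

    The intercept cancels when two predictions are subtracted,
    so it is not needed here.
    """
    return SLOPE_HOUSE * d_house_sqft + SLOPE_LOT * d_lot_sqft


# 1000 extra square feet of lot, house size held fixed.
print(round(predicted_price_change(0, 1000), 2))   # 2840.0
# 1 extra square foot of house, lot size held fixed.
print(round(predicted_price_change(1, 0), 2))      # 53.8
```

Because the model has no interaction term, the change does not depend on the value at which the other variable is held fixed.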

24 Agresti/Franklin Statistics, 24 of 141 Summarizing the Effect While Controlling for a Variable The multiple regression model assumes that the slope for a particular explanatory variable is identical for all fixed values of the other explanatory variables.

25 Agresti/Franklin Statistics, 25 of 141 Summarizing the Effect While Controlling for a Variable For example, the coefficient of x₁ in the prediction equation is 53.8 regardless of whether we plug in x₂ = 10,000, x₂ = 30,000, or x₂ = 50,000.

26 Agresti/Franklin Statistics, 26 of 141 Summarizing the Effect While Controlling for a Variable

27 Agresti/Franklin Statistics, 27 of 141 Slopes in Multiple Regression and in Bivariate Regression In multiple regression, a slope describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model.

28 Agresti/Franklin Statistics, 28 of 141 Slopes in Multiple Regression and in Bivariate Regression Bivariate regression has only a single explanatory variable. A slope in bivariate regression describes the effect of that variable while ignoring all other possible explanatory variables.

29 Agresti/Franklin Statistics, 29 of 141 Importance of Multiple Regression One of the main uses of multiple regression is to identify potential lurking variables and control for them by including them as explanatory variables in the model

30 Agresti/Franklin Statistics, 30 of 141 For all students at Walden Univ., the prediction equation for y = college GPA, x₁ = H.S. GPA, and x₂ = study time is: Find the predicted college GPA of a student who has a H.S. GPA of 3.5 and who studies 3 hrs. per day. a. 3.67 b. c. d. 3.4

31 Agresti/Franklin Statistics, 31 of 141 For all students at Walden Univ., the prediction equation for y = college GPA, x₁ = H.S. GPA, and x₂ = study time is: For students with fixed study time, what is the change in predicted college GPA when H.S. GPA increases from 3.0 to 4.0? a. 1.13 b. c. d. 1.00

32 Agresti/Franklin Statistics, 32 of 141 Section 12.2 Extending the Correlation and R-Squared for Multiple Regression

33 Agresti/Franklin Statistics, 33 of 141 Multiple Correlation To summarize how well a multiple regression model predicts y, we analyze how well the observed y values correlate with the predicted y values. The multiple correlation is the correlation between the observed y values and the predicted y values. It is denoted by R.

34 Agresti/Franklin Statistics, 34 of 141 Multiple Correlation For each subject, the regression equation provides a predicted value. Each subject has an observed y-value and a predicted y-value.

35 Agresti/Franklin Statistics, 35 of 141 Multiple Correlation The correlation computed between all pairs of observed y-values and predicted y-values is the multiple correlation, R. The larger the multiple correlation, the better are the predictions of y by the set of explanatory variables.

36 Agresti/Franklin Statistics, 36 of 141 Multiple Correlation The R-value always falls between 0 and 1. In this way, the multiple correlation R differs from the bivariate correlation r between y and a single variable x, which falls between -1 and +1.

37 Agresti/Franklin Statistics, 37 of 141 R-squared For predicting y, the square of R describes the relative improvement from using the prediction equation instead of using the sample mean, ȳ.

38 Agresti/Franklin Statistics, 38 of 141 R-squared The error in using the prediction equation to predict y is summarized by the residual sum of squares: Σ(y − ŷ)²

39 Agresti/Franklin Statistics, 39 of 141 R-squared The error in using ȳ to predict y is summarized by the total sum of squares: Σ(y − ȳ)²

40 Agresti/Franklin Statistics, 40 of 141 R-squared The proportional reduction in error is: R² = [Σ(y − ȳ)² − Σ(y − ŷ)²] / Σ(y − ȳ)²

41 Agresti/Franklin Statistics, 41 of 141 R-squared The better the predictions are using the regression equation, the larger R² is. For multiple regression, R² is the square of the multiple correlation, R.
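
The two descriptions of R-squared given in these slides (proportional reduction in error, and the square of the correlation between observed and predicted values) can be compared numerically. The sketch below uses a tiny made-up data set, not data from the slides; the predictions are the group means of two groups, which is a least squares fit, and the two descriptions agree only for least squares predictions. Function names are illustrative.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Proportional reduction in error: (total SS - residual SS) / total SS."""
    y_bar = sum(observed) / len(observed)
    total_ss = sum((y - y_bar) ** 2 for y in observed)
    residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return (total_ss - residual_ss) / total_ss


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: predictions are the means of two groups of two.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(round(r_squared(observed, predicted), 3))          # 0.8
print(round(correlation(observed, predicted) ** 2, 3))   # 0.8
```

Here the total sum of squares is 5 and the residual sum of squares is 1, so the prediction error is reduced by 80%.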

42 Agresti/Franklin Statistics, 42 of 141 Example: How Well Can We Predict House Selling Prices? For the 100 observations on y = selling price, x₁ = house size, and x₂ = lot size, a table called the ANOVA (analysis of variance) table was created. The table displays the sums of squares in the SS column.

43 Agresti/Franklin Statistics, 43 of 141 Example: How Well Can We Predict House Selling Prices? The R² value can be computed from the sums of squares in the table.

44 Agresti/Franklin Statistics, 44 of 141 Example: How Well Can We Predict House Selling Prices? Using house size and lot size together to predict selling price reduces the prediction error by 71%, relative to using ȳ alone to predict selling price.

45 Agresti/Franklin Statistics, 45 of 141 Example: How Well Can We Predict House Selling Prices? Find and interpret the multiple correlation: R = √0.71 ≈ 0.84. There is a strong association between the observed and the predicted selling prices. House size and lot size very much help us to predict selling prices.

46 Agresti/Franklin Statistics, 46 of 141 Example: How Well Can We Predict House Selling Prices? If we used a bivariate regression model to predict selling price with house size as the predictor, the r² value would be 0.58. If we used a bivariate regression model to predict selling price with lot size as the predictor, the r² value would be 0.51.

47 Agresti/Franklin Statistics, 47 of 141 Example: How Well Can We Predict House Selling Prices? The multiple regression model has R² = 0.71, so it provides better predictions than either bivariate model.

48 Agresti/Franklin Statistics, 48 of 141 Properties of R² The previous example showed that R² for the multiple regression model was larger than r² for a bivariate model using only one of the explanatory variables. A key property of R² is that it cannot decrease when predictors are added to a model.

49 Agresti/Franklin Statistics, 49 of 141 Properties of R² R² falls between 0 and 1. The larger the value, the better the explanatory variables collectively predict y. R² = 1 only when all residuals are 0, that is, when all regression predictions are perfect. R² = 0 when the correlation between y and each explanatory variable equals 0.

50 Agresti/Franklin Statistics, 50 of 141 Properties of R² R² gets larger, or at worst stays the same, whenever an explanatory variable is added to the multiple regression model. The value of R² does not depend on the units of measurement.

51 Agresti/Franklin Statistics, 51 of 141 R² Values for Various Multiple Regression Models

52 Agresti/Franklin Statistics, 52 of 141 R² Values for Various Multiple Regression Models The single predictor in the data set that is most strongly associated with y is the house's real estate tax assessment (r² = 0.679). When we add house size as a second predictor, R² goes up from 0.679. As other predictors are added, R² continues to go up, but not by much.

53 Agresti/Franklin Statistics, 53 of 141 R² Values for Various Multiple Regression Models R² does not increase much after a few predictors are in the model. When there are many explanatory variables but the correlations among them are strong, once you have included a few of them in the model, R² usually doesn't increase much more when you add additional ones.

54 Agresti/Franklin Statistics, 54 of 141 R² Values for Various Multiple Regression Models This does not mean that the additional variables are uncorrelated with the response variable. It merely means that they don't add much new power for predicting y, given the values of the predictors already in the model.

55 Agresti/Franklin Statistics, 55 of 141 In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age. Their correlations with total body weight were: Height: Percent Body fat: Age: Which explanatory variable gives by itself the best prediction of weight? a.Height b.Percent body fat c.Age

56 Agresti/Franklin Statistics, 56 of 141 In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age. Their correlations with total body weight were: Height: Percent Body fat: Age: With height as the sole predictor, what is r²? a. .745 b. .555 c. .625 d. .825

57 Agresti/Franklin Statistics, 57 of 141 In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age. Their correlations with total body weight were: Height: Percent Body fat: Age: If Percent Body Fat is added to the model, R² = … If Age is then added to the model, R² = 0.67. Once you know height and % body fat, does age seem to help in predicting weight? a. No b. Yes

58 Agresti/Franklin Statistics, 58 of 141 Section 12.3 How Can We Use Multiple Regression to Make Inferences?

59 Agresti/Franklin Statistics, 59 of 141 Inferences about the Population Assumptions required when using a multiple regression model to make inferences about the population: The regression equation truly holds for the population means. This implies that there is a straight-line relationship between the mean of y and each explanatory variable, with the same slope at each value of the other predictors.

60 Agresti/Franklin Statistics, 60 of 141 Inferences about the Population Assumptions required when using a multiple regression model to make inferences about the population: The data were gathered using randomization. The response variable y has a normal distribution at each combination of values of the explanatory variables, with the same standard deviation.

61 Agresti/Franklin Statistics, 61 of 141 Inferences about Individual Regression Parameters Consider a particular parameter, β₁. If β₁ = 0, the mean of y is identical for all values of x₁, at fixed values of the other explanatory variables. So, H₀: β₁ = 0 states that y and x₁ are statistically independent, controlling for the other variables. This means that once the other explanatory variables are in the model, it doesn't help to have x₁ in the model.

62 Agresti/Franklin Statistics, 62 of 141 Significance Test about a Multiple Regression Parameter 1. Assumptions: Each explanatory variable has a straight-line relation with µ_y, with the same slope for all combinations of values of other predictors in the model. Data gathered with randomization. Normal distribution for y with same standard deviation at each combination of values of other predictors in the model.

63 Agresti/Franklin Statistics, 63 of 141 Significance Test about a Multiple Regression Parameter 2. Hypotheses: H₀: β₁ = 0, Hₐ: β₁ ≠ 0. When H₀ is true, y is independent of x₁, controlling for the other predictors.

64 Agresti/Franklin Statistics, 64 of 141 Significance Test about a Multiple Regression Parameter 3. Test Statistic: t = (b₁ − 0)/se, where b₁ is the sample estimate of β₁ and se is its standard error

65 Agresti/Franklin Statistics, 65 of 141 Significance Test about a Multiple Regression Parameter 4. P-value: Two-tail probability from the t-distribution of values larger than the observed t test statistic (in absolute value). The t-distribution has: df = n − number of parameters in the regression equation

66 Agresti/Franklin Statistics, 66 of 141 Significance Test about a Multiple Regression Parameter 5. Conclusion: Interpret the P-value; compare to the significance level if a decision is needed.
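
Steps 3 and 4 of the test reduce to two small calculations: the t statistic and its degrees of freedom. The sketch below assumes the usual form t = (estimate − null value)/standard error; the function names and the estimate and standard error passed in are hypothetical illustration values, not output from the slides. Turning t into a P-value needs a t-distribution table or statistical software, which is not shown.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_error


def residual_df(n: int, n_parameters: int) -> int:
    """df = n minus the number of parameters in the regression equation."""
    return n - n_parameters


# Hypothetical estimate and standard error, for illustration only.
print(t_statistic(estimate=1.5, std_error=0.5))   # 3.0

# 64 observations and 4 parameters (intercept plus three slopes).
print(residual_df(n=64, n_parameters=4))          # 60
```

Counting the intercept as a parameter is what makes df = 60 rather than 61 for a three-predictor model fitted to 64 observations.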

67 Agresti/Franklin Statistics, 67 of 141 Example: What Helps Predict a Female Athlete's Weight? The College Athletes data set comes from a study of 64 University of Georgia female athletes. The study measured several physical characteristics, including total body weight in pounds (TBW), height in inches (HGT), the percent of body fat (%BF), and age.

68 Agresti/Franklin Statistics, 68 of 141 Example: What Helps Predict a Female Athlete's Weight? The results of fitting a multiple regression model for predicting weight using the other variables:

69 Agresti/Franklin Statistics, 69 of 141 Example: What Helps Predict a Female Athlete's Weight? Interpret the effect of age on weight in the multiple regression equation:

70 Agresti/Franklin Statistics, 70 of 141 Example: What Helps Predict a Female Athlete's Weight? The slope coefficient of age is -0.96. For athletes having fixed values for x₁ and x₂, the predicted weight decreases by 0.96 pounds for a 1-year increase in age. The ages vary only between 17 and 23.

71 Agresti/Franklin Statistics, 71 of 141 Example: What Helps Predict a Female Athlete's Weight? Run a hypothesis test to determine whether age helps to predict weight, if you already know height and percent body fat.

72 Agresti/Franklin Statistics, 72 of 141 Example: What Helps Predict a Female Athlete's Weight? 1. Assumptions: The 64 female athletes were a convenience sample, not a random sample. Caution should be taken when making inferences about all female college athletes.

73 Agresti/Franklin Statistics, 73 of 141 Example: What Helps Predict a Female Athlete's Weight? 2. Hypotheses: H₀: β₃ = 0, Hₐ: β₃ ≠ 0 3. Test statistic:

74 Agresti/Franklin Statistics, 74 of 141 Example: What Helps Predict a Female Athlete's Weight? 4. P-value: This value is reported in the output as 0.14 5. Conclusion: The P-value of 0.14 does not give much evidence against the null hypothesis that β₃ = 0. Age does not significantly predict weight if we already know height and % body fat.

75 Agresti/Franklin Statistics, 75 of 141 Confidence Interval for a Multiple Regression Parameter A 95% confidence interval for a β slope parameter in multiple regression equals: estimate of slope ± t.025(se). The t-score has: df = (n - # of parameters in the model)

76 Agresti/Franklin Statistics, 76 of 141 Example: What's Plausible for the Effect of Age on Weight? Construct and interpret a 95% CI for β₃, the effect of age while controlling for height and % body fat.

77 Agresti/Franklin Statistics, 77 of 141 Example: What's Plausible for the Effect of Age on Weight? At fixed values of x₁ and x₂, we infer that the population mean of weight changes very little (and maybe not at all) for a 1-year increase in age. The confidence interval contains 0. Age may have no effect on weight, once we control for height and % body fat.
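
The confidence interval for a slope, and the check for whether it contains 0, can be sketched as follows. The Python standard library has no t-quantile function, so the t-score is passed in as an argument; 2.0 is approximately the t.025 score for df = 60. The estimate and standard error below are hypothetical illustration values, not the athlete output, and the function names are illustrative.

```python
def confidence_interval(estimate: float, std_error: float, t_score: float):
    """Return (lower, upper) = estimate -/+ t_score * standard error."""
    margin = t_score * std_error
    return estimate - margin, estimate + margin


def contains_zero(interval) -> bool:
    """True when 0 is a plausible value for the slope parameter."""
    lower, upper = interval
    return lower <= 0 <= upper


# Hypothetical slope estimate and standard error, for illustration only.
ci = confidence_interval(estimate=-1.0, std_error=0.75, t_score=2.0)
print(ci)                  # (-2.5, 0.5)
print(contains_zero(ci))   # True
```

An interval that contains 0 corresponds to a two-sided test that does not reject H₀: β = 0 at the matching significance level.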

78 Agresti/Franklin Statistics, 78 of 141 Estimating Variability Around the Regression Equation A standard deviation parameter, σ, describes variability of the observations around the regression equation. Its sample estimate is: s = √[Σ(y − ŷ)² / (n − number of parameters in the regression equation)]

79 Agresti/Franklin Statistics, 79 of 141 Example: Estimating Variability of Female Athletes' Weight ANOVA table for the college athletes data set:

80 Agresti/Franklin Statistics, 80 of 141 Example: Estimating Variability of Female Athletes' Weight For female athletes at particular values of height, % body fat, and age, estimate the standard deviation of their weights. Begin by finding the mean square error (the residual sum of squares divided by its df). Notice that this value (102.2) appears in the MS column in the ANOVA table.

81 Agresti/Franklin Statistics, 81 of 141 Example: Estimating Variability of Female Athletes' Weight The standard deviation is: s = √102.2 ≈ 10.1. This value is also displayed in the ANOVA table. For athletes with certain fixed values of height, % body fat, and age, the weights vary with a standard deviation of about 10 pounds.

82 Agresti/Franklin Statistics, 82 of 141 Example: Estimating Variability of Female Athletes' Weight If the conditional distributions of weight are approximately bell-shaped, about 95% of the weight values fall within about 2s = 20 pounds of the true regression line.
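
The arithmetic in this example is short enough to verify directly. The sketch below takes the mean square error of 102.2 from the slides; the helper `residual_std_dev` and the sum of squares passed to it are illustrative, not values from the ANOVA table.

```python
from math import sqrt


def residual_std_dev(residual_ss: float, n: int, n_parameters: int) -> float:
    """s = sqrt(residual SS / (n - number of parameters))."""
    mean_square_error = residual_ss / (n - n_parameters)
    return sqrt(mean_square_error)


# Mean square error quoted in the slides for the athlete data.
s = sqrt(102.2)
print(round(s, 1))     # 10.1
print(round(2 * s))    # 20

# Hypothetical residual SS, chosen so that the MSE is exactly 4.
print(residual_std_dev(residual_ss=240.0, n=64, n_parameters=4))   # 2.0
```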

83 Agresti/Franklin Statistics, 83 of 141 Do the Explanatory Variables Collectively Have an Effect? Example: With 3 predictors in a model, we can check this by testing: H₀: β₁ = β₂ = β₃ = 0

84 Agresti/Franklin Statistics, 84 of 141 Do the Explanatory Variables Collectively Have an Effect? The test statistic for H₀ is denoted by F.

85 Agresti/Franklin Statistics, 85 of 141 Do the Explanatory Variables Collectively Have an Effect? When H₀ is true, the expected value of the F test statistic is approximately 1. When H₀ is false, F tends to be larger than 1. The larger the F test statistic, the stronger the evidence against H₀.

86 Agresti/Franklin Statistics, 86 of 141 Summary of F Test That All Beta Parameters = 0 1. Assumptions: Multiple regression equation holds, data gathered randomly, normal distribution for y with same standard deviation at each combination of predictors

87 Agresti/Franklin Statistics, 87 of 141 Summary of F Test That All Beta Parameters = 0 2. Hypotheses: H₀: all β parameters = 0, Hₐ: at least one β parameter is not 0 3. Test statistic: F = (mean square for regression) / (mean square error)

88 Agresti/Franklin Statistics, 88 of 141 Summary of F Test That All Beta Parameters = 0 4. P-value: Right-tail probability above the observed F test statistic value from the F-distribution with: df₁ = number of explanatory variables, df₂ = n − (number of parameters in regression equation)

89 Agresti/Franklin Statistics, 89 of 141 Summary of F Test That All Beta Parameters = 0 5. Conclusion: The smaller the P-value, the stronger the evidence that at least one explanatory variable has an effect on y. If a decision is needed, reject H₀ if P-value ≤ significance level, such as 0.05.
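
The F statistic and its two df values can be computed from the sums of squares in an ANOVA table. The sketch below assumes the model includes an intercept, so the number of parameters is the number of explanatory variables plus one. The sums of squares are hypothetical round numbers, not the athlete output, and the function name is illustrative.

```python
def f_statistic(regression_ss: float, residual_ss: float, n: int, n_predictors: int):
    """Return (F, df1, df2) for the test that all slope parameters are 0."""
    df1 = n_predictors                  # number of explanatory variables
    df2 = n - (n_predictors + 1)        # n minus number of parameters (with intercept)
    ms_regression = regression_ss / df1
    ms_error = residual_ss / df2
    return ms_regression / ms_error, df1, df2


# Hypothetical sums of squares for a model with 3 predictors and n = 64.
f_value, df1, df2 = f_statistic(regression_ss=30.0, residual_ss=60.0, n=64, n_predictors=3)
print(f_value, df1, df2)   # 10.0 3 60
```

An F value far above 1, as in this toy case, is what the slides describe as evidence against H₀; the P-value itself comes from the F-distribution with these df values.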

90 Agresti/Franklin Statistics, 90 of 141 Example: The F-Test for Predictors of Athletes' Weight For the 64 female college athletes, the regression model for predicting y = weight using x₁ = height, x₂ = % body fat, and x₃ = age is summarized in the ANOVA table on the next page.

91 Agresti/Franklin Statistics, 91 of 141 Example: The F-Test for Predictors of Athletes' Weight

92 Agresti/Franklin Statistics, 92 of 141 Example: The F-Test for Predictors of Athletes' Weight Use the output in the ANOVA table to test the hypothesis: H₀: β₁ = β₂ = β₃ = 0

93 Agresti/Franklin Statistics, 93 of 141 Example: The F-Test for Predictors of Athletes' Weight The observed F statistic and the corresponding P-value are read from the ANOVA table. We can reject H₀ at the 0.05 significance level. We conclude that at least one predictor has an effect on weight.

94 Agresti/Franklin Statistics, 94 of 141 Example: The F-Test for Predictors of Athletes' Weight The F-test tells us that at least one explanatory variable has an effect. If the explanatory variables are chosen sensibly, at least one should have some predictive power. The F-test result tells us whether there is sufficient evidence to make it worthwhile to consider the individual effects, using t-tests.

95 Agresti/Franklin Statistics, 95 of 141 Example: The F-Test for Predictors of Athletes' Weight The individual t-tests identify which of the variables are significant (controlling for the other variables).

96 Agresti/Franklin Statistics, 96 of 141 Example: The F-Test for Predictors of Athletes' Weight If a variable turns out not to be significant, it can be removed from the model. In this example, age can be removed from the model.

97 Agresti/Franklin Statistics, 97 of 141 Section 12.4 Checking a Regression Model Using Residual Plots

98 Agresti/Franklin Statistics, 98 of 141 Assumptions for Inference with a Multiple Regression Model 1. The regression equation approximates well the true relationship between the predictors and the mean of y. 2. The data were gathered randomly. 3. y has a normal distribution with the same standard deviation at each combination of predictors.

99 Agresti/Franklin Statistics, 99 of 141 Checking Shape and Detecting Unusual Observations To check Assumption 3 (the conditional distribution of y is normal at any fixed values of the explanatory variables): Construct a histogram of the standardized residuals. The histogram should be approximately bell-shaped. Nearly all the standardized residuals should fall between -3 and +3. Any residual outside these limits is a potential outlier.

100 Agresti/Franklin Statistics, 100 of 141 Example: Residuals for House Selling Price For the house selling price data, a MINITAB histogram of the standardized residuals for the multiple regression model predicting selling price by the house size and the lot size was created and is displayed on the following page.

101 Agresti/Franklin Statistics, 101 of 141 Example: Residuals for House Selling Price

102 Agresti/Franklin Statistics, 102 of 141 Example: Residuals for House Selling Price The residuals are roughly bell-shaped about 0. They fall between about -3 and +3. No severe nonnormality is indicated.
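
The "between -3 and +3" check on standardized residuals is easy to automate once the residuals are available. The sketch below uses a made-up list of standardized residuals, not the housing data; the function name and the `limit` parameter are illustrative.

```python
def potential_outliers(standardized_residuals, limit: float = 3.0):
    """Return (index, value) pairs for residuals outside -limit..+limit."""
    return [
        (i, r)
        for i, r in enumerate(standardized_residuals)
        if abs(r) > limit
    ]


# Hypothetical standardized residuals, for illustration only.
residuals = [0.4, -1.2, 3.4, 2.1, -3.1]
print(potential_outliers(residuals))   # [(2, 3.4), (4, -3.1)]
```

A flagged observation is only a potential outlier; it still has to be inspected before deciding whether anything is wrong with it.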

103 Agresti/Franklin Statistics, 103 of 141 Plotting Residuals against Each Explanatory Variable Plots of residuals against each explanatory variable help us check for potential problems with the regression model. Ideally, the residuals should fluctuate randomly about 0. There should be no obvious change in trend or change in variation as the values of the explanatory variable increase.

104 Agresti/Franklin Statistics, 104 of 141 Plotting Residuals against Each Explanatory Variable

105 Agresti/Franklin Statistics, 105 of 141 Section 12.5 How Can Regression Include Categorical Predictors?

106 Agresti/Franklin Statistics, 106 of 141 Indicator Variables Regression models specify categories of a categorical explanatory variable using artificial variables, called indicator variables. The indicator variable for a particular category is binary. It equals 1 if the observation falls into that category and it equals 0 otherwise.

107 Agresti/Franklin Statistics, 107 of 141 Indicator Variables In the house selling prices data set, the city region in which a house is located is a categorical variable. The indicator variable x for region is: x = 1 if the house is in the NW (northwest region), x = 0 if the house is not in the NW.

108 Agresti/Franklin Statistics, 108 of 141 Indicator Variables The coefficient β of the indicator variable x is the difference between the mean selling prices for homes in the NW and for homes not in the NW.

109 Agresti/Franklin Statistics, 109 of 141 Example: Including Region in Regression for House Selling Price Output from the regression model for selling price of home using house size and region:

110 Agresti/Franklin Statistics, 110 of 141 Example: Including Region in Regression for House Selling Price Find and plot the lines showing how predicted selling price varies as a function of house size, for homes in the NW and for homes not in the NW.

111 Agresti/Franklin Statistics, 111 of 141 Example: Including Region in Regression for House Selling Price The regression equation from the MINITAB output is:

112 Agresti/Franklin Statistics, 112 of 141 Example: Including Region in Regression for House Selling Price For homes not in the NW, x₂ = 0. The prediction equation then simplifies to:

113 Agresti/Franklin Statistics, 113 of 141 Example: Including Region in Regression for House Selling Price For homes in the NW, x₂ = 1. The prediction equation then simplifies to:

114 Agresti/Franklin Statistics, 114 of 141 Example: Including Region in Regression for House Selling Price

115 Agresti/Franklin Statistics, 115 of 141 Example: Including Region in Regression for House Selling Price Both lines have the same slope, 78. For homes in the NW and for homes not in the NW, the predicted selling price increases by $78 for each square-foot increase in house size. The figure portrays a separate line for each category of region (NW, not NW).

116 Agresti/Franklin Statistics, 116 of 141 Example: Including Region in Regression for House Selling Price The coefficient of the indicator variable is 30,569. For any fixed value of house size, we predict that the selling price is $30,569 higher for homes in the NW.

117 Agresti/Franklin Statistics, 117 of 141 Example: Including Region in Regression for House Selling Price The line for homes in the NW is above the line for homes not in the NW. The predicted selling price is higher for homes in the NW. The P-value for the test of the coefficient of the indicator variable suggests that this difference is statistically significant.
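
The way an indicator variable shifts the prediction line can be sketched in code. The slope (78) and the indicator coefficient (30,569) come from the slides; the intercept does not appear in this transcript, so it is left as a function argument, and the value 0.0 used below is a placeholder that cancels in the comparison. The function names and the region label "other" are illustrative.

```python
SLOPE_HOUSE_SIZE = 78.0   # dollars per square foot, from the slides
NW_COEFFICIENT = 30569.0  # coefficient of the NW indicator, from the slides


def nw_indicator(region: str) -> int:
    """1 if the house is in the NW region, 0 otherwise."""
    return 1 if region == "NW" else 0


def predicted_price(house_size: float, region: str, intercept: float) -> float:
    """Prediction with a common slope and a region-specific shift."""
    return (intercept
            + SLOPE_HOUSE_SIZE * house_size
            + NW_COEFFICIENT * nw_indicator(region))


# Placeholder intercept: it cancels when the two predictions are compared.
difference = predicted_price(2000, "NW", 0.0) - predicted_price(2000, "other", 0.0)
print(difference)   # 30569.0
```

The same difference results at any house size, which is why the two prediction lines are parallel.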

118 Agresti/Franklin Statistics, 118 of 141 Is There Interaction? For two explanatory variables, interaction exists between them in their effects on the response variable when the slope of the relationship between µ y and one of them changes as the value of the other changes

119 Agresti/Franklin Statistics, 119 of 141 Is There Interaction?

120 Agresti/Franklin Statistics, 120 of 141 Section 12.6 How Can We Model a Categorical Response?

121 Agresti/Franklin Statistics, 121 of 141 Modeling a Categorical Response Variable When y is categorical, a different regression model applies, called a logistic regression

122 Agresti/Franklin Statistics, 122 of 141 Examples of Logistic Regression A voter's choice in an election (Democrat or Republican), with explanatory variables: annual income, political ideology, religious affiliation, and race. Whether a credit card holder pays their bill on time (yes or no), with explanatory variables: family income and the number of months in the past year that the customer paid the bill on time.

123 Agresti/Franklin Statistics, 123 of 141 The Logistic Regression Model Denote the possible outcomes for y as 0 and 1. Use the generic terms failure (for outcome = 0) and success (for outcome = 1). The population mean of the scores equals the population proportion of 1 outcomes (successes). That is, µ_y = p. The proportion, p, also represents the probability that a randomly selected subject has a success outcome.

124 Agresti/Franklin Statistics, 124 of 141 The Logistic Regression Model The straight-line model is usually inadequate. A more realistic model has a curved S-shape instead of a straight-line trend.

125 Agresti/Franklin Statistics, 125 of 141 The Logistic Regression Model A regression equation for an S-shaped curve for the probability of success p is: p = e^(α + βx) / (1 + e^(α + βx))
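
The S-shaped curve can be explored numerically. The sketch below assumes the standard logistic form p = e^(α + βx) / (1 + e^(α + βx)); the function name and the α and β values are illustrative, not estimates from any example in the slides.

```python
from math import exp


def success_probability(x: float, alpha: float, beta: float) -> float:
    """Logistic regression curve: p = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    z = alpha + beta * x
    return exp(z) / (1 + exp(z))


# Hypothetical parameters alpha = 0 and beta = 1, for illustration only.
print(success_probability(0, alpha=0.0, beta=1.0))   # 0.5

# With beta > 0 the probability rises with x but stays between 0 and 1.
for x in (-4, -2, 0, 2, 4):
    print(x, round(success_probability(x, alpha=0.0, beta=1.0), 3))
```

Unlike a straight line, the curve can never give a probability below 0 or above 1, which is the reason for preferring it when y is binary.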

126 Agresti/Franklin Statistics, 126 of 141 Example: Annual Income and Having a Travel Credit Card An Italian study with 100 randomly selected Italian adults considered factors that are associated with whether a person possesses at least one travel credit card. The table on the next page shows results for the first 15 people on this response variable and on the person's annual income (in thousands of euros).

127 Agresti/Franklin Statistics, 127 of 141 Example: Annual Income and Having a Travel Credit Card

128 Agresti/Franklin Statistics, 128 of 141 Example: Annual Income and Having a Travel Credit Card Let x = annual income and let y = whether the person possesses a travel credit card (1 = yes, 0 = no).

129 Agresti/Franklin Statistics, 129 of 141 Example: Annual Income and Having a Travel Credit Card Substituting the α and β estimates into the logistic regression model formula yields:

130 Agresti/Franklin Statistics, 130 of 141 Example: Annual Income and Having a Travel Credit Card Find the estimated probability of possessing a travel credit card at the lowest and highest annual income levels in the sample, which were x = 12 and x = 65.

131 Agresti/Franklin Statistics, 131 of 141 Example: Annual Income and Having a Travel Credit Card For x = 12 thousand euros, the estimated probability of possessing a travel credit card is 0.09.

132 Agresti/Franklin Statistics, 132 of 141 Example: Annual Income and Having a Travel Credit Card For x = 65 thousand euros, the estimated probability of possessing a travel credit card is 0.97.

133 Agresti/Franklin Statistics, 133 of 141 Example: Annual Income and Having a Travel Credit Card Annual income has a strong positive effect on having a credit card. The estimated probability of having a travel credit card changes from 0.09 to 0.97 as annual income changes over its range.

134 Agresti/Franklin Statistics, 134 of 141 Example: Estimating Proportion of Students Who've Used Marijuana A three-variable contingency table from a survey of senior high-school students is shown on the next page. The students were asked whether they had ever used alcohol, cigarettes, or marijuana.

135 Agresti/Franklin Statistics, 135 of 141 Example: Estimating Proportion of Students Who've Used Marijuana

136 Agresti/Franklin Statistics, 136 of 141 Example: Estimating Proportion of Students Who've Used Marijuana Let y indicate marijuana use, coded (1 = yes, 0 = no). Let x₁ be an indicator variable for alcohol use (1 = yes, 0 = no). Let x₂ be an indicator variable for cigarette use (1 = yes, 0 = no).

137 Agresti/Franklin Statistics, 137 of 141 Example: Estimating Proportion of Students Who've Used Marijuana

138 Agresti/Franklin Statistics, 138 of 141 Example: Estimating Proportion of Students Who've Used Marijuana The logistic regression prediction equation is:

139 Agresti/Franklin Statistics, 139 of 141 Example: Estimating Proportion of Students Who've Used Marijuana For those who have not used alcohol or cigarettes, x₁ = x₂ = 0 and:

140 Agresti/Franklin Statistics, 140 of 141 Example: Estimating Proportion of Students Who've Used Marijuana For those who have used alcohol and cigarettes, x₁ = x₂ = 1 and:

141 Agresti/Franklin Statistics, 141 of 141 Example: Estimating Proportion of Students Who've Used Marijuana The probability that students have tried marijuana seems to depend greatly on whether they've used alcohol and cigarettes.

