Chapter 13 Multiple Regression Section 13.6 Modeling a Categorical Response
Modeling a Categorical Response Variable The regression models studied so far are designed for a quantitative response variable y. When y is categorical, a different regression model applies, called logistic regression.
Examples of Logistic Regression A voter’s choice in an election (Democrat or Republican), with explanatory variables: annual income, political ideology, religious affiliation, and race. Whether a credit card holder pays their bill on time (yes or no), with explanatory variables: family income and the number of months in the past year that the customer paid the bill on time.
The Logistic Regression Model Denote the possible outcomes for y as 0 and 1. Use the generic terms failure (for outcome = 0) and success (for outcome = 1). The population mean of the 0/1 scores equals the population proportion of ‘1’ outcomes (successes). That is, $\mu_y = p$. The proportion, p, also represents the probability that a randomly selected subject has a successful outcome.
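The identity between the mean of 0/1 scores and the proportion of successes can be checked with a few lines of Python. This is only a sketch: the ten outcomes below are hypothetical and do not come from any table in this section.

```python
# Hypothetical 0/1 outcomes for ten subjects (1 = success, 0 = failure).
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Mean of the 0/1 scores.
mean_score = sum(outcomes) / len(outcomes)

# Proportion of subjects whose outcome is a success.
proportion_success = outcomes.count(1) / len(outcomes)

print(mean_score, proportion_success)
```

Because every success contributes exactly 1 to the sum and every failure contributes 0, the two quantities always coincide.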
The Logistic Regression Model The straight-line model is usually inadequate when there are multiple explanatory variables. A more realistic model has a curved S-shape instead of a straight-line trend. The regression equation that best models this S-shaped curve is known as the logistic regression equation.
The Logistic Regression Model Figure 13.10 Two Possible Regressions for a Probability p of a Binary Response Variable. A straight line is usually less appropriate than an S-shaped curve. Question: Why is the straight-line regression model for a binary response variable often poor?
The Logistic Regression Model A regression equation for an S-shaped curve for the probability of success p is: $p = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$. This equation for p is called the logistic regression equation. Logistic regression is used when the response variable has only two possible outcomes (it’s binary).
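A short Python sketch may help show why the S-shaped curve suits a probability better than a straight line. The function names and all four coefficient values below are hypothetical, chosen only for illustration. The straight line drifts below 0 and above 1, while the logistic curve stays strictly between them.

```python
import math

def logistic_p(x, alpha, beta):
    """Logistic curve p = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    z = alpha + beta * x
    if z >= 0:
        # Algebraically equivalent form that avoids overflow for large z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_p(x, a, b):
    """Straight-line model p = a + b*x, shown only for comparison."""
    return a + b * x

# Hypothetical coefficients, not estimates from any table in this section.
ALPHA, BETA = -4.0, 0.1   # logistic curve
A, B = -0.2, 0.02         # straight line

for x in (0, 20, 40, 60, 80):
    print(x, round(linear_p(x, A, B), 3), round(logistic_p(x, ALPHA, BETA), 3))
```

With these illustrative values, the line gives a "probability" of -0.2 at x = 0 and 1.4 at x = 80, neither of which is a valid probability.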
Example: Travel Credit Cards An Italian study of 100 randomly selected adults considered factors associated with whether a person possesses at least one travel credit card. Table 13.12 on the next slide shows results for the first 15 people on this response variable and on the person’s annual income (in thousands of euros).
Example: Travel Credit Cards Table 13.12 Annual Income (in thousands of euros) and Whether Possess a Travel Credit Card. The response y equals 1 if a person has a travel credit card and equals 0 otherwise.
Example: Travel Credit Cards Let x = annual income and let y = whether the person possesses a travel credit card (1 = yes, 0 = no). Table 13.13 shows what software provides for conducting a logistic regression analysis. Table 13.13 Results of Logistic Regression for Italian Credit Card Data
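Example: Travel Credit Cards Substituting the $\hat{\alpha}$ and $\hat{\beta}$ estimates from Table 13.13 into the logistic regression model formula yields: $\hat{p} = \frac{e^{\hat{\alpha} + \hat{\beta} x}}{1 + e^{\hat{\alpha} + \hat{\beta} x}}$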
Example: Travel Credit Cards Find the estimated probability of possessing a travel credit card at the lowest and highest annual income levels in the sample, which were x = 12 and x = 65.
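Example: Travel Credit Cards For x = 12 thousand euros, the estimated probability of possessing a travel credit card is: $\hat{p} = \frac{e^{\hat{\alpha} + \hat{\beta}(12)}}{1 + e^{\hat{\alpha} + \hat{\beta}(12)}} \approx 0.09$, where $\hat{\alpha}$ and $\hat{\beta}$ are the estimates in Table 13.13.
Example: Travel Credit Cards For x = 65 thousand euros, the estimated probability of possessing a travel credit card is: $\hat{p} = \frac{e^{\hat{\alpha} + \hat{\beta}(65)}}{1 + e^{\hat{\alpha} + \hat{\beta}(65)}} \approx 0.97$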
Example: Travel Credit Cards Insight: Annual income has a strong positive effect on having a credit card. The estimated probability of having a travel credit card changes from 0.09 to 0.97 as annual income changes over its range.
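The kind of fit that software reports for an example like this can be sketched in plain Python. The code below is a minimal illustration, not the procedure used by any particular package: it maximizes the logistic log-likelihood with Newton's method for a single explanatory variable. The function names are hypothetical, and the 15 income and card values are made-up data, not the values of Table 13.12.

```python
import math

def sigmoid(z):
    """e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(xs, ys, alpha, beta):
    """Gradient of the log-likelihood with respect to (alpha, beta)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        resid = y - sigmoid(alpha + beta * x)
        g0 += resid
        g1 += x * resid
    return g0, g1

def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton's method for a one-predictor logistic regression."""
    alpha = beta = 0.0
    for _ in range(max_iter):
        g0, g1 = score(xs, ys, alpha, beta)
        # Entries of the 2x2 information matrix (negative Hessian).
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = sigmoid(alpha + beta * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 linear system for the Newton step.
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) < tol and abs(d_beta) < tol:
            break
    return alpha, beta

# Hypothetical data: income in thousands of euros, 1 = has a travel card.
incomes = [12, 13, 14, 16, 18, 20, 22, 24, 27, 30, 34, 40, 46, 52, 65]
has_card = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

alpha_hat, beta_hat = fit_logistic(incomes, has_card)
for x in (min(incomes), max(incomes)):
    print(x, round(sigmoid(alpha_hat + beta_hat * x), 3))
```

A positive estimated slope means the estimated probability rises with income, which is the pattern described in the Insight above. Newton's method can fail when the 0 and 1 outcomes are perfectly separated by income, so the made-up data deliberately overlap.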
Example: Estimating Proportion of Students Who’ve Used Marijuana A three-variable contingency table from a survey of senior high-school students is shown on the next slide. The students were asked whether they had ever used: alcohol, cigarettes or marijuana. We’ll treat marijuana use as the response variable and cigarette use and alcohol use as explanatory variables.
Example: Estimating Proportion of Students Who’ve Used Marijuana Table 13.14 Alcohol, Cigarette, and Marijuana Use for High School Seniors
Example: Estimating Proportion of Students Who’ve Used Marijuana Let y indicate marijuana use, coded (1 = yes, 0 = no). Let $x_1$ be an indicator variable for alcohol use, coded (1 = yes, 0 = no). Let $x_2$ be an indicator variable for cigarette use, coded (1 = yes, 0 = no).
Example: Estimating Proportion of Students Who’ve Used Marijuana Table 13.15 MINITAB Output for Estimating the Probability of Marijuana Use Based on Alcohol Use and Cigarette Use
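Example: Estimating Proportion of Students Who’ve Used Marijuana The logistic regression prediction equation is: $\hat{p} = \frac{e^{\hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2}}{1 + e^{\hat{\alpha} + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2}}$, where $\hat{\alpha}$, $\hat{\beta}_1$, and $\hat{\beta}_2$ are the estimates reported in Table 13.15.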
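Example: Estimating Proportion of Students Who’ve Used Marijuana For those who have not used alcohol or cigarettes, $x_1 = x_2 = 0$. For them, the estimated probability of marijuana use is $\hat{p} = \frac{e^{\hat{\alpha}}}{1 + e^{\hat{\alpha}}}$
Example: Estimating Proportion of Students Who’ve Used Marijuana For those who have used alcohol and cigarettes, $x_1 = x_2 = 1$. For them, the estimated probability of marijuana use is $\hat{p} = \frac{e^{\hat{\alpha} + \hat{\beta}_1 + \hat{\beta}_2}}{1 + e^{\hat{\alpha} + \hat{\beta}_1 + \hat{\beta}_2}}$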
Example: Estimating Proportion of Students Who’ve Used Marijuana SUMMARY: The probability that students have tried marijuana seems to depend greatly on whether they’ve used alcohol and/or cigarettes.
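With two indicator variables there are only four possible combinations of alcohol and cigarette use, so the prediction equation can be evaluated for all of them at once. The sketch below assumes hypothetical coefficient values. They are not the estimates from Table 13.15 and serve only to show the mechanics.

```python
import math

def estimated_probability(x1, x2, alpha, beta1, beta2):
    """Logistic prediction for indicator variables x1 (alcohol), x2 (cigarettes)."""
    z = alpha + beta1 * x1 + beta2 * x2
    return 1.0 / (1.0 + math.exp(-z))  # same as e^z / (1 + e^z)

# Hypothetical coefficients for illustration, not the Table 13.15 output.
ALPHA, BETA1, BETA2 = -5.0, 3.0, 3.0

probs = {
    (x1, x2): estimated_probability(x1, x2, ALPHA, BETA1, BETA2)
    for x1 in (0, 1)
    for x2 in (0, 1)
}

for (x1, x2), p in probs.items():
    print(f"alcohol={x1} cigarettes={x2} estimated p={p:.3f}")
```

When both coefficients are positive, as assumed here, the estimated probability is lowest for students who used neither substance and highest for those who used both.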