
1 Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models 1

2 Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models 2

3 Objectives Explain likelihood and maximum likelihood theory and estimation. Demonstrate likelihood for categorical response and explanatory variable. 3

4 Likelihood The likelihood is a statement about a data set. The likelihood assumes a model for the data. Changing the model, either the function or the parameter values, changes the likelihood. The likelihood is the probability of the data as a whole. This likelihood assumes independence. 4

5 Likelihood for Binomial Example The marginal distribution of Survived can be modeled with the binomial distribution. 5
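The same idea can be checked with a short calculation outside JMP. The sketch below is not part of the course; it assumes Python with SciPy and uses the Titanic marginal counts that appear later in this chapter (711 survived out of 2,201 people).

    # Binomial likelihood of the observed marginal data at a few candidate
    # parameter values; the observed data are most probable near p = 0.323.
    from scipy.stats import binom

    n, k = 2201, 711                  # total people, number who survived
    for p in (0.20, 0.323, 0.40):     # candidate values of the survival probability
        likelihood = binom.pmf(k, n, p)
        print(f"p = {p:5.3f}  likelihood = {likelihood:.3e}")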

6 Maximum Likelihood Theory The objective is to find the parameter value that maximizes the likelihood of the observed data. The maximum likelihood estimator provides –a large-sample normal distribution of estimates –asymptotic consistency (convergence) –asymptotic efficiency (smallest standard errors) 6

7 Maximum Likelihood Estimation Use the kernel, the part of the likelihood function that depends on the model parameter. Use the logarithm transform. –The product of probabilities becomes the sum of the logs of the probabilities. Maximize the log-likelihood by setting its derivative with respect to the parameter to zero and solving, or by an appropriate numerical method. 7

8 Estimation for Binomial Example 8
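For illustration only, here is a minimal numerical version of this estimation step (assuming Python with NumPy and SciPy rather than the JMP workflow shown in the slides):

    # Maximize the binomial log-likelihood kernel numerically and compare
    # the result with the closed-form maximum likelihood estimate k/n.
    import numpy as np
    from scipy.optimize import minimize_scalar

    n, k = 2201, 711

    def neg_log_lik(p):
        # negative of the log-likelihood kernel (only the part that depends on p)
        return -(k * np.log(p) + (n - k) * np.log(1 - p))

    result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(result.x)   # about 0.323
    print(k / n)      # the closed-form MLE agrees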


10 2.01 Multiple Choice Poll What is the likelihood of the data? a. The sum of the probabilities of individual cases b. The product of the log of the probabilities of individual cases c. The product of the log of the individual cases d. The sum of the log of the probabilities of individual cases 10

11 2.01 Multiple Choice Poll – Correct Answer What is the likelihood of the data? a. The sum of the probabilities of individual cases b. The product of the log of the probabilities of individual cases c. The product of the log of the individual cases d. The sum of the log of the probabilities of individual cases 11

12 Titanic Example The null hypothesis is that there is no association between Survived and Class. The alternative hypothesis is that there is an association between Survived and Class. Compute the likelihood under both hypotheses. Compare the hypotheses by examining the difference in the likelihood. 12

13 Titanic Example

              1st    2nd    3rd    Crew   Row Total
Survived      203    118    178    212    711
Lost          122    167    528    673    1490
Column Total  325    285    706    885    2201

14 Uncertainty The negative log-likelihood measures variation, sometimes called uncertainty, in the sample. The higher the negative log-likelihood, the greater the variability (uncertainty) in the data. Use the negative log-likelihood in much the same way as you use the sum of squares with a continuous response. 14

15 Null Hypothesis

           1st    2nd    3rd    Crew
Survived   0.323  0.323  0.323  0.323
Lost       0.677  0.677  0.677  0.677

using the marginal distribution

16 Uncertainty: Null Hypothesis 16 Analogous to corrected total sum of squares

17 Alternative Hypothesis

           1st     2nd     3rd     Crew
Survived   0.6246  0.4140  0.2521  0.2395
Lost       0.3754  0.5860  0.7479  0.7605

using the conditional distribution

18 Uncertainty: Alternative Hypothesis 18

19 Uncertainty: Alternative Hypothesis 19 Analogous to error sum of squares

20 Model Uncertainty 20 Analogous to model sum of squares

21 Hypothesis Test for Association 21
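As a check on the mechanics (the course performs this test in JMP), a minimal Python sketch of the likelihood ratio test for the Survived-by-Class table, assuming NumPy and SciPy, might look like this. The last line also computes the R² statistic discussed on the next slide (the proportional reduction in uncertainty).

    import numpy as np
    from scipy.stats import chi2

    counts = np.array([[203, 118, 178, 212],    # Survived: 1st, 2nd, 3rd, Crew
                       [122, 167, 528, 673]])   # Lost:     1st, 2nd, 3rd, Crew
    n = counts.sum()

    # Reduced (null) model: the marginal probabilities of Survived and Lost
    p_marginal = counts.sum(axis=1, keepdims=True) / n
    negLL_reduced = -np.sum(counts * np.log(p_marginal))

    # Full (alternative) model: conditional probabilities within each class
    p_conditional = counts / counts.sum(axis=0, keepdims=True)
    negLL_full = -np.sum(counts * np.log(p_conditional))

    lr_chisq = 2 * (negLL_reduced - negLL_full)               # likelihood ratio statistic
    p_value = chi2.sf(lr_chisq, df=counts.shape[1] - 1)       # df = (2-1)*(4-1) = 3
    r_squared = (negLL_reduced - negLL_full) / negLL_reduced  # uncertainty R²
    print(lr_chisq, p_value, r_squared)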

22 Model R² 22


24 2.02 Multiple Answer Poll What does the difference between the –log-likelihood for the full model and the reduced model tell you? a. It is the probability of the model. b. It represents the reduction in the uncertainty. c. It is the numerator of the R² statistic. d. It is twice the likelihood ratio test statistic. 24

25 2.02 Multiple Answer Poll – Correct Answer What does the difference between the –log-likelihood for the full model and the reduced model tell you? a. It is the probability of the model. b. It represents the reduction in the uncertainty. c. It is the numerator of the R² statistic. d. It is twice the likelihood ratio test statistic. 25

26 Model Selection Akaike’s Information Criterion (AIC) is widely accepted as a useful metric in model selection. Smaller AIC values indicate a better model. A correction (AICc) is added for small samples. 26

27 AICc Difference The AICc for any given model cannot be interpreted by itself. The difference in AICc can be used to determine how much support the candidate model has compared to the model with the smallest AICc.

Δ      Support
0-2    Substantial
4-7    Considerably Less
>10    Essentially None

28 Model Selection Another popular statistic for model selection is Schwarz’s Bayesian Information Criterion (BIC). It addresses the trade-off between bias and variance in the model, like AIC. Select the model with the smallest BIC to minimize overfitting the data. It uses a stronger penalty term than AIC. 28
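The formulas behind these criteria are standard; a compact reference sketch (in Python, not from the course) is shown below, where logL is the maximized log-likelihood, k the number of estimated parameters, and n the sample size.

    import math

    def aic(logL, k):
        return -2 * logL + 2 * k

    def aicc(logL, k, n):
        # AIC with the small-sample correction
        return aic(logL, k) + (2 * k * (k + 1)) / (n - k - 1)

    def bic(logL, k, n):
        # stronger penalty term than AIC whenever log(n) > 2
        return -2 * logL + k * math.log(n)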

29 This demonstration illustrates the concepts discussed previously. Hypothesis Tests and Model Selection 29


31 Exercise This exercise reinforces the concepts discussed previously. 31


33 2.03 Quiz Is this association significant? Use the LRT to decide. 33

34 2.03 Quiz – Correct Answer Is this association significant? Use the LRT to decide. It is not significant at the α = 0.05 level. 34

35 Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models 35

36 Objectives Explain the concepts of logistic regression. Fit a logistic regression model using JMP software. Examine logistic regression output. 36

37 Overview

Response     Explanatory   Method
Continuous   Categorical   ANOVA
Continuous   Continuous    Linear Regression
Categorical  Categorical   Crosstabulation
Categorical  Continuous    Logistic Regression

38 Types of Logistic Regression Models Binary logistic regression addresses a response with only two levels. Nominal logistic regression addresses a response with more than two levels with no inherent order. Ordinal logistic regression addresses a response with more than two levels with an inherent order. 38

39 Purpose of Logistic Regression A logistic regression model predicts the probability of specific outcomes. It is designed to describe probabilities associated with the levels of the response variable. Probability is bounded, [0, 1], but the response in a linear regression model is unbounded, (-∞,∞). 39

40 The Logistic Curve The relationship between the probability of a response and a predictor might not be linear. –Asymptotes arise from bounded probability. Transform the probability to make the relationship linear. –Two-step transformation for logistic regression. Linear regression cannot model this relationship well, but logistic regression can. 40

41 Logistic Curve The asymptotic limits of the probability produce a nonlinear relationship with the explanatory variable. 41

42 Transform Probability Step 1: Convert the probability to the odds. –Range of odds is 0 to ∞. Step 2: Convert the odds to the logarithm of the odds. –Range of log(odds) is -∞ to ∞. The log(odds) is a function of the probability and its range is suitable for linear regression. 42
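For example, a probability of 0.75 passes through the two steps as follows:

    Step 1: odds = 0.75 / (1 − 0.75) = 3
    Step 2: log(odds) = log(3) ≈ 1.10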

43 What Are the Odds? The odds are a function of the probability of an event. The odds of two events or of one event under two conditions can be compared as a ratio. 43

44 Probability of Outcome

                             Default on Loan
                             Yes    No     Total
Late Payments (Group A)      20     60     80
No Late Payments (Group B)   10     90     100
Total                        30     150    180

Probability of defaulting in Group A = 20/80 (0.25)
Probability of not defaulting in Group A = 60/80 (0.75)

45 Odds of Outcome

Odds of defaulting in Group A
  = (probability of defaulting in the group with a history of late payments)
    ÷ (probability of not defaulting in the group with a history of late payments)
  = 0.25 ÷ 0.75 = 0.33

Odds are the ratio of P(A) to P(not A).

46 Odds Ratio of Outcome

Odds ratio of Group A to Group B
  = (odds of defaulting in the group with a history of late payments)
    ÷ (odds of defaulting in the group with no history of late payments)
  = 0.33 ÷ 0.11 = 3

The odds ratio is the ratio of odds(A) to odds(B).
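The same calculation in Python (a sketch for reference only; the numbers come from the loan table above):

    p_default_A = 20 / 80            # late-payment group
    p_default_B = 10 / 100           # no-late-payment group

    odds_A = p_default_A / (1 - p_default_A)   # 0.25 / 0.75 ≈ 0.33
    odds_B = p_default_B / (1 - p_default_B)   # 0.10 / 0.90 ≈ 0.11

    odds_ratio = odds_A / odds_B               # about 3
    print(odds_A, odds_B, odds_ratio)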

47 Interpretation of the Odds Ratio The odds ratio ranges from 0 to ∞. A value of 1 indicates no association; a value less than 1 means the outcome is more likely in Group B, and a value greater than 1 means the outcome is more likely in Group A. 47


49 2.04 Quiz If the chance of rain is 75%, then what are the odds that it will rain? 49

50 2.04 Quiz – Correct Answer If the chance of rain is 75%, then what are the odds that it will rain? The odds are 3 because the odds are the ratio of the probability that it will rain to the probability that it will not: 0.75/0.25 = 3. 50

51 Target or Positive Value The binary response takes two possible values that represent two states, an event and the corresponding non-event. The logit transform is based on the odds of the event. This event is known as the target or positive value. The target value is the first of the two response values. The first value is determined by alphanumeric sorting, the order in a recognized series of values, or by the Value Ordering column property if you add it. 51

52 Logit Transformation

logit(π_i) = log( π_i / (1 − π_i) )

where
i indexes all cases (observations),
π_i is the probability that the event (survived, for example) occurs in the i-th case,
1 − π_i is the probability that the event does not occur in the i-th case, and
log is the natural log (to the base e).

53 Assumption The logit transform of the probability is assumed to be linearly related to the predictor. (figure: Logit Transform versus Predictor) 53

54 Logistic Regression Model 54
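For reference (standard notation, not reproduced from the slide image), the simple logistic regression model with one predictor x can be written as

    logit(π_i) = log( π_i / (1 − π_i) ) = β_0 + β_1 x_i

which is equivalent to

    π_i = 1 / (1 + exp(−(β_0 + β_1 x_i)))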


56 2.05 Multiple Answer Poll Which of the following statements about the logit transform are true? a. The logit is a function of the odds of an outcome. b. The logit is a probability of an outcome. c. The logit linearizes the relationship with the predictor. d. The logit transformation parameters must be estimated. 56

57 2.05 Multiple Answer Poll – Correct Answer Which of the following statements about the logit transform are true? a. The logit is a function of the odds of an outcome. b. The logit is a probability of an outcome. c. The logit linearizes the relationship with the predictor. d. The logit transformation parameters must be estimated. 57

58 Likelihood Function A likelihood function expresses the probability of the observed data as a function of the unknown model parameters. The goal is to derive values of the parameters such that the probability of the observed data is as large as possible. 58
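A minimal sketch of this likelihood for a simple logistic regression, assuming Python with NumPy and SciPy and a small made-up data set (the course fits these models in JMP):

    import numpy as np
    from scipy.optimize import minimize

    # hypothetical data: x is the predictor, y is the 0/1 response
    x = np.array([0.5, 1.2, 1.9, 2.6, 3.3, 4.0, 4.7, 5.4])
    y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

    def neg_log_likelihood(beta):
        b0, b1 = beta
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))   # predicted event probabilities
        # Bernoulli log-likelihood, negated so that minimizing maximizes the likelihood
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
    print(fit.x)    # maximum likelihood estimates of the intercept and slope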

59 Maximum Likelihood Estimate (figure: the log-likelihood as a function of the parameter, with the maximum at the estimate) 59

60 Model Inference (figure: the log-likelihood function, showing LogL_0 and LogL_1) 60

61 Logistic Curve (figure: logistic curves illustrating a weak, a strong, and a very strong relationship) 61

62 Central Cutoff The ROC curve presented in Logistic Fit includes a yellow line that is tangent to the curve at the point with the maximum vertical distance from the diagonal line. This point maximizes the difference between sensitivity and 1 − specificity. This point is identified with an asterisk in the ROC Table. 62
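Outside JMP, the same cutoff can be found from any ROC curve by maximizing sensitivity − (1 − specificity). A hedged sketch assuming Python with scikit-learn and made-up scores:

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true  = np.array([0, 0, 1, 0, 1, 1, 0, 1])                   # hypothetical labels
    y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9])  # predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = np.argmax(tpr - fpr)   # point with the greatest vertical distance from the diagonal
    print(thresholds[best], tpr[best], fpr[best])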

63 Titanic Passengers Example There is another data set for 1309 passengers of the final voyage of the Titanic. –The crew members are not included. The new data set includes the variables Survived, Passenger Class, Sex, Age, Siblings and Spouses, Parents and Children, and Fare. –Some variables in this data set are not used in the demonstration. 63

64 64 This demonstration illustrates the concepts discussed previously. Binary Logistic Regression


66 2.06 Quiz You want to predict the probability of not surviving, given the number of siblings and spouses aboard. What kind of association exists between these two variables? Is it a strong relationship or a weak relationship? 66

67 2.06 Quiz – Correct Answer You want to predict the probability of not surviving, given the number of siblings and spouses aboard. What kind of association exists between these two variables? Is it a strong relationship or a weak relationship? Weak: the fitted regression line is nearly flat, indicating a weak association. 67


69 Exercise This exercise reinforces the concepts discussed previously. 69

70 Multiple Logistic Regression Several explanatory variables exhibit an association with Survived. Any of these associations can predict the outcome of Survived better than the overall proportion. Using more than one explanatory variable in a logistic regression model will further improve predictions of the outcome. 70

71 Interaction Effect The linear combination of the effects of predictors might not account for all of the association. The effect of one predictor might depend on the level of another predictor. This additional effect is known as an interaction. It is modeled by including a crossed term involving both predictors in the linear combination. –A, B, A*B 71
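A sketch of a multiple logistic regression with an interaction, assuming Python with pandas and statsmodels and hypothetical column names (the course fits the model in JMP):

    import pandas as pd
    import statsmodels.formula.api as smf

    titanic = pd.read_csv("titanic_passengers.csv")   # hypothetical file; Survived coded 0/1
    model = smf.logit("Survived ~ Sex + Fare + Sex:Fare", data=titanic).fit()
    print(model.summary())                            # Sex:Fare is the A*B crossed term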

72 Lack of Fit The whole model test is a likelihood ratio test to decide whether the model is significantly better at predicting the response than the marginal distribution. The lack of fit test is a likelihood ratio test to decide whether another model could predict better than the current model. It compares the –log-likelihood of the fitted model to the –log-likelihood of the saturated model. The saturated model is achieved with a parameter for every observation and is a perfect fit to the data. 72
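In formula terms (a general result, not a JMP-specific one), the lack of fit chi-square statistic is 2 × [(−log-likelihood of the fitted model) − (−log-likelihood of the saturated model)], with degrees of freedom equal to the difference in the number of parameters between the saturated and fitted models.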

73 73 This demonstration illustrates the concepts discussed previously. Multiple Logistic Regression


75 75 Exercise This exercise reinforces the concepts discussed previously.


77 2.07 Multiple Choice Poll Suppose process A makes a product, which is evaluated as defective or non-defective. Suppose the probability of a defective is 0.2. Which is true? a. The odds of a defective from process A are given by 0.8/0.2 = 4. b. The odds of a defective from process A are given by 0.2/0.8 = 0.25. 77

78 2.07 Multiple Choice Poll – Correct Answer Suppose process A makes a product, which is evaluated as defective or non-defective. Suppose the probability of a defective is 0.2. Which is true? a. The odds of a defective from process A are given by 0.8/0.2 = 4. b. The odds of a defective from process A are given by 0.2/0.8 = 0.25. 78

79 2.08 Multiple Choice Poll The odds ratio for getting a defective product from process A versus getting one from process B is 0.25. What is its interpretation? a. You expect defectives to occur 25 times more often from process B than from process A. b. You expect defectives to occur ¼ as often from process B as from process A. c. You expect defectives to occur 75% less often from process A than from process B. 79

80 2.08 Multiple Choice Poll – Correct Answer The odds ratio for getting a defective product from process A versus getting one from process B is 0.25. What is its interpretation? a. You expect defectives to occur 25 times more often from process B than from process A. b. You expect defectives to occur ¼ as often from process B as from process A. c. You expect defectives to occur 75% less often from process A than from process B. 80

81 Chapter 2: Logistic Regression 2.1 Likelihood Approach 2.2 Binary Logistic Regression 2.3 Nominal and Ordinal Logistic Regression Models 81

82 Objectives Explain the generalized logit and the cumulative logit. Fit a nominal logistic and an ordinal logistic regression model. Interpret the parameter estimates and odds ratios. 82

83 Nominal Logistic Regression Binary logistic regression can be extended to responses with more than two levels. Three or more levels with no particular order or rank can be modeled with nominal logistic regression. The linear portion of the model remains the same as the one in the binary logistic model. The logit transform of the response is adapted to the nominal response. –This adaptation is known as the generalized logit. 83

84 Generalized Logits For a response with k levels, each generalized logit is the log of the ratio of one level’s probability to the probability of the reference (last) level: Logit(j) = log( π_j / π_k ), for j = 1, …, k − 1. Number of Generalized Logits = Number of Levels − 1 84

85 Generalized Logit Model Each logit has its own intercept and slope: Logit(1) = a_1 + B_1·X, Logit(2) = a_2 + B_2·X, and so on. (figure: generalized logits versus the predictor X, with different slopes and intercepts) 85
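A sketch of a nominal (generalized logit) fit, assuming Python with pandas and statsmodels and hypothetical column names; the course itself uses JMP:

    import pandas as pd
    import statsmodels.api as sm

    titanic = pd.read_csv("titanic_passengers.csv")      # hypothetical file
    y = titanic["Port"].astype("category").cat.codes     # nominal response coded 0, 1, 2
    X = sm.add_constant(titanic[["Fare"]])                # one predictor plus an intercept

    fit = sm.MNLogit(y, X).fit()
    print(fit.summary())   # a separate intercept and slope for each generalized logit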


87 2.09 Multiple Answer Poll Suppose a nominal response variable has four levels. Which of the following statements are true? a. JMP computes three generalized logits. b. Logit(1) is the log odds for level 1 occurring versus level 4 occurring. c. JMP computes a separate intercept parameter for each logit. d. JMP computes a separate slope parameter for each logit. 87

88 2.09 Multiple Answer Poll – Correct Answer Suppose a nominal response variable has four levels. Which of the following statements are true? a. JMP computes three generalized logits. b. Logit(1) is the log odds for level 1 occurring versus level 4 occurring. c. JMP computes a separate intercept parameter for each logit. d. JMP computes a separate slope parameter for each logit. 88

89 Titanic Passengers Example The passengers on this voyage boarded the Titanic in one of three ports. –Southampton, England (S) –Cherbourg, France (C) –Queenstown, Ireland (Q), known today as Cobh Predict the port of departure using the continuous predictors Fare, Siblings and Spouses, and Parents and Children. The explanatory variable Age has many missing values, and it is correlated with the other predictors. 89

90 90 This demonstration illustrates the concepts discussed previously. Nominal Logistic Regression Model


92 92 Exercise This exercise reinforces the concepts discussed previously.

93 Ordinal Logistic Regression The generalized logits used in nominal logistic regression provide the most flexibility, but at the cost of a full set of parameters for each level of the response. Some responses are naturally ordinal. An ordinal response model uses a unique intercept for all but the last level and a common set of parameters for all of the remaining terms. 93

94 Cumulative Logits For an ordinal response with k levels, each cumulative logit is the log of the ratio of the probability of being at or below a level to the probability of being above it: Logit(j) = log( P(response ≤ j) / P(response > j) ), for j = 1, …, k − 1. Number of Cumulative Logits = Number of Levels − 1 94

95 Proportional Odds Assumption The cumulative logits share a common slope but have separate intercepts: Logit(1) = a_1 + B·X, Logit(2) = a_2 + B·X. (figure: cumulative logits versus the predictor X, with equal slopes) 95
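A sketch of an ordinal (proportional odds) fit, assuming Python with pandas and statsmodels’ OrderedModel and hypothetical column names for the popcorn data described next:

    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    popcorn = pd.read_csv("popcorn.csv")        # hypothetical file
    y = popcorn["Rating"]                       # ordinal response, 1 (poor) to 5 (excellent)
    X = popcorn[["Salt"]]                       # amount of salt, 0 to 3

    fit = OrderedModel(y, X, distr="logit").fit(method="bfgs")
    print(fit.summary())    # one common slope for Salt, separate intercepts per cutpoint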

96 Popcorn Example An experiment was conducted to determine whether the appeal of popcorn depended on the amount of salt. The response is ordinal. –1 (poor) to 5 (excellent). The factor is continuous with four levels, 0 to 3. 96

97 97 This demonstration illustrates the concepts discussed previously. Ordinal Logistic Regression


99 Exercise This exercise reinforces the concepts discussed previously. 99

