
1 LOGISTIC REGRESSION

2 Purpose  Logistic regression is typically used when the dependent variable has only two categories and the independent variables are a mixture of numerical and categorical variables.

3 To Predict What  Since the dependent variable is dichotomous, we cannot predict a numerical value for it using logistic regression. Instead, logistic regression employs binomial probability theory, in which there are only two values to predict: the probability (p) that the outcome is 1 rather than 0, i.e. that the event/person belongs to one group rather than the other.

4 How?  Logistic regression forms a best-fitting equation or function using the maximum likelihood method, which maximizes the probability of classifying the observed data into the appropriate category given the regression coefficients.
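As a minimal sketch of what that fitting step looks like in practice (using Python's statsmodels on made-up data; the variable names and values below are illustrative assumptions, not figures from these slides):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: predict taking an offer (1) vs declining it (0)
# from family size and mortgage (values are invented for illustration).
family_size = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 6])
mortgage = np.array([60, 80, 50, 120, 90, 150, 110, 200, 170, 220])
take_offer = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])

X = sm.add_constant(np.column_stack([family_size, mortgage]))  # adds the constant a
result = sm.Logit(take_offer, X).fit()  # coefficients chosen by maximum likelihood

print(result.params)  # the constant a and the b coefficients
print(result.llf)     # the maximised log likelihood (LL)
```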

5 Uses of logistic regression  The first is the prediction of group membership, since it calculates the probability of success over the probability of failure.  It also provides knowledge of the relationships and strengths among the variables (e.g. marrying the boss’s daughter puts you at a higher probability for job promotion than undertaking five hours of unpaid overtime each week).

6 Assumptions of logistic regression  It does not assume a linear relationship between the dependent and independent variables.  The dependent variable must be a dichotomy (2 categories).  The independent variables need not be interval, nor normally distributed, nor linearly related, nor of equal variance within each group.

7 Assumptions of logistic regression  The categories (groups) must be mutually exclusive and exhaustive; a case can only be in one group and every case must be a member of one of the groups.  Larger samples are needed than for linear regression. A minimum of 50 cases per predictor is recommended.

8 Logistic regression equation (example)

9 Logistic regression equation (example)

10 Log Transformation of p  This log transformation of the p values to a log distribution enables us to create a link with the normal regression equation.  The log distribution (or logistic transformation of p) is also called the logit of p or logit(p).  Logit(p) is the log (to base e) of the odds ratio or likelihood ratio that the dependent variable is 1. In symbols it is defined as logit(p) = log[p / (1 − p)].

11 Logit Function  logit(p) = log[p / (1 − p)] = ln[p / (1 − p)]  Whereas p can only range from 0 to 1, the logit(p) scale ranges from negative infinity to positive infinity.
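A small Python sketch of the logit transformation described above:

```python
import math

def logit(p):
    """Log odds: ln(p / (1 - p)). Maps probabilities in (0, 1) onto (-inf, +inf)."""
    return math.log(p / (1 - p))

print(logit(0.5))  # 0.0   (even odds)
print(logit(0.8))  # ~1.39
print(logit(0.2))  # ~-1.39
```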

12 Logistic regression equation  It uses a maximum likelihood method, which maximizes the probability of getting the observed results given the fitted regression coefficients.

13 Calculating P  p = exp(a + b1X1 + b2X2 + … + bkXk) / [1 + exp(a + b1X1 + b2X2 + … + bkXk)] Where: p = the probability that a case is in a particular category, exp = the base of natural logarithms (approx. 2.72), a = the constant of the equation and b = the coefficient of the predictor variables.
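A short Python sketch of this calculation (the constant, coefficients and predictor values below are invented for illustration):

```python
import math

def predict_probability(a, bs, xs):
    """p = exp(a + b1*x1 + ... + bk*xk) / (1 + exp(a + b1*x1 + ... + bk*xk))."""
    z = a + sum(b * x for b, x in zip(bs, xs))
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical constant and coefficients for two predictors
print(predict_probability(a=-3.0, bs=[2.4, 0.01], xs=[1, 60]))  # 0.5
```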

14 ODDS  For a dichotomous variable, the odds of membership of the target group are equal to the probability of membership in the target group divided by the probability of membership in the other group.  The odds value can range from 0 to infinity and tells you how much more likely it is that an observation is a member of the target group rather than a member of the other group.

15 ODDS  If the probability is 0.80, the odds are 4 to 1, or .80/.20.  If the probability is 0.25, the odds are .33 (.25/.75).
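In Python, the same probability-to-odds conversion:

```python
def odds(p):
    """Odds of membership in the target group: p / (1 - p)."""
    return p / (1 - p)

print(odds(0.80))  # 4.0  (4 to 1)
print(odds(0.25))  # ~0.33
```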

16 Logits (log odds)  Logistic regression calculates changes in the log odds of the dependent variable, not changes in the dependent variable itself as OLS regression does.  The logits (log odds) are the b coefficients (the slope values) of the regression equation.

17 ODDS RATIO (OR)  It estimates the change in the odds of membership in the target group for a one-unit increase in the predictor. It is calculated by using the regression coefficient of the predictor as the exponent, or exp.  SPSS reports the odds ratio as EXP(B).

18 ODDS RATIO (OR)  Assume, in the example earlier where we were predicting accountancy success from a math competency predictor, that b = 2.69. Then the odds ratio is exp(2.69), or 14.73.  Therefore the odds of passing are 14.73 times greater for a student who had a pre-test score of 5, for example, than for a student whose pre-test score was 4.
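The odds ratio is simply the exponential of the coefficient, as in this sketch:

```python
import math

b = 2.69                     # coefficient from the accountancy example above
odds_ratio = math.exp(b)     # what SPSS reports as EXP(B)
print(round(odds_ratio, 2))  # 14.73
```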

19 SPSS Outputs

20 Block 0: Beginning Block  Block 0 presents the results with only the constant included, before any coefficients (i.e. those relating to family size and mortgage) are entered into the equation.  Logistic regression compares this model with a model including all the predictors (family size and mortgage) to determine whether the latter model is more appropriate.

21 Block 0: Beginning Block

22 Variables in the Equation

23 Model chi square  The difference between –2LL for the best-fitting model and –2LL for the initial model (in which all the b values are set to zero in block 0) is distributed like chi squared, with degrees of freedom equal to the number of predictors.

24 Model fit and the likelihood function  Maximum Likelihood (ML) estimation is used, instead of least squares, to find the function that will maximize our ability to predict the probability of Y based on what we know about X. Likelihood simply means probability.

25 Model fit and the likelihood function  We then work out the likelihood of observing the data we actually did observe under each of these hypotheses. The result is usually a very small number, and to make it easier to handle, the natural logarithm is used, producing a log likelihood (LL). Probabilities are always less than one, so LLs are always negative.

26 The likelihood ratio test  This tests the difference between –2LL for the full model with predictors and –2LL for the null model with only a constant in it.  Significance at the .05 level or lower means the researcher’s model with the predictors is significantly different from the one with the constant only (all ‘b’ coefficients being zero).

27 Model chi square  This difference is the model chi square that SPSS refers to. Very conveniently, the difference between –2LL values for models with successive terms added also has a chi-squared distribution, so when we use a stepwise procedure, we can use chi-squared tests to find out if adding one or more extra predictors significantly improves the fit of our model.
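A sketch of the calculation, assuming some hypothetical –2LL values (not taken from these slides):

```python
from scipy.stats import chi2

neg2ll_null = 41.46  # hypothetical -2LL for the constant-only model
neg2ll_full = 17.36  # hypothetical -2LL for the model with the predictors
n_predictors = 2     # e.g. family size and mortgage

model_chi_square = neg2ll_null - neg2ll_full
p_value = chi2.sf(model_chi_square, df=n_predictors)
print(model_chi_square, p_value)  # the model is significant if p < .05
```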

28 Block 1 Method = Enter

29 Hosmer and Lemeshow Statistic  This is an alternative to the model chi square.  If the significance of the H-L goodness-of-fit statistic is greater than .05, as we want for well-fitting models, we fail to reject the null hypothesis that there is no difference between observed and model-predicted values, implying that the model’s estimates fit the data at an acceptable level.

30 Hosmer and Lemeshow Test  The H-L statistic has a significance of .605, which means that it is not statistically significant and therefore our model is quite a good fit.

31 Classification Table  In a perfect model, all cases will be on the diagonal and the overall percent correct will be 100%. In this study, 87.5% were correctly classified for the take offer group and 92.9% for the decline offer group. Overall, 90% were correctly classified. This is a considerable improvement on the 53.3% correct classification with the constant-only model.
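A sketch of how the overall percent correct is computed; the cell counts below are assumptions chosen only to reproduce the percentages quoted above:

```python
# Rows = observed group, columns = (predicted decline, predicted take)
table = {
    "decline": (13, 1),  # 13 of 14 correct -> 92.9%
    "take": (2, 14),     # 14 of 16 correct -> 87.5%
}

correct = table["decline"][0] + table["take"][1]
total = sum(sum(row) for row in table.values())
print(f"Overall percent correct: {100 * correct / total:.1f}%")  # 90.0%
```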

32 Classification Table

33 The Variables in the Equation  This table (Table 24.9) has several important elements. The Wald statistic and associated probabilities provide an index of the significance of each predictor in the equation.  The simplest way to assess Wald is to take the significance value and, if it is less than .05, reject the null hypothesis, as the variable does make a significant contribution.  In this case, we note that family size contributed significantly to the prediction (p = .013) but mortgage did not (p = .075). Mortgage can therefore be dropped as an independent variable.
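The Wald statistic itself is (B / SE)², tested against a chi-square distribution with 1 degree of freedom; a sketch with illustrative values:

```python
from scipy.stats import chi2

def wald_test(b, se):
    """Wald statistic = (B / SE)**2, compared against chi-square with 1 df."""
    wald = (b / se) ** 2
    return wald, chi2.sf(wald, df=1)

# Hypothetical coefficient and standard error for a predictor
print(wald_test(b=2.40, se=0.97))  # roughly (6.12, 0.013)
```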

34 The Variables in the Equation

35 Exp(B)  The EXP(B) value associated with family size is 11.007. Hence when family size is raised by one unit (one person), the odds ratio is 11 times as large and therefore householders are 11 times more likely to belong to the take offer group.

36 ‘B’ values  These are the logistic coefficients that can be used to create a predictive equation.

