Logistic Regression Analysis Like ordinary regression and ANOVA, logistic regression is part of a category of models called generalized linear models. Generalized linear models were developed to unify various statistical models (linear regression, logistic regression, Poisson regression). We can think of maximum likelihood as a general method for estimating all of these models.
Logistic Regression Analysis--GLM Each outcome of the dependent variable (that is, each Y) is assumed to be generated from a particular distribution function in the exponential family (normal, binomial, Poisson, etc.).
Logistic Regression Analysis (a diversion into probability distributions) Normal distribution: a family of distributions, each member of which can be defined by its mean and variance. Many physical phenomena can be approximated well by the normal distribution. Binomial distribution: the probability distribution of the number of successes in a sequence of Bernoulli trials (where outcomes fall into one of two categories, i.e., “occurred” and “did not occur”). Note that in large samples, if the dependent variable is not too skewed, the normal distribution approximates the binomial distribution.
Logistic Regression Analysis (a diversion into probability distributions) Poisson distribution: expresses the probability of a given number of events occurring in a fixed period of time, if the events occur with a known average rate and independently of the time since the last event. (Note that the negative binomial distribution is used to model event counts that are skewed. One can also think about the Pólya distribution, which can be used to model occurrences of “contagious” discrete events, such as tornado outbreaks.)
Logistic Regression—when? Logistic regression models are appropriate for dependent variables coded 0/1. We only observe “0” and “1” for the dependent variable—but we think of the dependent variable conceptually as a probability that “1” will occur.
Logistic Regression--examples Some examples: voted for Obama (yes, no); turned out to vote (yes, no); sought medical assistance in the last year (yes, no).
Logistic Regression—why not OLS? Why can’t we use OLS? After all, linear regression is so straightforward, and (unlike other models) actually has a “closed form solution” for the estimates.
Logistic Regression—why not OLS? Three problems with using OLS. First, what is our dependent variable, conceptually? It is the probability of y=1. But we only observe y=0 and y=1. If we use OLS, we’ll get predicted values that fall between 0 and 1—which is what we want—but we’ll also get predicted values that are greater than 1, or less than 0. That makes no sense.
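The first problem can be seen on a small example. The sketch below fits an ordinary least-squares line to a 0/1 outcome using only the textbook formulas; the data and the helper name `ols_fit` are made up for illustration.

```python
# Toy data (made up for illustration): y switches from 0 to 1 as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = ols_fit(xs, ys)
predictions = [intercept + slope * x for x in xs]

# The smallest prediction is below 0 and the largest is above 1,
# so they cannot be read as probabilities.
print(round(min(predictions), 3), round(max(predictions), 3))
```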
Logistic Regression—Why not OLS? Three problems using OLS. Second problem—there is heteroskedasticity in the model. Think about the meaning of “residual”. The residual is the difference between the observed and the predicted Y. By definition, what will that residual look like at the center of the distribution? By definition, what will that residual look like at the tails of the distribution?
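One way to answer the two questions above: for a 0/1 outcome whose probability of being 1 is p, the variance is p(1 - p). A minimal sketch (the function name is an assumption for illustration) shows that this variance is largest at the center and shrinks toward the tails, so it cannot be constant across cases.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose probability of being 1 is p."""
    return p * (1.0 - p)

# Variance peaks at p = 0.5 and falls off symmetrically toward 0 and 1.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(p, round(bernoulli_variance(p), 4))
```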
Logistic Regression—why not OLS? Three problems using OLS. The third problem is substantive. The reality is that many choice functions can be modeled by an S-shaped curve. Therefore (much as when we discussed linear transformations of the X variable), it makes sense to model a non-linear relationship.
Logistic Regression—but similar to OLS.... So. We actually could correct for the heteroskedasticity, and we could transform the equation so that it captured the “non-linear” relationship, and then use linear regression. But what we usually do....
Logistic Regression—but similar to OLS ...is use logistic regression to predict the probability of the occurrence of an event.
Logistic Regression—s shaped curve
Logistic Regression—S-shaped curve and Bernoulli variables Note that the observed dependent variable is a Bernoulli (or binary) variable. But what we are really interested in is predicting the probability that an event occurs (i.e., the probability that y=1).
Logistic Regression--advantage Logistic regression is particularly handy because (unlike, say, discriminant analysis) it makes no assumptions about how the independent variables are distributed. They may be continuous or categorical, and they do not have to be normally distributed; they can take any form.
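The S-shaped curve can be tabulated with a few lines of code. This is a minimal sketch of the standard logistic function; the name `logistic` and the grid of input values are chosen here for illustration.

```python
import math

def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Flat near 0 for very negative z, steep around z = 0, flat near 1 for large z.
for z in (-6, -3, -1, 0, 1, 3, 6):
    print(z, round(logistic(z), 3))
```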
Logistic Regression—exponential values and natural logs Note: “exp” is the exponential function and “ln” is the natural log. These are inverses of each other. When we take the exponential function of a number, we raise e (approximately 2.72) to the power of that number. So exp(3) = e * e * e ≈ 20.09, and if we take ln(20.09), we get approximately 3.
Logistic Regression--transformation Note that you can think of logistic regression in terms of transforming the dependent variable so that it fits an S-shaped curve. Note that the odds are the probability that a case will be a 1 divided by the probability that it will not be a 1. The natural log of the odds is the “logit”, and it is a linear function of the x’s (that is, of the right hand side of the model).
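The arithmetic above can be checked directly with Python's standard `math` module (the variable name is just for illustration):

```python
import math

e_cubed = math.exp(3)          # e raised to the power 3
print(round(e_cubed, 2))       # about 20.09
print(math.log(e_cubed))       # the natural log undoes exp, giving back 3 (up to rounding)
```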
Logistic Regression--transformation Note that you can equivalently talk about modelling the probability that y=1 (theta, below), as below (these are the same mathematical expressions):
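The expressions referred to here did not survive in this copy of the slides. In the usual notation (an assumption: theta is the probability that y=1, a is the intercept, and b_1, ..., b_k are the coefficients), the two equivalent forms would read roughly as follows.

```latex
\ln\!\left(\frac{\theta}{1-\theta}\right) = a + b_1 x_1 + \dots + b_k x_k
\quad\Longleftrightarrow\quad
\theta = \frac{e^{a + b_1 x_1 + \dots + b_k x_k}}{1 + e^{a + b_1 x_1 + \dots + b_k x_k}}
       = \frac{1}{1 + e^{-(a + b_1 x_1 + \dots + b_k x_k)}}
```

The left-hand form is the logit written as a linear function of the x's; the right-hand form solves the same equation for the probability.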
Logistic Regression Note that the independent variables are not linearly related to the probability that y=1. However, the independent variables are linearly related to the logit of the dependent variable.
Logistic Regression--recap Logistic regression analysis, in other words, is very similar to OLS regression, just with a transformation of the regression formula. We also use binomial theory to conduct the tests.
Logistic Regression—Model fit Recall that in OLS, we minimized the squared residuals in order to find the line that best fit the data. In logistic regression analysis, we use a calculus-based method called maximum likelihood estimation (MLE).
Logistic Regression—MLE Through an iterative process, maximum likelihood finds the function that will maximize our ability to predict the probability of y based on what we know about x. In other words, ML will find the best values for the estimated effects of party, ideology, sex, race, etc. to predict the likelihood that someone will vote for Obama.
Logistic Regression Analysis--iteration In other words, MLE starts with an initial (arbitrary) guesstimate of what the coefficients will be, and then determines the direction and size of the change that will increase the log likelihood (goodness of fit, that is, how likely it is that the observed values of the dependent variable can be predicted from the observed values of the independent variables).
Logistic Regression Analysis-- iteration After estimating an initial function, the program continues estimating with new estimates to reach an improved function—until convergence is reached (that is, the log likelihood, or the goodness of fit, does not change significantly).
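The iteration described above can be sketched for a model with one predictor. The code below uses Newton-Raphson updates, which is one common choice and not necessarily what any particular package does; the data, the function names, and the tolerance are all assumptions for illustration.

```python
import math

def log_likelihood(a, b, xs, ys):
    """Log likelihood of the 0/1 outcomes ys given intercept a and slope b."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, tol=1e-8, max_iter=25):
    """Newton-Raphson for a one-predictor logit; returns (a, b, log likelihood)."""
    a, b = 0.0, 0.0                      # arbitrary starting guess
    previous = log_likelihood(a, b, xs, ys)
    current = previous
    for iteration in range(1, max_iter + 1):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log likelihood with respect to a and b.
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix (negative Hessian).
        ws = [p * (1.0 - p) for p in ps]
        h_aa = sum(ws)
        h_ab = sum(w * x for w, x in zip(ws, xs))
        h_bb = sum(w * x * x for w, x in zip(ws, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Step = inverse information matrix times gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
        current = log_likelihood(a, b, xs, ys)
        print(iteration, round(current, 6))
        if abs(current - previous) < tol:  # convergence: fit no longer improves
            break
        previous = current
    return a, b, current

# Toy data (made up); the 0s and 1s overlap, so finite estimates exist.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
a_hat, b_hat, final_ll = fit_logit(xs, ys)
```

Each printed line shows the log likelihood after one iteration; the loop stops once it changes by less than the tolerance.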
Logistic Regression--tests There are two main forms of the likelihood ratio test for goodness of fit.
Logistic Regression--tests 1. Test of the overall model (model chi-square test). Compares the researcher’s model to a reduced model (the baseline model with the constant only). A well fitting model is significant at the .05 level or better; that is, a well fitting model is one that fits the data better than a model with only the constant. A finding of significance means that one can reject the null hypothesis that all of the predictor effects are zero (this is equivalent to an F test in OLS).
Logistic Regression--tests 2. Test of individual model parameters. (Note that the Wald statistic has a chi-squared distribution, but other than that, it is just the same as the “t” that we use in OLS.) You can also calculate a likelihood ratio statistic. Essentially, one is comparing the goodness of fit for the overall model with the goodness of fit with a “nested” model which drops an independent variable. (This is generally considered preferable to the Wald statistic if the coefficient values are very high).
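The likelihood ratio comparison for dropping a single variable can be sketched as follows. The log likelihood values are hypothetical, the function name is made up, and the p-value formula used here is valid only when the two models differ by exactly one parameter (1 degree of freedom).

```python
import math

def likelihood_ratio_test_1df(ll_full, ll_reduced):
    """LR statistic and p-value for nested models differing by ONE parameter.

    Assumes ll_full >= ll_reduced, as it must be for nested models.
    """
    statistic = 2.0 * (ll_full - ll_reduced)
    # For 1 degree of freedom, the chi-squared upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log likelihoods, made up for illustration.
stat, p = likelihood_ratio_test_1df(ll_full=-120.3, ll_reduced=-123.1)
print(round(stat, 2), round(p, 4))
```

For models that differ by more than one parameter, the same statistic is compared against a chi-squared distribution with that many degrees of freedom, which needs a statistics library rather than this shortcut.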
Logistic Regression--interpretation With all other variables held constant, there is a constant increase of b1 in the logit (p) for every 1-unit increase in x1. But remember that even though the right hand side of the model is linearly related to the logit (that is, to the natural log of the odds), what does it mean for the actual probability that y=1?
Logistic Regression It’s fairly straightforward: the effect on the odds is multiplicative. If b1 takes the value of 2.3 (and we know that exp(2.3) is approximately 10), then if x1 increases by 1, the odds that the dependent variable takes the value of 1 increase roughly tenfold.
Logistic Regression—presentation Likewise, it’s difficult to explain to the reader what the parameter estimates mean, because they reflect changes in the logit (the natural log of the odds) for each one-unit change in x. But what you want to tell your readers is how much the probability that y=1 changes (given a 1-unit change in x).
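A short sketch makes the distinction between odds and probability concrete. The function name and the starting probabilities are assumptions for illustration; the coefficient 2.3 is the one used above.

```python
import math

def probability_after_unit_increase(p_before, b):
    """Probability that y=1 after x rises by one unit, given coefficient b."""
    odds_before = p_before / (1.0 - p_before)
    odds_after = odds_before * math.exp(b)   # the odds are multiplied by exp(b)
    return odds_after / (1.0 + odds_after)

# The odds always grow by the same factor, but the change in probability
# depends on where the case started.
for p in (0.05, 0.50, 0.90):
    print(p, round(probability_after_unit_increase(p, 2.3), 3))
```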
Logistic Regression—transform back So, you need to transform the estimates into predicted probabilities. Create predicted y’s (just as you would in OLS: $\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots$) and then transform: $e^{\hat{y}} / (1 + e^{\hat{y}})$ = predicted probability. (Many software packages will do this for you. See Gary King. Or, if you are fond of rotary dial phones, create your own Excel file to do this, which has the advantage of flexibility.)
Logistic Regression—logit v. probit What’s the difference? Well, MLE requires assumptions about the probability distribution of the errors: logistic regression uses the standard logistic probability distribution, whereas probit uses the standard normal distribution.
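The two steps (build the predicted y, then transform it) can be sketched in a few lines. The intercept, coefficients, and variable names below are hypothetical and do not come from any fitted model.

```python
import math

# Hypothetical estimates, made up for illustration (not from a fitted model).
intercept = -1.5
coefficients = {"party": 2.0, "ideology": 0.5}

def predicted_probability(case):
    """Predicted probability that y=1 for one case (a dict of x values)."""
    # Step 1: predicted y on the logit scale, a + b1*x1 + b2*x2 + ...
    predicted_y = intercept + sum(coefficients[name] * case[name] for name in coefficients)
    # Step 2: transform back to a probability.
    return math.exp(predicted_y) / (1.0 + math.exp(predicted_y))

print(round(predicted_probability({"party": 1, "ideology": 1}), 3))
print(round(predicted_probability({"party": 0, "ideology": 1}), 3))
```

Comparing the two printed values gives the change in predicted probability when `party` moves from 0 to 1 with `ideology` held at 1, which is the kind of quantity readers can interpret directly.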
Logistic Regression—logit v. probit Logit is more common. And note that logit and probit often give the same results. But note that there can be differences between the two link functions—see this paper by Hahn and Soyer.
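To see how close the two curves are, the sketch below tabulates the standard logistic and standard normal cumulative distribution functions side by side. The function names are chosen for illustration; the rescaling reflects the fact that the standard logistic distribution has standard deviation pi / sqrt(3) rather than 1.

```python
import math

def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Rescale the logistic input so both curves are compared on a unit-variance scale.
scale = math.pi / math.sqrt(3.0)
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(z, round(logistic_cdf(z * scale), 3), round(normal_cdf(z), 3))
```

On this grid the two columns stay within a few hundredths of each other. The rescaling is also why logit coefficients tend to be larger in magnitude than probit coefficients estimated on the same data.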
Logistic Regression—ordered logit Ordered models assume there's some underlying, unobservable true outcome variable, occurring on an interval scale. We don't observe that interval-level information about the outcome, but only whether that unobserved value crosses some threshold(s) that put the outcome into a lower or a higher category, categories which are ranked, revealing ordinal but not interval-level information.
Logistic Regression—ordered logit If you are using ordered logit, you will get results that include “cut points” (intercepts) and coefficients. Ordered logistic regression (OLR) essentially runs multiple equations, one fewer than the number of options on one’s scale.
Logistic Regression—ordered logit For example, assume that you have a 4 point scale, 1=not at all optimistic, 2=not very optimistic, 3=somewhat optimistic, and 4=very optimistic. The first equation compares the likelihood that y=1 to the likelihood that y does not equal 1 (that is, y=2 or 3 or 4).
Logistic Regression—ordered logit The second equation compares the likelihood that y=1 or 2 to the likelihood that y=3 or 4. The third equation compares the likelihood that y=1, 2, or 3 to the likelihood that y=4.
Logistic Regression—ordered logit Note that OLR only reports one parameter estimate for each independent variable. That is, it constrains the parameter estimates to be constant across categories.
Logistic Regression—ordered logit It assumes that the coefficients for the variables would not vary if one actually separately estimated the different equations.
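The structure just described (several cut points, one shared coefficient) can be sketched as follows. The cut points and the coefficient are hypothetical, and the code uses one common parameterization, P(y <= j) = logistic(cut_j - xb); as noted elsewhere in these slides, packages differ in their sign conventions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probabilities(xb, cut_points):
    """Probability of each ordered category given linear predictor xb.

    cut_points must be increasing; there is one fewer than the number of categories.
    """
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cumulative = [logistic(c - xb) for c in cut_points] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities

# Hypothetical values: three cut points for a 4-point scale, one coefficient b.
cut_points = [-1.0, 0.5, 2.0]
b = 0.8
for x in (0, 1, 2):
    print(x, [round(p, 3) for p in ordered_logit_probabilities(b * x, cut_points)])
```

The same `b` enters every cumulative equation, which is the constraint described above: only the cut points differ across the three comparisons.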
Logistic Regression—ordered logit (Note that in Stata one can actually test if this assumption is true, without running the separate models. There’s some parallel here to the non-linearity issue we discussed last week, where OLS is assuming that your independent variable is linearly related to the dependent variable—but you can actually break apart the independent variable to test whether that is true.)
Logistic Regression—ordered logit The results also give you intercepts. Check to see how these are coded: they generally mean the same thing, but the directions of the parameters are different in SAS versus Stata (just as an example). (SAS also models y=0 in a regular logistic regression, so you need to flip the signs to get the more intuitive results.)
Multinomial Analyses Multinomial logit can be used when categories of the dependent variable cannot be ordered in a meaningful way. One category is chosen as the “comparison category”, and the beta coefficient (b) represents the change in the log odds of being in the dependent variable category relative to the comparison category (for a one-unit change in the right-hand side variables).
Multinomial Analyses The model:
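The equation for this slide did not survive in this copy. A standard way to write the multinomial logit model, assuming J categories with category J as the comparison category (so its coefficients are fixed at zero) and a separate coefficient vector for every other category, is:

```latex
P(y = j \mid x) = \frac{e^{x\beta_j}}{1 + \sum_{k=1}^{J-1} e^{x\beta_k}}, \quad j = 1, \dots, J-1,
\qquad
P(y = J \mid x) = \frac{1}{1 + \sum_{k=1}^{J-1} e^{x\beta_k}}
```

The notation here is an assumption and may differ from the original slide; under it, the log of P(y = j) / P(y = J) equals the linear predictor for category j, which matches the interpretation of b relative to the comparison category.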
Multinomial Analyses Multinomial logit is simple to estimate and is often used. However, it is appropriate only if the introduction or removal of a choice has no effect on the (proportional) probability of choosing each of the others (the independence of irrelevant alternatives assumption). For example, consider Perot versus Clinton versus Bush: does removing Perot from the equation mean that the probability of choosing Clinton relative to the probability of choosing Bush changes? If so, multinomial logit is inappropriate.
Multinomial Analyses Multinomial probit does not require the assumption that choices are independent across alternatives. And, though it demands a great deal of computing resources, recent advances mean that it is increasingly practical to use.
Multinomial Analyses So, often multinomial probit is recommended. Dow and Endersby (2004) point out, however, that the choice of a model really depends on how you see the underlying choice process that generated the observed data. In reality, neither model (MNP or MNL) will be clearly advantageous.
Multinomial Analyses And Dow and Endersby argue that MNP sometimes “fails to converge at a global optimum”. Put simply, they argue that MNP often comes up with imprecise estimates; that is, there are multiple sets of estimates that fit the data equally well. Two studies that compare the MNP and MNL models: Alvarez and Nagler (2001) and Quinn et al. (1999). Alvarez and Nagler argue for MNP; Quinn et al. are more agnostic.
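The restriction described above is built into the multinomial logit formula, which a small sketch can show. The utilities assigned to each candidate are hypothetical numbers for a single voter, and the function name is made up for illustration.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit probabilities from a dict of systematic utilities."""
    denominator = sum(math.exp(u) for u in utilities.values())
    return {name: math.exp(u) / denominator for name, u in utilities.items()}

# Hypothetical utilities for one voter, made up for illustration.
three_way = {"Clinton": 1.0, "Bush": 0.6, "Perot": 0.2}
two_way = {name: u for name, u in three_way.items() if name != "Perot"}

with_perot = choice_probabilities(three_way)
without_perot = choice_probabilities(two_way)

# Under multinomial logit the Clinton-to-Bush ratio is identical in both races.
ratio_with = with_perot["Clinton"] / with_perot["Bush"]
ratio_without = without_perot["Clinton"] / without_perot["Bush"]
print(round(ratio_with, 4), round(ratio_without, 4))
```

The model forces the two ratios to be equal no matter what numbers are plugged in. If one believes Perot drew support disproportionately from one of the other two candidates, that forced equality is exactly what makes the model inappropriate.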
Multinomial logit Also, conditional logit: Conditional logit only includes variables that are related to the options being chosen for the dependent variable.