
Analyzing dichotomous dummy variables



Presentation on theme: "Analyzing dichotomous dummy variables"— Presentation transcript:

1 Analyzing dichotomous dummy variables
Quantitative Methods Analyzing dichotomous dummy variables

2 Logistic Regression Analysis
Like ordinary regression and ANOVA, logistic regression belongs to a category of models called generalized linear models. Generalized linear models were developed to unify various statistical models (linear regression, logistic regression, Poisson regression). We can think of maximum likelihood as a general method for estimating all of these models.

3 Logistic Regression—when?
Logistic regression models are appropriate for dependent variables coded 0/1. We only observe “0” and “1” for the dependent variable—but we think of the dependent variable conceptually as a probability that “1” will occur.

4 Logistic Regression--examples
Some examples: voted for Obama (yes/no); turned out to vote (yes/no); sought medical assistance in the last year (yes/no).

5 Logistic Regression—why not OLS?
Why can’t we use OLS? After all, linear regression is so straightforward, and (unlike other models) actually has a “closed form solution” for the estimates.

6 Logistic Regression—why not OLS?
Three problems with using OLS. First, what is our dependent variable, conceptually? It is the probability of y=1. But we only observe y=0 and y=1. If we use OLS, we’ll get predicted values that fall between 0 and 1—which is what we want— but we’ll also get predicted values that are greater than 1, or less than 0. That makes no sense.
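The out-of-range problem is easy to demonstrate. A minimal sketch with made-up data (the predictor values and cutoff are hypothetical, not from the slides): fit a straight line to a 0/1 outcome by ordinary least squares and watch the fitted "probabilities" escape the [0, 1] interval at the extremes of x.

```python
# Hypothetical data: a 0/1 outcome that flips from 0 to 1 as x grows.
xs = list(range(10))
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Simple least-squares slope and intercept, computed by hand.
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Fitted values at the ends of the x range fall outside [0, 1].
print(intercept + slope * 0)   # negative: a "probability" below 0
print(intercept + slope * 9)   # above 1: a "probability" above 1
```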

7 Logistic Regression—Why not OLS?
Three problems using OLS. Second problem—there is heteroskedasticity in the model. Think about the meaning of “residual”. The residual is the difference between the observed and the predicted Y. By definition, what will that residual look like at the center of the distribution? By definition, what will that residual look like at the tails of the distribution?

8 Logistic Regression—why not OLS?
Three problems using OLS. The third problem is substantive. The reality is that many choice functions can be modeled by an S-shaped curve. Therefore (much as when we discussed linear transformations of the X variable), it makes sense to model a non-linear relationship.

9 Logistic Regression—but similar to OLS....
So. We actually could correct for the heteroskedasticity, and we could transform the equation so that it captured the “non-linear” relationship, and then use linear regression. But what we usually do....

10 Logistic Regression—but similar to OLS...
...is use logistic regression to predict the probability of the occurrence of an event.

11 Logistic Regression—S-shaped curve

12 Logistic Regression—S-shaped curve and Bernoulli variables
Note that the observed dependent variable is a Bernoulli (or binary) variable. But what we are really interested in is predicting the probability that an event occurs (i.e., the probability that y=1).

13 Logistic Regression--advantage
Logistic regression is particularly handy because (unlike, say, discriminant analysis) it makes no assumptions about how the independent variables are distributed. They can be continuous or categorical, and they need not be normally distributed—they can take any form.

14 Logistic Regression— exponential values and natural logs
Note—"exp" is the exponential function and ln is the natural log. They are inverses of each other. When we take the exponential of a number, we raise e ≈ 2.72 to the power of that number. So exp(3) = 2.72 × 2.72 × 2.72 ≈ 20.09, and if we take ln(20.09), we get back the number 3.
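The exp/ln round trip from the slide can be checked directly with Python's `math` module:

```python
import math

# exp(3) raises e (about 2.718) to the third power...
value = math.exp(3)
print(round(value, 2))            # 20.09

# ...and the natural log undoes it, recovering 3.
print(round(math.log(value), 2))  # 3.0
```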

15 Logistic Regression--transformation
Note that you can think of logistic regression in terms of transforming the dependent variable so that it fits an S-shaped curve. Note that the odds are the probability that a case will be a 1 divided by the probability that it will not be a 1. The natural log of the odds is the "logit", and it is a linear function of the x's (that is, of the right hand side of the model).
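The two transforms in that paragraph—odds to logit, and logit back to probability—are a few lines of code:

```python
import math

def logit(p):
    # natural log of the odds: ln(p / (1 - p))
    return math.log(p / (1 - p))

def inv_logit(z):
    # back-transform a logit to a probability; this is the S-shaped curve
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                        # 0.0 (even odds give a logit of zero)
print(round(inv_logit(logit(0.8)), 6))   # 0.8 (the two transforms are inverses)
```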

16 Logistic Regression--transformation
Note that you can equivalently talk about modelling the probability that y = 1 (call it theta). These are the same mathematical expression: ln(theta / (1 - theta)) = b0 + b1x1 + ... is equivalent to theta = 1 / (1 + exp(-(b0 + b1x1 + ...))).

17 Logistic Regression Note that the independent variables are not linearly related to the probability that y=1. However, the independent variables are linearly related to the logit of the dependent variable.

18 Logistic Regression--recap
Logistic regression analysis, in other words, is very similar to OLS regression, just with a transformation of the dependent variable. We also use binomial theory, rather than normal theory, to conduct the tests.

19 Logistic Regression--interpretation
Most commonly: with all other variables held constant, every 1-unit increase in x1 produces a constant increase of b1 in logit(p). But remember, even though the right hand side of the model is linearly related to the logit (that is, to the natural log of the odds), what does it mean for the actual probability that y=1?

20 Logistic Regression It’s fairly straightforward—it’s multiplicative.
If b1 takes the value of 2.3 (and we know that exp(2.3) ≈ 10), then when x1 increases by 1, the odds that the dependent variable takes the value of 1 increase roughly tenfold.
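A quick numeric check of the tenfold claim (the starting odds of 0.5 below are a hypothetical value, just to show the multiplication):

```python
import math

b1 = 2.3
factor = math.exp(b1)               # about 9.97, i.e. roughly tenfold
odds_before = 0.5                   # hypothetical odds at some value of x1
odds_after = odds_before * factor   # odds after a one-unit increase in x1

print(round(factor, 2))             # 9.97
print(round(odds_after, 2))         # 4.99
```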

21 From probability to odds to log odds
Everything starts with the concept of probability. Suppose the probability of success for some event is 0.8. Then the probability of failure is 1 - 0.8 = 0.2. The odds of success are defined as the ratio of the probability of success to the probability of failure. In our example, the odds of success are 0.8 / 0.2 = 4. That means the odds of success are 4 to 1. If the probability of success is 0.5, a 50-50 chance, then the odds of success are 1 to 1.
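The probability-to-odds arithmetic above, spelled out:

```python
p_success = 0.8
p_failure = 1 - p_success            # 0.2
odds = p_success / p_failure         # 4.0, i.e. odds of 4 to 1
print(round(odds, 2))                # 4.0

even = 0.5 / (1 - 0.5)               # a 50-50 chance gives odds of 1 to 1
print(even)                          # 1.0
```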

22 From probability to odds to log odds
The transformation from probability to odds is a monotone transformation: the odds increase as the probability increases, and vice versa. Probability ranges from 0 to 1; odds range from 0 to positive infinity.

23 Logistic regression with no predictors
In other words, the intercept of a model with no predictors is the estimated log odds of being in the honors class for the whole sample. We can also transform the log odds back into a probability.
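A sketch of that intercept-only calculation, assuming the honors-class counts from the worked example on the next slide (17 + 32 = 49 honors students out of 91 + 109 = 200 total):

```python
import math

honors, total = 49, 200                       # counts assumed from the next slide
p = honors / total                            # 0.245

intercept = math.log(p / (1 - p))             # log odds of honors for the sample
recovered = 1 / (1 + math.exp(-intercept))    # back-transform to a probability

print(round(intercept, 3))                    # -1.125
print(round(recovered, 3))                    # 0.245
```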

24 Logistic regression with one dichotomous predictor
In our dataset: what are the odds of a male being in the honors class, and what are the odds of a female? We can compute these odds by hand from the table: for males, the odds of being in the honors class are (17/91) / (74/91) = 17/74 = 0.23; for females, the odds are (32/109) / (77/109) = 32/77 = 0.42. The ratio of the odds for females to the odds for males is (32/77) / (17/74) = (32 * 74) / (77 * 17) = 1.809. So the odds for males are 17 to 74, the odds for females are 32 to 77, and the odds for females are about 81% higher than the odds for males.
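The hand calculation from the table, reproduced in code:

```python
# Counts from the slide's table: 17 of 91 males and 32 of 109 females
# are in the honors class.
odds_male = 17 / 74                    # about 0.23
odds_female = 32 / 77                  # about 0.42
odds_ratio = odds_female / odds_male   # about 1.809

print(round(odds_male, 2))             # 0.23
print(round(odds_female, 2))           # 0.42
print(round(odds_ratio, 3))            # 1.809
```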

25 Now relate the odds for males and females to the logistic regression output.
The intercept is the log odds for males, because male is the reference group (female = 0). Using the odds we computed above for males, we can confirm this: log(0.23) = -1.47. The coefficient for female is the log of the odds ratio between the female and male groups: log(1.809) = 0.593. So we can recover the odds ratio by exponentiating the coefficient for female. Most statistical packages display both the raw regression coefficients and the exponentiated coefficients for logistic regression models.
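These coefficient identities can be verified from the table counts alone:

```python
import math

# Intercept: log odds for the reference group (males).
intercept = math.log(17 / 74)                     # about -1.47

# Coefficient for female: log of the female-to-male odds ratio.
coef_female = math.log((32 / 77) / (17 / 74))     # about 0.593

# Exponentiating the coefficient recovers the odds ratio.
odds_ratio = math.exp(coef_female)                # about 1.809

print(round(intercept, 2))    # -1.47
print(round(coef_female, 3))  # 0.593
print(round(odds_ratio, 3))   # 1.809
```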

26 Sources l/odds_ratio.htm

