Modeling with Dichotomous Dependent Variables

1 Logistic Regression
Modeling with Dichotomous Dependent Variables

2 A New Type of Model… Dichotomous Dependent Variable:
Why did someone vote for Bush or Kerry? Why did residents own or rent their houses? Why do some people drink alcohol and others don’t? What determined if a household owned a car?

3 Dependent Variable… Is binary, with a yes or a no answer
Can be coded, 1 for yes and 0 for no. There are no other valid responses.

4 Problem: OLS Regression does not model the relationship well

5 Solution: Use a Different Functional Form
The properties we need: The model should be bounded by 0 and 1. The model should estimate a value for the dependent variable in terms of the probability of being in one category or the other, e.g., an owner or a renter, or a Bush voter or a Kerry voter.

6 Solution, cont. We want to know the probability, p, that a particular case falls in the 0 or the 1 category. We want to derive a model that gives good estimates of 0 and 1 or, put another way, that tells us whether a particular case is likely to be a 0 or a 1.

7 Solution: A Logistic Curve

8 The Logistic Function Probability that a case is a 0 or a 1 is distributed according to the logistic function.
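
As a minimal sketch of the curve described here, the standard logistic function can be written in a few lines of Python. The function name `logistic` and the sample inputs are illustrative choices, not part of the original slides.

```python
import math

def logistic(z: float) -> float:
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs only: large negative values approach 0,
# large positive values approach 1, and z = 0 gives exactly 0.5.
for z in (-6, -2, 0, 2, 6):
    print(f"z = {z:+d}  ->  p = {logistic(z):.4f}")
```

Whatever value the linear part of a model produces, the output stays strictly between 0 and 1, which is the bounding property the earlier slide asks for.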

9 Remember probabilities…
Probabilities range from 0 to 1. Probability: frequency of being in one category relative to the total of all categories. Example: The probability that the first card dealt in a card game is a queen of hearts is 1/52 (one in 52). It does us no good to “predict” a value of .5 as in the linear regression model.

10 But can we manipulate probabilities to estimate the logistic function?
Steps: Convert probabilities to odds, P/(1-P). Convert odds to log odds, or logits, ln(P/(1-P)).

11 Manipulating probabilities to estimate the logistic function
[Table: listing of 13 cases with columns Case number, P, 1-P, P/(1-P), ln(P/(1-P))]
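
A table of this kind can be regenerated for any set of probabilities. The sketch below assumes a handful of illustrative probabilities (not the 13 cases from the slide) and prints P, 1-P, the odds, and the log odds; the helper names `odds` and `logit` are hypothetical.

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p: P / (1 - P)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log odds (logit) of an event with probability p."""
    return math.log(odds(p))

# Illustrative probabilities, chosen only to show the pattern.
probabilities = [0.1, 0.25, 0.5, 0.75, 0.9]

print(f"{'P':>6} {'1-P':>6} {'P/(1-P)':>9} {'ln(P/(1-P))':>12}")
for p in probabilities:
    print(f"{p:6.2f} {1 - p:6.2f} {odds(p):9.3f} {logit(p):12.3f}")
```

Probabilities are bounded by 0 and 1, odds run from 0 upward, and log odds are unbounded and symmetric around 0, which is what makes them suitable as the left-hand side of a linear equation.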

12 Logistic Function

13 Logistic Function

14 Steps… Log odds = a + bx. Odds = exponentiate (a + bx).
Probability = odds/(1 + odds), which is distributed according to the logistic function.
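
The steps on this slide can be written out as a short derivation. The symbols a, b, and x follow the slide; p is the probability that the case is a 1.

```latex
\begin{align*}
\ln\!\left(\frac{p}{1-p}\right) &= a + bx
  && \text{(log odds are linear in } x\text{)} \\
\frac{p}{1-p} &= e^{a + bx}
  && \text{(exponentiate to get the odds)} \\
p &= \frac{e^{a + bx}}{1 + e^{a + bx}} = \frac{1}{1 + e^{-(a + bx)}}
  && \text{(solve for } p\text{: the logistic function)}
\end{align*}
```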

15 An Example Determinants of Homeownership:
Age of the householder; age of the householder squared; building type; year the house was built; householder’s ethnicity; occupational status scale.

16 Calculating the Model Maximum Likelihood Estimation (not OLS)
Estimates of the b’s, standard errors, t ratios, and p values for coefficients. Coefficients are estimates of the impact of the independent variable on the logit of the dependent variable.
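
To make "maximum likelihood, not OLS" concrete, here is a minimal sketch that fits a one-predictor logistic model by Newton-Raphson on the log likelihood. The data (`ages`, `owns`) are invented for illustration and are not the homeownership data from the slides; statistical packages use more robust routines and also report standard errors.

```python
import math

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Fit log odds = a + b*x by maximizing the log likelihood (Newton-Raphson)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical data: age of householder and whether they own (1) or rent (0).
ages = [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]
owns = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

a, b = fit_logistic(ages, owns)
print(f"a = {a:.3f}, b = {b:.4f}")
```

The fitted b is the estimated change in the log odds of ownership for each additional year of age, which is the interpretation given on the slide.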

17 Logistic Regression Model
[Table: columns Parameter, Estimate, S.E., t-ratio, p-value. Rows: 1 CONSTANT, 2 AGE, 3 AGESQ, 4 BLDGTYP2$_cottage, 5 BLDGTYP2$_duplex, 6 YEAR, 7 GERMAN, 8 POLISH, 9 OCCSCALE]

18 Logistic Regression model, cont.
[Table: columns Parameter, Odds Ratio, Upper, Lower. Rows: 2 AGE, 3 AGESQ, 4 BLDGTYP2$_cottage, 5 BLDGTYP2$_duplex, 6 YEAR, 7 GERMAN, 8 POLISH, 9 OCCSCALE]
Fit statistics reported: log likelihood of the constants-only model, LL(0); the likelihood-ratio statistic 2*[LL(N)-LL(0)], with 8 df; Chi-sq p-value = 0.000; McFadden's Rho-Squared.
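
The two fit statistics named on this slide are simple functions of the log likelihoods. The sketch below uses made-up values for LL(0) and LL(N), purely to show the arithmetic; the function names are hypothetical.

```python
def likelihood_ratio_chi2(ll_null: float, ll_model: float) -> float:
    """Likelihood-ratio statistic: 2 * [LL(N) - LL(0)]."""
    return 2.0 * (ll_model - ll_null)

def mcfadden_rho_squared(ll_null: float, ll_model: float) -> float:
    """McFadden's rho-squared: 1 - LL(N) / LL(0)."""
    return 1.0 - ll_model / ll_null

# Hypothetical log likelihoods, NOT the values from the slide.
ll_0 = -300.0   # constants-only model
ll_n = -240.0   # full model

print(likelihood_ratio_chi2(ll_0, ll_n))
print(round(mcfadden_rho_squared(ll_0, ll_n), 3))
```

The chi-square statistic is compared against a chi-square distribution with degrees of freedom equal to the number of added parameters (8 in the model shown on the slide).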

19 Converting Odds Ratios to Probabilities
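Odds = P/(1-P). The odds ratio for a variable compares the odds for one group with the odds for the omitted category, controlling for other variables. For Germans, compared with the omitted category (Americans and other ethnicities), the odds ratio is greater than 1: Germans are more likely to own houses than Americans. Can we be more specific?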

20 Calculating Probability of a Case
Log odds of homeownership = a + b1(Age) + b2(Agesquared) + b3(cottage) + b4(duplex) + b5(Year) + b6(German) + b7(Polish) + b8(occscale), where a and the b's are the estimates from the model. Plug in values and solve the equation. Exponentiate the result to create the odds. Convert the odds to a probability for the case.

21 Calculations Log odds of homeownership = a + b1(Age) + b2(Agesquared) + b3(cottage) + b4(duplex) + b5(Year) + b6(German) + b7(Polish) + b8(occscale). For a 40-year-old skilled, American-born worker living in a residence built in 1892, plugging in the values (with occscale = 3) gives: Log odds = .699

22 Calculations, cont. log odds = .699
odds = antilog (exponentiation) of .699 = 2.012. odds = P/(1-P) = 2.012. Solve for P: P = 2.012/3.012. The result is .67.
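
The arithmetic on this slide can be checked directly. The sketch takes the log odds of .699 from the slide as given and converts it to odds and then to a probability.

```python
import math

log_odds = 0.699                   # value from the slide

odds = math.exp(log_odds)          # antilog of the log odds
probability = odds / (1.0 + odds)  # solve odds = P / (1 - P) for P

print(round(odds, 3))         # 2.012
print(round(probability, 2))  # 0.67
```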

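23 More calculations… How about a 40-year-old German skilled worker in an 1892 residence? Log odds of homeownership = a + b1(Age) + b2(Agesquared) + b3(cottage) + b4(duplex) + b5(Year) + b6(German) + b7(Polish) + b8(occscale). Plugging in the same values, now with German = 1, gives: Log odds = 1.405. Note as well that 1.405 - .699 = .706, the coefficient for the variable “German”. The coefficient adds to the log odds, and its antilog (the odds ratio for the variable “German”) multiplies the odds.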

24 More calculations Convert the log odds to odds, i.e., take the antilog of 1.405. Odds ≈ 4.08 = P/(1-P). Solve for P. P = .803. So the probability of homeownership rises from .67 for Americans to .803 for Germans, an increase of about 13 percentage points.
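
As a rough check on the comparison between the two workers, the sketch below starts from the two log odds quoted on the slides (.699 and 1.405) and shows that the difference in log odds, once exponentiated, is the factor by which the odds are multiplied. The variable names are illustrative.

```python
import math

log_odds_american = 0.699   # from the slides
log_odds_german = 1.405     # from the slides

# The difference in log odds is the coefficient on "German";
# its antilog is the odds ratio for that variable.
b_german = log_odds_german - log_odds_american
odds_ratio_german = math.exp(b_german)

odds_american = math.exp(log_odds_american)
odds_german = math.exp(log_odds_german)

p_american = odds_american / (1.0 + odds_american)
p_german = odds_german / (1.0 + odds_german)

print(f"odds ratio for German: {odds_ratio_german:.3f}")
print(f"odds: {odds_american:.3f} -> {odds_german:.3f}")
print(f"probability: {p_american:.3f} -> {p_german:.3f}")
```

Adding a coefficient on the log-odds scale and multiplying by an odds ratio on the odds scale are the same operation. The change in probability, by contrast, depends on where the case starts on the logistic curve.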

25 More calculations For a 30-year-old American worker in a residence built in 1892: Log odds = -.401. Odds = antilog of (-.401) = .670. Probability of ownership = .670/1.670 = 0.401

26 Classification Table Model Prediction Success Table
[Table: actual choice (Response, Reference) in rows against predicted choice (Response, Reference) in columns, with actual totals. Summary rows: Pred. Tot., Correct, Success Ind., Tot. Correct. Also reported: Sensitivity, Specificity, False Reference, False Response]
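
A simplified version of such a table can be built by classifying each case as a 1 when its predicted probability reaches a cutoff. The sketch below assumes a 0.5 cutoff and invented data; it is not necessarily how the package behind the slide computes its table, and it covers only sensitivity, specificity, and the share correct.

```python
def classification_summary(actual, predicted_prob, threshold=0.5):
    """Cross-tabulate actual 0/1 outcomes against thresholded predictions."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted as 0
        "total_correct": (tp + tn) / len(actual),
    }

# Hypothetical outcomes and predicted probabilities, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1, 0.55, 0.45]

summary = classification_summary(actual, predicted_prob)
print(summary)
```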

27 Extending the Logic… Logistic regression can be extended to more than 2 categories for the dependent variable, for multi-response models. Classification tables can be used to understand misclassified cases. Results can be analyzed for patterns across different values of the independent variables.
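
One common way to extend the model to more than two categories is the multinomial logit, in which each non-reference category gets its own linear predictor. The sketch below assumes that form, with hypothetical linear predictor values; the slides do not say which multi-response model they have in mind.

```python
import math

def multinomial_probs(linear_predictors):
    """Category probabilities for a multinomial logit.

    The reference category has a linear predictor fixed at 0;
    `linear_predictors` holds one value per non-reference category.
    Returns probabilities with the reference category first.
    """
    exps = [1.0] + [math.exp(z) for z in linear_predictors]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear predictors for two non-reference categories.
probs = multinomial_probs([0.2, -1.0])
print([round(p, 3) for p in probs])

# With a single non-reference category, this reduces to the binary model.
binary = multinomial_probs([0.699])
print(round(binary[1], 2))
```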

