Modeling with Dichotomous Dependent Variables

Modeling with Dichotomous Dependent Variables: Logistic Regression

A New Type of Model… Dichotomous Dependent Variable: Why did someone vote for Bush or Kerry? Why did residents own or rent their houses? Why do some people drink alcohol while others don't? What determined whether a household owned a car?

Dependent Variable… Is binary, with a yes or no answer. It can be coded 1 for yes and 0 for no; there are no other valid responses.

Problem: OLS regression does not model the relationship well. A linear model can predict values below 0 or above 1, and with a binary outcome its errors are neither normally distributed nor homoskedastic.

Solution: Use a Different Functional Form. The properties we need: the model should be bounded by 0 and 1, and it should estimate the dependent variable in terms of the probability of being in one category or the other, e.g., an owner or a renter, a Bush voter or a Kerry voter.

Solution, cont. We want to know the probability, p, that a particular case falls in the 1 category rather than the 0 category. We want to derive a model that gives good estimates of this probability, or, put another way, of how likely a particular case is to be a 0 or a 1.

Solution: A Logistic Curve

The Logistic Function. The probability that a case is a 0 or a 1 is distributed according to the logistic function.
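The slide's formula is not reproduced in the transcript; for reference, the logistic function can be written as:

```latex
P(Y = 1 \mid x) \;=\; \frac{e^{a + bx}}{1 + e^{a + bx}} \;=\; \frac{1}{1 + e^{-(a + bx)}}
```

This function is bounded by 0 and 1, exactly the property required above.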

Remember probabilities… Probabilities range from 0 to 1. Probability: the frequency of being in one category relative to the total of all categories. Example: the probability that the first card dealt in a card game is the queen of hearts is 1/52 (one in 52). It does us no good to "predict" a value outside the 0-1 range, as a linear regression model can.

But can we manipulate probabilities to estimate the logistic function? Steps: convert probabilities to odds, P/(1-P); convert odds to log odds, or logits.

Manipulating probabilities to estimate the logistic function (LIST V2 V3 V4 V5 /N=13)

Case    P       1-P     P/(1-P)   ln(P/(1-P))
  1    0.010   0.990     0.010      -4.595
  2    0.050   0.950     0.053      -2.944
  3    0.100   0.900     0.111      -2.197
  4    0.200   0.800     0.250      -1.386
  5    0.300   0.700     0.429      -0.847
  6    0.400   0.600     0.667      -0.405
  7    0.500   0.500     1.000       0.000
  8    0.600   0.400     1.500       0.405
  9    0.700   0.300     2.333       0.847
 10    0.800   0.200     4.000       1.386
 11    0.900   0.100     9.000       2.197
 12    0.950   0.050    19.000       2.944
 13    0.990   0.010    99.000       4.595
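A quick sketch in Python (added here, not part of the original slides) reproduces the table and shows that the logit is symmetric around P = 0.5 and unbounded in both directions:

```python
import numpy as np

# Probabilities from the table above
p = np.array([0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50,
              0.60, 0.70, 0.80, 0.90, 0.95, 0.99])

odds = p / (1 - p)       # odds: P / (1 - P)
logit = np.log(odds)     # log odds (the logit)

for case, (pi, o, l) in enumerate(zip(p, odds, logit), start=1):
    print(f"{case:2d}  {pi:5.3f}  {1 - pi:5.3f}  {o:7.3f}  {l:7.3f}")
```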

Logistic Function (figure slides: plots of the logistic curve, not reproduced in the transcript)

Steps… Log odds (logit) = a + bx. Odds = exp(a + bx). Probability is distributed according to the logistic function.
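Written out, the three steps chain together as follows (standard logistic regression identities, added here for reference):

```latex
\ln\!\left(\frac{P}{1-P}\right) = a + bx
\quad\Longrightarrow\quad
\frac{P}{1-P} = e^{a + bx}
\quad\Longrightarrow\quad
P = \frac{e^{a + bx}}{1 + e^{a + bx}}
```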

An Example. Determinants of Homeownership:
Age of the householder
Age of the householder, squared
Building type
Year the house was built
Householder's ethnicity
Occupational status scale

Calculating the Model. The model is fit by maximum likelihood estimation (not OLS). The output gives estimates of the b's, standard errors, t-ratios, and p-values for the coefficients. Each coefficient estimates the impact of its independent variable on the logit of the dependent variable. A sketch of how such a model might be fit appears below.
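A minimal sketch of fitting such a model in Python with statsmodels (not the original analysis; the data frame, column names, and data-generating process below are invented only so the example runs end to end):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in data (the original household data set is not available here)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "duplex": rng.integers(0, 2, n),
    "occscale": rng.integers(1, 6, n),
})
df["age_sq"] = df["age"] ** 2

# Hypothetical data-generating process, used only to create a 0/1 outcome
true_logit = (-6.0 + 0.25 * df["age"] - 0.002 * df["age_sq"]
              - 1.4 * df["duplex"] + 0.2 * df["occscale"])
df["own"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(df[["age", "age_sq", "duplex", "occscale"]])
model = sm.Logit(df["own"], X).fit()    # maximum likelihood estimation, not OLS
print(model.summary())                  # coefficients, SEs, z-ratios, p-values
print(np.exp(model.params))             # exponentiated coefficients = odds ratios
```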

Logistic Regression Model

   Parameter              Estimate    S.E.    t-ratio   p-value
1  CONSTANT                -6.976    1.501    -4.647     0.000
2  AGE                      0.250    0.060     4.132     0.000
3  AGESQ                   -0.002    0.001    -3.400     0.001
4  BLDGTYP2$_cottage        0.036    0.277     0.131     0.895
5  BLDGTYP2$_duplex        -1.432    0.328    -4.363     0.000
6  YEAR                     0.061    0.022     2.757     0.006
7  GERMAN                   0.706    0.264     2.677     0.007
8  POLISH                   0.777    0.422     1.841     0.066
9  OCCSCALE                 0.190    0.091     2.074     0.038

Logistic Regression Model, cont.

   Parameter              Odds Ratio   Upper    Lower
2  AGE                       1.284     1.445    1.140
3  AGESQ                     0.998     0.999    0.997
4  BLDGTYP2$_cottage         1.037     1.784    0.603
5  BLDGTYP2$_duplex          0.239     0.454    0.125
6  YEAR                      1.063     1.109    1.018
7  GERMAN                    2.026     3.398    1.208
8  POLISH                    2.175     4.972    0.951
9  OCCSCALE                  1.209     1.446    1.011

Log Likelihood of constants-only model = LL(0) = -303.864
2*[LL(N)-LL(0)] = 85.180 with 8 df, Chi-sq p-value = 0.000
McFadden's Rho-Squared = 0.140
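Each odds ratio is exp(b) from the coefficient table, and the Upper/Lower bounds correspond to roughly exp(b ± 1.96·SE), assuming they are 95% Wald intervals. A small check (added here, not from the original slides):

```python
import numpy as np

# Estimates and standard errors copied from the coefficient table above
params = {
    "AGE": (0.250, 0.060), "AGESQ": (-0.002, 0.001),
    "GERMAN": (0.706, 0.264), "POLISH": (0.777, 0.422),
}

for name, (b, se) in params.items():
    odds_ratio = np.exp(b)
    lower, upper = np.exp(b - 1.96 * se), np.exp(b + 1.96 * se)
    print(f"{name:8s}  OR={odds_ratio:5.3f}  95% CI=({lower:5.3f}, {upper:5.3f})")
# GERMAN: OR=2.026, CI=(1.207, 3.400) -- matching the table within rounding
```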

Converting Odds Ratios to Probabilities. Odds = P/(1-P); an odds ratio compares the odds for two groups. For Germans, compared with the omitted category (Americans and other ethnicities) and controlling for the other variables, the odds of owning are 2.026 times the odds for the omitted category. Germans are more likely to own houses than Americans. Can we be more specific?
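The 2.026 comes directly from the GERMAN coefficient in the model table: exponentiating the log-odds coefficient gives the odds ratio.

```latex
\text{OR}_{\text{German}}
= \frac{\text{odds of ownership for Germans}}{\text{odds of ownership for the omitted category}}
= e^{0.706} \approx 2.026
```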

Calculating the Probability of a Case
Log odds of homeownership = -6.976 + .250 Age - .002 Age squared + .036 cottage - 1.432 duplex + .061 Year + .706 German + .777 Polish + .190 occscale
Plug in values and solve the equation. Exponentiate the result to get the odds. Convert the odds to a probability for the case.

Calculations
Log odds of homeownership = -6.976 + .250 Age - .002 Age squared + .036 cottage - 1.432 duplex + .061 Year + .706 German + .777 Polish + .190 occscale
For a 40-year-old skilled, American-born worker living in a residence built in 1892 (Year coded as 5 and occupational scale as 3):
Log odds of homeownership = -6.976 + .250*40 - .002*1600 + .061*5 + .190*3 = .699

Calculations, cont.
Log odds = .699. Odds = antilog (exponentiation) of .699 = 2.012.
Odds = P/(1-P) = 2.012. Solve for P: P = 2.012/(1 + 2.012) = .67.

More calculations… How about a 40-year-old German skilled worker in an 1892 residence?
Log odds of homeownership = -6.976 + .250 Age - .002 Age squared + .036 cottage - 1.432 duplex + .061 Year + .706 German + .777 Polish + .190 occscale
Log odds = -6.976 + .250*40 - .002*1600 + .061*5 + .706 + .190*3 = 1.405
Note that .699 + .706 = 1.405: the German worker's log odds are the American worker's log odds plus the GERMAN coefficient. Equivalently, multiplying the American worker's odds by the odds ratio for GERMAN gives the German worker's odds: 2.012 * 2.026 = 4.076 = exp(1.405).

More calculations
Convert the log odds to odds, i.e., take the antilog of 1.405 = 4.076.
Odds = 4.076 = P/(1-P). Solve for P: P = 4.076/(1 + 4.076) = .803.
So the probability of homeownership rises from .67 for the American worker to .803 for the German worker, an increase of about 13 percentage points.

More calculations
For a 30-year-old American worker in a residence built in 1892:
Log odds = -6.976 + .250*30 - .002*900 + .061*5 + .190*3 = -0.401
Odds = antilog of (-.401) = 0.670
Probability of ownership = .670/1.670 = 0.401
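A short Python check of the three worked cases, using the coefficients reported in the model table (a sketch added here, not part of the original slides):

```python
import math

def ownership_probability(age, year, occscale,
                          cottage=0, duplex=0, german=0, polish=0):
    """Convert the fitted log odds of homeownership to a probability."""
    log_odds = (-6.976 + 0.250 * age - 0.002 * age**2 + 0.036 * cottage
                - 1.432 * duplex + 0.061 * year + 0.706 * german
                + 0.777 * polish + 0.190 * occscale)
    odds = math.exp(log_odds)    # antilog of the log odds
    return odds / (1 + odds)     # P = odds / (1 + odds)

print(ownership_probability(age=40, year=5, occscale=3))            # ~0.67
print(ownership_probability(age=40, year=5, occscale=3, german=1))  # ~0.80
print(ownership_probability(age=30, year=5, occscale=3))            # ~0.40
```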

Classification Table
Model Prediction Success Table

                   Predicted Choice
Actual Choice      Response   Reference    Total
Response            281.647     85.353    367.000
Reference            85.353     58.647    144.000
Pred. Tot.          367.000    144.000    511.000
Correct               0.767      0.407
Success Ind.          0.049      0.125
Tot. Correct          0.666

Sensitivity: 0.767   Specificity: 0.407
False Reference: 0.233   False Response: 0.593
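The summary rates are simple ratios of the counts in the table. A sketch (added here, not from the original slides):

```python
# Counts from the prediction success table above
true_response = 281.647      # actual responses predicted as responses
false_reference = 85.353     # actual responses predicted as references
false_response = 85.353      # actual references predicted as responses
true_reference = 58.647      # actual references predicted as references

sensitivity = true_response / (true_response + false_reference)    # 0.767
specificity = true_reference / (false_response + true_reference)   # 0.407
total_correct = (true_response + true_reference) / 511.0           # 0.666

print(round(sensitivity, 3), round(specificity, 3), round(total_correct, 3))
```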

Extending the Logic… Logistic regression can be extended to dependent variables with more than two categories (multinomial response models). Classification tables can be used to examine misclassified cases. Results can be analyzed for patterns across different values of the independent variables.