Binary Logistic Regression: One Dichotomous Independent Variable


Binary Logistic Regression: One Dichotomous Independent Variable Adapted from John Whitehead, Department of Economics, East Carolina University (http://personal.ecu.edu/whiteheadj/data/logit/logit.ppt), and from notes by Kimberly Maier, Michigan State University

Why use logistic regression? There are many important research topics for which the dependent variable is "limited." For example: whether or not a person smokes, drinks, skips class, or takes advanced mathematics. For these, the outcome is not continuous or normally distributed. Example: Are mothers who have a high-school education less likely to have children with IEPs (individualized education plans, indicating cognitive or emotional disabilities)? Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not smoke) or 1 (did smoke).

A Problem with Linear Regression (slides 3-6 from Kim Maier) However, transforming the independent variables does not remedy all of the potential problems. What if we have a non-normally distributed dependent variable? The following example depicts the problem of fitting a regular regression line to a non-normal dependent variable. Suppose you have a binary outcome variable. The problem of having a non-continuous dependent variable becomes apparent when you create a scatterplot of the relationship: it is very difficult to decipher a relationship among these variables.

A Problem with Linear Regression We could severely simplify the plot by drawing a line between the means for the two dependent variable levels, but this is problematic in two ways: (a) the line seems to oversimplify the relationship, and (b) it yields predicted values of Y that cannot be observed for extreme values of X. The reason this doesn't work is that the approach is analogous to fitting a linear model to the probability of the event, and probabilities can only take values between 0 and 1. Hence, we need a different approach to ensure that our model is appropriate for the data.

A Problem with Linear Regression The mean of a binomial variable coded as (1, 0) is a proportion. We could plot conditional probabilities as Y for each level of X. Of course, we could fit a linear model to these conditional probabilities, but (as shown) the linear model does not predict the maximum likelihood estimates for each group (the mean, shown by the circles), and it still produces unobservable predictions for extreme values of the independent variable. This plot gives us a better picture of the relationship between X and Y. It is clear that the relationship is non-linear; in fact, the shape of the curve is sigmoid.

The Linear Probability Model In the OLS regression Y = β0 + β1X + e, where Y ∈ {0, 1}:
- The error terms are heteroskedastic.
- e is not normally distributed, because Y takes on only two values.
- The predicted probabilities can be greater than 1 or less than 0.
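The third problem above is easy to demonstrate numerically. A minimal sketch in Python; the intercept and slope are made-up values for illustration, not estimates from any data set:

```python
# Linear probability model: P(Y=1|x) = b0 + b1*x
# The coefficients below are invented for illustration only.
b0, b1 = 0.1, 0.2

def lpm_predict(x):
    """Predicted 'probability' from the linear probability model."""
    return b0 + b1 * x

print(lpm_predict(2))   # a legal probability (0.5)
print(lpm_predict(6))   # greater than 1 -- not a valid probability
print(lpm_predict(-3))  # less than 0 -- not a valid probability
```

Because the fitted line is unbounded, sufficiently extreme values of X always push the prediction outside [0, 1].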

A Problem with Linear Regression If you think about the shape of this distribution, you may posit that the function is a cumulative probability distribution. As stated previously, we can model the nonlinear relationship between X and Y by transforming one of the variables. Two common transformations that result in sigmoid functions are the probit and logit transformations. In short, a probit transformation imposes a cumulative normal function on the data. But probit functions are difficult to work with because they require integration. Logit transformations, on the other hand, give values nearly identical to those of a probit function, but they are much easier to work with because the function can be simplified to a linear equation.

The Logistic Regression Model The "logit" model solves these problems: ln[p/(1-p)] = β0 + β1X
- p is the probability that the event Y occurs, p(Y=1) [range: 0 to 1]
- p/(1-p) is the "odds" [range: 0 to ∞]
- ln[p/(1-p)] is the log odds, or "logit" [range: -∞ to +∞]
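The three quantities above can be checked numerically; a minimal sketch (function names are mine):

```python
import math

def odds(p):
    """Odds of an event with probability p: p/(1-p)."""
    return p / (1 - p)

def logit(p):
    """Log odds (logit) of probability p."""
    return math.log(odds(p))

print(odds(0.5))    # 1.0 -- a 50% chance means even odds
print(logit(0.5))   # 0.0 -- the logit is zero at p = .5
print(logit(0.75))  # 1.0986... = ln(3)
```

As p moves toward 0 or 1, the logit runs off toward -∞ or +∞, which is exactly why it can be modeled with an unbounded linear predictor.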

Odds & Odds Ratios Recall the definition of the odds: odds = p/(1-p). The odds range from 0 to ∞, with values greater than 1 associated with an event being more likely to occur than not, and values less than 1 associated with an event that is less likely to occur than not. The logit is defined as the log of the odds: logit = ln[p/(1-p)]. This transformation is useful because it creates a variable with a range from -∞ to +∞. Hence, it solves the problem we encountered in fitting a linear model to probabilities: because probabilities (the dependent variable) range only from 0 to 1, a linear model can produce predictions outside of this range. If we transform our probabilities to logits, we do not have this problem, because the range of the logit is unrestricted. In addition, the interpretation of logits is simple: take the exponential of the logit and you have the odds for the two groups in question.

Interpretation of the Ogive The logistic distribution constrains the estimated probabilities to lie between 0 and 1. The estimated probability is: p = 1/[1 + e^-(β0 + β1X)]
- If β0 + β1X = 0, then p = .50.
- As β0 + β1X gets very large, p approaches 1.
- As β0 + β1X gets very small (large and negative), p approaches 0.

Introducing the Odds for the Logistic Transformation If there is a 75% chance that it will rain tomorrow, then 3 out of 4 times we say this, it will rain: for every three times it rains, once it will not. The odds of it raining tomorrow are 3 to 1. This can also be computed as (3/4)/(1/4) = 3/1. If the odds that my pony will win the race are 1 to 3, that means for every 4 races it runs, it will win 1 and lose 3; therefore I should be paid $3 for every dollar I bet.

Example: Interpretation of the coefficient β1. Recall that p/(1-p) = odds. Odds of an IEP for children of HS-educated mothers: (33/623)/(590/623) = 33/590 ≈ .056 (roughly 5%/95% = .05/.95). Odds of an IEP for children of mothers without HS: (45/553)/(508/553) = 45/508 ≈ .089 (roughly 8%/92% = .08/.92). Change in odds due to HS: .056/.089 = .63. The odds that the child of a mother with a high-school education has an IEP are .63 times those of other mothers; the odds are lower because these children are less likely to have an IEP. The logistic regression coefficient is ln(.63) = -.46. In general, the change in odds is e^(β0+β1)/e^(β0) = e^(β1), and here e^-.46 = .63.
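The arithmetic on this slide can be reproduced directly from the cell counts (33 of 623 children of HS-educated mothers have an IEP; 45 of 553 children of other mothers do):

```python
import math

# Cell counts from the ECLS-K example on this slide.
odds_hs    = 33 / 590   # 33 children with an IEP vs. 590 without (HS mothers)
odds_no_hs = 45 / 508   # 45 with an IEP vs. 508 without (non-HS mothers)

odds_ratio = odds_hs / odds_no_hs   # change in odds due to HS
b1 = math.log(odds_ratio)           # the logistic regression coefficient

print(round(odds_hs, 3))     # 0.056
print(round(odds_no_hs, 3))  # 0.089
print(round(odds_ratio, 2))  # 0.63
print(round(b1, 2))          # -0.46
```

With a single dichotomous predictor, the coefficient really is just the log of the odds ratio from the 2x2 table, which is what makes this example a useful sanity check.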

Running logistic regression in SPSS

Running logistic regression in SPSS for whether or not a child has an IEP, in ECLS-K. The model ln[p/(1-p)] = β0 + β1X is estimated as ln[p/(1-p)] = -2.424 - .46X. Change in odds = e^(β0+β1)/e^(β0) = e^(β1) = e^-.46 = .63.
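Plugging the fitted coefficients from this output back through the inverse logit recovers the predicted probability for each group:

```python
import math

# Fitted model from the SPSS run: ln[p/(1-p)] = -2.424 - 0.46*X
b0, b1 = -2.424, -0.46

def p_iep(x):
    """Predicted probability of an IEP; x = 1 if the mother finished high school, else 0."""
    z = b0 + b1 * x
    return 1 / (1 + math.exp(-z))

print(round(p_iep(0), 3))      # 0.081 -- mothers without a HS education
print(round(p_iep(1), 3))      # 0.053 -- mothers with a HS education
print(round(math.exp(b1), 2))  # 0.63 -- the change in odds, e^b1
```

These predicted probabilities match the observed group proportions (45/553 ≈ .081 and 33/623 ≈ .053), as they must in a saturated model with one dichotomous predictor.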

Hypothesis Testing The Wald statistic for the β coefficient is: Wald = [β / s.e.(β)]², which is distributed chi-square with 1 degree of freedom.
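For instance, if SPSS reported b = -0.46 with a standard error of 0.17 (the standard error here is an assumed value for illustration, not taken from the actual output), the Wald test would go like this:

```python
# Wald test for a single logistic regression coefficient.
b, se = -0.46, 0.17   # se is assumed for illustration only
wald = (b / se) ** 2
print(round(wald, 2))  # 7.32

# The chi-square(1) critical value at alpha = .05 is 3.84.
print(wald > 3.84)     # True -- reject H0: beta1 = 0
```

Equivalently, b/se is a z statistic, and squaring a standard normal variable yields a chi-square variable with 1 degree of freedom.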

Running logistic in SPSS for child has IEP or not in ECLS-K

Logistic Regression Reflection What part is most confusing to you? What are the possible interpretations of the part that is confusing? Find a partner or two and share your questions.

References
Source slides: http://personal.ecu.edu/whiteheadj/data/logit/ and http://personal.ecu.edu/whiteheadj/data/logit/logit.ppt
Video for running logistic regression in SPSS: http://www.youtube.com/watch?v=ICN6CMDxHwg&noredirect=1
Further PowerPoints: http://www.google.com/search?q=logistic+regression+ppt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
With SAS:
http://www.math.yorku.ca/SCS/Courses/grcat/grc6.html
http://www.ats.ucla.edu/stat/sas/seminars/sas_logistic/logistic1.htm
http://www.pauldickman.com/teaching/sas/sas_logistic_seminar8.pdf
For Poisson: http://www.uwm.edu/IMT/Computing/sasdoc8/sashtml/insight/chap17/sect1.htm
In Stata: http://psg_mac43.ucsf.edu/ticr/syllabus/courses/38/2004/11/02/Lecture/notes/Session%204%20lecture%20slides.ppt