Logistic Regression and Odds Ratios Psych 818 - DeShon.

Dichotomous Response
- Used when the outcome or DV is a dichotomous random variable: it can take only one of two possible values (1, 0)
- Pass/Fail, Disease/No Disease, Agree/Disagree, True/False, Present/Absent
- This data structure causes problems for OLS regression

Dichotomous Response
Properties of a dichotomous response variable (Y):
- Positive response (success = 1) occurs with probability p
- Negative response (failure = 0) occurs with probability q = (1 - p)
- The observed proportion of successes estimates p
- Var(Y) = p*q
- Oops! The variance depends on the mean

Dichotomous Response
Let's generate some (0,1) data in R:

Y <- rbinom(n = 1000, size = 1, prob = 0.3)
mean(Y)   # close to mu = 0.3
var(Y)    # close to sigma^2 = 0.3 * 0.7 = 0.21
hist(Y)

Describing Dichotomous Data
- Proportion of successes (p)
- Odds: the odds of an event is the probability that it occurs divided by the probability that it does not occur, p/(1-p)
- If p = .53, odds = .53/.47 = 1.13
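The probability-to-odds conversion on this slide can be sketched in Python (the helper name is ours, for illustration):

```python
def odds(p):
    """Odds of an event: probability it occurs divided by probability it does not."""
    return p / (1 - p)

print(round(odds(0.53), 2))  # 0.53 / 0.47 ≈ 1.13
```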

Modeling Y (Categorical X)
Odds Ratio: used to compare proportions across two groups
- Odds for males = .53/(1-.53) = 1.13
- Odds for females = .62/(1-.62) = 1.63
- Odds-ratio = 1.63/1.13 = 1.44: a female is 1.44 times more likely than a male to get a 1
- Or... 1.13/1.63 = 0.69: a male is .69 times as likely as a female to get a 1
- OR > 1: increased odds for group 1 relative to group 2
- OR = 1: no difference in odds between the groups
- OR < 1: lower odds for group 1 relative to group 2
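The male/female comparison can be checked directly, using the slide's proportions of .53 (males) and .62 (females):

```python
p_male, p_female = 0.53, 0.62

odds_male = p_male / (1 - p_male)        # .53/.47 ≈ 1.13
odds_female = p_female / (1 - p_female)  # .62/.38 ≈ 1.63

print(round(odds_female / odds_male, 2))  # OR for females vs. males
print(round(odds_male / odds_female, 2))  # OR for males vs. females
```

At full precision the first ratio is about 1.45; the slide's 1.44 comes from dividing the already-rounded odds, 1.63/1.13.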

Modeling Y (Categorical X)
Odds-ratio for a 2 x 2 table:

Heart Disease          Y    N
Cholest. in Diet Hi   11    4
Cholest. in Diet Lo    2    6

- Odds(Hi) = 11/4
- Odds(Lo) = 2/6
- O.R. = (11/4)/(2/6) = 8.25
- The odds of heart disease are 8.25 times larger for the high-cholesterol diet
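The 2 x 2 calculation can be verified in a few lines, assuming row counts of 11/4 (Hi) and 2/6 (Lo), the counts that reproduce the slide's odds-ratio of 8.25:

```python
# Rows: high / low dietary cholesterol; columns: heart disease yes / no
hi_yes, hi_no = 11, 4
lo_yes, lo_no = 2, 6

odds_hi = hi_yes / hi_no   # 2.75
odds_lo = lo_yes / lo_no   # ≈ 0.33
print(round(odds_hi / odds_lo, 2))  # 8.25
```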

Odds-Ratio
- Ranges from 0 to infinity: 0 … 1 … ∞
- Tends to be skewed, so it is often transformed to the log-odds to get symmetry
- The log-OR comparing females to males = log(1.44) = 0.36
- The log-OR comparing males to females = log(0.69) = -0.36
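The symmetry the slide describes is easy to confirm: the log of an odds-ratio and the log of its reciprocal differ only in sign.

```python
import math

print(round(math.log(1.44), 2))      # log-OR, females vs. males:  0.36
print(round(math.log(1 / 1.44), 2))  # log-OR, males vs. females: -0.36
```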

Modeling Y (Continuous X)
- We need to form a general prediction model
- Standard OLS regression won't work: the errors of a dichotomous variable cannot be normally distributed with constant variance
- Also, the estimated parameters don't make much sense
- Let's look at a scatterplot of dichotomous data...

Dichotomous Scatterplot
What smooth function can we use to model something that looks like this?

Dichotomous Scatterplot
OLS regression? Smooth, but...

Dichotomous Scatterplot
Could break X into groups to form a more continuous (proportion or percentage) scale for Y

Dichotomous Scatterplot
Now, plot the categorized data. Notice the "S" shape (a sigmoid)? Notice that we just shifted to a continuous scale?

Dichotomous Scatterplot
We can fit a smooth function by modeling the probability of success ("1") directly: model the probability of a '1' rather than the (0,1) data themselves
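The smooth S-shaped function being fit is the logistic curve; a minimal sketch (the α and β values here are made up for illustration, not estimated from the slides' data):

```python
import math

def logistic(x, alpha, beta):
    """P(Y = 1 | x) under the logistic model: 1 / (1 + exp(-(alpha + beta*x)))."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

alpha, beta = -5.0, 0.1  # hypothetical parameters
for x in (20, 50, 80):
    print(x, round(logistic(x, alpha, beta), 3))
```

The predicted probabilities rise smoothly from near 0 to near 1, passing through .5 where alpha + beta*x = 0.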

Another Example

Another Example (cont)

Logistic Equation
- E(y|x) = π(x) = the probability that a person with a given x-score will have a score of '1' on Y
- π(x) = exp(u) / [1 + exp(u)], where u = α + βx
- Could just expand u to include more predictors for a multiple logistic regression

Logistic Regression
- α shifts the distribution (together with β it sets the value of x where π = .5, namely x = -α/β)
- β reflects the steepness of the transition (the slope)

Features of Logistic Regression
- The change in probability is not constant (linear) with constant changes in X
- The probability of a success (Y = 1) given the predictor (X) is a non-linear function of X
- Can rewrite the logistic equation as an odds: π(x)/[1 - π(x)] = exp(α + βx)

Logit Transform
- Can linearize the logistic equation by using the "logit" transformation: apply the natural log to both sides of the equation
- Yields the logit, or log-odds: ln{π(x)/[1 - π(x)]} = α + βx

Logit Transformation
The logit transformation puts the interpretation of the regression estimates back on familiar footing:
- α = the expected value of the logit (log-odds) when X = 0
- β = the 'logit difference': the amount the logit (log-odds) changes with a one-unit change in X

Logit
- The logit is the natural log of the odds, often called the log odds
- The logit scale is continuous, linear, and functions much like a z-score scale
- p = 0.50 → logit = 0
- p = 0.70 → logit = 0.84
- p = 0.30 → logit = -0.84
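The logit values on this slide can be reproduced directly; note the z-score-like symmetry, logit(p) = -logit(1 - p):

```python
import math

def logit(p):
    """The logit: natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1 - p))

for p in (0.50, 0.70, 0.30):
    print(p, round(logit(p), 3))
```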

Odds-Ratios and Logistic Regression
- The slope β may also be interpreted as the log odds-ratio associated with a unit increase in x: exp(β) = odds-ratio
- It compares the log odds (logit) of a person with a score of x to a person with a score of x + 1

There and back again...
- If the data are consistent with a logistic function, then the relationship between the model and the logit is linear
- The logit scale is somewhat difficult to understand
- Could interpret as odds, but people seem to prefer probability as the natural scale, so...

There and back again...
Logit → Odds → Probability
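The back-transformation chain from logit to odds to probability can be sketched as:

```python
import math

def logit_to_prob(logit):
    """Logit -> odds -> probability."""
    odds = math.exp(logit)    # undo the natural log
    return odds / (1 + odds)  # convert odds back to a probability

print(logit_to_prob(0.0))              # logit 0 -> p = 0.5
print(round(logit_to_prob(0.847), 2))  # logit 0.847 -> p ≈ 0.70
```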

Estimation
- The data don't meet the OLS assumptions, so some variant of maximum likelihood estimation (MLE) is used
- Let's develop the likelihood
- Assuming observations are independent...

Estimation
- Likelihood: recall that a single Bernoulli observation has P(Y = y) = π^y (1 - π)^(1-y)
- With independent observations, L(α, β) = ∏ᵢ π(xᵢ)^yᵢ [1 - π(xᵢ)]^(1-yᵢ)

Estimation
Upon substitution of π(xᵢ) = exp(α + βxᵢ)/[1 + exp(α + βxᵢ)], the likelihood becomes a function of α and β alone...
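A sketch of the Bernoulli log-likelihood that MLE maximizes (the tiny data set is made up for illustration):

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """Sum of y*ln(pi) + (1 - y)*ln(1 - pi) over independent observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        pi = 1 / (1 + math.exp(-(alpha + beta * x)))  # logistic model for pi(x)
        total += y * math.log(pi) + (1 - y) * math.log(1 - pi)
    return total

xs = [1, 2, 3, 4]  # hypothetical predictor values
ys = [0, 0, 1, 1]  # hypothetical (0,1) outcomes
print(round(log_likelihood(0.0, 0.0, xs, ys), 3))  # 4 * ln(0.5) ≈ -2.773
```

An optimizer (Newton-Raphson or Fisher scoring, as in R's glm) searches for the α and β that maximize this quantity.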

Example
Heart Disease & Age
- 100 participants
- DV = presence of heart disease
- IV = age

Heart Disease Example

library(MASS)
glm(formula = y ~ x, family = binomial, data = mydata)

Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) e-06 ***
age e-06 ***

Null deviance: on 99 degrees of freedom
Residual deviance: on 98 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 4

Heart Disease Example
- Logistic regression odds-ratio: exp(.111) = 1.117
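The odds-ratio interpretation of the fitted age slope, using the .111 estimate shown on the slide:

```python
import math

beta_age = 0.111                 # slope for age from the slide's R output
odds_ratio = math.exp(beta_age)  # odds multiplier per additional year of age
print(round(odds_ratio, 3))      # 1.117
```

That is, each additional year of age multiplies the odds of heart disease by about 1.117, roughly 12% higher odds per year.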

Heart Disease Example
In terms of logits...