
1 Logistic Regression and Odds Ratios Psych 818 - DeShon

2 Dichotomous Response Used when the outcome or DV is a dichotomous random variable. It can take only one of two possible values (1, 0): Pass/Fail, Disease/No Disease, Agree/Disagree, True/False, Present/Absent. This data structure causes problems for OLS regression.

3 Dichotomous Response Properties of a dichotomous response variable (Y): a positive response (success = 1) occurs with probability p; a negative response (failure = 0) occurs with probability q = (1 - p); the observed proportion of successes estimates p. Var(Y) = p*q. Oops! The variance depends on the mean.

4 Dichotomous Response Let's generate some (0,1) data: Y <- rbinom(n=1000, size=1, prob=.3). mean(Y) = 0.295 (μ = .3); var(Y) = 0.208 (σ² = .3 * .7 = .21); hist(Y).
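
A minimal R sketch of this simulation (the seed is an assumption added for reproducibility, not part of the slide):

set.seed(818)                                 # assumed seed, for repeatable numbers
Y <- rbinom(n = 1000, size = 1, prob = 0.3)   # 1000 Bernoulli(0.3) draws
mean(Y)   # sample proportion of successes, close to 0.3
var(Y)    # close to p*q = 0.3 * 0.7 = 0.21
hist(Y)   # two bars: counts of the 0s and the 1s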

5 Describing Dichotomous Data Proportion of successes (p). Odds: the odds of an event is the probability it occurs divided by the probability it does not occur, p/(1-p). If p = .53, odds = .53/.47 = 1.13.

6 Modeling Y (Categorical X) Odds Ratio: used to compare two proportions across groups. Odds for males = .53/(1-.53) = 1.13; odds for females = .62/(1-.62) = 1.63. Odds-ratio = 1.63/1.13 = 1.44: the odds of a 1 for a female are 1.44 times the odds for a male. Or… 1.13/1.63 = 0.69: a male's odds of a 1 are .69 times a female's. OR > 1: increased odds for group 1 relative to group 2; OR = 1: no difference in odds; OR < 1: lower odds for group 1 relative to group 2.
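
A small R sketch of the computation above, taking the stated proportions as given:

p_male   <- 0.53
p_female <- 0.62
odds_male   <- p_male / (1 - p_male)       # ~1.13
odds_female <- p_female / (1 - p_female)   # ~1.63
odds_female / odds_male                    # ~1.45 (1.44 when computed from the rounded odds)
odds_male / odds_female                    # ~0.69, the male-vs-female direction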

7 Modeling Y (Categorical X) Odds-ratio for a 2 x 2 table: Odds(Hi) = 11/4 = 2.75; Odds(Lo) = 2/6 = 0.33; O.R. = (11/4)/(2/6) = 8.25. The odds of heart disease are 8.25 times larger for the high-cholesterol group.

                       Heart Disease
Cholesterol in diet      Y     N   Total
  Hi                    11     4      15
  Lo                     2     6       8
  Total                 13    10      23
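
The same odds-ratio can be computed from the table in R; the matrix below simply re-enters the four cell counts:

tab <- matrix(c(11, 4, 2, 6), nrow = 2, byrow = TRUE,
              dimnames = list(Cholesterol = c("Hi", "Lo"), HeartDisease = c("Y", "N")))
odds_hi <- tab["Hi", "Y"] / tab["Hi", "N"]   # 11/4 = 2.75
odds_lo <- tab["Lo", "Y"] / tab["Lo", "N"]   # 2/6  = 0.33
odds_hi / odds_lo                            # odds ratio = 8.25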

8 Odds-Ratio Ranges from 0 to infinity: 0 … 1 … ∞. Tends to be skewed, so we often transform to log-odds to get symmetry. The log-OR comparing females to males = log(1.44) = 0.36; the log-OR comparing males to females = log(0.69) = -0.36.

9 Modeling Y (Continuous X) We need to form a general prediction model. Standard OLS regression won't work: the errors of a dichotomous variable cannot be normally distributed with constant variance, and the estimated parameters don't make much sense. Let's look at a scatterplot of dichotomous data…

10 Dichotomous Scatterplot What smooth function can we use to model something that looks like this?

11 Dichotomous Scatterplot OLS regression? Smooth, but…

12 Dichotomous Scatterplot Could break X into groups to form a more continuous scale for Y (a proportion or percentage scale).

13 Dichotomous Scatterplot Now, plot the categorized data. Notice the "S" shape? That's a sigmoid. Notice that we just shifted to a continuous scale?

14 Dichotomous Scatterplot We can fit a smooth function by modeling the probability of a success ('1') directly, rather than modeling the (0,1) data themselves.

15 Another Example

16 Another Example (cont)

17 Logistic Equation E(y|x) = π(x) = exp(α + βx) / (1 + exp(α + βx)) = the probability that a person with a given x-score will have a score of '1' on Y. The linear predictor u = α + βx could simply be expanded to include more predictors for a multiple logistic regression.
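
A minimal R sketch of this equation; logistic_p is a hypothetical helper name, and the alpha/beta values are borrowed from the heart-disease fit later in the deck purely as an illustration:

logistic_p <- function(x, alpha, beta) {
  u <- alpha + beta * x     # linear predictor
  exp(u) / (1 + exp(u))     # pi(x) = P(Y = 1 | x); identical to plogis(u)
}
logistic_p(x = 50, alpha = -5.31, beta = 0.111)   # ~0.56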

18 Logistic Regression α shifts the curve along x (it moves the point where π = .5); β reflects the steepness of the transition (the slope).

19 Features of Logistic Regression The change in probability is not constant (linear) for constant changes in X: the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function of X. The logistic equation can be rewritten as an odds.
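
A quick demonstration of the non-constant change, using R's built-in logistic CDF plogis() and illustrative coefficients:

x <- seq(20, 70, by = 10)
p <- plogis(-5.31 + 0.111 * x)   # P(Y = 1 | x)
round(diff(p), 3)                # equal 10-unit steps in x give unequal jumps in probability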

20 Logit Transform We can linearize the logistic equation by using the "logit" transformation: apply the natural log to the odds on both sides of the equation. This yields the logit, or log-odds: ln[π(x) / (1 − π(x))] = α + βx.
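
The linearization can be checked numerically with qlogis(), the inverse of plogis() (coefficients again illustrative):

x <- seq(20, 70, by = 10)
p <- plogis(-5.31 + 0.111 * x)
qlogis(p)         # exactly -5.31 + 0.111 * x: a straight line in x
diff(qlogis(p))   # constant step of 0.111 * 10 = 1.11 per 10-unit change in x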

21 Logit Transformation The logit transformation puts the interpretation of the regression estimates back on familiar footing: α = the expected value of the logit (log-odds) when X = 0; β = the 'logit difference' = the amount the logit (log-odds) changes with a one-unit change in X.

22 Logit The logit is the natural log of the odds, often called the log odds. The logit scale is continuous, linear, and functions much like a z-score scale: p = 0.50 → logit = 0; p = 0.70 → logit = 0.84; p = 0.30 → logit = -0.84.
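
These values can be verified with R's logit and inverse-logit functions:

qlogis(c(0.50, 0.70, 0.30))   #  0.000  0.847 -0.847
plogis(0.847)                 # back to ~0.70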

23 Odds-Ratios and Logistic Regression The slope may also be interpreted as the log odds-ratio associated with a unit increase in x: exp(β) = odds-ratio. Compare the log odds (logit) of a person with a score of x to a person with a score of x + 1.
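
A short sketch of the unit-increase interpretation (the coefficients are illustrative, not part of this slide):

a <- -5.31; b <- 0.111
x <- 50
odds_x  <- exp(a + b * x)         # odds at x
odds_x1 <- exp(a + b * (x + 1))   # odds at x + 1
odds_x1 / odds_x                  # equals exp(b) = 1.117, regardless of x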

24 There and back again… If the data are consistent with a logistic function, then the relationship between the model and the logit is linear. The logit scale is somewhat difficult to understand; we could interpret it as odds, but people seem to prefer probability as the natural scale, so…

25 There and back again… Logit → Odds → Probability
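
A minimal sketch of the round trip, starting from a logit of 0.84:

logit <- 0.84
odds  <- exp(logit)          # odds = e^logit, ~2.32
p     <- odds / (1 + odds)   # probability = odds / (1 + odds), ~0.70
qlogis(p)                    # and back to the logit, ~0.84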

26 Estimation The data don't meet the OLS assumptions, so some variant of maximum likelihood estimation (MLE) is used. Let's develop the likelihood, assuming observations are independent…

27 Estimation Likelihood (recall…)

28 Estimation Upon substitution…
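
A hedged sketch of the likelihood these slides build: the Bernoulli negative log-likelihood, minimized with optim() on simulated data (the data and starting values are assumptions, not the deck's example):

negloglik <- function(par, x, y) {
  p <- plogis(par[1] + par[2] * x)          # pi(x_i) for each observation
  -sum(y * log(p) + (1 - y) * log(1 - p))   # -log L(alpha, beta)
}
set.seed(1)
x <- runif(100, 20, 70)
y <- rbinom(100, size = 1, prob = plogis(-5 + 0.1 * x))
optim(c(0, 0), negloglik, x = x, y = y)$par   # close to what glm() would return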

29 Example Heart Disease & Age: 100 participants; DV = presence of heart disease; IV = age.

30 Heart Disease Example

31
library(MASS)
glm(formula = y ~ x, family = binomial, data = mydata)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
age          0.11092    0.02406   4.610 4.02e-06 ***
Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 107.35 on 98 degrees of freedom
AIC: 111.35
Number of Fisher Scoring iterations: 4
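
A sketch of how this fit might be followed up; y, x, and mydata are the slide's placeholder names, so the exact data frame is assumed:

fit <- glm(y ~ x, family = binomial, data = mydata)
summary(fit)       # coefficients, deviances, and AIC as printed above
exp(coef(fit))     # odds ratios; exp(0.11092) = 1.117 for the age slope
exp(confint(fit))  # profile-likelihood intervals on the odds-ratio scale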

32 Heart Disease Example Logistic regression. Odds-Ratio: exp(.111) = 1.117, so each additional year of age multiplies the odds of heart disease by about 1.12.

33 Heart Disease Example In terms of logits…
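
A self-contained sketch that converts the fitted logit to a probability for an illustrative age of 50 (the age is an assumption, not from the slide):

logit_50 <- -5.30945 + 0.11092 * 50   # predicted log-odds at age 50
plogis(logit_50)                      # predicted probability of heart disease, ~0.56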

