Lecture 18 Ordinal and Polytomous Logistic Regression BMTRY 701 Biostatistical Methods II.



Categorical Outcomes
- Logistic regression is appropriate for binary outcomes.
- What about other kinds of categorical data?
  - more than 2 categories
  - ordinal data
- Standard logistic regression is not applicable unless you ‘threshold’ the data or collapse categories.
- BMTRY 711: Analysis of Categorical Data
- This is just an overview.

Ordinal Logistic Regression
- Ordinal dependent variable
  - Teaching experience
  - SES (high, middle, low)
  - Degree of agreement
  - Ability level (e.g. literacy, reading)
  - Severity of disease/outcome
  - Severity of toxicity
- Context is important
- Example: attitudes towards smoking

Proportional Odds Model
- One of several possible regression models for the analysis of ordinal data, and also the most common.
- Model predicts the ln(odds) of being in category j or beyond.
- Simplifying assumption: “proportional odds”
  - Effect of covariate assumed to be invariant across splits
  - Example: 4 categories
    - 0 vs 1,2,3
    - 0,1 vs 2,3
    - 0,1,2 vs 3
  - Assumes that each of these comparisons yields the same odds ratio
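The splits listed on this slide can be generated mechanically for any number of ordered categories. The following is only an illustrative sketch; the function name `cumulative_splits` is hypothetical and not part of the lecture.

```python
def cumulative_splits(categories):
    """Return every (lower, upper) split of an ordered list of categories."""
    # A split after position k puts the first k categories on the left
    # and the remaining ones on the right.
    return [(categories[:k], categories[k:]) for k in range(1, len(categories))]


splits = cumulative_splits([0, 1, 2, 3])
for lower, upper in splits:
    print(lower, "vs", upper)
```

With K categories there are K - 1 splits, and the proportional odds model assumes one common odds ratio across all of them.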

Motivating Example: YTS
- The South Carolina Youth Tobacco Survey (SC YTS) is part of the National Youth Tobacco Survey program sponsored by the Centers for Disease Control and Prevention.
- The YTS is an annual school-based survey designed to evaluate youth-related smoking practices, including initiation and prevalence, cessation, attitudes towards smoking, media influences, and more.
- The SC YTS is coordinated by the SC Department of Health and Environmental Control and has been administered yearly since [year not preserved]. Data for this report are based on years [years not preserved].
- The SC YTS uses a two-stage sample cluster design to select a representative sample of public middle (grades 6-8) and high school (grades 9-12) students.

Ordinal Outcomes
. tab cr44

“do you think smoking cigarettes makes young people look cool or fit in?”

                 |  Freq.   Percent   Cum.
  definitely yes |
    probably yes |
    probably not |  1,
  definitely not |  4,
           Total |  7,

“do you think young people risk harming themselves if they smoke from ... ciga...

                 |  Freq.   Percent   Cum.
  definitely yes |  5,
    probably yes |  1,
    probably not |
  definitely not |
           Total |  7,

[Most cell values are truncated or not preserved in the transcript.]

What factors are related to these attitudes?
- Gender?
- Grade?
- Race?
- Parental education (surrogate for SES)?
- Year? (2005, 2006, 2007)
- Have tried cigarettes?
- School performance?
- Smoker in the home?

Tabulation of gender vs. look cool

“do you think smoking cigarettes makes young people look cool or fit in?”

                 |       gender
                 |      0        1  |  Total
  definitely yes |                  |    455
    probably yes |                  |    810
    probably not |                  |  1,320
  definitely not |  2,158    2,797  |  4,955
           Total |  3,574    3,966  |  7,540

[Cell counts for the first three rows are not preserved in the transcript; the "definitely not" row total is the sum of its two cells.]

Possible “breaks” (each a 2x2 table of male vs. female)
- def yes vs. else: OR = 1.81
- yes vs. no: OR = 1.59
- else vs. def no: OR = 1.57
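The third break (else vs. def no) can be checked by hand, because the gender tabulation still shows the "definitely not" counts and the column totals. The sketch below assumes that gender code 0 is male and code 1 is female, which the transcript does not state explicitly; the variable names are illustrative.

```python
# Counts taken from the gender-by-"look cool" tabulation.
# Assumption: gender 0 = male, gender 1 = female.
definitely_not = {0: 2158, 1: 2797}
column_total = {0: 3574, 1: 3966}


def odds_else_vs_def_not(g):
    """Odds of giving any answer other than 'definitely not' for gender code g."""
    other = column_total[g] - definitely_not[g]
    return other / definitely_not[g]


or_else_vs_def_not = odds_else_vs_def_not(0) / odds_else_vs_def_not(1)
print(round(or_else_vs_def_not, 2))  # 1.57
```

The result agrees with the OR of 1.57 reported for this break. The other two breaks cannot be recomputed here because their cell counts are missing.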

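Proportional Odds Assumption
- How to implement this?
- Model the ‘cumulative’ logits.
- Instead of the binary logit, $\ln\frac{P(Y = 1)}{P(Y = 0)}$
- Here, we have the logit of being in category j or beyond, $\ln\frac{P(Y \ge j)}{P(Y < j)}$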

The (simple) ordinal logistic model
- Warning! Different packages parameterize it in different ways!
- Stata codes it differently than SAS and R.
- Notice how this differs from logistic regression: there is a ‘level’ specific intercept.
- But, there is just ONE log odds ratio describing the association between x and y.
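One common way to write the model for a single covariate x and K ordered categories is sketched below. The symbols $\kappa_j$, $\alpha_j$ and $\beta$ are notation chosen here, not necessarily the slide's. The first line is the form used by Stata's `ologit` (cutpoints minus the linear predictor); the second is the form used by `lrm` in R (intercepts labelled `y>=j`).

```latex
\begin{align*}
\text{Stata (ologit):}\quad
  & \operatorname{logit} P(Y \le j \mid x) = \kappa_j - \beta x,
  && j = 1, \dots, K-1 \\
\text{R (lrm):}\quad
  & \operatorname{logit} P(Y \ge j \mid x) = \alpha_j + \beta x,
  && j = 2, \dots, K
\end{align*}
```

Because $P(Y \ge j) = 1 - P(Y \le j-1)$, the two forms describe the same model with $\alpha_j = -\kappa_{j-1}$ and the same $\beta$. Each level has its own intercept, but there is only one slope.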

Example
. ologit lookcool gender

Ordered logistic regression
Number of obs = 7540
Coefficient table rows: gender, /cut1, /cut2, /cut3
[Iteration log, LR chi2(1), Prob > chi2, log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]

R estimation
- Different parameterization
- Makes you think about what the model is doing!

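> library(Design)   # Design has since been superseded by the rms package, which also provides lrm()
> oreg <- lrm(lookcool ~ gender, data=data)
> oreg

Logistic Regression Model
lrm(formula = lookcool ~ gender, data = data)

Output sections: Frequencies of Responses; Frequencies of Missing Values Due to Each Variable (lookcool, gender); model statistics (Obs, Max Deriv, Model L.R., d.f., P, C, Dxy, Gamma, Tau-a, R2, Brier); coefficient table (Coef, S.E., Wald Z, P) with three y>= intercept rows and gender.
[Numeric values not preserved in the transcript.]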

MLR
. ologit lookcool gender evertried smokerhome grade school_perf

Ordered logistic regression
Number of obs = 2125
Coefficient table rows: gender, evertried, smokerhome, grade, school_per~e, /cut1, /cut2, /cut3
[Iteration log, LR chi2(5), Prob > chi2, log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]

It is a pretty strong assumption
- How can we check?
- Simple check as shown in 2x2 table.
- Continuous variables: harder
  - need to consider the model
  - no direct ‘tabular’ comparison
- Multiple regression: does it hold for all?
- Tricky! It needs to make sense and you need to do some ‘model checking’ for all of your variables.
- Worthwhile to check each individually.

There is another approach
- There is a test of proportionality.
- Implemented easily in Stata with an add-on package: omodel
  - Ho: proportionality holds
  - Ha: proportionality is violated
- Why? Violation would require more parameters and would be a larger model.
- What does a small p-value imply?
  - But be careful of sample size!
  - Large sample sizes will make it hard to ‘adhere’ to the proportionality assumption.

Estimation in Stata
. omodel logit lookcool gender

Ordered logit estimates
Number of obs = 7540
Coefficient table rows: gender, _cut1, _cut2, _cut3 (ancillary parameters)
Approximate likelihood-ratio test of proportionality of odds across response categories: chi2(2) = 2.43, Prob > chi2 = [not preserved]
[Iteration log, LR chi2(1), log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]
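The p-value for the proportionality test is missing from the output above, but it follows from the reported statistic. A chi-square variable with 2 degrees of freedom has the closed-form upper-tail probability exp(-x/2), so it can be sketched with the standard library alone. The function name is illustrative. The 2 degrees of freedom arise because relaxing proportionality for gender replaces one slope with three (one per split).

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


p_value = chi2_sf_2df(2.43)
print(round(p_value, 3))  # 0.297
```

A p-value of this size gives no evidence against proportionality for gender.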

. omodel logit lookcool grade

Ordered logit estimates
Number of obs = 7505
LR chi2(1) = 0.68
Coefficient table rows: grade, _cut1, _cut2, _cut3 (ancillary parameters)
Approximate likelihood-ratio test of proportionality of odds across response categories: chi2(2) = [not preserved], Prob > chi2 = [not preserved]
[Iteration log, log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]

What would the ORs be?
- Generate three separate binary outcome variables from the ordinal variable:
  - lookcool1v234
  - lookcool12v34
  - lookcool123v4
- Estimate the odds ratio for each binary outcome.

Stata Code

gen lookcool1v234=1 if lookcool==2 | lookcool==3 | lookcool==4
replace lookcool1v234=0 if lookcool==1

gen lookcool12v34=1 if lookcool==3 | lookcool==4
replace lookcool12v34=0 if lookcool==1 | lookcool==2

gen lookcool123v4=1 if lookcool==4
replace lookcool123v4=0 if lookcool==2 | lookcool==3 | lookcool==1

logit lookcool1v234 grade
logit lookcool12v34 grade
logit lookcool123v4 grade
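The same recoding logic, sketched outside Stata for readers who want to see it in isolation. The helper `dichotomize` and the short list of responses are made up for illustration and are not survey data. As in the Stata code, a missing response stays missing rather than being coded 0.

```python
def dichotomize(value, cut):
    """Return 1 if value is above cut, 0 if at or below it, None if missing."""
    if value is None:
        return None
    return 1 if value > cut else 0


# Made-up responses coded 1-4, plus one missing value.
lookcool = [1, 2, 3, 4, None]

lookcool1v234 = [dichotomize(v, 1) for v in lookcool]
lookcool12v34 = [dichotomize(v, 2) for v in lookcool]
lookcool123v4 = [dichotomize(v, 3) for v in lookcool]

print(lookcool1v234)
print(lookcool12v34)
print(lookcool123v4)
```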

Results
- For a one-grade difference (range = 6 – 12):
  - lookcool1v234 vs. grade: OR = (0.93)
  - lookcool12v34 vs. grade: OR = 1.04 (p=0.03)
  - lookcool123v4 vs. grade: OR = 0.98 (p=0.11)

Another approach: Polytomous Logistic Regression
- Polytomous (aka Polychotomous) Logistic Regression
- Fits the regression model with all contrasts.
- Can be used as an inferential model.
- Or, can be used to estimate odds ratios to see if they look ‘ordered’.
- Model is different, though.
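For comparison with the ordinal model, the polytomous (multinomial) model can be written as a set of baseline-category logits. This is a sketch in standard notation; the symbols $\alpha_j$, $\beta_j$ and the choice of J as the base category are assumptions made here for illustration.

```latex
\[
\ln \frac{P(Y = j \mid x)}{P(Y = J \mid x)} = \alpha_j + \beta_j x,
\qquad j = 1, \dots, J-1
\]
```

With four response categories and one covariate, this model has three slopes (one per non-base category) where the proportional odds model has one, which is why the `mlogit` fits in this lecture report an LR chi2 with 3 degrees of freedom.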

. mlogit lookcool gender

Multinomial logistic regression
Number of obs = 7540
Coefficient table: one block each for definitely~s, probably yes, and probably not, each with rows gender and _cons
(lookcool==definitely not is the base outcome)
[Iteration log, LR chi2(3), Prob > chi2, log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]

Interpretation
- For gender, notice the ordered nature of the odds ratios.
- Suggests that it may be appropriate to use an ordinal model.
- This model is more general, less restrictive
- but, sort of a mess to interpret.

. mlogit lookcool grade

Multinomial logistic regression
Number of obs = 7505
Coefficient table: one block each for definitely~s, probably yes, and probably not, each with rows grade and _cons
(lookcool==definitely not is the base outcome)
[Iteration log, LR chi2(3), Prob > chi2, log likelihood, Pseudo R2, and all estimates: values not preserved in the transcript.]

In R?
- mlogit library
- requires a data transformation step