Introduction to Logistic Regression Analysis Dr Tuan V. Nguyen Garvan Institute of Medical Research Sydney, Australia.

Introductory example 1 Gender difference in preference for white wine. A group of 57 men and 167 women were asked to state their preference for a new white wine. The results are shown below. Question: Is there a gender effect on the preference?

Gender    Like   Dislike   ALL
Men         23        34    57
Women       35       132   167
ALL         58       166   224

Introductory example 2 Fat concentration and preference. 435 samples of a sauce of various fat concentrations were tasted by consumers. There were two outcomes: like or dislike. The results are shown below. Question: Is there an effect of fat concentration on the preference?

Concentration   Like   Dislike   ALL
1.35              13         0    13
1.60              19         0    19
1.75              67         2    69
1.85              45         5    50
1.95              71         8    79
2.05              50        20    70
2.15              35        31    66
2.25               7        49    56
2.35               1        12    13
ALL              308       127   435

Consideration … The question in example 1 can be addressed by a “traditional” analysis such as the z-statistic or the Chi-square test. The question in example 2 is a bit more difficult to handle, as the factor (fat concentration) was a continuous variable and the outcome was a categorical variable (like or dislike). However, there is a better and more systematic method to analyze these data: logistic regression.

Odds and odds ratio Let P be the probability of preference; then the odds of preference is O = P / (1 − P).

Gender    Like   Dislike   ALL   P(like)
Men         23        34    57      0.40
Women       35       132   167      0.21
ALL         58       166   224      0.26

O men = 23 / 34 = 0.676
O women = 35 / 132 = 0.265
Odds ratio: OR = O men / O women = 0.676 / 0.265 = 2.55 (Meaning: the odds of preference is 2.55 times higher in men than in women)

Meanings of odds ratio OR > 1: the odds of preference is higher in men than in women. OR < 1: the odds of preference is lower in men than in women. OR = 1: the odds of preference in men is the same as in women. How to assess the “significance” of OR?

Computing variance of odds ratio The significance of OR can be tested by calculating its variance. The variance of OR can be calculated indirectly by working on the logarithmic scale: convert OR to log(OR); calculate the variance of log(OR); calculate the 95% confidence interval of log(OR); convert back to a 95% confidence interval for OR.

Computing variance of odds ratio

Gender    Like   Dislike
Men         23        34
Women       35       132
ALL         58       166

OR = (23/34) / (35/132) = 2.55
Log(OR) = log(2.55) = 0.937
Variance of log(OR): V = 1/23 + 1/34 + 1/35 + 1/132 = 0.109
Standard error of log(OR): SE = sqrt(0.109) = 0.330
95% confidence interval of log(OR): 0.937 ± 1.96 × 0.330 = 0.289 to 1.584
Convert back to the 95% confidence interval of OR: Exp(0.289) = 1.33 to Exp(1.584) = 4.87
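The same calculation can be reproduced directly in R; this is a minimal sketch using only the four cell counts from the table above (no model fitting needed).

n11 <- 23; n12 <- 34     # men: like, dislike
n21 <- 35; n22 <- 132    # women: like, dislike
or <- (n11/n12) / (n21/n22)                # odds ratio, about 2.55
se <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)  # standard error of log(OR), about 0.33
exp(log(or) + c(-1.96, 1.96) * se)         # 95% CI for the OR, about 1.33 to 4.87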

Logistic analysis by R

Gender    Like   Dislike
Men         23        34
Women       35       132
ALL         58       166

sex <- c(1, 2)
like <- c(23, 35)
dislike <- c(34, 132)
total <- like + dislike
prob <- like/total
logistic <- glm(prob ~ sex, family="binomial", weight=total)

> summary(logistic)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)
sex                                               **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: e+00 on 1 degrees of freedom
Residual deviance: e-15 on 0 degrees of freedom
AIC:
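Assuming the fitted object logistic from the code above, the odds ratio can also be recovered from the model coefficient; note that with sex coded 1 = men and 2 = women, the sex coefficient is log(O women / O men), so its sign is flipped relative to the hand calculation. A minimal sketch:

exp(coef(logistic)["sex"])      # odds ratio of women vs men, about 1/2.55 = 0.39
exp(-coef(logistic)["sex"])     # odds ratio of men vs women, about 2.55
exp(confint.default(logistic))  # Wald-type 95% confidence intervals on the OR scale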

Logistic regression model for continuous factor

Concentration   Like   Dislike   % like
1.35              13         0    100.0
1.60              19         0    100.0
1.75              67         2     97.1
1.85              45         5     90.0
1.95              71         8     89.9
2.05              50        20     71.4
2.15              35        31     53.0
2.25               7        49     12.5
2.35               1        12      7.7

Analysis by using R

conc <- c(1.35, 1.60, 1.75, 1.85, 1.95, 2.05, 2.15, 2.25, 2.35)
like <- c(13, 19, 67, 45, 71, 50, 35, 7, 1)
dislike <- c(0, 0, 2, 5, 8, 20, 31, 49, 12)
total <- like + dislike
prob <- like/total
plot(prob ~ conc, pch=16, xlab="Concentration")

Logistic regression model for continuous factor - model Let p = probability of preference. The logit of p is: logit(p) = log[p / (1 − p)]. Model: logit(p) = α + β(FAT), where α is the intercept and β is the slope that have to be estimated from the data.
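As a quick check of the definition, R provides the logit and inverse logit as qlogis() and plogis():

p <- 0.7
qlogis(p)            # logit: log(p/(1-p)) = 0.847
log(p / (1 - p))     # the same value computed by hand
plogis(qlogis(p))    # the inverse logit recovers p = 0.7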

Analysis by using R

logistic <- glm(prob ~ conc, family="binomial", weight=total)
summary(logistic)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
conc                                      <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 8 degrees of freedom
Residual deviance: on 7 degrees of freedom
AIC:
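Assuming the conc, prob, total and logistic objects from the previous slides, the fitted curve can be drawn over the observed proportions to check the model visually; this is a sketch, not part of the original code.

plot(prob ~ conc, pch=16, xlab="Concentration", ylab="Proportion who like")
conc.grid <- seq(min(conc), max(conc), length.out=100)
fit <- predict(logistic, newdata=data.frame(conc=conc.grid), type="response")
lines(conc.grid, fit, lwd=2)   # fitted probability of liking as a function of concentration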

Logistic regression model for continuous factor – Interpretation The odds ratio associated with each 0.1 increase in fat concentration was 2.90 (95% CI: 2.34, 3.59). Interpretation: each 0.1 increase in fat concentration was associated with a 2.9-fold increase in the odds of disliking the product. Since the 95% confidence interval excludes 1, this association was statistically significant at the p < 0.05 level.
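Assuming the fitted object logistic from the previous slide, the per-0.1-unit odds ratio quoted here can be obtained by rescaling the conc coefficient; the sign is flipped because the model was fitted to the probability of liking while the interpretation is phrased in terms of disliking (this rescaling is an assumption about how the 2.90 was derived).

b  <- coef(logistic)["conc"]
se <- sqrt(vcov(logistic)["conc", "conc"])
exp(-0.1 * b)                          # odds ratio of disliking per 0.1 increase, about 2.90
exp(-0.1 * (b + c(1.96, -1.96) * se))  # 95% confidence interval, about 2.34 to 3.59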

Multiple logistic regression The data contain one row per subject with the variables: id, fx (fracture: 0 = no, 1 = yes), age, bmi, bmd, ictp, pinp. Outcome variable: fx. Predictor variables: age, bmi, bmd, ictp, pinp. Question: Which variables are important for fracture?

Multiple logistic regression: R analysis

setwd("c:/works/stats")
fracture <- read.table("fracture.txt", header=TRUE, na.strings=".")
names(fracture)
fulldata <- na.omit(fracture)
attach(fulldata)
temp <- glm(fx ~ ., family="binomial", data=fulldata)
search <- step(temp)
summary(search)
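The file fracture.txt is not reproduced here, so the following is a hedged sketch of the same pipeline on simulated data with the same variable names (the effect sizes are made up purely for illustration); the selected model is then reported on the odds-ratio scale.

set.seed(1)
n <- 200
sim <- data.frame(age = rnorm(n, 70, 5), bmi = rnorm(n, 25, 3), bmd = rnorm(n, 0.9, 0.12),
                  ictp = rnorm(n, 4, 1), pinp = rnorm(n, 40, 10))
sim$fx <- rbinom(n, 1, plogis(-2 + 0.8*sim$ictp - 3*sim$bmd))   # hypothetical fracture risk
full <- glm(fx ~ age + bmi + bmd + ictp + pinp, family="binomial", data=sim)
search <- step(full)                                     # backward elimination by AIC
exp(cbind(OR = coef(search), confint.default(search)))   # odds ratios with Wald 95% CIs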

Bayesian Model Average (BMA) analysis

library(BMA)
xvars <- fulldata[, 3:7]
y <- fx
bma.search <- bic.glm(xvars, y, strict=FALSE, OR=20, glm.family="binomial")
summary(bma.search)
imageplot.bma(bma.search)

Bayesian Model Average (BMA) analysis

> summary(bma.search)
Call:
Best 5 models (cumulative posterior probability = ):

           p!=0   EV   SD   model 1   model 2   model 3   model 4   model 5
Intercept
age
bmi
bmd
ictp
pinp

nVar
BIC
post prob

Bayesian Model Average (BMA) analysis > imageplot.bma(bma.search)

Summary of main points The logistic regression model is used to analyze the association between a binary outcome and one or many determinants. The determinants can be binary, categorical, or continuous measurements. The model is logit(p) = log[p / (1 − p)] = α + βX, where X is a factor, and α and β must be estimated from observed data.

Summary of main points Exp(β) is the odds ratio associated with a one-unit increment in the determinant X. The logistic regression model can be extended to include many determinants: logit(p) = log[p / (1 − p)] = α + β1X1 + β2X2 + β3X3 + …