Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren, Viviane Bremer.

Objectives
- When do we need to use logistic regression?
- Principles of logistic regression
- Uses of logistic regression
- What to keep in mind

Chlamorea
- Sexually transmitted infection
  - Virus recently identified
  - Leads to general rash, blush, pimples and feeling of shame
  - Increasing prevalence with age
  - Risk factors unknown so far

Case control study
- Population of Berlin
- 150 cases, 150 controls
- Hypothesis: consistent use of condoms protects against chlamorea
- Questionnaire with questions on demographic characteristics and sexual behaviour
- OR, t-test

Results bivariate analysis

                            Cases (n=150)   Controls (n=150)   Odds ratio
Used condoms at last sex
Did not use condoms         110             60                 Ref

Results bivariate analysis

                              Cases (n=150)   Controls (n=150)   Odds ratio
Single
Currently in a relationship   25              100                Ref
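The odds ratios in the two bivariate tables can be sketched as a cross-product calculation. The reference rows (110 and 60; 25 and 100) are read from the slides. The other rows did not survive in the transcript, so the sketch assumes no missing answers and takes them as the remainder of 150 per group. The function name `odds_ratio` is illustrative.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio (a*d)/(b*c) for a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

N_CASES = N_CONTROLS = 150  # group sizes from the slides

# Reference rows as reported in the slides
no_condom_cases, no_condom_controls = 110, 60
relationship_cases, relationship_controls = 25, 100

# Assumption: every subject answered, so the other row is the remainder of 150
condom_or = odds_ratio(N_CASES - no_condom_cases, N_CONTROLS - no_condom_controls,
                       no_condom_cases, no_condom_controls)
single_or = odds_ratio(N_CASES - relationship_cases, N_CONTROLS - relationship_controls,
                       relationship_cases, relationship_controls)

print("OR, used condoms vs did not:", round(condom_or, 2))
print("OR, single vs in a relationship:", round(single_or, 2))
```

Under that assumption, condom use would look protective (OR below 1) and being single would look like a strong risk factor, which is what motivates the question of confounding on the next slide.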

Results bivariate analysis

                               Cases (n=150)   Controls (n=150)   T-test
Nr partners during last year   4               2                  p=0.001
Mean age in years              39              26                 p=0.001

Confounding?

Stratification
Chlamorea and condom use: the raw 2x2 table (a, b, c, d; OR raw) is split into one 2x2 table per stratum i (a_i, b_i, c_i, d_i), each with its own OR_i.
Stratifying variables: single status, age group, number of partners.

Let's go one step back

Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Plot of SBP (mm Hg) against age (years). Adapted from Colton T. Statistics in Medicine. Boston: Little, Brown, 1974.

Simple linear regression
- Relation between 2 continuous variables (SBP and age): $y = \alpha + \beta_1 x$
- Regression coefficient $\beta_1$ (slope)
  - Measures association between y and x
  - Amount by which y changes on average when x changes by one unit
  - Least squares method
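A minimal sketch of the least squares method for one predictor, using the closed-form estimates of slope and intercept. The age and SBP values below are hypothetical and are not the 33 observations of Table 1, which the transcript does not preserve.

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta1) minimising the sum of squared residuals of y = alpha + beta1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx                 # slope: average change in y per unit of x
    alpha = mean_y - beta1 * mean_x   # intercept: the line passes through the means
    return alpha, beta1

# Hypothetical (age, SBP) pairs for illustration only
ages = [25, 35, 45, 55, 65]
sbp = [115, 122, 131, 138, 149]

alpha, beta1 = least_squares_line(ages, sbp)
print("intercept:", round(alpha, 2), "slope:", round(beta1, 2))
```

With these made-up values the slope reads as the average rise in SBP (mm Hg) per additional year of age.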

What if we have more than one independent variable?

Multiple risk factors
Objective: to attribute to each risk factor the respective effect (RR) it has on the occurrence of disease.

Types of multivariable analysis
- Multiple models
  - Linear regression
  - Logistic regression
  - Cox model
  - Poisson regression
  - Loglinear model
  - Discriminant analysis…
- Choice of the tool according to objectives, study design and variables

Multiple linear regression
- Relation between a continuous variable and a set of i variables: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
- Partial regression coefficients $\beta_i$
  - Amount by which y changes when $x_i$ changes by one unit and all the other $x_i$ remain constant
  - Measures association between $x_i$ and y adjusted for all other $x_i$
- Example
  - Number of partners in relation to age & income

Multiple linear regression

Predicted            Predictor variables
Response variable    Explanatory variables
Outcome variable     Covariables
Dependent            Independent variables

$y \text{ (number of partners)} = \alpha + \beta_1\,\text{age} + \beta_2\,\text{income} + \beta_3\,\text{gender}$
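As a sketch of how the partial regression coefficients in the equation above could be estimated, the following solves the least squares normal equations in plain Python. The data are hypothetical and generated exactly from chosen coefficients (alpha = 1, age = 0.1, income = -0.05, gender = 0.5), so the fit should recover them. The helper names are illustrative; a statistics package would normally be used in practice.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_multiple_linear(rows, ys):
    """Least squares fit of y = alpha + beta_1*x_1 + ... via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical subjects: (age, income, gender coded 0/1)
rows = [(20, 10, 0), (30, 20, 1), (40, 15, 0), (50, 30, 1), (25, 25, 1), (35, 12, 0)]
# Hypothetical outcome built exactly from y = 1 + 0.1*age - 0.05*income + 0.5*gender
ys = [1 + 0.1 * age - 0.05 * income + 0.5 * gender for age, income, gender in rows]

alpha, b_age, b_income, b_gender = fit_multiple_linear(rows, ys)
print(round(alpha, 4), round(b_age, 4), round(b_income, 4), round(b_gender, 4))
```

Each coefficient is the change in y for a one unit change in that variable with the others held constant, which is the "adjusted" reading given on the previous slide.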

What if our outcome variable is dichotomous?

Logistic regression (1) Table 2 Age and chlamorea

How can we analyse these data?
- Compare mean age of diseased and non-diseased
  - Non-diseased: 26 years
  - Diseased: 39 years (p=0.0001)
- Linear regression?

Dot-plot: Data from Table 2 (presence of chlamorea)

Logistic regression (2) Table 3 Prevalence (%) of chlamorea according to age group

Dot-plot: Data from Table 3 (diseased % by age group)

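Logistic function (1)
Plot of the probability of disease $P(y \mid x)$ against x:
$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$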

Logistic function
Logistic regression models the logit of the outcome
= natural logarithm of the odds of the outcome
= $\ln\left(\dfrac{p}{1-p}\right)$, where p is the probability of the outcome and 1-p the probability of not having the outcome
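A short sketch of the two functions involved: the logistic function maps a linear predictor to a probability between 0 and 1, and the logit maps a probability back to the log odds. The coefficients `alpha` and `beta` below are hypothetical values chosen for illustration, with x standing for age in years.

```python
import math

def logistic(alpha, beta, x):
    """Probability of the outcome: e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    z = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-z))  # algebraically the same expression

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

alpha, beta = -4.0, 0.1  # hypothetical coefficients

for age in (20, 40, 60):
    p = logistic(alpha, beta, age)
    # logit(p) returns the linear predictor alpha + beta*age
    print(age, round(p, 3), round(logit(p), 3))
```

The probability follows an S-shaped curve in x, while its logit is a straight line in x. That linearity is why the model is fitted on the logit scale.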

Logistic function
$\ln\left(\dfrac{p}{1-p}\right) = \alpha + \beta x$
- $\alpha$ = log odds of disease in unexposed
- $\beta$ = log odds ratio associated with being exposed
- $e^{\beta}$ = odds ratio
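The step from the model to the odds ratio can be written out for a dichotomous exposure, assuming the coding x = 0 for unexposed and x = 1 for exposed, with $p_0$ and $p_1$ the corresponding probabilities of disease (these symbols are introduced here for illustration):

```latex
\begin{align*}
\ln\left(\frac{p_0}{1-p_0}\right) &= \alpha + \beta \cdot 0 = \alpha
  && \text{(log odds in unexposed)} \\
\ln\left(\frac{p_1}{1-p_1}\right) &= \alpha + \beta \cdot 1 = \alpha + \beta
  && \text{(log odds in exposed)} \\
\beta &= \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right)
  = \ln\left(\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right) = \ln(\mathrm{OR}) \\
e^{\beta} &= \mathrm{OR}
\end{align*}
```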

Multiple logistic regression
- More than one independent variable
  - Dichotomous, ordinal, nominal, continuous …
- $\ln\left(\dfrac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
- Interpretation of $\beta_i$
  - Increase in log-odds for a one unit increase in $x_i$ with all the other $x_i$ constant
  - Measures association between $x_i$ and log-odds adjusted for all other $x_i$

Uses of multivariable analysis
- Etiologic models
  - Identify risk factors adjusted for confounders
  - Adjust for differences in baseline characteristics
- Predictive models
  - Determine diagnosis
  - Determine prognosis

Fitting equation to the data
- Linear regression:
  - Least squares
- Logistic regression:
  - Maximum likelihood
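Maximum likelihood has no closed form for logistic regression, so it is found iteratively. The sketch below uses Newton-Raphson for a model with one predictor. Newton-Raphson is a common choice, though the slides do not say which algorithm was used. The data reuse the condom counts from the bivariate analysis: 110 and 60 come from the slides, while 40 and 90 assume every subject answered (150 per group).

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of logit(p) = alpha + beta*x by Newton-Raphson."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood (g) and 2x2 information matrix (h)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times gradient
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta

# x = 1 means "did not use condoms"; y = 1 means case.
# Counts 40 and 90 are an assumption (remainder of 150 per group).
data = [(1, 1)] * 110 + [(1, 0)] * 60 + [(0, 1)] * 40 + [(0, 0)] * 90
xs = [x for x, _ in data]
ys = [y for _, y in data]

alpha, beta = fit_logistic(xs, ys)
print("alpha:", round(alpha, 4), "beta:", round(beta, 4), "OR:", round(math.exp(beta), 3))
```

With a single dichotomous exposure, the fitted $e^{\beta}$ should match the crude cross-product ratio (110 × 90)/(60 × 40) = 4.125 up to numerical precision, and alpha should match the log odds in the unexposed, ln(40/90).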

Elaborating $e^{\beta}$
- $e^{\beta}$ = OR
- What if the independent variable is continuous?
  - What is the effect of a change in x by more than one unit?

The Q fever example
- Distance to farm as independent continuous variable, counted in meters
  - β in logistic regression was … and statistically significant
- OR for each 1 meter distance is …
  - Too small to use
- What's the OR for every 1000 meters?
  - $e^{1000 \beta} = e^{-1000 \times \dots} = \dots$

Continuous variables
- Increase in OR for a one unit change in exposure variable
- Logistic model is multiplicative: OR increases exponentially with x
  - If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 × 2 × 2 = $2^3$ = 8
- Verify if OR increases exponentially with x
  - When in doubt, treat as qualitative variable
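The multiplicative rule can be sketched as a one-line function: the OR for a change of several units is $e^{\beta}$ raised to the number of units. The first call reproduces the slide's example (OR = 2 per unit, x from 2 to 5). The per-meter coefficient in the second part is hypothetical, since the Q fever value is not preserved in the transcript.

```python
import math

def odds_ratio_for_change(beta, delta):
    """OR for a change of `delta` units in x: e^(delta*beta) = (e^beta)^delta."""
    return math.exp(delta * beta)

# Slide example: OR = 2 per unit, x increases from 2 to 5 (3 units)
beta_per_unit = math.log(2)
or_2_to_5 = odds_ratio_for_change(beta_per_unit, 5 - 2)
print("OR for x from 2 to 5:", round(or_2_to_5, 6))  # 2 * 2 * 2

# Hypothetical coefficient per meter of distance (NOT the Q fever estimate)
beta_per_meter = -0.0002
or_1m = odds_ratio_for_change(beta_per_meter, 1)
or_1000m = odds_ratio_for_change(beta_per_meter, 1000)
print("OR per 1 m:", round(or_1m, 4), "OR per 1000 m:", round(or_1000m, 4))
```

A per-meter OR this close to 1 is hard to interpret, whereas the same coefficient rescaled to 1000 meters gives a readable effect size.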

Coding of variables (2)
- Nominal variables or ordinal with unequal classes:
  - Preferred hair colour of partners:
    - No hair=0, grey=1, brown=2, blond=3
  - Model assumes that OR for blond partners = (OR for grey-haired partners)$^3$
  - Use indicator variables (dummy variables)

Indicator variables: Hair colour
- Neutralises artificial hierarchy between classes in the variable "hair colour of partners"
- No assumptions made
- 3 variables in model using same reference
- OR for each type of hair adjusted for the others, in reference to no hair
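A minimal sketch of indicator (dummy) coding for the hair colour variable, using the codes given on the previous slide and "no hair" as the reference class. The function name and list layout are illustrative.

```python
# Coding from the slides: no hair=0, grey=1, brown=2, blond=3
HAIR_CODES = {"no hair": 0, "grey": 1, "brown": 2, "blond": 3}

def indicator_variables(code, reference=0, levels=(0, 1, 2, 3)):
    """Return one 0/1 variable per non-reference level (grey, brown, blond)."""
    return [1 if code == level else 0 for level in levels if level != reference]

for colour, code in HAIR_CODES.items():
    print(colour, indicator_variables(code))
```

The reference class is the one coded all zeros, so each of the three coefficients compares one hair colour with "no hair" and no ordering between colours is imposed.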

Classes
- Relationship between number of partners during last year and chlamorea
  - Code number of partners: 0-1 = 1, 2-3 = 2, 4-5 = 3
- Compatible with assumption of multiplicative model
  - If not compatible, use indicator variables

Code nr partners   Cases   Controls   OR

Risk factors for Chlamorea
Diagram relating no condom use to chlamorea, together with: sex, hair colour, age group, single, visiting bars, number of partners

Unconditional Logistic Regression

Term                Odds Ratio   95% C.I.            Coef.     S.E.     Z-Statistic   P-Value
# partners          1,2664       0,2634   10,7082    0,2362    0,9452   0,5486        0,5833
Single (Yes/No)     1,0345       0,3277   3,2660     0,0339    0,5866   0,0578        0,9539
Hair colour (1/0)   1,6126       0,2675   9,7220     0,4778    0,9166   0,5213        0,6022
Hair colour (2/0)   0,7291       0,0991   5,3668     -0,3159   1,0185   -0,3102       0,7564
Hair colour (3/0)   1,1137       0,1573   7,8870     0,1076    0,9988   0,1078        0,9142
Visiting bars       1,5942       0,4953   5,1317     0,4664    0,5965   0,7819        0,4343
Used no Condoms     9,0918       3,0219   27,3533    2,2074    0,5620   3,9278        0,0001
Sex (f/m)           1,3024       0,2278   7,4468     0,2642    0,8896   0,2970        0,7665
CONSTANT            *            *        *          -3,0080   2,0559   -1,4631       0,1434
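The odds ratio and confidence interval columns can be derived from the coefficient and its standard error. The sketch below assumes a Wald interval, exp(coef ± 1.96 × S.E.), which is the usual construction but is not stated on the slide, and applies it to the "Used no Condoms" row (decimal commas in the output are written as points).

```python
import math

def wald_odds_ratio(coef, se, z=1.96):
    """Return (OR, lower, upper) with a Wald interval: exp(coef +/- z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# "Used no Condoms" row of the table: Coef. 2.2074, S.E. 0.5620
odds, lower, upper = wald_odds_ratio(2.2074, 0.5620)
z_statistic = 2.2074 / 0.5620  # coefficient divided by its standard error

print("OR:", round(odds, 3), "95% CI:", round(lower, 3), "to", round(upper, 3))
print("Z:", round(z_statistic, 3))
```

The results should agree with the table's 9,0918 (3,0219 to 27,3533) and Z of 3,9278 up to rounding of the printed coefficient.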

Last but not least

Why do we need multivariable analysis?
- Our real world is multivariable
- Multivariable analysis is a tool to determine the relative contribution of all factors

Sequence of analysis
- Descriptive analysis
  - Know your dataset
- Bivariate analysis
  - Identify associations
- Stratified analysis
  - Confounding and effect modifiers
- Multivariable analysis
  - Control for confounding

What can go wrong
- Small sample size and too few cases
- Wrong coding
- Skewed distribution of independent variables
  - Empty subgroups
- Collinearity
  - Independent variables express the same information

Do not forget
- Rubbish in, rubbish out
- Check for confounders first
- Number of subjects >> variables in the model
- Keep the model simple
  - Statisticians can help with the model but you need to understand the interpretation
- You will need several attempts to find the best model

If in doubt…
Really call a statistician!
