Introduction to Logistic Regression

Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren.
Presentation transcript:

Introduction to Logistic Regression

Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters –Interpretation of coefficients Multiple logistic regression –Interpretation of coefficients –Coding of variables

Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Figure: SBP (mm Hg) plotted against age (years). Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

Simple linear regression Relation between 2 continuous variables (SBP and age): y = α + β1x, where β1 is the slope. Regression coefficient β1 –Measures association between y and x –Amount by which y changes on average when x changes by one unit –Least squares method
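As an illustrative sketch of the least squares method, the slope and intercept can be computed directly from the sums of squares and cross-products. The ages and SBP values below are invented for illustration and are not the values of Table 1; the function name `least_squares_fit` is hypothetical.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented ages (years) and SBP values (mm Hg), for illustration only
ages = [30, 40, 50, 60, 70]
sbp = [120, 128, 136, 144, 152]
intercept, slope = least_squares_fit(ages, sbp)
print(intercept, slope)
```

With these invented data the slope is the average change in SBP per additional year of age.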

Multiple linear regression Relation between a continuous variable and a set of i continuous variables Partial regression coefficients βi –Amount by which y changes on average when xi changes by one unit and all the other xi remain constant –Measures association between xi and y adjusted for all other xi Example –SBP versus age, weight, height, etc

Multiple linear regression Terminology (y versus x): Dependent variable / Independent variables; Predicted variable / Predictor variables; Response variable / Explanatory variables; Outcome variable / Covariables

Multivariate analysis Model and outcome: Linear regression (continuous); Poisson regression (counts); Cox model (survival); Logistic regression (binomial). Choice of the tool according to study, objectives, and the variables –Control of confounding –Model building, prediction

Logistic regression Models the relationship between a set of variables xi –dichotomous (eat: yes/no) –categorical (social class, ...) –continuous (age, ...) and a dichotomous variable Y. A dichotomous (binary) outcome is the most common situation in biology and epidemiology

Logistic regression (1) Table 2 Age and signs of coronary heart disease (CD)

How can we analyse these data? Comparison of the mean age of diseased and non-diseased women –Non-diseased: 38.6 years –Diseased: 58.7 years (p<0.0001) Linear regression?

Dot-plot: Data from Table 2

Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group

Dot-plot: Data from Table 3 [Figure: Diseased (%) against Age (years); labels P and 1-P]

The logistic function (2) logit of P(y|x): ln[ P(y|x) / (1 - P(y|x)) ] = α + βx

The logistic function (3) Advantages of the logit –Simple transformation of P(y|x) –Linear relationship with x –Can be continuous (logit between -∞ and +∞) –Known binomial distribution (P between 0 and 1) –Directly related to the notion of odds of disease
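The two properties listed above (P stays between 0 and 1, while the logit is linear in x) can be checked with a minimal sketch. The coefficients `alpha` and `beta` below are hypothetical values chosen for illustration, not estimates from the tables in these slides.

```python
import math

def logistic(alpha, beta, x):
    """P(y|x) under the logistic model: always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

alpha, beta = -5.0, 0.1   # hypothetical coefficients
for age in (30, 50, 70):
    p = logistic(alpha, beta, age)
    # The logit of p recovers the linear predictor alpha + beta * age
    print(age, round(p, 3), round(logit(p), 3))
```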

Interpretation of β (1)

Interpretation of β (2) β = increase in log-odds for a one unit increase in x. Test of the hypothesis that β = 0 (Wald test). Interval testing
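A short sketch of how a fitted coefficient is usually summarised: the odds ratio is exp(β), the Wald statistic is β divided by its standard error, and a 95% confidence interval for the odds ratio is obtained by exponentiating β ± 1.96 SE. The values of `beta` and `se` are hypothetical, and the function name `summarise_coefficient` is an assumption made for this example.

```python
import math

def summarise_coefficient(beta, se):
    """Return (odds ratio, Wald z, two-sided p-value, 95% CI for the OR)."""
    odds_ratio = math.exp(beta)
    wald_z = beta / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(wald_z) / math.sqrt(2.0))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return odds_ratio, wald_z, p_value, ci

# Hypothetical estimate: beta = ln(2), standard error 0.25
odds_ratio, wald_z, p_value, ci = summarise_coefficient(math.log(2), 0.25)
print(odds_ratio, wald_z, p_value, ci)
```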

Example Age (<55 and 55+ years) and risk of developing coronary heart disease (CD)

Absolute frequency of asthma by sex among schoolchildren in the city of Zanjan, 1374 (Iranian calendar). Table columns: Girl, Boy, Total. Table rows: Has asthma, Does not have asthma, Total

Table columns: Girl, Boy. Table rows: Prevalence, Odds, Log odds
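Prevalence, odds and log odds for each group follow directly from the counts in a 2x2 table, and the difference in log odds between the groups is the logistic regression coefficient for the group variable. The counts below are invented for illustration (the cell values of the table are not available here), and `prevalence_odds` is a hypothetical helper.

```python
import math

def prevalence_odds(cases, total):
    """Return (prevalence, odds, log odds) for one group."""
    prevalence = cases / total
    odds = prevalence / (1.0 - prevalence)
    return prevalence, odds, math.log(odds)

# Invented counts: 20 of 100 girls and 40 of 100 boys with the condition
girls = prevalence_odds(20, 100)
boys = prevalence_odds(40, 100)

odds_ratio = boys[1] / girls[1]   # ratio of the two odds
beta = boys[2] - girls[2]         # difference in log odds
print(girls, boys, odds_ratio, beta)
```

Exponentiating the difference in log odds gives back the odds ratio.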

Results of fitting Logistic Regression Model

Interpretation of β (1)

Multiple logistic regression More than one independent variable –Dichotomous, ordinal, nominal, continuous … Interpretation of βi –Increase in log-odds for a one unit increase in xi with all the other xi constant –Measures association between xi and log-odds adjusted for all other xi

Multiple logistic regression Effect modification –Can be modelled by including interaction terms
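As a sketch of how an interaction term enters the model, consider two dichotomous variables x1 and x2. The symbols, and in particular the product coefficient β3, are notation assumed here for illustration. The product term lets the odds ratio for x1 differ between the levels of x2:

```latex
\begin{align*}
\log(\text{odds}) &= \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{OR for } x_1 \text{ when } x_2 = 0 &: \quad e^{\beta_1} \\
\text{OR for } x_1 \text{ when } x_2 = 1 &: \quad e^{\beta_1 + \beta_3}
\end{align*}
```

If β3 = 0 the two odds ratios coincide, which is the case of no effect modification.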

Statistical testing Question –Does a model including a given independent variable provide more information about the dependent variable than a model without this variable? Three tests –Likelihood ratio statistic (LRS) –Wald test –Score test

Fitting equation to the data Linear regression: Least squares Logistic regression: Maximum likelihood Likelihood function –Estimates parameters α and β with property that likelihood (probability) of observed data is higher than for any other values –Practically easier to work with log-likelihood
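A minimal sketch of the log-likelihood for a simple logistic model, assuming independent binary observations. The data and the two candidate parameter pairs are invented for illustration; the point is only that some values of the parameters make the observed data more likely than others.

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """Sum of log P(observed y | x) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Invented ages and disease status (1 = diseased, 0 = not diseased)
xs = [25, 35, 45, 55, 65]
ys = [0, 0, 1, 0, 1]

ll_null = log_likelihood(0.0, 0.0, xs, ys)    # every P equal to 0.5
ll_other = log_likelihood(-4.5, 0.1, xs, ys)  # P increasing with age
print(ll_null, ll_other)
```

The second pair of values gives a higher (less negative) log-likelihood for these invented data, so it fits them better than the first.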


Likelihood ratio statistic Compares two nested models
Log(odds) = α + β1x1 + β2x2 + β3x3 + β4x4 (model 1)
Log(odds) = α + β1x1 + β2x2 (model 2)
LR statistic = -2 log (likelihood model 2 / likelihood model 1) = [-2 log (likelihood model 2)] - [-2 log (likelihood model 1)]
LR statistic follows a χ² distribution with DF = number of extra parameters in the larger model

Example P: probability of cardiac arrest. Exc: 1 = lack of exercise, 0 = exercise. Smk: 1 = smokers, 0 = non-smokers. Adapted from Kerr, Handbook of Public Health Methods, McGraw-Hill, 1998

Interactive effect between smoking and exercise? Product term β3 = (SE ) Wald test = 0.75 (1df) -2log(L) = with interaction term = without interaction term ⇒ LR statistic = 0.74 (1df), p = 0.39 ⇒ No evidence of any interaction
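The p-value quoted for the LR statistic can be checked with a short sketch. For 1 degree of freedom the upper tail of the χ² distribution can be written with the complementary error function, so only the standard library is needed. The two log-likelihood values passed to `lr_statistic` are invented so that the difference reproduces the reported statistic; they are not the values from the example.

```python
import math

def lr_statistic(loglik_reduced, loglik_full):
    """-2 log(L reduced / L full) for two nested models."""
    return -2.0 * (loglik_reduced - loglik_full)

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x), valid for 1 df only."""
    return math.erfc(math.sqrt(x / 2.0))

# Invented log-likelihoods whose difference gives LR = 0.74
lr = lr_statistic(-100.0, -99.63)
p = chi2_upper_tail_1df(0.74)
print(round(lr, 2), round(p, 2))
```

For more than 1 degree of freedom a general χ² distribution function (for example from a statistics library) would be needed instead.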

Coding of variables (1) Dichotomous variables: yes = 1, no = 0 Continuous variables –Increase in OR for a one unit change in exposure variable –Logistic model is multiplicative ⇒ OR increases exponentially with x »If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 x 2 x 2 = 2^3 = 8 –Verify that OR increases exponentially with x. When in doubt, treat as qualitative variable
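The multiplicative behaviour can be sketched in a few lines: if β is the log odds ratio per unit of x, the odds ratio for a change of k units is exp(kβ), that is, the one-unit OR raised to the power k. The function name is hypothetical; the numbers are those of the example above.

```python
import math

def odds_ratio_for_change(beta, delta_x):
    """Odds ratio associated with a change of delta_x units in x."""
    return math.exp(beta * delta_x)

beta = math.log(2)                          # OR = 2 per one unit change
or_2_to_5 = odds_ratio_for_change(beta, 5 - 2)
print(or_2_to_5)                            # close to 2 ** 3 = 8
```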

Continuous variable? Relationship between SBP>160 mmHg and body weight Introduce BW as continuous variable? –Code weight as single variable, eg. 3 equal classes: kg = 0, kg = 1, kg = 2 –Compatible with assumption of multiplicative model –If not compatible, use indicator variables

Coding of variables (2) Nominal variables or ordinal with unequal classes: –Tobacco smoked: no=0, grey=1, brown=2, blond=3 –Model assumes that OR for blond tobacco = (OR for grey tobacco)^3 –Use indicator variables (dummy variables)

Indicator variables: Type of tobacco –Neutralises artificial hierarchy between classes in the variable "type of tobacco" –No assumptions made –3 variables (3 df) in model using same reference –OR for each type of tobacco adjusted for the others in reference to non-smoking
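A minimal sketch of indicator (dummy) coding for the tobacco variable, with non-smoking as the reference class. The helper `indicator_variables` is hypothetical; statistical packages usually provide their own way of doing this.

```python
def indicator_variables(value, categories, reference):
    """Return one 0/1 indicator per non-reference category."""
    return {c: int(value == c) for c in categories if c != reference}

categories = ["no", "grey", "brown", "blond"]
for tobacco in categories:
    # The reference class is coded as all zeros
    print(tobacco, indicator_variables(tobacco, categories, reference="no"))
```

Each indicator then receives its own coefficient, so no ordering or spacing between the types of tobacco is imposed.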

Low birth weight (LBW), one of the important health indicators, is an outcome that is strongly influenced by economic and health conditions. These conditions, which themselves differ from place to place, produce variation in the spatial pattern of LBW occurrence. Attention to this aspect, and to the important role of place in the variation of health outcomes, has to be given separately in each region so that high-risk areas can be identified for allocating the necessary interventions. This study was therefore carried out to identify high-risk areas for LBW in the villages of Rasht county.

Methods: To collect the data for this study, births with a weight below 2500 g, broken down by rural unit of Rasht county, in the period 1380 and 1381 (Iranian calendar), were extracted from the villages' vital records (vital horoscope charts). To determine the spatial distribution of low birth weight, "flat maps" of the LBW rate were prepared first.

The total number of LBW births in the villages of the county is 295, and the number of live births is 5987. Is the number of LBW cases in the following villages higher than you would expect? Table columns: Village code, Number of live births, LBW
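The comparison with the expected count can be sketched as follows. Under the overall LBW risk of 295/5987, a village with n live births would be expected to have n x 295/5987 LBW births, and a Poisson tail probability gives a rough sense of whether an observed count is unusually high. The village figures (100 live births, 10 LBW births) are invented, and the function names are hypothetical.

```python
import math

TOTAL_LBW = 295       # county total, from the slide
TOTAL_BIRTHS = 5987   # county live births, from the slide

def expected_lbw(live_births):
    """Expected LBW count if the village had the overall county risk."""
    return live_births * TOTAL_LBW / TOTAL_BIRTHS

def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X following a Poisson distribution."""
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

# Invented village: 100 live births, 10 of them LBW
village_births, village_lbw = 100, 10
expected = expected_lbw(village_births)
p_value = poisson_upper_tail(village_lbw, expected)
print(expected, p_value)
```

A small tail probability suggests more LBW births than the county-wide risk alone would explain.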

The researcher marks the high-risk LBW areas on the map. It then seems appropriate to assess the association between some health and economic indicators of the villages and low birth weight. For this purpose, the general fertility rate (82.3), access to a health house (96 percent) and access to a vehicle in the household (52 percent) are determined for the residents of the county's villages.

Poisson Regression Poisson regression analysis is used when the outcome variable comprises counts, usually of rather rare events, e.g. number of cases of cancer over a defined period in a cohort of subjects.
log (rate) = β0 + β1X1 + β2X2 + … + βnXn
log (event/person-time) = β0 + β1X1 + …
log (event) – log (person-time) = β0 + β1X1 + …
log (event) = log (person-time) + β0 + β1X1 + …
log (event) = β0* + β1X1 + …, where β0* = β0 + log (person-time)
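The role of log(person-time) in the derivation above can be illustrated with a small sketch in which it is added to the linear predictor as a fixed term. The baseline rate, the rate ratio and the person-time are invented values, and `expected_events` is a hypothetical helper.

```python
import math

def expected_events(person_time, beta0, beta1, x1):
    """Expected count from log(event) = log(person-time) + b0 + b1*x1."""
    log_events = math.log(person_time) + beta0 + beta1 * x1
    return math.exp(log_events)

beta0 = math.log(0.002)   # invented baseline rate: 2 per 1000 person-years
beta1 = math.log(1.5)     # invented rate ratio of 1.5 for the exposed

unexposed = expected_events(5000, beta0, beta1, 0)
exposed = expected_events(5000, beta0, beta1, 1)
print(unexposed, exposed, exposed / unexposed)
```

With equal person-time in both groups, the ratio of expected counts equals exp(β1), the rate ratio.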

Indicator / Coefficient β / 95% confidence interval / P value
No health house: 008/0-35/037/0-96/0
No piped water: 08/034/017/0-52/0
No access to vehicles: 23/048/001/0-06/0
No health and treatment centre: 47/0-09/0-85/0-01/
GFR = : 08/038/022/0-60/
GFR = : 09/022/042/0-05/0

Reference Hosmer DW, Lemeshow S. Applied logistic regression. Wiley & Sons, New York, 1989

Example 1: Low Birth Weight Study 198 observations Low Birth Weight [LBW] –1 = Birth weight < 2500g –0 = Birth weight >= 2500g Age of mother in years Weight of mother in pounds [LWT] Race (1,2,3) Number of doctor's visits in last trimester [FTV]

Example 2: Risk of death from bacterial meningitis according to treatment 161 observations Death (0,1) Treatment –1 = Chloramphenicol, 2 = Ampicillin Delay before treatment (onset, in days) Convulsions (1,0) Level of consciousness (1-3) Severity of dehydration (1-3) Age in years Pathogen –1 Others, 2 Hib, 3 Streptococcus pneumoniae

The logistic function (1) [Figure: probability of disease plotted against x]



Maximum likelihood Iterative computing –Choice of an arbitrary value for the coefficients (usually 0) –Computing of log-likelihood –Variation of coefficients’ values –Reiteration until maximisation (plateau) Results –Maximum Likelihood Estimates (MLE) for α and β –Estimates of P(y) for a given value of x
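The iterative procedure can be sketched with Newton-Raphson updates, which is one common way of varying the coefficients; the slides do not name a specific algorithm, so this choice is an assumption. The data are invented: a dichotomous x (for example age group) with 1 diseased out of 5 when x = 0 and 2 diseased out of 4 when x = 1. The function name `fit_simple_logistic` is hypothetical.

```python
import math

def fit_simple_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Maximum likelihood estimates (alpha, beta) by Newton-Raphson."""
    alpha, beta = 0.0, 0.0              # arbitrary starting values
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p                 # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                    # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break                       # plateau reached
    return alpha, beta

# Invented data: x = 0 for five subjects, x = 1 for four subjects
xs = [0, 0, 0, 0, 0, 1, 1, 1, 1]
ys = [1, 0, 0, 0, 0, 1, 1, 0, 0]
alpha_hat, beta_hat = fit_simple_logistic(xs, ys)
print(alpha_hat, beta_hat, math.exp(beta_hat))
```

For a single dichotomous x the estimates can be checked by hand: the fitted α is the log odds in the x = 0 group and exp(β) is the odds ratio from the 2x2 table.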