Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein.

1 Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein

2 Oral contraceptives (OC) and myocardial infarction (MI) Case-control study, unstratified data
OC      MI      Controls   OR
Yes     693     320        4.8
No      307     680        Ref.
Total   1000    1000

3 Oral contraceptives (OC) and myocardial infarction (MI) Case-control study, unstratified data
Smoking   MI      Controls   OR
Yes       700     500        2.3
No        300     500        Ref.
Total     1000    1000
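
The crude odds ratios in the two tables above can be checked with the cross-product ratio. A minimal sketch; the function name `odds_ratio` and its argument names are illustrative, not from the slides:

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Crude odds ratio (cross-product ratio) of a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Counts taken from the OC table (slide 2) and the smoking table (slide 3)
or_oc = odds_ratio(693, 320, 307, 680)
or_smoking = odds_ratio(700, 500, 300, 500)

print(round(or_oc, 1), round(or_smoking, 1))  # prints: 4.8 2.3
```

The adjusted odds ratio on the next slide cannot be reproduced this way, because it needs the counts stratified by smoking, which the transcript does not give.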

4 Odds ratio for OC adjusted for smoking = 4.5

5 Ebola 2 potential risk factors –Contact with a case –Contact with the hospital

10 [Figure: epidemic curve; y-axis: number of cases (0 to 10), x-axis: days (13 to 27)] Cases of gastroenteritis among residents of a nursing home, by date of onset, Pennsylvania, October 1986

11 Cases of gastroenteritis among residents of a nursing home according to protein supplement consumption, Pa, 1986
Protein suppl.   Total   Cases   AR%   RR
YES              29      22      76    3.3
NO               74      17      23
Total            103     39      38

12 Sex-specific attack rates of gastroenteritis among residents of a nursing home, Pa, 1986
Sex      Total   Cases   AR(%)   RR & 95% CI
Male     22      5       23      Reference
Female   81      34      42      1.8 (0.8-4.2)
Total    103     39      38

13 Attack rates of gastroenteritis among residents of a nursing home, by place of meal, Pa, 1986
Meal          Total   Cases   AR(%)   RR & 95% CI
Dining room   41      12      29      Reference
Bedroom       62      27      44      1.5 (0.9-2.6)
Total         103     39      38
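
The relative risks and confidence intervals in the sex and place-of-meal tables can be reproduced from the counts. A sketch under one assumption: the slides do not say how the 95% CI was computed, so the usual log-based (Katz) interval is used here. The function name `risk_ratio_ci` is illustrative.

```python
import math

def risk_ratio_ci(cases_exp, total_exp, cases_ref, total_ref, z=1.96):
    """Relative risk with a log-based (Katz) confidence interval.

    Assumption: the slides do not name the CI method; this is one common choice.
    """
    rr = (cases_exp / total_exp) / (cases_ref / total_ref)
    # Standard error of ln(RR) for two independent binomial proportions
    se = math.sqrt(1 / cases_exp - 1 / total_exp + 1 / cases_ref - 1 / total_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

sex = risk_ratio_ci(34, 81, 5, 22)     # female vs male (slide 12)
meal = risk_ratio_ci(27, 62, 12, 41)   # bedroom vs dining room (slide 13)

print([round(v, 1) for v in sex])    # prints: [1.8, 0.8, 4.2]
print([round(v, 1) for v in meal])   # prints: [1.5, 0.9, 2.6]
```

Both results agree with the tabulated values to one decimal place.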

14 Age-specific attack rates of gastroenteritis among residents of a nursing home, Pa, 1986
Age group   Total   Cases   AR(%)
50-59       2       1       50
60-69       9       2       22
70-79       28      9       32
80-89       45      17      38
90+         19      10      53
Total       103     39      38

15 Attack rates of gastroenteritis among residents of a nursing home, by floor of residence, Pa, 1986
Floor   Total   Cases   AR (%)
One     12      3       25
Two     32      17      53
Three   30      7       23
Four    29      12      41
Total   103     39      38

17 Multivariate analysis Multiple models –Linear regression –Logistic regression –Cox model –Poisson regression –Loglinear model –Discriminant analysis –...... Choice of the tool according to the objectives, the study, and the variables

18 Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women

19 [Figure: scatter plot of SBP (mm Hg) against age (years); adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974]

20 Simple linear regression Relation between 2 continuous variables (SBP and age) Regression coefficient β_1 –Measures association between y and x –Amount by which y changes on average when x changes by one unit –Least squares method [Figure: regression line of y on x, with the slope marked]
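
The least squares method named above has a closed form for one predictor. A minimal sketch; the ages and SBP values are hypothetical (Table 1 is not reproduced in this transcript), and `least_squares` is an illustrative name:

```python
def least_squares(x, y):
    """Intercept and slope of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                      # regression coefficient
    intercept = mean_y - slope * mean_x    # line passes through the means
    return intercept, slope

# Hypothetical data, NOT the 33 women of Table 1
age = [25, 35, 45, 55, 65]
sbp = [115, 122, 130, 139, 144]

intercept, slope = least_squares(age, sbp)
print(intercept, slope)  # prints: 96.25 0.75
```

With these made-up values, SBP rises on average by 0.75 mm Hg per additional year of age, which is how the regression coefficient is read.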

21 Multiple linear regression Relation between a continuous variable and a set of i continuous variables Partial regression coefficients β_i –Amount by which y changes on average when x_i changes by one unit and all the other x_i remain constant –Measures association between x_i and y adjusted for all other x_i Example –SBP versus age, weight, height, etc.

22 Multiple linear regression Terminology –y: predicted, response variable, outcome variable, dependent variable –x_i: predictor variables, explanatory variables, covariables, independent variables

23 Logistic regression (1) Table 2 Age and signs of coronary heart disease (CD)

24 How can we analyse these data? Compare mean age of diseased and non-diseased –Non-diseased: 38.6 years –Diseased: 58.7 years (p<0.0001) Linear regression?

25 Dot-plot: Data from Table 2

27 Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group

28 Dot-plot: Data from Table 3 [Figure: % diseased (y-axis) by age group (x-axis)]

29 Logistic function (1) [Figure: probability of disease (y-axis) as a function of x]

30 Transformation logit of P(y|x) –α = log odds of disease in unexposed –β = log odds ratio associated with being exposed –e^β = odds ratio
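
The equation on this slide did not survive extraction. A sketch of the standard form, assuming a single exposure variable x coded 1 (exposed) or 0 (unexposed), with α as the intercept and β as the coefficient of x:

```latex
\[
\begin{aligned}
P(y \mid x) &= \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \\[4pt]
\operatorname{logit} P(y \mid x) &= \ln\frac{P(y \mid x)}{1 - P(y \mid x)} = \alpha + \beta x \\[4pt]
\text{unexposed } (x = 0):\quad \ln(\text{odds}) &= \alpha \\
\text{exposed } (x = 1):\quad \ln(\text{odds}) &= \alpha + \beta \\[4pt]
\ln(\mathrm{OR}) &= (\alpha + \beta) - \alpha = \beta
\quad\Longrightarrow\quad \mathrm{OR} = e^{\beta}
\end{aligned}
\]
```

For the unstratified OC table (slide 2), an odds ratio of 4.8 corresponds to β = ln 4.8 ≈ 1.57.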

31 Fitting equation to the data Linear regression: Least squares Logistic regression: Maximum likelihood Likelihood function –Estimates parameters α and β –Practically easier to work with log-likelihood

32 Maximum likelihood Iterative computing –Choice of an arbitrary value for the coefficients (usually 0) –Computing of log-likelihood –Variation of coefficients' values –Reiteration until maximisation (plateau) Results –Maximum Likelihood Estimates (MLE) for α and β –Estimates of P(y) for a given value of x
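
The iterative procedure can be sketched on the unstratified OC table (slide 2). Assumptions: the slides do not say how the coefficients are varied at each step, so Newton-Raphson is used here as one common choice, and all function and variable names are illustrative.

```python
import math

# Grouped data from the unstratified OC table (slide 2): (x, MI cases, controls)
GROUPS = [(1, 693, 320), (0, 307, 680)]

def prob(a, b, x):
    """Logistic function: P(y = 1 | x) for intercept a and coefficient b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def log_likelihood(a, b):
    return sum(d * math.log(prob(a, b, x)) + h * math.log(1.0 - prob(a, b, x))
               for x, d, h in GROUPS)

def fit(max_iter=25, tol=1e-10):
    a = b = 0.0                          # arbitrary starting values
    history = [log_likelihood(a, b)]
    for _ in range(max_iter):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, d, h in GROUPS:
            n = d + h
            p = prob(a, b, x)
            resid = d - n * p            # observed minus expected cases
            w = n * p * (1.0 - p)
            g_a += resid                 # gradient of the log-likelihood
            g_b += resid * x
            i_aa += w                    # information matrix
            i_ab += w * x
            i_bb += w * x * x
        # Newton-Raphson step: solve the 2x2 system (information) * step = gradient
        det = i_aa * i_bb - i_ab ** 2
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a += step_a
        b += step_b
        history.append(log_likelihood(a, b))
        if max(abs(step_a), abs(step_b)) < tol:   # plateau reached
            break
    return a, b, history

alpha, beta, history = fit()
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, OR = {math.exp(beta):.2f}")
```

With a single dichotomous predictor, exp(beta) at convergence equals the crude cross-product odds ratio of the table, which gives a simple check on the fit.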

33 Multiple logistic regression More than one independent variable –Dichotomous, ordinal, nominal, continuous … Interpretation of β_i –Increase in log-odds for a one unit increase in x_i with all the other x_i constant –Measures association between x_i and log-odds adjusted for all other x_i

34 Statistical testing Question –Does model including given independent variable provide more information about dependent variable than model without this variable? Three tests –Likelihood ratio statistic (LRS) –Wald test –Score test

35 Likelihood ratio statistic Compares two nested models Log(odds) = α + β_1 x_1 + β_2 x_2 + β_3 x_3 (model 1) Log(odds) = α + β_1 x_1 + β_2 x_2 (model 2) LR statistic = -2 log (likelihood model 2 / likelihood model 1) = [-2 log (likelihood model 2)] minus [-2 log (likelihood model 1)] LR statistic is a χ² with DF = number of extra parameters in the larger model (model 1)
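
A worked instance of the LR statistic, sketched on the unstratified OC table (slide 2): the model with an OC term is compared with the model containing the constant only, so DF = 1. The variable names are illustrative. With one dichotomous predictor, the fitted probabilities of the larger model equal the observed proportions, so no iteration is needed here.

```python
import math

# Unstratified OC table (slide 2): exposure group -> (MI cases, controls)
TABLE = {"OC yes": (693, 320), "OC no": (307, 680)}

def binomial_loglik(cases, controls, p):
    """Log-likelihood of `cases` events and `controls` non-events at probability p."""
    return cases * math.log(p) + controls * math.log(1.0 - p)

total_cases = sum(d for d, _ in TABLE.values())
total_controls = sum(h for _, h in TABLE.values())

# Reduced model (constant only): one common probability for everyone
p_common = total_cases / (total_cases + total_controls)
ll_reduced = binomial_loglik(total_cases, total_controls, p_common)

# Larger model (constant + OC): fitted probability = observed proportion per group
ll_full = sum(binomial_loglik(d, h, d / (d + h)) for d, h in TABLE.values())

lr_statistic = (-2 * ll_reduced) - (-2 * ll_full)

# Upper tail of a chi-square with 1 DF: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr_statistic / 2))

print(f"LR statistic = {lr_statistic:.1f}, DF = 1, p = {p_value:.3g}")
```

The `erfc` shortcut for the p-value holds only for 1 DF; with more extra parameters a chi-square table or a statistics library would be needed.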

36 Coding of variables (2) Nominal variables or ordinal with unequal classes: –Tobacco smoked: no=0, grey=1, brown=2, blond=3 –Model assumes that OR for blond tobacco = (OR for grey tobacco)^3 –Use indicator variables (dummy variables)

37 Indicator variables: Type of tobacco Neutralises artificial hierarchy between classes in the variable "type of tobacco" No assumptions made 3 variables (3 df) in model using same reference OR for each type of tobacco adjusted for the others in reference to non-smoking
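
A minimal sketch of the indicator coding described above, with non-smoking as the reference class. The function name and the order of the indicators are illustrative choices, not from the slides.

```python
# One indicator per tobacco type; "no" (non-smoker) is the reference class
TOBACCO_LEVELS = ["grey", "brown", "blond"]

def indicator_variables(tobacco):
    """Return [is_grey, is_brown, is_blond]; a non-smoker is coded [0, 0, 0]."""
    if tobacco != "no" and tobacco not in TOBACCO_LEVELS:
        raise ValueError(f"unknown tobacco type: {tobacco!r}")
    return [1 if tobacco == level else 0 for level in TOBACCO_LEVELS]

for value in ["no", "grey", "brown", "blond"]:
    print(value, indicator_variables(value))
# prints:
# no [0, 0, 0]
# grey [1, 0, 0]
# brown [0, 1, 0]
# blond [0, 0, 1]
```

Each indicator gets its own coefficient, so the three odds ratios are estimated separately against non-smoking instead of being forced onto a 1-2-3 scale.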

38 Reference Hosmer DW, Lemeshow S. Applied logistic regression. Wiley & Sons, New York, 1989

39 Logistic regression Synthesis

40 Salmonella enteritidis [Diagram: protein supplement and S. Enteritidis gastroenteritis, with sex, floor, age, place of meal, and blended diet as the other variables]

41 Unconditional Logistic Regression
Term               Odds Ratio   95% C.I. Lower   95% C.I. Upper   Coef.     S.E.     Z-Statistic   P-Value
AGG (2/1)          1.6795       0.2634           10.7082          0.5185    0.9452   0.5486        0.5833
AGG (3/1)          1.7570       0.3249           9.5022           0.5636    0.8612   0.6545        0.5128
Blended (Yes/No)   1.0345       0.3277           3.2660           0.0339    0.5866   0.0578        0.9539
Floor (2/1)        1.6126       0.2675           9.7220           0.4778    0.9166   0.5213        0.6022
Floor (3/1)        0.7291       0.0991           5.3668           -0.3159   1.0185   -0.3102       0.7564
Floor (4/1)        1.1137       0.1573           7.8870           0.1076    0.9988   0.1078        0.9142
Meal               1.5942       0.4953           5.1317           0.4664    0.5965   0.7819        0.4343
Protein (Yes/No)   9.0918       3.0219           27.3533          2.2074    0.5620   3.9278        0.0001
Sex                1.3024       0.2278           7.4468           0.2642    0.8896   0.2970        0.7665
CONSTANT           *            *                *                -3.0080   2.0559   -1.4631       0.1434
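
The columns of this output are linked: the odds ratio is the exponential of the coefficient, the confidence limits come from the coefficient plus or minus 1.96 standard errors, and the Z statistic is the coefficient divided by its standard error. A sketch that rebuilds the Protein row from its coefficient and S.E.; the function name `wald_summary` is illustrative, and the two-sided normal p-value is an assumption about how the software computed it.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Odds ratio, 95% CI, Z statistic and two-sided p-value from coef and S.E."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return odds_ratio, lower, upper, z, p_value

# Protein (Yes/No) row of slide 41: Coef. = 2.2074, S.E. = 0.5620
odds_ratio, lower, upper, z, p_value = wald_summary(2.2074, 0.5620)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f} to {upper:.2f}, "
      f"Z = {z:.2f}, p = {p_value:.4f}")
```

The results agree with the tabulated row (OR 9.09, CI 3.02 to 27.35, Z 3.93, p 0.0001) up to rounding of the inputs.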

42 Unconditional Logistic Regression
Term               Odds Ratio   95% C.I. Lower   95% C.I. Upper   Coefficient   S.E.     Z-Statistic   P-Value
Age                1.0234       0.9660           1.0842           0.0231        0.0294   0.7848        0.4326
Blended (Yes/No)   1.0184       0.3220           3.2207           0.0183        0.5874   0.0311        0.9752
Floor (2/1)        1.6440       0.2745           9.8468           0.4971        0.9133   0.5443        0.5862
Floor (3/1)        0.7132       0.0972           5.2321           -0.3379       1.0167   -0.3324       0.7396
Floor (4/1)        1.0708       0.1522           7.5322           0.0684        0.9953   0.0687        0.9452
Meal               1.6561       0.5236           5.2379           0.5045        0.5875   0.8587        0.3905
Protein (Yes/No)   8.7678       2.9521           26.0403          2.1711        0.5554   3.9091        0.0001
Sex                1.1957       0.2135           6.6981           0.1787        0.8791   0.2033        0.8389
CONSTANT           *            *                *                -4.2896       2.8908   -1.4839       0.1378

43 Logistic Regression Model
Summary Statistics      Value      DF   p-value
Deviance                107.9814   95
Likelihood ratio test   34.8068    8    < 0.001
Parameter Estimates
Terms          Coefficient   Std.Error   p-value   OR       95% C.I. Lower   95% C.I. Upper
%GM            -1.8857       1.0420      0.0703    0.1517   0.0197           1.1695
SEX ='2'       0.2139        0.8812      0.8082    1.2385   0.2202           6.9662
FLOOR ='2'     0.4987        0.9083      0.5829    1.6466   0.2776           9.7659
FLOOR ='3'     -0.3235       1.0150      0.7500    0.7236   0.0990           5.2909
FLOOR ='4'     0.1088        0.9839      0.9119    1.1150   0.1621           7.6698
MEAL ='2'      0.5308        0.5613      0.3443    1.7002   0.5659           5.1081
Protein ='1'   2.1809        0.5303      < 0.001   8.8541   3.1316           25.034
TWOAGG ='2'    0.1904        0.5162      0.7122    1.2098   0.4399           3.3272
Termwise Wald Test
Term    Wald Stat.   DF   p-value
FLOOR   1.0812       3    0.7816

44 Poisson Regression Model
Summary Statistics      Value     DF   p-value
Deviance                60.2622   95
Likelihood ratio test   67.7378   8    < 0.001
Parameter Estimates
Terms          Coefficient   Std.Error   p-value   RR       95% C.I. Lower   95% C.I. Upper
%GM            -1.8213       0.8446      0.0310    0.1618   0.0309           0.8471
SEX ='2'       0.1295        0.7106      0.8554    1.1383   0.2827           4.5828
FLOOR ='2'     0.2503        0.6867      0.7154    1.2844   0.3344           4.9343
FLOOR ='3'     -0.1422       0.8032      0.8595    0.8674   0.1797           4.1877
FLOOR ='4'     0.1368        0.7263      0.8506    1.1466   0.2761           4.7608
MEAL ='2'      0.2373        0.3854      0.5381    1.2678   0.5956           2.6987
Protein ='1'   1.0658        0.3413      0.0018    2.9032   1.4871           5.6679
TWOAGG ='2'    0.0645        0.3682      0.8611    1.0666   0.5182           2.1951
Termwise Wald Test
Term    Wald Stat.   DF   p-value
FLOOR   0.4178       3    0.9365

45 Cox Proportional Hazards
Term              Hazard Ratio   95% C.I. Lower   95% C.I. Upper   Coefficient   S.E.     Z-Statistic   P-Value
_AGG (2/1)        1.0666         0.5183           2.195            0.0645        0.3682   0.175         0.8611
Floor (2/1)       1.2844         0.3344           4.9342           0.2503        0.6867   0.3646        0.7154
Floor (3/1)       0.8674         0.1797           4.1876           -0.1422       0.8032   -0.177        0.8595
Floor (4/1)       1.1466         0.2761           4.7607           0.1368        0.7263   0.1883        0.8506
Meal (2/1)        1.2678         0.5957           2.6986           0.2373        0.3854   0.6157        0.5381
Protein (Yes/No)  2.9032         1.4871           5.6678           1.0658        0.3413   3.1225        0.0018
Sex (2/1)         1.1383         0.2827           4.5827           0.1295        0.7106   0.1822        0.8554
Convergence: Converged
Iterations: 5
-2 * Log-Likelihood: 346.0200
Test               Statistic   D.F.   P-Value
Score              17.1727     7      0.0163
Likelihood Ratio   15.4889     7      0.0302

