Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren, Viviane Bremer.


1 Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren, Viviane Bremer

2 Objectives
When do we need to use logistic regression?
Principles of logistic regression
Uses of logistic regression
What to keep in mind

3 Chlamorea
Sexually transmitted infection
– Virus recently identified
– Leads to general rash, blushing, pimples and a feeling of shame
– Increasing prevalence with age
– Risk factors unknown so far

4 Case control study
Population of Berlin
150 cases, 150 controls
Hypothesis: Consistent use of condoms protects against chlamorea
Questionnaire with questions on demographic characteristics and sexual behaviour
Analysis: OR, t-test

5 Results bivariate analysis
                              Cases (n=150)   Controls (n=150)   Odds ratio
Used condoms at last sex      40              90                 0.17
Did not use condoms           110             60                 Ref

6 Results bivariate analysis
                              Cases (n=150)   Controls (n=150)   Odds ratio
Single                        125             50                 4.7
Currently in a relationship   25              100                Ref

7 Results bivariate analysis
                                  Cases (n=150)   Controls (n=150)   T-test
Nr of partners during last year   4               2                  p=0.001
Mean age in years                 39              26                 p=0.001
Confounding?

8 Stratification
Chlamorea and condom use: crude OR (2×2 table a, b, c, d) compared with stratum-specific ORs (OR 1, OR 2, … OR i, each from its own table a i, b i, c i, d i), stratified by:
– Single status
– Age group
– Number of partners

9 Let's go one step back

10 Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women

11 [Figure: scatter plot of SBP (mm Hg) against age (years); adapted from Colton T. Statistics in Medicine. Boston: Little, Brown, 1974]

12 Simple linear regression
Relation between 2 continuous variables (SBP and age): $y = \alpha + \beta_1 x$
Regression coefficient $\beta_1$ (slope)
– Measures association between y and x
– Amount by which y changes on average when x changes by one unit
– Estimated by the least squares method
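The least squares estimate of the slope can be sketched in a few lines. This is a minimal illustration, assuming the model y = α + β1·x; the ages and SBP values are made up for the example and are not the 33 women of Table 1, and the function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (alpha, beta1) minimising the sum of squared residuals
    for the line y = alpha + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                 # slope: change in y per unit of x
    alpha = mean_y - beta1 * mean_x   # intercept
    return alpha, beta1


# Hypothetical data (assumption): age in years, SBP in mm Hg
age = [25, 35, 45, 55, 65]
sbp = [115, 122, 131, 138, 149]

alpha, beta1 = least_squares(age, sbp)
print(f"SBP = {alpha:.1f} + {beta1:.2f} * age")
```

With these invented numbers, the slope reads as the average rise in SBP (mm Hg) per additional year of age.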

13 What if we have more than one independent variable?

14 Multiple risk factors
Objective: To attribute to each risk factor the respective effect (RR) it has on the occurrence of disease.

15 Types of multivariable analysis
Multiple models
– Linear regression
– Logistic regression
– Cox model
– Poisson regression
– Loglinear model
– Discriminant analysis…
Choice of the tool according to objectives, study design and variables

16 Multiple linear regression
Relation between a continuous variable y and a set of i variables $x_i$
Partial regression coefficients $\beta_i$
– Amount by which y changes when $x_i$ changes by one unit and all the other $x_i$ remain constant
– Measures association between $x_i$ and y adjusted for all other $x_i$
Example
– Number of partners in relation to age & income

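17 Multiple linear regression
$y\ (\text{number of partners}) = \alpha + \beta_1\,\text{age} + \beta_2\,\text{income} + \beta_3\,\text{gender}$
Names for y: predicted, response variable, outcome variable, dependent variable
Names for age, income, gender: predictor variables, explanatory variables, covariables, independent variables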

18 What if our outcome variable is dichotomous?

19 Logistic regression (1) Table 2 Age and chlamorea

20 How can we analyse these data?
Compare mean age of diseased and non-diseased
– Non-diseased: 26 years
– Diseased: 39 years (p=0.0001)
Linear regression?

21 [Figure: dot-plot of the data from Table 2; y-axis: presence of chlamorea]

22 Logistic regression (2) Table 3 Prevalence (%) of chlamorea according to age group

23 [Figure: dot-plot of the data from Table 3; y-axis: diseased (%), x-axis: age group]

24 Logistic function (1)
[Figure: logistic curve; y-axis: probability of disease, x-axis: x]

25 Logistic function
Logistic regression models the logit of the outcome
= natural logarithm of the odds of the outcome
$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$
where p is the probability of the outcome and 1-p the probability of not having the outcome

26 Logistic function
$\ln\left(\frac{p}{1-p}\right) = \alpha + \beta x$
$\alpha$ = log odds of disease in unexposed
$\beta$ = log odds ratio associated with being exposed
$e^{\beta}$ = odds ratio
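A short sketch may help connect the logit, the probability and the odds ratio. It takes α as the log odds of disease in the unexposed and β as the log odds ratio for exposure (x = 1 exposed, x = 0 unexposed). The 10% baseline risk and the OR of 3 are hypothetical values chosen only for illustration.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Logistic function: maps a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical values (assumption): 10% of the unexposed are diseased, OR = 3
p_unexposed = 0.10
alpha = logit(p_unexposed)   # log odds in the unexposed
beta = math.log(3.0)         # log odds ratio for exposure

# Probability of disease in the exposed (x = 1)
p_exposed = inv_logit(alpha + beta * 1)

# The ratio of the two odds recovers e**beta
odds_exposed = p_exposed / (1 - p_exposed)
odds_unexposed = p_unexposed / (1 - p_unexposed)
odds_ratio = odds_exposed / odds_unexposed

print(p_exposed, odds_ratio, math.exp(beta))
```

The model is linear on the log-odds scale, so the effect of exposure on the probability itself depends on the baseline risk, while the odds ratio stays constant.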

27 Multiple logistic regression
More than one independent variable
– Dichotomous, ordinal, nominal, continuous …
Interpretation of $\beta_i$
– Increase in log-odds for a one unit increase in $x_i$ with all the other $x_i$ constant
– Measures association between $x_i$ and log-odds adjusted for all other $x_i$

28 Uses of multivariable analysis
Etiologic models
– Identify risk factors adjusted for confounders
– Adjust for differences in baseline characteristics
Predictive models
– Determine diagnosis
– Determine prognosis

29 Fitting equation to the data
Linear regression:
– Least squares
Logistic regression:
– Maximum likelihood
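Maximum likelihood has no closed-form solution for logistic regression in general, so it is fitted iteratively. The sketch below uses Newton-Raphson, one common choice that the slides do not name, on a single binary exposure. As an illustration it takes partner-number classes 1 and 2 from slide 35 (20 cases and 40 controls versus 22 cases and 30 controls); the function name `fit_logistic` and the grouped data layout are assumptions made for this example.

```python
import math


def fit_logistic(groups, iterations=25):
    """Fit logit(p) = alpha + beta * x by Newton-Raphson.

    groups: list of (x, cases, total) tuples of grouped binary data.
    Returns the maximum likelihood estimates (alpha, beta).
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, cases, total in groups:
            p = 1 / (1 + math.exp(-(alpha + beta * x)))
            residual = cases - total * p      # observed minus expected cases
            weight = total * p * (1 - p)      # binomial variance
            # Gradient of the log-likelihood
            g0 += residual
            g1 += x * residual
            # Information matrix (negative Hessian)
            h00 += weight
            h01 += x * weight
            h11 += x * x * weight
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * delta = gradient
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# x = 0: class 1 (reference), 20 cases among 60; x = 1: class 2, 22 cases among 52
alpha, beta = fit_logistic([(0, 20, 60), (1, 22, 52)])
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, OR = {math.exp(beta):.2f}")
```

With one binary exposure the fitted e^β coincides with the odds ratio computed directly from the 2×2 table, which is a convenient check on the fit.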

30 Elaborating $e^{\beta}$
$e^{\beta}$ = OR
What if the independent variable is continuous?
– What's the effect of a change in x by more than one unit?

31 The Q fever example
Distance to farm as independent continuous variable, counted in meters
– $\beta$ in logistic regression was -0.00050013 and statistically significant
OR for each 1 meter distance is 0.9995
– Too small to use
What's the OR for every 1000 meters?
– $e^{1000\beta} = e^{-1000 \times 0.00050013} \approx 0.6064$
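The rescaling in the Q fever example is a one-line calculation. The sketch below reproduces it from the coefficient quoted on the slide; the variable names are chosen for this example.

```python
import math

# Coefficient per metre of distance to the farm, as quoted on the slide
beta_per_metre = -0.00050013

# OR for a 1 metre increase in distance
or_per_metre = math.exp(beta_per_metre)

# OR for a 1000 metre increase: multiply beta by 1000 before exponentiating
or_per_km = math.exp(1000 * beta_per_metre)

print(f"OR per metre:      {or_per_metre:.4f}")
print(f"OR per 1000 metre: {or_per_km:.4f}")
```

Exponentiating 1000·β is equivalent to raising the per-metre OR to the power 1000, which is why an OR that looks negligible per metre becomes substantial per kilometre.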

32 Continuous variables
Increase in OR for a one unit change in exposure variable
Logistic model is multiplicative: OR increases exponentially with x
– If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 × 2 × 2 = $2^3$ = 8
Verify if OR increases exponentially with x
– When in doubt, treat as qualitative variable

33 Coding of variables (2)
Nominal variables or ordinal with unequal classes:
– Preferred hair colour of partners:
  » No hair=0, grey=1, brown=2, blond=3
– Model assumes that OR for blond partners = $(\text{OR for grey-haired partners})^3$
– Use indicator variables (dummy variables)

34 Indicator variables: Hair colour
Neutralises artificial hierarchy between classes in the variable "hair colour of partners"
– No assumptions made
3 variables in model using same reference
OR for each type of hair adjusted for the others, in reference to no hair
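Indicator coding can be sketched as follows, assuming the hair colour codes from the previous slide (no hair=0, grey=1, brown=2, blond=3) with "no hair" as the reference. The function and variable names are hypothetical.

```python
# Codes as given for "preferred hair colour of partners"
HAIR_CODES = {0: "no hair", 1: "grey", 2: "brown", 3: "blond"}


def indicator_variables(code, reference=0):
    """Turn one nominal code into 0/1 indicator variables.

    The reference class gets no variable of its own: it is the class
    for which all indicators are 0.
    """
    return {
        f"hair_{level}": int(code == level)
        for level in sorted(HAIR_CODES)
        if level != reference
    }


for code, label in HAIR_CODES.items():
    print(label, indicator_variables(code))
```

Each indicator then gets its own coefficient in the model, so the OR for blond versus no hair is estimated freely rather than being forced to equal the cube of the OR for grey.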

35 Classes
Relationship between number of partners during last year and chlamorea
– Code number of partners: 0-1 = 1, 2-3 = 2, 4-5 = 3
Code nr partners   Cases   Controls   OR
1                  20      40         1.0
2                  22      30         1.5
3                  12      11         2.2
Compatible with assumption of multiplicative model ($1.5^2 \approx 2.2$)
– If not compatible, use indicator variables
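The compatibility check can be reproduced from the counts on this slide. The sketch below computes each class's OR against class 1 and compares the OR for class 3 with the square of the OR for class 2, which is what a multiplicative model with a single coded variable would imply. The variable names are chosen for this example.

```python
# (cases, controls) per partner-number class, as given on the slide
classes = {1: (20, 40), 2: (22, 30), 3: (12, 11)}

ref_cases, ref_controls = classes[1]
ref_odds = ref_cases / ref_controls

# OR of each class relative to class 1
odds_ratios = {
    code: (cases / controls) / ref_odds
    for code, (cases, controls) in classes.items()
}

# Under a multiplicative model, two steps up should square the one-step OR
expected_or3 = odds_ratios[2] ** 2

for code, value in odds_ratios.items():
    print(f"class {code}: OR = {value:.2f}")
print(f"OR(class 2) squared = {expected_or3:.2f}")
```

If the squared one-step OR were far from the observed OR for class 3, that would argue for indicator variables instead of a single coded variable.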

36 Risk factors for Chlamorea
[Diagram: candidate risk factors linked to chlamorea]
– No condom use
– Sex
– Hair colour
– Age group
– Single
– Visiting bars
– Number of partners

37 Unconditional Logistic Regression
Term                Odds Ratio   95% C.I.             Coef.     S.E.     Z-Statistic   P-Value
# partners          1.2664       0.2634    10.7082    0.2362    0.9452   0.5486        0.5833
Single (Yes/No)     1.0345       0.3277    3.2660     0.0339    0.5866   0.0578        0.9539
Hair colour (1/0)   1.6126       0.2675    9.7220     0.4778    0.9166   0.5213        0.6022
Hair colour (2/0)   0.7291       0.0991    5.3668     -0.3159   1.0185   -0.3102       0.7564
Hair colour (3/0)   1.1137       0.1573    7.8870     0.1076    0.9988   0.1078        0.9142
Visiting bars       1.5942       0.4953    5.1317     0.4664    0.5965   0.7819        0.4343
Used no Condoms     9.0918       3.0219    27.3533    2.2074    0.5620   3.9278        0.0001
Sex (f/m)           1.3024       0.2278    7.4468     0.2642    0.8896   0.2970        0.7665
CONSTANT            *            *         *          -3.0080   2.0559   -1.4631       0.1434
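The columns of this output are linked: the odds ratio is e^Coef., the Z-statistic is Coef./S.E., and the 95% confidence limits are e^(Coef. ± 1.96·S.E.). The sketch below checks this for the "Used no Condoms" row (Coef. 2.2074, S.E. 0.5620, reading the decimal commas as decimal points). It assumes a Wald-type interval, which matches the reported figures, and the function name is hypothetical.

```python
import math


def or_with_ci(coef, se, z=1.96):
    """Odds ratio and Wald 95% confidence limits from a logistic
    regression coefficient and its standard error."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return odds_ratio, lower, upper


# "Used no Condoms" row of the output
coef, se = 2.2074, 0.5620
odds_ratio, lower, upper = or_with_ci(coef, se)
z_statistic = coef / se

print(f"OR = {odds_ratio:.4f}, 95% CI {lower:.4f} to {upper:.4f}, Z = {z_statistic:.4f}")
```

An interval that excludes 1, as here, corresponds to a Z-statistic beyond ±1.96 and a p-value below 0.05.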

38 Last but not least

39 Why do we need multivariable analysis?
Our real world is multivariable
Multivariable analysis is a tool to determine the relative contribution of all factors

40 Sequence of analysis
Descriptive analysis
– Know your dataset
Bivariate analysis
– Identify associations
Stratified analysis
– Confounding and effect modifiers
Multivariable analysis
– Control for confounding

41 What can go wrong
Small sample size and too few cases
Wrong coding
Skewed distribution of independent variables
– Empty subgroups
Collinearity
– Independent variables express the same information

42 Do not forget
Rubbish in, rubbish out
Check for confounders first
Number of subjects >> variables in the model
Keep the model simple
– Statisticians can help with the model but you need to understand the interpretation
You will need several attempts to find the best model

43 If in doubt… Really call a statistician !!!!

44 References
Norman GR, Streiner DL. Biostatistics: The Bare Essentials. BC Decker, London, 2000
Hosmer DW, Lemeshow S. Applied Logistic Regression. Wiley & Sons, New York, 1989
Katz MH. Multivariable Analysis. Cambridge University Press, 2006

