Multiple logistic regression



Presentation on theme: "Multiple logistic regression"— Presentation transcript:

1 Multiple logistic regression
Many of the same principles (confounding, multicollinearity, ...) apply. Again we are adding other covariates (more 'x's) to the model: ln[Y/(1−Y)] = a + b1x1 + b2x2 + ... + bixi. [Remember, the equation for a simple logistic regression has one 'x' and looks like this: ln[Y/(1−Y)] = a + bX. Compare it to the equation above, which includes additional 'x's.]
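The model above can be sketched in a few lines of Python. The coefficient values here are made up purely for illustration, and the function names are my own:

```python
import math

def log_odds(a, bs, xs):
    """Linear predictor: a + b1*x1 + ... + bi*xi (the right-hand side above)."""
    return a + sum(b * x for b, x in zip(bs, xs))

def probability(a, bs, xs):
    """Invert ln[Y/(1-Y)] to get the predicted probability Y."""
    z = log_odds(a, bs, xs)
    return 1 / (1 + math.exp(-z))

# Illustrative coefficients only (not from any fitted model):
p = probability(a=-1.0, bs=[0.5, -0.2], xs=[2.0, 3.0])
print(round(p, 3))  # ~0.354
```

Note that the model is linear in the log-odds, which is why the coefficients add; the probability itself is a nonlinear (logistic) function of the predictors.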

2 Example: Is smoking associated with depression?
But I also know females might be more depressed, and there are age differences, so I want to control for them too.
DV: depression, coded 0 = no depression and 1 = depressed (remember, in logistic regression the DV must be nominal).
IVs:
- smoking (smokerec: yes = 1 / no = 0; smokers will be compared to the reference, nonsmokers, who are coded 0)
- age (continuous)
- gender (0 = female and 1 = male; males will be compared to the reference, females, who are coded 0)
ln[Y/(1−Y)] = a + b1x1 + b2x2 + ... + bixi becomes:
ln[depression/(1−depression)] = a + b1(smoking) + b2(age) + b3(gender)

3 This is the kind of information SPSS provides:
The model explains only 1.8%–2.4% of the variation in depression. Cox & Snell R Square and Nagelkerke R Square give an indication of the variation in the dependent variable explained by the model. These are pseudo R-squares. Logistic regression does not have an equivalent to the R-squared found in OLS regression; however, many people have tried to come up with one. There is a wide variety of pseudo-R-square statistics (these are only two of them). Because these statistics do not mean what R-squared means in OLS regression (the proportion of variance explained by the predictors), interpret them with great caution.
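Both pseudo R-squares can be computed from the log-likelihoods of the null (intercept-only) and fitted models. A sketch, with hypothetical log-likelihood values chosen only to land near the slide's range:

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell pseudo R-square from log-likelihoods; its maximum is below 1."""
    return 1 - math.exp((2 / n) * (ll_null - ll_model))

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke rescales Cox & Snell so the maximum attainable value is 1."""
    max_r2 = 1 - math.exp((2 / n) * ll_null)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical values: null LL = -600, model LL = -591, n = 1000
print(round(cox_snell_r2(-600, -591, 1000), 3))   # ~0.018
print(round(nagelkerke_r2(-600, -591, 1000), 3))  # ~0.026
```

Nagelkerke is always at least as large as Cox & Snell, which matches the 1.8% vs. 2.4% pattern in the output.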

Model fit information. The Hosmer and Lemeshow goodness-of-fit test divides subjects into deciles based on predicted probabilities, then computes a chi-square from the observed and expected frequencies. If the test's significance value is .05 or less, we reject the null hypothesis that there is no difference between the observed and predicted values of the dependent variable; if it is greater, as we want, we fail to reject that null hypothesis, implying that the model's estimates fit the data at an acceptable level.
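A minimal stdlib-only sketch of the decile computation (the function name is my own, and it assumes every group has nonzero expected counts):

```python
def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Hosmer-Lemeshow statistic: split cases into groups by predicted risk,
    then sum chi-square contributions for events and non-events in each group.
    Compare the statistic to a chi-square distribution with df = groups - 2."""
    pairs = sorted(zip(p_pred, y_true))        # order cases by predicted probability
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        obs_events = sum(y for _, y in chunk)
        exp_events = sum(p for p, _ in chunk)  # expected events = sum of probabilities
        obs_non = len(chunk) - obs_events
        exp_non = len(chunk) - exp_events
        chi2 += (obs_events - exp_events) ** 2 / exp_events
        chi2 += (obs_non - exp_non) ** 2 / exp_non
    return chi2, groups - 2
```

In practice the p-value would then come from the chi-square distribution with df = groups − 2; a large p-value (above .05) is the desired "good fit" result.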

The Wald statistic is a multivariate form of chi-square: the ratio of the logistic coefficient B to its standard error S.E., squared. SPSS output:

Variables in the Equation (Step 1a)
            B      S.E.    Wald   df   Sig.   Exp(B)   95% C.I. for Exp(B)
                                                        Lower    Upper
gender     .124   .163     .580    1   .446   1.132      .823    1.556
age        .000   .006     .001    1   .976   1.000      .988    1.012
smokerec   .868   .257   11.436    1   .001   2.382     1.440    3.938
Constant  -.235   .262     .805    1   .370    .791
a. Variable(s) entered on step 1: gender, age, smokerec.

B - These are the values for the logistic regression equation for predicting the dependent variable from the independent variables; they are in log-odds units. Wald and Sig. - These columns provide the Wald chi-square value and the 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0. Exp(B) - These are the odds ratios for the predictors; they are the exponentiation of the coefficients.
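The Wald and Exp(B) columns can be reproduced from B and S.E. alone. Using the smoking row from the table (the small difference from the printed 11.436 is due to rounding of B and S.E.):

```python
import math

def wald_row(b, se):
    """Wald chi-square (B/S.E.)^2 and the odds ratio Exp(B) for one coefficient."""
    return (b / se) ** 2, math.exp(b)

# Smoking row from the output: B = .868, S.E. = .257
wald, exp_b = wald_row(0.868, 0.257)
print(round(wald, 2), round(exp_b, 2))  # ~11.41 and 2.38
```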

6 Remember our main Q: Smoking → depression?
Variables in the Equation (Step 1a)
            B      S.E.    Wald   df   Sig.   Exp(B)   95% C.I. for Exp(B)
                                                        Lower    Upper
gender     .124   .163     .580    1   .446   1.132      .823    1.556
age        .000   .006     .001    1   .976   1.000      .988    1.012
smokerec   .868   .257   11.436    1   .001   2.382     1.440    3.938
Constant  -.235   .262     .805    1   .370    .791
a. Variable(s) entered on step 1: gender, age, smokerec.

Smoking is the only variable that has a significant relationship with depression (OR = 2.38; 95% CI: 1.44, 3.94; p = .001); the 95% CI does not trap 1.0. People who smoke are 2.4 times more likely to be depressed than people who do not smoke (the referent), holding age and gender constant. Note that neither age nor gender has a significant association with depression. You can also write an equation, as in multiple regression: ln[depression/(1−depression)] = a + b1(smoking) + b2(age) + b3(gender), which with the coefficients above becomes ln[depression/(1−depression)] = −.235 + .868(smoke) + .000(age) + .124(gender).
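The fitted equation can be turned into predicted probabilities. The coefficients below come from the output above; the example person (a 40-year-old female) is hypothetical:

```python
import math

# Coefficients from the SPSS output above
A, B_SMOKE, B_AGE, B_GENDER = -0.235, 0.868, 0.000, 0.124

def p_depressed(smoke, age, gender):
    """Predicted probability of depression from the fitted log-odds equation."""
    log_odds = A + B_SMOKE * smoke + B_AGE * age + B_GENDER * gender
    return 1 / (1 + math.exp(-log_odds))

# A 40-year-old female smoker vs. non-smoker:
print(round(p_depressed(1, 40, 0), 3))  # ~0.653
print(round(p_depressed(0, 40, 0), 3))  # ~0.442
```

Since B for age is .000, age has essentially no effect on the prediction, consistent with its non-significant Wald test.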

7 Another logistic regression example
Research Question: What are the clinical factors that predict the chances of death? A researcher conducted a study to determine which clinical variables predict the chances of death. The sample consisted of 4185 patients with cardiac disease who participated in a 5-year study. The outcome variable was whether the patient died during the study, coded as 0 = patient did not die and 1 = patient died during the study. The predictors included clinical variables (BUN (Blood Urea Nitrogen), creatinine, NYHA class (New York Heart Association classification of heart failure), LVEF (Left Ventricular Ejection Fraction)) and depression.

Sample size of 4185. The R2 = .13 means that 13% of the variance in the dependent variable (death) is explained by the logistic regression model. In the above table, the B values are equivalent to the B values in multiple regression. These values are used to calculate the probability of a case falling into a particular category (death or no death). You should check the sign associated with each value; the sign tells you the direction of the relationship (which factors increase the likelihood of a "yes" answer and which factors decrease it). For example, a negative B value indicates that an increase in the independent variable score will result in a decreased probability of the case recording a score of 1 on the dependent variable (in this study, a score of 1 indicates a patient died during the study). The SE(B) is the standard error of B and is the same as in multiple regression.

9 The odds ratio is referred to as the Exp (B).
The values in the odds ratio column are the odds ratios for the independent variables in the model. These are interpreted for each of the significant predictors in the model. An odds ratio that is less than 1.0 will have a negative sign associated with the B value, which means that the chance of the outcome occurring is reduced. For example, LVEF has an odds ratio of .95, which would be interpreted as follows: for each unit increase in LVEF, the patient is 5% less likely to die during the study. The 5% comes from subtracting .95 from 1.0 (1.0 − .95 = .05); 1.0 represents equal odds, which is why we subtract from that value. If the odds ratio reported is greater than 1.0, then the chance of the outcome occurring is increased. For example, the odds ratio for creatinine is 1.81; this would be interpreted as follows: for each unit increase in creatinine level, the patient is 81% more likely to die. So, clinically this value is an important predictor to monitor in cardiac patients. If we were to interpret BUN, it would be stated as follows: for each unit increase in BUN, the patient is 2% more likely to die. Another way to state this might be: for each unit increase in BUN, the patient has a 2% increased risk of death. Both ways are correct!
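The subtract-from-1.0 rule is simple arithmetic; a tiny helper (name my own) makes the sign convention explicit:

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in the odds per one-unit increase in the predictor,
    relative to equal odds (1.0): negative = protective, positive = more risk."""
    return (odds_ratio - 1.0) * 100

print(round(pct_change_in_odds(0.95), 1))  # LVEF: -5.0 (5% less likely)
print(round(pct_change_in_odds(1.81), 1))  # creatinine: 81.0 (81% more likely)
```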

10 The final two columns are 95% confidence intervals, which report both lower and upper limits. When interpreting confidence intervals in logistic regression, it is important to note that if the confidence interval crosses 1.0 (equal odds), the p-value will not be significant. For example, the confidence interval for depression ranges from .99 to 1.01 and traps the value of 1.0 in that range. This indicates that we could not rule out the possibility that the true odds ratio was 1.0, indicating equal odds and no difference between the outcome categories (did not die or died). Depression also has a significance value of .895 (which is greater than .05), which means that depression is not a significant predictor in the model. So, you can determine significance by looking at either the significance value or the confidence interval. The confidence interval for NYHA has a lower limit of 1.43 and does not contain the value of 1.0; therefore this result is statistically significant (p < .05). Since this result is significant, it would be interpreted as follows: for each unit increase in NYHA level, the patient is 59% more likely to die. This makes sense because there are 4 levels in the NYHA classification, and as the levels increase in number the patient is more ill and at a higher risk of death.
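The confidence interval itself is computed on the log-odds scale and then exponentiated. A sketch, checked against the smoking row of the earlier depression output (B = .868, S.E. = .257), which reproduces the printed limits up to rounding:

```python
import math

def or_ci(b, se, z=1.96):
    """95% confidence interval for an odds ratio: exponentiate B +/- z * S.E."""
    return math.exp(b - z * se), math.exp(b + z * se)

lo, hi = or_ci(0.868, 0.257)
print(round(lo, 2), round(hi, 2))  # ~1.44 3.94; the interval excludes 1.0, so significant
```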

11 Values used if calculating the equation
If you had this kind of information, could you determine what kind of regression was run and what the equation would be? The giveaway that this is logistic regression is the presence of exponentiated values (Exp(B), the odds ratios); the B values are the values used if calculating the equation.

11 How do you know what test to perform?
How do you know what test to perform? Imagine you had measured the cholesterol level in the blood of a large number of women over 54 years old, then followed them up 10 years later to see who had had a heart attack. 1. Looking at the types of variables you have, you can narrow down what kinds of tests to run: you have one nominal variable (heart attack yes/no) and one interval/ratio variable (cholesterol level), so you could use a t-test. Your outcome is nominal (heart attack) and your predictor is interval/ratio, so you could also use a regression, in this case logistic regression. 2. But now, what do you really want to learn? If the hypothesis you are interested in is whether there are differences in cholesterol between the two outcome groups, you could do a t-test comparing the cholesterol levels of the women who had heart attacks vs. those who didn't, and that would be a perfectly reasonable way to test the null hypothesis that cholesterol level is not associated with heart attacks. However, if you wanted to predict the probability that a woman aged 55 or more with a particular cholesterol level would have a heart attack in the next ten years, so that you could tell patients "If you reduce your cholesterol by 40 points, you'll reduce your risk of heart attack by X percent," you would have to use logistic regression. Also note this approach would allow you to control for possible confounders like blood pressure.
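The t-test option from step 1 is easy to sketch from scratch. The cholesterol values below are entirely made up, and the function name is my own; this computes Welch's t statistic, the version that does not assume equal variances:

```python
import math

# Hypothetical cholesterol levels (mg/dL) in the two outcome groups
heart_attack = [245.0, 260.0, 230.0, 255.0]
no_attack = [210.0, 225.0, 205.0, 220.0]

def welch_t(a, b):
    """Welch's t statistic comparing the two group means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

print(round(welch_t(heart_attack, no_attack), 2))  # ~4.04
```

A t-test answers "do the groups differ?"; only the logistic regression gives a per-patient predicted probability.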

12 Imagine that you had measured the cholesterol level in the blood of a large number of women over 54 years old, then followed up ten years later to see who had had a heart attack. Holding other variables constant (age, BP, height, weight), for every one-point change in cholesterol there is a 0.8% increased chance of having a heart attack (make sure you know the right place in the output to read the 0.8% from). We usually say "for every 1-point increase, a 0.8% increase," but you can also say it another way: a 40-point change = 40 × .8 = 32%. So if she reduces her cholesterol by 40 points, this will reduce her risk of a heart attack by roughly one third.
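The slide's arithmetic can be written out directly. Note that the 40 × .8 figure is a linear approximation; strictly, per-unit odds ratios compound multiplicatively, which gives a somewhat larger number:

```python
PER_POINT_PCT = 0.8  # per the slide: 0.8% change per one-point change in cholesterol

def linear_risk_change(points):
    """The slide's back-of-the-envelope version: percent change scales linearly."""
    return points * PER_POINT_PCT

def compounded_risk_change(points, per_point_or=1.008):
    """Strictly, a per-unit odds ratio compounds: OR ** points."""
    return (per_point_or ** points - 1) * 100

print(linear_risk_change(40))                 # 32.0 (the slide's "roughly one third")
print(round(compounded_risk_change(40), 1))   # ~37.5, the multiplicative version
```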

