Multiple logistic regression


Multiple logistic regression

Many of the same principles (confounding, multicollinearity, ...) apply. We are again adding other covariates (more 'x's) to the model:

ln[Y/(1−Y)] = a + b1x1 + b2x2 + ... + bixi

[Remember that the equation for a simple logistic regression has one 'x' and looks like this: ln[Y/(1−Y)] = a + bX. Compare it to the equation above, which includes additional 'x's.]
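Since the left-hand side is a log-odds, converting a fitted value back to a probability means inverting the logit. A minimal sketch in Python (the function name is mine, chosen for illustration):

```python
import math

def inv_logit(log_odds):
    # Y = e^z / (1 + e^z), the inverse of ln[Y/(1 - Y)]
    return math.exp(log_odds) / (1 + math.exp(log_odds))

print(inv_logit(0.0))  # 0.5: a log-odds of 0 means a 50/50 probability
```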

Example: Is smoking associated with depression? But I also know females might be more depressed, and there are age differences, so I want to control for those too.

DV: depression, coded 0 = not depressed and 1 = depressed (remember, in logistic regression the DV must be nominal).

IVs:
- Smoking (smokerec: yes = 1 / no = 0). Smokers will be compared to the reference group, nonsmokers, who are coded 0.
- Age (continuous).
- Gender (0 = female and 1 = male). Males will be compared to the reference group, females, who are coded 0.

ln[Y/(1−Y)] = a + b1x1 + b2x2 + ... + bixi
ln[depression/(1−depression)] = a + b1(smoking) + b2(age) + b3(gender)
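The slides use SPSS, but as a point of reference, here is a minimal sketch of how the same model could be fit in Python with statsmodels (the file name and data set are hypothetical; only the variable coding comes from the slide):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set using the slide's coding: depressed (1 = depressed),
# smokerec (1 = smoker), age (continuous), gender (1 = male)
df = pd.read_csv("depression_study.csv")

model = smf.logit("depressed ~ smokerec + age + gender", data=df).fit()
print(model.summary())           # B, S.E., Wald z, and p-values
print(np.exp(model.params))      # Exp(B): the odds ratios
print(np.exp(model.conf_int()))  # 95% C.I. for Exp(B)
```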

This is the kind of information SPSS provides:

Cox & Snell R Square and Nagelkerke R Square – indications of the variation in the dependent variable explained by the model. Here the model only explains 1.8%–2.4% of the variation in depression. These are pseudo R-squares: logistic regression does not have an equivalent to the R-squared found in OLS regression, although many people have tried to come up with one. There is a wide variety of pseudo-R-square statistics (these are only two of them). Because these statistics do not mean what R-squared means in OLS regression (the proportion of variance explained by the predictors), interpret them with great caution.
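For reference, both pseudo R-squares can be computed from the log-likelihoods of the fitted model and the intercept-only (null) model. A sketch using the standard formulas (function and variable names are mine):

```python
import numpy as np

def pseudo_r2(ll_model, ll_null, n):
    # Cox & Snell: 1 - (L_null / L_model)^(2/n), in log-likelihood form
    cox_snell = 1 - np.exp((2 / n) * (ll_null - ll_model))
    # Nagelkerke rescales Cox & Snell so its maximum possible value is 1
    nagelkerke = cox_snell / (1 - np.exp((2 / n) * ll_null))
    return cox_snell, nagelkerke

# With a fitted statsmodels Logit result:
# pseudo_r2(model.llf, model.llnull, model.nobs)
```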

Model fit information

The Hosmer and Lemeshow goodness-of-fit test divides subjects into deciles based on their predicted probabilities, then computes a chi-square from the observed and expected frequencies. If the p-value for the test is .05 or less, we reject the null hypothesis that there is no difference between the observed and predicted values of the dependent variable; if it is greater, as we want, we fail to reject that null hypothesis, implying that the model's estimates fit the data at an acceptable level.
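A sketch of how the Hosmer and Lemeshow statistic could be computed by hand, following the decile recipe above (assumes an array of observed 0/1 outcomes y and predicted probabilities p; names are mine):

```python
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    # Split cases into deciles of predicted probability, then compare
    # observed vs. expected event counts with a chi-square statistic
    d = pd.DataFrame({"y": y, "p": p})
    d["decile"] = pd.qcut(d["p"], groups, duplicates="drop")
    stat = 0.0
    for _, g in d.groupby("decile", observed=True):
        observed = g["y"].sum()   # observed events in this decile
        expected = g["p"].sum()   # expected events in this decile
        n = len(g)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, chi2.sf(stat, groups - 2)  # statistic and its p-value
```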

The Wald statistic is a multivariate form of chi-square: the ratio of the logistic coefficient B to its standard error S.E., squared.

SPSS output, Variables in the Equation (Step 1):

            B      S.E.    Wald    df   Sig.   Exp(B)   95% C.I. for Exp(B)
gender      .124   .163     .580    1   .446    1.132      .823 – 1.556
age         .000   .006     .001    1   .976    1.000      .988 – 1.012
smokerec    .868   .257   11.436    1   .001    2.382     1.440 – 3.938
Constant   -.235   .262     .805    1   .370     .791

a. Variable(s) entered on step 1: gender, age, smokerec.

B – these are the values for the logistic regression equation for predicting the dependent variable from the independent variables. They are in log-odds units.
Wald and Sig. – these columns provide the Wald chi-square value and the 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0.
Exp(B) – these are the odds ratios for the predictors. They are the exponentiation of the coefficients.
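As a quick check, the smokerec row can be reproduced from its B and S.E. alone:

```python
import math
from scipy.stats import chi2

B, SE = 0.868, 0.257                  # smokerec row from the table above
wald = (B / SE) ** 2                  # (B / S.E.) squared
print(round(wald, 2))                 # ~11.41, matching the Wald column
print(round(chi2.sf(wald, df=1), 3))  # ~.001, matching Sig.
print(round(math.exp(B), 2))          # ~2.38, matching Exp(B)
```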

Remember our main question: smoking → depression. Look again at the Variables in the Equation table above.

Smoking is the only variable that has a significant relationship with depression (OR = 2.38; 95% CI: 1.44, 3.94; p = .001); the 95% CI does not trap 1.0. People who smoke are 2.4 times more likely to be depressed than people who do not smoke (the referent), holding age and gender constant. Note that neither age nor gender has a significant association with depression.

You can also write an equation, similarly to multiple regression:
ln[depression/(1−depression)] = a + b1(smoking) + b2(age) + b3(gender)
ln[depression/(1−depression)] = −.235 + .868(smoking) + .000(age) + .124(gender)
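Plugging the fitted coefficients into the inverse logit gives predicted probabilities. A sketch for an illustrative 40-year-old female (the covariate values are mine; the coefficients come from the equation above):

```python
import math

def predict_depression(smoking, age, gender):
    # Fitted equation from this slide, converted to a probability
    log_odds = -0.235 + 0.868 * smoking + 0.000 * age + 0.124 * gender
    return 1 / (1 + math.exp(-log_odds))

p_smoker = predict_depression(1, 40, 0)     # 40-year-old female smoker
p_nonsmoker = predict_depression(0, 40, 0)  # same woman, nonsmoker
print(round(p_smoker, 2), round(p_nonsmoker, 2))  # ~0.65 vs ~0.44
```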

Another logistic regression example

Research question: What clinical factors predict the chances of death? A researcher conducted a study to determine which clinical variables predict the chances of death. The sample consisted of 4185 patients with cardiac disease who participated in a 5-year study. The outcome variable was whether the patient died during the study, coded 0 = patient did not die and 1 = patient died during the study. The predictors were clinical: BUN (Blood Urea Nitrogen), creatinine, NYHA class (New York Heart Association classification of heart failure), LVEF (Left Ventricular Ejection Fraction), and depression.

Sample size of 4185. The R² = .13 means that 13% of the variance in the dependent variable (death) is explained by the logistic regression model.

In the output table, the B values are equivalent to the B values in multiple regression. These values are used to calculate the probability of a case falling into a particular category (death or no death). You should check the sign associated with each value: the sign tells you the direction of the relationship (which factors increase the likelihood of a yes answer and which factors decrease it). For example, a negative B value indicates that an increase in the independent variable score will result in a decreased probability of the case recording a score of 1 on the dependent variable (in this study, a score of 1 indicates the patient died during the study). The S.E.(B) is the standard error of B and is the same as in multiple regression.

The odds ratio is reported as Exp(B). The values in the odds ratio column are the odds ratios for the independent variables in the model, and they are interpreted for each of the significant predictors. An odds ratio less than 1.0 will have a negative sign associated with its B value, which means the chance of the outcome occurring is reduced. For example, LVEF has an odds ratio of .95, which would be interpreted as follows: for each unit increase in LVEF, the patient is 5% less likely to die during the study. The 5% comes from subtracting: 1.0 − .95 = .05. The 1.0 represents equal odds, which is why we subtract from that value. If the odds ratio reported is greater than 1.0, then the chance of the outcome occurring is increased. For example, the odds ratio for creatinine is 1.81; this would be interpreted as follows: for each unit increase in creatinine level, the patient is 81% more likely to die. So, clinically, this value is an important predictor to monitor in cardiac patients. If we were to interpret BUN, it would be stated as follows: for each unit increase in BUN, the patient is 2% more likely to die. Another way to state this might be: for each unit increase in BUN, the patient has a 2% increased risk of death. Both ways are correct!
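The percent-change reading is just (OR − 1) × 100. A quick sketch using the odds ratios quoted above (the BUN odds ratio of 1.02 is implied by the 2% on this slide):

```python
def pct_change(odds_ratio):
    # Percent change in the odds per one-unit increase in the predictor
    return (odds_ratio - 1) * 100

print(round(pct_change(0.95), 1))  # LVEF:       -5.0 -> 5% less likely
print(round(pct_change(1.81), 1))  # creatinine: 81.0 -> 81% more likely
print(round(pct_change(1.02), 1))  # BUN:         2.0 -> 2% more likely
```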

The final two columns are the 95% confidence intervals, which report the lower and upper limits. When interpreting confidence intervals in logistic regression, it is important to note that if the confidence interval crosses 1.0, this represents equal odds and the p-value will not be significant. For example, the confidence interval for depression ranges from .99 to 1.01 and traps the value of 1.0 in that range. This indicates that we cannot rule out the possibility that the true odds ratio is 1.0, meaning equal odds and no difference between the outcome categories (did not die or died). Depression also has a significance value of .895 (which is greater than .05), which means that depression is not a significant predictor in the model. So, you can determine significance by looking at either the significance value or the confidence interval. The confidence interval for NYHA ranges from 1.43 to 1.76. This interval does not contain the value 1.0; therefore the result is statistically significant (p < .05). Since this result is significant, it would be interpreted as follows: for each unit increase in NYHA level, the patient is 59% more likely to die. This makes sense because there are 4 levels in the NYHA classification, and as the level increases the patient is more ill and at a higher risk of death.
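The confidence limits themselves come from exponentiating B ± 1.96 × S.E. A sketch using the smokerec row from the earlier depression model, which reproduces its reported interval:

```python
import math

B, SE = 0.868, 0.257             # smokerec row from the depression model
lower = math.exp(B - 1.96 * SE)  # ~1.44
upper = math.exp(B + 1.96 * SE)  # ~3.94
print(round(lower, 2), round(upper, 2))
# Significant only when the interval does not trap 1.0 (equal odds)
print("significant" if not (lower <= 1.0 <= upper) else "not significant")
```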

Values used if calculating the equation

If you had this kind of information, could you determine what kind of regression was run and what the equation would be? The giveaway that it is logistic regression is that the output includes exponentiated values, Exp(B); the B column holds the values you would use if calculating the equation.

How do you know what test to perform? Imagine you had measured the cholesterol level in the blood of a large number of >54-year-old women, then followed them up 10 years later to see who had had a heart attack.

1. Looking at the types of variables you have, you can narrow down what kinds of tests to run. You have one nominal variable (heart attack: yes/no) and one interval/ratio variable (cholesterol level), so you could use a t-test. Equally, your outcome is nominal (heart attack) and you have one interval/ratio variable, so you could also use a regression, in this case logistic regression.

2. But now, what do you really want to learn? If the hypothesis you are interested in is whether there are differences in cholesterol between the two outcome groups, you could do a t-test comparing the cholesterol levels of the women who had a heart attack vs. those who didn't, and that would be a perfectly reasonable way to test the null hypothesis that cholesterol level is not associated with heart attacks. However, if you wanted to predict the probability that a woman aged 55 or more with a particular cholesterol level would have a heart attack in the next ten years, so that you could tell patients "If you reduce your cholesterol by 40 points, you'll reduce your risk of heart attack by X percent," you would have to use logistic regression. Note also that this approach would allow you to control for possible confounders such as blood pressure.

Imagine that you had measured the cholesterol level in the blood of a large number of >54-year-old women, then followed them up ten years later to see who had had a heart attack. Holding other variables constant (age, BP, height, weight), for every one-point change in cholesterol there is a 0.8% increased chance of having a heart attack (make sure you know the right place in the output to read the 0.8% from). Usually we say "for every 1-point increase, a 0.8% increase," but you can also state it another way: for a 40-point change, 40 × 0.8 = 32%, so if she reduces her cholesterol by 40 points, this will reduce her risk of a heart attack by roughly one third.
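Note that, strictly speaking, percentage changes in odds compound multiplicatively rather than additively, so 40 × 0.8 = 32% is a linear approximation. A quick comparison of the two calculations (the per-point factor 1.008 is just 1 plus the 0.8% from this slide):

```python
linear = 40 * 0.8                     # 32.0: the slide's quick estimate
compounded = (1.008 ** 40 - 1) * 100  # ~37.5: exact multiplicative change
print(round(linear, 1), round(compounded, 1))
```

Either way, the take-home message of "roughly one third" still holds.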