Logistic Regression Analysis Gerrit Rooks 30-03-10.

What you've always wanted to know about logistic regression analysis, but were afraid to ask... February. Gerrit Rooks, Sociology of Innovation.
Presentation transcript:

Logistic Regression Analysis Gerrit Rooks

This lecture
1. Why do we have to know and sometimes use logistic regression?
2. What is the model? What is maximum likelihood estimation?
3. Logistics of logistic regression analysis:
   1. Estimate coefficients
   2. Assess model fit
   3. Interpret coefficients
   4. Check residuals
4. An SPSS example

Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD). (Table columns: ID, age, CHD, ...)

A graphic representation of the data

Suppose, as a researcher, I am interested in the relation between age and the probability of CHD.

To try to predict the probability of CHD, I can regress CHD on age: pr(CHD|age) = b0 + b1*Age

However, linear regression is not a suitable model for probabilities: the linear predictor b0 + b1*Age can fall below 0 or rise above 1, which a probability cannot do.

In this graph I plotted, for 8 age groups, the probability of having a heart disease (the proportion with CHD in each group).

Instead of a linear probability model, I need a non-linear one.

Something like this

This is the logistic regression model: pr(CHD|age) = 1 / (1 + e^-(b0 + b1*Age))

Predicted probabilities are always between 0 and 1 similar to classic regression analysis
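This bounding property is easy to see in code. A minimal sketch in plain Python (the function name `logistic` is chosen here for illustration):

```python
import math

def logistic(b0, b1, x):
    """Predicted probability pr(Y=1 | x) under the logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Whatever the coefficients, the prediction stays strictly between 0 and 1.
```

Because the exponential is always positive, the fraction can never leave the (0, 1) interval.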

Logistics of logistic regression 1. How do we estimate the coefficients? 2. How do we assess model fit? 3. How do we interpret coefficients? 4. How do we check regression assumptions?

Logistics of logistic regression 1. How do we estimate the coefficients? 2. How do we assess model fit? 3. How do we interpret coefficients? 4. How do we check regression assumptions?

Maximum likelihood estimation The method of maximum likelihood yields values for the unknown parameters (here b0 and b1) which maximize the probability of obtaining the observed set of data.

Maximum likelihood estimation First we have to construct the likelihood function (the probability of obtaining the observed set of data): Likelihood = pr(obs1) * pr(obs2) * pr(obs3) * ... * pr(obsn), assuming that the observations are independent.

(Data table: ID, age, CHD, …)

The likelihood function (for the CHD data) Given that we have 100 observations, the likelihood is the product of the 100 individual probabilities; I summarize the function accordingly.

Log-likelihood For technical reasons the likelihood is transformed into the log-likelihood: LL = ln[pr(obs1)] + ln[pr(obs2)] + ln[pr(obs3)] + ... + ln[pr(obsn)]
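The sum of logs above translates directly into code. A small sketch, assuming the data comes as a list of (x, y) pairs with y coded 0/1:

```python
import math

def log_likelihood(b0, b1, data):
    """Sum of ln[pr(obs_i)] over all observations; data is a list of (x, y) pairs."""
    ll = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # model probability that y = 1
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll
```

Working with sums of logs instead of a product of 100 small probabilities also avoids numerical underflow.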

The likelihood function (for the CHD data) A clever algorithm gives us values of the parameters b0 and b1 that maximize the likelihood of these data.
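The slides don't show the algorithm itself (SPSS uses an iterative Newton-type procedure). As a rough illustration of the idea only, here is a plain gradient-ascent sketch on the log-likelihood; the function name `fit_logistic` and the step settings are assumptions, not SPSS internals:

```python
import math

def fit_logistic(data, steps=2000, lr=0.1):
    """Illustrative gradient ascent on the log-likelihood (not the algorithm SPSS uses).
    Works for small/rescaled x; raw ages would need a much smaller learning rate."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # dLL/db0
            g1 += (y - p) * x    # dLL/db1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1
```

Because the log-likelihood of this model is concave, repeatedly stepping uphill along the gradient converges to the unique maximum-likelihood estimates.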

Estimation of coefficients: SPSS Results

This function fits very well; other values of b0 and b1 give worse results.

Illustration 1: suppose we chose 0.05X instead of 0.11X

Illustration 2: suppose we chose 0.40X instead of 0.11X

Logistics of logistic regression Estimate the coefficients Assess model fit Interpret coefficients Check regression assumptions

Logistics of logistic regression Estimate the coefficients Assess model fit – Between-model comparisons – Pseudo R² (similar to multiple regression) – Predictive accuracy Interpret coefficients Check regression assumptions

Model fit: Between-model comparison The log-likelihood ratio test statistic can be used to test the fit of a model, comparing a reduced model with the full model. The test statistic has a chi-square distribution.

Between-model comparisons: likelihood ratio test The reduced model including only an intercept is often called the empty model. SPSS uses this model as a default.

Between-model comparisons: the test can also be used for individual coefficients, comparing the reduced model (without the coefficient) with the full model (with it).

Between-model comparison: SPSS output -2LL(baseline) = 136.66 and -2LL(model) = 107.35, so the test statistic is 136.66 - 107.35 = 29.31, shown together with its associated significance.
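The likelihood-ratio statistic can be recomputed by hand from the two -2LL values in the SPSS output. A short sketch; the chi-square (1 df) tail probability is obtained via the stdlib `erfc` identity rather than a stats package:

```python
import math

# -2LL values reported in the SPSS output for the CHD example
neg2ll_baseline = 136.66   # empty model (intercept only)
neg2ll_model = 107.35      # model including age

chi_square = neg2ll_baseline - neg2ll_model        # likelihood-ratio statistic, 1 df
p_value = math.erfc(math.sqrt(chi_square / 2.0))   # upper tail of chi-square with 1 df
```

With one added predictor the statistic has 1 degree of freedom, and 29.31 is far beyond the 5% critical value of 3.84, so the age model fits significantly better than the empty model.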

Overall model fit: pseudo R² Just like R² in multiple regression, pseudo R² ranges from 0.0 to 1.0. Cox and Snell's version cannot theoretically reach 1; Nagelkerke's is adjusted so that it can. Both are computed from the log-likelihood of the model before any predictors were entered and the log-likelihood of the model that you want to test. NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression.
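Both pseudo R² measures can be recomputed from the two -2LL values in the SPSS output (136.66 for the empty model, 107.35 for the fitted model, n = 100 observations). A sketch of the standard formulas:

```python
import math

n = 100                    # number of observations
neg2ll_baseline = 136.66   # -2 * log-likelihood before any predictors were entered
neg2ll_model = 107.35      # -2 * log-likelihood of the model you want to test

cox_snell = 1.0 - math.exp((neg2ll_model - neg2ll_baseline) / n)
max_cox_snell = 1.0 - math.exp(-neg2ll_baseline / n)   # upper bound Cox & Snell can reach
nagelkerke = cox_snell / max_cox_snell                 # rescaled so that 1.0 is attainable
```

This reproduces the usual pattern: Nagelkerke's value is always at least as large as Cox and Snell's, because it divides by the theoretical maximum.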

Overall model fit: Classification table We correctly predict 74% of our observations.

Overall model fit: Classification table 14 cases had a CHD although, according to our model, this should not have happened.

Overall model fit: Classification table 12 cases did not have a CHD although, according to our model, this should have happened.
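The classification table boils down to comparing each predicted probability against a cutoff (0.5 by default in SPSS) and counting matches. A minimal sketch (the function name is illustrative):

```python
def classification_accuracy(probs, outcomes, cutoff=0.5):
    """Fraction of cases whose predicted group (prob >= cutoff) matches the observed 0/1 outcome."""
    hits = sum((p >= cutoff) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)
```

For the CHD data, 26 of the 100 cases fall off the diagonal (14 + 12), giving the 74% accuracy quoted above.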

Logistics of logistic regression Estimate the coefficients Assess model fit Interpret coefficients Check regression assumptions

Logistics of logistic regression Estimate the coefficients Assess model fit Interpret coefficients – Direction – Significance – Magnitude Check regression assumptions

Interpreting coefficients: direction We can rewrite our logistic regression model from the probability form, pr = 1 / (1 + e^-(b0 + b1X)), into the odds form: pr / (1 - pr) = e^(b0 + b1X)

Interpreting coefficients: direction The original b reflects changes in the logit: b > 0 -> positive relationship. The exponentiated b reflects changes in the odds: exp(b) > 1 -> positive relationship.
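The odds form makes this concrete: a one-unit increase in X multiplies the odds by exp(b1), regardless of the starting value of X. A quick numerical check (the intercept -5.3 is an arbitrary illustrative value; it cancels out of the ratio):

```python
import math

def odds(b0, b1, x):
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)   # algebraically equal to exp(b0 + b1*x)

# The ratio odds(x+1) / odds(x) equals exp(b1) for any x.
```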

Testing significance of coefficients In linear regression analysis the ratio of an estimate to its standard error is compared with a t-distribution to test significance. In logistic regression something similar exists; however, when b is large, the standard error tends to become inflated, hence underestimation (Type II errors are more likely). Note: this is not the Wald statistic SPSS presents!

Interpreting coefficients: significance SPSS presents the Wald statistic as the squared ratio (b/SE)², while Andy Field presents it as the ratio b/SE itself.
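The two versions differ only by squaring, which a short sketch makes explicit. The coefficient 0.11 is the age slope from these slides; the standard error 0.024 is an assumed illustrative value, not taken from the transcript:

```python
b = 0.11    # age coefficient from the slides
se = 0.024  # assumed illustrative standard error (not shown in the transcript)

z = b / se       # the b/SE ratio described by Field
wald = z ** 2    # the squared version, distributed chi-square with 1 df
```

Either way the conclusion is the same: a ratio this far from zero indicates a significant age effect.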

Interpreting coefficients: magnitude The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes, which is not very useful. exp(b) is the effect of the independent variable on the odds, which is more useful for calculating the size of an effect.

Magnitude of association: Percentage change in odds = (exponentiated coefficient - 1.0) * 100. Probability vs. odds: 25% -> odds 1/3; 50% -> odds 1; 75% -> odds 3.

Magnitude of association For our age variable: percentage change in odds = (exponentiated coefficient - 1) * 100 = 12%. A one-unit increase in age will result in a 12% increase in the odds that the person will have a CHD. So if a person is one year older, the odds that he or she will have CHD are 12% higher.

Another way: Calculating predicted probabilities So, for somebody 20 years old, the predicted probability is 0.04; for somebody 70 years old, the predicted probability is 0.91.
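These predicted probabilities come from plugging the ages into the fitted model. The slope 0.11 appears on the slides, but the intercept does not survive in this transcript, so b0 = -5.3 below is an assumed value chosen only so that the quoted probabilities come out approximately right:

```python
import math

b1 = 0.11   # age slope from the slides
b0 = -5.3   # assumed intercept (not shown in the transcript), for illustration only

def predicted_prob(age):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

# predicted_prob(20) comes out near the quoted 0.04, predicted_prob(70) near 0.91
```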

Checking assumptions Influential data points & residuals: follow Samantha's tips. Hosmer & Lemeshow: divides the sample into subgroups and checks whether there are differences between observed and predicted values across the subgroups. The test should not be significant; if it is, that is an indication of lack of fit.

Hosmer & Lemeshow Test divides the sample into subgroups and checks whether the difference between observed and predicted is about equal in these groups. The test should not be significant (indicating no difference).
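A rough sketch of the idea (not SPSS's exact grouping rules): sort the cases by predicted probability, cut them into groups of equal size, and accumulate a chi-square-style discrepancy between observed and expected events per group. The function name is hypothetical:

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Sketch of the Hosmer & Lemeshow statistic: sort cases by predicted
    probability, split into groups, compare observed vs. expected events."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs) // groups
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size] if g < groups - 1 else pairs[(groups - 1) * size:]
        expected = sum(p for p, _ in chunk)    # events the model predicts in this group
        observed = sum(y for _, y in chunk)    # events actually seen in this group
        p_bar = expected / len(chunk)
        if 0.0 < p_bar < 1.0:
            chi2 += (observed - expected) ** 2 / (len(chunk) * p_bar * (1.0 - p_bar))
    return chi2
```

A well-calibrated model yields a small statistic (non-significant against a chi-square distribution); large values signal lack of fit.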

Examining residuals in LR 1. Isolate points for which the model fits poorly. 2. Isolate influential data points.

Residual statistics

Cook's distance For each observation i, Cook's distance compares the predictions for every case j made from all observations with the predictions for j made when observation i is excluded, scaled by the mean square error and the number of parameters.

Illustration with SPSS Penalty kicks data, variables: Scored: outcome variable, 0 = penalty missed and 1 = penalty scored; Pswq: degree to which a player worries; Previous: percentage of penalties scored by a particular player in their career.

SPSS OUTPUT Logistic Regression The first table tells you something about the number of observations and missing values.

Block 0: Beginning Block This table is based on the empty model, i.e. only the constant is in the model; the listed variables will be entered into the model later on.

Block 1: Method = Enter The block statistic is useful for checking the significance of individual coefficients (see Field). For the new model SPSS reports -2LL; dividing it by -2 gives back the log-likelihood. Note: Nagelkerke's R² is larger than Cox and Snell's.

Block 1: Method = Enter (continued) Predictive accuracy has improved (it was 53%). The coefficients table shows the estimates, the standard errors of the estimates, the significance based on the Wald statistic, and the change in odds (Exp(B)).

How is the classification table constructed? The off-diagonal cells contain the cases not predicted correctly.

How is the classification table constructed? (Table columns: pswq, previous, scored, predicted probability)

How is the classification table constructed? (Table columns: pswq, previous, scored, predicted probability, predicted group)