LOGISTIC REGRESSION

Purpose  Logistical regression is regularly used when there are only two categories of the dependent variable and there is a mixture of numerical and categorical Independent Variables. Rahul Chandra

To Predict What  Since the dependent variable is dichotomous, we cannot predict a numerical value for it using logistic regression. Instead, logistic regression draws on binomial probability theory, in which there are only two values to predict: the probability (p) that the outcome is 1 rather than 0, i.e. that the event occurs or the person belongs to one group rather than the other.

How?  Logistic regression forms a best fitting equation or function using the maximum likelihood method, which maximizes the probability of classifying the observed data into the appropriate category given the regression coefficients. Rahul Chandra

Uses of logistic regression  The first use is the prediction of group membership, since logistic regression calculates the probability of success over the probability of failure.  It also provides knowledge of the relationships and strengths among the variables (e.g. marrying the boss's daughter puts you at a higher probability of job promotion than undertaking five hours' unpaid overtime each week).

Assumptions of logistic regression  It does not assume a linear relationship between the dependent and independent variables.  The dependent variable must be a dichotomy (two categories).  The independent variables need not be interval, normally distributed, linearly related, or of equal variance within each group.

Assumptions of logistic regression  The categories (groups) must be mutually exclusive and exhaustive: a case can only be in one group, and every case must be a member of one of the groups.  Larger samples are needed than for linear regression; a minimum of 50 cases per predictor is recommended.

Logistic regression equation (example)  p = exp(a + b1X1 + … + bnXn) / (1 + exp(a + b1X1 + … + bnXn)), with the terms defined on the "Calculating P" slide below.

Log Transformation of p  This log transformation of the p values enables us to create a link with the normal regression equation.  The log transformation of p is also called the logit of p, or logit(p).  Logit(p) is the log (to base e) of the odds, or likelihood ratio, that the dependent variable is 1; in symbols it is defined on the next slide.

Logit Function  logit(p) = log[p / (1 − p)] = ln[p / (1 − p)]  Whereas p can only range from 0 to 1, the logit(p) scale ranges from negative infinity to positive infinity.
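In code, the logit transformation is a one-liner; this Python sketch (the function name is mine, not SPSS's) shows the link between bounded probabilities and the unbounded logit scale.

```python
import math

def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))

print(logit(0.5))  # 0.0 -- even odds
print(logit(0.9))  # about  2.197
print(logit(0.1))  # about -2.197 (symmetric around p = 0.5)
```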

Logistic regression equation  The equation is fitted using a maximum likelihood method, which maximizes the probability of getting the observed results given the fitted regression coefficients.

Calculating P  p = exp(a + b1X1 + … + bnXn) / (1 + exp(a + b1X1 + … + bnXn))  Where: p = the probability that a case is in a particular category, exp = the exponential function (base of natural logarithms, approx. 2.72), a = the constant of the equation, and b = the coefficient of the predictor variables.
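Using the terms just defined, the calculation can be sketched in Python (the constant, coefficients, and predictor values below are hypothetical):

```python
import math

def predicted_probability(a, bs, xs):
    """p = exp(a + b1*x1 + ... + bn*xn) / (1 + exp(a + b1*x1 + ... + bn*xn))"""
    z = a + sum(b * x for b, x in zip(bs, xs))
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical constant a and coefficients b for two predictors:
p = predicted_probability(a=-2.0, bs=[0.5, 1.2], xs=[3.0, 1.0])
print(round(p, 3))  # 0.668
```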

ODDS  For a dichotomous variable the odds of membership of the target group are equal to the probability of membership in the target group divided by the probability of membership in the other group.  Odds value can range from 0 to infinity and tell you how much more likely it is that an observation is a member of the target group rather than a member of the other group. Rahul Chandra

ODDS  If the probability is 0.80, the odds are 4 to 1 or.80/.20  If the probability is 0.25, the odds are.33 (.25/.75). Rahul Chandra

Logits (log odds)  Logistic regression calculates changes in the log odds of the dependent variable, not changes in the dependent value itself as OLS regression does.  The logits (log odds) are the b coefficients (the slope values) of the regression equation.

ODDS RATIO (OR)  It estimates the change in the odds of membership in the target group for a one unit increase in the predictor. It is calculated by using the regression coefficient of the predictor as the exponent or exp.  SPSS calculates odd ratio as EXP(B) Rahul Chandra

ODDS RATIO (OR)  Assume in the example earlier where we were predicting accountancy success by a math competency predictor that b = Thus the odds ratio is exp2.69 or  Therefore the odds of passing are times greater for a student, for example, who had a pre-test score of 5, than for a student whose pre-test score was 4. Rahul Chandra

SPSS Outputs

Block 0: Beginning Block  Block 0 presents the results with only the constant included before any coefficients (i.e. those relating to family size and mortgage) are entered into the equation.  Logistic regression compares this model with a model including all the predictors (family size and mortgage) to determine whether the latter model is more appropriate. Rahul Chandra

Block 0: Beginning Block  (SPSS output table)

Variables in the Equation  (SPSS output table)

Model chi square  The difference between –2LL for the best-fitting model and –2LL for the initial model (in which all the b values are set to zero in block 0) is distributed like chi squared, with degrees of freedom equal to the number of predictors. Rahul Chandra

Model fit and the likelihood function  Maximum Likelihood (ML) estimation is used, rather than least squares, to find the coefficients that maximize our ability to predict the probability of Y based on what we know about X. Likelihood here just means probability.

Model fit and the likelihood function  We then work out the likelihood of observing the data we actually did observe under each candidate set of coefficients. The result is usually a very small number, and to make it easier to handle, the natural logarithm is used, producing a log likelihood (LL). Probabilities are always less than one, so LLs are always negative.

The likelihood ratio test  This tests the difference between –2LL for the full model with predictors and –2LL for the null model, which has only a constant in it.  Significance at the .05 level or lower means the researcher's model with the predictors is significantly different from the one with the constant only (all b coefficients being zero).

Model chi square  This difference is the Model chi square that SPSS refers to. Very conveniently, the difference between – 2LL values for models with successive terms added also has a chi squared distribution, so when we use a stepwise procedure, we can use chi-squared tests to find out if adding one or more extra predictors significantly improves the fit of our model. Rahul Chandra

Block 1 Method = Enter  (SPSS output table)

Hosmer and Lemeshow Statistic  This is an alternative to the model chi square.  If the significance of the H-L goodness-of-fit statistic is greater than .05, as we want for well-fitting models, we fail to reject the null hypothesis that there is no difference between the observed and model-predicted values, implying that the model's estimates fit the data at an acceptable level.

Hosmer and Lemeshow Test  Here the H-L statistic has a significance of .605, which means that it is not statistically significant and therefore our model is quite a good fit.

Classification Table  In a perfect model, all cases will be on the diagonal and the overall percent correct will be 100%. In this study, 87.5% were correctly classified for the take offer group and 92.9% for the decline offer group. Overall 90% were correctly classified. This is a considerable improvement on the 53.3% correct classification with the constant model Rahul Chandra

Classification Table  (SPSS output table)

The Variables in the Equation  This table (Table 24.9) has several important elements. The Wald statistic and associated probabilities provide an index of the significance of each predictor in the equation.  The simplest way to assess Wald is to take the significance values and if less than.05 reject the null hypothesis, as the variable does make a significant contribution.  In this case, we note that family size contributed significantly to the prediction (p =.013) but mortgage did not (p =.075). In this case mortgage can be dropped as independent variable. Rahul Chandra

The Variables in the Equation  (SPSS output table)

Exp(B)  EXP(B) value associated with family size is Hence when family size is raised by one unit (one person) the odds ratio is 11 times as large and therefore householders are 11 more times likely to belong to the take offer group. Rahul Chandra

‘B’ values  These are the logistic coefficients that can be used to create a predictive equation.
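For example, with hypothetical B values (the constant and both coefficients below are invented for illustration), the predictive equation is applied like this:

```python
import math

# Hypothetical B coefficients of the kind listed in the SPSS
# "Variables in the Equation" table:
constant = -8.0
b_family_size = 2.4
b_mortgage = 0.00002

def predict_probability(family_size, mortgage):
    """Plug the B values into the logistic equation for P(take offer)."""
    z = constant + b_family_size * family_size + b_mortgage * mortgage
    return 1 / (1 + math.exp(-z))

print(round(predict_probability(family_size=3, mortgage=50000), 2))  # 0.55
```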