AN INTRODUCTION TO LOGISTIC REGRESSION
Eni Sumarminingsih, SSi, MM
Program Studi Statistika, Jurusan Matematika, Universitas Brawijaya


OUTLINE
- Introduction and Description
- Some Potential Problems and Solutions

INTRODUCTION AND DESCRIPTION
- Why use logistic regression?
- Estimation by maximum likelihood
- Interpreting coefficients
- Hypothesis testing
- Evaluating the performance of the model

WHY USE LOGISTIC REGRESSION?
- There are many important research topics for which the dependent variable is "limited."
- For example, voting, morbidity or mortality, and participation data are not continuous or normally distributed.
- Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable, coded 0 (did not vote) or 1 (did vote).

THE LINEAR PROBABILITY MODEL
In the OLS regression Y = α + βX + e, where Y ∈ {0, 1}:
- The error terms are heteroskedastic.
- e is not normally distributed, because Y takes on only two values.
- The predicted probabilities can be greater than 1 or less than 0.
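The third problem can be demonstrated directly by fitting OLS to a binary response. A minimal pure-Python sketch on hypothetical data (not the pulse dataset used later in these slides):

```python
# Fit Y = a + b*X by ordinary least squares to a binary Y and show that
# some fitted "probabilities" fall outside [0, 1]. Hypothetical data.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]  # binary response

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
# Closed-form simple-regression estimates: b = Sxy / Sxx, a = ybar - b*xbar
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

fitted = [a + b * xi for xi in x]
print(min(fitted), max(fitted))  # smallest is below 0, largest is above 1
```

Even on this tidy dataset the straight line exits the [0, 1] band at both ends, which is exactly what the logistic transformation prevents.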

AN EXAMPLE
You are a researcher interested in understanding the effect of smoking and weight on resting pulse rate. Because you have categorized the response (pulse rate) into low and high, a binary logistic regression analysis is appropriate to investigate the effects of smoking and weight on pulse rate.

THE DATA

Resting Pulse   Smokes   Weight
Low             No       140
Low             No       145
Low             Yes      160
Low             Yes      190
Low             No       155
Low             No       165
High            No       150
Low             No       190
Low             No       195
Low             No       110
High            No       150
Low             No       108

OLS RESULTS

Regression Analysis: Tekanan Darah (blood pressure) versus Weight, Merokok (smoking)

The regression equation is
Tekanan Darah = Weight Merokok

Predictor   Coef   SE Coef   T   P
Constant
Weight
Merokok

S =   R-Sq = 7.9%   R-Sq(adj) = 5.8%

PROBLEMS: PREDICTED VALUES OUTSIDE THE [0, 1] RANGE

Descriptive Statistics: FITS1

Variable   N   N*   Mean   StDev   Minimum   Q1   Median   Q3   Maximum
FITS1

HETEROSKEDASTICITY

THE LOGISTIC REGRESSION MODEL
The "logit" model solves these problems:
ln[p/(1-p)] = α + βX + e
- p is the probability that the event Y occurs, p(Y = 1)
- p/(1-p) is the "odds" of the event
- ln[p/(1-p)] is the log odds, or "logit"
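These building blocks can be sketched in a few lines of Python (the helper names are ours, not from the slides):

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: the quantity the logistic model sets equal to a + b*X."""
    return math.log(odds(p))

# p = 0.5 gives even odds (1) and a logit of 0; the logit is symmetric
# around p = 0.5 and unbounded as p approaches 0 or 1.
print(odds(0.5), logit(0.9), logit(0.1))
```

The logit maps probabilities in (0, 1) onto the whole real line, which is what lets the right-hand side α + βX range freely.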

More:
- The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
- The estimated probability is p = 1/[1 + exp(-α - βX)]
- If α + βX = 0, then p = .50
- As α + βX gets very large, p approaches 1
- As α + βX gets very small (large and negative), p approaches 0
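All three facts can be checked numerically with the inverse logit (a sketch; the function name is ours):

```python
import math

def inv_logit(z):
    """Estimated probability p = 1 / (1 + exp(-z)), where z = a + b*X."""
    return 1.0 / (1.0 + math.exp(-z))

print(inv_logit(0))    # exactly 0.5
print(inv_logit(10))   # very close to 1
print(inv_logit(-10))  # very close to 0
```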

COMPARING LP AND LOGIT MODELS
[figure: predicted probabilities on a 0-to-1 vertical axis; the LP model is a straight line that can leave the 0-1 band, while the logit model is an S-shaped curve bounded between 0 and 1]

MAXIMUM LIKELIHOOD ESTIMATION (MLE)
- MLE is a statistical method for estimating the coefficients of a model: it chooses the coefficient values that make the observed data most likely.
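For simple logistic regression the maximization can be carried out with Newton-Raphson. The following is a self-contained sketch on hypothetical data; Minitab uses its own iterative algorithm, so this only illustrates the idea:

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit p = 1/(1 + exp(-(a + b*x))) by maximizing the log-likelihood
    with Newton-Raphson steps."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood)
        ga = sum(yi - pi for yi, pi in zip(y, p))
        gb = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Observed information (2x2 matrix) with weights w_i = p_i (1 - p_i)
        w = [pi * (1.0 - pi) for pi in p]
        haa = sum(w)
        hab = sum(wi * xi for wi, xi in zip(w, x))
        hbb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = haa * hbb - hab * hab
        # Newton step: (a, b) += information^{-1} * score
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical binary data (not the pulse dataset from these slides)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_logistic(x, y)
print(a, b)
```

At the maximum the score equations ∑(y - p) = 0 and ∑x(y - p) = 0 hold, which is a useful convergence check.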

INTERPRETING COEFFICIENTS
- Since ln[p/(1-p)] = α + βX + e, the slope coefficient (β) is interpreted as the rate of change in the "log odds" as X changes, which is not very intuitive.

- An interpretation of the logit coefficient that is usually more intuitive is the "odds ratio."
- Since p/(1-p) = exp(α + βX), exp(β) is the effect of a one-unit increase in the independent variable on the odds.
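The odds-ratio reading of exp(β) follows directly from the model: raising X by one unit multiplies the odds by exp(β), whatever the starting value of X. A quick check with arbitrary illustrative coefficients (not estimates from these slides):

```python
import math

# Arbitrary illustrative coefficients
a, beta = -1.5, 0.4

def odds(x):
    """Model-implied odds at X = x: p/(1-p) = exp(a + beta*x)."""
    return math.exp(a + beta * x)

# The ratio of odds at x+1 versus x equals exp(beta), regardless of x
print(odds(3.0) / odds(2.0), math.exp(beta))
```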

FROM MINITAB OUTPUT:

Logistic Regression Table

                                          Odds    95% CI
Predictor    Coef   SE Coef   Z   P      Ratio   Lower   Upper
Constant
Smokes Yes
Weight

** Although there is evidence that the estimated coefficient for Weight is not zero, the odds ratio is very close to one (1.03), indicating that a one-pound increase in weight only minimally affects a person's resting pulse rate.
** Given that subjects have the same weight, the odds ratio can be interpreted as: the odds of smokers in the sample having a low pulse are 30% of the odds of non-smokers having a low pulse.

HYPOTHESIS TESTING
- The Wald statistic for the β coefficient is Wald = [β / s.e.(β)]², which is distributed chi-square with 1 degree of freedom.
- The last log-likelihood from the maximum likelihood iterations is displayed along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero, versus the alternative that these coefficients are not all zero. In this example, G = 7.574 with a p-value of 0.023, indicating that there is sufficient evidence that at least one of the coefficients is different from zero, given that your accepted α-level is greater than 0.023.
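Both p-values can be computed without a statistics library, because the relevant chi-square tail areas have closed forms: with 1 df the survival function is erfc(sqrt(x/2)), and with 2 df (two predictors, as in this example) it is exp(-x/2). A sketch with our own function names:

```python
import math

def wald_p(beta, se):
    """P-value for the Wald chi-square (beta/se)^2 on 1 df.
    The chi-square(1) survival function is erfc(sqrt(x/2))."""
    x = (beta / se) ** 2
    return math.erfc(math.sqrt(x / 2.0))

def g_test_p(g):
    """P-value for G on 2 df (two predictors).
    The chi-square(2) survival function is exp(-x/2)."""
    return math.exp(-g / 2.0)

# Reproduces the slide's p-value for G = 7.574
print(round(g_test_p(7.574), 3))  # 0.023
# A coefficient 1.96 standard errors from zero gives the familiar p = 0.05
print(round(wald_p(1.96, 1.0), 2))  # 0.05
```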

EVALUATING THE PERFORMANCE OF THE MODEL
Goodness-of-Fit Tests displays the Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit tests. If the p-value is less than your accepted α-level, the test rejects the null hypothesis of an adequate fit. Here the goodness-of-fit tests, with p-values ranging up to 0.724, indicate that there is insufficient evidence to claim that the model does not fit the data adequately.
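The deviance ingredient of those tests can be sketched directly: the residual deviance is -2 times the log-likelihood of the observed 0/1 outcomes under the fitted probabilities, so smaller values mean a better fit. Illustrative code with hypothetical fits, not Minitab's implementation:

```python
import math

def deviance(y, p):
    """Residual deviance: -2 * log-likelihood of binary y under fitted
    probabilities p. Lower is better; 0 would mean perfect prediction."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                      for yi, pi in zip(y, p))

y = [1, 0, 1, 1, 0]
good = [0.9, 0.1, 0.8, 0.9, 0.2]  # probabilities close to the outcomes
bad = [0.5, 0.5, 0.5, 0.5, 0.5]   # uninformative model
print(deviance(y, good) < deviance(y, bad))  # True
```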

MULTICOLLINEARITY
- The presence of multicollinearity will not lead to biased coefficients.
- But the standard errors of the coefficients will be inflated.
- If a variable that you think should be statistically significant is not, consult the correlation coefficients.
- If two variables are correlated at a rate greater than .6, .7, .8, etc., then try dropping the less theoretically important of the two.
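The suggested correlation check can be done before fitting. A small pure-Python sketch with hypothetical predictor columns:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two predictor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors: x2 is nearly a rescaled copy of x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9]
if abs(pearson_r(x1, x2)) > 0.8:
    print("high collinearity: consider dropping one predictor")
```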