Introduction to Logistic Regression

Yu Chuanhua (yuchua@163.com), Department of Epidemiology and Health Statistics, School of Public Health, Wuhan University, 2011-05-31 16:59

New Words
LPM: linear probability model
Odds Ratio
Nominal Variables
Dummy Variable
Multiple Logistic Regression

CONTENTS
1. Review of the Types of Variables
2. Variables in Logistic Regression
3. Why Can't We Use Linear Regression for a Categorical Response?
4. Logistic Regression Model
5. What Is an Odds Ratio?
6. Multiple Logistic Regression

1. Review of the Types of Variables

Choosing the Scale of Measurement
Before analyzing the data, select the measurement scale for each variable.

Types of variables:
Categorical (qualitative) variables: nominal variables, ordinal variables
Numerical (quantitative) variables: discrete variables, continuous variables

Nominal Variables

Ordinal Variables

Binary Variables
Examples: Is the weather good or bad? Is the person male or female?

Continuous Variables

2. Variables in Logistic Regression

Predicted variable, outcome variable, dependent variable

Types of Logistic Regression
3. Ordinal (ordered categorical) logistic regression

What Does Logistic Regression Do?
Logistic regression uses independent variables to predict the probability of specific outcomes of a binary dependent variable.
Equivalent names for the inputs / the output:
Predictor variables / Predicted variable
Explanatory variables / Response variable
Covariables / Outcome variable
Independent variables / Dependent variable

Independent Variables of Logistic Regression
Continuous variables
Dummy variables coding nominal variables

3. Why Can't We Use Linear Regression for a Categorical Response?

Example: Failing or Passing an Exam
Let us define a variable Outcome:
Outcome = 0 if the individual fails the exam
Outcome = 1 if the individual passes the exam
Predictor variable: the number of hours spent studying
Linear Probability Model (LPM): Prob(Outcome = 1) = α + β × (hours of study)
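To see why the LPM is a poor fit for a 0/1 outcome, here is a minimal Python sketch. The coefficients ALPHA and BETA are hypothetical and are not estimated from any data in these slides. Because the model is a straight line, its fitted "probability" drops below 0 for few study hours and rises above 1 for many.

```python
# Hypothetical LPM coefficients, chosen only for illustration.
ALPHA = -0.2   # intercept
BETA = 0.03    # change in Prob(Outcome = 1) per extra hour of study

def lpm_probability(hours: float) -> float:
    """Linear probability model: Prob(Outcome = 1) = alpha + beta * hours."""
    return ALPHA + BETA * hours

for hours in (0, 20, 58):
    # A straight line is not confined to the interval [0, 1].
    print(hours, round(lpm_probability(hours), 2))
```

At 0 hours the value is negative, and at 58 hours it exceeds 1. Neither can be a probability, and this is what motivates the logistic model.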

Linear Probability Models (LPM)
[Table: Student id, Outcome, Quantity of Study Hours]

4. Logistic Regression Model

Logistic Regression Curve
[Figure: S-shaped curve of probability (0.0 to 1.0) plotted against x (1 to 21)]

Logit Transformation
Logistic regression models a transformation of the probabilities called the logit:
$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right)$
where i indexes all cases (observations), $p_i$ is the probability that the event (a sale, for example) occurs in the ith case, and ln is the natural log (to base e).
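The logit and its inverse can be written in a few lines. This is a minimal Python sketch; the function names `logit` and `inverse_logit` are our own. The inverse is split into two branches so that `math.exp` never overflows for large negative inputs.

```python
import math

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z: float) -> float:
    """Map any real log-odds value back to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f}  logit = {z:+.3f}  back = {inverse_logit(z):.1f}")
```

A probability of 0.5 maps to a logit of 0. Probabilities p and 1 - p map to logits of equal size and opposite sign.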

Assumption 1

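Logistic Regression Model
$\operatorname{logit}(p_i) = b_0 + b_1 X_1$
where
$\operatorname{logit}(p_i)$ is the logit transformation of the probability of the event,
$b_0$ is the intercept of the regression line,
$b_1$ is the slope of the regression line.
The model assumes a linear relationship between the logit and $X_1$.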
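Inverting the model gives the S-shaped logistic regression curve. In this minimal Python sketch, B0 and B1 are hypothetical values picked for illustration only. The predicted probability rises monotonically and stays strictly between 0 and 1. Meanwhile, the logit rises by exactly b1 for each one-unit step in x, which is the linearity assumption above.

```python
import math

# Hypothetical coefficients, for illustration only.
B0 = -5.0  # intercept on the logit scale
B1 = 0.5   # slope on the logit scale

def predicted_probability(x: float, b0: float = B0, b1: float = B1) -> float:
    """Solve logit(p) = b0 + b1*x for p."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

xs = list(range(1, 22))
probs = [predicted_probability(x) for x in xs]
for x, p in zip(xs, probs):
    print(f"x = {x:2d}  p = {p:.3f}")
```

With these values the curve crosses p = 0.5 at x = -b0/b1 = 10.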

LOGISTIC Procedure

SAS:
PROC LOGISTIC DATA=SAS-data-set <options>;
   CLASS variables </ options>;
   MODEL response=predictors </ options>;
   OUTPUT OUT=SAS-data-set keyword=name </ options>;
RUN;

SPSS: Analyze > Regression > Binary Logistic…
Dependent: y
Covariates: x
Method: Forward: Wald
Save…: Predicted Values (Probabilities, Group membership)
Options…: CI for exp(B): 95%; Probability for Stepwise: Entry 0.1, Removal 0.15

Maximum likelihood estimation is a statistical method for estimating the coefficients of a model. It chooses the coefficients that maximize the likelihood function, the product over all cases of the probability of each observed outcome:
$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$
where $y_i = 1$ if the event occurred in case i and $y_i = 0$ otherwise.
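Below is a minimal Python sketch of maximum likelihood for a one-predictor model, maximizing ln L with Newton-Raphson steps. It is an illustration of the technique under our own assumptions and is not the code that SAS or SPSS run. The data and all function names are hypothetical. The data are deliberately not perfectly separable; with separable data the maximum likelihood estimate would not exist.

```python
import math

def predict(b0, b1, x):
    """Probability of the event under logit(p) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    """ln L: sum over cases of ln(probability of the observed outcome)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(b0, b1, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_simple_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximize ln L for logit(p) = b0 + b1*x with Newton-Raphson steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[i00, i01], [i01, i11]].
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = predict(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score, written out for a 2x2 matrix.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: hours of study and exam outcome (1 = pass, 0 = fail).
hours = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_simple_logistic(hours, passed)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"ln L at the fit: {log_likelihood(b0, b1, hours, passed):.4f}")
```

Working with ln L instead of L turns the product into a sum. This avoids numerical underflow and does not change where the maximum lies.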

SPSS Output: Odds Ratio

LPM and Logistic Regression Models
[Table: Student id, Outcome, Quantity of Study Hours]

Comparing LPM and the Logistic Curve

5. What Is an Odds Ratio?
An odds ratio indicates how much more likely, in terms of odds, a certain event is to occur in one group than in another group.
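For two groups with counts of events and non-events, the odds ratio is simply the odds in group A divided by the odds in group B. The following is a minimal Python sketch; the counts are hypothetical.

```python
def odds(events: int, non_events: int) -> float:
    """Odds of the event: events per non-event."""
    return events / non_events

def odds_ratio(a_events: int, a_non_events: int, b_events: int, b_non_events: int) -> float:
    """Odds in group A divided by odds in group B."""
    return odds(a_events, a_non_events) / odds(b_events, b_non_events)

# Hypothetical counts: group A has 30 passes and 10 failures; group B has 20 and 20.
print(odds_ratio(30, 10, 20, 20))
```

Here the odds are 3 in group A and 1 in group B, so the odds ratio is 3. Swapping the groups gives 1/3, which illustrates that values below 1 favour the other group.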

Probabilities from Odds
The odds, calculated as
$\text{odds} = \frac{p}{1 - p}$,
can be rearranged to express the probability of an event in terms of the odds:
$p = \frac{\text{odds}}{1 + \text{odds}}$
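Both directions of the conversion fit in one line each. This minimal Python sketch uses function names of our own choosing.

```python
def odds_from_probability(p: float) -> float:
    """odds = p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)

def probability_from_odds(o: float) -> float:
    """p = odds / (1 + odds), for odds >= 0."""
    return o / (1.0 + o)

p = 0.8
o = odds_from_probability(p)
print(o, probability_from_odds(o))
```

A probability of 0.8 corresponds to odds of 4 to 1, and converting those odds back recovers 0.8 (up to floating-point rounding).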

Probabilities and Odds

Probability of Outcome

Odds

Odds Ratio

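Properties of the Odds Ratio
[Figure: the odds ratio runs from 0 to ∞, with 1 meaning no association; below 1 Group B is more likely, above 1 Group A is more likely. The corresponding regression coefficient b runs from -∞ to ∞, with 0 meaning no association.]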

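Odds Ratio from a Logistic Regression Model
Estimated logistic regression model: $\operatorname{logit}(\hat{p}) = -8.469 + 0.495 \times \text{hours}$
Estimated odds ratio for each additional hour of study (from a hours to a + 1 hours):
$\text{odds ratio} = \frac{e^{-8.469 + 0.495(a+1)}}{e^{-8.469 + 0.495a}} = e^{b} = e^{0.495} = 1.640$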
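This short Python check uses the slide's estimated coefficients. It confirms numerically that the ratio of the odds at a + 1 hours to the odds at a hours is the same for every starting value a, and equals e^b. The function name `odds_at` is our own.

```python
import math

# Coefficients from the estimated model: logit(p) = -8.469 + 0.495 * hours.
B0, B1 = -8.469, 0.495

def odds_at(hours: float) -> float:
    """Odds of passing at a given number of study hours: exp(b0 + b1*hours)."""
    return math.exp(B0 + B1 * hours)

for a in (5, 17, 30):
    ratio = odds_at(a + 1) / odds_at(a)
    print(f"a = {a:2d}: odds(a+1) / odds(a) = {ratio:.3f}")
print(f"exp(b1) = {math.exp(B1):.3f}")
```

The intercept cancels in the ratio. This is why the odds ratio for a one-unit change depends only on the slope.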

6. Multiple Logistic Regression
$\operatorname{logit}(p_i) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$
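With several predictors, e^{b_1} is the odds ratio for a one-unit increase in X1 when X2 and X3 are held fixed, i.e. the adjusted odds ratio. The minimal Python sketch below checks this for a few fixed values of X2 and X3. The coefficients in COEFS are hypothetical.

```python
import math

# Hypothetical coefficients (b0, b1, b2, b3), for illustration only.
COEFS = (-3.0, 0.4, -0.2, 1.1)

def linear_predictor(x1, x2, x3, b=COEFS):
    """b0 + b1*X1 + b2*X2 + b3*X3, i.e. the predicted logit."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

def probability(x1, x2, x3, b=COEFS):
    return 1.0 / (1.0 + math.exp(-linear_predictor(x1, x2, x3, b)))

def odds(x1, x2, x3, b=COEFS):
    p = probability(x1, x2, x3, b)
    return p / (1.0 - p)

# Raise X1 from 2 to 3 while holding X2 and X3 at several fixed values.
for x2, x3 in ((0, 0), (5, 1), (10, 0)):
    ratio = odds(3, x2, x3) / odds(2, x2, x3)
    print(x2, x3, round(ratio, 4), round(math.exp(COEFS[1]), 4))
```

The ratio does not depend on the values at which X2 and X3 are held. That holds only because this model has no interaction terms.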

Backward Elimination Method

Adjusted Odds Ratio

Interaction in Multiple Logistic Regression

Interaction Plot
[Figure: predicted logit plotted against income level (low, medium, high), with separate lines for females and males]
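An interaction between sex and income means the lines for females and males in the predicted-logit plot are not parallel. This minimal Python sketch rests on two assumptions of our own: the coefficients are hypothetical, and income is coded numerically as 0, 1, 2 for low, medium, high (dummy coding would be an alternative). The product term lets the income slope differ by sex.

```python
# Hypothetical coefficients for logit = b0 + b1*male + b2*income + b3*male*income.
B0, B_MALE, B_INCOME, B_INTERACTION = -1.0, 0.5, 0.8, -0.6
INCOME_LEVELS = {"Low": 0, "Medium": 1, "High": 2}  # assumed numeric coding

def predicted_logit(male: int, income: int) -> float:
    """Predicted logit; the male*income term changes the income slope for males."""
    return B0 + B_MALE * male + B_INCOME * income + B_INTERACTION * male * income

for label, male in (("Females", 0), ("Males", 1)):
    row = {level: round(predicted_logit(male, code), 2) for level, code in INCOME_LEVELS.items()}
    print(label, row)
```

With these values the income slope is 0.8 for females and 0.8 - 0.6 = 0.2 for males. If b3 were 0, the two lines would be parallel.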

Backward Elimination Method (continued)

Multicollinearity in Multiple Logistic Regression
Multicollinearity does not bias the coefficient estimates, but it inflates their standard errors. If a variable that you expect to be statistically significant is not, check the correlation coefficients among the predictors. If two predictors are strongly correlated (for example, above 0.6, 0.7, or 0.8, depending on the cutoff you choose), try dropping the theoretically less important of the two.
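The correlation check described above can be sketched in Python with only the standard library. The predictor names, their values, and the 0.7 cutoff are hypothetical choices for illustration. The function computes the Pearson correlation for every pair of predictors and reports the pairs whose absolute correlation exceeds the cutoff.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical predictor values for six students.
predictors = {
    "hours_studied": [2, 4, 6, 8, 10, 12],
    "practice_tests": [1, 2, 2, 4, 5, 6],
    "sleep_hours": [7, 6, 8, 7, 6, 8],
}
print(flag_correlated_pairs(predictors))
```

Only hours_studied and practice_tests are flagged here. Deciding which of the two to drop remains a substantive, theory-driven choice that the code cannot make.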

Sample Sizes
The sample size should be about 15 to 20 times the number of variables.

Thanks for your attention!