Logistic Regression.

Presentation transcript:
Logistic Regression

Logistic Regression When? Just like multiple regression, but when the dependent variable is dichotomous, e.g. improved or not improved, successful or not successful. Why? Logistic regression can be used for classification purposes (it includes a chi-square test), gives the probability of an effect (outcome), and evaluates the risk (odds). Why not perform a discriminant analysis? Because the predicted probability of success can fall outside [0, 1] and the normality assumption is violated. Why not perform a multiple regression? Because the homoscedasticity assumption is violated.

Logistic Regression Example: Suppose we want to predict whether someone has a coronary disease (DV) using age in years (IV). It is customary to code a binary DV either 0 or 1.

Logistic Regression The logistic curve: P(Y = 1 | X) = e^(b0 + b1 X) / (1 + e^(b0 + b1 X)), combining a linear part (b0 + b1 X) with a nonlinear (exponential) part.

Logistic Regression The logistic curve

Logistic Regression The logistic curve: π = 1 / (1 + e^(-(b0 + b1 X))), where π is the probability of a 1, e is the base of the natural logarithm (about 2.718), and b0 and b1 are the parameters of the model. b1 adjusts how quickly the probability changes when X increases by a single unit. Because the relationship between X and π is nonlinear, b1 does not have a straightforward interpretation in this model, contrary to ordinary linear regression.
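As a sketch of this curve in code (the coefficient values below are hypothetical illustrations, not the slides' fitted estimates):

```python
import math

def logistic(x, b0, b1):
    """P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients for the age example (not fitted values):
b0, b1 = -5.3, 0.11
print(logistic(20, b0, b1))  # small probability at a young age
print(logistic(70, b0, b1))  # large probability at an older age
```

With b1 > 0 the curve rises from 0 toward 1 as age increases; wherever b0 + b1*x = 0, the probability is exactly 0.5.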

Logistic Regression (Where did it come from?) Suppose we only know a person's age and we want to predict whether that person has a coronary disease or not. We can talk about the probability of having the disease, or we can talk about the odds of having the disease. Let's say that the probability of not having the disease for a given age is .95. Then the odds of not having the disease are .95/.05 = 19. The odds of having the disease are .05/.95 = 1/19, or about 0.0526. This asymmetry is unappealing: intuitively, the odds of having the disease should be the opposite of the odds of not having the disease.

Logistic Regression (Where did it come from?) We can take care of this asymmetry by using the natural logarithm, ln. The natural log of 19 is 2.9444 (ln(0.95/0.05) = 2.9444). The natural log of 1/19 is -2.9444 (ln(0.05/0.95) = -2.9444), so the log odds of having a coronary disease are exactly the opposite of the log odds of not having the disease. In terms of odds: ln(π / (1 - π)) = b0 + b1 X. Solving for π gives the model in terms of probability: π = e^(b0 + b1 X) / (1 + e^(b0 + b1 X)).
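The odds arithmetic above can be checked directly:

```python
import math

p_no = 0.95                        # P(no disease) at a given age
odds_no = p_no / (1 - p_no)        # 0.95/0.05 = 19
odds_yes = (1 - p_no) / p_no       # 0.05/0.95 = 1/19, about 0.0526
log_odds_no = math.log(odds_no)    # 2.9444
log_odds_yes = math.log(odds_yes)  # -2.9444: exactly the opposite
print(log_odds_no, log_odds_yes)
```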

Logistic Regression Finding the regression weights: maximum log likelihood. In multiple regression, we wanted to minimize the residual sum of squares. With the logistic curve, there is no closed-form solution that will produce least squares estimates of the parameters. We will use instead the maximum (log) likelihood. A likelihood is a conditional probability, P(Y|X): the probability of the outcome Y given X. The idea is to choose the regression weights that will give the maximum (log) likelihood between the data and the logistic curve. Maximum likelihood: L = Π_i π_i^(y_i) (1 - π_i)^(1 - y_i). Maximum log likelihood: ln L = Σ_i [y_i ln(π_i) + (1 - y_i) ln(1 - π_i)].

Logistic Regression Finding the regression weights. The maximum of this expression can then be found numerically using an optimization algorithm, such as Newton-Raphson.

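A minimal sketch of the quantity being maximized, on a small invented dataset (the ages, disease indicators, and candidate weights below are made up for illustration):

```python
import math

# Toy data: age and a 0/1 coronary-disease indicator (invented values).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
chd  = [0,  0,  0,  0,  1,  0,  1,  1,  1,  1]

def log_likelihood(b0, b1):
    """ln L = sum_i [y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i)]."""
    ll = 0.0
    for x, y in zip(ages, chd):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# An optimizer searches for the (b0, b1) pair with the highest ln L;
# here we simply compare two candidate pairs.
print(log_likelihood(0.0, 0.0))    # chance model: 10 * ln(0.5)
print(log_likelihood(-8.0, 0.17))  # a much better fit for this toy data
```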

Logistic Regression Hypothesis testing. The idea is to compare the full model against the constant-only model using a chi-square statistic: G = -2(ln L_constant - ln L_full), with degrees of freedom equal to the number of predictors (here there is only 1 predictor). A significant result indicates that age can reliably distinguish people having a coronary disease from those who do not.
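In code, with hypothetical log likelihoods for the two models (not the slides' actual values):

```python
# Likelihood-ratio (chi-square) test: G = -2 * (ln L_constant - ln L_full).
ll_constant = -68.33   # constant-only model (hypothetical value)
ll_full = -53.68       # constant + age model (hypothetical value)
G = -2 * (ll_constant - ll_full)
# df = 1 because there is only 1 predictor; the 0.05 critical value is 3.841.
print(G, G > 3.841)
```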

Logistic Regression Hypothesis testing. We can use the same idea to build a regression model. Alternatively, the Wald statistic can be used (a z test): z = b / SE(b), where the standard errors are obtained from the inverse of the Fisher information matrix.

Logistic Regression Hypothesis testing. The Wald statistic is computed for each coefficient: the constant and the IV, age (predicting coronary disease).
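A sketch of the Wald z test, with a hypothetical coefficient and standard error:

```python
# Wald statistic: z = b / SE(b); the values below are hypothetical.
b, se = 0.111, 0.024
z = b / se
print(z, abs(z) > 1.96)  # |z| > 1.96 means significant at the 0.05 level
```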

Logistic Regression Explained variability There are three popular measures (pseudo-R² statistics, e.g. Cox and Snell's, Nagelkerke's, and McFadden's) that approximate the variance-explained interpretation found in linear regression (R²).

Logistic Regression Odds Ratio (OR) The odds ratio is the multiplicative increase (or decrease) in the odds of being in one outcome category when the value of the predictor increases by one unit: OR = e^b. If the odds are the same across groups, then OR = 1. If the OR is greater than 1, there is an increased probability of being classified in the category; if it is smaller than 1, there is a decreased probability of being classified in the given category. Thus, at each of my birthdays my odds of having a coronary disease are multiplied by 1.12. In other words, each year I increase the risk of developing a coronary disease by 12 percent.
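The odds ratio is just e raised to the regression weight; with the same hypothetical b as above:

```python
import math

b = 0.111                 # hypothetical age coefficient
odds_ratio = math.exp(b)  # each extra year multiplies the odds by about 1.12
pct_increase = (odds_ratio - 1) * 100
print(odds_ratio, pct_increase)  # roughly a 12 percent increase per year
```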

Logistic Regression Odds Ratio (OR) For a 5-year age difference, say, the increase is exp(b)^5 = 1.117315^5 ≈ 1.74, a 74% increase in the odds. Classification table (cutoff = 0.5): with the constant only, the total correct percentage is 57; with all predictors, it is 74.
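The 5-year figure follows from multiplying the per-year odds ratio five times, which is the same as exp(5b); the coefficient is the same hypothetical value as above:

```python
import math

b = 0.111
or_5yr = math.exp(b) ** 5   # identical to math.exp(5 * b)
print(or_5yr)               # about 1.74, i.e. a 74% increase in the odds
```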

Logistic Regression Prediction If I am (x =) 50 years old, what is my probability of having a coronary disease?
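Plugging x = 50 into the logistic curve, with hypothetical fitted weights:

```python
import math

b0, b1 = -5.31, 0.111   # hypothetical fitted intercept and age coefficient
age = 50
p = 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))
print(p)  # predicted probability of coronary disease at age 50
```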

Logistic Regression Confidence intervals CI=0.95
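A sketch of the 95% interval for b, exponentiated onto the odds-ratio scale (the same hypothetical b and SE as before):

```python
import math

b, se = 0.111, 0.024                           # hypothetical values
lo_b, hi_b = b - 1.96 * se, b + 1.96 * se      # 95% CI on the log-odds scale
lo_or, hi_or = math.exp(lo_b), math.exp(hi_b)  # 95% CI on the odds-ratio scale
print(lo_or, hi_or)  # asymmetric around exp(b), as expected
```

Exponentiating a symmetric interval produces an asymmetric one, which is why ORs are always reported with asymmetric confidence limits.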

Logistic Regression Confidence bands CI=0.95

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Contingency table

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Regression weights Wald test

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Explained variability

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Classification table Total correct percentage = 57 Total correct percentage = 72

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Odds ratio If I am 55 years old or older, my odds of having a coronary disease are about 8 times higher.

Logistic Regression Recoding a continuous variable into a dichotomous variable Cutoff at 55 Confidence intervals The 95% CI is asymmetric. It suggests that coronary disease is between 2.9 and 22.9 times more likely (in odds) to occur if I am 55 or older.
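A sketch of the full 2x2 analysis. The cell counts below are hypothetical, chosen so that the cross-product odds ratio comes out near 8 with a confidence interval close to the (2.9, 22.9) quoted on the slide:

```python
import math

# Rows: age >= 55 vs. age < 55; columns: disease vs. no disease.
a, b = 21, 6    # age >= 55: disease, no disease (hypothetical counts)
c, d = 22, 51   # age <  55: disease, no disease (hypothetical counts)

odds_ratio = (a * d) / (b * c)  # cross-product ratio, about 8.1
# Standard error of ln(OR) for a 2x2 table, then a 95% CI on the OR scale:
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(odds_ratio, lo, hi)  # OR about 8.1, CI roughly (2.9, 22.9): asymmetric
```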