Logistic Regression
Database Marketing
Instructor: N. Kumar


Logistic Regression vs. Two-Group Discriminant Analysis (TGDA)
TGDA implicitly assumes that the Xs are multivariate normally (MVN) distributed.
This assumption is violated if the Xs are categorical variables.
Logistic regression does not impose any restriction on the distribution of the Xs.
Logistic regression is the recommended approach if at least some of the Xs are categorical variables.

Data

Contingency Table

Type of Stock    Large   Small   Total
Preferred           10       2      12
Not Preferred        1      11      12
Total               11      13      24

Basic Concepts: Probability
Probability of being a preferred stock = 12/24 = 0.5
Probability that a company's stock is preferred given that the company is large = 10/11 = 0.909
Probability that a company's stock is preferred given that the company is small = 2/13 = 0.154
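As a check, these conditional probabilities can be computed directly from the table counts; a minimal Python sketch (the variable names are mine, not from the slides):

```python
# Counts from the contingency table: preferred / not preferred by company size
preferred = {"large": 10, "small": 2}
not_preferred = {"large": 1, "small": 11}

total = sum(preferred.values()) + sum(not_preferred.values())   # 24 companies
p_preferred = sum(preferred.values()) / total                   # 12/24
p_pref_given_large = preferred["large"] / (preferred["large"] + not_preferred["large"])
p_pref_given_small = preferred["small"] / (preferred["small"] + not_preferred["small"])

print(round(p_preferred, 3))         # 0.5
print(round(p_pref_given_large, 3))  # 0.909
print(round(p_pref_given_small, 3))  # 0.154
```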

Concepts (contd.): Odds
Odds of a preferred stock = 12/12 = 1
Odds of a preferred stock given that the company is large = 10/1 = 10
Odds of a preferred stock given that the company is small = 2/11 = 0.182

Odds and Probability
Odds(Event) = Prob(Event) / (1 - Prob(Event))
Prob(Event) = Odds(Event) / (1 + Odds(Event))
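These two identities are easy to encode as helper functions; a small sketch:

```python
def odds_from_prob(p):
    """Odds(Event) = Prob(Event) / (1 - Prob(Event))."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Prob(Event) = Odds(Event) / (1 + Odds(Event))."""
    return odds / (1 + odds)

print(odds_from_prob(0.5))           # 1.0  (overall odds of a preferred stock)
print(round(prob_from_odds(10), 3))  # 0.909 (odds of 10 converted back to a probability)
```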

Logistic Regression
Take the natural log of the odds:
ln(odds(Preferred | Large)) = ln(10) = 2.303
ln(odds(Preferred | Small)) = ln(2/11) = -1.705
Combining these relationships (Size = 1 for large, 0 for small):
ln(odds(Preferred | Size)) = -1.705 + 4.007*Size
The log of the odds is a linear function of size.
The coefficient of size can be interpreted like a coefficient in regression analysis.
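With Size coded 1 for large and 0 for small, the intercept and slope can be recovered directly from the two conditional odds; a quick check in Python:

```python
import math

log_odds_large = math.log(10)      # ln(odds(Preferred | Large)) = ln(10/1)
log_odds_small = math.log(2 / 11)  # ln(odds(Preferred | Small)) = ln(2/11)

b0 = log_odds_small                   # intercept: log-odds when Size = 0
b1 = log_odds_large - log_odds_small  # slope: change in log-odds as Size goes 0 -> 1

print(round(b0, 3))  # -1.705
print(round(b1, 3))  # 4.007
```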

Interpretation
A positive sign means ln(odds) is increasing in the size of the company, i.e., a large company is more likely to have a preferred stock vis-à-vis a small company.
The magnitude of the coefficient gives a measure of how much more likely.

General Model
ln(odds) = β0 + β1X1 + β2X2 + … + βkXk   (1)
Recall: odds = p/(1 - p), so
ln(p/(1 - p)) = β0 + β1X1 + β2X2 + … + βkXk   (2)
p = e^(β0 + β1X1 + … + βkXk) / (1 + e^(β0 + β1X1 + … + βkXk))
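Solving (2) for p gives the logistic function; a minimal sketch, using the coefficient values estimated later in these slides:

```python
import math

def logistic_p(z):
    """Invert ln(p/(1-p)) = z to get p = e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

# With one predictor, Size, and estimates b0 = -1.705, b1 = 4.007:
print(round(logistic_p(-1.705 + 4.007 * 1), 3))  # 0.909 (large company)
print(round(logistic_p(-1.705 + 4.007 * 0), 3))  # 0.154 (small company)
```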

Logistic Function

Estimation
In linear regression, coefficients are estimated by minimizing the sum of squared errors.
Since p is non-linear in the parameters, we need a non-linear estimation technique:
Maximum-likelihood approach
Non-linear least squares

Maximum Likelihood Approach
Conditional on the parameters β, write out the probability of observing the data.
Write this probability out for each observation.
Multiply the probabilities of the observations together to get the joint probability of observing the data, conditional on β.
Find the β that maximizes this conditional probability of realizing the data.
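The steps above can be sketched numerically. The code below reconstructs the 24 observations from the contingency table and maximizes the log-likelihood with plain gradient ascent (the step size and iteration count are my choices, not from the slides); the estimates converge to roughly -1.705 and 4.007:

```python
import math

# (size, preferred): 10 large preferred, 1 large not, 2 small preferred, 11 small not
data = [(1, 1)] * 10 + [(1, 0)] * 1 + [(0, 1)] * 2 + [(0, 0)] * 11

b0 = b1 = 0.0
for _ in range(20000):
    g0 = g1 = 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # P(preferred | size) under current betas
        g0 += y - p                             # gradient of the log-likelihood w.r.t. b0
        g1 += (y - p) * x                       # gradient w.r.t. b1
    b0 += 0.05 * g0
    b1 += 0.05 * g1

print(round(b0, 3), round(b1, 3))  # -1.705 4.007
```

In practice a statistics package (e.g. SAS or statsmodels) does this maximization with Newton-type iterations, but the concave log-likelihood means simple ascent reaches the same optimum.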

Logistic Regression
Logistic regression with one categorical explanatory variable reduces to an analysis of the contingency table.

Interpretation of Results
Look at the -2 Log L statistic: compare its value for the intercept-only model with its value for the model with intercept and covariates.
Here the difference has 1 DF (p = 0.0001).
The small p-value means that the size variable explains a great deal.

Do the Variables Have a Significant Impact?
This is like testing whether the coefficients in a regression model are different from zero.
Look at the output from the Analysis of Maximum Likelihood Estimates.
Loosely, the Pr > Chi-Square column gives the probability of realizing the value in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.

Other Things to Look For
Akaike's Information Criterion (AIC) and Schwarz's Criterion (SC) are like adjusted R-squared: there is a penalty for having additional covariates.
The larger the drop from the intercept-only value to the intercept-and-covariates value, the better the model fit.

Interpretation of the Parameter Estimates
ln(p/(1 - p)) = -1.705 + 4.007*Size
p/(1 - p) = e^(-1.705) * e^(4.007*Size)
For a unit increase in Size, the odds of being a preferred stock go up by a factor of e^4.007, approximately 55.
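The factor can be checked two ways: by exponentiating the slope, or by taking the ratio of the two conditional odds computed earlier; a quick sketch:

```python
import math

odds_ratio_from_slope = math.exp(4.007)  # e^(b1), from the rounded estimate
odds_ratio_from_table = 10 / (2 / 11)    # odds(Preferred | Large) / odds(Preferred | Small)

print(round(odds_ratio_from_slope, 2))  # 54.98
print(round(odds_ratio_from_table, 2))  # 55.0
```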

Predicted Probabilities and Observed Responses
The response variable (success) classifies an observation as an event or a no-event.
A concordant pair is a pair, formed by one event and one no-event, in which the event has the higher predicted probability (PHAT).
The higher the concordant-pair percentage, the better.
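The concordant-pair percentage can be computed by brute force over all event/no-event pairs; a sketch using the fitted probabilities for this data set (0.909 for large companies, 0.154 for small):

```python
# PHATs and outcomes for the 24 companies:
# first the 11 large (10 preferred, 1 not), then the 13 small (2 preferred, 11 not)
phats  = [0.909] * 11 + [0.154] * 13
events = [1] * 10 + [0] * 1 + [1] * 2 + [0] * 11   # preferred = 1 (event)

event_p    = [p for p, e in zip(phats, events) if e == 1]
nonevent_p = [p for p, e in zip(phats, events) if e == 0]

# A pair is concordant when the event's PHAT strictly exceeds the no-event's
concordant = sum(1 for pe in event_p for pn in nonevent_p if pe > pn)
total_pairs = len(event_p) * len(nonevent_p)
print(round(100 * concordant / total_pairs, 1))  # 76.4
```

Ties (an event and a no-event with equal PHATs) are counted neither as concordant nor discordant here.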

Classification
For a set of new observations where you have information on size alone, you can use the model to predict the probability that success = 1, i.e., that the stock is preferred.
If PHAT > 0.5, classify the observation as success = 1; else success = 2.
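A minimal version of this rule, keeping the slides' 1/2 coding for the predicted class:

```python
def classify(phat, threshold=0.5):
    """Predict success = 1 (preferred) when PHAT exceeds the threshold, else 2."""
    return 1 if phat > threshold else 2

print(classify(0.909))  # 1 -> predicted preferred (large company)
print(classify(0.154))  # 2 -> predicted not preferred (small company)
```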

Logistic Regression with Multiple Independent Variables
The independent variables are a mixture of continuous and categorical variables.

Data

General Model
ln(odds) = β0 + β1*Size + β2*FP
ln(p/(1 - p)) = β0 + β1*Size + β2*FP
p = e^(β0 + β1*Size + β2*FP) / (1 + e^(β0 + β1*Size + β2*FP))

Estimation & Interpretation of the Results Identical to the case with one categorical variable

Summary
Logistic regression and discriminant analysis differ in their underlying assumptions about the distribution of the explanatory (independent) variables.
Use logistic regression if you have a mix of categorical and continuous variables.