
1 Logistic Regression Database Marketing Instructor: N. Kumar

2 Logistic Regression vs. Two-Group Discriminant Analysis (TGDA)
Two-Group Discriminant Analysis implicitly assumes that the Xs are multivariate normally (MVN) distributed.
This assumption is violated if the Xs are categorical variables.
Logistic regression does not impose any restriction on the distribution of the Xs.
Logistic regression is the recommended approach if at least some of the Xs are categorical variables.

3 Data

4 Contingency Table

Type of Stock     Large   Small   Total
Preferred            10       2      12
Not Preferred         1      11      12
Total                11      13      24

5 Basic Concepts: Probability
Probability of being a preferred stock = 12/24 = 0.5
Probability that a company's stock is preferred given that the company is large = 10/11 = 0.909
Probability that a company's stock is preferred given that the company is small = 2/13 = 0.154

6 Concepts (contd.): Odds
Odds of a preferred stock = 12/12 = 1
Odds of a preferred stock given that the company is large = 10/1 = 10
Odds of a preferred stock given that the company is small = 2/11 = 0.182

7 Odds and Probability
Odds(Event) = Prob(Event) / (1 - Prob(Event))
Prob(Event) = Odds(Event) / (1 + Odds(Event))
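
These two relationships can be checked directly against the counts in the contingency table above. A minimal Python sketch (the counts come from slide 4; the variable names are only illustrative):

```python
# Counts from the contingency table (slide 4)
preferred_large, preferred_small = 10, 2
not_pref_large, not_pref_small = 1, 11

# Probability and odds of a preferred stock overall
p_preferred = (preferred_large + preferred_small) / 24                       # 12/24 = 0.5
odds_preferred = p_preferred / (1 - p_preferred)                             # 0.5/0.5 = 1

# Conditional on company size
p_pref_given_large = preferred_large / (preferred_large + not_pref_large)    # 10/11
odds_pref_given_large = p_pref_given_large / (1 - p_pref_given_large)        # 10/1 = 10

p_pref_given_small = preferred_small / (preferred_small + not_pref_small)    # 2/13
odds_pref_given_small = p_pref_given_small / (1 - p_pref_given_small)        # 2/11 = 0.182

# Converting back: Prob = Odds / (1 + Odds)
assert abs(odds_pref_given_large / (1 + odds_pref_given_large) - p_pref_given_large) < 1e-12
```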

8 Logistic Regression
Take the natural log of the odds:
ln(odds(Preferred|Large)) = ln(10) = 2.303
ln(odds(Preferred|Small)) = ln(0.182) = -1.704
Combining these relationships (coding Size = 1 for a large company and Size = 0 for a small one):
ln(odds(Preferred|Size)) = -1.704 + 4.007*Size
The log of the odds is a linear function of size.
The coefficient of size can be interpreted like a coefficient in regression analysis.
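
A short sketch of how the intercept and slope on this slide fall out of the two conditional odds (values taken from the slides above):

```python
import numpy as np

odds_large = 10 / 1    # odds of a preferred stock for large companies
odds_small = 2 / 11    # odds of a preferred stock for small companies

log_odds_large = np.log(odds_large)        # approximately  2.303
log_odds_small = np.log(odds_small)        # approximately -1.704

# With Size coded 1 = large, 0 = small:
intercept = log_odds_small                 # approximately -1.704
slope = log_odds_large - log_odds_small    # approximately  4.007

print(f"ln(odds) = {intercept:.3f} + {slope:.3f} * Size")
```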

9 Interpretation
A positive sign means that ln(odds) is increasing in the size of the company, i.e., a large company is more likely to have a preferred stock vis-à-vis a small company.
The magnitude of the coefficient gives a measure of how much more likely.

10 General Model
ln(odds) = β0 + β1*X1 + β2*X2 + … + βk*Xk   (1)
Recall: Odds = p/(1-p), so
ln(p/(1-p)) = β0 + β1*X1 + β2*X2 + … + βk*Xk   (2)
p = exp(β0 + β1*X1 + … + βk*Xk) / (1 + exp(β0 + β1*X1 + … + βk*Xk))

11 Logistic Function
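
The original slide shows a plot of the S-shaped logistic curve. A minimal sketch of the function it depicts, mapping the linear predictor to a probability between 0 and 1:

```python
import numpy as np

def logistic(z):
    """Logistic function: maps the linear predictor z = b0 + b1*x1 + ... to a probability in (0, 1)."""
    return np.exp(z) / (1 + np.exp(z))

# Tabulate the curve over a small grid of z values
z = np.linspace(-6, 6, 13)
for zi, pi in zip(z, logistic(z)):
    print(f"z = {zi:5.1f}  ->  p = {pi:.3f}")
```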

12 Estimation
In linear regression, the coefficients are estimated by minimizing the sum of squared errors.
Since p is non-linear in the parameters, we need a non-linear estimation technique:
Maximum-Likelihood Approach
Non-Linear Least Squares
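
As a concrete illustration, one common way to obtain maximum-likelihood estimates in Python is the statsmodels package. The arrays below simply reconstruct the 24 companies from the contingency table on slide 4; the variable names are illustrative, not part of the original slides:

```python
import numpy as np
import statsmodels.api as sm

# 24 companies: Size = 1 (large) / 0 (small), y = 1 if the stock is preferred
size = np.array([1] * 11 + [0] * 13)
y = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

X = sm.add_constant(size)          # adds the intercept column
result = sm.Logit(y, X).fit()      # maximum-likelihood estimation
print(result.params)               # approximately [-1.704, 4.007]
```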

13 Maximum Likelihood Approach
Conditional on the parameters β, write out the probability of observing the data.
Write this probability out for each observation.
Multiply the probabilities of the individual observations to get the joint probability of observing the data conditional on β.
Find the β that maximizes this conditional probability of realizing the data.
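
A minimal sketch of this recipe, writing the likelihood by hand and maximizing it numerically. The data arrays reconstruct the contingency table as above, and scipy's general-purpose optimizer is used here only as one of several possible ways to do the maximization:

```python
import numpy as np
from scipy.optimize import minimize

size = np.array([1] * 11 + [0] * 13)
y = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

def neg_log_likelihood(beta):
    # Probability of a preferred stock for each company, conditional on beta
    z = beta[0] + beta[1] * size
    p = 1 / (1 + np.exp(-z))
    # Joint probability of the observed outcomes = product of the individual probabilities;
    # take logs and sum, then negate so the optimizer can minimize
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(fit.x)   # approximately [-1.704, 4.007]
```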

14 Logistic Regression
Logistic regression with one categorical explanatory variable reduces to an analysis of the contingency table.

15 Interpretation of Results
Look at the -2 Log L statistic:
Intercept only: 33.271
Intercept and covariates: 17.864
Difference: 15.407 with 1 DF (p = 0.0001)
This means the size variable has substantial explanatory power.
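
The difference reported here is a likelihood-ratio test: the drop in -2 Log L when the covariate is added is compared against a chi-square distribution. A small sketch using the numbers on this slide:

```python
from scipy.stats import chi2

neg2logl_intercept_only = 33.271
neg2logl_with_covariates = 17.864

lr_statistic = neg2logl_intercept_only - neg2logl_with_covariates   # 15.407
df = 1                                                               # one added covariate (Size)
p_value = chi2.sf(lr_statistic, df)                                  # roughly 0.0001
print(f"LR statistic = {lr_statistic:.3f}, p = {p_value:.4f}")
```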

16 Do the Variables Have a Significant Impact?
This is like testing whether the coefficients in a regression model are different from zero.
Look at the output from Analysis of Maximum Likelihood Estimates.
Loosely, the column Pr > Chi-Square gives the probability of realizing the value in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.

17 Other Things to Look For
Akaike's Information Criterion (AIC) and Schwarz's Criterion (SC): these are like adjusted R-squared in that there is a penalty for having additional covariates.
The larger the difference between the second and third columns of the fit statistics (Intercept Only vs. Intercept and Covariates), the better the model fit.
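
Both criteria are simple adjustments of -2 Log L. A sketch assuming the usual definitions AIC = -2 ln L + 2k and SC = -2 ln L + k ln n, with the -2 Log L value from slide 15:

```python
import numpy as np

neg2logl = 17.864   # intercept and covariates (from slide 15)
k = 2               # estimated parameters: intercept + Size coefficient
n = 24              # number of companies

aic = neg2logl + 2 * k            # Akaike's Information Criterion
sc = neg2logl + k * np.log(n)     # Schwarz's Criterion
print(f"AIC = {aic:.3f}, SC = {sc:.3f}")
```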

18 Interpretation of the Parameter Estimates
ln(p/(1-p)) = -1.705 + 4.007*Size
p/(1-p) = e^(-1.705) * e^(4.007*Size)
For a unit increase in Size, the odds of being a preferred stock go up by a factor of e^4.007 = 54.982.
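
A quick check of the odds-ratio interpretation, using the coefficients above:

```python
import numpy as np

intercept, slope = -1.705, 4.007

odds_small = np.exp(intercept)           # odds of a preferred stock when Size = 0
odds_large = np.exp(intercept + slope)   # odds of a preferred stock when Size = 1
odds_ratio = odds_large / odds_small     # equals exp(slope), roughly 54.982
print(f"odds ratio = {odds_ratio:.3f}")
```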

19 Predicted Probabilities and Observed Responses
The response variable (success) classifies an observation into an event or a no-event.
A concordant pair is a pair consisting of one event and one no-event in which the event has the higher predicted probability (PHAT).
The higher the percentage of concordant pairs, the better.
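
A minimal sketch of how the concordant percentage could be computed from predicted probabilities. The arrays y and phat here are made-up illustrative values, not output from any particular package:

```python
import numpy as np

def concordant_percentage(y, phat):
    """Compare every (event, no-event) pair and count the pairs where the event has the higher PHAT."""
    event_phat = phat[y == 1]
    noevent_phat = phat[y == 0]
    pairs = len(event_phat) * len(noevent_phat)
    concordant = sum(e > ne for e in event_phat for ne in noevent_phat)
    return 100 * concordant / pairs

y = np.array([1, 1, 0, 0, 1, 0])
phat = np.array([0.9, 0.7, 0.4, 0.2, 0.3, 0.6])
print(f"{concordant_percentage(y, phat):.1f}% concordant")
```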

20 Classification
For a set of new observations where you have information on Size alone,
you can use the model to predict the probability that success = 1, i.e., that the stock is preferred.
If PHAT > 0.5, classify as success = 1; else success = 2.
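
A sketch of that classification rule applied to a few new companies, reusing the fitted coefficients; the 0.5 cutoff and the 1/2 coding follow the slide, while the new_size values are illustrative:

```python
import numpy as np

intercept, slope = -1.704, 4.007

new_size = np.array([1, 0, 1, 0])                  # Size for four new companies
phat = 1 / (1 + np.exp(-(intercept + slope * new_size)))
predicted = np.where(phat > 0.5, 1, 2)             # 1 = preferred, 2 = not preferred
print(phat.round(3), predicted)
```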

21 Logistic Regression with Multiple Independent Variables
The independent variables are a mixture of continuous and categorical variables.

22 Data

23 General Model
ln(odds) = β0 + β1*Size + β2*FP
ln(p/(1-p)) = β0 + β1*Size + β2*FP
p = exp(β0 + β1*Size + β2*FP) / (1 + exp(β0 + β1*Size + β2*FP))
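
Fitting this two-covariate model looks just like the single-covariate case. A hedged sketch with statsmodels, where size, fp, and y are illustrative arrays (the underlying data slide is not reproduced in this transcript):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: Size is a 0/1 dummy, FP is a continuous financial-performance measure
size = np.array([1, 1, 0, 0, 1, 0, 1, 0])
fp = np.array([2.1, 1.8, 0.5, 0.9, 2.4, 0.3, 1.1, 0.7])
y = np.array([1, 0, 0, 1, 1, 0, 1, 0])

X = sm.add_constant(np.column_stack([size, fp]))   # intercept, Size, FP
result = sm.Logit(y, X).fit()
print(result.params)                               # beta0, beta1 (Size), beta2 (FP)
```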

24 Estimation & Interpretation of the Results
Identical to the case with one categorical variable.

25 Summary
Logistic regression or discriminant analysis? The two techniques differ in their underlying assumptions about the distribution of the explanatory (independent) variables.
Use logistic regression if you have a mix of categorical and continuous variables.

