
1 Logistic Regression Database Marketing Instructor: N. Kumar

2 Logistic Regression vs. Two-Group Discriminant Analysis (TGDA)
Two-Group Discriminant Analysis implicitly assumes that the Xs are multivariate normally (MVN) distributed.
This assumption is violated if the Xs are categorical variables.
Logistic regression does not impose any restriction on the distribution of the Xs.
Logistic regression is the recommended approach if at least some of the Xs are categorical variables.

3 Data

4 Contingency Table

Type of Stock     Large   Small   Total
Preferred            10       2      12
Not Preferred         1      11      12
Total                11      13      24

5 Basic Concepts: Probability
Probability of being a preferred stock = 12/24 = 0.5
Probability that a company's stock is preferred given that the company is large = 10/11 = 0.909
Probability that a company's stock is preferred given that the company is small = 2/13 = 0.154

6 Concepts (contd.): Odds
Odds of a preferred stock = 12/12 = 1
Odds of a preferred stock given that the company is large = 10/1 = 10
Odds of a preferred stock given that the company is small = 2/11 = 0.182

7 Odds and Probability
Odds(Event) = Prob(Event) / (1 - Prob(Event))
Prob(Event) = Odds(Event) / (1 + Odds(Event))
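
These two relationships can be checked directly against the counts in the contingency table above. A minimal Python sketch (the counts come from slide 4; the variable names are only illustrative):

```python
# Counts from the contingency table (slide 4)
preferred_large, preferred_small = 10, 2
not_pref_large, not_pref_small = 1, 11

# Probability and odds of a preferred stock overall
p_preferred = (preferred_large + preferred_small) / 24                       # 12/24 = 0.5
odds_preferred = p_preferred / (1 - p_preferred)                             # 0.5/0.5 = 1

# Conditional on company size
p_pref_given_large = preferred_large / (preferred_large + not_pref_large)    # 10/11
odds_pref_given_large = p_pref_given_large / (1 - p_pref_given_large)        # 10/1 = 10

p_pref_given_small = preferred_small / (preferred_small + not_pref_small)    # 2/13
odds_pref_given_small = p_pref_given_small / (1 - p_pref_given_small)        # 2/11 = 0.182

# Converting back: Prob = Odds / (1 + Odds)
assert abs(odds_pref_given_large / (1 + odds_pref_given_large) - p_pref_given_large) < 1e-12
```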

8 Logistic Regression
Take the natural log of the odds:
ln(odds(Preferred|Large)) = ln(10) = 2.303
ln(odds(Preferred|Small)) = ln(0.182) = -1.704
Combining these relationships (coding Size = 1 for a large company and Size = 0 for a small one):
ln(odds(Preferred|Size)) = -1.704 + 4.007*Size
The log of the odds is a linear function of size.
The coefficient of size can be interpreted like a coefficient in regression analysis.
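
A short sketch of how the intercept and slope on this slide fall out of the two conditional odds (values taken from the slides above):

```python
import numpy as np

odds_large = 10 / 1    # odds of a preferred stock for large companies
odds_small = 2 / 11    # odds of a preferred stock for small companies

log_odds_large = np.log(odds_large)        # approximately  2.303
log_odds_small = np.log(odds_small)        # approximately -1.704

# With Size coded 1 = large, 0 = small:
intercept = log_odds_small                 # approximately -1.704
slope = log_odds_large - log_odds_small    # approximately  4.007

print(f"ln(odds) = {intercept:.3f} + {slope:.3f} * Size")
```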

9 Interpretation
A positive sign means that ln(odds) is increasing in the size of the company, i.e., a large company is more likely to have a preferred stock vis-à-vis a small company.
The magnitude of the coefficient gives a measure of how much more likely.

10 General Model
ln(odds) = β0 + β1*X1 + β2*X2 + … + βk*Xk   (1)
Recall: Odds = p/(1-p), so
ln(p/(1-p)) = β0 + β1*X1 + β2*X2 + … + βk*Xk   (2)
p = exp(β0 + β1*X1 + … + βk*Xk) / (1 + exp(β0 + β1*X1 + … + βk*Xk))

11 Logistic Function
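
The original slide shows a plot of the S-shaped logistic curve. A minimal sketch of the function it depicts, mapping the linear predictor to a probability between 0 and 1:

```python
import numpy as np

def logistic(z):
    """Logistic function: maps the linear predictor z = b0 + b1*x1 + ... to a probability in (0, 1)."""
    return np.exp(z) / (1 + np.exp(z))

# Tabulate the curve over a small grid of z values
z = np.linspace(-6, 6, 13)
for zi, pi in zip(z, logistic(z)):
    print(f"z = {zi:5.1f}  ->  p = {pi:.3f}")
```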

12 Estimation
In linear regression, the coefficients are estimated by minimizing the sum of squared errors.
Since p is non-linear in the parameters, we need a non-linear estimation technique:
Maximum-Likelihood Approach
Non-Linear Least Squares
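
As a concrete illustration, one common way to obtain maximum-likelihood estimates in Python is the statsmodels package. The arrays below simply reconstruct the 24 companies from the contingency table on slide 4; the variable names are illustrative, not part of the original slides:

```python
import numpy as np
import statsmodels.api as sm

# 24 companies: Size = 1 (large) / 0 (small), y = 1 if the stock is preferred
size = np.array([1] * 11 + [0] * 13)
y = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

X = sm.add_constant(size)          # adds the intercept column
result = sm.Logit(y, X).fit()      # maximum-likelihood estimation
print(result.params)               # approximately [-1.704, 4.007]
```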

13 Maximum Likelihood Approach
Conditional on the parameters β, write out the probability of observing the data.
Write this probability out for each observation.
Multiply the probabilities of the individual observations to get the joint probability of observing the data conditional on β.
Find the β that maximizes this conditional probability of realizing the data.
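
A minimal sketch of this recipe, writing the likelihood by hand and maximizing it numerically. The data arrays reconstruct the contingency table as above, and scipy's general-purpose optimizer is used here only as one of several possible ways to do the maximization:

```python
import numpy as np
from scipy.optimize import minimize

size = np.array([1] * 11 + [0] * 13)
y = np.array([1] * 10 + [0] * 1 + [1] * 2 + [0] * 11)

def neg_log_likelihood(beta):
    # Probability of a preferred stock for each company, conditional on beta
    z = beta[0] + beta[1] * size
    p = 1 / (1 + np.exp(-z))
    # Joint probability of the observed outcomes = product of the individual probabilities;
    # take logs and sum, then negate so the optimizer can minimize
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(fit.x)   # approximately [-1.704, 4.007]
```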

14 Logistic Regression
Logistic regression with one categorical explanatory variable reduces to an analysis of the contingency table.

15 Interpretation of Results
Look at the -2 Log L statistic:
Intercept only: 33.271
Intercept and covariates: 17.864
Difference: 15.407 with 1 DF (p = 0.0001)
This means the size variable has substantial explanatory power.
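
The difference reported here is a likelihood-ratio test: the drop in -2 Log L when the covariate is added is compared against a chi-square distribution. A small sketch using the numbers on this slide:

```python
from scipy.stats import chi2

neg2logl_intercept_only = 33.271
neg2logl_with_covariates = 17.864

lr_statistic = neg2logl_intercept_only - neg2logl_with_covariates   # 15.407
df = 1                                                               # one added covariate (Size)
p_value = chi2.sf(lr_statistic, df)                                  # roughly 0.0001
print(f"LR statistic = {lr_statistic:.3f}, p = {p_value:.4f}")
```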

16 Do the Variables Have a Significant Impact?
This is like testing whether the coefficients in a regression model are different from zero.
Look at the output from Analysis of Maximum Likelihood Estimates.
Loosely, the column Pr > Chi-Square gives the probability of realizing the value in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant.

17 Other Things to Look For
Akaike's Information Criterion (AIC) and Schwarz's Criterion (SC): these are like adjusted R-squared in that there is a penalty for having additional covariates.
The larger the difference between the second and third columns of the fit statistics (Intercept Only vs. Intercept and Covariates), the better the model fit.
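
Both criteria are simple adjustments of -2 Log L. A sketch assuming the usual definitions AIC = -2 ln L + 2k and SC = -2 ln L + k ln n, with the -2 Log L value from slide 15:

```python
import numpy as np

neg2logl = 17.864   # intercept and covariates (from slide 15)
k = 2               # estimated parameters: intercept + Size coefficient
n = 24              # number of companies

aic = neg2logl + 2 * k            # Akaike's Information Criterion
sc = neg2logl + k * np.log(n)     # Schwarz's Criterion
print(f"AIC = {aic:.3f}, SC = {sc:.3f}")
```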

18 Interpretation of the Parameter Estimates
ln(p/(1-p)) = -1.705 + 4.007*Size
p/(1-p) = e^(-1.705) * e^(4.007*Size)
For a unit increase in Size, the odds of being a preferred stock go up by a factor of e^4.007 = 54.982.
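
A quick check of the odds-ratio interpretation, using the coefficients above:

```python
import numpy as np

intercept, slope = -1.705, 4.007

odds_small = np.exp(intercept)           # odds of a preferred stock when Size = 0
odds_large = np.exp(intercept + slope)   # odds of a preferred stock when Size = 1
odds_ratio = odds_large / odds_small     # equals exp(slope), roughly 54.982
print(f"odds ratio = {odds_ratio:.3f}")
```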

19 Predicted Probabilities and Observed Responses
The response variable (success) classifies an observation into an event or a no-event.
A concordant pair is a pair consisting of one event and one no-event in which the event has the higher predicted probability (PHAT).
The higher the percentage of concordant pairs, the better.
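
A minimal sketch of how the concordant percentage could be computed from predicted probabilities. The arrays y and phat here are made-up illustrative values, not output from any particular package:

```python
import numpy as np

def concordant_percentage(y, phat):
    """Compare every (event, no-event) pair and count the pairs where the event has the higher PHAT."""
    event_phat = phat[y == 1]
    noevent_phat = phat[y == 0]
    pairs = len(event_phat) * len(noevent_phat)
    concordant = sum(e > ne for e in event_phat for ne in noevent_phat)
    return 100 * concordant / pairs

y = np.array([1, 1, 0, 0, 1, 0])
phat = np.array([0.9, 0.7, 0.4, 0.2, 0.3, 0.6])
print(f"{concordant_percentage(y, phat):.1f}% concordant")
```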

20 Classification
For a set of new observations where you have information on Size alone,
you can use the model to predict the probability that success = 1, i.e., that the stock is preferred.
If PHAT > 0.5, classify as success = 1; else success = 2.
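
A sketch of that classification rule applied to a few new companies, reusing the fitted coefficients; the 0.5 cutoff and the 1/2 coding follow the slide, while the new_size values are illustrative:

```python
import numpy as np

intercept, slope = -1.704, 4.007

new_size = np.array([1, 0, 1, 0])                  # Size for four new companies
phat = 1 / (1 + np.exp(-(intercept + slope * new_size)))
predicted = np.where(phat > 0.5, 1, 2)             # 1 = preferred, 2 = not preferred
print(phat.round(3), predicted)
```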

21 Logistic Regression with Multiple Independent Variables
The independent variables are a mixture of continuous and categorical variables.

22 Data

23 General Model
ln(odds) = β0 + β1*Size + β2*FP
ln(p/(1-p)) = β0 + β1*Size + β2*FP
p = exp(β0 + β1*Size + β2*FP) / (1 + exp(β0 + β1*Size + β2*FP))
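
Fitting this two-covariate model looks just like the single-covariate case. A hedged sketch with statsmodels, where size, fp, and y are illustrative arrays (the underlying data slide is not reproduced in this transcript):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: Size is a 0/1 dummy, FP is a continuous financial-performance measure
size = np.array([1, 1, 0, 0, 1, 0, 1, 0])
fp = np.array([2.1, 1.8, 0.5, 0.9, 2.4, 0.3, 1.1, 0.7])
y = np.array([1, 0, 0, 1, 1, 0, 1, 0])

X = sm.add_constant(np.column_stack([size, fp]))   # intercept, Size, FP
result = sm.Logit(y, X).fit()
print(result.params)                               # beta0, beta1 (Size), beta2 (FP)
```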

24 Estimation & Interpretation of the Results
Identical to the case with one categorical variable.

25 Summary
Logistic regression or discriminant analysis? The two techniques differ in their underlying assumptions about the distribution of the explanatory (independent) variables.
Use logistic regression if you have a mix of categorical and continuous variables.

