Discrete Choice Modeling William Greene Stern School of Business IFS at UCL February 11-13, 2004


1 Discrete Choice Modeling William Greene Stern School of Business IFS at UCL February 11-13, 2004

2 Part 3 Modeling Binary Choice

3 A Model for Binary Choice
• Yes or No decision (Buy/Not buy)
• Example: choose to fly or not to fly to a destination when there are alternatives.
• Model: net utility of flying, U_fly = α + β1Cost + β2Time + γIncome + ε
• Choose to fly if net utility is positive
• Data: X = [1, cost, terminal time], Z = [income]; y = 1 if the traveler chooses to fly (U_fly > 0), y = 0 if not.

4 What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N) Are the attributes “relevant?” Predicting behavior - Individual - Aggregate Analyze changes in behavior when attributes change

5 Application  210 Commuters Between Sydney and Melbourne  Available modes = Air, Train, Bus, Car  Observed: Choice Attributes: Cost, terminal time, other Characteristics: Household income  First application: Fly or other

6 Binary Choice Data Choose Air Gen.Cost Term Time Income

7 An Econometric Model
• Choose to fly iff U_fly > 0, where U_fly = α + β1Cost + β2Time + γIncome + ε
• U_fly > 0 ⇔ ε > -(α + β1Cost + β2Time + γIncome)
• Probability model: for any person observed by the analyst, Prob(fly) = Prob[ε > -(α + β1Cost + β2Time + γIncome)]
• Note the relationship between the unobserved ε and the outcome
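The probability statement on this slide can be evaluated directly once a distribution is chosen for ε. If ε is symmetric about zero with CDF F (an assumption here; it holds for the normal and logistic cases), then Prob[ε > -w] = F(w). The sketch below uses a standard normal ε. The function names, coefficient values and traveler attributes are hypothetical and serve only as an illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_fly(cost, time, income, alpha, beta1, beta2, gamma, cdf=normal_cdf):
    """Prob(fly) = Prob[eps > -index] = F(index) when eps is symmetric about 0."""
    index = alpha + beta1 * cost + beta2 * time + gamma * income
    return cdf(index)

# Hypothetical coefficients and traveler (not estimates from the slides).
p = prob_fly(cost=100.0, time=60.0, income=40.0,
             alpha=0.5, beta1=-0.01, beta2=-0.02, gamma=0.03)
print(f"Prob(fly) = {p:.4f}")
```

With these made-up numbers the index is negative, so the predicted probability of flying falls below one half.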

8 [Figure; axis label: α + β1Cost + β2Time + γIncome]

9 Econometrics
• How to estimate α, β1, β2, γ? It’s not regression. The technique is maximum likelihood.
  Prob[y=1] = Prob[ε > -(α + β1Cost + β2Time + γIncome)]
  Prob[y=0] = 1 - Prob[y=1]
• Requires a model for the probability
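A minimal sketch of what maximum likelihood means here: the log likelihood sums log Prob[y=1] over the observed 1s and log Prob[y=0] over the observed 0s, and the estimates are the parameter values that maximize it. The example assumes a logistic distribution, a single regressor, synthetic data, and plain gradient ascent; none of these choices come from the slides, and statistical packages typically use Newton-type methods instead.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(params, xs, ys):
    """Sum of log Prob[y=1] for the 1s and log Prob[y=0] for the 0s."""
    a, b = params
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(a + b * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logit(xs, ys, steps=2000, rate=0.5):
    """Maximize the logit log likelihood by gradient ascent on the mean gradient."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - logistic_cdf(a + b * x) for x, y in zip(xs, ys)]
        a += rate * sum(residuals) / n
        b += rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return a, b

# Synthetic sample generated with hypothetical true values a = -0.5, b = 1.0.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if random.random() < logistic_cdf(-0.5 + 1.0 * x) else 0 for x in xs]

a_hat, b_hat = fit_logit(xs, ys)
print("estimates:", a_hat, b_hat)
print("log L at estimates:", log_likelihood((a_hat, b_hat), xs, ys))
print("log L at zero:     ", log_likelihood((0.0, 0.0), xs, ys))
```

The fitted log likelihood should exceed the value at the starting point, since the logit log likelihood is concave and each small step moves uphill.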

10 Completing the Model: F(ε)
• The distribution
  Normal: PROBIT, natural for behavior
  Logistic: LOGIT, allows “thicker tails”
  Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice
• Does it matter?
  Yes: large differences in the estimates
  Not much: quantities of interest are more stable
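The three candidate distributions can be compared numerically. The sketch below writes out their CDFs, assuming the standard normal, the standard logistic, and the type I extreme value form exp(-exp(-z)) for the Gompertz case; the slide does not state the formulas, so these standardizations are assumptions. The logistic has a larger variance than the standard normal (π²/3 versus 1), which is one reason coefficient estimates differ in scale across models.

```python
import math

def probit_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def extreme_value_cdf(z):
    """Type I extreme value CDF, exp(-exp(-z)) (assumed form)."""
    return math.exp(-math.exp(-z))

print(f"{'z':>5} {'probit':>8} {'logit':>8} {'ext.val':>8}")
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{z:5.1f} {probit_cdf(z):8.4f} {logit_cdf(z):8.4f} "
          f"{extreme_value_cdf(z):8.4f}")
```

The probit and logit columns are symmetric around 0.5 at z = 0, while the extreme value column is not, and the logit puts more probability in the tails than the probit.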

11

12 Estimated Binary Choice Model
Binomial Probit Model
Maximum Likelihood Estimates
Model estimated: Jan 20, 2004 at 04:08:11PM
Dependent variable: MODE
Weighting variable: None
Number of observations: 210
Iterations completed: 6
Log likelihood function:
Restricted log likelihood:
Chi squared:
Degrees of freedom: 3
Prob[ChiSqd > value] =
Hosmer-Lemeshow chi-squared =
P-value= with deg.fr. = 8
Variable | Coefficient | Standard Error | b/St.Er. | P[|Z|>z] | Mean of X
Index function for probability: Constant, GC, TTME, HINC

13 Estimated Binary Choice Models LOGIT PROBIT EXTREME VALUE Variable Estimate t-ratio Estimate t-ratio Estimate t-ratio Constant GC TTME HINC Log-L Log-L(0)

14 α + β1Cost + β2Time + γ(Income+1): effect on predicted probability of an increase in income (γ is positive)

15 How Well Does the Model Fit?
• There is no R squared
• “Fit measures” computed from log L: “pseudo R squared” = 1 - logL/logL0, and others. These do not measure fit.
• Direct assessment of the effectiveness of the model at predicting the outcome

16 Fit Measures for Binary Choice
• Likelihood Ratio Index: bounded by 0 and 1; rises when the model is expanded
• Cramer (and others)

17 Fit Measures for the Logit Model | Fit Measures for Binomial Choice Model | | Probit model for variable MODE | | Proportions P0= P1= | | N = 210 N0= 152 N1= 58 | | LogL = LogL0 = | | Estrella = 1-(L/L0)^(-2L0/n) = | | Efron | McFadden | Ben./Lerman | | | | | | Cramer | Veall/Zim. | Rsqrd_ML | | | | | | Information Akaike I.C. Schwarz I.C. | | Criteria | Pseudo – R-squared
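The restricted log likelihood can be recovered from the counts N0 = 152 and N1 = 58 reported above, because a constant-only model assigns every observation the sample proportion P1 = N1/N. The sketch below computes it and then a likelihood ratio index of the form 1 - logL/logL0. The fitted log likelihood used here is a hypothetical placeholder, since the transcript does not preserve the estimated value.

```python
import math

def restricted_log_likelihood(n0: int, n1: int) -> float:
    """Log likelihood of a constant-only model, where Prob[y=1] = n1 / (n0 + n1)."""
    n = n0 + n1
    p1 = n1 / n
    return n1 * math.log(p1) + n0 * math.log(1.0 - p1)

def likelihood_ratio_index(log_l: float, log_l0: float) -> float:
    """Likelihood ratio index (a pseudo R squared): 1 - logL / logL0."""
    return 1.0 - log_l / log_l0

log_l0 = restricted_log_likelihood(n0=152, n1=58)  # counts from the slide
hypothetical_log_l = -80.0                         # placeholder, not from the slides

lri = likelihood_ratio_index(hypothetical_log_l, log_l0)
print(f"logL0 = {log_l0:.3f}")
print(f"LRI with hypothetical logL = {lri:.3f}")
```

Because a model with regressors cannot fit worse than the constant-only model, logL lies between logL0 and 0, which keeps the index between 0 and 1.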

18 Predicting the Outcome
• Predicted probabilities: P = F(a + b1Cost + b2Time + cIncome)
• Predicting outcomes: predict y=1 if P is large; use 0.5 for “large” (more likely than not)
• Count successes and failures

19 Individual Predictions from a Logit Model Observation Observed Y Predicted Y Residual x(i)b Pr[Y=1] Note two types of errors and two types of successes.

20 Predictions in Binary Choice Predict y = 1 if P > P* Success depends on the assumed P*
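A small sketch of the prediction rule, showing how the number of correct predictions changes with the threshold P*. The probabilities and outcomes are made-up illustrative values, not data from the application, and the function names are hypothetical.

```python
def predict(probabilities, threshold=0.5):
    """Predict y = 1 when the fitted probability exceeds the threshold P*."""
    return [1 if p > threshold else 0 for p in probabilities]

def count_correct(actual, predicted):
    """Number of observations whose predicted outcome matches the actual one."""
    return sum(1 for y, y_hat in zip(actual, predicted) if y == y_hat)

# Hypothetical fitted probabilities and observed outcomes.
probabilities = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]
actual = [0, 0, 1, 0, 1, 1]

for p_star in (0.4, 0.5, 0.6):
    hits = count_correct(actual, predict(probabilities, p_star))
    print(f"P* = {p_star}: {hits} of {len(actual)} correct")
```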

21 ROC Curve
• Plot % of actual y=1 correctly predicted vs. % of actual y=0 incorrectly predicted as y=1
• The 45° line is no fit. Curvature implies fit.
• Area under the curve compares models
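A minimal sketch of how an ROC curve and its area can be computed: sweep the threshold over the fitted probabilities, record the share of actual 1s predicted as 1 against the share of actual 0s predicted as 1, and integrate with the trapezoid rule. The data are hypothetical, and the rule `p >= t` is used at each candidate threshold so that every observation is eventually classified as 1.

```python
def roc_points(actual, probabilities):
    """(false positive rate, true positive rate) pairs, one per threshold."""
    n1 = sum(actual)
    n0 = len(actual) - n1
    points = [(0.0, 0.0)]
    for t in sorted(set(probabilities), reverse=True):
        tp = sum(1 for y, p in zip(actual, probabilities) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(actual, probabilities) if p >= t and y == 0)
        points.append((fp / n0, tp / n1))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical fitted probabilities and observed outcomes.
probabilities = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]
actual = [0, 0, 1, 0, 1, 1]

points = roc_points(actual, probabilities)
auc = area_under_curve(points)
print(points)
print(f"area under the curve = {auc:.4f}")
```

An area of 0.5 corresponds to the 45° line; values closer to 1 indicate that actual 1s tend to receive higher fitted probabilities than actual 0s.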

22

23 Aggregate Predictions Frequencies of actual & predicted outcomes Predicted outcome has maximum probability. Threshold value for predicting Y=1 =.5000 Predicted Actual 0 1 | Total | | Total | 210

24 Analyzing Predictions
Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 is P*. (This table can be computed with any P*.)

           Predicted
Actual     0          1          | Total
0          N(a0,p0)   N(a0,p1)   | N(a0)
1          N(a1,p0)   N(a1,p1)   | N(a1)
Total      N(p0)      N(p1)      | N
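The table of actual against predicted outcomes can be built for any P* with a few lines of code. This is a sketch on hypothetical data; the dictionary keys `(a, p)` mirror the slide's N(a,p) notation, with `a` the actual outcome and `p` the predicted one.

```python
def prediction_table(actual, probabilities, threshold=0.5):
    """Counts N(a, p) of actual outcome a and predicted outcome p."""
    table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for y, prob in zip(actual, probabilities):
        predicted = 1 if prob > threshold else 0
        table[(y, predicted)] += 1
    return table

def print_table(table):
    """Print the 2x2 table with row and column totals."""
    print("Actual      0     1 | Total")
    for a in (0, 1):
        row = [table[(a, 0)], table[(a, 1)]]
        print(f"{a:<7} {row[0]:5d} {row[1]:5d} | {sum(row):5d}")
    col0 = table[(0, 0)] + table[(1, 0)]
    col1 = table[(0, 1)] + table[(1, 1)]
    print(f"Total   {col0:5d} {col1:5d} | {col0 + col1:5d}")

# Hypothetical fitted probabilities and observed outcomes.
probabilities = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]
actual = [0, 0, 1, 0, 1, 1]

table = prediction_table(actual, probabilities, threshold=0.5)
print_table(table)
```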

25 Analyzing Predictions - Success
• Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58)=65.5%]
• Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152)=99.3%]
• Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39)=97.4%]
• Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171)=88.3%]
• Correct prediction = % actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N % [100(151+38)/210=90.0%]
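The success measures follow mechanically from the four cell counts. The sketch below uses the counts that appear in the slide's bracketed arithmetic: N(a0,p0) = 151, N(a0,p1) = 1, N(a1,p0) = 20, N(a1,p1) = 38. The function and key names are hypothetical.

```python
def success_measures(n_a0p0, n_a0p1, n_a1p0, n_a1p1):
    """Success percentages from the 2x2 table of actual vs. predicted outcomes."""
    n_a0 = n_a0p0 + n_a0p1          # actual 0s
    n_a1 = n_a1p0 + n_a1p1          # actual 1s
    n_p0 = n_a0p0 + n_a1p0          # predicted 0s
    n_p1 = n_a0p1 + n_a1p1          # predicted 1s
    n = n_a0 + n_a1
    return {
        "sensitivity": 100.0 * n_a1p1 / n_a1,
        "specificity": 100.0 * n_a0p0 / n_a0,
        "positive predictive value": 100.0 * n_a1p1 / n_p1,
        "negative predictive value": 100.0 * n_a0p0 / n_p0,
        "correct prediction": 100.0 * (n_a1p1 + n_a0p0) / n,
    }

measures = success_measures(n_a0p0=151, n_a0p1=1, n_a1p0=20, n_a1p1=38)
for name, value in measures.items():
    print(f"{name}: {value:.1f}%")
```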

26 Analyzing Predictions - Failures
• False positive for true negative = % actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152)=0.658%]
• False negative for true positive = % actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/58)=34.5%]
• False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39)=2.56%]
• False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171)=11.7%]
• False predictions = % actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N % [100(1+20)/210=10.0%]
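Each failure measure is the complement of a success measure computed on the same row or column of the table, so the percentages can be recomputed from the four cell counts used in the bracketed arithmetic (151, 1, 20, 38). The sketch below does this; the function and key names are hypothetical.

```python
def failure_measures(n_a0p0, n_a0p1, n_a1p0, n_a1p1):
    """Failure percentages from the 2x2 table of actual vs. predicted outcomes."""
    n_a0 = n_a0p0 + n_a0p1          # actual 0s
    n_a1 = n_a1p0 + n_a1p1          # actual 1s
    n_p0 = n_a0p0 + n_a1p0          # predicted 0s
    n_p1 = n_a0p1 + n_a1p1          # predicted 1s
    n = n_a0 + n_a1
    return {
        "false positive for true negative": 100.0 * n_a0p1 / n_a0,
        "false negative for true positive": 100.0 * n_a1p0 / n_a1,
        "false positive for predicted positive": 100.0 * n_a0p1 / n_p1,
        "false negative for predicted negative": 100.0 * n_a1p0 / n_p0,
        "false predictions": 100.0 * (n_a0p1 + n_a1p0) / n,
    }

failures = failure_measures(n_a0p0=151, n_a0p1=1, n_a1p0=20, n_a1p1=38)
for name, value in failures.items():
    print(f"{name}: {value:.2f}%")

# Complement check: % actual 0s predicted as 1s = 100 - specificity.
specificity = 100.0 * 151 / 152
print(abs(failures["false positive for true negative"] - (100.0 - specificity)) < 1e-9)
```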

27 Aggregate Prediction is a Useful Way to Assess the Importance of a Variable
Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000
[Two prediction tables, each with total 210: model fit without TTME; model fit with TTME]

