
1 [Part 6] 1/26 Discrete Choice Modeling Count Data Models
Discrete Choice Modeling
William Greene, Stern School of Business, New York University
Outline: 0 Introduction, 1 Methodology, 2 Binary Choice, 3 Panel Data, 4 Bivariate Probit, 5 Ordered Choice, 6 Count Data Models, 7 Multinomial Choice, 8 Nested Logit, 9 Heterogeneity, 10 Latent Class, 11 Mixed Logit, 12 Stated Preference, 13 Hybrid Choice

2 Application: Major Derogatory Reports
AmEx credit card holders, N = 1310 (of 13,777)
Outcome: number of major derogatory reports in 1 year
Issues: nonrandom selection, excess zeros

3 Histogram for Credit Data
Histogram for MAJORDRG: NOBS = 1310, Too low: 0, Too high: 0
Rows are bins in ascending order; the bin limits and raw frequency counts are not preserved in the transcript.
Frequency (share) | Cumulative frequency (share)
(.8038) | 1053 (.8038)
(.1038) | 1189 (.9076)
(.0382) | 1239 (.9458)
(.0183) | 1263 (.9641)
(.0130) | 1280 (.9771)
(.0076) | 1290 (.9847)
(.0038) | 1295 (.9885)
(.0046) | 1301 (.9931)
(.0000) | 1301 (.9931)
(.0015) | 1303 (.9947)
(.0008) | 1304 (.9954)
(.0031) | 1308 (.9985)
(.0008) | 1309 (.9992)
(.0000) | 1309 (.9992)
(.0008) | 1310 (1.0000)

5 Doctor Visits

6 Basic Modeling for Counts of Events
E.g., visits to a site, number of purchases, number of doctor visits
Regression approach: a quantitative outcome is measured
Discrete variable: model the probabilities
Poisson probabilities: loglinear model
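The loglinear Poisson model named on this slide can be sketched in a few lines. This is a minimal illustration, not the software used for the results in these slides; the function names and the toy data are hypothetical. It assumes the standard form Prob(Y = j | x) = exp(-λ) λ^j / j! with λ = exp(β'x).

```python
import math

def poisson_mean(x, beta):
    """Loglinear conditional mean: lambda = exp(beta'x)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

def poisson_logpmf(j, lam):
    """log Prob(Y = j) = -lambda + j*log(lambda) - log(j!)."""
    return -lam + j * math.log(lam) - math.lgamma(j + 1)

def poisson_loglik(beta, X, y):
    """Sum of log probabilities over observations (rows of X, counts y)."""
    return sum(poisson_logpmf(yi, poisson_mean(xi, beta))
               for xi, yi in zip(X, y))

# Toy data (hypothetical): a constant and one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 2, 4]
print(poisson_loglik([0.2, 0.5], X, y))
```

Maximizing `poisson_loglik` over `beta` with any numerical optimizer would give the maximum likelihood estimates for the toy data.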

7 Poisson Model for Doctor Visits
Poisson Regression
Dependent variable: DOCVIS
Estimation based on N = 27326, K = 7
Chi squared [6 d.f.]; RsqP = .0818, RsqD = .0601
Overdispersion tests: g = mu(i) and g = mu(i)^2
Also reported (values not preserved in the transcript): log likelihood function, restricted log likelihood, significance level, McFadden pseudo R-squared, information criteria (AIC, normalization = 1/N), standard errors, b/St.Er., P[|Z|>z], mean of X
Variable | Coefficient
Constant | .77267***
AGE | .01763***
EDUC | ***
FEMALE | .29287***
MARRIED |
HHNINC | ***
HHKIDS | ***

8 Partial Effects
Partial derivatives of the expected value with respect to the vector of characteristics. Effects are averaged over individuals. Observations used for means are all observations.
Conditional mean at sample point and scale factor for marginal effects: values not preserved in the transcript.
Variable | Partial effect
AGE | .05613***
EDUC | ***
FEMALE | .93237***
MARRIED |
HHNINC | ***
HHKIDS | ***
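With a loglinear mean, the partial effect of a continuous x_k is ∂E[y|x]/∂x_k = λ β_k, so averaging over individuals amounts to multiplying each coefficient by the sample mean of λ_i (the "scale factor"). The sketch below illustrates that calculation; the function name and toy inputs are hypothetical. As a rough check against the slides, the reported AGE effect divided by the AGE coefficient, .05613 / .01763, is about 3.18.

```python
import math

def average_partial_effects(beta, X):
    """Return (scale, effects) where scale = mean of lambda_i = exp(beta'x_i)
    and effects[k] = scale * beta[k].

    The entry for a constant term has no interpretation as a partial effect,
    and dummy variables are treated here as if they were continuous.
    """
    lams = [math.exp(sum(b * v for b, v in zip(beta, x))) for x in X]
    scale = sum(lams) / len(lams)
    return scale, [scale * b for b in beta]

# Toy inputs (hypothetical): a constant and one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
scale, effects = average_partial_effects([0.2, 0.5], X)
print(scale, effects)
```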

9 Poisson Model Specification Issues
Equi-dispersion: $\mathrm{Var}[y_i \mid x_i] = E[y_i \mid x_i]$.
Overdispersion: if $\lambda_i = \exp(\beta' x_i + \varepsilon_i)$, then $E[y_i \mid x_i] = \gamma \exp(\beta' x_i)$ and $\mathrm{Var}[y_i] > E[y_i]$ (overdispersed).
$\varepsilon_i \sim$ log-gamma: negative binomial model
$\varepsilon_i \sim \mathrm{Normal}[0, \sigma^2]$: normal-mixture model
$\varepsilon_i$ is viewed as unobserved heterogeneity (frailty).
The normal model may be more natural. Estimation is a bit more complicated.
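The overdispersion claim on this slide follows from the variance decomposition. The derivation below is a sketch under stated assumptions: $h_i = \exp(\varepsilon_i)$ is independent of $x_i$ with $E[h_i] = \gamma$ and $\mathrm{Var}[h_i] = \omega^2$, and $\mu_i = \exp(\beta' x_i)$. The symbols $h_i$, $\mu_i$, and $\omega^2$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
y_i \mid x_i, \varepsilon_i &\sim \text{Poisson}(\mu_i h_i), \qquad
  \mu_i = \exp(\beta' x_i), \quad h_i = \exp(\varepsilon_i) \\
E[y_i \mid x_i] &= E[\mu_i h_i \mid x_i] = \gamma \mu_i \\
\mathrm{Var}[y_i \mid x_i]
  &= E\big[\mathrm{Var}[y_i \mid x_i, \varepsilon_i]\big]
   + \mathrm{Var}\big[E[y_i \mid x_i, \varepsilon_i]\big] \\
  &= \gamma \mu_i + \omega^2 \mu_i^2 \\
  &> E[y_i \mid x_i] \quad \text{whenever } \omega^2 > 0 .
\end{aligned}
```

The extra term is quadratic in $\mu_i$, which matches the form of the negative binomial variance when the heterogeneity is gamma distributed.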

12 Poisson coefficients with three sets of standard errors
The coefficients are the same in all three panels; the standard errors, b/St.Er., P[|Z|>z], and means of X are not preserved in the transcript.
Panels: (1) Standard: negative inverse of second derivatives; (2) Robust: sandwich; (3) Cluster correction
Variable | Coefficient
Constant | .77267***
AGE | .01763***
EDUC | ***
FEMALE | .29287***
MARRIED |
HHNINC | ***
HHKIDS | ***

13 Negative Binomial Specification
$\mathrm{Prob}(Y_i = j \mid x_i)$ has greater mass to the right and left of the mean.
The conditional mean function is the same as the Poisson: $E[y_i \mid x_i] = \lambda_i = \exp(\beta' x_i)$, so marginal effects have the same form.
The variance is $\mathrm{Var}[y_i \mid x_i] = \lambda_i (1 + \alpha \lambda_i)$; $\alpha$ is the overdispersion parameter, and $\alpha = 0$ reverts to the Poisson.
Poisson is consistent when NegBin is appropriate. Therefore, this is a case for the ROBUST covariance matrix estimator. (Neglected heterogeneity that is uncorrelated with $x_i$.)
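The mean and variance stated on this slide can be checked numerically. The sketch below assumes the usual NB2 parameterization with θ = 1/α and success probability r = θ/(θ + λ); the function name is hypothetical, and the parameter values are arbitrary illustrations rather than estimates from the slides.

```python
import math

def negbin2_logpmf(j, lam, alpha):
    """log Prob(Y = j) for NB2 with mean lam and variance lam*(1 + alpha*lam).

    Requires alpha > 0; the Poisson is the limit as alpha -> 0.
    """
    theta = 1.0 / alpha
    r = theta / (theta + lam)
    return (math.lgamma(theta + j) - math.lgamma(theta) - math.lgamma(j + 1)
            + theta * math.log(r) + j * math.log(1.0 - r))

lam, alpha = 2.0, 0.5  # illustrative values only
probs = [math.exp(negbin2_logpmf(j, lam, alpha)) for j in range(400)]
mean = sum(j * p for j, p in enumerate(probs))
var = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
# The numerical mean and variance should match lam and lam*(1 + alpha*lam).
print(mean, var, lam * (1 + alpha * lam))
```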

14 NegBin Model for Doctor Visits
Negative Binomial Regression
Dependent variable: DOCVIS
Log likelihood function: NegBin LogL; restricted log likelihood: Poisson LogL
Chi squared [1 d.f.]: reject the Poisson model
Estimation based on N = 27326, K = 8
NegBin form 2; Psi(i) = theta
Also reported (values not preserved in the transcript): log likelihoods, chi squared, significance level, McFadden pseudo R-squared, information criteria (AIC, normalization = 1/N), standard errors, b/St.Er., P[|Z|>z], mean of X
Variable | Coefficient
Constant | .80825***
AGE | .01806***
EDUC | ***
FEMALE | .32596***
MARRIED |
HHNINC | ***
HHKIDS | ***
Dispersion parameter for count data model
Alpha | ***

15 Marginal Effects
Scale factors for marginal effects are not preserved in the transcript.
Variable | POISSON | NEGATIVE BINOMIAL
AGE | .05613*** | .05767***
EDUC | *** | ***
FEMALE | .93237*** | ***
MARRIED | |
HHNINC | *** | ***
HHKIDS | *** | ***

18 Model Formulations: $E[y_i \mid x_i] = \lambda_i$

19 NegBin-P Model
Negative Binomial (P) Model
Dependent variable: DOCVIS
Log likelihood function, restricted log likelihood, and chi squared [1 d.f.]: values not preserved in the transcript
Variable | Coefficient
Constant | .60840***
AGE | .01710***
EDUC | ***
FEMALE | .36386***
MARRIED | .03670*
HHNINC | ***
HHKIDS | ***
Dispersion parameter for count data model
Alpha | ***
Negative Binomial, general form, NegBin P
P | ***
Labels on the slide: NB-2, NB-1, Poisson

21 Zero Inflation: ZIP Models
Two regimes (example: recreation site visits):
Regime 0: zero with probability 1 (never visit the site).
Regime 1: Poisson with $\Pr(0) = \exp(-\lambda_i)$, $\lambda_i = \exp(\beta' x_i)$ (number of visits, including zero visits this season).
Unconditional probabilities:
$\Pr[0] = P(\text{regime 0}) + P(\text{regime 1}) \Pr[0 \mid \text{regime 1}]$
$\Pr[j] = P(\text{regime 1}) \Pr[j \mid \text{regime 1}]$ for $j > 0$
"Two inflation": number of children
These are latent class models.
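The two-regime probabilities can be written out directly. The sketch below is illustrative only: `q` stands for P(regime 0), modeled here with a logistic function of an index (an assumption consistent with the logistic splitting model mentioned later in the deck), and the function names are hypothetical.

```python
import math

def logistic(v):
    """Logistic CDF, used here for the regime-splitting probability."""
    return 1.0 / (1.0 + math.exp(-v))

def zip_pmf(j, lam, q):
    """Unconditional ZIP probability of count j.

    q   = P(regime 0), the always-zero regime
    lam = Poisson mean in regime 1
    """
    pois = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    if j == 0:
        return q + (1.0 - q) * pois
    return (1.0 - q) * pois

# With a zero index the logistic split gives P(regime 0) = 1/2, not 0,
# so setting the splitting parameters to zero does not recover the Poisson.
q = logistic(0.0)
print(zip_pmf(0, 2.0, q), math.exp(-2.0))
```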

22 Zero Inflation Models

23 Notes on Zero Inflation Models
Poisson is not nested in ZIP: γ = 0 in ZIP does not produce Poisson; it produces ZIP with P(regime 0) = ½.
Standard tests are not appropriate. Use the Vuong statistic. The ZIP model almost always wins.
Zero inflation models extend to NB models; ZINB is a standard model.
This creates two sources of overdispersion.
Generally difficult to estimate.

24 ZIP Model
Zero Altered Poisson Regression Model
Logistic distribution used for the splitting model. ZAP term in probability is F[tau x Z(i)].
Comparison of estimated models (Pr[0|means], number of zeros actual and predicted, log-likelihood) for Poisson and Z.I.Poisson: values not preserved in the transcript.
Vuong statistic for testing ZIP vs. the unaltered model: distributed as standard normal. A sufficiently large positive value favors the zero altered Z.I.Poisson model; a sufficiently large negative value rejects the ZIP model. The statistic and the critical values are not preserved in the transcript.
Variable | Coefficient
Poisson/NB/Gamma regression model
Constant | ***
AGE | .01100***
EDUC | ***
FEMALE | .10943***
MARRIED | ***
HHNINC | ***
HHKIDS | ***
Zero inflation model
Constant | ***
FEMALE | ***
EDUC | .04114***

25 Marginal Effects for Different Models
Scale factors for marginal effects are not preserved in the transcript.
Variable | POISSON | NEGATIVE BINOMIAL - 2 | ZERO INFLATED POISSON
AGE | .05613*** | .05767*** | .03427***
EDUC | *** | *** | ***
FEMALE | .93237*** | *** | .97958***
MARRIED | | | ***
HHNINC | *** | *** | ***
HHKIDS | *** | *** | ***

26 A Hurdle Model
Two part model:
Model 1: probability model for more than zero occurrences
Model 2: model for the number of occurrences given that the number is greater than zero
Applications are common in health economics: usage of health care facilities; use of drugs, alcohol, etc.
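The two parts combine into a single probability function. The sketch below assumes a zero-truncated Poisson for the positive counts and takes the probability of crossing the hurdle, `p_pos`, as given (in practice it would come from a binary choice model such as a logit). The function names and numbers are hypothetical.

```python
import math

def hurdle_pmf(j, lam, p_pos):
    """Probability of count j in a Poisson hurdle model.

    p_pos = Prob(y > 0) from the binary part (Model 1)
    lam   = parameter of the zero-truncated Poisson part (Model 2)
    """
    if j == 0:
        return 1.0 - p_pos
    pois = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    return p_pos * pois / (1.0 - math.exp(-lam))

def hurdle_mean(lam, p_pos):
    """E[y] = Prob(y > 0) * E[y | y > 0]."""
    return p_pos * lam / (1.0 - math.exp(-lam))

probs = [hurdle_pmf(j, 2.0, 0.6) for j in range(100)]
print(sum(probs), sum(j * p for j, p in enumerate(probs)), hurdle_mean(2.0, 0.6))
```

Unlike the zero inflation model, every zero here comes from the binary part, so the probability of a zero does not depend on `lam`.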

27 Hurdle Model

28 Hurdle Model for Doctor Visits
Poisson hurdle model for counts
Dependent variable: DOCVIS
Estimation based on N = 27326, K = 10
LOGIT hurdle equation
Also reported (values not preserved in the transcript): log likelihood function, restricted log likelihood, chi squared [1 d.f.], significance level, McFadden pseudo R-squared, standard errors, b/St.Er., P[|Z|>z], mean of X
Variable | Coefficient
Parameters of count model equation
Constant | ***
AGE | .01088***
EDUC | ***
FEMALE | .10244***
MARRIED | ***
HHNINC | ***
HHKIDS | ***
Parameters of binary hurdle equation
Constant | .77475***
FEMALE | .59389***
EDUC | ***

29 Partial Effects
Partial derivatives of the expected value with respect to the vector of characteristics. Effects are averaged over individuals. Observations used for means are all observations.
Conditional Mean at Sample Point .0109
Scale factor for marginal effects: value not preserved in the transcript.
Variable | Partial effect
Effects in count model equation (values not preserved): Constant, AGE, EDUC, FEMALE, MARRIED, HHNINC, HHKIDS
Effects in binary hurdle equation
Constant | .86178***
FEMALE | .66060***
EDUC | ***
Combined effect is the sum of the two parts
Constant | *
EDUC | ***
FEMALE | .96915***

31 Notes on Zero Inflation Models
Poisson is not nested in ZIP: γ = 0 in ZIP does not produce Poisson; it produces ZIP with P(regime 0) = ½.
Standard tests are not appropriate. Use the Vuong statistic. The ZIP model almost always wins.
Zero inflation models extend to NB models; ZINB(tau) and ZINB are standard models.
This creates two sources of overdispersion.
Generally difficult to estimate.

32 An Unidentified ZINB Model

35 Partial Effects for Different Models

36 The Vuong Statistic for Nonnested Models
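The formula on this slide did not survive in the transcript. As a hedged sketch, the commonly used form of the statistic is V = √n · m̄ / s_m, where m_i is the difference between the two models' log-likelihood contributions for observation i. The function name and toy numbers below are hypothetical, and the use of the n - 1 sample standard deviation is an assumption (some presentations divide by n).

```python
import math
import statistics

def vuong_statistic(loglik_1, loglik_2):
    """Vuong statistic from per-observation log-likelihood contributions.

    m_i = log f1(y_i) - log f2(y_i);  V = sqrt(n) * mean(m) / stdev(m).
    Large positive values favor model 1, large negative values favor model 2.
    """
    m = [a - b for a, b in zip(loglik_1, loglik_2)]
    n = len(m)
    return math.sqrt(n) * statistics.mean(m) / statistics.stdev(m)

# Toy contributions (hypothetical), e.g. model 1 = ZIP, model 2 = Poisson.
ll_zip = [-1.0, -1.0, -1.0]
ll_poisson = [-1.1, -1.2, -1.3]
print(vuong_statistic(ll_zip, ll_poisson))
```

Under the null that the two models fit equally well, V is treated as standard normal, so values beyond the conventional ±1.96 would be read as favoring one model at the 5% level.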

46 Application of Several of the Models Discussed in this Section

48 Winkelmann finds that there is no correlation between the decisions… A significant correlation is expected … [T]he correlation comes from the way the relation between the decisions is modeled.
See also: van Ophem, H. Modeling selectivity in count data models. Journal of Business and Economic Statistics 18: 503–511.

49 Probit Participation Equation; Poisson-Normal Intensity Equation

50 Bivariate-Normal Heterogeneity in Participation and Intensity Equations; Gaussian Copula for Participation and Intensity Equations

51 Correlation between Heterogeneity Terms; Correlation between Counts

52 Panel Data Models
Heterogeneity: $\lambda_{it} = \exp(\beta' x_{it} + c_i)$
Fixed Effects
Poisson: standard, no incidental parameters issue
Negative Binomial: Hausman, Hall, Griliches (1984) put the FE in the variance, not the mean. Use brute force to get a conventional FE model.
Random Effects
Poisson: log-gamma heterogeneity becomes an NB model. Contemporary treatments use normal heterogeneity with simulation or quadrature based estimators.
NB with random effects is equivalent to two effects, one time varying and one time invariant. The model is probably overspecified.
Random Parameters: mixed models, latent class models, hierarchical; all extended to Poisson and NB
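One way to see why the fixed effects Poisson has no incidental parameters issue: conditional on a group's total count, the counts are multinomial with shares λ_it / Σ_s λ_is, and the group effect cancels from those shares. The sketch below illustrates the cancellation for a single group; the function name and data are hypothetical, and this is an illustration of the conditional likelihood rather than the estimator used for any results in these slides.

```python
import math

def fe_poisson_group_loglik(beta, X_i, y_i, c_i=0.0):
    """Conditional log-likelihood of one group's counts given their total.

    lambda_it = exp(beta'x_it + c_i); the shares lambda_it / sum_s lambda_is
    do not depend on c_i, so the result is the same for any c_i.
    """
    lam = [math.exp(c_i + sum(b * v for b, v in zip(beta, x))) for x in X_i]
    total = sum(lam)
    ll = math.lgamma(sum(y_i) + 1)  # log of (sum of counts)!
    for lam_t, y_t in zip(lam, y_i):
        ll += y_t * math.log(lam_t / total) - math.lgamma(y_t + 1)
    return ll

# Toy group (hypothetical): three periods, one time-varying regressor.
X_i = [[0.0], [1.0], [2.0]]
y_i = [1, 3, 2]
print(fe_poisson_group_loglik([0.4], X_i, y_i, c_i=0.0))
print(fe_poisson_group_loglik([0.4], X_i, y_i, c_i=3.0))
```

The two printed values agree up to rounding, which is the sense in which the group effect drops out. A time invariant regressor would cancel in the same way, so its coefficient could not be estimated from this likelihood.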

53 Poisson (log)Normal Mixture

55 A Peculiarity of the FENB Model
The true FE model has $\lambda_{it} = \exp(\alpha_i + \beta' x_{it})$. It cannot be fit if there are time invariant variables.
Hausman, Hall and Griliches (Econometrica, 1984) has $\alpha_i$ appearing in $\theta$. This produces different results and implies that the FEM can contain time invariant variables.

57 See: Allison and Waterman (2002); Guimaraes (2007); Greene, Econometric Analysis (2011)

59 Bivariate Random Effects
