Discrete Choice Modeling
  William Greene
  Stern School of Business, New York University
  Lab Sessions
Lab Session 4: Bivariate Extensions of the Probit Model
Bivariate Probit Model
  Two-equation model. General usage:
    LHS = the set of dependent variables
    RH1 = one set of independent variables
    RH2 = a second set of independent variables
  Namelists keep the commands compact:
    Namelist ; x1 = one,age,female,educ,married,working $
    Namelist ; x2 = one,age,female,hhninc,hhkids $
    BivariateProbit ; lhs = doctor,hospital
                    ; rh1 = x1
                    ; rh2 = x2
                    ; marginal effects $
Heteroscedasticity in the Bivariate Probit Model
  General form of heteroscedasticity in LIMDEP/NLOGIT: exponential
    σ_i = σ exp(γ'z_i), so that σ_i > 0
    γ = 0 returns the homoscedastic case, σ_i = σ
  Easy to specify:
    Namelist ; x1 = one,age,female,educ,married,working ; z1 = … $
    Namelist ; x2 = one,age,female,hhninc,hhkids ; z2 = … $
    BivariateProbit ; lhs = doctor,hospital
                    ; rh1 = x1 ; hf1 = z1
                    ; rh2 = x2 ; hf2 = z2 $
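  As a rough illustration of the exponential form in a single probit equation, the sketch below evaluates Prob(y=1) = Φ(β'x / exp(γ'z)), assuming σ is normalized to 1 as is usual for probit. The function names and all numbers are hypothetical and are not part of LIMDEP/NLOGIT.

```python
import math


def norm_cdf(t):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def het_probit_prob(beta, x, gamma, z):
    """Prob(y=1 | x, z) with sigma_i = exp(gamma'z); sigma is normalized to 1."""
    index = sum(b * xi for b, xi in zip(beta, x))
    sigma_i = math.exp(sum(g * zi for g, zi in zip(gamma, z)))  # always > 0
    return norm_cdf(index / sigma_i)


# Hypothetical coefficients and data: x = (one, age), z = (age)
beta = [0.5, 0.02]
x = [1.0, 40.0]
z = [40.0]

p_homo = het_probit_prob(beta, x, [0.0], z)   # gamma = 0: ordinary probit
p_het = het_probit_prob(beta, x, [0.01], z)   # gamma != 0: index is rescaled
print(p_homo, p_het)
```

  Setting gamma to zero reproduces the homoscedastic probability, which is the check used in the first call.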
Heteroscedasticity in Marginal Effects
  Univariate case: a variable can affect the probability through the index (x) and through the variance (z).
  If a variable appears in both x and z, the two terms are added.
  The sign and magnitude of the total are ambiguous.
  Vastly more complicated for the bivariate probit case. NLOGIT handles it internally.
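  A sketch of the univariate derivatives, assuming the heteroscedastic probit form Prob(y=1|x,z) = Φ(β'x / exp(γ'z)) with σ normalized to 1. The symbol a_i is introduced here only for compactness.

```latex
\[
a_i = \frac{\beta' x_i}{\exp(\gamma' z_i)}, \qquad
\frac{\partial \Phi(a_i)}{\partial x_i} = \phi(a_i)\,\frac{\beta}{\exp(\gamma' z_i)}, \qquad
\frac{\partial \Phi(a_i)}{\partial z_i} = -\,\phi(a_i)\,a_i\,\gamma .
\]
\[
\text{For a variable } w_k \text{ that appears in both } x \text{ and } z:\quad
\frac{\partial \Phi(a_i)}{\partial w_{ik}}
= \phi(a_i)\,\frac{\beta_k - (\beta' x_i)\,\gamma_k}{\exp(\gamma' z_i)} .
\]
```

  The second term carries the sign of −(β'x_i)γ_k, so the total can have either sign even when β_k is positive.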
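Marginal Effects: Heteroscedasticity
  +--------------------------------------------------------+
  | Partial Effects for Ey1|y2=1                           |
  +----------+---------------------+-----------------------+
  |          | Regression Function | Heteroscedasticity    |
  |          | Direct   | Indirect | Direct    | Indirect  |
  | Variable | Efct x1  | Efct x2  | Efct h1   | Efct h2   |
  +----------+----------+----------+-----------+-----------+
  | AGE      |          |          |           |           |
  | FEMALE   |          |          |           |           |
  | EDUC     |          |          |           |           |
  | MARRIED  |          |          |           |           |
  | WORKING  |          |          |           |           |
  | HHNINC   |          |          |           |           |
  | HHKIDS   |          |          |           |           |
  +----------+----------+----------+-----------+-----------+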
Marginal Effects: Total Effects
  +-------------------------------------------+
  | Partial derivatives of E[y1|y2=1] with    |
  | respect to the vector of characteristics. |
  | They are computed at the means of the Xs. |
  | Effect shown is total of 4 parts above.   |
  | Estimate of E[y1|y2=1] =                  |
  | Observations used for means are All Obs.  |
  | Total effects reported = direct+indirect. |
  +-------------------------------------------+
  |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
   Constant   (Fixed Parameter)
   AGE
   FEMALE
   EDUC
   MARRIED
   WORKING
   HHNINC
   HHKIDS
Imposing Fixed Value and Equality Constraints
  Used throughout LIMDEP. In every model, the parameters appear as one long list:
    β1 β2 β3 β4 α1 α2 α3 α4 σ and so on, M parameters in total.
  Use ; RST = list of symbols for the model parameters, in the right order
    ; RST may be used for nonlinear models. It is not available in REGRESS.
    Use ; CLS: … for linear models
  Use the same name for parameters that are constrained to be equal
  Use specific numbers to fix values
    BivariateProbit ; lhs = doctor,hospital
                    ; rh1 = one,age,female,educ,married,working
                    ; rh2 = one,age,female,hhninc,hhkids
                    ; rst = beta1,beta2,beta3,be,bm,bw,
                            beta1,beta2,beta3,bi,bk, 0.4 $
  Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
          |Index equation for DOCTOR
  Constant| ***
       AGE|.01244***
    FEMALE|.38543***
      EDUC|.08144***
   MARRIED|.42021***
   WORKING|
          |Index equation for HOSPITAL
  Constant| ***
       AGE|.01244***
    FEMALE|.38543***
    HHNINC| ***
    HHKIDS| **
          |Disturbance correlation
  RHO(1,2)| (Fixed Parameter)
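  To make the ; RST logic concrete, the sketch below expands a restriction list into the full parameter vector: repeated names share one free value and numbers are held fixed. The helper expand_rst and the free values are hypothetical illustrations, not LIMDEP code and not estimates from the model above.

```python
def expand_rst(spec, free_values):
    """Map an RST-style list onto the full parameter vector.

    spec        -- symbols in model order; a numeric string fixes that parameter
    free_values -- dict giving one value per distinct symbol name
    """
    full = []
    for token in spec:
        try:
            full.append(float(token))          # a number: fixed value
        except ValueError:
            full.append(free_values[token])    # a name: shared free parameter
    return full


# The restriction list from the command above (6 + 5 slopes, then rho)
spec = ["beta1", "beta2", "beta3", "be", "bm", "bw",
        "beta1", "beta2", "beta3", "bi", "bk", "0.4"]

# Distinct names are the parameters that are actually estimated
free_names = []
for s in spec:
    try:
        float(s)
    except ValueError:
        if s not in free_names:
            free_names.append(s)

# Placeholder values only, chosen to make the mapping visible
free_values = {name: 0.1 * (k + 1) for k, name in enumerate(free_names)}
full = expand_rst(spec, free_values)
print(len(free_names), "free parameters fill", len(full), "positions")
```

  With this list, 12 model parameters are driven by 8 free values, and the last position (the disturbance correlation) stays at 0.4.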
Miscellaneous Topics
  Two Step Estimation
  Robust (Sandwich) Covariance Matrix
  Matrix Algebra – Testing for Normality
Two Step Estimation
Murphy and Topel
  This can usually be programmed easily using the model commands, CREATE, CALC and MATRIX.
  Several leading cases are built in.
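  For reference, a sketch of the Murphy and Topel corrected covariance matrix for the second-step estimator, in the form usually given for two-step maximum likelihood. The notation is an assumption made here: θ1 is estimated in step 1 from ln f_i1, θ2 in step 2 from ln f_i2 (which depends on θ1), and V1 and V2 are the uncorrected estimated covariance matrices from the two steps.

```latex
\[
\hat V_2^{*} \;=\; \hat V_2 \;+\; \hat V_2
\left[\, \hat C \hat V_1 \hat C' \;-\; \hat R \hat V_1 \hat C' \;-\; \hat C \hat V_1 \hat R' \,\right]
\hat V_2 ,
\]
\[
\hat C = \sum_{i=1}^{n}
\frac{\partial \ln f_{i2}}{\partial \hat\theta_2}\,
\frac{\partial \ln f_{i2}}{\partial \hat\theta_1'} ,
\qquad
\hat R = \sum_{i=1}^{n}
\frac{\partial \ln f_{i2}}{\partial \hat\theta_2}\,
\frac{\partial \ln f_{i1}}{\partial \hat\theta_1'} .
\]
```

  Each piece is a sum over observations of products of derivatives, which is why it can be assembled from CREATE (the derivatives) and MATRIX (the sums and products).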
Two Step Estimation: Automated
Application: Recursive Probit
    Hospital = bh'xh + c*Doctor + eh
    Doctor   = bd'xd + ed
    Sample ; All $
    Namelist ; xD = one,age,female,educ,married,working
             ; xH = one,age,female,hhninc,hhkids $
    Reject ; _Groupti < 7 $
    Probit ; lhs = hospital ; rhs = xh,doctor $
    Probit ; lhs = doctor ; rhs = xd ; prob = pd ; hold $
    Probit ; lhs = hospital ; rhs = xh,pd ; 2step = pd $
Robust Covariance Matrix
  ; ROBUST
  Using the health care data:
  | Binomial Probit Model |
  |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
          |Index function for probability
  Constant| ***
       AGE|.01393***
    FEMALE|.32097***
      EDUC| ***
   MARRIED|
   WORKING| ***
  Robust VC=<H>G<H> used for estimates.
  Constant| ***
       AGE|.01393***
    FEMALE|.32097***
      EDUC| ***
   MARRIED|
   WORKING| ***
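  A sketch of the usual sandwich form behind a robust covariance matrix, assuming H is built from the second derivatives of the log likelihood and G from the outer products of the first derivatives. The probit expressions use q_i = 2y_i − 1, a notation introduced here.

```latex
\[
\hat V_{\text{robust}} = \hat H^{-1}\,\hat G\,\hat H^{-1},
\qquad
\hat H = -\sum_{i=1}^{n} \frac{\partial^{2} \ln L_i}{\partial\beta\,\partial\beta'},
\qquad
\hat G = \sum_{i=1}^{n} g_i\, g_i' ,
\quad g_i = \frac{\partial \ln L_i}{\partial \beta}.
\]
\[
\text{Probit: } g_i = \lambda_i\, x_i ,
\qquad
\lambda_i = \frac{q_i\,\phi(q_i \beta' x_i)}{\Phi(q_i \beta' x_i)},
\qquad q_i = 2y_i - 1 .
\]
```

  The coefficients are unchanged by ; ROBUST. Only the standard errors and the statistics computed from them differ between the two panels.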
Cluster Correction
    PROBIT ; Lhs = doctor
           ; Rhs = one,age,female,educ,married,working
           ; Cluster = ID $
  Normal exit: 4 iterations. Status=0. F=
  | Covariance matrix for the model is adjusted for data clustering. |
  | Sample of observations contained 7293 clusters defined by        |
  | variable ID which identifies by a value a cluster ID.            |
  Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
          |Index function for probability
  Constant| **
       AGE|.01393***
    FEMALE|.32097***
      EDUC| ***
   MARRIED|
   WORKING| ***
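  A minimal sketch of the idea behind a cluster correction, assuming the middle matrix of the sandwich is built from score vectors that are first summed within each cluster. The function clustered_meat and the tiny data set are hypothetical, and any finite-sample scaling that LIMDEP applies is not shown.

```python
from collections import defaultdict


def clustered_meat(scores, cluster_ids):
    """Sum of outer products of within-cluster score sums.

    scores      -- list of per-observation score vectors g_i (lists of floats)
    cluster_ids -- list of cluster labels, one per observation
    """
    k = len(scores[0])
    sums = defaultdict(lambda: [0.0] * k)
    for g, cid in zip(scores, cluster_ids):
        row = sums[cid]
        for j in range(k):
            row[j] += g[j]              # accumulate g_i within the cluster
    meat = [[0.0] * k for _ in range(k)]
    for s in sums.values():
        for a in range(k):
            for b in range(k):
                meat[a][b] += s[a] * s[b]   # outer product of the cluster sum
    return meat


# Hypothetical scores for three observations; the first two share a cluster
scores = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
by_cluster = clustered_meat(scores, [1, 1, 2])
by_observation = clustered_meat(scores, [1, 2, 3])  # every obs its own cluster
print(by_cluster, by_observation)
```

  When every observation is its own cluster, the result collapses to the ordinary sum of g_i g_i'. Grouping changes the matrix only through the cross products within a cluster.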
Using Matrix Algebra
  Namelists with the current sample serve 2 major functions:
    (1) Define lists of variables for model estimation
    (2) Define the columns of matrices built from the data
    NAMELIST ; X = a list ; Z = a list … $
  Set the sample any way you like. Observations are now the rows of all matrices.
  When the sample changes, the matrices change.
  Lists may be anything: they may contain ONE, may overlap (some or all variables), and may contain the same variable(s) more than once.
Matrix Functions
  Matrix product:
    MATRIX ; XZ = X'Z $
  Moments and inverse:
    MATRIX ; XPX = X'X ; InvXPX = <X'X> $
  Moments with individual specific weights in variable w:
    Σ_i w_i x_i x_i' = X'[w]X
    [Σ_i w_i x_i x_i']^-1 = <X'[w]X>
  Unweighted sum of rows in a matrix:
    Σ_i x_i = 1'X
  Column of sample means:
    (1/n) Σ_i x_i = 1/n * X'1 or MEAN(X)
    (MEAN is a matrix function. There are over 100 others.)
  Weighted sum of rows in a matrix:
    Σ_i w_i x_i = 1'[w]X
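  The sums behind these matrix expressions can be spelled out directly. The sketch below computes the weighted moment matrix, the weighted row sum and the column means with plain Python lists. The function names and the small data set are hypothetical and only mirror the notation above.

```python
def weighted_moments(X, w):
    """Sum over i of w_i * x_i x_i'  (the quantity written X'[w]X)."""
    k = len(X[0])
    M = [[0.0] * k for _ in range(k)]
    for xi, wi in zip(X, w):
        for a in range(k):
            for b in range(k):
                M[a][b] += wi * xi[a] * xi[b]
    return M


def weighted_row_sum(X, w):
    """Sum over i of w_i * x_i  (the quantity written 1'[w]X)."""
    k = len(X[0])
    return [sum(wi * xi[j] for xi, wi in zip(X, w)) for j in range(k)]


def column_means(X):
    """(1/n) times the sum over i of x_i, one mean per column."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


# Hypothetical data: two observations on (one, x)
X = [[1.0, 2.0], [1.0, 4.0]]
ones = [1.0, 1.0]

print(weighted_moments(X, ones))         # unit weights give the plain X'X
print(weighted_row_sum(X, [2.0, 0.0]))   # weights pick out and scale rows
print(column_means(X))
```

  With unit weights the weighted forms reduce to the unweighted ones, which is a quick way to check a weighted computation.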
Normality Test for Probit Thanks to Joachim Wilde, Univ. Halle, Germany for suggesting this.
Normality Test for Probit
    NAMELIST ; XI = One,... $
    CREATE ; yi = the dependent variable $
    PROBIT ; Lhs = yi ; Rhs = Xi ; Prob = Pfi $
    CREATE ; bxi = b'Xi ; fi = N01(bxi) $
    CREATE ; zi3 = -1/2*(bxi^2 - 1) ; zi4 = 1/4*(bxi*(bxi^2+3)) $
    NAMELIST ; Zi = Xi,zi3,zi4 $
    CREATE ; di = fi/sqr(pfi*(1-pfi)) ; ei = yi - pfi ; eidi = ei*di ; di2 = di*di $
    MATRIX ; List ; LM = 1'[eidi]Zi * <Zi'[di2]Zi> * Zi'[eidi]1 $
Multivariate Probit
    MPROBIT ; LHS = y1,y2, …,yM
            ; Eq1 = RHS for equation 1
            ; Eq2 = RHS for equation 2
            …
            ; EqM = RHS for equation M $
  Parameters are the slope vectors followed by the lower triangle of the correlation matrix.
Constrained Panel Probit
    Sample ; $
    MPROBIT ; LHS = IP84, IP85, IP86
            ; MarginalEffects
            ; Eq1 = One,Fdium84,SP84
            ; Eq2 = One,Fdium85,SP85
            ; Eq3 = One,Fdium86,SP86
            ; Rst = b1,b2,b3,b1,b2,b3,b1,b2,b3,r45, r46, r56
            ; Maxit = 3 ; Pts = 15 $   (Reduces time to compute)
Estimated Multivariate Probit
  | Multivariate Probit Model: 3 equations.   |
  | Number of observations               1270 |
  | Log likelihood function                   |
  | Number of parameters                    6 |
  | Replications for simulated probs. =    15 |
  |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
  Index function for IP84
    Constant
    FDIUM
    SP
  Index function for IP85
    Constant
    FDIUM
    SP
  Index function for IP86
    Constant
    FDIUM
    SP
  Correlation coefficients
    R(01,02)
    R(01,03)
    R(02,03)
Endogenous Variable in Probit Model
    PROBIT ; Lhs = y1, y2
           ; Rh1 = rhs for the probit model,y2
           ; Rh2 = exogenous variables for y2 $
    SAMPLE ; All $
    CREATE ; GoodHlth = Hsat > 5 $
    PROBIT ; Lhs = GoodHlth,Hhninc
           ; Rh1 = One,Female,Hhninc
           ; Rh2 = One,Age,Educ $