Ordered probit models.


Presentation on theme: "Ordered probit models."— Presentation transcript:

1 Ordered probit models

2 Ordered Probit
Many discrete outcomes are responses to questions that have a natural ordering but no quantitative interpretation.
Examples:
Self reported health status (excellent, very good, good, fair, poor)
Do you agree with the following statement? (strongly agree, agree, disagree, strongly disagree)

3 Can use the same type of model as in the previous section to analyze these outcomes
Another ‘latent variable’ model Key to the model: there is a monotonic ordering of the qualitative responses

4 Self reported health status
Excellent, very good, good, fair, poor
Coded as 1, 2, 3, 4, 5 on the National Health Interview Survey
We will code as 5, 4, 3, 2, 1 (easier to think of this way)
Asked on every major health survey
Important predictor of health outcomes, e.g. mortality
Key question: what predicts health status?

5 Important to note: the numbers 1-5 mean nothing in terms of their value; they are just an ordering from lowest to highest.
The example below is easily adapted to include categorical variables with any number of outcomes

6 Model
yi* = latent index of reported health
The latent index measures your own scale of health. As yi* crosses successive thresholds, your report moves from poor to fair, then good, then very good, then excellent health

7 yi = (1,2,3,4,5) for (poor, fair, good, very good, excellent)
Interval decision rule
yi=1 if yi* ≤ u1
yi=2 if u1 < yi* ≤ u2
yi=3 if u2 < yi* ≤ u3
yi=4 if u3 < yi* ≤ u4
yi=5 if yi* > u4
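The interval decision rule can be sketched in a few lines of Python. This is only an illustration: the threshold values in CUTS and the function name observed_category are hypothetical and do not come from the presentation.

```python
from bisect import bisect_left

# Hypothetical thresholds u1 < u2 < u3 < u4, chosen only for illustration.
CUTS = [-1.5, -0.5, 0.5, 1.5]

def observed_category(y_star, cuts=CUTS):
    """Map a latent index y* to the reported category 1..len(cuts)+1.

    bisect_left counts the thresholds strictly below y*, so a value exactly
    equal to a threshold falls in the lower category, matching the "<= u"
    side of the interval rule.
    """
    return bisect_left(cuts, y_star) + 1

for y_star in (-2.0, -1.5, 0.0, 1.5, 3.0):
    print(y_star, observed_category(y_star))
```

A latent value that sits exactly on a threshold is assigned to the lower category, which is why bisect_left is used rather than bisect_right.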

8 As with logit and probit models, we will assume yi* is a function of observed and unobserved variables
yi* = β0 + x1i β1 + x2i β2 …. xki βk + εi
yi* = xi β + εi
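To see how the latent index and the interval rule generate ordered responses, one can simulate the model. The sketch below is an assumption-laden toy: the coefficients BETA0 and BETA_EDUC, the thresholds in CUTS, and the range of education values are all made up for illustration and are not estimates from the survey.

```python
import random
from bisect import bisect_left

rng = random.Random(0)  # fixed seed so repeated runs give the same draws

BETA0, BETA_EDUC = -1.0, 0.1    # hypothetical coefficients
CUTS = [-1.5, -0.5, 0.5, 1.5]   # hypothetical thresholds u1..u4

def simulate_person(educ):
    """Draw y* = b0 + b1*educ + eps with eps ~ N(0,1), then report a category."""
    y_star = BETA0 + BETA_EDUC * educ + rng.gauss(0.0, 1.0)
    # Number of thresholds strictly below y*, plus one, is the category.
    return bisect_left(CUTS, y_star) + 1

sample = [simulate_person(educ=rng.randint(8, 18)) for _ in range(1000)]
counts = {k: sample.count(k) for k in range(1, 6)}
print(counts)
```

Only the category is kept for each simulated person, mirroring the fact that the survey records the reported health status and never yi* itself.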

9 The threshold values (u1, u2, u3, u4) are unknown
We do not know the value of the index necessary to push you from very good to excellent.
In theory, the threshold values are different for everyone.
The computer will estimate not only the β’s but also the thresholds, which are averages across people.

10 As with probit and logit, the model will be determined by the assumed distribution of ε
In practice, most people pick the normal, generating an ‘ordered probit’ (I have no idea why)
We will generate the math for the probit version

11 Probabilities
Let's do the two end categories, Pr(yi=1) and Pr(yi=5), first
Pr(yi=1) = Pr(yi* ≤ u1)
= Pr(xi β + εi ≤ u1)
= Pr(εi ≤ u1 - xi β)
= Φ[u1 - xi β]
= 1 - Φ[xi β – u1]

12 Pr(yi=5) = Pr(yi* > u4) = Pr(xi β +εi > u4 ) =Pr(εi > u4 - xi β) = 1 - Φ[u4 - xi β] = Φ[xi β – u4]

13 Sample one for y=3
Pr(yi=3) = Pr(u2 < yi* ≤ u3)
= Pr(yi* ≤ u3) – Pr(yi* ≤ u2)
= Pr(xi β + εi ≤ u3) – Pr(xi β + εi ≤ u2)
= Pr(εi ≤ u3 - xi β) – Pr(εi ≤ u2 - xi β)
= Φ[u3 - xi β] – Φ[u2 - xi β]
= 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2]
= Φ[xi β - u2] – Φ[xi β - u3]

14 Summary
Pr(yi=1) = 1 - Φ[xi β – u1]
Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2]
Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3]
Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4]
Pr(yi=5) = Φ[xi β – u4]
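The five summary probabilities can be computed together, which also makes it easy to confirm that they sum to one. The following is a minimal sketch using Python's standard library; the function name outcome_probs, the index value 0.3, and the thresholds are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def outcome_probs(xb, cuts):
    """Return [Pr(y=1), ..., Pr(y=K)] for index xb and ordered thresholds cuts."""
    # PHI(xb - u_j) is Pr(y* > u_j); pad with 1 and 0 for the two open ends.
    upper = [1.0] + [PHI(xb - u) for u in cuts] + [0.0]
    # Each category probability is the difference of adjacent terms.
    return [upper[k] - upper[k + 1] for k in range(len(cuts) + 1)]

# Hypothetical index value and thresholds, for illustration only.
probs = outcome_probs(xb=0.3, cuts=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
print(sum(probs))
```

Because each probability is a difference of adjacent terms, the sum telescopes to 1 - 0, so the probabilities add up to one for any index value and any increasing set of thresholds.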

15 Likelihood function
There are 5 possible choices for each person, but only 1 is observed
Log likelihood: L = Σi ln[Pr(yi=k)], where k is the outcome actually observed for person i
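As a rough sketch of how the log likelihood is assembled, the function below adds up the log probability of the one category each person actually reported. The data, thresholds, and function names are hypothetical; a real estimator such as oprobit would maximize this quantity over β and the thresholds, which this sketch does not attempt.

```python
from math import log
from statistics import NormalDist

PHI = NormalDist().cdf

CUTS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical thresholds u1..u4

def prob_of(y, xb, cuts):
    """Pr(y_i = y) under the ordered probit, for y in 1..len(cuts)+1."""
    hi = 1.0 if y == 1 else PHI(xb - cuts[y - 2])
    lo = 0.0 if y == len(cuts) + 1 else PHI(xb - cuts[y - 1])
    return hi - lo

def log_likelihood(ys, xbs, cuts):
    """Sum of ln Pr(y_i = observed category) across people."""
    return sum(log(prob_of(y, xb, cuts)) for y, xb in zip(ys, xbs))

# Toy data: observed categories and index values x_i * beta (made up).
ys = [1, 3, 5, 4]
xbs = [-1.0, 0.0, 2.0, 0.8]
ll = log_likelihood(ys, xbs, CUTS)
print(ll)
```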

16 Programming example
Cancer control supplement to the 1994 National Health Interview Survey
Question: what observed characteristics predict self reported health (1-5 scale; 1=poor, 5=excellent)?
Key covariates: income, education, age, current and former smoking status
Programs: sr_health_status.do, .dta, .log

17 desc;
male        byte    %9.0g    =1 if male
age         byte    %9.0g    age in years
educ        byte    %9.0g    years of education
smoke       byte    %9.0g    current smoker
smoke5      byte    %9.0g    smoked in past 5 years
black       float   %9.0g    =1 if respondent is black
othrace     float   %9.0g    =1 if other race (white is ref)
sr_health   float   %9.0g    self reported health, 5=excel, 1=poor
famincl     float   %9.0g    log family income

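18 tab sr_health;
[Output: one-way frequency table of sr_health (self reported health, 5=excel, 1=poor) with rows 1, 2, 3, 4, 5 and Total, and columns Freq., Percent, Cum. The cell values did not survive in this transcript.]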

19 In STATA oprobit sr_health male age educ famincl black othrace smoke smoke5;

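20 Ordered probit estimates
Number of obs = 12900
[Output: the header also reports LR chi2(8), Prob > chi2, Log likelihood and Pseudo R2. The coefficient table for sr_health has columns Coef., Std. Err., z, P>|z|, [95% Conf. Interval] and rows male, age, educ, famincl, black, othrace, smoke, smoke5, followed by the ancillary parameters _cut1, _cut2, _cut3, _cut4. The numeric values did not survive in this transcript.]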

21 Interpret coefficients
Marginal effects/changes in probabilities are now a function of 2 things:
Point of expansion (x’s): STATA picks mean values for x’s
Frame of reference for outcome (y): you pick the value of y

22 Continuous x’s
Consider y=5:
d Pr(yi=5)/dxi = d Φ[xi β – u4]/dxi = βφ[xi β – u4]
Consider y=3:
d Pr(yi=3)/dxi = d (Φ[xi β – u2] - Φ[xi β – u3])/dxi = βφ[xi β – u2] - βφ[xi β – u3]
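Differentiating each of the summary probabilities with respect to a continuous x gives a marginal effect for every outcome. The sketch below assumes a single coefficient beta and hypothetical thresholds; the function name marginal_effects is illustrative and is not a STATA routine.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.pdf is phi

CUTS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical thresholds u1..u4

def marginal_effects(xb, beta, cuts):
    """dPr(y=k)/dx for k = 1..K, for a continuous x with coefficient beta."""
    # Pr(y=k) = Phi(xb - u_{k-1}) - Phi(xb - u_k), where the end terms are
    # the constants 1 and 0, so their derivatives are zero.
    dens = [0.0] + [N.pdf(xb - u) for u in cuts] + [0.0]
    return [beta * (dens[k] - dens[k + 1]) for k in range(len(cuts) + 1)]

me = marginal_effects(xb=0.3, beta=0.2, cuts=CUTS)
print([round(m, 4) for m in me])
print(sum(me))
```

The effects sum to zero across outcomes (up to rounding), since probability that moves into one category must come out of another. With a positive β, the effect is negative for y=1 and positive for y=5, while the sign for the middle categories depends on where xi β sits relative to the thresholds.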

23 Discrete x’s
xi β = β0 + x1i β1 + x2i β2 …. xki βk
x2i is yes or no (1 or 0)
ΔPr(yi=5) = Φ[β0 + x1i β1 + β2 + x3i β3 + … xki βk – u4] - Φ[β0 + x1i β1 + x3i β3 + … xki βk – u4]
This is the difference between the probabilities when x2i=1 and when x2i=0
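For a dummy variable, the change in Pr(yi=5) is the difference between Φ[xi β – u4] evaluated with the dummy switched on and with it switched off. A minimal sketch, with hypothetical numbers and a hypothetical function name:

```python
from statistics import NormalDist

PHI = NormalDist().cdf

def discrete_change_top(xb_without, beta_dummy, top_cut):
    """Change in Pr(y = top category) when a dummy moves from 0 to 1.

    xb_without is the index with the dummy set to 0, beta_dummy is the
    dummy's coefficient, and top_cut is the highest threshold (u4 here).
    """
    return PHI(xb_without + beta_dummy - top_cut) - PHI(xb_without - top_cut)

# Hypothetical values: index 0.3 without the dummy, coefficient -0.25, u4 = 1.5.
change = discrete_change_top(xb_without=0.3, beta_dummy=-0.25, top_cut=1.5)
print(round(change, 4))
```

Unlike the derivative used for continuous variables, this is an exact difference in probabilities, which is how the starred dummy variables are treated in the mfx output.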

24 Ask for marginal effects
mfx compute, predict(outcome(5));

25 mfx compute, predict(outcome(5));
Marginal effects after oprobit
y = Pr(sr_health==5) (predict, outcome(5))
[Output: table with columns dy/dx, Std. Err., z, P>|z|, [95% C.I.], X and rows male*, age, educ, famincl, black*, othrace*, smoke*, smoke5*. The numeric values did not survive in this transcript.]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

26 Interpret the results
Males are 4.7 percentage points more likely to report excellent health
Each year of age decreases the chance of reporting excellent health by 0.7 percentage points
Current smokers are 7.5 percentage points less likely to report excellent health

27 Minor notes about estimation
Wald tests/-2 log likelihood tests are done the exact same way as in PROBIT and LOGIT

28 Use PRCHANGE to calculate marginal effect for a specific person
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
When a variable is NOT specified (e.g. famincl), STATA takes the sample mean.

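29 PRCHANGE will produce results for all outcomes
[Output for male: Avg|Chg| and the change in the probability of each outcome as male goes from 0 to 1. The numeric values did not survive in this transcript.]

30 [Output for age: Avg|Chg| and the change in the probability of each outcome for the rows Min->Max, -+1/, -+sd/ and MargEfct. The numeric values did not survive in this transcript.]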

