# Multinomial Logit & Ordered Probit


Multinomial logit is used when the outcome categories cannot be ordered. An example is the choice of holiday: (i) beach, (ii) mountain, (iii) culture. Each individual goes on just one holiday. We will examine this in the context of insurance data. The exact meaning of the categories does not matter; just treat them like the holiday data. For a clue about the variables, once the data are loaded, type:

    describe
    summ *ins*
    label list insure

Load the data:

    use http://www.stata-press.com/data/r11/sysdsn1.dta, clear

There are three options: those who prepay, those who are not insured and those who are covered by an indemnity. Create the site dummies:

    generate site1 = site==1
    generate site2 = site==2
    generate site3 = site==3

Now type:

    mlogit insure age male nonwhite site2 site3

Note that there are two equations: one to explain those who opt for prepaid and a second for those who are uninsured.

But there are three choices, so why only two equations? If you know the determinants of two of the choices, the third follows by default, because the three probabilities must sum to one. Equivalently, the omitted choice is the base case against which the other two are compared. Here the base case is the first, indemnity. Could we change it? Yes.

To change the base case to the second option, type:

    mlogit insure age male nonwhite site2 site3, base(2)
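For reference, the two equations and the base case can be written out as below. This is the standard multinomial logit form rather than something shown on the slides. The symbols are notation introduced here: $x_i$ is the vector of regressors (including a constant), $\beta_j$ is the coefficient vector for choice $j$, and choice 1 (indemnity) is taken as the base case.

```latex
\begin{aligned}
P(y_i = 1 \mid x_i) &= \frac{1}{1 + \exp(x_i'\beta_2) + \exp(x_i'\beta_3)}, \\
P(y_i = j \mid x_i) &= \frac{\exp(x_i'\beta_j)}{1 + \exp(x_i'\beta_2) + \exp(x_i'\beta_3)}, \qquad j = 2, 3.
\end{aligned}
```

Setting $\beta_1 = 0$ for the base case is what leaves only two equations to estimate. Choosing a different base case changes which coefficient vector is set to zero, and therefore how the reported coefficients are read, but not the fitted probabilities.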

The same data can also be used for a smaller model without the site dummies:

    use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
    mlogit insure age male nonwhite
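To see what the two equations imply for each individual, it can help to look at fitted probabilities. The following is a minimal sketch, assuming the same data set and model as in the slides; the variable names p_indem, p_prepaid, p_unins and p_total are made up for this example.

```stata
* Sketch only: refit the model from the slides, then inspect fitted probabilities.
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
generate site2 = site==2
generate site3 = site==3
mlogit insure age male nonwhite site2 site3

* One new variable per outcome, in the order of the outcome codes (1, 2, 3).
* The names below are hypothetical, chosen for this example.
predict p_indem p_prepaid p_unins if e(sample), pr

* The three probabilities should sum to one for every observation,
* which is why only two equations need to be estimated.
generate p_total = p_indem + p_prepaid + p_unins
summarize p_indem p_prepaid p_unins p_total
```

Re-running the sketch with a different base outcome should leave these fitted probabilities unchanged, even though the reported coefficients differ.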

Clear memory, set the memory allocation and load the data:

    clear
    set mem 100000
    use "http://staff.bath.ac.uk/hssjrh/oprob.dta"

Describe the variable pers:

    describe pers

The variable relates to a person's situation and how it has changed over the last five years. Let us look at it. Type:

    tab2 pers pers

The most common response was improved, but for over half of the sample this was not the case

Ordered probit

We use this when we have discrete data and when the categories are ordered. In this case 1 is best (improved), 2 is next best (stayed about the same) and 3 is worst (got worse). The ordering is clear.

Change in personal situation

Assume an underlying, continuous variable measuring the change in the individual's personal situation.

If this underlying variable lies to the left of μ1, we classify the outcome as 1: the individual's position has improved.

If it lies to the right of μ2, we classify the outcome as 3: the individual's position has got worse.

In between these two values we classify the outcome as 2: the individual's position has stayed the same.

You might say: surely "stayed the same" is one specific value (perhaps 0), with anything to the left of it an improvement and anything to the right a deterioration. But it is common to assume a range of values that represent too small a change to count as either improved or got worse, and the boundaries of this range are μ1 and μ2.
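The classification rule above can be written compactly as follows. This is the standard ordered probit setup, given here as a sketch: $y_i^*$ for the underlying variable, $x_i'\beta$ for the regression index, $\varepsilon_i$ for a standard normal error and $\Phi$ for the standard normal distribution function are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
y_i^* &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1), \\
y_i &=
\begin{cases}
1 & \text{if } y_i^* \le \mu_1 \quad \text{(improved)}, \\
2 & \text{if } \mu_1 < y_i^* \le \mu_2 \quad \text{(stayed the same)}, \\
3 & \text{if } y_i^* > \mu_2 \quad \text{(got worse)},
\end{cases} \\
P(y_i = 1) &= \Phi(\mu_1 - x_i'\beta), \\
P(y_i = 2) &= \Phi(\mu_2 - x_i'\beta) - \Phi(\mu_1 - x_i'\beta), \\
P(y_i = 3) &= 1 - \Phi(\mu_2 - x_i'\beta).
\end{aligned}
```

Because the index enters each probability with a minus sign, a larger value of $x_i'\beta$ shifts probability away from category 1 and towards category 3.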

Do the estimation. Simply use oprobit rather than regress:

    oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4

This regresses persi (note that we do not have to write its full name, as it is the only variable in the data set beginning with persi) on a set of right-hand-side variables.

    if age<98 & age>17 & persi<4

This limits the regression to individuals older than 17 and under 98, and also cuts out those who answered "don't know" (coded 4) for persi.

The results

The summary output shows the number of observations, the log likelihood and the likelihood ratio. A pseudo R² is exactly that, a pseudo measure of fit, and we may cover it in the lectures later. It is rarely very high in ordered probit.

Remember that the lower the dependent variable (persi...), the better the person has done (1 for improved, 3 for got worse). So a negative coefficient indicates that as that variable increases, the person tends to have been doing better. The self-employed have been doing better, as have people in Estonia (why?). Those in countries with a good rule of law have done better, and those in richer countries too (lgnipc: log gross national income per capita).

Married people and educated people have been doing better but the unemployed and manual workers worse.

Impact of age

The impact of age is thus 0.0513*AGE - 0.0322*AGE*AGE/100. The second term is 0.0322*AGE*AGE/100 because this is how age squared was calculated. So the impact is:

| AGE | IMPACT |
| --- | --- |
| 25 | 1.0812 |
| 40 | 1.5368 |
| 55 | 1.8474 |
| 70 | 2.0132 |

As people get older, the probability of things getting worse increases. Why?
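The age figures can be reproduced directly in Stata. This is a small sketch using the two coefficients quoted above (0.0513 on age and -0.0322 on agesq); the last line, which is not on the slides, works out the age at which the quadratic stops rising, roughly 80, so the index increases over the whole range shown.

```stata
* Coefficients 0.0513 (age) and -0.0322 (agesq) are taken from the slide.
* agesq was constructed as age*age/100, hence the division by 100.
foreach a of numlist 25 40 55 70 {
    display "AGE " `a' "   IMPACT " %6.4f (0.0513*`a' - 0.0322*`a'*`a'/100)
}

* Age at which the quadratic peaks: set the derivative
* 0.0513 - 2*0.0322*AGE/100 equal to zero and solve for AGE.
display "turning point: " %4.1f (0.0513*100/(2*0.0322))
```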

And finally: these are the estimates of μ1 and μ2.

If the predicted value from the regression for an individual is less than -0.6564, they would be predicted to be in category 1: position improved. If the predicted value is greater than 0.3096, they would be predicted to be in category 3: position has got worse.

And if the predicted value lies between these two values, then the predicted category is 2: no change.

Let us calculate some examples. First run the regression and store the coefficient vector as cy:

    oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4
    matrix cy = e(b)

cy[1,1] is the coefficient on lgnipc. The average value of this variable is 3.0. Then calculate:

    scalar py50 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*50 + cy[1,4]*50*50/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0

cy[1,2] is the coefficient on male. Let us code this as 1, i.e. we are predicting for a man.

The other characteristics are: 50 years old, living in a country with the highest level of rule of law (5), and so on.

The resulting value of py50 lies between -0.6564 and 0.3096, the two critical values, and hence this person would be predicted to be in category 2: no change. Now let us try the same person, but aged 30:

    scalar py30 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*30 + cy[1,4]*30*30/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0

This is less than the lower critical value of -0.6564 hence this person would be predicted to have improved.
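Comparing the index with the two critical values gives only the most likely category. The index can also be turned into a probability for each category. The following is a minimal sketch, assuming the standard ordered probit probabilities and the two cutpoints quoted above; the index value -0.2 and the scalar names are hypothetical and only there for illustration.

```stata
* Cutpoints as reported on the slide.
scalar mu1 = -0.6564
scalar mu2 =  0.3096

* Hypothetical index value, for illustration only;
* in practice replace it with py50 or py30.
scalar xb_hyp = -0.2

* Ordered probit category probabilities; normal() is the standard normal CDF.
scalar p1 = normal(mu1 - xb_hyp)
scalar p2 = normal(mu2 - xb_hyp) - normal(mu1 - xb_hyp)
scalar p3 = 1 - normal(mu2 - xb_hyp)

display "P(improved)  = " %5.3f p1
display "P(no change) = " %5.3f p2
display "P(got worse) = " %5.3f p3
display "sum          = " %5.3f (p1 + p2 + p3)
```

Even when the index falls in the "no change" range, the probabilities of the other two categories are generally far from zero, which is worth remembering when reading the predicted category.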

No one has ever analysed this before, and there may be a paper in it. That people's situation gets worse as they age is not surprising once they reach, say, 50. But these results suggest it is so for those aged 30 vis-à-vis 20 just as much as for those aged 60 vis-à-vis 50. Perhaps we should try a spline on this, just to check that the quadratic form in age is not misleading. And why do educated people fare better?

Ordered logit by hand

    program myologit
        args lnf xb a1 a2
        * outcome 1: index below the first cutpoint
        quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
        * outcome 2: index between the two cutpoints
        quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
        * outcome 3: index above the second cutpoint
        quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
    end

    * specify the method (lf) and the name of your evaluator (myologit)
    * followed by the equation(s) in parentheses and then the cutpoints.
    ml model lf myologit (xb: insure = age male nonwhite ) /a1 /a2
    ml check
    ml search
    ml maximize, iterate(50)

    ologit insure age male nonwhite
    oprobit insure age male nonwhite

The hand-written version does not converge and gives no second cutoff point. But the slope coefficients themselves are the same as those from the ologit command.
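One possible reason for the convergence problem, offered here as a conjecture rather than something stated in the slides, is that the xb equation includes a constant. In an ordered model only the differences between each cutpoint and the constant enter the likelihood, so a constant and two cutpoints cannot all be pinned down. The sketch below drops the constant with noconstant and supplies starting values for the cutpoints; the starting values -1 and 1 are arbitrary choices, and convergence is not guaranteed.

```stata
use http://www.stata-press.com/data/r11/sysdsn1.dta, clear

capture program drop myologit
program myologit
    args lnf xb a1 a2
    * invlogit(z) = 1/(1+exp(-z)), so these match the expressions used above
    quietly replace `lnf' = ln(invlogit(`a1' - `xb')) if $ML_y1 == 1
    quietly replace `lnf' = ln(invlogit(`a2' - `xb') - invlogit(`a1' - `xb')) if $ML_y1 == 2
    quietly replace `lnf' = ln(1 - invlogit(`a2' - `xb')) if $ML_y1 == 3
end

* noconstant removes the intercept that competes with the cutpoints
ml model lf myologit (xb: insure = age male nonwhite, noconstant) /a1 /a2

* arbitrary starting values with a1 < a2, so that the log likelihood is defined
ml init /a1=-1 /a2=1
ml maximize, iterate(50)

* built-in command for comparison
ologit insure age male nonwhite
```

If the conjecture is right, the slope coefficients and both cutpoints should line up with the ologit output.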