VII. Ordinal & Multinomial Logit Models

To what degree do the dietary & exercise habits of a sample of adults predict whether they fall in the low-, medium-, or high-risk categories for cardiovascular disease? How well do the social traits of a sample of high school students predict whether their achievement test scores are low, medium-low, medium-high, or high?

To what extent do the institutional characteristics of a sample of political regimes predict whether their responsiveness to citizen demands is low, medium, or high? How helpful are the institutional characteristics of a sample of industrial firms in predicting whether the amount of pollution they emit is low, medium, or high?

These are examples of ordinal outcome variables. The categories of an ordinal variable can be ranked, but the distances between the categories cannot be assumed to be equal. Because equal distances cannot be assumed, analyzing ordinal outcome variables via OLS regression violates its assumptions & can lead to erroneous conclusions. What statistical model avoids the assumption of equal intervals between ordinal categories?
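In formula form (standard cumulative-logit notation, following Long & Freese rather than anything on the slides), the model that answers this question is:

Pr(y \le m \mid x) = \Lambda(\tau_m - x\beta), \qquad m = 1, \ldots, J-1

where \Lambda is the logistic CDF and the \tau_m are estimated cutpoints; the probit version simply replaces \Lambda with the standard normal CDF \Phi. The unequal spacing of the categories is absorbed by the cutpoints, so no equal-interval assumption is needed.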

Logit & probit versions of the ordinal regression model do not require the OLS assumption of equal intervals between a variable's categories. But as Long & Freese (pages 137-38) observe, "Simply because the values of a variable can be ordered does not imply that the variable should be analyzed as ordinal." A categorical, multi-level variable could conceivably be ordered for one purpose but unordered for another.

As Long & Freese conclude, “Overall, when the proper ordering is ambiguous, the models for nominal outcomes [multinomial regression] …should be considered.” Multinomial models treat categories as nominal rather than ordinal: Which do you prefer—apple pie, hot fudge sundae, cheese cake, or cannoli? Which is your racial-ethnic identity: Black, White, Asian, Hispanic, or other?

Let's use ordinal logistic regression to analyze respondent answers to this statement: "A working mother can establish just as warm & secure a relationship with her child as a mother who does not work." The responses are coded as: 1=strongly disagree (SD), 2=disagree (D), 3=agree (A), & 4=strongly agree (SA). These data are examined in Long/Freese, chapter 5.

. use ordwarm2, clear

Let's assume we've done the preparatory data analysis & transformations.

. ologit warm yr89 male white age ed prst, or nolog table

. ologit warm yr89 male white age ed prst, or nolog table

Ordered logit estimates                      Number of obs = 2293
                                             LR chi2(6)    = 301.72
                                             Prob > chi2   = 0.0000
Log likelihood = -2844.9123                  Pseudo R2     = 0.0504

-------------------------------------------------------------------------
        warm | Odds Ratio   Std. Err.      z     P>|z|  [95% Conf. Interval]
-------------+-----------------------------------------------------------
        yr89 |   1.688605    .1349175    6.56    0.000   1.443836   1.974867
        male |   .4803214    .0376969   -9.34    0.000   .4118389   .5601915
       white |   .6762723    .0800576   -3.30    0.001   .5362357   .8528791
         age |   .9785675    .0024154   -8.78    0.000   .9738449    .983313
          ed |    1.06948    .0170849    4.20    0.000   1.036513   1.103496
        prst |   1.006091     .003313    1.84    0.065   .9996188   1.012605
-------------------------------------------------------------------------

What's the interpretation? Let's see the coefficients as percentage change in odds:

. listcoef, percent

ologit (N=2293): Percentage Change in Odds

Odds of: >m vs <=m

   warm        b         z      P>|z|       %    %StdX     SDofX
   yr89    0.52390    6.557    0.000     68.9    29.2     0.4897
   male   -0.73330   -9.343    0.000    -52.0   -30.6     0.4989
   white  -0.39116   -3.304    0.001    -32.4   -12.1     0.3290
   age    -0.02167   -8.778    0.000     -2.1   -30.5    16.7790
   ed      0.06717    4.205    0.000      6.9    23.7     3.1608
   prst    0.00607    1.844    0.065      0.6     9.2    14.4923
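To connect the two displays (simple arithmetic on the output above): the % column is 100(e^b - 1), and e^b is the odds ratio. For yr89,

100 \times (e^{0.52390} - 1) = 100 \times (1.6886 - 1) \approx 68.9\%

which matches both the odds ratio of 1.688605 reported earlier and the 68.9 shown here.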

Try fitting the model by means of ordinal probit:

. oprobit warm yr89 male white age ed prst, nolog table

Of course we can't obtain odds ratios via ordinal probit. Otherwise the only notable difference is that the logit coefficients are roughly 1.7 times larger than the probit coefficients: the substantive conclusions are basically the same.

We could have used the robust &/or cluster options:

. ologit warm yr89 male white age ed prst, or robust nolog table
. ologit warm yr89 male white age ed prst, or cluster(district) nolog table
. oprobit warm yr89 male white age ed prst, robust nolog table
. oprobit warm yr89 male white age ed prst, cluster(district) nolog table

Recall that cluster invokes robust standard errors.

One possible problem with ologit or oprobit is perfect prediction: if the outcome variable does not vary within one of the categories of an explanatory variable, Stata will tell you. E.g.:

note: 40 observations completely determined. Standard errors questionable.

We may receive the same message with binary outcome variables, but in that case Stata tells us which variable is at fault & automatically drops the offending observations.

In the case of logistic regression (i.e. a binary outcome variable), we may decide it wise to drop the offending variable & re-estimate the model. In ordinal categorical regression, we cross-tab the explanatory variables with the outcome variable to identify the culprit. Then we re-categorize or drop the offending variable or—if we deem it wise—drop only the observations at fault (see Long/Freese, page 145).
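A minimal sketch of that cross-tabbing step, using the categorical predictors from the warm example (a continuous predictor such as age would need to be grouped first):

. tab yr89 warm
. tab male warm
. tab white warm

The telltale pattern is a predictor category whose observations all land in a single outcome category (or all on one side of it).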

Let's return to our ologit model & test nested models:

. ologit warm yr89 male white age ed prst, or nolog table
. estimates store full
. ologit warm yr89 male white age, nolog
. lrtest full

likelihood-ratio test                        LR chi2(2)  =  44.57
(Assumption: . nested in full)               Prob > chi2 = 0.0000
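For the record, the reported statistic is simply twice the gap in log likelihoods between the two fitted models, with degrees of freedom equal to the number of dropped variables (ed & prst, so df=2):

LR = 2[\ln L(\text{full}) - \ln L(\text{reduced})] = 44.57 \sim \chi^2(2)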

We can also use the Wald test to do the same thing (although, as mentioned previously, the likelihood-ratio test is the preferred alternative):

. test ed prst

 ( 1)  ed = 0
 ( 2)  prst = 0

     chi2(  2) =   44.17
   Prob > chi2 =   0.0000

The Wald test & likelihood-ratio test yield the same conclusion (as they usually do).

Next step: test the model specification:

. linktest, nolog

Ordered logit estimates                      Number of obs = 2293
                                             LR chi2(2)    = 302.75
                                             Prob > chi2   = 0.0000
Log likelihood = -2844.3934                  Pseudo R2     = 0.0505

        warm |      Coef.   Std. Err.      z     P>|z|  [95% Conf. Interval]
        _hat |    1.05767    .0821499   12.87    0.000   .8966591   1.218681
      _hatsq |   .0652007    .0640337    1.02    0.309  -.0603031   .1907045
       _cut1 |  -2.444759    .0763629      (Ancillary parameters)
       _cut2 |  -.6149168     .052821
       _cut3 |   1.282015    .0601348

No problems here: the insignificant _hatsq indicates no specification problem.

An aspect of model specification testing for ologit & oprobit models concerns the proportional odds (or parallel regression) assumption: as in OLS, a single set of slope coefficients is assumed to apply across the levels of the outcome variable—each probability curve is assumed to differ only in being shifted to the left or right (see Long/Freese, pages 150-52). There are two ways of testing this assumption:

. omodel logit warm yr89 male white age ed prst

. ologit warm yr89 male white age ed prst
. brant
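In formula terms (standard notation, not from the slides): the ordinal model fits one \beta for all cutpoints, whereas the generalized ordered logit introduced below lets \beta vary by cutpoint:

\Pr(y > m \mid x) = \Lambda(x\beta - \tau_m) \quad \text{versus} \quad \Pr(y > m \mid x) = \Lambda(x\beta_m - \tau_m)

The two tests that follow ask whether the data are consistent with the single-\beta restriction.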

. omodel logit warm yr89 male white age ed prst

Ordered logit estimates                      Number of obs = 2293
                                             LR chi2(6)    = 301.72
                                             Prob > chi2   = 0.0000
Log likelihood = -2844.9123                  Pseudo R2     = 0.0504

        warm |      Coef.   Std. Err.      z     P>|z|  [95% Conf. Interval]
        yr89 |   .5239025    .0798988    6.56    0.000   .3673037   .6805013
        male |  -.7332997    .0784827   -9.34    0.000  -.8871229  -.5794766
       white |  -.3911595    .1183808   -3.30    0.001  -.6231815  -.1591374
         age |  -.0216655    .0024683   -8.78    0.000  -.0265032  -.0168278
          ed |   .0671728     .015975    4.20    0.000   .0358624   .0984831
        prst |   .0060727    .0032929    1.84    0.065  -.0003813   .0125267
       _cut1 |  -2.465362    .2389126      (Ancillary parameters)
       _cut2 |   -.630904    .2333155
       _cut3 |   1.261854    .2340179

Approximate likelihood-ratio test of proportionality of odds across response categories:
       chi2(12)    =   48.91
       Prob > chi2 =  0.0000

. brant

Brant Test of Parallel Regression Assumption

   Variable     chi2    p>chi2    df
   All         49.18     0.000    12
   yr89        13.01     0.001     2
   male        22.24     0.000     2
   white        1.27     0.531     2
   age          7.38     0.025     2
   ed           4.31     0.116     2
   prst         4.33     0.115     2

A significant test statistic provides evidence that the parallel regression assumption has been violated. Both tests say that the model violates the parallel odds (or parallel regression) assumption.

What should we do in response to this model violation? Most basically, recall the difference between statistical & practical significance. We should explore changes to the explanatory variables, or alternatives that relax the parallel odds assumption: e.g., generalized ordered logit (gologit2, which is downloadable), or else multinomial logit (mlogit). Then compare the results across the kinds of models: are there practically significant differences? If not, perhaps stick with the ologit model.

One more thing: the Brant test is more likely to yield significant results as samples get larger.

gologit2 (generalized ordered logit):

. findit gologit2
. view help gologit2
. gologit2 warm yr89 male white age ed prst, auto
. linktest

Conclusion: gologit2 works fine, but the relative ease & clarity of interpreting ologit vs. gologit2 must be considered. On gologit2 & its various options, see: http://www.nd.edu/~rwilliam/gologit2/

There are no diagnostics for gologit2 beyond linktest. So let’s return to the ologit model to find out how its diagnostics work.

. ologit warm yr89 male white age ed prst, or nolog table
. predict pSD pD pA pSA if e(sample)
. su pSD-pSA

   Variable    Obs       Mean    Std. Dev.       Min        Max
   pSD        2293   .1293539    .0793024   .0153572   .4657959
   pD         2293   .3152335    .0832117    .073616   .4289543
   pA         2293   .3738817     .070512   .1279493   .4407727
   pSA        2293   .1815308    .0961532   .0268523   .6067042

. dotplot pSD-pSA

The rest of the diagnostics for ologit or oprobit are mere approximations of diagnostics for the overall model, based on sequentially re-estimating the model in higher-versus-lower binary segments: in this example, category D(2) versus SD(1), category A(3) versus D(2), & category SA(4) versus A(3). This is extremely tedious & time consuming. Here, nonetheless, is how to do it.

We must first recode the outcome variable so that the base value=0. We then use logit rather than ologit regression, estimating a series of binary comparisons between adjacent categories: 0 vs. 1, 1 vs. 2, & 2 vs. 3.

. recode warm 1=0 2=1 3=2 4=3
. logit warm yr89 male white age ed prst if warm~=2 & warm~=3, nolog
. predict p1 if e(sample)     [note: p1=pD in original coding]
. predict db, db
. predict dd, dd
. predict dx2, dx2
. predict n, n
. su p1-n

Then plot the graphs & proceed as discussed under logistic regression.

. drop db-n
. gen byte pair2 = (warm==2) if warm~=0 & warm~=3     [D(1) vs. A(2): logit requires an explicit 0/1 outcome for this pair]
. logit pair2 yr89 male white age ed prst, nolog
. predict p2 if e(sample)
. predict db, db
. predict dd, dd
. predict dx2, dx2
. predict n, n
. su p2-n

Again, plot the graphs & proceed as discussed under logistic regression.

. drop db-n
. gen byte pair3 = (warm==3) if warm~=0 & warm~=1     [A(2) vs. SA(3)]
. logit pair3 yr89 male white age ed prst, nolog
. predict p3 if e(sample)
. predict db, db
. predict dd, dd
. predict dx2, dx2
. predict n, n
. su p3-n

Continue plotting graphs & proceed as discussed under logistic regression.
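Incidentally, the three binary fits can be consolidated into one pass: a do-file sketch, equivalent to the blocks above, assuming the recoded 0-3 version of warm. The indicator & suffixed diagnostic names are my own, & the predict options are spelled out in full:

forvalues m = 0/2 {
    local k = `m' + 1
    * explicit 0/1 indicator for the adjacent pair m vs. m+1
    * (logit treats any nonzero value as a success, so raw pairs like 2 vs. 3 cannot be fed to it)
    gen byte adj`k' = (warm == `k') if inlist(warm, `m', `k')
    logit adj`k' yr89 male white age ed prst, nolog
    predict p_`k' if e(sample)      // predicted probabilities
    predict db_`k', dbeta           // influence: delta-beta
    predict dd_`k', ddeviance       // change in deviance
    predict dx2_`k', dx2            // change in Pearson chi-square
    drop adj`k'
}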

Such diagnostics can alert us to problems of model fit, outliers & influence. At best, though, they represent approximations. At this point we typically would explore predicted probabilities (postgr3, prchange, prtab, prvalue, & prgen graphs). We'll look only at postgr3, specifying a separate graph for outcomes 2, 3 & 4.

. xi3: ologit warm yr89 male white age ed prst, or nolog table
. postgr3 age, by(male) outcome(2) table

Female: top line.

. postgr3 age, by(male) outcome(3) table

Female: top line.

. postgr3 age, by(male) outcome(4) table

Female: top line.

There are other ordered logit models, such as ordered continuation-ratio (ocratio), which predicts likelihoods of reaching higher versus lower categories that require passing through the lower categories to reach the higher ones. E.g., earning a Ph.D. versus an M.A. versus a B.A. versus a high school degree versus less than a high school degree.
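A sketch of fitting such a model, assuming the user-written ocratio routine has been installed (findit will locate it; check its help file for the exact syntax, which I have not verified here):

. findit ocratio
. ocratio warm yr89 male white age ed prst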

But let’s turn our attention to the multinomial logit model.

An outcome is nominal when the categories are assumed to be unordered. Marital status: divorced, never married, married, or widowed. Occupation: professional, white collar, blue collar, craft, or menial. Race-ethnicity, religion, political affiliation, and citizenship are among the other examples of nominal variables.

We use a multinomial logit model to compare nominal outcomes, or when the assumption of parallel regressions (i.e. parallel odds) is violated. Among the other versions is the conditional logit model (see Long/Freese, chapter 6), which uses the characteristics of the outcomes to predict which choice is made (e.g., your voting options are George Bush Sr., George Bush Jr., or Jeb Bush: which would you choose given the array of choices?)
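A minimal conditional-logit sketch (all variable names hypothetical): the data hold one row per voter-candidate pair, & the predictors describe the candidates rather than the voters:

. * chose = 1 for the candidate the voter picked, 0 for the others
. clogit chose ideology_dist name_recog, group(voter_id)

clogit's group() option ties together the rows that belong to the same voter.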

Multinomial logit models include a lot of parameters, & interpreting the results can be overwhelming. Advice: keep the categories of the outcome variable to the fewest number possible. The Stata-based approaches developed by Long & Freese (chapter 6) are helpful in grappling with the complexities. Here's an example:

. u nomocc2, clear

We’ll pretend that we’ve done the background exploratory analysis & transformations. The research question: how effective are the variables white, ed & exper in predicting whether a sample of respondents work in menial jobs, blue collar jobs, craft jobs, white collar jobs, or professional jobs? . mlogit occ white ed exper, rrr base(5) nolog

. mlogit occ white ed exper, rrr base(5) nolog

Multinomial logistic regression              Number of obs = 1685
                                             LR chi2(12)   = 830.44
                                             Prob > chi2   = 0.0000
Log likelihood = -2134.0024                  Pseudo R2     = 0.1629

------------------------------------------------------------------------------
         occ |        RRR   Std. Err.      z     P>|z|  [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial       |
       white |    .169601    .0572693   -5.25    0.000   .0874989   .3287412
          ed |   .4589326    .0235266  -15.19    0.000   .4150621     .50744
       exper |   .9649771    .0077839   -4.42    0.000    .949841   .9803545
-------------+----------------------------------------------------------------
BlueCol      |
       white |   .5840301    .2088454   -1.50    0.133   .2897685   1.177116
          ed |   .4154983    .0186829  -19.53    0.000   .3804478   .4537781
       exper |   .9695438    .0062475   -4.80    0.000    .957376   .9818662

-------------+----------------------------------------------------------------
Craft        |
       white |   .2719974    .0787523   -4.50    0.000   .1542104   .4797509
          ed |   .5040718    .0201306  -17.15    0.000   .4661212   .5451123
       exper |   .9920646     .005637   -1.40    0.161   .9810776   1.003175
-------------+----------------------------------------------------------------
WhiteCol     |
       white |   .8163426    .3173662   -0.52    0.602   .3810256   1.749003
          ed |    .653316    .0269439  -10.32    0.000    .602585   .7083181
       exper |   .9989455    .0064144   -0.16    0.869   .9864523   1.011597
------------------------------------------------------------------------------
(Outcome occ==Prof is the comparison group)

rrr (relative risk ratio): mlogit's relative risk ratio coefficients are an approximation of the real thing; see Statalist on this. The default is to display logit coefficients; 'outreg,' 'estimates,' & other table commands can display odds ratios.

If you feel more comfortable using odds ratios or percentage change in odds:

. quietly mlogit occ white ed exper, base(5)
. listcoef, factor help

Or:

. qui mlogit occ white ed exper, base(5)
. listcoef, percent help

We can change the comparison group via base( ), or not specify base( ) & let Stata choose the comparison group.
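For instance (same model as before, where base(5) made Prof the comparison group):

. mlogit occ white ed exper, rrr base(1) nolog

Menial would now be the comparison group; with no base() specified, Stata uses the most frequent outcome category as the base.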

We can display the results in terms of odds ratios or percentage change in odds either for all the explanatory variables together or for them individually:

. listcoef, factor help
. listcoef white, factor help
. listcoef, percent help
. listcoef white, percent help

And to simplify the output, we can display only those explanatory variables that attain a specified level of statistical significance:

. listcoef white, pvalue(.10)

And we could have specified the robust &/or cluster options in the multinomial equation. Recall that cluster invokes robust standard errors.

The problem of perfect prediction: mlogit does not give us a warning message, but rather lists the culprit variables with z=0 (and P>|z|=1). What to do: re-estimate the model, excluding the problem variable & deleting the observations that imply perfect prediction. Identify the problem observations by doing a cross-tab of the problem variable with the outcome variable.

Model specification & related tests: linktest is not an option for mlogit, but Long & Freese have developed mlogtest to greatly facilitate the battery of tests for a multinomial logit model.

. mlogtest, lr wald lrcom sm set

These & other options can be specified either individually or collectively.

**** Likelihood-ratio tests for independent variables

Ho: All coefficients associated with given variable(s) are 0.

   occ       chi2     df   P>chi2
   white    40.477     4    0.000
   ed      784.686     4    0.000
   exper    42.805     4    0.000

**** Wald tests for independent variables

   occ       chi2     df   P>chi2
   white    40.746     4    0.000
   ed      424.841     4    0.000
   exper    39.975     4    0.000

**** Small-Hsiao tests of IIA assumption

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

   Omitted     lnL(full)   lnL(omit)     chi2    df   P>chi2   evidence
   Menial       -867.293    -860.532   13.522     4    0.009   against Ho
   BlueCol      -732.720    -727.573   10.295     4    0.036   against Ho
   Craft        -666.973    -660.184   13.578     4    0.009   against Ho
   WhiteCol     -800.319    -791.873   16.892     4    0.002   against Ho

**** LR tests for combining outcome categories

Ho: All coefficients except intercepts associated with given pair of outcomes are 0 (i.e., categories can be collapsed).

   Categories tested       chi2    df   P>chi2
   Menial-  BlueCol      20.474     3    0.000
   Menial-  Craft        16.882     3    0.001
   Menial-  WhiteCol     66.113     3    0.000
   Menial-  Prof        323.036     3    0.000
   BlueCol- Craft        45.881     3    0.000
   BlueCol- WhiteCol    114.015     3    0.000
   BlueCol- Prof        628.497     3    0.000
   Craft-   WhiteCol     49.958     3    0.000
   Craft-   Prof        479.447     3    0.000
   WhiteCol-Prof        133.678     3    0.000

How to test for the joint significance of, say, 'ethnicity' (with 'white' as the base category)? We can do so using either a likelihood-ratio test (which is preferred) or a Wald test:

. mlogtest, lr set(black hispanic asian)
. mlogtest, wald set(black hispanic asian)

These tests of joint significance can be specified together with the other mlogtest options.

Other Fit & Influence Diagnostics

. mlogit occ white ed exper, base(5) nolog
. predict prM prB prC prW prP if e(sample)
. su prM-prP

Note: M, B, C, W & P are labels attached to the coded responses 1, 2, 3, 4 & 5.

As for the influence-diagnostic graphs, which are mere approximations as diagnostic tools: re-code the outcome variable so that the reference group=0; use logit rather than mlogit to estimate the model for each binary outcome; & estimate the logit model for (at least) each outcome-level versus the base-level (e.g., menial vs. professional, craft vs. professional, white collar vs. professional).
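A sketch of the first of those binary fits (menial vs. professional; the indicator name m_vs_p is my own), mirroring the ologit diagnostics shown earlier:

. gen byte m_vs_p = (occ==1) if inlist(occ, 1, 5)     [1 = Menial, 0 = Prof; all others missing]
. logit m_vs_p white ed exper, nolog
. predict db, db
. predict dd, dd
. predict dx2, dx2
. predict n, n

Plot & inspect as before, then repeat with indicators for blue collar, craft & white collar vs. professional.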

How to explore predicted probabilities? Returning to the original coding of 'occ':

. mlogit occ white ed exper, base(5) nolog

Then use postgr3, prchange, prtab, prvalue & prgen to predict & graph probabilities. We could also use prchange & mlogview to graph the predictions in another format.

. xi3: mlogit occ white ed exper, base(5) nolog
. postgr3 ed, by(white) outcome(1) table

Nonwhite: top line.

. postgr3 ed, by(white) outcome(2) table

Nonwhite: top line.

. postgr3 ed, by(white) outcome(3)

Nonwhite: bottom line.

. postgr3 ed, by(white) outcome(4) table

Nonwhite: bottom line.

Predicted probabilities: Long & Freese's suite of commands (e.g., prvalue, x(…) delta save; prvalue, x(…) delta diff) is relevant; see their particular commands for mlogit. Remember to see Long & Freese's final chapter to learn how to predict probabilities based on curvilinear explanatory variables.

Finally, consider regression models for another, common form of categorical outcome variable: counts (see Long/Freese, chapter 7). E.g., the number of homicides, suicides, hospitalizations, accidents, alcoholic drinks consumed, academic publications, or wars. OLS regression is commonly but inappropriately applied to such outcomes. Instead, use count models such as Poisson & negative binomial regression.
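To give a flavor of those models, here is a sketch using couart2.dta (to my knowledge the Long/Freese chapter 7 example of scientists' publication counts; see that chapter for the real walkthrough):

. use couart2, clear
. poisson art fem mar kid5 phd ment, nolog irr
. nbreg art fem mar kid5 phd ment, nolog irr

nbreg reports an overdispersion parameter (alpha) & a test of alpha=0, which indicates whether the Poisson assumption that the variance equals the mean is tenable. That's all!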