# Overview of Conditional Logistic Regression


Gur Hoshen BASAS 7-7-9

What is Conditional Logistic Regression?
- Also known as fixed-effects logistic regression (not to be confused with fixed vs. random effects models).
- Useful for non-experimental data:
  - In experimental studies, one ensures that the groups of interest have similar characteristics.
  - In non-experimental studies, the traditional approach was to use explanatory variables to account for variation in outcomes.
- What happens if we do not have these variables at our disposal? Use each individual as his or her own control.
- The models are based on variation within individuals.
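To make "use each individual as a control" concrete, here is a minimal sketch of the conditional likelihood for one stratum. It assumes a logistic model with a stratum-specific intercept `alpha` and one covariate with slope `beta`; the function name and data are illustrative. Conditioning on how many members of the stratum were selected makes `alpha` drop out, which is why the stratum effects never have to be estimated. The brute-force sum over all subsets is only practical for small strata.

```python
import math
from itertools import combinations


def conditional_prob(x, selected, beta, alpha=0.0):
    """Probability of the observed selections in one stratum, given that
    exactly len(selected) members were selected.

    x        -- covariate value for each member of the stratum
    selected -- indices of the members who were selected
    beta     -- slope shared by all strata
    alpha    -- stratum-specific intercept; it cancels out of the result
    """
    k = len(selected)
    # Unconditional odds of selection for each member: exp(alpha + beta * x_i)
    w = [math.exp(alpha + beta * xi) for xi in x]
    numerator = math.prod(w[i] for i in selected)
    # Sum over every possible set of k selected members
    denominator = sum(
        math.prod(w[i] for i in subset)
        for subset in combinations(range(len(x)), k)
    )
    return numerator / denominator


# The same probability results whatever the stratum intercept is.
x = [0.0, 1.0, 2.0, 0.5]
p_low = conditional_prob(x, [2], beta=0.7, alpha=-3.0)
p_high = conditional_prob(x, [2], beta=0.7, alpha=5.0)
print(math.isclose(p_low, p_high))  # True
```

Each factor exp(alpha) appears k times in the numerator and k times in every term of the denominator, so it cancels exactly.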

Conditional Logistic Regression: a brief history in SAS
- PROC PHREG with `STRATA ID` had limitations on the number of records.
- The `STRATA` statement in PROC LOGISTIC (SAS version 9) has a matched-pairs example.
- Paul Allison has excellent examples of longitudinal studies on poverty and repeat offenders.
- `STRATA ID` here is not to be confused with the STRATA statement in the SURVEY procedures.
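For readers without SAS, the kind of model that a `STRATA` statement fits can be sketched in plain Python. This minimal sketch assumes exactly one selected member per stratum and a single covariate, and maximizes the conditional log-likelihood by Newton-Raphson. The strata are made up, and this is an illustration of the technique, not a reimplementation of PROC LOGISTIC or PROC PHREG.

```python
import math

# Hypothetical strata: (covariate values of all members, index of the one selected member)
STRATA = [
    ([0.0, 1.0], 1),
    ([0.0, 1.0], 0),
    ([0.0, 1.0, 1.0], 1),
    ([0.0, 0.0, 1.0], 2),
]


def score_and_info(beta, strata):
    """Score and information of the conditional log-likelihood
    sum over strata of [x_selected * beta - log(sum_j exp(x_j * beta))]."""
    score, info = 0.0, 0.0
    for x, sel in strata:
        w = [math.exp(beta * xi) for xi in x]
        total = sum(w)
        p = [wi / total for wi in w]  # within-stratum selection probabilities
        mean_x = sum(pi * xi for pi, xi in zip(p, x))
        mean_x2 = sum(pi * xi * xi for pi, xi in zip(p, x))
        score += x[sel] - mean_x
        info += mean_x2 - mean_x ** 2  # within-stratum variance of x
    return score, info


def fit_conditional_logit(strata, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson; assumes x varies within at least one stratum."""
    for _ in range(max_iter):
        score, info = score_and_info(beta, strata)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


beta_hat = fit_conditional_logit(STRATA)
print(f"beta = {beta_hat:.4f}, odds ratio = {math.exp(beta_hat):.4f}")
```

Only variation within strata enters the score and information, which mirrors the point that the model is driven by within-stratum comparisons.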

Conlog: a slight twist

- Conlog can also be used in a completely different context: employment equity, for example promotions.
- Allegation: minorities do not get their fair share of promotions.
- What does this have to do with Conlog? How do we answer such allegations?
- Consider a simple case first: what is a fair share?

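Conlog: a slight twist

The example below has data aggregated by job requisition, but we do have individual-level data.

Aggregated job data. Columns: job; minority applied; white applied; minority promoted; white promoted; expected minority promoted (total promoted times the minority share of applicants); difference (minority promoted minus expected). Values as recorded, in column order, with empty cells omitted:

- A: 4, 2, -2
- B: 8, 1, 0.2, 0.8
- C: 3
- D: (no values)
- All: 13, 15, 7, 2.2, -1.2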
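The fair-share arithmetic can be written down directly. Assuming every applicant in a requisition is equally likely to be selected, the expected number of minority promotions is the number promoted times the minority share of applicants. Requisition B below uses 2 minority and 8 white applicants with 1 minority and 0 white promoted, which reproduces the expected 0.2 and difference 0.8 shown for B; requisition X is hypothetical, added only to show how differences are summed.

```python
from dataclasses import dataclass


@dataclass
class Requisition:
    name: str
    minority_applied: int
    white_applied: int
    minority_promoted: int
    white_promoted: int

    def expected_minority(self) -> float:
        # Fair share: promotions split in proportion to the applicant pool
        promoted = self.minority_promoted + self.white_promoted
        applied = self.minority_applied + self.white_applied
        return promoted * self.minority_applied / applied

    def difference(self) -> float:
        # Positive: more than the fair share; negative: a shortfall
        return self.minority_promoted - self.expected_minority()


reqs = [
    Requisition("B", 2, 8, 1, 0),  # requisition B from the slides
    Requisition("X", 3, 3, 0, 2),  # hypothetical requisition
]
for r in reqs:
    print(f"{r.name}: expected {r.expected_minority():.1f}, difference {r.difference():+.1f}")
total = sum(r.difference() for r in reqs)
print(f"All: difference {total:+.1f}")
```

A requisition with no promotions gets an expected count of 0, so it adds nothing to the total.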

Back to Conlog

- In this situation, the STRATA variable is not the individual ID but the job requisition.
- The previous table assumes everyone is equally qualified. Individual qualifications are the explanatory variables: they account for selections and can reduce the disparities.
- In Conlog, expected (predicted) promotions equal actual promotions at the level of each stratum.
- Strata where no one (or everyone) is selected do not contribute to disparities; SAS calls these "noninformative" strata.
- Strata with only white or only minority applicants do not contribute to disparities.
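These properties can be checked numerically. The sketch below computes, for a hypothetical requisition with an assumed coefficient `beta` on one qualification score, each applicant's probability of being among the k selected under the conditional model, by summing over all size-k subsets (so only for small strata). The probabilities always add up to k, and when k is 0 or everyone is selected they are exactly 0 or 1 whatever `beta` is. This is one way to obtain such predicted probabilities, not necessarily how SAS computes its output.

```python
import math
from itertools import combinations


def selection_probabilities(x, k, beta):
    """Probability that each member is among the k selected, under the
    conditional logit model with weights exp(beta * x_i)."""
    n = len(x)
    w = [math.exp(beta * xi) for xi in x]
    probs = [0.0] * n
    total = 0.0
    for subset in combinations(range(n), k):
        weight = math.prod(w[i] for i in subset)
        total += weight
        for i in subset:
            probs[i] += weight
    return [p / total for p in probs]


# Hypothetical requisition: qualification score and race of each applicant
scores = [2.0, 0.0, 1.0, 3.0, 1.0]
minority = [True, True, False, False, False]
beta = 0.8  # assumed coefficient, for illustration only

for k in range(len(scores) + 1):
    p = selection_probabilities(scores, k, beta)
    expected_minority = sum(pi for pi, m in zip(p, minority) if m)
    print(f"k={k}: total expected {sum(p):.3f}, expected minority {expected_minority:.3f}")
```

If every applicant in a requisition were minority (or every one white), the expected minority count would equal k, the actual count, so such strata also contribute zero disparity.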

Actual Settled Case

- Allegation that a certain race did not receive promotions in proportion to its representation.
- About 670 applicants and about 70 job requisitions.
- How do we measure this shortfall with no qualifications?
  - Calculate the shortfall for each job requisition, then aggregate across all requisitions.
  - Assumes a minority applicant is as likely to be selected as a white applicant.
  - Assumes selection in each job requisition is independent of other job openings.
- Think of it as drawing balls of different colors from a series of urns.
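The urn picture can be made concrete. Under the two assumptions above, the number of minority applicants selected in one requisition follows a hypergeometric distribution (drawing n balls without replacement from an urn with m minority and w white balls), whose mean is the fair share n·m/(m + w). Because requisitions are assumed independent, the distribution of the total is the convolution of the per-requisition distributions. The urns below are made up and the function names are illustrative.

```python
from math import comb


def hypergeometric_pmf(m, w, n):
    """P(X = x), x = 0..n, for minority draws when n are drawn from m minority + w white."""
    total = comb(m + w, n)
    return [comb(m, x) * comb(w, n - x) / total for x in range(n + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Hypothetical requisitions: (minority applicants, white applicants, number promoted)
urns = [(4, 4, 4), (2, 8, 1), (3, 5, 2)]

total_dist = [1.0]  # before any requisition: 0 minority promotions with certainty
for m, w, n in urns:
    total_dist = convolve(total_dist, hypergeometric_pmf(m, w, n))

mean = sum(x * p for x, p in enumerate(total_dist))
fair_share = sum(n * m / (m + w) for m, w, n in urns)
print(f"mean from urns {mean:.3f}, sum of fair shares {fair_share:.3f}")
```

The mean of the total equals the sum of the per-requisition fair shares, which is the aggregated expectation the shortfall is measured against.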

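Actual Settled Case

Aggregated job data. Columns: job; minority applied; white applied; minority promoted; white promoted; expected minority promoted; difference. Values as recorded, in column order, with empty cells omitted:

- A: 4, 2.0, -2.0
- B: 2, 8, 1, 0.2, 0.8
- C: 3, 3.0, 0.0
- D: 5
- A few dozen more requisitions: --
- All: 351, 323, 113, 134, 120.7, -9.7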

Actual Settled Case

- Notice again that requisitions with no selections, or where everyone is selected, have disparities equal to zero.
- Yes, we deal with partial people: expected counts are fractional.
- This is one way of quantifying disparities and dollar amounts.
- Some requisitions have a positive shortfall and others a negative one, so one can see which particular areas are problematic.

Actual Settled Case

- What if we use unconditional (regular) logistic regression?
- What if we use simulated variables that account for the disparity? Simulated variables let us tell a simple story.
- What if we use simulated variables that do not account for the disparity, to see whether we are over-fitting the model? A common allegation is that the model includes too many variables in order to capitalize on chance.

Variables in Model (100 simulations)
What kinds of variables go into the model:

- Education (BA degree or not)
- Work experience: roughly triangular, with lots of 0s
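A sketch of how such variables might be simulated. The slides do not give the generating parameters, so everything here is hypothetical: a BA indicator with an assumed 40% prevalence, and years of experience drawn as a mix of a point mass at 0 and a triangular distribution that peaks at 0. Variables generated this way, independently of who was selected, play the role of the "uncorrelated" variables.

```python
import random


def simulate_applicant(rng, p_ba=0.4, p_zero_exp=0.3, max_years=20.0):
    """Return (ba, years) for one simulated applicant. All parameters are hypothetical."""
    ba = 1 if rng.random() < p_ba else 0
    if rng.random() < p_zero_exp:
        years = 0.0  # the "lots of 0s"
    else:
        # Roughly triangular: mode at 0, tapering off to max_years
        years = rng.triangular(0.0, max_years, 0.0)
    return ba, years


rng = random.Random(2009)  # fixed seed for reproducibility
applicants = [simulate_applicant(rng) for _ in range(670)]
share_ba = sum(ba for ba, _ in applicants) / len(applicants)
share_zero = sum(years == 0.0 for _, years in applicants) / len(applicants)
print(f"BA share {share_ba:.2f}, zero-experience share {share_zero:.2f}")
```

Repeating the simulation with different seeds gives the repeated fits that a "100 simulations" summary would average over.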

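Parameter Estimates (100 simulations)

Columns: variable type; variable; then, for each of Estimate, Standard Error, Odds Ratio and Pr > ChiSq, one value for Conlog and one for Regular. Values as recorded, in column order, with empty cells omitted:

- Uncorrelated, BA: -0.02, -0.04, 0.26, 0.20, 1.03, 0.99, 0.43, 0.40
- Uncorrelated, years: 0.00, 0.02, 0.01, 1.00, 0.44, 0.47
- Correlated: 1.85, 0.28, 0.22, 6.65, 6.56, <.000, 0.11, 1.12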
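How the columns relate for a single fit: the odds ratio is exp(estimate), and "Pr > ChiSq" for a Wald test is the upper tail of a chi-square with 1 degree of freedom for (estimate / SE)². The entries above appear to summarize 100 simulated fits, so they need not satisfy these relations exactly (an average of odds ratios is not the exponential of an average estimate). A minimal sketch using only the standard library:

```python
import math


def odds_ratio(estimate):
    return math.exp(estimate)


def wald_p_value(estimate, std_error):
    """Pr > ChiSq for the Wald chi-square (estimate / SE)^2 with 1 df.
    For 1 df this equals the two-sided normal tail, erfc(|z| / sqrt(2))."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Example: the first Conlog estimate and standard error in the correlated row
b, se = 1.85, 0.28
print(f"odds ratio {odds_ratio(b):.2f}, p = {wald_p_value(b, se):.2e}")
```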

Disparities

- Parameter estimates are not meaningful in terms of disparities, but one can use the predicted probability that each individual will be selected. These probabilities can be aggregated into a meaningful disparity.
- With the uncorrelated variables, conditional regression gives a disparity of -9.7, the same as with no qualifications (-9.7).
- Note that even though the odds ratios for the uncorrelated variables were close to 1 (estimates close to zero) in the regular model, its disparity increases from -9.7 to -15.6. Why?

| Variables in model | Conditional | Regular |
|---|---|---|
| Correlated with outcome | -6.1 | -8.2 |
| Uncorrelated | -9.7 | -15.6! |
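The aggregation step can be sketched as follows, assuming each applicant already has a predicted selection probability from a fitted model (the records below are hypothetical). The disparity for a requisition is the number of minority applicants actually selected minus the sum of their predicted probabilities; the overall disparity is the sum over requisitions.

```python
from collections import defaultdict

# Hypothetical records: (requisition, is_minority, selected, predicted_probability)
applicants = [
    ("A", True, False, 0.45), ("A", True, False, 0.50),
    ("A", False, True, 0.55), ("A", False, True, 0.50),
    ("B", True, True, 0.30), ("B", False, False, 0.70),
]


def disparities(records):
    """Actual minus expected minority selections, per requisition."""
    by_req = defaultdict(float)
    for req, is_minority, selected, prob in records:
        if is_minority:
            by_req[req] += (1.0 if selected else 0.0) - prob
    return dict(by_req)


per_req = disparities(applicants)
print(per_req, "total:", round(sum(per_req.values()), 2))
```

Requisitions without minority applicants never enter the dictionary, which is the same as contributing zero.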

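Aggregated job data. Columns: job; minority applied; white applied; minority promoted; white promoted; expected minority promoted under No Quals, Conlog and Regular; difference under No Quals, Conlog and Regular. Values as recorded, in column order, with empty cells omitted:

- A: 4, 2.0, 1.9, 1.8, -2.0, -1.9, -1.8
- B: 2, 8, 1, 0.2, 1.6, 0.8, -0.6
- C: 3, 3.0, 3.0*, 2.8, 0.0
- D: 5, 0.0*, 0.1, -0.1
- A few dozen more requisitions: --
- All: 351, 323, 113, 134, 120.7, 119.1, 121.2, -9.7, -6.1, -8.2

Values marked * are noninformative strata, where we fill in the data.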

Disparities

- Regular regression ignores that some strata may have no (or 100%) selections and still calculates expected numbers of selections that differ from the actual ones (for example, a non-zero expected count where no one was selected). The same happens for strata with applicants of one race only.
- If we ran a regular regression separately for each requisition, we would have on average 9-10 applicants per position, so we could not use many variables.
- We could run regressions by level of position (low, medium, high) rather than one overall regression, but some pools with no (or 100%) selections would still have non-zero disparities.
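A small illustration of the first point, using an intercept-only "regular" logistic regression pooled across hypothetical requisitions. Its maximum-likelihood fit is simply the overall selection rate, so every applicant gets the same probability and a pool where nobody was selected still receives a positive expected count. With covariates the pooled probabilities vary by applicant but still lie strictly between 0 and 1, so the same issue remains. Under the conditional model, the expected total in each pool equals the actual total.

```python
# Hypothetical pools: requisition -> (applicants, number selected)
pools = {"A": (8, 4), "B": (10, 1), "C": (6, 6), "D": (5, 0)}

# Intercept-only logistic regression fitted to all applicants pooled together:
# the MLE of the selection probability is the overall selection rate.
total_applicants = sum(n for n, _ in pools.values())
total_selected = sum(k for _, k in pools.values())
pooled_rate = total_selected / total_applicants

for req, (n, k) in pools.items():
    expected_regular = n * pooled_rate  # ignores how many were selected in this pool
    expected_conditional = k            # conditional model: matches the pool total
    print(f"{req}: actual {k}, regular {expected_regular:.2f}, conditional {expected_conditional}")
```

The pooled model only matches the grand total across all pools; pool D (no selections) and pool C (everyone selected) get expected counts that differ from what actually happened.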

Conclusions

- Conlog can be used in situations where people compete against one another, for example scholarships or college admissions, not just in the labor market.
- If strata have a large number of records, the resulting disparities are close to those from regular logistic regression.
- It can take a long time to run if you want to output predicted (expected) probabilities.

Questions?