Presentation on theme: "Overview of Conditional Logistic Regression Gur Hoshen BASAS 7-7-9."— Presentation transcript:
Overview of Conditional Logistic Regression Gur Hoshen BASAS 7-7-9
What is Conditional Logistic Regression? Also known as fixed effects (not be confused with fixed vs. random effects model) Useful for non-experimental data In experimental studies, one ensures that groups of interest have similar characteristics In non-experimental studies, traditional way was to use explanatory variables to account for variances in outcomes What happens if we do not have these variables at our disposal? Use each individual as a control Models are based on variances within individuals
Conditional Logistic Regression: a brief history in SAS Phreg STRATA ID had limitations in number of records Strata statement in PROC LOGISTIC version 9 http://www2.sas.com/proceedings/sugi27/p257-27.pdf http://www2.sas.com/proceedings/sugi27/p257-27.pdf Has matched pairs example Paul Allison’s excellent examples under http://www2.sas.com/proceedings/sugi31/184-31.pdf Longitudinal studies on poverty, repeat offenders STRATA ID (not to be confused with SURVEY PROCs)
Conlog: a slight twist Can also be used in a completely different context of employment equity, like promotions. Minorities do not get their fair share of promotions What does this have to be with Conlog? How do we answer such allegations? Consider a simple case first: What is fair share?
Conlog: a slight twist Example below has data aggregated requisition, but we do have individual data. AGGRE GATED job data minorit y applied white applie d minority promote d white promot ed expecte d minorit y promot ed differen ce A44042-2 B28100.20.8 C333300 D400000 All1315472.2-1.2
Back to Conlog In this particular situation, the STRATA is not the individual ID but the job. Also, in the previous table assumes everyone equally qualified. Individual qualifications are the explanatory variables account for selections and reduce the disparities Expected (predicted) promotions is equal to actual at the level of the STRATA in Conlog Strata where no one (or everyone) is selected do not contribute to disparities “noninformative” strata in SAS Strata with just whites or just minorities do not contribute to disparities
Actual Settled Case Allegation that certain race did not receive promotions proportional to their representation About 670 applicants About 70 job requisitions How do we measure this shortfall with no qualifications? Calculate shortfall from one job to the next, then aggregate across all requisitions Assumes minority is equally likely to be selected as white Selection in each job requisition is independent of other job openings Think of it as drawing balls from urns of different colors
Actual Settled Case AGGR EGATE D job data minor- ity applie d white applie d Minor- ity promo ted white promo ted Expdm in- ority promo ted differe nce A44042.0-2.0 B28100.20.8 C33333.00.0 D4500 A few dozen +-- All351323113134120.7-9.7
Actual Settled Case Notice again that requisitions with no selections or where everyone is selected have disparities equal to zero Yes, we deal with partial people. This is one way of quantifying disparities and $s. Some requisitions have positive, others negative shortfall so one can see which particulars areas are problematic.
Actual Settled Case What if we use unconditional logistic regression? What if we use simulated variables to account for the disparity? Simulated variables in order to tell a simple story What if we use simulated variables which do not account for the disparity in order to see if we are over- fitting the model? Common allegation is there are too many variables in model used in order to capitalize on chance
Variables in Model (100 simulations) What kind of variables go into model: Education (BA degree or not) Work experience: roughly triangular with lots of “0”s
Parameter Estimates 100 simulations type Con- log Regu- lar Con- log Regu- lar Con- log Regu- lar Con- log Regu- lar Estim- ate Stan- dard Error Odd Ratio Odds Ratio Pr > Chi- Squ- are typeVariab le Uncorr elated BA-0.02-0.040.260.201.030.990.430.40 years0.00 0.020.011.00 0.440.47 Correl ated BA1.85 0.280.226.656.56<.000 years0.11 0.020.011.12 <.000
Disparities Parameter estimates are not meaningful in terms of disparities but one can use predicted probabilities that an individual will be selected. These can be aggregated to a meaningful disparity. Uncorrelated variables for conditional regression has a disparity of -9.7 (-9.7 if used no qualifications) Note that, even though odd ratios were closers to zero for uncorrelated model for regular model, disparity increases to -15.6 from -9.7. Why? l l Variables in model ConditionalRegular correlated with outcome -6.1-8.2 Uncorrelated-9.7-15.6!
l AGG REGA TED job data mino rity appli ed white appli ed mino rity prom oted whi te pro mot ed Expect ed No Quals Con- Log Reg- ular Diff No Qu als Diff Con- Log Diff Reg- ular A44042.01.91.8-2.0-1.9-1.8 B28100.21.6 0.8-0.6 C33333.03.0*2.80.0 0.2 D45000.00.0*0.10.0 -0.1 A few doze n+-- All351323113134120.7119.1121.2-9.7-6.1-8.2 * Noninformative strata where we fill in data
Disparities Regular regression ignores that some strata may have no (or 100%) selections and yet calculate non-zero expected number of selections, as well as strata with one race only If do regular regression by requisition, would have on average, 9-10 applicants per position so could not use many variables. Could do regression by level of position (low, medium, high) rather than just one overall regression, but still have problem that some pools with no (or 100%) selections will have non-zero disparities l l
Conclusions Can use Conlog in situations where people are competing against one another Could be for scholarships, college admissions, not just in the labor market Note that if strata have large number of records, resultant disparities are close to regular logistic regressions Can take long time to run if want to output predicted (expected) probabilities Questions? l l