Presentation on theme: "Overview of Conditional Logistic Regression"— Presentation transcript:
1. Overview of Conditional Logistic Regression. Gur Hoshen, BASAS, 7-7-9
2. What is Conditional Logistic Regression? Also known as fixed effects (not to be confused with fixed vs. random effects models). Useful for non-experimental data. In experimental studies, one ensures that the groups of interest have similar characteristics. In non-experimental studies, the traditional approach was to use explanatory variables to account for variation in outcomes. What happens if we do not have these variables at our disposal? Use each individual as his or her own control. Models are based on variation within individuals.
3. Conditional Logistic Regression: a brief history in SAS. PHREG with STRATA ID: had limitations on the number of records. STRATA statement in PROC LOGISTIC, version 9: has a matched-pairs example. Paul Allison’s excellent examples: longitudinal studies on poverty, repeat offenders. STRATA ID (not to be confused with the SURVEY PROCs).
4. Conlog: a slight twist. Conlog can also be used in the completely different context of employment equity, such as promotions. Allegation: minorities do not get their fair share of promotions. What does this have to do with Conlog? How do we answer such allegations? Consider a simple case first: what is a fair share?
5. Conlog: a slight twist. The example below has data aggregated by requisition, but we do have individual data.
AGGREGATED job data (columns: job, minority applied, white applied, minority promoted, white promoted, expected minority promoted, difference). Surviving values per row, left to right:
A: 4, 2, -2
B: 8, 1, 0.2, 0.8
C: 3
D:
All: 13157, 2.2, -1.2
6. Back to Conlog. In this particular situation, the STRATA variable is not the individual ID but the job. Also, the previous table assumes everyone is equally qualified. Individual qualifications are the explanatory variables that account for selections and can reduce the disparities. In Conlog, expected (predicted) promotions equal actual promotions at the level of the STRATA. Strata where no one (or everyone) is selected do not contribute to disparities; these are "noninformative" strata in SAS. Strata with just whites or just minorities do not contribute to disparities either.
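Why such strata drop out can be illustrated with a small sketch of the conditional likelihood for one stratum. It assumes a single covariate (a minority indicator) and a hypothetical coefficient `beta`; the function name `stratum_likelihood` and the example strata are made up for illustration and are not taken from the presentation or from SAS.

```python
from itertools import combinations
from math import exp

def stratum_likelihood(minority_flags, selected, beta):
    """Conditional likelihood of one stratum (job requisition).

    minority_flags: list of 0/1 values, one per applicant in the stratum
    selected: tuple of indices of the applicants who were selected
    beta: hypothetical coefficient on the minority indicator
    """
    k = len(selected)

    def weight(subset):
        # exp(beta * number of minorities in this candidate set of selectees)
        return exp(beta * sum(minority_flags[i] for i in subset))

    # Denominator: every way of choosing k selectees from this stratum.
    denominator = sum(weight(s) for s in combinations(range(len(minority_flags)), k))
    return weight(selected) / denominator

for beta in (-1.0, 0.0, 2.0):
    nobody = stratum_likelihood([1, 0, 0, 1], (), beta)              # no one selected
    everyone = stratum_likelihood([1, 0, 0, 1], (0, 1, 2, 3), beta)  # everyone selected
    one_race = stratum_likelihood([1, 1, 1], (0,), beta)             # minorities only
    mixed = stratum_likelihood([1, 0, 0, 1], (0,), beta)             # informative stratum
    print(beta, nobody, everyone, one_race, round(mixed, 4))
```

In this sketch the first three strata give the same likelihood whatever `beta` is, so they carry no information about the coefficient; only the mixed stratum with a partial selection changes with `beta`.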
7. Actual Settled Case. Allegation that a certain race did not receive promotions proportional to its representation. About 670 applicants. About 70 job requisitions. How do we measure this shortfall with no qualifications? Calculate the shortfall for each job, then aggregate across all requisitions. This assumes a minority applicant is as likely to be selected as a white applicant. Selection in each job requisition is independent of other job openings. Think of it as drawing balls of different colors from urns.
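The per-requisition calculation can be sketched as follows, assuming the expected number of minority promotions in a requisition is the number of selections times the minority share of the applicant pool. Row "B" uses the figures that appear on the slides for requisition B (2 minority and 8 white applicants, one minority promotion), with zero white promotions assumed; the rows named H1 to H3 and the function name are hypothetical.

```python
def expected_minority(minority_applied, white_applied, total_selected):
    """Expected minority selections if every applicant is equally likely to be chosen."""
    pool = minority_applied + white_applied
    if pool == 0:
        return 0.0
    return total_selected * minority_applied / pool

# (job, minority applied, white applied, minority promoted, white promoted)
requisitions = [
    ("B", 2, 8, 1, 0),   # figures from the slides; white promoted = 0 is assumed
    ("H1", 4, 4, 0, 4),  # hypothetical: only whites promoted
    ("H2", 3, 0, 3, 0),  # hypothetical: minorities only, so no disparity
    ("H3", 0, 5, 0, 0),  # hypothetical: no selections, so no disparity
]

total_shortfall = 0.0
for job, min_app, wht_app, min_sel, wht_sel in requisitions:
    expected = expected_minority(min_app, wht_app, min_sel + wht_sel)
    difference = min_sel - expected  # actual minus expected
    total_shortfall += difference
    print(f"{job}: expected {expected:.1f}, difference {difference:+.1f}")

print(f"All: difference {total_shortfall:+.1f}")
```

Summing the differences over requisitions, rather than pooling all applicants first, keeps each job's applicant mix separate, which is the point of treating the requisition as the stratum.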
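8. Actual Settled Case.
AGGREGATED job data (columns: job, minority applied, white applied, minority promoted, white promoted, expected minority promoted, difference). Surviving values per row, left to right:
A: 4, 2.0, -2.0
B: 2, 8, 1, 0.2, 0.8
C: 3, 3.0, 0.0
D: 5
A few dozen+: -, -
All: 351, 323, 113, 134, 120.7, -9.7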
9. Actual Settled Case. Notice again that requisitions with no selections, or where everyone is selected, have disparities equal to zero. Yes, we deal with partial people. This is one way of quantifying disparities and dollar amounts. Some requisitions have a positive shortfall and others a negative one, so one can see which particular areas are problematic.
10. Actual Settled Case. What if we use unconditional logistic regression? What if we use simulated variables to account for the disparity? Simulated variables are used in order to tell a simple story. What if we use simulated variables that do not account for the disparity, in order to see whether we are over-fitting the model? A common allegation is that too many variables are used in the model in order to capitalize on chance.
11. Variables in Model (100 simulations). What kinds of variables go into the model: Education (BA degree or not). Work experience: roughly triangular, with lots of "0"s.
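One way such variables could be simulated is sketched below. This is an illustration only, not the presenter's simulation: the probabilities, the 20-year ceiling, the mode of 5 years, the seed, and the function name are all assumed values chosen for the example. The variables are drawn independently of any selection outcome, which corresponds to the "uncorrelated" case.

```python
import random

def simulate_applicant(rng, p_degree=0.4, p_zero_experience=0.3, max_years=20.0):
    """Draw one applicant's simulated qualifications (all parameters are assumed)."""
    has_ba = 1 if rng.random() < p_degree else 0
    if rng.random() < p_zero_experience:
        years = 0.0  # the spike of "0"s
    else:
        # random.triangular(low, high, mode): roughly triangular experience
        years = rng.triangular(0.0, max_years, 5.0)
    return {"ba": has_ba, "experience": years}

rng = random.Random(2009)  # fixed seed so the sketch is reproducible
applicants = [simulate_applicant(rng) for _ in range(670)]

share_ba = sum(a["ba"] for a in applicants) / len(applicants)
share_zero = sum(a["experience"] == 0.0 for a in applicants) / len(applicants)
print(f"BA share: {share_ba:.2f}, zero-experience share: {share_zero:.2f}")
```

Repeating the draw with different seeds would give a set of simulated data sets of the kind the slide title refers to.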
13. Disparities. Parameter estimates are not meaningful in terms of disparities, but one can use the predicted probabilities that an individual will be selected. These can be aggregated into a meaningful disparity. With uncorrelated variables, conditional regression has a disparity of -9.7 (also -9.7 if no qualifications are used). Note that, even though the log odds ratios were closer to zero for the uncorrelated regular model, its disparity increases. Why?
Disparity by variables in model:
Variables in model         Conditional   Regular
Correlated with outcome    -6.1          -8.2
Uncorrelated               -9.7          -15.6!
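The aggregation step can be sketched as below, assuming the disparity is the actual number of minority selections minus the sum of the minority applicants' predicted selection probabilities. The records and the function name are hypothetical; in practice the probabilities would come from the fitted conditional or regular model.

```python
def disparity(records):
    """Actual minus expected minority selections.

    records: iterable of (is_minority, selected, predicted_probability),
    with is_minority and selected coded 0/1.
    """
    actual = sum(selected for is_minority, selected, _ in records if is_minority)
    expected = sum(prob for is_minority, _, prob in records if is_minority)
    return actual - expected

# Hypothetical applicants: (is_minority, selected, predicted probability)
example = [
    (1, 1, 0.2),
    (0, 0, 0.1),
    (1, 0, 0.5),
    (0, 1, 0.6),
]
print(f"disparity: {disparity(example):+.1f}")
```

A negative value means fewer minority selections than the model predicts; the sum of probabilities is what produces the "partial people" in the expected counts.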
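14. AGGREGATED job data (columns: job, minority applied, white applied, minority promoted, white promoted, Expected No Quals, Expected Con-Log, Expected Regular, Diff No Quals, Diff Con-Log, Diff Regular). Surviving values per row, left to right:
A: 4, 2.0, 1.9, 1.8, -2.0, -1.9, -1.8
B: 2, 8, 1, 0.2, 1.6, 0.8, -0.6
C: 3, 3.0, 3.0*, 2.8, 0.0
D: 5, 0.0*, 0.1, -0.1
A few dozen+: -, -
All: 351, 323, 113, 134, 120.7, 119.1, 121.2, -9.7, -6.1, -8.2
* Noninformative strata where we fill in data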
15. Disparities. Regular regression ignores the fact that some strata may have no (or 100%) selections and still calculates a non-zero expected number of selections for them, as it does for strata with one race only. If we ran a regular regression by requisition, we would have on average 9-10 applicants per position, so we could not use many variables. We could run the regression by level of position (low, medium, high) rather than as one overall regression, but we would still have the problem that some pools with no (or 100%) selections will have non-zero disparities.
16. Conclusions. Conlog can be used in situations where people are competing against one another. This could be for scholarships or college admissions, not just in the labor market. Note that if strata have a large number of records, the resulting disparities are close to those from regular logistic regression. It can take a long time to run if you want to output predicted (expected) probabilities. Questions?