Presentation on theme: "Applying Propensity Score Matching Methods in Institutional Research"— Presentation transcript:
1. Applying Propensity Score Matching Methods in Institutional Research
Stephen L. DesJardins
Professor, Center for the Study of Higher and Postsecondary Education, School of Education
and Professor, Gerald R. Ford School of Public Policy
University of Michigan
CA AIR Conference Workshop
November 20, 2014
2. Organization of the Workshop
Examine conceptual basis of non-experimental methods
This is a necessary but not sufficient condition for conducting methodologically rigorous research
Survey conceptual foundations of matching methods, esp. PSM methods
Provide & discuss Stata commands to estimate PSM models
Share references to readings & sources of code to enhance post-workshop learning
3. Importance of Rigor in Research
Systematically improving education policies, programs, practices requires understanding of “what works”
Goal: Make causal statements
Without doing so “it is difficult to accumulate a knowledge base that has value for practice or future study” (Schneider, 2007, p. 2).
However, education research has lacked rigor & relevance
4. Why the Lack of Rigor?
Often lack of clarity about the designs & methods optimal for making causal claims
Many researchers were not educated in the application of these methods
Many lack time to learn new methods; may feel they are too complicated to learn
Hard to create & sustain norms & common discourse about what constitutes rigor
5. Policy Changes Driving Push Toward Rigor
NCLB Act (2001): Included definition of “scientifically-based” research & set aside funds for studies consistent with definition
Education Sciences Reform Act (2002) replaced Office of Ed Research & Improvement (OERI) with IES
Funding from IES, NSF, & other federal agencies tied to rigorous designs/methods
Many reports focused on need to improve the quality of education research
6. Cause and Effect
In randomized control trials (RCTs) the question is: What is effect of a specific program or intervention?
Summer Bridge program (intervention) may cause an effect (improved college readiness)
Shadish, Cook, & Campbell (2002): Rarely know all the causes of effects or how they relate to one another
Need for controls in regression frameworks
7. Cause and Effect (cont’d)
Holland (1986) notes that true causes hard to determine unequivocally; seek to determine probability that an effect will occur
Allows opportunity to est. why some effects occur in some situations but not in others
Example: Completing higher levels of math courses in HS may improve chances of finishing college more for some students than for others
Here we are measuring likelihood that cause led to the effect; not “true” cause/effect
8. Determining Causation
RCTs are the “gold standard” to determine causal effects
Pros: Reduce bias & spurious findings, thereby improving knowledge of what works
Cons: Ethics, external validity, cost, errors that are also inherent in observational studies
Measurement problems; “spillover” effects, attrition
Possibilities: Oversubscribed programs (Living Learning Communities, UROP…)
9. The Logic of Causal Inference
Need to distinguish between inference model specifying cause/effect relation & statistical methods determining strength of relation
The inference model specifies the parameters we want to estimate or test
The statistical technique describes the mathematical procedure(s) to test hypotheses about whether a treatment produces an effect
10. A Common Causal Scenario
[Diagram with three nodes: Observed or Unobserved Confounding Variable(s); Cause (e.g., Treatment); Effect (e.g., Educational Outcome)]
11. The Counterfactual Framework
Owing to Rubin (1974, 1977, 1978, 1980)
Intuition: What would have happened if individual exposed to a treatment was NOT exposed or exposed to a different treatment?
Causal effect: Difference between outcome under treatment & outcome if individual exposed to the control condition (no treatment or other treatment)
Formally: $d_i = Y_{it} - Y_{ic}$
12. The Fundamental Problem…
…of causal inference is that if we observe $Y_{it}$ we cannot simultaneously observe $Y_{ic}$
Holland (1986) ID’d two solutions to this problem: One scientific, one statistical
Scientific: Expose i to treatment 1, measure Y; expose i to treatment 2, measure Y. Difference in outcomes is causal effect
Assumptions: Temporal stability (response constancy) & causal transience (effect of 1st treatment does not affect i’s response to 2nd treatment)
13. Fundamental Problem (cont’d)
Second scientific way: Assume all units are identical; thus, it doesn’t matter which unit receives the treatment (unit homogeneity)
Give treatment to unit 1 & use unit 2 as control, then compare difference in Y.
These assumptions are rarely plausible when studying individuals
Maybe when studying twins, as in the MN Twin Family Study
And this is not a study of a baseball team!
14. The Statistical Solution
Rather than focusing on units (i), estimate the average causal effect for a population of units (i’s).
Formally: $d = E(Y_t - Y_c)$
where Y’s are average outcomes for individuals in treatment & control groups (the average effect $d$ carries no unit subscript)
Assume: i’s differ only in terms of treatment group assignment, not on characteristics or prior experiences that could affect Y
15. Example
If we study the effects of being in a summer bridge program on GPA in 1st semester of college, maybe students who select into treatment are materially different than peers
If we could randomly assign students to the program (or not) then we could examine causal impact of program on GPA.
Why? Because group assignment would, on average, be independent of any measured or unmeasured pretreatment characteristics.
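The summer bridge example can be illustrated with a small simulation. This is a Python sketch under invented assumptions, not workshop code (the workshop used Stata): the true program effect of 0.2 GPA points, the unmeasured "readiness" variable, and the selection probabilities are all hypothetical. It shows that a simple difference in means recovers the effect under random assignment but not when less-prepared students select into the program.

```python
import random
import statistics

TRUE_EFFECT = 0.2  # hypothetical true effect of the program on GPA


def bridge_study(n, randomized, seed=2014):
    """Simulate n students; return the naive treated-minus-control mean GPA."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        prep = rng.gauss(0, 1)                        # unmeasured readiness
        gpa_c = 2.5 + 0.4 * prep + rng.gauss(0, 0.3)  # outcome without program
        gpa_t = gpa_c + TRUE_EFFECT                   # outcome with program
        if randomized:
            joins = rng.random() < 0.5                # coin-flip assignment
        else:
            # Self-selection: less-prepared students are more likely to enroll.
            joins = rng.random() < (0.8 if prep < 0 else 0.2)
        # Only one of the two potential outcomes is ever observed.
        if joins:
            treated.append(gpa_t)
        else:
            control.append(gpa_c)
    return statistics.mean(treated) - statistics.mean(control)


print("randomized:   ", round(bridge_study(20000, True), 3))
print("self-selected:", round(bridge_study(20000, False), 3))
```

Under self-selection the naive comparison even gets the sign wrong in this setup, because the treated group starts out less prepared.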
16. Problems with Idealized Solution
Random assignment not always possible, so pretreatment characteristics & treatment group assignment independence violated
Even when randomization is used, statistical methods are often used to adjust for confounding variables
By controlling for student, classroom, school characteristics that predict treatment assignment & outcomes
But this approach is often sub-optimal
17. Criteria for Making Causal Statements
Causal relativity: Effect of a cause must be assessed relative to the effect of another cause
Causal manipulation: Units must be potentially exposable to both the treatment & control conditions.
Temporal ordering: Exposure to cause must occur at specific time or within specific time period before effect
Elimination of alternative explanations
18. Issues in Employing RCTs
May be differences in treated/controls even under randomization: Small samples
Employ regression methods to control for diffs
Cross-study comparisons & replication useful
Avg effect in population may not be of most interest: ATT; Heterogeneous treat. effects
Test for sub-group differences of treatment
Mechanism for assignment to treatment may not be independent of responses
Merit-based programs & responses (“halo”)
19. Issues in Employing RCTs (cont’d)
Responses of treated should not be affected by treatment of others (“spillover” effects)
e.g.: New retention program initiated; controls respond by being demoralized (motivated), leading to bias upward (downward) of the treatment effects.
Treatment non-compliance & attrition
Random assignment of students to programs; but some will leave programs before completion
ITT analysis; remove non-compliers; focus on “true compliers”
20. Quasi/Non-Experimental Designs
Compared to RCTs, no randomization
Many quasi-experimental designs
Many are variations of pre-test/post-test structure without randomization
Apply when non-experimental (“observational”) data used, which is often the case in ed. research
Pros: When properly done may be more generalizable than RCTs
Main Problem: Internal validity
Did the “treatment” really produce the effect?
21. “Causation” with Observational Data
Often difficult to ascertain because of non-random assignment to “treatment”
Example: Students often self-select into courses, interventions, programs, which may result in biased estimates when “naïve” methods employed to ascertain treatment effects
Goal? Mimic desirable properties of RCTs
Solution? Employ designs/methods that account for non-random assignment; will demonstrate some today
22. Counterfactuals
When using observational data the idea is: Find a group that looks like the treated on as many dimensions as you can measure
Establishing what counterfactual is & how to create legitimate control group is difficult
The best counterfactual is one’s self!
Adam & Grace time machine example
Often why you see repeated measures designs
Twins study in MN
23. The “Naïve” Statistical Approach
$Y = a + \beta_1 X + \beta_2 T + e$  (1)
where Y is outcome of interest; X is set of controls; T is treatment “dummy”; $a$ & $\beta$ are parameters to be estimated, with $\beta_2$ being parameter estimate of interest; e is error term accounting for unmeasured or unobservable factors affecting Y.
Problem: If T & e are correlated, then estimate of $\beta_2$ will be biased
(1) is known as the “outcome” or “structural” equation or sometimes “stage 2”
24. Selection Adjustment Methods
Fixed effects (FE) methods, instrumental variables (IV), propensity score matching (PSM), & regression discontinuity (RD) designs all have been used to approximate randomized controlled experiment results
All are regression-based methods
Each has strengths/weaknesses & their applicability often depends on knowledge of DGP & richness of data available
25. Matching Methods
Compare outcomes of similar individuals where only difference is treatment; discard other observations
Example: GEAR UP effects on HS grad
Low income (on avg) have lower achievement & are less likely to graduate from HS
Naïve comparison of GEAR UP to others likely to give biased results because untreated tend to have higher HS graduation rates
Use matching methods to develop similar non-treated group to compare HS grad rates
26. One Remedy: Direct Matching
Find control cases with pre-treatment characteristics that are exactly the same as those of the treated group
Strategy breaks down because as number of X’s increases, pr(match) goes to zero
Known as the “curse of dimensionality”
e.g., Matching on 20 binary variables results in $2^{20}$ or 1,048,576 possible values for X’s!
If you add in continuous vars (e.g., GPA, income) problem becomes even more intractable
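The arithmetic behind the curse of dimensionality is easy to check. The short Python sketch below counts the distinct covariate profiles created by k binary variables; the sample size of 10,000 students is a hypothetical figure, used only to show how thinly cases spread across cells.

```python
def exact_match_cells(n_binary):
    """Number of distinct covariate profiles with n_binary yes/no variables."""
    return 2 ** n_binary


SAMPLE_SIZE = 10_000  # hypothetical number of students available for matching

for k in (5, 10, 20):
    cells = exact_match_cells(k)
    print(f"{k:>2} binary X's -> {cells:>9,} cells, "
          f"{SAMPLE_SIZE / cells:.4f} students per cell on average")
```

With 20 binary covariates there are far more cells than students, so most treated cases would have no exact match.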
27. Propensity Score Matching
Solution: Estimate the “propensity score” (PS) & match treated with control cases based only on this single number
This approach controls for pre-treatment differences by balancing each group’s set of observable characteristics on a single number
Goal: Estimate treatment effects for individuals with similar observable characteristics, as indexed by the PS
28. Estimating the Propensity Score
Estimate Pr(treatment)
Typically done using logistic regression, but some software uses probit
Use PS to find control(s) with “same” score as treated observation
Establishes counterfactual (“control” group)
Test for differences in outcomes between treated & counterfactual (“controls”)
Often done using regression methods
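To make the first step concrete, here is a minimal Python sketch that fits a one-covariate logit by Newton-Raphson and returns each case's estimated propensity score. It is not the workshop's Stata code; the simulated data, the true coefficients (-0.5 and 1.0), and the function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ts, iters=25):
    """Fit Pr(T=1|x) = sigmoid(a + b*x) by Newton-Raphson; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = sigmoid(a + b * x)
            g0 += t - p              # gradient of the log-likelihood
            g1 += (t - p) * x
            w = p * (1 - p)          # entries of the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b


# Hypothetical data: one pre-treatment covariate x drives selection.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ts = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x) else 0 for x in xs]

a_hat, b_hat = fit_logit(xs, ts)
pscores = [sigmoid(a_hat + b_hat * x) for x in xs]  # estimated propensity scores
print(round(a_hat, 2), round(b_hat, 2))
```

In practice the model would include many covariates and be fit with packaged routines; the point here is only that the propensity score is the fitted probability of treatment.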
29. Goal of PS Matching
When done correctly, the probability that a treated observation has a specific trait (X = x) is the same as the probability that an untreated observation has that trait
PSM is basically a “resampling” or even “oversampling” method, which involves a bias & variance tradeoff
e.g., When matching with replacement, avg. match quality increases & bias decreases, but fewer distinct controls are used, increasing the variance of the estimator
30. PSM Assumptions: Conditional Independence Assumption
Conditional on observables, there is no correlation between the treatment & outcome that occurs absent the treatment
Mathematically: $(Y_1, Y_0) \perp D \mid X$
After controlling for observables, the treatment assignment is as good as random
Upshot: Untreated observations can serve as the counterfactual for the treated
31. Assumption: Common Support
The probability of receiving treatment for each value of X lies strictly between 0 and 1
Mathematically: $0 < P(D = 1 \mid X) < 1$
AKA the overlap condition because it ensures overlap in characteristics of treated & untreated to find matches (common support)
Upshot: A match can actually be made between the treated and untreated observations
32. Assumptions (cont’d)
When CIA & common support are satisfied, treatment assignment is strongly ignorable
Though not an assumption, observed characteristics need to be balanced across the treated & untreated groups
If not, then regardless of whether assumptions hold there will be bias from selection on observable characteristics
Can check for balancing & how much bias is reduced by matching on observables
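One common way to check balance is the standardized difference in means, in the spirit of Rosenbaum & Rubin (1985): the difference in group means divided by the square root of the average of the two group variances, times 100. The Python sketch below assumes that definition; the GPA values are made up for illustration.

```python
import statistics


def standardized_difference(treated, control):
    """Standardized difference in means (in percent) for one covariate."""
    pooled_sd = ((statistics.variance(treated)
                  + statistics.variance(control)) / 2) ** 0.5
    return 100 * (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical HS GPA values before and after matching.
gpa_treated = [3.0, 3.2, 3.4]
gpa_all_controls = [2.0, 2.2, 2.4]
gpa_matched_controls = [2.9, 3.2, 3.5]

before = standardized_difference(gpa_treated, gpa_all_controls)
after = standardized_difference(gpa_treated, gpa_matched_controls)
print(round(before, 1), round(after, 1))
```

Comparing the statistic before and after matching shows how much of the imbalance on that covariate the matching removed.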
33. Plan of Action for This Portion
Discuss logical folder structure to store do files (programs), data, & output files
Learn how Stata works & some basic commands
Simulate DGP to examine consequences of violations of assumptions
Later examine code to undertake PSM modeling & discuss how these techniques might be used in your research
34. Importance of Good Structure
My bet is that IR folks like you know this already but…
Creating a logical folder structure for each project is important step in analysis process
If you use a similar structure all the time you will be able to come back to projects at later date & understand what was done
Also very important to provide comments in your do files so you know what you did
Maybe someone else will pick up your work
35. Folder Structure
CA AIR 2014 (folder located on C: drive)
  Articles (contains articles/chapters)
  Data (contains data files)
  Do Files (contains do files)
  Graphs (place to send graphs created by code)
  Results (place to send output created by code)
  Powerpoint (contains PowerPoints)
Examples of path names:
log using "C:\CA AIR 2014\Log Files\CA AIR Log 1.log", replace
use "C:\CA AIR 2014\Data\CA AIR PSM DataSub.dta", clear
36. How Stata Works
Command or “point & click” driven software
Software resides in: C:\Program Files (x86)\Stata13 (or Stata12)
Type “adopath” on command line to find paths to the ado files used
Role of “ado” files
Examine ado & help files
Discuss user written ado & help files
37. The “Look” of Stata
Toolbar contains icons that allow you to Open & Save files, Print results, control Logs, & manipulate windows
Of particular interest: Opening the Do-File Editor, the Data Editor and the Data Browser.
Data Editor & Browser: Spreadsheet view of data
Do-File Editor allows you to construct a file of Stata commands, save them, & execute all/parts
The Current Working Directory is where any files created in your active Stata session will be saved (by default).
Don’t save stuff here, direct to folders discussed above
38. Windows in Stata
Review, Results, Command, & Variables windows
Help: Search for any command/feature. Help Browser, which opens in Viewer window, provides hyperlinks to help pages & to pages in the Stata manuals (which are quite good)
May search for help using command line
Role of “findit” & “ssc install”
Locate commands in Stata Technical Bulletin & Stata Journal; Demo loading the “psmatch2” command
On command line type: “ssc describe psmatch2” then “ssc install psmatch2” & then “help psmatch2”
39. Stata Program Files
Called “do” files; contain Stata code/commands we “run” to produce results
Do File Name: CA AIR PSM Violations Simulation.do in the “Do Files” sub-folder in CA AIR 2014 main project folder
Later will use: CA AIR PSM.do in same place
There are also menu options to run commands in Stata, but we won’t do this
May be useful for some “on the fly” analysis, but it is NOT a good way to do most projects
Reasons: Reproducibility & transportability
40. Simulating Condition Violations
Before delving into real application of propensity score matching in education research, we will examine effects of a few condition/assumption violations on results
To do so, we’ll create “fake” data set so we know true parameters & can therefore figure out bias due to such violations
41. Effect of Selection Bias Under Different DGP Scenarios
Examine effectiveness of different statistical methods to remedy selection bias
Create artificial data using regression model: $y = a + \beta x + \tau w + e$
where x is a control, w is treatment; data is created for y, x, w, e and parameters are: $y = x + 2w + e$ (that is, $a = 0$, $\beta = 1$, and a true treatment effect of $\tau = 2$)
True treatment effect known; evaluate bias under different scenarios/using alt. methods
42. Simulations Conducted
Relax following conditions:
No correlation between x and e
No correlation between x and w
43. Scenario 1: The Ideal Condition
Conditional on observables (x), treatment (w) is independent of the error (e)
The scenario mimics the data that would be generated from a randomized study
x is created as an ordinal variable, taking on the values 1, 2, 3, 4
If we regress y on x (controls) and w (treatment indicator) we obtain…
44. Scenario 2: Ignorable Treatment Assignment Assumption Violated
Conditional on observables (x), the treatment (w) is NOT independent of the error (e)
All other conditions hold
This is a classic selection bias condition
Given the correlation between treatment and the error, we’d expect “naïve” regression to result in biased estimate of treatment effect
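Scenarios 1 and 2 can be mimicked with a few lines of Python. This is a sketch under assumptions, not the workshop's simulation do-file: the data follow y = x + 2w + e with x taking values 1 to 4, but the way treatment is made to depend on the error in Scenario 2 (w = 1 when e plus independent noise is positive) is invented here for illustration.

```python
import random


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def treatment_coef(y, x, w):
    """OLS coefficient on w from regressing y on a constant, x, and w."""
    num = cov(w, y) * cov(x, x) - cov(x, y) * cov(x, w)
    den = cov(w, w) * cov(x, x) - cov(x, w) ** 2
    return num / den


def naive_estimate(n, selection_on_error, seed=42):
    rng = random.Random(seed)
    xs, ws, ys = [], [], []
    for _ in range(n):
        x = rng.choice([1, 2, 3, 4])          # ordinal control
        e = rng.gauss(0, 1)                   # error term
        if selection_on_error:
            # Scenario 2 (assumed mechanism): treatment depends on the error.
            w = 1 if e + rng.gauss(0, 1) > 0 else 0
        else:
            w = rng.randint(0, 1)             # Scenario 1: as good as random
        xs.append(x)
        ws.append(w)
        ys.append(x + 2 * w + e)              # true treatment effect is 2
    return treatment_coef(ys, xs, ws)


print("Scenario 1:", round(naive_estimate(20000, False), 2))
print("Scenario 2:", round(naive_estimate(20000, True), 2))
```

Because the true effect is known to be 2, the gap between the Scenario 2 estimate and 2 is a direct measure of the selection bias in the naive regression.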
45. Scenario 3: Multicollinearity
In this scenario, conditional on observables (x), treatment (w) is independent of the error (e) (ignorable treatment assignment)
But we allow x & w to be correlated (there is multicollinearity)
Often happens in social science research
This scenario should not affect the size of the treatment effect, but SEs should be incorrect, thus significance tests wrong
46. Scenario 4
There is correlation between the regressors and non-ignorable treatment assignment
Correlation between x and error & tx is continuous instead of ordinal
All other assumptions from Scenario 1 hold
Pattern in graph is produced by correlation between treatment & error term
Happens when control variables (x’s) are omitted
Known as "selection on unobservables"
47. Scenario 5
In this scenario t and x correlated with the error term; w and x are also correlated
This scenario assumes the weakest conditions for data generation
Both the naïve regression and the matching methods produce substantially biased estimates of the treatment effect
48. Does Failure of Parents to Provide Required Support Hinder Student Success?
Some parents provide the support they are required to, others do not
Inferential problem: Students who do not get support (“treated”) may be different (on observed & unobserved factors) than those who receive support
Correlation between Pr(no support) & educational outcomes makes parsing causal effects from observed & unobserved differences in students very difficult
49. Empirical Example
Examine whether lack of expected parental financial support causes differences in:
Loan use; attending part-time; worked 20+ hours/week in college; whether student dropped out in year one; completion of a bachelor’s degree within 6 years
Treatment variable: T = 1 if student did not receive required funds from their parents to pay for college expenses; 0 otherwise
50. PSM: Charting the Way, Step 1
Estimate conditional probability of receiving treatment; the “propensity score”
Remedy imbalance in treated/controls using variables affecting selection into treatment; choose functional form (logit or probit)
e.g. $\ln\left(\frac{p}{1-p}\right) = a + \beta x + \tau w + e$
Pairs of treated/control cases with similar PS are viewed as “comparable” even though they may have different covariate values
52. Step 2: Matching
Propensity score used to match treated to control case(s) to make cases “alike”
Extent of “common support” will dictate whether there is match for all treated
Lack of common support will lead to non-matches; loss of cases
Thus, this is really resampling, with new sample balanced in terms of selection bias
Many algorithms available to match cases with similar PS
55. Variable Selection
May want to include large # of variables & remove insignificant ones
May improve fit according to model fit measures, but does not focus on the task at hand: Achieving balance among Xs (satisfying the CIA).
An X may not be significant but removing it may remove important variation necessary to satisfy CIA.
56. Variable Selection (cont’d)
Use conceptual theory & prior research to suggest necessary conditioning Xs
Xs affecting selection into treatment & the outcome can and should be included
Need to be careful about temporal ordering
Only variables unaffected by participation (or the anticipation of it) should be included
Some debate in literature about specification of PS regression model
57. Step 3: Post-Matching Analysis
Balanced sample corrects for selection bias & violations of assumptions inherent when using naïve statistical methods to est. effects
Use the matched resample to do multivariate analysis as you normally would if the DGP came from randomization
Could also stratify on PS and compare means between treated/controls in each stratum
Many variations on this general 3 step approach; see Guo & Fraser for details
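The stratification variant mentioned above can be sketched as follows in Python. The function name, the choice of five strata, and the weighting of each stratum by its size are assumptions for illustration; weighting by the number of treated cases instead would target the effect on the treated.

```python
import statistics


def stratified_effect(y, t, ps, n_strata=5):
    """Average the within-stratum treated-minus-control mean differences."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])  # sort cases by PS
    weighted_sum, n_used = 0.0, 0
    for s in range(n_strata):
        lo = s * len(order) // n_strata
        hi = (s + 1) * len(order) // n_strata
        stratum = order[lo:hi]
        y_t = [y[i] for i in stratum if t[i] == 1]
        y_c = [y[i] for i in stratum if t[i] == 0]
        if y_t and y_c:  # skip strata that lack treated or control cases
            diff = statistics.mean(y_t) - statistics.mean(y_c)
            weighted_sum += diff * len(stratum)
            n_used += len(stratum)
    return weighted_sum / n_used if n_used else None


# Tiny made-up example: ten cases, one treated and one control per stratum.
outcomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
treatment = [0, 1] * 5
scores = [(i + 0.5) / 10 for i in range(10)]
print(stratified_effect(outcomes, treatment, scores))
```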
60. Different Matching Algorithms
Nearest Neighbor: Treated obs matched to control obs with similar PS
Latter case used as counterfactual for former
Can perform NN with/without replacement
With: Higher quality matches (less biased) by always using closest neighbor regardless of whether it has been used before
Doing so increases variance of estimates because fewer untreated units are used in the matching
61. Matching Algorithms (cont’d)
Without replacement: Order in which matches made is important because matches must be unique. If made in particular order (going from low to higher PS), then systematic biases may be built in.
When using NN matching without replacement it is critical that order in which the matches are made be random.
Will see how to do this later
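A minimal Python sketch of one-to-one nearest neighbor matching, with and without replacement, is given below. It is an illustration with hypothetical names and scores, not the matching routine used in the workshop. The treated cases are processed in a random order, which matters only when matching without replacement.

```python
import random


def nn_match(treated_ps, control_ps, replacement=True, seed=0):
    """Return {treated index: control index} using the nearest propensity score."""
    order = list(range(len(treated_ps)))
    random.Random(seed).shuffle(order)            # randomize the match order
    available = list(range(len(control_ps)))
    matches = {}
    for i in order:
        if not available:                         # ran out of controls
            break
        j = min(available, key=lambda c: abs(control_ps[c] - treated_ps[i]))
        matches[i] = j
        if not replacement:
            available.remove(j)                   # each control used once
    return matches


treated_scores = [0.30, 0.32]
control_scores = [0.31, 0.50, 0.90]
print(nn_match(treated_scores, control_scores, replacement=True))
print(nn_match(treated_scores, control_scores, replacement=False))
```

With replacement, both treated cases reuse the control at 0.31. Without replacement, whichever treated case is processed second must settle for the control at 0.50, a worse match, which is the bias and variance tradeoff described on the slides.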
62. Caliper & Radius Matching
Drawback of NN: NN may not be near!
Caliper matching: NN & define range in which acceptable matches can be made
Bandwidth chosen by researcher; represents max interval in which to make a match
NN outside of bandwidth, no match & treated case has no counterfactual/not used
Method imposes common support for each observation in the data
63. Caliper & Radius (cont’d)
Caliper: Treated obs PS = .40 & h = .05
Where h is the “bandwidth”
Match made if 0.35 <= PS of NN <= 0.45.
Equivalent when matching with replacement is called “radius” matching
Matches within bandwidth are equally weighted when constructing counterfactual
Both require choosing h & involve a bias/Var tradeoff
Wider h lowers Var as more data used, but also lowers the match quality & bias increases
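The caliper rule from the slide (treated PS = .40, h = .05) can be sketched in Python as follows. Function names and the control scores are hypothetical. Scores that fall exactly on the boundary can go either way because of floating-point rounding, so the example avoids them.

```python
def caliper_match(ps_treated, control_ps, h=0.05):
    """Index of the nearest control within +/- h of the treated score, else None."""
    in_range = [(abs(p - ps_treated), j)
                for j, p in enumerate(control_ps)
                if abs(p - ps_treated) <= h]
    return min(in_range)[1] if in_range else None


def radius_matches(ps_treated, control_ps, h=0.05):
    """Indices of all controls within +/- h; these are weighted equally."""
    return [j for j, p in enumerate(control_ps) if abs(p - ps_treated) <= h]


control_scores = [0.20, 0.37, 0.44, 0.60]
print(caliper_match(0.40, control_scores))   # nearest control inside the caliper
print(radius_matches(0.40, control_scores))  # every control inside the radius
print(caliper_match(0.90, control_scores))   # no control close enough
```

A treated case that returns None has no counterfactual and would be dropped from the analysis.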
64. Kernel & Local Linear Regression
Both are one-to-many algorithms
Unlike radius, these weight each untreated obs according to how close match is
Function determining weight: the “kernel”
As match becomes worse, weight on untreated unit decreases
LLR uses kernel to weight obs but does so using regression-based methods
Both are computationally intensive
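The kernel idea can be illustrated with a short Python sketch. The Epanechnikov kernel and the bandwidth of 0.06 are choices made here for illustration, and the scores and outcomes are made up.

```python
def epanechnikov(u):
    """Kernel weight: largest at u = 0 and zero once |u| exceeds 1."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_counterfactual(ps_treated, control_ps, control_y, h=0.06):
    """Kernel-weighted mean outcome of controls for one treated case."""
    weights = [epanechnikov((p - ps_treated) / h) for p in control_ps]
    total = sum(weights)
    if total == 0:
        return None  # no control within the bandwidth (off common support)
    return sum(w * y for w, y in zip(weights, control_y)) / total


control_scores = [0.40, 0.43, 0.70]
control_outcomes = [3.0, 2.0, 0.0]
print(round(kernel_counterfactual(0.40, control_scores, control_outcomes), 3))
```

The control at 0.70 gets zero weight, and the control at 0.40 counts more than the one at 0.43, so the counterfactual lands between 2.0 and 3.0 but closer to 3.0.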
65. PS Reweighting
Simpler procedure focuses on reweighting & does not involve matching obs
AKA “inverse probability weighting”
Reweight untreated obs with high (low) PS up (down)
Untreated obs with high PS most like treated so weight more heavily than the observations that are dissimilar (as indicated by low PS)
Advantage: Program ease because no need to create counterfactuals for each unit one-by-one.
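A sketch of the reweighting estimator in Python follows. It assumes the weights commonly used when the target is the effect on the treated: treated cases get weight 1 and untreated cases get PS/(1 − PS), so untreated cases with high scores count more. The data are made up.

```python
def att_by_reweighting(y, t, ps):
    """Treated mean minus the PS/(1-PS)-weighted mean of untreated outcomes."""
    treated_mean = sum(yi for yi, ti in zip(y, t) if ti == 1) / sum(t)
    weighted = [(p / (1 - p), yi) for yi, ti, p in zip(y, t, ps) if ti == 0]
    control_mean = (sum(w * yi for w, yi in weighted)
                    / sum(w for w, _ in weighted))
    return treated_mean - control_mean


outcomes = [5, 4, 3, 1]
treatment = [1, 1, 0, 0]
scores = [0.80, 0.50, 0.75, 0.25]
print(round(att_by_reweighting(outcomes, treatment, scores), 3))
```

Here the untreated case with a score of 0.75 receives weight 3 and the one with 0.25 receives weight 1/3, so the weighted control mean is pulled toward the case that most resembles the treated.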
66. Inference
How to construct SEs of treatment effects?
Incorrect to t-test on null ATT=0; doesn’t account for variance introduced by estimation of PS
Solution: Use teffects command or if using psmatch2 need to bootstrap SEs to obtain correct CIs for estimated effects
Randomly pull obs (with replacement) then calc. effect; draw new sample; est another effect; do this many (e.g., thousands) times
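The resampling loop described above can be sketched generically in Python. The estimator below is a plain difference in means on simulated data, chosen only to keep the example short; in a real PSM application the estimator passed in would re-estimate the propensity score and redo the matching inside every replicate. All names and numbers are hypothetical.

```python
import random
import statistics


def bootstrap_se(data, estimator, reps=500, seed=0):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # draw with replacement
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)


def diff_in_means(sample):
    treated = [y for y, t in sample if t == 1]
    control = [y for y, t in sample if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


# Simulated (outcome, treatment) pairs with a true difference of 2.
rng = random.Random(7)
data = ([(rng.gauss(2, 1), 1) for _ in range(200)]
        + [(rng.gauss(0, 1), 0) for _ in range(200)])

se = bootstrap_se(data, diff_in_means)
print(round(diff_in_means(data), 2), round(se, 3))
```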
67. Inference (cont’d)
For NN using psmatch2, the bootstrap (bs) may not produce accurate SEs
Lack of “smoothness” of algorithm?
Smoother algorithms, such as kernel matching, local linear regression, & PS reweighting may not suffer from similar problems
Despite concerns, bs is most common method for producing SEs in matching methods (if not using teffects command)
68. Bounding
If there are unobserved variables that simultaneously affect assignment into treatment & the outcome variable, a hidden bias might arise to which matching estimators are not robust
Since estimating the magnitude of selection bias with nonexperimental data is not possible, we address this problem with the bounding approach proposed by Rosenbaum (2002)
69. Bounding
The basic question is whether unobserved factors can alter inference about treatment effects. One wants to determine how strongly an unmeasured variable must influence the selection process to undermine the implications of the matching analysis.
rbounds tests sensitivity for continuous-outcome variables, mhbounds for binary-outcome variables
70. Bounding
If there is hidden bias, two individuals with the same observed covariates x have different chances of receiving treatment
Sensitivity analysis now evaluates how changing the values of $\gamma$ and $(u_i - u_j)$ alters inference about the program effect.
If $e^{\gamma} = 2$, individuals who appear to be similar (in terms of x) could differ in their odds of receiving the treatment by as much as a factor of 2. In this sense, $e^{\gamma}$ is a measure of the degree of departure from a study that is free of hidden bias
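The slide uses $\gamma$ and $(u_i - u_j)$ without defining them. A sketch of the standard Rosenbaum (2002) setup, assumed here, treats $u$ as an unobserved binary covariate and $F$ as the logistic distribution, so that $\gamma$ measures how strongly $u$ shifts the odds of treatment:

```latex
% Participation probability with observed x_i and unobserved u_i
P_i = \Pr(D_i = 1 \mid x_i, u_i) = F(\beta x_i + \gamma u_i)

% Odds ratio for two individuals matched on the same observed covariates (x_i = x_j)
\frac{P_i / (1 - P_i)}{P_j / (1 - P_j)}
  = \frac{\exp(\beta x_i + \gamma u_i)}{\exp(\beta x_j + \gamma u_j)}
  = \exp\left[\gamma (u_i - u_j)\right]

% With u in {0, 1}, the odds ratio is bounded by
\frac{1}{e^{\gamma}} \le \frac{P_i (1 - P_j)}{P_j (1 - P_i)} \le e^{\gamma}
```

When $e^{\gamma} = 1$ the matched individuals have identical odds of treatment and there is no hidden bias; raising $e^{\gamma}$ widens the bounds and shows how much unobserved selection the conclusions can tolerate.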
71. Pros/Cons of PSM
Benefits
Make inference from comparable group
Focuses on population of interest
Use of propensity score solves the dimensionality problem in direct matching
Limitations
Cannot directly control for unobserved characteristics that affect the outcome
Can, however, examine sensitivity of this, which is an innovation in method
72. Conclusions
RCTs are desirable in terms of making causal statements, but often difficult to employ
In education we often have observational data but methods used to make statements of treatment effects are typically deficient
Ultimate goal: Make strong (“causal”) statements to improve knowledge of mechanisms that determine program & practice effectiveness
We need to be much more attentive to the problems that arise when we are using observational data
73. Other Take Aways
Education research has not kept pace with advances in quantitative methods
There are really few good reasons for not applying these new methods
There is a payoff for doing so: Better information about the mechanisms that affect higher education processes, policies, and outcomes
We need to employ these methods more broadly in IR to ascertain “what works”
74. Suggestion: Read This Book…
Guo, S. and Fraser, M. W. (2014). Propensity Score Analysis: Statistical Methods and Applications, Second Edition. Thousand Oaks, CA: Sage Publications.
Companion page:
75. …and Read This Chapter
Reynolds, C. L., & DesJardins, S. L. (2009). The Use of Matching Methods in Higher Education Research: Answering Whether Attendance at a Two-Year Institution Results in Differences in Educational Attainment. In John Smart (Ed.), Higher Education: Handbook of Theory and Research XXIII:
76. Purchasing Stata
Depending on your needs, there are a number of software options when purchasing Stata
Single user/institutional/Grad Plan licenses
Small vs. IC vs. SE versions
Perpetual license; continually updated
Stat Transfer software
See the Stata website for more information:
77. References
Adelman, C. (1999). Answers in the toolbox: Academic intensity, attendance patterns, and bachelor’s degree attainment. Washington, D.C.: U.S. Department of Education.
Adelman, C. (2006). The toolbox revisited: Paths to degree completion from high school through college. Washington, D.C.: U.S. Department of Education.
Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics. Princeton, NJ: Princeton University Press.
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22,
Cohn, E., & Geske, T. G. (1990). The economics of education (3rd ed.). Oxford: Pergamon Press.
Guo, S., & Fraser, M. W. (2010). Propensity Score Analysis: Statistical Methods and Applications. Thousand Oaks, CA: Sage Publications. Companion page:
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153–161.
78. References
Mincer, J. (1958). Investment in human capital and personal income distribution. Journal of Political Economy, 66(4),
Morgan, S. L., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge, UK: Cambridge University Press.
Reynolds, C. L., & DesJardins, S. L. (2009). The Use of Matching Methods in Higher Education Research: Answering Whether Attendance at a Two-Year Institution Results in Differences in Educational Attainment. In John Smart (Ed.), Higher Education: Handbook of Theory and Research XXIII:
Rose, H., & Betts, J. R. (2001). Math matters: The links between high school curriculum, college graduation, and earnings. San Francisco, CA: Public Policy Institute of California.
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician, 39(1),
Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). New York: Springer.
Rosenbaum, P. R. (2010). Design of observational studies. New York: Springer.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.
Rubin, D. B. (1977). Assignment of treatment group on the basis of a covariate. Journal of Educational Statistics, 2, 1–26.
79. References
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of “Randomization analysis of experimental data in the Fisher randomization test” by Basu. Journal of the American Statistical Association, 75, 591–593.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating Causal Effects Using Experimental and Observational Designs. Washington, DC: American Educational Research Association.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1–21.
82. Recent AERA Report on the Issue
“Recently, questions of causality have been at the forefront of educational debates and discussions, in part because of dissatisfaction with the quality of education research…”. A common concern “revolves around the design of and methods used in education research, which many claim have resulted in fragmented and often unreliable findings” (Schneider, et al., 2007)
83. Definition of Cause and Effect
“A cause is that which makes any other thing, either simple idea, substance, or mode, begin to be; and an effect is that which had its beginning from some other thing” (Locke, 1690/1975, p. 325).
84. Holding
In quintiles, you divide your sample into five groups: the 20% LEAST likely to end up in your treatment group is quintile 1, the 20% with the GREATEST likelihood of ending up in your treatment group is quintile 5, and so on. You match the subjects by quintiles. So, if 12% of the treatment group is in quintile 1, you randomly select 12% of the control subjects from quintile 1.
In nearest neighbor matching, as the name implies, you match each subject in the treatment group with a subject in the control group who is nearest in probability of ending up in the treatment group.
Then, there is caliper (radius) matching, which uses the nearest neighbors within a given radius or interval.
ESSENTIAL REFERENCES
Propensity score matching
Rosenbaum, P.R. and Rubin, D.B. (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects”, Biometrika, 70, 1,
Caliper matching
Cochran, W. and Rubin, D.B. (1973), “Controlling Bias in Observational Studies”, Sankhya, 35,
Kernel-based matching
Heckman, J.J., Ichimura, H. and Todd, P.E. (1997), “Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme”, Review of Economic Studies, 64,
Heckman, J.J., Ichimura, H. and Todd, P.E. (1998), “Matching as an Econometric Evaluation Estimator”, Review of Economic Studies, 65,
Mahalanobis distance matching
Rubin, D.B. (1980), “Bias Reduction Using Mahalanobis-Metric Matching”, Biometrics, 36,
85. Data Set Used
Data Set Name: CA AIR PSM DataSub.dta that is located in the “Data” sub-folder in the CA AIR 2014 main project folder
The data contains a subset of national education data
Only select variables are included in the dataset
86. Summary
These methods, and others, can be helpful in studying the effects of programs, processes, & practices where random assignment is not possible or feasible.
They are regression-based so learning them is an extension of the OLS/logit training many have had
The results can be displayed in a way so as to make them understandable to policy makers & administrators
87. Summary (cont’d)
There are many resources available to learn & extend these methods
Higher education literature, Stata (and other) publications, blogs with code & solutions to programming/statistical problems
Professional development workshops
I hope you’ve found this exercise helpful & that you will be able to use these methods in your IR work