The logic of Counterfactual Impact Evaluation


2 The logic of Counterfactual Impact Evaluation

3 To understand counterfactuals, it is necessary to understand impacts

4 Impacts differ in one fundamental way from outputs and results: outputs and results are observable quantities

5 Can we observe an impact? No, we can't

6 As output indicators measure outputs, result indicators measure results, so impact indicators measure impacts? Sorry, they don't

7 Almost everything about programmes can be observed (at least in principle):
outputs (beneficiaries served, activities done, training courses offered, km of roads built, sewage cleaned)
outcomes/results (income levels, inequality, well-being of the population, pollution, congestion, inflation, unemployment, birth rate)

8 What is needed for M&E of outputs and results are BITs (baselines, indicators, and targets)

9 Unlike outputs and results, to define, detect, understand, and measure impacts one needs to deal with causality

10 "Causality is in the mind" (J.J. Heckman)

11 Why this focus on causality? Because, unless we can attribute changes (or differences) to policies, we do not know whether the intervention works, for whom it works, and even less why it works (or does not). Causal questions represent a bigger challenge than non-causal questions (descriptive, normative, exploratory).

12 The social science community defines impact/effect as the difference between the situation observed after a stimulus has been applied and the situation that would have occurred without that stimulus

13 A very intuitive example of the role of causality in producing credible evidence for policy decisions

14 Does playing chess have an impact on math learning?

15 Policy-relevant question: Should we make chess part of the regular curriculum in elementary schools, to improve mathematics achievement? Which kind of evidence do we need to make this decision in an informed way? We can think of three types of evidence, from the most naive to the most credible.

16 1. The naive evidence: pre-post difference
Take a sample of pupils in fourth grade
Measure their achievement in math at the beginning of the year
Teach them to play chess during the year
Test them again at the end of the year

17 Results for the pre-post difference
Pupils at the beginning of the year: average score = 40 points
The same pupils at the end of the year: average score = 52 points
Difference = 12 points = +30%
Question: what are the implications for making chess compulsory in schools? Have we proven anything?

18 Can we attribute the increase in test score to playing chess? OBVIOUSLY NOT. The data tell us that the effect is between zero and 12 points. There is no doubt that many more factors are at play. So we must dismiss the increase of 12 points as unable to tell us anything about impact.
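The arithmetic behind the pre-post comparison can be sketched in a few lines of Python. The function name `pre_post_change` is an illustrative assumption, not something from the slides; the scores 40 and 52 are the ones reported above. The sketch only reproduces the observed change and says nothing about how much of it is due to chess.

```python
def pre_post_change(pre_mean, post_mean):
    """Return the raw change and the relative change (in %) between two means."""
    change = post_mean - pre_mean
    relative = 100.0 * change / pre_mean
    return change, relative

# Average math scores at the beginning and at the end of the year (from the slides).
change, relative = pre_post_change(40, 52)
print(f"Observed change: {change} points ({relative:.0f}%)")
```

The observed change is an upper bound on what chess could have contributed only under the assumption that nothing else pushed scores down during the year.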

19 The pre-post great temptation
The pre-post comparisons have a great advantage: they seem kind of obvious (the pop definition of impact coincides with the pre-post difference), particularly when the intervention is big and the theory suggests that the outcomes should be affected.
This is not the case here, but in general we should be careful about making causal inferences based on pre-post comparisons.

20 The risky alternative: with-without difference
Impact = difference between treated and not treated?
Compare math test scores for kids who have learned chess by themselves and kids who have not

21 Not really
Average score of pupils who already play chess on their own (25% of the total) = 66 points
Average score of pupils who DO NOT play chess on their own (75% of the total) = 45 points
Difference = 21 points = +47%
This difference is OBJECTIVE, but what does it mean, really? Does it have any implication for policy?

22 This evidence tells us almost nothing about making chess compulsory for all students. The data tell us that the effect of playing chess is between zero and 21 points. Why? The observed difference could be entirely due to differences in mathematical ability that existed between the two groups before the courses.

23 [Diagram: innate math ability drives both playing chess (SELECTION PROCESS) and math test scores (DIRECT INFLUENCE). Does playing chess have an impact on math test scores?]
Ignoring math ability could severely bias the results, if we intend to interpret them as a causal effect.
66 – 45: real effect or the fruit of sorting?

24 Counterfeit Counterfactual
Both the raw difference between self-selected participants and non-participants, and the raw change between pre and post, are a caricature of the counterfactual logic.
In the case of raw differences, the problem is selection bias (predetermined differences).
In the case of raw changes, the problem is maturation bias (a.k.a. natural dynamics).

25 The modern way to understand causality is to think in terms of POTENTIAL OUTCOMES. Let us imagine we know the score that kids would get if they played chess and the score they would get if they did not.

26 Let's say there are three levels of ability
Kids in the top quartile (top 25%) learn to play chess on their own
Kids in the two middle quartiles learn if they are taught in school
Kids in the bottom quartile (last 25%) never learn to play chess

27 High math ability (25%): play chess by themselves
Mid math ability (50%): do not play chess unless taught in school
Low math ability (25%): never learn to play

28 Potential outcomes
High math ability: 66 if they do play chess, 56 if they do NOT play chess, impact = 10
Mid math ability: 54 if they do play chess, 48 if they do NOT play chess, impact = 6
Low math ability: 40 either way, impact = 0
Impact = gain from playing chess

29 Observed outcomes
For those who play chess: high math ability = 66
For those who do not play chess: mid math ability = 48, low math ability = 40, mid/low math ability combined = 45
The difference of 21 points is NOT an IMPACT, it is just an OBSERVED difference
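How the observed averages of 66 and 45 arise from the potential outcomes can be sketched as follows. This is a minimal illustration, not the author's code: the `groups` layout and the `weighted_mean` helper are assumptions, and the low-ability score "if they play" is taken to be 40 because the stated impact for that group is 0.

```python
# Potential outcomes from the slides:
# group -> (share of pupils, score if they play chess, score if they do not).
# Assumption: the low-ability "if they play" score is 40, implied by an impact of 0.
groups = {
    "high": (0.25, 66, 56),
    "mid": (0.50, 54, 48),
    "low": (0.25, 40, 40),
}

# Without any school programme, only high-ability pupils play chess.
players = ["high"]
non_players = ["mid", "low"]

def weighted_mean(names, column):
    """Share-weighted mean of one potential-outcome column over the named groups."""
    total_share = sum(groups[n][0] for n in names)
    return sum(groups[n][0] * groups[n][column] for n in names) / total_share

observed_players = weighted_mean(players, 1)          # they play, so column 1 is observed
observed_non_players = weighted_mean(non_players, 2)  # they do not play, so column 2 is observed
observed_difference = observed_players - observed_non_players

print(round(observed_players), round(observed_non_players), round(observed_difference))
```

Each pupil reveals only one of the two potential outcomes, which is why the 56 that high-ability pupils would score without chess never appears among the observed averages.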

30 The problem: we do not observe the counterfactual(s)
For the treated, the counterfactual is 56, but we do not see it
The true impact is 10, but we do not see it
Still, we cannot use 45, which is the observed outcome of the untreated
We can think of decomposing the 66 – 45 difference as the sum of the true impact on the treated and the effect of sorting

31 Decomposing the observed difference
High math ability, if they play chess: 66
High math ability, if they do not play chess: 56
Low/mid math ability, if they do not play chess: 45
Impact for players: 66 – 56 = 10
Preexisting differences: 56 – 45 = 11
Observed difference: 66 – 45 = 21
21 = 10 + 11

32 21 = 10 + 11
Observed differences = Impact + Preexisting differences (selection bias)
The heart of impact evaluation is getting rid of selection bias, by using experiments or by using some non-experimental methods
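The decomposition can be checked numerically with a short sketch. The variable names are illustrative assumptions; the three scores are the ones given in the slides, and the counterfactual of 56 is known here only because the example is constructed, not because it could be observed in practice.

```python
treated_observed = 66        # players' observed average score
treated_counterfactual = 56  # what players would score without chess (unobservable in practice)
untreated_observed = 45      # non-players' observed average score

impact_on_treated = treated_observed - treated_counterfactual
selection_bias = treated_counterfactual - untreated_observed
observed_difference = treated_observed - untreated_observed

# The observed difference splits exactly into impact plus selection bias.
assert observed_difference == impact_on_treated + selection_bias
print(observed_difference, "=", impact_on_treated, "+", selection_bias)
```

The identity holds for any three numbers, because the counterfactual term is added and subtracted; the practical difficulty is that the middle quantity has to be estimated.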

33 Experimental evidence to the rescue
Schools get a free instructor to teach chess to one class, if they agree to select the class at random among the fourth grade classes. Now we have the following situation.

34 Results of the randomized experiment
Pupils in the selected classes: average score of randomized chess players = 60 points
Pupils in the excluded classes: average score of NON chess players = 52 points
Difference = 8 points
Question: what does this difference tell us?

35 The results tell us that teaching chess truly improves math performance (by 8 points, about 15%). Thus we are able to isolate the effect of chess from other factors (but some problems remain).

36 Average Treatment Effect (ATE)
High ability (25% of the population): 66 if they do play chess, 56 if they do NOT play chess
Mid ability (50%): 54 if they do play chess, 48 if they do NOT play chess
Low ability (25%): 40 either way
Averages (100%): 54 if they do play chess, 48 if they do NOT play chess
Impact = 54 – 48 = 6

37 [Diagram: play chess, math ability, math test scores]
Note that the experiment does not solve all the cognitive problems related to policy design: for example, it does not identify impact heterogeneity (for whom it works)

38 The ATE is the average effect if every member of the population is treated. Generally there is more policy interest in the Average Treatment Effect on the Treated (ATT). ATT = 10 in the chess example, while ATE = 6 (we ran an experiment and got an impact of 8. Can you think why this happens?)
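The two averages can be computed directly from the potential outcomes. The sketch below is illustrative: the tuple layout is an assumption, the low-ability score "if they play" is assumed to be 40 (implied by an impact of 0), and "treated" is taken to mean the pupils who play chess on their own.

```python
# (group, population share, score if they play, score if they do not, plays on their own?)
# Assumption: the low-ability "if they play" score is 40, implied by the stated impact of 0.
groups = [
    ("high", 0.25, 66, 56, True),
    ("mid", 0.50, 54, 48, False),
    ("low", 0.25, 40, 40, False),
]

# ATE: average gain if every pupil in the population were treated.
ate = sum(share * (y1 - y0) for _, share, y1, y0, _ in groups)

# ATT: average gain among those who actually play (here, the self-selected players).
treated_share = sum(share for _, share, _, _, plays in groups if plays)
att = sum(share * (y1 - y0) for _, share, y1, y0, plays in groups if plays) / treated_share

print(f"ATE = {ate}, ATT = {att}")
```

The unrounded ATE is 5.5; the slides report 6 because the average score with chess, 53.5, is rounded to 54 before 48 is subtracted.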

39 Schools that volunteered vs. schools that DID NOT volunteer
High ability: 50%, true impact = 10
Mid ability: 50%, true impact = 6
Low ability: 50%, true impact = 0 (does not enter the experimental or control mean)
EXPERIMENTAL mean of 66 and 54 = 60
CONTROL mean of 56 and 48 = 52
Impact = 60 – 52 = 8
Internal validity, little external validity
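Why the experiment returns 8 rather than the ATE of 6 or the ATT of 10 can be sketched as follows. The equal split between high- and mid-ability pupils in the volunteering schools is an assumption read off the slide's "mean of 66 and 54"; the dictionary names are illustrative.

```python
# Potential outcomes per ability group: (score if taught chess, score if not).
potential = {"high": (66, 56), "mid": (54, 48)}

# Assumed pupil mix in the volunteering schools, implied by "mean of 66 and 54".
volunteer_mix = {"high": 0.5, "mid": 0.5}

# Randomization within these schools makes both arms share the same ability mix.
treated_mean = sum(share * potential[g][0] for g, share in volunteer_mix.items())
control_mean = sum(share * potential[g][1] for g, share in volunteer_mix.items())
experimental_impact = treated_mean - control_mean

print(treated_mean, control_mean, experimental_impact)
```

The estimate is the average of the true impacts 10 and 6, so it is correct for the pupils in the volunteering schools (internal validity) but does not carry over to a population with a different ability mix (little external validity).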

40 Lessons learned
Impacts are differences, but not all differences are impacts
Differences (and changes) have many causes, but we do not need to understand all the causes
We are especially interested in one cause, the policy, and we would like to eliminate all the confounding causes of the difference (or change)
Internal vs. external validity

41 An example of a real ERDF policy: grants to small enterprises to invest in R&D

42 To design an impact evaluation, one needs to answer three important questions:
1. Impact of what?
2. Impact for whom?
3. Impact on what?

43 R&D EXPENDITURES AMONG THE FIRMS RECEIVING GRANTS
PRE: average = 65.000, N = 2400
POST: average = 75.000, N = 2400
OBSERVED CHANGE = 10.000
Is 10.000 the true average impact of the grant?


46 The fundamental challenge to this assumption is the well-known fact that things change over time by natural dynamics. How do we disentangle the change due to the policy from the myriad changes that would have occurred anyway?





51 We cannot use experiments with firms, for obvious (?) political reasons. The good news is that there are lots of non-experimental counterfactual methods.

52 The difference-in-differences (DID) is a combination of the first two strategies, and it is a good way to understand the logic of (non-experimental) counterfactual evaluation




61 Impact = POST DIFFERENCE – PRE DIFFERENCE = 15.000 – 10.000 = 5.000
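The difference-in-differences calculation can be sketched as a small function. The treated means (65,000 and 75,000) come from the grants table earlier in the deck; the control means (55,000 and 60,000) are assumptions back-calculated from the stated PRE and POST differences, since the slides that showed them are not in the transcript. The function name `diff_in_diff` is illustrative.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """POST difference minus PRE difference between treated and controls."""
    pre_difference = treated_pre - control_pre
    post_difference = treated_post - control_post
    return post_difference - pre_difference

# Treated means are from the slides; control means are assumed (see lead-in).
impact = diff_in_diff(65_000, 75_000, 55_000, 60_000)

# Equivalent reading: change among the treated minus change among the controls.
change_treated = 75_000 - 65_000
change_controls = 60_000 - 55_000

print(impact, change_treated - change_controls)
```

Both readings give the same number, which is why DID can be seen either as a with-without comparison corrected for preexisting differences or as a pre-post comparison corrected for natural dynamics.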


63 CAN WE TEST THE PARALLELISM ASSUMPTION? With four observed means, we cannot. The parallelism becomes testable if we have two additional data points pre-intervention (PRE-PRE).


70 WHEN TO USE DIFF-IN-DIFF? When we have longitudinal data and have reasons to believe that most of what drives selection is individual unobserved characteristics

71 Second, the path taken by the controls must be a plausible approximation of what would have happened to the treated. The following is an example in which it would be better NOT to use DID.


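73 Diff-in-diff-in-diff
58.000 to 65.000: change = 7.000
57.000 to 55.000: change = -2.000
Difference in changes = 9.000
65.000 to 75.000: change = 10.000
55.000 to 67.000: change = 12.000
Difference in changes = -2.000
Diff-in-diff-in-diff = -2.000 – 9.000 = -11.000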
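The triple difference can be reproduced with a short sketch. The row labels are assumptions, since the transcript lost them: the rows starting at 58,000 and 65,000 are read as the treated firms, the rows starting at 57,000 and 55,000 as the controls, and the first pair of rows as the move from PRE-PRE to PRE. The function name `did` is illustrative.

```python
def did(treated_start, treated_end, control_start, control_end):
    """Change among the treated minus change among the controls."""
    return (treated_end - treated_start) - (control_end - control_start)

# Assumed labelling (see lead-in): treated vs. controls, PRE-PRE -> PRE, then PRE -> POST.
pre_period_did = did(58_000, 65_000, 57_000, 55_000)   # before the intervention
post_period_did = did(65_000, 75_000, 55_000, 67_000)  # across the intervention

# Subtracting the pre-intervention gap in trends from the ordinary DID.
triple_difference = post_period_did - pre_period_did

print(pre_period_did, post_period_did, triple_difference)
```

Under this reading, a pre-period DID far from zero signals that the two groups were already on different paths before the intervention, which is the situation in which plain DID would mislead.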

74 Linearly projected impact
58.000 to 65.000: change = 7.000
Projected: 65.000 + 7.000 = 72.000
Observed: 75.000
Linearly projected impact = 75.000 – 72.000 = 3.000
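The linear projection can be sketched as follows. The function name is illustrative, and the sketch assumes the three observations are equally spaced in time, so that the earlier change can simply be carried forward; the slides do not state the spacing.

```python
def linearly_projected_impact(pre_pre, pre, post):
    """Observed POST value minus the value projected from the pre-intervention trend.

    Assumes equally spaced periods, so the PRE-PRE -> PRE change is carried forward once.
    """
    trend = pre - pre_pre
    projected_post = pre + trend
    return post - projected_post

# Means for the firms receiving grants, as listed on the slide.
impact = linearly_projected_impact(58_000, 65_000, 75_000)
print(impact)
```

This uses only the treated firms' own history as the counterfactual, so it stands or falls on the assumption that their earlier trend would have continued unchanged.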
