
1
The counterfactual logic for public policy evaluation
Alberto Martini
“Hard at first, natural later”

2
Everybody likes “impacts”: politicians, funders, managing authorities, Eurocrats. Impact is the most used and misused term in evaluation.

3
Impacts differ in a fundamental way from outputs and results. Outputs and results are observable quantities.

4
Can we observe an impact? No, we can’t. This is a major point of departure between this and other paradigms.

5
Just as output indicators measure outputs and result indicators measure results, so supposedly impact indicators measure impacts. Sorry, they don’t.

6
Almost everything about programmes can be observed (at least in principle):
outputs (beneficiaries served, activities done, training courses offered, km of roads built, sewage systems cleaned)
outcomes/results (income levels, inequality, well-being of the population, pollution, congestion, inflation, unemployment, birth rate)

7
Unlike outputs and results, to measure impacts one needs to deal with unobservables

8
To measure impacts, it is not enough to “count” something, compare results with targets, or check progress from a baseline. It is necessary to deal with causality.

9
“Causality is in the mind” (J.J. Heckman, Nobel Prize in Economics, 2000)

10
How would you define impact/effect? “The difference between the situation observed after an intervention has been implemented and the situation that would have occurred without the intervention.”

11
There is just one tiny problem with this definition: the situation that would have occurred without the intervention cannot be observed.

12
The social science community has developed the notion of potential outcomes: “given a treatment, the potential outcomes are what we would observe for the same individual for different values of the treatment”.
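In standard potential-outcomes notation (symbols added here for clarity; the slides themselves use no formulas), the idea reads:

```latex
% For pupil i with binary treatment D_i (1 = treated, 0 = not treated):
\begin{align*}
  \tau_i &= Y_i(1) - Y_i(0)
      && \text{individual impact, never directly observable} \\
  Y_i^{\mathrm{obs}} &= D_i\, Y_i(1) + (1 - D_i)\, Y_i(0)
      && \text{only one potential outcome is ever observed}
\end{align*}
```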

13
Hollywood’s version of potential outcomes

14
A priori there are only potential outcomes of the intervention; later, one becomes the observed outcome, while the other becomes the counterfactual outcome.

15
A very intuitive example of the role of counterfactual analysis in producing credible evidence for policy decisions

16
Does learning and playing chess have a positive impact on achievement in mathematics?

17
Policy-relevant question: should we make chess part of the regular curriculum in elementary schools, to improve mathematics achievement? Or would it be a waste of time? What kind of evidence do we need to make this decision in an informed way?

18
Let us assume we have a crystal ball and we know the “truth”: for all pupils we know both potential outcomes, the math score they would obtain if they practiced chess and the score they would obtain if they did not.

19
General rule: what we observe can be very different from what is true.

20
Types of pupils and what happens to them:
High ability (1/3): practice chess at home and do not gain much if taught in school
Mid ability (1/3): practice chess only if taught in school; otherwise they do not learn chess
Low ability (1/3): unable to play chess effectively, even if taught in school

21
Potential outcomes (math test scores):
                If they play chess at school   If they do NOT   Difference
High ability:   70                             70               0
Mid ability:    50                             40               10
Low ability:    20                             20               0
Try to memorize these numbers: 70, 50, 40, 20, 10

22
SO WE KNOW THAT:
1. there is a true impact, but it is small
2. the only ones to benefit are mid-ability students; for them the impact is 10 points
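How small? Since each ability group is one third of the pupils, the population-average impact works out to (a figure derived from the table above, not stated on the slides):

```latex
\bar{\tau} = \tfrac{1}{3}\cdot 0 + \tfrac{1}{3}\cdot 10 + \tfrac{1}{3}\cdot 0 \approx 3.3 \ \text{points}
```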

23
The naive evidence: observe the differences between chess players and non-players and infer something about the impact of chess. “The difference between players and non-players measures the effect of playing chess.” DO YOU AGREE?

24
The usefulness of the potential-outcome way of reasoning is that it makes clear what we do and do not observe, what we can and cannot learn from the data, and how mistakes are made.

25
What we observe (math test scores):
High ability (play chess at home): 70
Mid ability (do not play): 40
Low ability (do not play): 20
Average score of non-players = 30
DO YOU SEE THE POINT?

26
Results of the direct comparison:
Pupils who play chess: average score = 70 points
Pupils who do not play chess: average score = 30 points
Difference = 40 points. Is this the impact of playing chess?
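A minimal sketch of why this comparison misleads, using the crystal-ball numbers above (illustrative Python; the variable names are mine, not the author’s):

```python
# Crystal-ball potential outcomes from the slides:
# (score if playing chess, score if not playing).
potential = {
    "high": (70, 70),  # practice at home anyway, so no gain
    "mid":  (50, 40),  # gain 10 points only if taught
    "low":  (20, 20),  # cannot play effectively, so no gain
}

# Without any programme, only high-ability pupils play chess
# (self-selection); mid- and low-ability pupils do not.
players = [potential["high"][0]]                          # [70]
non_players = [potential["mid"][1], potential["low"][1]]  # [40, 20]

naive_diff = sum(players) / len(players) - sum(non_players) / len(non_players)
print(naive_diff)  # 40.0 -- the naive "impact" from the slides

# The true individual impacts are 0, 10 and 0 points, so almost all
# of the 40-point gap is selection bias, not an effect of chess.
```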

27
Can we attribute the difference of 40 points to playing chess alone? OBVIOUSLY NOT: there are many more factors at play that influence math scores.

28
[Diagram: math ability influences both playing chess (the selection process) and math test scores (a direct influence).]
Ignoring math ability could severely mislead us, if we intend to interpret the difference in test scores as a causal effect of chess.

29
First (obvious) lesson we learn: most observed differences tell us nothing about causality. In general, we should be careful about making causal claims based on the data we observe.

30
However, comparing math test scores for kids who have learned chess by themselves and kids who have not is pretty silly, isn’t it?

31
Almost as silly as: comparing participants of training courses with non-participants and calling the difference in subsequent earnings “the impact of training”, or comparing enterprises applying for subsidies with those not applying and calling the difference in subsequent investment “the impact of the subsidy”.

32
The raw difference between self-selected participants and non-participants is a silly way to apply the counterfactual approach. The problem is selection bias (pre-existing differences).

33
Now we decide to teach pupils how to play chess in school. Schools can participate or not.

34
We compare pupils in schools that participated in the programme with pupils in schools that did not, in order to get an estimate of the impact of teaching chess in school.

35
We get the following results:
Pupils in the treated schools: average score = 53 points
Pupils in the non-treated schools: average score = 29 points
Difference = 24 points. Is this the TRUE impact?

36
There is an evident difference in composition between the two types of schools:
                Schools that participated   Schools that did NOT
High ability:   30%                         10%
Mid ability:    60%                         20%
Low ability:    10%                         70%

37
Schools that participated: WEIGHTED average of 70, 50 and 20 (weights 30%, 60%, 10%) = 53
Schools that did NOT: WEIGHTED average of 70, 40 and 20 (weights 10%, 20%, 70%) = 29
Difference = 53 - 29 = 24
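The same arithmetic, spelled out (an illustrative Python sketch reproducing the slide’s weighted averages; the names are mine):

```python
# Scores each ability group obtains with and without chess teaching,
# and the composition of the two groups of schools, from the slides.
score_if_taught = {"high": 70, "mid": 50, "low": 20}
score_if_not    = {"high": 70, "mid": 40, "low": 20}

share_participating = {"high": 0.30, "mid": 0.60, "low": 0.10}
share_not           = {"high": 0.10, "mid": 0.20, "low": 0.70}

treated_avg = sum(share_participating[g] * score_if_taught[g] for g in score_if_taught)
control_avg = sum(share_not[g] * score_if_not[g] for g in score_if_not)

print(round(treated_avg, 1))                # 53.0
print(round(control_avg, 1))                # 29.0
print(round(treated_avg - control_avg, 1))  # 24.0 -- impact mixed with composition
```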

38
The difference of 24 points is a combination of the true impact and of the difference in composition. If we did not know the truth, we might take 24 as the true impact on math scores and, since that looks like a large impact, make the wrong decision.
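The mix can be made explicit (a decomposition derived from the slides’ numbers, not shown on the slides). With the participating schools’ composition, the same pupils would have averaged 0.3·70 + 0.6·40 + 0.1·20 = 47 without the programme, so:

```latex
\underbrace{53 - 29}_{\text{observed difference: } 24}
  \;=\; \underbrace{(53 - 47)}_{\text{true impact on participants: } 6}
  \;+\; \underbrace{(47 - 29)}_{\text{selection bias: } 18}
```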

39
We have two alternatives: statistically adjusting the data or conducting an experiment.

40
Any form of adjustment assumes we have a model in mind: we know that ability influences math scores, and we know how to measure ability.

41
But even if we do not have all this information, we can conduct a randomized experiment: the schools that participate get free instructors to teach chess, provided they agree to exclude one classroom, chosen at random.

42
Results of the randomized experiment:
Pupils in the treated classes in the volunteer schools: average score = 53 points
Pupils in the excluded classes in the volunteer schools: average score = 47 points
Difference = 6 points. This is very close to the TRUE impact.

43
In the volunteer schools (30% high, 60% mid, 10% low ability), random assignment (flip a coin) decides which classes play chess:
EXPERIMENTALS: weighted average of 70, 50 and 20 = 53
CONTROLS: weighted average of 70, 40 and 20 = 47
Impact = 53 - 47 = 6
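A small simulation of the coin-flip logic (illustrative Python; for simplicity it randomizes individual pupils rather than whole classrooms, and all numbers come from the slides):

```python
import random

random.seed(0)  # reproducible illustration

# Potential outcomes (score if taught chess, score if not) and the
# ability mix of the volunteer schools, both taken from the slides.
potential = {"high": (70, 70), "mid": (50, 40), "low": (20, 20)}
shares = {"high": 0.30, "mid": 0.60, "low": 0.10}

# A large cohort of pupils in the volunteer schools.
pupils = [group for group, share in shares.items()
          for _ in range(int(share * 100_000))]

treated, controls = [], []
for group in pupils:
    if random.random() < 0.5:   # coin flip: pupil's class plays chess
        treated.append(potential[group][0])
    else:                       # class excluded at random
        controls.append(potential[group][1])

impact = sum(treated) / len(treated) - sum(controls) / len(controls)
print(round(impact, 1))  # close to 6: randomization removes the selection bias
```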

44
Experiments are not the only way to identify impacts. However, it is very unlikely that an experiment will generate grossly mistaken estimates; if anything, they tend to be biased toward zero. On the other hand, some wrong comparisons can produce wildly mistaken estimates.
