
1 Practical Sampling for Impact Evaluations. Aidan Coville* (*presentation draws from slides presented by Laura Chioda). Development Impact Evaluation Initiative, Innovations in Investment Climate Reforms, Paris, Nov 14, 2012, in collaboration with the Investment Climate Global Practice.

2 Sampling Variation. [Chart: two samples of heights (cm) drawn from the same population, with different averages of 149 cm and 157 cm.] Which sample is correct? Answer: neither... and both (a small simulation of this follows below).
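
A minimal simulation of this point (the population parameters are made up purely for illustration): two random samples from the same height distribution typically produce different averages, and neither is wrong.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of adult heights (cm); mean and sd are assumptions.
population = rng.normal(loc=153, scale=8, size=100_000)

# Two independent random samples of 25 people each.
sample_1 = rng.choice(population, size=25, replace=False)
sample_2 = rng.choice(population, size=25, replace=False)

print(f"Sample 1 mean:   {sample_1.mean():.1f} cm")
print(f"Sample 2 mean:   {sample_2.mean():.1f} cm")
print(f"Population mean: {population.mean():.1f} cm")
# The two sample means differ from each other and from the population mean
# purely because of sampling variation -- neither sample is "wrong".
```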

3 So what do we do now?
- We want accurate measures
- But have to deal with sampling error
- If we take a census we'll get exact measures
- So let's take a census…
- Need to think about the marginal value of added observations (see the sketch below)
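
One way to see the marginal value of added observations: the standard error of a sample mean shrinks only with the square root of the sample size, so each extra observation buys less precision than the last. A small sketch with an assumed standard deviation:

```python
import numpy as np

sigma = 8.0  # assumed standard deviation of heights (cm), illustrative only

# Standard error of the mean for increasing sample sizes: se = sigma / sqrt(n)
for n in [10, 25, 100, 400, 1600]:
    se = sigma / np.sqrt(n)
    print(f"n = {n:5d}  ->  standard error = {se:.2f} cm")

# Quadrupling the sample size only halves the standard error, so precision
# gains per extra observation shrink -- a full census is rarely worth the cost.
```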

4 The answer is… = 42

5 The End. Questions?

6 Calculating sample size
- Think of the sample size as the accuracy of a measuring device (assuming representative random samples are drawn)
- Example: guess the sentence below
- Here, the number of revealed letters is analogous to the number of observations
- where each letter, say, costs US$100,000
- You have US$2M with which to uncover up to 20 letters (all of them)
- If you guess wrong, you lose all of your investment
More observations → more precision → more confidence

7 Calculating sample size
- Let's increase the number of "observations" (in this case letters)
- This is so much easier

8 Outline (a search for "n")
1. Ingredients to determine sample size
   - Detectable effect size
   - Confidence/probabilities of avoiding mistakes in inference (type I & type II errors)
   - Variance of outcome(s)
   - Clustering level
2. Enhancements
   - Multiple treatments
   - Group-disaggregated results
3. Detractions
   - Take-up
   - Data quality
4. So what can we do (a guide to maximizing power)

9 Let's run through an example
- Intervention: risk-based inspection
- Q: what is the impact of a new risk-based inspections procedure on restaurant compliance with health safety standards?
- Methods: randomize the implementation of the regime at the town level
- Sample size: …?

10 Outline
1. Ingredients to determine sample size
   - Detectable effect size
   - Confidence/probabilities of avoiding mistakes in inference (type I & type II errors)
   - Variance of outcome(s)
   - Clustering level
2. Enhancements
   - Multiple treatments
   - Group-disaggregated results
3. Detractions
   - Take-up
   - Data quality
4. So what can we do (a guide to maximizing power)

11 Ingredients: Detectable effect size, Confidence, Variance of outcomes, Clustering

12 Detectable Effect Size (1/3)
- We do not know in advance the effect of our policy. We want to design a precise way of measuring it
- But precision is not cheap: need cost-benefit analysis to decide
- 1st ingredient: the smallest program effect size that you wish to detect
- i.e. the smallest effect for which we would be able to conclude that it is statistically different from zero
- "detect" is used in a statistical sense

13 Detectable Effect Size (2/3)
- Cost-benefit analysis guides us in determining the "smallest detectable effect":
- "What is the smallest effect of the program, below which we would consider it a failure?"
- That could be useful for policy
- and could justify the cost of impact evaluations, etc.
- The smaller the (EXPECTED) differences between treatment & control…
- …the more precise the instrument has to be to detect them
- The larger the sample needs to be

14 Detectable Effect Size (3/3)
The larger the sample → the more precise the measuring device → the easier it is to detect smaller effects
Increasing sample size ≈ increasing precision (of our measuring device)
Who is taller? (A sketch of the sample-size formula behind this follows below.)
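
The link between detectable effect size and sample size can be made concrete with the standard two-sample approximation n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per arm. This formula is not taken from the presentation; it is the conventional approximation, and the numbers below are illustrative.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(mde, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sample comparison of means:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / mde^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * z ** 2 * sigma ** 2 / mde ** 2))

# Illustrative: outcome with standard deviation 0.5; vary the minimum detectable effect.
for mde in [0.20, 0.10, 0.05]:
    print(f"MDE = {mde:.2f}  ->  n per arm ≈ {n_per_arm(mde, sigma=0.5)}")
# Halving the detectable effect roughly quadruples the required sample size.
```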

15 Ingredients: Detectable effect size, Confidence, Variance of outcomes, Clustering

16 Confidence: Type I Error (1/2)
- The possibility of getting a false positive
- Observing a difference between 2 groups when a TRUE difference does not exist
- Because of budget concerns, treatment and control have 25 obs. each
- By pure chance, treatment businesses are more diligent
- Result: Compliance(Treatment) is (statistically) larger than Compliance(Control), even though the program made no true difference

17 Confidence: Type II Error (2/2)
- Failing to detect an effect when a TRUE effect really does exist
- Compliance(Treatment) very similar (≈) to Compliance(Control)
- Then we could conclude that our program has "no" effect (i.e. that treatment and control outcomes are not statistically different) for 2 reasons:
  1. Because our instrument is not precise (bad inference)
  2. Because the program indeed had no effect (good inference)
- Unless we have "enough" observations, we would not be able to decide with confidence between possibilities 1 and 2 (a simulation of both error types follows below)
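
A sketch simulating both error types with 25 observations per arm, as in the budget-constrained example above. The outcome distribution and the size of the "true" effect are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n, reps = 25, 5_000

false_positives = 0  # world with no true effect
missed_effects = 0   # world with a modest true effect (assumed +0.10 on an sd-0.5 outcome)

for _ in range(reps):
    control = rng.normal(0.50, 0.5, n)
    treat_null = rng.normal(0.50, 0.5, n)   # program has no effect
    treat_real = rng.normal(0.60, 0.5, n)   # program raises the outcome by 0.10

    if ttest_ind(treat_null, control).pvalue < 0.05:
        false_positives += 1                # type I error: effect "found" by chance
    if ttest_ind(treat_real, control).pvalue >= 0.05:
        missed_effects += 1                 # type II error: real effect not detected

print(f"Type I error rate  ≈ {false_positives / reps:.2f}  (hovers around 0.05 by construction)")
print(f"Type II error rate ≈ {missed_effects / reps:.2f}  (power ≈ {1 - missed_effects / reps:.0%} with n=25)")
```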

18 Ingredients: Detectable effect size, Confidence, Variance of outcomes, Clustering

19 Variance of Outcomes (1/4)
- How does the variance of the outcome affect our ability to detect an impact?
- Example: of the two (circled) populations, which animals are bigger? How many observations from each circle would you need to decide?

20 Variance of Outcomes (2/4)
- Example: on average, which group has the larger animals?
- Comparison is more complicated in this case, so you need more information (i.e. a larger sample)
- The answer may depend on which members of the blue & red groups you observe

21 Variance of Outcomes (3/4)
- A more economic example: let's look at our businesses and compliance rates
- Imagine that the risk-based inspections lead to an increase in compliance (impact) from 50% to 60% on average
- Case A: businesses are all very similar and the distribution of compliance rates is very concentrated
- Case B: businesses are very different and the distribution of compliance rates is spread out (distributions overlap more)
- Which case requires a more precise measuring device?

22 Variance of Outcomes (4/4)
- In sum: more underlying variance (heterogeneity) → more difficult to detect differences → need a larger sample size (see the sketch below)
- Tricky: how do we know about heterogeneity before we decide our sample size and collect our data?
- Ideal: pre-existing data… but it is often non-existent
- Can use pre-existing data from a similar population (example: enterprise surveys, labor force surveys)
- Common sense
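
Using the deck's running compliance example (an increase from 50% to 60%), a sketch of how the spread of compliance rates drives the required sample size. The Case A and Case B standard deviations are assumptions, not values from the slides.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(mde, sigma, alpha=0.05, power=0.80):
    # n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / mde^2
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * z ** 2 * sigma ** 2 / mde ** 2))

mde = 0.10  # compliance rises from 50% to 60% on average

sigma_case_a = 0.15  # Case A: similar businesses, concentrated compliance rates (assumed)
sigma_case_b = 0.40  # Case B: heterogeneous businesses, spread-out compliance rates (assumed)

print("Case A (low variance):  n per arm ≈", n_per_arm(mde, sigma_case_a))
print("Case B (high variance): n per arm ≈", n_per_arm(mde, sigma_case_b))
# More heterogeneity -> a much larger sample is needed to detect the same 10-point impact.
```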

23 Ingredients: Detectable effect size, Confidence, Variance of outcomes, Clustering

24 Clustering (1/4)
- Is random sampling done at the…
  - Business level?
  - Business group level?
  - Village/port/…?
  - Province?
- Depends on:
  - Question being asked / intervention type
  - Sampling frame availability
  - Cost/feasibility
  - Potential spillovers

25 Clustering (2/4). What is the added value of more samples in the same cluster? [Diagram: sampled units grouped into Villages 1–4.]

26 Clustering (3/4). [Diagram: Villages 1–4.]

27 Clustering (4/4)
Takeaway: larger within-cluster correlation (units in the same cluster are similar) → lower marginal value per extra sampled unit in the cluster → higher sample size/more clusters needed than for a simple random sample (see the design-effect sketch below).
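
The takeaway is usually quantified with the design effect, DEFF = 1 + (m − 1)·ICC, where m is the number of units sampled per cluster and ICC is the within-cluster correlation. A sketch with assumed values:

```python
def design_effect(icc, units_per_cluster):
    """DEFF = 1 + (m - 1) * ICC: how much a clustered sample must be inflated
    relative to a simple random sample to reach the same precision."""
    return 1 + (units_per_cluster - 1) * icc

n_srs = 400  # sample size a simple random sample would need (illustrative)

for icc in [0.01, 0.05, 0.20]:
    deff = design_effect(icc, units_per_cluster=20)
    print(f"ICC = {icc:.2f}  ->  DEFF = {deff:.2f}  ->  clustered sample needs ≈ {int(n_srs * deff)} units")
# Higher within-cluster correlation means each extra unit in the same cluster adds
# less new information, so the total sample (or the number of clusters) must grow.
```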

28 Outline
1. Ingredients to determine sample size
   - Detectable effect size
   - Confidence/probabilities of avoiding mistakes in inference (type I & type II errors)
   - Variance of outcome(s)
   - Clustering level
2. Enhancements
   - Multiple treatments
   - Group-disaggregated results
3. Detractions
   - Take-up
   - Data quality
4. So what can we do (a guide to maximizing power)

29 Multiple Treatments (1/2)
- Risk-based inspections may increase compliance. But what if restaurants aren't able to upgrade their processes to comply because of lack of access to credit?
- Treatment 1: risk-based inspections
- Treatment 2: matching grant to upgrade safety processes
- Treatment 3: inspections and grant
- Intuition: the more comparisons (treatments), the larger the sample size needed to be "confident"

30 Multiple Treatments (2/2)
- Comparing treatment groups to each other requires very large samples
- The more comparisons you make, the more observations you need
- Especially if the various treatments are very similar, differences between the treatment groups can be expected to be smaller (see the sketch below)
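
One concrete reason multiple treatments inflate sample size: if the significance level is adjusted for the extra comparisons (a Bonferroni correction is used here purely as an illustration, not necessarily the presenters' method) and the expected differences between similar treatment arms are smaller than the treatment-vs-control difference, the required n per arm grows quickly. All numbers below are assumptions.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(mde, sigma, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * z ** 2 * sigma ** 2 / mde ** 2))

sigma = 0.4  # assumed outcome standard deviation

# Single treatment vs control: a 10-point effect, one comparison at alpha = 0.05.
print("T vs C, alpha = 0.05:        n per arm ≈", n_per_arm(0.10, sigma, alpha=0.05))

# Three treatment arms plus control: 6 pairwise comparisons, Bonferroni-adjusted
# alpha = 0.05 / 6, and a smaller expected gap between similar treatments (5 points).
print("T1 vs T2, alpha = 0.05/6:    n per arm ≈", n_per_arm(0.05, sigma, alpha=0.05 / 6))
```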

31 Strata (1/5)

32 Strata (2/5)
- Group-disaggregated results: are effects different for men and women? For different sectors?
- If genders/sectors are expected to react in a similar way, then estimating differences in treatment impact also requires very large samples
- Group-disaggregated results: to ensure balance across treatment and comparison groups, it is good to divide the sample into strata before assigning treatment
- Strata = sub-populations
- Common strata: geography, gender, sector, baseline values of the outcome variable
- Treatment assignment (or sampling) occurs within these groups (i.e. randomize within strata)

33 Strata (3/5)
- Example: what is the impact in a particular region?
- [Map: businesses assigned randomly to treatment and control, shown across regions A, B and C]
- Can you assess with confidence the impact on compliance within regions?

34 Strata (4/5)
- To answer, consider a few regions:
- Region A: we have almost no businesses in the control group
- Region B: very few observations; can you be confident?
- Region C: no observations at all

35 Strata (5/5)
- How to prevent these imbalances and restore confidence in estimates within strata?
- Sampling within region can overcome this issue
- Random assignment to treatment within geographical units
- Within each unit, ½ will be treatment, ½ will be control (a stratified-assignment sketch follows below)
- Similar logic for gender, industry, firm size, etc.
- Which strata? Your research & policy question should guide you
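
A minimal sketch of randomizing within strata, here with region as the stratum. The frame of 300 businesses and the region shares are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical frame: 300 businesses spread over three regions (the strata).
regions = rng.choice(["A", "B", "C"], size=300, p=[0.5, 0.3, 0.2])
treatment = np.zeros(300, dtype=int)

# Randomize within each stratum: half treatment, half control.
for region in np.unique(regions):
    idx = np.flatnonzero(regions == region)
    rng.shuffle(idx)
    treatment[idx[: len(idx) // 2]] = 1

for region in np.unique(regions):
    mask = regions == region
    print(f"Region {region}: {treatment[mask].sum()} treated of {mask.sum()}")
# Balance holds within every region by construction, so region-level (stratum)
# impacts can be estimated without the lopsided cells seen in the map example.
```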

36 Outline
1. Ingredients to determine sample size
   - Detectable effect size
   - Confidence/probabilities of avoiding mistakes in inference (type I & type II errors)
   - Variance of outcome(s)
   - Clustering level
2. Enhancements
   - Multiple treatments
   - Group-disaggregated results
3. Detractions
   - Take-up
   - Data quality
4. So what can we do (a guide to maximizing power)

37 1. Take Up
- Example: there is no discretionary participation in inspections, BUT we can only offer the matching grant; we cannot force businesses to use it
- Offer the grant to 500 businesses; only 50 participate
- In practice, because of the low take-up rate, we end up with a less precise measuring device
- We won't be able to detect differences with precision; we can only find an effect if it is really large
- Take-up: low take-up lowers the precision of our comparisons and effectively decreases the sample size (see the sketch below)
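
The cost of low take-up can be sketched as follows: with take-up rate c and no crossover in the control group, the measurable (intention-to-treat) effect shrinks to c × δ, so the required sample grows roughly with 1/c². The effect size and outcome standard deviation below are assumptions; the 10% take-up mirrors the 50-of-500 example above.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(mde, sigma, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * z ** 2 * sigma ** 2 / mde ** 2))

effect_on_participants = 0.10   # assumed impact of the grant on businesses that use it
sigma = 0.4                     # assumed outcome standard deviation

for take_up in [1.0, 0.5, 0.1]:  # 0.1 ≈ 50 users out of 500 offers
    diluted = take_up * effect_on_participants   # intention-to-treat effect
    print(f"take-up = {take_up:>4.0%}  ->  ITT effect = {diluted:.3f}  "
          f"->  n per arm ≈ {n_per_arm(diluted, sigma)}")
# At 10% take-up the detectable (ITT) effect is ten times smaller, so the required
# sample is roughly 100 times larger than under full take-up.
```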

38 1. Take Up. [Chart: matching grant application vs. completion rates in Mozambique]

39 2. Data Quality
- Poor data quality effectively increases the required sample size
  - Missing observations
  - High measurement error
- Can be partly addressed with a field coordinator on the ground monitoring data collection

40 Overview
- Who to interview is ultimately determined by our research/policy questions
- How many: each element below implies that the sample size will have to be larger
  - The smaller the effects we want to detect
  - The more (statistical) confidence/precision we want
  - The more underlying heterogeneity (variance)
  - The more clustering in samples
  - The more complicated the design (multiple treatments, strata)
  - The lower the take-up
  - The lower the data quality

41 Mo precision mo money

42 Need to be realistic

43 How can we boost power?
- Focus on a homogeneous group
- High-frequency data on core indicators
- Increase take-up!!!!
- Better quality data (it's worth it…)
- Avoid clustering where possible
- Factorial designs (see the sketch below):

                   Inspections          No inspections
  Matching grant   500 (interaction)    500
  No grant         500                  500 (control)
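
Why the factorial design helps: when estimating the main effect of inspections, the matching-grant and no-grant cells can be pooled, so the comparison uses 1,000 vs 1,000 businesses instead of 500 vs 500 (provided the interaction between the two treatments is modest). A rough sketch of the gain in minimum detectable effect, with an assumed outcome standard deviation:

```python
import numpy as np
from scipy.stats import norm

def mde(n_per_arm, sigma, alpha=0.05, power=0.80):
    """Smallest detectable difference in means given n per arm:
    mde = (z_{1-alpha/2} + z_{1-beta}) * sigma * sqrt(2 / n)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * sigma * np.sqrt(2 / n_per_arm)

sigma = 0.4  # assumed standard deviation of the compliance outcome

# Separate two-arm comparison: 500 inspected vs 500 control businesses.
print(f"500 vs 500:    MDE ≈ {mde(500, sigma):.3f}")

# Factorial main effect: pool across the grant dimension -> 1000 vs 1000.
print(f"1000 vs 1000:  MDE ≈ {mde(1000, sigma):.3f}")
# Pooling shrinks the detectable effect by roughly 1/sqrt(2), which is why a 2x2
# factorial buys power for both main effects, as long as the interaction is modest.
```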

