
1 Practical Sampling for Impact Evaluations. Marie-Hélène Cloutier

2 Introduction
- Ideally, we would compare what happens to the same schools with and without the program. That is impossible, so we use statistics:
  - Define treatment and control groups
  - Compare mean outcomes (e.g. test scores)
  - Random assignment ensures comparability, but it does not remove noise
- How big should the groups be, and how should we select them?
- Warning! The goal here is an overview of how sampling choices affect what an impact evaluation can learn, not to make you a sampling expert (or give you a headache).

3 Introduction
- Sampling frame (representativeness / external validity): which populations or groups are we interested in, and where do we find them?
- Sample size: how many people/schools/units should be interviewed or observed from that population? Groups must be large enough to credibly detect a meaningful effect.

4 Sampling Frame
- Census vs. sample? A sample means lower cost, faster data collection (less risk that conditions change mid-collection), and a smaller data set (improved data quality).
- Who are we interested in? Depends on feasibility and what you want to learn:
  a) All schools?
  b) All public schools?
  c) All public primary schools?
  d) All public primary schools in a particular region?
- External validity:
  - Can findings from a sample of population (c) inform appropriate programs to help secondary schools?
  - Can findings from a sample of population (d) inform national policy?

5 Sampling Frame
Finding the units we're interested in:
- Depends on the size and type of experiment
- Required before sampling: a complete listing of all units of observation available for sampling in each area or group

Experiment and its primary sampling unit:
- Piloting new national textbooks: schools or classrooms
- Early literacy program: classrooms for grades 1-3
- Incentives for teachers in rural schools: schools classified as rural

6 Sample Size and Confidence
Example: a simpler question than program impact. Say we want to know the average annual expenses of a school.
- Option 1: interview 5 randomly selected headmasters and take the average of their responses.
- Option 2: interview 1,000 randomly selected headmasters and average their responses.
Which average is likely to be closer to the true average? Why?

7 Sample Size and Confidence
(Same example.) The larger sample's average is almost surely closer to the true average.
Likewise, an impact evaluation needs many observations to say with confidence whether the average treatment outcome is above or below the average control outcome.
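The intuition behind the headmaster example can be checked with a small simulation. The population below is entirely made up (a true mean of $50,000 with a $15,000 standard deviation is an assumption, not a figure from the slides); the point is only to compare how far 5-person and 1,000-person sample averages typically land from the truth.

```python
import random
import statistics

random.seed(1)

# Hypothetical population of school expenses: true mean ~$50,000, SD ~$15,000.
population = [random.gauss(50_000, 15_000) for _ in range(100_000)]
true_mean = statistics.mean(population)

def sample_mean_error(n, reps=1_000):
    """Typical absolute distance between a size-n sample mean and the true mean."""
    errors = []
    for _ in range(reps):
        sample = random.sample(population, n)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

err_5 = sample_mean_error(5)
err_1000 = sample_mean_error(1000)
print(f"typical error, n=5:    {err_5:,.0f}")
print(f"typical error, n=1000: {err_1000:,.0f}")
# The n=1000 error is far smaller: it shrinks roughly as 1/sqrt(n).
```

Running this, the n=1,000 average is reliably much closer to the truth, which is exactly why Option 2 wins.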

8 Calculating Sample Size
There is a formula. The main inputs to be aware of:
1. Detectable effect size
2. Probability of a type 1 error (significance level) and of a type 2 error (1 - power)
3. Variance of the outcome(s)
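The formula the slide alludes to, for comparing two equal-sized groups on a continuous outcome, is the standard one: n per arm = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where δ is the detectable effect size and σ the outcome's standard deviation. A minimal sketch (the δ = 5, σ = 15 inputs are illustrative assumptions):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 5%
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for power = 80%
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

# Detect a 5-point gain in test scores when the score SD is 15:
print(sample_size_per_arm(delta=5, sigma=15))   # 142 per group
```

Note how the three inputs from the slide appear directly: δ (effect size), α and β (the two error probabilities), and σ (outcome variance).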

9 Calculating Sample Size: Detectable Effect Size
- What is an effect size? The extent to which the intervention affects the outcome of interest, e.g. a 10% increase in test scores, a 25% increase in the completion rate.
- A smaller effect is harder to capture (detect).

10 Calculating Sample Size: Detectable Effect Size
Who is taller? Detecting smaller differences is harder.

11 Calculating Sample Size: Detectable Effect Size
Larger samples make it easier to detect smaller effects. E.g., are test scores higher in schools where teachers receive a bonus than in schools where they do not?
- 10 schools with bonus at 68% vs. 10 schools without at 65%: different only with very low confidence
- 10 schools with bonus at 80% vs. 10 schools without at 50%: different with high confidence
- 500 schools with bonus at 68% vs. 500 schools without at 65%: different with high confidence
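The bonus-school table can be reproduced by simulation. Everything here is hypothetical (gaussian scores with SD 15, a simple z-test with known SD); it only illustrates why a 3-point gap needs 500 schools while a 30-point gap does not.

```python
import random
import statistics

random.seed(0)

def detected(n, effect, sd=15.0, z_crit=1.96):
    """One simulated experiment: is the treatment-control gap significant at 5%?"""
    control = [random.gauss(65, sd) for _ in range(n)]
    treated = [random.gauss(65 + effect, sd) for _ in range(n)]
    gap = statistics.mean(treated) - statistics.mean(control)
    se = sd * (2 / n) ** 0.5          # standard error of the gap, SD known
    return abs(gap / se) > z_crit

def power(n, effect, reps=2_000):
    """Share of simulated experiments in which the gap is detected."""
    return sum(detected(n, effect) for _ in range(reps)) / reps

print(power(10, 3))    # 3-point gap, 10 schools per group: rarely detected
print(power(10, 30))   # 30-point gap, 10 schools per group: almost always
print(power(500, 3))   # 3-point gap, 500 schools per group: usually detected
```

The three printed detection rates mirror the three rows of the table: low confidence, high confidence, high confidence.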

12 Calculating Sample Size: Detectable Effect Size
How to determine the detectable effect size?
- The smallest effect that would prompt a policy response
- The smallest cost-effective effect
E.g., constructing toilets for girls:
- Significantly increases girls' access by 10%? Great, let's think about how we can scale this up.
- Significantly increases girls' access by 0.5%? Great... uh... wait: we spent all of that money and access only increased by that much?

13 Calculating Sample Size: Type 1 and Type 2 Errors
Minimize two types of statistical error:
- Type 1 error -> continuing or scaling up a bad program. Minimized after data collection, during analysis.
- Type 2 error -> stopping or failing to scale up a good program. Must be minimized before data collection.

Conclusion from the data analysis, by what is true in reality:
- No real effect, analysis finds an impact: type 1 error
- No real effect, analysis cannot say there is an impact: OK
- Real effect, analysis finds an impact: OK
- Real effect, analysis cannot say there is an impact: type 2 error

14 Calculating Sample Size: Type 1 and Type 2 Errors
- Type 1 (significance): a lower significance level -> a larger sample. Common levels: alpha = 1% or 5%, i.e. a 1% or 5% probability of concluding there is an effect when in reality there is none.
- Type 2 (1 - power): higher power -> a larger sample. Common levels: 1 - beta = 80% or 90%, i.e. a 20% or 10% probability that there is an effect but we cannot detect it.
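Plugging the common levels into the sample-size formula shows how both choices inflate n. The δ = 5, σ = 15 inputs are again illustrative assumptions:

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha, power):
    """Two-sample formula: 2 * (z_{1-alpha/2} + z_power)^2 * (sigma/delta)^2."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / delta) ** 2)

for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        print(f"alpha={alpha}, power={power}: "
              f"{n_per_arm(5, 15, alpha, power)} per group")
# n grows from 142 (alpha=5%, power=80%) to 268 (alpha=1%, power=90%).
```

Tightening either error probability roughly doubles the required sample here, which is why the levels are a deliberate design choice rather than a default.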

15 Calculating Sample Size: Variance in Outcome
Less underlying variance -> easier to detect a difference -> smaller sample.

16 Calculating Sample Size: Variance in Outcome
- How do we know the variance before we decide our sample size and collect our data?
- Ideal pre-existing data is often... non-existent. Possible sources: EMIS, a school census, a national assessment.
- Pre-existing data from a similar population can be used instead.
- This makes the calculation partly guesswork, not an exact science.

17 Further Issues
1. Multiple treatment arms
2. Group-disaggregated results
3. Clustered design
4. Stratification

18 Further Issues: 1. Multiple Treatment Arms
- Comparing each treatment separately to the comparison group is straightforward.
- Comparing multiple treatment groups to each other requires larger samples, especially if the treatments are very similar, because the differences between treatment groups will be smaller. It is like fixing a very small detectable effect size.
- E.g., distinguishing between two scholarship amounts.

19 Further Issues: 2. Group-Disaggregated Results
- Are effects different for men and women? For different grades?
- Estimating differences in treatment impacts (heterogeneous effects) requires larger samples, especially when the groups are expected to react in similar ways.

20 Further Issues: 3. Clustered Design
- Sampling units are clusters rather than individuals.
- Very common in education: the outcome of interest is at the student level, but the sampling/randomization unit is the village, school, or classroom.
Example: impact of teacher training on student test scores
- Primary sampling unit: schools
- Secondary sampling unit: teachers
- Outcome unit: students

21 Further Issues: 3. Clustered Design
Why cluster?
- To minimize or remove contamination. E.g., in the deworming program, the school was chosen as the unit because worms are contagious.
- Basic feasibility/political considerations. E.g., school feeding: you cannot include and exclude different students within the same school.
- It is the only natural choice. E.g., any education intervention that affects an entire classroom (flipcharts, teacher training).

22 Further Issues: 3. Clustered Design
Implications of clustering:
- Outcomes for all the individuals within a unit may be correlated:
  - All villagers are exposed to the same weather
  - All students share a schoolmaster
  - The program affects all students at the same time
  - The members of a village interact with each other
- The sample size needs to be adjusted for this correlation: more correlation between outcomes -> a larger sample.
- You need an adequate number of groups! (This often matters more than the number of individuals per group.) E.g., you CANNOT randomize at the district level with one treated district and one control district.
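The standard way to adjust for within-cluster correlation is the design effect, DEFF = 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation (ICC). A sketch, with the 142-per-arm baseline and ICC values chosen for illustration:

```python
import math

def clustered_sample_size(n_simple, cluster_size, icc):
    """Inflate a simple-random-sample size by the design effect
    DEFF = 1 + (m - 1) * rho."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_simple * deff)

# Suppose simple random sampling needs 142 students per arm,
# and we sample 20 students per school:
for icc in (0.0, 0.05, 0.20):
    n = clustered_sample_size(142, 20, icc)
    print(f"ICC={icc}: {n} students, i.e. about {math.ceil(n / 20)} schools per arm")
```

Even a modest ICC of 0.05 roughly doubles the required number of students, and adding schools (more clusters) buys far more power than adding students within each school, which is the slide's point about needing an adequate number of groups.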

23 Further Issues: 4. Stratifying
What?
- Sub-populations (blocks) defined by the values of control variables. Common strata: geography, gender, sector, etc.
- Treatment assignment (or sampling) occurs within these groups.
Why?
- Ensures treatment and control groups are balanced.
- Reduces the required sample size because it:
  - reduces the variance of the outcome of interest within each stratum (most when the stratification variables are highly correlated with the outcome)
  - reduces the correlation of units within clusters.
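A minimal sketch of stratified assignment, assuming a hypothetical list of schools tagged by region (the school list and region names are invented for illustration):

```python
import random

random.seed(42)

# Hypothetical sampling frame: (school_id, region), 10 schools per region.
schools = [(i, region) for region in ("North", "South", "East", "West")
           for i in range(10)]

def stratified_assignment(units, key):
    """Randomize to treatment/control separately within each stratum,
    so every stratum ends up half treated, half control."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    assignment = {}
    for members in strata.values():
        random.shuffle(members)
        half = len(members) // 2
        for unit in members[:half]:
            assignment[unit] = "T"
        for unit in members[half:]:
            assignment[unit] = "C"
    return assignment

assign = stratified_assignment(schools, key=lambda s: s[1])
for region in ("North", "South", "East", "West"):
    treated = sum(1 for (i, r), g in assign.items() if r == region and g == "T")
    print(region, treated, "of 10 treated")
```

Unlike a single unstratified shuffle, which can by chance put most treated schools in one region, this guarantees a balanced split within every stratum.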

24 Further Issues: 4. Stratifying
Geography example: what is the impact in a particular region? Without stratification it is sometimes hard to say with any confidence.
[Map figure: treatment (T) and control (C) units plotted by location; not reproduced in this transcript.]

25 Further Issues: 4. Stratifying
Why do we need strata?
- Random assignment to treatment within geographical units.
- Within each unit, half the schools will be treatment, half will be control.
- The same logic applies to gender, type of school, school size, etc.

26 Summing Up
- Your sample size determines how much you can learn from your impact evaluation.
- The calculations involve some judgment and guesswork, but they are worth spending time on.
- If the sample size is too low, the evaluation wastes time and money: you will not be able to detect a non-zero impact with any confidence.
Questions?

27 Example/Exercise: Sampling Efficiency
- We generated data for a population of 100,000, computed its mean and variance, then selected random samples of different sizes, computed each sample average, and checked how close it came to the true population value.

Sample               | Mean  | Std. dev. | 95% confidence interval
Population, 100,000  | 61    | 14.91     | -
Sample, 3,000        | 61.39 | 15        | [60.84, 61.94]
Sample, 1,000        | 60.86 | 15.07     | [59.91, 61.80]
Sample, 300          | 61.77 | 14.59     | [60.09, 63.45]
Sample, 30           | 66.73 | 14.75     | [61.35, 72.11]
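This exercise is easy to replicate. The population below is freshly generated (mean 61, SD 15 to resemble the table), so the exact numbers will differ from the slide, but the pattern of widening confidence intervals is the same:

```python
import random
import statistics

random.seed(7)

# Hypothetical population resembling the slide's: mean ~61, SD ~15.
population = [random.gauss(61, 15) for _ in range(100_000)]

def summarize(n):
    """Draw one sample of size n; return mean, SD, and a 95% CI for the mean."""
    sample = random.sample(population, n)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    half = 1.96 * sd / n ** 0.5          # 95% CI half-width
    return mean, sd, (mean - half, mean + half)

for n in (3000, 1000, 300, 30):
    mean, sd, (lo, hi) = summarize(n)
    print(f"n={n:>5}: mean={mean:6.2f}  sd={sd:5.2f}  95% CI [{lo:6.2f}, {hi:6.2f}]")
# The CI widens roughly as 1/sqrt(n); with n=30 it spans several points,
# just as in the slide's table.
```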

28 Example/Exercise: Sample Size
Country X wishes to improve students' math performance in grade 2. To do so, the Ministry of Education decides to distribute new math textbooks that students can take home. One year earlier, a national math test showed an average score of 40% with a standard deviation of 19. National statistics indicate that 15% of students repeat grade 2. Distributing the textbooks costs on average $125 (cost of the book and distribution). Since the Minister is unsure of the program's impact, he would like you to evaluate it.
- List the different items you need in order to determine your sample size.
- Fix a value for each of those items.
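One possible answer sketch, not the official solution: the exercise gives σ = 19, but the significance level, power, and minimum detectable effect must be chosen by the evaluator. The values below are assumptions picked for illustration.

```python
import math
from statistics import NormalDist

# Given in the exercise: baseline mean 40%, SD 19.
sigma = 19.0

# Items we must fix ourselves (assumed values, not given in the exercise):
alpha, power = 0.05, 0.80   # conventional significance and power
mde = 5.0                   # say a 5-point gain would justify the $125 cost

z = NormalDist().inv_cdf
n = math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / mde) ** 2)
print(f"~{n} students per group under simple random sampling")
# If randomization is by school (the natural cluster here), this n must
# additionally be inflated by the design effect 1 + (m - 1) * ICC.
```

The items to list, then: the outcome variance (given), the significance level, the power, the minimum detectable effect, and, if assignment is clustered, the cluster size and intracluster correlation.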

