PRACTICAL SAMPLING FOR IMPACT EVALUATIONS
Marie-Hélène Cloutier

INTRODUCTION

Ideally, we would compare what happens to the same schools with and without the program. That is impossible, so we use statistics:
- Define treatment and control groups
- Compare the mean outcome (e.g., test scores) in each group
- Random assignment ensures comparability, but it does not remove noise

How big should the groups be, and how should we select them?

Warning! The goal is to give an overview of how sampling features affect what can be learned from an impact evaluation, not to make you a sampling expert or give you a headache.

INTRODUCTION

Sampling frame (representativeness / external validity): which populations or groups are we interested in, and where do we find them?

Sample size (groups large enough to credibly detect a meaningful effect): how many people/schools/units should be interviewed or observed from that population?

SAMPLING FRAME

Census vs. sample?
- A sample means lower cost, faster data collection (avoids capturing dynamics during a long collection period), and a smaller data set (improved data quality)

Who are we interested in? Depends on feasibility and what you want to learn:
a) All schools?
b) All public schools?
c) All public primary schools?
d) All public primary schools in a particular region?

External validity
- Can findings from a sample of population (c) inform appropriate programs to help secondary schools?
- Can findings from a sample of population (d) inform national policy?

SAMPLING FRAME

Finding the units we are interested in:
- Depends on the size and type of experiment
- Information required before sampling: a complete listing of all units of observation available for sampling in each area or group

Experiment and its primary sampling unit:
- Piloting new national textbooks: schools or classrooms
- Early literacy program: classrooms for the relevant grades
- Incentives for teachers in rural schools: schools classified as rural

SAMPLE SIZE AND CONFIDENCE

Example: a simpler question than program impact. Say we want to know the average annual expenses of a school.
- Option 1: interview 5 randomly selected headmasters and take the average of their responses.
- Option 2: interview 1,000 randomly selected headmasters and average their responses.

Which average is likely to be closer to the true average? Why?

SAMPLE SIZE AND CONFIDENCE

The average over 1,000 headmasters is likely to be closer to the truth: larger samples have smaller sampling error. The same logic applies to an impact evaluation: we need many observations to say with confidence whether the average outcome in the treatment group is above or below the average outcome in the control group.

CALCULATING SAMPLE SIZE

There is a formula… The main things to be aware of:
1. Detectable effect size
2. Probability of a type 1 error (significance) and probability of a type 2 error (1 − power)
3. Variance of the outcome(s)
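For reference, a minimal sketch of the textbook formula for two equal-sized groups (the slides do not spell out which variant they use); δ is the minimum detectable effect, σ² the variance of the outcome, and the z terms are standard normal quantiles for the chosen significance level and power:

```latex
% Per-group sample size for comparing two means with equal-sized groups
n_{\text{per group}} \;=\; \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{\delta^{2}}
```

Each of the three ingredients listed above appears directly in this expression, which is why all three must be pinned down before the sample size can be computed.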

CALCULATING SAMPLE SIZE: DETECTABLE EFFECT SIZE

What is an effect size? The extent to which the intervention affects the outcome of interest, e.g., a 10% increase in test scores or a 25% increase in the completion rate.

A smaller effect is harder to capture (detect).

CALCULATING SAMPLE SIZE: DETECTABLE EFFECT SIZE

Who is taller? Detecting smaller differences is harder.

CALCULATING SAMPLE SIZE: DETECTABLE EFFECT SIZE

Larger samples make it easier to detect smaller effects. E.g., are test scores different in schools where teachers receive a bonus than in schools where they do not?

Sample, test scores, and whether we can say they are different:
- 10 schools with bonus: 68% vs. 10 schools without bonus: 65% → only with very low confidence
- 10 schools with bonus: 80% vs. 10 schools without bonus: 50% → with high confidence
- 500 schools with bonus: 68% vs. 500 schools without bonus: 65% → with high confidence

CALCULATING SAMPLE SIZE: DETECTABLE EFFECT SIZE

How to determine the detectable effect size?
- The smallest effect that would prompt a policy response
- The smallest cost-effective effect

E.g., constructing toilets for girls:
- Significantly increases girls' access by 10% → great, let's think about how we can scale this up.
- Significantly increases girls' access by 0.5% → wait: we spent all of that money and it only increased access by that much?

CALCULATING SAMPLE SIZE: TYPE 1 AND TYPE 2 ERRORS

Minimize two types of statistical error:
- Type 1 error → repeating or continuing a bad program. Minimized after data collection, during analysis.
- Type 2 error → stopping or not scaling up a good program. Minimized before data collection.

Conclusion based on the data analysis, versus reality:
- Intervention has no effect in reality: concluding "there is an impact" is a type 1 error; concluding "cannot say there is an impact" is OK.
- Intervention has an effect in reality: concluding "there is an impact" is OK; concluding "cannot say there is an impact" is a type 2 error.

CALCULATING SAMPLE SIZE: TYPE 1 AND TYPE 2 ERRORS

Type 1 error (significance):
- A lower significance level → larger samples
- Common levels: α = 1% or α = 5%, i.e., a 1% or 5% probability that there is no effect but we think we found one

Type 2 error (power = 1 − β):
- Higher power → larger samples
- Common levels: 1 − β = 80% or 1 − β = 90%, i.e., a 20% or 10% probability that there is an effect but we cannot detect it
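A minimal sketch of how significance, power, and the detectable effect size translate into a required sample size, using statsmodels (the slides do not name a tool); the standardized effect of 0.2 is an illustrative assumption, not a value from the slides:

```python
# Sketch: sample size per group for a two-sided comparison of means.
# effect_size is in standard-deviation units; 0.2 is an assumed illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.2,        # assumed minimum detectable effect, in SD units
    alpha=0.05,             # significance level (probability of a type 1 error)
    power=0.80,             # 1 - probability of a type 2 error
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")
```

Tightening any of the three inputs (smaller effect, lower α, higher power) pushes the required sample size up.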

CALCULATING SAMPLE SIZE: VARIANCE IN OUTCOME

Less underlying variance → easier to detect a difference → smaller sample needed.

CALCULATING SAMPLE SIZE: VARIANCE IN OUTCOME

How do we know the variance before we decide on our sample size and collect our data?
- Ideal pre-existing data are often… non-existent
- Useful sources include EMIS, a school census, or a national assessment
- We can also use pre-existing data from a similar population
- This makes it a bit of guesswork, not an exact science
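A minimal sketch of how such pre-existing data might be used: estimate the outcome's standard deviation and express the raw difference you hope to detect in standard-deviation units. The file name, column name, and the 5-point target are hypothetical placeholders.

```python
# Sketch: estimate the outcome's SD from pre-existing data (e.g., a prior
# national assessment) and convert a raw detectable difference into a
# standardized effect size. File and column names are hypothetical.
import pandas as pd

baseline = pd.read_csv("prior_national_assessment.csv")  # hypothetical file
sd_outcome = baseline["test_score"].std()                # hypothetical column

detectable_difference = 5.0   # e.g., 5 test-score points (assumed target)
standardized_effect = detectable_difference / sd_outcome
print(f"Baseline SD: {sd_outcome:.1f} -> standardized effect: {standardized_effect:.2f}")
```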

FURTHER ISSUES

1. Multiple treatment arms
2. Group-disaggregated results
3. Clustered design
4. Stratification

FURTHER ISSUES: 1. MULTIPLE TREATMENT ARMS

- It is straightforward to compare each treatment separately to the comparison group.
- Comparing multiple treatment groups to each other requires larger samples, especially if the treatments are very similar, because the differences between treatment groups will be smaller; it is like fixing a very small detectable effect size.
- E.g., distinguishing between two scholarship amounts.

FURTHER ISSUES: 2. GROUP-DISAGGREGATED RESULTS

- Are effects different for men and women? For different grades?
- Estimating differences in treatment impacts (heterogeneous effects) requires larger samples, especially if the groups are expected to react in a similar way.

FURTHER ISSUES: 3. CLUSTERED DESIGN

- Sampling units are clusters rather than individuals.
- Very common in education: the outcome of interest is at the student level, but the sampling/randomization unit is the village, school, or classroom.

Example: impact of teacher training on student test scores
- Primary sampling unit: schools
- Secondary sampling unit: teachers
- Outcome unit: students

FURTHER ISSUES: 3. CLUSTERED DESIGN

Why cluster?
- To minimize or remove contamination. E.g., in the deworming program, the school was chosen as the unit because worms are contagious.
- Basic feasibility/political considerations. E.g., school feeding: you cannot include and exclude different students within the same school.
- It is the only natural choice. E.g., any education intervention that affects an entire classroom (flipcharts, teacher training).

FURTHER ISSUES: 3. CLUSTERED DESIGN

Implications of clustering:
- Outcomes for individuals within a unit may be correlated:
  - All villagers are exposed to the same weather
  - All students share a schoolmaster
  - The program affects all students at the same time
  - The members of a village interact with each other
- The sample size needs to be adjusted for this correlation: more correlation between outcomes → larger sample (see the adjustment sketched below).
- You need an adequate number of groups! The number of individuals per group often matters less than the number of groups: e.g., you CANNOT randomize at the level of the district, with one treated district and one control district!
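One common way to express this adjustment (the slide does not state a formula) is the Kish design effect, which inflates the simple-random-sample size by a factor that grows with cluster size and with the intra-cluster correlation:

```latex
% Design-effect adjustment for cluster sampling
% m = individuals per cluster, \rho = intra-cluster correlation of the outcome
n_{\text{clustered}} \;=\; n_{\text{simple}} \times \bigl[\,1 + (m - 1)\,\rho\,\bigr]
```

With ρ = 0, clustering costs nothing; as ρ rises, adding more individuals per cluster buys little, which is why the number of clusters matters so much.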

FURTHER ISSUES: 4. STRATIFICATION

What?
- Sub-populations (blocks) defined by the values of control variables
- Common strata: geography, gender, sector, etc.
- Treatment assignment (or sampling) occurs within these groups

Why?
- Ensures treatment and control groups are balanced
- Reduces the required sample size, because it:
  - reduces the variance of the outcome of interest within each stratum (most useful when the stratification variables are highly correlated with the outcome)
  - reduces the correlation of units within clusters

FURTHER ISSUES: 4. STRATIFICATION

Geography example: what is the impact in a particular region? Without stratification, it is sometimes hard to say with any confidence. [Map showing treatment (T) and control (C) units across regions]

FURTHER ISSUES: 4. STRATIFICATION

Why do we need strata?
- Random assignment to treatment happens within geographical units
- Within each unit, half of the sample will be treatment and half will be control (see the sketch below)
- Similar logic applies for gender, type of school, school size, etc.
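A minimal sketch of within-stratum random assignment, assuming a hypothetical sampling frame of schools with a region column; the frame, seed, and column names are placeholders:

```python
# Sketch: random assignment within strata (regions), so roughly half of the
# schools in each stratum are treated. Frame and column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2024)

schools = pd.DataFrame({
    "school_id": range(1, 9),
    "region": ["North"] * 4 + ["South"] * 4,
})

def assign_half(stratum: pd.DataFrame) -> pd.DataFrame:
    """Randomly flag half of the schools in one stratum as treated."""
    flags = np.zeros(len(stratum), dtype=int)
    flags[: len(stratum) // 2] = 1
    rng.shuffle(flags)
    out = stratum.copy()
    out["treated"] = flags
    return out

assigned = schools.groupby("region", group_keys=False).apply(assign_half)
print(assigned.sort_values("school_id"))
```

Because assignment is done stratum by stratum, treatment and control are balanced on region by construction, which is exactly the balance property the slide describes.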

SUMMING UP

- Your sample size determines how much you can learn from your impact evaluation.
- The calculations involve some judgment and guesswork, but it is important to spend time on them.
- If the sample size is too low, the evaluation is a waste of time and money: you will not be able to detect a non-zero impact with any confidence.

Questions?

EXAMPLE/EXERCISE

Example: sampling efficiency
- We generated data for a population
- Compute its mean and variance
- Select random samples of different sizes and compute the average of each
- See how close to the real population value we get

Mean, standard deviation, and 95% confidence interval:
- Population
- Sample: [60.84, 61.94]
- Sample: [59.91, ]
- Sample: [60.09, 63.45]
- Sample: [61.35, 72.11]
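A minimal sketch reproducing the spirit of this exercise: draw random samples of decreasing size from a simulated population and compare their means and 95% confidence intervals. The population mean and SD below are illustrative assumptions, not the values behind the table above.

```python
# Sketch of the sampling-efficiency exercise: smaller samples give noisier
# means and wider confidence intervals. Population parameters are assumed.
import numpy as np

rng = np.random.default_rng(seed=1)
population = rng.normal(loc=61.4, scale=20.0, size=100_000)
print(f"Population mean: {population.mean():.2f}, SD: {population.std():.2f}")

for n in (10_000, 1_000, 100, 10):
    sample = rng.choice(population, size=n, replace=False)
    mean = sample.mean()
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    print(f"n = {n:>6}: mean = {mean:6.2f}, "
          f"95% CI = [{mean - half_width:.2f}, {mean + half_width:.2f}]")
```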

EXAMPLE/EXERCISE

Example: sample size

Country X wishes to improve students' math performance in grade 2. To do so, the Ministry of Education of X decides to distribute new math textbooks that the students can take home. One year earlier, a national math test indicated that the average test score was 40% with a standard deviation of 19. National statistics indicate that 15% of students repeat grade 2. Distributing the textbooks costs on average $125 (cost of the book and distribution). Given that the Minister is unsure of the impact of this program, he would like you to evaluate it.

List the different items that you need in order to determine your sample size, and fix the values of those items.
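One possible sketch of how the ingredients could be combined, once you have chosen values: the baseline SD of 19 comes from the exercise, while the minimum detectable effect (5 test-score points), α, and power are illustrative choices the exercise leaves to you.

```python
# Sketch of one answer to the exercise, ignoring clustering for simplicity.
# sd_baseline is given in the exercise; the other inputs are assumed choices.
from statsmodels.stats.power import TTestIndPower

sd_baseline = 19.0        # SD of test scores from the prior national test
detectable_effect = 5.0   # assumed smallest effect worth the $125 cost
effect_size = detectable_effect / sd_baseline

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Students per group, ignoring clustering: {n_per_group:.0f}")
# Textbooks are delivered to whole classrooms/schools, so a real calculation
# would also need the intra-cluster correlation and the cluster size (the
# design-effect adjustment shown earlier), plus an allowance for attrition
# such as the 15% grade repetition rate.
```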