Designing an impact evaluation: Randomization, statistical power, and some more fun…


Designing a (simple) RCT in a couple of steps
You want to evaluate the impact of something (a program, a technology, a piece of information, etc.) on an outcome.
Example: Evaluate the impact of free school meals on pupils' schooling outcomes.
You decide to do it through a randomized controlled trial. Why?
The questions that follow:
– Type of randomization: what is most appropriate?
– Unit of randomization: what do we need to think about?
– Sample size
> These are the things we will talk about now.

I. Where to start
You have a HYPOTHESIS.
Example: Free meals => increased school attendance => increased amount of schooling => improved test scores. Or could it go the other way?
To test your hypothesis, you want to estimate the impact of a variable T on an outcome Y for an individual i. In a simple regression framework:
$Y_i = \alpha + \beta T_i + \varepsilon_i$
How could you do this?
– Compare schools with free meals to schools with no free meals?
– Compare test scores before the free meal program was implemented to test scores after?

II. Randomization basics
You decided to use a randomized design. Why?
– Randomization removes selection bias.
> Trick question: Does the sample need to be randomly sampled from the entire population?
– Randomization solves the causal inference problem by providing a counterfactual, i.e. a comparison group.
While we can't observe $Y_i^T$ and $Y_i^C$ at the same time, we can measure the average treatment effect (ATE) by computing the difference in mean outcomes between two a priori comparable groups. We measure:
$ATE = E[Y^T] - E[Y^C]$
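To make the selection-bias point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `difference_in_means`, the score distribution (mean 50, standard deviation 10), the true effect of 5 points, and the self-selection rule. It compares the difference in means under coin-flip assignment with the same comparison when stronger pupils opt into the program.

```python
import random
import statistics

def difference_in_means(n=10_000, true_effect=5.0, randomize=True, seed=0):
    """Simulate pupils and compare mean outcomes of program and non-program groups."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y_c = rng.gauss(50.0, 10.0)   # outcome without the program (Y_i^C)
        y_t = y_c + true_effect       # outcome with the program (Y_i^T)
        if randomize:
            gets_program = rng.random() < 0.5   # coin-flip assignment
        else:
            gets_program = y_c > 50.0           # self-selection: stronger pupils opt in
        # Only one of the two potential outcomes is ever observed per pupil.
        if gets_program:
            treated.append(y_t)
        else:
            control.append(y_c)
    return statistics.mean(treated) - statistics.mean(control)

randomized_estimate = difference_in_means(randomize=True)
self_selected_estimate = difference_in_means(randomize=False)
print(randomized_estimate, self_selected_estimate)
```

Under these assumptions the randomized estimate should land close to the true effect, while the self-selected comparison mixes the effect with the pre-existing gap between the two groups.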

II. Randomization basics
What to think of when deciding on your design?
– Types of randomization / unit of randomization
– Block design
– Phase-in
– Encouragement design
– Stratification?
The decision should come from (1) your hypothesis, (2) your partner's implementation plans, (3) the type of intervention!
Example: What would you do?
Next step: How many units? = SAMPLE SIZE. Intuition --> Why do we need many observations?

Remember, we're interested in Mean(T) - Mean(C). We measure scores in 1 treatment school and 1 control school. > Can I say anything?

Now 50 schools:

Now 500 schools:
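The intuition behind the 1, 50 and 500 school comparison can be sketched with a small simulation. The numbers are assumptions chosen for illustration (school scores with mean 50 and standard deviation 10, a true effect of 2 points, and the school counts read as schools per arm); the helper names are hypothetical. The sketch repeats the experiment many times and reports how much the estimated difference in means varies from one repetition to the next.

```python
import random
import statistics

def diff_in_means(n_schools, rng, effect=2.0, sd=10.0):
    """One experiment: mean score of treatment schools minus control schools."""
    treatment = [rng.gauss(50.0 + effect, sd) for _ in range(n_schools)]
    control = [rng.gauss(50.0, sd) for _ in range(n_schools)]
    return statistics.mean(treatment) - statistics.mean(control)

def spread_of_estimates(n_schools, draws=500, seed=1):
    """Standard deviation of the estimate across many repeated experiments."""
    rng = random.Random(seed)
    estimates = [diff_in_means(n_schools, rng) for _ in range(draws)]
    return statistics.stdev(estimates)

for n in (1, 50, 500):
    print(n, "schools per arm -> spread of estimate:", round(spread_of_estimates(n), 2))
```

With one school per arm the estimate swings far more than the assumed effect itself, so nothing can be concluded; the spread shrinks roughly with the square root of the number of schools.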

III. Sample size
But how to pick the optimal size? -> It all depends on the minimum effect size you'd want to be able to detect. Note: Standardized effect sizes.
POWER CALCULATIONS link minimum effect size to design. They depend on several factors:
– The effect size you want to detect
– Your randomization choices
– The baseline characteristics of your sample
– The statistical power you want
– The significance level you want for your estimates
We'll look into these factors one by one, starting from the end…

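III. Power calculations (1) Hypothesis testing
When testing a hypothesis, one actually tests the null hypothesis $H_0$ against the alternative hypothesis $H_a$, and tries to reject the null.
$H_0$: Effect size = 0
$H_a$: Effect size ≠ 0
Two types of error are to be feared (TRUTH vs. YOUR CONCLUSION):
– Truth: effective; conclusion: effective (reject $H_0$) -> correct, with probability = POWER
– Truth: effective; conclusion: no effect (can't reject $H_0$) -> TYPE II ERROR
– Truth: no effect; conclusion: effective (reject $H_0$) -> TYPE I ERROR, with probability = SIGNIFICANCE level
– Truth: no effect; conclusion: no effect (can't reject $H_0$) -> correct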

III. Power calculations (1) Significance
SIGNIFICANCE = Probability that you'd conclude that T has an effect when in fact it doesn't. It tells you how confident you can be in your answer. (Denoted α)
– Classical values: 1, 5, 10%
– Hypothesis testing basically comes down to testing equality of means between T and C using a t-test.
For the effect to be significant, the t-statistic obtained must be at least as large as the critical value for the chosen significance level. Or again: $|\hat{\beta}| / SE(\hat{\beta})$ must be greater than or equal to $t_\alpha = 1.96$ (for α = 5%, two-sided test).
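As a rough illustration of the test described above, the sketch below computes a two-sample t-statistic and compares it with 1.96. The test scores are made up for the example, and the function name `t_statistic` is hypothetical; the unequal-variance form of the standard error is one common choice, not necessarily the one used in the original slides.

```python
import math
import statistics

def t_statistic(treated, control):
    """Two-sample t-statistic for a difference in means (unequal variances)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return diff / se

# Hypothetical test scores, invented for illustration only.
treated_scores = [62, 58, 71, 65, 69, 60, 74, 66]
control_scores = [55, 61, 52, 59, 57, 50, 63, 54]

t = t_statistic(treated_scores, control_scores)
# 1.96 is the large-sample critical value for a 5% two-sided test; with
# samples this small the exact t critical value would be somewhat larger.
significant_at_5pct = abs(t) >= 1.96
print(round(t, 2), significant_at_5pct)
```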

III. Power calculations (2) Power
POWER = Probability that, if a true effect exists, you will find it (i.e. reject $H_0$) for a given sample size. (Denoted κ)
– Classical values: 80, 90%
To achieve a power κ, it must be that: [equation missing in transcript]
Or graphically… [figure missing in transcript]
In short: To have a high chance of detecting an effect, one needs enough power, which depends on the standard error of the estimate of β.
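The slide's own equation for the power condition did not survive in the transcript. In the standard textbook treatment, which is assumed here rather than taken from the slides, the condition is usually written as follows, with $t_{1-\kappa}$ and $t_{\alpha}$ the critical values associated with the desired power and significance level:

```latex
% Assumed standard form; not recovered from the original slide.
% t_{\alpha}:    critical value for significance level \alpha (1.96 for 5%, two-sided)
% t_{1-\kappa}:  critical value for power \kappa (about 0.84 for 80%)
\beta > \left( t_{1-\kappa} + t_{\alpha} \right) SE(\hat{\beta})
```

With 80% power and 5% significance, the true effect would therefore need to be about 2.8 standard errors away from zero.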

III. Power calculations (3) Standard error of the estimate
Intuition = the higher the standard error, the less precise the estimate, the trickier it is to identify an effect, and the lower the power for a given sample size!
– Demonstration: How does the spread of a variable affect the precision of a mean comparison test?
We saw that power depends on the SE of the estimate of β. But what does this standard error depend on?
– The standard deviation of the error (how heterogeneous the sample is)
– The proportion of the population treated (randomization choices)
– The sample size
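How the three ingredients enter the standard error can be shown with a short derivation. It is a sketch under assumptions not stated in the slides: individual-level randomization, independent errors with common variance $\sigma^2$, a total sample of $N$ units, and a share $P$ of them treated.

```latex
% Variance of a difference between two independent sample means:
% P N treated units and (1-P) N control units, error variance \sigma^2.
\operatorname{Var}(\hat{\beta})
  = \frac{\sigma^2}{P N} + \frac{\sigma^2}{(1-P) N}
  = \frac{\sigma^2}{P(1-P)\,N}
\quad\Longrightarrow\quad
SE(\hat{\beta}) = \sqrt{\frac{1}{P(1-P)} \cdot \frac{\sigma^2}{N}}
```

Under these assumptions the standard error rises with $\sigma$, falls with $N$, and is smallest when half the sample is treated ($P = 0.5$).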

III. Power calculations (4) Calculations
We now have all the ingredients of the equation. The minimum detectable effect (MDE) is: [equation missing in transcript]
As you can see:
– The higher the heterogeneity of the sample, the higher the MDE,
– The lower N, the higher the MDE,
– The higher the power you require, the higher the MDE (for a given N and significance level).
Power calculations, in practice, correspond to playing with all these ingredients to find the optimal design to satisfy your MDE.
– Optimal sample size?
– Optimal portion treated?
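A minimal power-calculation sketch follows. It assumes the standard formula MDE = (t_{1-κ} + t_α) · sqrt(σ² / (P(1-P)N)), which is not recovered from the slide itself, and uses normal critical values as a large-sample approximation. The function names `mde` and `required_n` and the default values are hypothetical.

```python
import math
from statistics import NormalDist

def mde(n, sigma=1.0, p_treated=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect for total sample size n (individual randomization)."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance, e.g. 1.96
    t_power = z.inv_cdf(power)           # e.g. about 0.84 for 80% power
    se = math.sqrt(sigma ** 2 / (p_treated * (1 - p_treated) * n))
    return (t_alpha + t_power) * se

def required_n(target_mde, sigma=1.0, p_treated=0.5, alpha=0.05, power=0.80):
    """Smallest total sample size whose MDE is at most target_mde."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n = (multiplier * sigma / target_mde) ** 2 / (p_treated * (1 - p_treated))
    return math.ceil(n)

# With sigma = 1 the MDE is expressed in standard deviations of the outcome.
print(round(mde(1000), 3))
print(required_n(0.2))
```

Playing with the arguments reproduces the trade-offs listed above: a larger `sigma`, a smaller `n`, a higher required `power`, or a treated share far from 0.5 each raise the MDE.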

III. Power calculations (5) More complicated frameworks
Several treatments?
– What happens when there is more than one treatment?
– It all depends on what you want to compare!
Stratification?
– Reduces the standard deviation of the error term
Clustered (block) design?
– When using clusters, the outcomes of the observations within a cluster can be correlated. What does this mean?
– The intra-cluster correlation ρ (rho), the portion of the total variance explained by between-cluster variance, implies an increase in the variance of the estimate.
– Impact on MDE?
– In short: the higher ρ, the higher the MDE (the increase can be large)
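To give a sense of how large the increase can be, the sketch below uses the usual design-effect approximation 1 + (m - 1)ρ for clusters of equal size m. That formula and the function names are assumptions for illustration; the slides do not state which adjustment they use.

```python
import math

def design_effect(cluster_size, rho):
    """Variance inflation from randomizing clusters of equal size (assumed formula)."""
    return 1 + (cluster_size - 1) * rho

def mde_inflation(cluster_size, rho):
    """Factor by which the MDE grows relative to individual-level randomization."""
    return math.sqrt(design_effect(cluster_size, rho))

# Hypothetical example: 40 pupils per school at several intra-cluster correlations.
for rho in (0.0, 0.05, 0.20):
    print(rho, round(mde_inflation(40, rho), 2))
```

Even a modest ρ matters once clusters are large, which is why adding clusters usually buys more power than adding units within clusters.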

Summary
When thinking of designing an experiment:
1. What is your hypothesis?
2. How many treatment groups?
3. What unit of randomization?
4. What is the minimum effect size of interest?
5. What is the optimal sample size considering power/budget?
=> Power calculations!