Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.

Presentation transcript:

The Gold Standard
Randomized experiments are the “gold standard” for estimating causal effects. Why?
1. They yield unbiased estimates.
2. They eliminate confounding factors.

Randomize (figure: grid of units, each labeled R)

Covariate Balance - Gender
Two possible randomizations of the same subjects into two groups:
- 12 females, 8 males vs. 8 females, 12 males
- 5 females, 15 males vs. 15 females, 5 males
Suppose you get a “bad” randomization. What would you do? Can you rerandomize? When? How?

The Origin of the Idea
Rubin: What if, in a randomized experiment, the chosen randomized allocation exhibited substantial imbalance on a prognostically important baseline covariate?
Cochran: Why didn't you block on that variable?
Rubin: Well, there were many baseline covariates, and the correct blocking wasn't obvious; and I was lazy at that time.
Cochran: This is a question that I once asked Fisher, and his reply was unequivocal:
Fisher (recreated via Cochran): Of course, if the experiment had not been started, I would rerandomize.

Covariate Imbalance
The more covariates there are, the more likely it is that at least one will be imbalanced across treatment groups.
With just 10 independent covariates, the probability of a significant difference (at level α = 0.05) for at least one covariate is $1 - (1 - 0.05)^{10} \approx 40\%$.
Covariate imbalance is not limited to rare “unlucky” randomizations.
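
The 40% figure can be checked directly. The short sketch below assumes the covariates are independent and each is tested at the same level; the function name is illustrative only.

```python
def prob_any_imbalance(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent covariates shows a
    significant difference at level alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(prob_any_imbalance(k), 3))
```

With correlated covariates the probability would differ, so this is only the independent-case calculation quoted on the slide.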

The Gold Standard
We all know that randomized experiments are the “gold standard” for estimating causal effects. Why?
1. They yield unbiased estimates.
2. They eliminate confounding factors.
... on average! For any particular experiment, covariate imbalance is possible (and likely!), and conditional bias exists.

RERANDOMIZATION
Instead of randomizing subjects to treatment and control only once:
1) Collect covariate data.
2) Specify criteria, based on covariate balance, that determine when a randomization is unacceptable.
3) (Re)randomize subjects to treatment and control.
4) Check covariate balance. If unacceptable, return to the (re)randomize step; if acceptable, continue.
5) Conduct the experiment.
6) Analyze the results (with a randomization test).
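
The loop at the heart of this procedure can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the 40-subject gender covariate, and the tolerance are all hypothetical, and the criterion is fixed before any assignment is drawn and treats both groups symmetrically.

```python
import random


def rerandomize(n_units, is_acceptable, rng, max_tries=100_000):
    """Draw equal-sized treatment/control assignments until one is acceptable.

    Returns (assignment, number_of_draws), where assignment[i] is 1 for
    treatment and 0 for control.
    """
    base = [1] * (n_units // 2) + [0] * (n_units - n_units // 2)
    for tries in range(1, max_tries + 1):
        assignment = base[:]
        rng.shuffle(assignment)
        if is_acceptable(assignment):
            return assignment, tries
    raise RuntimeError("no acceptable randomization found")


# Hypothetical covariate: 40 subjects, 20 female (1) and 20 male (0).
female = [1] * 20 + [0] * 20


def balanced_on_gender(assignment, tolerance=2):
    """Accept when the groups differ by at most `tolerance` females."""
    treated_f = sum(f for f, w in zip(female, assignment) if w == 1)
    control_f = sum(f for f, w in zip(female, assignment) if w == 0)
    return abs(treated_f - control_f) <= tolerance


rng = random.Random(0)
assignment, tries = rerandomize(len(female), balanced_on_gender, rng)
print(tries, sum(assignment))
```

A single covariate is used here only to keep the sketch short; the deck goes on to use a multivariate criterion.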

Unbiased
Theorem: If the treatment groups are exchangeable for each randomization and for the rerandomization criteria, then rerandomization yields an unbiased estimate of the average treatment effect.
For exchangeability:
- Equal-sized treatment groups
- Rerandomization criteria that are objective and do not favor a specific group

Unbiased
Options for unequal-sized treatment groups:
- Rerandomize into multiple equally sized groups, then combine groups after the rerandomization
- Discard the extra units to form equal-sized groups
- Rerandomize to lower MSE, with perhaps slight bias
- Do not rerandomize

Rerandomization Test
Randomization test: simulate randomizations to see what the statistic would look like just by random chance, if the null hypothesis were true.
Rerandomization test: a randomization test, but for each simulated randomization, follow the same rerandomization criteria used in the experiment.
As long as the simulated randomizations are done using the same randomization scheme used in the experiment, this will give accurate p-values.
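
A rerandomization test along these lines might look like the sketch below. The data, the acceptance threshold of 0.3, the difference-in-means statistic, and all function names are assumptions made for illustration; the key point is that every simulated assignment passes through the same acceptance criterion as the one used in the experiment, with outcomes held fixed under the null hypothesis of no effect.

```python
import random
from statistics import mean


def diff_in_means(values, w):
    treated = [v for v, wi in zip(values, w) if wi == 1]
    control = [v for v, wi in zip(values, w) if wi == 0]
    return mean(treated) - mean(control)


def draw_acceptable(n, is_acceptable, rng):
    """Draw equal-sized assignments until the balance criterion is met."""
    w = [1] * (n // 2) + [0] * (n - n // 2)
    while True:
        rng.shuffle(w)
        if is_acceptable(w):
            return list(w)


def rerandomization_test(y, w_obs, is_acceptable, n_sim, rng):
    """Two-sided p-value: share of acceptable simulated assignments whose
    statistic is at least as extreme as the observed one."""
    observed = diff_in_means(y, w_obs)
    hits = 0
    for _ in range(n_sim):
        w = draw_acceptable(len(y), is_acceptable, rng)
        if abs(diff_in_means(y, w)) >= abs(observed):
            hits += 1
    return observed, hits / n_sim


# Hypothetical data: one covariate x, outcome y with a true effect of 2.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(20)]


def acceptable(w):
    return abs(diff_in_means(x, w)) <= 0.3


w_obs = draw_acceptable(len(x), acceptable, rng)
y = [xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w_obs)]
observed, p_value = rerandomization_test(y, w_obs, acceptable, 1000, rng)
print(round(observed, 2), p_value)
```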

Alternatives for Analysis
t-test: too conservative after rerandomization, so significant results can still be trusted.
Regression: a regression that includes the covariates balanced by rerandomization more accurately estimates the true precision of the estimated treatment effect. Its assumptions are less dangerous after rerandomization because the groups should be well balanced.

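Criteria for Acceptable Balance
We use the Mahalanobis distance, M, to represent the multivariate distance between group means:
$M = (\bar{X}_T - \bar{X}_C)' \, [\operatorname{cov}(\bar{X}_T - \bar{X}_C)]^{-1} \, (\bar{X}_T - \bar{X}_C)$
Choose a threshold a and rerandomize when M > a.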
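
As a rough sketch of how M could be computed, the code below assumes the usual form M = d' [cov(d)]^{-1} d, where d is the difference in covariate means and cov(d) is taken as the sample covariance of the covariates scaled by (1/n_T + 1/n_C). It uses NumPy; the function name and data are hypothetical.

```python
import numpy as np


def mahalanobis_balance(X, w):
    """Mahalanobis distance between treated (w == 1) and control (w == 0)
    covariate means."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w)
    n_t, n_c = (w == 1).sum(), (w == 0).sum()
    d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    # Covariance of the mean difference under complete randomization.
    cov_x = np.atleast_2d(np.cov(X, rowvar=False))
    cov_d = cov_x * (1.0 / n_t + 1.0 / n_c)
    return float(d @ np.linalg.solve(cov_d, d))


# Hypothetical data: 30 units, 3 covariates, 15 units per group.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
w = rng.permutation(np.repeat([1, 0], 15))
M = mahalanobis_balance(X, w)

# An invertible affine transformation of the covariates leaves M unchanged.
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)
M_transformed = mahalanobis_balance(X @ A + 1.0, w)
print(M, M_transformed)
```

The last lines illustrate affine invariance numerically: rescaling or linearly mixing the covariates does not change which randomizations are accepted.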

Distribution of M (figure): randomizations with M ≤ a are acceptable; those with M > a are rerandomized.
p_a = probability of accepting a randomization

Choosing p_a
Choosing the acceptance probability is a tradeoff between a desire for better balance and computational time.
The number of randomizations needed to get one successful one is Geometric(p_a), so the expected number needed is 1/p_a.
Computational time must be considered in advance, since many simulated acceptable randomizations are needed to conduct the randomization test.
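
To make the cost concrete, the sketch below computes the expected number of draws for a few acceptance probabilities, both for a single accepted randomization and for the 1000 accepted randomizations a randomization test might use. The function name and the choice of 1000 are illustrative assumptions.

```python
def expected_draws(p_a: float, n_accepted: int = 1) -> float:
    """Expected number of randomizations drawn to obtain n_accepted
    acceptable ones, when each draw is accepted with probability p_a."""
    return n_accepted / p_a


for p_a in (0.1, 0.01, 0.001):
    print(p_a, expected_draws(p_a), expected_draws(p_a, n_accepted=1000))
```

Each tenfold decrease in p_a multiplies the expected work by ten, which is why the acceptance probability has to be chosen with the analysis stage in mind.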

Another Measure of Balance
The obvious choice may be to set limits on the acceptable balance for each covariate individually.
This destroys the joint distribution of the covariates.

Rerandomization Based on M
- Since M follows a known distribution, it is easy to specify the proportion of accepted randomizations.
- M is affinely invariant (unaffected by affine transformations of the covariates).
- Correlations between covariates are maintained.
- The balance improvement for each covariate is the same (and known)...
- ... and is the same for any linear combination of the covariates.

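Covariates After Rerandomization
Theorem: If n_T = n_C, the covariate means are normally distributed, and rerandomization occurs when M > a, then
$E(\bar{X}_T - \bar{X}_C \mid M \le a) = 0$
and
$\operatorname{cov}(\bar{X}_T - \bar{X}_C \mid M \le a) = v_a \operatorname{cov}(\bar{X}_T - \bar{X}_C)$,
where $v_a = P(\chi^2_{k+2} \le a) / P(\chi^2_k \le a)$ and k is the number of covariates.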

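Percent Reduction in Variance
For each covariate x_j, the ratio of variances is
$\operatorname{var}(\bar{X}_{j,T} - \bar{X}_{j,C} \mid M \le a) \,/\, \operatorname{var}(\bar{X}_{j,T} - \bar{X}_{j,C}) = v_a$
and the percent reduction in variance is $100(1 - v_a)$.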

Shadish Data
n = 445 undergraduate students, split into:
- Randomized experiment (235 students), training assigned by randomization: vocab training 116, math training 119
- Observational study (210 students), students choose their training: vocab training 131, math training 79
Shadish, W. R., Clark, M. H., Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. JASA, 103(484).

Shadish Data k = 10 covariates

Shadish Data
Time to generate 1000 rerandomizations:
p_a = 0.1: 11.6 seconds
p_a = 0.01: 1.8 minutes
p_a = 0.001: 18.3 minutes
p_a = : 3.4 hours

Shadish Data
Rerandomize when M > 1.48
k = 10 covariates
p_a = 0.001
Percent reduction in variance: 100(1 – v_a) = 88%
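
These numbers can be checked with a short calculation. The sketch assumes that M follows a chi-square distribution with k degrees of freedom and that v_a = P(χ²_{k+2} ≤ a) / P(χ²_k ≤ a); the chi-square CDF is computed from the standard series for the regularized lower incomplete gamma function so that only the standard library is needed. Function names are illustrative.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """P(chi-square with k degrees of freedom <= x), using the series
    P(s, z) = z^s e^{-z} * sum_{n>=0} z^n / Gamma(s + n + 1)."""
    if x <= 0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1.0))
    total = 0.0
    for n in range(1, 10_000):
        total += term
        term *= z / (s + n)
        if term <= 1e-16 * total:
            break
    return min(total, 1.0)


def variance_ratio(a: float, k: int) -> float:
    """v_a under the assumptions stated above."""
    return chi2_cdf(a, k + 2) / chi2_cdf(a, k)


a, k = 1.48, 10
p_a = chi2_cdf(a, k)
v_a = variance_ratio(a, k)
print(round(p_a, 4), round(100 * (1 - v_a), 1))
```

Under these assumptions the threshold 1.48 with 10 covariates corresponds to an acceptance probability of about 0.001 and a covariate variance reduction of about 88%.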

Estimated Treatment Effect After Rerandomization
Theorem: If n_T = n_C, the covariate and outcome means are normally distributed, the treatment effect is additive, and rerandomization occurs when M > a, then
$\operatorname{var}(\hat{\tau} \mid M \le a) = \bigl(1 - (1 - v_a) R^2\bigr) \operatorname{var}(\hat{\tau})$
and the percent reduction in variance for the estimated treatment effect is $100(1 - v_a) R^2$, where $R^2$ is the squared canonical correlation.

Outcome Variance Reduction (figure; axis label: v_a)

Shadish Data
n = 445 undergraduate students, split into:
- Randomized experiment (235 students), training assigned by randomization: vocab training 116, math training 119
- Observational study (210 students), students choose their training: vocab training 131, math training 79
Outcomes: score on a vocabulary test, score on a mathematics test

Shadish Data
Vocabulary: R^2 = 0.11, so percent reduction in variance = 88(0.11) = 9.68%
Mathematics: R^2 = 0.32, so percent reduction in variance = 88(0.32) = 28.16%
For mathematics, this is equivalent to increasing the sample size by a factor of 1/(1 - 0.28) = 1.39.
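
The arithmetic on this slide is reproduced below. The sample-size equivalence assumes that the variance of the estimated effect is inversely proportional to sample size, so removing a fraction r of the variance is worth a factor of 1/(1 - r) in sample size; function names are illustrative.

```python
def outcome_variance_reduction(covariate_reduction_pct: float, r_squared: float) -> float:
    """Percent reduction in variance of the estimated treatment effect."""
    return covariate_reduction_pct * r_squared


def sample_size_factor(reduction_pct: float) -> float:
    """Sample-size multiplier giving the same variance without rerandomizing."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


vocab = outcome_variance_reduction(88, 0.11)
maths = outcome_variance_reduction(88, 0.32)
print(round(vocab, 2), round(maths, 2), round(sample_size_factor(maths), 2))
```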

Rerandomization Based on M
- Since M follows a known distribution, it is easy to specify the proportion of accepted randomizations.
- M is affinely invariant (unaffected by affine transformations of the covariates).
- Correlations between covariates are maintained.
- The balance improvement for each covariate is the same...
- ... and is the same for any linear combination of the covariates.
Is that good?

Tiers of Covariates
In practice, covariates are often of varying importance.
We can place covariates into tiers, with the top tier including the most important covariates, the second tier the less important ones, and so on.
Criteria for acceptable balance can be less stringent for each successive tier.

Tiers of Covariates
1) Check balance for covariates in tier 1: calculate M_1 using the covariates in tier 1, and rerandomize if M_1 > a_1.
2) For each successive tier, regress each covariate on all covariates in previous tiers, and check balance for the residuals of that tier: calculate M_t using the residuals of tier t, and rerandomize if M_t > a_t.
a_t denotes the acceptance threshold for tier t.
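
One way the tiered check could be coded is sketched below with NumPy. It assumes ordinary least-squares regression with an intercept for the residual step and the same form of Mahalanobis distance used for a single tier; the function names, data, and thresholds are hypothetical.

```python
import numpy as np


def mahalanobis(X, w):
    """Mahalanobis distance between group means of the columns of X."""
    X = np.asarray(X, dtype=float)
    d = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    scale = 1.0 / (w == 1).sum() + 1.0 / (w == 0).sum()
    cov_d = np.atleast_2d(np.cov(X, rowvar=False)) * scale
    return float(d @ np.linalg.solve(cov_d, d))


def residualize(X_prev, X_tier):
    """Residuals of each tier covariate after regressing on earlier tiers."""
    Z = np.column_stack([np.ones(len(X_prev)), X_prev])
    beta, *_ = np.linalg.lstsq(Z, X_tier, rcond=None)
    return X_tier - Z @ beta


def acceptable_by_tiers(tiers, thresholds, w):
    """True when every tier satisfies M_t <= a_t."""
    w = np.asarray(w)
    previous = None
    for X_t, a_t in zip(tiers, thresholds):
        X_t = np.asarray(X_t, dtype=float)
        balance_on = X_t if previous is None else residualize(previous, X_t)
        if mahalanobis(balance_on, w) > a_t:
            return False
        previous = X_t if previous is None else np.column_stack([previous, X_t])
    return True


# Hypothetical data: 40 units, two covariates per tier.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
tiers = [X[:, :2], X[:, 2:]]
w = rng.permutation(np.repeat([1, 0], 20))
print(acceptable_by_tiers(tiers, [1.0, 3.0], w))
```

The looser threshold on the second tier reflects the idea that less important covariates can be held to a less stringent standard.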

Tiers of Covariates
Theorem: If n_T = n_C, the covariate means are normally distributed, and rerandomization occurs if any M_t > a_t, then the ratio of variances for covariate x_j can be expressed in terms of R_t^2, the squared canonical correlation between x_j and all covariates in tiers 1 through t.
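
The formula from this slide is not reproduced in the transcript. One expression consistent with the setup is given below as a sketch rather than as the slide's exact statement. It assumes that the tier statistics M_t are independent under normality, that tier t contains k_t covariates with its own variance ratio v_{a_t}, that there are S tiers in total, and that R_0^2 = 0; the symbols k_t, S, and R_0^2 are introduced here for illustration.

```latex
\frac{\operatorname{var}\!\left(\bar{X}_{j,T} - \bar{X}_{j,C} \mid M_t \le a_t \text{ for all } t\right)}
     {\operatorname{var}\!\left(\bar{X}_{j,T} - \bar{X}_{j,C}\right)}
  = \sum_{t=1}^{S} v_{a_t}\,\bigl(R_t^2 - R_{t-1}^2\bigr),
\qquad
v_{a_t} = \frac{P\!\left(\chi^2_{k_t+2} \le a_t\right)}{P\!\left(\chi^2_{k_t} \le a_t\right)}
```

If x_j belongs to tier s, then R_t^2 = 1 for every t ≥ s, so later tiers contribute nothing; with a single tier the expression reduces to v_a.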

Empirical vs Theoretical Values

Multiple Treatment Groups
Rerandomization is easily extended to multiple treatment groups.
To measure balance, one of the MANOVA test statistics (such as Wilks’ Λ) can be used, measuring the ratio of within-group variability to between-group variability for multiple covariates.
This is equivalent to rerandomizing based on Mahalanobis distance if used on only two groups.

Blocking
Rerandomization is not meant to replace blocking; blocking and rerandomization can be used together.
Blocking is great for balancing a few very important covariates, but is not easily used for many covariates.
“Block what you can, rerandomize what you can’t.”

Small Samples or Non-Normality
The theoretical results given depend on $M \sim \chi^2_k$.
If the covariate means are not normally distributed, that is fine! The theoretical results are used nowhere in the analysis.
Rerandomization does NOT depend on asymptotics.
The threshold a corresponding to p_a, and the percent reduction in variance for the covariates, can be estimated via simulation.
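
A simulation of this kind might be sketched as follows, using NumPy. The sample size, number of covariates, acceptance probability, number of draws, and the deliberately skewed covariates are all hypothetical; the threshold is taken as the empirical p_a quantile of M over simulated randomizations, and the variance ratio is estimated from the accepted draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p_a, n_draws = 40, 3, 0.1, 5000

# Hypothetical covariates, deliberately non-normal.
X = rng.exponential(size=(n, k))
base = np.repeat([1, 0], n // 2)

# Inverse covariance of the mean difference for equal group sizes:
# 1/n_T + 1/n_C = 4/n.
cov_d_inv = np.linalg.inv(np.cov(X, rowvar=False) * (4.0 / n))

diffs = np.empty((n_draws, k))
for i in range(n_draws):
    w = rng.permutation(base)
    diffs[i] = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)

# Mahalanobis distance for every simulated randomization.
M = np.einsum("ij,jk,ik->i", diffs, cov_d_inv, diffs)

a = np.quantile(M, p_a)                 # empirical threshold for p_a
accepted = diffs[M <= a]
ratio = accepted.var(axis=0) / diffs.var(axis=0)
print(round(float(a), 3), np.round(100 * (1 - ratio), 1))
```

The printed values are the estimated threshold and the estimated percent reduction in variance for each covariate, obtained without relying on the chi-square approximation.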

Conclusion
Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment.
If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result.
Thank you!