The use of administrative data in Randomised Controlled Trials (RCTs)
John Jerrim, Institute of Education, University of London

Structure
- What is an RCT?
- What are the advantages of RCTs?
- What are their limitations?
- How can administrative data help overcome these limitations?
- Implications for GSS

Context
My experience is in conducting RCTs in education, and this is the context I am talking about today. The points, however, have implications for RCTs in other areas.

What is an RCT?
- Recruit a group of willing participants
- X% (usually 50%) assigned to TREATMENT (T); the remainder assigned to CONTROL (C)
- In the absence of the intervention, randomisation means E(T) = E(C)
- Hence, if after the intervention we find µ(T) > µ(C), the difference is attributable to the treatment (a simulation of this logic is sketched below)
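A minimal sketch of this logic, assuming a simple individually randomised trial with simulated data; the sample size and the 0.2 SD treatment effect are illustrative assumptions, not figures from the talk:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Recruit a group of willing participants (illustrative sample size)
n = 1000
baseline_ability = rng.normal(0, 1, n)

# 50% randomly assigned to treatment (T = 1), 50% to control (T = 0)
treated = rng.permutation(np.repeat([1, 0], n // 2))

# Hypothetical outcome: the treatment shifts scores by 0.2 SD
outcome = baseline_ability + 0.2 * treated + rng.normal(0, 1, n)

# Because assignment is random, E(T) = E(C) in the absence of the intervention,
# so a simple comparison of group means identifies the treatment effect
diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
print(f"Mean difference: {diff:.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")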

Advantages (well known)
When conducted well, an RCT:
- Rules out the influence of confounders, and hence gives the causal effect of T
- Is highly policy relevant
- Is simple: means plus a t-test, easy to communicate
- Has standardised reporting and conduct protocols (CONSORT, trial registration)
Often described as the GOLD STANDARD.

In reality, RCTs also have important limitations, though people talk about these a lot less!

A lack of power?
- In education: mostly cluster RCTs. Rather than randomise individuals, we randomise whole schools
- Issue = the intra-cluster correlation, ICC (ρ). Low power
EXAMPLE
- Secondary schools (clusters) = …, … children per school, ρ = …, …,000 pupils in trial
- Minimum detectable effect = 0.25 standard deviations
- 95% CI = 0 to 0.50 standard deviations
(A sketch of the minimum detectable effect calculation is given below.)
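As a rough illustration of the clustering problem, here is a minimal sketch of the minimum detectable effect size (MDES) calculation for a two-arm cluster trial, using the standard design-effect approximation. The number of schools, cluster size, and ICC below are illustrative assumptions, not the figures from the slide:

import math
from scipy import stats

def mdes_cluster(n_schools_per_arm, pupils_per_school, icc, alpha=0.05, power=0.80):
    """Approximate MDES (in SD units) for a two-arm cluster-randomised trial,
    with no covariate adjustment, using the design-effect formula."""
    n_per_arm = n_schools_per_arm * pupils_per_school
    design_effect = 1 + (pupils_per_school - 1) * icc
    n_effective = n_per_arm / design_effect      # effective sample size per arm
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_effective)

# Illustrative values only: 50 schools per arm, 200 pupils per school, ICC = 0.20
print(f"MDES with clustering:     {mdes_cluster(50, 200, 0.20):.2f} SD")
print(f"MDES ignoring clustering: {mdes_cluster(50, 200, 0.00):.2f} SD")

With these assumed inputs, clustering inflates the detectable effect from roughly 0.04 SD to around 0.25 SD, which is why school-randomised trials need so many pupils.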

Costly
- Imagine it costs £5 to test each child in this trial: you have spent £100,000 just on a post-test!
- You have to deliver the intervention in 50 schools (expensive)
- Many EEF secondary school RCTs cost > £500,000, yet the average detectable effect across trials is around 0.25 SD
- Big money for quite wide confidence intervals

Attrition
Schools (and pupils within schools) drop out of the trial, particularly when assigned to the control group!
Problems:
- Breaks randomisation, losing the key advantage of the RCT
- Loses power
Example (my trial):
- 50 schools: 25 treatment and 25 control
- Treatment follow-up = 23 / 25 schools
- Control follow-up = 9 / 25 schools
Worst of all worlds:
- Bias (selection effects)
- Low power
- High cost

Short-term follow-up only
- Testing / follow-up often happens immediately at the end of the trial, when the intervention is at its most effective
- BUT we are really interested in long-run, lasting effects. Is there much point in raising age 11 test scores if the children do no better at age 16?
- Ideally we want short-, medium- and long-term follow-up, but this again increases cost

External validity
- Most RCTs recruit participants via convenience sampling, not from a well defined population
- How "weird" is our sample of trial participants? Do we have mainly rich pupils? Only high-performing schools?
- How far can we generalise the results?
- BIG ISSUE: will we still get an effect when we scale up / roll out?
- BUT, frankly, this is often ignored in RCTs

How can administrative data help?

What data are available?
We are lucky in education: we have the National Pupil Database (NPD).
- School census: records each child's school 3 times per year
- Assessments at ages 5, 7, 11, 14, 16, …
- Demographics (FSM, gender, EAL, ethnicity etc.)
Strengths of the NPD:
- Known for the whole state school population
- Low measurement error
- Low missing data
- Can track children over time

NPD to increase power
One way to increase power is to control for covariates that are linked to the outcome; the NPD can be used for this purpose.
EXAMPLE: Maths Mastery
- Year 7 children taught maths in a new way; tested at the end of Year 7
- CONTROL for KS2 maths scores from the NPD
- Detectable effect = 0.36 without controls (CI = 0 to 0.72)
- Detectable effect = 0.22 with NPD controls (CI = 0 to 0.44)
A MASSIVE BOOST TO POWER (see the sketch below).
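A minimal sketch of what this covariate adjustment looks like at the analysis stage, assuming the trial outcomes have already been linked to KS2 scores from the NPD. The column names and simulated data are illustrative, not the actual Maths Mastery data:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Illustrative pupil-level data; in practice ks2_maths would come from the NPD link
n = 2000
df = pd.DataFrame({
    "treated": rng.permutation(np.repeat([1, 0], n // 2)),
    "ks2_maths": rng.normal(0, 1, n),   # prior attainment from the NPD
})
# Hypothetical post-test: strongly related to prior attainment, small treatment effect
df["post_test"] = 0.8 * df["ks2_maths"] + 0.15 * df["treated"] + rng.normal(0, 0.6, n)

# Unadjusted analysis: simple difference in means between arms
unadjusted = smf.ols("post_test ~ treated", data=df).fit()
# Adjusted analysis: controlling for KS2 maths soaks up outcome variance,
# shrinking the standard error on the treatment effect (i.e. more power)
adjusted = smf.ols("post_test ~ treated + ks2_maths", data=df).fit()

print(f"Unadjusted effect: {unadjusted.params['treated']:.3f} (SE {unadjusted.bse['treated']:.3f})")
print(f"Adjusted effect:   {adjusted.params['treated']:.3f} (SE {adjusted.bse['treated']:.3f})")

The adjusted model removes the outcome variance explained by prior attainment, so the standard error on the treatment effect shrinks; this is the same mechanism that cuts the detectable effect from 0.36 to 0.22 on the slide.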

NPD to reduce cost
- In the previous example, we could have conducted a pre-test rather than using the NPD
- Maths Mastery ran in 50 schools of 200 children = 10,000 children; at £5 per test, a pre-test would have cost a minimum of £50,000
- ADMINISTRATIVE DATA SAVED THIS MONEY
- The NPD data are there, ready to use. LET'S USE THEM!
- Doing a separate pre-test here would have had almost no benefit

NPD to reduce attrition
- Schools would have had to take time out of maths lessons to conduct this pre-test; there would have been a significant administrative burden on them
- This burden is a major reason why control schools drop out
- Administrative data have therefore (i) massively reduced the burden on schools and (ii) improved the validity of the trial

NPD to eliminate attrition
A clever design using NPD data means we can (almost) eliminate drop-out.
EXAMPLE: Chess in Schools
- Year 5 children learn how to play chess during one school year
- 50 treatment schools receive chess; 50 control schools = 'business as usual'
- Use age 7 (Key Stage 1) results as the pre-test scores
- Use age 11 (Key Stage 2) results as the post-test scores
- Almost no burden on schools (no testing to be done)
- Key Stage 2 results exist for all children, so we have test scores even if pupils move schools; there should be very little attrition

NPD for long-run follow-up
EXAMPLE: Chess in Schools
- Trial conducted in Year 5 (age 9/10); first follow-up at the end of Year 6 (age 10/11)
- Treatment and control children then move on to secondary school; we will be able to track them via their unique pupil number
- Hence long-run follow-up: Do treatment children do better in maths GCSE (age 16)? Are they more likely to study maths post-16? Are they more likely to enter a high-status university?
- Administrative data mean we can answer these questions at little extra cost, and so answer the question: is there a lasting impact of the treatment? (A linkage sketch follows.)
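A minimal sketch of the kind of record linkage this implies, assuming the trial allocation file and the NPD extracts share the same unique pupil number. All variable names and values below are hypothetical:

import pandas as pd

# Hypothetical trial file: one row per pupil with their randomised arm
# (in practice this and the NPD extracts would be loaded from secure files)
trial = pd.DataFrame({
    "pupil_id": [101, 102, 103, 104],
    "treated": [1, 1, 0, 0],
})

# Hypothetical NPD extracts for later outcomes, keyed on the same unique pupil number
ks4 = pd.DataFrame({
    "pupil_id": [101, 102, 103, 104],
    "gcse_maths_points": [7.0, 6.0, 5.0, 6.0],
})
post16 = pd.DataFrame({
    "pupil_id": [101, 102, 103, 104],
    "studying_maths_post16": [1, 0, 0, 1],
})

# Link trial allocation to long-run administrative outcomes via the pupil ID
linked = (trial
          .merge(ks4, on="pupil_id", how="left")
          .merge(post16, on="pupil_id", how="left"))

# Simple long-run comparisons by trial arm
print(linked.groupby("treated")[["gcse_maths_points", "studying_maths_post16"]].mean())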

NPD for external validity / generalisability
- Most RCTs are based upon non-random samples of willing participants. A big issue, but often glossed over!
- Without random samples, how do we know whether study results generalise to a wider (target) population?
- Administrative data give us some handle on this: as we have data for (almost) every child in the country, we can examine how similar the trial participants are to the target population in terms of observable characteristics (a sketch of such a comparison is below)
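A minimal sketch of such a comparison, assuming an NPD-style population file with a flag for trial participation. The variable names and simulated data are illustrative:

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical NPD-style population file: one row per pupil in the country,
# with a flag for pupils in schools that joined the trial (all names illustrative)
n = 100_000
npd = pd.DataFrame({
    "fsm": rng.binomial(1, 0.15, n),          # free school meal eligibility
    "eal": rng.binomial(1, 0.17, n),          # English as an additional language
    "ks2_maths": rng.normal(0, 1, n),         # prior attainment
    "in_trial": rng.binomial(1, 0.02, n),     # recruited into the trial
})

characteristics = ["fsm", "eal", "ks2_maths"]
summary = pd.DataFrame({
    "trial_sample": npd.loc[npd["in_trial"] == 1, characteristics].mean(),
    "population": npd[characteristics].mean(),
})
# Standardised difference: sample-population gap in SD units, a rough gauge
# of how "weird" the trial sample is on observable characteristics
summary["std_diff"] = (summary["trial_sample"] - summary["population"]) / npd[characteristics].std()
print(summary.round(3))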

Implications for GSS

Data access
- Everything in an RCT should be pre-specified at the design stage
- To use administrative data in an RCT, we need to be 100% sure it will be available
Speed of data delivery
- The design phase is never as long as we would ideally want
- Some tasks need quick access to the data, e.g. stratification, which gives a 'better' randomisation (a sketch is below)
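As one example of why quick access matters, here is a minimal sketch of stratified (blocked) random assignment of schools, using a prior-attainment measure taken from administrative records. The variable names and data are illustrative assumptions:

import numpy as np
import pandas as pd

rng = np.random.default_rng(2024)

# Hypothetical school-level admin extract: one row per recruited school,
# with a prior-attainment measure taken from administrative records
schools = pd.DataFrame({
    "school_id": range(1, 51),
    "prior_attainment": rng.normal(0, 1, 50),
})

# Stratify schools into attainment bands, then randomise within each band,
# so the treatment and control groups are balanced on prior attainment
schools["band"] = pd.qcut(schools["prior_attainment"], q=5, labels=False)

def assign_within_stratum(group):
    # Half of each stratum to treatment, half to control, in random order
    n = len(group)
    arms = np.repeat([1, 0], [n // 2, n - n // 2])
    return group.assign(treated=rng.permutation(arms))

schools = schools.groupby("band", group_keys=False).apply(assign_within_stratum)

# Check balance on prior attainment across the two arms
print(schools.groupby("treated")["prior_attainment"].mean())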

Documentation and ease of use
- Administrative data can be hard to understand, e.g. school URNs changing over time in the NPD
- Good documentation is needed to ensure proper use; training is needed too
Opening and linking data across departments
- In education, we can track test scores using the NPD, but what about other outcomes?
- E.g. health outcomes (relevant for some trials?)
- E.g. labour market outcomes

Conclusions

- RCTs are a very powerful research design, BUT we have to remember their limitations
- Administrative data have the potential to help us overcome many of the limitations often associated with RCTs
- Together, they give us a strong research design coupled with large-scale, high-quality data