The use of administrative data in Randomised Controlled Trials (RCTs). John Jerrim, Institute of Education, University of London
Structure: What is an RCT? What are the advantages of RCTs? What are their limitations? How can administrative data help overcome these limitations? Implications for GSS.
Context: My experience is in conducting RCTs in education, so this is the context I am talking about today. BUT this has implications for RCTs in other areas.
What is an RCT? Recruit a group of willing participants. X% (usually 50%) are assigned to TREATMENT (T) and the remaining (100 − X)% to CONTROL (C). In the absence of the intervention: E(T) = E(C). Hence if, after the intervention, we find µ(T) > µ(C), then this is due to the treatment.
Advantages (well known). When conducted well: rules out the influence of confounders, hence gives the causal effect of T. Highly policy relevant. Simplicity! Means + t-test. Easy to communicate. Standardised reporting / conduct protocols: - CONSORT - Trial registration. Often described as the GOLD STANDARD.
In reality, RCTs also have important limitations, though people talk about these a lot less!
A lack of power? In education, most trials are cluster RCTs: rather than randomise individuals, we randomise whole schools. Issue = the intra-cluster correlation, ICC (ρ), which means low power. EXAMPLE: Secondary schools (clusters) = 100. 200 children per school. ρ = 0.20. 20,000 pupils in trial. Minimum detectable effect = 0.25 standard deviations. 95% CI = 0 to 0.50 standard deviations.
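The minimum detectable effect in this example can be roughly reproduced with the standard normal-approximation formula for a two-arm cluster-randomised trial. This is a sketch, not the calculation behind the slide: the function name `cluster_mde`, the 80% power, the 5% two-sided significance level and the 50/50 allocation are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def cluster_mde(n_clusters, cluster_size, icc, p_treat=0.5, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect (SD units) for a cluster RCT.

    Normal approximation only; small-sample (t-distribution) corrections
    are ignored. Power and alpha defaults are assumed, not from the slides.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    alloc = p_treat * (1 - p_treat)
    # Variance of the estimated effect: between-school part + within-school part
    between = icc / (alloc * n_clusters)
    within = (1 - icc) / (alloc * n_clusters * cluster_size)
    return multiplier * sqrt(between + within)


# Values from the slide: 100 schools, 200 children per school, rho = 0.20
mde = cluster_mde(n_clusters=100, cluster_size=200, icc=0.20)
print(round(mde, 2))
```

Almost all of the variance comes from the between-school term, which is why 20,000 pupils buy so little precision: the number of schools matters far more than the number of pupils per school.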
Costly. Imagine it costs £5 to test each child in this trial: you have spent £100,000 just on a post-test! You have also got to deliver the intervention in 50 schools (expensive). Many EEF secondary school RCTs cost > £500,000, and the average detectable effect across trials = 0.25. Big ££ for quite wide confidence intervals.
Attrition. Schools (and pupils within schools) drop out of the trial, particularly when assigned to the control group! Problems: - Breaks randomisation, so we lose the key advantage of the RCT - Lose power. Example (my trial): - 50 schools: 25 treatment and 25 control - Treatment follow-up = 23 / 25 schools - Control follow-up = 9 / 25 schools. Worst of all worlds: - Bias (selection effects) - Low power - High cost
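To make the imbalance in this example concrete, the follow-up rates per arm and the gap between them can be computed directly from the school counts above. A minimal sketch; the function name `follow_up_rate` is illustrative only.

```python
def follow_up_rate(followed_up, randomised):
    """Share of randomised schools that provided follow-up data."""
    return followed_up / randomised


# School counts taken from the example above
treatment_rate = follow_up_rate(23, 25)
control_rate = follow_up_rate(9, 25)

# Differential attrition: gap in follow-up between the two arms
differential = treatment_rate - control_rate

print(f"Treatment follow-up: {treatment_rate:.0%}")
print(f"Control follow-up:   {control_rate:.0%}")
print(f"Differential:        {differential:.0%}")
```

A gap of this size means the schools left in the control arm are a self-selected subset, so the two arms are no longer comparable by design.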
Short-term follow-up only. Test / follow-up is often immediately at the end of the trial, often when the intervention is most effective. BUT we are really interested in long-run, lasting effects, i.e. is there much point in raising age 11 test scores if kids don’t do any better at age 16? Ideally want short, medium and long-term follow-up, but this again ↑ ££.
External validity. Most RCTs recruit participants via convenience sampling, not from a well-defined population. How “weird” is our sample of trial participants? Do we have mainly rich pupils? Only high-performing schools? How far can we generalise results? BIG ISSUE: will we still get an effect when we scale up / roll out? BUT, FRANKLY, OFTEN IGNORED IN RCTs.
What data is available? We are lucky in education: we have the National Pupil Database (NPD). - School census: records children’s school 3 times per year. - Assessments at ages 5, 7, 11, 14, 16, 18. - Demographics (FSM, gender, EAL, ethnicity etc.). Strengths of NPD: - Known for whole state school population - Low measurement error - Low missing data - Can track children over time
NPD to increase power. One way to ↑ power is to control for things that are linked to the outcome, and we can use the NPD for this purpose. EXAMPLE: Maths Mastery. Year 7 kids. New way of teaching them maths. Test at end of Year 7. CONTROL for KS2 maths scores from NPD. Detectable effect = 0.36 without controls (CI = 0 to 0.72) and 0.22 with NPD controls (CI = 0 to 0.44). MASSIVE BOOST TO POWER.
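The gain from controlling for prior attainment can be sketched by letting a baseline covariate explain a share of the outcome variance at school and pupil level. Several inputs here are assumptions rather than figures from the slides: the 50 schools of 200 pupils are taken from the cost example for this trial, ρ = 0.20 is borrowed from the earlier power example, 80% power and 5% significance are conventional defaults, and the R² of 0.62 is simply a value that happens to reproduce the 0.22 quoted above. The function name `adjusted_mde` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_mde(n_clusters, cluster_size, icc, r2_school=0.0, r2_pupil=0.0,
                 p_treat=0.5, alpha=0.05, power=0.80):
    """Approximate MDE (SD units) for a cluster RCT with a baseline covariate.

    r2_school / r2_pupil: assumed share of between-school / within-school
    outcome variance explained by the covariate (e.g. prior test scores).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    alloc = p_treat * (1 - p_treat)
    between = icc * (1 - r2_school) / (alloc * n_clusters)
    within = (1 - icc) * (1 - r2_pupil) / (alloc * n_clusters * cluster_size)
    return multiplier * sqrt(between + within)


# Assumed design: 50 schools x 200 pupils, rho = 0.20
unadjusted = adjusted_mde(50, 200, 0.20)
# Assumed R^2 = 0.62 at both levels (not reported in the slides)
adjusted = adjusted_mde(50, 200, 0.20, r2_school=0.62, r2_pupil=0.62)

print(round(unadjusted, 2), round(adjusted, 2))
```

Under these assumptions the detectable effect shrinks by a factor of sqrt(1 − R²), so a strong predictor of the outcome is worth roughly as much as a large increase in the number of schools.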
NPD to reduce cost. In the previous example, we could have conducted a pre-test rather than use the NPD. Maths Mastery in 50 schools of 200 children = 10,000 kids. £5 per test. Hence a pre-test would have cost a minimum of £50,000. ADMINISTRATIVE DATA SAVED THIS MONEY. NPD data is there, ready to use. - LET’S USE IT! - Doing a separate pre-test here would have had almost no benefit
NPD to reduce attrition. Schools would have had to take time out of maths lessons to conduct this pre-test, and there would have been a significant administrative burden on them to conduct the test. This burden is a major reason for control schools dropping out. Administrative data has (i) massively reduced the burden on schools and (ii) improved the validity of the trial.
NPD to eliminate attrition. Clever design with NPD data means we can (almost) eliminate drop-out. EXAMPLE: Chess in Schools - Year 5 children learn how to play chess during one school year - 50 treatment schools receive chess - 50 control schools = ‘business as usual’ - Use age 7 (Key Stage 1) as the pre-test scores - Use age 11 (Key Stage 2) as the post-test scores. Almost no burden on schools (no testing to be done). Key Stage 2 results are available for all children. We have test scores even if they move schools, so there should be very little attrition.
NPD for long-run follow-up. EXAMPLE: Chess in Schools. Trial conducted in Year 5 (age 9/10). First follow-up at end of Year 6 (age 10/11). Treatment and control children then move on to secondary school. We will be able to track these children via their unique pupil number. Hence long-run follow-up: Do treatment children do better in maths GCSE (age 16)? Are they more likely to study maths post-16? Are they more likely to enter a high-status university? Administrative data means we can answer these questions at little extra cost. We can answer the question: is there a lasting impact of the treatment?
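The tracking step described above amounts to joining the trial's assignment list to later administrative records on the unique pupil number. The sketch below uses plain dictionaries; the pupil identifiers, grades and function names are invented for illustration and do not reflect the real NPD layout.

```python
# Hypothetical trial records: unique pupil number -> assigned arm
trial_arm = {"UPN001": "treatment", "UPN002": "control", "UPN003": "treatment"}

# Hypothetical later administrative records: unique pupil number -> outcome.
# UPN999 was never in the trial and should be ignored by the link.
later_outcome = {"UPN001": 7, "UPN002": 5, "UPN003": 6, "UPN999": 4}


def link_outcomes(trial_arm, outcomes):
    """Group later outcomes by trial arm, matching on unique pupil number."""
    linked = {}
    for upn, arm in trial_arm.items():
        if upn in outcomes:  # pupils with no later record are simply unmatched
            linked.setdefault(arm, []).append(outcomes[upn])
    return linked


def arm_means(linked):
    """Mean outcome per arm among linked pupils."""
    return {arm: sum(values) / len(values) for arm, values in linked.items()}


means = arm_means(link_outcomes(trial_arm, later_outcome))
print(means)
```

Because the match is on the pupil rather than the school, a child who changes school after the trial still appears in the follow-up.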
NPD for external validity / generalisability. Most RCTs are based upon non-random samples of willing participants. Big issue. But often glossed over! Without random samples, how do we know if study results generalise to a wider (target) population? Admin data give us some handle on this. As we have data for (almost) every child/person in the country, we can examine how similar trial participants are to the target population in terms of observable characteristics.
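One common way to summarise how similar the trial sample is to the target population on a binary characteristic (FSM eligibility, say) is a standardised difference in proportions. The sketch below is one possible check, not a method stated in the slides, and the two proportions are invented purely for illustration.

```python
from math import sqrt


def standardised_difference(p_trial, p_population):
    """Standardised difference between two proportions.

    Difference in proportions divided by the square root of the
    average of the two binomial variances.
    """
    pooled_sd = sqrt((p_trial * (1 - p_trial)
                      + p_population * (1 - p_population)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (p_trial - p_population) / pooled_sd


# Invented inputs: share of FSM pupils in the trial vs. in the population
d = standardised_difference(p_trial=0.30, p_population=0.15)
print(round(d, 3))
```

Repeating this for each observable characteristic gives a quick picture of where the trial sample departs from the population it is meant to represent.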
Implications for GSS. Data access: Everything in an RCT should be pre-specified at the design stage. To use admin data in an RCT, we need to be 100% sure it will be available. Speed of data delivery: The design phase is never as long as we ideally want. Some of these things need quick access to the data, e.g. stratification, to get ‘better’ randomisation.
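The stratification mentioned above needs school-level data before randomisation, which is why fast access matters. A minimal sketch of stratified assignment follows, assuming each school has already been placed in a stratum (for example a prior-attainment band derived from admin data); the school identifiers, stratum labels and function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assignment(schools, seed=0):
    """Randomise schools to arms separately within each stratum.

    schools: dict mapping school id -> stratum label.
    In a stratum with an odd number of schools, the extra school
    goes to control (an arbitrary choice for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_stratum = defaultdict(list)
    for school_id, stratum in schools.items():
        by_stratum[stratum].append(school_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        half = len(members) // 2
        for i, school_id in enumerate(members):
            assignment[school_id] = "treatment" if i < half else "control"
    return assignment


# Hypothetical schools grouped into two prior-attainment bands
schools = {f"S{i}": ("low" if i < 4 else "high") for i in range(8)}
assignment = stratified_assignment(schools)
print(assignment)
```

Randomising within strata guarantees that treatment and control are balanced on the stratifying variable, rather than leaving that balance to chance.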
Implications for GSS. Documentation and ease of use: Admin data can be hard to understand, e.g. school URNs changing over time in the NPD. Need good documentation to ensure proper use. Training needed. Opening and linking data across departments: In education, we can track test scores using the NPD. But what about other outcomes? E.g. health outcomes (relevant for some trials?). E.g. labour market outcomes.
Conclusions. RCTs are a very powerful research design, BUT we have to remember their limitations. Administrative data have the potential to help us overcome many of the limitations often associated with RCTs. Together, they give us a strong research design coupled with large-scale, high-quality data.