Replication. An Existential Crisis? Dr Fergus Kane
“The first principle is that you must not fool yourself, and you are the easiest person to fool.” Richard Feynman (quoted in Registered Reports and elsewhere)
In Summary: I’m going to talk about the replication crisis in psychology, and in science more generally. I’m not going to cover all the literature, just a sample of the most important work. I’m going to tell you that yes, there really is a crisis. But perhaps it’s not as bad as all that (in some ways). And the best thing: we can and will fix it (well, if we decide to).
A bit about me. BSc in Psychology and Politics, but I started with a physics degree, and some of that stuck. MSc in Neuroscience. PhD in Brain Imaging. DClinPsych. Fed up with ‘science’, wanting to do something with immediate impact. Now: a foot in both camps, clinical and scientific. Still a believer in good science and in using it to inform practice and public policy. Here you see me trying to return to ‘hard’ science.
My Conflicts of Interest / Priors.
I’m self-employed. I’m not generally paid to talk. But I have a vested psychological interest in this topic: I had a hard time in academia, in large part due to the problems we will talk about today, and I might want to re-enter academia. I believe that much of science is broken and needs to be fixed.
Research on Reproducibility: a small selection
1959/1995: “Publication Decisions Revisited”
“Publication Decisions Revisited: The Effect of the Outcome of Statistical Tests on the Decision to Publish and Vice Versa”. In 1959 the statistician Theodore Sterling found that 97% of results in experimental psychology journals rejected H0, and by 1995 nothing had changed. 97%! What does this mean? We are either not very imaginative, or something else is going on.
2005: “Why Most Published Research Findings Are False”
John P. A. Ioannidis. This classic paper used simulated data to show exactly what the title claims, and drew a number of (unsurprising) corollaries:
The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
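To make the logic concrete, here is a back-of-envelope calculation in Python (my own sketch, not code from the talk; the function name and example numbers are mine) using the no-bias formula from Ioannidis's paper for the positive predictive value of a 'significant' finding, PPV = (1 − β)R / ((1 − β)R + α), where R is the pre-study odds that the tested relationship is real:

```python
# PPV of a 'significant' result, ignoring bias, following Ioannidis (2005).
def ppv(power, prior_odds, alpha=0.05):
    """PPV = (1 - beta) * R / ((1 - beta) * R + alpha)."""
    return (power * prior_odds) / (power * prior_odds + alpha)

for power in (0.8, 0.35):              # well-powered vs typical underpowered study
    for prior_odds in (1.0, 0.1):      # 1:1 vs 1:10 odds that the hypothesis is true
        print(f"power={power:.2f}, R={prior_odds}: PPV={ppv(power, prior_odds):.2f}")

# Even with no bias at all, an underpowered test of a long-shot hypothesis
# (power 0.35, R = 0.1) gives a 'finding' that is more likely false than true.
```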
Ioannidis Recommended in 2005:
Based, as always, on the fact that it is impossible to know with certainty the truth of any research question:
Better power (but power does not deal with biases).
Focus large studies on ideas with good pre-study probability, and on studies that can (conceptually) address multiple claims.
Always consider the totality of evidence. How? Upfront registration of studies, and an essentially Bayesian approach.
2011: “Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect.” Difficulties in refuting dodgy research: Daryl Bem published, in the Journal of Personality and Social Psychology, the impossible finding that practising a list of words after a test improves performance on the earlier test. Three groups tried to replicate and failed. They tried to publish in JPSP, Science and Psychological Science, which rejected the paper on the grounds that they do not publish straight replications (even of JPSP’s own previously published study). The British Journal of Psychology also rejected it. The replications were finally published in PLOS ONE (Ritchie, Wiseman & French, 2012). This is only the most egregious example.
2012: “Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling”. Leslie John and colleagues ran a survey specifically designed to look at QRPs among more than 2,000 psychologists and found that: more than 55% had looked at significance before deciding whether to collect more data; 27% had reported an unexpected finding as having been predicted from the start; more than 45% had selectively reported only studies that ‘worked’. And much more… Seriously!
2012: “Negative results are disappearing from most disciplines and countries”
Daniele Fanelli analysed more than 4,600 papers published between 1990 and 2007 that claimed to have tested a hypothesis. Over this period the frequency of positive results increased by more than 22%. The trend was steeper for social sciences such as psychology than for the physical sciences. From 70% to 90%! There were also interesting findings by country (read the paper for yourself).
2015: “Estimating the reproducibility of psychological science”
A study initiated by Brian Nosek. The project conducted replications of 100 experimental and correlational studies from three journals, all published in 2008. The mean effect size of the replications was half that of the originals (0.18 vs 0.40). 97% of the original studies had significant results; just 36% of the replications did. SEEMS pretty bad. Combining original and replication data, 68% of effects were significant.
Ah… but wait. How should we interpret a failure to replicate?
Just 36% replicated. OK, that seems bad, but what percentage should we actually expect? It is very possible that much of what we are seeing is type II error (false negatives). One failed replication certainly does not discredit a study, any more than the original study enshrines its result. But given the other research into QRPs, and what we see as researchers, the totality of the evidence tells us there is a problem. In fact, according to statistical modelling based on current study power, effect sizes and so on, we can only expect a replication rate of around 40% even for real results (see Roger Wyat’s talk in the BPS YouTube video listed at the end). That is pretty close to the 36% reported in the controversial reproducibility paper.
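A toy simulation (my own numbers, not the model referred to above; the true effect size and sample size are arbitrary illustrative choices) makes the point: when only significant originals get published and replications are run at similar sample sizes, even a perfectly real effect replicates far less than 100% of the time, and published effect sizes are inflated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.4, 40, 4000   # assumed 'typical' values, for illustration only

def run_study(d, n):
    """One two-group study with true standardized effect d and n participants per group."""
    a = rng.normal(d, 1, n)
    b = rng.normal(0, 1, n)
    _, p = stats.ttest_ind(a, b)
    observed_d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return p, observed_d

originals = [run_study(true_d, n_per_group) for _ in range(n_studies)]
published = [(p, d) for p, d in originals if p < 0.05]               # only 'significant' studies get published

replications = [run_study(true_d, n_per_group) for _ in published]   # same true effect, same n
rep_rate = np.mean([p < 0.05 for p, _ in replications])

print(f"Mean published effect size: {np.mean([d for _, d in published]):.2f} (true d = {true_d})")
print(f"Replication success rate:   {rep_rate:.0%}")
# The published effects come out inflated, and the replication rate sits near 40%,
# even though every single effect in the simulation is real.
```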
2016: “1,500 scientists lift the lid on reproducibility” Nature Survey
What drives these problems?
Conflicts of Interest. For big pharma and other large industries (Monsanto etc.), the conflict of interest is clear and evidence of abuse is abundant. Pharma has been shown to actively suppress negative data and to hide non-supportive trials; numerous investigations and lawsuits back this up. But what about for us, the ‘independent’, ‘impartial’ scientists?
Incentive Structure: The Game of Academia
Clearly, having published papers helps your career, and the more papers you have the better. In some universities, a bonus is paid for a paper in a high-impact journal such as Nature. Even when this is not the case, continued success generally depends on the number of publications achieved and grants received. Productivity becomes more important than quality.
Incentive Structure. Macleod et al. (2015) found that studies published from the ‘best’ UK universities had lower reported quality than those from less well-regarded institutions. Why do I say this? Perhaps it is evidence that the best way to get an eye-catching study is to run many low-powered ones.
Key Issues:
The garden of forking paths
P-hacking
Inherent issues with low power, low n, and high variability/noise
Fraud
Hidden moderators
Publication bias
Researcher biases
The garden of forking paths: Gelman and Loken’s term for the way that, even without deliberate p-hacking, the many defensible analysis choices (which measures, which exclusions, which comparisons) mean a ‘significant’ result can almost always be found.
Outright Fraud. Diederik Stapel faked his data in at least 30 studies, including papers in journals such as Science. He did this by generating fictitious data files from his ‘network’ of researchers; journals failed to enquire about the sources of the data. He was eventually outed by three young researchers.
Publication Bias / File Drawer.
Negative results are considered less interesting and perceived as less publishable. But they must be published, assuming the study was sound.
Researcher Bias & CoI. Each researcher and group has their own ideas about what is correct, and some may be very invested in their results. For instance, a CBT therapist is not going to want to find that their therapy does not work. This is hard to overcome: even if not consciously (perhaps especially then), researchers may bias their results, or simply fail to publish.
P-Hacking Techniques (the first of these is simulated in the sketch below):
Stop collecting data when p < 0.05.
Analyse many measures, but don’t report them all, or don’t adjust for multiple comparisons.
Co-vary to get p < 0.05.
Exclude participants to get p < 0.05.
Transform data to get p < 0.05.
Etc.
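Here is a minimal simulation (my own sketch; the batch sizes and maximum n are arbitrary) of the first technique, optional stopping: peek at the p-value after every small batch of participants and stop as soon as p < 0.05, even though the null hypothesis is true in every run.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def optional_stopping_false_positive(n_start=10, n_max=100, step=5, alpha=0.05):
    """One 'study' with no true effect, peeking at p after every batch of participants."""
    data = rng.normal(0, 1, size=n_start)          # H0 is true: the mean really is 0
    while len(data) <= n_max:
        p = stats.ttest_1samp(data, 0).pvalue
        if p < alpha:
            return True                             # stop and declare 'significance'
        data = np.concatenate([data, rng.normal(0, 1, size=step)])
    return False                                    # gave up at n_max without p < alpha

n_sims = 2000
hits = sum(optional_stopping_false_positive() for _ in range(n_sims))
print(f"False positive rate with optional stopping: {hits / n_sims:.2%}")
# Typically lands well above the nominal 5%, even though H0 is true in every run.
```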
Hidden Moderators. This one is one of my bugbears. When a study, especially it seems in MRI, is replicated, it is often replicated with a tweak. Thus we can always say: “The activation was found in a different area; this may have been due to a slightly different sample population, task design, scanner parameters, anxiety levels, medication, blah blah blah.” This is sometimes a function of so many teams working on the same idea; at other times it may be down to the idea that all research must be ‘innovative’.
Noise / Variability. Some branches of science are simply difficult to study because of the noise in the data. Visual psychophysics can be done well with just a few participants (who spend a long time doing some very boring experiments), while harder-to-capture, more variable phenomena such as depression need many more participants, as the rough power calculation below illustrates.
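A standard power calculation (my own sketch; the effect sizes are Cohen's conventional benchmarks, not figures from the talk) shows how quickly the required sample grows as effects get smaller relative to the noise:

```python
# Per-group n for a two-sample t-test at 80% power and alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
for d in (0.8, 0.5, 0.2):   # large, medium, small standardized effects (Cohen's d)
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d:.1f}: about {round(n)} participants per group")

# d = 0.8 needs roughly 26 per group; d = 0.2 needs roughly 394 per group.
```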
Testing hypotheses vs. Exploring
Exploratory analyses are an essential part of science. But we must know when we are fishing and when we are looking to confirm a strong pre-existing hypothesis. And so must our readers.
How To Fix It. This problem is not unique to psychology, and some of the best projects are coming from psychologists.
ClinicalTrials.gov. This was perhaps the beginning: started in 1997 with a law passed by Congress. More recently the concept has been pushed by people such as Ben Goldacre (who wrote Bad Science and Bad Pharma, both must-reads). Preregistration of clinical trials is obvious, long overdue and now happening. But there are issues of compliance. And why just for clinical trials?
Royal Society Open Science: RR
Registered Reports. Royal Society Open Science journal: “Registered Reports stem from the philosophy that the publishable quality of science should be judged according to the importance of the research question and rigour of the methodology, and never based on whether or not the hypothesis was supported”. Editor and champion: Professor Chris Chambers (cognitive neuroscience and improving science). Your paper is conditionally accepted for publication BEFORE you gather data! Allowed: reporting of serendipitous results and non-pre-registered analyses. Prevents: the file drawer, fishing trips. Promotes: honesty, openness. More than 26 other journals have taken up the format to date.
Registered Reports: none of these matter:
Whether the hypothesis was supported
Whether p < 0.05
Whether the results have “impact”
Whether the results are novel
It’s not enough to just pre-register
We need to follow up the preregistered studies. ebmdatalab.net/#/
Dorothy Bishop Students should be taught to simulate random data.
So that when they see Jesus in their statistical toast…
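A minimal example of the kind of exercise Bishop recommends (my own sketch; the sample size and number of variables are arbitrary): generate pure noise and watch ‘significant’ correlations appear anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_participants, n_variables = 20, 15
noise = rng.normal(size=(n_participants, n_variables))   # no real structure at all

significant_pairs = []
for i in range(n_variables):
    for j in range(i + 1, n_variables):
        r, p = stats.pearsonr(noise[:, i], noise[:, j])
        if p < 0.05:
            significant_pairs.append((i, j, round(r, 2), round(p, 3)))

n_pairs = n_variables * (n_variables - 1) // 2
print(f"{len(significant_pairs)} of {n_pairs} correlations are 'significant' in pure noise:")
for pair in significant_pairs:
    print(pair)
# Around 5% of the 105 pairs come out 'significant' by chance alone.
```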
Realign the incentive structure.
Pre-registration is just part of this; we need to go further. How about:
Requiring PhD students to publish at least one replication study where feasible.
Requiring journals to publish replications of papers they have published earlier.
Giving credit for replicating studies.
What else?
The problem for new researchers.
Many experienced scientists know which studies and study groups are a bit sketchy (reference). But researchers new to a field don’t. And even when we have a hunch, we don’t really know which studies we can ‘trust’ and which we can’t. Secondary research tends to cite only supportive primary research; the non-supportive primary research is harder to find (availability heuristic). So how do we decide what is a promising lead to follow, and what might be a career-ending red herring?
Change the whole publishing model.
Is the best way to communicate still dead trees? Or should we be embracing:
Online commenting
Real-time, open peer review
What else?
Ban the NHSTP / p-value. Ban the Null Hypothesis Significance Testing Procedure: p-values are mosquitoes, “annoying and impossible to swat away”. The journal Basic and Applied Social Psychology has essentially done this. It no longer requires inferential statistical procedures; instead it wants strong descriptive statistics, and it will accept alternative approaches such as Bayesian statistics, if well used (Trafimow & Marks, 2015).
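One way to report a result in that spirit (a sketch of my own, not BASP's prescription; the data here are made up) is a descriptive effect size with a bootstrap interval instead of a p-value:

```python
import numpy as np

rng = np.random.default_rng(0)
treatment = rng.normal(0.4, 1.0, size=40)   # made-up example data
control = rng.normal(0.0, 1.0, size=40)

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Bootstrap: resample each group with replacement and recompute d.
boot = [cohens_d(rng.choice(treatment, treatment.size),
                 rng.choice(control, control.size))
        for _ in range(5000)]
low, high = np.percentile(boot, [2.5, 97.5])
print(f"Cohen's d = {cohens_d(treatment, control):.2f}, "
      f"95% bootstrap interval [{low:.2f}, {high:.2f}]")
```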
Open Data Data should be made available for EASY reanalysis by other researchers. That means the data needs to be as complete as possible. It needs to be well annotated. It needs to be well structured. Code must also be made available and understandable.
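As a toy illustration (the file names, columns and conventions are my own invention, not a standard), ‘easy reanalysis’ can be as simple as shipping a tidy CSV together with a machine-readable codebook alongside the analysis code:

```python
import json
import pandas as pd

# A tidy table: one row per participant, one column per variable.
data = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "group": ["cbt", "waitlist", "cbt"],
    "phq9_baseline": [17, 15, 19],
    "phq9_week12": [9, 14, 11],
})

# A codebook documenting every column, so reanalysts don't have to guess.
codebook = {
    "participant_id": "Anonymised participant identifier",
    "group": "Allocation: 'cbt' or 'waitlist'",
    "phq9_baseline": "PHQ-9 depression score at baseline (0-27)",
    "phq9_week12": "PHQ-9 depression score at week 12 (0-27)",
}

data.to_csv("trial_data.csv", index=False)
with open("codebook.json", "w") as f:
    json.dump(codebook, f, indent=2)
```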
Science-Leaks. Whistleblowing protection and awards?
Governments. The UK government spends about £2.5 billion a year on internal research, but it does not know how much of that research gets published. What about Ecuador?
Implications for USFQ. It is important for these practices to be implemented from undergraduate level. So you should have a system of pre-registration for all undergraduate research projects, and undergraduates should get practice analysing simulated random data.
You can have my slides. If your professor is not willing to hand out their slides, they are not in tune with open science. If you want to use my slides, you can, always. The only proviso is that you use common courtesy: you must reference them, otherwise you are plagiarising.
Reading List
Royal Society Open Science Registered Reports.
ClinicalTrials.gov
Presentations and panel discussion from the British Psychological Society: well worth a watch, it contains essentially all the information in this talk and more (Professor Marcus Munafo, Prof Roger Wyat, Prof Dorothy Bishop, Prof Chris Chambers, Kathryn Sharples, Nick Brown, Prateek Buch).
Article from The Atlantic: provides a good summary of the issues.
Nature survey of 1,500 scientists.
Dorothy Bishop’s blog.
Bad Science blog.
Article by David Colquhoun on p-values.
References
Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604), 452–454.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Chambers, C. D. (2013). Registered Reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610.
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891–904.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Macleod, M. R., McLean, A. L., Kyriakopoulou, A., Serghiou, S., de Wilde, A., Sherratt, N., … Sena, E. S. (2015). Risk of bias in reports of in vivo research: A focus for improvement. PLOS Biology, 13(10), e1002273.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s “retroactive facilitation of recall” effect. PLOS ONE, 7(3), e33423.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician, 49(1), 108–112.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.