The New Statistics: Estimation and Research Integrity. Geoff Cumming, School of Psychological Science, La Trobe University, Melbourne, Australia. APS-SMEP Workshop, APS Convention, San Francisco, Thursday 22 May 2014. This PowerPoint file: tiny.cc/geoffdocs Tutorial article: The New Statistics: Why and How tiny.cc/tnswhyhow THANKS TO: Alan Kraut, Kate McMahon, The Australian Research Council, Neil Thomason, Fiona Fidler, and many others. © G. Cumming 2014

The new statistics: Effect sizes, confidence intervals, meta-analysis … which is estimation. The techniques are not new, but using them widely would, in many disciplines, be new. Sections:
1. The new statistics: Why
2. Research integrity and the new statistics
3. Effect sizes and confidence intervals
4. The new statistics: How
5. Planning, power, and precision
6. Meta-analysis
Take-home message: Intuitions about variability—the dances

Understanding The New Statistics (New York: Routledge, 2012)
1. Introduction to The New Statistics
2. From Null Hypothesis Significance Testing to Effect Sizes
3. Confidence Intervals
4. Confidence Intervals, Error Bars, and p Values
5. Replication
6. Two Simple Designs
7. Meta-Analysis 1: Introduction and Forest Plots
8. Meta-Analysis 2: Models
9. Meta-Analysis 3: Larger-Scale Analyses
10. The Noncentral t Distribution
11. Cohen's d
12. Power
13. Precision for Planning
14. Correlations, Proportions, and Further Effect Size Measures
15. More Complex Designs and The New Statistics in Practice

1. The new statistics: Why
2. Research integrity and the new statistics
3. Effect sizes and confidence intervals
4. The new statistics: How
5. Planning, power, and precision
6. Meta-analysis

The Boots anti-ageing stampede. British J. Dermatology, online, 2009: "A cosmetic 'anti-ageing' product improves photoaged skin: A double-blind, randomized controlled trial." "…statistically significant improvement in facial wrinkles as compared to baseline assessment (p = .013), whereas vehicle-treated skin was not significantly improved (p = .11)." Media reports: "significant clinical improvement in facial wrinkles…" Queues at Boots for 'No. 7 Protect & Perfect Intense Beauty Serum'. Watson, R. E. B., et al. (2009). British Journal of Dermatology, 161. Chapter 2

The Boots anti-ageing stampede. Concluding 'no effect' for the placebo is accepting the null hypothesis. After statistical criticisms, a revised article: "non-significant trend…" p values and CIs are closely linked. We should assess the difference directly (p = .013 vs p = .11), but this is a common error. Watson, R. E. B., et al. (2009). British Journal of Dermatology, 161. Chapter 2

Comparing significance levels is everywhere. "…incorrect procedure… in which researchers conclude that effects differ when one effect is significant (p < .05) but the other is not (p > .05). We reviewed 513 … articles in Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience and found that 78 used the correct procedure and 79 used the incorrect procedure." Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E-J. (2011). Erroneous analyses of interactions in neuroscience: A problem of significance. Nature Neuroscience, 14.

Conclusions, so far: Presentation format can matter—a lot. Null Hypothesis Significance Testing (NHST) promotes dichotomous thinking (an effect exists, or it doesn't). NHST: seductive, but illusory 'certainty'. CIs can prompt better interpretation … and are highly informative. p values and CIs are closely linked, but there are important differences. Chapter 2


Evidence? (Statistical cognition—only psychology can do it.) Show results from two studies to authors in medical and psychology journals. Ask them to rate: "Results of the two are broadly consistent, or similar." Ask for comments; classify these as 'mention NHST' or no such mention. Conclude: Even if authors see CIs, they often think in terms of NHST. Interpretation is better if they avoid NHST and think in terms of intervals. Don't report p values as well as CIs. Coulson, M., Healey, M., Fidler, F., & Cumming, G. (2010). Confidence intervals permit, but do not guarantee, better inference than statistical significance testing. Frontiers in Quantitative Psychology and Measurement, 1:26, 1-9. tiny.cc/cisbetter

Time for a crusade? The New Statistics: Effect sizes, confidence intervals, meta-analysis … are not themselves new, but using them widely in psychology would be new, and highly beneficial.

p values—how do we think about them? The p value: … is central to research thinking … has hardly been studied. A great question for statistical cognition research: How do people think about p? … talk about p? … feel about p? Chapter 5

…and p has real-life consequences! (Slide cartoon: p-value outcomes mapped to rewards such as tenure, a research grant, a PhD, a prize, a top publication, a consolation prize, a fair publication.)

But: p values and replication? Given the p value from an initial experiment, what's likely to happen if you replicate—do you get a similar p? We'll simulate and ask: What p values? Experimental (N = 32) and Control (N = 32) groups. Assume a population difference of Cohen's δ = 0.5 (a medium effect?). Power = .52, typical for many fields in social and behavioural science. Dance p page of ESCI chapters 5-6 (free download). Dance of the p values (it's very drunken!) The video: tiny.cc/dancepvals or tiny.cc/dancepvals2 Chapter 5
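The dance of the p values can be sketched in a few lines of Python. This is my own sketch, not ESCI itself (ESCI is an Excel workbook); `simulate_p` is a hypothetical helper, and the normal (z) approximation to the independent-groups t test is adequate at N = 32:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def simulate_p(n=32, delta=0.5, rng=random):
    """One two-group experiment: return the two-tailed p value.

    Normal (z) approximation to the independent-groups t test.
    """
    exp = [rng.gauss(delta, 1.0) for _ in range(n)]  # experimental group
    ctl = [rng.gauss(0.0, 1.0) for _ in range(n)]    # control group
    se = sqrt(stdev(exp) ** 2 / n + stdev(ctl) ** 2 / n)
    z = (mean(exp) - mean(ctl)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)
ps = [simulate_p() for _ in range(1000)]
# Roughly half the replications reach p < .05 (power ~ .52), yet the
# individual p values range from far below .001 to well above .5.
```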

Replicate … and you get a VERY different p value. The p interval is an 80% prediction interval for one-tailed p (given two-tailed p_obt):
p_obt = .001, p interval ( , .070)
p_obt = .01, p interval ( , .22)
p_obt = .05, p interval (.00008, .44)
p_obt = .2, p interval (.00099, .70)
The intervals are independent of N! Any p could easily have been very different. (That's sampling variability.) A p value gives only extremely vague information about p next time! Researchers severely underestimate p intervals! (Medicine, psychology, statistics.) Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3. Lai, J., Fidler, F., & Cumming, G. (2011). Subjective p intervals: Researchers underestimate the variability of p values over replication. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 8. tiny.cc/subjectivep Chapter 5
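The tabled intervals can be recomputed from the definition given above. A minimal Python sketch, assuming the replication z score is normal with mean z_obt and SD √2 (the variability of two experiments combined); `p_interval` is my own name:

```python
from math import sqrt
from statistics import NormalDist

def p_interval(p_obt, coverage=0.80):
    """80% prediction interval for the one-tailed replication p,
    given the two-tailed p value of the initial experiment."""
    nd = NormalDist()
    z_obt = nd.inv_cdf(1 - p_obt / 2)                # z for the initial result
    half = nd.inv_cdf((1 + coverage) / 2) * sqrt(2)  # replication z has SD sqrt(2)
    return (1 - nd.cdf(z_obt + half), 1 - nd.cdf(z_obt - half))

# p_obt = .05 reproduces the slide's (.00008, .44); N never enters.
```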

Traditional ANOVA table: star-jumping is crazy, yet common. 'Failure to replicate'? Be cautious. Meta-analysis. Interpretation may be based on p values, and little else? Chapter 5

Implications of the variability of p? A weird inconsistency in textbooks: Sampling variability of means—whole chapters. Sampling variability of CIs—illustrated, part of the definition. Sampling variability of p—not mentioned! Require reporting of p intervals? E.g. p = .04, p interval (0, .19). See a p value, think of the (approximate) p interval. More generally: Sampling variability is so large—remember the dances. Chapter 5

Reasons for NHST? ?? Significance testing is needed: to identify which results are real and which due to chance, to determine whether or not an effect exists, to ensure that data analysis is objective, and to make clear decisions, as in practice we need to do. In every case: NO, estimation does better. Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In Lisa L. Harlow, Stanley A. Mulaik, & James H. Steiger (Eds.), What if there were no significance tests? Mahwah, NJ: Erlbaum.

Section 1 Conclusions: p can mislead, CIs inform. Our p tells virtually nothing about the dance; our CI gives useful information about the dance. A p value indicates strength of evidence against a null hypothesis. A p value does not signal its unreliability; CI length does signal uncertainty. p values and CIs are closely linked. A CI estimates the size of an effect, which is what we want to know. Estimation is more informative than dichotomous NHST. NEXT: Further reasons for reform of statistical and other research practices. Chapter 2

1. The new statistics: Why
2. Research integrity and the new statistics
3. Effect sizes and confidence intervals
4. The new statistics: How
5. Planning, power, and precision
6. Meta-analysis

A title to die for. The replication crisis … from cancer research to social psychology, some published findings won't replicate. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2, e124. tiny.cc/mostfalse

Why are most false? The Ioannidis argument: The imperative to achieve statistical significance explains: 1. selective publication—file drawer; 2. data selection, tweaking, and p-hacking until p is sufficiently small; 3. why we think any finding that once meets the criterion of statistical significance is true and doesn't require replication. Many false positives published ==> most published findings false.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22. tiny.cc/falsepositivepsych

2. Data selection, tweaking, p-hacking. The Simmons et al. argument: p-hacking—it's very easy to: test a few extra participants; drop or add dependent variables; select which comparisons to analyze; drop some results as aberrant; try a few different statistical analysis strategies; then finally choose which of all the above to report. Many degrees of freedom ==> always find statistical significance. Numerous published results are false positives. Many won't replicate, not that we do many replications.

Many false positives published? Low-power studies (e.g., power = .50, for a medium effect). (In the slide figure, dark = statistically significant.) Selective publication ==> a high proportion of false positives. Selection, tweaking, p-hacking ==> turns some non-significant results into statistically significant ones. (Red = spurious statistical significance.) Now an even higher proportion of false positives. If a journal prefers surprise, false positives are selectively published. Therefore an even higher proportion of false positives.

Research integrity ≈ open science. The problems: 1. Selective publication. 2. p-hacking, and other dubious data-analytic practices. 3. Lack of replication; replications fail (the crisis). Two meanings of 'research integrity': (a) completeness and validity of the published research literature, which requires a solution to all three problems; (b) ethical, morally correct behaviour of researchers. How do we achieve research integrity? For 1: Make the results of all competent research available, somehow. For 2: Avoid dubious data-analytic practices: no p-hacking. For 1 and 2: Report everything in full, accurate detail. For 3: Carry out replications.

Research integrity: A new-statistics perspective. The new statistics should help in some ways: Remove the imperative for statistical significance. Remove the dichotomous mindset of replication as yes or no. Emphasise meta-analysis, and thus the importance of cumulation, replication, and making ALL results available. But it may not solve other problems: the career imperative to publish in the best journals; selection in data analysis and reporting; journals' preference for exciting, novel findings, not replications.

Research integrity: What we need. Understanding of the three problems—no simple changes will suffice. The new statistics. Better, more informative experiments. A clear distinction: pilot explorations (not publishable) vs. planned, pre-specified experiments (results 'published' in full). Ways to declare research plans in advance. Ways to 'publish', whatever the data (but with quality control). Ethical review boards: require pre-registration? 'Publication'? Replications (close and not-so-close). Tools, guidelines, training, editorial policies…

Research integrity findings, proposals, arguments… Perspectives on Psychological Science, 2012, issue 7(6); 2013, issue 8(4):
Makel: Only 1% of articles are replications, most 'successful'; the rate is increasing.
Bakker; Francis: Many meta-analyses have too many statistically significant results.
Giner-Sorolla: Top journals require 'cool' results! Aesthetics beat truth!
Klein: Demand characteristics and experimenter bias: still alive!
Frank; Grahe: Students should do replication experiments.
Koole: Ways to reward replications. Publish online, linked to original reports.
Nosek: Scientific Utopia: Place truth before 'cool'. Open tools, data, publication.
Wagenmakers: Publish the protocol in advance. Pre-specified (confirmatory) vs. exploratory.
Fuchs: Psychologists are open to change, but wary; they prefer standards to rigid rules.
Ioannidis: Credibility of science. Replication. Truth. Progress.
Open Science Collaboration: Reproducibility Project—replications of 2008 studies.
Fiedler: Test many alternative hypotheses. Converging evidence. Ingenuity.
Gullo: DSM5: Pathological publishing. Positive results, no null results, a 'good story'.

Research integrity: A few current projects.
Open Science Framework (OSF) tiny.cc/osf: Manage workflow, declare protocols, archive results and data.
Reproducibility Project tiny.cc/repproject: An open collaboration for replication, part of OSF. Replicate findings published in 2008.
Registered Replication Reports, in Perspectives on Psychological Science: Open, refereed & pre-specified; guaranteed publication; meta-analysis.
PsychFileDrawer tiny.cc/psychfiledrawer: Archive of reports of replications in psychology.
figshare tiny.cc/figshare: A repository of reports and datasets.
Archives of Scientific Psychology (APA) tiny.cc/archivesscipsy: Open online journal; requires full data.
rOpenSci: It's not just psychology! (Posting of software, data, analyses, discussion…)

Research integrity findings, proposals, arguments… Perspectives on Psychological Science, 2014, issue 9(3), May:
Ledgerwood: Introduction. "Best practices… things we can change right now."
Lakens & Evers: Increasing the informational value of studies; the v statistic.
Sagarin et al.: Protection for data peeking, then further data collection.
Perugini et al.: Imprecise power estimates; 'safeguard power'.
Stanley & Spence: Replications often vary greatly. Sampling error, measurement error. "Researchers should adjust their expectations concerning replications and shift to a meta-analytic mindset."
Braver, Thoemmes, & Rosenthal: Continuously cumulating meta-analysis.
Maner: Implications for editors and manuscript reviewers.

New journal guidelines.
Psychonomic Society journals: New statistical guidelines tiny.cc/psychonomicstats
Society for Personality and Social Psychology (SPSP) Task Force: Funder, D. C., et al. (2014). Improving the dependability of research in personality and social psychology: Recommendations for research and educational practice. Personality and Social Psychology Review, 18.
Psychological Science: New guidelines, from Jan 2014: tiny.cc/eicheditorial and tiny.cc/pssubguide. Editor-in-chief Eric Eich explains: tiny.cc/apseichinterview

Eich, E. (2014). Business not as usual. Psychological Science, 25, 3-6. tiny.cc/eicheditorial

Psychological Science guidelines: Enhanced reporting of methods. Compulsory full-disclosure check-boxes: exclusions, manipulations, measures, sample sizes. Up to three 'open science' badges. Embracing the new statistics. Tutorial article: tiny.cc/tnswhyhow … APS is keen for other societies and journals to do similar.

OSF badges … as in Psychological Science. Preregistered: The design and analysis plans for the reported research were preregistered in a public, open-access repository. Open Materials: All digitally shareable materials necessary to reproduce the reported methodology have been made available in a public, open-access repository. Open Data: All digitally shareable data necessary to reproduce the reported results have been made available in a public, open-access repository.

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25. tiny.cc/tnswhyhow

Statistical reform efforts. The full story: Fidler (2005) tiny.cc/fionasphd. The brief story: Cumming (2014, APS Observer) tiny.cc/geoffobserver. International Committee of Medical Journal Editors (ICMJE), 1988: Use CIs. Ken Rothman, who founded Epidemiology in 1990: "We won't publish p values" … for 10 years there were virtually none. Geoff Loftus at Memory & Cognition (1993-7): Increased use of figures with error bars, but a decrease after he left. APA Publication Manual (2010): Recommended estimation; numerous examples of reporting CIs; format for a CI: "mean is 275 ms, 95% CI [210, 340]". Fidler, F., Thomason, N., Cumming, G., Finch, S., & Leeman, J. (2004). Editors can lead researchers to confidence intervals, but can't make them think: Statistical reform lessons from medicine. Psychological Science, 15. Finch, S., Cumming, G., Williams, J., Palmer, L., Griffith, E., Alders, C., Anderson, J., & Goodman, O. (2004). Reform of statistical inference in psychology: The case of Memory & Cognition. Behavior Research Methods, Instruments & Computers, 36.

Prospects for reform. For 60+ years, damning critiques of NHST; almost no replies, almost no change. Kline's review: tiny.cc/klinechap3 (13 severe NHST problems). Critical quotes: tiny.cc/nhstquotes. More recently: Meta-analysis arrives; no need for NHST; damage done by p-based selective publication; but still little change. Last 5-10 years: Replication crisis: Urgent! At last, a tipping point?

Psychology struggles out of the p-swamp, into the beautiful garden of confidence intervals.

In summary, The new statistics: Why? Sections 1 and 2 conclusions: p values are unreliable, giving seductive but illusory 'certainty'. Dichotomous NHST is limiting; CIs are more informative. Estimation for a cumulative quantitative discipline (Meehl, Gigerenzer). And now: The replicability crisis demands change: research integrity. The APA Publication Manual recommends estimation. New journal requirements; Psychological Science leads. Research integrity: Pre-register, disclose fully, report fully. NHST, the researcher's heroin—can we kick the habit? Can we abandon the security blanket of 'significance' and p?! NEXT: Confidence intervals. Chapters 1, 2

1. The new statistics: Why
2. Research integrity and the new statistics
3. Effect sizes and confidence intervals
4. The new statistics: How
5. Planning, power, and precision
6. Meta-analysis

The new statistics: How? Effect sizes. Effect size: the amount of something of interest. Many ES measures are very familiar. No cause need be identified. An effect size (ES) can be: a mean, or a difference between means; a percentage, or percentage change; a correlation (e.g., Pearson r); a proportion of variance (R², η², ω² …); a standardised measure (Cohen's d, Hedges' g …); a regression slope (b or β); a measure of goodness of fit; many other things… (but NOT a p value!) Chapter 2

My strategy: I assume populations have a normal distribution. No mention of alternatives, all full of potential: Bayesian statistics, robust statistics, resampling methods, model comparison and selection, etc. Why choose estimation? Three criteria for reform that has a chance of succeeding: 1. Move on from NHST. 2. Move on from dichotomous thinking and decision making. 3. Resources available now to make the techniques accessible. Chapter 3

A 95% confidence interval (CI): Public support for Proposition C is 53%, in a poll with a 2% margin of error (MOE). (Slide figure: the interval pictured with a 'relative likelihood' axis.) Chapter 4
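The poll arithmetic behind this example can be sketched with the standard Wald interval for a proportion (function names are mine; the implied sample size is a back-calculation, not stated on the slide):

```python
from math import ceil, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # 1.96

def poll_ci(p_hat, n):
    """95% CI for a poll proportion: p_hat +/- MOE (Wald interval)."""
    moe = Z95 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def implied_n(p_hat, moe):
    """Roughly how many respondents give this margin of error?"""
    return ceil(Z95 ** 2 * p_hat * (1 - p_hat) / moe ** 2)

# 53% with a 2% MOE implies roughly 2400 respondents
# and a CI of about [51%, 55%].
```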


ESCI play time: CIjumping page of ESCI chapters 1-4. Dance of the means: narrow is good; large N is gold. Mean heap: the sampling distribution of sample means. SE (standard error) is the SD of the mean heap (SE = SD/√N). Central Limit Theorem (CLT): Magic: a normal distribution from thin air. μ ± 1.96 × SE contains almost all (95% of) sample means: the tram lines. 95% of sample means lie within 1.96 × SE of μ. 1.96 × SE is the margin of error (MOE); most errors are less than MOE. M ± 1.96 × SE will capture μ for most (≈95% of) samples. For σ known, 95% CI = M ± 1.96 × SE = M ± 1.96 × σ/√N. For σ not known, 95% CI = M ± t_crit × SE = M ± t_crit × s/√N. Chapter 3
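The tram lines can be checked by simulation, outside ESCI. A sketch for the σ-known case (the population values μ = 50, σ = 20 and N = 32 are arbitrary choices for illustration):

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)
MU, SIGMA, N = 50.0, 20.0, 32             # illustrative population and sample size
z = NormalDist().inv_cdf(0.975)           # 1.96
moe = z * SIGMA / sqrt(N)                 # sigma known: MOE = 1.96 x SE

captures = 0
for _ in range(2000):                     # 2000 samples: the dance of the CIs
    m = mean(random.gauss(MU, SIGMA) for _ in range(N))
    if m - moe <= MU <= m + moe:          # did this CI capture mu?
        captures += 1
rate = captures / 2000                    # close to 0.95
```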

CIs: Interpretation 1 (of five). How, pragmatically, should we think about CIs? How should we use CIs to interpret results? Interpretation 1: One from the dance. RT was 457 ms [427, 487]. Our CI is randomly chosen from an infinite dance, 95% of which include μ, but... it might be red. Of all the 95% CIs we ever see, around 5% will be red, but we'll never know which.

CIs: Interpretations 2-5. Is it reasonable to interpret our CI? YES, if it's likely to be typical of its dance, so yes, provided that: N is not very small (say, less than around 8), and our CI has not been selected.

Dance of the means: The mean is the best point estimate, for any N. Chapter 3

Dance of the CIs: 95% of CIs capture μ, for any N. BUT for very small N, CI length can be misleading. Chapter 3

CIs: Interpretations 2-5. Is it reasonable to interpret our CI? YES, if it's likely to be typical of its dance, so yes, provided that: (1) N is not very small (say, less than around 8). For any N, the mean is the best point estimate, but for very small N, CI length can be very misleading. (2) Our CI has not been selected. Were several CIs available, but only a selected one reported?

CIs: Interpretation 2: Our interval, with the cat's-eye picture. The beautiful shape of a CI: likelihood, or plausibility. Interpret the point estimate, 457 ms, and the CI limits. We can be 95% confident our interval includes μ. The best bets for μ are values near M; less good bets lie toward and beyond each limit. There is no sharp drop at the limits: it matters little whether a point is just inside or just outside the CI. Cumming, G. (2007). Inference by eye: Pictures of confidence intervals and thinking about levels of confidence. Teaching Statistics, 29.

CIs: Interpretation 3: Prediction interval for the next M. The CI signals, approximately, the 'width' of the dance … it signals where the next mean is likely to land. On average, approximately 83% of future M's lie within our CI. On average, a 5-in-6 chance. Researchers understand this moderately well. Cumming, G., Williams, J., & Fidler, F. (2004). Replication, and researchers' understanding of confidence intervals and standard error bars. Understanding Statistics, 3. Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11, 217–. Chapters 3
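The 83% figure can be derived directly for the σ-known case: the next mean differs from the current one by a normal error with SD √2 × SE, so the average capture probability is Φ(1.96/√2) − Φ(−1.96/√2):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
z95 = nd.inv_cdf(0.975)                   # 1.96
# Next mean minus current mean has SD sqrt(2) x SE, so the chance the
# next mean lands inside the current 95% CI is, on average:
p_capture = nd.cdf(z95 / sqrt(2)) - nd.cdf(-z95 / sqrt(2))   # ~0.834
```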

CIs: Interpretation 4: MOE as error of estimation. MOE = 30 ms. The error of estimation is |M – μ|. MOE is the maximum likely error of estimation. MOE is our measure of precision: large MOE, low precision; small MOE, high precision. Chapter 4

CIs: Interpretation 5 (least preferred): NHST. If the null hypothesis value (e.g. zero) is outside the 95% CI, reject at the p = .05 level; if within the interval, don't reject. This ignores much of the information CIs provide, and can prompt incorrect interpretation of results. There are links between CI length, level of confidence (C), and p. Coulson, M., Healey, M., Fidler, F., & Cumming, G. (2010). Confidence intervals permit, but do not guarantee, better inference than statistical significance testing. Frontiers in Quantitative Psychology and Measurement, 1:26. tiny.cc/cisbetter

CI length and level of confidence, C. Some simple approximate relations between C and CI length: a 99% CI is one third longer than the 95% CI; a 90% CI is one sixth shorter than the 95% CI; a 50% CI is one third as long as the 95% CI. The C% CI spans C% of the cat's-eye area. Chapter 4
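These rules of thumb are easy to verify for the normal case, where CI length is proportional to the critical z (`half_width` is my own name):

```python
from statistics import NormalDist

nd = NormalDist()

def half_width(c_percent):
    """Half-width of a C% CI, in SE units (normal case)."""
    return nd.inv_cdf((1 + c_percent / 100) / 2)

ratio_99 = half_width(99) / half_width(95)   # ~4/3: one third longer
ratio_90 = half_width(90) / half_width(95)   # ~5/6: one sixth shorter
ratio_50 = half_width(50) / half_width(95)   # ~1/3: one third as long
```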



Position relative to a 95% CI, and p. Note where the null hypothesis value, 0, falls in relation to a 95% CI; eyeball the two-tailed p value. Chapter 4

From a 95% CI to strength of evidence. Note where a 95% CI falls in relation to the null hypothesis value, 0. No need to think about p. Chapter 4

Position relative to a 95% CI, and p. If the null hypothesis value falls at the limit of the 95% CI, the two-tailed p value is .05; … 1/3 of MOE from M, about .50; … 1/6 of MOE back from a limit, about .10; … 1/3 of MOE beyond a limit, about .01; … 2/3 of MOE beyond a limit, about .001. … eyeballed p value? … or strength of evidence. (Figure exercise: eyeballed answers .033, .12, <.001, .29, .70 for the pictured 95% CIs.) Chapter 4
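The benchmarks can be reproduced with a normal approximation: the null value's distance from M, in MOE units, converts to a z score and hence a two-tailed p (`eyeball_p` is my own name):

```python
from statistics import NormalDist

nd = NormalDist()

def eyeball_p(m, moe95, null=0.0):
    """Approximate two-tailed p from where `null` sits relative to a
    95% CI of half-width moe95 around mean m (normal approximation)."""
    z = nd.inv_cdf(0.975) * abs(m - null) / moe95
    return 2 * (1 - nd.cdf(z))

# Null at the limit -> .05; 1/3 MOE from M -> ~.50;
# 1/3 MOE beyond the limit -> ~.01; 2/3 beyond -> ~.001.
```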

From p (and M) to the 95% CI. Given p, the null value 0, and M, eyeball the 95% CI. Chapter 4

Five ways to interpret a result, with a CI. Public support for Proposition C is 53%, in a poll with a 2% margin of error (MOE).
1. One from the dance: most likely includes μ, but might be red.
2. Eyeball the cat's eye. Values around 53 are most plausible for μ; values towards and beyond 51 or 55 are progressively less plausible. We are 95% confident μ lies in [51, 55]. Interpret the midpoint (53) and limits (51 and 55).
3. Quite likely (an 83% chance) a repeat survey would give a result in [51, 55].
4. The maximum likely error of estimation is 2%.
5. Support is statistically above 50%, p < .01.
… then interpret, in the research and practical context. Chapters 3, 4, 5

Interpret ESs and their CIs. Possible results of a study seeking to reduce anxiety. For each possible result, perhaps consider: interpretation of the 95% CI (use any or all of the five ways); the p value (reject H0?); ?? Can we 'accept H0'? ES reference values are shown; we could mark the ES that is practically significant. MOE: short is good. Chapters 3, 4

Interpret ESs and their CIs. Knowledgeable judgment is required, in context, but that's OK. Justify interpretations in the research context, including practical, theoretical, … implications: small, large, notable, important, economically valuable, negligible… Examples of ES reference values (mark them in figures): 10 mm on the 100 mm pain line is the minimum change of clinical note; a 15% change in memory score is the smallest of clinical importance; scores of 0-13, 14-19, 20-28, and 29-63 are, respectively, 'minimal', 'mild', 'moderate', and 'severe' levels of depression on the Beck Depression Inventory (BDI). Avoid the 'S word' … significant (shhhh). Chapters 3, 4

The tragedy of the error bar. Samples from the same population, tweaked so M and SD are the same for all N. SD is descriptive of the sample data. 95% CI length varies greatly with N; the CI is inferential, telling us about the population. SE length varies inversely with √N. The ratio CI/SE varies—for small N. SE is neither descriptive nor inferential. Use the 95% CI, not SE bars. It's a tragedy that bars don't say what they mean—always define bars. Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. Journal of Cell Biology, 177. tiny.cc/errorbars101 Chapters 4, 6

The tragedy of the error bar. Double the length of SE bars to get the 95% CI, approximately. This doesn't work for small N, or for other ESs. Use the 95% CI, not SE bars … the CI is inferential, which is what we want. Prefer 95% CIs—the most common. (SE bars give approximately a 68% CI.) Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. Journal of Cell Biology, 177. tiny.cc/errorbars101 Chapters 4, 6
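Why doubling fails for small N: the 95% CI half-width is t_crit × SE, and t_crit is far from 2 when df is small. A sketch with a few two-tailed .05 critical values of t hard-coded from standard tables (`ci_over_se` is my own name):

```python
# Two-tailed .05 critical values of t, from standard tables, keyed by df
T_CRIT_95 = {3: 3.182, 9: 2.262, 29: 2.045, 99: 1.984}

def ci_over_se(n):
    """Ratio of 95% CI half-width to SE for a mean based on n scores.
    'Double the SE bars' is only fair when this is close to 2."""
    return T_CRIT_95[n - 1]

# N = 4: ratio 3.18, so doubled SE bars badly understate the CI.
# N = 30 or 100: ratio ~2, so doubling is a reasonable approximation.
```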

Questions that should spring to mind… What's the DV? (On the vertical axis.) What are the two conditions? What design? (Two independent groups? Paired?) What are the bars? (SE? 95% CI? SD? Some other CI?) Every figure must provide all this information, in the figure or the caption. Chapters 4, 6

The New Statistics: How? Estimation: The eight-step plan.
1. Use estimation thinking. State estimation questions as: "How much…?", "To what extent…?", "How many…?" The key to a more quantitative discipline.
2. Identify the ESs that best answer the questions (a difference?).
3. Declare full details of the intended procedure, data analysis, …
4. Calculate point and interval estimates (CIs) for those ESs.
5. Make a picture, including CIs.
6. Interpret (use knowledgeable judgment, in context).
7. Use meta-analytic thinking at every stage (… a cumulative discipline).
8. Make a full report publicly available (an imperative, not just a goal).
Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217. Chapters 1, 2, 15

Estimation thinking, estimation language. Introduction: "The study was designed to estimate reading improvement, following the new procedure, in 6 and 4 year olds." Even better: "The theory predicted a medium-to-large increase in 6 year olds, but little or no increase in 4 year olds." Better still if the predictions are quantitative. Estimation thinking is assumed; estimation of ESs is the focus. Results: "The reading age of 6 year olds increased by 4.5 months, 95% CI [1.9, 7.1], which is a large and educationally substantial increase. That of 4 year olds increased by only a negligible 0.6 months [-0.8, 2.0], …" We are given ESs, with CIs, then the ESs are interpreted. Further comment would assess the more specific (ideally, quantitative) predictions. Chapters 2-6

The New Statistics: Actually doing it. The editor says to remove CIs and just give p values. What do you DO? Research methods best practice: consider, decide, persist. The evidence should decide: consider statistical cognition research. TNS reasons are compelling; TNS is the way of the future. Persist. Explain and justify your data-analytic approach. APA Publication Manual: "Wherever possible, base discussion and interpretation of results on point and interval estimates" (p. 34). New guidelines: Psychonomic Society, Psychological Science… I add p values if I must, but I don't mention them, nor remove CIs or ESs. Chapter 15

Section 3 Conclusions. ES: the amount of anything of interest. The 95% CI gives inferential information … which is what we want. Use any of the five ways to think about a 95% CI. Interpret the ES and CI, in context. Ask estimation questions; use estimation thinking … and meta-analytic thinking. NEXT: Examples of using the new statistics. Chapter 2

1. The new statistics: Why
2. Research integrity and the new statistics
3. Effect sizes and confidence intervals
4. The new statistics: How
5. Planning, power, and precision
6. Meta-analysis

The New Statistics: How? Estimation: The eight-step plan.
1. Use estimation thinking. State estimation questions as: "How much…?", "To what extent…?", "How many…?" The key to a more quantitative discipline.
2. Identify the ESs that best answer the questions (a difference?).
3. Declare full details of the intended procedure, data analysis, …
4. Calculate point and interval estimates (CIs) for those ESs.
5. Make a picture, including CIs.
6. Interpret (use knowledgeable judgment, in context).
7. Use meta-analytic thinking at every stage (… a cumulative discipline).
8. Make a full report publicly available (an imperative, not just a goal).
Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217. Chapters 1, 2, 15

Randomised control trial (RCT). Means, with 95% CIs. These CIs can guide assessment of which comparisons? (Between groups.) They cannot help with which comparisons? (The repeated measure.) Figure page of ESCI chapters 14-15. Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals, and how to read pictures of data. American Psychologist, 60. tiny.cc/inferencebyeye Chapter 15

79
Two basic experimental designs—with CIs 1. Two independent groups E.g. Experimental group vs Control group OK to use CIs on means to assess the difference (…think independent groups t test) Data two and Simulate two pages of ESCI chapters 5-6 2. Paired design (repeated measure) E.g. Pretest vs Posttest, single group of patients May NOT use CIs on Pretest and Posttest to assess the difference Need the CI on the paired differences (…think paired t test) Data paired and Simulate paired pages of ESCI chapters 5-6 Chapter 6

80
Two basic experimental designs, and t tests Two independent groups CIs on separate means (and on diff) are based on the pooled SD: s_p × √(1/n1 + 1/n2) Paired (or matched) design Different error term CI on difference is based on the SD of the differences: s_diff / √n 80 Chapter 6
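The two error terms can be put in a few lines of code (a minimal sketch, not ESCI itself; it uses z ≈ 1.96 in place of the t critical value, so it is approximate for small samples):

```python
import math
from statistics import NormalDist, mean, stdev

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96; a t critical value is more exact for small N

def ci_diff_independent(a, b):
    """95% CI on the difference between two independent group means;
    error term: pooled SD × sqrt(1/n1 + 1/n2)."""
    na, nb = len(a), len(b)
    sp = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    se = sp * math.sqrt(1 / na + 1 / nb)
    d = mean(a) - mean(b)
    return d - Z95 * se, d + Z95 * se

def ci_diff_paired(pre, post):
    """95% CI on the mean of the paired differences; error term: s_diff / sqrt(n)."""
    diffs = [y - x for x, y in zip(pre, post)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    m = mean(diffs)
    return m - Z95 * se, m + Z95 * se
```

Running both on the same data makes the slide's point: when pre and post scores are correlated, s_diff is small and the paired CI is far shorter than the two separate CIs would suggest.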

81
81 [Figure: Overlap Rule of Eye — Pretest and Posttest means with 95% CIs, from the Compare A B page of ESCI chapters 5-6. The CI on the paired differences is based on different SD information (the SD of the differences); the two separate CIs are based on the same SD information.] Chapter 6

82
Two independent groups: Rules of Eye 82 Two 95% CIs just touching (zero overlap) indicates moderate evidence of a population difference (approx p =.01) Moderate overlap (about half average MOE) is some evidence of a difference (approx p =.05) When both sample sizes are at least 10, and the two MOEs do not differ by more than a factor of 2. Use the rule without reference to p Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals, and how to read pictures of data. American Psychologist, 60, tiny.cc/inferencebyeye Chapter 6
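The rule is simple arithmetic on the two intervals. A sketch (hypothetical helper names; overlap is measured in units of the average MOE, following Cumming & Finch, 2005):

```python
def ci_overlap(mean1, moe1, mean2, moe2):
    """Overlap of two independent 95% CIs, in units of the average MOE.
    Positive = the intervals overlap; negative = a gap between them."""
    (m_lo, moe_lo), (m_hi, moe_hi) = sorted([(mean1, moe1), (mean2, moe2)])
    # upper limit of the lower-mean CI minus lower limit of the higher-mean CI
    overlap = (m_lo + moe_lo) - (m_hi - moe_hi)
    return overlap / ((moe1 + moe2) / 2)

def rule_of_eye(mean1, moe1, mean2, moe2):
    """Heuristic reading, valid when both N >= 10 and the MOEs
    differ by less than a factor of 2."""
    ov = ci_overlap(mean1, moe1, mean2, moe2)
    if ov <= 0:
        return "moderate evidence of a difference (approx p <= .01)"
    if ov <= 0.5:
        return "some evidence of a difference (approx p <= .05)"
    return "little evidence from overlap alone"
```

For example, CIs [−2, 2] and [2, 6] just touch (overlap 0), the p ≈ .01 case; [−2, 2] and [1, 5] overlap by half the average MOE, the p ≈ .05 case.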

83
Two independent groups: Rules of Eye 83 Two 95% CIs just touching (zero overlap) indicates moderate evidence of a population difference (approx p =.01) Moderate overlap (about half average MOE) is some evidence of a difference (approx p =.05) When both sample sizes are at least 10, and the two MOEs do not differ by more than a factor of 2. Use the rule without reference to p Cumming, G. (2009). Inference by eye: Reading the overlap of independent confidence intervals. Statistics in Medicine, 28, Chapter 6

84
Randomised controlled trial (RCT) 84 Chapter 15 Means, with 95% CIs These CIs can guide assessment of which comparisons? (between groups) They cannot help with which comparisons? (repeated measure) Figure page of ESCI chapters 14-15 Q: How to display the within-S CIs as well? Fidler, F., Faulkner, S., & Cumming, G. (2008). Analyzing and presenting outcomes: Focus on effect size estimates and confidence intervals. In A. M. Nezu & C. M. Nezu (Eds.) Evidence-based outcome research (pp ). New York: OUP.

85
85 RCT example: Plot mean change score, with its CI One choice for planned contrasts Interpret the 95% CIs Use overlap rule? Figure page of ESCI chapters Chapter 15

86
Cohen’s d (A standardised effect size), SMD Cohen’s d is an ES expressed as a number of SDs (a z score) d picture page of ESCI chapters 10-13 Lots of overlap of the populations For d = 0.5 (a medium effect?), 69% of E points higher than C mean Cohen’s small, medium, large: 0.2, 0.5, 0.8—but arbitrary, last resort Cohen’s d is the ES (in original units), divided by a suitable SD Our sample d is a point estimate of the population value δ We need d to help understanding, and for meta-analysis 86 Chapter 11

87
Cohen’s d, for 2 independent groups IQ example: Control group: M C = 110, s C = 12; Experimental group: M E = 120, s E = 16 d = ES / SD … ES is in original units, we choose a value for SD 1. Which SD makes best sense as the unit of measurement? (Population SD) 2. What’s the best estimate of this SD? Three options: Use 15, SD in reference population, d = (120 – 110) / 15 = 0.67 Use s C = 12, estimate of Control population SD, d = (120 – 110) / 12 = 0.83 Use s p = 14, pooled estimate from both groups, d = (120 – 110) / 14 = 0.71 Third option: Both numerator and denominator are estimates, so change on replication The rubber ruler: the SD as an elastic unit of measurement Denominator also used for t test and CI on difference between group means Caution: d = 0.5 may be 10/20 or 12/24 or 9/18 or… 87 Chapter 11
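The three options are simple arithmetic. A sketch of the slide's IQ example (the group size n = 20 used for the pooled estimate is an assumption for illustration; the slide does not state n):

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD from two groups: the usual basis for the t-test denominator."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def cohens_d(effect, sd):
    """d = ES / SD: the effect in original units divided by the chosen SD."""
    return effect / sd

# IQ example from the slide: M_C = 110, s_C = 12; M_E = 120, s_E = 16
es = 120 - 110
print(round(cohens_d(es, 15), 2))                          # reference-population SD -> 0.67
print(round(cohens_d(es, 12), 2))                          # control-group SD -> 0.83
print(round(cohens_d(es, pooled_sd(12, 20, 16, 20)), 2))   # pooled SD (~14.14) -> 0.71
```

The pooled SD with s = 12 and 16 and equal n is √200 ≈ 14.14, which the slide rounds to 14; the rubber-ruler caution is visible here, since each choice of denominator gives a different d for the same 10-point effect.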

88
Cohen’s d, for paired design IQ example: A healthy breakfast increases IQ score by average 2 points Usual: M usual = 110, s usual = 12; Healthy: M healthy = 112, s healthy = 16 (Also s diff = 1.2) d = ES / SD Which SD makes best sense as the unit of measurement? (Population SD?) Use 15, population SD, d = 2 / 15 = 0.13 Use s C = 12, Usual estimate, d = 2 / 12 = 0.17 Use s p = 14, pooled, d = 2 / 14 = 0.14 BUT for paired t test, and CI on mean of differences, use s diff …use s diff as the measuring unit for d?? NO: it gives d = 2 / 1.2 = 1.7. Silly. Our choice of SD for d may differ from what we use for inference 88 Chapter 11

89
CIs for Cohen’s d Both numerator and denominator have sampling variability …so distribution of d is tricky To calculate accurate CIs on d, need the noncentral t distribution …Fairytale “How the noncentral t distribution got its hump” tiny.cc/noncentralt To calculate CIs for d use ESCI, or an excellent approximate method: Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217, Introduction to CIs on d and noncentral t: Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, Chapters 10, 11

90
Unbiased estimate of δ is d_unb Unfortunately d overestimates δ. The unbiased estimate of δ is: Multiply d by the adjustment factor (slightly less than 1) to get d_unb. Routinely use d_unb (sometimes called Hedges’ g, but terminology a mess) Data two and Data paired pages of ESCI chapters 5-6 Chapter 11
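A widely used approximate form of the adjustment factor is (1 − 3/(4·df − 1)); ESCI uses the exact gamma-function version, but the approximation is very close for df ≥ 10. A sketch:

```python
def d_unbiased(d, df):
    """Approximate Hedges adjustment for the bias of d:
    multiply d by (1 - 3/(4*df - 1)). The exact factor uses
    gamma functions; this approximation is close for df >= 10."""
    return d * (1 - 3 / (4 * df - 1))

# Two independent groups of n = 20 (df = 38): the adjustment shrinks d by ~2%
print(round(d_unbiased(0.71, 38), 3))  # -> 0.696
```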

91
Cohen’s d, take-home messages d = ES / SD … where ES is in original units, and we choose a value for SD d is highly valuable, especially for meta-analysis Choose SD carefully: what makes best sense as the unit of measurement? Use the best available estimate of this SD Report how d was calculated—if we don’t know that, we can’t interpret Beware the rubber ruler, interpret d values with caution Usually use d_unb, the unbiased estimate of δ Beware terminology (Hedges’ g, Glass’s Δ) Use ‘Cohen’s d’ or d_unb, with explanation of how calculated Interpret values in context, use 0.2, 0.5, 0.8 as a last resort 91 Chapter 11

92
CI on correlation, r Use Fisher’s r to z transformation CIs asymmetric, more so for r near -1 or 1 CIs shorter when r is near -1 or 1 CIs surprisingly wide, unless N is large Example: r =.6, N = 30 Cohen’s benchmarks:.1,.3,.5—often inappropriate r to z and Two correlations pages of ESCI chapters 14-15 Correlations and Diff correlations pages of ESCI Effect sizes Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, Chapter 14
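The Fisher transformation is easy to sketch (a minimal illustration of the standard method, not ESCI itself): transform r to z, take a symmetric interval in z, then transform back, which is what produces the asymmetry.

```python
import math
from statistics import NormalDist

def ci_r(r, n, conf=0.95):
    """CI on Pearson r via Fisher's r-to-z transformation:
    z = atanh(r), SE = 1/sqrt(N - 3), back-transform with tanh."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lo, hi = ci_r(0.6, 30)  # the slide's example: roughly [.31, .79], asymmetric about .6
```

Even with r = .6 and N = 30 the interval spans nearly half the possible range, which is the "surprisingly wide" point.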

93
CI on proportion, P Difference between two proportions (instead of OR or χ²) ES is proportion survived. Diff = 17/20 – 11/20 =.30, [.02,.53] Proportions and Diff proportions pages of ESCI Effect sizes Altman, D. G., Machin, D., Bryant, T. N., & Gardner, M. J. (2000). Statistics with confidence: Confidence intervals and statistical guidelines (2nd ed.). London: BMJ Books. 93 Chapter 14
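A sketch of the calculation. Note the caveat: this is the simple Wald interval; ESCI and Altman et al. use the more accurate Newcombe method, which is where the slide's [.02, .53] comes from, so the Wald answer is close but not identical.

```python
import math
from statistics import NormalDist

def ci_prop_diff(x1, n1, x2, n2, conf=0.95):
    """Approximate (Wald) CI on the difference between two independent
    proportions. The Newcombe method used by ESCI is more accurate,
    especially near 0 or 1; this sketch shows only the basic logic."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = ci_prop_diff(17, 20, 11, 20)  # ≈ (.03, .57), vs Newcombe's [.02, .53]
```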

94
CIs for assessing model fit to data Velicer, W. F., Cumming, G., Fava, J. L., Rossi, J. S., Prochaska, J. O., & Johnson, J. (2008). Theory testing using quantitative predictions of effect size. Applied Psychology: An International Review, 57, Test the Transtheoretical Model of Behavior Change, with data from N = 3967 smokers. Calculate 95% CIs on 15 predictor variables. Dots are the quantitative predictions of the model. 94 Chapter 15

95
Section 4 Conclusions More complex designs: Planned contrasts, with 95% CIs Design: independent groups vs repeated measure Cohen’s d and d unb ; how calculated? Rubber ruler CIs on other ES measures Correlation; difference between two independent correlations Proportion; difference between two independent proportions CIs to assess model fit to data NEXT: Planning good studies 95 Chapter 2

96
1.The new statistics: Why 2.Research integrity and the new statistics 3.Effect sizes and confidence intervals 4.The new statistics: How 5.Planning, power, and precision 6.Meta-analysis 96

97
Statistical power: I’m ambivalent Statistical power is the chance of finding something if it is there Statistical power = 1 – β = Prob (reject H0, IF H0 false) Depends on NHST If using NHST, take statistical power seriously Instead, use precision for planning: design for target MOE “Power” more loosely: “Goodness” of an experiment Instead, maximise informativeness 97 Chapter 12

98
Power picture Statistical power is the chance we’ll find an effect, if there is an effect of stated size At right: Single sample, N = 18, target ES is δ =.5, α =.05, two-tailed, power =.52 Power picture page of ESCI chapters To calculate power, we need noncentral t, unless σ is known Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530– Chapter 12

99
Power example: HEAT Hot Earth Awareness Test (HEAT) Test of climate change knowledge, attitudes, and behavior Assume μ = 50, σ = 20 in reference population Use α =.05, two-tailed. Choose target ES that is meaningful Two independent groups experiment, N in each group For target ES of δ = your choice, study variation of power with N E.g. 8 points on HEAT is δ = 8/20 = 0.4 Target ES makes a big difference For paired design, set ρ, the population correlation, and study power Correlation ρ, as well as target ES, makes a big difference Power two and Power paired pages of ESCI chapters 10-13 99 Chapter 12
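The shape of these power curves can be explored with a normal approximation (exact values need noncentral t, as in ESCI or G*Power; the z version here runs slightly high at small N):

```python
import math
from statistics import NormalDist

def power_two_groups(delta, n_per_group, alpha=0.05):
    """Approximate power for a two-independent-groups comparison with a
    target effect of delta SD units, via the normal approximation.
    Exact power requires the noncentral t distribution."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(delta * math.sqrt(n_per_group / 2) - z_crit)
```

Varying delta and N shows the slide's two points: power climbs with N, and the chosen target ES makes a big difference to the N needed.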

100
Ways to increase power Increase N Increase target ES (!!) Increase α (!!) Improve the experimental design Use better measures Scope for fudging (Grant applications, ethics proposals…) 100 Chapter 12

101
Power recommendations APA Manual: “Take seriously the statistical power considerations associated with the tests of hypotheses. … routinely provide evidence that the study has sufficient power to detect effects of substantive interest…” (p. 30) BUT power values are very rarely reported in psychology journals 101 Chapter 12

102
G*Power Great software to calculate power, display power curves Test family (play with t, F, z…) Enter no. tails, type of power analysis, α, etc Enter target ES For ANOVA, use f as the ES measure (see Cohen, 1988) Power for comparing two r values (correlations) Or set power, use ‘Determine’ to calculate ES value X-Y plot, to examine sensitivity Download from tiny.cc/gpower3 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum 102 Chapter 12

103
Statistical power: Often so low HIGH power is gold, but many disciplines are cursed by low power Cohen (1962): In published psychology research, median power to find a medium-sized effect is about.5 Maxwell (2004): It was still about.5 for a medium effect Our file drawers (and journals) are crammed with Type 2 errors: Results that are ns even though there is a real effect Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, Chapter 12

104
Post hoc power: A bad idea Calculated after data are obtained. Use obtained d as target Logical problem: Power is a prospective probability Replicate, and see the ‘dance of post hoc power’ Enormous variation Simulate two page of ESCI chapters 5-6 Devastatingly criticised as not telling us what we want to know (chance we’ll find an effect of a size chosen to be meaningful) Merely reflects the outcome of our study. Tells us nothing new. SPSS, etc, gives post hoc power in its printouts. Don’t use this value Poor practice by software publishers Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, Chapter 12

105
Informativeness Informativeness—my general term for quality, size, sensitivity, usefulness To increase informativeness (also power & precision): Choose experimental design to minimise error (repeated measures?) Improve the measures, maybe measure twice and average Statistical control? (Covariance?) Target large effect sizes: Six therapy sessions, not two Use large N (of course)—tho’ to halve SE, need to multiply N by 4 Use Meta-analysis (combine results over experiments) Don’t spread your eggs over too many baskets. Do one or two things well, rather than risk examining lots of things and finding ≈ nothing The enemy is error variability: reduce it by all means possible An essential step in research planning, worth great effort: Brainstorm 105 Chapters 12, 13
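The "to halve SE, multiply N by 4" point is just the square root in the standard-error formula:

```python
import math

def se_of_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

# Using the HEAT sigma of 20: halving the SE requires quadrupling N
print(se_of_mean(20, 100))  # 2.0
print(se_of_mean(20, 400))  # 1.0
```

That diminishing return is why reducing error variability by design and measurement often buys more informativeness than raw N can.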

106
Precision for planning (AIPE, accuracy in parameter estimation) ESCI calculates what N is required to give: expected MOE no more than f × σ (so f is like d, a number of SDs) OR to have a 99% chance MOE is no more than f × σ ‘assurance’ = 99%, expressed as γ = 99 Three Precision pages of ESCI chapters 5-6 Chapter 13

107
Precision for planning, HEAT example HEAT experiment, two independent groups Target MOE of 8, which is 0.4 × 20, or f × σ, where f = 0.4 Three Precision pages of ESCI chapters 5-6 For f = 0.4, two independent groups, need N = 50 And for assurance γ = 99, need N = 65 Alas, such large N, even with such large f Paired experiment, with ρ =.7, need N = 17 Or N = 28, with assurance γ = 99 Chapter 13
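A back-of-envelope version of the f-to-N calculation (z approximation; ESCI's exact method, which uses the t distribution and the expected MOE, gives the slide's N = 50 rather than 49 for f = 0.4):

```python
import math
from statistics import NormalDist

def n_for_target_moe(f, conf=0.95):
    """Per-group N for two independent groups so that expected
    MOE ≈ f × sigma, from MOE = z × sigma × sqrt(2/N).
    A z approximation: ESCI's exact t-based answer is slightly larger."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(2 * (z / f) ** 2)
```

Halving f (doubling precision) roughly quadruples the required N, which is why tight target MOEs are so expensive with independent groups.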

108
Precision for planning (AIPE, accuracy in parameter estimation) ESCI calculates what N is required to give: expected MOE no more than f × σ (so f is like d, a number of SDs) OR to have a 99% chance MOE is no more than f × σ ‘assurance’ = 99%, expressed as γ = 99 Not yet widely used, but highly recommended (No need for H0) Target MOE replaces target ES Should replace power calculations for funding and ethics applications 108 Chapter 13

109
Section 5 Conclusions Power, informativeness, precision IF using NHST, take power seriously Low power for a meaningful effect size ==> a waste of time? Don’t use post-hoc power Make great efforts to maximize informativeness Precision (MOE of the CI) is more useful than power: No need for NHST, or for any H0 CI pictures are so revealing And essential, if you wish to conclude no (or trivial) effect. If it could be useful, use precision for planning NEXT: Meta-analysis and meta-analytic thinking 109 Chapters 12, 13

110
1.The new statistics: Why 2.Research integrity and the new statistics 3.Effect sizes and confidence intervals 4.The new statistics: How 5.Planning, power, and precision 6.Meta-analysis 110

111
Single studies—So many problems Power often low CIs often wide, precision low CIs report accurately the uncertainty in data. But don’t shoot the messenger—it’s a message we need to hear The solutions: Increase informativeness of individual studies Combine results over studies—Meta-analysis 111 Chapter 7

112
Meta-analysis: the picture The forest plot CIs make this picture possible; p values are irrelevant ESCI Meta-Analysis Beginning undergraduates easily grasp the basics—via pictures Effect sizes used in meta-analysis: Means, Cohen’s d, r, others… A stylized cat’s eye: Cumming, G. (2006). Meta-analysis: Pictures that explain how experimental findings can be integrated. 7th International Conference on Teaching Statistics. Brazil, July. tiny.cc/teachma Hunt, M. (1997). How science takes stock. The story of meta-analysis. New York: Sage. 112 Chapter 7

113
Meta-analysis: Small The minimum: two studies to be combined Prior studies, or Our own, perhaps within one project, or Prior + our own …to get the best overall estimates, so far 113 Chapter 7

114
Meta-analysis: Large Cooper’s seven steps; informed critical judgment at every stage 1. Formulate the questions, and scope of the systematic review 2. Search and obtain literature, contact researchers, find grey literature Establish selection criteria, read and select studies 3. Code studies, enter ES estimates and coding of study features 4. Choose what to include, and design the analyses 5. Analyse the data. Prefer random effects model. Moderators? 6. Interpret; draw empirical, theoretical, and applied conclusions. 7. Prepare critical discussion, present the review 8. Receive $1,000,000 and gold medal. Retire early. (Joke, alas.) Cooper, H. M. (2009). Research synthesis and meta-analysis: A step-by-step approach (4th ed.). Thousand Oaks, CA: Sage. 114 Chapter 9

115
Heterogeneity Forest plot variability, or dance of the ESs Heterogeneity measures the extent of dancing Sampling variability accounts for a certain width of dancing Studies homogeneous => sampling variability accounts for dance Studies heterogeneous => There is variability beyond that expected from sampling variability Therefore, moderating variables may contribute Can studies be too homogeneous? What might that imply? Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. New York: Wiley. 115 Chapter 8

116
Models for meta-analysis Fixed effect (FE) model Assume homogeneity: every study estimates the same μ Almost always unrealistic Random effects (RE) model—our routine choice Assume Study i estimates μ_i, sampled from N(μ, τ) τ = 0 means studies homogeneous, FE model applies τ > 0 means studies heterogeneous, RE model needed Assumptions are severe. Unrealistic? Other models? Varying-coefficient model of Doug Bonett—watch this space Other approaches? Bayesian Schmidt-Hunter corrections for measurement and other biases Schmidt, F., & Hunter, J. (2014). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Sage. 116 Chapter 8

117
Measures of heterogeneity Q …the weighted sum of squares between studies T …the estimate of τ, with CI Interpret T and its CI (Extends to 0? Interpret the limits) I² …the percentage of total variance between studies that reflects variation in true effect size, rather than sampling variability If heterogeneity is considerable, consider a moderator analysis If heterogeneity low, RE gives same result as FE => nothing to lose by using RE 117 Chapter 8
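Q, the τ estimate, and I² all drop out of one standard random-effects computation, the DerSimonian-Laird method (a common default in software such as CMA; this is a sketch of that method, not ESCI's exact implementation):

```python
def meta_analysis(effects, variances):
    """Fixed-effect estimate, DerSimonian-Laird heterogeneity
    (Q, tau^2, I^2 as a percentage), and the random-effects estimate."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / sw          # fixed-effect mean
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))    # weighted SS between studies
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                               # between-study variance
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]                  # RE weights add tau^2
    re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return {"FE": fe, "Q": q, "tau2": tau2, "I2": i2, "RE": re}
```

With homogeneous studies tau² comes out 0 and RE equals FE, which is the slide's "nothing to lose by using RE" point; with heterogeneous studies the RE weights flatten toward equality and the CI widens.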

118
But there’s more: Moderator analysis If heterogeneity high, look for moderators Simplest: Dichotomous moderator? (e.g., gender) Subgroups page of ESCI Meta-analysisESCI Meta-analysis Identify moderator, even if no study manipulated that variable Meta-analysis can give theoretical progress—that’s gold Example: Peter Wilson, clumsy children, meta-analysis of 50 studies Identify performance on complex visuospatial tasks as moderator Study this moderator empirically 118 Chapter 9

119
Continuous moderator? Meta-regression Fletcher & Kerr (2010): Does RTG fade with length of relationship? Meta-regression of ES values (RTG score) against years, 13 studies Correlation, not causality. Alternative interpretations? 119 Chapter 9

120
MA in the Publication Manual Many mentions, esp. pp , 183. Mainstreaming meta-analysis MARS (Meta-Analysis Reporting Standards) pp A further big advantage of the sixth edition (2010) Cooper, H. (2010). Reporting research in psychology: How to meet Journal Article Reporting Standards (APA Style). Washington, DC: APA Books. MARS and JARS 120 Chapter 9

121
CMA: Software for meta-analysis Comprehensive Meta Analysis Enter ES, and its variance, for each study—in 100+ formats Choose FE or RE model Assess heterogeneity of studies Explore moderators (ANOVA, or meta-regression) Forest plot Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. New York: Wiley. 121 Chapters 8, 9

122
Health sciences: The Cochrane Collaboration Medicine, health sciences, health policy, practice… Systematic reviews: meta-analyses of research studies Publicly available in some countries 5,000+ reviews online 31,000+ people in 120+ countries Aim to update every two years RevMan software for meta-analysis Includes some psychology Campbell collaboration, for some social sciences (education, welfare…) 122 Chapter 9

123
PTSD: A Cochrane review Bisson JI, Roberts NP, Andrew M, Cooper R, Lewis C. Psychological therapies for chronic post-traumatic stress disorder (PTSD) in adults. Cochrane Database of Systematic Reviews 2013, Issue 12. Art. No.: CD Includes 70 studies, total of 4761 participants Update of 2005 Cochrane review, updated in 2007 Support for efficacy, for chronic PTSD in adults, of Trauma-focused cognitive behavioral therapy (TFCBT), and Eye movement desensitization and reprocessing (EMDR) Non-trauma-focused psychological therapies not so effective 123 Chapter 9

124
Quality of the evidence Many studies, but each included only small numbers of people Some studies were poorly designed The overall quality of the studies was very low and so findings should be interpreted with caution There is insufficient evidence to show whether or not psychological therapy is harmful 124 Chapter 9

125
Bias? Research integrity issues 125 Chapter 9 Judgments about risks of bias, as percentages of included studies (p. 13)

126
Funnel plot: Publication bias? 126 Chapter 9 Individual therapy vs waitlist/usual care Outcome: Severity of PTSD symptoms - clinician-rated (p. 26) Large studies Small studies Favors therapy Favors control …suggests the possibility of publication bias Small studies missing? Because not statistically significant?

127
Funnel plot: Publication bias? 127 Chapter 9 Large studies Small studies Favors therapy Favors control …suggests the possibility of publication bias Small studies missing? Because not statistically significant? Individual therapy vs waitlist/usual care Outcome: Severity of PTSD symptoms - clinician-rated (p. 26)

128
128 Chapter 9 Forest plot of Analysis 1.10 Comparison 1: Trauma-focused CBT/Exposure therapy vs waitlist/usual care Outcome 10: Depression month follow-up (p. 152)

129
129 Chapter 9 Forest plot of Analysis 3.1. Comparison 3: Trauma-focused CBT/Exposure Therapy vs other therapies Outcome 1: Severity of PTSD symptoms – clinician (p. 171)

130
Meta-analysis, in many disciplines Particle physics As much heterogeneity as in social sciences! Hedges, L. V. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443– Chapter 9

131
Meta-analytic thinking 1. Think of past literature in meta-analytic terms 2. Think of our study as the next step in that progressively cumulating meta-analysis 3. Report results so inclusion in future meta-analysis is easy Report all effect sizes (whether ns or not), in the best way Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, Chapters 1, 7, 9

132
Meta-analysis, more generally… A typical study asks multiple questions, tests theory, doesn’t simply ask ‘how large is the effect of the treatment?’ A typical article includes several experiments, a number of manipulations, a number of DVs… MA typically chooses a single ES from each study the most important, or the average of several, or the one most often reported or carry out more than one MA, as in Cochrane Converging approaches, converging evidence—most persuasive Reduces risk that findings are merely chance fluctuations Provides some evidence of robustness, generality of findings …all part of meta-analytic thinking 132 Chapter 7

133
Section 6 Conclusions Meta-analysis: Any size from small to very large Quantitative integration, even of large, messy literatures Best (most precise) ES estimates Moderator analysis: Practical and theoretical importance Variety of meta-analysis techniques and models Watch for developments Provides the best basis for evidence-based practice Build a cumulative quantitative discipline: Meta-analytic thinking 133 Chapters 7, 8, 9

134
The New Statistics: Actually doing it The editor says to remove CIs and just give p values. What do you DO? Research methods best practice: Consider, decide, persist The evidence should decide: Consider statistical cognition research TNS reasons are compelling, TNS is the way of the future. Persist. Explain and justify your data analytic approach APA Publication Manual: “Wherever possible, base discussion and interpretation of results on point and interval estimates” (p. 34). New guidelines: Psychonomic Society, Psychological Science… I add p values if I must, but don’t mention them, nor remove CIs or ESs 134 Chapter 15

135
The New Statistics: Where next? Examples and advice, for many situations, many ESs ANOVA, multivariate, SEM, model fitting… Refs: tiny.cc/tnswhyhowtiny.cc/tnswhyhow New textbooks, new software Editors insisting More guidelines, as Psychonomic Society Statistical cognition research Provides the evidence for evidence-based statistical practice p values and emotions Better graphics Study estimation thinking Strategies for research integrity: replication, publication, ethics… Teach it, from the start 135 Chapter 15

136
Take-home message Estimation: The eight-step plan 1. Use estimation thinking. State estimation questions as: “How much…?”, “To what extent…?”, “How many…?” The key to a more quantitative discipline 2. Identify the ESs that best answer the questions 3. Declare full details of the intended procedure, data analysis, … 4. Calculate point and interval estimates (CIs) for those ESs 5. Make a picture, including CIs 6. Interpret (use knowledgeable judgment, in context) 7. Use meta-analytic thinking at every stage (…cumulative discipline) 8. Make a full report publicly available (an imperative, not just a goal) 136 Chapters 1, 2, 15

137
137 Comments to: Book information, and ESCI: …with links to: Radio talk, magazine articles Free sample chapter, dance of the p values Other videos: At YouTube, search for ‘Geoff Cumming’ Hug a confidence interval today! This PowerPoint file: tiny.cc/geoffdocs Tutorial article: tiny.cc/tnswhyhow
