MAGNITUDE-BASED INFERENCES

An alternative to hypothesis testing

Lecture outline
- Limitations of null hypothesis significance testing (NHST)
- Confidence intervals
- Magnitude-based inferences (MBI)
- Smallest worthwhile effects
- Limitations of magnitude-based inferences

Null hypothesis significance testing
A major aim of research is to make an inference about an effect in a population based on the study of a sample. Null-hypothesis testing, via the P-value and statistical significance, is the traditional approach to making such an inference. The P-value is the probability of obtaining the observed result, or a more extreme one, if the null hypothesis is true.
[Figure: probability distribution of the set of possible results under the null hypothesis, marking the most likely observation, the observed data point and the one-sided P-value.]
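A minimal sketch of that definition, assuming the test statistic is a z score whose null distribution is the standard normal (an assumption made for brevity; small-sample analyses usually use t). It relies only on Python's standard-library `statistics.NormalDist`; the function name `p_value_from_z` is hypothetical.

```python
from statistics import NormalDist


def p_value_from_z(z: float, two_sided: bool = True) -> float:
    """P-value for an observed z statistic, assuming a standard normal null.

    The one-sided version is the tail area beyond |z| in the direction of
    the observed effect; the two-sided version doubles it.
    """
    tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * tail if two_sided else tail


# z = 1.96 sits at the conventional two-sided 5% cut-off.
print(round(p_value_from_z(1.96), 3))                   # 0.05
print(round(p_value_from_z(1.96, two_sided=False), 3))  # 0.025
```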

Limitations of NHST
- P-values are difficult to understand!
- P ≤ 0.05 is arbitrary. “Surely God loves the 0.06 nearly as much as 0.05.” (Rosnow and Rosenthal, 1989)
- Some useful effects aren’t statistically significant.
- Some statistically significant effects aren’t useful.
- P > 0.05 is often interpreted as unpublishable, so good data don’t get published.
- It ignores ‘judgement’ and leads to dichotomised thinking.
Statistical Hypothesis Inference Testing?
In arriving at a problem statement and research question, researchers usually have good reasons to believe that effects will be different from zero. The more relevant issue is not whether there is an effect but how big it is. Unfortunately, the P value alone provides us with no information about the direction or size of the effect or, given sampling variability, the range of feasible values. Depending, inter alia, on sample size and variability, an outcome statistic with P < .05 could represent an effect that is clinically, practically, or mechanistically irrelevant. Conversely, a nonsignificant result (P > .05) does not necessarily imply that there is no worthwhile effect, because a combination of small sample size and large measurement variability can mask important effects. An overreliance on P values might therefore lead to unethical errors of interpretation.

Dance of the p-values (https://www.youtube.com/watch?v=ez4DgdurRPg)
***   P < 0.001       Very highly significant!!!
**    P < 0.01        Highly significant!!
*     P < 0.05        Significant (phew)
?     P = 0.05–0.10   “Approaching significance”
      P > 0.10        Non-significant
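The slide's point is that P-values jump around from one replication to the next even when nothing about the underlying effect changes. A small simulation sketch reproduces this dance. It assumes a normal population with a known SD and a one-sample z test against a mean of zero. Both assumptions were chosen for brevity, and the linked video uses its own setup. The function name `simulate_p_values` is hypothetical.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def simulate_p_values(true_effect: float, sd: float, n: int,
                      replications: int, seed: int = 1) -> list[float]:
    """Two-sided one-sample z-test P-values from repeated, identical studies.

    Every replication samples n values from Normal(true_effect, sd) and tests
    the null hypothesis that the mean is zero, treating sd as known.
    """
    rng = random.Random(seed)
    se = sd / n ** 0.5
    p_values = []
    for _ in range(replications):
        sample = [rng.gauss(true_effect, sd) for _ in range(n)]
        z = mean(sample) / se
        p_values.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
    return p_values


# Same true effect (0.5 SD) and sample size in every replication.
ps = simulate_p_values(true_effect=0.5, sd=1.0, n=32, replications=20)
print(" ".join(f"{p:.4f}" for p in sorted(ps)))
```

Every replication has the same true effect and the same n, yet the printed P-values typically straddle several of the star categories above.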

Confidence intervals
A confidence interval is a range within which we infer the true (population or large-sample) value is likely to fall. “Likely” is usually a probability of 0.95 (for 95% limits).
[Figure: probability distribution of the true value, given the observed value, plotted against the value of the effect statistic (negative to positive); an area of 0.95 lies between the lower and upper likely limits. Below it, the same limits are represented as a confidence interval: the likely range of the true value.]

Confidence intervals (CI)
To calculate a confidence interval you need:
- a confidence level (usually 95%, but 90% or 99% can be used);
- a statistic (e.g. a group mean, M);
- the margin of error, MOE = critical value × standard error of the statistic, where the standard error of a mean is SE = SD/√n and the critical value is a z-score or t-score.
CI = [M – MOE, M + MOE] = [M – critical value × SE, M + critical value × SE]
EXAMPLE: You test the vertical jump heights of 100 athletes. The mean and standard deviation of the sample were 60 ± 20 cm. The 95% CI for this mean is:
Lower limit = 60 – 1.96 × 20/√100 = 60 – 3.92
Upper limit = 60 + 1.96 × 20/√100 = 60 + 3.92
95% CI = 56 to 64 cm
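The calculation above can be sketched as a short function. Its default critical value of 1.96 is the z value the slide uses. For n = 100, a t critical value with 99 degrees of freedom would be about 1.98, which makes little difference here. The function name `confidence_interval` is hypothetical.

```python
from math import sqrt


def confidence_interval(m: float, sd: float, n: int,
                        critical: float = 1.96) -> tuple[float, float]:
    """CI = [M - critical x SE, M + critical x SE], with SE = SD / sqrt(n)."""
    se = sd / sqrt(n)
    moe = critical * se  # margin of error
    return m - moe, m + moe


lower, upper = confidence_interval(60, 20, 100)
print(f"95% CI = {lower:.2f} to {upper:.2f} cm")  # 95% CI = 56.08 to 63.92 cm
```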

Confidence intervals (CI)
Confidence intervals also convey the precision of an estimate: a wider confidence interval means less precision. In the previous example, a smaller sample (e.g. n = 20) would have given a less precise estimate:
Lower limit = 60 – 1.96 × 20/√20 = 60 – 8.77
Upper limit = 60 + 1.96 × 20/√20 = 60 + 8.77
95% CI = 51 to 69 cm
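To see how precision depends on sample size, the sketch below computes the margin of error of the vertical-jump mean (SD = 20 cm) for several hypothetical sample sizes. It keeps the slide's z value of 1.96. The helper name `margin_of_error` is hypothetical.

```python
from math import sqrt


def margin_of_error(sd: float, n: int, critical: float = 1.96) -> float:
    """Half-width of the confidence interval for a mean: critical x SD / sqrt(n)."""
    return critical * sd / sqrt(n)


# Vertical-jump example: SD = 20 cm, mean = 60 cm.
for n in (10, 20, 50, 100, 400):
    moe = margin_of_error(20, n)
    print(f"n = {n:3d}: 95% CI = 60 ± {moe:.2f} cm")
```

Quadrupling the sample size halves the margin of error. With only 20 athletes, a t critical value (about 2.09 for 19 degrees of freedom) would be more appropriate than 1.96. It widens the margin to roughly ±9.4 cm.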

Recap
- P-value: the probability of obtaining the observed result, or a more extreme one, if the null hypothesis is true.
- NHST has several limitations. In particular, it leads to dichotomised thinking and does not tell us whether the effect is important or worthwhile.
- Confidence intervals tell us the likely range of the true (population) value. It could be red! That is, any single interval could be one of the roughly 5% of 95% intervals that miss the true value.
- Confidence intervals also convey the precision of our estimate: a larger sample size and/or a more consistent response gives a narrower confidence interval and more precision.

Magnitude-based inferences
For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects. These are usually equal and opposite in sign. Harm is the opposite of benefit, not side effects. Together they define regions of beneficial, trivial and harmful values. All you need is these two things: the confidence interval and a sense of what is important (e.g. beneficial and harmful).
[Figure: axis of the value of the effect statistic (negative to positive), divided into harmful, trivial and beneficial regions by the smallest clinically harmful effect and the smallest clinically beneficial effect.]
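The three regions can be expressed as a tiny classifier. This is a sketch that assumes beneficial effects are positive and harmful effects negative, with thresholds supplied by the user. The function name and the ±0.2 thresholds in the usage lines are hypothetical.

```python
def magnitude_region(value: float, smallest_beneficial: float,
                     smallest_harmful: float) -> str:
    """Place an effect value in the harmful, trivial or beneficial region.

    Assumes beneficial effects are positive, so
    smallest_harmful < 0 < smallest_beneficial.
    """
    if value >= smallest_beneficial:
        return "beneficial"
    if value <= smallest_harmful:
        return "harmful"
    return "trivial"


# Equal and opposite thresholds of +/-0.2 (a hypothetical choice).
for v in (-0.5, 0.1, 0.35):
    print(v, magnitude_region(v, 0.2, -0.2))
```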

Magnitude-based inferences
Put the confidence interval and these regions together to make a decision about clinically significant, clear or decisive effects.
[Figure: ten confidence intervals plotted against the harmful, trivial and beneficial regions of the value of the effect statistic, each labelled with its MBI decision and whether it is statistically significant. Callout: “Why hypothesis testing can be unethical and impractical!”]
Interval   MBI decision                   Statistically significant?
1          Use it.                        Yes
2          Use it.                        Yes
3          Use it.                        No
4          Depends                        No
5          Don’t use it.                  Yes
6          Don’t use it.                  No
7          Don’t use it.                  No
8          Don’t use it.                  Yes
9          Don’t use it.                  Yes
10         Unclear: need more research.   No
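Most rows of that table follow from a simple rule, sketched below under our own assumptions. The thresholds are equal and opposite (±swc), positive values are beneficial, and "statistically significant" means the interval excludes zero. The rule does not attempt the "Depends" case, and `ci_decision` is a hypothetical name rather than part of any published spreadsheet.

```python
def ci_decision(lower: float, upper: float, swc: float) -> tuple[str, bool]:
    """Simplified clinical call from a confidence interval (a sketch).

    swc is the smallest worthwhile effect: beneficial values are >= +swc and
    harmful values are <= -swc. Returns (decision, statistically_significant).
    """
    could_benefit = upper >= swc           # interval reaches the beneficial region
    could_harm = lower <= -swc             # interval reaches the harmful region
    significant = lower > 0 or upper < 0   # interval excludes zero
    if could_benefit and could_harm:
        return "Unclear: need more research.", significant
    if could_benefit:
        return "Use it.", significant
    return "Don't use it.", significant


print(ci_decision(-0.1, 1.0, 0.2))  # reaches benefit, excludes harm, not significant
```

The usage line shows the case the slide calls unethical to miss: the rule says "Use it." although a hypothesis test would report no significant effect.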

Magnitude-based inferences
We calculate the probabilities that the true effect could be clinically beneficial, trivial or harmful (Pbeneficial, Ptrivial, Pharmful). Spreadsheets are available at sportsci.org. The Ps allow a more detailed call on magnitude, as follows…
[Figure: probability distribution of the true value around the observed value, with the smallest harmful and smallest beneficial values marked; the areas give Pbeneficial = 0.80, Ptrivial = 0.15 and Pharmful = 0.05.]
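The three probabilities are areas of the distribution of the true value that lie beyond each threshold. The sketch below treats that distribution as normal, centred on the observed value with SD equal to the standard error. This is a normal approximation; the sportsci.org spreadsheets work from the t distribution with the study's degrees of freedom. The function name and example numbers are hypothetical.

```python
from statistics import NormalDist


def clinical_chances(observed: float, se: float, smallest_beneficial: float,
                     smallest_harmful: float) -> dict[str, float]:
    """Chances that the true effect is beneficial, trivial or harmful.

    Normal approximation: true value ~ Normal(observed, se).
    Assumes smallest_harmful < smallest_beneficial (positive = beneficial).
    """
    true_value = NormalDist(mu=observed, sigma=se)
    p_beneficial = 1.0 - true_value.cdf(smallest_beneficial)
    p_harmful = true_value.cdf(smallest_harmful)
    p_trivial = 1.0 - p_beneficial - p_harmful
    return {"beneficial": p_beneficial, "trivial": p_trivial, "harmful": p_harmful}


chances = clinical_chances(observed=1.2, se=0.5,
                           smallest_beneficial=1.0, smallest_harmful=-1.0)
print({k: f"{100 * v:.1f}%" for k, v in chances.items()})
```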

Magnitude-based inferences
Making a more detailed call on magnitudes using chances of benefit and harm.
[Figure: confidence intervals plotted against the harmful, trivial and beneficial regions of the value of the effect statistic, each labelled with the chances (%) that the effect is harmful / trivial / beneficial and the resulting call.]
Chances (%) harmful/trivial/beneficial   Call
0/0/100                                  Most likely beneficial
0/7/93                                   Likely beneficial
2/33/65                                  Possibly beneficial
1/59/40                                  Mechanistic: possibly trivial; clinical: unclear
0/97/3                                   Very likely trivial
2/94/4                                   Likely trivial
28/70/2                                  Possibly harmful
74/26/0                                  Possibly harmful
97/3/0                                   Very likely harmful
9/60/31                                  Unclear
Risk of harm >0.5% is unacceptable, unless the chance of benefit is high enough.
For a mechanistic inference, the spreadsheet shows the effect as unclear if the confidence interval, which represents uncertainty about the true value, overlaps values that are substantial in both a positive and a negative sense. Otherwise, the effect is characterized with a statement about the chance that it is trivial, positive or negative. For a clinical inference, the effect is shown as unclear if its chance of benefit is at least promising but its risk of harm is unacceptable. Otherwise, the effect is characterized with a statement about the chance that it is trivial, beneficial or harmful.
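The two "unclear" rules above can be sketched as follows. We assume a 90% confidence interval for the mechanistic rule, so "overlaps substantial values" becomes a chance of more than 5% on each side; this is an assumption, although the spreadsheet example in this lecture uses 90% limits. For the clinical rule we take promising benefit as >25% and unacceptable harm as >0.5%. We also apply the lecture's exception that harm is tolerated when the odds ratio of benefit to harm exceeds 66. The function names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def mechanistic_unclear(p_positive: float, p_negative: float,
                        limit: float = 0.05) -> bool:
    """Unclear if the true value could be substantially positive AND negative.

    limit = 0.05 corresponds to a 90% confidence interval (an assumption).
    """
    return p_positive > limit and p_negative > limit


def clinical_unclear(p_beneficial: float, p_harmful: float) -> bool:
    """Unclear if benefit is promising (>25%) but harm is unacceptable (>0.5%),
    unless the odds ratio of benefit to harm exceeds 66."""
    promising = p_beneficial > 0.25
    unacceptable = p_harmful > 0.005
    if not (promising and unacceptable):
        return False
    return odds(p_beneficial) / odds(p_harmful) <= 66


# Rows from the slide, as (harmful, trivial, beneficial) fractions.
for harm, trivial, benefit in [(0.02, 0.33, 0.65), (0.09, 0.60, 0.31), (0.02, 0.94, 0.04)]:
    print(harm, trivial, benefit,
          "mechanistic unclear:", mechanistic_unclear(benefit, harm),
          "clinical unclear:", clinical_unclear(benefit, harm))
```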

Magnitude-based inferences
Use this table for the plain-language version of chances. An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it. But you can tolerate higher chances of harm if the chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful. Use an odds ratio of benefit/harm of >66 in such situations.
The effect (beneficial/trivial/harmful)…   Probability   Chances    Odds
is almost certainly not…                    <0.005        <0.5%      <1:199
is very unlikely to be…                     0.005–0.05    0.5–5%     1:199–1:19
is unlikely to be…, is probably not…        0.05–0.25     5–25%      1:19–1:3
is possibly (not)…, may (not) be…           0.25–0.75     25–75%     1:3–3:1
is likely to be…, is probably…              0.75–0.95     75–95%     3:1–19:1
is very likely to be…                       0.95–0.995    95–99.5%   19:1–199:1
is almost certainly…                        >0.995        >99.5%     >199:1
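The table is a lookup from a probability to a descriptor, which can be sketched as below. A value falling exactly on a band boundary is assigned to the higher band; this is our arbitrary choice, since the table does not say. The function name `describe_chance` is hypothetical.

```python
# Upper bound (exclusive) of each band in the table above, as probabilities.
BANDS = [
    (0.005, "almost certainly not"),
    (0.05, "very unlikely"),
    (0.25, "unlikely"),
    (0.75, "possibly"),
    (0.95, "likely"),
    (0.995, "very likely"),
]


def describe_chance(p: float) -> str:
    """Plain-language descriptor for the chance p (0 to 1) that an effect is
    beneficial, trivial or harmful."""
    for upper, label in BANDS:
        if p < upper:
            return label
    return "almost certainly"


for p in (0.78, 0.22, 0.03):
    print(f"{100 * p:.0f}%: {describe_chance(p)}")
```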

Two examples of use of the spreadsheet for clinical chances
Both these effects are clinically decisive, clear, or significant.
Inputs                                         Example 1     Example 2
P value                                        0.03          0.20
Value of statistic                             1.5           2.4
Confidence level (%)                           90            90
Degrees of freedom                             18            18
Confidence limits (lower, upper)               0.4, 2.6      -0.7, 5.5
Threshold values for clinical chances (+, -)   1, -1         1, -1
Chances (% or odds) that the true value of the statistic is:
clinically positive   Example 1: 78% (3:1), likely, probable          Example 2: 78% (3:1), likely, probable
clinically trivial    Example 1: 22% (1:3), unlikely, probably not    Example 2: 19% (1:4), unlikely, probably not
clinically negative   Example 1: odds 1:2071, almost certainly not    Example 2: 3% (1:30), very unlikely
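Behind these examples is a conversion from an exact P-value and an effect to a standard error, and from there to chances. A sketch of that step follows. Unlike the spreadsheet, which uses the t distribution with the stated degrees of freedom, it uses a normal approximation. Its results therefore differ slightly. The function name is hypothetical.

```python
from statistics import NormalDist


def chances_from_p_value(effect: float, p_value: float, positive: float,
                         negative: float) -> tuple[float, float, float]:
    """Chances (positive, trivial, negative) from an effect and its exact P value.

    Recovers the standard error from the two-sided P value with a normal
    approximation (needs effect != 0 and 0 < p_value < 1).
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)  # |z| that yields this P
    se = abs(effect) / z
    true_value = NormalDist(mu=effect, sigma=se)
    p_pos = 1.0 - true_value.cdf(positive)
    p_neg = true_value.cdf(negative)
    return p_pos, 1.0 - p_pos - p_neg, p_neg


for effect, p in [(1.5, 0.03), (2.4, 0.20)]:
    pos, triv, neg = chances_from_p_value(effect, p, positive=1.0, negative=-1.0)
    print(f"effect {effect}, P = {p}: {100 * pos:.1f}% / {100 * triv:.1f}% / {100 * neg:.2f}%")
```

For both examples this gives about 77% for clinically positive, against the spreadsheet's 78%. The tiny negative chance in Example 1 comes out smaller than the spreadsheet's because normal tails are lighter than t tails. The sketch also shows why an exact P-value is needed: with only "P<0.05" the standard error cannot be recovered.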

How to publish clinical chances
Example of a table from a randomized controlled trial:
TABLE 1. Differences in improvements in kayaking sprint speed between slow, explosive and control training groups.
Compared groups       Mean improvement (%) and 90% confidence limits   Qualitative outcome(a)
Slow - control        3.1; ±1.6                                        Almost certainly beneficial
Explosive - control   2.6; ±1.2                                        Very likely beneficial
Slow - explosive      0.5; ±1.4                                        Unclear
(a) With reference to a smallest worthwhile change of 0.5%.

Recap
- For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
- Spreadsheets at sportsci.org provide the % likelihood that an effect is harmful | trivial | beneficial.
- Effects that cross the thresholds for both benefit and harm are classed as unclear.
- An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it.
- But you can tolerate higher chances of harm if the chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful. Use an odds ratio of benefit/harm of >66 in such situations.
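The odds ratio compares the odds of benefit with the odds of harm. Our reading of where the figure of 66 comes from is that it is the ratio at the boundary case of 25% benefit and 0.5% harm; the arithmetic below checks this and shows that the 3% harm / 76% benefit example clears the threshold. Here p_b and p_h are our symbols for the chances of benefit and harm.

```latex
\[
\mathrm{OR} = \frac{p_b/(1-p_b)}{p_h/(1-p_h)}, \qquad
\frac{0.25/0.75}{0.005/0.995} \approx 66.3, \qquad
\frac{0.76/0.24}{0.03/0.97} \approx 102.4 > 66
\]
```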

Smallest worthwhile difference?
Problem: what's the smallest clinically important effect? “If you can't answer this question, quit the field.” This problem also applies with hypothesis testing, because the smallest important effect determines the sample size you need to test the null properly.
- 0.3 of a CV (the coefficient of variation of an athlete's performance from race to race) gives a top athlete one extra medal every 10 races. This is the smallest important change in performance to aim for in research on, or intended for, elite athletes.
- 0.9, 1.6, 2.5 and 4.0 of a CV give an extra 3, 5, 7 and 9 medals per 10 races (the thresholds for moderate, large, very large and extremely large effects).
References: Hopkins et al. MSSE 31, 472–485, 1999 and MSSE 41, 3–12, 2009.

Smallest worthwhile difference?
The default for most other populations and effects is Cohen's set of smallest values. You express the difference or change in the mean as a fraction of the between-subject standard deviation (difference in means/SD), much like a z score or a t statistic. In a controlled trial, it's the SD of all subjects in the pre-test, not the SD of the change scores. The smallest worthwhile difference or change is 0.20, which is equivalent to moving from the 50th to the 58th percentile.
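For a controlled trial, the standardised effect described above can be sketched as the difference in mean changes divided by the pre-test SD of all subjects. The function name and the strength data in the usage lines are hypothetical.

```python
from statistics import mean, stdev


def standardised_effect(pre_test_all: list[float],
                        change_treatment: list[float],
                        change_control: list[float]) -> float:
    """Difference in mean changes divided by the pre-test SD of all subjects.

    Uses the between-subject SD at pre-test, not the SD of the change scores.
    """
    diff = mean(change_treatment) - mean(change_control)
    return diff / stdev(pre_test_all)


# Hypothetical strength data (kg): six subjects, three per group.
pre = [100, 110, 120, 130, 140, 120]
es = standardised_effect(pre, change_treatment=[6, 7, 8], change_control=[2, 3, 4])
print(f"standardised effect = {es:.2f}; worthwhile: {es >= 0.20}")
```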

Interpretation of standardised difference or change in means
[Figure: example of the effect of a treatment on strength, showing pre- and post-treatment distributions for a trivial effect (0.1 × SD) and a very large effect (3 × SD).]
Descriptor        Cohen      Hopkins
trivial           <0.2       <0.2
small             0.2–0.5    0.2–0.6
moderate          0.5–0.8    0.6–1.2
large             >0.8       1.2–2.0
very large        ?          2.0–4.0
extremely large   ?          >4.0
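The Hopkins column of the table can be applied with a short lookup. This sketch classifies the absolute size of the effect, so a negative effect gets the same descriptor as a positive one of equal size. The function name is hypothetical.

```python
# Lower bound of each descriptor on the Hopkins scale (absolute values).
HOPKINS_SCALE = [
    (4.0, "extremely large"),
    (2.0, "very large"),
    (1.2, "large"),
    (0.6, "moderate"),
    (0.2, "small"),
]


def hopkins_descriptor(es: float) -> str:
    """Magnitude descriptor for a standardised effect; direction is ignored."""
    size = abs(es)
    for lower, label in HOPKINS_SCALE:
        if size >= lower:
            return label
    return "trivial"


# The two strength examples from the slide: 0.1 x SD and 3 x SD.
print(hopkins_descriptor(0.1), "|", hopkins_descriptor(3.0))
```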

Smallest worthwhile difference?
Relationship of standardised effect to difference or change in percentile:
[Figure: two strength distributions offset by a standardised effect of 0.20, marking an athlete on the 50th percentile (area = 50%) and an athlete on the 58th percentile (area = 58%).]
Standardised effect   Percentile change
0.20                  50 → 58
0.20                  80 → 85
0.20                  95 → 97
0.25                  50 → 60
1.00                  50 → 84
2.00                  50 → 98
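The percentile changes in the table follow from a normal distribution. Convert the starting percentile to a z score, add the standardised effect, and convert back. The sketch below assumes the measure is normally distributed; the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def shifted_percentile(start_percentile: float, standardised_effect: float) -> float:
    """New percentile after shifting by a standardised effect (normal data assumed)."""
    z = STD_NORMAL.inv_cdf(start_percentile / 100.0)
    return 100.0 * STD_NORMAL.cdf(z + standardised_effect)


for start, es in [(50, 0.20), (80, 0.20), (95, 0.20), (50, 0.25), (50, 1.00), (50, 2.00)]:
    print(f"{es:.2f}: {start} -> {shifted_percentile(start, es):.0f}")
```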

Smallest worthwhile difference?
Threshold values for the magnitude descriptors (trivial, small, moderate, large, very large, nearly perfect, perfect) of different effect statistics:
Correlation:                   0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1
Diff. in means (standardised): 0.2, 0.6, 1.2, 2.0, 4.0, infinite
Freq. diff (%):                10, 30, 50, 70, 90, 100
Rel. risk:                     1.0, 1.9, 3.0, 5.7, 19
Odds ratio:                    1.5, 3.5, 9.0, 32, 360, infinite

Limitations of magnitude-based inferences
- Problem: these new approaches are not yet mainstream.
- Confidence limits at least are coming in, so look for them and interpret the importance of the lower and upper limits.
- You can use a spreadsheet to convert a published P value into a more meaningful magnitude-based inference. If the authors state “P<0.05”, you can’t do it properly; if they state “P>0.05” or “NS”, you can’t do it at all.
- More difficult to present and discuss results?
- ‘Magnitude-based inferences under attack’: http://sportsci.org/2014/inbrief.htm#MBI

Summary
- MBIs are an alternative to traditional NHST.
- For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
- Smallest worthwhile effects may be based on the variability of performance (e.g. 0.3 of a CV), or standardised effects may be used (e.g. Cohen’s d).
- Spreadsheets for carrying out MBIs are available at sportsci.org.
- MBI is growing in popularity, but it is still not understood or accepted by many journals and academics.
- Confidence intervals convey far more information than a P-value alone and should be presented where possible.

Recommended Reading
Simulation software: http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci
Batterham, A. M. & Hopkins, W. G. (2006). Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1, 50–57.
Hopkins, W. G., Marshall, S. W., Batterham, A. M. & Hanin, J. (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise, 41, 3–12.
http://sportsci.org/