Understanding p-values Annie Herbert Medical Statistician Research and Development Support Unit 0161 2064567.

Presentation transcript:

Understanding p-values Annie Herbert Medical Statistician Research and Development Support Unit

Outline
- Population & Sample
- What is a p-value?
- P-values vs. Confidence Intervals
- One-sided and two-sided tests
- Multiplicity
- Common types of test
- Computer outputs

Timetable
Time      Task
60 mins   Presentation
20 mins   Coffee Break
90 mins   Practical Tasks in IT Room

‘Population’ and ‘Sample’
- Studying population of interest
- Usually would like to know typical value and spread of outcome measure in population
- Data from entire population usually impossible or inefficient/expensive, so take a sample (even census data can have missing values)
- Want sample to be ‘representative’ of population
- Randomise

Randomised Controlled Trial (RCT)
POPULATION -> SAMPLE -> RANDOMISATION -> GROUP 1 / GROUP 2 -> OUTCOME

5 Key Questions
- What is the target population?
- What is the sample, and is it representative of the target population?
- What is the main research question?
- What is the main outcome?
- What is the main explanatory factor?

Example - Dolphin Study
- Population: people suffering from mild to moderate depression
- Sample: outpatients diagnosed with mild to moderate depression, recruited through internet, radio, newspapers and hospitals
- Question: does animal-facilitated therapy help treatment of depression?
- Outcome: Hamilton depression score at baseline and end of treatment
- Explanatory Factors: whether patients participated in dolphin programme (treatment) or outdoor nature programme (control)

Dolphin Study - Making Comparisons
Hamilton Depression Score   Treatment Group (N=15)   Control Group (N=15)
Baseline, Mean (SD)         14.5 (2.6)               14.5 (2.2)
2 Weeks, Mean (SD)          7.3 (2.5)                10.9 (3.4)
Reduction, Mean (SD)        7.3 (3.5)                3.6 (3.4)
Source: BMJ - Antonioli & Reveley, 2005;331:1231 (26 November)

Dolphin Study - does the treatment make a difference?
- For both groups the Hamilton depression score decreased between baseline and 2 weeks
- Clearly for our sample the treatment group has a better mean reduction, by 7.3 - 3.6 = 3.7 points
- What does this tell us about the target population?

What is a p-value?
- Assume that there is really no difference in the target population (this is the null hypothesis)
- p-value: how likely is it that we would see at least as much difference as we did in our sample?
- Dolphin study example: if treatments are equally effective, how likely is it that we would see a difference in mean reduction between the treatment and control groups of at least 3.7 points? P=0.007
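As a rough check on the dolphin study figure, the sketch below computes a two-sided p-value from the published summary statistics (mean reduction, SD and N per group). It assumes an independent-samples t-test with equal variances; the slides do not say which test produced P=0.007, so this is an illustration and not necessarily the authors' method. The helper names `t_pdf` and `two_sided_p` are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area under the t density outside (-|t|, |t|).

    The area between 0 and |t| is found with Simpson's rule (steps must be even).
    """
    t = abs(t)
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area

# Reduction in Hamilton depression score, from the comparison table
n1, mean1, sd1 = 15, 7.3, 3.5   # treatment group
n2, mean2, sd2 = 15, 3.6, 3.4   # control group

# Assumption: equal-variance (pooled) t-test
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se
p_value = two_sided_p(t_stat, df)

print(f"difference = {mean1 - mean2:.1f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Under these assumptions the result lands close to the P=0.007 quoted on the slide; small differences are to be expected because the inputs are rounded summary values.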

Assessing the p-value
- Large p-value:
  - Quite likely to see these results by chance
  - Cannot be sure of a difference in the target population
- Small p-value:
  - Unlikely to see these results by chance
  - There may be a difference in the target population

What is a small/large p-value?
- Cut-off point (‘significance level’) is arbitrary
- Significance level set to 5% (0.05) by convention
- Regard the p-value as the ‘weight of evidence’
- P < 5%: strong evidence of a difference
- P ≥ 5%: no evidence of a difference (does not mean evidence of no difference)

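Types of Statistical Error
- Type I Error = rejecting the null hypothesis when it is in fact true (a false positive). Its probability is the significance level.
- Type II Error = not rejecting the null hypothesis when it is false (a false negative). Its probability is 1 minus the power of the test.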
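The two error rates can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the slides: outcomes are normally distributed with a known SD of 1, each group has 15 participants, a two-sided z-test is used, and the "true difference" of 1 SD in the second scenario is an arbitrary illustrative value. The function name `rejection_rate` is hypothetical.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n=15, alpha=0.05, n_sim=10_000, seed=1):
    """Fraction of simulated two-group studies in which the null is rejected."""
    rng = random.Random(seed)
    se = (2 / n) ** 0.5            # SE of the difference in means when SD = 1
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_sim):
        group_a = [rng.gauss(true_diff, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(group_a) / n - sum(group_b) / n) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
        if p < alpha:
            rejections += 1
    return rejections / n_sim

# No real difference: every rejection is a Type I error
type_1_rate = rejection_rate(true_diff=0.0)

# Real difference of 1 SD (illustrative): every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_diff=1.0)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

With no true difference, the rejection rate should sit near the chosen significance level of 0.05. The Type II rate depends on the size of the true difference and on the sample size, which is why sample size calculations matter.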

Confidence Intervals
- Confidence interval = “range of values that we can be confident will contain the true value of the population”
- The “give or take a bit” for best estimate
- Dolphin study example: what is the range of values that we can be confident contains the true difference of mean reduction between treatment and control group? (95% CI: 1.1 to 6.2)
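A 95% confidence interval for the difference in mean reduction can be sketched from the same summary statistics as "estimate ± critical value × standard error". The example assumes an equal-variance t-based interval, which the slides do not confirm, and takes the critical value 2.048 (97.5th percentile of the t distribution with 28 degrees of freedom) from standard t tables.

```python
import math

# Reduction in Hamilton depression score, from the comparison table
n1, mean1, sd1 = 15, 7.3, 3.5   # treatment group
n2, mean2, sd2 = 15, 3.6, 3.4   # control group

# Assumption: pooled standard error, as in an equal-variance t-test
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRIT_28 = 2.048               # from t tables: 97.5th percentile, df = 28

diff = mean1 - mean2
lower = diff - T_CRIT_28 * se
upper = diff + T_CRIT_28 * se

print(f"difference = {diff:.1f}, 95% CI: {lower:.1f} to {upper:.1f}")
```

The interval comes out close to the 1.1 to 6.2 quoted on the slide; a small discrepancy at the upper end is plausible because the inputs here are rounded and the original analysis may have used a different method. Because the interval excludes 0, it agrees with the p-value being below 0.05.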

p-values vs. Confidence Intervals
- p-value:
  - Weight of evidence to reject null hypothesis
  - No clinical interpretation
- Confidence Interval:
  - Can be used to reject null hypothesis
  - Clinical interpretation
  - Effect size
  - Direction of effect
  - Precision of population estimate

Statistical Significance vs. Clinical Importance
- p-value < 0.05, CI doesn’t contain 0: indicates a statistically significant difference.
- What is the size of this difference, and is it enough to change current practice?
- E.g. Dolphin study: P=0.007, 95% CI = (1.1, 6.2)
- Expense? Side-effects? Ease of use?
- Consider clinically important difference when making sample size calculations/interpreting results

One-sided & Two-sided Tests
- One-sided test: a difference is only possible in one particular direction.
- Two-sided test: interested in a difference between groups, whether worse or better.
- Dolphin study example: is the treatment reduction mean less or greater than the control reduction mean?
- In real life, almost always two-sided.

Multiplicity
(Table: number of tests against chance of at least one significant value; values lost in extraction)
E.g. Significance level = 0.05: 1/20 tests will be ‘significant’, even when no difference in target population
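The relationship between the number of tests and the chance of at least one ‘significant’ result can be worked out directly. The sketch below assumes the tests are independent and that there is no real difference in any of them, in which case the chance is 1 - (1 - α)^k for k tests at significance level α. The function name is illustrative, and the choice of k values is arbitrary.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Chance of at least one 'significant' result among n_tests independent
    tests when there is no real difference in any of them."""
    return 1 - (1 - alpha) ** n_tests

print("Number of tests   Chance of at least one significant value")
for k in (1, 2, 5, 10, 20):
    print(f"{k:>15}   {chance_of_false_positive(k):.2f}")
```

With 20 tests the chance is already well over one half, which is why keeping the number of tests small and naming a primary outcome in advance matters.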

Reducing Multiplicity Problems
- Pick one outcome to be primary
- Specify tests in advance
- Focus on research question and keep number of tests to a minimum
- Do not necessarily believe a single significant result (repeat experiment, use meta-analysis)

Types of Outcome Data
- Categorical
  - Example: Yes/No
  - Graphs: Bar/Pie Chart
  - Summary: Frequency/Proportion
  - Test: Chi-squared
- Numerical/Continuous
  - Example: Weight
  - Graphs: Histogram/Boxplot
  - Summary: Mean (SD), Median (IQR)
  - Test (two groups): t-test or Mann-Whitney U

Notable Exceptions
- Comparing more than two groups
- Continuous explanatory factors
- Paired Data:
  - Paired t-test
  - Wilcoxon
  - McNemar
- Time-to-event Data: Log-rank test
(For all of the above, seek statistical advice)

Computer Output - StatsDirect

Computer Output - SPSS

Final Pointers
- Plan analyses in advance
  - Seek statistical advice
- Start with graphs and summary statistics
- Keep number of tests to a minimum
- Include confidence intervals
- ‘Absence of evidence is not evidence of absence’