Significance testing and confidence intervals

Slides:



Advertisements
Similar presentations
Confidence intervals Kristin Tolksdorf (based on previous EPIET material) 18th EPIET/EUPHEM Introductory course
Advertisements

Significance testing Ioannis Karagiannis (based on previous EPIET material) 18 th EPIET/EUPHEM Introductory course
Significance testing and confidence intervals Ágnes Hajdu EPIET Introductory course
Sample size calculation
Critical review of significance testing F.DAncona from a Alain Morens lecture 2006.
HYPOTHESIS TESTING Four Steps Statistical Significance Outcomes Sampling Distributions.
Evaluating Hypotheses Chapter 9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics.
Evaluating Hypotheses Chapter 9 Homework: 1-9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics ~
Chapter 9 Hypothesis Testing.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Inferential Statistics
Thomas Songer, PhD with acknowledgment to several slides provided by M Rahbar and Moataza Mahmoud Abdel Wahab Introduction to Research Methods In the Internet.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Hypothesis Testing:.
Section 9.1 Introduction to Statistical Tests 9.1 / 1 Hypothesis testing is used to make decisions concerning the value of a parameter.
1 Today Null and alternative hypotheses 1- and 2-tailed tests Regions of rejection Sampling distributions The Central Limit Theorem Standard errors z-tests.
CHAPTER 16: Inference in Practice. Chapter 16 Concepts 2  Conditions for Inference in Practice  Cautions About Confidence Intervals  Cautions About.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 22 Using Inferential Statistics to Test Hypotheses.
The Argument for Using Statistics Weighing the Evidence Statistical Inference: An Overview Applying Statistical Inference: An Example Going Beyond Testing.
Instructor Resource Chapter 5 Copyright © Scott B. Patten, Permission granted for classroom use with Epidemiology for Canadian Students: Principles,
Hypothesis Testing Hypothesis Testing Topic 11. Hypothesis Testing Another way of looking at statistical inference in which we want to ask a question.
Significance testing and confidence intervals Col Naila Azam.
Biostatistics Class 6 Hypothesis Testing: One-Sample Inference 2/29/2000.
Chapter 10: Analyzing Experimental Data Inferential statistics are used to determine whether the independent variable had an effect on the dependent variance.
1 Chapter 8 Introduction to Hypothesis Testing. 2 Name of the game… Hypothesis testing Statistical method that uses sample data to evaluate a hypothesis.
How confident are we in the estimation of mean/proportion we have calculated?
Introduction to Inference: Confidence Intervals and Hypothesis Testing Presentation 4 First Part.
Issues concerning the interpretation of statistical significance tests.
© Copyright McGraw-Hill 2004
Chapter 13 Understanding research results: statistical inference.
Uncertainty and confidence Although the sample mean,, is a unique number for any particular sample, if you pick a different sample you will probably get.
Confidence Intervals and Hypothesis Testing Mark Dancox Public Health Intelligence Course – Day 3.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Chapter 9 Introduction to the t Statistic
+ Homework 9.1:1-8, 21 & 22 Reading Guide 9.2 Section 9.1 Significance Tests: The Basics.
15 Inferential Statistics.
Chapter Nine Hypothesis Testing.
HYPOTHESIS TESTING.
Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4
Sample size calculation
CHAPTER 9 Testing a Claim
Lecture Nine - Twelve Tests of Significance.
Unit 5: Hypothesis Testing
Inference and Tests of Hypotheses
Significance testing Introduction to Intervention Epidemiology
1. Estimation ESTIMATION.
HYPOTHESIS TESTING Asst Prof Dr. Ahmed Sameer Alnuaimi.
CHAPTER 9 Testing a Claim
Understanding Results
Sample Size Estimation
Statistical inference: distribution, hypothesis testing
Random error, Confidence intervals and P-values
Hypothesis Tests for a Population Mean,
Chapter 9 Hypothesis Testing.
Review for Exam 2 Some important themes from Chapters 6-9
Chapter Nine Part 1 (Sections 9.1 & 9.2) Hypothesis Testing
Hypothesis Testing.
Interpreting Basic Statistics
Chapter 7: The Normality Assumption and Inference with OLS
Statistics II: An Overview of Statistics
Statistical significance using p-value
CHAPTER 9 Testing a Claim
Interpreting Epidemiologic Results.
CHAPTER 16: Inference in Practice
CHAPTER 9 Testing a Claim
CHAPTER 9 Testing a Claim
Type I and Type II Errors
Statistical Test A test of significance is a formal procedure for comparing observed data with a claim (also called a hypothesis) whose truth we want to.
Presentation transcript:

Significance testing and confidence intervals Ágnes Hajdu EPIET Introductory course 3.10.2011

The idea of statistical inference Generalisation to the population Conclusions based on the sample Population Hypotheses Sample

Inferential statistics Uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. Two basic approaches: Hypothesis testing Estimation Common goal: conclude on the effect of an independent variable (exposure) on a dependent variable (outcome).

The aim of a statistical test To reach a scientific decision (“yes” or “no”) on a difference (or effect), on a probabilistic basis, on observed data.

Why significance testing? Botulism outbreak in Italy: “The risk of illness was higher among diners who ate home preserved green olives (RR=3.6).” Is the association due to chance?

The two hypothesis! There is NO difference between the two groups (=no effect) Null Hypothesis (H0) (e.g.: RR=1) There is a difference between the two groups (=there is an effect) Alternative Hypothesis (H1) (eg: RR=3.6) When you perform a test of statistical significance you usually reject or do not reject the Null Hypothesis (H0)

Botulism outbreak in Italy Null hypothesis (H0): “There is no association between consumption of green olives and Botulism.” Alternative hypothesis (H1): “There is an association between consumption of green olives and Botulism.”

Hypothesis, testing and null hypothesis Tests of statistical significance Data not consistent with H0 : H0 can be rejected in favour of some alternative hypothesis H1 (the objective of our study). Data are consistent with the H0 : H0 cannot be rejected You cannot say that the H0 is true. You can only decide to reject it or not reject it.

How to decide when to reject the null hypothesis? H0 rejected using reported p value p-value = probability that our result (e.g. a difference between proportions or a RR) or more extreme values could be observed under the null hypothesis

p values – practicalities Small p values = low degree of compatibility between H0 and the observed data: you reject H0, the test is significant Large p values = high degree of compatibility between H0 and the observed data: you don’t reject H0, the test is not significant We can never reduce to zero the probability that our result was not observed by chance alone

Levels of significance – practicalities We need of a cut-off ! 0.01 0.05 0.10 p value > 0.05 = H0 non rejected (non significant) p value ≤ 0.05 = H0 rejected (significant) BUT: Give always the exact p-value rather than „significant“ vs. „non-significant“.

Examples from the literature ”The limit for statistical significance was set at p=0.05.” ”There was a strong relationship (p<0.001).” ”…, but it did not reach statistical significance (ns).” „ The relationship was statistically significant (p=0.0361)” p=0.05  Agreed convention Not an absolute truth ”Surely, God loves the 0.06 nearly as much as the 0.05” (Rosnow and Rosenthal, 1991)

p = 0.05 and its errors Level of significance, usually p = 0.05 p value used for decision making But still 2 possible errors: H0 should not be rejected, but it was rejected : Type I or alpha error H0 should be rejected, but it was not rejected : Type II or beta error

Types of errors Truth Decision based on the p value No diff Diff Decision based on the p value No diff Diff H0 is “true” but rejected: Type I or  error H0 is “false” but not rejected: Type II or  error

More on errors Probability of Type I error: Value of α is determined in advance of the test The significance level is the level of α error that we would accept (usually 0.05) Probability of Type II error: Value of β depends on the size of effect (e.g. RR, OR) and sample size 1-β: Statistical power of a study to detect an effect on a specified size (e.g. 0.80) Fix β in advance: choose an appropriate sample size

Even more on errors H1 is true H0 is true b a Test statistics T

Principles of significance testing Formulate the H0 Test your sample data against H0 The p value tells you whether your data are consistent with H0 i.e, whether your sample data are consistent with a chance finding (large p value), or whether there is reason to believe that there is a true difference (association) between the groups you tested You can only reject H0, or fail to reject it!

Quantifying the association Test of association of exposure and outcome E.g. Chi2 test or Fisher’s exact test Comparison of proportions Chi2-value quantifies the association The larger the Chi2-value, the smaller the p value the more the observed data deviate from the assumption of independence (no effect).

Chi-square value = sum of all cells: for each cell, subtract the expected number from the observed number, square the difference, and divide by the expected number

Botulism outbreak in Italy 2x2 table Expected number of ill and not ill for each cell : Ill Non ill 9 43 4 79 Olives 52 x10% ill x 90% non-ill 5 47 No olives 83 x10% ill 8 75 x 90% non-ill 13 122 135 Expected proportion of ill and not ill : 10 % 90 %

Chi-square value = 5.73 Botulism outbreak in Italy p = 0.016 Ill Non ill Olives No olives

Botulism outbreak in Italy “The relative risk (RR) of illness among diners who ate home preserved green olives was 3.6 (p=0.016).” The p-value is smaller than the chosen significance level of a = 5%. → Null hypothesis can be rejected. There is a 0.016 probability (16/1000) that the observed association could have occured by chance, if there were no true association between eating olives and illness.

Epidemiology and statistics

Criticism on significance testing “Epidemiological application need more than a decision as to whether chance alone could have produced association.” (Rothman et al. 2008) → Estimation of an effect measure (e.g. RR, OR) rather than significance testing.

Why estimation? Botulism outbreak in Italy: “The risk of illness was higher among diners who ate home preserved green olives (RR=3.6).” How confident can we be in the result? What is the precision of our point estimate?

The epidemiologist needs measurements rather than probabilities 2 is a test of association OR, RR are measures of association on a continuous scale infinite number of possible values The best estimate = point estimate Range of values allowing for random variability: Confidence interval  precision of the point estimate

Confidence interval (CI) Range of values, on the basis of the sample data, in which the population value (or true value) may lie. Frequently used formulation: „If the data collection and analysis could be replicated many times, the CI should include the true value of the measure 95% of the time .”

Indicates the amount of random error in the estimate Confidence interval (CI) e.g. CI for means 95% CI = x – 1.96 SE up to x + 1.96 SE a = 5% 1 - α α/2 α/2 s Lower limit upper limit of 95% CI of 95% CI Indicates the amount of random error in the estimate Can be calculated for any „test statistic“, e.g.: means, proportions, ORs, RRs

CI terminology RR = 1.45 (0.99 – 2.1) Point estimate Confidence interval RR = 1.45 (0.99 – 2.1) Lower confidence limit Upper confidence limit

Width of confidence interval depends on … The amount of variability in the data The size of the sample The arbitrary level of confidence you desire for your study (usually 90%, 95%, 99%) A common way to use CI regarding OR/RR is : If 1.0 is included in CI  non significant If 1.0 is not included in CI  significant

Looking the CI RR = 1 A B Large RR Study A, large sample, precise results, narrow CI – SIGNIFICANT Study B, small size, large CI - NON SIGNIFICANT Study A, effect close to NO EFFECT Study B, no information about absence of large effect

More studies are better or worse? Decision making based on results from a collection of studies is not facilitated when each study is classified as a YES or NO decision. 1 RR  20 studies with different results... Need to look at the point estimation and its CI But also consider its clinical or biological significance

Botulism outbreak in Italy How confident can we be in the result? Relative risk = 3.6 (point estimate) 95% CI for the relative risk: (1.17 ; 11.07) The probability that the CI from 1.17 to 11.07 includes the true relative risk is 95%.

Botulism outbreak in Italy “The risk of illness was higher among diners who ate home preserved green olives (RR=3.6, 95% CI 1.17 to 11.07).”

The p-value (or CI) function A graph showing the p value for all possible values of the estimate (e.g. OR or RR). Quantitative overview of the statistical relation between exposure and disease for the set of data. All confidence intervals can be read from the curve. The function can be constructed from the confidence limits in Episheet.

Example: Chlordiazopoxide use and congenital heart disease C use No C use Cases 4 386 Controls 1250 OR = (4 x 1250) / (4 x 386) = 3.2 p=0.08 ; 95% CI=0.81–13 From Rothman K

3.2 p=0.08 Odds ratio 0.81 - 13

Example: Chlordiazopoxide use and congenital heart disease – large study C use No C use Cases 1090 14 910 Controls 1000 15 000 OR = (1090 x 15000) / (1000 x 14910) = 1.1 p=0.04 ; 95% CI=1.05-1.2 From Rothman K

Precision and strength of association

Confidence interval provides more information than p value Magnitude of the effect (strength of association) Direction of the effect (RR > or < 1) Precision of the point estimate of the effect (variability) p value can not provide them !

What we have to evaluate the study 2 A test of association. It depends on sample size. p value Probability that equal (or more extreme) results can be observed by chance alone OR, RR Direction & strength of association if > 1 risk factor if < 1 protective factor (independently from sample size) CI Magnitude and precision of effect Significance testing evaluates only the role of chance as alternative explanation of observed difference or effect. Confidence intervals are more informative than p values.

Comments on p-values and CIs Presence of significance does not prove clinical or biological relevance of an effect. A lack of significance is not necessarily a lack of an effect: “Absence of evidence is not evidence of absence”.

Comments on p values and CIs A huge effect in a small sample or a small effect in a large sample can result in identical p values. A statistical test will always give a significant result if the sample is big enough. p values and CIs do not provide any information on the possibility that the observed association is due to bias or confounding.

2 and Relative Risk Cases Non cases Total 2 = 1.3 E 9 51 60 p = 0.13 NE 5 55 60 RR = 1.8 Total 14 106 120 95% CI [ 0.6 - 4.9 ] Cases Non cases Total 2 = 12 E 90 510 600 p = 0.0002 NE 50 550 600 RR = 1.8 Total 140 1060 1200 95% CI [ 1.3-2.5 ] « Too large a difference and you are doomed to statistical significance »

Common source outbreak suspected Exposure cases non cases AR% Yes 15 20 42.8% No 50 200 20.0% Total 65 220 2 = 9.1 p = 0.002 RR = 2.1 95%CI = 1.4-3.4 23% REMEMBER: These values do not provide any information on the possibility that the observed association is due to a bias or confounding.

Recommendations Always look at the raw data (2x2-table). How many cases can be explained by the exposure? Interpret with caution associations that achieve statistical significance. Double caution if this statistical significance is not expected. Use confidence intervals to describe your results. Report p values precisely.

Suggested reading KJ Rothman, S Greenland, TL Lash, Modern Epidemiology, Lippincott Williams & Wilkins, Philadelphia, PA, 2008 SN Goodman, R Royall, Evidence and Scientific Research, AJPH 78, 1568, 1988 SN Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Ann Intern Med. 130, 995, 1999 C Poole, Low P-Values or Narrow Confidence Intervals: Which are more Durable? Epidemiology 12, 291, 2001

Previous lecturers Alain Moren Preben Aavitsland Paolo D’Ancona Doris Radun Lisa King Manuel Dehnert Previous lecturers