1
Epidemiology Module 3: Systematic and Random Error (Biases and Statistical Precision)
Tuhina Neogi, MD, PhD, FRCPC Steven Vlad, MD Clinical Epidemiology Research and Training Unit Boston University School of Medicine
2
Goals Understand the difference between systematic error and random error Review types of systematic error (bias) confounding, information bias (measurement error), selection bias Review random error (statistical precision) correct interpretations of confidence intervals and p-values Type 1 and Type 2 errors
3
Bias
4
How to interpret results of a study
The result of a study (the stated effect measure) can arise because: It is the truth It is due to bias (systematic error) It is due to chance (random error)
5
Bias/Systematic Error
Once we know the study result, we need to focus on whether the result is the truth, or whether it could have been the result of bias If a study’s result is RR=1.2, could the true (unbiased) result be higher or lower than that value? How could possible biases influence the reported result?
6
What are biases (systematic errors)?
Biases can move the observed result (as opposed to the true result) away from the null or toward the null Null = 0 for difference measures (e.g. risk difference); null = 1 for ratio measures (e.g. risk ratio) Away from the null: if the true result > null, the effect appears larger; if the true result < null, the effect appears smaller Toward the null: if the true result > null, the effect appears smaller; if the true result < null, the effect appears larger Bias has nothing to do with sample size; it is an effect of study design
7
[Diagrams: number lines running from -∞ through the null (1) to +∞, showing true values A and B and their observed values: bias away from the null pushes the observed value farther from 1 than the true value; bias toward the null pulls it closer to 1]
8
Direction of Bias Often, the result of bias is unpredictable
Either toward or away from the null In some circumstances we can predict the effect of a bias Usually toward the null
9
Types of Bias Only 3 types of bias! They apply to all studies,
regardless of study design The different study designs have ways of dealing with these biases; whether it’s done effectively determines how valid the study is
10
The 3 different types of biases
Confounding Information Bias (Measurement Error) Selection Bias
11
Confounding
12
Confounding Question:
Are there other factors that may account for the apparent relationship between exposure (intervention) and disease (outcome)? or Is there a common cause of both exposure and disease?
13
Two ways to look at Confounding
[Diagrams: two causal diagrams of confounding: (1) a confounder associated with both exposure and disease; (2) a confounder as a common cause of exposure and disease]
14
Confounding Note Confounder not caused by exposure or disease
Confounder not on the causal path from exposure to disease, i.e. not an intermediate [Diagram: confounder -> exposure, confounder -> disease]
15
Examples (confounder: exposure -> disease):
Smoking: yellow fingers -> lung cancer
Diabetes: CRP -> CAD
Depression: SSRI -> suicide
16
Why Does Confounding Occur?
Confounding occurs as an imbalance in proportions of the confounder between the two comparison groups

Total Population   Exposed     Unexposed
No. of Cases       4,600       140
N                  1,000,000   1,000,000
Risk               0.0046      0.00014
Risk Ratio         0.0046 / 0.00014 = 33
17
Total Population   Exposed     Unexposed
Cases              4,600       140
N                  1,000,000   1,000,000
Risk               0.0046      0.00014
RR                 33

Men                Exposed     Unexposed
Cases              4,500       50
N                  900,000     100,000
Risk               0.005       0.0005
RR                 10

Women              Exposed     Unexposed
Cases              100         90
N                  100,000     900,000
Risk               0.001       0.0001
RR                 10

Note that the true RR is LOWER among both men AND women. This could be smoking and lung cancer.
18
Total Population   Exposed     Unexposed
Cases              4,600       140
N                  1,000,000   1,000,000
Risk               0.0046      0.00014
RR                 33

Men                Exposed     Unexposed
Cases              4,140       14
N                  900,000     100,000
Risk               0.0046      0.00014
RR                 33

Women              Exposed     Unexposed
Cases              460         126
N                  100,000     900,000
Risk               0.0046      0.00014
RR                 33
19
Men                Exposed     Unexposed
Cases              2,500       250
N                  500,000     500,000
Risk               0.005       0.0005
RR                 10

Women              Exposed     Unexposed
Cases              1,000       100
N                  500,000     500,000
Risk               0.002       0.0002
RR                 10

Total Population   Exposed     Unexposed
Cases              3,500       350
N                  1,000,000   1,000,000
Risk               0.0035      0.00035
RR                 10
20
Why Does Confounding Occur?
[Diagram: confounder as a common cause of exposure and disease]

Men                Exposed     Unexposed
Cases              4,500       50
N                  900,000     100,000
Risk               0.005       0.0005
RR                 10

Women              Exposed     Unexposed
Cases              100         90
N                  100,000     900,000
Risk               0.001       0.0001
RR                 10
21
Why Does Confounding Occur?
[Diagram: smoking causes both yellow fingers and lung cancer]

Smokers            Yellow      Not Yellow
Cases              4,500       500
N                  900,000     100,000
Risk               0.005       0.005
RR                 1

Non-smokers        Yellow      Not Yellow
Cases              100         900
N                  100,000     900,000
Risk               0.001       0.001
RR                 1

Crude RR = (4,600/1M) / (1,400/1M) = 3.3
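As a sketch of how the spurious crude association arises, the yellow-fingers numbers above can be reproduced in a few lines of Python (the helper function is our own illustration, not from the slides):

```python
# Reproducing the yellow-fingers example: an imbalance in smoking (the
# confounder) creates a crude risk ratio of ~3.3 even though the stratum-
# specific risk ratios are both exactly 1.

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio = risk in exposed / risk in unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Each stratum: (cases_yellow, n_yellow, cases_not_yellow, n_not_yellow)
smokers     = (4_500, 900_000, 500, 100_000)
non_smokers = (100, 100_000, 900, 900_000)

rr_smokers = risk_ratio(*smokers)      # 1.0 within the smoker stratum
rr_non     = risk_ratio(*non_smokers)  # 1.0 within the non-smoker stratum

# The crude analysis pools the strata, ignoring smoking entirely:
crude = risk_ratio(4_500 + 100, 900_000 + 100_000,
                   500 + 900, 100_000 + 900_000)   # ~3.3

print(rr_smokers, rr_non, round(crude, 1))
```

The crude ratio is inflated only because yellow fingers are far more common among smokers, who also have the higher lung-cancer risk.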
22
Confounding cont’d Does hyperlipidemia cause MIs? Do higher lipid levels cause MIs? [Diagram: high lipids -> MI?, with potential confounders BP, age, gender, BMI, smoking, DM]
23
Does lowering lipid levels lower the risk of MI?
Does statin use lower lipids? If so, does that have an effect on lowering MI events? [Diagram: statins -> lower lipid levels -> MI, with potential confounders BP, age, gender, BMI, smoking, DM, other healthy lifestyle factors, adherence]
24
Does a diet high in cholesterol increase the risk of MI?
Does a ‘heart healthy diet’ lower lipids? If so, does that have an effect on lowering MI events? [Diagram: heart healthy diet -> lower lipid levels -> MI, with potential confounders BP, age, gender, BMI, smoking, DM, other healthy lifestyle factors, diet, exercise, adherence]
25
Confounding by Indication
Particularly common and difficult to deal with in (observational) pharmacoepidemiology studies
Examples (confounder: exposure -> disease):
RA disease severity: TNF-antagonist use -> lymphoma
Depression: SSRI -> suicide
26
What Confounding is NOT
A factor on the causal pathway (intermediate):
high fat diet -> LDL -> CAD
smoking -> adenomatous polyp -> colon CA
A factor that modifies the relationship between an exposure and a disease:
Effect of an anti-HTN drug is different in Blacks vs Whites
Effect of blood levels of X on risk of Y in men vs women
27
Effect Modification (aka Interaction)
Total Population   High BP     Normal BP
MI                 4,700       140
N                  1,000,000   1,000,000
Risk               0.0047      0.00014
RR                 33.6

White Americans    High BP     Normal BP
MI                 4,500       50
N                  900,000     100,000
Risk               0.005       0.0005
RR                 10

Black Americans    High BP     Normal BP
MI                 200         90
N                  100,000     900,000
Risk               0.002       0.0001
RR                 20
28
How does an RCT address Confounding?
Randomization evenly distributes both known and unknown confounders (by chance) between the two (or more) exposure groups (i.e. active treatment and placebo) Can use specific inclusion/exclusion criteria to ensure everyone is the same for a particular confounder (e.g., all males in the study) [also known as restriction]
29
Control of Confounding by Randomization
Since potential confounders are balanced between exposure groups, they can’t confound the association

[Diagram: RCT of SSRI and suicide/death: 1,000,000 get SSRI, 1,000,000 get placebo; depression is distributed equally by chance]

Depression         SSRI        Placebo
Deaths             250         250
N                  500,000     500,000
Risk               0.0005      0.0005
RR                 1

No Depression      SSRI        Placebo
Deaths             100         100
N                  500,000     500,000
Risk               0.0002      0.0002
RR                 1

Crude RR = (350/1M) / (350/1M) = 1
30
Control of Confounding in an RCT
Check Table 1 for imbalances between treatment arms Are any of the differences clinically meaningful? (not the same as statistically significant!) How could those differences affect the results? What other potential confounders are missing from Table 1? Could they affect the results if they were imbalanced between the treatment arms? How likely is it that unknown confounders are imbalanced? (with large RCTs, unlikely)
31
Control of Confounding in an RCT
Intention-to-treat analysis Maintains the balance of potential confounders given by randomization, thereby continuing to (theoretically) address confounding May need to ‘control’ for very unbalanced factors in the analyses
32
Control of Confounding in Observational Studies
Study design level: Restriction
Inclusion/exclusion criteria to limit the study population to a more homogeneous group
E.g. a study of alcohol and MI may exclude smokers, since smoking is an important confounder
Data analysis level: Stratification
Analyze data stratified by an important confounder
E.g. evaluate the effect of alcohol on MI among smokers and among non-smokers separately
Added advantage: identifies effect modification
Data analysis level: Regression
‘Control’ for potential confounders in regression models; can also identify effect modification if planned for by the investigators
Matching
Match on potential confounding variables
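One standard stratified-analysis estimator is the Mantel-Haenszel risk ratio, which pools the stratum-specific tables into one confounder-adjusted estimate. A minimal sketch (our own illustration, reusing the smoking strata from the earlier yellow-fingers example):

```python
# Mantel-Haenszel risk ratio across strata of a confounder.
# Each stratum: (cases_exposed, n_exposed, cases_unexposed, n_unexposed)

def mh_risk_ratio(strata):
    # Standard MH weights: each stratum contributes a*n0/N to the numerator
    # and c*n1/N to the denominator, where N = n1 + n0.
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Smoker and non-smoker strata of the yellow fingers / lung cancer example:
strata = [(4_500, 900_000, 500, 100_000),
          (100, 100_000, 900, 900_000)]

print(mh_risk_ratio(strata))  # 1.0: adjusting for smoking removes the association
```

The crude RR of 3.3 collapses to 1.0 once the smoking strata are weighted and combined, matching the stratum-specific results.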
33
Control of Confounding in Observational Studies
Have the investigators identified all the important potential confounders in the study? What factors could be common causes of both exposure and disease? Have the investigators accounted for these potential confounders either in the design of the study or in the analysis? If they haven’t, how might that affect the results they found?
34
Confounding: Examples
35
In large RCTs, confounding is not usually an important issue
Randomization distributes confounders equally between trial arms Table 1 appears to confirm this: probable confounders seem to be well balanced
36
Is this difference important? Previous fractures are a risk factor for future fractures ...
37
Questions to ask: What are the potential sources of confounding?
Have the authors identified these potential confounders? How have the authors addressed potential confounders? Was this sufficient?
38
1. What are the potential sources of confounding?
Risk factors for exposure, i.e. for NSAID use:
RA? (chronic NSAIDs)
CAD prevention? (ASA)
prior GI bleeding? (avoid NSAIDs, use coxib)
age? (avoid NSAIDs)
[Diagram: confounder -> NSAID (or type of NSAID) -> CV event]
Are any of these also related to the likelihood of having an event? Think about risks of exposure and whether they are related to the outcome. RA and CAD certainly are. Age, probably. Prior GI bleeding, maybe not.
39
1. What are the potential sources of confounding?
Risk factors for outcome, i.e. having a CV event:
smoking?
prior CV event?
general health?
ASA use? (preventive)
age?
[Diagram: confounder -> NSAID (or type of NSAID) -> CV event]
Are any of these also related to the likelihood of being exposed to an NSAID or type of NSAID? Think about risks of the outcome and whether they are related to the exposure. ASA use, certainly. Age and general health, probably. Smoking?
40
1. What are the potential sources of confounding?
Confounding by indication
Almost always a concern in observational studies of drugs
Here, could the reason that NSAIDs are prescribed (the ‘indication’) be related to both exposure (NSAID prescription) and disease (CV outcomes)?
What about reasons to avoid NSAIDs? Chronic kidney disease? Other health problems?
What would the result of this bias be? Poor health -> coxib -> higher apparent rate of CV disease because of poor health
[Diagrams: CKD, poor health, and pain each shown as a potential common cause of NSAID (or type of NSAID) use and CV events]
41
2. Have the authors identified these potential confounders?
In this study, it seems like yes; they even have a section discussing ‘covariates’
42
3. How have the authors addressed potential confounders?
Statistical analysis, ‘Advanced methods’
43
4. Was this sufficient? Often the hardest of these questions to answer, especially if the authors have done a good job addressing the first three questions Requires experience, judgement, maybe further analysis My take on this study: probably sufficient They’ve acknowledged the difficulties and done something about them; What they’ve done is appropriate; The methods are reasonable and go beyond what most studies do; However, I recognize what an insidious problem confounding, especially confounding by indication, is, and therefore I’m ready to change my opinion if further evidence comes to light
44
Questions to ask: What are the potential sources of confounding?
Have the authors identified these potential confounders? How have the authors addressed potential confounders? Was this sufficient?
45
1. What are the potential sources of confounding?
This is another pharmacoepidemiologic study (kind of), so let’s jump right to confounding by indication again
Here, could the reason that DMARDs were prescribed (the ‘indication’) be related to both exposure (DMARD use) and disease (lymphoma)? -> Disease severity, i.e. chronic inflammation
If disease severity is related to both DMARD use and lymphoma, what is the likely bias going to look like? -> It is going to look like DMARDs cause lymphoma
[Diagram: disease severity as a common cause of DMARD (or type of DMARD) use and lymphoma]
46
2. Have the authors identified these potential confounders?
Clearly - that is what this paper is all about
47
3. How have the authors addressed potential confounders?
Here: detailed assessment of disease severity for each subject; mutual control of drug use and disease severity; etc.
48
4. Was this sufficient? Again, this is often the hardest of these questions to answer. My take on this study: probably sufficient They’ve acknowledged the difficulties and done something about them; What they’ve done is appropriate; The methods are reasonable; Again though, I recognize what an insidious problem confounding, especially confounding by indication, is, and therefore I’m ready to change my opinion if further evidence comes to light
49
Information Bias (Misclassification)
50
Information Bias Could exposure (intervention) or disease (outcome) be inaccurately reported or measured, i.e. misclassified?
51
Information Bias Information about exposure, disease, and/or confounder(s) is erroneous Misclassified into the wrong category [Diagram: true exposure vs measured exposure; true disease status vs measured disease status]
52
Examples of Information Bias
Exposure misclassification: true EtOH exposure vs reported EtOH (outcome: fetal outcome)
Outcome misclassification: true gastritis status vs gastritis by chart review (exposure: drug)
BOTH exposure and outcome misclassification: true illicit drug use vs reported illicit drug use; true blackouts vs reported blackouts
If using the same source for exposure and disease information (e.g., from the participant), can get very biased results
53
Non-differential Misclassification
Chance of misclassifying is the same in both exposure groups (or all, if more than two) and the chance of misclassifying is the same in both outcome groups (ditto), i.e. misclassification is random in both exposure and disease
General Rule: if misclassification is non-differential, then the resultant bias is usually toward the null, i.e. the measured effect is probably conservative
Warning: this is true in general. It is not always true in every study! (There can be bias away from the null)
Note that misclassification being non-differential in both exposure and disease may not be very likely
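The "toward the null" rule can be illustrated with expected counts: apply the same (non-differential) sensitivity and specificity to exposure classification in both outcome groups and recompute the risk ratio. The numbers below are invented purely for illustration:

```python
# Non-differential exposure misclassification, expected-count sketch.
# True setup: 1,000 exposed (risk 0.20), 1,000 unexposed (risk 0.10) -> true RR = 2.0
n_exp, n_unexp = 1_000, 1_000
risk_exp, risk_unexp = 0.20, 0.10

sens, spec = 0.8, 0.9  # same for everyone, regardless of outcome (non-differential)

# People *classified* as exposed: true positives plus false positives
m_exp   = sens * n_exp + (1 - spec) * n_unexp
m_unexp = (1 - sens) * n_exp + spec * n_unexp

# Cases travel with their (mis)classified members
cases_m_exp   = sens * (risk_exp * n_exp) + (1 - spec) * (risk_unexp * n_unexp)
cases_m_unexp = (1 - sens) * (risk_exp * n_exp) + spec * (risk_unexp * n_unexp)

observed_rr = (cases_m_exp / m_exp) / (cases_m_unexp / m_unexp)
true_rr = risk_exp / risk_unexp
print(true_rr, round(observed_rr, 2))  # 2.0 vs ~1.6: biased toward the null
```

Mixing some unexposed people into the "exposed" group (and vice versa) dilutes the contrast between the groups, dragging the observed RR from 2.0 toward 1.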
54
Differential Misclassification
Chance of misclassifying is not the same in either the exposure groups (or all, if more than two) or the outcome groups (ditto), i.e. misclassification is not random in either exposure or disease In this case, the resultant bias is unpredictable! Consider carefully whether you think any misclassification is likely to be differential or non-differential!
55
How does an RCT address Information Bias?
EXPOSURE: Exposure is assigned (randomly) in an RCT; the label of the treatment arm is a proxy for the actual exposure; bias is usually non-differential [Diagram: assigned exposure -> actual exposure -> outcome]
56
How does an RCT address Information Bias?
EXPOSURE: Problems: non-adherence and contamination
With less than 100% adherence, intention-to-treat (ITT) analysis is biased due to information bias (usually non-differential)
A ‘completers’ analysis (only analyze those completing the study in their assigned group) addresses the information bias but introduces confounding and selection bias (when subjects leave the study): randomization has been broken, and reasons for withdrawal could be related to treatment
The TRUTH is somewhere between the ITT and completers analyses when adherence is <100% (almost always)
57
[Diagram: subjects assigned to Treatment A and Treatment B; middle lines represent contamination; top/bottom lines represent non-adherence and loss to follow-up]
ITT analysis: thought to give a more conservative estimate (biased toward the null)
Completers analysis: thought to usually exaggerate the estimate
58
Information Bias in an RCT
OUTCOME:
allocation concealment (nobody knows what the next treatment assignment will be)
blinding (investigators and subjects)
same study procedures for all arms
These should prevent differential misclassification of the outcome; any remaining (non-differential) misclassification should bias toward the null
Problems:
Unblinding (e.g. side effects from treatment)
Accuracy of outcome assessment (reliability, reproducibility)
59
Information Bias in Observational Studies
Obtain information about exposure and outcome in a structured manner (preferably from ‘objective’ sources)
Obtain information about the disease in exactly the same fashion regardless of exposure status
Obtain information about the exposure in exactly the same fashion regardless of disease status
Use different sources of information for exposure and disease if possible
If you ask someone about their alcohol use and also ask them about how many times they fell, you are likely to get very skewed data: EtOH users are less likely to report both use and falls
‘Objective’ does not mean that the subject can’t be the source of the information
60
Information Bias: Examples
62
Exposure misclassification?
An RCT, so exposure should be random Pill counts and interviews were used to assess whether subjects took medications as assigned Not an ITT analysis (but not a completers analysis either) (the method they used is probably better than an ITT analysis - you’ll just have to trust me)
63
Outcome misclassification?
Allocation concealment
Not commented upon; unfortunately this is common; we’ll presume allocation was concealed
Blinding
Stated to be double-blind
placebos for both interventions (blinds subjects in absence of noticeable side effects)
BMD scan results withheld from local investigators (blinds investigators)
Outcome assessments
BMD: ‘Quality assurance, cross-calibration adjustment, and data processing were done centrally’, presumably without knowledge of treatment assignment
Fracture: ‘Radiographs were assessed in a blinded fashion by an independent reader’, presumably using a standard protocol of some kind
64
Likelihood of Information Bias
Appears to be minimal Any bias should be toward the null Estimates are therefore likely to be conservative (underestimates of the true results)
66
Exposure misclassification?
How was exposure assessed? Prescriptions for NSAIDs (including coxibs)
Could exposure be mis-measured? Here, a prescription does not guarantee that the subject actually took the medication, or that s/he took it as prescribed; OTC medication use may not be measured
[Diagram: NSAID prescription as a proxy for NSAID use -> CV event]
67
Exposure misclassification?
Is any exposure mis-measurement likely to be non-differential or differential? Three questions here:
Coxibs vs traditional NSAIDs
NSAIDs vs comparison meds (glaucoma/thyroid meds)
Are chronic NSAID users likely to differ in their ‘compliance’ compared to glaucoma/thyroid med users (or coxib users)?
Some traditional NSAIDs are available OTC; coxibs are not; therefore it is more likely that NSAID use is mis-measured (underestimated); therefore differential misclassification is likely
Same potential problem with NSAIDs in general (some available OTC) vs comparison meds (prescription needed)
68
Exposure misclassification?
Is any exposure mis-measurement likely to be non-differential or differential? Are chronic NSAID users likely to differ in their ‘compliance’ compared to glaucoma/thyroid med users (or coxib users)? I’m not sure but I suspect there might be a difference. Any opinions?
69
Outcome misclassification?
How was outcome assessed? CV outcomes based on various codes in admin records
Could the outcome be mis-measured? Errors made by coders (doctors, administrators, etc.); no verification that a coded event is an actual event
[Diagram: codes for CV events as a proxy for actual CV events; NSAID prescription -> CV event]
70
Outcome misclassification?
Is any outcome mis-measurement likely to be non-differential or differential? Is it more likely for persons not coded with a CV event to actually have a CV event? Or for persons coded with a CV event to not have one? Or does each seem equally likely? To me, it seems more likely that persons coded for a CV event were mis-coded compared to persons who never received a code in the first place.
71
Likelihood of Information Bias
Overall, low-moderate Likelihood of differential misclassification is moderate I recognize this risk, think the authors are also aware of it and have considered it, and conditionally accept the results of the study pending further studies
73
Exposure misclassification?
How was exposure assessed? Exposure here is degree of inflammation; apparently a standardized, protocolized assessment of tender/swollen joints
Could exposure be mis-measured? Physician error in counting joints
[Diagram: tender/swollen joint count as a proxy for inflammation -> lymphoma]
74
Exposure misclassification?
Is any exposure mis-measurement likely to be non-differential or differential? Not 100% sure, but it seems likely to be non-differential, i.e. all patients have RA More likely to count more joints if the subject seems more ‘active’? Less likely to count more joints if the subject seems to be doing subjectively well?
75
Outcome misclassification?
How was outcome assessed? Cases: occurrence of lymphoma; all cases validated and confirmed Controls: random other RA subjects without lymphoma
Could the outcome be mis-measured? One would have to argue that some of those without lymphoma actually had missed cases; seems unlikely
[Diagram: lymphoma on biopsy as the measure of lymphoma; inflammation -> lymphoma]
76
Outcome misclassification?
Is any outcome mis-measurement likely to be non-differential or differential As said, risk seems small but any risk would have to be non-differential Cases are almost certainly lymphoma Could be some controls who actually should be cases
77
Likelihood of Information Bias
Overall, low Likelihood of differential misclassification is also low The effort taken to verify cases and the relative rarity of lymphoma among the controls virtually eliminate the risk of information bias in this study.
78
Selection Bias
79
Selection Bias Is study entry or exit related to exposure (intervention) or disease (outcome)?
80
Selection Bias Factors that influence study participation
Selection bias arises from subjects entering and staying in the study: the exposure-disease association is different among those who participate than among those who do not E.g. the healthy worker effect
81
Example of Selection Bias
Examples of factors related to exposure, participation, and disease:
Exposure: new drug, chemical, DM, lipids
Participation: those who completed the study; hospitalized patients; healthy workers (compared to the general population); the health conscious, or those with familial hypercholesterolemia or other high risk
Disease: lung disease, pain, gall bladder disease, MI
82
How does an RCT address Selection Bias?
Study entry: Randomization after study entry prevents any factors related to the exposure from influencing study participation
83
How does an RCT address Selection Bias?
Study exit: Try to achieve full follow-up
Problem: loss to follow-up (LTFU)
LTFU is almost always related to either the exposure or the disease or both (e.g. the drug is ineffective (outcome), or the drug has side effects (exposure))
Equal numbers of LTFU in the treatment arms does not guarantee that there is no selection bias (different reasons for dropout)
Analytic techniques such as last observation carried forward or multiple imputation do not “take care of” selection bias
84
Selection Bias in Observational Studies
Difficult to deal with selection bias in observational studies
Can’t always control factors influencing study participation
Healthy worker effect: use a group in the same office/factory not exposed to the particular chemical
Try to minimize loss to follow-up
Use appropriate analytic methods to account for differing lengths of follow-up (incidence rate ratio, hazard ratio)
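The last point, accounting for differing lengths of follow-up, amounts to dividing events by person-time rather than by people. A toy sketch (all numbers invented for illustration):

```python
# Incidence rate ratio sketch: rates use person-time in the denominator, so
# subjects followed for different lengths of time contribute proportionally.

def incidence_rate(events, person_years):
    """Events per person-year of follow-up."""
    return events / person_years

rate_exposed   = incidence_rate(30, 1_500.0)  # 0.020 events per person-year
rate_unexposed = incidence_rate(10, 2_000.0)  # 0.005 events per person-year

irr = rate_exposed / rate_unexposed  # incidence rate ratio
print(irr)  # 4.0
```

Note that a naive risk comparison (events per person) would be misleading here if the unexposed group was simply followed longer.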
85
‘Representativeness’: This is NOT a bias
Trying to make one’s study population ‘representative’ of a more general population is counter to using restriction for control of confounding
Basic science animal studies ‘restrict’ the study population to genetically homogeneous lab animals to elaborate a scientific theory about a biologic process
If a different group is of clinical interest, one should conduct a well-designed study in that group to determine whether effects differ in that group
This is a question of whether the BIOLOGY is different in that group, i.e. an issue of “Effect Measure Modification”
86
Selection Bias: Examples
88
A Clinical Trial Selection bias due to study entry is virtually never an issue; true here: patients were randomized after selection for the study Loss to follow-up: 70 (33%!) in the alendronate group, 64 (30%!) in the teriparatide group These are fairly high rates of LTFU!
89
Does it matter? LTFU is roughly equal in each arm
This does NOT guarantee lack of selection bias
Question: Is LTFU random in each group? Or is LTFU differential?
E.g.: alendronate subjects drop out because of side effects (maybe the same people in whom the drug was effective); teriparatide subjects drop out because of inefficacy
This combination could make teriparatide look better than it actually is, even though LTFU is equal in each group: teriparatide users who had an effect are left in the study; alendronate users who had no effect are left in the study
90
Judgement Seems to me that dropout is likely to be random; the above scenario is unlikely Therefore I doubt there is much, if any, selection bias in this study
92
Study entry The authors recognize that healthy people are less likely to have records in their study database (one reason they used other drug users (glaucoma and hypothyroid) as a control group) Therefore healthy persons may be less likely to be selected as comparison-group subjects -> the disease estimate in the control group could be exaggerated -> results in an underestimate of the true risk
93
Study exit Subjects could leave the study if they got insurance, e.g.
Are persons who did this more likely to be NSAID users? Or to have an event? Or both? Are healthier persons more likely to get insurance and not have an event? I would think so -> leaves less healthy persons, who could have an event, in the control group -> more events in the control group -> underestimate of the true risk
94
Judgement Selection bias is certainly possible
Recognized by the authors Methods used to minimize it (drug-using control group) Overall, the risk is there but is probably not very high(?)
96
Study entry Entry in this study is based on case-control status
How could cases or controls be unrepresentative of RA patients? All RA patients should have been captured in the RA registry All lymphoma patients should also be captured in the lymphoma registry One reason why ‘population-based’ studies are nice!
97
Study exit As with entry, these registers should capture all RA patients who have had lymphoma If RA patients left the study before they got lymphoma, that could be a source of bias But these persons could not have been cases or controls Again, the likelihood of selection bias seems low
98
Judgement Selection bias seems unlikely in this nested case-control study Note, this can often be the case in case-control studies where the underlying population is well-defined (esp. population-based studies) When this is NOT the case (e.g. a hospital-based case-control study), selection bias is much more likely
99
Interpreting Study Results
Any study’s results must be interpreted in the context of whether or not there are any biases that could have caused those results to deviate from the truth This is the critical step in understanding whether or not the results are clinically meaningful for your patients, i.e., are the results valid? If there are biases, you need to try to determine how much they would have affected the results: would the message still be about the same even taking those biases into account?
100
Summary You should now be able to:
Understand that all study designs use various strategies to address systematic error (bias) to varying degrees of success Understand how to identify confounding, information bias, selection bias
101
Random Error
102
Goals Review random error (statistical precision)
Review the correct interpretations of confidence intervals and p-values Review Type 1 and Type 2 errors
103
Random Error vs Systematic Error
[Diagram: measured value = true value + systematic error + random error; as study size increases, random error shrinks but systematic error does not; only in the absence of bias, with perfect precision, does the measured value equal the true value]
104
Random Error Results from variability in the data, sampling
E.g. measuring height with a measuring tape: 1 measurement may be off, but multiple measurements will give you a better estimate of height Relates to precision We use confidence intervals to express the degree of uncertainty/random error associated with a point estimate (e.g. a RR or OR): a measure of precision
105
Confidence Interval Epidemiologists/biostatisticians have arbitrarily chosen 95% as the level of confidence to report 95% C.I. = if the data collection and analysis were repeated infinitely, the confidence interval should include within it the correct value of the point estimate 95% of the time Note: this is NOT the same as saying the confidence interval is 95% likely to contain the ‘true’ value (INCORRECT!!) ASSUMES NO BIAS
106
[Figure: 90% confidence intervals from repeated studies, plotted against the true value of the RR; about 90% of the intervals cross the true value]
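The repeated-studies idea can be checked with a short simulation (our own sketch, not from the slides): build a 95% interval from each of many samples and count how often the true value is covered.

```python
import random
import statistics

# Simulation sketch of the long-run meaning of a 95% CI: across repeated
# samples, roughly 95% of intervals computed this way cover the true mean.
random.seed(42)
TRUE_MEAN, SD, N, REPS = 10.0, 2.0, 50, 2000

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = m - 1.96 * se, m + 1.96 * se  # normal-approximation 95% CI
    covered += lo <= TRUE_MEAN <= hi

print(covered / REPS)  # close to 0.95
```

Any single interval either covers the true value or it does not; the 95% describes the procedure over repetitions, not the probability for one computed interval.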
107
Confidence Interval cont’d
If the confidence interval includes the null (‘1’ for relative measures of effect (e.g. RR, OR), ‘0’ for absolute measures of effect (e.g. risk difference)), the result is considered not statistically significant Since the CI contains the correct value 95% of the time, the correct value could be the null
108
Confidence Interval cont’d
“Precision” relates to the width of the confidence interval A very precise estimate of effect will have a narrow CI Can improve precision by: Increasing sample size
109
P-value The P-value is calculated from the same equations that lead to a confidence interval Again, 0.05 (related to the 95% CI) was arbitrarily chosen P-values are confounded by magnitude of effect and sample size (precision) Can get a very small p-value if a study is large enough, even if the effect is very small
110
P-value P-value=assuming the null hypothesis is true, the p-value is the probability of this result or one more extreme, assuming no bias/confounding E.g. RR=2, p=0.03: “Assuming the null hypothesis is true (RR=1), the probability of this result (i.e. RR=2) or one more extreme (i.e. RR>2) is 3%” E.g. RR=0.4, p<0.001: “Assuming that there truly is no difference between the two treatment groups, the probability of obtaining a RR of 0.4 or one more extreme (i.e. RR<0.4) is less than 0.1%”
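Numerically, a p-value of this kind is often obtained via a normal approximation on the log risk ratio. The sketch below uses invented counts and one common textbook standard-error formula; it is an illustration of the mechanics, not the method of any particular study.

```python
import math

# Two-sided p-value for a risk ratio via the normal approximation on log(RR).
# Counts are invented: 30/100 cases among exposed, 15/100 among unexposed.
a, n1 = 30, 100   # exposed: cases, total
c, n0 = 15, 100   # unexposed: cases, total

rr = (a / n1) / (c / n0)                        # 2.0
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)  # common SE formula for log RR
z = math.log(rr) / se_log_rr

# Standard normal CDF via the error function
phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
p_two_sided = 2 * (1 - phi)
print(round(rr, 2), round(p_two_sided, 3))
```

Read the result exactly as the slide phrases it: assuming the null (RR = 1) and no bias, data this extreme or more extreme would occur with this probability.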
111
Note: this is NOT the same as saying the p-value is the probability that the null hypothesis is true (INCORRECT!! since it is conditioned on the null being true) Just because your p-value isn’t “statistically significant” doesn’t mean that you can say the two arms are “equivalent” The p-value says nothing about the probability of your alternative hypothesis being true Think of the p-value as a relative marker of how consistent the data are with the null hypothesis
112
Statistical Significance Testing
Statistical significance testing is really only about statistical precision How precise are the study results? P-values on their own tell you NOTHING about the magnitude of effect; 95% confidence intervals at least give you an idea of the potential magnitude of effect P-values and 95% CIs tell you NOTHING about how valid the results are (i.e., whether the results are biased) We should place LESS emphasis on statistical significance and think MORE about clinical significance or meaningfulness when considering the results and validity of a study
113
Type I, Type II Errors A Type I error is made if one incorrectly rejects the null hypothesis when the null hypothesis is actually true This probability is determined by the significance level alpha (=0.05) A Type II error is made if one fails to reject the null hypothesis when the null is false This probability is determined by β, where 1-β is the power, usually set at 80%
114
Basic Approach to Understanding Power
Power calculations/sample size calculations: Use an estimate of the expected event rate in the unexposed (placebo) group Based on published literature/experience (e.g., 10% of men of a given age have MIs) Based on that ‘background rate’, determine the sample size needed to detect a difference of a specified magnitude between treatment and placebo with 80% power and alpha=0.05
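As a sketch of the arithmetic, here is the standard two-proportion sample-size formula with invented rates (a 10% background event rate vs an expected 5% on treatment); real calculations would use study-specific assumptions.

```python
import math

# Per-arm sample size for comparing two proportions,
# two-sided alpha = 0.05 and power = 80%.
z_alpha = 1.96    # z for two-sided alpha = 0.05
z_beta  = 0.8416  # z for power = 0.80

p_placebo, p_treated = 0.10, 0.05  # assumed event risks (invented)

# Pooled-variance style formula: n = (z_a + z_b)^2 * [p1*q1 + p2*q2] / (p1 - p2)^2
variance = p_placebo * (1 - p_placebo) + p_treated * (1 - p_treated)
n_per_arm = math.ceil((z_alpha + z_beta) ** 2 * variance
                      / (p_placebo - p_treated) ** 2)
print(n_per_arm)  # 432 subjects per arm
```

Note how sensitive the answer is to the assumed difference: halving the detectable difference roughly quadruples the required sample size.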
115
Random Error: Examples
117
Statistical Significance: p-value given, but no confidence interval for the difference in means (i.e. the 95% CI for the mean BMD difference of 3.8%) Can theoretically use the standard error (S.E.) information to calculate the 95% confidence interval Adequate power? Sample size: 214/arm Had sufficient power to detect a 2% difference in BMD at the L-spine (found a 3.8% difference)
119
Statistical Significance
95% confidence interval given for each rate ratio Adequate sample sizes? N = several thousand persons for each drug, 100s-1000s of events Null results (lack of association) for some drugs are unlikely to be due to inadequate power (and they did find the expected association for Vioxx) ? Type 2 error
121
Statistical Significance
95% Confidence Intervals given for each Odds Ratio Adequate sample size? 378 cases, 378 controls They found an association, so no concerns about inadequate power due to insufficient sample size ? Type 1 error ? Bias
122
Summary You should now be able to:
Understand that confidence intervals reflect precision, which is related to sample size Understand the correct interpretation of confidence intervals and p-values Understand that power relates to sample size