Traps and pitfalls in medical statistics Arvid Sjölander.

Presentation on theme: "Traps and pitfalls in medical statistics Arvid Sjölander."— Presentation transcript:

Traps and pitfalls in medical statistics Arvid Sjölander

26 april 2015Arvid Sjölander2 Motivating example  You are involved in a project to find out if snus causes ulcer.  A questionnaire is sent out to 300 randomly chosen subjects.  200 subjects respond:  We can use the relative risk (RR) to measure the association between snus and ulcer:  Can we safely conclude that snus prevents ulcer? Ulcer YesNoR Snus Yes228 2/30  0.07 No1715317/170=0.1

26 april 2015Arvid Sjölander3 Outline  Systematic errors  Selection bias  Confounding  Randomization  Reverse causation  Random errors  Confidence interval  P-value  Hypothesis test  Significance level  Power

26 april 2015Arvid Sjölander4 One possible explanation  It is a wide spread hypothesis that snus causes ulcer.  Snus users who develop ulcer may therefore feel somewhat guilty, and may therefore be reluctant to participate in the study  Hence, RR<1 may be (partly) explained by an underrepresentation of snus users with ulcer among the responders.  This is a case of selection bias.

26 april 2015Arvid Sjölander5 Selection bias  We only observe the RR among the potential responders.  The RR among the responders (observed) may not be equal to the population RR (unobserved). Population Potential non- responders Potential responders Sample

26 april 2015Arvid Sjölander6 How do we avoid selection bias?  Make sure that the sample is drawn randomly from the whole population of interest - must trace the non-responders.  Send out the questionnaire again, follow up phone calls etc. Population Potential non- responders Potential responders Sample

26 april 2015Arvid Sjölander7 Another possible explanation  Because of age-trends, young people use snus more often than old people.  For biological reasons, young people have a smaller risks for ulcer than old people.  Hence, RR<1 may be (partly) explained by snus-users being in “better shape” than non-users.  This is a case of confounding, and age is called a confounder.

26 april 2015Arvid Sjölander8 Confounding  The RR measures the association between snus and ulcer.  The association depends on both the causal effect, and the influence of age.  In particular, even in the absence of a causal effect, there will be an (inverse) association between snus and ulcer (RR  1). ?

26 april 2015Arvid Sjölander9 How do we avoid confounding?  At the design stage: randomization, i.e. assigning “snus” and “no snus” by “the flip of a coin”.  + reliable; it eliminates the influence of all confounders.  - expensive and possibly unethical.  At the analysis stage: adjust (the observed association) for (the influence of) age, e.g. stratification, matching, regression modeling.  + cheap and ethical.  - not fully reliable; cannot adjust for unknown or unmeasured confounders. ?

26 april 2015Arvid Sjölander10 Yet another explanation  It is a wide spread hypothesis among physicians that snus causes and aggravates ulcer.  Snus users who suffers from ulcer may therefore be advised by their physicians to quit.  Hence, RR<1 may be (partly) explained by a tendency among people with ulcer to quit using snus.  This is a case of reverse causation.

26 april 2015Arvid Sjölander11 Reverse causation  Reverse causation can be avoided by randomization. SnusUlcer ?

26 april 2015Arvid Sjölander12 Systematic errors  Selection bias, confounding, and reverse causation, are referred to as systematic errors, or bias.  “You don’t measure what you are interested in”.  How can you tell if your study is biased?  You can’t! (At least not from the observed data).  It is important to design the study carefully and “think ahead” to avoid bias.  What may the reason be for potential response/non-response?  How can we trace the non-responders?  Which are the possible confounders?  Do we need to randomize the study? Would randomization be ethical and practically possible?

26 april 2015Arvid Sjölander13 Example cont’d  Assume that we believe that the study is unbiased (no selection bias, no confounding and no reverse causation).  Can we safely conclude that snus prevents ulcer? Ulcer YesNoR Snus Yes228 2/30  0.07 No1715317/170=0.1

26 april 2015Arvid Sjölander14 Random errors  True RR = observed RR?  True RR  observed RR! Population Sample True RRObserved RR=0.7

26 april 2015Arvid Sjölander15 Confidence interval  Where can we expect the true RR to be?  The 95% Confidence Interval (CI) answers this question.  It is a range of plausible values for the true RR.  Example: RR=0.7, 95% CI: (0.5,0.9).  The narrower CI, the less uncertainty in the true RR.  The width of the CI depends on the sample size, the larger sample, the narrower CI.  How do we compute a CI? Ask a statistician!

CI for our data  RR=0.7, 95% CI: (0.16,2.74).  Conclusion? 26 april 2015Arvid Sjölander16 Ulcer YesNoR Snus Yes228 2/30  0.07 No1715317/170=0.1

26 april 2015Arvid Sjölander17 P-value  Often, we specifically want to know whether the true RR is equal to 1 (no association between snus and ulcer).  The hypothesis that the true RR = 1 is called the “null hypothesis”; H 0.  The p-value (p) is an objective measure of the strength of evidence in the observed data against H 0.  0 < p < 1.  The smaller p-value, the stronger evidence against H 0.  How do we compute p? Ask a statistician?

Factors that determine the p-value  What do you think p depends on?  The sample size: the larger sample, the smaller p.  The magnitude of the observed association: the stronger association, the smaller p.  A common mistake: “The p-value is low, but the sample size is small so we cannot trust the results”.  Yes you can!  The p-value takes the sample size into account. Once the p-value is computed, the sample size carries no further information. 26 april 2015Arvid Sjölander18

P-value for our data  P = 0.81  Conclusion? 26 april 2015Arvid Sjölander19 Ulcer YesNoR Snus Yes228 2/30  0.07 No1715317/170=0.1

Making a decision  The p-value is an objective measure of the strenght of evidence against H 0.  The smaller p-value, the stronger evidence against H 0.  Sometimes, we have to make a formal decision of whether or not to reject H 0.  This decision process is formally called hypothesis testing.  We reject H 0 when the evidence against H 0 are “strong enough”.  i.e. when the p-value is “small enough”. 26 april 2015Arvid Sjölander20

Significance level  The rejection threshold is called the significance level.  E.g. “5% significance level” means that we have decided to reject H 0 if p<0.05.  That we use a low significance level level means that we require strong evidence against H 0 for rejection.  That we use a high significance level means that we are satisfied with weak evidence against H 0 for rejection.  What is the advantage of using a low significance level? What about a high significance level? 26 april 2015Arvid Sjölander21

A parallell to the court room  H 0 = the prosecuted is innocent.  p value = the strength of evidence against H 0.  Low significance level = need strong evidence to condemn to jail.  Few innocent in jail, but many guilty in freedom.  High significance level = weak evidence sufficient to condemn to jail.  Many guilty in jail, but many innocent in jail as well. 26 april 2015Arvid Sjölander22

Type I and type II errors  There is always a trade-off between the risk for type I and the risk for type II errors.  Low significance level (difficult to reject H 0 )  small risk for type I errors, but large risk for type II errors.  High significance level (easy to reject H 0 )  small risk for type II errors, but large risk for type I errors.  By convention, we use 5% significance level (reject H 0 if p<0.05). 26 april 2015Arvid Sjölander23 H 0 is falseH 0 is true Reject H 0 OK Type I error (false positive) Don’t reject H 0 Type II error (false negative) OK

Relation between significance level and type I errors  In fact, the significance level = the risk for type I errors.  If we follow the convention and use 5% significance level (reject H 0 if p<0.05) then we have 5% risk of type I errors.  What does this mean, more concretely? 26 april 2015Arvid Sjölander24 H 0 is falseH 0 is true Reject H 0 OK Type I error (false positive) Don’t reject H 0 Type II error (false negative) OK Sig level

Power  Power = the chance of being able to reject H 0, when H 0 is false.  Relation between significance level and power:  High significance level (easy to reject H 0 )  high power.  Low significance level (difficult to reject H 0 )  low power. 26 april 2015Arvid Sjölander25 H 0 is falseH 0 is true Reject H 0 OK Type I error (false positive) Don’t reject H 0 Type II error (false negative) OK Sig level Power

Power calculations  It is important to determine the power of the study before data is collected.  That the power is low means that we will probably not find what we are looking for.  Direct calculation of the power is beyond the scope of this course  Ask a statistician! 26 april 2015Arvid Sjölander26

Power calculations, cont’d  Heuristically, the power of the study is determined by three factors:  The significance level; higher significance level gives higher power.  The true RR; stronger association gives higher power.  The sample size; larger sample gives higher power.  Typically, we want to have a power of at least 80%.  In practice, the significance level is fixed at 5%.  We also typically have an idea of what deviations from H 0 that are scientifically relevant to detect (e.g. RR > 1.5).  We determine the sample size that we need, to have the desired power. 26 april 2015Arvid Sjölander27

26 april 2015Arvid Sjölander28 Systematic vs random errors  There are two qualitative differences between systematic and random errors.  #1  Data can tell us if an observed association is possibly due to random errors - check the p-value.  Data can never tell us if an observed association is due to systematic errors.  #2  Uncertainty due to random errors can be reduced by increasing the sample size  narrower confidence intervals.  Systematic errors results from a poor study design, and can not be reduced by increasing the sample size.

26 april 2015Arvid Sjölander29 Summary  In medical research, we are often interested in the causal effect of one variable on another.  An observed association between two variables does not necessarily imply that one causes the other.  Always be aware of the following pitfalls:  Selection bias  Confounding  Reverse causation  Random errors