Statistics: the ten main mistakes. Didier Concordet, Ecole Nationale Vétérinaire de Toulouse, July 2005
2 Statistical mistakes are frequent. Many surveys of statistical errors in the medical literature report error rates ranging from 30% to 90% (Altman, 1991; Gore et al., 1976; Pocock et al., 1987; MacArthur, 1984). Reviews of the biomedical literature have consistently found that about half of the articles use incorrect statistical methods (Glantz, 1980).
3 When do they occur? When designing the experiment; when collecting data; when analysing data; when interpreting results.
4 Design. Lack of proper randomisation: the inference space is not defined; poor balance of the groups to be compared; lack of a control group (maybe less frequent now); confounding factors exist. Lack of power: the sample size is not large enough to answer the question; the statistical unit is not well defined.
5 Inference space definition (M1) An experiment in 2-year-old beagles showed that the temperature of dogs treated with the antipyretic drug A decreased by 2 °C. Does this result still hold for: all 2-year-old beagles? 3-year-old beagles? beagles? dogs? man?
6 Poor balance (M2) Clinical trial comparing 2 antipyretics; variable: rectal temperature after treatment. Reference: mean = 39, N = 100, SD = 1. New TRT: mean = 37, N = 100, SD = 1. Conclusion: Reference < New TRT (P < 0.001).
7 Poor balance Clinical trial comparing 2 antipyretics; variable: rectal temperature after treatment. Clinical trial 1: Reference mean = 40, N = 90, SD = 1; New TRT mean = 42, N = 50, SD = 1; New TRT < Reference (P < 0.001). Clinical trial 2: Reference mean = 30, N = 10, SD = 1; New TRT mean = 32, N = 50, SD = 1; New TRT < Reference (P < 0.001). Conclusion: Reference > New TRT.
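The reversal between the two slides can be checked with simple arithmetic. Reading "<" as a comparison of efficacy (a lower temperature after treatment means a more effective antipyretic), the sketch below assumes the single-trial result on the previous slide is just the two trials pooled together, and recomputes the pooled means from the per-trial means and group sizes. The unbalanced allocation (90 vs 10 reference dogs, 50 vs 50 new-treatment dogs) pulls the pooled reference mean toward the high-temperature trial 1, so the direction of the comparison flips: an instance of Simpson's paradox.

```python
# Per-trial summaries from the two trials:
# arm -> (mean rectal temperature after treatment, number of dogs)
trials = {
    "trial 1": {"Reference": (40, 90), "New TRT": (42, 50)},
    "trial 2": {"Reference": (30, 10), "New TRT": (32, 50)},
}


def pooled_mean(arm):
    """Mean of one arm after naively pooling both trials (weights = group sizes)."""
    total = sum(arms[arm][0] * arms[arm][1] for arms in trials.values())
    n = sum(arms[arm][1] for arms in trials.values())
    return total / n


# Within each trial, the new treatment leaves a HIGHER temperature (less effective).
for name, arms in trials.items():
    print(name, "Reference:", arms["Reference"][0], "New TRT:", arms["New TRT"][0])

# After pooling, the comparison flips: (90*40 + 10*30)/100 vs (50*42 + 50*32)/100.
ref_pooled = pooled_mean("Reference")
new_pooled = pooled_mean("New TRT")
print("pooled Reference:", ref_pooled, "pooled New TRT:", new_pooled)  # 39.0 vs 37.0
```

The pooled means are exactly the 39 and 37 of the single-trial slide, which is why comparing groups that are unevenly spread across trials (or any other confounding factor) can give the opposite conclusion to every trial taken separately.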
8 Power (M3) A clinical study to compare the efficacy of two treatments (Ref. and Test). Expected difference between the treatments for the efficacy variable = 4 SD 2. A parallel two-group design is planned with 5 dogs in each group. What should we think of this study? Its power is only 35% for a type I risk of 5%: even if the expected difference exists, only 35% of samples of 5 dogs per group will actually exhibit it (that is, give a significant result)!
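The 35% figure can be reproduced with the usual normal approximation for a two-sample comparison of means. This sketch assumes the expected difference equals one SD of the efficacy variable (difference 4, SD 4), the setting under which this approximation gives 35% with 5 dogs per group; the function `approx_power` is a hypothetical helper written for this illustration and accepts any other values. With so few dogs, the approximation slightly overstates the power of an actual Student t-test.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(diff, sd, n_per_group, z_alpha=1.959964):
    """Normal-approximation power of a two-sided, two-sample comparison of means.

    z_alpha is the two-sided 5% critical value; the tiny probability of
    rejecting in the wrong direction is ignored.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    return normal_cdf(abs(diff) / se - z_alpha)


# ASSUMPTION: expected difference = 4 and SD = 4 (a difference of one SD).
power_5 = approx_power(diff=4, sd=4, n_per_group=5)
print(f"power with 5 dogs per group: {power_5:.2f}")  # 0.35

# Smallest group size reaching 80% power under the same assumptions.
n_needed = next(n for n in range(2, 1000) if approx_power(4, 4, n) >= 0.80)
print("dogs per group for 80% power:", n_needed)
```

Under these assumptions about 16 dogs per group would be needed for 80% power; an exact t-test calculation requires slightly more.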
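9 Power Efficacy variable in two groups of dogs (Ref and Test): mean 15.4 …, SD …, N = 5 per group. Student t-test: P = 0.18. Actually, no conclusion can be drawn: with such low power, a non-significant result is expected even if the treatments really differ.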
10 A real story A study was performed to investigate the effect of diet on several biochemical compounds (about 20). To this end, a dog was fed a "normal" diet for 3 months and then the new diet for 3 months. Every two days, a blood sample was taken and the biochemical compounds were assayed. At the end of the experiment, 90 measurements were available for each biochemical compound. There was a significant difference between the effects of the two diets for 10 biochemical compounds (P < 0.001). This result was obtained with a sample size of 90.
11 Statistical unit (M4) The statistical unit (an individual) is a statistical object that cannot be divided. We want to generalise results obtained on a finite collection of units (a sample) to a population of units. Despite the apparent wealth of data, the sample size was 1, not 90. At the end of the experiment, the only dog in the study was very well known, but what about the other dogs in the population?
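A small simulation shows why 90 samples from one dog cannot replace 90 dogs. The model is hypothetical: in the population the new diet has no effect on average, but each dog responds in its own way. A t-test on one dog's 90 samples only answers a question about that dog, so most single-dog studies "find" a diet effect even though the population effect is zero. The noise levels and the helper `t_statistic` are assumptions made for this illustration.

```python
import math
import random
import statistics

random.seed(1)

# HYPOTHETICAL model, for illustration only: in the population the new diet has
# NO effect on average, but each dog has its own response drawn from N(0, 1),
# and every blood sample adds within-dog measurement noise drawn from N(0, 1).
N_PER_DIET = 45   # 45 samples per diet period, 90 in total as in the story
T_CRIT = 1.99     # approximate two-sided 5% critical value of t with about 88 df
N_DOGS = 1000     # number of independent single-dog experiments simulated


def t_statistic(a, b):
    """Two-sample (Welch) t statistic for mean(b) - mean(a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


significant = 0
for _ in range(N_DOGS):
    dog_effect = random.gauss(0.0, 1.0)  # this particular dog's response to the diet
    normal_diet = [random.gauss(0.0, 1.0) for _ in range(N_PER_DIET)]
    new_diet = [random.gauss(dog_effect, 1.0) for _ in range(N_PER_DIET)]
    if abs(t_statistic(normal_diet, new_diet)) > T_CRIT:
        significant += 1

rate = significant / N_DOGS
print(f"single-dog studies declaring a diet effect: {rate:.0%}")
```

Under this model the proportion of "significant" single-dog studies is far above the nominal 5%, because the test treats the 90 repeated samples as 90 statistical units.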
12 Experiment Missing data not adequately reported; extreme values excluded; data ignored because they did not support the hypothesis?
13 Analysis Failure to check the assumptions of the statistical methods (M5): homoscedasticity (for a t-test, a linear regression, …); using a linear regression without first establishing linearity; correlation. Ignoring informative "missing" data (M6): death and its consequences; data below the limit of quantification (LOQ). Choosing the question to get an answer (M7). Multiple comparisons (M8).
14 Homoscedasticity (M5) [Figure: clearance by treatment group, 1 and 2; caption: "What the t-test can see"] t-test: P-value = 0.56. After log transformation: P-value = …
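A t-test assumes equal variances in the groups. When variability is proportional to the level, as is common for clearances, the SDs differ on the raw scale but become equal after a log transformation. The clearance values below are hypothetical and chosen so that the effect is exact; in practice you would compare group SDs (or plot the data) before and after transforming.

```python
import math
import statistics

# HYPOTHETICAL clearances (mL/min) with proportional variability:
# each treatment 2 value is the matching treatment 1 value multiplied by 3.
factors = [1.0, 1.5, 2.0, 3.0, 4.0]
treatment_1 = [10 * f for f in factors]
treatment_2 = [30 * f for f in factors]

# Raw scale: the spread grows with the mean (heteroscedasticity).
raw_ratio = statistics.stdev(treatment_2) / statistics.stdev(treatment_1)

# Log scale: a multiplicative effect becomes an additive shift with equal spread.
log_ratio = (statistics.stdev([math.log(x) for x in treatment_2])
             / statistics.stdev([math.log(x) for x in treatment_1]))

print(f"SD ratio, raw scale: {raw_ratio:.2f}")  # 3.00
print(f"SD ratio, log scale: {log_ratio:.2f}")  # 1.00
```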
15 Linearity/Correlation (M5) [Figure: two scatter plots with fitted linear regressions; correlations R = … and R = -0.93]
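A high correlation coefficient does not establish linearity. In the sketch below (hypothetical data), y is an exact quadratic function of x, yet R is close to 1; a plot of the data or of the regression residuals would reveal the curvature immediately.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))
y = [v ** 2 for v in x]  # a curved (quadratic), not linear, relationship
r = pearson_r(x, y)
print(f"R = {r:.3f}")    # about 0.97
```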
16 Linearity/Correlation [Figure: a single linear regression over all the data, correlation R = 0.84; a linear model with 3 groups, within-group correlation R = -0.92]
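The same coefficient can even change sign when groups are mixed. In this hypothetical example the overall R is strongly positive, while the relationship inside each group is perfectly negative; fitting a model with a group effect, as on the slide, is what reveals the within-group trend.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data: three groups whose centres lie on a rising line,
# but within each group y decreases as x increases.
groups = {
    "A": ([-1, 0, 1], [1, 0, -1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([9, 10, 11], [11, 10, 9]),
}

all_x = [v for gx, _ in groups.values() for v in gx]
all_y = [v for _, gy in groups.values() for v in gy]
overall_r = pearson_r(all_x, all_y)
within_r = {name: pearson_r(gx, gy) for name, (gx, gy) in groups.items()}

print(f"overall R = {overall_r:.2f}")  # 0.92: positive
print("within-group R:", within_r)     # -1.0 in every group
```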
17 Ignoring data (M6)
18 Ignoring data
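Values below the limit of quantification (LOQ) are not missing at random: they are missing precisely because they are low. The hypothetical sketch below shows that discarding them biases the mean upward. In a real study the true low values are unknown, so the analysis has to account for them rather than ignore them.

```python
import statistics

# HYPOTHETICAL plasma concentrations (arbitrary units) and limit of quantification.
concentrations = [0.2, 0.5, 0.8, 1.2, 2.0, 3.5]
LOQ = 1.0

# Silently dropping the values below the LOQ keeps only the high ones.
above_loq = [c for c in concentrations if c >= LOQ]

print("mean of all values: ", round(statistics.mean(concentrations), 2))  # 1.37
print("mean above LOQ only:", round(statistics.mean(above_loq), 2))       # 2.23
```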
19 Choosing the question to get an answer (M7) This occurs frequently in the presentation of clinical trial results. The question becomes random: it changes with the sample of animals, because it is chosen with the answer already in hand. Think of a coin-toss game in which you bet on heads or tails but choose your bet after seeing the result of the toss: you win every time! Such an approach increases the number of false discoveries.
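Choosing the direction of a one-sided test after seeing which way the data point is a simple instance of this mistake. The simulation below assumes no real effect and a standard normal test statistic; a nominal 5% one-sided test then rejects about 10% of the time, because the data are used twice: once to choose the question and once to answer it.

```python
import random

random.seed(2)
N_SIM = 20000
Z_ONE_SIDED = 1.645  # one-sided 5% critical value of the standard normal

false_positives = 0
for _ in range(N_SIM):
    z = random.gauss(0.0, 1.0)  # test statistic when there is NO real effect
    # The "question" (which direction to test) is chosen after seeing the sign of z,
    # so the test always looks in the direction the data already point.
    if abs(z) > Z_ONE_SIDED:
        false_positives += 1

rate = false_positives / N_SIM
print(f"false-positive rate: {rate:.3f}")  # about 0.10 instead of the nominal 0.05
```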
20 Multiple comparisons (M8) [Table: mean and SD of ADG for each diet] One wants to compare the average daily gain (ADG) obtained with 5 different diets in pigs, using ten t-tests. With a risk of 5% for each comparison, the global risk can be very large.
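The size of the problem is easy to quantify. With 5 diets there are 10 pairwise comparisons; if each test has a 5% type I risk and, for simplicity, the tests are treated as independent, the chance of at least one spurious "significant" difference is about 40%. The Bonferroni correction shown is named here only as one simple, conservative remedy; procedures designed for pairwise comparisons, such as Tukey's, are common alternatives.

```python
from math import comb

n_diets = 5
n_tests = comb(n_diets, 2)  # all pairwise comparisons: 10 t-tests
alpha = 0.05

# Chance of at least one false positive if no diet really differs and the 10 tests
# were independent (pairwise tests share groups, so this is only approximate).
global_risk = 1 - (1 - alpha) ** n_tests

# Bonferroni: test each comparison at alpha / n_tests to keep the global risk <= alpha.
bonferroni_alpha = alpha / n_tests

print(n_tests, round(global_risk, 3), round(bonferroni_alpha, 4))  # 10 0.401 0.005
```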
21 Interpretation/presentation Standard error and standard deviation; P values: non-significant effects; false causality.
22 Standard error / standard deviation (M9) The clearance of the drug was equal to 68 ± 5 mL/min. There are two possible meanings, depending on what 5 is. If 5 is the standard error of the mean (SE), [68 - 1.96 × 5 ; 68 + 1.96 × 5] ≈ [58.2 ; 77.8] is an approximate 95% confidence interval for the population mean clearance. If 5 is the standard deviation (SD), about 95% of animals have their clearance within [58.2 ; 77.8] (assuming a roughly normal distribution).
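The two readings give the same numbers but very different messages, and the difference becomes obvious once the sample size is taken into account, since SE = SD / √n. The sketch assumes a normal distribution and a hypothetical sample of 16 animals (no n is given); `interval` is a helper written for this illustration.

```python
import math

MEAN = 68.0  # mL/min
Z95 = 1.96   # standard normal quantile for a central 95% interval
N = 16       # HYPOTHETICAL number of animals (not given in the example)


def interval(centre, spread):
    """centre ± 1.96 × spread, rounded to one decimal."""
    return (round(centre - Z95 * spread, 1), round(centre + Z95 * spread, 1))


# Reading 1: 5 is the SE of the mean -> approximate 95% CI for the population mean.
se = 5.0
ci_mean = interval(MEAN, se)

# Since SE = SD / sqrt(N), the spread between animals is then much larger.
sd_implied = se * math.sqrt(N)
range_if_se = interval(MEAN, sd_implied)

# Reading 2: 5 is the SD -> range containing about 95% of animals (if roughly normal).
range_if_sd = interval(MEAN, 5.0)

print("CI for the mean (5 = SE):       ", ci_mean)      # (58.2, 77.8)
print("95% of animals (5 = SE, N = 16):", range_if_se)  # (28.8, 107.2)
print("95% of animals (5 = SD):        ", range_if_sd)  # (58.2, 77.8)
```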
23 P values (M10) "The difference between the effects of drugs A and B is not significant (P = 0.56); therefore drug A can be substituted by drug B." NO. The only conclusion that can be drawn from such a P value is that you did not detect any difference between the effects of drugs A and B. That does not mean that such a difference does not exist. Absence of evidence is not evidence of absence.
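A confidence interval makes the point concrete. With the hypothetical numbers below (effect of A minus effect of B, and its standard error), chosen so that P ≈ 0.56, the data are compatible with no difference but also with a difference of about 2 units in favour of A. Whether drug B can replace drug A depends on whether such a difference would matter clinically, which the P value alone cannot tell.

```python
import math

# HYPOTHETICAL summary: observed difference (A minus B) and its standard error.
diff = 0.5
se = 0.86

z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value, normal approximation
ci = (diff - 1.96 * se, diff + 1.96 * se)   # approximate 95% confidence interval

print(f"P = {p_value:.2f}")                   # 0.56: not significant
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")  # -1.19 to 2.19
```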
24 P values (M10) "Drug A has a higher efficacy than drug B (P = 0.001). Drug C has a higher efficacy than drug B (P = 0.04). Since 0.001 < 0.04, drug A has a larger advantage over drug B than drug C does." NO. The only conclusion that can be drawn from these P values is that you are more confident that A > B than that C > B. They say nothing about the magnitude of the differences. Significant does not mean important.
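The P value mixes the size of the effect with the amount of data. In the hypothetical comparison below (normal approximation, SD = 1 in both trials; `two_sided_p` is a helper written for this illustration), a trivial difference measured on a very large sample gets a much smaller P value than a large difference measured on 5 animals per group.

```python
import math


def two_sided_p(diff, sd, n_per_group):
    """Two-sided normal-approximation P value for a difference between two means."""
    z = diff / (sd * math.sqrt(2.0 / n_per_group))
    return math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL trials:
p_small_effect = two_sided_p(diff=0.1, sd=1.0, n_per_group=5000)  # tiny effect, huge trial
p_large_effect = two_sided_p(diff=2.0, sd=1.0, n_per_group=5)     # large effect, tiny trial

print(f"difference 0.1, n = 5000 per group: P = {p_small_effect:.1e}")
print(f"difference 2.0, n = 5 per group:    P = {p_large_effect:.4f}")
```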
25 False causality: lying with statistics There is a strong positive correlation between the number of firefighters present at a fire and the amount of fire damage. Thus, the firefighters present at a fire cause greater fire damage! The correlation coefficient is nothing more than a measure of the strength of a linear relationship between 2 variables. Correlation cannot establish causality. A strong correlation between X and Y can occur when: "X" causes "Y"; "Y" causes "X"; "Z" causes both "X" and "Y" (Z = fire size in the previous example); or by chance, with small sample sizes, even when X and Y are independent.
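A short simulation of the fire example shows the mechanism. In this hypothetical model, fire size Z drives both the number of firefighters X and the damage Y, and firefighters have no effect on damage. X and Y are strongly correlated, but the correlation essentially disappears once Z is accounted for, here by correlating the residuals of X and Y after regressing each on Z (a partial correlation). The coefficients and noise levels are assumptions chosen for illustration.

```python
import math
import random

random.seed(3)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


# HYPOTHETICAL model: Z (fire size) causes both X (firefighters) and Y (damage).
n = 2000
fire_size = [random.uniform(1, 10) for _ in range(n)]
firefighters = [2 * z + random.gauss(0, 1) for z in fire_size]
damage = [10 * z + random.gauss(0, 5) for z in fire_size]

r_raw = pearson_r(firefighters, damage)
# Correlation after removing the linear effect of fire size from both variables.
r_adjusted = pearson_r(residuals(firefighters, fire_size), residuals(damage, fire_size))

print(f"R(firefighters, damage) = {r_raw:.2f}")               # strongly positive
print(f"R after adjusting for fire size = {r_adjusted:.2f}")  # close to 0
```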
26 How to avoid these mistakes? Consult your preferred statistician for help in designing complicated experiments. Use basic descriptive statistics first (graphics, summary statistics, …). Use common sense. Consider learning more statistics.