
1 Planning, Performing, and Publishing Research with Confidence Limits. A tutorial lecture given at the annual meeting of the American College of Sports Medicine, Seattle, June 4, 1999. © Will G Hopkins, Physiology and Physical Education, University of Otago, Dunedin, NZ. will.hopkins@otago.ac.nz

2 Outline
- Definitions and Mis/interpretations
- Planning: sample size
- Performing: sample size "on the fly"
- Publishing: Methods, Results, Discussion; meta-analysis; publishing non-significant outcomes
- Conclusions: dis/advantages

3 Definitions and Mis/interpretations
Confidence limits: definitions
- "Margin of error". Example: survey of 1000 voters; Democrats 43%, Republicans 33%; margin of error is ± 3% (for a result of 50%...).
- Likely range of the true value. "Likely" is usually 95%. "True value" = population value = the value if you studied the entire population. Example: survey of 1000 voters; Democrats 43% (likely range 40 to 46%); Democrats - Republicans 10% (likely range 5 to 15%).

4 Example: in a study of 64 subjects, the correlation between height and weight was 0.68 (likely range 0.52 to 0.79).
[Figure: the observed correlation of 0.68 plotted on a 0.00-1.00 scale, with its lower (0.52) and upper (0.79) confidence limits marked.]
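A minimal sketch of how such limits can be computed, using the Fisher z transform (the method the Methods slide later recommends for correlations) and only the Python standard library; the numbers reproduce the slide's example of r = 0.68 with n = 64:

```python
import math
from statistics import NormalDist

def correlation_limits(r, n, confidence=0.95):
    """Confidence limits for a correlation via the Fisher z transform."""
    z = math.atanh(r)                  # transform r to an ~normal scale
    se = 1 / math.sqrt(n - 3)          # standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = z - crit * se, z + crit * se
    return math.tanh(lower), math.tanh(upper)  # back-transform to r

lo, hi = correlation_limits(0.68, 64)
print(round(lo, 2), round(hi, 2))  # 0.52 0.79
```

Note the limits are not symmetric about 0.68: the interval is skewed on the r scale, a point the Results slides return to.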

5 Confidence interval: the difference between the upper and lower confidence limits.
Amazing facts about confidence intervals (for normally distributed statistics):
- To halve the interval, you have to quadruple the sample size.
- A 99% interval is 1.3 times wider than a 95% interval; you need 1.7 times the sample size for the same width.
- A 90% interval is 0.8 of the width of a 95% interval; you need 0.7 times the sample size for the same width.
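The "amazing facts" follow from two properties: interval width is proportional to the normal critical value, and to 1/√n. A quick check with the standard library:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard-normal quantile function
z95, z99, z90 = z(0.975), z(0.995), z(0.95)

# A 99% interval is ~1.3 times wider than a 95% interval...
print(round(z99 / z95, 1))         # 1.3
# ...so matching its width needs ~1.7x the sample size (width ~ 1/sqrt(n)).
print(round((z99 / z95) ** 2, 1))  # 1.7
# A 90% interval is ~0.8 of the width of a 95% interval...
print(round(z90 / z95, 1))         # 0.8
# ...needing only ~0.7x the sample size.
print(round((z90 / z95) ** 2, 1))  # 0.7
```

The width-vs-n relation also explains the first fact directly: halving the width means multiplying n by 2² = 4.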

6 How to Derive Confidence Limits
- Find a function(true value, observed value, data) with a known probability distribution.
- Calculate a critical value such that, for 2.5% of the time, function(true value, observed value, data) < critical value.
- Rearranging: for 2.5% of the time, true value > function'(observed value, data, critical value) = upper confidence limit.
[Figure: the probability distribution of the function (e.g. (n−1)s²/σ², which follows a chi-squared distribution), with the critical value marked and the tail area of 0.025 shaded.]

7 Mis/interpretation of confidence limits
It is hard to misinterpret confidence limits for simple proportions and correlation coefficients, but easier to misinterpret changes in means.
Example: the change in blood volume in a study was 0.52 L (likely range 0.12 to 0.92 L).
- For 95% of subjects, the change was/would be between 0.12 and 0.92 L.
- The average change in the population would be between 0.12 and 0.92 L.
- The change for the average subject would be between 0.12 and 0.92 L.
- There may be individual differences in the change.

8 P value: definition
The probability of a more extreme absolute value than the observed value if the true value was zero or null.
Example: 20 subjects, correlation = 0.25, p = 0.29.
[Figure: the distribution of correlations for no effect and n = 20, spanning -0.5 to 0.5, with the observed effect (r = 0.25) marked and the shaded area in both tails = p value = 0.29.]
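The slide's p of 0.29 can be reproduced approximately with the same Fisher z transform used for confidence limits (the exact value comes from a t test with n − 2 degrees of freedom; the normal approximation below is a sketch, but it lands on the same figure here):

```python
import math
from statistics import NormalDist

def correlation_p(r, n):
    """Two-tailed p value for a correlation (Fisher z approximation)."""
    z = abs(math.atanh(r)) * math.sqrt(n - 3)  # test statistic under r = 0
    return 2 * (1 - NormalDist().cdf(z))       # area in both tails

print(round(correlation_p(0.25, 20), 2))  # 0.29
```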

9 "Statistically Significant": Definitions
- P < 0.05.
- Zero lies outside the confidence interval.
Examples: four correlations for samples of size 20 (shown on a correlation-coefficient scale from -0.50 to 1.00):

r      likely range     P
0.70   0.37 to 0.87     0.007
0.44   0.00 to 0.74     0.05
0.25   -0.22 to 0.62    0.29
0.00   -0.44 to 0.44    1.00

10 Incredibly interesting information about statistical significance and confidence intervals
Two independent estimates of a normally distributed statistic with equal confidence intervals are significantly different at the 5% level if the overlap of their intervals is less than 0.29 (= 1 − √2/2) of the length of the interval. If the intervals are very unequal...
[Figure: pairs of overlapping intervals illustrating p < 0.05, p = 0.05, and p > 0.05 as the overlap shrinks.]

11 Type I and II Errors
You could be wrong about significance or the lack of it.
- Type I error = false alarm. Rate = 5% for zero real effect.
- Type II error = failed alarm. Traditional acceptable rate = 20% for the smallest worthwhile effect.
Lots of tests for significance implies more chance of at least one false alarm: "inflated Type I error". Ditto Type II error?
Deal with inflated Type I error by reducing the p value. Should we adjust confidence intervals? No.

12 Mis/interpretation of P > 0.05 (for an observed positive effect)
- The effect is not publishable.
- There is no effect.
- The effect is probably zero or trivial.
- There's a reasonable chance the effect is < zero.
Mis/interpretation of P < 0.05 (for an observed positive effect)
- The effect is probably big.
- There's a < 5% chance the effect is zero.
- There's a < 2.5% chance the effect is < zero.
- There's a high chance the effect is > zero.
- The effect is publishable.

13 Planning Research
Sample Size via Statistical Significance
The sample size must be big enough to be sure you will detect the smallest worthwhile effect.
- To be sure: 80% of the time.
- Detect: P < 0.05.
- Smallest worthwhile effect: what impacts your subjects:
  correlation = 0.10
  relative risk = 1.2 (or frequency difference = 10%)
  difference in means = 0.2 of a between-subject standard deviation
  change in means = 0.5 of a within-subject standard deviation
Example: 760 subjects to detect a correlation of 0.10.
Example: 68 subjects to detect a 0.5% change in a crossover study when the within-subject variation is 1%.
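A standard Fisher-z approximation to this sample-size calculation can be sketched as below. It gives a figure close to, but not exactly, the slide's 760; the slide presumably uses a slightly different approximation, so treat the constant as illustrative:

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect correlation r (two-tailed), via Fisher z.
    A sketch; other approximations (presumably the one behind the
    slide's figure of 760) differ slightly."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    zr = math.atanh(r)                             # effect on the z scale
    return math.ceil(((z_alpha + z_beta) / zr) ** 2 + 3)

print(n_for_correlation(0.10))  # 783 -- in the region of the slide's 760
```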

14 But the 95% likely range doesn't work properly with traditional sample-size estimation (maybe). Example: correlation of 0.06, sample size of 760...
- 47.5% + 47.5% (= 95%) likely range: not significant, but could be substantial. Huh?
- 47.5% + 30% likely range: not significant, and can't be substantial. OK!
[Figure: the two likely ranges plotted on a correlation-coefficient scale from -0.1 to 0.1.]

15 Sample Size via Confidence Limits
The sample size must be big enough for acceptable precision of the effect.
- Precision means 95% confidence limits.
- Acceptable means any value of the effect within these limits will not impact your subjects.
Example: you need 380 subjects to delimit a correlation of zero.
[Figure: the confidence interval for N = 380 plotted on a correlation-coefficient scale from -0.10 to 0.10, just inside the smallest worthwhile effects.]
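The figure of 380 can be checked with the same Fisher z machinery as before: with n = 380 and an observed correlation of exactly zero, the 95% limits land at about ±0.10, the smallest worthwhile correlation:

```python
import math
from statistics import NormalDist

n = 380
se = 1 / math.sqrt(n - 3)          # SE of Fisher z for an observed r = 0
crit = NormalDist().inv_cdf(0.975)
limit = math.tanh(crit * se)       # back-transform to the r scale
print(round(limit, 2))  # 0.1 -- the smallest worthwhile correlation
```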

16 But the sample size needed to detect or delimit the smallest effect is overkill for larger effects. Example: confidence limits for correlations of 0.10 and 0.80 with a sample size of 760...
[Figure: the two confidence intervals plotted on a correlation-coefficient scale from -0.1 to 1.0; the interval for 0.80 is much narrower.]
So why not start with a smaller sample and do more subjects only if necessary? Yes, I call it...

17 Performing Research
Sample Size "On the Fly"
Start with a small sample; add subjects until you get acceptable precision for the effect.
- Acceptable precision defined as before.
- You need a qualitative scale for magnitudes of effects.
Example: sample sizes to delimit correlations...
[Figure: a correlation-coefficient scale from -0.1 to 1.0 divided at 0.1, 0.3, 0.5, 0.7, and 0.9 into trivial, small, moderate, large, very large, and nearly perfect, with sample sizes of 380, 350, 270, 155, and 46 marked along the scale.]
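The on-the-fly procedure can be simulated in a few lines. This sketch uses an illustrative true correlation of 0.6 and an illustrative stopping width of 0.3 (neither is from the slide; a real study would stop when the limits stay within one magnitude band):

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # illustrative seed, for reproducibility

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def conf_limits(r, n):
    """95% confidence limits for r via the Fisher z transform."""
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.975)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Add 10 subjects at a time until the 95% interval is acceptably narrow.
rho, x, y = 0.6, [], []
lo, hi = -1.0, 1.0
while hi - lo >= 0.3:
    for _ in range(10):
        a = random.gauss(0, 1)
        x.append(a)
        y.append(rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1))
    if len(x) >= 20:  # wait for a minimally stable estimate
        lo, hi = conf_limits(pearson(x, y), len(x))
print(len(x), round(lo, 2), round(hi, 2))
```

The next slide's warning applies here: the stopping rule must be about interval width, not about reaching significance.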

18 Problems with sampling on the fly
- Do not sample until you get statistical significance: the resulting outcomes are biased larger than life.
- Sampling until the confidence interval is acceptable produces bias, but it is negligible.
- But researchers will rush into print as soon as they get statistical significance.
- And funding agencies prefer to give money once (but you could give some back!).
- And all the big effects have been researched anyway? No, not really.

19 Publishing Research
In the Methods
"We show the precision of our estimates of outcome statistics as 95% confidence limits (which define the likely range of the true value in the population from which we drew our sample)."
Amazingly useful tips on calculating confidence limits:
- Simple differences between means: stats program.
- Other normally distributed statistics: mean and p value.
- Relative risks: stats program.
- Correlations: Fisher's z transform.
- Standard deviations and other root-mean-square variations: chi-squared distribution.

20 Coefficients of variation: standard deviation of 100 × the natural log of the variable; back-transform for CV > 5%.
- Use the adjustment of Tate and Klett to get shorter intervals for SDs and CVs from small samples.
[Figure: usual vs adjusted confidence intervals for the coefficient of variation (%) of 10 subjects in 2 tests, on a scale from 0 to 3%.]
- Ratios of independent standard deviations: F distribution.
- R² (variance explained): convert to a correlation.
- Effect size (difference in means / standard deviation): non-central F distribution or bootstrapping.
- Really awful statistics: bootstrapping.
Use the spreadsheet at sportsci.org/stats for all the above.

21 Bootstrapping (Resampling) for confidence limits
Use it for difficult statistics, e.g. for grossly non-normal repeated measures with missing values. Here's how:
- For a large-enough sample, you can recreate (sort of) the population by duplicating the sample endlessly.
- Draw 1000 samples (of the same size as your original) from this population.
- Calculate your outcome statistic for each of these samples, rank them, then find the 25th and 975th place-getters. These are the confidence limits.
Problems:
- Painful to generate.
- No good for infrequent levels of nominal variables.
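The recipe above can be sketched in a few lines of Python. The seed and data are illustrative (the method earns its keep on messier statistics than a median), and drawing with replacement from the sample is equivalent to sampling from the endlessly duplicated "population":

```python
import random
from statistics import median

random.seed(42)
sample = [random.gauss(100, 15) for _ in range(50)]  # illustrative data

# Compute the outcome statistic for 1000 resamples, then rank them.
boots = sorted(
    median(random.choices(sample, k=len(sample))) for _ in range(1000)
)

lower, upper = boots[24], boots[974]  # the 25th and 975th place-getters
print(round(lower, 1), round(upper, 1))
```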

22 In the Results
In TEXT
- Change or difference in means. First mention: ...0.42 (95% confidence/likely limits/range -0.09 to 0.93) or ...0.42 (95% confidence/likely limits/range ± 0.51). Thereafter: ...2.6 (1.4 to 3.8) or 2.6 (± 1.2), etc.
- Correlations, relative risks, odds ratios, standard deviations, ratios of standard deviations: can't use ± because the confidence interval is skewed: ...a correlation of 0.90 (0.67 to 0.97)... ...a coefficient of variation of 1.3% (0.9 to 1.9%)...

23 In TABLES
Confidence intervals:
Variable    r      likely range
A           0.70   0.37 to 0.87
B           0.44   0.00 to 0.74
C           0.25   -0.22 to 0.62
D           0.00   -0.44 to 0.44

P values:
Variable    r      p
A           0.70   0.007
B           0.44   0.05
C           0.25   0.29
D           0.00   1.00

Asterisks:
Variable    r
A           0.70**
B           0.44*
C           0.25
D           0.00

24 In FIGURES
[Figure: change in power (%), on a scale from -10 to 10, for three groups (told placebo, not told, told carbohydrate); bars are 95% likely ranges.]

25 [Figure: change in 5000-m time (%), on a scale from -3 to 4, against training time (0 to 14 weeks) for live low/train low, live high/train high, and live high/train low groups, with sea-level and altitude phases marked; the shaded band is the likely range of the true change.]

26 In the Discussion
Interpret the observed effect and its 95% confidence limits qualitatively.
Example: you observed a moderate correlation, but the true value of the correlation could be anything between trivial and very strong.
[Figure: the observed correlation and its confidence limits plotted on the magnitude scale (trivial, small, moderate, large, very large, nearly perfect; boundaries at 0.1, 0.3, 0.5, 0.7, and 0.9).]

27 Meta-Analysis
Deriving a single estimate and confidence interval for an effect from several studies. Here's how it works for two:
[Figure: confidence intervals for Study 1, Study 2, and the combined Study 1+2, shown for the cases of equal and unequal confidence intervals; the combined interval is narrower than either.]
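A common way to do what the diagram shows is fixed-effect (inverse-variance) weighting: each study is weighted by 1/SE², so a more precise study pulls the combined estimate toward itself and the combined interval is narrower than either. A sketch with two made-up studies:

```python
import math

def combine(estimates, ses):
    """Fixed-effect (inverse-variance) meta-analytic combination."""
    weights = [1 / se ** 2 for se in ses]
    est = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))  # combined SE is always smaller
    return est, se

# Two illustrative studies of the same effect (values are made up):
est, se = combine([0.40, 0.60], [0.20, 0.10])
lo, hi = est - 1.96 * se, est + 1.96 * se
print(round(est, 2), round(lo, 2), round(hi, 2))  # 0.56 0.38 0.74
```

Note the combined estimate (0.56) sits closer to the second study's 0.60, which had half the standard error, and the combined SE (≈0.09) is smaller than either study's.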

28 Publishing non-significant outcomes
- Publishing only significant effects from small-scale studies leads to publication bias.
- Publishing effects with confidence limits, regardless of magnitude, is free of bias.
- Many smaller studies are probably better than a few larger ones anyway.
- So bully the editor into accepting the paper about your seemingly inconclusive small-scale study.

29 Conclusions
Disadvantages of Statistical Significance
- Emphasizes testing of hypotheses.
- Aim is to detect an effect--effects are zero until proven otherwise.
- Have to understand Type I and II errors.
- Hard to understand; easy to misinterpret.
- Have to consider sample size.
- Focuses on statistically significant effects.
Advantages of Statistical Significance
- Familiar.
- All stats programs give p values.
- Easy to put asterisks in tables and figures.

30 Disadvantages of Confidence Limits
- Unfamiliar.
- Not always available in stats programs.
- Cluttersome in tables.
- Display in time series can be a challenge.
Advantages of Confidence Limits
- Emphasizes precision of estimation.
- Aim is to delimit an effect--effects are never zero.
- Only one kind of "error".
- Meaning is reasonably clear, even to lay readers.
- No confusion between significance and magnitude.
- Journals now require them.

