‘Population’ and ‘Sample’ Studying population of interest. Usually would like to know typical value and spread of outcome measure in population. Data from entire population usually impossible or inefficient/expensive so take a sample (even census data can have missing values). Sample must be representative of population. Randomise!
E.g. Randomised Controlled Trial (RCT): Population -> Sample -> Randomisation -> Group 1 / Group 2 -> Outcome
Types of Data
Categorical
Example: Yes/No, Blood Group
Graphs: Bar Chart, Pie Chart
Summary: Frequency (n), Proportion (%)
Numerical/Continuous
Example: Weight, Pain Score
Graphs: Histogram, Box and Whisker Plot
Summary: Mean & Standard Deviation (SD), Median & Inter-quartile range (IQR)
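A minimal sketch of the categorical summary (frequency and proportion). The blood-group values and the helper name `frequency_table` are made up for illustration.

```python
from collections import Counter

# Hypothetical blood groups for 10 patients (illustrative data only).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]


def frequency_table(values):
    """Return {category: (n, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(blood_groups)
for category, (n, pct) in sorted(table.items()):
    print(f"{category}: n={n} ({pct:.0f}%)")
```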
Types of Average (‘Average’ - a number which typifies a set of numbers)
Mean = Total divided by n
Median = Middle value
Mode = Most common value/group (rarely used)
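The three averages can be sketched with Python's standard `statistics` module. The lengths of stay below are invented for illustration.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative data only).
stays = [2, 3, 3, 4, 5, 7, 25]

mean_stay = statistics.mean(stays)      # total divided by n
median_stay = statistics.median(stays)  # middle value once ordered
mode_stay = statistics.mode(stays)      # most common value

print(f"mean={mean_stay}, median={median_stay}, mode={mode_stay}")
```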
Mean or Median?
Roughly Normally distributed: mean or median (mean by convention)
Skewed: median (less affected by extreme values)
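To see why the median is preferred for skewed data, the sketch below adds one extreme value to a small made-up data set and compares how far each average moves. The numbers are illustrative only.

```python
import statistics

# Hypothetical pain scores; the second list adds one extreme value.
scores = [4, 5, 5, 6, 7]
scores_with_outlier = scores + [60]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(scores_with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(scores_with_outlier)

# The mean is pulled strongly towards the extreme value; the median barely moves.
print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```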
Variation and Spread
Standard Deviation (‘SD’)
- Average distance from mean
- Use alongside mean
Inter-Quartile Range (‘IQR’)
- Range in which middle 50% of the data lie (middle 50% when ordered)
- Use alongside median
Range
- Highest and lowest value
- Possibly quote in addition to SD/IQR
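A short sketch of the three measures of spread on made-up data. It assumes the sample SD (dividing by n - 1) and the default quartile method of `statistics.quantiles`; other packages may use slightly different quartile rules and give slightly different IQRs.

```python
import statistics

# Hypothetical weights in kg (illustrative data only).
weights = [2, 4, 4, 5, 7, 9, 11]

sd = statistics.stdev(weights)                   # sample SD, reported with the mean
q1, q2, q3 = statistics.quantiles(weights, n=4)  # quartiles; q2 is the median
iqr = q3 - q1                                    # width of the middle 50%
lowest, highest = min(weights), max(weights)     # range

print(f"SD={sd:.2f}, IQR={q1} to {q3} (width {iqr}), range={lowest} to {highest}")
```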
Standard Error Not the same as standard deviation. Calculated using a measure of variability and sample size. Used to construct confidence intervals. Not very informative when given alongside statistics or as error bars on a plot.
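For a mean, the usual calculation is SE = SD divided by the square root of n. The sketch below uses invented numbers and a hypothetical helper, `standard_error`, to show that the SE shrinks as the sample grows even when the SD stays the same.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean from the sample SD and sample size."""
    return sd / math.sqrt(n)


# Same variability (SD = 10), different sample sizes.
se_small = standard_error(sd=10, n=25)
se_large = standard_error(sd=10, n=100)

print(f"n=25: SE={se_small}, n=100: SE={se_large}")
```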
Sample statistic is the best guess of the (true) population value
E.g. sample mean is the best estimate of the mean in the population. The mean is likely to be different if we take a new sample from the population, so we know the estimate is unlikely to be exactly right.
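A small simulation can illustrate this. It assumes a made-up population of 10,000 roughly Normal weights and draws five random samples of 30; each sample mean lands near the population mean, but no two agree exactly.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 weights in kg (illustrative only).
population = [random.gauss(70, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five random samples of 30 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(5)
]

print(f"population mean: {population_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.1f}")
```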
Confidence Intervals (CIs)
Confidence interval = “range of values that we can be confident will contain the true value of the population”.
The “give or take a bit” for best estimate.
Convention is to use a 95% confidence interval (‘95% CI’): if we took many samples, about 95% of the intervals calculated this way would contain the true value.
But this also leaves a 5% chance that the interval does not contain the true value.
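As a rough sketch, a CI for a mean can be built as estimate plus or minus a multiplier times the standard error. The version below assumes a Normal approximation (multiplier about 1.96 for 95%), which is reasonable for large samples; small samples would normally use a t-distribution instead. The summary figures and the helper `mean_ci` are invented for illustration.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    margin = z * sd / math.sqrt(n)             # multiplier x standard error
    return mean - margin, mean + margin


# Hypothetical summary: mean 120, SD 15, from a sample of 100.
lower, upper = mean_ci(mean=120, sd=15, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```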
Example: Legislation for smoke-free workplaces and health of bar workers in Ireland: before and after study (Allwright et al; BMJ Oct 2005)
Outcome | Before N=138 | After N=138 | Difference (95% CI)
Salivary cotinine (nmol/l), Median | … | … | … (-26.7 to -19.0)
Any respiratory symptoms, n (%) | 90 (65%) | 67 (49%) | -16.7 (-26.1 to -7.3)
Runny nose/sneezing, n (%) | 61 (44%) | 48 (35%) | -9.4 (-19.8 to 0.9)
Example: Supplementary feeding with either ready-to-use fortified spread or corn-soy blend in wasted adults starting antiretroviral therapy in Malawi (MacDonald et al; BMJ May 2009)
“After 14 weeks, patients receiving fortified spread had a greater increase in BMI and fat-free body mass than those receiving corn-soy blend: 2.2 (SD 1.9) v 1.7 (SD 1.6) (difference 0.5, 95% confidence interval 0.2 to 0.8), and 2.9 (SD 3.2) v 2.2 (SD 3.0) kg (difference 0.7 kg, 0.2 to 1.2 kg), respectively.”
Example: Sample size matters
What proportion of patients attending clinic are satisfied?
Sample size | Number satisfied | Proportion satisfied | 95% CI for proportion
10 | 7 | 70% | 35% to 93%
… | … | …% | 50% to 88%
50 | 35 | 70% | 55% to 82%
… | … | …% | 60% to 79%
… | … | …% | 67% to 73%
Example: % confidence matters
What proportion of patients attending clinic are satisfied?
Sample size = 50; No. satisfied = 35; Proportion satisfied = 70%
90% CI: 58% to 81%
95% CI: 55% to 82%
99% CI: 51% to 85%
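The effect of sample size and confidence level on a CI for a proportion can be sketched as below. The slides do not say which method produced their intervals; this sketch assumes the exact (Clopper-Pearson) method, which appears to agree after rounding with the 7 out of 10 and 35 out of 50 figures above. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target):
    """Bisection for p in [0, 1] with f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, level=0.95):
    """Exact (Clopper-Pearson) CI for a proportion of k successes out of n."""
    alpha = 1 - level
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Same proportion (70%), larger sample: narrower interval.
for k, n in [(7, 10), (35, 50)]:
    lo, hi = exact_ci(k, n)
    print(f"{k}/{n}: 95% CI {100 * lo:.0f}% to {100 * hi:.0f}%")

# Same data, higher confidence level: wider interval.
for level in (0.90, 0.95, 0.99):
    lo, hi = exact_ci(35, 50, level)
    print(f"{level:.0%} CI: {100 * lo:.0f}% to {100 * hi:.0f}%")
```

The normal approximation (proportion plus or minus 1.96 standard errors) is simpler but gives slightly different limits, particularly for small samples.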
p-values vs. Confidence Intervals
p-value:
- Weight of evidence to reject null hypothesis
- No clinical interpretation
Confidence Interval:
- Can be used to reject null hypothesis
- Clinical interpretation
- Effect size
- Direction of effect
- Precision of population estimate
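One way a CI can be used to reject a null hypothesis: check whether the interval excludes the null value (0 for a difference). Broadly, a 95% CI that excludes the null value corresponds to p < 0.05 for the matching two-sided test. The sketch applies this to the two symptom intervals from the Allwright et al example; the helper `excludes_null` is a hypothetical name.

```python
def excludes_null(lower, upper, null_value=0.0):
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# 95% CIs for the before/after differences quoted from Allwright et al.
any_respiratory_symptoms = (-26.1, -7.3)
runny_nose_sneezing = (-19.8, 0.9)

for name, (lower, upper) in [
    ("Any respiratory symptoms", any_respiratory_symptoms),
    ("Runny nose/sneezing", runny_nose_sneezing),
]:
    verdict = "excludes 0" if excludes_null(lower, upper) else "includes 0"
    print(f"{name}: {lower} to {upper} ({verdict})")
```

Unlike a bare p-value, the interval also shows the direction of the effect (both estimates are reductions) and how precisely it has been estimated.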
So… it’s not all about p-values! For some hypotheses p-value and CI will both indicate whether to reject it or not. A CI will also provide an estimate, as well as a range for that estimate. General medical journals prefer CI.
Statistical Packages
Package | Summary Statistics | Confidence Intervals
SPSS | Not user-friendly; gives a large choice of statistics to calculate | Doesn’t provide a CI for some key comparative statistics, e.g. simple percentage
Stats Direct | One right-click; will produce a set of 20 or so of the most commonly used statistics | Provides a CI for most statistics