# Basic Statistics


Basic Statistics

“I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.” [Mrs Robert A Taft]

“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.” [Sherlock Holmes]

Qualitative data: a) nominal data (dead/alive; blood group O, A, B, AB); b) ordered categorical or ranked data (mild/moderate/severe)

Quantitative data: a) numerical discrete (number of deaths in a hospital per year); b) numerical continuous (age, weight, blood pressure)

Presenting data: graphs, summary statistics, tables

Graphical methods: pie chart, bar chart, histogram, scattergram

Pie chart

Bar chart

Histogram

Boxplot

Error bar plot

Scattergram

Graph Example

Graph

Solution

Summary statistics for qualitative data: percentages and numbers

Secondary prevention of coronary heart disease

| | Respondents (n=1343) | Non-respondents (n=578) |
|---|---|---|
| Male | 58% (782) | 54% (314) |
| Urban practice | 54% (720) | 57% (331) |
| Practice size: < 5,000 | 14% (190) | 18% (105) |
| Practice size: 5,000 – 10,000 | 39% (523) | 41% (238) |
| Practice size: > 10,000 | 47% (630) | 41% (235) |
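Each percentage in the table above is the count divided by its group size (1343 respondents or 578 non-respondents). A minimal Python sketch that reproduces the "percentage (count)" cells from the counts; the helper name `percent_with_count` is hypothetical:

```python
def percent_with_count(count, n):
    """Format a group count as 'percentage% (count)', rounded to a whole percent."""
    return f"{round(100 * count / n)}% ({count})"

RESPONDENTS_N, NON_RESPONDENTS_N = 1343, 578

# Counts copied from the table: (respondents, non-respondents)
rows = {
    "Male": (782, 314),
    "Urban practice": (720, 331),
    "Practice size < 5,000": (190, 105),
    "Practice size 5,000 - 10,000": (523, 238),
    "Practice size > 10,000": (630, 235),
}

for label, (r, nr) in rows.items():
    print(label, percent_with_count(r, RESPONDENTS_N), percent_with_count(nr, NON_RESPONDENTS_N))
```

Reporting the count alongside the percentage lets a reader check the denominator, which is why the three practice-size rows should add back up to each group's n.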

Summarizing data example I

Summary Statistics for quantitative data: non-Normal data: median, range, inter-quartile range; Normal data: mean, standard deviation, variance
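A minimal sketch of both sets of summary statistics using Python's standard-library `statistics` module. The data are hypothetical, and quartile conventions differ between packages, so an inter-quartile range from other software may not match exactly.

```python
import statistics

def summarise(data):
    """Summary statistics for a list of numbers."""
    # Default 'exclusive' quartile method; other software may use another rule.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        # Suited to non-Normal data
        "median": statistics.median(data),
        "range": (min(data), max(data)),
        "iqr": (q1, q3),  # reported as (lower quartile, upper quartile)
        # Suited to Normal data
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),          # sample SD, n - 1 denominator
        "variance": statistics.variance(data),
    }

s = summarise([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data
print(s)
```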

Boxplot

Summary Statistics for Normal data: approximately 95% of observations lie within the mean plus or minus 2 standard deviations
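The "approximately 95%" can be checked against the Normal distribution itself. This sketch uses the standard library's `statistics.NormalDist`; the helper name `coverage_within` is hypothetical. Within ±2 SD the exact proportion is about 95.4%, and ±1.96 SD gives 95%.

```python
from statistics import NormalDist

def coverage_within(k):
    """Proportion of a Normal distribution lying within mean +/- k standard deviations."""
    z = NormalDist()  # standard Normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

print(round(coverage_within(2), 4))     # 0.9545
print(round(coverage_within(1.96), 4))  # 0.95
```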

Histogram

Histogram of IgM values

How to check for Normality: mean ≈ median; (mean - 2 SD, mean + 2 SD) is a reasonable range of values; -1 < skewness < 1; -1 < kurtosis < 1 (excess kurtosis, for which a Normal distribution scores 0); histogram shows a symmetric bell shape

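Checking for Normality

| | Age | Length of stay | Satisfaction score |
|---|---|---|---|
| Mean | 66.2 | 12.1 | 5.2 |
| Median | 67 | 8 | 9 |
| SD | 8.2 | 9.0 | 4.3 |
| Minimum | 49 | 4 | 1 |
| Maximum | 80 | 36 | 10 |
| Skewness | -0.2 | 1.8 | -2.5 |
| Kurtosis | 0.5 | 1.3 | 4.6 |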
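The rules of thumb for checking Normality can be applied directly to the "Checking for Normality" summary values above. This is a sketch under stated assumptions: the tolerance for "mean ≈ median" (`tol`, a fraction of the SD) is a hypothetical choice, the kurtosis is assumed to be excess kurtosis, and "reasonable range" is read as "mean - 2 SD is not below the smallest possible value" (a negative length of stay, for instance, is impossible).

```python
def normality_checks(mean, median, sd, skewness, kurtosis, lowest_possible=None, tol=0.2):
    """Rule-of-thumb Normality checks on summary statistics; True means the check passes."""
    checks = {
        # Hypothetical tolerance: mean within tol * SD of the median
        "mean_close_to_median": abs(mean - median) <= tol * sd,
        "skewness_between_-1_and_1": -1 < skewness < 1,
        "kurtosis_between_-1_and_1": -1 < kurtosis < 1,  # assumes excess kurtosis
    }
    if lowest_possible is not None:
        # mean - 2 SD should still be an attainable value
        checks["mean_minus_2sd_plausible"] = mean - 2 * sd >= lowest_possible
    return checks

print(normality_checks(66.2, 67, 8.2, -0.2, 0.5, lowest_possible=0))  # age
print(normality_checks(12.1, 8, 9.0, 1.8, 1.3, lowest_possible=0))   # length of stay
print(normality_checks(5.2, 9, 4.3, -2.5, 4.6))                      # satisfaction score
```

Age passes every check, while length of stay (positively skewed, with 12.1 - 2 × 9.0 below zero) and satisfaction score (negatively skewed) do not.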

Secondary prevention of coronary heart disease: mean (SD)

| | Respondents (n=1343) | Non-respondents (n=578) |
|---|---|---|
| Age (years) | 66.2 (8.2) | 66.6 (8.7) |
| Time since MI (months) * | 10 (6, 35) | 15 (8, 47) |
| Cholesterol (mmol/l) | 6.5 (1.2) | 6.6 (1.2) |

[* Median (range)]

Summary statistics example II

Natural log transformation: can transform positively skewed data into approximately Normal data; use the transformed data in the analysis; the resulting mean is transformed back (using e^x) to give the geometric mean; present the geometric mean and range
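The transform, average, back-transform steps can be sketched in a few lines of Python. The function name `geometric_mean_via_logs` is hypothetical; the standard library's `statistics.geometric_mean` gives the same result and is used only as a cross-check.

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Average on the log_e scale, then transform back with e**x."""
    logs = [math.log(v) for v in values]  # natural log; every value must be > 0
    mean_log = statistics.fmean(logs)      # analysis is done on the transformed scale
    return math.exp(mean_log)              # back-transform gives the geometric mean

print(geometric_mean_via_logs([1, 10, 100]))  # close to 10
print(statistics.geometric_mean([1, 10, 100]))

# Back-transforming a mean log_e length of stay of 2.2:
print(round(math.exp(2.2), 1))  # 9.0
```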

Effect of log_e transformation

| | Length of stay | log_e length of stay |
|---|---|---|
| Mean | 12.1 | 2.2 |
| Median | 8 | 2.1 |
| SD | 9.0 | 0.5 |
| Minimum | 4 | 1.4 |
| Maximum | 36 | 3.6 |
| Skewness | 1.8 | 0.4 |
| Kurtosis | 1.3 | 0.7 |

[Geometric mean = e^2.2 = 9.0]

Secondary prevention of coronary heart disease: mean (SD)

| | Respondents (n=1343) | Non-respondents (n=578) |
|---|---|---|
| Age (years) | 66.2 (8.2) | 66.6 (8.7) |
| Time since MI (months) * | 10 (6, 35) | 15 (8, 47) |
| Cholesterol (mmol/l) | 6.5 (1.2) | 6.6 (1.2) |
| Length of stay # | 9.0 (4, 36) | 11.2 (6, 83) |

[* Median (range), # Geometric mean (range)]

Confidence Interval: “The estimated mean difference in systolic blood pressure between 100 diabetic and 100 non-diabetic men was 6.0 mmHg, with 95% confidence interval (1.1 mmHg, 10.9 mmHg).”

Confidence Interval: conveys the (im)precision of the estimated effect size by presenting a range of values, based on the sample data, within which the population value of that effect size may plausibly lie

Confidence Interval: 95% CI for the mean = mean ± 1.96 × SEM; 90% CI for the mean = mean ± 1.645 × SEM; where SEM = SD / √n
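A minimal sketch of the large-sample formula, applied to the respondents' age in the coronary heart disease table (mean 66.2, SD 8.2, n = 1343). The function name `ci_for_mean` is hypothetical, and the Normal multiplier assumes a reasonably large sample; for small samples a t-distribution multiplier would normally replace it.

```python
from math import sqrt
from statistics import NormalDist

def ci_for_mean(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean +/- z * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.645 for 90%
    return mean - z * sem, mean + z * sem

low, high = ci_for_mean(66.2, 8.2, 1343)
print(f"95% CI for mean age: {low:.2f} to {high:.2f} years")
print(ci_for_mean(66.2, 8.2, 1343, level=0.90))
```

With n this large the SEM is only about 0.22 years, so the interval is narrow even though individual ages vary widely.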

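Confidence Interval: the 95% CI is a range of values which we are 95% confident covers the true population mean; if the study were repeated many times, about 95% of the intervals calculated in this way would contain the true mean, so there is a 5% chance that such an interval misses the ‘true’ mean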
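The repeated-sampling meaning of "95%" can be checked by simulation. This sketch draws many hypothetical samples from a Normal population with a known mean (the population values, sample size, and seed are arbitrary assumptions), builds mean ± 1.96 SEM for each, and counts how often the interval covers the true mean. The proportion should come out close to 0.95.

```python
import random
import statistics
from math import sqrt

def simulate_ci_coverage(true_mean=50.0, true_sd=10.0, n=100, reps=2000, seed=1):
    """Fraction of simulated 95% CIs (mean +/- 1.96 SEM) that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.fmean(sample)
        sem = statistics.stdev(sample) / sqrt(n)
        if m - 1.96 * sem <= true_mean <= m + 1.96 * sem:
            covered += 1
    return covered / reps

print(simulate_ci_coverage())
```

Each individual interval either contains the true mean or it does not; the 95% describes the procedure across repeated samples.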

Error bar plot

Confidence Interval Example

Significance/hypothesis tests: measure the strength of evidence provided by the data for or against some proposition of interest, e.g. is the survival rate after treatment X better than after treatment Y?

Significance/hypothesis tests: null hypothesis: “The effects of X and Y are the same”; alternative hypothesis: “The effects of X and Y are different”

Significance/hypothesis tests: one-sided: “X is better than Y”; two-sided: “X and Y have different effects”

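P-value: P is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; it is not the probability that the null hypothesis is true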
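As an illustration, the blood pressure confidence interval quoted earlier (6.0 mmHg, 95% CI 1.1 to 10.9 mmHg) can be converted into an approximate two-sided P-value for the null hypothesis of no difference. This sketch assumes the interval was calculated as estimate ± 1.96 × SE with a Normal approximation; the function name `two_sided_p_from_ci` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided P-value for H0 'true effect = 0', recovered from a symmetric CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)             # recover the standard error
    z = estimate / se                               # test statistic under H0
    # Probability of a result at least this extreme if H0 were true
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_from_ci(6.0, 1.1, 10.9)
print(round(p, 4))
```

Here SE ≈ 2.5 mmHg and z ≈ 2.4, giving P ≈ 0.016, consistent with a 95% CI that excludes zero.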

P-value: P ≤ 0.05: reject the null hypothesis; the data provide evidence of a difference between X and Y; the result is statistically significant (at the 5% level)

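P-value: P > 0.05: do not reject the null hypothesis; the data do not provide sufficient evidence of a difference between X and Y (which is not the same as showing there is no difference); the result is not statistically significant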

P-value: the power of a study is the probability of rejecting the null hypothesis when it is false; power increases with sample size, and is higher when the true difference between treatments is large
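Both effects on power can be seen with the usual Normal approximation for comparing two means with equal group sizes. The function name, the difference of 5 units, and the SD of 10 are hypothetical, and the formula ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test comparing means."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)    # SE of the difference between two means
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z.cdf(abs(diff) / se - z_crit)

for n in (16, 32, 64, 128):  # larger samples give higher power
    print(n, round(approx_power(diff=5, sd=10, n_per_group=n), 2))

print(round(approx_power(diff=10, sd=10, n_per_group=16), 2))  # larger true difference
```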

P-value: statistical significance does not imply clinical significance

A statistician is a person whose lifetime ambition is to be wrong 5% of the time

Types of significance tests. Chi-square test: “28 out of 70 smokers have a cough compared with 5 out of 50 non-smokers. Is there a significant difference?” [28/70 = 40% compared with 5/50 = 10%]
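A minimal sketch of the chi-square test for a 2×2 table, using the shortcut formula N(ad − bc)² / (row and column totals multiplied together) and the fact that a chi-square variable with 1 degree of freedom is the square of a standard Normal one. The function name is hypothetical. The exact P-value depends on the variant used (for example, with or without Yates' continuity correction) and on rounding, so this sketch is not claimed to reproduce the slide's reported figure exactly.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, P-value); yates=True applies the continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(stat / 2))  # upper tail of chi-square(1) = P(|Z| >= sqrt(stat))
    return stat, p

# Rows: smokers, non-smokers; columns: cough, no cough
stat, p = chi_square_2x2(28, 70 - 28, 5, 50 - 5)
stat_y, p_y = chi_square_2x2(28, 70 - 28, 5, 50 - 5, yates=True)
print(round(stat, 2), p)
print(round(stat_y, 2), p_y)
```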

Chi-square test result: “P = 0.001”, i.e. there is a statistically significant association between smoking and cough

Types of significance tests. Two-sample t-test: “Is there a difference in the 24-hour energy expenditure between groups of lean and obese women?”
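A minimal sketch of the classical (pooled-variance) two-sample t statistic, which assumes both groups have the same variance. The data and function name are hypothetical; the P-value would then come from a t distribution with the returned degrees of freedom, for example via tables or SciPy's `scipy.stats.ttest_ind`, which performs the whole test.

```python
from math import sqrt
import statistics

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = sqrt(pooled_var * (1 / nx + 1 / ny))  # SE of the difference in means
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, nx + ny - 2

t, df = two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # hypothetical data
print(t, df)
```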

Types of significance tests. Mann-Whitney U-test: “Is there a difference in the nausea score between chemotherapy patients receiving an active anti-emetic treatment and those receiving placebo?”
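The Mann-Whitney U statistic works on ranks rather than raw scores, which is why it suits ordinal data such as nausea scores. This sketch ranks both groups together (tied values share their average rank) and converts the first group's rank sum into U. The function names and data are hypothetical; turning U into a P-value needs tables or a Normal approximation, and SciPy's `scipy.stats.mannwhitneyu` does the whole test.

```python
def average_ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first position of each value in sorted order
        last[v] = pos             # last position of each value
    return [(first[v] + last[v]) / 2 for v in values]

def mann_whitney_u(x, y):
    """Return (U_x, U_y); the smaller is compared with tables or a Normal approximation."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # complete separation: U = 0
```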

Types of significance tests. Paired t-test: “Is there a difference in the dietary intake of a group of students in the week before and after Finals?”
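Because each student is measured twice, the paired t-test works on the within-student differences rather than on two separate groups. A minimal sketch with hypothetical before/after values; the P-value comes from a t distribution with n − 1 degrees of freedom, as in SciPy's `scipy.stats.ttest_rel`.

```python
from math import sqrt
import statistics

def paired_t(before, after):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    se = statistics.stdev(diffs) / sqrt(n)
    return statistics.mean(diffs) / se, n - 1

t, df = paired_t([10, 12, 14, 16], [11, 14, 15, 18])  # hypothetical intakes
print(round(t, 3), df)
```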

Types of significance tests. Wilcoxon matched pairs signed rank test or the sign test: “Is there a difference in the units of alcohol consumed by students in the week before and after Finals?”
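Of the two tests named, the sign test is the simpler: it uses only the direction of each paired difference, and under the null hypothesis increases and decreases are equally likely, so the count of increases follows a Binomial(n, 1/2) distribution. A minimal exact sketch with hypothetical data (the Wilcoxon signed rank test, which also uses the sizes of the differences, is available as SciPy's `scipy.stats.wilcoxon`).

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test for paired data; zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: all 8 students drank more after Finals than before
print(sign_test([2, 4, 0, 6, 3, 5, 1, 2], [5, 9, 2, 8, 4, 10, 6, 3]))  # 0.0078125
```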

Significance test example

Correlation: measures the strength of the relationship between two variables

Scattergram

Correlation. Pearson correlation: used for Normally distributed data; measures the linear relationship between two variables

Correlation: r = 0, no linear relationship; r = 1, perfect positive relationship; r = -1, perfect negative relationship
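A minimal sketch of the Pearson correlation coefficient with hypothetical data, showing r = 1 for a perfect positive straight line, r = -1 for a perfect negative one, and r = 0 for y = x², where the relationship is strong but not linear. The function name is hypothetical; Python 3.10+ also provides `statistics.correlation`.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))          # 1.0
print(pearson_r(x, [10 - v for v in x]))             # -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, although y = x**2
```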

Scattergram

Correlation. Spearman correlation: used for non-Normally distributed data; measures the monotonic relationship between two variables
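Spearman's correlation is Pearson's correlation calculated on the ranks of the data, so any consistently increasing relationship, even a curved one, gives a coefficient of 1. A minimal sketch with hypothetical data and function names (recent Python versions, 3.12+, also offer `statistics.correlation(x, y, method="ranked")`).

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0: monotonic but not linear
```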

Correlation Example

Correlation

“The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases” [Comment of a judge on the subject of government statistics, 1920]
