Published by Marco Stokely. Modified over 9 years ago.
1
Basic Statistics
2
“I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.” [Mrs Robert A Taft]
3
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay” [Sherlock Holmes]
4
Qualitative data: a) Nominal data (dead/alive; blood group O, A, B, AB) b) Ordered categorical/ranked data (mild/moderate/severe)
5
Quantitative data: a) Numerical discrete (no. of deaths in a hospital per year) b) Numerical continuous (age, weight, blood pressure)
6
Presenting data: graphs, summary statistics, tables
7
Graphical methods: pie chart, bar chart, histogram, scattergram
8
Pie chart
9
Bar chart
10
Histogram
11
Boxplot
12
Error bar plot
13
Scattergram
14
Graph Example
15
Graph
16
Solution
17
Summary statistics for qualitative data: percentages, numbers
18
Secondary prevention of coronary heart disease

                      Respondents (n=1343)   Non-respondents (n=578)
Male                  58% (782)              54% (314)
Urban practice        54% (720)              57% (331)
Practice size:
  < 5,000             14% (190)              18% (105)
  5,000 – 10,000      39% (523)              41% (238)
  > 10,000            47% (630)              41% (235)
19
Summarizing data example I
20
Summary statistics for quantitative data. Non-Normal data: median, range, inter-quartile range. Normal data: mean, standard deviation, variance.
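The two sets of summaries can be computed side by side. The sketch below uses only Python's standard library; the `length_of_stay` values are made up for illustration and are not taken from the slides.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative values, not from the slides).
length_of_stay = [4, 5, 6, 7, 8, 8, 9, 12, 15, 21, 36]

# Summaries suggested for non-Normal data.
median = statistics.median(length_of_stay)
data_range = (min(length_of_stay), max(length_of_stay))
q1, _, q3 = statistics.quantiles(length_of_stay, n=4)  # quartile cut points
inter_quartile_range = (q1, q3)

# Summaries suggested for Normal data.
mean = statistics.mean(length_of_stay)
sd = statistics.stdev(length_of_stay)            # sample standard deviation
variance = statistics.variance(length_of_stay)   # sample variance (sd squared)

print("median", median, "range", data_range, "IQR", inter_quartile_range)
print("mean", round(mean, 1), "sd", round(sd, 1), "variance", round(variance, 1))
```

For skewed data like these, the mean is pulled towards the long tail, which is why the median and inter-quartile range are the preferred summaries.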
21
Boxplot
22
Summary statistics, Normal data: approximately 95% of observations lie within 2 standard deviations of the mean (mean ± 2 SD)
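The "approximately 95%" figure can be checked against the standard Normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage_within` is hypothetical.

```python
from statistics import NormalDist


def coverage_within(k_sd: float) -> float:
    """Proportion of a Normal distribution lying within k_sd SDs of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)


within_2_sd = coverage_within(2.0)      # the rule of thumb on the slide
within_1_96_sd = coverage_within(1.96)  # the multiplier that gives 95%

print(f"within 2 SD:    {within_2_sd:.4f}")
print(f"within 1.96 SD: {within_1_96_sd:.4f}")
```

The result holds only for Normally distributed data; for skewed data the interval mean ± 2 SD can cover a very different proportion.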
23
Histogram
24
Histogram of IgM values
25
How to test for Normality: Mean = Median; (mean − 2sd, mean + 2sd) is a reasonable range; −1 < skewness < 1; −1 < kurtosis < 1; histogram shows a symmetric bell shape
26
Checking for Normality

            Age     Length of stay   Satisfaction score
Mean        66.2    12.1             5.2
Median      67      8                9
SD          8.2     9.0              4.3
Minimum     49      4                1
Maximum     80      36               10
Skewness    -0.2    1.8              -2.5
Kurtosis    0.5     1.3              4.6
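The rules of thumb from the previous slide can be applied mechanically to these three variables. The sketch below is one possible reading: the `Summary` class and its method names are hypothetical, the figures are the summary values as read from this slide, and "reasonable range" is left to human judgement, so the code only reports the interval.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    name: str
    mean: float
    median: float
    sd: float
    skewness: float
    kurtosis: float

    def two_sd_range(self) -> tuple:
        """(mean - 2sd, mean + 2sd); should look plausible for the variable."""
        return (self.mean - 2 * self.sd, self.mean + 2 * self.sd)

    def passes_shape_rules(self) -> bool:
        """Rule of thumb: skewness and kurtosis both between -1 and 1."""
        return -1 < self.skewness < 1 and -1 < self.kurtosis < 1


# Summary values as read from the slide's table.
summaries = [
    Summary("Age", 66.2, 67, 8.2, -0.2, 0.5),
    Summary("Length of stay", 12.1, 8, 9.0, 1.8, 1.3),
    Summary("Satisfaction score", 5.2, 9, 4.3, -2.5, 4.6),
]

results = {s.name: s.passes_shape_rules() for s in summaries}
for s in summaries:
    low, high = s.two_sd_range()
    print(f"{s.name}: mean-median = {s.mean - s.median:+.1f}, "
          f"2sd range = ({low:.1f}, {high:.1f}), "
          f"shape rules passed = {results[s.name]}")
```

Only Age passes: Length of stay has a 2sd range that extends below zero, and Satisfaction score has a mean far from its median.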
27
Secondary prevention of coronary heart disease

Mean (sd)               Respondents (n=1343)   Non-respondents (n=578)
Age (years)             66.2 (8.2)             66.6 (8.7)
Time since MI (mths) *  10 (6, 35)             15 (8, 47)
Cholesterol (mmol/l)    6.5 (1.2)              6.6 (1.2)

[* Median (range)]
28
Summary statistics example II
29
Natural log transformation: Can transform positively skewed data to 'Normal' data. Use the transformed data in analysis. Transform the resulting mean back (using e^x) to give the geometric mean. Present the geometric mean and range.
30
Effect of loge transformation

            Length of stay   Loge length of stay
Mean        12.1             2.2
Median      8                2.1
SD          9.0              0.5
Minimum     4                1.4
Maximum     36               3.6
Skewness    1.8              0.4
Kurtosis    1.3              0.7

[Geometric mean = e^2.2 = 9.0]
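The back-transformation can be sketched in a few lines of Python. The function name `geometric_mean_via_logs` and the sample values are assumptions made for illustration; only the figure 2.2 comes from the slide.

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """Mean of the natural logs, transformed back with e**x."""
    logs = [math.log(v) for v in values]   # requires strictly positive values
    return math.exp(statistics.mean(logs))


# Check of the slide's figure: mean of loge(length of stay) = 2.2.
back_transformed = math.exp(2.2)
print(f"e^2.2 = {back_transformed:.1f}")

# Hypothetical positively skewed data (not from the slides).
stays = [4, 6, 8, 9, 12, 36]
gm = geometric_mean_via_logs(stays)
print(f"arithmetic mean = {statistics.mean(stays):.1f}, geometric mean = {gm:.1f}")
```

For positively skewed data the geometric mean sits below the arithmetic mean, because the log scale damps the influence of the largest values.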
31
Secondary prevention of coronary heart disease

Mean (sd)               Respondents (n=1343)   Non-respondents (n=578)
Age (years)             66.2 (8.2)             66.6 (8.7)
Time since MI (mths) *  10 (6, 35)             15 (8, 47)
Cholesterol (mmol/l)    6.5 (1.2)              6.6 (1.2)
Length of stay #        9.0 (4, 36)            11.2 (6, 83)

[* Median (range), # Geometric mean (range)]
32
Confidence Interval: “The estimated mean difference in systolic blood pressure between 100 diabetic and 100 non-diabetic men was 6.0 mmHg, with 95% confidence interval (1.1 mmHg, 10.9 mmHg)”
33
Confidence Interval Contains information about the (im)precision of the estimated effect size Presents a range of values, on the basis of the sample data, in which the population value for such an effect size may lie
34
Confidence Interval: 95% CI for mean = mean ± 1.96 × SEM; 90% CI for mean = mean ± 1.64 × SEM; SEM = sd / sqrt(n)
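As a worked example, the formulas can be applied to the respondents' age from the earlier table (mean 66.2, sd 8.2, n = 1343). This is a minimal sketch; the function name `confidence_interval` is hypothetical, and the multipliers are the Normal-approximation values quoted on the slide.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) = mean -/+ z * SEM, where SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return (mean - z * sem, mean + z * sem)


# Respondents' age from the summary table: mean 66.2, sd 8.2, n = 1343.
ci95 = confidence_interval(66.2, 8.2, 1343)          # z = 1.96
ci90 = confidence_interval(66.2, 8.2, 1343, z=1.64)  # z = 1.64

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")
```

The 90% interval is narrower than the 95% interval, and both shrink as n grows because the SEM falls with sqrt(n).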
35
Confidence Interval: The 95% CI is a range of values which we are 95% confident covers the true population mean. If the study were repeated many times, about 5% of the intervals calculated this way would fail to cover the ‘true’ mean.
36
Error bar plot
37
Confidence Interval Example
38
Significance/hypothesis tests: Measure the strength of evidence provided by the data for or against some proposition of interest, e.g. is the survival rate after X better than after Y?
39
Significance/hypothesis tests Null hypothesis: “Effects of X and Y are the same” Alternative hypothesis: “Effects of X and Y are different”
40
Significance/hypothesis tests: One-sided: “X is better than Y”. Two-sided: “X and Y have different effects”
41
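P-value: P is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. It is not the probability that the null hypothesis is true.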
42
P-value: P <= 0.05: there is evidence against the null hypothesis, which is rejected; the data suggest a difference between X and Y; the result is statistically significant at the 5% level
43
P-value: P > 0.05: the null hypothesis is not rejected; there is insufficient evidence of a difference between X and Y, which is not the same as showing that there is no difference; the result is not statistically significant
44
P-value and power of study: power is the probability of rejecting the null hypothesis when it is false; it is increased by increasing the sample size; it is increased if the true difference between treatments is large
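Both effects on power can be seen in a simple calculation. The sketch below assumes a two-sided, two-sample z-test with a known common standard deviation and equal group sizes, which is a simplification of real study designs; the function name `approx_power` and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(true_diff, sd, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test (known sd, equal group sizes)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n_per_group)        # SE of the difference in means
    shift = abs(true_diff) / se                 # true difference in SE units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


small_study = approx_power(true_diff=5, sd=10, n_per_group=50)
large_study = approx_power(true_diff=5, sd=10, n_per_group=100)
large_effect = approx_power(true_diff=10, sd=10, n_per_group=50)
no_effect = approx_power(true_diff=0, sd=10, n_per_group=50)

print(f"n=50,  diff=5:  power = {small_study:.2f}")
print(f"n=100, diff=5:  power = {large_study:.2f}")
print(f"n=50,  diff=10: power = {large_effect:.2f}")
```

When the true difference is zero, the same formula returns the significance level, since rejecting the null hypothesis is then always a false positive.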
45
P-value Statistical significance does not imply clinical significance
46
A statistician is a person whose lifetime ambition is to be wrong 5% of the time
47
Types of significance tests Chi-square test: “28 out of 70 smokers have a cough compared with 5 out of 50 non-smokers - is there a significant difference?” [28/70 = 40% compared with 5/50=10%]
48
Chi-square test result “P=0.001” There is a significant relationship between smoking and cough
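The smoking and cough example can be worked through by hand. This is a sketch of the Pearson chi-square statistic for a 2×2 table without continuity correction; the slides do not say which variant was used, so the P-value it prints need not match the rounded figure above exactly. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction).

    Rows are groups and columns are outcome present / absent.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square is a squared standard Normal, so the upper tail
    # probability is P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Smokers: 28 with cough, 42 without. Non-smokers: 5 with cough, 45 without.
chi2, p = chi_square_2x2(28, 42, 5, 45)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```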
49
Types of significance tests Two-sample t-test: “Is there a difference in the 24 hour energy expenditure between groups of lean and obese women?”
50
Types of significance tests Mann-Whitney U-test: “Is there a difference in the nausea score between chemo patients receiving an active anti-emetic treatment and those receiving placebo?”
51
Types of significance tests Paired t-test: “Is there a difference in the dietary intake of a group of students in the week before and after Finals?”
52
Types of significance tests Wilcoxon matched pairs signed rank test or the Sign test: “Is there a difference in the units of alcohol consumed by students in the week before and after finals?”
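Of the tests listed, the sign test is simple enough to compute directly: it only counts how many pairs went up and how many went down. The sketch below gives an exact two-sided P-value from the binomial distribution; the function name `sign_test` and the before/after values are made up for illustration.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired data; ties are dropped."""
    diffs = [a - b for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg
    if n == 0:
        return 1.0  # every pair tied: no evidence either way
    k = min(n_pos, n_neg)
    # Under the null hypothesis each non-tied pair is + or - with probability 1/2.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Hypothetical units of alcohol for 8 students (not from the slides).
week_before = [10, 12, 8, 15, 20, 6, 9, 14]
week_after = [14, 15, 8, 18, 26, 5, 13, 19]

p_value = sign_test(week_before, week_after)
print(f"P = {p_value:.3f}")
```

Because the sign test ignores the size of each difference, it is less powerful than the Wilcoxon matched pairs signed rank test, which also uses the ranks of the differences.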
53
Significance test example
54
Correlation Measures the strength of the relationship between two variables
55
Scattergram
56
Correlation Pearson correlation: Used for Normally distributed data Measures linear relation between variables
57
Correlation: r = 0: no linear relationship; r = 1: perfect positive relationship; r = −1: perfect negative relationship
58
Scattergram
59
Correlation Spearman correlation: Used for non-Normally distributed data Measures monotonic relationship between variables
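The difference between the two coefficients shows up when a relationship is monotonic but not linear. The sketch below implements both from first principles using the standard library only; the function names and the example data are assumptions for illustration, and Spearman's coefficient is computed as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x cubed, so y always rises with x but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson r = {r_pearson:.3f}, Spearman r = {r_spearman:.3f}")
```

Spearman's coefficient reaches 1 because the ordering of the two variables agrees perfectly, while Pearson's stays below 1 because the points do not lie on a straight line.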
60
Correlation Example
61
Correlation
62
“The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases” [Comment of a judge on the subject of government statistics, 1920]