Published by Marco Stokely. Modified over 2 years ago.

1
Basic Statistics

2
“I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.” [Mrs Robert A Taft]

3
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay” [Sherlock Holmes]

4
Qualitative
a) Nominal data (dead/alive; blood group O, A, B, AB)
b) Ordered categorical/ranked data (mild/moderate/severe)

5
Quantitative
a) Numerical discrete (no. of deaths in a hospital per year)
b) Numerical continuous (age, weight, blood pressure)

6
Presenting data: graphs, summary statistics, tables

7
Graphical methods: pie chart, bar chart, histogram, scattergram

8
Pie chart

9
Bar chart

10
Histogram

11
Boxplot

12
Error bar plot

13
Scattergram

14
Graph Example

15
Graph

16
Solution

17
Summary statistics
Qualitative data: percentages, numbers

18
Secondary prevention of coronary heart disease
                     Respondents   Non-respondents
                     (n=1343)      (n=578)
Male                 58% (782)     54% (314)
Urban practice       54% (720)     57% (331)
Practice size:
  < 5,000            14% (190)     18% (105)
  5,000 – 10,000     39% (523)     41% (238)
  > 10,000           47% (630)     41% (235)
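The percentages in this table can be recomputed from the counts in brackets. A minimal Python sketch, using only the counts and group sizes shown on the slide; the helper name `pct` is introduced here for illustration:

```python
# Group sizes and counts as printed on the slide.
N_RESPONDENTS = 1343
N_NON_RESPONDENTS = 578

# (label, count among respondents, count among non-respondents)
rows = [
    ("Male", 782, 314),
    ("Urban practice", 720, 331),
    ("Practice size < 5,000", 190, 105),
    ("Practice size 5,000 - 10,000", 523, 238),
    ("Practice size > 10,000", 630, 235),
]

def pct(count, total):
    """Percentage of `total` made up by `count`, rounded to a whole number."""
    return round(100 * count / total)

for label, resp, non_resp in rows:
    print(f"{label:30s} {pct(resp, N_RESPONDENTS):3d}% ({resp})   "
          f"{pct(non_resp, N_NON_RESPONDENTS):3d}% ({non_resp})")
```

Reporting both the percentage and the count lets a reader check one against the other.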

19
Summarizing data example I

20
Summary Statistics
Quantitative data
Non-normal: median, range, inter-quartile range
Normal: mean, standard deviation, variance
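As a rough illustration, both sets of summaries can be computed with Python's standard `statistics` module. The data values below are invented for this sketch and are not taken from the presentation:

```python
import statistics

# Hypothetical, positively skewed measurements (invented for illustration).
values = [4, 5, 6, 7, 8, 8, 12, 15, 20, 36]

# Summaries usually reported for non-Normal (skewed) data.
median = statistics.median(values)
data_range = (min(values), max(values))
q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1

# Summaries usually reported for Normal data.
mean = statistics.mean(values)
sd = statistics.stdev(values)           # sample SD (n - 1 denominator)
variance = statistics.variance(values)  # sample variance, equal to sd squared

print(f"median={median}, range={data_range}, IQR={iqr}")
print(f"mean={mean:.1f}, SD={sd:.1f}, variance={variance:.1f}")
```

For these skewed values the mean sits well above the median, which is why the median and range are the safer summary here.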

21
Boxplot

22
Summary Statistics
Normal data: approximately 95% of observations lie between the mean plus or minus 2 standard deviations

23
Histogram

24
Histogram of IgM values

25
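How to test for Normality
Mean is approximately equal to the median
(mean - 2sd, mean + 2sd) gives a reasonable range for the data
-1 < skewness < 1
-1 < kurtosis < 1 (excess kurtosis, which is 0 for a Normal distribution)
Histogram shows a symmetric bell shape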

26
Checking for Normality
            Age     Length of stay   Satisfaction score
Mean        66.2    12.1             5.2
Median      67      8                9
SD          8.2     9.0              4.3
Minimum     49      4                1
Maximum     80      36               10
Skewness    -0.2    1.8              -2.5
Kurtosis    0.5     1.3              4.6
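The rule-of-thumb checks can be applied to these three columns mechanically. The sketch below uses only the summary values from the slide; the function name `normality_checks` and the decision to treat a negative lower limit as "not a reasonable range" are assumptions made for illustration:

```python
# Summary values copied from the slide: (mean, median, SD, skewness, kurtosis).
summaries = {
    "Age": (66.2, 67, 8.2, -0.2, 0.5),
    "Length of stay": (12.1, 8, 9.0, 1.8, 1.3),
    "Satisfaction score": (5.2, 9, 4.3, -2.5, 4.6),
}

def normality_checks(mean, median, sd, skewness, kurtosis):
    """Apply the informal checks; returns a dict of results, not a formal test."""
    low, high = mean - 2 * sd, mean + 2 * sd
    return {
        "mean minus median": round(mean - median, 1),
        "mean - 2 SD": round(low, 1),
        "mean + 2 SD": round(high, 1),
        # Assumption: none of these variables can be negative,
        # so a negative lower limit is not a reasonable range.
        "range plausible": low >= 0,
        "skewness ok": -1 < skewness < 1,
        "kurtosis ok": -1 < kurtosis < 1,
    }

results = {name: normality_checks(*vals) for name, vals in summaries.items()}
for name, checks in results.items():
    print(name, checks)
```

On these figures only Age passes every check; Length of stay and Satisfaction score fail on range, skewness and kurtosis.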

27
Secondary prevention of coronary heart disease
Mean (sd)
                          Respondents    Non-respondents
                          (n=1343)       (n=578)
Age (years)               66.2 (8.2)     66.6 (8.7)
Time since MI (mths) *    10 (6, 35)     15 (8, 47)
Cholesterol (mmol/l)      6.5 (1.2)      6.6 (1.2)
[* Median (range)]

28
Summary statistics example II

29
Natural log transformation
Can transform positively skewed data to approximately ‘Normal’ data
Use the transformed data in the analysis
Transform the resulting mean back (using e^x) to give the geometric mean
Present the geometric mean and range

30
Effect of loge transformation
            Length of stay   Loge length of stay
Mean        12.1             2.2
Median      8                2.1
SD          9.0              0.5
Minimum     4                1.4
Maximum     36               3.6
Skewness    1.8              0.4
Kurtosis    1.3              0.7
[Geometric mean = e^2.2 = 9.0]
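The back-transformation can be sketched in a few lines of Python. The first calculation uses the mean of the logged values from the slide (2.2); the raw data list is invented for illustration, since the original observations are not given:

```python
import math
import statistics

# Back-transform the mean of the logged values reported on the slide.
geometric_mean_from_slide = math.exp(2.2)  # rounds to 9.0

# Hypothetical positively skewed data (invented; not the study data).
values = [4, 5, 6, 7, 8, 8, 12, 15, 20, 36]
log_values = [math.log(v) for v in values]

arithmetic_mean = statistics.mean(values)
# Geometric mean: exponentiate the mean of the natural logs.
geometric_mean = math.exp(statistics.mean(log_values))

print(f"e^2.2 = {geometric_mean_from_slide:.1f}")
print(f"arithmetic mean = {arithmetic_mean:.1f}, geometric mean = {geometric_mean:.1f}")
```

For positively skewed data the geometric mean is smaller than the arithmetic mean, because the log scale pulls in the long right tail.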

31
Secondary prevention of coronary heart disease
Mean (sd)
                          Respondents    Non-respondents
                          (n=1343)       (n=578)
Age (years)               66.2 (8.2)     66.6 (8.7)
Time since MI (mths) *    10 (6, 35)     15 (8, 47)
Cholesterol (mmol/l)      6.5 (1.2)      6.6 (1.2)
Length of stay #          9.0 (4, 36)    11.2 (6, 83)
[* Median (range), # Geometric mean (range)]

32
Confidence Interval
“The estimated mean difference in systolic blood pressure between 100 diabetic and 100 non-diabetic men was 6.0 mmHg with 95% confidence interval (1.1 mmHg, 10.9 mmHg)”

33
Confidence Interval
Contains information about the (im)precision of the estimated effect size
Presents a range of values, on the basis of the sample data, in which the population value for such an effect size may lie

34
Confidence Interval
95% CI for mean = mean +/- 1.96 SEM
90% CI for mean = mean +/- 1.64 SEM
SEM = sd / sqrt(n)
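These formulas translate directly into code. The sketch below applies them to the respondents' age summary given elsewhere in the slides (mean 66.2, sd 8.2, n = 1343); the function name `confidence_interval` is introduced here for illustration:

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Normal-approximation CI for a mean: mean +/- z * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Age of respondents: mean 66.2 years, sd 8.2, n = 1343.
ci95 = confidence_interval(66.2, 8.2, 1343)          # multiplier 1.96
ci90 = confidence_interval(66.2, 8.2, 1343, z=1.64)  # multiplier 1.64

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")
```

The 90% interval is narrower than the 95% interval, and both shrink as n grows because the SEM falls with the square root of the sample size.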

35
Confidence Interval
The 95% CI is a range of values which we are 95% confident covers the true population mean
If the study were repeated many times, about 5% of the 95% CIs calculated in this way would fail to contain the ‘true’ mean

36
Error bar plot

37
Confidence Interval Example

38
Significance/hypothesis tests
Measure the strength of evidence provided by the data for or against some proposition of interest
E.g. is the survival rate after X better than after Y?

39
Significance/hypothesis tests
Null hypothesis: “Effects of X and Y are the same”
Alternative hypothesis: “Effects of X and Y are different”

40
Significance/hypothesis tests
One-sided: “X is better than Y”
Two-sided: “X and Y have different effects”

41
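P-value
P is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true; it is not the probability that the null hypothesis is true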

42
P-value
P <= 0.05
The data provide evidence against the null hypothesis
A difference between X and Y as large as the one observed is unlikely to arise by chance alone
The result is statistically significant

43
P-value
P > 0.05
The null hypothesis cannot be rejected; it may be true
There is insufficient evidence of a difference between X and Y, which is not proof that there is no difference
The result is not statistically significant

44
P-value
Power of study: the probability of rejecting the null hypothesis when it is false
Increased by increasing the sample size
Increased if the true difference between treatments is large
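Both effects on power can be seen in a small calculation. The sketch below is not from the presentation: it assumes a two-sided, two-sample z-test with a known common standard deviation and equal group sizes, and the function name `approx_power` is hypothetical:

```python
from statistics import NormalDist

def approx_power(true_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (Normal approximation).

    Assumes a known common SD and equal group sizes; a sketch, not a
    replacement for a proper sample-size calculation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference in means
    shift = true_difference / se         # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (16, 64, 128):
    print(f"difference = 0.5 SD, n per group = {n}: power = {approx_power(0.5, 1.0, n):.2f}")
print(f"difference = 1.0 SD, n per group = 16: power = {approx_power(1.0, 1.0, 16):.2f}")
```

With no true difference the function returns alpha itself, which is the chance of a false positive rather than power in any useful sense.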

45
P-value Statistical significance does not imply clinical significance

46
A statistician is a person whose lifetime ambition is to be wrong 5% of the time

47
Types of significance tests
Chi-square test: “28 out of 70 smokers have a cough compared with 5 out of 50 non-smokers - is there a significant difference?”
[28/70 = 40% compared with 5/50 = 10%]

48
Chi-square test result
“P=0.001”
There is a significant relationship between smoking and cough
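The smoking and cough example can be worked through by hand in Python. This sketch computes the chi-square statistic for the 2x2 table with Yates' continuity correction; whether the original analysis used that correction is not stated, so it is an assumption here, and the function name `chi_square_2x2` is introduced for illustration:

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # continuity correction
        chi2 += diff ** 2 / e
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Smokers: 28 with cough, 42 without. Non-smokers: 5 with cough, 45 without.
chi2, p = chi_square_2x2(28, 42, 5, 45)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

The resulting P-value rounds to 0.001 at three decimal places, in line with the figure quoted on the slide.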

49
Types of significance tests
Two-sample t-test: “Is there a difference in the 24 hour energy expenditure between groups of lean and obese women?”

50
Types of significance tests
Mann-Whitney U-test: “Is there a difference in the nausea score between chemo patients receiving an active anti-emetic treatment and those receiving placebo?”

51
Types of significance tests
Paired t-test: “Is there a difference in the dietary intake of a group of students in the week before and after Finals?”

52
Types of significance tests
Wilcoxon matched pairs signed rank test or the Sign test: “Is there a difference in the units of alcohol consumed by students in the week before and after Finals?”
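Of the two tests named here, the Sign test is simple enough to sketch with the standard library alone. The before and after values below are invented for illustration, and the function name `sign_test` is hypothetical:

```python
from math import comb

def sign_test(before, after):
    """Two-sided exact Sign test for paired data; tied pairs are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    n_increase = sum(d > 0 for d in diffs)
    n_decrease = n - n_increase
    k = min(n_increase, n_decrease)
    # Under the null hypothesis each non-tied pair is equally likely to go up or down.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_increase, n_decrease, min(1.0, 2 * tail)

# Hypothetical weekly units of alcohol for 10 students (invented data).
before = [12, 20, 8, 15, 30, 10, 18, 25, 14, 9]
after = [10, 14, 8, 11, 22, 12, 13, 20, 10, 7]

n_up, n_down, p = sign_test(before, after)
print(f"increases = {n_up}, decreases = {n_down}, P = {p:.3f}")
```

The Sign test uses only the direction of each change; the Wilcoxon matched pairs signed rank test also uses the size of the changes and is usually more powerful.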

53
Significance test example

54
Correlation
Measures the strength of the relationship between two variables

55
Scattergram

56
Correlation
Pearson correlation:
Used for Normally distributed data
Measures linear relation between variables

57
Correlation
r = 0: no linear relationship
r = 1: perfect positive relationship
r = -1: perfect negative relationship

58
Scattergram

59
Correlation
Spearman correlation:
Used for non-Normally distributed data
Measures monotonic relationship between variables
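The difference between a linear and a monotonic relationship shows up clearly on a curved but strictly increasing data set. The sketch below implements both coefficients from their definitions; the data and the helper names `pearson`, `ranks` and `spearman` are invented for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but along a curve (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson r = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

Spearman's coefficient is exactly 1 here because the ordering of y matches the ordering of x, while Pearson's r falls slightly short of 1 because the points do not lie on a straight line.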

60
Correlation Example

61
Correlation

62
“The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases” [Comment of a judge on the subject of government statistics, 1920]

