Basic Statistics. “I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who.


1 Basic Statistics

2 “I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.” [Mrs Robert A Taft]

3 “Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay” [Sherlock Holmes]

4 Qualitative a) Nominal data (dead/alive, blood group O,A,B,AB) b) Ordered categorical/ranked data (mild/moderate/severe)

5 Quantitative a) Numerical discrete (no. of deaths in a hospital per year) b) Numerical continuous (age, weight, blood pressure)

6 Presenting data: graphs, summary statistics, tables

7 Graphical methods: pie chart, bar chart, histogram, scattergram

8 Pie chart

9 Bar chart

10 Histogram

11 Boxplot

12 Error bar plot

13 Scattergram

14 Graph Example

15 Graph

16 Solution

17 Summary statistics Qualitative data: percentages, numbers

18 Secondary prevention of coronary heart disease
                        Respondents (n=1343)   Non-respondents (n=578)
Male                    58% (782)              54% (314)
Urban Practice          54% (720)              57% (331)
Practice size:
  < 5,000               14% (190)              18% (105)
  5,000 – 10,000        39% (523)              41% (238)
  > 10,000              47% (630)              41% (235)

19 Summarizing data example I

20 Summary Statistics Quantitative data. Non-Normal: median, range, inter-quartile range. Normal: mean, standard deviation, variance
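As a minimal sketch of the summaries listed on slide 20, the snippet below computes both sets with Python's standard `statistics` module. The `stays` values are hypothetical illustration data, not figures from the presentation, and the quartiles use the module's default method, which may differ slightly from other packages.

```python
import statistics

# Hypothetical length-of-stay values in days (not from the slides).
stays = [4, 5, 6, 7, 8, 8, 9, 12, 15, 36]

# Summaries suggested for non-Normal data.
median = statistics.median(stays)
value_range = (min(stays), max(stays))
q1, _, q3 = statistics.quantiles(stays, n=4)  # quartile cut points
iqr = q3 - q1                                 # inter-quartile range

# Summaries suggested for Normal data.
mean = statistics.mean(stays)
sd = statistics.stdev(stays)           # sample standard deviation (n - 1 divisor)
variance = statistics.variance(stays)  # sample variance (n - 1 divisor)
```

With skewed data like this, the mean is pulled above the median by the single long stay, which is why the median and inter-quartile range are the safer summary.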

21 Boxplot

22 Summary Statistics Normal data Approximately 95% of observations lie between the mean plus or minus 2 standard deviations
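The "approximately 95%" figure can be checked against the standard Normal distribution. A short sketch using `statistics.NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of a Normal distribution within 2 standard deviations of the mean.
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Proportion within 1.96 standard deviations, the multiplier used for 95% intervals.
within_196sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
```

The first value is a little above 95% and the second is almost exactly 95%, so "plus or minus 2 standard deviations" is a convenient rounding of 1.96.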

23 Histogram

24 Histogram of IgM values

25 How to test for Normality: mean approximately equal to median; (mean - 2sd, mean + 2sd) is a reasonable range; -1 < skewness < 1; -1 < kurtosis < 1; histogram shows symmetric bell shape
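The slides do not say how skewness and kurtosis are calculated. As a sketch, assuming the simple moment-based definitions (with kurtosis reported as excess kurtosis, so a Normal distribution scores 0), they could be computed as follows. The function names are hypothetical, and statistical packages often apply small-sample adjustments that give slightly different values.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (0 for Normal data)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical right-skewed data (not from the slides).
stays = [4, 5, 6, 7, 8, 8, 9, 12, 15, 36]
stays_skewness = skewness(stays)
```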

26 Checking for Normality
            Age     Length of stay   Satisfaction score
Mean        66.2    12.1             5.2
Median      67      8                9
SD          8.2     9.0              4.3
Minimum     49      4                1
Maximum     80      36               10
Skewness    -0.2    1.8              -2.5
Kurtosis    0.5     1.3              4.6
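As an illustration, the skewness and kurtosis rules from slide 25 can be applied to the summary values on slide 26. The helper names below are hypothetical, and only the two numeric thresholds are automated; whether the mean plus or minus 2 sd range is "reasonable" remains a judgment call.

```python
# Summary values copied from slide 26.
summaries = {
    "Age": {"mean": 66.2, "sd": 8.2, "skewness": -0.2, "kurtosis": 0.5},
    "Length of stay": {"mean": 12.1, "sd": 9.0, "skewness": 1.8, "kurtosis": 1.3},
    "Satisfaction score": {"mean": 5.2, "sd": 4.3, "skewness": -2.5, "kurtosis": 4.6},
}

def shape_looks_normal(summary):
    """Apply the -1 < skewness < 1 and -1 < kurtosis < 1 rules."""
    return -1 < summary["skewness"] < 1 and -1 < summary["kurtosis"] < 1

def two_sd_range(summary):
    """Return (mean - 2sd, mean + 2sd) for inspection."""
    return summary["mean"] - 2 * summary["sd"], summary["mean"] + 2 * summary["sd"]

results = {name: shape_looks_normal(s) for name, s in summaries.items()}
```

Only age passes both thresholds. For length of stay, mean - 2sd is negative, which is impossible for a duration, so the range check points the same way.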

27 Secondary prevention of coronary heart disease
Mean (sd)               Respondents (n=1343)   Non-respondents (n=578)
Age (years)             66.2 (8.2)             66.6 (8.7)
Time since MI (mths) *  10 (6, 35)             15 (8, 47)
Cholesterol (mmol/l)    6.5 (1.2)              6.6 (1.2)
[* Median (range)]

28 Summary statistics example II

29 Natural log transformation: can transform positively skewed data to ‘Normal’ data; use transformed data in analysis; resulting mean value transformed back (using e^x) to give geometric mean; present geometric mean and range

30 Effect of loge transformation
            Length of stay   Loge length of stay
Mean        12.1             2.2
Median      8                2.1
SD          9.0              0.5
Minimum     4                1.4
Maximum     36               3.6
Skewness    1.8              0.4
Kurtosis    1.3              0.7
[Geometric mean = e^2.2 = 9.0]
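The back-transformation on slide 30 can be sketched as below. The `stays` list is hypothetical illustration data; only the value 2.2 (the mean of the logged lengths of stay) comes from the slide.

```python
import math
import statistics

# Hypothetical positively skewed data (not from the slides).
stays = [4, 5, 6, 7, 8, 8, 9, 12, 15, 36]

# Analyse on the natural-log scale, then transform the mean back with e^x.
log_stays = [math.log(x) for x in stays]
mean_log = statistics.mean(log_stays)
geometric_mean = math.exp(mean_log)

# Check of the slide's own figure: e^2.2 is about 9.0.
slide_geometric_mean = math.exp(2.2)
```

The geometric mean is always at or below the arithmetic mean, so for right-skewed data it sits closer to the bulk of the observations.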

31 Secondary prevention of coronary heart disease
Mean (sd)               Respondents (n=1343)   Non-respondents (n=578)
Age (years)             66.2 (8.2)             66.6 (8.7)
Time since MI (mths) *  10 (6, 35)             15 (8, 47)
Cholesterol (mmol/l)    6.5 (1.2)              6.6 (1.2)
Length of stay #        9.0 (4, 36)            11.2 (6, 83)
[* Median (range), # Geometric mean (range)]

32 Confidence Interval “The estimated mean difference in systolic blood pressure between 100 diabetic and 100 non-diabetic men was 6.0 mmHg with 95% confidence interval (1.1 mmHg, 10.9 mmHg)”

33 Confidence Interval: contains information about the (im)precision of the estimated effect size; presents a range of values, on the basis of the sample data, in which the population value for such an effect size may lie

34 Confidence Interval: 95% CI for mean = mean +/- 1.96 SEM; 90% CI for mean = mean +/- 1.64 SEM; SEM = sd / sqrt(n)
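A minimal sketch of these formulas, using the respondents' age summary from slide 27 (mean 66.2, sd 8.2, n = 1343) as input. The function name is hypothetical, and the Normal multiplier assumes a reasonably large sample.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SEM."""
    sem = sd / math.sqrt(n)                    # SEM = sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.64 for 90%
    return mean - z * sem, mean + z * sem

# Respondents' age from slide 27.
age_low, age_high = mean_confidence_interval(66.2, 8.2, 1343)

# Working backwards from slide 32: a 95% CI of (1.1, 10.9) around 6.0 has
# half-width 4.9, which implies a standard error of about 4.9 / 1.96.
implied_se = (10.9 - 1.1) / 2 / NormalDist().inv_cdf(0.975)
```

The age interval is narrow (roughly 65.8 to 66.6 years) because the sample is large, while the blood pressure example implies a standard error of about 2.5 mmHg.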

35 Confidence Interval: the 95% CI is a range of values which we are 95% confident covers the true population mean; in repeated sampling, about 5% of such intervals would fail to cover the ‘true’ mean

36 Error bar plot

37 Confidence Interval Example

38 Significance/hypothesis tests: measure strength of evidence provided by the data for or against some proposition of interest, e.g. is the survival rate after X better than after Y?

39 Significance/hypothesis tests Null hypothesis: “Effects of X and Y are the same” Alternative hypothesis: “Effects of X and Y are different”

40 Significance/hypothesis tests One-sided: “X is better than Y” Two-sided: “X and Y have different effects”

41 P-value P is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true

42 P-value P <= 0.05: reject the null hypothesis; the data provide evidence of a difference between X and Y; result is statistically significant

43 P-value P > 0.05: null hypothesis may be true; there is insufficient evidence of a difference between X and Y (which does not show that there is none); result is not statistically significant

44 P-value Power of study: probability of rejecting null hypothesis when false; increased by increasing sample size; increased if true difference between treatments is large
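Both statements about power can be illustrated with a rough calculation. The sketch below assumes a two-sided comparison of two group means with a known common standard deviation and a Normal approximation; the function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    delta: true difference between group means (assumed)
    sd: common standard deviation in each group (assumed known)
    """
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    shift = abs(delta) / se
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power_small_n = approx_power(delta=5, sd=10, n_per_group=50)
power_large_n = approx_power(delta=5, sd=10, n_per_group=100)
power_large_delta = approx_power(delta=10, sd=10, n_per_group=50)
```

Doubling the sample size or doubling the true difference both raise the power, and with no true difference the rejection rate falls back to the significance level.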

45 P-value Statistical significance does not imply clinical significance

46 A statistician is a person whose lifetime ambition is to be wrong 5% of the time

47 Types of significance tests Chi-square test: “28 out of 70 smokers have a cough compared with 5 out of 50 non-smokers - is there a significant difference?” [28/70 = 40% compared with 5/50=10%]

48 Chi-square test result “P=0.001” There is a significant relationship between smoking and cough
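The smoking and cough example can be reproduced approximately with the standard library. This sketch assumes the uncorrected Pearson chi-square statistic for a 2x2 table; the slides do not say which variant or software produced "P=0.001", so the exact P-value may differ. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P-value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Smokers: 28 with cough, 42 without. Non-smokers: 5 with cough, 45 without.
chi2, p_value = chi_square_2x2(28, 42, 5, 45)
```

The statistic comes out at about 13.2 and the P-value is below 0.001, which agrees with the slide's conclusion of a significant relationship.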

49 Types of significance tests Two-sample t-test: “Is there a difference in the 24 hour energy expenditure between groups of lean and obese women?”
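As a sketch of the two-sample t-test, the snippet below computes the test statistic and degrees of freedom, assuming equal variances in the two groups (the pooled form). The energy expenditure values are hypothetical, and the P-value would still need a t-distribution table or a statistics package.

```python
import math
import statistics

def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference in means
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2

# Hypothetical 24 hour energy expenditure (MJ/day), not from the slides.
lean = [8.1, 7.5, 8.4, 7.9, 8.8]
obese = [9.6, 10.2, 9.1, 10.8, 9.9]
t_stat, df = pooled_t_statistic(lean, obese)
```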

50 Types of significance tests Mann-Whitney U-test: “Is there a difference in the nausea score between chemo patients receiving an active anti-emetic treatment and those receiving placebo?”
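The Mann-Whitney U statistic can be sketched by counting, over all pairs, how often a value in one group exceeds a value in the other. The P-value below uses a large-sample Normal approximation without a tie or continuity correction, which is an assumption; the function name and the nausea scores are hypothetical.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U for x, U for y, approximate two-sided P-value)."""
    n1, n2 = len(x), len(y)
    # Each pair scores 1 if the x value is larger, 0.5 if tied, 0 otherwise.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    # Normal approximation to the distribution of U under the null hypothesis.
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) - mu) / sigma
    return u_x, u_y, min(1.0, 2 * NormalDist().cdf(z))

# Hypothetical nausea scores (higher is worse), not from the slides.
active = [1, 2, 2, 3, 1, 4]
placebo = [3, 5, 4, 6, 5, 4]
u_active, u_placebo, p_approx = mann_whitney_u(active, placebo)
```

Because the test uses only the ordering of the scores, it suits ranked data such as nausea scores where a t-test's Normality assumption is doubtful.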

51 Types of significance tests Paired t-test: “Is there a difference in the dietary intake of a group of students in the week before and after Finals?”
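For paired data the test works on the within-person differences. A minimal sketch of the paired t statistic follows; the function name and the intake values are hypothetical, and the P-value would come from a t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t statistic on after - before; returns (t, degrees of freedom)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / sem, n - 1

# Hypothetical daily energy intake (MJ) for the same students, not from the slides.
week_before = [9.1, 8.4, 10.2, 7.9, 9.6]
week_after = [9.8, 8.9, 10.1, 8.8, 10.4]
t_stat, df = paired_t_statistic(week_before, week_after)
```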

52 Types of significance tests Wilcoxon matched pairs signed rank test or the Sign test: “Is there a difference in the units of alcohol consumed by students in the week before and after finals?”
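Of the two tests named here, the sign test is the simpler to sketch: it counts how many students increased versus decreased and compares that split with a fair coin. The snippet below computes an exact two-sided P-value from the binomial distribution, dropping pairs with no change (a common convention, assumed here). The function name and data are hypothetical.

```python
import math

def sign_test_p_value(before, after):
    """Exact two-sided sign test P-value for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # ignore ties
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical units of alcohol per week, not from the slides.
units_before = [10, 4, 12, 0, 8, 15, 6, 9]
units_after = [14, 9, 12, 3, 15, 22, 5, 16]
p_value = sign_test_p_value(units_before, units_after)
```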

53 Significance test example

54 Correlation Measures the strength of the relationship between two variables

55 Scattergram

56 Correlation Pearson correlation: Used for Normally distributed data Measures linear relation between variables

57 Correlation r = 0: no linear relationship; r = 1: perfect positive relationship; r = -1: perfect negative relationship
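These three reference values can be checked with a small implementation of the Pearson coefficient. The function name and the data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])        # exact increasing line
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])        # exact decreasing line
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared
```

The last case gives r = 0 even though y is completely determined by x, a reminder that r measures only the linear part of a relationship and that a scattergram should be inspected as well.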

58 Scattergram

59 Correlation Spearman correlation: Used for non-Normally distributed data Measures monotonic relationship between variables
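A sketch of the Spearman coefficient follows, computed as the Pearson correlation of the ranks, with tied values given their average rank. The function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

Here the Spearman coefficient is 1 because y always increases with x, while the Pearson coefficient is below 1 because the relationship is not a straight line.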

60 Correlation Example

61 Correlation

62 “The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases” [Comment of a judge on the subject of government statistics, 1920]

