Published by Dorcas Collins. Modified over 9 years ago.
1
Introduction to Statistics
Dr Linda Morgan
Clinical Chemistry Division, School of Clinical Laboratory Sciences
2
Outline
  –Types of data
  –Descriptive statistics
  –Estimates and confidence intervals
  –Hypothesis testing
  –Comparing groups
  –Relation between variables
  –Statistical aspects of study design
  –Pitfalls
3
Types of data
Categorical data
  –Ordered categorical data
Numerical data
  –Discrete
  –Continuous
4
Descriptive statistics
Categorical variables
  –Graphical representation: bar diagram
  –Numbers and proportions in each category
5
Descriptive statistics
Continuous variables
Distributions
  –Gaussian
  –Lognormal
  –Non-parametric
Central tendency
  –Mean
  –Median
Scatter
  –Standard deviation
  –Range
  –Interquartile range
6
Gaussian (normal) distribution
7
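Gaussian (normal) distribution
Central tendency
  –Mean: $\text{mean} = \frac{\sum x}{n}$
Scatter
  –Variance: $\text{variance} = \frac{\sum (x - \text{mean})^2}{n - 1}$
  –Standard deviation: $\text{SD} = \sqrt{\text{variance}}$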
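The mean, variance and standard deviation formulas on this slide can be sketched in a few lines of Python. The function names and the sample values are invented for illustration; they are not part of the original presentation.

```python
import math


def mean(values):
    """Sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical measurements, invented for illustration only.
sample = [4.0, 5.0, 6.0, 7.0, 8.0]

print("mean     =", mean(sample))
print("variance =", sample_variance(sample))
print("SD       =", round(standard_deviation(sample), 3))
```

Dividing by n – 1 rather than n is what makes this the sample variance, as written on the slide.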
8
Lognormal distribution
10
Mean of the log values: $\text{mean} = \frac{\sum \log x}{n}$
Geometric mean = antilog of mean: $10^{\text{mean}}$
Median
  –Rank data in order
  –Median = $(n+1)/2$ th observation
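As a rough sketch of the two summaries on this slide, the geometric mean can be computed by averaging base-10 logs and taking the antilog, and the median by ranking the data. The function names and data are hypothetical. For an even number of observations the (n+1)/2 th position falls halfway between two ranked values, so the sketch averages the two middle values.

```python
import math


def geometric_mean(values):
    """Antilog (base 10) of the mean of the log10 values."""
    log_mean = sum(math.log10(x) for x in values) / len(values)
    return 10 ** log_mean


def median(values):
    """The (n + 1) / 2 th observation of the ranked data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Even n: position (n + 1) / 2 lies between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical, strongly skewed data invented for illustration only.
skewed = [1.0, 10.0, 100.0]

print("geometric mean =", geometric_mean(skewed))
print("median         =", median(skewed))
print("median (n = 4) =", median([1.0, 2.0, 3.0, 4.0]))
```

Any log base gives the same geometric mean provided the antilog uses the same base; base 10 is used here only because the slide does.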
11
Variability
  –Variance: $\text{variance} = \frac{\sum (x - \text{mean})^2}{n - 1}$
  –Standard deviation: $\text{SD} = \sqrt{\text{variance}}$
  –Range
  –Interquartile range
12
Variability of Sample Mean
  –The sample mean is an estimate of the population mean
  –The standard error of the mean describes the distribution of the sample mean
  –Estimated SEM: $\text{SEM} = \text{SD}/\sqrt{n}$
  –The distribution of the sample mean is Normal providing n is large
13
Standard error of the difference between two means
  –$\text{SEM} = \text{SD}/\sqrt{n}$
  –Variance of the mean = $\text{SD}^2/n$
  –Variance of the difference between two sample means = sum of the variances of the two means = $(\text{SD}^2/n)_1 + (\text{SD}^2/n)_2$
  –SE of difference between means = $\sqrt{(\text{SD}^2/n)_1 + (\text{SD}^2/n)_2}$
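The step from the SEM of each group to the standard error of the difference can be illustrated as follows. This is a minimal sketch; the function names and the summary statistics are hypothetical.

```python
import math


def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_difference(sd1, n1, sd2, n2):
    """Square root of the summed variances of the two sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical summary statistics for two independent groups.
sd1, n1 = 3.0, 36
sd2, n2 = 4.0, 64

print("SEM group 1      =", sem(sd1, n1))
print("SEM group 2      =", sem(sd2, n2))
print("SE of difference =", round(se_difference(sd1, n1, sd2, n2), 4))
```

The variances of the two means are added, not the standard errors themselves, which is why the result is smaller than the sum of the two SEMs.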
14
Variability of a sample proportion
  –Assume Normal distribution when $np$ and $n(1-p)$ are > 5
  –SE of a Binomial proportion = $\sqrt{pq/n}$, where $q = 1 - p$
15
Standard error of the difference between two proportions
$\text{SE}(p_1 - p_2) = \sqrt{\text{variance}(p_1) + \text{variance}(p_2)} = \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$
16
Confidence intervals of means
  –95% CI for the mean = $\text{sample mean} \pm 1.96 \times \text{SEM}$
  –95% CI for difference between 2 means = $(\text{mean}_1 - \text{mean}_2) \pm 1.96 \times \text{SE of difference}$
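A short sketch of both intervals, assuming samples large enough for the Normal multiplier 1.96 to apply. The function names and the numbers passed to them are hypothetical.

```python
import math

Z_95 = 1.96  # Normal multiplier for a 95% interval, as on the slide


def ci_mean(sample_mean, sd, n):
    """95% CI for a mean: sample mean +/- 1.96 * SEM."""
    sem = sd / math.sqrt(n)
    return (sample_mean - Z_95 * sem, sample_mean + Z_95 * sem)


def ci_difference(mean1, sd1, n1, mean2, sd2, n2):
    """95% CI for the difference between two independent means."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return (diff - Z_95 * se_diff, diff + Z_95 * se_diff)


# Hypothetical summary statistics, invented for illustration only.
low, high = ci_mean(5.0, 1.0, 100)
print(f"95% CI for mean:       {low:.3f} to {high:.3f}")

low_d, high_d = ci_difference(5.0, 1.0, 100, 4.5, 1.0, 100)
print(f"95% CI for difference: {low_d:.3f} to {high_d:.3f}")
```

With small samples a multiplier taken from the t distribution is normally used in place of 1.96.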
17
Confidence intervals of proportions
  –95% CI for proportion = $p \pm 1.96 \sqrt{pq/n}$
  –95% CI for difference between two proportions = $(p_1 - p_2) \pm 1.96 \times \text{SE}(p_1 - p_2)$
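The same calculation for proportions might look like this. It is a sketch using the Normal approximation, with a check on the np and n(1-p) > 5 rule of thumb; the function names and the counts are hypothetical.

```python
import math

Z_95 = 1.96  # Normal multiplier for a 95% interval


def se_proportion(p, n):
    """SE of a Binomial proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def ci_proportion(successes, n):
    """95% CI for a single proportion using the Normal approximation."""
    p = successes / n
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("np and n(1-p) must both exceed 5")
    half_width = Z_95 * se_proportion(p, n)
    return (p - half_width, p + half_width)


def ci_difference(successes1, n1, successes2, n2):
    """95% CI for the difference between two independent proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    se_diff = math.sqrt(se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2)
    diff = p1 - p2
    return (diff - Z_95 * se_diff, diff + Z_95 * se_diff)


# Hypothetical counts: 40 of 100 in one group, 25 of 100 in another.
low, high = ci_proportion(40, 100)
print(f"95% CI for p:       {low:.3f} to {high:.3f}")

low_d, high_d = ci_difference(40, 100, 25, 100)
print(f"95% CI for p1 - p2: {low_d:.3f} to {high_d:.3f}")
```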
18
Hypothesis testing The null hypothesis The alternative hypothesis What is a P value?
19
Comparing 2 groups of continuous data Normal distribution: paired or unpaired t test Non-Normal distribution: transform data OR Mann-Whitney-Wilcoxon test
20
Paired t test We wish to compare the fasting blood cholesterol levels in 10 subjects before and after treatment with a new drug. What is the null hypothesis?
21
Paired t test
Fasting cholesterol; D = predrug – postdrug

Subject number   Predrug   Postdrug   D
01               6.7       4.4         2.3
02               7.8       7.0         0.8
03               8.1       6.0         2.1
04               5.5       5.8        -0.3
05               8.6       9.0        -0.4
06               6.7       6.1         0.6
07               7.1       7.3        -0.2
08               9.9       9.9         0
09               8.2       6.3         1.9
10               6.5       7.1        -0.6
22
Paired t test
  –Calculate the mean and SEM of D
  –The null hypothesis is that the mean of D = 0
  –The test statistic: $t = \frac{\text{mean}(D) - 0}{\text{SEM}(D)}$
23
Paired t test
  –Mean = 0.62
  –SEM = 0.351
  –t = 1.766
  –Degrees of freedom = n – 1 = 9
  –From tables of t, 2-tailed probability (P) is between 0.1 and 0.2
How would you interpret this?
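The worked example can be checked with a short script that uses the ten pairs of cholesterol values from the table. Variable names are illustrative. The two critical values are the usual t-table entries for 9 degrees of freedom at two-tailed P = 0.2 and P = 0.1. Computed without intermediate rounding, t comes out at about 1.76; the slide's 1.766 is what you get by dividing 0.62 by the rounded SEM of 0.351.

```python
import math

# Fasting cholesterol values from the paired t test table.
pre = [6.7, 7.8, 8.1, 5.5, 8.6, 6.7, 7.1, 9.9, 8.2, 6.5]
post = [4.4, 7.0, 6.0, 5.8, 9.0, 6.1, 7.3, 9.9, 6.3, 7.1]

# D = predrug - postdrug for each subject.
diffs = [before - after for before, after in zip(pre, post)]
n = len(diffs)

mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
sem_d = sd_d / math.sqrt(n)

# Test statistic under the null hypothesis that the mean of D is 0.
t = (mean_d - 0) / sem_d
df = n - 1

# Two-tailed critical values of t for 9 df, from standard t tables.
T_CRIT_P_0_2 = 1.383
T_CRIT_P_0_1 = 1.833

print(f"mean = {mean_d:.2f}, SEM = {sem_d:.3f}, t = {t:.3f}, df = {df}")
print("0.1 < P < 0.2:", T_CRIT_P_0_2 < t < T_CRIT_P_0_1)
```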
24
Comparing 2 groups of categorical data In a study of the effect of smoking on the risk of developing ischaemic heart disease, 250 men with IHD and 250 age-matched healthy controls were asked about their current smoking habits. What is the null hypothesis?
25
Results
  –70 of the 250 patients were smokers
  –30 of the healthy controls were smokers

          Smoker   Non-smoker   Total
IHD       70       180          250
Control   30       220          250
Total     100      400          500
26
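Calculate expected values, E, for each cell
E = row total × column total / grand total; observed value first, expected value in brackets

          Smoker    Non-smoker   Total
IHD       70 (50)   180 (200)    250
Control   30 (50)   220 (200)    250
Total     100       400          500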
27
Calculate (observed – expected) value, D

          Smoker           Non-smoker
IHD       70 – 50 = 20     180 – 200 = -20
Control   30 – 50 = -20    220 – 200 = 20
28
Calculate $D^2/E$

          Smoker       Non-smoker
IHD       400/50 = 8   400/200 = 2
Control   400/50 = 8   400/200 = 2
29
Calculate the sum of $D^2/E$
  –8 + 8 + 2 + 2 = 20
  –This is the test statistic, chi squared
  –Compare with tables of chi squared with $(r-1)(c-1)$ degrees of freedom
  –In this case, chi squared with 1 df has a P value of < 0.001
How do you interpret this?
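The four steps of the chi squared calculation (expected values, differences, D²/E, and their sum) can be reproduced from the observed counts. This is a minimal sketch with illustrative variable names. Like the slides, it applies no continuity correction. The critical value of 10.83 is the standard table entry for 1 degree of freedom at P = 0.001.

```python
# Observed counts: rows are IHD and Control, columns are Smoker and Non-smoker.
observed = [
    [70, 180],
    [30, 220],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under the null hypothesis of no association.
        expected = row_totals[i] * col_totals[j] / grand_total
        d = obs - expected
        chi_squared += d ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value of chi squared for 1 df at P = 0.001, from standard tables.
CRIT_P_0_001 = 10.83

print(f"chi squared = {chi_squared:.1f}, df = {df}")
print("P < 0.001:", chi_squared > CRIT_P_0_001)
```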
30
Statistical analysis using computer software SPSS as an example
31
Planning Experimental design Suitable controls Database design
32
Statistical power
The power of a study to detect an effect depends on:
  –The size of the effect
  –The sample size
The probability of failing to detect an effect where one exists is called $\beta$
The power of a study is $100(1 - \beta)\%$
Wide confidence intervals indicate low statistical power
33
Statistical power The necessary sample size to detect the effect of interest should be calculated in advance Pilot data are usually required for these calculations
34
Statistical power - example
30% of the population are carriers of a genetic variant. You wish to test whether this variant increases the risk of Alzheimer's disease.
For P < 0.05, and 80% power, number of controls and cases required:

Control carriers   Case carriers   Sample size
30%                50%             100
30%                40%             350
30%                35%             1400
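The slide does not say how these sample sizes were calculated. As an assumption, the sketch below uses one common Normal-approximation formula for comparing two proportions, n per group = (z for α/2 + z for β)² × (p₁q₁ + p₂q₂) / (p₁ – p₂)², and reads "sample size" as the number needed in each group. The function name is hypothetical. Under those assumptions the results (91, 354 and 1374) are close to the slide's rounded figures.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_case, alpha=0.05, power=0.80):
    """Approximate n per group to detect p_control vs p_case (two-sided test).

    Assumed formula, not taken from the slides:
    n = (z_alpha + z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance_sum = p_control * (1 - p_control) + p_case * (1 - p_case)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_case) ** 2
    return math.ceil(n)


# Carrier frequencies from the slide: 30% in controls versus 50%, 40%, 35% in cases.
for p_case in (0.50, 0.40, 0.35):
    print(f"30% vs {p_case:.0%}: n per group = {sample_size_per_group(0.30, p_case)}")
```

Halving the difference to be detected roughly quadruples the required sample size, which is the pattern the table shows.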
35
Multiple testing

Number of tests   Probability of false positive
1                 0.05
2                 0.10
3                 0.14
4                 0.19
5                 0.23
10                0.40
20                0.64

Bonferroni correction: Divide 0.05 by the number of tests to provide the required P value for hypothesis testing at the conventional level of statistical significance
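The probabilities in the table match 1 – (1 – 0.05)ⁿ, the chance of at least one false positive among n tests, if the tests are assumed to be independent. A short sketch (function names are illustrative) reproduces the table and shows the Bonferroni threshold for each number of tests.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive in n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """P value required per test after Bonferroni correction."""
    return alpha / n_tests


test_counts = [1, 2, 3, 4, 5, 10, 20]
error_rates = [round(familywise_error(n), 2) for n in test_counts]

for n, rate in zip(test_counts, error_rates):
    print(f"{n:>2} tests: P(false positive) = {rate:.2f}, "
          f"Bonferroni threshold = {bonferroni_threshold(n):.4f}")
```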
36
Data trawling Decide in advance which statistical tests are to be performed Post hoc testing of subgroups should be viewed with caution Multiple correlations should be avoided
37
HELP!
  –"In house" support
  –Cripps Computing Centre
  –Trent Institute for Health Service Research
  –Practical Statistics for Medical Research, Douglas G Altman