Medical Statistics as a science

Medical Statistics: extrapolating from the data collected to draw general conclusions about the larger population from which the sample was derived.
To do this we must assume that all data are randomly sampled from an infinitely large population, then analyse the sample and use the results to make inferences about the population.
This allows general conclusions to be made from limited amounts of data.

Statistical Analysis in a Simple Experiment
1) Define the population of interest
2) Randomly select a sample of subjects to study (apply exclusion criteria, but define a precise patient population)
3) Half the subjects receive one treatment and the other half another treatment (usually placebo)
4) Measure baseline variables in each group (e.g. age, APACHE II, to ensure randomisation was successful)
5) Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment

Outline
Power
Basic sample size information
Examples (see text for more)
Changes to the basic formula
Multiple comparisons
Poor proposal sample size statements
Conclusion and resources

Types of descriptive statistics:
Graphs
Measures of central tendency
Measures of variability

Data
Categorical data: values belong to categories
    Nominal data: there is no natural order to the categories, e.g. blood groups
    Ordinal data: there is a natural order, e.g. Adverse Events (Mild/Moderate/Severe/Life Threatening)
    Binary data: there are only two possible categories, e.g. alive/dead
Numerical data: the value is a number (either measured or counted)
    Continuous data: measurement is on a continuum, e.g. height, age, haemoglobin
    Discrete data: a “count” of events, e.g. number of pregnancies

Descriptive Statistics: concerned with summarising or describing a sample, e.g. mean, median
Inferential Statistics: concerned with generalising from a sample to make estimates and inferences about a wider population, e.g. T-Test, Chi Square test

Why do we need to study statistics?
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment

Statistical Terms
Mean: the average of the data; sensitive to outlying data
Median: the middle of the data; not sensitive to outlying data
Mode: most commonly occurring value
Range: the spread of the data (maximum minus minimum)
IQ range: the spread of the middle half of the data; commonly used for skewed data
Standard deviation: a single number which measures how much the observations vary around the mean
Symmetrical data: data that follow a normal distribution (mean = median = mode); report mean, standard deviation and n
Skewed data: not normally distributed (mean ≠ median ≠ mode); report median and IQ range
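The terms above can be tried out on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the lengths of stay are hypothetical values chosen to include one outlier, not data from these slides.

```python
import statistics

# Hypothetical lengths of stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(stays)        # pulled upwards by the outlier
median = statistics.median(stays)    # middle value, unaffected by the outlier
mode = statistics.mode(stays)        # most commonly occurring value
data_range = max(stays) - min(stays)

# Quartile cut points (default "exclusive" method); IQ range = Q3 - Q1.
q1, _, q3 = statistics.quantiles(stays, n=4)
iqr = q3 - q1

sd = statistics.stdev(stays)         # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} IQ range={iqr} sd={sd:.2f}")
```

Because the data are skewed, the mean sits well above the median, which is why the median and IQ range are the more informative summary here.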

Standard Normal Distribution
Mean +/- 1 SD encompasses about 68% of observations
Mean +/- 2 SD encompasses about 95% of observations
Mean +/- 3 SD encompasses about 99.7% of observations
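These percentages can be checked from the normal distribution itself. A minimal sketch, assuming only the standard identity that the fraction within k SDs of the mean equals erf(k/√2); the function name `coverage_within` is illustrative.

```python
import math

def coverage_within(k):
    """Fraction of a normal population lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {coverage_within(k):.4f}")
```

The 2 SD figure comes out slightly above 95%; the exact 95% limits are at about 1.96 SD.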

1. Experimental Design Convenience Sampling
Use results that are easy to get

1. Experimental Design Stratified Sampling
Draw a sample from each stratum
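As a rough illustration of drawing a sample from each stratum, the sketch below groups subjects by a stratum label and samples the same number from every group. The subject records, the `stratified_sample` helper and the choice of sex as the stratum are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` subjects at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    sample = []
    for members in strata.values():
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical subjects: 10 women and 10 men.
subjects = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(20)]
chosen = stratified_sample(subjects, lambda s: s["sex"], per_stratum=3)
print(chosen)
```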

Basic concepts
Homogeneity: all individuals have similar values or belong to the same category. Example: all individuals are Chinese women of middle age (30~40 years old) who work in a textile mill: homogeneity in nationality, gender, age and occupation.
Variation: the differences between individuals, e.g. in height, weight…

Steps in Statistical Testing
1) Null hypothesis Ho: there is no difference between the groups
2) Alternative hypothesis H1: there is a difference between the groups
3) Collect data
4) Perform test statistic, e.g. T test, Chi square
5) Interpret P value and confidence intervals: P value below the significance level (e.g. 0.05), reject Ho; P value above it, do not reject Ho
6) Draw conclusions

Population and sample
Population: the whole collection of individuals that one intends to study
Sample: a representative part of the population
Randomization: an important way to make the sample representative

Probability
Measures the possibility of occurrence of a random event.
P(A): probability of the random event A
P(A) = 1 if an event always occurs.
P(A) = 0 if an event never occurs.

2. Descriptive Statistics & Distributions
Parameter: population quantity
Statistic: summary of the sample
Inference for parameters: use the sample
Central tendency
    Mean (average)
    Median (middle value)
Variability
    Variance: measure of variation
    Standard deviation (sd): square root of variance
    Standard error (se): sd of the estimate
    Median, quartiles, min., max, range, boxplot
Proportion

Normal distribution

Standard normal distribution: Mean 0, variance 1

Z-test for means if the population sd is known
T-test for means if the sd is unknown
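A one-sample z-test can be sketched in a few lines. This assumes the population sd is known and uses hypothetical numbers (a sample of 25 haemoglobin values with mean 13.2, tested against a population mean of 14.0 with sd 2.0); the helper name `z_test_mean` is illustrative.

```python
import math

def z_test_mean(sample_mean, mu0, sigma, n):
    """Two-sided z-test of a sample mean against mu0, population sd known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = z_test_mean(sample_mean=13.2, mu0=14.0, sigma=2.0, n=25)
print(f"z = {z:.2f}, P = {p:.4f}")
```

When the sd has to be estimated from the sample, the same statistic is referred to the t distribution with n - 1 degrees of freedom instead.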

Meaning of P
P value: the probability of observing a result as extreme as, or more extreme than, the one actually observed, from chance alone
Lets us decide whether to reject or accept the null hypothesis
P > 0.05: Not significant
P = 0.01 to 0.05: Significant
P = 0.001 to 0.01: Very significant
P < 0.001: Extremely significant

3. Inference for Means
Click ‘File’ to import data and create the SAS data set.
Click ‘Solution’ to create a project to run the statistical test.
Click ‘File’ to open the SAS data set.
Click ‘Statistics’ to select the statistical procedure.

3. Inference for Means Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
Nonparametric alternative to the two-sample t-test
The populations don’t need to be normal
H0: The two samples come from populations with equal medians
H1: The two samples come from populations with different medians

3. Inference for Means Mann-Whitney U-Test Procedure
1) Temporarily combine the two samples into one big sample, then replace each sample value with its rank
2) Find the sum of the ranks for either one of the two samples
3) Calculate the value of the z test statistic
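The three steps can be sketched as follows. This is a minimal version under stated assumptions: the z statistic uses the usual normal approximation, z = (R1 - n1(n1+n2+1)/2) / sqrt(n1·n2·(n1+n2+1)/12), without a tie correction, and the two groups of values are hypothetical. The approximation is normally recommended only when both samples have more than about 10 values, so the tiny groups here are for illustration only.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """Rank sum of sample1, z statistic and two-sided P (normal approximation)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # step 1: combine and rank
    r1 = sum(ranks[:n1])                                  # step 2: rank sum
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma                                 # step 3: z statistic
    p = math.erfc(abs(z) / math.sqrt(2))
    return r1, z, p

# Hypothetical measurements for two groups.
group_a = [17.7, 21.4, 23.8, 25.2, 26.0]
group_b = [19.6, 24.5, 27.1, 28.3, 30.9]
r1, z, p = rank_sum_z(group_a, group_b)
print(f"R1 = {r1}, z = {z:.3f}, P = {p:.3f}")
```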

T Test
T test checks whether two samples are likely to have come from the same or different populations
Used on continuous variables
Example: age of patients in the APC study (APC/placebo)
PLACEBO: mean age 60.6 years
APC: mean age 60.5 years, n = 850
What is the P value? 0.01, 0.05, 0.10, 0.90 or 0.99?
The P value is not significant: the patients are from the same population (groups designed to be matched by randomisation, so no surprise!)

T Test: SAFE “Serum Albumin”
PLACEBO vs ALBUMIN (n, mean, SD and 95% CI for each group)
Q: Are these albumin levels different?
Ho = Levels are the same (any difference is there by chance)
H1 = Levels are too different to have occurred purely by chance
Statistical test: T test gives an extremely significant P value
Reject the null hypothesis (Ho) and accept the alternative hypothesis (H1), i.e. there is only a very small chance that these samples are both from the same overall group, therefore we can say they are very likely to be different

Effect of Sample Size Reduction
PLACEBO vs ALBUMIN (n, mean, SD and 95% CI for each group)
A smaller sample size (one tenth of the original) causes a wider CI (less confident where the mean is)
P is approx 0.01: still significant but less so
This influence of sample size on the ability to find any particular difference statistically significant is a major consideration in study design

Reducing Sample Size (again)
PLACEBO vs ALBUMIN (n, mean, SD and 95% CI for each group)
Using an even smaller sample size (now 1/100 of the original, mean and SD unchanged) gives much wider confidence intervals
p = 0.41 (not significant anymore)
A SMALLER STUDY has LOWER POWER to find any particular difference to be statistically significant
POWER: the ability of a study to detect an actual effect or difference
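The effect of shrinking the sample while keeping the means and SD fixed can be reproduced with a short calculation. This sketch uses hypothetical numbers (group means 30.0 and 28.0, common SD 8.0), not the SAFE values, and a normal approximation in place of the t distribution, which is slightly optimistic for the smallest group size.

```python
import math

def two_sample_summary(mean1, mean2, sd, n_per_group):
    """Two-sided P value and 95% CI for a difference in means (normal approx.)."""
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    diff = mean1 - mean2
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))
    half_width = 1.96 * se
    return p, (diff - half_width, diff + half_width)

# Same means and SD, sample size cut to 1/10 and then 1/100.
results = {n: two_sample_summary(30.0, 28.0, 8.0, n) for n in (3000, 300, 30)}
for n, (p, (low, high)) in results.items():
    print(f"n per group = {n:4d}: P = {p:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The observed difference never changes, but the confidence interval widens and the P value rises as the groups get smaller.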

3. Inference for Means Mann-Whitney U-Test, Example
Numbers in parentheses are the ranks, beginning with a rank of 1 assigned to the lowest value, 17.7. R1 and R2: the sums of ranks for the two samples

3. Inference for Means Hypothesis: The group means are different
Ho: Men and women have the same median BMI
H1: Men and women have different median BMI
p-value = 0.33, thus we do not reject H0 at α = 0.05.
There is no significant difference in BMI between men and women.

3. Inference for Means SAS Programming for Mann-Whitney U-Test Procedure
Data steps: the same as slide 21.
Procedure steps:
1) Click ‘Solutions’
2) Click ‘Analysis’
3) Click ‘Analyst’
4) Click ‘File’
5) Click ‘Open By SAS Name’
6) Select the SAS data set and click ‘OK’
7) Click ‘Statistics’
8) Click ‘ANOVA’
9) Click ‘Nonparametric One-Way ANOVA’
10) Select the ‘Dependent’ and ‘Independent’ variables respectively and choose the test of interest
11) Click ‘OK’

3. Inference for Means
Click ‘File’ to open the SAS data set.
Click ‘Statistics’ to select the statistical procedure.
Select the dependent and independent variables.

3. Inference for Means Notation for paired t-test
$d$ = individual difference between the two values of a single matched pair
$\mu_d$ = mean value of the differences $d$ for the population of paired data
$\bar{d}$ = mean value of the differences $d$ for the paired sample data
$s_d$ = standard deviation of the differences $d$ for the paired sample data
$n$ = number of pairs

3. Inference for Means Example: Systolic Blood Pressure
OC: Oral contraceptive

ID    Without OC’s    With OC’s    Difference
1     115             128          13
2     112             115          3
3     107             106          -1
4     119             128          9
5     115             122          7
6     138             145          7
7     126             132          6
8     105             109          4
9     104             102          -2
10    115             117          2

3. Inference for Means Hypothesis: The group means are different
Ho: $\mu_d = 0$ vs. H1: $\mu_d \neq 0$
Significance level: $\alpha = 0.05$
Degrees of freedom (df): $n - 1 = 9$
Test statistic: $t = \bar{d} / (s_d / \sqrt{n})$
P-value: 0.009, thus reject Ho at $\alpha = 0.05$
The data support the claim that oral contraceptives affect the systolic bp.

3. Inference for Means Confidence interval for matched pairs
$100(1-\alpha)\%$ CI: $\bar{d} \pm t_{\alpha/2,\,n-1} \cdot s_d / \sqrt{n}$
95% CI for the mean difference of the systolic bp: (1.53, 8.07)
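The paired calculation can be followed step by step. The blood pressure values below are a reading of the example table, which is partly garbled in this transcript, so treat them as an assumption; they do reproduce the reported interval of (1.53, 8.07). The critical value 2.262 is the standard two-sided 5% point of the t distribution with 9 degrees of freedom, taken from tables rather than computed.

```python
import math
import statistics

# Assumed readings for the 10 women (systolic bp, mm Hg).
without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# d = difference within each matched pair (with OC minus without OC).
d = [w - wo for w, wo in zip(with_oc, without_oc)]
n = len(d)
d_bar = statistics.mean(d)    # mean of the differences
s_d = statistics.stdev(d)     # sample sd of the differences
se = s_d / math.sqrt(n)       # standard error of d_bar
t_stat = d_bar / se           # test statistic with n - 1 = 9 df

T_CRIT_DF9 = 2.262            # two-sided 5% critical value, 9 df (from tables)
ci = (d_bar - T_CRIT_DF9 * se, d_bar + T_CRIT_DF9 * se)

print(f"mean difference = {d_bar:.1f}, sd = {s_d:.2f}, t = {t_stat:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Since the t statistic exceeds the critical value and the interval excludes 0, the conclusion matches the rejection of Ho at the 0.05 level.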

3. Inference for Means
Click ‘File’ to open the SAS data set.
Click ‘Statistics’ to select the statistical procedure.
Put the two group variables into ‘Group 1’ and ‘Group 2’.

Chi Square Test
Used for proportions or frequencies
Binary data, e.g. alive/dead
PROWESS Study: primary endpoint: 28 day all cause mortality

           ALIVE      DEAD       TOTAL
PLACEBO    69.2%      30.8%      840 (100%)
APC        75.3%      24.7%      850 (100%)
TOTAL      72.2%      27.8%      1690 (100%)

Reduction in death rate = 30.8% - 24.7% = 6.1%, i.e. the death rate is 6.1 percentage points lower in the APC group
Perform Chi Square test: P = 0.006 (very significant)
A result like this could happen by chance only 6 in 1000 times, so chance variation alone is an unlikely explanation for the difference

Reducing Sample Size
Same results but using a much smaller sample size (one tenth)
Reduction in death rate = 6.1% (still the same)

           ALIVE      DEAD       TOTAL
PLACEBO    69.2%      30.8%      100%
APC        75.3%      24.7%      100%
TOTAL      72.2%      27.8%      100%

Perform Chi Square test: with the smaller sample this difference in mortality could too easily have happened by chance, therefore the results are not significant
Again, the power of a study to find a difference depends a lot on sample size, for binary data as well as continuous data
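The chi-square comparison at full size and at one tenth of the size can be sketched as below. The cell counts are not taken from the slides: they are derived by rounding the reported death percentages (30.8% and 24.7%) against group totals of 840 and 850, and 84 and 85 for the reduced study. The use of Yates' continuity correction is also an assumption; the helper names are illustrative.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]]; returns (chi2, P)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)   # Yates' continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def counts(total, pct_dead):
    """(alive, dead) counts derived from a group total and a death percentage."""
    dead = round(total * pct_dead / 100)
    return total - dead, dead

full = chi_square_2x2(*counts(840, 30.8), *counts(850, 24.7))
small = chi_square_2x2(*counts(84, 30.8), *counts(85, 24.7))
print(f"full study:  chi2 = {full[0]:.2f}, P = {full[1]:.4f}")
print(f"one tenth:   chi2 = {small[0]:.2f}, P = {small[1]:.4f}")
```

The death rates are the same in both tables; only the number of patients behind them changes, and with it the P value.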

Summary
Size matters = BIGGER IS BETTER
Spread matters = SMALLER IS BETTER
Bigger difference = EASIER TO FIND
Smaller difference = MORE DIFFICULT TO FIND
To find a small difference you need a big study
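How big a study is needed can be estimated with the common textbook formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ². This is a sketch under assumptions: equal group sizes, a known common SD, a two-sided test, and hypothetical values for the SD and the differences; `n_per_group` is an illustrative name.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a mean difference delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

for delta in (10, 5, 2.5):
    print(f"difference {delta}: {n_per_group(delta, sd=10)} per group")
```

Halving the difference to be detected roughly quadruples the required sample size, and a smaller SD reduces it.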

Thank you!
