Session 6: Basic Statistics Part 1 (and how not to be frightened by the word) Prof Neville Yeomans Director of Research, Austin LifeSciences.


Session 6: Basic Statistics Part 2 (and how not to be frightened by the word) Prof Neville Yeomans Director of Research, Austin LifeSciences

So now we’ve got some results. How can we make sense of them?

What I will cover (over 2 sessions, August 28 and September 25)
- Sampling populations
- Describing the data in the samples
- How accurately do those data reflect the ‘real’ population the samples were taken from?
- We’ve compared two groups. Are they really from different populations, or are they samples from the same population, with the measured differences just due to chance?

What I will cover (contd.)
- Tests to answer the question ‘Are the differences likely to be just due to chance?’
  - Data consisting of values (e.g. hemoglobin concentration) (‘continuous variables’)
  - Data consisting of whole numbers: frequencies, proportions (percentages)
  - Tests for just two groups; tests for multiple groups
- Tests that examine relationships between two or more variables (correlation, regression analysis, life-table)

What I will cover (contd.)
- How many subjects should I study to find the answer to my question? (Power calculations)
- Statistical packages and other resources

We’ve got some numbers. How are we going to describe them to others?
Suppose we’ve measured the heights of a number of females (‘a sample’) picked off the street in Heidelberg.
[Table: Subject # / Height (cm) for each subject; the individual values are not reproduced in this transcript]

How could we more concisely describe these data, using just one or two numbers that would give us useful information about the sample as a whole?
1. A measure of ‘central tendency’
2. A measure of how widely the values are spread

The median (the middle value)
The range: a poor measure for describing the whole population, because it depends on sample size; the range is likely to be wider with larger samples.
The interquartile range (25th to 75th percentile of values): what we should always use with the median, since it is largely independent of sample size.

The mean (average) (Σx/N)
The standard deviation* (SD): ±7.2 cm
- doesn’t vary much with sample size (except for very small samples)
- approx. 68% of values will lie within 1 SD either side of the mean**
- approx. 95% of values will lie within 2 SD either side of the mean**
[Histogram of the heights, amalgamated into 3 cm ranges]
*In Excel, enter the formula ‘=STDEV(range of cells)’
**Provided the population is ‘normally distributed’
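The mean and SD described above are easy to reproduce in a few lines of Python. This is a sketch with made-up heights, since the slide’s raw data are not reproduced in this transcript:

```python
import statistics

# Hypothetical heights in cm (illustrative only; not the slide's actual sample)
heights = [158.2, 161.5, 163.0, 164.8, 166.1, 167.4, 169.0, 171.3, 173.9, 176.5]

mean = statistics.mean(heights)   # Excel equivalent: =AVERAGE(range)
sd = statistics.stdev(heights)    # Excel: =STDEV(range); sample SD, N-1 divisor

print(f"mean = {mean:.1f} cm, SD = {sd:.1f} cm")
```

Note that `statistics.stdev` uses the N-1 (sample) divisor, matching Excel’s STDEV.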

The ‘normal distribution’: mean = median in a ‘perfect’ normal distribution. [Figure: bell curve with the x-axis marked in standard deviations away from the mean]

We measured the mean height of our sample of 25 women... (it was cm) But what is the average height of the whole population – of ALL Heidelberg women? We didn’t have time or resources to track them all down – that’s why we just took what we hoped was a representative sample. What I’m asking is: how good an estimate of the true population mean is our sample mean? This is where the Standard Error of the Mean* (or just Standard Error, SE) comes in. *It’s sometimes called the Standard ESTIMATE of the Error of the mean

The Standard Error (contd.)
The mean height of our sample of 25 women was … cm. We calculated the standard deviation (SD) of the sample to be 7.3 cm (the value, on either side of the mean, that should contain about 2/3 of those measured).
Standard error of the mean = SD/√N, i.e. 7.3/√25 = 7.3/5 = 1.46
So now we can express our results for the height of our sample as mean ± 1.5 cm (mean ± SEM). But what does this really tell us?
The actual true mean height of the whole population of women has about a 68% likelihood of lying within 1.5 cm (i.e. 1 SEM) either side of the mean we found in our sample, and a 95% likelihood of lying within 3 cm (i.e. 2 × SEM) either side of that sample mean. (It’s actually 1.96 × SEM for a reasonably large sample, e.g. roughly 30 or more, and wider for small samples, but let’s keep it simple.)
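The SEM arithmetic on this slide can be checked directly, using the slide’s own numbers (SD = 7.3 cm, N = 25):

```python
import math

sd, n = 7.3, 25
sem = sd / math.sqrt(n)     # SEM = SD / sqrt(N) = 7.3 / 5 = 1.46
ci95_halfwidth = 2 * sem    # rough 95% interval: mean +/- 2 SEM (1.96 for large N)

print(sem, ci95_halfwidth)
```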

The concept of the standard error of the mean (SEM), e.g. serum sodium values (mmol/L):
True population mean 142 mmol/L (SD = 4.0). A random sample of 10 normal individuals gives a sample mean of 142.8 mmol/L (SD of sample = 3.4).
1 × SEM = 3.4/√10 = 1.1 mmol/L
2 × SEM (~ the ‘95% confidence interval’) = 2.2 mmol/L
That means: ‘There is a 95% chance (19 chances out of 20) that the actual population mean, estimated from our random sample, lies between 140.6 and 145.0 mmol/L.’

Why does the standard error depend on the population SD and on sample size? SE = SD/√N
[Figure: A, narrow population spread (i.e. small SD); B, wide population spread (i.e. large SD)]
Increasing N decreases the SE of the mean, i.e. it increases the accuracy of our estimate of the population mean based on the results of our sample.

Testing significance of differences
Women: mean = …, SE = 1.46. Men: mean = …, SE = 1.43.
On a quick, rough check we can see that:
(a) the 95% confidence interval for our estimate of the height of the women is approximately the mean ± 2 SE;
(b) our estimate of the mean height of the men sampled lies well outside the 95% confidence interval (range) for the women, so it looks improbable that they are from the same population.
[Figure: the two sample means with their 95% confidence intervals]

Testing significance of differences: how likely is it that the two random samples came from the same population? Student’s t-test.
[Figure: composite frequency distribution, created by pooling data from both samples (women and men), with the x-axis in standard deviations either side of the mean; pooled SD = 10.0 cm]
How likely is it that these two samples (the pink and the blue) were taken from the SAME population? [This is called the NULL HYPOTHESIS]
Tested for statistical significance of difference: P < 0.001, i.e. there is less than 1 chance in 1000 that these two samples came from the same population.
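For readers curious about the mechanics, the t statistic behind this test is a ratio of the difference between means to its standard error. A minimal sketch with made-up data (in practice you would call a library routine such as scipy.stats.ttest_ind, which also returns the P value):

```python
import math
import statistics

def t_statistic(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (N-1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Illustrative samples; the P value would come from the t distribution
# with na + nb - 2 degrees of freedom
t = t_statistic([1, 2, 3], [3, 4, 5])
print(t)
```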

In fact, though, running a Student’s t-test on the two samples of heights of the Heidelberg men and women gives this error message: [the package’s error message is not reproduced in this transcript]

Assumptions to be met before testing significance of differences with PARAMETRIC TESTS* (i.e. tests that use the mathematics of the normal distribution):
- The combined data should approximate a normal distribution. In this instance the male data were skewed (not evenly distributed around the mean) and spread a bit too far out into the tails of the frequency-distribution curve.
- The variances (= SD²) of the groups should not differ significantly from each other.
*Student’s t-test, paired t-test, analysis of variance

An example of data where the groups have different variance (spread), and one group is skewed. [Figure: dot plots with means, medians, and the lower limit of the ‘normal’ range marked]
The equal-variance test failed (P < 0.05), and the normality test almost failed (P = 0.08), so we should not use a parametric test such as the t-test.

So what do we do if we can’t use a parametric test to check for significance of differences? Use a non-parametric test.
These tests, instead of using the actual numerical values of the data, put the data from each group into ascending order and assign each value a rank for its place in the combined groups. The maths of the test is then done on these ranks.
Examples: rank-sum test, Wilcoxon rank test, Mann-Whitney rank test, etc. (The P value for our slide of heights of Heidelberg men and women was calculated using the Wilcoxon test.)

Example of how a rank-sum (non-parametric) test is constructed manually*
[Table: Group 1 data with their ranks when the groups are combined; Group 2 data with their ranks; and the sum of ranks for each group. The individual values are not reproduced in this transcript.]
Mann-Whitney test: P = …
*In reality, these days you’ll just feed the raw data into a program that does it for you.
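The ranking step described above can be sketched in a few lines. This is an illustrative helper (hypothetical data; tied values get the average of the ranks they span, as rank tests require), not a full Mann-Whitney implementation, which would go on to convert the rank sum into a P value:

```python
def rank_sum(group1, group2):
    """Sum of ranks for group1 after pooling both groups and ranking
    in ascending order; ties receive the average rank of their run."""
    pooled = sorted((value, idx) for idx, value in enumerate(group1 + group2))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                     # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1     # average rank of the tied run (1-based)
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    return sum(ranks[:len(group1)])    # original indices < len(group1) are group 1

# If group 1 holds the three smallest values, its rank sum is 1 + 2 + 3 = 6
print(rank_sum([1.2, 1.5, 1.9], [2.4, 2.8, 3.1]))
```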

Tests to examine the significance of differences between 3 or more groups
- Parametric tests (tests based on the mathematics of the ‘normal curve’): analysis of variance (1-way, 2-way, factorial, etc.)
- Non-parametric tests (rank-sum tests): Kruskal-Wallis test
[Strictly, this should read ‘...tests to decide how likely it is that data from 3 or more samples come from the same population’.]

One variable (dose), compared across 3 groups, so this gets tested with one-way ANOVA.

This tells us that it is very unlikely that the three groups belong to the same population. But which groups differ from which?

One Way Analysis of Variance (Thursday, June 28, 2012)
Normality test: passed (P = 0.786). Equal variance test: passed (P = 0.694).
[Table: group name, N, missing, mean, SD and SEM for the Control, Normopress 0.5 mg and Normopress 2.0 mg groups; the numeric values are not reproduced in this transcript]
[ANOVA table: source of variation (between groups, residual, total), DF, SS, MS, F; between-groups P < 0.001]
The differences in the mean values among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P < 0.001).
All pairwise multiple comparison procedures (Holm-Sidak method), overall significance level = 0.05:
- Control vs. Normopress 2.0 mg: significant
- Normopress 0.5 mg vs. Normopress 2.0 mg: significant
- Control vs. Normopress 0.5 mg: not significant
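The F statistic that drives this output is just the ratio of between-group to within-group variability. A stripped-down sketch with illustrative data (a real package adds the P value and the post-hoc pairwise comparisons):

```python
import statistics

def one_way_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g)
                    for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three illustrative dose groups
print(one_way_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))
```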

Before and after data – paired tests Create a paired analysis of length of hair after going to hairdresser. Hypothesis: cutting hair makes it shorter

Two independent groups: 9.7 ± 1.9 vs. 8.2 ± 1.7 (mean ± SE). Difference between means tested for significance with Student’s t test: P = 0.58

Our actual data: P = 0.025. Difference between means tested for significance with the paired Student’s t test (before vs. after).
The variation within each individual is much less than that between individuals. The paired t-test examines the mean and standard error of the changes in each individual, and tests how likely it is that the changes are due to chance.
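As described, the paired test works on the per-subject differences. A minimal sketch (the before/after values are made up for illustration; the P value would come from the t distribution with n-1 degrees of freedom):

```python
import math
import statistics

def paired_t(before, after):
    """t = mean(differences) / SEM(differences)."""
    diffs = [b - a for b, a in zip(before, after)]
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / sem

# Illustrative hair lengths (cm) before and after a haircut
print(paired_t([10, 12, 9, 11], [8, 11, 9, 10]))
```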

So far we have been dealing with ‘CONTINUOUS VARIABLES’: numbers such as heights, laboratory values, velocities, temperatures etc. that could take any value (e.g. many decimal places) if we could measure accurately enough. Now we’ll look at ‘DISCONTINUOUS VARIABLES’: whole numbers, most often expressed as proportions or percentages.

Rates and proportions
In 1969, a home for retired pirates had 93 inmates, 42 of whom had only one leg. In 2004, a subsequent survey found there were now 62 inmates, 6 of whom had only one leg. Has there been a ‘real’ change (i.e. a change unlikely to be due to chance) in the proportion of one-legged pirates in the home between the two surveys?

Year | Pirates with 1 leg (%) | Pirates with 2 legs (%) | Total pirates
1969 | 42 (45.2) | 51 (54.8) | 93
2004 | 6 (9.7) | 56 (90.3) | 62
Totals | 48 | 107 | 155
(Expected numbers of 1-legged pirates, if the proportion were the same in both years: 29 in 1969, 19 in 2004.)
Chi-square with 1 degree of freedom: P < 0.001, i.e. the likelihood that the difference in proportions of 1-legged inmates between 1969 and 2004 is due to chance is less than 1 in 1000.
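The chi-square statistic for this table can be recomputed from the counts given, with expected counts derived from the marginal totals. The uncorrected value comes out at roughly 21.9, well past the 10.83 critical value for P = 0.001 at 1 df (this sketch omits the Yates continuity correction, which a package may apply):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with expected counts from the marginal totals."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 1969: 42 one-legged, 51 two-legged; 2004: 6 one-legged, 56 two-legged
chi2 = chi_square_2x2(42, 51, 6, 56)
print(chi2)   # critical value for P = 0.001 at 1 df is 10.83
```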

One trap with chi-square tests and small numbers: penicillin treatment for pneumonia.
Treatment | No. dead | No. surviving | Total
Placebo | 9 | 5 | 14
Penicillin | 1 | 8 | 9
Totals | 10 | 13 | 23
With numbers this small the chi-square approximation is unreliable, so use Fisher’s exact test, which computes the probability of the table directly from factorials of its cells and marginal totals:
P = (10! × 13! × 14! × 9!) / (23! × 9! × 5! × 1! × 8!)
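The factorial expression above is the probability of this one particular table; the one-tailed Fisher’s exact P sums it with the probabilities of the more extreme tables (here, placebo deaths of 9 and 10). A sketch using binomial coefficients, which is algebraically equivalent to the factorial form:

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact P for the 2x2 table [[a, b], [c, d]]:
    the hypergeometric probability, with margins fixed, of observing
    a table at least as extreme (first cell >= a)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    return sum(comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)
               for k in range(a, min(row1, col1) + 1))

# Placebo: 9 dead, 5 surviving; Penicillin: 1 dead, 8 surviving
p = fisher_one_tailed(9, 5, 1, 8)
print(p)   # small enough to call the difference unlikely to be chance
```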

Correlation: a fairly straightforward concept of how likely two variables are to be related to each other. Examples:
- Do children’s heights vary with their age, and if so, is the relation direct (they get taller as they get older) or inverse (smaller as older)?
- Does respiratory rate increase as pulse rate increases during exertion?
The correlation coefficient, R, tells us how closely the two variables ‘travel together’. A P value is calculated to tell us how likely it is that the relationship is due only to chance.

Examples of regression (correlation) data. [Figure: two scatterplots; one with a strong correlation (R close to 1, P < 0.001), one with a weak correlation (P = 0.30)]
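The coefficient R on these plots is the Pearson correlation, which is simple to compute by hand; a minimal sketch with illustrative ages and heights:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Children's ages (years) vs heights (cm), illustrative: a strong
# direct relationship, so r comes out close to +1
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 127, 138, 149]
print(pearson_r(ages, heights))
```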

Some other common statistical analyses
- Life-table analyses: observing and comparing events developing over time; allows us to compensate for dropouts at varying times during the study
- Multiple linear and multivariate regression analyses: looking for relationships between multiple variables

Life-table analyses. Scagliotti et al. J Clin Oncol 2012; 30: 2829. Advanced lung cancer: the trial compared motesanib plus 2 conventional chemo drugs with placebo plus the same two drugs.

Multiple regression analysis examines the possible effect of more than one variable on the thing we are measuring (the ‘dependent variable’).
Perret JL et al. The Interplay between the Effects of Lifetime Asthma, Smoking, and Atopy on Fixed Airflow Obstruction in Middle Age. Am J Respir Crit Care Med 2013; 187. From the Institute of Breathing and Sleep (Austin), University of Melbourne, Monash University, Alfred Hospital, and others.

Perret et al. 2013

Sample Size Calculations How many patients, subjects, mice etc. do we need to study to reliably* find the answer to our research question? *We can never be certain to do this, but should aim to be considerably more likely than not to find out the truth about the question

Sample size calculations (1)
First we need to grapple with two types of ‘error’ in interpreting differences between means and/or medians of groups:
Type 1 (or α) error: we think the difference is ‘real’ (the data are from 2 or more different populations) when it is not. This is what we’ve dealt with so far; the P-values assess how likely it is that the differences are due to chance.
Type 2 (or β) error: our experiment, and the stats test we apply to the results, FAILS to show a significant difference when there REALLY IS ONE.

Sample size calculations (2)
If we end up with a Type 2 (β) error, it will be because our sample size(s) was too small to persuade us that the actual difference between means was unlikely to be due to chance (i.e. P < 0.05).
The smaller the real difference between the population means, the larger the sample size needs to be to detect it as statistically significant.

Sample size calculations (3): how do we go about it? Most of the good statistical packages have a function for calculating sample sizes.
1. Decide what statistical test will be appropriate to apply to the primary endpoint when the study completes
2. Estimate the likely size of the difference between groups, if the hypothesis is correct
3. Decide how confident you want to be that the difference(s) you observe is unlikely to be due to chance
4. Decide how much you want to risk missing a true difference (i.e. what power you want the study to have)
Note: we really should have done a sample size calculation before we started our experiments, but for this course we needed to deal with the basics of stats tests first.

Sample size calculations: a worked example (i)
We want to see whether drug X will reduce the incidence of peptic ulcer in patients taking aspirin for 6 months.
1. Decide what statistical test: chi-square, to compare differences in frequencies between 2 groups.

Sample size calculations: a worked example (i)
We want to see whether drug X will reduce the incidence of peptic ulcer in patients taking aspirin for 6 months. We expect a 10% incidence of ulcers in the controls, and hypothesize that a 50% reduction (i.e. to 5%) in those treated with X would be clinically worthwhile.
2. We’ve now decided the size of the difference between groups we are interested in looking for.

Sample size calculations (4): a worked example (i)
We want to see whether drug X will reduce the incidence of peptic ulcer in patients taking aspirin for 6 months. We expect a 10% incidence of ulcers in the controls, and hypothesize that a 50% reduction (i.e. to 5%) in those treated with X would be clinically worthwhile. We decide to be happy with a likelihood of only 1 in 20 that the difference observed is due to chance.
3. That is to say, we want to set P ≤ 0.05 as the level of α (alpha) risk (the risk of concluding the difference is real when it’s actually due to chance).

Sample size calculations (4): a worked example (i)
We want to see whether drug X will reduce the incidence of peptic ulcer in patients taking aspirin for 6 months. We expect a 10% incidence of ulcers in the controls, and hypothesize that a 50% reduction (i.e. to 5%) in those treated with X would be clinically worthwhile. We decide to be happy with a likelihood of only 1 in 20 that the difference observed is due to chance. We would like to have at least an 80% chance of finding that 50% reduction (≤ 20% risk of missing it, i.e. the β risk).
4. That is, set the power of the study (1-β) ≥ 80% to detect such a difference (if it exists).

Sample size calculations: a worked example (i). Summary of the sample size calculation settings:
- Estimated ulcer incidence in controls = 10%
- Estimated incidence in the group receiving drug X = 5%
- P(α) ≤ 0.05, and power (1-β) ≥ 80%
- Data tested by chi-square
Calculated required sample size = 449 in each group
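The settings above can be fed into the standard normal-approximation formula for comparing two proportions. This sketch gives roughly 435 per group; the slide’s figure of 449 presumably comes from a slightly different method (e.g. a continuity-corrected or exact calculation in the package used), so treat this as an approximation:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions
    (normal approximation, two-sided alpha, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(n_per_group(0.10, 0.05))
```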

Sample size calculations: a worked example (ii)
We hypothesize that removing the spleen in rats will result in an increase in haemoglobin (Hb) from the normal mean of 14.0 g/L to 15.0 g/L. We already know that the SD (standard deviation) of Hb values in normal rats is 1.2 g/L (if we don’t know, we’ll have to guess!). Testing will be with Student’s t test. We’ll set α (the likelihood that the observed difference is due to chance) at 0.05. We want a power (1-β) of at least 80% to minimize the risk of missing such a difference if it’s real*.
*More correctly, we should say: if the samples really are from different populations.

Sample size calculations: a worked example (ii). Summary of the sample size calculation settings:
- Control mean = 14.0 g/L; operated mean = 15.0 g/L
- Estimated SD in both groups = 1.2 g/L
- P(α) ≤ 0.05, and power (1-β) ≥ 80%
- Data tested by Student’s t test
Calculated required sample size = 24 in each group
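For a difference in means, the usual normal-approximation formula is n ≈ 2(z_{α/2} + z_β)²(SD/Δ)² per group. Recomputing with the slide’s numbers gives about 23 per group; the slide’s 24 is plausibly from an exact t-based calculation, which adds a unit or two for small samples:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(sd, delta, alpha=0.05, power=0.80):
    """Approximate n per group to detect a difference in means of size
    delta, given a common SD (normal approximation to the t-test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * (sd / delta) ** 2)

print(n_per_group_means(sd=1.2, delta=1.0))
```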

Summary of the most common statistical tests in biomedicine (1. Parametric tests)
Test | Purpose | Comments
Student t-test | Compare 2 groups of ‘continuous’ data* | Only use if the data are ‘normally distributed’ and the variances of the groups are similar
Paired Student t-test | Compare before-after data on the same individuals | The differences (before vs. after) need to be ‘normally distributed’. More powerful than the ‘unpaired’ t-test, because there is less variability within individuals than between them
1-way analysis of variance | Compare 3 or more groups of continuous data | Same requirements as for the Student t-test
2-way analysis of variance | Compare 3 or more groups, stratified for at least 2 variables | As above
*For measured values, not numbers of events (frequencies)

Summary of the most common statistical tests in biomedicine (2. Non-parametric tests)
Test | Purpose | Comments
Rank-sum or Mann-Whitney test | Compare 2 groups of ‘continuous’ data, using their ranks rather than actual values | Use if the t-test is invalid because the data are not ‘normally distributed’ and/or the variances of the groups differ significantly
Signed-rank test | Rank test to use instead of the paired t-test | Use if the differences (between before and after) are not ‘normally distributed’
Non-parametric analysis of variance | Compare 3 or more groups of continuous data | As above (it’s the generalised form of the Mann-Whitney test for more than 2 groups)

Some tools for statistical analyses
Excel spreadsheets, e.g. if column A contains data in the 8 cells A3 through A10:
- Mean: =AVERAGE(A3:A10)
- SD: =STDEV(A3:A10)
- SEM: =(STDEV(A3:A10))/SQRT(8)
Common statistical packages for significance testing: Sigmaplot, SPSS (licence can be downloaded from Unimelb), STATA

Other resources
Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research. 4th edn. Oxford: Blackwell Science.
Dawson B, Trapp R. Basic and Clinical Biostatistics. 4th edn. New York: McGraw-Hill. (Electronic book in the Unimelb electronic collection)
Rumsey DJ. Statistics for Dummies. 2nd edn. Oxford: Wiley & Sons, 2011.