Data Analysis for the BSc Year

Sachin Ananth BSc (Hons) Respiratory Science BSc (Class of 2018)

Lecture Objectives
To understand common statistical terms
To learn how to calculate common statistical terms using raw data
To learn which statistical test to use during your projects

Analysing Published Data
This is important for ICAs and background reading for your project

Null and Alternative Hypotheses
Null Hypothesis (H0): there is no statistical difference between the variables.
Alternative Hypothesis (H1): there is a statistical difference between the variables.
Null Hypothesis = nothing interesting going on ("Null is Dull"). I am explaining this because the Null Hypothesis is central to the definition of the P value. You want to NULLify the NULL hypothesis (i.e. reject it), because you do want to see a difference!

Type I vs Type II Error
Type I Error: rejecting the Null Hypothesis when it is actually true, i.e. a false positive. Its probability is the significance level (usually 0.05).
Type II Error: accepting (failing to reject) the Null Hypothesis when it is actually false, i.e. a false negative.

Statistical Power
Ability of a study to pick up a difference when one actually exists, i.e. the probability that a Type II error will NOT be made (power = 1 – β, where β is the probability of a Type II error).
Increase power by: increasing the sample size; increasing the effect size; increasing the precision of measurement.
Remember: a Type II error is a false negative.
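A quick way to see these effects is to estimate power numerically. This is a minimal sketch, assuming a normal (z-test) approximation for comparing two group means with equal group sizes; real studies usually use a t-test and dedicated power software. `approx_power` is a hypothetical helper name, and `effect_size` is the difference in means divided by the SD (Cohen's d), so more precise measurement (a smaller SD) shows up as a larger effect size.

```python
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (normal approximation).

    effect_size: difference between group means divided by the SD (Cohen's d).
    n_per_group: number of participants in EACH group (equal sizes assumed).
    """
    std_normal = NormalDist()                        # mean 0, SD 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # about 1.96 when alpha = 0.05
    shift = effect_size * (n_per_group / 2) ** 0.5   # expected z statistic if H1 is true
    # Power = chance the z statistic falls in either rejection region under H1
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Bigger samples (and bigger effects) both raise power
for n in (10, 30, 64, 100):
    print(f"d = 0.5, n = {n} per group: power = {approx_power(0.5, n):.2f}")
```

With d = 0.5, roughly 64 people per group gives about 80% power under this approximation.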

P Value
"Probability of obtaining the observed result (or a more extreme one) when the Null Hypothesis is true"
Translation: the probability of finding a difference at least this large purely by chance when there is actually no difference.
P < 0.05 = significant result, i.e. if there were truly no difference, a result like this would turn up less than 5% of the time.
A lower P value does not mean that the difference is meaningful; it just shows that there is a difference. e.g. an average BP of 120/80 vs 117/78 could have P < 0.05 if the sample size (n) is large enough... but is this difference clinically meaningful? If you comment on an example like this in your ICA/project, you will look really clever!
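The BP example can be made concrete with a rough calculation. This sketch assumes (none of this is from the lecture) a two-sample z-test, an SD of 15 mmHg in both groups, equal group sizes, and uses only the systolic values; `two_sided_p` is a hypothetical helper. The same 3 mmHg difference goes from non-significant to highly significant purely because n grows.

```python
from statistics import NormalDist

def two_sided_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided P value for a difference between two group means.

    Assumes a two-sample z-test with the same SD and size in both groups.
    """
    se = sd * (2 / n_per_group) ** 0.5     # standard error of the difference
    z = abs(mean_diff) / se                # how many SEs apart the means are
    return 2 * (1 - NormalDist().cdf(z))   # area in both tails

# Hypothetical: systolic BP 120 vs 117 mmHg, assumed SD of 15 mmHg
for n in (20, 100, 1000):
    print(f"n = {n} per group: P = {two_sided_p(120 - 117, 15, n):.3g}")
```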

Relative Risk (RR)
Relative Risk (RR) = risk of developing disease in the exposed group vs the unexposed group. Used in prospective studies.
RR = 0.5: ↓50% risk; RR = 1.0: equal risk; RR = 1.5: ↑50% risk
You will see relative risk LOTS when you are reading papers; understanding what these values actually mean will make it MUCH easier to read the results sections of papers.

Relative Risk (RR)
Relative Risk (RR) = risk of developing disease in the exposed group vs the unexposed group. Used in prospective studies.

                 Diseased   Not diseased
Exposed          A          B
Not exposed      C          D

$$RR = \frac{A/(A+B)}{C/(C+D)}$$

Camera = take a picture of this slide if you want!

Relative Risk (RR)
Worked example (100 people in each group):

                 Diseased   Not diseased
Exposed          80         20
Not exposed      30         70

$$RR = \frac{80/100}{30/100} \approx 2.7$$

i.e. about ↑170% risk in the exposed group.

Relative Risk (RR)
Relative Risk (RR) = risk of the outcome in the treatment group vs the control group. Used in prospective studies.
Relative Risk can also be calculated for a treatment group vs a control group: just substitute "treatment" for "exposed" and "control" for "not exposed" in the table.

                 Diseased   Not diseased
Treatment        40         60
Control          60         40

$$RR = \frac{40/100}{60/100} \approx 0.67$$

i.e. ↓33% risk in the treatment group.

Relative Risk (RR)
Question: what is the Relative Risk of early preterm delivery in the n-3 group? You have 90 seconds to answer!
Use the same table layout (exposed: A, B; not exposed: C, D) and RR = [A/(A+B)] / [C/(C+D)].

Relative Risk (RR): answer
A = 61 and A + B = 2734 (so B = 2673); C = 55 and C + D = 2752 (so D = 2697)

$$RR = \frac{61/2734}{55/2752} \approx 1.12$$
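The RR formula can be checked against the worked examples on these slides with a small function. `relative_risk` is an illustrative name, not from any library; the arguments follow the A/B/C/D layout of the table.

```python
def relative_risk(a: float, b: float, c: float, d: float) -> float:
    """Relative risk from a 2x2 table:

                    Diseased   Not diseased
    Exposed             a            b
    Not exposed         c            d
    """
    risk_exposed = a / (a + b)      # risk in the exposed (or treatment) group
    risk_unexposed = c / (c + d)    # risk in the unexposed (or control) group
    return risk_exposed / risk_unexposed

print(round(relative_risk(80, 20, 30, 70), 2))      # 2.67, exposed vs not exposed
print(round(relative_risk(40, 60, 60, 40), 2))      # 0.67, treatment vs control
print(round(relative_risk(61, 2673, 55, 2697), 2))  # 1.12, n-3 example
```

The percentage change in risk is (RR – 1) × 100, so 0.67 corresponds to about ↓33%.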

Hazard Ratio (HR)
Similar to Relative Risk, but used when the risk is not constant with respect to time, e.g. in the context of survival over time. It uses information collected at different times.
You will not be expected to calculate a Hazard Ratio, just to interpret it.
In the example survival graph, survival changes with respect to time (it is worst in the first 100 days, as this is when most complications of TB occur).
HR = 0.79 means a 21% reduction in risk in people treated with isoniazid, over 500 days.

Odds Ratio (OR)
Odds Ratio (OR): odds that the outcome will occur with exposure vs without exposure. Used in retrospective studies (especially case-control studies).
OR = 0.5: ↓50% odds; OR = 1.0: equal odds; OR = 1.5: ↑50% odds
Case-control = take cases (diseased patients) and controls (non-diseased patients), then look back at the differences in exposure to risk factors between cases and controls.

Odds Ratio (OR)
Odds Ratio (OR): odds that the outcome will occur with exposure vs without exposure. Used in retrospective studies (especially case-control studies).
Use the same table that you would use for Relative Risk; only the calculation is different.

                 Diseased   Not diseased
Exposed          A          B
Not exposed      C          D

$$OR = \frac{A/C}{B/D}$$

Camera = take a picture of this slide if you want!

Odds Ratio (OR)
Question: what is the Odds Ratio of a previous suicide attempt in the suicidal group? You have 90 seconds to answer!
Use the 2x2 table from the study shown on the slide and OR = (A/C) / (B/D).

Odds Ratio (OR): answer
A = 89.6, B = 71.6, C = 10.4, D = 28.4

$$OR = \frac{89.6/10.4}{71.6/28.4} \approx 3.42$$
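The same A/B/C/D layout gives the odds ratio. `odds_ratio` is an illustrative helper; because the slide's values are percentages, and the OR is a ratio of ratios, they can be plugged in directly.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio from the same 2x2 table used for relative risk.

    a / c = odds of exposure among the diseased (cases)
    b / d = odds of exposure among the not diseased (controls)
    """
    return (a / c) / (b / d)   # algebraically the same as (a * d) / (b * c)

print(round(odds_ratio(89.6, 71.6, 10.4, 28.4), 2))  # 3.42, suicide attempt example
```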

95% Confidence Interval
If an experiment were repeated many times, 95% of the confidence intervals calculated would contain the true population mean.
Example: Average height = 170 cm (95% CI … cm)
Translation: if you repeat this experiment 100 times, about 95 of the 100 intervals you calculate would contain the true average height.
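As a rough sketch of where a CI for a mean comes from: mean ± 1.96 × standard error. Assumptions: the heights below are made-up values, and 1.96 comes from the normal distribution; for a sample this small a t-distribution critical value (giving a wider interval) would normally be used. `mean_ci_95` is a hypothetical helper.

```python
from statistics import NormalDist, mean, stdev

def mean_ci_95(values):
    """Approximate 95% CI for a population mean (normal approximation).

    For small samples a t critical value is more appropriate than 1.96.
    """
    m = mean(values)
    se = stdev(values) / len(values) ** 0.5   # standard error of the mean
    z = NormalDist().inv_cdf(0.975)           # about 1.96
    return m - z * se, m + z * se

heights_cm = [165, 170, 175, 168, 172]  # hypothetical data
lower, upper = mean_ci_95(heights_cm)
print(f"Average height = {mean(heights_cm)} cm (95% CI {lower:.1f} to {upper:.1f} cm)")
```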

95% Confidence Interval
Narrower 95% confidence intervals are better (they indicate a more precise estimate).
Confidence intervals can also be given for a Relative Risk or an Odds Ratio, e.g. RR was 1.30 (95% CI …) = a negative (non-significant) result, as the confidence interval crosses 1.
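One common way to get a 95% CI for a relative risk is the log (Katz) method, where the standard error of ln(RR) is sqrt(1/A – 1/(A+B) + 1/C – 1/(C+D)). The lecture does not say which method any study used, so treat this as an assumed, approximate approach; `rr_with_ci` is a hypothetical helper. Applied to the n-3 preterm delivery table, the interval includes 1, so that RR of 1.12 is not significant.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def rr_with_ci(a, b, c, d, level=0.95):
    """Relative risk with an approximate CI (log / Katz method).

    Table layout: exposed = a (diseased), b (not diseased);
                  not exposed = c (diseased), d (not diseased).
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # SE of ln(RR)
    z = NormalDist().inv_cdf(0.5 + level / 2)                     # about 1.96 for 95%
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = rr_with_ci(61, 2673, 55, 2697)  # n-3 example from the slides
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}); crosses 1: {lower < 1 < upper}")
```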

Incidence vs Prevalence
Incidence: number of new cases over a specific period of time, e.g. 10,000 new cases of diabetes per year
Prevalence: number of existing cases at a specific point in time, e.g. 50,000 cases of diabetes in the UK

Absolute Risk Reduction (ARR)
Difference in the incidence of disease between 2 groups:

$$ARR = P(\text{event occurring in control group}) - P(\text{event occurring in treatment group})$$

Example: control group (n = 100), 30% risk of death; treatment group (n = 100), 20% risk of death
ARR = 30% – 20% = 10%

Relative Risk Reduction (RRR)
$$RRR = \frac{\text{Absolute Risk Reduction}}{P(\text{event occurring in control group})} \times 100$$

Example: control group (n = 100), 30% risk of death; treatment group (n = 100), 20% risk of death
ARR = 30% – 20% = 10%
RRR = (10/30) × 100 = 33.3%

Number Needed to Treat (NNT)
Number of patients who need to be treated to prevent one event:

$$NNT = \frac{1}{\text{ARR (in decimal format)}}$$

Example: control group (n = 100), 30% risk of death; treatment group (n = 100), 20% risk of death
ARR = 30% – 20% = 10% = 0.1
NNT = 1/0.1 = 10
If you read a paper, work out the NNT yourself, and then use it in an ICA/project thesis, it will look really good.

Calculate the Number Needed to Treat
ARR = P(event occurring in control group) – P(event occurring in treatment group)
NNT = 1/ARR
You have 90 seconds to answer!

Calculate the Number Needed to Treat: answer
It is important to realise that the "standard-treatment" group is the control group, and the "genotype-guided" group is the treatment group.
ARR = 5.9% – 5.1% = 0.8%
Remember: to calculate the NNT, divide 1 by the ARR in decimal format (hence we divide 1 by 0.008 in this example).
NNT = 1/0.008 = 125!

Summary
Absolute Risk Reduction (ARR):
$$ARR = P(\text{event occurring in control group}) - P(\text{event occurring in treatment group})$$
Relative Risk Reduction (RRR):
$$RRR = \frac{\text{Absolute Risk Reduction}}{P(\text{event occurring in control group})} \times 100$$
Number Needed to Treat (NNT):
$$NNT = \frac{1}{\text{ARR (in decimal format)}}$$
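All three formulas fit in a few lines. Risks are entered as decimals (0.30 for 30%), which matches the "ARR in decimal format" rule for NNT; the function names are illustrative. NNT is commonly rounded up to a whole number of patients, but this sketch leaves it unrounded.

```python
def absolute_risk_reduction(p_control: float, p_treatment: float) -> float:
    """ARR = P(event in control group) - P(event in treatment group), as decimals."""
    return p_control - p_treatment

def relative_risk_reduction(p_control: float, p_treatment: float) -> float:
    """RRR as a decimal (multiply by 100 for a percentage)."""
    return absolute_risk_reduction(p_control, p_treatment) / p_control

def number_needed_to_treat(p_control: float, p_treatment: float) -> float:
    """NNT = 1 / ARR, with ARR in decimal format."""
    return 1 / absolute_risk_reduction(p_control, p_treatment)

# 30% (control) vs 20% (treatment) risk of death example
print(round(number_needed_to_treat(0.30, 0.20), 1))         # 10.0
print(round(100 * relative_risk_reduction(0.30, 0.20), 1))  # 33.3
# 5.9% (standard treatment) vs 5.1% (genotype-guided) example
print(round(number_needed_to_treat(0.059, 0.051)))          # 125
```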

Sensitivity vs Specificity
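Sensitivity = proportion of people who truly have the condition (positive on the gold standard) whom your test identifies as positive
Specificity = proportion of people who truly do not have the condition (negative on the gold standard) whom your test identifies as negative

                          Gold Standard
                          Positive   Negative
Your Test    Positive     a          b
             Negative     c          d

$$\text{Sensitivity} = \frac{a}{a+c} \qquad \text{Specificity} = \frac{d}{d+b}$$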
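Using the a/b/c/d layout of the table, both measures are one division each. The counts below are made up for illustration and the helper names are hypothetical.

```python
def sensitivity(a: int, c: int) -> float:
    """a = test positive & gold standard positive; c = test negative & gold standard positive."""
    return a / (a + c)

def specificity(b: int, d: int) -> float:
    """d = test negative & gold standard negative; b = test positive & gold standard negative."""
    return d / (d + b)

# Hypothetical counts: a = 90, b = 20, c = 10, d = 80
print(sensitivity(a=90, c=10))   # 0.9
print(specificity(b=20, d=80))   # 0.8
```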

Analysing Experimental Data

Parametric vs Non-Parametric
Parametric test: for normally distributed data
Non-parametric test: for non-normally distributed data
You can sometimes turn non-normal data into normal data by taking the log of the non-normal variable (a log transformation); I would advise against doing this though…

Statistical Tests
Reason for test / Parametric test / Non-parametric test
Compare 2 independent samples from the same population (e.g. boys' heights vs girls' heights): Unpaired t-test / Mann-Whitney U test
Compare 2 sets of observations on a single sample = "paired data" (e.g. weight of babies before and after a feed): Paired t-test / Wilcoxon matched-pairs test
Paired data = each data point in one data set is related to one data point in the other data set.
One-tailed vs two-tailed t-test: always pick the "two-tailed t-test" rather than the "one-tailed t-test".
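The two rows of this table boil down to two yes/no questions: are the data paired, and are they normally distributed? A hypothetical helper that encodes just this table (two sets of observations only) could look like this.

```python
def choose_two_sample_test(paired: bool, normally_distributed: bool) -> str:
    """Pick a test for comparing two sets of observations, following the table.

    Whichever test is returned, run the two-tailed version.
    """
    if paired:
        return "Paired t-test" if normally_distributed else "Wilcoxon matched-pairs test"
    return "Unpaired t-test" if normally_distributed else "Mann-Whitney U test"

# Babies weighed before and after a feed, data not normally distributed
print(choose_two_sample_test(paired=True, normally_distributed=False))
```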

Statistical Tests
Reason for test / Parametric test / Non-parametric test
Compare 3+ sets of observations on a single sample (e.g. plasma glucose 1, 2 and 3 hrs after a meal): One-way ANOVA / One-way ANOVA by ranks = "Kruskal-Wallis test"
As above, but testing the influence of 2 different variables (e.g. whether the above differs between men and women): Two-way ANOVA / Two-way ANOVA by ranks = "Friedman test"
ANOVA stands for Analysis of Variance.

Statistical Tests
Reason for test / Parametric test / Non-parametric test
Relationship between 2 continuous variables (e.g. HbA1c vs triglycerides): Pearson's r / Spearman's rank correlation coefficient
Must show the r value and P value.
The example graph on this slide was from my BSc project!

Statistical Tests
Reason for test / Parametric test / Non-parametric test
Relationship between 2 continuous variables (e.g. HbA1c vs triglycerides): Pearson's r / Spearman's rank correlation coefficient
Relationship between 2 continuous variables, allowing one value to predict the other (e.g. how peak flow varies with height): Regression by least-squares method / N/A
Relationship between 1 dependent variable and many predictor variables (e.g. how BP is affected by age, BMI, salt intake): Multiple regression by least-squares method / N/A
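To see what the two correlation coefficients actually compute: Spearman's coefficient is Pearson's r applied to the ranks of the data, with tied values given the average of their ranks. This sketch only calculates the coefficients, not the P values (software such as Prism or a statistics library would give those); the data and helper names are made up for illustration.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Because y always increases with x, Spearman's coefficient is exactly 1, while Pearson's r is slightly below 1 because the relationship is curved rather than linear.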

What is "regression"?
Regression = an equation that allows one variable to be predicted from another. E.g. input patients' heights into a computer to create an equation that predicts their weights.
Multiple regression = an equation that allows one variable to be predicted from many variables. E.g. input patients' heights, ages, sex, diet… into a computer to create an equation that predicts their weights.
You will read the word "regression" lots in papers; it is important to know what it means (roughly!).
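For a single predictor, the least-squares line can be worked out by hand: slope = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)², intercept = ȳ – slope × x̄. The height and weight numbers below are invented, and `least_squares_fit` is a hypothetical helper; Python 3.10+ also provides `statistics.linear_regression` for this simple one-predictor case.

```python
from statistics import mean

def least_squares_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

heights_cm = [150, 160, 170, 180]   # hypothetical patients
weights_kg = [50, 58, 66, 74]
slope, intercept = least_squares_fit(heights_cm, weights_kg)
predicted = intercept + slope * 175   # predict weight for a 175 cm patient
print(f"weight = {intercept:.1f} + {slope:.2f} x height; at 175 cm: {predicted:.1f} kg")
```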

Data Dredging = analysing your data by various subgroups (age, gender, height, favourite colour) until you find a significant result.
This is bad because the more tests you do, the more likely you are to find a "significant" result by chance: with a cut-off of P < 0.05, about 1 in 20 tests will come out significant by chance even when there is no real difference!
Example of data dredging: you try to identify a difference between a treatment group and a control group, so you decide to stratify the groups by many different variables, like gender, age, presence of certain risk factors… If you do enough tests, you are bound to find a difference between the groups (e.g. there could be a specific difference if you looked at groups stratified by presence of atopy).
You need to apply a Bonferroni correction:

$$\text{new significance level} = \frac{0.05}{\text{number of tests you will do}}$$
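Two numbers show why data dredging is risky: the chance of at least one false-positive result across many tests, and the Bonferroni-corrected threshold. This sketch makes the simplifying assumptions that the tests are independent and every null hypothesis is true; the helper names are hypothetical.

```python
def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Bonferroni-corrected significance level for each individual test."""
    return alpha / n_tests

def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one 'significant' result by chance alone,
    assuming independent tests and that every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20):
    print(f"{n} tests: chance of a fluke = {chance_of_false_positive(n):.2f}, "
          f"corrected threshold = {bonferroni_alpha(n):.4f}")
```

With 20 subgroup tests there is roughly a 64% chance of at least one chance finding, which is why the threshold has to shrink.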

Question
Which test would you use to compare patients' blood pressures before and after they go on a rollercoaster? The data is not normally distributed.
Unpaired t-test
Paired t-test
Mann-Whitney U test
Wilcoxon matched-pairs test
Harry Styles test

Answer: Wilcoxon matched-pairs test (the data are paired and non-parametric)

Microsoft Excel for Statistics
Excel can run: t-tests; ANOVA (one-way and two-way); linear regression
YouTube tutorials are very helpful for this!

GraphPad Prism for Statistics
Can do most statistical tests on GraphPad Prism
Easy to use
Graphs look professional
Get a 30-day free trial during your project. The software remembers your computer, so you cannot sign up for multiple trials using different email addresses!
Ask the researchers on your team if they have a licensed copy of GraphPad Prism you can use.

Select your data and click "Analyse"

Click the appropriate test

Choose further options

Results sheet: P value summary
NS: P > 0.05
*: P < 0.05
**: P < 0.01
***: P < 0.001


Any Questions? Please fill in the feedback form!
