Introduction to Statistics
Biomedical Sciences Degrees, Honours Students
Derek Scott

Why use statistics? Statistics are used to analyse populations and to predict changes in terms of probability. Normally, a representative sample is taken that is large enough to draw reliable conclusions about the population as a whole. Descriptive statistics summarise the data and describe the population; these values show you how large and how variable the data are. Inferential statistics start from a null hypothesis (e.g. that there is no difference between groups) and test whether the data give grounds to reject it. This lets you judge whether an observed difference is likely to be real or just due to random error.

When analysing data, you want to draw the strongest possible conclusions from limited amounts of data. To do this, you need to overcome 2 problems. First, important differences can be obscured by biological variability and experimental error, which makes it difficult to distinguish real differences from random variability. Second, the human brain excels at finding patterns, even in random data. Our natural inclination (especially with our own data) is to conclude that any differences are real, and to minimise the contribution of random variability. Statistical rigour prevents you from making this mistake.

Errors Bias or systematic error: data deviate in a predictable direction, perhaps due to the experimental design or human error. These errors can be removed if you identify their cause. Random error: unpredictable errors that cannot be eliminated. You will usually quote a measure of error with your data (e.g. standard deviation, SD, or standard error of the mean, SEM). EXAMPLE: The mean height of a student in BM4005 is 1.71 ± 0.20 (43) metres. Here 1.71 is the mean value, 0.20 is the SD or SEM (state which), 43 is n, the number of samples, and the units are metres. Always give units!!!

Independent Sampling - 1 You measure BP in rats, 5 rats per group, measuring BP 3 times in each animal. You do not have 15 independent measurements, since the triplicate measurements in each animal will be closer to one another than to those in other animals. Average the values from each rat, and you now have 5 independent mean values.
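As a rough illustration (Python, not part of the original slides; the BP readings are hypothetical), collapsing the repeated readings to one mean per animal before running any test might look like this:

```python
from statistics import mean

# Hypothetical BP readings (mmHg): 5 rats, each measured 3 times.
bp_by_rat = {
    "rat1": [120, 122, 121],
    "rat2": [135, 133, 136],
    "rat3": [118, 119, 117],
    "rat4": [128, 130, 129],
    "rat5": [125, 124, 126],
}

def per_subject_means(readings):
    """Collapse repeated measurements to one mean per subject (the independent unit)."""
    return {subject: mean(values) for subject, values in readings.items()}

rat_means = per_subject_means(bp_by_rat)
print(len(rat_means))  # 5 independent values, not 15
print(rat_means)
```

The same idea applies to triplicates within each run of a biochemical test: the run, not the individual well, is the independent unit.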

Independent Sampling - 2 You perform a biochemical test 3 times, each time in triplicate. You do not have 9 independent values, as an error in preparing the reagents for one experiment could affect all 3 triplicates. Average the triplicates, and you have 3 independent mean values.

Independent Sampling - 3 You are doing a human exercise study, recruiting 10 people from the inner city and 10 people from the countryside. You have not independently sampled 20 subjects from one population: data from inner-city subjects may be closer to each other than to the data from rural subjects. You have sampled from 2 populations, and need to account for this in your analysis.

Gaussian (Normal) Distribution Many biological measurements approximately follow a bell-shaped distribution called the Gaussian distribution. t-tests and ANOVA assume that the population follows an approximately Gaussian distribution. For example, if we measured the height of everyone in 4th year and plotted this, most people would fall in the middle of the curve, with a few at the lower end and a few at the upper end of the curve. For data that are approximately Gaussian, we use parametric tests.

Gaussian Distribution “Bell-shaped” curve

Outliers When analysing data, some values can be very different from the rest, and it is tempting to delete them from the analysis. First ask: Was the value typed in correctly? Was there an experimental problem with that value? Is it due to biological diversity? What if the answer to all of these questions is no?

Outliers If the outlier is due to chance, keep it in the data set. If it is due to a mistake (e.g. bad pipetting, a voltage spike, an apparatus problem), then you must remove it from the analysis. If you want a formal decision on whether an outlier is due to chance, there are specific statistical tests you can do (e.g. Grubbs’ test), but usually these basic checks are enough to decide.

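Mean The sample mean will probably not be exactly the population mean. It is a more accurate estimate if you have a bigger sample size with low variability. You can calculate a confidence interval (CI) for the mean: a range that, at a stated level of confidence (usually 95%), is expected to contain the true population mean. EXAMPLE: The mean height of a student in BM4005 is 1.71 metres (SD 0.20, n = 43). The 95% confidence limits for the mean are about 1.65 and 1.77 metres, so we can be 95% confident that the true mean height lies between these values. This is not the range within which 95% of the class’s heights fall; for individual heights you would need a much wider range of roughly mean ± 2 SD.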

Confidence Intervals There is nothing magical about 95%. You could calculate an interval for any level you liked: 99%, 90% etc. A 99% interval is wider than a 95% interval, because demanding more confidence that the interval contains the true mean requires a wider range. A 95% CI means you have a reasonable level of confidence that the true population mean lies within that range.
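A minimal sketch (Python, not from the original slides) of how the interval widens with the confidence level, using the height example (mean 1.71 m, SD 0.20, n = 43). It assumes a large-sample approximation with the normal (z) critical value; the t-based interval reported by packages such as Prism is slightly wider, especially for small n.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean ± z * SD / sqrt(n).

    Uses the normal (z) critical value as an approximation to the t value.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

for level in (0.90, 0.95, 0.99):
    low, high = approx_ci(1.71, 0.20, 43, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f} m")
```

With these numbers the 95% interval comes out at roughly 1.65 to 1.77 m, and the 99% interval is visibly wider than the 90% one.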

Standard Deviation (SD) Quantifies variability (scatter) and is expressed in the same units as the data. If data follow a Gaussian distribution, about 68% of values lie within one SD of the mean (on either side) and about 95% of values lie within 2 SDs (more precisely, 1.96 SDs) of the mean. Do not use this as a shortcut for significance: whether two means differ significantly depends on the sample sizes as well as the SDs, so compare them with a proper test such as a t-test or ANOVA.
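The 68% and 95% figures can be checked directly from the Gaussian curve. A short sketch using Python’s standard-library `statistics.NormalDist` (an illustration, not part of the original slides):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a Gaussian population lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"within 1 SD:    {fraction_within(1):.3f}")     # about 0.683
print(f"within 2 SD:    {fraction_within(2):.3f}")     # about 0.954
print(f"within 1.96 SD: {fraction_within(1.96):.3f}")  # about 0.950
```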

Standard Error of the Mean (SEM) A measure of how far the sample mean is likely to be from the true population mean: SEM = SD/√n. SD quantifies scatter, i.e. how much values vary from each other, and does not change much even if you have a bigger sample size. SEM quantifies how accurately you know the true mean of the population, and gets smaller as the sample gets larger. Because the SEM is always smaller than the SD, it is often chosen to give smaller error bars; always state whether your error bars show SD or SEM.
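A minimal sketch (Python, hypothetical height data) of SEM = SD/√n. To mimic a bigger sample with the same scatter, the small sample is simply repeated 4 times, so the SD stays similar while the SEM shrinks:

```python
from math import sqrt
from statistics import stdev

def sd_and_sem(values):
    """Return (SD, SEM) for a sample; SEM = SD / sqrt(n)."""
    sd = stdev(values)  # sample SD (n - 1 denominator)
    return sd, sd / sqrt(len(values))

small = [1.6, 1.7, 1.8, 1.7]  # hypothetical heights (m), n = 4
large = small * 4             # same scatter, n = 16

for label, data in (("n=4", small), ("n=16", large)):
    sd, sem = sd_and_sem(data)
    print(f"{label}: SD={sd:.3f} m, SEM={sem:.3f} m")
```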

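P Values The P value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true. Conventional wording and symbols:
P value | Wording | Symbol
> 0.05 | Not significant | ns
0.01 to 0.05 | Significant | *
0.001 to 0.01 | Very significant | **
< 0.001 | Extremely significant | ***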
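For labelling graphs, the convention above can be written as a small helper. This is a hypothetical Python function, not a Prism feature; how exact boundary values (0.05, 0.01, 0.001) are assigned is an assumption, following the "0.01 to 0.05" style of the table.

```python
def significance_symbol(p):
    """Map a P value to the ns / * / ** / *** convention.

    Assumption: boundary values fall in the more significant band,
    e.g. P = 0.05 -> "*", P = 0.01 -> "*", P = 0.001 -> "**".
    """
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p > 0.05:
        return "ns"
    if p >= 0.01:
        return "*"
    if p >= 0.001:
        return "**"
    return "***"

for p in (0.2, 0.03, 0.005, 0.0001):
    print(p, significance_symbol(p))
```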

Student’s t-test Used to compare the means of two groups of data. Paired t-test: the control experiment and the treatment are done on the same person, animal, cell etc. Unpaired t-test: the control is done on one group of subjects, with the treatment being done on a separate group. Either can be 1- or 2-tailed.
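To show what the unpaired test actually computes, here is a minimal Python sketch of Student’s t statistic with a pooled SD (equal variances assumed). It stops at t and the degrees of freedom; turning t into a P value needs the t distribution, which Prism does for you (in Python, `scipy.stats.ttest_ind` does it if SciPy is available). The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(group1, group2):
    """Student's unpaired t statistic (equal variances assumed, pooled SD)."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference between means
    t = (mean(group1) - mean(group2)) / se_diff
    return t, n1 + n2 - 2  # t statistic and degrees of freedom

control = [5.1, 4.8, 5.5, 5.0, 4.9]  # hypothetical control group
treated = [5.9, 6.2, 5.7, 6.0, 6.4]  # hypothetical separate treated group
t, df = unpaired_t(control, treated)
print(f"t = {t:.2f}, df = {df}")
```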

Figure: Iron and zinc evoke electrogenic responses that are pH-dependent (iron and zinc, 100 µM; Krebs buffer at pH 6.0 vs pH 7.4)

Figure: Iron- and zinc-evoked transport is temperature-dependent (iron and zinc; 4 °C vs 37 °C)

Paired or Unpaired? Choose paired if the 2 columns of data are matched, e.g.: you measure weight before and after an intervention in the same subjects; you recruit subjects as pairs, matched for variables such as age, ethnic group or disease severity, and one of each pair gets one treatment while the other gets an alternative treatment; you perform the control experiment in one cell or piece of tissue, then apply a drug and measure its effect in the same cell or tissue. Matching should not be based on the variable you are comparing. For example, if measuring BP, you can match subjects based on their age or postcode, but not on their BPs.
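A paired t-test is simply a one-sample t-test on the within-subject differences, which is why pairing removes the between-subject variability. A minimal Python sketch with hypothetical before/after weights (for the P value, `scipy.stats.ttest_rel` does the equivalent test if SciPy is available):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'after' value per 'before' value")
    diffs = [a - b for b, a in zip(before, after)]  # change in each subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

weight_before = [72.0, 85.5, 64.2, 90.1, 77.3]  # hypothetical weights (kg)
weight_after = [70.8, 84.1, 63.9, 88.0, 76.5]   # same subjects after intervention
t, df = paired_t(weight_before, weight_after)
print(f"t = {t:.2f}, df = {df}")
```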

Student’s t-test You will probably always use a 2-tailed t-test. A 2-tailed test just asks whether there is a difference between the 2 means, in either direction. A 1-tailed test only looks in one predicted direction: either Mean 1 is bigger than Mean 2, or Mean 2 is bigger than Mean 1. For a 1-tailed test you must decide which mean will be bigger before you start, which is not usually possible. Stick to a 2-tailed t-test to be safe!!!

Analysis of Variance (ANOVA) Used to compare the means of 3 or more groups. Again, values can be matched (paired) or unmatched (unpaired). You will probably only use 1-way ANOVA. EXAMPLE: you measure BP in 4 groups of men. Your null hypothesis is that the mean BP is the same in all 4 groups. ANOVA tests whether at least one group mean differs from the others, but not which one; post tests answer that.

Features of ANOVA ANOVA produces an F value: the ratio of the variation between the group means to the variation within the groups. A higher F value means the group means differ more relative to the scatter within each group. Dunnett’s post test compares every group against one group, e.g. A v B, A v C, A v D. Handy if A is the control group. Tukey’s post test compares all columns against one another to check for differences between any groups. Good way of finding significant differences that you may not have expected.
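To make the F ratio concrete, here is a minimal Python sketch of a 1-way ANOVA F value (between-group mean square divided by within-group mean square). It does not compute the P value or any post tests; Prism does both, and `scipy.stats.f_oneway` gives F and P in Python if SciPy is available. The BP values are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return F and its (numerator, denominator) degrees of freedom for a 1-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n_total = len(groups), len(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: scatter of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

# Hypothetical BP values (mmHg) for 3 small groups.
f_value, dfs = one_way_anova_f([120, 125, 130], [128, 133, 138], [140, 145, 150])
print(f"F = {f_value:.2f} with df = {dfs}")
```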

Figure: The effect of non-selective protein kinase inhibition with staurosporine (iron and zinc panels; control, 8-Br cGMP (100 µM), staurosporine (0.5 µM), 8-Br cGMP + staurosporine)

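Non-Gaussian Distribution For data that do not follow a Gaussian distribution, use non-parametric tests, which rank the data from low to high and analyse the distribution of the ranks (e.g. the Mann-Whitney test instead of an unpaired t-test, the Wilcoxon signed-rank test instead of a paired t-test, the Kruskal-Wallis test instead of 1-way ANOVA). They are less powerful than parametric tests, so P values are usually higher. They are also used when some values are too high or too low to measure and have been assigned arbitrary values, and when the outcome is a rank or score with only a few categories.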
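The ranking step is the core of these tests. A minimal Python sketch (hypothetical helper names, illustrative data) that ranks pooled values, giving tied values the average of their ranks, and then computes the Mann-Whitney U statistic; converting U to a P value needs tables or software (e.g. Prism, or `scipy.stats.mannwhitneyu` if SciPy is available).

```python
def average_ranks(values):
    """Rank values from low (1) to high; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over any run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Mann-Whitney U statistics for two independent groups, from the pooled ranks."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1

print(average_ranks([3.1, 5.0, 5.0, 9.2]))       # [1.0, 2.5, 2.5, 4.0]
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))       # (0.0, 9.0): complete separation
```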

Skewness

Correlation Correlation doesn’t tell you about the cause of an effect; it just tells you that there is a link between value X and value Y. The correlation coefficient r ranges from -1 to +1: the nearer r is to +1 (positive correlation) or -1 (negative correlation), the stronger the correlation, while r near 0 means no linear correlation.
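A minimal Python sketch of the Pearson correlation coefficient r, on illustrative data, showing the +1 and -1 extremes. As the slides stress, a value near ±1 says nothing about cause.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r, between -1 and +1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive correlation: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative correlation: -1.0
```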

Regression Regression calculates a line of best fit. It is often used to construct a standard curve, which you can use to estimate an unknown value of x (e.g. a concentration) if you know the value of y (e.g. an absorbance). Unknowns must fall within your standard curve’s range; do not extrapolate beyond it.
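A minimal Python sketch of reading an unknown off a straight-line standard curve, assuming an ordinary least-squares fit and a linear response. The standards are hypothetical, and the helper refuses to extrapolate outside the range of the standards:

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def read_off_curve(y_unknown, x_std, y_std):
    """Estimate x for a measured y, refusing to extrapolate beyond the standards."""
    slope, intercept = fit_line(x_std, y_std)
    x_est = (y_unknown - intercept) / slope
    if not min(x_std) <= x_est <= max(x_std):
        raise ValueError("unknown lies outside the standard curve's range")
    return x_est

conc_std = [0, 1, 2, 3, 4]                 # hypothetical standard concentrations
abs_std = [0.05, 0.25, 0.45, 0.65, 0.85]   # hypothetical absorbance readings
print(read_off_curve(0.55, conc_std, abs_std))  # about 2.5
```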

Correlation and regression A word of caution about doing regression and finding correlations. Just because you can draw a line of best fit through some points and get quite a good straight line, it does not necessarily mean there is a relationship. Correlation does not necessarily imply causation! For example, the consumption of tropical fruit in the UK has increased since WW2, and so has the birth rate in the UK. If I plotted these on a graph and did a regression, I would probably get a nice straight line, as both increase together. I would probably also show there is a good correlation. This does not mean that I can say that eating tropical fruit improves your fertility!!! Use some common sense when interpreting your data!

Summary This is just a basic introduction. For extra information, try the Help files in GraphPad Prism (on the University PCs). If you end up doing an Honours project with certain types of data (e.g. psychological data, epidemiological studies etc.), your supervisor should tell you about any special tests or calculations used for that type of data. Finally, if you are still unsure, tell your supervisor clearly which parts of the analysis you do not understand, and why.