Statistical Analysis IB Diploma Biology Stephen Taylor Image: 'Hummingbird Checks Out Flower'

Presentation transcript:

Statistical Analysis IB Diploma Biology Stephen Taylor Image: 'Hummingbird Checks Out Flower' Found on flickrcc.net

Assessment Statements (from the Online IB Biology Subject Guide)
State that error bars are a graphical representation of the variability of data.
- Range and standard deviation show the variability/spread in the data.
- 95% Confidence Interval error bars suggest significance of difference where there is no overlap.
Calculate the mean and standard deviation of a set of values.
- Using Excel (formula: =STDEV(rawdata))
- Using your calculator
State that the term standard deviation (s) is used to summarize the spread of values around the mean, and that 68% of all values fall within (±) 1 standard deviation of the mean (for normally distributed data).
Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples.
- A greater standard deviation shows a greater variability of data around the mean.
- This can be used to infer reliability in methods or results.
Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables.
- Using t-values, t-tables and critical values
- Directly calculating P values using Excel in lab reports
Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

MrT’s Excel Statbook has guidance and ‘live’ examples of tables, graphs and statistical tests.

“Why is this Biology?”
Variation in populations and variability in results affect our confidence in conclusions.
The key methodology in Biology is hypothesis testing through experimentation. Carefully designed and controlled experiments and surveys give us quantitative (numeric) data that can be compared. We can use the data collected to test our hypothesis and form explanations of the processes involved, but only if we can be confident in our results. We therefore need to be able to evaluate the reliability of a set of data and the significance of any differences we have found in the data.
Image: 'Transverse section of part of a stem of a Dead-nettle (Lamium sp.) showing a vascular bundle and part of the cortex'. Found on flickrcc.net

“Which medicine should I prescribe?” Donate to Médecins Sans Frontières through Biology4Good.

Generic drugs are out of patent and are much cheaper than the proprietary (brand-name) equivalents. Doctors need to balance needs with available resources. Which would you choose?

Means (averages) alone are almost never good enough in Biology. Biological systems (and our results) show variability. Which would you choose now?

Hummingbirds are nectarivores (herbivores that feed on the nectar of some species of flower). In return for food, they pollinate the flower. This is an example of mutualism – benefit for all. As a result of natural selection, hummingbird bills have evolved. Birds with a bill best suited to their preferred food source have the greater chance of survival. Photo: Archilochus colubris, from Wikimedia Commons, by Dick Daniels.

Researchers studying comparative anatomy collect data on bill length in two species of hummingbirds: Archilochus colubris (ruby-throated hummingbird) and Cynanthus latirostris (broad-billed hummingbird). To do this, they need to collect sufficient relevant, reliable data so they can test the Null hypothesis (H0) that: “there is no significant difference in bill length between the two species.” Photo: Archilochus colubris (male), Wikimedia Commons, by Joe Schneid.

The sample size must be large enough to provide sufficient reliable data and for us to carry out relevant statistical tests for significance. We must also be mindful of uncertainty in our measuring tools and error in our results. Photo: Broad-billed hummingbird (Wikimedia Commons).

The mean is a measure of the central tendency of a set of data.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris. Columns: n, A. colubris, C. latirostris, all as bill length (±0.1 mm). Final rows: Mean and s.
Calculate the mean using:
- Your calculator (sum of values / n)
- Excel: =AVERAGE(highlight raw data)
n = sample size. The bigger the better. In this case n=10 for each group.
Table checklist:
- Descriptive table title and number.
- Uncertainties must be included.
- Raw data and the mean need to have consistent decimal places (in line with the uncertainty of the measuring tool).
- All values should be centred in the cell.
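As a rough cross-check outside Excel, the same "sum of values / n" calculation can be sketched in Python. The bill lengths below are hypothetical values made up for illustration (they are not the data from Table 1), and the helper name `mean_to_tool_precision` is an assumption of this sketch.

```python
from statistics import mean

# Hypothetical bill lengths in mm (±0.1 mm), n = 10 per species.
# Illustrative values only, NOT the data from Table 1.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def mean_to_tool_precision(values, decimals=1):
    """Sum of values divided by n, rounded to the measuring tool's precision."""
    return round(mean(values), decimals)


mean_a = mean_to_tool_precision(a_colubris)
mean_c = mean_to_tool_precision(c_latirostris)

# Report the mean with the same number of decimal places as the raw data.
print(f"A. colubris:    mean = {mean_a:.1f} mm (n = {len(a_colubris)})")
print(f"C. latirostris: mean = {mean_c:.1f} mm (n = {len(c_latirostris)})")
```

Rounding to one decimal place mirrors the ±0.1 mm uncertainty of the measuring tool.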


Graph checklist: descriptive title, with graph number; labeled points; y-axis clearly labeled, with uncertainty, and beginning at zero; x-axis labeled.

From the means alone you might conclude that C. latirostris has a longer bill than A. colubris. But the mean only tells part of the story.

Standard deviation is a measure of the spread of most of the data.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris. Columns: n, A. colubris, C. latirostris, all as bill length (±0.1 mm). Final rows: Mean and s.
Calculate the standard deviation in Excel using =STDEV(highlight RAW data). Standard deviation can have one more decimal place than the raw data.
Which of the two sets of data has:
a. The longest mean bill length? (C. latirostris)
b. The greatest variability in the data? (A. colubris)
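For readers working outside Excel, here is a minimal Python sketch of the same calculation. It assumes hypothetical bill lengths (not the values from Table 1) and uses `statistics.stdev`, which is the sample standard deviation (dividing by n - 1), the same convention as Excel's =STDEV.

```python
from statistics import mean, stdev

# Hypothetical bill lengths in mm (±0.1 mm). Illustrative values only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def describe(values):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return mean(values), stdev(values)


mean_a, sd_a = describe(a_colubris)
mean_c, sd_c = describe(c_latirostris)

# Mean to one decimal place (tool precision), s to one more decimal place.
print(f"A. colubris:    mean = {mean_a:.1f} mm, s = {sd_a:.2f} mm")
print(f"C. latirostris: mean = {mean_c:.1f} mm, s = {sd_c:.2f} mm")

# The set with the larger s has the greater variability around its mean.
more_variable = "A. colubris" if sd_a > sd_c else "C. latirostris"
print("Greatest variability:", more_variable)
```

With these made-up numbers the pattern matches the slide: C. latirostris has the longer mean bill, while A. colubris shows the greater spread.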

Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data. Which of the two sets of data (A or B) has: a. The highest mean? b. The greatest variability in the data? Error bars could represent standard deviation, range or confidence intervals.

Put the error bars for standard deviation on our graph.

Delete the horizontal error bars

Title is adjusted to show the source of the error bars. This is very important. You can see the clear difference in the size of the error bars. Variability has been visualised. The error bars overlap somewhat. What does this mean?

The overlap of a set of error bars gives a clue as to the significance of the difference between two sets of data.
Large overlap: lots of shared data points within each data set. Results are not likely to be significantly different from each other. Any difference is most likely due to chance.
No overlap: no (or very few) shared data points within each data set. Results are more likely to be significantly different from each other. The difference is more likely to be ‘real’.
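The overlap check described above can be sketched in Python. This is only an illustration under stated assumptions: the bill lengths are hypothetical, the error bars are taken as mean ± 1 standard deviation, and the helper names `error_bar` and `bars_overlap` are invented for this sketch. As the slide says, overlap is a clue, not a test of significance.

```python
from statistics import mean, stdev

# Hypothetical bill lengths in mm (±0.1 mm). Illustrative values only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def error_bar(values):
    """Return the (low, high) ends of a mean ± 1 standard deviation error bar."""
    m, s = mean(values), stdev(values)
    return m - s, m + s


def bars_overlap(bar_1, bar_2):
    """True when two (low, high) intervals share at least one point."""
    return bar_1[0] <= bar_2[1] and bar_2[0] <= bar_1[1]


bar_a = error_bar(a_colubris)
bar_c = error_bar(c_latirostris)
print(f"A. colubris bar:    {bar_a[0]:.2f} to {bar_a[1]:.2f} mm")
print(f"C. latirostris bar: {bar_c[0]:.2f} to {bar_c[1]:.2f} mm")
print("Error bars overlap:", bars_overlap(bar_a, bar_c))
```

With these made-up values the bars overlap only slightly, which is exactly the situation where a statistical test is needed to decide.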

Our results show a very small overlap between the two sets of data. So how do we know if the difference is significant or not? We need to use a statistical test. The t-test is a statistical test that helps us determine the significance of the difference between the means of two sets of data.

The Null Hypothesis (H0): “There is no significant difference.” This is the ‘default’ hypothesis that we always test. In our conclusion, we either accept the null hypothesis or reject it. A t-test can be used to test whether the difference between two means is significant. If we accept H0, then the means are not significantly different. If we reject H0, then the means are significantly different. Remember: We are never ‘trying’ to get a difference. We design carefully controlled experiments and then analyse the results using statistical analysis.
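The slides treat the value of t as "done for you". For the curious, one common way to compute it is sketched below: the unpaired two-sample t statistic with a pooled variance, which assumes both groups have similar variances. The data are hypothetical and the function name `unpaired_t` is an assumption of this sketch; it is not necessarily how the presentation's own t values were produced.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical bill lengths in mm (±0.1 mm). Illustrative values only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def unpaired_t(sample_1, sample_2):
    """Two-sample t statistic using a pooled variance (equal variances assumed)."""
    n1, n2 = len(sample_1), len(sample_2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)
    ) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error


t = unpaired_t(a_colubris, c_latirostris)
# The sign only shows which mean is larger; compare |t| with the critical value.
print(f"|t| = {abs(t):.2f}")
```

The larger |t| is, the bigger the difference between the means relative to the variability within the groups.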

We can calculate the value of ‘t’ for a given set of data and compare it to “critical values” that depend on the size of our sample and the level of confidence we need.
Example two-tailed t-table: columns give the P value and the matching confidence level (90%, 95%, 98%, 99%); rows give the degrees of freedom.
“Degrees of Freedom (df)” is the total sample size minus two*.
What happens to the value of P as the confidence in the results increases? What happens to the critical value as the confidence level increases?
We usually use P<0.05 (95% confidence) in Biology, as our data can be highly variable.
*Simple explanation: we are working in two directions – within each population and across populations.

t was calculated as 2.15 (this is done for you). Compare t with the critical value (cv) from the two-tailed t-table at P = 0.05.
If t < cv, accept H0 (there is no significant difference).
If t > cv, reject H0 (there is a significant difference).
Here 2.15 > cv, so we reject H0.
Conclusion: “There is a significant difference in the wing spans of the two populations of birds.”
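The decision rule above can be written as a short Python sketch. The dictionary holds a few standard two-tailed critical values at P = 0.05; the sample sizes in the usage lines (and therefore the degrees of freedom) are assumptions for illustration, since the table used on the slide is not reproduced here. The names `CRITICAL_T_P05`, `degrees_of_freedom` and `decide` are invented for this sketch.

```python
# A few standard two-tailed critical values of t at P = 0.05, keyed by df.
CRITICAL_T_P05 = {8: 2.306, 10: 2.228, 18: 2.101, 20: 2.086, 28: 2.048}


def degrees_of_freedom(n1, n2):
    """Total sample size minus two, for two independent groups."""
    return n1 + n2 - 2


def decide(t, df, table=CRITICAL_T_P05):
    """Compare |t| with the critical value (cv) for the given df."""
    cv = table[df]
    if abs(t) > cv:
        return "reject H0"  # there is a significant difference
    return "accept H0"  # there is no significant difference


# Assumed example: two groups of 10 individuals each, t = 2.15.
df = degrees_of_freedom(10, 10)
print(f"df = {df}, cv = {CRITICAL_T_P05[df]}, decision: {decide(2.15, df)}")

# The same t with smaller samples (assumed 6 per group) gives the other outcome.
small_df = degrees_of_freedom(6, 6)
print(f"df = {small_df}, cv = {CRITICAL_T_P05[small_df]}, decision: {decide(2.15, small_df)}")
```

The second call shows why sample size matters: the same t value can fall below the critical value when there are fewer degrees of freedom.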

Conclusion: “There is no significant difference in the size of shells between north-side and south-side snail populations.”

Conclusion: “There is a significant difference in the resting heart rates between the two groups of swimmers.”

Excel can jump straight to a value of P for our results. One function (=TTEST) compares both sets of data. As it calculates P directly (the probability of seeing a difference at least this large by chance alone, if H0 is true), we can determine significance directly. In this case, the calculated P is much smaller than 0.005, so we are confident that we can reject H0. The difference is unlikely to be due to chance. Conclusion: There is a significant difference in bill length between A. colubris and C. latirostris.

Settings used in =TTEST(array1, array2, tails, type):
Two tails: we assume data are normally distributed, with two ‘tails’ moving away from the mean, so a difference in either direction counts.
Type 2 (unpaired): we are comparing one whole population with the other whole population. (Type 1 pairs the results of each individual in set A with the same individual in set B.)

95% Confidence Intervals can also be plotted as error bars. These give a clearer indication of the significance of a result:
- Where there is overlap, there is not a significant difference.
- Where there is no overlap, there is a significant difference.
- If the overlap (or difference) is small, a t-test should still be carried out.
In Excel: =CONFIDENCE.NORM(0.05,stdev,samplesize), e.g. =CONFIDENCE.NORM(0.05,C15,10)
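To see what =CONFIDENCE.NORM is doing, here is a minimal Python sketch of the same normal-distribution formula: z × s / √n, where z is the two-tailed critical value for the chosen alpha. The function name `confidence_norm` mirrors the Excel function but is an assumption of this sketch, and the standard deviation in the usage lines is a hypothetical value.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Half-width of a confidence interval around the mean (normal distribution)."""
    # Two-tailed critical z value, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Hypothetical example: s = 1.91 mm from a sample of n = 10 bills.
half_width = confidence_norm(0.05, 1.91, 10)
print(f"95% CI error bar: mean ± {half_width:.2f} mm")
```

The returned value is the length of each error bar arm, so the plotted interval runs from mean - value to mean + value. For small samples such as n = 10, Excel's =CONFIDENCE.T (based on the t distribution) gives a somewhat wider interval.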

Error bars can have very different purposes.
Standard deviation (you really need to know this):
- Look for relative size of bars.
- Used to indicate spread of most of the data around the mean.
- Can imply reliability of data.
95% Confidence Intervals (adds value to labs where we are looking for differences):
- Look for overlap, not size.
- Overlap → no significant difference.
- No overlap → significant difference.

Interesting Study: Do “Better” Lecturers Cause More Learning?
Students watched a one-minute video of a lecture. In one video, the lecturer was fluent and engaging. In the other video, the lecturer was less fluent. They predicted how much they would learn on the topic (genetics) and this was compared to their actual score. (Error bars = standard deviation; n=21.)
Is there a significant difference in the actual learning?
Evaluate the study:
1. What do the error bars (standard deviation) tell us about reliability?
2. How valid is the study in terms of sufficiency of data (sample sizes (n))?

Dog fleas jump higher than cat fleas: winner of the Ig Nobel Prize for Biology.

Two-tailed t-table: critical values by degrees of freedom, with columns for P value and confidence level (90%, 95%, 98%, 99%, 99.5%).

Cartoon from: Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."

From MrT’s Excel Statbook.

Interpreting Graphs: See – Think – Wonder (Visible Thinking Routine)
See: What is factual about the graph? What are the axes? What is being plotted? What values are present?
Think: How is the graph interpreted? What relationship is present? Is cause implied? What explanations are possible and what explanations are not possible?
Wonder: Questions about the graph. What do you need to know more about?

Diabetes and obesity are ‘risk factors’ of each other. There is a strong correlation between them, but does this mean one causes the other?

Correlation does not imply causality. (Graph: pirates vs global warming.)
Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of confounding variables – conditions that the correlated variables have in common but that do not directly affect each other.
To be able to determine causality through experimentation we need:
- One clearly identified independent variable
- Carefully measured dependent variable(s) that can be attributed to change in the independent variable
- Strict control of all other variables that might have a measurable impact on the dependent variable
We need: sufficient relevant, repeatable and statistically significant data.
Some known causal relationships:
- Atmospheric CO2 concentrations and global warming
- Atmospheric CO2 concentrations and the rate of photosynthesis
- Temperature and enzyme activity

Flamenco Dancer, by Steve Corey

i-Biology.net This is a Creative Commons presentation. It may be linked and embedded but not sold or re-hosted. Please consider a donation to charity via Biology4Good.